Research

Dates: Mon, 02/01/2010 - 12:00

This paper examines the formalization of negative concord in terms of the Minimalist Program, focusing entirely on negative concord in West Flemish. It is shown that a recent analysis of negative concord which advocates Multiple Agree is empirically inadequate. Instead of Multiple Agree, it is argued that a particular implementation of the simpler and less powerful binary Agree is superior in deriving the data in question.

A lexical basis for N400 context effects: Evidence from MEG

Within-subject MEG studies on the topography of N400 effects suggest that such effects reflect facilitated access to lexical information, and not difficulty integrating a word with its semantic context.

Linguistics

Contributor(s): Ellen Lau
Non-ARHU Contributor(s):

Diogo Almeida, Paul Hines, David Poeppel

Dates: Wed, 10/07/2009 - 12:00

The electrophysiological response to words during the ‘N400’ time window (approximately 300–500 ms post-onset) is affected by the context in which the word is presented, but whether this effect reflects the impact of context on access of the stored lexical information itself or, alternatively, post-access integration processes is still an open question with substantive theoretical consequences. One challenge for integration accounts is that contexts that seem to require different levels of integration for incoming words (i.e., sentence frames vs. prime words) have similar effects on the N400 component measured in ERP. In this study we compare the effects of these different context types directly, in a within-subject design using MEG, which provides a better opportunity for identifying topographical differences between electrophysiological components, due to the minimal spatial distortion of the MEG signal. We find a qualitatively similar contextual effect for both sentence frame and prime-word contexts, although the effect is smaller in magnitude for shorter word prime contexts. Additionally, we observe no difference in response amplitude between sentence endings that are explicitly incongruent and target words that are simply part of an unrelated pair. These results suggest that the N400 effect does not reflect semantic integration difficulty. Rather, the data are consistent with an account in which N400 reduction reflects facilitated access of lexical information.

Dates: Thu, 10/01/2009 - 12:00

According to Kratzer (2003), the thematic relation Theme, construed very generally, is not a “natural relation.” She says that the “natural relations” are “cumulative” and argues that Theme is not cumulative, in contrast to Agent. It is therefore best, she concludes, to remove Theme from the palette of semantic analysis. Here I oppose the premises of Kratzer’s argument and then introduce a new challenge to her conclusion, based on the resultative construction in Mandarin. The facts show that Theme and Agent are on equal footing, insofar as neither has the property that Kratzer’s conjecture requires of a natural relation.

Dates: Fri, 09/11/2009 - 12:00

This paper discusses the interaction of aspect and modality, and focuses on the puzzling implicative effect that arises when perfective aspect appears on certain modals: perfective somehow seems to force the proposition expressed by the complement of the modal to hold in the actual world, and not merely in some possible world. I show that this puzzling behavior, originally discussed in Bhatt (1999) for the ability modal, extends to all modal auxiliaries with a circumstantial modal base (i.e., root modals), while epistemic interpretations of the same modals are immune to the effect. I propose that implicative readings are contingent on the relative position of the modal with respect to aspect: when aspect scopes over the modal (as I argue is the case for root modals), it forces an actual event, thereby yielding an implicative reading. When a modal element scopes over aspect, no actual event is forced. This happens (i) with epistemics, which structurally appear above tense and aspect; (ii) with imperfective on a root modal: imperfective brings in an additional layer of modality, itself responsible for removing the necessity for an actual event. This proposal enables us to solve the puzzle while maintaining a standardized semantics for aspects and modals.

Dates: Tue, 09/01/2009 - 12:00

We identify three components of any learning theory: the representations, the learner’s data intake, and the learning algorithm. With these in mind, we model the acquisition of the English anaphoric pronoun one in order to identify necessary constraints for successful acquisition, and the nature of those constraints. Whereas previous modeling efforts have succeeded by using a domain-general learning algorithm that implicitly restricts the data intake to be a subset of the input, we show that the same kind of domain-general learning algorithm fails when it does not restrict the data intake. We argue that the necessary data intake restrictions are domain-specific in nature. Thus, while a domain-general algorithm can be quite powerful, a successful learner must also rely on domain-specific learning mechanisms when learning anaphoric one.

On The Way To Linguistic Representation: Neuromagnetic Evidence of Early Auditory Abstraction in the Perception of Speech and Pitch

Already within 25-40 ms of an acoustic speech stimulus, brain cortex is calculating abstract representations of the signal that are sensitive to phonological constraints.

Linguistics

Non-ARHU Contributor(s):

Phil Monahan

Dates: Sat, 08/01/2009 - 12:00

The goal of this dissertation is to show that even at the earliest (non-invasive) recordable stages of auditory cortical processing, we find evidence that cortex is calculating abstract representations from the acoustic signal. Looking across two distinct domains (inferential pitch perception and vowel normalization), I present evidence demonstrating that the M100, an automatic evoked neuromagnetic component that localizes to primary auditory cortex, is sensitive to abstract computations. The M100 typically responds to physical properties of the stimulus in auditory and speech perception and integrates only over the first 25 to 40 ms of stimulus onset, providing a reliable dependent measure that allows us to tap into early stages of auditory cortical processing. In Chapter 2, I briefly present the episodicist position on speech perception and discuss research indicating that the strongest episodicist position is untenable. I then review findings from the mismatch negativity literature, where proposals have been made that the MMN allows access into linguistic representations supported by auditory cortex. Finally, I conclude the chapter with a discussion of the previous findings on the M100/N1. In Chapter 3, I present neuromagnetic data showing that the response properties of the M100 are sensitive to the missing fundamental component using well-controlled stimuli. These findings suggest that listeners are reconstructing the inferred pitch by 100 ms after stimulus onset. In Chapter 4, I propose a novel formant ratio algorithm in which the third formant (F3) is the normalizing factor. The goal of formant ratio proposals is to provide an explicit algorithm that successfully "eliminates" speaker-dependent acoustic variation of auditory vowel tokens. Results from two MEG experiments suggest that auditory cortex is sensitive to formant ratios and that the perceptual system shows heightened sensitivity to tokens located in more densely populated regions of the vowel space.
In Chapter 5, I report MEG results that suggest early auditory cortical processing is sensitive to violations of a phonological constraint on sound sequencing, suggesting that listeners make highly specific, knowledge-based predictions about rather abstract anticipated properties of the upcoming speech signal and violations of these predictions are evident in early cortical processing.

Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

A novel application of fine-grained linguistic soft constraints, syntactic and semantic, to statistical NLP models, evaluated in end-to-end state-of-the-art statistical machine translation (SMT) systems.

Linguistics

Non-ARHU Contributor(s): Yuval Marton

Dates: Sat, 08/01/2009 - 12:00

This dissertation focuses on effective combination of data-driven natural language processing (NLP) approaches with linguistic knowledge sources that are based on manual text annotation or word grouping according to semantic commonalities. I gainfully apply fine-grained linguistic soft constraints -- of syntactic or semantic nature -- on statistical NLP models, evaluated in end-to-end state-of-the-art statistical machine translation (SMT) systems. The introduction of semantic soft constraints involves intrinsic evaluation on word-pair similarity ranking tasks, extension from words to phrases, application in a novel distributional paraphrase generation technique, and an introduction of a generalized framework of which these soft semantic and syntactic constraints can be viewed as instances, and in which they can be potentially combined. Fine granularity is key in the successful combination of these soft constraints, in many cases. I show how to softly constrain SMT models by adding fine-grained weighted features, each preferring translation of only a specific syntactic constituent. Previous attempts using coarse-grained features yielded negative results. I also show how to softly constrain corpus-based semantic models of words (“distributional profiles”) to effectively create word-sense-aware models, by using semantic word grouping information found in a manually compiled thesaurus. Previous attempts, using hard constraints and resulting in aggregated, coarse-grained models, yielded lower gains. A novel paraphrase generation technique incorporating these soft semantic constraints is then also evaluated in a SMT system. This paraphrasing technique is based on the Distributional Hypothesis. The main advantage of this novel technique over current “pivoting” techniques for paraphrasing is the independence from parallel texts, which are a limited resource. 
The evaluation is done by augmenting translation models with paraphrase-based translation rules, where fine-grained scoring of paraphrase-based rules yields significantly higher gains. The model augmentation includes a novel semantic reinforcement component: In many cases there are alternative paths of generating a paraphrase-based translation rule. Each of these paths reinforces a dedicated score for the "goodness" of the new translation rule. This augmented score is then used as a soft constraint, in a weighted log-linear feature, letting the translation model learn how much to “trust” the paraphrase-based translation rules. The work reported here is the first to use distributional semantic similarity measures to improve performance of an end-to-end phrase-based SMT system. The unified framework for statistical NLP models with soft linguistic constraints enables, in principle, the combination of both semantic and syntactic constraints -- and potentially other constraints, too -- in a single SMT model.

Dates: Sat, 08/01/2009 - 12:00

This dissertation examines the elliptical structures of (a) sluicing (John called someone, but I don't know who!), (b) fragment answers (A: Who did John call?, B: Mary!), (c) gapping (John is eating ice-cream, and Mary apple pie!), and (d) Right Node Raising (John cooked and Mary ate the apple pie!) in Turkish and gives a PF-deletion-based analysis of all these elliptical structures. As to sluicing and fragment answers, evidence in support of PF-deletion comes from P-(non-)stranding and Case Matching, respectively. Further, these elliptical structures are island-insensitive in Turkish. As to gapping, this study gives a 'movement + deletion' analysis, in which remnants in the second conjunct raise to the left periphery of the second conjunct and the rest of the second conjunct is elided. One striking property of gapping in Turkish is that it is a root phenomenon; in other words, it cannot occur in complement clauses, for instance. As to Right Node Raising, again, a PF-deletion analysis is given: the identical element(s) in the first conjunct is/are elided under identity with (an) element(s) in the second conjunct. The striking property of RNR is that remnants in this elliptical structure may not be clause-mate, in contrast to other elliptical structures, where remnants can be non-clause-mate under very specific contexts. This, I suggest, is due to the fact that PF-deletion in RNR applies at a later derivational stage than in other elliptical structures. In this stage, a syntactic derivation consists of linearized (sub-)lexical forms, where there is no hierarchical representation. This also suggests that a Markovian system exists in grammar. In brief, this thesis looks at different elliptical structures in Turkish, and gives arguments for PF-deletion for all these elliptical structures, which has interesting implications for the generative theory.

Beyond Statistical Learning in the Acquisition of Phrase Structure

A comparison of statistical learning of language-like patterns by adults, children, and Simple Recurrent Networks, aimed at discovering how the range of possible human grammars might be constrained by innate linguistic knowledge.

Linguistics

Non-ARHU Contributor(s):

Eri Takahashi

Dates: Sat, 08/01/2009 - 12:00

The notion that children use statistical distributions present in the input to acquire various aspects of linguistic knowledge has received considerable recent attention. But the role of the learner's initial state has been largely ignored in those studies. What remains unclear is the nature of the learner's contribution. At least two possibilities exist. One is that all that learners do is collect and compile accurately predictive statistics from the data, and they do not have an antecedently specified set of possible structures (Elman, et al. 1996; Tomasello 2000). On this view, the outcome of learning is based solely on the observed input distributions. A second possibility is that learners use statistics to identify particular abstract syntactic representations (Miller & Chomsky 1963; Pinker 1984; Yang 2006). On this view, children have predetermined linguistic knowledge of possible structures, and the acquired representations have deductive consequences beyond what can be derived from the observed statistical distributions alone. This dissertation examines how the environment interacts with the structure of the learner, and proposes a link between the distributional approach and the nativist approach to language acquisition. To investigate this more general question, we focus on how infants, adults and neural networks acquire the phrase structure of their target language. This dissertation presents seven experiments, which show that adults and infants can project their generalizations to novel structures, while the Simple Recurrent Network fails. Moreover, it will be shown that learners' generalizations go beyond the stimuli, but those generalizations are constrained in the same ways that natural languages are constrained. This is compatible with the view that statistical learning interacts with an inherent representational system, but incompatible with the view that statistical learning is the sole mechanism by which the existence of phrase structure is discovered.
This provides novel evidence that statistical learning interacts with innate constraints on possible representations, and that learners have a deductive power that goes beyond the input data. This suggests that statistical learning is used merely as a method for mapping the surface string to an abstract representation, while innate knowledge specifies the range of possible grammars and structures.

Dates: Wed, 07/01/2009 - 12:00

This dissertation explores the hypothesis that predictive processing—the access and construction of internal representations in advance of the external input that supports them—plays a central role in language comprehension. Linguistic input is frequently noisy, variable, and rapid, but it is also subject to numerous constraints. Predictive processing could be a particularly useful approach in language comprehension, as predictions based on the constraints imposed by the prior context could allow computation to be speeded and noisy input to be disambiguated. Decades of previous research have demonstrated that the broader sentence context has an effect on how new input is processed, but less progress has been made in determining the mechanisms underlying such contextual effects. This dissertation is aimed at advancing this second goal, by using both behavioral and neurophysiological methods to motivate predictive or top-down interpretations of contextual effects and to test particular hypotheses about the nature of the predictive mechanisms in question. The first part of the dissertation focuses on the lexical-semantic predictions made possible by word and sentence contexts. MEG and fMRI experiments, in conjunction with a meta-analysis of the previous neuroimaging literature, support the claim that an ERP effect classically observed in response to contextual manipulations—the N400 effect—reflects facilitation in processing due to lexical-semantic predictions, and that these predictions are realized at least in part through top-down changes in activity in left posterior middle temporal cortex, the cortical region thought to represent lexical-semantic information in long-term memory. The second part of the dissertation focuses on syntactic predictions.
ERP and reaction time data suggest that the syntactic requirements of the prior context impact processing of the current input very early, and that predicting the syntactic position in which the requirements can be fulfilled may allow the processor to avoid a retrieval mechanism that is prone to similarity-based interference errors. In sum, the results described here are consistent with the hypothesis that a significant amount of language comprehension takes place in advance of the external input, and suggest future avenues of investigation towards understanding the mechanisms that make this possible.

Negative Concord and (Multiple) Agree: A Case Study of West Flemish

A conservative, Agree-based account of negative concord in West Flemish.

A lexical basis for N400 context effects: Evidence from MEG

Within-subject MEG studies on the topography of N400 effects suggest that such effects reflect facilitated access to lexical information, and not difficulty integrating a word with its semantic context.

Themes, cumulativity, and resultatives: Comments on Kratzer 2003

Alexander Williams argues against Kratzer's claim that direct objects do not in general bind a general thematic relation.

On the Interaction of Aspect and Modal Auxiliaries

"Sam was able to eat a dozen eggs" may imply that Sam did eat a dozen eggs. Such implications arise, argues Valentine Hacquard, from interactions between the modal predicate and perfective or imperfective aspect.

When Domain General Learning Succeeds and When it Fails

Learning how to interpret anaphoric "one" requires domain-specific mechanisms.

On The Way To Linguistic Representation: Neuromagnetic Evidence of Early Auditory Abstraction in the Perception of Speech and Pitch

Already within 25-40 ms of an acoustic speech stimulus, brain cortex is calculating abstract representations of the signal that are sensitive to phonological constraints.

Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

A novel application of fine-grained linguistic soft constraints, syntactic and semantic, to statistical NLP models, evaluated in end-to-end state-of-the-art statistical machine translation (SMT) systems.

Dimensions of Ellipsis: Investigations in Turkish

A PF deletion account of Sluicing, Gapping, Right Node Raising and fragment answers in Turkish.

Beyond Statistical Learning in the Acquisition of Phrase Structure

A comparison of statistical learning of language-like patterns by adults, children, and Simple Recurrent Networks, aimed at discovering how the range of possible human grammars might be constrained by innate linguistic knowledge.

The Predictive Nature of Language Comprehension

Data from fMRI, MEG and EEG show that predictive processing plays a central role in language comprehension, for instance by facilitating lexical access, as indexed by N400 effects in ERP.
