Research

Research at our top-ranked department spans syntax, semantics, phonology, language acquisition, computational linguistics, psycholinguistics and neurolinguistics. 

Connections between our core competencies are strong, with theoretical, experimental and computational work typically pursued in tandem.

A network of collaboration at all levels sustains a research climate that is both vigorous and friendly. Here new ideas develop in conversation, stimulated by the steady activity of our labs and research groups, frequent student meetings with faculty, regular talks by local and invited scholars and collaborations with the broader University of Maryland language science community, the largest and most integrated language science research community in North America.

Is Automated Topic Model Evaluation Broken? The Incoherence of Coherence

Questioning automatic coherence evaluations for neural topic models.

Linguistics

Contributor(s): Philip Resnik
Non-ARHU Contributor(s):

Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber

Dates:

Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Recent models relying on neural components surpass classical topic models according to these metrics. At the same time, unlike classical models, the practice of neural topic model evaluation suffers from a validation gap: automatic coherence for neural models has not been validated using human experimentation. In addition, as we show via a meta-analysis of topic modeling literature, there is a substantial standardization gap in the use of automated topic modeling benchmarks. We address both the standardization gap and the validation gap. Using two of the most widely used topic model evaluation datasets, we assess a dominant classical model and two state-of-the-art neural models in a systematic, clearly documented, reproducible way. We use automatic coherence along with the two most widely accepted human judgment tasks, namely, topic rating and word intrusion. Automated evaluation will declare one model significantly different from another when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments.
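
To make "automated estimates of topic coherence" concrete, here is a minimal sketch of one common co-occurrence-based score, normalized pointwise mutual information (NPMI), averaged over pairs of a topic's top words. It is an illustration only, not the evaluation pipeline used in the paper: the function name `npmi_coherence`, the document-level co-occurrence counting, and the toy corpus are all assumptions made for the example.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    Probabilities are estimated from document-level co-occurrence in a
    reference corpus (an assumption; sliding windows are also common).
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # never co-occur: lowest NPMI by convention
        elif p12 == 1:
            scores.append(1.0)   # co-occur in every document
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical toy reference corpus: each document is a list of tokens.
corpus = [["cat", "dog"], ["cat", "dog"], ["fish", "boat"], ["fish", "boat"]]
coherent = npmi_coherence(["cat", "dog"], corpus)
incoherent = npmi_coherence(["cat", "fish"], corpus)
```

Scores near 1 mean the top words tend to appear together in the reference corpus; scores near -1 mean they rarely or never do. The abstract's point is that differences in scores of this kind do not always line up with human judgments.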

Informativity, topicality, and speech cost: comparing models of speakers’ choices of referring expressions

Is use of a pronoun motivated by topicality or efficiency?

Linguistics

Contributor(s): Naomi Feldman
Non-ARHU Contributor(s):

Naho Orita *15 (Tokyo University of Science)

Dates:

This study formalizes and compares two major hypotheses in speakers’ choices of referring expressions: the topicality model that chooses a form based on the topicality of the referent, and the rational model that chooses a form based on the informativity of the form and its speech cost. Simulations suggest that both the topicality of the referent and the informativity of the word are important to consider in speakers’ choices of reference forms, while a speech cost metric that prefers shorter forms may not be.

Linguistic meanings as cognitive instructions

"More" and "most" do not encode the same sorts of comparison.

Linguistics

Contributor(s): Tyler Knowlton, Paul Pietroski, Jeffrey Lidz
Non-ARHU Contributor(s):

Tim Hunter *10 (UCLA), Alexis Wellwood *14 (USC), Darko Odic (University of British Columbia), Justin Halberda (Johns Hopkins University)

Dates:

Natural languages like English connect pronunciations with meanings. Linguistic pronunciations can be described in ways that relate them to our motor system (e.g., to the movement of our lips and tongue). But how do linguistic meanings relate to our nonlinguistic cognitive systems? As a case study, we defend an explicit proposal about the meaning of "most" by comparing it to the closely related "more": whereas "more" expresses a comparison between two independent subsets, "most" expresses a subset–superset comparison. Six experiments with adults and children demonstrate that these subtle differences between their meanings influence how participants organize and interrogate their visual world. In otherwise identical situations, changing the word from "most" to "more" affects preferences for picture–sentence matching (experiments 1–2), scene creation (experiments 3–4), memory for visual features (experiment 5), and accuracy on speeded truth judgments (experiment 6). These effects support the idea that the meanings of "more" and "most" are mental representations that provide detailed instructions to conceptual systems.
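
As a rough illustration of the contrast described above, and not the authors' own formalization, the two comparisons can be written as set operations. The function names and the colored-dot example are hypothetical; the sketch only assumes that the first kind of comparison looks at two independent sets, while the second compares a subset with the superset that contains it.

```python
def more(a, b):
    """Comparison between two independent sets: is A larger than B?"""
    return len(a) > len(b)


def most(superset, b):
    """Subset-superset comparison: does the part of the superset that is
    also in B outnumber the rest of the superset?"""
    subset = superset & b
    return len(subset) > len(superset) - len(subset)


# Hypothetical scene: six dots, identified by number and grouped by color.
blue = {1, 2, 3}
yellow = {4, 5}
red = {6}
dots = blue | yellow | red

more_blue_than_yellow = more(blue, yellow)  # compares blue with yellow only
most_dots_are_blue = most(dots, blue)       # compares blue dots with all dots
```

In this scene the two comparisons come apart: there are more blue dots than yellow dots, yet blue dots do not outnumber the remaining dots once the red one is counted.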

Social inference may guide early lexical learning

Assessment of knowledgeability and group membership influences infant word learning.

Linguistics

Contributor(s): Naomi Feldman, William Idsardi
Non-ARHU Contributor(s):

Alayo Tripp *19

Dates:

We incorporate social reasoning about groups of informants into a model of word learning, and show that the model accounts for infant looking behavior in tasks of both word learning and recognition. Simulation 1 models an experiment where 16-month-old infants saw familiar objects labeled either correctly or incorrectly, by either adults or audio talkers. Simulation 2 reinterprets puzzling data from the Switch task, an audiovisual habituation procedure wherein infants are tested on familiarized associations between novel objects and labels. Eight-month-olds outperform 14-month-olds on the Switch task when required to distinguish labels that are minimal pairs (e.g., “buk” and “puk”), but 14-month-olds' performance is improved by habituation stimuli featuring multiple talkers. Our modeling results support the hypothesis that beliefs about knowledgeability and group membership guide infant looking behavior in both tasks. These results show that social and linguistic development interact in non-trivial ways, and that social categorization findings in developmental psychology could have substantial implications for understanding linguistic development in realistic settings where talkers vary according to observable features correlated with social groupings, including linguistic, ethnic, and gendered groups.

Japanese children's knowledge of the locality of "zibun" and "kare"

Initial errors in the acquisition of the Japanese local- or long-distance anaphor "zibun."

Linguistics

Contributor(s): Jeffrey Lidz, Naomi Feldman
Non-ARHU Contributor(s):

Naho Orita *15, Hajime Ono *06

Dates:

Although the Japanese reflexive zibun can be bound both locally and across clause boundaries, the third-person pronoun kare cannot take a local antecedent. These are properties that children need to learn about their language, but we show that the direct evidence of the binding possibilities of zibun is sparse and the evidence of kare is absent in speech to children, leading us to ask about children’s knowledge. We show that children, unlike adults, incorrectly reject the long-distance antecedent for zibun, and while being able to access this antecedent for the non-local pronoun kare, they consistently reject the local antecedent for this pronoun. These results suggest that children’s lack of matrix readings for zibun is due not to their understanding of discourse context but to the properties of their language understanding.

Debate Reaction Ideal Points: Political Ideology Measurement Using Real-Time Reaction Data

Estimating an individual's ideology from their real-time reactions to presidential debates.

Linguistics

Contributor(s): Philip Resnik
Non-ARHU Contributor(s):

Daniel Argyle, Lisa P. Argyle, Vlad Eidelman

Dates:

Ideal point models have become a powerful tool for defining and measuring the ideology of many kinds of political actors, including legislators, judges, campaign donors, and members of the general public. We extend the application of ideal point models to the public using a novel data source: real-time reactions to statements by candidates in the 2012 presidential debates. Using these reactions as inputs to an ideal point model, we estimate individual-level ideology and evaluate the quality of the measure. Debate reaction ideal points provide a method for estimating a continuous, individual-level measure of ideology that avoids survey response biases, provides better estimates for moderates and the politically unengaged, and reflects the content of salient political discourse relevant to viewers’ attitudes and vote choices. As expected, we find that debate reaction ideal points are more extreme among respondents who strongly identify with a political party, but retain substantial within-party variation. Ideal points are also more extreme among respondents who are more politically interested. Using topical subsets of the debate statements, we find that ideal points in the sample are more moderate for foreign policy than for economic or domestic policy.

A direct comparison of theory-driven and machine learning prediction of suicide: A meta-analysis

Machine learning models are better than models driven by psychological theories in predicting suicidal ideation and suicide attempts.

Linguistics

Contributor(s): Philip Resnik
Non-ARHU Contributor(s):

Katherine M. Schafer, Grace Kennedy, Austin Gallyer

Dates:

Theoretically-driven models of suicide have long guided suicidology; however, an approach employing machine learning models has recently emerged in the field. Some have suggested that machine learning models yield improved prediction as compared to theoretical approaches, but to date, this has not been investigated in a systematic manner. The present work directly compares widely researched theories of suicide (i.e., BioSocial, Biological, Ideation-to-Action, and Hopelessness Theories) to machine learning models, comparing the accuracy between the two differing approaches. We conducted literature searches using PubMed, PsycINFO, and Google Scholar, gathering effect sizes from theoretically-relevant constructs and machine learning models. Eligible studies were longitudinal research articles that predicted suicide ideation, attempts, or death published prior to May 1, 2020. 124 studies met inclusion criteria, corresponding to 330 effect sizes. Theoretically-driven models demonstrated suboptimal prediction of ideation (wOR = 2.87; 95% CI, 2.65–3.09; k = 87), attempts (wOR = 1.43; 95% CI, 1.34–1.51; k = 98), and death (wOR = 1.08; 95% CI, 1.01–1.15; k = 78). Generally, Ideation-to-Action (wOR = 2.41; 95% CI, 2.21–2.64; k = 60) outperformed Hopelessness (wOR = 1.83; 95% CI, 1.71–1.96; k = 98), Biological (wOR = 1.04; 95% CI, 0.97–1.11; k = 100), and BioSocial (wOR = 1.32; 95% CI, 1.11–1.58; k = 6) theories. Machine learning provided superior prediction of ideation (wOR = 13.84; 95% CI, 11.95–16.03; k = 33), attempts (wOR = 99.01; 95% CI, 68.10–142.54; k = 27), and death (wOR = 17.29; 95% CI, 12.85–23.27; k = 7). Findings from our study indicated that across all theoretically-driven models, prediction of suicide-related outcomes was suboptimal. Notably, among theories of suicide, theories within the Ideation-to-Action framework provided the most accurate prediction of suicide-related outcomes. When compared to theoretically-driven models, machine learning models provided superior prediction of suicide ideation, attempts, and death.

Chain reduction via substitution: Evidence from Mayan

Extraction out of adjuncts in K'ichean languages shows that "overt traces" are possible.

Linguistics

Non-ARHU Contributor(s):

Gesoel Mendes *20, Rodrigo Ranero *21

Dates:
Publisher: Glossa

We argue that deletion is not the only way that chain links created by A′-movement can be affected at PF. Chain links can also be substituted by a morpheme. This substitution delivers a linearizable output (in a manner parallel to deletion), creating overt “traces” on the surface. We demonstrate the virtues of our proposal through the empirical lens of adjunct extraction in two Mayan languages of the K’ichean branch: K’iche’ and Kaqchikel. In these languages, extraction of low adjuncts triggers the appearance of a verbal enclitic wi. The distribution of the enclitic upon long distance extraction shows that it must be analyzed as a surface reflex of substitution of a chain link. Our proposal provides evidence that movement proceeds successive cyclically and has two additional theoretical consequences: (i) C0 must be a phase head (contra den Dikken 2009; 2017), (ii) v0 cannot be a phase head (in line with Keine 2017).

Optional agreement in Santiago Tz'utujil (Mayan) is syntactic

Agreement is optional only for complements, and is conditioned by whether the argument is a DP or a reduced nominal.

Linguistics

Non-ARHU Contributor(s):

Theodore Levin, Paulina Lyskawa *21 and Rodrigo Ranero *21

Dates:
Publisher: Zeitschrift für Sprachwissenschaft

Some Mayan languages display optional verbal agreement with 3pl arguments (Dayley 1985; Henderson 2009; England 2011). Focusing on novel data from Santiago Tz’utujil (ST), we demonstrate that this optionality is not reducible to phonological or morphological factors. Rather, the source of optionality is in the syntax. Specifically, the distinction between arguments generated in the specifier position and arguments generated in the complement position governs the pattern. Only base-complements control agreement optionally; base-specifiers control agreement obligatorily. We provide an analysis in which optional agreement results from the availability of two syntactic representations (DP vs. reduced nominal argument). Thus, while the syntactic operation Agree is deterministic, surface optionality arises when the operation targets two different sized goals.

Proxy Control: A new species of control in grammar

In German and Italian, 'Maria asked Bill to leave early' may be used to mean that Maria sought permission for people she represents. Aaron and Sandhya provide an analysis.

Linguistics

Contributor(s): Aaron Doliana
Non-ARHU Contributor(s):

Sandhya Sundaresan

Dates:
Publisher: Natural Language and Linguistic Theory

The control dependency in grammar is conventionally distinguished into two classes: exhaustive (i → i) and non-exhaustive (i → i + (j)). Here, we show that, in languages like German and Italian, some speakers allow a new kind of “proxy control” which differs from both, such that, for a controller i and a controllee j, j = proxy(i). The proxy function picks out a set of individuals that is discourse-pragmatically related to i. For such speakers, the German/Italian proxy control equivalent of the sentence “Maria_i asked Bill_j (for permission) [PRO_proxy(i) to leave work early]” would thus mean that Maria asked Bill for permission for some salient set of individuals related to herself to leave early. We examine the theoretical and empirical properties of this new control relation in detail, showing that it is irreducible to other, more familiar referential dependencies. Using standard empirical diagnostics, we then illustrate that proxy control can be instantiated both as a species of obligatory control (OC) and non-obligatory control (NOC) in German and Italian and develop a syntactic and semantic model that derives each and details the factors conditioning the choice between the two. We also investigate the factors that condition different degrees of exhaustiveness (exhaustive vs. partial vs. proxy) in control, which then sheds light on why proxy control obtains in some languages, but not others and, within a language, is possible for some speakers but not others.
