888 - Cassidy Henry / Towards a register-sensitive model of human-computer dialogue

On Thursday, January 16, at noon, Cassidy Henry defends their 888, "'Who do you think you’re talking to?!' Towards a Register-Sensitive Model of Human-Computer Dialogue," to a committee comprising Bill Idsardi, Naomi Feldman, Kate Mooney and alum Alayo Tripp *19, Assistant Professor at the University of Florida. The 888, abstracted below, develops the proposal that "contrasting speech directed to humans and to machines provides an excellent opportunity to discern how perception of interlocutor category membership impacts communicative behavior."
A new interlocutor has become ubiquitous in society – the machine. However, little is known about how humans specifically modulate their speech behavior towards machines. Akin to other registers of speech directedness, prosodic features may play a crucial role in discrimination of this register.
To demonstrate the role of prosody in distinguishing machine-directed speech from other registers, American English-speaking, U.S.-born participants (N=80) were recruited through the Prolific platform. Each performed a perceptual task, labeling either unmodified speech samples (standard condition) or low-pass filtered speech (low-pass filter condition). In low-pass filtered speech, the cues needed to identify lexical content are unavailable, so this condition isolates the influence of prosody. Participants were presented with machine-directed speech samples alongside two registers of speech known to be perceptually distinguishable: adult-human-directed speech and infant-directed speech (Matsuda et al. 2011).
Participants were able to reliably discriminate in all conditions and for all registers, providing two kinds of confirmatory evidence that prosodic behavior guides linguistic awareness pre-lexically.
To create a resource facilitating further study of the variation between machine-directed and adult-directed speech, participant consent was obtained to release production data publicly. Employing an imagined-addressee paradigm similar to that of Cohn et al. 2024, each participant was asked to produce speech directed to one of three addressee types: a friend/loved one, a stranger, and a machine. This parallel corpus will support future work to create a parallelized benchmark of human abilities, serve as the basis of analysis in computational modeling studies of prosodic register, and become a public speech corpus resource upon completion of the project.