Updated February 2025
Temporal feedback perturbation (TFP) studies show that speakers can learn compensatory responses when the timing of the perturbation is predictable, and that phonological factors such as stress and segment identity can modulate the response. This study investigates whether similar response patterns occur when the timing of TFP relative to an utterance cannot be anticipated, which we refer to as unpredictable temporal feedback perturbation (UTFP).
- Fengyue Zhao, Sam Tilsen. Syllable Position Prominence in Unsupervised Neural Network Segment Categorization. LabPhon 19. June 27 - 29 2024. Hanyang University, Seoul, South Korea. Motivation: Infants initially discriminate most sound contrasts but quickly attune to those of their native language. This raises the question: how do infants identify the relevant acoustic dimensions for learning phonetic categories? The distributional learning account proposes that infants track the distributions of sounds and identify acoustic dimensions as contrastive if their distributions have two or more distinct peaks (i.e. multimodal distributions) [1]. However, while multimodal distributions appear in controlled experiments, they are rarely found in naturalistic, highly variable speech, suggesting that multimodality is not a reliable way to identify contrastive dimensions [2]. Recent work comparing languages with and without vowel length contrasts suggests that even without multimodality, contrastive dimensions show more contextual variability: when a dimension is contrastive, the shape of its distribution varies more across contexts [3]. The distributional learning across contexts hypothesis proposes that infants use this contextual variability to distinguish phonetic categories. This study tests this hypothesis by examining Hong Kong Cantonese tones, exploring whether the ease of acquiring different tonal contrasts is linked to their contextual variability in distribution shape. Cantonese serves as a valuable test case due to the overlapping acoustic distributions among its six tones: high-level (T1), high-rising (T2), mid-level (T3), low-falling (T4), low-rising (T5), and low-level (T6).
- Fengyue Zhao, Sam Tilsen. Syllable Position Prominence in Unsupervised Neural Network Segment Categorization. LabPhon 19. June 27 - 29 2024. Hanyang University, Seoul, South Korea. Motivation: English obstruents exhibit diverse phonetic realizations across syllable positions, like /t/ and /p/ in words such as top and pot [1]. Linguistically, we assume that phone identity (e.g. /p/ vs. /t/) is a strong predictor of representational similarity, while syllable position (e.g. onset vs. coda) is perhaps a secondary factor. But is this always the case? Unsupervised learning in neural networks presents a practical approach for exploring this interplay, because it does not require presuppositions about phonological categories such as segments and syllables. Previous studies [2, 3] have demonstrated the capacity of neural networks to learn abstract representations from acoustic signals. This study employed an unsupervised autoencoder neural network to explore the correlation between phonological categories and network-learned representations. Surprisingly, we found that for consonants, syllable position plays a larger role in representational similarity than phone identity.
(This work was my honors thesis at UMass Amherst, and was advised by Dr. Brian Dillon and Dr. Ming Xiang.) - Fengyue Zhao, Brian Dillon, Ming Xiang. Probabilistic Listener: A Case of Chinese Mandarin Reflexive ziji Ambiguity Resolution. 36th Annual Conference on Human Sentence Processing. March 9 - 11 2023. University of Pittsburgh, Pittsburgh, PA.