Research interests

  • Representation and learning of morphophonological alternations
  • Computational models of phonological learning
  • Structure of paradigms
  • Phonology of Seediq (Atayalic, Austronesian)
  • Phonetics and phonology of Mam vowels


Paradigms that have neutralizing alternations can be difficult to learn and prone to reanalysis over time. Understanding the direction and output of reanalysis can further our understanding of the factors that drive phonological learning. There are few quantitative models of reanalysis, and existing ones such as Albright's Minimal Generalization Learner (MGL; 2002, 2003, et seq.) and Nosofsky's Generalized Context Model (GCM; 2011) predict that reanalysis is always predictable from statistical distributions. In other words, reanalysis should be in the direction of the most likely alternant.
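The prediction that reanalysis follows the most likely alternant can be illustrated with a minimal sketch. This is not an implementation of the MGL or the GCM; it only shows the shared prediction in its simplest form, as relative frequency over a lexicon. The lexicon, stem labels, and consonants below are invented for illustration.

```python
from collections import Counter

def alternant_probabilities(lexicon):
    """Relative frequency of each alternant among the stems in the lexicon."""
    counts = Counter(alternant for _stem, alternant in lexicon)
    total = sum(counts.values())
    return {alt: n / total for alt, n in counts.items()}

def predicted_reanalysis(lexicon):
    """Alternant that a purely frequency-driven learner would generalize to."""
    probs = alternant_probabilities(lexicon)
    return max(probs, key=probs.get)

# Invented toy lexicon: (stem label, consonant that surfaces under suffixation).
toy_lexicon = [
    ("stem1", "t"), ("stem2", "t"), ("stem3", "t"),
    ("stem4", "r"), ("stem5", "f"),
]

probs = alternant_probabilities(toy_lexicon)
winner = predicted_reanalysis(toy_lexicon)
print(probs, winner)
```

Under this baseline, stems that historically took the minority alternants would be expected to shift toward the majority one, and never the reverse.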

In this project, I look at reanalysis in a subset of Malagasy stems, where consonant contrasts are found in the suffixed allomorphs but neutralized in the non-suffixed allomorphs. Evidence of reanalysis was found by comparing historical Proto-Malayo-Polynesian forms to their corresponding modern Malagasy stems. I find evidence that reanalysis is sensitive to both statistical distributions and markedness effects. Based on these results, I propose a MaxEnt model of reanalysis in which markedness constraints are biased to have higher weights than faithfulness constraints.
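As a rough sketch of how a MaxEnt grammar with a markedness bias can be set up: each candidate's probability is proportional to the exponential of its negative weighted violation count, and one common way to encode a bias is a Gaussian prior whose preferred weight (mean) is higher for markedness than for faithfulness constraints. The constraint names, weights, and prior values below are assumptions chosen for illustration, not the values or the exact formulation used in this project.

```python
import math

def maxent_probs(candidates, weights):
    """P(cand) is proportional to exp(-sum_i w_i * violations_i(cand))."""
    harmonies = {
        cand: -sum(weights[c] * v for c, v in viols.items())
        for cand, viols in candidates.items()
    }
    z = sum(math.exp(h) for h in harmonies.values())
    return {cand: math.exp(h) / z for cand, h in harmonies.items()}

def prior_penalty(weights, mu, sigma):
    """Gaussian prior term: sum_i (w_i - mu_i)^2 / (2 * sigma^2)."""
    return sum((weights[c] - mu[c]) ** 2 / (2 * sigma ** 2) for c in weights)

# Hypothetical constraints: one markedness (*M) and one faithfulness (F).
weights = {"*M": 2.0, "F": 1.0}
# Hypothetical candidates and their violation counts.
candidates = {
    "faithful": {"*M": 1, "F": 0},  # keeps the marked structure
    "repaired": {"*M": 0, "F": 1},  # removes it at a faithfulness cost
}
probs = maxent_probs(candidates, weights)

# Assumed bias: the prior pulls *M toward a higher weight than F.
mu = {"*M": 2.0, "F": 0.0}
penalty = prior_penalty(weights, mu, sigma=1.0)
print(probs, penalty)
```

In training, a penalty of this kind would be subtracted from the log-likelihood of the data, so that weights stay near their preferred values unless the data provide evidence to move them.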

Preliminary results were presented at TripleAFLA 2022.

In Tgdaya Seediq, all forms of a verb paradigm suffer from some type of neutralization, such that no slot can be used to perfectly predict the rest of the paradigm. Classic approaches to morphophonological analysis deal with this by setting up URs which combine information from multiple forms of the paradigm (Kenstowicz and Kisseberth, 1977). The alternative single surface-base approach, proposed by Albright (2002, et seq.), argues that URs must be based on a single slot (or surface allomorph) within the paradigm. For my MA, I used a corpus of verbs to show that Seediq verb paradigms have likely undergone restructuring based on the non-suffixed slots of the paradigm, consistent with the single surface-base hypothesis.

To test whether Seediq speakers will productively 'undo' neutralizations of non-suffixed base forms, I utilized a variant of the wug-test (Berko, 1958); stimuli were not nonce words, but rather ‘gapped forms’, or existing words in the Seediq lexicon which are never found in their suffixed forms. This method, though less common than wug-testing, was used out of respect for my Seediq consultants, who for cultural reasons did not want to work with nonce words. Results from 10 participants suggest that speakers are able to productively extend some alternations.

In joint work with Emily Grabowski (UC Berkeley), I am comparing different clustering algorithms in their ability to characterize vowel spaces. In descriptive phonetics, clustering algorithms are often used to evaluate the dispersion of vowel categories. The k-means algorithm is most commonly used for this purpose but has major limitations: i) it requires a specified number of clusters, ii) it assumes that clusters are spheres of similar size (Han et al. 2011), and iii) it can be skewed by outliers. OPTICS, in contrast, is an algorithm that identifies clusters from areas of high density in the data, rather than partitioning all points into k clusters as k-means does (Ankerst et al. 1999). As a result, OPTICS avoids many of the pitfalls of k-means: it organically assigns the number of clusters, can learn clusters of varying shape and density, and separates prototypical members of a cluster from outliers.
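Two of the k-means limitations listed above (the number of clusters must be fixed in advance, and outliers pull on the cluster centers) can be seen in a minimal pure-Python sketch of the standard Lloyd's algorithm. The (F1, F2) values are invented toy numbers, not measurements from any dataset in this project, and the starting centroids are chosen by hand to keep the example deterministic.

```python
def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's algorithm on 2-D points; len(centroids) fixes k in advance."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Invented (F1, F2) tokens in Hz for two hypothetical vowel categories.
high_front = [(300, 2300), (310, 2250), (290, 2350)]
low_central = [(700, 1200), (710, 1250), (690, 1150)]
outlier = (320, 3300)  # a single stray token, e.g. a formant-tracking error

start = [(300, 2300), (700, 1200)]  # k = 2 must be chosen by the analyst
clean_centroids, _ = kmeans(high_front + low_central, start)
skewed_centroids, _ = kmeans(high_front + low_central + [outlier], start)
print(clean_centroids[0], skewed_centroids[0])
```

Because every point must belong to some cluster, the stray token is absorbed into the nearest category and shifts its center. A density-based method such as OPTICS can instead leave such a point unassigned; in scikit-learn, for instance, `sklearn.cluster.OPTICS` labels noise points as -1.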

In a preliminary study, we compared the vowel clusters found by k-means and OPTICS in two datasets of English monophthongs: one of ‘lab’ speech (Hillenbrand et al. 1995) and one of ‘corpus’ speech (Buckeye Corpus: Pitt et al. 2005).

Results show that k-means partitions the vowel space into roughly evenly sized clusters, which results in inaccurate boundaries between certain vowels. OPTICS, on the other hand, identifies accurate vowel centers but doesn’t attempt to categorize every observation. Because of these different approaches to clustering, OPTICS seems to be better suited to illustrating the core vowel space and minimizing the effect of outliers.

I am working with James Stanford at Dartmouth College and Cathryn Yang at Yunnan Minzu University on sociolinguistic variation in Na (or Mosuo). Na is a rural language community with ~40,000 people, located mainly around Lugu Lake in Yunnan and Sichuan. The Na practice matrilineal inheritance with lifelong matrilocal residence. As such, variation in Na is of particular theoretical interest for examining Labov's classic gender principles of sound change (e.g., Labov 1966, 2001, Trudgill 1972, Wolfram 1969).

Preliminary results, based on word-list reading from 48 speakers, suggest that, consistent with Labovian principles, Na women are leading sound changes in tone and vowel quality.