Research

Research interests

  • Representation and learning of morphophonological alternations
  • Computational models of phonological learning
  • Phonology of Austronesian languages (Seediq, Rukai, Maori, Samoan, Malagasy)
  • Effect of input and frequency on morphophonological learning
  • Statistical learning and variation

Projects

Paradigms that have neutralizing alternations can be difficult to learn and are prone to reanalysis over time. Understanding the direction and output of reanalysis can further our understanding of the factors that drive phonological learning. There are few quantitative models of reanalysis, and existing ones such as Albright's Minimal Generalization Learner (MGL; 2002; 2003, et seq.) and Nosofsky's Generalized Context Model (GCM; 2011) predict that reanalysis always follows from local statistical distributions within a paradigm. In other words, reanalysis should be in the direction of the most likely alternant.
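
As a rough illustration of this frequency-based prediction (not an implementation of MGL or GCM), the sketch below picks the most frequent alternant from a set of counts. The alternant labels and counts are invented for the example.

```python
def predicted_reanalysis(alternant_counts):
    """Return the most frequent alternant and the relative frequency of each."""
    total = sum(alternant_counts.values())
    probs = {alt: n / total for alt, n in alternant_counts.items()}
    # A purely frequency-driven learner reanalyzes toward the likeliest alternant.
    winner = max(probs, key=probs.get)
    return winner, probs

# Hypothetical counts of stems taking each alternant in one environment.
toy_counts = {"t": 60, "k": 30, "s": 10}
winner, probs = predicted_reanalysis(toy_counts)
print(winner, probs)
```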

In my dissertation work, I look at how paradigms have been restructured across three languages: Malagasy, Samoan, and Maori. In all three languages, I find evidence that reanalysis is sensitive not just to local statistical distributions, but also to markedness effects. I also propose that these markedness effects are directly learned from stem phonotactics. Based on these results, I propose a MaxEnt-based model of reanalysis, in which markedness constraints (derived from stem phonotactics) are biased to have higher weight than competing faithfulness constraints.
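
To make the role of constraint weights concrete, here is a minimal sketch of how a MaxEnt grammar turns weighted constraint violations into candidate probabilities. It is not the dissertation's model: the constraint names, candidates, and weights are all hypothetical, and the weights are set by hand rather than learned.

```python
import math

def maxent_probs(candidates, weights):
    """P(candidate) is proportional to exp(-sum of weight * violations)."""
    scores = {}
    for name, violations in candidates.items():
        penalty = sum(weights[c] * n for c, n in violations.items())
        scores[name] = math.exp(-penalty)
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

# Hypothetical tableau: one candidate violates markedness, the other faithfulness.
candidates = {
    "marked_faithful": {"MARKEDNESS": 1, "FAITH": 0},
    "unmarked_unfaithful": {"MARKEDNESS": 0, "FAITH": 1},
}

# Equal weights: the two candidates are equally likely.
equal = maxent_probs(candidates, {"MARKEDNESS": 1.0, "FAITH": 1.0})
# Markedness weighted above faithfulness: the unmarked candidate is favored.
biased = maxent_probs(candidates, {"MARKEDNESS": 3.0, "FAITH": 1.0})
print(equal, biased)
```

With these hand-picked weights, the markedness-violating candidate loses probability as the markedness weight rises above the faithfulness weight. How that bias is imposed during learning is not shown here.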

Results of this project form the basis of my [dissertation](/publications/2023-phd-dissertation). Other work that came out of this project includes my papers on Malagasy, on OCP-place effects in Samoan, and on hiatus avoidance in Maori.


Traditionally in morphophonemic analysis, surface forms are derived from an underlying representation (UR), which contains all idiosyncratic properties of a morpheme. In this approach, there are no restrictions on how abstract the UR can be (i.e. how much it can deviate from its corresponding surface forms). However, an increasing body of work, such as Albright (2002, et seq.), argues that paradigms must be surface-based, in that the UR must be based on a single slot within the paradigm. In my work on two Formosan languages of Taiwan, Tgdaya Seediq and Maga Rukai, I find evidence in favor of this surface-based approach.

Tgdaya Seediq and Maga Rukai share the trait that all forms of a verb paradigm suffer from some type of neutralization, such that no slot can be used to perfectly predict the rest of the paradigm. Under traditional morphophonemic analysis, this is dealt with by setting up URs that are composite. Composite URs combine information from multiple forms of the paradigm, and therefore do not correspond to any surface allomorph (Kenstowicz and Kisseberth, 1977). If composite URs are learnable, then we would expect them to remain stable over time. However, I find that in both Seediq and Rukai, paradigms have been restructured in a way that results in more concrete URs, in line with a surface-based approach.
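
The idea that no single slot predicts the rest of the paradigm can be sketched with a simple check: a slot is fully predictive of another only if each of its forms maps to exactly one form in the other slot. The forms below are invented toy strings, not Seediq or Rukai data, and the slot names are placeholders.

```python
from collections import defaultdict

def predicts(paradigms, source, target):
    """True if every source-slot form maps to exactly one target-slot form."""
    outcomes = defaultdict(set)
    for forms in paradigms:
        outcomes[forms[source]].add(forms[target])
    return all(len(targets) == 1 for targets in outcomes.values())

# Invented paradigms: each slot neutralizes a different contrast.
toy = [
    {"bare": "pat", "suffixed": "patan"},
    {"bare": "pat", "suffixed": "padan"},  # t/d contrast lost in the bare form
    {"bare": "tuk", "suffixed": "takan"},
    {"bare": "tak", "suffixed": "takan"},  # u/a contrast lost in the suffixed form
]

print(predicts(toy, "bare", "suffixed"))  # False: "pat" has two outcomes
print(predicts(toy, "suffixed", "bare"))  # False: "takan" has two outcomes
```

In a toy system like this, only a representation that pools information from both slots would recover every form, which is the situation composite URs are designed for.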

My work on Seediq formed the basis of my MA thesis; a follow-up study, which incorporates wug-test results with Seediq speakers, was published in Phonological Data and Analysis.

My work on Maga Rukai will be presented at SEALS (Southeast Asian Linguistics), and is available as a manuscript in progress.


In joint work with Emily Grabowski (UC Berkeley), we are comparing different clustering algorithms in their ability to characterize vowel spaces. In descriptive phonetics, clustering algorithms are often used to evaluate the dispersion of vowel categories. The k-means algorithm is most commonly used for this purpose but has major limitations: i) it requires a specified number of clusters, ii) it assumes that clusters are spheres of similar size (Han et al. 2011), and iii) it can be skewed by outliers. OPTICS, by contrast, identifies clusters from areas of high density in the data (Ankerst et al. 1999), whereas k-means partitions all points in the data into k clusters. As a result, OPTICS avoids many of the pitfalls of k-means: it determines the number of clusters from the data, can learn clusters of varying shape and density, and separates prototypical members of a cluster from outliers.
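
The outlier problem can be illustrated with a small, self-contained sketch. It runs a plain Lloyd's k-means on invented (F1, F2) values and compares the result with a simple neighbor-count filter. The filter is not OPTICS; it only shares the idea of treating points in low-density regions as noise. The data, radius, and neighbor threshold are all made up for the example.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's k-means from explicit starting centroids."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Every point, outliers included, is assigned to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

def dense_points(points, radius, min_neighbors):
    """Keep points with at least min_neighbors other points within radius."""
    kept = []
    for i, p in enumerate(points):
        n = sum(1 for j, q in enumerate(points)
                if j != i and math.dist(p, q) <= radius)
        if n >= min_neighbors:
            kept.append(p)
    return kept

# Invented (F1, F2) tokens in Hz: a high front cluster, a low cluster, one outlier.
points = [
    (300, 2300), (310, 2250), (290, 2350), (305, 2300),
    (750, 1200), (740, 1250), (760, 1150), (745, 1200),
    (1200, 1200),  # outlier
]

labels, centroids = kmeans(points, [points[0], points[4]])
core = dense_points(points, radius=150, min_neighbors=2)
low_core = [p for p in core if p[0] > 500]
core_low_f1 = sum(p[0] for p in low_core) / len(low_core)

print(centroids[1][0])  # mean F1 of the low cluster, pulled up by the outlier
print(core_low_f1)      # mean F1 of the low cluster once the outlier is dropped
```

In this toy case k-means absorbs the outlier into the low-vowel cluster and shifts its center, while the density filter leaves it unassigned. For real data, library implementations such as scikit-learn's `KMeans` and `OPTICS` would be the more natural choice.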

In a preliminary study, we compared the vowel clusters found by k-means and OPTICS in two datasets of English monophthongs: ‘lab’ speech (Hillenbrand et al. 1995) and ‘corpus’ speech (Buckeye Corpus; Pitt et al. 2005).

Results show that k-means partitions the vowel space into roughly evenly sized clusters, which results in inaccurate boundaries between certain vowels. OPTICS, on the other hand, identifies accurate vowel centers but does not attempt to categorize every observation. Because of these different approaches to clustering, OPTICS seems better suited to illustrating the core vowel space and minimizing the effect of outliers.


I am working with James Stanford at Dartmouth College and Cathryn Yang at Yunnan Minzu University on sociolinguistic variation in Na (or Mosuo). Na is a rural language community with ~40,000 people, located mainly around Lugu Lake in Yunnan and Sichuan. The Na practice matrilineal inheritance with lifelong matrilocal residence. As such, variation in Na is of particular theoretical interest for examining Labov's classic gender principles of sound change (e.g., Labov 1966, 2001, Trudgill 1972, Wolfram 1969).

Preliminary results, based on word-list reading from 48 speakers, suggest that, consistent with Labovian principles, Na women are leading sound changes in tone and vowel quality.