This morning I focused for the first time on the fact that my little random word program -- the database that suggests Koa roots in need of meanings -- is suggesting roots containing /c/ a disproportionate amount of the time. This in itself wasn't surprising: /c/ only returned to active use about 15 months ago, so it would make sense that more roots containing it would be available. It made me wonder, though, just how much variation there is in consonant phoneme frequency in Koa. I ran some numbers...
On the other end of things, /k/ and /l/ are significantly overrepresented at nearly 150% of average! ...Which also kind of makes sense because they're also favorites of mine.
I guess it just hadn't occurred to me that my own personal aesthetics would have figured so prominently in root choice with respect to phoneme frequency! I must have expected that each consonant would appear approximately equally, as odd as that would have been cross-linguistically?
That raises a really interesting point, though, which I also had never considered: the particular character of Koa as it has always existed manifests these frequency biases. Like any language, the phonemes are represented unequally, and that gives it an important part of its unique phonological character. As such, moving towards greater uniformity -- as my random picker would automatically tend to do -- would, over time, actually alter the feel of Koa.
And if I like the phonological aesthetics as they've been up to this point -- which it turns out I do -- I may actually not want to continue generating words this way! I'm not sure yet exactly how I'll do this, but what we really want is for the randomness to be weighted -- towards words with Koa's favorite phonemes, and away from words with those it prefers less -- such that a random sample of suggested words would tend to show the same frequency distribution as the language as a whole.
I almost wonder if I should go back to an earlier version of the file, run these numbers again, and use those statistics; the program potentially had a noticeable impact on the frequencies with those 200+ words in the past couple months. Though...on the other hand I was still vetting the choices so my aesthetics were still probably in force, even if being nudged. I could figure out the statistics of the recent additions on their own just to be sure.
Anyway this is certainly an interesting little surprise for me to ponder.
No comments:
Post a Comment