Back in 2023 I realized with excitement that I was in a position to calculate some statistics on the origins of the Koa lexicon. Since then some additional forensic techniques have emerged, and I can now lay out a significantly clearer and more detailed picture, including previously unexamined data on the patterns and subjects of my inspiration and productivity. I have to say I'm endlessly fascinated by this kind of intensely meta navel-gazing, and by the autobiographical insight it can provide (assuming I can figure out how to interpret the data).
In the domain of etymology, first of all, I now have data on all but 50 of my 1066 roots, and year-of-creation data for all but one of them. Below are the percentages of my predicate roots by source; origins with fewer than 10 tokens apiece (5% of the total) have been grouped into "Other," representing a wide range of both natural (Basque, Latin, Malay, Nahuatl, Quechua, Swahili, Turkish and many others) and artificial languages (Doraja, Esperanto, Quenya, Seadi, etc.):
Some notes on the categories...
Random words (30%) were created with the help of my vocab tracker and random-word selector, which ensures there is no unintended homophony among roots and offers random suggestions for needed words, weighted by how preferable their structure is. I should clarify that I never make these choices without considerable thought and review: my program makes a suggestion, but the aesthetics have to feel right before I'll accept it.
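For the curious, here is a minimal sketch of how a selector like this might work, in Python. It is not the program described above: the consonant and vowel lists (just letters from example words in this post, not Koa's actual inventory), the CVCV root template, and the `preferability` scoring rules are all hypothetical stand-ins. Homophony is avoided by excluding any candidate that already exists as a root, and suggestions are drawn at random with probability proportional to each candidate's weight.

```python
import random

# HYPOTHETICAL inventory and template: letters taken from example words in
# this post, not a description of Koa's real phonology.
CONSONANTS = ["p", "t", "k", "m", "n", "s", "l", "c"]
VOWELS = ["a", "e", "i", "o", "u"]


def candidate_roots():
    """Every CVCV shape the hypothetical inventory allows."""
    return [c1 + v1 + c2 + v2
            for c1 in CONSONANTS for v1 in VOWELS
            for c2 in CONSONANTS for v2 in VOWELS]


def preferability(root):
    """Hypothetical weight: lower for structures assumed to be less pleasing."""
    weight = 1.0
    if root[:2] == root[2:]:   # reduplicated syllable, e.g. "papa"
        weight *= 0.2
    if root[0] == root[2]:     # repeated consonant, e.g. "pipa"
        weight *= 0.5
    return weight


def suggest(existing, k=5, seed=None):
    """Return up to k distinct new roots, weighted by preferability.

    Any form already in `existing` is excluded, so a suggestion can never
    duplicate an existing root.
    """
    rng = random.Random(seed)
    pool = [r for r in candidate_roots() if r not in existing]
    picks = []
    while pool and len(picks) < k:
        weights = [preferability(r) for r in pool]
        choice = rng.choices(pool, weights=weights, k=1)[0]
        picks.append(choice)
        pool.remove(choice)  # sample without replacement
    return picks


existing_roots = {"lapa", "soto", "kene", "meno"}
print(suggest(existing_roots, k=5, seed=2023))
```

The sketch only produces candidates; as described above, the final aesthetic judgment stays manual. Exact-match exclusion also catches only identical forms, so a fuller check might compare near-homophones too.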
Internal coinages (7%) were formed via clusters of existing vocabulary, mainly (but not exclusively) particles:
to "that [adjectival]" + a "[indefinite]" > toa "that [pronominal]"
me "with" + no "without" > meno "regardless, anyway"
ke "which" + ne "in, on, at" > kene "location"
ai "[question marker]" + saa "receive, be allowed" > aisa "pardon, excuse me, may I"
Polynesian words (11%) include Hawaiian, Samoan, Tahitian, Tongan and Maori. I think I originally expected this category to be much larger given Koa's nearly Polynesian phonology, but it's turned out that Finnish words tend to be more amenable to the constraints of Koa root structure.
Finnish loans (26%) are the largest non-random category, providing more than a quarter of Koa's core vocabulary. When looking to assign a new root, I almost always check Finnish first -- if it has a suitable word, or one I can easily modify to be suitable, it's generally my first choice. I'm positive that this is a reflection of my own aesthetic biases, but it certainly also helps that the phoneme inventory and syllabic constraints of Finnish are not much more complicated than Koa's, while providing more variety and salience than Polynesian for this purpose.
Family & Friends (1%), lastly, include coinages for Koa by other people in my life! Two former partners are represented here (anu "water" from Amelia, culi "move" and many others from Olga), friends (soto "meditate" from Jonathan), and of course my daughter (lapa "safe"). I actually really love these little lasting artifacts of my loved ones; I wish I'd sought more of these over the years! I've been considering giving other friends and family members the phonology and commissioning coinages for words that have so far been problematic (like "rice," "springtime," "spoon" and "unenvious").
In addition to the origins of Koa words, it's also interesting to look at the timing of their creation, which has been anything but linear.
There were two significant spikes, centered on 2011 and 2023, one with a lead-in from the previous year, the other with a coda into the following. In all, these two spikes and their entourage are responsible for 876 roots, 82% of the total. It appears that word-creation is something of a separate "project" for me: to be taken on when necessary, but otherwise proceeding only at a trickle. Let's see what happens if we compare the rate of word creation with blog post creation...
Blog posts -- which tend to occur alongside significant structural or theoretical work on the language -- have spikes too, though they apparently march to a different drummer than word creation. For one thing, interestingly, their spikes tend to skip years: unlike with words, there are no two consecutive years in which blog posts were written at a higher than average rate. Also, though there may not be enough data to justify attributing causation, it appears that structural work may drive word creation: in both cases of a significant word-creation spike, a period of structural work preceded it, sometimes for several years.
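The "no two consecutive above-average years" observation is easy to check mechanically. Here is a minimal Python sketch, assuming post counts are supplied per calendar year and include years with zero posts (so the average isn't inflated). The function names are hypothetical, and the example counts are toy numbers for illustration, not the blog's actual history.

```python
def above_average_years(counts):
    """Years whose post count exceeds the mean over all years in `counts`.

    `counts` maps year -> number of posts; include zero-post years so they
    pull the mean down as they should.
    """
    mean = sum(counts.values()) / len(counts)
    return sorted(year for year, n in counts.items() if n > mean)


def consecutive_pairs(years):
    """Pairs of back-to-back years in a sorted list of years."""
    return [(a, b) for a, b in zip(years, years[1:]) if b == a + 1]


# Toy numbers for illustration only, not real post counts.
toy_counts = {2010: 5, 2011: 1, 2012: 6, 2013: 2, 2014: 2}
busy = above_average_years(toy_counts)
print(busy, consecutive_pairs(busy))  # → [2010, 2012] []
```

An empty list of pairs corresponds to the "skipping years" pattern; running the same check on per-year root counts would, by the description above, turn up consecutive pairs around the word-creation spikes.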
Thinking about this subjectively, I suppose it ought not to be surprising. Assigning words is probably my least favorite area of conlanging, for two reasons: (1) the stakes feel high, since these choices end up determining much of the aesthetic character of the language, and choices I dislike can affect my interest in the whole language; and (2) it's almost entirely a subjective/creative/artistic process, something I can do but which is much harder for me to access than, say, problem-solving. (This is one reason I enjoy the company of ADHD folks, for whom eliciting that kind of creative vision seems to come rather effortlessly.)
In other words, then, what seems to happen is that my main conlanging inspiration is structural or philosophical -- phonology, morphology, syntax, pragmatics, even sociolinguistics -- and I turn to lexis basically to support my ability to continue developing those areas, or when needed in order to produce or translate a certain kind of text. Fascinating! I really had no conscious sense of this.
I did, however, have a sense that there were certain times of year when I tended to be more active in Koa development than others, and I finally found a way to explore this. Again taking blog publishing rates as a convenient approximate measure of activity, we can see that there are interesting patterns associated with months of the year:
In particular, a post is about twice as likely to get written between November and April (69% of posts) as between May and October, with February by far the most productive month. At the other end of the spectrum, in almost 26 years only four posts have been written in the month of August. My immediate assumption was that this differential was likely weather-related, so I did some additional research...
Using Weather Underground's history features, I put together a grid showing the average monthly temperature for each month in which posts were published, localized to wherever I happened to be at the time.
This bears out my suspicion: again, a post is about twice as likely to be published when the average temperature is below 60°F. But it's hard to get more than that from these data, because the frequency of posts gets entangled with the frequency of the temperature ranges themselves. To counteract this, I redid the chart to show post frequency in each temperature range as a percentage of a baseline in which temperature had no effect on productivity. Here the results are pretty striking:
Below a monthly average temperature of 60°F, posting activity is approximately at or significantly above baseline, reaching a magnificent high of 149% between 50°F and 54°F. On the other hand, as soon as the temperature rises to 60°F, the productivity rate compared to baseline begins to fall rapidly, dropping as low as 48% when the average temperature is at or above 70°F.
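One plausible way to compute a percentage-of-baseline figure like this is: rate = (posts in range / all posts) ÷ (months in range / all months) × 100. Here is a minimal Python sketch of that formula. It assumes you have the average temperature for every month in the period, not only the months with posts, so the baseline reflects how much time was spent in each temperature range. The 5°F bin width matches the 50°F to 54°F range mentioned above, and the function names are hypothetical. A value of 100 means posts fall into that range exactly as often as the share of months spent there would predict.

```python
from collections import Counter


def bin_label(temp_f, width=5):
    """Map a monthly average temperature (°F) to the lower edge of its bin."""
    return int(temp_f // width) * width


def relative_rate(post_temps, month_temps, width=5):
    """Posting rate per temperature bin as a percentage of a no-effect baseline.

    post_temps:  the monthly average temperature for each post (one per post)
    month_temps: the average temperature of every month in the period
    """
    posts = Counter(bin_label(t, width) for t in post_temps)
    months = Counter(bin_label(t, width) for t in month_temps)
    total_posts = sum(posts.values())
    total_months = sum(months.values())
    rates = {}
    # Iterate over month bins; every post's month is assumed to be among them.
    for b, n_months in months.items():
        expected_share = n_months / total_months   # share of time in this bin
        observed_share = posts.get(b, 0) / total_posts
        rates[b] = 100 * observed_share / expected_share
    return dict(sorted(rates.items()))
```

Dividing by the share of months is what separates "I post more when it's cool" from "it's simply cool more often," which is exactly the entanglement described above.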
The interesting conclusion is that I clearly feel more inspired or productive when the weather is cooler: approximately the range of temperatures in which I tend to wear a hat and socks indoors and keep a space heater running in my room. When the temperature is warmer I tend to spend more time outside doing other activities, for one thing, but beyond that I think my mindset changes somewhat: it's not just how I spend my time that shifts, but how I'm inspired to spend it.
Another thing I thought was interesting is that it seems I'm significantly more likely to feel like working on Koa when the temperature is cool but not too cold. This surprised me: I had assumed there was just a "warm weather" mode and a "cold weather" mode, but it appears that outdoor temperatures below a certain point diminish inspiration or motivation in some way.
In the end I worry whether all this may just be cheerful but ultimately frivolous data-crunching...but since I've devoted so much time, thought and effort to this project over so many years of my life, I find it rather wonderful and meaningful to get to look at the character of the work itself. When I feel inspired, that fire tends to seize me and I don't really notice what I'm doing; it turns out that there is some kind of internal form and science, not just to the product but to the inspiration itself. It feels kind of moving, somehow, and makes me want to take more pride, and have more faith, in my particular curious artistic process.