The corpus I mainly work on, the Oxford Corpus of Old Japanese (OCOJ), is tagged in XML, following the conventions of the Text Encoding Initiative (TEI). In the first stage, texts were romanized, with each form marked as logographically or phonographically written; they were then tagged for morphological and syntactic information. This allows us, for example, to search for a particular lexical item in any syntactic environment and include only those attestations which were recorded phonographically.
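To make that kind of search concrete, here is a minimal sketch in Python. The element and attribute names (`w`, `phr`, `lemma`, `script`, `inflection`, `function`) and the sample sentences are my own simplified stand-ins for illustration, not the actual OCOJ markup, which may well be organized differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup invented for this example;
# the real OCOJ tag and attribute names may differ.
SAMPLE = """
<text>
  <s n="1">
    <phr function="object"><w lemma="pito" script="phon">pito</w></phr>
    <w lemma="mi-" script="phon" inflection="stem">mi</w>
  </s>
  <s n="2">
    <w lemma="mi-" script="logo" inflection="conclusive">miru</w>
  </s>
  <s n="3">
    <phr function="adnominal"><w lemma="mi-" script="phon" inflection="adnominal">miru</w></phr>
  </s>
</text>
"""

def find_phonographic(root, lemma):
    """Return every <w> with the given lemma that is marked as phonographic.

    root.iter("w") walks the whole tree, so the word is found no matter
    which phrase or clause it is nested in.
    """
    return [
        w for w in root.iter("w")
        if w.get("lemma") == lemma and w.get("script") == "phon"
    ]

root = ET.fromstring(SAMPLE)
hits = find_phonographic(root, "mi-")
forms = [w.text for w in hits]
print(forms)
```

The logographic attestation in the second sentence is skipped, while the one nested inside a phrase in the third sentence is still found.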
In an earlier post, I noted that the verb mi- ‘see’ was the most frequently attested word; it’s attested 1358 times in a smallish corpus of only 111,000 words. But what does that really tell us?
In addition to marking up texts, one of my projects was to create a Lexicon which contained various information about each lexical item. The online version of the lexicon has links to various stored searches. Here’s the Lexicon entry for mi-:
The link for statistics gives the following information:
The most common inflection, ‘stem’, tells us that this verb is usually followed by either an auxiliary or another verb. We can get a sense of that by clicking the attestations link from the Lexicon entry. (I’m not going to present that here, as it’s rather long.)
The collocations link shows nouns that head noun phrases marked as subjects or objects of the verb, as well as nouns that are modified by the verb in a noun-modifying construction. I’ve shortened it to just show nouns attested at least 10 times.
Looking at a list like this, the first thing I think is that I should also have automated the definitions for the nouns, but I’ll do that another day.
Except for ime ‘dream’, the first 7 nouns refer to people (including me ‘eye’, which is used metaphorically with mi- to refer to the person you [want to] see). So this verb occurs more often in the OCOJ with humans than with inanimate objects.
This gives us more of an idea of how mi- is used than looking just at its frequency.
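For readers curious about the mechanics, shortening a collocation list to nouns above a frequency threshold can be sketched roughly as follows. This is not the code behind the Lexicon's stored searches; the function name and the toy pairs are invented for illustration, and the counts are not real OCOJ figures.

```python
from collections import Counter

def frequent_collocates(pairs, min_count):
    """Count how often each noun co-occurs with the verb and keep
    only nouns attested at least min_count times, most frequent first."""
    counts = Counter(noun for noun, _relation in pairs)
    return [(noun, n) for noun, n in counts.most_common() if n >= min_count]

# Invented toy data: (noun, relation to the verb). Not real corpus counts.
toy_pairs = (
    [("pito", "subject")] * 3
    + [("ime", "object")] * 2
    + [("me", "object")]
)

# The post uses a threshold of 10; the toy data is small, so 2 is used here.
result = frequent_collocates(toy_pairs, min_count=2)
print(result)
```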
Not that collocations show the whole picture either. The verb abur- ‘broil’ occurs only twice in the OCOJ, both times with pito ‘person’, referring to the person doing the broiling, not a person being broiled.