Corpus

A reliable dictionary of any language cannot be compiled without objective evidence of that language in use. Supplying this evidence is largely the role of the text corpus, a large and structured set of texts that informs lexicographers in their endless task of compiling and revising dictionaries. Yet such corpora are commonly lacking in the Cree lexicographic tradition, where dictionaries have been compiled largely by eliciting words from fluent Cree speakers, sometimes using a previously compiled dictionary as a prompt. The limitations of this technique are readily observed in most Cree dictionaries, including earlier editions of the Dictionary of Moose Cree. And while citations from Cree texts may have been put to good use in these previous editions, a lack of time and resources, until recently, severely restricted our ability to build and analyze a text corpus representative of the dialect.

Our growing corpus presently contains over 250,000 words of running text, or tokens, and we expect to double its size in the coming years. Texts include transcriptions of stories and conversations by contemporary Cree speakers, as well as examples compiled previously by linguists and anthropologists. The inclusion of modern and historical literature, both translations and original works, balances the spoken-language component of the corpus and expands the range of topics covered.

This corpus provides the objective evidence needed to compile a reliable dictionary of the Moose Cree dialect, documenting not only how it is spoken today but also how it came to be what it is. This is particularly important for a language that is under pressure and losing speakers to a dominant language. Upcoming editions of the dictionary, including its online version, will include thousands of examples drawn directly from this corpus to help speakers understand how words in the language are used.

While a corpus of Moose Cree texts provides evidence for the compilation of our Cree-to-English dictionary, we also depend on a corpus of English texts for our English-to-Cree dictionary. In particular, the English headwords in our dictionary were selected using a frequency list derived from the Corpus of Contemporary American English, which contains more than one billion words. By working to include the more frequently used words in the English language, something about which a text corpus can inform us, we hope to provide a dictionary that is both comprehensive and useful. For the third edition of our dictionary, work began with the first 3,000 words of that frequency list. Words that were unsuitable for Cree translation, for various reasons, were rejected, and the resulting list was then supplemented with culturally relevant vocabulary. By systematically working through the frequency list, we hope to expand the English-to-Cree dictionary in a way that is logical and beneficial to users.
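
The selection steps described above (take the most frequent words, reject unsuitable ones, add supplementary vocabulary) can be sketched roughly in Python. This is only an illustration and not the project's actual tooling: the function name select_headwords, the toy word counts, and the example rejected and supplementary words are all hypothetical placeholders, and the real frequency list would be far larger, with a cutoff of 3,000.

```python
def select_headwords(frequency_list, top_n, rejected, supplement):
    """Build an ordered list of candidate English headwords.

    frequency_list: (word, count) pairs from an English corpus
    top_n:          how many of the most frequent words to start from
    rejected:       words judged unsuitable for translation
    supplement:     extra vocabulary to append after the frequency words
    """
    # Rank words from most to least frequent and keep the first top_n.
    ranked = sorted(frequency_list, key=lambda pair: pair[1], reverse=True)
    top_words = [word for word, _count in ranked[:top_n]]

    rejected_set = {word.lower() for word in rejected}
    headwords = []
    seen = set()

    # Keep frequent words that were not rejected.
    for word in top_words:
        key = word.lower()
        if key not in rejected_set and key not in seen:
            seen.add(key)
            headwords.append(word)

    # Append supplementary vocabulary that is not already present.
    for word in supplement:
        key = word.lower()
        if key not in seen:
            seen.add(key)
            headwords.append(word)

    return headwords


# Toy data only: these counts and word choices are invented for illustration.
toy_frequencies = [
    ("the", 500), ("of", 300), ("internet", 60),
    ("river", 40), ("goose", 4), ("snowshoe", 1),
]
headwords = select_headwords(
    toy_frequencies,
    top_n=4,                      # the text describes a cutoff of 3,000
    rejected={"the", "of"},       # hypothetical rejections
    supplement=["goose", "snowshoe", "river"],
)
print(headwords)
```

Preserving frequency order in the result means the list can later be extended by raising the cutoff, which matches the idea of working systematically through the frequency list.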