What are the conditions for a given endangered language to be a candidate for rule-based machine translation? For a given endangered language to be a candidate for rule-based machine translation, some requirements are in order. There is notably need for:
- a dictionary: some specialized lexicons are useful too
- a list of locutions and their translation: to be more accurate what is needed are noun locutions, adjective locutions, adverbial locutions, verbal locutions and their translations in other language.
- a detailed grammar (in any language): ideally, the grammar should be very detailed, mentioning notably irregular verbs, noun plurals, etc. Subjonctive, conditional tenses must also be accurately described.
- in addition, elision rules, euphony rules, should also be described.
- most importantly: a description of the main variants of the language and their differences. This is needed to handle what we can call the ‘variant problem’ (we shall say a bit more about this later): as an effect of diversity, endangered languages are often polynomic and come with variants. But translation must be coherent and a mix of several variants is not acceptable as a translation.
Let us mention that endangered languages are commonly associated with another language, being in a diglossia relationship one with another. To take an example, Corsican language is associated with French. So we consider the French-Corsican pair, and what is relevant is a French-Corsican. If we consider the sardinian gallurese language (‘gaddhuresu’), the relevant pair is Italian-Gallurese. Other relevant pairs are: