Let us consider a specific kind of superlative. This form, specific to the Corsican language, is notably mentioned by the grammarian and author Santu Casta in his Punteghju, where he recommends the following translation of “C’était le village le plus riche du canton” (It was the richest village of the canton): Era u più paese riccu di stu cantone (pages 26 & 54-55). The structure is original in the sense that the comparative (più) precedes the noun (paese, village), which in turn precedes the adjective (riccu, rich).
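As a minimal sketch of the word order just described, the Python helpers below assemble the French and the Corsican pattern from the same three pieces. The function names are hypothetical and are not taken from Punteghju; agreement, elision and the rest of the sentence are deliberately ignored.

```python
def french_superlative(article: str, noun: str, adjective: str) -> str:
    """French pattern: article + noun + article + 'plus' + adjective."""
    return f"{article} {noun} {article} plus {adjective}"


def corsican_superlative(article: str, noun: str, adjective: str) -> str:
    """Corsican pattern described above: article + 'più' + noun + adjective."""
    return f"{article} più {noun} {adjective}"


# The example from the text: 'le village le plus riche' / 'u più paese riccu'.
print(french_superlative("le", "village", "riche"))
print(corsican_superlative("u", "paese", "riccu"))
```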
In Corsican, the French word ‘femme’ can be translated, depending on the context:
- either into donna (woman)
- or into moglia (wife)
The above sample still contains many vocabulary and grammatical disambiguation errors (easy/medium difficulty), but it successfully handles the semantic disambiguation (hard) of ‘femme’, two instances of which are properly translated into moglia (wife). As the Corsican proverb says, in a cianga l’oru luci sempri (in the mud, gold still shines).
French samples are from the French corpora of the University of Leipzig.
Translating the French word ‘Noël’ yields another case of ambiguity. ‘Noël’ can translate:
- either into Natali (Christmas, Christmas Day): the annual festival commemorating Jesus Christ’s birth
- or, identically, into Natali (‘Noel’): the first name
At first sight there seems to be nothing to disambiguate, since in either case French ‘Noël’ translates into Natali (Natali in the sartinese and taravese variants; Natale in the cismuntincu variant). But ambiguity lurks in some sentences that include ‘Noël’. Consider the following sentence: ‘Je l’ai donné à Noël.’ It can be translated:
- either into: L’aghju datu in Natali. (I gave it at Christmas.)
- or into: L’aghju datu à Natali. (I gave it to Noel.)
since the French preposition ‘à’ translates differently in the two cases. A phenomenon of the same nature occurs when translating from French into English.
Interestingly, when the ambiguous pair of consecutive words is repeated, the ambiguity vanishes: ‘Je l’ai donné à Noël à Noël.’ translates unambiguously into L’aghju datu à Natali in Natali. (I gave it to Noel at Christmas.) The order can be ignored, since L’aghju datu in Natali à Natali. (I gave it at Christmas to Noel.) amounts to the same. In this last case, the translation is meaning-preserving.
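The reasoning about ‘à Noël’ can be sketched in a few lines of Python. This is only an illustration under a stated assumption, namely that each reading (recipient or time) is used at most once per sentence; the names `ROLES`, `readings` and `translations` are hypothetical.

```python
from itertools import permutations

# The two readings of French 'à Noël' discussed in the text.
ROLES = {
    "recipient": "à Natali",  # to Noel (the first name)
    "time": "in Natali",      # at Christmas
}


def readings(n_occurrences: int) -> set:
    """Distinct role assignments for n occurrences of 'à Noël', ignoring order.

    Assumption: each role is used at most once per sentence.
    """
    return {frozenset(p) for p in permutations(ROLES, n_occurrences)}


def translations(n_occurrences: int) -> list:
    """Candidate Corsican translations of 'Je l’ai donné' + n times 'à Noël'."""
    base = "L’aghju datu"
    return [
        " ".join([base] + [ROLES[role] for role in sorted(reading)]) + "."
        for reading in readings(n_occurrences)
    ]


print(translations(1))  # two candidates: the sentence is ambiguous
print(translations(2))  # one candidate: the ambiguity has vanished
```

With one occurrence the function returns two candidates; with two occurrences both roles are filled, so a single reading remains once word order is ignored.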
Here are a few suggestions on how rule-based and statistical machine translation can help each other:
To begin with, rule-based and statistical machine translation are often contrasted and compared, but it would be an oversimplification to conclude that one is better than the other. More objectively, each method has its strengths and weaknesses. Let us investigate how they could be made to collaborate so as to add up their respective strengths.
In the case of an endangered language, the lack of good-quality corpora has been pointed out. One way for rule-based and statistical machine translation to collaborate would be to use rule-based translation to build a better-quality corpus for statistical machine translation.
Suppose we begin with statistical machine translation software that performs at 50% on average for French-to-Corsican translation.
Let us sketch the process of creating these better corpora, taking the example of the French-Corsican diglossic pair (Corsican being considered by UNESCO a definitely endangered language). At present we lack a quality French-Corsican corpus or, to put it more accurately, the corpus at our disposal is of low quality. The idea would be to use rule-based machine translation to create a much better corpus for use with statistical machine translation.
Let us now sketch the steps of this collaborative process: (i) create a French-Corsican corpus with the help of rule-based machine translation: if the software performs at some 90% on average, then the corpus would be on average 90% reliable. With appropriate training, statistical MT should now perform at, say, 80% on average (to be compared with the previous 50%).
(ii) From this French-Corsican corpus, other corpus pairs can be created, such as Italian-Corsican, English-Corsican, etc., since French-Italian, English-Italian, etc. corpora of excellent quality already exist. The performance gain should then extend to these other language pairs.
With the help of this process, we are finally in a position to combine the strengths of the two complementary approaches to MT: on the one hand, rule-based MT is able to translate with good accuracy even in the absence of corpora; on the other hand, statistical machine translation is able to handle a great many language pairs successfully and quickly. To sum up, as the Corsican proverb says: una mani lava l’altra (one hand washes the other).
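Steps (i) and (ii) can be sketched in Python as follows. This is a rough illustration, not an actual pipeline: `toy_rule_based_translate` is a hypothetical stand-in for a real rule-based engine, the function names are assumptions, and the pivot step assumes that the two corpora share exactly matching French sentences.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def build_synthetic_corpus(
    french_sentences: Iterable[str],
    rule_based_translate: Callable[[str], str],
) -> List[Pair]:
    """Step (i): pair each French sentence with its rule-based Corsican translation."""
    return [(fr, rule_based_translate(fr)) for fr in french_sentences]


def pivot_corpus(
    other_french: Iterable[Pair],
    french_corsican: Iterable[Pair],
) -> List[Pair]:
    """Step (ii): join an existing (X, French) corpus with (French, Corsican)
    on the shared French sentence, yielding (X, Corsican) pairs."""
    fr_to_co = dict(french_corsican)
    return [(x, fr_to_co[fr]) for x, fr in other_french if fr in fr_to_co]


def toy_rule_based_translate(sentence: str) -> str:
    """Hypothetical stand-in for a rule-based engine: a one-entry lookup table."""
    table = {"Je l’ai donné à Noël à Noël.": "L’aghju datu à Natali in Natali."}
    return table[sentence]


fr_co = build_synthetic_corpus(["Je l’ai donné à Noël à Noël."], toy_rule_based_translate)
en_fr = [("I gave it to Noel at Christmas.", "Je l’ai donné à Noël à Noël.")]
en_co = pivot_corpus(en_fr, fr_co)
print(en_co)
```

The resulting pairs would then serve as training data for the statistical system; that training step is outside the scope of this sketch.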