A hard case for disambiguation: polymorphic disambiguation

Let us investigate a hard case of disambiguation that needs to be addressed. For reasons that will become clearer later, I shall call it polymorphic disambiguation. Let us take an example: the translation of the two consecutive words ‘de fait’. The first French sentence, ‘De fait, il part.’, translates into ‘Difatti, parti’ (Actually, he’s leaving.): in this case, ‘de fait’ is treated as an adverbial locution. The second French sentence, ‘Il n’y a rien de fait.’, translates correctly into ‘Ùn ci hè nienti di fattu.’ (There is nothing done.), where ‘fait’ is now identified as a past participle. The instance at hand concerns French to Corsican, but it should be clear that it arises in the same way in French to English translation. To sum up: the two consecutive words ‘de fait’ can be identified either as an adverbial locution or as a preposition (‘de’) followed by a past participle (‘fait’, done).

We are now in a position to formulate the problem more generally. It concerns two or more consecutive words that may be interpreted grammatically in different ways within a sentence and that may therefore be translated differently. Generally speaking, disambiguation may concern a single word (as in most cases) but also a group of words. Polymorphic disambiguation relates to the latter: a given group of words, i.e. a sequence of 2 words, 3 words, 4 words, etc.
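To make the problem concrete, here is a minimal sketch in Python of how a rule-based system might choose between the two readings of ‘de fait’. It is only an illustration, not the method used by any actual translator: the function name classify_de_fait, the label strings and the cue list INDEFINITE_CUES are hypothetical, and the two rules (a preceding indefinite such as ‘rien’ suggests preposition + participle; sentence-initial position or a following comma suggests the adverbial locution) are simplifying assumptions.

```python
import re

# Hypothetical cue list (an assumption for this sketch): words that, when they
# immediately precede "de fait", suggest the reading preposition + participle,
# as in "rien de fait" or "quelque chose de fait".
INDEFINITE_CUES = {"rien", "chose"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word and punctuation tokens."""
    return re.findall(r"[\w’']+|[^\w\s]", sentence.lower())


def classify_de_fait(sentence):
    """Return a label for the first occurrence of 'de fait', or None if absent."""
    tokens = tokenize(sentence)
    for i in range(len(tokens) - 1):
        if tokens[i] == "de" and tokens[i + 1] == "fait":
            prev_tok = tokens[i - 1] if i > 0 else None
            next_tok = tokens[i + 2] if i + 2 < len(tokens) else None
            # Toy rule 1: an indefinite cue right before the pair.
            if prev_tok in INDEFINITE_CUES:
                return "PREP+PARTICIPLE"
            # Toy rule 2: sentence-initial position or a following comma.
            if i == 0 or next_tok == ",":
                return "ADVERBIAL_LOCUTION"
            # Neither rule fires: leave the pair unresolved.
            return "AMBIGUOUS"
    return None


if __name__ == "__main__":
    for s in ("De fait, il part.", "Il n’y a rien de fait."):
        print(s, "->", classify_de_fait(s))
```

With these toy rules, the first example sentence is labelled as an adverbial locution and the second as preposition + participle. A real system would need many more contextual cues, and the same kind of test would have to be written for every polymorphic sequence of 2 words, 3 words, and so on.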

A trial with online translators shows that statistical MT handles polymorphic disambiguation better than rule-based MT. This is an interesting difference, and it is a gap that rule-based MT should fill.