Monthly Archives: January 2021

Grammatical word-disambiguation again

The challenge is above all to generalize grammatical word-disambiguation to several languages. Creating a grammatical word-disambiguation module for each language appears to be a long and arduous task, and this seems to be the main difficulty. But if a module written for a given language could be generalized to several other languages, this could be an important advance in the field of rule-based machine translation (or rather, machine translation that simulates human reasoning, which seems to me a more appropriate term).

We can describe the problem more precisely. For a given language we have about 100 grammatical categories. We also have, to fix ideas, about 300 ambiguous grammatical types, for example "adverb or preposition", "singular masculine noun or singular masculine adjective", etc. The problem is to describe an algorithm that removes the ambiguity and determines the correct grammatical type from the context.
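To make the problem concrete, here is a minimal sketch of such an algorithm, assuming a toy lexicon in which each word maps either to one category or to an ambiguous type (several candidate categories), and an ordered list of context rules that look at the candidates of the neighbouring words. The category names, the lexicon entries and the rules are hypothetical illustrations, not the translator's actual tables.

```python
from typing import Callable, Optional

Candidates = tuple[str, ...]
Context = Callable[[Optional[Candidates], Optional[Candidates]], bool]

# Hypothetical toy lexicon: one category, or several for an ambiguous type.
LEXICON: dict[str, Candidates] = {
    "il": ("pronoun",),
    "le": ("article",),
    "est": ("verb",),
    "arrivé": ("past_participle",),
    "avant": ("adverb", "preposition"),            # adverb or preposition
    "midi": ("noun",),
    "petit": ("noun_masc_sing", "adj_masc_sing"),  # noun or adjective
    "chat": ("noun_masc_sing",),
}

def has_noun(candidates: Optional[Candidates]) -> bool:
    """True if the neighbouring word exists and can be a noun."""
    return candidates is not None and any(c.startswith("noun") for c in candidates)

# Ordered rules: (ambiguous type, test on previous/next candidates, chosen category).
# The first rule whose test passes wins.
RULES: list[tuple[Candidates, Context, str]] = [
    (("adverb", "preposition"), lambda prev, nxt: has_noun(nxt), "preposition"),
    (("adverb", "preposition"), lambda prev, nxt: True, "adverb"),
    (("noun_masc_sing", "adj_masc_sing"), lambda prev, nxt: has_noun(nxt), "adj_masc_sing"),
    (("noun_masc_sing", "adj_masc_sing"), lambda prev, nxt: True, "noun_masc_sing"),
]

def disambiguate(words: list[str]) -> list[str]:
    """Assign one category to each word, resolving ambiguous types from context."""
    candidates = [LEXICON[w] for w in words]
    result: list[str] = []
    for i, cands in enumerate(candidates):
        if len(cands) == 1:
            result.append(cands[0])
            continue
        prev = candidates[i - 1] if i > 0 else None
        nxt = candidates[i + 1] if i + 1 < len(candidates) else None
        for ambiguous_type, test, chosen in RULES:
            if ambiguous_type == cands and test(prev, nxt):
                result.append(chosen)
                break
        else:
            result.append(cands[0])  # no rule matched: fall back to the first candidate
    return result

print(disambiguate("il est arrivé avant midi".split()))
```

With 300 ambiguous types, the rule list grows accordingly, but the loop itself stays the same; only the tables change.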

I am now rewriting the complete module of disambiguation by grammatical type so that it can be used and adapted to other languages (Italian first). It remains to be seen whether this can be done.

First steps in Gallurese

The translator is taking its first steps in translating from French into Gallurese. The first tests show a score of 75-80%, with many errors in grammar, spelling and vocabulary. A score of 90% will have to be reached before the result can be published.

The ideal would have been an Italian-Gallurese translator, but this is not yet possible: it will be necessary to translate (i) from Italian into French, then (ii) from French into Gallurese.

Hinting at the control problem

The question of choosing the best system for solving the problems posed by word disambiguation in translation seems to be linked to the AGI control problem (how to prevent an AGI from eventually turning out to be harmful to its creators). When we have a choice between several methods for developing an AI, it seems wiser to choose the one that allows better control of it. As far as machine translation is concerned, we should therefore prefer the method that emulates human reasoning and produces an answer that can be broken down, step by step, into the reasoning that leads to it. This makes it possible not only to determine precisely the cause of an error, but also to remedy it. The problem is not limited to machine translation but has a wider scope: grammatical disambiguation matters for machine translation, but also for natural language understanding and for context-based disambiguation in the complete absence of any translation.
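A minimal sketch of what "broken down step by step" can mean in practice, assuming a toy lexicon and two hypothetical rules: every decision is recorded together with the identifier of the rule that produced it, so a wrong category can be traced back to the exact rule responsible and that rule can be corrected. The names `Rule`, `Step` and `disambiguate_with_trace` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy lexicon: word -> candidate categories.
LEXICON: dict[str, frozenset] = {
    "il": frozenset({"pronoun"}),
    "part": frozenset({"verb"}),
    "avant": frozenset({"adverb", "preposition"}),
    "midi": frozenset({"noun"}),
}

@dataclass(frozen=True)
class Rule:
    rule_id: str                                      # shown in the trace
    ambiguity: frozenset                              # candidate set the rule resolves
    condition: Callable[[Optional[frozenset]], bool]  # test on the next word's candidates
    chosen: str                                       # category selected when the test passes

RULES = [
    Rule("R1 adverb/preposition + noun -> preposition",
         frozenset({"adverb", "preposition"}),
         lambda nxt: nxt is not None and "noun" in nxt, "preposition"),
    Rule("R2 adverb/preposition elsewhere -> adverb",
         frozenset({"adverb", "preposition"}),
         lambda nxt: True, "adverb"),
]

@dataclass(frozen=True)
class Step:
    word: str
    category: str
    reason: str  # rule id, or "lexicon" for unambiguous words

def disambiguate_with_trace(words: list[str]) -> list[Step]:
    """Tag each word and record why each tag was chosen."""
    trace: list[Step] = []
    for i, word in enumerate(words):
        candidates = LEXICON[word]
        if len(candidates) == 1:
            trace.append(Step(word, next(iter(candidates)), "lexicon"))
            continue
        nxt = LEXICON[words[i + 1]] if i + 1 < len(words) else None
        for rule in RULES:
            if rule.ambiguity == candidates and rule.condition(nxt):
                trace.append(Step(word, rule.chosen, rule.rule_id))
                break
        else:
            # No rule applies: use a deterministic fallback and say so in the trace.
            trace.append(Step(word, sorted(candidates)[0], "no rule matched"))
    return trace

if __name__ == "__main__":
    for step in disambiguate_with_trace(["il", "part", "avant", "midi"]):
        print(f"{step.word:<6} {step.category:<12} {step.reason}")
```

If ‘avant’ were ever mistagged, the trace would name the rule that made the decision, which is exactly the kind of error diagnosis the paragraph above argues for.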

On the implementation of grammatical disambiguation

Grammatical disambiguation (for example, deciding whether ‘maintenant’ is an adverb, ‘now’, or the gerundive, ‘maintaining’, of the verb ‘maintenir’) seems to be the crucial issue when choosing between the rule-based and the statistical model for machine translation. The problem is widespread and seems to concern all languages. In French, it affects about 1 word out of 7. Effective grammatical disambiguation is difficult to implement. The advantage of the statistical method is that the same method can be generalized and used for several languages. With the rule-based model, the grammatical disambiguation module must be rewritten for each language, which creates considerable complexity and requires a very significant development time. A rule-based method for grammatical disambiguation that can easily be applied to several languages would therefore be of great interest. This seems to be the main difficulty that rule-based machine translation has to overcome.
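One possible reading of "a rule-based method that can easily be applied to several languages" is to keep the disambiguation engine language-independent and move everything language-specific into data tables. The sketch below assumes exactly that; the `ContextRule` structure, the tables and the single rule per language are hypothetical. The French rule is a deliberate simplification (‘en maintenant’ is read as the gerundive, ‘maintenant’ elsewhere as the adverb), and the Italian entry, ‘ancora’ (adverb ‘still’ vs. noun ‘anchor’), only illustrates a second language plugged into the same engine.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ContextRule:
    ambiguity: frozenset  # candidate categories the rule resolves
    triggers: frozenset   # preceding tokens that select `chosen`
    chosen: str           # category when the preceding token is a trigger
    default: str          # category otherwise

# Language-specific data (hypothetical, deliberately tiny).
# The engine below never changes; only these tables differ per language.
LEXICONS = {
    "fr": {"maintenant": frozenset({"adverb", "gerundive"})},
    "it": {"ancora": frozenset({"adverb", "noun"})},
}
RULES = {
    # French: "en maintenant" -> gerundive of "maintenir", otherwise adverb.
    "fr": [ContextRule(frozenset({"adverb", "gerundive"}), frozenset({"en"}), "gerundive", "adverb")],
    # Italian: "l'ancora" / "un'ancora" -> noun (anchor), otherwise adverb.
    "it": [ContextRule(frozenset({"adverb", "noun"}), frozenset({"l'", "un'"}), "noun", "adverb")],
}

def resolve(language: str, tokens: list[str]) -> list[Optional[str]]:
    """Return the chosen category for each ambiguous token, None for the others."""
    lexicon, rules = LEXICONS[language], RULES[language]
    result: list[Optional[str]] = []
    for i, token in enumerate(tokens):
        candidates = lexicon.get(token)
        if candidates is None:
            result.append(None)  # not ambiguous in this toy lexicon
            continue
        previous = tokens[i - 1] if i > 0 else None
        for rule in rules:
            if rule.ambiguity == candidates:
                result.append(rule.chosen if previous in rule.triggers else rule.default)
                break
        else:
            result.append(None)  # no rule covers this ambiguity yet
    return result

print(resolve("fr", ["en", "maintenant", "le", "cap"]))
print(resolve("it", ["ho", "gettato", "l'", "ancora"]))
```

Adding a language then means writing its lexicon and rule tables, not a new engine; whether real French and Italian rules stay this shareable is precisely the open question.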

But if we want an artificial intelligence that does not merely provide a (mostly accurate) answer without really being able to explain its reasoning, but is truly able to emulate human reasoning and to justify and describe, step by step, the reasoning that leads to its answer, then the effort is worth it.

The 90% rule

The French-Gallurese translator is currently under development. An Android application is planned first; it will be called ‘traducidori gaddhuresu’. The translator is currently being tested, and it will only be published if its performance (evaluated by an open test) is above 90%. This is a rule we impose on ourselves, specifically for endangered languages: we consider that, for them, a poor-quality translation can do more harm than good.
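The post does not say how the open test is scored. As a minimal sketch, assuming the score is simply the percentage of test sentences judged correctly translated, the publication rule could be checked like this (`open_test_score` and `may_publish` are hypothetical names):

```python
def open_test_score(judgements: list[bool]) -> float:
    """Percentage of test sentences judged correct (assumed metric)."""
    if not judgements:
        raise ValueError("the open test needs at least one sentence")
    return 100.0 * sum(judgements) / len(judgements)

def may_publish(judgements: list[bool], threshold: float = 90.0) -> bool:
    """Apply the 90% rule; the post says 'above 90%', so the comparison is strict."""
    return open_test_score(judgements) > threshold

# Example: 78 sentences judged correct out of 100, in the 75-80% range of the first tests.
first_tests = [True] * 78 + [False] * 22
print(open_test_score(first_tests), may_publish(first_tests))
```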