What characteristics do we want in an AGI (artificial general intelligence)? An AGI should have a very advanced capacity for NLP and language comprehension, and one of the qualities we expect from it is respect for multilingualism. Ideally, the AGI should have extensive NLP capabilities that apply to a large number of languages, and even to the 8000 languages of the planet, i.e. also to the 90% of languages that are endangered. The AGI could thus help to address an important aspect of language extinction, which affects human cultural diversity (it can be assumed that some languages will already be extinct by the time AGI arrives, but the AGI could then help to revitalize them).
Here is a problem for a human intelligence (or an AGI): we have a dictionary (with words, lemmas and grammatical types) in a language A and a second dictionary in a language B. If we have an extensive corpus of each of the two languages, is it possible to create a translation dictionary from A to B, and how? To take an example: if the two languages were French and English, we would have to associate ‘cheval’ with ‘horse’ in the final translation dictionary, and so on for all the words of language A.
A closely related paper seems to be ‘Deciphering Undersegmented Ancient Scripts Using Phonetic Prior’.
Let us expand the idea of text analysis derived from rule-based translation. Above is an example of a classic word-based search. In this particular case, it is the French word ‘été’. This word is ambiguous because it can be a common noun (‘summer’) or a past participle (‘been’). Below is an example of a search for the word ‘été’ associated with the grammatical type ‘common noun’.
Finally, we have below an example of a search for the word ‘été’ associated with the grammatical type ‘past participle’.
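To make the difference between a word-based search and a type-aware search concrete, here is a minimal Python sketch. It assumes the corpus has already been tagged; the `CORPUS` data, the tag names and the `search` function are hypothetical illustrations, not the software discussed in this post.

```python
from typing import List, Optional, Tuple

# Hypothetical pre-tagged corpus: each sentence is a list of
# (word, grammatical type) pairs. The tags are assumed to be given.
TaggedSentence = List[Tuple[str, str]]

CORPUS: List[TaggedSentence] = [
    [("cet", "determiner"), ("été", "common noun"), ("fut", "verb"), ("long", "adjective")],
    [("il", "pronoun"), ("a", "auxiliary"), ("été", "past participle"), ("malade", "adjective")],
    [("un", "determiner"), ("été", "common noun"), ("chaud", "adjective")],
]


def search(corpus: List[TaggedSentence], word: str,
           gram_type: Optional[str] = None) -> List[int]:
    """Return the indices of sentences containing `word`.

    With `gram_type=None` this is a classic word-based search; otherwise
    only occurrences carrying that grammatical type are counted.
    """
    hits = []
    for index, sentence in enumerate(corpus):
        for token, tag in sentence:
            if token == word and (gram_type is None or tag == gram_type):
                hits.append(index)
                break  # one hit per sentence is enough
    return hits
```

Calling `search(CORPUS, "été")` matches every sentence, whereas `search(CORPUS, "été", "past participle")` keeps only the sentence where ‘été’ means ‘been’.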
Rule-based translation is difficult to implement. The main difficulty encountered is taking into account the groups of words, so as to be on a par with statistics-based translation. The main problems in this regard are (i) polymorphic disambiguation; and (ii) building a fair typology of grammatical types. But once these steps begin to be mastered, there are many advantages. What seems essential here is that with the same piece of software, both machine translation and text analysis can be carried out. Among the modules that are easy to implement are the following:
- part-of-speech tagger
- grammar checker
- type extractor: a module that allows you to extract words from a text according to their grammatical category
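As a rough illustration of how a tagger and a type extractor can share the same machinery, here is a toy Python sketch. The lexicon, the tag names and the single disambiguation rule (a candidate past participle directly after an auxiliary is tagged as a past participle) are simplifying assumptions made for this example; a real rule-based system would need far richer rules.

```python
import re
from typing import Dict, List, Tuple

# Toy lexicon (hypothetical): each word maps to its possible grammatical types.
LEXICON: Dict[str, List[str]] = {
    "il": ["pronoun"],
    "a": ["auxiliary"],
    "été": ["common noun", "past participle"],
    "malade": ["adjective"],
    "tout": ["determiner"],
    "un": ["determiner"],
}


def tag(text: str) -> List[Tuple[str, str]]:
    """Assign one grammatical type per word, using one toy disambiguation rule."""
    tagged: List[Tuple[str, str]] = []
    for token in re.findall(r"\w+", text.lower()):
        candidates = LEXICON.get(token, ["unknown"])
        previous_type = tagged[-1][1] if tagged else None
        if "past participle" in candidates and previous_type == "auxiliary":
            chosen = "past participle"
        else:
            # Otherwise prefer the first candidate that is not a past participle.
            chosen = next((c for c in candidates if c != "past participle"),
                          candidates[0])
        tagged.append((token, chosen))
    return tagged


def extract_by_type(text: str, gram_type: str) -> List[str]:
    """Type extractor: return the words of `text` tagged with `gram_type`."""
    return [word for word, word_type in tag(text) if word_type == gram_type]
```

For the sentence "Il a été malade tout un été", the first ‘été’ follows the auxiliary and is tagged as a past participle, while the second is tagged as a common noun, so `extract_by_type` with `"common noun"` returns only the second occurrence.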
Implementing rule-based translation provides the machine with some inherent understanding of the text, much as a human being has. In a nutshell, it is better artificial intelligence.
Finally, other modules, more advanced, seem possible (to be confirmed).
Let us consider superintelligence with regard to machine translation. To fix ideas, we can propose a rough definition: a machine with the ability to translate with 99% accuracy (or above) from any one of the 8000 languages to any other. It seems relevant here to count all of the present 8000 human languages, including some 4000 or 5000 languages which are at risk of extinction before the end of the 21st century. The definition could also relevantly include some extinct languages which are reasonably well described and meet the conditions for building rule-based translation. But arguably, this definition needs some additional criteria. The most important appears to be the ability to self-improve its performance. In practice, this could be done by reading or hearing texts. The superintelligent translation machine should be able to acquire new vocabulary from what it reads or hears: not only words, but also locutions (noun locutions, adjective locutions, adverbial locutions, verbal locutions, etc.). It should also be able to acquire new sentence structures from its readings and enrich its database of grammatical sentence structures. It should also be able to grow its database of word meanings for ambiguous words and instantly build the associated disambiguation rules. In addition, it should be capable of detecting and implementing specific grammatical structures.
It seems superintelligence will be reached when the superintelligent translation machine is able to perform all of this without any human help.
Also relevant in this discussion is the fact, previously argued, that rule-based translation is better suited to translating endangered languages than statistics-based translation. Why? Because large-scale corpora do not exist for endangered languages. From the above definition of superintelligent machine translation (SMT), it follows that rule-based translation is also best suited to SMT, since SMT massively includes endangered languages (but arguably, statistics-based MT could still be used for translating major languages into one another).
Let us speculate now on how this path to superintelligent translation will be achieved. We can mention here:
- a quantitative scenario: (i) acquire, first, an ability to translate very accurately, say, 100 languages; (ii) develop, second, the ability to self-improve; (iii) extend, third, the translation ability to the whole set of 8000 human languages.
- alternatively, there could be a qualitative scenario: (i) acquire, first, an ability to translate somewhat accurately the 8000 languages (the accuracy could vary from language to language, especially with rare endangered languages); (ii) suggest, second, improvements to vocabulary, locutions, sentence structures, disambiguation rules, etc. that are verified and validated by humans; (iii) acquire, third, the ability to self-improve by reading texts or hearing conversations.
- it is worth mentioning a third alternative that would consist of a hybrid scenario, i.e. a mix of quantitative and qualitative improvements. It will be our preferred scenario.
But we should provide more details on how these steps could be achieved. To fix ideas, let us focus on the word self-improvement module: it allows the superintelligent translation machine to extend its vocabulary in any language. This could be accomplished by reading or hearing new texts in any language. When facing a new word, superintelligent machine translation (SMT, for short) should be able to translate it instantly into all the other languages and add it to its vocabulary database.
To give another example, another module would be the locution self-improvement module: it allows the superintelligent translation machine to extend its knowledge of locutions in any language.
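The bookkeeping side of a word self-improvement module can be sketched as follows. This is only a minimal sketch: the hard part, proposing a translation for an unknown word, is delegated to `propose_translation`, a hypothetical callable supplied by the caller, and the vocabulary layout (source language, then word, then target language) is an assumption made for this example.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical vocabulary layout: source language -> word -> {target language: translation}.
Vocabulary = Dict[str, Dict[str, Dict[str, str]]]
Proposer = Callable[[str, str, str], Optional[str]]


def learn_new_words(text: str, source: str, targets: List[str],
                    vocabulary: Vocabulary,
                    propose_translation: Proposer) -> List[str]:
    """Add the unknown words of `text` to `vocabulary`; return the words learned.

    `propose_translation(word, source, target)` is an assumed helper that
    returns a translation, or None when it has no proposal.
    """
    entries = vocabulary.setdefault(source, {})
    learned: List[str] = []
    for word in re.findall(r"\w+", text.lower()):
        if word in entries:
            continue  # already known, nothing to learn
        translations = {}
        for target in targets:
            candidate = propose_translation(word, source, target)
            if candidate is not None:
                translations[target] = candidate
        if translations:
            entries[word] = translations
            learned.append(word)
    return learned
```

A locution self-improvement module could follow the same pattern, with multi-word units in place of single tokens.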
Also relevant to this topic is the following question: could SMT be achieved without AGI (general AI)? We shall address this question later.
To begin with, let us state the 1% problem for machine translation: it seems that some 99% accuracy in machine translation could be attainable, but the remaining 1% (1% is just a given number, somewhat arbitrarily chosen, but useful to fix ideas) may be hard or even very hard to reach. Now a question arises: is some progress on the remaining 1% problem attainable without general-purpose AI? Prima facie, the answer is no. For it seems that progress on the remaining 1% problem requires, for example, abilities such as being able to find the translation of a given word in external databases. It will sometimes occur that the untranslated 1% is due to the presence of a new word, for instance one very recently created, and thus lacking in the MT system's internal dictionary. In order to find the relevant translation, the machine should be able to search for it in external databases (say, the web), just as a human would do. So solving the remaining 1% problem requires, among other capabilities, an ability of this kind, which is part of a general-purpose AI.
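The fallback described here, where the internal dictionary is consulted first and an external source only when the word is missing, can be sketched in a few lines of Python. The `external_lookup` callable is a hypothetical stand-in for whatever external database or web search the machine would use; actually finding a reliable translation out there is precisely the hard part that this sketch does not solve.

```python
from typing import Callable, Dict, Optional


def translate_word(word: str, internal: Dict[str, str],
                   external_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Translate `word` from the internal dictionary, else from an external source.

    `external_lookup` is an assumed helper returning a translation or None.
    A translation found externally is stored so the next lookup is internal.
    """
    if word in internal:
        return internal[word]
    found = external_lookup(word)
    if found is not None:
        internal[word] = found  # remember the newly found word
    return found
```

For instance, with an internal dictionary containing only ‘cheval’, a call for a recently coined word goes through `external_lookup` once and is then served from the internal dictionary.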
Artificial general intelligence (AGI) is prima facie a somewhat abstract notion that needs to be refined and made more explicit. Problems encountered in implementing machine translation systems can help make this notion more accurate and concrete. The ability to find the translation of a given word in external databases is just one of the abilities needed to solve the remaining 1% problem. We shall mention some other abilities of the same type later.