There is an ongoing debate on whether AI software should be open source (see, for example, Bostrom’s paper Strategic Implications of Openness in AI Development). Our present concern is whether MT software should be open source. Prima facie, for safety reasons, it would be better to make MT code public, thus allowing anyone to inspect the code, find possible errors, and so on. Such openness would notably be a defense against the AI control problem, in short, the risk that a superintelligence could harm humans. From this standpoint, public code seems much preferable to private code. In the case of rule-based translation (the distinction between statistical and rule-based MT is not as clear-cut as one might think at first glance, since some rules could be applied on a statistical basis), open code would allow people to check step by step how the resulting translation was produced. Better transparency would be achieved accordingly.
Another advantage of publishing the code would be to allow anyone to improve it and extend its capabilities, notably by adding new modules targeted at new languages (there are around 7,000 human languages).
To begin with, let us state the 1% problem for machine translation: some 99% accuracy in machine translation seems attainable, but the remaining 1% (the figure is somewhat arbitrarily chosen, but useful to fix ideas) may be hard or even very hard to reach. Now a question arises: is progress on the remaining 1% attainable without general-purpose AI? Prima facie, the answer is no. For it seems that such progress requires, for example, the ability to find the translation of a given word in external databases. It will sometimes happen that part of the untranslated 1% is due to the presence of a new word, for instance one coined very recently and thus missing from the MT system’s internal dictionary. In order to find the relevant translation, the machine should be able to search for it in external databases (say, the web), just as a human would do. So solving the remaining 1% problem requires, among other capabilities, an ability of this kind, which is part of general-purpose AI.
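To fix ideas, the fallback just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary entries, the function names translate_word and stub_external_lookup, and the stub's contents are all hypothetical, and the stub merely stands in for a real search of the web or another external database.

```python
from typing import Callable, Optional

# Hypothetical internal dictionary (English -> French); entries are illustrative only.
INTERNAL_DICTIONARY = {"cat": "chat", "house": "maison"}


def translate_word(
    word: str,
    external_lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Look the word up internally first, then fall back to an external source."""
    translation = INTERNAL_DICTIONARY.get(word.lower())
    if translation is not None:
        return translation
    # The word is missing internally (e.g. a recent coinage): ask the external source.
    return external_lookup(word)


def stub_external_lookup(word: str) -> Optional[str]:
    """Stand-in for a web or database search (assumption: no real network access)."""
    external_entries = {"podcast": "balado"}
    return external_entries.get(word.lower())


if __name__ == "__main__":
    for w in ("cat", "podcast", "zzz"):
        print(w, "->", translate_word(w, stub_external_lookup))
```

The control flow itself is trivial. The difficulty the argument points to lies entirely inside the external lookup, that is, in searching open-ended sources and judging which candidate translation is correct, which the stub deliberately leaves out.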
Artificial general intelligence (AGI) is prima facie a somewhat abstract notion that needs to be refined and made more explicit. Problems encountered in implementing machine translation systems can help make this notion more precise and concrete. The ability to find the translation of a given word in external databases is just one of the abilities needed to solve the remaining 1% problem. We shall mention some other abilities of the same type later.