Category Archives: Semantics: blog

Modulators of determinants

We have mentioned the special category of determinant modulators. It seems that this category is interesting and deserves to be explored further. A determinant modulator is placed before a determinant and changes its meaning. As we have already seen, from the viewpoint of two-sided grammar, a determinant preceded by a determinant modulator (MODD) remains a determinant.

We can give some examples that apply to different categories of determinants:

  • MODD applying to possessive determinants (mes, tes, ses, nos, vos, leurs; my, your, his/her, our, your, their; i me, i to, i so, i nostri, i vostri, i so), demonstrative determinants (ces; these; ‘ssi/’sse) and definite article determinants (les; the; i/e): certaines de, certains de, l’un de, l’une de, la majeure partie de, la plupart de, tous, toutes, une bonne partie de, une grande partie de; some of, some of, one of, one of, most of, most of, all of, all of, a good part of, a large part of; une poche di, uni pochi di, unu di, una di, parte è più di, a maiò parte di, tutti, tutte, une bella parte di, parte assai di. Here are some examples: “certains de mes chevaux étaient bruns” (some of my horses were brown; uni pochi di i me cavalli eranu bruni); “la majeure partie des (= de les) habitants étaient riches” (most of the inhabitants were rich; a maiò parte di l’abitanti eranu ricchi).

In addition, we have three other categories of MODDs that have already been mentioned:

  • MODD applying to cardinal determinants (deux, trois, quatre, cinq, … ; two, three, four, five…; dui, trè, quattru, cinqui,…): au moins, presque, quasiment, environ, plus de, moins de, approximativement, etc. (at least, almost, nearly, about, more than, less than, approximately, etc.; alminu, guasgi, guasgi, circa, più di, menu di, apprussimativamenti, etc.)
  • MODD applying to indefinite article determinants: plus de, au moins; more than, at least; più di, alminu
  • MODD applying to indefinite determinants (aucun, aucune, quelques; none, none, a few; nisciunu, nisciuna, calchì): au moins, presque; at least, almost; alminu, guasi

Finally, it seems that this category of MODD has some consistency and could be of practical interest.
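The rule that a determinant preceded by a MODD is still a determinant can be sketched in Python. This is a minimal illustration under stated assumptions: the `MODD` and `D` sets are a hypothetical, hand-written mini-lexicon, tokenization is plain whitespace splitting, and contractions such as ‘des’ (= de les) are not handled. Multiword modulators such as ‘au moins’ are matched longest-first, then the MODD + D = D reduction is applied.

```python
# Hypothetical mini-lexicon drawn from the French examples above.
MODD = {("au", "moins"), ("presque",), ("plus", "de"), ("certains", "de"),
        ("la", "plupart", "de"), ("tous",)}
D = {"mes", "ces", "les", "cinq", "deux", "un", "aucun"}
MAX_LEN = max(len(m) for m in MODD)


def tag(tokens):
    """Tag tokens as MODD, D or X (other), matching multiword MODDs longest-first."""
    tags, i = [], 0
    while i < len(tokens):
        for n in range(MAX_LEN, 0, -1):
            chunk = tuple(tokens[i:i + n])
            if len(chunk) == n and chunk in MODD:
                tags.append(("MODD", " ".join(chunk)))
                i += n
                break
        else:  # no MODD starts here: tag a single word
            tags.append(("D" if tokens[i] in D else "X", tokens[i]))
            i += 1
    return tags


def reduce_modd(tags):
    """Apply MODD + D = D: a determinant preceded by a MODD is still a determinant."""
    out = []
    for kind, text in tags:
        if kind == "D" and out and out[-1][0] == "MODD":
            _, mod = out.pop()
            out.append(("D", f"{mod} {text}"))
        else:
            out.append((kind, text))
    return out


print(reduce_modd(tag("au moins cinq chevaux dormaient".split())))
# [('D', 'au moins cinq'), ('X', 'chevaux'), ('X', 'dormaient')]
```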

Grammatical categories by position again: the case of adverbs and modulators placed before a modulator

Let us try to delve more deeply into the case of adverbs. We shall now continue to define them by their position in relation to other grammatical categories. As a result, adverbs are divided into several different categories. Now let’s look at the adverbs that may be placed before an adjective modulator. To begin with, let us cite just a few adjective modulators:

  • peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, etc.
  • pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, etc.
  • not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, etc.

Now some modulators of adjective modulators are:

  • pas, peut-être, surtout, vraiment, etc.
  • micca, forse, soprattuttu, veramente, è cetera.
  • not, maybe, mostly, really, etc.

Here are some relevant examples: “il était surtout trop blanc” (he was mostly too white; era sopratuttu troppu biancu); “il était vraiment très beau” (he was really very beautiful; era propriu bellissimu); “il était bien trop grand” (he was far too tall; era bellu troppu maiore).

Let’s call this category modulators of adjective modulators. Being placed before the adjective modulator is linked to the fact that the modulator modifies the meaning of the adjective modulator.

Hence, if we reason in terms of two-sided grammar, an adjective modulator preceded by a modulator remains an adjective modulator: MOD-MODAQ = MODAQ.
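The equation MOD-MODAQ = MODAQ can be read as a rewrite rule over sequences of type tags. The sketch below is an illustration, not a full parser: it combines this rule with MODAQ-AQ = AQ (the rule for adjective modulators) and reduces the leftmost matching pair until nothing changes. The tags are assigned by hand; a word such as ‘vraiment’, which appears in both lists above, would first need disambiguation.

```python
# Two-sided rules from the text, read as rewrites: (left, right) -> result.
RULES = {
    ("MOD", "MODAQ"): "MODAQ",  # MOD-MODAQ = MODAQ
    ("MODAQ", "AQ"): "AQ",      # MODAQ-AQ = AQ
}


def reduce_tags(tags):
    """Rewrite the leftmost reducible pair, restarting after each rewrite."""
    tags = list(tags)
    i = 0
    while i < len(tags) - 1:
        result = RULES.get((tags[i], tags[i + 1]))
        if result:
            tags[i:i + 2] = [result]
            i = 0  # a new reducible pair may have appeared
        else:
            i += 1
    return tags


# “il était vraiment très beau”: vraiment = MOD, très = MODAQ, beau = AQ
print(reduce_tags(["MOD", "MODAQ", "AQ"]))  # ['AQ']
```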

To sum up: so far we have distinguished several categories within the classical class of adverbs:

  • modulators of adjectives
  • modulators preceding verbs: verb pre-modulators
  • modulators following verbs: verb post-modulators
  • modulators preceding cardinal determinants
  • modulators preceding adjective modulators

Grammatical categories by position: the case of adverbs and modulators placed before a determinant

Let us try to delve more deeply into the case of adverbs, trying to define them by their position in relation to other grammatical categories. The adverbs are divided into several different categories. Now let’s look at the adverbs that may be placed before a determinant:

  • au moins, presque, quasiment, environ, plus de, moins de, approximativement, etc.
  • alminu, guasgi, guasgi, circa, più di, menu di, apprussimativamenti, etc.
  • at least, almost, nearly, about, more than, less than, approximately, etc.

Here are some examples: “au moins cinq chevaux dormaient” (at least five horses were sleeping; alminu cinqui cavaddi durmiani); “pendant presque un an” (for almost a year; mentri guasgi un annu); “presque aucun soldat ne manquait” (almost no soldier was missing; guasgi nisciunu suldatu ùn mancava).

Let’s call these categories modulators (of determinants). The fact of being placed before the determinant is related to the fact that the modulator modifies the meaning of the determinant. The relevant determinants are:

  • cardinal determinants: deux, trois, quatre, cinq, … (two, three, four, five…; dui, trè, quattru, cinqui,…)
  • indefinite article determinants
  • indefinite determinants: aucun, aucune, quelques (none, none, a few; nisciunu, nisciuna, calchì)

Hence, if we reason in terms of two-sided grammar, a determinant preceded by a modulator remains a determinant: MODD-D = D.

To sum up: so far we have distinguished several categories within the classical class of adverbs:

  • modulators of adjectives
  • modulators preceding verbs
  • modulators following verbs
  • modulators preceding cardinal determinants

Grammatical categories by position: the case of adverbs and verb modulators placed before the verb

Let us look again at the case of adverbs and try to define them by their position in relation to other grammatical categories. We are now splitting the adverbs into several different categories. Now let’s look at the adverbs that may be placed before the conjugated verb:

  • aussitôt, jamais, longtemps, parfois, quelquefois, rarement, souvent, toujours, etc.
  • subitu subitu, mai, à longu, ogni tantu, qualchì volta, raramente, à spessu, sempre, etc.
  • immediately, never, for a long time, sometimes, rarely, often, always, etc.

Here is an example: “Michel parfois buvait.” (Michel sometimes drank; Michele qualchì volta beìa).

Let’s call these categories modulators (of verbs). The fact of being placed before the verb is linked to the fact that the modulator modifies the meaning of the verb. Moreover, if we reason in terms of two-sided grammar, a verb preceded by a (verb) modulator remains a verb: MODV-V = V.

Grammatical categories by position: the case of adverbs and verb modulators

If we look again at the case of adverbs and try to define them by their position in relation to other grammatical categories, it follows that we need to split the adverbs into several different categories. To begin with, some adverbs are placed after a verb:

  • bien, doucement, lentement, mal, vite, volontiers, fort, trop, quelquefois, souvent, peu, rarement, tard, tôt, toujours, déjà, bientôt, beaucoup, etc.
  • bè, pianamenti, pianu, mali, in freccia, vulinteri, forti, troppu, calchì volta, à spessu, pocu, raramenti, tardi, in freccia, sempri, dighjà, prestu, mori, etc.
  • well, gently, slowly, badly, quickly, willingly, strongly, too much, sometimes, often, little, rarely, late, early, always, already, soon, a lot, etc.

Here are some examples: “il mange beaucoup; tu fumes trop” (he eats a lot; you smoke too much): manghja mori; fumi troppu.

Let’s call these categories verb modulators. The fact of being placed after the verb is linked to the fact that the modulator modifies the meaning of the verb. Moreover, if we reason in terms of two-sided grammar, a verb followed by a modulator remains a verb: V-MODV = V.

Grammatical categories by position: the case of adverbs and adjective modulators

If we look at the case of adverbs and try to define them by their position in relation to other grammatical categories, it follows that we need to split the adverbs into several different categories. To begin with, some adverbs are placed before an adjective:

  • peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, etc.
  • pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, etc.
  • not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, etc.

Let’s call these categories modulators (of adjectives). The fact of being placed before the adjective is linked to the fact that the modulator modifies the meaning of the adjective. Moreover, if we reason in terms of two-sided grammar, a modulator followed by an adjective remains an adjective: MODAQ-AQ = AQ.

The fact that we are dealing here with adjective modulators is well illustrated by the fact that their Corsican equivalents agree with the corresponding adjectives: “ils sont bien contents” = sò belli cuntenti = they are really happy; “elle est tellement contente” = hè tanta cuntente = she’s so happy.
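For a French-Corsican engine, this agreement means the modulator cannot be translated in isolation: its form depends on the gender and number of the adjective it modifies. A minimal sketch, assuming a hypothetical paradigm table built from the forms listed above; `assai` is assumed to be invariable and passes through unchanged.

```python
# Agreement paradigms for Corsican modulators listed above,
# keyed by the (gender, number) of the adjective they modify.
PARADIGMS = {
    "bellu": {("m", "sg"): "bellu", ("f", "sg"): "bella",
              ("m", "pl"): "belli", ("f", "pl"): "belle"},
    "tantu": {("m", "sg"): "tantu", ("f", "sg"): "tanta",
              ("m", "pl"): "tanti", ("f", "pl"): "tante"},
    "troppu": {("m", "sg"): "troppu", ("f", "sg"): "troppa",
               ("m", "pl"): "troppi", ("f", "pl"): "troppe"},
}


def agree(modulator: str, gender: str, number: str) -> str:
    """Inflect a modulator to match its adjective; invariable ones pass through."""
    return PARADIGMS.get(modulator, {}).get((gender, number), modulator)


# “ils sont bien contents”: the adjective is masculine plural.
print(agree("bellu", "m", "pl"), "cuntenti")  # belli cuntenti
print(agree("assai", "f", "sg"))              # assai (assumed invariable)
```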

Defining grammatical types

It seems that some reflection on the nature of grammatical types is necessary. The categories of common noun, qualifying adjective, verb, personal pronoun, etc. are well known. But what is a grammatical type? What criterion distinguishes one type from another? For the purposes of machine translation, the notion of grammatical type must be defined rigorously and precisely, since types play an important role in the translation process, especially in the crucial step of disambiguation. The appropriate criterion here seems to be the position of one grammatical category in relation to another. For example, the definite article determinant precedes the common noun.
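The positional criterion can be made concrete with a toy experiment: record, for each word, which categories occur immediately to its left and right, then group words that share exactly the same positional contexts. The tagged corpus below is hypothetical, and using existing tags to derive groupings is a simplification; the point is only to show how position alone separates ‘le’/‘la’ (always before a noun) from the nouns and the verbs.

```python
from collections import defaultdict

# Hypothetical tagged toy corpus.
CORPUS = [
    [("le", "DET"), ("chat", "N"), ("dort", "V")],
    [("la", "DET"), ("souris", "N"), ("court", "V")],
    [("le", "DET"), ("chien", "N"), ("dort", "V")],
]


def positional_signatures(corpus):
    """Map each word to the set of (left category, right category) pairs around it."""
    sigs = defaultdict(set)
    for sentence in corpus:
        cats = ["<s>"] + [cat for _, cat in sentence] + ["</s>"]
        for i, (word, _) in enumerate(sentence):
            # cats is shifted by one: cats[i] is the left neighbour, cats[i + 2] the right
            sigs[word].add((cats[i], cats[i + 2]))
    return sigs


def group_by_signature(sigs):
    """Group words whose positional contexts are identical."""
    groups = defaultdict(list)
    for word, sig in sigs.items():
        groups[frozenset(sig)].append(word)
    return sorted(sorted(words) for words in groups.values())


print(group_by_signature(positional_signatures(CORPUS)))
# [['chat', 'chien', 'souris'], ['court', 'dort'], ['la', 'le']]
```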

New considerations on priority language pairs for machine translation: thinking of the Gallurese language

The question is whether to implement the Italian-Gallurese pair or the French-Gallurese pair. As already emphasized, the Italian-Gallurese pair is a priority. But since some excellent translators such as DeepL translate from Italian into French very accurately, the Italian-Gallurese translation can be done in two steps: first Italian-French, then French-Gallurese. For rule-based MT, implementing the French-Gallurese pair is much easier than implementing the Italian-Gallurese pair. The idea here is to keep the priorities, but to reach them more easily with the help of intermediate pairs.
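The two-step route amounts to composing two translation functions, with French as the pivot language. In the sketch below, `it_to_fr` and `fr_to_gal` are hypothetical placeholders: the first stands in for an external Italian-French engine, the second for a rule-based French-Gallurese engine, whose output is only tagged here and is not real Gallurese.

```python
from typing import Callable

Translator = Callable[[str], str]


def pivot(first: Translator, second: Translator) -> Translator:
    """Compose two translators: source -> pivot language -> target."""
    return lambda text: second(first(text))


def it_to_fr(text: str) -> str:
    # Placeholder for an external Italian->French engine (one-word toy lexicon).
    return {"gatto": "chat"}.get(text, text)


def fr_to_gal(text: str) -> str:
    # Placeholder for a rule-based French->Gallurese engine; the output is only tagged.
    return f"[gal:{text}]"


it_to_gal = pivot(it_to_fr, fr_to_gal)
print(it_to_gal("gatto"))  # [gal:chat]
```

One consequence of this design is that any error made in the first step is passed on to the second, so the pivot pair needs to be very reliable.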

Palindromes in the Corsican language

Here are a certain number of palindromes, in each of the main variants of the Corsican language:

  • cismuntincu: apa (bee), ala (wing), anilina (aniline), elle (them), ebbe (did have), esegese (exegesis), issi (you hoist), inni (hymns), usu (usage)
  • sartinesu: abba (bee), adda (garlic), ala (wing), anilina (aniline), iddi (them), issi (you hoist), inni (hymns), iri (ires), uttettu (byte), usu (usage)
  • taravesu: abba (bee), ala (wing), anilina (aniline), issi (you hoist), inni (hymns), iri (ires), uttettu (byte), usu (usage)
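Lists like these are easy to check mechanically, and the same test could be run over a full dictionary of each variant to find further candidates. A minimal sketch: `find_palindromes` and its `min_len` threshold are illustrative choices, and no Corsican dictionary is assumed here.

```python
# The words listed above, per variant.
VARIANTS = {
    "cismuntincu": ["apa", "ala", "anilina", "elle", "ebbe", "esegese", "issi", "inni", "usu"],
    "sartinesu": ["abba", "adda", "ala", "anilina", "iddi", "issi", "inni", "iri", "uttettu", "usu"],
    "taravesu": ["abba", "ala", "anilina", "issi", "inni", "iri", "uttettu", "usu"],
}


def is_palindrome(word: str) -> bool:
    """A word is a palindrome if it reads the same backwards (case-insensitive)."""
    w = word.lower()
    return w == w[::-1]


def find_palindromes(lexicon, min_len=3):
    """Return the palindromes of at least min_len letters from a word list."""
    return [w for w in lexicon if len(w) >= min_len and is_palindrome(w)]


for variant, words in VARIANTS.items():
    print(variant, all(is_palindrome(w) for w in words))  # True for each variant
```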

Further reflections on the status of “I love you” in the Corsican language

Let us briefly recall the problem: translating ‘I love you’ might sound trivial, but it’s not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti tengu cara’ when addressed to a female person. Hence the proposed preliminary translation ‘ti tengu caru/cara’. Such a rough translation requires further disambiguation, but on what precise grounds?

Let us look at the issue from an analytical perspective. It appears that we need to assign a reference to the pronoun ‘te’ (you, ti). This reference could be identified from the context, depending on whether the person ‘te’ refers to is male or female. At this stage, it seems better to consider that the personal object pronoun has an inherent gender: masculine or feminine. This gender does not affect the pronoun itself, which remains ‘te’ (you, ti) regardless of gender, but it does affect the words that depend on it, i.e. the adjective caru/cara in Corsican, in the locution ti tengu caru/cara. The upshot is that in this case, ‘te’ (you, ti) is a personal object pronoun, masculine or feminine, whose inherent ambiguity can be resolved from the context.
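This analysis can be represented by giving the object pronoun an inherent, possibly unresolved, gender feature that the dependent adjective inherits. A minimal sketch with hypothetical names (`ObjectPronoun`, `translate_je_t_aime`); how the context sets the gender is deliberately left open, as in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectPronoun:
    form: str                     # surface form, unaffected by gender: 'te'
    gender: Optional[str] = None  # 'm', 'f', or None while unresolved


def translate_je_t_aime(pronoun: ObjectPronoun) -> str:
    """The adjective caru/cara agrees with the pronoun's inherent gender."""
    adjective = {"m": "caru", "f": "cara"}.get(pronoun.gender, "caru/cara")
    return f"ti tengu {adjective}"


te = ObjectPronoun("te")
print(translate_je_t_aime(te))  # ti tengu caru/cara
te.gender = "f"                 # suppose the context identifies a female addressee
print(translate_je_t_aime(te))  # ti tengu cara
```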

Update to priority pairs for endangered languages

If we were to update the priorities for language pairs to be achieved, from the point of view of endangered languages, the result would be as follows:

  • Corsican language: French to Corsican (already done)
  • Sardinian Gallurese: Italian to Gallurese
  • Sardinian Sassarese: Italian to Sassarese
  • Sicilian: Italian to Sicilian: the Sicilian language is close to Corsican sartinesu or taravesu
  • Munegascu: French to Munegascu: the Munegascu language bears some similarities to the Corsican language

Pairs such as French to Gallurese, French to Sassarese, English to Gallurese, English to Sassarese and English to Sicilian do not have priority, as they can be handled using an intermediate pair. For example, French to Gallurese can be done with the French to Italian pair (e.g. with DeepL), followed by the Italian to Gallurese pair, and so on.
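Choosing which chain of pairs to use is a shortest-path search over a graph of available directed pairs. The sketch assumes that the priority pairs listed above exist and that French-Italian and English-Italian are provided by an external engine (the text mentions DeepL for the former; the latter is an assumption). A breadth-first search then returns the route with the fewest steps.

```python
from collections import deque

# Directed pairs assumed available: the priority pairs listed above, plus
# French->Italian and English->Italian from an external engine (assumption).
PAIRS = {
    ("french", "corsican"), ("italian", "gallurese"), ("italian", "sassarese"),
    ("italian", "sicilian"), ("french", "munegascu"),
    ("french", "italian"), ("english", "italian"),
}


def route(source, target, pairs=PAIRS):
    """Breadth-first search for the shortest chain of available pairs."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for a, b in sorted(pairs):
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    return None  # no chain of pairs reaches the target


print(route("french", "gallurese"))  # ['french', 'italian', 'gallurese']
```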

The enigmatic grammatical status of “I love you” in the Corsican language

Translating ‘I love you’ might sound trivial, but it’s not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti tengu cara’ when addressed to a female person. Hence the proposed translation ‘ti tengu caru/cara’, whose (difficult) disambiguation must be done according to the context.

It is worth sketching a few ideas in order to get some insight into this issue. First of all, let’s look at the problem synthetically. The sentence ‘je t’aime’ (I love you) has an inherently problematic grammatical status in French or in English, since it is not known whether it is addressed to a male or a female person. If one were to assign a gender to this sentence, it would therefore be masculine or feminine, with an inherent ambiguity. Assigning a gender, masculine or feminine, to a sentence may seem strange prima facie, but it could prove useful (this remains to be confirmed). In this case, the gender associated with the sentence would be inherited from the pronoun ‘t’ (short for ‘te’), which remains ambiguous when the sentence ‘je t’aime’ (I love you, ti tengu caru/cara) is taken alone.

Second, let’s look at the issue from an analytical perspective. Another way to solve the problem could be to assign a reference to the pronoun ‘te’ (you). This reference could be identified from the context. This sounds more promising and more in line with the well-known problem of pronoun resolution.

The taxonomy optimization problem

Let us add further reflections on the remaining 1% problem. As hinted at previously, the remaining 1% problem may only be solvable by general AI (GAI). Let us sketch, in a series of posts, what features general AI would require in this context. One feature of GAI would be the ability to solve the ‘taxonomy optimization problem’. Let’s focus on defining it (very roughly, to begin with). Consider a given language, defined by a certain number of words and a corpus of sentences (or a set of rules defining the licit sentences of this language). In this context, the ‘taxonomy optimization problem’ is the problem of finding the simplest taxonomy, with its associated rules, that resolves the type ambiguities existing in this language. This feature of GAI would notably be capable of defining the best taxonomy for resolving those type ambiguities. And it is possible that such a feature of GAI would revolutionize grammar and our present grammatical taxonomy.

More on polymorphic disambiguation…

Let’s take another look at polymorphic disambiguation. Consider the French word sequence ‘nombre de’. Its translation into Corsican (the same goes for English and other languages) cannot always be the same, because ‘nombre de’ can be translated in two different ways. In the sequence ‘mais nombre de poissons sont longs’ (but many fish are long), ‘nombre de’ is an indefinite determiner: it translates as bon parechji (many). On the other hand, in the sequence ‘mais le nombre de poissons est supérieur à dix’ (but the number of fish is greater than ten), ‘nombre’ is a common noun followed by the preposition ‘de’: it is translated as numaru di (number of). Statistical MT usually does better than human-like (rule-based) MT at polymorphic disambiguation (I tested both sentences with DeepL and Google Translate, and both successfully solve the relevant polymorphic disambiguation), but it turns out that human-like (rule-based) MT is also capable of handling it.
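A rule-based treatment of this case can be sketched as a single contextual rule: ‘nombre’ is read as a common noun when a determiner immediately precedes it, and ‘nombre de’ as an indefinite determiner otherwise. This is a simplification assumed for illustration; the `DETERMINERS` set is hypothetical and a real engine would need more context.

```python
# Hypothetical set of determiners that signal the common-noun reading of 'nombre'.
DETERMINERS = {"le", "un", "ce", "du", "au"}


def translate_nombre_de(tokens: list[str], i: int) -> str:
    """Translate the sequence 'nombre de' at tokens[i] into Corsican."""
    if [t.lower() for t in tokens[i:i + 2]] != ["nombre", "de"]:
        raise ValueError("expected 'nombre de' at position i")
    previous = tokens[i - 1].lower() if i > 0 else ""
    if previous in DETERMINERS:
        # 'le nombre de': common noun followed by the preposition 'de'
        return "numaru di"
    # 'nombre de' with no determiner: indefinite determiner meaning 'many'
    return "bon parechji"


print(translate_nombre_de("mais nombre de poissons sont longs".split(), 1))  # bon parechji
```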

More on the remaining 1% problem

The analysis of the French Wikipedia article of the day is interesting, in the sense that it sheds light on the skills a machine translation system will need to achieve 100% accurate translation. The error that appears here is characteristic and probably belongs to the missing 1% needed to reach 100% accuracy (the problem of the remaining 1%). In the French original, the phrase ‘Her father studied at the University of Oregon and then at Yale Law School’ contains a definite article with elision: l’. The translation given (u/a, i.e. undetermined between the masculine definite article u and the feminine definite article a) is not correct, in that it fails to determine the gender, masculine or feminine, of Yale Law School, the English-language name of a school. In order to provide the correct translation, it is necessary to know how to translate Yale Law School into Corsican, and thus to determine that school is translated by scola, which is feminine. Therefore the correct translation should have been: po à a Yale Law School prima di ….
This shows that a translator capable of 100% accurate translation must have two skills, which are needed to solve the remaining 1%: (i) the ability to determine the language of a subtext written in another language and (ii) the ability to translate a subtext from any language into the target language.

For now, we can only conjecture that this ability to solve the remaining 1% requires artificial general intelligence (AGI). Providing concrete and detailed examples may help to confirm or refute that hypothesis.
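The two skills can be illustrated in miniature for the l’ case. In the sketch below, `looks_english` is a deliberately crude stand-in for real language identification, and `EN_TO_CO` is a hypothetical two-entry English-Corsican lexicon with genders. The function takes the English head noun (the last word of the name), translates it, and picks the Corsican article from its gender, falling back to the undetermined u/a otherwise.

```python
# Hypothetical English->Corsican lexicon: word -> (translation, gender).
EN_TO_CO = {"school": ("scola", "f"), "university": ("università", "f")}
ARTICLE = {"m": "u", "f": "a"}


def looks_english(words):
    """Crude stand-in for language identification: any known English word."""
    return any(w.lower() in EN_TO_CO for w in words)


def article_for_name(name):
    """Choose the Corsican definite article for an elided l' before a foreign name."""
    words = name.split()
    if looks_english(words):
        head = words[-1].lower()  # in English noun phrases the head noun comes last
        if head in EN_TO_CO:
            _, gender = EN_TO_CO[head]
            return ARTICLE[gender]
    return "u/a"  # undetermined, as in the erroneous output


print(article_for_name("Yale Law School"))  # a
```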

More on two-sided grammar

Let us expand the idea of two-sided (from the analytic/synthetic duality standpoint) grammatical analysis. Consider, for example, ‘beaucoup et souvent’ (a lot and often) in the sentence ‘il mange beaucoup et souvent’ (he eats a lot and often). Analytically, ‘beaucoup et souvent’ is composed of an adverb (‘beaucoup’), a conjunction (‘et’) and another adverb (‘souvent’). But synthetically, ‘beaucoup et souvent’ is an adverb whose structure is ADVERB+CONJUNCTIONCORD+ADVERB, according to the meta-rule ADVERB = ADVERB+CONJUNCTIONCORD+ADVERB. In the same way, ‘beaucoup mais souvent’ (a lot but often) is also, from a synthetic point of view, an adverb. Analogously, ‘rarement ou souvent’ (rarely or often) is an adverb from a synthetic viewpoint. Likewise, ‘rarement voire jamais’ (rarely, even never) is a synthetic adverb. This leads us to consider ‘voire’ (even) as a coordinating conjunction.
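Because the meta-rule ADVERB = ADVERB+CONJUNCTIONCORD+ADVERB is recursive, a chain such as ‘rarement ou souvent voire jamais’ also counts as one synthetic adverb. Unrolled into a loop, the rule can be recognized as in the sketch below; the word sets are small hypothetical samples, not a full lexicon.

```python
from typing import Optional

# Small hypothetical samples of the two categories.
ADVERBS = {"beaucoup", "souvent", "rarement", "jamais"}
CONJ_COORD = {"et", "mais", "ou", "voire"}


def parse_adverb(tokens: list[str], i: int = 0) -> Optional[int]:
    """Recognize ADVERB (CONJUNCTIONCORD ADVERB)* starting at tokens[i].

    Returns the index just past the synthetic adverb, or None if none starts at i.
    """
    if i >= len(tokens) or tokens[i] not in ADVERBS:
        return None
    end = i + 1
    # Each extra CONJUNCTIONCORD + ADVERB extends the same synthetic adverb.
    while end + 1 < len(tokens) and tokens[end] in CONJ_COORD and tokens[end + 1] in ADVERBS:
        end += 2
    return end


tokens = "il mange beaucoup et souvent".split()
end = parse_adverb(tokens, 2)
print(tokens[2:end])  # ['beaucoup', 'et', 'souvent']
```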

It is clear that we can expand on this. As hinted at earlier, it seems that progress in rule-based machine translation (it might better be called ‘human-like MT’, since it mimics human reasoning) requires revolutionizing grammar.

Some advance in polymorphic disambiguation

I have just powered up the new engine (a prototype, not yet transferred to the API used by both the current site translator and the Android application) and run a few tests: it works! Let us take an example with French ‘en fait’. From the viewpoint of two-sided grammar, ‘en fait’ (in fact, actually, difatti) is synthetically an adverb, made up, analytically, of a preposition followed by a singular noun. ‘en fait’ is polymorphic in the sense that it may also be part of the prepositional locution ‘en fait de’ (in fact of, in fatti di). Alternatively, ‘en fait’ may also be a pronoun (‘en’, it, ni) followed by the present tense (‘fait’, faci) of the verb ‘faire’ (to make) in the third person singular. So ‘en fait’ is highly ambiguous and context-sensitive.

As the above screenshot illustrates, the new engine adequately handles the three kinds of ‘en fait’. This could be something of a breakthrough for rule-based translation, since polymorphic disambiguation is a well-known weakness of this type of MT implementation. Presumably, this progress on polymorphic disambiguation opens the way to scores of around 95% or 96%.
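The three readings of ‘en fait’ can be separated by looking one token to each side. The rules below are hypothetical illustrations, not the engine’s actual rules: a preceding subject pronoun suggests the pronoun + verb reading, a following ‘de’ suggests the prepositional locution, and the adverb is the default. Their order matters, since ‘il en fait de…’ is still pronoun + verb.

```python
# Hypothetical context set for the pronoun + verb reading.
SUBJECT_PRONOUNS = {"il", "elle", "on"}


def classify_en_fait(tokens: list[str], i: int) -> str:
    """Classify 'en fait' at tokens[i:i+2] from its immediate neighbours."""
    if [t.lower() for t in tokens[i:i + 2]] != ["en", "fait"]:
        raise ValueError("expected 'en fait' at position i")
    prev = tokens[i - 1].lower() if i > 0 else ""
    nxt = tokens[i + 2].lower() if i + 2 < len(tokens) else ""
    if prev in SUBJECT_PRONOUNS:
        return "PRONOUN+VERB"   # 'il en fait': en (ni) + fait (faci)
    if nxt == "de":
        return "PREP_LOCUTION"  # 'en fait de' (in fatti di)
    return "ADVERB"             # 'en fait' (difatti)


print(classify_en_fait("il en fait trop".split(), 1))  # PRONOUN+VERB
```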

Autonomous MT system

Let us speculate about what an autonomous MT system could be. In the present state of MT, we provide rules and a dictionary to the software (rule-based translation), or we feed it a corpus for a given pair of languages (statistical MT). But let us imagine that we could do otherwise and build an autonomous MT system. We provide the MT system with a corpus in a given source language. First, it analyses this language thoroughly. It begins by identifying single words. It then creates grammatical types and assigns them to the vocabulary. It also identifies locutions (adverbial, adjectival, verbal locutions, etc.) and assigns them a grammatical type. The MT system also identifies prefixes and suffixes. It also computes elision rules, euphony rules, etc. for that source language.
Second, the autonomous MT system should do the same for the target language.
Third, the MT system creates a set of rules for translating the source language into the target one. For that purpose, the MT system could for example assign a structured reference to all these words and locutions. For instance, ‘oak’ in English refers to ‘quercus ilex’, and ‘cat’ refers to ‘felis sylvestris’. For abstract entities, we presume it would not be a trivial task… Alternatively, but not exclusively, it could use suffixes and derive morphing rules from the source language to the target one.
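The reference-based idea can be sketched as two lookups through a language-neutral reference. The lexicons below are hypothetical and hand-written, whereas an autonomous system would have to build them itself; French is used as an example target. Abstract words such as ‘freedom’ simply have no reference here, which mirrors the difficulty noted in the text.

```python
from typing import Optional

# Hypothetical lexicons: source word -> reference, reference -> target word (French).
SOURCE_TO_REF = {"oak": "quercus ilex", "cat": "felis sylvestris"}
REF_TO_TARGET = {"quercus ilex": "chêne vert", "felis sylvestris": "chat"}


def transfer(word: str) -> Optional[str]:
    """Translate a word through its language-neutral reference, if it has one."""
    ref = SOURCE_TO_REF.get(word)
    if ref is None:
        return None  # no reference known: abstract words need another strategy
    return REF_TO_TARGET.get(ref)


print(transfer("cat"))  # chat
```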

Is it feasible, or pure speculation? It could be testable. Prima facie, this sounds like a different approach to AI than the classical one. It operates at a meta-level, since the MT system creates the rules and, in some respects, builds the software.

On the statistical/rule-based divide regarding MT

The classical divide in MT separates statistical from rule-based MT. But this divide is not as clear-cut as one might think at first glance, for rule-based MT can operate statistically. Let us take an example concerning the disambiguation of French ‘est’: it can be translated either as is or as east, depending on the context. Defining the rules for disambiguating ‘est’ can be somewhat complicated. A rule-based MT system could then define a few rules covering 90% of the cases, and for the remaining 10% apply a closure rule that translates ‘est’ as is unconditionally. Such a rule would be based on the statistical fact that ‘est’ most often translates as is and not as east, so it would succeed in most cases. As we see it, such a rule is statistical by essence. Hence the conclusion: the statistical/rule-based divide in MT is not as clear-cut as one might think prima facie, since a disambiguation system for rule-based MT could be built with closure rules of this type, which operate statistically.
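A closure rule of this kind can be sketched as explicit rules tried first, followed by an unconditional statistical default. The explicit rule below (a preceding l’ or d’ signals ‘east’) is a hypothetical example chosen for illustration, not a complete rule set.

```python
# Hypothetical explicit rule: these preceding tokens signal 'est' = east.
EAST_PREVIOUS = {"l'", "l’", "d'", "d’"}


def translate_est(tokens: list[str], i: int) -> str:
    """Translate French 'est' at tokens[i]: explicit rules first, then a closure rule."""
    previous = tokens[i - 1].lower() if i > 0 else ""
    if previous in EAST_PREVIOUS:
        return "east"  # e.g. 'à l'est', 'd'est en ouest'
    # Closure rule: statistically, 'est' is far more often the verb.
    return "is"


print(translate_est(["à", "l'", "est"], 2))  # east
```

Explicit rules are not foolproof either: in ‘il l’est’ (he is [so]), the token before ‘est’ is also l’, so a real rule set would need further conditions, with the closure rule still catching whatever is left.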