
Gallurese language

Our next project will be to implement translation from Italian into Gallurese (gaddhuresu), or from French into Gallurese. Gallurese is close to Corsican, in particular to the ‘Rucchisgiana’ (Alta Rocca) or ‘Sartinese’ variety of Corsican. However, there are significant differences in spelling and morphology between Gallurese and Corsican. As with Corsican, one difficulty will be handling the variants; ideally, all the main variants should be supported. As a first step, we will try to implement one of the main variants of Gallurese, preferably a well-documented one, such as the variant used in the writings of Maria Teresa Inzaina.

Updating our grammatical typology

We now have the following categories in our grammatical taxonomy:

  • determinants
  • nouns
  • pronouns
  • verbs
  • prepositions and postpositions
  • determinant modifiers
  • noun modifiers, i.e. adjectives
  • adjective modifiers
  • verb modifiers, i.e. adverbs (but in a restricted sense with regard to classical grammar)
  • adverb modifiers (adverbs still being understood in the restricted sense)

Note that the classical category of adverbs is split here into the following categories:

  • adjective modifiers
  • verb modifiers
  • adverb modifiers

On the category of adverb modifiers

Let’s continue to rethink the gruesome (or so it is argued here) category of adverbs in the classical sense, turning our attention now to the category of ‘adverb modifiers’. Adverbs are understood here in a restricted sense: they are either verb modifiers or proposition modifiers. In this context, we are likely to encounter adverb modifiers, which generally precede the adverb. Thus, very (‘très’) is an adverb modifier in the sequence he was eating very rarely (‘il mangeait très rarement’, manghjava mori raramenti).

Likewise, more (‘plus’, più) is in some cases an adverb modifier, as in the sequence he was drinking more frequently (‘il buvait plus fréquemment’, biia più suventi).

The case of adjective modifiers and the notion of grammatical proof

Let’s consider again the case of adjective modifiers (in classical grammar, words in this category are considered degree adverbs). These include the following: peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, … = pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, … = not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, … We have argued that these words are ‘adjective modifiers’ when they precede an adjective. But can such an assertion be proven, or is there at least some evidence for it? Grammar, like other disciplines, requires that assertions be justified and, if possible, proven. The notion of proof in grammar, however, is uncommon. Let’s see whether we can provide such a proof or justification.

Consider the case of ‘tellement’ (so much), which we consider to be an adjective modifier when it precedes an adjective. Now, let us consider the following translations, where ‘tellement’ is used:

  • in French: il est tellement beau, ils sont tellement petits, elle est tellement belle, elles sont tellement intelligentes
  • in English: it is so beautiful, they are so small, she is so beautiful, they are so smart
  • in Corsican: hè tantu bellu, sò tanti chjuchi, hè tanta bella, sò tante intelligente (an alternative translation is: hè cusì bellu, sò cusì chjuchi, hè cusì bella, sò cusì intelligente)
  • in Italian: è così bello, sono così piccoli, è così bella, sono così intelligenti

It is clear here that ‘tellement’ preceding an adjective is translated into Corsican by:

  • tantu, when the adjective is singular masculine
  • tanti, when the adjective is plural masculine
  • tanta, when the adjective is singular feminine
  • tante, when the adjective is plural feminine

Thus ‘tellement’ (so much; tantu/tanti/tanta/tante), used in this way, i.e. preceding an adjective, agrees with the adjective to which it refers. This can be seen as a justification for classifying it as an adjective modifier.
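
The agreement rule can be made concrete with a small lookup table. This is a minimal sketch, assuming the gender and number of the adjective have already been determined by an earlier tagging step; the names TANTU_FORMS and translate_tellement are hypothetical, and only the four Corsican forms come from the examples above.

```python
# Hypothetical lookup: the Corsican form of 'tellement' is chosen by the
# gender and number of the adjective it modifies.
TANTU_FORMS = {
    ("masc", "sg"): "tantu",
    ("masc", "pl"): "tanti",
    ("fem", "sg"): "tanta",
    ("fem", "pl"): "tante",
}


def translate_tellement(gender: str, number: str, corsican_adjective: str) -> str:
    """Render 'tellement + adjective' in Corsican, making 'tellement' agree with the adjective."""
    try:
        modifier = TANTU_FORMS[(gender, number)]
    except KeyError:
        raise ValueError(f"unsupported gender/number: {gender!r}, {number!r}")
    return f"{modifier} {corsican_adjective}"


# 'elle est tellement belle' -> 'hè tanta bella'
print("hè", translate_tellement("fem", "sg", "bella"))
# 'ils sont tellement petits' -> 'sò tanti chjuchi'
print("sò", translate_tellement("masc", "pl", "chjuchi"))
```

The point of the sketch is that the choice of form depends on the adjective, not on the verb, which is what the classification as an adjective modifier predicts.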

The status of adverbs

What are adverbs in the present grammatical taxonomy? Adverbs have a much more restrictive definition here than in traditional grammar: in this typology, they are verb modifiers. Adverbs are therefore distinct from:

  • adjective modifiers (such as peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, … = pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, … = not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, …)
  • proposition modifiers, which change the meaning of a proposition

The status of adjective modifiers

What is the status of adjective modifiers (tant, tout juste, un rien, un tantinet, très, extrêmement, … = so much, just a little, a little, a little, very, extremely, …) in the present grammatical typology? Adjectives are defined as noun modifiers. So adjective modifiers would be modifiers of noun modifiers? This sounds intriguing. In reality, we do not have the concept of ‘modifiers of modifiers’. In fact, we have the following rules:

  • a verb modifier followed by a verb is a verb
  • a determinant modifier followed by a determinant is a determinant
  • and, generally speaking, a modifier of an X followed by an X is an X (where X is a given grammatical type)

So a noun modifier followed by a noun is a noun, i.e. an adjective followed by a noun is a noun. For example, in ‘un très beau livre’ (a very nice book), ‘very’ is an adjective modifier, ‘nice’ is an adjective, i.e. a noun modifier, and ‘book’ is a noun: ‘very nice’ is therefore itself a noun modifier, and ‘very nice book’ is a noun.

Hence, finally, ‘an adjective modifier is a modifier of a noun modifier’ should be read as: an adjective modifier is a modifier of [noun modifier].
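
The rule ‘a modifier of an X followed by an X is an X’ can be read as a rewriting rule over grammatical types. Below is a minimal sketch, assuming each word has already been assigned a type and that the modifier precedes what it modifies (as in ‘très beau livre’); postposed adjectives, as in ‘un livre très beau’, are not handled. The encoding of types as ("MOD", X) and the function names are hypothetical.

```python
from typing import List, Union

# A grammatical type is either a base type name ("N" for noun, "V" for verb, ...)
# or a modifier type, encoded as the pair ("MOD", inner_type).
GType = Union[str, tuple]


def mod(t: GType) -> GType:
    """Return the type 'modifier of t'."""
    return ("MOD", t)


NOUN = "N"
ADJ = mod(NOUN)       # adjective = noun modifier
ADJ_MOD = mod(ADJ)    # adjective modifier = modifier of [noun modifier]


def reduce_types(seq: List[GType]) -> List[GType]:
    """Repeatedly apply 'a modifier of X followed by an X is an X' until nothing changes."""
    seq = list(seq)
    changed = True
    while changed:
        changed = False
        for i in range(len(seq) - 1):
            left, right = seq[i], seq[i + 1]
            if isinstance(left, tuple) and left[0] == "MOD" and left[1] == right:
                seq[i:i + 2] = [right]  # MOD(X) + X  ->  X
                changed = True
                break
    return seq


# 'très beau livre': très = adjective modifier, beau = adjective, livre = noun
print(reduce_types([ADJ_MOD, ADJ, NOUN]))  # ['N']
```

The reduction first turns ‘très beau’ into a noun modifier, then ‘très beau livre’ into a noun, so no separate notion of ‘modifier of modifiers’ is needed.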

Grammatical typology again

What are the characteristics of the resulting grammatical typology? We now have the following categories:

  • determinants
  • nouns
  • pronouns
  • verbs
  • prepositions and postpositions
  • determinant modifiers
  • noun modifiers, i.e. adjectives
  • adjective modifiers
  • verb modifiers, i.e. adverbs but in a restricted sense

The status of adjectives

What is the status of adjectives in the present grammatical typology? The notion of modifier is central to this taxonomy. Thus, the adjective is a noun modifier. In the expression ‘the blue sky’, ‘blue’ is a modifier of the noun ‘sky’. The definition of the adjective as a noun modifier is quite in line with the definition given for example by Merriam-Webster: ‘a word belonging to one of the major form classes in any of numerous languages and typically serving as a modifier of a noun to denote a quality of the thing named, to indicate its quantity or extent, or to specify a thing as distinct from something else’.

The case of new words for machine translation

Another case that argues for rule-based, i.e. human-like, translation is the following. We frequently come across a new word, one we have never seen before. More often than not, a human knows how to translate it, because there are rules for translating a word from one language into another even when we do not know what the word means. For example, ‘acide anthranilique’ can be translated precisely as ‘anthranilic acid’ by a human, even if he has no knowledge of the acid in question. The statistical method is not adequate for this ability to translate newly encountered words: the machine translator must be able to (i) determine the grammatical nature of the word in question; and (ii) translate the new word using the morphological rules for translating words of this grammatical type from one language into another. An AGI capable of translation should have this ability.
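
The two steps can be sketched for a French ‘noun + adjective’ pair such as ‘acide anthranilique’. This is an illustration under stated assumptions, not the prototype’s code: the suffix rules, the one-entry noun lexicon and all function names are hypothetical, and the rules are rough approximations that admit exceptions.

```python
# Hypothetical French -> English suffix rules for adjectives. They are rough
# approximations for illustration, not a complete or exception-free rule set.
ADJ_SUFFIX_RULES = [
    ("ique", "ic"),   # anthranilique -> anthranilic
    ("eux", "ous"),   # nerveux -> nervous
    ("if", "ive"),    # actif -> active
]

# A tiny bilingual lexicon of known nouns (hypothetical).
KNOWN_NOUNS = {"acide": "acid"}


def guess_type(word: str) -> str:
    """Step (i): guess the grammatical nature of a word, known or unknown."""
    if word in KNOWN_NOUNS:
        return "noun"
    if any(word.endswith(fr) for fr, _ in ADJ_SUFFIX_RULES):
        return "adjective"
    return "unknown"


def translate_adjective(word: str) -> str:
    """Step (ii): translate an unseen adjective by rewriting its suffix."""
    for fr, en in ADJ_SUFFIX_RULES:
        if word.endswith(fr):
            return word[: -len(fr)] + en
    return word  # no rule applies: keep the word unchanged


def translate_noun_adjective(phrase: str) -> str:
    """Translate a French 'noun adjective' pair into English 'adjective noun' order."""
    noun, adj = phrase.split()
    if guess_type(noun) != "noun" or guess_type(adj) != "adjective":
        raise ValueError(f"unexpected structure: {phrase!r}")
    return f"{translate_adjective(adj)} {KNOWN_NOUNS[noun]}"


print(translate_noun_adjective("acide anthranilique"))  # anthranilic acid
```

The word ‘anthranilique’ never needs to be in the lexicon: its type is inferred from its suffix, and its translation follows from the morphological rule for that type.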

Characteristics of an AGI (artificial general intelligence)

What characteristics do we want for an AGI (artificial general intelligence)? An AGI should have a very advanced capacity for NLP and language comprehension. One of the qualities we expect from an AGI is respect for multilingualism. Ideally, the AGI should have extensive NLP capabilities that apply to a large number of languages, and even to all 8000 languages of the planet, including the 90% of them that are endangered. The AGI could thus help address language extinction, which threatens human cultural diversity (some languages can be expected to be extinct by the time AGI arrives, but the AGI could help to revitalize them).

The two-language matching problem

Here is a problem for a human intelligence (or an AGI): we have a dictionary (with words, lemmas and grammatical types) in a language A and a second dictionary in a language B. Given an extensive corpus in each of the two languages, is it possible to create a translation dictionary from A to B, and if so, how? For example, if the two languages were French and English, the final translation dictionary would have to associate ‘cheval’ with ‘horse’, and so on for all the words of language A.

A closely related paper appears to be: Deciphering Undersegmented Ancient Scripts Using Phonetic Prior.
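
The question is left open above. One classic idea, not taken from this post, is that words that translate each other tend to occur in similar contexts: starting from a small seed dictionary, each language-A word’s context vector is re-expressed in language-B words and compared, by cosine similarity, with the context vectors of candidate language-B words. The sketch below assumes toy corpora and a hypothetical seed dictionary; a real system would need large corpora and could also use the grammatical types from the two dictionaries to filter candidates.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Count, for each word, the words occurring within `window` positions of it."""
    vecs = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            ctx = vecs.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    ctx[words[j]] += 1
    return vecs


def project(vec, seed):
    """Re-express a language-A context vector in language-B words via the seed dictionary."""
    out = Counter()
    for word, count in vec.items():
        if word in seed:
            out[seed[word]] += count
    return out


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_translation(word_a, vecs_a, vecs_b, seed):
    """Pick the language-B word (outside the seed) whose contexts best match word_a's."""
    projected = project(vecs_a[word_a], seed)
    known = set(seed.values())
    candidates = [w for w in vecs_b if w not in known]
    return max(candidates, key=lambda w: cosine(projected, vecs_b[w]))


# Toy corpora and seed dictionary (illustrative only).
FR = ["le cheval mange le foin", "le cheval court vite"]
EN = ["the horse eats the hay", "the horse runs fast"]
SEED = {"le": "the", "mange": "eats", "court": "runs", "vite": "fast"}

vecs_fr, vecs_en = context_vectors(FR), context_vectors(EN)
print(best_translation("cheval", vecs_fr, vecs_en, SEED))  # horse
print(best_translation("foin", vecs_fr, vecs_en, SEED))    # hay
```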

Prototype of text search with optional grammatical type

Unconditional search

Let us expand on the idea of text analysis derived from rule-based translation. Above is an example of a classic word-based search, in this case for the French word ‘été’. This word is ambiguous because it can be a common noun (‘summer’) or a past participle (‘been’). Below is an example of a search for the word ‘été’ associated with the grammatical type ‘common noun’.

Conditional search based on ‘noun’ grammatical type

Finally, below is an example of a search for the word ‘été’ associated with the grammatical type ‘past participle’.

Conditional search based on ‘past participle’ grammatical type
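
The three searches can be sketched over a list of (word, grammatical type) pairs. This is a minimal sketch, assuming the text has already been tagged; here the tags are assigned by hand, whereas the prototype would obtain them from its rule-based analysis. The function name and tag labels are hypothetical.

```python
from typing import List, Optional, Tuple

# A token is a (word, grammatical type) pair; the tags below are assigned by hand.
Token = Tuple[str, str]


def search(tokens: List[Token], word: str, gtype: Optional[str] = None) -> List[int]:
    """Return positions of `word`, optionally restricted to a grammatical type."""
    return [
        i
        for i, (w, t) in enumerate(tokens)
        if w == word and (gtype is None or t == gtype)
    ]


# 'l'été a été chaud' (the summer was hot): 'été' is first a noun, then a past participle.
sentence = [
    ("l'", "determinant"),
    ("été", "common noun"),
    ("a", "auxiliary verb"),
    ("été", "past participle"),
    ("chaud", "adjective"),
]

print(search(sentence, "été"))                     # [1, 3]  unconditional search
print(search(sentence, "été", "common noun"))      # [1]
print(search(sentence, "été", "past participle"))  # [3]
```

Making the grammatical type an optional parameter lets the same function serve both the classic word-based search and the conditional searches.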

Why it is worth engaging in rule-based translation

Rule-based translation is difficult to implement. The main difficulty is handling groups of words, so as to be on a par with statistics-based translation. The main problems in this regard are (i) polymorphic disambiguation; and (ii) building a sound typology of grammatical types. But once these steps begin to be mastered, there are many advantages. What seems essential here is that both machine translation and text analysis can be carried out with the same piece of software. The modules that are easy to implement include the following:

  • lemmatizer
  • part-of-speech tagger
  • singularizer
  • pluralizer
  • grammar checker
  • type extractor: a module that allows you to extract words from a text according to their grammatical category

This is because implementing rule-based translation gives the machine some inherent understanding of the text, in the same way as a human being. In a nutshell, it is better artificial intelligence.

Finally, other, more advanced modules seem possible (to be confirmed).
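
As an illustration of how simple some of these modules can be, here is a sketch of a pluralizer and a singularizer for French common nouns. The rules are a hypothetical, deliberately simplified subset of French morphology (exceptions such as bal/bals, pneu/pneus, bijou/bijoux, tuyau/tuyaux or travail/travaux are not handled), and the function names are illustrative only.

```python
def pluralize(noun: str) -> str:
    """Very simplified French noun pluralizer (common exceptions are ignored)."""
    if noun.endswith(("s", "x", "z")):
        return noun                 # nez -> nez
    if noun.endswith("al"):
        return noun[:-2] + "aux"    # cheval -> chevaux
    if noun.endswith(("au", "eu")):
        return noun + "x"           # bateau -> bateaux, jeu -> jeux
    return noun + "s"               # livre -> livres


def singularize(noun: str) -> str:
    """Inverse rules, tried from the most to the least specific ending."""
    if noun.endswith(("eaux", "eux")):
        return noun[:-1]            # bateaux -> bateau, jeux -> jeu
    if noun.endswith("aux"):
        return noun[:-3] + "al"     # chevaux -> cheval
    if noun.endswith("s"):
        return noun[:-1]            # livres -> livre
    return noun                     # nez -> nez


for word in ["cheval", "bateau", "livre", "nez"]:
    print(word, "->", pluralize(word), "->", singularize(pluralize(word)))
```

In a rule-based system these functions would be applied only to words already tagged as common nouns, which is what makes the grammatical analysis reusable across modules.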

A two-sided analysis of postpositions

Consider the following adverbs: après (after, dopu), as in ‘he would eat after’, and avant (before, nanzi), as in ‘they had seen them before’. They can also be used as prepositions:

  • après la fête: after the feast, dopu à a festa
  • avant le mois de juin: before the month of June, nanzi u mesi di ghjunghju

Likewise, durant (during) is also a preposition: durant la procession, during the procession, mentri a prucissioni.

But après, avant and durant can also be used differently:

  • deux jours après: two days after, dui ghjorni dopu
  • une semaine avant: one week before, una sittimana innanzi
  • deux mois durant: for two months, mentri dui mesi

From our point of view, these are postpositions, because in this use they are (in general) followed by punctuation and preceded by a common noun.

If we now extend this analysis to locutions, the following locutions are also postpositions:

  • plus tard: later, dopu; deux jours plus tard: two days later, dui ghjorni dopu
  • plus loin: further, più luntanu; trois mètres plus loin: three meters further
  • plus près: closer, più vicinu; dix centimètres plus près: ten centimeters closer
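
The positional criterion can be sketched as a small classifier. This is a minimal sketch, assuming the neighbouring words are already tagged; ‘NC’ (common noun) follows the notation used in this blog, while ‘DET’, ‘PRON’, ‘V’, ‘PUNCT’ and ‘?’ (unresolved) are hypothetical labels, and the order in which the rules are tried is our own assumption.

```python
from typing import List, Tuple

# Words that can be adverbs, prepositions or postpositions, as discussed above.
AMBIGUOUS = {"après", "avant", "durant"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def classify(tokens: List[Tuple[str, str]], i: int) -> str:
    """Classify tokens[i] (one of AMBIGUOUS) from its neighbours.

    tokens is a list of (word, tag) pairs; 'NC' marks a common noun.
    """
    word = tokens[i][0]
    if word not in AMBIGUOUS:
        raise ValueError(f"{word!r} is not handled here")
    prev_tag = tokens[i - 1][1] if i > 0 else None
    next_word = tokens[i + 1][0] if i + 1 < len(tokens) else None
    at_boundary = next_word is None or next_word in PUNCTUATION
    if prev_tag == "NC" and at_boundary:
        return "postposition"   # deux jours après.
    if not at_boundary:
        return "preposition"    # après la fête
    return "adverb"             # il mangeait après.


print(classify([("après", "?"), ("la", "DET"), ("fête", "NC")], 0))
print(classify([("deux", "DET"), ("jours", "NC"), ("après", "?"), (".", "PUNCT")], 2))
print(classify([("il", "PRON"), ("mangeait", "V"), ("après", "?"), (".", "PUNCT")], 2))
```

The three calls print ‘preposition’, ‘postposition’ and ‘adverb’ respectively. Locutions such as ‘plus tard’ would first have to be grouped into a single token before the same test applies.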

More on two-sided grammar


Let’s focus on analyzing the following phrases:

  • à force de courage (bravely)
  • à force de courage et de persévérance (by dint of courage and perseverance)
  • avec beaucoup d’abnégation (selflessly)
  • d’une manière ou d’une autre (in any way)
  • d’une façon vraiment admirable (in a very admirable way)
  • au moment le plus opportun (when most appropriate)

What is their grammatical nature? From the point of view of two-sided grammar, what are they?

From a synthetic standpoint, first of all, they are adverbs. Let us turn now to their nature from an analytical point of view.

  • à force de courage (bravely): analytically, it is a preposition, followed by a common noun, then another preposition, then another common noun: PS-NC-PS-NC.
  • à force de courage et de persévérance (by dint of courage and perseverance): analytically, it is a preposition, followed by a common noun, then another preposition, then another common noun, then a conjunction, then another preposition and then another common noun: PS-NC-PS-NC-CONJ-PS-NC.
  • and so on
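
The two viewpoints can be sketched side by side: the analytic view is the sequence of word types, and the synthetic view is the type assigned to the whole phrase. This is a minimal sketch, assuming a hypothetical lexicon and a hypothetical table from analytic patterns to synthetic types; a real system would need disambiguation, since the same pattern can correspond to different synthetic types, and unknown words raise a KeyError here.

```python
# Hypothetical lexicon giving the analytic type of each word (PS = preposition,
# NC = common noun, CONJ = conjunction, following the notation above).
LEXICON = {
    "à": "PS", "de": "PS", "force": "NC", "courage": "NC",
    "et": "CONJ", "persévérance": "NC",
}

# Hypothetical table mapping analytic patterns to a synthetic type.
SYNTHETIC = {
    "PS-NC-PS-NC": "adverb",
    "PS-NC-PS-NC-CONJ-PS-NC": "adverb",
}


def two_sided(phrase: str):
    """Return (analytic pattern, synthetic type) for a phrase."""
    analytic = "-".join(LEXICON[w] for w in phrase.split())
    return analytic, SYNTHETIC.get(analytic, "unknown")


print(two_sided("à force de courage"))
print(two_sided("à force de courage et de persévérance"))
```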

Reflections on grammatical typologies

It is useful to point out the differences that may exist between grammatical typologies. The classical grammatical taxonomy is essentially aimed at teaching and comprehension; it therefore has a pedagogical purpose. By contrast, the taxonomy that is useful for rule-based machine translation has a different purpose: it aims essentially at enabling disambiguation, both grammatical and semantic, because ambiguity is a fundamental and very common problem in this context. Such a typology essentially focuses on where word types occur, i.e. on the structures encountered in the sentence. This explains why typologies can differ: they have different goals and purposes.

Analyzing relative pronouns

What is the status of the ‘relative pronouns’ of classical grammar within the present conceptual framework? Traditionally, a distinction is made between simple relative pronouns (qui, que, dont, où; who, that, whose, where) and compound relative pronouns (à qui, pour lesquelles, à côté duquel, etc.; to whom, for whom, beside whom, etc.). If we look first at simple relative pronouns, the category does not seem satisfactory, in particular because it contains both ‘qui’ (who) and ‘que’ (that), whose grammatical roles appear, in the present context, to be quite different. Consider the two short sentences ‘la maison que j’habite est grande’ and ‘l’homme qui parle est grand’ (the house I live in is big; the man who is speaking is tall). As these two examples illustrate, the structures following ‘que’ and ‘qui’ differ: ‘que’ is followed by a personal pronoun and a conjugated verb (‘j’habite’: I live), whereas ‘qui’ is followed directly by a conjugated verb (‘parle’: speaks). From our present perspective, these are inherently different structures. It turns out that ‘dont’ and ‘où’ admit the same type of structure as ‘que’. Thus the homogeneous category, from our point of view, is formed by ‘que’, ‘dont’ and ‘où’, but not by ‘qui’.

If we extend this analysis by searching for other words that could fit into this category, we also find ‘duquel’ (= de lequel; from which), ‘de laquelle’, ‘desquels’ (= de lesquels; from which), ‘desquelles’ (= de lesquelles; from which), ‘auquel’ (= à lequel), ‘à laquelle’, ‘auxquels’ (= à lesquels) and ‘auxquelles’ (= à lesquelles). But we also have all forms of the same type built with a preposition other than ‘de’ or ‘à’: ‘sur lequel’, ‘sur laquelle’, …, ‘par lequel’, ‘par laquelle’, ‘avec lequel’, etc. The classical compound relative pronouns, such as ‘à qui’, ‘pour lesquelles’, ‘à côté duquel’, etc. (to whom, for whom, beside whom, etc.), also fit naturally into this category.
But from the point of view of two-sided grammar, ‘à l’aide duquel’, ‘au moyen de laquelle’, ‘à la suite de quoi’, ‘à l’aide de qui’, etc. (with the help of which, by means of which, as a result of which, with the help of whom, etc.) also belong to this category. (to be continued)
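
The structural criterion that separates ‘qui’ from ‘que’, ‘dont’ and ‘où’ can be sketched as a check on the tags that follow the relative word. This is a minimal sketch, assuming the following words are already tagged; the tag labels (‘VERB’, ‘PRON’, ‘DET’, ‘NC’, ‘ADJ’), the category names and the function name are hypothetical.

```python
from typing import List


def relative_class(following_tags: List[str]) -> str:
    """Classify a relative word by the structure that follows it (hypothetical tags).

    - 'qui-type': followed directly by a conjugated verb (l'homme qui parle)
    - 'que-type': followed by a subject, then a conjugated verb (la maison que j'habite)
    """
    if not following_tags:
        return "unknown"
    if following_tags[0] == "VERB":
        return "qui-type"
    if following_tags[:2] == ["PRON", "VERB"]:
        return "que-type"          # personal pronoun subject
    if following_tags[:3] == ["DET", "NC", "VERB"]:
        return "que-type"          # noun-phrase subject
    return "unknown"


# 'qui parle est grand'
print(relative_class(["VERB", "VERB", "ADJ"]))          # qui-type
# 'que j' habite est grande'
print(relative_class(["PRON", "VERB", "VERB", "ADJ"]))  # que-type
# 'dont le toit est rouge'
print(relative_class(["DET", "NC", "VERB", "ADJ"]))     # que-type
```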

Powering MT with two-sided grammar: the case of ‘près de’

‘près de’ (near) is traditionally considered a prepositional locution. From the viewpoint of two-sided grammar, it is (synthetically) a preposition, made up (analytically) of an adverb (‘près’) followed by the preposition ‘de’. In Corsican, it is translated as vicinu à. But this grammatical analysis does not cover all cases, as the following example shows. In the sentence ‘depuis près de dix ans, il travaillait’ (he had been working for almost ten years), ‘près de’ (almost; guasgi) has a different grammatical role. According to classical analysis, it would rather be an adverb.
In the present conceptual framework, we analyze ‘près de’ (almost; guasgi) in ‘depuis près de dix ans, il travaillait’ as a modulator of the cardinal determinant ‘dix’ (ten). A prototype implemented with this type of grammatical analysis then gives the correct translation, using guasgi (almost) rather than vicinu à (near). It seems that two-sided grammar is beginning to produce interesting results (to be confirmed).
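
The disambiguation step can be sketched as a single check on the word that follows ‘près de’. This is a minimal sketch, not the prototype’s code: the set of cardinals is hypothetical and incomplete, and digits (‘10 ans’) and the contracted form ‘près du’ are not handled.

```python
# Hypothetical (incomplete) set of French cardinal determinants.
CARDINALS = {"deux", "trois", "quatre", "cinq", "six", "sept", "huit", "neuf", "dix", "cent", "mille"}


def translate_pres_de(words, i):
    """Choose a Corsican rendering for 'près de' starting at words[i].

    If 'près de' is followed by a cardinal determinant, it is treated as a
    modulator of that determinant (guasgi, 'almost'); otherwise as a
    preposition (vicinu à, 'near').
    """
    if words[i:i + 2] != ["près", "de"]:
        raise ValueError("expected 'près de' at position i")
    following = words[i + 2] if i + 2 < len(words) else None
    return "guasgi" if following in CARDINALS else "vicinu à"


print(translate_pres_de("depuis près de dix ans".split(), 1))  # guasgi
print(translate_pres_de("près de la maison".split(), 0))       # vicinu à
```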

Expanding on noun modulators

Let’s take a closer look at noun modulators, especially common noun modulators. We have seen that adjectives could be considered, in the present conceptual framework, as noun modulators. In this context, the question arises whether there are other forms of noun modulators. It seems that there are.

Let us consider sentence fragments such as ‘bois de châtaignier’ (chestnut wood; legnu castagninu) or ‘oiseau de proie’ (bird of prey; aceddu di preda). In ‘bois de châtaignier’, ‘de châtaignier’ seems to play the role of a noun modulator, in the same way as an adjective. In traditional grammar, ‘de châtaignier’ is considered a noun complement. In the present framework, it would be a noun modulator, since it clarifies and restricts the meaning of the noun ‘bois’ (wood; legnu). The role of ‘de proie’ in ‘oiseau de proie’ is identical, as it acts as a modulator of the noun ‘oiseau’ (bird).

Interestingly, comparison between languages tends to support this type of analysis. Indeed, ‘bois de châtaignier’ is better translated into Corsican as legnu castagninu than literally as legnu di castagnu (chestnut wood); and in this case, castagninu (of chestnut) is an adjective, i.e. a noun modulator. Thus castagninu and di castagnu are equivalent here, which confirms that both have the same nature: they are noun modulators.
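
Treating ‘de + noun’ as a noun modulator suggests a simple translation strategy: prefer a Corsican adjective when one exists, and otherwise fall back on the literal ‘di + noun’. This is a minimal sketch, assuming hypothetical lookup tables whose entries are taken from the examples above; the function name is illustrative only.

```python
# Hypothetical French -> Corsican noun lexicon (entries from the examples above).
NOUNS = {"bois": "legnu", "châtaignier": "castagnu", "oiseau": "aceddu", "proie": "preda"}

# Hypothetical table of Corsican adjectives that can replace 'di + noun'.
DENOMINAL_ADJECTIVES = {"châtaignier": "castagninu"}


def translate_noun_with_modulator(head: str, complement: str) -> str:
    """Translate French 'head de complement', treating 'de complement' as a noun modulator.

    An adjective is preferred when one exists; otherwise the literal 'di + noun' is used.
    Both outputs play the same role: they modulate the head noun.
    """
    corsican_head = NOUNS[head]
    if complement in DENOMINAL_ADJECTIVES:
        return f"{corsican_head} {DENOMINAL_ADJECTIVES[complement]}"
    return f"{corsican_head} di {NOUNS[complement]}"


print(translate_noun_with_modulator("bois", "châtaignier"))  # legnu castagninu
print(translate_noun_with_modulator("oiseau", "proie"))      # aceddu di preda
```

Because both branches produce a noun modulator, the choice between them is purely lexical and does not change the grammatical analysis.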

Modulators: the case of adjectives

Using the notion of modulator again, we can now place adjectives within this framework: in this context, they are modulators of nouns (mostly common nouns, but sometimes proper nouns as well). The adjective, as a noun modulator, is placed either before or after the noun.

So we have the following categories:

  • modulators of nouns (= adjectives)
  • modulators of adjectives
  • modulators of verbs, i.e. adverbs in a restricted sense
  • modulators of determinants