An analysis of the French word ‘très’

According to our analysis, the word ‘très’ is likely to occur in the following grammatical types:

  • Adjective modifier: here, ‘très’ modifies the meaning of an adjective: très beau (very beautiful, biddisimu), très content (very happy, cuntentissimu)
  • Adverb modifier: ‘très’ here modifies the meaning of an adverb: ‘très rarement’ = very rarely, raramenti; ‘très souvent’ = very often, mori à spessu
  • Adverb (i.e. in our terminology, a verb modifier): here, ‘très’ modifies the meaning of a verb: ‘j’ai très faim’ = I am very hungry, t’aghju mori fami; ‘il avait très soif’ = he was very thirsty, t’aia mori seti. The verb here is the verbal locution ‘avoir faim’ = to be hungry, avè a fami; ‘avoir soif’ = to be thirsty, avè a seti

Grammatical taxonomy again: the case of prepositions

Let’s look at the translation of the French word ‘dont’. Depending on the case, ‘dont’ can be a

  • relative pronoun: ‘la difficulté dont je t’ai parlé’ (the difficulty I told you about), ‘voilà le professeur dont j’apprécie beaucoup les cours’ (this is the teacher whose classes I really enjoy.)
  • or, more rarely, a preposition: ‘il y avait cinq couleurs, dont le rouge et le bleu’. (there were five colours, including red and blue.)

It is the latter case that we will be looking at. In this case, ‘dont’ is translated into English as ‘including’. In Corsican, the translation is: c’eranu cinque culori, frà i quali u rossu è u turchinu. But if we translate ‘il y avait cinq plantes, dont le ciste et la bruyère’ (‘there were five plants, including cistus and heather’), we get: c’eranu cinque piante, frà e quale u muchju è a scopa. Thus the translation of ‘dont’ (including) as a preposition is either frà i quali (masculine plural, culore being masculine in Corsican) or frà e quale (feminine plural), depending on which noun ‘dont’ refers to.

Thus ‘dont’ is translated into the masculine plural or the feminine plural, depending on the noun – either masculine or feminine – to which it refers. This casts doubt on the ‘prepositional’ nature of ‘dont’, and leads to further analysis to determine whether there might not be a more suitable grammatical type.

It is worth noting that ‘dont’ (including) can be replaced by ‘parmi lesquels’ (among which, frà i quali) or ‘parmi lesquelles’ (among which, frà e quale), depending on whether the noun to which ‘dont’ refers is in the masculine plural or the feminine plural. This suggests that ‘dont’ could be conceived of as a preposition followed by a pronoun. In the spirit of this analysis, the BDL site notes: ‘Dont’ is probably the relative pronoun whose use is the most delicate. To use it correctly, one must know that dont always ‘hides’ the preposition ‘de’; ‘dont’ is equivalent to ‘de qui’, ‘de quoi’, ‘duquel’, etc. This link between ‘dont’ and ‘de’ goes back to the Latin origin of ‘dont’, which is from ‘unde’ “from where”.

More generally, this suggests that further analysis of some prepositions may be needed.

Creating new grammatical types

Italian has ‘prepositions followed by articles’ (preposizioni articolate). This is a specific grammatical type, which refers to a word (e.g. della) that replaces a preposition (di) followed by an article (la):

	il	lo	l’	la	i	gli	le
di	del	dello	dell’	della	dei	degli	delle
a	al	allo	all’	alla	ai	agli	alle
da	dal	dallo	dall’	dalla	dai	dagli	dalle
in	nel	nello	nell’	nella	nei	negli	nelle
su	sul	sullo	sull’	sulla	sui	sugli	sulle

This specific grammatical type also corresponds to:

  • in French: du = de le, des = de les
  • in Corsican and especially in the Sartenese variant: ‘llu = di lu, ‘lla = di la, etc.

This raises the general problem of the number of grammatical types we should retain. Should we create new grammatical types beyond the classical ones, in order to optimise translators and NLP in general? What is the best grammatical type to retain for ‘prepositions followed by an article’: a new primitive one or a compound one (always keeping Occam’s razor in mind)? A preposition followed by an article behaves like a preposition for words on its left, and like an article for words on its right.
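
The ‘compound’ option can be pictured with a minimal Python sketch in which each contracted form is stored together with its two components, so that the preposition is available to the words on its left and the article to the words on its right. The forms come from the Italian table above; the names `ArticulatedPreposition`, `LEXICON` and `decompose` are illustrative assumptions, not part of any existing translator.

```python
from typing import NamedTuple, Optional

class ArticulatedPreposition(NamedTuple):
    surface: str      # contracted form, e.g. "della"
    preposition: str  # component seen by the words on the left, e.g. "di"
    article: str      # component seen by the words on the right, e.g. "la"

# Column headers and rows of the Italian table above.
ARTICLES = ["il", "lo", "l’", "la", "i", "gli", "le"]
ROWS = {
    "di": ["del", "dello", "dell’", "della", "dei", "degli", "delle"],
    "a":  ["al", "allo", "all’", "alla", "ai", "agli", "alle"],
    "da": ["dal", "dallo", "dall’", "dalla", "dai", "dagli", "dalle"],
    "in": ["nel", "nello", "nell’", "nella", "nei", "negli", "nelle"],
    "su": ["sul", "sullo", "sull’", "sulla", "sui", "sugli", "sulle"],
}

# Hypothetical lexicon: one compound entry per contracted form.
LEXICON = {
    surface: ArticulatedPreposition(surface, prep, art)
    for prep, forms in ROWS.items()
    for surface, art in zip(forms, ARTICLES)
}

def decompose(word: str) -> Optional[ArticulatedPreposition]:
    """Return the compound analysis of a contracted form, or None if unknown."""
    return LEXICON.get(word.lower())
```

With this representation no new primitive type is needed: `decompose("della")` yields the pair (di, la), and the existing rules for prepositions and articles can be reused on each side.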

Adjective modifiers again

We will consider again a category of words such as ‘very’, when they precede an adjective. Traditionally, this category is termed ‘adverbs’ or ‘adverbs of degree’, but we prefer ‘adjective modifier’, because (i) analytically, they change the meaning of an adjective and (ii) synthetically, an adjective modifier followed by an adjective is still an adjective. A more complete list is: almost, absolutely, badly, barely, completely, decidedly, deeply, enormously, entirely, extremely, fairly, fully, greatly, hardly, highly, how, incredibly, intensely, less, most, much, nearly, perfectly, positively, practically, pretty, purely, quite, rather, really, scarcely, simply, somewhat, strongly, terribly, thoroughly, totally, utterly, very, virtually, well.

If we look at sentences such as: il est bien content (he is very happy, hè beddu cuntenti), ils étaient bien contents (they were very happy, erani beddi cuntenti), elle serait bien contente (she would be very happy, saria bedda cuntenti), elles sont bien contentes (they are very happy, sò beddi cuntenti), we can see that the adjective modifier ‘bien’ is rendered as ‘very’ in English, and in Corsican as:

  • bellu/beddu: singular masculine
  • belli/beddi: plural masculine
  • bella/bedda: singular feminine
  • belle/beddi: plural feminine

This shows that the adjective modifier is invariable in French and English, but varies in gender and number in Corsican. Thus, in Corsican grammar, it seems appropriate to distinguish between:

  • singular masculine adjective modifier
  • plural masculine adjective modifier
  • singular feminine adjective modifier
  • plural feminine adjective modifier

On the other hand, such a distinction does not seem useful in English and French, where the category of ‘adjective modifier’ is sufficient and there is no need for further detail.
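
As a rough illustration of what this distinction means for a translator, here is a minimal Python sketch in which the Corsican rendering of ‘bien’ needs the gender and number of the adjective, while the French and English renderings ignore them. The function name `render_bien`, the language codes and the feature labels are assumptions made for the example; the Corsican forms are the first variant of each pair listed above.

```python
# Corsican forms of the adjective modifier, keyed by (gender, number).
BELLU_FORMS = {
    ("masculine", "singular"): "bellu",
    ("masculine", "plural"): "belli",
    ("feminine", "singular"): "bella",
    ("feminine", "plural"): "belle",
}

def render_bien(target: str, gender: str, number: str) -> str:
    """Render French 'bien' used as an adjective modifier.

    target: "fr", "en" or "co" (hypothetical language codes).
    gender, number: features of the adjective being modified.
    """
    if target == "fr":
        return "bien"   # invariable
    if target == "en":
        return "very"   # invariable
    if target == "co":
        return BELLU_FORMS[(gender, number)]  # agrees with the adjective
    raise ValueError(f"unsupported target language: {target}")
```

Only the Corsican branch consults the features, which is why four sub-types are useful there and a single type suffices elsewhere.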

On the category of adverb modifiers

Let’s continue to rethink the gruesome (so it is argued here) category of adverbs (in the classical sense). Let’s now turn our attention to the category of ‘adverb modifiers’. Adverbs are understood here in a restricted sense: they are either verb modifiers or proposition modifiers. In this context, we are likely to encounter adverb modifiers. In general, the adverb modifier precedes the adverb. Thus, very (‘très’) is an adverb modifier in the sequence he was eating very rarely (‘il mangeait très rarement’, manghjava mori raramenti).

Likewise more (‘plus’, più) is in some cases an adverb modifier. This is the case in the sequence he was drinking more frequently (‘il buvait plus fréquemment’, biia più suventi).

The case of adjective modifiers and the notion of grammatical proof

Let’s consider again the case of adjective modifiers (in classical grammar, this category of words is considered to be degree adverbs). These include the following: peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, … = pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, … = not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, … We have argued that words of this category are ‘adjective modifiers’ when they precede an adjective. But can such an assertion be proven, or at least supported by some form of evidence? Grammar, like other disciplines, requires that assertions be justified and, if possible, proven. The notion of proof in grammar, however, is uncommon. Let’s see if we can provide such a proof or justification.

Consider the case of ‘tellement’ (so much), which we consider to be an adjective modifier when it precedes an adjective. Now, let us consider the following translations, where ‘tellement’ is used:

  • in French: il est tellement beau, ils sont tellement petits, elle est tellement belle, elles sont tellement intelligentes
  • in English: it is so beautiful, they are so small, she is so beautiful, they are so smart
  • in Corsican: hè tantu bellu, sò tanti chjuchi, hè tanta bella, sò tante intelligente (an alternative translation is: hè cusì bellu, sò cusì chjuchi, hè cusì bella, sò cusì intelligente)
  • in Italian: è così bello, sono così piccoli, è così bella, sono così intelligenti

It is clear here that ‘tellement’ preceding an adjective is translated in Corsican by:

  • tantu, when the adjective is singular masculine
  • tanti, when the adjective is plural masculine
  • tanta, when the adjective is singular feminine
  • tante, when the adjective is plural feminine

Thus ‘tellement’ (so much, tantu/tanti/tanta/tante), in this usage, i.e. preceding an adjective, agrees with the adjective to which it refers. This serves as a justification of its classification as an adjective modifier.
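
The kind of evidence used here can be phrased as a small check: collect the form of the modifier together with the gender and number of the adjective it precedes, and test whether the form co-varies with those features. The Python sketch below is only an illustration under that assumption; the function name `agrees_with_adjective` and the feature labels are hypothetical, and the observations are the example sentences above.

```python
def agrees_with_adjective(observations):
    """Return True if the modifier has exactly one form per (gender, number)
    slot and at least two distinct forms overall, i.e. it co-varies with
    the adjective's features."""
    forms_by_features = {}
    for features, form in observations:
        forms_by_features.setdefault(features, set()).add(form)
    one_form_per_slot = all(len(f) == 1 for f in forms_by_features.values())
    distinct_forms = {form for f in forms_by_features.values() for form in f}
    return one_form_per_slot and len(distinct_forms) > 1

SLOTS = [("m", "sg"), ("m", "pl"), ("f", "sg"), ("f", "pl")]

# Modifier forms observed in the example sentences, slot by slot.
FRENCH_TELLEMENT = list(zip(SLOTS, ["tellement"] * 4))
CORSICAN_TANTU = list(zip(SLOTS, ["tantu", "tanti", "tanta", "tante"]))
CORSICAN_CUSI = list(zip(SLOTS, ["cusì"] * 4))
```

On these data the check succeeds for tantu/tanti/tanta/tante and fails for ‘tellement’ and for the invariable alternative cusì, so agreement is evidence available for some renderings only.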

The status of adjective modifiers

What is the status of adjective modifiers (tant, tout juste, un rien, un tantinet, très, extrêmement, … = so much, just a little, a little, a little, very, extremely, …) in the present grammatical typology? Adjectives are defined as noun modifiers. So adjective modifiers would be modifiers of noun modifiers? This sounds intriguing. In reality, we do not have the concept of ‘modifiers of modifiers’. In fact, we have the following rules:

  • a verb modifier followed by a verb is a verb
  • a determinant modifier followed by a determinant is a determinant
  • and generally speaking, a modifier of an X followed by an X is an X (where X is a given grammatical type)
    So a noun modifier followed by a noun is a noun, i.e. an adjective followed by a noun is a noun. For example: ‘un très beau livre’ (a very nice book), where ‘very’ is an adjective modifier, ‘nice’ is an adjective, i.e. a noun modifier, and ‘book’ is a noun.
    Hence finally, ‘an adjective modifier is a modifier of a noun modifier’ reads as follows: an adjective modifier is a modifier of [noun modifier].
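
The rule ‘a modifier of an X followed by an X is an X’ can be sketched as a small reduction over a sequence of type labels. This is a minimal Python illustration, not a parser: the labels `ADJ`, `DET`, `V` and `MODV` are assumptions introduced for the example (NC, MODD and MODAQ follow the abbreviations used in these notes), and the leftmost-first strategy is one possible choice among others.

```python
# For each modifier type, the type it modifies (illustrative labels).
MODIFIES = {
    "MODAQ": "ADJ",  # adjective modifier -> adjective
    "ADJ": "NC",     # adjective = noun modifier -> common noun
    "MODD": "DET",   # determinant modifier -> determinant
    "MODV": "V",     # verb modifier -> verb
}

def reduce_tags(tags):
    """Repeatedly replace 'modifier of X' + 'X' by 'X', leftmost pair first."""
    tags = list(tags)
    changed = True
    while changed:
        changed = False
        for i in range(len(tags) - 1):
            if MODIFIES.get(tags[i]) == tags[i + 1]:
                del tags[i]  # the pair collapses to the modified type
                changed = True
                break
    return tags
```

For ‘très beau livre’, the sequence MODAQ, ADJ, NC first collapses to ADJ, NC (an adjective modifier followed by an adjective is an adjective) and then to NC, which matches the reading of ‘modifier of [noun modifier]’ given above.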

More on two-sided grammar

Let’s focus on analyzing the following phrases:

  • à force de courage (bravely)
  • à force de courage et de persévérance (by dint of courage and perseverance)
  • avec beaucoup d’abnégation (selflessly)
  • d’une manière ou d’une autre (one way or another)
  • d’une façon vraiment admirable (in a very admirable way)
  • au moment le plus opportun (when most appropriate)

What is their grammatical nature? From the point of view of two-sided grammar, what are they?

From a synthetic standpoint, first of all, they are adverbs. Let us turn now to their nature from an analytical point of view.

  • à force de courage (bravely): analytically, it is a preposition, followed by a common noun, then another preposition, then another common noun: PS-NC-PS-NC.
  • à force de courage et de persévérance (by dint of courage and perseverance): analytically, it is a preposition, followed by a common noun, then another preposition, then another common noun, then a conjunction, then another preposition and then another common noun: PS-NC-PS-NC-CONJ-PS-NC.
  • and so on
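
One way to hold both viewpoints at once is a record that carries a single synthetic type for the whole sequence and one analytic type per word. The Python sketch below is a hypothetical data structure, not an existing implementation; the class name `TwoSidedAnalysis` and the label `ADV` for the synthetic type are assumptions, while PS, NC and CONJ are the abbreviations used above.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TwoSidedAnalysis:
    words: Tuple[str, ...]
    analytic: Tuple[str, ...]  # one type per constituent word
    synthetic: str             # one type for the whole sequence

    def __post_init__(self):
        # The analytic side must label every word exactly once.
        if len(self.words) != len(self.analytic):
            raise ValueError("one analytic type is required per word")

    def analytic_pattern(self) -> str:
        return "-".join(self.analytic)

A_FORCE_DE_COURAGE = TwoSidedAnalysis(
    words=("à", "force", "de", "courage"),
    analytic=("PS", "NC", "PS", "NC"),
    synthetic="ADV",
)

A_FORCE_DE_COURAGE_ET_DE_PERSEVERANCE = TwoSidedAnalysis(
    words=("à", "force", "de", "courage", "et", "de", "persévérance"),
    analytic=("PS", "NC", "PS", "NC", "CONJ", "PS", "NC"),
    synthetic="ADV",
)
```

The two records share the same synthetic type while their analytic patterns differ, which is exactly the contrast between the two sides.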

Reflections on grammatical typologies

It is useful to point out the differences that may exist between different grammatical typologies. The classical grammatical taxonomy is essentially aimed at teaching and comprehension. It therefore has a pedagogical purpose. On the other hand, the taxonomy that is useful for rule-based machine translation has a different purpose: it aims essentially at allowing disambiguation, both grammatically and semantically, because ambiguity is a fundamental and very common problem in this particular context. Such a typology essentially focuses on the location of word types, on the structures encountered in the sentence. This explains why typologies can be different, as they have different goals and purposes.

Expanding on noun modulators

Let’s take a closer look at noun modulators, especially common noun modulators. We have seen that adjectives could be considered, in the present conceptual framework, as noun modulators. In this context, the question arises, are there other forms of noun modulators? It seems that there are.

Let us consider elements of sentences such as ‘bois de châtaignier’ (chestnut wood; legnu castagninu) or ‘oiseau de proie’ (bird of prey; aceddu di preda). In ‘bois de châtaignier’, ‘de châtaignier’ seems to play the role of noun modulator, in the same way as an adjective. In traditional grammar, ‘de châtaignier’ is considered as a noun complement. In the present framework, it would be a noun modulator, since it clarifies and restricts the meaning of the noun ‘bois’ (wood; legnu). The role of ‘de proie’ in ‘oiseau de proie’ is identical: it acts as a modulator of the noun ‘oiseau’ (bird).

Interestingly, it turns out that comparison between languages tends to validate this type of analysis. Indeed, ‘bois de châtaignier’ is better translated into Corsican by legnu castagninu than literally by legnu di castagnu (chestnut wood); and in this case, castagninu (of chestnut) is an adjective, i.e. a noun modulator. Thus castagninu and di castagnu are equivalent here, which confirms that both have the same nature, namely that of a noun modulator.

Modulators of determinants

We have mentioned the special category of determinant modulators. It seems that this category is interesting and deserves to be explored further. A determinant modulator is placed before a determinant and changes its meaning. As we have already seen, from the viewpoint of two-sided grammar, a determinant preceded by a determinant modulator (MODD) remains a determinant.

We can give some examples that apply to different categories of determinants:

  • MODD applying to possessive determinants (mes, tes, ses, nos, vos, leurs; my, your, his/her, our, your, their; i me, i to, i so, i nostri, i vostri, i so), demonstrative determinants (ces; these; ‘ssi/’sse) and definite article determinants (les; the; i/e): certaines de, certains de, l’un de, l’une de, la majeure partie de, la plupart de, tous, toutes, une bonne partie de, une grande partie de; some of, some of, one of, one of, most of, most of, all of, all of, a good part of, a large part of; une poche di, uni pochi di, unu di, una di, parte è più di, a maiò parte di, tutti, tutte, une bella parte di, parte assai di. Here are some examples: “certains de mes chevaux étaient bruns” (some of my horses were brown; uni pochi di i me cavalli eranu bruni); la majeure partie des (= de les) habitants étaient riches (most of the inhabitants were rich; a maiò parte di l’abitanti eranu ricchi).

In addition, we have three other categories of MODDs that have already been mentioned:

  • MODD applying to cardinal determinants (deux, trois, quatre, cinq, … ; two, three, four, five…; dui, trè, quattru, cinqui,…): au moins, presque, quasiment, environ, plus de, moins de, approximativement, etc. (at least, almost, nearly, about, more than, less than, approximately, etc. ; alminu, guasgi, guasgi, circa, più di, menu di, apprussimativamenti, etc.)
  • MODD applying to indefinite article determinants: plus de, au moins; more than, at least; più di, alminu
  • MODD applying to indefinite determinants (aucun, aucune, quelques; none, none, a few ; nisciunu, nisciuna, calchì): au moins, presque; at least, almost; alminu, guasi

In sum, it seems that this category of MODD is coherent and could be of practical interest.

Grammatical categories by position again: the case of adverbs and modulators placed before a modulator

Let us try to delve more deeply into the case of adverbs. We shall continue now to define them by their position in relation to other grammatical categories. The result is that adverbs are divided into several different categories. Now let’s look at the adverbs that may be placed before an adjective modulator. To begin with, let us cite but a few adjective modulators:

  • peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, etc.
  • pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, etc.
  • not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, etc.

Now some modulators of adjective modulators are:

  • pas, peut-être, surtout, vraiment, etc.
  • micca, forse, soprattuttu, veramente, è cetera.
  • not, maybe, mostly, really, etc.

Here are some relevant examples: “il était surtout trop blanc” (he was mostly too white, era sopratuttu troppu biancu); “il était vraiment très beau” (he was really very beautiful, era propriu bellissimu); “il était bien trop grand” (he was far too tall ; era bellu troppu maiore).

Let’s call this category modulators of adjective modulators. Such a modulator is placed before the adjective modulator because it modifies the meaning of that adjective modulator.

Hence, if we reason in terms of two-sided grammar, an adjective modulator preceded by a modulator remains an adjective modulator: MOD-MODAQ = MODAQ.

To sum up, so far we have distinguished several categories within the classical class of adverbs:

  • modulators of adjectives
  • modulators preceding verbs: verb pre-modulators
  • modulators following verbs: verb post-modulators
  • modulators preceding cardinal determinants
  • modulators preceding adjective modulators

Further reflections on the status of “I love you” in Corsican

Let us briefly recall the problem: translating ‘I love you’ might sound trivial, but it’s not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti tengu cara’ when addressed to a female person. Hence the proposed preliminary translation ‘ti tengu caru/cara’. Such a rough translation requires further disambiguation, but on what precise grounds?

Let us look at the issue from an analytical perspective. It appears that we need to assign a reference to the pronoun ‘te’ (you, ti). The referent could be identified from the context, depending on whether the person ‘te’ refers to is male or female. At this stage, it appears that it is better to consider that the personal object pronoun has an inherent gender: masculine or feminine. This gender does not affect the pronoun itself, which remains ‘te’ (you, ti) independently of the gender, but it does have an effect on the words that depend on it, i.e. the adjective caru/cara in Corsican, in the locution ti tengu caru/cara. The upshot is: in this case, ‘te’ (you, ti) is a personal object pronoun, masculine or feminine, whose inherent ambiguity can be resolved according to the context.
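
The idea of an inherent gender carried by the object pronoun can be sketched as follows in Python. The function name `translate_i_love_you` and the way the gender is passed in are assumptions made for the example; how the gender is actually recovered from the context is left open here.

```python
from typing import Optional

def translate_i_love_you(addressee_gender: Optional[str] = None) -> str:
    """Translate 'I love you' into Corsican.

    addressee_gender: inherent gender of the object pronoun 'ti'
    ("masculine" or "feminine"), or None when the context does not
    allow it to be determined.
    """
    if addressee_gender == "masculine":
        return "ti tengu caru"
    if addressee_gender == "feminine":
        return "ti tengu cara"
    if addressee_gender is None:
        # Preliminary translation, still awaiting disambiguation.
        return "ti tengu caru/cara"
    raise ValueError(f"unknown gender: {addressee_gender}")
```

The pronoun ‘ti’ is the same in all three outputs; only the dependent adjective changes, which is the behaviour described above.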

More on two-sided grammatical analysis

Let us give some further examples of two-sided grammatical analysis:

  • “à dessein” (purposely), “à volonté” (at will), “à tort” (mistakenly): from an analytic standpoint, these are prepositions followed by a singular noun. From a synthetic viewpoint, they are adverbs (adverbial locutions).
  • “à jamais” (forever): from an analytic standpoint, it is a preposition followed by an adverb. From a synthetic viewpoint, it is an adverb (adverbial locution).
  • “à genoux” (on my/his/her/… knees), “à torrents” (in torrents): from an analytic standpoint, these are prepositions followed by a plural noun. From a synthetic viewpoint, they are adverbs (adverbial locutions).

Two-sided grammatical analysis

Let us call two-sided grammatical analysis the type of grammatical analysis described below. Two-sided grammatical analysis contrasts with one-sided analysis, which sees a sequence of words either as a locution type (adverbial locution, verbal locution, noun locution, etc.) or as the sequence of types of its constituent words. From the standpoint of two-sided grammatical analysis, a given sequence of words can be assigned one single type (synthetically) and several grammatical types (analytically), corresponding one-to-one to its constituent words. The upshot is that a given sequence of words can be described from two different viewpoints, synthetic and analytic. What, then, is the status of ‘de fait’ from the viewpoint of two-sided grammatical analysis? From a synthetic standpoint, it is an adverb. From an analytic viewpoint, it is made up of one preposition (‘de’) followed by a common noun (‘fait’). Both viewpoints are complementary, and each casts light on one facet of the same reality. (lacking the time to write a scholarly article, but I hope the main idea is clear…)

What are the conditions for a given endangered language to be a candidate for rule-based machine translation?

For a given endangered language to be a candidate for rule-based machine translation, some requirements must be met. In particular, there is a need for:

  • a dictionary: some specialized lexicons are useful too
  • a list of locutions and their translations: to be more accurate, what is needed are noun locutions, adjective locutions, adverbial locutions and verbal locutions, together with their translations in the other language.
  • a detailed grammar (in any language): ideally, the grammar should be very detailed, mentioning notably irregular verbs, noun plurals, etc. Subjunctive and conditional forms must also be accurately described.
  • in addition, elision rules, euphony rules, should also be described.
  • most importantly: a description of the main variants of the language and their differences. This is needed to handle what we can call the ‘variant problem’ (we shall say a bit more about this later): as an effect of diversity, endangered languages are often polynomic and come with variants. But translation must be coherent and a mix of several variants is not acceptable as a translation.
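
The coherence requirement behind the ‘variant problem’ can be illustrated with a small check: if each form in the lexicon is tagged with the variants in which it is used, a translation is coherent only if at least one variant is compatible with all of its words. The Python sketch below is purely illustrative. The variant labels "A" and "B" are placeholders, and the assignment of bellu/bella and beddu/bedda to different variants is an assumption made for the example, not a dialectological claim.

```python
ALL_VARIANTS = {"A", "B"}  # placeholder labels for two variants

# Hypothetical lexicon: form -> variants in which the form is used.
VARIANTS_BY_FORM = {
    "hè": {"A", "B"},
    "bellu": {"A"},
    "bella": {"A"},
    "beddu": {"B"},
    "bedda": {"B"},
}

def compatible_variants(tokens, lexicon=VARIANTS_BY_FORM):
    """Return the variants compatible with every known token.

    Tokens missing from the lexicon are ignored in this sketch.
    """
    remaining = set(ALL_VARIANTS)
    for token in tokens:
        if token in lexicon:
            remaining &= lexicon[token]
    return remaining

def is_coherent(tokens, lexicon=VARIANTS_BY_FORM):
    """A translation is coherent if some single variant fits all its words."""
    return bool(compatible_variants(tokens, lexicon))
```

Under these assumptions, a sequence mixing bellu and bedda is rejected because no single variant accounts for both forms.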

Let us mention that endangered languages are commonly associated with another language, the two being in a diglossic relationship. To take an example, the Corsican language is associated with French. So we consider the French-Corsican pair, and what is relevant is French-Corsican translation. If we consider the Sardinian Gallurese language (‘gaddhuresu’), the relevant pair is Italian-Gallurese. Other relevant pairs are:

  • Italian-Sassarese
  • Italian-Sicilian
  • Italian-Venetian