Tag Archives: semantic disambiguation

Disambiguating ‘nombre de’

Let us consider the disambiguation of French ‘nombre de’, which, depending on the case, can be:

  • a singular masculine noun followed by a preposition: in this case, ‘nombre de’ translates to numaru di (number of)
  • an indefinite pronoun: in this case, French ‘nombre de’ translates into Corsican as bon parechji (many, a great many)

Si tratta quì di a disambiguazioni di ‘nombre de’ chì pò essa siont’è i casi:

  • un nomu maschili singulari suvitatu da una pripusizioni: in ‘ssu casu, ‘nombre de’ si traduci pà numaru di
  • un prunomu indefinitu: in ‘ssu casu, ‘nombre de’ pò essa traduttu in corsu da bon parechji
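The two readings above can be told apart, to a first approximation, by looking at the word that precedes ‘nombre de’. The following is only a minimal sketch under a stated assumption: a preceding determiner or adjective (the hypothetical `NOUN_CUES` set) signals the noun reading, and anything else signals the indefinite reading. The function name and the cue list are illustrative and are not taken from the actual engine.

```python
import re

# Hypothetical cue words: if one of these precedes 'nombre de',
# we assume the noun reading ("le nombre de" = "the number of").
NOUN_CUES = {"le", "un", "ce", "du", "au", "son", "leur", "quel",
             "grand", "petit", "certain"}

# Optionally captures the word just before 'nombre de' / "nombre d'".
PATTERN = re.compile(r"(?:(\w+)\s+)?\bnombre\s+d(?:e\b|['’])", re.IGNORECASE)


def nombre_de_reading(sentence):
    """Return the Corsican rendering of 'nombre de', or None if absent."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    previous = (match.group(1) or "").lower()
    # Noun reading after a cue word, indefinite reading otherwise.
    return "numaru di" if previous in NOUN_CUES else "bon parechji"


print(nombre_de_reading("Le nombre de personnes augmente."))
print(nombre_de_reading("Nombre de personnes sont venues."))
```

A rule this simple will misfire on less common constructions; it only illustrates that this particular ambiguity is largely grammatical and can be approached with local context.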

Semantic disambiguation of French ‘femme’: in the mud, gold is still shining

In Corsican, the French word ‘femme’ can be translated, depending on the context:

  • either into donna (woman)
  • or into moglia (wife)

The above sample still contains many vocabulary and grammatical disambiguation errors (easy/medium difficulty), but it successfully handles the semantic disambiguation (hard) of ‘femme’, two instances of which are properly translated into moglia (wife). As the Corsican proverb says, in a cianga l’oru luci sempri (in the mud, gold is still shining).

French samples are from the French corpora of the University of Leipzig.

Word-sense disambiguation: first test of new engine

Now testing the new engine with the semantically ambiguous French ‘échecs’ = fiaschi/scacchi (failures/chess).

What is interesting here is that the semantic disambiguation transfers successfully into English (although the French/English engine is still in its infancy and still produces many grammatical errors):

Now further tests are needed with some other semantically ambiguous words:

  • ‘défense’: defense/tusk; Corsican: difesa/sanna
  • ‘fils’: sons/wires; Corsican: figlioli/fili
  • ‘comprendre’: understand/comprise; Corsican: capisce/cumprende
  • ‘vol’: flight/theft; Corsican: bulu/arrubecciu
  • ‘voler’: fly/steal; Corsican: bulà/arrubà
  • ‘échecs’: chess/failures; Corsican: scacchi/fiaschi
  • ‘palais’: palace/palaces/palate/palates; Corsican: palazzu/palazzi/palate/palates

In the background, the unresolved threefold ambiguity of French ‘partie’ = parti/partita/partita (part/game/gone) is lurking…
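To make such tests repeatable, the ambiguous words listed above can be stored as a small sense inventory and checked against sample sentences. The sketch below is an assumption-laden illustration, not the engine's method: it picks the sense whose cue words overlap most with the sentence. The Corsican renderings come from the list above, while the cue words, the `SENSES` table and the `disambiguate` function are hypothetical.

```python
import re

# Sense inventory: French word -> Corsican rendering -> cue words.
# The cue words are illustrative assumptions, not engine data.
SENSES = {
    "vol": {
        "bulu": {"avion", "oiseau", "pilote", "passagers"},
        "arrubecciu": {"police", "banque", "voleur", "bijoux"},
    },
    "fils": {
        "figlioli": {"famille", "parents", "fille", "filles"},
        "fili": {"courant", "cuivre", "poteau", "poteaux"},
    },
    "échecs": {
        "scacchi": {"joueur", "tournoi", "champion", "roi"},
        "fiaschi": {"successifs", "malgré", "cuisants"},
    },
}


def disambiguate(word, sentence):
    """Return the best-scoring Corsican sense, or None if no cue matches."""
    context = set(re.findall(r"\w+", sentence.lower()))
    # Score each sense by the number of cue words found in the sentence.
    scores = {sense: len(cues & context) for sense, cues in SENSES[word].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("vol", "La police a signalé un vol dans une banque."))
print(disambiguate("fils", "Les fils transportent le courant."))
```

Returning `None` when no cue is found keeps unresolved cases visible instead of silently defaulting to one sense, which is useful when counting how many test sentences the approach actually covers.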

Feigenbaum test and semantic disambiguation

It is now clear that there can be no successful Feigenbaum test (i.e. not merely occasional Feigenbaum hits, but consistent performance on average) without an adequate treatment of semantic disambiguation. Arguably, this is one of the hard problems of machine translation. Here are some typical instances:

  • ‘défense’: defense/tusk; Corsican: difesa/sanna
  • ‘fils’: sons/wires; Corsican: figlioli/fili
  • ‘comprendre’: understand/comprise; Corsican: capisce/cumprende
  • ‘vol’: flight/theft; Corsican: bulu/arrubecciu
  • ‘voler’: fly/steal; Corsican: bulà/arrubà
  • ‘échecs’: chess/failures; Corsican: scacchi/fiaschi
  • and the fourfold ambiguous ‘palais’: palace/palaces/palate/palates; Corsican: palazzu/palazzi/palate/palates

In short: no successful semantic disambiguation = no genuinely successful Feigenbaum test. The semantic disambiguation engine needs to be rewritten.