Now testing large-scale self-evaluation. In the present sample, self-evaluation is applied to a 7,693-word (45,437-character) text from the French Wikipedia article on Constance II (Constantius II): 414 errors were found.
This test illustrates the benefits of self-evaluation well: it runs fast and gives a rough estimate of MT accuracy (± 2%).
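To make the idea of a rough accuracy estimate concrete, here is a minimal sketch. It assumes accuracy is read as the share of words not flagged as errors, with each flagged error counting against one word. That is an assumption for illustration only: the post does not say how the self-evaluation score is computed, and the function name word_accuracy is hypothetical.

```python
def word_accuracy(error_count: int, word_count: int) -> float:
    """Hypothetical metric: share of words not flagged as errors.

    Assumes each flagged error counts against exactly one word. This is an
    illustrative assumption, not necessarily how the self-evaluation scores.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    if not 0 <= error_count <= word_count:
        raise ValueError("error_count must be between 0 and word_count")
    return 1.0 - error_count / word_count


# Figures from the Constance II sample: 414 errors over 7,693 words.
accuracy = word_accuracy(414, 7693)
print(f"{accuracy:.2%}")  # 94.62%
```

Under this reading, 414 errors over 7,693 words would give roughly 94.6%. A metric that counts errors per sentence, or weights them by severity, would give a different figure.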
Testing the accuracy of self-evaluation itself: in the present case, it yields 100% performance. However, there is one error it misses: ‘par des explosifs’ (by explosives) should read da splusivi or even da i splusivi, a problem with the partitive article. Arguably, there is a second grammatical error to which self-evaluation is blind: ‘sont ensuite détruits’ (are then destroyed) should read sò distrutti dopu; the problem lies in the placement of the adverb dopu relative to the verb. In short, human evaluation yields 98.14% performance in the present case. (Incidentally, average performance on the MT open test currently seems to be nearing 94%.)