Publications | 𝕄aTOS

Here you can find the publications produced in the context of the project, which can also be found on the HAL website.

Investigating Length Issues in Document-level Machine Translation

Ziqian Peng, Rachel Bawden, François Yvon. Investigating Length Issues in Document-level Machine Translation. Proc. MT-Summit 2025. Genève.

Transformer architectures are increasingly effective at processing and generating very long chunks of text, opening new perspectives for document-level machine translation (MT). In this work, we challenge the ability of MT systems to handle texts comprising up to several thousands of tokens. We design and implement a new approach to precisely measure the effect of length increments on MT outputs. Our experiments with two representative architectures unambiguously show that (a) translation performance decreases with the length of the input text; (b) the position of sentences within the document matters, and translation quality is higher for sentences occurring earlier in a document. We further show that manipulating the distribution of document lengths and of positional embeddings only marginally mitigates such problems. Our results suggest that even though document-level MT is computationally feasible, it does not yet match the performance of sentence-based MT.

Morphological Competence of LLMs: Applied to Translation of Scientific Neologisms

"Paul Lerner. Morphological Competence of LLMs: Applied to Translation of Scientific Neologisms. ChangeLing (CMU) seminars, online, https://changelinglab.github.io/. 2025."

Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs

Paul Lerner and François Yvon. 2025. Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5181–5190, Abu Dhabi, UAE. Association for Computational Linguistics.

Large Language Models (LLMs) rely on subword vocabularies to process and generate text. However, because subwords are marked as initial- or intra-word, we find that LLMs perform poorly at handling some types of affixations, which hinders their ability to generate novel (unobserved) word forms. The largest models trained on enough data can mitigate this tendency because their initial- and intra-word embeddings are aligned; in-context learning also helps when all examples are selected in a consistent way; but only morphological segmentation can achieve a near-perfect accuracy.

Towards the Machine Translation of Scientific Neologisms

Paul Lerner and François Yvon. 2025. Towards the Machine Translation of Scientific Neologisms. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5181–5190, Abu Dhabi, UAE. Association for Computational Linguistics.

Scientific research continually discovers and invents new concepts, which are then referred to by new terms, neologisms, or neonyms in this context. As the vast majority of publications are written in English, disseminating this new knowledge to the general public often requires translating these terms. However, by definition, no parallel data exist to provide such translations. Therefore, we propose to leverage term definitions as a useful source of information for the translation process. As we discuss, Large Language Models are well suited for this task and can benefit from in-context learning with co-hyponyms and terms sharing the same derivation paradigm. These models, however, are sensitive to the superficial and morphological similarity between source and target terms. Their predictions are also impacted by subword tokenization, especially for prefixed terms.

Survey of existing Human Metrics and Protocols for Document-Level Machine Translation

Maud Bénard, Natalie Kübler, Alexandra Mestivier, Joachim Minder and Lichao Zhu. Étude des Protocoles d'Évaluation Humaine pour la Traduction de Documents. Rapport D4-1.1, Projet ANR MaTOS. 2024, pp.84.

This report provides an overview of the various protocols for evaluating the quality of human translation, machine translation and/or post-editing. After a brief summary of the automatic metrics developed in NLP, we focus on the evaluation protocols implemented by humans. Psychological approaches are distinguished from textual or discursive approaches. We discuss in more detail the description of textual approaches, i.e. mainly error typologies, in theoretical, professional and pedagogical contexts for evaluating the quality of human and machine translation and post-editing. Finally, we develop the new typology adapted to these three types of production, which is being implemented in the MaTOS project. The manual for this typology is presented in the appendix.

Handling Very Long Contexts in Neural Machine Translation: a Survey

Ziqian Peng, Rachel Bawden, François Yvon. Handling Very Long Contexts in Neural Machine Translation, a Survey. Livrable D3-2.1, Projet ANR MaTOS. 2024, pp.50.

This report surveys methods for integrating extended discourse context into machine translation (MT), focusing on neural translation methods. MT systems generally translate each sentence independently of its neighbors, which leads to systematic errors resulting from an overly narrow discourse context. Various approaches have been proposed to integrate context beyond the current sentence, building on the Transformer architecture, which is the predominant architecture in MT. Recently, the introduction of large language models (LLMs) has also created new opportunities to handle long-range dependencies, giving rise to holistic approaches to translation that take an extended context into account. We discuss the challenges of translating long documents, before presenting the methods proposed for encoder-decoder architectures and LLM-based approaches, with a brief overview of efficient Transformer implementations, which subsume both types of models. In addition, we consider strategies for extending the context window in other NLP tasks; we also list recently released open-source parallel document corpora for future exploration. We conclude with a summary of current work and the main research directions.

Towards Machine Translation of Scientific Neologisms (In French)

Paul Lerner, François Yvon. Vers la traduction automatique des néologismes scientifiques. Proceedings of the 31st Conférence sur le Traitement Automatique des Langues Naturelles, pages 245-261, Toulouse, France, ATALA.

Scientific research continually discovers and invents new concepts, which are then referred to by new terms, neologisms, or neonyms in this context. As the vast majority of publications are written in English, disseminating this new knowledge in French often requires translating these terms, to avoid multiplying anglicisms that are less easily understood by the general public. We propose to explore this task using two thesauri, exploiting the definition of the term to translate it more accurately. To this end, we explore the capabilities of two large multilingual models, BLOOM and CroissantLLM, which can translate scientific terms to some extent. In particular, we show that they often use appropriate morphological procedures, but are limited by the segmentation into sub-lexical units. They are also biased by the frequency of term occurrences and surface similarities between English and French.

Document Level Machine Translation: does length matter? (In French)

Ziqian Peng, Rachel Bawden and François Yvon (2024). Document Level Machine Translation: does length matter? Proceedings of the 31st Conférence sur le Traitement Automatique des Langues Naturelles, pages 2-21, Toulouse, France, ATALA.

Today’s machine translation architectures can process long segments and go beyond the translation of isolated sentences, opening up the possibility of translating full documents. To achieve this goal, it is necessary to overcome several difficulties related to the length of source documents. In this work, we discuss document-level machine translation from an evaluation perspective, trying to answer a simple question: how can we measure whether translation performance degrades with document length? Our analysis, which compares encoder-decoder systems and a large language model using multiple metrics on a scientific document translation task, suggests that translating long documents holistically remains a challenging problem.

Les modèles Bloom pour le traitement automatique de la langue française

Rachel Bawden, Hatim Bourfoune, Bertrand Cabot, Nathan Cassereau, Pierre Cornette, Marco Naguib, Aurélie Névéol and François Yvon. Les modèles Bloom pour le traitement automatique de la langue française. 2024. Technical report.

The development of very large language models, capable of performing a wide range of automatic language processing tasks, requires developing in parallel the infrastructure needed to evaluate these models, ideally covering as many tasks as possible. Numerous benchmarks have already been compiled for the English language, making it possible to evaluate these large models from multiple angles. Several multilingual test sets are also available, with much narrower coverage, and are used to measure the ability of these models to handle multiple languages. In this paper, we present our efforts to assemble a multi-task evaluation set for French, which is then used to evaluate models from the BLOOM family. Our results confirm and complement the main evaluation results for BLOOM in English; they allow us to conclude that the performance obtained in French and English is very similar, and even better when the prompts used at inference are written in the same language as the texts to analyze.

Translate your Own: a Post-Editing Experiment in the NLP domain

Rachel Bawden, Ziqian Peng, Maud Bénard, Eric Villemonte de La Clergerie, Raphaël Esamotunu, Mathilde Huguin, Natalie Kübler, Alexandra Mestivier, Mona Michelot, Laurent Romary, Lichao Zhu and François Yvon (2024). Translate your Own: a Post-Editing Experiment in the NLP domain. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation, pages 431–443, Sheffield, UK, European Association for Machine Translation.

The improvements in neural machine translation make translation and post-editing pipelines ever more effective for a wider range of applications. In this paper, we evaluate the effectiveness of such a pipeline for the translation of scientific documents (limited here to article abstracts). Using a dedicated interface, we collect, then analyse the post-edits of approximately 350 abstracts (English→French) in the Natural Language Processing domain for two groups of post-editors: domain experts (academics encouraged to post-edit their own articles) on the one hand and trained translators on the other. Our results confirm that such pipelines can be effective, at least for high-resource language pairs. They also highlight the difference in the post-editing strategy of the two subgroups. Finally, they suggest that working on term translation is the most pressing issue to improve fully automatic translations, but that in a post-editing setup, other error types can be equally annoying for post-editors.

Translating scientific abstracts in the bio-medical domain with structure-aware models

Sadaf Abdul Rauf, François Yvon (2024). Translating scientific abstracts in the bio-medical domain with structure-aware models. Computer Speech & Language, vol. 87.

Machine Translation (MT) technologies have improved in many ways and generate usable outputs for a growing number of domains and language pairs. Yet, most sentence-based MT systems struggle with contextual dependencies, processing small chunks of text, typically sentences, in isolation from their textual context. This is likely to cause systematic errors or inconsistencies when processing long documents. While various attempts have been made to handle extended contexts in translation, the relevance of these contextual cues, especially those related to the structural organization, and the extent to which they affect translation quality remain underexplored. In this work, we explore ways to take these structural aspects into account, by integrating document structure as an extra conditioning context. Our experiments on biomedical abstracts, which are usually structured in a rigid way, suggest that this type of structural information can be useful for MT and document structure prediction. We also present in detail the impact of structural information on MT output and assess the degree to which structural information can be learned from the data.

Document-level Machine Translation for scientific texts

Ziqian Peng (2023). Document-level Machine Translation for scientific texts. Mémoire de Master, Université Paris-Saclay.

While neural machine translation has seen significant progress in recent years at the sentence level, translating full documents remains a challenge, because document-level context is hard to incorporate efficiently. Various approaches have been proposed, but most of them consider only one to three previous source and/or target sentences as the context. This is not sufficient to faithfully translate some language phenomena, like lexical consistency and document coherence, especially in some scientific texts. In this work, we conducted experiments that include the full document context and investigated the impact of all the past and future sentences on the source side with a context ablation study, on abstracts from scientific publications. Our results show that future context is more influential than the past source context, and that, in our experiments, the Transformer architecture translates the beginning of a long document much better than the end.

MaTOS Traduction automatique pour la science ouverte

Maud Bénard, Alexandra Mestivier, Natalie Kübler, Lichao Zhu, Rachel Bawden, Éric De La Clergerie, Laurent Romary, Mathilde Huguin, Jean-François Nominé, Ziqian Peng, François Yvon (2023). MaTOS Traduction automatique pour la science ouverte. Actes de l'Atelier sur l'Analyse et la Recherche de Textes Scientifiques, CORIA-TALN 2023. 5 juin 2023 Paris (France).

This contribution presents the MaTOS (Machine Translation for Open Science) project, which aims to develop new methods for the complete machine translation (MT) of scientific documents between English and French, as well as automatic metrics to evaluate the translation quality. To this end, MaTOS is interested in (a) the collection of open resources for specialised MT; (b) the description of textual coherence markers for scientific articles; (c) the development of new multilingual processing methods for documents; and (d) metrics to measure progress in document-level machine translation.