Owing to the adoption of powerful and scalable machine learning methodologies based on neural architectures, machine translation (MT) technologies have reached a new level of development and are delivering usable services for a growing base of users, ranging from professional translators who post-edit and enhance automatically translated output to lay users who rely on MT to help them read content in a language they could not otherwise understand.
Most MT systems rely heavily on parallel sentences: source sentences and their translations in a target language constitute the elementary units for training and testing. Processing sentences in isolation prevents MT systems from taking extra-sentential discourse phenomena into account, which limits the translation quality they can achieve. This is especially unsatisfactory for MT applications targeting near-publishable output quality.
Another defining property of existing MT engines is their ability to be trained end-to-end from unanalysed samples, without using additional resources or representational layers beyond basic tokenisation rules. This contrasts with the working practices of professional translators, who routinely use tools and resources such as translation memories, which store instances of past translations, and terminology management tools, which provide definitions, concordances and collocations for terms and multiword expressions. Such tools help to maintain consistency in the translation of terms and their variants, which is one aspect of document-level translation; ignoring them in MT engines can only amplify the issues related to the lack of global context described above.
While the technical challenges to improving these two aspects are genuine, one critical issue that is currently slowing down progress is the lack of well-defined metrics to measure it. Reference-based automatic metrics such as BLEU have been instrumental in the development of statistical approaches in MT: in spite of their well-documented flaws, these metrics are still shaping developments in the field. As they only (a) reward surface similarities between automatic and human translations and (b) deliver global scores, they are mostly unable to faithfully assess progress towards more consistent translation outputs or a better use of the appropriate terminology.
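The limitation described above can be illustrated with a minimal sketch. It does not implement BLEU itself; it computes only a pooled, clipped unigram precision (one ingredient of BLEU-style metrics) over an invented two-sentence document. The function names, the toy sentences and the variant set `{"network", "net"}` are all assumptions made for illustration and are not taken from the MaTOS project.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams found in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pooled_precision(hypotheses, references, n=1):
    """Clipped n-gram precision pooled over all sentences (one global score).

    Simplified illustration only: no brevity penalty, a single n-gram order,
    one reference per sentence.
    """
    matched = 0
    total = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_counts = ngrams(hyp.split(), n)
        ref_counts = ngrams(ref.split(), n)
        total += sum(hyp_counts.values())
        # Each hypothesis n-gram is credited at most as often as it
        # occurs in the reference ("clipping").
        matched += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return matched / total if total else 0.0


def term_variants_used(sentences, variants):
    """Count which variants of a term occur across a whole document."""
    used = Counter()
    for sentence in sentences:
        for token in sentence.split():
            if token in variants:
                used[token] += 1
    return used


# Invented toy document: the reference uses "network" throughout.
references = ["the network is deep", "we train the network"]
# Output X translates the term inconsistently ("network", then "net").
output_x = ["the network is deep", "we train the net"]
# Output Y keeps the term consistent but makes an unrelated article error.
output_y = ["the network is deep", "we train a network"]

score_x = pooled_precision(output_x, references)
score_y = pooled_precision(output_y, references)
variants = {"network", "net"}  # hypothetical variant set for one term

print(score_x, score_y)
print(term_variants_used(output_x, variants))
print(term_variants_used(output_y, variants))
```

In this toy setting both outputs receive the same global score, although only one of them translates the term consistently; a document-level count of term variants is one simple way to expose the difference that the surface score hides.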
These three issues (the translation of terms and their variants, document-level translation, and MT evaluation) constitute the three main scientific challenges of the MaTOS project. We have chosen to focus on one specific application area where they can be studied in a more controlled way: the translation of scholarly papers between English and French in both directions. In this application domain, document-level phenomena, whether they stem from the typical organisation of scientific content or from the argumentation strategies of the authors, are critical and cannot be ignored when generating a translation. Furthermore, scholarly translation is a domain for which high-quality terminological resources have been developed and are easy to access. Finally, it is also a domain in which the writer’s (and translator’s) communication goals are rather transparent, which means that it might be possible to evaluate how well they have been achieved.
The automatic translation of scholarly content has been an early promise of MT, and we believe that existing technologies have the potential to finally make this prospect a concrete reality. This is not only a possibility but also a social imperative, notably in the light of the recent “Helsinki Initiative on Multilingualism in Scholarly Communication”, whose goals are also reflected in the recent Open Science programme of the French Ministry of Higher Education, Research and Innovation. These texts, like many others, strongly reiterate the need to disseminate scholarly research outcomes in more than one language. First, this will make the latest advances of research more easily accessible to the general public and to other stakeholders. Second, shedding light on research outcomes written in languages outside the mainstream of scholarly dissemination will increase the diversity and inclusiveness of the international research ecosystem. These reports suggest that MT can help in two ways that we consider equally important: by informing non-English readers of the results of scholarly work produced internationally, when it is only available in English, and by facilitating the production and dissemination of research works, when they only exist in French.
MaTOS thus aims to deliver contributions that will have genuine scientific and social impact. A first tangible outcome will be usable and shareable resources to study MT in the scholarly domain. This material will consist of term inventories for a selection of scientific domains, which will also document the scope of the terms' morphological and syntactic variation. It will also include new training corpora and annotated test corpora for these domains, which will be used to develop adapted machine translation engines. Regarding methods, we will focus on two specific issues related to the translation of terms: (a) systematically characterising their variation within a document and its translation, as a step towards modelling and automatically regenerating variants in a consistent manner; (b) developing new methods of generating translations for new terms, going beyond methods that rely on pre-existing bilingual term inventories. Methods dedicated to the processing of complete documents will include new computational models that take very long contexts into account.
Another line of research will study often overlooked aspects of scholarly documents: (a) that they are logically organised in specific sections and subsections, following a structure and a presentation of the content that reflect the practices of each scientific field; (b) that they include non-textual elements (figures, graphs, tables) which, in part, also need to be translated, arguably with dedicated methods. For this, we will develop models that can handle and regenerate this logical structure, as well as take into account the type and function of structural elements. Our results will also comprise experimental material: protocols and results of both small-scale and large-scale experiments aimed at collecting human judgements of overall translation quality, as well as new automatic metrics that we hope to disseminate widely in the community. Last, we expect that MaTOS, owing to the involvement of institutions strongly committed to the open science movement and the diffusion of scientific information, will help raise awareness of linguistic issues in the communication of science in both academic and non-academic circles, and stimulate the adoption of technologies, such as MT, to mitigate these issues.