* Source text word counts (STWC) only. The share of LINST selected for LETRINT 0, the share of LETRINT 0 selected for LETRINT 1, and the share of LETRINT 1 selected for LETRINT 1+ are also based on STWC.


Mapping and analyzing multilingual text production of international organizations




The LETRINT corpora are four trilingual textual datasets: one comparable corpus (LINST) and three parallel corpora (LETRINT 0, LETRINT 1 and LETRINT 1+). Their scope and features are determined by the goals of the LETRINT project. They comprise documents published in 2005, 2010 and 2015 in English, French and Spanish by the four main European Union institutions, the United Nations and its International Court of Justice, and the World Trade Organization. The following infographic explains the composition of each corpus and the methodology used to build it. For further details, click on the “+” sign that appears at the top of each page. This resource provides a comprehensive overview of the corpora but does not give access to the corpora themselves; for that purpose, a freely accessible corpus query interface, LETRINT-Q, has been developed. You can also access an information sheet about the LETRINT corpora by clicking on “Overview” while exploring the infographic.


