In response to the war in Ukraine, academics from the LINDAT/CLARIAH-CZ large research infrastructure have developed, in record time, a publicly available automatic translator between Czech and Ukrainian. The translator is freely available as part of the CUBBITT tool, which was introduced to the public less than two years ago. The tool uses neural networks and translates between several languages. Scientists are currently working on extending it to other languages while developing new methods that will allow even higher-quality translation. The Czech-Ukrainian and Ukrainian-Czech versions were created as a result of a hackathon to help refugees from Ukraine overcome the language barrier and to ease their contact with the Czech environment.
“Preliminary test results show that the quality of Ukrainian-Czech translation is higher than that of, for example, the Google Translate system, mainly due to new machine learning methods originally created for Czech-English translation,” say the creators of the Czech-Ukrainian translator. Another advantage of the Prague system is that, unlike other freely available online systems, it does not use English as an intermediate step but translates between Czech and Ukrainian directly. The Ukrainian version of CUBBITT has a simple interface into which the text to be translated can be typed or pasted. The tool shows the Ukrainian version of a Czech text not only in Cyrillic but also in the Latin alphabet, which further helps mutual understanding.
LINDAT – Technology for Digital Humanities
LINDAT/CLARIAH-CZ is a Czech large research infrastructure for language technologies, arts and humanities, based at the Faculty of Mathematics and Physics of Charles University. It covers the participation of the Czech Republic in the European Research Infrastructure Consortia CLARIN ERIC (Common Language Resources and Technology Infrastructure) and DARIAH ERIC (Digital Research Infrastructure for the Arts and Humanities). It engages in international cooperation between similar research infrastructures and directly between institutions in all humanities disciplines, and it emphasizes digital and interdisciplinary methods of processing, including modern methods of machine learning and artificial intelligence. Part of the training data needed to develop the automatic Czech-Ukrainian translation tool was supplied to researchers from the Institute of Formal and Applied Linguistics by another language-oriented large research infrastructure, the Czech National Corpus.