Language Research Infrastructure in the Czech Republic

LINDAT/CLARIN – website

Hosting institution: Charles University

Partner institutions:

LINDAT/CLARIN is the Czech national node of CLARIN (Common Language Resources and Technology Infrastructure), a pan-European research infrastructure network established as an ERIC in 2012 and currently consisting of 14 countries. The aim of LINDAT/CLARIN is to provide open and free access to language research data through a certified repository, together with language technology and services for use in the social sciences and humanities (linguistics and related interdisciplinary research such as formal and computational linguistics, translatology, lexicography, literary studies, psychology, sociology, history, neurolinguistics, cognitive sciences and artificial intelligence). In addition, LINDAT/CLARIN serves language technology and application development and is compatible with the META-SHARE network, creating linguistically analysed open resources for Czech and other languages.

LINDAT/CLARIN connects linguistic resources (data) of various types and structures with language technology for the full range of natural language processing applications important in the Czech language environment. Such data are important for R&D of technologies based on machine learning (natural language processing, speech recognition and synthesis, and combined analysis of text, speech, image and other multimedia). LINDAT/CLARIN also provides space and support for persistent storage of language-related data resulting from projects of external researchers and institutions, allowing long-term preservation, easy citation and metadata transfer to the CLARIN portal.

LINDAT/CLARIN cooperates closely with the research infrastructures CNC and CESNET at the national level and with several other pan-European research infrastructures, namely DARIAH (Digital Research Infrastructure for the Arts and Humanities), EHRI (European Holocaust Research Infrastructure), ELRA (European Language Resources Association), EUDAT (European Collaborative Data Infrastructure) and LDC (Linguistic Data Consortium).

Future development

LINDAT/CLARIN has been fully operational since 2014 and holds the highest CLARIN certification (CLARIN B Centre). Maintenance and further development of the repository are key to continuous data storage and preservation. A substantial expansion of its web services and tools, aimed at all users, is planned for the near future. LINDAT/CLARIN will also strengthen its international collaboration with other national nodes within the CLARIN research infrastructure and with other pan-European research infrastructures, primarily in the social sciences and humanities (especially DARIAH, but also EHRI, EUDAT and others).

Socio-economic impact

Language technologies are essential for all areas of the European economy, especially for its inherently multilingual market. In turn, LINDAT/CLARIN language resources and services are essential for R&D in this area. In the social sciences and humanities, language is the primary means of communicating, exchanging and recording information. Text analysis and the analysis of multimedia using language technology will broadly support research in the area of national heritage and cultural identity. Open access to the data and services guarantees their unrestricted use in research and education at all levels, at universities as well as at the Academy of Sciences of the Czech Republic. LINDAT/CLARIN is taking part in various initiatives to change European intellectual property law so that language resources are easier to use in all areas of research and applications. It will also continue to serve the general public in Czech language-related needs (orthography, grammar, lexicons).