CNC

Name: Czech National Corpus

Institution: Charles University

Coordinator: Mgr. Michal Křen, Ph.D.; michal.kren@ff.cuni.cz


CNC continuously maps the Czech language by building large electronic language corpora and providing access to them. As the only project of its kind, CNC focuses on broad and comprehensive data collection, covering contemporary written Czech in all its genres and varieties, spoken Czech (from across the whole Czech Republic), older Czech, and translated Czech. Given its large scope, diverse and balanced design, high processing standards, reliable metadata and high-quality linguistic annotation, CNC language data can compete with similar resources for major world languages. Crucially, the continuity of data collection enables researchers to carry out longitudinal studies of the language’s development and to study changes in language awareness and public discourse across different periods.

CNC language corpora serve as primary research material for a wide range of research topics, mainly within the social sciences and humanities (linguistics, sociology, translation studies, history, literary studies, etc.), but also in natural language processing.

CNC provides user access to the corpora through specialized analytical tools in the form of web-based applications that make work with language data both user-friendly and effective. Together with comprehensive user support (an online user forum, documentation and a knowledge base for corpus linguistics), these applications are hosted on the CNC research web portal. The portal is open access: free online registration is required to use all features, but many tools and functions are also available to unregistered users.

CNC actively cooperates with the European research infrastructure CLARIN ERIC (Common Language Resources and Technology Infrastructure) and with its Czech national node, the large research infrastructure LINDAT/CLARIAH-CZ (Digital Research Infrastructure for the Language Technologies, Arts and Humanities). CNC is an associated member of the CLARIN-CZ consortium with K-centre status and maintains active contacts with many foreign research institutions with a similar focus.