"The Unification of Grammatical Annotation Systems in Turkic Corpora"
UniTurk Workshop
6-9 February 2014, Kazan

In the context of globalization and the integration of scientific research, the unification of grammatical annotation in corpora is very important, especially for related languages. At present there are no common principles of text annotation in Turkic corpus linguistics. In the future this will lead to significant difficulties for comparative studies and the development of Turkic parallel corpora, as well as for multilingual text processing and other theoretical and applied tasks.

Currently, we do not have any commonly accepted morphological standards for the Turkic languages, despite their structural similarity. This situation was recently discussed at the "Computer Processing of Turkic Languages" conference in Astana, Kazakhstan (2-4 October 2013).

The same morphological categories are designated differently in different Turkic languages. Developers use notation borrowed from other language groups, which is not always relevant to the specific features of Turkic languages. It should be noted that unification is needed not only within the Turkic languages but also on a wider scale: similar phenomena in structurally different languages should also be designated in the same way and under the same rules. For example, the Leipzig Glossing Rules have already become a kind of standard in typology and could be taken as a basis for the annotation of specific corpora.
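As a purely illustrative sketch of what such unification could look like in practice, the snippet below maps the native tags of two corpora onto one shared label set. The corpus names (`corpus_a`, `corpus_b`) and their native tags are hypothetical and do not describe any existing Turkic corpus; only the target labels (PL, GEN, DAT, ACC, LOC, ABL) are standard Leipzig Glossing Rules abbreviations.

```python
# Hypothetical example: the corpus names and native tags below are invented
# for illustration. The target labels are Leipzig Glossing Rules abbreviations.
TAG_MAPS = {
    "corpus_a": {"pl": "PL", "gen": "GEN", "dir": "DAT",
                 "acc": "ACC", "loc": "LOC", "abl": "ABL"},
    "corpus_b": {"Plur": "PL", "Gen": "GEN", "Dat": "DAT",
                 "Acc": "ACC", "Loc": "LOC", "Abl": "ABL"},
}


def unify(corpus, tags):
    """Translate one corpus's native tags into the shared labels.

    Raises KeyError if any tag has no mapping, so that gaps in the
    mapping table are noticed instead of being silently dropped.
    """
    mapping = TAG_MAPS[corpus]
    unknown = [t for t in tags if t not in mapping]
    if unknown:
        raise KeyError(f"unmapped tags in {corpus}: {unknown}")
    return [mapping[t] for t in tags]


# The same plural dative form, tagged differently in the two corpora,
# receives identical labels after unification.
print(unify("corpus_a", ["pl", "dir"]))
print(unify("corpus_b", ["Plur", "Dat"]))
```

With a shared label set of this kind, a query for a given category can be run across several corpora without knowing each corpus's own notation.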

Taking this into account, the Research Institute of Applied Semiotics of the Tatarstan Academy of Sciences is initiating a special workshop on the unification of grammatical annotation systems in Turkic corpora. It is expected to involve developers of Turkic corpora, typologists, and linguists with extensive experience in developing and unifying annotation systems for other language groups.

The unification of systems of case marking is not a trivial practical task: it also requires a theoretical rethinking of many traditional grammatical descriptions and the development of proposals for their unification. To improve efficiency, the workshop will therefore be held in two stages: a correspondence stage and a final stage. The correspondence stage (before the meeting):

  • Forming a working group of participants,
  • Establishing contacts with the developers of Turkic electronic corpora,
  • Creating a workshop web page where working materials are posted,
  • Conducting a preliminary analysis of the existing notation systems and drafting proposals for unification.

The working group is currently being formed.

We are pleased to announce that Vladimir Plungian has agreed to be the supervisor of the working group. The working group also includes Ayrat Gatiatullin (coordinator of the workshop), Timofey Arkhangelsky (coordinator of the Moscow group) and Bulat Khakimov. Please send all proposals of participants for inclusion in the working group to the coordinators of the workshop.

The final stage will include presentations and a round-table discussion of the prepared materials.

Abstracts of up to 4 pages of text are accepted until 8 January 2014. The materials are planned to be published before the workshop.

Working languages: English, Russian, Tatar
