The Uppsala Russian Corpus
The Uppsala Corpus (Upsal'skij korpus russkix tekstov) consists of some 600 Russian texts with a total of one million running words (word tokens), equally divided between informative and literary prose. The informative texts are from between 1985 and 1989, while the literary texts, whose vocabulary does not date as quickly, cover a longer period, 1960-88. The corpus does not include poetry or drama.
Within the given framework, considerable effort has been made to ensure as representative and varied a corpus as possible. The informative texts are drawn from 25 different subject areas: economics, foreign affairs / foreign policy, ideology / domestic policy, party matters, Soviet society, social issues, defence, education, law, history, culture, linguistics, medicine / health care, psychology, environment / ecology, agriculture, engineering, information technology, space research, energy, biology, geology / geography, physics, chemistry and sport. Certain areas which were felt to be more important are represented by a larger volume of texts.
The literary half of the corpus comprises work by the following 40 authors: Abramov, Ajtmatov, Astaf'ev, Baklanov, Bek, Belov, Bitov, Bondarev, Dubov, Ganin, Gladyshev, Granin, Grekova, Goncharov, Iskander, Kaverin, Kazakov, Kochnev, Kozhevnikova, Nagibin, Lixanov, Lidin, Paustovskij, Pogodin, Pristavkin, Troepol'skij, Rasputin, Shcherbakova, Simonov, Solouxin, Shmelev, Tendrjakov, Tokareva, Tolstaja, Trifonov, Vasil'ev, Vorobl'ev, Zalygin and Zorin. Here, too, there is unequal representation, with a larger amount of writing by the better-known authors.
For further details about the corpus, see Lönngren, Lennart (ed.), 1993. Chastotnyj slovar' sovremennogo russkogo jazyka. (A Frequency Dictionary of Modern Russian. With a Summary in English.) Acta Universitatis Upsaliensis, Studia Slavica Upsaliensia 32. 188 pp. Uppsala. ISBN 91-554-3134-8.
Purpose:
The aim is to provide a corpus of Russian prose texts.
Data source: https://www.lingexp.uni-tuebingen.de/sfb441/b1/en/korpora.html
Citation and access
Data access level:
Creator/Principal investigator(s):
- Lennart Lönngren - University of Tromsø - Department of Language and Linguistics
- Uppsala University - Department of Modern Languages
Research principal:
Data contains personal data:
No
Citation:
Corpus
Method and outcome
Administrative information
Topic and keywords
Relations
Publications
Metadata
Version 1

University of Tromsø