Researchdata.se

The Swedish Culturomics Gigaword Corpus

https://doi.org/10.23695/3WMV-1Z09
One billion Swedish words from 1950 onwards.

Please cite the dataset using the following reference: Stian Rødven Eide, Nina Tahmasebi, Lars Borin. 2016. The Swedish Culturomics Gigaword Corpus: A One Billion Word Swedish Reference Dataset for NLP.

Code to extract data from the corpus, as well as usage instructions, can be downloaded from https://svn.spraakbanken.gu.se/sb-arkiv/tools/gigaword

Sentences per year for each genre

Year   fiction   government        news      science   socialmedia
1950         -      420 413           -            -             -
1960         -      424 920           -            -             -
1965         -            -      53 624            -             -
1970         -      459 867           -            -             -
1976         -            -      89 175            -             -
1977   499 030            -           -            -             -
1980         -      534 194           -            -             -
1981   307 597            -           -            -             -
1987    97 398            -     364 226            -             -
1990         -      551 988           -            -             -
1991   330 127            -           -            -             -
1992         -            -           -       44 538             -
1994         -      391 882   1 538 748            -             -
1995         -            -     514 797            -             -
1996         -            -     449 148      118 542             -
1997         -            -     980 230      125 096             -
1998         -            -     804 178      121 895         1 638
1999   194 699            -           -      113 568        40 099
2000         -            -           -      109 289        12 945
2001         -            -   1 393 257      115 012        20 006
2002         -       41 066   2 610 740      110 830       191 234
2003         -            -   2 095 700       96 778        16 382
2004         -            -   2 094 251      103 881       487 447
2005         -            -   3 013 787       85 023       985 094
2006         -       50 684   2 634 386            -       408 425
2007         -            -   2 530 808      523 102     1 638 311
2008         -            -   2 607 657            -       754 801
2009         -            -   2 795 855            -       605 194
2010         -            -   2 635 687            -       790 148
2011         -            -   2 973 928            -       957 017
2012         -            -   2 681 277      673 820     1 589 999
2013         -            -   2 501 426            -       594 982
2014         -            -           -            -       590 146
2015         -            -           -   12 293 254       187 253
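The sentence counts above use spaces as thousands separators and "-" for years without data in a genre. As a minimal sketch of how such cells could be turned into numbers and summed per genre, the Python below uses three rows copied from the table. The names `parse_count`, `genre_totals` and `sample_rows` are hypothetical and chosen for illustration; this is not the extraction code distributed with the corpus.

```python
from typing import Dict, List, Optional

# Genre columns in the order they appear in the table.
GENRES = ["fiction", "government", "news", "science", "socialmedia"]


def parse_count(cell: str) -> Optional[int]:
    """Convert a cell such as '1 538 748' to an int; '-' means no data."""
    cell = cell.strip()
    if cell == "-":
        return None
    # Drop the whitespace used as thousands separator before converting.
    return int("".join(cell.split()))


def genre_totals(rows: Dict[int, List[str]]) -> Dict[str, int]:
    """Sum sentence counts per genre over the given years, skipping empty cells."""
    totals = {genre: 0 for genre in GENRES}
    for cells in rows.values():
        for genre, cell in zip(GENRES, cells):
            count = parse_count(cell)
            if count is not None:
                totals[genre] += count
    return totals


# Three rows copied from the table: year -> one cell per genre, in GENRES order.
sample_rows = {
    1987: ["97 398", "-", "364 226", "-", "-"],
    1994: ["-", "391 882", "1 538 748", "-", "-"],
    1998: ["-", "-", "804 178", "121 895", "1 638"],
}

totals = genre_totals(sample_rows)
print(totals)
```

Extending `sample_rows` with the remaining years would give totals for the whole table; returning `None` for "-" keeps missing data distinct from a count of zero.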
