Parallel text typology dataset
https://doi.org/10.5281/zenodo.7506220
This repository contains data accompanying the paper "Neural models can sometimes discover typological generalizations", currently being submitted for publication. It provides the following information for 1295 different languages:
language vector representations from a range of neural models
automatically derived lists of affixes
automatically derived lists of inflectional paradigms
typological features derived from annotation projection, and statistics on dependency relations
typological features derived from classifiers trained on language vectors and typological databases
automatically derived word lists
data needed for automatic evaluation of language representations (code in separate repository)
Note that the multilingual word embeddings described in the paper are very large, and therefore distributed in a separate public repository.
The computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at C3SE, partially funded by the Swedish Research Council through grant agreement no. 2018-05973. This work was funded in part by the Swedish Research Council through grant agreement no. 2019-04129.
Citation and access
Administrative information
Topic and keywords
Relations
Metadata
