Part-of-speech tagging model: Stanza
https://doi.org/10.23695/YGW3-GF17
Models
Stanza is currently the default annotation tool used by Sparv. We provide two Stanza POS-tagging models.
stanza_eval is trained on SUC3 with Talbanken_SBX_dev as dev set. The advantage of this model is that it can be evaluated, using Talbanken_SBX_test or SIC2. The evaluation results are reported in the table below.
Test set              Exact match   POS     MSD
Talbanken_SBX_test    0.973         0.983   0.988
SIC2                  0.918         0.932   0.957
Read more about the evaluation here.
stanza_full is trained on SUC3 + Talbanken_SBX_test + SIC2 with Talbanken_SBX_dev as dev set. We cannot evaluate the performance of this model, but we expect it to perform better than stanza_eval, or at least not worse. This is the model used by Sparv.
We updated the "pretrain" file in spring 2025. This was a minor format change.
Using the models on your own
Unzip the model you want to use and the "pretrain" file (which contains word2vec embeddings encoded in the format required by Stanza), then follow the instructions provided by Stanza.
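As a rough illustration of the step above, the sketch below points a Stanza pipeline at locally unzipped files. The directory and file names ("models", "stanza_full.pt", "pretrain.pt") and the helper functions build_pos_config and tag are hypothetical; substitute the names of the files you actually unzipped. It assumes a recent Stanza version that accepts the pos_model_path, pos_pretrain_path and download_method arguments, and it feeds pre-tokenized input so that no tokenizer model is needed. Depending on your Stanza version, further setup may be required, so treat this as a starting point rather than the official procedure.

```python
from pathlib import Path


def build_pos_config(model_dir, model_file, pretrain_file):
    """Return keyword arguments for stanza.Pipeline that point at local files."""
    model_dir = Path(model_dir)
    return {
        "lang": "sv",
        "processors": "tokenize,pos",
        # Input is given as lists of tokens, so no tokenizer model is loaded.
        "tokenize_pretokenized": True,
        "pos_model_path": str(model_dir / model_file),
        "pos_pretrain_path": str(model_dir / pretrain_file),
        # Rely on local files only; do not attempt any download.
        "download_method": None,
    }


def tag(sentences, config):
    """Tag pre-tokenized sentences and return (text, upos, xpos, feats) tuples."""
    import stanza  # imported here so build_pos_config works without Stanza installed

    nlp = stanza.Pipeline(**config)
    doc = nlp(sentences)
    return [
        [(word.text, word.upos, word.xpos, word.feats) for word in sentence.words]
        for sentence in doc.sentences
    ]


if __name__ == "__main__":
    # Hypothetical directory and file names; replace with your unzipped files.
    config = build_pos_config("models", "stanza_full.pt", "pretrain.pt")
    for sentence in tag([["Det", "här", "är", "en", "mening", "."]], config):
        for token in sentence:
            print(token)
```

Which of the printed fields carries the POS tag and which carries the MSD depends on how the models were trained, so inspect the output for a sentence you know before relying on a particular field.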
