


Material in the databases

We give a description of the material included in the two sub-bases.

BCS

Add description

Slovenian

The list of the most common Slovenian verbs was compiled using the CLARIN.SI infrastructure, which uses NoSketch Engine to search and analyze different corpora. For the purposes of this database, we used the Gigafida 2.0 corpus. You can find general information about the corpus here and its website here.

Some general notes

Items that made the list because of annotation errors in the corpus were excluded and replaced by the next verb on the list of most common verbs. One such example is ‘Hoče’. Hoče is indeed the third-person singular form of the verb hoteti, but it is also the proper name of a Slovenian municipality. Since hoteti ‘to want’ was independently on the list, the form ‘Hoče’ was excluded.
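
The exclusion-and-replacement step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name top_verbs, the toy ranking, and the cut-off of four items are hypothetical and do not reflect the actual corpus frequencies or the tooling used to build the database.

```python
from itertools import islice


def top_verbs(ranked, excluded, n):
    """Return the first n items of a frequency-ranked list, skipping excluded items.

    Skipping an item automatically pulls in the next item on the ranked list,
    so the result still contains n entries when enough candidates exist.
    """
    kept = (item for item in ranked if item not in excluded)
    return list(islice(kept, n))


# Toy ranking for illustration only (NOT real Gigafida 2.0 frequencies).
ranked = ["biti", "imeti", "Hoče", "hoteti", "iti", "brati"]

# Items that entered the list through annotation errors in the corpus.
excluded = {"Hoče"}

selected = top_verbs(ranked, excluded, 4)
print(selected)
```

With this toy input, ‘Hoče’ is skipped and the next verb on the ranking takes its place, so the selection still has four entries.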

The list of verbs includes several homophonous verbs. Since the corpus is not annotated for meaning, homophonous verbs are counted as one verb. For example, the verb brati can mean ‘read’ or ‘gather, collect’. In such cases, the annotators annotated the verb for what they took to be its more frequent use. The same goes for prefixed versions (prebrati ‘to finish reading’ or ‘pick through’), but note that not all meanings appear with all prefixes (odbrati means only ‘collect some items from a set, separate’).


material_in_the_databases.1646125523.txt.gz · Last modified: 2022/03/01 10:05 by pm