Material in the databases
This page describes the material included in the two sub-databases, BCS and Slovenian.
BCS
Add description
Some general notes
Slovenian
The list of the most common Slovenian verbs was compiled using the CLARIN.SI infrastructure, which uses NoSketch Engine to search and analyze various corpora. For the purposes of this database, we used the Gigafida 2.0 corpus. You can find general information about the corpus here and its website here.
Some general notes
Items that made it onto the list because of annotation errors in the corpus were excluded and replaced by the next verb on the list of most common verbs. One such example is ‘Hoče’. Hoče is indeed the third person singular form of the verb hoteti, but it is also the proper name of a Slovenian municipality. Since hoteti ‘to want’ was already on the list independently, the form ‘Hoče’ was excluded.
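The exclusion step can be illustrated with a minimal sketch. It assumes a frequency-ranked list of lemmas has already been exported from the corpus. The function name `top_verbs` and the sample ranking are hypothetical, chosen only for illustration, and do not reflect actual Gigafida 2.0 counts or the exact procedure we followed.

```python
def top_verbs(ranked_lemmas, excluded, n):
    """Return the first n lemmas of a frequency-ranked list, skipping excluded items.

    Skipping an excluded item means the next verb in the ranking
    takes its place, so the result still has n entries if enough remain.
    """
    selected = []
    for lemma in ranked_lemmas:
        if len(selected) >= n:
            break
        if lemma in excluded:
            continue  # mis-annotated item: drop it and move on to the next verb
        selected.append(lemma)
    return selected


# Hypothetical ranking for illustration only (not real corpus frequencies).
ranked = ["biti", "imeti", "Hoče", "hoteti", "brati"]
excluded = {"Hoče"}  # proper name wrongly annotated as a verb lemma

print(top_verbs(ranked, excluded, 3))  # ['biti', 'imeti', 'hoteti']
```

Keeping the exclusions in a separate set leaves the original ranking untouched, so the list can be regenerated if more annotation errors are found later.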
The list of verbs includes several homophonous verbs. Since the corpus is not annotated for meaning, homophonous verbs are counted as one verb. For example, the verb brati can mean ‘read’ or ‘gather, collect’. In such cases, the annotators annotated the verb for the properties associated with what they took to be its more frequent use. The same goes for prefixed versions (prebrati ‘to finish reading’ or ‘pick through’), but note that not all meanings appear with all prefixes (odbrati only means ‘collect some items from a set, separate’).