material_in_the_databases

Differences

This shows you the differences between two versions of the page.

material_in_the_databases [2022/08/17 09:16] pm
material_in_the_databases [2023/09/06 09:49] (current) pm
Line 2:
 We give a description of the material included in the two sub-bases.
  
-===== BCS =====
+===== BCMS =====
  
-The verb selection for BCS was conducted using the corpora [[https://www.clarin.si/noske/run.cgi/corp_info?corpname=srwac&struct_attr_stats=1|srWac]], [[https://www.clarin.si/noske/run.cgi/corp_info?corpname=hrwac&struct_attr_stats=1|hrWaC]], [[https://www.clarin.si/noske/run.cgi/corp_info?corpname=bswac&struct_attr_stats=1|bsWaC]] and [[https://www.clarin.si/noske/run.cgi/corp_info?corpname=mewac&struct_attr_stats=1|meWaC]], all of which are part of [[https://www.clarin.si/noske/index.html|Clarin.si]]’s infrastructure that uses [[https://nlp.fi.muni.cz/trac/noske|NoSketch Engine]] to search and analyze different corpora. The criterion was frequency: the 3000 most frequent verbs from each of the corpora were included. The corpora of BSC had substantial overlap, which is why the number of included verbs is not 12000, as expected without any overlap, but 5300, with a number of verbs repeated in regional variants.
+The verb selection for BCMS was conducted using the corpora [[https://www.clarin.si/noske/run.cgi/corp_info?corpname=srwac&struct_attr_stats=1|srWac]], [[https://www.clarin.si/noske/run.cgi/corp_info?corpname=hrwac&struct_attr_stats=1|hrWaC]], [[https://www.clarin.si/noske/run.cgi/corp_info?corpname=bswac&struct_attr_stats=1|bsWaC]] and [[https://www.clarin.si/noske/run.cgi/corp_info?corpname=mewac&struct_attr_stats=1|meWaC]], all of which are part of [[https://www.clarin.si/noske/index.html|Clarin.si]]’s infrastructure that uses [[https://nlp.fi.muni.cz/trac/noske|NoSketch Engine]] to search and analyze different corpora. The criterion was frequency: the 3000 most frequent verbs from each of the corpora were included. The corpora of BCMS had substantial overlap, which is why the number of included verbs is not 12000, as expected without any overlap, but 5300, with a number of verbs repeated in regional variants.
 Different shapes that the same verbs have in two or each of the varieties were introduced as separate entries and annotated as variants of one verb. Some typical examples of variants are ekavian and ijekavian versions (e.g. //verovati// and //vjerovati// 'to believe'), or versions emerging from using different native suffixes to adopt
 borrowed verbs (e.g. //lajk-a-ti// and //lajk-ova-ti// 'to like (on social media)').
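
The selection procedure described above (take the most frequent verbs from each corpus, then merge the lists) can be illustrated with a minimal sketch. The function names, the toy frequency counts, and the cut-off of 2 are all invented for illustration; they are not the actual corpus data or the tooling used to build the database. The sketch only shows why overlapping top-N lists yield fewer entries than N times the number of corpora, while regional variants such as //verovati// and //vjerovati// remain separate entries.

```python
from collections import Counter


def top_verbs(freqs, n):
    """Return the n most frequent verbs from a {verb: count} mapping."""
    return [verb for verb, _ in Counter(freqs).most_common(n)]


def merge_selections(corpora, n):
    """Union of the per-corpus top-n lists.

    Maps each selected verb to the list of corpora it was selected from.
    """
    merged = {}
    for name, freqs in corpora.items():
        for verb in top_verbs(freqs, n):
            merged.setdefault(verb, []).append(name)
    return merged


# Toy counts, invented for illustration only (not real corpus frequencies).
corpora = {
    "srWaC": {"verovati": 90, "raditi": 80, "lajkovati": 5},
    "hrWaC": {"vjerovati": 85, "raditi": 70, "lajkati": 6},
}

# Hypothetical cut-off of 2 per corpus; the page describes a cut-off of 3000.
selected = merge_selections(corpora, n=2)

for verb, sources in selected.items():
    print(verb, sources)
```

In this toy case two corpora with a cut-off of 2 give three entries rather than four, because //raditi// is shared, while the ekavian and ijekavian forms stay distinct. Linking such forms as variants of one verb is a separate annotation step that the sketch does not attempt.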
material_in_the_databases.txt · Last modified: 2023/09/06 09:49 by pm