Geographic Adaptation of Pretrained Language Models
While pretrained language models (PLMs) have been shown to possess a plethora of
linguistic knowledge, the existing body of research has largely neglected extralinguistic …
CLASSLA-Stanza: The next step for linguistic processing of South Slavic Languages
L Terčon, N Ljubešić - ar…
Mapping the languages of Twitter in Finland
Twitter is a popular social media platform for scholarly research, because the user-
generated content on the platform can also include geographic and temporal information …
Using social-media data to investigate morphosyntactic variation and dialect syntax in a lesser-used language: Two case studies from Welsh
D Willis - Glossa, 2020 - ora.ox.ac.uk
Data gathered from social media have been used extensively to examine lexical dialect
variation in widely used languages such as English and Spanish, but their use to date in …
Together we are stronger: Bootstrapping language technology infrastructure for South Slavic languages with CLARIN.SI
In this chapter we describe the recent developments in language technology infrastructure
building for three South Slavic languages: Slovenian, Croatian, and Serbian. These …
CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation
This paper presents a collection of highly comparable web corpora of Slovenian, Croatian,
Bosnian, Montenegrin, Serbian, Macedonian, and Bulgarian, covering thereby the whole …
How to optimize your Twitter collection: Dutch keywords for better coverage
Twitter allows API calls to retrieve one percent of all tweets at any time using a search word
list. Since some languages, including Dutch, make up less than one percent of all tweets on …
6 Data Collection and Representation for Similar Languages, Varieties and Dialects
Collections of digital text intended for research, known as language corpora, have been
used as linguistic data since the pioneering work on the Brown corpus by Francis and …
The Russian invasion of Ukraine through the lens of ex-Yugoslavian Twitter
The Russian invasion of Ukraine marks a dramatic change in international
relations globally, as well as at specific, already unstable, regions. The geographical area of …