mMARCO: A multilingual version of the MS MARCO passage ranking dataset

L Bonifacio, V Jeronymo, HQ Abonizio… - arXiv preprint arXiv …, 2021 - arxiv.org
The MS MARCO ranking dataset has been widely used for training deep learning models for
IR tasks, achieving considerable effectiveness on diverse zero-shot scenarios. However, this …

Mono vs multilingual BERT for hate speech detection and text classification: A case study in Marathi

A Velankar, H Patil, R Joshi - IAPR Workshop on Artificial Neural Networks …, 2022 - Springer
Transformers are the most eminent architectures used for a vast range of Natural Language
Processing tasks. These models are pre-trained over a large text corpus and are meant to …

L3Cube-MahaHate: A tweet-based Marathi hate speech detection dataset and BERT models

A Velankar, H Patil, A Gore, S Salunke… - arXiv preprint arXiv …, 2022 - arxiv.org
Social media platforms are used by a large number of people prominently to express their
thoughts and opinions. However, these platforms have contributed to a substantial amount …

EmotivITA at EVALITA2023: Overview of the Dimensional and Multidimensional Emotion Analysis Task.

G Gafà, F Cutugno, M Venuti - EVALITA, 2023 - ceur-ws.org
EmotivITA is the first shared task for Italian Dimensional and Multidimensional Emotion
Analysis, aiming to promote research in the field of emotion detection within the Italian …

Overview of MultiCardioNER task at BioASQ 2024 on medical speciality and language adaptation of clinical NER systems for Spanish, English and Italian

S Lima-López, E Farré-Maduell, J Rodríguez-Miret… - CLEF Working …, 2024 - ceur-ws.org
Transformers and large language models (LLMs) are increasingly used for clinical data
analysis, mostly in English, but also in many other languages used within medical care …

Detection of Bangla hate comments and cyberbullying in social media using NLP and transformer models

MIH Emon, KN Iqbal, MHK Mehedi… - … on Advances in …, 2022 - Springer
Hate speech and cyberbullying detection on social media is one of the most trending natural
language processing tasks in the current scenario. However, due to the lack of resources, a …

MultiMediate'24: Multi-Domain Engagement Estimation

P Müller, M Balazia, T Baur, M Dietz, A Heimerl… - Proceedings of the …, 2024 - dl.acm.org
Estimating the momentary level of participant's engagement is an important prerequisite for
assistive systems that support human interactions. Previous work has addressed this task in …

ConFit: Improving resume-job matching using data augmentation and contrastive learning

X Yu, J Zhang, Z Yu - Proceedings of the 18th ACM Conference on …, 2024 - dl.acm.org
A reliable resume-job matching system helps a company find suitable candidates from a
pool of resumes, and helps a job seeker find relevant jobs from a list of job posts. However …

Synthetic cross-language information retrieval training data

J Mayfield, E Yang, D Lawrie, S Barham… - arXiv preprint arXiv …, 2023 - arxiv.org
A key stumbling block for neural cross-language information retrieval (CLIR) systems has
been the paucity of training data. The appearance of the MS MARCO monolingual training …

Automatic sexism detection with multilingual transformer models

M Schütz, J Boeck, D Liakhovets, D Slijepčević… - arXiv preprint arXiv …, 2021 - arxiv.org
Sexism has become an increasingly major problem on social networks during the last years.
The first shared task on sEXism Identification in Social neTworks (EXIST) at IberLEF 2021 is …