Overview and importance of data quality for machine learning tasks

A Jain, H Patel, L Nagalapatti, N Gupta… - Proceedings of the 26th …, 2020 - dl.acm.org
It is well understood from literature that the performance of a machine learning (ML) model is
upper bounded by the quality of the data. While researchers and practitioners have focused …

Learning from multiple experts: Self-paced knowledge distillation for long-tailed classification

L Xiang, G Ding, J Han - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer
In real-world scenarios, data tends to exhibit a long-tailed distribution, which increases the
difficulty of training deep networks. In this paper, we propose a novel self-paced knowledge …

Not all negatives are equal: Label-aware contrastive loss for fine-grained text classification

V Suresh, DC Ong - arXiv preprint arXiv:2109.05427, 2021 - arxiv.org
Fine-grained classification involves dealing with datasets with larger number of classes with
subtle differences between them. Guiding the model to focus on differentiating dimensions …

Conditionally adaptive multi-task learning: Improving transfer learning in NLP using fewer parameters & less data

J Pilault, A Elhattami, C Pal - arXiv preprint arXiv:2009.09139, 2020 - arxiv.org
Multi-Task Learning (MTL) networks have emerged as a promising method for transferring
learned knowledge across different tasks. However, MTL must deal with challenges such as …

Domain-aligned data augmentation for low-resource and imbalanced text classification

N Stylianou, D Chatzakou, T Tsikrika… - … on Information Retrieval, 2023 - Springer
Data Augmentation approaches often use Language Models, pretrained on large quantities
of unlabeled generic data, to conditionally generate examples. However, the generated data …

What can we Learn by Predicting Accuracy?

O Risser-Maroix, B Chamand - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper seeks to answer the following question: "What can we learn by predicting
accuracy?". Indeed, classification is one of the most popular tasks in machine learning, and …

Text characterization toolkit (TCT)

D Simig, T Wang, V Dankers… - Proceedings of the …, 2022 - aclanthology.org
We present a tool, Text Characterization Toolkit (TCT), that researchers can use to study
characteristics of large datasets. Furthermore, such properties can lead to understanding the …

Extracting cause of death from verbal autopsy with deep learning interpretable methods

A Blanco, A Pérez, A Casillas… - IEEE Journal of …, 2020 - ieeexplore.ieee.org
The international standard to ascertain the cause of death is medical certification. However,
in many low and middle-income countries, the majority of deaths occur outside of health …

Binary and multiclass text classification by means of separable convolutional neural network

E Solovyeva, A Abdullah - Inventions, 2021 - mdpi.com
In this paper, the structure of a separable convolutional neural network that consists of an
embedding layer, separable convolutional layers, convolutional layer and global average …

Software module classification for commercial bug reports

CE Öztürk, EH Yilmaz, Ö Köksal… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
In this work, we curate and investigate a dataset named Turkish Software Report-Module
Classification (TSRMC), consisting of commercial software bug reports of a company …