Google Scholar

Tools for automated analysis of cybercriminal markets

RS Portnoff, S Afroz, G Durrett, JK Kummerfeld… - Proceedings of the 26th …, 2017 - dl.acm.org

Underground forums are widely used by criminals to buy and sell a host of stolen items,
datasets, resources, and criminal services. These forums contain important resources for …

Cited by 112; 14 versions

[PDF] arxiv.org

Substructure substitution: Structured data augmentation for NLP

H Shi, K Livescu, K Gimpel - arXiv preprint arXiv:2101.00411, 2021 - arxiv.org

We study a family of data augmentation methods, substructure substitution (SUB2), for
natural language processing (NLP) tasks. SUB2 generates new examples by substituting …

Cited by 48; 4 versions

[PDF] arxiv.org

Improving pre-trained multilingual models with vocabulary expansion

H Wang, D Yu, K Sun, J Chen, D Yu - arXiv preprint arXiv:1909.12440, 2019 - arxiv.org

Recently, pre-trained language models have achieved remarkable success in a broad range
of natural language processing tasks. However, in multilingual setting, it is extremely …

Cited by 41; 8 versions

[PDF] springer.com

Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations

M Sanguinetti, C Bosco, L Cassidy, Ö Çetinoğlu… - Language Resources …, 2023 - Springer

This article presents a discussion on the main linguistic phenomena which cause difficulties
in the analysis of user-generated texts found on the web and in social media, and proposes …

Cited by 29; 21 versions

[PDF] acm.org

You are your photographs: Detecting multiple identities of vendors in the darknet marketplaces

X Wang, P Peng, C Wang, G Wang - Proceedings of the 2018 on Asia …, 2018 - dl.acm.org

Darknet markets are online services behind Tor where cybercriminals trade illegal goods
and stolen datasets. In recent years, security analysts and law enforcement start to …

Cited by 43; 8 versions

[PDF] arxiv.org

Identifying products in online cybercrime marketplaces: A dataset for fine-grained domain adaptation

G Durrett, JK Kummerfeld, T Berg-Kirkpatrick… - arXiv preprint arXiv …, 2017 - arxiv.org

One weakness of machine-learned NLP models is that they typically perform poorly on out-
of-domain data. In this work, we study the task of identifying products being bought and sold …

Cited by 36; 10 versions

[PDF] unica.it

Treebanking user-generated content: A proposal for a unified representation in Universal Dependencies

M Sanguinetti, C Bosco, L Cassidy, Ö Çetinoğlu… - Proceedings of the 12th …, 2020 - iris.unica.it

The paper presents a discussion on the main linguistic phenomena of user-generated texts
found in web and social media, and proposes a set of annotation guidelines for their …

Cited by 19; 18 versions

[PDF] rug.nl

A taxonomy for in-depth evaluation of normalization for user generated content

R van der Goot, R van Noord… - … Conference on Language …, 2018 - research.rug.nl

In this work we present a taxonomy of error categories for lexical normalization, which is the
task of translating user generated content to canonical language. We annotate a recent …

Cited by 18; 8 versions

[PDF] ssrn.com

Discovery of stylistic patterns in business process textual descriptions: IT ticket case

N Rizun, V Meister, A Revina - Innovation Management and …, 2020 - papers.ssrn.com

Growing IT complexity and related problems, which are reflected in IT tickets, create a need
for new qualitative approaches. The goal is to automate the extraction of main topics …

Cited by 14; 6 versions

[PDF] aclanthology.org

From noisy questions to Minecraft texts: Annotation challenges in extreme syntax scenario

HM Alonso, D Seddah, B Sagot - … of the 2nd Workshop on Noisy …, 2016 - aclanthology.org

User-generated content presents many challenges for its automatic processing. While many
of them do come from out-of-vocabulary effects, others spawn from different linguistic …

Cited by 18; 5 versions

Foreebank: Syntactic analysis of customer support forums

Tools for automated analysis of cybercriminal markets
