Tools for automated analysis of cybercriminal markets
Underground forums are widely used by criminals to buy and sell a host of stolen items,
datasets, resources, and criminal services. These forums contain important resources for …
Substructure substitution: Structured data augmentation for NLP
We study a family of data augmentation methods, substructure substitution (SUB2), for
natural language processing (NLP) tasks. SUB2 generates new examples by substituting …
Improving pre-trained multilingual models with vocabulary expansion
Recently, pre-trained language models have achieved remarkable success in a broad range
of natural language processing tasks. However, in multilingual setting, it is extremely …
Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations
This article presents a discussion on the main linguistic phenomena which cause difficulties
in the analysis of user-generated texts found on the web and in social media, and proposes …
You are your photographs: Detecting multiple identities of vendors in the darknet marketplaces
Darknet markets are online services behind Tor where cybercriminals trade illegal goods
and stolen datasets. In recent years, security analysts and law enforcement start to …
Identifying products in online cybercrime marketplaces: A dataset for fine-grained domain adaptation
One weakness of machine-learned NLP models is that they typically perform poorly on out-
of-domain data. In this work, we study the task of identifying products being bought and sold …
Treebanking user-generated content: A proposal for a unified representation in Universal Dependencies
The paper presents a discussion on the main linguistic phenomena of user-generated texts
found in web and social media, and proposes a set of annotation guidelines for their …
A taxonomy for in-depth evaluation of normalization for user generated content
In this work we present a taxonomy of error categories for lexical normalization, which is the
task of translating user generated content to canonical language. We annotate a recent …
Discovery of stylistic patterns in business process textual descriptions: IT ticket case
Growing IT complexity and related problems, which are reflected in IT tickets, create a need
for new qualitative approaches. The goal is to automate the extraction of main topics …
From noisy questions to Minecraft texts: Annotation challenges in extreme syntax scenario
User-generated content presents many challenges for its automatic processing. While many
of them do come from out-of-vocabulary effects, others spawn from different linguistic …