Deep transfer learning & beyond: Transformer language models in information systems research
AI is widely thought to be poised to transform business, yet current perceptions of the scope
of this transformation may be myopic. Recent progress in natural language processing …
A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …
Deduplicating training data makes language models better
We find that existing language modeling datasets contain many near-duplicate examples
and long repetitive substrings. As a result, over 1% of the unprompted output of language …
Dothash: estimating set similarity metrics for link prediction and document deduplication
Metrics for set similarity are a core aspect of several data mining tasks. To remove duplicate
results in a Web search, for example, a common approach looks at the Jaccard index …
Noise-robust de-duplication at scale
Identifying near duplicates within large, noisy text corpora has a myriad of applications that
range from de-duplicating training datasets, reducing privacy risk, and evaluating test set …
Connected Components for Scaling Partial-order Blocking to Billion Entities
T Backes, S Dietze - ACM Journal of Data and Information Quality, 2024 - dl.acm.org
In entity resolution, blocking pre-partitions data for further processing by more expensive
methods. Two entity mentions are in the same block if they share identical or related …
Privacy-preserving record linkage using local sensitive hash and private set intersection
The amount of data stored in data repositories increases every year. This makes it
challenging to link records between different datasets across companies and even …
Proposed threshold-based and rule-based approaches to detecting duplicates in bibliographic database
Bibliographic databases are used to measure the performance of researchers, universities
and research institutions. Thus, high data quality is required and data duplication is avoided …
Understanding the limitations of using large language models for text generation
D Ippolito - 2023 - search.proquest.com
State-of-the-art neural language models are capable of generating incredibly fluent English
text. This success provides opportunities for novel forms of interaction, where human writers …
Privacy-preserving Fuzzy Name Matching for Sharing Financial Intelligence
Financial institutions rely on data for many operations, including a need to drive efficiency,
enhance services and prevent financial crime. Data sharing across an organisation or …