- Academic Search

On the impact of tokenizer and parameters on n-gram based code analysis


Comparison of text preprocessing methods

CP Chai - Natural Language Engineering, 2023 - cambridge.org

Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …

Cited by 141; 4 versions


[PDF] mlr.press

Outline, then details: Syntactically guided coarse-to-fine code generation

W Zheng, SP Sharan, AK Jaiswal… - International …, 2023 - proceedings.mlr.press

For a complicated algorithm, its implementation by a human programmer usually starts with
outlining a rough control flow followed by iterative enrichments, eventually yielding carefully …

Cited by 25; 6 versions


[PDF] concordia.ca

Natural software revisited

M Rahman, D Palani, PC Rigby - 2019 IEEE/ACM 41st …, 2019 - ieeexplore.ieee.org

Recent works have concluded that software code is more repetitive and predictable, i.e. more
natural, than English texts. On re-examination, we find that much of the apparent" …

Cited by 62; 8 versions


[PDF] nsf.gov

Labeling hacker exploits for proactive cyber threat intelligence: A deep transfer learning approach

B Ampel, S Samtani, H Zhu, S Ullman… - … on intelligence and …, 2020 - ieeexplore.ieee.org

With the rapid development of new technologies, vulnerabilities are at an all-time high.
Companies are investing in developing Cyber Threat Intelligence (CTI) to counteract these …

Cited by 39; 6 versions


[PDF] arxiv.org

CodeBERT-nt: code naturalness via CodeBERT

A Khanfir, M Jimenez, M Papadakis… - 2022 IEEE 22nd …, 2022 - ieeexplore.ieee.org

Much of recent software-engineering research has investigated the naturalness of code, the
fact that code, in small code snippets, is repetitive and can be predicted using statistical …

Cited by 15; 6 versions


[PDF] arxiv.org

Exploring Tokenization Strategies and Vocabulary Sizes for Enhanced Arabic Language Models

MT Alrefaie, NE Morsy, N Samir - arXiv preprint arXiv:2403.11130, 2024 - arxiv.org

This paper presents a comprehensive examination of the impact of tokenization strategies
and vocabulary sizes on the performance of Arabic language models in downstream natural …

Cited by 3


[PDF] iaii.or.id

Hoax detection system on Twitter using feed-Forward and back-propagation neural networks classification method

CW Kencana, EB Setiawan, I Kurniawan - Jurnal RESTI (Rekayasa …, 2020 - jurnal.iaii.or.id

Social media is one of the ways to connect every individual in the world. It also used by
irresponsible people to spread a hoax. Hoax is false news that is made as if it is true. It may …

Cited by 22; 5 versions


[PDF] uni.lu

Are mutants really natural? A study on how "naturalness" helps mutant selection

M Jimenez, TT Checkam, M Cordy… - Proceedings of the 12th …, 2018 - dl.acm.org

Background: Code is repetitive and predictable in a way that is similar to the natural
language. This means that code is "natural" and this "naturalness" can be captured by …

Cited by 29; 7 versions


[PDF] uni.lu

Enabling the continuous analysis of security vulnerabilities with VulData7

M Jimenez, Y Le Traon, M Papadakis - 18th IEEE International Working …, 2018 - orbilu.uni.lu

Studies on security vulnerabilities require the analysis, investigation and comprehension of
real vulnerable code instances. However, collecting and experimenting with a sufficient …

Cited by 21; 3 versions


[PDF] ieee.org

Exploring the Landscape of Programming Language Identification with Machine Learning Approaches

A Verma, R Saha, G Kumar, A Brighente, M Conti… - IEEE …, 2025 - ieeexplore.ieee.org

The increasing complexity of modern software development necessitates tools and
methodologies for code analysis, maintenance, and migration in multi-language Integrated …

