A systematic process for Mining Software Repositories: Results from a systematic literature review
M Vidoni - Information and Software Technology, 2022 - Elsevier
Abstract Context: Mining Software Repositories (MSR) is a growing area of Software
Engineering (SE) research. Since their emergence in 2004, many investigations have …
Heterogeneous network representation learning: A unified framework with survey and benchmark
Since real-world objects and their interactions are often multi-modal and multi-typed,
heterogeneous networks have been widely used as a more powerful, realistic, and generic …
Hierarchical topic mining via joint spherical tree and text embedding
Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since
topic correlations are ubiquitous in massive text corpora. To account for potential …
Discriminative topic mining via category-name guided text embedding
Mining a set of meaningful and distinctive topics automatically from massive text corpora has
broad applications. Existing topic models, however, typically work in a purely unsupervised …
Metadata-induced contrastive learning for zero-shot multi-label text classification
Large-scale multi-label text classification (LMTC) aims to associate a document with its
relevant labels from a large candidate set. Most existing LMTC approaches rely on massive …
SourceFinder: Finding malware Source-Code from publicly available repositories in GitHub
Where can we find malware source code? This question is motivated by a real need: there is
a dearth of malware source code, which impedes various types of security research. Our …
Rep2vec: Repository embedding via heterogeneous graph adversarial contrastive learning
Driven by the exponential increase of software and the advent of the pull-based
development system Git, a large amount of open-source software has emerged on various …
Effective seed-guided topic discovery by integrating multiple types of contexts
Instead of mining coherent topics from a given text corpus in a completely unsupervised
manner, seed-guided topic discovery methods leverage user-provided seed words to extract …
Code recommendation for open source software developers
Open Source Software (OSS) is forming the spines of technology infrastructures, attracting
millions of talents to contribute. Notably, it is challenging and critical to consider both the …
MATCH: Metadata-aware text classification in a large hierarchy
Multi-label text classification refers to the problem of assigning each given document its most
relevant labels from a label set. Commonly, the metadata of the given documents and the …