A systematic process for Mining Software Repositories: Results from a systematic literature review

M Vidoni - Information and Software Technology, 2022 - Elsevier
Abstract Context: Mining Software Repositories (MSR) is a growing area of Software
Engineering (SE) research. Since their emergence in 2004, many investigations have …

Heterogeneous network representation learning: A unified framework with survey and benchmark

C Yang, Y Xiao, Y Zhang, Y Sun… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Since real-world objects and their interactions are often multi-modal and multi-typed,
heterogeneous networks have been widely used as a more powerful, realistic, and generic …

Hierarchical topic mining via joint spherical tree and text embedding

Y Meng, Y Zhang, J Huang, Y Zhang, C Zhang… - Proceedings of the 26th …, 2020 - dl.acm.org
Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since
topic correlations are ubiquitous in massive text corpora. To account for potential …

Discriminative topic mining via category-name guided text embedding

Y Meng, J Huang, G Wang, Z Wang, C Zhang… - Proceedings of The …, 2020 - dl.acm.org
Mining a set of meaningful and distinctive topics automatically from massive text corpora has
broad applications. Existing topic models, however, typically work in a purely unsupervised …

Metadata-induced contrastive learning for zero-shot multi-label text classification

Y Zhang, Z Shen, CH Wu, B Xie, J Hao… - Proceedings of the …, 2022 - dl.acm.org
Large-scale multi-label text classification (LMTC) aims to associate a document with its
relevant labels from a large candidate set. Most existing LMTC approaches rely on massive …

SourceFinder: Finding malware Source-Code from publicly available repositories in GitHub

MOF Rokon, R Islam, A Darki, EE Papalexakis… - … on Research in Attacks …, 2020 - usenix.org
Where can we find malware source code? This question is motivated by a real need: there is
a dearth of malware source code, which impedes various types of security research. Our …

Rep2vec: Repository embedding via heterogeneous graph adversarial contrastive learning

Y Qian, Y Zhang, Q Wen, Y Ye, C Zhang - Proceedings of the 28th ACM …, 2022 - dl.acm.org
Driven by the exponential increase of software and the advent of the pull-based
development system Git, a large amount of open-source software has emerged on various …

Effective seed-guided topic discovery by integrating multiple types of contexts

Y Zhang, Y Zhang, M Michalski, Y Jiang… - Proceedings of the …, 2023 - dl.acm.org
Instead of mining coherent topics from a given text corpus in a completely unsupervised
manner, seed-guided topic discovery methods leverage user-provided seed words to extract …

Code recommendation for open source software developers

Y Jin, Y Bai, Y Zhu, Y Sun, W Wang - … of the ACM Web Conference 2023, 2023 - dl.acm.org
Open Source Software (OSS) is forming the spines of technology infrastructures, attracting
millions of talents to contribute. Notably, it is challenging and critical to consider both the …

MATCH: Metadata-aware text classification in a large hierarchy

Y Zhang, Z Shen, Y Dong, K Wang, J Han - Proceedings of the Web …, 2021 - dl.acm.org
Multi-label text classification refers to the problem of assigning each given document its most
relevant labels from a label set. Commonly, the metadata of the given documents and the …