[HTML] Strategies and principles of distributed machine learning on big data

EP Xing, Q Ho, P Xie, D Wei - Engineering, 2016 - Elsevier
The rise of big data has led to new demands for machine learning (ML) systems to learn
complex models, with millions to billions of parameters, that promise adequate capacity to …

Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey

H Jelodar, Y Wang, C Yuan, X Feng, X Jiang… - Multimedia tools and …, 2019 - Springer
Topic modeling is one of the most powerful techniques in text mining for data mining, latent
data discovery, and finding relationships among data and text documents. Researchers …

PipeDream: Fast and efficient pipeline parallel DNN training

A Harlap, D Narayanan, A Phanishayee… - arXiv preprint arXiv …, 2018 - arxiv.org
PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes
computation by pipelining execution across multiple machines. Its pipeline parallel …
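The pipelining idea in the PipeDream snippet can be illustrated with a small sketch. It assumes a model split into consecutive stages (one per machine) and a batch split into micro-batches. The hypothetical function `pipeline_forward_schedule` only computes which micro-batch each stage runs at each time step for forward passes. PipeDream's actual scheduling, which also interleaves backward passes, is not reproduced here.

```python
from typing import List, Optional


def pipeline_forward_schedule(num_stages: int, num_microbatches: int) -> List[List[Optional[int]]]:
    """Hypothetical helper: schedule[t][s] is the micro-batch that stage s
    processes at time step t, or None if that stage is idle."""
    total_steps = num_stages + num_microbatches - 1  # fill + steady state + drain
    schedule: List[List[Optional[int]]] = []
    for t in range(total_steps):
        row: List[Optional[int]] = []
        for s in range(num_stages):
            m = t - s  # micro-batch m reaches stage s exactly s steps after entering stage 0
            row.append(m if 0 <= m < num_microbatches else None)
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for t, row in enumerate(pipeline_forward_schedule(num_stages=3, num_microbatches=4)):
        print(t, row)
```

For 3 stages and 4 micro-batches, stages sit idle only while the pipeline fills (steps 0 and 1) and drains (steps 4 and 5). At steps 2 and 3, all three stages work on different micro-batches concurrently.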

Distributionally robust language modeling

Y Oren, S Sagawa, TB Hashimoto, P Liang - arXiv preprint arXiv …, 2019 - arxiv.org
Language models are generally trained on data spanning a wide range of topics (e.g., news,
reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g. …

Petuum: A new platform for distributed machine learning on big data

EP Xing, Q Ho, W Dai, JK Kim, J Wei, S Lee… - Proceedings of the 21th …, 2015 - dl.acm.org
How can one build a distributed framework that allows efficient deployment of a wide
spectrum of modern advanced machine learning (ML) programs for industrial-scale …

Federated latent Dirichlet allocation: A local differential privacy based framework

Y Wang, Y Tong, D Shi - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org
Latent Dirichlet Allocation (LDA) is a widely adopted topic model for industrial-
grade text mining applications. However, its performance heavily relies on the collection of …

[PDF] DocChat: An information retrieval approach for chatbot engines using unstructured documents

Z Yan, N Duan, J Bao, P Chen, M Zhou… - Proceedings of the …, 2016 - aclanthology.org
Most current chatbot engines are designed to reply to user utterances based on existing
utterance-response (or QR) pairs. In this paper, we present DocChat, a novel information …

Introducing an interpretable deep learning approach to domain-specific dictionary creation: A use case for conflict prediction

S Häffner, M Hofer, M Nagl, J Walterskirchen - Political Analysis, 2023 - cambridge.org
Recent advancements in natural language processing (NLP) methods have significantly
improved their performance. However, more complex NLP models are more difficult to …

Heterogeneous latent topic discovery for semantic text mining

Y Li, D Jiang, R Lian, X Wu, C Tan… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In order to mine latent semantics from text data, word embedding and topic modeling are two
major methodologies in the industry. From a pragmatic perspective, each of these two lines …

Toward understanding the impact of staleness in distributed machine learning

W Dai, Y Zhou, N Dong, H Zhang, EP Xing - arXiv preprint arXiv …, 2018 - arxiv.org
Many distributed machine learning (ML) systems adopt the non-synchronous execution in
order to alleviate the network communication bottleneck, resulting in stale parameters that …
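The effect of stale parameters described in this snippet can be shown on a toy problem. The sketch below uses assumptions chosen purely for illustration: a one-dimensional quadratic objective, a fixed delay, and a hypothetical helper `sgd_with_staleness`. Each update uses a gradient computed from parameters that are `staleness` iterations old, which mimics a worker reading an outdated copy of the model. It is not the analysis method of the cited paper.

```python
from typing import List


def sgd_with_staleness(staleness: int, lr: float = 0.5, steps: int = 200, w0: float = 1.0) -> float:
    """Hypothetical helper: gradient descent on f(w) = 0.5 * w**2 where each
    step uses the gradient at parameters that are `staleness` steps old.
    Returns |w| after `steps` updates."""
    history: List[float] = [w0]  # history[t] holds w_t
    for _ in range(steps):
        t = len(history) - 1
        stale_w = history[max(0, t - staleness)]  # outdated parameters seen by a lagging worker
        grad = stale_w  # f'(w) = w for f(w) = 0.5 * w**2
        history.append(history[t] - lr * grad)  # update is applied to the current parameters
    return abs(history[-1])


if __name__ == "__main__":
    for s in (0, 1, 2, 4):
        print(f"staleness={s}: |w_final| = {sgd_with_staleness(s):.3e}")
```

In this recurrence, a step size that converges with fresh gradients can diverge once the delay grows. With `lr=0.5`, delays of 0, 1, and 2 converge, while a delay of 4 diverges. This illustrates why systems that tolerate staleness typically bound it or reduce the step size.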