Workload characterization: A survey revisited

MC Calzarossa, L Massari, D Tessera - ACM Computing Surveys (CSUR …, 2016 - dl.acm.org
Workload characterization is a well-established discipline that plays a key role in many
performance engineering studies. The large-scale social behavior inherent in the …

A survey of Web crawlers for information retrieval

M Kumar, R Bhatia, D Rattan - Wiley Interdisciplinary Reviews …, 2017 - Wiley Online Library
Performance of any search engine relies heavily on its Web crawler. Web crawlers are the
programs that get webpages from the Web by following hyperlinks. These webpages are …

Understanding HTML with large language models

I Gur, O Nachum, Y Miao, M Safdari, A Huang… - ar…

Trafilatura: A web scraping library and command-line tool for text discovery and extraction

A Barbaresi - Proceedings of the 59th Annual Meeting of the …, 2021 - aclanthology.org
An essential operation in web corpus construction consists in retaining the desired content
while discarding the rest. Another challenge is finding one's way through websites. This article …

Opinion mining and information fusion: a survey

JA Balazs, JD Velásquez - Information Fusion, 2016 - Elsevier
Interest in Opinion Mining has been growing steadily in the last years, mainly
because of its great number of applications and the scientific challenge it poses …

[BOOK][B] Data-intensive text processing with MapReduce

J Lin, C Dyer - 2022 - books.google.com
Our world is being revolutionized by data-driven methods: access to large amounts of data
has generated new insights and opened exciting new opportunities in commerce, science …