Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics

M Armbrust, A Ghodsi, R Xin… - Proceedings of …, 2021 - 15721.courses.cs.cmu.edu
This paper argues that the data warehouse architecture as we know it today will wither in the
coming years and be replaced by a new architectural pattern, the Lakehouse, which will (i) …

Tiresias: A GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - … USENIX Symposium on …, 2019 - usenix.org
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …

Data locality in high performance computing, big data, and converged systems: An analysis of the cutting edge and a future system architecture

S Usman, R Mehmood, I Katib, A Albeshri - Electronics, 2022 - mdpi.com
Big data has revolutionized science and technology leading to the transformation of our
societies. High-performance computing (HPC) provides the necessary computational power …

Ernest: Efficient performance prediction for Large-Scale advanced analytics

S Venkataraman, Z Yang, M Franklin, B Recht… - … USENIX symposium on …, 2016 - usenix.org
Recent workload trends indicate rapid growth in the deployment of machine learning,
genomics and scientific workloads on cloud computing infrastructure. However, efficiently …

Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey

K Wang, Q Zhou, S Guo, J Luo - IEEE Communications Surveys …, 2018 - ieeexplore.ieee.org
Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including query and web service. Meanwhile, cluster frameworks are rapidly developed for …

In-memory big data management and processing: A survey

H Zhang, G Chen, BC Ooi, KL Tan… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Growing main memory capacity has fueled the development of in-memory big data
management and processing. By eliminating disk I/O bottleneck, it is now possible to support …

Low latency geo-distributed data analytics

Q Pu, G Ananthanarayanan, P Bodik… - ACM SIGCOMM …, 2015 - dl.acm.org
Low latency analytics on geographically distributed datasets (across datacenters, edge
clusters) is an upcoming and increasingly important challenge. The dominant approach of …

Efficient coflow scheduling with Varys

M Chowdhury, Y Zhong, I Stoica - … of the 2014 ACM conference on …, 2014 - dl.acm.org
Communication in data-parallel applications often involves a collection of parallel flows.
Traditional techniques to optimize flow-level metrics do not perform well in optimizing such …

Making sense of performance in data analytics frameworks

K Ousterhout, R Rasti, S Ratnasamy… - … USENIX Symposium on …, 2015 - usenix.org
There has been much research devoted to improving the performance of data analytics
frameworks, but comparatively little effort has been spent systematically identifying the …

Effective straggler mitigation: Attack of the clones

G Ananthanarayanan, A Ghodsi, S Shenker… - … USENIX Symposium on …, 2013 - usenix.org
Small jobs, that are typically run for interactive data analyses in datacenters, continue to be
plagued by disproportionately long-running tasks called stragglers. In the production …