Orca: A distributed serving system for Transformer-Based generative models

GI Yu, JS Jeong, GW Kim, S Kim, BG Chun - 16th USENIX Symposium …, 2022 - usenix.org
Large-scale Transformer-based models trained for generation tasks (e.g., GPT-3) have
recently attracted huge interest, emphasizing the need for system support for serving models …

Evaluating large language models for radiology natural language processing

Z Liu, T Zhong, Y Li, Y Zhang, Y Pan, Z Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
The rise of large language models (LLMs) has marked a pivotal shift in the field of natural
language processing (NLP). LLMs have revolutionized a multitude of domains, and they …

Achieving Peak Performance for Large Language Models: A Systematic Review

ZRK Rostam, S Szénási, G Kertész - IEEE Access, 2024 - ieeexplore.ieee.org
In recent years, large language models (LLMs) have achieved remarkable success in
natural language processing (NLP). LLMs require an extreme amount of parameters to …

Transformer uncertainty estimation with hierarchical stochastic attention

J Pei, C Wang, G Szarvas - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Transformers are state-of-the-art in a wide range of NLP tasks and have also been applied
to many real-world products. Understanding the reliability and certainty of transformer …

Influential recommender system

H Zhu, H Ge, X Gu, P Zhao… - 2023 IEEE 39th …, 2023 - ieeexplore.ieee.org
Traditional recommender systems are typically passive in that they try to adapt their
recommendations to the user's historical interests. However, it is highly desirable for …

HPipe: Large Language Model Pipeline Parallelism for Long Context on Heterogeneous Cost-effective Devices

R Ma, X Yang, J Wang, Q Qi, H Sun… - Proceedings of the …, 2024 - aclanthology.org
Micro-enterprises and individual developers have emerging demands for long-sequence
analysis with powerful Large Language Models (LLMs). They try to deploy the LLMs locally, but only …

TCP: A Tensor Contraction Processor for AI Workloads (Industrial Product)

H Kim, Y Choi, J Park, B Bae, H Jeong… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
We introduce a novel tensor contraction processor (TCP) architecture that offers a paradigm
shift from traditional architectures that rely on fixed-size matrix multiplications. TCP aims at …

iServe: An Intent-based Serving System for LLMs

D Liakopoulos, T Hu, P Sinha… - arXiv preprint arXiv …, 2025 - arxiv.org
Large Language Models (LLMs) are becoming ubiquitous across industries, where
applications demand they fulfill diverse user intents. However, developers currently face the …

Dynamic batching for inference system for transformer-based generation tasks

YU Gyeongin, G Kim, JS Jeong, S Kim… - US Patent …, 2022 - Google Patents
An inference system applies a machine-learning transformer model to a batch of requests
with variable input length or variable target length or variable internal state length by …

Selective batching for inference system for transformer-based generation tasks

YU Gyeongin, G Kim, JS Jeong, S Kim… - US Patent …, 2024 - Google Patents
An inference system applies a machine-learning transformer model to a batch of requests
with variable input length or variable target length or variable internal state length by …
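The two patent entries above, like the Orca paper, concern batching transformer inference requests whose input, target, and internal-state lengths differ. As a rough illustration of that general idea (not the patented method), the sketch below shows iteration-level batching: every active request advances by one decoding step per iteration, and finished requests are dropped so new ones can join the batch. The `Request`, `step_batch`, and `decode_one_token` names are hypothetical.

```python
# Minimal sketch of iteration-level batching over variable-length requests.
# This is an illustrative assumption, not the implementation described in
# the patents or the Orca paper.
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt: list            # token ids, variable length per request
    max_new_tokens: int     # variable target length per request
    generated: list = field(default_factory=list)

    def done(self) -> bool:
        return len(self.generated) >= self.max_new_tokens


def step_batch(active, decode_one_token):
    """Run one decoding iteration for every active request, then return
    only the unfinished requests so freed slots can admit new arrivals."""
    for req in active:
        next_token = decode_one_token(req.prompt + req.generated)
        req.generated.append(next_token)
    return [r for r in active if not r.done()]
```

A serving loop would call `step_batch` repeatedly, refilling the batch from a queue between iterations; this avoids padding every request to the longest sequence and waiting for the slowest request before returning results.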