Sebastian Jaszczur
Verified email at uw.edu.pl
Title
Cited by
Year
Sparse is Enough in Scaling Transformers
S Jaszczur, A Chowdhery, A Mohiuddin, L Kaiser, W Gajewski, ...
Advances in Neural Information Processing Systems 34, 9895-9907, 2021
Cited by 94 · 2021
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
M Pióro, K Ciebiera, K Król, J Ludziejewski, S Jaszczur
arXiv preprint arXiv:2401.04081, 2024
Cited by 57 · 2024
Scaling Laws for Fine-Grained Mixture of Experts
J Krajewski, J Ludziejewski, K Adamczewski, M Pióro, M Krutul, ...
arXiv preprint arXiv:2402.07871, 2024
Cited by 35* · 2024
Neural heuristics for SAT solving
S Jaszczur, M Łuszczyk, H Michalewski
arXiv preprint arXiv:2005.13406, 2020
Cited by 14 · 2020
Use of domain knowledge and feature engineering in helping AI to play Hearthstone
P Przybyszewski, S Dziewiątkowski, S Jaszczur, M Śmiech, M Szczuka
2017 Federated Conference on Computer Science and Information Systems …, 2017
Cited by 6 · 2017
Structured packing in LLM training improves long context utilization
K Staniszewski, S Tworkowski, S Jaszczur, Y Zhao, H Michalewski, ...
arXiv preprint arXiv:2312.17296, 2023
Cited by 5 · 2023
Mixture of Tokens: Continuous MoE through Cross-Example Aggregation
S Antoniak, M Krutul, M Pióro, J Krajewski, J Ludziejewski, K Ciebiera, ...
Advances in Neural Information Processing Systems 37, 103873-103896, 2025
Cited by 4* · 2025
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient
J Ludziejewski, M Pióro, J Krajewski, M Stefaniak, M Krutul, J Małaśnicki, ...
arXiv preprint arXiv:2502.05172, 2025
2025
Sparse attention neural networks
A Chowdhery, A Mohiuddin, H Michalewski, JM Kanerva, LM Kaiser, ...
US Patent App. 17/666,400, 2022
2022
Different Rates for Different Weights: Decoupled Relative Learning Rate Schedules
J Ludziejewski, J Małaśnicki, M Pióro, M Krutul, K Ciebiera, M Stefaniak, ...