Taja Kuzman
Title
Cited by
Year
Automatic genre identification: a survey
T Kuzman, N Ljubešić
Language Resources and Evaluation, 1-34, 2023
Cited by: 117; Year: 2023
Neural machine translation of literary texts from English to Slovene
T Kuzman, Š Vintar, M Arcan
Proceedings of the qualities of literary machine translation, 1-9, 2019
Cited by: 38; Year: 2019
Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.0
T Erjavec, M Ogrodniczuk, P Osenova, N Ljubešić, K Simov, V Grigorova, ...
CLARIN ERIC, 2021
Cited by: 35*; Year: 2021
ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification
T Kuzman, I Mozetič, N Ljubešić
arXiv preprint arXiv:2303.03953, 2023
Cited by: 30*; Year: 2023
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages
M Bañón, M Esplà-Gomis, ML Forcada, C García-Romero, T Kuzman, ...
23rd Annual Conference of the European Association for Machine Translation …, 2022
Cited by: 22; Year: 2022
Training corpus ssj500k 1.3. Slovenian language resource repository CLARIN.SI
S Krek, T Erjavec, K Dobrovoljc, S Može, N Ledinek, N Holz
Cited by: 20; Year: 2013
The GINCO training dataset for web genre identification of documents out in the wild
T Kuzman, P Rupnik, N Ljubešić
arXiv preprint arXiv:2201.03857, 2022
Cited by: 15; Year: 2022
Automatic genre identification for robust enrichment of massive text collections: Investigation of classification methods in the era of large language models
T Kuzman, I Mozetič, N Ljubešić
Machine Learning and Knowledge Extraction 5 (3), 1149-1175, 2023
Cited by: 14; Year: 2023
CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation
N Ljubešić, T Kuzman
arXiv preprint arXiv:2403.12721, 2024
Cited by: 8; Year: 2024
BENCHić-lang: A benchmark for discriminating between Bosnian, Croatian, Montenegrin and Serbian
P Rupnik, T Kuzman, N Ljubešić
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial …, 2023
Cited by: 8; Year: 2023
Training corpus ssj500k 1.4
S Krek, K Dobrovoljc, T Erjavec, S Može, N Ledinek, N Holz
Centre for Language Resources and Technologies, University of Ljubljana, 2015
Cited by: 8; Year: 2015
Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages
R van Noord, T Kuzman, P Rupnik, N Ljubešić, M Esplà-Gomis, ...
arXiv preprint arXiv:2403.08693, 2024
Cited by: 6; Year: 2024
Get to know your parallel data: Performing English variety and genre classification over MaCoCu corpora
T Kuzman, P Rupnik, N Ljubešić
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial …, 2023
Cited by: 6; Year: 2023
Assessing comparability of genre datasets via cross-lingual and cross-dataset experiments
T Kuzman, N Ljubešić, S Pollak
Jezikovne tehnologije in digitalna humanistika: zbornik konference …, 2022
Cited by: 6; Year: 2022
Verbal multiword expressions in Slovene
P Gantar, S Krek, T Kuzman
Computational and Corpus-Based Phraseology: Second International Conference …, 2017
Cited by: 6; Year: 2017
Slovene-English parallel corpus MaCoCu-sl-en 2.0
M Bañón, M Chichirau, M Esplà-Gomis, ML Forcada, A Galiano-Jiménez, ...
Jožef Stefan Institute, 2023
Cited by: 5; Year: 2023
Choice of plausible alternatives dataset in Serbian COPA-SR
N Ljubešić, M Starović, T Kuzman, T Samardžić
Jožef Stefan Institute, 2022
Cited by: 5; Year: 2022
Exploring the Impact of Lexical and Grammatical Features on Automatic Genre Identification
T Kuzman, N Ljubešić
Proceedings of the Odkrivanje Znanja in Podatkovna Skladišča - SiKDD …, 2022
Cited by: 5; Year: 2022
Annotated Corpora and Tools of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
C Ramisch, SR Cordeiro, A Savary, V Vincze, V Barbu Mititelu, A Bhatia, ...
LINDAT/CLARIAH-CZ Digital Library at the Institute of Formal and Applied …, 2018
Cited by: 5; Year: 2018
ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification. March 7, 2023
T Kuzman, I Mozetič, N Ljubešić
Reference Source, 0
Cited by: 5