ChroniclingAmericaQA: A large-scale question answering dataset based on historical American newspaper pages

B Piryani, J Mozafari, A Jatowt - … of the 47th International ACM SIGIR …, 2024 - dl.acm.org
Question answering (QA) and Machine Reading Comprehension (MRC) tasks have
significantly advanced in recent years due to the rapid development of deep learning …

Leveraging open large language models for historical named entity recognition

CE González-Gallardo, HTH Tran, A Hamdi… - … Conference on Theory …, 2024 - Springer
The efficacy of large-scale language models (LLMs) as few-shot learners has dominated the
field of natural language processing, achieving state-of-the-art performance in most tasks …

PlayerTV: Advanced player tracking and identification for automatic soccer highlight clips

HM Solberg, MH Sarkhoosh, S Gautam… - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving field of sports analytics, the automation of targeted video processing
is a pivotal advancement. We propose PlayerTV, an innovative framework which harnesses …

Injecting temporal-aware knowledge in historical named entity recognition

CE González-Gallardo, E Boros, E Giamphy… - … on Information Retrieval, 2023 - Springer
In this paper, we address the detection of named entities in multilingual historical collections.
We argue that, besides the multiple challenges that depend on the quality of digitization (e.g. …

Deep learning approaches for information extraction from visually rich documents: datasets, challenges and methods

H Gbada, K Kalti, MA Mahjoub - International Journal on Document …, 2024 - Springer
This paper focuses on Information Extraction from Visually Rich Documents, exploring how
deep learning methods are applied in this field. For the purpose of comparing the …

MHlinker: Research on a Joint Extraction Method of Fault Entity Relationship for Mine Hoist

X Dang, H Deng, X Dong, Z Zhu, F Li, L Wang - Electronics, 2023 - mdpi.com
Triplet extraction is the key technology to automatically construct knowledge graphs.
Extracting the triplet of mechanical equipment fault relationships is of great significance in …

Confidence-Aware Document OCR Error Detection

A Hemmer, M Coustaty, N Bartolo, JM Ogier - International Workshop on …, 2024 - Springer
Optical Character Recognition (OCR) continues to face accuracy challenges that
impact subsequent applications. To address these errors, we explore the utility of OCR …

Enhancing OCR with line segmentation mask for container text recognition in container terminal

Z Zhang, Y Ding, R Li, K Chen - Engineering Applications of Artificial …, 2024 - Elsevier
Optical Character Recognition (OCR) plays a pivotal role in enhancing the
operational efficiency of container ports. However, challenges such as angle limitations and …

Text Role Classification in Scientific Charts Using Multimodal Transformers

HJ Kim, N Lell, A Scherp - … on Applications of Natural Language to …, 2024 - Springer
Text role classification involves classifying the semantic role of textual elements within
scientific charts. We propose to finetune the multimodal document layout analysis models …

Generalizability in Document Layout Analysis for Scientific Article Figure & Caption Extraction

JP Naiman - arXiv preprint arXiv:2301.10781, 2023 - arxiv.org
The lack of generalizability--in which a model trained on one dataset cannot provide
accurate results for a different dataset--is a known problem in the field of document layout …