Automated audio captioning: An overview of recent progress and new challenges

X Mei, X Liu, MD Plumbley, W Wang - … journal on audio, speech, and music …, 2022 - Springer
Automated audio captioning is a cross-modal translation task that aims to generate natural
language descriptions for given audio clips. This task has received increasing attention with …

What do you mean by relation extraction? A survey on datasets and study on scientific relation classification

E Bassignana, B Plank - arXiv preprint arXiv:2204.13516, 2022 - arxiv.org
Over the last five years, research on Relation Extraction (RE) witnessed extensive progress
with many new dataset releases. At the same time, setup clarity has decreased, contributing …

Who evaluates the evaluations? Objectively scoring text-to-image prompt coherence metrics with T2IScoreScore (TS2)

M Saxon, F Jahara, M Khoshnoodi, Y Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
With advances in the quality of text-to-image (T2I) models has come interest in
benchmarking their prompt faithfulness--the semantic coherence of generated images to the …

ACTUAL: Audio captioning with caption feature space regularization

Y Zhang, H Yu, R Du, ZH Tan, W Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Audio captioning aims at describing the content of audio clips with human language. Due to
the ambiguity of audio content, different people may perceive the same audio clip differently …

[PDF] Automated audio captioning with keywords guidance

X Mei, X Liu, H Liu, J Sun, MD Plumbley… - Proc. Conf. Detection …, 2022 - dcase.community
This technical report describes an automated audio captioning system we submitted to
Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2022 Task …

Learning to rank visual stories from human ranking data

CY Hsu, YW Chu, V Chen, KC Lo, C Chen… - Proceedings of the …, 2022 - aclanthology.org
Visual storytelling (VIST) is a typical vision and language task that has seen extensive
development in the natural language generation research domain. However, it remains …

Multi-Granularity Feature Fusion for Image-Guided Story Ending Generation

P Li, Q Huang, Z Li, Y Cai, F Shuang… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Image-guided Story Ending Generation aims at generating a reasonable and logical ending
given a story context and an ending-related image. The existing models have achieved …

Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning

AJ Piergiovanni, D Kim, MS Ryoo, I Noble… - arXiv preprint arXiv …, 2024 - arxiv.org
Generating automatic dense captions for videos that accurately describe their contents
remains a challenging area of research. Most current models require processing the entire …

[BOOK] Responsible AI via responsible large language models

SG Levy - 2023 - search.proquest.com
Large language models have advanced the state-of-the-art in natural language processing
and achieved success in tasks such as summarization, question answering, and text …

[PDF] Cross-domain Relation Extraction

E Bassignana - 2024 - pure.itu.dk
Language technologies are widely spreading over a diverse range of applications.
Therefore, the ability of computational systems to easily adapt to new unseen situations is …