Social media data for conservation science: A methodological overview
Improved understanding of human-nature interactions is crucial to conservation science and
practice, but collecting relevant data remains challenging. Recently, social media have …
A comprehensive survey of scene graphs: Generation and application
A scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …
A metaverse: Taxonomy, components, applications, and open challenges
SM Park, YG Kim - IEEE Access, 2022 - ieeexplore.ieee.org
Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …
Semantic communications: Principles and challenges
Semantic communication, regarded as the breakthrough beyond the Shannon paradigm,
aims at the successful transmission of semantic information conveyed by the source rather …
MERLOT Reserve: Neural script knowledge through vision and language and sound
As humans, we navigate a multimodal world, building a holistic understanding from all our
senses. We introduce MERLOT Reserve, a model that represents videos jointly over time …
The All-Seeing Project V2: Towards general relation comprehension of the open world
We present the All-Seeing Project V2: a new model and dataset designed for
understanding object relations in images. Specifically, we propose the All-Seeing Model V2 …
CPT: Colorful prompt tuning for pre-trained vision-language models
Vision-Language Pre-training (VLP) models have shown promising capabilities in
grounding natural language in image data, facilitating a broad range of cross-modal tasks …
Causal intervention for weakly-supervised semantic segmentation
We present a causal inference framework to improve Weakly-Supervised Semantic
Segmentation (WSSS). Specifically, we aim to generate better pixel-level pseudo-masks by …
Imagine that! abstract-to-intricate text-to-image synthesis with scene graph hallucination diffusion
In this work, we investigate the task of text-to-image (T2I) synthesis under the
abstract-to-intricate setting, i.e., generating intricate visual content from simple abstract text prompts …
Unbiased scene graph generation from biased training
Today's scene graph generation (SGG) task is still far from practical, mainly due to the
severe training bias, e.g., collapsing diverse "human walk on/sit on/lay on beach" into "human …