Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
MultiBench: Multiscale benchmarks for multimodal representation learning
Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …
WebUI: A dataset for enhancing visual UI understanding with web semantics
Modeling user interfaces (UIs) from visual information allows systems to make inferences
about the functionality and semantics needed to support use cases in accessibility, app …
Evaluating a large language model on searching for GUI layouts
P Brie, N Burny, A Sluÿters… - Proceedings of the ACM on …, 2023 - dl.acm.org
The field of generative artificial intelligence has seen significant advancements in recent
years with the advent of large language models, which have shown impressive results in …
Towards complete icon labeling in mobile applications
Accurately recognizing icon types in mobile applications is integral to many tasks, including
accessibility improvement, UI design search, and conversational agents. Existing research …
Never-ending learning of user interfaces
Machine learning models have been trained to predict semantic information about user
interfaces (UIs) to make apps more accessible, easier to test, and to automate. Currently …
Data-driven prototyping via natural-language-based GUI retrieval
Rapid GUI prototy** has evolved into a widely applied technique in early stages of
software development to facilitate the clarification and refinement of requirements …
DreamStruct: Understanding slides and user interfaces via synthetic data generation
Enabling machines to understand structured visuals like slides and user interfaces is
essential for making them accessible to people with disabilities. However, achieving such …
Predicting and explaining mobile UI tappability with vision modeling and saliency analysis
UI designers often correct false affordances and improve the discoverability of features when
users have trouble determining if elements are tappable. We contribute a novel system that …
Understanding screen relationships from screenshots of smartphone applications
All graphical user interfaces comprise one or more screens that may be shown to the
user depending on their interactions. Identifying different screens of an app and …