Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

MultiBench: Multiscale benchmarks for multimodal representation learning

PP Liang, Y Lyu, X Fan, Z Wu, Y Cheng… - Advances in neural …, 2021 - ncbi.nlm.nih.gov
Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …

WebUI: A dataset for enhancing visual UI understanding with web semantics

J Wu, S Wang, S Shen, YH Peng, J Nichols… - Proceedings of the …, 2023 - dl.acm.org
Modeling user interfaces (UIs) from visual information allows systems to make inferences
about the functionality and semantics needed to support use cases in accessibility, app …

Evaluating a large language model on searching for GUI layouts

P Brie, N Burny, A Sluÿters… - Proceedings of the ACM on …, 2023 - dl.acm.org
The field of generative artificial intelligence has seen significant advancements in recent
years with the advent of large language models, which have shown impressive results in …

Towards complete icon labeling in mobile applications

J Chen, A Swearngin, J Wu, T Barik, J Nichols… - Proceedings of the …, 2022 - dl.acm.org
Accurately recognizing icon types in mobile applications is integral to many tasks, including
accessibility improvement, UI design search, and conversational agents. Existing research …

Never-ending learning of user interfaces

J Wu, R Krosnick, E Schoop, A Swearngin… - Proceedings of the 36th …, 2023 - dl.acm.org
Machine learning models have been trained to predict semantic information about user
interfaces (UIs) to make apps more accessible, easier to test, and to automate. Currently …

Data-driven prototyping via natural-language-based GUI retrieval

K Kolthoff, C Bartelt, SP Ponzetto - Automated software engineering, 2023 - Springer
Rapid GUI prototyping has evolved into a widely applied technique in early stages of
software development to facilitate the clarification and refinement of requirements …

DreamStruct: Understanding slides and user interfaces via synthetic data generation

YH Peng, F Huq, Y Jiang, J Wu, XY Li… - … on Computer Vision, 2024 - Springer
Enabling machines to understand structured visuals like slides and user interfaces is
essential for making them accessible to people with disabilities. However, achieving such …

Predicting and explaining mobile UI tappability with vision modeling and saliency analysis

E Schoop, X Zhou, G Li, Z Chen, B Hartmann… - Proceedings of the 2022 …, 2022 - dl.acm.org
UI designers often correct false affordances and improve the discoverability of features when
users have trouble determining if elements are tappable. We contribute a novel system that …

Understanding screen relationships from screenshots of smartphone applications

S Feiz, J Wu, X Zhang, A Swearngin, T Barik… - Proceedings of the 27th …, 2022 - dl.acm.org
All graphical user interfaces are comprised of one or more screens that may be shown to the
user depending on their interactions. Identifying different screens of an app and …