Large-scale text-to-image generation models for visual artists' creative works

HK Ko, G Park, H Jeon, J Jo, J Kim, J Seo - Proceedings of the 28th …, 2023 - dl.acm.org
Large-scale Text-to-image Generation Models (LTGMs) (e.g., DALL-E), self-supervised deep
learning models trained on a huge dataset, have demonstrated the capacity for generating …

Understanding design collaboration between designers and artificial intelligence: A systematic literature review

Y Shi, T Gao, X Jiao, N Cao - Proceedings of the ACM on Human …, 2023 - dl.acm.org
Recent interest in design through the artificial intelligence (AI) lens is rapidly increasing.
Designers, as a special user group interacting with AI, have received more attention in the …

Pix2Struct: Screenshot parsing as pretraining for visual language understanding

K Lee, M Joshi, IR Turc, H Hu, F Liu… - International …, 2023 - proceedings.mlr.press
Visually-situated language is ubiquitous—sources range from textbooks with diagrams to
web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to …

AndroidInTheWild: A large-scale dataset for Android device control

C Rawles, A Li, D Rodriguez… - Advances in Neural …, 2024 - proceedings.neurips.cc
There is a growing interest in device-control systems that can interpret human natural
language instructions and execute them on a digital device by directly controlling its user …

Dark patterns at scale: Findings from a crawl of 11K shopping websites

A Mathur, G Acar, MJ Friedman, E Lucherini… - Proceedings of the …, 2019 - dl.acm.org
Dark patterns are user interface design choices that benefit an online service by coercing,
steering, or deceiving users into making unintended and potentially harmful decisions. We …

Screen2Words: Automatic mobile UI summarization with multimodal learning

B Wang, G Li, X Zhou, Z Chen, T Grossman… - The 34th Annual ACM …, 2021 - dl.acm.org
Mobile User Interface Summarization generates succinct language descriptions of mobile
screens for conveying important contents and functionalities of the screen, which can be …

WebUI: A dataset for enhancing visual UI understanding with web semantics

J Wu, S Wang, S Shen, YH Peng, J Nichols… - Proceedings of the …, 2023 - dl.acm.org
Modeling user interfaces (UIs) from visual information allows systems to make inferences
about the functionality and semantics needed to support use cases in accessibility, app …

Android in the zoo: Chain-of-action-thought for GUI agents

J Zhang, J Wu, Y Teng, M Liao, N Xu, X Xiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have led to a surge of autonomous GUI agents for smartphones,
which complete a task triggered by natural language through predicting a sequence of …

LayoutPrompter: Awaken the design ability of large language models

J Lin, J Guo, S Sun, Z Yang, JG Lou… - Advances in Neural …, 2023 - proceedings.neurips.cc
Conditional graphic layout generation, which automatically maps user constraints to high-
quality layouts, has attracted widespread attention today. Although recent works have …

Screen recognition: Creating accessibility metadata for mobile applications from pixels

X Zhang, L De Greef, A Swearngin, S White… - Proceedings of the …, 2021 - dl.acm.org
Many accessibility features available on mobile platforms require applications (apps) to
provide complete and accurate metadata describing user interface (UI) components …