- Spatio-Temporal Modeling and Prediction of Visual Attention in Graphical User Interfaces (CHI 2016, best paper honourable mention award)
- AggreGaze: Collective Estimation of Audience Attention on Public Displays (UIST 2016, best paper honourable mention award)
- Learning an appearance-based gaze estimator from one million synthesised images (ETRA 2016, emerging investigator award)
- Emotion recognition from embedded bodily expressions and speech during dyadic interactions (ACII 2015)
- Self-Calibrating Head-Mounted Eye Trackers Using Egocentric Visual Saliency (UIST 2015)
- Appearance-Based Gaze Estimation in the Wild (CVPR 2015)
- Orbits: Enabling Gaze Interaction in Smart Watches using Moving Targets (UIST 2015, best paper award)
- Prediction of Search Targets From Fixations in Open-World Settings (CVPR 2015)
pervasive gaze estimation
Gaze estimation is an active topic of research in several fields, most notably mobile and ubiquitous computing, computer vision, and robotics. Advances in head-mounted eye tracking and egocentric vision promise continuous visual behaviour sensing in mobile everyday settings over days or even weeks. We have been advancing the state of the art in both remote and head-mounted gaze estimation for several years. For example, we have developed computer vision methods for appearance-based gaze estimation in the wild using both large-scale real-world datasets and learning-by-synthesis. We have further presented computational methods for head-mounted eye tracker self-calibration, for seamless gaze estimation across multiple hand-held and ambient displays, and for robust pupil detection and tracking under challenging real-world occlusion conditions.
selected publications
Yusuke Sugano; Andreas Bulling: Self-Calibrating Head-Mounted Eye Trackers Using Egocentric Visual Saliency. Proc. of the 28th ACM Symposium on User Interface Software and Technology (UIST 2015), pp. 363-372, 2015. doi:10.1145/2807442.2807445

Head-mounted eye tracking has significant potential for gaze-based applications such as life logging, mental health monitoring, or quantified self. However, a neglected challenge for such applications is that drift in the initial person-specific eye tracker calibration, for example caused by physical activity, can severely impact gaze estimation accuracy and, thus, system performance and user experience. We first analyse calibration drift on a new dataset of natural gaze data recorded using synchronised video-based and Electrooculography-based eye trackers of 20 users performing everyday activities in a mobile setting. Based on this analysis we present a method to automatically self-calibrate head-mounted eye trackers based on a computational model of bottom-up visual saliency. Through evaluations on the dataset we show that our method is 1) effective in reducing calibration drift in calibrated eye trackers and 2) given sufficient data, can achieve competitive gaze estimation accuracy to a calibrated eye tracker without any manual calibration.
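The core idea of saliency-based self-calibration, fitting a mapping from uncalibrated eye features to likely gaze targets suggested by saliency maxima in the scene camera, can be illustrated with a minimal sketch. The snippet below is not the authors' method; it simply fits a second-order polynomial mapping by least squares, and all names and data are hypothetical.

```python
import numpy as np

def _design_matrix(pupil_xy):
    """Second-order polynomial terms commonly used in gaze mapping."""
    x, y = pupil_xy[:, 0], pupil_xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_saliency_calibration(pupil_xy, saliency_peaks_xy):
    """Least-squares mapping from pupil positions (eye camera) to gaze
    targets approximated by saliency maxima (scene camera)."""
    A = _design_matrix(pupil_xy)
    W, *_ = np.linalg.lstsq(A, saliency_peaks_xy, rcond=None)
    return W

def apply_calibration(W, pupil_xy):
    return _design_matrix(pupil_xy) @ W

# Hypothetical data: pupil positions and saliency peaks over 1000 frames.
pupil = np.random.rand(1000, 2)
peaks = np.random.rand(1000, 2)
W = fit_saliency_calibration(pupil, peaks)
gaze = apply_calibration(W, pupil)   # (1000, 2) estimated scene-camera points
```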
Christian Lander; Sven Gehring; Antonio Krüger; Sebastian Boring; Andreas Bulling: GazeProjector: Accurate Gaze Estimation and Seamless Gaze Interaction Across Multiple Displays. Proc. of the 28th ACM Symposium on User Interface Software and Technology (UIST 2015), pp. 395-404, 2015. doi:10.1145/2807442.2807479

Mobile gaze-based interaction with multiple displays may occur from arbitrary positions and orientations. However, maintaining high gaze estimation accuracy in such situations remains a significant challenge. In this paper, we present GazeProjector, a system that combines (1) natural feature tracking on displays to determine the mobile eye tracker's position relative to a display with (2) accurate point-of-gaze estimation. GazeProjector allows for seamless gaze estimation and interaction on multiple displays of arbitrary sizes independently of the user's position and orientation to the display. In a user study with 12 participants we compare GazeProjector to established methods (here: visual on-screen markers and a state-of-the-art video-based motion capture system). We show that our approach is robust to varying head poses, orientations, and distances to the display, while still providing high gaze estimation accuracy across multiple displays without re-calibration for each variation. Our system represents an important step towards the vision of pervasive gaze-based interfaces.
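The geometric core of this task is localising the display content in the eye tracker's scene camera image and transferring the point of gaze into display coordinates. The sketch below illustrates one standard way to do this with OpenCV feature matching and a homography; it is not the GazeProjector implementation, and all parameters are illustrative.

```python
import cv2
import numpy as np

def gaze_to_display(display_img, scene_img, gaze_xy):
    """Map a gaze point from scene-camera pixels to display pixels by
    localising the (visible) display content in the scene image."""
    gray_d = cv2.cvtColor(display_img, cv2.COLOR_BGR2GRAY)
    gray_s = cv2.cvtColor(scene_img, cv2.COLOR_BGR2GRAY)

    # Detect and match natural features between scene view and display content.
    orb = cv2.ORB_create(2000)
    kp_d, des_d = orb.detectAndCompute(gray_d, None)
    kp_s, des_s = orb.detectAndCompute(gray_s, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_s, des_d), key=lambda m: m.distance)[:200]

    # Robustly estimate the scene-to-display homography.
    src = np.float32([kp_s[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_d[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    point = np.float32([[gaze_xy]])                  # shape (1, 1, 2)
    return cv2.perspectiveTransform(point, H)[0, 0]  # (x, y) on the display
```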
Xucong Zhang; Yusuke Sugano; Mario Fritz; Andreas Bulling: Appearance-Based Gaze Estimation in the Wild. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 4511-4520, 2015. doi:10.1109/CVPR.2015.7299081

Appearance-based gaze estimation is believed to work well in real-world settings, but existing datasets were collected under controlled laboratory conditions and methods were not evaluated across multiple datasets. In this work we study appearance-based gaze estimation in the wild. We present the MPIIGaze dataset, which contains 213,659 images collected from 15 participants during natural everyday laptop use over more than three months. Our dataset is significantly more variable than existing datasets with respect to appearance and illumination. We also present a method for in-the-wild appearance-based gaze estimation using multimodal convolutional neural networks that significantly outperforms state-of-the-art methods in the most challenging cross-dataset evaluation setting. We present an extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including our own. This evaluation provides clear insights and allows us to identify key research challenges of gaze estimation in the wild.
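A multimodal convolutional network of the kind described in the abstract can be sketched in a few lines: image features from the eye patch are fused with head pose before regressing the gaze direction. The layer sizes, the 36x60 input resolution, and the fusion point below are illustrative assumptions, not the published model.

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Illustrative multimodal gaze estimator: a small CNN over a grey-scale
    eye image, concatenated with a 2D head pose vector, regressing 2D gaze
    angles (yaw, pitch). All sizes are assumptions."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(20, 50, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        # For a 36x60 input, the feature map after two conv/pool stages
        # is 50 x 6 x 12.
        self.regressor = nn.Sequential(
            nn.Linear(50 * 6 * 12 + 2, 500), nn.ReLU(),
            nn.Linear(500, 2),
        )

    def forward(self, eye_img, head_pose):
        x = self.features(eye_img).flatten(1)
        x = torch.cat([x, head_pose], dim=1)  # fuse image features + head pose
        return self.regressor(x)

# Hypothetical usage on a batch of 8 normalised eye images.
model = GazeNet()
gaze = model(torch.randn(8, 1, 36, 60), torch.randn(8, 2))  # -> (8, 2)
```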
Erroll Wood; Tadas Baltrusaitis; Xucong Zhang; Yusuke Sugano; Peter Robinson; Andreas Bulling: Rendering of Eyes for Eye-Shape Registration and Gaze Estimation. Proc. of the IEEE International Conference on Computer Vision (ICCV 2015), pp. 3756-3764, 2015. doi:10.1109/ICCV.2015.428

Images of the eye are key in several computer vision problems, such as shape registration and gaze estimation. Recent large-scale supervised methods for these problems require time-consuming data collection and manual annotation, which can be unreliable. We propose synthesizing perfectly labelled photo-realistic training data in a fraction of the time. We used computer graphics techniques to build a collection of dynamic eye-region models from head scan geometry. These were randomly posed to synthesize close-up eye images for a wide range of head poses, gaze directions, and illumination conditions. We used our model's controllability to verify the importance of realistic illumination and shape variations in eye-region training data. Finally, we demonstrate the benefits of our synthesized training data (SynthesEyes) by out-performing state-of-the-art methods for eye-shape registration as well as cross-dataset appearance-based gaze estimation in the wild.
Lech Świrski; Andreas Bulling; Neil Dodgson: Robust, real-time pupil tracking in highly off-axis images. Proc. of the 7th International Symposium on Eye Tracking Research and Applications (ETRA 2012), pp. 173-176, 2012. doi:10.1145/2168556.2168585

Robust, accurate, real-time pupil tracking is a key component for online gaze estimation. On head-mounted eye trackers, existing algorithms that rely on circular pupils or contiguous pupil regions fail to detect or accurately track the pupil. This is because the pupil ellipse is often highly eccentric and partially occluded by eyelashes. We present a novel, real-time dark-pupil tracking algorithm that is robust under such conditions. Our approach uses a Haar-like feature detector to roughly estimate the pupil location, performs a k-means segmentation on the surrounding region to refine the pupil centre, and fits an ellipse to the pupil using a novel image-aware Random Sample Consensus (RANSAC) ellipse fitting. We compare our approach against existing real-time pupil tracking implementations, using a set of manually labelled infra-red dark-pupil eye images. We show that our technique has a higher pupil detection rate and greater pupil tracking accuracy.
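The pipeline structure described above (coarse localisation, intensity-based segmentation, ellipse fit) can be sketched with standard OpenCV building blocks. The snippet below is only an illustration of that structure: a dark-blob search and a plain contour-based cv2.fitEllipse stand in for the paper's Haar-like detector and image-aware RANSAC fit, and all thresholds are made up.

```python
import cv2
import numpy as np

def track_pupil(eye_gray, roi_radius=60):
    """Coarse-to-fine pupil localisation on a grey-scale eye image (sketch)."""
    # 1) Coarse localisation: the pupil is a large dark blob, so take the
    #    minimum of a heavily blurred image as a rough centre estimate.
    blurred = cv2.GaussianBlur(eye_gray, (31, 31), 0)
    _, _, (cx, cy), _ = cv2.minMaxLoc(blurred)
    x0, y0 = max(cx - roi_radius, 0), max(cy - roi_radius, 0)
    roi = eye_gray[y0:cy + roi_radius, x0:cx + roi_radius]

    # 2) Refinement: 2-class k-means on intensity separates pupil from iris/skin.
    samples = roi.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(samples, 2, None, criteria, 3,
                                    cv2.KMEANS_RANDOM_CENTERS)
    dark = int(np.argmin(centers))
    mask = (labels.reshape(roi.shape) == dark).astype(np.uint8) * 255

    # 3) Ellipse fit on the largest dark contour (the paper instead uses an
    #    image-aware RANSAC fit for robustness to eyelash occlusion).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if len(largest) < 5:  # fitEllipse needs at least five points
        return None
    (ex, ey), axes, angle = cv2.fitEllipse(largest)
    return (ex + x0, ey + y0), axes, angle
```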
visual behaviour modelling and analysis
User modelling is among the most fundamental problems in human-computer interaction and ubiquitous computing. We have shown that everyday activities, such as reading or common office activities, can be recognised from eye movements alone in both stationary and mobile settings. Eye movements are closely linked to human visual information processing and cognition, including perceptual learning, experience, and visual search. We have therefore further explored eye movement analysis as a promising approach towards the vision of cognition-aware computing: computing systems that sense and adapt to covert aspects of user state. While the vast majority of previous work focused on short-term visual behaviour lasting only a few minutes, we have contributed methods for recognising high-level contextual cues, such as social interactions, and for discovering everyday activities from long-term visual behaviour.
selected publications
Julian Steil; Andreas Bulling: Discovery of Everyday Human Activities From Long-Term Visual Behaviour Using Topic Models. Proc. of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015), pp. 75-85, 2015. doi:10.1145/2750858.2807520

Human visual behaviour has significant potential for activity recognition and computational behaviour analysis, but previous works focused on supervised methods and recognition of predefined activity classes based on short-term eye movement recordings. We propose a fully unsupervised method to discover users' everyday activities from their long-term visual behaviour. Our method combines a bag-of-words representation of visual behaviour that encodes saccades, fixations, and blinks with a latent Dirichlet allocation (LDA) topic model. We further propose different methods to encode saccades for their use in the topic model. We evaluate our method on a novel long-term gaze dataset that contains full-day recordings of natural visual behaviour of 10 participants (more than 80 hours in total). We also provide annotations for eight sample activity classes (outdoor, social interaction, focused work, travel, reading, computer work, watching media, eating) and periods with no specific activity. We show the ability of our method to discover these activities with performance competitive with that of previously published supervised methods.
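The unsupervised pipeline above boils down to building a bag-of-words representation of discretised eye movements per time window and fitting an LDA topic model to the resulting counts. A minimal sketch with scikit-learn follows; the eye movement "words" are a crude stand-in for the paper's saccade, fixation, and blink encodings.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical eye movement "documents": each time window is encoded as a
# sequence of discrete eye movement words.
windows = [
    "sacc_right_small fix_long sacc_right_small fix_long blink",
    "sacc_left_large fix_short sacc_right_small blink blink",
    "fix_long fix_long sacc_up_small fix_long sacc_down_small",
]

# Bag-of-words counts per window.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(windows)

# Discover latent "activity" topics without any labels.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_mixture = lda.fit_transform(X)  # one topic distribution per window
print(np.round(topic_mixture, 2))
```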
Andreas Bulling; Thorsten O. Zander: Cognition-Aware Computing. IEEE Pervasive Computing, 13(3), pp. 80-83, 2014. doi:10.1109/mprv.2014.42

Despite significant advances in context sensing and inference since its inception in the late 1990s, context-aware computing still doesn't implement a holistic view of all covert aspects of the user state. Here, the authors introduce the concept of cognitive context as an extension to the current notion of context with a cognitive dimension. They argue that visual behavior and brain activity are two promising sensing modalities for assessing the cognitive context and thus the development of cognition-aware computing systems.
Andreas Bulling; Jamie A. Ward; Hans Gellersen: Multimodal Recognition of Reading Activity in Transit Using Body-Worn Sensors. ACM Transactions on Applied Perception, 9(1), pp. 2:1-2:21, 2012. doi:10.1145/2134203.2134205

Reading is one of the most well studied visual activities. Vision research traditionally focuses on understanding the perceptual and cognitive processes involved in reading. In this work we recognise reading activity by jointly analysing eye and head movements of people in an everyday environment. Eye movements are recorded using an electrooculography (EOG) system; body movements using body-worn inertial measurement units. We compare two approaches for continuous recognition of reading: String matching (STR) that explicitly models the characteristic horizontal saccades during reading, and a support vector machine (SVM) that relies on 90 eye movement features extracted from the eye movement data. We evaluate both methods in a study performed with eight participants reading while sitting at a desk, standing, walking indoors and outdoors, and riding a tram. We introduce a method to segment reading activity by exploiting the sensorimotor coordination of eye and head movements during reading. Using person-independent training, we obtain an average precision for recognising reading of 88.9% (recall 72.3%) using STR and of 87.7% (recall 87.9%) using SVM over all participants. We show that the proposed segmentation scheme improves the performance of recognising reading events by more than 24%. Our work demonstrates that the joint analysis of multiple modalities is beneficial for reading recognition and opens up discussion on the wider applicability of this recognition approach to other visual and physical activities.
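The string matching (STR) approach explicitly models the signature saccade pattern of reading: a run of small forward (rightward) saccades followed by a large return sweep to the next line. Below is a toy sketch of that idea, encoding saccades as characters and matching the pattern with a regular expression; the thresholds and the encoding are illustrative only, not the published algorithm.

```python
import re

def encode_saccades(horiz_amplitudes, large_left=-6.0):
    """Encode horizontal saccade amplitudes (degrees, +right / -left) as a
    string: 'r' for a rightward saccade, 'L' for a large leftward return
    sweep, '.' otherwise. Thresholds are illustrative."""
    chars = []
    for a in horiz_amplitudes:
        if a > 0:
            chars.append("r")
        elif a < large_left:
            chars.append("L")
        else:
            chars.append(".")
    return "".join(chars)

def looks_like_reading(horiz_amplitudes, min_forward=3):
    """Reading shows runs of small forward saccades ending in a return sweep."""
    seq = encode_saccades(horiz_amplitudes)
    return re.search(r"r{%d,}L" % min_forward, seq) is not None

# Hypothetical sequences: reading a line of text vs. free viewing.
print(looks_like_reading([1.5, 1.2, 1.8, 1.4, -9.0]))   # True
print(looks_like_reading([3.0, -2.0, 4.0, -3.5, 1.0]))  # False
```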
Andreas Bulling; Jamie A. Ward; Hans Gellersen; Gerhard Tröster: Eye Movement Analysis for Activity Recognition Using Electrooculography. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(4), pp. 741-753, 2011. doi:10.1109/TPAMI.2010.86

In this work we investigate eye movement analysis as a new sensing modality for activity recognition. Eye movement data was recorded using an electrooculography (EOG) system. We first describe and evaluate algorithms for detecting three eye movement characteristics from EOG signals - saccades, fixations, and blinks - and propose a method for assessing repetitive patterns of eye movements. We then devise 90 different features based on these characteristics and select a subset of them using minimum redundancy maximum relevance feature selection (mRMR). We validate the method using an eight participant study in an office environment using an example set of five activity classes: copying a text, reading a printed paper, taking hand-written notes, watching a video, and browsing the web. We also include periods with no specific activity (the NULL class). Using a support vector machine (SVM) classifier and a person-independent (leave-one-out) training scheme, we obtain an average precision of 76.1% and recall of 70.5% over all classes and participants. The work demonstrates the promise of eye-based activity recognition (EAR) and opens up discussion on the wider applicability of EAR to other activities that are difficult, or even impossible, to detect using common sensing modalities.
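The recognition pipeline above detects saccades, fixations, and blinks in the EOG signals, derives features over time windows, and classifies them with a person-independent SVM. The sketch below compresses this into a velocity-threshold saccade detector and a handful of toy features (the paper derives 90 and selects among them with mRMR); everything here is illustrative rather than the published algorithms.

```python
import numpy as np
from sklearn.svm import SVC

def detect_saccades(eog, fs=128.0, vel_thresh=30.0):
    """Boolean mask of samples whose EOG velocity exceeds a threshold.
    A crude stand-in for the paper's saccade detection algorithm."""
    velocity = np.abs(np.gradient(eog)) * fs
    return velocity > vel_thresh

def window_features(eog_h, eog_v, fs=128.0):
    """A few illustrative eye movement features for one window."""
    sacc = detect_saccades(eog_h, fs) | detect_saccades(eog_v, fs)
    return np.array([
        sacc.mean(),                       # saccade rate proxy
        np.std(eog_h),                     # horizontal signal variability
        np.std(eog_v),                     # vertical signal variability
        np.mean(np.abs(np.diff(eog_h))),   # mean horizontal step size
    ])

# Hypothetical training data: windows of two-channel EOG with activity labels.
rng = np.random.default_rng(0)
X = np.array([window_features(rng.normal(size=512), rng.normal(size=512))
              for _ in range(40)])
y = rng.integers(0, 2, size=40)            # e.g. 0 = reading, 1 = browsing

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X[:5]))
```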
Andreas Bulling; Daniel Roggen: Recognition of Visual Memory Recall Processes Using Eye Movement Analysis. Proc. of the 13th International Conference on Ubiquitous Computing (UbiComp 2011), pp. 455-464, 2011.

Physical activity, location, as well as a person's psychophysiological and affective state are common dimensions for developing context-aware systems in ubiquitous computing. An important yet missing contextual dimension is the cognitive context that comprises all aspects related to mental information processing, such as perception, memory, knowledge, or learning. In this work we investigate the feasibility of recognising visual memory recall. We use a recognition methodology that combines minimum redundancy maximum relevance feature selection (mRMR) with a support vector machine (SVM) classifier. We validate the methodology in a dual user study with a total of fourteen participants looking at familiar and unfamiliar pictures from four picture categories: abstract, landscapes, faces, and buildings. Using person-independent training, we are able to discriminate between familiar and unfamiliar abstract pictures with a top recognition rate of 84.3% (89.3% recall, 21.0% false positive rate) over all participants. We show that eye movement analysis is a promising approach to infer the cognitive context of a person and discuss the key challenges for the real-world implementation of eye-based cognition-aware systems.
everyday gaze-based human-computer interfaces
Despite considerable advances in eye tracking, previous work on eye-based human-computer interfaces has mainly explored the use of the eyes in settings involving a single user, a single device, and WIMP-style interactions. This is despite the fact that the eyes are involved in nearly everything we do and thus potentially hold a wealth of valuable information for interactive systems. In this spirit, we have introduced smooth pursuit eye movements, the movements our eyes perform when latching onto a moving object, as a novel gaze interaction technique for dynamic interfaces. We have demonstrated the use of pursuits for eye tracker calibration as well as for interaction with smart watches. Inspired by how visual attention mediates interactions between humans, we have further proposed social gaze as a new paradigm for designing user interfaces that react to visual attention. Another important research direction is the use of gaze for interaction in unconstrained everyday settings, in particular with the increasing number of personal devices and ambient displays.
selected publications
Augusto Esteves; Eduardo Velloso; Andreas Bulling; Hans Gellersen: Orbits: Gaze Interaction in Smart Watches using Moving Targets. Proc. of the 28th ACM Symposium on User Interface Software and Technology (UIST 2015), pp. 457-466, 2015 (best paper award). doi:10.1145/2807442.2807499

We introduce Orbits, a novel gaze interaction technique that enables hands-free input on smart watches. The technique relies on moving controls to leverage the smooth pursuit movements of the eyes and to detect whether, and at which control, the user is looking. In Orbits, controls include targets that move in a circular trajectory on the face of the watch and can be selected by following the desired one for a small amount of time. We conducted two user studies to assess the technique's recognition and robustness, which demonstrated how Orbits is robust against false positives triggered by natural eye movements and how it presents a hands-free, high-accuracy way of interacting with smart watches using off-the-shelf devices. Finally, we developed three example interfaces built with Orbits: a music player, a notifications face plate and a missed call menu. Despite relying on moving controls, which are very unusual in current HCI interfaces, these were generally well received by participants in a third and final study.
Yanxia Zhang; Ming Ki Chong; Jörg Müller; Andreas Bulling; Hans Gellersen: Eye Tracking for Public Displays in the Wild. Personal and Ubiquitous Computing, 19(5), pp. 967-981, 2015. doi:10.1007/s00779-015-0866-8

In public display contexts, interactions are spontaneous and have to work without preparation. We propose gaze as a modality for such contexts, as gaze is always at the ready and a natural indicator of the user's interest. We present GazeHorizon, a system that demonstrates spontaneous gaze interaction, enabling users to walk up to a display and navigate content using their eyes only. GazeHorizon is extemporaneous and optimised for instantaneous usability by any user without prior configuration, calibration or training. The system provides interactive assistance to bootstrap gaze interaction with unaware users, employs a single off-the-shelf web camera and computer vision for person-independent tracking of the horizontal gaze direction, and maps this input to rate-controlled navigation of horizontally arranged content. We have evaluated GazeHorizon through a series of field studies, culminating in a four-day deployment in a public environment during which over a hundred passers-by interacted with it, unprompted and unassisted. We realised that since eye movements are subtle, users cannot learn gaze interaction from only observing others, and as a result guidance is required.
Mélodie Vidal; Remi Bismuth; Andreas Bulling; Hans Gellersen: The Royal Corgi: Exploring Social Gaze Interaction for Immersive Gameplay. Proc. of the 33rd ACM SIGCHI Conference on Human Factors in Computing Systems (CHI 2015), pp. 115-124, 2015. doi:10.1145/2702123.2702163

The eyes are a rich channel for non-verbal communication in our daily interactions. We propose social gaze interaction as a game mechanic to enhance user interactions with virtual characters. We develop a game from the ground up in which characters are designed to be reactive to the player's gaze in social ways, such as getting annoyed when the player seems distracted or changing their dialogue depending on the player's apparent focus of attention. Results from a qualitative user study provide insights about how social gaze interaction is intuitive for users and elicits deep feelings of immersion, and highlight the players' self-consciousness of their own eye movements through their strong reactions to the characters.
Ken Pfeuffer; Mélodie Vidal; Jayson Turner; Andreas Bulling; Hans Gellersen: Pursuit Calibration: Making Gaze Calibration Less Tedious and More Flexible. Proc. of the 26th ACM Symposium on User Interface Software and Technology (UIST 2013), pp. 261-270, 2013. doi:10.1145/2501988.2501998

Eye gaze is a compelling interaction modality but requires a user calibration before interaction can commence. State-of-the-art procedures require the user to fixate on a succession of calibration markers, a task that is often experienced as difficult and tedious. We present a novel approach, pursuit calibration, that instead uses moving targets for calibration. Users naturally perform smooth pursuit eye movements when they follow a moving target, and we use correlation of eye and target movement to detect the user's attention and to sample data for calibration. Because the method knows when the user is attending to a target, the calibration can be performed implicitly, which enables more flexible design of the calibration task. We demonstrate this in application examples and user studies, and show that pursuit calibration is tolerant to interruption, can blend naturally with applications, and is able to calibrate users without their awareness.
Mélodie Vidal; Andreas Bulling; Hans Gellersen: Pursuits: Spontaneous Interaction with Displays based on Smooth Pursuit Eye Movement and Moving Targets. Proc. of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2013), pp. 439-448, 2013. doi:10.1145/2493432.2493477

Although gaze is an attractive modality for pervasive interactions, the real-world implementation of eye-based interfaces poses significant challenges, such as calibration. We present Pursuits, an innovative interaction technique that enables truly spontaneous interaction with eye-based interfaces. A user can simply walk up to the screen and readily interact with moving targets. Instead of being based on gaze location, Pursuits correlates eye pursuit movements with objects dynamically moving on the interface. We evaluate the influence of target speed, number and trajectory and develop guidelines for designing Pursuits-based interfaces. We then describe six realistic usage scenarios and implement three of them to evaluate the method in a usability study and a field study. Our results show that Pursuits is a versatile and robust technique and that users can interact with Pursuits-based interfaces without prior knowledge or preparation phase.
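Pursuits (like Orbits and Pursuit Calibration above) selects among moving targets by correlating the recent gaze trajectory with each target's trajectory rather than by absolute gaze position. A minimal sketch of that correlation step follows; the window length, the correlation threshold, and the way the x and y components are combined are illustrative choices, not the published parameters.

```python
import numpy as np

def pursuit_selection(gaze_xy, target_trajectories, threshold=0.8):
    """Return the index of the moving target whose trajectory correlates most
    strongly with the gaze trajectory over the current window, or None if no
    correlation exceeds the threshold.
    gaze_xy: (T, 2) array; target_trajectories: list of (T, 2) arrays."""
    best_idx, best_corr = None, threshold
    for i, traj in enumerate(target_trajectories):
        # Correlate x and y components separately and combine conservatively.
        cx = np.corrcoef(gaze_xy[:, 0], traj[:, 0])[0, 1]
        cy = np.corrcoef(gaze_xy[:, 1], traj[:, 1])[0, 1]
        corr = min(cx, cy)
        if corr > best_corr:
            best_idx, best_corr = i, corr
    return best_idx

# Hypothetical usage: two targets on circular orbits, noisy gaze following
# the first one over a 60-sample window.
t = np.linspace(0, 2 * np.pi, 60)
target_a = np.column_stack([np.cos(t), np.sin(t)])
target_b = np.column_stack([np.cos(-t + 1.0), np.sin(-t + 1.0)])
gaze = target_a + np.random.normal(scale=0.05, size=target_a.shape)
print(pursuit_selection(gaze, [target_a, target_b]))  # most likely 0
```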