- Spatio-Temporal Modeling and Prediction of Visual Attention in Graphical User Interfaces (CHI 2016, best paper honourable mention award)
- AggreGaze: Collective Estimation of Audience Attention on Public Displays (UIST 2016, best paper honourable mention award)
- Learning an appearance-based gaze estimator from one million synthesised images (ETRA 2016, emerging investigator award)
- Emotion recognition from embedded bodily expressions and speech during dyadic interactions (ACII 2015)
- Self-Calibrating Head-Mounted Eye Trackers Using Egocentric Visual Saliency (UIST 2015)
- Appearance-Based Gaze Estimation in the Wild (CVPR 2015)
- Orbits: Enabling Gaze Interaction in Smart Watches using Moving Targets (UIST 2015, best paper award)
- Prediction of Search Targets From Fixations in Open-World Settings (CVPR 2015)
pervasive gaze estimation
Gaze estimation is an active topic of research in several fields, most notably mobile and ubiquitous computing, computer vision, and robotics. Advances in head-mounted eye tracking and egocentric vision promise continuous visual behaviour sensing in mobile everyday settings over days or even weeks. We have been advancing the state of the art in both remote and head-mounted gaze estimation for several years. For example, we have developed computer vision methods for appearance-based gaze estimation in the wild using both large-scale real-world datasets and learning-by-synthesis. We have further presented computational methods for head-mounted eye tracker self-calibration, for seamless gaze estimation across multiple hand-held and ambient displays, and for robust pupil detection and tracking under challenging real-world occlusion conditions.
selected publications
Yusuke Sugano; Andreas Bulling: Self-Calibrating Head-Mounted Eye Trackers Using Egocentric Visual Saliency. Proc. of the 28th ACM Symposium on User Interface Software and Technology (UIST 2015), pp. 363-372, 2015. doi:10.1145/2807442.2807445

Head-mounted eye tracking has significant potential for gaze-based applications such as life logging, mental health monitoring, or quantified self. However, a neglected challenge for such applications is that drift in the initial person-specific eye tracker calibration, for example caused by physical activity, can severely impact gaze estimation accuracy and, thus, system performance and user experience. We first analyse calibration drift on a new dataset of natural gaze data recorded using synchronised video-based and Electrooculography-based eye trackers of 20 users performing everyday activities in a mobile setting. Based on this analysis we present a method to automatically self-calibrate head-mounted eye trackers based on a computational model of bottom-up visual saliency. Through evaluations on the dataset we show that our method is 1) effective in reducing calibration drift in calibrated eye trackers and 2) given sufficient data, can achieve competitive gaze estimation accuracy to a calibrated eye tracker without any manual calibration.
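The core idea of saliency-based self-calibration, fitting a mapping from uncalibrated eye features to likely gaze targets suggested by saliency maxima in the scene camera, can be illustrated with a minimal sketch. The snippet below is not the authors' method; it simply fits a second-order polynomial mapping by least squares, and all names and data are hypothetical.

```python
import numpy as np

def _design_matrix(pupil_xy):
    """Second-order polynomial terms commonly used in gaze mapping."""
    x, y = pupil_xy[:, 0], pupil_xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_saliency_calibration(pupil_xy, saliency_peaks_xy):
    """Least-squares mapping from pupil positions (eye camera) to gaze
    targets approximated by saliency maxima (scene camera)."""
    A = _design_matrix(pupil_xy)
    W, *_ = np.linalg.lstsq(A, saliency_peaks_xy, rcond=None)
    return W

def apply_calibration(W, pupil_xy):
    return _design_matrix(pupil_xy) @ W

# Hypothetical data: pupil positions and saliency peaks over 1000 frames.
pupil = np.random.rand(1000, 2)
peaks = np.random.rand(1000, 2)
W = fit_saliency_calibration(pupil, peaks)
gaze = apply_calibration(W, pupil)   # (1000, 2) estimated scene-camera points
```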
Christian Lander; Sven Gehring; Antonio Krüger; Sebastian Boring; Andreas Bulling: GazeProjector: Accurate Gaze Estimation and Seamless Gaze Interaction Across Multiple Displays. Proc. of the 28th ACM Symposium on User Interface Software and Technology (UIST 2015), pp. 395-404, 2015. doi:10.1145/2807442.2807479

Mobile gaze-based interaction with multiple displays may occur from arbitrary positions and orientations. However, maintaining high gaze estimation accuracy in such situations remains a significant challenge. In this paper, we present GazeProjector, a system that combines (1) natural feature tracking on displays to determine the mobile eye tracker's position relative to a display with (2) accurate point-of-gaze estimation. GazeProjector allows for seamless gaze estimation and interaction on multiple displays of arbitrary sizes independently of the user's position and orientation to the display. In a user study with 12 participants we compare GazeProjector to established methods (here: visual on-screen markers and a state-of-the-art video-based motion capture system). We show that our approach is robust to varying head poses, orientations, and distances to the display, while still providing high gaze estimation accuracy across multiple displays without re-calibration for each variation. Our system represents an important step towards the vision of pervasive gaze-based interfaces.
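The geometric core of this task is localising the display content in the eye tracker's scene camera image and transferring the point of gaze into display coordinates. The sketch below illustrates one standard way to do this with OpenCV feature matching and a homography; it is not the GazeProjector implementation, and all parameters are illustrative.

```python
import cv2
import numpy as np

def gaze_to_display(display_img, scene_img, gaze_xy):
    """Map a gaze point from scene-camera pixels to display pixels by
    localising the (visible) display content in the scene image."""
    gray_d = cv2.cvtColor(display_img, cv2.COLOR_BGR2GRAY)
    gray_s = cv2.cvtColor(scene_img, cv2.COLOR_BGR2GRAY)

    # Detect and match natural features between scene view and display content.
    orb = cv2.ORB_create(2000)
    kp_d, des_d = orb.detectAndCompute(gray_d, None)
    kp_s, des_s = orb.detectAndCompute(gray_s, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_s, des_d), key=lambda m: m.distance)[:200]

    # Robustly estimate the scene-to-display homography.
    src = np.float32([kp_s[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_d[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    point = np.float32([[gaze_xy]])                  # shape (1, 1, 2)
    return cv2.perspectiveTransform(point, H)[0, 0]  # (x, y) on the display
```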
Xucong Zhang; Yusuke Sugano; Mario Fritz; Andreas Bulling: Appearance-Based Gaze Estimation in the Wild. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 4511-4520, 2015. doi:10.1109/CVPR.2015.7299081

Appearance-based gaze estimation is believed to work well in real-world settings, but existing datasets were collected under controlled laboratory conditions and methods were not evaluated across multiple datasets. In this work we study appearance-based gaze estimation in the wild. We present the MPIIGaze dataset, which contains 213,659 images collected from 15 participants during natural everyday laptop use over more than three months. Our dataset is significantly more variable than existing datasets with respect to appearance and illumination. We also present a method for in-the-wild appearance-based gaze estimation using multimodal convolutional neural networks that significantly outperforms state-of-the-art methods in the most challenging cross-dataset evaluation setting. We present an extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including our own. This evaluation provides clear insights and allows us to identify key research challenges of gaze estimation in the wild.
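A multimodal convolutional network of the kind described in the abstract can be sketched in a few lines: image features from the eye patch are fused with head pose before regressing the gaze direction. The layer sizes, the 36x60 input resolution, and the fusion point below are illustrative assumptions, not the published model.

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Illustrative multimodal gaze estimator: a small CNN over a grey-scale
    eye image, concatenated with a 2D head pose vector, regressing 2D gaze
    angles (yaw, pitch). All sizes are assumptions."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(20, 50, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        # For a 36x60 input, the feature map after two conv/pool stages
        # is 50 x 6 x 12.
        self.regressor = nn.Sequential(
            nn.Linear(50 * 6 * 12 + 2, 500), nn.ReLU(),
            nn.Linear(500, 2),
        )

    def forward(self, eye_img, head_pose):
        x = self.features(eye_img).flatten(1)
        x = torch.cat([x, head_pose], dim=1)  # fuse image features + head pose
        return self.regressor(x)

# Hypothetical usage on a batch of 8 normalised eye images.
model = GazeNet()
gaze = model(torch.randn(8, 1, 36, 60), torch.randn(8, 2))  # -> (8, 2)
```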
Erroll Wood; Tadas Baltrusaitis; Xucong Zhang; Yusuke Sugano; Peter Robinson; Andreas Bulling: Rendering of Eyes for Eye-Shape Registration and Gaze Estimation. Proc. of the IEEE International Conference on Computer Vision (ICCV 2015), pp. 3756-3764, 2015. doi:10.1109/ICCV.2015.428

Images of the eye are key in several computer vision problems, such as shape registration and gaze estimation. Recent large-scale supervised methods for these problems require time-consuming data collection and manual annotation, which can be unreliable. We propose synthesizing perfectly labelled photo-realistic training data in a fraction of the time. We used computer graphics techniques to build a collection of dynamic eye-region models from head scan geometry. These were randomly posed to synthesize close-up eye images for a wide range of head poses, gaze directions, and illumination conditions. We used our model's controllability to verify the importance of realistic illumination and shape variations in eye-region training data. Finally, we demonstrate the benefits of our synthesized training data (SynthesEyes) by out-performing state-of-the-art methods for eye-shape registration as well as cross-dataset appearance-based gaze estimation in the wild.
Lech Świrski; Andreas Bulling; Neil Dodgson: Robust, real-time pupil tracking in highly off-axis images. Proc. of the 7th International Symposium on Eye Tracking Research and Applications (ETRA 2012), pp. 173-176, 2012. doi:10.1145/2168556.2168585

Robust, accurate, real-time pupil tracking is a key component for online gaze estimation. On head-mounted eye trackers, existing algorithms that rely on circular pupils or contiguous pupil regions fail to detect or accurately track the pupil. This is because the pupil ellipse is often highly eccentric and partially occluded by eyelashes. We present a novel, real-time dark-pupil tracking algorithm that is robust under such conditions. Our approach uses a Haar-like feature detector to roughly estimate the pupil location, performs a k-means segmentation on the surrounding region to refine the pupil centre, and fits an ellipse to the pupil using a novel image-aware Random Sample Consensus (RANSAC) ellipse fitting. We compare our approach against existing real-time pupil tracking implementations, using a set of manually labelled infra-red dark-pupil eye images. We show that our technique has a higher pupil detection rate and greater pupil tracking accuracy.
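The pipeline structure described above (coarse localisation, intensity-based segmentation, ellipse fit) can be sketched with standard OpenCV building blocks. The snippet below is only an illustration of that structure: a dark-blob search and a plain contour-based cv2.fitEllipse stand in for the paper's Haar-like detector and image-aware RANSAC fit, and all thresholds are made up.

```python
import cv2
import numpy as np

def track_pupil(eye_gray, roi_radius=60):
    """Coarse-to-fine pupil localisation on a grey-scale eye image (sketch)."""
    # 1) Coarse localisation: the pupil is a large dark blob, so take the
    #    minimum of a heavily blurred image as a rough centre estimate.
    blurred = cv2.GaussianBlur(eye_gray, (31, 31), 0)
    _, _, (cx, cy), _ = cv2.minMaxLoc(blurred)
    x0, y0 = max(cx - roi_radius, 0), max(cy - roi_radius, 0)
    roi = eye_gray[y0:cy + roi_radius, x0:cx + roi_radius]

    # 2) Refinement: 2-class k-means on intensity separates pupil from iris/skin.
    samples = roi.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(samples, 2, None, criteria, 3,
                                    cv2.KMEANS_RANDOM_CENTERS)
    dark = int(np.argmin(centers))
    mask = (labels.reshape(roi.shape) == dark).astype(np.uint8) * 255

    # 3) Ellipse fit on the largest dark contour (the paper instead uses an
    #    image-aware RANSAC fit for robustness to eyelash occlusion).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if len(largest) < 5:  # fitEllipse needs at least five points
        return None
    (ex, ey), axes, angle = cv2.fitEllipse(largest)
    return (ex + x0, ey + y0), axes, angle
```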
visual behaviour modelling and analysis
User modelling is among the most fundamental problems in human-computer interaction and ubiquitous computing. We have shown that everyday activities, such as reading or common office activities, can be recognised from eye movements alone in both stationary and mobile settings. Eye movements are closely linked to human visual information processing and cognition, including perceptual learning, experience, and visual search. We have therefore further explored eye movement analysis as a promising approach towards the vision of cognition-aware computing: computing systems that sense and adapt to covert aspects of user state. While the vast majority of previous work focused on short-term visual behaviour lasting only a few minutes, we have contributed methods for recognising high-level contextual cues, such as social interactions, and for discovering everyday activities from long-term visual behaviour.
selected publications
Julian Steil; Andreas Bulling: Discovery of Everyday Human Activities From Long-Term Visual Behaviour Using Topic Models. Proc. of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015), pp. 75-85, 2015. doi:10.1145/2750858.2807520

Human visual behaviour has significant potential for activity recognition and computational behaviour analysis, but previous works focused on supervised methods and recognition of predefined activity classes based on short-term eye movement recordings. We propose a fully unsupervised method to discover users' everyday activities from their long-term visual behaviour. Our method combines a bag-of-words representation of visual behaviour that encodes saccades, fixations, and blinks with a latent Dirichlet allocation (LDA) topic model. We further propose different methods to encode saccades for their use in the topic model. We evaluate our method on a novel long-term gaze dataset that contains full-day recordings of natural visual behaviour of 10 participants (more than 80 hours in total). We also provide annotations for eight sample activity classes (outdoor, social interaction, focused work, travel, reading, computer work, watching media, eating) and periods with no specific activity. We show the ability of our method to discover these activities with performance competitive with that of previously published supervised methods.
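The unsupervised pipeline above boils down to building a bag-of-words representation of discretised eye movements per time window and fitting an LDA topic model to the resulting counts. A minimal sketch with scikit-learn follows; the eye movement "words" are a crude stand-in for the paper's saccade, fixation, and blink encodings.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical eye movement "documents": each time window is encoded as a
# sequence of discrete eye movement words.
windows = [
    "sacc_right_small fix_long sacc_right_small fix_long blink",
    "sacc_left_large fix_short sacc_right_small blink blink",
    "fix_long fix_long sacc_up_small fix_long sacc_down_small",
]

# Bag-of-words counts per window.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(windows)

# Discover latent "activity" topics without any labels.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_mixture = lda.fit_transform(X)  # one topic distribution per window
print(np.round(topic_mixture, 2))
```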
Andreas Bulling; Thorsten O. Zander: Cognition-Aware Computing. IEEE Pervasive Computing, 13(3), pp. 80-83, 2014. doi:10.1109/mprv.2014.42

Despite significant advances in context sensing and inference since its inception in the late 1990s, context-aware computing still doesn't implement a holistic view of all covert aspects of the user state. Here, the authors introduce the concept of cognitive context as an extension to the current notion of context with a cognitive dimension. They argue that visual behavior and brain activity are two promising sensing modalities for assessing the cognitive context and thus the development of cognition-aware computing systems.
Andreas Bulling; Jamie A. Ward; Hans Gellersen: Multimodal Recognition of Reading Activity in Transit Using Body-Worn Sensors. ACM Transactions on Applied Perception, 9(1), pp. 2:1-2:21, 2012. doi:10.1145/2134203.2134205

Reading is one of the most well studied visual activities. Vision research traditionally focuses on understanding the perceptual and cognitive processes involved in reading. In this work we recognise reading activity by jointly analysing eye and head movements of people in an everyday environment. Eye movements are recorded using an electrooculography (EOG) system; body movements using body-worn inertial measurement units. We compare two approaches for continuous recognition of reading: String matching (STR) that explicitly models the characteristic horizontal saccades during reading, and a support vector machine (SVM) that relies on 90 eye movement features extracted from the eye movement data. We evaluate both methods in a study performed with eight participants reading while sitting at a desk, standing, walking indoors and outdoors, and riding a tram. We introduce a method to segment reading activity by exploiting the sensorimotor coordination of eye and head movements during reading. Using person-independent training, we obtain an average precision for recognising reading of 88.9% (recall 72.3%) using STR and of 87.7% (recall 87.9%) using SVM over all participants. We show that the proposed segmentation scheme improves the performance of recognising reading events by more than 24%. Our work demonstrates that the joint analysis of multiple modalities is beneficial for reading recognition and opens up discussion on the wider applicability of this recognition approach to other visual and physical activities.
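The string matching (STR) approach explicitly models the signature saccade pattern of reading: a run of small forward (rightward) saccades followed by a large return sweep to the next line. Below is a toy sketch of that idea, encoding saccades as characters and matching the pattern with a regular expression; the thresholds and the encoding are illustrative only, not the published algorithm.

```python
import re

def encode_saccades(horiz_amplitudes, large_left=-6.0):
    """Encode horizontal saccade amplitudes (degrees, +right / -left) as a
    string: 'r' for a rightward saccade, 'L' for a large leftward return
    sweep, '.' otherwise. Thresholds are illustrative."""
    chars = []
    for a in horiz_amplitudes:
        if a > 0:
            chars.append("r")
        elif a < large_left:
            chars.append("L")
        else:
            chars.append(".")
    return "".join(chars)

def looks_like_reading(horiz_amplitudes, min_forward=3):
    """Reading shows runs of small forward saccades ending in a return sweep."""
    seq = encode_saccades(horiz_amplitudes)
    return re.search(r"r{%d,}L" % min_forward, seq) is not None

# Hypothetical sequences: reading a line of text vs. free viewing.
print(looks_like_reading([1.5, 1.2, 1.8, 1.4, -9.0]))   # True
print(looks_like_reading([3.0, -2.0, 4.0, -3.5, 1.0]))  # False
```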
Andreas Bulling; Jamie A. Ward; Hans Gellersen; Gerhard Tröster: Eye Movement Analysis for Activity Recognition Using Electrooculography. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(4), pp. 741-753, 2011. doi:10.1109/TPAMI.2010.86

In this work we investigate eye movement analysis as a new sensing modality for activity recognition. Eye movement data was recorded using an electrooculography (EOG) system. We first describe and evaluate algorithms for detecting three eye movement characteristics from EOG signals - saccades, fixations, and blinks - and propose a method for assessing repetitive patterns of eye movements. We then devise 90 different features based on these characteristics and select a subset of them using minimum redundancy maximum relevance feature selection (mRMR). We validate the method using an eight participant study in an office environment using an example set of five activity classes: copying a text, reading a printed paper, taking hand-written notes, watching a video, and browsing the web. We also include periods with no specific activity (the NULL class). Using a support vector machine (SVM) classifier and a person-independent (leave-one-out) training scheme, we obtain an average precision of 76.1% and recall of 70.5% over all classes and participants. The work demonstrates the promise of eye-based activity recognition (EAR) and opens up discussion on the wider applicability of EAR to other activities that are difficult, or even impossible, to detect using common sensing modalities.
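The recognition pipeline above detects saccades, fixations, and blinks in the EOG signals, derives features over time windows, and classifies them with a person-independent SVM. The sketch below compresses this into a velocity-threshold saccade detector and a handful of toy features (the paper derives 90 and selects among them with mRMR); everything here is illustrative rather than the published algorithms.

```python
import numpy as np
from sklearn.svm import SVC

def detect_saccades(eog, fs=128.0, vel_thresh=30.0):
    """Boolean mask of samples whose EOG velocity exceeds a threshold.
    A crude stand-in for the paper's saccade detection algorithm."""
    velocity = np.abs(np.gradient(eog)) * fs
    return velocity > vel_thresh

def window_features(eog_h, eog_v, fs=128.0):
    """A few illustrative eye movement features for one window."""
    sacc = detect_saccades(eog_h, fs) | detect_saccades(eog_v, fs)
    return np.array([
        sacc.mean(),                       # saccade rate proxy
        np.std(eog_h),                     # horizontal signal variability
        np.std(eog_v),                     # vertical signal variability
        np.mean(np.abs(np.diff(eog_h))),   # mean horizontal step size
    ])

# Hypothetical training data: windows of two-channel EOG with activity labels.
rng = np.random.default_rng(0)
X = np.array([window_features(rng.normal(size=512), rng.normal(size=512))
              for _ in range(40)])
y = rng.integers(0, 2, size=40)            # e.g. 0 = reading, 1 = browsing

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X[:5]))
```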
Andreas Bulling; Daniel Roggen: Recognition of Visual Memory Recall Processes Using Eye Movement Analysis. Proc. of the 13th International Conference on Ubiquitous Computing (UbiComp 2011), pp. 455-464, 2011.

Physical activity, location, as well as a person's psychophysiological and affective state are common dimensions for developing context-aware systems in ubiquitous computing. An important yet missing contextual dimension is the cognitive context that comprises all aspects related to mental information processing, such as perception, memory, knowledge, or learning. In this work we investigate the feasibility of recognising visual memory recall. We use a recognition methodology that combines minimum redundancy maximum relevance feature selection (mRMR) with a support vector machine (SVM) classifier. We validate the methodology in a dual user study with a total of fourteen participants looking at familiar and unfamiliar pictures from four picture categories: abstract, landscapes, faces, and buildings. Using person-independent training, we are able to discriminate between familiar and unfamiliar abstract pictures with a top recognition rate of 84.3% (89.3% recall, 21.0% false positive rate) over all participants. We show that eye movement analysis is a promising approach to infer the cognitive context of a person and discuss the key challenges for the real-world implementation of eye-based cognition-aware systems.
everyday gaze-based human-computer interfaces
Despite considerable advances in eye tracking, previous work on eye-based human-computer interfaces has mainly explored the use of the eyes in settings involving a single user, a single device, and WIMP-style interactions. This is despite the fact that the eyes are involved in nearly everything we do and thus potentially hold a wealth of valuable information for interactive systems. In this spirit, we have introduced smooth pursuit eye movements, the movements our eyes perform when latching onto a moving object, as a novel gaze interaction technique for dynamic interfaces. We have demonstrated the use of pursuits for eye tracker calibration as well as for interaction with smart watches. Inspired by how visual attention mediates interactions between humans, we have further proposed social gaze as a new paradigm for designing user interfaces that react to visual attention. Another important research direction is the use of gaze for interaction in unconstrained everyday settings, in particular with the increasing number of personal devices and ambient displays.
selected publications
Augusto Esteves; Eduardo Velloso; Andreas Bulling; Hans Gellersen: Orbits: Gaze Interaction in Smart Watches using Moving Targets. Proc. of the 28th ACM Symposium on User Interface Software and Technology (UIST 2015), pp. 457-466, 2015 (best paper award). doi:10.1145/2807442.2807499

We introduce Orbits, a novel gaze interaction technique that enables hands-free input on smart watches. The technique relies on moving controls to leverage the smooth pursuit movements of the eyes and to detect whether, and at which control, the user is looking. In Orbits, controls include targets that move in a circular trajectory on the face of the watch and can be selected by following the desired one for a small amount of time. We conducted two user studies to assess the technique's recognition and robustness, which demonstrated how Orbits is robust against false positives triggered by natural eye movements and how it presents a hands-free, high-accuracy way of interacting with smart watches using off-the-shelf devices. Finally, we developed three example interfaces built with Orbits: a music player, a notifications face plate and a missed call menu. Despite relying on moving controls, which are very unusual in current HCI interfaces, these were generally well received by participants in a third and final study.
Yanxia Zhang; Ming Ki Chong; Jörg Müller; Andreas Bulling; Hans Gellersen: Eye Tracking for Public Displays in the Wild. Personal and Ubiquitous Computing, 19(5), pp. 967-981, 2015. doi:10.1007/s00779-015-0866-8

In public display contexts, interactions are spontaneous and have to work without preparation. We propose gaze as a modality for such contexts, as gaze is always at the ready and a natural indicator of the user's interest. We present GazeHorizon, a system that demonstrates spontaneous gaze interaction, enabling users to walk up to a display and navigate content using their eyes only. GazeHorizon is extemporaneous and optimised for instantaneous usability by any user without prior configuration, calibration or training. The system provides interactive assistance to bootstrap gaze interaction with unaware users, employs a single off-the-shelf web camera and computer vision for person-independent tracking of the horizontal gaze direction, and maps this input to rate-controlled navigation of horizontally arranged content. We have evaluated GazeHorizon through a series of field studies, culminating in a four-day deployment in a public environment during which over a hundred passers-by interacted with it, unprompted and unassisted. We realised that since eye movements are subtle, users cannot learn gaze interaction from only observing others, and as a result guidance is required.
Mélodie Vidal; Remi Bismuth; Andreas Bulling; Hans Gellersen: The Royal Corgi: Exploring Social Gaze Interaction for Immersive Gameplay. Proc. of the 33rd ACM SIGCHI Conference on Human Factors in Computing Systems (CHI 2015), pp. 115-124, 2015. doi:10.1145/2702123.2702163

The eyes are a rich channel for non-verbal communication in our daily interactions. We propose social gaze interaction as a game mechanic to enhance user interactions with virtual characters. We develop a game from the ground up in which characters are designed to be reactive to the player's gaze in social ways, such as getting annoyed when the player seems distracted or changing their dialogue depending on the player's apparent focus of attention. Results from a qualitative user study provide insights about how social gaze interaction is intuitive for users and elicits deep feelings of immersion, and highlight the players' self-consciousness of their own eye movements through their strong reactions to the characters.
Ken Pfeuffer; Mélodie Vidal; Jayson Turner; Andreas Bulling; Hans Gellersen: Pursuit Calibration: Making Gaze Calibration Less Tedious and More Flexible. Proc. of the 26th ACM Symposium on User Interface Software and Technology (UIST 2013), pp. 261-270, 2013. doi:10.1145/2501988.2501998

Eye gaze is a compelling interaction modality but requires a user calibration before interaction can commence. State-of-the-art procedures require the user to fixate on a succession of calibration markers, a task that is often experienced as difficult and tedious. We present a novel approach, pursuit calibration, that instead uses moving targets for calibration. Users naturally perform smooth pursuit eye movements when they follow a moving target, and we use correlation of eye and target movement to detect the user's attention and to sample data for calibration. Because the method knows when the user is attending to a target, the calibration can be performed implicitly, which enables more flexible design of the calibration task. We demonstrate this in application examples and user studies, and show that pursuit calibration is tolerant to interruption, can blend naturally with applications, and is able to calibrate users without their awareness.
Mélodie Vidal; Andreas Bulling; Hans Gellersen: Pursuits: Spontaneous Interaction with Displays based on Smooth Pursuit Eye Movement and Moving Targets. Proc. of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2013), pp. 439-448, 2013. doi:10.1145/2493432.2493477

Although gaze is an attractive modality for pervasive interactions, the real-world implementation of eye-based interfaces poses significant challenges, such as calibration. We present Pursuits, an innovative interaction technique that enables truly spontaneous interaction with eye-based interfaces. A user can simply walk up to the screen and readily interact with moving targets. Instead of being based on gaze location, Pursuits correlates eye pursuit movements with objects dynamically moving on the interface. We evaluate the influence of target speed, number and trajectory and develop guidelines for designing Pursuits-based interfaces. We then describe six realistic usage scenarios and implement three of them to evaluate the method in a usability study and a field study. Our results show that Pursuits is a versatile and robust technique and that users can interact with Pursuits-based interfaces without prior knowledge or preparation phase.
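Pursuits (like Orbits and Pursuit Calibration above) selects among moving targets by correlating the recent gaze trajectory with each target's trajectory rather than by absolute gaze position. A minimal sketch of that correlation step follows; the window length, the correlation threshold, and the way the x and y components are combined are illustrative choices, not the published parameters.

```python
import numpy as np

def pursuit_selection(gaze_xy, target_trajectories, threshold=0.8):
    """Return the index of the moving target whose trajectory correlates most
    strongly with the gaze trajectory over the current window, or None if no
    correlation exceeds the threshold.
    gaze_xy: (T, 2) array; target_trajectories: list of (T, 2) arrays."""
    best_idx, best_corr = None, threshold
    for i, traj in enumerate(target_trajectories):
        # Correlate x and y components separately and combine conservatively.
        cx = np.corrcoef(gaze_xy[:, 0], traj[:, 0])[0, 1]
        cy = np.corrcoef(gaze_xy[:, 1], traj[:, 1])[0, 1]
        corr = min(cx, cy)
        if corr > best_corr:
            best_idx, best_corr = i, corr
    return best_idx

# Hypothetical usage: two targets on circular orbits, noisy gaze following
# the first one over a 60-sample window.
t = np.linspace(0, 2 * np.pi, 60)
target_a = np.column_stack([np.cos(t), np.sin(t)])
target_b = np.column_stack([np.cos(-t + 1.0), np.sin(-t + 1.0)])
gaze = target_a + np.random.normal(scale=0.05, size=target_a.shape)
print(pursuit_selection(gaze, [target_a, target_b]))  # most likely 0
```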