GazeDirector: Fully Articulated Eye Gaze Redirection in Video (arXiv ’17)
Erroll Wood; Tadas Baltrušaitis; Louis-Philippe Morency; Peter Robinson; Andreas Bulling. GazeDirector: Fully Articulated Eye Gaze Redirection in Video. Technical Report arXiv:1704.08763, 2017.
arXiv: https://arxiv.org/abs/1704.08763
PDF: https://perceptual.mpi-inf.mpg.de/files/2017/05/wood17_arxiv.pdf
Video: https://www.youtube.com/watch?v=xxFUjUUSmgU

We present GazeDirector, a new approach for eye gaze redirection that uses model-fitting. Our method first tracks the eyes by fitting a multi-part eye region model to video frames using analysis-by-synthesis, thereby recovering eye region shape, texture, pose, and gaze simultaneously. It then redirects gaze by 1) warping the eyelids from the original image using a model-derived flow field, and 2) rendering and compositing synthesized 3D eyeballs onto the output image in a photorealistic manner. GazeDirector allows us to change where people are looking without person-specific training data, and with full articulation, i.e. we can precisely specify new gaze directions in 3D. Quantitatively, we evaluate both model-fitting and gaze synthesis, with experiments for gaze estimation and redirection on the Columbia gaze dataset. Qualitatively, we compare GazeDirector against recent work on gaze redirection, showing better results especially for large redirection angles. Finally, we demonstrate gaze redirection on YouTube videos by introducing new 3D gaze targets and by manipulating visual behavior.
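To give a flavour of the eyelid-warping step, the sketch below resamples an image through a dense flow field with OpenCV's remap. The frame and the flow field are synthetic stand-ins for the video frame and the model-derived field from the paper; this is a conceptual illustration, not the authors' implementation.

```python
import cv2
import numpy as np

# Toy input frame and a synthetic flow field standing in for the
# model-derived eyelid flow (the real field comes from the fitted 3D model).
frame = np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8)
h, w = frame.shape[:2]

# Displace pixels upwards by up to 5 px, tapering towards the image borders.
yy, xx = np.mgrid[0:h, 0:w].astype(np.float32)
taper = np.sin(np.pi * xx / (w - 1)) * np.sin(np.pi * yy / (h - 1))
flow_y = -5.0 * taper           # vertical component of the warp field
flow_x = np.zeros_like(flow_y)  # no horizontal displacement in this toy case

# cv2.remap samples the source image at (map_x, map_y) for every output pixel.
map_x = xx + flow_x
map_y = yy + flow_y
warped = cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                   borderMode=cv2.BORDER_REFLECT)
print(warped.shape)  # same size as the input frame
```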
AggreGaze: Collective Estimation of Audience Attention on Public Displays (UIST ’16)
Yusuke Sugano; Xucong Zhang; Andreas Bulling. AggreGaze: Collective Estimation of Audience Attention on Public Displays. In Proc. of the ACM Symposium on User Interface Software and Technology (UIST), pp. 821-831, 2016. Best paper honourable mention award.
DOI: 10.1145/2984511.2984536
PDF: https://perceptual.mpi-inf.mpg.de/files/2016/09/sugano16_uist.pdf
Video: https://www.youtube.com/watch?v=eFK39S_lgdg
More: http://s2017.siggraph.org/acm-siggraph-organization-events/sessions/uist-reprise-siggraph-2017

Gaze is frequently explored in public display research given its importance for monitoring and analysing audience attention. However, current gaze-enabled public display interfaces require either special-purpose eye tracking equipment or explicit personal calibration for each individual user. We present AggreGaze, a novel method for estimating spatio-temporal audience attention on public displays. Our method requires only a single off-the-shelf camera attached to the display, does not require any personal calibration, and provides visual attention estimates across the full display. We achieve this by 1) compensating for errors of state-of-the-art appearance-based gaze estimation methods through on-site training data collection, and by 2) aggregating uncalibrated and thus inaccurate gaze estimates of multiple users into joint attention estimates. We propose different visual stimuli for this compensation: a standard 9-point calibration, moving targets, text and visual stimuli embedded into the display content, as well as normal video content. Based on a two-week deployment in a public space, we demonstrate the effectiveness of our method for estimating attention maps that closely resemble ground-truth audience gaze distributions.
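The aggregation step, pooling many noisy and uncalibrated gaze estimates into a joint attention map, can be approximated by accumulating the estimates into a 2D histogram and smoothing with a kernel on the order of the expected per-user error. The sketch below is an assumption-laden toy version, not the paper's pipeline; `sigma_px` and the sample data are made up.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def attention_map(gaze_points, display_wh, sigma_px=80.0, bins=(64, 36)):
    """Aggregate noisy on-display gaze estimates (x, y in pixels) from many
    viewers into a smoothed attention map. sigma_px loosely models the
    per-user estimation error (an assumed value, not from the paper)."""
    w, h = display_wh
    pts = np.asarray(gaze_points, dtype=float)
    hist, _, _ = np.histogram2d(pts[:, 0], pts[:, 1],
                                bins=bins, range=[[0, w], [0, h]])
    # Convert the smoothing bandwidth from display pixels to histogram cells.
    sigma_cells = (sigma_px * bins[0] / w, sigma_px * bins[1] / h)
    smoothed = gaussian_filter(hist, sigma=sigma_cells)
    return smoothed / smoothed.sum()  # normalise to a spatial distribution

# Example: 500 noisy estimates clustered around one display region.
rng = np.random.default_rng(0)
samples = rng.normal(loc=(1200, 400), scale=150, size=(500, 2))
amap = attention_map(samples, display_wh=(1920, 1080))
print(amap.shape, amap.sum())
```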
A 3D Morphable Eye Region Model for Gaze Estimation (ECCV ’16)
Erroll Wood; Tadas Baltrusaitis; Louis-Philippe Morency; Peter Robinson; Andreas Bulling. A 3D Morphable Eye Region Model for Gaze Estimation. In Proc. of the European Conference on Computer Vision (ECCV), pp. 297-313, 2016.
DOI: 10.1007/978-3-319-46448-0_18
PDF: https://perceptual.mpi-inf.mpg.de/files/2017/02/wood16_eccv.pdf
Video: https://www.youtube.com/watch?v=n_htSvUq7iU

Morphable face models are a powerful tool, but have previously failed to model the eye accurately due to complexities in its material and motion. We present a new multi-part model of the eye that includes a morphable model of the facial eye region, as well as an anatomy-based eyeball model. It is the first morphable model that accurately captures eye region shape, since it was built from high-quality head scans. It is also the first to allow independent eyeball movement, since we treat it as a separate part. To showcase our model we present a new method for illumination- and head-pose–invariant gaze estimation from a single RGB image. We fit our model to an image through analysis-by-synthesis, solving for eye region shape, texture, eyeball pose, and illumination simultaneously. The fitted eyeball pose parameters are then used to estimate gaze direction. Through evaluation on two standard datasets we show that our method generalizes to both webcam and high-quality camera images, and outperforms a state-of-the-art CNN method achieving a gaze estimation accuracy of 9.44° in a challenging user-independent scenario.
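Analysis-by-synthesis fitting can be shown in miniature: a parametric generator renders an image from a parameter vector, and the parameters are optimised until the rendering matches the observation. In the sketch below, the `render` function is a toy 2D blob standing in for the morphable eye region model, and plain least squares replaces the paper's optimiser; it only illustrates the photometric objective.

```python
import numpy as np
from scipy.optimize import least_squares

H, W = 32, 48
yy, xx = np.mgrid[0:H, 0:W]

def render(params):
    """Toy stand-in for the model renderer: an intensity blob whose centre
    and size play the role of shape/pose parameters."""
    cx, cy, s = params
    return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2.0 * s ** 2))

# 'Observed' image generated from hidden ground-truth parameters plus noise.
rng = np.random.default_rng(1)
observed = render([30.0, 12.0, 5.0]) + 0.02 * rng.standard_normal((H, W))

def residuals(params):
    # Photometric error between synthesis and observation, as in
    # analysis-by-synthesis fitting (here without priors or regularisers).
    return (render(params) - observed).ravel()

fit = least_squares(residuals, x0=[W / 2, H / 2, 8.0])
print(np.round(fit.x, 2))  # recovered parameters, close to [30, 12, 5]
```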
SkullConduct: Biometric User Identification on Eyewear Computers (CHI ’16)
Stefan Schneegass; Youssef Oualil; Andreas Bulling. SkullConduct: Biometric User Identification on Eyewear Computers Using Bone Conduction Through the Skull. In Proc. of the 34th ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 1379-1384, 2016. ISBN: 978-1-4503-3362-7.
DOI: 10.1145/2858036.2858152
PDF: https://perceptual.mpi-inf.mpg.de/files/2016/01/schneegass16_chi.pdf
Video: https://www.youtube.com/watch?v=A4BCnsQmo6c
Press: https://www.newscientist.com/article/2085430-the-buzz-of-your-skull-can-be-used-to-tell-exactly-who-you-are/ http://www.golem.de/news/skullconduct-der-schaedel-meldet-den-nutzer-an-der-datenbrille-an-1605-120892.html http://gizmodo.com/youll-never-forget-your-password-when-its-the-sound-you-1772327137 https://www.washingtonpost.com/news/the-switch/wp/2016/04/22/could-skull-echos-and-brainprints-replace-the-password/ http://www.computerwelt.at/news/technologie-strategie/security/detail/artikel/115454-entsperrung-durch-schaedelknochen-loest-passwoerter-ab/ http://www.uni-saarland.de/nc/aktuelles/artikel/nr/14597.html

Secure user identification is important for the increasing number of eyewear computers, but limited input capabilities pose significant usability challenges for established knowledge-based schemes such as passwords or PINs. We present SkullConduct, a biometric system that uses bone conduction of sound through the user's skull as well as a microphone readily integrated into many of these devices, such as Google Glass. At the core of SkullConduct is a method to analyze the characteristic frequency response created by the user's skull using a combination of Mel Frequency Cepstral Coefficient (MFCC) features and a computationally lightweight 1NN classifier. We report on a controlled experiment with 10 participants which shows that this frequency response is person-specific and stable, even when taking off and putting on the device multiple times, and thus serves as a robust biometric. We show that our method can identify users with 97.0% accuracy and authenticate them with an equal error rate of 6.9%, thereby bringing biometric user identification to eyewear computers equipped with bone conduction technology.
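The recognition pipeline named above, MFCC features combined with a 1-nearest-neighbour classifier, might be prototyped roughly as follows. The recordings, sampling rate and feature averaging are placeholders; this is not the authors' code.

```python
import numpy as np
import librosa  # assumed available; used only for MFCC extraction
from sklearn.neighbors import KNeighborsClassifier

SR = 16000  # assumed sampling rate of the recorded bone-conducted signal

def mfcc_features(signal, sr=SR, n_mfcc=13):
    """Average MFCCs over time to get one fixed-length feature vector
    per recorded frequency response."""
    mfcc = librosa.feature.mfcc(y=signal.astype(float), sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Placeholder training data: a few 1-second recordings per user.
rng = np.random.default_rng(0)
users, recordings = 3, 4
X, labels = [], []
for user in range(users):
    for _ in range(recordings):
        X.append(mfcc_features(rng.standard_normal(SR)))
        labels.append(user)

clf = KNeighborsClassifier(n_neighbors=1).fit(np.array(X), labels)
probe = mfcc_features(rng.standard_normal(SR))
print("identified as user", clf.predict([probe])[0])
```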
GazeProjector: Accurate Gaze Estimation and Seamless Gaze Interaction Across Multiple Displays (UIST ’15)
Christian Lander; Sven Gehring; Antonio Krüger; Sebastian Boring; Andreas Bulling. GazeProjector: Accurate Gaze Estimation and Seamless Gaze Interaction Across Multiple Displays. In Proc. of the 28th ACM Symposium on User Interface Software and Technology (UIST 2015), pp. 395-404, 2015.
DOI: 10.1145/2807442.2807479
PDF: https://perceptual.mpi-inf.mpg.de/files/2015/08/Lander_UIST15.pdf
Video: https://www.youtube.com/watch?v=peuL4WRfrRM

Mobile gaze-based interaction with multiple displays may occur from arbitrary positions and orientations. However, maintaining high gaze estimation accuracy in such situations remains a significant challenge. In this paper, we present GazeProjector, a system that combines (1) natural feature tracking on displays to determine the mobile eye tracker’s position relative to a display with (2) accurate point-of-gaze estimation. GazeProjector allows for seamless gaze estimation and interaction on multiple displays of arbitrary sizes independently of the user’s position and orientation to the display. In a user study with 12 participants we compare GazeProjector to established methods (here: visual on-screen markers and a state-of-the-art video-based motion capture system). We show that our approach is robust to varying head poses, orientations, and distances to the display, while still providing high gaze estimation accuracy across multiple displays without re-calibration for each variation. Our system represents an important step towards the vision of pervasive gaze-based interfaces.
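The geometric core, mapping a point of gaze from the tracker's scene camera into display coordinates through an estimated homography, can be sketched with OpenCV. The point correspondences below are synthetic stand-ins for the natural-feature matches GazeProjector computes at runtime.

```python
import cv2
import numpy as np

# Synthetic correspondences between the display corners as seen in the scene
# camera (pixels) and the display's own coordinate system (pixels).
scene_pts = np.array([[100, 80], [520, 95], [540, 400], [90, 380]], np.float32)
display_pts = np.array([[0, 0], [1920, 0], [1920, 1080], [0, 1080]], np.float32)

# In GazeProjector these correspondences come from natural feature tracking;
# here we simply estimate the homography from four known point pairs.
H, _ = cv2.findHomography(scene_pts, display_pts)

# Point of gaze reported by the head-mounted tracker in scene-camera pixels.
gaze_scene = np.array([[[310.0, 240.0]]], dtype=np.float32)
gaze_display = cv2.perspectiveTransform(gaze_scene, H)
print("gaze on display (px):", gaze_display.ravel())
```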
GravitySpot: Guiding Users in Front of Public Displays Using On-Screen Visual Cues (UIST ’15)
Florian Alt; Andreas Bulling; Gino Gravanis; Daniel Buschek. GravitySpot: Guiding Users in Front of Public Displays Using On-Screen Visual Cues. In Proc. of the 28th ACM Symposium on User Interface Software and Technology (UIST 2015), pp. 47-56, 2015.
DOI: 10.1145/2807442.2807490
PDF: https://perceptual.mpi-inf.mpg.de/files/2015/08/Alt_UIST15.pdf
Video: https://www.youtube.com/watch?v=laWfbOpQQ8A

Users tend to position themselves in front of interactive public displays in such a way as to best perceive their content. Currently, this sweet spot is implicitly defined by display properties, content, the input modality, as well as space constraints in front of the display. We present GravitySpot, an approach that makes sweet spots flexible by actively guiding users to arbitrary target positions in front of displays using visual cues. Such guidance is beneficial, for example, if a particular input technology only works at a specific distance or if users should be guided towards a non-crowded area of a large display. In two controlled lab studies (n=29) we evaluate different visual cues based on color, shape, and motion, as well as position-to-cue mapping functions. We show that both the visual cues and mapping functions allow for fine-grained control over positioning speed and accuracy. Findings are complemented by observations from a 3-month real-world deployment.
Orbits: Gaze Interaction for Smart Watches using Smooth Pursuit Eye Movements (UIST ’15)
Augusto Esteves; Eduardo Velloso; Andreas Bulling; Hans Gellersen. Orbits: Gaze Interaction for Smart Watches using Smooth Pursuit Eye Movements. In Proc. of the 28th ACM Symposium on User Interface Software and Technology (UIST 2015), pp. 457-466, 2015. Best paper award.
DOI: 10.1145/2807442.2807499
PDF: https://perceptual.mpi-inf.mpg.de/files/2015/09/Esteves_UIST15.pdf
Video: https://www.youtube.com/watch?v=KEIgw5A0yfI
Press: http://www.wired.co.uk/news/archive/2016-01/22/eye-tracking-smartwatch

We introduce Orbits, a novel gaze interaction technique that enables hands-free input on smart watches. The technique relies on moving controls to leverage the smooth pursuit movements of the eyes and to detect whether, and at which control, the user is looking. In Orbits, controls include targets that move in a circular trajectory on the watch face and can be selected by following the desired one for a short amount of time. We conducted two user studies to assess the technique’s recognition and robustness, which demonstrated that Orbits is robust against false positives triggered by natural eye movements and that it presents a hands-free, high-accuracy way of interacting with smart watches using off-the-shelf devices. Finally, we developed three example interfaces built with Orbits: a music player, a notifications face plate and a missed call menu. Despite relying on moving controls, which are very unusual in current HCI interfaces, these were generally well received by participants in a third and final study.
Self-Calibrating Head-Mounted Eye Trackers Using Egocentric Visual Saliency (UIST ’15)
Yusuke Sugano; Andreas Bulling. Self-Calibrating Head-Mounted Eye Trackers Using Egocentric Visual Saliency. In Proc. of the 28th ACM Symposium on User Interface Software and Technology (UIST 2015), pp. 363-372, 2015.
DOI: 10.1145/2807442.2807445
PDF: https://perceptual.mpi-inf.mpg.de/files/2015/08/Sugano_UIST15.pdf
Video: https://www.youtube.com/watch?v=CvsZ3YCWFPk

Head-mounted eye tracking has significant potential for gaze-based applications such as life logging, mental health monitoring, or quantified self. However, a neglected challenge for such applications is that drift in the initial person-specific eye tracker calibration, for example caused by physical activity, can severely impact gaze estimation accuracy and, thus, system performance and user experience. We first analyse calibration drift on a new dataset of natural gaze data recorded using synchronised video-based and Electrooculography-based eye trackers of 20 users performing everyday activities in a mobile setting. Based on this analysis we present a method to automatically self-calibrate head-mounted eye trackers based on a computational model of bottom-up visual saliency. Through evaluations on the dataset we show that our method is 1) effective in reducing calibration drift in calibrated eye trackers and 2) given sufficient data, can achieve gaze estimation accuracy competitive with a calibrated eye tracker without any manual calibration.
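A stripped-down version of the idea is to pair each raw gaze sample with the most salient point of the corresponding scene frame and refit the tracker's mapping from those pairs. The sketch below uses a basic spectral-residual saliency map and an affine mapping, both simplifications of what the paper describes, and the eye features are placeholders.

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

def spectral_residual_saliency(gray):
    """Minimal bottom-up saliency map (spectral residual), used as a proxy
    for where the wearer is most likely looking in a scene frame."""
    f = np.fft.fft2(gray.astype(float))
    log_amp, phase = np.log(np.abs(f) + 1e-8), np.angle(f)
    residual = log_amp - convolve(log_amp, np.ones((3, 3)) / 9.0, mode="wrap")
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(sal, sigma=3)

# Pair raw eye features with the saliency peak of each scene frame, then
# refit an affine mapping eye -> scene coordinates by least squares.
rng = np.random.default_rng(0)
eye_feats, targets = [], []
for _ in range(50):
    sal = spectral_residual_saliency(rng.random((60, 80)))
    row, col = np.unravel_index(np.argmax(sal), sal.shape)
    eye_feats.append(rng.random(2))   # placeholder pupil-centre features
    targets.append([col, row])        # saliency peak as (x, y) gaze target

A = np.hstack([np.asarray(eye_feats), np.ones((50, 1))])  # affine design matrix
mapping, *_ = np.linalg.lstsq(A, np.asarray(targets, float), rcond=None)
print("refitted affine mapping:\n", np.round(mapping, 2))
```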
Analyzing Visual Attention During Whole Body Interaction with Public Displays (UbiComp ’15)
Robert Walter; Andreas Bulling; David Lindlbauer; Martin Schuessler; Jörg Müller. Analyzing Visual Attention During Whole Body Interaction with Public Displays. In Proc. of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015), pp. 1263-1267, 2015.
DOI: 10.1145/2750858.280425
PDF: https://perceptual.mpi-inf.mpg.de/files/2015/07/Walter_Ubicomp15.pdf
Video: https://www.youtube.com/watch?v=JlEnUyhQ1cY

While whole body interaction can enrich user experience on public displays, it remains unclear how common visualizations of user representations impact users' ability to perceive content on the display. In this work we use a head-mounted eye tracker to record visual behavior of 25 users interacting with a public display game that uses a silhouette user representation, mirroring the users' movements. Results from visual attention analysis as well as post-hoc recall and recognition tasks on display contents reveal that visual attention is mostly on users' silhouette while peripheral screen elements remain largely unattended. In our experiment, content attached to the user representation attracted significantly more attention than other screen contents, while content placed at the top and bottom of the screen attracted significantly less. Screen contents attached to the user representation were also significantly better remembered than those at the top and bottom of the screen.
Appearance-Based Gaze Estimation in the Wild (CVPR ’15)
Xucong Zhang; Yusuke Sugano; Mario Fritz; Andreas Bulling. Appearance-Based Gaze Estimation in the Wild. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 4511-4520, 2015.
DOI: 10.1109/CVPR.2015.7299081
PDF: https://perceptual.mpi-inf.mpg.de/files/2015/04/zhang_CVPR15.pdf
Video: https://www.youtube.com/watch?v=rw6LZA1USG8
Dataset: https://perceptual.mpi-inf.mpg.de/research/datasets/#zhang15_cvpr

Appearance-based gaze estimation is believed to work well in real-world settings, but existing datasets were collected under controlled laboratory conditions and methods were not evaluated across multiple datasets. In this work we study appearance-based gaze estimation in the wild. We present the MPIIGaze dataset that contains 213,659 images we collected from 15 participants during natural everyday laptop use over more than three months. Our dataset is significantly more variable than existing datasets with respect to appearance and illumination. We also present a method for in-the-wild appearance-based gaze estimation using multimodal convolutional neural networks, which significantly outperforms state-of-the-art methods in the most challenging cross-dataset evaluation setting. We present an extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including our own. This evaluation provides clear insights and allows us to identify key research challenges of gaze estimation in the wild.
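A compact sketch of a multimodal network in the spirit described above: convolutional features from a grey-scale eye crop are concatenated with the 2D head pose before regressing the 2D gaze angles. The layer sizes and input resolution are illustrative assumptions, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Eye image (1x36x60 crop, an assumed size) + head pose -> gaze angles."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(20, 50, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(50 * 6 * 12, 500), nn.ReLU(),
        )
        # Head pose (yaw, pitch) is appended to the image features here.
        self.head = nn.Linear(500 + 2, 2)  # output: gaze yaw and pitch

    def forward(self, eye_img, head_pose):
        feat = self.fc(self.conv(eye_img))
        return self.head(torch.cat([feat, head_pose], dim=1))

model = GazeNet()
eye = torch.randn(8, 1, 36, 60)    # batch of normalised eye crops
pose = torch.randn(8, 2)           # batch of head pose angles
print(model(eye, pose).shape)      # torch.Size([8, 2])
```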
GazeHorizon: Enabling Passers-by to Interact with Public Displays by Gaze (UbiComp ’14)
Yanxia Zhang; Hans Jörg Müller; Ming Ki Chong; Andreas Bulling; Hans Gellersen. GazeHorizon: Enabling Passers-by to Interact with Public Displays by Gaze. In Proc. of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2014), pp. 559-563, 2014.
DOI: 10.1145/2632048.2636071
PDF: https://perceptual.mpi-inf.mpg.de/files/2014/09/p559-zhang.pdf
Video: https://www.youtube.com/watch?v=zKsSeLvvsXU

Public displays can be made interactive by adding gaze control. However, gaze interfaces do not offer any physical affordance, and require users to move into a tracking range. We present GazeHorizon, a system that provides interactive assistance to enable passers-by to walk up to a display and to navigate content using their eyes only. The system was developed through field studies culminating in a four-day deployment in a public environment. Our results show that novice users can be facilitated to successfully use gaze control by making them aware of the interface at first glance and guiding them interactively into the tracking range.
Pursuit Calibration: Making Gaze Calibration Less Tedious and More Flexible (UIST ’13)
Ken Pfeuffer; Mélodie Vidal; Jayson Turner; Andreas Bulling; Hans Gellersen. Pursuit Calibration: Making Gaze Calibration Less Tedious and More Flexible. In Proc. of the 26th ACM Symposium on User Interface Software and Technology (UIST 2013), pp. 261-270, 2013.
DOI: 10.1145/2501988.2501998
PDF: https://perceptual.mpi-inf.mpg.de/files/2013/10/pfeuffer13_uist.pdf
Video: https://www.youtube.com/watch?v=T7S76L1Rkow

Eye gaze is a compelling interaction modality but requires a user calibration before interaction can commence. State-of-the-art procedures require the user to fixate on a succession of calibration markers, a task that is often experienced as difficult and tedious. We present a novel approach, pursuit calibration, that instead uses moving targets for calibration. Users naturally perform smooth pursuit eye movements when they follow a moving target, and we use the correlation of eye and target movement to detect the user's attention and to sample data for calibration. Because the method knows when the user is attending to a target, the calibration can be performed implicitly, which enables more flexible design of the calibration task. We demonstrate this in application examples and user studies, and show that pursuit calibration is tolerant to interruption, can blend naturally with applications, and is able to calibrate users without their awareness.
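Once the eye-target correlation confirms that the user is following the moving target, the collected (pupil position, target position) pairs can be fed into an ordinary calibration fit. A minimal sketch under simple assumptions (2D pupil coordinates, second-order polynomial mapping):

```python
import numpy as np

def fit_polynomial_mapping(pupil_xy, target_xy):
    """Fit screen x and y as second-order polynomials of pupil (x, y),
    using only samples gathered while eye-target correlation was high."""
    px, py = pupil_xy[:, 0], pupil_xy[:, 1]
    # Design matrix with constant, linear, interaction and quadratic terms.
    A = np.column_stack([np.ones_like(px), px, py, px * py, px**2, py**2])
    coeffs, *_ = np.linalg.lstsq(A, target_xy, rcond=None)
    return coeffs  # shape (6, 2)

def apply_mapping(coeffs, pupil_xy):
    px, py = pupil_xy[:, 0], pupil_xy[:, 1]
    A = np.column_stack([np.ones_like(px), px, py, px * py, px**2, py**2])
    return A @ coeffs

# Toy data: the target moves along a path, pupil positions follow it noisily.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
target = np.column_stack([960 + 700 * np.cos(t), 540 + 400 * np.sin(t)])
pupil = target / 2000.0 + 0.01 * rng.standard_normal(target.shape)

coeffs = fit_polynomial_mapping(pupil, target)
err = np.linalg.norm(apply_mapping(coeffs, pupil) - target, axis=1).mean()
print(f"mean reprojection error: {err:.1f} px")
```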
Pursuits: Spontaneous Interaction with Displays based on Smooth Pursuit Eye Movement and Moving Targets (UbiComp ’13)
Mélodie Vidal; Andreas Bulling; Hans Gellersen. Pursuits: Spontaneous Interaction with Displays based on Smooth Pursuit Eye Movement and Moving Targets. In Proc. of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2013), pp. 439-448, 2013.
DOI: 10.1145/2493432.2493477
PDF: https://perceptual.mpi-inf.mpg.de/files/2013/10/vidal13_ubicomp.pdf
Video: https://www.youtube.com/watch?v=fpVPD_wQAWo

Although gaze is an attractive modality for pervasive interactions, the real-world implementation of eye-based interfaces poses significant challenges, such as calibration. We present Pursuits, an innovative interaction technique that enables truly spontaneous interaction with eye-based interfaces. A user can simply walk up to the screen and readily interact with moving targets. Instead of being based on gaze location, Pursuits correlates eye pursuit movements with objects dynamically moving on the interface. We evaluate the influence of target speed, number and trajectory and develop guidelines for designing Pursuits-based interfaces. We then describe six realistic usage scenarios and implement three of them to evaluate the method in a usability study and a field study. Our results show that Pursuits is a versatile and robust technique and that users can interact with Pursuits-based interfaces without prior knowledge or a preparation phase.
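The core mechanism, correlating the eye trace with each target's trajectory over a sliding window and selecting the best-matching target above a threshold, can be sketched as follows; the window length, threshold and per-axis score are assumed values rather than the paper's exact parameters.

```python
import numpy as np

def select_target(eye_xy, targets_xy, threshold=0.8):
    """eye_xy: (T, 2) gaze samples over the last window.
    targets_xy: dict name -> (T, 2) on-screen target positions over the same
    window. Returns the followed target (or None) and its score."""
    scores = {}
    for name, traj in targets_xy.items():
        # Correlate eye and target movement separately per axis; the eye
        # trace need not be calibrated to screen coordinates for this.
        rx = np.corrcoef(eye_xy[:, 0], traj[:, 0])[0, 1]
        ry = np.corrcoef(eye_xy[:, 1], traj[:, 1])[0, 1]
        scores[name] = min(rx, ry)
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores[best]

# Two targets orbiting in opposite directions; the eye follows target "a".
t = np.linspace(0, 2 * np.pi, 90)  # ~1.5 s window at 60 Hz (assumed)
orbit = lambda phase, sign: np.column_stack([np.cos(sign * t + phase),
                                             np.sin(sign * t + phase)])
targets = {"a": 200 * orbit(0.0, +1) + 400, "b": 200 * orbit(1.0, -1) + 400}
rng = np.random.default_rng(0)
eye = 0.05 * targets["a"] + 2 * rng.standard_normal((90, 2))  # uncalibrated, noisy
print(select_target(eye, targets))
```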
SideWays: A Gaze Interface for Spontaneous Interaction with Situated Displays (CHI ’13)
Yanxia Zhang; Andreas Bulling; Hans Gellersen. SideWays: A Gaze Interface for Spontaneous Interaction with Situated Displays. In Proc. of the 31st SIGCHI International Conference on Human Factors in Computing Systems (CHI 2013), pp. 851-860, ACM, New York, NY, USA, 2013. ISBN: 978-1-4503-1899-0.
DOI: 10.1145/2470654.2470775
PDF: https://perceptual.mpi-inf.mpg.de/files/2013/03/zhang13_chi.pdf
Video: https://www.youtube.com/watch?v=cucOArVoyV0

Eye gaze is compelling for interaction with situated displays as we naturally use our eyes to engage with them. In this work we present SideWays, a novel person-independent eye gaze interface that supports spontaneous interaction with displays: users can just walk up to a display and immediately interact using their eyes, without any prior user calibration or training. Requiring only a single off-the-shelf camera and lightweight image processing, SideWays robustly detects whether users attend to the centre of the display or cast glances to the left or right. The system supports an interaction model in which attention to the central display is the default state, while "sidelong glances" trigger input or actions. The robustness of the system and usability of the interaction model are validated in a study with 14 participants. Analysis of the participants' strategies in performing different tasks provides insights on gaze control strategies for design of SideWays applications.
EyeContext: Recognition of High-level Contextual Cues from Human Visual Behaviour (CHI ’13)
Andreas Bulling; Christian Weichel; Hans Gellersen. EyeContext: Recognition of High-level Contextual Cues from Human Visual Behaviour. In Proc. of the 31st SIGCHI International Conference on Human Factors in Computing Systems (CHI 2013), pp. 305-308, ACM, New York, NY, USA, 2013. ISBN: 978-1-4503-1899-0.
DOI: 10.1145/2470654.2470697
PDF: https://perceptual.mpi-inf.mpg.de/files/2013/03/bulling13_chi.pdf
Video: https://www.youtube.com/watch?v=bhdVmWnnnIM

In this work we present EyeContext, a system to infer high-level contextual cues from human visual behaviour. We conducted a user study to record eye movements of four participants over a full day of their daily life, totalling 42.5 hours of eye movement data. Participants were asked to self-annotate four non-mutually exclusive cues: social (interacting with somebody vs. no interaction), cognitive (concentrated work vs. leisure), physical (physically active vs. not active), and spatial (inside vs. outside a building). We evaluate a proof-of-concept EyeContext system that combines encoding of eye movements into strings and a spectrum string kernel support vector machine (SVM) classifier. Our results demonstrate the large information content available in long-term human visual behaviour and open up new avenues for research on eye-based behavioural monitoring and life logging.
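The classification approach, encoding eye movements as character strings and classifying them with a spectrum (k-gram) string kernel SVM, might look like this in outline; the direction alphabet, toy strings and labels are purely illustrative.

```python
import numpy as np
from collections import Counter
from itertools import product
from sklearn.svm import SVC

ALPHABET = "LRUD"  # assumed saccade-direction encoding: left/right/up/down

def spectrum_vector(s, k=2):
    """Count all k-grams over the alphabet (the spectrum kernel feature map)."""
    grams = ["".join(g) for g in product(ALPHABET, repeat=k)]
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    return np.array([counts[g] for g in grams], dtype=float)

def gram_matrix(strings_a, strings_b, k=2):
    A = np.array([spectrum_vector(s, k) for s in strings_a])
    B = np.array([spectrum_vector(s, k) for s in strings_b])
    return A @ B.T  # spectrum kernel = inner product of k-gram counts

# Toy training data: reading-like (many horizontal saccades) vs. not.
train = ["LRLRLRLLRR", "RLRLRLRRLL", "UDUDLUDUDR", "DUDUUDRDUD"]
labels = [1, 1, 0, 0]  # 1 = "concentrated work", 0 = "leisure" (illustrative)

clf = SVC(kernel="precomputed").fit(gram_matrix(train, train), labels)
test = ["LRLRRLLRLR", "UDDUUDUDUD"]
print(clf.predict(gram_matrix(test, train)))
```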
MotionMA: Motion Modelling and Analysis by Demonstration (CHI ’13)
Eduardo Velloso; Andreas Bulling; Hans Gellersen. MotionMA: Motion Modelling and Analysis by Demonstration. In Proc. of the 31st SIGCHI International Conference on Human Factors in Computing Systems (CHI 2013), pp. 1309-1318, ACM, New York, NY, USA, 2013. ISBN: 978-1-4503-1899-0.
DOI: 10.1145/2470654.2466171
PDF: https://perceptual.mpi-inf.mpg.de/files/2013/03/velloso13_chi.pdf
Video: https://www.youtube.com/watch?v=fFFWyt9LOhg

Particularly in sports or physical rehabilitation, users have to perform body movements in a specific manner for the exercises to be most effective. It remains a challenge for experts to specify how to perform such movements so that an automated system can analyse further performances of it. In a user study with 10 participants we show that experts' explicit estimates do not correspond to their performances. To address this issue we present MotionMA, a system that: (1) automatically extracts a model of movements demonstrated by one user, e.g. a trainer, (2) assesses the performance of other users repeating this movement in real time, and (3) provides real-time feedback on how to improve their performance. We evaluated the system in a second study in which 10 other participants used the system to demonstrate arbitrary movements. Our results demonstrate that MotionMA is able to extract an accurate movement model to spot mistakes and variations in movement execution.
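One simple way to compare a repetition against a demonstrated movement is dynamic time warping over joint-angle sequences, which tolerates differences in execution speed. The sketch below uses plain DTW as a stand-in for the paper's model extraction and assessment; the angle sequences are synthetic.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two joint-angle sequences
    (each of shape (T, d)), tolerant to differences in execution speed."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)  # length-normalised alignment cost

# Demonstrated elbow/knee angle sequence vs. a slower, slightly off repetition.
t = np.linspace(0, 1, 60)
demo = np.column_stack([90 + 40 * np.sin(2 * np.pi * t), 120 - 30 * t])
slow = np.linspace(0, 1, 90)
rep = np.column_stack([95 + 35 * np.sin(2 * np.pi * slow), 118 - 32 * slow])

score = dtw_distance(demo, rep)
print(f"deviation from demonstrated movement: {score:.2f} deg per step")
```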
A Dual Scene Camera Eye Tracker for Interaction with Public and Hand-held Displays (Pervasive ’12)
Jayson Turner; Andreas Bulling; Hans Gellersen. A Dual Scene Camera Eye Tracker for Interaction with Public and Hand-held Displays. In Proc. of the 10th International Conference on Pervasive Computing (Pervasive 2012), 2012.
PDF: https://perceptual.mpi-inf.mpg.de/files/2015/11/Turner12.pdf

Advances in eye tracking technology have allowed gaze to become a viable input modality for pervasive displays. Hand-held devices are typically located below the visual field of a standard mobile eye tracker. To enable eye-based interaction with a public display and a hand-held device, we have developed a dual scene camera system with an extended field of view. Our system enables new interaction techniques that take advantage of gaze on remote and close proximity displays to select and move information for retrieval and manipulation.
Wearable EOG Goggles: Eye-Based Interaction in Everyday Environments (CHI ’09)
Andreas Bulling; Daniel Roggen; Gerhard Tröster. Wearable EOG Goggles: Eye-Based Interaction in Everyday Environments. In Ext. Abstracts of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2009), pp. 3259-3264, ACM Press, Boston, United States, 2009. ISBN: 978-1-60558-247-4.
PDF: https://perceptual.mpi-inf.mpg.de/files/2013/03/bulling09_chi.pdf

In this paper, we present an embedded eye tracker for context-awareness and eye-based human-computer interaction: the wearable EOG goggles. In contrast to common systems using video, this unobtrusive device relies on Electrooculography (EOG). It consists of goggles with dry electrodes integrated into the frame and a small pocket-worn component with a powerful microcontroller for EOG signal processing. Using this lightweight system, sequences of eye movements, so-called eye gestures, can be efficiently recognised from EOG signals in real time for HCI purposes. The device is a self-contained solution and allows for seamless eye motion sensing, context recognition and eye-based interaction in everyday environments.
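Eye gesture recognition from EOG can be sketched as thresholding the horizontal and vertical signal derivatives to detect saccades, encoding them as direction characters, and matching the resulting string against gesture templates. The threshold, synthetic signals and gesture set below are assumptions, not the goggles' firmware.

```python
import numpy as np

def saccade_string(h_eog, v_eog, threshold=50.0):
    """Turn horizontal/vertical EOG traces into a string of saccade
    directions (L, R, U, D) by thresholding sample-to-sample differences."""
    out = []
    for dh, dv in zip(np.diff(h_eog), np.diff(v_eog)):
        if abs(dh) < threshold and abs(dv) < threshold:
            continue  # no saccade at this sample
        if abs(dh) >= abs(dv):
            out.append("R" if dh > 0 else "L")
        else:
            out.append("U" if dv > 0 else "D")
    return "".join(out)

# Synthetic EOG traces containing a right, up, down, left saccade sequence.
h = np.concatenate([np.zeros(20), 300 * np.ones(40), np.zeros(20)])
v = np.concatenate([np.zeros(30), 300 * np.ones(20), np.zeros(30)])
gesture = saccade_string(h, v)

GESTURES = {"RUDL": "select", "LDUR": "back"}  # illustrative gesture set
print(gesture, "->", GESTURES.get(gesture, "unknown"))
```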