Privacy-Aware Eye Tracking Using Differential Privacy (MPIIDPEye)
With eye tracking being increasingly integrated into virtual and augmented reality (VR/AR) head-mounted displays, preserving users’ privacy is an ever more important, yet under-explored, topic in the eye tracking community. We report a large-scale online survey (N=124) on privacy aspects of eye tracking that provides the first comprehensive account of with whom, for which services, and to what extent users are willing to share their gaze data. Using these insights, we design a privacy-aware VR interface that uses differential privacy, which we evaluate on a new 20-participant dataset for two privacy sensitive tasks: We show that our method can prevent user re-identification and protect gender information while maintaining high performance for gaze-based document type classification. Our results highlight the privacy challenges particular to gaze data and demonstrate that differential privacy is a potential means to address them. Thus, this paper lays important foundations for future research on privacy-aware gaze interfaces.
The dataset consists of a .zip file with two folders (Eye_Tracking_Data and Eye_Movement_Features), a .csv file with the ground truth annotation (Ground_Truth.csv), and a Readme.txt file. Each folder contains two files per participant (P) for each recording (R = document class): the recorded eye tracking data and the corresponding eye movement features. The data is saved in both .npy and .csv format. The data scheme of the eye tracking data and the eye movement features is given in the Readme.txt file.
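As a quick check of the data layout, the following sketch loads one participant's eye movement features, the corresponding eye tracking data, and the ground truth file. The file names used here are illustrative only; the exact naming and column layout are documented in the Readme.txt shipped with the archive.

import numpy as np
import pandas as pd

# Hypothetical file names -- adjust to the naming given in Readme.txt.
participant, recording = 1, 1
features = np.load(f"Eye_Movement_Features/P{participant}_R{recording}.npy")
gaze = pd.read_csv(f"Eye_Tracking_Data/P{participant}_R{recording}.csv")
ground_truth = pd.read_csv("Ground_Truth.csv")

print(features.shape)       # eye movement feature matrix for this participant/recording
print(gaze.head())          # raw eye tracking samples
print(ground_truth.head())  # document class annotation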
Download: Please download the full dataset here (64 MB). Contact: Julian Steil Campus E1.4, room 622, E-mail:
The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:
Julian Steil; Inken Hagestedt; Michael Xuelin Huang; Andreas Bulling
@inproceedings{steil19_etra2,
title = {Privacy-Aware Eye Tracking Using Differential Privacy},
author = {Julian Steil and Inken Hagestedt and Michael Xuelin Huang and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2019/04/steil19_etra2.pdf
https://perceptual.mpi-inf.mpg.de/files/2019/04/steil19_etra2_supplementary_material.pdf},
doi = {10.1145/3314111.3319915},
year = {2019},
date = {2019-03-07},
booktitle = {Proc. International Symposium on Eye Tracking Research and Applications (ETRA)},
abstract = {With eye tracking being increasingly integrated into virtual and augmented reality (VR/AR) head-mounted displays, preserving users’ privacy is an ever more important, yet under-explored, topic in the eye tracking community. We report a large-scale online survey (N=124) on privacy aspects of eye tracking that provides the first comprehensive account of with whom, for which services, and to what extent users are willing to share their gaze data. Using these insights, we design a privacy-aware VR interface that uses differential privacy, which we evaluate on a new 20-participant dataset for two privacy sensitive tasks: We show that our method can prevent user re-identification and protect gender information while maintaining high performance for gaze-based document type classification. Our results highlight the privacy challenges particular to gaze data and demonstrate that differential privacy is a potential means to address them. Thus, this paper lays important foundations for future research on privacy-aware gaze interfaces.},
note = {best paper award},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
PrivacEye: Privacy-Preserving Head-Mounted Eye Tracking Using Egocentric Scene Image and Eye Movement Features
Eyewear devices, such as augmented reality displays, increasingly integrate eye tracking, but the first-person camera required to map a user’s gaze to the visual scene can pose a significant threat to user and bystander privacy. We present PrivacEye, a method to detect privacy-sensitive everyday situations and automatically enable and disable the eye tracker’s first-person camera using a mechanical shutter. To close the shutter in privacy-sensitive situations, the method uses a deep representation of the first-person video combined with rich features that encode users’ eye movements. To open the shutter without visual input, PrivacEye detects changes in users’ eye movements alone to gauge changes in the “privacy level” of the current situation. We evaluate our method on a first-person video dataset recorded in daily life situations of 17 participants, annotated by themselves for privacy sensitivity, and show that our method is effective in preserving privacy in this challenging setting.
The full dataset can be downloaded as a single .zip file (or separately for each participant here). Each .zip file contains four folders. In each folder there is a Readme.txt with a separate annotation scheme for the contained files.
Data_Annotation
For each participant and each recording, the continuously recorded eye, scene, and IMU data as well as the corresponding ground truth annotation are saved as .csv, .npy, and .pkl files (all three contain the same data).
Features_and_Ground_Truth
For each participant and each recording, 52 eye movement features computed over a sliding window of 30 seconds and 68 CNN features extracted with a step size of 1 second are saved as .csv and .npy files (both contain the same data). These data are not standardised; in standardised form they were used to train our SVM models.
Video_Frames_and_Ground_Truth
For each participant and each recording, the scene frame numbers and the corresponding ground truth annotation are saved as .csv and .npy files (both contain the same data).
Private_Segments_Statistics
For each participant and each recording, statistics on the number of private and non-private segments as well as the average, minimum, maximum, and total segment time in minutes are saved as .csv and .npy files (both contain the same data).
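To give an impression of how these folders can be used together, here is a minimal loading sketch. All file names below are placeholders; the Readme.txt in each folder defines the actual naming and annotation scheme. The last two lines show what a simple standardisation (as used for the SVM models) could look like.

import numpy as np
import pandas as pd

# Hypothetical paths -- consult the Readme.txt in each folder for the real names.
annotation = pd.read_csv("Data_Annotation/P01_R01.csv")            # eye, scene, IMU data + ground truth
features = np.load("Features_and_Ground_Truth/P01_R01.npy")        # 52 eye movement + 68 CNN features
frames = pd.read_csv("Video_Frames_and_Ground_Truth/P01_R01.csv")  # scene frame number + annotation
stats = pd.read_csv("Private_Segments_Statistics/P01_R01.csv")     # per-recording segment statistics

# The features ship unstandardised; z-scoring per feature dimension is one common choice.
features_std = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
print(annotation.head(), features_std.shape)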
Download: Please download the full dataset here (2.6 GB). Contact: Julian Steil Campus E1.4, room 622, E-mail:
The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:
Julian Steil; Marion Koelle; Wilko Heuten; Susanne Boll; Andreas Bulling
@inproceedings{steil19_etra,
title = {PrivacEye: Privacy-Preserving Head-Mounted Eye Tracking Using Egocentric Scene Image and Eye Movement Features},
author = {Julian Steil and Marion Koelle and Wilko Heuten and Susanne Boll and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2019/04/steil19_etra.pdf
https://perceptual.mpi-inf.mpg.de/files/2019/04/steil19_etra_supplementary_material.pdf},
doi = {10.1145/3314111.3319913},
year = {2019},
date = {2019-03-07},
booktitle = {Proc. International Symposium on Eye Tracking Research and Applications (ETRA)},
abstract = {Eyewear devices, such as augmented reality displays, increasingly integrate eye tracking, but the first-person camera required to map a user’s gaze to the visual scene can pose a significant threat to user and bystander privacy. We present PrivacEye, a method to detect privacy-sensitive everyday situations and automatically enable and disable the eye tracker’s first-person camera using a mechanical shutter. To close the shutter in privacy-sensitive situations, the method uses a deep representation of the first-person video combined with rich features that encode users’ eye movements. To open the shutter without visual input, PrivacEye detects changes in users’ eye movements alone to gauge changes in the “privacy level” of the current situation. We evaluate our method on a first-person video dataset recorded in daily life situations of 17 participants, annotated by themselves for privacy sensitivity, and show that our method is effective in preserving privacy in this challenging setting.},
note = {best video award},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Forecasting User Attention During Everyday Mobile Interactions Using Device-Integrated and Wearable Sensors (MPIIMobileAttention)
Visual attention is highly fragmented during mobile interactions, but the erratic nature of attention shifts currently limits attentive user interfaces to adapting after the fact, i.e. after shifts have already happened. We instead study attention forecasting – the challenging task of predicting users’ gaze behaviour (overt visual attention) in the near future. We present a novel long-term dataset of everyday mobile phone interactions, continuously recorded from 20 participants engaged in common activities on a university campus over 4.5 hours each (more than 90 hours in total). We propose a proof-of-concept method that uses device-integrated sensors and body-worn cameras to encode rich information on device usage and users’ visual scene. We demonstrate that our method can forecast bidirectional attention shifts and predict whether the primary attentional focus is on the handheld mobile device. We study the impact of different feature sets on performance and discuss the significant potential but also remaining challenges of forecasting user attention during mobile interactions.
The dataset consists of a .zip file with three files per participant, one for each of the three recording blocks (RB). Each recording block file is saved as a .pkl file which can be read in Python using pandas. The data scheme of the 213 columns is given in the README.txt file.
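Because each recording block is stored as a pandas-readable .pkl file, loading it is a one-liner. The file name below is a placeholder; the 213-column layout is documented in the README.txt.

import pandas as pd

# Hypothetical file name -- see README.txt for the actual naming scheme.
df = pd.read_pickle("P01_RB1.pkl")

print(df.shape)         # (number of samples, 213)
print(df.columns[:10])  # first few of the 213 documented columns
print(df.head())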
Download: Please download the full dataset here (2.4 GB). Contact: Julian Steil Campus E1.4, room 622, E-mail:
The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:
Julian Steil; Philipp Müller; Yusuke Sugano; Andreas Bulling
@inproceedings{steil18_mobilehci,
title = {Forecasting User Attention During Everyday Mobile Interactions Using Device-Integrated and Wearable Sensors},
author = {Julian Steil and Philipp Müller and Yusuke Sugano and Andreas Bulling},
url = {https://wp.mpi-inf.mpg.de/perceptual/files/2018/07/steil18_mobilehci.pdf},
doi = {10.1145/3229434.3229439},
year = {2018},
date = {2018-04-16},
booktitle = {Proc. International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI)},
pages = {1:1--1:13},
abstract = {Visual attention is highly fragmented during mobile interactions but the erratic nature of attention shifts currently limits attentive user interfaces to adapt after the fact, i.e. after shifts have already happened. We instead study attention forecasting – the challenging task of predicting users' gaze behavior (overt visual attention) in the near future. We present a novel long-term dataset of everyday mobile phone interactions, continuously recorded from 20 participants engaged in common activities on a university campus over 4.5 hours each (more than 90 hours in total). We propose a proof-of-concept method that uses device-integrated sensors and body-worn cameras to encode rich information on device usage and users' visual scene. We demonstrate that our method can forecast bidirectional attention shifts and whether the primary attentional focus is on the handheld mobile device. We study the impact of different feature sets on performance and discuss the significant potential but also remaining challenges of forecasting user attention during mobile interactions.},
note = {best paper award},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Fixation Detection for Head-Mounted Eye Tracking Based on Visual Similarity of Gaze Targets (MPIIEgoFixation)
Fixations are widely analysed in human vision, gaze-based interaction, and experimental psychology research. However, robust fixation detection in mobile settings is profoundly challenging given the prevalence of user and gaze target motion. These movements feign a shift in gaze estimates in the frame of reference defined by the eye tracker’s scene camera. To address this challenge, we present a novel fixation detection method for head-mounted eye trackers. Our method exploits that, independent of user or gaze target motion, target appearance remains about the same during a fixation. It extracts image information from small regions around the current gaze position and analyses the appearance similarity of these gaze patches across video frames to detect fixations. We evaluate our method using fine-grained fixation annotations on a five-participant indoor dataset (MPIIEgoFixation) with more than 2,300 fixations in total. Our method outperforms commonly used velocity- and dispersion-based algorithms, which highlights its significant potential to analyse scene image information for eye movement detection.
We have evaluated our method on a recent mobile eye tracking dataset [Sugano and Bulling 2015]. This dataset is particularly suitable because participants walked around throughout the recording period. Walking leads to a large amount of head motion and scene dynamics, which is both challenging and interesting for our detection task. Since the dataset was not yet publicly available, we requested it directly from the authors. The eye tracking headset (Pupil [Kassner et al. 2014]) featured a 720p world camera as well as an infra-red eye camera mounted on an adjustable camera arm. Both cameras recorded at 30 Hz. Egocentric videos were recorded using the world camera and synchronised via hardware timestamps. Gaze estimates are provided as part of the dataset.
The dataset consists of 5 folders with indoor recordings: P1 (1st recording), P2 (1st recording), P3 (2nd recording), P4 (1st recording), and P5 (2nd recording). Each folder contains a data file as well as a ground truth file with fixation IDs and the start and end frames in the corresponding scene video. Both files are available in .npy and .csv format.
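The sketch below reads one participant's ground truth file and derives per-fixation durations, assuming the 30 Hz scene camera rate mentioned above. The path and column names are assumptions; the headers of the shipped .csv files are authoritative.

import pandas as pd

# Hypothetical path and column names -- check the actual .csv headers.
gt = pd.read_csv("P1/ground_truth.csv")  # assumed columns: fixation_id, start_frame, end_frame

gt["duration_frames"] = gt["end_frame"] - gt["start_frame"] + 1
gt["duration_s"] = gt["duration_frames"] / 30.0  # scene video recorded at 30 Hz

print("number of annotated fixations:", len(gt))
print(gt.head())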
Download: Please download the full dataset here (3.2 MB). Contact: Julian Steil Campus E1.4, room 622, E-mail: Videos: Can be requested here
The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:
Julian Steil; Michael Xuelin Huang; Andreas Bulling
@inproceedings{steil18_etra,
title = {Fixation Detection for Head-Mounted Eye Tracking Based on Visual Similarity of Gaze Targets},
author = {Julian Steil and Michael Xuelin Huang and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2018/04/steil18_etra.pdf
https://perceptual.mpi-inf.mpg.de/research/datasets/#steil18_etra},
doi = {10.1145/3204493.3204538},
year = {2018},
date = {2018-03-28},
booktitle = {Proc. International Symposium on Eye Tracking Research and Applications (ETRA)},
pages = {23:1-23:9},
abstract = {Fixations are widely analysed in human vision, gaze-based interaction, and experimental psychology research. However, robust fixation detection in mobile settings is profoundly challenging given the prevalence of user and gaze target motion. These movements feign a shift in gaze estimates in the frame of reference defined by the eye tracker's scene camera. To address this challenge, we present a novel fixation detection method for head-mounted eye trackers. Our method exploits that, independent of user or gaze target motion, target appearance remains about the same during a fixation. It extracts image information from small regions around the current gaze position and analyses the appearance similarity of these gaze patches across video frames to detect fixations. We evaluate our method using fine-grained fixation annotations on a five-participant indoor dataset (MPIIEgoFixation) with more than 2,300 fixations in total. Our method outperforms commonly used velocity- and dispersion-based algorithms, which highlights its significant potential to analyse scene image information for eye movement detection.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
InvisibleEye: Mobile Eye Tracking Using Multiple Low-Resolution Cameras and Learning-Based Gaze Estimation
Analysis of everyday human gaze behaviour has significant potential for ubiquitous computing, as evidenced by a large body of work in gaze-based human-computer interaction, attentive user interfaces, and eye-based user modelling. However, current mobile eye trackers are still obtrusive, which not only makes them uncomfortable to wear and socially unacceptable in daily life, but also prevents them from being widely adopted in the social and behavioural sciences. To address these challenges we present InvisibleEye, a novel approach for mobile eye tracking that uses millimetre-size RGB cameras that can be fully embedded into normal glasses frames. To compensate for the cameras’ low image resolution of only a few pixels, our approach uses multiple cameras to capture different views of the eye, as well as learning-based gaze estimation to directly regress from eye images to gaze directions. We prototypically implement our system and characterise its performance on three large-scale, increasingly realistic, and thus challenging datasets: 1) eye images synthesised using a recent computer graphics eye region model, 2) real eye images recorded of 17 participants under controlled lighting, and 3) eye images recorded of four participants over the course of four recording sessions in a mobile setting. We show that InvisibleEye achieves a top person-specific gaze estimation accuracy of 1.79° using four cameras with a resolution of only 5 × 5 pixels. Our evaluations not only demonstrate the feasibility of this novel approach but, more importantly, underline its significant potential for finally realising the vision of invisible mobile eye tracking and pervasive attentive user interfaces.
We used this first hardware prototype to record a dataset of more than 280,000 close-up eye images with ground truth annotation of the gaze location. A total of 17 participants were recorded, covering a wide range of appearances:
• Gender: Five (29%) female and 12 (71%) male
• Nationality: Seven (41%) German, seven (41%) Indian, one (6%) Bangladeshi, one (6%) Iranian, and one (6%) Greek
• Eye Color: 12 (70%) brown, four (23%) blue, and one (5%) green
• Glasses: Four participants (23%) wore regular glasses and one (6%) wore contact lenses
For each participant, two sets of data were recorded: one set of training data and a separate set of test data. For each set, a series of gaze targets was shown on a display that participants were instructed to look at. For both training and test data the gaze targets covered a uniform grid in a random order, where the grid corresponding to the test data was positioned to lie in between the training points. Since the NanEye cameras record at about 44 FPS, we gathered approximately 22 frames per camera and gaze target. The training data was recorded using a uniform 24 × 17 grid of points, with an angular distance in gaze angle of 1.45° horizontally and 1.30° vertically between the points. In total, the training set contained about 8,800 images per camera and participant. The test set’s points belonged to a 23 × 16 grid and it contained about 8,000 images per camera and participant. This way, the gaze targets covered a field of view of 35° horizontally and 22° vertically.
The dataset consists of 17 folders. Each folder contains two subfolders for the training and test sets, each with the video frames from the four NanEye cameras as well as a .npy file with the gaze target pixel coordinates on the display.
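A loading sketch for one participant's training set is given below. The directory layout and file names are placeholders, and the .npy file is assumed to hold one on-screen (x, y) pixel coordinate per recorded sample.

import glob
import numpy as np
from PIL import Image

# Hypothetical layout -- adjust to the actual folder and file names in the archive.
frames_cam0 = sorted(glob.glob("P01/train/cam0/*.png"))
targets = np.load("P01/train/targets.npy")  # assumed shape: (num_samples, 2)

img = np.asarray(Image.open(frames_cam0[0]))
print("first frame shape:", img.shape)            # low-resolution NanEye eye image
print("frames from camera 0:", len(frames_cam0))
print("gaze target coordinates:", targets.shape)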
Download: Please download the full dataset here (49.2 GB). Contact: Julian Steil Campus E1.4, room 622, E-mail:
The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:
Marc Tonsen; Julian Steil; Yusuke Sugano; Andreas Bulling
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 1 (3), pp. 106:1-106:21, 2017 (distinguished paper award).
@article{tonsen17_imwut,
title = {InvisibleEye: Mobile Eye Tracking Using Multiple Low-Resolution Cameras and Learning-Based Gaze Estimation},
author = {Marc Tonsen and Julian Steil and Yusuke Sugano and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2017/08/tonsen17_imwut.pdf
https://perceptual.mpi-inf.mpg.de/research/datasets/#tonsen17_imwut},
doi = {10.1145/3130971},
year = {2017},
date = {2017-07-24},
journal = {Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT)},
volume = {1},
number = {3},
pages = {106:1-106:21},
abstract = {Analysis of everyday human gaze behaviour has significant potential for ubiquitous computing, as evidenced by a large body of work in gaze-based human-computer interaction, attentive user interfaces, and eye-based user modelling. However, current mobile eye trackers are still obtrusive, which not only makes them uncomfortable to wear and socially unacceptable in daily life, but also prevents them from being widely adopted in the social and behavioural sciences. To address these challenges we present InvisibleEye, a novel approach for mobile eye tracking that uses millimetre-size RGB cameras that can be fully embedded into normal glasses frames. To compensate for the cameras’ low image resolution of only a few pixels, our approach uses multiple cameras to capture different views of the eye, as well as learning-based gaze estimation to directly regress from eye images to gaze directions. We prototypically implement our system and characterise its performance on three large-scale, increasingly realistic, and thus challenging datasets: 1) eye images synthesised using a recent computer graphics eye region model, 2) real eye images recorded of 17 participants under controlled lighting, and 3) eye images recorded of four participants over the course of four recording sessions in a mobile setting. We show that InvisibleEye achieves a top person-specific gaze estimation accuracy of 1.79° using four cameras with a resolution of only 5×5 pixels. Our evaluations not only demonstrate the feasibility of this novel approach but, more importantly, underline its significant potential for finally realising the vision of invisible mobile eye tracking and pervasive attentive user interfaces.},
note = {distinguished paper award},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
It’s Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation
We present the MPIIFaceGaze dataset, which is based on the MPIIGaze dataset and additionally provides human facial landmark annotations and the face regions. We added facial landmark and pupil centre annotations for 37,667 face images. Facial landmarks were annotated in a semi-automatic manner: a facial landmark detection method was run first and its results were then checked by two human annotators. The pupil centres were annotated by two human annotators from scratch. For the sake of privacy, we only release the face region and blocked out the background in the images.
Download: Please download the full dataset here (940 MB).
You can also download the normalized data from here, which includes the normalized face images at 448×448 pixels and the corresponding 2D gaze angle vectors. Please note that the data normalization procedure changes the gaze direction labels, so results obtained on the normalized data need to be converted back to the original camera space.
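To illustrate what such a conversion can look like, the sketch below maps a normalised 2D gaze angle back to a 3D direction and undoes the normalisation rotation. The (pitch, yaw) convention and the availability of a per-sample normalisation rotation matrix are assumptions; the definitions shipped with the dataset and described in the MPIIGaze paper are authoritative.

import numpy as np

def angles_to_vector(pitch, yaw):
    # Assumed convention: angles in radians, unit vector pointing from the eye
    # towards the camera/screen along negative z.
    return np.array([
        -np.cos(pitch) * np.sin(yaw),
        -np.sin(pitch),
        -np.cos(pitch) * np.cos(yaw),
    ])

def to_camera_space(gaze_2d, R_norm):
    # gaze_2d: (pitch, yaw) from the normalised data (assumed convention).
    # R_norm : 3x3 rotation used during normalisation for this sample
    #          (assumed to be available from the normalisation metadata).
    g_norm = angles_to_vector(*gaze_2d)
    g_cam = R_norm.T @ g_norm  # undo the normalisation rotation
    return g_cam / np.linalg.norm(g_cam)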
The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following papers:
Xucong Zhang; Yusuke Sugano; Mario Fritz; Andreas Bulling
@article{zhang18_pami,
title = {MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation},
author = {Xucong Zhang and Yusuke Sugano and Mario Fritz and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2018/04/zhang18_pami.pdf},
doi = {10.1109/TPAMI.2017.2778103},
year = {2019},
date = {2019-01-01},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
volume = {41},
number = {1},
pages = {162-175},
abstract = {Learning-based methods are believed to work well for unconstrained gaze estimation, i.e. gaze estimation from a monocular RGB camera without assumptions regarding user, environment, or camera. However, current gaze datasets were collected under laboratory conditions and methods were not evaluated across multiple datasets. Our work makes three contributions towards addressing these limitations. First, we present the MPIIGaze dataset, which contains 213,659 full face images and corresponding ground-truth gaze positions collected from 15 users during everyday laptop use over several months. An experience sampling approach ensured continuous gaze and head poses and realistic variation in eye appearance and illumination. To facilitate cross-dataset evaluations, 37,667 images were manually annotated with eye corners, mouth corners, and pupil centres. Second, we present an extensive evaluation of state-of-the-art gaze estimation methods on three current datasets, including MPIIGaze. We study key challenges including target gaze range, illumination conditions, and facial appearance variation. We show that image resolution and the use of both eyes affect gaze estimation performance, while head pose and pupil centre information are less informative. Finally, we propose GazeNet, the first deep appearance-based gaze estimation method. GazeNet improves on the state of the art by 22% (from a mean error of 13.9 degrees to 10.8 degrees) for the most challenging cross-dataset evaluation.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{zhang17_cvprw,
title = {It’s Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation},
author = {Xucong Zhang and Yusuke Sugano and Mario Fritz and Andreas Bulling},
url = {https://wp.mpi-inf.mpg.de/perceptual/files/2017/11/zhang_cvprw2017-6.pdf
https://perceptual.mpi-inf.mpg.de/research/datasets/#zhang17_cvprw},
doi = {10.1109/CVPRW.2017.284},
year = {2017},
date = {2017-05-18},
booktitle = {Proc. of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
pages = {2299-2308},
abstract = {Eye gaze is an important non-verbal cue for human affect analysis. Recent gaze estimation work indicated that information from the full face region can benefit performance. Pushing this idea further, we propose an appearance-based method that, in contrast to a long-standing line of work in computer vision, only takes the full face image as input. Our method encodes the face image using a convolutional neural network with spatial weights applied on the feature maps to flexibly suppress or enhance information in different facial regions. Through extensive evaluation, we show that our full-face method significantly outperforms the state of the art for both 2D and 3D gaze estimation, achieving improvements of up to 14.3% on MPIIGaze and 27.7% on EYEDIAP for person-independent 3D gaze estimation. We further show that this improvement is consistent across different illumination conditions and gaze directions and particularly pronounced for the most challenging extreme head poses.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Labelled pupils in the wild: A dataset for studying pupil detection in unconstrained environments
We present labelled pupils in the wild (LPW), a novel dataset of 66 high-quality, high-speed eye region videos for the development and evaluation of pupil detection algorithms. The videos in our dataset were recorded from 22 participants in everyday locations at about 95 FPS using a state-of-the-art dark-pupil head-mounted eye tracker. They cover people with different ethnicities, a diverse set of everyday indoor and outdoor illumination environments, as well as natural gaze direction distributions. The dataset also includes participants wearing glasses, contact lenses, as well as make-up. We benchmark five state-of-the-art pupil detection algorithms on our dataset with respect to robustness and accuracy. We further study the influence of image resolution, vision aids, as well as recording location (indoor, outdoor) on pupil detection performance. Our evaluations provide valuable insights into the general pupil detection problem and allow us to identify key challenges for robust pupil detection on head-mounted eye trackers.
Download: Please download the full dataset here (2.4 GB). Contact: Andreas Bulling Campus E1.4, room 628, E-mail:
The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:
Marc Tonsen; Xucong Zhang; Yusuke Sugano; Andreas Bulling
@inproceedings{tonsen16_etra,
title = {Labelled pupils in the wild: A dataset for studying pupil detection in unconstrained environments},
author = {Marc Tonsen and Xucong Zhang and Yusuke Sugano and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2016/01/tonsen16_etra.pdf
https://perceptual.mpi-inf.mpg.de/research/datasets/#tonsen16_etra},
doi = {10.1145/2857491.2857520},
year = {2016},
date = {2016-01-01},
booktitle = {Proc. of the 9th ACM International Symposium on Eye Tracking Research & Applications (ETRA 2016)},
pages = {139-142},
abstract = {We present labelled pupils in the wild (LPW), a novel dataset of 66 high-quality, high-speed eye region videos for the development and evaluation of pupil detection algorithms. The videos in our dataset were recorded from 22 participants in everyday locations at about 95 FPS using a state-of-the-art dark-pupil head-mounted eye tracker. They cover people of different ethnicities and a diverse set of everyday indoor and outdoor illumination environments, as well as natural gaze direction distributions. The dataset also includes participants wearing glasses, contact lenses, and make-up. We benchmark five state-of-the-art pupil detection algorithms on our dataset with respect to robustness and accuracy. We further study the influence of image resolution and vision aids as well as recording location (indoor, outdoor) on pupil detection performance. Our evaluations provide valuable insights into the general pupil detection problem and allow us to identify key challenges for robust pupil detection on head-mounted eye trackers.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
3D Gaze Estimation from 2D Pupil Positions on Monocular Head-Mounted Eye Trackers
We collected eye tracking data from 14 participants aged between 22 and 29 years. 10 recordings were collected from each participant, 2 for each depth (calibration and test) at 5 different depths from a public display (1 m, 1.25 m, 1.5 m, 1.75 m, and 2 m). Display dimensions were 121.5 cm × 68.7 cm. We used a 5×5 grid pattern to display 25 calibration points and an inner 4×4 grid to display 16 test points. This was done by randomly moving a target marker across these grid positions and capturing images from the eye and scene cameras at 30 Hz. We further performed marker detection on the target points using the ArUco library to compute their 3D coordinates w.r.t. the scene camera. In addition, the 2D position of the pupil centre in each frame of the eye camera is provided by a state-of-the-art dark-pupil head-mounted eye tracker (PUPIL). The eye tracker consists of a 1280×720 resolution scene camera and a 640×360 resolution eye camera. The PUPIL software used was v0.5.4.
The data was collected in an indoor setting and adds up to over 7 hours of eye tracking. The dataset includes per-frame marker tracking results (ArUco) for every recording, along with per-frame pupil tracking results from the PUPIL eye tracker for the eye video. We have also included the camera intrinsic parameters for both the eye and scene cameras, along with some post-processed results such as the frames corresponding to the gaze intervals for every grid point. For more information on the data format and how to use it, please refer to the README file inside the dataset. If you want to access the raw videos from both the scene and eye cameras, please contact the authors.
Our evaluations on this data show the effectiveness of our new 2D-to-3D mapping approach, together with multiple-depth calibration data, in reducing gaze estimation error. More information on this data and the analysis can be found here.
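To make the 2D-to-3D mapping idea concrete, here is a small least-squares sketch that regresses from 2D pupil positions to 3D gaze directions in scene-camera coordinates. The second-order polynomial features are an illustrative choice and not necessarily the exact formulation evaluated in the paper.

import numpy as np

def poly_features(p):
    # Second-order polynomial expansion of a 2D pupil position (x, y).
    x, y = p
    return np.array([1.0, x, y, x * y, x * x, y * y])

def fit_2d_to_3d(pupil_2d, gaze_3d):
    # pupil_2d: (N, 2) pupil centres from the eye camera (calibration recordings).
    # gaze_3d : (N, 3) unit gaze directions towards the 3D marker positions,
    #           expressed in scene-camera coordinates.
    A = np.stack([poly_features(p) for p in pupil_2d])  # (N, 6)
    W, *_ = np.linalg.lstsq(A, gaze_3d, rcond=None)     # (6, 3) mapping
    return W

def predict_direction(W, pupil_xy):
    g = poly_features(pupil_xy) @ W
    return g / np.linalg.norm(g)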
Download: Please download the full dataset from here (81.4 MB). Contact: Andreas Bulling Campus E1.4, room 628, E-mail:
The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:
Mohsen Mansouryar; Julian Steil; Yusuke Sugano; Andreas Bulling
@inproceedings{mansouryar16_etra,
title = {3D Gaze Estimation from 2D Pupil Positions on Monocular Head-Mounted Eye Trackers},
author = {Mohsen Mansouryar and Julian Steil and Yusuke Sugano and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2016/01/mansouryar16_etra.pdf
https://github.molgen.mpg.de/perceptual/etra2016_gazesim
https://perceptual.mpi-inf.mpg.de/research/datasets/#mansouryar16_etra},
doi = {10.1145/2857491.2857530},
year = {2016},
date = {2016-01-01},
booktitle = {Proc. of the 9th ACM International Symposium on Eye Tracking Research & Applications (ETRA 2016)},
pages = {197-200},
abstract = {3D gaze information is important for scene-centric attention analysis, but accurate estimation and analysis of 3D gaze in real-world environments remains challenging. We present a novel 3D gaze estimation method for monocular head-mounted eye trackers. In contrast to previous work, our method does not aim to infer 3D eye- ball poses, but directly maps 2D pupil positions to 3D gaze directions in scene camera coordinate space. We first provide a detailed discussion of the 3D gaze estimation task and summarize different methods, including our own. We then evaluate the performance of different 3D gaze estimation approaches using both simulated and real data. Through experimental validation, we demonstrate the effectiveness of our method in reducing parallax error, and we identify research challenges for the design of 3D calibration procedures.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Discovery of Everyday Human Activities From Long-term Visual Behaviour Using Topic Models
We recruited 10 participants (three female) aged between 17 and 25 years through university mailing lists and adverts in university buildings. Most participants were bachelor’s and master’s students in computer science and chemistry. None of them had previous experience with eye tracking. After arriving in the lab, participants were first introduced to the purpose and goals of the study and could familiarise themselves with the recording system. In particular, we showed them how to start and stop the recording software, how to run the calibration procedure, and how to restart the recording. We then asked them to take the system home and wear it continuously for a full day from morning to evening. We asked participants to plug in and recharge the laptop during prolonged stationary activities, such as at their work desk. We did not impose any other restrictions on these recordings, such as which day of the week to record or which activities to perform, etc.
The recording system consisted of a Lenovo Thinkpad X220 laptop, an additional 1TB hard drive and battery pack, as well as an external USB hub. Gaze data was collected using a PUPIL head-mounted eye tracker connected to the laptop via USB. The eye tracker features two cameras: one eye camera with a resolution of 640×360 pixels recording a video of the right eye from close proximity, as well as an egocentric (scene) camera with a resolution of 1280×720 pixels. Both cameras record at 30 Hz. The battery lifetime of the system was four hours. We implemented custom recording software with a particular focus on ease of use as well as the ability to easily restart a recording if needed.
We recorded a dataset of more than 80 hours of eye tracking data. The dataset comprises 7.8 hours of outdoor activities, 14.3 hours of social interaction, 31.3 hours of focused work, 8.3 hours of travel, 39.5 hours of reading, 28.7 hours of computer work, 18.3 hours of watching media, 7 hours of eating, and 11.4 hours of other (special) activities. Note that annotations are not mutually exclusive, i.e. these durations should be seen independently and sum up to more than the actual dataset size.
The dataset consists of 20 files. Ten files contain the long-term eye movement data of the ten recorded participants of this study. The other ten files describe the corresponding ground truth annotations.
Download: Please download the full dataset here (457.8 MB). Contact: Julian Steil Campus E1.4, room 622, E-mail:
The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:
@inproceedings{Steil_Ubicomp15,
title = {Discovery of Everyday Human Activities From Long-Term Visual Behaviour Using Topic Models},
author = {Julian Steil and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2015/08/Steil_Ubicomp15.pdf
https://perceptual.mpi-inf.mpg.de/research/datasets/#steil15_ubicomp},
doi = {10.1145/2750858.2807520},
year = {2015},
date = {2015-05-21},
booktitle = {Proc. of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015)},
pages = {75-85},
abstract = {Human visual behaviour has significant potential for activity recognition and computational behaviour analysis, but previous works focused on supervised methods and recognition of predefined activity classes based on short-term eye movement recordings. We propose a fully unsupervised method to discover users' everyday activities from their long-term visual behaviour. Our method combines a bag-of-words representation of visual behaviour that encodes saccades, fixations, and blinks with a latent Dirichlet allocation (LDA) topic model.
We further propose different methods to encode saccades for their use in the topic model. We evaluate our method on a novel long-term gaze dataset that contains full-day recordings of natural visual behaviour
of 10 participants (more than 80 hours in total). We also provide annotations for eight sample activity classes (outdoor, social interaction, focused work, travel, reading, computer work, watching media, eating) and periods with no specific activity. We show the ability of our method to discover these activities with performance competitive with that of previously published supervised methods.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Appearance-Based Gaze Estimation in the Wild (MPIIGaze)
We present the MPIIGaze dataset that contains 213,659 images that we collected from 15 participants during natural everyday laptop use over more than three months. The number of images collected per participant varied from 1,498 to 34,745. Our dataset is significantly more variable than existing ones with respect to appearance and illumination.
The dataset contains three parts: “Data”, “Evaluation Subset”, and “Annotation Subset”.
The “Data” part includes “Original”, “Normalized”, and “Calibration” data for all 15 participants.
The “Evaluation Subset” contains the image list indicating the samples selected for the evaluation subset in our paper.
The “Annotation Subset” contains the image list indicating the 10,848 samples that we manually annotated, together with the annotations: the (x, y) positions of six facial landmarks (four eye corners, two mouth corners) and the (x, y) positions of the two pupil centres for each of these images.
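As an illustration, the sketch below parses one line of such an annotation list, assuming a whitespace-separated format of an image path followed by 16 numbers (six landmarks and two pupil centres, each as x y). The real format is defined in the dataset documentation, so treat this purely as a template.

# Hypothetical annotation-line parser -- adapt it to the documented format.
def parse_annotation_line(line):
    parts = line.split()
    image_path = parts[0]
    values = list(map(float, parts[1:17]))  # 6 landmarks + 2 pupil centres = 16 numbers
    points = [(values[i], values[i + 1]) for i in range(0, 16, 2)]
    landmarks, pupil_centres = points[:6], points[6:]
    return image_path, landmarks, pupil_centres

# Dummy example line, just to show the expected structure.
path, landmarks, pupils = parse_annotation_line("p00/day01/0001.jpg " + " ".join(["0"] * 16))
print(path, landmarks, pupils)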
@article{zhang18_pami,
title = {MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation},
author = {Xucong Zhang and Yusuke Sugano and Mario Fritz and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2018/04/zhang18_pami.pdf},
doi = {10.1109/TPAMI.2017.2778103},
year = {2019},
date = {2019-01-01},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
volume = {41},
number = {1},
pages = {162-175},
abstract = {Learning-based methods are believed to work well for unconstrained gaze estimation, i.e. gaze estimation from a monocular RGB camera without assumptions regarding user, environment, or camera. However, current gaze datasets were collected under laboratory conditions and methods were not evaluated across multiple datasets. Our work makes three contributions towards addressing these limitations. First, we present the MPIIGaze dataset, which contains 213,659 full face images and corresponding ground-truth gaze positions collected from 15 users during everyday laptop use over several months. An experience sampling approach ensured continuous gaze and head poses and realistic variation in eye appearance and illumination. To facilitate cross-dataset evaluations, 37,667 images were manually annotated with eye corners, mouth corners, and pupil centres. Second, we present an extensive evaluation of state-of-the-art gaze estimation methods on three current datasets, including MPIIGaze. We study key challenges including target gaze range, illumination conditions, and facial appearance variation. We show that image resolution and the use of both eyes affect gaze estimation performance, while head pose and pupil centre information are less informative. Finally, we propose GazeNet, the first deep appearance-based gaze estimation method. GazeNet improves on the state of the art by 22% (from a mean error of 13.9 degrees to 10.8 degrees) for the most challenging cross-dataset evaluation.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{zhang15_cvpr,
title = {Appearance-Based Gaze Estimation in the Wild},
author = {Xucong Zhang and Yusuke Sugano and Mario Fritz and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2015/04/zhang_CVPR15.pdf
https://www.youtube.com/watch?v=rw6LZA1USG8
https://perceptual.mpi-inf.mpg.de/research/datasets/#zhang15_cvpr},
doi = {10.1109/CVPR.2015.7299081},
year = {2015},
date = {2015-03-02},
booktitle = {Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015)},
pages = {4511-4520},
abstract = {Appearance-based gaze estimation is believed to work well in real-world settings but existing datasets were collected under controlled laboratory conditions and methods were not evaluated across multiple datasets. In this work we study appearance-based gaze estimation in the wild. We present the MPIIGaze dataset that contains 213,659 images we collected from 15 participants during natural everyday laptop use over more than three months. Our dataset is significantly more variable than existing datasets with respect to appearance and illumination. We also present a method for in-the-wild appearance-based gaze estimation using multimodal convolutional neural networks, which significantly outperforms state-of-the-art methods in the most challenging cross-dataset evaluation setting. We present an extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including our own. This evaluation provides clear insights and allows us to identify key research challenges of gaze estimation in the wild.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Prediction of Search Targets From Fixations in Open-World Settings
We recorded fixation data from 18 participants (nine male) of different nationalities, aged between 18 and 30 years. Nine participants had impaired eyesight that was corrected with contact lenses or glasses.
To record gaze data we used a stationary Tobii TX300 eye tracker that provides binocular gaze data at a sampling frequency of 300Hz. Parameters for fixation detection were left at their defaults: the minimum fixation duration was set to 60ms and the maximum time between fixations to 75ms. The stimuli were shown on a 30-inch screen with a resolution of 2560×1600 pixels. Participants were randomly assigned to search for targets from one of the three stimulus types.
The dataset contains three categories: “Amazon”, “O’Reilly”, and “Mugshots”. For each category there is a folder with four subfolders: search targets, Collages, Gaze data, and the binary masks we used to obtain the position of each individual image in the collages.
In the search targets subfolder you can find the five single target images that the participants were looking for in the collages.
The Collages folder contains five subfolders, one named after each search target; each subfolder holds the 20 collages shown to participants searching for that target.
In the Gaze data folder you can find, for each recording, the media name, fixation order, fixation position on the screen, and pupil size for the left and right eye.
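To make the layout above concrete, the following Python sketch walks one category folder and lists the collages per search target. The root path, the exact folder spellings (“search targets”, “Collages”, “Gaze data”), and the file extensions are assumptions based on the description above, not guaranteed to match the shipped files exactly:

import os
import glob

DATASET_ROOT = "SearchTargets"   # hypothetical path of the extracted dataset
category = "Amazon"              # one of the three categories

cat_dir = os.path.join(DATASET_ROOT, category)
for target_path in sorted(glob.glob(os.path.join(cat_dir, "search targets", "*"))):
    target = os.path.splitext(os.path.basename(target_path))[0]
    # collages shown to participants who searched for this target (20 per target)
    collages = sorted(glob.glob(os.path.join(cat_dir, "Collages", target, "*")))
    print(target, len(collages), "collages")

# gaze files contain: media name, fixation order, fixation position, pupil size (L/R)
gaze_files = sorted(glob.glob(os.path.join(cat_dir, "Gaze data", "*")))
print(len(gaze_files), "gaze recordings")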
Download: Please download the full dataset here (374.9 MB). Contact: Hosnieh Sattar Campus E1.4, room 608, E-mail:
The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:
Hosnieh Sattar; Sabine Müller; Mario Fritz; Andreas Bulling
@inproceedings{sattar15_cvpr,
title = {Prediction of Search Targets From Fixations in Open-World Settings},
author = {Hosnieh Sattar and Sabine Müller and Mario Fritz and Andreas Bulling},
url = {https://perceptual.mpi-inf.mpg.de/files/2015/04/sattar15_cvpr.pdf
https://perceptual.mpi-inf.mpg.de/research/datasets/#sattar15_cvpr},
doi = {10.1109/CVPR.2015.7298700},
year = {2015},
date = {2015-03-02},
booktitle = {Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015)},
pages = {981-990},
abstract = {Previous work on predicting the target of visual search from human fixations only considered closed-world settings in which training labels are available and predictions are performed for a known set of potential targets. In this work we go beyond the state of the art by studying search target prediction in an open-world setting in which we no longer assume that we have fixation data to train for the search targets. We present a dataset containing fixation data of 18 users searching for natural images from three image categories within synthesised image collages of about 80 images. In a closed-world baseline experiment we show that we can predict the correct target image out of a candidate set of five images. We then present a new problem formulation for search target prediction in the open-world setting that is based on learning compatibilities between fixations and potential targets.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Recognition of Visual Memory Recall Processes Using Eye Movement Analysis
This dataset was recorded to investigate the feasibility of recognising visual memory recall from eye movements. Eye movement data was recorded of participants looking at familiar and unfamiliar pictures from four picture categories: abstract, landscapes, faces, and buildings. The study was designed with two objectives in mind: (1) to elicit distinct eye movements by using a large screen and well-defined visual stimuli, and (2) to record natural visual behaviour without any active visual search or memory task by not asking participants for real-time feedback.
The dataset has the following characteristics:
~7 hours of eye movement data recorded using a wearable Electrooculography (EOG) system
7 participants (3 female, 4 male), aged between 25 and 29 years
one experimental run for each participant, in which they looked at four continuous, random sequences of pictures (exposure time of 10s per picture). Within each sequence, 12 pictures were presented only once; five others were presented four times at regular intervals. In between exposures, a picture with Gaussian noise was shown for five seconds as a baseline measurement (see the sketch after this list).
separate horizontal and vertical EOG channels, joint sampling frequency of 128Hz
fully ground truth annotated for picture type (repeated, non-repeated) and picture category
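Given the timing above (10s picture exposures separated by 5s noise baselines at a joint 128Hz sampling rate), a recording can be cut into per-picture windows by sample index alone. The sketch below is a minimal illustration that assumes the two EOG channels have already been loaded into a (samples, 2) NumPy array and that exposures follow each other back to back; the actual file layout of the download may differ:

import numpy as np

FS = 128                    # joint EOG sampling frequency (Hz)
PIC_N = 10 * FS             # samples per 10s picture exposure
NOISE_N = 5 * FS            # samples per 5s Gaussian-noise baseline

def cut_exposures(eog, n_pictures, start_sample=0):
    # eog: (samples, 2) array with horizontal and vertical EOG channels
    windows = []
    pos = start_sample
    for _ in range(n_pictures):
        windows.append(eog[pos:pos + PIC_N])
        pos += PIC_N + NOISE_N
    return windows

# one sequence contains 12 once-shown pictures plus 5 pictures shown four times
# = 32 exposures; here applied to a synthetic recording of matching length
eog = np.random.randn(32 * (PIC_N + NOISE_N), 2)
windows = cut_exposures(eog, n_pictures=32)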
Download: Please download the full dataset here (25.3 MB). Contact: Andreas Bulling, Campus E1.4, room 628, E-mail:
If you use this dataset in your work, please cite:
@inproceedings{bulling11_ubicomp,
title = {Recognition of Visual Memory Recall Processes Using Eye Movement Analysis},
author = {Andreas Bulling and Daniel Roggen},
url = {https://perceptual.mpi-inf.mpg.de/files/2013/03/bulling11_ubicomp.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Proc. of the 13th International Conference on Ubiquitous Computing (UbiComp 2011)},
pages = {455-464},
abstract = {Physical activity, location, as well as a person's psychophysiological and affective state are common dimensions for developing context-aware systems in ubiquitous computing. An important yet missing contextual dimension is the cognitive context that comprises all aspects related to mental information processing, such as perception, memory, knowledge, or learning. In this work we investigate the feasibility of recognising visual memory recall. We use a recognition methodology that combines minimum redundancy maximum relevance feature selection (mRMR) with a support vector machine (SVM) classifier. We validate the methodology in a dual user study with a total of fourteen participants looking at familiar and unfamiliar pictures from four picture categories: abstract, landscapes, faces, and buildings. Using person-independent training, we are able to discriminate between familiar and unfamiliar abstract pictures with a top recognition rate of 84.3% (89.3% recall, 21.0% false positive rate) over all participants. We show that eye movement analysis is a promising approach to infer the cognitive context of a person and discuss the key challenges for the real-world implementation of eye-based cognition-aware systems.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Eye Movement Analysis for Activity Recognition Using Electrooculography
This dataset was recorded to investigate the problem of recognising common office activities from eye movements. The experimental scenario involved five office-based activities – copying a text, reading a printed paper, taking handwritten notes, watching a video, and browsing the Web – and periods during which participants took a rest (the NULL class).
The dataset has the following characteristics:
~8 hours of eye movement data recorded using a wearable Electrooculography (EOG) system
8 participants (2 female, 6 male), aged between 23 and 31 years
2 experimental runs for each participant, each comprising the five office activities in random order plus a period of rest
separate horizontal and vertical EOG channels, joint sampling frequency of 128Hz
fully ground truth annotated (5 activity classes plus NULL)
Download: Please download the full dataset here (20.9 MB). Contact: Andreas Bulling, Campus E1.4, room 628, E-mail:
If you use this dataset in your work, please cite:
Andreas Bulling; Jamie A. Ward; Hans Gellersen; Gerhard Tröster
@article{bulling11_pami,
title = {Eye Movement Analysis for Activity Recognition Using Electrooculography},
author = {Andreas Bulling and Jamie A. Ward and Hans Gellersen and Gerhard Tröster},
url = {https://perceptual.mpi-inf.mpg.de/files/2013/03/bulling11_pami.pdf
http://doi.ieeecomputersociety.org/10.1109/TPAMI.2010.86},
year = {2011},
date = {2011-01-01},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume = {33},
number = {4},
pages = {741-753},
abstract = {In this work we investigate eye movement analysis as a new sensing modality for activity recognition. Eye movement data was recorded using an electrooculography (EOG) system. We first describe and evaluate algorithms for detecting three eye movement characteristics from EOG signals - saccades, fixations, and blinks - and propose a method for assessing repetitive patterns of eye movements. We then devise 90 different features based on these characteristics and select a subset of them using minimum redundancy maximum relevance feature selection (mRMR). We validate the method using an eight participant study in an office environment using an example set of five activity classes: copying a text, reading a printed paper, taking hand-written notes, watching a video, and browsing the web. We also include periods with no specific activity (the NULL class). Using a support vector machine (SVM) classifier and a person-independent (leave-one-out) training scheme, we obtain an average precision of 76.1% and recall of 70.5% over all classes and participants. The work demonstrates the promise of eye-based activity recognition (EAR) and opens up discussion on the wider applicability of EAR to other activities that are difficult, or even impossible, to detect using common sensing modalities.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
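The evaluation described in the abstract above is person-independent: the classifier is trained on all but one participant and tested on the held-out one. Below is a minimal leave-one-participant-out sketch using scikit-learn with a plain SVM over pre-computed eye movement features; the synthetic arrays stand in for the real features and labels, and the mRMR feature selection step of the original method is omitted:

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
X = rng.standard_normal((800, 90))      # 90 eye movement features per window (placeholder)
y = rng.integers(0, 6, size=800)        # 5 activity classes plus NULL (placeholder)
groups = rng.integers(0, 8, size=800)   # participant id per window (8 participants)

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
precisions, recalls = [], []
for train, test in LeaveOneGroupOut().split(X, y, groups):
    clf.fit(X[train], y[train])
    pred = clf.predict(X[test])
    precisions.append(precision_score(y[test], pred, average="macro", zero_division=0))
    recalls.append(recall_score(y[test], pred, average="macro", zero_division=0))

print(f"precision {np.mean(precisions):.3f}, recall {np.mean(recalls):.3f}")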
@inproceedings{bulling09_ubicomp,
title = {Eye Movement Analysis for Activity Recognition},
author = {Andreas Bulling and Jamie A. Ward and Hans Gellersen and Gerhard Tröster},
url = {https://perceptual.mpi-inf.mpg.de/files/2013/03/bulling09_ubicomp.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proc. of the 11th International Conference on Ubiquitous Computing (UbiComp 2009)},
pages = {41--50},
abstract = {In this work we investigate eye movement analysis as a new modality for recognising human activity. We devise 90 different features based on the main eye movement characteristics: saccades, fixations and blinks. The features are derived from eye movement data recorded using a wearable electrooculographic (EOG) system. We describe a recognition methodology that combines minimum redundancy maximum relevance feature selection (mRMR) with a support vector machine (SVM) classifier. We validate the method in an eight participant study in an office environment using five activity classes: copying a text, reading a printed paper, taking hand-written notes, watching a video and browsing the web. In addition, we include periods with no specific activity. Using a person-independent (leave-one-out) training scheme, we obtain an average precision of 76.1% and recall of 70.5% over all classes and participants. We discuss the most relevant features and show that eye movement analysis is a rich and thus promising modality for activity recognition.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Robust Recognition of Reading Activity in Transit Using Wearable Electrooculography
This dataset was recorded to investigate the problem of recognising reading activity from eye movements. The experimental setup was designed with two main objectives in mind: (1) to record eye movements in an unobtrusive manner in a mobile real-world setting, and (2) to evaluate how well reading can be recognised for persons in transit. We defined a scenario of travelling to and from work containing a semi-naturalistic set of reading activities. It involved subjects reading freely chosen text without pictures while engaged in a sequence of activities such as sitting at a desk, walking along a corridor, walking along a street, waiting at a tram stop and riding a tram.
The dataset has the following characteristics:
~6 hours of eye movement data recorded using a wearable Electrooculography (EOG) system
8 participants (4 female, 4 male), aged between 23 and 35 years
4 experimental runs for each participant: calibration (walking around a circular corridor for approximately 2 minutes while reading continuously), baseline (walk and tram ride to and from work without any reading), two runs of reading in the same scenario
separate horizontal and vertical EOG channels, joint sampling frequency of 128Hz
fully ground truth annotated (reading vs. not reading) using a wireless Wii Remote controller
Download: Please download the full dataset here (20.2 MB). Contact: Andreas Bulling, Campus E1.4, room 628, E-mail:
If you use this dataset in your work, please cite:
@article{bulling12_tap,
title = {Multimodal Recognition of Reading Activity in Transit Using Body-Worn Sensors},
author = {Andreas Bulling and Jamie A. Ward and Hans Gellersen},
url = {https://perceptual.mpi-inf.mpg.de/files/2013/03/bulling12_tap.pdf},
doi = {10.1145/2134203.2134205},
year = {2012},
date = {2012-01-01},
journal = {ACM Transactions on Applied Perception},
volume = {9},
number = {1},
pages = {2:1--2:21},
abstract = {Reading is one of the most well studied visual activities. Vision research traditionally focuses on understanding the perceptual and cognitive processes involved in reading. In this work we recognise reading activity by jointly analysing eye and head movements of people in an everyday environment. Eye movements are recorded using an electrooculography (EOG) system; body movements using body-worn inertial measurement units. We compare two approaches for continuous recognition of reading: String matching (STR) that explicitly models the characteristic horizontal saccades during reading, and a support vector machine (SVM) that relies on 90 eye movement features extracted from the eye movement data. We evaluate both methods in a study performed with eight participants reading while sitting at a desk, standing, walking indoors and outdoors, and riding a tram. We introduce a method to segment reading activity by exploiting the sensorimotor coordination of eye and head movements during reading. Using person-independent training, we obtain an average precision for recognising reading of 88.9% (recall 72.3%) using STR and of 87.7% (recall 87.9%) using SVM over all participants. We show that the proposed segmentation scheme improves the performance of recognising reading events by more than 24%. Our work demonstrates that the joint analysis of multiple modalities is beneficial for reading recognition and opens up discussion on the wider applicability of this recognition approach to other visual and physical activities.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
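The string matching (STR) approach mentioned in the abstract above exploits the fact that reading left-to-right text produces runs of small rightward saccades terminated by a large leftward return sweep. The toy sketch below only illustrates that idea; the saccade encoding, amplitude threshold, and matching rule are illustrative assumptions, not the published algorithm:

import re

def encode_saccades(amplitudes_deg, directions, sweep_threshold=8.0):
    # encode each saccade as 'r' (rightward), 'l' (small leftward) or 'L' (large leftward return sweep)
    chars = []
    for amp, direction in zip(amplitudes_deg, directions):
        if direction == "right":
            chars.append("r")
        else:
            chars.append("L" if amp > sweep_threshold else "l")
    return "".join(chars)

def looks_like_reading(saccade_string, min_run=4):
    # flag reading if a run of rightward saccades ends in a return sweep
    return re.search(r"r{%d,}L" % min_run, saccade_string) is not None

# "rrrrrLrrrrrL" (two text lines) -> True, "rlrlrlr" (casual scanning) -> False
print(looks_like_reading("rrrrrLrrrrrL"), looks_like_reading("rlrlrlr"))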
@inproceedings{bulling08_pervasive,
title = {Robust Recognition of Reading Activity in Transit Using Wearable Electrooculography},
author = {Andreas Bulling and Jamie A. Ward and Hans Gellersen and Gerhard Tröster},
url = {https://perceptual.mpi-inf.mpg.de/files/2013/03/bulling08_pervasive.pdf},
doi = {10.1007/978-3-540-79576-6_2},
year = {2008},
date = {2008-01-01},
booktitle = {Proc. of the 6th International Conference on Pervasive Computing (Pervasive 2008)},
pages = {19--37},
abstract = {In this work we analyse the eye movements of people in transit in an everyday environment using a wearable electrooculographic (EOG) system. We compare three approaches for continuous recognition of reading activities: a string matching algorithm which exploits typical characteristics of reading signals, such as saccades and fixations; and two variants of Hidden Markov Models (HMMs) - mixed Gaussian and discrete. The recognition algorithms are evaluated in an experiment performed with eight subjects reading freely chosen text without pictures while sitting at a desk, standing, walking indoors and outdoors, and riding a tram. A total dataset of roughly 6 hours was collected with reading activity accounting for about half of the time. We were able to detect reading activities over all subjects with a top recognition rate of 80.2% (71.0% recall, 11.6% false positives) using string matching. We show that EOG is a potentially robust technique for reading recognition across a number of typical daily situations.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}