3 papers at CHI 2018
We will present the following three papers at the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI 2018), two with our colleagues at LMU Munich:
Mohamed Khamis, Anita Baier, Niels Henze, Florian Alt, Andreas Bulling. **Understanding Face and Eye Visibility in Front-Facing Cameras of Smartphones used in the Wild.** In Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 280:1–280:12, 2018. DOI: [10.1145/3173574.3173854](https://doi.org/10.1145/3173574.3173854) | [PDF](https://perceptual.mpi-inf.mpg.de/files/2018/01/khamis18a_chi.pdf) | [Video](https://www.youtube.com/watch?v=_L6FyzTjFG0)

Abstract: Commodity mobile devices are now equipped with high-resolution front-facing cameras, paving the way for applications in biometrics, facial expression analysis, or gaze interaction. However, it is unknown how often users hold devices in a way that allows capturing their face or eyes, and how this impacts detection accuracy. We collected 25,726 in-the-wild photos taken from the front-facing camera of smartphones and associated application usage logs. We found that the full face is visible about 29% of the time, and that in most cases the face is only partially visible. We further identified an influence of users' current activity; for example, when watching videos, the eyes but not the entire face are visible 75% of the time in our dataset. We found that state-of-the-art face detection algorithms perform poorly on photos taken from front-facing cameras. We discuss how these findings impact mobile applications that leverage face and eye detection, and derive practical implications for addressing the limitations of the state of the art.
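As a rough illustration of the kind of check the paper evaluates, the sketch below runs OpenCV's bundled Haar-cascade face and eye detectors on a single front-facing camera photo. This is not the authors' pipeline; the image path and detector parameters are illustrative assumptions.

```python
# Minimal sketch: check face and eye visibility in one front-facing camera
# photo using OpenCV's bundled Haar cascades. Not the authors' pipeline;
# the image path and thresholds are illustrative assumptions.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def visibility(image_path):
    """Return whether a face and/or eyes are detected in the photo."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return {"face_visible": len(faces) > 0, "eyes_visible": len(eyes) > 0}

print(visibility("selfie.jpg"))  # hypothetical example image
```

On photos where the face is only partially in frame, detectors like these often miss the face while the eyes remain detectable, which is exactly the situation the paper quantifies.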
Mohamed Khamis, Christian Becker, Andreas Bulling, Florian Alt. **Which one is me? Identifying Oneself on Public Displays.** In Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 287:1–287:12, 2018. Best paper honourable mention award. DOI: [10.1145/3173574.3173861](https://doi.org/10.1145/3173574.3173861) | [PDF](https://perceptual.mpi-inf.mpg.de/files/2018/01/khamis18b_chi.pdf) | [Video](https://www.youtube.com/watch?v=yG5_RBrnRx0)

Abstract: While user representations are extensively used on public displays, it remains unclear how well users can recognize their own representation among those of surrounding users. We study the most widely used representations: abstract objects, skeletons, silhouettes, and mirrors. In a prestudy (N=12), we identify five strategies that users follow to recognize themselves on public displays. In a second study (N=19), we quantify users' recognition time and accuracy for each representation type. Our findings suggest a significant effect of (1) the representation type, (2) the strategies performed by users, and (3) the combination of both on recognition time and accuracy. We discuss the suitability of each representation for different settings and provide specific recommendations on how user representations should be applied in multi-user scenarios. These recommendations guide practitioners and researchers in selecting the representation that best fits the deployment's requirements and the user strategies that are feasible in that environment.
Xucong Zhang, Michael Xuelin Huang, Yusuke Sugano, Andreas Bulling. **Training Person-Specific Gaze Estimators from Interactions with Multiple Devices.** In Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 624:1–624:12, 2018. DOI: [10.1145/3173574.3174198](https://doi.org/10.1145/3173574.3174198) | [PDF](https://perceptual.mpi-inf.mpg.de/files/2018/02/zhang18_chi.pdf)

Abstract: Learning-based gaze estimation has significant potential to enable attentive user interfaces and gaze-based interaction on the billions of camera-equipped handheld devices and ambient displays. While training accurate person- and device-independent gaze estimators remains challenging, person-specific training is feasible but requires tedious data collection for each target device. To address these limitations, we present the first method to train person-specific gaze estimators across multiple devices. At the core of our method is a single convolutional neural network with shared feature extraction layers and device-specific branches that we train from face images and corresponding on-screen gaze locations. Detailed evaluations on a new dataset of interactions with five common devices (mobile phone, tablet, laptop, desktop computer, smart TV) and three common applications (mobile game, text editing, media center) demonstrate the significant potential of cross-device training. We further explore training with gaze locations derived from natural interactions, such as mouse or touch input.
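The shared-backbone-with-device-specific-branches idea can be sketched in a few lines of PyTorch. The layer sizes, device names, and 96x96 input resolution below are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch of the cross-device idea described above: one shared
# convolutional feature extractor plus one regression branch per device,
# each predicting a 2D on-screen gaze location. Layer sizes, the device
# list, and input resolution are illustrative assumptions.
import torch
import torch.nn as nn

DEVICES = ["phone", "tablet", "laptop", "desktop", "tv"]

class CrossDeviceGazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared feature extraction layers over face images (3x96x96 assumed).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One device-specific branch per target device, mapping shared
        # features to a gaze location in that device's screen coordinates.
        self.branches = nn.ModuleDict({d: nn.Linear(64, 2) for d in DEVICES})

    def forward(self, face_image, device):
        features = self.backbone(face_image)
        return self.branches[device](features)

# Usage: batches from different devices update the shared backbone jointly,
# but only the branch of the device they came from.
model = CrossDeviceGazeNet()
pred = model(torch.randn(8, 3, 96, 96), "phone")  # -> (8, 2) gaze locations
```

The appeal of this structure is that person-specific data collected on one device also improves the shared feature layers used by every other device, which is what makes cross-device training worthwhile.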