Comparing humans and deep neural networks on face recognition under various distance and rotation viewing conditions
Arslan , Şuayb Şefik
Groth, Matt J
MetadataShow full item record
CitationFux, M., Arslan, S. S., Jang, H., Boix, X., Cooper, A., Groth, M. J., & Sinha, P. (2023). Comparing Humans and Deep Neural Networks on face recognition under various distance and rotation viewing conditions. Journal of Vision, 23(9), 5916-5916.
Humans possess impressive skills for recognizing faces even when the viewing conditions are challenging, such as long ranges, non-frontal regard, variable lighting, and atmospheric turbulence. We sought to characterize the effects of such viewing conditions on the face recognition performance of humans, and compared the results to those of DNNs. In an online verification task study, we used a 100 identity face database, with images captured at five different distances (2m, 5m, 300m, 650m and 1000m) three pitch values (00 - straight ahead, +/- 30 degrees) and three levels of yaw (00, 45, and 90 degrees). Participants were presented with 175 trials (5 distances x 7 yaw and pitch combinations, with 5 repetitions). Each trial included a query image, from a certain combination of range x yaw x pitch, and five options, all frontal short range (2m) faces. One was of the same identity as the query, and the rest were the most similar identities, chosen according to a DNN-derived similarity matrix. Participants ranked the top three most similar target images to the query image. The collected data reveal the functional relationship between human performance and multiple viewing parameters. Nine state-of-the-art pre-trained DNNs were tested for their face recognition performance on precisely the same stimulus set. Strikingly, DNN performance was significantly diminished by variations in ranges and rotated viewpoints. Even the best-performing network reported below 65% accuracy at the closest distance with a profile view of faces, with results dropping to near chance for longer ranges. The confusion matrices of DNNs were generally consistent across the networks, indicating systematic errors induced by viewing parameters. Taken together, these data not only help characterize human performance as a function of key ecologically important viewing parameters, but also enable a direct comparison of humans and DNNs in this parameter regime