Towards Explainable Ear Recognition Systems Using Deep Residual Networks

Hammam Alshazly*, Christoph Linse, Erhardt Barth, Sahar Ahmed Idris, Thomas Martinetz

*Corresponding author for this work
4 Citations (Scopus)


This paper presents ear recognition models constructed with Deep Residual Networks (ResNet) of various depths. Due to relatively limited amounts of ear images we propose three different transfer learning strategies to address the ear recognition problem. This is done either through utilizing the ResNet architectures as feature extractors or through employing end-to-end system designs. First, we use pretrained models trained on specific visual recognition tasks, inititalize the network weights and train the fully-connected layer on the ear recognition task. Second, we fine-tune entire pretrained models on the training part of each ear dataset. Third, we utilize the output of the penultimate layer of the fine-tuned ResNet models as feature extractors to feed SVM classifiers. Finally, we build ensembles of networks with various depths to enhance the overall system performance. Extensive experiments are conducted to evaluate the obtained models using ear images acquired under constrained and unconstrained imaging conditions from the AMI, AMIC, WPUT and AWE ear databases. The best performance is obtained by averaging ensembles of fine-tuned networks achieving recognition accuracy of 99.64%, 98.57%, 81.89%, and 67.25% on the AMI, AMIC, WPUT, and AWE databases, respectively. In order to facilitate the interpretation of the obtained results and explain the performance differences on each ear dataset we apply the powerful Guided Grad-CAM technique, which provides visual explanations to unravel the black-box nature of deep models. The provided visualizations highlight the most relevant and discriminative ear regions exploited by the models to differentiate between individuals. Based on our analysis of the localization maps and visualizations we argue that our models make correct prediction when considering the geometrical structure of the ear shape as a discriminative region even with a mild degree of head rotations and the presence of hair occlusion and accessories. However, severe head movements and low contrast images have a negative impact of the recognition performance.

Original languageEnglish
Article number9526589
JournalIEEE Access
Pages (from-to)122254-122273
Number of pages20
Publication statusPublished - 2021

Research Areas and Centers

  • Centers: Center for Artificial Intelligence Luebeck (ZKIL)
  • Research Area: Intelligent Systems

DFG Research Classification Scheme

  • 409-05 Interactive and Intelligent Systems, Image and Language Processing, Computer Graphics and Visualisation


Dive into the research topics of 'Towards Explainable Ear Recognition Systems Using Deep Residual Networks'. Together they form a unique fingerprint.

Cite this