GAP layers
On Global Average Pooling (GAP) layers, from Network In Network, Lin et al.:
In this paper, we propose another strategy called global average pooling to replace the traditional fully connected layers in CNN. The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer. One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer. Furthermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.
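To make the mechanics concrete, here is a minimal sketch of that kind of GAP classification head, assuming PyTorch; the layer sizes (64 input channels, 10 classes) and the GapHead name are illustrative, not from the paper:

    import torch
    import torch.nn as nn

    class GapHead(nn.Module):
        def __init__(self, in_channels=64, num_classes=10):
            super().__init__()
            # Final conv produces one feature map per class, echoing the mlpconv idea.
            self.class_conv = nn.Conv2d(in_channels, num_classes, kernel_size=1)

        def forward(self, x):
            maps = self.class_conv(x)        # (N, num_classes, H, W)
            logits = maps.mean(dim=(2, 3))   # global average pool: (N, num_classes)
            return logits                    # fed to softmax / cross-entropy

    # Usage on dummy backbone features:
    features = torch.randn(8, 64, 14, 14)
    probs = torch.softmax(GapHead()(features), dim=1)

Note there are no new parameters beyond the final conv: the pooling itself is parameter-free, which is exactly the overfitting argument the authors make.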
Wow! That seems like a pretty powerful technique in a world of computer vision that strives for explainable models. Of particular note: take the feature maps feeding the final GAP layer, push them through a softmax over spatial positions, and you get a probability distribution answering “how useful is this pixel for this class?” (see the sketch below). While there are other techniques out there for this, including the age-old Visualizing and Understanding Convolutional Networks and what powers distill.pub’s The Building Blocks of Interpretability, this does feel like you get something explainable for free.
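A rough sketch of that visualization, again assuming PyTorch and reusing the hypothetical GapHead above (class_confidence_map is my own helper name, not from the paper):

    import torch
    import torch.nn.functional as F

    def class_confidence_map(maps, class_idx, input_size):
        """Turn one pre-GAP feature map into a per-pixel distribution for a class.

        maps: (N, num_classes, H, W) output of the final conv layer, before pooling.
        Returns a (N, H', W') map that sums to 1 per image.
        """
        cam = maps[:, class_idx]                                    # (N, H, W)
        cam = F.interpolate(cam.unsqueeze(1), size=input_size,
                            mode="bilinear", align_corners=False)   # (N, 1, H', W')
        cam = cam.squeeze(1)
        # Softmax over all spatial positions: "how useful is this pixel for this class?"
        n, h, w = cam.shape
        return torch.softmax(cam.reshape(n, -1), dim=1).reshape(n, h, w)

Overlay that map on the input image and you have a heatmap of where the network looked for a given class, with no extra training step.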