[Paper Review📃] Facial Expression Recognition in the Wild via Deep Attentive Center Loss

-DACL-

Paper😙

While studying, I searched for FER work that uses center loss. This model is a SOTA model, ranking 6th on RAF-DB and 5th on AffectNet as of today (09 March 2022).

Intro

  • In FER, the softmax loss has been widely used. However, softmax loss is incapable of yielding discriminative features in wild scenarios.

  • Deep Metric Learning (DML) approaches constrain the embedding space to obtain well-discriminated deep features.
  • In a typical DML problem, every dimension of the deep feature contributes equally to the DML objective. DML methods are therefore prone to discriminating redundant and noisy information along with the important information encoded in the deep feature vector, which leads to over-fitting and hinders the generalization ability of the learning algorithm.

➜ The paper designs a modular attention-based DML approach, called Deep Attentive Center Loss (DACL), to selectively discriminate only the relevant information in the embedding space.

[Figure: DACL overview]

DACL extracts attention weights and applies them in the loss computation.


DACL method

[Figure: overall DACL pipeline]

The image above shows the whole pipeline of the proposed model. When an input image passes through the CNN (ResNet-18), the last layer's feature is routed in two different ways.

DACL takes the flattened feature as the input to an attention network. The attention network outputs attention weights, which are multiplied element-wise into the sparse center loss computation.

The same last-layer feature also goes through a pooling layer and is then used in both the sparse center loss and the softmax loss.

The final loss is the sum of the softmax loss and the sparse center loss.
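
Below is a minimal PyTorch sketch of this two-branch routing, assuming a torchvision ResNet-18 trunk. The class name `DACLBackbone`, the 7-class head, and the adaptive average pooling are my assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DACLBackbone(nn.Module):
    """ResNet-18 trunk whose last conv feature is routed two ways:
    flattened -> attention network, pooled -> softmax / sparse center loss."""
    def __init__(self, num_classes=7):  # 7 basic expressions (assumption)
        super().__init__()
        resnet = models.resnet18(weights=None)
        # Keep everything up to (but not including) the final pool and fc.
        self.features = nn.Sequential(*list(resnet.children())[:-2])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(512, num_classes)  # softmax-loss head

    def forward(self, x):
        fmap = self.features(x)              # (B, 512, 7, 7) for 224x224 input
        flat = torch.flatten(fmap, 1)        # flattened feature -> attention net
        pooled = self.pool(fmap).flatten(1)  # (B, 512) deep feature -> losses
        return flat, pooled, self.classifier(pooled)
```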


DACL method

- Context Encoder (CE) Unit

[Figure: Context Encoder (CE) unit]

The three fully connected layers can be notated mathematically as…

Since the CE unit is composed of FC layers, it can extract the significant features well. Its final output $e_i$ is a latent representation vector.
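
As a concrete sketch, the CE unit could be three stacked FC layers mapping the flattened feature $x_i$ to the latent vector $e_i$; the hidden and latent sizes and the ReLU non-linearities here are illustrative assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Three FC layers mapping the flattened feature x_i to a latent vector e_i."""
    def __init__(self, in_dim, hidden_dim=1024, latent_dim=64):  # sizes assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, latent_dim),  # latent representation e_i
        )

    def forward(self, x):
        return self.net(x)
```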


DACL method

- Multi-head binary classification

[Figure: multi-head binary classification]


The attention value $a_{ij}$ eventually saturates to values between 0 and 1.
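
A sketch of this stage: one binary head per deep-feature dimension reads the shared latent vector $e_i$, and a sigmoid keeps each $a_{ij}$ between 0 and 1. Collapsing all heads into a single `nn.Linear` is my simplification.

```python
import torch
import torch.nn as nn

class MultiHeadBinaryAttention(nn.Module):
    """One binary head per deep-feature dimension, sharing the latent input e_i."""
    def __init__(self, latent_dim, feat_dim):
        super().__init__()
        self.heads = nn.Linear(latent_dim, feat_dim)  # one logit per dimension

    def forward(self, e):
        # Sigmoid squashes each logit so a_ij saturates between 0 and 1.
        return torch.sigmoid(self.heads(e))          # (B, feat_dim)
```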


DACL method

- sparse center loss

[Figure: sparse center loss formulation]

As shown, the squared difference between the feature and its class center is multiplied element-wise by the attention weights, so each dimension $j$ of sample $i$ contributes $a_{ij}(x_{ij} - c_{y_i,j})^2$ to the loss.
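
A minimal module implementing that attention-weighted center loss. Keeping the centers in a non-trainable buffer reflects the moving-average update described in the training section below; the random initialization is an assumption.

```python
import torch
import torch.nn as nn

class SparseCenterLoss(nn.Module):
    """Center loss whose per-dimension squared terms are weighted by a_ij."""
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        # Centers are updated by a moving average, not by backprop.
        self.register_buffer("centers", torch.randn(num_classes, feat_dim))

    def forward(self, feats, attn, labels):
        diff = feats - self.centers[labels]          # x_i - c_{y_i}, shape (B, d)
        # Element-wise attention weighting of the squared differences.
        return 0.5 * (attn * diff.pow(2)).sum(dim=1).mean()
```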


Training DACL

  1. Sparse center loss is jointly optimized with softmax loss: $L = L_S + \lambda L_{SC}$

  2. Sparse center loss contributes to the gradients with respect to the deep features and their corresponding attention weights.

  3. Centers are updated using a moving average strategy (see the training-step sketch below).
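
Putting the three points together, here is a minimal training step wiring up the hypothetical modules sketched above; the values of $\lambda$ and the center update rate $\alpha$ are illustrative, not the paper's.

```python
import torch

ce_loss = torch.nn.CrossEntropyLoss()  # the softmax loss L_S
lam, alpha = 0.01, 0.5                 # illustrative lambda and center update rate

def train_step(model, ce_unit, attn_heads, sc_loss, optimizer, images, labels):
    # optimizer is assumed to cover model, ce_unit and attn_heads parameters.
    flat, pooled, logits = model(images)
    attn = attn_heads(ce_unit(flat))                 # attention weights a_i
    loss = ce_loss(logits, labels) + lam * sc_loss(pooled, attn, labels)

    optimizer.zero_grad()
    loss.backward()  # gradients flow to the deep features and attention weights
    optimizer.step()

    # Moving-average update of the class centers, as in the original center loss.
    with torch.no_grad():
        for c in labels.unique():
            delta = sc_loss.centers[c] - pooled[labels == c].mean(dim=0)
            sc_loss.centers[c] -= alpha * delta
    return loss.item()
```

For 224×224 inputs the flattened ResNet-18 feature has 512 × 7 × 7 = 25088 dimensions, so the pieces would be wired as, e.g., `ContextEncoder(25088)`, `MultiHeadBinaryAttention(64, 512)`, and `SparseCenterLoss(7, 512)`.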


Experiment

RAF-DB

[Figure: RAF-DB results]


AffectNet

[Figure: AffectNet results]

Attention weights visualization
