MarioNETte: Few-shot Face Reenactment
Preserving Identity of Unseen Targets

Sungjoo Ha*, Martin Kersner*, Beomsu Kim*, Seokjun Seo*, Dongyoung Kim†

Hyperconnect
Seoul, Republic of Korea
In AAAI 2020

Paper (arXiv)

* Equal contributions, listed in alphabetical order. † Corresponding author.

Abstract

When there is a mismatch between the target identity and the driver identity, face reenactment suffers severe degradation in the quality of the result, especially in a few-shot setting. The identity preservation problem, where the model loses the detailed information of the target, leading to a defective output, is the most common failure mode. The problem has several potential sources, such as the identity of the driver leaking due to the identity mismatch, or dealing with unseen large poses. To overcome such problems, we introduce three components: image attention block, target feature alignment, and landmark transformer. Through attending and warping the relevant features, the proposed architecture, called MarioNETte, produces high-quality reenactments of unseen identities in a few-shot setting. In addition, the landmark transformer dramatically alleviates the identity preservation problem by isolating the expression geometry through landmark disentanglement. Comprehensive experiments are performed to verify that the proposed framework can generate highly realistic faces, outperforming all other baselines, even under a significant mismatch of facial characteristics between the target and the driver.

Demo video

One-shot reenactment examples

The first row's image is used as a one-shot target image, and the leftmost image is provided as a driver image.

MarioNETte-LT

MarioNETte

Methods

Our model reenacts the faces of unseen targets in a few-shot manner, with a particular focus on preserving the target identity. The model does not require any fine-tuning procedure, so a single model can be deployed to reenact arbitrary identities. We adopt three novel components in our model:

Image attention block, for efficiently blending relevant style information from multiple target images.
Target feature alignment, which enables the model to inject fine-grained style information from the target images into the generated image.
Landmark transformer, for adjusting the structural differences of two identities' landmarks by disentangling facial landmarks into the identity geometry and the expression geometry.
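
The landmark disentanglement idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes, purely for illustration, an additive decomposition in which each landmark set is the sum of a global mean, an identity geometry term, and an expression geometry term, and it estimates the identity term as the per-identity mean over several frames. All names (`mean_landmarks`, `retarget`, `global_mean`) are hypothetical, and how the identity term would be estimated from only a few target images is outside the scope of this sketch.

```python
# Illustrative sketch only: assumes landmarks = global mean + identity + expression.
# This additive model and every name below are assumptions, not the paper's code.
from statistics import fmean
from typing import List, Tuple

Landmarks = List[Tuple[float, float]]  # one (x, y) pair per facial landmark


def mean_landmarks(frames: List[Landmarks]) -> Landmarks:
    """Average each landmark coordinate over a list of frames."""
    n_points = len(frames[0])
    return [
        (fmean(f[i][0] for f in frames), fmean(f[i][1] for f in frames))
        for i in range(n_points)
    ]


def subtract(a: Landmarks, b: Landmarks) -> Landmarks:
    """Point-wise difference a - b."""
    return [(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)]


def add(a: Landmarks, b: Landmarks) -> Landmarks:
    """Point-wise sum a + b."""
    return [(ax + bx, ay + by) for (ax, ay), (bx, by) in zip(a, b)]


def retarget(
    driver_frame: Landmarks,
    driver_frames: List[Landmarks],
    target_frames: List[Landmarks],
    global_mean: Landmarks,
) -> Landmarks:
    """Combine the target's identity geometry with the driver's expression."""
    # Identity geometry (assumed): per-identity mean minus the global mean.
    driver_id = subtract(mean_landmarks(driver_frames), global_mean)
    target_id = subtract(mean_landmarks(target_frames), global_mean)
    # Expression geometry (assumed): what is left of the driver frame after
    # removing the global mean and the driver's identity geometry.
    driver_exp = subtract(subtract(driver_frame, global_mean), driver_id)
    # Retargeted landmarks: global mean + target identity + driver expression.
    return add(add(global_mean, target_id), driver_exp)
```

Under these assumptions, the retargeted landmarks keep the target's facial structure while following the driver's expression, which is the intuition behind reducing identity leakage from the driver.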

Citation

To refer to our work, please cite our paper as follows:

@inproceedings{MarioNETte:AAAI2020,
    author = {Sungjoo Ha and Martin Kersner and Beomsu Kim and Seokjun Seo and Dongyoung Kim},
    title = {MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets},
    booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
    year = {2020}
}