Few-shot Compositional Font Generation with Dual Memory

Fig. 1: Few-shot font generation results. While previous few-shot font generation methods (AGIS, FUNIT, and EMD) fail to generate unseen fonts, our model successfully transfers the font style and details.

Abstract

Preliminary: Complete Compositional Scripts

Fig. 2: Examples of the compositionality of Korean script. Even if we choose the same sub-glyph, e.g., “ㄱ”, the shape and position of each sub-glyph vary depending on the combination, as shown in the red boxes.
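The compositionality above is mechanical for Korean: every precomposed Hangul syllable in U+AC00..U+D7A3 encodes its (initial, medial, final) sub-glyph indices arithmetically. The sketch below illustrates this standard Unicode decomposition rule; it is an illustration of the script's structure, not code from the paper.

```python
# Standard Unicode Hangul decomposition: syllable index = cho*588 + jung*28 + jong.
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19 initials
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")        # 21 medials
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 finals + empty

def decompose(syllable):
    """Return the sub-glyph (component) labels of one Hangul syllable."""
    idx = ord(syllable) - 0xAC00
    assert 0 <= idx <= 11171, "not a precomposed Hangul syllable"
    cho, rem = divmod(idx, 21 * 28)
    jung, jong = divmod(rem, 28)
    comps = [CHO[cho], JUNG[jung]]
    if jong:  # the final consonant slot may be empty
        comps.append(JONG[jong])
    return comps
```

For example, `decompose("한")` yields the three components ㅎ, ㅏ, ㄴ, while `decompose("가")` yields only ㄱ and ㅏ.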

Dual Memory-augmented Font Generation Network

Architecture overview

Encoder

The encoder extracts the component-wise features and stores them in the dynamic memory using the component label û^i_c and the style label ŷ_s.

dynamic memory
persistent memory

Note that DM simply stores and retrieves the encoded features, while PM is a learned embedding trained from the data. Therefore, DM adapts to the reference input style samples, while PM is fixed after training.
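The contrast between the two memories can be sketched as follows; the class and method names are illustrative, not DM-Font's actual implementation, and the PM table here is randomly initialized rather than trained:

```python
import numpy as np

class DynamicMemory:
    """Sketch of DM: written at run time from encoded reference glyphs,
    keyed by (style label, component label), so it adapts to unseen styles."""
    def __init__(self):
        self.slots = {}

    def write(self, style, comp, feat):
        self.slots[(style, comp)] = feat

    def read(self, style, comp):
        return self.slots[(style, comp)]

class PersistentMemory:
    """Sketch of PM: a style-independent embedding table with one vector
    per component label, learned during training and frozen afterwards."""
    def __init__(self, n_components, dim, seed=0):
        self.table = np.random.default_rng(seed).normal(size=(n_components, dim))

    def read(self, comp_id):
        return self.table[comp_id]
```

The key design difference is visible in the signatures: DM reads require a style label (reference-dependent), while PM reads depend only on the component identity.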

Memory addressor


Decoder

The memory addressor loads the component features using the character label y_c and feeds them to the decoder.
In the decoding stage, the decoder Dec generates a target glyph for the target character y_c in the reference style y_s, using the component-wise features stored in the dynamic memory DM and the persistent memory PM.
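A minimal sketch of this lookup step is below; the function name and the plain-callable interface are illustrative assumptions, whereas in the model the reads are differentiable feature fetches feeding the decoder:

```python
def address_and_fetch(char_comps, style, dm_read, pm_read):
    """For each component of the target character, pair the style-specific
    feature from DM with the learned component embedding from PM."""
    return [(dm_read(style, c), pm_read(c)) for c in char_comps]
```

For example, with toy dictionary-backed lookups, `address_and_fetch(["ㄱ", "ㅏ"], "A", ...)` returns one (DM feature, PM embedding) pair per component, which the decoder then consumes to render the glyph.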

discriminator
component classifier

We further use a component classifier Cls to ensure that the model fully utilizes the compositionality.

compositional generator

Moreover, we introduce global-context awareness and local-style preservation to the generator, which we call the compositional generator.

DM-Font learns the compositionality in a weakly-supervised manner; it does not require exact component locations, e.g., component-wise bounding boxes, but only component labels. Hence, DM-Font is not restricted to font generation, but can be applied to any generation task with compositionality, e.g., attribute-conditioned generation tasks.

Experiments

Pixel-level evaluation metrics assess the structural similarity between the ground-truth image and the generated image. We employ the structural similarity index (SSIM) and the multi-scale structural similarity index (MS-SSIM).
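The SSIM formula underlying these metrics can be sketched with global (whole-image) statistics; note this is a simplified illustration, since standard implementations such as scikit-image compute SSIM over local sliding windows and average the resulting map:

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two images with values in [0, data_range]."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2  # stabilizers
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

By construction the score is 1 for identical images and decreases as luminance, contrast, or structure diverge; MS-SSIM extends this by evaluating the comparison at multiple image scales.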

We report the top-1 accuracy, perceptual distance (PD), and mean FID (mFID) using the classifiers. PD is computed as the L2 distance between the classifier features of the generated glyph and the ground-truth glyph, and mFID is a conditional FID [16] obtained by averaging the FID over the target classes.
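These two aggregate metrics reduce to simple computations once the classifier features and per-class FID scores are available; the sketch below assumes both are given (function names are illustrative):

```python
import numpy as np

def perceptual_distance(feat_gen, feat_gt):
    """PD: L2 distance between classifier features of a generated glyph
    and the corresponding ground-truth glyph."""
    return float(np.linalg.norm(np.asarray(feat_gen) - np.asarray(feat_gt)))

def mean_fid(fid_per_class):
    """mFID: conditional FID, averaging per-target-class FID scores.
    `fid_per_class` maps each target class label to its FID value."""
    return float(np.mean(list(fid_per_class.values())))
```

Lower is better for both: PD is zero only when the feature representations match exactly, and mFID weights every target class equally regardless of its sample count.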