SMILE: Sequence-to-Sequence Domain Adaption with Minimizing Latent Entropy for Text Image Recognition.
Yen-Cheng Chang, Yi-Chang Chen, Yu-Chuan Chang, and Yi-Ren Yeh, ICIP, 2022. [pdf]
Due to the sequential-labeling nature of OCR, we proposed an unsupervised domain adaptation (UDA) method that minimizes latent entropy of sequence-to-sequence attention-based models with class-balanced self-paced learning.
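A minimal sketch of the latent-entropy term on the decoder outputs (tensor shapes and the function name `latent_entropy_loss` are assumptions for illustration; the class-balanced self-paced selection of target samples is omitted):

```python
import torch
import torch.nn.functional as F

def latent_entropy_loss(decoder_logits: torch.Tensor) -> torch.Tensor:
    """decoder_logits: (batch, seq_len, num_classes) logits on unlabeled target images."""
    log_probs = F.log_softmax(decoder_logits, dim=-1)
    probs = log_probs.exp()
    # Shannon entropy at each decoding step, averaged over steps and batch;
    # minimizing it pushes the attention decoder toward confident predictions
    # on the unlabeled target domain.
    step_entropy = -(probs * log_probs).sum(dim=-1)  # (batch, seq_len)
    return step_entropy.mean()
```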
g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin.
Yi-Chang Chen, Yu-Chuan Chang, Yen-Cheng Chang, and Yi-Ren Yeh, INTERSPEECH, 2022. [pdf]
The proposed method applies learnable softmax weights to condition the outputs of BERT on the polyphonic character of interest and its POS tag, addressing the polyphone disambiguation problem.
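A rough sketch of how such a conditional weighted softmax could be structured (the module and argument names, e.g. `ConditionalWeightedSoftmax`, `char_weight`, `pos_weight`, are hypothetical and not the paper's implementation):

```python
import torch
import torch.nn as nn

class ConditionalWeightedSoftmax(nn.Module):
    def __init__(self, hidden_size: int, num_prons: int, num_chars: int, num_pos: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_prons)
        # Learnable weights over pronunciation classes, conditioned on the
        # polyphonic character and its POS tag.
        self.char_weight = nn.Embedding(num_chars, num_prons)
        self.pos_weight = nn.Embedding(num_pos, num_prons)

    def forward(self, hidden, char_id, pos_id, candidate_mask):
        # hidden: (batch, hidden_size) BERT vector at the polyphonic character.
        logits = self.classifier(hidden)
        logits = logits * (self.char_weight(char_id) + self.pos_weight(pos_id))
        # Restrict the softmax to the character's candidate pronunciations.
        logits = logits.masked_fill(~candidate_mask, float("-inf"))
        return torch.log_softmax(logits, dim=-1)
```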
Traditional Chinese Text Recognition Dataset: Synthetic Dataset and Labeled Data.
Yi-Chang Chen, Yu-Chuan Chang, Yen-Cheng Chang, and Yi-Ren Yeh, ICPR Workshop, 2022. [pdf]
This paper presents a framework for a Traditional Chinese synthetic data engine. We generated over 20 million synthetic samples and collected over 7,000 manually labeled samples (TC-STR 7k-word) as a benchmark.
Verifiability Enhanced Active Learning Using Multi-armed Bandit.
Yen-Cheng Chang and Tian-Li Yu, Master's Thesis, Computer Science Group, Department of Electrical Engineering, National Taiwan University. [pdf]
We proposed a pool-based active learning technique that queries instances using the concept of verifiability, defined as the proportion of instances correctly classified by all classifiers in the version space.
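A small sketch of how verifiability can be estimated in practice, using a sampled committee of classifiers as a stand-in for the version space (the helper name `verifiability` and the committee-based approximation are assumptions for illustration, not the thesis's exact procedure):

```python
import numpy as np

def verifiability(committee, X, y):
    """Fraction of instances that every committee member classifies correctly."""
    preds = np.stack([clf.predict(X) for clf in committee])  # (n_clf, n_samples)
    all_correct = (preds == y).all(axis=0)                   # (n_samples,)
    return all_correct.mean()
```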