In this series of articles, you will learn:
- Training models using PyTorch.
- Model inference using Rust Candle.
- Deploying models to WASM using Rust.
- Calling WASM using TypeScript + Web Worker.
Preface
I am a Wubi input method user. Although Wubi lets you type characters you may not even recognize, I still occasionally run into characters I cannot decompose into Wubi radicals. In those cases, I fall back on the "elderly input method": handwriting input.
As a technical person, I have been curious for many years about how handwriting input is implemented. Back when deep learning was not yet prevalent, achieving decent recognition rates for handwritten Chinese characters was indeed a marvel.
Recently, I wanted to find a model for recognizing handwritten Chinese characters and discovered that most models have remained unchanged for nearly a decade.
So, why not build one myself?
Approach
How many Chinese characters are there in total?
The Unicode standard includes over 90,000 Chinese characters, but the number of commonly used characters is only around 3,000-4,000. Implementing recognition for these two different scales would require different approaches. For 4,000 characters, a simple classification model would suffice. However, for 90,000 characters, simple classification becomes impractical due to the fully connected layers, which could cause the number of parameters to explode into the hundreds of millions, and the computational complexity of forward propagation would also increase significantly.
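To put the parameter explosion in concrete terms, here is a back-of-the-envelope calculation for the final fully connected layer (assuming a 1280-dimensional feature vector, which is what MobileNetV2 produces before its classifier head):

```python
# Rough parameter count for a final fully connected (classification) layer:
# weight matrix = feature_dim * num_classes, plus one bias term per class.
feature_dim = 1280  # MobileNetV2's final feature size

def fc_params(num_classes: int) -> int:
    return feature_dim * num_classes + num_classes

print(fc_params(4_037))   # ~5.2 million parameters: manageable
print(fc_params(90_000))  # ~115 million parameters: the head dwarfs the backbone
```

And this counts only the classifier head; both memory and the cost of the final matrix multiply scale linearly with the number of classes.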
For handling a large set of Chinese characters, the general approaches are:
- Grouped Classification: First perform coarse classification, then fine classification.
- Similarity: Use features for similarity calculations.
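The similarity approach can be sketched in a few lines: instead of a giant classifier head, keep a gallery of per-character feature vectors and return the nearest one by cosine similarity. The gallery, dimensions, and labels below are all made-up placeholders for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical gallery: one feature vector per character
# (here 5 fake characters with 8-dim embeddings).
gallery = rng.normal(size=(5, 8))
labels = ["一", "二", "三", "口", "日"]

def most_similar(query: np.ndarray) -> str:
    # Cosine similarity between the query embedding and every gallery entry.
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return labels[int(np.argmax(g @ q))]

print(most_similar(gallery[3]))  # → "口" (an identical vector has similarity 1)
```

The appeal is that adding a new character only means adding a row to the gallery, with no retraining of a 90,000-way head.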
For personal use, there is no need to implement such a large character set. Therefore, I will use the casia-hwdb dataset from the Chinese Academy of Sciences, which contains approximately 3,740 Chinese characters, along with some symbols and foreign letters, totaling 4,037 characters.
Technical Implementation
To achieve usable handwriting recognition for Chinese characters, we need to consider the real-time performance of model inference and ease of deployment. If real-time performance is not a concern, many small model tasks can be directly handled by multimodal large models, which might be why small models are becoming less popular.
Here, we will use multiple technologies and frameworks: Python for training, Rust for deployment, and TypeScript for the user interface.
Training
We will use PyTorch Lightning as the training framework. Lightning is a higher-level abstraction over PyTorch that cuts down the boilerplate code needed for common training tasks.
To keep inference real-time even on CPUs, we will use MobileNetV2 as the base model. Since we plan to deploy to WASM for in-browser inference, its small footprint makes it a natural fit; for server-side deployment, ResNet could yield better recognition rates.
Inference
For inference, we will use the Rust Candle framework. Since we have previously implemented the model structures for MobileNetV2 and ResNet in earlier articles, using Candle as the inference framework is a reasonable choice. More importantly, Candle supports the WASM environment.
Deployment Environment
Given that MobileNetV2 is lightweight enough, we will directly deploy it to the browser environment using WASM.
Application
The implementation will roughly be as follows: Use React + Konva to create a canvas for simulating handwriting input, start a Web Worker process to handle WASM model inference, and display the results on the page.
Of course, this is just one application scenario. It could also be implemented as a desktop-level input method, or the ONNX weights could be exported and converted to the NCNN inference framework for integration into Android or iOS applications.
Final Result: