Recently, Tencent open-sourced a relatively small translation model called hy-mt-1.8b. Due to its small size and support for many languages, it has remained highly popular on Hugging Face (as of January 11, 2026, it’s still ranked #2 on the overall leaderboard).
I’ve implemented this model in Rust using the Candle framework and successfully run it on a GPU.
Here are some of my thoughts on the experience.
This model is extremely simple—so simple that its structure can be described in just a few lines:
HunYuanDenseV1ForCausalLM(
  (model): HunYuanDenseV1Model(
    (embed_tokens): Embedding(120818, 2048, padding_idx=120002)
    (layers): ModuleList(
      (0-31): 32 x HunYuanDenseV1DecoderLayer(
        (self_attn): HunYuanDenseV1Attention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=512, bias=False)
          (v_proj): Linear(in_features=2048, out_features=512, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (query_layernorm): HunYuanDenseV1RMSNorm((128,), eps=1e-05)
          (key_layernorm): HunYuanDenseV1RMSNorm((128,), eps=1e-05)
        )
        (mlp): HunYuanDenseV1MLP(
          (gate_proj): Linear(in_features=2048, out_features=6144, bias=False)
          (up_proj): Linear(in_features=2048, out_features=6144, bias=False)
          (down_proj): Linear(in_features=6144, out_features=2048, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): HunYuanDenseV1RMSNorm((2048,), eps=1e-05)
        (post_attention_layernorm): HunYuanDenseV1RMSNorm((2048,), eps=1e-05)
      )
    )
    (norm): HunYuanDenseV1RMSNorm((2048,), eps=1e-05)
    (rotary_emb): HunYuanDenseV1RotaryEmbedding()
  )
  (lm_head): Linear(in_features=2048, out_features=120818, bias=False)
)
That’s roughly what it looks like.
Many of the modules inside are very similar to implementations in other models, so there isn't much that needs special mention. Porting the model therefore boiled down to translating the reference implementation in the Python transformers library into Rust.
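To give a concrete idea of what that translation looks like, here is the MLP block sketched in Candle. This is not the code from my repository; the weight names come from the printed structure above, and the hidden/intermediate sizes are passed in rather than hard-coded:
use candle_core::{Result, Tensor};
use candle_nn::{linear_no_bias, Linear, Module, VarBuilder};

// Minimal sketch of HunYuanDenseV1MLP translated to Candle. Weight names are
// taken from the printed structure; the surrounding layer supplies the sizes.
struct Mlp {
    gate_proj: Linear, // 2048 -> 6144, no bias
    up_proj: Linear,   // 2048 -> 6144, no bias
    down_proj: Linear, // 6144 -> 2048, no bias
}

impl Mlp {
    fn new(hidden: usize, intermediate: usize, vb: VarBuilder) -> Result<Self> {
        Ok(Self {
            gate_proj: linear_no_bias(hidden, intermediate, vb.pp("gate_proj"))?,
            up_proj: linear_no_bias(hidden, intermediate, vb.pp("up_proj"))?,
            down_proj: linear_no_bias(intermediate, hidden, vb.pp("down_proj"))?,
        })
    }
}

impl Module for Mlp {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        // SwiGLU: down(SiLU(gate(x)) * up(x)), matching the SiLUActivation above.
        let gate = candle_nn::ops::silu(&self.gate_proj.forward(xs)?)?;
        let up = self.up_proj.forward(xs)?;
        self.down_proj.forward(&(gate * up)?)
    }
}
The attention block follows the same pattern; the only slightly unusual part is the pair of per-head RMSNorms on the query and key states (the (128,) norms above, 128 being the head dimension).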
Challenge One
Because popular models are implemented directly inside the transformers library, the level of abstraction in the code is inevitably high. Take model loading, for example:
model_name_or_path = "tencent/HY-MT1.5-1.8B"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
This is highly encapsulated, and we can't start porting directly from it.
Fortunately, the printed model structure already gives us enough information: under the hood, AutoModelForCausalLM resolves to HunYuanDenseV1ForCausalLM.
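Candle has no AutoModel equivalent, so loading happens by hand: the tokenizer comes from the tokenizers crate and the weights are memory-mapped from safetensors through a VarBuilder. A minimal sketch of that path (file names, dtype, and the single-file layout are assumptions, not the repository's actual code):
use anyhow::{Error as E, Result};
use candle_core::{DType, Device};
use candle_nn::VarBuilder;
use tokenizers::Tokenizer;

fn main() -> Result<()> {
    let device = Device::cuda_if_available(0)?;

    // AutoTokenizer is replaced by the `tokenizers` crate reading tokenizer.json.
    let tokenizer = Tokenizer::from_file("tokenizer.json").map_err(E::msg)?;

    // AutoModelForCausalLM is replaced by memory-mapping the safetensors weights
    // and looking every tensor up by the names visible in the printed structure,
    // e.g. "model.layers.0.self_attn.q_proj.weight".
    let vb = unsafe {
        VarBuilder::from_mmaped_safetensors(&["model.safetensors"], DType::F16, &device)?
    };

    // The hand-written HunYuanDenseV1ForCausalLM port would be built from `vb` here.
    let encoding = tokenizer.encode("It's on the house.", true).map_err(E::msg)?;
    println!("{} prompt tokens, weights mapped on {:?}", encoding.get_ids().len(), device);
    let _ = vb;
    Ok(())
}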
Challenge Two
Dead code and dead parameters.
Methods in the transformers library carry a number of conventional parameters and follow library-wide conventions, model structures vary from architecture to architecture, and some modules are obviously copy-pasted from other models. The result is a lot of dead code and unused parameters.
This model's implementation has an especially large amount of it. You may need to add debug output inside the transformers library to figure out exactly which code paths actually run.
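One practical consequence on the Rust side is that the port only needs the config fields its forward pass actually reads. A hypothetical config struct (field names follow the usual transformers conventions; the commented values are inferred from the printed structure, not checked against the real config.json):
use serde::Deserialize;

// Hypothetical config for the port: only the fields the forward pass uses are
// deserialized. The commented values are inferred from the printed structure.
#[derive(Debug, Clone, Deserialize)]
struct Config {
    vocab_size: usize,              // 120818
    hidden_size: usize,             // 2048
    intermediate_size: usize,       // 6144
    num_hidden_layers: usize,       // 32
    num_attention_heads: usize,     // 2048 / 128 = 16
    num_key_value_heads: usize,     // 512 / 128 = 4 (grouped-query attention)
    rms_norm_eps: f64,              // 1e-5
    rope_theta: f64,
    max_position_embeddings: usize,
}
serde ignores unknown fields by default, so every knob the forward pass never touches simply stays behind in config.json.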
Challenge Three
Aligning inputs and outputs. This is meticulous work, as differences between programming languages and framework implementations are inevitable.
For example, the official Python example translates "It's on the house." as 这是免费的。 ("It's free.").
My implementation clearly differed somewhere: it translated the sentence as 这是一所房子而已。 ("It's just a house."). After tweaking some parameters it would sometimes produce 这是房子。 ("This is a house.") or 它在房子里 ("It's inside the house."). This also shows how close these candidate tokens are: tiny numerical differences can completely change the output.
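As an illustration (not code from the repository), differences like this are easier to track down by comparing intermediate activations rather than final text: dump a reference tensor from the transformers side with numpy.save, then check it against the Candle tensor layer by layer:
use candle_core::{DType, Result, Tensor};

// Hypothetical alignment helper: compare a Candle activation against a
// reference dumped from the Python side and return the largest absolute difference.
fn max_abs_diff(candle_out: &Tensor, reference_npy: &str) -> Result<f32> {
    let reference = Tensor::read_npy(reference_npy)?
        .to_dtype(DType::F32)?
        .to_device(candle_out.device())?;
    let diff = (candle_out.to_dtype(DType::F32)? - &reference)?.abs()?;
    diff.flatten_all()?.max(0)?.to_scalar::<f32>()
}
Forcing greedy decoding on both sides also helps, since any remaining difference then has to come from the forward pass rather than from sampling.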
Repository: https://github.com/ximeiorg/hy-mt