qwen2.5-0.5b-classical-chinese-trans

The Qwen2.5-0.5B Classical Chinese Translation project applies natural language processing (NLP) to translating Classical Chinese texts into modern Chinese. Classical Chinese, or Literary Chinese, is the historical written form of the language, used in literature and official documents for over two millennia. Despite its cultural and historical significance, modern readers often find it hard to understand because of its archaic vocabulary, complex syntax, and cultural nuances.

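# Requires the `transformers` library and PyTorch; device_map="auto" also needs `accelerate`.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rkingzhong/qwen2.5-0.5b-classical-chinese-trans"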

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

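# Classical Chinese passage to translate (the opening of the Analects).
prompt = "子曰：“学而时习之，不亦说乎？有朋自远方来，不亦乐乎？人不知而不愠，不亦君子乎？”"

messages = [
    # System prompt, in English: "Please translate the following Classical Chinese
    # text, and do not use words that are banned on the internet."
    {"role": "system", "content": "麻烦帮我翻译下面的文言文，不要出现互联网中的违禁词。"},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template, ending with the assistant turn marker.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)

# generate() returns the prompt tokens followed by the new tokens; keep only the new ones.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)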
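
The slicing step applied to `generated_ids` before decoding is easy to misread, so here is a minimal sketch of it on plain Python lists. It assumes only that each generated sequence begins with a copy of its prompt tokens, as is the case for causal language models in `transformers`. The helper name `strip_prompt_tokens` and the token ids are made up for illustration and are not part of this project.

```python
def strip_prompt_tokens(input_ids_batch, output_ids_batch):
    """Drop the echoed prompt from the front of each generated sequence.

    Hypothetical helper: mirrors the list comprehension in the usage
    example, but works on plain lists so it can be run without a model.
    """
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Toy token ids (illustrative only, not real tokenizer output).
prompt_ids = [[101, 102, 103]]
generated = [[101, 102, 103, 7, 8, 9]]

new_tokens = strip_prompt_tokens(prompt_ids, generated)
print(new_tokens)  # [[7, 8, 9]]
```

Without this step, decoding would return the system prompt and the original Classical Chinese passage along with the translation.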