Deploying the YOLOv10 Object Detection Model in Rust with ONNX Runtime

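Before we knew it, the YOLO series had reached YOLOv12. The ultralytics project has done a lot to make the YOLO family shine: whatever the controversies around it, on the engineering side at least, ultralytics has made the largest contribution.

YOLO v5, v8, and v11 are all iterations released by ultralytics.

The three most recent YOLO versions are v10, v11, and v12.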

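v10 comes from researchers at Tsinghua University, and I think its biggest contribution is eliminating non-maximum suppression (NMS). When YOLO is used in engineering work it is almost always deployed on edge devices, where NMS is very unfriendly: it usually has to run on the CPU as a post-processing step, and its cost on edge devices is too large to ignore. v11 was not iterated from v10; it is based on v8, so it still needs NMS. v12 introduces an attention-centric architecture that departs from the traditional CNN approach of earlier YOLO models, and it is slower at inference.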
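To make concrete what YOLOv10 removes, here is a minimal sketch of classic greedy NMS as it is commonly applied after earlier YOLO versions. The `BBox` type and the function names are hypothetical and chosen for illustration; this is not the ultralytics implementation.

```rust
/// Hypothetical box type for illustration: corner coordinates plus a score.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BBox {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
    pub score: f32,
}

/// Intersection over union of two axis-aligned boxes.
pub fn iou(a: &BBox, b: &BBox) -> f32 {
    let iw = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let ih = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = iw * ih;
    let area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    let area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    let union = area_a + area_b - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Greedy NMS: visit boxes from highest to lowest score and keep a box only
/// if it does not overlap an already kept box by more than `iou_threshold`.
pub fn nms(mut boxes: Vec<BBox>, iou_threshold: f32) -> Vec<BBox> {
    boxes.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut kept: Vec<BBox> = Vec::new();
    for candidate in boxes {
        if kept.iter().all(|k| iou(k, &candidate) <= iou_threshold) {
            kept.push(candidate);
        }
    }
    kept
}
```

The pairwise comparison is quadratic in the worst case and its running time depends on the data, which is part of why it is awkward to push onto an accelerator and tends to stay on the CPU.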

So I chose YOLOv10.

Exporting the ONNX model

from ultralytics import YOLO
# Load a model
model = YOLO("yolov10s.pt")  # load the pretrained YOLOv10-S weights (or your own trained .pt file)
# Export the model
model.export(format="onnx")

Unlike YOLOv8, whose output has shape [1, 84, 8400], YOLOv10 outputs [1, 300, 6]. This means YOLOv10 returns at most 300 detection boxes, each with 6 values: [x1, y1, x2, y2, score, class_id].
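Because each detection is one contiguous run of 6 floats, the flat output buffer can be walked in fixed-size chunks. The following is a minimal sketch of that idea; `RawBox` and `parse_rows` are hypothetical names used only for illustration, and the coordinates are still in model-input space.

```rust
/// One row of the YOLOv10 output, in model-input (640x640) coordinates.
/// The struct name and fields are hypothetical, chosen for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct RawBox {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
    pub score: f32,
    pub class_id: usize,
}

/// Split a flat [1, 300, 6] buffer into rows and keep those whose score
/// is at least `threshold`. Trailing elements that do not fill a whole
/// row are ignored by `chunks_exact`.
pub fn parse_rows(flat: &[f32], threshold: f32) -> Vec<RawBox> {
    flat.chunks_exact(6)
        .filter(|row| row[4] >= threshold)
        .map(|row| RawBox {
            x1: row[0],
            y1: row[1],
            x2: row[2],
            y2: row[3],
            score: row[4],
            class_id: row[5] as usize,
        })
        .collect()
}
```

For example, a buffer holding one confident row and one weak row yields a single `RawBox` when called with a threshold of 0.25.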

With this information, the inference code is easy to write. Here we use onnxruntime as the inference engine.

ort = {version = "=2.0.0-rc.10", features = ["ndarray","coreml","cuda"]}
ndarray = "0.16.1"
image = "0.25.8"
imageproc = "0.25.0"
ab_glyph = "0.2.31"

The coreml feature enables Apple's Core ML execution provider on macOS, and cuda enables NVIDIA GPUs.

let session = Session::builder()?
            .with_inter_threads(1)?
            .with_optimization_level(GraphOptimizationLevel::Level3)?
            .with_execution_providers([CoreMLExecutionProvider::default().build()])?
            .commit_from_file(model_path)?;

First, preprocessing is basically the same as for other YOLO versions: resize to 640x640 and convert to (C,H,W) channel order:

    /// Preprocess the image.
    ///
    /// Resizes the input image to 640x640 and converts it into the tensor
    /// format the model expects. Pixel values are normalized to [0, 1].
    ///
    /// Parameters:
    /// * `image`: the input dynamic image
    ///
    /// Returns:
    /// A `Result` wrapping the preprocessed 4-D tensor of shape (1, 3, 640, 640)
    fn preprocess_image(
        &self,
        image: &DynamicImage,
    ) -> Result<ndarray::Array4<f32>, Box<dyn Error>> {
        // Resize the image to 640x640 (this stretches it; aspect ratio is not preserved)
        let img = image.resize_exact(640, 640, FilterType::Nearest);
        // Create a zero-filled tensor of shape (1, 3, 640, 640)
        let mut input = Array::zeros((1, 3, 640, 640));
        // Walk over every pixel, normalize its RGB values, and store them in the tensor
        for pixel in img.pixels() {
            let x = pixel.0 as _;
            let y = pixel.1 as _;
            let [r, g, b, _] = pixel.2.0;
            input[[0, 0, y, x]] = (r as f32) / 255.;
            input[[0, 1, y, x]] = (g as f32) / 255.;
            input[[0, 2, y, x]] = (b as f32) / 255.;
        }

        Ok(input)
    }
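
The layout change itself is easier to see without the `image` and `ndarray` types. Here is a minimal sketch that assumes tightly packed RGB8 bytes in row-major (H, W, C) order; `hwc_to_chw` is a hypothetical helper, not part of the project code.

```rust
/// Convert tightly packed RGB8 pixels in (H, W, C) order into a normalized
/// f32 buffer in (C, H, W) order, i.e. three consecutive channel planes.
pub fn hwc_to_chw(rgb: &[u8], width: usize, height: usize) -> Vec<f32> {
    assert_eq!(rgb.len(), width * height * 3, "expected packed RGB8 data");
    let plane = width * height;
    let mut out = vec![0.0f32; plane * 3];
    for (i, px) in rgb.chunks_exact(3).enumerate() {
        // `i` is the pixel index y * width + x; each channel goes to its own plane.
        out[i] = px[0] as f32 / 255.0;
        out[plane + i] = px[1] as f32 / 255.0;
        out[2 * plane + i] = px[2] as f32 / 255.0;
    }
    out
}
```

The resulting buffer has the same element order as the (1, 3, 640, 640) tensor above once the batch dimension is added.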

Opening the model in Netron shows that the input is named images and the output is named output0.

let array = self.preprocess_image(&img)?;
// Run inference
let outputs: SessionOutputs = self
    .session
    .run(inputs!["images" => TensorRef::from_array_view(&array)?])?;

Unlike YOLOv8, the YOLOv10 output does not need to be transposed (see the output shape mentioned above):

// This is YOLOv10
let (_output_shape, output_data) = outputs["output0"].try_extract_tensor::<f32>()?;
// println!("output tensor shape: {:?}", _output_shape);
let output_vec: Vec<f32> = output_data.to_vec();

// This is YOLOv8
//let output = outputs
//            .get(0)
//            .unwrap()
//            .try_extract::<f32>()?
//            .view()
//            .t()
//            .into_owned();
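
The reason for the transpose is the memory layout. In the YOLOv8 shape [84, 8400] each channel is a contiguous run over all candidates (4 box values followed by 80 class scores for the COCO models), so reading one candidate means striding across the buffer, while a YOLOv10 row is already contiguous. A minimal sketch of the two access patterns on flat slices, with hypothetical helper names:

```rust
/// Read candidate `i` from a channel-major buffer of shape [channels, anchors]
/// (the YOLOv8 layout, e.g. [84, 8400]). Returns the 4 box values plus the
/// best class as (class_id, score).
pub fn read_channel_major(
    data: &[f32],
    channels: usize,
    anchors: usize,
    i: usize,
) -> ([f32; 4], usize, f32) {
    assert_eq!(data.len(), channels * anchors);
    // Element (c, i) lives at c * anchors + i, so one candidate is strided.
    let at = |c: usize| data[c * anchors + i];
    let bbox = [at(0), at(1), at(2), at(3)];
    let mut best = (0usize, f32::MIN);
    for c in 4..channels {
        if at(c) > best.1 {
            best = (c - 4, at(c));
        }
    }
    (bbox, best.0, best.1)
}

/// Read row `i` from a row-major buffer of shape [rows, 6] (the YOLOv10 layout).
pub fn read_row_major(data: &[f32], i: usize) -> &[f32] {
    &data[i * 6..i * 6 + 6]
}
```

With YOLOv8 the class with the highest score also has to be found per candidate, whereas the YOLOv10 row already carries a single score and class_id.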

With NMS gone, post-processing becomes very simple, and the model can fairly be called end-to-end.

/// Filter the raw detections.
pub fn filter_detections(
    results: &[f32],
    confidence_threshold: f32,
    img_width: u32,
    img_height: u32,
    orig_width: u32,
    orig_height: u32,
) -> Vec<Detection> {
    // YOLOv10 output format: [x1, y1, x2, y2, score, class_id]
    // Every 6 elements form one detection box
    if !results.len().is_multiple_of(6) {
        eprintln!(
            "Warning: model output length is not a multiple of 6, actual length: {}",
            results.len()
        );
    }

    let num_detections = results.len() / 6;
    // println!("number of detection boxes: {}", num_detections);

    let mut detections = Vec::with_capacity(num_detections);

    // preprocess_image stretches the image with resize_exact (no letterbox
    // padding), so each axis has its own scale factor and there is no padding
    // to remove.
    let scale_x = orig_width as f32 / img_width as f32;
    let scale_y = orig_height as f32 / img_height as f32;

    for i in 0..num_detections {
        let base_index = i * 6;

        let left = results[base_index];
        let top = results[base_index + 1];
        let right = results[base_index + 2];
        let bottom = results[base_index + 3];
        let confidence = results[base_index + 4];
        let class_id = results[base_index + 5] as usize;

        // Print the raw values for debugging
        // println!("box {}: left={}, top={}, right={}, bottom={}, confidence={}, class_id={}",
        //          i, left, top, right, bottom, confidence, class_id);

        // Check that the confidence is valid
        if !(0.0..=1.0).contains(&confidence) {
            // println!("skipping invalid confidence: {}", confidence);
            continue;
        }

        // Check that the class ID is valid
        if class_id >= YOLOV10_CLASS_LABELS.len() {
            // println!("skipping invalid class ID: {}", class_id);
            continue;
        }

        // Apply the confidence threshold
        if confidence >= confidence_threshold {
            // Scale from model-input coordinates back to the original image size
            let left = left * scale_x;
            let top = top * scale_y;
            let right = right * scale_x;
            let bottom = bottom * scale_y;

            // Float-to-u32 casts saturate, so negative values become 0
            let x = left as u32;
            let y = top as u32;
            let width = (right - left) as u32;
            let height = (bottom - top) as u32;

            // Make sure the coordinates are valid
            if width > 0 && height > 0 && x < orig_width && y < orig_height {
                detections.push(Detection {
                    confidence,
                    bbox: (x, y, width, height),
                    class_id,
                    class_name: YOLOV10_CLASS_LABELS[class_id].to_owned(),
                });
            } else {
                println!(
                    "skipping invalid bounding box: ({}, {}) - width: {}, height: {}",
                    x, y, width, height
                );
            }
        }
    }

    // NMS would go here, but YOLOv10 does not need it
    // nms(&mut detections, 0.5, 0.3);

    // println!("final number of valid detections: {}", detections.len());
    detections
}
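
Which coordinate mapping is correct depends on how the image was resized before inference. Here is a minimal sketch of the two common cases: a plain stretch, which is what `resize_exact` in `preprocess_image` does, and a centered letterbox, which preserves aspect ratio. Both function names are hypothetical.

```rust
/// Map a point from model-input space back to the original image when the
/// input was stretched to `input` size without preserving aspect ratio.
pub fn unmap_stretch(p: (f32, f32), input: (u32, u32), orig: (u32, u32)) -> (f32, f32) {
    let sx = orig.0 as f32 / input.0 as f32;
    let sy = orig.1 as f32 / input.1 as f32;
    (p.0 * sx, p.1 * sy)
}

/// Map a point back when the input was letterboxed: scaled uniformly to fit
/// inside `input`, then centered with equal padding on both sides.
pub fn unmap_letterbox(p: (f32, f32), input: (u32, u32), orig: (u32, u32)) -> (f32, f32) {
    let scale = (input.0 as f32 / orig.0 as f32).min(input.1 as f32 / orig.1 as f32);
    let pad_x = (input.0 as f32 - orig.0 as f32 * scale) / 2.0;
    let pad_y = (input.1 as f32 - orig.1 as f32 * scale) / 2.0;
    ((p.0 - pad_x) / scale, (p.1 - pad_y) / scale)
}
```

For a 1280x720 image and a 640x640 input, the model-space point (0, 140) maps to (0, 157.5) under the stretch mapping but to (0, 0) under the letterbox mapping, so mixing the two shifts and distorts boxes on non-square images. If you want results closer to the Python pipeline, which as far as I know letterboxes by default, the preprocessing would have to letterbox as well.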

Drawing the results shows that they basically match the Python output.

[res.jpg]

Final notes

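Note that although the ultralytics project is open source, its code and its model weight files are all released under AGPL-3.0, a license that is unfriendly to commercial use. That is why, famous as YOLOv8 is, few open-source projects are actually willing to use it. The well-known cvat project, for example, had a YOLOv8 plugin early on and later removed it. Likewise, the YOLOv8 implementation in the candle project has nothing to do with ultralytics, and the node names in its weights are completely different, because candle's YOLOv8 is a reimplementation based on the tinygrad project.

If you plan to use ultralytics and its weights in a commercial project, make sure you understand this risk.

If you found this helpful, the full code is available here: https://gitcode.com/tunzei/yolov10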