Challenges and Solutions for Implementing Cross-Platform Screenshot Functionality with Tauri

Using Tauri as the application technology stack to implement screenshot functionality is not advantageous.

Tauri’s biggest feature is cross-platform compatibility - one codebase for multiple platforms. However, the screenshot functionality itself is not easily cross-platform because each platform has its own unique API implementation. For example, on macOS, you need to call Cocoa framework interfaces to implement it. Moreover, macOS requires applications to request screen recording permissions. If the user doesn’t authorize, the application can only capture screenshots of the application itself and an empty desktop screenshot. On Linux, we might need to consider implementing on X server or Wayland.

Another issue is Tauri’s transmission efficiency. We know that Tauri uses Command instructions to interact between the frontend and Rust. In the Tauri v1 era, Command instructions only supported serialized text transmission because Tauri used serde::Serialize to process data by default. For transmitting binary data, this was terribly inefficient.

Fortunately, Tauri v2 data now supports Raw:

pub enum InvokeBody {
  /// Json payload.
  Json(JsonValue),
  /// Bytes payload.
  Raw(Vec<u8>),
}

But for frontend and backend, there are still some data transmission overheads, just not as unacceptable as before.

Why is this important for screenshots? Try opening WeChat’s screenshot feature - before taking a screenshot, WeChat has a mouse-following preview function (similar to a magnifying glass effect) to help users accurately position the starting point of the screenshot.

How to Implement Screenshots in Tauri?

Screenshot software is essentially a transparent, topmost, full-screen application.

In tauri.conf.json, add the following configuration:

  "app": {
    "windows": [
      {
        "title": "jietu",
        "width": 800,
        "height": 600,
        "resizable": false,
        "alwaysOnTop": true,
        "decorations": false,
        "fullscreen": true,
        "shadow": false,
        "transparent": true
      }
    ],
    "security": {
       "csp": "default-src 'self' 'unsafe-inline'; connect-src 'self' ipc://localhost"
    }
  },

During development, it’s best to set "fullscreen": false, otherwise it’s difficult to debug.

The frontend also needs to set a transparent background:

html,body,#root,.container {
  @apply bg-transparent;
}

The rest requires implementing screenshot functionality for each platform in the Rust layer and implementing screenshot interactions in the frontend layer.

If we implement screenshots for the three major platforms ourselves in the Rust layer, the code volume would be quite large, and you’d also need to be very familiar with Windows API, Cocoa framework, etc. Fortunately, someone has done this dirty work:

cargo add xcap

Magnifying Glass (Screenshot Preview Starting Point)

[屏幕截图 2025-09-07 220129.png] The following method implements the magnifying glass function. We need to get the mouse coordinates and use this point as the origin to capture a 200x200 screenshot:

#[tauri::command]
fn xcap_start(x:u32,y:u32) -> Result<Response, String> {
    let monitors = Monitor::all().map_err(|e| e.to_string())?;

    let monitor = monitors
        .into_iter()
        .find(|m| m.is_primary().unwrap_or(false))
        .expect("No primary monitor found");


    let region_width = 200u32;
    let region_height = 200u32;

    let start = Instant::now();

    let image = monitor.capture_region(x, y, region_width, region_height).map_err(|e| e.to_string())?;
    println!(
        "Time to record region of size {}x{}: {:?}",
        image.width(),
        image.height(),
        start.elapsed()
    );
    // Convert image data to byte array and return to frontend
    let width = image.width();
    let height = image.height();
    let mut buffer = Cursor::new(Vec::new());
    let encoder = image::codecs::png::PngEncoder::new(&mut buffer);
    let rgba_image = image::DynamicImage::ImageRgba8(image).into_rgba8();
    encoder.write_image(rgba_image.as_bytes(), width, height, image::ExtendedColorType::Rgba8)
           .map_err(|e| e.to_string())?;
    
    Ok(Response::new(buffer.into_inner()))
        
}

The frontend uses a full-screen canvas to implement the screenshot page. I’m used to using konva.

“tsx <Text text={Mouse position: (${mousePosition.x}, ${mousePosition.y})} fontSize={15} x={20} y={20} fill=“white” />

      {image && (
        <Image
          x={300}
          y={50}
          width={200}
          height={200}
          image={image}
        />
      )}
    </Layer>
  </Stage>


Add mouse events to the outermost container:

capsview(e.clientX, e.clientY)}> ```

  const [imagePath, setImagePath] = useState<string | null>(null);
  // Use throttling to optimize screenshot calls
  const capsview = useRef(throttle(async (x: number, y: number) => {
    try {
      // Update mouse position state
      setMousePosition({ x, y });
      
      // Call Rust command to get screenshot data
      const result: Uint8Array = await invoke("xcap_start",{x,y});
      const blob = new Blob([new Uint8Array(result).buffer], { type: "image/png" });
      setImagePath(URL.createObjectURL(blob));
      console.log("Screenshot successfully obtained");
    } catch (error) {
      console.error("Screenshot failed:", error);
    }
  }, 30)).current; // Limit to at most once every 30ms

The 30ms here has a reason. Because the Rust layer takes the following time to capture a 200x200 image on Windows platform (using pnpm tauri dev, this is unoptimized, theoretically optimized should take less time):

Time to record region of size 200x200: 16.8005ms
Time to record region of size 200x200: 16.9432ms
Time to record region of size 200x200: 16.5475ms
Time to record region of size 200x200: 15.8173ms
Time to record region of size 200x200: 17.1305ms
Time to record region of size 200x200: 16.2595ms
Time to record region of size 200x200: 16.0607ms
Time to record region of size 200x200: 16.7726ms
Time to record region of size 200x200: 16.3253ms
Time to record region of size 200x200: 15.9661ms
Time to record region of size 200x200: 17.0717ms
Time to record region of size 200x200: 15.809ms
Time to record region of size 200x200: 12.2509ms
Time to record region of size 200x200: 19.8513ms
Time to record region of size 200x200: 17.0228ms
Time to record region of size 200x200: 14.8592ms
Time to record region of size 200x200: 15.5001ms
Time to record region of size 200x200: 15.9599ms
Time to record region of size 200x200: 16.6687ms
Time to record region of size 200x200: 15.5552ms
Time to record region of size 200x200: 15.9622ms

Adding frontend-backend transmission and some processing, 30ms is a suitable interval.

At this point, the core functionality of the magnifying glass effect before screenshot has been implemented. The details are just UI issues.

Similarly, for the area selection screenshot function, you can even modify the above `xcap_start` to use it. As for adding brush functions, text functions, frame functions, etc. after taking screenshots, they all belong to the frontend domain and can be easily handled using konva’s API.

title: “使用 Tauri 实现跨平台截图功能的挑战与解决方案” summary: “探讨在 Tauri 框架中实现截图功能的技术难点，包括跨平台兼容性问题和数据传输效率问题，并提供相应的解决方案。” date: “Sep 08 2025” draft: false tags: [“Tauri”, “Rust”]

使用 Tauri 作为应用技术栈来实现截图功能并不占优势。

Tauri 最大的特色是跨平台，一份代码，多端应用。然而，截图这个功能本身就不可太可能跨得了平台，因为每个平台都有自己的独特 API 实现。比如，macOS 上，需要调用 cocoa 框架的接口来实现。并且，macOS上是需要应用申请屏幕录制权限的。如果用户不授权，那么应用只能获取到应用自己的截图以及空桌面截图。在 linux 上，我们可能还得考虑是在 X server 上实现还是 wayland 上实现。

另外一个问题，是 Tauri 的传输效率。我们知道，Tauri 使用 Command 指令来交互前端与Rust 。在 Tauri v1 时代，Command 指令只支持序列化的文本传输信息，因为 Tauri 默认使用 serde::Serialize 处理数据。这对传输二进制数据来说，效率慢的非常可怕。

好在，Tauri v2 的数据已经支持 Raw:

pub enum InvokeBody {
  /// Json payload.
  Json(JsonValue),
  /// Bytes payload.
  Raw(Vec<u8>),
}

但是对于前后端来说，还是有一些传输数据的开销在，只是不那么另人无法接受了。

为什么这对于截图来说很重要？你试试打开微信的截图，微信的截图功能在截图前会有一个鼠标跟随预览功能（类似放大镜效果），方便用户精准定位截图的开始点。

Tauri 要怎么实现截图？

截图软件本质上就是一个透明的，置顶的，全屏的应用。

在 tauri.conf.json 里，添加如下配置：

  "app": {
    "windows": [
      {
        "title": "jietu",
        "width": 800,
        "height": 600,
        "resizable": false,
        "alwaysOnTop": true,
        "decorations": false,
        "fullscreen": true,
        "shadow": false,
        "transparent": true
      }
    ],
    "security": {
       "csp": "default-src 'self' 'unsafe-inline'; connect-src 'self' ipc://localhost"
    }
  },

开发阶段，最好设置 "fullscreen": false,，不然不好调试。

前端也要设置背景透明色：

html,body,#root,.container {
  @apply bg-transparent;
}

剩下的，我们需要在 Rust 层实现各个平台的截图功能。并在前端层实现截图上的交互。

如果我们自己在Rust 层实现三大平台的截图，代码量是比较大的，同时你还得非常了解 windows api、cocoa 框架等。幸运的是，有人把这部分脏活给干了：

cargo add xcap

放大镜（截图预览开始点）

[屏幕截图 2025-09-07 220129.png] 下面这个方法就实现了放大镜功能,我们需要获取鼠标的坐标点，并以这个坐标点为原点，获取一个200x200 的截图：

#[tauri::command]
fn xcap_start(x:u32,y:u32) -> Result<Response, String> {
    let monitors = Monitor::all().map_err(|e| e.to_string())?;

    let monitor = monitors
        .into_iter()
        .find(|m| m.is_primary().unwrap_or(false))
        .expect("No primary monitor found");


    let region_width = 200u32;
    let region_height = 200u32;

    let start = Instant::now();

    let image = monitor.capture_region(x, y, region_width, region_height).map_err(|e| e.to_string())?;
    println!(
        "Time to record region of size {}x{}: {:?}",
        image.width(),
        image.height(),
        start.elapsed()
    );
    // 将图像数据转换为字节数组返回给前端
    let width = image.width();
    let height = image.height();
    let mut buffer = Cursor::new(Vec::new());
    let encoder = image::codecs::png::PngEncoder::new(&mut buffer);
    let rgba_image = image::DynamicImage::ImageRgba8(image).into_rgba8();
    encoder.write_image(rgba_image.as_bytes(), width, height, image::ExtendedColorType::Rgba8)
           .map_err(|e| e.to_string())?;
    
    Ok(Response::new(buffer.into_inner()))
        
}

前端则使用一个全屏画布来实现截图页面，我习惯使用 konva 。

“tsx <Text text={鼠标位置: (${mousePosition.x}, ${mousePosition.y})} fontSize={15} x={20} y={20} fill=“white” />

      {image && (
        <Image
          x={300}
          y={50}
          width={200}
          height={200}
          image={image}
        />
      )}
    </Layer>
  </Stage>


在最外层的容器上添加鼠标事件：

capsview(e.clientX, e.clientY)}> ```

  const [imagePath, setImagePath] = useState<string | null>(null);
  // 使用节流优化截图调用
  const capsview = useRef(throttle(async (x: number, y: number) => {
    try {
      // 更新鼠标位置状态
      setMousePosition({ x, y });
      
      // 调用Rust命令获取截图数据
      const result: Uint8Array = await invoke("xcap_start",{x,y});
      const blob = new Blob([new Uint8Array(result).buffer], { type: "image/png" });
      setImagePath(URL.createObjectURL(blob));
      console.log("截图成功获取");
    } catch (error) {
      console.error("截图失败:", error);
    }
  }, 30)).current; // 限制为每30ms最多执行一次

这里的 30ms 是有说法的。因为 Rust 层在 windows 平台上截图一个 200x200大小的图片耗时（使用的是 pnpm tauri dev, 这是未优化过的，理论上优化过的应该更少耗时）如下：

Time to record region of size 200x200: 16.8005ms
Time to record region of size 200x200: 16.9432ms
Time to record region of size 200x200: 16.5475ms
Time to record region of size 200x200: 15.8173ms
Time to record region of size 200x200: 17.1305ms
Time to record region of size 200x200: 16.2595ms
Time to record region of size 200x200: 16.0607ms
Time to record region of size 200x200: 16.7726ms
Time to record region of size 200x200: 16.3253ms
Time to record region of size 200x200: 15.9661ms
Time to record region of size 200x200: 17.0717ms
Time to record region of size 200x200: 15.809ms
Time to record region of size 200x200: 12.2509ms
Time to record region of size 200x200: 19.8513ms
Time to record region of size 200x200: 17.0228ms
Time to record region of size 200x200: 14.8592ms
Time to record region of size 200x200: 15.5001ms
Time to record region of size 200x200: 15.9599ms
Time to record region of size 200x200: 16.6687ms
Time to record region of size 200x200: 15.5552ms
Time to record region of size 200x200: 15.9622ms

再叠加前后端传输以及一些处理等，30ms 是比较合适的间隔。

这时候截图前的放大镜效果最核心的功能就已经实现出来了，细节什么的，只是UI 问题而已。