These articles about Candle and PyTorch are really a collection of my learning notes. Previously, I used PyTorch almost blindly, without paying attention to the details of its APIs; Candle, on the other hand, has almost no documentation.
masked_fill
masked_fill is a conditional fill operation: given a boolean mask, it replaces the elements of a tensor at positions where the mask is true with a given value. Candle has no official masked_fill API, but custom implementations appear in several of its transformer models.
PyTorch:
import torch

x = torch.tensor([[1.0, 0.0], [0.3, -0.4]])
mask = x.to(torch.bool)  # nonzero elements become True
c = x.masked_fill(mask, torch.finfo(x.dtype).min)
print(c)
# tensor([[-3.4028e+38,  0.0000e+00],
#         [-3.4028e+38, -3.4028e+38]])
Candle:
// Custom implementation (imports assumed: use candle_core::{Device, DType, Result, Tensor};)
fn masked_fill(on_false: &Tensor, mask: &Tensor, on_true: f32) -> Result<Tensor> {
    let shape = mask.shape();
    // Broadcast the fill value to the mask's shape
    let on_true = Tensor::new(on_true, on_false.device())?.broadcast_as(shape.dims())?;
    // Where the mask is true, take the fill value; elsewhere keep the original element
    let m = mask.where_cond(&on_true, on_false)?;
    Ok(m)
}
// Example usage
let data = vec![1.0f32, 0.0, 0.3, -0.4];
let x = Tensor::from_vec(data, (2, 2), &Device::Cpu)?;
let mask = x.ne(0.0)?; // true wherever x is nonzero
let y = masked_fill(&x, &mask, f32::MIN)?;
println!("y:{y}");
// y:[[-3.4028e38, 0.0000e0],
//    [-3.4028e38, -3.4028e38]]
// Tensor[[2, 2], f32]
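A common use of this helper is building a causal attention mask before a softmax. Below is a minimal sketch, assuming candle_core's tril2 helper and the masked_fill function above; the size t = 4 and the all-ones scores are made up for illustration:
// Hypothetical causal-mask sketch: fill everything above the diagonal with f32::MIN
let t = 4;
let tril = Tensor::tril2(t, DType::U8, &Device::Cpu)?; // ones on and below the diagonal
let mask = tril.eq(0u8)?;                              // true strictly above the diagonal
let scores = Tensor::ones((t, t), DType::F32, &Device::Cpu)?; // stand-in attention scores
let masked = masked_fill(&scores, &mask, f32::MIN)?;   // future positions are masked out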
Broadcasting Mechanism
PyTorch’s broadcasting mechanism allows tensors of different shapes to perform element-wise operations (such as addition, subtraction, multiplication, division) as long as their shapes meet the following conditions:
- Starting from the trailing dimension, the sizes of the two tensors must be equal or one of them must be 1.
- If the two tensors have different numbers of dimensions, the shape of the one with fewer dimensions is padded with 1s at the front until both have the same number of dimensions.
Suppose we have two tensors:
- A has a shape of [1, 1, 64, 64]
- B has a shape of [64, 64]
These two tensors can be directly added in PyTorch:
a = torch.ones(1, 1, 64, 64)
b = torch.ones(64, 64)
print(a + b)  # b is broadcast to [1, 1, 64, 64]; every element is 2.0
In Candle, however, broadcasting is not implicit: adding tensors of different shapes with + returns a shape-mismatch error. Instead, we use broadcast_add to achieve the same result.
let device = Device::Cpu;
let a = Tensor::ones((1, 1, 64, 64), DType::F32, &device)?;
let b = Tensor::ones((64, 64), DType::F32, &device)?;
// Addition: b is broadcast to a's shape [1, 1, 64, 64]
let c = a.broadcast_add(&b)?;
println!("c::{c}");
Matrix Multiplication
In PyTorch, a@b is equivalent to torch.matmul(a, b).
So what’s the difference between this and a*b?
a*b is element-wise multiplication: it requires a and b to have the same shape, and it multiplies corresponding elements one by one. That is, (a*b)[i][j] = a[i][j] * b[i][j].
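To make the contrast concrete, here is a minimal Candle sketch (the two 2x2 matrices are made-up values for illustration):
use candle_core::{Device, Tensor};

let device = Device::Cpu;
let a = Tensor::new(&[[1f32, 2.], [3., 4.]], &device)?;
let b = Tensor::new(&[[5f32, 6.], [7., 8.]], &device)?;

// Matrix product: the Candle equivalent of PyTorch's a@b / torch.matmul(a, b)
let mm = a.matmul(&b)?; // [[19, 22], [43, 50]]

// Element-wise product: the Candle equivalent of PyTorch's a*b
// (the overloaded * operator also requires matching shapes and returns a Result)
let ew = (&a * &b)?;    // [[5, 12], [21, 32]]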