Generally, the process of developing applications that combine large models with Retrieval-Augmented Generation (RAG) involves several steps. This process ensures that the application provides more accurate and up-to-date answers by not only relying on the LLM’s knowledge but also dynamically retrieving information from external data sources.
The specific steps are as follows (a code sketch of the full flow appears further below):
- User submits a query: the user inputs a question or command through the application interface.
- Query preprocessing: the user's query receives preliminary processing, such as cleaning and normalization.
- Retrieval phase (RAG part):
- Use embedding techniques to convert the user query into a vector representation.
- Search for the most relevant entries in a pre-built document or knowledge base index. This typically involves calculating the similarity between the query vector and the vectors of documents in the index and selecting the best matches.
- The retrieved relevant content is extracted and prepared for generating the response.
- Generation phase (LLM part):
- Pass the retrieved relevant content as context, along with the original user query, to the LLM.
- The LLM generates detailed and precise responses based on the provided context. Here, the LLM relies not only on its internal training dataset but also on the latest information retrieved in real-time to enrich the content of the response.
- Post-processing of the answer:
- Format or simplify the LLM-generated answer to make it easier to understand.
- Ensure the answer meets the application’s security and compliance requirements, such as filtering sensitive information.
- Return the result to the user: Finally, the processed answer is displayed to the user through the application interface.
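To make the flow concrete, here is a minimal, framework-free sketch in plain Rust. Everything in it is illustrative: toy_embed and call_llm are hypothetical stand-ins for a real embedding model and a real LLM call, and the "index" is just an in-memory list.
// Toy embedding: counts ASCII letters into a fixed 26-dimensional vector.
fn toy_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 26];
    for b in text.to_ascii_lowercase().bytes() {
        if b.is_ascii_lowercase() {
            v[(b - b'a') as usize] += 1.0;
        }
    }
    v
}

// Cosine similarity between two vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Stand-in for the LLM call; a real implementation would send the prompt to a model API.
fn call_llm(prompt: &str) -> String {
    format!("[answer generated from a prompt of {} characters]", prompt.len())
}

fn main() {
    // Retrieval phase: a pre-built "knowledge base" with precomputed embeddings.
    let docs = [
        "Tauri bundles resources declared in tauri.conf.json",
        "LanceDB stores embeddings for similarity search",
    ];
    let index: Vec<Vec<f32>> = docs.iter().map(|d| toy_embed(d)).collect();

    // The user query is converted into a vector representation.
    let query = "how does tauri access bundled resources";
    let q_vec = toy_embed(query);

    // Select the best-matching document by similarity.
    let best = index
        .iter()
        .enumerate()
        .max_by(|a, b| cosine(&q_vec, a.1).total_cmp(&cosine(&q_vec, b.1)))
        .map(|(i, _)| i)
        .unwrap();

    // Generation phase: retrieved context plus the original query go to the LLM.
    let prompt = format!("Context:\n{}\n\nQuestion: {}", docs[best], query);
    let answer = call_llm(&prompt);

    // Post-processing and returning the result (here: just print it).
    println!("{answer}");
}
A real application replaces toy_embed with an embedding model, the in-memory list with a vector database, and call_llm with an LLM API call; that is exactly what the Swiftide-based implementation below does.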
There are many frameworks available to help us implement this process more easily. Does Rust have similar frameworks? The answer is yes.
Swiftide is a Rust library for building LLM applications, supporting fast data ingestion, transformation, and indexing to achieve effective querying and prompt injection, known as Retrieval-Augmented Generation. It provides flexible building blocks for creating various agents, allowing for rapid development from concept to production with minimal code.
However, Swiftide currently supports only four model providers/platforms: OpenAI, Groq, Ollama, and AWS Bedrock. Among these, the Qwen model can only be run locally through Ollama, which requires GPU resources; using the hosted model is clearly more cost-effective. Fortunately, Qwen's official API also supports the OpenAI interface, so we only need a few changes to the Swiftide project to make it compatible.
Qwen official documentation: https://help.aliyun.com/zh/model-studio/getting-started/models
Adding Qwen Support to Swiftide
Before adding the feature, let's look at the components a Swiftide application is built from:
- The LLM itself: the core of the application.
- An embedding model: needed to convert the knowledge base into feature vectors.
- A feature vector database: the storage medium for those vectors.
Adapting Qwen to the OpenAI Interface
As mentioned earlier, Qwen's official API supports the OpenAI interface protocol, so only minimal adaptation is needed: we implement async_openai::config::Config.
First, define the structure:
use secrecy::Secret;
use serde::Deserialize;

const QWEN_API_BASE: &str = "https://dashscope.aliyuncs.com/compatible-mode/v1";
#[derive(Clone, Debug, Deserialize)]
#[serde(default)]
pub struct QwenConfig {
api_base: String,
api_key: Secret<String>,
}
impl Default for QwenConfig {
fn default() -> Self {
Self {
api_base: QWEN_API_BASE.to_string(),
api_key: get_api_key().into(),
}
}
}
The API key is read from environment variables by default:
fn get_api_key() -> String {
std::env::var("QWEN_API_KEY")
.unwrap_or_else(|_| std::env::var("DASHSCOPE_API_KEY").unwrap_or_default())
}
Next, implement async_openai::config::Config, modeled on the request headers of the Qwen OpenAI-compatible endpoint:
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "你是谁?"
}
]
}'
use reqwest::header::{HeaderMap, AUTHORIZATION};
use secrecy::ExposeSecret;

impl async_openai::config::Config for QwenConfig {
fn headers(&self) -> HeaderMap {
let mut headers = HeaderMap::new();
headers.insert(
AUTHORIZATION,
format!("Bearer {}", self.api_key.expose_secret())
.as_str()
.parse()
.unwrap(),
);
headers
}
fn url(&self, path: &str) -> String {
format!("{}{}", self.api_base, path)
}
fn api_base(&self) -> &str {
&self.api_base
}
fn api_key(&self) -> &Secret<String> {
&self.api_key
}
fn query(&self) -> Vec<(&str, &str)> {
vec![]
}
}
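A quick way to sanity-check the config (a sketch added here, not part of the original project) is to confirm that url() points at the documented chat-completions endpoint:
use async_openai::config::Config;

let cfg = QwenConfig::default();
assert_eq!(
    cfg.url("/chat/completions"),
    "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
);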
We want to constrain both the Qwen model name and the embedding model name, so we define two enums.
Qwen model versions:
use std::fmt::Display;

#[derive(Debug, Default, Clone, PartialEq)]
pub enum QwenModel {
#[default]
Max,
Plus,
Turbo,
Long,
}
impl Display for QwenModel {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
QwenModel::Max => write!(f, "qwen-max"),
QwenModel::Plus => write!(f, "qwen-plus"),
QwenModel::Turbo => write!(f, "qwen-turbo"),
QwenModel::Long => write!(f, "qwen-long"),
}
}
}
Qwen embedding model versions:
#[derive(Debug, Default, Clone, PartialEq)]
pub enum QwenEmbedding {
#[default]
TextEmbeddingV1,
TextEmbeddingV2,
TextEmbeddingV3,
TextEmbeddingAsyncV1,
TextEmbeddingAsyncV2,
}
impl Display for QwenEmbedding {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
QwenEmbedding::TextEmbeddingV1 => write!(f, "text-embedding-v1"),
QwenEmbedding::TextEmbeddingV2 => write!(f, "text-embedding-v2"),
QwenEmbedding::TextEmbeddingV3 => write!(f, "text-embedding-v3"),
            QwenEmbedding::TextEmbeddingAsyncV1 => write!(f, "text-embedding-async-v1"),
            QwenEmbedding::TextEmbeddingAsyncV2 => write!(f, "text-embedding-async-v2"),
}
}
}
impl From<&String> for QwenEmbedding {
fn from(value: &String) -> Self {
match value.as_str() {
"text-embedding-v1" => QwenEmbedding::TextEmbeddingV1,
"text-embedding-v2" => QwenEmbedding::TextEmbeddingV2,
"text-embedding-v3" => QwenEmbedding::TextEmbeddingV3,
"text-embedding-async-v1" => QwenEmbedding::TextEmbeddingAsyncV1,
"text-embedding-async-v2" => QwenEmbedding::TextEmbeddingAsyncV2,
_ => panic!("Invalid embedding model"),
}
}
}
Define the Qwen Struct
Since we are using the OpenAI-compatible interface, the client field reuses the Client type from the async_openai crate.
#[derive(Debug, Builder, Clone)]
#[builder(setter(into, strip_option))]
pub struct Qwen {
#[builder(default = "default_client()", setter(custom))]
client: Arc<async_openai::Client<QwenConfig>>,
/// Default options for prompt models.
#[builder(default)]
default_options: Options,
}
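The builder default above references a default_client() helper that is not shown in the post; a minimal sketch, assuming it only wraps an async_openai client configured with QwenConfig::default(), could look like this:
// Hypothetical helper backing `#[builder(default = "default_client()")]`: it builds an
// async_openai client that talks to the Qwen OpenAI-compatible endpoint.
fn default_client() -> Arc<async_openai::Client<QwenConfig>> {
    Arc::new(async_openai::Client::with_config(QwenConfig::default()))
}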
The Options struct follows the pattern of the existing OpenAI and Groq integrations:
#[derive(Debug, Default, Clone, Builder)]
#[builder(setter(into, strip_option))]
pub struct Options {
/// The default prompt model to use, if specified.
#[builder(default)]
pub prompt_model: Option<String>,
#[builder(default)]
pub embed_model: Option<String>,
#[builder(default)]
pub dimensions: u16,
}
Here we add a dimensions field to store the embedding dimension, because Qwen's embedding models do not all share the same output dimension:
| Model Name | Vector Dimension |
| --- | --- |
| text-embedding-v3 | 1,024 (default), 768, or 512 |
| text-embedding-v2 | 1,536 |
| text-embedding-v1 | 1,536 |
| text-embedding-async-v2 | 1,536 |
| text-embedding-async-v1 | 1,536 |
Therefore, we need to perform the corresponding validation when providing the dimension parameter externally:
/// Set or validate default dimensions
///
/// This method sets or validates the dimension parameter for the embedding model. If an embedding model has been selected, it verifies that the given dimensions match the values expected for that model type.
/// If the dimensions do not match, the assertion fails and the program panics with an error message. If default_options has not been initialized, or no embedding model is provided, it initializes or updates the dimensions field in default_options.
///
/// # Parameters
/// - `dimensions`: u16 type, representing the dimension value.
///
/// # Returns
/// Returns `&mut Self`, allowing for method chaining.
pub fn default_dimensions(&mut self, dimensions: u16) -> &mut Self {
// Attempt to get a mutable reference to default_options
if let Some(options) = self.default_options.as_mut() {
// If an embedding model is provided, validate the dimensions based on the model type
if let Some(model) = &options.embed_model {
let embed_model: QwenEmbedding = model.into();
match embed_model {
QwenEmbedding::TextEmbeddingV1 => assert_eq!(
dimensions, 1536,
"Dimensions must be 1536 for this embedding model"
),
QwenEmbedding::TextEmbeddingV2 => assert_eq!(
dimensions, 1536,
"Dimensions must be 1536 for this embedding model"
),
QwenEmbedding::TextEmbeddingV3 => assert!(
matches!(dimensions, 1024 | 768 | 512),
"Dimensions must be one of [1024, 768, 512] for TextEmbeddingV3"
),
QwenEmbedding::TextEmbeddingAsyncV1 => assert_eq!(
dimensions, 1536,
"Dimensions must be 1536 for this embedding model"
),
QwenEmbedding::TextEmbeddingAsyncV2 => assert_eq!(
dimensions, 1536,
"Dimensions must be 1536 for this embedding model"
),
}
}
// Update the dimensions field in options
options.dimensions = dimensions;
} else {
// If default_options has not been initialized, initialize it
self.default_options = Some(Options {
dimensions,
..Default::default()
});
}
self
}
To facilitate calling, add the following methods to QwenBuilder:
pub fn default_prompt_model(&mut self, model: &QwenModel) -> &mut Self {
if let Some(options) = self.default_options.as_mut() {
options.prompt_model = Some(model.to_string());
} else {
self.default_options = Some(Options {
prompt_model: Some(model.to_string()),
..Default::default()
});
}
self
}
pub fn default_embed_model(&mut self, model: &QwenEmbedding) -> &mut Self {
if let Some(options) = self.default_options.as_mut() {
options.embed_model = Some(model.to_string());
} else {
self.default_options = Some(Options {
embed_model: Some(model.to_string()),
..Default::default()
});
}
self
}
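With these setters in place, model and dimensions can be configured together, and the assertions in default_dimensions catch mismatches. An illustrative usage (not from the original post):
let mut builder = QwenBuilder::default();
builder
    .default_embed_model(&QwenEmbedding::TextEmbeddingV3)
    // 1024 is valid for text-embedding-v3; passing 1536 here would panic via the assertion above.
    .default_dimensions(1024);
let qwen = builder.build()?;
Note that default_embed_model should be called before default_dimensions; otherwise there is no embedding model to validate against.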
Qwen LLM Adapter for Swiftide
For the LLM side, Swiftide only requires implementing the SimplePrompt trait.
#[async_trait]
impl SimplePrompt for Qwen {
// An asynchronous function to interact with the language model using the given prompt
async fn prompt(&self, prompt: Prompt) -> Result<String> {
// Get the default model for handling the prompt, ensuring it is set
let model = self
.default_options
.prompt_model
.as_ref()
.context("Model not set")?
.to_string();
// Construct the request for creating chat completion, including the model and prompt message
let request = CreateChatCompletionRequestArgs::default()
.model(model)
.messages(vec![ChatCompletionRequestUserMessageArgs::default()
.content(prompt.render().await?) // Render and add prompt content
.build()?
.into()])
.build()?;
// Log the constructed request for debugging purposes
tracing::debug!(
messages = serde_json::to_string_pretty(&request)?,
"[SimplePrompt] Request sent to qwen"
);
// Send the request and wait for the response
let mut response = self.client.chat().create(request).await?;
// Log the received response for debugging purposes
tracing::debug!(
response = serde_json::to_string_pretty(&response)?,
"[SimplePrompt] Response received from qwen"
);
// Extract and return the content of the first choice in the response, ensuring it exists
response
.choices
.remove(0)
.message
.content
.take()
.context("Response content error")
}
}
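A minimal call against this adapter might look as follows, assuming Prompt converts from &str as it does for Swiftide's other integrations (illustrative snippet, not from the original post):
// Illustrative smoke test of the SimplePrompt adapter.
let qwen = QwenBuilder::default()
    .default_prompt_model(&QwenModel::Plus)
    .build()?;
let answer = qwen.prompt("Explain RAG in one sentence.".into()).await?;
println!("{answer}");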
Qwen Embedding Adapter
Swiftide adaptation requires implementing swiftide_core::EmbeddingModel.
The underlying Qwen embedding endpoint:
curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/embeddings' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "text-embedding-v3",
"input": "衣服的质量杠杠的,很漂亮,不枉我等了这么久啊,喜欢,以后还来这里买",
"dimension": "1024",
"encoding_format": "float"
}'
// Asynchronously process input text to generate the corresponding embedding vectors.
//
// Parameters:
// - `input`: A vector of strings containing the text to be embedded.
//
// Return value:
// - `Result<Embeddings>`: A result type containing the generated embedding vectors if successful.
//
// This function retrieves the embedding model and dimensions from the default options, constructs the embedding request, and sends it to the Qwen API. It then processes the response to extract the embedding vectors and returns the result.
#[async_trait]
impl EmbeddingModel for Qwen {
async fn embed(&self, input: Vec<String>) -> Result<Embeddings> {
// Retrieve the embedding model, ensuring it is set.
let model = self
.default_options
.embed_model
.as_ref()
.context("Model not set")?;
// Retrieve the embedding dimensions from the default options.
let dimensions = self.default_options.dimensions;
// Construct the embedding request parameters.
let request = CreateEmbeddingRequestArgs::default()
.model(model)
.dimensions(dimensions)
.input(&input)
.build()?;
// Log the details of the embedding request.
tracing::debug!(
num_chunks = input.len(),
model = &model,
"[Embed] Request sent to qwen"
);
// Send the embedding request and wait for the response.
let response = self.client.embeddings().create(request).await?;
// Log the number of embedding vectors received.
let num_embeddings = response.data.len();
tracing::debug!(num_embeddings = num_embeddings, "[Embed] Response received");
// Process the response to extract the embedding vectors, assuming the order remains unchanged.
// Warning: This assumption may not always hold.
Ok(response.data.into_iter().map(|d| d.embedding).collect())
}
}
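To verify the embedding adapter, one can embed a couple of chunks and check the returned vector sizes, assuming the client was built with default_dimensions(1536) (again an illustrative snippet):
// Illustrative check: each returned vector should have the configured dimension.
let vectors = qwen
    .embed(vec!["first chunk".to_string(), "second chunk".to_string()])
    .await?;
for v in &vectors {
    assert_eq!(v.len(), 1536);
}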
Implementing a Document-Query LLM Application
To demonstrate the above results, I will use the content of this page (Tauri framework documentation) as an example:
https://tauri.app/zh-cn/develop/resources/
Initialize Qwen Client
let client = QwenBuilder::default()
.default_embed_model(&swiftide::integrations::qwen::QwenEmbedding::TextEmbeddingV2)
.default_prompt_model(&swiftide::integrations::qwen::QwenModel::Long)
.default_dimensions(1536)
.build()?;
Initialize Vector Database
We use LanceDB as the vector database; it is written in Rust and embeds directly into the application. Qdrant (also written in Rust) would work as well, but it runs as a standalone service and is more cumbersome to operate. LanceDB initialization:
let tempdir = TempDir::new().unwrap();
let lancedb = LanceDB::builder()
.uri(tempdir.child("lancedb").to_str().unwrap())
.vector_size(1536)
.with_vector(EmbeddedField::Combined)
.with_metadata(metadata_qa_text::NAME)
.table_name("swiftide_test")
.build()
.unwrap();
Here, vector_size is the vector dimension; it must match the dimension of the embedding model you use.
Build Vector Index Pipeline
This operation calls the embedding model to generate knowledge base content vectors and then stores the vectors in the vector database.
// Build the index pipeline, starting from the file loader, going through chunking, metadata processing, embedding, and finally storing in LanceDB
indexing::Pipeline::from_loader(FileLoader::new(".").with_extensions(&["md"]))
.with_default_llm_client(client.clone())
.then_chunk(ChunkMarkdown::from_chunk_range(10..2048))
.then(MetadataQAText::new(client.clone()))
.then_in_batch(Embed::new(client.clone()).with_batch_size(10))
.then_store_with(lancedb.clone())
.run()
.await?;
Build Query Pipeline
let pipeline = query::Pipeline::default()
.then_transform_query(query_transformers::GenerateSubquestions::from_client(
client.clone(),
))
.then_transform_query(query_transformers::Embed::from_client(client.clone()))
.then_retrieve(lancedb.clone())
.then_transform_response(response_transformers::Summary::from_client(client.clone()))
.then_answer(answers::Simple::from_client(client.clone()));
Query
Let's run a query and look at the result.
let result = pipeline
.query("tauri 如何访问文件")
.await?;
println!("====");
println!("{:?}", result.answer());
Output:
Accessing files in a Tauri application can be done through both Rust and JavaScript. Here are the specific steps and example code:\n\n### Accessing Files in Rust\n\n1. **Configure `resources`**: First, ensure you have added the `resources` attribute in the `bundle` object of your `tauri.conf.json` file to include the files you need.\n\n ```json title=tauri.conf.json\n {\n \"bundle\": {\n \"resources\": [\n \"lang/*\"\n ]\n }\n }\n ```\n\n2. **Use `PathResolver` to Access Files**: In your Rust code, you can use an instance of `PathResolver` to access these files.\n\n ...................
Conclusion
In fact, many large documentation sites implement their AI query features in a similar way. The Milvus vector database documentation, for example, has an Ask AI button in the top-right corner: https://milvus.io/docs. LanceDB, by contrast, does not offer a comparable feature.
Milvus is also a vector database and could, in principle, serve as the embedding store in Swiftide.
All of this could be built without a framework, but Swiftide additionally provides AI agent functionality with tool input/output support, which greatly reduces the development effort.