
Boosting Monogram’s Smarts: Leveraging RAG for Enhanced AI Interactions

  • Nicole
    Developer
  • Claudio
    Director of Engineering

Artificial Intelligence (AI), a field in computer science, focuses on creating systems that can perform tasks traditionally done by humans. These tasks include learning from experiences, understanding natural language, recognizing patterns, and making decisions. The main goal of AI is to improve productivity and enhance human life.

AI-driven chatbots are on the rise. These systems can understand and respond to human language, provide personalized assistance, and learn from past interactions to improve future ones. They are used in areas such as customer service, personal assistants, and mental health support.

Adding a chatbot to a website has great potential to improve user experience. By offering immediate, tailored support to visitors, these systems help with navigation, finding information, and answering questions. As a result, they can significantly improve customer service quality, increase engagement, and boost user retention. Additionally, AI-powered chatbots work non-stop, providing support around the clock. Their ability to handle multiple inquiries simultaneously not only improves operational efficiency but also reduces the workload on human staff.

Gemini

Gemini is a Large Language Model (LLM) developed by Google, designed to understand and generate human language with high accuracy. Its main goal is to grasp the context and nuances of language more deeply.

Built to be versatile, Gemini can handle a wide range of language tasks, including translation, summarization, and answering questions. It uses transformers, which are advanced deep learning models known for their ability to understand complex language structures. One standout feature of Gemini is its ability to understand conversational context, making it especially useful for tasks like chatbots, where understanding the flow of conversation is crucial for giving relevant and accurate responses.

RAG

RAG (Retrieval-Augmented Generation) is an advanced question-answering model that combines the strengths of pre-trained language models with the ability to find relevant documents from large collections of information. Designed to generate answers based on retrieved data, RAG is a great option for building chatbots, where giving accurate and contextually relevant responses is key. This innovative technology has the potential to significantly improve the chatbot's ability to provide precise and thorough answers to user questions.
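
At a high level, RAG boils down to three steps: embed the question, retrieve similar documents, and generate an answer grounded in them. The sketch below illustrates this flow; the three helpers are hypothetical stand-ins (passed in as parameters) for an embeddings model, a vector database query, and an LLM call, not any particular library's API.

// Conceptual RAG flow: retrieve relevant context, then generate a grounded answer.
// All three helpers are hypothetical stand-ins for real components.
async function answerWithRag(
  question: string,
  embed: (text: string) => Promise<number[]>, // embeddings model
  search: (vector: number[]) => Promise<string[]>, // vector database similarity query
  generate: (question: string, context: string[]) => Promise<string> // LLM call
): Promise<string> {
  const queryVector = await embed(question) // 1. embed the question
  const context = await search(queryVector) // 2. retrieve similar documents
  return generate(question, context) // 3. generate an answer grounded in them
}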

Pinecone

Vector databases are an important tool in the machine learning and AI landscape. They manage unstructured and semi-structured data as vectors, which lets us run fast, efficient similarity searches over real-time interactions and accelerates AI application development.
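
As a rough illustration of the similarity metric such a search is based on, here is a minimal sketch of cosine similarity between two embedding vectors; real vector databases use approximate nearest-neighbor indexes rather than this brute-force comparison.

// Cosine similarity: ~1 for vectors pointing the same way, ~0 for unrelated ones
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}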

Pinecone is a specialized vector database designed for machine learning tasks. It's built to store, search, and organize high-dimensional vectors smoothly, making it perfect for managing complex data structures effectively. In chatbot development, Pinecone plays a crucial role by helping to index and find relevant information when users ask questions. This feature greatly improves the chatbot's ability to provide personalized and accurate responses, making the user experience much better overall.

Combining Gemini + RAG + Pinecone to create the chatbot

Creating a chatbot adapted to specific information requires careful selection of tools, as the process involves multiple steps to effectively retrieve and process user queries.

Langchain

Langchain is a framework for developing applications powered by large language models. When combined with AI chatbots, it boosts their ability to process language: it provides building blocks for loading data, retrieving relevant context, and orchestrating calls to the model, helping chatbots understand and respond to a wider range of natural language inputs. With components designed for Q&A applications and broader RAG applications, Langchain makes it easier to create advanced conversational interfaces.

The creation process entails the following steps:

1. Indexing: Load, split, embed, and store data into Vector Stores with an Embeddings model.

2. Retrieval and Generation: Retrieve relevant information based on user inputs and generate an answer from it.

To install Langchain run pnpm add langchain.

1. Indexing


The first step was to index all of Monogram's information. To accomplish this, we needed to collect all the content from our site. We did this using the "DocumentLoaders" provided by "Langchain", specifically the "SitemapLoader", which lets us load a site's resources through its "sitemap.xml" as follows:

import { SitemapLoader } from 'langchain/document_loaders/web/sitemap' // path may vary by Langchain version

const loader = new SitemapLoader(SITEMAP_URL)
const docs = await loader.load()

Then it’s necessary to split large "Documents" into smaller chunks. This is useful both for indexing data and for passing it to a model, since large chunks are harder to search over and won’t fit in a model’s finite context window. We can achieve this with "RecursiveCharacterTextSplitter" from "Langchain", which splits the documents using a default list of separators: ["\n\n", "\n", " ", ""].

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter'

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: size, //size of each chunk
  chunkOverlap: overlap, //how much overlap there should be between chunks
})

const docOutput = await splitter.splitDocuments(docs)

Finally, we store and index the split documents so we can search and retrieve them later. This is where Pinecone comes in, allowing us to push and manage all our data as vectors using Embeddings (which create a vector representation of a piece of text). To do this, it's necessary to initialize Pinecone and load the index where we are going to push the data.
Run pnpm add @pinecone-database/pinecone to be able to initialize the client.

import { Pinecone } from '@pinecone-database/pinecone'

export async function initPinecone() {
  try {
    const pinecone = new Pinecone({
      apiKey: process.env.PINECONE_API_KEY!, // your Pinecone API key
    })

    return pinecone
  } catch (error) {
    console.log('error', error)
    throw new Error('Failed to initialize Pinecone Client')
  }
}
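
With the client initialized, we can grab a handle to the index we will push data into. A minimal sketch, assuming an index already created in the Pinecone console (the name 'monogram-site' is a placeholder):

const pinecone = await initPinecone()

// 'monogram-site' is a placeholder; use the index name from your Pinecone console
const pineconeIndex = pinecone.index('monogram-site')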

Once the client is initialized and we have the index, we can use the "PineconeStore" tool from "Langchain" to finally push the data.

To be able to use Pinecone tools run pnpm add @langchain/pinecone.

import { GoogleGenerativeAIEmbeddings } from '@langchain/google-genai'
import { PineconeStore } from '@langchain/pinecone'

const embeddings = new GoogleGenerativeAIEmbeddings({
  modelName: 'embedding-001', // 768 dimensions
})

// Push documents to Pinecone
await PineconeStore.fromDocuments(docOutput, embeddings, {
  pineconeIndex,
  textKey: 'text',
})

2. Retrieval and Generation

To retrieve the documents and generate the answers we used the following components:

  • Prompt templates: simplify the process of assembling prompts that combine default messages, user input, chat history, and (optionally) additional retrieved context.
  • Chat history: allows the chatbot to remember previous user inputs and messages (see the sketch after this list).
  • User inputs: the questions asked by the user on each interaction.
  • LLMChain: wraps an LLM to add additional functionality. It handles prompt formatting, input/output parsing, conversations, etc., and is used extensively by higher-level LangChain tools.
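
How the history is represented is up to the application. Below is a minimal sketch of one possible convention for flattening prior turns into the chain's chat_history input; the Turn type and formatting are our own, not a Langchain API.

// One prior exchange: the user's question and the model's answer
type Turn = { question: string; answer: string }

// Flatten prior turns into a single string for the chain's chat_history input
function formatChatHistory(turns: Turn[]): string {
  return turns
    .map((t) => `Human: ${t.question}\nAssistant: ${t.answer}`)
    .join('\n')
}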

First, we retrieve all the Documents from the vector store (Pinecone).

const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
  pineconeIndex,
  textKey: 'text',
  filter: {
    // parameters
  },
})

Then the LLM must be established with an LLM provider from "Langchain"; in this case we use "ChatGoogleGenerativeAI", passing the "gemini-pro" model name and a Gemini API key.

To use Google tools from Langchain run pnpm add @langchain/google-genai.

To learn more about the Gemini API and how to use it, see Google’s Gemini API documentation.

import { ChatGoogleGenerativeAI } from '@langchain/google-genai'

const model = new ChatGoogleGenerativeAI({
  apiKey: GEMINI_API_KEY,
  modelName: "gemini-pro",
  temperature: number, // controls the degree of randomness in token selection
  topK: number, // [1-40] Default value: 32. Lower value for less random responses
  topP: number // [0.0 - 1.0] Default value 1.0. Lower value for less random responses
});
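
Before wiring the model into a chain, a quick standalone call (our own sanity check, not part of the pipeline itself) confirms the configuration works:

// Quick standalone call to verify the model is reachable and configured
const reply = await model.invoke('Say hello in one sentence.')
console.log(reply.content)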

Next, we create a "PromptTemplate" to give the model instructions, and a "ConversationalRetrievalQAChain" that takes the user's question, combines it with the retrieved context through the template, and sends it to the LLM to generate the answer.

import { PromptTemplate } from '@langchain/core/prompts'
import { ConversationalRetrievalQAChain } from 'langchain/chains'

// Prompt Template
const template = PromptTemplate.fromTemplate(`
  You are a helpful assistant. You are here to help users with their questions based on the provided context.

  Context: {context}
  Question: {question}

  Answer:`)

// Create chain
const chain = ConversationalRetrievalQAChain.fromLLM(model, vectorStore.asRetriever(), {
  qaChainOptions: {
    type: 'stuff',
    prompt: template,
  },
})

const response = await chain.invoke({
  question: prompt, // prompt sent by the user
  chat_history: chat_history,
})

// Answer
console.log('Answer', response.text)

Conclusion

Adding conversation to a RAG chatbot may seem like a simple thing to do. However, it involves many considerations. How are you going to store the information? How are you going to retrieve the history of the conversation? How much of the conversation are you going to consider, given limited context for LLMs?
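
One simple answer to the last question, sketched below under our own assumptions, is to keep only the most recent turns so the formatted history always fits within the model's context window:

// Keep only the last N turns of the conversation (window size is arbitrary)
function trimHistory<T>(turns: T[], maxTurns = 5): T[] {
  return turns.slice(-maxTurns)
}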

Once these questions are answered with knowledge of the tools involved, the work can be simplified considerably, allowing the LLM to generate the best answers based on user inputs.

AI chatbots are not perfect. However, they can be trained to offer high-quality user support and conflict resolution at any time of day, without the need for a person behind the scenes to answer questions.