RAG stands for Retrieval-Augmented Generation, a technique that combines information retrieval with generative AI to produce accurate, context-aware responses.
RAG works through a 7-step process:
The PDF document is loaded and parsed using PyPDFLoader to extract all text content from the document pages.
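A minimal sketch of the loading step, assuming the LangChain community version of PyPDFLoader; the file name is illustrative:

```python
# Load a PDF and extract text page by page (file name is a placeholder).
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("document.pdf")
pages = loader.load()  # returns one Document per PDF page
print(f"Extracted {len(pages)} pages")
```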
The extracted text is split into smaller chunks (12,000 characters each) with overlapping regions (1,500 characters) to maintain context across chunk boundaries.
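A chunking sketch that continues from the loading step; the splitter class (LangChain's RecursiveCharacterTextSplitter) is an assumption, while the chunk size and overlap come from the description above:

```python
# Split the extracted pages into overlapping chunks.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=12_000,    # characters per chunk
    chunk_overlap=1_500,  # overlap preserves context across boundaries
)
chunks = splitter.split_documents(pages)
```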
Each text chunk is converted into a dense vector representation (embedding) that captures semantic meaning using Google's Gemini model.
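An embedding sketch using the langchain-google-genai integration; the model name "models/embedding-001" is an assumption, and a GOOGLE_API_KEY must be available in the environment:

```python
# Convert text into a dense vector with a Gemini embedding model.
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector = embeddings.embed_query("What is this document about?")
print(len(vector))  # dimensionality of the dense representation
```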
These embeddings are stored in a vector database, allowing for efficient similarity search based on semantic content rather than exact keyword matching.
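The text does not name the vector database, so this sketch assumes FAISS as an in-memory stand-in, reusing the chunks and embeddings from the previous steps:

```python
# Index the chunk embeddings for semantic similarity search (FAISS is an assumption).
from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(chunks, embeddings)
```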
When you ask a question, your query is converted into an embedding and compared against all stored document embeddings to find the most relevant chunks.
The system retrieves the top-k most semantically similar chunks (top 4 in this implementation) that contain information related to your query.
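A retrieval sketch covering these two steps: the query is embedded and the four most similar chunks are returned (k=4, as in this implementation); the question itself is illustrative:

```python
# Embed the query and fetch the top-4 most similar chunks.
query = "What are the key findings of the report?"
relevant_chunks = vectorstore.similarity_search(query, k=4)
for doc in relevant_chunks:
    print(doc.page_content[:80], "...")
```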
The retrieved chunks are combined with your question and sent to the Gemini model, which generates an accurate answer based on the document's context.
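A generation sketch, again assuming the langchain-google-genai integration; the model name "gemini-1.5-flash" and the prompt wording are assumptions, not taken from the text:

```python
# Combine the retrieved chunks with the question and ask Gemini for an answer.
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

context = "\n\n".join(doc.page_content for doc in relevant_chunks)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)
answer = llm.invoke(prompt)
print(answer.content)
```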
This application is built using the following technologies:
In this application:
AIML, Global Academy of Technology
Led the development of the RAG model architecture and deployed the application on Google Cloud Run, ensuring scalable and efficient document processing.
AIML, Global Academy of Technology
Designed and implemented the interactive user interface, seamlessly integrating frontend components with backend API calls for a smooth user experience.