Powered by Gemini

Chat with Multiple PDFs

An end-to-end generative AI project that answers questions over PDFs of 1,000+ pages. Built with LangChain, FAISS, and Streamlit.

01

The Challenge

Why read 1,000 pages when AI can do it?

"I’m a huge Harry Potter fan!" I declared to the class. A skeptical student (the class nerd) muttered, "Yeah... like that's real."

To prove the power of RAG (Retrieval-Augmented Generation), I uploaded a 1,065-page Harry Potter fanfiction PDF into my system and asked it to summarize the battle in the Chamber of Secrets.

The result? A precise, near-instant answer covering the basilisk, the Sword of Gryffindor, and Fawkes the Phoenix. No magic required, just good architecture.

RAG in Action

USER QUERY

"Summarize the battle in the Chamber of Secrets."

GEMINI RESPONSE

"In the Chamber of Secrets, Harry Potter battles a gigantic basilisk summoned by Tom Riddle. By dodging the basilisk’s lethal gaze and wielding the Sword of Gryffindor..."

02

The Architecture

How do we process 1,000+ pages instantly?

Two-Part System
PDF Processing & Query Handling

PDF Processing

  • Extract text from PDFs
  • Split into chunks (10k chars)
  • Create Embeddings (Google GenAI)
  • Store in Vector DB (FAISS)
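The first step above, text extraction, could look roughly like the sketch below. It assumes the PyPDF2 `PdfReader` API; the function names `join_page_texts` and `get_pdf_text` are illustrative and not taken from the project's source.

```python
from typing import Iterable, List, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Concatenate per-page text, treating pages with no extractable text as empty."""
    return "".join(text or "" for text in page_texts)


def get_pdf_text(pdf_paths: List[str]) -> str:
    """Read every page of every PDF and return one combined string."""
    # Imported lazily so join_page_texts works even without PyPDF2 installed.
    from PyPDF2 import PdfReader

    combined = []
    for path in pdf_paths:
        reader = PdfReader(path)
        combined.append(join_page_texts(page.extract_text() for page in reader.pages))
    return "".join(combined)
```

Scanned pages without a text layer yield little or no text this way, which is why the helper tolerates empty results instead of failing.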

Query Handling

  • Convert query to embedding
  • Similarity Search (FAISS)
  • Retrieve Context Chunks
  • Generate Answer (Gemini Pro)
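The last two steps above (retrieve context, then generate) come down to placing the retrieved chunks into a prompt before calling the model. Here is a minimal sketch using plain string formatting as a stand-in for a LangChain prompt template; the template wording and the name `build_prompt` are assumptions, and the actual model call is left out.

```python
from typing import List

# Hypothetical template text; the project's real prompt is not shown in this page.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the answer is not in the context, say so.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(context_chunks: List[str], question: str) -> str:
    """Join the retrieved chunks and fill them into the prompt template."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    ["Harry fights the basilisk.", "Fawkes heals Harry."],
    "Summarize the battle in the Chamber of Secrets.",
)
```

The resulting string is what would be sent to the generation model, so the model answers from the retrieved chunks instead of the full 1,000+ pages.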

Why Vector Stores?

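Imagine a 1,000-page document. To find specific information, we convert each text chunk into an embedding (a numerical vector) and store the vectors in FAISS. A query is embedded the same way, and FAISS returns the chunks whose vectors lie closest to it, which is far faster than scanning the full text.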

Raw PDF Text: 1,000,000+ chars
Recursive Splitter: 1000 chunks
FAISS Vector Index: Searchable DB
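To make the idea of similarity search concrete, here is a small brute-force version in plain Python. It is a sketch of the concept only: the three-dimensional vectors are hand-made stand-ins for real embeddings, and `search` is a hypothetical helper, not the FAISS API.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(
    index: List[Tuple[str, Sequence[float]]],
    query_vector: Sequence[float],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k stored chunks closest to the query vector."""
    scored = [(text, l2_distance(vector, query_vector)) for text, vector in index]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hand-made vectors standing in for real embeddings (assumption for illustration).
toy_index = [
    ("Harry fights the basilisk", [0.9, 0.1, 0.0]),
    ("Quidditch match results", [0.1, 0.9, 0.0]),
    ("Fawkes brings the Sorting Hat", [0.8, 0.2, 0.1]),
]
results = search(toy_index, [1.0, 0.0, 0.0], k=2)
```

Here `results` contains the two Chamber-related chunks, nearest first. A vector store applies the same nearest-neighbor idea with optimized index structures, so it stays fast as the number of chunks grows.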

Capacity: 1,000+ pages processed

Chunking Logic
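# Older LangChain releases expose this class as langchain.text_splitter.
from langchain_text_splitters import RecursiveCharacterTextSplitter


def get_text_chunks(text):
    """Split the extracted PDF text into overlapping chunks."""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=10000,    # maximum characters per chunk
        chunk_overlap=1000,  # characters shared between neighboring chunks
    )
    return text_splitter.split_text(text)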
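How `chunk_size` and `chunk_overlap` interact is easier to see with a simplified fixed-window splitter. The sketch below is not what `RecursiveCharacterTextSplitter` does internally (that class prefers to break on separators such as paragraphs and lines, so real chunk lengths vary); `sliding_window_chunks` is a hypothetical helper that only illustrates the overlap.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


demo = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=2)
# demo == ["abcdefghij", "ijklmnopqr", "qrstuvwxyz"]
```

Each chunk repeats the last two characters of the previous one. The overlap exists so that a sentence straddling a boundary still appears whole in at least one chunk.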
03

The Code

Integrating LangChain, Gemini, and FAISS.

  • TECH 01: PyPDF2. Extract raw text from PDF files.
  • TECH 02: LangChain. Text splitting & Prompt templates.
  • TECH 03: Google Gemini. Embeddings & Generation (1.0 Pro).
  • TECH 04: FAISS. Local vector storage for retrieval.

04

Deployment

Going live with Streamlit Cloud.

Streamlit Cloud

Connected directly to the GitHub repository for continuous deployment. Any push to main triggers a re-build.

Secret Management

Secured sensitive data like `GEMINI_API_KEY` using Streamlit's secrets management (`secrets.toml`).

secrets.toml
# .streamlit/secrets.toml
GEMINI_API_KEY = "AIzaSyD-xxxxxxxxxxxx"
USER_NAME = "admin"
PASSWORD = "secure_pass"
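
Reading such a value inside the app might look like the sketch below. It assumes Streamlit's `st.secrets` mapping and falls back to an environment variable when Streamlit or the secrets file is unavailable (for example, in local scripts). The helper name `load_secret` and the fallback behavior are assumptions, not the project's confirmed code.

```python
import os


def load_secret(name: str) -> str:
    """Return a secret from Streamlit's secrets store, else from the environment."""
    try:
        import streamlit as st  # imported lazily so the fallback works without it

        return str(st.secrets[name])
    except Exception:
        # Streamlit missing, no secrets.toml, or key absent: fall through.
        pass
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Secret {name!r} is not configured")
    return value
```

Calling `load_secret("GEMINI_API_KEY")` at startup keeps the key out of the source code, and `secrets.toml` itself should stay out of version control.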
05

Live Demo

Interact with the interface.

Demo Mode

The live Streamlit app is currently restricted for security, so here is a preview of the interface.

Upload: Drag & Drop PDF Here
Processing Status: Done

USER QUERY

"What happens to Harry in the Chamber?"

GEMINI RESPONSE

"Harry battles the Basilisk, uses the Sword of Gryffindor to defeat it, but gets bitten. Fawkes the Phoenix heals him with tears."