You’re drowning in context. A year of research notes scattered across Notion, Obsidian, and email drafts. A folder of PDFs you’ll never search effectively. When you need that one insight — the specific prompt structure that worked three months ago, the paper on token optimization, the customer query pattern — you either spend 20 minutes digging or ask the LLM to hallucinate it.
A personal AI knowledge base fixes this. Not a folder. Not a note-taking app hoping to add search. A system where you feed content in, retrieve it with natural language, and feed it into your prompts with zero friction.
Why Generic Note Apps Fail for AI Work
Obsidian, Roam, Notion — they optimize for human retrieval. You navigate folders, use search bars, remember where you filed something. That’s friction.
An AI knowledge base optimizes for semantic search and programmatic retrieval. You ask it a question in English. It finds relevant content, ranks it, and you use it immediately in your next prompt.
The difference: Obsidian search finds “token optimization”. Semantic search finds “techniques to reduce input token count for long documents” and returns three papers, a prompt library entry, and a benchmark you ran last month — ranked by relevance.
For production AI work, that gap separates guessing from building on actual evidence.
The Core Stack: Three Tools That Actually Work
You need three components: ingestion, storage, and retrieval. Pick tools that don’t require PhD-level DevOps.
Ingestion: Unstructured or Firecrawl
Unstructured.io parses PDFs, docs, emails, and web pages into clean text. Firecrawl crawls websites and returns structured data. Both strip formatting noise and preserve semantic meaning — critical because bad input ruins everything downstream.
Use Unstructured if you're mostly working with static files (research papers, exported copies of your own notes). Use Firecrawl if you're indexing blogs, documentation, or learning resources.
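As a sketch of the Unstructured path (assuming `pip install "unstructured[all-docs]"`; the path, the 40-character filter, and the helper name are illustrative choices, not part of the library):

```python
def parse_file(path: str, min_chars: int = 40) -> str:
    """Parse a supported file into clean text via Unstructured.

    The min_chars filter drops boilerplate fragments (page numbers,
    stray headers) before anything reaches the embedding step.
    """
    from unstructured.partition.auto import partition  # deferred import

    elements = partition(filename=path)  # auto-detects PDF, docx, html, ...
    blocks = [el.text for el in elements if el.text and len(el.text) >= min_chars]
    return "\n\n".join(blocks)
```

The same function handles PDFs, Word docs, and saved web pages, which is exactly why bad input stops being a per-format problem.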
Storage: Supabase + pgvector or Pinecone
You need vector embeddings (semantic meaning) and structured metadata (source, date, category). Supabase + pgvector is open-source and costs $25/month for serious usage. Pinecone is simpler but vendor-locked.
Supabase wins if you want portability. Pinecone wins if you want zero infrastructure.
Retrieval: Claude or OpenAI with function calling
Your retrieval layer doesn’t need to be complicated. Query your vector DB, get results, inject them into a system prompt. Claude Sonnet 4 costs $3 per million input tokens — for a personal system, you’ll spend under $10/month.
The Workflow: Build It Once, Use It Forever
This is the part that matters. Architecture without workflow is expensive machinery.
Step 1: Weekly ingestion cycle.
Every Sunday, you spend 30 minutes collecting the week’s useful content — a saved article, a customer support email pattern, a benchmark you ran, a prompt that worked. Dump it into a folder. Run a simple Python script that parses files, chunks them, embeds them, and stores them in your DB.
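The Sunday script can be small. A minimal sketch of the chunking step (the 800-character window and 100-character overlap are arbitrary starting points, and the embed/store calls are stubbed as comments rather than asserted as the exact API):

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    Overlap keeps sentences that straddle a boundary retrievable from
    either chunk. 800/100 are starting points, not tuned values.
    """
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks

# For each chunk you would then, roughly:
#   1. embed it  -> client.embeddings.create(input=chunk, model="text-embedding-3-small")
#   2. store it  -> supabase_client.table("documents").insert({"content": chunk, ...}).execute()
```

Character windows are crude but predictable; once you see retrieval misses, you can swap in sentence- or heading-aware chunking without touching the rest of the pipeline.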
Step 2: Query before you build.
Before writing a new prompt, before building a new feature, before answering a complex question — query your knowledge base first.
# Bad workflow
You write a prompt from memory.
It underperforms.
You tweak it blindly.
# Good workflow
You query: "prompts for customer sentiment extraction from short text"
You get: 3 previous attempts, 2 benchmark results, 1 research paper
You write the prompt informed by actual history.
Step 3: Build retrieval into your AI pipeline.
This is where it becomes production-grade. Your LLM pipeline queries your knowledge base automatically, ranks results by relevance, and injects the top 3–5 documents into the system prompt.
# Python example: querying your knowledge base before a prompt
import os

import supabase
from openai import OpenAI

# Initialize clients (SUPABASE_URL / SUPABASE_KEY set in your environment)
supabase_client = supabase.create_client(
    os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"]
)
client = OpenAI()

# Embed the query
query = "optimization techniques for reducing hallucination in customer support"
embedding = client.embeddings.create(
    input=query,
    model="text-embedding-3-small",
).data[0].embedding

# Search the vector DB via a `match_documents` Postgres function
results = supabase_client.rpc(
    "match_documents",
    {
        "query_embedding": embedding,
        "match_count": 5,
        "similarity_threshold": 0.7,
    },
).execute()

# Build context from results
context = "\n\n".join(r["content"] for r in results.data)

# Inject context as the system message (the chat API takes system
# content as a message, not a separate parameter)
system_prompt = f"""You are a customer support AI. Use these reference materials:

{context}

Respond based on these materials when relevant."""

user_query = "How do I reduce hallucinated answers in our support bot?"  # placeholder
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ],
)
What to Actually Store
Not everything. Noise collapses signal.
Store: prompts that worked, benchmark results, research papers relevant to your work, patterns in customer queries, your own analysis and notes, tool comparisons you’ve run.
Don’t store: generic tutorials, marketing content, anything you’d find in a Google search in under 30 seconds.
Tag everything with metadata — source, date, relevance score, category. This matters. A prompt from three months ago ranked by your actual success rate beats a prompt ranked by string similarity.
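One way to act on that: blend vector similarity with your recorded success rate at query time. A hedged sketch (the 0.7/0.3 weights and field names like `success_rate` are assumptions to tune, not a standard):

```python
def rerank(results: list[dict], sim_weight: float = 0.7) -> list[dict]:
    """Re-rank retrieved documents by blending similarity with metadata.

    Each result dict is assumed to carry `similarity` (from the vector
    search) and `success_rate` (your own 0-1 score for how well that
    prompt or note actually performed when you used it).
    """
    def score(r: dict) -> float:
        return sim_weight * r["similarity"] + (1 - sim_weight) * r.get("success_rate", 0.0)

    return sorted(results, key=score, reverse=True)
```

With these weights, a prompt with a proven track record can outrank a slightly closer string match, which is the point of tagging in the first place.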
Start Small, Iterate
The mistake: building the “perfect” system before you have content.
The right move: start with Supabase and a Python script this week. Index 20 documents. Query it 10 times. See what works. Iterate.
By month two you’ll know what you actually need to store. By month three you’ll have a system that pays for itself in time saved.
Pick one of the tools above — Supabase if you like control, Pinecone if you want simplicity — and build your first ingestion script this week. Start with your research folder, your best prompts, your benchmark results. That’s 20–50 documents. Enough to feel the difference.