
🧠 RAG & Semantic Search – Personal Notes


1. Semantic vs. Lexical Search

Semantic search focuses on the meaning of content, unlike lexical search, which does literal string/pattern matching.

Nearest Neighbour Algorithms

| Algorithm | Description |
| --- | --- |
| KNN (K-Nearest Neighbours) | Basic, exact approach: compares the query vector against all others |
| NSW (Navigable Small World) | Approximate nearest neighbour: each node keeps links to a small number of its closest neighbours, and search navigates the graph from a random entry point |
| HNSW (Hierarchical Navigable Small World) | Layered version of NSW: search starts at the sparsest top layer and narrows down to the nearest vectors at each level |

HNSW is the standard used in most production vector databases.
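The brute-force KNN baseline from the table can be sketched in a few lines; toy 2-D vectors and function names here are my own, not from any particular library:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def knn(query, vectors, k=2):
    """Exact (brute-force) KNN: score the query against every stored vector."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(knn([1.0, 0.05], vectors, k=2))  # nearest first: indices 0 and 1
```

This O(n) scan over every vector is exactly the cost that NSW/HNSW avoid by navigating a graph instead.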


2. RAG Workflow

DATA → Chunks → Vector Embeddings → Vector Store → Relevant Docs → LLM → USER
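The workflow above, end to end in miniature; the bag-of-words "embedding" is a dependency-free stand-in for a real embedding model, and the final prompt is what you would send to the LLM:

```python
def embed(text, vocab):
    # Toy bag-of-words embedding (stand-in for a real embedding model).
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# DATA -> Chunks (chunking already done here)
chunks = [
    "cats are small furry mammals",
    "rust is a systems programming language",
    "dogs are loyal pets",
]
query = "tell me about furry cats"

# Chunks -> Vector Embeddings -> Vector Store
vocab = sorted(set(" ".join(chunks + [query]).lower().split()))
store = [(chunk, embed(chunk, vocab)) for chunk in chunks]

# Query -> Relevant Docs (retrieve the best-matching chunk)
q_vec = embed(query, vocab)
relevant = max(store, key=lambda item: cosine(q_vec, item[1]))[0]

# Relevant Docs + query -> prompt for the LLM
prompt = f"Context: {relevant}\n\nQuestion: {query}"
print(relevant)
```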

Tweaks


3. Embeddings

An embedding is a vector that carries meaning.

Sparse Embeddings – high-dimensional vectors that are mostly zeros, built from term statistics (e.g. BM25 / TF-IDF); good at exact keyword matching.

Dense Embeddings – lower-dimensional vectors produced by neural models (e.g. CLIP, sentence transformers); good at capturing semantic meaning.

Combining sparse + dense embeddings gives the best retrieval results. This is called hybrid search.
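One common way to merge the sparse and dense result lists is Reciprocal Rank Fusion (RRF); a minimal sketch with made-up document IDs:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: fuse several ranked lists of doc IDs.

    Each doc scores sum(1 / (k + rank)) over every list it appears in,
    so docs ranked well by both retrievers float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse_hits = ["doc3", "doc1", "doc7"]   # e.g. a BM25 ranking
dense_hits = ["doc1", "doc5", "doc3"]    # e.g. an embedding ranking
print(rrf([sparse_hits, dense_hits]))    # doc1 and doc3 (found by both) rank first
```

RRF only needs ranks, not raw scores, which is why it is a popular fusion choice when sparse and dense scores live on incompatible scales.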


4. Vector Databases

Most use HNSW under the hood and return the nearest vectors to a query: either the top k results, or everything within a given distance threshold.

Choosing a Vector Database – Key Considerations

| Factor | Questions to Ask |
| --- | --- |
| Search Functionality | What types of search do you need? |
| Budget | Open-source vs. commercial? |
| Privacy | Does data need to be self-hosted? |
| Popularity | Better community = better support |
| SDK Support | Does it support your language/stack? |
| Performance | Does it meet your latency/scale needs? |

5. Chunking Strategy

Chunk Length – around 200–300 tokens is a common default; too short loses context, too long dilutes the embedding.

Chunk Overlap – overlapping adjacent chunks (e.g. 10–20% of the chunk length) keeps sentences that straddle a chunk boundary retrievable.
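A word-level chunker with overlap; real pipelines usually split on tokens, but words keep this sketch dependency-free, and the sizes are illustrative:

```python
def chunk_text(words, chunk_size=50, overlap=10):
    """Split a list of words into fixed-size chunks, with overlap between neighbours."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already reached the end
    return chunks

words = [f"w{i}" for i in range(120)]
pieces = chunk_text(words, chunk_size=50, overlap=10)
print(len(pieces))  # 3 chunks: words 0-49, 40-89, 80-119
```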


6. Metadata

Attach extra metadata to chunks for two reasons:

  1. Enrich LLM generation (e.g. citations, page numbers)
  2. Enable custom filtering during retrieval

Common Metadata Fields

  - Source / filename
  - Page number
  - Section or heading title
  - Date / timestamp
  - Author
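Custom filtering over metadata can be as simple as a predicate over chunk records; the field names and values below are illustrative, not from any specific vector store:

```python
chunks = [
    {"text": "intro to HNSW", "source": "notes.pdf", "page": 1},
    {"text": "HNSW layer search", "source": "notes.pdf", "page": 7},
    {"text": "BM25 scoring", "source": "ir-book.pdf", "page": 42},
]

def filter_chunks(chunks, **conditions):
    """Keep only chunks whose metadata matches every given field."""
    return [c for c in chunks if all(c.get(k) == v for k, v in conditions.items())]

hits = filter_chunks(chunks, source="notes.pdf")
print([c["page"] for c in hits])  # [1, 7]
```

Production vector databases apply this kind of filter server-side, alongside (or before) the vector similarity search.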


7. Reranking

Searches often return irrelevant chunks. Reranking sorts results by relevance and filters out the noise; it is especially useful with keyword search.

Reranking Pipeline (Example: Top k = 5)

Hybrid Search  →  Top k × 2  =  10 chunks
Add adjacent chunks           =  18 chunks
Reranker filters to Top k     =  5 chunks  ✅

Rule of thumb: Retrieve 2–3× more than your Top k, then rerank down to Top k.
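The over-retrieve-then-rerank pattern in miniature; both scoring functions are cheap stand-ins (a real system would use an embedding model for the first stage and a cross encoder for the second):

```python
def retrieve(query, docs, n):
    """First stage: cheap word-overlap score, keep the top n candidates."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:n]

def rerank(query, candidates, top_k):
    """Second stage: a 'better' scorer (stand-in: overlap normalised by doc length)."""
    q = set(query.lower().split())

    def score(d):
        words = d.lower().split()
        return len(q & set(words)) / len(words)

    return sorted(candidates, key=score, reverse=True)[:top_k]

docs = [
    "cats sleep a lot",
    "cats and dogs sleep",
    "quantum computing basics",
    "why cats sleep so much every day",
]
top_k = 2
candidates = retrieve("cats sleep", docs, n=top_k * 2)  # over-retrieve 2x
final = rerank("cats sleep", candidates, top_k=top_k)   # filter down to top k
print(final)
```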

Cross Encoders

A cross encoder feeds the query and a candidate document through the model together and outputs a relevance score for the pair. This is more accurate than comparing precomputed embeddings, but much slower, which is why cross encoders are used to rerank a small candidate set rather than to search the whole corpus.


Quick Reference Cheatsheet

Semantic Search   →  meaning-based, uses vector distance
Lexical Search    →  pattern/string matching (BM25)
Embeddings        →  vectors with meaning
Sparse            →  BM25 (term frequency)
Dense             →  CLIP, neural models
Hybrid            →  sparse + dense combined
HNSW              →  layered graph for fast ANN search
Chunking          →  200–300 tokens default, overlap helps
Reranking         →  retrieve 2–3× Top k, filter down
Cross Encoder     →  accurate but slower model for relevance scoring