1. Retrieval-Augmented Generation
With RAG, the model can retrieve external information instead of relying only on what is stored in its parameters.
User asks questions
↓
Search relevant documents / database / web / notes
↓
Put retrieved chunks into the prompt
↓
LLM answers using those chunks

So the pipeline is simple.
User question -> Embed question -> Search vector DB for similar chunks -> Retrieve top-k chunks -> Put chunks into LLM prompt -> Generate answer
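The steps above can be sketched end to end. This is a minimal toy version: the "embedding" is just bag-of-words counts and the search is a brute-force cosine scan, whereas a real system would call an embedding model and query a vector database.

```python
import math
import re
from collections import Counter

# Toy embedding: bag-of-words counts over a fixed vocabulary.
# A real RAG system would call an embedding model here instead.
def embed(text, vocab):
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return [counts[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, vocab, k=2):
    # Embed the question, score every chunk, keep the top-k.
    q_vec = embed(question, vocab)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c, vocab)), reverse=True)
    return ranked[:k]

chunks = [
    "Refunds are allowed within 14 days",
    "Premium users may request priority support",
    "Cancellation takes effect at the end of the billing period",
]
vocab = sorted({w for c in chunks for w in re.findall(r"[a-z0-9]+", c.lower())})

question = "Can I get a refund within 14 days?"
top = retrieve(question, chunks, vocab, k=1)

# The retrieved chunks become context in the final prompt.
prompt = "Context:\n" + "\n".join(top) + "\n\nQuestion: " + question
print(prompt)
```

Swap `embed` for a real embedding model and the brute-force scan for a vector DB lookup, and the shape of the pipeline stays exactly the same.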
And the database part looks like this.
documents table
id | content_chunk | embedding
---|---------------------------------------|---------------------
1 | "Refunds are allowed within 14 days" | [0.12, -0.44, ...]
2 | "Premium users may request..." | [0.08, 0.31, ...]
 3  | "Cancellation takes effect..."        | [-0.19, 0.22, ...]

2. Is RAG always good?
No, because the amount of retrieved context is a trade-off:

more retrieved context
= more coverage
= more noise

less retrieved context
= cleaner prompt
= higher risk of missing the chunk that holds the answer
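The trade-off is easy to see by varying top-k. This sketch uses a crude word-overlap scorer (purely illustrative; real retrievers use embedding similarity): with k=1 only the on-topic chunk reaches the prompt, while k=3 drags in an irrelevant chunk the LLM then has to ignore.

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, k):
    # Score each chunk by how many words it shares with the question.
    q = tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Refunds are allowed within 14 days of purchase",
    "Cancellation takes effect at the end of the billing period",
    "Our office is closed on public holidays",
]
question = "Can I get a refund within 14 days?"

# k=1: only the refund chunk is retrieved (coverage risk: anything else is lost).
# k=3: the holiday chunk is retrieved too, adding noise to the prompt.
print(retrieve(question, chunks, k=1))
print(retrieve(question, chunks, k=3))
```

There is no universally correct k; it depends on chunk size, corpus quality, and how well the model ignores irrelevant context.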