1. Retrieval-Augmented Generation (RAG)

With RAG, the model can retrieve external information instead of relying only on what is stored in its parameters.

User asks a question
↓
Search relevant documents / database / web / notes
↓
Put retrieved chunks into the prompt
↓
LLM answers using those chunks

So the pipeline is simple.
User question -> Embed question -> Search vector DB for similar chunks -> Retrieve top-k chunks -> Put chunks into LLM prompt -> Generate answer
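The flow above can be sketched in a few lines. This is a toy illustration, not a real implementation: `embed` here is a stand-in bag-of-words function over a tiny made-up vocabulary, where a real system would call an embedding model and query a vector DB.

```python
import math

def embed(text):
    # Toy embedding: word counts over a tiny fixed vocabulary (illustration only).
    vocab = ["refund", "premium", "cancel", "days", "users"]
    words = text.lower().split()
    return [sum(w.startswith(v) for w in words) for v in vocab]

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, documents, k=2):
    # Embed the question, score every chunk, keep the top-k most similar.
    q = embed(question)
    scored = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return scored[:k]

def build_prompt(question, chunks):
    # Put the retrieved chunks into the prompt the LLM will answer from.
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Refunds are allowed within 14 days",
    "Premium users may request...",
    "Cancellation takes effect...",
]
question = "How many days do I have to get a refund?"
top = retrieve(question, docs, k=2)
prompt = build_prompt(question, top)
```

With this toy embedding, the refund chunk scores highest for the refund question and lands first in the prompt.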

And the database part looks like this.

documents table

id | content_chunk                         | embedding
---|---------------------------------------|---------------------
1  | "Refunds are allowed within 14 days"  | [0.12, -0.44, ...]
2  | "Premium users may request..."        | [0.08, 0.31, ...]
3  | "Cancellation takes effect..."        | [-0.19, 0.22, ...]
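One way to back a table like this, sketched with SQLite and embeddings stored as JSON text. This is an assumption for illustration: real deployments typically use a vector database or an extension such as pgvector, which stores and indexes embeddings natively.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, content_chunk TEXT, embedding TEXT)"
)

# Truncated embeddings, matching the table above (real ones have hundreds of dims).
rows = [
    (1, "Refunds are allowed within 14 days", [0.12, -0.44]),
    (2, "Premium users may request...", [0.08, 0.31]),
    (3, "Cancellation takes effect...", [-0.19, 0.22]),
]
for rid, chunk, emb in rows:
    conn.execute(
        "INSERT INTO documents VALUES (?, ?, ?)", (rid, chunk, json.dumps(emb))
    )
conn.commit()

# Read embeddings back as lists; similarity search happens in application code.
fetched = [
    (rid, chunk, json.loads(emb))
    for rid, chunk, emb in conn.execute("SELECT * FROM documents ORDER BY id")
]
```

Storing embeddings as JSON means every query scans and deserializes all rows, which is fine for a demo but is exactly the problem vector indexes solve.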

2. Is RAG always good?

No, because retrieval is a trade-off:

more retrieved context
= more coverage
= more noise

Less retrieved context
= cleaner prompt
= less coverage (the answer may not make it into the prompt at all)
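The trade-off shows up directly in the choice of top-k. A toy example with hypothetical similarity scores (the chunks and numbers are made up for illustration):

```python
# Hypothetical similarity scores for one question against five stored chunks.
scores = {
    "refund window is 14 days": 0.91,
    "refund requests need an order id": 0.74,
    "premium users get priority support": 0.22,
    "cancellation timing": 0.18,
    "office holiday schedule": 0.03,
}

def top_k(scores, k):
    # Return the k chunk texts with the highest similarity scores.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

tight = top_k(scores, 2)  # clean prompt, but may miss a relevant detail
broad = top_k(scores, 5)  # full coverage, but drags in near-irrelevant chunks
```

With k=2 the prompt stays focused on the refund chunks; with k=5 the holiday schedule rides along as noise. Tuning k (or adding a score threshold or a reranker) is how systems balance the two failure modes.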