RAG Chunk Visualizer
Paste a document and see how it is chunked under fixed-size, recursive, or markdown-aware strategies, side by side.
1,361 chars · 199 words
Strategy: recursive. Tries paragraph boundaries first, then sentences, then characters. Generic and robust.
5 chunks · avg 0 tokens · total 0 tokens
# Why retrieval-augmented generation matters

Large language models are powerful but bounded. They can't know what happened after their training cut-off, they can't reference your private documents, and even when they can, they hallucinate confidently.

## The core idea

Retrieval-augmented generation (RAG) sidesteps these limits by retrieving relevant documents at query time and stuffing them into the model's context window. Instead of asking the model what it knows, you ask it to answer a question *given* the documents you've supplied.

## Chunking is the hidden lever

The retrieval quality of a RAG system is dominated by how you split your documents into chunks. Chunks that are too small lose context. Chunks that are too large dilute relevance and burn tokens. Boundaries that cross logical sections — mid-sentence, mid-paragraph, mid-section — produce noisy retrieval. There are several common strategies:

1. **Fixed-size** with a small overlap. Simple, predictable, ignores structure.
2. **Recursive** which tries paragraph boundaries first, then sentences, then characters as a last resort.
3. **Markdown-aware** for documentation, which respects heading hierarchies.
4. **Semantic** which uses embeddings to group similar adjacent sentences.

Each has tradeoffs. The point of this tool is to make them visible side-by-side on a real document.
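The first two strategies in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the code this tool runs: the function names `fixed_chunks` and `recursive_chunks`, the character-based size limits, the default values, and the regular expressions used to detect paragraph and sentence boundaries are all assumptions made for the example.

```python
import re


def fixed_chunks(text: str, size: int = 300, overlap: int = 30) -> list[str]:
    """Fixed-size character windows; each window repeats the last
    `overlap` characters of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window reached the end
            break
    return chunks


# (split pattern, joiner) pairs, coarsest first. Hypothetical choices
# for this sketch; a real splitter may use different boundaries.
LEVELS = [
    (r"\n\s*\n", "\n\n"),      # paragraph boundaries
    (r"(?<=[.!?])\s+", " "),   # sentence boundaries
]


def recursive_chunks(text: str, max_chars: int = 300, levels=LEVELS) -> list[str]:
    """Split at the coarsest boundary, merge neighbouring pieces while they
    fit in max_chars, and recurse to a finer boundary for oversized pieces."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]
    if not levels:
        # Last resort: hard split on characters.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    (pattern, joiner), finer = levels[0], levels[1:]
    chunks, current = [], ""
    for piece in re.split(pattern, text):
        if not piece:
            continue
        candidate = f"{current}{joiner}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate          # keep growing the current chunk
            continue
        if current:
            chunks.append(current)       # flush what we had so far
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too big: retry with finer boundaries.
            chunks.extend(recursive_chunks(piece, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

Both functions measure size in characters to keep the sketch dependency-free; a token-based limit would need a tokenizer, which is left out here. Note that the fixed-size splitter can cut mid-word, while the recursive one only does so when a single sentence exceeds `max_chars`.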
Chunks (5)
| # | Chars | Tokens | Preview |
|---|---|---|---|
| 1 | 268 | … | # Why retrieval-augmented generation matters Large language models are powerful … |
| 2 | 303 | … | Retrieval-augmented generation (RAG) sidesteps these limits by retrieving releva… |
| 3 | 341 | … | The retrieval quality of a RAG system is dominated by how you split your documen… |
| 4 | 338 | … | 1. **Fixed-size** with a small overlap. Simple, predictable, ignores structure. … |
| 5 | 99 | … | Each has tradeoffs. The point of this tool is to make them visible side-by-side … |