Manage Documents
In a knowledge base, each imported item—whether a local file, a Notion page, or a web page—becomes a document. From the document list, you can view and manage all these documents to keep your knowledge accurate, relevant, and up-to-date.
| Action | Description |
|---|---|
| Add | Import a new document. |
| Modify Chunk Settings | Modify a document’s chunking settings (excluding the chunk structure). Each document can have its own chunking settings, while the chunk structure is shared across the knowledge base and cannot be changed once set. |
| Delete | Permanently remove a document. Deletion cannot be undone. |
| Enable / Disable | Temporarily include or exclude a document from retrieval. On Dify Cloud, documents that have not been updated or retrieved for a certain period are automatically disabled to optimize performance. The inactivity period varies by subscription plan. |
| Generate Summary | Automatically generate summaries for all chunks in a document. Only available for self-hosted deployments when Summary Auto-Gen is enabled. Existing summaries will be overwritten. |
| Archive / Unarchive | Archive a document that you no longer need for retrieval but still want to keep. Archived documents are read-only and can be unarchived at any time. |
| Edit | Modify the content of a document by editing its chunks. See Manage Chunks for details. |
| Rename | Change the name of a document. |
Manage Chunks
Every document is split into content chunks, the basic units for retrieval, according to its chunk settings. From the chunk list within a document, you can view and manage all its chunks to improve retrieval efficiency and accuracy.
| Action | Description |
|---|---|
| Add | Add a single chunk or batch-add multiple chunks. For documents chunked with Parent-child mode, both parent and child chunks can be added. Adding chunks is a paid feature on Dify Cloud; upgrade to Professional or Team to use it. |
| Delete | Permanently remove a chunk. Deletion cannot be undone. |
| Enable / Disable | Temporarily include or exclude a chunk from retrieval. Disabled chunks cannot be edited. |
| Edit | Modify the content of a chunk. Edited chunks are marked Edited. For knowledge bases using the Parent-child chunk mode, editing a child chunk does not update its parent. |
| Add / Edit / Delete Keywords | Add or modify keywords (up to 10) for a chunk to improve its retrievability. Only available for knowledge bases using the Economical index method. |
| Add / Delete Image Attachments | Remove images extracted from documents or upload new ones within their corresponding chunk. URLs of extracted images remain in the chunk text, but you can safely remove these URLs to keep the text clean—this won’t affect the extracted images. Each chunk can have up to 10 image attachments, which are returned alongside it during retrieval; images beyond this limit will not be extracted. For self-hosted deployments, you can adjust this limit via the environment variable SINGLE_CHUNK_ATTACHMENT_LIMIT. |
| Add / Edit / Delete Summary | Add, modify, or remove a summary for a chunk. Summaries are embedded and indexed for retrieval as well. When a summary matches a query, its corresponding chunk is also returned. |
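
The image attachment limit described in the table above can be pictured as a simple cap read from the environment. The following is only a minimal sketch, not Dify's actual implementation: the helper names `attachment_limit` and `keep_within_limit` are hypothetical, and falling back to the default when the variable is missing or invalid is an assumption.

```python
import os

# Default limit stated in the table above.
DEFAULT_LIMIT = 10


def attachment_limit() -> int:
    """Read SINGLE_CHUNK_ATTACHMENT_LIMIT; fall back to the default (assumed behavior)."""
    raw = os.environ.get("SINGLE_CHUNK_ATTACHMENT_LIMIT", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric value: use the documented default.
        return DEFAULT_LIMIT
    return value if value > 0 else DEFAULT_LIMIT


def keep_within_limit(image_urls: list[str]) -> list[str]:
    """Keep only as many images as the limit allows, in their original order."""
    return image_urls[: attachment_limit()]
```

With the variable set to `3`, a chunk that references four images would keep only the first three under this sketch.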
Best Practices
Check Chunk Quality
After a document is chunked, carefully review each chunk to ensure it’s semantically complete and appropriately sized for optimal retrieval accuracy and response relevance. Common issues to watch for:
- Chunks are too short: they may lack sufficient context, leading to semantic loss and inaccurate answers.
- Chunks are too long: they may include irrelevant information, introducing semantic noise and lowering retrieval precision.
- Chunks are semantically incomplete: forced chunking cuts through sentences or paragraphs, resulting in missing or misleading content during retrieval.
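
For a large document, a quick script can help shortlist chunks worth a manual look. The sketch below flags the three issues listed above using rough heuristics. The character thresholds and the "ends with sentence punctuation" test are assumptions chosen for illustration, not values used by Dify, and the function name `review_chunks` is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds, in characters; tune them for your own content.
MIN_CHARS = 80
MAX_CHARS = 1500
# Characters that plausibly end a complete sentence or block.
SENTENCE_ENDINGS = (".", "!", "?", "。", "！", "？", ":", ")", '"', "”")


@dataclass
class ChunkIssue:
    index: int
    problem: str


def review_chunks(chunks: list[str]) -> list[ChunkIssue]:
    """Flag chunks that look too short, too long, or cut off mid-sentence."""
    issues = []
    for i, text in enumerate(chunks):
        stripped = text.strip()
        if len(stripped) < MIN_CHARS:
            issues.append(ChunkIssue(i, "too short"))
        elif len(stripped) > MAX_CHARS:
            issues.append(ChunkIssue(i, "too long"))
        # A chunk that does not end with closing punctuation may have been split mid-sentence.
        if stripped and not stripped.endswith(SENTENCE_ENDINGS):
            issues.append(ChunkIssue(i, "possibly cut mid-sentence"))
    return issues
```

A flagged chunk is only a candidate for review; a short chunk such as a heading or a definition may be perfectly fine as it is.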
Use Child Chunks as Retrieval Hooks for Parent Chunks
For documents chunked with Parent-child mode, the system searches across child chunks but returns the parent chunks. Since editing a child chunk does not update its parent, you can treat child chunks as semantic tags or retrieval hints for their parent chunks. To do this, rewrite child chunks into keywords, summaries, or common user queries. For example, if a parent chunk covers technical “LED Status Indicators”, you could rephrase its child chunks as:
- blinking light, won’t turn on, red light, connection error, frozen (keywords)
- Guide to interpreting LED colors and troubleshooting hardware power or pairing issues (summaries)
- What does a solid red light mean? (queries)
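
To see why rewritten child chunks work as hooks, here is a toy sketch of the search-children, return-parents flow. It is not how Dify scores matches: plain word overlap stands in for real embedding similarity, and the data, `PARENTS`, `CHILDREN`, and `retrieve_parents` are all hypothetical.

```python
import re

# Hypothetical parent chunks, keyed by ID.
PARENTS = {
    "p1": "LED Status Indicators: reference text describing each LED state.",
    "p2": "Battery and Charging: reference text describing battery behavior.",
}

# Child chunks rewritten as keywords, a summary, and a user query.
CHILDREN = [
    ("p1", "blinking light, won't turn on, red light, connection error, frozen"),
    ("p1", "Guide to interpreting LED colors and troubleshooting hardware power or pairing issues"),
    ("p1", "What does a solid red light mean?"),
    ("p2", "battery life, charging time, low battery warning"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve_parents(query: str, top_k: int = 1) -> list[str]:
    """Match the query against child chunks, then return their parent chunks."""
    query_tokens = tokenize(query)
    best_score = {}
    for parent_id, child_text in CHILDREN:
        # Word overlap is a stand-in for semantic similarity.
        score = len(query_tokens & tokenize(child_text))
        if score > best_score.get(parent_id, 0):
            best_score[parent_id] = score
    ranked = sorted(best_score, key=best_score.get, reverse=True)
    return [PARENTS[parent_id] for parent_id in ranked[:top_k]]
```

A query such as "why is the red light blinking" shares no wording with the formal parent text beyond generic terms, yet it reaches that parent through the keyword-style child chunk.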
Use Summaries to Bridge Query-Content Gaps
While high-quality indexing enables semantic search, raw chunks can still be hard to retrieve when they are too specific, noisy, or structurally complex to align well with user queries. Summaries bridge this gap by providing a condensed semantic layer that makes the chunk’s core intent explicit. Use summaries when:
- User queries differ from document language: For technical documentation written formally, add summaries in the way users actually ask questions.
- Concepts are implicit or buried in details: Add high-level summaries that surface the core concepts and intent, so the chunk can be matched without relying on small details scattered across the text.
- Raw text is non-textual: When a chunk is primarily code, tables, logs, transcripts, or otherwise hard to match semantically, add descriptive summaries that clearly label what the chunk contains.
- Related chunks should be retrieved together: Apply identical summaries to a series of related chunks to enable grouped retrieval. This semantic glue allows multiple parts of a topic to be retrieved together, providing richer context.
The number of returned related chunks is subject to the Top K limit defined in the retrieval settings.
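
The grouped-retrieval idea and its interaction with Top K can be sketched as follows. This is an illustration under stated assumptions only: word overlap replaces embedding-based matching, and the chunk data and the `retrieve` function are hypothetical rather than part of Dify.

```python
import re

# Hypothetical chunks: three parts of one procedure share an identical summary.
CHUNKS = [
    {"id": "c1", "text": "Step 1: hold the reset button for 10 seconds.",
     "summary": "How to factory reset the device"},
    {"id": "c2", "text": "Step 2: wait for the LED to flash white.",
     "summary": "How to factory reset the device"},
    {"id": "c3", "text": "Step 3: re-pair the device in the app.",
     "summary": "How to factory reset the device"},
    {"id": "c4", "text": "The warranty covers manufacturing defects for two years.",
     "summary": "Warranty coverage and duration"},
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, top_k: int) -> list[str]:
    """Return IDs of chunks matched by their text or summary, capped at top_k."""
    query_tokens = tokenize(query)
    scored = []
    for chunk in CHUNKS:
        # A chunk can be reached through its own text or through its summary.
        score = max(
            len(query_tokens & tokenize(chunk["text"])),
            len(query_tokens & tokenize(chunk["summary"])),
        )
        if score > 0:
            scored.append((score, chunk["id"]))
    # Stable sort: chunks with equal scores keep their document order.
    scored.sort(key=lambda item: -item[0])
    return [chunk_id for _, chunk_id in scored[:top_k]]
```

Calling `retrieve("factory reset", top_k=3)` returns all three steps because they share a summary, while `top_k=2` drops the last step, which is why Top K should be at least as large as the group you expect to retrieve.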