
Manage Documents

In a knowledge base, each imported item—whether a local file, a Notion page, or a web page—becomes a document. From the document list, you can view and manage all these documents to keep your knowledge accurate, relevant, and up-to-date.
Click the knowledge base name at the top to quickly switch between knowledge bases.
Actions and their descriptions:
  • Add: Import a new document.
  • Modify Chunk Settings: Modify a document’s chunking settings (excluding the chunk structure).
    Each document can have its own chunking settings, while the chunk structure is shared across the knowledge base and cannot be changed once set.
  • Delete: Permanently remove a document. Deletion cannot be undone.
  • Enable / Disable: Temporarily include or exclude a document from retrieval.
    On Dify Cloud, documents that have not been updated or retrieved for a certain period are automatically disabled to optimize performance.
    The inactivity period varies by subscription plan:
      • Sandbox: 7 days
      • Professional & Team: 30 days
    For Professional and Team plans, these documents can be re-enabled with one click.
  • Generate Summary: Automatically generates summaries for all chunks in a document. Only available for self-hosted deployments when Summary Auto-Gen is enabled.
    Existing summaries will be overwritten.
  • Archive / Unarchive: Archive a document that you no longer need for retrieval but still want to keep. Archived documents are read-only and can be unarchived at any time.
  • Edit: Modify the content of a document by editing its chunks. See Manage Chunks for details.
  • Rename: Change the name of a document.
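
The auto-disable rule for Dify Cloud described under Enable / Disable can be sketched as a small check. This is a minimal illustration, not Dify's implementation: the function name, the plan keys, and the choice to measure inactivity from the more recent of the last update and the last retrieval are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Inactivity windows taken from the plan list above; the keys are illustrative labels.
INACTIVITY_DAYS = {"sandbox": 7, "professional": 30, "team": 30}

def is_auto_disabled(plan, last_updated, last_retrieved, now):
    """Return True if the document has been inactive longer than the plan's window.

    Assumption: inactivity is measured from the most recent update or retrieval.
    """
    last_activity = max(last_updated, last_retrieved)
    return now - last_activity > timedelta(days=INACTIVITY_DAYS[plan])

now = datetime(2024, 1, 31)
updated = datetime(2024, 1, 1)
retrieved = datetime(2024, 1, 20)  # 11 days before `now`
print(is_auto_disabled("sandbox", updated, retrieved, now))
print(is_auto_disabled("team", updated, retrieved, now))
```

Under these assumptions, a document last retrieved 11 days ago exceeds the 7-day Sandbox window but stays well inside the 30-day Team window.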

Manage Chunks

Every document is split into content chunks according to its chunk settings. Chunks are the basic units for retrieval. From the chunk list within a document, you can view and manage all its chunks to improve retrieval efficiency and accuracy.
Click the document name in the upper-left corner to quickly switch between documents.
Actions and their descriptions:
  • Add: Add a single chunk, or add multiple chunks in a batch.
    For documents chunked with Parent-child mode, both new parent and child chunks can be added.
    Adding chunks is a paid feature on Dify Cloud. Upgrade to Professional or Team to use it.
  • Delete: Permanently remove a chunk. Deletion cannot be undone.
  • Enable / Disable: Temporarily include or exclude a chunk from retrieval. Disabled chunks cannot be edited.
  • Edit: Modify the content of a chunk. Edited chunks are marked Edited.
    For knowledge bases using the Parent-child chunk mode:
      • When editing a parent chunk, you can choose to regenerate its child chunks or keep them unchanged.
      • Editing a child chunk does not update its parent chunk.
  • Add / Edit / Delete Keywords: Add or modify keywords (up to 10) for a chunk to improve its retrievability. Only available for knowledge bases using the Economical index method.
  • Add / Delete Image Attachments: Remove images extracted from documents or upload new ones within their corresponding chunk.
    URLs of extracted images remain in the chunk text, but you can safely remove these URLs to keep the text clean; this won’t affect the extracted images.
    Each chunk can have up to 10 image attachments, which are returned alongside it during retrieval; images beyond this limit will not be extracted.
    For self-hosted deployments, you can adjust this limit via the environment variable SINGLE_CHUNK_ATTACHMENT_LIMIT.
    If you select a multimodal embedding model (marked with a Vision icon), the extracted images will also be embedded and indexed for retrieval.
  • Add / Edit / Delete Summary: Add, modify, or remove a summary for a chunk.
    Summaries are embedded and indexed for retrieval as well. When a summary matches a query, its corresponding chunk is also returned.
    Add identical summaries to multiple chunks to enable grouped retrieval, allowing related chunks to be returned together (subject to the Top K limit).
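
To make the image attachment limit concrete, the sketch below reads SINGLE_CHUNK_ATTACHMENT_LIMIT and caps a chunk's image list. Only the variable name and the default of 10 come from the text above. The helper names, and the assumption that the first images in document order are the ones kept, are hypothetical and may not match how Dify applies the limit internally.

```python
import os

def attachment_limit(default=10):
    """Read the per-chunk image limit, falling back to the documented default of 10."""
    return int(os.environ.get("SINGLE_CHUNK_ATTACHMENT_LIMIT", default))

def keep_attachments(image_urls, limit=None):
    """Keep at most `limit` images (assumed here to be the first ones in order)."""
    if limit is None:
        limit = attachment_limit()
    return image_urls[:limit]

# Hypothetical image names, used only to show the effect of the cap.
urls = [f"image-{i}.png" for i in range(12)]
print(len(keep_attachments(urls, limit=10)))
```

With 12 extracted images and a limit of 10, the last two are left out, which mirrors the statement that images beyond the limit are not extracted.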

Best Practices

Check Chunk Quality

After a document is chunked, carefully review each chunk to ensure it’s semantically complete and appropriately sized for optimal retrieval accuracy and response relevance. Common issues to watch for:
  • Chunks are too short: They may lack sufficient context, leading to semantic loss and inaccurate answers.
  • Chunks are too long: They may include irrelevant information, introducing semantic noise and lowering retrieval precision.
  • Chunks are semantically incomplete: Forced chunking that cuts through sentences or paragraphs results in missing or misleading content during retrieval.
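
A first pass over the issues above can be automated before reviewing chunks by hand. The sketch below is a rough heuristic, not a Dify feature: the function name, the character thresholds, and the end-of-sentence test are all assumptions chosen for illustration and should be tuned to your own content.

```python
# Thresholds are illustrative assumptions, not Dify defaults.
MIN_CHARS = 100
MAX_CHARS = 2000

def review_chunk(text, min_chars=MIN_CHARS, max_chars=MAX_CHARS):
    """Return a list of possible quality issues for one chunk."""
    issues = []
    stripped = text.strip()
    if len(stripped) < min_chars:
        issues.append("too short")
    if len(stripped) > max_chars:
        issues.append("too long")
    # A chunk that does not end with closing punctuation may have been cut mid-sentence.
    if stripped and not stripped.endswith((".", "!", "?", '"', ")")):
        issues.append("possibly cut mid-sentence")
    return issues

print(review_chunk("The LED blinks red when"))
```

A check like this only flags candidates. Whether a chunk is semantically complete still needs a human read, since a chunk can end on a full stop and still lack the context it depends on.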

Use Child Chunks as Retrieval Hooks for Parent Chunks

For documents chunked with Parent-child mode, the system searches across child chunks but returns the parent chunks. Since editing a child chunk does not update its parent, you can treat child chunks as semantic tags or retrieval hints for their parent chunks. To do this, rewrite child chunks into keywords, summaries, or common user queries. For example, if a parent chunk covers the technical topic “LED Status Indicators”, you could rephrase its child chunks as:
  • blinking light, won’t turn on, red light, connection error, frozen (keywords)
  • Guide to interpreting LED colors and troubleshooting hardware power or pairing issues (summaries)
  • What does a solid red light mean? (queries)
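
The search-children, return-parents behavior can be illustrated with a toy retriever. This is a sketch under loud assumptions: word overlap stands in for the embedding similarity a real index would use, and the data layout and function names are hypothetical rather than Dify's API.

```python
def tokens(text):
    """Lowercase word set; a crude stand-in for embedding similarity."""
    return set(text.lower().replace("?", " ").replace(",", " ").split())

def retrieve_parents(query, parents, top_k=1):
    """Score each child chunk against the query, then return the parents of the best matches."""
    q = tokens(query)
    scored = []
    for parent_id, parent in parents.items():
        best = max(len(q & tokens(child)) for child in parent["children"])
        scored.append((best, parent_id))
    scored.sort(reverse=True)
    return [parents[pid]["text"] for score, pid in scored[:top_k] if score > 0]

# Hypothetical knowledge base: child chunks rewritten as keywords and user queries.
parents = {
    "led": {
        "text": "LED Status Indicators: technical reference for each light pattern.",
        "children": [
            "blinking light, won't turn on, red light, connection error, frozen",
            "What does a solid red light mean?",
        ],
    },
    "warranty": {
        "text": "Warranty terms and conditions.",
        "children": ["warranty period, returns, refund"],
    },
}

print(retrieve_parents("why is there a red light", parents))
```

The query shares no wording with the formal parent text beyond "LED", yet the rewritten child chunks match it, so the parent chunk is the one returned.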

Use Summaries to Bridge Query-Content Gaps

While high-quality indexing enables semantic search, raw chunks can still be hard to retrieve when they are too specific, noisy, or structurally complex to align well with user queries. Summaries bridge this gap by providing a condensed semantic layer that makes the chunk’s core intent explicit. Use summaries when:
  • User queries differ from document language: For formally written technical documentation, add summaries phrased the way users actually ask questions.
  • Concepts are implicit or buried in details: Add high-level summaries that surface the core concepts and intent, so the chunk can be matched without relying on small details scattered across the text.
  • Raw content is not plain prose: When a chunk is primarily code, tables, logs, transcripts, or otherwise hard to match semantically, add descriptive summaries that clearly label what the chunk contains.
  • Related chunks should be retrieved together: Apply identical summaries to a series of related chunks to enable grouped retrieval. This semantic glue allows multiple parts of a topic to be retrieved together, providing richer context.
    The number of returned related chunks is subject to the Top K limit defined in the retrieval settings.
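
Grouped retrieval through identical summaries, and the way Top K caps it, can be sketched as follows. Word overlap again stands in for embedding similarity, and the chunk structure, function name, and tie-breaking by document order are assumptions for illustration, not Dify's ranking logic.

```python
def retrieve_by_summary(query, chunks, top_k=3):
    """Return chunk texts whose summary shares words with the query, capped at top_k."""
    q = set(query.lower().split())
    scored = []
    for index, chunk in enumerate(chunks):
        overlap = len(q & set(chunk["summary"].lower().split()))
        if overlap:
            # The negative index keeps document order among equal scores after the reverse sort.
            scored.append((overlap, -index, chunk["text"]))
    scored.sort(reverse=True)
    return [text for _, _, text in scored[:top_k]]

# Hypothetical chunks: the three steps share one summary so they are retrieved together.
chunks = [
    {"text": "Step 1: hold the reset button.", "summary": "factory reset procedure"},
    {"text": "Step 2: wait for the LED to blink.", "summary": "factory reset procedure"},
    {"text": "Step 3: re-pair the device.", "summary": "factory reset procedure"},
    {"text": "Warranty covers two years.", "summary": "warranty coverage"},
]

print(retrieve_by_summary("how do I reset", chunks, top_k=3))
print(retrieve_by_summary("how do I reset", chunks, top_k=2))
```

With Top K set to 3, all three steps come back together. With Top K set to 2, the third step is cut off, so the group size should be kept within the Top K configured in the retrieval settings.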