Supported Formats
- PDF - Most common
- DOCX - Word documents
- TXT - Plain text
- Markdown - Formatted text
- JSON - Structured data
- XLSX - Spreadsheets
- HTML - Web pages
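A pre-upload check against the list above can be sketched as follows. The extension set is an assumption derived from the format names; the exact extensions Noorle accepts (for example `.md` versus `.markdown`) are not stated here, and `is_supported` is a hypothetical helper, not part of any Noorle API.

```python
from pathlib import Path

# Assumed mapping from the listed formats to file extensions.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".txt", ".md", ".json", ".xlsx", ".html",
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so `Report.PDF` passes, while a file without an extension is rejected.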
Upload Steps
Document Processing
Each document goes through these stages:
- Parse - Extract text from format
- Chunk - Split into ~1KB sections
- Embed - Convert to vectors
- Index - Store in vector DB
- Ready - Available for search
Typical processing time:
- Small docs: seconds
- Large docs: minutes
- Batch uploads: hours
Chunking Strategy
Documents are split intelligently:
- Respect paragraph boundaries
- Keep context together
- Default: ~1KB per chunk
- Overlap: 10% between chunks
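The strategy above can be illustrated with a minimal sketch. It is not Noorle's actual chunker: it assumes paragraphs are separated by blank lines, measures size in characters rather than bytes, and takes the overlap as a raw character tail of the previous chunk. The name `chunk_text` and its parameters are hypothetical.

```python
def chunk_text(text: str, target_size: int = 1024,
               overlap_ratio: float = 0.10) -> list[str]:
    """Greedily pack whole paragraphs into chunks of about target_size characters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    overlap = int(target_size * overlap_ratio)
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > target_size:
            # Close the current chunk at a paragraph boundary.
            chunks.append(current)
            # Carry the tail of the finished chunk forward as overlap.
            tail = current[-overlap:] if overlap else ""
            current = f"{tail}\n\n{para}" if tail else para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

In this sketch a single paragraph longer than `target_size` is kept whole, so chunk size is approximate; a production chunker would likely split such paragraphs further and cut the overlap at a word or sentence boundary.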
Metadata
Optional info attached to documents, used for:
- Search filtering
- Result ranking
- Citation tracking
Updating Documents
To update a document, either replace it manually:
- Delete old version
- Upload new version
- New version indexed
Or upload the replacement directly:
- Upload new version
- Old version auto-deleted
- New version indexed
Organizing Documents
Use metadata to organize:
| Metadata | Purpose |
|---|---|
| Category | Product, FAQ, Policy, etc. |
| Tags | Multiple labels |
| Source | Where doc came from |
| Date | Version date |
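As an illustration of how these fields support filtering, the sketch below models the four metadata fields as a record and selects documents by category or tag. The class, the field names, the `matches` helper, and the sample documents are all hypothetical; they do not reflect Noorle's actual metadata schema or filter syntax.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DocumentMetadata:
    category: str                                   # Product, FAQ, Policy, etc.
    tags: list[str] = field(default_factory=list)   # multiple labels
    source: str = ""                                # where the doc came from
    version_date: Optional[date] = None             # version date


def matches(meta: DocumentMetadata, category: Optional[str] = None,
            tag: Optional[str] = None) -> bool:
    """Return True if the metadata satisfies every filter that was given."""
    if category is not None and meta.category != category:
        return False
    if tag is not None and tag not in meta.tags:
        return False
    return True


# Made-up sample documents, for illustration only.
docs = {
    "refund-policy.pdf": DocumentMetadata(
        "Policy", ["billing", "refunds"], "legal wiki", date(2024, 1, 15)),
    "setup-faq.md": DocumentMetadata("FAQ", ["onboarding"], "support team"),
}

policy_docs = [name for name, meta in docs.items()
               if matches(meta, category="Policy")]
```

Filters left as `None` are ignored, so the same helper covers filtering by category only, by tag only, or by both.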
Search Filtering
Filter search results by metadata.
Deleting Documents
Remove a document from the knowledge base:
- Open the Documents tab
- Select document
- Click Delete
- Confirm
Bulk Upload
Upload multiple documents:
- Create ZIP file with documents
- Upload ZIP
- Noorle extracts and processes all
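The first step can be scripted with Python's standard `zipfile` module. This is a sketch under assumptions: `build_upload_zip` is a hypothetical helper, and whether Noorle preserves or ignores folder structure inside the archive is not stated here.

```python
import zipfile
from pathlib import Path


def build_upload_zip(source_dir: str, zip_path: str) -> list[str]:
    """Pack every file under source_dir into zip_path; return the archived names."""
    root = Path(source_dir)
    target = Path(zip_path).resolve()
    archived: list[str] = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in source_dir.
            if path.is_file() and path.resolve() != target:
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                archived.append(arcname)
    return archived
```

The returned list makes it easy to confirm which files went into the archive before uploading it.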
File Size Limits
- Per file: 100MB
- Per batch: 1GB
- Total per account: 500GB
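The per-file and per-batch limits can be checked locally before uploading. The sketch below assumes binary units (1MB = 1024 × 1024 bytes), which the limits above do not specify; `check_batch` is a hypothetical helper.

```python
# Assumption: limits are interpreted in binary units.
MAX_FILE_BYTES = 100 * 1024 ** 2   # 100MB per file
MAX_BATCH_BYTES = 1024 ** 3        # 1GB per batch


def check_batch(sizes: dict[str, int]) -> list[str]:
    """Return a list of problems for a batch given {filename: size_in_bytes}."""
    problems = [
        f"{name}: exceeds the 100MB per-file limit"
        for name, size in sizes.items()
        if size > MAX_FILE_BYTES
    ]
    if sum(sizes.values()) > MAX_BATCH_BYTES:
        problems.append("batch exceeds the 1GB batch limit")
    return problems
```

An empty result means the batch fits both limits; the account-wide total is not checked here because it depends on what is already stored.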
Processing Errors
Common errors and solutions:
| Error | Solution |
|---|---|
| Unsupported format | Convert to PDF first |
| Corrupted file | Re-download and try again |
| Encoding issue | Save as UTF-8 |
| Too large | Split into multiple files |
| Password protected | Remove protection first |
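For the encoding issue in particular, a text file can be checked and re-encoded before upload. This sketch assumes the original encoding is known (Latin-1 in the usage below); it does not attempt to detect encodings, and both helpers are hypothetical.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")
```

For example, `to_utf8("café".encode("latin-1"), "latin-1")` yields bytes that pass `is_utf8`, whereas the original Latin-1 bytes do not.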
Testing Search
After upload, verify that search works:
- Click Test Search
- Enter query
- Verify documents appear
- Check relevance