Quick Start
“I’ll be honest with you, I love his music. I do. I’m a Michael Bolton fan.”
Let’s get you extracting documents faster than you can say “TPS report cover sheet.”
Your First Extraction
- Point at a document

    Extract text from /path/to/quarterly-report.docx

- Get the content

    {
      "text": "Q4 2024 Financial Summary\n\nRevenue increased by 15%...",
      "metadata": {
        "format": "Word Document (DOCX)",
        "extraction_method": "python-docx",
        "extraction_time": 0.042
      }
    }

- That’s it.
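If you consume that response from a script rather than reading it by eye, the fields can be pulled out with ordinary JSON handling. This is a minimal sketch, assuming the response arrives as a JSON string shaped like the example above; the `summarize` helper is hypothetical and not part of mcwaddams.

```python
import json

# Sample response copied from the example above.
RAW = (
    '{"text": "Q4 2024 Financial Summary\\n\\nRevenue increased by 15%...",'
    ' "metadata": {"format": "Word Document (DOCX)",'
    ' "extraction_method": "python-docx", "extraction_time": 0.042}}'
)


def summarize(raw: str) -> str:
    """Hypothetical helper: build a one-line summary of an extraction response."""
    data = json.loads(raw)
    meta = data.get("metadata", {})
    text = data.get("text", "")
    # Use the first line of the extracted text as a title.
    first_line = text.splitlines()[0] if text else ""
    fmt = meta.get("format", "unknown")
    seconds = meta.get("extraction_time", 0.0)
    return f"{first_line} [{fmt}, {seconds:.3f}s]"


print(summarize(RAW))
```

The `.get` calls with defaults keep the helper from raising if a response omits `metadata`, which this sketch treats as possible but the source does not confirm either way.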
Common Operations
Extract Text (Any Format)
    # Works with .docx, .doc, .xlsx, .xls, .pptx, .ppt, .csv
    result = await extract_text("document.docx")
    print(result["text"])

Convert Word to Markdown
    result = await convert_to_markdown("report.docx")
    print(result["markdown"])

Extract Tables
    result = await extract_word_tables(
        "contract.docx",
        output_format="markdown"
    )
    # Returns tables as markdown tables

Analyze Excel Data
    result = await analyze_excel_data(
        "sales-data.xlsx",
        include_statistics=True,
        check_data_quality=True
    )
    # Returns column types, missing values, outliers, statistics

Index for On-Demand Fetching
    # Index once
    result = await index_document("novel.docx")
    # Returns: {"doc_id": "abc123", "resources": {...}}
    # Fetch chapters on demand via MCP resources
    # chapter://abc123/1 → Chapter 1
    # chapter://abc123/1.txt → Plain text
    # chapters://abc123/1-5 → Multiple chapters

Working with URLs
mcwaddams can fetch documents directly from URLs:
    result = await extract_text("https://example.com/report.docx")

Files are cached for 1 hour by default.
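To make the one-hour default concrete, here is a rough sketch of how a time-based cache of fetched files can behave. It is an illustration only: the `TTLCache` class and its methods are hypothetical, and nothing here is taken from mcwaddams's actual caching code.

```python
import time

CACHE_TTL_SECONDS = 3600  # 1 hour, matching the default stated above


class TTLCache:
    """Hypothetical in-memory cache keyed by URL; not mcwaddams's implementation."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # url -> (stored_at, payload)

    def put(self, url, payload):
        """Store a downloaded payload along with the time it was stored."""
        self._entries[url] = (self._clock(), payload)

    def get(self, url):
        """Return the cached payload, or None if it is missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, payload = entry
        if self._clock() - stored_at >= self._ttl:
            # Expired: drop the entry so the caller fetches a fresh copy.
            del self._entries[url]
            return None
        return payload
```

Under this model, a second request for the same URL within the hour would be served from the cache, and a request after the hour would trigger a new download.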
Format Detection
Not sure what you’re dealing with?
    result = await detect_office_format("mystery-file.doc")
    # Returns: format, version, encryption status, document category

Error Handling
mcwaddams never silently fails. You’ll get either:
- Content — The extracted text/data
- Clear error — Explaining exactly what went wrong
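Calling code can tell these two outcomes apart by checking for an `error` key. The following is a minimal sketch, assuming results are plain dicts and that errors carry the `error` and `hint` keys shown in this section's example; `ExtractionError` and `unwrap` are hypothetical names, not part of mcwaddams.

```python
class ExtractionError(Exception):
    """Hypothetical exception type used only in this sketch."""

    def __init__(self, message, hint=None):
        super().__init__(message)
        self.hint = hint


def unwrap(result: dict) -> dict:
    """Return the result if it carries content; raise if it carries an error."""
    if "error" in result:
        raise ExtractionError(result["error"], result.get("hint"))
    return result


# Usage with hand-written sample results:
print(unwrap({"text": "hello"})["text"])
try:
    unwrap({"error": "Document is password-protected", "hint": "..."})
except ExtractionError as exc:
    print(f"failed: {exc} (hint: {exc.hint})")
```

Raising an exception is one design choice among several; code that prefers to branch on the dict directly can simply test `"error" in result`.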
    result = await extract_text("encrypted.docx")
    # Returns: {"error": "Document is password-protected", "hint": "..."}

Next Steps
- Tutorials — Deeper walkthrough of each tool
- Reference — All 20 tools with parameters
- TPS Reports — See our test results
“Looks like someone has a case of the Mondays.”
Not anymore. Your documents are handled.