Your First Extraction
“I’ll be honest with you, I love extracting documents. I do. I’m a mcwaddams fan.”
Let’s get you extracting documents faster than you can say “TPS report cover sheet.”
Prerequisites
Make sure you have mcwaddams installed and configured.
For Claude Code, run:
    claude mcp add mcwaddams "uvx mcwaddams"
Restart Claude Code, and you’re ready.
For Claude Desktop, add this to your claude_desktop_config.json:
    {
      "mcpServers": {
        "mcwaddams": {
          "command": "uvx",
          "args": ["mcwaddams"]
        }
      }
    }
Restart Claude Desktop.
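If claude_desktop_config.json already lists other servers, the mcwaddams entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge, assuming the file is valid JSON. The helper name add_mcwaddams is hypothetical, and because the file's location varies by platform, the path is left to the caller. The demo runs against a temporary file, not a real config.

```python
import json
import tempfile
from pathlib import Path


def add_mcwaddams(config_path: Path) -> dict:
    """Merge the mcwaddams entry into a config file, keeping other servers.

    Hypothetical helper: assumes the file, if present, contains valid JSON.
    """
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Create the "mcpServers" section if missing, then add or overwrite our entry.
    servers = config.setdefault("mcpServers", {})
    servers["mcwaddams"] = {"command": "uvx", "args": ["mcwaddams"]}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demo on a throwaway file that already lists another (made-up) server.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "claude_desktop_config.json"
        path.write_text('{"mcpServers": {"other": {"command": "npx", "args": []}}}',
                        encoding="utf-8")
        print(sorted(add_mcwaddams(path)["mcpServers"]))  # ['mcwaddams', 'other']
```

Restart Claude Desktop after changing the file so the new server is picked up.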
Step 1: Find a Document
Grab any Office document you have lying around:
- A .docx report
- An .xlsx spreadsheet
- A .pptx presentation
- Even a crusty .doc from 2005
Step 2: Ask for Extraction
Just tell your AI assistant what you want:
    Extract text from /path/to/quarterly-report.docx
That’s it. No configuration, no options, no ceremony.
Step 3: Get Results
mcwaddams returns structured data:
    {
      "text": "Q4 2024 Financial Summary\n\nRevenue increased by 15%...",
      "metadata": {
        "format": "Word Document (DOCX)",
        "extraction_method": "python-docx",
        "extraction_time": 0.042,
        "word_count": 3421
      }
    }
The AI can now use this content to answer your questions, summarize, analyze, or whatever you need.
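If you handle the response yourself, it is ordinary JSON. Here is a minimal sketch that reads the sample above and builds a one-line summary. The field names are the ones shown in the example, the describe helper is hypothetical, and the fallbacks for missing fields are an assumption about how defensive a caller may want to be.

```python
import json

# Sample response copied from the example above (raw string keeps the \n escapes for JSON).
SAMPLE = r"""
{
  "text": "Q4 2024 Financial Summary\n\nRevenue increased by 15%...",
  "metadata": {
    "format": "Word Document (DOCX)",
    "extraction_method": "python-docx",
    "extraction_time": 0.042,
    "word_count": 3421
  }
}
"""


def describe(result: dict) -> str:
    """Summarize an extraction result in one line (hypothetical helper)."""
    meta = result.get("metadata", {})
    # Treat the first line of the extracted text as a title.
    title = result.get("text", "").split("\n", 1)[0]
    return (
        f"{title!r}: {meta.get('word_count', '?')} words, "
        f"{meta.get('format', 'unknown format')} via "
        f"{meta.get('extraction_method', 'unknown method')}"
    )


print(describe(json.loads(SAMPLE)))
# 'Q4 2024 Financial Summary': 3421 words, Word Document (DOCX) via python-docx
```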
What Just Happened?
Behind the scenes, mcwaddams:
- Detected the format: identified .docx as a modern Word document
- Selected the best method: used python-docx for optimal extraction
- Extracted the content: pulled text while preserving structure
- Added metadata: included timing and method information
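The four steps above can be sketched as a small dispatch pipeline. This illustrates the idea and is not mcwaddams' actual code. Only the .docx to python-docx pairing comes from the example response. The other table entries, detection by file extension, and the names plan_extraction, extract, and reader are assumptions made for the sketch. The reader is passed in as a function so the example runs without any Office libraries installed.

```python
import time
from pathlib import Path

# Only the .docx row is taken from the example response above;
# the .xlsx and .pptx rows are illustrative assumptions.
METHODS = {
    ".docx": ("Word Document (DOCX)", "python-docx"),
    ".xlsx": ("Excel Workbook (XLSX)", "openpyxl"),
    ".pptx": ("PowerPoint Presentation (PPTX)", "python-pptx"),
}


def plan_extraction(path: str) -> dict:
    """Steps 1-2: detect the format (here, by extension) and select a method."""
    ext = Path(path).suffix.lower()
    if ext not in METHODS:
        return {"error": "Unsupported format", "extension": ext}
    fmt, method = METHODS[ext]
    return {"format": fmt, "extraction_method": method}


def extract(path: str, reader) -> dict:
    """Steps 3-4: run the reader, then attach timing and method metadata."""
    plan = plan_extraction(path)
    if "error" in plan:
        return plan
    start = time.perf_counter()
    text = reader(path)  # hypothetical callable that returns the document text
    elapsed = time.perf_counter() - start
    return {
        "text": text,
        "metadata": {
            **plan,
            "extraction_time": round(elapsed, 3),
            "word_count": len(text.split()),
        },
    }


def fake_reader(path: str) -> str:
    """Stand-in for a real extractor so the sketch is self-contained."""
    return "Q4 2024 Financial Summary"


print(extract("/path/to/quarterly-report.docx", fake_reader))
```

Keeping the format table separate from the extraction step means that supporting another format is a one-line change to the table.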
Try Different Formats
The same command works for all supported formats:
Word Documents
    Extract text from contract.docx
    Extract text from legacy-proposal.doc
Excel Spreadsheets
    Extract text from sales-data.xlsx
    Extract text from budget-2019.xls
PowerPoint Presentations
    Extract text from quarterly-deck.pptx
    Extract text from old-presentation.ppt
CSV Files
    Extract text from export.csv
Working with Large Documents
Documents over 25,000 tokens are automatically paginated:
    {
      "text": "Chapter 1: Introduction...",
      "pagination": {
        "current_page": 1,
        "total_pages": 5,
        "cursor_id": "abc123"
      }
    }
To get the next page:
    Continue extracting (cursor: abc123)
Common Options
You can be more specific about what you want:
Include Images
    Extract text and images from report.docx
Get Metadata Only
    Get metadata from mystery-file.doc
Convert to Markdown
    Convert presentation.pptx to markdown
Analyze Structure
    Show me the structure of thesis.docx
Error Messages
mcwaddams provides clear errors when something goes wrong:
File Not Found
    {
      "error": "File not found",
      "path": "/path/to/missing.docx",
      "hint": "Check that the file path exists and is accessible"
    }
Unsupported Format
    {
      "error": "Unsupported format",
      "extension": ".xyz",
      "hint": "Use get_supported_formats to see all supported types"
    }
Password Protected
    {
      "error": "Document is password-protected",
      "hint": "Remove password protection or provide an unencrypted version"
    }
Next Steps
Now that you’ve extracted your first document:
- Working with Legacy Formats: handle .doc, .xls, .ppt
- Indexing Large Documents: efficient access to huge files
- Extract Tables: structured table extraction
- All Tools Reference: complete tool documentation
“Looks like someone has a case of the Mondays.”
Not anymore. Your documents are extracted.