
Your First Extraction

“I’ll be honest with you, I love extracting documents. I do. I’m a mcwaddams fan.”

Let’s get you extracting documents faster than you can say “TPS report cover sheet.”


Make sure you have mcwaddams installed and configured:

claude mcp add mcwaddams "uvx mcwaddams"

Restart Claude Code, and you’re ready.


Grab any Office document you have lying around:

  • A .docx report
  • An .xlsx spreadsheet
  • A .pptx presentation
  • Even a crusty .doc from 2005

Just tell your AI assistant what you want:

Extract text from /path/to/quarterly-report.docx

That’s it. No configuration, no options, no ceremony.


mcwaddams returns structured data:

{
  "text": "Q4 2024 Financial Summary\n\nRevenue increased by 15%...",
  "metadata": {
    "format": "Word Document (DOCX)",
    "extraction_method": "python-docx",
    "extraction_time": 0.042,
    "word_count": 3421
  }
}

The AI can now use this content to answer your questions, summarize, analyze, or do whatever else you need.
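If you ever handle a response like this yourself, a short helper is enough to pull out the useful parts. The sketch below assumes the response has exactly the shape shown in the example; `summarize_extraction` is a hypothetical name, not part of mcwaddams.

```python
def summarize_extraction(result: dict) -> str:
    """Build a one-line summary from an extraction response (assumed shape)."""
    meta = result.get("metadata", {})
    # Treat the first line of the extracted text as the document title.
    title = result.get("text", "").split("\n", 1)[0]
    return (
        f"{title} | {meta.get('format', 'unknown format')} | "
        f"{meta.get('word_count', 0)} words via "
        f"{meta.get('extraction_method', 'unknown method')}"
    )


# Sample data copied from the example response above.
sample = {
    "text": "Q4 2024 Financial Summary\n\nRevenue increased by 15%...",
    "metadata": {
        "format": "Word Document (DOCX)",
        "extraction_method": "python-docx",
        "extraction_time": 0.042,
        "word_count": 3421,
    },
}

print(summarize_extraction(sample))
```

Using `.get()` with defaults keeps the helper from failing if a field is missing, which seems prudent since this guide shows only one example response.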


Behind the scenes, mcwaddams:

  1. Detected the format — Identified .docx as a modern Word document

  2. Selected the best method — Used python-docx for optimal extraction

  3. Extracted the content — Pulled text while preserving structure

  4. Added metadata — Included timing and method information
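The four steps above can be sketched roughly as follows. This is an illustration of the flow, not the actual mcwaddams code: the `METHODS` table, the `extract` function, and the stand-in extractor are all hypothetical, and only the `.docx` to python-docx pairing comes from the example response.

```python
import time
from pathlib import Path
from typing import Callable

# Hypothetical lookup table. Only the .docx entry is taken from the example
# response; the real tool's table is not shown in this guide.
METHODS = {
    ".docx": ("Word Document (DOCX)", "python-docx"),
}


def extract(path: str, extractor: Callable[[str], str]) -> dict:
    start = time.perf_counter()
    ext = Path(path).suffix.lower()      # 1. detect the format
    if ext not in METHODS:
        return {"error": "Unsupported format", "extension": ext}
    fmt, method = METHODS[ext]           # 2. select the method
    text = extractor(path)               # 3. extract the content
    return {                             # 4. add metadata
        "text": text,
        "metadata": {
            "format": fmt,
            "extraction_method": method,
            "extraction_time": round(time.perf_counter() - start, 3),
            "word_count": len(text.split()),
        },
    }


# Stand-in extractor so the sketch runs without a real document.
result = extract("quarterly-report.docx", lambda _path: "Q4 2024 Financial Summary")
print(result["metadata"]["extraction_method"], result["metadata"]["word_count"])
```

Passing the extractor in as a callable keeps the sketch runnable without any document library installed.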


The same command works for all supported formats:

Extract text from contract.docx
Extract text from legacy-proposal.doc
Extract text from sales-data.xlsx
Extract text from budget-2019.xls
Extract text from quarterly-deck.pptx
Extract text from old-presentation.ppt
Extract text from export.csv

Documents over 25,000 tokens get automatically paginated:

{
  "text": "Chapter 1: Introduction...",
  "pagination": {
    "current_page": 1,
    "total_pages": 5,
    "cursor_id": "abc123"
  }
}

To get the next page:

Continue extracting (cursor: abc123)
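Your assistant normally follows the cursor for you. If you were stitching pages together in a script, the loop might look like the sketch below. It assumes each page carries the `pagination` fields shown above; `collect_pages` and `fake_fetch` are hypothetical stand-ins, since this guide does not show how the next page is requested programmatically.

```python
from typing import Callable, Optional

PAGES = ["Chapter 1...", "Chapter 2...", "Chapter 3..."]


def fake_fetch(cursor: Optional[str]) -> dict:
    """Stand-in for a real request: the cursor here is just a page index."""
    index = 0 if cursor is None else int(cursor)
    return {
        "text": PAGES[index],
        "pagination": {
            "current_page": index + 1,
            "total_pages": len(PAGES),
            "cursor_id": str(index + 1),
        },
    }


def collect_pages(fetch_page: Callable[[Optional[str]], dict]) -> str:
    """Follow cursors until the last page and join the text."""
    parts = []
    cursor = None
    while True:
        page = fetch_page(cursor)
        parts.append(page["text"])
        info = page.get("pagination")
        # Stop when the response is unpaginated or this was the final page.
        if not info or info["current_page"] >= info["total_pages"]:
            break
        cursor = info["cursor_id"]
    return "\n".join(parts)


print(collect_pages(fake_fetch))
```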

You can be more specific about what you want:

Extract text and images from report.docx
Get metadata from mystery-file.doc
Convert presentation.pptx to markdown
Show me the structure of thesis.docx

mcwaddams provides clear errors when something goes wrong:

{
  "error": "File not found",
  "path": "/path/to/missing.docx",
  "hint": "Check that the file path exists and is accessible"
}

{
  "error": "Unsupported format",
  "extension": ".xyz",
  "hint": "Use get_supported_formats to see all supported types"
}

{
  "error": "Document is password-protected",
  "hint": "Remove password protection or provide an unencrypted version"
}
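Because every error example carries an `error` field and a `hint`, a caller can turn any of them into a readable message with one small function. This is a sketch based only on the three examples above; `describe_failure` is a hypothetical helper, and real responses may include fields not shown here.

```python
from typing import Optional


def describe_failure(result: dict) -> Optional[str]:
    """Return a one-line message for an error response, or None on success."""
    if "error" not in result:
        return None
    message = result["error"]
    # "path" and "extension" appear in some of the examples but not all.
    for key in ("path", "extension"):
        if key in result:
            message += f" ({key}: {result[key]})"
    if "hint" in result:
        message += f". {result['hint']}."
    return message


# Sample data copied from the first error example above.
missing = {
    "error": "File not found",
    "path": "/path/to/missing.docx",
    "hint": "Check that the file path exists and is accessible",
}
print(describe_failure(missing))
```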



“Looks like someone has a case of the Mondays.”


Not anymore. Your documents are extracted.