Your First Extraction

“I’ll be honest with you, I love extracting documents. I do. I’m a mcwaddams fan.”

Let’s get you extracting documents faster than you can say “TPS report cover sheet.”

Prerequisites

Make sure you have mcwaddams installed and configured:

Claude Code

claude mcp add mcwaddams "uvx mcwaddams"

Restart Claude Code, and you’re ready.

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "mcwaddams": {
      "command": "uvx",
      "args": ["mcwaddams"]
    }
  }
}

Restart Claude Desktop.
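
If you already have other servers registered, you may prefer to merge the entry rather than paste over the file. The following is a minimal sketch of that merge, not part of mcwaddams itself: the helper name `add_mcwaddams_server` is hypothetical, and you pass in the location of your own claude_desktop_config.json, since the path varies by platform.

```python
import json
from pathlib import Path


def add_mcwaddams_server(config_path: Path) -> dict:
    """Merge the mcwaddams entry into a Claude Desktop config file.

    Hypothetical helper for illustration. Other servers and unrelated
    settings already in the file are left untouched.
    """
    # Start from the existing config if the file exists and is non-empty.
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    # setdefault keeps any servers that are already registered.
    servers = config.setdefault("mcpServers", {})
    servers["mcwaddams"] = {"command": "uvx", "args": ["mcwaddams"]}

    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it with the path to your config file, for example `add_mcwaddams_server(Path("claude_desktop_config.json"))`, then restart Claude Desktop as described above.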

Step 1: Find a Document

Grab any Office document you have lying around:

A .docx report
An .xlsx spreadsheet
A .pptx presentation
Even a crusty .doc from 2005

Step 2: Ask for Extraction

Just tell your AI assistant what you want:

Extract text from /path/to/quarterly-report.docx

That’s it. No configuration, no options, no ceremony.

Step 3: Get Results

mcwaddams returns structured data:

{
  "text": "Q4 2024 Financial Summary\n\nRevenue increased by 15%...",
  "metadata": {
    "format": "Word Document (DOCX)",
    "extraction_method": "python-docx",
    "extraction_time": 0.042,
    "word_count": 3421
  }
}

The AI can now use this content to answer your questions, summarize, analyze, or do whatever else you need.
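
If you ever handle a result like this in your own script, the fields can be read as ordinary JSON. Here is a small sketch that uses the sample response from this page. The function name `summarize_extraction` is made up for illustration, and the value of `extraction_time` is echoed as-is because its unit is not stated here.

```python
import json

# Sample response copied from the example on this page.
# A raw string keeps the JSON "\n" escapes intact for json.loads.
RAW = r"""
{
  "text": "Q4 2024 Financial Summary\n\nRevenue increased by 15%...",
  "metadata": {
    "format": "Word Document (DOCX)",
    "extraction_method": "python-docx",
    "extraction_time": 0.042,
    "word_count": 3421
  }
}
"""


def summarize_extraction(result: dict) -> str:
    """Build a one-line summary from the metadata block (hypothetical helper)."""
    meta = result.get("metadata", {})
    return (
        f"{meta.get('format', 'unknown format')}: "
        f"{meta.get('word_count', 0)} words via "
        f"{meta.get('extraction_method', 'unknown method')} "
        f"(extraction_time={meta.get('extraction_time')})"
    )


result = json.loads(RAW)
print(summarize_extraction(result))
```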

What Just Happened?

Behind the scenes, mcwaddams:

Detected the format — Identified .docx as a modern Word document
Selected the best method — Used python-docx for optimal extraction
Extracted the content — Pulled text while preserving structure
Added metadata — Included timing and method information
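
To make those four steps concrete, here is a rough sketch of the same flow. It is an illustration under stated assumptions, not the mcwaddams source: the `FORMATS` table and `extract` function are hypothetical, and only CSV is wired to a real extractor (Python's standard `csv` module) so the example runs without third-party packages.

```python
import csv
import time
from pathlib import Path


def read_csv_text(path: Path) -> str:
    """Flatten a CSV file into comma-separated lines of text."""
    with path.open(newline="") as handle:
        return "\n".join(", ".join(row) for row in csv.reader(handle))


# Hypothetical dispatch table: extension -> (format label, method name, extractor).
# The .docx labels mirror the sample response on this page; its extractor is
# left unset because this sketch sticks to the standard library.
FORMATS = {
    ".docx": ("Word Document (DOCX)", "python-docx", None),
    ".csv": ("CSV", "csv (standard library)", read_csv_text),
}


def extract(path: Path) -> dict:
    # Step 1: detect the format from the file extension.
    suffix = path.suffix.lower()
    if suffix not in FORMATS:
        return {"error": "Unsupported format", "extension": suffix}

    # Step 2: select the extraction method for that format.
    label, method, extractor = FORMATS[suffix]
    if extractor is None:
        return {"error": f"No extractor wired up for {suffix} in this sketch"}

    # Step 3: extract the content, timing how long it takes.
    started = time.perf_counter()
    text = extractor(path)
    elapsed = time.perf_counter() - started

    # Step 4: attach metadata about the method and timing.
    return {
        "text": text,
        "metadata": {
            "format": label,
            "extraction_method": method,
            "extraction_time": round(elapsed, 3),
            "word_count": len(text.split()),
        },
    }
```

A table keyed by extension keeps detection and method selection in one place, so supporting another format means adding a row rather than another branch.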

Try Different Formats

The same command works for all supported formats:

Word Documents

Extract text from contract.docx
Extract text from legacy-proposal.doc

Excel Spreadsheets

Extract text from sales-data.xlsx
Extract text from budget-2019.xls

PowerPoint Presentations

Extract text from quarterly-deck.pptx
Extract text from old-presentation.ppt

CSV Files

Extract text from export.csv

Working with Large Documents

Documents over 25,000 tokens get automatically paginated:

{
  "text": "Chapter 1: Introduction...",
  "pagination": {
    "current_page": 1,
    "total_pages": 5,
    "cursor_id": "abc123"
  }
}

To get the next page:

Continue extracting (cursor: abc123)
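
If you are driving extraction from a script rather than a chat, the same cursor can be followed in a loop. The sketch below assumes a `fetch_page` callable that you supply. It is hypothetical and stands in for however your client requests a page. It also assumes each response carries the `pagination` fields shown above.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[str]:
    """Yield the text of every page, following cursor_id until the last page.

    fetch_page is a caller-supplied (hypothetical) function: it takes a
    cursor, or None for the first page, and returns one response dict.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield page["text"]

        # Small documents come back without a pagination block at all.
        info = page.get("pagination")
        if not info or info["current_page"] >= info["total_pages"]:
            break
        cursor = info["cursor_id"]
```

Joining the pages is then a one-liner: `full_text = "".join(iter_pages(fetch_page))`.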

Common Options

You can be more specific about what you want:

Include Images

Extract text and images from report.docx

Get Metadata Only

Get metadata from mystery-file.doc

Convert to Markdown

Convert presentation.pptx to markdown

Analyze Structure

Show me the structure of thesis.docx

Error Messages

mcwaddams provides clear errors when something goes wrong:

File Not Found

{
  "error": "File not found",
  "path": "/path/to/missing.docx",
  "hint": "Check that the file path exists and is accessible"
}

Unsupported Format

{
  "error": "Unsupported format",
  "extension": ".xyz",
  "hint": "Use get_supported_formats to see all supported types"
}

Password Protected

{
  "error": "Document is password-protected",
  "hint": "Remove password protection or provide an unencrypted version"
}
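
All three errors share an `error` field and a `hint`, so a script can handle them uniformly. Here is a minimal sketch, assuming only the fields shown in the examples above. The `describe_result` helper is hypothetical.

```python
def describe_result(result: dict) -> str:
    """Turn a response dict into a short status line (hypothetical helper)."""
    if "error" not in result:
        return "ok"

    message = result["error"]

    # Include whichever context field the error carries, if any.
    for key in ("path", "extension"):
        if key in result:
            message += f" ({key}: {result[key]})"

    # The hint says what to try next, so surface it when present.
    if "hint" in result:
        message += f". Hint: {result['hint']}"
    return message
```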

Next Steps

Now that you’ve extracted your first document:

Working with Legacy Formats — Handle .doc, .xls, .ppt
Indexing Large Documents — Efficient access to huge files
Extract Tables — Structured table extraction
All Tools Reference — Complete tool documentation

“Looks like someone has a case of the Mondays.”

Not anymore. Your documents are extracted.
