The Backstory
“I was told I could listen to the radio at a reasonable volume from nine to eleven…”
The Relocation
Milton Waddams was relocated to the basement. They took his stapler. But down there, surrounded by boxes of .doc files from 1997 and .xls spreadsheets that predate Unicode, he became something else entirely.
He became a document processing expert.
The Problem
Every enterprise has them:
- The Archive Folder — 50,000 Word documents from before the cloud existed
- The Legacy Database Export — Excel files with formulas referencing cells that no longer exist
- The Board Presentations — PowerPoint decks with embedded charts from 2003
- The Contract Repository — .doc files that crash modern Word
AI agents can read PDFs. They can parse JSON. But Office documents? The binary formats, the OLE containers, the OOXML with custom schemas?
Nobody wants to deal with that.
The Solution
mcwaddams handles the documents nobody else wants to touch.

    # Extract text from a 1997 .doc file
    result = await extract_text("contract_final_FINAL_v2.doc")

    # It just works
    print(result["text"])

What We Handle
| Format | Era | Status |
|---|---|---|
| .docx | 2007+ | ✅ Full support |
| .doc | 1997-2007 | ✅ Works fine |
| .xlsx | 2007+ | ✅ Full support |
| .xls | 1997-2007 | ✅ Works fine |
| .pptx | 2007+ | ✅ Full support |
| .ppt | 1997-2007 | ✅ Works fine |
The Philosophy
1. No Silent Failures
When python-docx can’t handle a file, mammoth tries. When openpyxl fails, pandas steps in. You’ll always get either content or a clear error message explaining why.
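The fallback idea can be sketched in a few lines. This is a minimal illustration, not mcwaddams' actual code: `extract_with_fallback`, `ExtractionError`, and the two stand-in parsers are hypothetical names invented for the example. In a real chain, each callable would wrap a library such as python-docx or mammoth.

```python
from typing import Callable, Sequence


class ExtractionError(Exception):
    """Raised when every extractor in the chain has failed (hypothetical name)."""


def extract_with_fallback(
    path: str,
    extractors: Sequence[tuple[str, Callable[[str], str]]],
) -> dict:
    """Try each extractor in order and return the first non-empty result."""
    failures = []
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:
            # Record why this extractor failed, then move on to the next one.
            failures.append(f"{name}: {exc}")
            continue
        if text.strip():
            return {"text": text, "method": name}
        failures.append(f"{name}: returned no text")
    # Nothing worked: fail loudly and list every reason.
    raise ExtractionError(f"could not extract {path!r} ({'; '.join(failures)})")


# Stand-in extractors for the demo (assumptions, not real library calls).
def strict_parser(path: str) -> str:
    raise ValueError("not a valid OOXML package")


def lenient_parser(path: str) -> str:
    return "Recovered contract text"


result = extract_with_fallback(
    "contract.docx",
    [("strict", strict_parser), ("lenient", lenient_parser)],
)
print(result["method"], "->", result["text"])
```

The caller sees only two outcomes: a result that names the method that worked, or one exception that lists every reason the chain failed.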
2. Legacy is Not Abandoned
Those .doc files from 2003? They’re still business-critical for someone. We don’t treat legacy formats as second-class citizens.
3. Context-Aware Extraction
Large documents get paginated automatically. The MCP resource system lets you fetch chapters on-demand. Your context window stays manageable.
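To make the pagination idea concrete, here is a rough sketch of splitting extracted text into pages that fit a character budget. It is an assumption about how such a splitter could work, not a description of mcwaddams internals: `paginate` and `max_chars` are hypothetical names, and a real implementation might count tokens or split on headings instead.

```python
def paginate(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into pages of at most max_chars, breaking between paragraphs."""
    pages: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # ignore empty paragraphs
        # Hard-split any single paragraph that is larger than a whole page.
        chunks = [
            paragraph[i:i + max_chars]
            for i in range(0, len(paragraph), max_chars)
        ]
        for chunk in chunks:
            candidate = f"{current}\n\n{chunk}" if current else chunk
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # The current page is full: close it and start a new one.
                pages.append(current)
                current = chunk
    if current:
        pages.append(current)
    return pages


# Demo input: five short paragraphs and a deliberately small page budget.
text = "\n\n".join(f"Paragraph {n}: " + "x" * 40 for n in range(1, 6))
pages = paginate(text, max_chars=120)
for number, page in enumerate(pages, start=1):
    print(f"page {number}: {len(page)} chars")
```

A client can then request one page at a time instead of pulling the whole document into its context window.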
4. Testing Painful Stuff
We threw 301 random Office documents at mcwaddams. 299 succeeded. The 2 failures were empty/corrupt files.
See the TPS Reports for proof.
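For a sense of how a batch run like this can be tallied, here is a small sketch. Everything in it is illustrative: `run_batch` and `fake_extract` are hypothetical, and the file names are synthetic stand-ins that only mirror the counts reported above, not the real test corpus.

```python
from typing import Callable, Iterable


def run_batch(paths: Iterable[str], extract: Callable[[str], str]) -> dict:
    """Run an extractor over many files and tally successes and failures."""
    ok = 0
    failures: dict[str, str] = {}
    for path in paths:
        try:
            extract(path)
            ok += 1
        except Exception as exc:
            # Keep the reason so every failure can be explained afterwards.
            failures[path] = str(exc)
    total = ok + len(failures)
    return {
        "total": total,
        "ok": ok,
        "failures": failures,
        "success_rate": ok / total if total else 0.0,
    }


# Stand-in extractor: pretends that files named "empty_*" cannot be read.
def fake_extract(path: str) -> str:
    if path.startswith("empty_"):
        raise ValueError("file is empty")
    return "some text"


paths = [f"doc_{n}.doc" for n in range(299)] + ["empty_1.doc", "empty_2.xls"]
report = run_batch(paths, fake_extract)
print(f"{report['ok']}/{report['total']} succeeded ({report['success_rate']:.1%})")
for path, reason in report["failures"].items():
    print(f"  {path}: {reason}")
```

With 299 successes out of 301 files, the success rate works out to roughly 99.3%.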
The Name
Milton Waddams. The guy with the stapler. Relegated to the basement with the old filing cabinets and the roaches.
That’s where the legacy documents live too.
“I could set the building on fire…”
Ready to start? → Installation