Working with Legacy Formats
“I was told I could keep my legacy documents at a reasonable location from nine to eleven…”
Legacy formats (.doc, .xls, .ppt) require special handling. mcwaddams uses OLE Compound Document parsing to extract content from files dating back to 1997.
The Basics
Legacy formats just work:
Extract text from ancient-contract.doc
mcwaddams automatically:
- Detects the OLE structure
- Parses the binary format
- Extracts text with proper encoding
- Handles embedded objects
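To make the first step concrete, here is a minimal sketch of how OLE detection can work in principle. It is not mcwaddams' own code: the helper name `looks_like_ole` is hypothetical. It relies only on the fact that every OLE Compound File begins with the same 8-byte signature.

```python
# Signature found in the first 8 bytes of every OLE Compound File
# (.doc, .xls, and .ppt in their legacy binary forms all share it).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole(path):
    """Return True if the file starts with the OLE Compound File signature.

    Hypothetical helper for illustration; it checks the container only and
    says nothing about which Office application produced the file.
    """
    with open(path, "rb") as fh:
        return fh.read(len(OLE_MAGIC)) == OLE_MAGIC
```

A check like this only identifies the container. Telling a Word document from an Excel workbook requires reading the streams inside it, which is the job of the binary-format parsing step.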
Common Quirks
- Encoding issues: Old files may use non-UTF-8 encodings; these are detected and converted
- Embedded fonts: Text renders correctly even without the original fonts
- Macros: VBA macros are detected but not executed (security)
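The encoding quirk can be illustrated with a small fallback decoder. This is a sketch under stated assumptions, not a description of how mcwaddams detects encodings: the function name `decode_legacy` and the candidate order (UTF-8 first, then Windows-1252) are illustrative choices.

```python
def decode_legacy(raw, candidates=("utf-8", "cp1252")):
    """Decode bytes by trying each candidate encoding in order.

    Hypothetical helper: returns (text, encoding_used). Falls back to
    latin-1, which maps every byte to a code point and so never fails.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # Bytes are not valid in this encoding; try the next.
    return raw.decode("latin-1"), "latin-1"
```

Trying UTF-8 first is a deliberate choice, since text that decodes cleanly as UTF-8 is rarely anything else, while Windows-1252 accepts almost any byte sequence and would mask a wrong guess.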
Full tutorial coming soon.