Working with Legacy Formats

“I was told I could keep my legacy documents at a reasonable location from nine to eleven…”

Legacy formats (.doc, .xls, .ppt) require special handling. mcwaddams uses OLE Compound Document parsing to extract content from files dating back to 1997.
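To make "OLE Compound Document" concrete: such files begin with a fixed 8-byte signature, which is the usual first check before any deeper parsing. The sketch below shows only that check, using the Python standard library. It is not taken from mcwaddams, and the names `OLE_SIGNATURE`, `is_ole_compound_file`, and `looks_like_legacy_office` are hypothetical.

```python
import io

# Every OLE Compound File starts with this fixed 8-byte signature.
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole_compound_file(stream) -> bool:
    """Return True if the binary stream starts with the OLE signature."""
    header = stream.read(len(OLE_SIGNATURE))
    return header == OLE_SIGNATURE


def looks_like_legacy_office(path: str) -> bool:
    """Hypothetical helper: run the signature check on a file on disk."""
    with open(path, "rb") as handle:
        return is_ole_compound_file(handle)


if __name__ == "__main__":
    # A fake header is enough to exercise the check; no real document needed.
    fake_legacy = io.BytesIO(OLE_SIGNATURE + b"\x00" * 504)
    fake_zip = io.BytesIO(b"PK\x03\x04")  # ZIP-based formats such as .docx start with "PK"
    print(is_ole_compound_file(fake_legacy))
    print(is_ole_compound_file(fake_zip))
```

A matching signature only says the container is OLE. Telling a .doc from an .xls or .ppt requires reading the streams inside the container, which this sketch does not attempt.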

Legacy formats just work:

Extract text from ancient-contract.doc

mcwaddams automatically:

  1. Detects the OLE structure
  2. Parses the binary format
  3. Extracts text with proper encoding
  4. Handles embedded objects

  • Encoding issues: Old files may use a non-UTF-8 encoding; mcwaddams detects the encoding and converts the text
  • Embedded fonts: Text renders correctly even without the original fonts
  • Macros: VBA macros are detected but never executed, for security
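The encoding point can be illustrated with a simple fallback chain. This is a minimal sketch, assuming the raw text bytes have already been pulled out of the document. The page does not say how mcwaddams detects encodings, so the function `decode_legacy_text` and its candidate list are illustrative assumptions, not the tool's actual behavior.

```python
from typing import Sequence, Tuple


def decode_legacy_text(
    raw: bytes, candidates: Sequence[str] = ("utf-8", "cp1252")
) -> Tuple[str, str]:
    """Try each candidate encoding in order; return (text, encoding used).

    Hypothetical helper: the candidate order is an assumption for
    illustration, not something the source document specifies.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every possible byte to a code point, so it never raises.
    return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    text, used = decode_legacy_text("café".encode("cp1252"))
    print(text, used)
```

Trying UTF-8 first is a deliberate choice: it rejects most byte sequences that were written in a single-byte code page, so a successful UTF-8 decode is rarely a false positive.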

Full tutorial coming soon.
