
Extract Tables from Word Documents

“We need to talk about your table extraction…”

Word documents often contain important data locked in tables. mcwaddams extracts them as structured data you can actually use.


You have a Word document with a table like this:

| Product | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| Widget A | 1,234 | 1,456 | 1,678 | 1,890 |
| Widget B | 987 | 1,012 | 1,045 | 1,089 |
| Widget C | 456 | 489 | 512 | 534 |

You need that data in a format you can process, not trapped in a binary .docx file.
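For context on what "trapped" means here: a .docx file is a ZIP package whose body lives in `word/document.xml`, where tables are `w:tbl` elements, rows are `w:tr`, cells are `w:tc`, and text sits in `w:t` runs inside `w:p` paragraphs. The sketch below reads those tables with only the Python standard library. It is an illustration of the file format, not mcwaddams's parser; the function names `read_docx_tables`, `cell_text`, and `build_docx` are hypothetical, and the reader ignores merged cells, nested tables, and content wrapped in other elements.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used inside word/document.xml
W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_URI}}}"


def cell_text(tc):
    """Join the text runs of each paragraph in a cell; paragraphs become lines."""
    return "\n".join(
        "".join(t.text or "" for t in p.iter(f"{W}t"))
        for p in tc.findall(f"{W}p")
    )


def read_docx_tables(source):
    """Return each top-level table in a .docx as a list of rows of cell strings.

    `source` can be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(f"{W}body")
    return [
        [[cell_text(tc) for tc in tr.findall(f"{W}tc")] for tr in tbl.findall(f"{W}tr")]
        for tbl in body.findall(f"{W}tbl")
    ]


def build_docx(rows):
    """Build an in-memory ZIP holding only word/document.xml with one table.

    This is NOT a complete .docx that Word would open; it contains just
    what read_docx_tables() needs, so the reader can be tried without a file.
    """
    def tc(text):
        return f"<w:tc><w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p></w:tc>"

    trs = "".join("<w:tr>" + "".join(tc(c) for c in row) + "</w:tr>" for row in rows)
    xml = f'<w:document xmlns:w="{W_URI}"><w:body><w:tbl>{trs}</w:tbl></w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf
```

On a real file, `read_docx_tables("quarterly-report.docx")` should give one list of rows per table; empty cells come back as empty strings because a `w:t` with no text yields `""`.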


Extract tables from quarterly-report.docx as markdown

Result:

| Product | Q1 | Q2 | Q3 | Q4 |
|---------|-----|-----|-----|-----|
| Widget A | 1,234 | 1,456 | 1,678 | 1,890 |
| Widget B | 987 | 1,012 | 1,045 | 1,089 |
| Widget C | 456 | 489 | 512 | 534 |

Extract tables from report.docx as markdown

Perfect for documentation, README files, or pasting into other documents.
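If you already have rows as lists of strings (for example from the JSON output), a hypothetical helper like `rows_to_markdown` below produces the same kind of pipe table. It is a sketch, not mcwaddams's renderer: it assumes the first row is the header and uses a fixed `---` separator, so dash widths differ from the output above.

```python
def rows_to_markdown(rows):
    """Render a header row plus data rows as a GitHub-style markdown table."""
    header, *body = rows

    def line(cells):
        # Escape pipes so cell text cannot break the table layout
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    separator = "|" + "|".join("---" for _ in header) + "|"
    return "\n".join([line(header), separator] + [line(r) for r in body])
```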


Tables with merged cells are normalized:

Extract tables, preserve merged cells

The tool maintains cell relationships while making the data usable.
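In the .docx format, a horizontal merge is recorded as a `w:gridSpan` value on the first cell and a vertical merge as `w:vMerge` markers on the cells below it. One common way to normalize horizontal spans into a regular grid is sketched here; the `(text, colspan)` input shape and the `repeat` option are assumptions for illustration, and this page does not say which fill strategy mcwaddams uses.

```python
def expand_spans(rows, repeat=True):
    """Expand (text, colspan) cells into a rectangular grid of strings.

    With repeat=True the merged text is copied into every column it covers;
    with repeat=False the extra columns get empty strings.
    """
    grid = []
    for row in rows:
        out = []
        for text, span in row:
            out.append(text)
            out.extend([text if repeat else ""] * (span - 1))
        grid.append(out)
    width = max((len(r) for r in grid), default=0)
    # Pad short rows so every row has the same number of columns
    return [r + [""] * (width - len(r)) for r in grid]
```

Repeating the text keeps each column self-describing when you filter rows later; leaving blanks avoids double-counting when you sum a column.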

Tables within tables? We handle those too:

{
  "tables": [
    {
      "index": 0,
      "nested_tables": [
        {
          "cell": [1, 2],
          "data": [...]
        }
      ]
    }
  ]
}
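To walk that structure, a minimal sketch like the hypothetical `iter_nested` below yields each nested table together with its parent index and cell position. It assumes `cell` is `[row, column]` and that `data` holds rows, since the example above elides its contents; it also only goes one level deep.

```python
import json


def iter_nested(result_json):
    """Yield (table_index, (row, col), data) for each nested table in the result."""
    result = json.loads(result_json)
    for table in result.get("tables", []):
        for nested in table.get("nested_tables", []):
            row, col = nested["cell"]  # assumed to be [row, column]
            yield table["index"], (row, col), nested["data"]
```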

Tables that span multiple pages are extracted completely, with no truncation.


Documents with several tables return all of them:

Extract all tables from multi-section-report.docx
{
  "tables": [
    {"index": 0, "context": "Executive Summary", "rows": 5},
    {"index": 1, "context": "Financial Data", "rows": 24},
    {"index": 2, "context": "Appendix A", "rows": 100}
  ],
  "total_tables": 3
}

Extract only the table you need:

Extract the second table from report.docx

Or by context:

Extract the table under "Financial Summary"
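If you already have the full multi-table result, you can make the same selection on your side. The hypothetical `pick_table` below matches by the zero-based `index` field (so "the second table" is index 1) or by a case-insensitive match on `context`; it is a sketch over the JSON shape shown above, not part of mcwaddams.

```python
import json


def pick_table(result_json, *, index=None, context=None):
    """Return one table from an extraction result, by zero-based index or context."""
    if index is None and context is None:
        raise ValueError("pass index or context")
    for table in json.loads(result_json)["tables"]:
        if index is not None and table["index"] == index:
            return table
        # Case-insensitive match on the heading the table appears under
        if context is not None and table.get("context", "").casefold() == context.casefold():
            return table
    raise LookupError("no matching table")
```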

Empty cells are preserved as empty strings, not null:

{
  "rows": [
    ["Widget A", "", "1,456", "1,678", "1,890"]
  ]
}
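Because blanks are `""` rather than `null`, every row keeps its full width and a plain equality test finds missing values. A small sketch with hypothetical helper names:

```python
def find_empty_cells(rows):
    """Return (row, column) positions of empty-string cells."""
    return [(r, c) for r, row in enumerate(rows) for c, value in enumerate(row) if value == ""]


def is_rectangular(rows):
    """True when every row has the same number of cells."""
    return len({len(row) for row in rows}) <= 1
```

For the row above, `find_empty_cells` reports position `(0, 1)`, the missing Q1 value.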

Bold, italic, and other formatting is stripped for clean data. If you need styled output, use markdown format.

Numbers remain as strings to preserve formatting (commas, currency symbols). Convert as needed:

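import json

# `result` holds the JSON string returned by the extraction
data = json.loads(result)
# Convert the Q1 column from strings like "1,234" to ints;
# empty cells ("") are left as they are
for row in data["tables"][0]["rows"]:
    if row[1]:
        row[1] = int(row[1].replace(",", ""))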

Tables in .doc files work the same way:

Extract tables from old-report.doc as JSON

The OLE parser handles legacy table structures automatically.
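Legacy .doc files use the OLE2 Compound File Binary format, which starts with the signature `D0 CF 11 E0 A1 B1 1A E1`, while .docx files are ZIP packages starting with `PK\x03\x04`. Checking those bytes is a common way to tell the formats apart even when a file extension is wrong; the sketch below is illustrative and is not mcwaddams's detection logic. Note that the ZIP signature matches any ZIP archive, not only Word documents.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary (legacy .doc)
ZIP_MAGIC = b"PK\x03\x04"  # Office Open XML (.docx) is a ZIP package


def sniff_word_format(header):
    """Classify a file from its first bytes as 'ole2', 'zip', or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

To check a file, open it in binary mode and pass the first 8 bytes, e.g. `sniff_word_format(f.read(8))`.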



“PC Load Letter? What does that mean?”


We don’t know either, but at least tables make sense now.