Cookbook
This guide provides immediate solutions for common tasks.
Table of Contents
- Installation
- Read Tables from File
- Read Table from Text
- Excel Integration
- Pandas Integration
- Programmatic Editing
- Formatting & Linting
- JSON Conversion
- Type-Safe Validation
1. Installation
pip install md-spreadsheet-parser
2. Read Tables from File (Recommended)
The easiest way to extract data from a Markdown file is using scan_tables_from_file. This works regardless of the file structure (ignoring headers like # or ##).
data.md
| ID | Name |
| -- | ---- |
| 1 | Alice |
| 2 | Bob |
Python
from md_spreadsheet_parser import scan_tables_from_file
# Returns a list of Table objects
tables = scan_tables_from_file("data.md")
for table in tables:
print(table.rows)
# [['1', 'Alice'], ['2', 'Bob']]
3. Read Table from Text (Simple)
If you have a markdown string, use parse_table.
from md_spreadsheet_parser import parse_table
markdown = """
| ID | Name |
| -- | ---- |
| 1 | Alice |
"""
table = parse_table(markdown)
print(table.headers) # ['ID', 'Name']
print(table.rows[0]) # ['1', 'Alice']
4. Excel Integration
Excel (TSV/CSV) → Markdown
This is the easiest method! Just copy cells from Excel and paste them as a string.
Convert Excel-exported TSV or CSV data to Markdown. Handles merged headers and in-cell newlines.
from md_spreadsheet_parser import parse_excel, ExcelParsingSchema
# Paste your Excel data (TSV format)
tsv_data = """
ID\tName\tNotes
1\tAlice\t"Lines
include
newlines"
2\tBob\tSimple
""".strip()
table = parse_excel(tsv_data)
print(table.to_markdown())
With Merged Headers (Forward-Fill)
# Excel merged cells export as: "Category\t\t\tInfo"
tsv = "Category\t\t\tInfo\nA\tB\tC\tD"
table = parse_excel(tsv)
# Headers: ["Category", "Category", "Category", "Info"]
With 2-Row Hierarchical Headers
# Parent row: "Info\t\tMetrics\t"
# Child row: "Name\tID\tScore\tRank"
tsv = "Info\t\tMetrics\t\nName\tID\tScore\tRank\nAlice\t001\t95\t1"
table = parse_excel(tsv, ExcelParsingSchema(header_rows=2))
# Headers: ["Info - Name", "Info - ID", "Metrics - Score", "Metrics - Rank"]
Excel (.xlsx) → Markdown (with openpyxl)
If you have openpyxl installed, you can pass Worksheets directly.
# pip install openpyxl # User-managed dependency
import openpyxl
from md_spreadsheet_parser import parse_excel, ExcelParsingSchema
wb = openpyxl.load_workbook("report.xlsx", data_only=True)
ws = wb["SalesData"] # Select sheet by name
table = parse_excel(ws, ExcelParsingSchema(header_rows=2))
print(table.to_markdown())
5. Pandas Integration
This library acts as a bridge between Markdown and Data Science tools.
Markdown -> Pandas DataFrame
Convert parsed tables directly to a list of dictionaries, which Pandas can ingest.
import pandas as pd
from md_spreadsheet_parser import scan_tables_from_file
tables = scan_tables_from_file("data.md")
df = pd.DataFrame(tables[0].to_models(dict))
print(df)
# ID Name
# 0 1 Alice
# 1 2 Bob
Pandas DataFrame -> Markdown
Convert a Pandas DataFrame into a Table object to generate Markdown.
import pandas as pd
from md_spreadsheet_parser import Table
# 1. Setup your DataFrame
df = pd.DataFrame({
"ID": [1, 2],
"Name": ["Alice", "Bob"]
})
# 2. Convert to Table
# Ensure all data is stringified for the parser
headers = df.columns.tolist()
rows = df.astype(str).values.tolist()
table = Table(headers=headers, rows=rows)
# 3. Generate Markdown
print(table.to_markdown())
# | ID | Name |
# | --- | --- |
# | 1 | Alice |
# | 2 | Bob |
6. Programmatic Editing (Excel-like)
You can load a table, modify values based on logic (e.g., formulas), and save it back.
from md_spreadsheet_parser import parse_table
markdown = """
| Item | Price | Qty | Total |
|---|---|---|---|
| Apple | 100 | 2 | |
| Banana | 50 | 3 | |
"""
table = parse_table(markdown)
# Update "Total" column
# 1. basic string parsing (or use to_models for type safety)
new_rows = []
for row in table.rows:
price = int(row[1])
qty = int(row[2])
total = price * qty
# Create new row with updated total
new_rows.append([row[0], row[1], row[2], str(total)])
# 2. Create new table with updates
updated_table = Table(headers=table.headers, rows=new_rows)
print(updated_table.to_markdown())
7. Formatting & Linting
Read a messy, misaligned Markdown table and output it perfectly formatted.
from md_spreadsheet_parser import parse_table
# Messy input
messy_markdown = """
|Name|Age|
|---|---|
|Alice|30|
|Bob|25|
"""
table = parse_table(messy_markdown)
# Output clean Markdown
print(table.to_markdown())
# | Name | Age |
# | --- | --- |
# | Alice | 30 |
# | Bob | 25 |
8. JSON Conversion
Convert a table directly to a JSON string or list of dictionaries for API usage.
import json
from md_spreadsheet_parser import parse_table
markdown = """
| ID | Status |
| -- | ------ |
| 1 | Open |
"""
table = parse_table(markdown)
# Convert to list of dicts
data = table.to_models(dict)
# Dump to JSON
print(json.dumps(data, indent=2))
# [
# {
# "ID": "1",
# "Status": "Open"
# }
# ]
9. Type-Safe Validation
Convert loose text into strongly-typed Python objects.
from dataclasses import dataclass
from md_spreadsheet_parser import parse_table
@dataclass
class User:
id: int
name: str
active: bool = True
markdown = """
| id | name | active |
| -- | ---- | ------ |
| 1 | Alice| yes |
| 2 | Bob | no |
"""
users = parse_table(markdown).to_models(User)
for user in users:
print(f"{user.name} (Active: {user.active})")
# Alice (Active: True)
# Bob (Active: False)