
API Reference

md_spreadsheet_parser

ConversionSchema dataclass

Configuration for converting string values to Python types.

Attributes:

boolean_pairs (tuple[tuple[str, str], ...]): Pairs of strings representing (True, False). Case-insensitive. Example: (("yes", "no"), ("on", "off")).

custom_converters (dict[type, Callable[[str], Any]]): Dictionary mapping any Python type to a conversion function str -> Any. You can specify built-in types (int, float, bool) to override default behavior, standard library types (Decimal, datetime, date, ZoneInfo), or custom classes (MyClass, Product).

field_converters (dict[str, Callable[[str], Any]]): Dictionary mapping field names (str) to conversion functions. Takes precedence over custom_converters.

Source code in src/md_spreadsheet_parser/schemas.py
@dataclass(frozen=True)
class ConversionSchema:
    """
    Configuration for converting string values to Python types.

    Attributes:
        boolean_pairs: Pairs of strings representing (True, False). Case-insensitive.
                       Example: `(("yes", "no"), ("on", "off"))`.
        custom_converters: Dictionary mapping ANY Python type to a conversion function `str -> Any`.
                           You can specify:
                           - Built-in types: `int`, `float`, `bool` (to override default behavior)
                           - Standard library types: `Decimal`, `datetime`, `date`, `ZoneInfo`
                           - Custom classes: `MyClass`, `Product`
        field_converters: Dictionary mapping field names (str) to conversion functions.
                          Takes precedence over `custom_converters`.
    """

    boolean_pairs: tuple[tuple[str, str], ...] = (
        ("true", "false"),
        ("yes", "no"),
        ("1", "0"),
        ("on", "off"),
    )
    custom_converters: dict[type, Callable[[str], Any]] = field(default_factory=dict)
    field_converters: dict[str, Callable[[str], Any]] = field(default_factory=dict)
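As an illustrative sketch only (the library's actual conversion logic lives elsewhere in the package), boolean_pairs can be read as a case-insensitive lookup: the first string of each pair maps to True, the second to False.

```python
# Illustrative helper, not the library's implementation.
# Default pairs, matching ConversionSchema.boolean_pairs above.
BOOLEAN_PAIRS = (("true", "false"), ("yes", "no"), ("1", "0"), ("on", "off"))

def parse_bool(value, pairs=BOOLEAN_PAIRS):
    """Case-insensitive lookup: first string of a pair -> True, second -> False."""
    v = value.strip().lower()
    for true_s, false_s in pairs:
        if v == true_s:
            return True
        if v == false_s:
            return False
    return None  # not a recognized boolean spelling
```

Custom converters plug in analogously: a dict such as {Decimal: Decimal} routes every Decimal-typed field through decimal.Decimal(raw_string), while field_converters overrides that for specific field names.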

ExcelParsingSchema dataclass

Configuration for parsing Excel-exported data (TSV/CSV or openpyxl).

Attributes:

header_rows (int): Number of header rows (1 or 2). If 2, headers are flattened to "Parent - Child" format.

fill_merged_headers (bool): Whether to forward-fill empty header cells (for merged cells in Excel exports).

delimiter (str): Column separator for TSV/CSV parsing. Default is tab.

header_separator (str): Separator used when flattening 2-row headers.

Source code in src/md_spreadsheet_parser/schemas.py
@dataclass(frozen=True)
class ExcelParsingSchema:
    """
    Configuration for parsing Excel-exported data (TSV/CSV or openpyxl).

    Attributes:
        header_rows: Number of header rows (1 or 2).
                     If 2, headers are flattened to "Parent - Child" format.
        fill_merged_headers: Whether to forward-fill empty header cells
                             (for merged cells in Excel exports).
        delimiter: Column separator for TSV/CSV parsing. Default is tab.
        header_separator: Separator used when flattening 2-row headers.
    """

    header_rows: int = 1
    fill_merged_headers: bool = True
    delimiter: str = "\t"
    header_separator: str = " - "

    def __post_init__(self):
        if self.header_rows not in (1, 2):
            raise ValueError("header_rows must be 1 or 2")
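A minimal sketch of what header_rows=2 with fill_merged_headers=True implies (a hypothetical helper, not the library's code): empty parent cells inherit from the left, then parent and child are joined with header_separator.

```python
def flatten_headers(parent_row, child_row, sep=" - ", fill=True):
    """Forward-fill empty parent cells, then join as 'Parent - Child'."""
    flattened = []
    last_parent = ""
    for parent, child in zip(parent_row, child_row):
        if fill and not parent:
            parent = last_parent  # merged cell in the export: inherit from the left
        last_parent = parent
        flattened.append(f"{parent}{sep}{child}" if parent else child)
    return flattened
```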

MultiTableParsingSchema dataclass

Bases: ParsingSchema

Configuration for parsing multiple tables (workbook mode). Inherits from ParsingSchema.

Attributes:

root_marker (str): The marker indicating the start of the data section. Defaults to "# Tables".

sheet_header_level (int): The markdown header level for sheets. Defaults to 2 (e.g. ## Sheet).

table_header_level (int | None): The markdown header level for tables. If None, table names are not extracted. Defaults to 3.

capture_description (bool): Whether to capture text between the table header and the table as a description. Defaults to True.

Source code in src/md_spreadsheet_parser/schemas.py
@dataclass(frozen=True)
class MultiTableParsingSchema(ParsingSchema):
    """
    Configuration for parsing multiple tables (workbook mode).
    Inherits from ParsingSchema.

    Attributes:
        root_marker (str): The marker indicating the start of the data section. Defaults to "# Tables".
        sheet_header_level (int): The markdown header level for sheets. Defaults to 2 (e.g. `## Sheet`).
        table_header_level (int | None): The markdown header level for tables. If None, table names are not extracted. Defaults to 3.
        capture_description (bool): Whether to capture text between the table header and the table as a description. Defaults to True.
    """

    root_marker: str = "# Tables"
    sheet_header_level: int = 2
    table_header_level: int | None = 3
    capture_description: bool = True

    def __post_init__(self):
        if self.capture_description and self.table_header_level is None:
            raise ValueError(
                "capture_description=True requires table_header_level to be set"
            )
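With the defaults above (root marker "# Tables", sheets at header level 2, tables at level 3, descriptions captured), an input document is expected to look roughly like this sketch:

```markdown
# Tables

## Inventory

### Products

Current product catalog.

| SKU | Name | Price |
| --- | ---- | ----- |
| A1  | Bolt | 0.10  |
```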

ParsingSchema dataclass

Configuration for parsing markdown tables. Designed to be immutable and passed to pure functions.

Attributes:

column_separator (str): Character used to separate columns. Defaults to "|".

header_separator_char (str): Character used in the separator row. Defaults to "-".

require_outer_pipes (bool): Whether tables must have outer pipes (e.g. | col |). Defaults to True.

strip_whitespace (bool): Whether to strip whitespace from cell values. Defaults to True.

convert_br_to_newline (bool): Whether to convert <br> tags in cell values to newlines. Defaults to True.

Source code in src/md_spreadsheet_parser/schemas.py
@dataclass(frozen=True)
class ParsingSchema:
    """
    Configuration for parsing markdown tables.
    Designed to be immutable and passed to pure functions.

    Attributes:
        column_separator (str): Character used to separate columns. Defaults to "|".
        header_separator_char (str): Character used in the separator row. Defaults to "-".
        require_outer_pipes (bool): Whether tables must have outer pipes (e.g. `| col |`). Defaults to True.
        strip_whitespace (bool): Whether to strip whitespace from cell values. Defaults to True.
        convert_br_to_newline (bool): Whether to convert `<br>` tags in cell values to newlines. Defaults to True.
    """

    column_separator: str = "|"
    header_separator_char: str = "-"
    require_outer_pipes: bool = True
    strip_whitespace: bool = True
    convert_br_to_newline: bool = True
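How a single row is split under these options can be sketched as follows (an illustrative helper only; the real parser handles escaping and edge cases):

```python
def split_row(line, sep="|", outer_pipes=True, strip=True):
    """Split one markdown table row into cells, ParsingSchema-style."""
    cells = line.split(sep)
    if outer_pipes:
        cells = cells[1:-1]  # drop the empty strings outside '| ... |'
    return [c.strip() for c in cells] if strip else cells
```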

Sheet dataclass

Represents a single sheet containing tables.

Attributes:

name (str): Name of the sheet.

tables (list[Table]): List of tables contained in this sheet.

metadata (dict[str, Any] | None): Arbitrary metadata (e.g. layout). Defaults to None.

Source code in src/md_spreadsheet_parser/models.py
@dataclass(frozen=True)
class Sheet:
    """
    Represents a single sheet containing tables.

    Attributes:
        name (str): Name of the sheet.
        tables (list[Table]): List of tables contained in this sheet.
        metadata (dict[str, Any] | None): Arbitrary metadata (e.g. layout). Defaults to None.
    """

    name: str
    tables: list[Table]
    metadata: dict[str, Any] | None = None

    def __post_init__(self):
        if self.metadata is None:
            # Hack to allow default value for mutable type in frozen dataclass
            object.__setattr__(self, "metadata", {})

    @property
    def json(self) -> SheetJSON:
        """
        Returns a JSON-compatible dictionary representation of the sheet.

        Returns:
            SheetJSON: A dictionary containing the sheet data.
        """
        return {
            "name": self.name,
            "tables": [t.json for t in self.tables],
            "metadata": self.metadata if self.metadata is not None else {},
        }

    def get_table(self, name: str) -> Table | None:
        """
        Retrieve a table by its name.

        Args:
            name (str): The name of the table to retrieve.

        Returns:
            Table | None: The table object if found, otherwise None.
        """
        for table in self.tables:
            if table.name == name:
                return table
        return None

    def to_markdown(self, schema: ParsingSchema = DEFAULT_SCHEMA) -> str:
        """
        Generates a Markdown string representation of the sheet.

        Args:
            schema (ParsingSchema, optional): Configuration for formatting.

        Returns:
            str: The Markdown string.
        """
        return generate_sheet_markdown(self, schema)

json property

Returns a JSON-compatible dictionary representation of the sheet.

Returns:

SheetJSON: A dictionary containing the sheet data.

get_table(name)

Retrieve a table by its name.

Parameters:

name (str, required): The name of the table to retrieve.

Returns:

Table | None: The table object if found, otherwise None.

Source code in src/md_spreadsheet_parser/models.py
def get_table(self, name: str) -> Table | None:
    """
    Retrieve a table by its name.

    Args:
        name (str): The name of the table to retrieve.

    Returns:
        Table | None: The table object if found, otherwise None.
    """
    for table in self.tables:
        if table.name == name:
            return table
    return None

to_markdown(schema=DEFAULT_SCHEMA)

Generates a Markdown string representation of the sheet.

Parameters:

schema (ParsingSchema, default DEFAULT_SCHEMA): Configuration for formatting.

Returns:

str: The Markdown string.

Source code in src/md_spreadsheet_parser/models.py
def to_markdown(self, schema: ParsingSchema = DEFAULT_SCHEMA) -> str:
    """
    Generates a Markdown string representation of the sheet.

    Args:
        schema (ParsingSchema, optional): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    return generate_sheet_markdown(self, schema)

Table dataclass

Represents a parsed table with optional metadata.

Attributes:

headers (list[str] | None): List of column headers, or None if the table has no headers.

rows (list[list[str]]): List of data rows.

alignments (list[AlignmentType] | None): List of column alignments ('left', 'center', 'right'). Defaults to None.

name (str | None): Name of the table (e.g. from a header). Defaults to None.

description (str | None): Description of the table. Defaults to None.

metadata (dict[str, Any] | None): Arbitrary metadata. Defaults to None.

start_line (int | None): Line number in the source document where the table starts. Defaults to None.

end_line (int | None): Line number in the source document where the table ends. Defaults to None.

Source code in src/md_spreadsheet_parser/models.py
@dataclass(frozen=True)
class Table:
    """
    Represents a parsed table with optional metadata.

    Attributes:
        headers (list[str] | None): List of column headers, or None if the table has no headers.
        rows (list[list[str]]): List of data rows.
        alignments (list[AlignmentType] | None): List of column alignments ('left', 'center', 'right'). Defaults to None.
        name (str | None): Name of the table (e.g. from a header). Defaults to None.
        description (str | None): Description of the table. Defaults to None.
        metadata (dict[str, Any] | None): Arbitrary metadata. Defaults to None.
        start_line (int | None): Line number in the source document where the table starts. Defaults to None.
        end_line (int | None): Line number in the source document where the table ends. Defaults to None.
    """

    headers: list[str] | None
    rows: list[list[str]]
    alignments: list[AlignmentType] | None = None
    name: str | None = None
    description: str | None = None
    metadata: dict[str, Any] | None = None
    start_line: int | None = None
    end_line: int | None = None

    def __post_init__(self):
        if self.metadata is None:
            # Hack to allow default value for mutable type in frozen dataclass
            object.__setattr__(self, "metadata", {})

    @property
    def json(self) -> TableJSON:
        """
        Returns a JSON-compatible dictionary representation of the table.

        Returns:
            TableJSON: A dictionary containing the table data.
        """
        return {
            "name": self.name,
            "description": self.description,
            "headers": self.headers,
            "rows": self.rows,
            "metadata": self.metadata if self.metadata is not None else {},
            "start_line": self.start_line,
            "end_line": self.end_line,
            "alignments": self.alignments,
        }

    def to_models(
        self,
        schema_cls: type[T],
        conversion_schema: ConversionSchema = DEFAULT_CONVERSION_SCHEMA,
    ) -> list[T]:
        """
        Converts the table rows into a list of dataclass instances, performing validation and type conversion.

        Args:
            schema_cls (type[T]): The dataclass type to validate against.
            conversion_schema (ConversionSchema, optional): Configuration for type conversion.

        Returns:
            list[T]: A list of validated dataclass instances.

        Raises:
            ValueError: If schema_cls is not a dataclass.
            TableValidationError: If validation fails for any row or if the table has no headers.
        """
        return validate_table(self, schema_cls, conversion_schema)

    def to_markdown(self, schema: ParsingSchema = DEFAULT_SCHEMA) -> str:
        """
        Generates a Markdown string representation of the table.

        Args:
            schema (ParsingSchema, optional): Configuration for formatting.

        Returns:
            str: The Markdown string.
        """
        return generate_table_markdown(self, schema)

    def update_cell(self, row_idx: int, col_idx: int, value: str) -> "Table":
        """
        Return a new Table with the specified cell updated.
        """
        # Handle header update
        if row_idx == -1:
            if self.headers is None:
                # Determine width from rows if possible, or start fresh
                width = len(self.rows[0]) if self.rows else (col_idx + 1)
                new_headers = [""] * width
                # Ensure width enough
                if col_idx >= len(new_headers):
                    new_headers.extend([""] * (col_idx - len(new_headers) + 1))
            else:
                new_headers = list(self.headers)
                if col_idx >= len(new_headers):
                    new_headers.extend([""] * (col_idx - len(new_headers) + 1))

            # Update alignments if headers grew
            new_alignments = list(self.alignments) if self.alignments else []
            if len(new_headers) > len(new_alignments):
                # Expand alignments to the new width only when they are
                # already tracked; if self.alignments is None it stays None.
                if self.alignments is not None:
                    # Cast or explicit type check might be needed for strict type checkers with literals
                    # Using a typed list to satisfy invariant list[AlignmentType]
                    extension: list[AlignmentType] = ["default"] * (
                        len(new_headers) - len(new_alignments)
                    )
                    new_alignments.extend(extension)

            final_alignments = new_alignments if self.alignments is not None else None

            new_headers[col_idx] = value

            return replace(self, headers=new_headers, alignments=final_alignments)

        # Handle Body update
        # 1. Ensure row exists
        new_rows = [list(r) for r in self.rows]

        # Grow rows if needed
        if row_idx >= len(new_rows):
            # Calculate width
            width = (
                len(self.headers)
                if self.headers
                else (len(new_rows[0]) if new_rows else 0)
            )
            if width == 0:
                width = col_idx + 1  # At least cover the new cell

            rows_to_add = row_idx - len(new_rows) + 1
            for _ in range(rows_to_add):
                new_rows.append([""] * width)

        # If columns expanded due to row update, we might need to expand alignments too
        current_width = len(new_rows[0]) if new_rows else 0
        if col_idx >= current_width:
            # This means we are expanding columns
            if self.alignments is not None:
                width_needed = col_idx + 1
                current_align_len = len(self.alignments)
                if width_needed > current_align_len:
                    new_alignments = list(self.alignments)
                    extension: list[AlignmentType] = ["default"] * (
                        width_needed - current_align_len
                    )
                    new_alignments.extend(extension)
                    return replace(
                        self,
                        rows=self._update_rows_cell(new_rows, row_idx, col_idx, value),
                        alignments=new_alignments,
                    )

        return replace(
            self, rows=self._update_rows_cell(new_rows, row_idx, col_idx, value)
        )

    def _update_rows_cell(self, new_rows, row_idx, col_idx, value):
        target_row = new_rows[row_idx]
        if col_idx >= len(target_row):
            target_row.extend([""] * (col_idx - len(target_row) + 1))
        target_row[col_idx] = value
        return new_rows

    def delete_row(self, row_idx: int) -> "Table":
        """
        Return a new Table with the row at index removed.
        """
        new_rows = [list(r) for r in self.rows]
        if 0 <= row_idx < len(new_rows):
            new_rows.pop(row_idx)
        return replace(self, rows=new_rows)

    def delete_column(self, col_idx: int) -> "Table":
        """
        Return a new Table with the column at index removed.
        """
        new_headers = list(self.headers) if self.headers else None
        if new_headers and 0 <= col_idx < len(new_headers):
            new_headers.pop(col_idx)

        new_rows = []
        for row in self.rows:
            new_row = list(row)
            if 0 <= col_idx < len(new_row):
                new_row.pop(col_idx)
            new_rows.append(new_row)

        new_alignments = None
        if self.alignments is not None:
            new_alignments = list(self.alignments)
            if 0 <= col_idx < len(new_alignments):
                new_alignments.pop(col_idx)

        return replace(
            self, headers=new_headers, rows=new_rows, alignments=new_alignments
        )

    def clear_column_data(self, col_idx: int) -> "Table":
        """
        Return a new Table with data in the specified column cleared (set to empty string),
        but headers and column structure preserved.
        """
        # Headers remain unchanged

        new_rows = []
        for row in self.rows:
            new_row = list(row)
            if 0 <= col_idx < len(new_row):
                new_row[col_idx] = ""
            new_rows.append(new_row)

        return replace(self, rows=new_rows)

    def insert_row(self, row_idx: int) -> "Table":
        """
        Return a new Table with an empty row inserted at row_idx.
        Subsequent rows are shifted down.
        """
        new_rows = [list(r) for r in self.rows]

        # Determine width
        width = (
            len(self.headers) if self.headers else (len(new_rows[0]) if new_rows else 0)
        )
        if width == 0:
            width = 1  # Default to 1 column if table is empty

        new_row = [""] * width

        if row_idx < 0:
            row_idx = 0
        if row_idx > len(new_rows):
            row_idx = len(new_rows)

        new_rows.insert(row_idx, new_row)
        return replace(self, rows=new_rows)

    def insert_column(self, col_idx: int) -> "Table":
        """
        Return a new Table with an empty column inserted at col_idx.
        Subsequent columns are shifted right.
        """
        new_headers = list(self.headers) if self.headers else None

        if new_headers:
            if col_idx < 0:
                col_idx = 0
            if col_idx > len(new_headers):
                col_idx = len(new_headers)
            new_headers.insert(col_idx, "")

        new_alignments = None
        if self.alignments is not None:
            new_alignments = list(self.alignments)
            # Pad alignments up to col_idx before inserting the new entry
            if col_idx > len(new_alignments):
                extension: list[AlignmentType] = ["default"] * (
                    col_idx - len(new_alignments)
                )
                new_alignments.extend(extension)
            new_alignments.insert(col_idx, "default")  # Default alignment

        new_rows = []
        for row in self.rows:
            new_row = list(row)
            # If col_idx is past the end of this row, pad with empty
            # cells so the new column still lands at col_idx.
            current_len = len(new_row)
            target_idx = col_idx
            if target_idx > current_len:
                # Pad up to target
                new_row.extend([""] * (target_idx - current_len))
                target_idx = len(new_row)  # Append

            new_row.insert(target_idx, "")
            new_rows.append(new_row)

        return replace(
            self, headers=new_headers, rows=new_rows, alignments=new_alignments
        )
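The editing methods above never mutate the table in place; each copies the affected lists and returns a fresh instance via dataclasses.replace. The same pattern on a minimal stand-in dataclass (hypothetical Grid, not the real Table):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Grid:
    rows: list

    def update_cell(self, r, c, value):
        new_rows = [list(row) for row in self.rows]  # copy every row
        if c >= len(new_rows[r]):                    # grow the row if needed
            new_rows[r].extend([""] * (c - len(new_rows[r]) + 1))
        new_rows[r][c] = value
        return replace(self, rows=new_rows)          # original stays intact

g = Grid(rows=[["a", "b"]])
g2 = g.update_cell(0, 3, "x")
```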

json property

Returns a JSON-compatible dictionary representation of the table.

Returns:

TableJSON: A dictionary containing the table data.

clear_column_data(col_idx)

Return a new Table with data in the specified column cleared (set to empty string), but headers and column structure preserved.

Source code in src/md_spreadsheet_parser/models.py
def clear_column_data(self, col_idx: int) -> "Table":
    """
    Return a new Table with data in the specified column cleared (set to empty string),
    but headers and column structure preserved.
    """
    # Headers remain unchanged

    new_rows = []
    for row in self.rows:
        new_row = list(row)
        if 0 <= col_idx < len(new_row):
            new_row[col_idx] = ""
        new_rows.append(new_row)

    return replace(self, rows=new_rows)

delete_column(col_idx)

Return a new Table with the column at index removed.

Source code in src/md_spreadsheet_parser/models.py
def delete_column(self, col_idx: int) -> "Table":
    """
    Return a new Table with the column at index removed.
    """
    new_headers = list(self.headers) if self.headers else None
    if new_headers and 0 <= col_idx < len(new_headers):
        new_headers.pop(col_idx)

    new_rows = []
    for row in self.rows:
        new_row = list(row)
        if 0 <= col_idx < len(new_row):
            new_row.pop(col_idx)
        new_rows.append(new_row)

    new_alignments = None
    if self.alignments is not None:
        new_alignments = list(self.alignments)
        if 0 <= col_idx < len(new_alignments):
            new_alignments.pop(col_idx)

    return replace(
        self, headers=new_headers, rows=new_rows, alignments=new_alignments
    )

delete_row(row_idx)

Return a new Table with the row at index removed.

Source code in src/md_spreadsheet_parser/models.py
def delete_row(self, row_idx: int) -> "Table":
    """
    Return a new Table with the row at index removed.
    """
    new_rows = [list(r) for r in self.rows]
    if 0 <= row_idx < len(new_rows):
        new_rows.pop(row_idx)
    return replace(self, rows=new_rows)

insert_column(col_idx)

Return a new Table with an empty column inserted at col_idx. Subsequent columns are shifted right.

Source code in src/md_spreadsheet_parser/models.py
def insert_column(self, col_idx: int) -> "Table":
    """
    Return a new Table with an empty column inserted at col_idx.
    Subsequent columns are shifted right.
    """
    new_headers = list(self.headers) if self.headers else None

    if new_headers:
        if col_idx < 0:
            col_idx = 0
        if col_idx > len(new_headers):
            col_idx = len(new_headers)
        new_headers.insert(col_idx, "")

    new_alignments = None
    if self.alignments is not None:
        new_alignments = list(self.alignments)
        # Pad alignments up to col_idx before inserting the new entry
        if col_idx > len(new_alignments):
            extension: list[AlignmentType] = ["default"] * (
                col_idx - len(new_alignments)
            )
            new_alignments.extend(extension)
        new_alignments.insert(col_idx, "default")  # Default alignment

    new_rows = []
    for row in self.rows:
        new_row = list(row)
        # If col_idx is past the end of this row, pad with empty
        # cells so the new column still lands at col_idx.
        current_len = len(new_row)
        target_idx = col_idx
        if target_idx > current_len:
            # Pad up to target
            new_row.extend([""] * (target_idx - current_len))
            target_idx = len(new_row)  # Append

        new_row.insert(target_idx, "")
        new_rows.append(new_row)

    return replace(
        self, headers=new_headers, rows=new_rows, alignments=new_alignments
    )
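Ragged rows are handled by padding: when col_idx lies past a row's end, the row is first extended with empty cells so the insertion lands at col_idx for every row. That padding step can be sketched on a plain list (illustrative helper, not the library code):

```python
def insert_at(row, col_idx, value=""):
    """Insert value at col_idx, padding shorter rows with empty cells first."""
    row = list(row)
    if col_idx > len(row):
        row.extend([""] * (col_idx - len(row)))  # pad up to the target index
    row.insert(col_idx, value)
    return row
```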

insert_row(row_idx)

Return a new Table with an empty row inserted at row_idx. Subsequent rows are shifted down.

Source code in src/md_spreadsheet_parser/models.py
def insert_row(self, row_idx: int) -> "Table":
    """
    Return a new Table with an empty row inserted at row_idx.
    Subsequent rows are shifted down.
    """
    new_rows = [list(r) for r in self.rows]

    # Determine width
    width = (
        len(self.headers) if self.headers else (len(new_rows[0]) if new_rows else 0)
    )
    if width == 0:
        width = 1  # Default to 1 column if table is empty

    new_row = [""] * width

    if row_idx < 0:
        row_idx = 0
    if row_idx > len(new_rows):
        row_idx = len(new_rows)

    new_rows.insert(row_idx, new_row)
    return replace(self, rows=new_rows)

to_markdown(schema=DEFAULT_SCHEMA)

Generates a Markdown string representation of the table.

Parameters:

schema (ParsingSchema, default DEFAULT_SCHEMA): Configuration for formatting.

Returns:

str: The Markdown string.

Source code in src/md_spreadsheet_parser/models.py
def to_markdown(self, schema: ParsingSchema = DEFAULT_SCHEMA) -> str:
    """
    Generates a Markdown string representation of the table.

    Args:
        schema (ParsingSchema, optional): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    return generate_table_markdown(self, schema)
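A rough sketch of the pipe-table shape this renders to under the default schema (the real generate_table_markdown lives in the library and also handles alignments and escaping):

```python
def render_table(headers, rows):
    """Render headers + rows as a pipe table with a '---' separator row."""
    lines = ["| " + " | ".join(headers) + " |"]
    lines.append("| " + " | ".join("---" for _ in headers) + " |")
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)
```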

to_models(schema_cls, conversion_schema=DEFAULT_CONVERSION_SCHEMA)

Converts the table rows into a list of dataclass instances, performing validation and type conversion.

Parameters:

schema_cls (type[T], required): The dataclass type to validate against.

conversion_schema (ConversionSchema, default DEFAULT_CONVERSION_SCHEMA): Configuration for type conversion.

Returns:

list[T]: A list of validated dataclass instances.

Raises:

ValueError: If schema_cls is not a dataclass.

TableValidationError: If validation fails for any row or if the table has no headers.

Source code in src/md_spreadsheet_parser/models.py
def to_models(
    self,
    schema_cls: type[T],
    conversion_schema: ConversionSchema = DEFAULT_CONVERSION_SCHEMA,
) -> list[T]:
    """
    Converts the table rows into a list of dataclass instances, performing validation and type conversion.

    Args:
        schema_cls (type[T]): The dataclass type to validate against.
        conversion_schema (ConversionSchema, optional): Configuration for type conversion.

    Returns:
        list[T]: A list of validated dataclass instances.

    Raises:
        ValueError: If schema_cls is not a dataclass.
        TableValidationError: If validation fails for any row or if the table has no headers.
    """
    return validate_table(self, schema_cls, conversion_schema)
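Conceptually, to_models zips the headers with each row and feeds the converted values to the dataclass constructor. A stdlib-only sketch of that idea (Product is a hypothetical schema class; the real validate_table adds the full ConversionSchema machinery and error reporting):

```python
from dataclasses import dataclass, fields

@dataclass
class Product:
    sku: str
    price: float

def rows_to_models(headers, rows, cls):
    """Build one cls instance per row, converting values to the field types."""
    types = {f.name: f.type for f in fields(cls)}
    out = []
    for row in rows:
        kwargs = {}
        for name, raw in zip(headers, row):
            typ = types[name]
            # f.type may be the class itself or its string name
            kwargs[name] = float(raw) if typ in (float, "float") else raw
        out.append(cls(**kwargs))
    return out

models = rows_to_models(["sku", "price"], [["A1", "9.99"]], Product)
```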

update_cell(row_idx, col_idx, value)

Return a new Table with the specified cell updated.

Source code in src/md_spreadsheet_parser/models.py
def update_cell(self, row_idx: int, col_idx: int, value: str) -> "Table":
    """
    Return a new Table with the specified cell updated.
    """
    # Handle header update
    if row_idx == -1:
        if self.headers is None:
            # Determine width from rows if possible, or start fresh
            width = len(self.rows[0]) if self.rows else (col_idx + 1)
            new_headers = [""] * width
            # Ensure width enough
            if col_idx >= len(new_headers):
                new_headers.extend([""] * (col_idx - len(new_headers) + 1))
        else:
            new_headers = list(self.headers)
            if col_idx >= len(new_headers):
                new_headers.extend([""] * (col_idx - len(new_headers) + 1))

        # Update alignments if headers grew
        new_alignments = list(self.alignments) if self.alignments else []
        if len(new_headers) > len(new_alignments):
            # Only expand alignments when they are already being tracked;
            # if self.alignments is None it stays None.
            if self.alignments is not None:
                # Cast or explicit type check might be needed for strict type checkers with literals
                # Using a typed list to satisfy invariant list[AlignmentType]
                extension: list[AlignmentType] = ["default"] * (
                    len(new_headers) - len(new_alignments)
                )
                new_alignments.extend(extension)

        final_alignments = new_alignments if self.alignments is not None else None

        new_headers[col_idx] = value

        return replace(self, headers=new_headers, alignments=final_alignments)

    # Handle Body update
    # 1. Ensure row exists
    new_rows = [list(r) for r in self.rows]

    # Grow rows if needed
    if row_idx >= len(new_rows):
        # Calculate width
        width = (
            len(self.headers)
            if self.headers
            else (len(new_rows[0]) if new_rows else 0)
        )
        if width == 0:
            width = col_idx + 1  # At least cover the new cell

        rows_to_add = row_idx - len(new_rows) + 1
        for _ in range(rows_to_add):
            new_rows.append([""] * width)

    # If columns expanded due to row update, we might need to expand alignments too
    current_width = len(new_rows[0]) if new_rows else 0
    if col_idx >= current_width:
        # This means we are expanding columns
        if self.alignments is not None:
            width_needed = col_idx + 1
            current_align_len = len(self.alignments)
            if width_needed > current_align_len:
                new_alignments = list(self.alignments)
                extension: list[AlignmentType] = ["default"] * (
                    width_needed - current_align_len
                )
                new_alignments.extend(extension)
                return replace(
                    self,
                    rows=self._update_rows_cell(new_rows, row_idx, col_idx, value),
                    alignments=new_alignments,
                )

    return replace(
        self, rows=self._update_rows_cell(new_rows, row_idx, col_idx, value)
    )
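The growth logic above (padding headers, rows, and alignments out to cover `col_idx`) can be sketched in isolation. `pad_to` below is a hypothetical helper for illustration, not part of the library:

```python
def pad_to(cells: list[str], col_idx: int, fill: str = "") -> list[str]:
    """Return a copy of cells extended so that col_idx is a valid index."""
    out = list(cells)
    if col_idx >= len(out):
        out.extend([fill] * (col_idx - len(out) + 1))
    return out


# Writing to column 4 of a 2-column header row grows it to width 5 first
headers = pad_to(["A", "B"], 4)
headers[4] = "E"
```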

TableValidationError

Bases: Exception

Exception raised when table validation fails. Contains a list of errors found during validation.

Source code in src/md_spreadsheet_parser/validation.py
class TableValidationError(Exception):
    """
    Exception raised when table validation fails.
    Contains a list of errors found during validation.
    """

    def __init__(self, errors: list[str]):
        self.errors = errors
        super().__init__(
            f"Validation failed with {len(errors)} errors:\n" + "\n".join(errors)
        )
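A minimal usage sketch, with the class re-declared exactly as above so the snippet runs standalone; the error strings are made-up examples:

```python
class TableValidationError(Exception):
    """Carries the individual validation messages on .errors."""

    def __init__(self, errors: list[str]):
        self.errors = errors
        super().__init__(
            f"Validation failed with {len(errors)} errors:\n" + "\n".join(errors)
        )


try:
    raise TableValidationError(["row 1: 'x' is not an int", "row 3: missing 'name'"])
except TableValidationError as exc:
    count = len(exc.errors)  # individual messages stay accessible
    message = str(exc)       # aggregated summary for logging/display
```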

Workbook dataclass

Represents a collection of sheets (multi-table output).

Attributes:

Name Type Description
sheets list[Sheet]

List of sheets in the workbook.

metadata dict[str, Any] | None

Arbitrary metadata. Defaults to None.

Source code in src/md_spreadsheet_parser/models.py
@dataclass(frozen=True)
class Workbook:
    """
    Represents a collection of sheets (multi-table output).

    Attributes:
        sheets (list[Sheet]): List of sheets in the workbook.
        metadata (dict[str, Any] | None): Arbitrary metadata. Defaults to None.
    """

    sheets: list[Sheet]
    metadata: dict[str, Any] | None = None

    def __post_init__(self):
        if self.metadata is None:
            # Hack to allow default value for mutable type in frozen dataclass
            object.__setattr__(self, "metadata", {})

    @property
    def json(self) -> WorkbookJSON:
        """
        Returns a JSON-compatible dictionary representation of the workbook.

        Returns:
            WorkbookJSON: A dictionary containing the workbook data.
        """
        return {
            "sheets": [s.json for s in self.sheets],
            "metadata": self.metadata if self.metadata is not None else {},
        }

    def get_sheet(self, name: str) -> Sheet | None:
        """
        Retrieve a sheet by its name.

        Args:
            name (str): The name of the sheet to retrieve.

        Returns:
            Sheet | None: The sheet object if found, otherwise None.
        """
        for sheet in self.sheets:
            if sheet.name == name:
                return sheet
        return None

    def to_markdown(self, schema: MultiTableParsingSchema) -> str:
        """
        Generates a Markdown string representation of the workbook.

        Args:
            schema (MultiTableParsingSchema): Configuration for formatting.

        Returns:
            str: The Markdown string.
        """
        return generate_workbook_markdown(self, schema)

    def add_sheet(self, name: str) -> "Workbook":
        """
        Return a new Workbook with a new sheet added.
        """
        # Create new sheet with one empty table as default
        new_table = Table(headers=["A", "B", "C"], rows=[["", "", ""]])
        new_sheet = Sheet(name=name, tables=[new_table])

        new_sheets = list(self.sheets)
        new_sheets.append(new_sheet)

        return replace(self, sheets=new_sheets)

    def delete_sheet(self, index: int) -> "Workbook":
        """
        Return a new Workbook with the sheet at index removed.
        """
        if index < 0 or index >= len(self.sheets):
            raise IndexError("Sheet index out of range")

        new_sheets = list(self.sheets)
        new_sheets.pop(index)

        return replace(self, sheets=new_sheets)

json property

Returns a JSON-compatible dictionary representation of the workbook.

Returns:

Name Type Description
WorkbookJSON WorkbookJSON

A dictionary containing the workbook data.

add_sheet(name)

Return a new Workbook with a new sheet added.

Source code in src/md_spreadsheet_parser/models.py
def add_sheet(self, name: str) -> "Workbook":
    """
    Return a new Workbook with a new sheet added.
    """
    # Create new sheet with one empty table as default
    new_table = Table(headers=["A", "B", "C"], rows=[["", "", ""]])
    new_sheet = Sheet(name=name, tables=[new_table])

    new_sheets = list(self.sheets)
    new_sheets.append(new_sheet)

    return replace(self, sheets=new_sheets)

delete_sheet(index)

Return a new Workbook with the sheet at index removed.

Source code in src/md_spreadsheet_parser/models.py
def delete_sheet(self, index: int) -> "Workbook":
    """
    Return a new Workbook with the sheet at index removed.
    """
    if index < 0 or index >= len(self.sheets):
        raise IndexError("Sheet index out of range")

    new_sheets = list(self.sheets)
    new_sheets.pop(index)

    return replace(self, sheets=new_sheets)

get_sheet(name)

Retrieve a sheet by its name.

Parameters:

Name Type Description Default
name str

The name of the sheet to retrieve.

required

Returns:

Type Description
Sheet | None

Sheet | None: The sheet object if found, otherwise None.

Source code in src/md_spreadsheet_parser/models.py
def get_sheet(self, name: str) -> Sheet | None:
    """
    Retrieve a sheet by its name.

    Args:
        name (str): The name of the sheet to retrieve.

    Returns:
        Sheet | None: The sheet object if found, otherwise None.
    """
    for sheet in self.sheets:
        if sheet.name == name:
            return sheet
    return None

to_markdown(schema)

Generates a Markdown string representation of the workbook.

Parameters:

Name Type Description Default
schema MultiTableParsingSchema

Configuration for formatting.

required

Returns:

Name Type Description
str str

The Markdown string.

Source code in src/md_spreadsheet_parser/models.py
def to_markdown(self, schema: MultiTableParsingSchema) -> str:
    """
    Generates a Markdown string representation of the workbook.

    Args:
        schema (MultiTableParsingSchema): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    return generate_workbook_markdown(self, schema)

generate_sheet_markdown(sheet, schema=DEFAULT_SCHEMA)

Generates a Markdown string representation of the sheet.

Parameters:

Name Type Description Default
sheet Sheet

The Sheet object.

required
schema ParsingSchema

Configuration for formatting.

DEFAULT_SCHEMA

Returns:

Name Type Description
str str

The Markdown string.

Source code in src/md_spreadsheet_parser/generator.py
def generate_sheet_markdown(
    sheet: "Sheet", schema: ParsingSchema = DEFAULT_SCHEMA
) -> str:
    """
    Generates a Markdown string representation of the sheet.

    Args:
        sheet: The Sheet object.
        schema (ParsingSchema, optional): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    lines = []

    if isinstance(schema, MultiTableParsingSchema):
        lines.append(f"{'#' * schema.sheet_header_level} {sheet.name}")
        lines.append("")

    for i, table in enumerate(sheet.tables):
        lines.append(generate_table_markdown(table, schema))
        if i < len(sheet.tables) - 1:
            lines.append("")  # Empty line between tables

    # Append Sheet Metadata if present (at the end)
    if isinstance(schema, MultiTableParsingSchema) and sheet.metadata:
        lines.append("")
        metadata_json = json.dumps(sheet.metadata)
        comment = f"<!-- md-spreadsheet-sheet-metadata: {metadata_json} -->"
        lines.append(comment)

    return "\n".join(lines)

generate_table_markdown(table, schema=DEFAULT_SCHEMA)

Generates a Markdown string representation of the table.

Parameters:

Name Type Description Default
table Table

The Table object.

required
schema ParsingSchema

Configuration for formatting.

DEFAULT_SCHEMA

Returns:

Name Type Description
str str

The Markdown string.

Source code in src/md_spreadsheet_parser/generator.py
def generate_table_markdown(
    table: "Table", schema: ParsingSchema = DEFAULT_SCHEMA
) -> str:
    """
    Generates a Markdown string representation of the table.

    Args:
        table: The Table object.
        schema (ParsingSchema, optional): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    lines = []

    # Handle metadata (name and description) if MultiTableParsingSchema
    if isinstance(schema, MultiTableParsingSchema):
        if table.name and schema.table_header_level is not None:
            lines.append(f"{'#' * schema.table_header_level} {table.name}")
            lines.append("")  # Empty line after name

        if table.description and schema.capture_description:
            lines.append(table.description)
            lines.append("")  # Empty line after description

    # Build table
    sep = f" {schema.column_separator} "

    def _prepare_cell(cell: str) -> str:
        """Prepare cell for markdown generation."""
        if schema.convert_br_to_newline and "\n" in cell:
            return cell.replace("\n", "<br>")
        return cell

    # Headers
    if table.headers:
        # Add outer pipes if required
        processed_headers = [_prepare_cell(h) for h in table.headers]
        header_row = sep.join(processed_headers)
        if schema.require_outer_pipes:
            header_row = (
                f"{schema.column_separator} {header_row} {schema.column_separator}"
            )
        lines.append(header_row)

        # Separator row
        separator_cells = []
        for i, _ in enumerate(table.headers):
            alignment = "default"
            if table.alignments and i < len(table.alignments):
                # Ensure we handle potentially None values if list has gaps (unlikely by design but safe)
                alignment = table.alignments[i] or "default"

            # Construct separator cell based on alignment
            # Use 3 hyphens as base
            if alignment == "left":
                cell = ":" + schema.header_separator_char * 3
            elif alignment == "right":
                cell = schema.header_separator_char * 3 + ":"
            elif alignment == "center":
                cell = ":" + schema.header_separator_char * 3 + ":"
            else:
                # default
                cell = schema.header_separator_char * 3

            separator_cells.append(cell)

        separator_row = sep.join(separator_cells)
        if schema.require_outer_pipes:
            separator_row = (
                f"{schema.column_separator} {separator_row} {schema.column_separator}"
            )
        lines.append(separator_row)

    # Rows
    for row in table.rows:
        processed_row = [_prepare_cell(cell) for cell in row]
        row_str = sep.join(processed_row)
        if schema.require_outer_pipes:
            row_str = f"{schema.column_separator} {row_str} {schema.column_separator}"
        lines.append(row_str)

    # Append Metadata if present
    if table.metadata and "visual" in table.metadata:
        metadata_json = json.dumps(table.metadata["visual"])
        comment = f"<!-- md-spreadsheet-table-metadata: {metadata_json} -->"
        lines.append("")
        lines.append(comment)

    return "\n".join(lines)

generate_workbook_markdown(workbook, schema)

Generates a Markdown string representation of the workbook.

Parameters:

Name Type Description Default
workbook Workbook

The Workbook object.

required
schema MultiTableParsingSchema

Configuration for formatting.

required

Returns:

Name Type Description
str str

The Markdown string.

Source code in src/md_spreadsheet_parser/generator.py
def generate_workbook_markdown(
    workbook: "Workbook", schema: MultiTableParsingSchema
) -> str:
    """
    Generates a Markdown string representation of the workbook.

    Args:
        workbook: The Workbook object.
        schema (MultiTableParsingSchema): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    lines = []

    if schema.root_marker:
        lines.append(schema.root_marker)
        lines.append("")

    for i, sheet in enumerate(workbook.sheets):
        lines.append(generate_sheet_markdown(sheet, schema))
        if i < len(workbook.sheets) - 1:
            lines.append("")  # Empty line between sheets

    # Append Workbook Metadata if present
    if workbook.metadata:
        # Ensure separation from last sheet
        if lines and lines[-1] != "":
            lines.append("")

        metadata_json = json.dumps(workbook.metadata)
        comment = f"<!-- md-spreadsheet-workbook-metadata: {metadata_json} -->"
        lines.append(comment)

    return "\n".join(lines)

parse_excel(source, schema=DEFAULT_EXCEL_SCHEMA)

Parse Excel data from various sources.

Parameters:

Name Type Description Default
source ExcelSource

One of: - openpyxl.Worksheet (if openpyxl is installed) - str: TSV/CSV text content - list[list[str]]: Pre-parsed 2D array

required
schema ExcelParsingSchema

Configuration for parsing.

DEFAULT_EXCEL_SCHEMA

Returns:

Type Description
Table

Table object with processed headers and data.

Raises:

Type Description
TypeError

If source type is not supported.

Source code in src/md_spreadsheet_parser/excel.py
def parse_excel(
    source: ExcelSource,
    schema: ExcelParsingSchema = DEFAULT_EXCEL_SCHEMA,
) -> Table:
    """
    Parse Excel data from various sources.

    Args:
        source: One of:
            - openpyxl.Worksheet (if openpyxl is installed)
            - str: TSV/CSV text content
            - list[list[str]]: Pre-parsed 2D array
        schema: Configuration for parsing.

    Returns:
        Table object with processed headers and data.

    Raises:
        TypeError: If source type is not supported.
    """
    rows: list[list[str]]

    # Check for openpyxl Worksheet (duck typing via hasattr)
    if HAS_OPENPYXL and hasattr(source, "iter_rows"):
        # At runtime, source is a Worksheet with iter_rows method
        ws: Any = source
        rows = [
            [_safe_str(cell) for cell in row] for row in ws.iter_rows(values_only=True)
        ]

    # Check for string (TSV/CSV content)
    elif isinstance(source, str):
        rows = _parse_tsv(source, schema.delimiter)

    # Check for pre-parsed 2D array
    elif isinstance(source, list):
        # Assume it's already list[list[str]]
        rows = source

    else:
        supported = "openpyxl.Worksheet, str, or list[list[str]]"
        if not HAS_OPENPYXL:
            supported = (
                "str or list[list[str]] (install openpyxl for Worksheet support)"
            )
        raise TypeError(
            f"Unsupported source type: {type(source).__name__}. Expected {supported}."
        )

    return parse_excel_text(rows, schema)

parse_excel_text(rows, schema=DEFAULT_EXCEL_SCHEMA)

Parse a 2D string array into a Table with merged cell and header handling.

Parameters:

Name Type Description Default
rows list[list[str]]

2D list of strings (e.g., from csv.reader or worksheet iteration).

required
schema ExcelParsingSchema

Configuration for header processing.

DEFAULT_EXCEL_SCHEMA

Returns:

Type Description
Table

Table object with processed headers and data rows.

Source code in src/md_spreadsheet_parser/excel.py
def parse_excel_text(
    rows: list[list[str]],
    schema: ExcelParsingSchema = DEFAULT_EXCEL_SCHEMA,
) -> Table:
    """
    Parse a 2D string array into a Table with merged cell and header handling.

    Args:
        rows: 2D list of strings (e.g., from csv.reader or worksheet iteration).
        schema: Configuration for header processing.

    Returns:
        Table object with processed headers and data rows.
    """
    if not rows:
        return Table(headers=None, rows=[])

    if schema.header_rows == 1:
        # Single header row
        header_row = rows[0]
        if schema.fill_merged_headers:
            header_row = _forward_fill(header_row)
        headers = header_row
        data_rows = rows[1:]

    elif schema.header_rows == 2:
        # Two header rows: Parent-Child flattening
        if len(rows) < 2:
            # Not enough rows for 2-row header
            return Table(headers=rows[0] if rows else None, rows=[])

        parent_row = rows[0]
        child_row = rows[1]

        if schema.fill_merged_headers:
            parent_row = _forward_fill(parent_row)

        headers = _flatten_headers(parent_row, child_row, schema.header_separator)
        data_rows = rows[2:]

    else:
        # Should not reach here due to schema validation
        raise ValueError(f"Invalid header_rows: {schema.header_rows}")

    # Convert data_rows to list[list[str]] ensuring all are strings
    processed_rows = [[_safe_str(cell) for cell in row] for row in data_rows]

    return Table(headers=headers, rows=processed_rows)
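The helpers `_forward_fill` and `_flatten_headers` are internal and not shown here, so the following is a plausible sketch of their behavior — forward-filling merged header cells and joining parent/child headers — under assumed semantics, not the library's actual implementation:

```python
def forward_fill(cells: list[str]) -> list[str]:
    """Carry the last non-empty value into subsequent empty cells."""
    out, last = [], ""
    for cell in cells:
        if cell:
            last = cell
        out.append(cell or last)
    return out


def flatten_headers(parent: list[str], child: list[str], sep: str = " - ") -> list[str]:
    """Join parent/child header cells, skipping empty halves.

    Simplification: zip() silently truncates to the shorter row.
    """
    return [sep.join(x for x in (a, b) if x) for a, b in zip(parent, child)]
```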

parse_sheet(markdown, name, schema, start_line_offset=0)

Parse a sheet (section) containing one or more tables.

Source code in src/md_spreadsheet_parser/parsing.py
def parse_sheet(
    markdown: str,
    name: str,
    schema: MultiTableParsingSchema,
    start_line_offset: int = 0,
) -> Sheet:
    """
    Parse a sheet (section) containing one or more tables.
    """
    metadata: dict[str, Any] | None = None

    # Scan for sheet metadata
    # We prioritize the first match if multiple exist (though usually only one)
    metadata_match = re.search(
        r"^<!-- md-spreadsheet-sheet-metadata: (.*) -->$", markdown, re.MULTILINE
    )
    if metadata_match:
        try:
            metadata = json.loads(metadata_match.group(1))
        except json.JSONDecodeError:
            pass  # Ignore invalid JSON

    tables = _extract_tables(markdown, schema, start_line_offset)
    return Sheet(name=name, tables=tables, metadata=metadata)

parse_table(markdown, schema=DEFAULT_SCHEMA)

Parse a markdown table into a Table object.

Parameters:

Name Type Description Default
markdown str

The markdown string containing the table.

required
schema ParsingSchema

Configuration for parsing.

DEFAULT_SCHEMA

Returns:

Type Description
Table

Table object with headers and rows.

Source code in src/md_spreadsheet_parser/parsing.py
def parse_table(markdown: str, schema: ParsingSchema = DEFAULT_SCHEMA) -> Table:
    """
    Parse a markdown table into a Table object.

    Args:
        markdown: The markdown string containing the table.
        schema: Configuration for parsing.

    Returns:
        Table object with headers and rows.
    """
    lines = markdown.strip().split("\n")
    headers: list[str] | None = None
    rows: list[list[str]] = []
    alignments: list[AlignmentType] | None = None
    visual_metadata: dict | None = None

    # Buffer for a potential header row until a separator row confirms it
    potential_header: list[str] | None = None

    for line in lines:
        line = line.strip()
        if not line:
            continue

        # Check for metadata comment
        metadata_match = re.match(
            r"^<!-- md-spreadsheet-table-metadata: (.*) -->$", line
        )
        if metadata_match:
            try:
                json_content = metadata_match.group(1)
                visual_metadata = json.loads(json_content)
                continue
            except json.JSONDecodeError:
                # Invalid JSON: drop the line rather than letting it leak
                # into table data (graceful handling of manual edits)
                continue

        parsed_row = parse_row(line, schema)

        if parsed_row is None:
            continue

        if headers is None and potential_header is not None:
            detected_alignments = parse_separator_row(parsed_row, schema)
            if detected_alignments is not None:
                headers = potential_header
                alignments = detected_alignments
                potential_header = None
                continue
            else:
                # Previous row was not a header, treat as data
                rows.append(potential_header)
                potential_header = parsed_row
        elif headers is None and potential_header is None:
            potential_header = parsed_row
        else:
            rows.append(parsed_row)

    if potential_header is not None:
        rows.append(potential_header)

    # Normalize rows to match header length
    if headers:
        header_len = len(headers)
        normalized_rows = []
        for row in rows:
            if len(row) < header_len:
                # Pad with empty strings
                row.extend([""] * (header_len - len(row)))
            elif len(row) > header_len:
                # Truncate
                row = row[:header_len]
            normalized_rows.append(row)
        rows = normalized_rows

    metadata: dict[str, Any] = {"schema_used": str(schema)}
    if visual_metadata:
        metadata["visual"] = visual_metadata

    return Table(headers=headers, rows=rows, metadata=metadata, alignments=alignments)
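The row-normalization step above (pad short rows with empty strings, truncate long ones) in isolation; `normalize_row` is a hypothetical name:

```python
def normalize_row(row: list[str], header_len: int) -> list[str]:
    """Return row padded or truncated so that len(result) == header_len."""
    if len(row) < header_len:
        return row + [""] * (header_len - len(row))
    return row[:header_len]
```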

parse_table_from_file(source, schema=DEFAULT_SCHEMA)

Parse a markdown table from a file.

Parameters:

Name Type Description Default
source Union[str, Path, TextIO]

File path (str/Path) or file-like object.

required
schema ParsingSchema

Parsing configuration.

DEFAULT_SCHEMA
Source code in src/md_spreadsheet_parser/loader.py
def parse_table_from_file(
    source: Union[str, Path, TextIO], schema: ParsingSchema = DEFAULT_SCHEMA
) -> Table:
    """
    Parse a markdown table from a file.

    Args:
        source: File path (str/Path) or file-like object.
        schema: Parsing configuration.
    """
    content = _read_content(source)
    return parse_table(content, schema)

parse_workbook(markdown, schema=MultiTableParsingSchema())

Parse a markdown document into a Workbook.

Source code in src/md_spreadsheet_parser/parsing.py
def parse_workbook(
    markdown: str, schema: MultiTableParsingSchema = MultiTableParsingSchema()
) -> Workbook:
    """
    Parse a markdown document into a Workbook.
    """
    lines = markdown.split("\n")
    sheets: list[Sheet] = []
    metadata: dict[str, Any] | None = None

    # Scan for workbook metadata anywhere in the file and filter those lines
    # out so they don't interfere with sheet content
    filtered_lines: list[str] = []
    wb_metadata_pattern = re.compile(
        r"^<!-- md-spreadsheet-workbook-metadata: (.*) -->$"
    )

    for line in lines:
        stripped = line.strip()
        match = wb_metadata_pattern.match(stripped)
        if match:
            try:
                metadata = json.loads(match.group(1))
            except json.JSONDecodeError:
                pass
            # Skip adding this line to filtered_lines
        else:
            filtered_lines.append(line)

    lines = filtered_lines

    # Find root marker
    start_index = 0
    in_code_block = False
    if schema.root_marker:
        found = False
        for i, line in enumerate(lines):
            stripped = line.strip()
            if stripped.startswith("```"):
                in_code_block = not in_code_block

            if not in_code_block and stripped == schema.root_marker:
                start_index = i + 1
                found = True
                break
        if not found:
            return Workbook(sheets=[], metadata=metadata)

    # Split by sheet headers
    header_prefix = "#" * schema.sheet_header_level + " "

    current_sheet_name: str | None = None
    current_sheet_lines: list[str] = []
    current_sheet_start_line = start_index

    # Reset code block state for the second pass; the root-marker search
    # above already guaranteed the marker itself is not inside a code block
    in_code_block = False

    for idx, line in enumerate(lines[start_index:], start=start_index):
        stripped = line.strip()

        if stripped.startswith("```"):
            in_code_block = not in_code_block

        if in_code_block:
            # Just collect lines if we are in a sheet
            if current_sheet_name:
                current_sheet_lines.append(line)
            continue

        # Check if line is a header
        if stripped.startswith("#"):
            # Count header level
            level = 0
            for char in stripped:
                if char == "#":
                    level += 1
                else:
                    break

            # If header level is less than sheet_header_level (e.g. # vs ##),
            # it indicates a higher-level section, so we stop parsing the workbook.
            if level < schema.sheet_header_level:
                break

        if stripped.startswith(header_prefix):
            if current_sheet_name:
                sheet_content = "\n".join(current_sheet_lines)
                # current_sheet_lines holds the lines AFTER the header, so
                # the content offset is current_sheet_start_line + 1
                sheets.append(
                    parse_sheet(
                        sheet_content,
                        current_sheet_name,
                        schema,
                        start_line_offset=current_sheet_start_line + 1,
                    )
                )

            current_sheet_name = stripped[len(header_prefix) :].strip()
            current_sheet_lines = []
            current_sheet_start_line = idx
        else:
            if current_sheet_name:
                current_sheet_lines.append(line)

    if current_sheet_name:
        sheet_content = "\n".join(current_sheet_lines)
        sheets.append(
            parse_sheet(
                sheet_content,
                current_sheet_name,
                schema,
                start_line_offset=current_sheet_start_line + 1,
            )
        )

    return Workbook(sheets=sheets, metadata=metadata)
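The header-level counting used above to decide when a higher-level section ends the workbook can be factored out as a small function; `header_level` is a hypothetical name for illustration:

```python
def header_level(line: str) -> int:
    """Count the leading '#' characters of a (stripped) markdown line."""
    level = 0
    for ch in line.strip():
        if ch == "#":
            level += 1
        else:
            break
    return level
```

A `## Sheet1` line matches `sheet_header_level == 2`, while a `# Title` line (level 1 < 2) stops workbook parsing.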

parse_workbook_from_file(source, schema=MultiTableParsingSchema())

Parse a markdown workbook from a file.

Parameters:

Name Type Description Default
source Union[str, Path, TextIO]

File path (str/Path) or file-like object.

required
schema MultiTableParsingSchema

Parsing configuration.

MultiTableParsingSchema()
Source code in src/md_spreadsheet_parser/loader.py
def parse_workbook_from_file(
    source: Union[str, Path, TextIO],
    schema: MultiTableParsingSchema = MultiTableParsingSchema(),
) -> Workbook:
    """
    Parse a markdown workbook from a file.

    Args:
        source: File path (str/Path) or file-like object.
        schema: Parsing configuration.
    """
    content = _read_content(source)
    return parse_workbook(content, schema)

scan_tables(markdown, schema=None)

Scan a markdown document for all tables, ignoring sheet structure.

Parameters:

Name Type Description Default
markdown str

The markdown text.

required
schema MultiTableParsingSchema | None

Optional schema. If None, uses default MultiTableParsingSchema.

None

Returns:

Type Description
list[Table]

A list of Table objects found in the document.

Source code in src/md_spreadsheet_parser/parsing.py
def scan_tables(
    markdown: str, schema: MultiTableParsingSchema | None = None
) -> list[Table]:
    """
    Scan a markdown document for all tables, ignoring sheet structure.

    Args:
        markdown: The markdown text.
        schema: Optional schema. If None, uses default MultiTableParsingSchema.

    Returns:
        A list of Table objects found in the document.
    """
    if schema is None:
        schema = MultiTableParsingSchema()

    return _extract_tables(markdown, schema)
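Sheet structure aside, table scanning reduces to grouping contiguous non-blank lines into blocks and keeping the blocks that contain the column separator. A simplified pure-Python sketch of that idea (function name hypothetical, not the library's internal `_extract_tables`):

```python
def find_table_blocks(markdown: str, column_separator: str = "|") -> list[list[str]]:
    """Group non-blank lines into blocks and keep blocks that look like tables."""
    blocks: list[list[str]] = []
    current: list[str] = []
    # Trailing "" sentinel flushes the final block.
    for line in markdown.splitlines() + [""]:
        if line.strip():
            current.append(line)
        elif current:
            if any(column_separator in cell_line for cell_line in current):
                blocks.append(current)
            current = []
    return blocks

doc = "intro text\n\n| a | b |\n| - | - |\n| 1 | 2 |\n\nmore text\n"
```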

scan_tables_from_file(source, schema=None)

Scan a markdown file for all tables.

Parameters:

Name Type Description Default
source Union[str, Path, TextIO]

File path (str/Path) or file-like object.

required
schema MultiTableParsingSchema | None

Optional schema.

None
Source code in src/md_spreadsheet_parser/loader.py
def scan_tables_from_file(
    source: Union[str, Path, TextIO], schema: MultiTableParsingSchema | None = None
) -> list[Table]:
    """
    Scan a markdown file for all tables.

    Args:
        source: File path (str/Path) or file-like object.
        schema: Optional schema.
    """
    content = _read_content(source)
    return scan_tables(content, schema)

scan_tables_iter(source, schema=None)

Stream tables from a source (file path, file object, or iterable) one by one. This allows processing files larger than memory, provided that individual tables fit in memory.

Parameters:

Name Type Description Default
source Union[str, Path, TextIO, Iterable[str]]

File path, open file object, or iterable of strings.

required
schema MultiTableParsingSchema | None

Parsing configuration.

None

Yields:

Type Description
Table

Table objects found in the stream.

Source code in src/md_spreadsheet_parser/loader.py
def scan_tables_iter(
    source: Union[str, Path, TextIO, Iterable[str]],
    schema: MultiTableParsingSchema | None = None,
) -> Iterator[Table]:
    """
    Stream tables from a source (file path, file object, or iterable) one by one.
    This allows processing files larger than memory, provided that individual tables fit in memory.

    Args:
        source: File path, open file object, or iterable of strings.
        schema: Parsing configuration.

    Yields:
        Table objects found in the stream.
    """
    if schema is None:
        schema = MultiTableParsingSchema()

    header_prefix = None
    if schema.table_header_level is not None:
        header_prefix = "#" * schema.table_header_level + " "

    current_lines: list[str] = []
    current_name: str | None = None
    # We track line number manually for metadata
    current_line_idx = 0
    # Start of the current block
    block_start_line = 0

    def parse_and_yield(
        lines: list[str], name: str | None, start_offset: int
    ) -> Iterator[Table]:
        if not lines:
            return

        # Check if block looks like a table (has separator)
        block_text = "".join(lines)

        if schema.column_separator not in block_text:
            return

        # Extraction logic mirrors process_table_block, reusing parse_table.

        # Split description lines from table lines, stripping trailing newlines first.
        stripped_lines = [line_val.rstrip("\n") for line_val in lines]

        table_start_idx = -1
        for idx, line in enumerate(stripped_lines):
            if schema.column_separator in line:
                table_start_idx = idx
                break

        if table_start_idx != -1:
            desc_lines = stripped_lines[:table_start_idx]
            table_lines = stripped_lines[table_start_idx:]

            table_text = "\n".join(table_lines)
            table = parse_table(table_text, schema)

            if table.rows or table.headers:
                description = None
                if schema.capture_description:
                    desc_text = "\n".join(d.strip() for d in desc_lines if d.strip())
                    if desc_text:
                        description = desc_text

                table = replace(
                    table,
                    name=name,
                    description=description,
                    start_line=start_offset + table_start_idx,
                    end_line=start_offset + len(lines),
                )
                yield table

    for line in _iter_lines(source):
        # normalize: file iter yields line with \n
        stripped_line = line.strip()

        is_header = header_prefix and stripped_line.startswith(header_prefix)

        if is_header:
            # New section starts. Yield previous buffer if any.
            yield from parse_and_yield(current_lines, current_name, block_start_line)

            assert header_prefix is not None
            current_name = stripped_line[len(header_prefix) :].strip()
            current_lines = []
            block_start_line = current_line_idx

        elif stripped_line == "":
            # Blank line.
            yield from parse_and_yield(current_lines, current_name, block_start_line)
            current_lines = []
            # block_start_line for NEXT block will be current_line_idx + 1
            block_start_line = current_line_idx + 1

        else:
            current_lines.append(line)

        current_line_idx += 1

    # End of stream
    yield from parse_and_yield(current_lines, current_name, block_start_line)
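The streaming behavior above comes down to buffering one blank-line-delimited block at a time and yielding it before starting the next, so only the current block is ever held in memory. A stripped-down sketch of that buffering pattern (names hypothetical):

```python
from typing import Iterable, Iterator

def iter_blocks(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield blank-line-delimited blocks, holding only one block in memory."""
    buffer: list[str] = []
    for raw in lines:
        if raw.strip():
            buffer.append(raw.rstrip("\n"))
        elif buffer:
            yield buffer
            buffer = []
    if buffer:  # flush the final block at end of stream
        yield buffer

blocks = list(iter_blocks(["| a |\n", "| - |\n", "\n", "text\n"]))
```

Because the source can be any iterable of strings, the same generator works over an open file, a socket, or a list in tests.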

md_spreadsheet_parser.schemas

ConversionSchema dataclass

Configuration for converting string values to Python types.

Attributes:

Name Type Description
boolean_pairs tuple[tuple[str, str], ...]

Pairs of strings representing (True, False). Case-insensitive. Example: (("yes", "no"), ("on", "off")).

custom_converters dict[type, Callable[[str], Any]]

Dictionary mapping ANY Python type to a conversion function str -> Any. You can specify: - Built-in types: int, float, bool (to override default behavior) - Standard library types: Decimal, datetime, date, ZoneInfo - Custom classes: MyClass, Product

field_converters dict[str, Callable[[str], Any]]

Dictionary mapping field names (str) to conversion functions. Takes precedence over custom_converters.

Source code in src/md_spreadsheet_parser/schemas.py
@dataclass(frozen=True)
class ConversionSchema:
    """
    Configuration for converting string values to Python types.

    Attributes:
        boolean_pairs: Pairs of strings representing (True, False). Case-insensitive.
                       Example: `(("yes", "no"), ("on", "off"))`.
        custom_converters: Dictionary mapping ANY Python type to a conversion function `str -> Any`.
                           You can specify:
                           - Built-in types: `int`, `float`, `bool` (to override default behavior)
                           - Standard library types: `Decimal`, `datetime`, `date`, `ZoneInfo`
                           - Custom classes: `MyClass`, `Product`
        field_converters: Dictionary mapping field names (str) to conversion functions.
                          Takes precedence over `custom_converters`.
    """

    boolean_pairs: tuple[tuple[str, str], ...] = (
        ("true", "false"),
        ("yes", "no"),
        ("1", "0"),
        ("on", "off"),
    )
    custom_converters: dict[type, Callable[[str], Any]] = field(default_factory=dict)
    field_converters: dict[str, Callable[[str], Any]] = field(default_factory=dict)

ExcelParsingSchema dataclass

Configuration for parsing Excel-exported data (TSV/CSV or openpyxl).

Attributes:

Name Type Description
header_rows int

Number of header rows (1 or 2). If 2, headers are flattened to "Parent - Child" format.

fill_merged_headers bool

Whether to forward-fill empty header cells (for merged cells in Excel exports).

delimiter str

Column separator for TSV/CSV parsing. Default is tab.

header_separator str

Separator used when flattening 2-row headers.

Source code in src/md_spreadsheet_parser/schemas.py
@dataclass(frozen=True)
class ExcelParsingSchema:
    """
    Configuration for parsing Excel-exported data (TSV/CSV or openpyxl).

    Attributes:
        header_rows: Number of header rows (1 or 2).
                     If 2, headers are flattened to "Parent - Child" format.
        fill_merged_headers: Whether to forward-fill empty header cells
                             (for merged cells in Excel exports).
        delimiter: Column separator for TSV/CSV parsing. Default is tab.
        header_separator: Separator used when flattening 2-row headers.
    """

    header_rows: int = 1
    fill_merged_headers: bool = True
    delimiter: str = "\t"
    header_separator: str = " - "

    def __post_init__(self):
        if self.header_rows not in (1, 2):
            raise ValueError("header_rows must be 1 or 2")
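With `header_rows=2` and `fill_merged_headers=True`, empty parent cells (left behind by merged cells in Excel exports) are forward-filled before the two rows are joined with `header_separator`. A sketch of that flattening step under those assumptions (function name hypothetical):

```python
def flatten_headers(parent_row, child_row, sep=" - "):
    """Forward-fill merged parent cells, then join parent and child."""
    filled, last = [], ""
    for cell in parent_row:
        last = cell if cell else last  # carry the last non-empty parent forward
        filled.append(last)
    return [f"{p}{sep}{c}" if p and c else (p or c)
            for p, c in zip(filled, child_row)]

headers = flatten_headers(["Q1", "", "Q2", ""], ["Rev", "Cost", "Rev", "Cost"])
```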

MultiTableParsingSchema dataclass

Bases: ParsingSchema

Configuration for parsing multiple tables (workbook mode). Inherits from ParsingSchema.

Attributes:

Name Type Description
root_marker str

The marker indicating the start of the data section. Defaults to "# Tables".

sheet_header_level int

The markdown header level for sheets. Defaults to 2 (e.g. ## Sheet).

table_header_level int | None

The markdown header level for tables. If None, table names are not extracted. Defaults to None.

capture_description bool

Whether to capture text between the table header and the table as a description. Defaults to False.

Source code in src/md_spreadsheet_parser/schemas.py
@dataclass(frozen=True)
class MultiTableParsingSchema(ParsingSchema):
    """
    Configuration for parsing multiple tables (workbook mode).
    Inherits from ParsingSchema.

    Attributes:
        root_marker (str): The marker indicating the start of the data section. Defaults to "# Tables".
        sheet_header_level (int): The markdown header level for sheets. Defaults to 2 (e.g. `## Sheet`).
        table_header_level (int | None): The markdown header level for tables. If None, table names are not extracted. Defaults to None.
        capture_description (bool): Whether to capture text between the table header and the table as a description. Defaults to False.
    """

    root_marker: str = "# Tables"
    sheet_header_level: int = 2
    table_header_level: int | None = 3
    capture_description: bool = True

    def __post_init__(self):
        if self.capture_description and self.table_header_level is None:
            raise ValueError(
                "capture_description=True requires table_header_level to be set"
            )

ParsingSchema dataclass

Configuration for parsing markdown tables. Designed to be immutable and passed to pure functions.

Attributes:

Name Type Description
column_separator str

Character used to separate columns. Defaults to "|".

header_separator_char str

Character used in the separator row. Defaults to "-".

require_outer_pipes bool

Whether tables must have outer pipes (e.g. | col |). Defaults to True.

strip_whitespace bool

Whether to strip whitespace from cell values. Defaults to True.

Source code in src/md_spreadsheet_parser/schemas.py
@dataclass(frozen=True)
class ParsingSchema:
    """
    Configuration for parsing markdown tables.
    Designed to be immutable and passed to pure functions.

    Attributes:
        column_separator (str): Character used to separate columns. Defaults to "|".
        header_separator_char (str): Character used in the separator row. Defaults to "-".
        require_outer_pipes (bool): Whether tables must have outer pipes (e.g. `| col |`). Defaults to True.
        strip_whitespace (bool): Whether to strip whitespace from cell values. Defaults to True.
    """

    column_separator: str = "|"
    header_separator_char: str = "-"
    require_outer_pipes: bool = True
    strip_whitespace: bool = True
    convert_br_to_newline: bool = True
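The `require_outer_pipes` and `strip_whitespace` options describe how a single row line is tokenized: outer pipes are dropped if present, then each cell is optionally stripped. A sketch of that cell-splitting step (the `split_row` helper is illustrative, not the library's internal parser):

```python
def split_row(line: str, sep: str = "|", strip: bool = True) -> list[str]:
    """Split a table row into cells, dropping the outer pipes if present."""
    line = line.strip()
    if line.startswith(sep):
        line = line[len(sep):]
    if line.endswith(sep):
        line = line[:-len(sep)]
    cells = line.split(sep)
    return [c.strip() for c in cells] if strip else cells

cells = split_row("| Name | Qty |")
```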

md_spreadsheet_parser.models

Sheet dataclass

Represents a single sheet containing tables.

Attributes:

Name Type Description
name str

Name of the sheet.

tables list[Table]

List of tables contained in this sheet.

metadata dict[str, Any] | None

Arbitrary metadata (e.g. layout). Defaults to None.

Source code in src/md_spreadsheet_parser/models.py
@dataclass(frozen=True)
class Sheet:
    """
    Represents a single sheet containing tables.

    Attributes:
        name (str): Name of the sheet.
        tables (list[Table]): List of tables contained in this sheet.
        metadata (dict[str, Any] | None): Arbitrary metadata (e.g. layout). Defaults to None.
    """

    name: str
    tables: list[Table]
    metadata: dict[str, Any] | None = None

    def __post_init__(self):
        if self.metadata is None:
            # Hack to allow default value for mutable type in frozen dataclass
            object.__setattr__(self, "metadata", {})

    @property
    def json(self) -> SheetJSON:
        """
        Returns a JSON-compatible dictionary representation of the sheet.

        Returns:
            SheetJSON: A dictionary containing the sheet data.
        """
        return {
            "name": self.name,
            "tables": [t.json for t in self.tables],
            "metadata": self.metadata if self.metadata is not None else {},
        }

    def get_table(self, name: str) -> Table | None:
        """
        Retrieve a table by its name.

        Args:
            name (str): The name of the table to retrieve.

        Returns:
            Table | None: The table object if found, otherwise None.
        """
        for table in self.tables:
            if table.name == name:
                return table
        return None

    def to_markdown(self, schema: ParsingSchema = DEFAULT_SCHEMA) -> str:
        """
        Generates a Markdown string representation of the sheet.

        Args:
            schema (ParsingSchema, optional): Configuration for formatting.

        Returns:
            str: The Markdown string.
        """
        return generate_sheet_markdown(self, schema)

json property

Returns a JSON-compatible dictionary representation of the sheet.

Returns:

Name Type Description
SheetJSON SheetJSON

A dictionary containing the sheet data.

get_table(name)

Retrieve a table by its name.

Parameters:

Name Type Description Default
name str

The name of the table to retrieve.

required

Returns:

Type Description
Table | None

Table | None: The table object if found, otherwise None.

Source code in src/md_spreadsheet_parser/models.py
def get_table(self, name: str) -> Table | None:
    """
    Retrieve a table by its name.

    Args:
        name (str): The name of the table to retrieve.

    Returns:
        Table | None: The table object if found, otherwise None.
    """
    for table in self.tables:
        if table.name == name:
            return table
    return None

to_markdown(schema=DEFAULT_SCHEMA)

Generates a Markdown string representation of the sheet.

Parameters:

Name Type Description Default
schema ParsingSchema

Configuration for formatting.

DEFAULT_SCHEMA

Returns:

Name Type Description
str str

The Markdown string.

Source code in src/md_spreadsheet_parser/models.py
def to_markdown(self, schema: ParsingSchema = DEFAULT_SCHEMA) -> str:
    """
    Generates a Markdown string representation of the sheet.

    Args:
        schema (ParsingSchema, optional): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    return generate_sheet_markdown(self, schema)

SheetJSON

Bases: TypedDict

JSON-compatible dictionary representation of a Sheet.

Source code in src/md_spreadsheet_parser/models.py
class SheetJSON(TypedDict):
    """
    JSON-compatible dictionary representation of a Sheet.
    """

    name: str
    tables: list[TableJSON]
    metadata: dict[str, Any]

Table dataclass

Represents a parsed table with optional metadata.

Attributes:

Name Type Description
headers list[str] | None

List of column headers, or None if the table has no headers.

rows list[list[str]]

List of data rows.

alignments list[AlignmentType] | None

List of column alignments ('left', 'center', 'right'). Defaults to None.

name str | None

Name of the table (e.g. from a header). Defaults to None.

description str | None

Description of the table. Defaults to None.

metadata dict[str, Any] | None

Arbitrary metadata. Defaults to None.

Source code in src/md_spreadsheet_parser/models.py
@dataclass(frozen=True)
class Table:
    """
    Represents a parsed table with optional metadata.

    Attributes:
        headers (list[str] | None): List of column headers, or None if the table has no headers.
        rows (list[list[str]]): List of data rows.
        alignments (list[AlignmentType] | None): List of column alignments ('left', 'center', 'right'). Defaults to None.
        name (str | None): Name of the table (e.g. from a header). Defaults to None.
        description (str | None): Description of the table. Defaults to None.
        metadata (dict[str, Any] | None): Arbitrary metadata. Defaults to None.
    """

    headers: list[str] | None
    rows: list[list[str]]
    alignments: list[AlignmentType] | None = None
    name: str | None = None
    description: str | None = None
    metadata: dict[str, Any] | None = None
    start_line: int | None = None
    end_line: int | None = None

    def __post_init__(self):
        if self.metadata is None:
            # Hack to allow default value for mutable type in frozen dataclass
            object.__setattr__(self, "metadata", {})

    @property
    def json(self) -> TableJSON:
        """
        Returns a JSON-compatible dictionary representation of the table.

        Returns:
            TableJSON: A dictionary containing the table data.
        """
        return {
            "name": self.name,
            "description": self.description,
            "headers": self.headers,
            "rows": self.rows,
            "metadata": self.metadata if self.metadata is not None else {},
            "start_line": self.start_line,
            "end_line": self.end_line,
            "alignments": self.alignments,
        }

    def to_models(
        self,
        schema_cls: type[T],
        conversion_schema: ConversionSchema = DEFAULT_CONVERSION_SCHEMA,
    ) -> list[T]:
        """
        Converts the table rows into a list of dataclass instances, performing validation and type conversion.

        Args:
            schema_cls (type[T]): The dataclass type to validate against.
            conversion_schema (ConversionSchema, optional): Configuration for type conversion.

        Returns:
            list[T]: A list of validated dataclass instances.

        Raises:
            ValueError: If schema_cls is not a dataclass.
            TableValidationError: If validation fails for any row or if the table has no headers.
        """
        return validate_table(self, schema_cls, conversion_schema)

    def to_markdown(self, schema: ParsingSchema = DEFAULT_SCHEMA) -> str:
        """
        Generates a Markdown string representation of the table.

        Args:
            schema (ParsingSchema, optional): Configuration for formatting.

        Returns:
            str: The Markdown string.
        """
        return generate_table_markdown(self, schema)

    def update_cell(self, row_idx: int, col_idx: int, value: str) -> "Table":
        """
        Return a new Table with the specified cell updated.
        """
        # Handle header update
        if row_idx == -1:
            if self.headers is None:
                # Determine width from rows if possible, or start fresh
                width = len(self.rows[0]) if self.rows else (col_idx + 1)
                new_headers = [""] * width
                # Ensure width enough
                if col_idx >= len(new_headers):
                    new_headers.extend([""] * (col_idx - len(new_headers) + 1))
            else:
                new_headers = list(self.headers)
                if col_idx >= len(new_headers):
                    new_headers.extend([""] * (col_idx - len(new_headers) + 1))

            # Update alignments if headers grew
            new_alignments = list(self.alignments) if self.alignments else []
            if len(new_headers) > len(new_alignments):
                # If alignments is None it stays None; if it is set,
                # expand it with "default" entries up to the new width.
                if self.alignments is not None:
                    # A typed list keeps the invariant list[AlignmentType] for strict checkers.
                    extension: list[AlignmentType] = ["default"] * (
                        len(new_headers) - len(new_alignments)
                    )
                    new_alignments.extend(extension)

            final_alignments = new_alignments if self.alignments is not None else None

            new_headers[col_idx] = value

            return replace(self, headers=new_headers, alignments=final_alignments)

        # Handle Body update
        # 1. Ensure row exists
        new_rows = [list(r) for r in self.rows]

        # Grow rows if needed
        if row_idx >= len(new_rows):
            # Calculate width
            width = (
                len(self.headers)
                if self.headers
                else (len(new_rows[0]) if new_rows else 0)
            )
            if width == 0:
                width = col_idx + 1  # At least cover the new cell

            rows_to_add = row_idx - len(new_rows) + 1
            for _ in range(rows_to_add):
                new_rows.append([""] * width)

        # If columns expanded due to row update, we might need to expand alignments too
        current_width = len(new_rows[0]) if new_rows else 0
        if col_idx >= current_width:
            # This means we are expanding columns
            if self.alignments is not None:
                width_needed = col_idx + 1
                current_align_len = len(self.alignments)
                if width_needed > current_align_len:
                    new_alignments = list(self.alignments)
                    extension: list[AlignmentType] = ["default"] * (
                        width_needed - current_align_len
                    )
                    new_alignments.extend(extension)
                    return replace(
                        self,
                        rows=self._update_rows_cell(new_rows, row_idx, col_idx, value),
                        alignments=new_alignments,
                    )

        return replace(
            self, rows=self._update_rows_cell(new_rows, row_idx, col_idx, value)
        )

    def _update_rows_cell(self, new_rows, row_idx, col_idx, value):
        target_row = new_rows[row_idx]
        if col_idx >= len(target_row):
            target_row.extend([""] * (col_idx - len(target_row) + 1))
        target_row[col_idx] = value
        return new_rows

    def delete_row(self, row_idx: int) -> "Table":
        """
        Return a new Table with the row at index removed.
        """
        new_rows = [list(r) for r in self.rows]
        if 0 <= row_idx < len(new_rows):
            new_rows.pop(row_idx)
        return replace(self, rows=new_rows)

    def delete_column(self, col_idx: int) -> "Table":
        """
        Return a new Table with the column at index removed.
        """
        new_headers = list(self.headers) if self.headers else None
        if new_headers and 0 <= col_idx < len(new_headers):
            new_headers.pop(col_idx)

        new_rows = []
        for row in self.rows:
            new_row = list(row)
            if 0 <= col_idx < len(new_row):
                new_row.pop(col_idx)
            new_rows.append(new_row)

        new_alignments = None
        if self.alignments is not None:
            new_alignments = list(self.alignments)
            if 0 <= col_idx < len(new_alignments):
                new_alignments.pop(col_idx)

        return replace(
            self, headers=new_headers, rows=new_rows, alignments=new_alignments
        )

    def clear_column_data(self, col_idx: int) -> "Table":
        """
        Return a new Table with data in the specified column cleared (set to empty string),
        but headers and column structure preserved.
        """
        # Headers remain unchanged

        new_rows = []
        for row in self.rows:
            new_row = list(row)
            if 0 <= col_idx < len(new_row):
                new_row[col_idx] = ""
            new_rows.append(new_row)

        return replace(self, rows=new_rows)

    def insert_row(self, row_idx: int) -> "Table":
        """
        Return a new Table with an empty row inserted at row_idx.
        Subsequent rows are shifted down.
        """
        new_rows = [list(r) for r in self.rows]

        # Determine width
        width = (
            len(self.headers) if self.headers else (len(new_rows[0]) if new_rows else 0)
        )
        if width == 0:
            width = 1  # Default to 1 column if table is empty

        new_row = [""] * width

        if row_idx < 0:
            row_idx = 0
        if row_idx > len(new_rows):
            row_idx = len(new_rows)

        new_rows.insert(row_idx, new_row)
        return replace(self, rows=new_rows)

    def insert_column(self, col_idx: int) -> "Table":
        """
        Return a new Table with an empty column inserted at col_idx.
        Subsequent columns are shifted right.
        """
        new_headers = list(self.headers) if self.headers else None

        if new_headers:
            if col_idx < 0:
                col_idx = 0
            if col_idx > len(new_headers):
                col_idx = len(new_headers)
            new_headers.insert(col_idx, "")

        new_alignments = None
        if self.alignments is not None:
            new_alignments = list(self.alignments)
        # Pad alignments up to col_idx before inserting, if needed.
            if col_idx > len(new_alignments):
                extension: list[AlignmentType] = ["default"] * (
                    col_idx - len(new_alignments)
                )
                new_alignments.extend(extension)
            new_alignments.insert(col_idx, "default")  # Default alignment

        new_rows = []
        for row in self.rows:
            new_row = list(row)
            # list.insert appends when the index exceeds the length,
            # so pad shorter rows up to col_idx first.
            current_len = len(new_row)
            target_idx = col_idx
            if target_idx > current_len:
                # Pad up to target
                new_row.extend([""] * (target_idx - current_len))
                target_idx = len(new_row)  # Append

            new_row.insert(target_idx, "")
            new_rows.append(new_row)

        return replace(
            self, headers=new_headers, rows=new_rows, alignments=new_alignments
        )
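Because `Table` is a frozen dataclass, every editing method above returns a new instance instead of mutating in place. The same copy-on-write pattern can be sketched on a bare list-of-lists grid (this standalone `update_cell` is illustrative, not the `Table` method itself):

```python
def update_cell(rows, row_idx, col_idx, value):
    """Return a new grid with one cell changed, growing it as needed."""
    width = max((len(r) for r in rows), default=0)
    width = max(width, col_idx + 1)
    # Copy every row so the caller's grid is never mutated.
    new_rows = [list(r) + [""] * (width - len(r)) for r in rows]
    while row_idx >= len(new_rows):
        new_rows.append([""] * width)
    new_rows[row_idx][col_idx] = value
    return new_rows

grid = [["a"]]
grid2 = update_cell(grid, 1, 1, "x")
```

Keeping edits pure makes undo/redo trivial: earlier grids remain valid snapshots, which is presumably why the `Table` methods follow the same `dataclasses.replace` pattern.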

json property

Returns a JSON-compatible dictionary representation of the table.

Returns:

Name Type Description
TableJSON TableJSON

A dictionary containing the table data.

clear_column_data(col_idx)

Return a new Table with data in the specified column cleared (set to empty string), but headers and column structure preserved.

Source code in src/md_spreadsheet_parser/models.py
def clear_column_data(self, col_idx: int) -> "Table":
    """
    Return a new Table with data in the specified column cleared (set to empty string),
    but headers and column structure preserved.
    """
    # Headers remain unchanged

    new_rows = []
    for row in self.rows:
        new_row = list(row)
        if 0 <= col_idx < len(new_row):
            new_row[col_idx] = ""
        new_rows.append(new_row)

    return replace(self, rows=new_rows)

delete_column(col_idx)

Return a new Table with the column at index removed.

Source code in src/md_spreadsheet_parser/models.py
def delete_column(self, col_idx: int) -> "Table":
    """
    Return a new Table with the column at index removed.
    """
    new_headers = list(self.headers) if self.headers else None
    if new_headers and 0 <= col_idx < len(new_headers):
        new_headers.pop(col_idx)

    new_rows = []
    for row in self.rows:
        new_row = list(row)
        if 0 <= col_idx < len(new_row):
            new_row.pop(col_idx)
        new_rows.append(new_row)

    new_alignments = None
    if self.alignments is not None:
        new_alignments = list(self.alignments)
        if 0 <= col_idx < len(new_alignments):
            new_alignments.pop(col_idx)

    return replace(
        self, headers=new_headers, rows=new_rows, alignments=new_alignments
    )

delete_row(row_idx)

Return a new Table with the row at index removed.

Source code in src/md_spreadsheet_parser/models.py
def delete_row(self, row_idx: int) -> "Table":
    """
    Return a new Table with the row at index removed.
    """
    new_rows = [list(r) for r in self.rows]
    if 0 <= row_idx < len(new_rows):
        new_rows.pop(row_idx)
    return replace(self, rows=new_rows)

insert_column(col_idx)

Return a new Table with an empty column inserted at col_idx. Subsequent columns are shifted right.

Source code in src/md_spreadsheet_parser/models.py, lines 301-345
def insert_column(self, col_idx: int) -> "Table":
    """
    Return a new Table with an empty column inserted at col_idx.
    Subsequent columns are shifted right.
    """
    new_headers = list(self.headers) if self.headers else None

    if new_headers:
        if col_idx < 0:
            col_idx = 0
        if col_idx > len(new_headers):
            col_idx = len(new_headers)
        new_headers.insert(col_idx, "")

    new_alignments = None
    if self.alignments is not None:
        new_alignments = list(self.alignments)
        # Pad alignments with "default" entries if the insertion index is past the end
        if col_idx > len(new_alignments):
            extension: list[AlignmentType] = ["default"] * (
                col_idx - len(new_alignments)
            )
            new_alignments.extend(extension)
        new_alignments.insert(col_idx, "default")  # Default alignment

    new_rows = []
    for row in self.rows:
        new_row = list(row)
        # list.insert would simply append when the index exceeds the row length,
        # so pad short rows with empty cells first to keep the column position aligned.
        current_len = len(new_row)
        target_idx = col_idx
        if target_idx > current_len:
            # Pad up to target
            new_row.extend([""] * (target_idx - current_len))
            target_idx = len(new_row)  # Append

        new_row.insert(target_idx, "")
        new_rows.append(new_row)

    return replace(
        self, headers=new_headers, rows=new_rows, alignments=new_alignments
    )

insert_row(row_idx)

Return a new Table with an empty row inserted at row_idx. Subsequent rows are shifted down.

Source code in src/md_spreadsheet_parser/models.py, lines 277-299
def insert_row(self, row_idx: int) -> "Table":
    """
    Return a new Table with an empty row inserted at row_idx.
    Subsequent rows are shifted down.
    """
    new_rows = [list(r) for r in self.rows]

    # Determine width
    width = (
        len(self.headers) if self.headers else (len(new_rows[0]) if new_rows else 0)
    )
    if width == 0:
        width = 1  # Default to 1 column if table is empty

    new_row = [""] * width

    if row_idx < 0:
        row_idx = 0
    if row_idx > len(new_rows):
        row_idx = len(new_rows)

    new_rows.insert(row_idx, new_row)
    return replace(self, rows=new_rows)

to_markdown(schema=DEFAULT_SCHEMA)

Generates a Markdown string representation of the table.

Parameters:

Name Type Description Default
schema ParsingSchema

Configuration for formatting.

DEFAULT_SCHEMA

Returns:

Name Type Description
str str

The Markdown string.

Source code in src/md_spreadsheet_parser/models.py, lines 125-135
def to_markdown(self, schema: ParsingSchema = DEFAULT_SCHEMA) -> str:
    """
    Generates a Markdown string representation of the table.

    Args:
        schema (ParsingSchema, optional): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    return generate_table_markdown(self, schema)

to_models(schema_cls, conversion_schema=DEFAULT_CONVERSION_SCHEMA)

Converts the table rows into a list of dataclass instances, performing validation and type conversion.

Parameters:

Name Type Description Default
schema_cls type[T]

The dataclass type to validate against.

required
conversion_schema ConversionSchema

Configuration for type conversion.

DEFAULT_CONVERSION_SCHEMA

Returns:

Type Description
list[T]

list[T]: A list of validated dataclass instances.

Raises:

Type Description
ValueError

If schema_cls is not a dataclass.

TableValidationError

If validation fails for any row or if the table has no headers.

Source code in src/md_spreadsheet_parser/models.py, lines 104-123
def to_models(
    self,
    schema_cls: type[T],
    conversion_schema: ConversionSchema = DEFAULT_CONVERSION_SCHEMA,
) -> list[T]:
    """
    Converts the table rows into a list of dataclass instances, performing validation and type conversion.

    Args:
        schema_cls (type[T]): The dataclass type to validate against.
        conversion_schema (ConversionSchema, optional): Configuration for type conversion.

    Returns:
        list[T]: A list of validated dataclass instances.

    Raises:
        ValueError: If schema_cls is not a dataclass.
        TableValidationError: If validation fails for any row or if the table has no headers.
    """
    return validate_table(self, schema_cls, conversion_schema)

update_cell(row_idx, col_idx, value)

Return a new Table with the specified cell updated.

Source code in src/md_spreadsheet_parser/models.py, lines 137-218
def update_cell(self, row_idx: int, col_idx: int, value: str) -> "Table":
    """
    Return a new Table with the specified cell updated.
    """
    # Handle header update
    if row_idx == -1:
        if self.headers is None:
            # Determine width from rows if possible, or start fresh
            width = len(self.rows[0]) if self.rows else (col_idx + 1)
            new_headers = [""] * width
            # Grow the header row if col_idx is past the current end
            if col_idx >= len(new_headers):
                new_headers.extend([""] * (col_idx - len(new_headers) + 1))
        else:
            new_headers = list(self.headers)
            if col_idx >= len(new_headers):
                new_headers.extend([""] * (col_idx - len(new_headers) + 1))

        # Update alignments if headers grew
        new_alignments = list(self.alignments) if self.alignments else []
        if len(new_headers) > len(new_alignments):
            # Expand alignments to match the new header width. If alignments
            # was None it stays None; only an existing list is extended.
            if self.alignments is not None:
                # A typed list keeps the extension compatible with list[AlignmentType]
                extension: list[AlignmentType] = ["default"] * (
                    len(new_headers) - len(new_alignments)
                )
                new_alignments.extend(extension)

        final_alignments = new_alignments if self.alignments is not None else None

        new_headers[col_idx] = value

        return replace(self, headers=new_headers, alignments=final_alignments)

    # Handle Body update
    # 1. Ensure row exists
    new_rows = [list(r) for r in self.rows]

    # Grow rows if needed
    if row_idx >= len(new_rows):
        # Calculate width
        width = (
            len(self.headers)
            if self.headers
            else (len(new_rows[0]) if new_rows else 0)
        )
        if width == 0:
            width = col_idx + 1  # At least cover the new cell

        rows_to_add = row_idx - len(new_rows) + 1
        for _ in range(rows_to_add):
            new_rows.append([""] * width)

    # If columns expanded due to row update, we might need to expand alignments too
    current_width = len(new_rows[0]) if new_rows else 0
    if col_idx >= current_width:
        # This means we are expanding columns
        if self.alignments is not None:
            width_needed = col_idx + 1
            current_align_len = len(self.alignments)
            if width_needed > current_align_len:
                new_alignments = list(self.alignments)
                extension: list[AlignmentType] = ["default"] * (
                    width_needed - current_align_len
                )
                new_alignments.extend(extension)
                return replace(
                    self,
                    rows=self._update_rows_cell(new_rows, row_idx, col_idx, value),
                    alignments=new_alignments,
                )

    return replace(
        self, rows=self._update_rows_cell(new_rows, row_idx, col_idx, value)
    )

TableJSON

Bases: TypedDict

JSON-compatible dictionary representation of a Table.

Source code in src/md_spreadsheet_parser/models.py, lines 23-35
class TableJSON(TypedDict):
    """
    JSON-compatible dictionary representation of a Table.
    """

    name: str | None
    description: str | None
    headers: list[str] | None
    rows: list[list[str]]
    metadata: dict[str, Any]
    start_line: int | None
    end_line: int | None
    alignments: list[AlignmentType] | None

Workbook dataclass

Represents a collection of sheets (multi-table output).

Attributes:

Name Type Description
sheets list[Sheet]

List of sheets in the workbook.

metadata dict[str, Any] | None

Arbitrary metadata. Defaults to None.

Source code in src/md_spreadsheet_parser/models.py, lines 410-491
@dataclass(frozen=True)
class Workbook:
    """
    Represents a collection of sheets (multi-table output).

    Attributes:
        sheets (list[Sheet]): List of sheets in the workbook.
        metadata (dict[str, Any] | None): Arbitrary metadata. Defaults to None.
    """

    sheets: list[Sheet]
    metadata: dict[str, Any] | None = None

    def __post_init__(self):
        if self.metadata is None:
            # Hack to allow default value for mutable type in frozen dataclass
            object.__setattr__(self, "metadata", {})

    @property
    def json(self) -> WorkbookJSON:
        """
        Returns a JSON-compatible dictionary representation of the workbook.

        Returns:
            WorkbookJSON: A dictionary containing the workbook data.
        """
        return {
            "sheets": [s.json for s in self.sheets],
            "metadata": self.metadata if self.metadata is not None else {},
        }

    def get_sheet(self, name: str) -> Sheet | None:
        """
        Retrieve a sheet by its name.

        Args:
            name (str): The name of the sheet to retrieve.

        Returns:
            Sheet | None: The sheet object if found, otherwise None.
        """
        for sheet in self.sheets:
            if sheet.name == name:
                return sheet
        return None

    def to_markdown(self, schema: MultiTableParsingSchema) -> str:
        """
        Generates a Markdown string representation of the workbook.

        Args:
            schema (MultiTableParsingSchema): Configuration for formatting.

        Returns:
            str: The Markdown string.
        """
        return generate_workbook_markdown(self, schema)

    def add_sheet(self, name: str) -> "Workbook":
        """
        Return a new Workbook with a new sheet added.
        """
        # Create new sheet with one empty table as default
        new_table = Table(headers=["A", "B", "C"], rows=[["", "", ""]])
        new_sheet = Sheet(name=name, tables=[new_table])

        new_sheets = list(self.sheets)
        new_sheets.append(new_sheet)

        return replace(self, sheets=new_sheets)

    def delete_sheet(self, index: int) -> "Workbook":
        """
        Return a new Workbook with the sheet at index removed.
        """
        if index < 0 or index >= len(self.sheets):
            raise IndexError("Sheet index out of range")

        new_sheets = list(self.sheets)
        new_sheets.pop(index)

        return replace(self, sheets=new_sheets)

json property

Returns a JSON-compatible dictionary representation of the workbook.

Returns:

Name Type Description
WorkbookJSON WorkbookJSON

A dictionary containing the workbook data.

add_sheet(name)

Return a new Workbook with a new sheet added.

Source code in src/md_spreadsheet_parser/models.py, lines 468-479
def add_sheet(self, name: str) -> "Workbook":
    """
    Return a new Workbook with a new sheet added.
    """
    # Create new sheet with one empty table as default
    new_table = Table(headers=["A", "B", "C"], rows=[["", "", ""]])
    new_sheet = Sheet(name=name, tables=[new_table])

    new_sheets = list(self.sheets)
    new_sheets.append(new_sheet)

    return replace(self, sheets=new_sheets)

delete_sheet(index)

Return a new Workbook with the sheet at index removed.

Source code in src/md_spreadsheet_parser/models.py, lines 481-491
def delete_sheet(self, index: int) -> "Workbook":
    """
    Return a new Workbook with the sheet at index removed.
    """
    if index < 0 or index >= len(self.sheets):
        raise IndexError("Sheet index out of range")

    new_sheets = list(self.sheets)
    new_sheets.pop(index)

    return replace(self, sheets=new_sheets)

get_sheet(name)

Retrieve a sheet by its name.

Parameters:

Name Type Description Default
name str

The name of the sheet to retrieve.

required

Returns:

Type Description
Sheet | None

Sheet | None: The sheet object if found, otherwise None.

Source code in src/md_spreadsheet_parser/models.py, lines 441-454
def get_sheet(self, name: str) -> Sheet | None:
    """
    Retrieve a sheet by its name.

    Args:
        name (str): The name of the sheet to retrieve.

    Returns:
        Sheet | None: The sheet object if found, otherwise None.
    """
    for sheet in self.sheets:
        if sheet.name == name:
            return sheet
    return None

to_markdown(schema)

Generates a Markdown string representation of the workbook.

Parameters:

Name Type Description Default
schema MultiTableParsingSchema

Configuration for formatting.

required

Returns:

Name Type Description
str str

The Markdown string.

Source code in src/md_spreadsheet_parser/models.py, lines 456-466
def to_markdown(self, schema: MultiTableParsingSchema) -> str:
    """
    Generates a Markdown string representation of the workbook.

    Args:
        schema (MultiTableParsingSchema): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    return generate_workbook_markdown(self, schema)

WorkbookJSON

Bases: TypedDict

JSON-compatible dictionary representation of a Workbook.

Source code in src/md_spreadsheet_parser/models.py
48
49
50
51
52
53
54
class WorkbookJSON(TypedDict):
    """
    JSON-compatible dictionary representation of a Workbook.
    """

    sheets: list[SheetJSON]
    metadata: dict[str, Any]

md_spreadsheet_parser.validation

TableValidationError

Bases: Exception

Exception raised when table validation fails. Contains a list of errors found during validation.

Source code in src/md_spreadsheet_parser/validation.py, lines 14-24
class TableValidationError(Exception):
    """
    Exception raised when table validation fails.
    Contains a list of errors found during validation.
    """

    def __init__(self, errors: list[str]):
        self.errors = errors
        super().__init__(
            f"Validation failed with {len(errors)} errors:\n" + "\n".join(errors)
        )

validate_table(table, schema_cls, conversion_schema=DEFAULT_CONVERSION_SCHEMA)

Validates a Table object against a dataclass OR Pydantic schema.

Parameters:

Name Type Description Default
table Table

The Table object to validate.

required
schema_cls Type[T]

The dataclass or Pydantic model type to validate against.

required
conversion_schema ConversionSchema

Configuration for type conversion.

DEFAULT_CONVERSION_SCHEMA

Returns:

Type Description
list[T]

list[T]: A list of validated instances.

Raises:

Type Description
ValueError

If schema_cls is not a valid schema.

TableValidationError

If validation fails.

Source code in src/md_spreadsheet_parser/validation.py, lines 297-348
def validate_table(
    table: "Table",
    schema_cls: Type[T],
    conversion_schema: ConversionSchema = DEFAULT_CONVERSION_SCHEMA,
) -> list[T]:
    """
    Validates a Table object against a dataclass OR Pydantic schema.

    Args:
        table: The Table object to validate.
        schema_cls: The dataclass or Pydantic model type to validate against.
        conversion_schema: Configuration for type conversion.

    Returns:
        list[T]: A list of validated instances.

    Raises:
        ValueError: If schema_cls is not a valid schema.
        TableValidationError: If validation fails.
    """
    # Check for Pydantic Model
    if HAS_PYDANTIC and BaseModel and issubclass(schema_cls, BaseModel):
        if not table.headers:
            raise TableValidationError(["Table has no headers"])
        # Import the adapter lazily so pydantic is only required when actually used
        from .pydantic_adapter import validate_table_pydantic

        return validate_table_pydantic(table, schema_cls, conversion_schema)  # type: ignore

    # Check for Dataclass
    if is_dataclass(schema_cls):
        if not table.headers:
            raise TableValidationError(["Table has no headers"])
        return _validate_table_dataclass(table, schema_cls, conversion_schema)

    # Check for TypedDict
    if is_typeddict(schema_cls):
        if not table.headers:
            raise TableValidationError(["Table has no headers"])
        return _validate_table_typeddict(table, schema_cls, conversion_schema)

    # Check for simple dict
    # We compare schema_cls against dict type
    if schema_cls is dict:
        if not table.headers:
            raise TableValidationError(["Table has no headers"])
        return _validate_table_dict(table, conversion_schema)  # type: ignore

    raise ValueError(
        f"{schema_cls} must be a dataclass, Pydantic model, TypedDict, or dict"
    )

md_spreadsheet_parser.generator

generate_sheet_markdown(sheet, schema=DEFAULT_SCHEMA)

Generates a Markdown string representation of the sheet.

Parameters:

Name Type Description Default
sheet Sheet

The Sheet object.

required
schema ParsingSchema

Configuration for formatting.

DEFAULT_SCHEMA

Returns:

Name Type Description
str str

The Markdown string.

Source code in src/md_spreadsheet_parser/generator.py, lines 102-133
def generate_sheet_markdown(
    sheet: "Sheet", schema: ParsingSchema = DEFAULT_SCHEMA
) -> str:
    """
    Generates a Markdown string representation of the sheet.

    Args:
        sheet: The Sheet object.
        schema (ParsingSchema, optional): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    lines = []

    if isinstance(schema, MultiTableParsingSchema):
        lines.append(f"{'#' * schema.sheet_header_level} {sheet.name}")
        lines.append("")

    for i, table in enumerate(sheet.tables):
        lines.append(generate_table_markdown(table, schema))
        if i < len(sheet.tables) - 1:
            lines.append("")  # Empty line between tables

    # Append Sheet Metadata if present (at the end)
    if isinstance(schema, MultiTableParsingSchema) and sheet.metadata:
        lines.append("")
        metadata_json = json.dumps(sheet.metadata)
        comment = f"<!-- md-spreadsheet-sheet-metadata: {metadata_json} -->"
        lines.append(comment)

    return "\n".join(lines)

generate_table_markdown(table, schema=DEFAULT_SCHEMA)

Generates a Markdown string representation of the table.

Parameters:

Name Type Description Default
table Table

The Table object.

required
schema ParsingSchema

Configuration for formatting.

DEFAULT_SCHEMA

Returns:

Name Type Description
str str

The Markdown string.

Source code in src/md_spreadsheet_parser/generator.py, lines 10-99
def generate_table_markdown(
    table: "Table", schema: ParsingSchema = DEFAULT_SCHEMA
) -> str:
    """
    Generates a Markdown string representation of the table.

    Args:
        table: The Table object.
        schema (ParsingSchema, optional): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    lines = []

    # Handle metadata (name and description) if MultiTableParsingSchema
    if isinstance(schema, MultiTableParsingSchema):
        if table.name and schema.table_header_level is not None:
            lines.append(f"{'#' * schema.table_header_level} {table.name}")
            lines.append("")  # Empty line after name

        if table.description and schema.capture_description:
            lines.append(table.description)
            lines.append("")  # Empty line after description

    # Build table
    sep = f" {schema.column_separator} "

    def _prepare_cell(cell: str) -> str:
        """Prepare cell for markdown generation."""
        if schema.convert_br_to_newline and "\n" in cell:
            return cell.replace("\n", "<br>")
        return cell

    # Headers
    if table.headers:
        # Add outer pipes if required
        processed_headers = [_prepare_cell(h) for h in table.headers]
        header_row = sep.join(processed_headers)
        if schema.require_outer_pipes:
            header_row = (
                f"{schema.column_separator} {header_row} {schema.column_separator}"
            )
        lines.append(header_row)

        # Separator row
        separator_cells = []
        for i, _ in enumerate(table.headers):
            alignment = "default"
            if table.alignments and i < len(table.alignments):
                # Ensure we handle potentially None values if list has gaps (unlikely by design but safe)
                alignment = table.alignments[i] or "default"

            # Construct separator cell based on alignment
            # Use 3 hyphens as base
            if alignment == "left":
                cell = ":" + schema.header_separator_char * 3
            elif alignment == "right":
                cell = schema.header_separator_char * 3 + ":"
            elif alignment == "center":
                cell = ":" + schema.header_separator_char * 3 + ":"
            else:
                # default
                cell = schema.header_separator_char * 3

            separator_cells.append(cell)

        separator_row = sep.join(separator_cells)
        if schema.require_outer_pipes:
            separator_row = (
                f"{schema.column_separator} {separator_row} {schema.column_separator}"
            )
        lines.append(separator_row)

    # Rows
    for row in table.rows:
        processed_row = [_prepare_cell(cell) for cell in row]
        row_str = sep.join(processed_row)
        if schema.require_outer_pipes:
            row_str = f"{schema.column_separator} {row_str} {schema.column_separator}"
        lines.append(row_str)

    # Append Metadata if present
    if table.metadata and "visual" in table.metadata:
        metadata_json = json.dumps(table.metadata["visual"])
        comment = f"<!-- md-spreadsheet-table-metadata: {metadata_json} -->"
        lines.append("")
        lines.append(comment)

    return "\n".join(lines)

generate_workbook_markdown(workbook, schema)

Generates a Markdown string representation of the workbook.

Parameters:

Name Type Description Default
workbook Workbook

The Workbook object.

required
schema MultiTableParsingSchema

Configuration for formatting.

required

Returns:

Name Type Description
str str

The Markdown string.

Source code in src/md_spreadsheet_parser/generator.py, lines 136-170
def generate_workbook_markdown(
    workbook: "Workbook", schema: MultiTableParsingSchema
) -> str:
    """
    Generates a Markdown string representation of the workbook.

    Args:
        workbook: The Workbook object.
        schema (MultiTableParsingSchema): Configuration for formatting.

    Returns:
        str: The Markdown string.
    """
    lines = []

    if schema.root_marker:
        lines.append(schema.root_marker)
        lines.append("")

    for i, sheet in enumerate(workbook.sheets):
        lines.append(generate_sheet_markdown(sheet, schema))
        if i < len(workbook.sheets) - 1:
            lines.append("")  # Empty line between sheets

    # Append Workbook Metadata if present
    if workbook.metadata:
        # Ensure separation from last sheet
        if lines and lines[-1] != "":
            lines.append("")

        metadata_json = json.dumps(workbook.metadata)
        comment = f"<!-- md-spreadsheet-workbook-metadata: {metadata_json} -->"
        lines.append(comment)

    return "\n".join(lines)