Parsing Huge XML Files Incrementally

This post presents a Python script for parsing huge XML files incrementally. The purpose of the script is to convert XML tables to delimited text files. I searched online for inspiration while making the script and found relevant documentation and very useful posts with code examples. However, it took me a couple of days to develop the complete solution for this trivial task and this is why I have chosen to publish my script here.

Use Case

The XML tables are generally huge, in the order of tens of gigabytes or more. Here is a very small example showing the XML content of a table with twelve columns and three rows:

Parsing Huge XML Files Incrementally

Here is the code that performs incremental parsing of the XML table while converting the content to a delimited text file:

Execute the script from shell:

References

The script is based on input from the posts below:

Leave a Reply

Your email address will not be published.