How to Recover Data Records with Vimscript

This post shows how to recover records from fixed-format data files that have no end-of-line characters marking the end of each record. The record length has to come from somewhere else, preferably a metadata file. Recovering the records then means splitting the text into lines of exactly that length.
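The basic splitting step can be sketched in a few lines of Vimscript. This is only an illustration, not the code used for the original task: the function name SplitRecords() and its arguments are hypothetical, and it assumes single-byte data, because strlen() and strpart() count bytes rather than characters.

```vim
" Hypothetical helper (not from the original post): split one long string
" into records of a:reclen bytes. Assumes single-byte data.
function! SplitRecords(text, reclen) abort
  if a:reclen <= 0
    throw 'SplitRecords: record length must be positive'
  endif
  let l:records = []
  let l:pos = 0
  let l:total = strlen(a:text)
  " Walk through the text one record at a time.
  while l:pos < l:total
    " strpart() returns a shorter string if fewer than a:reclen bytes remain.
    call add(l:records, strpart(a:text, l:pos, a:reclen))
    let l:pos += a:reclen
  endwhile
  return l:records
endfunction
```

For example, `:echo SplitRecords('AAAABBBBCC', 4)` prints `['AAAA', 'BBBB', 'CC']`. A trailing partial record is kept as is, which makes a wrong record length easy to spot.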

As a data provider, I was asked to recover data records from a fixed-format data file without end-of-line characters. I chose Vimscript for the job because Vim handles formatting tasks easily through scripting. However, string operations in Vimscript are costly, and repeating them in a loop slows the script down noticeably.

To work around this, I decided to split the original data file into smaller files, split the lines in those smaller files into individual records, and finally append the records from the smaller files to a single output data file. I created two functions to do just that: PostTypeFixed() and MakeLines().

PostTypeFixed() takes two arguments: the record length and an integer called factor, which is the number of smaller files the function creates. PostTypeFixed() then calls MakeLines() on each of the smaller files, and MakeLines() splits the lines into individual records. In other words, factor improves the performance of MakeLines() by turning one very long line into several shorter ones.
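The division of labour between the two functions can be sketched as follows. This is not the original PostTypeFixed() or MakeLines(): the names PostTypeFixedSketch() and MakeLinesSketch() are hypothetical, the sketch works on an in-memory string instead of temporary files, and it takes the text as an extra first argument. It also assumes single-byte data. Its main purpose is to show that each chunk has to end on a record boundary, so the chunk length is a multiple of the record length.

```vim
" Sketch only, not the original functions. Chunks are kept in memory
" rather than written to smaller files. Assumes single-byte data.

" Split one chunk into records of a:reclen bytes.
function! MakeLinesSketch(chunk, reclen) abort
  let l:records = []
  let l:pos = 0
  let l:total = strlen(a:chunk)
  while l:pos < l:total
    call add(l:records, strpart(a:chunk, l:pos, a:reclen))
    let l:pos += a:reclen
  endwhile
  return l:records
endfunction

" Cut a:text into at most a:factor chunks aligned on record boundaries,
" split each chunk with MakeLinesSketch(), and collect all records.
function! PostTypeFixedSketch(text, reclen, factor) abort
  if a:reclen <= 0 || a:factor <= 0
    throw 'PostTypeFixedSketch: reclen and factor must be positive'
  endif
  let l:total = strlen(a:text)
  " Number of records, rounded up to count a trailing partial record.
  let l:nrec = (l:total + a:reclen - 1) / a:reclen
  " Records per chunk, rounded up so that no more than a:factor chunks result.
  let l:per_chunk = (l:nrec + a:factor - 1) / a:factor
  " A multiple of a:reclen, so no record is cut in half.
  let l:chunk_len = l:per_chunk * a:reclen
  let l:out = []
  let l:pos = 0
  while l:pos < l:total
    let l:chunk = strpart(a:text, l:pos, l:chunk_len)
    call extend(l:out, MakeLinesSketch(l:chunk, a:reclen))
    let l:pos += l:chunk_len
  endwhile
  return l:out
endfunction
```

For example, `:echo PostTypeFixedSketch('AAAABBBBCCCCDD', 4, 2)` prints `['AAAA', 'BBBB', 'CCCC', 'DD']`. The resulting list could then be written out with writefile(), which puts one record on each line. The sketch makes no claim about speed; it only illustrates the boundary arithmetic.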

 
