Come together, right now...

In which we join files line by line.

In IT, there is no shortage of "grunt work" that lends itself to one-off scripts or applications. To save others (and possibly myself) from having to reimplement them, I am publishing them as useLESS tools.

useLESS tools: scripts or programs written to paper over the shortcomings of applications or processes. You could use less of them, and they would be made useless, if you could reimplement everything in light of the experience gained the last time around.

Today I give you our first useLESS tool: fjoin.

At work, I am currently integrating with an external vendor. As part of the process, I have been given pseudo-realistic test data to process. Being only pseudo-realistic, it doesn't quite match what our system expects and must be "massaged". That is to say, nothing works.

In particular, I have two text files.  One text file has the IDs used by the vendor, and the other has the IDs our system expects.  I need a way to turn these into SQL update statements, and it needs to be repeatable so I can rerun it as testing progresses.  What to do?

With Notepad++ or your favorite editor, it is fairly easy to record a macro that inserts

UPDATE [TABLE] SET [FIELD] = '

before each line of one file and

' WHERE ID = 

at the beginning of each line of the other, so that if you took line x from the first file and line x from the second file and joined them, you would get something like

UPDATE [TABLE] SET [FIELD] = 'OURID_123' WHERE ID = 42
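If you would rather script the macro step as well, the prefixing takes only a few lines of Python. This is a minimal sketch, not part of fjoin, assuming plain text files with one ID per line; prefix_lines and prefix_file are illustrative names, and [TABLE]/[FIELD] are the same placeholders as above.

```python
def prefix_lines(lines, prefix):
    """Return each line with prefix prepended and its trailing newline removed."""
    return [prefix + line.rstrip('\n') for line in lines]


def prefix_file(src, dst, prefix):
    """Copy src to dst, inserting prefix at the start of every line."""
    with open(src) as fin, open(dst, 'w') as fout:
        for line in prefix_lines(fin, prefix):
            fout.write(line + '\n')


# In-memory check: the two prefixes, joined, rebuild the example statement.
set_part = prefix_lines(['OURID_123\n'], "UPDATE [TABLE] SET [FIELD] = '")
where_part = prefix_lines(['42\n'], "' WHERE ID = ")
print(set_part[0] + where_part[0])
# UPDATE [TABLE] SET [FIELD] = 'OURID_123' WHERE ID = 42
```

On real data you would call prefix_file once per ID file (the file names are up to you) and hand the two outputs to fjoin.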

The real trick is merging the two files line by line, and fjoin does just that. It takes two or more files and concatenates them line by line, with no separator, stopping at the end of the shortest file. The result is written to standard output, so you can pipe it into another program or redirect it to a file.

If you would like to create an executable from this code, you can use PyInstaller:

pyinstaller -F fjoin.py

This will create a single (rather large) executable for you.

import sys

# Written for Python 3 (print is a function).
# Messages go to stderr so they never end up in redirected output.
if len(sys.argv) == 1:
    print('No files specified', file=sys.stderr)
    sys.exit(1)

files = []
# skip the name of the program
for arg in sys.argv[1:]:
    try:
        files.append(open(arg))
    except OSError:
        # missing file, a directory, permission denied, ...
        print('ERROR:', arg, 'could not be opened', file=sys.stderr)
        sys.exit(1)

reading = True
composite = []

while reading:
    for f in files:
        line = f.readline()
        # stop as soon as a file comes up short
        if not line:
            reading = False
            break
        else:
            # remove newline
            composite.append(line.rstrip('\n'))
    # only output if we joined across all files
    # in other words, no partial lines
    if len(composite) == len(files):
        print(''.join(composite))
    composite = []

for f in files:
    f.close()
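For comparison, the same join can be written more compactly with Python 3's built-in zip(), which already stops as soon as its shortest input runs out. This is a sketch rather than the published fjoin.py; join_lines and fjoin_paths are names made up for illustration, and io.StringIO stands in for real files in the demo.

```python
import io
from contextlib import ExitStack


def join_lines(*files):
    """Concatenate the files line by line, stopping at the shortest one.

    zip() stops as soon as any argument runs out, so a short file ends the
    output without ever producing a partial line.
    """
    for row in zip(*files):
        yield ''.join(line.rstrip('\n') for line in row)


def fjoin_paths(paths):
    """Open every path, join the files, and return the joined lines as a list."""
    # ExitStack closes whatever was opened, even if a later open() fails.
    with ExitStack() as stack:
        files = [stack.enter_context(open(p)) for p in paths]
        return list(join_lines(*files))


# In-memory demo: two StringIO objects stand in for the prefixed files.
ours = io.StringIO("UPDATE [TABLE] SET [FIELD] = 'OURID_123\n"
                   "UPDATE [TABLE] SET [FIELD] = 'OURID_456\n")
vendor = io.StringIO("' WHERE ID = 42\n")
for joined in join_lines(ours, vendor):
    print(joined)  # one line only: vendor has a single entry
```

In the script above, the whole while loop could become `for joined in join_lines(*files): print(joined)`. Because join_lines is a generator, it reads one line per file at a time, just like the original loop; fjoin_paths, by contrast, collects everything into a list, so for very large files iterate over join_lines directly.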
