Zipping the Lines of Files
I think that zip is one of the coolest functions in Python. It lets you combine parallel elements from one or more iterables into a single iterable of tuples. I know, I know; that’s quite a mouthful, and the statement uses many words without really saying anything meaningful. Really, the best way to understand zip is to see it in action.
Let’s say that you have two arrays of integers, a and b, with the following definitions (in pseudocode):
a = [1,2,3];
b = [4,5,6];
Zipping them together would give you zip(a,b) with the following elements:
zip(a,b) = [(1,4), (2,5), (3,6)]
Pretty neat!
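For anyone who wants to try this in a real interpreter, here is a minimal Python sketch of the same example. Two details are worth noting: Python's zip returns a lazy iterator of tuples, so the sketch wraps it in list() to show the elements, and zip stops as soon as the shortest input runs out.

```python
a = [1, 2, 3]
b = [4, 5, 6]

# zip() is lazy, so wrap it in list() to materialize the tuples.
pairs = list(zip(a, b))
print(pairs)  # [(1, 4), (2, 5), (3, 6)]

# zip() stops at the shortest input: the 3 in `a` has no partner and is dropped.
short = list(zip(a, [4, 5]))
print(short)  # [(1, 4), (2, 5)]
```

That second behavior matters as soon as the inputs have different lengths.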
Zipping Lines of Files
Wouldn’t it be cool if we could apply a similar concept to the lines of files? I found myself needing this functionality when I wanted to combine csv files without having to resort to LibreOffice Calc.
Assume that I have three files, file1, file2 and file3. file1 has the following content:
File1 Line1
File1 Line2
File1 Line3
File1 Line4
file2 has the following content:
File2 Line1
File2 Line2
file3 has the following content:
File3 Line1
File3 Line2
File3 Line3
File3 Line4
I would like to be able to zip those files together with the following result:
File1 Line1, File2 Line1, File3 Line1
File1 Line2, File2 Line2, File3 Line2
File1 Line3, , File3 Line3
File1 Line4, , File3 Line4
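To make the desired behavior concrete, here is a minimal Python sketch that produces that result. It is only an illustration of the behavior I am after: the helper name zip_lines is hypothetical, the file contents are hard-coded lists standing in for lines read from disk, and ", " is assumed as the delimiter. It relies on itertools.zip_longest, which pads the shorter inputs with a fill value instead of stopping early like zip does.

```python
from itertools import zip_longest


def zip_lines(delimiter, *line_groups):
    """Join the i-th line of every group with `delimiter`.

    Groups that run out of lines contribute an empty string, so the
    number of output rows equals the length of the longest group.
    (Hypothetical helper, for illustration only.)
    """
    rows = zip_longest(*line_groups, fillvalue="")
    return [delimiter.join(row) for row in rows]


# Stand-ins for the lines of file1, file2 and file3.
file1 = ["File1 Line1", "File1 Line2", "File1 Line3", "File1 Line4"]
file2 = ["File2 Line1", "File2 Line2"]
file3 = ["File3 Line1", "File3 Line2", "File3 Line3", "File3 Line4"]

for row in zip_lines(", ", file1, file2, file3):
    print(row)
```

With real files, each list could come from something like open(path).read().splitlines(), so that trailing newlines do not end up in the middle of a row.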
Ziplines
I wrote a utility called ziplines that will do just that!
./ziplines <delimiter> <file 1> <file 2> ... <file N>
Code
I am sure that there are myriad ways to do this with Bash, Python, Perl, etc., but I am always looking for a reason to write code in C++. The source is available on GitHub.
Feedback
I love feedback and am always looking for ways to improve. If you see anything you think I can improve in this post or in the implementation of ziplines, please contact me using the information in the footer!