Tuesday, April 27, 2010

Linux BASH tip - How to subtract arrays

So I needed a BASH code snippet to subtract arrays...

What I mean by "subtracting arrays" is actually set difference like so -

IF
array1="A B C D E F G H"
array2="C B D G"
THEN
array1-array2 = "A E F H"

Note that the arrays (really just strings of space-separated terms) need not be in any particular order. Also, it may not be clear from the example above, but in my solution duplicates are automatically removed.

So without further ado here's the code -

$ array1="A B C D E F G H"
$ array2="C B D G"
$ finalResult=$(comm -23 <(echo "${array1}" | sed 's/ /\n/g' | sort -u) <(echo "${array2}" | sed 's/ /\n/g' | sort -u))
$ echo $finalResult
A E F H

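It uses the cool "comm" utility, which comes installed on Linux by default. comm compares two sorted files line by line and can print the lines that appear only in one file and not the other; in other words, the "difference" between the two files. Here the -23 option suppresses columns 2 and 3 (lines only in the second input, and lines common to both), leaving just the lines unique to the first input. That is all we need. The inputs must be sorted before we pass them to comm, hence the use of 'sort'; its -u option removes duplicates as well. And we need to convert each space to a newline before we can pass the strings to sort, hence the use of 'sed'. The <( ) process substitutions let comm read the output of each pipeline as if it were a file. Finally, echoing $finalResult without quotes lets the shell collapse the newlines in comm's output back into single spaces. Piping it all together, we have a nice little one-liner.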
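
If you need this in more than one place, the one-liner can be wrapped in a small function. What follows is only a sketch under a few assumptions of mine: the name set_diff is made up for illustration, I have swapped sed for tr to turn spaces into newlines (tr -s also squeezes repeated spaces), and I set LC_ALL=C so that sort and comm agree on the ordering. It also shows that input order does not matter and that duplicates disappear.

```shell
#!/usr/bin/env bash
# set_diff is a hypothetical helper name, not part of the original one-liner.
# Usage: set_diff "terms of set 1" "terms of set 2"
# Prints the terms found in the first string but not in the second,
# sorted, de-duplicated and separated by single spaces.
set_diff() {
    # tr -s turns each run of spaces into one newline (assumption: tr instead of sed).
    # sort -u sorts the terms and drops duplicates.
    # comm -23 keeps only the lines unique to the first input.
    # paste -sd ' ' joins the resulting lines back into one space-separated line.
    # LC_ALL=C keeps the collation used by sort and comm consistent.
    LC_ALL=C comm -23 \
        <(tr -s ' ' '\n' <<<"$1" | LC_ALL=C sort -u) \
        <(tr -s ' ' '\n' <<<"$2" | LC_ALL=C sort -u) \
        | paste -sd ' ' -
}

# Unordered input with a duplicate "A"
set_diff "H A C A B G" "G C"    # prints: A B H
```

Like the original, this relies on process substitution, so it needs bash rather than plain sh.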

1 comment:

fepede said...

very smart use of comm, thanks for the post!