Tuesday, April 21, 2009

Looping through files in bash, quietly

All too often, I find myself in a scenario where I want to go through all the files in a particular directory and do something to each one (copy it, rot13 it, several things at once, etc.).

There are many ways to do this, like

find "${dir}" -maxdepth 1 -type f -exec cat {} \;

But what if you wanted to spend some time doing other things with those files, maybe rot13 each one and email it to a destination determined by its name and subject? OK, so that's probably not really what you want to do with those files, but that's beside the point.

Enter the bash for loop - a simple loop that we're all familiar with, though we tend to forget what it really does. For instance:

for file in *.txt; do
  echo "${file}"
done

Simple enough - it just echoes the file names - but what is happening on that for line? Expansion. Yes, your *.txt is expanded into a.txt, b.txt, c.txt and, more importantly, "don't wait.txt".

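Ah-ha, what happens with "don't wait.txt"? With the bare glob, nothing bad: each match stays a single word. The trouble starts when the list comes from a command instead, as in for file in $(ls *.txt), or when ${file} is later used unquoted. Then the space is interpreted as a delimiter, so now you have "don't" and "wait.txt" - not really what you wanted, now is it?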

Of course, if it was, enjoy it now, because the day you don't want it, your files won't be processed correctly and your entire business goes to shambles, all because someone decided to be funny and upload a file whose name doesn't match your format... But wait, there's a fix!
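To see the breakage before fixing it, here is a small sketch that counts loop iterations both ways. The throwaway directory and the two file names are made up for the demo; only the glob and the $(ls ...) forms matter.

```shell
# Work in a throwaway directory so nothing real gets touched.
demo_dir=$(mktemp -d)
cd "${demo_dir}" || exit 1
touch a.txt "don't wait.txt"   # made-up names for illustration

# Glob: each match is handed to the loop as one word, spaces included.
glob_count=0
for file in *.txt; do
  glob_count=$((glob_count + 1))
done

# Command substitution: the output of ls is split on whitespace.
split_count=0
for file in $(ls *.txt); do
  split_count=$((split_count + 1))
done

echo "glob: ${glob_count}, split: ${split_count}"   # prints "glob: 2, split: 3"

# Clean up the demo directory.
cd / && rm -rf "${demo_dir}"
```

Two files, three iterations: the extra one is "don't wait.txt" arriving in two pieces.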

Aside from the usual options of using perl, or find with a pipe to an auxiliary function, it can also be handled with a combination of "ls", "grep" (or find) and "for". Here's a process:

# mktemp picks a unique name, safer than a guessable /tmp/fun.tmp.$$
tmpfile=$(mktemp /tmp/fun.tmp.XXXXXX) || exit 1
dir="/home/sites/site123/data/"
cd "${dir}" || exit 1
ls -1 | grep -E '\.txt$' > "${tmpfile}" 2>/dev/null
# the cd and ls lines can be changed to:
# find "${dir}" -maxdepth 1 -type f -name '*.txt' > "${tmpfile}"
# but that's just not as fun...
# one name per line, so this assumes no newlines in file names
max=$(wc -l < "${tmpfile}")
for ((i=1;i<=max;i++)); do
  # pull line number i out of the list
  file=$(head -n "${i}" "${tmpfile}" | tail -n 1)
  # If you really wanted to use perl, might as well use
  # IO::Dir/opendir/readdir but if you're focusing on
  # this detail - you're missing the point!
  # rot13 the file in place
  perl -i -pe 'tr/a-zA-Z/n-za-mN-ZA-M/' \
    "${file}"
done
rm -f "${tmpfile}"

The magic? Looping through a counter of files, rather than the files themselves, and reading each name back as a whole line (and quoting your arguments), so that word splitting doesn't kick in and ruin your day.
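The same counter idea also works without a temp file, by keeping the names in a bash array. This is only a sketch of a variant, not the process above: process_file is a hypothetical stand-in for the real per-file work, and the demo directory and file names are made up.

```shell
# Hypothetical helper: stands in for whatever you do to each file.
process_file() {
  printf 'processing: %s\n' "$1"
}

# Throwaway directory with made-up file names, just for the demo.
demo_dir=$(mktemp -d)
touch "${demo_dir}/a.txt" "${demo_dir}/don't wait.txt"

shopt -s nullglob              # no matches gives an empty array
files=("${demo_dir}"/*.txt)    # one array element per file, spaces intact
shopt -u nullglob

# Loop over a counter and index into the array, quoting every use.
processed=0
for ((i = 0; i < ${#files[@]}; i++)); do
  process_file "${files[i]}"
  processed=$((processed + 1))
done

rm -rf "${demo_dir}"
```

Since the names never pass through a line-oriented file, this variant also copes with newlines in file names, at the cost of requiring bash arrays.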

Friday, April 10, 2009

For what it's worth

Just like everyone else in the world, I have a blog. Expect commentary about once every 6 months at best.