CodeKata Four: Data Munging
Dave's specification: http://blogs.pragprog.com/cgi-bin/pragdave.cgi/Practices/Kata/KataFour.rdoc
Mmm, parsing problems. This is easy stuff, using pretty standard UNIX tools. No fancy-schmancy Python needed here; /bin/sh and awk are good enough for this kind of thing.
- Weather data parser: codekata/4/weather.sh
- Time spent:
- Probably 10-15 minutes doing a rough cut, then a few more minutes cleaning up the interface.
- Comments:
- This is trivial stuff that system administrators do fairly regularly, so it's probably an unfair exercise for me; parsing formatted data is my day-to-day.
- Soccer data parser: codekata/4/soccer.sh
- Time spent:
- Because it was a big cut-and-paste of the original code with a couple of small changes, probably only about three minutes.
- Comments:
- Since the first table parser was basically identical to what I needed to do here, implementation was trivial.
- Combined parser: codekata/4/both.sh
- Time spent:
- 10-15 minutes.
- Comments:
- This was actually harder than just reusing the original version, because of the tools I chose to use; sh and awk scripts don't lend themselves well to "good programming practices". ;-) A little creative substitution, and we were all set.
Answers to Dave's questions:
- Using sh and awk made it a little less convenient to merge the two later. Had I known in advance (or read ahead), I would probably have picked Perl, Python, or something a little more "all-in-one", rather than using a set of disconnected tools to do it (in this case, sh handling the interface, and awk handling the heavy lifting).
- The second program wasn't just inspired by the first: it was practically a cut-and-paste.
- The programs suffered from the extra refactoring here. I really just obfuscated things for very little value (the original scripts were 27-28 lines each, and that's with some thought given to user interface; the core awk script is only 18 lines long). That being said, adding more parsers to this is pretty easy now, since we have a simple definition for the "business rules" of the data: header length, columns to compare, and column to output. Add a few extra things (field separators, line separators, etc.), and you'd...well, you'd have awk, which I think was my original point. ;-)
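To make the "business rules" idea concrete, here is a minimal sketch of what a definition-driven parser could look like. This is not the code from both.sh; the function name min_spread, its argument order, and the rule for skipping non-numeric rows are all assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, not the original both.sh.
# Usage: min_spread HEADER_LINES COL_A COL_B COL_OUT FILE
# Prints the COL_OUT field of the row where |COL_A - COL_B| is smallest.
min_spread() {
    awk -v skip="$1" -v a="$2" -v b="$3" -v out="$4" '
        NR <= skip { next }                 # header length: ignore leading lines
        NF < a || NF < b || NF < out { next }   # ignore short rows (blank lines, separators)
        $a !~ /^-?[0-9]/ || $b !~ /^-?[0-9]/ { next }   # ignore rows without numbers
        {
            # Adding 0 coerces values such as "97*" to their numeric prefix.
            d = ($a + 0) - ($b + 0)
            if (d < 0) d = -d
            if (!found || d < min) { min = d; best = $out; found = 1 }
        }
        END { if (found) print best }
    ' "$5"
}
```

Each data file then needs only one line of definition, something like `min_spread 1 2 3 1 some-table.dat` (skip one header line, compare columns 2 and 3, print column 1). The real header lengths and column numbers for Dave's files would have to be checked against the data; the values here are placeholders.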
An interesting thing about this was that the combination of awk for processing, sh for interface, and the data definition provided by Dave gives us a pretty low-tech version of MVC. Well, isn't that interesting; even my old-school UNIX tools can fit into whatever modern best-practices model-of-choice gimmick-widget we're using today. :-)
