My day-to-day use of the command line doesn’t change much from one day to the next, and I sometimes forget just how fine a thing a powerful command line is. Every now and then, though, I’m reminded just how much I like being able to connect simple commands together in ways that the original authors never thought of. Recently, a simple pipeline I put together to get live train departure times was just such a reminder.
Up until February this year I was fortunate to receive a lift to work from one of my colleagues. Unfortunately he has now left the company [1], leaving me to travel the 26-mile round trip by train and bike. This isn’t such a bad thing though; I have been promising myself and my partner that I would take more exercise for so long that it had reached the point where it was only ever likely to happen if forced upon me.
There is only one train an hour, with the 17:12 train leaving 15 minutes before I could comfortably get to the station. However, it’s not unknown for the trains to be over 15 minutes late, so if I know that this train is delayed, it would be worth getting out the door on time and biking until my lungs explode to catch it. So I have a need for up-to-date train times (and better lungs).
Live train times can be found on the live departure boards website; unfortunately, I’m just too lazy to check this often enough for it to be of any use. What I really need is a program to check the website for me and inform me if the trains are running late. A few minutes’ work and I had created a simple pipeline to do just that.
wget --quiet --output-document - 'http://www.livedepartureboards.co.uk/ldb/sumdep.aspx?T=LMS&S=COV&A=1' \
  | grep 'headers="header[34]' \
  | cut -d'>' -f2 \
  | cut -d'<' -f1 \
  | sed 'N; s/\n/ /'
The wget command downloads the accessible version of the page containing the train times I’m interested in. The accessible version adds some extra attributes to the HTML tags making it much easier for me to find just the lines I want. They can be found by searching for any lines containing either headers="header3 or headers="header4, which provides something similar to:
<td headers="header3 header7">17:12</td>
<td headers="header4 header7">On time</td>
<td headers="header3 header9">18:12</td>
<td headers="header4 header9">18:30</td>
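The grep step can be tried offline without hitting the website. The sketch below wraps it in a helper called departure_cells (a name made up for this example) and feeds it two of the sample cells plus one invented header1 cell, which is there only to show a line being filtered out.

```shell
# Hypothetical helper wrapping the grep step of the pipeline.
# The bracket expression [34] matches either header3 or header4.
departure_cells() {
  grep 'headers="header[34]'
}

# The header1 line is invented test input, not real page content.
printf '%s\n' \
  '<td headers="header1 header7">ignored</td>' \
  '<td headers="header3 header7">17:12</td>' \
  '<td headers="header4 header7">On time</td>' \
  | departure_cells
```

Only the header3 and header4 lines should come through.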
The two cut commands first strip out everything up to and including the first > and then everything from the first remaining < onwards, providing:
17:12
On time
18:12
18:30
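To see what each cut contributes, the pair can be run on a single sample cell. This is only a sketch: extract_cell is a hypothetical name for the two cut commands chained together.

```shell
# Hypothetical helper: the two cut steps from the pipeline.
# cut -d'>' -f2 keeps the text between the first and second '>'  -> 17:12</td
# cut -d'<' -f1 keeps the text before the first '<'              -> 17:12
extract_cell() {
  cut -d'>' -f2 | cut -d'<' -f1
}

printf '%s\n' '<td headers="header3 header7">17:12</td>' | extract_cell
```

This relies on each cell sitting on its own line with exactly one tag in front of the text, which is true of the sample lines but is an assumption about the page in general.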
Finally, because I’m picky, I call sed to join the lines together in pairs to give me the desired output:
17:12 On time
18:12 18:30
The sed command consists of two instructions to sed. The first instruction, N, reads the next line from the input and appends it to the line of text that sed is currently working on. The second instruction, s/\n/ /, replaces the newline character used to join the two lines together with a single space. So when sed is working on the line “17:12” and the next line is “On time”, it first joins them together to form “17:12\nOn time” and then transforms it into “17:12 On time”.
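The joining step is easy to check in isolation by feeding sed the four lines by hand. In this sketch join_pairs is just a hypothetical wrapper around the same sed command.

```shell
# Hypothetical helper: the sed step from the pipeline.
# N appends the next input line to the current one (joined by a newline);
# s/\n/ / then replaces that embedded newline with a space.
join_pairs() {
  sed 'N; s/\n/ /'
}

printf '%s\n' '17:12' 'On time' '18:12' '18:30' | join_pairs
```

With an even number of input lines this prints one line per train. If the input ever had an odd number of lines, what happens to the last one depends on the sed implementation, so the pairing is worth keeping an eye on.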
You can see from the final output that the 17:12 train is on time, whilst the 18:12 is delayed until 18:30, so it looks like I’ll be staying in the warm office with the free coffee a little while longer.
There are a couple of easy ways to improve this script: I could either use watch to keep the output continuously updated, or use cron and zenity to inform me if either of those trains is not “On time”. I’m using watch at the moment, running it in a terminal which is visible on all of my virtual desktops. This is working well for me, apart from the small snag that I’m still waiting for the 17:12 to be delayed.
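Both options might look something like the sketch below. It assumes the pipeline has been saved as an executable script called traintimes somewhere on the PATH; that name, the helper functions, the 60-second interval and the crontab schedule are all made up for illustration.

```shell
# Option 1 (run by hand in a terminal): refresh the output every 60 seconds.
#   watch -n 60 traintimes

# Keep only the trains whose status is not "On time".
# '|| true' stops grep's "no lines selected" exit status being treated as a failure.
delayed_trains() {
  grep -v 'On time$' || true
}

# Option 2: pop up a zenity dialog only when something is delayed.
# 'traintimes' is the assumed script name; this function is not called here.
notify_if_delayed() {
  delayed=$(traintimes | delayed_trains)
  if [ -n "$delayed" ]; then
    zenity --info --text="$delayed"
  fi
}

# Assumed crontab entry (every 5 minutes from 16:00 to 17:55 on weekdays);
# cron jobs have no X display by default, so DISPLAY is set explicitly:
#   */5 16-17 * * 1-5 DISPLAY=:0 /path/to/notify-script

# Offline check of the filter using the sample output:
printf '%s\n' '17:12 On time' '18:12 18:30' | delayed_trains
```

The filter is kept separate from the notification so it can be tested without a network connection or a desktop session.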
[1] He is currently looking for work, so if you are in the Edinburgh area and require a talented Linux sysadmin, head over to his website.