
Compiling a List of Every Airport in the World

I wanted a list of every airport. That's normal. The closest list I could find was on prokerala. The issue I have with it is that each country is on its own page! Unacceptable!

This is the process I followed to compile a single list with every code.

Get the links to each country's airport page

First I inspected the HTML on the page. The part I'm interested in looks like:


<li class=country-list-li><a href="/travel/airports/afghanistan/">Afghanistan</a></li>
<li class=country-list-li><a href="/travel/airports/aland/">Aland Islands</a></li>
<li class=country-list-li><a href="/travel/airports/albania/">Albania</a></li>
<li class=country-list-li><a href="/travel/airports/algeria/">Algeria</a></li>
...

So I need to pull out the lines that contain class=country-list-li. I do this with the following command.


[jcaskey@jcaskey ~]$ curl http://www.prokerala.com/travel/airports/country-list/ 2> /dev/null | grep "country-list-li"

The final step is to get just the URI. I'll use the cut command to keep only the second field, using a double quote as the delimiter.


[jcaskey@jcaskey ~]$ curl http://www.prokerala.com/travel/airports/country-list/ 2> /dev/null | grep "country-list-li" | cut -d'"' -f2
/travel/airports/afghanistan/
/travel/airports/aland/
/travel/airports/albania/
/travel/airports/algeria/
...
Much better.
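To see why -f2 is the field that holds the path, here is a minimal sketch that splits one line of the country-list markup above on double quotes and prints every field. It uses only printf and cut; the loop over field numbers is just for illustration.

```bash
#!/usr/bin/env bash
# One line of the country-list markup shown above.
line='<li class=country-list-li><a href="/travel/airports/albania/">Albania</a></li>'

# Splitting on double quotes gives three fields:
#   f1: <li class=country-list-li><a href=
#   f2: /travel/airports/albania/
#   f3: >Albania</a></li>
for f in 1 2 3; do
  printf 'field %s: %s\n' "$f" "$(printf '%s\n' "$line" | cut -d'"' -f"$f")"
done
```

The path sits between the first and second quote, which is why the command asks for field 2.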

Getting the airport codes from each page

Now I can worry about the airport codes. I follow the same basic path as before. First, inspect the HTML to see what I need.


<td class=tc>BIN</td>
<td class=tc>OABN</td>
So I need to find class=tc followed by the three-letter (IATA) or four-letter (ICAO) airport code.

[jcaskey@jcaskey ~]$ curl http://www.prokerala.com/travel/airports/afghanistan/ 2> /dev/null | grep -Eo "class=tc>[A-Z]*\b"
class=tc>
class=tc>
class=tc>BIN
class=tc>OABN
class=tc>
class=tc>BST
class=tc>OABT
That's nice, but a lot of those lines have nothing after class=tc>. The issue is that I'm using the * quantifier, which matches zero or more capital letters, so the pattern matches even when no code follows. I need the + quantifier, which matches one or more capital letters.

[jcaskey@jcaskey ~]$ curl http://www.prokerala.com/travel/airports/afghanistan/ 2> /dev/null | grep -Eo "class=tc>[A-Z]+\b"
class=tc>BIN
class=tc>OABN
class=tc>BST
class=tc>OABT
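Why * produces bare prefixes isn't obvious from the output alone. A minimal sketch on two made-up cells (their contents are assumptions, not copied from the site) shows the difference. It assumes GNU grep, which supports \b in -E patterns: with *, an empty run of capitals still satisfies \b when the cell text starts with a word character.

```bash
#!/usr/bin/env bash
# Two hypothetical cells: one holding a name, one holding a code.
cells='<td class=tc>Example Airport</td>
<td class=tc>BIN</td>'

# With *, zero capitals is a valid match: \b holds between ">" and "E",
# so the name cell yields a bare "class=tc>".
star=$(printf '%s\n' "$cells" | grep -Eo 'class=tc>[A-Z]*\b')

# With +, at least one capital must be followed by a word boundary,
# which "Example" (E then x) never provides.
plus=$(printf '%s\n' "$cells" | grep -Eo 'class=tc>[A-Z]+\b')

printf 'with *:\n%s\nwith +:\n%s\n' "$star" "$plus"
```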
Next, I cut off the class=tc> prefix to get just the code.

[jcaskey@jcaskey ~]$ curl http://www.prokerala.com/travel/airports/afghanistan/ 2> /dev/null | grep -Eo "class=tc>[A-Z]+\b" | cut -d'>' -f2
BIN
OABN
BST
OABT
CCN
OACC
...
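The + pattern accepts a run of capitals of any length. If you wanted the match itself to enforce the three-or-four-letter length, an ERE interval {3,4} can do that. A minimal sketch on hypothetical cells (the five-letter value and the name are made-up inputs, not from the site):

```bash
#!/usr/bin/env bash
# Hypothetical cells: a 3-letter code, a 4-letter code, a 5-letter
# value, and a mixed-case name. Only the first two should survive.
cells='<td class=tc>BIN</td>
<td class=tc>OABN</td>
<td class=tc>ABCDE</td>
<td class=tc>Example Airport</td>'

# {3,4} bounds the run of capitals; the trailing \b stops a longer run
# from matching on just its first three or four letters.
printf '%s\n' "$cells" | grep -Eo 'class=tc>[A-Z]{3,4}\b' | cut -d'>' -f2
```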
Finally, comma-separate the codes, using the tr command to translate each newline into a comma. tr maps one character to one character, so the replacement is a single comma; a space added after it would simply be ignored.

[jcaskey@jcaskey ~]$ curl http://www.prokerala.com/travel/airports/afghanistan/ 2> /dev/null | grep -Eo "class=tc>[A-Z]+\b" | cut -d'>' -f2 | tr '\n' ','
BIN,OABN,BST,OABT,CCN,OACC,DAZ,OADZ,FBD,OAFZ,FAH,OAFR,GRG,OAGZ,GZI,HEA,OAHR,JAA,OAJL,KDH,OAKN,KHT,OAKS,KWH,OAHN,KBL,OAKB,UND,OAUZ,KUR,MMZ,OAMN,MZR,OAMS,IMZ,LQN,OAQN,SBF,SGA,OASN,TQN,TII,OATN,URZ,ZAJ,OAZJ,
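Because tr only swaps single characters, it can't emit a comma followed by a space. A minimal sketch on three codes from the Afghanistan output, comparing tr with one alternative that does give ", " separators: join the lines with paste, then widen each comma with sed. The paste/sed combination is a swapped-in technique, not part of the original pipeline.

```bash
#!/usr/bin/env bash
# A few codes from the Afghanistan page, one per line as cut prints them.
codes='BIN
OABN
BST'

# tr swaps single characters: every newline becomes one comma,
# including the last one, so the result ends with a trailing comma.
printf '%s\n' "$codes" | tr '\n' ','
echo

# paste -s joins all lines with the -d delimiter (no trailing comma);
# sed then turns each comma into ", ".
printf '%s\n' "$codes" | paste -sd, - | sed 's/,/, /g'
```

This prints BIN,OABN,BST, on the first line and BIN, OABN, BST on the second. The trailing comma from tr is handy in the loop below, since it keeps the output of consecutive countries separated.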

Putting it all together

I want to iterate over each country page, get the list of codes, and output those codes to a file. This is the whole script. I have added formatting here, but this was executed as a single command.

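# Collect each country's page path, e.g. /travel/airports/afghanistan/
for country in $(curl http://www.prokerala.com/travel/airports/country-list/ 2> /dev/null \
  | grep "country-list-li" \
  | grep -Eo "href=\"[^\"]+\"" \
  | cut -d'"' -f2);
do
  # Fetch the country page, pull out its codes, and append them to the file.
  # tr maps one character to one, so each newline becomes a single comma.
  curl "http://www.prokerala.com${country}" 2> /dev/null \
  | grep -Eo "class=tc>[A-Z]+\b" \
  | cut -d'>' -f2 | tr '\n' ',' \
  >> airport_codes.tmp;
done;
Isn't that nice?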
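If you'd rather test the parsing without hitting the site, one option is to split the loop into functions that read HTML on stdin. This is a restructured sketch, not the one-liner as it was run: the function names, the BASE variable, and the use of curl -s in place of 2> /dev/null are illustrative choices.

```bash
#!/usr/bin/env bash
# Sketch: the loop split into small, testable functions.
# country_paths, page_codes, fetch_all and BASE are illustrative names.
BASE="http://www.prokerala.com"

# stdin: country-list HTML -> stdout: one country path per line
country_paths() {
  grep "country-list-li" | grep -Eo 'href="[^"]+"' | cut -d'"' -f2
}

# stdin: one country page's HTML -> stdout: its codes, each followed by a comma
page_codes() {
  grep -Eo 'class=tc>[A-Z]+\b' | cut -d'>' -f2 | tr '\n' ','
}

# Fetch every country page and append its codes to the file named in $1.
fetch_all() {
  local out=$1 country
  for country in $(curl -s "$BASE/travel/airports/country-list/" | country_paths); do
    curl -s "$BASE$country" | page_codes >> "$out"
  done
}
```

Calling fetch_all airport_codes.tmp does the same work as the loop; the two parsing functions can be checked offline by piping a saved page into them.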

Show me the codes!

Of course I will provide the codes. There are 17,116 of them.
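A small sketch for counting the entries in airport_codes.tmp, assuming the single line of comma-terminated codes the loop writes. It is one way to arrive at a total like the one above, not necessarily how that number was produced. The post doesn't say whether any code appears twice, so the second function counts distinct codes separately. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Count every code in a comma-terminated file such as airport_codes.tmp.
count_codes() {
  # Turn commas back into newlines and count the non-empty lines.
  tr ',' '\n' < "$1" | grep -c .
}

# Count each distinct code once (tr -d strips padding some wc versions add).
count_unique_codes() {
  tr ',' '\n' < "$1" | grep . | sort -u | wc -l | tr -d ' '
}
```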


Expanding a Virtual Disk for VirtualBox

I recently ran out of room on my Windows VM. It turns out that 25G is nothing when Windows is involved. No big deal; I'll just expand the virtual disk to give myself more room.

Expanding the Virtual Disk

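Using VBoxManage, it is trivial to expand a disk. One command, in fact. The --resize value is the new size in megabytes (newer VirtualBox releases also accept modifymedium for the same subcommand).


VBoxManage modifyhd /path/to/VirtualMachines/Windows_8.1/Windows_8.1-HD.vdi --resize 45000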

Great, now I have expanded my drive to 45G. That should be plenty. Worked like a charm!
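Because --resize is in megabytes, a dropped or extra zero can ask for a disk smaller than the current one, which as far as I know VBoxManage won't do, since it grows disks rather than shrinking them. A small sketch with a hypothetical guard you could run before the real command, assuming 1 GB = 1024 MB; gb_to_mb and check_resize are made-up helper names.

```bash
#!/usr/bin/env bash
# --resize takes megabytes, so convert first (assuming 1 GB = 1024 MB).
gb_to_mb() { echo $(( $1 * 1024 )); }

# Hypothetical guard: refuse a target that would not grow the disk,
# e.g. 4500 MB (about 4.4 GB) against a 25 GB disk.
check_resize() {
  local current_mb=$1 target_mb=$2
  if (( target_mb <= current_mb )); then
    echo "refusing: ${target_mb} MB is not larger than ${current_mb} MB" >&2
    return 1
  fi
  echo "ok: growing ${current_mb} MB -> ${target_mb} MB"
}

check_resize "$(gb_to_mb 25)" "$(gb_to_mb 45)"
```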

Windows Denies Those Changes

I can see the larger disk in the VirtualBox settings. It's there. I'm looking right at it, yet Windows just denies its existence and reports the original 25G.

After using this as an opportunity to hate on Windows, I decided to actually look into the issue. The second result in my search gave me the answer I was looking for: it turns out this is an issue with VirtualBox, not Windows. "You can only resize disks if they are not part of a VM that uses snapshots."

The Solution

  1. clone the Windows VM to a new VM
  2. run the VBoxManage command above against the new VM's disk
  3. boot the new VM
  4. expand the partition in Windows
  5. download everything I can find and use all of this new space!
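Steps 1 to 3 can also be done from the command line. A dry-run sketch that only prints the commands; the VM names, the clone's disk path, and the size are hypothetical placeholders, so check your own with the VirtualBox GUI first. As far as I know, clonevm copies only the current machine state by default, so the clone starts without snapshots. Step 4 still happens inside Windows.

```bash
#!/usr/bin/env bash
# Dry-run sketch of steps 1-3. Names and the disk path are hypothetical.
set -eu
SRC_VM="Windows_8.1"
NEW_VM="Windows_8.1-big"
NEW_DISK="/path/to/VirtualMachines/$NEW_VM/$NEW_VM.vdi"   # hypothetical path
SIZE_MB=$(( 45 * 1024 ))   # 45 GB expressed in MB

# Print each step instead of running it; replace the body with "$@" to run.
run() { echo "+ $*"; }

run VBoxManage clonevm "$SRC_VM" --name "$NEW_VM" --register
run VBoxManage modifyhd "$NEW_DISK" --resize "$SIZE_MB"
run VBoxManage startvm "$NEW_VM"
```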

Cloning eliminates any old snapshots. Maybe I could have just deleted them; I don't know. The process took about 15 minutes, including waiting for the VM to clone and the disk to resize. Not bad.