Quote of the Day
The world always seems brighter when you've just made something that wasn't there before.
— Neil Gaiman. Yup – this is what I love about engineering. I get to make new things all the time.
I have been researching some of the work of Alfred Thayer Mahan, a US Navy officer and strategist, that was an influential voice in the 19th-century military circles. He was a strong advocate for the role of sea power in securing economic opportunity (link). This research has motivated me to determine how US Navy ship numbers have varied with time.
Mahan wrote when the US had a very weak navy and advocated for increasing the navy's number of capital ships (link) and foreign bases. Interestingly, Mahan was a rather poor ship captain and was involved in a number of ship collisions (link). I always find it interesting when you find strategists that are not particularly good at low-level functions – yet we usually promote people based on their low-level performance. Similarly, Nimitz was a good strategist but, like Mahan, I would not want him conning my ship (link).
My reading and watching lectures on Mahan have motivated me to look at how the US Navy grew and shrank over the years. Fortunately, the Naval History and Heritage Command (NHHC) have an excellent page on the size of the US Navy over time. Unfortunately, the data is scattered throughout the page and it must be scrapped so that I can consolidate and graph it. This post is about using Power Query to scrape the data from the page and generate a graph in Excel of the number of active ships in the US Navy over time.
For those who like to follow along, my Excel workbook is here.
My analysis was motivated by a statement in this Youtube lecture on Mahan where John Kuehn mentioned that the US Navy in the late 1800s has less than 40 ships. I was surprised at that number and I wanted to investigate further.
|Figure 2: Good Mahan Video Briefing.|
The NHHC website is a bit odd in that it has broken up the US Navy ship number data into multiple tables that contain various numbers of years worth of ship type and number data. I could use R or Python to scrape the page, but I am in a hurry and Power Query grabs web tables very cleanly and quickly. Here is the process:
- Grab all the tables on the page
- Filter for the tables with ship number data
- Use a PQ function to grab the total active ship number data for each year in each table.
- Consolidate all the table
- Generate the Graph
Figure 3 show my plot of the scraped ship number data.
Here is what the graph tells me:
- The US Navy ship count at the end of WW2 is amazing.
- The US Navy ship count two years after the end of WW2 was equally amazing.
- The most common WW2 ship type was "amphibious" and the most common amphibious ship was a Landing Craft, Vehicle, Personnel (LCVP), which makes sense. The LCVP is shown in Figure 1.
- The US Navy had a small increase in ship numbers during WW1, but US participation was so short that many of those ships were not finished by war's end.
- The current US Navy ship numbers in actually relatively low.
I am going to use this as an exercise for Excel students because I was able to solve a relatively tough problem very quickly (I had data in about 10 minutes). The problem provides students with experience in:
- Power Query web scraping
- Cleaning up web tables
- Writing a Power Query M-function
- Generating a graph from a table