USNS Comfort Bed Count Relative To Other US Hospitals

Quote of the Day

The more a man rises in leadership the more responsible he is not just for what he says but also for what people hear.

Figure 1: USNS Comfort is now berthed in New York City. (Wikipedia)

Because of COVID-19, I am currently sheltering-in-place at my cabin in northern Minnesota. I was reading news reports of the USNS Comfort docking in New York City to provide the city with another 1000 bed hospital. I had no idea as to the average size of a hospital and decided to take a quick look at the number of beds in hospitals around the United States using a histogram.

Fortunately, the Department of Homeland Security (DHS) makes an enormous amount of US infrastructure data available online at the Homeland Infrastructure Foundation-Level Data (HIFLD) website.  On this webpage, you can find details on all the hospitals in the US. As always, there are issues with the data. For example, while it only has bed counts for 6919 hospitals out of the 7581 hospitals in its list. My graph will filter out the hospitals that have no bed count data, which are listed in the file as having -999 beds. For those who are interested in looking at my Excel workbook, it is available here.

The government database lists the following types of hospitals in the US:

• Children's
• Chronic Disease
• Critical Access (rural hospitals)
• General Acute Care
• Long-Term Care
• Military
• Psychiatric
• Rehabilitation
• Special
• Women's

I am only interested in the General Acute Care and Critical Access types of hospitals, of which there are 5263. These are the hospitals that would bear the brunt of caring for COVID-19 victims. Figure 2 shows the number of US hospitals by bed count.

Figure 2: US Hospital Bed Count Distribution.

Here is what Figure 2 tells me that:

• USNS Comfort's 1000 beds put her in the top 0.6% of hospitals in the US by bed count. It is a major hospital by any measure.
• The bulk of US hospitals (~71%) have less than 200 beds.
• 90% of US hospitals have less than 400 beds.

I should mention that the USNS Comfort has a sister ship called the USNS Mercy, which is the lead ship of the Mercy class of hospital ships. Both ships are converted supertankers originally built to service the trans-Alaska pipeline.

Maple Sap is Sweet This Year

Quote of the Day

Statistics are like brief swimming costumes; what they reveal is interesting, but what they conceal is absolutely vital.

— old statistics professor

Figure 1: My Neighbor's Maple Syrup Bags.

I am being a good citizen and socially distancing myself by staying at my cabin in northern Minnesota – avoiding people is easy in the dense forest that surrounds my residence. When I went out for my daily walk today, I saw that my neighbor had put out maple sap bags (Figure 1). Most folks around here collect their sap in these blue bags.

Some of my fondest boyhood memories are of my father driving the tractor that was pulling a trailer on which my grandfather and I sat on while going from tree to tree gathering pails of maple sap.

My brother has also been collecting sap this spring and he is reporting that it is taking 30 gallons of sap to obtain one gallon of syrup, which is better than the 40-to-1 ratio that he usually reports. This means the sap is a bit sweeter this year than last. Maple syrup typically has a sugar content greater than 66%, but maple sap typically has a sugar content of about 2.5% but can vary anywhere from 1% to 5%.

You can estimate the amount of sap you will need to make a gallon of maple syrup using the Jones Rule, or Rule of 86 (Equation 1). As an example of using the Rule of 86, given a 2.5% sugar concentration sap, you will need 86/2.5 = 34 gallons of sap to make 1 gallon of maple syrup.

 Eq. 1 $\displaystyle {{N}_{{Sap}}}=\frac{{86}}{{{{C}_{{Sap}}}}}$

where

• NSap is the number of gallons of maple sap per gallon of syrup.
• CSap is the sugar concentration of maple sap expressed in percent.

WW2 Fighter Aircraft Fuel Fraction

Quote of the Day

Just because you had it rough does mean you need to inflict that mentality on subordinates – there's no pride to be had in suffering for the sake of suffering. Subordinates do not need to carry your burdens to have high quality of work ethic.

— Slide from Air Force briefing on training. I could not agree more. I have seen far too many people made to suffer because someone just wanted to be cruel.

Introduction

Figure 1: Air Force fighter pilot who played a key role in developing modern fighter plane designs. (Source)

I recently read a book called Boyd: The Fighter Pilot Who Changed the Art of War that documents the life of John Boyd, a fighter pilot who was critical to putting fighter aircraft design on a firm mathematical footing now called Energy-Maneuvering Theory. He was a key member of the Fighter Mafia, a group of military and industry experts in the 1970s that advocated for a lightweight fighter alternative to the large, heavy fighters like the F-111 and MiG-25. Their concepts played a significant role in the final designs of the F-16 and F-18.

Boyd also had an impact on modern management with his advocacy of a nimble decision-making strategy called the Observe-Orient-Decide-Act (OODA) loop that has become doctrine in many military, legal, and corporate circles. I certainly encountered this approach in numerous corporate management situations.

One fighter metric that Boyd used was called the fuel fraction, which is defined as the ratio of fuel weight to the plane's take-off weight.  In theory, this metric should provide an indication of the range of the fighter. I became curious as to how the fuel fraction of WW2 fighters compare with modern fighters. No sophisticated data analysis here, just some simple digging in various airplane databases and putting the data into a spreadsheet, which can be downloaded here.

Background

Overview

Modern fighter aircraft have fuel fractions in the 30% range. For example, the F-22 has a fuel fraction of 29% and the Eurofighter Typhoon has a fuel fraction of 31% (reference). I was curious as to what the fuel fraction was of US WW2 fighters, so I decided to do a little research.

Definitions

There are numerous terms in the specifications that are not always used consistently. Here is my take on these terms.

Out and back range of an aircraft loaded for combat, including a budgeted time for combat. There appears to be no standard definition for this term, so it cannot be used to compare different aircraft.
Empty Weight
The empty weight of an aircraft is the weight of the aircraft without passengers or fuel, but with unusable fuel, full operating fluids, and full engine oil. (link)
Ferry Distance
For the purposes of this post, the one-way range of an aircraft without drop tanks. The aircraft is configured for maximum range and has minimum or no weapons load. You will find references that do refer to ferrying aircraft during wartime with drop tanks. I have tried to find references for ferry range that did not use drop tanks.
Takeoff weight of an aircraft under normal circumstances, which includes fuel and typical weapons load.
Transfer Distance
Same as ferry range.

Methodology

I thought my research would involve simply reviewing various aircraft specifications. However, few sources agreed on any of the specifications for range. I ended up taking the minimum ferry distance from contemporaneous military sources or books on these aircraft. Once I had the ferry range, I simply divided the internal fuel weight by the takeoff weight to get the fuel fraction.

For this exercise, I examined some of the most common US fighters, plus the Japanese Zero. I added the Zero because it was famous for its long range.

I should note that WW2 fighters often used drop tanks for both combat and ferry flights. There were numerous drop tank configurations and I decided to focus on the internal fuel capacities and on the one-way maximum range on internal fuel to keep things simple.

Analysis

There is nothing sophisticated in this work – simple research, organizing, and plotting. Figure 2 shows my scatter plot of a small sample of WW2 fighters.

I was hoping to see a correlation between the ferry range and fuel fraction, but I see no strong relationship. It is interesting to note that most of the fighters had fuel fractions around 12%, which is much less than the fuel fraction of modern jet fighters.

Figure 2: Fuel Fraction versus Ferry Range.

It is interesting to note that the two fighters with the longest ferry range were the F6F-3 (Hellcat) and the A6M2 (Zero). These were fighters designed specifically to operate from carriers in the Pacific where distances were very long. Note that the fuel fraction of the Hellcat was relatively low, while the fuel fraction of the Zero was relatively high.

Conclusion

The fuel fractions for US fighters during WW2 appears to be  ~12%, which is substantially less than the ~30% value often seen with modern fighters.

Combined Excel Timeline/Column Chart using WW2 Naval Air Data

Quote of the Day

I'd rather fly in a Spitfire but fight in a Hurricane - because the Hurricane was made of non-essential parts. I had them all shot off at one time or another, and it still flew just as well without them.

James 'Ginger' Lacey of 501 Squadron, who was the second-highest ace from the Battle of Britain.

Introduction

Figure 1: F6F Hellcat, the fighter that dominated US carrier sorties in the later part of the Pacific War.

I recently needed to generate a graph in Excel that combined a column chart with a timeline. The graph turned out well and I decided to share my work here. As my original work is proprietary, I will share the technique here using some US Navy (USN) air operations data from WW2.

I OCRed Table 5 of the WW2 Naval Aviation Combat Statistics (NASC) report, which contains both monthly squadron numbers and action sorties flown by month. I filtered the data to show the sum of the fighter (Wildcat, Hellcat, Corsair) and bomber (Dauntless, Helldiver, and Avenger) sorties.

For those who like to follow along, my workbook is here.

Background

I prepared a column chart of the monthly fighter and bomber sorties. I also added a timeline of major battles to provide some context for the number of sorties. The date ranges of the major battles were obtained from the Wikipedia.

The NASC separated the sortie data into carrier-based and land-based categories.

Analysis

Data

I used Tabula to grab Table 5 from the NASC and output a CSV file. I then used Power Query to clean the data and to prepare it for graphing.

Graph

Figure 2 shows my plot of carrier and land sorties versus time in the chart's upper half and the battle timeline on the chart's lower half. This graph nicely illustrates that:

• Until late-1943, the USN had relatively few sorties per month compared to the first few months of 1945.
• The Battles of Coral Sea, Midway, and Guadalcanal can be seen on the graph, but have relatively few sorties compared to the Battles of Manila, Iwo Jima, and Okinawa.
• There is a long carrier sortie drought during most of 1943.

Figure 2 shows very clearly that the USN needed ~18 months to build itself into a navy that could take on the IJN at sea.

Figure 2: Sorties vs Time with Timeline.

Conclusion

While I do not like to use Excel for graphics, it does allow you to easily combine different chart types. This can be useful for some types of presentations, such as this one.

Web Scraping Sailboat Reviews Using Power Query

Quote of the Day

The real problem of humanity is the following: we have paleolithic emotions; medieval institutions; and god-like technology.

Figure 1: The Seawind 1600 was a top vote-getter. (Source)

While it is true that I worked on US Navy contracts for twelve years and spent some time on ships testing new underwater vehicles, I know very little about recreational boating. However, I have always been fascinated by sailing, though this fascination has been limited to reading books about the Age of Sail.

Recently, my wife and I have become huge fans of three sailing channels on Youtube that may cause me to take up sailing when I retire (about five years away). Here are my reviews of these three channels:

• Sailing Ruby Rose
Nick and Terysa and their Southerly 38 monohull sailboat seem to have the most diverse sailing experience. For example, I was utterly fascinated with how they took down their mast, pulled up their keel, and used their sailboat to cruise the canals of France. Their style of presentation is the most journalistic of the three shows in that they matter-of-factly present a large amount of interesting information.
• Gone With the Wynns
Jason and Nikki Wynn sail the Pacific with their Leopard 43 catamaran sailboat, which they purchased before learning to sail. I could never imagine myself being so bold.  The show is the best produced of the three programs. Nikki is a first-rate narrator and Jason is a professional photographer. His skills really show. The show is beautifully filmed and edited. His use of the drone for aerial shots is superb. These two are fun to watch because you get to see them learning as they go. I am particularly impressed with their worldwide network of helpers.
• The O'Kelly's
Nick and Megan O'Kelly and their Leopard 46 catamaran sailboat cruise the east coast of the US and the Carribean. Nick is a lifelong sailor and former TV weatherman. He has significant presentation skills and his talks are the most technical in terms of navigation and weather. Since my life is a celebration of detail, I love seeing his explanations of weather and how he plans sailing routes. Megan appears both in their sailing videos and she also has her own video channel where she covers non-sailing topics, like health and sewing. I enjoy this channel as well. I particularly like her sewing projects (my sewing has been limited to a few upholstery projects and a quilt).

Sailboat Survey

The Ruby Rose crew is selling their current boat and did a very interesting analysis of the boats they were considering purchasing as a replacement. Not only did Nick and Terysa review the boats, but they invited their viewers to review the boats.

Here are links to the sailboat review pages. While a good format, I wanted to see all the results in a cross-tabulated table.

 St Francis 50 Majestic 530 Seawind 1260 Knysna 500SE Maverick 440 Balance 526 Antares 44 GS Xsquisite X5 Discovery 50 Excess 12 Catana 53 Outremer 51 Privilege 5.0 Neel 47 NautiTech 46 Leopard 45 Astrea 42 Seawind 1600 Lagoon 42

Because the data is scattered across nineteen pages, I decided that I needed to do a bit of web scraping using Power Query. You can download my Excel workbook here.

The boats were assigned grades (1-10) in the following categories:

• Performance
• Interior Design
• Build Quality
• Safety and Design
• Value for Money

The final grade was assigned by simply summing the total scores, which implicitly gives each category an equal weight. I wanted my data in a spreadsheet so that I could weight the categories differently. For example, I would like to give Safety and Design and Build Quality more weight than Performance and Interior Design.

Table 2 shows the results from their survey of people who watch their channel. Each boat had been the subject of a separate video that allowed their audience to get a good look at each boat.

Table 2: Ruby Rose Sailboat Reviews

BoatPerform-
ance
Interior DesignQualitySafety/
Design
ValueTotal
Xsquisite X55.307.808.508.206.1035.90
Seawind 16007.806.807.607.606.0035.80
Balance 5268.006.807.707.604.4034.50
Seawind 12606.906.206.807.107.1034.10
Knysna 500SE6.206.807.007.206.9034.10
Outremer 518.106.407.606.905.0034.00
Privilege 5.05.107.608.707.904.6033.90
St Francis 505.807.007.907.705.3033.70
Leopard 455.607.006.807.206.3032.90
Discovery 505.007.308.107.604.6032.60
Maverick 4405.706.107.207.106.5032.60
Catana 53.7.206.507.106.604.9032.30
NautiTech 466.306.706.506.306.2032.00
Antares 44 GS4.906.208.107.804.8031.80
Lagoon 424.706.906.306.606.1030.60
Majestic 5304.306.907.206.905.3030.60
Astrea 425.006.805.105.905.2028.00
Excess 124.405.004.905.204.5024.00
Neel 476.804.302.905.104.0023.10

Nick and Terysa's Vote

Nick and Terysa also shared the grades they assigned to the boats, which I show in Table 3.

Table 3: Ruby Rose Crew Scoring.

BoatPerform-
ance
Int. DesignQualitySafety
Design
Value Total
Seawind 160098910642
Balance 526981010542
Outremer 5189108540
Catana 53.8998539
Knysna 500SE6899739
Xsquisite X52101010638
Majestic 53041099638
Seawind 12607789738
Maverick 4405799636
St Francis 50551010636
Discovery 50310109436
Privilege 5.058109335
Antares 44 GS471010334
NautiTech 466876633
Lagoon 424687530
Leopard 454767529
Astrea 424846426
Neel 478616526
Excess 123566424

Visualizing House of Representative Data Using Power Query

Quote of the Day

Success is never due to one thing, but failure can be. Sleeping well won’t make you successful, but not sleeping enough will hold you back. Hard work is rarely enough without good strategy, but even the best strategy is useless without hard work. Many things are necessary, but not sufficient for success.

Introduction

Figure 1: Pivot Table Showing How Minnesota representatives for House Amendment 789 to Bill HR 2546.

I have been following certain bills through the US House of Representatives and wanted to know how the voting varied by US regions and political parties. I also want to generate tables that show how my state's representatives vote (Figure 1). Fortunately, the votes are documented online and Power Query was able to easily grab and process the data.

The government does a decent job of graphically presenting the data but I wanted to view the data a bit differently than how they did because I want to:

• Quickly be able to look at how my state's congressional representatives voted
• See how the voting varied US region and party
• See the votes of only congressional representative – the online data includes delegates from US territories if the votes will not change the pass/fail results

This post provides a workbook that can be used to:

• graphical view the voting results by US region and party
• filter the data to view the votes by state delegation
• graphical view of the voting without including the territorial delegates

Background

The online data is well-formatted:

• Votes of the US House and Senate are listed here
• US House data includes the district number and party of each representative
• Territory delegates are also included if their voting does not affect the vote outcome

There is one minor issue involving how votes are recorded. An affirmative vote may be recorded as "Yes" or "Aye." A negative vote may be recorded as "No" or "Nay." Also, some representatives choose not to vote, either through absence or simply deciding not to vote.

Because the US House and Senate data is formatted differently, I have chosen to restrict my workbook to processing only House data. My processing plan is straightforward:

• Filter the data to remove territorial delegates (if needed)
• Augment the data with US region information using region designations from the US Census Department

The workbook is available here.  It does contain a small, button-activated macro that updates all the queries and pivot tables.

To use the workbook, you must:

• go to the list of votes
• copy the link to the US House vote (NOT Senate) you are interested in
• paste the link into the vote link location on the Report tab of the workbook.

This processing sequence leaves me with a tidy table that I then use to generate pivot tables and charts.

Analysis

I randomly chose a bill that went through the House recently that involved adding some wilderness lands (HR  2546, Amendment 789). I wanted to see how the vote results varied by party and region of the US.

Figure 2 shows that the bill passed but the voting was along party lines. The graph in Figure 2 is my duplication of a similar graph on the government website. The only difference in my plot is that I removed the non-voting delegates.

Figure 2: House Vote Results Summarized By Party.

Figure 3 shows how the voting results varied by region. Again, it is very clear that the voting was along party lines in all regions of the US.

Figure 3: Vote Summary By Region and Party.

Conclusion

I was able to view the data for a US House bill that I was interested in. Figure 3 shows how polarized the US is nationally right now. Figure 4 shows that the Colorado House delegation was also polarized.

Figure 4: Colorado Voting Breakdown By Party.

My home state of Minnesota's House delegation was also polarized (Figure 5).

Figure 5: Minnesota Delegation was also Polarized.

Power Query Regular Expression Hack

Quote of the Day

May I give you a word of advice? Next time you invade Italy, do not start at the bottom.

General Fridolin von Senger und Etterlin, WW2 German general in Italy during WW2. This was his post-war advice to a British historian. The Allied Italian Campaign was a disaster in many ways – General von Senger und Etterlin's critique of the Allied effort was probably correct.

Introduction

Figure 1: Regular Expressions Used in My Example. Drawn using Regexpr.com.

Because many corporations are focused on the Microsoft Office suite of tools, businesses often require that I use Excel/Power Query for my analysis work so that they can work with the tools I develop after I am gone. Fortunately, I really enjoy using Power Query, but I find it irritating that it does not support regular expressions. I must admit that Power Query's standard functions can do a good job of extracting strings, but the process is a bit tedious.  I have a large library of regular expressions for extracting email addresses, phone numbers, social security numbers, and the like that would be efficient for me to use if Power Query could run them.

Fortunately, I saw a blog post by Imke Feldmann that showed a hack that allows the use of a regular expression in Power Query. The hack involves writing a query that contains a small HTML page that calls a Javascript regex routine. This may not be the world's most efficient piece of code, but much of my work involves cleaning up relatively small dimension tables where convenience is more important than efficiency.

My M-code example is included in this workbook and is shown below. I used a US Census table that I copied from the Wikipedia for an application example. I use this table all the time for grouping US state data by region. While it is a small table that could have been cleaned up with only Power Query's editor (i.e., no coding), using regular expressions resulted in a tad shorter query.

Reg Ex Query

The Power Query query that contains the regex is shown below.

Wikipedia Description of US Census Regions and Divisions

This is the text I copied from the Wikipedia.

Region 1: Northeast
Division 1: New England (Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont)
Division 2: Mid-Atlantic (New Jersey, New York, and Pennsylvania)
Region 2: Midwest (Prior to June 1984, the Midwest Region was designated as the North Central Region.)[7]
Division 3: East North Central (Illinois, Indiana, Michigan, Ohio, and Wisconsin)
Division 4: West North Central (Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, and South Dakota)
Region 3: South
Division 5: South Atlantic (Delaware, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, District of Columbia, and West Virginia)
Division 6: East South Central (Alabama, Kentucky, Mississippi, and Tennessee)
Division 7: West South Central (Arkansas, Louisiana, Oklahoma, and Texas)
Region 4: West
Division 8: Mountain (Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, and Wyoming)
Division 9: Pacific (Alaska, California, Hawaii, Oregon, and Washington)

Tidied Table of US State Regions and Divisions

This is a tidy version of the Wikipedia data.

Cleaned Up Table of US States By Region and Division

RegionDivisionState
MidwestEast North CentralIllinois
MidwestEast North CentralIndiana
MidwestEast North CentralMichigan
MidwestEast North CentralOhio
MidwestEast North CentralWisconsin
MidwestWest North CentralIowa
MidwestWest North CentralKansas
MidwestWest North CentralMinnesota
MidwestWest North CentralMissouri
MidwestWest North CentralNorth Dakota
MidwestWest North CentralSouth Dakota
NortheastNew EnglandConnecticut
NortheastNew EnglandMaine
NortheastNew EnglandMassachusetts
NortheastNew EnglandNew Hampshire
NortheastNew EnglandRhode Island
NortheastNew EnglandVermont
NortheastMid-AtlanticNew Jersey
NortheastMid-AtlanticNew York
NortheastMid-AtlanticPennsylvania
SouthSouth AtlanticDelaware
SouthSouth AtlanticFlorida
SouthSouth AtlanticGeorgia
SouthSouth AtlanticMaryland
SouthSouth AtlanticNorth Carolina
SouthSouth AtlanticSouth Carolina
SouthSouth AtlanticVirginia
SouthSouth AtlanticWest Virginia
SouthEast South CentralAlabama
SouthEast South CentralKentucky
SouthEast South CentralMississippi
SouthEast South CentralTennessee
SouthWest South CentralArkansas
SouthWest South CentralLouisiana
SouthWest South CentralOklahoma
SouthWest South CentralTexas
WestMountainArizona
WestMountainIdaho
WestMountainMontana
WestMountainNew Mexico
WestMountainUtah
WestMountainWyoming
WestPacificCalifornia
WestPacificHawaii
WestPacificOregon
WestPacificWashington

US Navy WW2 Aviation Statistics Cleanup Using Power Query

Quote of the Day

The difference between MacArthur and God is that God never thought that He was MacArthur.

— Tom Kratman, former Regular Army Infantry Enlisted

Introduction

Figure 1: F6F Hellcat had the highest kill ratio of any USN fighter in the Pacific War.

I was reading a forum post on fighter kill ratios during WW2 and decide to compute some  Imperial Japanese Navy (IJN) vs  US Navy (USN) ratios for myself. I should point out that these ratios are generally viewed as inflated because of the difficulty of confirming downed aircraft. However, the inflated numbers continue to be quoted. The published reports state that the F6F Hellcat had the best kill ratio of the USN/Marine fighter at 19-to-1, followed by the F4U Corsair at 11-to-1, and the F4F Wildcat at 7-to-1.

I decided to compute the ratios for myself using the Naval Aviation Combat Statistics (NACS) report on the air war in the Pacific. As far as data cleanup goes, there is nothing quite as ugly as scanned WW2 documents:

• scanning errors
• typing errors
• hierarchical row and column headings
• math errors

This scan had all the usual problems minus the math errors. Click on Figure 2 to see the scan that I OCRed.

Figure 2: NACS Table 1.

Background

The analysis presented here is entirely based on the NASC's Table 1. Virtually all the work is in forcing the table into tidy format for easy pivot table generation. Figure 3 shows my data extraction workflow.

Figure 3: Data Processing Workflow.

My Excel workbook is available here.

Analysis

After converting the data to tidy format, I generated a few pivot tables. Table 1 shows the kill ratios for the listed USN fighters: Hellcat, Corsair, and Wildcat. My numbers agree with other publications. I was not able to find any references to the Brewster Buffalo's kill ratio in USN use, but my low number does not surprise me – the USN wanted to get rid of the Buffalo as soon as it could. I should note the Buffalo was used very successfully by the Finns.

 Table 1: USN WW2 Fighter Kill Ratios.

Table 2 shows the bomb tonnage dropped by USN aircraft. Note how the Corsair had the highest bomb tonnage total among the fighters. This attack capability explains why the Corsair remained in service through with the USN through the Korean War. Table 3 shows that the Hellcat and Corsair dominate the action sortie statistics.

 Table 2: Bomb Tonnage By Aircraft. Table 3: Sorties By Aircraft.

Conclusion

Power Query was able to deal with a complex table having both hierarchical and non-hierarchical column headings. I was able to generate pivot tables that confirmed previously published kill ratio numbers. I also was able to generate numbers for action sorties and bomb tonnage.

Battleship Shell Size Comparison

Quote of the Day

I am not afraid of storms, for I am learning how to sail my ship.

Louisa May Alcott (the author of Little Women). I feel the same way about data science. I am always looking for tough problems to test the skills I am acquiring.

Figure 1: Sailors on the 14-inch guns of the USS Texas, the last remaining dreadnought. (Wikipedia)

During some routine research on battleships, I encountered some photos on the web that I thought were worth sharing here. My hope with these photos is to give readers a feel for the size of these guns and their projectiles.

Figure 1 shows the 14-inch guns of the USS Texas, which is the last remaining dreadnought. She is a great example of a WW1-era warship. Unfortunately, salt-water is slowly eating away at her and her long-term fate is probably as a land-based museum exhibit.  I find it remarkable that they have kept it afloat this long. I do plan to travel in the next few years to the San Jacinto Battleground State Historic Site where she currently resides.

Figure 2 shows a "photoshopped" collection of different battleship shells with a person standing next to them to give a sense of scale. A heavy cruiser 8-inch shell is included in the photo (far right) to show that battleship shells are in a different category of size from that of a cruiser.

Figure 2: Battleship Shell Size Comparison. (Source)

The Yamato-class battleships used the 18.1-inch shells. I can find only one, albeit disputed, mention of the effectiveness of these shells in combat because the Yamato-class ships saw so little combat.  According to Japanese sources, the USS Gambier Bay was apparently heavily damaged by a near-miss from one of these shells.

Figure 3 shows the selection of shells available to an Iowa-class battleship. Most sources only mention the 1900 lb HC and 2700 lb armor-piercing variants, but you can see there were a number of other options.

Figure 3: USS Iowa Shell Selection. (Source)

Figure 4 is a photo that I saw on Quora that does a good job of comparing the sizes of the battleships (link).

Figure 4: Battleship Size Comparison.

Posted in Ballistics, Naval History | 1 Comment

Cleaning Up Coffee Production Data Using Power Query

Quote of the Day

Courage doesn't always roar. Sometimes courage is a quiet voice at the end of the day saying, 'I will try again tomorrow.'

— Poet and artist Mary Anne Radmacher on courage.

Introduction

Figure 1: Row of Coffee Plants in Kenya.

I recently volunteered to do some pro-bono data analysis and front-end web development for a very nice Kenyan woman who is trying to provide a US distribution channel for some coffee growers in her native Kenya. This post documents my work on tidying her coffee data.

The story of Kenya's coffee growers is similar to that of other commodity farmers. Kenya's warm climate and mountain terrain are ideal for coffee production and it currently supports ~150K coffee growers (link). However, the Kenyan coffee growers have been going through a cycle of low prices – a business cycle similar to what US milk producers are going through now.

Unfortunately, there are only a few alternatives for dealing with low prices:

• Lower your cost of production
Tough to do – most producers have already squeezed the inefficiencies out of their processes.
• Change to a more profitable crop
Tough to do – this generally requires new equipment, learning new skills, and developing new markets.
• Find a higher margin coffee sector
Tough to do – others are already fighting to keep these markets.
• Cut out the middlemen and sell directly to the consumer
This is a bit like opening a Farmer's Market. My Kenyan friend is trying this approach.

In this post, I am using Power Query to clean up some data from the International Coffee Organization (ICO) for use in generating some graphics. This is a good example to show my coffee-growing friends on how to process their data.

Background

The ICO puts its data out in Excel workbooks that are not in tidy format. My cleanup role consists of:

• removing blank rows
• changing the table from wide-format (years in columns) to long-format (a single year column)
• incorporate a continent column

Power Query will allow these folks to update the automatically update the data as the ICO puts out new releases.

Analysis

There is only one file: Coffee.xlsx. You can download it from here. No VBA involved, just Power Query.

Simple Graphics

I generated a copy of Excel graphs to illustrate how to use the data. Do not send me hate mail about using pie charts. I used them here so I could compare my data results with some ICO charts.

Figure 2 shows that Ethiopia is the dominant coffee producer in East Africa.

Figure 2: Horn of Africa Coffee Growing Percentages.

Figure 3 shows that South America dominates coffee production.

Figure 3: 2017 World Coffee Growing Percentages By Continent.

Conclusion

This proved to be a nice illustration of the use of Power Query to clean up some coffee data.

Post Script

I just showed the data to my coffee-growing friends. They had no idea how simple it was to clean up the data and to augment that data with information on Kenya. We then had a long discussion on things they could add to their web site. It was great to see these folks feel empowered!

Posted in Excel | 1 Comment