Survivorship Bias and Understanding Your Data
Published June 2, 2021
Updated June 2, 2021
In this post I'd like to share one of my favourite statistics-related stories. It is a story with a message that seems obvious in hindsight, yet is not clear to many at the outset. It is also a good reminder for all of us to check our assumptions when asking questions and to check that our conclusions match the scenarios we find ourselves in.
Let's jump right into it. The year is 1942 and the world is at war. The German army has captured much of mainland Europe and, with the largest invasion force yet assembled, has plunged deep into the Soviet Union. In an effort to cripple the German war machine, the Allied powers - namely Great Britain and the United States - plan a bombing campaign to target the military, industrial, and civilian factories of Germany. These attacks would be carried out by a variety of bomber aircraft such as the four-engine B-17 "Flying Fortress" - with a crew of ten and a carrying capacity of 2,000 to 7,800 kilograms of explosives.
Such machines of death were neither fast nor agile and were the primary targets of enemy aircraft and ground-based defences. At a cost of more than one million USD per plane in today's currency, the loss of a single plane with its experienced crew was substantial. As losses rose, the Air Force commissioned a study on how to better armour the planes to increase their survivability.
The story goes that the Air Force collected data on bombers that returned from their missions to record where projectiles hit (see the diagram below). Using this data, the following was concluded: "as most hits occurred on the central fuselage or the ends of wings, added armour should be placed in these locations." However, before the decision was finalized, a mathematician by the name of Abraham Wald, part of the Statistical Research Group at Columbia University that was contributing to the war effort, recommended the exact opposite.
Wald understood that the Air Force's conclusion was drawn from incomplete data, or rather, that the data did not support the conclusion. The recorded hits did not come from all aircraft that flew on missions, only from those that survived. Further, it is reasonable to assume that hits on aircraft should be roughly uniformly distributed around the plane. With these two premises a more accurate conclusion is "given that bombers can return while incurring hits on the central fuselage or the ends of wings, increased armour should be placed on the engines in the inner wings, as aircraft with hits in these areas are not present in the sample of surviving aircraft." The planes sent out can be partitioned into two sets - those that survived and those that did not - and only by recognizing this fact about the data gathered can an accurate conclusion be made.
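To make the argument concrete, here is a small simulation sketch of the two premises above. Everything in it is an assumption made for illustration: the four plane sections, the per-hit lethality values, and the number of hits per plane are invented and are not historical estimates. Hits land uniformly across sections, yet the returning planes show far fewer hits on the sections where a hit is most often fatal.

```python
import random
from collections import Counter

# Hypothetical plane sections and per-hit probabilities of downing the plane.
# These numbers are invented for illustration, not historical estimates.
LETHALITY = {
    "fuselage": 0.05,
    "wing_tips": 0.05,
    "engines": 0.60,
    "cockpit": 0.50,
}


def simulate(n_planes=10_000, hits_per_plane=3, seed=0):
    """Return (hits on all planes, hits on returning planes) as Counters."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    all_hits, survivor_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Premise from the text: hits land uniformly across sections.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane returns only if none of its hits proves fatal.
        survived = all(rng.random() >= LETHALITY[s] for s in hits)
        all_hits.update(hits)
        if survived:
            survivor_hits.update(hits)
    return all_hits, survivor_hits


def shares(counter):
    """Convert raw hit counts into each section's fraction of all hits."""
    total = sum(counter.values())
    return {section: counter[section] / total for section in LETHALITY}


all_hits, survivor_hits = simulate()
all_shares, survivor_shares = shares(all_hits), shares(survivor_hits)
for section in LETHALITY:
    print(f"{section:10s} all: {all_shares[section]:.2f}  "
          f"returned: {survivor_shares[section]:.2f}")
```

Someone who only inspects the "returned" column would see most damage on the fuselage and wing tips and might armour those, even though in this toy model the engines and cockpit are where hits actually bring planes down.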
I first learned of this story from a WW2 documentary and have since heard it in several statistics courses as an example of the consequences of making incorrect assumptions about one's data. While it is an interesting story, I note that there are conflicting sources on its validity. Wald did work within the research group from Columbia, yet his work was on estimating the probability that a bomber would survive given that it incurred x number of hits (this is an interesting problem, as the only data Wald had to make these estimates came from the bombers that survived!). Additionally, the phenomenon described above - survivorship bias - was known before Wald, and certainly known to the militaries of the western world.
At the onset of WW1 all of the warring nations issued fabric caps to their soldiers. These caps, relics of a type of warfare that had since passed, offered no protection from the shrapnel released by modern artillery which devastated the battlefields of Europe. Within a year armies began to issue steel helmets, yet it was soon found that even more men were being admitted with head injuries afterwards. Because of this increase, some called for the helmets to be redesigned, until it was conjectured that the increase was not due to the helmets injuring able men, but rather to the helmets protecting against blows that would otherwise have been fatal.
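The arithmetic behind the helmet observation can be sketched in a few lines. The function name `casualty_counts` and all of the numbers below are hypothetical, chosen only to show the effect, and the model is deliberately simplified: it assumes a helmet only turns some would-be-fatal blows into survivable injuries and never prevents an injury outright.

```python
def casualty_counts(head_blows, fatal_rate, helmet_save_rate):
    """Split head blows into (deaths, hospital admissions).

    fatal_rate: fraction of blows that would kill an unprotected soldier.
    helmet_save_rate: fraction of those would-be-fatal blows that a helmet
    downgrades to a survivable injury (0.0 means no helmet).
    """
    would_be_fatal = head_blows * fatal_rate
    saved = would_be_fatal * helmet_save_rate
    deaths = would_be_fatal - saved
    # Admissions are the blows that were never fatal plus the saved soldiers.
    admissions = (head_blows - would_be_fatal) + saved
    return deaths, admissions


# Hypothetical numbers: 1,000 head blows, half of them fatal without a helmet.
cap_deaths, cap_admissions = casualty_counts(1000, 0.5, 0.0)
helmet_deaths, helmet_admissions = casualty_counts(1000, 0.5, 0.6)
print(f"fabric cap:   {cap_deaths:.0f} deaths, {cap_admissions:.0f} admissions")
print(f"steel helmet: {helmet_deaths:.0f} deaths, {helmet_admissions:.0f} admissions")
```

Under these made-up rates, hospital admissions rise once helmets are issued while deaths fall, which is the pattern a count of the wounded alone would hide.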
There are many examples of survivorship bias. Take the claim that music today is not as good as it was in the past: there was plenty of bad music in the past, but we only remember what we enjoyed. University dropouts like Elon Musk, Bill Gates, and Mark Zuckerberg went on to build massive, successful companies, yet there are numerous university dropouts who failed to meet their ambitions. It is, of course, natural that successful enterprises are built on success. We are often taught to focus on success so that we may emulate it and achieve success for ourselves. Much less attention is given to failure. Some attribute failure to bad luck; not everyone can rise to the top. But in doing so we miss out on the opportunity to understand why we failed, to take a retrospective and learn what went wrong.
Just as we should meditate on our failures, we should remember survivorship bias in our own experiences. For example, when I learn new coding languages or frameworks from video tutorials, I see people build feature-rich applications in a few hours. It is an impressive feat and makes one feel unready for the profession. However, the few hours of video are not the full story. What doesn't survive the editing process, if it is captured at all, is the planning, design, debugging, replanning, more debugging, and the frustration that went into the end product. On social media sites we see amazing skills - impressive art, strenuous exercises, choreographed dance - but not the full effort that went into achieving such skill.