Limits to Robustness


Robustness without context is meaningless. Nonetheless, many of the amenities and privileges we enjoy as participants in modern societies are built upon the foundations of robust social and physical infrastructure; our hospitals and houses of representatives are able to adapt to significant disturbances and (mostly) continue functioning. Clearly, we understand robustness enough to build resilient infrastructure, so maybe robustness is "solved" and we can move on to think about something else.

Unfortunately, these conclusions are misleading for a couple of reasons. First, in both the 2021 Capitol riot and the COVID-19 pandemic, we came uncomfortably close to collapse; had there been a successful coup, or had we lacked any way to quickly design, manufacture, and distribute vaccines, we'd be having a very different conversation about robustness. Second, our knowledge is historical, based on past experience. When we ground our predictions and preparations in outdated estimates, we set ourselves up for nasty surprises. Climate change is shifting the ground beneath our feet, with wide-ranging effects, and continuing to plan for the future based on old data is dangerous and foolish.

That was the TL;DR. In the rest of this post, I'm going to discuss examples of robustness, and then how our changing environment is undermining "robustness" that relies on historical data.

How Do We Ensure Robustness?

To better understand how we usually design for robustness, we'll start by looking at computers. Computers run critical systems in healthcare, energy, finance, and the military, so examining how we make computer systems robust tells us a lot about how robustness is engineered in general.

So, how do we ensure robustness in computer systems? Failures in a complex assembly like a computer are the result of random processes: variations in the quality of the components, the external stresses they face, and the interactions between internal components all contribute. Making a system more robust typically involves a few approaches: building in redundancy for when failures inevitably occur, minimizing the risk of failure, and minimizing the cost of (quickly) repairing the system. Different scenarios present different constraints and different understandings of what is robust: there isn't really any way to fix computers in space, so the computers that NASA sends into space probably aren't optimized to minimize the cost of repair. However, they do have multiple, redundant central computers and are specially treated to be more radiation resistant (without protection from the Earth's magnetic field, radiation significantly shortens a computer's lifespan in space). Digitized financial systems, housed in server farms on Earth, benefit from being fairly easily repairable thanks to the highly developed (but not that robust) computer industry. Redundancy is built in by running critical processes across distributed networks that can accommodate the failures of individual computers. Furthermore, the computers are kept in isolated, climate-controlled warehouses that insulate them from external disturbances.
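To make the redundancy idea concrete, here's a minimal sketch with made-up failure probabilities (not NASA's actual figures): if each of n independent units fails with probability p, the system as a whole fails only when all n do.

```python
def system_failure_prob(p: float, n: int) -> float:
    """Probability that all n independent redundant units fail,
    given each fails with probability p."""
    return p ** n

# Illustrative numbers only: a 5% per-unit failure chance.
p = 0.05
for n in (1, 2, 3):
    print(f"{n} unit(s): system failure probability = {system_failure_prob(p, n):.6f}")
```

In practice redundancy buys somewhat less than this, because failures are rarely independent (shared power, shared radiation environment), which is part of why spacecraft also harden each unit individually.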

Qualitatively, both NASA's rovers and banks' computer systems are quite robust. But with robustness, context is king: a NASA rover dropped into the ocean would fare poorly. Given that robustness obviously has its limits, how robust can our creations be?

Paragons of Robustness

The robustness of our creations often shows up in their longevity, and few things last for many centuries while continuing to function. Cities, and their longevity, are triumphs of robustness. The robustness of cities comes from the great redundancy granted by the many social and physical structures that comprise them, and from their repairability (see here for an interesting discussion). Notably, there isn't much that cities can do to protect themselves from disturbances like natural disasters or wars. Yet even when cities "die" and their populations shrink by orders of magnitude, they very frequently recover. London was almost deserted after the fall of the Roman Empire, and is now a thriving global metropolis. The 1906 San Francisco earthquake, the 1871 Chicago fire, and the bombings of Hiroshima, Dresden, and Warsaw are all colossal catastrophes that each of those cities recovered from. It takes more than a single localized disturbance to end a city. As cited in the Reddit post linked above: "Burn down your cities and leave our farms, and your cities will spring up again as if by magic. But destroy our farms and the grass will grow in the streets of every city in the country." (originally from William Jennings Bryan's 'Cross of Gold' speech). The scale of development that humanity has undertaken over its history ensures that even when a city is leveled, the surrounding infrastructure enables it to spring back.

Hidden Assumptions

Obscured within the previous discussions of cities and computers are assumptions of normality. Building something that is impervious to everything is impossible, because there is always some unknown event that can overwhelm our best efforts to engineer robustness. So in the process of engineering robustness, we reason about how probable an event is. For example, NASA's rover probably won't fly off Mars and land in an ocean on Earth, so while a rover's durability in the ocean represents a real limit on its robustness, we (very reasonably) assume that we don't need to worry about how waterproof it is or how resilient it is to corrosion by salt water. These assumptions are grounded in physical laws, so unless the laws of gravity change or a meteorite strikes Mars and flings ejecta in the direction of Earth (if either of those things happens, we have much bigger problems than the robustness of a rover), we don't need to worry about the Mars rover, which is intended to stay on Mars, finding itself in the ocean on Earth.

However, many other, less ironclad assumptions of normality are littered throughout the social and physical infrastructure of our societies. The COVID-19 pandemic violated many of them. Many relatively harmless examples of these violations played out in supermarkets. Shortages of toilet paper resulted from violated assumptions about normal variations in the rate of consumption. Toilet paper is typically consumed at a very stable per capita rate, especially when averaged over large populations. However, panic generated by the pandemic prompted extreme stockpiling, which, as we all likely experienced, led to widespread shortages. Toilet paper producers simply weren't prepared for demand to spike like it did, and as a result, supply chains took some time to recover. Fortunately, the producers' underlying assumption was sound: the true rate of toilet paper use (meaning how often people were actually wiping their butts) did not change. People were simply stockpiling, creating an apparent crisis while there was still, in fact, enough toilet paper for all.
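The dynamic above can be sketched in a few lines (all numbers are invented for illustration): daily supply exactly matches true consumption, yet a short panic-buying spike empties the retail buffer and leaves shelves bare long after purchasing returns to normal.

```python
daily_supply = 100   # units delivered per day (sized to match true use)
true_use = 100       # units people actually consume per day
shelf = 300          # retail buffer stock
shortage_days = []

for day in range(1, 11):
    # Days 3-5: panic buying quadruples purchases, though true use is unchanged.
    demand = true_use * (4 if 3 <= day <= 5 else 1)
    available = shelf + daily_supply
    sold = min(demand, available)
    shelf = available - sold
    if sold < demand:
        shortage_days.append(day)
    print(f"day {day:2d}: shelf={shelf:3d}" + ("  SHORTAGE" if sold < demand else ""))
```

Because supply was sized to exactly match steady-state use, the buffer never refills after the spike, which mirrors how real supply chains took time to recover even though total consumption never rose.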

Unfortunately, the violation of other assumptions led to deaths. Hospitals and medical suppliers can't afford to keep an infinite stock of medical supplies on hand, so they make inventory decisions based on disease statistics, with some allowance for variation. With the onset of the pandemic, those predictions became totally irrelevant, and the resulting shortages of ventilators and other medical equipment contributed to the death count. An eventual pandemic was certainly not a surprise (note that this article is from 2009), and though robustness is difficult precisely because we can't foresee exactly what challenges we have yet to face, we can certainly do a much better job preparing a rapid response to emerging infectious diseases, such as (but certainly not limited to!) designing more scalable medical supply chains.
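As a rough illustration of the kind of inventory math involved (a textbook safety-stock formula with invented numbers, not necessarily what any hospital actually uses): stock is sized to cover average demand plus a few standard deviations, so a pandemic-scale spike sails far past the buffer.

```python
import math

def reorder_point(avg_daily_demand: float, demand_std: float,
                  lead_time_days: float, z: float) -> float:
    """Classic reorder point: expected demand over the resupply lead
    time, plus z standard deviations of safety stock."""
    expected = avg_daily_demand * lead_time_days
    safety = z * demand_std * math.sqrt(lead_time_days)
    return expected + safety

# Invented numbers: ~2 ventilators/day needed, 14-day resupply,
# z = 1.65 for roughly 95% service under normal conditions.
buffer = reorder_point(2.0, 0.5, 14, 1.65)
print(f"stock sized for ~{buffer:.0f} units")
print("pandemic-scale demand, say 20/day over 14 days:", 20 * 14)
```

A buffer calibrated to normal variation is an order of magnitude short of pandemic demand, which is exactly the "allowance for variation" assumption failing.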

At a bigger-picture level, the sub-optimal pandemic response was likely facilitated by the fact that human populations have become increasingly interconnected. The emphasis here is on the rate of change: old precedents for the appropriate magnitude and speed of a pandemic response are invalidated as conditions change.

Robustness In A Changing World

Nowhere are the changes to our environment more universal than those caused by climate change. Cities, once the pinnacles of robustness in an uncertain world, are now threatened in endless ways. Cape Town is one of countless examples (in case you've been living under a rock: we are facing rising temperatures, rising sea levels, wildfires, and collapsing forest and aquatic ecosystems, among other things) of cities facing existential threats due to climate change. In 2018, Cape Town nearly ran out of water. As climate change continues, we can expect these threats only to increase in severity.

Building robustness into our society means abandoning past notions of what is normal and preparing for the future. But in many regions of the US (for example, California and Florida), we're still investing more resources into developing climate-change-threatened buildings (in some cases, "threatened" means guaranteed to be underwater in a decade or so) instead of future-proofing ourselves, our cities, and our supply chains.

The point of discussing robustness in computers and cities was to illustrate that we are really good at engineering robustness when we understand the context we are designing for. The trouble with climate change is that the conditions are changing underneath our feet, so it is hard to build an intuition for what is normal without help from experts. We, the scientific community, need to do a better job of communicating the environmental conditions we find ourselves in. But the burden doesn't lie solely with scientists; it also lies with insurance companies, (housing) developers, and government regulators to more faithfully depict the risks and protect people against them.

We face a challenge to upgrade our decision-making frameworks. We, as humans, have been successful at creating robust infrastructure in mostly static environments. Now we are confronted with a changing environment (through our own fault) and must update our practices to account for this paradigm shift. Otherwise, even systems designed to be robust will constantly fail.