Sunday, July 4, 2010

The BP Oil Spill: Why Slow Is Much Faster

As I read an article in the online Wall Street Journal about the equipment failures leading to the disastrous oil spill in the Gulf of Mexico, I am reminded of a lesson that I teach my laboratory students. It is this: The fastest, cheapest way to get something done is to proceed slowly. Check and recheck each step before proceeding to the next. Don't rush and don't make assumptions. It is counterintuitive advice to give students, who, like everyone else, are in a hurry. But, as BP is finding out, assumptions can be costly, time consuming, and deadly.

The Wall Street Journal investigation is the most complete account to date in the media of what went wrong on the Deepwater Horizon. It is a story of rushed procedures and faulty assumptions that appear motivated by schedule and budget considerations. For example:

  • BP cut short a procedure designed to detect gas in the well and remove it before it becomes a problem.
  • BP skipped a quality test of the cement around the pipe (despite a warning from the cement contractor).
  • BP installed fewer centering devices than recommended (6 instead of 21).

The article also reported that on the day (April 20) the Deepwater Horizon exploded and sank, a disagreement broke out on the rig over the procedures to be followed. A BP official had a "skirmish" with Transocean officials over how to remove drilling mud. BP prevailed, and several hours later 11 people were dead and oil was spewing into the Gulf.

It appears that all involved knew corners were being cut, but a consensus emerged that the process would "most likely work." The cementing contractor Halliburton said that it followed BP's instructions, and that while some "were not consistent with industry best practices," they were "within acceptable industry standards."

But, the problem with complex equipment and procedures is that "most likely" can easily become "very unlikely" when everything has to function. Simply adhering to "acceptable standards" is no guarantee that everything will work.

This is a lesson my students usually have to learn the hard way, even though it can be proved mathematically. Suppose you have 90% confidence in your ability to make electrical connections. You might think that if you wire your project without conducting tests, it will have a 90% chance of working. But, if you have 10 connections and each one must work, it is unlikely your project will succeed. The reason is that probabilities for independent events that must all occur multiply. If two events, each with a 90% chance of success, must occur together, the likelihood of both happening is (0.9) x (0.9) = 0.81, or 81%. If 10 such events must all occur, the chance becomes (0.9) multiplied by itself 10 times, (0.9)^10 ≈ 0.35, or only a 35% chance of success.
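To make the arithmetic concrete, here is a minimal Python sketch of the calculation (the 90% confidence and the 10 connections are simply the numbers from the example above, not data from any real project):

```python
def overall_success(p_each: float, n_steps: int) -> float:
    """Probability that n_steps independent steps all succeed,
    given that each step succeeds with probability p_each."""
    return p_each ** n_steps

# The wiring example: 90% confidence per connection.
print(f"{overall_success(0.90, 2):.2f}")   # 0.81 -- the two-connection case
print(f"{overall_success(0.90, 10):.2f}")  # 0.35 -- only a 35% chance the project works
```

The exponent is the whole story: every untested step quietly multiplies against all the others, and the only way to keep the overall number close to 1 is to verify each step as you go.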

A relatively high confidence of 90% for a single connection can be a deceiving number if all of them have to work. Worse still, when the project doesn't work, you won't know why. It is difficult and time consuming to track down errors. The only solution is to spend extra time during assembly to test each connection as you make it, before proceeding to the next one.

I see this problem all the time when I teach. Students will follow the assembly instructions but not perform the tests as they go along. They assume everything is correctly assembled. At the end they will have a beautiful piece of equipment that doesn't work. It is brought to me to figure out why, and the students watch in dismay as I dismantle it piece by piece to search for the problem. Sometimes it is a mistake or misunderstanding on the first step, and that forces the students to begin all over again. They learn that time-consuming testing actually saves time.

It's not only students who struggle with this lesson. A friend once asked me for help wiring an external keyboard he purchased for a handheld device. He had followed the instructions, but after making all the connections it didn't work. Frustrated and confused, he didn't know what to do next. He took it apart, put it back in the box, and called me.

He came to my office, where I spread the parts out on my desk and followed the enclosed wiring instructions. But, after making each connection, I tested it with an electrical meter while twisting and pulling to make sure it was secure. I did this for every connection, because I made no assumptions about reliability based on how a connection looked, or on the high probability that almost all the connections I make are secure. When I finished, I turned the device on and it worked.

My friend said: "But, I wired it the same way you did. Why didn't it work?"

"You didn't do the same thing I did. You didn't test each connection when you made it. When it didn't work, you had no good way of finding a single bad connection, which is all that is needed for it to fail. I made sure each connection worked before I continued to the next one."

For highly complex equipment, such as an oil drilling platform, even a 99.9% success rate for each step might not be acceptable. Consider a procedure that involves 10 steps, each with a 99.9% chance of succeeding. The number (0.999)^10 is approximately 0.99, or 99%. A 1% chance of failure sounds safe, but the fact is that 1% events happen frequently, about 1% of the time to be exact. If an event with a 1% frequency results in deaths, injuries, environmental and economic devastation, and possible bankrupting of the company, the risk is unacceptably large.
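Using the same back-of-the-envelope arithmetic (the 10 steps and the 99.9% figure are from the hypothetical example above, not anything reported about the rig), a couple of lines of Python show how even a "near-certain" step still leaves a meaningful chance of disaster:

```python
# Per-step reliability of 99.9%, compounded over 10 steps.
p_all_ok = 0.999 ** 10
print(f"overall success: {p_all_ok:.3f}")        # ~0.990
print(f"chance of failure: {1 - p_all_ok:.3f}")  # ~0.010, i.e. roughly 1%
```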

But, what is most disturbing is that even if the BP executives making the decisions understood the mathematics of risk, it might not have made any difference. The root cause of the Gulf oil spill is the same as the root cause of the financial meltdown two years earlier. Executives take dangerous risks because they realize enormous personal gains when they succeed, while others pay for the losses when they fail.

Imagine if Tony Hayward, BP's CEO, faced personal financial ruin from an oil spill. What if he had to contemplate having no yacht, no house, no assets, no job, and complete loss of livelihood? After all, those are the circumstances facing thousands of people on the Gulf coast as a result of the oil spill. What if Tony Hayward had to personally operate the equipment on the Deepwater Horizon, so that its failure would end his life as it did eleven others'? Do you think he would run his company differently? I bet if his life and livelihood were on the line he would make sure careful testing was done to ensure safety for all concerned.

Unfortunately, the most likely outcome of this disaster is that nothing will change. There will be calls for tougher regulation, but, just like the financial overhaul working its way through Congress, change will be cosmetic. Opponents of more financial regulation use the same rhetoric as opponents of more oil industry regulation. They denounce increased regulation as an attack on "free markets."

But for "free markets" to work the agents must have a personal stake in the outcomes. Real free markets are composed of the thousand of small business owners and their workers who have a personal financial stake in their successes and failures. It's a sham to say that the executives of banks and oil companies are agents in a free market when they can only reap profits, while everyone else pays for their losses.

Joseph Ganem is a physicist and author of the award-winning The Two Headed Quarter: How to See Through Deceptive Numbers and Save Money on Everything You Buy.