Gareth Lock examines the Titan incident, in which an OceanGate submersible imploded during its descent in the North Atlantic Ocean, approximately 370 nautical miles off the coast of Newfoundland, Canada.
Two weeks ago, I wrote a blog about the (in)ability to learn from near-misses because they are often treated as successes. The research behind that article also showed that those in leadership positions within organisations are rewarded for near-misses because the positive outcome contributes to the organisational goal, and the organisation doesn’t look at how the ‘success’ happened but rather takes it at face value. They don’t question whether they were lucky or whether they were good.
This week, following the tragic loss of the Titan with five crew/passengers onboard, this blog looks at why it can be so hard to learn from events: partly because of the biases we face after the event when we review the account, and partly because of the biases and heuristics we use to make decisions under uncertainty.
I make no apology for the length of this blog; there is so much to unpack in this event that I am only touching the surface (and I am not even looking at the technical aspects of it).
Complex Systems
Modern socio-technical systems (combinations and interactions of people, technology, environment, and culture) are not linear in nature, i.e. there isn’t a simple cause-and-effect relationship between the elements. Doing one thing one day will produce different results on another day – you cannot replicate a complex system exactly because it will have changed, especially as people will have learned and adapted from past events.
No system is completely safe, nor is it completely specified. Every system operates with latent weaknesses and gaps; it takes a specific set of circumstances for the failure to emerge. Once the failure has happened, the obvious and not-so-obvious factors become visible. The ease with which we can spot these critical factors is often aided by hindsight bias.
Ironically, the chances of exactly those circumstances being repeated are quite low, which means that for us to learn, we must be able to abstract from one scenario to another. That isn’t easy, as we have a tendency to look for differences rather than similarities to the event that has just happened, a behaviour known as “distancing through differencing” (Woods & Cook, 2006). For example, “I would never switch to the wrong gas because I always analyse and follow protocols” – and yet it happened because the diver was distracted and under pressure to complete the switch quickly given the situation they faced. If we want to learn, notice the conditions, not the outcomes.
Risk and Safety: Socially Constructed
Diving takes place in a hazardous environment – there is an irreducible risk of dying or being injured because of the environment we are in. The risk is made up of all the dynamic factors going on in a dive (environment, weather, temperature, physiological variability, equipment reliability, equipment design and testing…). Consequently, the risks we encounter are not constant. The risks we perceive are not constant. The risks we accept are not constant nor consistent over time. Fundamentally, there is uncertainty. This can pose a problem when it comes to determining what is ‘safe’ prior to a dive, as that is based on experience, context, peer behaviours, conditions… However, after a dive, it can be easy to assume the dive was safe because there wasn’t an adverse event – but how close were we to failure? Humans tend to want to define an event as safe/unsafe. However, binary states like ‘safe’ do not exist in complex socio-technical systems.
The hazards we face in diving are relatively easy to identify because of the short duration of the dives and the limited number of ‘moving parts’. However, for something like Titan, where there are multiple factors that are independent and interdependent and are likely competing (customer wants/needs vs reliability vs reputation vs customer kudos vs financial viability vs weather vs technical expertise vs social conformance), identifying the ‘highest priority’ risks can be difficult even when there is technical evidence to say that something isn’t safe.
Some might challenge the idea that safety and risk are socially constructed by pointing to rules and laws. Yet these are themselves a form of social construction: society has determined that to be ‘safe’, there is a need for rules and the associated compliance to reduce the level of risk to an ‘acceptable’ level based on what is possible and affordable.
In the absence of ‘good’ data, we rely on heuristics (mental shortcuts) to lead us to the ‘correct’ answer. While technical standards for design and verification exist for submersibles, what conditions were present that meant they weren’t followed or were thought not to apply? It is easy to focus on statements along the lines of ‘Rules don’t apply’ and ‘They reduce innovation’, but there must be more to these simple soundbites. There were multiple professionals involved in the programme; it wasn’t just the CEO making things happen, so what was their position? This isn’t about shifting blame, but about trying to understand the rationality of those involved.
Without a detailed understanding of what did happen, we rely on hindsight bias (and other biases) to infer that the issues were obvious to those involved at the time. Note that “at the time” is itself an interesting perspective – what specific time are we talking about? Risk perspective shifts as time progresses – both positively and negatively.
Formal, structured investigations (something most divers have never seen) look across the whole system, not just the technical failures and who was the ‘last to touch it’. Systems-focused investigations look at the system from the top down (government/culture) to the front-line operators and the relationships that exist between the factors. AcciMap provides an excellent way of looking at these relationships. A great example of a local rationality investigation is this one from the DMAIB which looks at the grounding of a coaster after the Master fell asleep in his cabin.
Hindsight Bias: What is it?
The hindsight bias is made up of two parts and can be summarised as:
- once people have knowledge of an outcome, they tend to view the outcome as having been more probable than other possible outcomes.
- people tend to be largely unaware of the modifying effect of outcome information on what they believe they would do.
The research shows that even when people have been told about the bias and its influence on their judgements, they still succumb to it and make more critical judgements. Motivational and reward factors make little difference, so telling people to be better or paying them for the ‘right’ decision won’t reduce its effect. The implications of this bias are:
- “Decisions and actions having a negative outcome will be judged more harshly than if the same process had resulted in a neutral or positive outcome. We can expect this result even when judges are warned about the phenomenon and have been advised to guard against it.
- Judges will tend to believe that people involved in some incident knew more about their situation than they actually did. Judges will tend to think that people should have seen how their actions would lead up to the outcome failure.” (Woods & Cook, 2000).
We can already see #1 in action because there wasn’t the same outcry when the Titan missions were successful – the same applies in diving where gas isn’t analysed, caves are penetrated without lines, or rebreather cells are used beyond their date. Hindsight bias is not about ignoring learning opportunities following adverse events; it is about how we respond to information that is presented to us (the reviewers) or to those involved (the actors) at the time and after the event, and how they made sense of it. If ‘important’ information is dismissed at the time by those involved, there will be some local rationality for that, and that is what we need to explore during the formal investigation process.
Another point to consider is a piece of research showing that if the adverse event occurs in an uncertain or unusual environment, we are likely to judge it more harshly, e.g., if someone visits a new curry house and gets food poisoning, they are likely to get more compensation from a claim than if they got food poisoning from their regular curry house and made the same claim (MacRae, 1992). A similar outcome was reported for a car crash on a new route versus a regular route.
Local Rationality Principle
One of the key principles of modern safety science, if we are to learn from adverse events, is to understand the local rationality of those involved: people are trying to do the best they can, with the resources they’ve got, the knowledge and skills they’ve developed, and within the constraints under which they must operate (sensory, data processing, financial, etc.). This is also known as bounded rationality. Bounded or local rationality doesn’t necessarily mean rationally sound when judged against globally-accepted standards, but rather against the individual or team goals, values, expectations, and finite resources of those directly involved. When we view something after the event, we always have more knowledge or evidence than those involved at the time. One of the most important pieces of information is knowing what the outcome was, something that is not known with 100% certainty prior to the event. Even to say “This was Russian Roulette” means there is a 5-in-6 chance of success. Not great odds for submersible operations, but there was a reason why those who played Russian Roulette did (albeit normally under duress).
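To put a rough number on that analogy – an illustrative sketch only, not a model of submersible reliability – a single 5-in-6 chance of survival may feel acceptable, but repeated ‘successful’ rounds compound quickly:

```python
# Illustrative arithmetic only: compounding a 5-in-6 per-round survival
# probability over repeated rounds. Not a model of submersible reliability.
p_round = 5 / 6  # probability of surviving a single round

for rounds in (1, 3, 5, 10):
    p_all = p_round ** rounds  # probability of surviving every round so far
    print(f"{rounds:>2} round(s): P(survive all) = {p_all:.2f}")

# Prints roughly 0.83, 0.58, 0.40 and 0.16 respectively
```

Each uneventful round makes the activity feel safer to those involved, even though the cumulative odds of getting away with it keep falling – which is exactly how near-misses come to be read as successes.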
Consequently, if we can understand the local rationality of those involved, we might be able to improve future decision-making in similar scenarios, recognising that human nature doesn’t make this easy!
What can divers learn from the loss of Titan?
I am not going to look at the technical aspects for a couple of reasons:
- I know that I don’t know anything useful about carbon fibre and titanium submersible construction. I could quote pieces from online sources, but they would be out of the context of the wider system.
- The investigation is going to find what can be found – which will still be an incomplete picture.
Rather, what I am going to do is look at some factors that I think have parallels with the diving industry (or other sectors if you’re not a diver).
- The CEO did not operate on his own. Those who carried on with developing Titan and the OceanGate business believed in the goal, even if it involved risks. This isn’t about diffusing responsibility, it is about harnessing the knowledge and expertise within the team and following the five principles behind High-Reliability Organisations (HRO): sensitivity to operations, preoccupation with failure, reluctance to simplify, commitment to resilience, and deference to expertise. However, to do this, there is a need for a psychologically-safe environment where challenges can be made, and if there are disagreements, then rationale must be provided. ‘Lines in the sand’, i.e. standards, can help those in the team call out when increases in risk have been normalised. Create an environment where constructive dissent can happen, using tools like prospective hindsight/pre-mortems and others from Red Teaming/Red Team Thinking.
- At an organisational level, consider what your goals are. At what point will you know that you are now too focused on the goal, and not recognising the drift that is happening to get you there? As Guy said in this blog, is the juice worth the squeeze? There are parallels with Doc Deep’s final dive.
- OceanGate had internal risk management processes – did they contribute to or detract from the accuracy of the decision-making process? Did the decision-makers understand the difference between process safety/system safety and individual safety? Compliance is one way of ensuring safety, but compliance isn’t always possible in a dynamic environment. Even a risk of 1 in 10,000 still involves a failure. Do diver training organisations and dive centres understand the difference between system safety and personal safety? What do you do to set your instructors and staff up for success? Do you introduce weaknesses into the system through new materials, new processes, or new goals without understanding the unintended consequences? The teams on Deepwater Horizon focused on personal safety to the detriment of process safety and missed many critical factors as a result.
- Is there a culture of learning within the organisation or team? When learning opportunities arise, are they looked at in detail? Is the learning single-loop (fix the broken component or activity) or double-loop (look at the underlying assumptions and activities and ask whether we are doing the right thing)? Chapter 8 from the Columbia Accident Investigation Board (CAIB) report on the loss of the Columbia and Challenger Shuttles contains many examples of organisational learning opportunities and how NASA failed to learn. However, this inability to learn isn’t just a ‘NASA thing’: this research paper shows that you can repackage the loss of Challenger into another scenario and still have 70-85% of your subjects launch the air platform, not seeing the parallels with Challenger! Learning organisations require a fundamental shift from pure compliance to humble curiosity.
- Are near-misses treated as learning opportunities or successes? For learning to happen, time must be made for debriefs, reflection, analysis and embedding change. There must also be a culture to support this. Running a debrief while ignoring the elephants in the room doesn’t help anyone and, in fact, can make things less safe. Critical debriefs are not the norm in most dive operations.
- Is the ‘identity’ challenged by the situation? Research from US wildfire teams showed that firefighters were reluctant to drop their tools in an emergency – which would have increased their speed across the ground – because the tools formed part of who they were. Was there an ‘explorer’ culture which meant that boundaries had to be explored and limits pushed, even though this was, in essence, adventure tourism? How many divers feel they can’t change or improve because they are part of a ‘culture’ or ‘group’? Social conformance and cultural identity are powerful factors. My experience of the Floridian cave diving community is that it is different to the rest of the diving community – there is a ‘code’ that needs to be adhered to, especially when it comes to discussing incidents and accidents.
- Availability, Representativeness & Adjustment and Anchoring. Powerful biases and heuristics shape our decision-making under uncertainty even when we know about them and are motivated or rewarded to get the ‘correct’ answer. The following points come from this 1974 paper by Kahneman and Tversky, which forms the basis of ‘Thinking, Fast and Slow’.
- The more data that is available cognitively, the more likely we are to use it. If there hasn’t been a submersible lost with the loss of human life, then heuristically, it is less likely to happen. In terms of diving, how many incidents happen while diving? We don’t hear about many, therefore they don’t happen very often… However, the reason we don’t hear about them is that there isn’t a psychologically-safe environment or a Just Culture in which to discuss them!
- If we don’t have good data, we try to match something which is close and use that as a proxy for data. For example, our representations of who explorers are and what they do, and of who billionaires are and what they represent – both apparently with a propensity to take significant risks, potentially with others’ lives – mean that we could see this event as something to be expected, hence the horrendous memes which flooded social media. ‘Close enough’ shouldn’t be part of engineering designs and solutions, but the approach is part of human nature, and it is reinforced when things don’t go wrong.
- Adjustment and Anchoring. When we don’t have complete information about the situation, we use something to anchor the likely value. For example, participants were asked to estimate the value of 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 in a short period of time, while others were asked to do the same with 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1. The median estimate for the ascending sequence was 512; for the descending sequence it was 2,250. The correct answer is 40,320! How we present things matters! Because the first sequence starts small, the perception is that the final answer will be ‘small’; the second sequence starts larger, and so the outcome is expected to be a larger number. The problem is that our brains can’t easily deal with compound factors, which is why risk/failure analysis tools like FMEA/FMECA are used to try to quantify the uncertainty (see the short sketch after this list).
- These biases don’t exist in isolation, they influence each other to some degree, and therefore it is difficult to isolate them in a complex scenario.
- Sunk cost fallacy. OceanGate was a business that had committed resources and time, and this can make it hard to say no to certain things. Divers travel around the world to undertake dive trips. It can be hard to say no when things aren’t ‘just right’. The closer you are to the ‘commit’ point, the harder it is to say no. How much more risk is acceptable if you’ve invested $10k on a trip to Chuuk/Truk, Bikini, or Galapagos?
- Prospect Theory. If we are in a losing situation (poor weather, unreliable equipment, great opportunities at stake, cancelled dives), we are more likely to take risks (complete the dive) than when we are in a winning situation (lots of diving taking place, reliable equipment, and great weather), where we are more willing to cancel the dive. This is why it is hard for experienced, well-dived individuals to understand how hard it is for those who don’t dive often to say ‘no’ when things aren’t going quite as planned and the risk is deemed ‘acceptable’. With hindsight, we can more easily see where the ‘wrong’ decisions were made.
- Linked with the Human Diver blog from two weeks ago about success and near-misses, consider the Trieste and its record-breaking dive, which was hailed as a success even though the team had been ordered not to dive and the crew had seen that “the tow from Guam had taken a terrific toll!” on the hull. If that mission had failed, would the world view the event the same way?
- The application of HF to your diving and the diving industry. Even though there is plenty of evidence to show that HF can improve safety and performance in diving, there is nothing in the standards for sports or commercial diving regarding training and development of Human Factors for divers and diving supervisors. The UK MOD now has something in place, but only because there was a fatality and it was a key recommendation made in the report. Other militaries are now joining The Human Diver programmes to develop their knowledge – this is despite research written in 2007 stating that USN divers should explicitly build non-technical skills into their diving programmes. This presentation from Rebreather Forum 4 provides numerous resources highlighting the need for, and value of, HF in rebreather diving (and wider than CCR diving too).
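As promised in the Adjustment and Anchoring point above, here is a minimal sketch of the two pieces of arithmetic – the anchoring product from Kahneman and Tversky, and why compound factors defeat intuition in the way FMEA/FMECA-style analysis tries to address. The reliability figures are hypothetical, chosen purely for illustration, and are not OceanGate or submersible data.

```python
import math

# Anchoring: the product is the same whichever way the sequence is written,
# yet median estimates in Kahneman & Tversky's study were 512 (ascending)
# and 2,250 (descending) against a true value of 40,320.
print(math.factorial(8))  # 40320

# Compound factors (hypothetical numbers only): many individually reliable
# elements can still add up to a non-trivial overall failure risk, which is
# why structured tools such as FMEA/FMECA are used rather than gut feel.
per_element_reliability = 0.999  # assumed 1-in-1,000 failure rate per element
n_elements = 200                 # assumed number of independent elements

p_any_failure = 1 - per_element_reliability ** n_elements
print(f"P(at least one element fails) = {p_any_failure:.2f}")  # about 0.18
```

The point is not the specific numbers but the shape of the result: individually ‘small’ risks compound into something our intuition badly underestimates, just as the small starting digits anchor the factorial estimate far below the true value.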
What can you do about it?
- Have clearly defined standards and goals which are agreed upon before an activity. They allow ‘lines in the sand’ to be drawn, e.g., minimum gas, maximum deco, maximum penetration, etc. Grey areas make it hard to ‘speak up’ because of the fear of not being right and looking stupid about something we didn’t understand.
- Psychological safety. Critical to using the resources within your team. There are a huge number of resources available at Tom Geraghty’s site here.
- Understanding the presence and impact of these biases is critical to reducing their effect, but once you are ‘in the tunnel’, you’re not likely to know that you are struggling. This is why peer and independent checking are important.
- Consider adopting the five HRO principles: Preoccupation with failure, Reluctance to simplify interpretations, Sensitivity to operations, Commitment to resilience, and Deference to expertise. You can find more about these via the link above and also the original book on the topic, Managing the Unexpected: Resilient Performance in an Age of Uncertainty.
- Continue learning…have an open, curious mind.
- When an adverse event occurs, once you’ve had your emotional response, consider the local rationality of those involved. It may appear stupid to you, but it must have made sense to them. That is where the learning happens.
Summary
The goal of this blog wasn’t to say that what happened prior to the loss of Titan was acceptable, but rather to explain a number of factors that can cloud our assessment of what happened, and that shaped their assessment of the risks which culminated in the loss of the submersible and its five crew/passengers on 18 June 2023. It is easy to join the dots after the event, even though some of those dots may not have been visible at the time, but if we want to learn, we have to suspend judgement and be curious as to how it made sense for all of those involved in the project/programme to do what they did. They were professionals who had a vested interest in success, not failure.
There are always trade-offs and compromises in what we do. The more severe the outcome, the harder it is to detach ourselves from the apparent ‘stupidity’ of those involved. We also have to accept that history has shown us that we aren’t very good at learning from complex systems accidents, which is why Perrow called them ‘Normal Accidents’.
Finally, five people lost their lives doing something that they thought was safe enough. It doesn’t matter who they were in terms of social status – they had family and friends who will mourn their loss. Fortunately, their loss would have been quick. RIP Stockton Rush, Paul-Henri Nargeolet, Hamish Harding, Shahzada Dawood and Suleman Dawood. There are significant learning opportunities to come from this tragic event, the hard part will be generating the learning and then applying it.
The above article was initially published on The Human Diver blog and is reproduced here with the author’s kind permission.
The views presented are only those of the author and do not necessarily reflect those of SAFETY4SEA and are for information sharing and discussion purposes only.