Monday, December 29, 2025



Lessons Learned from a Fatal Accident
Lessons Learned: in this 30-minute video, we learn lessons from an accident in 2016 that killed four people on the Thunder River Rapids Ride in Queensland. The coroner's report was issued this year, and we go through the summary of that report. In it, we find failings in WHS duties, due diligence, and risk management, and failures to eliminate or minimize risks So Far As is Reasonably Practicable (SFARP). We do not 'name and shame'; rather, we focus on where we can find guidance to do better.

https://youtu.be/QaSoFld7W0g
In 2016, four people died on the Thunder River Rapids Ride.

Lessons Learned: Key Points

We examine multiple failings in:

- WHS Duties;

- WHS Due Diligence;

- Risk management; and

- Eliminating or minimizing risks So Far As is Reasonably Practicable (SFARP).

Transcript: Lessons Learned from a Theme Park Tragedy

Introduction

Hello, everyone, and welcome to the Safety Artisan: purveyors of fine safety engineering training videos and other resources. I'm Simon, I'm your host, and today we're going to be doing something slightly different: there are no PowerPoint slides. Instead, I'm going to be reading from a coroner's report on a well-known accident here in Australia, and we're going to be learning some lessons in the context of WHS (Workplace Health and Safety) law.

Disclaimer

Now, I'd just like to reassure you before we start that I won't be mentioning the names of the deceased. I won't be sharing any images of them. And I'm not even going to mention the firm that owned the theme park because this is not about bashing people when they're down. It's about us as a community learning lessons when things go wrong to fix the problem, not the blame. So that's what I'd like to emphasize here.

The Coroner's Report

So, I'm turning to the summary of the coroner's report. The coroner was examining the deaths of four people back in 2016 on what was called the Thunder River Rapids Ride, or TRRR (or TR3 for short, because it's a bit of a mouthful). This was a water ride, as the name implies, and what went wrong was that the water level dropped. The circular rafts that went down the rapids, down the chute, got stuck. Another raft came up behind a stuck raft and ran into it, and one of the rafts tipped over. These rafts seat six people in a circular configuration. You may have seen them; different versions of this ride are in lots of theme parks.

But out of the six, tragically, only two escaped; the other four were killed. So that's the background. That happened in October 2016, I think it was. The coroner's report came out a few months ago, and I've been wanting to talk about it for some time, because it illustrates very well several issues where WHS can help us do the right thing.

WHS Duties

So, first of all, I'm looking at the first paragraph in the summary, where the coroner starts off: "the design and construction of the TRRR at the conveyor and unload area posed a significant risk to the health and safety of patrons". Notice that the coroner says the design and construction. Most people think that WHS only applies to workplaces and people managing workplaces, but it does a lot more than that. Sections 22 through 26 of the Act talk about the duties of designers, manufacturers, importers, suppliers, and then people who commission, install, et cetera.

So, WHS imposes duties on a wide range of businesses and undertakings, and designers and constructors are key among them; there were two such duty holders here. Now, it's worth noting that there was no importer in this case. Although the TRRR was similar to a ride available commercially elsewhere, for some reason the theme park chose to design and build its own version in Queensland. I don't know why; anyway, that doesn't matter now. So, there was no importer, but even if you didn't design and construct the thing, if you imported it, the same duties still apply to you.

No Effective Risk Assessment

So, the coroner then goes on to talk about risks and hazards and says each of these obvious hazards "posed a risk to the safety of patrons on the ride and would have been easily identifiable to a competent person had one ever been commissioned to conduct a risk and hazard assessment of the ride". So, what the coroner is saying there is, "No effective risk assessment has been done". Now, that is contrary to the risk management code of practice under WHS; and also, of course, the definition of SFARP (so far as is reasonably practicable) is basically a risk assessment or risk management process. So, if you've not done effective risk management, you can't say that you've eliminated or minimized risks SFARP, which is another legal requirement. A double whammy there.

Then moving on. "Had notice been taken of lessons learned from the preceding incidents, which were all of a very similar nature …" and then he goes on. That's the back end of a sentence where he says: you didn't do this; you had very similar incidents on the ride in the past, and you didn't learn from them. And again, concerning reducing risks SFARP, Section 18 of the WHS Act, which defines 'reasonably practicable' (the core of SFARP), talks about what ought to have been known at the time.

So, when you're doing a risk assessment - or maybe you're reassessing risk after a modification (and this ride was heavily modified several times) or after an incident - you need to take account of the available information. And the owners and operators of the TRRR didn't do that. So, another big failing.

The coroner goes on to note that records available concerning the modifications to the ride are "scant and ad hoc". And again, there's a section in the WHS risk management code of practice about keeping records. It's not that onerous - the code of practice is pretty simple - but they didn't meet its requirements. So, bad news again.

Due Diligence

And then finally, I've got to the bottom of page one. The coroner notes that the maintenance tasks undertaken on the ride, "whilst done so regularly and diligently by the staff, seemed to have been based upon historical checklists which were rarely reviewed despite the age of the device or changes to the applicable Australian standards". Now, this is interesting, because it contravenes a different section of the WHS Act.

Section 27 talks about the duties of officers - effectively, company directors and senior managers. Officers are supposed to exercise due diligence. In the Act, due diligence is fairly simple: it's six bullet points. One of them is that officers have to keep up to date on what's going on in their operation. They have to provide up-to-date and effective safety information for their staff. They're also supposed to keep up with what's going on in the safety regulations that apply to their operation. So, I reckon in that one statement from the coroner there are probably three breaches of due diligence, to start with.

Risk Controls Lacking

We've reached the bottom of page one; let's carry on. The coroner then goes on to talk about risk controls that were or were not present and says, "in accordance with the hierarchy of controls, plant and engineering measures should have been considered as solutions to identified hazards". So, in the WHS regulations - and it's repeated in the risk management code of practice - there's a thing called the hierarchy of controls. It says that some types of risk controls are more effective than others and therefore come at the top of the list, whereas others are less effective and should be considered last.

So, top of the list is, "Can you eliminate the hazard?" If not, can you substitute the hazardous thing with something else that is less hazardous? Can you put in engineering solutions or controls to control hazards? And then, at the bottom of the list, are administrative procedures for people to follow and, finally, personal protective equipment for workers. We'll talk about this more later, but the top end of the hierarchy had just not been considered - or not effectively, anyway.
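The hierarchy is a legal and regulatory concept, not an algorithm, but the ordering logic can be sketched in a few lines of code. This is purely illustrative; the control-type labels and the example candidate controls are my assumptions, not taken from the WHS regulations or the coroner's report.

```python
# Illustrative sketch only: the WHS hierarchy of controls as a ranked list.
# The control-type names and example data are assumptions for illustration,
# not taken from the WHS regulations or the coroner's report.

HIERARCHY = [
    "elimination",     # most effective: remove the hazard entirely
    "substitution",    # replace the hazardous thing with something less hazardous
    "engineering",     # engineering controls, e.g. interlocks and guards
    "administrative",  # procedures for people to follow
    "ppe",             # least effective: personal protective equipment
]

def rank_controls(candidates):
    """Order candidate controls from most to least effective type."""
    return sorted(candidates, key=lambda c: HIERARCHY.index(c["type"]))

# Hypothetical candidate controls for a raft-collision hazard.
candidates = [
    {"type": "ppe", "name": "life jackets"},
    {"type": "engineering", "name": "water-level interlock"},
    {"type": "administrative", "name": "operator checklist"},
]
best_first = rank_controls(candidates)  # the engineering control ranks first here
```

The point of the ordering is that you justify skipping a higher-ranked control type only if it is not reasonably practicable; you don't start at the bottom with procedures and PPE.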

A Predictable Risk

So, the coroner then goes on to say, "rafts coming together on the ride was a well-known risk, highlighted by the incident in 2001 and again in 2004". Now, it says 2004; I think that might be a typo, as elsewhere it says 2014. But certainly, there were two significant incidents similar to the accident that killed four people. And it was acknowledged that various corrective measures could be undertaken to, quote, "adequately control the risk of raft collision".

However, a number of these suggestions were not implemented on the ride. Now, given that a raft collision had demonstrated the ability to kill multiple people on the ride, it's going to be very difficult to justify not implementing controls. So, given the seriousness of the potential risk, to say that a control is feasible - is practicable - but then to say, "We're not going to do it; it's not reasonable": that's going to be very, very difficult to argue. I would suggest it's almost a certainty that not all reasonably practicable controls were implemented, which means the risk was not SFARP, which is a legal requirement.

Further on, we come back to document management, which was poor, with no formal risk register in place. So, no evidence of a proper risk assessment. Members of the department did not conduct any holistic risk assessments of rides, with the general view that another department was responsible. So, the fact that risk assessment wasn't done - that's a failure. The fact that senior management didn't knock heads together and say, "This has to be done; make it happen" - that's another failing, a failing of due diligence, I suspect. So, we've got a couple more problems there.

High-Risk Plant

Then, later on, the coroner talks about the necessary engineering oversight of high-risk plant not being done. Now, under WHS Act definitions, amusement rides count as high-risk plant, presumably because of the number of serious accidents that have happened with them over the years. The managers of the TRRR didn't meet their obligations concerning high-risk plant. So, some things that are optional for ordinary plant are mandatory for high-risk plant, and those obligations were not met, it seems.

And then, in just the next paragraph, the coroner reinforces this due diligence issue: only "a scant amount of knowledge was held by those in management positions, including the general manager of engineering, as to the design modifications and past notable incidents on the ride". One of the requirements of due diligence is that senior management must know their operations, and know the hazards and risks associated with those operations. So, for the engineering manager to be ignorant of modifications and risks associated with the ride is, I think, a clear failure of due diligence.

Still talking about engineering, the coroner notes, "it is significant that the general manager had no knowledge of past incidents involving rafts coming together on the ride". Again, due diligence: if things have happened, they need to be investigated and learned from, and then you need to apply fresh controls if required. And again, this is a requirement. So, this shows a lack of due diligence. It's also a requirement in the risk management code of practice to look at things again when new knowledge is gained. So, a couple more failures there.

No Water-Level Detection, Alarm, Or Emergency Stop

Now, the report says that the operators of the ride were well aware that when one pump failed - and there were two - the ride was no longer able to operate, with the water level dropping dramatically and stranding the rafts on the steel support railings. And of course, that's how the accident happened. Regardless, there was no formal means by which to monitor the water level of the ride, and no audible alarm to advise that one of the pumps had ceased to operate. So, a water level monitor? Well, we're talking potentially about a float, which is a pretty simple thing. There's one in every cistern, in every toilet in Australia. Maybe the one for the ride would have to be a bit more sophisticated than that - a bit industrial-grade - but it's the same principle.

And there was no alarm to advise the operators that a pump had failed, even though it was known that this would have a serious effect on the operation of the ride. So, there are multiple problems here. I suspect you would be able to find regulations that require these things - certainly if you looked at the code of practice on plant design, because this counts as industrial plant. It's high-risk plant, so you would expect very high standards of engineering controls, and these were missing. More on that later.

In a similar vein, the coroner says, "a basic automated detection system for the water level would have been inexpensive and may have prevented the incident from occurring". So basically, the coroner is saying this control mechanism would have been cheap, so it's certainly reasonably practicable. If you've got a cheap control that will prevent a serious injury or a death, then how on earth are you going to argue that it's not reasonable to implement it? The onus is on us to implement all reasonably practicable controls.

And then similarly, the lack of a single emergency stop on the ride, capable of initiating a complete shutdown of all the mechanisms, was also inadequate. That's another requirement from the code of practice on plant design, which refers back to the WHS regulations. So, another breach there.
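To show just how simple the missing engineering controls are, here is a hypothetical sketch of the kind of interlock logic the coroner describes: monitor the water level, raise an alarm on pump failure, and demand an emergency stop when the level falls below a safe threshold. The threshold value, names, and structure are all illustrative assumptions, not taken from the coroner's report or any real ride control system.

```python
# Hypothetical sketch of a water-level interlock for an amusement ride.
# The threshold, names, and structure are illustrative assumptions, not
# taken from the coroner's report or any real control system.

LOW_WATER_THRESHOLD_M = 0.8  # assumed minimum safe water level, in metres

def interlock(water_level_m, pumps_running):
    """Return the alarm and emergency-stop demands for one monitoring cycle."""
    pump_failed = not all(pumps_running)
    low_water = water_level_m < LOW_WATER_THRESHOLD_M
    return {
        # Audible alarm as soon as either pump stops or the level is low.
        "alarm": pump_failed or low_water,
        # Single emergency stop: shut down the conveyor and all mechanisms
        # before the rafts can ground on the support rails.
        "emergency_stop": low_water,
    }

# One pump has failed but the water level is still safe: alarm only.
state = interlock(water_level_m=1.1, pumps_running=[True, False])
```

A dozen lines of logic plus a float switch is roughly the scale of control the coroner judged "inexpensive"; the engineering effort lies in making it robust and fail-safe, not in the logic itself.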

Human Factors

We then move on to a section about the operators, the operators' accounts of the incident, and other human factors. I'm probably going to ask my friend Peter Benda, who is a Human Factors specialist, to come and do a session on this and look at it in some more detail, because there are rich pickings in this section; I'm just going to skim the surface here, as we haven't got time to do more.

The coroner says, "it's clear that these 38 signals and checks to be undertaken by the ride operators was excessive, particularly given that the failure to carry out any one could potentially be a factor which would contribute to a serious incident". So, clearly: 38 signals and checks were distributed between two ride operators, because there was no one operator in control of the whole ride - that's a human factors nightmare for a start - and clearly the work design for the ride was poor. There is good guidance available from Safe Work Australia on good work design, so there's no excuse for this kind of lapse.

And then the coroner goes on to say, reinforcing the point that the ride couldn't be safely controlled by a human operator: "The lack of engineering controls on a ride of this nature is unjustifiable". Again, this reinforces the point that the risk was not SFARP, because not all reasonably practicable controls had been implemented - particularly controls at the higher end of the hierarchy of controls. So, a serious failing there.

(Now, I've got something that I'm going to skip, actually, but - It's a heck of a comment, but it's not relevant to WHS.)

Training And Competence

We're moving on to training and competence. Those responsible for managing the ride, whilst following the process and procedure in place - and I'm glad to see, from a human factors point of view, that the coroner is not just trying to blame the last person who touched it; he's making a point of saying the operators did all the right stuff - were nevertheless largely not qualified to perform the work with which they were charged.

The process and procedures that they were following seem to have been created by unknown persons - unknown because of the poor record-keeping - who, it is safe to assume, lacked the necessary expertise. And I think the coroner is making a reasonable assumption there, given the multiple failings that we've seen in risk management, due diligence, record-keeping, the knowledge of key people, et cetera. It seems that the practice at the park was simply to accept what had always been done in terms of policy and procedure.

And despite changes to safety standards and practices over time - because this is an old ride - only limited and largely reactionary consideration was ever given to making changes, including to the training provided to staff. So, 'reactionary' - bad word. We're supposed to predict risk and prevent harm from happening. So, multiple failures in due diligence here, and in providing adequate staff training, adequate procedures, et cetera.

The coroner goes on to say, "regardless of the training provided at the park, it would never have been sufficient to overcome the poor design of the ride, the lack of automation and engineering controls". So, again, the hierarchy of controls was not applied, and relatively cheap engineering controls were not used, placing an undue burden on the operators. Sadly, this is all too common in many applications. This is one of the reasons I am not naming the ride operators or trying to shame them: I've seen this happen in so many different places that it wouldn't be fair to single these people out.

‘Incident-Free’ Operations?

Now we have a curious little statement in paragraph 1040. The coroner says, "submissions are made that there was a 30-year history of incident-free operation of the ride". So, it looks like the ride operators and management were trying to tell the coroner that they had never had an incident on the ride in 30 years, which sounds pretty impressive at face value, doesn't it?

But of course, the coroner already knew, or discovered later on, that there had been incidents on the ride - two previous incidents very similar to the fatal accident. Now, on the surface, this looks bad, doesn't it? It looks like the ride management was trying to mislead the coroner. I don't think that's the case, because I've seen many organizations do poor incident reporting, poor incident recording, and poor learning from incidents. It doesn't surprise me that senior management was not aware of incidents on their own ride. Unfortunately, it's partly human nature.

Nobody likes to dwell on their failures or think about nasty things happening, and nobody likes to go to the boss saying, "We need to shut down a moneymaking ride" - don't forget, this was a very popular ride - "to spend more money on modifications to make it safer." And then management turns around and says, "Well, nobody's been hurt, so what's the problem?" I've seen this attitude again and again, even among people operating much more sophisticated and much more dangerous equipment than this. So, whilst this does look bad - the optics are not good, as they like to say - I don't think there's a conspiracy going on here. I think it's just ordinary mistakes, because they're so common. Moving on.

Standards

Now the coroner goes on to talk about standards not being followed, particularly when standards get updated over time - bearing in mind this ride was 30 years old. The coroner states, "it is essential that any difference in these standards are recognized and steps taken to ensure any shortfalls with a device manufactured internationally is managed". Now, this is a little bit of an aside because, as I've mentioned before, the TRRR was actually designed and manufactured in Australia, albeit not to any standards that we would recognize these days. But most rides are not, and this highlights the duties of importers. So, if you import something from abroad, you need to make sure that it complies with Australian requirements. That's a duty under WHS law. We'll come back to this in just a moment.
#coursesafetyengineering #duediligence #engineersafety #fatalaccident #ineedsafety #knowledgeofsafety #learnsafety #lessonslearned #needforsafety #riskmanagement #safetyblog #safetydo #safetyengineer #safetyengineerskills #safetyengineertraining #safetyengineeringcourse #safetyprinciples #SFARP #softwaresafety #theneedforsafety #themeparkaccident #thunderriverrapidsride #WHS
Simon Di Nucci https://www.safetyartisan.com/2023/12/06/lessons-learned-from-a-fatal-accident/


Introduction to Human Factors
In this 40-minute video, 'Introduction to Human Factors', I am very pleased to welcome Peter Benda to The Safety Artisan.

Peter is a colleague and Human Factors specialist who has 23 years' experience in applying Human Factors to large projects in all kinds of domains. In this session, we look at some fundamentals: what does Human Factors engineering aim to achieve? Why do it? And what sort of tools and techniques are useful?

This is The Safety Artisan, so we also discuss some real-world examples of how erroneous human actions can contribute to accidents. (See this post for a fuller example of that.) And, of course, how the Human Factors discipline can help to prevent them.

https://youtu.be/FnL4XuLlvoQ
In 'Introduction to Human Factors', Peter explains these vital terms to us!

Topics

- Introducing Peter;

- The Joint Optimization Of Human-Machine Systems;

- So why do it (HF)?

- Introduction to Human Factors;

- Definitions of Human Factors;

- The Long Arm of Human Factors; and

- What is Human Factors Integration?

Introduction to Human Factors: Transcript

Introduction

Simon:  Hello, everyone, and welcome to the Safety Artisan: Home of Safety Engineering Training. I'm Simon, and I'm your host, as always. But today we are going to be joined by a guest: a Human Factors specialist, a colleague, and a friend of mine called Peter Benda. Now, Peter started as one of us, an ordinary engineer, but, unusually for an engineer perhaps, he decided he didn't like engineering without people in it. He liked the social and human aspects, and so he began to specialise in that area. And today, after twenty-three years in the business, with a first degree and a master's degree in engineering with a Human Factors speciality, he's going to join us and share his expertise.

So that's how you got into it then, Peter. For those of us who aren't really familiar with Human Factors, how would you describe it to a beginner?

Peter:   Well, I would say it's The Joint Optimization Of Human-Machine Systems. So it's really focusing on designing systems - perhaps 'holistically' would be a term that could be used - where we're looking at optimizing the human element as well as the machine element, and the interaction between the two. So that's really the key to Human Factors. And, of course, there are many dimensions from there: environmental, organisational, job factors, human and individual characteristics. All of these influence behaviour at work and health and safety. Another way to think about it is the application of scientific information concerning humans to the design of systems for human use - which I think covers most systems.

Simon:  Indeed. Otherwise, why would humans build them?

Peter:   That's right. Generally speaking, sure.

Simon:  So, given that this is a thing that people do, then. Perhaps we're not so good at including the human unless we think about it specifically?

Peter:   I think that's fairly accurate. I would say that if you look across industries, the ones that are better at integrating Human Factors considerations into the design lifecycle have had to do so because of the accidents that have occurred in the past. You could probably say this about safety engineering as well, right?

Simon:  And this is true, yes.

Peter:   In a sense, you do it because you have to, because the implications of not doing it are quite significant. However, I would say the upshot - if you look at some of the evidence, and you see this also across software design and non-safety-critical industries or systems - is that taking human considerations into account early in the design process typically ends up in better system performance. You might have more usable systems, for example. Apple would be an example of a company that puts a lot of focus into human-computer interaction, optimizing the interface between humans and their technologies and ensuring that you can walk up and use them fairly easily. Now, as time goes on, one can argue about how well Apple is doing at that, but they were certainly very well known for taking that approach.

Simon:  And reaped the benefits accordingly and became, I think, they were the world's number one company for a while.

Peter:   That's right. That's right.

Simon:  So, thinking about the “So why do it?” What is one of the benefits of doing Human Factors well?

Peter:   Multiple benefits, I would say. Clearly, safety - in safety-critical systems and in health and safety generally - plus system performance and efficiency, and so forth. Also job satisfaction, and that has repercussions that go back into, broadly speaking, society. Meaningful work has other repercussions, and that's the angle I originally came into all of this from. But, you know, you could be looking at just the safety and efficiency aspects.

Simon:  You mentioned meaningful work: is that what attracted you to it?

Peter:   Absolutely. Absolutely. Yes, as I said, I had a keen interest in the sociology of work and work organisation. Then, for my master's degree, I looked at lean production, which is the Toyota approach to producing vehicles. I looked at multiskilled teams and multiskilling, job satisfaction, and stress indicators and so forth, versus mass production systems. So that's really the angle I came into this from. If you look at mass production lines, where a person is doing the same job over and over, the work is quite repetitive and very narrow, versus the more Japanese-style lean production. There are certainly repercussions, both socially and individually, from a psychological health perspective.

Simon:  So, you get happy workers and more contented workers -

Peter:   – And better quality, yeah.

Simon:  And again, you mentioned Toyota. Another giant company that's presumably grown partly through applying these principles.

Peter:   Well, they’re famous for quality, aren't they? Famous for reliable, high-quality cars that go on forever. I mean, when I moved from Canada to Australia, Toyota had a very, very strong history here with the Land Cruiser, and the Hilux, and so forth.

Simon:  All very well-known brands here. Household names.

Peter: They are known to be bombproof and can outlast any other vehicle. And the lean production system certainly has, I would say, quite a bit of responsibility for the production of these high-quality cars.

Simon:  So, we've spoken about how you got into it, and "What is it?" and "Why do it?" - what it is in very general terms, at least. But I suspect a lot of people listening will want to know what Human Factors is based on doing it, on how you do it. It's a long, long time since I did my Human Factors training - just one module in my master's - so could you take me through what Human Factors involves these days, in broad terms?

Peter:   Sure, I actually have a few slides that might be useful –  

Simon:  – Oh, terrific! –

Peter:   – Maybe I should present that. So, let me see how well I can share this. Sometimes that's the problem - maybe screen two is the best way to share it. Can you see that OK?

Simon:  Yeah, that's great...

(See the video for the full content)

Introduction to Human Factors: Leave a Comment!
#coursesafetyengineering #engineersafety #HF #humanfactors #humanmachinesystems #ineedsafety #jointoptimization #knowledgeofsafety #learnsafety #needforsafety #safetyblog #safetydo #safetyengineer #safetyengineerskills #safetyengineertraining #safetyengineeringcourse #safetyprinciples #safetytraining #softwaresafety #theneedforsafety
Simon Di Nucci https://www.safetyartisan.com/2023/08/02/introduction-to-human-factors/

Sunday, December 28, 2025



The 2024 Blog Digest - Q3/Q4
The 2024 Blog Digest - Q3/Q4 brings you all of The Safety Artisan's blog posts from the second half of this year. I hope that you find this a useful resource!

The 2024 Blog Digest - Q3/Q4: 18 Posts!

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience. I have:

- Worked on aircraft, ships, submarines, ATMS, trains, and software;

- Worked on everything from tiny programs to some of the biggest (Eurofighter, Future Submarine);

- Worked in the UK and Australia, on US and European programs;

- Taught safety to hundreds of people in the classroom, and thousands online; and

- Presented on safety topics at several international conferences.

Hi, everyone, and welcome to The Safety Artisan. I’m Simon, and I just wanted to share with you briefly why I started this enterprise. I’ve had a career in safety, engineering, and safety consulting for over 25 years now. And in that time, I’ve seen customers make one of two mistakes quite often. First of all, I’ve seen customers not do some things that they should have been doing. This was usually because they were just ignorant of what their legal obligations were.

And I guess that’s a fairly obvious mistake. That’s what you would expect me to say. But more often, I’ve seen customers do too much to try and achieve safety, which is surprising! I’ve seen people waste a lot of time, energy, and money doing things that just didn’t make a difference. Sometimes it actually got in the way of doing good safety work.

And I think the reasons for those mistakes are, first of all, ignorance.

Secondly, not knowing precisely what safety is and therefore not being able to work out how to get there. That’s why I started The Safety Artisan. I wanted to equip people with the knowledge of what safety really is and the tools to get there efficiently. To neither do too much nor too little. We want Safety, Just Right.
#coursesafetyengineering #ineedsafety #knowledgeofsafety #learnsafety #safetyblog #safetydo #safetyengineer #safetyengineertraining #safetyengineeringcourse #safetyprinciples
Simon Di Nucci https://www.safetyartisan.com/2024/12/26/the-2024-blog-digest-q3-q4/


System Hazard Analysis with Mil-Std-882E
In this 45-minute session, I look at System Hazard Analysis with Mil-Std-882E. SHA is Task 205 in the Standard. I explore Task 205's aim, description, scope, and contracting requirements.

I also provide commentary, based on working with this Standard since 1996, which explains SHA: how to use it to complement Sub-System Hazard Analysis (SSHA, Task 204), and how to get the maximum benefit from your System Safety Program.

Using Task 205 effectively is not just a matter of applying it in number order with the other Tasks. We need to use it within the Systems Engineering framework. That means using it top-down, to set requirements, and bottom-up to verify that they are met.

https://youtu.be/F70fhSGsyLk
This is the seven-minute-long demo. The full video is 47 minutes long.

Get the course 'System Hazard Analysis': click here

System Hazard Analysis: Topics

- Task 205 Purpose;

- Verify subsystem compliance;

- ID hazards (subsystem interfaces and faults);

- ID hazards (integrated system design); and

- Recommend necessary actions.

- Task Description (five slides);

- Reporting;

- Contracting; and

- Commentary.

Transcript: System Hazard Analysis with Mil-Std-882E

Introduction

Hello, everyone, and welcome to the Safety Artisan, where you will find professional, pragmatic, and impartial safety training resources and videos. I’m Simon, your host, and I’m recording this on the 13th of April 2020. And given the circumstances when I record this, I hope this finds you all well.

System Hazard Analysis Task 205

Let's get on to our topic for today, which is System Hazard Analysis. Now, system hazard analysis is, as you may know, Task 205 in the Mil-Std-882E system safety standard.

Topics for this Session

What we're going to cover in this session is purpose, task description, reporting, contracting, and some commentary – although I'll be making commentary all the way through. Going back to the top: with this task (and with Task 204), I'm using yellow highlighting to indicate differences between 205 and 204, because they are superficially quite similar. And I'm using underlining to emphasize the things that I want to bring to your attention.

Within Task 205, Purpose. We've got four purpose slides for this one. Verify subsystem compliance and recommend necessary actions – that's the fourth one there. And in the middle of the sandwich, we've got the identification of hazards, both between the subsystem interfaces – and faults from the subsystem propagating upwards to the overall system – and identifying hazards in the integrated system design. So, quite a different emphasis from 204, which was thinking about subsystems in isolation. We've got five slides of task description, a couple on reporting, one on contracting – nothing new there – and several commentaries.

System Requirements Hazard Analysis (T205)

Let's get straight on with it. The purpose, as we've already said, is three-fold: verify system compliance, identify hazards, and recommend actions. Then, as we can see in the yellow, identifying previously unidentified hazards is split into two: looking at subsystem interfaces and faults, and at the integration of the overall system design. And you can see from the yellow bit that this is different from 204: we are taking a much higher-level view, taking an inter-subsystem view and then an integrated view.

Task Description (T205) #1

On to the task description. The contractor has got to do it and document it, as usual, looking at hazards and mitigations, or controls, in the integrated system design, including software and the human interface. We will come on to that later.

All the usual stuff: we've got to include COTS, GOTS, GFE, and NDI. So, even if stuff is not being developed – if we're putting together a jigsaw system from existing pieces – we've still got to look at the overall thing. And as with 204, we go down to the underlined text at the bottom of the slide: areas to consider. Think about performance and degradation of performance, functional failures, timing and design errors, defects, and inadvertent functioning – that classic functional failure analysis that we've seen before.

Again, while conducting this analysis, we've got to include human beings as an integral component of the system, receiving inputs and initiating outputs. Human factors have been included in this standard since long ago...

The End

You can see all the Mil-Std-882E Analysis Tasks here.

Get a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Learn safety engineering with me, an industry professional with 25 years of experience. I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.
Simon Di Nucci https://www.safetyartisan.com/2025/06/30/equipped-system-hazard-analysis/


Failure Mode Effects Analysis
TL;DR This article on Failure Mode and Effects Analysis explains this powerful and commonly used family of techniques. You can access this webinar (and all the others) here.

I have used FMEA and related techniques on many programs, and it can produce powerful results quickly and cheaply. Recently, I've seen some criticism of FMEA on social media. However, I'm convinced that this is only clickbait. The secret of success is to understand what a technique is good for - and not - and to apply it well. It's as simple as that!

This article covers:

- A description of the technique, including its purpose;

- When it might be used;

- Advantages, disadvantages and limitations;

- Sources of additional information;

- A simple example of an FMEA/FMECA; and

- Additional comments.

I’ve added some ‘top tips’ of my own, based on my personal experience in the industry.

In this article, I have used material from a UK Ministry of Defence guide, reproduced under the terms of the UK’s Open Government Licence. I have rewritten the very dull source material to make it more readable!

A Description of the Technique, Including Its Purpose

Failure modes and effects analysis (FMEA) was one of the first systematic techniques for failure analysis. It was developed in the United States military (Military Procedure MIL-P-1629, titled ‘Procedures for Performing a Failure Modes, Effects and Criticality Analysis’, November 9, 1949) as a reliability evaluation technique to determine the effect of system and equipment failures. Failures were classified according to their impact on mission success and personnel, equipment, and safety. In the 1960s, it was used by the aerospace industry and NASA during the Apollo program. More and more industries - notably the automotive industry - have seen the benefits to be gained by using FMEAs to complement their design processes.

This qualitative technique helps identify failure potential in a design or process i.e., to foresee failure before it actually happens. This is done by defining the system that is under consideration to ensure system boundaries are established, and then by following a procedure, which helps to identify design features or process operations that could fail. The procedure requires the following essential questions to be asked:

- How can each component fail?

- What might cause these modes of failure?

- What could the effects be if these failures did occur?

- How serious are these failure modes?

- How is each failure mode detected?

- What are the safeguards in place to protect against accidents resulting from the failure mode?

Top Tip: As always with safety analyses, the more precisely you can answer these questions (above), the better the results you will get.

As an aid in structuring the analysis and ensuring a systematic approach, results are recorded in a tabular format. Several different forms are in use, and the form design can be tailor-made to suit the particular requirements of a study. Examples of forms can be found in several standards (links below).

Top Tip: Make the form support the flow of the process, left-to-right, then top-down!
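To make the tabular recording concrete, here is a minimal sketch in Python of one row of such a worksheet. The column names are invented for illustration, not taken from any particular standard form; tailor them to your own study. They follow the left-to-right flow of the essential questions above:

```python
from dataclasses import dataclass, asdict

@dataclass
class FmeaRow:
    """One row of a simple FMEA worksheet (illustrative field names only)."""
    item: str           # component or function under analysis
    failure_mode: str   # how can it fail?
    cause: str          # what might cause this failure mode?
    local_effect: str   # effect at the component level
    system_effect: str  # effect on the whole system
    detection: str      # how is the failure mode detected?
    safeguards: str     # existing protections against the failure

def worksheet_to_rows(worksheet):
    """Flatten FmeaRow records into plain dicts, e.g. for export to a spreadsheet."""
    return [asdict(row) for row in worksheet]

# A one-row example, loosely in the spirit of a ballast-system study:
sheet = [FmeaRow(
    item="Sea inlet valve",
    failure_mode="Fails to close",
    cause="Actuator fault or debris in valve seat",
    local_effect="Valve stuck open",
    system_effect="Cannot isolate sea chest; possible flooding path",
    detection="Position indicator disagrees with demanded state",
    safeguards="Second isolation valve in series",
)]
```

A real study would add columns (criticality, recommended actions, and so on) to suit its chosen standard, but the principle of one structured record per failure mode is the same.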

The FMEA analysis can be extended, if necessary, by characterizing the likelihood, severity, and resulting levels of risk of failures. FMEAs that incorporate this criticality analysis (CA) are known as FMECAs. An FMECA is an analytical, quantitative technique that ranks failure modes according to their probability and consequences (i.e., the resulting effect of the failure mode on the system, mission, and personnel). This technique is referred to as a “bottom-up approach” as it starts by identifying the potential failure modes of a component and analyzing their effects on the whole system. It can be quite complex, depending on how the user drives the technique.
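As an illustration of the criticality extension, the sketch below ranks failure modes by a simple severity-times-probability score. The scales and scoring scheme here are invented for the example; a real FMECA should use the classes defined in its chosen standard:

```python
# Illustrative criticality ranking for an FMECA.
# The severity and probability scales below are invented for this example.
SEVERITY = {"negligible": 1, "marginal": 2, "critical": 3, "catastrophic": 4}
PROBABILITY = {"improbable": 1, "remote": 2, "occasional": 3, "probable": 4, "frequent": 5}

def criticality(severity: str, probability: str) -> int:
    """A simple severity x probability score; higher means more critical."""
    return SEVERITY[severity] * PROBABILITY[probability]

def rank_failure_modes(modes):
    """Sort (name, severity, probability) tuples, most critical first."""
    return sorted(modes, key=lambda m: criticality(m[1], m[2]), reverse=True)

modes = [
    ("Valve fails to close", "critical", "remote"),
    ("Indicator lamp fails", "negligible", "frequent"),
    ("Pump seal leak", "marginal", "occasional"),
]
ranked = rank_failure_modes(modes)
```

The point is simply that once likelihood and severity are recorded per failure mode, ranking becomes mechanical; the analytical effort goes into assigning the classes honestly.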

We should note that the FMECA does not provide a model by which system reliability can be quantified. Hence, if the objective is to estimate the probability of events, a technique that results in a logic model of the failure mechanisms must be employed, typically a fault tree and/or an event tree.

Top Tip: Reliability Block Diagrams or, for repairable systems, Markov Chains can also be used.

An FMEA or FMECA can be conducted at either a component or a functional level. A component-level FMEA/FMECA only covers hardware aspects, but a functional FMEA/FMECA can cover all aspects of a system. For either approach, the general principle remains the same.

When it Might be Used

FMEA is applicable for any well-defined system, but is primarily used for reviews of mechanical and electrical systems. It can be used in many situations, for example, to assess the design of a product in terms of what could go wrong in manufacturing and in-service as a result of the weakness in the design. We can also use it to analyze failures in the manufacturing process itself and during service. It is effective for collecting information needed to troubleshoot system problems and improving maintenance and reliability of plant and equipment (defining and optimizing), as it focuses directly and individually on equipment failure modes.

Top Tip: It's fair to say that you need a design on which to perform an FMEA. Pre-design, you could use Functional Failure Analysis (FFA) instead.

The FMECA technique is best suited for detailed analysis of system hardware, and should preferably be carried out by the designer in parallel with system development. This will not only speed up the analysis itself, but also force the design team to think systematically about the failure characteristics of the system. The primary use of the FMECA is in verifying that single-component failures cannot cause a catastrophic system failure.

There are a number of areas today in which the use of FMECA has become mandatory to demonstrate system reliability. Examples of such requirements are in the classification of Dynamically Positioned (DP) vessels and in a number of US military applications for which MIL-STD documents apply.

Advantages, Disadvantages, and Limitations

Advantages

- It is widely used and well-understood, and easy to understand and interpret

- It can be performed by a single analyst or more if required

- Qualitative data about the causes and effects can be incorporated into the analysis

- It is systematic and comprehensive, and should identify hazards with an electrical or mechanical basis

- The level of detail incorporated can be varied to suit the analysis

- It identifies safety-critical equipment where a single failure would be critical for the system

- Even though the technique can be quite time-consuming, it can lead to a thorough understanding of the system being considered

Disadvantages

- The technique adopts a bottom-up approach, and if conducting a component-level FMEA or FMECA, this can be boring and repetitive

- The benefit gained is dependent upon the experience of the analyst or the group.

- It requires a hierarchical system drawing as the basis for the analysis, which the analyst usually has to develop before the FMEA process can start

- It is optimised for mechanical and electrical equipment, and does not apply easily to Human Factors Integration, procedures, or process equipment

- It is difficult for the technique to cover multiple failures, as equipment failures are generally analysed one by one; therefore, important combinations of equipment failures may be overlooked

- Most accidents have a significant human or external influence contribution, and these do not appear as normal failure modes in an FMEA

- More than one FMEA may be required for a system with multiple modes of operation

- Due to its wide use, there can be a temptation to read across data from ARM or ILS projects where, for example, the fault-tree technique has been used. As a consequence, the safety perspective can be lost as human error has been excluded and the focus has been solely on determining faults and not on more far-reaching safety issues

- Perhaps the worst drawback of the technique is that all component failures are examined and documented, including those that do not have any significant consequences.

- For large systems, especially those with a fair degree of redundancy built into them, the amount of unnecessary documentation is a major disadvantage. Hence, the FMECA should primarily be used by designers of reasonably simple systems. It should, however, be noted that the concept of the FMECA form can be quite useful in other contexts, e.g., when reviewing an operation rather than a hardware system. Then the use of a form similar to the FMECA can provide a useful way of documenting the analysis. Suitable columns in the form could, for example, include: operation, deviation, consequence, correcting or reversing action, etc.

Top Tip: ARM = Availability, Reliability, Maintainability; ILS = Integrated Logistic Support (or logistics engineering).

Sources of Additional Information, such as Standards, Textbooks, and Websites

BS 5760: Part 5 Reliability of Systems, Equipment and Components: Part 5 Guide to Failure Modes, Effects and Criticality Analysis.

HSE Website - Marine Risk Assessment, Offshore Technology Report 2001/063

IEC 60812:2018 Failure modes and effects analysis (FMEA and FMECA)

Top Tip: As always, understand your Standard (what it was designed to do) to get the best out of it!

A Simple Example of an FMEA/FMECA

An example extract from an FMEA of a ballast system is shown below. This can be found in the HSE Marine Risk Assessment Report. The column headings are based on the US Military Standard Mil-Std 1629A, but with modifications to suit the particular application. For example, the failure mode and cause columns are combined. The criticality of each failure is ranked as minor, incipient, degraded, or critical.

An example of an FMEA Output Table

Top Tip: To properly understand these results, you need to know how a Sea Chest works (see context here). Otherwise, the example just shows what kind of output an FMEA can produce.

Additional comments

Failure Modes and Effects and Criticality Analysis (FMECA) is an analytical QRA technique, used by ARM and ILS systems engineers, most commonly and effectively at the late design, test, and manufacture stage of a project. It requires the breakdown of the system into individual components and the identification of possible failure modes or malfunctions of each component (such as too much flow through a valve). Referred to as a bottom-up approach, it starts by identifying the potential failure modes of a component and analyzing their potential effects on the whole system. Numerical levels can be assigned to the likelihood of the failure and the severity or consequence of the failure.

Note: It is important to recognize that FMEA/FMECA Standards have different approaches to criticality. Failure mode severity classes 1–5 for MIL-STD-1629A and ARP926A go from Class 1 being the most severe (e.g., loss of life) to Class 5 being the least severe (i.e., no effect), whereas BS 5760 deals with criticality in the opposite direction, where Class 5 is the most severe.
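To avoid mixing the two conventions up when combining data sources, the mapping between them is trivial. A sketch, assuming the five-class scales described above:

```python
# Convert between the two five-class severity conventions described above:
# MIL-STD-1629A / ARP926A: class 1 is the most severe, class 5 the least;
# BS 5760: class 5 is the most severe, class 1 the least.

def mil_to_bs(mil_class: int) -> int:
    """Map a MIL-STD-1629A severity class (1..5) to the BS 5760 convention."""
    if not 1 <= mil_class <= 5:
        raise ValueError("severity class must be in 1..5")
    return 6 - mil_class
```

The same function also converts in the other direction, since the mapping is its own inverse.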

Top Tip: Note that FMECA for ARM/ILS looks at availability or mission criticality, not safety criticality. An FMECA for safety will have a different focus.

Software:

- Isograph;

- Reliability Work Bench;

- Reliasoft;

- Microsoft Excel.

These are not recommendations!

Top Tip: FMEA/FMECA tables for complex systems can run to hundreds of pages, so good tool support is essential.

Failure Mode Effects Analysis: Have You Used This Technique?

Back to the Safety Assessment topic page.

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience. I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.
Simon Di Nucci https://www.safetyartisan.com/2024/04/24/failure-mode-effects-analysis/


How to do Preliminary Hazard Analysis with Mil-Std-882E
In this 45-minute session, I look at how to do a Preliminary Hazard Analysis with Mil-Std-882E. Preliminary Hazard Analysis, or PHA, is Task 202 in the Standard.

I explore Task 202's aim, description, scope, and contracting requirements. There's value-adding commentary, and I explain the issues with PHA - how to do it well and avoid the pitfalls.

Now, I have worked in System Safety since 1996, and I think that PHA is one of the three tasks that EVERY project needs to do. The other two are:

- Preliminary Hazard Identification (PHI, Task 201); and

- System Requirements Hazard Analysis (SRHA, Task 203).

I look at these three tasks together in my course 'Foundations of Safety Assessment'. This is one of five linked courses on Mil-Std-882E. They will teach you how to get the maximum benefits from your System Safety Program.

https://youtu.be/gzKcj2to3uU
This is the seven-minute-long demo video. The full video is 45 minutes long.

Get the video as part of the course: 'Foundations of Safety Assessment'

Topics: How to do Preliminary Hazard Analysis

- Task 202 Purpose;

- Task Description;

- Recording & Scope;

- Risk Assessment (Tables I, II & III);

- Risk Mitigation (order of preference);

- Contracting; and

- Commentary.

Transcript: How to do Preliminary Hazard Analysis

Hello and welcome to the Safety Artisan, where you'll find professional, pragmatic, and impartial safety training resources. So, we’ll get straight on to our session, which is on the 8th of February 2020.

Preliminary Hazard Analysis

Now we're going to talk today about Preliminary Hazard Analysis (PHA). This is Task 202 in Military Standard 882E, which is a system safety engineering standard. It's very widely used, mostly on military equipment, but it does turn up elsewhere. This standard is of wide interest, and Task 202 is the second of the analysis tasks. It's one of the first things that you will do on a system safety program and therefore one of the most informative. This session forms part of a series of lessons that I'm doing on Mil-Std-882E.

Topics for This Session

What are we going to cover in this session? Quite a lot! The purpose of the task, a task description, recording, and scope. How we do risk assessments against Tables 1, 2, and 3. These tables describe severities, likelihoods, and the overall risk matrix.  We will talk about all three, about risk mitigation and using the order of preference for risk mitigation, a little bit of contracting, and then a short commentary from myself. I’m providing commentary all the way through. So, let's crack on.

Task 202 Purpose

The purpose of Task 202, as it says, is to perform and document a preliminary hazard analysis, or PHA for short, to identify hazards, assess the initial risks, and identify potential mitigation measures. We're going to talk about all of that.

Task Description

First, the task description is quite long here. And as you can see, I've highlighted some stuff that I particularly want to talk about.

It says “the contractor”, but it doesn't matter who is doing the analysis; and the customer needs to do something to inform themselves, otherwise they won't understand what they're doing. Whoever does it needs to perform and document a PHA. It's about determining initial risk assessments. There's going to be more detailed work done later. But for now, we're doing an initial risk assessment of identified hazards. And those hazards will be associated with the design or the functions we propose to introduce. That's very important. We don't need a design to do this. We can get in early, when we have user requirements, functional requirements, and that kind of thing.

Doing this work will help us write better requirements for the system. So, we need to evaluate those hazards for severity and probability, based, it says, on the best available data. And of course, early in a program, that's another big issue. We'll talk about that more later. It says to include mishap data as well, if accessible. The American term ‘mishap’ means an accident, but it avoids any suggestion about whether the event was accidental or deliberate. It might be stupidity, deliberate action, or whatever. It's a mishap: an undesirable event.

We look for accessible data from similar systems, legacy systems, and other lessons learned. I've talked about that a little bit in the Task 201 lesson, and there's more on that today under commentary. We need to look at provisions and alternatives - meaning design provisions and design alternatives - to reduce risks, and add mitigation measures to eliminate hazards or, where we can't eliminate them, reduce the associated risk. We need to include all of that. So that's the task description: a good overview of the task and what we need to talk about.
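The severity and probability evaluation mentioned above feeds the Standard's risk matrix. As a sketch, the snippet below encodes Mil-Std-882E's severity categories and probability levels, but the High/Serious/Medium/Low assignments in the scoring rule are illustrative placeholders only; the authoritative matrix is Table III of the Standard itself:

```python
# Severity categories (cf. Table I) and probability levels (cf. Table II)
# of Mil-Std-882E. The risk assignments below are *illustrative* - always
# look up the authoritative risk matrix in Table III of the Standard.

SEVERITY = ["Catastrophic", "Critical", "Marginal", "Negligible"]             # 1..4
PROBABILITY = ["Frequent", "Probable", "Occasional", "Remote", "Improbable"]  # A..E

def risk_level(severity: int, probability: str) -> str:
    """Assess risk from a severity category (1-4) and probability level (A-E)."""
    if not 1 <= severity <= len(SEVERITY):
        raise ValueError("severity category must be in 1..4")
    score = severity + "ABCDE".index(probability)  # 1 (worst) .. 8 (best)
    if score <= 2:
        return "High"
    if score <= 4:
        return "Serious"
    if score <= 6:
        return "Medium"
    return "Low"
```

The value of tabulating it like this is consistency: every hazard in the tracking system gets assessed against the same two scales, so initial risks are comparable across the program.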

Recording & Scope

First, recording and scope. As always with these tasks, we've got to document the results of the PHA in a hazard tracking system. Now, a word on terminology: we might call it a hazard tracking system, a hazard log, or a risk register. It doesn't matter what it's called. The key point is that it's a tracking system. It's a live document, as people say - a spreadsheet or a database, something like that. It's something relatively easy to update and change.

Also, we can track changes through the safety program once we do more analysis because things will change. We should expect to get some results and refine them and change them as time goes on. Very important point.
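A hazard tracking system really can start out that simple. Here's a minimal sketch (with invented field names) of a 'live' hazard log that records a history of changes as the analysis is refined:

```python
import datetime

class HazardLog:
    """A minimal hazard tracking system: current entries plus a change history."""

    def __init__(self):
        self.entries = {}   # hazard id -> current record
        self.history = []   # (timestamp, hazard id, field, old value, new value)

    def add(self, hazard_id, description, initial_risk):
        """Record a newly identified hazard with its initial risk assessment."""
        self.entries[hazard_id] = {
            "description": description,
            "risk": initial_risk,
            "status": "open",
        }

    def update(self, hazard_id, field, new_value):
        """Change one field, keeping the old value so changes are traceable."""
        old = self.entries[hazard_id][field]
        self.entries[hazard_id][field] = new_value
        self.history.append(
            (datetime.datetime.now().isoformat(), hazard_id, field, old, new_value)
        )

log = HazardLog()
log.add("HAZ-001", "Raft collision in unload area", "High")
log.update("HAZ-001", "risk", "Medium")  # refined after more detailed analysis
```

The change history is the point: later, more detailed analyses will revise the initial assessments, and you want an audit trail of what changed and when.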

That's it for the Demo...

End: How to do Preliminary Hazard Analysis

Learn safety engineering with me, an industry professional with 25 years of experience. I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.

You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.
Simon Di Nucci https://www.safetyartisan.com/2024/03/13/how-to-do-preliminary-hazard-analysis/


Lessons Learned from a Fatal Accident
Lessons Learned: in this 30-minute video, we learn lessons from an accident in 2016 that killed four people on the Thunder River Rapids Ride in Queensland. The coroner's report was issued this year, and we went through the summary of that report. In it, we find failings in WHS Duties, Due Diligence, risk management, and failures to eliminate or minimize risks So Far As is Reasonably Practicable (SFARP). We do not 'name and shame'; rather, we focus on where we can find guidance to do better.

https://youtu.be/QaSoFld7W0g
In 2016, four people died on the Thunder River Rapids Ride.

Lessons Learned: Key Points

We examine multiple failings in:

- WHS Duties;

- WHS Due Diligence;

- Risk management; and

- Eliminating or minimizing risks So Far As is Reasonably Practicable (SFARP).

Transcript: Lessons Learned from a Theme Park Tragedy

Introduction

Hello, everyone, and welcome to the Safety Artisan: purveyors of fine safety engineering training videos and other resources. I'm Simon, your host, and today we're going to be doing something slightly different. There are no PowerPoint slides. Instead, I'm going to be reading from a coroner's report into a well-known accident here in Australia, and we're going to be learning some lessons in the context of WHS (Work Health and Safety) law.

Disclaimer

Now, I'd just like to reassure you before we start that I won't be mentioning the names of the deceased, and I won't be sharing any images of them. I'm not even going to mention the firm that owned the theme park, because this is not about bashing people when they're down. It's about us, as a community, learning lessons when things go wrong: fix the problem, not the blame. That's what I'd like to emphasize here.

The Coroner's Report

So, I'm just turning to the summary of the coroner's report. The coroner was examining the deaths of four people back in 2016 on what was called the Thunder River Rapids Ride - TRRR, or TR3, for short, because it's a bit of a mouthful. This was a water ride, as the name implies, and what went wrong was that the water level dropped. One of the circular rafts that go down the rapids went down the chute and got stuck. Another raft came up behind the stuck raft and ran into it, and one of the rafts tipped over. These rafts seat six people in a circular configuration. You may have seen them - different versions of this ride are in lots of theme parks.

But out of the six people on the raft, tragically, only two escaped; the other four were killed. So that's the background. That happened in October 2016, I think it was. The coroner's report came out a few months ago, and I've been wanting to talk about it for some time because it illustrates very well several issues where WHS can help us do the right thing.

WHS Duties

So, first of all, looking at the first paragraph in the summary, the coroner starts off: the design and construction of the TRRR at the conveyor and unload area posed a significant risk to the health and safety of patrons. Notice that the coroner says the design and construction. Most people think that WHS only applies to workplaces and people managing workplaces, but it does a lot more than that. Sections 22 through 26 of the Act set out the duties of designers, manufacturers, importers, suppliers, and then people who commission, install, et cetera.

So, WHS imposes duties on a wide range of businesses and undertakings, and designers and constructors are key; here, there were two of them. Now, it's worth noting that there was no importer in this case. Although the TRRR was similar to a ride available commercially elsewhere, for some reason the theme park chose to design and build their own version in Queensland. I don't know why; anyway, that doesn't matter now. So, there was no importer, but otherwise, even if you didn't design and construct the thing, if you imported it, the same duties would still apply to you.

No Effective Risk Assessment

So, the coroner then goes on to talk about risks and hazards, and says each of these obvious hazards posed a risk to the safety of patrons on the ride and would have been easily identifiable to a competent person had one ever been commissioned to conduct a risk and hazard assessment of the ride. What the coroner is saying there is, “No effective risk assessment has been done”. Now, that is contrary to the risk management code of practice under WHS; and, of course, the definition of SFARP - so far as reasonably practicable - is basically a risk assessment or risk management process. So, if you've not done effective risk management, you can't say that you've eliminated or minimized risks SFARP, which is another legal requirement. A double whammy there.

Then, moving on: “Had notice been taken of lessons learned from the preceding incidents, which were all of a very similar nature …”, and then he goes on. That's the back end of a sentence in which he says: you didn't do this; you had incidents on the ride that were very similar in the past, and you didn't learn from them. And again, concerning reducing risks SFARP: Section 18 of the WHS Act, which defines 'reasonably practicable' - the core of SFARP - talks about what ought to have been known at the time.

So, when you're doing a risk assessment, or reassessing risk after a modification - and this ride was heavily modified several times - or after an incident, you need to take account of the available information. The owners and operators of the TRRR didn't do that. So, another big failing.

The coroner goes on to note that records available concerning the modifications to the ride are scant and ad hoc. And again, there's a section in the WHS risk management code of practice about keeping records. It's not that onerous - the code of practice is pretty simple - but they didn't meet its requirements. So, bad news again.

Due Diligence

And then, finally, I've got to the bottom of page one. The coroner notes that the maintenance tasks undertaken on the ride, whilst done regularly and diligently by the staff, seemed to have been based upon historical checklists which were rarely reviewed, despite the age of the device or changes to the applicable Australian standards. Now, this is interesting, because it contravenes a different section of the WHS Act.

Section 27 talks about the duties of officers - effectively, company directors and senior managers. Officers are supposed to exercise due diligence. In the Act, due diligence is fairly simple - it's six bullet points - but one of them is that officers have to keep up to date on what's going on in their operation. They have to provide up-to-date and effective safety information for their staff. They're also supposed to keep up with the safety regulations that apply to their operation. So, I reckon that in that one statement from the coroner there are probably three breaches of due diligence, just to start with.

Risk Controls Lacking

We've reached the bottom of page one - let's carry on. The coroner then goes on to talk about risk controls that were, or were not, present, and says, “in accordance with the hierarchy of controls, plant and engineering measures should have been considered as solutions to identified hazards”. So, in the WHS regulations - and it's repeated in the risk code of practice - there's a thing called the hierarchy of controls. It says that some types of risk controls are more effective than others, and therefore they come at the top of the list, whereas others are less effective and should be considered last.

So, at the top of the list is: can you eliminate the hazard? If not, can you substitute the hazardous thing with something less hazardous? Can you put in engineering solutions or controls to control the hazards? Then, at the bottom of the list, come admin procedures for people to follow and, finally, personal protective equipment for workers, for example. We'll talk about this more later, but the top end of the hierarchy had just not been considered - or not effectively, anyway.
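The hierarchy of controls amounts to an ordered check: consider the most effective control types first, and only drop down the list when a control is not reasonably practicable. Purely as an illustration, that logic can be sketched like this:

```python
# The WHS hierarchy of controls, most effective first.
HIERARCHY = [
    "elimination",      # remove the hazard entirely
    "substitution",     # swap in something less hazardous
    "engineering",      # engineered solutions or controls
    "administrative",   # procedures for people to follow
    "ppe",              # personal protective equipment
]

def most_preferred(practicable_controls):
    """Return the highest-ranked control type judged reasonably practicable.

    `practicable_controls` is the set of control types available for the
    hazard; any exclusions higher up the hierarchy must be justified.
    """
    for control in HIERARCHY:
        if control in practicable_controls:
            return control
    return None  # no practicable control identified - escalate!
```

The coroner's point maps directly onto this: the top entries were never seriously considered, so the selection effectively started from the bottom of the list.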

A Predictable Risk

So, the coroner then goes on to say, “rafts coming together on the ride was a well-known risk, highlighted by the incident in 2001 and again in 2004”. Now actually it says 2004, I think that might be a typo. Elsewhere, it says 2014, but certainly, two significant incidents were similar to the accident that killed four people. And it was acknowledged that various corrective measures could be undertaken to, quote, “adequately control the risk of raft collision”.

However, a number of these suggestions were not implemented on the ride. Now, given that a raft collision had demonstrated the ability to kill multiple people on the ride, it's going to be very difficult to justify not implementing those controls. Given the seriousness of the potential risk, to say that a control is feasible, that it is practicable, but then to say “We're not going to do it, it's not reasonable” is going to be very hard to argue. I would suggest it's almost a certainty that not all reasonably practicable controls were implemented, which means the risk was not SFARP, which is a legal requirement.

Further on, we come back to document management, which was poor with no formal risk register in place. So, no evidence of a proper risk assessment. Members of the department did not conduct any holistic risk assessments of rides with the general view that another department was responsible. So, the fact that risk assessment wasn't done - that's a failure. The fact that senior management didn't knock heads together and say “This has to be done. Make it happen”- That's also another failing. That’s a failing of due diligence, I suspect. So, we've got a couple more problems there.

High-Risk Plant

Then, later on, the coroner talks about necessary engineering oversight of high-risk plant not being done. Now, under the WHS Act's definitions, amusement rides count as high-risk plant, presumably because of the number of serious accidents that have happened with them over the years. The managers of the TRRR didn't meet their obligations concerning high-risk plant. Some things that are optional for common plant are mandatory for high-risk plant, and it seems those obligations were not met.

And then, in just the next paragraph, the coroner reinforces this due diligence issue: only a scant amount of knowledge was held by those in management positions, including the general manager of engineering, as to the design modifications and past notable incidents on the ride. One of the requirements of due diligence is that senior management must know their operations, and know the hazards and risks associated with those operations. So, for the engineering manager to be ignorant of modifications and risks associated with the ride is, I think, a clear failure of due diligence.

Still talking about engineering, the coroner notes “it is significant that the general manager had no knowledge of past incidents involving rafts coming together on the ride”. Again, due diligence. If things have happened, they need to be investigated and learned from, and then fresh controls applied if required. This is a requirement, so this shows a lack of due diligence. It's also a requirement in the risk management Code of Practice to revisit the risk assessment when new knowledge is gained. So, a couple more failures there.

No Water-Level Detection, Alarm, Or Emergency Stop

Now, the report says that the operators of the ride were well aware that when one pump failed (there were two), the ride was no longer able to operate, with the water level dropping dramatically and stranding the rafts on the steel support railings. And of course, that's how the accident happened. Regardless, there was no formal means by which to monitor the water level of the ride, and no audible alarm to advise that one of the pumps had ceased to operate. So, a water level monitor? Well, we're talking potentially about a float, which is a pretty simple thing. There's one in every cistern, in every toilet in Australia. Maybe the one for the ride would have to be a bit more sophisticated than that, a bit more industrial grade, but it's the same principle.

And there was no alarm to advise the operators that a pump had failed, even though it was known that this would have a serious effect on the operation of the ride. So, there are multiple problems here. I suspect you'll be able to find regulations that require these things. Certainly, if you looked at the Code of Practice on plant design, because this counts as industrial plant, and high-risk plant at that, you would expect very high standards of engineering controls, and these were missing. More on that later.

In a similar vein, the coroner says “a basic automated detection system for the water level would have been inexpensive and may have prevented the incident from occurring”. So basically, the coroner is saying this control mechanism would have been cheap, so it's certainly reasonably practicable. If you've got a cheap control that will prevent a serious injury or a death, then how on earth are you going to argue that it's not reasonable to implement it? The onus is on us to implement all reasonably practicable controls.

And then similarly, the lack of a single emergency stop on the ride, which was capable of initiating a complete shutdown of all the mechanisms, was also inadequate. And that's another requirement from the code of practice on plant design, which refers back to WHS regulations. So, another breach there.
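To make the coroner's point concrete, here is a minimal sketch, in Python, of the kind of detection-and-shutdown logic being described: monitor the water level, alarm on a pump failure, and trip an emergency stop when the level falls too far. Every name, threshold, and the two-pump arrangement here is an assumption for illustration only; a real ride would implement this in a certified industrial safety controller, not application code.

```python
# Illustrative sketch only. The threshold and the two-pump layout are
# hypothetical; a real installation would use certified safety-rated hardware.

MIN_SAFE_LEVEL_M = 1.2  # hypothetical minimum safe water level, in metres


def evaluate_interlock(water_level_m, pumps_running):
    """Return (alarm, emergency_stop) flags for the given plant state."""
    alarm = False
    emergency_stop = False

    # Any pump failure is announced to the operators immediately.
    if not all(pumps_running):
        alarm = True

    # A low water level both alarms and shuts the whole ride down,
    # since a stranded raft can be struck by the raft behind it.
    if water_level_m < MIN_SAFE_LEVEL_M:
        alarm = True
        emergency_stop = True

    return alarm, emergency_stop


# Example: one of two pumps has failed and the level has dropped.
print(evaluate_interlock(0.9, [True, False]))  # -> (True, True)
```

The point of the sketch is how little logic is involved: a float sensor, a pump status contact, and a few comparisons. That is why the coroner could call such a control "inexpensive" and therefore reasonably practicable.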

Human Factors

We then move on to a section where it talks about operators, operators’ accounts of the incident, and other human factors. I'm probably going to ask my friend Peter Bender, who is a Human Factors specialist, to come and do a session on this and look at this in some more detail, because there are rich pickings in this section and I'm just going to skim the surface here because we haven't got time to do more.

The coroner says “it's clear that these 38 signals and checks to be undertaken by the ride operators was excessive, particularly given that the failure to carry out any one could potentially be a factor which would contribute to a serious incident”. Those 38 signals and checks were distributed between two ride operators, because there was no one operator in control of the whole ride. That's a human factors nightmare for a start, but clearly the work design for the ride was poor. There is good guidance available from Safe Work Australia on good work design, so there's no excuse for this kind of lapse.

The coroner then reinforces this point, saying that the ride couldn't be safely controlled by a human operator and that the lack of engineering controls on a ride of this nature was unjustifiable. Again, this reinforces the point that the risk was not SFARP, because not all reasonably practicable controls had been implemented, particularly controls at the higher end of the hierarchy of controls. So, a serious failing there.

(Now, I've got something that I'm going to skip, actually, but - It's a heck of a comment, but it's not relevant to WHS.)

Training And Competence

We're moving on to training and competence. Those responsible for managing the ride, whilst following the process and procedure in place, were largely not qualified to perform the work for which they were charged. And I'm glad to see, from a human factors point of view, that the coroner is not just trying to blame the last person who touched it; he's making a point of saying the operators did all the right stuff. Nevertheless, they were not qualified for the work.

The process and procedures that they were following seemed to have been created by unknown persons (unknown because of the poor record-keeping) who, it is safe to assume, lacked the necessary expertise. And I think the coroner is making a reasonable assumption there, given the multiple failings that we've seen in risk management, in due diligence, in record-keeping, in the knowledge of key people, et cetera. It seems that the practice at the park was simply to accept what had always been done in terms of policy and procedure.

And despite changes to safety standards and practices over time, because this is an old ride, only limited and largely reactionary consideration was ever given to making changes, including to the training provided to staff. 'Reactionary' is a bad word in safety: we're supposed to predict risk and prevent harm from happening. So, there are multiple failures here, in due diligence, in providing adequate staff training, in providing adequate procedures, et cetera.

The coroner goes on to say, “regardless of the training provided at the park, it would never have been sufficient to overcome the poor design of the ride, the lack of automation and engineering controls”. So, again, the hierarchy of controls was not applied, and relatively cheap engineering controls were not used, placing an undue burden on the operators. Sadly, this is all too common in many applications. This is one of the reasons I am not naming the ride operators or trying to shame them: I've seen this happen in so many different places that it wouldn't be fair to single these people out.

‘Incident-Free’ Operations?

Now we have a curious little statement in paragraph 1040. The coroner says “submissions are made that there was a 30-year history of incident-free operation of the ride”. So, it looks like the ride operators and management were trying to tell the coroner that they had never had an incident on the ride in 30 years, which sounds pretty impressive at face value, doesn't it?

But of course, the coroner already knew or discovered later on that there had been incidents on the ride. Two previous incidents were very similar to the fatal accident. Now, on the surface, this looks bad, doesn't it? It looks like the ride management was trying to mislead the coroner. I don't think that's the case because I've seen many organizations do poor incident reporting, poor incident recording, and poor learning from experience from incidents. It doesn't surprise me that the senior management was not aware of incidents on their ride. Unfortunately, it's partly human nature.

Nobody likes to dwell on their failures or think about nasty things happening, and nobody likes to go to the boss saying we need to shut down a moneymaking ride (and don't forget, this was a very popular ride) to spend money on modifications to make it safer. And then management turns around and says, “Well, nobody's been hurt, so what's the problem?” I've seen this attitude again and again, even among people operating much more sophisticated and much more dangerous equipment than this. So, whilst this does look bad (the optics are not good, as they like to say), I don't think there's a conspiracy going on here. I think these are just common, careless mistakes. Moving on.

Standards

Now the coroner goes on to talk about standards not being followed, particularly when standards get updated over time, bearing in mind this ride was 30 years old. The coroner states “it is essential that any difference in these standards are recognized and steps taken to ensure any shortfalls with a device manufactured internationally is managed”. Now, this is a little bit of an aside, because, as I've mentioned before, the TRRR was actually designed and manufactured in Australia, albeit not to any standards that we would recognize these days. But most rides were not, and this highlights the duties of importers. If you import something from abroad, you need to make sure that it complies with Australian requirements; that's a duty under WHS law. We'll come back to this in just a moment.
#coursesafetyengineering #duediligence #engineersafety #fatalaccident #ineedsafety #knowledgeofsafety #learnsafety #lessonslearned #needforsafety #riskmanagement #safetyblog #safetydo #safetyengineer #safetyengineerskills #safetyengineertraining #safetyengineeringcourse #safetyprinciples #SFARP #softwaresafety #theneedforsafety #themeparkaccident #thunderriverrapidsride #WHS
Simon Di Nucci https://www.safetyartisan.com/2023/12/06/lessons-learned-from-a-fatal-accident/
