Tuesday, December 30, 2025



System Safety Assessment
In this System Safety Assessment course, I will take you through a suite of safety analysis tasks. They are designed to deal with a complex system, but can be simplified (known as 'tailoring'). I start with Preliminary Hazard Identification and work through detailed analyses, each with a different point of view of the system.

System Safety

“The system safety concept calls for a risk management strategy based on identification, analysis of hazards and application of remedial controls using a systems-based approach.”

Harold E. Roland; Brian Moriarty (1990). System Safety Engineering and Management.

System Safety Engineering

Every approach to safety has a context that needs to be understood to get the best results. I have used the Tasks from a system safety engineering standard called Military-Standard-882E, or Mil-Std-882E, for short. This has been around for a long time and is very widely used. It was developed for use on US military systems, but it has found its way, sometimes in disguise, into many other programs around the world.

However, any safety analysis standard can be applied blindly – it is not a substitute for competent decision-making. So, I explain the limitations with each Task and how to overcome them.

Safety Assessment

“A safety assessment is a comprehensive and systematic investigation and analysis of all aspects of risks to health and safety associated with major incidents that may potentially occur in the course of operation of the major hazard facility...”

Guide for Major Hazard Facilities: Safety Assessment, Safe Work Australia, 2012


Simon Di Nucci https://www.safetyartisan.com/safety-analysis/


System Safety FAQ
Introduction

In System Safety FAQs I will deal with the most commonly searched-for online queries. This post is also the basis for the first in a new series of monthly webinars I’m running. I will also be answering your questions: leave them in the comments at the bottom of this post!

What is System Safety?

“System Safety is the application of engineering and management principles, criteria and techniques to achieve acceptable mishap risk within the constraints of operational effectiveness and suitability, time and cost throughout all phases of the system life cycle.”

NASA

This definition from NASA is spot on. System Safety is fundamentally about reducing the risks of mishaps (accidents). The NASA Office of Safety and Mission Assurance website is great for practitioners!

The Systems Engineering 'V' Model

“The system safety concept calls for a risk management strategy based on identification, analysis of hazards and application of remedial controls using a systems-based approach”. Wikipedia

This Wikipedia article reminds us that safety risk management is a subset of risk management in general.  It also brings in the concept of a ‘hazard’, which is typical for ‘system safety’ – see my free lesson on basic risk concepts for more information.

Where Does Safety Start?

Safety is an ‘emergent property’; that is, it comes about by pulling together many different things. Only leaders and managers can deliver these things; it doesn’t work if you try to do it from the bottom up.

“Safety undoubtedly starts at the top. The people leading the organization are the ones most responsible for its safety. It's simple.”

Avatarms.com

I would also say that safety begins at the start of the lifecycle with requirements – see my short video about what System Safety is:

https://youtu.be/hse2M_ZeDzQ

Safe System Approach?

“The Safe System approach adopts a holistic view of the road transport system and the interactions between people, vehicles, and the road environment. It recognises that people will always make mistakes and may have road crashes – but those crashes should not result in death or serious injury.”

Thinkroadsafety.sa.gov.au

This is a great view of a safe system approach, or strategy, from the world of road safety.  Road networks, their commercial and private users, neighbours, regulators, emergency services, etc., form a very complex distributed system.

Why System Safety?

What are the benefits?

“A customised Safety Management System will help you create an environment where all employees are empowered to identify hazards before they become problems, so your business can stay safe without losing focus on growth, profit or innovation.”

Worksafetyhub.com.au

I would add that a systematic approach to safety saves time and money in the long run.

System Safety for The 21st Century

Traditional System Safety has its critics, most famously professors Nancy Leveson and Erik Hollnagel.  They have made various criticisms of system safety – some of which I agree with, and some I most definitely do not.

Leveson has proposed new methods:

- System-Theoretic Accident Model and Processes (STAMP);

- Systems Theoretic Process Analysis (STPA); and

- Causal Analysis using System Theory (CAST) – accident analysis.

Hollnagel has written on a wide variety of safety topics including cognition, organizational robustness, and resilience.  He also coined the terms “Safety I” for traditional safety approaches, and “Safety II” to describe the conceptual approach that he and others have developed.

He designed the Functional Resonance Analysis Method (FRAM). 

“The FRAM is a method to analyse how work activities take place either retrospectively or prospectively. This is done by analysing work activities in order to produce a model or representation of how work is done.”

Functionalresonance.com

I have tried FRAM, and even without any training (which is recommended), I found it tremendously powerful.  FRAM can analyse problems that conventional safety techniques just can’t get to grips with.   

From 'FRAM in a Nutshell' by Mohammad Tishehzan at etn-peter.eu

Others have also introduced the term “Safety III”, but I’m not sure how useful these labels are.  Perhaps we are now on a trajectory of diminishing returns.

System Safety is a Design Parameter

To save us from all this abstract navel-gazing, let’s get back to practical matters.

“Safety-related parameters are control system variables whose incorrect setting immediately increases the risk to the user.”

Machinery101.com

Concrete, specific, practical: I love it! Let’s not forget that we do safety for a reason, and a big part of that is to control the machines that make our modern world. This doesn’t sound very exciting, but automation has enabled huge increases in productivity, wealth, health, quality of life, lifespan and human rights. Let’s remember that during the current hysteria about Artificial Intelligence (actually Machine Learning).

Safety System of Work

“a safe system of work such as safety procedures; information, supervision, instruction and training on the safe use, handling and storage of machinery, structures, substances and other work tasks; personal protective equipment as required; a system to identify hazards, assess and control risks.”

Safework.sa.gov.au

If we think about it, this ties in nicely with the definition of a system used in system safety, e.g.:

“A combination, with defined boundaries, of elements that are used together in a defined operating environment to perform a given task or achieve a specific purpose. The elements may include personnel, procedures, materials, tools, equipment, facilities, services and/or software as appropriate.”

UK Defence Standard 00-56/1

System Safety in Engineering

There are a number of ways that we could answer this (implicit) question.  Here’s one from the Office of The Under Secretary Of Defense For Research And Engineering:

“System safety engineering involves planning, identifying, documenting, and mitigating hazards that contribute to mishaps involving defense systems, platforms, or personnel (military and the public). The system safety practice aids in optimizing the safety of a system.”

Ac.cto.mil

This definition neatly pulls together activities, hazards and accidents, those impacted and the aim of the whole thing.  Phew!


Questions and Comments?

Please leave them below.


Harold E. Roland; Brian Moriarty (1990). System Safety Engineering and Management. John Wiley & Sons. ISBN 0471618160.
Simon Di Nucci https://www.safetyartisan.com/2023/10/07/system-safety-faq/


Guide to Establishing and Running a Project Safety Committee (PSC)
Our second Safety Management Procedure is the Project Safety Committee. Okay, so committees are not the sexiest subject, but we need to get stakeholders together to make things happen!

Project Safety Committee: Introduction

In safety-critical industries such as defense, aerospace, and engineering, maintaining a robust safety management system (SMS) is paramount. A Project Safety Committee (PSC) plays a vital role in overseeing, coordinating, and ensuring safety compliance throughout the lifecycle of equipment and systems. This guide will explore the role, objectives, and procedures of a PSC, as defined in UK Def Stan 00-56, and provide insights on how to structure and run a PSC effectively.

What is a Project Safety Committee (PSC)?

A Safety Committee is defined as:

“A group of stakeholders that exercises, oversees, reviews and endorses safety management and safety engineering activities.”

Def Stan 00-56

Simply put, the PSC is a formal body composed of experts and decision-makers from various disciplines, convened to ensure that safety-related decisions are well-founded, thoroughly vetted, and correctly implemented.

Objectives of a PSC

The key objectives of a PSC are to ensure effective coordination, agreement, and proper response from those with safety responsibilities. Specifically, the PSC achieves the following:

- Coordination of Safety Issues: The PSC acts as a platform where all stakeholders responsible for safety management can ensure coordination on safety issues, eliminating silos.

- Access to Knowledge: It provides decision-makers with access to relevant knowledge and expertise across different domains, including engineering, maintenance, user experience, and risk management.

- Oversight of the Safety Case: The PSC ensures competent oversight of the safety case throughout its development and maintenance.

- Audit Trail: keep detailed meeting records, and establish an audit trail showing that advice was sought and safety decisions were grounded in expertise.

The PSC should facilitate smaller working groups or sub-committees to address specific safety issues when necessary, ensuring that no aspect of the safety management process is overlooked.

Top Tip: In Australia, it is a legal requirement for those with safety responsibilities (Duty Holders) to consult, coordinate and cooperate with others. Other countries may use different terms for similar requirements. The bottom line is that it's a good idea!

Project Safety Committee: Procedure

Membership of the PSC

The effectiveness of a PSC largely depends on its membership, which should include representatives with specific roles and expertise, as appropriate to the project. Typical members might include:

- Delivery Team Representatives (e.g., Project Safety Manager)

- Logistics Support Teams

- Equipment Support Teams

- Customer and User Representatives

- Prime Contractors and Subcontractors

- Design Organization

- Independent Safety Auditor

- Specialist Advisors

- Regulator / Safety Authority

- Safety and Environmental Protection Group

Moreover, it may also include contractors, consultants, and subject matter experts from other government departments or foreign defense bodies.

Top Tip: Don't invite anybody and everybody 'just in case', as this devalues the PSC and its work.

More information on PSC membership has been provided in Annex A - example Terms of Reference for a PSC.

Chair and Quorum

A critical element of any PSC is competent leadership. The PSC Chair must be a safety-competent individual holding formally-delegated authority for the program's safety tasks, typically defined in a Letter of Delegation. This document outlines the chairperson’s responsibilities and authority.

For a PSC to conduct its business, it must be quorate, meaning a minimum number of key members must be present. This quorum usually consists of:

- Delivery Team safety delegation holder

- Project Safety Manager

- Design organization representative

- Customer representative

- Safety Case author

If a quorum is not achieved, the meeting can still proceed, but decisions will only be implemented after receiving approval from the absent quorum members.

Quorum

In order for a PSC to make decisions concerning the safety of a capability or equipment, it should be declared quorate at the beginning of the meeting. In order for a PSC to be declared quorate, the following SQEP and authorized members should be in attendance:

- Delivery Team safety delegation holder

- Project Safety Manager

- Design organization

- Customer representative (Project Sponsor)

- Safety Case author

The quorum for a PSC can be expanded depending on the nature of the project. Details should be provided in the Project Safety Management Plan (SMP) or Terms of Reference.

If a quorum is not achieved, the meeting can still proceed, but decisions will only be implemented after receiving approval from the absent quorum members. 

Top Tip: This is a good point. PSCs don't always meet frequently, and getting some members to attend can be challenging. Nevertheless, it is important to keep moving forwards.
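The quorum rule above can be sketched in a few lines of Python. This is only an illustration: the role names are copied from the list above, while the function name `check_quorum` and the idea of tracking attendance as a set of role strings are assumptions, not part of any standard.

```python
# Quorum roles taken from the list above. A real project would take these
# from its Safety Management Plan or Terms of Reference.
REQUIRED_ROLES = frozenset({
    "Delivery Team safety delegation holder",
    "Project Safety Manager",
    "Design organization",
    "Customer representative",
    "Safety Case author",
})


def check_quorum(attending_roles, required_roles=REQUIRED_ROLES):
    """Return (is_quorate, missing_roles) for a PSC meeting.

    missing_roles lists the absent quorum members whose approval is
    needed before decisions from a non-quorate meeting are implemented.
    """
    missing = sorted(set(required_roles) - set(attending_roles))
    return (not missing, missing)


if __name__ == "__main__":
    present = {"Project Safety Manager", "Design organization"}
    quorate, missing = check_quorum(present)
    print("Quorate:", quorate)
    print("Approval still needed from:", missing)
```

Recording the missing roles in the minutes gives a simple checklist of who must approve each decision before it takes effect.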

Meeting Frequency and Structure

PSC meetings should be scheduled regularly, though the frequency will depend on the project’s complexity and phase. Typically, meetings occur more frequently during the early design and review stages, and less frequently once the system is in service.

For smaller projects, PSC activities can be integrated into broader project meetings, ensuring safety remains a specific agenda item. Larger or more complex projects may require dedicated PSC meetings with support from Working Groups to assess hazards or system integrity.

Working Level Support

Depending on the complexity of the project, one or more working groups may be established that support the PSC by assessing hazards or reviewing the integrity of specific systems. Integrity working groups could consider structure, propulsion or other electrical or mechanical systems, reporting significant issues to the PSC.

Role of the Safety Management Committee (SMC)

For large-scale projects or portfolios, a Safety Management Committee (SMC) may be established to manage multiple PSCs across similar systems. This ensures consistency in safety management policy and strategy across projects. The SMC will oversee the activities of individual PSCs, ensuring adherence to safety management plans (SMPs).

Figure 2.1 shows an example of a Safety Committee structure, together with the management documents that sit at the relevant committee level.

Figure 2.1 - Safety Committee Structure


Figure 2.1 represents an example of a Safety Committee structure, with supporting working groups and hazard reviews in place. Teams can modify the structure of the Safety Committees to suit the specific organization of the program. The emphasis should be on establishing a Safety Committee with suitable chairmanship and Terms of Reference.

Top Tip: The structure shown in Figure 2.1 would be suitable for a large Program managing several important projects. However, it is probably overkill for most projects. With committees, less is sometimes more.

Project Safety Committee Authority and Competence

The chairman of the PSC should hold a Letter of Delegation detailing the authority for carrying out the safety management tasks on that program.

The PSC exists to provide information and specialist advice to those who have specific responsibility for safety management on an acquisition project so that they can reach informed decisions. The Project safety delegation holder should seek and consider relevant advice through the PSC but remain the decision-maker.

While not all members of the PSC need to have specific competence and experience in Safety Management, some committee members must have this competence and must be consulted. In addition to the safety delegation holder, whose competence must be established before their delegation is issued, other members of the PSC who must be safety competent would typically include the Project Safety Manager and the Independent Safety Auditor (if appointed).

As a minimum, the Project Safety Manager should have system safety competence at the practitioner level.  Competence requirements for the safety delegation holder will be defined in a relevant Assignment Specification.

Top Tip: The level of competence needed is driven by many factors - size, complexity, novelty - and this will be discussed under a post on 'Proportionality' (TBD).

Where beneficial, combine committees for safety and environmental management activities. Align programs as far as possible and share data where relevant.

Where there are separate safety and environmental committees, these could meet consecutively over the morning and afternoon. Members and specialists should attend as appropriate to each.

A PSC can cover groups of similar projects within a Delivery Team where common activities are required. Separate committees are better for very large, high-risk or diverse projects within a Delivery Team.

The PSC can operate by meeting regularly as a body, by having its work included as a permanent item in another forum (in this instance, care should be taken that all relevant parties are included), or simply through written communications. This last option is less desirable because there is no opportunity for direct interaction.

Record-Keeping and Documentation

Accurate record-keeping is vital to ensure transparency, accountability, and auditability. PSC meeting minutes should document:

- Attendees

- Key discussions

- Advice and recommendations

- Decisions made

- Agreed actions

These records often feed into larger project documentation, such as the System Requirements Document, Through Life Management Plan, and Safety Management System (SMS).

Review and Agreement of Safety Documents

A key PSC function is reviewing safety documents and advising the safety delegation holder on their suitability. Agreement can be recorded formally via document sign-offs or recommendations in PSC minutes. This process ensures that all safety documentation, including the Safety Case, meets the required standards before formal approval and implementation.

Risks and Pitfalls

Failure to establish or effectively run a PSC can lead to significant risks for a project, including:

- Incomplete stakeholder engagement, leading to safety requirements not being adequately defined.

- Inappropriate safety activities, if the PSC does not review and approve the SMP.

- Infrequent meetings, potentially delaying issue identification and risking project time and cost.

- Lack of clear authority, causing confusion between Enterprise and contractor responsibilities, which could shift accountability from the designers to the PSC.

By mitigating these risks through clear terms of reference, structured meetings, and well-defined roles, the PSC can ensure project safety management remains robust and reliable.

Top Tip: Beware of the PSC delving into detail and doing what is expedient, rather than what is needed. Set appropriate TORs and agendas and stick to them.

If the PSC does not meet with sufficient frequency, then they may not identify in a timely manner any issues with the safety program. This could result in impacts on project time and cost.

If the PSC attempts to control the detailed design solutions, rather than relying on the contractor’s Safety Committee and design function, then Enterprise will take responsibility from the designer. Enterprise staff will be represented on the contractor’s Safety Committee and shall exercise influence at that forum and through setting appropriate requirements.

Project Safety Committee: Timing

Formation

The PSC should be established during the Concept phase of a project by the Customer, or Requirements Manager, through the Capability Working Group, in conjunction with the relevant Project Director, to set out the safety requirements for the equipment.

Top Tip: The PSC has an important role to play in influencing safety requirements. This is not mentioned in 'PSC: Required Outputs', below, but is possibly the PSC's most important contribution.

Meetings

The required frequency of the PSC meetings depends on various factors, including the stage of the project, the complexity of the system, and whether the PSC is supported by Working Groups or has complete responsibility.  Hold meetings at greater frequency during periods of significant review and decision-making, typically when project milestones are approaching.

PSC meetings may occur less frequently during periods of stability, such as during the in-service phase, when fewer safety decisions are necessary. However, the PSC still has an important duty to oversee the Safety Case, ensure that it remains valid, and monitor safety performance. This includes considering whether the system or its usage is changing, and seeking counter-evidence that shows the predicted level of safety performance is not being achieved in practice.

Project Safety Committee: Required Inputs

The procedure may use the following reference inputs, as available:

- Outputs from procedure SMP01 – Safety Initiation;

- Documents to be reviewed, such as:

  - Project Safety Management Plan;

  - Independent Safety Auditor Audit Plan (if appointed);

  - Independent Safety Auditor Audit Report (if appointed);

  - Other Safety Audit Plans (e.g. self or Peer audit);

  - Safety Audit Report;

  - Hazard Log Report;

  - Safety Requirements;

  - Safety Assessment Report;

  - Safety Case Report.

- Acquisition System Guidance Functional Competencies for System Safety Management;

- Records of previous meetings of the Safety Committee.

Project Safety Committee: Required Outputs

The outputs of the procedure will comprise:

- Established Safety Committee membership;

- Defined Terms of Reference for the Safety Committee (see Further Guidance – Example Terms of Reference for Project Safety Committee);

- Records of Safety Committee meetings, including advice given and the actions agreed;

- The advice given by members of the Safety Committee should include recommendations on whether a reviewed document (e.g. Safety Management Plan or Safety Case Report) should be authorized by the Project Director. If authorization is not recommended, then the reasons should be recorded.

Conclusion

The establishment and management of a Project Safety Committee (PSC) are critical to the safe delivery of defense and engineering projects. Through clear objectives, expert membership, and rigorous oversight, the PSC ensures that safety remains at the forefront of project decision-making, thereby protecting both people and assets.

By following this comprehensive guide, organizations can structure their PSCs effectively, aligning with safety standards and regulatory requirements. The PSC is not just a procedural necessity; it is a cornerstone of responsible project management in safety-critical environments.

Annex A

Example Terms of Reference for Project Safety Committee

Terms of Reference for – Project XXXX

Purpose:

To provide a forum for monitoring and coordinating all safety management and risk reduction activities associated with the project to ensure effective levels of safety and provide an appraisal of the Safety Case. The Project Safety Committee reports to the Project Director or in a larger Delivery Team to the Safety Management Committee.

Tasks:

- Set and keep under review the project’s safety policy and strategy;

- Set and keep under review the project’s safety targets and objectives;

- Define the system boundaries for safety responsibility;

- Advise the Chairperson of the Safety Committee on the safety responsibilities of each authority associated with the project;

- Advise the Chairperson of the Safety Committee on the standards, statutory regulations, and any restrictions with which the projects should comply;

- Review, monitor, classify and allocate new equipment hazards as they are identified;

- Carry out reviews of the project’s Safety Case and progress on achieving safety targets, to a predetermined program, issuing the results to the Delegated Authority;

- Agree on any control measures necessary to reduce identified risks to ALARP;

- Ensure proper and timely availability of training and issue of documentation;

- Carry out actions from ISA, regulatory or internal audit findings;

- Operate a system for reviewing and monitoring safety performance and maintain the Safety Case.

Membership:

- Delivery Team responsible for the procurement aspects of the project;

- Customer representative (Capability or Equipment Customer);

- Safety Officer (if appointed);

- Design organization;

- Delivery Team responsible for the support aspects of the project;

- Equipment User;

- Training Authority;

- Maintainer;

- Maintenance Authority;

- Specialist Advisors (as required):

- Defense Safety Regulators;

- Defense Ordnance Safety Group;

- Land Accident Prevention and Investigation Team;

- Military Aviation Accident Investigation Team;

- Serious Equipment Failure Investigation Team;

- Independent Safety Auditor;

- Interfacing Delivery Teams;

- Technical Specialists.

Acknowledgment of Copyright

In this article, I have used material from a UK Ministry of Defence guide, reproduced under the terms of the UK’s Open Government Licence.


Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience. I have:

- Worked on aircraft, ships, submarines, ATMS, trains, and software;

- Worked on programs from the tiny to some of the biggest (Eurofighter, Future Submarine);

- Worked in the UK and Australia, on US and European programs;

- Taught safety to hundreds of people in the classroom, and thousands online;

- Presented on safety topics at several international conferences.
Simon Di Nucci https://www.safetyartisan.com/2024/10/09/project-safety-committee/

Monday, December 29, 2025



Failure Mode Effects Analysis
TL;DR This article on Failure Mode and Effects Analysis explains this powerful and commonly used family of techniques. You can access this webinar (and all the others) here.

I have used FMEA and related techniques on many programs, and they can produce powerful results quickly and cheaply. Recently, I've seen some criticism of FMEA on social media. However, I'm convinced that this is only clickbait. The secret of success is to understand what a technique is good for - and not - and to apply it well. It's as simple as that!

This article covers:

- A description of the technique, including its purpose;

- When it might be used;

- Advantages, disadvantages and limitations;

- Sources of additional information;

- A simple example of an FMEA/FMECA; and

- Additional comments.

Top Tip: I’ve added some ‘top tips’ of my own, based on my personal experience in the industry.

In this article, I have used material from a UK Ministry of Defence guide, reproduced under the terms of the UK’s Open Government Licence. I have rewritten the very dull source material to make it more readable!

A Description of the Technique, Including Its Purpose

Failure modes and effects analysis (FMEA) was one of the first systematic techniques for failure analysis. It was developed by the United States military (Military Procedure MIL-P-1629, titled ‘Procedures for Performing a Failure Modes, Effects and Criticality Analysis’, November 9, 1949) as a reliability evaluation technique to determine the effect of system and equipment failures. Failures were classified according to their impact on mission success and on personnel and equipment safety. In the 1960s, it was used by the aerospace industry and NASA during the Apollo program. More and more industries - notably the automotive industry - have seen the benefits to be gained by using FMEAs to complement their design processes.

This qualitative technique helps identify failure potential in a design or process i.e., to foresee failure before it actually happens. This is done by defining the system that is under consideration to ensure system boundaries are established, and then by following a procedure, which helps to identify design features or process operations that could fail. The procedure requires the following essential questions to be asked:

- How can each component fail?

- What might cause these modes of failure?

- What could the effects be if these failures did occur?

- How serious are these failure modes?

- How is each failure mode detected?

- What are the safeguards in place to protect against accidents resulting from the failure mode?

Top Tip: As always with safety analyses, the more precisely you can answer the questions above, the better the results you will get.

As an aid in structuring the analysis and ensuring a systematic approach, results are recorded in a tabular format. Several different forms are in use, and the form design can be tailor-made to suit the particular requirements of a study. Examples of forms can be found in several standards (links below).

Top Tip: Make the form support the flow of the process, left-to-right, then top-down!
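As a rough sketch of such a form, the Python below models one worksheet row with a column for each of the six questions listed earlier, ordered left-to-right in the order the analyst answers them. The class name `FmeaRow`, the column names, and the example row are all hypothetical. The standards define their own forms, and this is not one of them.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class FmeaRow:
    """One worksheet row; the field order follows the flow of the analysis."""
    component: str      # item under analysis
    failure_mode: str   # how can it fail?
    causes: str         # what might cause this failure mode?
    effects: str        # what could the effects be?
    severity: str       # how serious is it?
    detection: str      # how is the failure mode detected?
    safeguards: str     # what protects against an accident?


def to_csv(rows):
    """Render the rows as CSV text, with columns in field order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(FmeaRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical example row, for illustration only.
    example = FmeaRow(
        component="Coolant pump",
        failure_mode="Fails to start",
        causes="Motor winding open circuit",
        effects="Loss of coolant flow",
        severity="Critical",
        detection="Low-flow alarm",
        safeguards="Standby pump",
    )
    print(to_csv([example]))
```

Tailoring the form to a particular study then amounts to adding or removing fields.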

The FMEA analysis can be extended if necessary by characterizing the likelihood, severity, and resulting levels of risk of failures. FMEAs that incorporate this criticality analysis (CA) are known as FMECAs. A FMECA is an analytical quantitative technique that ranks failure modes according to their probability and consequences (i.e., the resulting effect of the failure mode on the system, mission, and personnel). This technique is referred to as a “bottom-up approach” as it starts by identifying the potential failure modes of a component and analyzing their effects on the whole system. It can be quite complex depending on how the user drives the technique.
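To make the ranking step concrete, here is a minimal Python sketch. It assumes the failure mode criticality number from MIL-STD-1629A, usually written Cm = β × α × λp × t, and a severity scale where category 1 is the worst. The class name, field names, and all numeric values are hypothetical, chosen only to show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int        # assumed scale: 1 = catastrophic ... 4 = minor
    beta: float          # conditional probability that the effect occurs
    alpha: float         # fraction of the part's failures in this mode
    failure_rate: float  # part failure rate, per operating hour
    hours: float         # operating time considered

    @property
    def criticality(self):
        """Failure mode criticality number: beta * alpha * lambda_p * t."""
        return self.beta * self.alpha * self.failure_rate * self.hours


def rank(modes):
    """Sort worst severity first, then highest criticality within a severity."""
    return sorted(modes, key=lambda m: (m.severity, -m.criticality))


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    modes = [
        FailureMode("valve leak", 3, 0.5, 0.7, 1e-5, 1000),
        FailureMode("sensor drift", 1, 0.1, 0.5, 2e-5, 1000),
        FailureMode("valve stuck closed", 1, 1.0, 0.3, 1e-5, 1000),
    ]
    for mode in rank(modes):
        print(f"{mode.name}: severity {mode.severity}, Cm = {mode.criticality:.2e}")
```

Sorting by severity before criticality is a design choice: it keeps a rare but catastrophic mode above a frequent but minor one. A project might equally plot both on a criticality matrix.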

We should note that the FMECA does not provide a model by which system reliability can be quantified. Hence, if the objective is to estimate the probability of events, a technique that results in a logic model of the failure mechanisms must be employed, typically a fault tree and/or an event tree.

Top Tip: Reliability Block Diagrams or, for repairable systems, Markov Chains can also be used.
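For contrast with the FMECA, here is a minimal sketch of the kind of logic model meant here: a two-gate fault tree evaluated in Python. It assumes the basic events are statistically independent, which real systems often violate through common-cause failures. The event names and probabilities are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output occurs only if all independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


if __name__ == "__main__":
    # Hypothetical basic-event probabilities, for illustration only.
    pump_a_fails = 0.01
    pump_b_fails = 0.01
    controller_fails = 0.001

    # Top event: loss of cooling = (pump A AND pump B fail) OR controller fails.
    both_pumps_fail = and_gate([pump_a_fails, pump_b_fails])
    loss_of_cooling = or_gate([both_pumps_fail, controller_fails])
    print(f"P(loss of cooling) = {loss_of_cooling:.3e}")
```

The FMECA supplies the failure modes that feed such a tree, while the tree supplies the combination logic that the FMECA table lacks.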

A FMEA or FMECA can be conducted on either a component or a functional level. A component FMEA/FMECA only covers hardware aspects, but a functional FMEA/FMECA can cover all aspects of a system. For either approach, the general principle remains the same.

When it Might be Used

FMEA is applicable for any well-defined system, but is primarily used for reviews of mechanical and electrical systems. It can be used in many situations, for example, to assess the design of a product in terms of what could go wrong in manufacturing and in-service as a result of the weakness in the design. We can also use it to analyze failures in the manufacturing process itself and during service. It is effective for collecting information needed to troubleshoot system problems and improving maintenance and reliability of plant and equipment (defining and optimizing), as it focuses directly and individually on equipment failure modes.

Top Tip: It's fair to say that you need a design on which to perform a FMEA. Pre-design, you could use Functional Failure Analysis (FFA) instead.

The FMECA technique is best suited for detailed analysis of system hardware, and should preferably be carried out by the designer in parallel with system development. This will not only speed up the analysis itself, but also force the design team to think systematically about the failure characteristics of the system. The primary use of the FMECA is in verifying that single-component failures cannot cause a catastrophic system failure.

There are a number of areas today in which the use of FMECA has become mandatory to demonstrate system reliability. Examples of such requirements are in the classification of Dynamically Positioned (DP) vessels and in a number of US military applications for which MIL-STD documents apply.

Advantages, Disadvantages, and Limitations

Advantages

- It is widely used and well-understood, and easy to understand and interpret

- It can be performed by a single analyst or more if required

- Qualitative data about the causes and effects can be incorporated into the analysis

- It is systematic and comprehensive, and should identify hazards with an electrical or mechanical basis

- The level of detail incorporated can be varied to suit the analysis

- It identifies safety-critical equipment where a single failure would be critical for the system

- Even though the technique can be quite time-consuming it can lead to a thorough understanding of the system being considered

Disadvantages

- The technique adopts a bottom-up approach, and if conducting a component-level FMEA or FMECA, this can be boring and repetitive

- The benefit gained is dependent upon the experience of the analyst or the group.

- It requires a hierarchical system drawing as the basis for the analysis, which the analyst usually has to develop before the FMEA process can start

- It is optimised for mechanical and electrical equipment, and does not apply easily to Human Factor Integration, procedures, or process equipment

- It is difficult for the technique to cover multiple failures, as equipment failures are generally analysed one by one; therefore, important combinations of equipment failures may be overlooked

- Most accidents have a significant human or external influence contribution, and these are not a usual failure mode with FMEA

- More than one FMEA may be required for a system with multiple modes of operation

- Due to its wide use, there can be a temptation to read across data from ARM or ILS projects where, for example, the fault-tree technique has been used. As a consequence, the safety perspective can be lost as human error has been excluded and the focus has been solely on determining faults and not on more far-reaching safety issues

- Perhaps the worst drawback of the technique is that all component failures are examined and documented, including those that do not have any significant consequences.

- For large systems, especially those with a fair degree of redundancy built into them, the amount of unnecessary documentation is a major disadvantage. Hence, the FMECA should primarily be used by designers of reasonably simple systems. It should, however, be noted that the concept of the FMECA form can be quite useful in other contexts, e.g., when reviewing an operation rather than a hardware system. Then the use of a form similar to the FMECA can provide a useful way of documenting the analysis. Suitable columns in the form could, for example, include: operation, deviation, consequence, correcting or reversing action, etc.

Top Tip: ARM = Availability, Reliability, Maintainability. ILS = Integrated Logistic Support (or logistics engineering).

Sources of Additional Information, such as Standards, Textbooks, and Websites

BS 5760: Part 5 Reliability of Systems, Equipment and Components: Part 5 Guide to Failure Modes, Effects and Criticality Analysis.

HSE Website - Marine Risk Assessment, Offshore Technology Report 2001/063

IEC 60812:2018 Failure modes and effects analysis (FMEA and FMECA)

Top Tip: As always, Understand your Standard (what it was designed to do) to get the best out of it!

A Simple Example of an FMEA/FMECA

An example extract from an FMEA of a ballast system is shown below. This can be found in the HSE Marine Risk Assessment Report. The column headings are based on the US Military Standard Mil-Std 1629A, but with modifications to suit the particular application. For example, the failure mode and cause columns are combined. The criticality of each failure is ranked as minor, incipient, degraded, or critical.

An example of an FMEA Output Table

Top Tip: To properly understand these results you need to know how a Sea Chest works (see context here). Otherwise, the example just shows what kind of output a FMEA can produce.

Additional comments

Failure Modes, Effects and Criticality Analysis (FMECA) is an analytical QRA technique, used by ARM and ILS systems engineers, most commonly and effectively at the late design, test and manufacture stage of a project. It requires the breakdown of the system into individual components and the identification of possible failure modes or malfunctions of each component (such as too much flow through a valve). Referred to as a bottom-up approach, it starts by identifying the potential failure modes of a component and analyzing their potential effects on the whole system. Numerical levels can be assigned to the likelihood of the failure and the severity or consequence of the failure.

Note: It is important to recognize that FMEA/FMECA Standards have different approaches to criticality. Failure mode severity classes 1 – 5 for Standards MIL-STD-1629A and ARP926A go from Class 1 being the most severe (e.g. loss of life) to Class 5 being the least severe (i.e. no effect), whereas BS 5760 deals with criticality in the opposite direction, where Class 5 is the most severe.
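When results from different Standards have to be compared, it can help to put the severity classes onto one common scale first. The sketch below is a minimal illustration, assuming only the scale directions described in the note above; the function name and the choice of "5 is worst" as the common scale are assumptions for this example.

```python
# Direction of each severity scale, taken from the note above.
# True means Class 1 is the most severe; False means Class 5 is.
CLASS_1_IS_WORST = {
    "MIL-STD-1629A": True,
    "ARP926A": True,
    "BS 5760": False,
}


def to_common_scale(standard: str, severity_class: int) -> int:
    """Map a class from 1 to 5 onto a common scale where 5 is always the most severe."""
    if not 1 <= severity_class <= 5:
        raise ValueError("severity class must be between 1 and 5")
    if CLASS_1_IS_WORST[standard]:
        return 6 - severity_class  # flip the scale: 1 becomes 5, 5 becomes 1
    return severity_class


print(to_common_scale("MIL-STD-1629A", 1))
print(to_common_scale("BS 5760", 5))
```

Both calls describe the most severe class in their own Standard, so both map to the same value on the common scale.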

Top Tip: Note that FMECA for ARM/ILS looks at availability or mission criticality, not safety criticality. A FMECA for safety will have a different focus.

Software:

- Isograph;

- Reliability Work Bench;

- Reliasoft;

- Microsoft Excel.

These are not recommendations!

Top Tip: FMEA/FMECA tables for complex systems can run to hundreds of pages, so good tool support is essential.

Failure Mode Effects Analysis: Have You Used This Technique?

Back to the Safety Assessment topic page.

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience. I have:

- Worked on aircraft, ships, submarines, ATMS, trains, and software;

- Tiny programs to some of the biggest (Eurofighter, Future Submarine);

- In the UK and Australia, on US and European programs;

- Taught safety to hundreds of people in the classroom, and thousands online;

- Presented on safety topics at several international conferences.
Simon Di Nucci https://www.safetyartisan.com/2024/04/24/failure-mode-effects-analysis/


How to do Preliminary Hazard Analysis with Mil-Std-882E
In this 45-minute session, I look at how to do a Preliminary Hazard Analysis with Mil-Std-882E. Preliminary Hazard Analysis, or PHA, is Task 202 in the Standard.

I explore Task 202's aim, description, scope, and contracting requirements. There's value-adding commentary, and I explain the issues with PHA - how to do it well and avoid the pitfalls.

Now, I have worked in System Safety since 1996, and I think that PHA is one of the three tasks that EVERY project needs to do. The other two are:

- Preliminary Hazard Identification (PHI, Task 201); and

- System Requirements Hazard Analysis (SRHA, Task 203).

I look at these three tasks together in my course 'Foundations of Safety Assessment'. This is one of five linked courses on Mil-Std-882E. They will teach you how to get the maximum benefits from your System Safety Program.

https://youtu.be/gzKcj2to3uU
This is the seven-minute-long demo video. The full video is 45 minutes long.

Get the video as part of the course 'Foundations of Safety Assessment'.

Topics: How to do Preliminary Hazard Analysis

- Task 202 Purpose;

- Task Description;

- Recording & Scope;

- Risk Assessment (Tables I, II & III);

- Risk Mitigation (order of preference);

- Contracting; and

- Commentary.

Transcript: How to do Preliminary Hazard Analysis

Hello and welcome to the Safety Artisan, where you'll find professional, pragmatic, and impartial safety training resources. So, we’ll get straight on to our session, which is on the 8th of February 2020.

Preliminary Hazard Analysis

Now we're going to talk today about Preliminary Hazard Analysis (PHA). This is Task 202 in Military Standard 882E, which is a system safety engineering standard. It's very widely used, mostly on military equipment, but it does turn up elsewhere. This standard is of wide interest to people and Task 202 is the second of the analysis tasks. It's one of the first things that you will do on a system safety program and therefore one of the most informative. This session forms part of a series of lessons that I'm doing on Mil-Std-882E.

Topics for This Session

What are we going to cover in this session? Quite a lot! The purpose of the task, a task description, recording, and scope. How we do risk assessments against Tables 1, 2, and 3. These tables describe severities, likelihoods, and the overall risk matrix.  We will talk about all three, about risk mitigation and using the order of preference for risk mitigation, a little bit of contracting, and then a short commentary from myself. I’m providing commentary all the way through. So, let's crack on.
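The way a severity category and a probability level combine into a risk level can be sketched as a simple lookup. This is a minimal illustration only: the category names and cell values below are a transcription of the Mil-Std-882E risk assessment matrix as best recalled, so treat them as an assumption and check every cell against Tables I, II, and III in your copy of the Standard. The function name is hypothetical.

```python
# Severity categories (Table I) and probability levels (Table II).
SEVERITY = {1: "Catastrophic", 2: "Critical", 3: "Marginal", 4: "Negligible"}
PROBABILITY = {
    "A": "Frequent", "B": "Probable", "C": "Occasional",
    "D": "Remote", "E": "Improbable", "F": "Eliminated",
}

# Risk matrix (Table III): one row per probability level, one column per
# severity category 1 to 4. ASSUMPTION: verify against the Standard.
RISK_MATRIX = {
    "A": ["High", "High", "Serious", "Medium"],
    "B": ["High", "High", "Serious", "Medium"],
    "C": ["High", "Serious", "Medium", "Low"],
    "D": ["Serious", "Medium", "Medium", "Low"],
    "E": ["Medium", "Medium", "Medium", "Low"],
}


def assess_risk(probability_level: str, severity_category: int) -> str:
    """Look up the risk level for one hazard."""
    if severity_category not in SEVERITY:
        raise ValueError("severity category must be 1, 2, 3 or 4")
    if probability_level == "F":
        return "Eliminated"  # hazard eliminated, whatever the severity
    return RISK_MATRIX[probability_level][severity_category - 1]


print(assess_risk("C", 1))  # Occasional and Catastrophic
```

A lookup like this is only as good as the severity and probability inputs, which early in a program rest on limited data.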

Task 202 Purpose

The purpose of Task 202, as it says, is to perform and document a preliminary hazard analysis, or PHA for short, to identify hazards, assess the initial risks, and identify potential mitigation measures. We're going to talk about all of that.

Task Description

First, the task description is quite long here. And as you can see, I've highlighted some stuff that I particularly want to talk about.

It says “the contractor”, but it doesn't matter who is doing the analysis. The customer needs to do something to inform themselves too, otherwise they won't understand what they're doing. Whoever does it needs to perform and document a PHA. It's about determining initial risk assessments. There's going to be more work, more detailed work done later. But for now, we're doing an initial risk assessment of identified hazards. And those hazards will be associated with the design or the functions we propose to introduce. That's very important. We don't need a design to do this. We can get in early when we have user requirements, functional requirements, and that kind of thing.

Doing this work will help us make better requirements for the system. So, we need to evaluate those hazards for severity and probability. It says based on the best available data. And of course, early in a program, that's another big issue. We'll talk about that more later. It says to include mishap data as well, if accessible. 'Mishap' is an American term: it means an accident, but it avoids any kind of suggestion about whether the event was accidental or deliberate. It might be stupidity, deliberate, or whatever. It's a mishap. It's an undesirable event.

We look for accessible data from similar systems, legacy systems, and other lessons learned. I've talked about that a little bit in the Task 201 lesson, and there’s more on that today under commentary. We need to look at provisions and alternatives, meaning design provisions and design alternatives, to reduce risks, and add mitigation measures to eliminate hazards if we can, or otherwise reduce the associated risk. We need to include all of that. So, that's the task description. That's a good overview of the task and what we need to talk about.

Recording & Scope

First, recording and scope. As always with these tasks, we've got to document the results of the PHA in a hazard tracking system. Now, a word on terminology: we might call it a hazard tracking system; we might call it a hazard log; we might call it a risk register. It doesn't matter what it's called. The key point is it's a tracking system. It's a live document, as people say: a spreadsheet or a database, something like that. It's something relatively easy to update and change.

Also, we can track changes through the safety program once we do more analysis because things will change. We should expect to get some results and refine them and change them as time goes on. Very important point.
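To make the idea of a live, trackable record concrete, here is a minimal Python sketch of one hazard entry that keeps a history of its own changes. The field names, the hazard itself, and the `update` method are hypothetical; the Standard says what must be recorded, not how to implement it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class HazardRecord:
    hazard_id: str
    description: str
    severity: str
    probability: str
    mitigations: list = field(default_factory=list)
    history: list = field(default_factory=list)

    def update(self, when: date, note: str, **changes):
        """Apply the changes and keep a dated record of what changed and why."""
        for name, new_value in changes.items():
            old_value = getattr(self, name)  # raises AttributeError on a typo
            setattr(self, name, new_value)
            self.history.append((when, name, old_value, new_value, note))


# Hypothetical entry, for illustration only.
record = HazardRecord(
    hazard_id="HAZ-001",
    description="Loss of braking on the test vehicle",
    severity="Critical",
    probability="Occasional",
)
record.update(
    date(2020, 2, 8),
    "Redundant brake circuit added to the design",
    probability="Remote",
)
print(record.probability, len(record.history))
```

The point of the history list is that later, more detailed analysis can refine the initial assessment without losing sight of where it started.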

That's it for the Demo...

End: How to do Preliminary Hazard Analysis

You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.
Simon Di Nucci https://www.safetyartisan.com/2024/03/13/how-to-do-preliminary-hazard-analysis/


Lessons Learned from a Fatal Accident
Lessons Learned: in this 30-minute video, we learn lessons from an accident in 2016 that killed four people on the Thunder River Rapids Ride in Queensland. The coroner's report was issued this year, and we went through the summary of that report. In it, we find failings in WHS Duties, Due Diligence, risk management, and failures to eliminate or minimize risks So Far As is Reasonably Practicable (SFARP). We do not 'name and shame', rather we focus on where we can find guidance to do better.

https://youtu.be/QaSoFld7W0g
In 2016, four people died on the Thunder River Rapids Ride.

Lessons Learned: Key Points

We examine multiple failings in:

- WHS Duties;

- WHS Due Diligence;

- Risk management; and

- Eliminating or minimizing risks So Far As is Reasonably Practicable (SFARP).

Transcript: Lessons Learned from a Theme Park Tragedy

Introduction

Hello, everyone, and welcome to the Safety Artisan: purveyors of fine safety engineering training videos and other resources. I'm Simon, and I'm your host, and today we're going to be doing something slightly different. So, there are no PowerPoint slides. Instead, I'm going to be reading from a coroner's report from a well-known accident here in Australia, and we're going to be learning some lessons in the context of WHS (Workplace Health and Safety) law.

Disclaimer

Now, I'd just like to reassure you before we start that I won't be mentioning the names of the deceased. I won't be sharing any images of them. And I'm not even going to mention the firm that owned the theme park because this is not about bashing people when they're down. It's about us as a community learning lessons when things go wrong to fix the problem, not the blame. So that's what I'd like to emphasize here.

The Coroner's Report

So, I'm just turning to the summary of the coroner's report. The coroner was examining the deaths of four people back in 2016 on what was called the Thunder River Rapids Ride, or TRRR or TR3 for short, because it's a bit of a mouthful. This was a water ride, as the name implies, and what went wrong was that the water level dropped. A raft (these are circular rafts that go down the rapids and down the chute) got stuck. Another raft came up behind the stuck raft and went into it. One of the rafts tipped over. These rafts seat six people in a circular configuration. You may have seen them. Different versions of this ride are in lots of theme parks.

But out of the six, unfortunately, only two escaped, and four people were killed, tragically. So that's the background. That happened in October 2016, I think it was. The coroner's report came out a few months ago, and I've been wanting to talk about it for some time because it illustrates very well several issues where WHS can help us do the right thing.

WHS Duties

So, first of all, I'm looking at the first paragraph in the summary. The coroner starts off: the design and construction of the TRRR at the conveyor and unload area posed a significant risk to the health and safety of patrons. Notice that the coroner says the design and construction. Most people think that WHS only applies to workplaces and people managing workplaces, but it does a lot more than that. Sections 22 through 26 of the Act talk about the duties of designers, manufacturers, importers, suppliers, and then people who commission, install, et cetera.

So, WHS applies duties to a wide range of businesses and undertakings, and designers and constructors are key: they are two of them. Now, it's worth noting that there was no importer here. Although the TRRR ride was similar to a ride available commercially elsewhere, for some reason the theme park chose to design and build their version in Queensland. Don't know why. Anyway, that doesn't matter now. So, there was no importer, but otherwise, even if you didn't design and construct the thing, if you imported it, the same duties still apply to you.

No Effective Risk Assessment

So, the coroner then goes on to talk about risks and hazards and says each of these obvious hazards posed a risk to the safety of patrons on the ride and would have been easily identifiable to a competent person had one ever been commissioned to conduct a risk and hazard assessment of the ride. So, what the coroner is saying there is, “No effective risk assessment has been done”. Now, that is contrary to the risk management code of practice under WHS and also, of course, the definition of SFARP, so far as is reasonably practicable, basically is a risk assessment or risk management process. So, if you've not done effective risk management, you can't say that you've eliminated or minimized risks SFARP, which is another legal requirement. So, a double whammy there.

Then moving on. “Had notice been taken of lessons learned from the preceding incidents, which were all of a very similar nature …” and then he goes on. That's the back end of a sentence where he says, you didn't do this: you had incidents on the ride in the past, which were very similar, and you didn't learn from them. And again, concerning reducing risks SFARP, Section 18 in the WHS Act, which gives the definition of reasonably practicable, which is the core of SFARP, talks about what ought to have been known at the time.

So, when you're doing a risk assessment, or maybe you're reassessing risk after a modification (and this ride was heavily modified several times) or after an incident, you need to take account of the available information. And the owners and operators of the TRRR didn't do that. So, another big failing.

The coroner goes on to note that records available concerning the modifications to the ride are scant and ad hoc. And again, there's a section in the WHS risk management code of practice about keeping records. It's not that onerous. I mean, the COP is pretty simple but they didn't meet the requirement of the code of practice. So, bad news again.

Due Diligence

And then finally, I’ve got to the bottom of page one. So, the coroner then notes the maintenance tasks undertaken on the ride whilst done so regularly and diligently by the staff, seemed to have been based upon historical checklists which were rarely reviewed despite the age of the device or changes to the applicable Australian standards. Now, this is interesting. So, this is contravening a different section of the WHS Act.

Section 27 talks about the duties of officers: effectively, that's company directors and senior managers. Officers are supposed to exercise due diligence. In the Act, due diligence is fairly simple. It's six bullet points, but one of them is that the officers have to keep up to date on what's going on in their operation. They have to provide up-to-date and effective safety information for their staff. They're also supposed to keep up with what's going on in safety regulations that apply to their operation. So, I reckon in that one statement from the coroner there are probably three breaches of due diligence to start with.

Risk Controls Lacking

We've reached the bottom of page one. Let's carry on. The coroner then goes on to talk about risk controls that were or were not present and says, “in accordance with the hierarchy of controls, plant and engineering measures should have been considered as solutions to identified hazards”. So, in WHS regulations, and it’s repeated in the risk code of practice, there's a thing called the hierarchy of controls. It says that some types of risk controls are more effective than others and therefore they come at the top of the list, whereas others are less effective and should be considered last.

So, top of the list is, “Can you eliminate the hazard?” If not, can you substitute the hazardous thing with something else that is less hazardous? Can you put in engineering solutions or controls to control hazards? And then finally, at the bottom of the list are admin procedures for people to follow and then personal protective equipment for workers, for example. We'll talk about this more later, but the top end of the hierarchy had just not been considered, or not effectively anyway.
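The ordering described above can be sketched as a small Python example. It follows the five levels listed in this talk; the enum names, the helper function, and the candidate controls are hypothetical and are not taken from the WHS regulations or the code of practice.

```python
from enum import IntEnum


class ControlType(IntEnum):
    # Lower number means more effective, so it is considered first.
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5


def order_by_preference(controls):
    """Sort (description, ControlType) pairs, most effective type first."""
    return sorted(controls, key=lambda control: control[1])


# Hypothetical candidate controls for one hazard.
candidates = [
    ("Gloves and eye protection for the operator", ControlType.PPE),
    ("Written procedure for clearing a jam", ControlType.ADMINISTRATIVE),
    ("Interlocked guard on the moving parts", ControlType.ENGINEERING),
    ("Remove the manual clearing step from the process", ControlType.ELIMINATION),
]

ordered = order_by_preference(candidates)
for description, control_type in ordered:
    print(f"{control_type.name:15} {description}")
```

Sorting like this does not decide which controls are reasonably practicable; it only makes sure the most effective options are looked at before procedures and protective equipment.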

A Predictable Risk

So, the coroner then goes on to say, “rafts coming together on the ride was a well-known risk, highlighted by the incident in 2001 and again in 2004”. Now actually it says 2004, I think that might be a typo. Elsewhere, it says 2014, but certainly, two significant incidents were similar to the accident that killed four people. And it was acknowledged that various corrective measures could be undertaken to, quote, “adequately control the risk of raft collision”.

However, a number of these suggestions were not implemented on the ride. Now, given that they've demonstrated the ability to kill multiple people on the ride with a raft collision, it's going to be a very, very difficult thing to justify not implementing controls. So, given the seriousness of the potential risk, to say that a control is feasible, is practicable, but then to say “We're not going to do it. It's not reasonable”: that's going to be very, very difficult to argue. I would suggest it's almost a certainty that not all reasonably practicable controls were implemented, which means the risk is not SFARP, which is a legal requirement.

Further on, we come back to document management, which was poor, with no formal risk register in place. So, no evidence of a proper risk assessment. Members of the department did not conduct any holistic risk assessments of rides, with the general view that another department was responsible. So, the fact that risk assessment wasn't done: that's a failure. The fact that senior management didn't knock heads together and say “This has to be done. Make it happen”: that's another failing. That’s a failing of due diligence, I suspect. So, we've got a couple more problems there.

High-Risk Plant

Then, later on, the coroner talks about necessary engineering oversight of high-risk plant not being done. Now, under WHS Act definitions, amusement rides are counted as high-risk plant, presumably because of the number of serious accidents that have happened with them over the years. The managers of the TRRR didn't meet their obligations concerning high-risk plant. So, some things that are optional for common stuff are mandatory for high-risk plant, and those obligations were not met, it seems.

And then in just the next paragraph, we reinforce this due diligence issue. Only a scant amount of knowledge was held by those in management positions, including the general manager of engineering, as to the design modifications and past notable incidents on the ride. One of the requirements of due diligence is that senior management must know their operations, and know the hazards and risks associated with the operations. So for the engineering manager to be ignorant about modifications and risks associated with the ride, I think is a clear failure of due diligence.

Still talking about engineering, the coroner notes “it is significant that the general manager had no knowledge of past incidents involving rafts coming together on the ride”. Again, due diligence. If things have happened those need to be investigated and learned from and then you need to apply fresh controls if that's required. And again, this is a requirement. So, this shows a lack of due diligence. It's also a requirement in the risk management code of practice to look at things when new knowledge is gained. So, a couple more failures there.

No Water-Level Detection, Alarm, Or Emergency Stop

Now, it's said that the operators of the ride were well aware that when one pump failed, and there were two, the ride was no longer able to operate, with the water level dropping dramatically, stranding the rafts on the steel support railings. And of course, that's how the accident happened. Regardless, there was no formal means by which to monitor the water level of the ride and no audible alarm to advise that one of the pumps had ceased to operate. So, a water level monitor? Well, we're talking potentially about a float, which is a pretty simple thing. There's one in every cistern, in every toilet in Australia. Maybe the one for the ride would have to be a bit more sophisticated than that, a bit industrial grade, but the same principle.

And no alarm to advise the operators that this pump had failed, even though it was known that this would have a serious effect on the operation of the ride. So, there are multiple problems here. I suspect you'll be able to find regulations that require these things. Certainly, if you looked at the code of practice on plant design, because this counts as industrial plant, it's high-risk plant, you would expect very high standards of engineering controls on high-risk plant, and these were missing. More on that later.

In a similar vein, the coroner says “a basic automated detection system for the water level would have been inexpensive and may have prevented the incident from occurring”. So basically, the coroner is saying this control mechanism would have been cheap, so it's certainly reasonably practicable. If you've got a cheap control that will prevent a serious injury or a death, then how on earth are you going to argue that it's not reasonable to implement it? The onus is on us to implement all reasonably practicable controls.

And then similarly, the lack of a single emergency stop on the ride, which was capable of initiating a complete shutdown of all the mechanisms, was also inadequate. And that's another requirement from the code of practice on plant design, which refers back to WHS regulations. So, another breach there.
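To show how little logic a basic detection and shutdown function needs, here is a minimal Python sketch. It is purely illustrative and is not the ride's actual or recommended design: the threshold, the field names, and the action strings are hypothetical, and the only facts taken from the account above are that there were two pumps and that the ride could not operate once one of them had failed.

```python
from dataclasses import dataclass


@dataclass
class RideStatus:
    water_level_m: float  # reading from a level sensor (hypothetical)
    pumps_running: int    # number of pumps currently running


MIN_SAFE_LEVEL_M = 1.0  # hypothetical threshold, for illustration only
REQUIRED_PUMPS = 2      # two pumps, both needed for the ride to operate


def check_ride(status: RideStatus) -> list:
    """Return the alarms and actions that the current status calls for."""
    actions = []
    if status.pumps_running < REQUIRED_PUMPS:
        actions.append("ALARM: pump stopped")
    if status.water_level_m < MIN_SAFE_LEVEL_M:
        actions.append("ALARM: low water level")
    if actions:
        # One command that stops every mechanism, not one stop per mechanism.
        actions.append("EMERGENCY STOP: shut down all ride mechanisms")
    return actions


print(check_ride(RideStatus(water_level_m=1.5, pumps_running=2)))
print(check_ride(RideStatus(water_level_m=0.4, pumps_running=1)))
```

A real safety function would need properly engineered sensors, fail-safe behaviour, and independent verification; the sketch only shows that the decision logic itself is simple.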

Human Factors

We then move on to a section where it talks about operators, operators’ accounts of the incident, and other human factors. I'm probably going to ask my friend Peter Bender, who is a Human Factors specialist, to come and do a session on this and look at this in some more detail, because there are rich pickings in this section and I'm just going to skim the surface here because we haven't got time to do more.

The coroner says “it's clear that these 38 signals and checks to be undertaken by the ride operators was excessive, particularly given that the failure to carry out any one could potentially be a factor which would contribute to a serious incident”. So, 38 signals and checks were distributed between two ride operators, because there was no one operator in control of the whole ride (that's a human factors nightmare for a start), and clearly the work design for the ride was poor. There is good guidance available from Safe Work Australia on good work design, so there's no excuse for this kind of lapse.

And then the coroner goes on to say, reinforcing this point that the ride couldn't be safely controlled by a human operator. The lack of engineering controls on a ride of this nature is unjustifiable. Again, reinforces the point that risk was not SFARP because not all reasonably practicable controls had been implemented. Particularly controls at the higher end of the hierarchy of controls. So, a serious failing there.  

(Now, I've got something that I'm going to skip, actually, but - It's a heck of a comment, but it's not relevant to WHS.)

Training And Competence

We're moving on to training and competence. Those responsible for managing the ride whilst following the process and procedure in place - and I'm glad to see, from a human factors point of view, that the coroner is not just trying to blame the last person who touched it. He's making a point of saying the operators did all the right stuff. Nevertheless, they were largely not qualified to perform the work for which they were charged.

The process and procedures that they were following seemed to have been created by unknown persons (unknown because of the poor record-keeping, presumably) who, it is safe to assume, lacked the necessary expertise. And I think the coroner is making a reasonable assumption there, given the multiple failings that we've seen in risk management, in due diligence, in record-keeping, in the knowledge of key people, et cetera, et cetera. It seems that the practice at the park was simply to accept what had always been done in terms of policy and procedure.

And despite changes to safety standards and practices happening over time, because this is an old ride, only limited and largely reactionary consideration was ever given to making changes, including training provided to staff. So, reactionary: bad word. We're supposed to predict risk and prevent harm from happening. So, multiple failures in due diligence here and on staff training, providing adequate staff training, providing adequate procedures, et cetera.

The coroner goes on to say, “regardless of the training provided at the park, it would never have been sufficient to overcome the poor design of the ride. The lack of automation and engineering controls”. So, again, the hierarchy of controls was not applied, and relatively cheap engineering controls were not used, placing an undue burden on the operator. Sadly, this is all too common in many applications. This is one of the reasons I am not naming the ride operators or trying to shame them, because I've seen this happen in so many different places. It wouldn't be fair to single these people out.

‘Incident-Free’ Operations?

Now we have a curious little statement in paragraph 1040. The coroner says “submissions are made that there was a 30-year history of incident-free operation of the ride”. So, what it looks like is that the ride operators and management were trying to tell the coroner that they never had an incident on the ride in 30 years, which sounds pretty impressive, doesn't it, at face value?

But of course, the coroner already knew or discovered later on that there had been incidents on the ride. Two previous incidents were very similar to the fatal accident. Now, on the surface, this looks bad, doesn't it? It looks like the ride management was trying to mislead the coroner. I don't think that's the case because I've seen many organizations do poor incident reporting, poor incident recording, and poor learning from experience from incidents. It doesn't surprise me that the senior management was not aware of incidents on their ride. Unfortunately, it's partly human nature.

Nobody likes to dwell on their failures or think about nasty things happening, and nobody likes to go to the boss saying we need to shut down a moneymaking ride. Don't forget, this was a very popular ride. We need to shut down a moneymaking ride to spend more money on modifications to make it safer. And then management turns around and says, “Well, nobody's been hurt. So, what's the problem?” And again, I've seen this attitude again and again, even from people operating much more sophisticated and much more dangerous equipment than this. So, whilst this does look bad (the optics are not good, as they like to say), I don't think there's a conspiracy going on here. I think it's just stupid mistakes because it's so common. Moving on.

Standards

Now the coroner goes on to talk about standards not being followed, particularly when standards get updated over time, bearing in mind this ride was 30 years old. The coroner states “it is essential that any difference in these standards are recognized and steps taken to ensure any shortfalls with a device manufactured internationally is managed”. Now, this is a little bit of an aside, because as I've mentioned before, the TRRR was actually designed and manufactured in Australia, albeit not to any standards that we would recognize these days. But most rides were not, and this highlights the duties of importers. So, if you import something from abroad, you need to make sure that it complies with Australian requirements. That's a requirement, that's a duty under WHS law. We'll come back to this in just a moment.
Simon Di Nucci https://www.safetyartisan.com/2023/12/06/lessons-learned-from-a-fatal-accident/


Introduction to Human Factors
In this 40-minute video, 'Introduction to Human Factors', I am very pleased to welcome Peter Benda to The Safety Artisan.

Peter is a colleague and Human Factors specialist who has 23 years' experience in applying Human Factors to large projects in all kinds of domains. In this session, we look at some fundamentals: what does Human Factors engineering aim to achieve? Why do it? And what sort of tools and techniques are useful?

This is The Safety Artisan, so we also discuss some real-world examples of how erroneous human actions can contribute to accidents. (See this post for a fuller example of that.) And, of course, how the Human Factors discipline can help to prevent them.

https://youtu.be/FnL4XuLlvoQ
In 'Introduction to Human Factors', Peter explains these vital terms to us!

Topics

- Introducing Peter;

- The Joint Optimization Of Human-Machine Systems;

- So why do it (HF)?

- Introduction to Human Factors;

- Definitions of Human Factors;

- The Long Arm of Human Factors; and

- What is Human Factors Integration?

Introduction to Human Factors: Transcript

Introduction

Simon:  Hello, everyone, and welcome to the Safety Artisan: Home of Safety Engineering Training. I'm Simon, and I'm your host, as always. But today we are going to be joined by a guest, a Human Factors specialist, a colleague, and a friend of mine called Peter Benda. Now, Peter started as one of us, an ordinary engineer, but unusually, perhaps for an engineer, he decided he didn't like engineering without people in it. He liked the social aspects and the human aspects, and so he began to specialise in that area. And today, after twenty-three years in the business, and a first degree and a master's degree in engineering with a Human Factors speciality, he's going to join us and share his expertise with us.

So that's how you got into it then, Peter. For those of us who aren't really familiar with Human Factors, how would you describe it to a beginner?

Peter:   Well, I would say it's The Joint Optimization Of Human-Machine Systems. So it's really focusing on designing systems (perhaps 'holistically' would be a term that could be used) where we're looking at optimizing the human element as well as the machine element, and the interaction between the two. So that's really the key to Human Factors. And, of course, there are many dimensions from there: environmental, organisational, job factors, human and individual characteristics. All of these influence behaviour at work and health and safety. Another way to think about it is the application of scientific information concerning humans to the design of systems, where those systems are for human use, which I think most systems are.

Simon:  Indeed. Otherwise, why would humans build them?

Peter:   That's right. Generally speaking, sure.

Simon:  So, given that this is a thing that people do, then. Perhaps we're not so good at including the human unless we think about it specifically?

Peter:   I think that's fairly accurate. I would say that if you look across industries, the industries that are perhaps better at integrating Human Factors considerations into the design lifecycle have had to do so because of the accidents that have occurred in the past. You could probably say this about safety engineering as well, right?

Simon:  And this is true, yes.

Peter:   In a sense, you do it because you have to, because the implications of not doing it are quite significant. However, I would say the upshot, if you look at some of the evidence – and you see this also across software design and non-safety critical industries or systems – is that taking into account human considerations early in the design process typically ends up in better system performance. You might have more usable systems, for example. Apple would be an example of a company that puts a lot of focus into human-computer interaction and optimizing the interface between humans and their technologies and ensuring that you can walk up and use it fairly easily. Now as time goes on, one can argue about how well Apple is doing something like that, but they were certainly very well known for taking that approach.

Simon:  And reaped the benefits accordingly and became, I think, the world's number one company for a while.

Peter:   That's right. That's right.

Simon:  So, thinking about the “So why do it?” What is one of the benefits of doing Human Factors well?

Peter:   Multiple benefits, I would say. Clearly, safety and safety-critical systems, like health and safety; performance, system performance; efficiency and so forth. Job satisfaction, and that has repercussions that go back into, broadly speaking, society. If you have meaningful work, that has other repercussions, and that's sort of the angle I originally came into all of this from. But, you know, you could be looking at just the safety and efficiency aspects.

Simon:  You mentioned meaningful work: is that what attracted you to it?

Peter:   Absolutely. Absolutely. Yes. Yes, as I said, I had a keen interest in the sociology of work and looking at work organisation. Then, for my master's degree, I looked at lean production, which is the Toyota approach to producing vehicles. I looked at multiskilled teams and multiskilling, and job satisfaction. Then, looking at stress indicators and so forth versus mass production systems. So that's really the angle I came into this from. If you look at mass production lines, where a person is doing the same job over and over, it’s quite repetitive and very narrow, versus the more Japanese-style lean production. There are certainly repercussions, both socially and individually, from a psychological health perspective.

Simon:  So, you get happy workers and more contented workers -

Peter:   – And better quality, yeah.

Simon:  And again, you mentioned Toyota. Another giant company that's presumably grown partly through applying these principles.

Peter:   Well, they’re famous for quality, aren't they? Famous for reliable, high-quality cars that go on forever. I mean, when I moved from Canada to Australia, Toyota had a very, very strong history here with the Land Cruiser, and the Hilux, and so forth.

Simon:  All very well-known brands here. Household names.

Peter: They are known to be bombproof and can outlast any other vehicle. And the lean production system certainly has, I would say, quite a bit of responsibility for the production of these high-quality cars.

Simon:  So, we've spoken about how you got into it and “What is it?” and “Why do it?” I suppose, as we've said, what it is in very general terms, but I suspect a lot of people listening will want to know how to define what Human Factors is, based on doing it: on how you do it. It's a long, long time since I did my Human Factors training. Just one module in my master's, so could you take me through what Human Factors involves these days in broad terms?

Peter:   Sure, I actually have a few slides that might be useful –  

Simon:  – Oh, terrific! –

Peter:   – Maybe I should present that. So, let me see how well I can share this. And of course, sometimes the problem is I'll make sure that – maybe screen two is the best way to share it. Can you see that OK?

Simon:  Yeah, that's great...

(See the video for the full content)

Simon Di Nucci https://www.safetyartisan.com/2023/08/02/introduction-to-human-factors/

The 2023 Digest The 2023 Digest brings you all The Safety Artisan's blog posts from last year. I hope that you find this a useful resou...