Category Archives: Case Study

A new systems approach to safety management with applications to arctic ship navigation

  1. Introduction

This study is intended to improve the techniques available to safety assessors and to provide tools for decision making in safety management. It does so by fostering a new paradigm for safety management, which forms the basis of the performance measurement and process mapping/monitoring (PMPM) method.

The research examines safety management philosophies and compares methods. This examination is intended to provide a broad understanding of the fundamental safety and risk concepts.

The FRAM was adopted for Arctic ship navigation: three captains were interviewed to form the basis of a functional map of the way ship navigation work can be performed. Variations in the way ship navigation work is performed were also recorded from the captains, to help understand some of the ways captains may adjust their work to the dynamic conditions they face.

Figure 1 – FRAM model for ship navigation with input from ship navigators

Two additions to the FRAM are presented in this work: 1) functional signatures and 2) system performance measurements. Functional signatures provide a method for assessors to animate the FRAM and visualize the functional dynamics over time (Figure 2). System performance measurement provides a way to bring an element of quantification to the FRAM. Quantification can then be used to compare different scenarios and support decisions. These additions to the FRAM have been demonstrated using data from an ice management ship simulator experiment. The demonstration can serve as a basis for continuing the analysis of this method in the maritime domain, or for transferring the approach to other domains.

Figure 2 – A functional signature for a given time (t)

  2. Safety Management

In this paper, three approaches to safety are examined: fault trees (FT), Bayesian networks (BN), and the Functional Resonance Analysis Method (FRAM). A case study of a propane feed control system is used to apply these methods. Making safety improvements to industrial workplaces requires a thorough understanding of the systems involved. It is shown that considering only the chance of failure of the system components, as in the FT and BN approaches, may not provide enough understanding to fully inform safety assessments. FT and BN methods are top-down approaches formed from the perspective of management in workplaces. The FRAM uses a bottom-up approach from the operational perspective to improve understanding of the industrial workplace. The FRAM approach can provide added insight into the human factor and context, and can increase the rate at which we learn by considering successes as well as failures.

  3. Ship Navigation

A methodology is presented on how to apply the FRAM to a domain, with a focus on ship navigation. The method draws on ship navigators to inform the building of the model and to learn about practical variations that must be managed to effectively navigate a ship. The Exxon Valdez case is used to illustrate the model’s utility and provide some context to the information gathered by this investigation. The functional signature of the work processes of the Exxon Valdez on the night of the grounding is presented. This shows the functional dynamics of that particular ship navigation case, and serves to illustrate how the FRAM approach can provide another perspective on the safety of complex operations.

  4. Resilience

The concepts of resilience, such as robustness and rapidity, can be used to inform safety management decisions. A methodology is presented that uses quantitative techniques of system performance measurement and qualitative understanding of functional execution from the Functional Resonance Analysis Method (FRAM) to gain an understanding of these resilience concepts. Examples of robustness and rapidity using this methodology are illustrated, and how they can help operators manage their operation is discussed.

  5. Operational Dynamics

In this paper, a method is presented for visualizing and understanding the operational dynamics of a shipping operation. The method uses system performance measurement and functional signatures. System performance measurement allows assessors to understand the level of performance that is being achieved by the operation. The functional signatures then provide insight into the functional dynamics that occur for each level of performance. By combining system performance measurement with functional signatures, there is a framework to help understand what levels of performance are being achieved and why certain levels of performance are being achieved. The insight gained from this approach can be helpful in managing shipping operations. Data from an ice management ship simulator is used to demonstrate this method and compare different operational approaches.

 

The Simulator Experiments

An experiment was done using a ship simulator configured for an ice management operation.

Figure 3 – Sketch of the Ice Management Simulator setup

Thirty-three participants used the simulator to execute an operation that consisted of clearing pack ice from a lifeboat launch site at an offshore petroleum installation. The Own-ship (the vessel in which the simulation takes place) is modelled on an Anchor Handling Tug Supply (AHTS) vessel. An array of five computers collected data during the simulations. This included a time history of ice concentration within a specified zone, as well as position, speed, and heading. A video “Replay” file was also recorded during each simulation, which upon playback showed the entire simulation from start to finish. Figure 4 shows a screenshot example from such a Replay video.

Figure 4 – Snapshot of a replay file

The data analysis of this experiment consisted of assessing the overall performance of each participant and determining the functional signatures for each participant, as per the methodology section. The metric used to define the performance of each participant is the percentage of time that the lifeboat launch zone was free of ice. Each participant performed ice management for 30 minutes, so the best performing participants were deemed to have kept the area under the lifeboat launch zone ice free for the longest amount of time within the 30-minute simulation. The lifeboat launch zone was defined as a circular area of radius 8 m, centred 8 m off the port quarter where the lifeboat davits are located. An image processing script was then used to determine if ice was present in the lifeboat launch zone.
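As a rough illustration of this metric (not the study’s actual analysis script), the sketch below computes the percentage of ice-free time from a boolean ice-presence series; the 30 s sampling interval follows the data resolution mentioned later in the text, and all names and values are illustrative.

```python
# Illustrative sketch only: compute the performance metric as the percentage
# of samples in which the lifeboat launch zone was free of ice. Assumes the
# image-processing step has already produced a boolean per-sample flag
# (e.g. at the 30 s resolution mentioned in the text).
from typing import Sequence

def ice_free_percentage(zone_has_ice: Sequence[bool]) -> float:
    """Return the percentage of samples with no ice in the launch zone."""
    if len(zone_has_ice) == 0:
        raise ValueError("empty time series")
    ice_free = sum(1 for has_ice in zone_has_ice if not has_ice)
    return 100.0 * ice_free / len(zone_has_ice)

# Example: a 30-minute run sampled every 30 s gives 60 samples.
example = [False] * 40 + [True] * 20   # zone ice free for 40 of 60 samples
print(f"{ice_free_percentage(example):.1f} % ice free")   # 66.7 % ice free
```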

In order to determine when decisions and actions were made by the navigator, the functional signature was approximated. It is not known exactly when the participant decided to make a speed or heading change, but this can be approximated by examining the peaks and troughs in the speed trace. A trough implies that a speed change was made to increase speed, and a peak implies that a speed change was made to decrease speed.
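A minimal sketch of this peak-and-trough approximation is given below, assuming the speed trace is available as a regularly sampled array; the prominence threshold and the synthetic trace are illustrative, not values from the study.

```python
# Sketch of the approximation described above: peaks in the speed trace are
# read as decisions to reduce speed, troughs as decisions to increase speed.
# The prominence threshold and synthetic data are illustrative only.
import numpy as np
from scipy.signal import find_peaks

def approximate_speed_decisions(speed_kn: np.ndarray, min_prominence: float = 0.2):
    """Return sample indices of inferred slow-down and speed-up decisions."""
    slow_down, _ = find_peaks(speed_kn, prominence=min_prominence)
    speed_up, _ = find_peaks(-speed_kn, prominence=min_prominence)
    return slow_down, speed_up

t = np.linspace(0, 1800, 61)              # a 30-minute run at 30 s resolution
speed = 2.0 + 1.5 * np.sin(t / 300.0)     # synthetic speed trace in knots
slow_down_idx, speed_up_idx = approximate_speed_decisions(speed)
print(len(slow_down_idx), "slow-downs and", len(speed_up_idx), "speed-ups inferred")
```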

The output for observing ice conditions was also approximated. It was assumed that the navigator checked the ice conditions in the lifeboat zone at least once every 30 s. This was the resolution of the data for the presence of ice in the lifeboat zone.

Times when the speed exceeded 3 knots and times when very high ice loads occurred were flagged. This helps to understand when the highest ice loads acted on the vessel and, in particular, the relationship between the highest ice loads and speeds above the regulatory maximum imposed by the POLARIS system.

Based on these criteria, a case file was generated for each participant. The case file contained time-stamped events, such as speed and heading changes, ice observations, speed limit violations, and very high ice loads.
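A hedged sketch of what such a case file might look like is shown below; the event names and the 3-knot limit come from the text, while the data structure and the ice-load threshold are purely illustrative.

```python
# Illustrative sketch of assembling a per-participant case file of
# time-stamped events. Only speed-limit violations and high ice loads are
# shown here; speed/heading changes and ice observations would be appended
# in the same way. Thresholds and structures are assumptions.
from dataclasses import dataclass

@dataclass
class Event:
    time_s: float
    kind: str          # "speed_limit_violation", "high_ice_load", ...
    detail: str = ""

def build_case_file(times_s, speeds_kn, ice_loads, load_threshold):
    events = []
    for t, v, load in zip(times_s, speeds_kn, ice_loads):
        if v > 3.0:                      # regulatory maximum noted in the text
            events.append(Event(t, "speed_limit_violation", f"{v:.1f} kn"))
        if load > load_threshold:        # "very high ice load" flag (assumed threshold)
            events.append(Event(t, "high_ice_load", f"{load:.0f}"))
    return sorted(events, key=lambda e: e.time_s)

case_file = build_case_file([0, 30, 60], [2.5, 3.4, 2.8], [10, 95, 40], load_threshold=80)
```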

After the functional signatures were approximated and the performance quantified for each participant, the functional signatures were compared. This can be a basis for understanding why one person performed better than another, and also for identifying practices that are common to high or low performance types. The functional signatures contain information pertaining to the function execution for each participant, including the outputs of tasks, the relationships between them, and the times at which the tasks occur.

Figure 5 – Snapshot of functional signature for V42 at 0 seconds

The first step is to bin the performance measurements from Figure 5.7 to “group” the data. The bins can be set up to the level of granularity the assessor wishes to investigate. In this assessment, the bins were chosen to be 0-25%, 25-50%, and 50-75%, representing poor, medium, and high performance, respectively (see Figure 5.15). The groups are then examined using a boxplot.
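The binning step might look like the sketch below, assuming a per-participant performance score in percent; the bin edges follow the text, while the example scores are invented.

```python
# Minimal sketch of the binning step. The 0-25 / 25-50 / 50-75 % bins follow
# the text; the example scores are invented for illustration.
import pandas as pd

scores = pd.Series([12.0, 31.5, 48.0, 55.2, 68.9, 22.4], name="ice_free_pct")
groups = pd.cut(scores, bins=[0, 25, 50, 75],
                labels=["poor", "medium", "high"], include_lowest=True)

print(groups.value_counts())
# A boxplot of scores by group, as described above, could then be drawn, e.g.
# with pandas' DataFrame.boxplot(column="ice_free_pct", by="group").
```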

The groups were then examined to understand the functional activity of each one. This measure can provide insight into the level of functional activity that occurs in each group. Figure 5.16 shows the functional activity for the three groups in this assessment. For each group there is a wide variation in functional activity, with the 25-50% group having the least variability.

The temporal distribution of the functional signatures can be examined as well. Figure 5.17 shows the time distribution of active functions. It shows that the high-performance group is more functionally active in the earlier part of the simulation than the other two groups. Similarly, the time distributions for each specific function can be examined in this way.

The variability of the functional outputs can also be monitored, which can help in understanding the nature of the output variability for certain functions. For instance, the vessel speed is an output of the “monitor vessel parameters” function. This output is displayed in the functional signature every time the “monitor vessel parameters” function is active. Figure 5.18 shows the distribution of vessel speed at the participants’ speed changes.

The functional signatures promote the monitoring of many system parameters by way of functional outputs. This allows certain system parameters, such as regulations, to be examined. In many systems, regulations are created to improve safety, but rarely are the effects of the regulation checked to see if they are as intended. Also, the possibility that a regulation could have unintended effects on the system can be examined.

Figure 6 – Components of the PMPM method for safety management

  6. Conclusions

Operational practices influence the performance of shipping operations. It is not always obvious which practices will produce which outcomes, because of the dynamic conditions in which ships operate. This paper presents a method to help visualize the way certain practices influence the performance of an operation. The method is demonstrated through application to an ice management simulator experiment. A metric is used to measure the performance of each participant. This helps to understand the level of performance being achieved, but not why it is being achieved. To provide more insight into why participants achieve low or high performance, functional signatures are used to monitor system functionality. This paper demonstrates some of the ways a comparison may be made to examine the performance data. In this example, enough insight was obtained to understand some qualities of high and low performance and to suggest an approach for improving future performance. These are valuable insights for system management.

Publication

Smith, D. (2019). A New Systems Approach to Safety Management with Applications to Arctic Ship Navigation. PhD thesis, Faculty of Engineering and Applied Science, Memorial University of Newfoundland.

 

“Safe” software and AI systems? – the Horizon Example

“Programming is a human task, and programmers make mistakes; an error rate in writing software code of 10 errors per thousand lines of code is considered good, 1 error per thousand lines is rarely if ever achieved.” – Harold Thimbleby et al.

The current public inquiry into the causes and implications of the failures of the Post Office’s Horizon software has brought to the fore an issue which has troubled software and safety engineers for a long time. The issue has been outlined in a think piece which asks why we have not managed to address it more effectively to date. One of the main problems seems to be the lack of a universally acceptable and accepted method of demonstrating safety to designers and users alike. This has resulted in a reliance on a catalogue of qualitative assurances, drawn from the number of precautions and tests involved, that the system must be safe. But in reality, we are all aware that this is more hope than confidence. Software is getting more capable, but also more complex, all the time. We have a real problem with assuring ourselves that the code does exactly what it says ‘on the tin’, no more and no less. With more conventional engineering systems, risk assessments and safety cases would be made by analysing and predicting the reliability and security of the system from detailed engineering process flow or wiring diagrams.

Unfortunately, software systems are not built that way, and the necessary detailed documentation is almost impossible to construct, or to find. This is because they are predominantly built in an “agile” way, with groups and teams progressing through sprints and scrums, adding layer after layer of developing code, one on top of another (like papier mâché?) to form “the package” (essentially a black box?). So the only way to demonstrate reliability, security and safety in the intended application is to test, test, test in development and to monitor continuously in use. And in use we know that errors and bugs are inevitable, common, frequent and (Perrow) “normal”! Thus we accept this reality and hope it is acceptable?

So how can we develop a way of producing the realistic system “models” that we need in order to systematically probe performance in operation? Many attempts have been made using conventional approaches to detail the hard wiring diagrams of what is happening (e.g. Model Based Systems Engineering, MBSE) so that established quantitative methodologies such as fault and event trees, probabilistic risk (or reliability?) analysis, and HAZOPs can be carried out. The problem is the resource intensity and detailed databases needed, and the abovementioned lack of definitive “wiring diagrams” for the integrated software packages.

Thus, in an increasingly complex world there is a real and urgent need for methodologies that enable engineers to model complex socio-technical systems, as these now seem to encompass the majority of systems in use today. This is of course exacerbated by the increasing involvement of, and augmentation by, “black box” AI contributions. We need methodologies which will allow the analyst insight into these complex systems. A group of safety system professionals in the Safety Critical Systems Club are actively concerned and involved in finding better, more responsible and transparent ways of assuring the safety of these black boxes, so that they do indeed ‘do what it says on the tin’, no more, no less!

This case study looks at an approach developed to model systems as sets of interactive, interdependent “functions”, abstracted from agent or component details (FRAM; Hollnagel, 2020). This has now been developed to the point where it can take the basic data and structures from current component-focussed systems engineering “models” and pull them together into dynamic models (as opposed to static, fixed System Theoretic Process Accimaps), from which analysts can discern how systems really work in practice and predict the emergent behaviours characteristic of complex systems. It can now provide the numbers and the quantitative approach that model-based systems engineering applications demand. Furthermore, as the methodology merely builds the system “visualisation”, or FRAM model, it still needs the safety professional to analyse the model to discern the behaviours expected and emergent.

The first step is to define the system under consideration and then compile a list of the functions needed to deliver the processes involved.

These functions encompass the entire range of activities involved in the Post Office transaction process, from initiating a transaction to completing it, including all the critical steps for security, accounting, and operational management in between. They provide a comprehensive framework for the FRAM model, allowing for a detailed analysis of the system’s functionalities and interdependencies. Using the FRAM Model Visualiser (FMV), the following FRAM model was built (Figure 1, below).

This case study outlines the steps for creating an initial FRAM model of a typical software solution for the Post Office Horizon counter operations, assisted by ChatGPT 4.0. It reports on initial attempts to develop and validate a better way to model and assure the performance of modern software packages. It sets out to address systematically the issues on which consensus solutions for analysing and assuring the performance of safety-critical software systems have proved difficult to obtain. It thus looks at the potential for applying more advanced methods of modelling and analysing these systems:

  • The first approach investigated is the use of the Functional Resonance Analysis Method to build the system visualisations – models.
  • The second is the feasibility of using LLMs to produce initial outline system models, which can then be used to examine in detail the behaviours possible in these complex systems.

Publication

Slater, D. (2024). How do we make the case for “Safe” software and AI systems? – the Horizon Example. Published by the Safety-Critical Systems Club. All Rights Reserved.

See also – https://www.linkedin.com/pulse/what-took-you-so-long-david-slater-ty14e%3FtrackingId=1B%252F7CL%252FXTrqpVhE1IvD64A%253D%253D/?trackingId=1B%2F7CL%2FXTrqpVhE1IvD64A%3D%3D

Monitoring Equipment Health Symptoms in the International Space Station (ISS) using FRAM

In the International Space Station (ISS), multiple systems operate to maintain the environment for the astronauts, and flight controllers monitor the status of those systems 24 hours a day, 365 days a year. Although the systems have high reliability, anomalies can still occur. If an anomaly is detected, the flight controllers are expected to investigate the trends of the telemetries and assess the impacts on operations.

Experienced flight controllers can detect symptoms of an anomaly from unusual combinations of telemetries (“funny data”). However, it is generally difficult to identify those unusual combinations systematically because the number of combinations can be huge: at least 2^30 for 30 binary telemetries whose values can only be TRUE or FALSE. To address this issue, machine learning based models are expected to support anomaly prediction in terms of combinations of telemetries, without exhaustively searching that huge state space.

Automatic anomaly detection methods have been proposed by several researchers, and machine learning based methods are widely used for this purpose. Those methods are effective for a limited number of telemetries with known anomaly events. However, although they provide high accuracy for anomaly detection, they lack explainability for operators. To apply automatic symptom detection methods to ISS operations, flight controllers must be provided with the rationale for each prediction, because they cannot take action without justification.

This case study demonstrates the process of symptom detection for ISS operations and designs an automatic method to detect symptoms of anomalies, with additional information explaining the reasons for each detection. It presents a systemic symptom detection method that combines the Functional Resonance Analysis Method (FRAM) and the Specification Tools and Requirements Methodology – Requirements Language (SpecTRM-RL) with a machine learning based anomaly detection model. The system uses internationally patented technology (Nomoto et al., 2020). Figure 1 shows an overview of the proposed method.

Figure 2 shows FRAM modelling of the process to detect symptoms.

Flight controllers monitor the telemetries of their assigned ISS operations. They notice unusual trends in individual telemetries, or detect anomalies through alerts raised when observed values exceed a threshold. After a symptom or anomaly is detected, specialists assess the impact and perform troubleshooting for each anomaly. Our motivation is to enable them to assess symptoms based on combinations of telemetries and to provide additional information for further assessment.

Figure 2 – FRAM modeling of symptom detection process

Our FRAM model of the systems related to the TCA-L pump is shown in Figure 3. We made four patterns of FRAM models based on the results of interviews with specialists.

Figure 3 – Functions related to possible causes in FRAM.

We compared the results of the models using Pugh concept selection, as shown in Table 3. RMSE was lowest for model 2, while the performance of early symptom detection was highest in models 3 and 4.

Table 3 – Results of Pugh Concept Selection

After discussing the performance of the models with specialists from several viewpoints, we selected model 4 for simulation, as it is important to detect anomalies earlier and with higher prediction accuracy. Simulations with a defined threshold were then performed, comparing thresholds of two, three, and four sigma. Four sigma was chosen because it gave a better balance in the number of alerts. (Table 2 of the paper lists the selected telemetries for each model.)
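The k-sigma threshold comparison can be pictured with the small sketch below; the residuals, injected anomalies, and thresholds are placeholders and do not reflect the actual ISS telemetry or the model used in the study.

```python
# Illustrative sketch of the threshold comparison described above: residuals
# between observed telemetry and the model prediction are flagged when they
# fall outside a k-sigma band. All data here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, size=1000)   # stand-in for observed - predicted
residuals[::200] += 6.0                       # inject a few anomalous samples

sigma = residuals.std()
for k in (2, 3, 4):
    alerts = np.flatnonzero(np.abs(residuals) > k * sigma)
    print(f"{k}-sigma threshold -> {alerts.size} alerts")
```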

Figure 4 – Simulation results

The results for model 4 are shown in Figure 4. Red points are values exceeding the threshold; based on the simulations, alerts could be released to flight controllers at these points.

Conclusions

This case study proposed a new method to provide additional explanatory information using FRAM and SpecTRM-RL. The proposed method was verified with an experiment on ISS systems. It enables systemic analyses to be carried out, overcoming the limitations of previous studies, which have had difficulty handling complex multiple factors. The experimental results indicated the effectiveness of the method. Although further experiments with other systems, and discussion with flight controllers and specialists, are required for practical use, the proposed method is expected to be applicable to several safety-critical systems in aerospace and other fields.

Interaction between drivers and automated vehicles – the case of driving in an overtaking scenario

Automated driving promises great advances in traffic safety, frequently on the assumption that human error is the main cause of accidents and that automation will therefore deliver a significant decrease in road accidents. However, this assumption is too simplistic and does not consider potential side effects and adaptations in the socio-technical system that traffic represents.

Thus, a differentiated analysis, including an understanding of the road system’s mechanisms of accident development and accident avoidance, is required to avoid adverse automation surprises, and such an analysis is currently lacking. This case study applied a Resilience Engineering approach, using the functional resonance analysis method (FRAM), to reveal these mechanisms in an overtaking scenario on a rural road and to compare the contributions of the human driver and potential automation, in order to derive system design recommendations. Finally, this serves to demonstrate how FRAM can be used for a systemic function allocation of the driving task between humans and automation.

Thus, an in-depth FRAM model was developed for both agents, based on knowledge elicitation from documents and on observations and interviews in a driving simulator, and was validated by a focus group of peers. Further, the performance variabilities were identified through structured interviews with human drivers and automation experts, as well as observations in the driving simulator. Then, the aggregation and propagation of variability were analysed, focusing on interaction and complexity in the system, using a semi-quantitative approach combined with a Space-Time/Agency framework.

Since it is not sufficient to know only the theoretical mechanisms of the overtaking process, the next step was to create a Work-as-Done (WAD) model, using observations and interviews from a driving simulator study, which serves to update and enhance the Work-as-Imagined (WAI) model into a more realistic overall model.

Here, a static driving simulator (see Figure 1) was used. The environment is simulated by three flat screens with 4K resolution covering the space from the left-side window to the right-side window of the car, which provides a 120° field of view to the front. Additionally, the rear-view mirror is displayed virtually at the top of the centre screen. The side mirrors are displayed on two small monitors placed to the left and right of the subject.

The driver, seated on a default automobile seat that is adjustable in height and longitudinal direction, has a steering wheel for lateral control that can be adjusted along the axis, as well as an accelerator and brake pedal for longitudinal control. The use of a turn signal and shoulder view to the rear are not possible. Behind the steering wheel is a combination display that shows the engine speed and the current speed of the vehicle. Further, the driving simulator is equipped with automatic transmission and sound, consisting of engine, environmental, and vehicle noises that are reproduced via two speakers placed next to the pedals. During a test drive, the room was darkened to increase the immersion for the driver.

 SILAB 6.0 of the Würzburg Institute for Traffic Sciences GmbH in Germany was used as the simulation software.

Figure 1 – Structure of the static driving simulator.

The information was used to build the FRAM model shown below in Figure 2.

Figure 2 – the FRAM model of the overtaking functions

Finally, design recommendations for managing performance variability were proposed in order to enhance system safety. The outcomes showed that the current automation strategy should focus on adaptive automation based on a human-automation collaboration, rather than full automation.

The study concluded that the FRAM analysis can support decision-makers in enhancing safety, enriched by the identification of non-linear and complex risks.

Publication

Grabbe, N., Gales, A., Höcher, M., & Bengler, K. (2022). Functional resonance analysis in an overtaking situation in road traffic: comparing the performance variability mechanisms between human and automation. Safety, 8(1), 3. https://doi.org/10.3390/safety8010003

The Formula 1 Pit Stop Test Case

In analysing the performance of complex sociotechnical systems, of particular interest is the inevitable and inherent variability that these systems exhibit, but can normally tolerate, in successfully operating in the real world. Knowing how that variability propagates and impacts the total function mix then allows an understanding of emergent behaviours. This interdependence, however, is not readily apparent from normal linear business process flow diagrams.

An alternative approach to exploring the operability of complex systems, that addresses these limitations, is the functional resonance analysis method (FRAM). This is a way of visualising a system’s behaviour, by defining it as an array of functions, with all the interactions and interdependencies that are needed for it to work successfully. Until now this methodology has mainly been employed as a qualitative mind map.

This case study describes a new development of the FRAM visualisation software that allows the quantification of the extent and effects of this functional variability. It then sets out to demonstrate its application in a practical, familiar test case. The example given is the complex sociotechnical system involved in a Formula 1 pit stop. This has shown the potential of the application and provided some interesting insights into the observed performances.

Figure 1 – The Work as Imagined (WAI) FRAM model

Insights from the Model

The spine of the process is a very smooth, well-rehearsed, coordinated and choreographed, essentially linear series of sequential actions by the four tyre-changing teams, which operates almost autonomously and only requires a car and fresh tyres to be available. The additional and critical functions that enable and develop the outcomes of the tyre teams are in the initial car reception phase and the final car release phase of the operation. Here it is crucial that the car stops exactly in position and that it is promptly and reliably elevated so that the tyres can be removed.

This criticality has been recognised by the provision of two extra mechanics to ensure the car’s stabilization, and two extra jackmen to provide resilience for an essential function.

Similarly, at the rear of the car, the time taken to lower the car and move the jacks out of the way shows up as a potentially crucial delay to release. But it is clear that the last two mechanics (the “gap spotter” and the “release” controller) have the most demanding functions (with multiple aspects); these come last and are probably crucial in determining the overall time taken.

Arrivals of other cars are completely outside of the control of the pit crews so that this variable is essentially random and needs to be accepted as a delay. The release process requires knowledge, indications, and signals that all the previous functions have been successfully achieved and that there is a clear gap available before the function can execute. Just in terms of conscious processing, this decision probably takes the most time to execute correctly and safely.

The consequences of getting it wrong add to the pressure on the decision maker. Putting a set of notional values into the model yields an overall time of around 2–3 s, which fits observed performances.
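As a back-of-envelope illustration of how such notional values combine (the numbers below are invented for illustration and are not the paper’s values), the four tyre teams can be treated as working in parallel, bounded by sequential reception and release phases:

```python
# Invented, notional function times (seconds) for illustration only.
reception_s = 0.3                       # car stops on its marks, jacks lift
tyre_teams_s = {"FL": 1.8, "FR": 1.9, "RL": 1.9, "RR": 1.8}   # work in parallel
lower_and_clear_s = 0.2                 # drop the car, move jacks clear
release_decision_s = 0.4                # gap spotter + release controller

# The four tyre teams run in parallel, so only the slowest one matters.
total_s = reception_s + max(tyre_teams_s.values()) + lower_and_clear_s + release_decision_s
print(f"Notional pit stop time: {total_s:.1f} s")   # ~2.8 s, in the 2-3 s range quoted above
```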

It is noticeable that in the Williams video referenced in the paper, the overall time taken is less than predicted by this study of the “as imagined” FRAM sequence of instantiations. So the video was examined in more detail to try to establish exactly how the teams carried out their different functions. What adaptations were made to be able to complete the tasks more quickly?

Figure 2 – The Work as Done FRAM model

The first thing that becomes apparent when the videos are examined closely is that, although the officially timed start of the process is from when the car has stopped at its marks, the pit crews anticipate the stop: the air guns are engaging the wheel nuts and the jacks are moving into position before the car stops.

This means that none of these functions is rate-determining in adding to the time; they effectively reduce the time by anticipating the start. In the WAD instantiation below (Figure 2) we have thus added an additional function for the car entering the box, active before the “official” start time. Similarly, at the rear of the car, the jacks are removed as soon as the tyres are on, and the release seems to happen simultaneously with the completion of wheel-nut tightening, another corner-cutting adaptation reflected in the changed aspect links. There does not seem to be a noticeable delay in the release of the car after the nut is tightened, which again means that the release function is anticipating the clearance checks. Again, this has a significant effect in further reducing the overall time taken.

When the Williams pit stop video is analysed more rigorously, we observe timings remarkably close to the WAD FRAM timings, which further supports our interpretation of the actual work as done. As it is a very competitive environment and seconds saved in pit stops can mean gaining or losing advantage, there is continuing pressure to find ways of further reducing these times.

One such initiative is rumoured to be the progressive automation of some of these critical functions, such as the release function, either for more speed or, more likely, for greater reliability and safety.

This is now a classic case of Rasmussen drift, where the operational safety boundaries are gradually tested and extended to gain competitive and efficiency advantages. Unfortunately, as these boundaries can never be precisely predicted in real environments, this often results in unfortunate but totally foreseeable (in hindsight) unsafe excursions, accidents, and casualties. In Formula 1, Ferrari were fined 50,000 euros by race officials for an unsafe release at the Bahrain Grand Prix in 2018, which resulted in an injury to the front jack man, who was not able to get out of the way in time. From the FRAM model, this was the result of pressuring the release mechanic to cut his decision time to such an extent that it became a reflex rather than a conscious confirmation of a safe state for release.
Publication

Optimising the Performance of Complex Sociotechnical Systems in High-Stress, High-Speed Environments: The Formula 1 Pit Stop Test Case. Applied Sciences, December 2021, 11(24), 11873. DOI: 10.3390/app112411873. Available from: https://www.researchgate.net/publication/357045761_Optimising_the_Performance_of_Complex_Sociotechnical_Systems_in_High-Stress_High-Speed_Environments_The_Formula_1_Pit_Stop_Test_Case [accessed Aug 21 2024].

Runway Incursions

Much has been written about the quick thinking and disciplined organisation that allowed the brave Japan Airlines crew to evacuate their passengers safely and live up to the exemplary safety record of aviation operations. But of course we tend to focus on the consequences (which could have been much worse!) and not to realise that the actions of those involved tend to be similar whether the outcome is a near miss or a disaster. As Shawn Wildey has just pointed out, “We need to do more about protecting runways… there are a lot of near misses (look at the snapshot below of just a year)… let’s not forget Tenerife.”

So, wanting to learn more, we got ChatGPT to build a quick FRAM to explore the issues; the result is shown below.

What this shows clearly is the complete reliance on one channel of communication to control landings, taxiing and take-offs for multiple simultaneous movements. The safety record is thus heavily dependent on the undoubted excellence of the Air Traffic Controllers, as they seem to constitute a single point of failure. (I would be relieved to be corrected if I have misrepresented the issue.) It seems both unsafe and unfair to rely totally on human oversight, however expert and professional, in an archetypal complex sociotechnical system.

A recent video (Av Safety investigation video – runway incursion | Civil Aviation Safety Authority (casa.gov.au)) thus concentrates on the “Human Factors” measures available to increase reliability and minimise pilot errors. Their recommendations are sensible, but do they address the real issues? Entreaties to recognise information overload, fatigue and confirmation bias are all relevant and a naturally understandable response, common to almost all large organisations with much invested in their existing systems. But perhaps more enlightened thinking, unafraid to challenge “the system”, would be more relevant to its complex sociotechnicality.

Perhaps it’s the “system” (st—–d?)

Looking at the Uberlingen incident for the Swiss Government (another prime candidate system needing a FRAM analysis), an automatic collision avoidance system was involved.

 This Traffic Collision Avoidance System (TCAS) is a safety net designed to prevent mid-air collisions between aircraft. Here’s a brief overview of how it works:

  1. Transponders: Aircraft equipped with TCAS are equipped with transponders, which are electronic devices that automatically transmit information about the aircraft, such as its identity, altitude, and position.
  2. Interrogation and Replies: TCAS operates by periodically sending out interrogations to nearby aircraft equipped with transponders. These interrogations are like electronic “questions” asking for information. Aircraft transponders, in turn, reply to these interrogations with their own information.
  3. Resolution Advisories (RAs): If TCAS detects a potential collision threat, it issues Resolution Advisories (RAs) to the flight crews of the involved aircraft. RAs provide guidance on what action the pilots should take to avoid a collision. There are two types of RAs: a Climb Advisory (CA), issued if a collision threat is detected and a climb is necessary to avoid it, indicating the required rate of climb; and a Descend Advisory (DA), issued if a descent is necessary, specifying the required rate of descent.
  4. Coordinated Manoeuvres: Both aircraft involved in a TCAS resolution advisory receive complementary RAs. For example, if one aircraft receives a Climb RA, the other will receive a Descend RA. This ensures that the aircraft move away from each other safely.
  5. Pilot Discretion: While TCAS provides advisories, it’s ultimately the responsibility of the flight crew to follow these advisories.

Pilots are trained to prioritize TCAS RAs over other air traffic control instructions when a conflict is detected. It’s important to note that TCAS is just one layer of the overall air traffic control and collision avoidance system. It works in conjunction with ground-based radar, air traffic control instructions, and other safety measures to ensure the safe and efficient movement of aircraft in controlled airspace. So a question that begs to be asked is: if the aircraft are telling each other where they are, surely some sort of geofencing, or automatic segregation and warning of potential safety-margin incursions, could apply on the ground as well as in the air (granted, some major electronics would be needed to distinguish signals from noise). But if not this, what? To maintain aviation’s pre-eminence in safety thinking, don’t we need to plug this glaring gap?
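To make the geofencing idea concrete, here is a purely hypothetical sketch: transponder-reported ground positions are checked against a runway protection zone, and a warning is raised if more than one aircraft is inside it at the same time. The zone geometry, coordinates, and callsigns are all invented for illustration; this is not a description of any existing system.

```python
# Hypothetical ground-geofencing sketch (not an existing system): warn when
# more than one aircraft is inside a runway protection zone at the same time.
from dataclasses import dataclass

@dataclass
class Aircraft:
    callsign: str
    x_m: float      # simple local ground coordinates, invented for illustration
    y_m: float

def in_zone(ac: Aircraft, x_range: tuple, y_range: tuple) -> bool:
    return x_range[0] <= ac.x_m <= x_range[1] and y_range[0] <= ac.y_m <= y_range[1]

runway_x, runway_y = (0.0, 3000.0), (-75.0, 75.0)   # notional runway strip plus margin
traffic = [Aircraft("AC001", 1200.0, 0.0), Aircraft("AC002", 950.0, 40.0)]

occupants = [ac.callsign for ac in traffic if in_zone(ac, runway_x, runway_y)]
if len(occupants) > 1:
    print("RUNWAY INCURSION WARNING:", ", ".join(occupants))
```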

(PDF) Runway Incursion incidents. Available from: https://www.researchgate.net/publication/377147564_Runway_Incursion_incidents [accessed Aug 21 2024].

The Arena Bombing: the Manchester Children’s Hospital’s Response

On the evening of the 22nd of May 2017, a terrorist detonated an improvised explosive device in the foyer of the Manchester Arena as concert goers, children and adults, emerged, killing 23 people (including the attacker). Paediatric Mass Casualty Incidents (MCI) are rare in the context of an individual clinician or institution, but children are often involved when MCIs occur. A paediatric MCI should provide an opportunity to explore optimal human and organisational performance, and to apply that learning to improve future patient outcomes. Resilience, defined as “the intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances so that it can sustain required operations, even after a major mishap or in the presence of continuous stress”, is an essential prerequisite of a Major Trauma Centre (MTC). An MTC is a complex socio-technical healthcare system designed to respond effectively to a myriad of clinical scenarios, within which healthcare staff work adaptively to provide patient care.

In the immediate aftermath of the Manchester Arena attack, the nearby paediatric MTC demonstrated both resilient elements and a series of adaptations to improve patient outcomes during the MCI. During the initial response to the attack, twenty-two children aged between eight and fifteen years and five parents presented with blast injuries to the paediatric MTC. One child died in the Paediatric Emergency Department (PED), and fourteen children were admitted, four going directly to the operating theatres and six to the Paediatric Intensive Care Unit (PICU). MCIs involving children are rare events. However, learning from such experiences is a fundamental element of resilience. A lack of in-depth learning after events severely hampers the capability to respond to future MCIs that may present to a UK MTC. Modelling is one way of learning, with a model being a formal system that can be used to express or represent the “objects and their relationships in the world” that are being investigated. The Functional Resonance Analytical Methodology (FRAM) facilitates the modelling of complex adaptive systems.

With confidence developed in the model, actual timings during the MI were compared with those produced by the model using expected timings for the functions. These expected Work As Imagined (WAI) findings were the Function Process Time (Tp), the time it took for a function to go from input to output; the WAI Function Output Lag Time (To), the time it took to move from one function ending to the next function starting; and the WAI Total Time of Functions (Tt), the total time for the functions in the system.
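A small worked sketch of these three measures is given below, assuming each function is logged with a start (input) and end (output) time in minutes; the function names and values are illustrative, not the paper’s data.

```python
# Illustrative calculation of Tp (function process time), To (output lag
# between consecutive functions) and Tt (total time), using invented timings.
functions = [
    {"name": "To triage",             "start": 0.0,  "end": 5.0},
    {"name": "To stabilise in Resus", "start": 7.0,  "end": 37.0},  # ~30 min, as cited below
    {"name": "To CT scan",            "start": 40.0, "end": 55.0},
]

tp = {f["name"]: f["end"] - f["start"] for f in functions}                      # Tp per function
to = [functions[i + 1]["start"] - functions[i]["end"]                           # To between functions
      for i in range(len(functions) - 1)]
tt = functions[-1]["end"] - functions[0]["start"]                               # Tt for the sequence

print("Tp:", tp, "To:", to, "Tt:", tt)
```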

These expected timings were constructed from discussions with subject matter experts, for example a discussion with a senior PED nurse about how many minutes it takes to triage a severely injured child. The exception was the function “To stabilise in Resus”, which was theoretically derived from a series of simulated resuscitations, published previously, suggesting an average resuscitation time of thirty minutes for trauma patients. At the time of the Arena attack the hospital did not have an electronic patient record, so the reliable Work As Done (WAD) data were taken from the actual timings of commencing CT scanning and the times of entering and leaving theatre recorded in the theatre software. Mean WAD Function Start Times and Function Process Times are presented. Table 2 shows the expected mean timings produced by the model of the MCI and the timings recorded during the MI for the first eight patients, three of whom went to theatre.

Publication


A Functional Resonance Analytical Methodology exploration of the essential functions of a paediatric major trauma centre responding to a mass casualty incident. February 2024. DOI: 10.21203/rs.3.rs-3937622/v1

The Deepwater Horizon Incident

“What does the collapse of sub-prime lending have in common with a broken jackscrew in an airliner’s tailplane? Or the oil spill disaster in the Gulf of Mexico with the burn-up of Space Shuttle Columbia? These were systems that drifted into failure.” (Dekker, 2011)

Traditionally, accident investigation approaches have been driven by the need to pin down exactly what went wrong. The answer is demanded by our insurance and legal processes, which need to establish who, or what, was to blame. People like Turner (1997) and Rasmussen (1997), however, came to the conclusion that much of the blame lay with the organisations that were supposed to be managing these situations safely (i.e., without accidents). Perrow (1984), on the other hand, theorised that in highly complex, tightly coupled, stiff systems, accidents were inevitable; indeed, they were to be expected and regarded as “normal”. He quoted the Three Mile Island nuclear accident (Elliot, 1980) as an example. Hopkins (1999) has articulated the problems and confusion inherent in this explanation (justification?) of such incidents, and further queried whether even Three Mile Island fitted the definition in practice (Hopkins, 2001). Many of the methods employed in the study of these accidents are focussed on finding the failures that caused the consequences observed, whether of components, individuals, or organisations.

More recent discussions (Hollnagel, Woods, Dekker) have highlighted that these failures perhaps represent extreme excursions in “normal” system behaviour and hence, as Perrow indicates, are “to be expected”. So the question of whether or not accidents are “normal” is relevant. Hence more recent approaches (Hollnagel, 2014) to trying to understand what happens in these situations have proposed that many accidents happen as a result of operating such systems in very much the same way as usual – i.e., normally.

What is now of interest as a research question is to determine what constitutes “normal” behaviour and why deviations from it are a problem. Variabilities in operational environments, personnel and conditions manifest themselves as a range of observed behaviours, with a (normal?) distribution of frequency of occurrence. Accidents, on this approach, would thus represent excursions into a small section of the tails of a normal distribution. This is almost full circle back to Rasmussen’s idea that, in real systems and operating environments, it is normal to expect such inadvertent straying over safe limits.

The case study uses FRAM, the Functional Resonance Analysis Method (Hollnagel, 2012), to examine the BP Macondo Well incident and to determine its applicability and effectiveness as a diagnostic tool. The FRAM analysis showed that there was indeed a range of conditions which were considered “normal” and acceptable in individual functions, and that their complex interdependencies could indeed explain the emergent accident conditions that were observed. It argued that if “normal” is understood as natural variability in operating environments, i.e., in its normal usage, the Macondo Well incident was indeed a normal accident.

The study also showed that the functions modelled corresponded to the barriers identified in the Investigating Commission’s bow tie diagrams.

This led to a further publication showing how to use FRAM to quantify predictions of barrier performance on demand more realistically.

Figure 1 – The FRAM Model showing the Instantiation for the procedure being operated

Publication

Slater, D. (2023). Was the Deepwater Horizon incident a “Normal” accident? Safety Science, 168, 106290. DOI: 10.1016/j.ssci.2023.106290

Bow Tie paper

Slater, D. and Hill, R. (2024). Building Nonlinear, Systemic Bow Ties, Using Functional Barriers. System Engineering. DOI: 10.20944/preprints202406.1433.v1

UK COVID Response: A Comprehensive Analysis


Scope and Objectives of the Study

Responding to outbreaks of new infectious diseases is a significant challenge in today’s interconnected global society. Since the start of the 21st century, we’ve encountered several pandemics declared by the World Health Organization (WHO), including SARS (2002/3), Swine Flu (2009), Polio (2014), Ebola (2014), MERS (2015), Zika (2016), Kivu Ebola (2018), and most recently, COVID-19 (2019). These pandemics have highlighted the difficulties and complexities of responding effectively, exacerbated by the rapid spread of infections—sometimes reaching global levels in just 72 hours (American Assoc. 2014)—and the unforeseen and unique challenges they present, leading to varying degrees of medical, social, and economic crises.

The spread and impact of these pandemics are the result of intricate interactions between disease vectors and societies, along with the type, timing, and effectiveness of societal responses. While sound epidemiological modeling based on previous outbreaks is crucial, the complex nature of these interactions often leads to unforeseen developments that predetermined models cannot always predict or manage effectively.

This project aimed to document and describe the development and deployment of pandemic response and management strategies during the UK’s response to COVID-19. The goal was to identify lessons learned and build resilience for future pandemics.

Using Hollnagel’s Functional Resonance Analysis Method (FRAM), the project sought to capture precisely the reality of the crisis as it unfolded. Given that the pandemic was still ongoing during the study, this approach allowed for a deeper understanding of what worked well and what didn’t, with the aim of improving future performance by focusing on the actions taken rather than on blame. The overall FRAM model used is shown below.

Key Outcomes and Conclusions

The UK’s experience during the COVID-19 pandemic offers several critical lessons. The pandemic underscored the importance of preparation, early intervention, clear communication, collaboration, equity, and the use of science to guide decision-making. This project explored these key issues:

  • Adequate preparation and early intervention
  • Legitimate and truthful use of scientific evidence
  • The basis and quality of decisions made
  • Perceived equity and public trust
  • Clear communication of messages

The study identified an inevitable progression of impact due to these factors. A lack of understanding and action, combined with political concerns overshadowing public safety, led to overcompensation and mismanagement. Notably, the high death rates in Italy, Britain, and the USA were heavily influenced by the failure to protect the elderly in care and nursing homes.

The paper delves into these issues to better understand their escalation and offers recommendations to avoid similar failures in the future. However, it remains unclear whether these lessons have been fully understood or whether the necessary changes will be implemented.

Recommendations for Future Pandemic Response

  1. Reevaluate Government Structures: Reconsider the design, effectiveness, and interactions of traditional government structures, particularly within the NHS.
  2. Rethink the Role of Special Advisers: The UK Government should reassess the status, roles, and responsibilities of Special Advisers in managing independent advice to ministers.
  3. Clarify the Use of Truth: Governments need to distinguish between “objective” and “convenient” truths in decision-making and communication.
  4. Accountability in Decision-Making: Decision-makers must take responsibility for following or interpreting published advice.
  5. Provide Unbiased Information: The public deserves the best available information and reasoning behind decisions, free from polarized opinions.
  6. Address Uncertainty and Complexity: Governments should openly acknowledge and communicate the inherent uncertainty, ambiguity, and complexity of difficult decisions.
  7. Implement a Red Teaming Function: A formal red teaming function should be required in planning and response organizations to challenge assumptions and strategies.
  8. Foster a Culture of Independent Thinking: Encourage a culture that values challenge and enlightened, independent thinking.
  9. Adopt a “Military” Mindset: In pandemics, governments should consider adopting a mindset akin to wartime strategies, moving beyond conventional approaches.
  10. Mandate Inclusivity and Competence: Ensure inclusivity, acceptability, and competence in crisis management, potentially through a “war cabinet” approach.

Publications

  • A Systems Analysis of the COVID-19 Pandemic Response in the United Kingdom – Part 1: The Overall Context (Safety Science, October 2021)
  • A Systems Analysis of the UK COVID-19 Pandemic Response: Part 2 – Work as Imagined vs. Work as Done (Safety Science, October 2021)
  • The UK’s Response to the COVID-19 Pandemic, Part 3 – Lessons Learned (Medical Research Archives, July 2023)

These publications offer an in-depth analysis of the UK’s COVID-19 response, providing valuable insights for improving future pandemic preparedness and management.

The rest of the case studies will be developed like this.