“Programming is a human task, and programmers make mistakes; an error rate in writing software code of 10 errors per thousand lines of code is considered good, 1 error per thousand lines is rarely if ever achieved.” – Harold Thimbleby et al.
The current public inquiry into the causes and implications of the failures of the Post Office’s Horizon software has served to bring to the fore an issue which has troubled software and safety engineers for a long time. The issue has been outlined1 in a think piece which asks why we have not managed to assure software safety more effectively to date. One of the main problems seems to be the lack of a universally acceptable and accepted method of demonstrating safety to designers and users alike. This has resulted in a reliance on a catalogue of qualitative assurances: the argument that, given the number of precautions and tests involved, the system must be safe. But in reality we are all aware that this is more hope than confidence. Software is becoming more capable, but also more complex, all the time. We have a real problem assuring ourselves that the code does exactly what it says ‘on the tin’, no more and no less. With more conventional engineering systems, risk assessments and safety cases would be made by analysing and predicting the reliability and security of the system from detailed engineering process flow or wiring diagrams.
Unfortunately, software systems are not built that way, and the necessary detailed documentation is almost impossible to construct, or to find. This is because they are predominantly built in an “agile” way, with groups and teams progressing through sprints and scrums, adding layer after layer of developing code, one on top of another (like papier mâché?), to form “the package” (essentially a black box?). So the only way to demonstrate reliability, security and safety in the intended application is to test, test, test in development and to monitor continuously in use. And in use we know that errors and bugs are inevitable, common, frequent and “normal” (Perrow)! Thus we accept this reality and hope it is acceptable?
So how can we develop a way of producing the realistic system “models” that we need in order to probe systematically for performance in operation? Many attempts have been made using conventional approaches to detail the hard wiring diagrams of what is happening (e.g. Model-Based Systems Engineering, MBSE) so that established quantitative methodologies such as fault and event trees, Probabilistic Risk (or Reliability) Analysis and HAZOPs can be carried out. The problem is the resource intensity and the detailed databases needed, together with the abovementioned lack of definitive “wiring diagrams” for integrated software packages.
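To make the contrast concrete, the sketch below shows the kind of calculation a conventional fault-tree analysis delivers once a system breakdown and failure data exist. The gate structure, event names and probabilities are invented for illustration only; the point is that such numbers presuppose exactly the definitive “wiring diagram” and failure database that integrated software packages lack.

```python
# Minimal fault-tree sketch: top-event probability from basic-event
# probabilities via AND/OR gates. Event names and probabilities are
# purely illustrative, not taken from any real Horizon analysis.

def p_or(*probs):
    """Probability that at least one of several independent basic events occurs."""
    p = 1.0
    for q in probs:
        p *= (1.0 - q)
    return 1.0 - p

def p_and(*probs):
    """Probability that all independent basic events occur together."""
    p = 1.0
    for q in probs:
        p *= q
    return p

# Hypothetical basic events for a counter transaction failing to balance
p_network_drop  = 1e-3   # message lost between counter and branch server
p_retry_bug     = 5e-4   # duplicated entry on automatic retry
p_operator_slip = 2e-3   # keying error not caught by the interface

# Top event: discrepancy recorded against the branch account
p_discrepancy = p_or(p_and(p_network_drop, p_retry_bug), p_operator_slip)
print(f"P(discrepancy per transaction) ~ {p_discrepancy:.2e}")
```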
Thus, in an increasingly complex world, there is a real and urgent need for methodologies that enable engineers to model complex socio-technical systems, which now seem to encompass the majority of systems in use today. This is of course exacerbated by the increasing involvement of, and augmentation by, “black box” AI contributions. We need methodologies that give the analyst insight into how these complex systems behave. A group of safety system professionals in the Safety-Critical Systems Club are actively concerned and involved in finding better, more responsible and transparent ways of assuring the safety of these black boxes, and of demonstrating that they do indeed ‘do what it says on the tin’, no more, no less!
This case study looks at an approach developed to model systems as sets of interactive, interdependent “functions”, abstracted from agent or component details (FRAM; Hollnagel, 2020). This has now been developed to the point where it can take the basic data and structures from the current component-focussed systems engineering “models” and pull them together into dynamic models (as opposed to static, fixed representations such as System-Theoretic Process Analysis diagrams or Accimaps), from which analysts can discern how systems really work in practice and predict the emergent behaviours characteristic of complex systems. It can now provide the numbers and the quantitative approach that model-based systems engineering applications demand. Furthermore, as the methodology merely builds the system “visualisation”, or FRAM model, it still needs the safety professional to analyse the model to discern the behaviours, both expected and emergent.
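By way of illustration, a minimal sketch of the underlying idea is given below: each FRAM function carries the six aspects (Input, Output, Precondition, Resource, Time, Control), and potential couplings arise wherever the Output of one function supplies an aspect of another. This is a simplified, assumed representation used only for this case study; it is not the file format or API of the FRAM Model Visualiser.

```python
# Illustrative FRAM building blocks: a function with its six aspects, and a
# helper that enumerates potential couplings between functions.

from dataclasses import dataclass, field

ASPECTS = ("input", "output", "precondition", "resource", "time", "control")

@dataclass
class Function:
    name: str
    input: set = field(default_factory=set)
    output: set = field(default_factory=set)
    precondition: set = field(default_factory=set)
    resource: set = field(default_factory=set)
    time: set = field(default_factory=set)
    control: set = field(default_factory=set)

def couplings(functions):
    """Yield (upstream, downstream, aspect, item) wherever one function's
    Output supplies another function's Input, Precondition, Resource,
    Time or Control."""
    for up in functions:
        for down in functions:
            if up is down:
                continue
            for aspect in ASPECTS:
                if aspect == "output":
                    continue
                for item in up.output & getattr(down, aspect):
                    yield (up.name, down.name, aspect, item)
```

Resonance analysis then asks how variability in one function’s output propagates through these couplings into the functions downstream of it.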
The first step is to define the system under consideration and then compile a list of the functions needed to deliver the processes involved.
These functions encompass the entire range of activities involved in the Post Office transaction process, from initiating a transaction to completing it, including all the critical steps for security, accounting, and operational management in between. They provide a comprehensive framework for the FRAM model, allowing for a detailed analysis of the system’s functionalities and interdependencies. Using the FRAM Model Visualiser (FMV), the following FRAM model was built (Figure 1, below).
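As a purely illustrative example (the function names, aspects and couplings below are invented for this sketch and are not the validated model shown in Figure 1), the compiled function list can be written down and its potential couplings enumerated before the model is drawn in the FMV:

```python
# Hypothetical function list for a Horizon counter transaction, written as
# plain dictionaries keyed by FRAM aspects. A real model would be derived
# from the actual system description and expert review.

functions = [
    {"name": "Initiate customer transaction",
     "output": {"transaction request"}},
    {"name": "Record transaction at counter",
     "input": {"transaction request"},
     "resource": {"Horizon terminal"},
     "output": {"counter record"}},
    {"name": "Transmit record to branch account",
     "input": {"counter record"},
     "precondition": {"network connection available"},
     "output": {"branch ledger entry"}},
    {"name": "Reconcile branch account",
     "input": {"branch ledger entry"},
     "time": {"end of trading period"},
     "output": {"balanced account", "reported discrepancy"}},
]

# Enumerate potential couplings: an output of one function feeding a
# non-output aspect of another.
for up in functions:
    for down in functions:
        if up is down:
            continue
        for aspect in ("input", "precondition", "resource", "time", "control"):
            for item in up.get("output", set()) & down.get(aspect, set()):
                print(f"{up['name']} --[{item}]--> {down['name']} ({aspect})")
```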
This case study outlines the steps for creating an initial FRAM model of a typical software solution to the Post Office Horizon counter operations, assisted by ChatGPT 4.0. It reports on the initial attempts to develop and validate a better way to model and assure the performance of modern software packages. It sets out to address systematically the issues on which consensus has proved difficult to reach in analysing and assuring the performance of safety-critical software systems. It thus looks at the potential for applying more advanced methods of modelling and analysing these systems.
- The first approach to be investigated is the use of the Functional Resonance Analysis Method (FRAM) to build the system visualisations, or models.
- Secondly, the feasibility of using LLMs to produce initial outline system models, which can then be used to examine in detail the behaviours possible in these complex systems, is explored; a minimal prompt sketch follows this list.
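For the second strand, the sketch below illustrates one way an LLM could be asked to draft a first-pass function list for an analyst to review. The prompt wording, the JSON schema and the call_llm() stub are assumptions made for this sketch; they are not the actual prompts used with ChatGPT 4.0 in the case study.

```python
# Sketch of LLM-assisted drafting of a FRAM function list. call_llm() is a
# placeholder for whichever chat-capable model API is available.

import json

PROMPT_TEMPLATE = """You are assisting a safety analyst building a FRAM model.
System under consideration: {system}
List the functions needed to deliver this process. Return JSON only: a list of
objects with the keys "name", "input", "output", "precondition", "resource",
"time" and "control", where each aspect is a list of short noun phrases."""

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. a chat-completion request)."""
    raise NotImplementedError("wire this up to the LLM of your choice")

def draft_fram_functions(system_description: str) -> list[dict]:
    """Ask the model for a first-pass function list and parse the reply."""
    reply = call_llm(PROMPT_TEMPLATE.format(system=system_description))
    functions = json.loads(reply)
    # The draft is only a starting point: the analyst must review, merge and
    # correct the functions before building any model from them.
    return functions
```

Whatever model is used, the output is only a draft: the safety professional remains responsible for checking, merging and completing the functions before any FRAM analysis is attempted.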
Publication
Slater, D., How do we make the case for “Safe” software and AI systems? – the Horizon Example. Published by the Safety-Critical Systems Club. All Rights Reserved.