On the International Space Station (ISS), multiple systems operate to maintain the environment for astronauts, and flight controllers monitor the status of these systems 24 hours a day, 365 days a year. Although these systems are highly reliable, anomalies can still occur. When an anomaly is detected, flight controllers are expected to investigate trends in the telemetry and assess the impact on operations.
Experienced flight controllers can detect symptoms of an anomaly from unusual combinations of telemetry values ("funny data"). However, it is generally difficult to identify such combinations systematically because their number can be huge: even 30 binary telemetry parameters, each taking TRUE or FALSE, already yield 2^30 (more than one billion) combinations. To address this issue, machine learning-based models are expected to support anomaly prediction from combinations of telemetry parameters without exhaustively exploring this huge state space.
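The growth of the state space can be checked with a few lines of Python. This is only an arithmetic illustration: the function name `num_joint_states` is hypothetical, and the three-level discretization is an assumption added for comparison, not something the method prescribes.

```python
from itertools import product


def num_joint_states(n_params: int, values_per_param: int = 2) -> int:
    """Count the joint states of n parameters that each take the same number of values."""
    return values_per_param ** n_params


# 30 binary (TRUE/FALSE) telemetry parameters, as in the text above.
binary_30 = num_joint_states(30)
print(binary_30)  # 1073741824

# Hypothetical: discretizing each parameter into three levels instead of two.
ternary_30 = num_joint_states(30, values_per_param=3)
print(ternary_30)  # 205891132094649

# Explicit enumeration only stays manageable for a handful of parameters.
four_params = list(product([False, True], repeat=4))
print(len(four_params))  # 16
```

Each additional parameter multiplies the count, which is why reviewing combinations by hand does not scale.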
Several researchers have proposed automatic anomaly detection methods, and machine learning-based methods are widely used for this purpose.
These methods are effective for a limited number of telemetry parameters with known anomaly events. However, although they achieve high accuracy in anomaly detection, they lack explainability for operators. To apply automatic anomaly symptom detection methods to ISS operations, flight controllers must be given the rationale for each prediction, because they cannot take action without justification.
This case study describes the symptom detection process in ISS operations and designs an automatic method that detects symptoms of anomalies and provides additional information explaining the reason for each detection. It presents a systemic symptom detection method that combines the Functional Resonance Analysis Method (FRAM) and the Specification Tools and Requirements Methodology-Requirements Language (SpecTRM-RL) with a machine learning-based anomaly detection model. The system uses internationally patented technology (Nomoto et al., 2020). Figure 1 shows an overview of the proposed method; details are given in the following subsections.
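SpecTRM-RL commonly expresses conditions as AND/OR tables: the cells within a column are ANDed, the columns are ORed, and a cell may require a condition to be true, false, or "don't care" (written here as `*`). The Python sketch below evaluates such a table and turns the satisfied column into a readable rationale, which is one way a detection could carry its own explanation. The condition names (`ml_score_exceeds_threshold`, `pump_speed_low`, `outlet_pressure_high`) and helper names are hypothetical; this is not the patented implementation.

```python
from typing import Dict, List, Optional

# Each row is a condition; each column is one alternative.
# Cell values: "T" (must be true), "F" (must be false), "*" (don't care).
AndOrTable = Dict[str, List[str]]


def satisfied_column(table: AndOrTable, state: Dict[str, bool]) -> Optional[int]:
    """Return the index of the first column whose cells all match `state`, else None."""
    n_cols = len(next(iter(table.values())))
    for col in range(n_cols):
        if all(
            cells[col] == "*" or state[cond] == (cells[col] == "T")
            for cond, cells in table.items()
        ):
            return col
    return None


def explain(table: AndOrTable, col: int) -> List[str]:
    """List the conditions that made column `col` true, skipping don't-care cells."""
    return [
        f"{cond} is {'TRUE' if cells[col] == 'T' else 'FALSE'}"
        for cond, cells in table.items()
        if cells[col] != "*"
    ]


# Hypothetical alert condition: two alternative combinations of telemetry-derived facts.
symptom_alert: AndOrTable = {
    "ml_score_exceeds_threshold": ["T", "T"],
    "pump_speed_low": ["T", "*"],
    "outlet_pressure_high": ["*", "T"],
}

state = {
    "ml_score_exceeds_threshold": True,
    "pump_speed_low": False,
    "outlet_pressure_high": True,
}

col = satisfied_column(symptom_alert, state)
if col is not None:
    print("Alert, because:", "; ".join(explain(symptom_alert, col)))
```

Here the second column fires, so the alert is justified by the ML score together with the high outlet pressure, while the pump speed is reported as irrelevant to this particular detection.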
Figure 2 shows the FRAM model of the symptom detection process.
Flight controllers monitor the telemetry of their assigned ISS operations. They identify unusual trends in individual telemetry parameters, or detect anomalies through per-parameter alerts raised when an observed value exceeds its threshold. After a symptom or anomaly is detected, specialists assess its impact and perform troubleshooting. Our aim is to enable them to assess symptoms based on combinations of telemetry parameters and to provide additional information for further assessment.
Figure 2 – FRAM modeling of symptom detection process
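FRAM describes each function by six aspects (Input, Output, Precondition, Resource, Control, Time) and couples functions where the output of one becomes an aspect of another. The Python sketch below represents the monitoring, detection, assessment, and troubleshooting process described above in that form. The function names paraphrase the text, but the aspect values and the `FramFunction`/`couplings` names are hypothetical and are not read from Figure 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# The six FRAM aspects of a function.
ASPECTS = ("input", "output", "precondition", "resource", "control", "time")


@dataclass
class FramFunction:
    name: str
    aspects: Dict[str, Set[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject aspect names that are not one of the six FRAM aspects.
        unknown = set(self.aspects) - set(ASPECTS)
        if unknown:
            raise ValueError(f"unknown FRAM aspects: {sorted(unknown)}")

    def get(self, aspect: str) -> Set[str]:
        return self.aspects.get(aspect, set())


def couplings(functions: List[FramFunction]) -> List[Tuple[str, str, str, str]]:
    """Return (upstream, output, downstream, aspect) for every output that feeds another function."""
    links = []
    for up in functions:
        for out in sorted(up.get("output")):
            for down in functions:
                if down is up:
                    continue
                for aspect in ASPECTS:
                    if aspect != "output" and out in down.get(aspect):
                        links.append((up.name, out, down.name, aspect))
    return links


# Hypothetical aspect values for the process described in the text.
process = [
    FramFunction("Monitor telemetry", {"input": {"telemetry"}, "output": {"trend or alert"},
                                       "resource": {"flight controller"}}),
    FramFunction("Detect symptom or anomaly", {"input": {"trend or alert"},
                                               "control": {"threshold"},
                                               "output": {"detected anomaly"}}),
    FramFunction("Assess impact", {"input": {"detected anomaly"}, "resource": {"specialist"},
                                   "output": {"impact assessment"}}),
    FramFunction("Troubleshoot", {"input": {"impact assessment"}, "output": {"recovery action"}}),
]

for link in couplings(process):
    print(link)
```

In a real FRAM model an output can also act as a Control or Precondition of another function; the coupling search above checks every non-output aspect for that reason.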
Our FRAM model of the systems related to the TCA-L pump is shown in Figure 3. We built four variants of the FRAM model based on interviews with specialists.
Figure 3 – Functions related to possible causes in FRAM.
We compared the models using Pugh concept selection, as shown in Table 3. Model 2 had the lowest RMSE, whereas models 3 and 4 performed well in early detection of anomaly symptoms.
Table 3 – Results of Pugh Concept Selection
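Pugh concept selection rates each alternative against a reference concept (the datum) on every criterion as better (+1), same (0), or worse (-1), and compares the totals, optionally weighted. A minimal Python sketch follows; the criteria echo the RMSE and early-detection comparison above, but the concept labels, ratings, and weights are placeholders, not the values reported in Table 3.

```python
from typing import Dict

# concept -> criterion -> rating relative to the datum concept:
# +1 better, 0 same, -1 worse.
Ratings = Dict[str, Dict[str, int]]


def pugh_scores(ratings: Ratings, weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of +1/0/-1 ratings for each concept."""
    scores: Dict[str, float] = {}
    for concept, per_criterion in ratings.items():
        for criterion, value in per_criterion.items():
            if value not in (-1, 0, 1):
                raise ValueError(f"{concept}/{criterion}: rating must be -1, 0 or +1")
        scores[concept] = sum(weights[c] * v for c, v in per_criterion.items())
    return scores


# Placeholder ratings and weights for illustration only (NOT the values in Table 3).
weights = {"prediction_accuracy": 1.0, "early_detection": 2.0}
ratings = {
    "concept_A": {"prediction_accuracy": 0, "early_detection": 0},  # datum
    "concept_B": {"prediction_accuracy": 1, "early_detection": -1},
    "concept_C": {"prediction_accuracy": 0, "early_detection": 1},
}

scores = pugh_scores(ratings, weights)
best = max(scores, key=scores.get)
print(scores, best)
```

With early detection weighted more heavily, a concept that is slightly worse on accuracy but better at early detection can still come out on top, which mirrors the kind of trade-off discussed with the specialists.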
After discussing the performance of the models with specialists from several perspectives, we selected model 4 for simulation, because it is important to detect anomalies early with high prediction accuracy. We then ran simulations with defined thresholds, comparing the results for two-, three-, and four-sigma thresholds. Four-sigma was chosen because it gave a better balance in the number of alerts than the other thresholds.
Table 2 – Selected telemetries for each model
Figure 4 – Simulation results
Figure 4 shows the results for model 4. Red points are values exceeding the threshold; based on the simulations, alerts can be released to flight controllers at these points.
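The two-, three-, and four-sigma comparison and the red points in Figure 4 can be sketched as follows, assuming (this is not stated in the text) that the threshold is a band of k standard deviations around the mean of model residuals (observed minus predicted) taken from a nominal period. The residuals below are synthetic, and all function names are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def sigma_band(nominal: Sequence[float], k: float) -> Tuple[float, float]:
    """Return (mean - k*sd, mean + k*sd) estimated from nominal residuals."""
    mu = statistics.fmean(nominal)
    sd = statistics.stdev(nominal)
    return mu - k * sd, mu + k * sd


def flag_exceedances(residuals: Sequence[float], band: Tuple[float, float]) -> List[int]:
    """Indices of residuals outside the band (the 'red points')."""
    low, high = band
    return [i for i, r in enumerate(residuals) if r < low or r > high]


def alerts_per_k(nominal: Sequence[float], monitored: Sequence[float],
                 ks: Sequence[float] = (2.0, 3.0, 4.0)) -> Dict[float, int]:
    """Number of alerts each k-sigma threshold would raise on the monitored residuals."""
    return {k: len(flag_exceedances(monitored, sigma_band(nominal, k))) for k in ks}


# Synthetic residuals; not ISS telemetry.
rng = random.Random(0)
nominal = [rng.gauss(0.0, 1.0) for _ in range(1000)]
monitored = [rng.gauss(0.0, 1.0) for _ in range(200)]
monitored[150] = 10.0  # injected anomaly-like spike

counts = alerts_per_k(nominal, monitored)
red_points = flag_exceedances(monitored, sigma_band(nominal, 4.0))
print(counts)
print(red_points)
```

A wider band raises fewer alerts but may flag a developing symptom later, which is the trade-off behind choosing the threshold by the balance of alert counts.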
Conclusions
This case study has proposed a new method that provides additional explanatory information using FRAM and SpecTRM-RL. The method was verified in an experiment on ISS systems. It enables systemic analyses, overcoming a limitation of previous studies, which had difficulty handling multiple complex factors. The experimental results suggest that the method is effective. Although further experiments with other systems and discussions with flight controllers and specialists are required for practical use, the proposed method is expected to be applicable to other safety-critical systems in aerospace and other fields.