I’m currently developing a new method for automated safety analysis of complex systems.
This method is called Causality Checking and allows for the automated generation of fault trees out of system or software architectures in SysML or UML. This post gives an overview of how Causality Checking works.
CausCheck : Causality Checking for Complex System Models
With the increasing complexity of modern safety-critical systems, the need for model based engineering methods that both help in architecting such systems and to asses their safety and correctness becomes increasingly obvious. Due to the size of the systems, traditional techniques like reviews and testing, on the one hand, and manual fault tree analysis or failure mode and effect analysis, on the other hand, can only be applied to limited parts of the system. The main reason for this limitation lies in the vast amount of time and resources that is consumed by manually executing those techniques. In order to be able to asses the correctness and safety of these systems in a comprehensive manner automated or, at least, computer-aided techniques are needed.
Model Checking is an established technique for the automated analysis of system properties. If a model of the system and a formalized property are given to the model checker, it automatically checks whether it can find property violations. In case a safety property is violated, the model checker returns a counterexample, which consists of a system execution trace leading to the property violation. In recent work LeiLeu11, LeiLeu11a we have presented the QuantUM approach which allows for automatic translation of system and software architecture models in UML to the input language of the probabilistic model checker PRISM.
The automatic translation of the UML model to PRISM saves time and resources and prevents errors that where frequently introduced in the manual translation process that was previously used. The remaining challenge is the identification of the causes of the property violation and representing this information in a way that it is interpretable on the level of the UML model. The counterexamples that are computed by the model checker help in retracing the system executions leading to the property violation, but they can only be interpreted at the level of the analysis model. While the visualization of the graph structure of a probabilistic counterexample helps to analyze the counterexamples, it is still difficult to compare the thousands of paths in the counterexample with each other and to discern causal factors during fault analysis.
From Counterexamples via Causality to Fault Trees
In order to lift the probabilistic counterexamples generated by the model checker to the level of the UML model, we propose an automatic approach in KunLL11b that computes causality relationships based on a complete set of probabilistic counterexamples and visualizes the computed causality relationships as fault trees. Fault trees are an industrial standard method to document graphically which combination of events can cause a system hazard. The justification for the causalities determined by our method are based on an adoption of the Structural Equation Model of Halpern and Pearl. We illustrate how to use this model in the analysis of computing systems and extend it to account for event orderings as causal factors. We present an over-approximating implementation of the causality tests derived from the extended model.
We demonstrate that our approach improves and facilitates the analysis of safety critical systems. The resulting fault trees are significantly smaller and hence easier to understand than the probabilistic stochastic counterexample, but still contain all information needed to discern the causes for the occurrence of a hazard.
On-The-Fly Causality Checking
The causality computation for probabilistic counterexamples and the mapping of the causality relationships to fault trees helps to understand how the failure of the systems was caused. Experiments in BeeKLLP12 indicate that the computation of the probabilistic counterexamples, on which the causality computation method relies, accounts for the majority of run-time and memory consumption needed for the causality computation.
The main reason for this is that the probability for each execution trace of the counterexample needs to be computed. While the probability of a system fault is of interest, the information which events cause the system failure is more important to the user. Consequently, we extend QuantUM with a translation from UML models to Promela, the input language of the qualitative model checker SPIN, and adapt our causality method to work on counterexamples that were generated using SPIN. This qualitative method scales much better since no probabilities have to be computed. It is, however, still necessary to enumerate and store all counterexample execution traces and all good execution traces in order to compute the causality relationships. We address this issue in LeiLeu12a and LeiLeu12 by extending the causality model to make it applicable to concurrent system models that are specified by transition systems. Furthermore, we propose a causality checking algorithm which can be integrated into a depth-first search or breadth-first search algorithm that is used for the state space exploration during model checking.
Pingback: Causality Checking at Microsoft Research
Pingback: Causality Checking for Complex System Models (Talk at VMCAI 2013) - Florian Leitner-Fischer