1.
a) Why is it useful to think of safety as a relative and not an absolute concept?
[4 marks]
There are many reasons why safety should be considered as a relative concept. For example, how would you demonstrate that a system was absolutely safe? Even if it continued to operate without any threat to safety for several years, there may be no guarantee that it will not fail in the next 24 hours. Changes in the environment, hardware failures and untested modes in software systems can all combine to expose systems to hazards that did not arise during a previous period of `safe' operation. Hence continued safe operation may paradoxically increase our level of confidence in a system but may also lead to different concerns about the longer-term safety of an application.
b) Why does Reason's distinction between latent and active failures have important implications for the development of safety-critical software?
[6 marks]
The consequences of active errors are felt immediately. The effects of a latent failure may lie dormant for some time. For example, a decision to reduce the staffing in a safety-critical application can create the latent conditions for failure. It does not directly trigger an accident in the same way that a tired or over-worked employee might issue a hazardous command.
Software bugs often exist as latent failures. The system can function and operate successfully provided that the relevant instructions are not executed. However, a change in the environment or operating characteristics can expose the underlying vulnerability. Inadequate development practices and funding constraints can be seen as the precursors that, in turn, are likely to lead to latent software failures.
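The following is a minimal, hypothetical C sketch of this point (it is not drawn from any of the course case studies; the sensor-scaling function and its calibration values are invented). The defect compiles and runs without incident for as long as the environment keeps supplying the expected inputs; only a change in operating conditions exposes it:

/* Hypothetical sketch of a latent software failure. */
#include <stdio.h>

/* Converts a raw sensor reading into a percentage of full scale.
 * Latent defect: the code assumes full_scale is never zero. The
 * assumption holds throughout a long period of `safe' operation,
 * until a re-calibration error or hardware fault delivers
 * full_scale == 0 and the division becomes undefined behaviour. */
int scale_reading(int raw, int full_scale)
{
    return (100 * raw) / full_scale;   /* no guard: the latent failure */
}

int main(void)
{
    printf("%d\n", scale_reading(512, 1024));   /* normal operation: prints 50 */
    /* scale_reading(512, 0);  changed environment: division by zero */
    return 0;
}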
c) Briefly explain at least two ways in which confidential incident reporting systems can be used to support the development of safety-critical computer systems.
[10 marks]
Incident reporting systems can support the development of safety-critical computer systems in a number of ways:
2.
``Section 5. Specify Non-Functional Requirements.
Non functional requirements are constraints on the system design. They may arise from user requirements, technical disciplines or the external environment. They are often "ilities", can be divided into product or support constraints and include the following areas:
UK Ministry of Defence, Smart Procurement Initiative (SPI)
a) Provide a brief definition for each of the non-functional requirements mentioned in the Smart Procurement Initiative (SPI) document and explain how they relate to Laprie's concept of dependability.
[4 marks]
Laprie views dependability as a generic concept that embraces some elements of the various non-functional requirements identified by the UK MOD. He views these as attributes of the more general concept. There are various alternative definitions; however, each can be summarised as follows:
b) Briefly explain why the MOD SPI document includes the following sentence: "Blanket application of individual non-functional requirements will be unnecessarily costly and should be avoided."
[4 marks]
Question 1 has already argued that safety should be considered a relative rather than an absolute concept. Hence any attempt to achieve `blanket' safety is likely to waste finite development resources. The previous answer has also alluded to the tensions that exist between what Laprie views as the components of dependability. Hence, increasing security may have an adverse effect on safety and vice versa. There are also a number of paradoxes that reduce the benefits to be obtained from over-investment in certain non-functional requirements. For instance, the more reliable a system becomes, the longer the interval between failures. In consequence, staff will need frequent retraining so that they are ready to respond to eventual failures. This training can incur considerable additional expense, and it can be difficult to gather evidence to justify the additional marginal investment in improved reliability. It is therefore important that developers consider the trade-offs that exist when focussing on particular non-functional requirements.
c) With an example drawn from any of the case studies that have been introduced in this course, explain why it is so difficult to maintain an adequate safety case for a safety-critical computer system.
[12 marks]
[CWJ: there are many different solutions to this question. Mail me your answer and I'll provide some feedback.]
The key points are that safety cases build up an argument that any risks associated with an application are as low as reasonably practicable. These arguments can be based on assumptions about the relationship between an application and other systems. They can also be based upon assumptions about the environment that a system will operate in. These conditions can be difficult to control and hence can force revisions to a safety case in order to demonstrate that the system continues to be safe within revised operating conditions. Changes in operating practices or the functional demands on a system can introduce the same need for revised argumentation.
In cases where a revised argument cannot be made, it may be necessary to introduce design changes. A continuing problem here is that it is difficult to ensure the modularity of a safety case. Hence any change within the design of a system sub-component will often force renewed inspection of the argumentation associated with the safety cases of other related components. If the changes cannot be accommodated within the argument then redesign may be necessary.
Other answers might focus more on the difficulty of judging whether a particular safety case will satisfy independent assessors.
3.
a) Briefly explain the differences between hardware and software that are increasingly complicating the development of safety-critical systems.
[5 marks]
In previous systems, there was a preference towards using hardware for most safety functions. This led to the development of bespoke applications that were relatively inflexible. More recently, PLCs have been used to provide some of the flexibility of programmable systems without the full complexity of general-purpose computing architectures. Increasingly, however, the computational power and cost benefits of mass-market hardware and software are attracting the attention of safety-critical systems developers. There are obvious dangers in using commercial off-the-shelf (COTS) systems; indeed, many vendors specifically exclude their use in safety-related projects. In particular, the standards of documentation and the lack of access to source code can prevent the necessary levels of analysis.
b) What are kernel requirements and why is it necessary to identify the intent of a safety-critical specification?
[5 marks]
A safety kernel is a core set of simple functions that are subject to increased levels of rigour and analysis. They offer a higher degree of assurance than is available for the rest of the code that uses the kernel services. Hence, in a time-critical application the scheduler might form part of a safety kernel because if there is a design flaw in this component then the rest of the system is unlikely to meet its timing requirements.
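A minimal, hypothetical C sketch of this structure is given below; the valve interface, its limit and the interlock check are all invented for illustration. The point is simply that hazardous actions are only reachable through a small set of heavily analysed kernel functions:

/* Hypothetical sketch of a safety-kernel interface: a small set of
 * functions, assumed to be verified to a higher level of rigour,
 * mediates every hazardous action requested by the rest of the code. */
#include <stdbool.h>
#include <stdio.h>

#define VALVE_MAX 100   /* hypothetical physical limit, percent open */

/* --- kernel: small, heavily analysed ------------------------------ */
static bool interlock_ok(void)
{
    return true;   /* placeholder for a sensor check the kernel guarantees */
}

/* The only route by which application code may move the valve. */
int kernel_set_valve(int percent)
{
    if (!interlock_ok() || percent < 0 || percent > VALVE_MAX)
        return -1;                       /* refuse the hazardous request */
    printf("valve set to %d%%\n", percent);
    return 0;
}

/* --- application: larger, less rigorously assured ----------------- */
int main(void)
{
    kernel_set_valve(50);    /* accepted */
    kernel_set_valve(250);   /* rejected by the kernel's checks */
    return 0;
}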
A specification tells you what a system must do rather than how it might do it. An intent specification does not simply tell you what a system does, it also tells you why that function is necessary and important. This can be related to the concept of a safety kernel because it is important to justify the additional expense incurred for any code that is included within the kernel. Conversely, it may also be necessary to justify why some code is excluded from the kernel.
In some ways an intent specification is mid-way between development documentation and a final safety case.
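The contrast can be shown with a small, hypothetical C fragment; the watchdog scenario and both timing figures are invented for illustration. The conventional specification records only the `what'; the intent annotation records the `why' that a maintainer would need before relaxing the figure:

/* Hypothetical fragment contrasting a conventional specification
 * (what) with an intent specification (what and why). */
#include <stdio.h>

/* What:  kick the watchdog at least once every 50 ms.
 * Why:   the (hypothetical) actuator drifts to an unsafe position if
 *        its control output stalls for more than 60 ms; a 50 ms kick
 *        period lets a stalled scheduler be detected in time to engage
 *        the fail-safe brake. Recording the intent shows that 50 ms
 *        cannot be relaxed without re-analysing the 60 ms drift bound. */
#define WATCHDOG_PERIOD_MS 50

void watchdog_kick(void)
{
    printf("watchdog kicked (period %d ms)\n", WATCHDOG_PERIOD_MS);
}

int main(void)
{
    watchdog_kick();   /* would run once per control cycle */
    return 0;
}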
c) Why is it important to explicitly identify completeness requirements for safety-critical software? Illustrate your answer by explaining how the lack of these criteria may have contributed to one of the incidents or accidents that you have studied during this course.
[10 marks]
[CWJ: there are many different solutions to this question. Mail me your answer and I'll provide some feedback.]
Completeness criteria can be used to determine whether or not developers have adequately considered the various aspects of their system. Most often, arguments must be made about the sufficiency of a design with respect to each of the completeness criteria. For example, designers must demonstrate that the system is sufficiently safe to cope with predicted environmental conditions without being absolutely safe. In other words, it is not necessary to demonstrate that the system will function correctly under environmental conditions that could not reasonably be predicted.
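One common form of completeness requirement is that every input the environment can deliver has an explicitly defined response. The hypothetical C sketch below illustrates the idea; the temperature bands and responses are invented for this example:

/* Hypothetical sketch of an input-completeness requirement: every
 * possible reading, including those outside the predicted range,
 * has an explicitly specified response. */
#include <stdio.h>

enum response { NORMAL, ALARM, SHUTDOWN };

/* The if/else chain is total, so no reading is left with an
 * unspecified behaviour. */
enum response handle_temperature(int celsius)
{
    if (celsius >= -40 && celsius <= 85)
        return NORMAL;     /* predicted operating envelope */
    if (celsius > 85 && celsius <= 120)
        return ALARM;      /* predicted abnormal condition */
    return SHUTDOWN;       /* anything else: defined fail-safe response */
}

int main(void)
{
    printf("%d\n", handle_temperature(20));     /* NORMAL */
    printf("%d\n", handle_temperature(100));    /* ALARM */
    printf("%d\n", handle_temperature(9999));   /* SHUTDOWN */
    return 0;
}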
Leveson introduces the following completeness criteria:
4.
Using the lessons learnt from recent accidents, explain why safety culture can be less important than the application and oversight of safety-critical standards.
[20 marks]
[CWJ: there are many different solutions to this question.
Mail me your answer and I'll provide some feedback.]
END