Copyright Chris Johnson, 1999. Here are sample solutions for this paper.
Xday, XX May 200X.

9.30 am - 11.15am



University of Glasgow





DEGREES OF BEng, BSc, MA, MA (SOCIAL SCIENCES).





COMPUTING SCIENCE - SINGLE AND COMBINED HONOURS
ELECTRONIC AND SOFTWARE ENGINEERING - HONOURS
SOFTWARE ENGINEERING - HONOURS





SAFETY-CRITICAL SYSTEMS DEVELOPMENT





Answer 3 of the 4 questions.

1.

a) Why is it useful to think of safety as a relative and not an absolute concept?

[4 marks]

There are many reasons why safety should be considered a relative concept. For example, how would you demonstrate that a system was absolutely safe? Even if it has operated without any threat to safety for several years, there is no guarantee that it will not fail in the next 24 hours. Changes in the environment, hardware failures and untested modes in software systems can all combine to expose systems in ways that did not arise during a previous period of `safe' operation. Hence continued safe operation may increase our level of confidence in a system but may, paradoxically, also lead to different concerns about the longer-term safety of an application.


b) Why does Reason's distinction between latent and active failures have important implications for the development of safety-critical software?

[6 marks]

The consequences of active errors are felt immediately. The effects of a latent failure may lie dormant for some time. For example, a decision to reduce the staffing of a safety-critical application can create the latent conditions for failure. It does not directly trigger an accident in the way that a tired or over-worked employee might issue a hazardous command.

Software bugs often exist as latent failures. The system can function and operate successfully provided that the defective instructions are not executed. However, a change in the environment or operating characteristics can expose the underlying vulnerability. Inadequate development practices and funding constraints can be seen as the precursors that, in turn, are likely to lead to latent software failures.
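The point about latent software failures can be illustrated with a small, invented fragment (the function and its sensor domain are hypothetical, not drawn from any of the course case studies). The defect below is dormant for as long as the sensor behaves as its designers foresaw; a hardware change outside the original operating environment exposes it:

```python
def pressure_alarm(reading, sensor_range):
    """Classify a pressure reading against its sensor range (hypothetical).

    Latent failure: the normalisation below assumes a non-zero range.
    Years of operation with healthy sensors never execute the faulty
    case, so the defect lies dormant until, say, a replacement sensor
    reports a zero range and the division fails in service.
    """
    # Latent defect: no guard against sensor_range == 0.
    fraction = reading / sensor_range
    return "ALARM" if fraction > 0.9 else "OK"
```

The program is not `safe' in any absolute sense; it has simply never been driven into the state that triggers the failure.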


c) Briefly explain at least two ways in which confidential incident reporting systems can be used to support the development of safety-critical computer systems.

[10 marks]

Incident reporting systems can support the development of safety-critical computer systems in a number of ways:

  1. identifying common problems. Software and hardware developers may identify a number of bugs or potential failure modes in a developing design. Incident reporting systems provide a means of gathering this information so that it can be collated to find out if individual problems form part of a wider pattern. For example, every office may suffer the same problems from a particular faulty component.
  2. communicating development concerns. Developers may also observe potential problems in their colleagues' work and find it difficult to report these concerns. Anonymous or confidential incident reporting systems provide a forum for these concerns without necessarily exposing the employee as a `whistle blower'.
  3. communicating lessons learned and best practice. Collated information can be exchanged between units so that best practice is followed throughout a company. This not only applies to the recommendations in the aftermath of an adverse event but also to the ways in which previous failures have been detected and mitigated.
  4. supporting risk assessment. The frequency of previous failures can be monitored using incident reporting systems. They probably only provide best-case estimates, as many adverse events may not be reported by personnel even if they observe them.
As mentioned, there may be several other ways in which incident reporting systems can help the development of safety-critical applications. For example, the dissemination of safety-related information through feedback newsletters about previous incidents can `keep staff in the loop'. Further information can be obtained from a review paper.

2.

``Section 5. Specify Non-Functional Requirements.

Non functional requirements are constraints on the system design. They may arise from user requirements, technical disciplines or the external environment. They are often "ilities", can be divided into product or support constraints and include the following areas:

  1. reliability
  2. maintainability
  3. operability
  4. safety
  5. security
  6. engineering standards
  7. environment
  8. support
Non-functional requirements are often expensive but add quality. Early identification will avoid costly changes and facilitate the trade-off process leading to a cost-effective solution. Blanket application of individual non-functional requirements will be unnecessarily costly and should be avoided. They should be identified against and linked to the lowest level function in the decomposition to which they specifically apply. Non-functional requirements should also be expressed as unique statements of requirement with the same attributes as system functions. ''

UK Ministry of Defence, Smart Procurement Initiative (SPI)

a) Provide a brief definition for each of the non-functional requirements mentioned in the Smart Procurement Initiative (SPI) document and explain how they relate to Laprie's concept of dependability.

[4 marks]

Laprie views dependability as a generic concept that embraces some elements of the various non-functional requirements identified by the UK MOD. He views these as attributes of the more general concept. There are various alternative definitions; however, each can be summarised as follows:

  1. reliability - is the probability that a service will be provided for a specific time under specific environmental conditions.
  2. maintainability - is the probability that a system can be returned to service within a specified period after failure (note that other concepts such as mean time between failures might also be used here).
  3. operability - is the probability that a particular user can perform a particular task within a specified time limit under a specified set of operating conditions.
  4. safety - is freedom from accidents or losses.
  5. security - is freedom from malicious or unauthorised access to a specified set of resources.
  6. engineering standards - relates to the application of agreed and documented development techniques.
  7. environment - most of the previous non-functional requirements depend upon a clear concept of the environment in which a system will function. Hence it is important that developers and operators ensure that the system environment remains within foreseen parameters or that the system is `revised' appropriately.
  8. support - as the previous paragraph suggests, it is insufficient simply to deploy a system. It is important that periodic reviews are conducted to ensure that the non-functional requirements continue to be satisfied once a system is live and operational.
Objections can be raised against this generic view of dependability. In particular, tensions exist between concepts such as security and safety: by increasing the security of a system one might make it less safe. For example, designers might deny access to emergency overrides unless users enter complex and time-consuming key sequences. In such circumstances, it is difficult to reason about the overall effect on the dependability of the system.

b) Briefly provide an explanation of why the MOD SPI document includes the following sentence: "Blanket application of individual non-functional requirements will be unnecessarily costly and should be avoided."

[4 marks]

Question 1 has already argued that safety should be considered a relative rather than an absolute concept. Hence any attempt to achieve `blanket' safety is likely to waste finite development resources. The previous answer has also alluded to the tensions that exist between what Laprie views as the components of dependability. Hence, increasing security may have an adverse effect on safety and vice versa. There are also a number of paradoxes which reduce the benefits that are to be obtained from over-investment in certain non-functional requirements. For instance, the more reliable a system becomes, the longer it will be between failures. In consequence, staff will need frequent retraining so that they remain ready to respond to eventual failures. This training can incur considerable additional expense and it can be difficult to gather evidence to justify the additional marginal investment in improved reliability. In consequence, it is important that developers consider the trade-offs that exist when focussing on particular non-functional requirements.


c) With an example drawn from any of the case studies that have been introduced in this course, explain why it is so difficult to maintain an adequate safety case for a safety-critical computer system.

[12 marks]

[CWJ: there are many different solutions to this question. Mail me your answer and I'll provide some feedback.]

The key points are that safety cases build up an argument that any risks associated with an application are as low as reasonably practicable. These arguments can be based on assumptions about the relationship between an application and other systems. They can also be based upon assumptions about the environment that a system will operate in. These conditions can be difficult to control and hence can force revisions to a safety case in order to demonstrate that the system continues to be safe within revised operating conditions. Changes in operating practices or the functional demands on a system can introduce the same need for revised argumentation.

In cases where a revised argument cannot be made then it can be necessary to introduce design changes. A continuing problem here is that it is difficult to ensure the modularity of a safety case. Hence any changes within the design of a system sub-component will often force renewed inspection of the argumentation associated with the safety cases of other related components. If the changes cannot be accommodated within the argument then redesign may be necessary.

Other answers might focus more on the difficulty of judging whether a particular safety case will satisfy independent assessors.



3.

a) Briefly explain the differences between hardware and software that are increasingly complicating the development of safety-critical systems.

[5 marks]

In previous systems, there was a preference for using hardware for most safety functions. This led to the development of bespoke applications that were relatively inflexible. More recently, PLCs have been used to provide some of the flexibility of programmable systems without the full complexity of general-purpose computing architectures. Increasingly, however, the computational power and cost benefits of mass-market hardware and software are attracting the attention of safety-critical systems developers. There are obvious dangers in using commercial off-the-shelf (COTS) systems; indeed many vendors specifically exclude their use in safety-related projects. In particular, the standards of documentation and the lack of open source code can prevent the necessary levels of analysis.


b) What are kernel requirements and why is it necessary to identify the intent of a safety-critical specification?

[5 marks]

A safety kernel is a core set of simple functions that are subject to increased levels of rigour and analysis. They offer a higher degree of assurance than is available for the rest of the code that uses the kernel services. Hence, in a time-critical application the scheduler might form part of a safety kernel because, if there is a design flaw in this component, then the rest of the system is unlikely to meet its timing requirements.
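The scheduler example above can be sketched in outline. This is a minimal, invented illustration (the class and its interface are hypothetical, not a real kernel design): a small, heavily-analysed core runs tasks in deadline order, while the rest of the application can only submit work through its narrow interface.

```python
import heapq

class SafetyKernel:
    """Illustrative safety-kernel scheduler: runs tasks in deadline order.

    Only this small class would need the highest level of assurance;
    application code interacts with it solely via submit() and run().
    """

    def __init__(self):
        self._queue = []  # min-heap of (deadline, sequence, task)
        self._seq = 0     # tie-breaker so tasks are never compared directly

    def submit(self, deadline, task):
        # The narrow interface: untrusted code can queue work but
        # cannot alter the scheduling policy itself.
        heapq.heappush(self._queue, (deadline, self._seq, task))
        self._seq += 1

    def run(self):
        # Execute all queued tasks, earliest deadline first.
        results = []
        while self._queue:
            _, _, task = heapq.heappop(self._queue)
            results.append(task())
        return results
```

The design point is that any flaw in this small component undermines every timing guarantee in the system, which is precisely why it merits the extra rigour of kernel membership.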

A specification tells you what a system must do rather than how it might do it. An intent specification does not simply tell you what a system does, it also tells you why that function is necessary and important. This can be related to the concept of a safety kernel because it is important to justify the additional expense incurred for any code that is included within the kernel. Conversely, it may also be necessary to justify why some code is excluded from the kernel.

In some ways an intent specification is mid-way between development documentation and a final safety case.


c) Why is it important to explicitly identify completeness requirements for safety-critical software? Illustrate your answer by explaining how the lack of these criteria may have contributed to one of the incidents or accidents that you have studied during this course.

[10 marks]

[CWJ: there are many different solutions to this question. Mail me your answer and I'll provide some feedback.]

Completeness criteria can be used to determine whether or not developers have adequately considered the various aspects of their system. Most often, arguments must be made about the sufficiency of a design with respect to each of the completeness criteria. For example, designers must demonstrate that the system is sufficiently safe to cope with predicted environmental conditions without being absolutely safe. In other words, it is not necessary to demonstrate that the system will function correctly under environmental conditions that could not reasonably be predicted.

Leveson introduces the following completeness criteria:

  1. Human Computer Interface Criteria.
  2. State Completeness.
  3. Input/Output Variable Completeness.
  4. Trigger Event Completeness.
  5. Output Specification Completeness.
  6. Output to Trigger Relationships.
  7. State Transitions.
More information is available in the notes. These can be linked to some of the failures that we have looked at in the course. For example, human computer interface criteria can be linked to the London Ambulance project. State completeness might be linked to the problems in porting code between Ariane 4 and Ariane 5.

4.

Using the lessons learnt from recent accidents, explain why safety culture can be less important than the application and oversight of safety-critical standards.

[20 marks]

[CWJ: there are many different solutions to this question. Mail me your answer and I'll provide some feedback.]


END