Reliability describes the ability of a system or component to function under stated conditions for a specified period of time. It also refers to the consistency of the results provided by a system; internal and external reliability are, respectively, the ability to detect gross errors and the effect of an undetected blunder on the solution.
Reliability is about the overall consistency of a measure. It is a concept that encompasses service continuity, and it is therefore related to satellite availability and to the indicators described under availability, accuracy, portability, and repeatability, in addition to integrity. The latter requires the definition, for each measurement of interest, of:
- an alert limit, defined as the error tolerance not to be exceeded without issuing an alert,
- a time to alert, defined as the maximum allowable time elapsed from the onset of the navigation system being out of tolerance until the equipment enunciates the alert,
- the corresponding integrity risk, defined as the probability that, at any moment, the position error exceeds the alert limit, and
- a protection level, defined as the statistical error bound computed so as to guarantee that the probability of the absolute position error exceeding said bound is smaller than or equal to the target integrity risk.
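The relationship between the position error, the protection level, and the alert limit defined above can be illustrated with a small classifier. This is a minimal sketch, not a definitive implementation: the names `IntegrityStatus` and `classify_epoch` are hypothetical, the true position error is assumed to be known (as in a test campaign with a reference trajectory), and the four-way split follows the usual Stanford-diagram convention in simplified form.

```cpp
// Possible outcomes for one epoch (hypothetical names, for illustration only).
enum class IntegrityStatus
{
    NominalOperation,                  // error <= protection level <= alert limit
    SystemUnavailable,                 // protection level exceeds the alert limit
    MisleadingInformation,             // error exceeds the protection level only
    HazardouslyMisleadingInformation   // error exceeds the alert limit, no protection
};

// Classifies one epoch. All arguments are non-negative magnitudes in meters.
// Simplification: when the protection level exceeds the alert limit, the epoch
// is reported as unavailable regardless of the actual error.
constexpr IntegrityStatus classify_epoch(double position_error_m,
                                         double protection_level_m,
                                         double alert_limit_m)
{
    if (protection_level_m > alert_limit_m)
    {
        return IntegrityStatus::SystemUnavailable;
    }
    if (position_error_m > alert_limit_m)
    {
        return IntegrityStatus::HazardouslyMisleadingInformation;
    }
    if (position_error_m > protection_level_m)
    {
        return IntegrityStatus::MisleadingInformation;
    }
    return IntegrityStatus::NominalOperation;
}
```

For instance, with an alert limit of 40 m and a protection level of 10 m, an error of 3 m is nominal, whereas an error of 45 m is an integrity event, because the protection level failed to bound the error and no unavailability was declared.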
Specifically, software reliability is also related to the usage of the programming language. Certain coding practices are considered unsafe, in the sense that they can lead to undefined, unspecified or implementation-defined behaviors under certain conditions, which is an undesirable feature (see definitions below).
As examples of programming languages for high-reliability systems, we can mention Ada and SPARK (which is an Ada dialect with some hooks for static verification), which are used in aerospace circles for building high-reliability software such as avionics systems, and Erlang, which was designed from the ground up for writing high-reliability telecommunications code. Functional languages such as Haskell can be subjected to formal proofs by automated systems due to the declarative nature of the language. However, Erlang and Haskell are garbage collected, and garbage collection is not normally predictable enough for hard real-time applications, although there is a body of ongoing research on time-bounded incremental garbage collectors.
Although C and C++ were not specifically designed for this type of application, they are widely used for embedded and safety-critical software for several reasons. The main properties of note are control over memory management (which makes it possible to avoid garbage collection, for example), simple and well-debugged core run-time libraries, and mature tool support. Many of the embedded development toolchains in use today were first developed in the 1980s and 1990s, when these languages were current technology, and come from the Unix culture that was prevalent at that time, so these tools remain popular for this sort of work. While manual memory management code must be carefully checked to avoid errors, it allows a degree of control over application response times that is not available with languages that depend on garbage collection. The core run-time libraries of C and C++ are relatively simple, mature, and well understood, so they are among the most stable platforms available [1].
In the case of the C++ language, the software industry has created several specifications for enhanced reliability, which ban the usage of a set of libraries and functions from the standard library and define a list of coding rules. Examples include the SEI CERT C++ Coding Standard, High Integrity C++, and MISRA C++:2008.
The ultimate objective of those coding standards is to prevent the undesired behaviors described below:
Definitions from the ISO/IEC 14882:2017 standard
- Undefined behavior is behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which the C++ Standard imposes no requirements. Undefined behavior may also be expected when the C++ Standard omits any explicit definition of behavior or defines the behavior to be ill-formed, with no diagnostic required.
- Unspecified behavior is behavior, for a well-formed program construct and correct data, that depends on the implementation. The implementation is not required to document which behavior occurs.
- Implementation-defined behavior is behavior, for a well-formed program construct and correct data, that depends on the implementation and that each implementation documents.
Unspecified and implementation-defined behaviors are issues also related to portability.
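The three categories above can be made concrete with a short example. This is a minimal sketch assuming a C++17 compiler; the helper names (`checked_add`, `next_sample`, `ordered_difference`) are hypothetical and only illustrate how code can be written so that it does not depend on these behaviors.

```cpp
#include <climits>
#include <limits>
#include <optional>

// Implementation-defined: the width of int and the signedness of plain char
// differ between platforms, but each implementation documents its choice.
constexpr int bits_in_int = static_cast<int>(sizeof(int)) * CHAR_BIT;
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

// Undefined: signed integer overflow. This helper inspects the operands first
// and reports failure instead of ever executing an overflowing addition.
constexpr std::optional<int> checked_add(int a, int b)
{
    if ((b > 0 && a > std::numeric_limits<int>::max() - b) ||
        (b < 0 && a < std::numeric_limits<int>::min() - b))
    {
        return std::nullopt;
    }
    return a + b;
}

// Returns the current counter value and then increments the counter.
constexpr int next_sample(int& counter)
{
    return counter++;
}

// Unspecified: the order in which function arguments are evaluated. A call
// such as subtract(next_sample(c), next_sample(c)) could yield -1 or +1.
// Sequencing the two calls in separate statements removes the dependence.
constexpr int ordered_difference(int start)
{
    int counter = start;
    const int first = next_sample(counter);   // always evaluated first
    const int second = next_sample(counter);  // always evaluated second
    return first - second;
}
```

Coding standards typically address such cases either by banning the construct or by requiring an explicit check or explicit sequencing, as done here.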
There are tools that help developers minimize code defects by diagnosing typical programming errors, such as interface misuse or bugs that can be deduced via static analysis. Examples are Coverity Scan and clang-tidy. Compiler warnings are another relevant diagnostic tool: they flag constructs that might cause problems or have unintended effects that the programmer was not aware of. Some compilers warn more than others, and all of them have options to increase or decrease the number of warnings.
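As an illustration of how warnings surface potential defects, the following sketch shows two patterns that GCC and Clang typically report when built with options such as `-Wall -Wextra`. The function names are hypothetical, and the exact set of warnings enabled by each option depends on the compiler and its version.

```cpp
#include <array>
#include <cstddef>

// [[nodiscard]] (C++17) asks the compiler to warn when a caller silently
// ignores the returned value, which helps to diagnose interface misuse.
[[nodiscard]] constexpr bool is_valid_channel(int channel, int num_channels)
{
    return channel >= 0 && channel < num_channels;
}

// Counts the samples above a threshold. Declaring the index as 'int' would
// compare a signed value against the unsigned samples.size(), a pattern that
// -Wsign-compare reports; using std::size_t avoids the mixed comparison.
template <std::size_t N>
constexpr std::size_t count_above(const std::array<int, N>& samples, int threshold)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        if (samples[i] > threshold)
        {
            ++count;
        }
    }
    return count;
}
```

A statement consisting only of `is_valid_channel(3, 8);` would trigger a warning about the discarded result, while using the value in a condition would not.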
Indicators of Reliability
The following is a list of possible reliability indicators for a software-defined GNSS receiver:
- Percentage of false and missed alerts.
- Availability of receiver autonomous integrity monitoring (RAIM) mechanisms:
- Fault detection (requires at least 5 in-view satellites).
- Fault detection and exclusion (requires at least 6 in-view satellites).
- RAIM prediction tools.
- Horizontal / Vertical Protection Limits (HPL / VPL): radii of the circles that are centered on the GNSS position solution and are guaranteed to contain the true position of the receiver to within the specifications of the RAIM scheme (i.e., which meet the specified false alarm and missed detection probabilities).
- Availability of mechanisms providing robustness against RF interferences and multipath:
- Out-of-band rejection of RF interferences (see ETSI EN 303 413 Standard).
- In-band rejection techniques for continuous wave, pulsed, and wideband interferences.
- Countermeasures against spoofing, meaconing, and fake assisted and differential data.
- Spatial diversity: Fixed / Controlled Reception Pattern Antennas [2].
- Deployment of network security and data integrity mechanisms.
- Availability of GNSS signal authentication mechanisms.
- Probability of failure.
- Time to authentication.
- Safety-critical software certifications (e.g., DO-178B).
- If the programming language is C++: Coding Standard certifications (e.g., SEI CERT C++ Coding Standard, High Integrity C++, MISRA C++:2008, others)
- Availability of a .clang-tidy file for clang-tidy check customization.
- Availability of a
- Observation of coding standards.
- Use of static checking tools to enforce compliance with the allowed language subset.
- No compiler warnings.
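The first indicator in the list above, the percentage of false and missed alerts, can be estimated from a log of epochs for which the true position error is known. The sketch below is only an illustration under stated assumptions: the types `Epoch` and `AlertRates` and the function `alert_rates` are hypothetical, a false alert is taken to be an alert raised while the error is within the alert limit, a missed alert is an out-of-tolerance epoch without an alert, both percentages are computed over the total number of epochs, and the time to alert is ignored.

```cpp
#include <array>
#include <cstddef>

// One logged epoch (hypothetical structure, for illustration only).
struct Epoch
{
    double position_error_m;  // true position error, from a reference solution
    bool alert_raised;        // whether the receiver issued an alert
};

struct AlertRates
{
    double false_alert_percent;
    double missed_alert_percent;
};

// Computes the percentage of false and missed alerts over all logged epochs.
template <std::size_t N>
constexpr AlertRates alert_rates(const std::array<Epoch, N>& epochs, double alert_limit_m)
{
    std::size_t false_alerts = 0;
    std::size_t missed_alerts = 0;
    for (const Epoch& e : epochs)
    {
        const bool out_of_tolerance = e.position_error_m > alert_limit_m;
        if (e.alert_raised && !out_of_tolerance)
        {
            ++false_alerts;
        }
        if (!e.alert_raised && out_of_tolerance)
        {
            ++missed_alerts;
        }
    }
    if (N == 0)
    {
        return {0.0, 0.0};  // an empty log carries no evidence either way
    }
    const double total = static_cast<double>(N);
    return {100.0 * static_cast<double>(false_alerts) / total,
            100.0 * static_cast<double>(missed_alerts) / total};
}
```

A production analysis would normally also account for the time to alert, so that an alert raised within the allowed delay after the error leaves tolerance is not counted as missed.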
[1] Stack Overflow, Which languages are used for safety-critical software?
[2] C. Fernández-Prades, J. Arribas and P. Closas, Robust GNSS Receivers by Array Signal Processing: Theory and Implementation, Proceedings of the IEEE, Vol. 104, No. 6, pp. 1207-1220, June 2016.