ML Neural Networks and Testing

Neural Networks

Artificial neural networks were originally intended to mimic the functioning of the human brain, which can be thought of as a multitude of interconnected biological neurons. The single-layer perceptron is one of the first examples of an artificial neural network implementation and comprises a neural network with only one layer (i.e., a single neuron). It can be used for supervised learning of classifiers that decide whether an input belongs to a particular class or not.
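
As a minimal illustration (the data, learning rate, and number of epochs below are arbitrary choices, not taken from the syllabus), a single-layer perceptron can be trained with the classic perceptron learning rule to classify, for example, the logical AND of two binary inputs:

    import numpy as np

    def perceptron_train(X, y, epochs=10, lr=0.1):
        # X: feature matrix (n_samples, n_features); y: labels in {0, 1}
        w = np.zeros(X.shape[1])  # connection weights, one per input
        b = 0.0                   # bias term
        for _ in range(epochs):
            for xi, target in zip(X, y):
                pred = int(np.dot(w, xi) + b > 0)  # step activation
                err = target - pred
                w += lr * err * xi                 # adjust weights on error
                b += lr * err
        return w, b

    # Example: learn the logical AND of two binary inputs
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])
    w, b = perceptron_train(X, y)
    print([int(np.dot(w, xi) + b > 0) for xi in X])  # -> [0, 0, 0, 1]
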
Most current neural networks are called deep neural networks because they include multiple layers and can be viewed as multilayer perceptrons (see Figure 3).

Figure 3: Structure of a deep neural network

A deep neural network consists of three types of layers. The input layer receives inputs, for example pixel values from a camera. The output layer delivers results to the outside world; this may be, for example, a value indicating the probability that the input image is a cat. Between the input and output layers are hidden layers consisting of artificial neurons, also known as nodes. Neurons in one layer are connected to neurons in the next layer, and each layer can contain a different number of neurons. Neurons perform computations and pass information through the network from the input neurons to the output neurons.

Figure 4: Computation performed by each neuron

As shown in Figure 4, the calculation performed by each neuron (except for the neurons in the input layer) generates a so-called activation value. This value is computed by an activation function that takes as input the activation values of all neurons in the previous layer, the weights assigned to the connections between neurons (these weights change as the network learns), and the individual bias of each neuron. Note that this bias is a constant offset value and has nothing to do with the bias considered earlier in Section 2.4. Different activation functions may produce different activation values. These values are usually centered around zero and typically range between -1 (meaning that the neuron is “disinterested”) and +1 (meaning that the neuron is “very interested”).
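
As a minimal sketch of this computation, assuming tanh as the activation function (one common choice that produces values between -1 and +1; the weights, bias, and input values below are purely illustrative), a single neuron's calculation could look like this:

    import numpy as np

    def neuron_activation(prev_activations, weights, bias):
        # prev_activations: activation values of all neurons in the previous layer
        # weights: one weight per incoming connection (learned during training)
        # bias: the neuron's individual constant offset
        z = np.dot(weights, prev_activations) + bias  # weighted sum plus bias
        return np.tanh(z)  # tanh squashes the result into the range (-1, +1)

    # Illustrative values: a neuron with three incoming connections
    a = neuron_activation(np.array([0.2, -0.5, 0.9]),
                          np.array([0.4, 0.1, -0.7]),
                          bias=0.05)
    print(a)  # a value between -1 ("disinterested") and +1 ("very interested")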

When training the neural network, each neuron is initialized with a bias value, and the training data is passed through the network, with each neuron applying its activation function to eventually generate an output. The generated output is then compared to the known correct output (in this example of supervised learning, labeled data is used). The difference between the actual output and the known correct result is then fed back through the network to change the weights on the connections between neurons so as to minimize this difference. As more training data is fed into the network, the weights are adjusted as the network learns. Eventually, the outputs produced are considered good enough for training to stop.
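
The following toy sketch illustrates this training loop on a tiny fully connected network trained with backpropagation; the labeled data (XOR), network size, learning rate, and number of epochs are arbitrary choices for the example, not part of the syllabus:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy network: 2 inputs -> 3 hidden neurons -> 1 output (all tanh)
    W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
    W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

    # Labeled training data (supervised learning): XOR, targets in {-1, +1}
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[-1], [1], [1], [-1]], dtype=float)

    lr = 0.5
    for _ in range(5000):
        for x, t in zip(X, y):
            # Forward pass: each neuron applies its activation function
            h = np.tanh(W1 @ x + b1)
            out = np.tanh(W2 @ h + b2)
            # Compare the actual output to the known correct label
            err = out - t
            # Feed the difference back through the network (backpropagation)
            # and adjust the weights so that the difference shrinks
            d_out = err * (1 - out ** 2)          # derivative of tanh
            d_h = (W2.T @ d_out) * (1 - h ** 2)
            W2 -= lr * np.outer(d_out, h)
            b2 -= lr * d_out
            W1 -= lr * np.outer(d_h, x)
            b1 -= lr * d_h

    # After training, outputs should be close to the targets
    # (convergence can vary with the random initialization)
    preds = [np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)[0] for x in X]
    print(np.round(preds, 2))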

Coverage measures for neural networks

Achieving white-box test coverage criteria (e.g., statement coverage, branch coverage, modified condition/decision coverage (MC/DC)) is mandatory for compliance with some safety-related standards when testing traditional imperative source code, and is recommended by many testing experts for other critical applications. Monitoring and improving coverage supports the design of new test cases, which leads to increased confidence in the test object.

Using such metrics to measure neural network coverage is not very useful because essentially the same code is executed each time the neural network runs. Instead, coverage measures have been proposed that are based on the structure of the neural network itself, more specifically on the neurons within the network. Most of these measures are based on the activation values of the neurons.

Coverage for neural networks is a new area of research; academic papers have only been published since 2017, so there is little objective evidence (e.g., replicated research) that the proposed measures are effective. It should be noted, however, that despite the fact that statement and decision coverage have been used for over 50 years, there is also little objective evidence of their relative effectiveness, even though they are mandated for measuring coverage of software in safety-critical applications such as medical devices and avionics systems.

The following coverage criteria for neural networks have been proposed and applied by researchers for a variety of applications:

  • Neuron coverage: Complete neuron coverage requires that every neuron in the neural network achieves an activation value greater than zero. This is very easy to achieve in practice, and research has shown that nearly 100% coverage is achieved for a variety of deep neural networks with very few test cases. This coverage measure is most useful as an alarm signal when it is not achieved.
  • Threshold coverage: Full threshold coverage requires that every neuron in the neural network reaches an activation value greater than a specified threshold. The researchers who developed the DeepXplore framework proposed measuring neuron coverage as the proportion of neurons whose activation value exceeds a threshold, which can vary with the situation. Using a threshold of 0.75, they reported efficiently finding thousands of erroneous corner-case behaviors with this white-box approach.
    This type of coverage has been renamed here to distinguish it more easily from the neuron coverage defined above, since some researchers use the term “neuron coverage” for coverage with a threshold of zero.
  • Sign-change coverage: To achieve complete sign-change coverage, test cases must cause each neuron to reach both positive and negative activation values.
  • Value-change coverage: To achieve complete value-change coverage, the test cases must cause each neuron to reach two activation values whose difference exceeds a specified amount (a sketch of how these per-neuron measures could be computed appears after this list).
  • Sign-sign coverage: This coverage considers pairs of neurons in adjacent layers and the signs of their activation values. For a neuron pair to be considered covered, a test case must show that changing the sign of a neuron in the first layer causes the neuron in the second layer to change its sign, while the signs of all other neurons in the second layer remain unchanged. This is conceptually similar to MC/DC coverage for imperative source code.
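
To make the per-neuron measures above concrete, the following minimal sketch (a hypothetical coverage_report function; the recorded activations and the delta value are illustrative, and only the 0.75 threshold comes from the DeepXplore research) computes neuron, threshold, sign-change, and value-change coverage from activation values recorded while running a test suite:

    import numpy as np

    def coverage_report(activations, threshold=0.75, delta=0.5):
        # activations: array of shape (n_test_cases, n_neurons) holding the
        # activation value of every (non-input) neuron for every test case.
        # threshold=0.75 echoes the value used in the DeepXplore research;
        # delta is an arbitrary value-change distance chosen for illustration.
        pos = np.any(activations > 0, axis=0)  # neuron ever activated above 0
        neg = np.any(activations < 0, axis=0)  # neuron ever activated below 0
        spread = activations.max(axis=0) - activations.min(axis=0)
        return {
            "neuron coverage":       np.mean(pos),
            "threshold coverage":    np.mean(np.any(activations > threshold, axis=0)),
            "sign-change coverage":  np.mean(pos & neg),
            "value-change coverage": np.mean(spread > delta),
        }

    # Example: activations of 3 neurons recorded over 4 test cases
    acts = np.array([[ 0.9, -0.2, 0.1],
                     [ 0.8,  0.3, 0.1],
                     [-0.1,  0.4, 0.2],
                     [ 0.7, -0.6, 0.1]])
    print(coverage_report(acts))
    # -> neuron coverage 1.0, threshold coverage ~0.33,
    #    sign-change ~0.67, value-change ~0.67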

Researchers have also reported coverage measures based on whole layers (although these are simpler than sign-sign coverage), and a successful approach that uses nearest-neighbor algorithms to identify meaningful changes in neighboring neuron groups has been implemented in the TensorFuzz tool.

Source: ISTQB®: Certified Tester AI Testing (CT-AI) Syllabus Version 1.0
