This section focuses on the metrics that can be used to monitor the test automation strategy and the effectiveness and efficiency of the TAS. They are separate from the SUT-related metrics used to monitor the SUT and its tests (functional and non-functional), which are selected by the project's higher-level test management. The test automation metrics enable the TAM and TAE to track progress toward the test automation goals and to monitor the impact of changes to the test automation solution.
TAS metrics can be divided into two groups: external and internal. The external metrics are used to measure the impact of TAS on other activities (especially testing activities). The internal metrics are used to measure the effectiveness and efficiency of the TAS in meeting its objectives.
Measured TAS metrics typically include the following:
- External TAS metrics
  - Benefits of automation
  - Effort required to create automated tests
  - Effort for analyzing automated test incidents
  - Effort for maintenance of automated tests
  - Ratio of failures to defects
  - Time for execution of automated tests
  - Number of automated test cases
  - Number of false-fail and false-pass results
  - Number of passed and failed results
  - Code coverage
- Internal TAS metrics
  - Scripting metrics for tools
  - Automation code defect density
  - Speed and efficiency of TAS components
These are described below.
Benefits of automation
It is particularly important to measure and report the benefits of a TAS, because its costs (in terms of the number of people involved over a given period of time) are easy to see. People who are not closely involved in the test automation may have an idea of these total costs, but may not see the benefits achieved.
The measurement of benefits depends on the objective of the TAS. Typically, it may be a savings in time or effort, an increase in the amount of testing performed (breadth or depth of coverage or frequency of execution), or some other benefit such as increased repeatability, better use of resources, or fewer manual errors. Possible metrics include:
- Number of hours of manual testing effort saved
- Time saved in performing regression testing
- Number of additional test cycles performed
- Number or percentage of additional tests performed
- Percentage of automated test cases relative to the total set of test cases (although automated test cases cannot be easily compared with manual test cases)
- Increase in coverage (requirements, functionality, structure)
- Number of defects found earlier due to the TAS (if the average benefit of finding a defect earlier is known, this can be converted into a sum of avoided costs)
- Number of defects found by the TAS that would not have been found by manual testing (e.g., reliability defects)
Note that test automation generally saves manual testing effort. This effort can be used for other types of (manual) testing (e.g., exploratory testing). The defects found by these additional tests can also be considered an indirect benefit of the TAS, since test automation made it possible to perform these manual tests. Without the TAS, these tests would not have been performed, and as a result, the additional defects would not have been found.
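The benefit metrics above can be computed from simple bookkeeping data. The following sketch shows two of them, hours of manual effort saved and avoided costs from earlier defect detection; all figures and function names are illustrative assumptions, not values from this syllabus.

```python
# Sketch: two TAS benefit metrics computed from hypothetical per-cycle data.

def hours_saved(manual_hours_per_cycle: float,
                automated_hours_per_cycle: float,
                cycles: int) -> float:
    """Manual testing effort saved across all executed test cycles."""
    return (manual_hours_per_cycle - automated_hours_per_cycle) * cycles

def avoided_cost(defects_found_earlier: int,
                 avg_benefit_per_defect: float) -> float:
    """Savings when the average benefit of an early defect find is known."""
    return defects_found_earlier * avg_benefit_per_defect

saved = hours_saved(manual_hours_per_cycle=40.0,
                    automated_hours_per_cycle=2.0, cycles=10)
cost = avoided_cost(defects_found_earlier=12, avg_benefit_per_defect=500.0)
print(saved, cost)  # 380 hours saved; 6000 in avoided cost
```

In practice the inputs would come from effort tracking and defect management data rather than constants.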
Effort required to create automated tests
The cost of automating tests is one of the most important cost factors in test automation. It is often higher than the cost of manually executing the same test and can therefore be a barrier to expanding test automation. While the cost of implementing a particular automated test depends largely on the test itself, other factors such as the scripting approach used, familiarity with the test tool, the environment, and the skill level of the test automation engineer also have an impact.
Since automating larger or more complex tests typically takes longer than automating short or simple tests, the calculation of test automation creation costs can be based on an average creation time. This can be further refined by looking at the average cost for a specific group of tests, such as tests that exercise the same function or tests for a specific test level. Another approach is to express the creation cost as a factor of the effort required to perform the test manually, the equivalent manual test effort (EMTE). For example, automating a test case may require twice the effort of executing it manually, i.e., twice the EMTE.
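The EMTE factor mentioned above is a simple ratio; the following minimal sketch (with hypothetical effort figures) makes the calculation explicit.

```python
# Sketch: creation cost expressed as a factor of equivalent manual test
# effort (EMTE). The effort figures are illustrative.

def emte_factor(automation_effort_h: float, manual_effort_h: float) -> float:
    """Creation cost as a multiple of the effort to run the test manually."""
    return automation_effort_h / manual_effort_h

# Automating took 6 h; executing the same test manually takes 3 h:
print(emte_factor(6.0, 3.0))  # 2.0 -> "twice the EMTE"
```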
Effort for the analysis of SUT failures
Analyzing failures in the SUT discovered by automated test execution can be much more complex than for a manually executed test, because the events leading up to the failure of a manual test are usually known to the tester executing the test. This effort can be measured as an average per failed test case or as a factor of the EMTE. The latter is particularly appropriate when automated tests vary widely in complexity and execution time.
The available logging of the SUT and TAS plays a critical role in the analysis of failures. The logging should provide enough information to perform this analysis efficiently. Important logging features are:
- SUT logging and TAS logging should be synchronized
- The TAS should log the expected and actual behavior
- The TAS should log the actions to be performed
The SUT, on the other hand, should log all actions performed (whether the action is the result of a manual or automated test). All internal errors should be logged and all crash dumps and stack traces should be available.
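A minimal sketch of a TAS log record that covers the listed features: timestamps (for synchronization with the SUT log), the action performed, and expected vs. actual behavior. The field names and helper are illustrative, not a prescribed TAS interface.

```python
# Sketch: one log line per automated test step, carrying the action,
# the expected and the actual behavior, plus a timestamp for correlating
# TAS entries with the SUT log.

import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("tas")

def log_step(action: str, expected: str, actual: str) -> bool:
    """Log one automated test step; return whether expected matched actual."""
    verdict = "PASS" if expected == actual else "FAIL"
    log.info("action=%r expected=%r actual=%r verdict=%s",
             action, expected, actual, verdict)
    return verdict == "PASS"

log_step("click login", expected="dashboard shown", actual="dashboard shown")
```

With both logs timestamped from synchronized clocks, a failed step can be matched against the SUT's own entries (internal errors, crash dumps, stack traces) for efficient analysis.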
Effort required to maintain automated tests
The maintenance overhead required to keep automated tests in sync with the SUT can be very high and ultimately exceed the benefit of the TAS. This has been the cause of the failure of many automation efforts. Monitoring maintenance effort is therefore important to identify when steps need to be taken to reduce maintenance effort or at least prevent it from increasing out of control.
Maintenance effort measurements can be expressed as the sum of all automated tests that need to be maintained for each new version of the SUT. They can also be expressed as an average value per updated automated test or as a factor of the EMTE. A related metric is the number or percentage of tests that require maintenance.
If the maintenance effort for automated tests is known (or can be estimated), this information can play a critical role in deciding whether or not to implement a particular feature or fix a particular defect. The effort required to maintain the affected test cases should be taken into account when changing the SUT.
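The maintenance measurements described above (total, average per updated test, percentage of tests requiring maintenance) can be derived from per-test effort records, as in this sketch with hypothetical figures.

```python
# Sketch: maintenance-effort metrics for one new SUT version.
# Hours per automated test are illustrative; 0.0 means "no change needed".

maintenance_hours = {"t1": 0.0, "t2": 1.5, "t3": 0.0, "t4": 3.0}

total = sum(maintenance_hours.values())
updated = [t for t, h in maintenance_hours.items() if h > 0]
avg_per_updated = total / len(updated) if updated else 0.0
pct_requiring = 100 * len(updated) / len(maintenance_hours)

print(total, avg_per_updated, pct_requiring)  # 4.5 h total, 2.25 h avg, 50%
```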
Ratio of failures to defects
A common problem with automated tests is that many of them can fail for the same reason – a single defect in the software. Although the purpose of tests is to highlight defects in the software, it is wasteful to have more than one test highlighting the same defect. This is especially true for automated tests, as the effort required to analyze each failed test can be significant. Measuring the number of failed automated tests for a given defect can provide insight into where this might be a problem. The solution lies in the design of the automated tests and their selection for execution.
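Once failure analysis has mapped each failed test to the defect that caused it, the ratio is a simple grouping exercise. The mapping data below is hypothetical.

```python
# Sketch: counting how many failed automated tests trace back to the same
# defect. The test-to-defect mapping comes from failure analysis.

from collections import Counter

failed_tests = {  # failed test -> defect found to cause it (illustrative)
    "t7": "DEF-101", "t9": "DEF-101", "t12": "DEF-101", "t15": "DEF-202",
}

failures_per_defect = Counter(failed_tests.values())
print(failures_per_defect)  # DEF-101 caused 3 failures, DEF-202 caused 1
```

A high count for a single defect suggests the affected tests overlap and their design or selection for execution should be revisited.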
Time to execute automated tests
One of the easier metrics to determine is the time it takes to execute the automated tests. At the beginning of the TAS, this may not be that important, but as the number of automated test cases increases, this metric can become very important.
Number of automated test cases
This metric can be used to show the progress of the test automation project. However, it should be noted that the number of automated test cases alone does not tell you much, e.g., it does not tell you that test coverage has increased.
Number of passed and failed results
This is a common metric that indicates how many automated tests passed and how many failed to produce the expected result. Failures need to be analyzed to determine if the failure was due to a defect in the SUT or due to external factors such as a problem with the environment or with the TAS itself.
Number of false-fail and false-pass results
As shown by the previous metrics, analyzing test failures can take considerable time. It is even more frustrating when a failure turns out to be a false alarm. This is the case when the problem lies in the TAS or the test case, not in the SUT. It is important that the number of false alarms (and the potentially wasted effort) be kept low, as false alarms can reduce confidence in the TAS. Conversely, false-pass results can be more dangerous. When a false pass occurs, there was a failure in the SUT, but it was not detected by the test automation, so a pass result was reported. In this case a potential defect may go undetected. This may be because the verification of the result was not performed properly, an invalid test oracle was used, or the test case expected an incorrect result.
Note that false-fail results can be caused by defects in the test code (see the automation code defect density metric), but also by an unstable SUT behaving in an unpredictable manner (e.g., timing problems). Test hooks can also cause false alarms due to the level of intrusion they introduce.
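Counting false-fail and false-pass results requires comparing the reported verdict with the verdict that failure analysis later showed to be correct. The record structure and data in this sketch are illustrative.

```python
# Sketch: counting false-fail and false-pass results after failure analysis.
# "reported" is the TAS verdict; "correct" is what analysis established.

results = [
    {"test": "t1", "reported": "pass", "correct": "pass"},
    {"test": "t2", "reported": "fail", "correct": "pass"},  # false fail (TAS/env issue)
    {"test": "t3", "reported": "pass", "correct": "fail"},  # false pass (defect missed)
    {"test": "t4", "reported": "fail", "correct": "fail"},
]

false_fails = sum(r["reported"] == "fail" and r["correct"] == "pass" for r in results)
false_passes = sum(r["reported"] == "pass" and r["correct"] == "fail" for r in results)
print(false_fails, false_passes)  # 1 false fail, 1 false pass
```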
Code coverage
Knowing the code coverage of the SUT achieved by the various test cases can provide useful information. This can also be measured at a high level, such as the code coverage of the regression test suite. There is no absolute percentage that indicates adequate coverage, and 100% code coverage is achievable only for the simplest software applications. However, there is general agreement that higher coverage is better because it reduces the overall risk of software deployment. This metric can also indicate activity in the SUT. For example, if code coverage drops, it most likely means that functionality has been added to the SUT, but no corresponding test case has been added to the automated test suite.
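The coverage-drop signal described above can be automated as a simple threshold check between builds; the tolerance and figures here are illustrative assumptions.

```python
# Sketch: flagging a code-coverage drop between two SUT versions, which may
# mean functionality was added without a matching automated test.

def coverage_dropped(previous_pct: float, current_pct: float,
                     tolerance: float = 0.5) -> bool:
    """True if coverage fell by more than the tolerated fluctuation."""
    return (previous_pct - current_pct) > tolerance

print(coverage_dropped(81.2, 78.0))  # True: a 3.2-point drop -> investigate
print(coverage_dropped(81.2, 81.0))  # False: within normal fluctuation
```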
Metrics for script development
There are many metrics that can be used to monitor automation script development. Most of these metrics are similar to the SUT source code metrics. Lines of Code (LOC) and cyclomatic complexity can be used to highlight overly large or complex scripts (indicating that a redesign may be needed).
The ratio of comments to executable statements can be used to give an indication of the amount of script documentation and commenting. The number of violations of scripting standards can be used to give an indication of the extent to which these standards are being met.
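The comment-to-executable-statement ratio can be approximated by line counting, as in this rough sketch; a real script-metrics tool would parse the scripting language properly.

```python
# Sketch: a comment-to-executable-statement ratio for a Python test script,
# using line counting as a rough proxy for proper parsing.

def comment_ratio(script: str) -> float:
    lines = [ln.strip() for ln in script.splitlines() if ln.strip()]
    comments = sum(ln.startswith("#") for ln in lines)
    executable = len(lines) - comments
    return comments / executable if executable else 0.0

sample = """# log in first
open_app()
# then check the title
assert title() == 'Home'
"""
print(comment_ratio(sample))  # 1.0 -> one comment per executable line
```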
Automation code defect density
Automation code is no different from SUT code in that it is software and will contain defects. Automation code should not be treated as less important than SUT code. Good coding practices and standards should be applied, and the result monitored by metrics such as code defect density.
These can be more easily captured with the support of a configuration management system.
Speed and efficiency of TAS components
Differences in the time required to perform the same test steps in the same environment may indicate a problem in the SUT. If the SUT does not perform the same functionality in the same amount of time, an investigation is required. This may indicate a variation in the system that is unacceptable and could worsen as the load increases. If performance is a critical requirement for the SUT, the TAS must be designed to account for this.
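One way to detect the timing variation described above is to compare each run's duration against the median of all runs of the same steps; the durations and tolerance are illustrative.

```python
# Sketch: flagging runs of the same automated test steps whose execution
# time deviates noticeably from the median in the same environment.

from statistics import median

def unstable_runs(durations_s: list[float], tolerance: float = 0.25) -> list[float]:
    """Return durations deviating more than `tolerance` (25%) from the median."""
    base = median(durations_s)
    return [d for d in durations_s if abs(d - base) / base > tolerance]

print(unstable_runs([12.1, 12.3, 18.9, 12.2]))  # [18.9] -> investigate
```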
For many of these metrics, the trends (i.e., the way the metrics change over time) may be more valuable than the value of a metric at a particular point in time. For example, if the average maintenance cost per updated automated test has risen over the last two releases of the SUT, action can be taken to identify the cause of the increase and to reverse the trend.
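A trend check of this kind can itself be automated, in line with the point below about keeping measurement costs low; the per-release values here are illustrative.

```python
# Sketch: detecting a rising trend in average maintenance cost per updated
# automated test across SUT releases (oldest release first).

avg_cost_per_release = [1.8, 2.1, 2.6]  # hours, illustrative

rising = all(a < b for a, b in zip(avg_cost_per_release, avg_cost_per_release[1:]))
print("investigate" if rising else "stable")  # rising across releases
```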
Measurement costs should be as low as possible, and this can often be achieved by automating collection and reporting.