Why Performance Metrics are Needed
Accurate measurements and the metrics derived from them are essential for setting performance test objectives and evaluating performance test results. Performance testing should not be conducted without first being clear about what measurements and metrics are needed. The following project risks exist if this advice is ignored:
- It is not known whether performance levels are acceptable to meet operational objectives.
- Performance requirements are not defined in measurable terms.
- It may not be possible to identify trends that could predict lower performance levels.
- Actual performance test results cannot be evaluated by comparing them to a baseline set of performance measures that define acceptable and/or unacceptable performance.
- Performance test results are evaluated based on the subjective opinion of one or more individuals.
- The results provided by a performance testing tool are not understood.
Collecting Performance Measurements and Metrics
As with any form of measurement, it is possible to obtain and express metrics in precise ways. Therefore, each of the metrics and measurements described in this section can and should be defined in a way that makes sense in a particular context. This involves conducting initial tests and determining which metrics need further refinement and which need to be added.
For example, the response time metric is likely to be included in any set of performance metrics. However, to be meaningful and actionable, the response time metric needs to be further defined in terms of the time of day, the number of concurrent users, the amount of data processed, etc.
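As a minimal sketch of what "further defined" can mean in practice, the Python snippet below groups response time samples by time of day and by a bucket of concurrent users before averaging them. The `Sample` fields, the bucket size of 100 users, and the sample values are illustrative assumptions, not part of any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    """One response time observation with its context (hypothetical fields)."""
    hour_of_day: int          # 0-23
    concurrent_users: int     # users active when the sample was taken
    response_seconds: float


def mean_response_by_context(samples, user_bucket=100):
    """Average response time per (hour of day, concurrent-user bucket)."""
    groups = defaultdict(list)
    for s in samples:
        # Round the user count down to the start of its bucket, e.g. 150 -> 100.
        bucket = (s.concurrent_users // user_bucket) * user_bucket
        groups[(s.hour_of_day, bucket)].append(s.response_seconds)
    return {key: mean(values) for key, values in groups.items()}


# Illustrative data only.
samples = [
    Sample(9, 120, 0.8),
    Sample(9, 150, 1.0),
    Sample(14, 480, 2.5),
    Sample(14, 430, 3.5),
]
by_context = mean_response_by_context(samples)
for (hour, users), seconds in sorted(by_context.items()):
    print(f"hour={hour:02d} users>={users}: mean response {seconds:.2f}s")
```

Reporting one average per context, rather than a single global average, makes it visible when response time degrades only at certain times or load levels.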
The metrics collected in a specific performance test will vary depending on the following:
- Business context (business processes, customer and user behavior, and stakeholder expectations)
- Operational context (technology and its use)
- Test objectives
For example, the metrics chosen for performance testing of an international e-commerce website will differ from those chosen for performance testing of an embedded system controlling the functions of a medical device.
A common way to categorize performance measurements and metrics is to consider the technical environment, business environment, or operational environment in which performance evaluation is required.
The categories of measurements and metrics listed below are those commonly used in performance testing.
Performance metrics vary depending on the type of technical environment, as shown in the list below:
- Internet of Things (IoT)
- Desktop-client devices
- Server-side processing
- Type of software running in the environment (e.g., embedded)
Metrics include the following:
- Response time (e.g., per transaction, per concurrent user, page load times)
- Resource utilization (e.g., CPU, memory, network bandwidth, network latency, available disk space, I/O rate, idle and busy threads)
- Key transaction throughput rate (i.e., the number of transactions that can be processed in a given time period)
- Batch processing time (e.g., wait times, throughput times, database response times, completion times)
- Number of errors that affect performance
- Completion time (e.g., for creating, reading, updating, and deleting data)
- Background load on shared resources (especially in virtualized environments)
- Software metrics (e.g., code complexity)
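Two of the metrics listed above, response time and transaction throughput, can be derived from raw measurements as in the following sketch. It assumes the nearest-rank definition of a percentile (other definitions exist and give slightly different values), and the sample data and 5-second window are illustrative.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def throughput(transaction_count, window_seconds):
    """Transactions processed per second over the measurement window."""
    return transaction_count / window_seconds


# Illustrative response times in seconds; one outlier at 1.90 s.
response_times = [0.21, 0.25, 0.19, 0.30, 0.22, 1.90, 0.24, 0.27, 0.23, 0.26]

p50 = percentile(response_times, 50)
p90 = percentile(response_times, 90)
tps = throughput(len(response_times), window_seconds=5.0)
print(f"median={p50}s p90={p90}s throughput={tps} transactions/s")
```

Percentiles are often reported alongside or instead of the mean because a single slow transaction, such as the outlier here, can distort an average.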
From a business or functional perspective, performance metrics may include the following:
- Business process efficiency (e.g., the speed of execution of an entire business process, including normal, alternative, and exceptional use cases)
- Throughput of data, transactions, and other units of work (e.g., jobs processed per hour, rows of data added per minute)
- Service level agreement (SLA) compliance or violation rates (e.g., SLA violations per unit time)
- The volume of usage (e.g., percentage of global or national users performing tasks at a given time)
- Concurrency of use (e.g., the number of users performing a task simultaneously)
- Timing of usage (e.g., the number of jobs processed during peak hours)
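Two of the business-level metrics above can be sketched in a few lines: an SLA violation rate and the peak concurrency of use. The function names, the 1-second SLA limit, and the session data are assumptions made for illustration.

```python
def sla_violation_rate(response_times, sla_seconds):
    """Fraction of transactions whose response time exceeds the SLA limit."""
    violations = sum(1 for t in response_times if t > sla_seconds)
    return violations / len(response_times)


def peak_concurrency(sessions):
    """Maximum number of overlapping sessions, given (start, end) times."""
    events = []
    for start, end in sessions:
        events.append((start, 1))    # a user begins the task
        events.append((end, -1))     # a user finishes the task
    # Tuples sort by time first, then by delta, so an end (-1) at the same
    # timestamp is processed before a start (+1). Back-to-back sessions are
    # therefore not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Illustrative data only.
violation_rate = sla_violation_rate([0.5, 1.2, 0.8, 2.5], sla_seconds=1.0)
peak_users = peak_concurrency([(0, 10), (5, 15), (10, 20), (12, 13)])
print(f"SLA violation rate: {violation_rate:.0%}, peak concurrency: {peak_users}")
```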
The operational aspect of performance testing focuses on tasks that are generally not considered user-facing. These include the following:
- Operational processes (e.g., the time it takes to boot up the environment, perform backups, shut down, and resume operations)
- System recovery (e.g., the time required to restore data from a backup)
- Alarms and warnings (e.g., the time it takes for the system to issue an alarm or warning)
Selecting Performance Metrics
Collecting more metrics than necessary is not beneficial in itself. Each metric chosen requires a means of consistent collection and reporting. It is important to define an achievable set of metrics that supports the performance test objectives.
The Goal-Question-Metric (GQM) approach, for example, is a helpful way to align metrics with performance goals. The idea is to first set the goals and then ask questions that determine when the goals have been met. Each question is linked to metrics to ensure that the answer to the question is measurable. However, the GQM approach does not always fit the performance testing process. For example, some metrics represent the state of a system and are not directly linked to goals.
After the initial measurements are defined and captured, further measurements and metrics may be required to understand the actual level of performance and to determine where corrective action is needed.
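The goal, question, and metric chain can be represented as a small data structure, which makes it easy to spot questions that no metric answers yet. The sketch below is one possible representation; the class names and the example goal are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)    # names of metrics that answer it


@dataclass
class Goal:
    statement: str
    questions: list = field(default_factory=list)  # Question objects


def unmeasured_questions(goal):
    """Return the questions that are not yet linked to any metric."""
    return [q.text for q in goal.questions if not q.metrics]


# Hypothetical example goal.
goal = Goal(
    statement="Checkout remains responsive under peak load",
    questions=[
        Question(
            "Is checkout response time acceptable at peak?",
            ["95th percentile checkout response time"],
        ),
        Question("Does the server have capacity headroom at peak?"),
    ],
)
gaps = unmeasured_questions(goal)
print("Questions without metrics:", gaps)
```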
Aggregating Results from Performance Testing
The purpose of aggregating performance metrics is to understand and express them in a way that accurately reflects the overall picture of system performance. When performance metrics are considered only at the detailed level, it can be difficult to draw the right conclusions – especially for the organization’s stakeholders.
For many stakeholders, the main concern is that the response time of a system, website, or other test object is within acceptable limits.
Once a deeper understanding of performance metrics is achieved, the metrics can be summarized so that:
- Business and project stakeholders can see the “overall status” of system performance
- Performance trends can be identified
- Performance metrics can be reported in an understandable form
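The following sketch illustrates one simple form of aggregation: each test run is reduced to a short summary, and the summaries are compared across runs to reveal a trend. The run names, the chosen summary values, and the "strictly increasing mean" rule are assumptions for illustration; real trend analysis would typically allow for measurement noise.

```python
from statistics import mean


def summarize(response_times):
    """Reduce the raw response times of one run to a few headline numbers."""
    return {
        "count": len(response_times),
        "mean": mean(response_times),
        "max": max(response_times),
    }


def is_degrading(summaries, key="mean"):
    """True if the chosen summary value rises strictly from run to run."""
    values = [s[key] for s in summaries]
    return len(values) >= 2 and all(a < b for a, b in zip(values, values[1:]))


# Illustrative response times (seconds) for three consecutive builds.
runs = {
    "build-1": [1.0, 2.0, 3.0],
    "build-2": [2.0, 2.0, 5.0],
    "build-3": [3.0, 4.0, 5.0],
}
summaries = [summarize(times) for times in runs.values()]
for name, summary in zip(runs, summaries):
    print(name, summary)
print("Degrading trend:", is_degrading(summaries))
```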
Key Sources of Performance Metrics
The effort required to capture metrics should affect system performance as little as possible (this influence is known as the “probe effect”). In addition, the scope, accuracy, and speed with which performance metrics must be captured make tool support a necessity. While combining tools is not uncommon, it can lead to redundancy in the use of test tools and other problems.
There are three main sources of performance metrics:
Performance testing tools provide measurements and metrics as a result of a test. Tools differ in the number of metrics displayed, how the metrics are displayed, and the ability for the user to customize the metrics to a particular situation.
Some tools capture and display performance metrics in text format, while more robust tools capture performance metrics and display them graphically in a dashboard format. Many tools offer the ability to export metrics to facilitate test evaluation and reporting.
Performance monitoring tools are often used to complement the reporting capabilities of performance testing tools (see also Section 5.1). In addition, monitoring tools can be used to continuously monitor system performance and alert system administrators to lower levels of performance and higher levels of system errors and warnings. These tools can also be used to detect and report suspicious behavior (e.g., denial-of-service attacks and distributed denial-of-service attacks).
Log analysis tools are tools that scan server logs and compile metrics from them. Some of these tools can create charts to provide a graphical representation of the data.
Errors, alarms, and warnings are typically recorded in server logs. These include:
- High resource usage, such as high CPU utilization, high disk space usage, and insufficient bandwidth
- Memory errors and warnings, such as exhausted memory
- Deadlocks and multi-threading problems, especially when performing database operations
- Database errors, such as SQL exceptions and SQL timeouts
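A log analysis step of this kind can be sketched as a scan that counts log lines per event category. The log format, the category names, and the regular expressions below are invented for illustration; real server logs differ by product and need their own patterns.

```python
import re
from collections import Counter

# Hypothetical patterns; adapt them to the actual log format in use.
PATTERNS = {
    "high_cpu": re.compile(r"cpu (usage|utilization) high", re.IGNORECASE),
    "memory": re.compile(r"out of memory|OutOfMemory", re.IGNORECASE),
    "deadlock": re.compile(r"deadlock", re.IGNORECASE),
    "sql_timeout": re.compile(r"sql.*timeout|timeout.*sql", re.IGNORECASE),
}


def count_log_events(lines):
    """Count how many log lines match each event category."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Invented sample log lines.
log_lines = [
    "2024-01-01 10:00:01 WARN CPU utilization high: 97%",
    "2024-01-01 10:00:05 ERROR SQL query timeout after 30s",
    "2024-01-01 10:00:09 ERROR Deadlock detected on table orders",
    "2024-01-01 10:00:12 ERROR java.lang.OutOfMemoryError: Java heap space",
    "2024-01-01 10:00:15 INFO request served in 120 ms",
    "2024-01-01 10:00:20 ERROR SQL query timeout after 30s",
]
event_counts = count_log_events(log_lines)
for name, count in sorted(event_counts.items()):
    print(f"{name}: {count}")
```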
Typical Results of a Performance Test
In functional testing, especially when testing specified functional requirements or functional elements of user stories, the expected results can usually be clearly defined and the test results interpreted to determine whether the test passed or failed. For example, a monthly revenue report shows either a correct total or an incorrect total.
While tests that verify functional suitability often benefit from well-defined test oracles, performance tests often lack this source of information. Not only are stakeholders notoriously bad at articulating performance requirements, but many business analysts and product owners are also poor at eliciting such requirements. Testers often receive limited guidance in defining expected test results.
When evaluating performance test results, it is important to look closely at the results. Initial raw results can be misleading, as seemingly good overall results may hide performance deficiencies. For example, resource utilization may be well below 75% for all key potential bottleneck resources, while the throughput or response time of key transactions or use cases is an order of magnitude worse than required. Which specific results need to be evaluated depends on the tests performed.
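The situation described above, healthy utilization figures alongside unacceptable response times, can be caught by checking both kinds of result against their own limits rather than relying on one overall impression. The sketch below assumes a 75% utilization limit and hypothetical resource names, transaction names, and targets.

```python
def evaluate(utilization_pct, response_seconds, target_seconds, utilization_limit=75.0):
    """Return a list of findings; an empty list means no limit was exceeded."""
    findings = []
    for resource, pct in utilization_pct.items():
        if pct >= utilization_limit:
            findings.append(
                f"{resource} utilization {pct:.0f}% at or above {utilization_limit:.0f}%"
            )
    for transaction, seconds in response_seconds.items():
        target = target_seconds[transaction]
        if seconds > target:
            findings.append(
                f"{transaction} took {seconds:.1f}s against a {target:.1f}s target"
            )
    return findings


# Illustrative values: every resource looks healthy, yet checkout is 10x too slow.
findings = evaluate(
    utilization_pct={"cpu": 40.0, "memory": 55.0, "disk_io": 30.0},
    response_seconds={"checkout": 20.0},
    target_seconds={"checkout": 2.0},
)
for finding in findings:
    print(finding)
```

Looking only at the utilization figures in this example would suggest a passing test; the response time check is what exposes the deficiency.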