Typical Metrics Collected in Performance Testing
Why Performance Metrics Are Needed
Accurate measurements and the metrics that are derived from those measurements are essential for defining the goals of performance testing and for evaluating the results of performance testing. Performance testing should not be undertaken without first understanding which measurements and metrics are needed. The following project risks apply if this advice is ignored:
- It is unknown if the levels of performance are acceptable to meet operational objectives
- The performance requirements are not defined in measurable terms
- It may not be possible to identify trends that may predict lower levels of performance
- The actual results of a performance test cannot be evaluated by comparing them to a baseline set of performance measures that define acceptable and/or unacceptable performance
- Performance test results are evaluated based on the subjective opinion of one or more people
- The results provided by a performance test tool are not understood
Collecting Performance Measurements and Metrics
As with any form of measurement, metrics are most useful when they are precisely defined. Any of the metrics and measurements described in this section therefore can and should be defined so that they are meaningful in a particular context. This can be done by running initial tests to determine which metrics need to be adjusted and which need to be added.
For example, the metric of response time is likely to be included in any set of performance metrics. However, to be meaningful and actionable, the response time metric will need to be further defined in terms of time of day, number of concurrent users, amount of data being processed, and so forth. The metrics collected in a specific performance test will vary based on the
- business context (business processes, customer and user behavior, and stakeholder expectations)
- operational context (technology and how it is used)
- test objectives
For instance, the metrics used to test the performance of an international e-commerce site will be different from those used to test the performance of an embedded system used to control the functionality of a medical device.
One common way to group performance measurements and metrics is by the technical, business, or operational environment in which the performance needs to be measured.
The following are the most common types of measurements and metrics collected in performance testing.
Performance metrics will vary by the type of technical environment, as shown in the following list:
- Internet-of-Things (IoT)
- Desktop client devices
- Server-side processing
- The nature of software running in the environment (e.g., embedded)
The metrics include the following:
- Response time (e.g., per transaction, per concurrent user, page load times)
- Resource utilization (e.g., CPU, memory, network bandwidth, network latency, available disk space, I/O rate, idle and busy threads)
- Throughput rate of key transactions (i.e., the number of transactions that can be processed in a given period of time)
- Batch processing time (e.g., wait times, throughput times, database response times, completion times)
- The number of errors affecting performance
- Completion time (e.g., for creating, reading, updating, and deleting data)
- Background load on shared resources (especially in virtualized environments)
- Software metrics (e.g., code complexity)
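Two of the metrics above, response time and throughput, can be derived directly from raw per-transaction timings. The following sketch shows one way to do this; the sample durations and the 90th-percentile choice are illustrative assumptions, not prescribed by this syllabus:

```python
# Sketch: deriving response-time and throughput metrics from raw
# per-transaction timing samples. All sample values are hypothetical.

def response_time_metrics(durations_ms):
    """Return min, average, and 90th-percentile response time (ms)."""
    ordered = sorted(durations_ms)
    idx_90 = max(0, int(round(0.90 * len(ordered))) - 1)
    return {
        "min_ms": ordered[0],
        "avg_ms": sum(ordered) / len(ordered),
        "p90_ms": ordered[idx_90],
    }

def throughput_per_second(transaction_count, elapsed_seconds):
    """Throughput: transactions processed per unit of time."""
    return transaction_count / elapsed_seconds

samples = [120, 95, 180, 240, 110, 105, 130, 98, 160, 300]
print(response_time_metrics(samples))
print(throughput_per_second(len(samples), 2.0))  # 10 transactions in 2 s
```

Percentiles are often preferred over averages for response-time reporting because a single slow outlier can mask many acceptable transactions (and vice versa).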
From the business or functional perspective, performance metrics may include the following:
- Business process efficiency (e.g., the speed of performing an overall business process, including normal, alternate, and exceptional use case flows)
- Throughput of data, transactions, and other units of work performed (e.g., orders processed per hour, data rows added per minute)
- Service Level Agreement (SLA) compliance or violation rates (e.g., SLA violations per unit of time)
- Scope of usage (e.g., percentage of global or national users conducting tasks at a given time)
- Concurrency of usage (e.g., the number of users concurrently performing a task)
- Timing of usage (e.g., the number of orders processed during peak load times)
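As a sketch of the SLA compliance metric above, a violation rate can be computed from measured response times against an agreed limit. The 2000 ms limit and the sample data below are hypothetical assumptions:

```python
# Sketch: computing an SLA violation rate from measured response times.
# The SLA limit (2000 ms) and the sample data are hypothetical.

SLA_LIMIT_MS = 2000

def sla_violation_rate(response_times_ms, limit_ms=SLA_LIMIT_MS):
    """Fraction of transactions whose response time exceeds the SLA limit."""
    violations = sum(1 for t in response_times_ms if t > limit_ms)
    return violations / len(response_times_ms)

times_ms = [850, 1200, 2100, 950, 3400, 1800, 700, 2600]
print(f"SLA violation rate: {sla_violation_rate(times_ms):.0%}")
```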
The operational aspect of performance testing focuses on tasks that are generally not considered to be user-facing in nature. These include the following:
- Operational processes (e.g., environment start-up, backup, shutdown, and resumption times)
- System restoration (e.g., the time required to restore data from a backup)
- Alerts and warnings (e.g., the time needed for the system to issue an alert or warning)
Selecting Performance Metrics
It should be noted that collecting more metrics than required is not necessarily a good thing. Each metric chosen requires a means for consistent collection and reporting. It is important to define an attainable set of metrics that support the performance test objectives.
For example, the Goal-Question-Metric (GQM) approach is a helpful way to align metrics with performance goals. The idea is to first establish the goals, then ask questions to know when the goals have been achieved. Metrics are associated with each question to ensure the answer to the question is measurable. (See Section 4.3 of the Expert Level Syllabus – Improving the Testing Process.) It should be noted that the GQM approach does not always fit the performance testing process. For example, some metrics represent a system's health and are not directly linked to goals.
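The GQM hierarchy can be pictured as a simple nested structure. The goal, questions, and metrics in this sketch are illustrative examples only, not content prescribed by the GQM approach:

```python
# Sketch: the Goal-Question-Metric (GQM) hierarchy as a nested structure.
# The goal, questions, and metrics shown are illustrative examples only.

gqm = {
    "goal": "Ensure the checkout process performs acceptably under peak load",
    "questions": [
        {
            "question": "Is checkout response time acceptable at peak load?",
            "metrics": ["90th-percentile response time (ms)",
                        "maximum response time (ms)"],
        },
        {
            "question": "Can the system sustain the expected peak throughput?",
            "metrics": ["checkout transactions per second",
                        "error rate under peak load (%)"],
        },
    ],
}

for q in gqm["questions"]:
    print(q["question"], "->", ", ".join(q["metrics"]))
```

Structuring metrics this way makes it explicit which goal each collected metric serves, which helps avoid collecting metrics that answer no question.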
It is important to realize that after the definition and capture of initial measurements, further measurements and metrics may be needed to understand true performance levels and to determine where corrective actions may be needed.
Aggregating Results from Performance Testing
The purpose of aggregating performance metrics is to be able to understand and express them in a way that accurately conveys the total picture of system performance.
When performance metrics are only viewed at the detailed level, it can be difficult to draw the correct conclusion, especially for business stakeholders.
For many stakeholders, the main concern is that the response time of a system, web site, or other test object is within acceptable limits.
Once deeper understanding of the performance metrics has been achieved, the metrics can be aggregated so that:
- Business and project stakeholders can see the “big picture” status of system performance
- Performance trends can be identified
- Performance metrics can be reported in an understandable way
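The aggregation described above might look like the following sketch, which condenses detailed per-transaction samples into summary figures per business task. The task names, sample data, and percentile choice are assumptions:

```python
# Sketch: aggregating per-transaction response times into summary
# statistics suitable for stakeholder reporting. Data is hypothetical.
import statistics

def aggregate(samples_ms):
    """Condense raw response-time samples into summary statistics."""
    ordered = sorted(samples_ms)
    return {
        "count": len(ordered),
        "mean_ms": statistics.mean(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[max(0, int(round(0.95 * len(ordered))) - 1)],
        "max_ms": ordered[-1],
    }

per_task = {
    "login": [110, 140, 95, 2100, 130],
    "search": [300, 280, 350, 310, 290],
}
for task, samples in per_task.items():
    print(task, aggregate(samples))
```

Note how the outlier in the "login" samples pulls the mean well above the median; reporting both helps stakeholders see the big picture without hiding isolated failures.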
Key Sources of Performance Metrics
The impact that collecting metrics has on system performance (known as the "probe effect") should be minimal. In addition, the volume, accuracy, and speed with which performance metrics must be collected make tool usage a requirement. While the combined use of tools is not uncommon, it can introduce redundancy in the usage of test tools and other problems (see Section 4.4).
There are three key sources of performance metrics:
Performance Test Tools
All performance test tools provide measurements and metrics as the result of a test. Tools may vary in the number of metrics shown, the way in which the metrics are shown, and the ability for the user to customize the metrics to a particular situation (see also Section 5.1).
Some tools collect and display performance metrics in text format, while more robust tools collect and display performance metrics graphically in dashboard format. Many tools offer the ability to export metrics to facilitate test evaluation and reporting.
Performance Monitoring Tools
Performance monitoring tools are often employed to supplement the reporting capabilities of performance test tools (see also Section 5.1). In addition, monitoring tools may be used to monitor system performance on an ongoing basis and to alert system administrators to lowered levels of performance and higher levels of system errors and alerts. These tools may also be used to detect and notify in the event of suspicious behavior (such as denial-of-service attacks and distributed denial-of-service attacks).
Log Analysis Tools
There are tools that scan server logs and compile metrics from them. Some of these tools can create charts to provide a graphical view of the data.
Errors, alerts, and warnings are normally recorded in server logs. These include:
- High resource usage, such as high CPU utilization, high levels of disk storage consumed, and insufficient bandwidth
- Memory errors and warnings, such as memory exhaustion
- Deadlocks and multi-threading problems, especially when performing database operations
- Database errors, such as SQL exceptions and SQL timeouts
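A minimal sketch of such a log scan, tallying occurrences of the error and warning categories listed above, is shown below. The log line format and the matching patterns are hypothetical assumptions; real log analysis tools support far richer formats:

```python
# Sketch: scanning server log lines and tallying error/warning categories.
# The log format and the matching patterns are hypothetical assumptions.
import re
from collections import Counter

PATTERNS = {
    "high_cpu": re.compile(r"CPU utilization (9\d|100)%"),
    "memory": re.compile(r"OutOfMemory|memory exhausted", re.IGNORECASE),
    "deadlock": re.compile(r"deadlock", re.IGNORECASE),
    "sql": re.compile(r"SQLException|SQL timeout", re.IGNORECASE),
}

def tally(log_lines):
    """Count how many log lines match each error/warning category."""
    counts = Counter()
    for line in log_lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[category] += 1
    return counts

log = [
    "2024-05-01 10:02:13 WARN CPU utilization 97% on app-server-1",
    "2024-05-01 10:02:44 ERROR java.sql.SQLException: SQL timeout",
    "2024-05-01 10:03:01 ERROR Deadlock detected in order_update",
]
print(tally(log))
```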
Typical Results of a Performance Test
In functional testing, particularly when verifying specified functional requirements or functional elements of user stories, the expected results can usually be defined clearly and the test results interpreted to determine if the test passed or failed. For example, a monthly sales report shows either a correct or an incorrect total.
Unlike tests that verify functional suitability, which often benefit from well-defined test oracles, performance testing often lacks this source of information. Not only are stakeholders notoriously bad at articulating performance requirements, but many business analysts and product owners are equally bad at eliciting them. Testers often receive limited guidance to define the expected test results.
When evaluating performance test results, it is important to look at the results closely.
Initial raw results can be misleading, with performance failures hidden beneath apparently good overall results. For example, resource utilization may be well under 75% for all key potential bottleneck resources, but the throughput or response time of key transactions or use cases is an order of magnitude too slow.
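This kind of evaluation can be sketched as a check that flags transactions breaching their response-time target even though resource utilization looks healthy. All thresholds, transaction names, and values below are hypothetical:

```python
# Sketch: evaluating raw results against both resource-utilization and
# response-time criteria. Thresholds and sample data are hypothetical.

UTILIZATION_LIMIT = 0.75    # resources look "healthy" below this
RESPONSE_TARGET_S = 1.0     # per-transaction response-time target

results = {
    "cpu_utilization": 0.55,
    "memory_utilization": 0.60,
    "response_times_s": {"checkout": 0.8, "search": 12.0},
}

resources_ok = all(v < UTILIZATION_LIMIT
                   for k, v in results.items()
                   if k.endswith("_utilization"))
slow = {name: t for name, t in results["response_times_s"].items()
        if t > RESPONSE_TARGET_S}

if resources_ok and slow:
    print("Resource utilization looks fine, but these transactions miss "
          f"their response-time target: {slow}")
```

Here utilization is comfortably below the limit, yet one transaction is an order of magnitude too slow, exactly the kind of failure that an overall "green" summary can hide.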
The specific results to be evaluated vary depending on the tests being run, and often include those discussed in Section 2.1.