Principles of Performance Testing
Performance efficiency (or simply “performance”) is an essential part of providing a “good experience” for users when they use their applications on a variety of fixed and mobile platforms. Performance testing plays a critical role in establishing acceptable quality levels for the end user and is often closely integrated with other disciplines such as usability engineering and performance engineering.
Also, evaluating functional suitability, usability, and other quality characteristics under load, such as during a performance test, may show load-specific problems that affect those characteristics.
Performance testing is not limited to the web-based domain, where the end user is the focus. It is also relevant to a wide range of application domains and system architectures, such as client-server, distributed, and embedded systems. The ISO 25010 [ISO 25000] Product Quality Model categorizes performance efficiency as a non-functional quality characteristic with the three sub-characteristics described below. Proper focus and prioritization depend on the risks assessed and the needs of the various stakeholders. Analyzing the test results may reveal additional areas of risk that must be addressed.
Time Behavior: Generally, the evaluation of time behavior is the most common performance testing objective. This aspect of performance testing examines the ability of a component or system to respond to user or system inputs within a specified time and under specified conditions. Measurements of time behavior may vary from the “end-to-end” time taken by the system to respond to user input, to the number of CPU cycles required by a software component to execute a particular task.
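As a minimal illustration of end-to-end time behavior measurement, the sketch below times repeated executions of a transaction with a high-resolution clock. This is only a sketch: `process_order` is a hypothetical stand-in for the operation under test, and a real harness would measure against the actual system under specified load conditions.

```python
import time

def process_order():
    # Hypothetical transaction under test; here it merely simulates work.
    time.sleep(0.01)

def measure_response_time(transaction, repetitions=5):
    """Return per-call elapsed times (seconds) for a transaction."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()   # high-resolution monotonic clock
        transaction()
        timings.append(time.perf_counter() - start)
    return timings

timings = measure_response_time(process_order)
print(f"max: {max(timings):.4f}s, avg: {sum(timings) / len(timings):.4f}s")
```

In practice the measured times would be compared against the specified response-time requirements rather than simply printed.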
Resource Utilization: If the availability of system resources is identified as a risk, specific performance tests can be conducted to investigate how those resources (e.g., limited RAM) are utilized.
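One way to observe utilization of a limited resource (here, heap memory) is Python's standard-library `tracemalloc` module, sketched below under the assumption that the workload can be invoked as a function; `build_report` is purely illustrative and not part of any real system.

```python
import tracemalloc

def build_report(rows):
    # Illustrative workload that allocates memory proportional to its input.
    return [{"id": i, "payload": "x" * 100} for i in range(rows)]

def peak_memory_bytes(workload, *args):
    """Run a workload and return its peak traced heap allocation in bytes."""
    tracemalloc.start()
    try:
        workload(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) bytes
    finally:
        tracemalloc.stop()
    return peak

print(f"peak heap usage: {peak_memory_bytes(build_report, 10_000):,} bytes")
```

A real resource utilization test would monitor the deployed system (CPU, memory, bandwidth) with external tooling rather than in-process tracing, but the principle of comparing measured peaks against available capacity is the same.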
Capacity: If issues with the system's behavior at the required capacity limits (such as numbers of users or volumes of data) are identified as a risk, performance tests may be conducted to evaluate the suitability of the system architecture.
Performance testing often takes the form of experimentation, which enables the measurement and analysis of specific system parameters. These experiments may be conducted iteratively in support of system analysis, design, and implementation, as well as to inform architectural decisions and help shape stakeholder expectations.
The following performance testing principles are particularly relevant:
- Tests must be aligned to the defined expectations of different stakeholder groups, in particular users, system designers, and operations staff.
- The tests must be reproducible. Statistically identical results (within a specified tolerance) must be obtained by repeating the tests on an unchanged system.
- The tests must yield results that are both understandable and can be readily compared to stakeholder expectations.
- Tests can be run on complete or partial systems, or on test environments that are representative of the production system, where resources allow.
- The tests must be practically affordable and executable within the timeframe set by the project.
All three of the above quality sub-characteristics will impact the ability of the system under test (SUT) to scale.
Types of Performance Testing
Different types of performance testing can be defined. Each of these may be applicable to a given project, depending on the objectives of the test.
Performance testing is an umbrella term that includes any kind of testing focused on the performance (responsiveness) of the system or component under different loads.
Load testing focuses on the ability of a system to handle increasing levels of anticipated realistic loads resulting from transaction requests generated by controlled numbers of concurrent users or processes.
Stress testing focuses on the ability of a system or component to handle peak loads that are at or beyond the limits of its anticipated or specified workloads. Stress testing is also used to evaluate a system’s ability to function correctly when the availability of resources such as computing capacity, bandwidth, and memory is reduced.
Scalability testing focuses on the ability of a system to meet future efficiency requirements, which may be beyond those currently required. The objective of these tests is to determine the system’s ability to grow (e.g., with more users and larger amounts of stored data) without violating the currently specified performance requirements or failing. Once the limits of scalability are known, threshold values can be set and monitored in production to provide a warning of problems that may be about to arise.
In addition, the production environment may be adjusted with appropriate amounts of hardware.
Spike testing focuses on the ability of a system to respond correctly to sudden bursts of peak loads and return afterwards to a steady state.
Endurance testing focuses on the stability of the system over a time frame specific to the system’s operational context. This type of testing verifies that there are no resource capacity problems (e.g., memory leaks, exhaustion of database connections or thread pools) that may degrade performance or cause failures at breaking points.
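One symptom endurance tests look for, steady memory growth over a long run, can be sketched as follows. The names are assumptions for illustration: `leaky_transaction` is a deliberately leaky stand-in that accumulates state it never releases, and real endurance monitoring would sample process-level metrics over hours or days.

```python
import tracemalloc

cache = []  # illustrative leak: entries are added but never evicted

def leaky_transaction():
    cache.append(list(range(100)))  # fresh allocation retained forever

def memory_growth(transaction, iterations, sample_every):
    """Run a transaction repeatedly, sampling traced heap size over time."""
    tracemalloc.start()
    samples = []
    for i in range(1, iterations + 1):
        transaction()
        if i % sample_every == 0:
            current, _ = tracemalloc.get_traced_memory()
            samples.append(current)
    tracemalloc.stop()
    return samples

samples = memory_growth(leaky_transaction, iterations=1000, sample_every=200)
print("steadily growing memory suggests a leak:", samples == sorted(samples))
```

A stable system would show the sampled values plateau after warm-up rather than climb monotonically.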
Concurrency testing focuses on the impact of situations where specific actions occur simultaneously (e.g., when large numbers of users log in at the same time).
Concurrency issues are notoriously difficult to find and reproduce, particularly when the problem occurs in an environment where testing has little or no control, such as production.
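Because concurrency defects are so hard to reproduce, test harnesses often force the simultaneity deliberately. The sketch below uses a synchronization barrier to release all simulated users at the same instant; `login` is a hypothetical shared-state action standing in for a real user operation.

```python
import threading

login_count = 0
lock = threading.Lock()

def login():
    # Hypothetical user action touching shared state.
    global login_count
    with lock:
        login_count += 1

def run_concurrent(action, users=50):
    """Release `users` threads through a barrier so the action fires simultaneously."""
    barrier = threading.Barrier(users)

    def worker():
        barrier.wait()  # all threads block here until the last one arrives
        action()

    threads = [threading.Thread(target=worker) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run_concurrent(login, users=50)
print(f"simultaneous logins completed: {login_count}")
```

Removing the lock in a sketch like this is a quick way to demonstrate how race conditions corrupt shared state under genuinely simultaneous access.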
Capacity testing determines how many users and/or transactions a given system will support and still meet the stated performance objectives. These objectives may also be stated with regard to the data volumes resulting from the transactions.
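Conceptually, capacity testing can be sketched as stepping up the simulated user count until a stated response-time objective is violated. The response model below is a deliberately simple, deterministic stand-in for real measurements, which would come from executing the load against the system under test.

```python
def simulated_response_ms(users):
    # Illustrative model: response time (ms) grows linearly with load.
    # A real test would measure this against the system under test.
    return 200 + 10 * users

def find_capacity(measure, objective_ms, step=10, max_users=1000):
    """Return the largest stepped user count whose response time meets the objective."""
    supported = 0
    for users in range(step, max_users + 1, step):
        if measure(users) > objective_ms:
            break  # objective violated: previous step is the supported capacity
        supported = users
    return supported

print(f"supported users: {find_capacity(simulated_response_ms, objective_ms=1000)}")
```

Real capacity tests would also repeat each step to average out variation and would track error rates alongside response times.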
Testing Types in Performance Testing
The principal testing types used in performance testing include static testing and dynamic testing.
Static testing activities are often more important for performance testing than for functional suitability testing. This is because so many critical performance defects are introduced in the architecture and design of the system. These defects can be introduced by misunderstandings or a lack of knowledge by the designers and architects. These defects can also be introduced because the requirements did not adequately capture the response time, throughput, or resource utilization targets, the expected load and usage of the system, or the constraints.
Static testing activities for performance can include:
- Reviews of requirements with a focus on performance aspects and risks
- Reviews of database schemas, entity-relationship diagrams, metadata, stored procedures, and queries
- Reviews of the system and network architecture
- Reviews of critical segments of the system code (e.g., complex algorithms)
As the system is built, dynamic performance testing should start as soon as possible.
Opportunities for dynamic performance testing include:
- During unit testing, including using profiling information to determine potential bottlenecks and dynamic analysis to evaluate resource utilization
- During component integration testing, across key use cases and workflows, especially when integrating different use case features or integrating with the “backbone” structure of a workflow
- During system testing of overall end-to-end behaviors under various load conditions
- During system integration testing, especially for data flows and workflows across key inter-system interfaces. In system integration testing, it is not uncommon for the “user” to be another system or machine (e.g., inputs from sensors and other systems)
- During acceptance testing, to build user, customer, and operator confidence in the proper performance of the system and to fine-tune the system under real-world conditions (but generally not to find performance defects in the system)
In higher test levels, such as system testing and system integration testing, the use of realistic environments, data, and loads is critical for accurate results (see Chapter 4).
In Agile and other iterative-incremental lifecycles, teams should incorporate static and dynamic performance testing into early iterations rather than waiting for final iterations to address performance risks.
If custom or new hardware is part of the system, early dynamic performance tests can be performed using simulators. However, it is good practice to start testing on the actual hardware as soon as possible, as simulators often do not adequately capture resource constraints and performance-related behaviors.
The Concept of Load Generation
In order to carry out the various types of performance testing described in Section 1.2, representative system loads must be modeled, generated, and submitted to the system under test. Loads are comparable to the data inputs used for functional test cases, but differ in the following principal ways:
- A performance test load must represent many user inputs, not just one
- A performance test load may require dedicated hardware and tools for generation
- The generation of a performance test load depends on the absence of functional defects in the system under test, since such defects may impact test execution
The efficient and reliable generation of a specified load is a key success factor when conducting performance tests. There are different options for load generation.
Load Generation via the User Interface
This may be an adequate approach if only a small number of users are to be represented and if the required numbers of software clients are available from which to enter required inputs. This approach may also be used in conjunction with functional test execution tools, but may rapidly become impractical as the numbers of users to be simulated increase. The stability of the user interface (UI) also represents a critical dependency. Frequent changes can have an impact on the repeatability of performance tests as well as the maintenance costs. Testing through the UI may be the most representative approach for end-to-end tests.
Load Generation using Crowds
This approach depends on the availability of a large number of testers who will represent real users. In crowd testing, the testers are organized such that the desired load can be generated. This may be a suitable method for testing applications that are reachable from anywhere in the world (e.g., web-based), and may involve the users generating a load from a wide range of different device types and configurations.
Although this approach may enable very large numbers of users to be utilized, the load generated will not be as reproducible and precise as other options and will be more complex to organize.
Load Generation via the Application Programming Interface (API)
This approach is similar to using the UI for data entry, but uses the application’s API instead of the UI to simulate user interaction with the system under test. The approach is therefore less sensitive to changes (e.g., delays) in the UI and allows the transactions to be processed in the same way as they would be if entered directly by a user via the UI. Dedicated scripts may be created that repeatedly call specific API routines and enable more users to be simulated compared to using UI inputs.
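A minimal sketch of API-level load generation is shown below, assuming the application's API can be wrapped in a callable; `place_order` is a hypothetical stand-in for a real API call, and a dedicated tool would add pacing, ramp-up, and response-time capture.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def place_order(order_id):
    # Hypothetical API call; a real script would invoke the application's API here.
    time.sleep(0.005)  # simulated round-trip latency
    return f"order-{order_id}:ok"

def generate_api_load(api_call, simulated_users, calls_per_user):
    """Drive the API from a pool of worker threads, one worker per simulated user."""
    with ThreadPoolExecutor(max_workers=simulated_users) as pool:
        futures = [
            pool.submit(api_call, user * calls_per_user + call)
            for user in range(simulated_users)
            for call in range(calls_per_user)
        ]
        return [f.result() for f in futures]

results = generate_api_load(place_order, simulated_users=5, calls_per_user=4)
print(f"{len(results)} API calls completed")
```

Because each worker thread mostly waits on I/O in a real test, a single load-generating machine can typically simulate far more users this way than by driving the UI.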
Load Generation using Captured Communication Protocols
This approach involves capturing user interaction with the system under test at the level of the communications protocol and then replaying these scripts to simulate potentially very large numbers of users in a repeatable and reliable manner. This tool-based approach is described in Sections 4.2.6 and 4.2.7.
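The replay idea can be sketched as follows, under the assumption that a capture tool has already produced a script of recorded requests; the script contents and `send_request` are illustrative stand-ins for a real recording and a real protocol-level send.

```python
# Hypothetical captured script: each entry is one recorded protocol request.
captured_script = [
    ("GET", "/catalog"),
    ("POST", "/cart/items"),
    ("POST", "/checkout"),
]

def send_request(method, path):
    # Stand-in for replaying one request over the real communications protocol.
    return 200  # a real replay would return the actual response status

def replay(script, virtual_users):
    """Replay the captured script once per simulated user, collecting status codes."""
    statuses = []
    for _ in range(virtual_users):
        for method, path in script:
            statuses.append(send_request(method, path))
    return statuses

statuses = replay(captured_script, virtual_users=10)
print(f"replayed {len(statuses)} requests, all OK: {all(s == 200 for s in statuses)}")
```

Real protocol-level tools also parameterize the captured data (e.g., unique user credentials and session tokens per virtual user) so that replayed requests are accepted by the system under test.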
Common Performance Efficiency Failure Modes and Their Causes
While there certainly are many different performance failure modes that can be found during dynamic testing, the following are some examples of common failures (including system crashes), along with typical causes:
Response time is slow at all load levels
In some cases, response is unacceptable regardless of load. This may be caused by underlying performance issues, including, but not limited to, bad database design or implementation, network latency, and other background loads. Such issues can be identified during functional and usability testing, not just performance testing, so test analysts should remain alert for them and report them.
Slow response under moderate-to-heavy load levels
In some cases, response degrades unacceptably with a moderate-to-heavy load, even when such loads are entirely within normal, expected, and allowed ranges. Underlying defects include saturation of one or more resources and varying background loads.
Degraded response over time
In some cases, response degrades gradually or severely over time. Underlying causes include memory leaks, disk fragmentation, increasing network load over time, growth of the file repository, and unexpected database growth.
Inadequate or graceless error handling under a heavy or over-limit load
In some cases, response time is acceptable, but error handling degrades at high and beyond-limit load levels. Underlying defects include insufficient resource pools, undersized queues and stacks, and time-out settings that are too short.
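The undersized-resource-pool cause can be contrasted with graceful handling in a small sketch: a bounded queue that rejects overflow work immediately, giving the caller an error it can act on instead of blocking or crashing. The pool size and request counts here are arbitrary illustration values.

```python
import queue

def submit_requests(pool_size, requests):
    """Offer requests to a bounded pool; return (accepted, rejected) counts."""
    work = queue.Queue(maxsize=pool_size)  # deliberately undersized resource pool
    accepted = rejected = 0
    for r in requests:
        try:
            work.put_nowait(r)  # fail fast instead of blocking under overload
            accepted += 1
        except queue.Full:
            rejected += 1       # graceful rejection: caller gets an immediate error
    return accepted, rejected

accepted, rejected = submit_requests(pool_size=8, requests=range(20))
print(f"accepted={accepted} rejected={rejected}")
```

Graceless handling would instead block indefinitely, time out unpredictably, or lose the request silently, which is exactly the behavior these tests aim to expose.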
Specific examples of the general types of failures listed above include:
- A web-based application that provides information about a company’s services does not respond to user requests within seven seconds (a general industry rule of thumb). The performance efficiency of the system cannot be achieved under specific load conditions.
- A system crashes or is unable to respond to user inputs when subjected to a sudden large number of user requests (e.g., ticket sales for a major sporting event). The capacity of the system to handle this number of users is inadequate.
- System response is significantly degraded when users submit requests for large amounts of data (e.g., when a large and important report is posted on a web site for download). The capacity of the system to handle the generated data volumes is insufficient.
- Batch processing is unable to complete before online processing is needed. The execution time of the batch processes exceeds the time period allowed.
- A real-time system runs out of RAM when parallel processes generate large demands for dynamic memory that cannot be released in time. The RAM is not dimensioned adequately, or requests for RAM are not adequately prioritized.
- A real-time system component A that supplies inputs to real-time system component B is unable to calculate updates at the required rate. The overall system fails to respond in time and may fail. Code modules in component A must be evaluated and modified (“performance profiling”) to ensure that the required update rates can be achieved.