Analysis, Design and Implementation of Performance Testing

Typical communication protocols

Communication protocols define a set of communication rules between computers and systems. In order to design tests to target specific parts of the system, the protocols must be understood.

Communication protocols are often described by the layers of the Open Systems Interconnection (OSI) model (see ISO/IEC 7498-1), although some protocols may fall outside this model. Protocols from layer 5 (session layer) to layer 7 (application layer) are typically used for performance testing. Common protocols include:

  • Database – ODBC, JDBC, other vendor-specific protocols.
  • Web service – SOAP, REST

In general, the OSI layer that receives the most attention in performance testing corresponds to the level of the architecture under test. For example, when testing a low-level embedded architecture, the lower-numbered layers of the OSI model are usually the focus.

Other protocols used in performance testing include:

  • Network – DNS, FTP, IMAP, LDAP, POP3, SMTP, Windows Sockets, CORBA.
  • Mobile – TruClient, SMP, MMS
  • Remote access – Citrix ICA, RTE
  • SOA – MQSeries, JSON, WSCL

It is important to understand the entire system architecture, as performance testing can be performed on a single system component (e.g., web server, database server) or on an entire system using end-to-end testing. Traditional 2-tier applications built with a client-server model specify the “client” as the GUI and primary user interface and the “server” as the backend database. These applications require the use of a protocol such as ODBC to access the database. With the development of web-based applications and multi-tier architectures, many servers are involved in processing information that is eventually displayed in the user’s browser.

An understanding of the relevant protocol is required for whichever part of the system is to be tested. For end-to-end testing in which user activity is emulated from the browser, a web protocol such as HTTP/HTTPS is used. In this way, interaction with the graphical user interface can be bypassed and the tests can focus on the communication and activities of the backend servers.


Transactions

Transactions describe the set of activities performed by a system from the time of initiation to the completion of one or more processes (requests, operations, or operational processes). Transaction response time can be measured for the purpose of evaluating system performance. During a performance test, these measurements are used to identify any components that require correction or optimization.

Simulated transactions may include a think time to better reflect the time of a real user performing an action (e.g., pressing the “SEND” button). The transaction response time plus the think time equals the elapsed time for that transaction.

The transaction response times recorded during the performance test show how this measurement changes under different loads on the system. The analysis may show no degradation under load, while other measurements may show severe degradation. By increasing the load and measuring the underlying transaction times, it is possible to correlate the cause of the degradation with the response times of one or more transactions.

Transactions can also be nested so that individual and aggregate activities can be measured. This can be useful, for example, in understanding the performance efficiency of an online ordering system. The tester may want to measure both the individual steps of the ordering process (e.g., searching for an item, adding the item to the cart, paying for the item, confirming the order) and the ordering process as a whole. By nesting transactions, both sets of information can be captured in one test.
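
As a minimal sketch of nested transactions, the following Python fragment times each step of an ordering process and the process as a whole. The `transaction` helper, the `step` placeholder, and the sleep durations are illustrative assumptions and not part of any particular load testing tool.

```python
import time
from contextlib import contextmanager

# Collected measurements: transaction name -> list of durations in seconds.
timings = {}

@contextmanager
def transaction(name):
    """Time the enclosed block and record the duration under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(name, []).append(time.perf_counter() - start)

def step(seconds):
    """Placeholder for a real request; here it only sleeps."""
    time.sleep(seconds)

def place_order():
    # The outer transaction covers the whole ordering process,
    # the inner transactions cover the individual steps.
    with transaction("order_total"):
        with transaction("search_item"):
            step(0.01)
        with transaction("add_to_cart"):
            step(0.01)
        with transaction("pay"):
            step(0.02)
        with transaction("confirm_order"):
            step(0.01)

place_order()
for name, values in timings.items():
    print(f"{name}: {values[0]:.3f} s")
```

Because the inner timers run inside the outer one, a single execution yields both the per-step times and the aggregate time.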

Identification of operational profiles

Operational profiles specify distinct patterns of interaction with an application, such as those of users or other system components. Multiple operational profiles can be specified for a given application. They can be combined to create the desired load profile for achieving specific performance test objectives.

The following key steps for identifying operational profiles are described in this section:

  1. Identify the data to be collected
  2. Collect the data using one or more sources
  3. Analyze the data to create the operational profiles

Identify data
Where users interact with the system under test, the following data are collected or estimated to model their operational profiles (i.e., how they interact with the system):

  • Different types of user personas and their roles (e.g., standard user, registered member, administrator, user groups with specific privileges).
  • Different generic tasks performed by these users/roles (e.g., browsing a website for information, searching a website for a specific product, performing role-specific activities). Note that these tasks are generally best modeled at a high level of abstraction (e.g., at the level of business processes or larger user stories).
  • Estimated number of users for each role/task per unit of time in a given time period. This information is also useful for later workload profiling.
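
One possible way to record such estimates and turn them into a relative mix is sketched below. The role names and the sessions-per-hour figures are hypothetical placeholders, not values from any real system.

```python
# Hypothetical estimates: sessions per hour for each role (illustrative values only).
sessions_per_hour = {
    "standard_user": 600,
    "registered_member": 300,
    "administrator": 100,
}

def profile_mix(estimates):
    """Return each role's share of the total load as a fraction between 0 and 1."""
    total = sum(estimates.values())
    if total == 0:
        raise ValueError("at least one role must have a non-zero estimate")
    return {role: count / total for role, count in estimates.items()}

mix = profile_mix(sessions_per_hour)
for role, share in mix.items():
    print(f"{role}: {share:.0%}")
```

The resulting shares can later be used to decide how many virtual users to assign to each operational profile.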

Collecting data
The above data can be collected from a number of sources:

  • Conducting interviews or workshops with stakeholders such as product owners, sales managers, and (potential) end-users. These conversations often shed light on the key operational profiles of users and provide answers to the basic question “Who is this application for?”.
  • Functional specifications and requirements (if available) are a valuable source of information about intended usage patterns that can also help identify user types and their operational profiles. When functional specifications are expressed as user stories, the standard format directly enables the identification of user types (i.e., As <a user type>, I want <a capability> to produce <a benefit>). Similarly, UML use case diagrams and descriptions identify the “actor” for the use case.
  • Analysis of usage data and metrics obtained from similar applications can support the identification of user types and provide initial indications of expected user numbers. It is recommended to use automatically monitored data (e.g., from a web administration tool). This includes monitoring logs and usage data from the current operational system if an upgrade of that system is planned.
  • Monitoring the behavior of users performing predefined tasks with the application can provide insight into the types of operational profiles to be modeled for performance testing. It is recommended that this task be coordinated with planned usability testing (especially if a usability lab is available).

Constructing Operational Profiles
The following steps are performed to identify and construct operational profiles for users:

  • A top-down approach is followed. Relatively simple, broad operational profiles are initially created, which are only further subdivided if necessary to achieve the performance test objectives.
  • Particular operational profiles may be singled out as relevant to performance testing if they involve tasks that are performed frequently, require critical (high-risk) or frequent transactions between different system components, or involve the transfer of potentially large amounts of data.
  • Operational profiles are reviewed and refined with key stakeholders before being used for load profiling.

The system under test is not always subject to user-imposed loads. Operational profiles may also be required for performance testing of the following types of systems (note that this list is not exhaustive):

Offline batch processing systems
The focus here is primarily on the throughput of the batch processing system and its ability to complete within a specified time period. The operational profiles focus on the types of processing required by batch processing systems. For example, the operational profiles for a stock trading system (which typically includes online and batch processing of transactions) may include those for payments, credential verification, and legal compliance checks for specific types of stock transactions. Each of these operational profiles would result in different paths through the entire batch process for a stock. The steps described above for identifying the operational profiles of user-based online systems can also be applied to batch processing.

Systems of systems
Components in a multisystem environment (which may also be embedded) respond to different types of input from other systems or components. Depending on the nature of the system under test, this may require modeling several different operational profiles to effectively represent the types of inputs provided by these supplier systems. This may require detailed analysis (e.g., of buffers and queues) in conjunction with system architects and based on system and interface specifications.

Create load profiles

A load profile specifies the activity that a component or system under test may experience in production. It consists of a specified number of instances that will perform the actions of the predefined operational profiles over a specified period of time. If the instances are users, the term “virtual users” is commonly used.

The most important information needed to create a realistic and repeatable load profile is:

  • The objective of the performance test (e.g., to evaluate system behavior under stress load).
  • Operational profiles that accurately reflect individual usage patterns
  • Known throughput and concurrency issues
  • The amount and time distribution with which the operational profiles should be executed so that the SUT experiences the desired load. Typical examples are:
    • Ramp-ups: Steadily increasing load (e.g., adding one virtual user per minute).
    • Ramp-downs: Steadily decreasing load
    • Incremental: Immediate changes in load (e.g., adding 100 virtual users every five minutes)
    • Predefined distributions (e.g., volume that mimics daily or seasonal business cycles)

The following example shows the creation of a load profile with the goal of creating stress conditions for the system under test (at or above the expected maximum that a system can handle).

The upper diagram shows a load profile consisting of a step input of 100 virtual users. These users perform the activities defined in operational profile 1 throughout the duration of the test. This is typical of many performance load profiles that represent a background load.

The middle diagram shows a load profile that consists of an increase to 200 virtual users that is sustained for two hours before decreasing again. Each virtual user performs the activities defined in operational profile 2.

The bottom diagram shows the load profile resulting from the combination of the two described above. The system under test is subjected to a three-hour load.
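
The combination of the two load profiles can be sketched numerically as follows. The text gives only the user counts, the two-hour steady state, and the three-hour total, so the 30-minute linear ramps and the function names are assumptions made for illustration.

```python
def background_users(minute):
    """Operational profile 1: 100 virtual users for the whole 180-minute test."""
    return 100 if 0 <= minute <= 180 else 0

def peak_users(minute, ramp=30, hold=120, peak=200):
    """Operational profile 2: linear ramp-up, steady state, linear ramp-down.

    The 30-minute ramps are an assumption chosen so the profile spans 3 hours.
    """
    if minute < 0 or minute > 2 * ramp + hold:
        return 0
    if minute < ramp:
        return round(peak * minute / ramp)
    if minute <= ramp + hold:
        return peak
    return round(peak * (2 * ramp + hold - minute) / ramp)

def combined_users(minute):
    """Total number of virtual users active at the given minute."""
    return background_users(minute) + peak_users(minute)

for minute in (0, 15, 30, 90, 150, 165, 180):
    print(minute, combined_users(minute))
```

Under these assumptions the system under test sees 100 users at the start, 300 users during the two-hour plateau, and 100 users again at the end.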

Throughput and concurrency analysis

It is important to understand different aspects of the workload: Throughput and Concurrency. To properly model operational and load profiles, both aspects should be considered.

System Throughput
System throughput is a measure of the number of transactions of a given type that the system processes in a unit of time. For example, the number of orders per hour or the number of HTTP requests per second. System throughput is distinct from network throughput, which is the amount of data transmitted over the network.

System throughput defines the load on the system. Unfortunately, for interactive systems, the number of concurrent users is often used instead of throughput to define the load. This is partly because the number is often easier to find, and partly because this is how load testing tools define load. Without defining operational profiles – what each user is doing and how intensely (which is also the throughput for one user) – the number of users is not a good measure of load. For example, if 500 users each run one short query per minute, the throughput is 30,000 queries per hour. If the same 500 users run the same queries, but only one per hour each, the throughput is 500 queries per hour. So the same 500 users produce a 60-fold difference in load and at least a 60-fold difference in the hardware requirements for the system.

Workload modeling is usually done by considering the number of virtual users (execution threads) and think time (delays between user actions). However, system throughput is also defined by processing time, and this time can increase as the load increases.

System throughput = [number of virtual users] / ([processing time] + [think time]).

So if processing time increases, throughput can decrease significantly, even if everything else stays the same.
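
The effect can be checked with a few lines of Python. The function name and the specific times are illustrative assumptions; the calculation itself is the throughput relation stated above.

```python
def system_throughput(virtual_users, processing_time_s, think_time_s):
    """Transactions per second for a fixed population of virtual users."""
    cycle = processing_time_s + think_time_s
    if cycle <= 0:
        raise ValueError("processing time plus think time must be positive")
    return virtual_users / cycle

# 500 users, 58 s think time, 2 s processing time: one request per user per minute.
baseline = system_throughput(500, 2.0, 58.0)
# Same users and think time, but processing time has grown to 62 s under load.
degraded = system_throughput(500, 62.0, 58.0)

print(f"baseline: {baseline * 3600:.0f} per hour")
print(f"degraded: {degraded * 3600:.0f} per hour")
```

With the same 500 virtual users and the same think time, the longer processing time halves the throughput from 30,000 to 15,000 transactions per hour.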

System throughput is an important consideration when testing batch processing systems. In this case, throughput is typically measured by the number of transactions that can be completed within a given time frame (e.g., a nightly batch processing window).

Concurrency
Concurrency is a measure of the number of concurrent/parallel execution threads. For interactive systems, it can also be the number of concurrent/parallel users. Concurrency is usually modeled in load testing tools by specifying the number of virtual users.

Concurrency is an important metric. It represents the number of parallel sessions, each of which can use its own resources. Even if the throughput is the same, the number of resources used may vary depending on concurrency. Typical test setups are closed systems (from the queueing theory point of view) where the number of users in the system is fixed (fixed population). In a closed system, if all users are waiting for the system to respond, no new users can be added. Many public systems are open systems – new users are added all the time, even if all current users are waiting for the system to respond.

Basic structure of a performance test script

A performance testing script should simulate a user or component activity that contributes to the load on the system under test (this may be the entire system or one of its components). It initiates requests to the server in the right order and at a certain pace.

The best method for creating scripts for performance testing depends on the type of load generation.

  • The traditional method is to record the communication between the client and the system or component at the protocol level and then play it back after the script has been parameterized and documented. Parameterization results in a scalable and maintainable script, but the task of parameterization can be time-consuming.
  • Recording at the GUI level typically involves recording GUI actions from a single client with a test execution tool and executing the script with the load generation tool to represent multiple clients.
  • Programming can be done via protocol requests (e.g., HTTP requests), GUI actions, or API calls. When programming scripts, the exact sequence of requests sent to and received from the real system must be determined, which is not necessarily trivial.

Usually, a script consists of one or more sections of code (written in a general-purpose programming language with some extensions or in a specialized language) or an object that the tool presents to the user in a graphical user interface. In both cases, the script contains server requests that generate load (e.g., HTTP requests) and programming logic that specifies exactly how these requests are invoked (e.g., in what order, at what time, with what parameters, and what should be checked). The more sophisticated the logic, the greater the need for a powerful programming language.

Overall structure
Often the script has an initialization section (where everything is prepared for the main part), main sections that can be executed multiple times, and a cleanup section (where the necessary steps are taken to end the test properly).
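
A skeleton of this structure might look as follows. The class and method names are hypothetical; real tools use their own conventions for the initialization, main, and cleanup sections.

```python
class OrderScript:
    """Skeleton of a virtual-user script; all names here are hypothetical."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.log = []

    def init(self):
        # Executed once: e.g., log in and load test data.
        self.log.append("login")

    def action(self, iteration):
        # Executed repeatedly: the measured business steps.
        self.log.append(f"order-{iteration}")

    def cleanup(self):
        # Executed once: e.g., log out and release resources.
        self.log.append("logout")

    def run(self, iterations):
        self.init()
        try:
            for i in range(iterations):
                self.action(i)
        finally:
            # Cleanup runs even if an iteration raises an error.
            self.cleanup()
        return self.log

print(OrderScript(user_id=1).run(iterations=3))
```

Keeping the cleanup in a `finally` block is one way to make sure the test ends properly even when an iteration fails.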

Data collection
To capture response times, timers should be added to the script to measure how long a request or combination of requests takes. The measured requests should correspond to a meaningful unit of logical work, such as a business transaction to add an item to an order or to submit an order.

It is important to understand what exactly is being measured: protocol-level scripts measure only server and network response time, while GUI scripts measure end-to-end time (though what exactly is measured depends on the technology used).

Result checking and error handling
An important part of the script is result checking and error handling. Even with the best load testing tools, the default error handling is usually minimal (e.g., checking the HTTP request return code), so it is recommended to add additional checks to verify what the requests actually return. Also, if cleanup is required in the event of an error, it will likely need to be done manually. It is recommended to use indirect methods to verify that the script is doing what it is supposed to do, such as checking the database to ensure that the correct information has been added.
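
As a rough sketch of such an additional check, the function below looks at the content of a response and not only at its return code. The expected confirmation text and the function name are assumptions chosen for illustration.

```python
def check_order_response(status_code, body):
    """Return a list of problems found in a response; an empty list means success.

    `status_code` and `body` stand in for whatever the load tool exposes.
    The expected marker text is a hypothetical example.
    """
    problems = []
    if status_code != 200:
        problems.append(f"unexpected status {status_code}")
    if "Order confirmed" not in body:
        problems.append("confirmation text missing")
    if "error" in body.lower():
        problems.append("error text present in body")
    return problems

# A response with a successful return code that still carries an error page.
print(check_order_response(200, "<h1>Error: session expired</h1>"))
```

In this example the return code alone would report success, while the content checks reveal that the order was never placed.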

Scripts can include other logic that sets rules for when and how server requests are made. An example of this is setting synchronization points by specifying that the script should wait for an event at that point before proceeding. Synchronization points can be used to ensure that a particular action is invoked concurrently by multiple virtual users, or to coordinate work between multiple scripts.
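
A synchronization point can be illustrated with a barrier from the Python standard library. This is only a sketch: the number of users and the recorded "request" are placeholders, and load testing tools typically provide their own rendezvous mechanism.

```python
import threading

USERS = 5
rendezvous = threading.Barrier(USERS)  # synchronization point for all users
released = []
lock = threading.Lock()

def virtual_user(user_id):
    # ... preparatory steps (e.g., login) would run here ...
    rendezvous.wait()  # block until all USERS threads have arrived
    with lock:
        released.append(user_id)  # stands in for the concurrent request

threads = [threading.Thread(target=virtual_user, args=(i,)) for i in range(USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(released))
```

No thread passes the barrier until all five have reached it, so the action that follows is invoked at approximately the same time by every virtual user.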

Performance testing scripts are software, so creating a performance testing script is a software development activity. It should include quality assurance and testing to verify that the script works as expected with all input data.

Implementing scripts for performance testing

Performance testing scripts are implemented based on the Performance Test Plan (PTP) and the load profiles. While the technical details of the implementation differ depending on the approach and tool used, the overall process remains the same. A performance script is created using an integrated development environment (IDE) or script editor to simulate a user or component behavior. Typically, the script is created to simulate a specific operational profile (although it is often possible to combine multiple operational profiles into one script with conditional statements).

Once the sequence of requests is determined, the script can be recorded or programmed, depending on the approach. Recording usually ensures that the script exactly simulates the real system, while programming relies on knowledge of the correct request sequence.

If protocol-level recording is used, an essential step after recording is, in most cases, to replace all recorded internal identifiers that define the context. These identifiers must be turned into variables whose values are taken from the responses to earlier requests in each run (e.g., a user identifier that is returned at login and must be supplied with all subsequent transactions). This is part of script parameterization and is sometimes referred to as “correlation”. In this context, the word correlation has a different meaning than in statistics (where it means a relationship between two or more things). Advanced load testing tools can do some of the correlation automatically, so it is transparent in some cases, but more complex cases may require manual correlation or additional correlation rules. Incorrect or missing correlation is the main reason why recorded scripts fail to play back.
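
Manual correlation can be sketched as follows: a dynamic value is extracted from one response and substituted into a later request. The response format, the `session_id` field, and the request template are hypothetical examples.

```python
import re

# Hypothetical login response; real systems use their own formats.
login_response = '<input type="hidden" name="session_id" value="A1B2C3">'

def extract_session_id(response_text):
    """Pull the dynamic session identifier out of a response."""
    match = re.search(r'name="session_id" value="([^"]+)"', response_text)
    if match is None:
        raise ValueError("session_id not found; correlation rule needs updating")
    return match.group(1)

def build_request(template, session_id):
    """Substitute the captured value for the placeholder in a recorded request."""
    return template.replace("{session_id}", session_id)

session_id = extract_session_id(login_response)
request = build_request("/cart/add?item=42&session={session_id}", session_id)
print(request)
```

Raising an error when the value cannot be found makes a broken correlation rule visible immediately, instead of letting the script send a stale recorded identifier.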

If multiple virtual users access the same data set with the same username (as is usually the case when replaying a recorded script without any further changes beyond the necessary correlation), this is an easy way to get misleading results. The data could be fully cached (copied from disk to memory to speed up access), and the results would be much better than in production (where such data can be read from disk). Using the same users and/or data can also cause concurrency issues (e.g., if data is locked while a user is updating it), and results would be much worse than in production because the software would wait until the lock is released before the next user can lock the data for updating.

Scripts and test harnesses should therefore be parameterized (i.e., fixed or recorded data should be replaced with values from a list of possible options) so that each virtual user uses a proper data set. The term “proper” here means that the data is different enough to avoid caching and concurrency issues, which is specific to the system, data, and test requirements. This further parameterization depends on the data in the system and how the system works with that data and is therefore usually done manually, although many tools provide support here.

There are cases when some data must be parameterized for the test to work more than once – for example, when an order is created and the order name must be unique. If the order name is not parameterized, the test will fail as soon as an attempt is made to create an order with an existing (recorded) name.
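
The two kinds of parameterization described above could be sketched like this. The account list, the naming scheme, and the use of a random run identifier are assumptions for illustration; in practice the data usually comes from a file or database prepared for the test.

```python
import itertools
import uuid

# Hypothetical test data; in practice this usually comes from a file or database.
accounts = ["user01", "user02", "user03", "user04"]

# Random identifier that distinguishes this run from earlier ones.
RUN_ID = uuid.uuid4().hex[:8]
_order_counter = itertools.count(1)

def account_for(virtual_user_index):
    """Assign accounts round-robin so that concurrent users do not all share one."""
    return accounts[virtual_user_index % len(accounts)]

def unique_order_name(virtual_user_index):
    """Build an order name that does not repeat within or across test runs."""
    return f"order-{RUN_ID}-vu{virtual_user_index}-{next(_order_counter)}"

for vu in range(3):
    print(account_for(vu), unique_order_name(vu))
```

If there are more virtual users than accounts, the round-robin assignment reuses accounts, so the data list should be sized to the planned number of virtual users.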

To match the operational profiles, think times should be inserted and/or adjusted (if recorded) to generate an appropriate number of requests/throughput.
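
The think time needed for a target throughput can be derived by rearranging the throughput relation given earlier in this document (throughput = virtual users / (processing time + think time)). The function below is a sketch under that assumption; the name and example values are illustrative.

```python
def required_think_time(virtual_users, target_throughput_per_s, processing_time_s):
    """Think time in seconds that yields the target throughput.

    Rearranged from: throughput = users / (processing time + think time).
    """
    if target_throughput_per_s <= 0:
        raise ValueError("target throughput must be positive")
    think = virtual_users / target_throughput_per_s - processing_time_s
    if think < 0:
        raise ValueError("target cannot be reached with this number of users")
    return think

# 100 virtual users, 5 requests per second overall, 2 s processing time per request.
print(required_think_time(100, 5, 2.0))
```

In this example each virtual user needs a think time of 18 seconds. Since processing time tends to grow under load, the value may need to be rechecked against measured response times.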

Once scripts have been created for the individual operational profiles, they are combined into a scenario that implements the entire load profile. The load profile controls how many virtual users are started with each script, when, and with what parameters. The exact implementation details depend on the particular load testing tool or harness.

Preparations for performance test execution

The main activities to prepare for performance test execution include:

  • Setting up the system under test
  • Deploying the environment
  • Setting up the load generation and monitoring tools and ensuring that all necessary information is collected

It is important that the test environment is as close as possible to the production environment. If this is not possible, there must be a clear understanding of the differences and how to transfer the test results to the production environment. Ideally, the real production environment and data will be used, but testing in a scaled-down environment can also help mitigate a number of performance risks.

It is important to remember that performance is a non-linear function of the environment. Thus, the further the environment is from the production standard, the more difficult it becomes to make accurate projections of production performance. The unreliability of the projections and the associated risk grow the less the test system resembles production.

The most important components of the test environment are the data, hardware and software configuration, and network configuration. The size and structure of the data can significantly affect load test results. Using a small sample size of data or a sample size with a different data complexity for performance testing can lead to misleading results, especially if the production system uses a large amount of data. It is difficult to predict how much the data size will affect performance before real testing is performed. The closer the test data is in size and structure to the production data, the more reliable the test results will be.

If data is created or modified during testing, it may be necessary to restore the original data before the next test cycle to ensure that the system is in the correct state.

If some parts of the system or some of the data are not available for performance testing for some reason, a workaround should be implemented. For example, a stub can be implemented to replace and emulate a third-party component responsible for credit card processing. This process is often referred to as “service virtualization” and there are specialized tools that support this process. The use of such tools is highly recommended to isolate the system under test.
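
A very small stub for the credit card example might look like the following. The interface (`authorize`), the fixed latency, and the decline rule are all assumptions; a real stub has to mirror the interface and timing of the component it replaces.

```python
import time

class CreditCardServiceStub:
    """Stand-in for a third-party payment service; behavior is entirely assumed."""

    def __init__(self, latency_s=0.05, decline_every=0):
        self.latency_s = latency_s          # simulated response time
        self.decline_every = decline_every  # decline every n-th call; 0 = never
        self.calls = 0

    def authorize(self, card_number, amount):
        # card_number is accepted for interface compatibility but not inspected.
        self.calls += 1
        time.sleep(self.latency_s)  # emulate the real service's delay
        declined = bool(self.decline_every) and self.calls % self.decline_every == 0
        return {"approved": not declined, "amount": amount}

stub = CreditCardServiceStub(latency_s=0.0, decline_every=3)
print([stub.authorize("4111", 10)["approved"] for _ in range(3)])
```

The simulated latency matters for performance testing: a stub that answers instantly can make the system under test appear faster than it would be with the real component.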

There are many ways to set up environments. For example, one of the following can be used:

  • Traditional internal (and external) test labs
  • Cloud as an environment using Infrastructure as a Service (IaaS), when some parts of the system or the entire system are deployed in the cloud
  • Cloud as an environment using Software as a Service (SaaS), when vendors provide the load testing service

Depending on the specific objectives and the systems under test, one test environment may be preferred over another. For example:

  • To test the impact of a performance improvement (performance tuning), an isolated lab environment may be a better option to detect even small changes introduced by the change.
  • For load testing the entire production environment from start to finish to ensure the system can handle the load without major issues, testing via the cloud or service may be more appropriate. (Note that this only works for SUTs that can be accessed from a cloud.)
  • To minimize costs when performance testing is time-limited, creating a test environment in the cloud may be a more economical solution.

Regardless of the deployment type, both hardware and software should be configured to match the test objective and plan. If the environment is the same as the production environment, it should be configured in the same way. However, if there are differences, the configuration may need to be adjusted to compensate for those differences. For example, if the test machines have less physical memory than the production machines, the software memory parameters (e.g., Java heap size) may need to be adjusted to avoid memory swapping.

Proper network configuration/emulation is important for global and mobile systems. For global systems (i.e., systems with globally distributed users or processing), one of the options may be to deploy load generators at the locations where users are located. For mobile systems, network emulation remains the most viable option because of the different types of networks that can be used. Some load testing tools have built-in network emulation tools and there are stand-alone tools for network emulation.

Load generation tools should be properly deployed and monitoring tools should be configured to capture all metrics required for testing. The list of metrics depends on the test objectives, but it is recommended to capture at least basic metrics for all tests.

Depending on the load, the specific tool or load generation approach, and the machine configuration, more than one load generation machine may be required. To verify the setup, the machines involved in load generation should also be monitored. This helps avoid a situation in which the intended load is not maintained because one of the load generators is running slowly.

Depending on the setup and tools used, the load testing tools may need to be configured to generate the appropriate load. For example, certain parameters can be set for browser emulation or IP spoofing (simulating that each virtual user has a different IP address) can be used.

Before testing is performed, the environment and setup must be validated. This is usually done by running a controlled series of tests and verifying the test results and that the monitoring tools are recording the important information.

A variety of techniques can be used to verify that the test is working as designed, including log analysis and database content review. Preparing for the test includes verifying that the required information is logged, that the system is in the correct state, and so on. For example, if the test significantly changes the state of the system (adding/changing information in the database), it may be necessary to reset the system to its original state before repeating the test.

Source: ISTQB®: Certified Tester Performance Testing Syllabus Version 1.0
