Machine Learning (ML)

Forms of ML

ML algorithms can be categorized as:

  • Supervised learning,
  • Unsupervised learning and
  • Reinforcement learning.

Supervised Learning

In this type of learning, the algorithm creates the ML model in the training phase from labeled data. The labeled data, which usually consists of input/output pairs (e.g., a picture of a dog and the label “dog”), is used by the algorithm during training to infer the relationship between the input data (e.g., pictures of animals) and the output labels (e.g., “dog” and “cat”). In the testing phase, a new set of unseen data is applied to the trained model to predict the output. The model is deployed once the accuracy of the output is satisfactory.

Problems solved by supervised learning are divided into two categories:

  • Classification: This is used when the problem requires that an input be classified into one of several predefined classes. Face recognition and object recognition in an image are examples of problems where classification is used.
  • Regression: This is used when the problem requires the ML model to predict a numerical output. Predicting a person’s age based on input data about their habits or predicting future stock prices are examples of problems where regression is used.

Note that the term regression in the context of an ML problem is not the same as regression in software testing, where it describes software changes introducing change-related defects.
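A minimal sketch of both categories in plain Python, assuming no ML library. The toy data sets and the function names `nearest_neighbor_classify` and `fit_line` are hypothetical: a 1-nearest-neighbor classifier stands in for classification, and an ordinary least-squares line stands in for regression. Both are built from labeled pairs and then applied to an unseen input, mirroring the training and testing phases described above.

```python
import math

# Labeled training data for classification: (features, label) pairs.
# Hypothetical features: (weight in kg, ear length in cm).
train_animals = [
    ((30.0, 10.0), "dog"),
    ((25.0, 9.0), "dog"),
    ((4.0, 6.0), "cat"),
    ((5.0, 7.0), "cat"),
]

def nearest_neighbor_classify(train, x):
    """Classification: return the label of the closest training example."""
    return min(train, key=lambda pair: math.dist(pair[0], x))[1]

# Labeled training data for regression: (input, numeric output) pairs.
# Hypothetical: hours of exercise per week vs. resting heart rate.
hours = [0.0, 2.0, 4.0, 6.0]
heart_rate = [80.0, 74.0, 68.0, 62.0]

def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Testing phase: apply the trained models to unseen inputs.
print(nearest_neighbor_classify(train_animals, (28.0, 9.5)))  # dog
slope, intercept = fit_line(hours, heart_rate)
print(slope * 3.0 + intercept)  # 71.0
```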

Unsupervised Learning

In this type of learning, the algorithm creates the ML model during the training phase from unlabeled data. During training, the algorithm uses the unlabeled data to infer patterns in the input data and assigns the inputs to different classes based on their similarities. In the testing phase, the trained model is applied to a new set of unseen data to predict which classes the input data should be assigned to. The model is deployed once the accuracy of the output is deemed satisfactory.

Problems solved by unsupervised learning fall into two categories:

  • Clustering: This is when the problem requires identifying similarities in the input data points so that they can be grouped based on common features or attributes. Clustering is used, for example, to categorize different types of customers for marketing purposes.
  • Association: This is a problem that requires identifying interesting relationships or dependencies between data attributes. For example, a product recommendation system can identify associations based on customers’ shopping behavior.
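The two unsupervised categories can be sketched in plain Python as follows. The customer and basket data are invented for illustration, and the helper names are hypothetical. Clustering uses a basic k-means loop with fixed starting centroids (so the result is repeatable); association only counts co-occurring item pairs and computes support and confidence, whereas a full association-rule miner such as Apriori would also search larger item sets.

```python
import math
from collections import Counter
from itertools import combinations

def k_means(points, centroids, iterations=10):
    """Clustering: assign each point to its nearest centroid, then move each
    centroid to the mean of its points; repeat a fixed number of times."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid unchanged if no point was assigned to it.
        centroids = [tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

def pair_support(baskets):
    """Association: fraction of baskets in which each pair of items co-occurs."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

# Hypothetical customers: (store visits per month, average basket in EUR).
customers = [(1, 20), (2, 25), (1, 22), (8, 90), (9, 85), (10, 95)]
centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])

baskets = [{"bread", "butter", "milk"}, {"bread", "butter"}, {"bread", "jam"}, {"milk", "cereal"}]
print(pair_support(baskets)[("bread", "butter")])  # 0.5
print(confidence(baskets, "bread", "butter"))       # 0.6666666666666666
```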

Reinforcement Learning

Reinforcement learning is an approach in which the system (an “intelligent agent”) learns from experience by iteratively interacting with its environment. Reinforcement learning does not use training data. Instead, the agent is rewarded when it makes a correct decision and penalized when it makes an incorrect decision.
Setting up the environment, choosing the right strategy for the agent to achieve the desired goal, and designing a reward function are the main challenges in implementing reinforcement learning.
Robotics, autonomous vehicles, and chatbots are examples of applications that use reinforcement learning.
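Tabular Q-learning is one well-known reinforcement learning algorithm; the text above does not prescribe any particular one. The sketch below assumes a hypothetical five-state corridor in which the agent earns a reward of 1.0 for reaching the goal and a small penalty for every other move. No training data is involved: the agent only learns from the rewards returned by `step`. The values of `alpha`, `gamma`, and `epsilon` are illustrative assumptions.

```python
import random

N_STATES = 5          # states 0..4 in a 1-D corridor; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Environment (hypothetical): return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True      # reward for reaching the goal
    return next_state, -0.01, False       # small penalty for every other move

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate towards reward + discounted value of the next state.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

With the fixed seed, the learned policy is to move right (+1) in every non-goal state. Changing the reward values in `step` changes what the agent learns, which is why designing the reward function is listed above as a main challenge.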

ML Workflow

The activities in the machine learning workflow are:

Understanding the objective
The purpose of the ML model to be deployed must be understood and agreed upon with stakeholders to ensure alignment with business priorities. Acceptance criteria (including functional ML performance metrics; see Chapter 5) should be defined for the developed model.

Selection of a framework
The selection of an appropriate AI development framework should be based on the objectives, acceptance criteria, and business priorities (see Section 1.5).

Selection and creation of the algorithm
An ML algorithm is selected based on several factors, including the goals, acceptance criteria, and available data (see Section 3.4). The algorithm may be coded manually, but often it is retrieved from a library of previously written code. The algorithm is then compiled to prepare for training the model, if necessary.

Preparing and testing data
Data preparation (see Section 4.1) includes data collection, data preprocessing, and feature engineering. Exploratory data analysis (EDA) can be performed in parallel with these activities.
The data used by the algorithm and model is based on the goals and is used by all activities within the Model Building and Testing activity shown in Figure 1. For example, if the system is a real-time trading system, the data comes from the trading market.
The data used to train, tune, and test the model must be representative of the operational data that the model will process. In some cases, pre-collected data sets can be used for initial training of the model. Otherwise, the raw data usually needs preprocessing and feature engineering.
The data and any automated data preparation steps must be tested. For more details on testing input data, see Section 7.2.1.
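The data preparation and data testing steps can be sketched in plain Python. Everything here is a hypothetical illustration: the helper names, the 60/20/20 training/validation/test split, and min-max scaling as the feature engineering step. One design choice worth noting is that the scaling ranges are learned from the training set only, so that no information from the validation or test data leaks into training.

```python
import random

def check_rows(rows, allowed_labels):
    """Data test: report rows with missing feature values or unexpected labels."""
    problems = []
    for i, (features, label) in enumerate(rows):
        if any(value is None for value in features):
            problems.append(f"row {i}: missing feature value")
        if label not in allowed_labels:
            problems.append(f"row {i}: unexpected label {label!r}")
    return problems

def split(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle reproducibly, then cut into training, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * validation)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def fit_min_max(train_rows):
    """Learn per-feature (min, max) from the training set only."""
    columns = zip(*(features for features, _ in train_rows))
    return [(min(c), max(c)) for c in columns]

def scale(features, ranges):
    """Min-max scale one feature vector using ranges learned from training data."""
    return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                 for v, (lo, hi) in zip(features, ranges))

# Hypothetical raw data: (features, label) rows.
rows = [((float(i), float(i * i)), "odd" if i % 2 else "even") for i in range(10)]
print(check_rows(rows, {"odd", "even"}))   # []
train, validation, test = split(rows)
ranges = fit_min_max(train)
scaled_train = [(scale(f, ranges), label) for f, label in train]
print(len(train), len(validation), len(test))  # 6 2 2
```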

Training the model
The selected ML algorithm uses training data to train the model.
Some algorithms, such as those that generate a neural network, read the training data set multiple times. Each iteration of training on the training data set is called an epoch.
Parameters that define the model structure (e.g., the number of layers of a neural network or the depth of a decision tree) are passed to the algorithm. These parameters are called model hyperparameters.
Parameters that control training (e.g., how many epochs to use when training a neural network) are also passed to the algorithm. These parameters are called algorithm hyperparameters.
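The distinction between model hyperparameters and algorithm hyperparameters can be illustrated with a minimal batch gradient descent sketch in plain Python. The function names and values are hypothetical: `degree` fixes the structure of a polynomial model (a model hyperparameter), while `epochs` and `learning_rate` only control how training runs (algorithm hyperparameters). Each outer loop iteration is one epoch, i.e., one full pass over the training data.

```python
def train(data, degree=1, epochs=500, learning_rate=0.1):
    """Fit y ≈ w0 + w1*x + ... + w_degree*x**degree by batch gradient descent.

    degree               -> model hyperparameter (fixes the model structure)
    epochs, learning_rate -> algorithm hyperparameters (control training)
    """
    weights = [0.0] * (degree + 1)          # weights[0] is the bias term
    n = len(data)
    for _ in range(epochs):                 # one epoch = one full pass over the data
        grads = [0.0] * (degree + 1)
        for x, y in data:
            powers = [x ** k for k in range(degree + 1)]
            error = sum(w * p for w, p in zip(weights, powers)) - y
            for k, p in enumerate(powers):
                grads[k] += 2 * error * p / n   # gradient of the mean squared error
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights

def predict(weights, x):
    return sum(w * x ** k for k, w in enumerate(weights))

# Hypothetical training data following y = 2x + 1 on [0, 1].
data = [(x / 10, 2 * (x / 10) + 1) for x in range(11)]
weights = train(data, degree=1, epochs=2000, learning_rate=0.5)
print(weights, predict(weights, 0.25))
```

Because the training data follows y = 2x + 1, the learned weights approach a bias of 1 and a slope of 2. Setting `degree=2` would change the model itself, whereas changing `epochs` or `learning_rate` only changes how it is trained.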

Evaluation of the model
The model is evaluated against the agreed-upon functional ML performance metrics using the validation data set, and the results are then used to improve (tune) the model. Model evaluation and tuning should resemble a scientific experiment: carefully conducted under controlled conditions and clearly documented. In practice, several models are usually created and trained with different algorithms (e.g., random forests, SVMs, and neural networks), and the best model is selected based on the results of evaluation and tuning.

Tuning the model
The results of evaluating the model against the agreed-upon functional ML performance metrics are used to adjust the model settings to the data and so improve the model's performance.
Tuning the model can be done through hyperparameter tuning, where the training activity is modified (e.g., by changing the number of training steps or by changing the amount of data used for training) or attributes of the model are updated (e.g., the number of neurons in a neural network or the depth of a decision tree).
The three activities of training, evaluation, and tuning can be considered model generation, as shown in Figure 1.
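Evaluation and tuning can be sketched as a small grid search over one hyperparameter, the number of neighbors `k` of a hypothetical k-nearest-neighbor classifier, scored by accuracy on the validation set. The data is invented, with one deliberately mislabeled training point at 2.5 so that `k=1` overfits to it. Comparing different algorithms works the same way: each candidate model is scored on the validation set, never on the test set.

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """Hypothetical model: majority vote among the k nearest training examples."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def accuracy(train, dataset, k):
    """Functional performance metric used for evaluation."""
    return sum(knn_predict(train, x, k) == y for x, y in dataset) / len(dataset)

def tune_k(train, validation, candidates):
    """Evaluate each candidate k on the validation set and keep the best one."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    return max(scores, key=scores.get), scores

# Hypothetical 1-D data; the training point at 2.5 carries a noisy label.
train = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"), ((2.5,), "b"),
         ((6.0,), "b"), ((7.0,), "b"), ((8.0,), "b"), ((9.0,), "b")]
validation = [((2.4,), "a"), ((2.6,), "a"), ((7.5,), "b")]
best_k, scores = tune_k(train, validation, candidates=[1, 3, 5])
print(best_k, scores)  # 3 {1: 0.3333333333333333, 3: 1.0, 5: 1.0}
```

When several candidates tie, `max` keeps the first one in the candidate list, so the smaller `k` (3) is selected here.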

Testing the model
Once a model has been generated (i.e., it has been trained, evaluated, and tuned), it should be tested against an independent test data set to ensure that the agreed-upon functional ML performance criteria are met (see Section 7.2.2). If the model performs significantly worse on the independent test data than it did during evaluation, it may be necessary to select a different model.
In addition to functional performance tests, non-functional tests must also be performed, e.g., of the time needed to train the model and of the time and resources consumed to make a prediction. Typically, these tests are performed by the data engineer or data scientist, but testers with sufficient domain knowledge and access to the appropriate resources can also perform them.
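A sketch of the functional and non-functional checks described above, in plain Python. `model` is a hypothetical stand-in for a trained model, and the acceptance threshold, the latency budget, the validation accuracy of 0.95, and the 0.05 tolerance for a "significant" drop are all assumptions chosen for illustration. The latency check measures average wall-clock time per prediction with `time.perf_counter`.

```python
import time

def model(x):
    """Hypothetical stand-in for a trained model: a simple threshold classifier."""
    return "b" if x >= 5.0 else "a"

def functional_test(predict, test_set, min_accuracy):
    """Functional test: accuracy on an independent test set vs. the acceptance criterion."""
    correct = sum(predict(x) == y for x, y in test_set)
    accuracy = correct / len(test_set)
    return accuracy, accuracy >= min_accuracy

def latency_test(predict, inputs, max_seconds):
    """Non-functional test: average wall-clock time per prediction."""
    start = time.perf_counter()
    for x in inputs:
        predict(x)
    per_prediction = (time.perf_counter() - start) / len(inputs)
    return per_prediction, per_prediction <= max_seconds

def significant_drop(validation_accuracy, test_accuracy, tolerance=0.05):
    """Flag a test result noticeably worse than the evaluation result."""
    return validation_accuracy - test_accuracy > tolerance

# Hypothetical independent test set: (input, expected label).
test_set = [(1.0, "a"), (4.0, "a"), (6.0, "b"), (9.0, "b"), (4.9, "b")]
accuracy, passed = functional_test(model, test_set, min_accuracy=0.75)
latency, fast_enough = latency_test(model, [x for x, _ in test_set] * 1000, max_seconds=0.001)
print(accuracy, passed)                  # 0.8 True
print(significant_drop(0.95, accuracy))  # True -> consider selecting a different model
```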

Deploying the model
Once model development is complete, as shown in Figure 1, the tuned model usually needs to be re-engineered for deployment along with its associated resources, including the relevant data pipeline. This is usually accomplished through the framework. Deployment targets may include embedded systems and the cloud, where the model can be accessed via a web API.

Figure 1: ML Workflow

Use of the model
Once deployed, the model is typically part of a larger AI-based system and can be used operationally. Models can perform scheduled batch predictions at specific time intervals or run on demand in real time.

Monitoring and tuning the model
As the model is used, its situation may change and the model may drift from its intended performance (see Sections 2.3 and 7.6). To ensure that any such drift is detected and managed, the operational model should be evaluated periodically against its acceptance criteria.
It may prove necessary to update the model settings to address the drift, or it may be decided that retraining with new data is required to create a more accurate or robust model. In this case, a new model can be created and trained with updated training data. The new model can then be compared to the existing model in a form of A/B test (see Section 9.4).
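Periodic monitoring can be sketched as a rolling accuracy check against the acceptance criterion, assuming the correctness of recent predictions becomes known later (e.g., from user feedback). The history, the window size, and the 0.85 threshold are hypothetical. `naive_ab_compare` only compares raw accuracy; a real A/B test would also check whether the difference is statistically significant.

```python
def rolling_accuracy(outcomes, window=100):
    """Accuracy over the most recent `window` predictions (True = correct)."""
    recent = outcomes[-window:]
    return sum(recent) / len(recent)

def needs_attention(outcomes, acceptance_accuracy, window=100):
    """Periodic check of the operational model against its acceptance criterion."""
    accuracy = rolling_accuracy(outcomes, window)
    return accuracy, accuracy < acceptance_accuracy

def naive_ab_compare(outcomes_a, outcomes_b, min_samples=100):
    """Return 'A' (existing model), 'B' (new model), or None if either has too
    few labeled outcomes. Ties keep the existing model. No significance test."""
    if min(len(outcomes_a), len(outcomes_b)) < min_samples:
        return None
    acc_a = sum(outcomes_a) / len(outcomes_a)
    acc_b = sum(outcomes_b) / len(outcomes_b)
    return "B" if acc_b > acc_a else "A"

# Hypothetical history: accuracy falls from 90% to 60% as the situation changes.
history = [True] * 90 + [False] * 10 + [True] * 60 + [False] * 40
print(needs_attention(history[:100], acceptance_accuracy=0.85))  # (0.9, False)
print(needs_attention(history, acceptance_accuracy=0.85))        # (0.6, True)
```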

The ML workflow shown in Figure 1 is a logical sequence. In practice, the workflow is applied in such a way that the steps are repeated iteratively (e.g., when evaluating the model, it is often necessary to return to the training step and sometimes to the data preparation step).

The steps shown in Figure 1 do not include integration of the ML model with the non-ML parts of the overall system. ML models usually cannot be used in isolation and must be integrated with the non-ML parts. For example, in image processing applications, there is a data pipeline that cleans and modifies data before it is passed to the ML model. If the model is part of a larger AI-based system, it must be integrated into that system before it can be used. In this case, integration, system, and acceptance testing can be performed as described in Section 7.2.

Selecting a Form of ML

The following guidelines apply to the selection of an appropriate ML approach:

  • Sufficient training and test data should be available for the selected ML approach.
  • For supervised learning, it is necessary that the data be properly labeled.
  • If an output label is available, it may be supervised learning.
  • If the output is discrete and categorical, it may be classification.
  • If the output is numeric and continuous, it may be regression.
  • If the given data set has no output labels, it may be unsupervised learning.
  • If the problem is to group similar data, it may be clustering.
  • If the problem is to find co-occurring data items, it may be association.
  • Reinforcement learning is better suited for contexts where there is interaction with the environment.
  • When the problem involves multiple states and decisions must be made in each state, reinforcement learning may be applicable.
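The guidelines above can be encoded as a rough first-pass helper. This is a simplification under assumed inputs: the parameter names and string values are hypothetical, checking for environment interaction first is an assumed ordering, and the result is only a starting suggestion, since the guidelines themselves say "may be".

```python
def suggest_ml_form(has_output_labels, output_kind=None, goal=None,
                    interacts_with_environment=False):
    """Rough first suggestion based on the selection guidelines (not a rule).

    output_kind: "categorical" or "numeric" (hypothetical values)
    goal: "group similar items" or "find co-occurring items" (hypothetical values)
    """
    if interacts_with_environment:
        return "reinforcement learning"
    if has_output_labels:
        if output_kind == "categorical":
            return "supervised learning: classification"
        if output_kind == "numeric":
            return "supervised learning: regression"
        return "supervised learning"
    if goal == "group similar items":
        return "unsupervised learning: clustering"
    if goal == "find co-occurring items":
        return "unsupervised learning: association"
    return "unsupervised learning"

print(suggest_ml_form(True, output_kind="numeric"))        # supervised learning: regression
print(suggest_ml_form(False, goal="group similar items"))  # unsupervised learning: clustering
```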

Factors involved in the selection of ML algorithms

There is no clear-cut approach for selecting the optimal ML algorithm, ML model settings, and ML model hyperparameters. In practice, this set is selected based on a mix of the following factors:

  • The required functionality (e.g., whether the problem is classification into discrete categories or prediction of a continuous value)
  • The quality characteristics required, such as:
    • Accuracy (e.g., some models may be more accurate but also slower)
    • Limitations of available memory (e.g., in an embedded system)
    • The speed of training (and retraining) the model
    • The speed of prediction (e.g., for real-time systems)
    • Requirements for transparency, interpretability, and explainability
  • The type of data available for training the model (e.g., some models may only work with image data)
  • The amount of data available for training and testing the model (e.g., some models may tend to overfit to a greater degree than other models given a limited amount of data)
  • The number of features in the input data that are expected to be used by the model (e.g., other factors such as speed and accuracy are likely to be directly affected by the number of features)
  • The expected number of classes for clustering (e.g., some models may be inappropriate for problems with more than one class)
  • Previous experience
  • Trial and error

Overfitting and underfitting


Overfitting occurs when the model fits a set of data points too closely and fails to generalize. Such a model works very well with the data used to train it but may have difficulty providing accurate predictions for new data. Overfitting can occur when the model tries to fit every data point, even those that may be considered noise or outliers. It can also occur when the training data set does not contain enough data.


Underfitting occurs when the model is not sophisticated enough to accurately capture the patterns in the training data. Underfitted models are usually too simplistic and cannot provide accurate predictions for either new data or data very similar to the training data. One cause of underfitting can be a training data set that lacks features reflecting important relationships between inputs and outputs. It can also occur when the algorithm does not fit the data correctly (e.g., creating a linear model for non-linear data).
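Both effects can be shown with k-nearest-neighbor regression in plain Python. The data is hypothetical: the true relationship is assumed to be y = x, and the training targets carry a deterministic ±0.5 "noise" pattern. With `k=1` the model memorizes every training point, noise included, and reaches zero training error (overfitting); with `k` equal to the training set size it always predicts the mean and misses the trend (underfitting); an intermediate `k` generalizes better.

```python
def knn_regress(train, x, k):
    """Predict the mean target of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, dataset, k):
    """Mean squared error of k-NN predictions on `dataset`."""
    return sum((knn_regress(train, x, k) - y) ** 2 for x, y in dataset) / len(dataset)

# Hypothetical data: true relationship y = x; training targets carry ±0.5 noise.
train = [(float(x), x + (0.5 if x % 2 else -0.5)) for x in range(20)]
validation = [(x + 0.5, x + 0.5) for x in range(19)]

for k in (1, 4, len(train)):
    print(f"k={k}: training MSE={mse(train, train, k):.3f}, "
          f"validation MSE={mse(train, validation, k):.3f}")
```

For this data, `k=1` shows zero training error but a higher validation error than `k=4`, while `k=20` shows large errors on both sets.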

Source: ISTQB®: Certified Tester AI Testing (CT-AI) Syllabus Version 1.0
