6.1. Tutorial: Random Forest Classification

This tutorial covers Random Forest classification. Basic knowledge of SCP and of the Basic Tutorials is assumed.

Random Forest is a machine learning technique based on the iterative and random creation of decision trees (i.e. sets of rules and conditions that define a class).
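To make the idea concrete, the following is a minimal sketch in plain Python, not the SNAP implementation used by SCP. It assumes one-rule trees (decision stumps), each trained on a random resample of the training data and a randomly chosen band, with the final class decided by majority vote. All function names and the toy reflectance values are illustrative assumptions.

```python
import random
from collections import Counter

def train_stump(samples, rng):
    """Fit a one-rule tree on one randomly chosen feature (band)."""
    feature = rng.randrange(len(samples[0][0]))
    values = sorted({x[feature] for x, _ in samples})
    best = None
    # Try a threshold halfway between each pair of neighbouring values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = Counter(y for x, y in samples if x[feature] <= threshold)
        right = Counter(y for x, y in samples if x[feature] > threshold)
        (l_class, l_hits), (r_class, r_hits) = left.most_common(1)[0], right.most_common(1)[0]
        if best is None or l_hits + r_hits > best[0]:
            best = (l_hits + r_hits, threshold, l_class, r_class)
    if best is None:  # only one distinct value: always predict the majority class
        majority = Counter(y for _, y in samples).most_common(1)[0][0]
        return feature, float("inf"), majority, majority
    return (feature,) + best[1:]

def train_forest(samples, n_trees, seed=0):
    """Train n_trees stumps, each on a bootstrap resample of the samples."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(samples) for _ in samples]
        forest.append(train_stump(bootstrap, rng))
    return forest

def predict(forest, x):
    """Return the class that receives the most votes from the trees."""
    votes = Counter(left if x[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Toy training pixels: (NIR, SWIR) reflectance and class name (made-up values).
samples = [
    ((0.02, 0.010), "Water"), ((0.03, 0.015), "Water"), ((0.04, 0.020), "Water"),
    ((0.35, 0.150), "Vegetation"), ((0.40, 0.180), "Vegetation"), ((0.45, 0.200), "Vegetation"),
]
forest = train_forest(samples, n_trees=25)
print(predict(forest, (0.03, 0.02)), predict(forest, (0.40, 0.18)))
```

Real implementations grow much deeper trees and evaluate several candidate bands at each split; the sketch only shows the two sources of randomness (resampled pixels, random band) and the vote.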

WARNING: ESA SNAP is required. The ESA SNAP GPT executable must be defined in External programs settings.
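Before starting, it can be useful to confirm that the GPT executable is actually reachable. The sketch below is a hedged example using only the Python standard library; it assumes the executable is called gpt and is either on the system PATH or given as a full path. The helper name find_gpt is hypothetical and is not part of SCP or SNAP.

```python
import shutil

def find_gpt(candidate="gpt"):
    """Return the resolved path of the GPT executable, or None if not found.

    `candidate` may be a bare command name (looked up on PATH) or a full path.
    """
    return shutil.which(candidate)

path = find_gpt()
if path is None:
    print("GPT executable not found; set its path in the External programs settings")
else:
    print("GPT executable found at", path)
```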

The purpose of the classification is to identify the following land cover classes:

  1. Water;
  2. Built-up;
  3. Vegetation;
  4. Soil.

The following video shows this tutorial.

http://www.youtube.com/watch?v=FtHsGlLiNaw

6.1.1. Input Data

Any raster data can be used with Random Forest. In this tutorial, we use a subset of a Sentinel-2 satellite image (Copernicus land monitoring services), already converted to reflectance, with the bands listed in the following table.

Sentinel-2 Bands                Central Wavelength [micrometers]   Resolution [meters]
Band 2 - Blue                   0.490                              10
Band 3 - Green                  0.560                              10
Band 4 - Red                    0.665                              10
Band 5 - Vegetation Red Edge    0.705                              20
Band 6 - Vegetation Red Edge    0.740                              20
Band 7 - Vegetation Red Edge    0.783                              20
Band 8 - NIR                    0.842                              10
Band 8A - Vegetation Red Edge   0.865                              20
Band 11 - SWIR                  1.610                              20
Band 12 - SWIR                  2.190                              20

You can download the image from this archive (about 20 MB, © Copernicus Sentinel data 2020 downloaded from https://scihub.copernicus.eu/), and then unzip the downloaded file. Because the product is already converted to reflectance, no preprocessing is required in this case.

Start QGIS and the SCP.

Open the tab Band set by clicking the button bandset_tool in the SCP menu or the SCP dock. Click the button open_file, open the directory containing the input bands, and select all the .tif files. The selected bands are added to the active band set.

In the table Band set definition, sort the band names in ascending order (click order_by_name to sort bands by name automatically). Finally, select Sentinel-2 from the list Wavelength quick settings to automatically set the Center wavelength of each band and the Wavelength unit (required for spectral signature calculation).

Figure: Band set

We can display a Color Composite of the Near-Infrared, Red, and Green bands: in the Working toolbar, click the list RGB= and select the item 7-3-2 (corresponding to the band numbers in the Band set). The image colors in the map change according to the selected bands, and vegetation is highlighted in red (if the item 3-2-1 were selected, natural colors would be displayed). This color composite will be useful later for ROI creation.
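The numbers in an RGB= item are positions in the Band set, not Sentinel-2 band numbers. The short sketch below illustrates the lookup, assuming the Band set follows the order of the input band table; the names BAND_SET and composite are illustrative only.

```python
# Band set order, assuming it matches the input band table (position 1 first).
BAND_SET = [
    ("Band 2 - Blue", 0.490),
    ("Band 3 - Green", 0.560),
    ("Band 4 - Red", 0.665),
    ("Band 5 - Vegetation Red Edge", 0.705),
    ("Band 6 - Vegetation Red Edge", 0.740),
    ("Band 7 - Vegetation Red Edge", 0.783),
    ("Band 8 - NIR", 0.842),
    ("Band 8A - Vegetation Red Edge", 0.865),
    ("Band 11 - SWIR", 1.610),
    ("Band 12 - SWIR", 2.190),
]

def composite(rgb):
    """Return the band names for an 'R-G-B' string of 1-based Band set positions."""
    return [BAND_SET[int(position) - 1][0] for position in rgb.split("-")]

print(composite("7-3-2"))  # ['Band 8 - NIR', 'Band 4 - Red', 'Band 3 - Green']
print(composite("3-2-1"))  # ['Band 4 - Red', 'Band 3 - Green', 'Band 2 - Blue']
```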

Figure: Color composite RGB=7-3-2

Now we need to create the Training input in order to collect Training Areas (ROIs).

In the SCP dock, select the tab Training input and click the button new_file to create the Training input (define a name such as training.scp). The path of the file is displayed, and a vector layer with the same name as the Training input is added to the QGIS layers (to prevent data loss, do not edit this layer using QGIS functions).

Figure: Definition of Training input in SCP

6.1.2. Create the ROIs

ROIs must be created by manually drawing a polygon. You can also import polygons from a vector file using the tool Import vector.

WARNING: for compatibility with the SNAP software, only ROIs defined manually with a polygon are used for classification; region growing ROIs and spectral signatures are not used as training input.

We are going to create ROIs defining the Classes and Macroclasses. Each ROI is identified by a Class ID (i.e. C ID), and each ROI is assigned to a land cover class through a Macroclass ID (i.e. MC ID). Thus, we are going to create several ROIs for each macroclass (setting the same MC ID, but assigning a different C ID to every ROI). We are going to use the Macroclass IDs defined in the following table.

Macroclasses
Macroclass name    Macroclass ID
Water              1
Built-up           2
Vegetation         3
Soil               4
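The relation between ROIs, Class IDs, and Macroclass IDs can be sketched as a small data structure. This is only an illustration of the numbering scheme described above: the Roi type and the class names (Lake, River, and so on) are hypothetical and do not come from the tutorial data.

```python
from collections import defaultdict
from dataclasses import dataclass

# Macroclass IDs from the table above.
MACROCLASSES = {1: "Water", 2: "Built-up", 3: "Vegetation", 4: "Soil"}

@dataclass
class Roi:
    mc_id: int   # Macroclass ID, shared by all ROIs of one land cover class
    c_id: int    # Class ID, different for every ROI
    name: str    # hypothetical class name

rois = [
    Roi(1, 1, "Lake"), Roi(1, 2, "River"),
    Roi(2, 3, "Buildings"),
    Roi(3, 4, "Grass"), Roi(3, 5, "Forest"),
    Roi(4, 6, "Bare soil"),
]

def group_by_macroclass(rois):
    """Map each macroclass name to the Class IDs of its ROIs."""
    groups = defaultdict(list)
    for roi in rois:
        groups[MACROCLASSES[roi.mc_id]].append(roi.c_id)
    return dict(groups)

print(group_by_macroclass(rois))
# {'Water': [1, 2], 'Built-up': [3], 'Vegetation': [4, 5], 'Soil': [6]}
```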

Create a few ROIs and save them in the Training input.

Figure: Created ROIs

Please note that classification previews are not available with Random Forest.

6.1.3. Random Forest Classification

The Random Forest tool classifies a Band set using the ROI polygons in the Training input.

Open the tab Random Forest by clicking the button processing in the SCP menu or the SCP dock. In Select input band set, enter 1 because we are going to classify the first Band set.

Check Use MC ID in order to use the Macroclass ID code of the ROIs.

In Number of training samples, enter 5000 as the number of training data (pixels) randomly selected to train the model. You can increase this number if the ROI polygons are very large and cover more than 5000 pixels.
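The effect of this parameter can be illustrated with a small sketch. How SNAP draws its samples internally is not described here, so the following is only an assumption-based illustration using a uniform random draw without replacement; the function name and the (row, column) pixel representation are hypothetical.

```python
import random

def sample_training_pixels(roi_pixels, n_samples=5000, seed=0):
    """Randomly pick at most n_samples pixels from all pixels covered by ROIs."""
    if len(roi_pixels) <= n_samples:
        return list(roi_pixels)  # fewer pixels than requested: use them all
    return random.Random(seed).sample(roi_pixels, n_samples)

# Hypothetical ROI coverage: 100 x 80 = 8000 pixels, given as (row, column).
roi_pixels = [(row, col) for row in range(100) for col in range(80)]
training = sample_training_pixels(roi_pixels)
print(len(roi_pixels), len(training))  # 8000 5000
```

In this illustration, 3000 of the 8000 ROI pixels are left out of training, which is why a larger value can help when the ROI polygons are large.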

In Number of trees, enter 100 as the number of decision trees (a higher number allows for more accurate models, but it also increases the calculation time). Also check Evaluate classifier to report the evaluation of the classifier at the end of the process. You can ignore the option Evaluate feature power set.

TIP: You can save the classifier for later use (for instance, to classify a different input band set) by checking Save classifier, and later select Load classifier to open the previously saved classifier; when a saved classifier is loaded, no training input is required and the processing time is reduced.

Figure: Random Forest tool

Now click the button RUN and define the path of the classification output.

Figure: Random Forest classification

In addition, a confidence raster is created, which assesses the reliability of the model at pixel level (from 0, minimum, to 1, maximum).

We can see several classification errors, especially in pixels with low confidence values. For pixels with low confidence values, we need to create new ROIs.
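One common way to derive a per-pixel confidence from a forest is the share of trees that voted for the winning class. Whether SNAP computes its confidence raster exactly this way is an assumption, not something stated in this tutorial. The sketch below uses made-up votes from 10 trees and a hypothetical threshold of 0.6 to flag pixels that may need new ROIs.

```python
from collections import Counter

def classify_with_confidence(tree_votes):
    """Return (winning class, fraction of trees that voted for it)."""
    winner, count = Counter(tree_votes).most_common(1)[0]
    return winner, count / len(tree_votes)

# Made-up votes (Macroclass IDs) of 10 trees for three pixels (row, column).
votes_per_pixel = {
    (0, 0): [1] * 10,                      # unanimous: Water
    (0, 1): [3] * 7 + [4] * 3,             # mostly Vegetation
    (0, 2): [2] * 4 + [4] * 3 + [3] * 3,   # trees disagree
}

LOW_CONFIDENCE = 0.6  # hypothetical threshold
low_confidence_pixels = [
    pixel for pixel, votes in votes_per_pixel.items()
    if classify_with_confidence(votes)[1] < LOW_CONFIDENCE
]
print(low_confidence_pixels)  # [(0, 2)]
```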

Figure: Random Forest confidence

The evaluation report allows for assessing the performance of the model (not the accuracy of the whole classification). We can also read the feature importance score, which is the importance of each band in the Band set definition. For instance, we could try removing the bands with the lowest scores to reduce the computation time while obtaining similar results.
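Selecting which bands to drop can be sketched as follows. The scores below are hypothetical placeholders, not values from the evaluation report of this tutorial, and the helper name is illustrative; the real scores should be read from the report produced by the tool.

```python
def drop_weakest_bands(scores, n_drop):
    """Return the band names kept after removing the n_drop lowest-scoring bands."""
    ranked = sorted(scores, key=scores.get)  # lowest score first
    weakest = set(ranked[:n_drop])
    return [band for band in scores if band not in weakest]

# Hypothetical feature importance scores (NOT taken from the tutorial's report).
scores = {
    "Band 2": 0.05, "Band 3": 0.07, "Band 4": 0.12,
    "Band 8": 0.30, "Band 11": 0.26, "Band 12": 0.20,
}
print(drop_weakest_bands(scores, n_drop=2))
# ['Band 4', 'Band 8', 'Band 11', 'Band 12']
```

The remaining bands could then be placed in a new Band set and the classification repeated to compare results.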

Figure: Random Forest evaluation

Well done! We have performed a Random Forest classification of a remote sensing image.