Classification and Regression Tree (CART)

A sophisticated method to map vegetation based on training data and CART analysis.

Difficulty: 3
Technical level: 4
Expense: Variable
Scale: Variable
Accuracy: 4

(Ratings are given on a 1-5 scale.)

Method Overview

Object-based classification is a general term for a type of image classification applied to groups of pixels or “objects” (also known as image segments). These objects represent features on the ground, such as uniform stands of vegetation, water bodies, or meadows. Objects are created through a process known as segmentation, which is performed by specialized software such as Trimble’s eCognition.

Like other types of image classification, object-based classification divides an image into a set of discrete classes, such as vegetation types. It is applicable to classification hierarchies ranging from simple to complex, including those with vegetation types that are difficult to distinguish from one another.

There are a variety of methods for classifying objects, with some more sophisticated than others. In general, RSAC prefers classification and regression tree (CART)–type algorithms because they are robust, relatively easy to use, and reliably produce good results. Therefore, this article will focus on CART-based methods. For information on other object-based classification methods, please refer to the Additional Information section below.

CART analysis builds models called decision trees (named for their branching, tree-like structure) from training data. Decision trees can be used for classification (predicting which group a case belongs to) and for regression (predicting a continuous value). Beyond their predictive value, decision trees also provide insight into the relationships between dependent and independent variables. CART analysis can be performed with a variety of software packages; however, all of them require some customization to be integrated into the vegetation mapping process.
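The sketch below shows how a tree is grown from training data. It finds the single best split the way CART does for classification. For every input variable, it tries a threshold between each pair of neighboring values. It then keeps the split whose two child groups have the lowest weighted Gini impurity.

This is a minimal Python illustration under simplifying assumptions (numeric inputs only, one split). It is not the code used by See5, Cubist, or R. The function names and the example NDVI and elevation values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a group of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature index, threshold, weighted Gini) for the best binary split.

    Each candidate split sends cases with value <= threshold to the left branch
    and the rest to the right branch. Returns (None, None, gini(labels)) if no
    split lowers the impurity.
    """
    n = len(rows)
    best = (None, None, gini(labels))
    for f in range(len(rows[0])):
        values = sorted(set(row[f] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical training segments: [mean NDVI, mean elevation in meters].
rows = [[0.2, 1500], [0.3, 1600], [0.7, 1550], [0.8, 2100]]
labels = ["grass", "grass", "conifer", "conifer"]
print(best_split(rows, labels))  # splits on NDVI (feature 0) at about 0.5
```

Applying the same search recursively to each child group produces the full set of nodes and branches.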

Decision trees often over-fit the training data, producing an excessive number of nodes and branches, so once a tree is generated the user will usually prune it. Depending on the software, the user can either prune the tree interactively or experiment with parameters that set the thresholds for creating new branches. After pruning, the tree should be tested, ideally on an independent dataset or on data withheld from the original training set. Once an acceptable level of accuracy has been achieved, the decision tree rules (the model) can be applied.
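The grow, limit, and test cycle can be sketched in Python as follows. This is a simplified illustration, not any package's pruning algorithm:

- `max_depth` and `min_leaf` are hypothetical parameter names. They stand in for the branch thresholds that real tools expose. CART proper uses cost-complexity pruning.
- `holdout_accuracy` withholds a random share of the training cases and uses them to test the tree.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty group of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, max_depth, min_leaf, depth=0):
    """Grow a classification tree as nested dicts; leaves are class labels.

    max_depth and min_leaf stop growth early, a simple stand-in for the
    parameters that limit new branches.
    """
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(row[f] for row in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(rows) if row[f] <= t]
            right = [i for i, row in enumerate(rows) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # refuse branches that would create very small leaves
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          max_depth, min_leaf, depth + 1)

    return {"feature": f, "threshold": t, "left": grow(left), "right": grow(right)}


def predict(tree, row):
    """Follow the branches until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


def holdout_accuracy(rows, labels, test_fraction=0.3, seed=0, **params):
    """Withhold a random share of the cases, grow a tree on the rest,
    and return the proportion of withheld cases predicted correctly."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    tree = build_tree([rows[i] for i in train], [labels[i] for i in train], **params)
    correct = sum(predict(tree, rows[i]) == labels[i] for i in test)
    return correct / n_test


# Hypothetical one-variable example: low values are "sparse", high values "dense".
rows = [[x / 10] for x in range(20)]
labels = ["sparse" if x < 10 else "dense" for x in range(20)]
print(holdout_accuracy(rows, labels, max_depth=3, min_leaf=2))
```

Raising `min_leaf` or lowering `max_depth` yields smaller trees. The holdout accuracy shows whether the simpler tree still predicts well.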

CART analysis algorithms are not generally included in image processing software packages, but several commercial packages are available. RSAC has had success with the See5 and Cubist software packages. In addition, RSAC frequently uses the random forests classifier in the free statistical package R.

Random forests extends CART. Instead of a single decision tree, it builds many decision trees. Each tree is grown from a random (bootstrap) sample of the training data and considers a random subset of the input variables at each split. The prediction from each tree is tabulated, and the majority prediction is used for the classification. This approach greatly reduces over-fitting and removes the need for pruning, which simplifies the analysis process.
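A toy random forest makes the voting idea concrete. To keep the sketch short, each tree is a single-split "stump":

- Each stump is fitted to a bootstrap sample of the training cases.
- Each stump uses a randomly chosen subset of the input variables. With only one split per tree, sampling variables per tree is the same as sampling them per split.

Real random forests, such as the R implementation, grow much deeper trees. All function names and data here are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty group of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(rows, labels, features):
    """Fit a one-split tree using only the given feature indices; return a predictor."""
    majority = Counter(labels).most_common(1)[0][0]
    best = None
    for f in features:
        values = sorted(set(row[f] for row in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # sampled features were constant: fall back to the majority class
        return lambda row: majority
    _, f, t, left_class, right_class = best
    return lambda row: left_class if row[f] <= t else right_class


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and a random subset of variables."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        features = rng.sample(range(p), n_features)
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], features))
    return forest


def predict_forest(forest, row):
    """Tabulate every tree's prediction and return the majority class."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training set in which both variables separate the two classes.
rows = [[i, i * 10] for i in range(10)]
labels = ["shrub" if i < 5 else "forest" for i in range(10)]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [9, 90]))
```

Because each tree sees a different sample, individual trees make different mistakes, and the majority vote averages them out.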

CART classification is not limited to object-based analysis. It can also be applied to individual pixels and to continuous data such as slope (see the RSAC Riparian Mapping Tool).
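When the response is continuous, a regression tree scores candidate splits differently from a classification tree. It measures how much each split reduces the squared error of the response, rather than class impurity. The minimal sketch below assumes a single predictor (slope, in degrees) and a hypothetical continuous response such as percent canopy cover. The sample values are invented for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean: the error of one tree node."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(x, y):
    """Return (threshold, total squared error) for the split on x that best reduces error in y."""
    pairs = sorted(zip(x, y))
    best_t, best_err = None, sse(y)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        err = sse(left) + sse(right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


slope = [2, 4, 6, 20, 25, 30]     # hypothetical slope values (degrees)
cover = [80, 78, 82, 30, 35, 25]  # hypothetical percent canopy cover
print(best_regression_split(slope, cover))  # (13.0, 58.0)
```

Each leaf of a regression tree predicts the mean of its cases. Here that is 80 percent cover for slopes at or below 13 degrees and 30 percent above.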

Similar Methods

Data Inputs

CART classification can be performed on any digital image. It is frequently applied to satellite or aerial imagery, to vegetation indexes (e.g., normalized difference vegetation index [NDVI]) derived from such imagery, and to a variety of ancillary data such as topographic or climatic data. A high-quality, comprehensive training dataset is also required.
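NDVI, one of the derived inputs mentioned above, is computed pixel by pixel from the near-infrared and red bands as (NIR - Red) / (NIR + Red). The minimal sketch below assumes the two bands are co-registered grids of equal size held as nested Python lists. In practice, an image processing package or raster library would compute the index over whole arrays.

```python
def ndvi(nir, red):
    """Pixel-by-pixel NDVI = (NIR - Red) / (NIR + Red).

    nir and red are same-sized grids (lists of rows). Pixels where
    NIR + Red == 0 are returned as None to avoid dividing by zero.
    """
    return [
        [None if n + r == 0 else (n - r) / (n + r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir, red)
    ]


# Hypothetical 2 x 2 band values.
print(ndvi([[50, 30], [0, 60]], [[10, 30], [0, 20]]))  # [[0.666..., 0.0], [None, 0.5]]
```

Values near 1 indicate dense green vegetation, and values near 0 or below indicate bare ground, water, or other non-vegetated surfaces.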

Method Products

This method produces a new layer with objects assigned to unique vegetation classes.

Workflow

RSAC uses a five-step process for object-based classification using CART methods:

Segmentation (object creation).
Calculation of zonal statistics.
Training data preparation.
Classification.
Classification check and manual editing.

Segmentation

Segmentation is the process of dividing an image into spectrally and spatially cohesive objects that represent features on the ground. These objects can be vegetation patches of similar physiognomy, structure, and floristics, or other uniform features such as lakes and roads. Segmentation is performed with a specialized image-processing package, and several such packages are available. RSAC evaluated many of these packages, and Trimble's eCognition 8.64 performed best. However, Berkeley Image Segmentation also yielded reasonably good results for a significantly lower price. Currently, eCognition is available to all Forest Service regional offices.

Using eCognition, the user can influence the size and shape of the segments by adjusting the software’s scale, color, and shape settings to obtain segments that match ground features. The settings that produce good segments vary from image to image, requiring some experimentation until the proper settings are found.
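eCognition's segmentation algorithms are built into that commercial package. The sketch below therefore uses a much simpler technique, region growing, only to illustrate how a single setting controls segment size. It is not how eCognition or Berkeley Image Segmentation work.

- Adjacent pixels join the same segment when their values differ by no more than `max_diff`.
- `max_diff` is a hypothetical parameter that plays a role loosely similar to a scale setting: larger values produce fewer, larger segments.
- The sketch works on one band stored as a nested list.

```python
from collections import deque


def region_grow(image, max_diff):
    """Label 4-connected regions; a pixel joins a region when its value differs
    from an adjacent pixel already in that region by at most max_diff.
    Returns a grid of integer segment IDs starting at 0."""
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_id = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # pixel already belongs to a segment
            labels[r0][c0] = next_id
            queue = deque([(r0, c0)])
            while queue:  # breadth-first growth from the seed pixel
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == -1
                            and abs(image[nr][nc] - image[r][c]) <= max_diff):
                        labels[nr][nc] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return labels


# Hypothetical single-band image with a dark patch and a bright patch.
image = [[10, 11, 50],
         [12, 10, 52],
         [11, 49, 51]]
print(region_grow(image, 5))  # [[0, 0, 1], [0, 0, 1], [0, 1, 1]]
```

With `max_diff=0`, no two adjacent pixels in this image are equal, so every pixel becomes its own segment. This is the kind of trial-and-error tuning described above.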

Figure 1: Example segmentation in two areas with (A) higher vegetation density and (B) lower vegetation density (from Hamilton and others, 2007).

Calculation of Zonal Statistics

Object-based classification can take advantage of a variety of data beyond remote sensing imagery, such as elevation data in the form of a digital elevation model (DEM). Each input layer must be summarized for every segment by computing zonal statistics, typically the mean. For example, a representative elevation for a segment is calculated by averaging the values of all the pixels the segment covers. Object-based classification can also incorporate thematic data such as ecotype; if a segment covers more than one thematic class, the dominant class is assigned to the segment. The CART classification is then performed on the zonal statistics of each segment.
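Zonal statistics can be sketched with two small functions:

- `zonal_mean` averages a continuous layer, such as a DEM, over each segment.
- `zonal_majority` assigns each segment the dominant class of a thematic layer, such as ecotype.

This minimal Python sketch assumes the segment IDs and the input layer are same-sized grids stored as nested lists. GIS and image processing packages provide equivalent zonal tools.

```python
from collections import Counter, defaultdict


def zonal_mean(segments, values):
    """Mean of a continuous layer over the pixels of each segment ID."""
    sums, counts = defaultdict(float), defaultdict(int)
    for seg_row, val_row in zip(segments, values):
        for seg, v in zip(seg_row, val_row):
            sums[seg] += v
            counts[seg] += 1
    return {seg: sums[seg] / counts[seg] for seg in sums}


def zonal_majority(segments, classes):
    """Dominant thematic class in each segment (ties go to the class seen first)."""
    tallies = defaultdict(Counter)
    for seg_row, cls_row in zip(segments, classes):
        for seg, c in zip(seg_row, cls_row):
            tallies[seg][c] += 1
    return {seg: tally.most_common(1)[0][0] for seg, tally in tallies.items()}


# Hypothetical inputs: segment IDs, a DEM (meters), and an ecotype layer.
segments = [[0, 0, 1], [0, 0, 1], [0, 1, 1]]
dem = [[1500, 1510, 1800], [1505, 1495, 1810], [1490, 1790, 1800]]
ecotype = [["A", "A", "B"], ["A", "B", "B"], ["A", "B", "B"]]
print(zonal_mean(segments, dem))          # {0: 1500.0, 1: 1800.0}
print(zonal_majority(segments, ecotype))  # {0: 'A', 1: 'B'}
```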

Object-based classification is particularly well suited to high-resolution imagery because it reduces the pixel-level variability that can confuse an image classifier. For example, in very high-resolution imagery, shadows, branches, and variations in the foliage of a stand of trees are all represented in individual pixels. Computing the zonal mean for the entire stand reduces this variability.

Training Data Preparation

Unlike pixel-based classification, training data collected for object-based classification must represent the land cover within an entire object. These data can be collected in the field or interpreted from imagery (see the Vegetation Mapping Prerequisites page for special considerations about collecting training data). CART algorithms develop their models based on the zonal statistics of the training objects.
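Once zonal statistics exist for every segment, preparing training data amounts to pairing each labeled training segment with its statistics. This minimal sketch assumes two hypothetical dictionary layouts:

- `zonal_stats` maps a segment ID to its variable values.
- `training_labels` maps the IDs of field-visited or photo-interpreted segments to their vegetation class.

The output is the table of rows and labels that a tree-building routine consumes.

```python
def build_training_table(zonal_stats, training_labels):
    """Pair each labeled segment's zonal statistics with its class.

    zonal_stats:     {segment_id: {variable_name: value}}
    training_labels: {segment_id: class_name}
    Returns (variable_names, rows, labels) with rows ordered by segment ID.
    """
    names = sorted(next(iter(zonal_stats.values())))  # fixed column order
    rows, labels = [], []
    for seg_id, cls in sorted(training_labels.items()):
        if seg_id not in zonal_stats:
            raise KeyError(f"training segment {seg_id} has no zonal statistics")
        stats = zonal_stats[seg_id]
        rows.append([stats[name] for name in names])
        labels.append(cls)
    return names, rows, labels


# Hypothetical zonal statistics for three segments, two of which were visited in the field.
zonal_stats = {
    0: {"mean_ndvi": 0.2, "mean_elev": 1500.0},
    1: {"mean_ndvi": 0.7, "mean_elev": 1800.0},
    2: {"mean_ndvi": 0.8, "mean_elev": 2900.0},
}
training_labels = {1: "conifer", 0: "grass"}
print(build_training_table(zonal_stats, training_labels))
```

Segments without a training label, such as segment 2 here, are left out of the table. They are the segments the model will later classify.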

Segment Classification

Segments are classified by using CART algorithms to identify patterns in the training dataset and to build a decision tree model from those patterns. The decision tree is then used to predict the vegetation class of each segment from the zonal statistics computed from the imagery and other available data.
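The sketch below applies a finished tree to every segment, then writes the predicted classes back onto the segment grid to form the output layer. The nested-dictionary tree format, the variable names, and the class names are hypothetical. Packages such as See5 or R store and apply their models in their own formats.

```python
def predict(tree, stats):
    """Walk a tree stored as nested dicts until a leaf (a class name) is reached.
    Internal nodes look like {"var": name, "threshold": t, "left": ..., "right": ...}."""
    node = tree
    while isinstance(node, dict):
        node = node["left"] if stats[node["var"]] <= node["threshold"] else node["right"]
    return node


def classify_segments(tree, zonal_stats):
    """Predict a vegetation class for every segment from its zonal statistics."""
    return {seg_id: predict(tree, stats) for seg_id, stats in zonal_stats.items()}


def paint_classes(segments, seg_classes):
    """Turn per-segment predictions into a class map (the new output layer)."""
    return [[seg_classes[seg] for seg in row] for row in segments]


# Hypothetical tree: split on mean NDVI, then on mean elevation (meters).
tree = {"var": "mean_ndvi", "threshold": 0.45,
        "left": "herbaceous",
        "right": {"var": "mean_elev", "threshold": 2500,
                  "left": "mixed conifer", "right": "subalpine conifer"}}
zonal_stats = {0: {"mean_ndvi": 0.2, "mean_elev": 1500},
               1: {"mean_ndvi": 0.7, "mean_elev": 1800},
               2: {"mean_ndvi": 0.8, "mean_elev": 2900}}
seg_classes = classify_segments(tree, zonal_stats)
print(paint_classes([[0, 1], [2, 2]], seg_classes))
```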

CART algorithms are often not included in image processing software packages. RSAC has developed some custom utilities to facilitate CART classification via algorithms available in software packages such as R, Cubist, See5, and Orange. See the Additional Information section for links to these tools.

Classification Check and Edit

Once the segments are classified, an analyst must review the classification to verify that the predictions are acceptable. In some situations, the classification will be accurate overall but will consistently fail for certain classes. In that case, the analyst can add training data for the problem classes and rerun the model, or manually edit the final map.
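One way to find classes that consistently fail is to compare predictions with reference segments class by class. The sketch below computes two things:

- The share of each reference class that was labeled correctly (producer's accuracy).
- A simple confusion tally of (reference, predicted) pairs.

The function names, class names, and values are hypothetical and invented for illustration.

```python
from collections import Counter


def per_class_accuracy(reference, predicted):
    """Share of reference segments of each class that were labeled correctly
    (producer's accuracy). Low values flag classes needing more training data or edits."""
    totals = Counter(reference)
    correct = Counter(r for r, p in zip(reference, predicted) if r == p)
    return {cls: correct[cls] / n for cls, n in totals.items()}


def confusion(reference, predicted):
    """Counts of (reference class, predicted class) pairs."""
    return Counter(zip(reference, predicted))


# Hypothetical check segments with known (reference) classes.
reference = ["aspen", "aspen", "aspen", "sage", "sage", "conifer"]
predicted = ["aspen", "conifer", "conifer", "sage", "sage", "conifer"]
print(per_class_accuracy(reference, predicted))
print(confusion(reference, predicted).most_common())
```

In this example, aspen is usually mislabeled as conifer. That pattern points to adding aspen training segments, or to editing those areas by hand.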

Limitations

CART classification is technically advanced and requires an extensive, comprehensive training dataset. See the Vegetation Mapping Prerequisites page for more discussion of training data quality requirements. CART software is not integrated into standard image processing software, so the analysis can be complicated to carry out. The segmentation software can also be very expensive.

Software/Hardware Requirements

Segmentation software (e.g., eCognition)
Image processing software (e.g., ERDAS Imagine)
Data mining/statistical software (e.g., R)
RSAC data mining tools

This is a processing-intensive method that requires substantial processing power and large amounts of memory.

Technical References

Hamilton, R.; Megown, K.; Mellin, T.; Fox, I. 2007. Rep. No. RSAC-0094-RPT1. Salt Lake City, UT: U.S. Department of Agriculture Forest Service, Remote Sensing Applications Center. 17p.