Vegetation mapping requires significant preparation before remote sensing analysis can begin. Two principal components of this preparation are establishing a vegetation classification system and obtaining training data compatible with the imagery.
Classification System
The design of the vegetation classification system is integral to an effective mapping effort. The classification system must meet the business needs of your organization and the scientific needs of your investigation, and it must be compatible with the remote sensing process. Creating a classification system that meets the business and scientific needs of the investigation is outside the scope of this website. For further information, please refer to the Existing Vegetation Classification and Mapping Technical Guide Version 1.0 (an update is in progress) or to guidance relevant to your organization.
The following six criteria describe a classification scheme that is compatible with remote sensing. The classification scheme must:
- Meet the information needs of the intended user
- Be based on dominant overstory vegetation
- Be exhaustive
- Be mutually exclusive
- Be hierarchical. A hierarchical classification scheme can be collapsed to a higher level if some of the classes cannot be adequately modeled
- Be objective, based on measurable feature characteristics
The classification scheme must meet the information needs of the intended user. The user must establish which types of vegetation need to be identified, and to what level of detail. Is it sufficient to differentiate between riparian and non-riparian or is a more detailed classification required?
The classification scheme must be based on dominant overstory vegetation. Remote sensing can detect only vegetation that is visible from above, and it is limited to the predominant vegetation. In some cases, a mixed class may be appropriate.
The classification system must be exhaustive. All vegetation types must be included in the classification system. Mapping methods rely on the identification of vegetation type based on training data. If there are vegetation types that are not included in the classification system, the software will try to fit them into the existing classification, leading to errors.
The classification system must be mutually exclusive. Each type of vegetation must fall into one, and only one, class. If vegetation types fall into more than one class, the training data will include profiles that are the same for more than one class. As a result, the image classifier will be unable to differentiate between those classes, producing errors.
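One way to check both properties before any imagery is classified is to write each class as an objective rule on measured plot attributes and count how many rules each field plot satisfies. Zero matches exposes a gap (the scheme is not exhaustive), and more than one match exposes an overlap (the scheme is not mutually exclusive). The Python sketch below is illustrative only: the attribute names (`tree` and `shrub` percent cover, `conifer_share` of tree cover) and all thresholds are hypothetical, not a recommended scheme.

```python
from typing import Callable, Dict, List

# A field plot as measured attributes (hypothetical names):
# percent tree cover, percent shrub cover, conifer share of tree cover (0-1).
Plot = Dict[str, float]
Rule = Callable[[Plot], bool]

# Hypothetical class rules, each an objective test on measured attributes.
RULES: Dict[str, Rule] = {
    "Conifer forest": lambda p: p["tree"] >= 10 and p["conifer_share"] >= 0.75,
    "Deciduous forest": lambda p: p["tree"] >= 10 and p["conifer_share"] <= 0.25,
    "Mixed forest": lambda p: p["tree"] >= 10 and 0.25 < p["conifer_share"] < 0.75,
    "Shrubland": lambda p: p["tree"] < 10 and p["shrub"] >= 10,
    "Herbaceous": lambda p: p["tree"] < 10 and p["shrub"] < 10,
}


def check_scheme(plots: List[Plot], rules: Dict[str, Rule]) -> Dict[str, List[int]]:
    """Return indices of plots matching no class (gaps) or several classes (overlaps)."""
    gaps: List[int] = []
    overlaps: List[int] = []
    for i, plot in enumerate(plots):
        matches = [name for name, rule in rules.items() if rule(plot)]
        if not matches:
            gaps.append(i)  # scheme is not exhaustive for this plot
        elif len(matches) > 1:
            overlaps.append(i)  # scheme is not mutually exclusive for this plot
    return {"gaps": gaps, "overlaps": overlaps}
```

If a rule set returns any gaps or overlaps, the class definitions can be revised while the problem is still cheap to fix, rather than after the classifier has produced confused output.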
The classification scheme must be hierarchical. Remote sensing is very good at distinguishing between vegetation types that have substantially different color profiles; however, at more refined levels, vegetation becomes increasingly similar, and the difficulty of distinguishing between the classes increases. For example, it is easy to distinguish between deciduous and evergreen trees, but it is more difficult to distinguish between oak and maple, and even more difficult to distinguish a red maple from a sugar maple. A hierarchical classification system allows classes to collapse to a higher level when they become too similar to be distinguished from one another.
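Collapsing can be expressed as walking up a parent map. The sketch below uses a hypothetical hierarchy built from the oak and maple example above, plus a set of classes that the analyst has judged indistinguishable in the imagery; each label is replaced by its nearest ancestor that is not in that set. The structure and names are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical hierarchy: each class maps to its parent (None at the top level).
PARENT: Dict[str, Optional[str]] = {
    "Deciduous": None,
    "Evergreen": None,
    "Oak": "Deciduous",
    "Maple": "Deciduous",
    "Red maple": "Maple",
    "Sugar maple": "Maple",
}


def collapse(label: str, indistinct: Set[str]) -> str:
    """Replace a label with its nearest ancestor that is not flagged as indistinct."""
    while label in indistinct:
        parent = PARENT[label]
        if parent is None:  # top of the hierarchy; nothing coarser to fall back to
            break
        label = parent
    return label


def collapse_all(labels: Iterable[str], indistinct: Set[str]) -> List[str]:
    """Apply collapse() to every label, e.g. a list of training-data labels."""
    return [collapse(label, indistinct) for label in labels]
```

For example, `collapse_all(["Red maple", "Oak", "Sugar maple"], {"Red maple", "Sugar maple"})` returns `["Maple", "Oak", "Maple"]`: the two maples merge into their parent while oak stays at its own level.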
The classification scheme must be objective, based on measurable feature characteristics. Objective classes increase consistency by preventing interpretive evaluations that can vary from individual to individual. Avoid subjective interpretive classes such as “Old Growth” or “Suitable Habitat.”
Ultimately, the complexity of a classification scheme will affect project accuracy and cost. The more complex the scheme, the more expensive and less accurate the final product will be. A good, workable classification scheme is not a guarantee of success, but a poor, unworkable one is a guarantee of failure, so give it the time and attention it deserves for your project.
Training Data
Training data identify the vegetation at known locations on the ground and are used to “teach” the analyst or the software what each vegetation type “looks” like. Because training data make the connection between the imagery and the vegetation, it is critical that the datasets adequately and accurately represent all of the vegetation (including variations within a vegetation type) that will be classified. Acquiring and processing training data can be a labor- and time-intensive process. While field data are the foundation for a good training dataset, image interpretation is frequently used to expand the dataset, especially once the classification process has begun. In some cases, image-interpreted data may be the only source of training data, but this is not recommended unless the classes are easily distinguished in the imagery.
Below are some of the criteria to which training data must conform in order to ensure reliable mapping products.
Positional Accuracy
Training data must have a positional error that is smaller than the mapping unit. For pixel-based classifiers, the mapping unit is the pixel. For object-based classifiers, the mapping unit is the segment, which consists of a group of pixels and can vary substantially in size.
It is imperative that the GPS coordinates of the training data are sufficiently accurate. Consider, for example, training data that are collected with a recreational-grade GPS and applied to Landsat imagery. The positional accuracy of the GPS data is +/- 30 m, so a training data point could fall anywhere in a 60-meter-diameter circle. Landsat imagery has a 30-meter resolution (30-meter pixels). When the training data are overlaid on the imagery, a training point with an accuracy of +/- 30 m could fall in any of 9 different pixels. If the training point falls in the wrong pixel, the wrong color signature could be associated with that class.
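The nine-pixel figure can be checked with a short calculation. The sketch below assumes that the recorded point sits exactly at a pixel center and that the GPS error is a circle of the stated radius; it counts every pixel that circle overlaps. The function name is hypothetical.

```python
import math


def pixels_within_error(pixel_size: float, error_radius: float) -> int:
    """Count pixels overlapped by an error circle centered on a pixel center.

    Assumes square pixels and a circular error of radius error_radius (same units).
    """
    n = math.ceil(error_radius / pixel_size) + 1  # search window half-width, in pixels
    count = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            # Distance from the circle's center (origin) to the nearest point of pixel (i, j).
            dx = max(abs(i) * pixel_size - pixel_size / 2, 0.0)
            dy = max(abs(j) * pixel_size - pixel_size / 2, 0.0)
            if math.hypot(dx, dy) < error_radius:
                count += 1
    return count


print(pixels_within_error(30, 30))  # 30-meter Landsat pixels, +/- 30 m GPS error
```

With 30-meter pixels and a 30 m error radius this gives the 3 x 3 block of 9 pixels described above. Rerunning it with 2-meter pixels and the same error gives several hundred candidate pixels, which is why finer imagery demands correspondingly better GPS accuracy.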
The positional accuracy of the imagery must also be considered. Every image has a positional accuracy that depends on the circumstances of data collection and post-processing. For example, Landsat imagery has an accuracy of approximately +/- 15 meters. This error adds to the uncertainty of the training data positions. Ultimately, your training data should have the highest accuracy that is reasonably attainable. Collecting training data at the centers of relatively large, homogeneous stands of vegetation can help mitigate the effects of positional uncertainty.
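If the GPS and image errors are independent and random, a common way to combine them is the root-sum-square; this is an assumption, and systematic offsets would not combine this way. The sketch below also applies a conservative, hypothetical rule of thumb for "relatively large": a homogeneous stand should extend at least the combined error plus one full pixel in every direction from the sample point, so that the pixel actually sampled still lies inside the stand.

```python
import math


def combined_error(gps_error_m: float, image_error_m: float) -> float:
    """Root-sum-square of two positional errors (assumes they are independent)."""
    return math.hypot(gps_error_m, image_error_m)


def min_stand_width(pixel_size_m: float, error_m: float) -> float:
    """Conservative stand width (hypothetical rule of thumb) for a point at the stand center.

    The true location may be off by error_m, and the pixel containing it may extend up
    to one full pixel beyond that location, so the stand needs error_m + pixel_size_m
    of clearance on each side.
    """
    return 2 * (error_m + pixel_size_m)


# Recreational GPS (+/- 30 m) on Landsat (+/- 15 m, 30-meter pixels).
err = combined_error(30, 15)
print(f"combined error: {err:.1f} m")
print(f"minimum stand width: {min_stand_width(30, err):.0f} m")
```

For the Landsat example this gives a combined error of about 33.5 m and a stand width of about 127 m. The exact clearance is a judgment call; the point is that positional error, not just pixel size, drives how large a sampled stand needs to be.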
Training Data Scale
Each training data point represents the vegetation within a defined area and thus reflects a certain scale of information. The area represented by the training data must correspond to the associated mapping unit. In a remote sensing image, the color of each mapping unit (whether pixel or object) is a composite of all of the vegetation on the portion of the earth’s surface that the mapping unit covers. When training data are collected, they must represent a similar composite. Consider, for example, a small, 5-meter pond covered in cattails and surrounded by grassland. An individual collecting training data at the location might be inclined to classify this point as cattail. If this data point were used with WorldView-2 imagery (2-meter pixels), that classification would be correct. However, if it were used with Landsat data (30-meter pixels), the classification would be incorrect. The color of a Landsat pixel centered on that pond would depend predominantly on the color of the grass, because the pond is small relative to the pixel. The problem is aggravated by the fact that the training data point is unlikely to fall at the center of a pixel or segment. The hypothetical pond could be split between two adjacent pixels, further reducing its influence on either one. When collecting training data, it is best to collect from the center of homogeneous stands of vegetation that are larger than the mapping unit.
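The pond example can be made concrete by computing how much of each pixel the pond occupies. The sketch assumes a hypothetical 5 m by 5 m square pond and square pixels, and it labels a pixel by whichever cover occupies more than half of it. Real pixel values blend all of the cover types within the pixel, so this majority rule is only a simplification.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in meters


def overlap_area(a: Box, b: Box) -> float:
    """Area shared by two axis-aligned rectangles."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def cover_fraction(pixel: Box, feature: Box) -> float:
    """Fraction of the pixel's area covered by the feature."""
    pixel_area = (pixel[2] - pixel[0]) * (pixel[3] - pixel[1])
    return overlap_area(pixel, feature) / pixel_area


def majority_label(fraction: float, feature: str, background: str) -> str:
    """Simplified labeling rule: the feature wins only if it covers most of the pixel."""
    return feature if fraction > 0.5 else background


pond: Box = (-2.5, -2.5, 2.5, 2.5)               # 5 m x 5 m pond centered at the origin
landsat_pixel: Box = (-15.0, -15.0, 15.0, 15.0)  # 30 m pixel centered on the pond
wv2_pixel: Box = (-1.0, -1.0, 1.0, 1.0)          # 2 m pixel inside the pond
split_pond: Box = (12.5, -2.5, 17.5, 2.5)        # same pond straddling the pixel edge at x = 15

cases = [
    ("Landsat", landsat_pixel, pond),
    ("WorldView-2", wv2_pixel, pond),
    ("Landsat, split", landsat_pixel, split_pond),
]
for name, pixel, feature in cases:
    f = cover_fraction(pixel, feature)
    print(f"{name}: {f:.3f} cattail -> {majority_label(f, 'cattail', 'grassland')}")
```

The centered Landsat pixel is under 3 percent cattail, the split case about half that, and the WorldView-2 pixel is entirely cattail, which is why the same field label is right for one sensor and wrong for the other.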
Training Data Diversity
Any particular type of vegetation can have a variety of color profiles depending on the location and condition of the vegetation and the manner in which the image is collected. The conditions in which vegetation grows have a profound impact on how that vegetation appears. Within any given image, growing conditions can vary based on differences in hydration, soil type, elevation, and level of disturbance. The conditions of image collection also affect vegetation appearance; perhaps most significant is the presence of shadows in mountainous areas, which can substantially alter how vegetation looks. Training data must capture this range of variability in appearance for each vegetation class.
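A simple way to confirm that training data span this variability is to tag each sample with a stratum (for example, sunlit versus shaded slopes, or elevation bands) and list the class and stratum combinations that have too few samples. The strata names, class names, and `min_count` threshold below are hypothetical; in practice the strata would be derived from terrain data or field notes.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Tuple


def coverage_gaps(
    samples: Iterable[Tuple[str, str]],
    classes: List[str],
    strata: List[str],
    min_count: int = 1,
) -> List[Tuple[str, str]]:
    """List (class, stratum) pairs with fewer than min_count training samples."""
    counts = Counter(samples)  # counts each (class, stratum) pair
    return [(c, s) for c, s in product(classes, strata) if counts[(c, s)] < min_count]


samples = [
    ("Conifer", "sunlit"),
    ("Conifer", "shaded"),
    ("Aspen", "sunlit"),
    ("Aspen", "sunlit"),
]
print(coverage_gaps(samples, ["Conifer", "Aspen"], ["sunlit", "shaded"]))
```

Here the only gap is aspen on shaded slopes, so additional aspen samples would be sought in shadow before classification.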
Mapping Strategy
There are two method-independent strategies for the mapping process: the vegetation classes can either be mapped all at once, or they can be mapped by stepping down through the classification hierarchy. Mapping all classes at the same time is the simplest and fastest approach. The hierarchical approach takes longer but can yield a more accurate map. In this approach, mapping begins at the coarsest level (e.g., riparian and non-riparian). Subclasses within these classes are then mapped (e.g., conifer and deciduous within the riparian class). The results from each iteration are reviewed and optimized before proceeding to the next level. If the results at one level are not satisfactory, additional training data are added in areas of misclassification, if possible. One disadvantage of the hierarchical approach is that it is difficult to assess the overall accuracy of the classification using cross validation.
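The hierarchical strategy can be sketched as one classifier for the coarse level plus a separate classifier for the subclasses within each coarse class, trained only on that parent's samples. A nearest-centroid classifier stands in here purely to keep the example self-contained; it is not the method this guide prescribes, and the two-band values are invented. In practice, each level's results would be reviewed, and training data added, before the next level is run.

```python
import math
from typing import Dict, List, Sequence, Tuple

Centroids = Dict[str, List[float]]
Sample = Tuple[Sequence[float], str, str]  # (band values, coarse class, subclass)


def train_centroids(samples: List[Tuple[Sequence[float], str]]) -> Centroids:
    """Mean band values per label (a simple stand-in classifier)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Centroids, x: Sequence[float]) -> str:
    """Label whose centroid is closest to x in band space."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def fit_hierarchy(samples: List[Sample]) -> Tuple[Centroids, Dict[str, Centroids]]:
    """Level 1 on coarse classes; level 2 trained separately within each coarse class."""
    level1 = train_centroids([(x, coarse) for x, coarse, _ in samples])
    level2 = {
        parent: train_centroids([(x, fine) for x, coarse, fine in samples if coarse == parent])
        for parent in level1
    }
    return level1, level2


def classify(level1: Centroids, level2: Dict[str, Centroids], x: Sequence[float]) -> Tuple[str, str]:
    """Assign the coarse class first, then a subclass using only that class's model."""
    coarse = predict(level1, x)
    return coarse, predict(level2[coarse], x)


training = [
    ((0.20, 0.80), "Riparian", "Conifer"),
    ((0.22, 0.78), "Riparian", "Conifer"),
    ((0.30, 0.90), "Riparian", "Deciduous"),
    ((0.32, 0.92), "Riparian", "Deciduous"),
    ((0.70, 0.30), "Non-riparian", "Grass"),
    ((0.80, 0.20), "Non-riparian", "Shrub"),
]
level1, level2 = fit_hierarchy(training)
print(classify(level1, level2, (0.21, 0.79)))
```

Because each subclass model sees only its parent's samples, a misclassified pixel at level 1 cannot be rescued at level 2; this is one reason to review and refine each level before stepping down.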