written by Grant Hamilton
Other Names:
Random Forest, RF
Description
Random Forests is an ensemble learning algorithm for regression and classification. It is non-parametric: it makes no assumptions about the distribution of the data. Unlike conventional statistical approaches, machine-learning methods are data driven, learning the relationship between independent and dependent variables from the data themselves (Breiman, 2001). A random forest is a combination of tree predictors in which each tree depends on the values of a random vector sampled independently, and with the same distribution, for all trees in the forest. The predictions of the individual trees are combined by majority vote for classification or by averaging for regression.
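The definition above can be illustrated with a minimal sketch in Python. It is not Breiman and Cutler's implementation: the function names (`grow_forest`, `predict_forest`), the parameter `n_try` (the number of features considered at each split), the depth limit, and the pixel values are all assumptions made for illustration. Each tree is grown on a bootstrap sample of the training data and considers a random subset of features at each split (together these play the role of the random vector), and the forest classifies by majority vote.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) with the lowest weighted Gini impurity,
    or None if none of the candidate features can split the data."""
    best, best_score = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(X, y, n_try, rng, depth=0, max_depth=5):
    """Grow one tree; every node looks at a random subset of n_try features.
    A leaf is a class label; an internal node is (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    features = rng.sample(range(len(X[0])), n_try)
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split
    left = [(row, label) for row, label in zip(X, y) if row[f] <= t]
    right = [(row, label) for row, label in zip(X, y) if row[f] > t]
    return (
        f,
        t,
        grow_tree([r for r, _ in left], [l for _, l in left], n_try, rng, depth + 1, max_depth),
        grow_tree([r for r, _ in right], [l for _, l in right], n_try, rng, depth + 1, max_depth),
    )


def predict_tree(tree, row):
    """Walk from the root to a leaf (assumes class labels are not tuples)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def grow_forest(X, y, n_trees=25, n_try=1, seed=0):
    """Grow n_trees trees, each on its own bootstrap sample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_try, rng))
    return forest


def predict_forest(forest, row):
    """Classify one sample by majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training pixels: [band 1, band 2] values with class labels.
X = [[10, 5], [12, 7], [11, 6], [9, 4], [40, 80], [42, 85], [45, 78], [48, 90]]
y = ["water"] * 4 + ["forest"] * 4
forest = grow_forest(X, y, n_trees=25, n_try=1, seed=0)
print(predict_forest(forest, [10, 6]), predict_forest(forest, [44, 82]))
```

Because each tree sees a different bootstrap sample and a different sequence of feature subsets, the trees differ from one another, and the vote tends to be more stable than any single tree.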
Potential Remote Sensing Applications
The following three cases are a small sample of the many ways the Random Forests algorithm has been applied to remotely sensed image classification.
In a comparison of Random Forests with three other classification algorithms (Gentle AdaBoost [GAB], Support Vector Machine [SVM] and Maximum Likelihood Classification [MLC]), initial findings indicate that Random Forests classifies landcover in Ikonos and QuickBird images more accurately than the other methods, especially in images of urban areas (Akar and Güngör, 2013).
Mellor et al. (2013) used Random Forests to classify landcover into forested and non-forested classes using Landsat TM and MODIS imagery of the Australian state of Victoria. They found that using Random Forests for landcover classification yielded an accuracy of 96% (κ = 0.91).
Immitzer, Atzberger, and Koukal (2012) used Random Forests to classify tree species using WorldView-2 images of sunlit crowns in a temperate forest in Austria. They concluded that Random Forests was able to classify tree species with an accuracy of approximately 82%; Random Forests was better able to identify some tree species than others.
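Accuracy figures such as those reported in the studies above are typically derived from a confusion matrix that compares reference classes with assigned classes. As a sketch of the arithmetic, the following Python function computes the accuracy and Cohen's kappa (κ), which corrects the accuracy for agreement expected by chance. The function name and the 2 x 2 matrix of counts are hypothetical and are not taken from any of the cited studies.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i
    that the classifier assigned to class j.
    """
    n = sum(sum(row) for row in matrix)
    # Observed agreement: the share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    # Chance agreement: product of matching row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: rows are reference classes, columns are assigned classes.
hypothetical = [
    [45, 5],   # reference: forest
    [5, 45],   # reference: non-forest
]
acc, kappa = accuracy_and_kappa(hypothetical)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")  # accuracy = 0.90, kappa = 0.80
```

In this example 90 of 100 samples are classified correctly, and half of the agreement would be expected by chance, so κ = (0.9 - 0.5) / (1 - 0.5) = 0.8.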
Limitations
The developers of Random Forests coded the algorithm in FORTRAN 77, and their code is available from the authors' website (Breiman and Cutler, 2004). Random Forests code for the free statistical analysis program R is available from the Comprehensive R Archive Network (Liaw and Wiener, 2002; The Comprehensive R Archive Network, 2012). In order to run the Random Forests algorithm with remotely sensed data in statistics programs, images must be formatted as comma-separated value (CSV) files.
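The CSV formatting step can be sketched as follows, assuming the image has already been read into memory as one two-dimensional array per spectral band. The function name `image_to_csv`, the column names, and the pixel values are hypothetical, and the layout shown (one row per pixel, one column per band) is only one plausible arrangement.

```python
import csv
import io


def image_to_csv(bands, out):
    """Write one CSV row per pixel: row index, column index, one value per band.

    `bands` is a list of 2-D lists (one per spectral band), all the same shape.
    `out` is any writable text file object.
    """
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["row", "col"] + [f"band_{b + 1}" for b in range(len(bands))])
    for r in range(n_rows):
        for c in range(n_cols):
            writer.writerow([r, c] + [band[r][c] for band in bands])


# Hypothetical 2 x 2 image with two bands.
bands = [
    [[10, 12], [40, 42]],  # band 1
    [[5, 7], [80, 85]],    # band 2
]
buffer = io.StringIO()
image_to_csv(bands, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the row and column indices in the table makes it possible to place per-pixel predictions back into their image positions after classification.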
Random Forests has been shown to be effective for image classification, but it has not yet been widely adopted by the remote sensing community. The primary limitation to its general use is the lack of off-the-shelf tools or plugins for the most popular remote sensing and GIS programs: Akar and Güngör (2013) used a MATLAB script to run the algorithm, Immitzer et al. (2012) used R code, and Mellor et al. (2013) linked GRASS and R using Python code.
Unfortunately, the code for the three applications cited above was not published by the authors, and no known application or code explicitly for remote sensing image classification using Random Forests is currently available. If users wish to run the algorithm in geospatial programs, they will have to code the algorithm themselves. Although Random Forests is a promising tool for image classification, until code or tools for remote sensing and GIS programs become available, its utility to practitioners will be limited.
Related Methods
Other image classification methods include:
- Classification and Regression Tree (CART)
- Unsupervised Classification
- Supervised Classification
- Object-based Classification – also called Object-based image analysis (OBIA)
- Target Detection/Extraction
- Spectral Mixture Analysis
- Multiple Endmember Spectral Mixture Analysis
- Wavelet Analysis
- Change Detection
References
Akar, Ö. and Güngör, O. Classification of multispectral images using Random Forest algorithm. Journal of Geodesy and Geoinformation. 1(2), pp. 106-112. doi: 10.9733/jgg.241212.1. http://www.hkmodergi.org/jgg/index.php/JGG/article/download/186/185
Breiman, L. 2001. Random forests. Machine Learning. 45(1), pp. 5-32. doi: 10.1023/A:1010933404324. http://oz.berkeley.edu/~breiman/randomforest2001.pdf
Breiman, L. and Cutler, A. 2004. Random Forests. Department of Statistics, University of California, Berkeley. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
Immitzer, M., Atzberger, C., and Koukal, T. 2012. Tree species classification with random forest using very high spatial resolution 8-band WorldView-2 satellite data. Remote Sensing. 4(9), pp. 2661-2693. doi: 10.3390/rs4092661. http://www.mdpi.com/2072-4292/4/9/2661
Liaw, A. and Wiener, M. 2002. Classification and Regression by Random Forest. R News. 2(3), pp. 18-22. Code available from: http://cran.r-project.org/web/packages/randomForest/index.html
The Comprehensive R Archive Network. 2012. Classification and regression based on a forest of trees using random inputs.
Mellor, A., Haywood, A., Stone, C., and Jones, S. 2013. The performance of random forests in an operational setting for large area sclerophyll forest classification. Remote Sensing. 5(6), pp. 2838-2856. doi: 10.3390/rs5062838. http://www.mdpi.com/2072-4292/5/6/2838