Random Forests

written by Grant Hamilton

Other Names:

Random Forest, RF

Description

Random Forests is an ensemble learning algorithm for regression and classification. It is non-parametric: it makes no assumptions about the distribution of the data. Unlike classical statistical approaches, which fit a predefined model, this kind of machine-learning method is data driven and learns the relationship between independent and dependent variables directly from the data (Breiman, 2001). A random forest is a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
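The idea of a combination of randomized tree predictors can be illustrated with a small sketch. It is only a toy under stated assumptions, not Breiman and Cutler's implementation: the per-tree randomness is taken to be a bootstrap sample of the training rows plus a random choice of features at each split, trees are kept very shallow, and the function names (build_tree, train_forest, predict_forest) and the two-band pixel values are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, rng, n_try, depth=0, max_depth=3):
    """Grow one tree; each split considers only n_try randomly chosen features."""
    if depth == max_depth or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        # Candidate thresholds: every observed value except the largest,
        # so both sides of the split are guaranteed to be non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # the sampled features could not separate the rows
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       rng, n_try, depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       rng, n_try, depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, n_try=1, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng, n_try))
    return forest


def predict_forest(forest, row):
    """Classify a row by majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training pixels: (band_a, band_b) values, made up for illustration.
pixels = [(0.02, 0.01), (0.03, 0.02), (0.04, 0.03), (0.05, 0.04),
          (0.08, 0.40), (0.09, 0.45), (0.10, 0.50), (0.12, 0.55)]
classes = ["water"] * 4 + ["vegetation"] * 4

forest = train_forest(pixels, classes)
print(predict_forest(forest, (0.11, 0.48)))
print(predict_forest(forest, (0.01, 0.005)))
```

Because every tree sees a different bootstrap sample and a different feature at each split, the individual trees disagree in their details; the forest's prediction is the class that receives the most votes.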

Potential Remote Sensing Applications

The following three cases are a small sample of the many ways the Random Forests algorithm has been applied to remotely sensed image classification.

In a comparison of Random Forests with three other classification algorithms (Gentle AdaBoost [GAB], Support Vector Machine [SVM] and Maximum Likelihood Classification [MLC]), initial findings indicated that Random Forests gave higher landcover classification accuracies for Ikonos and QuickBird images than the other methods, especially for images of urban areas (Akar and Güngör, 2013).

Mellor et al. (2013) used Random Forests to classify landcover into forested and non-forested classes using Landsat TM and MODIS imagery of the Australian state of Victoria. They found that using random forests for landcover classification yielded an accuracy of 96% (ϰ = 0.91).
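For readers unfamiliar with the two figures reported here, the sketch below shows how an overall accuracy and Cohen's kappa are commonly computed from a confusion matrix. The counts are hypothetical and are not taken from Mellor et al. (2013); the function names are likewise only illustrative.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohens_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts (rows: reference class, columns: predicted class).
# These numbers are invented for illustration only.
confusion = [[45, 5],
             [5, 45]]

print(overall_accuracy(confusion))
print(cohens_kappa(confusion))
```

With these made-up counts, 90 of 100 samples lie on the diagonal, chance agreement is 0.5, and kappa is therefore (0.9 - 0.5) / (1 - 0.5) = 0.8. Kappa is lower than the raw accuracy because it discounts agreement that would occur by chance.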

Immitzer, Atzberger, and Koukal (2012) used Random Forests to classify tree species using WorldView-2 images of sunlit crowns in a temperate forest in Austria. They concluded that Random Forests was able to classify tree species with an accuracy of approximately 82%; Random Forests was better able to identify some tree species than others.

Limitations

The developers of Random Forests coded the algorithm in FORTRAN 77, and their code is available from Breiman and Cutler (2004). Random Forests code for the free statistical analysis program R is also available (Liaw and Wiener, 2002; The Comprehensive R Archive Network, 2012). To run the Random Forests algorithm on remotely sensed data in statistics programs, images must be formatted as comma-separated value (CSV) files.
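The CSV requirement amounts to flattening the image into a table with one row per pixel and one column per band. A minimal sketch follows, assuming the image has already been read into nested lists indexed as bands[band][row][col]; the function name image_to_csv, the column names, and the pixel values are hypothetical, and reading the image from a raster file is outside the scope of the sketch.

```python
import csv
import io


def image_to_csv(bands, out):
    """Write one CSV row per pixel: row index, column index, one value per band."""
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["row", "col"] + [f"band_{b + 1}" for b in range(len(bands))])
    for r in range(n_rows):
        for c in range(n_cols):
            writer.writerow([r, c] + [band[r][c] for band in bands])


# Hypothetical 2 x 2 image with two bands (values invented for illustration).
bands = [
    [[10, 20], [30, 40]],   # band 1
    [[50, 60], [70, 80]],   # band 2
]

buffer = io.StringIO()      # an open file object would work the same way
image_to_csv(bands, buffer)
print(buffer.getvalue())
```

Keeping the row and column indices in the table makes it possible to write the predicted classes back into an image of the original shape afterwards.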

Random Forests’ efficacy for image classification has been demonstrated, but it has not yet been widely adopted by the remote sensing community. The primary limitation to its general use is the lack of off-the-shelf tools or plugins for the most popular remote sensing and GIS programs: Akar and Güngör (2013) used a MATLAB script to run the algorithm; Immitzer et al. (2012) used R code; and Mellor et al. (2013) linked GRASS and R using Python code.

Unfortunately, the code for the three applications cited above was not published by the authors, and no known application or code explicitly for remote sensing image classification using Random Forests is currently available. If users wish to run the algorithm in geospatial programs, they will have to code the algorithm themselves. Although Random Forests is a promising tool for image classification, until code or tools for remote sensing and GIS programs become available, its utility to practitioners will be limited.

References

Akar, Ö. and Güngör, O. (2013). Classification of multispectral images using Random Forest algorithm. Journal of Geodesy and Geoinformation. 1(2), pp. 106-112. doi: 10.9733/jgg.241212.1. http://www.hkmodergi.org/jgg/index.php/JGG/article/download/186/185

Breiman, L. (2001). Random forests. Machine Learning. 45(1), pp. 5-32. doi: 10.1023/A:1010933404324. http://oz.berkeley.edu/~breiman/randomforest2001.pdf

Breiman, L. and Cutler, A. (2004). Random Forests. Department of Statistics, University of California, Berkeley. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

Immitzer, M., Atzberger, C. and Koukal, T. (2012). Tree species classification with random forest using very high spatial resolution 8-band WorldView-2 satellite data. Remote Sensing. 4(9), pp. 2661-2693. doi: 10.3390/rs4092661. http://www.mdpi.com/2072-4292/4/9/2661

Liaw, A. and Wiener, M. (2002). Classification and Regression by Random Forest. R News. 2(3), pp. 18-22. Code available from: http://cran.r-project.org/web/packages/randomForest/index.html

The Comprehensive R Archive Network. (2012). Classification and regression based on a forest of trees using random inputs.

Mellor, A., Haywood, A., Stone, C. and Jones, S. (2013). The performance of random forests in an operational setting for large area sclerophyll forest classification. Remote Sensing. 5(6), pp. 2838-2856. doi: 10.3390/rs5062838. http://www.mdpi.com/2072-4292/5/6/2838