Why Stratify?

contributed by Jason Karl

What is Stratification?

Stratification refers to dividing a population or inference space into subgroups or subunits prior to sampling. To be useful for sampling, strata are defined so that similar sampling units fall into the same stratum (i.e., variability within each stratum is minimized). Because variability is minimized within strata, stratification can improve the precision of estimates and is often a more efficient sampling technique than simple random selection. Within each stratum, locations are selected for sampling via simple random selection or another randomized selection technique (e.g., spatially balanced sampling). The number of locations sampled can differ among strata and can be related to the within-stratum variability. For example, strata that are very homogeneous may get only a few samples, whereas more heterogeneous strata would have more samples taken from them. Elzinga et al. (1998) give a good description of stratified random sampling.
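One common way to tie the number of samples to within-stratum variability is Neyman allocation, which splits the total sample among strata in proportion to each stratum's area weight times its standard deviation. The text above does not name a specific allocation rule, so this is only a minimal sketch of one option. The function name and the input values are hypothetical and chosen purely for illustration.

```python
def neyman_allocation(total_samples, weights, std_devs):
    """Split total_samples among strata in proportion to weight * std dev.

    weights  -- proportion of the study area in each stratum (should sum to 1)
    std_devs -- assumed within-stratum standard deviation for each stratum
    """
    products = [w * s for w, s in zip(weights, std_devs)]
    total = sum(products)
    return [total_samples * p / total for p in products]


# Hypothetical inputs: two equal-area strata, the second twice as variable.
weights = [0.5, 0.5]
std_devs = [5.0, 10.0]

allocation = neyman_allocation(12, weights, std_devs)
print(allocation)  # the more variable stratum receives twice as many samples
```

In practice the allocations are rarely whole numbers, so they would need to be rounded, and each stratum needs at least two samples for its variance to be estimated.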

Example

To illustrate how stratification can be helpful in assessment and monitoring, consider the following example. Twelve measurements of bare ground were collected at random locations in an allotment:

Location #   Measured Bare Ground Cover
1            57%
2            45%
3            52%
4            35%
5            27%
6            63%
7            65%
8            40%
9            35%
10           56%
11           53%
12           32%

The average of these measurements is 46.7%, and the estimated variance is 162.42 (standard deviation of 12.74%).
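The summary statistics above can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's statistics module, and the variable names are illustrative only. The variance is the sample variance, which uses an n - 1 divisor.

```python
import statistics

# Bare ground cover (%) at the twelve random locations, in table order.
bare_ground = [57, 45, 52, 35, 27, 63, 65, 40, 35, 56, 53, 32]

mean = statistics.mean(bare_ground)
variance = statistics.variance(bare_ground)  # sample variance (n - 1 divisor)
std_dev = statistics.stdev(bare_ground)      # square root of the sample variance

print(f"mean = {mean:.1f}%")                   # mean = 46.7%
print(f"variance = {variance:.2f}")            # variance = 162.42
print(f"standard deviation = {std_dev:.2f}%")  # standard deviation = 12.74%
```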

However, we then realized that the allotment actually contained two different ecological sites: a Loamy 12-16 ARTRW8/PSSP6 (Wyoming big sagebrush/bluebunch wheatgrass) site, and a Loamy Bottom 8-14 ARTRT/LECI (Basin big sagebrush/Great Basin wild rye) site. For simplicity's sake, we'll say that both sites occupy the same area (i.e., each makes up half the allotment). Looking up the reference sheets for these ecological sites on the NRCS ESIS website, we find that the two sites differ in their expected amount of bare ground at reference condition. The Wyoming big sagebrush site is expected to have 50-65% cover of bare ground, while the Basin big sagebrush site is expected to have only 30-40% bare ground. If we stratify our data by ecological site (with the locations renumbered so that each stratum is listed together), we get the following:

Location #   Stratum        Measured Bare Ground Cover
1            ARTRW8/PSSP6   57%
2            ARTRW8/PSSP6   65%
3            ARTRW8/PSSP6   52%
4            ARTRW8/PSSP6   56%
5            ARTRW8/PSSP6   53%
6            ARTRW8/PSSP6   63%
7            ARTRT/LECI     45%
8            ARTRT/LECI     40%
9            ARTRT/LECI     35%
10           ARTRT/LECI     35%
11           ARTRT/LECI     27%
12           ARTRT/LECI     32%

The estimator for stratified random sampling data is more complicated than for simple random sampling (see the pros and cons section below), but the results show that the estimated mean for the stratified samples is 46.7%, the same as for the simple random selection above. The variance, however, is a different story. The variance for a stratified random sample is built from the variances within each stratum (27.9 and 39.1 here), each weighted by the square of the proportion of total area that stratum makes up (remember we said above that the two ecological sites were in equal proportion, so each weight is 0.5² = 0.25). The estimated variance of the stratified random sample is only 16.7 (standard deviation of 4.1%). That is about one-tenth of the estimated variance without the strata.
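The stratified figures can be checked with a short script. This is a sketch under the example's stated assumption that each ecological site covers half the allotment. The dictionary keys and variable names are illustrative. Note that the 16.7 figure is reproduced when each within-stratum variance is multiplied by the squared area proportion (0.25).

```python
import math
import statistics

# Bare ground cover (%) grouped by ecological site.
strata = {
    "wyoming_big_sagebrush": [57, 65, 52, 56, 53, 63],
    "basin_big_sagebrush": [45, 40, 35, 35, 27, 32],
}
# Assumption from the example: the two sites occupy equal areas.
area_weights = {"wyoming_big_sagebrush": 0.5, "basin_big_sagebrush": 0.5}

# Stratified mean: area-weighted average of the stratum means.
strat_mean = sum(
    area_weights[name] * statistics.mean(values)
    for name, values in strata.items()
)

# Within-stratum sample variances, each weighted by the squared area proportion.
weighted_var = sum(
    area_weights[name] ** 2 * statistics.variance(values)
    for name, values in strata.items()
)

print(f"stratified mean = {strat_mean:.1f}%")                      # 46.7%
print(f"weighted variance = {weighted_var:.1f}")                   # 16.7
print(f"standard deviation = {math.sqrt(weighted_var):.1f}%")      # 4.1%
```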

The upshot of this example is that we used the strata to account for some of the variability of bare ground in our allotment, and that allowed us to have a more precise estimate of bare ground overall. An implication of this is that you need fewer samples to achieve the same level of precision with stratified random sampling than with simple random sampling.
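To put a rough number on "fewer samples", the sketch below compares the variance of the estimated mean under the two designs and asks how many simple random samples would be needed to match the stratified precision. It uses the textbook formulas s²/n for simple random sampling and the sum of W_h² s_h²/n_h for stratified sampling. It ignores the finite population correction and treats the sample variances as if they were the true values, so the result is illustrative rather than a planning figure.

```python
import math
import statistics

all_points = [57, 45, 52, 35, 27, 63, 65, 40, 35, 56, 53, 32]

# (area weight, measurements) for each ecological site; equal areas assumed.
strata = [
    (0.5, [57, 65, 52, 56, 53, 63]),
    (0.5, [45, 40, 35, 35, 27, 32]),
]

# Variance of the estimated mean under simple random sampling: s^2 / n.
srs_var = statistics.variance(all_points)
srs_var_of_mean = srs_var / len(all_points)

# Variance of the estimated mean under stratified sampling:
# sum over strata of W_h^2 * s_h^2 / n_h.
strat_var_of_mean = sum(
    w ** 2 * statistics.variance(x) / len(x) for w, x in strata
)

# Simple random samples needed to match the stratified precision.
srs_n_needed = math.ceil(srs_var / strat_var_of_mean)

print(f"SRS standard error = {math.sqrt(srs_var_of_mean):.2f}%")           # 3.68%
print(f"stratified standard error = {math.sqrt(strat_var_of_mean):.2f}%")  # 1.67%
print(f"SRS samples needed to match = {srs_n_needed}")                     # 59
```

Under these assumptions, roughly 59 simple random samples would be needed to match the precision that the 12 stratified samples give in this example.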

When Should You Stratify?

In a nutshell, you should consider stratifying when you have reason to believe that the average value of or variability in whatever you’re interested in is not consistent across your inference space. If a study area can be divided up into subunits such that the measurements within the subunits are similar to each other, then stratification is a good idea.

Factors that cause mean or variability in an attribute to change within a study area include:

  • differences in soil types or ecological sites – plant cover, density, and composition can vary depending on soil conditions
  • topography – elevation, slope, and aspect can all cause differences in plant communities
  • climate – temperature and precipitation can also cause differences in composition, cover, and density of plants.
  • management – the distribution of management activities (e.g., grazing) is generally not uniform across a landscape. Defining zones of management influence based on factors related to the management activity (e.g., distance to water for grazing) can be very useful as strata.

This list is certainly not exhaustive, but it serves to illustrate the kinds of factors that can be used to define strata. In general, for assessment and monitoring studies, it is desirable to define strata based on permanent landscape features (e.g., soil type boundaries) that divide the area into relatively homogeneous units, rather than on boundaries that can change over time and that are not directly related to variability within the landscape (e.g., pastures within an allotment).

Pros and Cons of Stratification

As illustrated in the example above, the biggest advantage of stratification is that it can reduce the overall variance of your estimates. The reduction in estimated variance can make stratified random sampling a much more efficient method of sampling than simple random sampling (i.e., you generally need fewer samples to achieve the same level of precision). Additionally, stratification can usually be combined with other sampling techniques to improve efficiency.

One potential disadvantage of stratified random sampling is that the formulas used for estimating the mean and variance are more complicated than for simple random sampling. Also, if strata are defined in a manner that does not partition the study area into relatively homogeneous units, then there is no advantage to stratifying. For instance, in our example above, if the points were randomly grouped into strata rather than grouped by their ecological site, the estimated variance would be expected to be as high as the simple random sampling variance.
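The claim about randomly defined strata can be checked directly on the example data. The sketch below enumerates every way of splitting the twelve measurements into two groups of six, computes the stratified variance of the estimated mean for each split, and averages the results. It is an illustration rather than part of the original analysis: it works with the variance of the estimated mean (s²/n for simple random sampling, the sum of W_h² s_h²/n_h for stratified sampling), ignores the finite population correction, and uses an illustrative function name.

```python
import itertools
import statistics

points = [57, 45, 52, 35, 27, 63, 65, 40, 35, 56, 53, 32]
n = len(points)

# Variance of the estimated mean under simple random sampling.
srs_var_of_mean = statistics.variance(points) / n


def stratified_var_of_mean(group_a, group_b):
    """Variance of the stratified mean for two equal-area strata."""
    return sum(
        0.5 ** 2 * statistics.variance(g) / len(g) for g in (group_a, group_b)
    )


# Every possible split of the twelve points into two arbitrary "strata" of six.
results = []
for idx in itertools.combinations(range(n), n // 2):
    chosen = set(idx)
    group_a = [points[i] for i in idx]
    group_b = [points[i] for i in range(n) if i not in chosen]
    results.append(stratified_var_of_mean(group_a, group_b))

average_random = statistics.mean(results)

# Grouping by ecological site, as in the example.
by_site = stratified_var_of_mean(
    [57, 65, 52, 56, 53, 63], [45, 40, 35, 35, 27, 32]
)

print(f"simple random sampling: {srs_var_of_mean:.2f}")   # 13.54
print(f"average over arbitrary strata: {average_random:.2f}")  # 13.54
print(f"strata by ecological site: {by_site:.2f}")        # 2.79
```

Averaged over all arbitrary groupings, the stratified variance matches the simple random sampling value, while grouping by ecological site gives a much smaller one.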

References

  • Elzinga, C. L., D. W. Salzer, and J. W. Willoughby. 1998. Measuring and monitoring plant populations. U.S. Department of the Interior, Bureau of Land Management. National Applied Resource Sciences Center, Denver, Colorado.
  • Foreman, E. K. 1991. Survey sampling principles. Marcel Dekker, Inc., New York.
  • Thompson, S. K. 1992. Sampling. John Wiley and Sons, Inc., New York.
