5.6 Types of probability sampling designs

3/6/00



In social science research, sampling is virtually always done "without" replacement; that is, after a particular unit has been selected for observation, it is not put back into the pool so it can be selected again.

For example, it would be impractical to survey the same person twice.

Recall that with probability samples, every unit in the population has a nonzero probability of being included in the sample, and random selection of units is used.

There are four primary types of probability sampling designs.

(Type 1) Simple Random Sample -- Every unit in the population has an equal chance of being selected in the sample. Each unit in the population is assigned a number. A set of numbers is then randomly selected, and the units assigned those numbers are included in the sample.

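With a simple random sample, every unit in the population has an equal chance, or probability, of being selected for the sample. On any single draw this probability is p = 1/N, where N = the size of the study population. For a sample of n units drawn without replacement, each unit's probability of being included in the sample is n/N.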
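The procedure can be sketched in a few lines of Python. This is only an illustration: the numbered frame of 100 units, the sample size of 10, the seed, and the function name simple_random_sample are assumptions made for the example, not part of the original notes.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, without replacement."""
    rng = random.Random(seed)      # seeded only so the example is repeatable
    return rng.sample(frame, n)    # every unit is equally likely to be drawn

# Hypothetical study population of N = 100 numbered units.
frame = list(range(1, 101))
sample = simple_random_sample(frame, n=10, seed=42)

N, n = len(frame), len(sample)
print("selected units:", sorted(sample))
print("chance on a single draw (1/N):", 1 / N)
print("chance of being in the sample (n/N):", n / N)
```

Because random.sample draws without replacement, no unit can appear in the sample twice.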

Simple random sampling can be very difficult when the study population is very large, because assigning a number to every unit in the study population is cumbersome and time consuming, especially if it has to be done by hand.

Sampling Frame -- A list of all units in a study population from which a sample can be selected.

(Type 2) Systematic Sample -- The sampling interval (k) is calculated by dividing the study population size by the desired sample size. A random number between 1 and k is selected. Beginning with the unit at that randomly selected number, every kth unit in the population is selected for inclusion in the sample.

The sampling interval is calculated as follows: k = Study Population Size / Desired Sample Size. (Its reciprocal, the desired sample size divided by the study population size, is the sampling fraction.)

For example, there is a study population of 100 people. A sample of 10 persons is desired. In this case the sampling interval would be 100/10 = 10. Thus, k = 10.

A random number is then selected between 1 and 10 (e.g., a 3 is selected). The 3rd unit would then be selected for the sample. Moving ahead k = 10 units, the 13th unit would be included. Moving ahead 10 more units, the 23rd unit would be included. This procedure would continue until 10 units (the 3rd, 13th, 23rd, ..., 93rd) were selected and the desired sample size was reached.
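The worked example above can be reproduced with a short Python sketch. The function name systematic_sample and its start and seed parameters are hypothetical, and the sketch assumes the population size divides evenly by the desired sample size.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Select every kth unit after a random start between 1 and k.

    Assumes len(frame) is an exact multiple of n.
    """
    k = len(frame) // n                    # population size / desired sample size
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start, 1..k inclusive
    # Convert the 1-based start to a 0-based index, then step through by k.
    return frame[start - 1::k][:n]

frame = list(range(1, 101))                # study population of 100 people

# Fixing the start at 3 reproduces the example in the text.
print(systematic_sample(frame, n=10, start=3))
# [3, 13, 23, 33, 43, 53, 63, 73, 83, 93]

# In practice the start is drawn at random.
random_start_sample = systematic_sample(frame, n=10, seed=1)
print(random_start_sample)
```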

Note that if a number between 1 and k were not randomly selected as the starting point (that is, if you just started on k), a systematic sample would not meet the criteria for a probability sample. By randomly selecting the starting point, all units have a nonzero probability of being included.

The primary reason for using a systematic sample is that it may be more practical in terms of time and resources than a simple random sample, particularly when the size of the study population is large and it would be too difficult to assign a number to each unit.

A primary danger of systematic sampling is that this design can produce a biased, or non-representative, sample if the sampling frame from which the sample is selected is ordered in some systematic fashion that influences the composition of the sample (for example, if the ordering repeats in a cycle that matches the interval k).

In order to ensure that a systematic sample is representative, it is important to make sure that the sampling frame being used to select the sample is not ordered in some systematic fashion that will produce a biased (or unrepresentative) sample in relation to the key variables that are to be measured.
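A small Python sketch can make this danger concrete. The frame below is invented for illustration: 100 housing units listed so that every 10th unit is a corner unit, which happens to match an interval of k = 10.

```python
# Hypothetical frame: units are listed in order, and every 10th unit
# (10, 20, ..., 100) is a corner unit. 10% of the population is "corner".
frame = [{"id": i, "corner": i % 10 == 0} for i in range(1, 101)]
k = 10

population_share = sum(u["corner"] for u in frame) / len(frame)
print("population share of corner units:", population_share)   # 0.1

shares = {}
for start in (3, 10):
    sample = frame[start - 1::k]           # every kth unit from the start
    shares[start] = sum(u["corner"] for u in sample) / len(sample)
    print("start", start, "-> sample share of corner units:", shares[start])
# start 3  -> 0.0 (no corner units at all)
# start 10 -> 1.0 (only corner units)
```

Whatever the random start, the sample contains either no corner units or nothing but corner units, so no single sample reflects the 10% share in the population.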

(Type 3) Stratified Sample -- The population is divided up into subgroups or "strata." A separate sample of units is then selected from each stratum.

There are two primary reasons for using a stratified sampling design.

Reason 1 -- To potentially reduce sampling error by gaining greater control over the composition of the sample, particularly concerning variables where it is important that the sample be representative.

Stratification Variable -- The variable or variables by which a study population is divided up into strata (or groups) in order to select a stratified sample.

Proportionate Stratified Sample -- Stratified sample where the number of units selected from each stratum is proportional to the number of units in that stratum in the population.
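As a rough sketch, proportionate allocation can be written in Python as follows. The strata (urban, suburban, rural), their sizes, and the function name proportionate_stratified_sample are assumptions made up for the example.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """strata maps each stratum label to the list of units in that stratum."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for label, units in strata.items():
        # Each stratum gets the same share of the sample as it has of the
        # population. Rounding can make the total differ slightly from n
        # when the shares are not whole numbers.
        n_h = round(n * len(units) / N)
        sample[label] = rng.sample(units, n_h)
    return sample

# Hypothetical population of 1,000 units stratified by place of residence.
strata = {
    "urban": [f"u{i}" for i in range(600)],
    "suburban": [f"s{i}" for i in range(300)],
    "rural": [f"r{i}" for i in range(100)],
}
sample = proportionate_stratified_sample(strata, n=50, seed=7)
sizes = {label: len(units) for label, units in sample.items()}
print(sizes)
# {'urban': 30, 'suburban': 15, 'rural': 5}
```

The sample mirrors the population on the stratification variable: 60%, 30%, and 10% in both.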

Reason 2 -- A second reason for using a stratified sampling design is to ensure that a small group within a population is adequately represented in a sample in order to compare it to a large group.

Disproportionate Stratified Sample -- Stratified sample where the number of units selected from each stratum is not proportional to the number of units in that stratum in the population.

Disproportionate stratified samples work well when the research goal is to compare the behavior or characteristics of a small stratum with those of a larger stratum within a population.

This type of design does not work as well when the goal is to draw inferences about the population as a whole, because the sample is not representative of the population on the stratification variable. However, weighting can be used to attempt to correct this problem.
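One common form of weighting gives each sampled unit a weight equal to the number of population units it stands for (stratum population size divided by stratum sample size). The Python sketch below illustrates the idea with invented data: the stratum sizes, the outcome rates, and the equal allocation of 50 units per stratum are all assumptions.

```python
import random

rng = random.Random(0)

# Hypothetical population: a 0/1 outcome that is rare in the large stratum
# and common in the small one.
population = {
    "large": [1 if rng.random() < 0.2 else 0 for _ in range(900)],
    "small": [1 if rng.random() < 0.8 else 0 for _ in range(100)],
}
n_per_stratum = 50    # equal allocation, so the small stratum is oversampled

sample = {h: rng.sample(units, n_per_stratum) for h, units in population.items()}

# Weight = N_h / n_h: how many population units each sampled unit represents.
weights = {h: len(population[h]) / n_per_stratum for h in population}

N = sum(len(units) for units in population.values())
true_mean = sum(sum(units) for units in population.values()) / N
unweighted = sum(sum(ys) for ys in sample.values()) / (2 * n_per_stratum)
weighted = (sum(weights[h] * sum(sample[h]) for h in sample)
            / sum(weights[h] * n_per_stratum for h in sample))

print("weights:", weights)
print("population mean:", true_mean)
print("unweighted sample mean:", unweighted)
print("weighted sample mean:", weighted)
```

The unweighted mean treats the two strata as if they were the same size in the population. The weighted mean restores each stratum to its actual share.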

The major problem with using stratified sampling is that the researcher must have data on the characteristics of the population (i.e., population data on the stratification variable) in order to select the sample. In many situations, data on population characteristics may be unavailable or unknown.

(Type 4) Cluster Sample -- The population is divided up into subgroups or "clusters" that represent aggregates of individual units. A sample of clusters is then selected. All individual units that are contained within a cluster that is selected are included in the sample.
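A minimal Python sketch of this design follows. The clusters (20 schools of 30 students each), the choice of 3 clusters, and the function name cluster_sample are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample the cluster labels
    sample = [unit for label in chosen for unit in clusters[label]]
    return chosen, sample

# Hypothetical population: 20 schools (clusters) with 30 students each.
clusters = {
    f"school_{i}": [f"school_{i}_student_{j}" for j in range(30)]
    for i in range(20)
}

chosen, sample = cluster_sample(clusters, n_clusters=3, seed=5)
print("selected clusters:", chosen)
print("units in sample:", len(sample))   # 3 clusters x 30 students = 90
```

Only a list of clusters is needed to draw the sample. A list of individual units is needed only for the clusters that are selected.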

A major advantage of cluster sampling is that it can be used on very large populations and it is not necessary to have data on important variables for the entire population. Rather, it is just necessary to be able to divide the population up into clusters of some type.

A major disadvantage of cluster sampling is that this method tends to produce less representative samples compared to other probability sampling designs, particularly when the clusters contain large numbers of units within them and only a few are needed to meet the desired sample size.

5.7 Types of nonprobability sampling designs

Remember, with a nonprobability sample, not every unit in the study population has a chance (a nonzero probability) of being selected for inclusion in the sample. As a result, statistical procedures such as the calculation of confidence intervals cannot be validly used, because such procedures assume that each unit in the population does have a chance of being included in the sample.

Therefore, any studies using nonprobability sampling designs should be viewed with suspicion if the researchers are trying to use the data to draw empirical generalizations, or inferences, to a larger population.

There are four basic types of nonprobability sampling designs:

(1) Convenience (Accidental) Sample -- units are selected on the basis of availability.

(2) Quota Sample -- units are selected on the basis of availability, with "quotas" set so that the sample matches the distribution of key characteristics in the population.

A quota sample is similar to a stratified sample, except that units are selected on the basis of convenience.

(3) Judgmental Sample -- the researcher selects units he/she thinks are most representative of the population.

Thus, with this design, not only are some units in the population not eligible for inclusion in the sample, but the composition of the sample may also be affected by the personal biases of the researcher as to who he/she believes should be interviewed.

(4) Snowball Sample -- a unit with a desired characteristic is identified. This unit is asked to identify other units with the desired characteristic. These additional units are also asked to identify other units with the desired characteristic. Through this process the size of the sample "snowballs" or grows larger.
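The referral process can be sketched in Python as a wave-by-wave expansion. The referral lists, the names, and the function snowball_sample are invented for the illustration.

```python
def snowball_sample(referrals, seeds, waves):
    """Grow a sample by following referrals for a fixed number of waves."""
    sample = list(seeds)
    seen = set(seeds)
    current = list(seeds)
    for _ in range(waves):
        next_wave = []
        for unit in current:
            for named in referrals.get(unit, []):
                if named not in seen:          # add each unit only once
                    seen.add(named)
                    next_wave.append(named)
        sample.extend(next_wave)
        current = next_wave                    # newly added units refer next
    return sample

# Hypothetical referral lists: who each person would name if asked.
referrals = {
    "Ana": ["Ben", "Cy"],
    "Ben": ["Ana", "Dee"],
    "Cy": ["Dee", "Eli"],
    "Dee": [],
    "Eli": ["Fay"],
    "Gus": ["Hal"],    # Gus and Hal are not connected to Ana's network
}

result = snowball_sample(referrals, seeds=["Ana"], waves=2)
print(result)
# ['Ana', 'Ben', 'Cy', 'Dee', 'Eli']
```

Gus and Hal can never enter a sample that starts from Ana, however many waves are run, because nobody in her network names them.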

This technique is useful when little is known about the population being studied or when the goal is to study social groups such as "social networks."

The problem with the snowball sample is the same as with the other nonprobability designs; that is, not all units in the study population have a chance of being included in the sample. Therefore, inferences cannot be validly drawn to the study population.

In sum, nonprobability sampling should not be used when the goal of a study is to draw empirical generalizations, or inferences, about a larger population or group.

Nonprobability samples are best used in "exploratory" research studies, where the goal is to gain some initial insights into behaviors that we know little about. For example: why do people join militias?

While findings obtained from nonprobability samples cannot be empirically generalized to a larger population, they could be viewed as "suggestive." That is, the findings could be viewed as the results that a researcher "might" obtain, if he/she conducted a study using a probability sample.

However, unlike with probability samples, we cannot attach probabilities to the accuracy of these inferences.

5.8 Sources of error in sample data

Sampling error (i.e., the difference between a sample estimate and its corresponding population parameter) is partly attributable to chance in which units happen to be randomly selected for inclusion in a sample. This is known as random error.
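Random error can be seen by drawing several samples from the same population and comparing each sample mean with the population mean. The Python sketch below uses a simulated population. Its size, its mean of 50, the sample size of 100, and the seed are all assumptions chosen for the illustration.

```python
import random
import statistics

rng = random.Random(2024)

# Simulated study population of 10,000 units with a numeric characteristic.
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = statistics.mean(population)        # the population parameter

errors = []
for draw in range(1, 6):
    sample = rng.sample(population, 100)       # simple random sample, n = 100
    estimate = statistics.mean(sample)         # the sample estimate
    errors.append(estimate - parameter)        # sampling error for this sample
    print(f"sample {draw}: estimate = {estimate:.2f}, error = {errors[-1]:+.2f}")
```

Each sample is drawn the same way from the same population, yet the errors differ in size and sign purely because different units happen to be selected.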

Sources of Sample Error

a. Random Error -- Sampling error that results from random fluctuations, or chance, according to which units happen to be randomly selected for inclusion in a sample.

b. Nonresponse Bias -- sampling error that results when a substantial number of units in a sample: (a) do not provide data (e.g., do not respond to a survey or participate in a study); and (b) have significantly different characteristics compared to those units that do provide data.

c. Selective Availability -- sampling error that results because units that are difficult to identify are left out of the study population and have no chance of being included in the sample (i.e., the study population doesn't match the theoretical population).

d. Areal Bias -- sampling error that occurs if certain geographic areas have no chance of being included in the sample (when geographic areas are used as units of analysis).

e. Measurement Error -- sampling error that results from using poor quality indicators to measure key study variables. As a result, units are assigned incorrect numbers that do not represent the true quantity of an action or orientation, or the true characteristics possessed by the units.

Author: Department of Sociology