7.0 Quantitative methods for testing hypotheses

4/7/00



Establishing a statistical association or correlation between an independent variable and a dependent variable is one of the criteria that must be met to provide evidence of a causal relationship.

Moreover, it is the only criterion that must be met to provide evidence of a simple relationship, or association, between variables.

7.1 Types of statistical analysis

Statistical analysis of social science data involves three broad levels of analysis:

1. Univariate Analysis -- Examines the distribution of units of analysis on 1 variable.

2. Bivariate Analysis -- Examines the association between 2 variables across all units of analysis.

3. Multivariate Analysis -- Examines the association between 3 or more variables across all units of analysis.

Univariate analysis provides information on the characteristics of a sample concerning independent variables and the dependent variable.

Bivariate analysis is used to determine whether there is a correlation between indicators of an independent variable and dependent variable.

As a general rule, the particular statistical methods that are used depend, in part, upon the level of precision in measurement of the indicators used for the independent variable and dependent variable. Different statistical methods exist for each level of measurement.

Nonparametric Statistics -- statistical methods used for analyzing data measured at the nominal or ordinal levels of precision.

Parametric Statistics -- statistical methods used for analyzing data measured at the interval or ratio levels of precision.

7.2 Nonparametric univariate statistics

The purpose of univariate analysis is to understand the characteristics of a sample in regard to a single variable - whether it be independent or dependent.

The Distribution of a Variable -- a listing of numerical values on a variable for all units of analysis.

As you should know by now, distributions of single variables can be generated in Microcase by clicking on the Statistics Menu, and then the Univariate option.

Frequency Distribution -- A list of how many units of analysis have a particular value on a variable.

With univariate analysis, statistics are calculated for a single variable based on the distribution of that variable among a sample, or the population of units of analysis.

Percentage (pr)

pr1 = (n1 / n) * 100
pr2 = (n2 / n) * 100

Where:
nk = the number of units in category k
n = sample size

Percentages allow you to determine which category of a nominal or ordinal measure is possessed by the most units of analysis in a sample.
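As a minimal sketch of the percentage formula, the Python below computes pr for every category and picks the category held by the most units. The category names and counts are hypothetical and chosen only for illustration; they are not data from the course example.

```python
def category_percentages(counts):
    """Return pr_k = (n_k / n) * 100 for every category k."""
    n = sum(counts.values())  # n = sample size
    return {k: (n_k / n) * 100 for k, n_k in counts.items()}


# Hypothetical category counts (illustration only, not real data)
counts = {"category_1": 50, "category_2": 30, "category_3": 20}

percentages = category_percentages(counts)

# The category possessed by the most units of analysis in the sample
modal_category = max(percentages, key=percentages.get)

print(percentages)
print(modal_category)
```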

If the researcher wants to get insight on how close the true population percentage (PR) is to sample percentage (pr), a confidence interval could be calculated.
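The notes do not say which confidence interval method is intended. One common choice, assumed here, is the normal-approximation interval for a proportion, with z = 1.96 for a 95% confidence level. The function name and the counts are hypothetical.

```python
import math


def percentage_confidence_interval(n_k, n, z=1.96):
    """Normal-approximation interval for a sample percentage.

    Assumption: this is one common method, not necessarily the one
    intended in the notes. z = 1.96 corresponds to 95% confidence.
    """
    p = n_k / n                               # sample proportion
    margin = z * math.sqrt(p * (1 - p) / n)   # z times the standard error
    return (p - margin) * 100, (p + margin) * 100


# Hypothetical sample: 50 of 100 units fall in the category
low, high = percentage_confidence_interval(n_k=50, n=100)
print(low, high)
```

The approximation is usually considered reasonable only for fairly large samples and percentages that are not close to 0 or 100.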

Ratio

ratio12 = n1 / n2
ratio13 = n1 / n3
ratio23 = n2 / n3

Where:
nk = number in category k

The ratio compares the number of units in one category of a nominal or ordinal measure to the number of units in another category. It is interpreted as the odds, or likelihood, that a unit will be in one category versus another in a sample.
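The ratio formula can be sketched in a few lines of Python. The counts are hypothetical, and the function name is only an illustrative choice.

```python
def category_ratio(counts, a, b):
    """Return ratio_ab = n_a / n_b, the odds of category a versus b."""
    return counts[a] / counts[b]


# Hypothetical counts for categories 1, 2 and 3 (illustration only)
counts = {1: 60, 2: 30, 3: 10}

ratio_12 = category_ratio(counts, 1, 2)
ratio_13 = category_ratio(counts, 1, 3)
ratio_23 = category_ratio(counts, 2, 3)

print(ratio_12, ratio_13, ratio_23)
```

With these counts, ratio_12 works out to 2, meaning a unit in the sample is twice as likely to be in category 1 as in category 2.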

7.3 Bivariate analysis of nominal data

Bivariate analysis examines the statistical association between 2 variables.

When both variables are measured at the nominal or ordinal level, bivariate analysis is typically conducted through the use of contingency tables.

Central to bivariate analysis is the concept of the joint distribution of two variables.

Joint Distribution of Two Variables -- a listing of the combination of values on 2 variables for all units of analysis.

Unit   Voted For Clinton?   Sex                   Joint
       (1=Yes, 2=No)        (1=Female, 2=Male)    Distribution
1      1                    2                     (1,2)
2      2                    1                     (2,1)
3      1                    2                     (1,2)
4      1                    2                     (1,2)
5      2                    1                     (2,1)
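As a sketch of how a joint distribution becomes a contingency table, the Python below counts each combination of values for the five units listed above. The variable names are illustrative choices.

```python
from collections import Counter

# Joint distribution for the five units: (voted for Clinton?, sex)
# Voted: 1=Yes, 2=No.  Sex: 1=Female, 2=Male.
units = [(1, 2), (2, 1), (1, 2), (1, 2), (2, 1)]

# Count how many units share each combination of values
cell_counts = Counter(units)

voted_values = sorted({voted for voted, _ in units})
sex_values = sorted({sex for _, sex in units})

# Rows = voted (1, 2); columns = sex (1, 2)
table = [[cell_counts[(v, s)] for s in sex_values] for v in voted_values]

for v, row in zip(voted_values, table):
    print(v, row)
```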

With bivariate analysis, contingency tables can be used to test hypotheses about the association between 2 variables that are measured at the nominal or ordinal level.

With variables measured at the nominal level, this is done through using different statistical procedures which address 2 questions in sequence:

a. Are the two nominal variables associated or correlated?

b. How strong is the correlation?

Hypotheses to be tested in Voting Preference Example

Ho: Gender is not associated with presidential candidate preference
H1: Gender is associated with presidential candidate preference

With a bivariate analysis of these 2 variables, we are attempting to determine or infer whether or not gender is associated (or correlated) with presidential candidate preference among the entire population of registered voters based on a sample of 1,377.

The issue of whether or not 2 nominal variables are associated is addressed through statistical measures known as “goodness-of-fit” statistics.

Using contingency tables with nominal measures, the most commonly used goodness-of-fit statistic is known as Chi-square.

Goodness-of-fit statistics such as chi-square are used for what is known in inferential statistics as “significance tests.” Thus, the chi-square statistic is also known as a chi-square test.

Confidence levels are also central to goodness-of-fit statistics and significance tests.

In relation to a chi-square test, the confidence level refers to the level of risk that the researcher is willing to take in rejecting the null hypothesis that the 2 variables are not associated, and accepting the alternative hypothesis that the 2 variables are associated in the population from which the sample was drawn.

A confidence level can be translated into a probability of drawing a wrong inference. This probability is known as the significance level.

Significance Level -- Also known as alpha (α). It is the probability of drawing a wrong inference from a significance test. This is set by the researcher and calculated by: 1 - (confidence level ÷ 100).

E.G. 95% confidence level: significance level (α) = 1 - (95.0 ÷ 100) = 1 - .95 = .05

Thus, the significance level (α) that corresponds to a 95% level of confidence is .05.
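The conversion from confidence level to significance level is simple arithmetic; a small Python sketch (function name hypothetical) makes it reusable for other confidence levels.

```python
def significance_level(confidence_level_percent):
    """Return alpha = 1 - (confidence level / 100)."""
    return 1 - confidence_level_percent / 100


alpha_95 = significance_level(95.0)  # approximately .05
alpha_99 = significance_level(99.0)  # approximately .01

# Rounded for display, since floating-point subtraction is inexact
print(round(alpha_95, 4), round(alpha_99, 4))
```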

In relation to hypothesis testing, the significance level, or α, refers to the probability of making a Type I error in an inference, that is, rejecting a null hypothesis that is actually true.

The significance level α thus represents the probability of making a Type I error in rejecting the null hypothesis that 2 variables are not associated in the population from which a sample was drawn.

As a goodness-of-fit measure and significance test, chi-square tests the null hypothesis that two variables are not associated in the population.

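The chi-square formula measures the extent to which the observed cell frequencies from a sample differ from the expected cell frequencies that you would get if there were no association:

χ² = Σ (fo - fe)² / fe

Where:
fo = the observed frequency in a cell
fe = the expected frequency in that cell if there were no association
Σ = the sum over all cells in the contingency table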

The more the observed cell frequencies differ from the expected cell frequencies, the higher the χ² value, and the stronger the evidence that the 2 variables are associated or correlated with each other in the population from which the sample was drawn.
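As a sketch of the comparison between observed and expected cell frequencies, the Python below uses the standard Pearson chi-square calculation, in which each expected frequency is (row total * column total) / n. The observed tables are hypothetical and are not the counts from the voting example.

```python
def expected_frequencies(observed):
    """Expected cell counts if the two variables were not associated."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2X2 tables (illustration only)
associated = [[30, 20], [20, 30]]      # observed differs from expected
not_associated = [[10, 20], [20, 40]]  # observed equals expected

print(chi_square(associated))
print(chi_square(not_associated))
```

The second table has identical proportions in every row, so its observed and expected frequencies coincide and the statistic is zero.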

At what value of χ² can you infer that the 2 variables are associated in the population from which the sample was drawn? Determining this involves three steps:

a. Set a significance level (α) -- probability of making a Type I error

b. Calculate the degrees of freedom: d.f. = (r-1)(c-1), where r = number of rows and c = number of columns in the contingency table. E.g. for a 3X2 table: d.f. = (3-1)(2-1) = 2*1 = 2

c. Consult a chi-square (χ²) table to see what the critical value of χ² would be for the chosen significance level and degrees of freedom.

To summarize, the chi-square (χ²) test can be used to infer whether a dependent and independent variable are correlated in a population of units based on sample data, when both variables are measured at the nominal level of precision.

If the χ² value computed from a sample is larger than the critical value in the chi-square table given the significance level and degrees of freedom, then it can be concluded or inferred that the 2 variables are “significantly” associated with each other in the population from which the sample was drawn.
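The three steps and the decision rule described above can be sketched in Python. The small lookup table stands in for a printed chi-square table and covers only two significance levels and up to 4 degrees of freedom; the function names and the example χ² value of 6.5 are hypothetical.

```python
# Excerpt from a standard chi-square table: critical values by
# significance level (alpha) and degrees of freedom (d.f.)
CRITICAL_VALUES = {
    0.05: {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488},
    0.01: {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277},
}


def degrees_of_freedom(rows, cols):
    """Step b: d.f. = (r-1)(c-1)."""
    return (rows - 1) * (cols - 1)


def is_significant(chi_square_value, rows, cols, alpha=0.05):
    """Steps a-c, then compare the sample value to the critical value."""
    df = degrees_of_freedom(rows, cols)
    critical_value = CRITICAL_VALUES[alpha][df]  # step c: table lookup
    return chi_square_value > critical_value


# Hypothetical chi-square value of 6.5 from a 3X2 table (d.f. = 2)
print(is_significant(6.5, rows=3, cols=2, alpha=0.05))
print(is_significant(6.5, rows=3, cols=2, alpha=0.01))
```

The same sample value can be significant at the .05 level but not at the stricter .01 level, which is why the significance level is set before the table is consulted.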

When both the independent variable and dependent variable are measured at the nominal level, a bivariate analysis to establish a correlation between them uses two different statistical procedures.

The chi-square (χ²) test addresses the first question: are the independent variable and dependent variable correlated?

The second question is addressed through a correlation coefficient that is based on the chi-square statistic. This coefficient -- known as a “measure of association” -- is called Cramer’s V:

V = sqrt( χ² / (n * min(r-1, c-1)) )

Where:
n = sample size
r = number of rows in the contingency table
c = number of columns in the contingency table

V cannot be interpreted as being meaningful, or as being > 0 in the population from which a sample is drawn, unless the χ² value is found to be significant -- that is, it indicates that the 2 variables are associated in the population.
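A minimal sketch of Cramer's V, using the standard definition V = sqrt(χ² / (n * min(r-1, c-1))). The input values are hypothetical, and the function assumes the χ² value has already been computed and found significant.

```python
import math


def cramers_v(chi_square_value, n, rows, cols):
    """Cramer's V from a chi-square value, sample size and table shape."""
    return math.sqrt(chi_square_value / (n * min(rows - 1, cols - 1)))


# Hypothetical values: chi-square of 4.0 from a 2X2 table with n = 100
v = cramers_v(chi_square_value=4.0, n=100, rows=2, cols=2)
print(round(v, 3))
```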

There is no well-defined rule for interpreting correlation coefficients such as V. We only know that the closer V gets to 1.0, the stronger the correlation.

The following is suggested as a potential rule-of-thumb for interpreting correlation coefficients:

.01 - .30 weak correlation (.01 - .10 very weak)
.31 - .60 moderate correlation
.61 - 1.0 strong correlation

These categories are not set in stone and different researchers have different interpretations of what constitutes a weak versus strong correlation.
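The rule-of-thumb above can be written as a small lookup function. How to label values below .01, and values that fall between the listed ranges (such as .305), is not specified in the notes; the choices below are assumptions.

```python
def interpret_strength(v):
    """Label a correlation coefficient using the suggested rule-of-thumb."""
    if not 0 <= v <= 1:
        raise ValueError("V must be between 0 and 1")
    if v < 0.01:
        return "no meaningful correlation"  # assumption: not in the notes
    if v <= 0.10:
        return "very weak"
    if v <= 0.30:
        return "weak"
    if v <= 0.60:   # assumption: values between .30 and .31 count as moderate
        return "moderate"
    return "strong"


for value in (0.05, 0.2, 0.45, 0.8):
    print(value, interpret_strength(value))
```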

In our example, we can conclude that gender was associated with presidential candidate preference for the 1996 election among the population of registered voters in the U.S. However, the correlation between these variables is very weak.

This suggests that gender did not have a very important influence on presidential candidate preference in the 1996 election, and that females had only a slightly greater tendency to vote for Clinton compared to males.

Author: Department of Sociology