Unit: Inference for Categorical Data: Chi-Square
Chapter: Chi-Square Test for Goodness of Fit
Reference: Categorical data, expected frequencies, Null & Alternative Hypotheses, Chi-square test statistic, Degrees of freedom, Critical value & P-value, Decision rule, Interpretation, Assumptions & Conditions, Practical Application.
After studying this chapter, you should be able to:
- Describe categorical data and calculate expected frequencies.
- State the null and alternative hypotheses.
- Calculate the Chi-square test statistic and its degrees of freedom.
- Find the critical value and p-value, and interpret the results.
Categorical Data & Expected Frequencies
Categorical Data:
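- Categorical data, also known as qualitative data, consists of observations that can be sorted into distinct categories or groups.
- Categorical variables have values that are labels or names rather than measurements on a numerical scale. Nominal variables (for example, eye color) have no natural order; ordinal variables (for example, satisfaction ratings of low/medium/high) have a natural order but still do not represent measured quantities.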
- Examples of categorical data include gender (male/female), eye color (blue/brown/green), and type of car (sedan/SUV/truck).
- Categorical data is often presented in frequency tables or bar charts to show the distribution of observations among different categories.
Expected Frequencies:
- Expected frequencies are the counts you would expect, on average, in each category if a hypothesized theoretical distribution were true.
- In the context of hypothesis testing, expected frequencies are calculated from the null hypothesis, which is a statement of no effect or no difference.
- Each expected frequency is computed from the proportion that the null hypothesis assigns to its category: E = n × p, where n is the total number of observations and p is the hypothesized proportion for that category.
- The Chi-Square Test for Goodness of Fit compares observed frequencies with expected frequencies to assess whether there is a significant difference between the two.
- The Chi-Square test statistic is found by computing, for each category, the squared difference between the observed and expected frequencies divided by the expected frequency, and then summing these values over all categories: χ² = Σ (O – E)² / E.
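The two calculations above (expected frequencies from the null proportions, then the sum of (O – E)² / E) can be sketched in plain Python. This is a minimal illustration, not a library API: the function names `expected_frequencies` and `chi_square_statistic` are hypothetical, and the sample values are the butterfly counts from the worked example later in this chapter.

```python
def expected_frequencies(n, null_probs):
    """Expected count for each category: E = n * p under the null hypothesis."""
    return [n * p for p in null_probs]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [70, 50, 80]             # red, yellow, green
null_probs = [3 / 6, 1 / 6, 2 / 6]  # hypothesized ratio 3:1:2
expected = expected_frequencies(sum(observed), null_probs)
print([round(e, 2) for e in expected])                     # [100.0, 33.33, 66.67]
print(round(chi_square_statistic(observed, expected), 2))  # 20.0
```

The statistic is computed from the unrounded expected values; rounding them first (to 33.33 and 66.67) can shift the result slightly.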
Degrees of Freedom:
The degrees of freedom for the Chi-Square Test for Goodness of Fit are calculated as the number of categories minus one (df = k – 1, where k is the number of categories).
Critical Value and P-value:
The critical value is the point on the Chi-Square distribution (with the appropriate degrees of freedom) that cuts off an upper-tail area equal to the significance level α; a test statistic larger than this value is considered statistically significant.
The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the data, assuming the null hypothesis is true.
Decision Rule and Interpretation:
If the p-value is small (typically less than the chosen significance level, often 0.05), you would reject the null hypothesis and conclude that there is a significant difference between observed and expected frequencies.
Conversely, if the p-value is greater than or equal to the significance level, you would fail to reject the null hypothesis and conclude that there is insufficient evidence of a difference between observed and expected frequencies.
Practical Application:
The Chi-Square Test for Goodness of Fit is commonly used in various fields, such as biology (genetics), social sciences (survey analysis), quality control (defective products), and market research (customer preferences).
It helps researchers and analysts assess whether the observed data for a single categorical variable fit an expected distribution. (Relationships between two categorical variables are examined with the Chi-Square Test of Independence.)
Assumptions and Conditions:
Assumptions for the Chi-Square Test for Goodness of Fit include random sampling, independence of observations, and an expected frequency condition (all expected frequencies should be at least 5).
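The expected frequency condition can be checked before running the test. The sketch below is a hypothetical helper (its name and the 5-count default are assumptions matching the guideline above); it also verifies that the null-hypothesis proportions sum to 1.

```python
import math


def expected_count_problems(observed, null_probs, minimum=5.0):
    """Return {category: expected count} for categories whose expected count is below `minimum`.

    `observed` and `null_probs` are dicts keyed by the same category labels.
    """
    if not math.isclose(sum(null_probs.values()), 1.0):
        raise ValueError("null-hypothesis probabilities must sum to 1")
    n = sum(observed.values())
    return {cat: n * p for cat, p in null_probs.items() if n * p < minimum}


probs = {"red": 3 / 6, "yellow": 1 / 6, "green": 2 / 6}
print(expected_count_problems({"red": 70, "yellow": 50, "green": 80}, probs))  # {}
print(expected_count_problems({"red": 4, "yellow": 1, "green": 1}, probs))
```

With 200 butterflies every expected count is well above 5, so nothing is flagged; with only 6 observations all three categories fall below the guideline.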
Null & Alternative Hypothesis
- Hypotheses Defined: In statistical hypothesis testing, the null hypothesis (H0) and alternative hypothesis (Ha) are competing statements about a population parameter.
- Null Hypothesis (H0): The null hypothesis is a statement of no effect or no difference. It often represents the status quo or the absence of a particular effect. It is usually denoted as H0.
- Alternative Hypothesis (Ha): The alternative hypothesis is a statement that contradicts the null hypothesis. It represents what the researcher is trying to establish, such as the presence of an effect, a difference, or a relationship. It is denoted as Ha.
- Directionality of Alternative Hypothesis:
- One-Tailed: The alternative hypothesis is one-tailed when it specifies a particular direction of effect (e.g., greater than, less than).
- Two-Tailed: The alternative hypothesis is two-tailed when it specifies a difference without specifying a particular direction.
- Testing Process: Hypothesis testing involves collecting data, calculating a test statistic, and comparing it to a critical value or calculating a p-value.
- Decision Rule: Based on comparing the test statistic with the critical value (or the p-value with the significance level α), you decide whether to reject the null hypothesis in favor of the alternative hypothesis or fail to reject it.
- Type I Error (α): Type I error occurs when you reject the null hypothesis when it is actually true. It is also known as a "false positive." The probability of Type I error is denoted as α (alpha) and is set as the significance level.
- Type II Error (β): Type II error occurs when you fail to reject the null hypothesis when it is actually false. It is also known as a "false negative." The probability of Type II error is denoted as β (beta).
- Power (1 – β): Power is the probability of correctly rejecting the null hypothesis when it is false. Higher power indicates a better chance of detecting an effect if it exists.
- Significance Level (α): The significance level (α) is the predetermined threshold for determining statistical significance. It is the probability of committing a Type I error.
- Critical Value: The critical value is a threshold derived from the sampling distribution, beyond which you would reject the null hypothesis. It is compared to the test statistic to make a decision.
- P-value: The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the data, assuming the null hypothesis is true.
- Relationship Between Hypotheses: The null and alternative hypotheses are mutually exclusive and exhaustive, meaning that they cover all possibilities and cannot both be true simultaneously.
- Common Symbols:
- H0: Null hypothesis
- Ha: Alternative hypothesis
- α: Significance level (Type I error probability)
- β: Type II error probability
- 1 – β: Power
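Because α is the probability of a Type I error and 1 – β is the power, both can be estimated by simulation. The sketch below is a hypothetical illustration (the function names and the alternative proportions are made up for demonstration): it draws many random samples, runs the goodness of fit test on each, and reports how often H0 is rejected. It assumes exactly three categories, so df = 2 and the critical value is –2 ln(α) (≈ 5.99 for α = 0.05), a closed form that holds only for df = 2.

```python
import math
import random


def chi_square_stat(counts, null_probs):
    """Goodness of fit statistic: sum of (O - E)^2 / E with E = n * p."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, null_probs))


def rejection_rate(true_probs, null_probs, n, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated samples of size n in which H0 is rejected.

    Samples are drawn from `true_probs`; the test uses `null_probs`.
    Assumes three categories, so df = 2 and the critical value is -2 * ln(alpha).
    """
    if len(null_probs) != 3:
        raise ValueError("this sketch assumes three categories (df = 2)")
    critical = -2.0 * math.log(alpha)
    rng = random.Random(seed)
    categories = range(len(true_probs))
    rejections = 0
    for _ in range(trials):
        draws = rng.choices(categories, weights=true_probs, k=n)
        counts = [draws.count(c) for c in categories]
        if chi_square_stat(counts, null_probs) > critical:
            rejections += 1
    return rejections / trials


null = [3 / 6, 1 / 6, 2 / 6]
print(rejection_rate(null, null, n=200))                # H0 true: estimates alpha (about 0.05)
print(rejection_rate([0.35, 0.25, 0.40], null, n=200))  # H0 false: estimates power (1 - beta)
```

When the data are generated under H0, the rejection rate should land near 0.05; when they come from a different distribution, the rejection rate estimates the power for that particular alternative and sample size.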
Chi-Square Test Statistic & Degrees of Freedom
Chi-Square Test Statistics:
- Purpose: The Chi-Square (χ²) test statistic is used in hypothesis testing to determine whether observed categorical counts differ significantly from expected counts (Goodness of Fit) or whether two categorical variables are associated (Test of Independence).
- Calculation: The same basic formula, χ² = Σ (O – E)² / E, is used in both the Goodness of Fit test and the Test of Independence. What differs is how the expected frequencies are obtained and whether the sum runs over the categories of one variable or over the cells of a contingency table.
- Comparing Frequencies: The Chi-Square test statistic quantifies the difference between the observed frequencies in a sample and the frequencies that would be expected under a certain distribution or assumption.
- Null Hypothesis: The null hypothesis (H0) typically states that there is no association or difference between the variables being studied. The Chi-Square test assesses whether there is enough evidence to reject this null hypothesis.
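- Interpretation: A larger Chi-Square test statistic indicates a greater discrepancy between observed and expected frequencies and therefore stronger evidence against the null hypothesis. Because χ² grows with sample size, a large value does not by itself mean that the departure or association is large.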
Degrees of Freedom:
- Definition: Degrees of freedom (df) represent the number of values in the final calculation of a statistic that are free to vary. In the context of the Chi-Square test, degrees of freedom relate to the number of categories or cells involved.
- Degrees of Freedom Formula (Goodness of Fit): For the Chi-Square Test for Goodness of Fit, the degrees of freedom (df) are calculated as the number of categories (k) minus one (df = k – 1).
- Degrees of Freedom Formula (Test of Independence): For the Chi-Square Test of Independence, the degrees of freedom (df) are calculated as (rows – 1) multiplied by (columns – 1) in a contingency table.
- Critical Values: The degrees of freedom determine which Chi-Square distribution the critical value is taken from. For a fixed significance level, the critical value increases as the degrees of freedom increase (for α = 0.05: 3.84 for df = 1, 5.99 for df = 2, 7.81 for df = 3).
- Relation to Sample Size: For Chi-Square tests, the degrees of freedom depend on the number of categories or cells, not on the sample size. A larger sample does not change df, although it makes the Chi-Square approximation more reliable and increases the power of the test.
- Limitations: The accuracy of the Chi-Square approximation depends mainly on the expected frequencies rather than on the degrees of freedom. When expected frequencies are small (a common guideline is below 5), the approximation may be poor, and categories may need to be combined or an exact test used.
- Chi-Square Distribution: The Chi-Square distribution is different for different degrees of freedom. As the degrees of freedom increase, the Chi-Square distribution approaches a normal distribution.
- Test Interpretation: When interpreting the results of a Chi-Square test, the degrees of freedom are important for determining the critical value or calculating the p-value.
- Larger Tables: In tests involving contingency tables, such as the Chi-Square Test of Independence, the degrees of freedom grow with the number of rows and columns, reflecting how many cell counts are free to vary once the row and column totals are fixed.
- Example: In a 2×2 contingency table comparing gender (male/female) and voting preference (A/B), if you're conducting a Chi-Square Test of Independence, the degrees of freedom would be (2 – 1) * (2 – 1) = 1.
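The degrees-of-freedom formulas, and the way the 5% critical value grows with df, can be checked with a short sketch. To stay within the standard library it assumes an even number of degrees of freedom, for which the Chi-Square upper tail has the closed form e^(–x/2) × Σ (x/2)^j / j! for j = 0, 1, …, df/2 – 1; the function names are hypothetical, and the critical value is found by bisection.

```python
import math


def df_goodness_of_fit(k):
    """Degrees of freedom for a goodness of fit test with k categories."""
    return k - 1


def df_independence(rows, cols):
    """Degrees of freedom for a test of independence on a rows x cols table."""
    return (rows - 1) * (cols - 1)


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable with a positive EVEN number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def critical_value_even_df(alpha, df):
    """Value c with P(X >= c) = alpha, found by bisection on the upper tail."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:
        hi *= 2.0  # widen the bracket until the tail area drops below alpha
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(df_goodness_of_fit(3), df_independence(2, 2))
for df in (2, 4, 6):
    print(df, round(critical_value_even_df(0.05, df), 2))
```

The loop should print critical values of about 5.99, 9.49 and 12.59, matching standard Chi-Square tables and showing that the critical value rises with df.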
Critical Value, P Value & Interpretations
Critical Value:
- Definition: The critical value is a threshold determined from a probability distribution (such as the Chi-Square or Z distribution) that helps make a decision in hypothesis testing.
- Role: In hypothesis testing, the critical value defines the boundary beyond which you would reject the null hypothesis. If the test statistic exceeds the critical value, it provides evidence against the null hypothesis.
- Significance Level (α): The choice of the critical value is influenced by the chosen significance level (α), which represents the probability of committing a Type I error (rejecting the null when it's true). Common significance levels are 0.05, 0.01, and 0.10.
- Location: Critical values are typically found in statistical tables for different distributions. The location of the critical value depends on the level of significance and the degrees of freedom, if applicable.
- Decision Rule: If the calculated test statistic exceeds the critical value, you would reject the null hypothesis. If it doesn't exceed the critical value, you fail to reject the null hypothesis.
P-value:
- Definition: The p-value is a probability that measures the strength of evidence against the null hypothesis. It quantifies how extreme the observed data is, assuming the null hypothesis is true.
- Interpretation: A small p-value (typically less than the chosen significance level, α) indicates strong evidence against the null hypothesis. It suggests that the observed data is unlikely to occur if the null hypothesis is true.
- Decision Rule: If the p-value is smaller than the significance level (α), you would reject the null hypothesis. If it is greater than or equal to α, you fail to reject the null hypothesis.
- Common Misinterpretation: A p-value of 0.05 doesn't mean the null hypothesis has a 5% chance of being true; rather, it means that if the null hypothesis were true, data at least as extreme as those observed would occur in about 5% of repeated samples.
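Without a table or calculator, the p-value can be computed from the regularized upper incomplete gamma function, since P(χ² ≥ x) with df degrees of freedom equals Q(df/2, x/2). The sketch below is a hypothetical standard-library implementation using the classic series / continued-fraction approach; in practice a statistics library (for example, SciPy's `scipy.stats.chi2.sf`) would normally be used instead. The function names are assumptions for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Computes Q(df/2, x/2): a power series when x/2 < df/2 + 1, otherwise a
    continued fraction evaluated with the modified Lentz method.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower regularized gamma P(a, z); the upper tail is 1 - P.
        term = total = 1.0 / a
        n = 1
        while abs(term) > 1e-15 * abs(total):
            term *= z / (a + n)
            total += term
            n += 1
        return 1.0 - math.exp(log_prefactor) * total
    # Continued fraction for the upper regularized gamma Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def goodness_of_fit_decision(statistic, df, alpha=0.05):
    """Apply the p-value decision rule: reject H0 when p < alpha."""
    p_value = chi2_sf(statistic, df)
    return p_value, ("reject H0" if p_value < alpha else "fail to reject H0")


print(goodness_of_fit_decision(20.0, 2))  # p about 4.5e-05 -> reject H0
print(goodness_of_fit_decision(1.0, 2))   # p about 0.61 -> fail to reject H0
```

For df = 1 the result can be cross-checked against `math.erfc(math.sqrt(x / 2))`, and for df = 2 against `math.exp(-x / 2)`.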
Interpretations:
- Conclusions: In hypothesis testing, the interpretation of results depends on comparing the calculated test statistic or p-value with the critical value or significance level:
- If the test statistic > critical value or p-value < α: Reject the null hypothesis.
- If the test statistic ≤ critical value or p-value ≥ α: Fail to reject the null hypothesis.
- Type I Error (α) and Type II Error (β): The interpretation of results relates to the risks of Type I and Type II errors. Lowering the significance level (α) reduces the risk of Type I error but increases the risk of Type II error.
- Contextual Interpretation: Always interpret the statistical results in the context of the problem or experiment. Consider the practical significance alongside statistical significance.
- Strength of Evidence: The smaller the p-value (equivalently, the further the test statistic lies beyond the critical value), the stronger the evidence against the null hypothesis.
- Effect Size: While p-values and significance levels indicate statistical significance, effect size measures (such as Cohen's d or odds ratios; for Chi-Square tests, Cohen's w or Cramér's V) describe the practical significance of the observed effect.
- Limitations: Both critical values and p-values have limitations. The significance level, and therefore the critical value, is chosen by convention, and p-values don't measure the size of an effect. It's therefore important to consider other statistical measures and domain knowledge.
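For a goodness of fit test, one commonly reported effect size is Cohen's w = √(Σ (p_obs – p0)² / p0), where p_obs are the observed proportions and p0 the null-hypothesis proportions; it equals √(χ² / n). A minimal sketch follows (the function name is hypothetical). Cohen's conventional benchmarks are roughly 0.1 (small), 0.3 (medium) and 0.5 (large), though such labels should be read in the context of the problem.

```python
import math


def cohens_w(observed, null_probs):
    """Cohen's w = sqrt(sum((p_obs - p0)^2 / p0)), which equals sqrt(chi-square / n)."""
    n = sum(observed)
    return math.sqrt(sum((o / n - p) ** 2 / p for o, p in zip(observed, null_probs)))


# Butterfly counts from the worked example: chi-square = 20 with n = 200.
w = cohens_w([70, 50, 80], [3 / 6, 1 / 6, 2 / 6])
print(round(w, 3))  # 0.316
```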
Example: A biologist is studying the genetic makeup of a population of butterflies to determine if their wing color distribution follows the expected Mendelian ratio. The expected ratio of wing colors is 3:1:2 for red, yellow, and green respectively. The biologist collects data from 200 butterflies and wants to test whether the observed wing color distribution matches the expected distribution. Perform a Chi-Square Test for Goodness of Fit at a significance level of 0.05.
Observed Frequencies:
Red: 70 butterflies
Yellow: 50 butterflies
Green: 80 butterflies
Solution:
Step 1: Set Up the Hypotheses
Null Hypothesis (H0): The observed wing color distribution follows the expected Mendelian ratio of 3:1:2 (red 3/6, yellow 1/6, green 2/6).
Alternative Hypothesis (Ha): The observed wing color distribution does not follow the expected Mendelian ratio.
Step 2: Calculate Expected Frequencies
Total butterflies: 70 + 50 + 80 = 200
Expected Frequencies:
Red: (3/6) * 200 = 100
Yellow: (1/6) * 200 = 33.33 (approximately)
Green: (2/6) * 200 = 66.67 (approximately)
Step 3: Calculate Chi-Square Test Statistic
Chi-Square Test Statistic formula: χ² = Σ((Observed – Expected)² / Expected)
Calculation (using the unrounded expected values 100/3 and 200/3 for yellow and green):
For Red: ((70 – 100)² / 100) = 9.00
For Yellow: ((50 – 33.33)² / 33.33) ≈ 8.33
For Green: ((80 – 66.67)² / 66.67) ≈ 2.67
Sum of χ² values: 9.00 + 8.33 + 2.67 = 20.00
Step 4: Determine Degrees of Freedom
Degrees of Freedom (df) = Number of categories – 1 = 3 – 1 = 2
Step 5: Find Critical Value
Using a Chi-Square distribution table or calculator for df = 2 and α = 0.05, the critical value is approximately 5.99.
Step 6: Compare Test Statistic and Critical Value
Test Statistic (χ²) = 20.00
Critical Value = 5.99
Since 20.00 > 5.99, we have evidence to reject the null hypothesis.
Step 7: Calculate p-value
Using a Chi-Square distribution calculator with df = 2 and the test statistic of 20.00, we find that the p-value is very small (about 0.000045, so p < 0.001).
Step 8: Make a Decision
Since the p-value (p < 0.001) is smaller than the significance level (α = 0.05), we reject the null hypothesis.
Step 9: Interpretation
Based on the data, there is significant evidence to conclude that the observed wing color distribution does not follow the expected Mendelian ratio.
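The worked example can be reproduced in a few lines as a check on the arithmetic. This sketch is specific to the three-category case: it relies on the fact that for df = 2 the Chi-Square upper tail is exactly e^(–x/2), so the critical value is –2 ln(α) and the p-value is e^(–χ²/2); it is not a general implementation.

```python
import math

observed = {"red": 70, "yellow": 50, "green": 80}
null_probs = {"red": 3 / 6, "yellow": 1 / 6, "green": 2 / 6}
alpha = 0.05

n = sum(observed.values())                            # Step 2: total = 200
expected = {c: n * p for c, p in null_probs.items()}  # 100, 33.33, 66.67
contributions = {c: (observed[c] - expected[c]) ** 2 / expected[c] for c in observed}
chi_sq = sum(contributions.values())                  # Step 3
df = len(observed) - 1                                # Step 4: df = 2

# Steps 5 and 7: for df = 2 ONLY, the chi-square upper tail is exp(-x / 2),
# so the critical value is -2 ln(alpha) and the p-value is exp(-chi_sq / 2).
assert df == 2, "the closed forms below are valid only for df = 2"
critical = -2 * math.log(alpha)
p_value = math.exp(-chi_sq / 2)

for c, value in contributions.items():
    print(f"{c}: {value:.2f}")
print(f"chi-square = {chi_sq:.2f}, critical value = {critical:.2f}, p = {p_value:.2e}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Run as written, it prints contributions of 9.00, 8.33 and 2.67, a statistic of 20.00, a critical value of 5.99, a p-value of 4.54e-05, and the decision "reject H0".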
Key Points
- Purpose: The Chi-Square Test for Goodness of Fit is used to determine whether the observed frequency distribution of categorical data fits a hypothesized theoretical distribution.
- Type of Data: It is applicable when you have one categorical variable with multiple categories and you want to test if the observed frequencies differ significantly from expected frequencies.
- Null Hypothesis (H0): The observed frequencies follow the expected theoretical distribution.
- Alternative Hypothesis (Ha): The observed frequencies do not follow the expected theoretical distribution.
- Expected Frequencies: Expected frequencies are calculated based on the null hypothesis, assuming a specific distribution or proportions for each category.
- Test Statistic Calculation:
- Calculate the squared difference between observed and expected frequencies for each category.
- Divide each squared difference by the corresponding expected frequency.
- Sum these values over all categories to obtain the Chi-Square test statistic (χ²).
- Degrees of Freedom: The degrees of freedom (df) for the Chi-Square Test for Goodness of Fit are calculated as the number of categories (k) minus one (df = k – 1).
- Critical Value or P-value:
- Compare the calculated Chi-Square test statistic to the critical value from the Chi-Square distribution table with (k – 1) degrees of freedom.
- Alternatively, calculate the p-value associated with the Chi-Square test statistic using a Chi-Square distribution calculator.
- Decision Rule:
- If Chi-Square test statistic > Critical value or p-value < chosen significance level (α), reject the null hypothesis.
- If Chi-Square test statistic ≤ Critical value or p-value ≥ chosen significance level (α), fail to reject the null hypothesis.
- Interpretation: If the null hypothesis is rejected, you conclude that there is a significant difference between observed and expected frequencies, indicating that the observed distribution doesn't fit the expected distribution.
- Assumptions:
- The data should be categorical counts from a random sample, and observations should be independent.
- Expected frequencies in each category should be at least 5 to ensure the validity of the Chi-Square distribution approximation.
- Applications: The test is used in various fields such as genetics, social sciences, market research, and quality control to assess whether observed data follows expected distributions.
- Effect Size: While the Chi-Square test indicates whether there is a significant difference, it doesn't quantify how large the departure from the expected distribution is.
- Post-hoc Tests: If you reject the null hypothesis, follow-up tests might be needed to determine which specific categories deviate significantly from the expected distribution.
- Multinomial Test: The Chi-Square Test for Goodness of Fit is a large-sample approximation to the exact multinomial test, which compares observed frequencies in multiple categories with the probabilities specified by the null hypothesis.
- Critical Chi-Square Value Tables: Critical values for different significance levels and degrees of freedom can be found in Chi-Square distribution tables or calculated using statistical software.
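For the post-hoc step mentioned in the key points, one common approach (an assumption here, not the only option) is to inspect per-category residuals. Pearson residuals (O – E)/√E square and sum to χ²; adjusted residuals (O – E)/√(E(1 – p)) are approximately standard normal under H0, so values beyond about ±2 are often read as notable deviations, with a stricter cutoff advisable when many categories are examined. The function name below is hypothetical.

```python
import math


def residuals(observed, null_probs):
    """Per-category (Pearson residual, adjusted residual) pairs for a goodness of fit test.

    Pearson residual:  (O - E) / sqrt(E); the squares sum to the chi-square statistic.
    Adjusted residual: (O - E) / sqrt(E * (1 - p)); approximately standard normal under H0.
    """
    n = sum(observed)
    out = []
    for o, p in zip(observed, null_probs):
        e = n * p
        out.append(((o - e) / math.sqrt(e), (o - e) / math.sqrt(e * (1 - p))))
    return out


labels = ["red", "yellow", "green"]
for label, (pearson, adjusted) in zip(labels, residuals([70, 50, 80], [3 / 6, 1 / 6, 2 / 6])):
    print(f"{label}: Pearson {pearson:+.2f}, adjusted {adjusted:+.2f}")
```

For the butterfly data, red has the largest negative residual (fewer than expected) and yellow the largest positive one (more than expected).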