Chi-square_Tests.md
Chi-square (χ²) tests are used to determine whether there is a significant association between categorical variables. They compare the observed frequencies in each category to the frequencies expected under the null hypothesis.
1. When to Use Chi-square Tests
Testing independence between gender and preference for a product
Checking if a die is fair
Analyzing survey results in contingency tables
2. Types of Chi-square Tests
a. Test of Independence
Used to assess whether two categorical variables are associated.
Example: Is there a relationship between smoking status (yes/no) and disease outcome (positive/negative)?
b. Goodness-of-fit Test
Used to test whether the distribution of a single categorical variable fits a specified distribution.
Example: Do observed coin toss results match the expected 50/50 distribution?
3. Assumptions of Chi-square Tests
Observations are independent
Categories are mutually exclusive
Expected frequency in each cell should be ≥ 5 (for validity)
4. Chi-square Test Formula
[ \chi^2 = \sum \frac{(O - E)^2}{E} ] Where:
( O ): Observed frequency
( E ): Expected frequency
5. Interpreting Results
Null hypothesis (H0): No association between variables (or observed distribution matches expected)
Alternative hypothesis (Ha): There is an association (or distribution differs)
If p-value < 0.05, reject H0
6. Example (R Code)
# Test of independence
chisq.test(table(data$smoking, data$disease))
# Goodness-of-fit test
observed <- c(25, 30, 45)
expected <- c(33.3, 33.3, 33.3)
chisq.test(x = observed, p = expected/sum(expected))
7. Summary
Chi-square tests are non-parametric and widely used for analyzing categorical data. Ensure assumptions are met and interpret the results in the context of the data.
Last updated