Chi-square_Tests.md

Chi-square (χ²) tests are used to determine whether there is a significant association between categorical variables. They compare the observed frequencies in each category to the frequencies expected under the null hypothesis.

1. When to Use Chi-square Tests

Testing independence between gender and preference for a product
Checking if a die is fair
Analyzing survey results in contingency tables

2. Types of Chi-square Tests

a. Test of Independence

Used to assess whether two categorical variables are associated.

Example: Is there a relationship between smoking status (yes/no) and disease outcome (positive/negative)?

b. Goodness-of-fit Test

Used to test whether the distribution of a single categorical variable fits a specified distribution.

Example: Do observed coin toss results match the expected 50/50 distribution?

3. Assumptions of Chi-square Tests

Observations are independent
Categories are mutually exclusive
Expected frequency in each cell should be ≥ 5 (for validity)

4. Chi-square Test Formula

[ \chi^2 = \sum \frac{(O - E)^2}{E} ] Where:

( O ): Observed frequency
( E ): Expected frequency

5. Interpreting Results

Null hypothesis (H0): No association between variables (or observed distribution matches expected)
Alternative hypothesis (Ha): There is an association (or distribution differs)
If p-value < 0.05, reject H0

6. Example (R Code)

# Test of independence
chisq.test(table(data$smoking, data$disease))

# Goodness-of-fit test
observed <- c(25, 30, 45)
expected <- c(33.3, 33.3, 33.3)
chisq.test(x = observed, p = expected/sum(expected))

7. Summary

Chi-square tests are non-parametric and widely used for analyzing categorical data. Ensure assumptions are met and interpret the results in the context of the data.

PreviousANOVA.md NextCorrelation.md

Last updated 2 months ago