1. Why do we conduct a t-test?

  • A t-test is a statistical test that is used to compare the means of two groups.
  • It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups differ from each other.
  • The Central Limit Theorem states that even if the original variables are not normally distributed, the distribution of the sample mean converges to a normal distribution as the sample size increases.
    → This is the theoretical foundation that lets us conduct statistical inference and hypothesis tests using a normal distribution.
  • However, we do not always have access to a large sample.
  • The t-test offers a solution to this small-N problem in hypothesis testing.
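
The Central Limit Theorem point above can be checked with a small simulation. This is a minimal sketch using only the Python standard library; the exponential population, the sample sizes (5 and 50), the number of repetitions, and the helper names `sample_means` and `skewness` are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_means(n, repeats=2000):
    """Draw `repeats` samples of size n from a skewed Exp(1) population
    (mean 1, standard deviation 1) and return the mean of each sample."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]


def skewness(xs):
    """Sample skewness; it is 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])


means_small = sample_means(5)
means_large = sample_means(50)

# As n grows, the sample means stay centred on the population mean (1.0),
# their spread shrinks roughly like 1 / sqrt(n), and their skewness moves
# toward 0, i.e. toward the shape of a normal distribution.
for n, means in ((5, means_small), (50, means_large)):
    print(n, statistics.fmean(means), statistics.stdev(means), skewness(means))
```

With a small sample such as n = 5 the sample means are still visibly skewed, which is the situation the t-test is meant to handle more carefully than a plain normal approximation.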

2. What type of t-test should we use?

When you conduct a t-test, you need to consider two things:
(1) whether the groups being compared come from a single population or two different populations
(2) whether you want to test the difference in a specific direction or both directions

One-sample, two-sample, or paired t-test?

  • If the groups come from a single population (for example, measuring before and after a medical treatment), perform a paired t-test.
  • If the groups come from two different populations (for example, comparing which hamburger tastes better between McDonald’s and In-N-Out Burger), perform a two-sample t-test.
  • If there is one group being compared against a standard value (for example, comparing your test score to the average test score of your friends in the same class), perform a one-sample t-test.
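
The three cases differ only in how the t statistic is built. The sketch below uses the Python standard library; the function names and all data values are made up for illustration, and the two-sample version uses Welch's form (no equal-variance assumption), which is one common choice rather than something the text above prescribes.

```python
import math
import statistics


def one_sample_t(xs, mu0):
    """t statistic for one group compared against a reference value mu0."""
    se = statistics.stdev(xs) / math.sqrt(len(xs))
    return (statistics.fmean(xs) - mu0) / se


def paired_t(before, after):
    """Paired t statistic: a one-sample test on the pairwise differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)


def welch_t(xs, ys):
    """Two-sample t statistic in Welch's form (variances may differ)."""
    se = math.sqrt(
        statistics.variance(xs) / len(xs) + statistics.variance(ys) / len(ys)
    )
    return (statistics.fmean(xs) - statistics.fmean(ys)) / se


# Hypothetical measurements, for illustration only
before = [72, 75, 71, 78, 74]            # same subjects before treatment
after = [70, 74, 69, 75, 73]             # same subjects after treatment
burger_a = [7.1, 6.8, 7.4, 6.9, 7.2]     # taste scores from one group of raters
burger_b = [6.5, 6.9, 6.4, 6.7, 6.6]     # taste scores from a different group

print("paired:    ", paired_t(before, after))
print("two-sample:", welch_t(burger_a, burger_b))
print("one-sample:", one_sample_t(burger_a, 7.0))
```

Note that the paired test reduces to a one-sample test on the differences, which is why it needs the two measurements to come from the same subjects.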

One-tailed or two-tailed t-test?

  • If you only care whether the two populations are different from one another, perform a two-tailed t-test.
  • If you want to know whether one population mean is greater than or less than the other, perform a one-tailed t-test.
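  • At the same significance level, a one-tailed t-test places the entire rejection region in one tail, so it rejects more easily in the predicted direction but cannot detect a difference in the opposite direction.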
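
The two choices can be compared through their rejection regions. As a sketch, assume a significance level of \(α = 0.05\) and 9 degrees of freedom (a sample of 10 observations); the critical values below are taken from a standard t table.

\[\text{two-tailed: reject } H_0 \text{ if } |T| > t_{0.025,\,9} = 2.262\]

\[\text{one-tailed (upper): reject } H_0 \text{ if } T > t_{0.05,\,9} = 1.833\]

With these values, \(T = 2.0\) falls in the one-tailed rejection region but not in the two-tailed one, while \(T = -3.0\) is rejected only by the two-tailed test.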

3. Statistical significance

Term | Denoted as | Details
significance level | \(α\) | The probability of the study rejecting the null hypothesis, given that the null hypothesis is true
significance probability | p-value | The probability of obtaining a result at least as extreme, given that the null hypothesis is true
  • In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred given the null hypothesis.
  • The null hypothesis (often denoted \(H_0\) ) is a default hypothesis that a quantity to be measured is zero (null).
  • The alternative hypothesis (often denoted \(H_1\) ) is the position that something is happening: a new theory is preferred over the old one (the null hypothesis).
  • A study’s defined significance level, denoted by \(α\), is the probability of the study rejecting the null hypothesis, given that the null hypothesis is true.
  • Significance probability, the p-value of a result (\(p\)), is the probability of obtaining a result at least as extreme, given that the null hypothesis is true.
  • The result is statistically significant, by the standards of the study, when \(p≤α\).
  • The significance level for a study is chosen before data collection, and is typically set to 5% or much lower, depending on the field of study.
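
The decision rule \(p≤α\) can be sketched in code. The example below computes a two-sided p-value by numerically integrating the density of Student's t distribution with Simpson's rule, using only the Python standard library; in practice a statistics library would normally do this step. The function names are hypothetical, \(α = 0.05\) is the typical 5% level mentioned above, and the inputs (t = 0.522 with 9 degrees of freedom) are the values from the worked example in section 3.1.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|), via Simpson's rule on [0, |t|].

    `steps` must be even for Simpson's rule.
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # P(-|t| < T < |t|), using symmetry
    return 1 - central_area


alpha = 0.05                      # significance level, fixed in advance
p = two_sided_p(0.522, df=9)      # values from the worked example in 3.1
print(p, "significant" if p <= alpha else "not significant")
```

As a sanity check on the sketch, `two_sided_p(2.262, 9)` should come out very close to 0.05, since 2.262 is the tabulated two-tailed critical value for 9 degrees of freedom.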

3.1 Process of t test

  • Let’s suppose you drew 10 numbers (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) from the population (population mean \(μ\) = 5.5).
  • The sample mean = 5.5
    (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)/10 = 5.5
  • What we want to know:
    Whether or not the population mean is 5.
  • To check this, we conduct a t-test.
Process of t test
  • State the null hypothesis: \(H_0\)
  • State the alternative hypothesis: \(H_1\)
  • Calculate the t value
  • Identify the critical values
  • Check whether your t value falls within the rejection region
  • Conclusion
  • Process of t test in this case

    • \(H_0\): the population mean = 5
    • What we want to know is “The sample mean is 5.5. With this result, can we conclude that the population mean is 5?”
    • \(H_1\): the population mean is not 5
    • \(H_0\) and \(H_1\) are mutually exclusive
    • Calculate the \(t\) value with the following equation:

    \[T = \frac{\bar{x} - μ_0}{SE} = \frac{\bar{x} - μ_0}{u_x / \sqrt{n}}\]

    • \(\bar{x}\) : Sample mean (= 5.5)
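    • \(μ_0\) : The hypothesized population mean under \(H_0\), i.e. the value we want to test (= 5)
    • \(n\) : Sample size (= 10)
    • \(u_x\) : Unbiased standard deviation (the square root of the unbiased variance \(u_x^2\))
    • \(SE\) : Standard error of the sample mean (= \(u_x / \sqrt{n}\))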

    \(u_x^2\) can be calculated with the following equation:

    \[u_x^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}\]
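
    For the ten numbers in this example, the arithmetic works out as follows (a worked sketch, rounded to two decimals):

    \[\sum_{i=1}^{10} (x_i - 5.5)^2 = 2\,(0.5^2 + 1.5^2 + 2.5^2 + 3.5^2 + 4.5^2) = 82.5\]

    \[u_x^2 = \frac{82.5}{10 - 1} \approx 9.17, \qquad u_x = \sqrt{9.17} \approx 3.03\]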

    • \(u_x\) = 3.03
    • If we plug in \(μ_0\) = 5 in the equation above, we get the following t value:

    \[T = \frac{\bar{x} - μ_0}{u_x / \sqrt{n}}\]

    \[ = \frac{{5.5} - 5}{3.03 / \sqrt{10}}\]

    \[ = 0.522\]

    • We want to know whether or not the population mean is 5.
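
The whole calculation can be reproduced in a few lines. This is a minimal sketch with the Python standard library; the variable names are illustrative, and the critical value 2.262 is an assumption taken from a standard t table for a two-tailed test at \(α = 0.05\) with \(n - 1 = 9\) degrees of freedom.

```python
import math
import statistics

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mu_0 = 5  # hypothesized population mean under H0

n = len(sample)
x_bar = statistics.fmean(sample)   # sample mean
u_x = statistics.stdev(sample)     # sqrt of the unbiased variance (n - 1 denominator)
se = u_x / math.sqrt(n)            # standard error of the sample mean
t_value = (x_bar - mu_0) / se

# Assumption: two-tailed test at alpha = 0.05 with 9 degrees of freedom;
# 2.262 is the corresponding critical value from a standard t table.
t_crit = 2.262
reject_h0 = abs(t_value) > t_crit

print(round(u_x, 2), round(t_value, 3), reject_h0)
```

Under these assumptions the t value lies well inside the interval from -2.262 to 2.262, so it is not in the rejection region and \(H_0\) (population mean = 5) is not rejected.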

    Point estimation

    • Point estimation involves the use of sample data to calculate a single value (known as a point estimate) which is to serve as a “best guess” or “best estimate” of an unknown population parameter (for example, the population mean).
    • We use the sample mean, 5.5, as a point estimate of the population mean, and use the t value, 0.522, to infer whether the population mean is 5.