Hello, readers! In this article, we will focus on an important statistical test used in data analysis: the T-test in R programming.
So, let us begin!
What is a T-test all about?
Statistical tests give us an idea about the distribution of data on a statistical scale. This distribution plays an important role in analyzing the data prior to modelling.
There are various statistical tests, such as the Chi-square test, the ANOVA test, etc. One such test is the T-test.
The T-test is a statistical test for continuous (numeric) data variables. It compares either the mean of one variable against a reference value or the means of two variables against each other. It then tells us whether the observed difference in means is statistically significant or could plausibly be due to random chance.
In this article, we cover two kinds of T-tests:
- One-sample T-test
- Two-sample T-test
We will cover both techniques in the upcoming sections!
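Assumptions of the T-test in R
- The observations are independent of each other.
- The data in each group is approximately normally distributed.
- For the two-sample test with var.equal = TRUE (Student's t-test), the two groups have equal variance. R's default two-sample test, Welch's t-test, does not require this assumption.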
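The assumptions above can be checked with functions from base R's stats package. The sketch below uses shapiro.test() to check normality and var.test() to check equal variances. The vectors group_a and group_b are hypothetical, simulated only for illustration. A formal test is only a rough guide, so it is worth looking at plots of the data as well.

```r
# Sketch: checking T-test assumptions in base R.
# group_a and group_b are hypothetical, simulated data.
set.seed(42)                       # make the random draws reproducible
group_a <- rnorm(20, mean = 5.99)  # hypothetical group A (sd defaults to 1)
group_b <- rnorm(20, mean = 5.55)  # hypothetical group B

# Shapiro-Wilk normality test: a p-value <= 0.05 suggests the
# data departs from a normal distribution.
shapiro_a <- shapiro.test(group_a)
shapiro_b <- shapiro.test(group_b)

# F test comparing two variances: a p-value <= 0.05 suggests the
# variances differ, so Welch's t-test (R's default) is the safer choice.
variance_check <- var.test(group_a, group_b)

cat("Normality p-value, group A:", shapiro_a$p.value, "\n")
cat("Normality p-value, group B:", shapiro_b$p.value, "\n")
cat("Equal-variance p-value:    ", variance_check$p.value, "\n")
```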
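Hypotheses for T-testing in R
- Null hypothesis: the mean equals the hypothesized value (one-sample test), or the means of the two groups are equal (two-sample test).
- Alternative hypothesis: the mean differs from the hypothesized value, or the means of the two groups are unequal.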
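In t.test(), the form of the alternative hypothesis is set through the alternative argument. The default, "two.sided", matches the "not equal" alternative above. The values "greater" and "less" give one-sided alternatives. The sketch below uses a hypothetical simulated vector x. It shows that the two one-sided p-values add up to 1 and that the two-sided p-value is twice the smaller one.

```r
# Sketch: how the `alternative` argument of t.test() encodes H1.
# x is hypothetical, simulated data.
set.seed(1)
x <- rnorm(15, mean = 6)

# H1: true mean != 6 (the default)
two_sided <- t.test(x, mu = 6, alternative = "two.sided")
# H1: true mean > 6
greater <- t.test(x, mu = 6, alternative = "greater")
# H1: true mean < 6
less <- t.test(x, mu = 6, alternative = "less")

cat("two-sided p-value:", two_sided$p.value, "\n")
cat("greater p-value:  ", greater$p.value, "\n")
cat("less p-value:     ", less$p.value, "\n")
```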
1. One-sample T-test in R
In a one-sample T-test, we compare the mean of a variable against a fixed, hypothesized value passed to the function through the mu argument. The test judges whether the sample mean differs significantly from that value.
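The one-sample T-test statistic is the standard textbook formula below. Here \bar{x} is the sample mean, s the sample standard deviation, n the sample size, and \mu_0 the value passed as mu. The statistic is compared against a t distribution with n - 1 degrees of freedom.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{df} = n - 1
```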
Have a look at the below syntax!
t.test(variable, mu=value)
The function returns a p-value, which we interpret as follows:
- If the p-value is greater than 0.05 (the assumed significance level), we fail to reject the null hypothesis.
- If the p-value is less than or equal to 0.05, we reject the null hypothesis in favor of the alternative hypothesis.
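The decision rule above can be written directly in code, because t.test() returns an object whose p.value element holds the p-value. The sketch below assumes a 0.05 significance level. The helper decide() is hypothetical and not part of base R.

```r
# Sketch: applying the p-value decision rule.
# decide() is a hypothetical helper; alpha = 0.05 is an assumption.
decide <- function(test_result, alpha = 0.05) {
  if (test_result$p.value <= alpha) {
    "reject the null hypothesis"
  } else {
    "fail to reject the null hypothesis"
  }
}

set.seed(123)
sample_data <- rnorm(10, mean = 5.99)  # hypothetical simulated data
result <- t.test(sample_data, mu = 6)

print(result$p.value)
print(decide(result))
```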
Example:
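# Remove all existing objects from the workspace
rm(list = ls())
# Draw 10 values from a normal distribution with mean 5.99 (sd defaults to 1);
# results vary between runs because rnorm() draws random numbers
data = rnorm(10, 5.99)
print(data)
# Test whether the true mean differs from 6
print(t.test(data, mu = 6))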
Output:
Here, we generate a vector of 10 data points drawn from a normal distribution with mean 5.99 using the rnorm() function, and then apply the t-test. Because the values are random, your output will differ from the one below.
> data = rnorm(10,5.99)
> print(data)
[1] 5.472294 6.922749 6.680573 4.839677 5.692492 6.584267 6.960417 5.836062 6.166102 5.422624
> print(t.test(data,mu=6))
One Sample t-test
data: data
t = 0.2539, df = 9, p-value = 0.8053
alternative hypothesis: true mean is not equal to 6
95 percent confidence interval:
5.543403 6.572048
sample estimates:
mean of x
6.057726
The p-value (0.8053) is greater than 0.05, so we fail to reject the null hypothesis: the sample mean of 6.057726 is not significantly different from 6. Consistent with this, the 95 percent confidence interval (5.543403 to 6.572048) contains 6.
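To see where these numbers come from, the sketch below recomputes the t statistic and p-value by hand from the ten values printed above. The values are rounded to six decimals, so the results match the printed output only to about four digits. It uses the one-sample formula t = (mean - mu) / (sd / sqrt(n)) and a t distribution with n - 1 degrees of freedom.

```r
# Sketch: recomputing the one-sample t-test by hand.
# Values copied from the printed output above (rounded to 6 decimals).
data_values <- c(5.472294, 6.922749, 6.680573, 4.839677, 5.692492,
                 6.584267, 6.960417, 5.836062, 6.166102, 5.422624)
mu <- 6
n <- length(data_values)

# t statistic: distance of the sample mean from mu, in standard errors
t_manual <- (mean(data_values) - mu) / (sd(data_values) / sqrt(n))
dof <- n - 1

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), dof)

builtin_one <- t.test(data_values, mu = mu)
cat("manual t:", t_manual, " built-in t:", unname(builtin_one$statistic), "\n")
cat("manual p:", p_manual, " built-in p:", builtin_one$p.value, "\n")
```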
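2. Two-sample T-test in R
In a two-sample T-test, we compare the means of two independent groups to check whether they are the same. That is, the test checks whether the difference between the two group means is equal to 0. Setting var.equal = TRUE runs Student's t-test, which assumes the two groups have equal variances. When the two measurements are taken on the same subjects, a paired test (paired = TRUE) is used instead.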
Syntax:
t.test(var1,var2,var.equal=TRUE)
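t.test() has three common two-group variants, and the sketch below shows each call: Student's test (var.equal = TRUE), Welch's test (R's default) and the paired test (paired = TRUE). The vectors group_x and group_y are hypothetical, simulated data. The paired call is included only to show the syntax, since pairing is meaningful only when each value in group_x is matched with one in group_y. The sketch also checks that a paired test equals a one-sample test on the differences.

```r
# Sketch: Student's, Welch's and paired t-tests in R.
# group_x and group_y are hypothetical, simulated data.
set.seed(7)
group_x <- rnorm(10, mean = 5.99)
group_y <- rnorm(10, mean = 5.55)

# Student's two-sample t-test: assumes equal variances, df = n1 + n2 - 2
student <- t.test(group_x, group_y, var.equal = TRUE)

# Welch's two-sample t-test: R's default, no equal-variance assumption
welch <- t.test(group_x, group_y)

# Paired t-test: only meaningful for matched observations;
# it is equivalent to a one-sample test on the differences
paired <- t.test(group_x, group_y, paired = TRUE)
paired_as_one_sample <- t.test(group_x - group_y, mu = 0)

print(student$method)
print(welch$method)
print(paired$method)
```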
Example:
Here, we pass two different data vectors created using the rnorm() function to the t.test() function.
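# Remove all existing objects from the workspace
rm(list = ls())
# Two random samples of size 10 with means 5.99 and 5.55 (sd defaults to 1)
data = rnorm(10, 5.99)
print(data)
info = rnorm(10, 5.55)
print(info)
# Student's two-sample t-test, assuming equal variances
print(t.test(data, info, var.equal = TRUE))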
Output:
The p-value (0.3309) is greater than 0.05, so we fail to reject the null hypothesis: there is no significant evidence that the two group means differ. Consistent with this, the 95 percent confidence interval for the difference in means (-0.5646441 to 1.5890105) contains 0.
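A p-value above 0.05 and a 95 percent interval that contains 0 go hand in hand, because both come from the same standard error. The sketch below recomputes Student's two-sample test by hand with the standard pooled-variance formula. The original data vector for this example is not printed, so the sketch uses hypothetical simulated vectors x and y.

```r
# Sketch: Student's two-sample t-test computed by hand (pooled variance).
# x and y are hypothetical, simulated data.
set.seed(2024)
x <- rnorm(10, mean = 5.99)
y <- rnorm(10, mean = 5.55)
n_x <- length(x)
n_y <- length(y)
dof <- n_x + n_y - 2

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / dof
se_diff <- sqrt(pooled_var * (1 / n_x + 1 / n_y))

diff_means <- mean(x) - mean(y)
t_manual <- diff_means / se_diff
p_manual <- 2 * pt(-abs(t_manual), dof)
# 95% confidence interval for the difference in means
ci_manual <- diff_means + c(-1, 1) * qt(0.975, dof) * se_diff

builtin_two <- t.test(x, y, var.equal = TRUE)
cat("t:", t_manual, " p:", p_manual, "\n")
cat("95% CI:", ci_manual, "\n")
```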
> print(info)
[1] 5.526671 4.743747 5.833444 8.278065 7.042861 5.718028 4.152188 4.670495 5.334782 5.040747
> print(t.test(data,info,var.equal = TRUE))
Two Sample t-test
data: data and info
t = 0.99928, df = 18, p-value = 0.3309
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.5646441 1.5890105
sample estimates:
mean of x mean of y
6.146286 5.634103
Conclusion
With this, we have come to the end of this topic. Feel free to comment below if you have any questions. For more such posts related to R programming, stay tuned!
Till then, Happy Learning!! 🙂