A great cup of coffee is a lifeline for many: it wakes people up in the morning, provides a much-needed afternoon jolt, eases the aches of the day in the evening, and is a trusted buddy for those who work or study into the wee hours of the night. Drinking coffee often becomes so deep-seated a habit that people don’t realize when they’ve consumed a little too much. One popular and widely known effect of caffeine is its ability to keep you from feeling sleepy, so how many cups of coffee should ideally be consumed?
A research study conducted on a college campus to test the effect of caffeine consumption on sleep confirmed that students consuming high amounts of caffeine (6-7 cups a day) face significant sleeping issues. The effect differs with the gender and residence status of the respondent. The hypothesis adopted was intuitive: students brewing more cups of coffee in a day end up sleeping less than their peers. However, an assumption remains an assumption until proven, so a researcher was tasked with establishing to what extent the hypothesis is true and stating the probability of the conclusion being wrong.
In this econometrics series, following the write-up on types of data, we focus on the most fundamental tool in the statistician’s toolkit: hypothesis testing. Economists loathe assertions and anecdotal evidence; they formulate a hypothesis and test it against data to arrive at an evidence-based conclusion. After data collection and cleaning, a ‘hypothetical thesis’ is constructed that answers the question at hand and can be tested with statistical tools. In the first statistical test, John Arbuthnot observed that the fraction of boys born year after year was higher than the fraction of girls. He reasoned that if the probability of having a boy were 0.5, the data he had observed would be extremely unlikely. Probability was thus used to verify his observation, leading to the formal definition of the hypothesis.
In simple terms, a hypothesis is a tentative statement that proposes a solution to a problem and has to be verified within an empirical framework. The hypothesis we will be testing is that people who drink more than 3 cups of coffee a day sleep for fewer hours than people who drink fewer than 3 cups. This statement is neither a fact nor necessarily correct; it is simply the best guess. We construct it based on personal experience and past literature. So pick up a piping hot cuppa (or a cold one) and let’s get testing!
For the sake of explanation, let’s consider another example: the farm bills recently passed in the parliament. Supporters of the bills claim that the market reforms will enable farmers to connect with consumers. The government hopes to achieve a ‘doubling of farmers’ income by 2022’ by liberalizing agricultural markets. However, through the lens of a researcher, the question stands: will market reforms lead to an increase in income? The research gap is the effectiveness of market reforms in increasing farmers’ income. After preparing the research design, and based on past literature, the best guess is that countries that have liberalized their agricultural markets usually see higher farm revenue.
The logical justification is that a policy action by the government (an independent factor) has a significant positive effect on income by reducing costs for farmers. In reality, a broad sectoral policy cannot be reduced to a study of two factors. However, hypothesis testing enables policymakers to assess the dependence between two variables, which serves as empirical evidence in the body of academic literature.
The research questions on coffee and farm bills require different treatment in design and testing, but the concept of the hypothesis remains the same.
Right Before Hypothesis Testing
While formulating the hypothesis, researchers need to scrutinize literature and select an appropriate experimental design.
Research hypotheses can be of different types: simple, complex, causal, associative, directional, non-directional, and many more. The null and the alternative hypotheses are the most commonly used in research and empirical studies. The former states that there is no relationship between two or more variables, while the latter takes the opposite stance.
Referring to our first example, the null hypothesis, denoted by H0, would state that caffeine intake (more than 3 cups a day) does not affect the hours of sleep. In the second illustration, the null hypothesis would state that there is no statistical relationship between reforms and income. An alternative hypothesis, denoted by H1 or Ha, states that there is a definite relationship between the variables in question. It is important to note that, at this point, these statements only concern the existence of a relationship and not its significance.
The survey on caffeine’s effect on sleep is primary research, with data collected from a sample group. The first hypothesis to test is whether the sample mean is the same as the population mean. Imagine that the majority of respondents in the selected group work the night shift: the results would be skewed. To check that the sample is a sound representation of the population, we formulate the basic hypothesis that the sample mean is the same as the population mean.
In contrast, the study of the impact of market reforms will be secondary research, with categorical data on the status of reforms collected from all the emerging economies. Additionally, farmers’ income for the respective countries can be obtained to proceed with our hypothesis testing. The coefficient on market reforms (β3) indicates the strength of the relationship between market reforms and farmers’ income. If the coefficient is 1, every one-unit increase in the independent variable is associated with a one-unit increase in the dependent variable.
Using the coefficient of market reforms (β3), the null hypothesis can be written mathematically as

$$H_0: \beta_3 = 0$$

This means that the coefficient of market reforms in an economy is equal to zero, that is, not statistically different from zero. Based on the data, liberalization would then not statistically affect income levels. On the other hand, the alternative hypothesis

$$H_1: \beta_3 \neq 0$$

means the coefficient of market reforms is different from zero, implying that market reforms, as an independent variable, have a quantifiable effect on the income that farmers receive.
To understand hypothesis testing, let us consider the two above-mentioned examples and frame hypothetical results.
The hypothesis is tested by constructing a normal distribution for the estimated mean or coefficient, like the figure given below. The area marked as the “rejection or critical region” represents the range in which the obtained value is highly unlikely to fall if the null hypothesis is true. We reject the null hypothesis if the obtained value falls in the critical region. On the other hand, if it falls in the area marked as the “acceptance region”, we do not reject the null hypothesis.
The area marked in blue (critical region or rejection region) depends on the level of significance chosen. The most commonly encountered levels are 0.05 (5%), 0.01 (1%) and 0.1 (10%).
As we arrive at testing, the decision to accept or reject the null hypothesis warrants a foundational understanding of how the statistics work.
The test statistic (calculated from the sample estimates) is compared with a critical value, and this comparison drives the decision to reject or accept the null hypothesis. The critical value depends on the chosen level of significance and, for small samples, on the degrees of freedom, which in turn depend on the sample size (n).
There is usually a lack of conceptual understanding of degrees of freedom. In simple terms, the term refers to the number of values in a data set that are free to vary after a certain number of restrictions have been imposed. For example, suppose your university offers an undergraduate programme that has six semesters, and to complete it a student is required to take six courses (one course per semester). In the first five semesters, a student is free to choose any course he or she likes out of those that remain. However, in the sixth semester, the student is left with only one choice, which he or she must take to complete the programme. Therefore, the degrees of freedom are five (6-1).
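The same idea can be checked numerically. The minimal sketch below uses made-up sleep durations (not survey data): once the mean of six values is fixed, five of them can be chosen freely and the sixth is forced.

```python
# Hypothetical example: six nightly sleep durations (hours) must average 7.0.
n = 6
required_mean = 7.0

# Five values can be chosen freely...
free_values = [6.5, 8.0, 7.5, 6.0, 7.0]

# ...but the sixth is then forced by the restriction on the mean.
last_value = n * required_mean - sum(free_values)

# One restriction (the fixed mean) leaves n - 1 values free to vary.
degrees_of_freedom = n - 1

print(f"forced sixth value: {last_value}")
print(f"degrees of freedom: {degrees_of_freedom}")
```

Changing any of the five free values changes the forced sixth value, but never the count of values that are free to vary.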
Choose One or Two-Tail Test
We assume the population data follow a normal distribution. The critical value is given by $z_{\alpha/2}$ in the case of a two-tailed hypothesis and $z_{\alpha}$ in the case of a one-tailed hypothesis, where “α” is the level of significance.
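As a minimal sketch, the critical values for the three common significance levels can be computed from the standard normal distribution. This assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the helper name `critical_value` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def critical_value(alpha, two_tailed=True):
    """Return the positive z critical value for significance level alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return standard_normal.inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    two = critical_value(alpha, two_tailed=True)
    one = critical_value(alpha, two_tailed=False)
    print(f"alpha={alpha:.2f}  two-tailed: {two:.3f}  one-tailed: {one:.3f}")
```

At the 5% level, the two-tailed value is the familiar 1.96, while the one-tailed value is smaller because the whole 5% sits in a single tail.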
Within hypothesis testing, two types of tests can be undertaken: one-tail and two-tail tests. A one-tail test is one where the critical region lies at only one end of the normal distribution curve. Logically, this implies that the estimated coefficient of market reforms (β3) is equal to zero under the null hypothesis and is allowed to be greater (in a right-tail test) or less (in a left-tail test) under the alternative, but not both at the same time.
In mathematical form, a right-tailed hypothesis can be written as

$$H_0: \beta_3 = 0, \qquad H_1: \beta_3 > 0$$

and a left-tailed hypothesis can be written as

$$H_0: \beta_3 = 0, \qquad H_1: \beta_3 < 0$$
In the case of a two-tail hypothesis, the critical region lies at both the left and the right ends of the normal distribution curve. The estimated β3 value is allowed to be either greater or less than the hypothesized value.
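The difference between the three cases can be sketched as a small decision function. The function name `reject_null` and the tail labels are hypothetical, and the sketch assumes a test statistic that follows the standard normal distribution.

```python
from statistics import NormalDist

def reject_null(test_statistic, alpha=0.05, tail="two"):
    """Decide whether a z test statistic falls in the critical region.

    tail: "two", "right" or "left" (labels chosen for this sketch).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        # Critical region is split between both ends of the curve.
        return abs(test_statistic) > z.inv_cdf(1 - alpha / 2)
    if tail == "right":
        # Critical region lies only in the upper tail.
        return test_statistic > z.inv_cdf(1 - alpha)
    if tail == "left":
        # Critical region lies only in the lower tail.
        return test_statistic < z.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")

# The same statistic can lead to different decisions depending on the tail.
print(reject_null(1.8, tail="right"))
print(reject_null(1.8, tail="two"))
```

A statistic of 1.8 exceeds the one-tailed 5% critical value but not the two-tailed one, which is why the choice of tail has to be made before looking at the result.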
If the sample mean lies too many standard deviations away from the hypothesized mean, we can say that the sample mean is unlikely to have come from the hypothesized distribution. Standard deviation is a measure of variability or dispersion in a set of data points. It represents how far, on average, the data points in a sample lie from the mean value, and is represented by the symbol “σ”.
The standard error, on the other hand, is an estimate of the standard deviation of a sample statistic such as the sample mean, and is used to calculate the test statistic.
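A short sketch makes the distinction concrete. The eight sleep durations below are invented for illustration, and the standard error shown is that of the sample mean (standard deviation divided by the square root of the sample size).

```python
import math
import statistics

# Hypothetical hours of sleep reported by eight respondents.
sleep_hours = [6.0, 7.5, 5.5, 8.0, 6.5, 7.0, 5.0, 6.5]

n = len(sleep_hours)
sample_mean = statistics.mean(sleep_hours)

# Sample standard deviation (n - 1 in the denominator): spread of the data.
sample_sd = statistics.stdev(sleep_hours)

# Standard error of the mean: spread of the sample mean across repeated samples.
standard_error = sample_sd / math.sqrt(n)

print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}, se = {standard_error:.3f}")
```

The standard deviation describes individual respondents, whereas the standard error shrinks as the sample grows because a larger sample pins down the mean more precisely.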
How to Make a Decision to Reject or Accept?
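Referring to our model formulated above, let us suppose that there is data on the coffee consumption of 2000 respondents. We can reject the null hypothesis at a 5% level of significance if the test statistic is greater than the critical value. Since the sample size is large, the Z table, which lists values of the cumulative distribution function of the standard normal distribution, can be used for calculating critical values. For a test of the sample mean, the test statistic can be calculated as follows

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu_0$ is the population mean under the null hypothesis, $\sigma$ is the standard deviation and $n$ is the sample size.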
The critical value will be calculated from the standard normal distribution table.
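To illustrate the mechanics, here is a minimal sketch of a two-tailed z-test on a sample mean. Only the sample size of 2000 comes from the text; the sample mean, hypothesized mean and standard deviation are invented numbers, and `z_statistic` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, hypothesized_mean, sd, n):
    """One-sample z statistic: distance of the sample mean from the
    hypothesized mean, measured in standard errors."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical figures for illustration only (not survey results).
n = 2000                 # respondents
sample_mean = 6.4        # average hours of sleep in the sample
hypothesized_mean = 6.5  # population mean under the null hypothesis
sd = 1.5                 # standard deviation, assumed known

alpha = 0.05
z = z_statistic(sample_mean, hypothesized_mean, sd, n)
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value

decision = "reject H0" if abs(z) > critical else "do not reject H0"
print(f"z = {z:.2f}, critical value = {critical:.2f}: {decision}")
```

With a sample this large, even a gap of 0.1 hours is several standard errors wide, so the statistic lands in the critical region.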
While testing the market reform hypothesis at a 5% level of significance, the sample data will be 24 developing countries with estimated levels of farmers’ income. With fewer than 30 observations, the t-test is used to compute the test statistic. Student’s t table can be used to get the critical value; the table lists critical values for different degrees of freedom.
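A sketch of the small-sample case follows. The coefficient estimate and its standard error are invented, and the degrees of freedom rest on an assumption that the model has four estimated coefficients (β0 to β3), since the model itself is not reproduced here. The critical value is read from a standard Student’s t table rather than computed.

```python
# Hypothetical regression output for illustration only.
beta3_hat = 0.42      # estimated coefficient on market reforms
se_beta3 = 0.15       # standard error of that estimate
beta3_null = 0.0      # value of the coefficient under H0

n_countries = 24
n_coefficients = 4    # assumed: intercept plus three slopes (beta0..beta3)
df = n_countries - n_coefficients

# Two-tailed 5% critical value for 20 degrees of freedom,
# read from a standard Student's t table.
t_critical = 2.086

# t statistic: distance of the estimate from the null value, in standard errors.
t_statistic = (beta3_hat - beta3_null) / se_beta3
reject = abs(t_statistic) > t_critical

print(f"t = {t_statistic:.2f}, df = {df}, critical = {t_critical}: reject H0? {reject}")
```

The t critical value is larger than the 1.96 used for large samples, reflecting the extra uncertainty that comes with only 24 observations.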
A 5% level of significance implies that there is a 5% chance of rejecting our null hypothesis even if it is true. Let us assume the sample mean from the data collected on coffee consumption and sleeping patterns lies in the critical region. Then we reject the null hypothesis that the sample group represents the population. But note that we are rejecting the null hypothesis with 95% confidence. There is still a 5% chance that we could be wrong about the decision to reject.
In the second example, we reject the null hypothesis (no relationship between the two variables) and conclude that market reforms contribute to income levels. The same 5% chance of rejecting a null hypothesis that is in fact true applies.
The above figure shows a standard normal distribution curve with a mean of 0 and a critical value of 1.96.
Since the test statistic value is greater than the critical value and lies in the rejection region, we reject the null hypothesis.
By rejecting the null hypothesis, a statistician is essentially accepting the alternative hypothesis. We can conclude with 95% confidence that there is a statistical relationship between market reforms and the income levels of agrarian communities.
However, it is important to note that hypothesis testing operates on the probability of a statement being true in a statistical sense.
Probability of Making Errors
Imagine rejecting the null hypothesis even though liberalizing agricultural markets had no role to play in income levels. In the world of statistics, one cannot ignore the probability of committing errors; significance is always defined as “not by chance” or “probably true”. When you run a hypothesis test, there are two possible types of error: type 1 and type 2. The former is when you reject a true null hypothesis. Imagine a case where market policies are adopted based on this conclusion: most economies would undertake market reforms citing the statistically significant relationship with income. The latter type of error is when you fail to reject a false null hypothesis.
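The meaning of a type 1 error can be illustrated with a small simulation. The sketch below draws repeated samples from a population where the null hypothesis is true by construction and counts how often a two-tailed z-test at the 5% level rejects it anyway. All settings (sample size, number of experiments, seed) are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)
true_mean, sd, n = 0.0, 1.0, 50   # the null hypothesis (mean = 0) is TRUE here
n_experiments = 2000

false_rejections = 0
for _ in range(n_experiments):
    sample = [random.gauss(true_mean, sd) for _ in range(n)]
    z = (mean(sample) - true_mean) / (sd / math.sqrt(n))
    if abs(z) > critical:
        false_rejections += 1   # type 1 error: rejecting a true null

type1_rate = false_rejections / n_experiments
print(f"Share of true nulls rejected: {type1_rate:.3f}")
```

The share of false rejections should come out close to the chosen significance level, which is exactly what a 5% level of significance promises.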
Formulating a hypothesis contributes to testing or suggesting theories and describing phenomena. In a landscape of anecdotal, subjective discourse, hypothesis testing brings a scientific framework to addressing questions in academia. To sum up, “it seems there is no escaping the use of judgment in the use and interpretation of statistical significance tests”.
- A quantitative study requires the researcher to develop a hypothesis and perform suitable statistical testing.
- The basis of hypothesis testing is to check whether the sample mean is consistent with the population mean. This can be used to assess the relationship between variables as we advance the study.
- The hypothesis that market reforms do not have any impact on farmers’ income is rejected with 95% confidence. If the null hypothesis were true and we ran the test 100 times, we would wrongly reject it about 5 times.
- A t-test is used for a smaller sample size and a Z-test for a larger sample. Additionally, the Chi-square test, F-test, Granger causality test, ANOVA, and ANCOVA are specific tests used depending on the requirements. (Types of tests will be covered in the upcoming articles of the series.)
The article is authored by Richa Gupta, Manjari and Tanvi Bagadiya