It has been a long time since the last blog update. Today's article covers a data analysis method based mainly on the principle of statistical hypothesis testing. Both the T-test and the chi-square test can be used in real work, and both are easy to carry out in Excel; a statistical test of significance can make your data more convincing. In keeping with my usual principle of covering the methodology first and the application examples second, this article introduces the methods, and a later article will be devoted to practical application cases.
Hypothesis testing
Hypothesis testing (Hypothesis Testing), also called significance testing (Significance Testing), is a method in mathematical statistics for inferring properties of a population from a sample, based on certain assumptions. The basic principle is to make an assumption about some characteristic of the population, and then use statistical inference on a sample to decide whether that assumption should be rejected or accepted. Since the method is premised on assumptions, the corresponding hypotheses must be set up before the test:
H0: the null hypothesis (null hypothesis), the assumption to be verified. The null hypothesis is generally taken to be correct at first, and a significance level is then chosen to decide whether to accept or reject it.
H1: the alternative hypothesis (alternative hypothesis), generally the negation of the null hypothesis. When the null hypothesis is rejected, the alternative hypothesis is accepted by default.
For example, if the null hypothesis is that the population mean μ = μ0, then the alternative hypothesis is that the population mean μ ≠ μ0. The test procedure is to calculate the significance probability of the corresponding statistic in order to decide whether the null hypothesis should be accepted or rejected.
T-test
The T-test (T-Test) is one of the most common hypothesis tests, used mainly to verify whether there is a significant difference between population means. The T-test is a parametric hypothesis test, so it applies to numerical data, such as the number of visits, the number of unique visitors, and time on site in web analytics, or the number of orders and sales in e-commerce. The T-test also requires one condition to hold: the population follows a normal distribution.
I will not cover here how the t statistic is calculated or how the significance probability is looked up from the t statistic; tools can do these computations for us. If you are interested, consult a statistics textbook, which will have the corresponding explanation. Here is how to run a T-test with Excel's data analysis tools:
Excel does not load the data analysis tools by default, so we need to add the add-in ourselves: File - Options - Add-Ins - check "Analysis ToolPak" to complete the addition. The "Data Analysis" button then appears at the far right of the "Data" tab, and you can start doing T-tests. Take the most common one, the paired-samples t-test, as an example: compare whether the daily number of orders on an e-commerce site differs significantly before and after a site revision, using 10 days of data before the revision and 10 days after it as the samples:
| | Orders before the revision | Orders after the revision |
| 1 | 1032 | 1187 |
| 2 | 1178 | 1245 |
| 3 | 1098 | 1379 |
| 4 | 1045 | 1094 |
| 5 | 976 | 1173 |
| 6 | 1101 | 1364 |
| 7 | 1276 | 1119 |
| 8 | 1215 | 1268 |
| 9 | 987 | 1303 |
| 10 | 1065 | 1274 |
First, set up the hypotheses:
H0: μ1 = μ2, the mean number of daily orders is equal before and after the revision;
H1: μ1 ≠ μ2, the mean number of daily orders is not equal before and after the revision.
Enter the data into Excel and run "t-Test: Paired Two Sample for Means" from Excel's data analysis tools, which outputs the test results:
The output can look a little dizzying and rather technical at first, but it is not difficult; you only need to look at one number. The one-tailed P value is 0.00565. If we test at the 95% confidence level, 0.00565 is clearly less than 0.05 (1 - 95%), so we reject the null hypothesis and conclude that there is a significant difference in the number of orders before and after the revision.
A brief word on why we choose the one-tailed significance probability P rather than the two-tailed one: in most web analytics settings we want to verify whether a value has risen or fallen significantly after a change, so there is generally only one direction of interest, either an increase or a decrease, and a one-sided test is enough. In the example above, the mean number of orders after the revision is 1240.6, higher than the 1097.3 before the revision, and we need to verify whether this "greater than" is significant. This is a left-tailed test, so looking at the one-tailed significance probability P is sufficient.
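For readers who want to check the numbers outside Excel, the following is a minimal Python sketch of the paired t-test on the order data above. It assumes the standard paired t statistic (the mean of the daily differences divided by its standard error, with n - 1 degrees of freedom). The helper names `t_pdf` and `one_tailed_p` are illustrative and not part of Excel, and the P value is approximated by numerical integration rather than read from Excel's output.

```python
import math
from statistics import mean, stdev

# Daily orders from the table above (10 days before and 10 days after the revision).
before = [1032, 1178, 1098, 1045, 976, 1101, 1276, 1215, 987, 1065]
after = [1187, 1245, 1379, 1094, 1173, 1364, 1119, 1268, 1303, 1274]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t, df, steps=2000):
    """Tail probability P(T > |t|), using Simpson's rule on [0, |t|].

    steps must be even; this is an approximation, not an exact CDF.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Differences taken as "before minus after" (variable 1 minus variable 2),
# so an increase in orders gives a negative t statistic.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
p_one = one_tailed_p(t_stat, n - 1)

print(f"mean before = {mean(before):.1f}, mean after = {mean(after):.1f}")
print(f"t = {t_stat:.3f}, one-tailed P = {p_one:.5f}")
print("reject H0 at the 95% level" if p_one < 0.05 else "cannot reject H0")
```

The one-tailed P value computed this way should agree closely with the 0.00565 quoted above, and the negative sign of t reflects that the test sits in the left tail.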
Chi-square test
The chi-square test (chi-square test), that is, the χ² test, is used to verify whether there is a significant difference between the ratios of two populations. The chi-square test is a non-parametric hypothesis test that applies to Boolean or binomial data. It was used early on to compare product qualification rates between two manufacturers; in web analytics it can be used for the comparative analysis of any ratio metric, such as conversion rate or Bounce Rate. In fact, it was already applied in an earlier article on the factors influencing Abandonment Rate. Again, I will not cover how χ² is calculated or how the significance probability is looked up from the χ² statistic. Taking conversion rate as the example, we compare whether the website's conversion rate differs significantly before and after the revision, sampling three days of web analytics data before the revision and three days after it: the total number of visits and the number of converted visits, with the conversion rate calculated as "converted visits / total visits":
| | Before the revision | After the revision |
| Total number of visits | 30567 | 33651 |
| Number of converted visits | 2976 | 3698 |
| Conversion rate | 9.74% | 10.99% |
First, set up the hypotheses:
H0: r1 = r2, the conversion rate is equal before and after the revision;
H1: r1 ≠ r2, the conversion rate is not equal before and after the revision.
In fact, this is the simplest example of a fourfold-table (2×2) chi-square test, and it does not require SPSS (of course, if you are familiar enough with SPSS or a similar statistical analysis tool, you can use that too). To skip the intermediate calculation steps, I built a simple chi-square test template directly in Excel; just enter the statistics in the corresponding cells and the test result is displayed automatically:
Click to download: Chi-square test sample
The light blue cells in the Excel file accept input, including the total number of visits and the number of converted visits for the original version and for the test version. The 95% confidence level can also be modified: if you need a 99% confidence level, just change that cell.
How do you read the test result? It is very simple: just look at the result shown in the red cell. For the case above, the cell shows that a significant difference between the two conversion rates "exists"; if there were none, the cell would display "does not exist". With this template, testing data from A/B Testing and similar comparisons is simple and easy; in fact, this Excel template is tailor-made for A/B Testing.
That is all for now. This article does not try to introduce the T-test and the chi-square test from a professional statistical point of view; I just want you to understand the principles and applicable conditions of these two methods, and to use them in the simplest way to make your data more convincing. Please stay tuned for the follow-up article with application examples.
» This article is licensed under the BY-NC-SA agreement; please credit the source when reposting: The data analysis » T-test and chi-square test
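As a cross-check on the template's verdict, here is a small Python sketch of the fourfold-table chi-square test on the conversion data above. It assumes the plain Pearson statistic without a continuity correction; whether the Excel template applies a correction is not stated, so treat that as an assumption. The function name `chi_square_2x2` is illustrative. For one degree of freedom the P value can be written with the complementary error function, and 3.841 is the usual 95% critical value.

```python
import math

# Totals and converted visits from the table above.
visits = {"before": 30567, "after": 33651}
conversions = {"before": 2976, "after": 3698}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: before / after the revision. Columns: converted / not converted.
a = conversions["before"]
b = visits["before"] - a
c = conversions["after"]
d = visits["after"] - c

chi2 = chi_square_2x2(a, b, c, d)

# With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

critical_95 = 3.841  # chi-square critical value for df = 1, alpha = 0.05
significant = chi2 > critical_95

print(f"chi-square = {chi2:.2f}, P = {p_value:.2e}")
print("significant difference exists" if significant else "no significant difference")
```

The statistic comes out far above the critical value, which is consistent with the template reporting that a significant difference exists.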
Very professional, a direct application of statistical theory. About loading the add-in in Excel via "File - Options - Add-Ins - check 'Analysis ToolPak'": a beginner's question, where do I find "File"?
Zitan : Sorry, I use Excel 2010 myself, so I wrote the steps for the 2010 interface. I just checked: in 2007 you click the logo in the upper-left corner and go into "Excel Options"; the steps after that are similar.
Learning from this; I am going to bookmark the blogger's site for long-term study.
There is a problem: the test result only shows that a certain value differs significantly before and after the revision, but a significant difference in that value does not show that the revision is the cause. From the principle of the chi-square test as the blogger uses it, one can only conclude that the time variable and the conversion rate variable are significantly related; by the principle that cause precedes effect, one would take the change in time as what changed the conversion rate. But a significant change in conversion rate over time may have many causes: marketing activities, seasonal cycles, the revision, and so on. How, then, do you exclude the other causes and confirm that the revision led to this change?
@janessi : Finally someone has raised this issue. The example in this article does indeed have this problem, namely the interference of other factors in the comparison. I will explain it in detail in the follow-up case article; it will take a few days to organize and write.
Thank you ......
Hello blogger, I used your method to analyze data before and after an event, but ran into a problem: at the 95% confidence level, changes in the data that clearly do not look very significant come out as significant in the T-test. So I am wondering whether the chosen confidence level is reasonable. I have studied statistics and know that the size of the confidence interval is closely related to the standard deviation and the sample size. I would like to ask how to study past historical data to arrive at a reasonable confidence level. Thank you ~
Justin is lee : Hello, I do not quite understand "at the 95% confidence level, changes in the data that clearly do not look very significant come out as significant in the T-test". Significance is itself derived from a hypothesis test at a given confidence level; without a hypothesis test, how do you judge that something is "clearly not very significant"? In addition, the 95% confidence level is the conventional choice and is generally used as the threshold for judging significance. Rejecting the null hypothesis at that level carries a 5% probability of a Type I error. With a fixed sample size, reducing the chance of a Type I error correspondingly raises the chance of a Type II error, so to reduce both Type I and Type II errors you need to increase the sample size.
Joegh : Haha, I understand now. The indicator for the seven days before the event: 110, 110, 134, 123, 123, 111, 109; for the seven days after the event: 130, 123, 181, 158, 117, 128, 112. The T-test gives P = 0.018 < 0.05, a significant change, but to the naked eye only the two values "181, 158" changed noticeably, while the other values did not. I think my sample size is too small, so individual outliers have a large influence on the accuracy of the significance test. I should analyze what caused the abnormal change on those two days. Thanks, blogger ~
This explains the T-test so that even people who do not understand statistics can grasp it thoroughly; truly an expert. I hope to see more good articles, and thank the author for the selfless dedication.
The T-test here seems to use the wrong tool. Given the conditions and assumptions of the problem, this should be a test of the difference between the means of two normal populations, so you should use Excel's "t-Test: Two-Sample Assuming Equal Variances". The results are not the same: for example, "t Stat" should be -3.29, and the critical values differ too.
In addition, the statement that the chi-square test is a non-parametric test is not entirely correct.
Don : Thank you very much for your comments and corrections. Treating the data before and after the revision as "paired samples" may indeed be a bit inappropriate, because the website's users are always changing; the "equal variances" assumption would be more suitable. As for the chi-square test, it is generally applied when the population distribution and parameters are unknown, so classifying it as a non-parametric test should not be a big problem.
Joegh : The chi-square test also applies to the following situation: testing the variance of a single normal population against a known value, which is a parametric test.
Don : Point taken, thank you.