In many tech companies, experiments are run on product features (think button/CTA display, the arrangement or order of options, showing or hiding certain features), on assets on key landing pages (which image appeals most to whom) and much more, in order to iterate quickly and make data-driven decisions on the platform. After all, you don't want to roll out a feature only to find that it decreases your conversion rate, reduces the value you get from your customers or results in a poor customer experience. Such experiments can take the form of an A/B test (i.e. splitting clients into 2 groups over the same period of time, with each group seeing a different version) or a before/after (pre/post) test (i.e. the feature is changed at a point in time, so all clients see the same version at any given moment, and the periods before and after the change are compared).
As a product analyst, one of the most commonly used statistical tests for evaluating the effectiveness of a feature from the data collected in these experiments is the t-test. It lets you test a hypothesis about the feature's impact.
So, what is a T-test?
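The t-test is one of many tests used for hypothesis testing in statistics. It determines whether there is a significant difference between the means of two samples/groups (or between one sample's mean and a given value). To understand the impact of a product feature, simply quoting the means of a metric is not enough. Most stakeholders will want to know: is the difference significant?
What does it mean to be significant? It means that the observed difference is unlikely to be explained by chance (random sampling variation) alone, so it likely reflects a real effect of the feature that carries over to the entire population. If the significance level (α) is set at 5%, i.e. a 95% confidence level, a result is declared significant when the p-value is below 0.05: if the feature truly had no effect, a difference in the mean of the metric at least this large would arise from randomness/luck less than 5% of the time. In other words, among experiments where the feature does nothing, only about 5% would be wrongly flagged as significant. It does not mean there is a 5% chance that the observed difference is due to luck.
What does it mean to be insignificant? It means that the data do not provide enough evidence to rule out randomness as the explanation for the observed difference. The feature may genuinely not move the needle on the key metric, but a non-significant result can also come from an effect that is real but small, or from a sample too small to detect it.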
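To make the 5% concrete, here is a small simulation sketch. It assumes numpy and scipy are installed and uses made-up data: it runs many experiments in which both groups are drawn from the same distribution, so the feature has no real effect and every "significant" result is a false positive. At α = 0.05, roughly 5% of them should still come out significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
alpha = 0.05
n_experiments = 2000
false_positives = 0

for _ in range(n_experiments):
    # Both groups come from the SAME (hypothetical) distribution, so H0 is true
    control = rng.normal(loc=50.0, scale=10.0, size=200)
    variant = rng.normal(loc=50.0, scale=10.0, size=200)
    _, p_value = stats.ttest_ind(control, variant)
    if p_value < alpha:
        false_positives += 1

false_positive_rate = false_positives / n_experiments
print(f"Share of 'significant' results with no real effect: {false_positive_rate:.3f}")
```

The printed share should land close to 0.05, which is exactly what the significance level controls: the false positive rate when there is nothing to find.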
The t-test is a parametric test, which means it assumes the data of interest follow a particular distribution, namely the normal distribution. A t-test uses the t-statistic, the t-distribution and the degrees of freedom to determine statistical significance.
The t-test is typically used when the population parameters (mean and standard deviation) are not known, so the standard deviation has to be estimated from the sample. If the population standard deviation were known, a z-test could be used instead.
Assumptions of T-tests
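- Data follows an approximately normal distribution (with large samples, the t-test is fairly robust to moderate departures from normality)
- Homogeneity of variance, i.e. the standard deviations of the groups are approximately equal (needed for the pooled/Student's t-test; Welch's t-test does not require it)
- Data is collected from a representative, randomly selected portion of the total population, i.e. not a biased population with one specific trait only, and observations are independent of each other
- Data has to be continuous (or at least ordinal). The t-test does not apply to categorical or dummy variables.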
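Two of these assumptions can be checked with standard scipy tests. The sketch below is one possible approach, not the only one: it uses the Shapiro-Wilk test for normality and Levene's test for equal variances on hypothetical order values (numpy and scipy assumed installed, `group_a`/`group_b` are made-up names and data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
# Hypothetical order values for two groups (replace with your own data)
group_a = rng.normal(loc=40.0, scale=8.0, size=150)
group_b = rng.normal(loc=42.0, scale=8.0, size=150)

# Normality: Shapiro-Wilk test per group (H0: data come from a normal distribution)
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

# Homogeneity of variance: Levene's test (H0: the groups have equal variances)
_, p_levene = stats.levene(group_a, group_b)

print(f"Shapiro-Wilk p-values: {p_norm_a:.3f}, {p_norm_b:.3f}")
print(f"Levene p-value: {p_levene:.3f}")

# A small p-value (e.g. < 0.05) is evidence AGAINST the assumption
equal_var = p_levene >= 0.05
print("Use the pooled t-test" if equal_var else "Use Welch's t-test")
```

With very large samples, formal normality tests flag even tiny, harmless deviations, so a histogram or Q-Q plot is often a more practical check.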
How does a t-test work?
In general:
- The t-test calculates a t-value from the means and standard deviations of the sample groups as well as the number of data points. The t-value is the difference in means divided by its standard error, i.e. the signal relative to the noise.
- A large absolute t-value indicates that the groups are different.
- A small absolute t-value indicates that the groups are similar.
- Degrees of freedom (DF) are also computed from the number of data records available in the sample data. DF refers to the number of values that are free to vary once the sample statistics are fixed (e.g. n − 1 for a one-sample test). It determines the shape of the t-distribution against which the t-value is judged.
- To determine whether the t-value is “large” or “small”, it is compared against a critical value obtained from the t-distribution table by looking up the appropriate DF and the chosen significance level (α, e.g. 0.05) as the criterion for rejection.
- Note that for sample sizes larger than about 30, the values in the t-distribution table are close to those of the z-distribution, so the results are similar to a z-test.
- If the absolute t-value is greater than the critical value shown in the t-distribution table, the null hypothesis (that there is no difference between the means of the two groups) is rejected, i.e. the difference is statistically significant.
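Instead of a printed table, the critical value and p-value can be read from scipy's t-distribution. This is a sketch with a hypothetical t-value of 2.1 and 25 degrees of freedom (not from any real experiment); it also shows how the t critical value approaches the z critical value as DF grows.

```python
from scipy import stats

alpha = 0.05  # significance level (two-sided)

# Hypothetical result: a t-value of 2.1 from a test with 25 degrees of freedom
t_value = 2.1
df = 25

# Critical value: the t beyond which alpha/2 of the distribution lies in each tail
t_critical = stats.t.ppf(1 - alpha / 2, df)
# Two-sided p-value: probability of a |t| at least this large if H0 is true
p_value = 2 * stats.t.sf(abs(t_value), df)

reject_h0 = abs(t_value) > t_critical
print(f"critical t = {t_critical:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")

# As df grows, the t critical value approaches the normal (z) critical value
z_critical = stats.norm.ppf(1 - alpha / 2)
for d in (5, 10, 30, 100, 1000):
    print(f"df={d:>4}: t_crit={stats.t.ppf(1 - alpha / 2, d):.3f}  (z_crit={z_critical:.3f})")
```

Comparing |t| with the critical value and comparing the p-value with α always lead to the same decision; most tools report the p-value directly.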
Different Types of T-tests
1. Paired/correlated t-test
- Tests for the difference between two sets of data points from the same individuals, i.e. a matched dataset where each individual is measured twice (e.g. over different periods of time)
- Textbook use case 1: Compare the effect of a drug before and after a patient takes it
- Reality use case 2: Compare the order values of the same individual before and after a feature change
```python
# use this library for all t-tests below
from scipy import stats

# before, after: metric values for the same individuals, aligned row by row
stats.ttest_rel(before, after)
```
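A paired t-test is equivalent to a one-sample t-test on the per-individual differences against 0, which is a handy way to see why the pairing matters. The sketch below uses made-up order values for 8 users (numpy and scipy assumed installed); both arrays must list the same users in the same order.

```python
import numpy as np
from scipy import stats

# Hypothetical order values for the same 8 users before and after a feature change
before = np.array([52.0, 48.5, 61.2, 45.0, 58.3, 50.1, 47.8, 55.6])
after = np.array([55.1, 50.2, 60.8, 49.5, 62.0, 53.4, 49.0, 57.9])

paired = stats.ttest_rel(before, after)

# Equivalent view: a one-sample t-test of the per-user differences against 0
differences = before - after
one_sample = stats.ttest_1samp(differences, 0.0)

print(f"paired:     t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")
print(f"one-sample: t = {one_sample.statistic:.3f}, p = {one_sample.pvalue:.4f}")
```

Both lines print the same t and p, with DF = n − 1 = 7 (the number of pairs minus one).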
2. Independent/two-sample/Student’s t-test
- Tests whether there is a statistically significant difference between the means in two unrelated groups
- Depending on whether the variances of the two groups are equal or unequal, the formulas for the t-value and DF differ.
- (a) Equal variance/pooled (Student's t-test)
- (b) Unequal variance (Welch's t-test)
- Textbook use case 1: Compare grades between genders
- Reality use case 2: Compare order values between 2 cities
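```python
stats.ttest_ind(group1, group2)                   # (a) equal variance/pooled (scipy's default)
stats.ttest_ind(group1, group2, equal_var=False)  # (b) unequal variance (Welch's t-test)
```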
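To show how the two formulas differ, the sketch below computes the pooled and Welch t-values and DF by hand and cross-checks them against `stats.ttest_ind`. The city data are hypothetical, chosen with unequal sizes and spreads so the two versions disagree; numpy and scipy are assumed installed.

```python
import numpy as np
from scipy import stats

# Hypothetical order values from two cities (unequal sizes and spreads on purpose)
city_a = np.array([23.0, 31.5, 28.2, 35.0, 26.4, 30.1, 29.7, 33.3, 27.8, 32.6])
city_b = np.array([20.5, 38.9, 25.1, 41.2, 18.7, 36.4, 22.9])

n1, n2 = len(city_a), len(city_b)
m1, m2 = city_a.mean(), city_b.mean()
v1, v2 = city_a.var(ddof=1), city_b.var(ddof=1)  # sample variances

# (a) Equal variance / pooled (Student's t-test): df = n1 + n2 - 2
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# (b) Unequal variance (Welch's t-test): Welch-Satterthwaite approximation for df
se2 = v1 / n1 + v2 / n2
t_welch = (m1 - m2) / np.sqrt(se2)
df_welch = se2**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

# Cross-check against scipy
res_pooled = stats.ttest_ind(city_a, city_b)  # equal_var=True by default
res_welch = stats.ttest_ind(city_a, city_b, equal_var=False)

print(f"pooled: t = {t_pooled:.3f} (scipy {res_pooled.statistic:.3f}), df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f} (scipy {res_welch.statistic:.3f}), df = {df_welch:.1f}")
```

Welch's DF is usually not a whole number and is smaller than the pooled DF when the group variances differ a lot, which makes the test more conservative. Many analysts default to Welch's version because it stays valid when the variances happen to be equal.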
3. One-sample T-test
- Sometimes, we may want to compare the mean of a single group against a given mean/hypothesized mean.
- Textbook use case 1: Compare the average birth weight of a sample of black babies to the known average birth weight of white babies (used as the hypothesized mean).
- Reality use case 2: Compare the order values after a marketing campaign against the average order value prior to the campaign
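- Used less often is a one-sample test on the median, such as the sign test or the Wilcoxon signed-rank test (a non-parametric alternative rather than a t-test), which checks whether the sample median is significantly different from a given/hypothesized value. E.g. investment values and annual income are often right-skewed (a long tail of large values) and have a non-normal distribution, so the median may be a better metric to evaluate.
```python
stats.ttest_1samp(sample, 155)  # sample: observed values; 155: hypothesized mean
```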
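The sketch below runs the one-sample t-test next to two median-based alternatives on hypothetical post-campaign order values, treating 155 as the assumed pre-campaign average. The sign test is built here from `stats.binomtest` (available in SciPy 1.7+); the Wilcoxon signed-rank test additionally assumes a roughly symmetric distribution. Names and data are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical post-campaign order values; 155 is the assumed pre-campaign average
post_campaign = np.array([162.0, 149.5, 171.2, 158.8, 166.4, 153.0, 175.9, 160.3, 148.7, 169.1])
reference_value = 155.0

# One-sample t-test: is the mean different from the reference value?
t_res = stats.ttest_1samp(post_campaign, reference_value)

# Sign test on the median: under H0 (median == reference_value) each observation
# is equally likely to fall above or below it, so the count above is Binomial(n, 0.5)
above = int(np.sum(post_campaign > reference_value))
below = int(np.sum(post_campaign < reference_value))  # ties would be dropped
sign_res = stats.binomtest(above, n=above + below, p=0.5)

# Wilcoxon signed-rank test on the differences (assumes rough symmetry)
wilcoxon_res = stats.wilcoxon(post_campaign - reference_value)

print(f"one-sample t-test: t = {t_res.statistic:.3f}, p = {t_res.pvalue:.4f}")
print(f"sign test:         {above} above / {below} below, p = {sign_res.pvalue:.4f}")
print(f"Wilcoxon:          p = {wilcoxon_res.pvalue:.4f}")
```

The sign test only uses the direction of each difference, so it is robust to outliers and skew but has less power than the t-test when the data really are close to normal.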