MathProbabilityFrom probability to statistics

Applications of central limit theorem

8 minutes read

You've learned about the Central Limit Theorem previously, but haven't yet delved into its applications. These applications are essential in statistical testing, not only in pure math but also in fields like risk management and machine learning! So, let's explore this topic together!

Z score

One key concept tied to the Central Limit Theorem is the z-score. This measure tells us how many standard deviations a particular data point or observation is from a distribution's mean, similar to quantifying how a point is 3 meters away from a house, but in terms of standard deviations.

Understanding z-scores is beneficial as they provide a standardized way to interpret and compare data points from different distributions.

To calculate a z-score, subtract the distribution mean (μ)(\mu) from the data point (x)(x) and divide the result by the distribution's standard deviation (σ)(\sigma).

z=xμσz={{x-\mu}\over \sigma}

Let's look at an example comparing student grade marks to understand this better:

The distribution of student's marks for each subject are as follows:

The average student's mark in Physics (μPhysics{\mu}_{Physics}) is 8080, with a standard deviation (σPhysics{\sigma}_{Physics}) of 1010. Meanwhile, the average student's mark in Algebra (μAlgebra{\mu}_{Algebra}) is 6565, with a standard deviation (σAlgebra{\sigma}_{Algebra}) of 55.

If a student scored 8585 in the Physics exam and 7575 in the Algebra exam, which subject did the student perform better in?

You can determine this by calculating z-scores:

zPhysics=Physics markμPhysicsσPhysics=858010=0.5z_{Physics}={ {\text{Physics mark}-{\mu_{Physics}}}\over{\sigma_{Physics}}}={{85-80}\over10}={0.5}
zAlgebra=Algebra markμAlgebraσAlgebra=75655=2z_{Algebra}={ {\text{Algebra mark}-{\mu_{Algebra}}}\over{\sigma_{Algebra}}}={{75-65}\over5}={2}

By comparing z-scores, you discover that although the student got higher marks in Physics, they excelled more in Algebra. The z-score of 22 indicates a score that is two standard deviations above the mean, while in Physics, the z-score of 0.50.5 signifies a score that is 0.50.5 standard deviations above the mean.

Z-scores can also help when comparing a sample with the general population.

Imagine you're a math teacher and you believe your students perform above average compared to the national level. The national average score on an applicable math test is 7575, with a standard deviation of 1010.

Testing your class of 3535 students, they score an average of 8080. You want to check the likelihood that your class genuinely is above average. To do this, you perform a z-test. Your class z-score is

Z=807510/352.68Z=\frac{80-75}{10/\sqrt 35} \approx2.68

Next, calculate the p-value using a calculator. In your scenario, the p-value, approximately 0.00360.0036, is lower than the common significance level of 0.050.05. This suggests that your class's average score is distinctively different from the national average.

From Z score to T score

There's another type of score used in statistics, the t-score. This score is applied when the sample size is small and/or the standard deviation of the population is unknown. Like the z-score, it tells you how many standard deviations a data point is from the mean, but it also considers the sample size, causing its use primarily with small samples.

The formula to calculate a t-score is:

T=Xμs/nT = \frac{X - μ} {s/\sqrt n}

In this formula, XX represents the sample mean, μμ is the population mean, ss is the sample standard deviation, nn is the sample size.

To help decide which score to use for a problem, the following diagram will assist.

T-score or z-score flow chart

Having discussed all the theory about statistical testing, let's proceed to examine its practical usage in A/B testing.

A/B testing

A/B testing shares a connection with the Z-test in how it employs statistical procedures to determine which version of a product or service performs better.

In an A/B test, you compare two groups: Group A (the control group) and Group B (the variant group). The metric you're interested in might be click-through rate, conversion rate, time spent on a page, etc. You calculate the mean performance for both groups, the standard deviation, and the number of users in each group. The Z-test assists in determining if there's a statistically significant difference in performance between the two groups by comparing the difference between the two sample means against the difference you'd anticipate occurring randomly.

Consider an example of A/B testing in risk management. Suppose you're a risk manager at a bank, and you're thinking of implementing a new software system to detect fraudulent transactions. Still, you're uncertain about whether the new system will outperform the old one. You're wary about potential risks attached to switching systems, including the implementation costs, the learning curve for employees, and the chances of false positives or negatives.

To assess the performance of the new system, you conduct an A/B test. You divide your transactions into two groups:

  • Group A continues utilizing the existing fraud detection system.

  • Group B employs the new fraud detection system.

After some time, you compare the results from both groups by evaluating key metrics such as the number of detected fraudulent transactions, the number of false positives (genuine transactions flagged as fraudulent), and the number of false negatives (fraudulent transactions that were not caught).

Upon concluding the test period, analyze the results. If the new system (Group B) substantially detects more fraudulent transactions with fewer false positives and negatives, you might infer that the potential system switch risks are worth it. Otherwise, opting to continue with the existing system or seeking other alternatives may be best.

Similar tests can occur in other situations, ranging from lifespan estimation for quality checks to election forecasting and fraud detection!

Conclusion

  • Z-scores and t-scores express how many standard deviations a data point is from the mean of a distribution, and they're helpful for comparing data points between diverse distributions.

  • Z-scores underpin A/B testing.

  • The Central Limit Theorem finds usage in various fields, including inferential statistics, quality control, financial analysis, and machine learning.

How did you like the theory?
Report a typo