What is the Concept of Confidence Interval in Statistics?

Consider a process that is generating a certain output (like cycle time) that can be characterized by a certain statistical parameter (like average cycle time).

Suppose you are interested in determining the value of this parameter for the entire population. So how would you go about it?

You start by selecting a sample and calculating the corresponding statistic (for example, the sample average) from the sample data.

The question that then arises is how to estimate the population parameter from the sample statistic.

That’s where a confidence interval comes to your rescue.

A confidence interval is an interval estimate of the population parameter that is derived from the sample statistic.

It provides a range of values within which the population parameter is likely to fall and is expressed along with the degree of confidence.

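The degree of confidence in the interval estimate follows from the level of significance selected (the alpha value): the confidence level is 1 minus alpha.

For an alpha value of 0.05, what you get is a 95% confidence interval. This means that if you repeated the sampling many times and computed an interval each time, about 95% of those intervals would contain the population parameter. It does not mean that one particular computed interval contains the parameter with probability 0.95, since the parameter is a fixed (if unknown) value.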
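One way to see what the 95% figure refers to is a small simulation. The sketch below is only an illustration: the true mean (10.0), the known sigma (2.0), the sample size (25) and the number of trials are all made-up values, and the function name coverage is hypothetical. It draws many samples, builds an interval from each, and counts how often the interval contains the true mean.

```python
import random
from statistics import NormalDist, mean


def coverage(true_mu=10.0, sigma=2.0, n=25, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean.

    All default values are illustrative assumptions, not real process data.
    """
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal (about 1.96 for alpha = 0.05)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - half_width <= true_mu <= x_bar + half_width:
            hits += 1
    return hits / trials


print(coverage())
```

The printed fraction should land close to 0.95, which is the long-run behaviour the confidence level describes.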

It is worth noting here that the data generated by many processes approximately follows the normal distribution (and if not, the data can often be transformed so that it does).

In addition, even if direct measurement of the process doesn’t generate data that follows the normal distribution, the average of a sufficiently large sample of these values will approximately follow the normal distribution, in accordance with the central limit theorem (CLT).
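The effect of the CLT can be checked with a rough experiment. The sketch below assumes an exponential distribution with mean 1 as a stand-in for a skewed process (this choice, the sample size of 50 and the helper names are illustrative only). It compares the skewness of the raw values with the skewness of the sample averages.

```python
import random
from statistics import mean, stdev


def skewness(values):
    """Simple moment-based skewness; 0 for a symmetric distribution."""
    m = mean(values)
    s = stdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


def sample_means(n=50, trials=5000, seed=1):
    """Averages of `trials` samples of size n from an exponential(1) process."""
    rng = random.Random(seed)
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]


rng = random.Random(2)
raw = [rng.expovariate(1.0) for _ in range(5000)]  # skewed raw measurements
means = sample_means()

print("skewness of raw values:     ", round(skewness(raw), 2))
print("skewness of sample averages:", round(skewness(means), 2))
print("std dev of sample averages: ", round(stdev(means), 3))
```

The raw values are strongly skewed, while the averages are much closer to symmetric, and their standard deviation is close to sigma/sqrt(N), here 1/sqrt(50).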

The good news is that this makes the mathematics easy, since for a normal distribution you need just two parameters, mu (mean) and sigma (standard deviation), to fully characterize the process.

So how do you derive a confidence interval for the population mean of a normal distribution?

The example below illustrates this.

Assume a sample of size N is selected from a normal distribution and you calculate the following:

x-bar (sample average) = [summation for i=1 to N of xi] / N

s (sample standard deviation) = sqrt( [summation for i=1 to N of (xi – x-bar)^2] / (N-1) )
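As a quick worked example, the two quantities can be computed directly from their definitions. The ten cycle times below are invented for illustration; they are not measurements from any real process.

```python
from math import sqrt

# Hypothetical cycle times (say, in minutes); illustrative values only
cycle_times = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9, 12.5, 12.1]

n = len(cycle_times)

# Sample average: sum of the observations divided by N
x_bar = sum(cycle_times) / n

# Sample standard deviation: square root of the sum of squared
# deviations from x_bar, divided by N - 1
s = sqrt(sum((x - x_bar) ** 2 for x in cycle_times) / (n - 1))

print(f"x-bar = {x_bar:.3f}, s = {s:.4f}")
```

Python's standard library offers statistics.mean and statistics.stdev, which give the same results; the explicit version is shown only to mirror the formulas.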

If the population standard deviation (sigma) is known, the critical value z is taken from the standard normal distribution (about 1.96 for a 95% interval) and the confidence interval would be:

x-bar – z . sigma/sqrt(N), x-bar + z . sigma/sqrt(N)
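The known-sigma interval can be sketched in a few lines. The function name z_interval and the input values (a sample average of 12.1, sigma of 0.25, N of 10) are assumptions made up for the example.

```python
from statistics import NormalDist


def z_interval(x_bar, sigma, n, alpha=0.05):
    """Two-sided confidence interval for the mean when sigma is known."""
    # Critical value: the (1 - alpha/2) quantile of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5
    return x_bar - half_width, x_bar + half_width


# Illustrative inputs, not real process data
low, high = z_interval(x_bar=12.1, sigma=0.25, n=10)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Passing a smaller alpha (for example 0.01) widens the interval, since a higher confidence level requires a larger critical value.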

If the population standard deviation is not known and is estimated by s, the critical value t is taken from the t distribution with N-1 degrees of freedom and the confidence interval would be:

x-bar – t . s/sqrt(N), x-bar + t . s/sqrt(N)
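The unknown-sigma case can be sketched in the same way. Python's standard library has no t quantile function, so this example hard-codes the critical value 2.262, which is the standard table value for a 95% two-sided interval with 9 degrees of freedom; for other sample sizes or confidence levels you would need to look up a different value. The data and the function name t_interval are again made up for illustration.

```python
from statistics import mean, stdev

# Two-sided 95% critical value of the t distribution with 9 degrees of
# freedom, taken from a standard t table (valid only for N = 10, alpha = 0.05)
T_CRIT_95_DF9 = 2.262


def t_interval(sample, t_crit):
    """Confidence interval for the mean when sigma is estimated by s."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (N - 1 in the denominator)
    half_width = t_crit * s / n ** 0.5
    return x_bar - half_width, x_bar + half_width


# Hypothetical cycle times; illustrative values only
cycle_times = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9, 12.5, 12.1]
low, high = t_interval(cycle_times, T_CRIT_95_DF9)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The t critical value is larger than the corresponding z value of about 1.96, so the interval is wider; this reflects the extra uncertainty from estimating the standard deviation from a small sample.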

Confidence intervals are a very useful tool for gaining a deeper understanding of the uncertainty associated with the outcome of a process.

They also give a range within which a process parameter is expected to lie, and hence can support prediction as well.
