Suresh Menon

Principal Consultant

Digital Stream Consulting

Parametric Hypothesis Testing (Analyse Phase Of Six Sigma)

A hypothesis is a value judgement made about a circumstance, or a statement made about a population. For instance, based on experience, an engineer can assume that the amount of carbon monoxide emitted by a certain engine is twice the maximum allowed legally. However, his assertion can only be confirmed by conducting a test that compares the carbon monoxide generated by the engine with the legal requirements.

Number of Tails: The number of tails refers to the tails of the distribution used for testing the hypothesis. A distribution has a left tail and a right tail. Depending on the objective of the test, the left tail, the right tail or both tails are considered in a hypothesis test, and this decides the number of tails. The number is either 2 or 1: an inequality sign (≠) in the alternate hypothesis indicates a two-tailed test, while a less-than (<) or greater-than (>) sign indicates a one-tailed test.
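To make the effect of the number of tails concrete, here is a minimal sketch using SciPy's standard normal distribution. The helper name `z_critical` is hypothetical; it simply splits alpha between the tails when the test is two-tailed, so the same alpha gives a wider cutoff for ≠ than for < or >.

```python
from scipy.stats import norm

def z_critical(alpha: float, tails: int) -> float:
    """Critical z value for a one- or two-tailed test at significance level alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # In a two-tailed test, alpha is split equally between the left and right tails.
    tail_area = alpha / tails
    # For a left-tailed test, use the negative of the one-tailed value.
    return norm.ppf(1 - tail_area)

two_tailed = z_critical(0.05, tails=2)  # alternate hypothesis uses ≠
one_tailed = z_critical(0.05, tails=1)  # alternate hypothesis uses < or >
print(f"two-tailed: ±{two_tailed:.3f}, one-tailed: {one_tailed:.3f}")
```

At α = 0.05 this prints a two-tailed cutoff of ±1.960 and a one-tailed cutoff of 1.645.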

If the data used to make the comparison are parametric (that is, data from which a mean and a standard deviation can be derived), and the populations from which the data are taken are normally distributed with equal variances, a standard-error-based hypothesis test such as the t-test can be used to test the validity of the hypothesis made about the population. There are at least five steps to follow when conducting a hypothesis test.

1. Null Hypothesis: The first step consists of stating the null hypothesis, which is the hypothesis being tested. In the case of the engineer making a statement about the level of carbon monoxide generated by the engine, the null hypothesis is:

H0: The level of carbon monoxide generated by the engine is twice as great as the legally required amount. (The null hypothesis is denoted by H0.)

2. Alternate hypothesis: The alternate (or alternative) hypothesis is the opposite of the null hypothesis. It is assumed valid when the null hypothesis is rejected after testing. In the case of the engineer testing the carbon monoxide, the alternative hypothesis would be:

H1: The level of carbon monoxide generated by the engine is not twice as great as the legally required amount.

3. Testing the hypothesis: The objective of the test is to generate a sample test statistic that can be used to reject or fail to reject the null hypothesis. If the sample size is greater than 30, the test statistic is derived from the Z formula:

$$ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} $$

If the sample size is less than 30, the t-test is used, with the sample standard deviation s in place of σ:

$$ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} $$
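The two formulas translate directly into code. This is a minimal sketch with hypothetical helper names `z_statistic` and `t_statistic`; the usage line plugs in the machine-repair numbers used later in this article.

```python
import math

def z_statistic(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Z = (x_bar - mu) / (sigma / sqrt(n)); sigma is the population standard deviation."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

def t_statistic(x_bar: float, mu: float, s: float, n: int) -> float:
    """t = (x_bar - mu) / (s / sqrt(n)); s is the sample standard deviation, df = n - 1."""
    return (x_bar - mu) / (s / math.sqrt(n))

print(t_statistic(249, 245, 8, 25))  # 2.5
```

The arithmetic is identical in both; what differs is whether σ is known and which distribution (normal or t with n - 1 degrees of freedom) the result is compared against.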

4. Level of Risk: The level of risk addresses the kinds of errors that can be made when drawing an inference from the test statistic obtained from the testing. Two types of errors can be made. The experimenter can falsely reject a null hypothesis that is true; in that case we say that he made a type I error, or α error. If he fails to reject a null hypothesis that is actually false, he makes a type II error, or β error. In the case of the engineer, if the level of carbon monoxide is in fact twice as great as the prescribed level and he rejects the null hypothesis, he has made a type I error. If the carbon monoxide generated by the engine is less than the legally prescribed level and he fails to reject the null hypothesis, he has made a type II error: he failed to reject a null hypothesis that happened to be false.

5. Decision Rule: Only two decisions are considered: rejecting the null hypothesis or failing to reject it. The decision rule determines the conditions under which the null hypothesis is rejected or not, and it is based on the alpha level. Before conducting the test, the experimenter must set the confidence level. He can decide, for example, to test the hypothesis at a 95% confidence level, which corresponds to α = 1 - 0.95 = 0.05: if the null hypothesis is true, there is a 5% chance that it will be wrongly rejected (a type I error).
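One way to see what α = 0.05 means is to simulate many tests in which the null hypothesis is true and count how often it is wrongly rejected. This sketch assumes a normally distributed population with mean 245 and standard deviation 8, borrowing the numbers from the machine example that follows; `simulated_type_i_rate` is a hypothetical helper, and 2.064 is the two-tailed critical t for 24 degrees of freedom.

```python
import numpy as np

def simulated_type_i_rate(mu=245.0, sigma=8.0, n=25, t_crit=2.064, trials=20_000, seed=1):
    """Fraction of samples, drawn with H0 true, whose |t| exceeds the critical value."""
    rng = np.random.default_rng(seed)
    # H0 is true by construction: every sample comes from a population with mean mu.
    samples = rng.normal(mu, sigma, size=(trials, n))
    x_bar = samples.mean(axis=1)
    s = samples.std(axis=1, ddof=1)  # sample standard deviation
    t = (x_bar - mu) / (s / np.sqrt(n))
    # Every rejection in this simulation is a type I error.
    return float(np.mean(np.abs(t) > t_crit))

print(simulated_type_i_rate())
```

The printed rejection rate should land close to 0.05, which is the long-run type I error rate the decision rule accepts.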

Example: A machine used to average a production rate of 245 units per hour before it went for repair. After it came back from repair, it was observed over a period of 25 hours and produced an average of 249 units per hour with a standard deviation of 8. Determine whether there is a statistically significant difference between the machine's productivity before and after repair at a confidence level of 95%.

Solution:-

Since the sample size (25) is smaller than 30, we will use the t-test.

The null hypothesis is the following

H0: The productivity before repair = Productivity after repair

The alternate hypothesis should be the opposite

H1: The productivity before repair ≠ the productivity after repair.

n = 25, s = 8, X̄ = 249, µ = 245

Since the confidence level is set at 95%, α = 1 - 0.95 = 0.05. Since the alternate hypothesis is stated as an inequality (≠), we have a two-tailed test with each tail covering one half of alpha, so we need α/2 = 0.05/2 = 0.025. With degrees of freedom df = n - 1 = 25 - 1 = 24, the critical t value obtained from the t table is 2.064.

If the calculated t-statistic falls within the interval [-2.064, +2.064], we fail to reject the null hypothesis; otherwise the null hypothesis is rejected.

Let us find the calculated t-statistic:

$$ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{249 - 245}{8 / \sqrt{25}} = \frac{4}{1.6} = 2.5 $$

The calculated t-statistic is 2.5. Since it falls outside the interval [-2.064, +2.064], we reject the null hypothesis and conclude that there is a statistically significant difference between the productivity of the machine before and after repair.
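The critical-value approach can be reproduced in a few lines. This sketch uses SciPy's `t.ppf` in place of the printed t table; only the example's summary statistics (n = 25, s = 8, X̄ = 249, µ = 245) are needed.

```python
from scipy.stats import t

alpha, n = 0.05, 25
df = n - 1
t_crit = t.ppf(1 - alpha / 2, df)      # two-tailed: alpha/2 in each tail
t_calc = (249 - 245) / (8 / n ** 0.5)  # calculated t-statistic

reject = abs(t_calc) > t_crit
print(f"t critical = ±{t_crit:.3f}, t calculated = {t_calc:.2f}, reject H0: {reject}")
```

This prints a critical value of ±2.064 and a calculated value of 2.50, so H0 is rejected, matching the hand calculation.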

We can also use the confidence interval method, with the following formula for the confidence interval:

$$ \bar{X} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} $$

Therefore:

$$ 249 - 2.064\,\frac{8}{\sqrt{25}} \le \mu \le 249 + 2.064\,\frac{8}{\sqrt{25}} $$

$$ 245.698 \le \mu \le 252.302 $$

The null hypothesis is rejected because the hypothesized mean µ = 245 does not fall within the interval [245.698, 252.302].
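The same interval can be computed directly. This is a minimal sketch with a hypothetical helper `t_confidence_interval`, using SciPy's t distribution for the critical value instead of the table.

```python
import math
from scipy.stats import t

def t_confidence_interval(x_bar: float, s: float, n: int, confidence: float = 0.95):
    """Two-sided confidence interval for the population mean based on the t distribution."""
    alpha = 1 - confidence
    margin = t.ppf(1 - alpha / 2, n - 1) * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = t_confidence_interval(249, 8, 25)
print(f"{low:.3f} <= mu <= {high:.3f}")
# H0 is rejected when the hypothesized mean lies outside the interval.
print("reject H0" if not (low <= 245 <= high) else "fail to reject H0")
```

It prints 245.698 <= mu <= 252.302 followed by "reject H0", since 245 lies just below the lower bound.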

P-Value Method: In the previous example, we rejected the null hypothesis because the calculated t-statistic was outside the interval [-2.064, +2.064]; had it been within that interval, we would have failed to reject the null hypothesis. The interval [-2.064, +2.064] was chosen because the confidence level was set at 95%, which translates into α = 0.05. If alpha were set at another level, the interval would have been different. The method therefore does not allow the result to be assessed against a single value: any calculated t-statistic that falls within the interval leads to the same non-rejection of the null hypothesis.

The p-value method does not require us to preset the value of alpha. Assuming the null hypothesis is true, the p-value is the smallest value of alpha for which the null hypothesis would be rejected. For instance, in the example above the p-value generated by Minitab was 0.020; since alpha = 0.05 is greater than the p-value, we reject the null hypothesis.
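The two-tailed p-value for the machine example can be checked with SciPy. This sketch assumes the calculated t = 2.5 with 24 degrees of freedom from above; `t.sf` gives the area in one tail beyond |t|, which is doubled for a two-tailed test.

```python
from scipy.stats import t

t_calc, df, alpha = 2.5, 24, 0.05
p_value = 2 * t.sf(abs(t_calc), df)  # two-tailed: area in both tails beyond |t|
print(f"p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

It prints a p-value of 0.020, in line with the Minitab figure quoted above, and "reject H0".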

Non parametric hypothesis testing:-

In the previous examples we used means and standard deviation to determine if there were statistically significant differences between samples. What happens if the data cannot yield arithmetic means and standard deviations? What happens if the data are nominal or ordinal?

When we deal with categorical, nominal or ordinal data, nonparametric statistics are used to conduct hypothesis testing. The chi-square test and the Mann-Whitney U test are examples of nonparametric tests.

Chi-Square test:-

Chi Square goodness of fit test:-

Fouta Electronics and Touba Inc are computer manufacturers that use the same third-party call centre to handle their customer service. Touba Inc conducted a survey to evaluate how satisfied its customers were with the service they receive from the call centre; the results of the survey are given in the table below.

Categories Rating (%)
Excellent 10
Very Good 45
Good 15
Fair 5
Poor 10
Very Poor 15

After seeing the results of the survey, Fouta Electronics decided to find out whether they also apply to its own customers, so it interviewed 80 randomly selected customers and obtained the results shown in the table below.

Categories Rating (number of customers)
Excellent 8
Very Good 37
Good 11
Fair 7
Poor 9
Very Poor 8

To analyse the results, the quality engineer at Fouta Electronics conducts a hypothesis test. However, because he is faced with categorical data, he cannot use a t-test: a t-test relies on the mean and the standard deviation, which cannot be obtained from either table. We cannot deduce a mean satisfaction or a standard deviation of satisfaction. Therefore another type of test is needed. The test that applies to this situation is the chi-square test, which is a nonparametric test.

Step 1: State the hypothesis

The null hypothesis will be

H0: The results of the Touba Inc survey are the same as the results of the Fouta Electronics survey.

And the alternate hypothesis will be

H1: The results of the Touba Inc survey are not the same as the results of the Fouta Electronics survey.

Step 2: Test statistic to be used: The test statistic for this hypothesis test is the calculated chi-square:

$$ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} $$

where f_e represents the expected frequencies and f_o represents the observed frequencies.

Step 3: Calculating the χ² test statistic:-

Using the percentages from the Touba Inc table, the expected frequencies for a sample of 80 surveyed customers are:

Table A

Categories Rating % Expected frequencies fe
Excellent 10 0.10 * 80 =8
Very Good 45 0.45 * 80 = 36
Good 15 0.15 * 80 = 12
Fair 5 0.05 * 80 = 4
Poor 10 0.10 * 80 = 8
Very Poor 15 0.15 * 80 = 12
Total 100 80


We can summarize the observed frequencies and the expected frequencies in the table given below:

Categories Observed frequencies fo Expected frequencies fe (fo-fe)^2/fe
Excellent 8 8 0
Very Good 37 36 0.028
Good 11 12 0.083
Fair 7 4 2.25
Poor 9 8 0.125
Very Poor 8 12 1.333
Total 80 80 3.819

Therefore we conclude χ² = ∑ (fo - fe)²/fe ≈ 3.819.
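The expected frequencies and the chi-square sum can be reproduced by hand in code. This sketch hard-codes the two survey tables from this article; computing `pct * n / 100` keeps every expected frequency an exact whole number.

```python
categories = ["Excellent", "Very Good", "Good", "Fair", "Poor", "Very Poor"]
touba_percent = [10, 45, 15, 5, 10, 15]  # Touba Inc survey (%)
fouta_observed = [8, 37, 11, 7, 9, 8]    # Fouta Electronics survey (counts)

n = sum(fouta_observed)                  # 80 customers
expected = [pct * n / 100 for pct in touba_percent]

chi_square = 0.0
for name, fo, fe in zip(categories, fouta_observed, expected):
    contribution = (fo - fe) ** 2 / fe   # this category's (fo - fe)^2 / fe term
    chi_square += contribution
    print(f"{name:<10} fo={fo:>3} fe={fe:>5.1f} term={contribution:.3f}")

print(f"chi-square = {chi_square:.3f}")
```

The final line prints chi-square = 3.819; keeping full precision in the Very Poor term (16/12 = 1.333) is what separates this from a total built on 1.33.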

Now that we have found the calculated χ², we can find the critical χ² from the table. The critical χ² is based on the degrees of freedom and the confidence level. Since the number of categories is 6, the degrees of freedom are df = 6 - 1 = 5; if the confidence level is set at 95%, α = 0.05, and the critical χ² read from the χ² table is 11.070.

Since the critical χ²(0.05, 5) = 11.070 is much greater than the calculated χ² ≈ 3.82, we fail to reject the null hypothesis: the data show no statistically significant difference between the results of the Touba Inc and Fouta Electronics surveys.
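As a cross-check outside Minitab and SigmaXL, SciPy's `chisquare` runs the same goodness-of-fit test and `chi2.ppf` replaces the χ² table lookup. This is a sketch; the observed and expected counts are the ones tabulated above, and both lists must sum to the same total (80).

```python
from scipy.stats import chi2, chisquare

observed = [8, 37, 11, 7, 9, 8]
expected = [8, 36, 12, 4, 8, 12]  # from Touba Inc's percentages, n = 80

result = chisquare(f_obs=observed, f_exp=expected)
critical = chi2.ppf(0.95, df=len(observed) - 1)  # df = 6 - 1 = 5

print(f"chi-square = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
print(f"critical chi-square (0.05, 5) = {critical:.3f}")
print("reject H0" if result.statistic > critical else "fail to reject H0")
```

The statistic matches the hand calculation, the critical value is 11.070, and the p-value is well above 0.05, so the decision is "fail to reject H0".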

Chi-Square using Tools: Click on SigmaXL, select Chi-Square - Goodness test and enter the values as shown in the table above; you will get the same results. Note that the Excel sheet is macro enabled.

Normalizing Data

  •  Non-normal data: Outliers can cause your data to become skewed, and the mean is especially sensitive to outliers. Try removing any extreme high or low values and testing your data again.

When the data being analysed are not normal and the tools required for the analysis assume normality, one option is to normalize them. Normalizing the data means transforming them from non-normal to normal, which can be done using the Box-Cox transformation or the Johnson transformation.

Example: Take the Minitab sample file titled acid concentration, click on Stat > Basic Statistics > Normality Test and press OK. The graph shows that the data points are not randomly clustered around the centreline, and the p-value is less than the prescribed 0.05, so the data cannot be treated as normally distributed.

From the Minitab menu bar, click on Stat > Quality Tools and select Johnson Transformation from the drop-down list. The graph shows the probability plot before and after the transformation; the p-value now exceeds the standard 0.05, so the transformed data are concluded to be normally distributed.
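SciPy does not provide Minitab's Johnson transformation, so this sketch swaps in the Box-Cox transformation mentioned above and uses the Shapiro-Wilk test as a stand-in for Minitab's normality test. The data are hypothetical (a seeded, right-skewed lognormal sample), not the acid concentration file; Box-Cox requires strictly positive values.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed data; the Minitab acid concentration file is not reproduced here.
rng = np.random.default_rng(7)
data = rng.lognormal(mean=1.0, sigma=0.8, size=200)

_, p_before = stats.shapiro(data)
transformed, lam = stats.boxcox(data)  # lambda chosen by maximum likelihood
_, p_after = stats.shapiro(transformed)

print(f"lambda = {lam:.2f}")
print(f"Shapiro-Wilk p-value before = {p_before:.4f}, after = {p_after:.4f}")
```

The skewed sample fails the normality check before transformation, and the p-value rises after Box-Cox; for lognormal data the fitted lambda tends to sit near 0, which corresponds to a log transform.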

ANOVA

If, for instance, three sample means A, B and C are being compared, using multiple t-tests is cumbersome. Analysis of variance (ANOVA) can be used instead of multiple t-tests.

ANOVA is a Hypothesis test used when more than 2 means are being compared.

If k samples are being tested, the null hypothesis takes the form given below:

H0: µ1 = µ2 = … = µk

And the alternate hypothesis will be

H1: At least one population mean is different from the others
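A one-way ANOVA compares the variance between group means with the variance within groups. This minimal sketch computes the F statistic by hand with a hypothetical helper `one_way_anova` and cross-checks it against SciPy's `f_oneway`; the samples A, B and C are made-up numbers chosen so the arithmetic is easy to follow.

```python
from scipy.stats import f, f_oneway

def one_way_anova(*groups):
    """Return (F, p) for H0: all population means are equal."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)  # df between = k - 1
    ms_within = ss_within / (n - k)    # df within = n - k
    f_stat = ms_between / ms_within
    return f_stat, f.sf(f_stat, k - 1, n - k)  # p = right-tail area of F

a, b, c = [1, 2, 3], [2, 3, 4], [3, 4, 5]  # hypothetical samples A, B, C
f_stat, p = one_way_anova(a, b, c)
print(f"F = {f_stat:.2f}, p = {p:.3f}")  # F = 3.00, p = 0.125
print(f_oneway(a, b, c))
```

Here p = 0.125 > 0.05, so these hypothetical samples would not lead to rejecting H0.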

ANOVA can be one-way, two-way, three-way and so on: one-way ANOVA has one independent factor, and two-way ANOVA has two independent factors.

Problem of 2-way ANOVA: Consider an experiment on aluminium castings. The customer requires hardness to be controlled. Hardness is a critical-to-quality characteristic (CTQ), so we want to evaluate the effect of two factors on hardness (y). The two potential contributors are the percentages of copper and magnesium. We have controlled the copper percentage at 3.5 and magnesium at 1.2 and 1.8 percent; the data are available in the file hardness.xls.

Solution: Open the XLS file Anova Hardness in Minitab, click on Stat > ANOVA and select Two-Way. In the window that pops up, select Hardness as the response, Copper as the row factor and Magnesium as the column factor, at 95% confidence intervals. You will get the output shown below.

Two-way ANOVA: Hardness versus Copper_1, Magnesium

Source        DF      SS        MS       F       P
Copper_1       1     1.125     1.125    1.29   0.320
Magnesium      1    21.125    21.125   24.14   0.008
Interaction    1    15.125    15.125   17.29   0.014
Error          4     3.500     0.875
Total          7    40.875

S = 0.9354 R-Sq = 91.44% R-Sq(adj) = 85.02%

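Each MS (mean square) value represents the variance due to the source mentioned in that row. Since the p-values for magnesium (0.008) and for the interaction between magnesium and copper (0.014) are below 0.05, we can conclude that magnesium and the magnesium-copper interaction have a significant effect on the hardness of the aluminium castings, while copper alone (p = 0.320) does not.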
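The F and P columns of the Minitab output follow from the SS and DF columns alone. This sketch assumes the usual fixed-effects two-way ANOVA, in which each effect's mean square is tested against the error mean square; `anova_rows` is a hypothetical helper, and the sums of squares are copied from the table above.

```python
from scipy.stats import f

# Sums of squares and degrees of freedom copied from the Minitab output above.
effects = {
    "Copper_1": (1.125, 1),
    "Magnesium": (21.125, 1),
    "Interaction": (15.125, 1),
}
ss_error, df_error = 3.500, 4

def anova_rows(effects, ss_error, df_error):
    """Map each source to (F, p), testing its mean square against the error mean square."""
    ms_error = ss_error / df_error
    rows = {}
    for source, (ss, df) in effects.items():
        f_stat = (ss / df) / ms_error
        rows[source] = (f_stat, f.sf(f_stat, df, df_error))  # right-tail area of F
    return rows

rows = anova_rows(effects, ss_error, df_error)
for source, (f_stat, p) in rows.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{source:<12} F={f_stat:6.2f} P={p:.3f} -> {verdict}")

ss_total = sum(ss for ss, _ in effects.values()) + ss_error
print(f"R-Sq = {1 - ss_error / ss_total:.2%}")
```

The recomputed values agree with Minitab's F (1.29, 24.14, 17.29), P (0.320, 0.008, 0.014) and R-Sq of 91.44%.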

Note: Minitab and SigmaXL are statistical software packages used in Six Sigma; they are available as 30-day trial versions, after which they have to be purchased.

The author, Suresh V. Menon, is a Six Sigma consultant and trainer. He is a certified Lean Six Sigma Black Belt accredited by IASSC and has worked in various capacities as Principal Consultant, Test Manager, ERP Consultant and Project Manager. He can be contacted at sureshmenonr1009@gmail.com for queries and comments.