Six Sigma – WMEP

Fractional Factorial Designs with Minitab

Michael Parker — Tue, 03 May 2016 23:23:59 +0000

What Are Fractional Factorial Experiments?

In simple terms, a fractional factorial experiment is a subset of a full factorial experiment.

Fractional factorials use fewer treatment combinations and runs.
Fractional factorials are less able to determine effects because of fewer degrees of freedom available to evaluate higher order interactions.
Fractional factorials can be used to screen a larger number of factors.
Fractional factorials can also be used for optimization.

Why Fractional Factorial Experiments?

To run a full factorial experiment for k two-level factors, we need 2^k unique treatments. In other words, we need enough resources for at least 2^k runs.

With k increasing, the number of runs required in full factorial experiments rises dramatically even without any replications, and the percentage of degrees of freedom spent on the main effects decreases. However, the higher order interactions (3 or 4 factor interactions) can typically be ignored, which allows us to run fewer trials to understand the main effects and two-way interactions.

The main effects and two-way interactions are the key effects we need to evaluate. The higher the order of an interaction, the more safely it can be ignored.

Notice the number of treatments increases dramatically as factors are added.
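As a rough illustration of that growth, the short sketch below tabulates the run count of an unreplicated two-level full factorial and the share of degrees of freedom that goes to main effects. The function name and the dictionary keys are made up for this example.

```python
def full_factorial_summary(k):
    """Summarize an unreplicated two-level full factorial with k factors."""
    runs = 2 ** k                      # unique treatment combinations
    total_df = runs - 1                # degrees of freedom available for effects
    main_df = k                        # one per main effect
    two_way_df = k * (k - 1) // 2      # one per two-way interaction
    return {
        "factors": k,
        "runs": runs,
        "main_effect_share": main_df / total_df,
        "higher_order_df": total_df - main_df - two_way_df,
    }

for k in range(2, 9):
    s = full_factorial_summary(k)
    print(f"k={s['factors']}: {s['runs']:>3} runs, "
          f"{s['main_effect_share']:.0%} of df on main effects, "
          f"{s['higher_order_df']} df on 3-way and higher interactions")
```

With five factors, for example, 16 of the 31 degrees of freedom go to interactions of three or more factors, which is the portion a fractional design gives up.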

How Does a Fractional Factorial Work?

Suppose we are trying to find the cause-and-effect relationship between a response (Y) and three factors (A, B, and C) and their interactions (AB, BC, AC, and ABC). The 2³ full factorial design (two levels, three factors) is shown below. There are eight treatment combinations (2 * 2 * 2).

To perform a 2³ full factorial experiment, we need to run at least eight unique treatments (2 * 2 * 2).
What if we only have enough resources to run four treatments?
In that case, we need to carefully select a subset of the eight treatments so that all of our main effects can be evaluated and the design stays balanced and orthogonal.

Example of an invalid design

This design is invalid because only the low setting of factor C is tested. We cannot evaluate the main effect of factor C using this design. Remember orthogonality?

This design is also invalid because it is neither balanced nor orthogonal. Checking orthogonality: the sum of AC interaction signs should equal zero (0).

Run 1 (−)
Run 2 (−)
Run 3 (−)
Run 4 (+)
Sum (−2)

This design has a low and high setting for each factor, but is not orthogonal. To select the four treatments run in the 2³⁻¹ fractional factorial experiment, we start from the 2² full factorial design of experiment. If we replace the two-way interaction (AB) column with the factor C column, the design will be valid.

Imagine a two-factor full factorial with factors A and B. We also learn about the interaction of A and B. In a fractional factorial, we sacrifice learning about the two-way interaction between A and B, and substitute factor C.
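The substitution described above can be sketched in a few lines of Python. This is only an illustration: the helper names are invented here, and the "invalid" design is a hypothetical one that holds C at its low setting.

```python
from itertools import product

def half_fraction():
    """Build the 2^(3-1) design: a full factorial in A and B, with C set to A*B."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

def is_balanced(design):
    """Each factor column has as many +1 as -1, so every column sums to zero."""
    return all(sum(column) == 0 for column in zip(*design))

def is_orthogonal(design):
    """Every pair of factor columns has a zero dot product."""
    columns = list(zip(*design))
    return all(
        sum(x * y for x, y in zip(columns[i], columns[j])) == 0
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
    )

design = half_fraction()
for run in design:
    print(run)
print("balanced:", is_balanced(design), "orthogonal:", is_orthogonal(design))

# Hypothetical invalid choice: four runs with C always at its low setting.
invalid = [(a, b, -1) for a, b in product((-1, 1), repeat=2)]
print("invalid design balanced:", is_balanced(invalid))
```

In the valid design the product A*B*C is +1 in every run, which is why the three-way interaction cannot be estimated from it.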

2³⁻¹ Fractional Factorial Design Pattern

This pattern implies three factors and four treatments.

Note: We also call this kind of design a half-factorial design since it has only half of the treatments of the full factorial design. In the 2³⁻¹ fractional factorial design, the effect of the three-way interaction (ABC) is not measurable because its column contains only “+1”.
In four runs, we are able to run high and low settings for each of the three factors, while the three-way interaction ABC stays at the high setting. We also notice that the column of each main effect is identical, “+1” for “+1” and “−1” for “−1”, to the column of one two-way interaction.

A and BC
B and AC
C and AB

In this situation, we say that A is aliased with BC, or that A is the alias of BC: the two effects cannot be separated in this design. Multiplying any column by itself gives the identity (I).

A*A=I

The product of any column and the identity is the column itself.

A*I=A

Column ABC is called the generator; the defining relation of this design is I = ABC. Multiplying any column by the generator gives its alias.

A*ABC=(A*A)*BC=I*BC=BC
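The same column algebra can be sketched in code. Because any letter multiplied by itself gives I, multiplying two effects amounts to keeping the letters that appear in exactly one of them. The function names below are illustrative only.

```python
def multiply(effect_a, effect_b):
    """Multiply two effect columns written as letter strings, e.g. "A" and "ABC".

    A letter that appears in both cancels (X*X = I), so the product is the
    set of letters that appear in exactly one of the two effects.
    """
    letters_a = set(effect_a.replace("I", ""))   # "I" is the identity: no letters
    letters_b = set(effect_b.replace("I", ""))
    product = letters_a ^ letters_b              # symmetric difference
    return "".join(sorted(product)) or "I"

def alias(effect, generator="ABC"):
    """Alias of an effect in a half fraction with the given generator."""
    return multiply(effect, generator)

for effect in ("A", "B", "C"):
    print(effect, "is aliased with", alias(effect))
```

This sketch reserves the letter I for the identity, so it assumes no factor is named I.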

Use Minitab to Run a Fractional Factorial Experiment

Step 1: Initiate the experiment design

Click Stat → DOE → Factorial → Create Factorial Design.
A new window named “Create Factorial Design” pops up.
Select the radio button “2-level factorial (default generators).”
Enter “3” as the “Number of factors.”
Click on the “Design” button in the window “Create Factorial Design” and another new window named “Create Factorial Design – Designs” pops up.
Highlight the “1/2 fraction” design in the box.
Select “2” as the “Number of replicates for corner points.”
Click “OK” in the window “Create Factorial Design – Designs.”
Click “OK” in the window “Create Factorial Design.”
The design table is created in the data table.

This is what the design table should look like.

Step 1.2 (optional): Define the names and the two levels for individual factors

Click on the “Factors” button in the window “Create Factorial Design” and a new window named “Create Factorial Design – Factors” appears.
Enter the names and the levels for individual factors. In this example, we keep the default names and levels for factors A, B, and C.

Step 2: Implement the experiment and record the results in the DOE table

Step 3: Fit the model using the experiment results

Click Stat → DOE → Factorial → Analyze Factorial Design.
A new window named “Analyze Factorial Design” appears.
Select “Y” as the “Responses.”
Click on the “Storage” button and another new window named “Analyze Factorial Design – Storage” pops up.
Check the boxes “Fits” and “Residuals” so that both the fitted responses and the residuals are saved in the data table.
Click “OK” in the window “Analyze Factorial Design – Storage.”
Click “OK” in the window “Analyze Factorial Design.”
The DOE analysis results appear in the session window.

Step 4: Analyze the model results

Check whether the model is statistically significant.
Check which factors are insignificant.

If any independent variables are not significant, remove them one at a time and rerun the model until all the independent variables in the model are significant.

The p-value of factor B is greater than the alpha level (0.05), so it is not statistically significant. In this example, since factor B is not statistically significant, it needs to be removed from the model.

Step 5.0: Rerun the model without the insignificant factor

Click Stat → DOE → Factorial → Analyze Factorial Design.
A new window named “Analyze Factorial Design” appears.
Select “Y” as the “Responses.”
Click on the “Terms” button and another new window named “Analyze Factorial Design – Terms” pops up.
Deselect the factor “B:B” from the box “Selected Terms.”
Click “OK” in the window “Analyze Factorial Design – Terms.”
Click “OK” in the window “Analyze Factorial Design.”
The DOE analysis results appear in the session window.

The p-values of all the independent variables are smaller than 0.05. There is no need to remove any independent variables from the model.

Step 5.1: Check whether residuals are normally distributed with mean equal to zero.

Click Stat → Basic Statistics → Graphical Summary.
A new window named “Graphical Summary” appears.
Select “RESI2” as the “Variables.”
Click “OK.”
The histogram and the normality test of the residuals appear in the newly-generated window.

If the p-value of the normality test is greater than the alpha level (0.05), we fail to reject the hypothesis that the residuals are normally distributed. In this example, the p-value is larger than the alpha level (0.05), so the residuals can be treated as normally distributed. The mean of the residuals is zero.

Step 5.2: Check whether the residuals are independent.

Click Stat → Control Charts → Variables Charts for Individuals → I-MR.
A new window named “Individuals – Moving Range Chart” appears.
Select the “RESI2” as the “Variables.”
Click “OK.”
The control charts appear in the newly-generated window.

If no data points on the control charts fail any tests, the residuals are in control and independent of each other.

Note: An I-MR chart of the residuals is only meaningful if the residuals are in time order. These control charts are used to judge whether the residuals are independent.

Step 5.3: Check whether the residuals have equal variance across the predicted responses.

Click Graph → Scatterplot.
A new window named “Scatterplots” pops up.
Click “OK” in the window “Scatterplots.”
Another window named “Scatterplot – Simple” pops up.
Select “RESI2” as “Y variables” and “FITS2” as “X variables.”
Click “OK” in the window “Scatterplot – Simple.”
The scatterplot between the fitted responses and the residuals appears in a new window.

Model summary: We check that the residuals vary evenly across the entire range of the fitted response values, with no funnel shape or other pattern.

Full Factorial DOE with Minitab

Michael Parker — Tue, 03 May 2016 19:47:22 +0000

What is a Full Factorial DOE?

In a full factorial experiment, all of the possible combinations of factors and levels are created and tested. For example, for a two-level design (i.e., each factor has two levels) with k factors, there are 2^k possible scenarios or treatments.

With two factors, each with two levels, we have 2² = 4 treatments.
With three factors, each with two levels, we have 2³ = 8 treatments.
With k factors, each with two levels, we have 2^k treatments.

2^k Full Factorial DOE

Full factorial DOE is used to discover the cause-and-effect relationship between the response and both the individual factors and the interactions of factors. The result is an equation that describes the relationship between Y and the important Xs:

Where:

Y is the response and X₁, X₂. . . X_k are the factors
α₀ is the intercept and α₁, α₂. . . α_p are the coefficients of the factors and interactions
ε is the error of the model
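For the two-factor case, a model consistent with the definitions above can be sketched as follows. The coefficient numbering (α₃ for the interaction term) is an assumption made for illustration:

```latex
Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \alpha_3 X_1 X_2 + \varepsilon
```

With k factors, the same pattern continues with one coefficient for each main effect and each interaction kept in the model.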

Two-Level Two-Factor Full Factorial

Below is a design pattern of a two-level two-factor full factorial experiment.

2 (levels) raised to 2 (factors) = 4 treatment combinations.

Two-Level Three-Factor Full Factorial

Below is a design pattern of a two-level three-factor full factorial experiment.

2 (levels) raised to 3 (factors) = 8 treatment combinations.

Two-Level Four-Factor Full Factorial

Below is a design pattern of a two-level four-factor full factorial experiment.

2 (levels) raised to 4 (factors) = 16 treatment combinations.

Two-Level Five-Factor Full Factorial

Below is a design pattern of a two-level five-factor full factorial experiment.

2 (levels) raised to 5 (factors) = 32 treatment combinations.

Order to Run Experiments

The four design patterns shown earlier are listed in the standard order. Standard order is used to lay out the combinations/treatments before the experiments start. When actually running the experiments, it is recommended to run the treatments in a randomized order so that uncontrolled noise does not bias any one effect.

Replication in Experiments

Each treatment can be tested multiple times in an experiment in order to increase the degrees of freedom and improve the capability of analysis. We call this method replication.
Replicates are the number of times an individual treatment is run; more replicates increase the statistical power of the experiment. The order in which the treatments are run should be randomized to minimize the effect of noise.
Advantages of replication include: helps to better identify the true sources of variation, helps estimate the true impacts of the factors on the response, and overall improves the reliability and validity of the experimental results.
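A minimal sketch of replication plus randomization, assuming a two-factor design with “Low” and “High” levels. The function name and the fixed seed are choices made for this example, not part of any Minitab workflow.

```python
import random
from collections import Counter
from itertools import product

def randomized_run_order(levels_per_factor, replicates, seed=None):
    """List every treatment `replicates` times, then shuffle the run order."""
    treatments = list(product(*levels_per_factor))      # all level combinations
    runs = [t for t in treatments for _ in range(replicates)]
    random.Random(seed).shuffle(runs)                   # seed only for repeatability
    return runs

runs = randomized_run_order([("Low", "High"), ("Low", "High")], replicates=2, seed=1)
for number, (factor_a, factor_b) in enumerate(runs, start=1):
    print(number, factor_a, factor_b)

counts = Counter(runs)   # each of the four treatments should appear twice
print(counts)
```

Shuffling the full list of runs, rather than shuffling within each replicate, is one design choice; blocking by replicate is another and would need a different sketch.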

2² Full Factorial DOE

Case study: We are running a 2² full factorial DOE to discover the cause-and-effect relationship between the cake tastiness and two factors: temperature of the oven and time length of baking. Each factor has two levels and there are four treatments in total.
We decide to run each treatment twice so that we have enough degrees of freedom to measure the impact of the two factors and the interaction between them. Therefore, the response has eight observations in total.

The objective is to understand the main effects and the interaction of these factors on the response variable. After running the four treatments twice in a random order, we obtain the following results.

There are two factors and two levels, so there are 2^2 = 4 treatment combinations. With two replicates, each treatment combination is run twice; therefore, there are 8 runs in total in this experiment. The experiment results are consolidated into the following table.

The main effect of factor A is computed by averaging the difference between combinations where A was at its high settings and where A was at its low settings.
Main effect of factor A (temperature of the oven):

Where:

k is the number of factors
r is the number of times individual treatments are being run

Using the formula provided, the main effect of increasing the temperature of the oven is to decrease the tastiness of the cake by 6.25 (an effect of −6.25).

The main effect of factor B, similar to A, is computed by averaging the difference between combinations where B was at its high settings and where B was at its low settings.
Main effect of factor B (time length of baking):

Where:

k is the number of factors
r is the number of times individual treatments are being run

Using the formula provided, the main effect of increasing the baking time is to decrease the tastiness of the cake by 1.75 (an effect of −1.75).

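The interaction effect is computed as the difference between the average response of the combinations where A and B were at the same settings (both low or both high) and the average response of the combinations where A and B were at opposite settings (one low, one high).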

Interaction (i.e. A*B) effect:

Where:

k is the number of factors
r is the number of times individual treatments are being run

Using the formula provided, the interaction effect of the temperature and time variables on tastiness was −3.25.
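The effect calculations above can be sketched as follows. The responses are hypothetical numbers chosen for easy arithmetic, not the cake data from the case study, and the sketch uses the usual contrast form: effect = contrast / (2^(k−1) · r).

```python
# Hypothetical responses (NOT the cake data from the case study).
# Keys are coded (A, B) settings; values are the r = 2 replicate responses.
data = {
    (-1, -1): [10, 12],
    (+1, -1): [14, 16],
    (-1, +1): [11, 13],
    (+1, +1): [21, 23],
}
k = 2   # number of factors
r = 2   # number of times each treatment is run

def effect(sign_of):
    """Average response at the +1 level minus average response at the -1 level."""
    contrast = sum(sign_of(a, b) * sum(ys) for (a, b), ys in data.items())
    return contrast / (2 ** (k - 1) * r)

effect_a = effect(lambda a, b: a)        # main effect of A
effect_b = effect(lambda a, b: b)        # main effect of B
effect_ab = effect(lambda a, b: a * b)   # A*B interaction effect
print(effect_a, effect_b, effect_ab)     # prints: 7.0 4.0 3.0
```

The interaction uses the sign a * b, which is +1 when A and B are at the same setting and −1 when they are at opposite settings.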

Sum of squares of factors and interaction

Where:

k is the number of factors
r is the number of times individual treatments are being run

The sum of squares tells us the relative strength of each main effect and interaction. A has the strongest effect as indicated by the high SS value. The degrees of freedom are necessary to determine the mean squares value.

Degrees of freedom of factors and interaction:

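Each two-level effect uses one degree of freedom, so the three effects we want to understand (factor A, factor B, and the interaction between them) require three degrees of freedom in total. With eight runs there are seven total degrees of freedom, which leaves four for error.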

Mean squares of factors and interaction:
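A sketch of how the sums of squares, degrees of freedom, and mean squares fit together for a replicated 2² design. The responses are hypothetical, not the case-study data, and the sketch uses the standard relation SS = contrast² / (r · 2^k).

```python
# Hypothetical responses (NOT the cake data); keys are coded (A, B) settings.
data = {
    (-1, -1): [10, 12],
    (+1, -1): [14, 16],
    (-1, +1): [11, 13],
    (+1, +1): [21, 23],
}
k, r = 2, 2
n_runs = r * 2 ** k

def sum_of_squares(sign_of):
    """SS of an effect from its contrast: contrast^2 / (r * 2^k)."""
    contrast = sum(sign_of(a, b) * sum(ys) for (a, b), ys in data.items())
    return contrast ** 2 / n_runs

ss = {
    "A": sum_of_squares(lambda a, b: a),
    "B": sum_of_squares(lambda a, b: b),
    "AB": sum_of_squares(lambda a, b: a * b),
}

# Error SS: squared deviations of each observation from its treatment mean.
ss_error = sum((y - sum(ys) / len(ys)) ** 2 for ys in data.values() for y in ys)

df_effect = 1                   # each two-level effect has one degree of freedom
df_error = 2 ** k * (r - 1)     # 4 here
df_total = n_runs - 1           # 7 here

ms = {name: value / df_effect for name, value in ss.items()}
ms_error = ss_error / df_error

for name in ss:
    print(name, "SS =", ss[name], "MS =", ms[name], "F =", ms[name] / ms_error)
print("Error SS =", ss_error, "df =", df_error, "MS =", ms_error)
```

The effect and error sums of squares add up to the total sum of squares, which is a quick way to check the arithmetic.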

Use Minitab to Run a 2^k Full Factorial DOE

Step 1: Initiate the experiment design

Click Stat → DOE → Factorial → Create Factorial Design.
A new window named “Create Factorial Design” pops up.
Select the button of “General full factorial design.”
Select “2” as the “Number of factors.”
Click the “Design” button and a new window named “Create Factorial Design – Designs” pops up.
Enter the factor name “Temperature” for factor A.
Enter the factor name “Time Length” for factor B.
Enter the number of levels “2” for both factor A and B.
Enter “2” as the “Number of replicates.”
Click “OK” button in the window “Create Factorial Design – Designs.”

Step 2: Enter the factors and make the design

Click on the “Factors” button in the window “Create Factorial Design”
A new window named “Create Factorial Design – Factors” pops up.
Select “Text” as the “Type” of both factor A and B.
Enter “Low” and “High” as the two levels for both factor A and B.
Click “OK” in the window “Create Factorial Design – Factors.”
Click “OK” in the window “Create Factorial Design.”
The design table is created in the data table.

Step 3: Run your experiment and record the response for each run in a new column named Tastiness within the table created by Minitab. Because we are simulating a DOE, we have provided the data for you in the Sample Data.xlsx file. Use this data to populate your table with your experiment results.

Step 4: Analyze the experiment results

Click Stat → DOE → Factorial → Analyze Factorial Design.
A new window named “Analyze Factorial Design” appears.
Select “Tastiness” in the list box of “Responses.”
Click on the “Storage” button and a new window named “Analyze Factorial Design – Storage” pops up.
Check the boxes “Fits” and “Residuals” in the window “Analyze Factorial Design – Storage.”
Click “OK” in the window “Analyze Factorial Design – Storage.”
Click “OK” in the window “Analyze Factorial Design.”
The DOE analysis results appear in the session window.

These are the results.

Since the p-values of all the terms in the model are smaller than the alpha level (0.05), both factors and their interaction have a statistically significant impact on the response.

The high R² value shows that around 98% of the variation in the response can be explained by the model (very good results).

The fitted response and the residuals of the DOE model are stored in the last two columns of the data table.

Model summary: These are the software outputs for expected/predicted results, as well as the residuals for each combination.

Logistic Regression with Minitab

Michael Parker — Tue, 03 May 2016 15:10:30 +0000

What is Logistic Regression?

Logistic regression is a statistical method to predict the probability of an event occurring by fitting the data to a logistic curve using the logistic function. It is a regression analysis used for predicting the outcome of a categorical dependent variable based on one or more predictor variables. The logistic function used to model the probabilities describes the possible outcome of a single trial as a function of explanatory variables. The dependent variable in a logistic regression can be binary (e.g., 1/0, yes/no, pass/fail), nominal (blue/yellow/green), or ordinal (satisfied/neutral/dissatisfied). The independent variables can be either continuous or discrete.

Logistic Function

Where: z can be any value ranging from negative infinity to positive infinity.
The value of f(z) ranges from 0 to 1, which matches exactly the nature of probability (i.e., 0 ≤ P ≤ 1).

Logistic Regression Equation

Based on the logistic function,

we define f(z) as the probability of an event occurring and z is the weighted sum of the significant predictive variables.
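For reference, the standard logistic function and the weighted sum it is applied to can be written as below. The coefficient names β₀ through β_p are an assumption, since the text does not name them:

```latex
f(z) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
```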

Where: z represents the weighted sum of all of the predictive variables.

Logistic Regression

Another way of representing f(z) is to replace z with the weighted sum of the predictive variables.

Where: Y is the probability of an event occurring and x’s are the significant predictors.
Notes:

When building the regression model, we use the actual Y, which is discrete (e.g. binary, nominal, ordinal).
After the model is built, the fitted Y calculated using the logistic regression equation is a probability ranging from 0 to 1. To convert the probability back to a discrete value, we need input from subject matter experts (SMEs) to select the probability cut point.

Logistic Curve

The logistic curve for binary logistic regression with one continuous predictor is illustrated by the following figure.

Odds

Odds is the probability of an event occurring divided by the probability of the event not occurring.

Odds range from 0 to positive infinity.
Probability can be calculated using odds.
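The relationships between the logistic function, probability, and odds can be checked numerically with a short sketch. The function names are illustrative, and the value 1.7 is an arbitrary example input.

```python
import math

def logistic(z):
    """Standard logistic function: maps any real z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_from_probability(p):
    """Odds = P(event) / P(no event); requires 0 <= p < 1."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Invert the odds back to a probability."""
    return odds / (1.0 + odds)

z = 1.7                              # arbitrary example value
p = logistic(z)
odds = odds_from_probability(p)
print(p, odds, math.log(odds))       # the log of the odds recovers z
```

Taking the natural log of the odds returns z (up to rounding), which is why the weighted sum of predictors in a logistic model is often called the log odds.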

Because probability can be expressed by the odds, and probability can also be expressed through the logistic function, we can relate probability, odds, and ultimately the weighted sum of the independent variables.
Since in the logistic regression model

P = f(z)

therefore

ln(odds) = ln(P / (1 − P)) = z

Three Types of Logistic Regression

Binary Logistic Regression
- Binary response variable
- Example: yes/no, pass/fail, female/male
Nominal Logistic Regression
- Nominal response variable
- Example: set of colors, set of countries
Ordinal Logistic Regression
- Ordinal response variable
- Example: satisfied/neutral/dissatisfied

All three logistic regression models can use multiple continuous or discrete independent variables and can be developed in Minitab using the same steps.

How to Run a Logistic Regression in Minitab

Case Study: We want to build a logistic regression model using the potential factors to predict the probability that the person measured is female or male.
Data File: “Logistic Regression” tab in “Sample Data.xlsx”

Response and potential factors

Response (Y): Female/Male
Potential Factors (Xs):
- Age
- Weight
- Oxy
- Runtime
- RunPulse
- RstPulse
- MaxPulse

Step 1:

Click Stat → Regression → Binary Logistic Regression → Fit Binary Logistic Model.
A new window named “Binary Logistic Regression” appears.
Click into the blank box next to “Response” and all the variables pop up in the list box on the left.
Select “Sex” as the “Response.”
Select “Age”, “Weight”, “Oxy”, “Runtime”, “RunPulse”, “RstPulse”, “MaxPulse” as “Continuous predictors.”
Click “OK.”
The results of the logistic regression model appear in session window.

Step 2:

Step 2.1: Check the p-values of all the independent variables in the model.
Step 2.2: Remove the insignificant independent variables one at a time from the model and rerun the model.
Step 2.3: Repeat step 2.1 until all the independent variables in the model are statistically significant.

Since the p-values of all the independent variables are higher than the alpha level (0.05), we need to remove the insignificant independent variables one at a time from the model, starting with the highest p-value. Runtime has the highest p-value (0.990), so it will be removed from the model first.

After removing Runtime from the model, the p-values of all the independent variables are still higher than the alpha level (0.05). We need to continue removing the insignificant independent variables one at a time, continuing with the highest p-value. Age has the highest p-value (0.977), so it will be removed from the model next.

After removing both Age and Runtime from the model, the p-values of the remaining independent variables are still higher than the alpha level (0.05). We need to continue removing the insignificant independent variables one at a time, continuing with the highest p-value. RstPulse has the highest p-value (0.803) of the remaining variables, so it will be removed next.

After removing RstPulse from the model, the p-values of all the independent variables are still higher than the alpha level (0.05). Continue removing the insignificant independent variables. Weight has the highest p-value (0.218) of the remaining variables, so it will be removed next.

After removing Weight from the model, the p-values of the remaining three independent variables are still higher than the alpha level (0.05). Once again, remove the variable with the highest p-value: RunPulse, with a p-value of 0.140.

After removing RunPulse from the model, the last two p-values are still higher than the alpha level (0.05). We need to remove one more insignificant variable: MaxPulse, with a p-value of 0.0755.

After removing MaxPulse from the model, the p-value of the only independent variable “Oxy” is lower than the alpha level (0.05). There is no need to remove “Oxy” from the model.
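The removal loop walked through above can be sketched generically. This is not how Minitab implements it, and `placeholder_fit` is a stand-in for a real model fit: it returns fixed p-values taken from the text, whereas a real refit produces new p-values at every step. The value used for Oxy is a placeholder, since the text only says it ends up below 0.05.

```python
def backward_eliminate(predictors, fit_p_values, alpha=0.05):
    """Drop the predictor with the highest p-value, refit, and repeat
    until every remaining predictor has a p-value at or below alpha."""
    remaining = list(predictors)
    removed = []
    while remaining:
        p_values = fit_p_values(remaining)            # refit on what is left
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break                                     # everything left is significant
        remaining.remove(worst)
        removed.append(worst)
    return remaining, removed

def placeholder_fit(remaining):
    """Stand-in for a model fit (assumption): fixed p-values, no real regression."""
    reported = {
        "Runtime": 0.990, "Age": 0.977, "RstPulse": 0.803,
        "Weight": 0.218, "RunPulse": 0.140, "MaxPulse": 0.0755,
        "Oxy": 0.03,   # placeholder value, not reported in the text
    }
    return {name: reported[name] for name in remaining}

predictors = ["Age", "Weight", "Oxy", "Runtime", "RunPulse", "RstPulse", "MaxPulse"]
kept, dropped = backward_eliminate(predictors, placeholder_fit)
print("kept:", kept)
print("removed in order:", dropped)
```

To use the loop with real data, `fit_p_values` would be replaced by a function that fits the logistic model on the remaining predictors and returns their p-values.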

Step 3:

Analyze the binary logistic report in the session window and check the performance of the logistic regression model. The p-value here is 0.031, smaller than the alpha level (0.05). We conclude that at least one of the slope coefficients is not equal to zero. The p-values of the “Goodness-of-Fit” tests are all higher than the alpha level (0.05). We conclude that the model fits the data.

Step 4: Get the predicted probabilities of the event (i.e., Sex = M) occurring using the logistic regression model.

Click the “Storage” button in the window named “Binary Logistic Regression” and a new window named “Binary Logistic Regression – Storage” pops up.
Check the box “Fits (event probabilities).”
Click “OK” in the window of “Binary Logistic Regression – Storage.”
Click “OK” in the window of “Binary Logistic Regression.”
A column of the predicted event probability is added to the data table with the heading “FITS”.

Model summary: In column C10, Minitab provides the probability that the sex is male based on the only statistically significant independent variable “Oxy”.

Stepwise Regression with Minitab

Michael Parker — Thu, 28 Apr 2016 21:12:05 +0000

What is Stepwise Regression?

Stepwise regression is a statistical method to automatically select regression models with the best sets of predictive variables from a large set of potential variables. There are different statistical methods used in stepwise regression to evaluate the potential variables in the model:

F-test
T-test
R-square
AIC

Three Approaches to Stepwise Regression

Forward Selection
Start with no predictors, bring in potential predictors one by one, and keep them if they significantly improve the model.
Backward Selection
Start with all potential predictors and eliminate them one by one if they do not contribute significantly to the fit.
Mixed Selection
A combination of forward selection and backward selection: variables are added and removed based on pre-defined significance thresholds.

How to Use Minitab to Run a Stepwise Regression

Case study: We want to build a regression model to predict the oxygen uptake of a person who runs 1.5 miles. The potential predictors are:

Age
Weight
Runtime
Runpulse
RstPulse
MaxPulse

Data File: “Stepwise Regression” tab in “Sample Data.xlsx”

Sample Data Glance

Steps to run stepwise regression in Minitab:

Click Stat → Regression → Regression → Fit Regression Model
A new window named “Regression” appears.
Select “Oxy” as the “Responses” and select all the other variables into the “Continuous Predictors” box.
Click the “Stepwise” button and a new window named “Regression: Stepwise” pops up.
Select the method of stepwise regression and enter the alphas to enter/remove. In this example, we use the “Forward selection” method and the alpha to enter is 0.25.
Click “OK” in the window “Regression: Stepwise.”
Click “OK” in the window “Regression.”
The results appear in the session window.

Model summary: One of the six potential predictors is left out of the model because its p-value is higher than the alpha to enter. Step History: a step-by-step record of how the final model was reached. Each column shows the model built at that step.

Multiple Linear Regression with Minitab

Michael Parker — Thu, 21 Apr 2016 19:31:24 +0000

What is Multiple Linear Regression?

Multiple linear regression is a statistical technique to model the relationship between one dependent variable and two or more independent variables by fitting the data set into a linear equation.
The difference between simple linear regression and multiple linear regression:

Simple linear regression only has one predictor.
Multiple linear regression has two or more predictors.

Multiple Linear Regression Equation

Where:

Y is the dependent variable (response)
X₁, X₂ . . . X_p are the independent variables (predictors). There are p predictors in total.

Both dependent and independent variables are continuous.

β is the intercept indicating the Y value when all the predictors are zeros
α₁, α₂ . . . α_p are the coefficients of predictors. They reflect the contribution of each independent variable in predicting the dependent variable.
e is the residual term indicating the difference between the actual and the fitted response value.
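Written with the symbols defined above, the equation takes the following form (a reconstruction from those definitions):

```latex
Y = \beta + \alpha_1 X_1 + \alpha_2 X_2 + \dots + \alpha_p X_p + e
```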

Use Minitab to Run a Multiple Linear Regression

Case study: We want to see whether the scores in exam one, two, and three have any statistically significant relationship with the score in final exam. If so, how are they related to final exam score? Can we use the scores in exam one, two, and three to predict the score in final exam?
Data File: “Multiple Regression Analysis” tab in “Sample Data.xlsx.”

Step 1: Determine the dependent and independent variables; all should be continuous. Y (dependent variable) is the score of the final exam. X₁, X₂, and X₃ (independent variables) are the scores of exams one, two, and three respectively. All x variables are continuous.

Step 2: Start building the multiple linear regression model

Click Stat → Regression → Regression → Fit Regression Model
A new window named “Regression” pops up.
Select “FINAL” as “Response” and “EXAM1”, “EXAM2” and “EXAM3” as “Predictors.”
Click the “Graph” button, select the radio button “Four in one” and click “OK.”
Click the “Storage” button, check the boxes of “Residuals” and “DFITS” and click “OK.”
Click “OK” in the window named “Regression.”
The regression analysis results appear in a session window and the four residual plots appear in another window named “Residual Plots for FINAL.”

Step 3: Check whether the whole model is statistically significant. If not, we need to re-examine the predictors or look for new predictors before continuing.

H₀: The model is not statistically significant (i.e., none of the predictor coefficients differs significantly from zero).
H₁: The model is statistically significant (i.e., at least one predictor coefficient differs significantly from zero).

In this example, p-value is much smaller than alpha level (0.05), hence we reject the null hypothesis; the model is statistically significant.

Step 4: Check whether multicollinearity exists in the model.

The VIF information is automatically generated in table of Coefficients.

We use the VIF (Variance Inflation Factor) to determine if multicollinearity exists.

Multicollinearity

Multicollinearity is the situation when two or more independent variables in a multiple regression model are correlated with each other. Although multicollinearity does not necessarily reduce the predictability for the model as a whole, it may mislead the calculation for individual independent variables. To detect multicollinearity, we use VIF (Variance Inflation Factor) to quantify its severity in the model.

Variance Inflation Factor (1)

VIF quantifies the degree of multicollinearity for each individual independent variable in the model.

VIF calculation:

Assume we are building a multiple linear regression model using p predictors.

Two steps are needed to calculate VIF for X₁.

Step 1: Build a multiple linear regression model for X₁ by using X₂, X₃ . . . X_p as predictors.

Step 2: Use the R² generated by the linear model in step 1 to calculate the VIF for X₁: VIF = 1 / (1 − R²).

Apply the same methods to obtain the VIFs for other X’s. The VIF value ranges from one to positive infinity.

Variance Inflation Factor (2)

Rules of thumb to analyze variance inflation factor (VIF):

If VIF = 1, there is no multicollinearity.
If 1 < VIF < 5, there is small multicollinearity.
If 5 ≤ VIF < 10, there is medium multicollinearity.
If VIF ≥ 10, there is large multicollinearity.
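As a small sketch, the VIF calculation and the rules of thumb above can be combined as follows. It uses the standard relation VIF = 1 / (1 − R²), where R² comes from regressing one predictor on all the others; the function names and the example R² values are made up for illustration.

```python
def vif_from_r_squared(r_squared):
    """VIF for one predictor, given the R^2 from regressing it on the other predictors."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must be at least 0 and strictly less than 1")
    return 1.0 / (1.0 - r_squared)

def multicollinearity_level(vif):
    """Apply the rules of thumb listed above."""
    if vif >= 10:
        return "large"
    if vif >= 5:
        return "medium"
    if vif > 1:
        return "small"
    return "none"

for r_squared in (0.0, 0.5, 0.85, 0.95):   # hypothetical R^2 values
    vif = vif_from_r_squared(r_squared)
    print(f"R^2 = {r_squared:.2f}  VIF = {vif:.2f}  {multicollinearity_level(vif)}")
```

An R² of 0 gives the minimum VIF of 1, and the VIF grows without bound as R² approaches 1, which matches the stated range of one to positive infinity.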

How to Deal with Multicollinearity

Increase the sample size.
Collect samples with a broader range for some predictors.
Remove the variable with high multicollinearity and high p-value.
Remove variables that are included more than once.
Combine correlated variables to create a new one.

In this section, we will focus on removing variables with high VIF and high p-value.

Step 5: Deal with multicollinearity:

Step 5.1: Identify a list of independent variables with VIF higher than 5. If no variable has VIF higher than 5, go to Step 6 directly.
Step 5.2: Among variables identified in Step 5.1, remove the one with the highest p-value.
Step 5.3: Run the model again, check the VIFs and repeat Step 5.1.

Note: we only remove one independent variable at a time.

In this example, all three predictors have VIF higher than 5. Among them, EXAM1 has the highest p-value. We will remove EXAM1 from the equation and run the model again.

Run the new multiple linear regression with only two predictors (i.e., EXAM2 and EXAM3).

Check the VIFs of EXAM2 and EXAM3. They are both smaller than 5; hence, there is little multicollinearity in the model.

Step 6: Identify the statistically insignificant predictors. Remove one insignificant predictor at a time and run the model again. Repeat this step until all the predictors in the model are statistically significant.

Insignificant predictors are the ones with p-value higher than alpha level (0.05). When p > alpha level, we fail to reject the null hypothesis; the predictor is not significant.

H₀: The predictor is not statistically significant.
H₁: The predictor is statistically significant.

As long as the p-value is greater than 0.05, remove the insignificant variables one at a time in the order of the highest p-value. Once one insignificant variable is eliminated from the model, we need to run the model again to obtain new p-values for other predictors left in the new model. In this example, both predictors’ p-values are smaller than alpha level (0.05). As a result, we do not need to eliminate any variables from the model.

Step 7: Interpret the regression equation

The multiple linear regression equation appears automatically at the top of the session window. The “Parameter Estimates” section provides the estimates of the parameters in the linear regression equation.

Now that we have removed multicollinearity and all the insignificant predictors, we have the parameters for the regression equation.

Interpreting the Results

Rsquare Adj = 98.4%

98% of the variation in FINAL can be explained by the predictor variables EXAM2 & EXAM3.

P-value of the F-test = 0.000

We have a statistically significant model.

Variables p-value:

Both are significant (less than 0.05).

VIF

EXAM2 and EXAM3 are both below 5; we’re in good shape!

Equation: −4.34 + 0.722*EXAM2 + 1.34*EXAM3

−4.34 is the Y intercept; all equations will start with −4.34.
0.722 is the EXAM2 coefficient; multiply it by the EXAM2 score.
1.34 is the EXAM3 coefficient; multiply it by the EXAM3 score.

Let us say you are the professor again, and this time you want to use your prediction equation to estimate what one of your students might get on their final exam.

Assume the following:

Exam 2 results were: 84
Exam 3 results were: 102

Use your equation: −4.34 + 0.722*EXAM2 + 1.34*EXAM3

Predict your student’s final exam score:

−4.34 + (0.722*84) + (1.34*102) = −4.34 + 60.648 + 136.68 = 192.988
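
The same arithmetic can be wrapped in a small function so it can be reused for any student. The function name predict_final is an illustrative choice; the coefficients are the ones from the fitted equation above.

```python
def predict_final(exam2, exam3):
    """Fitted equation from the example: FINAL = -4.34 + 0.722*EXAM2 + 1.34*EXAM3."""
    return -4.34 + 0.722 * exam2 + 1.34 * exam3


print(round(predict_final(84, 102), 3))  # 192.988
```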

Model summary: Nice work again! Now you can use your “magic” as the smart and efficient professor and allocate your time to other students because this one projects to perform much better than the average score of 162. Now that we know that exams two and three are statistically significant predictors, we can plug them into the regression equation to predict the results of the final exam for any student.

Simple Linear Regression with Minitab

Michael Parker — Wed, 06 Apr 2016 14:40:52 +0000

What is Simple Linear Regression?

Simple linear regression is a statistical technique to fit a straight line through the data points. It models the quantitative relationship between two variables. It is simple because only one predictor variable is involved. It describes how one variable changes according to the change of another variable. Both variables need to be continuous; there are other types of regression to model discrete data.

Simple Linear Regression Equation

The simple linear regression analysis fits the data to a regression equation of the form

$$Y = \alpha X + \beta + e$$

Where:

Y is the dependent variable (the response) and X is the single independent variable (the predictor)
α is the slope describing the steepness of the fitted line
β is the intercept indicating the Y value when X is equal to 0
e stands for error (residual). It is the difference between the actual Y and the fitted Y (i.e. the vertical difference between the data point and the fitted line).

Ordinary Least Squares

Ordinary least squares (OLS) is a statistical method used in linear regression analysis to find the best fitting line for the data points. It estimates the unknown parameters of the regression equation by minimizing the sum of squared residuals (i.e. the squared vertical differences between the data points and the fitted line).

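In mathematical language, we look for α and β that satisfy the following criterion:

$$\min_{\alpha,\beta} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

The actual value of the dependent variable:

$$Y_i = \alpha X_i + \beta + e_i$$

Where: i = 1, 2, ..., n.

The fitted value of the dependent variable:

$$\hat{Y}_i = \alpha X_i + \beta$$

Where: i = 1, 2, ..., n.

By using calculus, it can be shown that the sum of squared errors is minimal when

$$\alpha = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

and

$$\beta = \bar{Y} - \alpha \bar{X}$$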
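
As a minimal sketch of the least-squares solution, the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the intercept is the mean of Y minus the slope times the mean of X. The function name fit_line and the data points are made up for illustration; the points lie exactly on y = 2x + 1 so the result is easy to check by eye.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```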

ANOVA in Simple Linear Regression

X: the independent variable that we use to predict;
Y: the dependent variable that we want to predict.

The variance in simple linear regression can be expressed as a relationship between the actual value, the fitted value, and the grand mean—all in terms of Y.

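Total Variation = Total Sums of Squares = $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
Explained Variation = Regression Sums of Squares = $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
Unexplained Variation = Error Sums of Squares = $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$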

Regression follows the same methodology as ANOVA and the hypothesis tests behind it use the same assumptions.

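Variation Components

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

i.e. Total Sums of Squares = Regression Sums of Squares + Error Sums of Squares

Degrees of Freedom Components

i.e. n – 1 = (k – 1) + (n – k), where n is the number of data points and k is the number of estimated parameters (the intercept plus one slope per predictor, so k = 2 in simple linear regression)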

Whether the overall model is statistically significant can be tested by using F-test of ANOVA.

H₀: The model is not statistically significant.
H_a: The model is statistically significant.

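Test Statistic

$$F = \frac{MS_{regression}}{MS_{error}} = \frac{SS_{regression} / (k - 1)}{SS_{error} / (n - k)}$$

Critical Statistic

The critical value is the F value in the F table with (k – 1) degrees of freedom in the numerator and (n – k) degrees of freedom in the denominator.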

If F ≤ F_critical (calculated F is less than or equal to the critical F), we fail to reject the null. There is no statistically significant relationship between X and Y.
If F > F_critical, we reject the null. There is a statistically significant relationship between X and Y.

Coefficient of Determination

R-squared or R² (also called the coefficient of determination) measures the proportion of variability in the data that can be explained by the model.

R² ranges from 0 to 1. The higher R² is, the better the model fits the actual data.
R² can be calculated with the formula:

$$R^2 = \frac{SS_{regression}}{SS_{total}} = 1 - \frac{SS_{error}}{SS_{total}}$$
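
The sums of squares, the F statistic, and R² can be computed by hand for a small data set. This is a sketch under stated assumptions: the data are made up, the function name anova_simple_regression is hypothetical, and k is taken to be the number of estimated parameters (intercept and slope, so k = 2), which is the reading under which the degrees of freedom add up to n – 1.

```python
def anova_simple_regression(xs, ys):
    """Return (SS total, SS regression, SS error, F, R-squared) for one predictor."""
    n = len(xs)
    k = 2  # assumption: k counts estimated parameters (intercept and slope)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]

    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_regression = sum((f - y_bar) ** 2 for f in fitted)
    ss_error = sum((y - f) ** 2 for y, f in zip(ys, fitted))

    f_stat = (ss_regression / (k - 1)) / (ss_error / (n - k))
    r_squared = ss_regression / ss_total
    return ss_total, ss_regression, ss_error, f_stat, r_squared


# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sst, ssr, sse, f_stat, r2 = anova_simple_regression(xs, ys)
print(round(sst, 6), round(ssr + sse, 6))  # 6.0 6.0
print(round(f_stat, 2), round(r2, 2))      # 4.5 0.6
```

The calculated F would then be compared with the critical F for 1 and 3 degrees of freedom taken from an F table.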

Use Minitab to Run a Simple Linear Regression

Case study: We want to see whether the score on exam one has any statistically significant relationship with the score on the final exam. If yes, how much impact does exam one have on the final exam?
Data File: “Simple Linear Regression” tab in “Sample Data.xlsx”

Step 1: Determine the dependent and independent variables. Both should be continuous variables.

Y (dependent variable) is the score of final exam.
X (independent variable) is the score of exam one.

Step 2: Create a scatter plot to visualize whether there seems to be a linear relationship between X and Y.

Click Graph → Scatterplot.
A new window named “Scatterplots” pops up.
Click “OK.”
A new window named “Scatterplot – Simple” pops up.
Select “FINAL” as “Y variables” and “EXAM1” as “X variables.”
Click “OK.”
A scatter plot is generated in a new window.

Based on the scatter plot, the relationship between exam one and final seems linear. The higher the score on exam one, the higher the score on the final. It appears you could “fit” a line through these data points.

Step 3: Run the simple linear regression analysis.

Click Stat → Regression → Regression → Fit Regression Model.
A new window named “Regression” pops up.
Select “FINAL” as “Response” and “EXAM1” as “Continuous Predictors.”
Click the “Storage” button.
Check the box of “Residuals” so that the residuals can be saved automatically in the last column of the data table.
Click “OK.”
The regression analysis results appear in the new window.

Step 4: Check whether the model is statistically significant. If it is not significant, we will need to re-examine the predictor or look for new predictors before continuing. R² measures the percentage of variation in the data set that can be explained by the model; here, 89.5% of the variability in the data can be accounted for by this linear regression model. The “Analysis of Variance” section provides an ANOVA table covering degrees of freedom, sum of squares, and mean square information for total, regression, and error. The p-value of the F-test is lower than the α level (0.05), indicating that the model is statistically significant.

The p-value is 0.0001; therefore, we reject the null and claim the model is statistically significant. The R square value says that 89.5% of the variability can be explained by this model.

Step 5: Understand regression equation

The estimates of the slope and intercept are shown in the “Parameter Estimate” section. In this example, Y = 15.6 + 1.85 × X, where X is the score on Exam 1 and Y is the final exam score. A one-unit increase in the Exam 1 score increases the predicted final score by 1.85.

Interpreting the Results

Let us say you are the professor and you want to use this prediction equation to estimate what two of your students might get on their final exam.

Rsquare Adj = 89.0%

89% of the variation in FINAL can be explained by EXAM1

P-value of the F-test = 0.000

We have a statistically significant model

Prediction Equation: 15.6 + 1.85 × EXAM1

15.6 is the Y intercept; all equations will start with 15.6
1.85 is the EXAM1 coefficient; multiply it by the EXAM1 score

Because the model is significant, and it explains 89% of the variability, we can use the model to predict final exam scores based on the results of Exam1.

Let us assume the following:

Student “A” exam 1 results were: 79
Student “B” exam 1 results were: 94

Remember our prediction equation 15.6 + 1.85 × Exam1?

Now apply the equation to each student:

Student “A” Estimate: 15.6 + (1.85 × 79) = 161.8

Student “B” Estimate: 15.6 + (1.85 × 94) = 189.5
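
Applying the equation to several students at once is a natural fit for a short loop. The function name predict_final is an illustrative choice; the coefficients and exam scores are the ones used above.

```python
def predict_final(exam1):
    """Fitted equation from the example: FINAL = 15.6 + 1.85 * EXAM1."""
    return 15.6 + 1.85 * exam1


for student, exam1 in [("A", 79), ("B", 94)]:
    print(student, round(predict_final(exam1), 2))
```

Student A's unrounded estimate is 161.75, which is reported as 161.8 above.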

Model summary: By simply substituting exam 1 scores into the equation, we can predict final exam scores. But the key thing about the model is whether or not it is useful. In this case, the professor can use the results to figure out where to spend his time helping students.

Correlation Coefficient with Minitab

Michael Parker — Tue, 05 Apr 2016 21:14:05 +0000

Pearson’s Correlation Coefficient

Pearson’s correlation coefficient, also called Pearson’s r, the coefficient of correlation, or Pearson’s product moment correlation coefficient, is a statistic (r) that measures the linear relationship between two variables.

What is Correlation?

Correlation is a statistical technique that describes whether and how strongly two or more variables are related.
Correlation analysis helps to understand the direction and degree of association between variables, and it suggests whether one variable can be used to predict another. Of the different metrics to measure correlation, Pearson’s correlation coefficient is the most popular. It measures the linear relationship between two variables.
Correlation coefficients range from −1 to 1.

If r = 0, there is no linear relationship between the variables.
The sign of r indicates the direction of the relationship:
If r < 0, there is a negative linear correlation. If r > 0, there is a positive linear correlation.
The absolute value of r describes the strength of the relationship:
If |r| ≤ 0.5, there is a weak linear correlation.
If |r| > 0.5, there is a strong linear correlation.
If |r| = 1, there is a perfect linear correlation.
When the correlation is strong, the data points on a scatter plot will be close together (tight). The closer r is to −1 or 1, the stronger the relationship.
r close to −1: strong inverse relationship
r close to +1: strong direct relationship
When the correlation is weak, the data points are spread apart more (loose). The closer the correlation is to 0, the weaker the relationship.

This figure demonstrates the relationships between variables as the Pearson r value ranges from 1 to 0 and to −1. Notice that at −1 and 1 the points form a perfectly straight line.

At 0 the data points are completely random.
At 0.8 and −0.8, notice how you can see a directional relationship, but there is some noise around where a line would be.
At 0.4 and −0.4, it looks like the scattering of data points is leaning to one direction or the other, but it is more difficult to see a relationship because of all the noise.

Pearson’s correlation coefficient is only sensitive to the linear dependence between two variables. It is possible that two variables have a perfect non-linear relationship when the correlation coefficient is low. Notice the scatter plots below with correlation equal to 0. There are clearly relationships but they are not linear and therefore cannot be determined with Pearson’s correlation coefficient.

Correlation and Causation

Correlation does not imply causation.
If variable A is highly correlated with variable B, it does not necessarily mean A causes B or vice versa. It is possible that an unknown third variable C is causing both A and B to change. For example, if ice cream sales at the beach are highly correlated with the number of shark attacks, it does not imply that increased ice cream sales cause increased shark attacks. They are triggered by a third factor: summer.
This example demonstrates a common mistake that people make: assuming causation when they see correlation. In this example, it is hot weather that is a common factor. As the weather is hotter, more people consume ice cream and more people swim in the ocean, making them susceptible to shark attacks.

Correlation and Dependence

If two variables are independent, the correlation coefficient is zero.
WARNING! If the correlation coefficient of two variables is zero, it does not imply they are independent. The correlation coefficient only indicates the linear dependence between two variables. When variables are non-linearly related, they are not independent of each other but their correlation coefficient could be zero.
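
A small numeric example makes the warning concrete. Below, y is completely determined by x (y = x²), yet the sample covariance, and therefore the correlation coefficient, is zero. The data and the helper name sample_covariance are made up for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # y depends entirely on x, but not linearly

print(sample_covariance(xs, ys))  # 0.0
```

Because the points are symmetric around x = 0, the positive and negative cross-products cancel, so a linear measure sees no relationship at all.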

Correlation Coefficient and X-Y Diagram

The correlation coefficient indicates the direction and strength of the linear dependence between two variables but it does not cover all the existing relationship patterns. With the same correlation coefficient, two variables might have completely different dependence patterns. A scatter plot or X-Y diagram can help to discover and understand additional characteristics of the relationship between variables. The correlation coefficient is not a replacement for examining the scatter plot to study the variables’ relationship.
The correlation coefficient by itself does not tell us everything about the relationship between two variables. Two relationships could have the same correlation coefficient, but completely different patterns.

Statistical Significance of the Correlation Coefficient

The correlation coefficient could be high or low by chance (randomness). It may have been calculated based on two small samples that do not provide good inference on the correlation between two populations.
In order to test whether there is a statistically significant relationship between two variables, we need to run a hypothesis test to determine whether the correlation coefficient is statistically different from zero.
Hypothesis Test Statements

H₀: r = 0: Null Hypothesis: There is no correlation.
H₁: r ≠ 0: Alternate Hypothesis: There is a correlation.

Hypothesis tests will produce p-values as a result of the statistical significance test on r. When the p-value for a test is low (less than 0.05), we can reject the null hypothesis and conclude that r is significant; there is a correlation. When the p-value for a test is greater than 0.05, we fail to reject the null hypothesis; there is not enough evidence to conclude that a correlation exists.
We can also use the t statistic to draw the same conclusions regarding our test for significance of the correlation coefficient. To use the t-test to determine the statistical significance of the Pearson correlation, calculate the t statistic using the Pearson r value and the sample size, n.
Test Statistic

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

Critical Statistic
The critical value is the t-value in the t-table with (n – 2) degrees of freedom.
If the absolute value of the calculated t value is less than or equal to the critical t value, then we fail to reject the null and claim no statistically significant linear relationship between X and Y.

If |t| ≤ t_critical, we fail to reject the null. There is no statistically significant linear relationship between X and Y.
If |t| > t_critical, we reject the null. There is a statistically significant linear relationship between X and Y.
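
The t statistic for a correlation coefficient is r times the square root of (n – 2), divided by the square root of (1 – r²). A minimal sketch follows; the function name correlation_t is hypothetical, and the sample size n = 30 is an assumption made only for illustration. The critical value 2.048 is the two-sided 5% value from a t-table with 28 degrees of freedom.

```python
import math


def correlation_t(r, n):
    """t statistic for testing whether a Pearson r differs from zero (n - 2 d.f.)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = -0.832      # correlation used as an example value
n = 30          # hypothetical sample size, for illustration only
t_critical = 2.048  # t-table value for alpha = 0.05 (two-sided), 28 d.f.

t = correlation_t(r, n)
print(abs(t) > t_critical)  # True: reject the null hypothesis
```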

Using Software to Calculate the Correlation Coefficient

We are interested in understanding whether there is linear dependence between a car’s MPG and its weight and if so, how they are related. The MPG and weight data are stored in the “Correlation Coefficient” tab in “Sample Data.xlsx.” We will discuss three ways to get the results.

Use Excel to Calculate the Correlation Coefficient

The formula CORREL in Excel calculates the sample correlation coefficient of two data series. The correlation coefficient between the two data series is −0.83, which indicates a strong negative linear relationship between MPG and weight. In other words, as weight gets larger, gas mileage gets smaller.

Use Minitab to Calculate the Correlation Coefficient

Step 1: Stat → Basic Statistics → Correlation

Step 2: Select the two variables of interest in the pop-up window “Correlation” and click “OK.”

The correlation coefficient result (−0.832) appears in the session window. The p-value (0.000) is lower than the alpha level (0.05), indicating the linear correlation is statistically significant. We can claim that there is a linear relationship between mileage and weight.

Interpreting Results

How do we interpret results and make decisions based on Pearson’s correlation coefficient (r) and p-values?

Let us look at a few examples:

r = −0.832, p = 0.000 (previous example). The two variables are inversely related and the linear relationship is strong. Also, this conclusion is significant, as supported by the p-value of 0.00.
r = −0.832, p = 0.71. Based on r, you should conclude the linear relationship between the two variables is strong and inversely related. However, with a p-value of 0.71, you should then conclude that r is not significant and that your sample size may be too small to accurately characterize the relationship.
r = 0.5, p = 0.00. Moderately positive linear relationship, r is statistically significant.
r = 0.92, p = 0.61. Strong positive linear relationship but r is not statistically significant. Get more data.
r = 1.0, p = 0.00. The two variables have a perfect linear relationship and r is significant.

Correlation Coefficient Calculation

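Population Correlation Coefficient (ρ)

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$

Sample Correlation Coefficient (r)

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Model summary: The correlation coefficient is only defined when the standard deviations of both X and Y are non-zero and finite. When the covariance of X and Y is zero, the correlation coefficient is zero.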
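
The sample correlation coefficient can also be computed directly, without a spreadsheet or statistics package. The sketch below divides the sum of cross-deviations by the square root of the product of the two sums of squared deviations. The function name pearson_r is an illustrative choice and the weight and mileage numbers are made up; they are not the values in the sample data file.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # r is only defined when both variables have non-zero spread
        raise ValueError("r is undefined when either variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Made-up data: heavier cars with lower mileage (not the data in the sample file)
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [34, 30, 27, 22, 19]
print(round(pearson_r(weight, mpg), 3))
```

The guard clause reflects the condition that r is undefined when either standard deviation is zero.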

Two Sample Proportion Test with Minitab

Michael Parker — Tue, 05 Apr 2016 20:22:13 +0000

What is the Two Sample Proportion Test?

The two sample proportion test is a hypothesis test to compare the proportions of one certain event occurring in two populations following the binomial distribution.

Null Hypothesis (H₀): p₁ = p₂
Alternative Hypothesis (H_a): p₁ ≠ p₂

Two Sample Proportion Test Assumptions

The sample data drawn from the populations of interest are unbiased and representative.
There are only two possible outcomes in each trial for both populations: success/failure, yes/no, defective/non-defective, etc.
The underlying distributions of both populations are binomial.
When np ≥ 5 and np(1 – p) ≥ 5, the binomial distribution can be approximated by the normal distribution.

How the Two Sample Proportion Test Works

When np ≥ 5 and np(1 – p) ≥ 5, we use normal distribution to approximate the underlying binomial distributions of the populations.
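Test Statistic

$$Z_{calc} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_0 (1 - \hat{p}_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Where:

$$\hat{p}_1 = \frac{x_1}{n_1}, \quad \hat{p}_2 = \frac{x_2}{n_2}$$

and where:

$$\hat{p}_0 = \frac{x_1 + x_2}{n_1 + n_2}$$

p̂₁ and p̂₂ are the observed proportions of events in the two samples
n₁ and n₂ are the numbers of trials in the two samples respectively
x₁ and x₂ are the numbers of events in the two samples respectively

When |Z_calc| is smaller than Z_crit, we fail to reject the null hypothesis.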

Use Minitab to Run a Two Sample Proportion Test

Case study: We are interested in comparing the exam pass rates of a high school in March and April using a hypothesis test: the two sample proportion test.
Data File: “Two Sample Proportion” tab in “Sample Data.xlsx”

Null Hypothesis (H₀): p_March = p_April
Alternative Hypothesis (H_a): p_March ≠ p_April

Steps to run a two sample proportion test in Minitab:

Click Stat → Basic Statistics → 2 Proportions.
A new window named “Two-Sample Proportion” pops up.
Click the drop-down menu and choose “Summarized data.”
Enter “89” in the box intersecting “First” and “Events.”
Enter “112” in the box intersecting “First” and “Trials.”
Enter “102” in the box intersecting “Second” and “Events.”
Enter “130” in the box intersecting “Second” and “Trials.”
Click “OK.”
The two-sample proportion test results appear in the session window.

Model summary: The p-value of the two-sample proportion test is 0.849, greater than the alpha level (0.05), and we fail to reject the null hypothesis. We conclude that the exam pass rates of the high school in March and April are not statistically different.
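
As a cross-check, the test can be reproduced by hand with the normal approximation. This sketch assumes the pooled estimate of the proportion in the standard error and a two-sided test; the function name two_proportion_z_test is hypothetical. The counts are the ones entered in the Minitab dialog above.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test for equal proportions, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)  # pooled proportion under the null hypothesis
    se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(89, 112, 102, 130)
print(round(z, 2), round(p_value, 3))  # 0.19 0.849
```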

P Chart with Minitab

Michael Parker — Tue, 05 Apr 2016 15:32:24 +0000

What is a P Chart?

The P chart plots the percentage of defectives in one subgroup as a data point. It handles the situation in which the subgroup size of inspected units is not constant. The underlying distribution of the P chart is the binomial distribution.

P Chart Equations

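Data Point:

$$p_i = \frac{x_i}{n_i}$$

Center Line:

$$\bar{p} = \frac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{k} n_i}$$

Control Limits:

$$\bar{p} \pm 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n_i}}$$

Where:

n_i is the subgroup size for the i-th subgroup
k is the number of subgroups
x_i is the number of defectives in the i-th subgroup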
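
A short sketch shows why the control limits change from point to point: each subgroup's limits depend on its own size. It assumes the usual three-sigma limits around the overall proportion defective, clipped to the range 0 to 1 (the clipping is a common convention added here, since a proportion cannot fall outside that range). The function name p_chart and the counts are made up for illustration.

```python
import math


def p_chart(defectives, sizes):
    """Return the center line and a (p_i, LCL_i, UCL_i) tuple per subgroup."""
    p_bar = sum(defectives) / sum(sizes)  # center line
    points = []
    for x, n in zip(defectives, sizes):
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - 3 * sigma)  # assumption: clip at 0
        ucl = min(1.0, p_bar + 3 * sigma)  # assumption: clip at 1
        points.append((x / n, lcl, ucl))
    return p_bar, points


# Made-up inspection results: defectives found and units inspected per subgroup
defectives = [5, 8, 4, 6]
sizes = [100, 120, 90, 110]

p_bar, points = p_chart(defectives, sizes)
for p, lcl, ucl in points:
    print(round(p, 4), round(lcl, 4), round(ucl, 4), lcl <= p <= ucl)
```

Larger subgroups get narrower limits, which is why the limits on a P chart form a stepped line rather than a straight one.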

Use Minitab to Plot a P Chart

Data File: “P” tab in “Sample Data.xlsx”

Steps to plot a P chart in Minitab:

Click Stat → Control Charts → Attributes Charts → P.
A new window named “P Chart” appears.
Select “Fail” as the “Variables.”
Select “N” as the “Subgroup Sizes.”
Click the button “P Chart Options” to open a window named “P Chart Options”.
Click the tab “Tests.”
Select the item “Perform all tests for special causes” in the dropdown menu.
Click “OK” in the window “P Chart Options.”
Click “OK.”
The P chart appears in the newly-generated window.

P Chart Diagnosis

Model summary: Since the sample sizes are not constant over time, the control limits are adjusted to different values accordingly. All the data points fall within the control limits and spread randomly around the mean. We conclude that the process is in control.

Box Cox Transformation with Minitab

Michael Parker — Thu, 31 Mar 2016 17:17:50 +0000

What is a Box Cox Transformation?

Data transforms are usually applied so that the data more closely meet the assumptions of the statistical inference model to be applied, or to improve the interpretability or appearance of graphs.
Power transformation is a class of transformation functions that raise the response to some power. For example, a square root transformation converts X to X^(1/2).
Box Cox transformation is a popular power transformation method developed by George E. P. Box and David Cox.

Box Cox Transformation Formula

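The formula of the Box Cox transformation is:

$$y = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln x, & \lambda = 0 \end{cases}$$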

Where:

y is the transformation result
x is the variable under transformation
λ is the transformation parameter
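
A minimal sketch of the transformation for a single value, assuming the standard form y = (x^λ − 1)/λ for λ ≠ 0 and y = ln x for λ = 0. The function name box_cox is an illustrative choice, and some software applies a simpler variant that raises x to the power λ without the subtraction and division.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a single positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive data")
    if lam == 0:
        return math.log(x)  # limiting case as lam approaches 0
    return (x ** lam - 1) / lam


print(box_cox(4.0, 0.5))  # 2.0
print(box_cox(1.0, 0))    # 0.0
```

With λ = 1 the transform only shifts the data by one, so the shape of the distribution is unchanged.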

Use Minitab to Perform a Box-Cox Transformation

Minitab provides the best Box-Cox transformation with an optimal λ that minimizes the model SSE (sum of squared errors). Here is an example of how we transform a non-normally distributed response to normal data using the Box-Cox method.
Data File: “Box-Cox” tab in “Sample Data.xlsx”

Step 1: Test the normality of the original data set.

Click Stat → Basic Statistics → Normality Test.
A new window named “Normality Test” pops up.
Select “Y” as “Variable.”
Click “OK.”
The normality test results are shown automatically in the new window.

Normality Test:

H₀: The data are normally distributed.
H₁: The data are not normally distributed.

If p-value > alpha level (0.05), we fail to reject the null hypothesis. Otherwise, we reject the null. In this example, p-value = 0.029 < alpha level (0.05). The data are not normally distributed.

Step 2: Run the Box-Cox Transformation:

Click Stat → Control Charts → Box-Cox Transformation.
A new window named “Box-Cox Transformation” pops up.
Click into the blank list box below “All observations for a chart are in one column.”
Select “Y” as the variable.
Select “Run” into the box next to “Subgroup sizes (enter a number or ID column).”
Click “OK.”
The analysis results are shown automatically in the new window.

The software looks for the optimal value of lambda that minimizes the SSE (sum of squares of error). In this case, the optimal lambda is 0.12. The transformed Y can also be saved in another column.

Create a new column named “Y1” in the data table.
Click Stat → Control Charts → Box-Cox Transformation.
Again, a window named “Box-Cox Transformation” pops up.
As before, select “Y” as the variable.
Select “Run” into the box next to “Subgroup sizes (enter a number or ID column).”
Now, click on the “Options” button in the “Box-Cox Transformation” window.
A new window named “Box-Cox Transformation – Options” appears.
Click in the blank box under “Store transformed data in” and all the columns pop up in the list box on the left.
Select “Y1” in “Store transformed data in.”
Click “OK” in the window “Box-Cox Transformation – Options.”
Click “OK” in the window “Box-Cox Transformation.”
The transformed column is stored in the column “Y1.”

Run the normality test to check whether the transformed data are normally distributed.

Use the Anderson–Darling test to test the normality of the transformed data

H₀: The data are normally distributed.
H₁: The data are not normally distributed.

Model summary: If p-value > alpha level (0.05), we fail to reject the null. Otherwise, we reject the null. In this example, p-value = 0.327 > alpha level (0.05). The data are normally distributed.
