Developing Prediction Models for CMMI High Maturity

Statistical models based on multiple linear regression are straightforward to build and can serve as the prediction models required by CMMI High Maturity. In this case, the prediction model is simply a multiple linear regression equation.

Regression Equation

A regression equation takes the form Y = f(x1, x2, …, xn), where:
  • Y = dependent variable (representing the variable of interest that has to be predicted)
  • x1, x2, …, xn = set of independent variables (representing the variables whose value is known and is fed into the model to obtain the predicted value of Y)
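For the multiple linear case discussed here, f is an intercept plus a weighted sum of the independent variables. A minimal sketch of evaluating such an equation follows; the function name `predict_y` and the numeric values are hypothetical and chosen only for illustration.

```python
def predict_y(intercept, coefficients, xs):
    """Evaluate Y = a + b1*x1 + ... + bn*xn for a single observation."""
    if len(coefficients) != len(xs):
        raise ValueError("need exactly one coefficient per independent variable")
    return intercept + sum(b * x for b, x in zip(coefficients, xs))


# Hypothetical coefficients (a = 2.0, b1 = 0.5, b2 = -1.0) and inputs.
y_hat = predict_y(2.0, [0.5, -1.0], [10.0, 3.0])
```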

Steps to Build a Prediction Model Using Regression Method

The steps to build a prediction model are as follows. Details of the statistical methods used in each step are widely available and are not repeated here:
  • Check the data for basic sanity – erroneous data, missing data, etc.
  • Identify the outlier values in the data and treat them appropriately. Outliers are values that lie far from the rest of the data; whether to retain or remove them depends on whether they reflect genuine process behavior or a recording error
  • Test the variables Y, x1, x2, …, xn for Normality. Many commonly used statistical results assume that the underlying distribution is Normal. If a distribution is non-Normal, an appropriate transformation needs to be applied
  • Understand the statistical behavior of the variables. This can be done by computing measures of central tendency (typically the mean) and measures of dispersion (typically the standard deviation). A good starting point is to draw a histogram, which also gives insight into the shape of the distribution
  • Build the regression equation in the form Y = a + b1x1 + b2x2 + … + bnxn
  • Analyze the statistical results (p-values, etc.) of the regression for the significance of the overall regression fit (test for R-square) and the significance of each regression coefficient (tests for b1, b2, …, bn)
  • Validate the regression equation for its prediction power. The preferred method is to apply the regression equation to a data set different from the one used to build the model
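
The core of these steps (summarizing the data, fitting Y = a + b1x1 + … + bnxn, and validating on separate data) can be sketched in plain Python as below. This is an illustration under stated assumptions, not a complete procedure: the data are made up, the helper names (`solve_linear_system`, `fit_ols`, `predict_row`) are hypothetical, the fit uses the ordinary least squares normal equations, and outlier treatment and Normality testing are omitted.

```python
import statistics


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Return [a, b1, ..., bn] from the normal equations (Z'Z) beta = Z'y."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve_linear_system(ZtZ, Zty)


def predict_row(coef, row):
    """Apply Y = a + b1*x1 + ... + bn*xn to one row of independent variables."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))


# Made-up training data: roughly Y = 1 + 2*x1 + 3*x2 plus small deviations.
X_train = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8), (8, 7)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
y_train = [1 + 2 * x1 + 3 * x2 + e for (x1, x2), e in zip(X_train, noise)]

# Basic statistical behavior of the dependent variable.
y_mean = statistics.mean(y_train)
y_sd = statistics.stdev(y_train)

# Build the regression equation.
coef = fit_ols(X_train, y_train)

# Validate on made-up data that was not used to build the model.
X_test = [(2, 3), (5, 4), (9, 9)]
y_test = [14.1, 22.9, 46.0]
errors = [yt - predict_row(coef, row) for row, yt in zip(X_test, y_test)]
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
print("coefficients:", coef, "holdout RMSE:", rmse)
```

A small holdout error relative to the spread of Y (here `y_sd`) suggests the equation predicts well on unseen data. In practice, a statistics package would normally be used for the fit, since it also reports the p-values mentioned above.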

Model Adequacy Checking

If both the overall regression fit and the regression coefficients are significant, the regression equation can be used with confidence as a Prediction Model. If that is not the case, alternative approaches need to be considered and adopted to develop the Prediction Model.
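
As a rough illustration of the overall-fit check, the sketch below computes R-square and the F statistic from observed and fitted values. It assumes the fitted values come from an ordinary least squares fit with an intercept, evaluated on the data used to build the model; the function name `r_squared_and_f` and the numbers are hypothetical. Converting the F statistic into a p-value requires the F distribution (from a table or a statistics package), which is not shown here.

```python
def r_squared_and_f(y, y_hat, n_predictors):
    """Return (R-square, F statistic) for an OLS fit with an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    r2 = 1 - ss_resid / ss_total
    df_model = n_predictors
    df_resid = n - n_predictors - 1
    # For OLS with an intercept, the regression sum of squares
    # equals ss_total - ss_resid.
    f_stat = ((ss_total - ss_resid) / df_model) / (ss_resid / df_resid)
    return r2, f_stat


# Made-up example with one predictor x = 1, 2, 3, 4; the least squares
# line for these values is y = 1.9 * x, which gives the fitted values below.
y_obs = [2.0, 4.0, 5.0, 8.0]
y_fit = [1.9, 3.8, 5.7, 7.6]
r2, f_stat = r_squared_and_f(y_obs, y_fit, n_predictors=1)
```

A large F statistic relative to the critical value for (df_model, df_resid) degrees of freedom indicates a significant overall fit; the individual coefficients still need their own significance tests.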
