Regression Evaluation Metrics¶
Abstract¶
In a regression problem, the output is a continuous number that can take any value in a range, unlike a classification problem, where the output must belong to a fixed set of classes, either binary (0 or 1) or multiclass.
In this document, we highlight some of the most commonly used evaluation metrics, how to calculate them, and when to use them.
MSE - Mean Squared Error¶
This is one of the most common evaluation metrics for regression problems. It measures the average squared difference between the predicted and actual values. Because the errors are squared, MSE is sensitive to outliers, so it is not always the best metric to use if the data contains many outliers.
Formula:
MSE = Σ(y_i - y_hat_i)^2 / n
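As a minimal sketch of the formula above, MSE can be computed in plain Python without third-party libraries. The function name and the sample values are illustrative assumptions, not taken from a real dataset.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative values only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.875
```

The squared errors here are 0.25, 0, 2.25 and 1, so the largest single error contributes most of the total.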
RMSE - Root Mean Squared Error¶
This is the square root of MSE. It has the same units as the target variable, so it is easier to interpret than MSE. Because it is a monotonic transformation of MSE, RMSE is still dominated by large errors and remains sensitive to outliers.
Formula:
RMSE = √(MSE)
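A short sketch of RMSE in plain Python, assuming the same illustrative values as one might use for MSE. The function name is hypothetical.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error, in the units of the target."""
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Illustrative values only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

rmse = root_mean_squared_error(y_true, y_pred)
print(round(rmse, 4))
```

If the target were measured in, say, dollars, the MSE would be in squared dollars while the RMSE would be back in dollars.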
Mean absolute error (MAE)¶
This metric measures the average absolute difference between the predicted and actual values. MAE is less sensitive to outliers than MSE because the errors are not squared, and, like RMSE, it is expressed in the same units as the target variable.
Formula:
MAE = Σ|y_i - y_hat_i| / n
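The sketch below computes MAE in plain Python and then repeats the calculation with one prediction replaced by a large miss, to show how a single outlier moves the score. The function name and all values are illustrative assumptions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
y_pred_outlier = [2.5, 5.0, 4.0, 17.0]  # last prediction is off by 10

mae = mean_absolute_error(y_true, y_pred)
mae_outlier = mean_absolute_error(y_true, y_pred_outlier)
print(mae, mae_outlier)  # 0.75 3.0
```

With the outlier, MAE grows by a factor of 4, whereas the MSE for the same two sets of predictions grows from 0.875 to 25.625, roughly a factor of 29.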
R-squared (R^2)¶
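This metric measures the proportion of variance in the target variable that is explained by the model. A value of 1 means that the model perfectly explains all of the variance in the target variable, and 0 means that the model does no better than always predicting the mean. For a least-squares linear model with an intercept, evaluated on its training data, R^2 lies between 0 and 1; in other settings, such as on held-out data, it can be negative when the model is worse than predicting the mean.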
Formula:
R^2 = 1 - Σ(y_i - y_hat_i)^2 / Σ(y_i - y_bar)^2
= 1 - RSS/TSS
Where:
RSS: Sum of squares of residuals.
TSS: Total sum of squares.
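The definition R^2 = 1 - RSS/TSS can be sketched directly in plain Python. The function name and the three sets of predictions are illustrative assumptions, chosen to show a reasonable model, a mean-only baseline, and a poor model.

```python
def r_squared(y_true, y_pred):
    """1 - RSS/TSS, where TSS is measured around the mean of the actual values."""
    y_bar = sum(y_true) / len(y_true)
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    tss = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - rss / tss


# Illustrative values only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

y_bar = sum(y_true) / len(y_true)
mean_only = [y_bar] * len(y_true)   # always predict the mean
poor_pred = [7.0, 2.5, 5.0, 3.0]    # worse than predicting the mean

r2 = r_squared(y_true, y_pred)
r2_mean_only = r_squared(y_true, mean_only)
r2_poor = r_squared(y_true, poor_pred)
print(round(r2, 4), r2_mean_only, round(r2_poor, 4))
```

The mean-only baseline scores 0, and the poor predictions score below 0 because their RSS exceeds the TSS.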
Adjusted R^2¶
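This metric is similar to R^2, but it adjusts for the number of features in the model. Adjusted R^2 is never higher than R^2, and it decreases when an added feature does not improve the fit enough to justify the extra complexity.
Formula:
Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
Where:
n: Number of samples.
p: Number of features (predictors).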
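The sketch below uses the standard definition Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), where n is the number of samples and p the number of features. The function name and the input values (R^2 = 0.90, n = 50) are hypothetical, picked only to show how the penalty grows with p.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Standard adjusted R^2; requires more samples than features plus one."""
    if n_samples <= n_features + 1:
        raise ValueError("n_samples must be greater than n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical inputs: the same R^2 reported by a small and a large model.
adj_small = adjusted_r_squared(r2=0.90, n_samples=50, n_features=5)
adj_large = adjusted_r_squared(r2=0.90, n_samples=50, n_features=20)
print(round(adj_small, 4), round(adj_large, 4))
```

Both results are below 0.90, and the model with 20 features is penalized more than the model with 5.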
Mean absolute percentage error (MAPE)¶
This metric measures the average absolute percentage error between the predicted and actual values. MAPE is expressed as a percentage, so it is easy to understand and interpret. However, it is undefined when an actual value is zero and becomes very large when actual values are close to zero, so it is not always the best metric to use.
Formula:
MAPE = Σ|(y_i - y_hat_i) / y_i| * 100 / n
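A minimal plain-Python sketch of the MAPE formula follows. The function name and values are illustrative assumptions; raising an error on zero actual values is one possible design choice, not the only one.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """Average of |error / actual|, expressed as a percentage."""
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((y - y_hat) / y) for y, y_hat in zip(y_true, y_pred)]
    return 100 * sum(ratios) / len(ratios)


# Illustrative values: three predictions are off by 10%, one is exact.
y_true = [100.0, 200.0, 50.0, 400.0]
y_pred = [110.0, 180.0, 55.0, 400.0]

mape = mean_absolute_percentage_error(y_true, y_pred)
print(round(mape, 2))  # 7.5
```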
Root mean squared logarithmic error (RMSLE)¶
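This metric is similar to RMSE, but it is computed on the logarithms of the predicted and actual values, so it measures relative rather than absolute error. This makes RMSLE less sensitive to large errors on large values than RMSE and more appropriate for skewed, non-negative target variables. It also penalizes underestimates more than overestimates of the same size.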
Formula:
RMSLE = √(Σ(log(y_i + 1) - log(y_hat_i + 1))^2 / n)
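The sketch below implements the RMSLE formula with `math.log1p`, which computes log(1 + x). The function name and values are illustrative assumptions. The two single-point calls compare an underestimate and an overestimate of the same absolute size.

```python
import math


def root_mean_squared_log_error(y_true, y_pred):
    """RMSE computed on log(1 + value); values must be greater than -1."""
    if min(min(y_true), min(y_pred)) <= -1:
        raise ValueError("RMSLE requires all values to be greater than -1")
    squared_log_errors = [
        (math.log1p(y) - math.log1p(y_hat)) ** 2
        for y, y_hat in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Illustrative values: both predictions miss the actual value 100 by 50.
under = root_mean_squared_log_error([100.0], [50.0])
over = root_mean_squared_log_error([100.0], [150.0])
print(round(under, 4), round(over, 4))
```

The underestimate produces the larger score, which is why RMSLE is sometimes chosen when predicting too low is more costly than predicting too high.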
Median absolute error (MedAE)¶
This metric is similar to MAE, but it uses the median instead of the mean. MedAE is less sensitive to outliers than MAE, because a small number of very large errors barely affects the median.
Formula:
MedAE = median(|y_i - y_hat_i|)
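A short sketch of MedAE using `statistics.median` from the Python standard library. The function name and values are illustrative assumptions; the second set of predictions contains one large miss.

```python
from statistics import median


def median_absolute_error(y_true, y_pred):
    """Median of the absolute differences between actual and predicted values."""
    return median(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred))


# Illustrative values only.
y_true = [3.0, 5.0, 2.5, 7.0, 4.0]
y_pred = [2.5, 5.0, 4.0, 8.0, 4.5]
y_pred_outlier = [2.5, 5.0, 4.0, 17.0, 4.5]  # fourth prediction is off by 10

medae = median_absolute_error(y_true, y_pred)
medae_outlier = median_absolute_error(y_true, y_pred_outlier)
print(medae, medae_outlier)  # 0.5 0.5
```

In this example the single large error leaves the median unchanged, although it would raise the mean absolute error considerably.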
Explained variance score (EVS)¶
This metric measures the proportion of variance in the target variable that is explained by the model. EVS is similar to R^2, but it ignores any systematic offset (bias) in the predictions: the two are equal when the mean of the residuals is zero.
Formula:
EVS = 1 - Var(y - y_hat) / Var(y)
Note: It can be misleading if the predictions are biased, because a constant offset in the predictions does not lower the score. Additionally, EVS can be sensitive to outliers.
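The sketch below assumes the common definition EVS = 1 - Var(y - y_hat) / Var(y), which is also the one used by scikit-learn's `explained_variance_score`. It compares EVS with R^2 on predictions that are all too high by the same amount. Function names and values are illustrative assumptions.

```python
from statistics import pvariance


def explained_variance_score(y_true, y_pred):
    """1 - Var(residuals) / Var(actual values), using population variance."""
    residuals = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    return 1 - pvariance(residuals) / pvariance(y_true)


def r_squared(y_true, y_pred):
    """1 - RSS/TSS, shown here only for comparison."""
    y_bar = sum(y_true) / len(y_true)
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    tss = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - rss / tss


# Illustrative values: every prediction is exactly 2 units too high.
y_true = [3.0, 5.0, 2.5, 7.0]
y_biased = [5.0, 7.0, 4.5, 9.0]

evs = explained_variance_score(y_true, y_biased)
r2 = r_squared(y_true, y_biased)
print(evs, round(r2, 4))
```

The residuals have zero variance, so EVS reports a perfect score of 1.0, while R^2 is negative because the constant offset makes every prediction wrong.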
AIC (Akaike information criterion)¶
This metric estimates the relative information loss associated with a model. AIC is a penalized likelihood metric, which means that it penalizes models with more parameters. AIC can be used to compare different models and select the model with the best trade-off between goodness of fit and complexity.
Formula:
AIC = 2 * K - 2 * log(L)
where:
K is the number of parameters in the model
L is the maximized likelihood of the model, so log(L) is its log-likelihood
The log-likelihood of a model is a measure of how well the model fits the data. It is calculated by taking the natural logarithm of the likelihood of the model. The likelihood is the probability (or probability density) of the observed data under the assumption that the model is correct, viewed as a function of the model parameters.
The formula for the log-likelihood function, for independent data points, is:
log(L) = Σ(ln(p(x_i | θ)))
where:
log(L) is the log-likelihood function
p(x_i | θ) is the probability of observing the data point x_i under the assumption that the model parameters are θ
Σ is the sum over all data points
The log-likelihood function is a logarithmic transformation of the likelihood function, which makes it easier to work with mathematically: the product of many probabilities becomes a sum. It is also more stable numerically, because multiplying many small probabilities can underflow to zero, whereas summing their logarithms does not.
A higher log-likelihood value indicates a better-fitting model. This is because a higher log-likelihood value means that the model is more likely to have generated the observed data.
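As a sketch of the sum log(L) = Σ ln p(x_i | θ), the example below assumes a normal (Gaussian) model with parameters θ = (mu, sigma), using `statistics.NormalDist` from the standard library. The choice of a Gaussian, the function name, and the data are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def gaussian_log_likelihood(data, mu, sigma):
    """Sum of the log densities of each point under Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return sum(math.log(dist.pdf(x)) for x in data)


# Illustrative data clustered around 5.
data = [4.8, 5.1, 5.0, 4.9, 5.2]

ll_good = gaussian_log_likelihood(data, mu=5.0, sigma=0.2)
ll_poor = gaussian_log_likelihood(data, mu=6.0, sigma=0.2)
print(round(ll_good, 4), round(ll_poor, 4))
```

The parameters centered on the data give the higher log-likelihood. Because densities can exceed 1, a log-likelihood can be positive as well as negative.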
The penalty term in AIC is 2 * K. This penalty term penalizes models with more parameters. The reason for this is that models with more parameters are more likely to overfit the data. Overfitting occurs when a model learns the noise in the data instead of the underlying relationships between the variables.
A lower AIC value indicates a better trade-off between fit and complexity. A model lowers its AIC by having a higher log-likelihood, a smaller penalty term, or both. AIC values are only meaningful relative to other models fitted to the same data.
For example, let’s say that we have two models that we are trying to compare. The first model has 2 parameters, and the second model has 3 parameters. The log-likelihood of the first model is 100, and the log-likelihood of the second model is 90. The AIC of the first model is 2 * 2 - 2 * 100 = -196, and the AIC of the second model is 2 * 3 - 2 * 90 = -174.
In this case, the first model would be preferred because it has a lower AIC value. This is because the first model has a higher log-likelihood and a smaller penalty term.
AIC is a useful metric for evaluating the fit of a statistical model to a set of data. It is a simple metric to understand and interpret, and it can be used to compare models with different numbers of parameters.
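The sketch below evaluates AIC = 2 * K - 2 * log(L) directly for the two models in the example (K = 2 with log-likelihood 100, and K = 3 with log-likelihood 90). The function name is a hypothetical helper.

```python
def aic(n_params, log_likelihood):
    """Akaike information criterion: 2 * K - 2 * log(L)."""
    return 2 * n_params - 2 * log_likelihood


aic_first = aic(n_params=2, log_likelihood=100)   # 2*2 - 2*100
aic_second = aic(n_params=3, log_likelihood=90)   # 2*3 - 2*90
print(aic_first, aic_second)  # -196 -174

# The model with the lower AIC is preferred.
preferred = "first" if aic_first < aic_second else "second"
print(preferred)  # first
```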
BIC (Bayesian information criterion)¶
This metric is similar to AIC, but it uses a different penalty term. BIC is also a penalized likelihood metric, but it penalizes models with more parameters more heavily than AIC whenever log(n) > 2, that is, for 8 or more observations. BIC can be used to compare different models and select the model with the best trade-off between goodness of fit and complexity.
Formula:
BIC = K * log(n) - 2 * log(L)
where:
n is the number of observations
K and L are defined as for AIC
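A minimal sketch of the BIC formula, where n is the number of observations. The function name is hypothetical, and the sample size of 100 is an assumed value chosen only to make the penalty concrete.

```python
import math


def bic(n_params, n_obs, log_likelihood):
    """Bayesian information criterion: K * log(n) - 2 * log(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical models fitted to the same 100 observations.
bic_first = bic(n_params=2, n_obs=100, log_likelihood=100)
bic_second = bic(n_params=3, n_obs=100, log_likelihood=90)
print(round(bic_first, 2), round(bic_second, 2))

# Penalty per parameter: log(n) for BIC versus a constant 2 for AIC.
print(round(math.log(100), 2))
```

With 100 observations each parameter costs about 4.61 under BIC, compared with 2 under AIC, so BIC leans further toward the smaller model.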
Comparison¶
| Evaluation Function | Description | Advantages | Disadvantages | Best suited for | Date of creation |
|---|---|---|---|---|---|
| Mean Squared Error (MSE) | Measures the average squared difference between the predicted and actual values | Easy to understand and interpret | Sensitive to outliers | Linear regression models | 1823 |
| Root Mean Squared Error (RMSE) | The square root of the MSE | More interpretable than MSE because it is in the same units as the target variable | Sensitive to outliers | Linear regression models | 1823 |
| Mean Absolute Error (MAE) | Measures the average absolute difference between the predicted and actual values | Robust to outliers | Not as informative as R-squared for non-linear regression models | Non-linear regression models | 1823 |
| R-squared | Measures the proportion of the variance in the actual values that is explained by the predicted values | Can be used for both linear and non-linear regression models | Can be negative for poor models | Linear and non-linear regression models | 1888 |
| Adjusted R-squared | A penalized version of R-squared that takes into account the number of predictors in the model | More accurate than R-squared for selecting the best model in a model selection procedure | | Linear and non-linear regression models | 1908 |
| Mean Absolute Percentage Error (MAPE) | Measures the average absolute percentage difference between the predicted and actual values | Easy to interpret because it is expressed as a percentage | Can be misleading if the actual values are close to zero | Linear and non-linear regression models | 1960 |
| Root Mean Squared Log Error (RMSLE) | The square root of the mean squared log error | More informative than MAPE for forecasting models | Undefined when a target or predicted value is -1 or lower | Linear and non-linear regression models | 1972 |
| Median Absolute Error | Measures the median absolute difference between the predicted and actual values | Robust to outliers | Not as interpretable as MSE or RMSE | Linear and non-linear regression models | 1975 |
| Explained Variance Score (EVS) | Measures the proportion of the variance in the actual values that is explained by the predicted values | Easy to understand and interpret | Can be negative for poor models | Linear and non-linear regression models | 1981 |
| Akaike information criterion (AIC) | An estimate of relative model quality that balances goodness of fit against the number of parameters | Can be used to compare different models and select the best model | Sensitive to the scale of the data | Linear and non-linear regression models | 1973 |
| Bayesian information criterion (BIC) | Similar to AIC, with a penalty on the number of parameters that grows with the sample size | Penalizes extra parameters more heavily than AIC | Sensitive to the scale of the data | Linear and non-linear regression models | 1978 |
A Good Tip¶
The best evaluation metric to use for a regression problem depends on the specific problem and the desired outcome. For example, if large errors are especially costly, then MSE or RMSE might be the best metrics to use. If the goal is to understand how much of the variation in the target variable the features explain, then R^2 or adjusted R^2 might be the best metrics to use.
Accuracy Metric in Regression Problem¶
There is no accuracy metric for a regression problem, although the business often asks for an accuracy metric to evaluate the model that a data scientist builds.
One of the best approaches is to define a threshold: if the predicted output lies within a margin of +- threshold around the actual value, the prediction is classified as 1; otherwise, it is classified as 0. The accuracy is then the share of predictions classified as 1.
A more advanced approach is to let the acceptable margin depend on other features, according to the use case.
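The threshold idea can be sketched as follows. The function name, the sample values, the fixed margin of 0.5, and the per-sample margin of 10% of the actual value are all hypothetical choices for illustration; a real margin would come from the use case.

```python
def tolerance_accuracy(y_true, y_pred, thresholds):
    """Share of predictions within the margin of the actual value.

    thresholds is either a single number or one margin per sample.
    """
    if isinstance(thresholds, (int, float)):
        thresholds = [thresholds] * len(y_true)
    hits = [
        abs(y - y_hat) <= t
        for y, y_hat, t in zip(y_true, y_pred, thresholds)
    ]
    return sum(hits) / len(hits)


# Illustrative values only.
y_true = [3.0, 5.0, 2.5, 7.0, 4.0]
y_pred = [2.5, 5.0, 4.0, 8.0, 4.5]

# Fixed margin: a prediction counts as correct if it is within +-0.5.
fixed_acc = tolerance_accuracy(y_true, y_pred, 0.5)

# Per-sample margin: here 10% of each actual value.
margins = [0.1 * y for y in y_true]
relative_acc = tolerance_accuracy(y_true, y_pred, margins)

print(fixed_acc, relative_acc)  # 0.6 0.2
```

The same predictions score 60% or 20% depending on the margin, so the threshold should be agreed with the business before the figure is reported.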