Model Selection with AIC & BIC
AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) sound like twin brothers. What are the differences between them, and how can we apply them?
Q: Which model is the best?
Is it the most complex model, the one that fits the data perfectly and captures every single detail? Or is it a simpler model that captures the big picture and omits the unnecessary details?
The best model should be complex enough to capture the key patterns in the data, yet not so complex that it overfits.
A good model does not overfit, yet is flexible enough to generalize.
A model that strikes this balance will be predictive. But how can we evaluate a model's complexity and its goodness of fit numerically? AIC and BIC are the tools we can use for this.
AIC = 2k − 2 ln(L)
BIC = k ln(n) − 2 ln(L)
Here k, the number of parameters, captures the complexity of the model; ln(L), the maximized log-likelihood of the model on the data, captures the goodness of fit; and n is the number of data points.
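To make the formulas concrete, here is a minimal Python sketch that translates the two definitions directly into code. The helper names aic and bic are just illustrative, not part of any library:

```python
import numpy as np

def aic(log_likelihood: float, k: int) -> float:
    # AIC = 2k - 2*ln(L): every extra parameter costs a flat penalty of 2
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood: float, k: int, n: int) -> float:
    # BIC = k*ln(n) - 2*ln(L): the per-parameter penalty grows with the sample size n
    return k * np.log(n) - 2 * log_likelihood
```

Note the only structural difference: AIC charges a fixed 2 per parameter, while BIC charges ln(n), so BIC punishes complexity more harshly as the dataset grows.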
A model with a lower AIC or BIC achieves a better trade-off between goodness of fit and complexity, so among candidate models we prefer the one with the lowest score.
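As a quick illustration of that rule, the sketch below fits polynomials of increasing degree to synthetic data (the data, the degrees, and the Gaussian-likelihood assumption are mine, not from the text above) and reports both criteria; the degree with the lowest scores is the one we would keep:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.linspace(0, 10, n)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=n)  # data generated by a simple linear rule plus noise

def gaussian_log_likelihood(y, y_hat):
    # Maximized Gaussian log-likelihood of a least-squares fit (sigma^2 estimated as RSS/n)
    n_pts = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return -0.5 * n_pts * (np.log(2 * np.pi) + np.log(rss / n_pts) + 1)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    k = degree + 2  # polynomial coefficients plus the estimated noise variance
    ll = gaussian_log_likelihood(y, y_hat)
    print(f"degree={degree}: AIC={2 * k - 2 * ll:.1f}, BIC={k * np.log(n) - 2 * ll:.1f}")
```

Higher-degree fits shave a little off the residual error, but the penalty terms grow faster, so the lowest AIC and BIC typically land on the simple model that actually generated the data.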