Validation Error less than training error?
I found two questions here and here about this issue, but there is no obvious answer or explanation yet. I face the same problem in my convolutional neural network: the validation error is less than the training error. What does that mean?
I don't think this question can be answered without knowing the absolute number of training (CV) and test cases, as well as the variance observed in the MSE for both cross validation and test.
It is difficult to be certain without knowing your actual methodology (e.g. cross-validation method, performance metric, data splitting method, etc.).
Generally speaking though, training error will almost always underestimate your validation error. However, it is possible for the validation error to be less than the training error. You can think of it in two ways:
- Your training set had many 'hard' cases to learn
- Your validation set had mostly 'easy' cases to predict
That is why it is important that you really evaluate your model training methodology. If you don't split your data for training properly, your results will lead to confusing, if not simply incorrect, conclusions.
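As a minimal sketch of one such splitting method (assuming scikit-learn is available; the 80/20 ratio and toy data are illustrative, not from the original post), shuffling and stratifying before splitting helps keep 'easy' and 'hard' cases from clustering in one subset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(100, 5)           # toy features
y = rng.randint(0, 2, 100)     # toy binary labels

# shuffle=True randomizes sample order; stratify=y keeps the class
# balance the same in both subsets, so neither split is systematically
# 'easier' than the other.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=0)

print(X_train.shape, X_val.shape)  # (80, 5) (20, 5)
```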
I think of model evaluation in four different categories:
- Underfitting – validation and training error high
- Overfitting – validation error high, training error low
- Good fit – validation error low, slightly higher than the training error
- Unknown fit – validation error low, training error 'high'
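The four categories above can be sketched as a toy rule of thumb (the thresholds here are illustrative assumptions, not universal constants):

```python
def diagnose_fit(train_err, val_err, high=0.5, margin=0.1):
    """Classify a (train error, validation error) pair into one of the
    four categories. 'high' and 'margin' are arbitrary demo thresholds."""
    if train_err > high and val_err > high:
        return "underfitting"              # both errors high
    if val_err - train_err > margin:
        return "overfitting"               # validation much worse
    if train_err - val_err > margin:
        return "unknown fit"               # validation much *better*
    return "good fit"                      # low, and close together

print(diagnose_fit(0.6, 0.65))  # underfitting
print(diagnose_fit(0.1, 0.4))   # overfitting
print(diagnose_fit(0.1, 0.15))  # good fit
print(diagnose_fit(0.4, 0.1))   # unknown fit
```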
I say 'unknown' fit because the result is counterintuitive to how machine learning works. The essence of ML is to predict the unknown. If you are better at predicting the unknown than what you have 'learned', AFAIK the data must differ between training and validation in some way. This could mean you need to reevaluate your data splitting method, add more data, or possibly change your performance metric (are you actually measuring the performance you want?).
To address the OP's reference to a previous python lasagne question.
This suggests that you have sufficient data to not require cross-validation and simply have your training, validation, and testing data subsets. Now, if you look at the lasagne tutorial you can see that the same behavior is seen at the top of the page. I would find it hard to believe the authors would post such results if they were strange, but instead of just assuming they are correct, let's look further. The section of most interest to us here is the training loop section; just above the bottom you will see how the loss parameters are calculated.
The training loss is calculated over the entire training dataset. Likewise, the validation loss is calculated over the entire validation dataset. The training set is typically at least 4 times as large as the validation set (80/20). Given that the error is accumulated over all samples, you could expect up to approximately 4x the loss measure of the validation set. You will notice, however, that the training loss and validation loss approach one another as training continues. This is intentional: if your training error began to drop below your validation error, you would be beginning to overfit your model!
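A minimal sketch of this accumulation pattern, assuming a stand-in MSE loss and a trivial 'model' rather than the tutorial's actual network: the raw summed loss scales with the number of batches, while the per-batch mean does not.

```python
import numpy as np

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def epoch_loss(batches, predict):
    """Accumulate the loss over ALL batches of a dataset; return the
    raw sum and the per-batch mean (as the lasagne tutorial reports)."""
    total = 0.0
    for X, y in batches:
        total += mse(predict(X), y)
    return total, total / len(batches)

# Toy setup: training set has 4x as many batches as validation (80/20).
X, y = np.ones(10), np.zeros(10)
train_batches = [(X, y)] * 8
val_batches = [(X, y)] * 2
model = lambda X: 0.5 * X  # stand-in 'model': every batch loss is 0.25

train_sum, train_mean = epoch_loss(train_batches, model)
val_sum, val_mean = epoch_loss(val_batches, model)
print(train_sum, val_sum)    # 2.0 0.5  -> raw sum is ~4x larger
print(train_mean, val_mean)  # 0.25 0.25 -> per-batch means are equal
```

This is why it matters whether a reported 'loss' is a sum or a mean: at identical per-batch error, the summed training loss looks ~4x worse purely because the training set is larger.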
I hope this clarifies these errors.
Nice answer. There is also the possibility that there is a bug in the code, which makes it possible that training has not converged to the optimal solution on the training set. Or, if the training objective is non-convex, the training algorithm may converge to a local minimum that happens to be good for the validation set.
@cdeterman thanks. I use RMSE as a performance metric. I've divided my data into 20% for test and 80% for training and validation (20% of the training data is cross-validated to compute the validation error). Actually, the validation error is low, slightly lower than the training error, and the test error is higher than both. We can find a similar case for the MNIST dataset for handwriting recognition: http://stats.stackexchange.com/questions/178371/python-lasagne-tutorial-validation-error-lower-than-training-error?lq=1
@cdeterman Thanks. I've just noticed that you've edited your answer. It is clear and helpful.
Great explanation! If you could add a few graphs, it would be the best one possible.