In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data.
In the following, we start a Python interpreter from our shell and then load a dataset. In the case of the digits dataset, the task is to predict, given an image, which digit it represents.
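As a minimal sketch of what loading the digits dataset can look like, the snippet below assumes scikit-learn is installed and uses its bundled `load_digits` loader. The variable name `digits` is only an illustrative choice.

```python
from sklearn.datasets import load_digits

# Load the digits dataset bundled with scikit-learn (assumed to be installed).
digits = load_digits()

# Each sample is an 8x8 grayscale image flattened into 64 features.
print(digits.data.shape)       # (1797, 64)
print(digits.images[0].shape)  # (8, 8)

# The target holds the digit (0 to 9) that each image represents.
print(digits.target[:10])
```

Here `digits.data` is what an estimator would learn from, and `digits.target` is what it would try to predict.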
For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models) assume that all features are centered around zero and have variances of the same order of magnitude.
If the variance of one feature is orders of magnitude larger than that of the others, it might dominate the objective function and prevent the estimator from learning from the other features as expected.
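The effect of mismatched variances, and one common way to remove it, can be sketched as follows. The synthetic two-feature array is an assumption made purely for illustration, and `StandardScaler` is only one possible choice of scaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative assumption): the second feature has a
# variance several orders of magnitude larger than the first.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(loc=0.0, scale=1.0, size=200),
    rng.normal(loc=50.0, scale=1000.0, size=200),
])
print(X.var(axis=0))  # very different variances per feature

# Center each feature around zero and scale it to unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for both features
print(X_scaled.var(axis=0))   # approximately 1 for both features
```

After scaling, both features contribute on a comparable scale to distance-based or regularized objectives.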
We can separate learning problems into a few large categories.

Training set and testing set

Machine learning is about learning some properties of a data set and applying them to new data.
This is why a common way to evaluate an algorithm in machine learning is to split the data at hand into two sets: the training set, on which we learn data properties, and the testing set, on which we test these properties.
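The split described above might look like the following sketch. It assumes scikit-learn is installed. The 25% test fraction, the fixed `random_state`, and the choice of `SVC` with `gamma=0.001` are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()

# Hold out 25% of the samples as the testing set (illustrative fraction).
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Learn data properties on the training set only.
clf = SVC(gamma=0.001)
clf.fit(X_train, y_train)

# Test those properties on data the estimator has never seen.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Scoring on the held-out samples gives an estimate of how the estimator behaves on new data, which scoring on the training set cannot provide.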