Learning ML.NET (2) – Evaluating and Improving the Classifier

In the previous post (Learning ML.NET – Using ML to Identify Lego Colors), we wrote a program that trains an ML model to identify the color of Lego pieces using C# and ML.NET. In this post, we’re going to see how we can evaluate the accuracy of the classifier, and tinker with the available parameters of the classifier to see if/how it can be improved. Disclaimer: I’m not an AI/ML expert (yet?). I’m doing this to learn and internalize my learning.

Machine learning is great because by using only examples, a computer can learn to do incredible stuff. At the same time, it has one problem (IMHO) – we don’t know how the algorithm works, and most importantly when it will work. Because of this, when an ML model is created we need to measure how good it is and only use it if we think it is good enough (which is completely dependent on the application). There are many metrics used to evaluate ML models. Here are some basic ones:

Accuracy: measures how often the classifier makes a correct prediction. In a multi-class problem, it is called micro-accuracy and counts the correct predictions over all classes.

Accuracy = \frac{Correct\ Predictions}{All\ Examples}

Per-Class Accuracy: in a multi-class scenario, the accuracy for each class can be different, so we can calculate the accuracy per class. The calculation is the same as for Accuracy but using only the instances of a single class. Averaging the per-class accuracies over all classes gives the macro-accuracy, which weighs every class equally no matter how many examples it has.

Macro\ Accuracy = \frac{1}{Number\ of\ Classes}\sum\limits_{All\ Classes}\frac{Correct\ Class\ Predictions}{All\ Class\ Examples}
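
To make the difference between the two accuracies concrete, here is a small sketch that computes both from plain lists of labels. It does not use ML.NET at all; the class name `AccuracyMetrics` and its method names are made up for this illustration, and ML.NET computes these values for you during evaluation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class AccuracyMetrics
{
    // Micro-accuracy: correct predictions divided by all examples.
    public static double MicroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
    {
        Validate(actual, predicted);
        int correct = actual.Zip(predicted, (a, p) => a == p).Count(isCorrect => isCorrect);
        return (double)correct / actual.Count;
    }

    // Macro-accuracy: the accuracy of each class on its own, averaged over the classes.
    public static double MacroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
    {
        Validate(actual, predicted);
        return actual
            .Zip(predicted, (a, p) => (Actual: a, Correct: a == p))
            .GroupBy(x => x.Actual)
            .Select(g => g.Count(x => x.Correct) / (double)g.Count())
            .Average();
    }

    private static void Validate(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
    {
        if (actual.Count == 0 || actual.Count != predicted.Count)
            throw new ArgumentException("Label lists must be non-empty and of equal length.");
    }
}
```

With four blue pieces that are all classified correctly and two green pieces of which one is misclassified, the micro-accuracy is 5/6 while the macro-accuracy is (1.0 + 0.5) / 2 = 0.75. The small class pulls the macro number down more than the micro one.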

Log-Loss: measures the confidence of the classifier using a complex information-theory equation that I’m not going to write here :-). The important thing to know is that the lower the log-loss the better, with a minimum value of 0 for a classifier that is fully confident in every correct answer.

Log-Loss Reduction: a measurement that can be interpreted as how much better the classifier is than a random prediction. It ranges from -infinity to 1, where 1 is a perfect prediction and 0 is a mean prediction. I have no idea what negative values (or “mean prediction”) mean, but in general the closer this is to one, the better.
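
For the curious, here are the two log-loss equations as I understand them from the ML.NET documentation. The symbols are my own shorthand: m is the number of test examples, p_i is the probability the classifier gave to the true class of example i, and the "prior" log-loss is the log-loss of a baseline that only knows how common each class is.

```latex
Log\ Loss = -\frac{1}{m}\sum\limits_{i=1}^{m}\ln(p_i)

Log\ Loss\ Reduction = \frac{Log\ Loss_{prior} - Log\ Loss_{classifier}}{Log\ Loss_{prior}}
```

Read this way, a reduction of 0 would mean the classifier is no better than that baseline, and a negative value would mean it is worse. Treat this as one possible reading rather than a definitive one.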

So how do we actually do this? ML.NET comes with a built-in function to do model evaluation using Cross-Validation. In Cross-Validation, the input training set is divided into X sets (folds). The network is then trained with the examples in X-1 folds and evaluated with the examples in the remaining fold. This is repeated X times, so that each fold is used exactly once for evaluation. Doing this we are evaluating not a specific model but the architecture of the neural network and the hyperparameters used to create the model. And what are hyperparameters? Glad you asked :-). A hyperparameter is a parameter that controls the learning process of the neural network. An example of a hyperparameter is the number of epochs (iterations) done to train the model. Each machine learning algorithm/framework has different hyperparameters.
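
To show what "dividing into folds" means, here is a minimal sketch that splits example indices into train/test sets for each fold. The `KFold` class is hypothetical and only illustrates the idea; ML.NET does its own splitting internally and its assignment of examples to folds will differ from this simple round-robin scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not how ML.NET splits the data.
public static class KFold
{
    // For each fold, returns the indices used for training and the indices held out for testing.
    // Example i is assigned to fold (i % folds), so every example is tested exactly once.
    public static IEnumerable<(int[] Train, int[] Test)> Split(int exampleCount, int folds)
    {
        if (folds < 2 || folds > exampleCount)
            throw new ArgumentOutOfRangeException(nameof(folds));

        var indices = Enumerable.Range(0, exampleCount).ToArray();
        for (int fold = 0; fold < folds; fold++)
        {
            yield return (
                indices.Where(i => i % folds != fold).ToArray(),
                indices.Where(i => i % folds == fold).ToArray());
        }
    }
}
```

Calling `KFold.Split(10, 5)` yields five pairs, each with 8 training indices and 2 test indices.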

Let’s see how this looks in our example:

static void Evaluate(MLContext mlContext, IDataView trainingDataView, IEstimator<ITransformer> trainingPipeline)
{
    Console.WriteLine("=============== Cross-validating to get model's accuracy metrics ===============");
    // Trains and evaluates the pipeline once per fold; returns one result (with its metrics) per fold.
    var crossValidationResults = mlContext.MulticlassClassification.CrossValidate(trainingDataView, trainingPipeline, numberOfFolds: 5, labelColumnName: "Label");
}

The result of this cross-validation is a list of cross-validation results (one per fold) where each result contains the metrics evaluated for that fold. By averaging the metrics on all folds we get a good idea of how our ML trainer and model will do “in the wild”.
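
The averaging step can be sketched as below. To keep the snippet independent of ML.NET, it works on a plain sequence of numbers; the `FoldSummary` class is hypothetical. In the Evaluate method above, I would expect the input to come from something like `crossValidationResults.Select(r => r.Metrics.MicroAccuracy)`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class FoldSummary
{
    // Returns the mean and the population standard deviation of one metric across the folds.
    public static (double Mean, double StdDev) Summarize(IEnumerable<double> perFoldValues)
    {
        var values = perFoldValues.ToArray();
        if (values.Length == 0)
            throw new ArgumentException("At least one fold is required.");

        double mean = values.Average();
        double variance = values.Select(v => (v - mean) * (v - mean)).Average();
        return (mean, Math.Sqrt(variance));
    }
}
```

The standard deviation is not strictly needed, but it gives a feel for how much the metric moves from fold to fold. A large spread suggests the average alone should not be trusted too much.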

Now we can test modifications to the hyperparameters to see if we can get a model that correctly identifies our Lego pieces. This is done with an ImageClassificationTrainer.Options instance that is passed as a parameter to the ML trainer. To keep things simple we are going to modify only two of them: the architecture and the number of epochs.

The value of the architecture hyperparameter defines which base neural network architecture is used by the trainer. Image classification is a HARD problem that takes a lot of time to train from scratch, and there are already highly complex neural network models that can identify shapes, outlines, and cats (to name a few). What ML.NET does is take one of these existing models as the starting point of our neural network and train it from there with the examples we have. This is called Transfer Learning. ML.NET comes with 4 different architectures. The change is very simple:

var trainer = mlContext.MulticlassClassification.Trainers.ImageClassification(
    new ImageClassificationTrainer.Options()
    {
        Arch = architecture,   // base network used for transfer learning
        Epoch = epoch,         // number of training iterations
        FeatureColumnName = "Features",
        LabelColumnName = "Label",
    })
    // Convert the predicted key back to the original label text.
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));

To see how these hyperparameters affect the classifier, we now run the evaluation using all of the available architectures, and epoch values of 50, 100, 200, and 400 (the default is 200). The results are shown in the table below.
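
The sweep described above can be sketched as a small grid search. The `HyperparameterSweep` class and the `evaluate` callback are hypothetical: in this post's setting, I would expect `evaluate` to build the pipeline with the given architecture and epoch count, run the cross-validation, and return the average micro-accuracy. The snippet itself does not call ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class HyperparameterSweep
{
    // Evaluates every (architecture, epochs) combination with the supplied function
    // and returns the results sorted from the best score to the worst.
    public static List<(string Architecture, int Epochs, double Score)> Run(
        IEnumerable<string> architectures,
        IEnumerable<int> epochValues,
        Func<string, int, double> evaluate)
    {
        var epochs = epochValues.ToArray();
        return architectures
            .SelectMany(arch => epochs.Select(e => (Architecture: arch, Epochs: e, Score: evaluate(arch, e))))
            .OrderByDescending(r => r.Score)
            .ToList();
    }
}
```

Passing the evaluation in as a function keeps the loop separate from the training code, so the same sweep can be reused if more hyperparameters are added later.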

As you can see, MobilenetV2 beats the other architectures. What is even more interesting is that running fewer epochs of training resulted in a better classifier. Not something I would have expected, and since the training set is relatively small, I don’t think there are general conclusions that can be drawn from this.

And finally, running a classifier using MobilenetV2 on our test pieces gives us the correct results:

Testing with black piece. Prediction: Black.
Testing with blue piece. Prediction: Blue.
Testing with green piece. Prediction: Green.
Testing with yellow piece. Prediction: Yellow.


As always, the full code for this tutorial and supporting files can be found on my GitHub repo. Hope you enjoyed this tutorial and until next time, happy coding!


  1. Evaluate your ML.NET model with metrics.
  2. Evaluating Machine Learning Models – A Beginner’s Guide to Key Concepts and Pitfalls.