OpenCV & ML (Deep Learning) 03 – Regression and Mulit-layer Perceptron (MLP)

Utilizing OpenCV’s Multi-layer Perceptron from Artificial Neural Network module (ANN_MLP) for predicting a periodic or parabola 1-dimensional function (A regression problem). Samples are drawn randomly within a specific range in function domain meanwhile a rectangular (uniform distributed) or Gaussian (normal distributed) noise are applied to them.

Now, we could step forward and practice a much better example for MLP. This time we will use a periodic function like sine to generate the inputs and outputs. You could use any non-linear mathematical function such as a parabola as long as your activation function is non-linear like Symmetric Sigmoid f(x)=1.7159 \times tanh(2/3 \times x) which is our choice for this example. Of course, to model a periodic function, LSTM (RNN) models might be a better choice. we will get to that later. These types of problems that output variables takes continues values are so called regression. Classification is the quite antithesis of regression in terms of output types inasmuch as the output variable takes class labels. In OpenCV Artificial Neural Nets are categorized with non-classifiers tools, but that does not mean you could n’t use it for classification problems like what we are going to do in the next tutorial. What it means is that you should expect real numbers as output responses and then it is your responsibility to assign each float number to a class label. We will practice this by simply rounding the floating point to an integer in the next tutorial.

For this example, function y=sin(x)+cos(3 \times x) on the domain value ranging from -3.14 to +3.14 is used. In order to provide input and output data we need some discrete values from domain. In machine learning before using the training samples, it is always better to shuffle the samples. I will initialize an OpenCV matrix with random values with in the mentioned domain in order to provide input data. Since the values will be generated randomly we do not need to shuffle them. The number of rows in this matrix should equal to our number of training samples. The matrix type should be of type float as OpenCV ML module only uses floating point elements.

Mat1f x(150, 1); // Or: Mat mat(150, 1, CV_32FC1);
randu(x, Scalar(-1*M_PI), Scalar(+1*M_PI));
// randn(noise, 0, .3); // for normal distrbution

We could make the problem more realistic by adding some uniform (or normal) noise to the training samples.

Mat noise(x.size(), CV_32F);
randu(noise, -.4, .4); // use randn for normal distrbution
Mat y = fun(x) + noise;

Now x and y provide us with 150 training samples for input and output. The current version of OpenCV lacks sine and cosine functions for matrices so, I implemented simple element wise ones for calculating these functions. A better optimized implementation would be to use a lookup table with OpenCV’s parallel_for_() API to make it much more efficient. But this is not our concern in this tutorial.

template<typename type>
Mat_<type> sin(const Mat_<type>& x) {
	int channels = x.channels();
	int nRows = x.rows;
	int nCols = x.cols * channels;
	if (x.isContinuous()) {
		nCols *= nRows;
		nRows = 1;
	}
	Mat_<type> y = x.clone();
	for(int i = 0; i < nRows; ++i)   {
		type* p = reinterpret_cast <type*>(y.ptr(i));
		for (int j = 0; j < nCols; ++j) {
			p[j] = sin(p[j]);
		}
	}
	return y;
}

template <typename type>
Mat_<type> cos(const Mat_<type>& x) {
	return sin<type>(M_PI/2 - x);
}

OpenCV also provides a module named Plot for drawing 2D matrices. This module is employed here to demonstrate the results. The following function accepts  input and output matrices of both training samples and predicted results, then render them to an image matrix with the proper minimum and maximum values. The Plot module only accepts double precision numbers, that is why the matrices type are required to be converted first.

Mat plot_results(const Mat& train_x, const Mat& train_y,
                 const Mat& test_x, const Mat& test_y) {
    // plot module only acceptes double values
    Mat xd, yd;
    train_x.convertTo(xd, CV_64F);
    train_y.convertTo(yd, CV_64F);

    Mat xtd, ytd;
    test_x.convertTo(xtd, CV_64F);
    test_y.convertTo(ytd, CV_64F);

    //adjust border and margins of the 2 plots to match  together
    double xmin, xmax;
    minMaxIdx(xd, &xmin, &xmax);
    double ymin, ymax;
    minMaxIdx(yd, &ymin, &ymax);

    double xtmin, xtmax;
    minMaxIdx(xtd, &xtmin, &xtmax);
    double ytmin, ytmax;
    minMaxIdx(ytd, &ytmin, &ytmax);

    xmin = min(xmin, xtmin)-1;
    xmax = max(xmax, xtmax)+1;
    ymin = min(ymin, ytmin)-1;
    ymax = max(ymax, ytmax)+1;

    Ptr<plot::Plot2d> plot_train = plot::Plot2d::create(xd, yd);
    plot_train->setMinX(xmin);
    plot_train->setMaxX(xmax);
    plot_train->setMinY(ymin);
    plot_train->setMaxY(ymax);
    plot_train->setPlotLineColor(Scalar(0, 255, 0)); // Green
    plot_train->setPlotLineWidth(3);
    plot_train->setNeedPlotLine(false);
    plot_train->setShowGrid(false);
    plot_train->setShowText(false);
    plot_train->setPlotAxisColor(Scalar(0, 0, 0)); // Black (invisible)
    Mat img_trainpl;
    plot_train->render(img_trainpl);


    Ptr<plot::Plot2d> plot_test = plot::Plot2d::create(xtd, ytd);
    plot_test->setMinX(xmin);
    plot_test->setMaxX(xmax);
    plot_test->setMinY(ymin);
    plot_test->setMaxY(ymax);
    plot_test->setShowGrid(false);
    plot_test->setShowText(false);
    Mat img_testpl;
    plot_test->render(img_testpl);

    // to conformt the common coordinates (x increase from bottom to top)
    Mat img;
    flip(img_trainpl+img_testpl, img, 0);

    return img;
}

Scripting languages like MATLAB and Numpy (Python library) offers a method for generating evenly spaced vectors like x = -3.14:0.0.42:3.14; or x = linspace(-3.14,3.14,150) for MATLAB and x = np.arange(-3.14, 3.14, 0.042) or x = np.linspace(-3.14, 3.14, 150) for Numpy. Here I implemented almost the same equivalent functions for OpenCV. So, we could use them in this example and later to have more lucid source codes.

template<typename type>
Mat_<type> linspace(type start, type stop, size_t n) {
    type step = (stop-start)/n;
    type* data = new type[n];
    if(!data) {
        cerr << "Memory allocation failed!" << endl;
        return Mat();
    }
    generate(data, data+n, [&]()->type {type t=start; start+=step;return t;});
    return Mat_<type>(int(n), 1, data);
}

template<typename type>
Mat_<type> arange(type start, type stop, type step) {
    size_t sz = size_t((stop-start)/step);
  type* data = new type[sz];
  if(!data) {
    cerr << "Memory allocation failed!" << endl;
    return Mat();
  }
    generate(data, data+sz, [&]()->type {type t=start; start+=step;return t;});
  return Mat_<type>(int(sz), 1, data);
}

template<typename type>
Mat_<type> arange(type stop, type step = static_cast<type>(1)) {
    return arange(static_cast<type>(0), stop, step);
}

Following you could find the implementation of function which we have used to generate our training samples. You could replace the inner code to see how our network performs on other functions.

template<typename type>
Mat_<type> fun(const Mat_<type>& x) {
    return sin<type>(x)+cos<type>(3*x);
}

While you are using train() method in OpenCV on your samples it will automatically scales the inputs and outputs and also it will initialize the network weights using Nguyen-Widrow algorithm. following is our main code block, if you have reviewed the previous tutorial you should be familiar with most of it. For this example I expanded the number of layers to 5 and increased number of neurons in hidden layers up to 16. I have used a new API of OpenCV called setTrainTestSplitRatio() to separate the training samples from test ones. Since, our training samples are already shuffled I set the second argument to false value. The other function I used here is calcError() to calculate the Root Mean Square (RMS). The smaller the RMSE, the better, but there is no absolute threshold for it however, if it crosses 1.0 (in most cases) you should review your model and apply some changes. It is possible to calculate RMS on training or test samples. By setting the second argument to true the RMS will be calculated only on test samples. I also measured the training and prediction times using OpenCV TickMeter, so, that we could compare the results to a similar code from Keras (A a high-level neural networks) running based on TesnsorFlow. While the RMSE is calculated only on test samples, the prediction is calculated over the whole range of function in order to find if the network could predict the function shape properly.

int main() {
    // seeding random number generator of OpenCV
    cv::theRNG().state = uint64(cv::getTickCount());

    // OpenCV ml::ann module only uses float32 matrices
    // input traing data => number of rows: training samples
    // number of columns: data dimention=0-th (input) layer size
    Mat1f x(150, 1); // Or: Mat mat(150, 1, CV_32FC1);
    randu(x, Scalar(-1*M_PI), Scalar(+1*M_PI));
  
    // Initialize noise matrix of uniform distrbution
    Mat noise(x.size(), CV_32F);
    randu(noise, -.4, .4); // uniform distrbution
    //randn(noise, 0, .3); // normal distrbution

    // evaluate the function plus noise
    Mat y = fun(x) + noise;

  Ptr<ANN_MLP> net = ANN_MLP::create();
  
  Mat layerSizes = (Mat_<int>(5, 1) << x.cols, 4, 8, 4, y.cols);
  net->setLayerSizes(layerSizes);

    /*
     * sigmoid symmetric in OpenCV is 1.7159\*tanh(2/3 \* x),
     * same as tanh function in other libraries
     * The output will range from [-1.7159, 1.7159], instead of [0,1]
     */
  net->setActivationFunction(ANN_MLP::ActivationFunctions::SIGMOID_SYM);
  TermCriteria termCrit = TermCriteria(
    TermCriteria::Type::COUNT + TermCriteria::Type::EPS,
    1e4, 1e-10);
  net->setTermCriteria(termCrit);
  net->setTrainMethod(ANN_MLP::TrainingMethods::BACKPROP);
  
  Ptr<TrainData> trainingData = TrainData::create(
        x, SampleTypes::ROW_SAMPLE, y
  );
    // Spilits the traing set to have subset for testing
    trainingData->setTrainTestSplitRatio(.7, false); // shuffle already done

    cv::TickMeter t;
    // First will scale the input and output for training samples and
    // then Will use Nguyen-Widrow algorithm to initialize the weights
    t.start();
  net->train(trainingData);
    t.stop();
    double t_train = t.getTimeSec();
    t.reset();

    Mat resp;
    float rms = net->calcError(trainingData, true, resp);
  
    // plot perdicted results vs training samples to visualize the performance
  auto test = arange(-3.14f, +3.14f, .001f);
  Mat result;
    t.start();
  net->predict(test, result);
    t.stop();
    double t_predc = t.getTimeMilli();

    cout << "RMS: " << rms << endl
         << "Training Time: " << t_train << " s" << endl
         << "Prediction Time: " << t_predc << " ms" << endl;


    Mat img = plot_results(x, y, test, result);
    namedWindow("MLP");
    imshow("MLP", img);
  
  waitKey();
  return 0;
}
/*
BACKPROP
RMS: 0.762911
Training Time: 16.0031 s
Prediction Time: 1.7267 ms

RPROP
RMS: 1.08866
Training Time: 0.733092 s
Prediction Time: 1.65154 ms
*/

For the Makefile you need to link opencv_highgui and opencv_plot in addition to the libraries we have linked previously. In the following Makefile the name of source code file is considered MLPPredictPeriodic.cpp.

CPP = g++
CXXFLAGS = -std=c++11 -O3
INC = -I/usr/local/opencv3.3-cpu/include
LIBS = -L/usr/local/opencv3.3-cpu/lib
LIBS += -lopencv_core -lopencv_highgui -lopencv_ml -lopencv_plot

TARGETS = MLPPredictPeriodic

.DEFAULT: all

.PHONY: all debug clean

all: $(TARGETS)

debug: CXXFLAGS += -g

MLPPredictPeriodic: MLPPredictPeriodic.cpp
  $(CPP) $(CXXFLAGS) $(LIBS) $(INC) MLPPredictPeriodic.cpp -o MLPPredictPeriodic

clean:
  rm -f $(TARGETS) *.o

In the following image training samples have green square shape while predicted samples are yellow pixels lined to shape the function curve.

As you can see predicted results are almost like the expected curve with all the noise removed. So, the network is learned the function (of course in the provided range). Try to change the training method from BACKPROP to RPROP. As mentioned, the training speed will increase significantly while it looks like the network learned a little noise as well. In most cases we prefer to reduce the training time, because in real world applications sometimes it takes days to train a network even with cluster of computers. The results in the table at the end of tutorial shows that the training time was 16x faster for the same number of iteration but at the cost of increase in RMS which could be amended by increasing number of iterations.

Following you can find Keras implementation of the same example with it’s timing results on the same computer. Since, Keras is a rich high level library dedicated to neural network it will help you understand the concepts used in this example for neural network more.

import time
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers.core import Dense
from sklearn.metrics import mean_squared_error

def f(X): return np.sin(X) + np.cos(3*X)

np.random.seed(seed=int(time.time()))  # seeding random generator

a = -1*np.pi  # domain min
b = np.pi  # domain max

x = (b - a) * np.random.random_sample(150) + a  # input
y = f(x) + np.random.uniform(-.4, .4, (150,))  # output

SplitRatio = .7 # split input samples of data
border = int(len(x)*SplitRatio)
xtr, ytr = x[:border,], y[:border,]  # Training samples
xte, yte = x[border:,], y[border:,]  # Testing samples

fig = plt.plot(x, y, 'b.')  # plotting training samples

# Constructing a Multilayer Perceptron
model = Sequential()    # Feed forward Network
model.add(Dense(4, input_dim=1, activation='tanh'))
model.add(Dense(8, activation='tanh'))
model.add(Dense(4, activation='tanh'))
model.add(Dense(1))  # output layer

# stochastic gradient descent (sgd) optimization
model.compile(loss='mean_squared_error', optimizer='sgd')
# model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])

tm = time.time()
hist = model.fit(xtr, ytr, epochs=10000, verbose=0)
tm_train = time.time() - tm

xr = np.arange(a, b, .001)

tm = time.time()
results = model.predict(xr)  # predicting values over the function domain
tm_predict = time.time() - tm

print("RMS: ", mean_squared_error(xte, yte))  # calculating RMSE on test samples
# print(model.evaluate(xtr, ytr)) # print loss and accuracy
print("Training Time: %.4f s" % tm_train)
print("Prediction Time: %.4f ms" % (tm_predict * 1000))

fig = plt.plot(xr, f(xr), 'r')  # plotting the real function
plt.plot(xr, results, 'g')  # plotting the predicted values over the function domain
plt.show()

'''
RMS:  1.99770915462
Training Time: 78.9976 s
Prediction Time: 210.7198 ms
'''

And this is the result image retrieved from Python for the above code. The blue points are training samples, the red plot is original funtion and the green plot is the predicted one.

The following table compare all the results side by side, of course Keras has lots of parameters to be tuned but I tried to make it similar to OpenCV defaults as much as possible.

  RMS Training
Time (s)
Prediction
Time (ms)
OpenCV 3.3
BACKPROP
0.762911 16.0031 1.7267
OpenCV 3.3
RPROP
1.08866 0.7331 1.6515
Keras
(Tensor flow)
1.997709 78.9976 210.7198

Source codes of this tutorial on Github

Cite this article as: Amir Mehrafsa, "OpenCV & ML (Deep Learning) 03 – Regression and Mulit-layer Perceptron (MLP)," in MEXUAZ, September 14, 2017, http://mexuaz.com/opencv-machine-learning-03/.

If you have found a spelling error, please, notify us by selecting that text and pressing Ctrl+Enter.

Add a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Spelling error report

The following text will be sent to our editors: