OpenCV & ML (Deep Learning) 03 – Regression and Mulit-layer Perceptron (MLP)
Utilizing OpenCV’s Multi-layer Perceptron from Artificial Neural Network module (ANN_MLP) for predicting a periodic or parabola 1-dimensional function (A regression problem). Samples are drawn randomly within a specific range in function domain meanwhile a rectangular (uniform distributed) or Gaussian (normal distributed) noise are applied to them.
Now, we could step forward and practice a much better example for MLP. This time we will use a periodic function like to generate the inputs and outputs. You could use any non-linear mathematical function such as a parabola as long as your activation function is non-linear like Symmetric Sigmoid
which is our choice for this example. Of course, to model a periodic function, LSTM (RNN) models might be a better choice. we will get to that later. These types of problems that output variables takes continues values are so called regression. Classification is the quite antithesis of regression in terms of output types inasmuch as the output variable takes class labels. In OpenCV Artificial Neural Nets are categorized with non-classifiers tools, but that does not mean you could n’t use it for classification problems like what we are going to do in the next tutorial. What it means is that you should expect real numbers as output responses and then it is your responsibility to assign each float number to a class label. We will practice this by simply rounding the floating point to an integer in the next tutorial.
For this example, function on the domain value ranging from -3.14 to +3.14 is used. In order to provide input and output data we need some discrete values from domain. In machine learning before using the training samples, it is always better to shuffle the samples. I will initialize an OpenCV matrix with random values with in the mentioned domain in order to provide input data. Since the values will be generated randomly we do not need to shuffle them. The number of rows in this matrix should equal to our number of training samples. The matrix type should be of type float as OpenCV ML module only uses floating point elements.
Mat1f x(150, 1); // Or: Mat mat(150, 1, CV_32FC1); randu(x, Scalar(-1*M_PI), Scalar(+1*M_PI)); // randn(noise, 0, .3); // for normal distrbution
We could make the problem more realistic by adding some uniform (or normal) noise to the training samples.
Mat noise(x.size(), CV_32F); randu(noise, -.4, .4); // use randn for normal distrbution Mat y = fun(x) + noise;
Now x and y provide us with 150 training samples for input and output. The current version of OpenCV lacks sine and cosine functions for matrices so, I implemented simple element wise ones for calculating these functions. A better optimized implementation would be to use a lookup table with OpenCV’s parallel_for_() API to make it much more efficient. But this is not our concern in this tutorial.
template<typename type> Mat_<type> sin(const Mat_<type>& x) { int channels = x.channels(); int nRows = x.rows; int nCols = x.cols * channels; if (x.isContinuous()) { nCols *= nRows; nRows = 1; } Mat_<type> y = x.clone(); for(int i = 0; i < nRows; ++i) { type* p = reinterpret_cast <type*>(y.ptr(i)); for (int j = 0; j < nCols; ++j) { p[j] = sin(p[j]); } } return y; } template <typename type> Mat_<type> cos(const Mat_<type>& x) { return sin<type>(M_PI/2 - x); }
OpenCV also provides a module named Plot for drawing 2D matrices. This module is employed here to demonstrate the results. The following function accepts input and output matrices of both training samples and predicted results, then render them to an image matrix with the proper minimum and maximum values. The Plot module only accepts double precision numbers, that is why the matrices type are required to be converted first.
Mat plot_results(const Mat& train_x, const Mat& train_y, const Mat& test_x, const Mat& test_y) { // plot module only acceptes double values Mat xd, yd; train_x.convertTo(xd, CV_64F); train_y.convertTo(yd, CV_64F); Mat xtd, ytd; test_x.convertTo(xtd, CV_64F); test_y.convertTo(ytd, CV_64F); //adjust border and margins of the 2 plots to match together double xmin, xmax; minMaxIdx(xd, &xmin, &xmax); double ymin, ymax; minMaxIdx(yd, &ymin, &ymax); double xtmin, xtmax; minMaxIdx(xtd, &xtmin, &xtmax); double ytmin, ytmax; minMaxIdx(ytd, &ytmin, &ytmax); xmin = min(xmin, xtmin)-1; xmax = max(xmax, xtmax)+1; ymin = min(ymin, ytmin)-1; ymax = max(ymax, ytmax)+1; Ptr<plot::Plot2d> plot_train = plot::Plot2d::create(xd, yd); plot_train->setMinX(xmin); plot_train->setMaxX(xmax); plot_train->setMinY(ymin); plot_train->setMaxY(ymax); plot_train->setPlotLineColor(Scalar(0, 255, 0)); // Green plot_train->setPlotLineWidth(3); plot_train->setNeedPlotLine(false); plot_train->setShowGrid(false); plot_train->setShowText(false); plot_train->setPlotAxisColor(Scalar(0, 0, 0)); // Black (invisible) Mat img_trainpl; plot_train->render(img_trainpl); Ptr<plot::Plot2d> plot_test = plot::Plot2d::create(xtd, ytd); plot_test->setMinX(xmin); plot_test->setMaxX(xmax); plot_test->setMinY(ymin); plot_test->setMaxY(ymax); plot_test->setShowGrid(false); plot_test->setShowText(false); Mat img_testpl; plot_test->render(img_testpl); // to conformt the common coordinates (x increase from bottom to top) Mat img; flip(img_trainpl+img_testpl, img, 0); return img; }
Scripting languages like MATLAB and Numpy (Python library) offers a method for generating evenly spaced vectors like x = -3.14:0.0.42:3.14;
or x = linspace(-3.14,3.14,150)
for MATLAB and x = np.arange(-3.14, 3.14, 0.042)
or x = np.linspace(-3.14, 3.14, 150)
for Numpy. Here I implemented almost the same equivalent functions for OpenCV. So, we could use them in this example and later to have more lucid source codes.
template<typename type> Mat_<type> linspace(type start, type stop, size_t n) { type step = (stop-start)/n; type* data = new type[n]; if(!data) { cerr << "Memory allocation failed!" << endl; return Mat(); } generate(data, data+n, [&]()->type {type t=start; start+=step;return t;}); return Mat_<type>(int(n), 1, data); } template<typename type> Mat_<type> arange(type start, type stop, type step) { size_t sz = size_t((stop-start)/step); type* data = new type[sz]; if(!data) { cerr << "Memory allocation failed!" << endl; return Mat(); } generate(data, data+sz, [&]()->type {type t=start; start+=step;return t;}); return Mat_<type>(int(sz), 1, data); } template<typename type> Mat_<type> arange(type stop, type step = static_cast<type>(1)) { return arange(static_cast<type>(0), stop, step); }
Following you could find the implementation of function which we have used to generate our training samples. You could replace the inner code to see how our network performs on other functions.
template<typename type> Mat_<type> fun(const Mat_<type>& x) { return sin<type>(x)+cos<type>(3*x); }
While you are using train() method in OpenCV on your samples it will automatically scales the inputs and outputs and also it will initialize the network weights using Nguyen-Widrow algorithm. following is our main code block, if you have reviewed the previous tutorial you should be familiar with most of it. For this example I expanded the number of layers to 5 and increased number of neurons in hidden layers up to 16. I have used a new API of OpenCV called setTrainTestSplitRatio() to separate the training samples from test ones. Since, our training samples are already shuffled I set the second argument to false value. The other function I used here is calcError() to calculate the Root Mean Square (RMS). The smaller the RMSE, the better, but there is no absolute threshold for it however, if it crosses 1.0 (in most cases) you should review your model and apply some changes. It is possible to calculate RMS on training or test samples. By setting the second argument to true the RMS will be calculated only on test samples. I also measured the training and prediction times using OpenCV TickMeter, so, that we could compare the results to a similar code from Keras (A a high-level neural networks) running based on TesnsorFlow. While the RMSE is calculated only on test samples, the prediction is calculated over the whole range of function in order to find if the network could predict the function shape properly.
int main() { // seeding random number generator of OpenCV cv::theRNG().state = uint64(cv::getTickCount()); // OpenCV ml::ann module only uses float32 matrices // input traing data => number of rows: training samples // number of columns: data dimention=0-th (input) layer size Mat1f x(150, 1); // Or: Mat mat(150, 1, CV_32FC1); randu(x, Scalar(-1*M_PI), Scalar(+1*M_PI)); // Initialize noise matrix of uniform distrbution Mat noise(x.size(), CV_32F); randu(noise, -.4, .4); // uniform distrbution //randn(noise, 0, .3); // normal distrbution // evaluate the function plus noise Mat y = fun(x) + noise; Ptr<ANN_MLP> net = ANN_MLP::create(); Mat layerSizes = (Mat_<int>(5, 1) << x.cols, 4, 8, 4, y.cols); net->setLayerSizes(layerSizes); /* * sigmoid symmetric in OpenCV is 1.7159\*tanh(2/3 \* x), * same as tanh function in other libraries * The output will range from [-1.7159, 1.7159], instead of [0,1] */ net->setActivationFunction(ANN_MLP::ActivationFunctions::SIGMOID_SYM); TermCriteria termCrit = TermCriteria( TermCriteria::Type::COUNT + TermCriteria::Type::EPS, 1e4, 1e-10); net->setTermCriteria(termCrit); net->setTrainMethod(ANN_MLP::TrainingMethods::BACKPROP); Ptr<TrainData> trainingData = TrainData::create( x, SampleTypes::ROW_SAMPLE, y ); // Spilits the traing set to have subset for testing trainingData->setTrainTestSplitRatio(.7, false); // shuffle already done cv::TickMeter t; // First will scale the input and output for training samples and // then Will use Nguyen-Widrow algorithm to initialize the weights t.start(); net->train(trainingData); t.stop(); double t_train = t.getTimeSec(); t.reset(); Mat resp; float rms = net->calcError(trainingData, true, resp); // plot perdicted results vs training samples to visualize the performance auto test = arange(-3.14f, +3.14f, .001f); Mat result; t.start(); net->predict(test, result); t.stop(); double t_predc = t.getTimeMilli(); cout << "RMS: " << rms << endl << "Training Time: " << t_train << " s" << endl << "Prediction Time: " << t_predc << " ms" << endl; Mat img = plot_results(x, y, test, result); namedWindow("MLP"); imshow("MLP", img); waitKey(); return 0; } /* BACKPROP RMS: 0.762911 Training Time: 16.0031 s Prediction Time: 1.7267 ms RPROP RMS: 1.08866 Training Time: 0.733092 s Prediction Time: 1.65154 ms */
For the Makefile you need to link opencv_highgui and opencv_plot in addition to the libraries we have linked previously. In the following Makefile the name of source code file is considered MLPPredictPeriodic.cpp.
CPP = g++ CXXFLAGS = -std=c++11 -O3 INC = -I/usr/local/opencv3.3-cpu/include LIBS = -L/usr/local/opencv3.3-cpu/lib LIBS += -lopencv_core -lopencv_highgui -lopencv_ml -lopencv_plot TARGETS = MLPPredictPeriodic .DEFAULT: all .PHONY: all debug clean all: $(TARGETS) debug: CXXFLAGS += -g MLPPredictPeriodic: MLPPredictPeriodic.cpp $(CPP) $(CXXFLAGS) $(LIBS) $(INC) MLPPredictPeriodic.cpp -o MLPPredictPeriodic clean: rm -f $(TARGETS) *.o
In the following image training samples have green square shape while predicted samples are yellow pixels lined to shape the function curve.
As you can see predicted results are almost like the expected curve with all the noise removed. So, the network is learned the function (of course in the provided range). Try to change the training method from BACKPROP to RPROP. As mentioned, the training speed will increase significantly while it looks like the network learned a little noise as well. In most cases we prefer to reduce the training time, because in real world applications sometimes it takes days to train a network even with cluster of computers. The results in the table at the end of tutorial shows that the training time was 16x faster for the same number of iteration but at the cost of increase in RMS which could be amended by increasing number of iterations.
Following you can find Keras implementation of the same example with it’s timing results on the same computer. Since, Keras is a rich high level library dedicated to neural network it will help you understand the concepts used in this example for neural network more.
import time import numpy as np import matplotlib.pyplot as plt from keras.models import Sequential from keras.layers.core import Dense from sklearn.metrics import mean_squared_error def f(X): return np.sin(X) + np.cos(3*X) np.random.seed(seed=int(time.time())) # seeding random generator a = -1*np.pi # domain min b = np.pi # domain max x = (b - a) * np.random.random_sample(150) + a # input y = f(x) + np.random.uniform(-.4, .4, (150,)) # output SplitRatio = .7 # split input samples of data border = int(len(x)*SplitRatio) xtr, ytr = x[:border,], y[:border,] # Training samples xte, yte = x[border:,], y[border:,] # Testing samples fig = plt.plot(x, y, 'b.') # plotting training samples # Constructing a Multilayer Perceptron model = Sequential() # Feed forward Network model.add(Dense(4, input_dim=1, activation='tanh')) model.add(Dense(8, activation='tanh')) model.add(Dense(4, activation='tanh')) model.add(Dense(1)) # output layer # stochastic gradient descent (sgd) optimization model.compile(loss='mean_squared_error', optimizer='sgd') # model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy']) tm = time.time() hist = model.fit(xtr, ytr, epochs=10000, verbose=0) tm_train = time.time() - tm xr = np.arange(a, b, .001) tm = time.time() results = model.predict(xr) # predicting values over the function domain tm_predict = time.time() - tm print("RMS: ", mean_squared_error(xte, yte)) # calculating RMSE on test samples # print(model.evaluate(xtr, ytr)) # print loss and accuracy print("Training Time: %.4f s" % tm_train) print("Prediction Time: %.4f ms" % (tm_predict * 1000)) fig = plt.plot(xr, f(xr), 'r') # plotting the real function plt.plot(xr, results, 'g') # plotting the predicted values over the function domain plt.show() ''' RMS: 1.99770915462 Training Time: 78.9976 s Prediction Time: 210.7198 ms '''
And this is the result image retrieved from Python for the above code. The blue points are training samples, the red plot is original funtion and the green plot is the predicted one.
The following table compare all the results side by side, of course Keras has lots of parameters to be tuned but I tried to make it similar to OpenCV defaults as much as possible.
RMS | Training Time (s) |
Prediction Time (ms) |
|
---|---|---|---|
OpenCV 3.3 BACKPROP |
0.762911 | 16.0031 | 1.7267 |
OpenCV 3.3 RPROP |
1.08866 | 0.7331 | 1.6515 |
Keras (Tensor flow) |
1.997709 | 78.9976 | 210.7198 |
Source codes of this tutorial on Github
Spelling error report
The following text will be sent to our editors: