Nonlinear Least Squares Regression for Python

In this article I will revisit my previous article on how to do Nonlinear Least Squares (NLLS) Regression fitting, but this time I will explore some of the options in the Python programming language. I wrote that walkthrough article a few years before this one, and since then every nonlinear problem in data science seems to get chucked straight into the magic answer machine called Deep Learning, the foundation of what is now pervasively called AI. Have a bunch of data? Throw it into a neural network (AI), train on your data, sit back with your feet up and a drink in your hand, gain all kinds of insights, something, something, PROFIT! Why use something as antiquated as NLLS parametric regression, where you have to specify your model and parameters, when you could use a neural network instead (never mind that you have to choose what type of neural network to use, how many layers, how many neurons in each layer, what type of neurons, and so on)? I’m not going to argue that neural networks/deep learning/AI aren’t amazing at what they can do in data science, but their power comes from two things: massive amounts of computing power and storage, and the explosion in the amount of available data. If you have a dataset with millions of high-resolution, full-color images, of course you are going to want a deep neural network that can pick out all of the nuances. The same holds if you have access to millions of professionally edited documents with billions and billions of words.

In many applications, however, we don’t have rich, multidimensional data sets; we might only have tens of data points, with only two or three data dimensions/variables that we can measure. If you have a lot of categorical variables or qualitative data, a classification algorithm such as logistic regression (or another such method) will work a lot better. If you do have data with continuous variables, though, and after trying linear regression and polynomial regression you still feel that some other nonlinear model could fit your data better, welcome to NLLS Regression!

I won’t give as complete an explanation of the regression algorithm, what certain measures mean, and how to display them as I did in my previous post. Rather, I’m going to discuss a few options available as Python modules, how to call these functions, and how to obtain or calculate certain return values. The first two methods come from the popular scientific module SciPy, specifically its optimize submodule: curve_fit and least_squares. The last module we will look at is LMFit, a module designed specifically for NLLS Regression applications.

Testing Notes

All testing was performed locally on my personal PC running Windows 10. The Python version was 3.8.1 (visible by typing “python -V” at the command prompt), the SciPy version was 1.4.1, the NumPy version was 1.18.1, and the LMFit version was 1.0.0 (access module versions by printing/examining <MODULE NAME>.__version__). I performed all testing in Visual Studio Code with the Python extension installed. Note, when debugging Python in Visual Studio Code (VS Code), once you have the Python extension installed, follow these instructions to set up your debugging configuration. This will create a launch.json file in your code directory. Once you have that, if you want to be able to step into the code of the fitting modules themselves (NumPy, SciPy, etc.), you need to add the justMyCode option and set it to false; otherwise, VS Code will not step through any code but your own. My launch.json entry for the Python File debugging option looks like this:

{
    "name": "Python: Current File",
    "type": "python",
    "request": "launch",
    "program": "${file}",
    "console": "integratedTerminal",
    "justMyCode": false // This should be false to debug imported module code
}

Software Installation Notes

I installed Python from the standard CPython site. I then used pip to install all of the needed modules in the code below.

Simulate Test Data

Before we look at the various fitting algorithms, we will need to generate some test data. I will be using the same model equation to generate and fit this data as in my previous article: an exponential decay factor multiplied by a cosine factor:

y = exp(-β1∙x)∙cos(β2∙x) + ε

First, import the required Python modules and their submodules/functions:

import numpy as np
from scipy.optimize import curve_fit, least_squares
from lmfit import minimize, Minimizer, Parameters, Parameter, report_fit

# Plotting module
import matplotlib.pyplot as plt

Next, the function that will be used to generate the signal:

def fcn2minExpCos(x, beta1, beta2):
    return np.exp(-beta1*x) * np.cos(beta2*x)

Finally, this section of code creates the data points, generates the noise-free signal, adds randomly distributed noise with a specified standard deviation and a mean of zero (the ε in our model above), and plots both the noise-free signal and the noisy signal.

# Simulate data using the same function we will fit to.
# Generate array of 101 data points from zero to ten in 0.1 increments
x = np.linspace(0, 10.0, num=101)   
Beta1 = 0.5                         # First Beta parameter for the exponential decay
Beta2 = 5                           # Second Beta parameter for the cosine
NumParams = 2                       # Number of model parameters 
StdNoise = 0.1                      # Noise Std Dev
y = fcn2minExpCos(x, Beta1, Beta2)  # Generate the signal values before adding noise

# Generate random noise sampled from a normal (Gaussian) distribution
# and add it to the original signal
noiseSamples = np.random.normal(size=len(y), scale=StdNoise)
yNoisy = y + noiseSamples

# Plot the original signal and overlay the noisy signal to show the scale of the noise
plt.plot(x, y, 'b')
plt.plot(x, yNoisy, 'r')
plt.xlabel('x')
plt.ylabel('y (blue) and yNoisy (red)')
plt.show()

This will result in a plot similar to this:

Line Plot

Now that we have a set of test data to fit the model to, we will set the starting guess or initial parameter values for our fitting algorithms:

InitialParams = [1., 1.]

curve_fit (scipy.optimize)

The curve_fit algorithm is fairly straightforward, with several fundamental input options, and it returns only two output variables: the estimated parameter values and the estimated covariance matrix. I talk about the usefulness of the covariance matrix in my previous article and won't go into it further here. The first three input parameters for curve_fit are required: f, xdata, and ydata, which are the fitting function, the independent variable x, and the data to be fit (our noisy data, yNoisy). The fitting function for curve_fit is the same function used to generate the data, fcn2minExpCos. The input parameter p0 is the starting guess; it is optional, but we will use the values we specified in the InitialParams array. sigma and absolute_sigma are optional input parameters that allow you to specify the size of ε, the noise component in our model equation, in order to obtain a better estimated covariance matrix. This would be useful if we had actually measured the noise in our data; you can pass in the value set in StdNoise above to see how the values change. check_finite is an optional boolean parameter that makes the algorithm check the data for Inf or NaN values and raise an error if any are found. The bounds parameter is useful if you want to set a minimum or maximum value (or both) on your estimated model parameters; say you know that one of your parameters will never be negative, you can specify a minimum parameter value of 0. A quick sketch of how to declare the bounds and pass them to the fit function follows below, but I won't look at bounds further in this article.
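For example, a minimal sketch of what the bounds (and sigma) options could look like for our model, with purely illustrative bound values (the fitParamsBnd/pcovBnd names are just for this sketch; note that bounds require the trf or dogbox method, not lm):

# Illustrative only: keep both betas between 0 and 10, and supply the known
# noise level of our simulated data for the covariance estimate
fitParamsBnd, pcovBnd = curve_fit(fcn2minExpCos, x, yNoisy, p0=InitialParams,
                                  bounds=([0., 0.], [10., 10.]),
                                  sigma=np.full(len(yNoisy), StdNoise),
                                  absolute_sigma=True, method='trf')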

The method parameter allows you to specify the fitting algorithm you want to use, with the options being lm (a Levenberg-Marquardt algorithm), trf (a trust region algorithm), or dogbox. As the curve_fit documentation states in the notes section, specifying lm calls the SciPy function leastsq, whereas the other two methods call the SciPy function least_squares, the function we will be examining next. We will examine the jac parameter later on when we discuss how to specify a Jacobian for the fitting algorithms, and the kwargs parameter passes any additional keyword arguments through to the underlying solver (leastsq or least_squares). To call curve_fit on our data, use:

fitParams, pcov = curve_fit(fcn2minExpCos, x, yNoisy, p0=InitialParams, method='lm')
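Those two returned values are all you get from curve_fit. To see the parameter estimates and their one-sigma standard errors (the square roots of the diagonal of the covariance matrix), you could do:

print(fitParams)               # estimated beta1 and beta2
print(np.sqrt(np.diag(pcov)))  # one-sigma standard errors of the parameters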

I specified lm for the fitting method here, but I tested the speeds of all three fitting methods by wrapping the above curve_fit call with time.time() calls (a sketch of that timing loop follows the table below). After doing several calls with each method, here is the average time that each one took:

lm       3 ms
trf     13 ms
dogbox  16 ms
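For reference, here is roughly what I mean by wrapping the call; the loop count of 100 is arbitrary and this is just one simple way to average the timings:

import time

start = time.time()
for _ in range(100):
    curve_fit(fcn2minExpCos, x, yNoisy, p0=InitialParams, method='lm')
print(f'average curve_fit time: {(time.time() - start) / 100 * 1000:.1f} ms')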

So, from my testing, the lm method seems to be over four times faster than the other two methods. As I stated above, curve_fit calls the SciPy function leastsq, and if you step through the code with the VS Code debugger, in the leastsq code in the file minpack.py (also visible on the SciPy GitHub here), you can see that leastsq calls the MINPACK lmder or lmdif routines directly, which are FORTRAN routines included with the SciPy module. The least_squares algorithm in the next section also uses the MINPACK FORTRAN functions, so we'll revisit this speed testing there. Note: you can't use the lm option if you are providing bounds. The downside of the curve_fit algorithm is that it only returns the basic details of the fit, so key measures like the Residual Sum of Squares (RSS), or the residual array itself, are not returned. The least_squares algorithm does return that information, so let's take a look at it next.

least_squares (scipy.optimize)

SciPy's least_squares function provides several more input parameters to allow you to customize the fitting algorithm even more than curve_fit. In addition to the parameters previously described above, you can customize the tolerances of the fitting algorithm, numerically scale the variables and functions, and select a different loss function, among others. I won't discuss these further, but I will note one option: verbose. For the trf method, this will output a detailed report of certain metrics during the fitting process. The lm method outputs a single statement about the number of times our fit function was evaluated, along with a few other metrics at the last step of fitting and a message about how the algorithm terminated. These values are all defined in the OptimizeResult object returned by the algorithm. Although this object provides a lot more information than the curve_fit algorithm, it still requires a little more work to get some of the key fitting measures I used and introduced in my previous article. Here is how I called the fitting algorithm:

LSOptimResult = least_squares(fcn2minExpCosErrFunc, InitialParams, method='lm', args=(x, yNoisy))

Note that the way least_squares calls the fitting function is slightly different here: the x and y values are provided as extra arguments. The fitting function itself also needs to be slightly altered. In curve_fit, we merely pass in the model equation, f(β, x). What the fitting algorithm actually minimizes, however, is the sum of squared residuals (RSS), with an individual residual defined as r = y - f(β, x). For least_squares, we need to define that residual explicitly (also note how the parameters come in as a single array):

def fcn2minExpCosErrFunc(beta, x, y):
    return (y - (np.exp(-beta[0]*x) * np.cos(beta[1]*x)))

The estimated parameter values in the OptimizeResult are stored in the attribute x, which is slightly confusing since we already have our independent variable named x. Displaying the value of OptimizeResult.x will give an answer like:

array([0.51724086, 5.00217856])
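A few of the other fields of the returned OptimizeResult are also worth printing; for example:

print(LSOptimResult.nfev)     # number of function evaluations
print(LSOptimResult.message)  # why the algorithm terminated
print(LSOptimResult.cost)     # half the sum of squared residuals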

To get the RSS value, again useful for a variety of regression fitting measures including model selection, you need to sum up the squared values of the residual array:

RSS = sum(LSOptimResult.fun**2)

Calculating the Standard Error of Regression can be achieved with the number of measurements and the number of model parameters:

NumMeas = len(yNoisy)
SER = np.sqrt(RSS/(NumMeas - NumParams))

The number of measurements minus the number of model parameters is often described as the "degrees of freedom". Since we know the standard deviation of the noise from generating our data, the value of SER should be close to the original value, 0.1. A good check for any regression fitting problem is to display the residual array to see that it is approximately normally distributed:

plt.figure()
hist, bins = np.histogram(LSOptimResult.fun, bins=25)
width = 0.7 * (bins[1] - bins[0])
center = (bins[:-1] + bins[1:]) / 2
plt.bar(center, hist, align='center', width=width)
plt.show()

This will produce a plot similar to this one:

Histogram

We can examine the standard deviation of this histogram to see if it, too, is close to the original noise value, 0.1.

StdResiduals = np.std(LSOptimResult.fun)

Getting the covariance matrix, which is returned directly in curve_fit, takes a little bit more work:

# Compute the QR decomposition of the Jacobian matrix
q, r = np.linalg.qr(LSOptimResult.jac)
# Calculate the inverse of R
rInv = np.linalg.solve(r, np.identity(r.shape[0]))
# Matrix multiply R Inverse by the transpose of R Inverse
JTJInv = np.matmul(rInv, rInv.transpose())
# Multiply this matrix by the squared regression error (the variance)
CovMatrix = SER**2 * JTJInv

The value of this covariance matrix should be similar to this:

array([[7.31914334e-04, 7.79375648e-07],
       [7.79375648e-07, 7.31240831e-04]])
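As with the pcov matrix returned by curve_fit, the square roots of the diagonal of this covariance matrix give approximate one-sigma standard errors for beta1 and beta2:

ParamStdErrs = np.sqrt(np.diag(CovMatrix))
print(ParamStdErrs)  # approximate standard errors of beta1 and beta2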

LMFit

At their heart, the fitting algorithms in the LMFit module are essentially wrappers around the SciPy optimization algorithms, including the least_squares function above. However, LMFit adds a lot of useful information around its fitting algorithms. Probably the best feature of LMFit is the excellent documentation provided on its website, with tons of information about all input and output parameters, plenty of explanation of the fitting process, and plenty of detailed code examples. If you are relatively new to NLLS Regression, I recommend taking some time to give the documentation a solid read, starting with the topic list here. LMFit also tries to achieve better readability during the fitting process: model parameters aren't just passed into the function as an array, they are part of a Parameters object. So, before calling the minimization fitting function, I declare my model parameters like so:

LMparams = Parameters()
LMparams.add('beta1', value = 1.)
LMparams.add('beta2', value = 1.)
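If you want to constrain a parameter, bounds go into the same add call via the optional min and max keywords of the Parameters API; for example, declaring beta1 with purely illustrative bounds would look like:

LMparams.add('beta1', value = 1., min = 0., max = 10.)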

Note that the initial parameter values are set here and, as just shown, any bounds on the parameters are declared here as well. Using a Parameters object also helps when accessing the parameter values in our model fitting function:

def fcn2minLMExpCosErrFunc(params, x, y):
    beta1 = params['beta1']
    beta2 = params['beta2']
    return (y - (np.exp(-beta1*x) * np.cos(beta2*x)))

LMFit supports several fitting methods, as seen here, along with a number of other options that can be provided at input time. To call the fitting algorithm, we first declare the Minimizer object and pass in our fitting function, the Parameters object, and our x and y values. After that, we call the minimize function of the Minimizer object, specifying the fitting method.

LMFitmin = Minimizer(fcn2minLMExpCosErrFunc, LMparams, fcn_args=(x, yNoisy))
LMFitResult = LMFitmin.minimize(method='least_squares')

Another benefit of the LMFit module is the amount of information returned by the minimize function, specifically in the form of a MinimizerResult object. The RSS (chisqr) and the covariance matrix (covar) are standard output measures found in the object, and many other measures are available, including model selection measures such as the AIC. LMFit also provides functions to estimate parameter confidence intervals, making it a very valuable module to use.
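For instance, a quick way to see most of this at once is the report_fit function we imported earlier, and the individual measures are available as attributes of the result object:

report_fit(LMFitResult)    # printed summary of parameters, uncertainties, and fit statistics
print(LMFitResult.chisqr)  # RSS
print(LMFitResult.aic)     # Akaike Information Criterion
print(LMFitResult.covar)   # covariance matrix of the varying parameters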

Algorithm Timing

Providing a lot of information can require additional computation time, making the algorithm take longer and costing computing resources. We already saw that the different fitting methods can vary in the time taken to compute a result, so let's look at how these three algorithms differ in execution speed. Using time.time() again to wrap the function calls, and setting each function to the Levenberg-Marquardt algorithm that calls MINPACK through leastsq (the leastsq method for LMFit and lm for the other two), the average run times come out to be:

curve_fit:      3 ms
least_squares:  3 ms
LMFit:          9.5 ms

If the same test is performed with the method set to trf for the first two functions, or to least_squares for LMFit (which calls the SciPy least_squares function with its default trf method), the average run times are:

curve_fit:      15.5 ms
least_squares:  15 ms
LMFit:          21 ms

From this quick test, it looks like LMFit runs slower than the SciPy fitting functions, while the two SciPy methods have similar runtimes. This was noticed in a previous issue raised on the LMFit GitHub, where a user commented on the speed difference. I agree with the sentiment of one of the comments there: speed is not the only consideration when it comes to fitting algorithms. Based on these brief tests done on my machine, I would say, one, always do some quick speed tests on your own machine before making a decision, and two, there is always a tradeoff when optimizing for one particular factor; in this case, you would be trading detailed reporting and ease of use for speed. I would say that SciPy's least_squares is probably your best bet if you know and understand NLLS Regression fairly well AND you have a very large data set, where the speed difference can save you considerable time and money. If you are starting out with NLLS Regression, you are new to Python programming in general, or you don't really care about speed at the moment, LMFit is a nice option.

One last speed note from above: for all three fitting functions, there is a considerable speed advantage when using the lm fitting method, which calls the MINPACK FORTRAN functions. I would expect this, as FORTRAN is a compiled, low-level language optimized for speed. Speaking of speed, let's look at one more option that, based on previous experience, might give us some more improvement in that department.

Jacobian

The last fitting measure that I will look at is the Jacobian matrix/array, which is essentially a matrix of derivatives. In their pursuit of a minimum, most NLLS Regression algorithms estimate the derivatives, or slopes, in order to better determine which direction to travel to find that minimum. If a Jacobian is provided to the algorithm, it can calculate the slopes directly instead of having to estimate them, which often leads to fewer function evaluations and faster run times. I have provided Jacobian function code for all three fitting algorithms. To use a Jacobian in the fitting algorithm, add an input of jac=<FUNCTION NAME> to the call; note that for some LMFit options you will use Dfun instead. Here is the Jacobian to use with curve_fit, which expects the derivatives of the model function itself:

def jac2minExpCos(x, beta1, beta2):
    JacobPt1 = -x * np.exp(-beta1*x) * np.cos(beta2*x) # d/da(exp(-a*x)*cos(b*x))
    JacobPt2 = -x * np.exp(-beta1*x) * np.sin(beta2*x) # d/db(exp(-a*x)*cos(b*x))
    return np.transpose([JacobPt1, JacobPt2])

Here is the Jacobian to use with least_squares. Note that least_squares expects the Jacobian of the residual function we defined above (y minus the model), so each derivative is the negative of the corresponding model derivative:

def jac2minExpCosErrFunc(beta, x, y=None):
    JacobPt1 = x * np.exp(-beta[0]*x) * np.cos(beta[1]*x) # d/da of the residual y - exp(-a*x)*cos(b*x)
    JacobPt2 = x * np.exp(-beta[0]*x) * np.sin(beta[1]*x) # d/db of the residual y - exp(-a*x)*cos(b*x)
    return np.transpose([JacobPt1, JacobPt2])
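For example, passing this to least_squares looks like the earlier call with the jac argument added (the LSOptimResultJac name is just for this sketch):

LSOptimResultJac = least_squares(fcn2minExpCosErrFunc, InitialParams, jac=jac2minExpCosErrFunc, method='lm', args=(x, yNoisy))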

And here is the Jacobian to use with LMFit, again written for the residual:

def jac2minLMExpCosErrFunc(params, x, y):
    beta1 = params['beta1']
    beta2 = params['beta2']
    JacobPt1 = x * np.exp(-beta1*x) * np.cos(beta2*x) # d/da of the residual y - exp(-a*x)*cos(b*x)
    JacobPt2 = x * np.exp(-beta1*x) * np.sin(beta2*x) # d/db of the residual y - exp(-a*x)*cos(b*x)
    return np.transpose([JacobPt1, JacobPt2])

For the least_squares function, adding the Jacobian reduced the number of function evaluations from 40-45 down to 13-15 for the lm method, cutting the average runtime from 3 ms to 2 ms. LMFit was reduced from 9.5 ms to 5 ms, while curve_fit did not really improve all that much. I also got speed improvements when testing the trf method: for trf in least_squares, the average time was reduced from 15 ms to 8.5 ms. Thus, providing a Jacobian is another way to get more speed out of your fitting algorithm. It's not always easy to calculate a Jacobian, though. You can do it manually if you know how, use Wolfram Alpha, or try doing it in Python. I was able to do it using the Python module SymPy. Here is the code I used:

from sympy import symbols, diff, exp, cos, init_printing
b1, b2, x = symbols('b1 b2 x')
init_printing(use_unicode=True)
Modeldb1 = diff(exp(-b1*x) * cos(b2*x), b1)  # derivative of the model with respect to b1
Modeldb2 = diff(exp(-b1*x) * cos(b2*x), b2)  # derivative of the model with respect to b2
print(Modeldb1)
print(Modeldb2)

The output from printing was

-x*exp(-b1*x)*cos(b2*x)
-x*exp(-b1*x)*sin(b2*x)

After adding the NumPy prefix np., these match the model derivatives used in the curve_fit Jacobian function (negate them to get the residual Jacobians used with least_squares and LMFit). There is also a Jacobian method in the Python module numdifftools. Note that it may be possible to calculate the Jacobian on the fly inside your fitting function, but this will probably take longer than providing no Jacobian at all, which takes away the benefit of providing the Jacobian in the first place. Again, I would experiment with your particular model and data.
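As a rough sketch of the numdifftools route, assuming the numdifftools package is installed and using the x and yNoisy arrays from earlier, you could wrap the least_squares residual function so that only the parameter vector varies:

import numdifftools as nd

# Numerical Jacobian of the residual with respect to the beta parameters
numJac = nd.Jacobian(lambda beta: fcn2minExpCosErrFunc(beta, x, yNoisy))
print(numJac(InitialParams))  # Jacobian evaluated at the starting guess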

Script

I have uploaded all of the code from this article to my GitHub, with the script available here. Feel free to download it and play around with the code, algorithms, and options. Thanks for reading.