# Automatic analysis of customer activity trends using linear regression

*When your business deals with a large number of customers, it is hard to keep all of them satisfied and to be sure that they will all continue to buy your products or services. Growing revenue does not by itself mean that everything is fine: new customers may be arriving while existing customers run into problems and reduce their activity, or leave altogether.*

*To keep your business growing as fast as possible, it is important to make sure that most customers are satisfied and that their activity stays at least stable. With a few dozen regular customers this is not a big problem: you can monitor it manually by reviewing the monthly sales report for each customer. With hundreds or thousands of regular customers, manual monitoring is no longer feasible. Linear regression can help solve this problem.*

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (*Y*) and one or more explanatory variables (*X*).

Let’s imagine that the number of sales per month for a customer is collected in a table, where column *X* contains the serial numbers of the months and column *Y* contains the number of sales to this customer in each month:

We can describe the underlying relationship between *Yi* and *Xi*, including an error term *Ei* (the deviation of each data point from the line), by

*Yi = α + βXi + Ei*

This relationship between the true underlying parameters *α* and *β* and the data points is called a linear regression model.

Graphically, linear regression is represented in the figure below:

As you can see, the regression line is the straight line that fits the data set best.
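For reference, here is what "best fit" usually means. Assuming the line is fitted by ordinary least squares (the usual choice, although the text above does not name the method), the estimated coefficients for *n* data points are given by the standard formulas below, where the bars denote the mean values of *X* and *Y*:

```latex
\hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat{\alpha} = \bar{Y} - \hat{\beta}\,\bar{X}
```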

But how can this be used for trend analysis? The regression coefficients help here. Coefficient *α* is the point where the regression line intersects the *y-axis*. Coefficient *β* is the slope: the tangent of the angle between the *x-axis* and the regression line. So if a customer's activity has a positive trend, *β* is positive; if the trend is negative, *β* is negative.

So, to determine the trend for a customer, it is enough to collect data on their activity grouped by time interval and calculate the regression coefficients for this data set. This is easy to do in Python:

```python
import numpy as np
import matplotlib.pyplot as plt

# Get the customer's number of operations grouped by month as a dictionary:
# {'x': [4, 5, 6, ...], 'y': [12, 63, 88, ...]}
# getPaymentsByMonths and accountId are assumed to come from your own application.
data = getPaymentsByMonths(accountId)
x = np.array(data['x'])
y = np.array(data['y'])

# Draw the data points on the chart
plt.plot(x, y, 'o')

# Calculate the regression coefficients (b is the slope, a is the intercept)
b, a = np.polyfit(x, y, 1)

# Draw the regression line on the chart
plt.plot(x, b * x + a)

# Display the chart
plt.show()
```
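When there are hundreds or thousands of customers, charts are not practical, and only the slope is needed. The following is a minimal sketch of that idea, not the production implementation: the `activity_slope` helper and the sample data in `customers` are hypothetical and only illustrate how the slopes could be computed and sorted.

```python
import numpy as np


def activity_slope(months, counts):
    """Return the least-squares slope of counts over months."""
    slope, _intercept = np.polyfit(
        np.asarray(months, dtype=float),
        np.asarray(counts, dtype=float),
        1,
    )
    return float(slope)


# Hypothetical sample data: customer id -> (month numbers, operations per month)
customers = {
    "customer_a": ([1, 2, 3, 4, 5, 6], [10, 14, 19, 22, 30, 33]),
    "customer_b": ([1, 2, 3, 4, 5, 6], [40, 36, 30, 27, 20, 15]),
    "customer_c": ([1, 2, 3, 4, 5, 6], [20, 21, 19, 20, 21, 19]),
}

slopes = {cid: activity_slope(x, y) for cid, (x, y) in customers.items()}

# Sort so that customers with the most negative slope come first
ranked = sorted(slopes.items(), key=lambda item: item[1])
for cid, slope in ranked:
    print(f"{cid}: slope = {slope:+.2f}")
```

Sorting by slope puts the customers whose activity is falling fastest at the top of the list, which is where manual attention is most likely to be needed.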

Let’s compare the regression analysis results for two customers, based on their number of operations per month:

As you can see in the figures above, a customer with a positive trend has a positive slope coefficient, and a customer with a negative trend has a negative one. A slope coefficient close to 0 means that the customer's activity is stable.
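How close to 0 counts as "stable" depends on your data. As a rough sketch, the rule can be written as a small function; the `classify_trend` name and the `tolerance` value are assumptions made for illustration, not part of the approach described here.

```python
import numpy as np


def classify_trend(months, counts, tolerance=0.5):
    """Label activity as 'growing', 'declining' or 'stable'.

    tolerance is a hypothetical threshold (operations per month): slopes
    whose absolute value does not exceed it are treated as stable.
    """
    slope, _intercept = np.polyfit(months, counts, 1)
    if slope > tolerance:
        return "growing"
    if slope < -tolerance:
        return "declining"
    return "stable"


# Hypothetical monthly operation counts for three customers
months = [1, 2, 3, 4, 5, 6]
label_up = classify_trend(months, [10, 14, 19, 22, 30, 33])
label_down = classify_trend(months, [40, 36, 30, 27, 20, 15])
label_flat = classify_trend(months, [20, 21, 19, 20, 21, 19])
print(label_up, label_down, label_flat)
```

Note that the slope is expressed in operations per month, so a fixed tolerance treats small and large customers differently; choose it with your own data in mind.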

We use this approach widely in Widr Pay to improve our user experience. We display the regression slope coefficient for each user in our admin area, as shown in fig. 4.

One thing must be taken into account: use only data for time periods that have already finished. For example, if you base the analysis on monthly sales reports, skip the current month if it is not over yet, because its data is incomplete and can distort the regression line.
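The effect of an unfinished month can be illustrated with a short sketch. The `completed_months` helper, the `(year, month)` key format, and the sample numbers are all hypothetical; they only show one possible way to drop the period that is still in progress before fitting.

```python
from datetime import date

import numpy as np


def completed_months(counts_by_month, today):
    """Keep only months strictly before the month that contains `today`.

    counts_by_month maps (year, month) tuples to operation counts.
    """
    current = (today.year, today.month)
    return {m: c for m, c in counts_by_month.items() if m < current}


def slope_of(counts_by_month):
    """Fit a line to the counts in chronological order and return its slope."""
    months = sorted(counts_by_month)
    x = np.arange(len(months), dtype=float)
    y = np.array([counts_by_month[m] for m in months], dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


# Hypothetical data: May 2023 has only a few days of activity so far
counts_by_month = {
    (2023, 1): 20,
    (2023, 2): 22,
    (2023, 3): 25,
    (2023, 4): 27,
    (2023, 5): 3,
}
today = date(2023, 5, 4)

slope_finished = slope_of(completed_months(counts_by_month, today))
slope_all = slope_of(counts_by_month)
print(f"finished months only: {slope_finished:+.2f}")
print(f"including current month: {slope_all:+.2f}")
```

With this sample data, the partial month turns a growing customer into an apparently declining one, which is exactly the distortion to avoid.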