Automatic analysis of customer activity trends using linear regression
When your business deals with a large number of customers, it is hard to keep all of them satisfied and to be sure that all of them will continue to buy your products or services. Even growing revenue does not mean that everything is fine: new customers may be arriving while existing customers run into problems and reduce their activity, or leave altogether.
To keep your business growing as fast as possible, it is important to make sure that most customers are satisfied and that their activity stays at least stable. If you work with tens of regular customers, this is not a big problem: you can monitor it manually by reviewing the monthly sales reports for each customer. With hundreds or thousands of regular customers, manual monitoring is no longer feasible. Linear regression can help solve this problem.
In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (Y) and one or more explanatory variables (X).
Let’s imagine that the number of sales per month for one customer is collected in a table, where column X contains the sequential month numbers and column Y contains the number of sales to this customer in each month:
We can describe the underlying relationship between Yi and Xi, including an error term Ei that captures how far each observation deviates from the line, by
Yi = α + β Xi + Ei
This relationship between the true underlying parameters α and β and the data points is called a linear regression model.
Graphically, linear regression is represented in the figure below:
As you can see, the regression line is the straight line that fits the data set best.
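The line is usually chosen by ordinary least squares, that is, by minimizing the sum of squared vertical distances between the data points and the line. This is also what np.polyfit does for degree 1. Below is a minimal sketch of the closed-form calculation; the sample numbers are made up and the function name least_squares_line is only for illustration:

```python
import numpy as np


def least_squares_line(x, y):
    """Return (alpha, beta) of the line y = alpha + beta * x with the smallest squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope: covariance of x and y divided by the variance of x
    beta = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # intercept: the fitted line always passes through the point of means
    alpha = y_mean - beta * x_mean
    return alpha, beta


# made-up sample: month numbers and number of sales in each month
months = [1, 2, 3, 4, 5, 6]
sales = [12, 15, 14, 20, 22, 25]

alpha, beta = least_squares_line(months, sales)

# np.polyfit returns the highest-degree coefficient first: (slope, intercept)
slope, intercept = np.polyfit(months, sales, 1)
print(alpha, beta)
print(intercept, slope)
```

Both print statements should show the same pair of values up to floating-point rounding, which is a quick way to check that the two approaches agree.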
But how can this be used for trend analysis? The regression coefficients help here. Coefficient α is the intercept, the point where the regression line crosses the y-axis. Coefficient β is the slope, the tangent of the angle between the x-axis and the regression line. So if a customer's activity is trending upward, β is positive; if it is trending downward, β is negative.
So, to determine the trend for a customer, it is enough to collect data on their activity grouped by time intervals and calculate the regression coefficients for this data set. This is easy to do in Python:
import numpy as np
import matplotlib.pyplot as plt

# get the customer's number of operations grouped by month as a dictionary,
# e.g. {'x': [4, 5, 6, ...], 'y': [12, 63, 88, ...]}
# (getPaymentsByMonths and accountId come from the application and are not defined here)
data = getPaymentsByMonths(accountId)
x = np.array(data['x'])
y = np.array(data['y'])

# draw data points on the chart
plt.plot(x, y, 'o')

# calculate regression coefficients: b is the slope, a is the intercept
b, a = np.polyfit(x, y, 1)

# draw regression line on the chart
plt.plot(x, b*x + a)

# display chart
plt.show()
Let’s compare the regression analysis results for two customers, based on their number of operations per month:
As you can see in the figures above, a customer with a positive trend has a positive slope coefficient, and a customer with a negative trend has a negative one. A slope close to 0 means that the customer's activity is stable.
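How close to 0 counts as stable depends on the scale of your data. A minimal sketch of turning the slope into a label follows; the function name classify_trend and the default tolerance of 0.5 operations per month are assumptions for illustration, not values from our system:

```python
import numpy as np


def classify_trend(counts, tolerance=0.5):
    """Label a series of per-period counts as 'growing', 'declining' or 'stable'.

    tolerance is the largest absolute slope still treated as stable
    (an assumed value; tune it to the scale of your own data).
    """
    y = np.asarray(counts, dtype=float)
    if y.size < 2:
        raise ValueError("at least two periods are needed to fit a line")
    x = np.arange(1, y.size + 1)  # periods numbered 1, 2, 3, ...
    slope, _intercept = np.polyfit(x, y, 1)
    if slope > tolerance:
        return "growing"
    if slope < -tolerance:
        return "declining"
    return "stable"


print(classify_trend([10, 20, 30, 40]))
print(classify_trend([40, 30, 20, 10]))
print(classify_trend([20, 21, 20, 21, 20]))
```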
We use this approach widely at Widr Pay to improve our user experience. We display the regression slope coefficient for each user in our admin area, as shown in fig. 4.
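The same calculation scales to many customers at once. The sketch below is not our production code; it only assumes the monthly counts are available as a dictionary keyed by customer id (the ids and numbers are made up) and lists the customers with the steepest decline first:

```python
import numpy as np


def slopes_by_customer(monthly_counts):
    """Map each customer id to the regression slope of its monthly activity."""
    slopes = {}
    for customer_id, counts in monthly_counts.items():
        if len(counts) < 2:
            continue  # a line cannot be fitted through fewer than two points
        x = np.arange(len(counts))
        slope, _intercept = np.polyfit(x, counts, 1)
        slopes[customer_id] = float(slope)
    return slopes


# hypothetical data: number of operations per month for three customers
activity = {
    "customer_a": [12, 30, 41, 55, 63],
    "customer_b": [80, 71, 66, 50, 42],
    "customer_c": [20, 22, 19, 21, 20],
}

slopes = slopes_by_customer(activity)

# sort ascending so the most steeply declining customers come first
for customer_id, slope in sorted(slopes.items(), key=lambda item: item[1]):
    print(f"{customer_id}: {slope:+.2f}")
```

Sorting by slope puts the customers who most need attention at the top of the list.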
One thing must be taken into account: use only data for time periods that have already finished. For example, if you base the analysis on monthly sales reports, skip the current month if it is not over yet, because its data is incomplete and can distort the regression line.
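One simple way to apply this rule is to filter out the current month before fitting. This sketch assumes the counts are keyed by (year, month) tuples; the helper name drop_unfinished_month and the sample numbers are hypothetical:

```python
from datetime import date


def drop_unfinished_month(monthly_counts, today=None):
    """Keep only the (year, month) entries that lie strictly before the month of `today`."""
    today = today or date.today()
    current = (today.year, today.month)
    # tuples compare element by element, so (2023, 12) < (2024, 1)
    return {
        period: count
        for period, count in monthly_counts.items()
        if period < current
    }


# made-up example: January 2024 is still in progress on 10 January
counts = {(2023, 11): 40, (2023, 12): 44, (2024, 1): 7}
finished = drop_unfinished_month(counts, today=date(2024, 1, 10))
print(finished)
```

Passing today explicitly keeps the function easy to test; in normal use it can be omitted.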