Matrix Factorization — Deep Dive

Suhas Aithal · Published in Analytics Vidhya · Dec 11, 2019 · 5 min read


Image Sources — TechCrunch, Netflix & Kdnuggets

This article is the continuation of Matrix Factorization for Collaborative Filtering.

Here, we take an example user-item matrix A and walk through how the factorization and prediction take place. The implementation is done in Python.

Matrix A contains all users represented in rows and all movies represented in columns as shown in Figure 1.

Figure 1
Figure 2

Performing Matrix Factorization (MF) on matrix A, we get the following equations.

Figure 3
Figure 4
Figure 5
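
Since the equation images in Figures 3–5 are not reproduced here, the factorization they describe can be written out as follows; this is a reconstruction from the surrounding definitions, not the original figures:

A ≈ B·Cᵀ

so that each entry of A is approximated by the dot product of a user vector and a movie vector:

Aᵢⱼ ≈ Bᵢ·Cⱼ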

Here, B and C are matrices of dimensions (n x d) and (m x d) respectively, where n is the number of users, m is the number of movies, and d is a hyperparameter giving the dimension of the user and movie vectors in B and C.

The objective of MF is to find values of B and C that satisfy the equation in Figure 3 as closely as possible. The objective function is shown in Figure 6.

Figure 6
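
In the article's notation, the objective function minimizes the squared error over the non-zero entries only; a reconstruction of what Figure 6 shows:

minimize over B, C:  Σ (Aᵢⱼ − Bᵢ·Cⱼ)²,  summed over all (i, j) with Aᵢⱼ ≠ 0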

Although the objective function in Figure 6 is not jointly convex in B and C (it is convex in each factor only when the other is held fixed), Gradient Descent (GD) still works well in practice to obtain good values of B and C. The equations to update the values of Bᵢ and Cⱼ are as shown in Figure 7.

Figure 7
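
Written out to match the code that follows (with r as the learning rate; a reconstruction of the figure), the updates for one non-zero entry Aᵢⱼ are:

Bᵢ ← Bᵢ + r · 2Cⱼ (Aᵢⱼ − Bᵢ·Cⱼ)   … (5)
Cⱼ ← Cⱼ + r · 2Bᵢ (Aᵢⱼ − Bᵢ·Cⱼ)   … (6)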

Let us take an example to understand this better. Let the number of users be 3, the number of movies be 4, and matrix A be as shown in Figure 8.

Figure 8
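
Reading off the non-zero positions listed later in the article (A₁₁, A₁₂, A₁₃, A₂₁, A₂₂, A₂₄, A₃₁), matrix A is presumably the following binary matrix; a sketch in NumPy:

import numpy as np

# User-item matrix: rows = 3 users, columns = 4 movies,
# 1 = the user has watched the movie, 0 = unknown
A = np.array([[1, 1, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 0, 0]])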

To reiterate our objective: based on the non-zero values of A, we have to use MF to predict whether each zero value of A should remain 0 or become 1.

Let us initialize matrices B and C by drawing random samples from a normal distribution, as shown below.

n, m = A.shape  # number of users, number of movies
d = 2           # latent dimension (an assumed value for this example)
B = np.random.normal(scale=1.0, size=(n, d))
C = np.random.normal(scale=1.0, size=(m, d))

The following code performs the update rules given as equations (5) and (6) in Figure 7.

r = 0.01  # learning rate (an assumed value)
for i in range(n):
    for j in range(m):
        if A[i][j] > 0:  # consider only the non-zero elements of A
            B_old = B[i].copy()  # keep the pre-update user vector for the C update
            B[i] += r * (2 * C[j]) * (A[i][j] - np.dot(B[i], C[j]))
            C[j] += r * (2 * B_old) * (A[i][j] - np.dot(B_old, C[j]))

Upon updating the vectors Bᵢ and Cⱼ, we compute the predicted value of Aᵢⱼ and accumulate the squared error between the actual and predicted values of Aᵢⱼ (with rmse initialized to 0 before the pass). The following code describes the same.

rmse += np.power(A[i][j] - np.dot(B[i], C[j]), 2)  # squared error for one non-zero entry

Once the squared errors for all the non-zero values of Aᵢⱼ have been accumulated, the Root Mean Squared Error (RMSE) over those entries is computed and appended to a list (rmse_arr, initialized empty before training) that keeps track of the RMSE value at each iteration. The following code describes the same.

rmse_arr.append(np.round(np.sqrt(rmse / np.count_nonzero(A)), 4))  # RMSE: mean squared error over the non-zero entries, then square root

The above process is repeated until the desired RMSE is obtained. The final predicted values of A (Aₚ) are as shown in Figure 9.
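
Putting the pieces together, here is a minimal sketch of the full training loop and the prediction step; the cap of 1000 passes and the stopping tolerance of 0.01 are assumed values, not from the original article:

rmse_arr = []                    # RMSE history, one value per pass over A
for epoch in range(1000):        # assumed cap on the number of passes
    rmse = 0.0
    for i in range(n):
        for j in range(m):
            if A[i][j] > 0:
                B_old = B[i].copy()
                B[i] += r * (2 * C[j]) * (A[i][j] - np.dot(B[i], C[j]))
                C[j] += r * (2 * B_old) * (A[i][j] - np.dot(B_old, C[j]))
                rmse += np.power(A[i][j] - np.dot(B[i], C[j]), 2)
    rmse_arr.append(np.round(np.sqrt(rmse / np.count_nonzero(A)), 4))
    if rmse_arr[-1] < 0.01:      # stop once the desired RMSE is reached (assumed tolerance)
        break

A_p = np.dot(B, C.T)             # the predicted matrix Aₚ (Figure 9)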

Figure 9
Figure 10

If we compare the non-zero values in A [A₁₁, A₁₂, A₁₃, A₂₁, A₂₂, A₂₄, A₃₁] to the corresponding values in Aₚ, the differences shown in Figure 10 are very small. This implies that our model (the values of the matrices B and C) is predicting the non-zero values (the movies that the users have watched) correctly. The same model also gives us values at the locations where A is zero [A₁₄, A₂₃, A₃₂, A₃₃, A₃₄]. These are our predicted values.
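
This comparison can be checked directly; a one-line sketch using the A and A_p defined above:

observed_error = np.abs(A - A_p)[A > 0]  # per-entry error at the non-zero (watched) positions, as in Figure 10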

However, the predicted values are in fractional form whereas we need them to be a 1 or a 0. To convert these decimal numbers to a binary form, we can use a threshold value where the values lying below this threshold will be considered as 0 and the values lying above will be considered as 1. The value of the threshold can be determined either by a domain expert or by setting up a cross-validation system.

If we keep the threshold as 0.5 for the predicted values, we get the results as shown in Figure 11.
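
With A_p from above, this thresholding is one line:

A_pred = (A_p >= 0.5).astype(int)  # 1 = predicted watch, 0 = predicted non-watch (Figure 11)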

Figure 11

The predictions obtained are not absolute; some of them are bound to be wrong. Hence, to improve the accuracy of the prediction, the model has to be retrained regularly with the updated non-zero values of A. Also, our model overfits to matrix A and will not generalize, since we have not added any regularization term to the objective function specified in Figure 6.

The implementation of the same using regularization can be found here.

References

  1. http://www.albertauyeung.com/post/python-matrix-factorization/
