Matrix completion is a procedure for imputing the missing entries of a matrix using the information in its observed entries. This procedure can be visualized as:
Matrix completion has attracted a lot of attention and is widely applied in:
eimpute is a computationally efficient R package for matrix completion. In eimpute, the matrix completion problem is solved by iteratively performing low-rank approximation and data calibration, an approach that enjoys two advantages:
We compare eimpute and softimpute on synthetic datasets \(X_{m \times m}\) with a proportion \(p\) of missing observations. The square matrix \(X_{m \times m}\) is generated by \(X = UV + \epsilon\), where \(U\) and \(V\) are \(m \times r\) and \(r \times m\) matrices whose entries are sampled i.i.d. from the standard normal distribution, and \(\epsilon \sim N(0, r/3)\).
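The data-generating setup described above can be sketched in base R. This is only an illustration under stated assumptions: the function name `make_incomplete` and its arguments are hypothetical, \(r/3\) is treated as the noise variance, and the missing entries are assumed to be chosen uniformly at random. It is not the code used to produce the reported comparison.

```r
# Sketch of the simulation setup: X = U V + eps, then a proportion p of
# entries is set to NA. Names here are illustrative, not part of eimpute.
make_incomplete <- function(m, r, p, seed = 1) {
  set.seed(seed)
  U <- matrix(rnorm(m * r), nrow = m, ncol = r)  # m x r, standard normal
  V <- matrix(rnorm(r * m), nrow = r, ncol = m)  # r x m, standard normal
  # Noise term; r / 3 is interpreted as the variance (an assumption).
  eps <- matrix(rnorm(m * m, mean = 0, sd = sqrt(r / 3)), nrow = m, ncol = m)
  X <- U %*% V + eps
  # Assumption: missing positions are drawn uniformly at random.
  n_missing <- round(p * m * m)
  missing_idx <- sample(m * m, n_missing)
  X_na <- X
  X_na[missing_idx] <- NA
  list(x = X, x_na = X_na)
}

sim <- make_incomplete(m = 100, r = 5, p = 0.3)
dim(sim$x_na)          # dimensions of the incomplete matrix
mean(is.na(sim$x_na))  # realized proportion of missing entries
```

Keeping the complete matrix `x` alongside `x_na` makes it possible to measure the error of any imputation on the entries that were removed.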
In the high-dimensional case, the als method in softimpute is slightly faster than eimpute when the proportion of missing observations is low. As the proportion of missing observations increases, the rsvd method in eimpute outperforms softimpute in both time cost and test error. Of the two methods in **eimpute**, rsvd has a lower time cost than tsvd.
Install the stable version from CRAN:
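Assuming the package is listed on CRAN under the name eimpute, the standard base R installation call would be:

```r
# Standard CRAN installation; the package name "eimpute" is taken from the text above.
install.packages("eimpute")
```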
Install the development version from GitHub:
We start with a toy example. Let us generate a small matrix with some missing values via the incomplete.generator function.
library(eimpute)

m <- 6  # number of rows
n <- 5  # number of columns
r <- 3  # rank of the underlying matrix
x_na <- incomplete.generator(m, n, r)
x_na
#>            [,1]       [,2]       [,3]      [,4]       [,5]
#> [1,] -0.8269428  1.2228586         NA        NA         NA
#> [2,] -2.2410010  4.5095165         NA        NA         NA
#> [3,]  0.4499102         NA -0.2818085 0.7718102 -0.8364048
#> [4,]         NA  1.7167365  0.9480745        NA  3.5680208
#> [5,]         NA  0.7240437         NA        NA  0.2633712
#> [6,]         NA -2.8879249         NA 1.2027552         NA
Use the eimpute function to impute the missing values.
x_impute <- eimpute(x_na, r)
x_impute[["x.imp"]]
#>            [,1]       [,2]        [,3]      [,4]       [,5]
#> [1,] -0.8269428  1.2228586  0.19035820 0.9514541  0.2994880
#> [2,] -2.2410010  4.5095165  0.39560039 0.7295574  0.4911418
#> [3,]  0.4499102 -1.2083884 -0.28180850 0.7718102 -0.8364048
#> [4,] -0.3408353  1.7167365  0.94807452 0.1835412  3.5680208
#> [5,] -0.3669454  0.7240437  0.11988844 0.3294654  0.2633712
#> [6,]  1.3875965 -2.8879249  0.01871091 1.2027552  0.4512052
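In the output above, the observed entries are unchanged and only the NA positions are filled in. The general idea of alternating low-rank approximation with data calibration can be sketched in base R as follows. This is a generic illustration, not eimpute's actual implementation: the function `lowrank_impute`, its arguments, the zero initialization, and the stopping rule are all assumptions made for the example.

```r
# Generic sketch: alternate a rank-r truncated SVD with restoring the
# observed entries. Illustrative only; not the code used inside eimpute.
lowrank_impute <- function(x_na, r, max_iter = 100, tol = 1e-6) {
  observed <- !is.na(x_na)
  z <- x_na
  z[!observed] <- 0  # assumption: start with zeros in the missing positions
  for (iter in seq_len(max_iter)) {
    # Low-rank approximation: keep the r leading singular triplets.
    s <- svd(z, nu = r, nv = r)
    approx <- s$u %*% (s$d[seq_len(r)] * t(s$v))
    # Data calibration: put the observed values back, keep estimates elsewhere.
    z_new <- approx
    z_new[observed] <- x_na[observed]
    # Relative change between successive iterates (Frobenius norm).
    change <- sqrt(sum((z_new - z)^2)) / max(sqrt(sum(z^2)), 1e-12)
    z <- z_new
    if (change < tol) break
  }
  z
}

# Small usage example on a rank-2 matrix with 8 entries removed.
set.seed(1)
toy <- matrix(rnorm(6 * 2), 6, 2) %*% matrix(rnorm(2 * 5), 2, 5)
toy_na <- toy
toy_na[sample(length(toy), 8)] <- NA
toy_hat <- lowrank_impute(toy_na, r = 2)
anyNA(toy_hat)  # no missing values remain after imputation
```

Because the calibration step runs last in every iteration, the returned matrix always agrees exactly with the input on the observed entries.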