Introduction
Eigenvalue decomposition is a commonly used technique in numerous statistical problems. For example, principal component analysis (PCA) basically conducts eigenvalue decomposition on the sample covariance of a data matrix: the eigenvalues are the component variances, and eigenvectors are the variable loadings.
In R, the standard way to compute eigenvalues is the
eigen()
function. However, when the matrix becomes large,
eigen()
can be very time-consuming: the complexity to
calculate all eigenvalues of a \(n \times
n\) matrix is \(O(n^3)\).
While in real applications, we usually only need to compute a few
eigenvalues or eigenvectors, for example to visualize high dimensional
data using PCA, we may only use the first two or three components to
draw a scatterplot. Unfortunately in eigen()
, there is no
option to limit the number of eigenvalues to be computed. This means
that we always need to do the full eigen decomposition, which can cause
a huge waste in computation.
The same thing happens in Singular Value Decomposition (SVD). It is often the case that only a Partial SVD or Truncated SVD is needed, and moreover the matrix is usually stored in sparse format. Base R does not provide functions suitable for these special needs.
And this is why the RSpectra package was developed. RSpectra provides an R interface to the Spectra library, which is used to solve large scale eigenvalue problems. The core part of Spectra and RSpectra was written in efficient C++ code, and they can handle large scale matrices in either dense or sparse formats.
Eigenvalue Problem
Basic Usage
The RSpectra package provides functions
eigs()
and eigs_sym()
to calculate eigenvalues
of general and symmetric matrices respectively. If the matrix is known
to be symmetric, eigs_sym()
is preferred since it
guarantees that the eigenvalues are real.
To obtain eigenvalues of a square matrix A
, simply call
the eigs()
or eigs_sym()
function, tell it how
many eigenvalues are requested (argument k
), and which ones
are going to be selected (argument which
). By default,
which = "LM"
means to pick the eigenvalues with the largest
magnitude (modulus for complex numbers and absolute value for real
numbers).
Below we first generate some random matrices and display some of the eigenvalues and eigenvectors:
set.seed(123)
= 100 # matrix size
n = 5 # number of eigenvalues to calculate
k # Some random data
= matrix(rnorm(n^2), n)
M # Make it symmetric
= crossprod(M)
A # Show its largest 5 eigenvalues and associated eigenvectors
head(eigen(A)$values, 5)
## [1] 391.3649 367.1143 353.5425 322.6301 315.4341
head(eigen(A)$vectors[, 1:5])
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.12981428 -0.011305031 -0.0001022712 0.088485329 0.05035607
## [2,] -0.06690188 0.001012109 0.0103448033 -0.009411417 -0.07992121
## [3,] -0.05333064 -0.057256548 0.0998913793 -0.171712454 0.01334678
## [4,] 0.13423510 -0.165303749 0.0076458165 -0.006789840 0.10246642
## [5,] -0.13023611 -0.019154947 0.0425667905 -0.160273074 -0.06246803
## [6,] -0.06632982 -0.053457170 0.0323196492 -0.074827181 -0.16365506
Now we use RSpectra to directly obtain the largest 5 eigenvalues:
library(RSpectra)
= eigs_sym(A, k, which = "LM") # "LM" is the default
res $values res
## [1] 391.3649 367.1143 353.5425 322.6301 315.4341
head(res$vectors)
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.12981428 0.011305031 0.0001022712 0.088485329 0.05035607
## [2,] 0.06690188 -0.001012109 -0.0103448033 -0.009411417 -0.07992121
## [3,] 0.05333064 0.057256548 -0.0998913793 -0.171712454 0.01334678
## [4,] -0.13423510 0.165303749 -0.0076458165 -0.006789840 0.10246642
## [5,] 0.13023611 0.019154947 -0.0425667905 -0.160273074 -0.06246803
## [6,] 0.06632982 0.053457170 -0.0323196492 -0.074827181 -0.16365506
If only eigenvalues are requested, the retvec
option can
be used:
eigs_sym(A, k, opts = list(retvec = FALSE))
## $values
## [1] 391.3649 367.1143 353.5425 322.6301 315.4341
##
## $vectors
## NULL
##
## $nconv
## [1] 5
##
## $niter
## [1] 4
##
## $nops
## [1] 59
Matrix in Sparse Format
For really large data, the matrix is usually stored in sparse format.
RSpectra supports the dgCMatrix
and
dgRMatrix
matrix types defined in the
Matrix package.
library(Matrix)
= as(M, "dgCMatrix")
Msp = as(A, "dgRMatrix")
Asp
eigs(Msp, k, which = "LR", opts = list(retvec = FALSE))$values # largest real part
## [1] 9.685399-1.964917i 9.685399+1.964917i 9.696766+0.000000i 8.270271+1.751652i
## [5] 8.270271-1.751652i
eigs_sym(Asp, k, opts = list(retvec = FALSE))$values
## [1] 391.3649 367.1143 353.5425 322.6301 315.4341
An even more general way to specify the matrix A
is to
define a function that calculates A %*% x
for any vector
x
. This representation is called the Function
Interface, which can also be regarded as a sparse format, since
users do not need to store the matrix A
, but only the
operator x -> Ax
. For example, to represent a diagonal
matrix \(diag(1, 2, \ldots, 10)\) and
calculate its eigenvalues, we can define the function f
and
call the eigs_sym()
function:
# Implicitly define the matrix by a function that calculates A %*% x
# Below represents a diagonal matrix whose elements are stored in the `args` parameter
= function(x, args)
f
{# diag(args) %*% x == x * args
return(x * args)
}eigs_sym(f, k = 3, n = 10, args = 1:10)
## $values
## [1] 10 9 8
##
## $vectors
## [,1] [,2] [,3]
## [1,] 2.307346e-16 1.087717e-16 -1.186185e-17
## [2,] 9.737491e-17 8.384949e-17 5.104424e-16
## [3,] -6.923647e-17 -7.659888e-17 5.290907e-17
## [4,] -6.640738e-18 -7.632783e-17 -7.285839e-17
## [5,] -8.239937e-18 8.673617e-18 2.775558e-17
## [6,] 1.368195e-16 2.836273e-16 -3.677614e-16
## [7,] 3.652440e-16 5.412879e-16 2.547875e-16
## [8,] -8.036987e-17 9.469422e-16 1.000000e+00
## [9,] 2.618602e-17 -1.000000e+00 9.074772e-16
## [10,] 1.000000e+00 -2.263780e-16 3.543410e-16
##
## $nconv
## [1] 3
##
## $niter
## [1] 1
##
## $nops
## [1] 10
n
gives the dimension of the matrix, and
args
contains user data that will be passed to
f
.
Smallest Eigenvalues
Sometimes you may need to calculate the smallest (in magnitude)
k
eigenvalues of a matrix. A direct way to do so is to use
eigs(..., which = "SM")
. However, the algorithm of
RSpectra is good at finding large eigenvalues rather
than small ones, so which = "SM"
tends to require much more
iterations or even may not converge.
The recommended way to retrieve the smallest eigenvalues is to utilize the spectral transformation: If \(A\) has eigenvalues \(\lambda_i\), then \((A-\sigma I)^{-1}\) has eigenvalues \(1/(\lambda_i-\sigma)\). Therefore, we can set \(\sigma = 0\) and calculate the largest eigenvalues of \(A^{-1}\), whose reciprocals are exactly the smallest eigenvalues of \(A\).
eigs()
implements the spectral transformation via the
parameter sigma
, so the following code returns the smallest
eigenvalues of A
. Note that eigs()
always
returns eigenvalues in the original scale (i.e., \(\lambda_i\) instead of \(1/(\lambda_i-\sigma)\)).
eigs_sym(A, k, which = "LM", sigma = 0)$values # recommended way
## [1] 0.39278178 0.21130356 0.11411209 0.02151974 0.00473068
eigs_sym(A, k, which = "SM")$values # not recommended
## [1] 0.39278178 0.21130356 0.11411209 0.02151974 0.00473068
More generally, the option which = "LM", sigma = s
selects eigenvalues that are closest to s
.
Truncated SVD
Truncated SVD (or Partial SVD) is frequently used in text mining and image compression, which computes the leading singular values and singular vectors of a rectangular matrix.
RSpectra has the svds()
function to
compute Truncated SVD:
set.seed(123)
= 100
m = 20
n = 5
k = matrix(rnorm(m * n), m)
A
str(svds(A, k, nu = k, nv = k))
## List of 5
## $ d : num [1:5] 14.1 13.8 13 11.8 11.3
## $ u : num [1:100, 1:5] 0.1616 0.0871 0.0536 0.0537 0.1271 ...
## $ v : num [1:20, 1:5] -0.186 0.116 0.27 -0.273 0.1 ...
## $ niter: num 1
## $ nops : num 45
Similar to eigs()
, svds()
supports sparse
matrices:
= as(A, "dgCMatrix")
Asp svds(Asp, k, nu = 0, nv = 0)
## $d
## [1] 14.13079 13.76937 12.95764 11.82205 11.30081
##
## $u
## NULL
##
## $v
## NULL
##
## $niter
## [1] 1
##
## $nops
## [1] 40
svds()
also has the function interface: users provide
functions A
and Atrans
that calcualtes \(Ax\) and \(A'x\) respectively, and
svds()
will compute the Truncated SVD.
= function(x, args) as.numeric(args %*% x)
Af = function(x, args) as.numeric(crossprod(args, x))
Atransf str(svds(Af, k, Atrans = Atransf, dim = c(m, n), args = Asp))
## List of 5
## $ d : num [1:5] 14.1 13.8 13 11.8 11.3
## $ u : num [1:100, 1:5] 0.1616 0.0871 0.0536 0.0537 0.1271 ...
## $ v : num [1:20, 1:5] -0.186 0.116 0.27 -0.273 0.1 ...
## $ niter: num 1
## $ nops : num 45
Performance
RSpectra is written in efficient C++ code. This page gives some
benchmarking results of svds()
with other similar functions
available in R and extension packages.