matsindf_apply is a powerful and versatile function that enables
analysis of data frames by applying FUN in helpful ways. The function is
called matsindf_apply because it can be used to apply FUN to a matsindf
data frame, that is, a data frame whose individual entries are matrices.
(A matsindf data frame can be created by calling collapse_to_matrices,
as demonstrated below.) But matsindf_apply can apply FUN across much
more: data frames of single numbers, lists of matrices, lists of single
numbers, and individual numbers. This vignette demonstrates
matsindf_apply, starting with simple examples and proceeding to
sophisticated analyses.
The basis of all analyses conducted with matsindf_apply is a function
(FUN) to be applied across data. FUN must return a named list of
variables, its result. Here is an example function that both adds and
subtracts its arguments, a and b, and returns a list containing its
results, c and d.
example_fun <- function(a, b){
  return(list(c = sum_byname(a, b), d = difference_byname(a, b)))
}
Similar to lapply and its siblings, additional argument(s) to
matsindf_apply include the data over which FUN is to be applied. These
arguments can, in the first instance, be supplied as named arguments to
the ... argument of matsindf_apply. The ... arguments to matsindf_apply
are passed to FUN according to their names. In this case, the output of
matsindf_apply is the named list returned by FUN.
matsindf_apply(FUN = example_fun, a = 2, b = 1)
#> $c
#> [1] 3
#>
#> $d
#> [1] 1
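The name-based matching described above behaves much like do.call in base R. The following is a minimal sketch of that idea, not the implementation of matsindf_apply. plain_fun is a hypothetical stand-in for example_fun that uses ordinary arithmetic, so the block runs without matsbyname.

```r
# plain_fun is a hypothetical stand-in for example_fun (base R arithmetic only).
plain_fun <- function(a, b) {
  list(c = a + b, d = a - b)
}
# Arguments are matched to the function signature by name,
# so the order in which they are supplied does not matter.
res <- do.call(plain_fun, list(b = 1, a = 2))
str(res)
```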
Passing an additional argument (z = 2) causes the familiar unused
argument error, because example_fun does not have a z argument.
tryCatch(
  matsindf_apply(FUN = example_fun, a = 2, b = 1, z = 2),
  error = function(e){e}
)
#> <simpleError in FUN(...): unused argument (z = 2)>
Failing to pass a needed argument (b = 1) causes the familiar argument X
is missing error, because example_fun requires a value for b.
tryCatch(
  matsindf_apply(FUN = example_fun, a = 2),
  error = function(e){e}
)
#> <simpleError in sum_byname(a, b): argument "b" is missing, with no default>
(If example_fun tolerated a missing argument, no such error would be
raised.)
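One way a function can tolerate a missing argument is to give that argument a default value. The sketch below illustrates this with a direct call. tolerant_fun is a hypothetical variant of example_fun, and the default of 0 is an arbitrary choice made for illustration.

```r
# tolerant_fun is a hypothetical variant of example_fun (base R arithmetic only).
# The default value for b makes a call without b valid.
tolerant_fun <- function(a, b = 0) {
  list(c = a + b, d = a - b)
}
res <- tolerant_fun(a = 2)
str(res)
```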
Alternatively, arguments to FUN can be given in a named list to the
first argument to matsindf_apply (.dat). When a value is assigned to
.dat, the return value from matsindf_apply contains all named variables
in .dat (in this case both a and b) in addition to the results provided
by FUN (in this case both c and d).
matsindf_apply(list(a = 2, b = 1), FUN = example_fun)
#> $a
#> [1] 2
#>
#> $b
#> [1] 1
#>
#> $c
#> [1] 3
#>
#> $d
#> [1] 1
Extra variables are tolerated in .dat, because .dat is considered to be
a store of data from which variables can be drawn as needed.
matsindf_apply(list(a = 2, b = 1, z = 42), FUN = example_fun)
#> $a
#> [1] 2
#>
#> $b
#> [1] 1
#>
#> $z
#> [1] 42
#>
#> $c
#> [1] 3
#>
#> $d
#> [1] 1
In contrast, named arguments to ... are specified by the user, so
including an extra variable is considered an error, as shown above.
If a named argument is supplied by both .dat and ..., the argument in
... takes precedence, overriding the argument in .dat.
matsindf_apply(list(a = 2, b = 1), FUN = example_fun, a = 10)
#> $a
#> [1] 10
#>
#> $b
#> [1] 1
#>
#> $c
#> [1] 11
#>
#> $d
#> [1] 9
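The precedence rule can be mimicked in base R with modifyList. This is only a sketch of the observable behavior shown above, under the assumption that both sources are plain named lists. It does not claim to reproduce how matsindf_apply merges its inputs internally.

```r
dat  <- list(a = 2, b = 1)   # plays the role of .dat
dots <- list(a = 10)         # plays the role of ...
# modifyList() replaces items in dat with same-named items in dots.
args <- modifyList(dat, dots)
str(args)
```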
When supplying both .dat and ..., the ... argument can contain named
strings, which are interpreted as mappings from item names in .dat to
arguments in the signature of FUN. In the example below, a = "z"
indicates that argument a to FUN should be supplied by item z in .dat.
matsindf_apply(list(a = 2, b = 1, z = 42),
               FUN = example_fun, a = "z")
#> $a
#> [1] 2
#>
#> $b
#> [1] 1
#>
#> $z
#> [1] 42
#>
#> $c
#> [1] 43
#>
#> $d
#> [1] 41
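The string mapping can be pictured as a lookup from argument names to item names. The base R sketch below reproduces the values of c and d shown above. plain_fun and mapping are hypothetical names introduced for illustration, and the lookup logic is an assumption about the behavior, not the package's code.

```r
# plain_fun is a hypothetical stand-in for example_fun (base R arithmetic only).
plain_fun <- function(a, b) {
  list(c = a + b, d = a - b)
}
dat <- list(a = 2, b = 1, z = 42)
mapping <- c(a = "z")                 # argument a is drawn from item z
# Start from the items whose names already match the signature ...
args <- dat[c("a", "b")]
# ... then overwrite the mapped arguments with the items they point to.
args[names(mapping)] <- dat[unname(mapping)]
res <- do.call(plain_fun, args)
str(res)
```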
If a name appears in both .dat and the output of FUN, a name collision
occurs in the output of matsindf_apply, and a warning is issued.
tryCatch(
  matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun),
  warning = function(w){w}
)
#> <simpleWarning in matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun): name collision in matsindf_apply: c>
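A collision of this kind can be detected by intersecting the names of the input with the names of the result. The sketch below shows that check in base R. The variable names are hypothetical, and whether matsindf_apply performs the check in this way is an assumption.

```r
dat <- list(a = 2, b = 1, c = 42)     # input that already contains an item named c
fun_result <- list(c = 3, d = 1)      # what example_fun would return for a = 2, b = 1
# Names present in both the input and the result collide.
collisions <- intersect(names(dat), names(fun_result))
if (length(collisions) > 0) {
  message("name collision: ", paste(collisions, collapse = ", "))
}
```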
.dat can be a list (as shown in several examples above), but it can
also be a data frame.
df <- data.frame(a = 2:4, b = 1:3)
matsindf_apply(df, FUN = example_fun)
#> a b c d
#> 1 2 1 3 1
#> 2 3 2 5 1
#> 3 4 3 7 1
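The data frame result above is what one would get by applying the function to each row and binding the results back as new columns. A rough base R equivalent follows. plain_fun is a hypothetical stand-in for example_fun, and the row-by-row strategy is an assumption made for illustration.

```r
# plain_fun is a hypothetical stand-in for example_fun (base R arithmetic only).
plain_fun <- function(a, b) {
  list(c = a + b, d = a - b)
}
df <- data.frame(a = 2:4, b = 1:3)
# Call plain_fun once per row, passing the columns by name.
rows <- Map(plain_fun, a = df$a, b = df$b)
# Bind each named result back onto the data frame as a new column.
out <- cbind(df,
             c = vapply(rows, function(r) r$c, numeric(1)),
             d = vapply(rows, function(r) r$d, numeric(1)))
out
```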
Furthermore, matsindf_apply works with a matsindf data frame, a data
frame wherein each entry is a matrix. To demonstrate use of
matsindf_apply with a matsindf data frame, we’ll construct a simple
matsindf data frame (midf) using functions in this package.
# Create a tidy data frame containing data for matrices
tidy <- data.frame(Year = rep(c(rep(2017, 4), rep(2018, 4)), 2),
                   matnames = c(rep("U", 8), rep("V", 8)),
                   matvals = c(1:4, 11:14, 21:24, 31:34),
                   rownames = c(rep(c(rep("p1", 2), rep("p2", 2)), 2),
                                rep(c(rep("i1", 2), rep("i2", 2)), 2)),
                   colnames = c(rep(c("i1", "i2"), 4),
                                rep(c("p1", "p2"), 4))) %>%
  mutate(
    rowtypes = case_when(
      matnames == "U" ~ "product",
      matnames == "V" ~ "industry",
      TRUE ~ NA_character_
    ),
    coltypes = case_when(
      matnames == "U" ~ "industry",
      matnames == "V" ~ "product",
      TRUE ~ NA_character_
    )
  )
tidy
#> Year matnames matvals rownames colnames rowtypes coltypes
#> 1 2017 U 1 p1 i1 product industry
#> 2 2017 U 2 p1 i2 product industry
#> 3 2017 U 3 p2 i1 product industry
#> 4 2017 U 4 p2 i2 product industry
#> 5 2018 U 11 p1 i1 product industry
#> 6 2018 U 12 p1 i2 product industry
#> 7 2018 U 13 p2 i1 product industry
#> 8 2018 U 14 p2 i2 product industry
#> 9 2017 V 21 i1 p1 industry product
#> 10 2017 V 22 i1 p2 industry product
#> 11 2017 V 23 i2 p1 industry product
#> 12 2017 V 24 i2 p2 industry product
#> 13 2018 V 31 i1 p1 industry product
#> 14 2018 V 32 i1 p2 industry product
#> 15 2018 V 33 i2 p1 industry product
#> 16 2018 V 34 i2 p2 industry product
# Convert to a matsindf data frame
midf <- tidy %>%
  group_by(Year, matnames) %>%
  collapse_to_matrices(rowtypes = "rowtypes", coltypes = "coltypes") %>%
  spread(key = "matnames", value = "matvals")
# Take a look at the midf data frame and some of the matrices it contains.
midf
#> Year U V
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34
midf$U[[1]]
#> i1 i2
#> p1 1 2
#> p2 3 4
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
midf$V[[1]]
#> p1 p2
#> i1 21 22
#> i2 23 24
#> attr(,"rowtype")
#> [1] "industry"
#> attr(,"coltype")
#> [1] "product"
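To see what collapsing does for a single group, the 2017 U matrix can be rebuilt from its four tidy rows with base R alone. This sketch uses tapply as a stand-in and is not how collapse_to_matrices works internally. It also omits the rowtype and coltype attributes that appear in the output above.

```r
# The four tidy rows that describe the U matrix for 2017.
tidy_U <- data.frame(matvals  = 1:4,
                     rownames = c("p1", "p1", "p2", "p2"),
                     colnames = c("i1", "i2", "i1", "i2"))
# Cross-tabulate the values by row name and column name to form a matrix.
U <- tapply(tidy_U$matvals,
            list(tidy_U$rownames, tidy_U$colnames),
            FUN = sum)
U
```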
With midf in hand, we can demonstrate use of tidyverse-style functional
programming to perform matrix algebra within a data frame. The functions
of the matsbyname package (such as difference_byname below) can be used
for this purpose.
result <- midf %>%
  mutate(
    W = difference_byname(transpose_byname(V), U)
  )
result
#> Year U V W
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
result$W[[1]]
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
result$W[[2]]
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
This way of performing matrix calculations works equally well within a
2-row matsindf data frame (as shown above) or a 1000-row matsindf data
frame.
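The calculation of W for 2017 can be checked by hand in base R. The sketch below aligns the transposed V with U by row and column name before subtracting, which is a rough analogue of a by-name operation for two matrices that share the same set of names. V is deliberately stored in a different order to show that alignment is by name, not by position. This is an illustration only, not the matsbyname implementation.

```r
U <- matrix(c(1, 3, 2, 4), nrow = 2,
            dimnames = list(c("p1", "p2"), c("i1", "i2")))
# V holds the same values as midf$V[[1]], but with rows and columns reordered.
V <- matrix(c(24, 22, 23, 21), nrow = 2,
            dimnames = list(c("i2", "i1"), c("p2", "p1")))
# Transpose V, reorder it to match the names of U, then subtract.
W <- t(V)[rownames(U), colnames(U)] - U
W
```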
Users can write their own functions using matsindf_apply. A flexible
calc_W function can be written as follows.
calc_W <- function(.DF = NULL, U = "U", V = "V", W = "W"){
  # The inner function does all the work.
  W_func <- function(U_mat, V_mat){
    # When we get here, U_mat and V_mat will be single matrices or single numbers,
    # not a column in a data frame or an item in a list.
    # Calculate W_mat from the inputs U_mat and V_mat.
    W_mat <- difference_byname(transpose_byname(V_mat), U_mat)
    # Return a named list.
    list(W_mat) %>% magrittr::set_names(W)
  }
  # The body of the main function consists of a call to matsindf_apply
  # that specifies the inner function
  matsindf_apply(.DF, FUN = W_func, U_mat = U, V_mat = V)
}
This style of writing matsindf_apply functions is incredibly versatile,
leveraging the capabilities of both the matsindf and matsbyname
packages. (Indeed, the Recca package uses matsindf_apply heavily and is
built upon the functions in the matsindf and matsbyname packages.)
Functions written like calc_W can operate in ways similar to
matsindf_apply itself. To demonstrate, we’ll use calc_W in all the ways
that matsindf_apply can be used, going in the reverse order to our
demonstration of the capabilities of matsindf_apply above.
calc_W can be used as a specialized mutate function that operates on
matsindf data frames.
midf %>% calc_W()
#> Year U V W
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
The added column could be given a different name from the default (“W”)
using the W argument.
midf %>% calc_W(W = "W_prime")
#> Year U V W_prime
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
As with matsindf_apply, column names in midf can be mapped to the
arguments of calc_W by passing those names as strings.
midf %>%
  rename(X = U, Y = V) %>%
  calc_W(U = "X", V = "Y")
#> Year X Y W
#> 1 2017 1, 3, 2, 4 21, 23, 22, 24 20, 19, 21, 20
#> 2 2018 11, 13, 12, 14 31, 33, 32, 34 20, 19, 21, 20
calc_W can operate on lists of single matrices, too. This approach
works because the default values for the U and V arguments to calc_W
are “U” and “V”, respectively. The input list members (in this case
midf$U[[1]] and midf$V[[1]]) are returned with the output.
calc_W(list(U = midf$U[[1]], V = midf$V[[1]]))
#> $U
#> i1 i2
#> p1 1 2
#> p2 3 4
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
#>
#> $V
#> p1 p2
#> i1 21 22
#> i2 23 24
#> attr(,"rowtype")
#> [1] "industry"
#> attr(,"coltype")
#> [1] "product"
#>
#> $W
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
It may be clearer to name the arguments as required by the calc_W
function without wrapping in a list first, as shown below. But in this
approach, the input matrices are not returned with the output.
calc_W(U = midf$U[[1]], V = midf$V[[1]])
#> $W
#> i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "product"
#> attr(,"coltype")
#> [1] "industry"
calc_W can operate on data frames containing single numbers.
data.frame(U = c(1, 2), V = c(3, 4)) %>% calc_W()
#> U V W.1 W.2
#> 1 1 3 2 2
#> 2 2 4 2 2
Finally, calc_W can be applied to single numbers, and the result is a
1x1 matrix.
calc_W(U = 2, V = 3)
#> $W
#> [,1]
#> [1,] 1
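Base R offers an analogy for why a matrix comes back from single numbers: transposing a single number yields a 1x1 matrix, and subtracting a number from it keeps that shape. Whether transpose_byname follows the same path internally is an assumption. The sketch shows only the base R behavior.

```r
# In base R, t() applied to a single number returns a 1x1 matrix.
W <- t(3) - 2
W
dim(W)
```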
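This vignette demonstrated use of the versatile matsindf_apply function.
Inputs to matsindf_apply can be matsindf data frames, data frames of
single numbers, lists of matrices, lists of single numbers, or
individual numbers. matsindf_apply can be used for programming, and
functions constructed as demonstrated above share characteristics with
matsindf_apply: they can be used as specialized mutate operators, and
they can be applied to lists of matrices, data frames of single numbers,
and single numbers.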