library(mustashe)
The following is the actual code for the stash() function, the main function of the ‘mustashe’ package. I have only added a few more comments for clarification.
stash <- function(var, code, depends_on = NULL) {

    # Make sure the stashing directory ".mustashe" is available.
    check_stash_dir()

    # Deparse and format the code.
    deparsed_code <- deparse(substitute(code))
    formatted_code <- format_code(deparsed_code)

    # Make sure the `var` and `code` are not `NULL`.
    if (is.null(var)) stop("`var` cannot be NULL")
    if (formatted_code == "NULL") stop("`code` cannot be NULL")

    # Make a new hash table.
    new_hash_tbl <- make_hash_table(formatted_code, depends_on)

    # if the variable has been stashed:
    #     if the hash tables are equivalent:
    #         load the stored variable
    #     else:
    #         make a new stash
    # else:
    #     make a new stash
    if (has_been_stashed(var)) {
        old_hash_tbl <- get_hash_table(var)
        if (hash_tables_are_equivalent(old_hash_tbl, new_hash_tbl)) {
            message("Loading stashed object.")
            load_variable(var)
        } else {
            message("Updating stash.")
            new_stash(var, formatted_code, new_hash_tbl)
        }
    } else {
        message("Stashing object.")
        new_stash(var, formatted_code, new_hash_tbl)
    }

    invisible(NULL)
}
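Overall, I believe the logic is quite simple. The steps that the stash() function follows, each explained further in the following sections, are: deparse and format the code; build a hash table of the code and any dependencies; compare it with the stashed hash table, if one exists; and, if nothing has been stashed yet or the hash tables differ, evaluate the code and stash the new value and hash table (otherwise, load the stashed value).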
The first step taken by the stash() function is to deparse and format the code.
Deparsing the code means turning the unevaluated expression into a string. This is done by passing the code straight to substitute() and then deparse(). Because R evaluates function arguments lazily, the code is still unevaluated at this point; this must be done before the argument is used anywhere else, otherwise the code would be evaluated. The substitute() function “returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env.”
substitute(x <- 1)
#> x <- 1
The deparse() function “Turn[s] unevaluated expressions into character strings.” Paired with substitute(), it returns a string of the unevaluated code.
deparse(substitute(x <- 1))
#> [1] "x <- 1"
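To see why the capture has to happen first, here is a small sketch; the function names capture_code() and capture_code_eager() are hypothetical and not part of ‘mustashe’. The first function only calls substitute(), so the assignment passed to it never runs. The second touches its argument with force() before deparsing, which runs the assignment in the caller's environment, even though deparse(substitute(code)) still returns the same string.

```r
# Capture code as a string without running it (mirrors the first step of stash()).
capture_code <- function(code) {
  deparse(substitute(code))
}

# Hypothetical contrast: forcing the argument evaluates the promise,
# so the assignment actually runs in the caller's environment.
capture_code_eager <- function(code) {
  force(code)
  deparse(substitute(code))
}

lazy_str <- capture_code(lazy_demo <- 1)
lazy_str             # "lazy_demo <- 1"
exists("lazy_demo")  # FALSE: the code was never evaluated

eager_str <- capture_code_eager(eager_demo <- 1)
eager_str             # "eager_demo <- 1"
exists("eager_demo")  # TRUE: forcing the promise ran the assignment
```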
With the code now a string, it is formatted using the tidy_source() function from ‘formatR’. An internal function in ‘mustashe’, format_code(), handles this process:
format_code <- function(code) {
    fmt_code <- formatR::tidy_source(
        text = code,
        comment = FALSE,
        blank = FALSE,
        arrow = TRUE,
        brace.newline = FALSE,
        indent = 4,
        wrap = TRUE,
        output = FALSE,
        width.cutoff = 80
    )$text.tidy
    paste(fmt_code, sep = "", collapse = "\n")
}
format_code("x <- 2")
#> [1] "x <- 2"
The purpose of formatting the code is to make sure that purely stylistic changes to the input code do not affect the hash table. To demonstrate this, notice that format_code() returns the same output for both of the following code examples.
format_code("x=2")
#> [1] "x <- 2"
format_code("x <- 2 # a comment")
#> [1] "x <- 2"
The hash table is a two-column table with the name and hash value of the code and any (optional) dependencies.
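The hashing is handled by the ‘digest’ package, which takes an R object and reproducibly produces a hash value for it (by default, an MD5 hash of the serialized object), so identical inputs always give identical hashes.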
digest::digest("mustashe")
#> [1] "ac2aad9fdb730500c56009bff6154a7e"
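Putting formatting and hashing together shows why the code is formatted before it is hashed. This sketch reuses format_code() as defined earlier in this post and assumes both ‘formatR’ and ‘digest’ are installed; the variable names raw_same and fmt_same are just for illustration.

```r
# format_code() as defined above (requires the 'formatR' package).
format_code <- function(code) {
  fmt_code <- formatR::tidy_source(
    text = code, comment = FALSE, blank = FALSE, arrow = TRUE,
    brace.newline = FALSE, indent = 4, wrap = TRUE, output = FALSE,
    width.cutoff = 80
  )$text.tidy
  paste(fmt_code, sep = "", collapse = "\n")
}

# Hashing the raw strings: a stylistic difference changes the hash.
raw_same <- digest::digest("x=2") == digest::digest("x <- 2")

# Hashing the formatted strings: both become "x <- 2", so the hashes match.
fmt_same <- digest::digest(format_code("x=2")) ==
  digest::digest(format_code("x <- 2"))

raw_same  # FALSE
fmt_same  # TRUE
```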
A hash value is made for the code and for any of the dependencies linked to the object. This process is handled by the make_hash_table(code, depends_on) internal function.
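The body of make_hash_table() is not shown here, so the following is only a hypothetical sketch of the idea, not the package's code. The function name make_hash_table_sketch(), the "CODE" label, the column names, and looking dependencies up in the global environment are all assumptions. It builds one row for the code and one row per dependency, so changing a dependency's value changes its row even when the code itself is untouched.

```r
# Hypothetical sketch of a two-column hash table (name, hash);
# not mustashe's actual make_hash_table().
make_hash_table_sketch <- function(code, depends_on = NULL, env = globalenv()) {
  # One row for the (already formatted) code string.
  tbl <- data.frame(name = "CODE", hash = digest::digest(code),
                    stringsAsFactors = FALSE)
  # One row per dependency, hashing its current value looked up by name.
  for (dep in depends_on) {
    dep_hash <- digest::digest(get(dep, envir = env))
    tbl <- rbind(tbl, data.frame(name = dep, hash = dep_hash,
                                 stringsAsFactors = FALSE))
  }
  tbl
}

n_draws <- 10
tbl_a <- make_hash_table_sketch("y <- rnorm(n_draws)", depends_on = "n_draws")
n_draws <- 20
tbl_b <- make_hash_table_sketch("y <- rnorm(n_draws)", depends_on = "n_draws")

tbl_a$hash[1] == tbl_b$hash[1]  # TRUE: the code did not change
tbl_a$hash[2] == tbl_b$hash[2]  # FALSE: the dependency's value changed
```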
To tell whether the code or dependencies have changed, the new hash table and the stashed hash table are compared. The function underlying this process is all.equal() from base R. This function compares two objects and, “If they are different, [a] comparison is still made to some extent, and a report of the differences is returned.”
Here is an example of using all.equal() to compare two data frames.
# Two data frames with a small difference
df1 <- data.frame(a = c(1, 2, 3), b = c(5, 6, 7))
df2 <- data.frame(a = c(1, 2, 3), b = c(5, 6, 8))
# When the two data frames are equivalent.
all.equal(df1, df1)
#> [1] TRUE
# When the two data frames are not equivalent.
all.equal(df1, df2)
#> [1] "Component \"b\": Mean relative difference: 0.1428571"
A word of caution: if you use all.equal() for a Boolean comparison (as in an if statement), wrap it in isTRUE(). Otherwise it returns either TRUE or a character vector describing the differences, but never FALSE.
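The difference matters in practice: a character report passed straight to if() raises an error. This sketch reuses the two data frames from above; hash_tables_equal_sketch() is a hypothetical stand-in for the package's internal hash_tables_are_equivalent(), whose real body is not shown in this post.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(5, 6, 7))
df2 <- data.frame(a = c(1, 2, 3), b = c(5, 6, 8))

# Used directly as a condition, the character report from all.equal() is an error.
bare_result <- tryCatch(
  if (all.equal(df1, df2)) "equal" else "different",
  error = function(e) "error"
)

# Wrapped in isTRUE(), the comparison always yields TRUE or FALSE.
safe_result <- if (isTRUE(all.equal(df1, df2))) "equal" else "different"

# Hypothetical Boolean comparison of two hash tables built the same way.
hash_tables_equal_sketch <- function(old_tbl, new_tbl) {
  isTRUE(all.equal(old_tbl, new_tbl))
}

bare_result                         # "error"
safe_result                         # "different"
hash_tables_equal_sketch(df1, df1)  # TRUE
hash_tables_equal_sketch(df1, df2)  # FALSE
```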
If the hash tables are different (or nothing has been stashed yet), the code must be evaluated, the new object must be assigned to the desired name (var), and the new hash table and value must be stashed. This is handled by the internal function new_stash().
# Make a new stash from a variable, code, and hash table.
new_stash <- function(var, code, hash_tbl) {
    val <- evaluate_code(code)
    assign_value(var, val)
    write_hash_table(var, hash_tbl)
    write_val(var, val)
}
The first step is to evaluate the code with the evaluate_code(code) function. It uses the parse() and eval() functions and returns the resulting value.
# Evaluate the code in a new environment.
evaluate_code <- function(code) {
    eval(parse(text = code), envir = new.env())
}
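A quick check of what evaluating in a new environment means, using evaluate_code() exactly as shown above (the variable names n_values and tmp_vec are illustrative). When the code holds several expressions, eval() returns the value of the last one. Any intermediate assignments land in the throwaway environment rather than the global one, while objects in the global environment, such as dependencies, are still visible through the new environment's parents. This is why the returned value has to be assigned explicitly afterwards.

```r
evaluate_code <- function(code) {
  eval(parse(text = code), envir = new.env())
}

n_values <- 3  # a global object the code can still see
val <- evaluate_code("tmp_vec <- seq_len(n_values)\nsum(tmp_vec)")

val                                                       # 6
exists("tmp_vec", envir = globalenv(), inherits = FALSE)  # FALSE
```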
This value is then assigned to the desired name in the global environment using the internal function assign_value(var, val), where .TargetEnv is a variable in the package pointing to .GlobalEnv.
# Assign the value `val` to the variable `var`.
assign_value <- function(var, val) {
    assign(var, val, envir = .TargetEnv)
}
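Because var arrives as a character string, assign() is what turns that string into a real variable. In this sketch .TargetEnv is a local stand-in for the package's internal variable (assumed, as described above, to point to .GlobalEnv), and stashed_result is an illustrative name.

```r
# Assumption: a stand-in for the package's internal .TargetEnv variable.
.TargetEnv <- .GlobalEnv

# assign_value() as shown above: the variable name is passed as a string.
assign_value <- function(var, val) {
  assign(var, val, envir = .TargetEnv)
}

assign_value("stashed_result", c(2, 4, 6))
stashed_result  # [1] 2 4 6
```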
Lastly, the hash table and value are written to file using wrapper functions around readr::write_tsv() and saveRDS().
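The wrappers themselves are not shown in this post, so here is a hypothetical round trip under stated assumptions. It uses base R's utils::write.table() and utils::read.delim() in place of readr to avoid the extra dependency, writes to a temporary directory instead of ".mustashe", and the file naming and the placeholder hash value are invented for illustration. Reading both files back is what a later call would do before comparing hash tables.

```r
# Hypothetical stash location; mustashe itself uses a ".mustashe" directory.
stash_dir <- file.path(tempdir(), "stash-sketch")
dir.create(stash_dir, showWarnings = FALSE)

# Base R stand-in for a readr::write_tsv() wrapper (file naming is assumed).
write_hash_table_sketch <- function(var, hash_tbl) {
  utils::write.table(hash_tbl, file.path(stash_dir, paste0(var, ".tsv")),
                     sep = "\t", quote = FALSE, row.names = FALSE)
}

# saveRDS() wrapper for the value itself.
write_val_sketch <- function(var, val) {
  saveRDS(val, file.path(stash_dir, paste0(var, ".rds")))
}

# Placeholder hash value, for illustration only.
hash_tbl <- data.frame(name = "CODE", hash = "0123abcd", stringsAsFactors = FALSE)
write_hash_table_sketch("my_var", hash_tbl)
write_val_sketch("my_var", 1:3)

# Read both back, as a later call would before comparing hash tables.
old_tbl <- utils::read.delim(file.path(stash_dir, "my_var.tsv"),
                             colClasses = "character")
old_val <- readRDS(file.path(stash_dir, "my_var.rds"))

isTRUE(all.equal(old_tbl, hash_tbl))  # TRUE
identical(old_val, 1:3)               # TRUE
```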
Any issues and feedback on ‘mustashe’ can be submitted here. I can be reached through the contact form on my website or on Twitter @JoshDoesa.