In contrast to other programming languages, R has no widely established and undisputed style guide comparable to PEP 8 for Python. As a data scientist at STATWORX, I helped to establish a company-wide R style guide. While it mainly relies on the tidyverse style guide, we generally decided to be more explicit in our coding practice. Among other things, we always refer to functions from non-native R packages with the double colon operator ::. While it is relatively easy to establish such a convention in new projects, it is challenging to adapt ongoing projects and legacy code. origin allows for much faster conversion of both legacy code and code that is currently being written.
origin
The sole purpose of origin is to add pkg:: to R function calls, e.g. it turns a bare mutate() into dplyr::mutate().
In general, you can originize some selected text (more on that later in Addins), a whole script, or all scripts in a specific folder, e.g. your project folder. There is a dedicated function for each purpose, yet they all share the same options. Therefore, only originize_file() is presented in detail as an example, with its default options.
originize_file(file = "testscript.R",
               pkgs = .packages(),
               overwrite = TRUE,
               ask_before_applying_changes = TRUE,
               ignore_comments = TRUE,
               check_conflicts = TRUE,
               add_base_packages = FALSE,
               check_base_conflicts = TRUE,
               check_local_conflicts = TRUE,
               excluded_functions = list(dplyr = c("%>%", "across"),
                                         data.table = c(":=", "%like%"),
                                         # exclude from all packages:
                                         c("first", "last")),
               verbose = TRUE,
               use_markers = TRUE)
pkgs: which packages to check for functions used in the code (see Considered Packages). The default is all packages attached via library or require.
overwrite: whether to actually insert pkg:: into the code. Otherwise, the logging only shows what would happen. Note that ask_before_applying_changes still lets you keep control over your code before origin changes anything.
ask_before_applying_changes: whether changes should be applied immediately or the user must approve them first.
ignore_comments: whether comments should be ignored.
check_conflicts: whether origin should check for potential namespace conflicts, i.e. a used function is defined in more than one considered package. User input is required to resolve such a conflict. Setting this to TRUE is strongly encouraged.
add_base_packages: whether functions from base packages should also be prefixed, e.g. base::sum().
check_base_conflicts: whether origin should also check for conflicts with base R functions.
check_local_conflicts: whether origin should also check for conflicts with functions defined locally anywhere in your project. Note that it does not check the environment but solely parses files and scans them for function definitions.
excluded_functions: a (named) list of functions to exclude from checking.
verbose: whether some sort of logging is performed, either in the console or via the Markers tab in RStudio.
use_markers: whether to use the Markers tab in RStudio.

Besides using regular R functions to originize files, there are also useful addins delivered with origin. These addins are designed to be used on the fly while coding. You can either originize selected text, the currently opened file, or all scripts in the currently opened project. To give you as much control as the functions do, each function argument corresponds to an option that can be set and is then used inside the addins, e.g.
options(origin.pkgs = c("dplyr", "data.table"),
        origin.overwrite = TRUE)
In fact, most function arguments of origin first check whether an option has been declared and use its value as their default. This produces the same outcome regardless of whether you use the addin or call a function.
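The option-backed defaults described above can be sketched with base R's getOption(). This is a minimal illustration and not origin's actual source: the function originize_sketch() is hypothetical, and only the option name origin.overwrite is taken from the text.

```r
# Hypothetical function (not part of origin) showing an option-backed default.
originize_sketch <- function(overwrite = getOption("origin.overwrite",
                                                   default = FALSE)) {
  # Return the value that would be used, so the lookup order is visible.
  overwrite
}

options(origin.overwrite = NULL)     # option not set: the fallback applies
originize_sketch()                   # FALSE

options(origin.overwrite = TRUE)     # option set: it becomes the default
originize_sketch()                   # TRUE

originize_sketch(overwrite = FALSE)  # an explicit argument always wins: FALSE
```

With this pattern, an addin that takes no arguments and a direct function call end up with the same settings as long as the options are set once per session or project.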
Since origin changes files on disk, it is very important that the user has full control over what happens, so user input is required before critical steps.
Most importantly, the user must be aware of what the originized file(s) would look like. For this, all changes and potential missed changes are presented, either in the Markers tab (recommended) or in the console.
Changes: pkg:: is inserted prior to a function.
Potential missed changes: origin highlights such cases in the logging output.
Infix functions: functions like %>% are exported by packages but cannot be called with the pkg::fun() convention. Such functions are highlighted by default to point out to the user that they stem from a package. When using dplyr-style code, consider excluding the pipe operator via excluded_functions.

Due to the variety of R packages, function names are not necessarily unique across all packages out there. By default, R masks previously imported functions with those imported afterwards. origin mimics this rule by giving a higher priority to those packages that are listed first. In case there is a conflict regarding a used function, the affected functions are listed along with the packages from which they stem.
Used functions in mutliple Packages!
filter: dplyr, stats
first: data.table, dplyr
Order in which relevant packges are evaluated: data.table >> dplyr >> stats
Do you want to proceed? 1: YES 2: NO
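The priority rule behind this output can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not origin's implementation: the helper resolve_pkg() is hypothetical, and the export lists are shortened by hand rather than read from the installed packages.

```r
# Hand-written, shortened export lists in the order of evaluation
# (data.table >> dplyr >> stats). Real packages export far more functions.
exports <- list(
  data.table = c("first", "last", "between"),
  dplyr      = c("filter", "first", "last", "between"),
  stats      = c("filter", "lag")
)

# Hypothetical helper: return the first package in the list that exports `fun`,
# or NA if no considered package exports it.
resolve_pkg <- function(fun, exports) {
  exported_by <- vapply(exports, function(funs) fun %in% funs, logical(1))
  hits <- names(exports)[exported_by]
  if (length(hits) == 0) NA_character_ else hits[1]
}

resolve_pkg("filter", exports)  # "dplyr": listed before stats
resolve_pkg("first", exports)   # "data.table": listed before dplyr
```

Changing the order of the packages changes the result, which is why the evaluation order is printed before the user is asked to proceed.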
Just as packages mask each other's functions, locally defined custom functions mask package functions. In case you defined your own last function in your project, origin should not add dplyr:: to it. Therefore, your project is searched for function definitions, and local functions have higher priority than those exported by packages. Note that, depending on the project size, this process can take quite some time. In this case, set the argument/option path_to_local_functions to a subdirectory, or set check_local_conflicts to FALSE to skip this feature.
Locally defined and used functions mask exported functions from packages
last: dplyr
Local functions have higher priority. In case you want to use an exported version of a function listed above set pkg::fun manually
Got it? 1: YES 2: NO 3: Show files
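Finding local function definitions by parsing code, without evaluating it, could look roughly like the following. This is a simplified sketch under assumptions: find_local_functions() is a hypothetical helper, it only detects definitions at the top level of a script, and origin's own detection may work differently.

```r
# Hypothetical helper (not origin's implementation): return the names of
# functions defined at the top level of a piece of R code, without running it.
find_local_functions <- function(code) {
  exprs <- parse(text = code, keep.source = FALSE)
  is_fun_def <- function(e) {
    is.call(e) &&
      as.character(e[[1]])[1] %in% c("<-", "=") &&   # an assignment ...
      is.name(e[[2]]) &&                             # ... to a plain name ...
      is.call(e[[3]]) &&
      identical(e[[3]][[1]], as.name("function"))    # ... of a function
  }
  defs <- Filter(is_fun_def, as.list(exprs))
  vapply(defs, function(e) as.character(e[[2]]), character(1))
}

# The script is only parsed, so library(dplyr) is never executed here.
script <- '
library(dplyr)
last <- function(x) x[length(x)]
threshold <- 10
result <- last(mtcars$mpg)
'

local_funs <- find_local_functions(script)
local_funs            # "last"
"last" %in% local_funs  # TRUE, so last() would not get a dplyr:: prefix
```

Because nothing is evaluated, this kind of scan is safe to run on arbitrary project files, but it has to read every file, which is where the run time on large projects comes from.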
When originizing a complete folder or project, many R scripts might be checked. In case the user is unaware that the selected folder contains many files, which results in a long run time of origin, a warning is triggered and user input is required.
You are about to originize 99 files.
Proceed? 1: YES 2: NO 3: Show files
Before the proposed changes are finally applied, a last user input is required.
Happy with the result? 😀
1: YES 2: NO
Whether or not to add pkg:: to each (imported) function is a controversial issue in the R community. While the tidyverse style guide does not mention explicit namespacing, R Packages and the Google R style guide are in favor of it.
Pros
Cons
(minimal) performance issue
more writing required
longer code
infix functions like %>% cannot be called via magrittr::%>% and workarounds are still required here. Either use
library(magrittr, include.only = "%>%")
or
`%>%` <- magrittr::`%>%`
calling library() on top of a script clearly indicates which packages are needed. A package that is not yet installed throws an error right away, rather than only when a function cannot be found later in the script. However, one can use the include.only argument and set it to NULL. No functions are attached to the search path then.
library(magrittr, include.only = NULL)