The concept behind org is fairly simple: most analyses have three main sections (code, results, and data), yet each of these sections has extremely different requirements.

- Code should be version controlled and accessible to all of your collaborators.
- Results should be easy to share, with each day's results archived so that you and your collaborators can see how they have changed over time.
- Data should be stored securely.
org::initialize_project takes in two or more arguments. It then saves its results (i.e. folder locations) in org::project, which you will use in all of your subsequent code.
- home is the location of Run.R and the R/ folder. It will be accessible via org::project$home (but you will rarely need this).
- results is your results folder. org::initialize_project will create a folder inside results named with today's date, accessible via org::project$results_today; this is where you will store all of your results. Because the folder's name is today's date, one copy of the results is automatically archived per day, which lets you and your collaborators easily check how the results have changed over time.
- ... you can specify other folders if you want, and they will also be accessible via org::project. Some suggested ones are raw (raw data) and clean (clean data). A short sketch of how these locations are used follows this list.
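For illustration, a minimal sketch of a call and the locations it makes available (the paths and the extra raw argument here are hypothetical, not part of any real project):

org::initialize_project(
  env = .GlobalEnv,
  home = "/path/to/project/",    # contains Run.R and the R/ folder
  results = "/path/to/results/", # a dated folder is created inside this
  raw = "/path/to/data/"         # an optional extra location
)

org::project$home          # "/path/to/project/"
org::project$raw           # "/path/to/data/"
org::project$results_today # e.g. "/path/to/results/2024-01-01"

# Anything saved here is archived under today's date:
saveRDS(iris, fs::path(org::project$results_today, "example.rds"))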
Run.R will be your masterfile. It will run sequentially through every part of your analysis. You will not have any other masterfiles, as multiple masterfiles create confusion when you need to return to your code two years later.
All sections of code (e.g. data cleaning, analyses for table 1, analyses for table 2, result generation for table 1, sensitivity analysis 1, etc.) will be encapsulated in functions that live in the R folder.
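As an illustration, one of these function files might look like the following sketch (the file name, the raw data file, and the cleaning steps are assumptions, not part of org itself):

# R/CleanData.R (hypothetical sketch)
# Reads the raw data from org::project$raw and returns a cleaned dataset
clean_data <- function() {
  d <- readxl::read_excel(fs::path(org::project$raw, "data.xlsx"))
  d <- data.table::as.data.table(d)
  # ... cleaning steps would go here ...
  return(d)
}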
An example Run.R might look like this:
# Define the following 'special' locations:
# 'home' (i.e. Run.R and R/ folder)
# 'results' (i.e. a folder with today's date will be created here)
# Define any other locations you want:
# 'raw' (i.e. raw data)
org::initialize_project(
  env = .GlobalEnv,
  home = "/git/analyses/2019/analysis3/",
  results = "/dropbox/analyses_results/2019/analysis3/",
  raw = "/data/analyses/2019/analysis3/"
)
# It is a good practice to describe how the archived
# results are changing from day-to-day.
#
# We recommend saving this information as a text file
# stored in the `org::project$results` folder.
txt <- glue::glue("
2019-01-01:
Included:
- Table 1
- Table 2
2019-02-02:
Changed Table 1 from mean -> median
", .trim=FALSE)
org::write_text(
txt = txt,
file = fs::path(org::project$results, "info.txt")
)
library(data.table)
library(ggplot2)
# This function would access data located in org::project$raw
d <- clean_data()
# These functions would save results to org::project$results_today
table_1(d)
figure_1(d)
figure_2(d)
All of your functions should be defined in org::project$home/R. The function org::initialize_project automatically sources all of the R scripts in org::project$home/R, making them available for use.
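For example, a function such as table_1() called in the example Run.R could be defined in one of these sourced scripts and write its output directly into the day's results folder. A hypothetical sketch (the grouping column and the writexl output format are assumptions):

# R/Analysis1.R (hypothetical sketch of a sourced function)
table_1 <- function(d) {
  # Summarise the cleaned data; 'group' stands in for whatever variable you stratify by
  tab <- d[, .(n = .N), by = group]
  # Save into today's dated results folder
  writexl::write_xlsx(tab, fs::path(org::project$results_today, "table1.xlsx"))
}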
Our (very opinionated) solution is as follows: code should be stored in GitHub; data should be stored locally in an encrypted volume; results should be stored on Dropbox, with each day's results stored in a different folder.
# code goes here:
git
+-- analyses
    +-- 2018
    |   +-- analysis1 <- org::project$home
    |   |   +-- Run.R
    |   |   +-- R
    |   |       +-- CleanData.R
    |   |       +-- Descriptives.R
    |   |       +-- Analysis1.R
    |   |       +-- Graphs1.R
    |   +-- analysis2 <- org::project$home
    |       +-- Run.R
    |       +-- R
    |           +-- CleanData.R
    |           +-- Descriptives.R
    |           +-- Analysis1.R
    |           +-- Graphs1.R
    +-- 2019
        +-- analysis3 <- org::project$home
            +-- Run.R
            +-- R
                +-- CleanData.R
                +-- Descriptives.R
                +-- Analysis1.R
                +-- Graphs1.R
# results goes here:
dropbox
+-- analyses_results
    +-- 2018
    |   +-- analysis1 <- org::project$results
    |   |   +-- 2018-03-12 <- org::project$results_today
    |   |   |   +-- table1.xlsx
    |   |   |   +-- figure1.png
    |   |   +-- 2018-03-15 <- org::project$results_today
    |   |   |   +-- table1.xlsx
    |   |   |   +-- figure1.png
    |   |   +-- 2018-03-18 <- org::project$results_today
    |   |   |   +-- table1.xlsx
    |   |   |   +-- figure1.png
    |   |   |   +-- figure2.png
    |   |   +-- 2018-06-18 <- org::project$results_today
    |   |       +-- table1.xlsx
    |   |       +-- table2.xlsx
    |   |       +-- figure1.png
    |   |       +-- figure2.png
    |   +-- analysis2 <- org::project$results
    |       +-- 2018-06-09 <- org::project$results_today
    |       |   +-- table1.xlsx
    |       |   +-- figure1.png
    |       +-- 2018-12-15 <- org::project$results_today
    |       |   +-- table1.xlsx
    |       |   +-- figure1.png
    |       +-- 2019-01-18 <- org::project$results_today
    |           +-- table1.xlsx
    |           +-- figure1.png
    |           +-- figure2.png
    +-- 2019
        +-- analysis3 <- org::project$results
            +-- 2019-06-09 <- org::project$results_today
            |   +-- table1.xlsx
            |   +-- figure1.png
            +-- 2019-12-15 <- org::project$results_today
            |   +-- table1.xlsx
            |   +-- figure1.png
            +-- 2020-01-18 <- org::project$results_today
                +-- table1.xlsx
                +-- figure1.png
                +-- figure2.png
# data goes here:
data
+-- analyses
    +-- 2018
    |   +-- analysis1 <- org::project$raw
    |   |   +-- data.xlsx
    |   +-- analysis2 <- org::project$raw
    |       +-- data.xlsx
    +-- 2019
        +-- analysis3 <- org::project$raw
            +-- data.xlsx
Suggested code for Run.R:
org::initialize_project(
  env = .GlobalEnv,
  home = "/git/analyses/2019/analysis3/",
  results = "/dropbox/analyses_results/2019/analysis3/",
  raw = "/data/analyses/2019/analysis3/"
)
If you only have one folder on a shared network drive, without access to GitHub or Dropbox, and you would like to create an R Markdown document, then we propose the following solution:
# everything goes here:
project_name <- org::project$home
+-- Run.R
+-- R
|   +-- CleanData.R
|   +-- Descriptives.R
|   +-- Analysis1.R
|   +-- Graphs1.R
+-- paper
|   +-- paper.Rmd
+-- results <- org::project$results
|   +-- 2018-03-12 <- org::project$results_today
|   |   +-- table1.xlsx
|   |   +-- figure1.png
|   +-- 2018-03-15 <- org::project$results_today
|       +-- table1.xlsx
|       +-- figure1.png
+-- data_raw <- org::project$raw
    +-- data.xlsx
Suggested code for Run.R:
# Initialize the project
org::initialize_project(
  env = .GlobalEnv,
  home = "/network/project_name/",
  results = "/network/project_name/results/",
  paper = "/network/project_name/paper/",
  raw = "/network/project_name/data_raw/"
)
# do some analyses here
# render the paper
rmarkdown::render(
  input = fs::path(org::project$paper, "paper.Rmd"),
  output_dir = org::project$results_today,
  quiet = FALSE
)
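Inside paper.Rmd, the same org::project locations remain available (rendering happens in the same R session that ran org::initialize_project), so the document can pull in results that were already saved to the dated results folder. A hypothetical sketch of an R chunk; the figure name is an assumption:

# Inside an R chunk in paper.Rmd (hypothetical sketch)
# Include a figure that an earlier step saved to today's results folder
knitr::include_graphics(
  fs::path(org::project$results_today, "figure1.png")
)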