pepr
This vignette shows how and why to use the amendments functionality of the pepr package.
For basic information about the PEP concept, see the project website.
For a broader theoretical description, see the amendments documentation section.
The example below demonstrates how and why to use the amendments project attribute, e.g. to define numerous similar projects in a single project config file. This functionality is convenient when projects differ only in small settings, such as a few attributes in the annotation sheet. For example, the amended projects below use the libraries ABCD and EFGH instead of the original RRBS, which is shown in the following sample table.
sample_name | protocol | organism | time | file_path |
---|---|---|---|---|
pig_0h | RRBS | pig | 0 | source1 |
pig_1h | RRBS | pig | 1 | source1 |
frog_0h | RRBS | frog | 0 | source1 |
frog_1h | RRBS | frog | 1 | source1 |
This can be achieved by using the amendments section (the amend key under project_modifiers) of the project_config.yaml file (presented below). The attributes specified at the lowest level of this section (here: sample_table) overwrite the original ones. Consequently, a complete new set of settings is determined with just this value changed. Moreover, multiple amendments can be defined in a single config file and activated at the same time. Based on the file presented below, two subprojects are defined: newLib and newLib2.
pep_version: 2.0.0
sample_table: sample_table.csv
output_dir: $HOME/hello_looper_results
sample_modifiers:
  derive:
    attributes: file_path
    sources:
      source1: /data/lab/project/{organism}_{time}h.fastq
      source2: /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
project_modifiers:
  amend:
    newLib:
      sample_table: sample_table_newLib.csv
    newLib2:
      sample_table: sample_table_newLib2.csv
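The overwrite behaviour described above can be pictured with plain R lists. The following is only a conceptual sketch in base R, not pepr's actual implementation: the `config` list is a hand-written stand-in for the parsed YAML file, and `apply_amendments()` is a hypothetical helper introduced here for illustration.

```r
# Conceptual sketch only: mimics the overwrite semantics with plain lists.
# `config` stands in for the parsed project_config.yaml.
config <- list(
  pep_version = "2.0.0",
  sample_table = "sample_table.csv",
  output_dir = "$HOME/hello_looper_results",
  project_modifiers = list(
    amend = list(
      newLib  = list(sample_table = "sample_table_newLib.csv"),
      newLib2 = list(sample_table = "sample_table_newLib2.csv")
    )
  )
)

# Hypothetical helper (not a pepr function): overlays the named amendments
# on top of the main project settings.
apply_amendments <- function(cfg, amendments) {
  available <- cfg$project_modifiers$amend
  unknown <- setdiff(amendments, names(available))
  if (length(unknown) > 0) {
    stop("Unknown amendment(s): ", paste(unknown, collapse = ", "))
  }
  # Apply amendments in the given order; later ones win on conflicts
  for (a in amendments) {
    cfg <- utils::modifyList(cfg, available[[a]])
  }
  cfg
}

amended <- apply_amendments(config, "newLib")
amended$sample_table  # the amended value replaces the original one
amended$output_dir    # settings not mentioned in the amendment are kept
```

In this sketch only `sample_table` changes, while every other setting is inherited from the main project, which is the point of defining amendments rather than separate config files.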
The amendments functionality can be combined with other pepr package options, e.g. the imply and derive sample modifiers. The derive modifier is used in the example considered here (the derive key in the sample_modifiers section of the config file).
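To make the derive step concrete, here is a minimal base R sketch of how a source template such as source1 could be expanded into a per-sample path. It is an illustration under stated assumptions, not the code pepr runs: the `samples` data frame is typed in by hand, and `expand_template()` is a hypothetical helper.

```r
# Conceptual sketch of the derive step (not pepr's implementation).
samples <- data.frame(
  sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
  organism = c("pig", "pig", "frog", "frog"),
  time = c(0, 1, 0, 1),
  file_path = "source1",
  stringsAsFactors = FALSE
)
sources <- list(source1 = "/data/lab/project/{organism}_{time}h.fastq")

# Hypothetical helper: replaces every {attribute} placeholder in the
# template with the corresponding value from a one-row sample record.
expand_template <- function(template, row) {
  for (key in names(row)) {
    template <- gsub(paste0("{", key, "}"), as.character(row[[key]]),
                     template, fixed = TRUE)
  }
  template
}

# Look up each sample's source key, then fill in the template
samples$file_path <- vapply(
  seq_len(nrow(samples)),
  function(i) expand_template(sources[[samples$file_path[i]]], samples[i, ]),
  character(1)
)
samples$file_path
```

Because derivation works on whichever sample table is active, the same template applies unchanged to the amended tables.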
The files sample_table_newLib.csv and sample_table_newLib2.csv introduce different values of the protocol attribute. They are used in the subprojects newLib and newLib2, respectively.
sample_name | protocol | organism | time | file_path |
---|---|---|---|---|
pig_0h | ABCD | pig | 0 | source1 |
pig_1h | ABCD | pig | 1 | source1 |
frog_0h | ABCD | frog | 0 | source1 |
frog_1h | ABCD | frog | 1 | source1 |
sample_name | protocol | organism | time | file_path |
---|---|---|---|---|
pig_0h | EFGH | pig | 0 | source1 |
pig_1h | EFGH | pig | 1 | source1 |
frog_0h | EFGH | frog | 0 | source1 |
frog_1h | EFGH | frog | 1 | source1 |
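When preparing amended sample tables, it can help to confirm that they differ from the original only in the intended column. The sketch below does this in base R. The data frames are built in code as stand-ins for the three CSV files, and `make_table()` and `changed_columns()` are hypothetical helpers, not pepr functions.

```r
# Stand-in for reading a sample table; only the protocol value varies.
make_table <- function(protocol) {
  data.frame(
    sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
    protocol = protocol,
    organism = c("pig", "pig", "frog", "frog"),
    time = c(0, 1, 0, 1),
    file_path = "source1",
    stringsAsFactors = FALSE
  )
}

original <- make_table("RRBS")
new_lib  <- make_table("ABCD")
new_lib2 <- make_table("EFGH")

# Hypothetical helper: names of the columns whose values differ
changed_columns <- function(a, b) {
  stopifnot(identical(names(a), names(b)), nrow(a) == nrow(b))
  same <- vapply(names(a), function(col) identical(a[[col]], b[[col]]),
                 logical(1))
  names(a)[!same]
}

changed_columns(original, new_lib)
changed_columns(original, new_lib2)
```

With real files, the `make_table()` calls would be replaced by reading the corresponding CSV files, for example with `read.csv()`.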
Load pepr and read in the project metadata by specifying the path to the project_config.yaml file:
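library(pepr)
# Branch of the bundled example PEPs; "master" matches the path printed below
branch = "master"
projectConfig = system.file("extdata", paste0("example_peps-", branch), "example_amendments1", "project_config.yaml", package = "pepr")
p = Project(projectConfig)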
#> Loading config file: /private/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/RtmpSzdJbG/Rinst4b945a77f0f4/pepr/extdata/example_peps-master/example_amendments1/project_config.yaml
#> amendments: newLib,newLib2
A message is displayed that lists the names of the amendments defined in the project_config.yaml file. Nonetheless, only the main project is “active”.
Let’s inspect it:
sampleTable(p)
#> sample_name protocol organism time file_path
#> 1: pig_0h RRBS pig 0 /data/lab/project/pig_0h.fastq
#> 2: pig_1h RRBS pig 1 /data/lab/project/pig_1h.fastq
#> 3: frog_0h RRBS frog 0 /data/lab/project/frog_0h.fastq
#> 4: frog_1h RRBS frog 1 /data/lab/project/frog_1h.fastq
The file_path column was derived, and the protocol column holds the original attribute, RRBS, for each sample.
If you don’t remember the amendment names, call the listAmendments() method on the Project object.
To “activate” any of the amendments, pass the names of the desired amendments to the amendments argument of the Project object constructor, like this:
pNewLib = Project(file = projectConfig, amendments = "newLib")
#> Loading config file: /private/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/RtmpSzdJbG/Rinst4b945a77f0f4/pepr/extdata/example_peps-master/example_amendments1/project_config.yaml
#> Activating amendment: newLib
#> amendments: newLib,newLib2
Let’s inspect it:
sampleTable(pNewLib)
#> sample_name protocol organism time file_path
#> 1: pig_0h ABCD pig 0 /data/lab/project/pig_0h.fastq
#> 2: pig_1h ABCD pig 1 /data/lab/project/pig_1h.fastq
#> 3: frog_0h ABCD frog 0 /data/lab/project/frog_0h.fastq
#> 4: frog_1h ABCD frog 1 /data/lab/project/frog_1h.fastq
As you can see, the protocol column now contains the new attribute (ABCD), which was defined in the sample_table_newLib.csv file.
Amendments can also be activated interactively, after the Project object has been created. Let’s activate the second amendment this way:
pNewLib2 = activateAmendments(p, "newLib2")
#> Activating amendment: newLib2
sampleTable(pNewLib2)
#> sample_name protocol organism time file_path
#> 1: pig_0h EFGH pig 0 /data/lab/project/pig_0h.fastq
#> 2: pig_1h EFGH pig 1 /data/lab/project/pig_1h.fastq
#> 3: frog_0h EFGH frog 0 /data/lab/project/frog_0h.fastq
#> 4: frog_1h EFGH frog 1 /data/lab/project/frog_1h.fastq
In addition, the p object contains all the information from the project config file (project_config.yaml). Run the following line to explore it:
config(p)
#> Config object. Class: Config
#> pep_version: 2.0.0
#> sample_table:
#> /private/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/RtmpSzdJbG/Rinst4b945a77f0f4/pepr/extdata/example_peps-master/example_amendments1/sample_table.csv
#> output_dir: /Users/mstolarczyk/hello_looper_results
#> sample_modifiers:
#> derive:
#> attributes: file_path
#> sources:
#> source1: /data/lab/project/{organism}_{time}h.fastq
#> source2:
#> /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
#> project_modifiers:
#> amend:
#> newLib:
#> sample_table: sample_table_newLib.csv
#> newLib2:
#> sample_table: sample_table_newLib2.csv
#> name: example_amendments1