The American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange Biopharma Collaborative (GENIE BPC) is an effort to aggregate comprehensive clinical data linked to genomic sequencing data to create a pan-cancer, publicly available data repository. These data detail clinical characteristics and drug treatment regimen information, along with high-throughput sequencing data and clinical outcomes, for cancer patients across international institutions. The GENIE BPC data repository forms a unique observational database of comprehensive clinical annotation with molecularly characterized tumors that can be used to advance precision medicine research in oncology. Linking multiple clinical and genomic datasets that vary in structure introduces an inherent complexity for data users. Therefore, use of the GENIE BPC data requires a rigorous process for preparing and merging the data to build analytic models. The {genieBPC} package is a user-friendly data processing pipeline to streamline the process for developing analytic cohorts that are ready for clinico-genomic analyses.
You can install the released version of {genieBPC} from the R Universe with:
install.packages('genieBPC',
                 repos = c(mskccepibio = 'https://mskcc-epi-bio.r-universe.dev',
                           CRAN = 'https://cloud.r-project.org'))
and the development version with:
remotes::install_github("GENIE-BPC/genieBPC")
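After installing, it can help to confirm that the package is available before attaching it. The following is a minimal sketch using only base R helpers; the variable name `genie_installed` is illustrative and not part of the package.

```r
# Check that {genieBPC} is installed before attaching it.
# requireNamespace() returns TRUE or FALSE instead of raising an error.
genie_installed <- requireNamespace("genieBPC", quietly = TRUE)

if (genie_installed) {
  library(genieBPC)
  message("genieBPC version: ",
          as.character(utils::packageVersion("genieBPC")))
} else {
  message("genieBPC is not installed yet.")
}
```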
Data import:
pull_data_synapse() imports GENIE BPC data from Synapse into the R environment
Data processing:
create_analytic_cohort() selects an analytic cohort based on cancer diagnosis information and/or cancer-directed drug regimen information
select_unique_ngs() selects a unique next generation sequencing (NGS) test corresponding to the selected diagnoses
Data visualization:
drug_regimen_sunburst() creates a sunburst figure of drug regimen information corresponding to the selected diagnoses in the order that the regimens were administered
Access to the GENIE BPC data release folders on Synapse is required to use pull_data_synapse(). To obtain access:
For public data releases:
Register for a Synapse account
Navigate to the data release and accept the terms of use (e.g., for the NSCLC 2.0-public data release, navigate to the Synapse page for the data release). Towards the top of the page, there is information including the Synapse ID, DOI, Item count, and Access. Next to Access is a link that reads Request Access.
Select Request Access, review the terms of data use, and select Accept.
For consortium data releases (restricted to GENIE consortium members & BPC pharmaceutical partners):
Register for a Synapse account
Use this link to access the GENIE BPC team list and request to join the team. Please include your full name and affiliation in the message before sending out the request.
Once the request is accepted, you may access the data in the GENIE Biopharma Collaborative projects.
Note: Please allow up to a week to review and grant access.
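Once access has been granted, the R session needs Synapse credentials before any data can be pulled. The sketch below is a hedged illustration, not a definitive setup: it assumes the package provides a set_synapse_credentials() helper that accepts a personal access token through a `pat` argument, and it assumes the token is stored in an environment variable named SYNAPSE_PAT. Both assumptions should be checked against the package documentation.

```r
# Assumption: a Synapse personal access token is stored in the SYNAPSE_PAT
# environment variable (for example via ~/.Renviron), not hard-coded here.
synapse_pat <- Sys.getenv("SYNAPSE_PAT")
has_pat <- nzchar(synapse_pat)

if (has_pat && requireNamespace("genieBPC", quietly = TRUE)) {
  # Assumption: set_synapse_credentials() takes the token via `pat`.
  genieBPC::set_synapse_credentials(pat = synapse_pat)
} else {
  message("Set SYNAPSE_PAT and install genieBPC before pulling data.")
}
```

Reading the token from the environment keeps it out of scripts that may be shared or committed to version control.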
The analytic data guides provide details on each analytic dataset and its corresponding variables for each data release.
Public Data Releases
Consortium Data Releases
Note that only GENIE BPC consortium users have access to the consortium releases.
The following example creates an analytic cohort of patients diagnosed with Stage IV adenocarcinoma NSCLC.
Pull data for NSCLC version 2.0-public:
nsclc_2_0 <- pull_data_synapse(cohort = "NSCLC", version = "v2.0-public")
Select stage IV adenocarcinoma NSCLC diagnoses:
nsclc_stg_iv_adeno <- create_analytic_cohort(data_synapse = nsclc_2_0$NSCLC_v2.0,
                                             stage_dx = "Stage IV",
                                             histology = "Adenocarcinoma")
Select one unique metastatic lung adenocarcinoma genomic sample per patient in the analytic cohort returned above:
nsclc_stg_iv_adeno_unique_sample <- select_unique_ngs(
  data_cohort = nsclc_stg_iv_adeno$cohort_ngs)
Create a visualization of the treatment patterns for the first 3 regimens received by patients diagnosed with stage IV adenocarcinoma:
sunplot <- drug_regimen_sunburst(data_synapse = nsclc_2_0$NSCLC_v2.0,
                                 data_cohort = nsclc_stg_iv_adeno,
                                 max_n_regimens = 3)
Example of a sunburst plot showing 3 lines of treatment, highlighting the first treatment regimen: