mskcc.oncotree

The goal of {mskcc.oncotree} is to facilitate access to the OncoTree API.

The OncoTree is an open-source ontology that was developed at Memorial Sloan Kettering Cancer Center (MSK) for standardizing cancer type diagnosis from a clinical perspective by assigning each diagnosis a unique OncoTree code.

The package currently supports retrieving tumor types and mapping tumor type codes across a few tumor classification systems.

Installation

You can install the development version of {mskcc.oncotree} with:

# install.packages("remotes")
remotes::install_github("maialab/mskcc.oncotree")

OncoTree tumor types

Get the tumor types defined by OncoTree in their latest release:

library(mskcc.oncotree)

(tumor_types <- get_tumor_types())
#> # A tibble: 885 × 13
#>    oncotree_version  oncotree_code oncotree_name   oncotree_main_t… tissue level
#>    <chr>             <chr>         <chr>           <chr>            <chr>  <int>
#>  1 oncotree_latest_… MMB           Medullomyoblas… Embryonal Tumor  CNS/B…     3
#>  2 oncotree_latest_… GCB           Germinal Cente… Mature B-Cell N… Lymph…     6
#>  3 oncotree_latest_… SBLU          Splenic B-Cell… Mature B-Cell N… Lymph…     5
#>  4 oncotree_latest_… OHNCA         Head and Neck … Head and Neck C… Head …     2
#>  5 oncotree_latest_… PAOS          Parosteal Oste… Bone Cancer      Bone       3
#>  6 oncotree_latest_… TMDS          Therapy-Relate… Leukemia         Myelo…     5
#>  7 oncotree_latest_… ARMS          Alveolar Rhabd… Soft Tissue Sar… Soft …     3
#>  8 oncotree_latest_… SCST          Sex Cord Strom… Sex Cord Stroma… Ovary…     2
#>  9 oncotree_latest_… ITLPDGI       Indolent T-Cel… Mature T and NK… Lymph…     5
#> 10 oncotree_latest_… MBC           Metaplastic Br… Breast Cancer    Breast     2
#> # … with 875 more rows, and 7 more variables: parent <chr>, umls_code <list>,
#> #   nci_code <list>, history <list>, revocations <list>, precursors <list>,
#> #   color <chr>

The mapping to the corresponding tumor type identifier(s) in the Unified Medical Language System (UMLS) and in the National Cancer Institute (NCI) Thesaurus classification systems is provided in columns umls_code and nci_code.

tumor_types[c('oncotree_code', 'umls_code', 'nci_code')]
#> # A tibble: 885 × 3
#>    oncotree_code umls_code nci_code 
#>    <chr>         <list>    <list>   
#>  1 MMB           <chr [1]> <chr [1]>
#>  2 GCB           <NULL>    <NULL>   
#>  3 SBLU          <NULL>    <NULL>   
#>  4 OHNCA         <NULL>    <NULL>   
#>  5 PAOS          <chr [1]> <chr [1]>
#>  6 TMDS          <NULL>    <NULL>   
#>  7 ARMS          <chr [1]> <chr [1]>
#>  8 SCST          <chr [1]> <chr [1]>
#>  9 ITLPDGI       <NULL>    <NULL>   
#> 10 MBC           <chr [1]> <chr [1]>
#> # … with 875 more rows

umls_code and nci_code are list-columns because the mapping can be one-to-many.

If you want, you may convert these columns to type character by unnesting with tidyr::unnest(). This will have the side effect of creating more than one row per tumor type for cases where the mapping is one-to-many.

tumor_types[c('oncotree_code', 'umls_code')] |>
  tidyr::unnest(cols = 'umls_code', keep_empty = TRUE)
#> # A tibble: 886 × 2
#>    oncotree_code umls_code
#>    <chr>         <chr>    
#>  1 MMB           C0205833 
#>  2 GCB           <NA>     
#>  3 SBLU          <NA>     
#>  4 OHNCA         <NA>     
#>  5 PAOS          C0206642 
#>  6 TMDS          <NA>     
#>  7 ARMS          C0206655 
#>  8 SCST          C0600113 
#>  9 ITLPDGI       <NA>     
#> 10 MBC           C1334708 
#> # … with 876 more rows
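
To see why unnesting can add rows (885 rows became 886 above), here is a minimal base R sketch on made-up data. The codes in `toy` are invented for illustration and are not real OncoTree or UMLS identifiers. The sketch only mimics the effect of tidyr::unnest(keep_empty = TRUE); it is not how tidyr implements it.

```r
# Toy data: three tumor types with a list-column of mapped codes.
# All codes here are invented for illustration.
toy <- data.frame(oncotree_code = c("AAA", "BBB", "CCC"),
                  stringsAsFactors = FALSE)
toy$umls_code <- list("C0000001", NULL, c("C0000002", "C0000003"))

# Each input row yields one output row per mapped code;
# rows without any mapping still yield one row (holding NA).
n_out <- pmax(lengths(toy$umls_code), 1L)

unnested <- data.frame(
  oncotree_code = rep(toy$oncotree_code, times = n_out),
  umls_code = unlist(lapply(
    toy$umls_code,
    function(x) if (length(x) == 0L) NA_character_ else x
  )),
  stringsAsFactors = FALSE
)

# "CCC" maps to two codes, so three input rows become four output rows.
nrow(unnested)
```

The unmapped "BBB" keeps a single row with NA, and the one-to-many "CCC" is repeated once per mapped code.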

OncoTree versions

The data provided by OncoTree is versioned. You can list the versions released so far with get_versions():

get_versions()
#> # A tibble: 28 × 4
#>    oncotree_version           description                   visible release_date
#>    <chr>                      <chr>                         <lgl>   <chr>       
#>  1 oncotree_development       Latest OncoTree under develo… TRUE    2021-11-04  
#>  2 oncotree_candidate_release This version of the OncoTree… TRUE    2021-11-03  
#>  3 oncotree_2021_11_02        Stable OncoTree released on … FALSE   2021-11-02  
#>  4 oncotree_latest_stable     This is the latest approved … TRUE    2021-11-02  
#>  5 oncotree_2020_10_01        Stable OncoTree released on … FALSE   2020-10-01  
#>  6 oncotree_2020_04_01        Stable OncoTree released on … FALSE   2020-04-01  
#>  7 oncotree_2020_02_06        Stable OncoTree released on … FALSE   2020-02-06  
#>  8 oncotree_2020_02_01        Stable OncoTree released on … FALSE   2020-02-01  
#>  9 oncotree_2019_12_01        Stable OncoTree released on … FALSE   2019-12-01  
#> 10 oncotree_2019_08_01        Stable OncoTree released on … FALSE   2019-08-01  
#> # … with 18 more rows

The function get_tumor_types() accepts a release version, allowing you to retrieve data from past versions of OncoTree:

get_tumor_types(oncotree_version = 'oncotree_2019_08_01')
#> # A tibble: 863 × 13
#>    oncotree_version    oncotree_code oncotree_name oncotree_main_t… tissue level
#>    <chr>               <chr>         <chr>         <chr>            <chr>  <int>
#>  1 oncotree_2019_08_01 MMB           Medullomyobl… Embryonal Tumor  CNS/B…     3
#>  2 oncotree_2019_08_01 GCB           Germinal Cen… Mature B-Cell N… Lymph…     6
#>  3 oncotree_2019_08_01 SBLU          Splenic B-Ce… Mature B-Cell N… Lymph…     5
#>  4 oncotree_2019_08_01 OHNCA         Head and Nec… Head and Neck C… Head …     2
#>  5 oncotree_2019_08_01 PAOS          Parosteal Os… Bone Cancer      Bone       3
#>  6 oncotree_2019_08_01 TMDS          Therapy-Rela… Leukemia         Myelo…     5
#>  7 oncotree_2019_08_01 ARMS          Alveolar Rha… Soft Tissue Sar… Soft …     3
#>  8 oncotree_2019_08_01 SCST          Sex Cord Str… Sex Cord Stroma… Ovary…     2
#>  9 oncotree_2019_08_01 ITLPDGI       Indolent T-C… Mature T and NK… Lymph…     5
#> 10 oncotree_2019_08_01 MBC           Metaplastic … Breast Cancer    Breast     2
#> # … with 853 more rows, and 7 more variables: parent <chr>, umls_code <list>,
#> #   nci_code <list>, history <list>, revocations <list>, precursors <list>,
#> #   color <chr>
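
One use of versioned data is checking which codes were added or removed between two releases. Below is a minimal sketch, assuming both inputs have an oncotree_code column like the tibbles returned by get_tumor_types(). The helper name diff_codes() and the toy codes NEWX and OLDY are hypothetical; the toy data frames stand in for real API results so the example runs offline.

```r
# Hypothetical helper: compare the codes present in two releases.
diff_codes <- function(new, old) {
  list(
    added   = setdiff(new$oncotree_code, old$oncotree_code),
    removed = setdiff(old$oncotree_code, new$oncotree_code)
  )
}

# Toy stand-ins for two releases (NEWX and OLDY are invented codes).
old <- data.frame(oncotree_code = c("MMB", "PAOS", "OLDY"),
                  stringsAsFactors = FALSE)
new <- data.frame(oncotree_code = c("MMB", "PAOS", "NEWX"),
                  stringsAsFactors = FALSE)

changes <- diff_codes(new, old)
changes$added    # codes only in the newer release
changes$removed  # codes only in the older release
```

With real data, you would pass the results of get_tumor_types() and get_tumor_types(oncotree_version = 'oncotree_2019_08_01') instead of the toy data frames.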

Mapping of tumor type codes

UMLS and NCIt

OncoTree provides a few mappings between their identifiers (codes) and those of other tumor classification systems. Through their web API, however, only mappings between OncoTree codes and Unified Medical Language System (UMLS) codes or National Cancer Institute (NCI) Thesaurus (NCIt) codes are currently available.

{mskcc.oncotree} provides four functions for these mappings:

Function            Description
oncotree_to_nci()   Map OncoTree to NCIt codes
nci_to_oncotree()   Map NCIt to OncoTree codes
oncotree_to_umls()  Map OncoTree to UMLS codes
umls_to_oncotree()  Map UMLS to OncoTree codes

From and to NCIt

oncotree_to_nci(c('MMB', 'PAOS'), expand = TRUE)
#> # A tibble: 2 × 2
#>   oncotree_code nci_code
#>   <chr>         <chr>   
#> 1 MMB           C3706   
#> 2 PAOS          C8969

nci_to_oncotree(c('C3706', 'C8969'), expand = TRUE)
#> # A tibble: 2 × 2
#>   nci_code oncotree_code
#>   <chr>    <chr>        
#> 1 C3706    MMB          
#> 2 C8969    PAOS

From and to UMLS

oncotree_to_umls(c('MMB', 'PAOS'), expand = TRUE)
#> # A tibble: 2 × 2
#>   oncotree_code umls_code
#>   <chr>         <chr>    
#> 1 MMB           C0205833 
#> 2 PAOS          C0206642

umls_to_oncotree(c('C0205833', 'C0206642', 'C1334708'), expand = TRUE)
#> # A tibble: 3 × 2
#>   umls_code oncotree_code
#>   <chr>     <chr>        
#> 1 C0205833  MMB          
#> 2 C0206642  PAOS         
#> 3 C1334708  MBC

Other mappings

Besides UMLS and NCIt, you may also map OncoTree codes to ICD-O topography and morphology codes, and to HemeOnc codes. This functionality is based on the file ontology_mappings.txt provided by OncoTree in their GitHub repository.

In OncoTree’s discussion forum, the developers have warned that this data cannot be expected to be either accurate or complete. Still, it may be useful to some users, so we also provide access to these mappings through the function map_ontology_code().

Here are some usage examples of map_ontology_code():

# Simple example
map_ontology_code(code = 'MMB', from = 'oncotree_code', to = 'nci_code')
#> # A tibble: 1 × 2
#>   oncotree_code nci_code
#>   <chr>         <chr>   
#> 1 MMB           C3706

# Omit the `code` argument to get all possible mappings. Note that
# one-to-many mappings will generate more than one row per `from` code.
map_ontology_code(from = 'oncotree_code', to = 'nci_code')
#> # A tibble: 855 × 2
#>    oncotree_code nci_code
#>    <chr>         <chr>   
#>  1 MMB           C3706   
#>  2 AIS           C4123   
#>  3 AASTR         C9477   
#>  4 FL            C3209   
#>  5 VIMT          C4286   
#>  6 KIDNEY        C12415  
#>  7 MDEP          C4327   
#>  8 PAOS          C8969   
#>  9 PRSCC         C6766   
#> 10 DSTAD         C9159   
#> # … with 845 more rows

# Some mappings are one-to-many, e.g. "SRCCR", which means repeated rows for
# the same input code.
map_ontology_code(code = 'SRCCR', from = 'oncotree_code', to = 'nci_code')
#> # A tibble: 2 × 2
#>   oncotree_code nci_code
#>   <chr>         <chr>   
#> 1 SRCCR         C9168   
#> 2 SRCCR         C7967

# Using the `collapse` argument to "collapse" one-to-many mappings ensures
# that the output has as many rows as the `code` vector.
map_ontology_code(code = 'SRCCR',
                  from = 'oncotree_code',
                  to = 'nci_code',
                  collapse = toString)
#> # A tibble: 1 × 2
#>   oncotree_code nci_code    
#>   <chr>         <chr>       
#> 1 SRCCR         C9168, C7967

map_ontology_code(code = 'SRCCR',
                  from = 'oncotree_code',
                  to = 'nci_code',
                  collapse = list)
#> # A tibble: 1 × 2
#>   oncotree_code nci_code 
#>   <chr>         <list>   
#> 1 SRCCR         <chr [2]>

map_ontology_code(
  code = 'SRCCR',
  from = 'oncotree_code',
  to = 'nci_code',
  collapse = \(x) paste(x, collapse = ' ')
)
#> # A tibble: 1 × 2
#>   oncotree_code nci_code   
#>   <chr>         <chr>      
#> 1 SRCCR         C9168 C7967

# `map_ontology_code()` is vectorized over `code`
map_ontology_code(code = c('AASTR', 'MDEP'), from = 'oncotree_code', to = 'nci_code')
#> # A tibble: 2 × 2
#>   oncotree_code nci_code
#>   <chr>         <chr>   
#> 1 AASTR         C9477   
#> 2 MDEP          C4327

# Map from ICD-O topography to ICD-O morphology codes
map_ontology_code(code = 'C72.9', from = 'icdo_topography_code', to = 'icdo_morphology_code')
#> # A tibble: 65 × 2
#>    icdo_topography_code icdo_morphology_code
#>    <chr>                <chr>               
#>  1 C72.9                9401/3              
#>  2 C72.9                9501/3              
#>  3 C72.9                9390/1              
#>  4 C72.9                8000/3              
#>  5 C72.9                9424/3              
#>  6 C72.9                9391/3              
#>  7 C72.9                9425/3              
#>  8 C72.9                9383/1              
#>  9 C72.9                9100/3              
#> 10 C72.9                9522/3              
#> # … with 55 more rows
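
To make the effect of `collapse` more concrete, here is a minimal base R sketch of the idea: group the mapped codes by input code and reduce each group to a single value. The helper collapse_mappings() is hypothetical and is not the package's implementation. For simplicity it only supports collapsing functions that return a single string (so `toString` works here, but `list` would not).

```r
# Hypothetical helper illustrating the idea behind `collapse`.
collapse_mappings <- function(df, key, value, collapse = toString) {
  keys <- unique(df[[key]])
  # Group values by key; explicit factor levels preserve input order.
  groups <- split(df[[value]], factor(df[[key]], levels = keys))
  out <- data.frame(keys, stringsAsFactors = FALSE)
  names(out) <- key
  # Reduce each group to exactly one string.
  out[[value]] <- vapply(groups, collapse, character(1), USE.NAMES = FALSE)
  out
}

# Long-format mappings taken from the examples above.
toy <- data.frame(
  oncotree_code = c("SRCCR", "SRCCR", "MMB"),
  nci_code      = c("C9168", "C7967", "C3706"),
  stringsAsFactors = FALSE
)

res <- collapse_mappings(toy, "oncotree_code", "nci_code")
res_space <- collapse_mappings(
  toy, "oncotree_code", "nci_code",
  collapse = function(x) paste(x, collapse = " ")
)
```

Both results have one row per distinct input code; only the way the two NCIt codes of SRCCR are joined differs.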