Introduction to metabolighteR

Thomas Wilson

Institute of Biological, Environmental & Rural Sciences (IBERS), Aberystwyth University, UK
tpw2@aber.ac.uk

Introduction

MetaboLights is a database for metabolomics experiments and associated information. The database allows users to deposit raw data, sample information, analysis protocols and metabolite annotation data.

metabolighteR provides easy access to publicly available MetaboLights studies via the MetaboLights RESTful API. Only the API methods that retrieve data (GET requests) are supported by metabolighteR.

Installation

metabolighteR can be installed from CRAN or, for the latest development version, directly from GitHub using the remotes package.

install.packages('metabolighteR')

remotes::install_github('wilsontom/metabolighteR')

library(metabolighteR)

Querying the Repository

The identification codes of all public studies can be retrieved with the get_studies function.

all_study_ids <- get_studies()

studies <- as.vector(all_study_ids$study)

head(studies)
#> [1] "MTBLS155" "MTBLS391" "MTBLS102" "MTBLS129" "MTBLS143" "MTBLS165"
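
Because the identifiers are plain character strings, ordinary vector operations apply to them. The following is a minimal sketch, using the six identifiers printed above rather than a live API call, that orders them by their numeric part. The helper name study_number is purely illustrative and is not part of metabolighteR.

```r
# Identifiers copied from the example output above (no API call is made here)
studies <- c("MTBLS155", "MTBLS391", "MTBLS102",
             "MTBLS129", "MTBLS143", "MTBLS165")

# Hypothetical helper: strip the "MTBLS" prefix and return the numeric part
study_number <- function(ids) {
  as.integer(sub("^MTBLS", "", ids))
}

# Order the identifiers numerically rather than in the order returned
sorted_studies <- studies[order(study_number(studies))]

sorted_studies
```

Sorting on the numeric part avoids the surprises of alphabetical ordering once identifiers have different numbers of digits.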

A summary table containing the study ID, study title and study technology can be generated for publicly available studies.

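# For the first five studies
# (%>% is the magrittr pipe; attach magrittr in case it is not already available)
library(magrittr)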

study_titles <- purrr::map(studies[1:5], get_study_title)
names(study_titles) <- studies[1:5]

study_titles <- tibble::as_tibble(study_titles) %>% tidyr::gather()
names(study_titles) <- c('STUDY', 'Title')

study_tech <- get_study_tech()

study_tech_filter <- study_tech %>% dplyr::filter(STUDY %in% studies[1:5])

StudyInfoTable <-
  dplyr::left_join(study_titles, study_tech_filter, by = 'STUDY')

StudyInfoTable
#> # A tibble: 5 x 3
#>   STUDY   Title                                        TECH                     
#>   <chr>   <chr>                                        <chr>                    
#> 1 MTBLS1… Release of Ecologically Relevant Metabolite… UPLC-FT-ICR-MS;UPLC-LTQ-…
#> 2 MTBLS3… Lipid Data Analyzer: Discrimination of isob… HPLC-LTQ-MS;UPLC-LTQ-MS;…
#> 3 MTBLS1… Comparative analysis of the adaptation of S… NMR                      
#> 4 MTBLS1… Coordinate Regulation of Metabolite Glycosy… UPLC-TOF-MS;UPLC-LTQ-MS  
#> 5 MTBLS1… Comprehensive systems biology analysis of a… UPLC-LTQ-MS
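
The reshaping and joining steps above can also be sketched in base R without any API calls. This is only an illustration under stated assumptions: the titles and TECH values below are placeholders (the real ones come from get_study_title and get_study_tech), and MTBLS999 is a made-up identifier included to show that unmatched rows are dropped by the left join. The last step splits the semicolon-separated TECH field into one vector per study.

```r
# Placeholder titles; real titles are returned by get_study_title()
titles <- list(MTBLS155 = "Placeholder title A",
               MTBLS391 = "Placeholder title B")

# Named list -> two-column table (same shape as the gathered tibble above)
study_titles <- data.frame(
  STUDY = names(titles),
  Title = unlist(titles, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Placeholder technology table, including one study with no matching title
study_tech <- data.frame(
  STUDY = c("MTBLS391", "MTBLS155", "MTBLS999"),
  TECH = c("TECH-B", "TECH-A;TECH-C", "TECH-D"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of study_titles, attach TECH where STUDY matches
study_info <- merge(study_titles, study_tech, by = "STUDY", all.x = TRUE)

# TECH can list several platforms separated by ';', so split per study
tech_list <- strsplit(study_info$TECH, ";", fixed = TRUE)
names(tech_list) <- study_info$STUDY

study_info
tech_list
```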

Download File Contents

A list of all available files can be generated using the get_study_files function.


studyFileList <- get_study_files('MTBLS264')

studyFileList
#> # A tibble: 6 x 6
#>   createdAt       directory file                  status timestamp   type       
#>   <chr>           <lgl>     <chr>                 <chr>  <chr>       <chr>      
#> 1 March 20 2017 … FALSE     a_mtbls264_NEG_mass_… active 2017032014… metadata_a…
#> 2 March 20 2017 … FALSE     a_mtbls264_POS_mass_… active 2017032014… metadata_a…
#> 3 January 13 202… FALSE     i_Investigation.txt   active 2020011313… metadata_i…
#> 4 March 20 2017 … FALSE     m_mtbls264_NEG_mass_… active 2017032014… metadata_m…
#> 5 March 20 2017 … FALSE     m_mtbls264_POS_mass_… active 2017032014… metadata_m…
#> 6 March 20 2017 … FALSE     s_MTBLS264.txt        active 2017032014… metadata_s…
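
The file names in this listing start with a one-letter prefix (a_, i_, m_, s_), which can be used to pick out a particular kind of file before downloading it. The sketch below assumes these prefixes follow the usual ISA-Tab style convention (assay, investigation, metabolite assignment, sample); that mapping is an assumption, not something returned by metabolighteR. The a_ and m_ file names are hypothetical stand-ins, because the real names are truncated in the output above.

```r
# The i_ and s_ names are taken from the listing above; the a_ and m_ names
# are hypothetical because the real ones are truncated in the printed output
files <- c("a_example_assay.txt", "i_Investigation.txt",
           "m_example_maf.tsv", "s_MTBLS264.txt")

# Assumed mapping from filename prefix to a readable label
prefix_labels <- c(a = "assay", i = "investigation",
                   m = "metabolite assignment", s = "sample")

# The first character of each file name selects the label
file_kind <- unname(prefix_labels[substr(files, 1, 1)])

file_table <- data.frame(file = files, kind = file_kind,
                         stringsAsFactors = FALSE)

# Keep only the metabolite assignment files
maf_files <- file_table$file[file_table$kind == "metabolite assignment"]

file_table
```

With a real listing, the same lookup could be applied to studyFileList$file in place of the files vector.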

The contents of these files can then be downloaded using the download_study_file function, which takes the study ID and a file name from the list above.


fileContents_A <- download_study_file('MTBLS264', studyFileList$file[1])
#> No encoding supplied: defaulting to UTF-8.

head(fileContents_A)
#> # A tibble: 6 x 35
#>   Sample.Name   Protocol.REF Parameter.Value.Po… Parameter.Value.D… Extract.Name
#>   <chr>         <chr>        <lgl>               <lgl>              <lgl>       
#> 1 Volunteer1_b… Extraction   NA                  NA                 NA          
#> 2 Volunteer1_p… Extraction   NA                  NA                 NA          
#> 3 Volunteer1_R… Extraction   NA                  NA                 NA          
#> 4 Volunteer2_b… Extraction   NA                  NA                 NA          
#> 5 Volunteer2_p… Extraction   NA                  NA                 NA          
#> 6 Volunteer2_R… Extraction   NA                  NA                 NA          
#> # … with 30 more variables: Protocol.REF.1 <chr>,
#> #   Parameter.Value.Chromatography.Instrument. <chr>, Term.Source.REF <lgl>,
#> #   Term.Accession.Number <lgl>, Parameter.Value.Column.model. <chr>,
#> #   Parameter.Value.Column.type. <chr>, Labeled.Extract.Name <lgl>,
#> #   Label <lgl>, Term.Source.REF.1 <lgl>, Term.Accession.Number.1 <lgl>,
#> #   Protocol.REF.2 <chr>, Parameter.Value.Scan.polarity. <chr>,
#> #   Parameter.Value.Scan.m.z.range. <chr>, Parameter.Value.Instrument. <chr>,
#> #   Term.Source.REF.2 <lgl>, Term.Accession.Number.2 <lgl>,
#> #   Parameter.Value.Ion.source. <chr>, Term.Source.REF.3 <chr>,
#> #   Term.Accession.Number.3 <chr>, Parameter.Value.Mass.analyzer. <chr>,
#> #   Term.Source.REF.4 <chr>, Term.Accession.Number.4 <chr>,
#> #   MS.Assay.Name <chr>, Raw.Spectral.Data.File <chr>, Protocol.REF.3 <chr>,
#> #   Normalization.Name <lgl>, Derived.Spectral.Data.File <lgl>,
#> #   Protocol.REF.4 <chr>, Data.Transformation.Name <lgl>,
#> #   Metabolite.Assignment.File <chr>

fileContents_B <- download_study_file('MTBLS264', studyFileList$file[4])
#> No encoding supplied: defaulting to UTF-8.

head(fileContents_B)
#> # A tibble: 6 x 69
#>   database_identi… chemical_formula smiles inchi metabolite_iden… mass_to_charge
#>   <chr>            <chr>            <chr>  <chr> <chr>                     <dbl>
#> 1 CHEBI:17552      C10H15N5O11P2    Nc1nc… InCh… GDP                        442.
#> 2 CHEBI:17345      C10H14N5O8P      Nc1nc… InCh… GMP                        362.
#> 3 CHEBI:16695      C9H13N2O9P       O[C@@… InCh… UMP                        323.
#> 4 CHEBI:15713      C9H15N2O15P3     O[C@@… InCh… UTP                        483.
#> 5 CHEBI:17368      C5H4N4O          O=c1[… InCh… Hypoxanthine               135.
#> 6 CHEBI:17775      C5H4N4O3         O=c1[… InCh… Urate                      167.
#> # … with 63 more variables: fragmentation <lgl>, modifications <chr>,
#> #   charge <chr>, retention_time <dbl>, taxid <chr>, species <chr>,
#> #   database <lgl>, database_version <lgl>, reliability <chr>, uri <lgl>,
#> #   search_engine <lgl>, search_engine_score <lgl>,
#> #   smallmolecule_abundance_sub <lgl>, smallmolecule_abundance_stdev_sub <lgl>,
#> #   smallmolecule_abundance_std_error_sub <lgl>, Volunteer1_blood_0h_NEG <dbl>,
#> #   Volunteer1_plasma_0h_NEG <dbl>, Volunteer1_RBC_0h_NEG <dbl>,
#> #   Volunteer2_blood_0h_NEG <dbl>, Volunteer2_plasma_0h_NEG <dbl>,
#> #   Volunteer2_RBC_0h_NEG <dbl>, Volunteer3_blood_0h_NEG <dbl>,
#> #   Volunteer3_plasma_0h_NEG <dbl>, Volunteer3_RBC_0h_NEG <dbl>,
#> #   Volunteer4_blood_0h_NEG <dbl>, Volunteer4_plasma_0h_NEG <dbl>,
#> #   Volunteer4_RBC_0h_NEG <dbl>, Volunteer1_blood_1h_NEG <dbl>,
#> #   Volunteer1_plasma_1h_NEG <dbl>, Volunteer1_RBC_1h_NEG <dbl>,
#> #   Volunteer2_blood_1h_NEG <dbl>, Volunteer2_plasma_1h_NEG <dbl>,
#> #   Volunteer2_RBC_1h_NEG <dbl>, Volunteer3_blood_1h_NEG <dbl>,
#> #   Volunteer3_plasma_1h_NEG <dbl>, Volunteer3_RBC_1h_NEG <dbl>,
#> #   Volunteer4_blood_1h_NEG <dbl>, Volunteer4_plasma_1h_NEG <dbl>,
#> #   Volunteer4_RBC_1h_NEG <dbl>, Volunteer1_blood_4h_NEG <dbl>,
#> #   Volunteer1_plasma_4h_NEG <dbl>, Volunteer1_RBC_4h_NEG <dbl>,
#> #   Volunteer2_blood_4h_NEG <dbl>, Volunteer2_plasma_4h_NEG <dbl>,
#> #   Volunteer2_RBC_4h_NEG <dbl>, Volunteer3_blood_4h_NEG <dbl>,
#> #   Volunteer3_plasma_4h_NEG <dbl>, Volunteer3_RBC_4h_NEG <dbl>,
#> #   Volunteer4_blood_4h_NEG <dbl>, Volunteer4_plasma_4h_NEG <dbl>,
#> #   Volunteer4_RBC_4h_NEG <dbl>, Volunteer1_blood_24h_NEG <dbl>,
#> #   Volunteer1_plasma_24h_NEG <dbl>, Volunteer1_RBC_24h_NEG <dbl>,
#> #   Volunteer2_blood_24h_NEG <dbl>, Volunteer2_plasma_24h_NEG <dbl>,
#> #   Volunteer2_RBC_24h_NEG <dbl>, Volunteer3_blood_24h_NEG <dbl>,
#> #   Volunteer3_plasma_24h_NEG <dbl>, Volunteer3_RBC_24h_NEG <dbl>,
#> #   Volunteer4_blood_24h_NEG <dbl>, Volunteer4_plasma_24h_NEG <dbl>,
#> #   Volunteer4_RBC_24h_NEG <dbl>
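
The sample columns in the second table encode the volunteer, sample matrix, time point and ionisation mode in their names, so the sample metadata can be recovered from the column names alone. The sketch below uses a small mock data frame rather than the downloaded file: the sample column names are copied from the output above, while the metabolite column name and the intensity values are invented for illustration.

```r
# Mock stand-in for a downloaded assignment table; sample column names follow
# the output above, the 'metabolite' column name and all values are invented
maf <- data.frame(
  metabolite = c("GDP", "GMP"),
  Volunteer1_blood_0h_NEG = c(10, 20),
  Volunteer1_plasma_0h_NEG = c(30, 40),
  Volunteer2_RBC_24h_NEG = c(50, 60),
  stringsAsFactors = FALSE
)

# Sample columns are assumed to look like Volunteer<N>_<matrix>_<time>_<mode>
sample_cols <- grep("^Volunteer[0-9]+_", names(maf), value = TRUE)

# Split each column name into its four underscore-separated parts
parts <- do.call(rbind, strsplit(sample_cols, "_", fixed = TRUE))
sample_info <- data.frame(
  column = sample_cols,
  volunteer = parts[, 1],
  matrix = parts[, 2],
  timepoint = parts[, 3],
  mode = parts[, 4],
  stringsAsFactors = FALSE
)

# Numeric intensity matrix with metabolites as row names
intensities <- as.matrix(maf[, sample_cols])
rownames(intensities) <- maf$metabolite

sample_info
```

Keeping the parsed sample information in its own table makes it straightforward to subset the intensity matrix, for example to a single time point or sample matrix.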