MINVAL - MINimal VALidation for Stoichiometric Reactions

Daniel Osorio, Janneth Gonzalez and Andres Pinzon

Introduction to MINVAL

The MINVAL package was designed as a tool to identify orphan metabolites and evaluate the mass and charge balance of stoichiometric reactions. The package also includes functions to characterize and write models in TSV and SBML formats, extract all reactants, products, metabolite names and compartments from a metabolic reconstruction. It is available through CRAN repositories, to install it just type:

install.packages("minval")

MINVAL package includes twelve functions designed to characterize, check and depurate metabolic reconstructions before its interrogation through Flux Balance Analysis (FBA). FBA methods are available for the R language in the "sybil" (Systems Biology Library) package. To load required packages just type:

library("minval")
#library("sybilSBML")
library("sybil")
library("glpkAPI")

Metabolic Models

Metabolic models are sets of stoichiometric reactions that represents a process where a set of chemical compounds called reactants are converted into others called products. In a stoichiometric reaction, all the reactants are placed on the left and the products on the right separated by an arrow symbol which indicates the direction of the reaction. Some examples of valid stoichiometric reactions are:

"H2O[c] + Urea-1-Carboxylate[c] <=> 2 CO2[c] + 2 NH3[c]"
"ADP[c] + Phosphoenolpyruvate[c] => ATP[c] + Pyruvate[c]"
"CO2[c] <=>"

MINVAL can extract both, reactants and products for a set of stoichiometric reactions through the reactants and products functions as follows:

reactants(reactionList = "ADP[c] + Phosphoenolpyruvate[c] => ATP[c] + Pyruvate[c]")
## [1] "ADP[c]"                 "Phosphoenolpyruvate[c]"
products(reactionList = "ADP[c] + Phosphoenolpyruvate[c] => ATP[c] + Pyruvate[c]")
## [1] "ATP[c]"      "Pyruvate[c]"
reactants(reactionList = "H2O[c] + Urea-1-Carboxylate[c] <=> 2 CO2[c] + 2 NH3[c]")
## [1] "H2O[c]"                "Urea-1-Carboxylate[c]" "CO2[c]"               
## [4] "NH3[c]"
products(reactionList = "H2O[c] + Urea-1-Carboxylate[c] <=> 2 CO2[c] + 2 NH3[c]")
## [1] "H2O[c]"                "Urea-1-Carboxylate[c]" "CO2[c]"               
## [4] "NH3[c]"

To show the potential use of the MINVAL package a human-readable model composed by a set of 19 stoichiometric reactions that represent an unbalanced model of the glycolysis process was included. To load it just type:

glycolysis <- read.csv(file = system.file("extdata", "glycolysisModel.csv",
                                          package = "minval"), 
                        stringsAsFactors = FALSE,
                        sep = '\t'
                        )
glycolysis$REACTION
##  [1] "2-Phospho-D-glycerate[c] <=> Phosphoenolpyruvate[c] + H2O[c]"                                                            
##  [2] "D-Glyceraldehyde 3-phosphate[c] <=> Glycerone phosphate[c]"                                                              
##  [3] "D-Glyceraldehyde 3-phosphate[c] + Orthophosphate[c] + NAD+[c] <=> 3 3-Phospho-D-glyceroyl phosphate[c] + NADH[c] + H+[c]"
##  [4] "beta-D-Fructose 1,6-bisphosphate[c] <=> Glycerone phosphate[c] + D-Glyceraldehyde 3-phosphate[c]"                        
##  [5] "ATP[c] + 3-Phospho-D-glycerate[c] <=> ADP[c] + 3-Phospho-D-glyceroyl phosphate[c]"                                       
##  [6] "2-Phospho-D-glycerate[c] <=> 3-Phospho-D-glycerate[c]"                                                                   
##  [7] "ATP[c] + alpha-D-Glucose[c] => ADP[c] + alpha-D-Glucose 6-phosphate[c]"                                                  
##  [8] "alpha-D-Glucose 6-phosphate[c] <=> beta-D-Fructose 6-phosphate[c]"                                                       
##  [9] "ATP[c] + beta-D-Fructose 6-phosphate[c] => ADP[c] + beta-D-Fructose 1,6-bisphosphate[c]"                                 
## [10] "ADP[c] + Phosphoenolpyruvate[c] => ATP[c] + Pyruvate[c]"                                                                 
## [11] "H2O[c] <=>"                                                                                                              
## [12] "NAD+[c] <=>"                                                                                                             
## [13] "NADH[c] <=>"                                                                                                             
## [14] "Orthophosphate[c] <=>"                                                                                                   
## [15] "H+[c] <=>"                                                                                                               
## [16] "ATP[c] <=>"                                                                                                              
## [17] "alpha-D-Glucose[c] <=>"                                                                                                  
## [18] "Pyruvate[c] <=>"                                                                                                         
## [19] "ADP[c] <=>"

Metabolic models include also another additional information related to the stoichiometric reactions. The generally associated information is:

colnames(glycolysis)
## [1] "ID"          "DESCRIPTION" "REACTION"    "GPR"         "LOWER.BOUND"
## [6] "UPPER.BOUND" "OBJECTIVE"

SBML files

The standard format to share and store biological processes such as metabolic models is the Systems Biology Markup Language (SBML) format. MINVAL package includes the "writeSBMLmod" function which is able to write models in SBML (level = 2, version = 1) format as follows:

writeSBMLmod(modelData = glycolysis,
          modelID = "Glycolysis",
          outputFile = "glycolysis.xml"
          )

Metabolic models in SBML format can be readed through the readSBMLmod function of the sybilSBML R package.

# glycoModel <- sybilSBML::readSBMLmod("glycolysis.xml")
# glycoModel

After load the metabolic model, it can be interrogated through FBA using the optimizeProb function of the sybil R package. In this case, the reaction R00200 was set as the objective function. The R00200 reaction describes the production of pyruvate from phosphoenolpyruvate an alpha-D-Glucose derivate.

Glycolysis pathway can be summarized as: 1 alpha-D-Glucose[c] + 2 NAD+[c] + 2 ADP[c] + 2 Orthophosphate[c] => 2 Pyruvate[c] + 2 NADH[c] + 2 H+[c] + 2 ATP[c] + 2 H2O[c] where the metabolism of one molecule of alpha-D-Glucose yield two molecules of pyruvate.

As is shown below, interrogated glycolysis model estimates a production of six molecules of pyruvate by each alpha-D-Glucose molecule due a mass unbalance. FBA methods are sensitive to thermodynamic (mass-charge) unbalance, so in order to achieve a valid biological extrapolation is mandatory to avoid this type of unbalancing in all model reactions.

# sybil::optimizeProb(glycoModel)

Syntax validation

The first step for a stoichiometric reactions validation is to check their syntax. Valid stoichiometric reactions must have the following mandatory characteristics:

Syntax validity can be checked through MINVAL package using the validateSyntax function which returns a boolean TRUE value if the stoichiometric reaction passes all validations. To validate reactions syntax just type:

validateSyntax(reactionList = glycolysis$REACTION)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE

Mass - Charge Balance Validation

Another step for a stoichiometric reactions validation is to check their mass-charge balance. This process requires the use of a reference with chemical formulas, molecular weights and/or net charges for each metabolite included in the metabolic model.

ChEBI database download

MINVAL package includes the downloadChEBI function that allows the download of different releases of the Chemical Entities of Biological Interest (ChEBI) database, a freely available dictionary of molecular entities focused on ‘small’ chemical compounds involved in biochemical reactions. To download the latest version (this process takes aprox. 2 min with a 5 Mb internet connection) of the ChEBI database just type:

ChEBI <- downloadChEBI(release = "latest",
                       woAssociations = TRUE
                       )

Reference also can be user provided. An example of a reference for the glycolysis model is shown below.

chemicalData <- read.csv2(file = system.file("extdata", "chemData.csv", 
                                             package = "minval")
                          )
head(chemicalData)
##                              NAME       FORMULA     MASS CHARGE
## 1                             H2O           H2O  18.0106      0
## 2                              H+             H   1.0078      1
## 3                             ATP C10H16N5O13P3 506.9957      0
## 4                            NAD+ C21H28N7O14P2 664.1169      1
## 5 3-Phospho-D-glyceroyl phosphate     C3H8O10P2 265.9593      0
## 6     beta-D-Fructose 6-phosphate      C6H13O9P 260.0297      0

Mass Balance Validation

In a balanced stoichiometric reaction according to the Lomonosov-Lavoisier law, the mass comprising the reactants should be the same mass present in the products. If the chemical formula is given, the checkBalance function multiplies the atom numbers by their respective stoichiometric coefficient and establishes if the atomic composition of reactants and products are the same. If the molecular weight is given then the sum of masses of reactants and products are compared. In both cases, a boolean TRUE value is returned if the mass is balanced.

Mass balance can be tested using the chemical formula or the molecular weight associated to each metabolite as follows:

checkBalance(reactionList = glycolysis$REACTION,
             referenceData = chemicalData,
             ids = "NAME",
             mFormula = "FORMULA"
             )
##  [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

As is shown above, the third stoichiometric reaction is unbalanced. It can be corrected replacing manually the unbalanced reaction as follows:

glycolysis$REACTION[3] <- "D-Glyceraldehyde 3-phosphate[c] + Orthophosphate[c] + NAD+[c] <=> 3-Phospho-D-glyceroyl phosphate[c] + NADH[c] + H+[c]"

checkBalance(reactionList = glycolysis$REACTION,
             referenceData = chemicalData,
             ids = "NAME",
             mWeight = "MASS"
             )
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE

When all stoichiometric reactions are mass-balanced, then the model can be exported and loaded to be interrogated again:

# writeSBMLmod(modelData = glycolysis,
#           modelID = "GlycolysisBalanced",
#           outputFile = "glycolysisBalanced.xml"
#           )

# sybil::optimizeProb(sybilSBML::readSBMLmod("glycolysisBalanced.xml"))

Charge Balance Validation

Charge balance can be also tested through the checkBalance function using the net charge of metabolites as follows:

checkBalance(reactionList = glycolysis$REACTION,
             referenceData = chemicalData,
             ids = "NAME",
             mCharge = "CHARGE"
             )
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE

Characterize model

Stoichiometric matrix

A metabolic model is often represented in a more compact form called the stoichiometry matrix (S). If a metabolic model has n reactions and m participating metabolites, then the stoichiometry matrix will have correspondingly m rows and n columns. Values in the stoichiometric matrix represent the metabolites coefficients in each reaction. To generate the stoichiometric matrix just type:

stoichiometricMatrix(reactionList = glycolysis$REACTION)
##                                      reactions
## metabolites                           R01 R02 R03 R04 R05 R06 R07 R08 R09 R10
##   2-Phospho-D-glycerate[c]             -1   0   0   0   0  -1   0   0   0   0
##   Phosphoenolpyruvate[c]                1   0   0   0   0   0   0   0   0  -1
##   H2O[c]                                1   0   0   0   0   0   0   0   0   0
##   D-Glyceraldehyde 3-phosphate[c]       0  -1  -1   1   0   0   0   0   0   0
##   Glycerone phosphate[c]                0   1   0   1   0   0   0   0   0   0
##   Orthophosphate[c]                     0   0  -1   0   0   0   0   0   0   0
##   NAD+[c]                               0   0  -1   0   0   0   0   0   0   0
##   3-Phospho-D-glyceroyl phosphate[c]    0   0   1   0   1   0   0   0   0   0
##   NADH[c]                               0   0   1   0   0   0   0   0   0   0
##   H+[c]                                 0   0   1   0   0   0   0   0   0   0
##   beta-D-Fructose 1,6-bisphosphate[c]   0   0   0  -1   0   0   0   0   1   0
##   ATP[c]                                0   0   0   0  -1   0  -1   0  -1   1
##   3-Phospho-D-glycerate[c]              0   0   0   0  -1   1   0   0   0   0
##   ADP[c]                                0   0   0   0   1   0   1   0   1  -1
##   alpha-D-Glucose[c]                    0   0   0   0   0   0  -1   0   0   0
##   alpha-D-Glucose 6-phosphate[c]        0   0   0   0   0   0   1  -1   0   0
##   beta-D-Fructose 6-phosphate[c]        0   0   0   0   0   0   0   1  -1   0
##   Pyruvate[c]                           0   0   0   0   0   0   0   0   0   1
##                                      reactions
## metabolites                           R11 R12 R13 R14 R15 R16 R17 R18 R19
##   2-Phospho-D-glycerate[c]              0   0   0   0   0   0   0   0   0
##   Phosphoenolpyruvate[c]                0   0   0   0   0   0   0   0   0
##   H2O[c]                               -1   0   0   0   0   0   0   0   0
##   D-Glyceraldehyde 3-phosphate[c]       0   0   0   0   0   0   0   0   0
##   Glycerone phosphate[c]                0   0   0   0   0   0   0   0   0
##   Orthophosphate[c]                     0   0   0  -1   0   0   0   0   0
##   NAD+[c]                               0  -1   0   0   0   0   0   0   0
##   3-Phospho-D-glyceroyl phosphate[c]    0   0   0   0   0   0   0   0   0
##   NADH[c]                               0   0  -1   0   0   0   0   0   0
##   H+[c]                                 0   0   0   0  -1   0   0   0   0
##   beta-D-Fructose 1,6-bisphosphate[c]   0   0   0   0   0   0   0   0   0
##   ATP[c]                                0   0   0   0   0  -1   0   0   0
##   3-Phospho-D-glycerate[c]              0   0   0   0   0   0   0   0   0
##   ADP[c]                                0   0   0   0   0   0   0   0  -1
##   alpha-D-Glucose[c]                    0   0   0   0   0   0  -1   0   0
##   alpha-D-Glucose 6-phosphate[c]        0   0   0   0   0   0   0   0   0
##   beta-D-Fructose 6-phosphate[c]        0   0   0   0   0   0   0   0   0
##   Pyruvate[c]                           0   0   0   0   0   0   0  -1   0

Metabolites

The metabolites function automatically identifies and return all metabolites (with or without compartments) for a specific or a set of stoichiometric reactions. Some FBA implementations require the list of all metabolites included in the metabolic reconstruction as an independent section of the human-readable input model.

metabolites(reactionList = glycolysis$REACTION)
##  [1] "2-Phospho-D-glycerate[c]"            "Phosphoenolpyruvate[c]"             
##  [3] "H2O[c]"                              "D-Glyceraldehyde 3-phosphate[c]"    
##  [5] "Glycerone phosphate[c]"              "Orthophosphate[c]"                  
##  [7] "NAD+[c]"                             "3-Phospho-D-glyceroyl phosphate[c]" 
##  [9] "NADH[c]"                             "H+[c]"                              
## [11] "beta-D-Fructose 1,6-bisphosphate[c]" "ATP[c]"                             
## [13] "3-Phospho-D-glycerate[c]"            "ADP[c]"                             
## [15] "alpha-D-Glucose[c]"                  "alpha-D-Glucose 6-phosphate[c]"     
## [17] "beta-D-Fructose 6-phosphate[c]"      "Pyruvate[c]"
metabolites(reactionList = glycolysis$REACTION,
            woCompartment = TRUE)
##  [1] "2-Phospho-D-glycerate"            "Phosphoenolpyruvate"             
##  [3] "H2O"                              "D-Glyceraldehyde 3-phosphate"    
##  [5] "Glycerone phosphate"              "Orthophosphate"                  
##  [7] "NAD+"                             "3-Phospho-D-glyceroyl phosphate" 
##  [9] "NADH"                             "H+"                              
## [11] "beta-D-Fructose 1,6-bisphosphate" "ATP"                             
## [13] "3-Phospho-D-glycerate"            "ADP"                             
## [15] "alpha-D-Glucose"                  "alpha-D-Glucose 6-phosphate"     
## [17] "beta-D-Fructose 6-phosphate"      "Pyruvate"

Compartments

As well as in cells, in which not all reactions occur in all compartments, stoichiometric reactions in a metabolic reconstruction can be labeled to be restricted to a single compartment during FBA through the assignment of a compartment label at the end of each metabolite name. Some FBA implementations require the list of all compartments included in the metabolic reconstruction as an independent section of the human-readable input file.

compartments(reactionList = glycolysis$REACTION)
## [1] "c"

OrphanMetabolites

orphanMetabolites or compounds that are not produced or consumed in any other reaction are one of the main causes of mass accumulation in metabolic reconstructions. The orphanReactants function identifies compounds that are not produced internally by any other reaction and should be added to the reconstruction, for instance, as an exchange reaction while the orphanProducts function identifies compounds that are not consumed internally by any other reaction and should be added to the reconstruction as a sink reaction.

orphanMetabolites(reactionList = glycolysis$REACTION[1:10])
## [1] "alpha-D-Glucose[c]" "Pyruvate[c]"        "H+[c]"             
## [4] "H2O[c]"             "NAD+[c]"            "NADH[c]"           
## [7] "Orthophosphate[c]"
orphanReactants(reactionList = glycolysis$REACTION[1:10])
## [1] "alpha-D-Glucose[c]" "H+[c]"              "H2O[c]"            
## [4] "NAD+[c]"            "NADH[c]"            "Orthophosphate[c]"
orphanProducts(reactionList = glycolysis$REACTION[1:10])
## [1] "Pyruvate[c]"       "H+[c]"             "H2O[c]"           
## [4] "NAD+[c]"           "NADH[c]"           "Orthophosphate[c]"

TSV files

The function writeTSVmod writes a metabolic model in three text files, following a character-separated value format. Each line contains one entry; the default value separator is a tab. TSV models are the default input format for the sybil R package.

# writeTSVmod(modelData = glycolysis,
#           modelID = "Glycolysis",
#           outputFile = "glycolysis"
#           )
# 
# sybil::readTSVmod(prefix = "glycolysis",quoteChar = "\"")