The first step in using any software is to know what its inputs and outputs are.
Before using geneHapR, the user should prepare several data sets.
A file recording DNA-level variation among individuals is required for haplotype analysis. This file can be supplied in variant call format (VCF), in FASTA format, or in a multiple-alignment format.
The following files are also recommended for filtration, visualization, and phenotype association analysis: an annotation file in General Feature Format (GFF) holding the annotations for the target species, and two tables holding phenotype data and accession grouping information, respectively. These two tables can also be supplied as ‘R’ objects of class data.frame.
VCF file (variant call format): imported into ‘R’ as a vcfR object.
GFF file (genome annotations): imported into ‘R’ as a GRanges object.
DNA sequences (FASTA format): imported into ‘R’ as a DNAStringSet object.
Phenotype data and accession group information: imported into ‘R’ as data.frame objects.
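As a minimal sketch of the last item, the two tables can be built directly as data.frame objects instead of being read from files. The layout used here (accessions as row names, one column per trait or grouping variable) and all names and values are assumptions made for illustration, not a layout confirmed by the text above.

```r
# Hypothetical phenotype table: one row per accession, one column per trait.
# Accession names, trait names and values are invented for illustration.
pheno <- data.frame(
  row.names    = c("acc1", "acc2", "acc3"),
  plant_height = c(98.5, 102.3, 87.0),
  grain_weight = c(24.1, NA, 22.8)
)

# Hypothetical grouping table: one row per accession, one grouping column.
acc_info <- data.frame(
  row.names = c("acc1", "acc2", "acc3"),
  Type      = c("landrace", "cultivar", "wild"),
  stringsAsFactors = FALSE
)

# Accessions present in both tables; only these can be used when
# phenotype and grouping information are combined.
shared <- intersect(rownames(pheno), rownames(acc_info))
```

Using the accession names as the shared key makes it easy to check, before any analysis, which accessions are covered by both tables.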
The main results, hapResult and hapSummary, can be exported as tab-delimited tables; visualizations can be exported as image files or PDF files.
hapResult and hapSummary
Both hapResult and hapSummary are effectively matrices with some additional attributes, and each can be divided into three parts.
Part I: consists of a single column that indicates the content type of each row. The first four rows are fixed and hold additional information: CHROM, POS, INFO and ALLELE. Further annotations are stored in the fields of INFO, with fields separated by semicolons (;). The remaining rows hold the names of the haplotypes.
Part II: consists of at least one column. Each column represents a site. The first four elements of each column hold the information and annotations for that site (CHROM, POS, INFO and ALLELE), and the remaining elements give the genotype of the corresponding haplotype at that site.
Part III: in hapResult, part III consists of one column named Accession, while in hapSummary it consists of two columns named Accession and freq.
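The three-part layout can be illustrated with a toy character matrix in base R. This is only a sketch of the structure described above, not an object produced by geneHapR: the column names, positions, alleles, INFO fields and accession names are invented, and leaving the Accession cells of the four fixed rows empty is an assumption.

```r
# Toy matrix mimicking a hapResult-like layout (all values invented).
hap <- rbind(
  c("CHROM",  "Chr1",       "Chr1",       ""),
  c("POS",    "1024",       "1087",       ""),
  c("INFO",   "AC=1;DP=10", "AC=2;DP=12", ""),
  c("ALLELE", "A/G",        "T/C",        ""),
  c("H001",   "A",          "T",          "acc1"),
  c("H001",   "A",          "T",          "acc2"),
  c("H002",   "G",          "C",          "acc3")
)
# Hypothetical column names: a label column, two sites, and Accession.
colnames(hap) <- c("Hap", "1024", "1087", "Accession")

part1 <- hap[, 1]                          # Part I: content type of each row
part2 <- hap[, 2:3, drop = FALSE]          # Part II: one column per site
part3 <- hap[, "Accession", drop = FALSE]  # Part III: accession column

meta <- hap[1:4, ]     # the four fixed rows: CHROM, POS, INFO, ALLELE
geno <- hap[-(1:4), ]  # haplotype rows, one accession per row

# Fields of INFO for the first site are separated by semicolons.
info_fields <- strsplit(hap[3, "1024"], ";", fixed = TRUE)[[1]]
```

Keeping everything as character data is what allows site annotations and genotypes to share the same columns.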
The differences between hapResult and hapSummary lie only in part III: (a) hapSummary has a freq column while hapResult does not; (b) in hapSummary, multiple accessions are separated by semicolons within one row, while in hapResult each row holds a single accession.
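The relationship between the two forms of part III can be sketched in base R by collapsing a one-accession-per-row table into one row per haplotype. This is an illustration, not the package's own code: the data are invented, and it assumes that freq is the number of accessions carrying each haplotype.

```r
# Hypothetical part III of a hapResult-like table: one accession per row.
res <- data.frame(
  Hap       = c("H001", "H001", "H002"),
  Accession = c("acc1", "acc2", "acc3"),
  stringsAsFactors = FALSE
)

# Collapse to a hapSummary-like layout: one row per haplotype, with
# accessions joined by semicolons and a freq column counting them
# (assumption: freq is the accession count per haplotype).
accs <- tapply(res$Accession, res$Hap, paste, collapse = ";")
freq <- tapply(res$Accession, res$Hap, length)

smry <- data.frame(
  Hap       = names(accs),
  Accession = as.vector(accs),
  freq      = as.vector(freq),
  stringsAsFactors = FALSE
)
```

The original per-accession rows can be recovered from the collapsed form by splitting the Accession column on semicolons, so the two layouts carry the same accession information.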