Assessment-class: Assessment objects


Description

To assess the quality of a set of (predicted) genes for a genome, evidence must first be mapped to that genome. Each gene is then categorized by how strongly the evidence supports or contradicts it. Class Assessment furnishes objects that store the information needed to assess a set of genes for a genome, and it provides functions for viewing and visualizing that information. Specifically, Assessment objects use proteomic hits and evolutionarily conserved start and stop codons as evidence to determine the correctness of each gene in a given set.

DataMap Objects

Objects of class Assessment and subclass DataMap store the mapping of proteomics and evolutionary conservation evidence to the genome of interest (the central genome). They are generated by the function MapAssessmentData, and they have a list structure containing the following elements:

StrainID

Equal to strainID if it was specified; otherwise ""

Species

Equal to speciesName if it was specified; otherwise ""

GenomeLength

Length of the central genome

StopsByFrame

Where the stops are in each frame, used to bound open reading frames in downstream functions

N-TermProteomics

Logical describing whether or not the proteomics hits are from N-terminal proteomics

FwdProtHits

Proteomic hit information that maps to the three forward frames of the central genome

RevProtHits

Proteomic hit information that maps to the three reverse frames of the central genome

FwdCoverage

Coverage of the forward strand of the central genome

FwdConStarts

Start codon conservation of the forward strand of the central genome

FwdConStops

Stop codon conservation of the forward strand of the central genome

RevCoverage

Coverage of the reverse strand of the central genome

RevConStarts

Start codon conservation of the reverse strand of the central genome

RevConStops

Stop codon conservation of the reverse strand of the central genome

NumRelatedGenomes

Final number of related genomes that were mapped to the central genome

HasProteomics

Logical describing whether or not proteomics evidence has been mapped to the central genome

HasConservation

Logical describing whether or not evolutionary conservation evidence has been mapped to the central genome
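Because a DataMap object is a list, its elements can be read by name. The following is a minimal sketch only: mockMap is a hypothetical, hand-built stand-in that carries a few of the documented elements with made-up values, it is not output from MapAssessmentData, and the class vector shown is an assumption.

```r
# Hand-built stand-in for a DataMap object; all values are made up for
# illustration and only a few of the documented elements are included.
mockMap <- structure(
  list(
    StrainID = "",
    Species = "",
    GenomeLength = 5000L,
    NumRelatedGenomes = 12L,
    HasProteomics = TRUE,
    HasConservation = FALSE
  ),
  class = c("Assessment", "DataMap")  # assumed class vector
)

# Collect the two logical flags and keep the names of those that are TRUE.
evidenceTypes <- c(
  proteomics = mockMap$HasProteomics,
  conservation = mockMap$HasConservation
)
availableEvidence <- names(evidenceTypes)[evidenceTypes]

# unclass() shows the raw list without dispatching on the class.
str(unclass(mockMap))
```

Checking HasProteomics and HasConservation first is a cheap way to see which kinds of evidence a given map can contribute before passing it on.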

Results Objects

Objects of class Assessment and subclass Results store how correct a set of genes is for a given genome. The function AssessGenes generates a Results object from a DataMap object and information on a set of genes for the genome corresponding to that DataMap object. Results objects have a list structure containing the following elements:

StrainID

Equal to the strainID of the corresponding DataMap object

Species

Equal to speciesName of the corresponding DataMap object

GenomeLength

Length of the genome

GeneLeftPos

Left positions of the given set of genes (in forward strand terms)

GeneRightPos

Right positions of the given set of genes (in forward strand terms)

GeneStrand

Strand information of the given set of genes ("+" or "-")

GeneSource

The source of the given set of genes

NumGenes

Number of genes given

N_CS-_PE+_ORFs

Data for open reading frames with no gene start but with proteomics evidence

N_CS<_PE+_ORFs

Data for open reading frames with no gene start but with proteomics evidence and at least one valid evolutionarily conserved start

CategoryAssignments

A character vector that stores the category assignment for each of the given genes in the same order as the gene information (please see below for a list of all possible categories, their descriptions, and their character string codes)
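Since the per-gene elements are stored in the same order, they can be lined up side by side. The sketch below assumes a hypothetical, hand-built list (mockRes) with made-up values in place of a real Results object from AssessGenes; only element names documented above are used.

```r
# Hand-built stand-in for a Results object (values are made up).
mockRes <- list(
  GeneLeftPos = c(101L, 950L, 2400L),
  GeneRightPos = c(700L, 1600L, 3001L),
  GeneStrand = c("+", "-", "+"),
  NumGenes = 3L,
  CategoryAssignments = c("Y CS+ PE+", "Y CS- PE-", "Y CS+ PE+")
)

# The per-gene vectors are parallel, so they can be combined into one table.
geneTable <- data.frame(
  Left = mockRes$GeneLeftPos,
  Right = mockRes$GeneRightPos,
  Strand = mockRes$GeneStrand,
  Category = mockRes$CategoryAssignments,
  stringsAsFactors = FALSE
)

# Count how many genes fall into each category.
categoryCounts <- table(geneTable$Category)
```

A table like geneTable makes it straightforward to filter genes by strand or category with ordinary data frame subsetting.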

Gene Categories

The CategoryAssignments vector in Results objects describes how the proteomics evidence and the evolutionarily conserved start/stop codon evidence support or disprove each gene in the corresponding set. In the vector, each gene is assigned a character string code with the following format: "Y CS[_] PE[_]". The first part, the "Y", signifies that the ORF contains a predicted gene. The second part, the "CS[_]", describes how the conserved start(s) line up with the given gene start. The third part, the "PE[_]", describes how the proteomics hits line up with the given gene start.

Y CS+ PE+

There is a good conserved start aligned with the gene start with protein evidence downstream.

Y CS+ PE-

There is a good conserved start aligned with the gene start without protein evidence downstream.

Y CS- PE+

There is no good conserved start aligned with the predicted start, and there is protein evidence downstream of the gene start.

Y CS- PE-

There is no good conserved start aligned with the predicted start, and there is no protein evidence downstream of the gene start.

Y CS! PE-

There are either multiple good conserved stops in the middle of the gene, or the most downstream, good conserved stop is followed by a good conserved start. There is no protein evidence downstream of the gene start.

Y CS! PE+

The most downstream, good conserved stop is followed by a good conserved start, and there is protein evidence downstream of the gene start.

Y CS< PE!

The protein evidence disagrees with/is upstream of the gene start, and there is a good conserved start upstream of the protein evidence.

Y CS- PE!

The protein evidence disagrees with/is upstream of the gene start, and there is no good conserved start upstream of the protein evidence.

Y CS> PE+

The best conserved starts are downstream of the predicted start, and there is protein evidence downstream of the gene start.

Y CS> PE-

The best conserved starts are downstream of the predicted start, and there is no protein evidence downstream of the gene start.

Y CS< PE+

At least one of the best conserved starts is upstream of the predicted start, and there is protein evidence downstream of the gene start.

Y CS< PE-

At least one of the best conserved starts is upstream of the predicted start, and there is no protein evidence downstream of the gene start.
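The "Y CS[_] PE[_]" codes listed above can be split into their conserved-start and proteomics symbols with a regular expression. This is a minimal sketch: the codes vector is hand-written sample input, and the variable names are hypothetical rather than part of the package.

```r
# Sample category codes taken from the list above.
codes <- c("Y CS+ PE+", "Y CS< PE!", "Y CS! PE-", "Y CS> PE+")

# One capture group for the conserved-start symbol, one for the
# proteomics symbol. The hyphen comes first in each bracket expression
# so that it is treated literally.
pattern <- "^Y CS([-+!<>]) PE([-+!])$"
stopifnot(all(grepl(pattern, codes)))

csSymbol <- sub(pattern, "\\1", codes)
peSymbol <- sub(pattern, "\\2", codes)

# Genes whose start has both conserved-start and proteomics support.
fullySupported <- csSymbol == "+" & peSymbol == "+"
```

Splitting the code this way allows the two lines of evidence to be tabulated separately, for example with table(csSymbol, peSymbol).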

S3 Methods

as.matrix.Assessment (only works with objects of class Results)

print.Assessment

plot.Assessment

mosaicplot.Assessment (only works with objects of class Results)
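Because as.matrix and mosaicplot only work with Results objects, it can help to check the subclass before calling them. The sketch below uses hypothetical empty stand-in objects and a hypothetical helper (isResultsObject); the class vectors are an assumption based on the class and subclass description above.

```r
# Empty stand-ins that carry only the assumed class vectors.
mockMap <- structure(list(), class = c("Assessment", "DataMap"))
mockRes <- structure(list(), class = c("Assessment", "Results"))

# Hypothetical helper: TRUE only for Assessment objects of subclass Results.
isResultsObject <- function(x) {
  inherits(x, "Assessment") && inherits(x, "Results")
}

mapOk <- isResultsObject(mockMap)
resOk <- isResultsObject(mockRes)
```

With the package loaded, calls to the generics print(), plot(), as.matrix(), and mosaicplot() on a real object would dispatch to the methods listed above through the usual S3 mechanism.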


DRK248/AssessORF documentation built on Jan. 30, 2020, 7:05 p.m.