R/data.r

#' Genome annotation file.
#'
#' A dataset containing a genome annotation file WITHOUT a header row (column names). The example file is the maize genome
#' annotation, version AGPv3. IMPORTANT: the annotation file should be readable by data.table::fread.
#' The variables are as follows:
#'
#' @format
#' \describe{
#'   \item{the first column}{The name of the sequence. Commonly, this is the chromosome ID or contig ID.}
#'   \item{the second column}{Source: a unique label indicating whether the annotation is protein-coding or of another type.}
#'   \item{the third column}{Feature type. The following feature types are required: 'CDS', 'start_codon', 'stop_codon'. The features '5UTR', '3UTR', 'inter', 'inter_CNS', 'intron_CNS' and 'exon' are optional.}
#'   \item{the fourth column}{Integer start coordinate of the feature, relative to the beginning of the named sequence.}
#'   \item{the fifth column}{Integer end coordinate of the feature, relative to the beginning of the named sequence.}
#'   \item{the sixth column}{Score: the degree of confidence in the feature's existence and coordinates.}
#'   \item{the seventh column}{Strand: whether the feature is located on the forward ('+') or reverse ('-') strand.}
#'   \item{the eighth column}{Frame: the reading frame of the feature ('0', '1', '2', or '.' when not applicable).}
#'   \item{the ninth column}{Attributes: a semicolon-separated list of tag-value pairs, such as gene_id and transcript_id.}
#' }
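#' @examples
#' # A minimal sketch, not code from VisProDom: build a toy, header-less
#' # GTF-style table in memory and read it as a 9-column table. The helper
#' # `rec()` and the column names are hypothetical. Base R read.table() is
#' # used here instead of data.table::fread() to avoid an extra dependency.
#' rec <- function(feature, start, end) {
#'   paste("chr1", "protein_coding", feature, start, end, ".", "+", "0",
#'         'gene_id "g1"; transcript_id "t1";', sep = "\t")
#' }
#' gtf_lines <- c(rec("start_codon", 100, 102), rec("CDS", 100, 399),
#'                rec("stop_codon", 400, 402))
#' # quote = "" keeps the double quotes inside the attribute column intact.
#' gtf <- utils::read.table(text = gtf_lines, sep = "\t", header = FALSE,
#'                          quote = "", stringsAsFactors = FALSE,
#'                          col.names = c("seqname", "source", "feature",
#'                                        "start", "end", "score", "strand",
#'                                        "frame", "attribute"))
#' # Check that the required feature types are present.
#' all(is.element(c("CDS", "start_codon", "stop_codon"), gtf$feature))
#' # Total CDS length, treating start and end as inclusive coordinates.
#' cds <- gtf[gtf$feature == "CDS", ]
#' sum(cds$end - cds$start + 1)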
"gff"

#' Protein annotation file.
#'
#' A dataset containing the protein annotation file. The annotation file can be read with readLines.
#' The dataset can be generated by querying protein sequences against a protein domain database with rpsblast and rpsbproc.
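#' @examples
#' # A minimal sketch, not code from VisProDom: the annotation is plain
#' # text, so base readLines() is enough to load it. The two toy lines are
#' # hypothetical placeholders, not real rpsbproc output, and treating lines
#' # that start with "#" as comments is an assumption.
#' tmp <- tempfile(fileext = ".txt")
#' writeLines(c("#hypothetical comment line", "hypothetical record line"), tmp)
#' anno <- readLines(tmp)
#' anno <- anno[!startsWith(anno, "#")]  # drop the assumed comment lines
#' length(anno)
#' unlink(tmp)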
"annofile"

#' A dataset containing gene structure and protein domain information.
#'
#' A dataset containing the protein annotation file. The annotation file can be read with readLines.
#' Initially, the protein annotation file can be generated by querying protein sequences against a protein domain database with rpsblast and rpsbproc.
#' The gff file and the protein annotation file are then combined with the function VisProDom::CreDat.
#' @format
#' \describe{
#'   \item{the 1st column}{transcript name}
#'   \item{the 2nd column}{feature type}
#'   \item{the 3rd column}{start position in genomic coordinates}
#'   \item{the 4th column}{end position in genomic coordinates}
#'   \item{the 5th column}{start position in transcript coordinates}
#'   \item{the 6th column}{end position in transcript coordinates}
#'   \item{the 7th column}{protein domain start position in genomic coordinates}
#'   \item{the 8th column}{protein domain end position in genomic coordinates}
#'   \item{the 9th column}{protein domain start position in transcript coordinates}
#'   \item{the 10th column}{protein domain end position in transcript coordinates}
#'   \item{the 11th column}{protein domain name}
#' }
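#' @examples
#' # A minimal sketch, not code from VisProDom: a hypothetical two-row table
#' # laid out like the 11 columns above. The column names and values are
#' # illustrative assumptions. The example computes the span of the domain
#' # covered by each row from the transcript-scale coordinates, assuming
#' # inclusive start and end positions.
#' dom <- data.frame(
#'   transcript = c("t1", "t1"),
#'   feature = c("CDS", "CDS"),
#'   g_start = c(100, 600), g_end = c(399, 899),
#'   t_start = c(1, 301), t_end = c(300, 600),
#'   dom_g_start = c(250, 600), dom_g_end = c(399, 698),
#'   dom_t_start = c(151, 301), dom_t_end = c(300, 399),
#'   domain = c("domainA", "domainA"),
#'   stringsAsFactors = FALSE
#' )
#' dom$dom_len <- dom$dom_t_end - dom$dom_t_start + 1
#' dom$dom_len
#' # Total domain span across both rows.
#' sum(dom$dom_len)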
whweve/VisProDom documentation built on May 23, 2022, 6:45 p.m.