Creates a list object from mutation and reference data for use with BaySIC fitting and testing functions.
baysic.data(dat, ref.dat, plot = FALSE, N = NULL, silent = TRUE)
dat |
matrix; mutation input data. BaySIC requires a specific format similar to the MUT file format: an M\times7 matrix with column headings "chr", "start", "end", "id", "type", "gene", "context", where each row details an individual mutation. |
ref.dat |
a dataframe or |
plot |
logical; if |
N |
an integer (optional); equal to the number of subjects represented in |
silent |
logical; if |
The mutation data dat is a 7-column matrix similar in style to other popular mutation file formats. The first three columns ("chr", "start", "end") give the positional information of the somatic mutation. The "id" column holds the subject id for each documented mutation. The "type" column gives the type of mutation for each entry. This is relatively flexible for point mutations, and only requires some form of "silent" or "synonymous" for such mutations if silent=FALSE, but insertion/deletion events should be designated as "INDEL". The "gene" column gives the name of the gene the mutation corresponds to, and must match the gene names used in ref.dat. The "context" entries give the trinucleotide sequence context of each point mutation (NA for INDELs).
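As a minimal sketch of the layout described above, the following base R code builds a toy mutation matrix. The subject ids, gene names, positions, and type labels are invented for illustration and are not taken from the package's example data.

```r
# Toy mutation matrix in the 7-column layout described above.
# All values are hypothetical and exist only to show the shape of the input.
dat <- matrix(
  c("1", "1000", "1000", "S1", "missense", "GENE1", "ACA",
    "1", "2050", "2050", "S2", "silent",   "GENE1", "TCG",
    "2", "3100", "3102", "S1", "INDEL",    "GENE2", NA),
  ncol = 7, byrow = TRUE,
  dimnames = list(NULL,
                  c("chr", "start", "end", "id", "type", "gene", "context"))
)

# Structural checks that mirror the requirements in the text
stopifnot(ncol(dat) == 7)
indel <- dat[, "type"] == "INDEL"
stopifnot(all(is.na(dat[indel, "context"])))  # INDEL rows carry NA context

# Number of distinct subjects represented in the matrix
n.subjects <- length(unique(dat[, "id"]))
```

Because a matrix holds a single type, the positions are stored as character strings here; whether the package expects them as character or numeric is not stated in this page.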
The first two columns of the data matrix (or matrices) in ref.dat should correspond to the gene name and corresponding chromosome, and the column names of the remaining 32 columns should correspond to the trinucleotide motifs (e.g. "ACA"). The sequence content entries should be integer values giving the number of nucleotides in the coding sequence of a given gene that satisfy the trinucleotide motif (central base with flanking 5' and 3' bases). Each base should be counted exactly once, so that the sum of all 32 counts equals the base-pair length of the total coding sequence for a given gene.
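The reference layout can be sketched in base R as follows. The gene names and counts are randomly generated placeholders, and the column names `gene` and `chr` are assumptions, since this page only specifies the column order.

```r
# Build the 32 trinucleotide motifs: 5' base, central C or T, 3' base
bases <- c("A", "C", "G", "T")
grid <- expand.grid(five = bases, centre = c("C", "T"), three = bases,
                    stringsAsFactors = FALSE)
motifs <- sort(paste0(grid$five, grid$centre, grid$three))

# Hypothetical reference table: gene, chromosome, then one count per motif
set.seed(1)
counts <- matrix(sample(0:50, 2 * 32, replace = TRUE), nrow = 2,
                 dimnames = list(NULL, motifs))
ref.dat <- data.frame(gene = c("GENE1", "GENE2"), chr = c("1", "2"),
                      counts, stringsAsFactors = FALSE)

# The 32 counts of a gene should sum to its coding sequence length
coding.length <- rowSums(ref.dat[, motifs])
```

Comparing `coding.length` against an independently known coding length for each gene is a quick way to catch bases that were counted twice or not at all.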
The baysic.data function has its own trinucleotide naming convention: all motifs are in upper case and have either "T" or "C" as the central base. Column names of ref.dat and "context" entries in dat will be adjusted to accommodate this convention if they deviate from it.
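One way such a convention could be applied is sketched below. This is not the package's own code: the helper name `normalise.motif` is hypothetical, and mapping purine-centred motifs through the reverse complement is an assumption based on common practice for strand-collapsed trinucleotide contexts.

```r
# Convert trinucleotides to upper case with C or T as the central base.
# Assumption: motifs centred on A or G are replaced by their reverse complement.
normalise.motif <- function(x) {
  ok <- !is.na(x)
  x[ok] <- toupper(x[ok])
  centre <- substr(x, 2, 2)
  flip <- ok & centre %in% c("A", "G")
  revcomp <- function(s) {
    # Complement each base, then reverse the order of the three bases
    vapply(strsplit(chartr("ACGT", "TGCA", s), ""),
           function(ch) paste(rev(ch), collapse = ""), character(1))
  }
  x[flip] <- revcomp(x[flip])
  x
}

normalised <- normalise.motif(c("aca", "TGA", "AGT", NA))
```

Missing values are passed through unchanged, which matches the NA context used for INDEL rows.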
Returns a list data structure with the following components:
all.dat |
Original mutation data object |
ref.dat |
Original reference data object |
N |
Number of subjects with observed data |
genes |
Vector of length G of gene names included in analysis, where G is the total number of genes. Derived from |
snv.dat |
A G\times32 matrix of total number of SNV mutations per sequence context and gene |
indel.dat |
Vector of length G of total number of indel mutations per gene |
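To make the shapes of snv.dat and indel.dat concrete, the sketch below tabulates toy mutations per gene and context in base R. It is an illustration under assumed inputs, not the package's implementation, and it uses only two of the 32 motifs to keep the example small.

```r
# Hypothetical gene list and a reduced motif set (2 of the 32 motifs)
genes <- c("GENE1", "GENE2")
motifs <- c("ACA", "TCG")

# Toy point mutations and indels, one row or entry per event
snv <- data.frame(gene = c("GENE1", "GENE1", "GENE2"),
                  context = c("ACA", "ACA", "TCG"),
                  stringsAsFactors = FALSE)
indel.genes <- c("GENE2")

# Gene-by-context counts; fixed factor levels keep zero-count cells
snv.counts <- table(factor(snv$gene, levels = genes),
                    factor(snv$context, levels = motifs))

# Per-gene indel counts, again keeping genes with zero events
indel.counts <- table(factor(indel.genes, levels = genes))
```

Fixing the factor levels in advance is what guarantees one row per gene and one column per motif even when some combinations are never observed.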
Nicholas B. Larson
## Not run:
data(example.dat)
data(ccds.19)
baysic.dat.ex <- baysic.data(example.dat, ccds.19)
## End(Not run)