Calculates the proportions of pure cell type components in heterogeneous cell type samples of RNA-seq data, utilizing isoform-level expression differences.
IsoDeconvMM(directory = NULL, mix_files, pure_ref_files, fraglens_files,
  bedFile, knownIsoforms, discrim_genes, readLen, lmax = 600,
  eLenMin = 1, mix_names = NULL, initPts = NULL,
  optim_options = optimControl())
directory |
an optional character string denoting the path to the directory where all of the mix_files, pure_ref_files, fraglens_files, and bedFile are located. The working directory is set to this directory. If left as NULL, then all of the relevant files must either (a) be located in the current working directory or (b) have their full path specified. |
mix_files |
a vector of the file names for the text files recording the number of RNA-seq
fragments per exon set, which should have 2 columns "count" and "exons", without header.
For example:
|
pure_ref_files |
a matrix whose first column contains the file names of the text files recording the number of RNA-seq fragments per exon set (see 'mix_files' for a description of the file format), one for each of the pure reference cell type samples, and whose second column contains the character names of the pure cell type associated with each sample. See the Step_0_Processes directory in <https://github.com/hheiling/deconvolution> for directions on how to create these files. |
fraglens_files |
a vector of the file names for the text files recording the distribution
of the fragment lengths, which should have 2 columns: "Frequency" and "Length", without header.
For example:
|
bedFile |
file name of the .bed file recording information on non-overlapping exons, which
has 6 columns: "chr", "start", "end", "exon", "score", and "strand",
without header. For example:
|
knownIsoforms |
character string for the name of an .RData object that contains the known isoform
information. When loaded, this object is a list where each component is a binary matrix
that specifies a set of possible isoforms (e.g., isoforms from annotations). Specifically, it is a
binary matrix of k rows and m columns, where k is the number of
non-overlapping exons and m is the number of isoforms. isoforms[i,j]=1
indicates that the i-th exon belongs to the j-th isoform. For example,
the following matrix indicates the three isoforms for one gene ENSMUSG00000000003:
|
discrim_genes |
vector of genes that are suspected to have differential gene expression.
This gene list could come from CuffLinks output, |
readLen |
numeric value of the length of a read in the RNA-seq experiment |
lmax |
numeric value of the maximum fragment length of the experiment |
eLenMin |
numeric value of the minimum effective length. If the effective length of an exon or exon junction is smaller than eLenMin, i.e., if this exon is not included in the corresponding isoform, it is set to eLenMin. This accounts for possible sequencing or mapping errors. |
mix_names |
an optional vector of the desired nicknames of the mixture samples corresponding,
in the same order, to the mix_files list. If left as the default |
initPts |
an optional matrix of initial probability estimates for the cell composition of the mixture samples, to be used in the optimization procedure. The matrix should have J columns, where J is the number of pure cell types of interest. Each row gives a different combination of initial probability values. The column names of the matrix must be provided and must correspond to the pure cell type names given in the second column of the pure_ref_files object (no particular ordering needed). |
optim_options |
a list inheriting from class |
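As a minimal sketch of how the pure_ref_files and initPts arguments described above might be assembled, the following builds both objects in base R. The file names and the cell type names ("cellTypeA", "cellTypeB") are hypothetical placeholders, and the check that each row of initPts sums to 1 is an assumption on our part rather than a documented requirement.

```r
# Hypothetical file names and cell type labels, for illustration only.
pure_files <- c("pureA_1_counts.txt", "pureA_2_counts.txt",
                "pureB_1_counts.txt", "pureB_2_counts.txt")
pure_types <- c("cellTypeA", "cellTypeA", "cellTypeB", "cellTypeB")

# Two-column character matrix: file name first, pure cell type name second.
pure_ref_files <- cbind(pure_files, pure_types)

# One row per set of starting values, one named column per pure cell type.
cell_types <- unique(pure_ref_files[, 2])
initPts <- matrix(c(0.25, 0.75,
                    0.50, 0.50,
                    0.75, 0.25),
                  ncol = length(cell_types), byrow = TRUE,
                  dimnames = list(NULL, cell_types))

# Sanity checks: column names match the cell type names, and (assumed)
# each row is a set of proportions summing to 1.
stopifnot(setequal(colnames(initPts), pure_ref_files[, 2]),
          all(abs(rowSums(initPts) - 1) < 1e-8))
```

Deriving the column names of initPts from the second column of pure_ref_files, as done here, is one way to keep the two objects consistent.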
A list object with the following structure: the first layer of the list has one element for each of the mixture samples; the second layer has one element for each transcript cluster used in the analysis, as determined by the genes in the discrim_genes vector. Each of these transcript cluster elements is itself a list with the following elements:
info |
|
candiIsoform |
|
I |
Number of isoforms utilized in transcript cluster |
E |
Number of exons in transcript cluster |
X |
ExI matrix of effective lengths for each of the E exon sets within each of the I isoforms |
info_status |
|
y_mix, other y vectors for each pure cell type reference sample |
Ex1 vectors of read counts at each exon set for the given mixture or pure cell type sample |
countN_mix, other countN values for each pure cell type reference sample |
|
mix |
a list with the elements rds_exons_t (vector of length E+1 where the last E elements are y_mix, and the first element is the total read counts for the mixture sample minus the sum of y_mix), gamma.est ((I-1)xK matrix of isoform expression parameters for each cell type k), tau.est (vector of length K of gene expression parameters in cell type k), p.est (vector of length K containing estimated proportions based on the given transcript cluster), and pm.rds.exons (ExK matrix containing posterior means for each of E exon sets in each of K cell types) |
"cellType1","cellType2" ... |
|
l_tilde |
Ix1 vector of total effective lengths of each of the I isoforms;
each element of the vector, denoted l_i, is a column sum from the matrix |
X.fin |
edited design matrix for new gamma parameters, where the ith column of the new matrix is
|
X.prime |
first (I-1) columns of X.fin, pertaining to the gamma parameters |
alpha.est |
IxK hyperparameters governing average isoform expression levels and variances within cells of type k |
beta.est |
2xK hyperparameters governing gene expression levels within cells of type k |
CellType_Order |
For outputs giving K different estimates for each of the K cell types, these outputs are ordered with respect to CellType_Order |
WARN |
An integer indicating the following information:
|
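To illustrate the nesting described above (mixture sample, then transcript cluster, then named elements), the following sketch collects p.est from every transcript cluster of a mock result. The object `results`, its sample and cluster names, and all numbers are made up for illustration; a real return value contains many more elements per cluster. Summarizing by the median across clusters is our choice for the example, not something prescribed by the function.

```r
# Mock object mimicking the documented nesting; every name and value here
# is hypothetical and stands in for a real return value.
mock_cluster <- function(p) {
  list(mix = list(p.est = p),
       CellType_Order = c("cellTypeA", "cellTypeB"))
}
results <- list(
  mix1 = list(cluster1 = mock_cluster(c(0.30, 0.70)),
              cluster2 = mock_cluster(c(0.40, 0.60))),
  mix2 = list(cluster1 = mock_cluster(c(0.80, 0.20)),
              cluster2 = mock_cluster(c(0.70, 0.30)))
)

# For each mixture sample, stack p.est from every transcript cluster into a
# (clusters x cell types) matrix, labelling columns via CellType_Order.
p_by_sample <- lapply(results, function(sample_res) {
  p_mat <- t(sapply(sample_res, function(cl) cl$mix$p.est))
  colnames(p_mat) <- sample_res[[1]]$CellType_Order
  p_mat
})

# One possible summary: the per-sample median proportion across clusters,
# giving a (mixture samples x cell types) matrix.
p_summary <- t(sapply(p_by_sample, function(m) apply(m, 2, median)))
```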