Data formats used in cubfits.
All of them are simple structures stored as default S3 lists or data frames.
Format b:
A named list A whose elements correspond to amino acids.
Each element A[[i]] is a list with the elements coefficients (coefficients of log(mu) and Delta.t), coef.mat (the coefficients in matrix form), and R (covariance matrix of the coefficients).
Note that coefficients and R are typically as in the output of vglm() of the VGAM package.
Also, coef.mat and R may be missing in some cases.
For example, A[[i]]$coef.mat is the regression beta matrix of the i-th amino acid.
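As a minimal sketch of the b format, the following builds a toy object by hand. The amino acid names, the numeric values, and the row layout of coef.mat are assumptions made for illustration; nothing here is actual output of cubfits or vglm().

```r
# Hypothetical b object for two amino acids (all values are invented).
b <- list(
  A = list(coefficients = c(0.1, -0.3, 0.2, 0.5, -0.2, 0.4)),
  C = list(coefficients = c(-0.4, 0.7))
)

# coef.mat holds the same coefficients as a matrix.
# Assumed layout: first row log(mu), second row Delta.t.
b$A$coef.mat <- matrix(b$A$coefficients, nrow = 2, byrow = TRUE,
                       dimnames = list(c("log.mu", "Delta.t"), NULL))

b[["A"]]$coef.mat
```

The element R is left out here, which mirrors the case where it is missing from the object.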
Format bVec:
A vector that simply contains all coefficients of a b object A.
Note that this is probably only used inside MCMC or as the output of vglm() of the VGAM package.
For example, do.call("c", lapply(A, function(x) x$coefficients)).
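The flattening expression above can be tried on a toy b object. The object below is hand-built with invented values, so it only illustrates the shape of the result.

```r
# Hypothetical b object; only the coefficients element matters here.
b <- list(
  A = list(coefficients = c(0.1, -0.3, 0.2, 0.5, -0.2, 0.4)),
  C = list(coefficients = c(-0.4, 0.7))
)

# Concatenate the per-amino-acid coefficients into one vector.
bVec <- do.call("c", lapply(b, function(x) x$coefficients))

length(bVec)  # 8: six coefficients from "A" plus two from "C"
```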
Format n:
A named list A whose elements correspond to amino acids.
Each element A[[i]] is a vector containing total codon counts.
For example, A[[i]][j] is the total count for the j-th ORF of the i-th amino acid names(A)[i].
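A small hand-built example may make the indexing concrete. The amino acid names and counts are invented, and the count vectors are left unnamed because the text does not say whether they carry names.

```r
# Hypothetical n object: total codon counts for three ORFs.
n <- list(
  A = c(12, 7, 20),
  C = c(3, 0, 5)
)

names(n)[1]   # "A"
n[["A"]][2]   # 7: total count of amino acid "A" in the second ORF
```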
Format n.list:
A named list A whose elements correspond to ORFs.
Each element A[[i]] is a named list, indexed by amino acid, holding total counts.
For example, A[[i]][[j]] contains the total count of the j-th amino acid in the i-th ORF.
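The sketch below regroups a toy n object by ORF to show how the nesting is flipped. The ORF names are invented, and the regrouping is hand-rolled base R rather than a cubfits function.

```r
# Hypothetical n object (by amino acid) and invented ORF names.
n <- list(A = c(12, 7, 20), C = c(3, 0, 5))
orf.names <- c("orf1", "orf2", "orf3")

# For each ORF, collect its total count from every amino acid.
n.list <- lapply(seq_along(orf.names), function(i) {
  lapply(n, function(x) x[i])
})
names(n.list) <- orf.names

n.list[["orf2"]][["A"]]  # 7
```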
Format phi.df:
A data frame A containing two columns, ORF and phi.value.
For example, A[i,] is the row for the i-th ORF.
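A toy phi.df can be written directly with data.frame(). The ORF names and expression values below are invented.

```r
# Hypothetical phi.df with one row per ORF.
phi.df <- data.frame(
  ORF = c("orf1", "orf2", "orf3"),
  phi.value = c(0.8, 1.5, 0.3),
  stringsAsFactors = FALSE
)

phi.df[2, ]  # the row for the second ORF
```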
Format reu13.df:
A named list A whose elements correspond to amino acids.
Each element is a data frame summarizing ORFs and expression.
The data frame has four to five columns: ORF, phi (expression), Pos (amino acid position), Codon (synonymous codon), and Codon.id (synonymous codon id, for computing only).
Note that Codon.id may be missing in some cases.
For example, A[[i]][17,] is the 17th record of the i-th amino acid.
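The following sketch builds a one-element reu13.df by hand. All values are invented, and the way Codon.id is numbered here is only an assumed scheme; cubfits may assign ids differently.

```r
# Hypothetical reu13.df with a single amino acid.
reu13.df <- list(
  A = data.frame(
    ORF = c("orf1", "orf1", "orf2"),
    phi = c(0.8, 0.8, 1.5),
    Pos = c(4, 10, 2),
    Codon = c("GCT", "GCC", "GCT"),
    stringsAsFactors = FALSE
  )
)

# Optional fifth column: an integer id per synonymous codon
# (assumed scheme: alphabetical order of the codons present).
reu13.df$A$Codon.id <- as.integer(factor(reu13.df$A$Codon))

reu13.df[["A"]][2, ]  # second record of amino acid "A"
```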
Format reu13.list:
A named list A whose elements correspond to ORFs.
Each element A[[i]] is a named list whose elements correspond to amino acids.
Each element of the nested list, A[[i]][[j]], is a vector of synonymous codon positions.
For example, A[[i]][[j]][k] is the position of the k-th synonymous codon of the j-th amino acid in the i-th ORF.
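A toy reu13.list shows the two levels of nesting. The ORF names and positions are invented, and naming each position by its codon is an assumption made for readability.

```r
# Hypothetical reu13.list: ORF -> amino acid -> codon positions.
reu13.list <- list(
  orf1 = list(A = c(GCT = 4, GCC = 10), C = c(TGT = 7)),
  orf2 = list(A = c(GCT = 2), C = numeric(0))  # no "C" in orf2
)

reu13.list[["orf1"]][["A"]][2]  # position 10, the second "A" codon in orf1
```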
Format scuo:
A data frame of 8 named columns: AA (amino acid), ORF, C1, ..., C6, where the C* columns hold codon counts.
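A hand-built scuo data frame could look like the sketch below. The counts are invented, and filling the unused codon columns with NA is an assumption, since the text does not say how amino acids with fewer than six codons are padded.

```r
# Hypothetical scuo data frame: one row per amino acid and ORF.
scuo <- data.frame(
  AA  = c("A", "C"),
  ORF = c("orf1", "orf1"),
  C1  = c(5, 2),
  C2  = c(4, 1),
  C3  = c(2, NA_real_),
  C4  = c(1, NA_real_),
  C5  = c(NA_real_, NA_real_),  # unused columns padded with NA (assumed)
  C6  = c(NA_real_, NA_real_),
  stringsAsFactors = FALSE
)

colnames(scuo)
```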
Format seq.string:
The default output of read.fasta() of the seqinr package.
A named list A whose elements correspond to ORFs.
Each element of the list is one long string holding the sequence of an ORF.
For example, A[[i]][1] or A[[i]] is the sequence of the i-th ORF.
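The list below is a hand-built stand-in for the seq.string shape, with invented ORF names and sequences. Objects actually returned by read.fasta() may carry extra attributes that are not reproduced here.

```r
# Hypothetical seq.string object: one string per ORF.
seq.string <- list(
  orf1 = "ATGGCTTGTGCCTAA",
  orf2 = "ATGGCTTAA"
)

seq.string[["orf1"]]         # full sequence of the first ORF
nchar(seq.string[["orf1"]])  # 15 nucleotides, i.e. 5 codons
```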
Format seq.data:
Converted from the seq.string format.
A named list A whose elements correspond to ORFs.
Each element A[[i]] is a character vector.
Each element of the vector is a codon string.
For example, A[[i]][j] is the j-th codon of the i-th ORF.
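The conversion from seq.string to seq.data can be sketched in base R as follows. The helper split.codons() is hypothetical and written only for this example; it is not the cubfits conversion routine, and it assumes each sequence length is a multiple of three.

```r
# Hypothetical seq.string input.
seq.string <- list(
  orf1 = "ATGGCTTGTGCCTAA",
  orf2 = "ATGGCTTAA"
)

# Hypothetical helper: cut a sequence into consecutive 3-letter codons.
split.codons <- function(s) {
  starts <- seq(1, nchar(s), by = 3)
  substring(s, starts, starts + 2)
}

seq.data <- lapply(seq.string, split.codons)

seq.data[["orf1"]][2]  # "GCT": second codon of the first ORF
```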
Format phi.Obs:
A named vector A of observed expression values, possibly with measurement errors.
For example, A[i] is the observed phi value of the i-th ORF.
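A toy phi.Obs is just a named numeric vector. The sketch also reshapes it into the two-column layout of phi.df; that reshaping is hand-written here under the assumption that phi.value holds the same observed values, and it is not a cubfits function.

```r
# Hypothetical phi.Obs: observed expression, named by ORF.
phi.Obs <- c(orf1 = 0.8, orf2 = 1.5, orf3 = 0.3)

phi.Obs[2]  # observed phi value of the second ORF

# Assumed relation to phi.df: names become ORF, values become phi.value.
phi.df <- data.frame(
  ORF = names(phi.Obs),
  phi.value = unname(phi.Obs),
  stringsAsFactors = FALSE
)
```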
Format y:
A named list A whose elements correspond to amino acids.
Each element A[[i]] is a matrix with ORFs in rows and synonymous codons in columns.
Each entry of the matrix is a codon count.
For example, A[[i]][j, k] is the count for the i-th amino acid, j-th ORF, and k-th synonymous codon.
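The sketch below builds a toy y object and derives per-ORF totals from it by summing each row. The counts, ORF names, and codon column names are invented, and treating those row sums as the n format is an assumption based on the descriptions above.

```r
# Hypothetical y object: ORFs in rows, synonymous codons in columns.
y <- list(
  A = matrix(c(5, 4, 2, 1,
               3, 3, 0, 1), nrow = 2, byrow = TRUE,
             dimnames = list(c("orf1", "orf2"),
                             c("GCA", "GCC", "GCG", "GCT"))),
  C = matrix(c(2, 1,
               0, 0), nrow = 2, byrow = TRUE,
             dimnames = list(c("orf1", "orf2"), c("TGC", "TGT")))
)

y[["A"]][2, 1]  # 3: count of the first "A" codon in the second ORF

# Total codon count per ORF for each amino acid (assumed to match format n).
n <- lapply(y, rowSums)
```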
Format y.list:
A named list A whose elements correspond to ORFs.
Each element A[[i]] is a named list whose elements correspond to amino acids.
Each element of the amino acid list, A[[i]][[j]], is a codon count vector.
For example, A[[i]][[j]][k] is the count for the i-th ORF, j-th amino acid, and k-th synonymous codon.
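To illustrate how y.list nests the same counts by ORF first, the sketch below regroups a toy y object. The data are invented and the regrouping is hand-rolled base R, not a cubfits routine.

```r
# Hypothetical y object (by amino acid), with named rows for the ORFs.
y <- list(
  A = matrix(c(5, 4, 2, 1,
               3, 3, 0, 1), nrow = 2, byrow = TRUE,
             dimnames = list(c("orf1", "orf2"), NULL)),
  C = matrix(c(2, 1,
               0, 0), nrow = 2, byrow = TRUE,
             dimnames = list(c("orf1", "orf2"), NULL))
)

# For each ORF, take its row from every amino acid matrix.
orfs <- rownames(y[[1]])
y.list <- lapply(orfs, function(orf) lapply(y, function(m) m[orf, ]))
names(y.list) <- orfs

y.list[["orf1"]][["A"]][3]  # 2: third "A" codon count in orf1
```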
Wei-Chen Chen.
https://github.com/snoweye/cubfits/
