GCAI.bias-package: Guided Correction Approach for Inherited bias (GCAI.bias)


Many inherited biases exist in RNA-seq, arising from both biological and technical effects. We observed that the biological variance of the target transcripts under test can influence the yield of sequencing reads, which may indicate resource competition in RNA-seq. We developed this package to capture the bias that depends on local sequence and to correct it by borrowing information from spike-in measurements.

Package:	GCAI-bias
Type:	Package
Version:	1.0
Date:	2014-07-14
License:	GPL (>=2)
LazyLoad:	yes

This package corrects bias introduced by the biological variance of sample transcript sources. Batch effects in measurements of the same biological sample can be corrected directly. However, spike-ins are required to correct bias between different biological samples. For strand-specific RNA-seq, antisense and sense reads should be corrected separately. Sequencing read counts at each base pair must be formatted as train.dat.seq, train.dat.counts, test.dat.seq and test.dat.counts objects. Coefficients of the local sequence are estimated by lm.estimate and are then used by the correct.guided function to correct the bias. Coefficients and correction performance can be visualized with coeplot, corplot and posplot.
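The idea of estimating local sequence coefficients with a linear model and then removing their effect can be illustrated with a small base R sketch on simulated data. This is not the GCAI.bias implementation: the window width, the simulated effect size, and every variable name below are assumptions chosen for illustration, and the internals of lm.estimate and correct.guided may differ.

```r
# Illustrative sketch only: NOT the GCAI.bias implementation.
# Simulate log counts that depend on the base at the window center,
# estimate per-position base effects with lm(), then remove them.
set.seed(1)

bases <- c("A", "T", "C", "G")
word  <- 5      # hypothetical window width (the package example uses 81)
n     <- 2000   # number of simulated positions

# One row per position, one column per offset within the local window
windows <- matrix(sample(bases, n * word, replace = TRUE), nrow = n, ncol = word)
colnames(windows) <- paste0("p", seq_len(word))

# Assumed true effect: a G at the window center adds 0.8 to the log count
center     <- (word + 1) / 2
log.counts <- 3 + 0.8 * (windows[, center] == "G") + rnorm(n, sd = 0.2)

# Encode every offset as a factor so lm() builds dummy variables (baseline "A")
dat   <- as.data.frame(windows, stringsAsFactors = FALSE)
dat[] <- lapply(dat, factor, levels = bases)
dat$log.counts <- log.counts

fit <- lm(log.counts ~ ., data = dat)
coe <- coef(fit)   # intercept plus 3 coefficients per window offset

# Subtract the fitted sequence effect while keeping the overall mean level
corrected <- log.counts - (fitted(fit) - mean(fitted(fit)))
```

In this sketch, coe[["p3G"]] estimates the simulated center-G effect, and the variance of corrected is smaller than that of log.counts because the sequence-dependent component has been removed. With real data, the coefficients would be estimated on training reads and applied to separate test reads, as in the package example.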

Guoshuai Cai

Maintainer: Guoshuai Cai <GCAI.bioinfo@gmail.com>

Cai G. RNA-Sequencing Applications: Gene Expression Quantification and Methylator Phenotype Identification. Ph.D. thesis, 2013.

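library(GCAI.bias)

# Initialize the index matrix
word <- 81                                      # number of positions in the local sequence window
word.vec <- c("A", "T", "C", "G")               # nucleotide alphabet
pos.vec <- (-(word - 1) / 2):((word - 1) / 2)   # offsets -40 to 40 around the window center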

obj.index<-index.mat.generation(word.vec,pos.vec)

# Train: estimate local sequence coefficients from the training data

data(train.dat.seq)
data(train.dat.counts)

train.index<-index.preprocess(train.dat.seq,word)
obj.train<-counts.preprocess(train.dat.counts)
obj.train[["index"]]<-train.index

coe.lm<-lm.estimate(obj.train,fit.cut.train=5)

coeplot(coe.lm,obj.index)

# Test: correct the test data with the trained coefficients

data(test.dat.seq)
data(test.dat.counts)

test.index<-index.preprocess(test.dat.seq,word)
obj.test<-counts.preprocess(test.dat.counts)
obj.test[["index"]]<-test.index

test.corrected<-correct.guided(coe.lm,obj.test)

corplot(test.corrected)
posplot(test.corrected,obj.test$pos)