README.md

RbashGEO

The purpose of the package RbashGEO

Install it with the command below

devtools::install_github("ZhenWei10/RbashGEO")

A template work flow for RNA-Seq

library(RbashGEO)
knitr::kable( Coldata_example[1:6,1:6] )

| SRR_RUN | IP_input | Experiment | Perturbation | Interest | Lib | |:-----------|:----------|:---------------|:-------------|:---------|:-------| | SRR5417009 | IP | human-NB4 | C | METTL14 | Single | | SRR5417010 | IP | human-MonoMac6 | C | METTL14 | Single | | SRR5417011 | IP | human-NB4 | C | METTL14 | Single | | SRR5417012 | IP | human-NB4 | METTL14- | METTL14 | Single | | SRR5417013 | IP | human-NB4 | METTL14- | METTL14 | Single | | SRR5417014 | IP | human-MonoMac6 | C | METTL14 | Single |

Only 2 collumns: SRR_RUN and Lib are neccessary to complete the work flow of this package, but addition collumns are valuable for their own good.

The design of the collumn table is summarized from this page on GEO.

library(RbashGEO)
mapply(function(x,y) Rnohup(WgetQC(x,y),x), Coldata_example$SRR_RUN,Coldata_example$Lib == "Paired")
library(RbashGEO)
library(dplyr)

mapply(
 function(x,y){
   Rhisat2(Fastq_file_name = x,
           Paired = y,
           parallel_num = 1,
           Fastq_directory = getwd()) %>% Rnohup(.,paste0(x,"_hisat2"))}, 
 Coldata_example$SRR_RUN,
 (Coldata_example$Lib == "Paired")
)
hisat2_report <- RbashGEO::Check_hisat2_reports()
knitr::kable(hisat2_report[1:6,-1*c(2,3,4)])

| Sample_ID | Uniquely_alignment | Multiple_alignment | Mapping_efficiency | |:-----------|:--------------------|:--------------------|:--------------------| | SRR5417009 | 18096777 (52.81%) | 14678868 (42.83%) | 95.64% | | SRR5417010 | 11359024 (44.98%) | 12732204 (50.42%) | 95.40% | | SRR5417011 | 13108383 (58.39%) | 6646745 (29.61%) | 87.99% | | SRR5417012 | 10650346 (45.12%) | 5640761 (23.90%) | 69.02% | | SRR5417013 | 7925819 (36.53%) | 4123070 (19.00%) | 55.53% | | SRR5417014 | 11652714 (47.75%) | 7618818 (31.22%) | 78.97% |

You should see a data.frame with summarized outputs of hisat2.

library(RbashGEO)

sapply(
  Coldata_example$SRR_RUN,
  function(x) Rnohup(
  Rsamtools_view(x,sam_end = ".sam",parallel_num = 1,flag_filter = 2820),
  x)
)

At this step, we need to provide a GRanges object for annotation, in this case we use Annotation_gr as an example.

Count_SRRs(Coldata_example$SRR_RUN,"./",Annotation_gr,"Example_human_SE")

At last, the obtained SummarizedExperiment object could be conveniently handeled with other QC, inference, and learning work flows.

If you are interested in MeRIP data, you could use meripQC;

if you are interested in the analysis of the RNA modification data, you could look m6ALogisticModel for more information.

sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] RbashGEO_1.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.14               XVector_0.18.0            
##  [3] knitr_1.18                 magrittr_1.5              
##  [5] zlibbioc_1.24.0            GenomicRanges_1.30.1      
##  [7] BiocGenerics_0.24.0        GenomicAlignments_1.14.1  
##  [9] IRanges_2.12.0             BiocParallel_1.12.0       
## [11] lattice_0.20-35            highr_0.6                 
## [13] stringr_1.2.0              GenomeInfoDb_1.14.0       
## [15] tools_3.4.2                grid_3.4.2                
## [17] SummarizedExperiment_1.8.1 parallel_3.4.2            
## [19] Biobase_2.38.0             matrixStats_0.52.2        
## [21] htmltools_0.3.6            yaml_2.1.16               
## [23] rprojroot_1.3-2            digest_0.6.13             
## [25] Matrix_1.2-12              GenomeInfoDbData_1.0.0    
## [27] S4Vectors_0.16.0           bitops_1.0-6              
## [29] RCurl_1.95-4.10            evaluate_0.10.1           
## [31] rmarkdown_1.8              DelayedArray_0.4.1        
## [33] stringi_1.1.6              compiler_3.4.2            
## [35] Rsamtools_1.30.0           Biostrings_2.46.0         
## [37] backports_1.1.2            stats4_3.4.2


ZhenWei10/RbashGEO documentation built on May 28, 2019, 8:22 a.m.