Welcome to the ohmyR package!

For how to build the package, see the README.

Reading in

The first function in this package is hello(); the second is reading_in(). reading_in() mainly extracts the content of one module from fastqc_data.txt, which is one of the files generated by the FastQC program.

In fastqc_data.txt, each module starts with a title line beginning with >> and ends with a >>END_MODULE line. For example:

>>Basic Statistics  pass
#Measure    Value
Filename    SRR12953537_1.fastq.gz
File type   Conventional base calls
Encoding    Sanger / Illumina 1.9
Total Sequences 50817747
Sequences flagged as poor quality   0
Sequence length 75
%GC 44
>>END_MODULE
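
To see which modules a report contains, the >> title lines can be collected in plain R. This is only a minimal sketch: list_modules() is a hypothetical helper, not part of ohmyR, and it assumes that the title and its status are separated by a tab. The second module's status in the toy input is made up for illustration.

```r
# Hypothetical helper (not part of ohmyR): list module titles and statuses.
list_modules <- function(lines) {
  # Title lines start with ">>" but are not the ">>END_MODULE" terminator.
  titles <- lines[startsWith(lines, ">>") & lines != ">>END_MODULE"]
  # Drop the leading ">>" and split "title<TAB>status" on the tab (assumed).
  parts <- strsplit(sub("^>>", "", titles), "\t", fixed = TRUE)
  data.frame(
    module = vapply(parts, `[`, character(1), 1),
    status = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

# Toy input: the first module is taken from the example above,
# the second one is shortened and its status is invented.
example_lines <- c(
  ">>Basic Statistics\tpass",
  "#Measure\tValue",
  "%GC\t44",
  ">>END_MODULE",
  ">>Per base sequence quality\tpass",
  ">>END_MODULE"
)
modules <- list_modules(example_lines)
print(modules)
```

On a real report, the input would come from readLines() on the fastqc_data.txt path.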

Running reading_in is equivalent to executing this sed command: sed -n '/anytest/,/END_MODULE/p' file | grep -v '^>>'. The -n option suppresses sed's automatic printing, so only the lines you explicitly print with p are output, and each of them only once. The sed command therefore prints the lines from the title of the test you are interested in to the end of that test's content. grep -v '^>>' then removes the first and last lines, which start with >>. Let's try "Per base sequence quality".

fa.file <- "/athena/angsd/scratch/zhp4001/data/fastqc/SRR12953537/SRR12953537_1_fastqc/fastqc_data.txt"
per.base.seq <- reading_in(fa.file, test = "Per base sequence quality")
# equivalent to `sed -n '/Per base sequence quality/,/END_MODULE/p' /path/to/fastqc_data.txt | grep -v '^>>'`
head(per.base.seq)
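
For readers without sed at hand, the same pipeline can be sketched in base R. extract_module() below is a hypothetical stand-in written for this illustration; the actual reading_in() may be implemented differently. It matches the title as a fixed string rather than a regular expression, and it assumes tab-separated fields.

```r
# Hypothetical stand-in for illustration; the real reading_in() may differ.
extract_module <- function(lines, test) {
  # sed start address: first line containing the test title.
  start <- grep(test, lines, fixed = TRUE)[1]
  if (is.na(start)) stop("module not found: ", test)
  # sed end address: first END_MODULE line after the start line.
  ends <- grep("END_MODULE", lines, fixed = TRUE)
  end <- ends[ends > start][1]
  if (is.na(end)) end <- length(lines)  # sed prints to end of file if no end matches
  block <- lines[start:end]
  # Equivalent of grep -v '^>>': drop the title and terminator lines.
  block[!startsWith(block, ">>")]
}

# Shortened copy of the Basic Statistics module shown earlier.
example_lines <- c(
  ">>Basic Statistics\tpass",
  "#Measure\tValue",
  "Filename\tSRR12953537_1.fastq.gz",
  "Total Sequences\t50817747",
  "%GC\t44",
  ">>END_MODULE"
)
body <- extract_module(example_lines, "Basic Statistics")

# Strip the leading "#" from the header line and parse as a tab-separated table.
basic.stats <- read.delim(text = sub("^#", "", body))
print(basic.stats)
```

Keeping the line extraction separate from the table parsing means the first step can be compared directly against the sed output.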

Generating an object

How I generated the example object:

# eval=FALSE
wt_1_1 <- '/home/zhp4001/ANGSD/ERP004763/WT-1/ERR458493_fastqc/fastqc_data.txt'
wt_1_2 <- '/home/zhp4001/ANGSD/ERP004763/WT-1/ERR458494_fastqc/fastqc_data.txt'
snf2_1_1 <- '/home/zhp4001/ANGSD/ERP004763/SNF2-1/ERR458500_fastqc/fastqc_data.txt'
snf2_1_2 <- '/home/zhp4001/ANGSD/ERP004763/SNF2-1/ERR458501_fastqc/fastqc_data.txt'

for (file in c("wt_1_1", "wt_1_2", "snf2_1_1", "snf2_1_2")) {
  assign(file, ohmyR::reading_in(get(file), sample=file))
}

all.reports <- rbind(wt_1_1, wt_1_2, snf2_1_1, snf2_1_2)
colnames(all.reports)[1] <- "Base"
# save into the package's data/ directory
save(all.reports, file="/home/zhp4001/ohmyR/data/fastqc.reports.rda")
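
The assign()/get() loop above can also be written without creating variables by name. The following is a sketch only: combine_reports() and toy_reader() are hypothetical names, and toy_reader() merely echoes its inputs so that the example runs without FastQC files. It assumes the reader accepts a sample argument, as reading_in() does in the call above.

```r
# Hypothetical helper: apply `reader` to each named path and stack the results.
combine_reports <- function(paths, reader) {
  reports <- Map(function(path, name) reader(path, sample = name),
                 paths, names(paths))
  out <- do.call(rbind, reports)
  rownames(out) <- NULL  # drop the row names that rbind derives from the list names
  out
}

# Toy reader standing in for ohmyR::reading_in(); it only echoes its inputs.
toy_reader <- function(path, sample) {
  data.frame(path = path, sampleName = sample, stringsAsFactors = FALSE)
}

# Placeholder paths, named by sample.
paths <- c(wt_1_1 = "wt_1_1/fastqc_data.txt",
           snf2_1_1 = "snf2_1_1/fastqc_data.txt")
combined <- combine_reports(paths, toy_reader)
print(combined)
```

With the package installed, passing ohmyR::reading_in as the reader should give the same stacked layout as the loop, provided its arguments are as assumed here.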

ggplot for the object

library(ohmyR)
library(ggplot2)

?all.reports # shows the documentation for the example dataset

new.report <- all.reports[, c("Base", "Mean", "sampleName")]
new.report$Base <- as.numeric(new.report$Base)
new.report$type <- gsub("_._.", "", new.report$sampleName)

ggplot(new.report, aes(x=Base, y=Mean, color=sampleName)) + 
  geom_point() +
  theme_bw() + facet_grid(cols = vars(type))
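
The type column used for faceting comes from the gsub() call above, which removes the "_replicate_run" suffix from each sample name. A small check on the four sample names from this vignette shows the effect. The stricter pattern in the second call is my own suggestion, not something the package uses.

```r
sample_names <- c("wt_1_1", "wt_1_2", "snf2_1_1", "snf2_1_2")

# Pattern from the vignette: "_", any character, "_", any character.
types <- gsub("_._.", "", sample_names)
print(types)
# [1] "wt"   "wt"   "snf2" "snf2"

# Stricter alternative (an assumption, not from the package):
# only strip a trailing "_<digits>_<digits>" suffix.
types_strict <- sub("_[0-9]+_[0-9]+$", "", sample_names)
print(identical(types, types_strict))
# [1] TRUE
```

The anchored version would leave a name untouched if the suffix were missing or appeared in the middle, which the original pattern does not guarantee.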

Session info

sessionInfo()


chilampoon/ohmyR documentation built on March 8, 2021, 12:04 a.m.