In kathrynrozengagnon/CLIPflexR: CLIP tools and wrappers in R

knitr::opts_chunk$set(echo = TRUE)
baseDir <- "/Users/thomascarroll/Desktop/Projects/brc/clipRUTest/"

This vignette shows how to execute the CTK "Standard/BrdU-CLIP_data_analysis_using_CTK" vignette using CLIPflexR, from within R.

Download Data

Download data from SRA. SRR110753 was selected as an example in this vignette.

require(SRAdb)
mySRA <- "SRAmetadb.sqlite"
if(!file.exists(mySRA)){
  mySRA <- getSRAdbFile()
}
sra_con <- dbConnect(dbDriver("SQLite"), mySRA )

Fox3_Std <- "SRR1107535.fastq.gz"
if(!file.exists(Fox3_Std)){
  getFASTQfile("SRR1107535",
               sra_con = sra_con,
               destDir = getwd())
}

Process data

Quality filter reads (wrapping CTK's fastq_filter.pl)

Filtering low quality read using R-wrapped fastq_filter.pl from CTK.
Use fastq/gzipped fast files as input. outFile: fastq file after filtering low quality reads; qsFilter: filtering criteria, "mean:0-29:20" means the average quality of the first 30 bases must be equal or greater than 20.
CLIPflexR documentation.

Fox3_Std <- "SRR1107535.fastq.gz"
Fox3_Std_filtered <- ctk_fastqFilter(Fox3_Std,
                                     outFile = "SRR1107535_FF.fastq",
                                     qsFilter="mean:0-29:20")

Remove 3' linkers (wrapping FASTX's fastx_clipper)

Trimming 3'-end adapter sequences using R-wrapped fastx_clipper function from FASTX_toolkit.
adaptor: CLIP this sequence from the 3'-end of reads, "GTGTCAGTCACTTCCAGCGG" (default); length: minimum read length, only keep reads with length equal or greater then this threshold, 18 (default).
CLIPflexR documentation.

Fox3_Std_filtered <- fastx_clipper(Fox3_Std_filtered,outFile = "SRR1107535_clipped.fastq",adaptor = "GTGTCAGTCACTTCCAGCGG",length =20)

Trim low quality nucleotides and short reads (wrapping FASTX's fastq_quality_trimmer)

Discard reads with low quality bases and and short reads using R-wrapped fastq_quality_trimmer function from FASTX-toolkit. outfile: file name and path of the fastq files after filtering; qualityThreshold: the minimum quality threshold. CLIPflexR documentation

Fox3_Std_filtered <- fastq_quality_trimmer(Fox3_Std_filtered,
                                           outFile = "SRR1107535_clipFiltered.fastq.gz",qualityThreshold = 5,minimumLength = 20 )

Collapse exact ctk_fastq2collapse (wrapping CTKs's ctk_fastq2collapse.pl)

Collapse duplicate sequences into unique reads using R-wrapped fastq2collapse.pl from CTK.
CLIPflexR documentation.

Fox3_Std_filtered <- ctk_fastq2collapse(Fox3_Std_filtered,
                                           outFile = "SRR1107535_collapsed.fastq.gz")

Remove random barcodes (wrapping CTK's stripBarcode.pl)

Strip random barcodes with specific length from the reads using R-wrapped stripBarcode.pl from CTK.
linkerlength: barcode length; inputFormat: format of input sequences, fasta (default) or fastq.
CLIPflexR documentation.

Fox3_Std_filtered <- ctk_stripBarcode(Fox3_Std_filtered,linkerlength = 5,
                                      inputFormat = "fasta",
                                           outFile = "SRR1107535_stripped.fastq.gz")

Map processed reads to genome

Make reference data for mm10

Fetch reference sequences from BSgenome.

require(BSgenome.Mmusculus.UCSC.mm10)
require(magrittr)
listOfSeq <- lapply(seqnames(BSgenome.Mmusculus.UCSC.mm10),function(x)BSgenome.Mmusculus.UCSC.mm10[[x]])
names(listOfSeq) <- seqnames(BSgenome.Mmusculus.UCSC.mm10)
toWrite <- DNAStringSet(listOfSeq[!grepl("Un|random|hap",names(listOfSeq))])
writeXStringSet(toWrite,file="mm10.fa")

Build a Bowtie2 genome index using Rbowtie2

Build Bowtie2 index of reference genome, mm10 in this vignette.
threads: number of threads used. Index documentation.

bowtie2_index("mm10.fa",threads = 6)

Align to genome using Rbowtie2

Align reads to indexed reference genome using Rbowtie2.
format: format of input sequences, fasta (default) or fastq; threads: number of threads used, 1 (default); maxMismatches: number of mismatches allowed, 1 (default) or 0; seedSubString: length of seed substrings, 18 (default), must be >3 & <32; report_k: NULL, look for multiple alignments, report best (default), otherwise report up to n (integer) alignments per read.
Alignment documentation.

Fox3_Std_filtered <- decompress(Fox3_Std_filtered)
alignedFile <- bowtie_align(Fox3_Std_filtered,
              "mm10",
              gsub("\\.fastq",".bam",Fox3_Std_filtered),
              format="fastq",
              threads = 6)

Parsing SAM files

Parsing SAM files to get unique mapped reads and mismatches per read using R-wrapped parseAlignment.pl from CTK.
minLen: miminal mapping length to report, 18 (default); mapQual: Minimum map quality, 1 (default MAPQ), keep only uniquely mapped reads when mapQual>=1.
CLIPflexR documentation.

parsedAlignement <- ctk_parseAlignment(alignedFile,
                                       minLen = 18,
                                       mapQual = 1,
                                       mutationFile = "mutations.txt")

Collapse duplicated mapped reads

Collapse CLIP tags generated by parseAlignment using R-wrapped tag2collapse.pl from CTK.
bigFile: set TRUE if input file is big; keepMaxScore: keep the tag with the most weight (instead of the longest one) as representative; keepTagName: do not change tag name (no extra information); weight: consider the weight of each tag; weightInName: find weight in name; randomBarcode: random barcode exists, no collapse for different barcodes; em: EM threshold to infer reliability of each collapsed read (when have random linker, -1 = no EM); outputSeqError: output sequencing errors estimated by the EM algorithm.
CLIPflexR documentation.

# -v -big --random-barcode -EM 30 --seq-error-model alignment -weight --weight-in-name --keep-max-score --keep-tag-name $f.tag.bed
tagcollapsed <- ctk_tag2collapse(parsedAlignement,
                                     keepMaxScore = TRUE,
                                     keepTagName = TRUE,
                                     weight = TRUE,
                                     bigFile = TRUE,
                                     weightInName = TRUE,
                                     randomBarcode = TRUE,
                                     em = 30,
                                     outputSeqError = TRUE
                                     )

Call peaks & visualize

Join tags and mutations

Get the mutations in unique tags using R-wrapped selectRow.pl from CTK (originally wrapped from Galaxy).
file1: mutation file; file2: tagged bed file; field1: query column; field2: filter column; mode: running mode, set N while only print paired rows.
CLIPflexR documentation.

tagJoined <- ctk_joinWrapper("mutations.txt",tagcollapsed,field1 = 4,field2 = 4,mode = "n",
                                outFile="Fox3_3_tag.unique.mutation.txt")

Label bed file with rgb information

Add colors to bed file using R-wrapped bed2rgb.pl from CTK.
filesToRun: taged bed files; col: assign color in r,g,b format.
CLIPflexR documentation.

tagcollapsedRGB <- ctk_bed2rgb(filesToRun = tagcollapsed,col = "128,0,0")

Generate bedgraph

Generate bedgraphs for visualization using R-wrapped tag2profile.pl from CTK.
bigFile: set TRUE if input file is big; ss: separate strand, TRUE (default); exact: exact count of each nucleotide, TRUE (default).
CLIPflexR documentation.

# -v -big --random-barcode -EM 30 --seq-error-model alignment -weight --weight-in-name --keep-max-score --keep-tag-name $f.tag.bed
tagProfile <- ctk_tag2profile(filesToRun = tagcollapsedRGB,bigFile = TRUE,ss = TRUE,exact = TRUE,outputFormat = "bedgraph")

Peak calling without statistical assessment

Call peaks using R-wrapped tag2peak.pl from CTK without statistical assessment. filesToRun: RGB labeled tagged bed files; valleySeeking: find candidate peaks by valley seeking, TRUE (default); valleyDepth: depth of valley, if valley seeking, 0.9 (default), between 0.5 and 1; bigFile: set TRUE if the input file is big; ss: separate strand, TRUE (default).
CLIPflexR documentation.

tagPeaksNoStats <- ctk_tag2peak(filesToRun = tagcollapsedRGB,valleySeeking = TRUE,valleyDepth = 0.9,bigFile = TRUE,ss = TRUE)

Peak calling with statistical assessment

Call peaks using R-wrapped tag2peak.pl from CTK with statistical assessment. filesToRun: RGB labeled tagged bed files; valleySeeking: find candidate peaks by valley seeking, TRUE (default);valleyDepth: depth of valley, if valley seeking, 0.9 (default), between 0.5 and 1; bigFile: set TRUE if the input file is big; ss: separate strand, TRUE (default); genes: gene locus in bed format.
CLIPflexR documentation.

require(TxDb.Mmusculus.UCSC.mm10.knownGene)
export.bed(genes(TxDb.Mmusculus.UCSC.mm10.knownGene),"genes.bed")
tagProfile <- ctk_tag2peak(filesToRun = tagcollapsedRGB,
                           outFile = "TC_SRR1107535_stripped.RGB.peakWithStats.bed",
                           outBoundary ="TC_SRR1107535_stripped.RGB.peakBoundary.bed",
                           outHalfPH = "TC_SRR1107535_stripped.RGB.peakHalf.bed",
                           valleySeeking = TRUE,valleyDepth = 0.9,bigFile = TRUE,ss = TRUE,genes = "genes.bed")

Use CLIPflexR to get peak sequences or search for motifs in peaks

Retrieve sequence under peaks using CLIPflexR fetchSequencesForClIP

peaks: tagged bed file or GRanges of peaks; reSize: resize peak to this width around peak center, 64 (default); fasta: reference sequence.
CLIPflexR documentation.

myseq <- fetchSequencesForClIP(peaks = tagProfile,reSize = 20,fasta = "mm10.fa",bedHeader = FALSE)

Retrieve peaks associated with particular motifs using CLIPflexR annotatePeaksWithPatterns

peaks: tagged bed file or GRanges of peaks; reSize: resize peak to this width around peak center, 64 (default); fasta: reference sequence; pattern: FASTA of motif sequence(s).
CLIPflexR documentation.

myseq <- annotatePeaksWithPatterns(peaks = tagProfile,patterns = "TGCATG",resize = 100,fasta = "mm10.fa",bedHeader=FALSE)

kathrynrozengagnon/CLIPflexR documentation built on Dec. 8, 2022, 7:31 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

kathrynrozengagnon/CLIPflexR
CLIP tools and wrappers in R

In kathrynrozengagnon/CLIPflexR: CLIP tools and wrappers in R

This vignette shows how to execute the CTK "Standard/BrdU-CLIP_data_analysis_using_CTK" vignette using CLIPflexR, from within R.

Download Data

Process data

Quality filter reads (wrapping CTK's fastq_filter.pl)

Remove 3' linkers (wrapping FASTX's fastx_clipper)

Trim low quality nucleotides and short reads (wrapping FASTX's fastq_quality_trimmer)

Collapse exact ctk_fastq2collapse (wrapping CTKs's ctk_fastq2collapse.pl)

Remove random barcodes (wrapping CTK's stripBarcode.pl)

Map processed reads to genome

Make reference data for mm10

Build a Bowtie2 genome index using Rbowtie2

Align to genome using Rbowtie2

Parsing SAM files

Collapse duplicated mapped reads

Call peaks & visualize

Join tags and mutations

Label bed file with rgb information

Generate bedgraph

Peak calling without statistical assessment

Peak calling with statistical assessment

Use CLIPflexR to get peak sequences or search for motifs in peaks

Retrieve sequence under peaks using CLIPflexR fetchSequencesForClIP

Retrieve peaks associated with particular motifs using CLIPflexR annotatePeaksWithPatterns

R Package Documentation

Browse R Packages

We want your feedback!

kathrynrozengagnon/CLIPflexR CLIP tools and wrappers in R

In kathrynrozengagnon/CLIPflexR: CLIP tools and wrappers in R

This vignette shows how to execute the CTK "Standard/BrdU-CLIP_data_analysis_using_CTK" vignette using CLIPflexR, from within R.

Download Data

Process data

Quality filter reads (wrapping CTK's fastq_filter.pl)

Remove 3' linkers (wrapping FASTX's fastx_clipper)

Trim low quality nucleotides and short reads (wrapping FASTX's fastq_quality_trimmer)

Collapse exact ctk_fastq2collapse (wrapping CTKs's ctk_fastq2collapse.pl)

Remove random barcodes (wrapping CTK's stripBarcode.pl)

Map processed reads to genome

Make reference data for mm10

Build a Bowtie2 genome index using Rbowtie2

Align to genome using Rbowtie2

Parsing SAM files

Collapse duplicated mapped reads

Call peaks & visualize

Join tags and mutations

Label bed file with rgb information

Generate bedgraph

Peak calling without statistical assessment

Peak calling with statistical assessment

Use CLIPflexR to get peak sequences or search for motifs in peaks

Retrieve sequence under peaks using CLIPflexR fetchSequencesForClIP

Retrieve peaks associated with particular motifs using CLIPflexR annotatePeaksWithPatterns

R Package Documentation

Browse R Packages

We want your feedback!

kathrynrozengagnon/CLIPflexR
CLIP tools and wrappers in R