wrapperPdx: Running PDX data preprocessing TO BE REVISED

View source: R/wrapperPdx.R

wrapperPdx R Documentation

Running PDX data preprocessing TO BE REVISED

Description

This function runs xenome to remove mouse reads, skewer to trim adapters, and bwa to map reads to hg19, and it marks duplicates. IMPORTANT: to prepare data for mutect v1 analysis, it is mandatory to download the hg19 index archive indicated in the example.

Usage

wrapperPdx(
  group = c("sudo", "docker"),
  fastq.folder,
  scratch.folder,
  xenome.folder,
  seq.type,
  threads,
  adapter5 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
  adapter3 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
  min.length = 40,
  genome.folder = "/data/scratch/hg19_exome",
  sample.id = "sampleX"
)

Arguments

group

a character string. Two options: "sudo" or "docker", depending on which group the user belongs to

fastq.folder

a character string indicating where the gzipped fastq files are located

scratch.folder

a character string indicating the scratch folder where the docker container will be mounted

xenome.folder

a character string indicating the folder where the indexed reference genomes generated by xenome are located

seq.type

a character string indicating the type of reads to be trimmed. Two options: "se" or "pe", for single-end and paired-end sequencing respectively

threads

a number indicating the number of cores to be used by the application

adapter5

a character string indicating the forward adapter

adapter3

a character string indicating the reverse adapter

min.length

a number indicating the minimum length a trimmed read must have to be returned

genome.folder

a character string indicating the folder where the indexed reference genome for bwa is located

sample.id

a character string indicating the unique id to be associated with the bam file that will be created. IMPORTANT: each sample must have its own sample.id for further analysis.
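
The constraints listed above can be checked before launching a long run. The following is a minimal sketch only: `check_pdx_args` is a hypothetical helper that is not part of docker4seq, and the numeric bounds (at least one thread, non-negative minimum length) are assumptions rather than documented limits.

```r
# Hypothetical helper (not part of docker4seq): validates argument values
# against the options described in the Arguments section.
check_pdx_args <- function(group, seq.type, threads, min.length, sample.id) {
  stopifnot(
    # group must be exactly one of the two documented options
    is.character(group), length(group) == 1, group %in% c("sudo", "docker"),
    # seq.type must be "se" (single end) or "pe" (paired end)
    is.character(seq.type), length(seq.type) == 1, seq.type %in% c("se", "pe"),
    # assumed bounds: at least one core, non-negative read length
    is.numeric(threads), length(threads) == 1, threads >= 1,
    is.numeric(min.length), length(min.length) == 1, min.length >= 0,
    # sample.id must be a single non-empty string
    is.character(sample.id), length(sample.id) == 1, nzchar(sample.id)
  )
  invisible(TRUE)
}

check_pdx_args(group = "docker", seq.type = "pe", threads = 24,
               min.length = 40, sample.id = "sampleX")
```

The helper stops with an error on the first failed condition, so it can be called immediately before wrapperPdx in a script.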

Value

Three files: dedup_reads.bam, a sorted bam file with duplicates marked; dedup_reads.bai, the index of dedup_reads.bam; and dedup_reads.stats, which provides mapping statistics
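
After a run, it may help to confirm that all three files were produced. This is a sketch under stated assumptions: `missing_pdx_outputs` is a hypothetical helper, and `out.folder` is a placeholder, because this page does not state which folder the files are written to.

```r
# Hypothetical helper (not part of docker4seq): returns the names of the
# expected output files that are NOT present in out.folder.
# out.folder is an assumption; the documentation does not say where the
# files are written.
missing_pdx_outputs <- function(out.folder) {
  expected <- c("dedup_reads.bam", "dedup_reads.bai", "dedup_reads.stats")
  expected[!file.exists(file.path(out.folder, expected))]
}

# Example call on the current working directory
missing_pdx_outputs(getwd())
```

An empty character vector means all three files were found.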

Author(s)

Raffaele Calogero

Examples

## Not run: 
    # downloading example data: 1 million mcf7 exome reads mixed with 1 million mouse reads obtained by human exome capture
    system("wget http://130.192.119.59/public/hs1m_mm1m_R1.fastq.gz")
    system("wget http://130.192.119.59/public/hs1m_mm1m_R2.fastq.gz")

    # hg19 index archive required for bwa (61 Gb); at present it is required to run mutect1
    system("wget http://130.192.119.59/public/hg19_exome.tar.gz")

    #running wrapperPdx
    wrapperPdx(group="docker",fastq.folder=getwd(), scratch.folder="/data/scratch",
    xenome.folder="/data/scratch/hg19.mm10", seq.type="pe", threads=24,
    adapter5="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
    adapter3="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
    min.length=40, genome.folder="/data/scratch/hg19_exome", sample.id="sampleX")


## End(Not run)
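
The example downloads hg19_exome.tar.gz but passes a folder as genome.folder, so the archive presumably has to be unpacked first. The sketch below shows one way to do that with base R. It assumes that the archive expands to a folder named hg19_exome, which this page does not confirm. `unpack_index` is a hypothetical helper, not part of docker4seq.

```r
# Hypothetical helper (not part of docker4seq): unpacks a downloaded index
# archive into exdir and returns the top-level entries found there.
unpack_index <- function(archive, exdir) {
  if (!file.exists(archive)) stop("archive not found: ", archive)
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  # utils::untar returns 0 (invisibly) on success
  status <- utils::untar(archive, exdir = exdir)
  if (!identical(as.integer(status), 0L)) stop("untar failed with status ", status)
  invisible(list.files(exdir))
}

## Not run:
# Assumption: the archive unpacks to /data/scratch/hg19_exome
# unpack_index("hg19_exome.tar.gz", exdir = "/data/scratch")
## End(Not run)
```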

kendomaniac/docker4seq documentation built on Sept. 3, 2024, 6:42 p.m.