gpatterns.import_from_bam: Create a track from BAM files.

View source: R/import.R


Create a track from BAM files.

Description

Creates a track from BAM files.

Usage

gpatterns.import_from_bam(
  bams,
  workdir = NULL,
  track = NULL,
  steps = "all",
  paired_end = TRUE,
  cgs_mask_file = NULL,
  trim = NULL,
  umi1_idx = NULL,
  umi2_idx = NULL,
  use_seq = FALSE,
  only_seq = FALSE,
  frag_intervs = NULL,
  maxdist = 0,
  rm_off_target = TRUE,
  add_chr_prefix = FALSE,
  bismark = FALSE,
  nbins = nrow(gintervals.all()),
  groot = GROOT,
  import_raw_tcpgs = FALSE,
  use_sge = FALSE,
  max_jobs = 400,
  parallel = getOption("gpatterns.parallel"),
  cmd_prefix = "",
  run_per_interv = TRUE,
  min_qual = 20,
  ...
)

Arguments

bams

character vector with the paths of the BAM files

workdir

directory in which the files will be saved (provide a full path)

track

name of the track to generate

steps

steps of the pipeline to run ('all', the default, runs the full pipeline). Possible options are: 'bam2tidy_cpgs', 'filter_dups', 'bind_tidy_cpgs', 'pileup', 'pat_freq'

paired_end

BAM files are paired-end, with R1 and R2 interleaved

cgs_mask_file

comma-separated file with positions of CpGs to mask (e.g. MspI sticky ends). Must have chrom and start fields giving the position of the 'C' in each CpG to mask (see the sketch after the argument list)

trim

trim CpGs that are within trim bp of the beginning/end of the read

umi1_idx

position of umi1 in the index (0-based)

umi2_idx

position of umi2 in the index (0-based)

use_seq

use UMI sequence (not only position) to filter duplicates

only_seq

use only UMI sequence (without positions) to filter duplicates

frag_intervs

intervals set of the fragments to which read positions are changed

maxdist

maximal distance from fragments

rm_off_target

if TRUE, remove reads whose distance from frag_intervs exceeds maxdist; if FALSE, leave those reads unchanged

add_chr_prefix

add a "chr" prefix to chromosome names (in order to import into misha)

bismark

the BAM was aligned using Bismark

nbins

number of genomic bins into which the analysis is split

groot

root of the misha genomic database in which to save the tracks

import_raw_tcpgs

import raw tidy CpGs to misha (without filtering duplicates)

use_sge

use Sun Grid Engine (SGE) for parallelization

max_jobs

maximal number of jobs for SGE parallelization

parallel

parallelize using threads (number of threads is determined by gpatterns.set_parallel)

cmd_prefix

prefix to prepend to 'system' commands (e.g. 'source ~/.bashrc')

run_per_interv

run the bam2tidy_cpgs scripts separately for each interval

min_qual

minimal base quality

...

additional parameters passed to gpatterns.import_from_tidy_cpgs
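
The following is a minimal sketch of how a cgs_mask_file could be prepared; the file name, chromosome names, and positions are hypothetical and only illustrate the expected chrom/start layout:

# hypothetical mask table: one row per 'C' position to mask (e.g. MspI sticky ends)
mask <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(10468, 135087, 96543)
)
write.csv(mask, "cgs_mask.csv", row.names = FALSE, quote = FALSE)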

Value

if 'stats' is one of the steps, a data frame with statistics; otherwise, none.
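
Examples

A minimal sketch of a typical call, assuming interleaved paired-end BAM input and an existing misha database; the file path, working directory, track name, and database root below are hypothetical:

## Not run: 
gpatterns.import_from_bam(
  bams = "sample1.bam",
  workdir = "/full/path/to/workdir",
  track = "my_experiment.sample1",
  groot = "/path/to/misha_db",
  steps = "all",
  paired_end = TRUE
)
## End(Not run)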

