sc_atac_pipeline: A convenient function for running the entire pipeline

View source: R/sc_atac_pipeline.R

sc_atac_pipeline    R Documentation

A convenient function for running the entire pipeline

Description

A convenience function that runs the entire scATAC-seq pipeline in a single call.

Usage

sc_atac_pipeline(
  r1,
  r2,
  bc_file,
  valid_barcode_file = "",
  id1_st = -0,
  id1_len = 16,
  id2_st = 0,
  id2_len = 16,
  rmN = TRUE,
  rmlow = TRUE,
  organism = NULL,
  reference = NULL,
  feature_type = NULL,
  remove_duplicates = FALSE,
  samtools_path = NULL,
  genome_size = NULL,
  bin_size = NULL,
  yieldsize = 1e+06,
  exclude_regions = TRUE,
  excluded_regions_filename = NULL,
  fix_chr = "none",
  lower = NULL,
  cell_calling = "filter",
  promoters_file = NULL,
  tss_file = NULL,
  enhs_file = NULL,
  gene_anno_file = NULL,
  min_uniq_frags = 3000,
  max_uniq_frags = 50000,
  min_frac_peak = 0.3,
  min_frac_tss = 0,
  min_frac_enhancer = 0,
  min_frac_promoter = 0.1,
  max_frac_mito = 0.15,
  report = TRUE,
  nthreads = 12,
  output_folder = NULL
)

Arguments

r1

The first read fastq file

r2

The second read fastq file

bc_file

The barcode information, either in FASTQ format (e.g. from 10x-ATAC) or as a .csv file (in which case the barcode is expected in the second column). For the FASTQ approach, this can currently be a list of barcode files.

valid_barcode_file

Optional path to a file of the valid (expected) barcode sequences to be found in bc_file (.txt, can be .txt.gz). The file must contain one barcode per line, in the second comma-separated column (default = ""). If given, each barcode from bc_file is matched to the best-fitting valid barcode, allowing a Hamming distance of 1. If a FASTQ bc_file is provided, barcodes with a higher mapping quality, as given by the FASTQ read quality scores, are prioritised.

id1_st

Barcode start position (0-indexed) for read 1. Only needed if bc_file is in .csv format.

id1_len

Barcode length for read 1. Only needed if bc_file is in .csv format.

id2_st

Barcode start position (0-indexed) for read 2. Only needed if bc_file is in .csv format.

id2_len

Barcode length for read 2. Only needed if bc_file is in .csv format.

rmN

Logical, whether to remove reads that contain N in the UMI or cell barcode.

rmlow

Logical, whether to remove reads that have low-quality barcode sequences.

organism

The name of the organism, e.g. hg38

reference

The reference genome file

feature_type

The feature type (either 'genome_bin' or 'peak')

remove_duplicates

Whether or not to remove duplicates (samtools is required)

samtools_path

A custom path of samtools to use for duplicate removal

genome_size

The size of the genome (used for the cellranger cell calling method)

bin_size

The size of the bins for feature counting with the 'genome_bin' feature type

yieldsize

The number of reads to read in at a time (i.e. the chunk size) during feature counting

exclude_regions

Whether or not the regions should be excluded

excluded_regions_filename

The filename of the file containing the regions to be excluded

fix_chr

Specify 'none', 'exclude_regions', 'feature' or 'both' to prepend the string "chr" to the chromosome names in the associated file(s): the excluded-regions file, the feature file, both of these, or neither ('none').

lower

The lower threshold on the total count per barcode, used when the emptydrops method is chosen for cell calling.

cell_calling

The desired cell calling method: "cellranger", "emptydrops" or "filter"

promoters_file

The path of the promoter annotation file (if the specified organism isn't recognised)

tss_file

The path of the TSS (transcription start site) annotation file (if the specified organism isn't recognised)

enhs_file

The path of the enhancer annotation file (if the specified organism isn't recognised)

gene_anno_file

The path of the gene annotation file (gtf or gff3 format)

min_uniq_frags

The minimum number of unique fragments required for a cell (used for filter cell calling)

max_uniq_frags

The maximum number of unique fragments allowed for a cell (used for filter cell calling)

min_frac_peak

The minimum proportion of a cell's fragments that overlap a peak (used for filter cell calling)

min_frac_tss

The minimum proportion of a cell's fragments that overlap a TSS (used for filter cell calling)

min_frac_enhancer

The minimum proportion of a cell's fragments that overlap an enhancer sequence (used for filter cell calling)

min_frac_promoter

The minimum proportion of a cell's fragments that overlap a promoter sequence (used for filter cell calling)

max_frac_mito

The maximum proportion of a cell's fragments that are mitochondrial (used for filter cell calling)

report

Whether or not an HTML report should be produced

nthreads

The number of threads to use for alignment (sc_align) and demultiplexing (sc_atac_bam_tagging)

output_folder

The path of the output folder
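The barcode correction described for valid_barcode_file (matching each observed barcode to a valid one within a Hamming distance of 1) can be illustrated with a small base-R sketch. This is not scPipe's implementation: `hamming_match` and the example whitelist are hypothetical, and the sketch simply returns NA when several valid barcodes are one mismatch away, rather than using FASTQ quality scores to prioritise a match.

```r
# Hypothetical helper (not part of scPipe): match one observed barcode
# against a whitelist, allowing at most one mismatch (Hamming distance <= 1).
hamming_match <- function(barcode, valid) {
  # An exact match is accepted immediately
  if (barcode %in% valid) return(barcode)
  obs <- strsplit(barcode, "")[[1]]
  # Hamming distance is only defined for sequences of equal length
  candidates <- valid[nchar(valid) == length(obs)]
  dists <- vapply(candidates, function(v) {
    sum(strsplit(v, "")[[1]] != obs)
  }, integer(1))
  hits <- candidates[dists == 1]
  # Accept only an unambiguous single-mismatch correction
  if (length(hits) == 1) hits else NA_character_
}

valid <- c("AAAACCCC", "GGGGTTTT", "ACGTACGT")
hamming_match("AAAACCCC", valid)  # returns "AAAACCCC" (exact match)
hamming_match("AAAACCCA", valid)  # returns "AAAACCCC" (one mismatch)
hamming_match("TTTTTTTT", valid)  # returns NA (no valid barcode within distance 1)
```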
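The start positions id1_st and id2_st are 0-indexed, whereas R's substr() uses 1-indexed, inclusive coordinates. Assuming the barcode is a contiguous stretch of the read sequence, a start position and length map to R coordinates as in the sketch below; `extract_barcode` and the read sequence are hypothetical and only illustrate the indexing convention.

```r
# Hypothetical helper: convert a 0-indexed start and a length into
# R's 1-indexed, inclusive substr() coordinates.
extract_barcode <- function(seq, st, len) {
  substr(seq, st + 1, st + len)
}

read_seq <- "ACGTACGTACGTACGTTTTTGGGG"
extract_barcode(read_seq, st = 0, len = 16)  # returns "ACGTACGTACGTACGT"
extract_barcode(read_seq, st = 16, len = 4)  # returns "TTTT" (bases 17 to 20)
```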
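The fix_chr option addresses a common mismatch between annotations that name chromosomes "1", "21", "X" and those that use "chr1", "chr21", "chrX". The base-R sketch below shows the idea on a hypothetical BED-style table; `add_chr_prefix` is an illustrative helper, not scPipe code, and its check for an existing prefix is an added safeguard that the option itself may not perform.

```r
# Hypothetical BED-style regions whose chromosome names lack the "chr" prefix
regions <- data.frame(
  chrom = c("1", "21", "X"),
  start = c(10000L, 5010000L, 2000L),
  end   = c(10500L, 5010600L, 2600L),
  stringsAsFactors = FALSE
)

# Illustrative helper (not scPipe code): prepend "chr" only where it is missing,
# so "21" becomes "chr21" while an already-prefixed "chr2" is left unchanged
add_chr_prefix <- function(chrom) {
  ifelse(startsWith(chrom, "chr"), chrom, paste0("chr", chrom))
}

regions$chrom <- add_chr_prefix(regions$chrom)
regions$chrom  # "chr1" "chr21" "chrX"
```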
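With cell_calling = "filter", the thresholds min_uniq_frags to max_frac_mito can be read as a set of per-cell conditions that must all hold. The base-R sketch below illustrates that reading only: the QC table, its column names and `filter_cells` are hypothetical, the bounds are assumed to be inclusive, and scPipe computes the underlying per-cell statistics itself. The defaults mirror those in the Usage section.

```r
# Hypothetical per-cell QC table; column names are illustrative only
qc <- data.frame(
  barcode       = c("cell1", "cell2", "cell3", "cell4", "cell5"),
  uniq_frags    = c(5000, 1200, 60000, 8000, 20000),
  frac_peak     = c(0.45, 0.50, 0.40, 0.20, 0.60),
  frac_tss      = c(0.10, 0.12, 0.08, 0.09, 0.20),
  frac_enhancer = c(0.05, 0.04, 0.06, 0.03, 0.10),
  frac_promoter = c(0.20, 0.25, 0.15, 0.12, 0.30),
  frac_mito     = c(0.05, 0.02, 0.03, 0.01, 0.40),
  stringsAsFactors = FALSE
)

# Sketch of threshold-based cell calling: keep barcodes passing every condition
filter_cells <- function(qc,
                         min_uniq_frags = 3000, max_uniq_frags = 50000,
                         min_frac_peak = 0.3, min_frac_tss = 0,
                         min_frac_enhancer = 0, min_frac_promoter = 0.1,
                         max_frac_mito = 0.15) {
  keep <- qc$uniq_frags >= min_uniq_frags &
    qc$uniq_frags <= max_uniq_frags &
    qc$frac_peak >= min_frac_peak &
    qc$frac_tss >= min_frac_tss &
    qc$frac_enhancer >= min_frac_enhancer &
    qc$frac_promoter >= min_frac_promoter &
    qc$frac_mito <= max_frac_mito
  qc$barcode[keep]
}

filter_cells(qc)                       # returns "cell1"
filter_cells(qc, max_frac_mito = 0.5)  # returns "cell1" "cell5"
```

Each rejected cell in the toy table fails exactly one condition: cell2 has too few unique fragments, cell3 too many, cell4 too low a peak fraction, and cell5 too high a mitochondrial fraction.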

Value

None (invisible 'NULL')

Examples

data.folder <- system.file("extdata", package = "scPipe", mustWork = TRUE)
r1      <- file.path(data.folder, "small_chr21_R1.fastq.gz") 
r2      <- file.path(data.folder, "small_chr21_R3.fastq.gz") 

# Using a barcode fastq file:

# barcodes in fastq format
barcode_fastq      <- file.path(data.folder, "small_chr21_R2.fastq.gz") 

## Not run: 
sc_atac_pipeline(
  r1 = r1,
  r2 = r2,
  bc_file = barcode_fastq
)

## End(Not run)


LuyiTian/scPipe documentation built on Dec. 11, 2023, 8:21 p.m.