sc_atac_pipeline: A convenient function for running the entire pipeline
In LuyiTian/scPipe: Pipeline for single cell multi-omic data pre-processing

sc_atac_pipeline

R Documentation

A convenient function for running the entire pipeline

Description

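A convenience function that runs the entire scATAC-seq pre-processing pipeline in a single call: read alignment, demultiplexing by cell barcode, optional duplicate removal, feature counting (genome bins or peaks), cell calling and, optionally, an HTML report. Results are written to `output_folder`.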

Usage

sc_atac_pipeline(
  r1,
  r2,
  bc_file,
  valid_barcode_file = "",
  id1_st = -0,
  id1_len = 16,
  id2_st = 0,
  id2_len = 16,
  rmN = TRUE,
  rmlow = TRUE,
  organism = NULL,
  reference = NULL,
  feature_type = NULL,
  remove_duplicates = FALSE,
  samtools_path = NULL,
  genome_size = NULL,
  bin_size = NULL,
  yieldsize = 1e+06,
  exclude_regions = TRUE,
  excluded_regions_filename = NULL,
  fix_chr = "none",
  lower = NULL,
  cell_calling = "filter",
  promoters_file = NULL,
  tss_file = NULL,
  enhs_file = NULL,
  gene_anno_file = NULL,
  min_uniq_frags = 3000,
  max_uniq_frags = 50000,
  min_frac_peak = 0.3,
  min_frac_tss = 0,
  min_frac_enhancer = 0,
  min_frac_promoter = 0.1,
  max_frac_mito = 0.15,
  report = TRUE,
  nthreads = 12,
  output_folder = NULL
)

Arguments

`r1`	The first read fastq file
`r2`	The second read fastq file
`bc_file`	The barcode information, either as a FASTQ file (e.g. from 10x scATAC-seq) or as a `.csv` file (with the barcode expected in the second column). For the FASTQ approach, this can currently also be a list of barcode files.
`valid_barcode_file`	Optional path to a file of valid (expected) barcode sequences to be found in `bc_file` (`.txt` or `.txt.gz`). The file must contain one barcode per line, in the second comma-separated column (default = ""). If given, each barcode from `bc_file` is matched to the best-fitting valid barcode, allowing a Hamming distance of 1. If a FASTQ `bc_file` is provided, barcodes with higher mapping quality, as given by the FASTQ read quality scores, are prioritised.
`id1_st`	Barcode start position (0-indexed) in read 1; an extra parameter needed only if `bc_file` is a `.csv` file.
`id1_len`	Barcode length in read 1; an extra parameter needed only if `bc_file` is a `.csv` file.
`id2_st`	Barcode start position (0-indexed) in read 2; an extra parameter needed only if `bc_file` is a `.csv` file.
`id2_len`	Barcode length in read 2; an extra parameter needed only if `bc_file` is a `.csv` file.
`rmN`	Logical, whether to remove reads that contain N in the UMI or cell barcode.
`rmlow`	Logical, whether to remove reads that have low-quality barcode sequences.
`organism`	The name of the organism (genome build), e.g. hg38
`reference`	The reference genome file
`feature_type`	The feature type (either 'genome_bin' or 'peak')
`remove_duplicates`	Whether or not to remove duplicates (samtools is required)
`samtools_path`	A custom path to the samtools executable used for duplicate removal
`genome_size`	The size of the genome (used for the `cellranger` cell calling method)
`bin_size`	The size of the bins for feature counting with the 'genome_bin' feature type
`yieldsize`	The number of reads to read in at a time during feature counting
`exclude_regions`	Whether or not to exclude certain genomic regions (see `excluded_regions_filename`)
`excluded_regions_filename`	The path of the file containing the regions to be excluded
`fix_chr`	Which associated file(s) should have the string "chr" prepended to their chromosome names: one of 'none', 'exclude_regions', 'feature' or 'both'
`lower`	The lower threshold for the data when using the `emptydrops` method for cell calling.
`cell_calling`	The cell calling method: one of `cellranger`, `emptydrops` or `filter`
`promoters_file`	The path of the promoter annotation file (used if the specified organism isn't recognised)
`tss_file`	The path of the TSS annotation file (used if the specified organism isn't recognised)
`enhs_file`	The path of the enhancer annotation file (used if the specified organism isn't recognised)
`gene_anno_file`	The path of the gene annotation file (GTF or GFF3 format)
`min_uniq_frags`	The minimum number of unique fragments required for a cell (used for `filter` cell calling)
`max_uniq_frags`	The maximum number of unique fragments allowed for a cell (used for `filter` cell calling)
`min_frac_peak`	The minimum proportion of a cell's fragments that must overlap a peak (used for `filter` cell calling)
`min_frac_tss`	The minimum proportion of a cell's fragments that must overlap a TSS (used for `filter` cell calling)
`min_frac_enhancer`	The minimum proportion of a cell's fragments that must overlap an enhancer (used for `filter` cell calling)
`min_frac_promoter`	The minimum proportion of a cell's fragments that must overlap a promoter (used for `filter` cell calling)
`max_frac_mito`	The maximum proportion of a cell's fragments that may be mitochondrial (used for `filter` cell calling)
`report`	Whether or not an HTML report should be produced
`nthreads`	The number of threads to use for alignment (sc_align) and demultiplexing (sc_atac_bam_tagging)
`output_folder`	The path of the output folder
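The barcode position arguments (`id1_st`, `id1_len`, `id2_st`, `id2_len`) use 0-indexed start positions. As a rough illustration of what that means, the sketch below uses a hypothetical helper `extract_barcode()` (not part of scPipe) built on base R's 1-indexed `substr()`; it is not how scPipe itself extracts barcodes.

```r
# Hypothetical helper, not part of scPipe: extract a barcode from read
# sequences given a 0-indexed start position and a length.
extract_barcode <- function(read_seq, st, len) {
  # substr() uses 1-indexed, inclusive positions, so shift the start by one
  substr(read_seq, st + 1, st + len)
}

reads <- c("ACGTACGTACGTACGTTTTTGGGG", "GGGGCCCCAAAATTTTACGTACGT")
extract_barcode(reads, st = 0, len = 16)  # first 16 bases of each read
extract_barcode(reads, st = 16, len = 4)  # bases 17 to 20 (1-indexed)
```

With `st = 0` the barcode starts at the very first base, which matches the default `id1_st`/`id2_st` of 0.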
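When `valid_barcode_file` is given, each observed barcode is matched to a valid barcode allowing a Hamming distance of 1. The following is a simplified sketch of that idea in base R, using hypothetical helpers `hamming()` and `match_barcode()`; it is not scPipe's implementation and, unlike the behaviour described for FASTQ input, it does not use read quality scores to resolve ties (ambiguous matches simply return `NA`).

```r
# Hypothetical helpers, not scPipe's implementation.
hamming <- function(a, b) {
  # number of positions at which two equal-length strings differ
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

match_barcode <- function(observed, valid, max_dist = 1) {
  if (observed %in% valid) return(observed)  # exact match
  d <- vapply(valid, function(v) {
    # barcodes of a different length can never match
    if (nchar(v) != nchar(observed)) return(Inf)
    hamming(observed, v)
  }, numeric(1))
  hits <- valid[d <= max_dist]
  # keep only an unambiguous match; no match or a tie gives NA
  if (length(hits) == 1) hits else NA_character_
}

valid <- c("AAAA", "CCCC", "GGGG")
match_barcode("AAAT", valid)               # "AAAA" (one mismatch)
match_barcode("AATT", valid)               # NA (two mismatches)
match_barcode("AAAG", c("AAAA", "AAAC"))   # NA (tie between two barcodes)
```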
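The `filter` cell calling method keeps cells whose QC metrics fall within the `min_*` and `max_*` thresholds listed above. A minimal sketch of that logic on a hypothetical per-cell QC table follows; the column names, the example values and the use of inclusive comparisons are assumptions for illustration, not scPipe's actual output format or code.

```r
# Hypothetical per-cell QC table; column names are illustrative only.
qc <- data.frame(
  barcode       = c("cell_A", "cell_B", "cell_C", "cell_D"),
  uniq_frags    = c(12000, 1500, 80000, 9000),
  frac_peak     = c(0.45, 0.50, 0.40, 0.20),
  frac_tss      = c(0.20, 0.25, 0.18, 0.10),
  frac_enhancer = c(0.05, 0.04, 0.06, 0.02),
  frac_promoter = c(0.15, 0.20, 0.12, 0.05),
  frac_mito     = c(0.05, 0.02, 0.10, 0.30),
  stringsAsFactors = FALSE
)

# Thresholds default to the values in the sc_atac_pipeline() usage above;
# inclusive comparisons are an assumption of this sketch.
filter_cells <- function(qc,
                         min_uniq_frags = 3000, max_uniq_frags = 50000,
                         min_frac_peak = 0.3, min_frac_tss = 0,
                         min_frac_enhancer = 0, min_frac_promoter = 0.1,
                         max_frac_mito = 0.15) {
  keep <- qc$uniq_frags >= min_uniq_frags &
    qc$uniq_frags <= max_uniq_frags &
    qc$frac_peak >= min_frac_peak &
    qc$frac_tss >= min_frac_tss &
    qc$frac_enhancer >= min_frac_enhancer &
    qc$frac_promoter >= min_frac_promoter &
    qc$frac_mito <= max_frac_mito
  qc$barcode[keep]
}

filter_cells(qc)                         # "cell_A"
filter_cells(qc, min_uniq_frags = 1000)  # "cell_A" "cell_B"
```

Here `cell_B` has too few fragments, `cell_C` too many, and `cell_D` too small a fraction of fragments in peaks, so only `cell_A` passes the defaults.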

Value

None (invisible 'NULL')

Examples

data.folder <- system.file("extdata", package = "scPipe", mustWork = TRUE)
r1      <- file.path(data.folder, "small_chr21_R1.fastq.gz") 
r2      <- file.path(data.folder, "small_chr21_R3.fastq.gz") 

# Using a barcode fastq file:

# barcodes in fastq format
barcode_fastq      <- file.path(data.folder, "small_chr21_R2.fastq.gz") 

## Not run: 
sc_atac_pipeline(
  r1 = r1,
  r2 = r2,
  bc_file = barcode_fastq
)

## End(Not run)
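A fuller call might combine genome-bin feature counting with `filter` cell calling. The sketch below collects documented arguments in a list and passes them with `do.call()`. Every file path and the `bin_size` value are hypothetical placeholders (the R1/R2/R3 layout mirrors the example above), and the call itself is wrapped in `if (FALSE)` because it needs real input files and a reference genome.

```r
# All paths and the bin size are hypothetical placeholders.
pipeline_args <- list(
  r1             = "path/to/sample_R1.fastq.gz",
  r2             = "path/to/sample_R3.fastq.gz",
  bc_file        = "path/to/sample_R2.fastq.gz",  # barcode reads (FASTQ)
  organism       = "hg38",
  reference      = "path/to/hg38.fa",
  feature_type   = "genome_bin",
  bin_size       = 2000,   # used only with feature_type = "genome_bin"
  cell_calling   = "filter",
  min_uniq_frags = 3000,   # filter thresholds (documented defaults)
  max_frac_mito  = 0.15,
  report         = TRUE,
  output_folder  = file.path(tempdir(), "scPipe_atac_out")
)

# Basic sanity checks against the documented option values
stopifnot(
  pipeline_args$feature_type %in% c("genome_bin", "peak"),
  pipeline_args$cell_calling %in% c("cellranger", "emptydrops", "filter"),
  pipeline_args$feature_type != "genome_bin" || !is.null(pipeline_args$bin_size)
)

# Not run: requires real FASTQ files and a reference genome.
if (FALSE) {
  do.call(scPipe::sc_atac_pipeline, pipeline_args)
}
```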

LuyiTian/scPipe documentation built on Dec. 11, 2023, 8:21 p.m.