addGeneScoreMatrix: Add GeneScoreMatrix to ArrowFiles or an ArchRProject

View source: R/MatrixGeneScores.R

addGeneScoreMatrix R Documentation

Add GeneScoreMatrix to ArrowFiles or an ArchRProject

Description

For each sample independently, this function computes counts for each tile per cell and then infers gene activity scores.

Usage

addGeneScoreMatrix(
  input = NULL,
  genes = getGenes(input),
  geneModel = "exp(-abs(x)/5000) + exp(-1)",
  matrixName = "GeneScoreMatrix",
  extendUpstream = c(1000, 1e+05),
  extendDownstream = c(1000, 1e+05),
  geneUpstream = 5000,
  geneDownstream = 0,
  useGeneBoundaries = TRUE,
  useTSS = FALSE,
  extendTSS = FALSE,
  tileSize = 500,
  ceiling = 4,
  geneScaleFactor = 5,
  scaleTo = 10000,
  excludeChr = c("chrY", "chrM"),
  blacklist = getBlacklist(input),
  threads = getArchRThreads(),
  parallelParam = NULL,
  subThreading = TRUE,
  force = FALSE,
  logFile = createLogFile("addGeneScoreMatrix")
)

Arguments

input

An ArchRProject object or character vector of ArrowFiles.

genes

A stranded GRanges object containing the ranges associated with all gene start and end coordinates.

geneModel

A string giving a "gene model function" used for weighting peaks for gene score calculation. This string should be a function of x, where x is the stranded distance from the transcription start site of the gene.
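To make the shape of the default model concrete, the string can be evaluated in base R for a handful of distances. This is a minimal sketch, assuming the string is treated as an R expression with x bound to a numeric vector of distances in bp; the helper name evalGeneModel is hypothetical and is not part of ArchR.

```r
# Hypothetical helper (not an ArchR function): evaluate a gene model string
# with `x` bound to a numeric vector of stranded distances in bp.
evalGeneModel <- function(geneModel, x) {
  eval(parse(text = geneModel), envir = list(x = x))
}

geneModel <- "exp(-abs(x)/5000) + exp(-1)"  # default from the Usage section
distances <- c(-100000, -5000, 0, 5000, 100000)
weights <- evalGeneModel(geneModel, distances)

# The weight is largest at distance 0 and decays symmetrically toward exp(-1).
print(data.frame(distance = distances, weight = weights))
```

With this default, the additive exp(-1) term keeps distant tiles from being weighted at zero, so they still contribute a small amount to the score.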

matrixName

The name to be used for storage of the gene activity score matrix in the provided ArchRProject or ArrowFiles.

extendUpstream

The minimum and maximum number of basepairs upstream of the transcription start site to consider for gene activity score calculation.

extendDownstream

The minimum and maximum number of basepairs downstream of the transcription start site or transcription termination site (based on 'useTSS') to consider for gene activity score calculation.

geneUpstream

An integer describing the number of bp upstream of the gene by which to extend the gene body. This effectively makes the gene body larger, as there are proximal peaks that should be weighted equally to the gene body. This parameter is used if 'useTSS=FALSE'.

geneDownstream

An integer describing the number of bp downstream of the gene by which to extend the gene body. This effectively makes the gene body larger, as there are proximal peaks that should be weighted equally to the gene body. This parameter is used if 'useTSS=FALSE'.
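Because the gene ranges are stranded, "upstream" and "downstream" depend on the direction of transcription. The sketch below shows one way such an extension could be computed with plain coordinates; extendGeneBody is a hypothetical helper written for illustration and does not reproduce ArchR's internal code.

```r
# Hypothetical helper (not an ArchR function): extend a stranded gene
# interval relative to the direction of transcription.
extendGeneBody <- function(start, end, strand,
                           geneUpstream = 5000, geneDownstream = 0) {
  plus <- strand == "+"
  data.frame(
    # On "+" the upstream side is the low coordinate; on "-" it is the high one.
    start  = pmax(1, ifelse(plus, start - geneUpstream, start - geneDownstream)),
    end    = ifelse(plus, end + geneDownstream, end + geneUpstream),
    strand = strand
  )
}

extended <- extendGeneBody(
  start  = c(20000, 80000),
  end    = c(30000, 90000),
  strand = c("+", "-")
)
print(extended)
```

With the default values, the plus-strand gene grows by 5000 bp at its start and the minus-strand gene grows by 5000 bp at its end.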

useGeneBoundaries

A boolean value indicating whether gene boundaries should be employed during gene activity score calculation. When gene boundaries are used, a tile is prevented from contributing to the gene score of a given gene if a second gene's transcription start site lies between the tile and the gene of interest.

useTSS

A boolean describing whether to build the gene model based on the gene TSS or the gene body.

extendTSS

A boolean describing whether to extend the gene TSS. By default, 'useTSS' uses the 1 bp TSS, while this parameter enables the extension of this region by 'geneUpstream' and 'geneDownstream', respectively.

tileSize

The size of the tiles used for binning counts prior to gene activity score calculation.

ceiling

The maximum counts per tile allowed. This is used to prevent large biases in tile counts.
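The effect of a ceiling can be pictured as clipping each per-tile count at the chosen maximum. The following is a small sketch with made-up counts, assuming the ceiling is applied as a simple element-wise cap; the variable names are illustrative only.

```r
# Made-up per-tile counts for a single cell (illustrative data only).
tileCounts <- c(0, 1, 3, 4, 7, 25)
ceilingValue <- 4  # default value of `ceiling` in the Usage section

# Cap every count at the ceiling so a single tile cannot dominate the score.
capped <- pmin(tileCounts, ceilingValue)
print(capped)
```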

geneScaleFactor

A numeric scaling factor to weight genes based on the inverse of their length, i.e. (Scale Factor)/(Gene Length). The weights are scaled from 1 to the scale factor: small genes receive a weight near the scale factor, while extremely large genes receive a weight closer to 1. This scaling helps with the relative gene score value.
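One way to map inverse gene length onto the range from 1 to the scale factor is a min-max rescaling, sketched below with made-up gene lengths. This illustrates the idea described above and is an assumption, not necessarily the exact formula ArchR uses.

```r
geneScaleFactor <- 5  # default from the Usage section

# Made-up gene lengths in bp (illustrative data only).
geneLength <- c(500, 2000, 10000, 100000, 2000000)
invLength <- 1 / geneLength

# Min-max rescale inverse length to [1, geneScaleFactor] (assumed formula):
# the shortest gene gets the scale factor, the longest gets 1.
geneWeight <- 1 + (geneScaleFactor - 1) *
  (invLength - min(invLength)) / (max(invLength) - min(invLength))

print(data.frame(geneLength = geneLength, geneWeight = geneWeight))
```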

scaleTo

Each column in the calculated gene score matrix will be normalized to a column sum designated by scaleTo.
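Column-sum normalization of this kind can be sketched in base R on a small dense matrix with made-up values. The matrix and its dimension names are illustrative assumptions; a real gene score matrix would be much larger and typically sparse.

```r
scaleTo <- 10000  # default from the Usage section

# Made-up gene-by-cell scores (illustrative data only).
geneScores <- matrix(
  c(2, 0, 6,
    1, 3, 4),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("cellA", "cellB"))
)

# Divide each column by its sum, then rescale so every column sums to scaleTo.
normalized <- sweep(geneScores, 2, colSums(geneScores), "/") * scaleTo
print(colSums(normalized))
```

A column whose sum is zero would produce NaN values under this sketch, so such cells would need to be handled separately.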

excludeChr

A character vector containing the seqnames of the chromosomes that should be excluded from this analysis.

blacklist

A GRanges object containing genomic regions to blacklist because they may be extremely over-represented and would therefore bias the gene scores of genes near those loci.

threads

The number of threads to be used for parallel computing.

parallelParam

A list of parameters to be passed for BiocParallel/batchtools parallel computing.

subThreading

A boolean determining whether, where possible, to use threads within each multi-threaded subprocess if the number of threads is greater than the number of input samples.

force

A boolean value indicating whether to force the matrix indicated by matrixName to be overwritten if it already exists in the given input.

logFile

The path to a file to be used for logging ArchR output.


haibol2016/ArchR documentation built on June 15, 2022, 5:41 p.m.