SCREEN: Batch Run

View source: R/SCREEN.R

SCREENR Documentation

Batch Run

Description

Run the whole analysis pipeline and generate all results with a single function call.

Usage

SCREEN(
  sg_dir,
  mtx_dir,
  fragments,
  cal.FRiP = TRUE,
  species = "Hs",
  version = "v75",
  data_type = "RNA",
  Mixscape = TRUE,
  prefix = "./",
  label = "",
  gene_type = "Symbol",
  protein_coding = TRUE,
  frac = 0.01,
  cal.mt = TRUE,
  nFeature = c(200, 5000),
  nCount = 1000,
  FRiP = 0.1,
  mt = 10,
  blank_NTC = FALSE,
  lambda = 0.01,
  permutation = NULL,
  p_val_cut = 0.05,
  score_cut = 0.5,
  cicero_p_val_cut = 0.05,
  cicero_score_cut = 0,
  ylimit = "auto",
  project = "perturb",
  NTC = "NTC",
  replicate = 1,
  select_gene = NULL,
  selected = NULL,
  gene_annotations = NULL,
  pro_annotations = NULL,
  pro_up = 3000,
  pro_down = 0,
  overlap_cut = 0,
  p_adj_cut = 0.05,
  logFC_cut = 1,
  min.pct = 0.2,
  upstream = 2e+06,
  downstream = 2e+06,
  test.use = "wilcox",
  track_size = c(1, 0.3, 0.2, 0.3),
  include_axis_track = TRUE,
  connection_color = "#7F7CAF",
  connection_color_legend = TRUE,
  connection_width = 2,
  connection_ymax = NULL,
  gene_model_color = "#81D2C7",
  alpha_by_coaccess = FALSE,
  gene_model_shape = c("smallArrow", "box")
)

Arguments

sg_dir

A data frame, or the path to a txt file, containing 3 columns: cell, barcode, gene. If the sgRNA information is stored in a matrix-like format, or the input data frame only contains the sgRNA frequency of each cell, use sgRNAassign to assign sgRNAs to cells first.
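A minimal base-R sketch of an sg_dir input with the three documented columns. The cell barcodes, sgRNA barcodes, and gene names are hypothetical, and writing the txt file tab-delimited with a header row is an assumption, since the expected separator is not documented here.

```r
# Hypothetical sgRNA assignment table with the three documented columns:
# cell (cell barcode), barcode (sgRNA identifier) and gene (targeted gene).
sg_info <- data.frame(
  cell    = c("AAACCTGAGCGTAGTG-1", "AAACCTGCACTTCTGC-1", "AAACGGGAGGCTATCT-1"),
  barcode = c("GATA1_sg1", "NTC_sg2", "TAL1_sg1"),
  gene    = c("GATA1", "NTC", "TAL1"),
  stringsAsFactors = FALSE
)

# sg_dir can be the data frame itself, or the path to a txt file.
# Tab separation is an assumption made for this sketch.
sg_file <- tempfile(fileext = ".txt")
write.table(sg_info, sg_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Reading the file back recovers the same three columns.
sg_back <- read.table(sg_file, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)
str(sg_back)
```

Here "NTC" matches the default value of the NTC argument, so those cells would be treated as negative controls.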

mtx_dir

A SeuratObject, or the path to an rds file containing a SeuratObject, with cells in columns and features in rows.

fragments

Path to the fragments file, used to calculate FRiP (fraction of reads in peaks) for perturb-ATAC input.

cal.FRiP

Logical, calculate FRiP or not. Default is TRUE.

species

Only "Hs" (human) and "Mm" (mouse) are supported. For any other species, percent.mt is calculated as for "Mm". Default is "Hs".

version

Version of the Ensembl reference genome annotation, used for perturb-ATAC and perturb-enhancer input. Default is "v75".

data_type

Type of input data, can be one of c("RNA", "ATAC"). Default is "RNA".

Mixscape

Logical, run IntegratedMixscape or not. Default is TRUE.

prefix

Directory in which to save all results. Default is the current directory.

label

Label of the output files. Default is "" (no label).

gene_type

Type of gene name, selected from one of c("Symbol", "Ensembl"). Default is "Symbol".

protein_coding

Logical, whether to use only protein-coding genes. This parameter is only used to calculate gene activity for perturb-ATAC input. Default is TRUE.

frac

Parameter for filtering lowly expressed genes or low-accessibility peaks. Only genes or peaks with expression or counts in at least this fraction of cells are kept. Default is 0.01.
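A minimal sketch of the frac filter on a toy features-by-cells count matrix, assuming that a feature counts as "expressed" in a cell when its count is greater than zero. This is a generic base-R illustration of the rule described above, not SCREEN's internal code.

```r
# Toy count matrix: features in rows, cells in columns (as in mtx_dir).
set.seed(1)
counts <- matrix(rpois(20 * 100, lambda = 0.05), nrow = 20, ncol = 100,
                 dimnames = list(paste0("feature", 1:20), paste0("cell", 1:100)))
counts["feature1", ] <- 0  # a feature detected in no cell at all

frac <- 0.01

# Fraction of cells in which each feature has a non-zero count.
detected_frac <- rowSums(counts > 0) / ncol(counts)

# Keep only features detected in at least `frac` of the cells.
filtered <- counts[detected_frac >= frac, , drop = FALSE]
dim(filtered)
```

With 100 cells and frac = 0.01, a feature must be detected in at least one cell to be kept, so feature1 is always removed.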

cal.mt

Logical, calculate percentage of mitochondrial gene expression of each cell or not. Default is TRUE.

nFeature

Range of the number of detected features per cell, in the format c(minimum, maximum). Default is c(200, 5000).

nCount

Minimum number of counts per cell. Default is 1000.

FRiP

Minimum FRiP per cell. Default is 0.1.

mt

Maximum percentage of mitochondrial gene expression per cell. Default is 10.
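nFeature, nCount, FRiP and mt act as per-cell quality-control thresholds. A minimal sketch on a hypothetical per-cell metrics table; the column names are assumptions, the bounds are treated as inclusive (also an assumption), and all four filters are applied together for simplicity even though FRiP is computed from the fragments file for perturb-ATAC input.

```r
# Hypothetical per-cell QC metrics for six cells.
qc <- data.frame(
  cell       = paste0("cell", 1:6),
  nFeature   = c(150, 800, 2500, 6000, 1200, 3000),
  nCount     = c(500, 3000, 9000, 20000, 1500, 5000),
  percent.mt = c(2, 5, 12, 3, 4, 8),
  FRiP       = c(0.30, 0.25, 0.40, 0.35, 0.05, 0.20),
  stringsAsFactors = FALSE
)

# Thresholds mirroring the SCREEN() defaults.
nFeature <- c(200, 5000)
nCount   <- 1000
mt       <- 10
FRiP     <- 0.1

# Keep cells inside the feature range, at or above the count and FRiP
# minima, and at or below the mitochondrial maximum.
keep <- qc$nFeature >= nFeature[1] & qc$nFeature <= nFeature[2] &
  qc$nCount >= nCount &
  qc$percent.mt <= mt &
  qc$FRiP >= FRiP

qc$cell[keep]
# "cell2" "cell6"
```

cell1 fails the feature and count minima, cell3 the mitochondrial maximum, cell4 the feature maximum, and cell5 the FRiP minimum.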

blank_NTC

Logical, use blank control as negative control or not. Default is FALSE.

lambda

Regularization parameter of the ridge regression in improved_scmageck_lr. Default is 0.01.
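To show what lambda controls, here is a generic closed-form ridge regression, beta = (X'X + lambda I)^-1 X'Y, on a toy design in which X marks which perturbation each cell carries and Y holds the expression of one gene. This is a textbook sketch without an intercept, assumed for illustration; it is not the exact model fitted by improved_scmageck_lr.

```r
# Toy design: 6 cells x 2 perturbations (1 = cell carries that sgRNA).
X <- matrix(c(1, 1, 1, 0, 0, 0,
              0, 0, 0, 1, 1, 1), ncol = 2,
            dimnames = list(paste0("cell", 1:6), c("geneA_KO", "geneB_KO")))

# Toy expression of a single target gene in the same 6 cells.
Y <- matrix(c(2.1, 1.9, 2.0, 0.4, 0.6, 0.5), ncol = 1,
            dimnames = list(rownames(X), "target"))

lambda <- 0.01

# Closed-form ridge estimate: solve (X'X + lambda * I) beta = X'Y.
ridge_fit <- function(X, Y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, Y))
}

beta <- ridge_fit(X, Y, lambda)
beta
```

Larger lambda shrinks the coefficients toward zero; with lambda = 0 the estimates reduce to ordinary least squares (here the group means 2.0 and 0.5).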

permutation

Number of permutations in improved_scmageck_lr. Default is 10000.
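A minimal sketch of how a permutation count turns a regression score into an empirical p-value. It assumes the cell-to-perturbation labels are shuffled, the ridge model is refitted each time, and a two-sided p-value is computed with the common (1 + count) / (1 + B) correction; SCREEN's actual permutation scheme may differ.

```r
set.seed(42)

# Toy data: 20 cells, the first 10 carry the perturbation.
X <- cbind(geneA_KO = rep(c(1, 0), each = 10))
Y <- matrix(c(rnorm(10, mean = 3), rnorm(10, mean = 0)), ncol = 1)
Y <- Y - mean(Y)  # center so the null coefficient is centered at zero

ridge_fit <- function(X, Y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, Y))
}

lambda      <- 0.01
permutation <- 1000  # kept small here; the documented default is 10000

observed <- ridge_fit(X, Y, lambda)[1, 1]

# Null distribution: refit after shuffling which cells carry the perturbation.
null_scores <- replicate(permutation, {
  ridge_fit(X[sample(nrow(X)), , drop = FALSE], Y, lambda)[1, 1]
})

# Two-sided empirical p-value with the +1 correction.
p_val <- (1 + sum(abs(null_scores) >= abs(observed))) / (1 + permutation)
p_val
```

The +1 correction keeps the p-value above zero, so its smallest attainable value is 1 / (1 + permutation); more permutations give finer resolution at a proportional cost in run time.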

p_val_cut

P-value cutoff of improved_scmageck_lr results. Default is 0.05.

score_cut

Score cutoff of improved_scmageck_lr results. Default is 0.5.
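A minimal sketch of applying p_val_cut and score_cut to perturbation-by-gene score and p-value matrices. The toy values are invented, and treating a hit as p-value below p_val_cut together with absolute score above score_cut is an assumption; the exact comparison SCREEN uses is not documented here.

```r
# Hypothetical regression scores and p-values (perturbations x genes).
score <- matrix(c( 0.8, -0.1,
                  -0.7,  0.3), nrow = 2, byrow = TRUE,
                dimnames = list(c("geneA_KO", "geneB_KO"),
                                c("target1", "target2")))
p_val <- matrix(c(0.001, 0.40,
                  0.020, 0.01), nrow = 2, byrow = TRUE,
                dimnames = dimnames(score))

p_val_cut <- 0.05
score_cut <- 0.5

# Significant pairs: small p-value and large absolute effect (assumed rule).
sig  <- p_val < p_val_cut & abs(score) > score_cut
hits <- which(sig, arr.ind = TRUE)

res <- data.frame(
  perturbation = rownames(score)[hits[, "row"]],
  gene         = colnames(score)[hits[, "col"]],
  score        = score[sig],
  p_val        = p_val[sig],
  stringsAsFactors = FALSE
)
res
```

geneB_KO on target2 has a small p-value but a score of only 0.3, so it is dropped by score_cut.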

cicero_p_val_cut

P-value cutoff of improved_scmageck_lr results used for ciceroPlot. Default is 0.05.

ylimit

Y-axis limits of DE_gene_plot, in the format c(minimum, maximum, interval), e.g. c(-600, 600, 200). Default is "auto", in which case the limits are determined automatically.
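Assuming the three numbers are used as axis limits plus tick spacing in the usual way, c(-600, 600, 200) expands to the following limits and breaks. This base-R illustration only shows the format; how DE_gene_plot applies the values internally is not documented here.

```r
ylimit <- c(-600, 600, 200)  # c(minimum, maximum, interval)

limits <- ylimit[1:2]
breaks <- seq(from = ylimit[1], to = ylimit[2], by = ylimit[3])
breaks
# -600 -400 -200 0 200 400 600
```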

project

Title of DE_gene_plot. Default is "perturb".

NTC

Name of the genes serving as negative controls. Default is "NTC".

replicate

A vector of replicate information for each cell, in the same order as the cells. Default is 1, meaning no replicates.

select_gene

The list of genes for regression in scMAGeCK step. By default, all genes in the table are subject to regression.

selected

In the cicero step, the enhancer regions to visualize for perturb-enhancer input, or the perturbations to choose for perturb-ATAC input. By default, all enhancers or all perturbations are chosen.

gene_annotations

Gene annotations stored in a data frame with the column names c("chromosome", "start", "end", "strand", "transcript"), used for the ciceroPlot step. By default, gene annotations are from ensembldb.

pro_annotations

Gene annotations stored in data frame format, including c("chromosome", "start", "end", "strand", "transcript") as colnames. By default, gene annotations are from ensembldb.

pro_up

The number of nucleotides upstream of the transcription start site that should be included in the promoter region, only used for perturb-ATAC data. Default is 3000.

pro_down

The number of nucleotides downstream of the transcription start site that should be included in the promoter region, only used for perturb-ATAC data. Default is 0.
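A minimal sketch of turning pro_up and pro_down into strand-aware promoter intervals. It follows the coordinate convention of GenomicRanges::promoters() (1-based, inclusive, width pro_up + pro_down), which is an assumption about how SCREEN defines promoters; the transcripts are hypothetical.

```r
# Hypothetical transcripts, one on each strand.
tx <- data.frame(
  transcript = c("tx1", "tx2"),
  start      = c(10000, 50000),
  end        = c(15000, 58000),
  strand     = c("+", "-"),
  stringsAsFactors = FALSE
)

pro_up   <- 3000
pro_down <- 0

# The TSS is the start coordinate on "+" and the end coordinate on "-".
tss <- ifelse(tx$strand == "+", tx$start, tx$end)

# "+": TSS - pro_up  .. TSS + pro_down - 1
# "-": TSS - pro_down + 1 .. TSS + pro_up
promoter_start <- ifelse(tx$strand == "+", tss - pro_up, tss - pro_down + 1)
promoter_end   <- ifelse(tx$strand == "+", tss + pro_down - 1, tss + pro_up)

promoters_df <- data.frame(transcript = tx$transcript,
                           start = promoter_start, end = promoter_end,
                           strand = tx$strand, stringsAsFactors = FALSE)
promoters_df
```

With the defaults (pro_up = 3000, pro_down = 0), each promoter is the 3000 bp immediately upstream of the TSS.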

overlap_cut

Maximum number of overlapping nucleotides between peaks and promoters, only used for perturb-ATAC data. Default is 0.

p_adj_cut

Parameter only used for finding differentially accessible (DA) peaks. Maximum adjusted p-value calculated by FindMarkers. Default is 0.05.

logFC_cut

Parameter only used for finding DA peaks. Minimum log fold change calculated by FindMarkers. Default is 1.

min.pct

Parameter only used for finding DA peaks. Only test features that are detected in at least a fraction min.pct of cells in either the NTC or the perturbation group. This speeds up the function by skipping features that are very infrequently detected. Default is 0.2.
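A minimal sketch of the min.pct rule on a toy peak-by-cell matrix with six perturbed and six NTC cells, assuming a peak is "detected" in a cell when its count is greater than zero. It mirrors the documented FindMarkers pre-filter in base R; it does not call Seurat.

```r
# Six perturbed cells followed by six NTC cells.
group <- rep(c("perturbed", "NTC"), each = 6)

counts <- rbind(
  peak1 = c(1, 1, 0, 0, 0, 0,   0, 0, 0, 0, 0, 0),  # 2/6 perturbed
  peak2 = c(1, 0, 0, 0, 0, 0,   2, 0, 0, 0, 0, 0),  # 1/6 in each group
  peak3 = c(0, 0, 0, 0, 0, 0,   1, 1, 1, 0, 0, 0)   # 3/6 NTC
)

min.pct <- 0.2

# Detection fraction of each peak within each group.
pct_pert <- rowMeans(counts[, group == "perturbed"] > 0)
pct_ntc  <- rowMeans(counts[, group == "NTC"] > 0)

# A peak is tested if it passes min.pct in either group.
tested <- rownames(counts)[pmax(pct_pert, pct_ntc) >= min.pct]
tested
# "peak1" "peak3"
```

peak2 is skipped because 1/6 (about 0.17) is below 0.2 in both groups.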

upstream

The number of nucleotides upstream of the start site of selected region in ciceroPlot step. Default is 2000000.

downstream

The number of nucleotides downstream of the start site of selected region in ciceroPlot step. Default is 2000000.
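A minimal sketch of the genomic window these two arguments describe, taking start minus upstream to start plus downstream around a hypothetical selected region. Clipping the left edge at position 1 is an assumption added so the window stays on the chromosome.

```r
region_start <- 1500000  # hypothetical start of a selected enhancer region
upstream     <- 2e6
downstream   <- 2e6

# Window plotted around the region start (left edge clipped at 1 by assumption).
plot_window <- c(start = max(1, region_start - upstream),
                 end   = region_start + downstream)
plot_window
```

With the default 2 Mb on each side, a region near the chromosome start gets a window that is shorter upstream than downstream.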

test.use

Parameter only used for finding DA peaks. Default is "wilcox". For more details, see FindMarkers.

track_size

Relative size of each track in the ciceroPlot result. Default is c(1, 0.3, 0.2, 0.3). If include_axis_track = FALSE, track_size should be a vector of 3 elements.

include_axis_track

Logical, should a genomic axis be plotted? Default is TRUE.

connection_color

Color for connection lines. A single color, the name of a column containing color values, or the name of a column containing a character or factor to base connection colors on.

connection_color_legend

Logical, should the connection color legend be shown? Default is TRUE.

connection_width

Width of connection lines. Default is 2.

connection_ymax

Connection y-axis height. If NULL, chosen automatically.

gene_model_color

Color for gene annotations. Default is "#81D2C7".

alpha_by_coaccess

Logical, should the transparency of connection lines be scaled based on co-accessibility score? Default is FALSE.

cicero_score_cut

Score cutoff of improved_scmageck_lr results used for ciceroPlot. Default is 0.


HailinWei98/SCREEN documentation built on June 15, 2022, 12:21 a.m.