QC_fgc_crispr_data: QC_fgc_crispr_data


View source: R/QC_fgc_crispr_data.R

Description

Produce quality control (QC) metrics from the following FGC CRISPR screen data inputs: an analysis config JSON file, a combined counts CSV file, Bagel results TSV files (Control vs Plasmid and Treatment vs Plasmid, where available), a gRNA library TSV file ('cleanr.tsv'), and one or more bcl2fastq2 output JSON files ('Stats.json'). QC output is returned for a single named comparison.

Usage

QC_fgc_crispr_data(
  analysis_config,
  combined_counts,
  bagel_ctrl_plasmid,
  bagel_treat_plasmid,
  bcl2fastq,
  library,
  comparison_name,
  output,
  output_R_object,
  norm_method = "median_ratio",
  mock_missing_data = FALSE
)
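
As an illustration of the signature above, a call might be assembled as in the following sketch. All file names and the comparison name are hypothetical placeholders, and it is assumed that the function is exported from the fgcQC package. The call is guarded so that it only runs when the package is installed and the input files exist.

```r
# Hypothetical input paths; replace with real pipeline outputs.
qc_args <- list(
  analysis_config     = "analysis_config.json",
  combined_counts     = "combined_counts.csv",
  bagel_ctrl_plasmid  = "bagel_ctrl_vs_plasmid.tsv",
  bagel_treat_plasmid = NULL,               # no Treatment vs Plasmid file available
  bcl2fastq           = "run1/Stats.json,run2/Stats.json",
  library             = "cleanr.tsv",
  comparison_name     = "ctrl_vs_plasmid",  # hypothetical comparison name
  output              = "qc_results.csv",
  output_R_object     = NULL,               # do not save the R object
  norm_method         = "median_ratio",
  mock_missing_data   = FALSE
)

# Collect every file the call would need to read.
input_files <- c(qc_args$analysis_config, qc_args$combined_counts,
                 qc_args$bagel_ctrl_plasmid, qc_args$library,
                 strsplit(qc_args$bcl2fastq, ",", fixed = TRUE)[[1]])

# Only attempt the call if the package is installed and the inputs exist.
if (requireNamespace("fgcQC", quietly = TRUE) && all(file.exists(input_files))) {
  qc <- do.call(fgcQC::QC_fgc_crispr_data, qc_args)
}
```

Building the arguments as a named list keeps the NULL entries explicit, which makes it easy to see which optional inputs were deliberately left out.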

Arguments

analysis_config

A path to a valid analysis config JSON file to be used as input for the AZ-CRUK CRISPR analysis pipeline.

combined_counts

A path to a valid combined counts csv file (produced by the AZ-CRUK CRISPR analysis pipeline).

bagel_ctrl_plasmid

A path to a valid Bagel tsv results file for Control vs Plasmid. Use NULL if no such file is available.

bagel_treat_plasmid

A path to a valid Bagel tsv results file for Treatment vs Plasmid. Use NULL if no such file is available.

bcl2fastq

A character string giving one or more paths to valid bcl2fastq2 summary output JSON files ('Stats.json'), with multiple paths separated by commas. May be NULL if the mock_missing_data argument is TRUE.
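
Because several paths are packed into a single string, it can help to check them before calling the function. The sketch below shows one way to split and check such a string in base R. The paths are hypothetical, and trimming whitespace around the commas is an assumption rather than documented behaviour of the package.

```r
# Hypothetical comma-separated value for the 'bcl2fastq' argument.
bcl2fastq <- "run1/Stats.json, run2/Stats.json"

# Split on commas and trim any surrounding whitespace.
stats_paths <- trimws(strsplit(bcl2fastq, ",", fixed = TRUE)[[1]])

# Report any paths that do not exist before attempting to read them.
missing_paths <- stats_paths[!file.exists(stats_paths)]
if (length(missing_paths) > 0) {
  message("Missing bcl2fastq2 files: ", paste(missing_paths, collapse = ", "))
}
```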

library

A valid path to a library tsv file in which the first column gives the sgRNA sequence and the second column gives the sgRNA ID (produced by the AZ-CRUK CRISPR reference data generation pipeline). May be NULL if mock_missing_data argument is TRUE.
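
The expected two-column layout can be checked with a few lines of base R. The following sketch writes a small mock library file and reads it back. The sequences, IDs, and column names are made up, and the absence of a header row is an assumption that should be checked against the real 'cleanr.tsv' file.

```r
# Write a tiny mock library file (hypothetical content, no header row assumed).
lib_file <- tempfile(fileext = ".tsv")
writeLines(c("ACGTACGTACGTACGTACGT\tGENE1_sg1",
             "TTGCATGCATGCATGCATGC\tGENE1_sg2",
             "GGCATTAGCCATGACCTAGA\tGENE2_sg1"), lib_file)

# First column: sgRNA sequence; second column: sgRNA ID.
lib <- read.delim(lib_file, header = FALSE, stringsAsFactors = FALSE,
                  col.names = c("sgrna_seq", "sgrna_id"))

# Basic sanity checks: exactly two columns and no duplicated sgRNA IDs.
stopifnot(ncol(lib) == 2, !anyDuplicated(lib$sgrna_id))
```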

comparison_name

A character string naming a single comparison to extract QC data for (should correspond to the comparison name used in the analysis config JSON file).

output

A character string giving an output file name for the csv results. If NULL, do not write out any results.

output_R_object

A character string giving an output file name for the returned R object (useful for trouble-shooting). If NULL, do not save the R object.

norm_method

A character string naming a normalization method for the count data. Either "median_ratio" or "relative". Defaults to "median_ratio".
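
To make the two options concrete, here is a minimal sketch of what they commonly mean. It assumes that "median_ratio" refers to the median-of-ratios size-factor approach and that "relative" means scaling each sample to proportions of its total. The package's own implementation may differ in detail, and normalize_counts is a hypothetical helper, not a function from fgcQC.

```r
# Toy count matrix: rows are gRNAs, columns are samples (made-up values).
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80),
                 ncol = 2,
                 dimnames = list(paste0("sg", 1:4), c("plasmid", "control")))

# Hypothetical helper illustrating the two normalization choices.
normalize_counts <- function(counts, norm_method = c("median_ratio", "relative")) {
  norm_method <- match.arg(norm_method)
  if (norm_method == "median_ratio") {
    # Use only gRNAs with non-zero counts in every sample.
    keep <- rowSums(counts > 0) == ncol(counts)
    kept <- counts[keep, , drop = FALSE]
    # Log geometric mean of each gRNA across samples.
    log_geo_mean <- rowMeans(log(kept))
    # Size factor per sample: median ratio of counts to the geometric mean.
    size_factors <- apply(kept, 2, function(x) exp(median(log(x) - log_geo_mean)))
    sweep(counts, 2, size_factors, "/")
  } else {
    # Relative abundance: each sample is scaled to sum to 1.
    sweep(counts, 2, colSums(counts), "/")
  }
}

norm_mr  <- normalize_counts(counts, "median_ratio")
norm_rel <- normalize_counts(counts, "relative")
```

In this toy example the control sample is simply twice as deep as the plasmid sample, so both methods remove the depth difference. The methods diverge when a few highly abundant gRNAs dominate the total, where the median-based size factor is less affected.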

mock_missing_data

A logical indicating whether any missing inputs should be mocked or not. Defaults to FALSE.

Details

The function runs in 7 main steps:

  1. Check inputs (mock missing data, if needed).

  2. Read data.

  3. QC for sequencing metrics (merge with qc data in analysis config).

  4. Normalize counts and calculate logFC data.

  5. QC for counts and logFC data.

  6. QC for Bagel Bayes Factors.

  7. Wrap up and write out results (mask any mocked columns).
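
Steps 4 and 5 above can be sketched with toy data as follows. This is only an illustration under stated assumptions: the pseudocount of 1, the column names, and the two summary metrics are hypothetical choices, not the metrics the function itself reports.

```r
# Toy normalized counts for one comparison (made-up values).
norm_counts <- data.frame(
  sgrna_id = c("GENE1_sg1", "GENE1_sg2", "GENE2_sg1", "GENE2_sg2"),
  plasmid  = c(100, 200, 50, 0),
  control  = c(50, 200, 100, 3)
)

# log2 fold change with a pseudocount (assumed to be 1) to avoid log(0).
pseudocount <- 1
norm_counts$logFC <- log2((norm_counts$control + pseudocount) /
                          (norm_counts$plasmid + pseudocount))

# Two simple illustrative QC summaries for the comparison.
qc_summary <- data.frame(
  percent_zero_count_plasmid = 100 * mean(norm_counts$plasmid == 0),
  median_logFC               = median(norm_counts$logFC)
)
```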

Value

A list containing the following elements:

Author(s)

Alex T. Kalinka, alex.kalinka@cancer.org.uk


alex-kalinka-cruk/fgcQC documentation built on June 23, 2020, 9:05 p.m.