read-deprecated: Read data
In bailey-lab/miplicorn: A Framework for Molecular Inversion Probe and Amplicon Analysis

Description

read_file() has been replaced by read_tbl_reference(), read_tbl_alternate(), and read_tbl_coverage() to provide more specific functionality.

read() has been renamed to read_tbl_ref_alt_cov().
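
As a rough migration sketch, the deprecated calls map onto the replacements named above. It assumes the new functions accept the file paths positionally and filter expressions in ..., as read() and read_file() do; that is inferred from the renaming and not confirmed here, so check the help page for each replacement. The block is guarded so that it does nothing when miplicorn is not installed.

```r
# Placeholders so the objects exist even when the package is unavailable
cov_tbl <- NULL
all_tbl <- NULL

if (requireNamespace("miplicorn", quietly = TRUE)) {
  library(miplicorn)

  ref_file <- miplicorn_example("reference_AA_table.csv")
  alt_file <- miplicorn_example("alternate_AA_table.csv")
  cov_file <- miplicorn_example("coverage_AA_table.csv")

  # Before: read_file(cov_file, .name = "coverage")
  # Assumption: the file path is the first argument of the replacement
  cov_tbl <- read_tbl_coverage(cov_file)

  # Before: read(ref_file, alt_file, cov_file, gene == "atp6")
  # Assumption: argument order and ... filtering are unchanged by the rename
  all_tbl <- read_tbl_ref_alt_cov(ref_file, alt_file, cov_file, gene == "atp6")
}
```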

Usage

read(
  .ref_file,
  .alt_file,
  .cov_file,
  ...,
  chrom = deprecated(),
  gene = deprecated()
)

read_file(.file, ..., .name = "value")

Arguments

`.ref_file`	File path to the reference table.
`.alt_file`	File path to the alternate table.
`.cov_file`	File path to the coverage table.
`...`	<`data-masking`> Expressions that return a logical value and are used to filter the data. If multiple expressions are included, they are combined with the `&` operator. Only rows for which all conditions evaluate to `TRUE` are kept.
`chrom`	Deprecated: The chromosome(s) to filter to.
`gene`	Deprecated: The gene(s) to filter to.
`.file`	File path to a file.
`.name`	The information contained in the specific file. For example `"coverage"` or `"ref_umi_count"`.

Details

Read files containing MIPTools' data tables. read_file() reads a single file. read() is a convenience function that reads all files output by MIPTools and combines them. Data files include the reference table, the alternate table, and the coverage table. Data is read lazily using the vroom package.

Data can be filtered by passing conditions in ...; a row is retained only if every condition evaluates to TRUE for it. Note that a row is also dropped when a condition evaluates to NA.
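
The retention rule can be illustrated on a small table with dplyr::filter(), which this page names as the function doing the filtering. The toy tibble below is hypothetical: its column names mimic the examples on this page, but the values are made up.

```r
library(dplyr)

# Hypothetical toy table; values are invented for illustration
toy <- tibble(
  gene     = c("atp6", "atp6", "crt", NA),
  targeted = c("Yes", NA, "Yes", "Yes"),
  coverage = c(10, 25, 0, 7)
)

# Separate expressions are combined with &, so both must be TRUE
both <- filter(toy, gene == "atp6", targeted == "Yes")

# The same filter written as a single logical expression
same <- filter(toy, gene == "atp6" & targeted == "Yes")

# Rows where a condition is NA are dropped. To keep them, test for NA explicitly.
with_na <- filter(toy, gene == "atp6", targeted == "Yes" | is.na(targeted))

nrow(both)     # only the first row satisfies both conditions
nrow(with_na)  # the first two rows are kept
```

Row 2 is dropped from `both` because `targeted` is NA, and row 4 because `gene` is NA; neither comparison yields TRUE.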

Value

A tibble(). The first six columns contain the metadata associated with each sample and mutation. Columns ref_umi_count and alt_umi_count contain the UMI counts of the reference and alternate alleles, respectively. Column coverage contains the coverage for each data point.

Data structure

Input data must contain six rows of metadata. The metadata can vary depending on what type of file is read, but typically contains information about the location of a mutation. The remaining rows represent the data for each sample sequenced. Together, the alternate, reference, and coverage tables can provide information about mutations observed and the coverage at each site.
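
To make the layout concrete, here is a minimal base R sketch of a wide table with six metadata rows followed by one row per sample, reshaped to one row per sample and mutation. The metadata field names, gene IDs, and counts are illustrative assumptions and are not taken from a real MIPTools file. This is also not how the package reads files (it uses vroom); it only shows the shape of the data.

```r
# Hypothetical miniature of a wide table: first column holds row labels,
# each further column is one mutation. All names and values are made up.
raw <- rbind(
  c("Gene ID",       "ID_A",             "ID_B"),
  c("Gene",          "atp6",             "crt"),
  c("Mutation Name", "atp6-Ala623Glu",   "crt-Lys76Thr"),
  c("ExonicFunc",    "missense_variant", "missense_variant"),
  c("AA Change",     "Ala623Glu",        "Lys76Thr"),
  c("Targeted",      "Yes",              "Yes"),
  c("sample-1",      "12",               "0"),
  c("sample-2",      "30",               "8")
)

n_meta <- 6
meta_rows <- seq_len(n_meta)

# Metadata block: one row per field, one column per mutation
meta <- raw[meta_rows, -1, drop = FALSE]
rownames(meta) <- raw[meta_rows, 1]

# Data block: one row per sample, one column per mutation
counts  <- raw[-meta_rows, -1, drop = FALSE]
samples <- raw[-meta_rows, 1]
n_mut   <- ncol(counts)

# Long format: as.numeric() walks the matrix column by column, so the
# sample labels cycle fastest and each mutation's metadata is repeated
long <- data.frame(
  sample = rep(samples, times = n_mut),
  t(meta)[rep(seq_len(n_mut), each = length(samples)), , drop = FALSE],
  coverage = as.numeric(counts),
  check.names = FALSE,
  row.names = NULL
)

long
```

Each row of `long` pairs one sample with one mutation, carrying the six metadata fields alongside the value, which is the general shape described under Value.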

Useful filter functions

dplyr::filter() is used to subset the rows of the data: the expressions in ... are applied to the column values to determine which rows are retained.

There are many functions and operators that are useful when constructing the expressions used to filter the data:

==, >, >=, etc.
&, |, !, xor()
is.na()
between(), near()
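
A short sketch of how some of these helpers behave inside dplyr::filter(). The tibble is hypothetical; the `freq` column in particular is invented here to demonstrate near() and is not a column of the tables described on this page.

```r
library(dplyr)

# Hypothetical toy table; values are invented for illustration
toy <- tibble(
  gene     = c("atp6", "crt", "dhps", "mdr1"),
  coverage = c(5, 50, 500, NA),
  freq     = c(0.1 + 0.2, 0.3, 0.5, 0.3)
)

# between(x, left, right) is shorthand for x >= left & x <= right
mid_cov <- filter(toy, between(coverage, 10, 100))

# near() compares doubles with a tolerance, whereas == is exact:
# 0.1 + 0.2 is not exactly equal to 0.3 in floating point
approx <- filter(toy, near(freq, 0.3))
exact  <- filter(toy, freq == 0.3)

# is.na() selects the rows that other comparisons would silently drop
missing_cov <- filter(toy, is.na(coverage))

# xor() keeps rows where exactly one of the two conditions is TRUE
one_only <- filter(toy, xor(gene == "crt", coverage > 10))

mid_cov$gene      # "crt"
approx$gene       # "atp6" "crt" "mdr1"
exact$gene        # "crt" "mdr1"
missing_cov$gene  # "mdr1"
one_only$gene     # "dhps"
```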

Examples

# Get paths to the example files
ref_file <- miplicorn_example("reference_AA_table.csv")
alt_file <- miplicorn_example("alternate_AA_table.csv")
cov_file <- miplicorn_example("coverage_AA_table.csv")
cov_file

# Input sources -------------------------------------------------------------
# Read from a path
read_file(cov_file, .name = "coverage")
read(ref_file, alt_file, cov_file)

# You can also use paths directly
# read_file("reference_AA_table.csv")
# read("reference_AA_table.csv", "alternate_AA_table.csv", "coverage_AA_table.csv")

# Read entire file ----------------------------------------------------------
read_file(cov_file, .name = "coverage")
read(ref_file, alt_file, cov_file)

# Data filtering ------------------------------------------------------------
# Filtering by one criterion
read_file(cov_file, gene == "atp6", .name = "coverage")
read(ref_file, alt_file, cov_file, gene == "atp6")

# Filtering by multiple criteria within a single logical expression
read_file(cov_file, gene == "atp6" | targeted == "Yes", .name = "coverage")
read(ref_file, alt_file, cov_file, gene == "atp6" & targeted == "Yes")

# When multiple expressions are used, they are combined using &
read(ref_file, alt_file, cov_file, gene == "atp6", targeted == "Yes")

bailey-lab/miplicorn documentation built on March 19, 2023, 7:40 p.m.