analyze_dataset: Analyze all samples in a dataset

View source: R/analyze_dataset.R

Description

Load all samples for a given dataset and produce three things: a summary data frame with one row per sample, a list of processed input files, and a list of processed samples. The entries in the processed-samples list and the rows in the summary data frame are sorted by the order of loci in locus_attrs and by the sample attributes. Processed files are stored separately, because a single file may contain multiple samples, and are named by input file path. An error is thrown if any locus in the given dataset is not found in the locus attributes data frame.
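The locus check and the sorting described above can be sketched in base R. This is a minimal illustration, not chiimp's own code. The column names Locus, Sample, and Replicate, the hypothetical helpers check_loci() and order_rows(), and the choice of locus order as the primary sort key (then sample, then replicate) are all assumptions.

```r
# Hypothetical inputs; the column names are assumptions, not chiimp's documented layout
dataset <- data.frame(
  Filename  = c("run1/s2_B.fastq", "run1/s1_A.fastq", "run1/s1_B.fastq"),
  Sample    = c(2, 1, 1),
  Replicate = c(1, 1, 1),
  Locus     = c("B", "A", "B"),
  stringsAsFactors = FALSE
)
locus_attrs <- data.frame(Locus = c("B", "A"), stringsAsFactors = FALSE)

# Hypothetical helper: stop if any locus in the dataset is missing from locus_attrs
check_loci <- function(dataset, locus_attrs) {
  unknown <- setdiff(unique(dataset$Locus), locus_attrs$Locus)
  if (length(unknown) > 0) {
    stop("Locus entries not found in locus_attrs: ",
         paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical helper: sort by the row position of each locus in locus_attrs,
# breaking ties by sample and then replicate (assumed key priority)
order_rows <- function(dataset, locus_attrs) {
  locus_rank <- match(dataset$Locus, locus_attrs$Locus)
  dataset[order(locus_rank, dataset$Sample, dataset$Replicate), , drop = FALSE]
}

check_loci(dataset, locus_attrs)
sorted <- order_rows(dataset, locus_attrs)
print(sorted[, c("Sample", "Locus")])
```

Ranking loci with match() against locus_attrs, rather than sorting locus names alphabetically, is what keeps locus "B" ahead of "A" here.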

Usage

analyze_dataset(
  dataset,
  locus_attrs,
  analysis_function = analyze_sample,
  summary_function = summarize_sample,
  ncores = cfg("ncores"),
  known_alleles = NULL
)

Arguments

dataset

data frame of sample details as produced by prepare_dataset.

locus_attrs

data frame of locus attributes as produced by load_locus_attrs.

analysis_function

function to use when analyzing each sample's data frame into the filtered version. Defaults to analyze_sample.

summary_function

function to use when summarizing each sample's full details into the standard attributes. Defaults to summarize_sample.

ncores

integer number of CPU cores to use in parallel for sample analysis. Defaults to one less than half the number of detected cores, with a minimum of 1. If 1, the function runs without using the parallel package.
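The default core count and the serial fallback can be sketched as follows. The helper names default_ncores() and run_samples() are hypothetical. Rounding half the core count down with integer division is an assumption for odd core counts, and using a PSOCK cluster with parLapply() is one possible parallel backend, not necessarily the one chiimp uses.

```r
# Hypothetical helper: one less than half the detected cores, minimum 1.
# Integer division (rounding down) for odd core counts is an assumption.
default_ncores <- function(detected = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; treat that as a single core
  if (is.na(detected)) detected <- 1
  max(1L, as.integer(detected %/% 2) - 1L)
}

# Hypothetical dispatcher: plain lapply() when ncores is 1, so no cluster is
# created; otherwise a PSOCK cluster (an assumed backend) that is always stopped
run_samples <- function(items, fn, ncores = default_ncores()) {
  if (ncores == 1) {
    lapply(items, fn)
  } else {
    cl <- parallel::makeCluster(ncores)
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parallel::parLapply(cl, items, fn)
  }
}

default_ncores(8)   # 3
default_ncores(2)   # 1
run_samples(1:3, function(x) x * 2, ncores = 1)
```

Registering stopCluster() with on.exit() means worker processes are shut down even if one of the sample analyses throws an error.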

known_alleles

data frame of custom allele names as defined for load_allele_names. If NULL, only the names automatically generated for the dataset summary will be used.
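One way to read the known_alleles behavior is as a lookup that falls back to generated names. The sketch below assumes that custom names take precedence whenever a sequence matches, that known_alleles has Seq and Name columns, and that auto-generated names follow a placeholder "length-index" scheme. The helper name_alleles() is hypothetical, and none of these details are chiimp's documented format.

```r
# Hypothetical helper; the Seq/Name columns and the auto-naming scheme are assumptions
name_alleles <- function(seqs, known_alleles = NULL) {
  # Placeholder auto names: sequence length plus the order in which each
  # distinct sequence first appears
  auto <- paste0(nchar(seqs), "-", match(seqs, unique(seqs)))
  if (is.null(known_alleles)) return(auto)
  # Use a custom name where the sequence is known, else keep the generated one
  custom <- known_alleles$Name[match(seqs, known_alleles$Seq)]
  ifelse(is.na(custom), auto, custom)
}

known <- data.frame(Seq = "ACGTACGT", Name = "A1", stringsAsFactors = FALSE)
seqs <- c("ACGTACGT", "ACGTAC", "ACGTACGT")
name_alleles(seqs)          # "8-1" "6-2" "8-1"
name_alleles(seqs, known)   # "A1"  "6-2" "A1"
```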

Value

list of results, with summary set to the single summary data frame, files to the processed sequence files (named by input file path), and samples to the per-sample data frames.


ShawHahnLab/chiimp documentation built on Aug. 20, 2023, 1:41 a.m.