getRandomSubsample: Generating a random subsample


getRandomSubsample    R Documentation

Generating a random subsample

Description

Generates a random subsample. The turbo_gliph function uses it to draw random subsamples of naive reference sequences, matching the sample in size, for repeated random sampling. If requested, the function tries to preserve the sample's distribution of CDR3 lengths and/or V-gene usage in the subsample.
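Without any stratification, the idea reduces to drawing as many reference motif regions as there are sample motif regions. The sketch below illustrates only that idea. random_subsample_unstratified is a hypothetical helper, not the turboGliph implementation, and whether getRandomSubsample samples with or without replacement is an assumption here (with replacement).

```r
# Hypothetical helper, NOT turboGliph's code: unstratified random subsample
# of reference motif regions with the same size as the sample.
random_subsample_unstratified <- function(refseqs_motif_region, motif_region) {
  # Assumption: sampling with replacement, so the function still works when
  # the reference set is smaller than the sample.
  idx <- sample.int(length(refseqs_motif_region), length(motif_region),
                    replace = TRUE)
  refseqs_motif_region[idx]
}

# Toy data (hypothetical motif regions)
set.seed(1)
ref <- c("ASSLG", "ASSPGQ", "ASRDT", "ASSQETQ", "ASGLAG", "ASSYS")
smp <- c("ASSLT", "ASRQG", "ASSPL")
sub <- random_subsample_unstratified(ref, smp)
print(sub)
```

Indexing through sample.int() rather than calling sample() on the vector directly avoids R's special case where a length-one numeric argument to sample() is treated as a range 1:x.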

Usage

getRandomSubsample(
  cdr3_len_stratify = FALSE,
  vgene_stratify = FALSE,
  refseqs_motif_region,
  motif_region,
  motif_lengths_list,
  ref_motif_lengths_id_list,
  motif_region_vgenes_list,
  ref_motif_vgenes_id_list,
  ref_lengths_vgenes_list,
  lengths_vgenes_list
)

Arguments

cdr3_len_stratify

logical. Defaults to FALSE. Whether the subsamples drawn during repeated random sampling should retain the distribution of CDR3 lengths found in the sample.

vgene_stratify

logical. Defaults to FALSE. Whether the subsamples drawn during repeated random sampling should retain the distribution of V-genes found in the sample.

refseqs_motif_region

character vector. Motif regions of the reference sequences from which the subsample is drawn.

motif_region

character vector. Motif regions of the sample sequences. Its length determines the size of the subsample.

motif_lengths_list

list. Required if cdr3_len_stratify = TRUE. Has one element per distinct CDR3 length in motif_region, named after that length and containing how often that CDR3 length occurs in motif_region.

ref_motif_lengths_id_list

list. Required if cdr3_len_stratify = TRUE. Has one element per distinct CDR3 length in motif_region, named after that length and containing the indices of the sequences in refseqs_motif_region that have that CDR3 length.

motif_region_vgenes_list

list. Required if vgene_stratify = TRUE. Has one element per distinct V-gene among the sequences in motif_region, named after that V-gene and containing how often it occurs among the sequences in motif_region.

ref_motif_vgenes_id_list

list. Required if vgene_stratify = TRUE. Has one element per distinct V-gene among the sequences in motif_region, named after that V-gene and containing the indices of the sequences in refseqs_motif_region that use it.

ref_lengths_vgenes_list

list. Required if cdr3_len_stratify = TRUE and vgene_stratify = TRUE. A nested list whose elements are themselves lists, named after the distinct CDR3 lengths in motif_region. The elements of each inner list are named after the distinct V-genes of the sequences in motif_region and contain how often the corresponding CDR3 length and V-gene occur together among the sequences in refseqs_motif_region.

lengths_vgenes_list

list. Required if cdr3_len_stratify = TRUE and vgene_stratify = TRUE. A nested list whose elements are themselves lists, named after the distinct CDR3 lengths in motif_region. The elements of each inner list are named after the distinct V-genes of the sequences in motif_region and contain how often the corresponding CDR3 length and V-gene occur together among the sequences in motif_region.
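All of the stratification arguments are plain named lists, so one way to picture them is to build them from toy data. The sketch below is illustrative only and may differ from how turbo_gliph constructs them. It assumes the per-sequence CDR3 lengths and V-gene calls are available as separate vectors (sample_cdr3_len, sample_vgenes, ref_cdr3_len and ref_vgenes are hypothetical names, and all values are made up). It also assumes that each inner list of the nested arguments only contains the V-genes that co-occur with that CDR3 length in the sample.

```r
# Hypothetical toy data: sample and reference motif regions, plus the CDR3
# length and V-gene of each sequence (assumed to be available separately).
motif_region    <- c("SLG", "SPGQ", "RDT", "SQE", "GLAG")
sample_cdr3_len <- c(12, 13, 12, 12, 13)
sample_vgenes   <- c("TRBV5-1", "TRBV7-2", "TRBV5-1", "TRBV7-2", "TRBV5-1")

refseqs_motif_region <- c("SLT", "RQG", "SPL", "QET", "GYS", "PGQ", "LGE")
ref_cdr3_len <- c(12, 12, 13, 12, 13, 13, 12)
ref_vgenes   <- c("TRBV5-1", "TRBV7-2", "TRBV5-1", "TRBV7-2",
                  "TRBV5-1", "TRBV7-2", "TRBV5-1")

# Helper: a character vector named by its own values, so lapply() keeps names.
self_named <- function(x) setNames(x, x)

# CDR3-length stratification: counts in the sample, indices in the reference.
motif_lengths_list <- lapply(split(seq_along(sample_cdr3_len), sample_cdr3_len),
                             length)
ref_motif_lengths_id_list <- lapply(
  self_named(names(motif_lengths_list)),
  function(len) which(ref_cdr3_len == as.numeric(len))
)

# V-gene stratification: counts in the sample, indices in the reference.
motif_region_vgenes_list <- lapply(split(sample_vgenes, sample_vgenes), length)
ref_motif_vgenes_id_list <- lapply(
  self_named(names(motif_region_vgenes_list)),
  function(v) which(ref_vgenes == v)
)

# Joint stratification: per CDR3 length, count each co-occurring V-gene.
lengths_vgenes_list <- lapply(
  split(sample_vgenes, sample_cdr3_len),
  function(v) lapply(split(v, v), length)
)
ref_lengths_vgenes_list <- lapply(
  self_named(names(lengths_vgenes_list)),
  function(len) {
    vg <- names(lengths_vgenes_list[[len]])
    lapply(self_named(vg), function(v)
      sum(ref_cdr3_len == as.numeric(len) & ref_vgenes == v))
  }
)

str(lengths_vgenes_list)
```

Naming each element by its CDR3 length or V-gene string lets the sampling step look up matching counts and indices by name, for example motif_lengths_list[["12"]] and ref_motif_lengths_id_list[["12"]].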

Value

getRandomSubsample returns a character vector containing a random subsample of refseqs_motif_region of the same size as motif_region.
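With CDR3-length stratification, the subsample can match the sample size exactly because the per-length counts in motif_lengths_list sum to length(motif_region). The sketch below shows that reasoning. stratified_subsample is a hypothetical function, not getRandomSubsample itself. Sampling with replacement, and stopping when the reference set has no sequence of a required length, are both assumptions.

```r
# Hypothetical sketch, NOT turboGliph's code: CDR3-length-stratified
# subsampling. For each CDR3 length in the sample, draw as many reference
# sequences of that length as the sample contains.
stratified_subsample <- function(refseqs_motif_region,
                                 motif_lengths_list,
                                 ref_motif_lengths_id_list) {
  ids <- unlist(lapply(names(motif_lengths_list), function(len) {
    pool <- ref_motif_lengths_id_list[[len]]
    n <- motif_lengths_list[[len]]
    if (length(pool) == 0) {
      stop("no reference sequence with CDR3 length ", len)
    }
    # Index through sample.int(): sample(pool, n) would misbehave when pool
    # holds a single index (it would then sample from 1:pool).
    # Assumption: sampling with replacement.
    pool[sample.int(length(pool), n, replace = TRUE)]
  }), use.names = FALSE)
  refseqs_motif_region[ids]
}

# Toy usage (hypothetical values): 3 sample sequences of length 12, 2 of 13.
set.seed(42)
ref <- c("SLT", "RQG", "SPL", "QET", "GYS", "PGQ", "LGE")
motif_lengths_list <- list(`12` = 3L, `13` = 2L)
ref_motif_lengths_id_list <- list(`12` = c(1L, 2L, 4L, 7L), `13` = c(3L, 5L, 6L))
sub <- stratified_subsample(ref, motif_lengths_list, ref_motif_lengths_id_list)
print(sub)
```

The result has 3 + 2 = 5 elements: the first three come from reference sequences of length 12 and the last two from length 13, mirroring the sample's CDR3-length distribution.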


HetzDra/turboGliph documentation built on Oct. 2, 2022, 2:22 a.m.