createFullHaplotype: Anchor gene haplotype inference

View source: R/functions.R

createFullHaplotype    R Documentation

Anchor gene haplotype inference

Description

The createFullHaplotype function infers haplotypes based on an anchor gene.

Usage

createFullHaplotype(
  clip_db,
  toHap_col = c("v_call", "d_call"),
  hapBy_col = "j_call",
  hapBy = "IGHJ6",
  toHap_GERM = NULL,
  relative_freq_priors = TRUE,
  kThreshDel = 3,
  rmPseudo = TRUE,
  deleted_genes = c(),
  nonReliable_Vgenes = c(),
  min_minor_fraction = 0.3,
  single_gene = TRUE,
  chain = c("IGH", "IGK", "IGL", "TRB")
)

Arguments

clip_db

a data.frame in AIRR format. See details.

toHap_col

a vector of column names for which a haplotype should be inferred. Default is v_call and d_call.

hapBy_col

column name of the anchor gene. Default is j_call.

hapBy

a string of the anchor gene name. Default is IGHJ6.

toHap_GERM

a vector of named nucleotide germline sequences matching the allele calls in toHap_col columns in clip_db.

relative_freq_priors

if TRUE, the priors for Bayesian inference are estimated from the relative frequencies in clip_db. Otherwise, priors are set to c(0.5,0.5). Default is TRUE.

kThreshDel

the minimum lK (log10 of the Bayes factor) to call a deletion. Default is 3.

rmPseudo

if TRUE, non-functional genes and pseudogenes are removed. Default is TRUE.

deleted_genes

double chromosome deletion summary table. A data.frame created by deletionsByBinom.

nonReliable_Vgenes

a list of known non-reliable gene assignments. A list created by nonReliableVGenes.

min_minor_fraction

the minimum minor allele fraction required for a gene to be used as an anchor gene. Default is 0.3.

single_gene

whether to consider only genes from single assignments. If TRUE, calls in which a gene appears together with other genes are discarded. If FALSE, such calls are separated and counted for every gene that appears. Default is TRUE.

chain

the IG/TR chain: IGH, IGK, IGL or TRB. Default is IGH.
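
To make the single_gene and min_minor_fraction arguments more concrete, the base R sketch below applies one plausible reading of them to toy calls. The helper names (gene_of, split_calls, minor_fraction), the toy data, and the exact definition of the minor allele fraction are assumptions made for illustration; this is not the package's internal code.

```r
# Toy allele calls; a comma marks a multiple assignment (hypothetical data).
v_calls <- c("IGHV1-2*02", "IGHV1-2*02,IGHV1-3*01", "IGHV1-3*01", "IGHV1-2*04")

# Extract the gene name from an IMGT allele call: "IGHV1-2*02" -> "IGHV1-2".
gene_of <- function(allele) sub("\\*.*$", "", allele)

# single_gene = TRUE: drop calls whose alleles span more than one gene.
# single_gene = FALSE: split such calls and count every allele that appears.
# (Assumed behaviour, based only on the argument description.)
split_calls <- function(calls, single_gene = TRUE) {
  parts <- strsplit(calls, ",", fixed = TRUE)
  if (single_gene) {
    one_gene <- vapply(parts,
                       function(p) length(unique(gene_of(p))) == 1,
                       logical(1))
    parts <- parts[one_gene]
  }
  unlist(parts)
}

strict  <- split_calls(v_calls, single_gene = TRUE)   # 3 calls kept
relaxed <- split_calls(v_calls, single_gene = FALSE)  # 5 allele counts

# Minor allele fraction of a candidate anchor gene, assumed here to be
# minor / (major + minor) over its two most frequent alleles.
minor_fraction <- function(calls) {
  counts <- sort(table(calls), decreasing = TRUE)
  if (length(counts) < 2) return(0)
  as.numeric(counts[2] / sum(counts[1:2]))
}

j_calls <- c(rep("IGHJ6*02", 7), rep("IGHJ6*03", 3))
frac <- minor_fraction(j_calls)                  # 3 of 10 calls
usable_anchor <- frac >= 0.3 - 1e-9              # small tolerance for floating point
```

Under this reading, a homozygous or strongly skewed anchor gene falls below the threshold and would not be used for haplotyping.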

Details

The function accepts a data.frame in AIRR format (https://changeo.readthedocs.io/en/stable/standard.html) containing the following columns:

  • 'subject': The subject name

  • 'v_call': V allele call(s) (in an IMGT format)

  • 'd_call': D allele call(s) (in an IMGT format, only for heavy chains)

  • 'j_call': J allele call(s) (in an IMGT format)
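
As a quick check before calling the function, the columns listed above can be verified on the input. The sketch below uses only base R; the toy data.frame and the helper name missing_columns are hypothetical and not part of the package.

```r
# Hypothetical toy input with the four columns listed above.
clip_db <- data.frame(
  subject = "S1",
  v_call  = c("IGHV1-2*02", "IGHV3-23*01"),
  d_call  = c("IGHD3-10*01", "IGHD2-2*01"),
  j_call  = c("IGHJ6*02", "IGHJ6*03"),
  stringsAsFactors = FALSE
)

# Return the required columns that are absent from the data.frame.
missing_columns <- function(db,
                            required = c("subject", "v_call", "d_call", "j_call")) {
  setdiff(required, colnames(db))
}

missing <- missing_columns(clip_db)
if (length(missing) > 0) {
  stop("clip_db is missing columns: ", paste(missing, collapse = ", "))
}
```

For light chains, where d_call does not apply, the required vector passed to the helper would presumably omit it.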

Value

A data.frame, in which each row is the haplotype inference summary of a gene from the column selected in toHap_col.

The output contains the following columns:

  • subject: the subject name.

  • gene: the gene name.

  • Anchor gene allele 1: the haplotype inference for chromosome one. The column name is the anchor gene with the first allele.

  • Anchor gene allele 2: the haplotype inference for chromosome two. The column name is the anchor gene with the second allele.

  • alleles: allele calls for the gene.

  • priors_row: priors based on relative allele usage of the anchor gene.

  • priors_col: priors based on relative allele usage of the inferred gene.

  • counts1: the appearance count on each chromosome of the first allele from alleles; the counts are separated by a comma.

  • k1: the Bayes factor value for the first allele (from alleles) inference.

  • counts2: the appearance count on each chromosome of the second allele from alleles; the counts are separated by a comma.

  • k2: the Bayes factor value for the second allele (from alleles) inference.

  • counts3: the appearance count on each chromosome of the third allele from alleles; the counts are separated by a comma.

  • k3: the Bayes factor value for the third allele (from alleles) inference.

  • counts4: the appearance count on each chromosome of the fourth allele from alleles; the counts are separated by a comma.

  • k4: the Bayes factor value for the fourth allele (from alleles) inference.
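
Because the counts columns store both chromosome counts in one comma-separated string, they usually need to be split before any numeric work. The base R sketch below shows one way to do that, assuming each entry holds exactly two counts; the toy rows, their values, and the helper name parse_counts are invented for illustration.

```r
# Hypothetical rows shaped like the output described above (values invented).
haplo_toy <- data.frame(
  gene    = c("IGHV1-2", "IGHV3-23"),
  alleles = c("02,04", "01"),
  counts1 = c("25,1", "30,28"),
  k1      = c(6.2, 0.4),
  stringsAsFactors = FALSE
)

# Split "chr1,chr2" count strings into a two-column numeric matrix,
# one row per input string. Assumes exactly two counts per entry.
parse_counts <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  m <- t(vapply(parts, function(p) as.numeric(p), numeric(2)))
  colnames(m) <- c("chromosome1", "chromosome2")
  m
}

counts <- parse_counts(haplo_toy$counts1)
```

The same helper can be applied to counts2 through counts4 for rows where those alleles exist.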

Examples

# Load example data and germlines
data(samples_db, HVGERM, HDGERM)

# Select a single individual
clip_db <- samples_db[samples_db$subject == 'I5', ]

# Infer the haplotype, anchored on IGHJ6
haplo_db <- createFullHaplotype(clip_db,
                                toHap_col = c('v_call', 'd_call'),
                                hapBy_col = 'j_call',
                                hapBy = 'IGHJ6',
                                toHap_GERM = c(HVGERM, HDGERM))



rabhit documentation built on Feb. 16, 2023, 9:25 p.m.