batchConvertAnalysis: Annotating gene conversion events in BCR repertoire data.

View source: R/batchConvertAnalysis.R

batchConvertAnalysis    R Documentation

Annotating gene conversion events in BCR repertoire data.

Description

This is the main function of the BrepConvert package. It annotates gene conversion events in BCR repertoire data, given DNA sequence sets of functional and pseudogene V gene alleles.

Usage

batchConvertAnalysis(
  functional,
  pseudogene,
  repertoire,
  blat_exec,
  convertAA = TRUE
)

Arguments

functional

character, filepath to FASTA file containing DNA sequence(s) of the functional V gene allele(s).

pseudogene

character, filepath to FASTA file containing DNA sequence(s) of the pseudogene V gene allele(s).

repertoire

a named character vector of IMGT-gapped DNA sequences from the BCR repertoire. The names are taken as the identifiers of the sequences. See examples below for suggestions on how to generate this from AIRR format repertoire data.

blat_exec

character, filepath to the executable of the BLAT program.

convertAA

logical, whether to also return amino acid sequences of the original germline functional allele and of the observed, converted segment. (default: TRUE)
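Because the arguments mix file paths with an in-memory vector, it can help to check them before starting a long run. The following is a minimal sketch only: check_inputs is a hypothetical helper and not part of BrepConvert, and the requirement that sequence names be unique is an assumption made here, not something the package documents.

```r
# Hypothetical helper (NOT part of BrepConvert): collect basic problems
# with the arguments before calling batchConvertAnalysis().
check_inputs <- function(functional, pseudogene, repertoire, blat_exec) {
  problems <- character(0)
  # All three path arguments should point at existing files.
  for (f in c(functional, pseudogene, blat_exec)) {
    if (!file.exists(f)) {
      problems <- c(problems, paste("file not found:", f))
    }
  }
  # repertoire should be a character vector carrying names.
  if (!is.character(repertoire) || is.null(names(repertoire))) {
    problems <- c(problems, "repertoire must be a named character vector")
  } else if (anyDuplicated(names(repertoire)) > 0) {
    # Assumption: identifiers are expected to be unique.
    problems <- c(problems, "repertoire names are not unique")
  }
  problems
}

# Toy usage with temporary stand-in files (no real FASTA or BLAT needed).
tmp <- tempfile(c("functional", "pseudogene", "blat"))
invisible(file.create(tmp))
ok  <- check_inputs(tmp[1], tmp[2], c(seq1 = "CAGGCT"), tmp[3])
bad <- check_inputs(tmp[1], tmp[2], c("CAGGCT"), tmp[3])
```

An empty result means none of these checks failed; it does not guarantee that the FASTA files or the BLAT executable are valid.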

Value

A data.frame with each row corresponding to one gene conversion event. The following annotations are stored in separate columns:

event

integer, 1, 2, ..., n, where n is the number of events observed on a sequence. Serves only as an identifier of the gene conversion event.

possibility

character, a, b, ... and so on. Denotes the different possible donor pseudogenes that could account for the observed conversion event.

start

integer, position on the repertoire sequence which denotes the start of the conversion event.

end

integer, position on the repertoire sequence which denotes the end of the conversion event.

gene

character, a semicolon-delimited list of possible donor pseudogenes which could account for the conversion event. NA if the gene conversion event could not be matched to any pseudogenes.

fiveprime_identical_length

integer, the number of nucleotides at the 5' end of the named conversion event that are identical between the observed sequence and the named genes. NA if the gene conversion event could not be matched to any pseudogenes.

threeprime_identical_length

integer, the number of nucleotides at the 3' end of the named conversion event that are identical between the observed sequence and the named genes. NA if the gene conversion event could not be matched to any pseudogenes.

edit_distance

integer, Levenshtein distance between the sequence stretch observed on the repertoire sequence and the aligned sequence stretch originating from the donor pseudogene(s). NA if the gene conversion event could not be matched to any pseudogenes.

nearest_AID_motif

integer, the position on the observed sequence of the DNA motif targeted by the AID enzyme that lies closest to (and 5' of) the gene conversion event.

AID_motif

character, the DNA motif targeted by the AID enzyme which is closest (at 5') to the gene conversion event, at the position given by nearest_AID_motif.

distance_to_AID_motif

integer, the number of nucleotides between the named gene conversion event and the given AID_motif.

SeqID

character, identifier for the repertoire sequence, taken from the names attribute of the input parameter repertoire.

seq_event

character, nucleotide sequence stretch corresponding to the gene conversion event.

seq_5prime

character, sequence stretch 10 nucleotides 5' of the start site of the gene conversion event. NA if the gene conversion event begins at position 1.

seq_3prime

character, sequence stretch 10 nucleotides 3' of the end site of the gene conversion event. NA if the gene conversion event stops at the last position of the V gene.

germline_AA_narrow

(available if convertAA == TRUE) Amino acid sequence of the region between start and end, from the functional germline allele.

observed_AA_narrow

(available if convertAA == TRUE) Amino acid sequence of the region between start and end, from the observed sequence sampled in the repertoire data.

germline_AA_broad

(available if convertAA == TRUE) Amino acid sequence of the region between start and end plus the flanking stretches given by fiveprime_identical_length and threeprime_identical_length, from the functional germline allele.

observed_AA_broad

(available if convertAA == TRUE) Amino acid sequence of the region between start and end plus the flanking stretches given by fiveprime_identical_length and threeprime_identical_length, from the observed sequence sampled in the repertoire data.
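Since gene is a semicolon-delimited string and is NA for unmatched events, downstream summaries usually need to drop the NA rows and split the string first. A minimal sketch with base R follows; the toy data.frame and the pseudogene names in it are invented for illustration and only mimic a few of the columns described above.

```r
# Toy stand-in for the returned data.frame (values are made up).
annotation <- data.frame(
  SeqID         = c("seq1", "seq1", "seq2"),
  event         = c(1L, 2L, 1L),
  possibility   = c("a", "a", "a"),
  gene          = c("pseudoA;pseudoB", NA, "pseudoC"),
  edit_distance = c(0L, NA, 1L),
  stringsAsFactors = FALSE
)

# Keep only events that could be matched to at least one pseudogene.
matched <- annotation[!is.na(annotation$gene), ]

# Split the semicolon-delimited donor list into one character vector
# per row, labelled by sequence, event and possibility.
donors <- strsplit(matched$gene, ";", fixed = TRUE)
names(donors) <- paste(matched$SeqID, matched$event,
                       matched$possibility, sep = "_")

# Count how often each candidate donor pseudogene is named.
donor_counts <- table(unlist(donors))
```

Counting this way treats every listed pseudogene as an equally likely donor, which may overstate the contribution of ambiguous events.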

Examples

## Not run: 
# FASTA files containing pseudogene and functional alleles of the Chicken IGLV
# locus are shipped with the package.
functional_IGLV <- system.file("extdata/IMGT_Chicken_IGLV_F.fasta",
                               package = "BrepConvert")
pseudogene_IGLV <- system.file("extdata/IMGT_Chicken_IGLV_P.fasta",
                               package = "BrepConvert")

# An executable of the BLAT program is also shipped with the package.
blat <- system.file("exe/blat", package = "BrepConvert")
# NOTE: This works for Linux OS only. For other OS, please download/compile
# the executable and indicate the filepath like so:
# blat <- "/home/abc/Documents/blat/blat"

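# `repertoire` must already exist as a named character vector of
# IMGT-gapped sequences (see the `repertoire` argument above).
annotation <- batchConvertAnalysis(
  functional = functional_IGLV,
  pseudogene = pseudogene_IGLV,
  repertoire = repertoire, # notice here the vector is passed, NOT the entire data table!
  blat_exec = blat
)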


## End(Not run)
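The example above uses a repertoire object without showing how to build it. One possible way to derive it from AIRR format data is sketched below. It assumes the table has the standard AIRR columns sequence_id and sequence_alignment, and that sequence_alignment holds IMGT-gapped sequences, which depends on the annotation tool used. The toy sequences and the file name in the comment are placeholders.

```r
# In practice the table would be read from an AIRR TSV file, e.g.
# airr <- read.delim("my_repertoire.tsv", stringsAsFactors = FALSE)
# ("my_repertoire.tsv" is a hypothetical file name).
# A toy table is used here so that the sketch is self-contained.
airr <- data.frame(
  sequence_id        = c("seq1", "seq2"),
  sequence_alignment = c("CAGGCT...GCCTCC", "CAGGCA...GCCTCC"),
  stringsAsFactors   = FALSE
)

# Build the named character vector: values are the gapped sequences,
# names are the sequence identifiers.
repertoire <- setNames(airr$sequence_alignment, airr$sequence_id)
```

The resulting vector, rather than the whole table, is what the repertoire argument expects.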


Fraternalilab/BrepConvert documentation built on Oct. 14, 2022, 5:54 p.m.