View source: R/batchConvertAnalysis.R
batchConvertAnalysis | R Documentation |
This is the main function of the BrepConvert package for users to annotate gene conversion events in BCR repertoire data, given DNA sequence sets of functional and pseudogene V gene alleles.
batchConvertAnalysis( functional, pseudogene, repertoire, blat_exec, convertAA = TRUE )
functional |
character, filepath to FASTA file containing DNA sequence(s) of the functional V gene allele(s). |
pseudogene |
character, filepath to FASTA file containing DNA sequence(s) of the pseudogene V gene allele(s). |
repertoire |
a named character vector of IMGT-gapped DNA sequences from the BCR repertoire. The names are taken as the identifiers of the sequences. See examples below for suggestions on how to generate this from AIRR format repertoire data. |
blat_exec |
character, filepath to the executable of the BLAT program. |
convertAA |
logical, whether to also return amino acid sequences of the original segment (from the germline functional allele) and of the observed, converted segment. (default: TRUE) |
A data.frame with each row corresponding to one gene conversion event. The following annotations are stored in separate columns:
integer, 1, 2, ... up to n, where n is the number of events observed on a sequence. An identifier of the gene conversion event.
character, a, b, ... and so on. Denotes the different possible donor pseudogenes which could account for the observed conversion event.
integer, position on the repertoire sequence which denotes the start of the conversion event.
integer, position on the repertoire sequence which denotes the end of the conversion event.
character, a semicolon-delimited list of possible donor pseudogenes which could account for the conversion event. NA if the gene conversion event could not be matched to any pseudogenes.
integer, the number of nucleotides at the 5' end of the named conversion event which are identical between the observed sequence and the named genes. NA if the gene conversion event could not be matched to any pseudogenes.
integer, the number of nucleotides at the 3' end of the named conversion event which are identical between the observed sequence and the named genes. NA if the gene conversion event could not be matched to any pseudogenes.
integer, Levenshtein distance between the sequence stretch observed on the repertoire sequence and the aligned sequence stretch originating from the donor pseudogene(s). NA if the gene conversion event could not be matched to any pseudogenes.
integer, the position on the observed sequence where a DNA motif targeted by the AID enzyme can be found closest (at 5') to the gene conversion event.
character, the DNA motif targeted by the AID enzyme which is closest (at 5') to the gene conversion event, at the position given by nearest_AID_motif.
integer, the number of nucleotides between the named gene conversion event and the given AID_motif.
character, identifier for the repertoire sequence, taken from the names attribute of the input parameter repertoire.
character, nucleotide sequence stretch corresponding to the gene conversion event.
character, sequence stretch 10 nucleotides 5' of the start site of the gene conversion event. NA if the gene conversion event begins at position 1.
character, sequence stretch 10 nucleotides 3' of the end site of the gene conversion event. NA if the gene conversion event stops at the last position of the V gene.
(available if convertAA == TRUE) Amino acid sequence of the region between start and end, from the functional germline allele.
(available if convertAA == TRUE) Amino acid sequence of the region between start and end, from the observed sequence sampled in the repertoire data.
(available if convertAA == TRUE) Amino acid sequence of the region between start and end plus the flanking stretches given by fiveprime_identical_length and threeprime_identical_length, from the functional germline allele.
(available if convertAA == TRUE) Amino acid sequence of the region between start and end plus the flanking stretches given by fiveprime_identical_length and threeprime_identical_length, from the observed sequence sampled in the repertoire data.
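Two of the annotations described above, the semicolon-delimited donor list and the Levenshtein distance, can be illustrated with base R alone. The following is a minimal sketch on toy values: the pseudogene names and sequences are made up for illustration, and it does not claim to reproduce how BrepConvert computes these columns internally.

```r
# Toy value of a semicolon-delimited donor annotation for one event
# (the pseudogene names here are hypothetical).
donors <- "pseudoA;pseudoB"

# Split the annotation into one element per candidate donor pseudogene.
donor_list <- strsplit(donors, ";", fixed = TRUE)[[1]]

# Toy observed and donor sequence stretches (hypothetical sequences).
observed_stretch <- "ACGTTGCA"
donor_stretch    <- "ACGATGCA"

# utils::adist() computes the generalized Levenshtein (edit) distance;
# with default costs this is the plain Levenshtein distance.
edit_distance <- as.integer(utils::adist(observed_stretch, donor_stretch))
```

Splitting the donor list this way may be useful when counting how often each pseudogene appears as a candidate donor across events.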
## Not run:
# FASTA files containing pseudogene and functional alleles of the Chicken IGLV
# locus are shipped with the package.
functional_IGLV <- system.file("extdata/IMGT_Chicken_IGLV_F.fasta", package = "BrepConvert")
pseudogene_IGLV <- system.file("extdata/IMGT_Chicken_IGLV_P.fasta", package = "BrepConvert")

# An executable of the BLAT program is also shipped with the package.
blat <- system.file("exe/blat", package = "BrepConvert")
# NOTE: This works for Linux OS only. For other OS, please download/compile
# the executable and indicate the filepath like so:
# blat <- "/home/abc/Documents/blat/blat"

annotation <- batchConvertAnalysis(
  functional = functional_IGLV,
  pseudogene = pseudogene_IGLV,
  repertoire = repertoire, # notice here the vector is passed, NOT the entire data table!
  blat_exec = blat
)
## End(Not run)
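The repertoire argument expects a named character vector rather than a data table. One possible way to build it from AIRR format data is sketched below. It assumes that the table uses the standard AIRR column names sequence_id and sequence_alignment, and that sequence_alignment holds IMGT-gapped sequences (this depends on how the data were annotated). The toy data frame and its sequences are placeholders; in practice the table would be read from a file, for example with read.delim().

```r
# Toy stand-in for an AIRR-format rearrangement table (hypothetical values).
airr <- data.frame(
  sequence_id = c("seq1", "seq2"),
  sequence_alignment = c("GCGCTGACTCAG...CCGTCC", "GCGCTGACTCAG...CCCTCC"),
  stringsAsFactors = FALSE
)

# Named character vector: values are the gapped sequences,
# names are the sequence identifiers.
repertoire <- stats::setNames(airr$sequence_alignment, airr$sequence_id)
```

The resulting vector can then be passed as the repertoire argument of batchConvertAnalysis.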