aaDistribution: Amino acid distribution of sequences

Description

This function calculates the amino acid distribution of sequences. The distribution is calculated separately for each sequence length and, within each length, for each position.

aaDistribution returns a list containing either the amino acid distribution alone, or the amino acid distribution together with the number of sequences analyzed per length.

plotAADistribution visualizes the amino acid distribution of sequences of the same length.

Usage

aaDistribution(sequences = NULL, numberSeq = FALSE)

plotAADistribution(aaDistribution.tab=NULL, plotSeqN=FALSE, 
     colors=NULL, PDF=NULL, ...)

Arguments

sequences

A vector containing amino acid sequences.

numberSeq

If TRUE, a table with the number of sequences per length is returned as well (default: FALSE).

aaDistribution.tab

Output list of function aaDistribution()

plotSeqN

If TRUE, the number of sequences for each length is plotted (see Details; default: FALSE).

colors

Colors used for the figure showing the number of sequences (default: rainbow).

PDF

PDF project name (see Details)

...

Details

The input vector is split into groups of sequences of the same length, and the amino acid distribution is then analyzed for each position within each group.
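
The grouping and per-position tabulation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code bcRep uses: the toy sequences, the helper name position_distribution, and the choice of relative frequencies as output are all hypothetical.

```r
# Illustrative sketch only; this is not the bcRep implementation.
# Toy input (hypothetical): non-empty amino acid sequences of mixed length.
seqs <- c("CAR", "CAK", "CARDY", "CTRDY", "CAS")

# Step 1: group the sequences by their length.
by_length <- split(seqs, nchar(seqs))

# Step 2: within one length group, tabulate residues at each position.
# position_distribution is a hypothetical helper, not a bcRep function.
position_distribution <- function(group) {
  # One row per sequence, one column per position.
  chars <- do.call(rbind, strsplit(group, ""))
  residues <- sort(unique(as.vector(chars)))
  freq <- matrix(0, nrow = length(residues), ncol = ncol(chars),
                 dimnames = list(residues,
                                 paste0("Position", seq_len(ncol(chars)))))
  for (pos in seq_len(ncol(chars))) {
    counts <- table(factor(chars[, pos], levels = residues))
    freq[, pos] <- as.vector(counts) / nrow(chars)
  }
  freq
}

# One matrix of relative frequencies per sequence length.
distr <- lapply(by_length, position_distribution)
distr[["3"]]
```

Each column of a resulting matrix sums to 1, because every sequence in a group contributes exactly one residue per position. The data frames returned by aaDistribution may be laid out differently.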

If numberSeq = TRUE, the number of sequences analyzed for each length is returned as well. This information is also required for plotAADistribution(..., plotSeqN = TRUE). Counts equal to 0 are not plotted; the smallest plotted count is 1.

The PDF argument should be a character string giving only the project name (without ".pdf"). If plotAADistr = TRUE, a figure named "PDF"_Amino-acid-distribution.pdf is saved to the working directory. If plotSeqN = TRUE, a figure named "PDF"_Number-of-sequences.pdf is saved as well.
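
As a small illustration of the naming pattern, the file names for a project called "test" can be built as follows. The variable names aa_file and seqn_file are hypothetical, and paste0 is used here only to show the pattern; how bcRep assembles the names internally is not stated in this documentation.

```r
# Hypothetical illustration of the file naming pattern described above.
PDF <- "test"  # project name, given without ".pdf"

aa_file   <- paste0(PDF, "_Amino-acid-distribution.pdf")
seqn_file <- paste0(PDF, "_Number-of-sequences.pdf")

print(aa_file)
print(seqn_file)
```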

Value

Output is a list containing

Amino_acid_distribution

A list of data frames with the amino acid distribution (including the stop codon symbol "*") for each sequence length.

Number_of_sequences_per_length

A data frame with the number of sequences analyzed for each length (optional; returned only if numberSeq = TRUE).

Note

For large datasets, computation time can be long.

Author(s)

Julia Bischof

See Also

trueDiversity

Examples

data(aaseqtab)
aadistr<-aaDistribution(sequences = aaseqtab$CDR3_IMGT, numberSeq = TRUE)
## Not run:
plotAADistribution(aaDistribution.tab=aadistr, plotAADistr=TRUE, plotSeqN=FALSE,
     PDF="test")
## End(Not run)

bcRep documentation built on May 2, 2019, 5:14 a.m.