generate_profiles: Generate Protein k-mer frequency profiles
In armenabnousi/naddaR: Prediction of Protein Conserved Regions Using NADDA Algorithm


Constructs a data frame in which each row corresponds to one index of one protein sequence from the input dataset. The result can be used to generate training and test sets for a NADDA classification model, or to predict the conserved indices of input sequences with a trained model.


generate_profiles(obj, klen = 6, parallel = TRUE, nproc = ifelse(parallel,
  pbdMPI::comm.size(), 1), normalize = TRUE, impute = TRUE, winlen = 20,
  imputing_length = winlen%/%2, distributed = FALSE)

`obj`	A file path to a FASTA file containing protein sequences, or an AAStringSet object containing the sequences
`klen`	An integer, length of the k-mers to be used
`parallel`	Indicating whether the operation should be performed in parallel
`nproc`	Currently not supported. Will use all processors available to the job on cluster
`normalize`	A boolean value, indicating whether the k-mer frequencies should be normalized
`impute`	A boolean value, indicating whether imputed values should be inserted at the beginning and the end of the profiles
`winlen`	An integer, size of the window used for generation of each instance
`imputing_length`	An integer, number of frequencies from the beginning and end of a sequence profile that should be used to impute the new values
`distributed`	A boolean, indicating whether the data is spread among multiple processors.

If parallel is set to TRUE and distributed is set to FALSE, the method distributes the data among the processors and sets distributed to TRUE. Otherwise, if parallel is set to FALSE and distributed is set to TRUE, the k-mer frequencies are computed on each processor separately and then communicated among the processors. At the end, all processors hold the same set of k-mer frequencies, which each one uses to generate frequency profiles for its own chunk of sequences. To run the operation serially, set both parallel and distributed to FALSE.
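The count-locally-then-share idea described above can be illustrated without MPI. The following is only a base-R simulation under stated assumptions: the two "processors" are plain list elements, `kmers_of` is a hypothetical helper, and the merge step stands in for whatever collective communication the package actually performs. It is not the package's implementation.

```r
# Toy dataset: the same three sequences used in the Examples.
seqs <- c(seq1 = "MLVVD", seq2 = "PVVRA", seq3 = "LVVR")
klen <- 3

# Hypothetical helper: all overlapping k-mers of one sequence.
kmers_of <- function(s, k) {
  starts <- seq_len(nchar(s) - k + 1)
  substring(s, starts, starts + k - 1)
}

# Assumption: two simulated processors, each holding a chunk of the data.
chunks <- list(rank0 = seqs[1:2], rank1 = seqs[3])

# Step 1: each simulated processor counts k-mers in its own chunk only.
local_counts <- lapply(chunks, function(ch) {
  table(unlist(lapply(ch, kmers_of, k = klen)))
})

# Step 2: merge the local tables so that every simulated processor
# would see the same dataset-wide totals.
all_kmers <- unlist(lapply(local_counts, function(tb) {
  rep(names(tb), times = as.integer(tb))
}))
global_counts <- table(all_kmers)

# Step 3: each simulated processor builds raw profiles for its own chunk
# by looking up the shared totals.
profiles <- lapply(chunks, function(ch) {
  lapply(ch, function(s) as.numeric(global_counts[kmers_of(s, klen)]))
})
profiles
```

The point of the sketch is that a k-mer such as LVV, which occurs once in each chunk, only receives its full count after the local tables are combined.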

Returns a list with one vector for each protein sequence in the dataset. If impute is set to FALSE, the vector for sequence s contains |s| - klen + 1 indices (where |s| is the length of the sequence). Otherwise it includes one index for each position in the sequence plus winlen %/% 2 indices at the beginning and at the end of each sequence.
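As a quick check of the length arithmetic, the sketch below computes the expected vector lengths for the three sequences used in the Examples (lengths 5, 5 and 4) with klen = 3 and winlen = 5. The variable names are illustrative only, and the formulas are a direct reading of the text above rather than code taken from the package.

```r
# Sequence lengths of the toy dataset from the Examples section.
seq_lengths <- c(seq1 = 5, seq2 = 5, seq3 = 4)
klen <- 3
winlen <- 5

# Without imputation: one value per k-mer start position.
len_no_impute <- seq_lengths - klen + 1

# With imputation: one value per sequence position, plus winlen %/% 2
# extra values at each end.
len_imputed <- seq_lengths + 2 * (winlen %/% 2)

len_no_impute
len_imputed
```

With imputation this gives 9, 9 and 8, which agrees with the lengths of the freqs vectors printed in the Examples output.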

Armen Abnousi

library(naddaR)
library(Biostrings)
library(data.table)
## Generate a set of three example protein sequences
seqs <- AAStringSet(c("seq1"="MLVVD",
                      "seq2"="PVVRA",
                      "seq3"="LVVR"))
## Count the kmers and generate a dataframe of the frequencies
profs <- generate_profiles(seqs, klen = 3, parallel = FALSE, winlen = 5, normalize = FALSE)
head(profs)
profs
##[[1]]
##[[1]]$freqs
##[1] 1.5 1.5 1.0 2.0 1.0 1.5 1.5 1.5 1.5
##[[1]]$seq
##[1] "seq1"
##
##[[2]]
##[[2]]$freqs
##[1] 1.5 1.5 1.0 2.0 1.0 1.5 1.5 1.5 1.5
##[[2]]$seq
##[1] "seq2"
##
##[[3]]
##[[3]]$freqs
##[1] 2 2 2 2 2 2 2 2
##[[3]]$seq
##[1] "seq3"
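
The non-constant middle of the printed freqs vectors (1, 2, 1 for seq1 and seq2) appears consistent with plain dataset-wide 3-mer counts. The base-R sketch below reproduces those raw counts without naddaR; `kmers_of` is a hypothetical helper, and the imputation of the leading and trailing values is deliberately left out because its exact rule is not spelled out here.

```r
# Toy dataset as plain character strings (no Biostrings needed here).
seqs <- c(seq1 = "MLVVD", seq2 = "PVVRA", seq3 = "LVVR")
klen <- 3

# Hypothetical helper: all overlapping k-mers of one sequence.
kmers_of <- function(s, k) {
  starts <- seq_len(nchar(s) - k + 1)
  substring(s, starts, starts + k - 1)
}

# Count every k-mer across the whole dataset.
kmer_lists <- lapply(seqs, kmers_of, k = klen)
counts <- table(unlist(kmer_lists))

# Raw (unnormalized, unimputed) profile: for each k-mer start position,
# the dataset-wide count of the k-mer that starts there.
raw_profiles <- lapply(kmer_lists, function(km) as.numeric(counts[km]))
raw_profiles
```

LVV and VVR each occur in two sequences, so they receive a count of 2, while every other 3-mer occurs once.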

armenabnousi/naddaR documentation built on May 24, 2019, 8:47 p.m.
