R/seqMatrix.R

#' Sequence matrix
#' 
#' Creates a data frame with unique, productive amino acid sequences as rows and 
#' sample names as column headers.  Each value in the data frame represents the 
#' frequency at which the sequence appears in the sample.
#' 
#' @param productive.aa A list of data frames of productive amino acid sequences 
#' generated by the LymphoSeq function productiveSeq where the aggregate parameter 
#' was set to "aminoAcid". 
#' @param sequences A character vector of amino acid sequences of interest.  It 
#' is useful to take the output of the LymphoSeq functions uniqueSeqs or 
#' topSeqs and subset the "aminoAcid" column.  See examples below.
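#' @return Returns a data frame with unique, productive amino acid sequences as 
#' rows and the \% frequency at which each sequence appears in each sample as 
#' columns.  A numberSamples column gives the number of samples in which each 
#' sequence was found, and rows are sorted by that count in decreasing order.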
#' @seealso \code{\link{topSeqs}} and \code{\link{uniqueSeqs}}
#' @examples
#' file.path <- system.file("extdata", "TCRB_sequencing", package = "LymphoSeq")
#' 
#' file.list <- readImmunoSeq(path = file.path)
#' 
#' productive.aa <- productiveSeq(file.list = file.list, aggregate = "aminoAcid")
#' 
#' top.seqs <- topSeqs(productive.seqs = productive.aa, top = 0.1)
#' 
#' sequence.matrix <- seqMatrix(productive.aa = productive.aa, 
#'    sequences = top.seqs$aminoAcid)
#' 
#' unique.seqs <- uniqueSeqs(productive.aa = productive.aa)
#' 
#' sequence.matrix <- seqMatrix(productive.aa = productive.aa, 
#'    sequences = unique.seqs$aminoAcid)
#' 
#' # It can be helpful to combine top.freq and sequence.matrix
#' top.freq <- topFreq(productive.aa = productive.aa, percent = 0)
#' 
#' sequence.matrix <- seqMatrix(productive.aa = productive.aa, 
#'    sequences = top.freq$aminoAcid)
#' 
#' top.freq.matrix <- merge(top.freq, sequence.matrix)
#' @export
#' @importFrom plyr llply ldply
seqMatrix <- function(productive.aa, sequences) {
    # Reject input with empty sequences, stop codons ("*"), or duplicated 
    # amino acid sequences within any sample.
    if(any(unlist(lapply(productive.aa, function(x) 
        x[, "aminoAcid"] == "" |
        grepl("\\*", x[, "aminoAcid"]) | 
        duplicated(x[, "aminoAcid"]))))){
        stop("Your list contains unproductive sequences or has not been aggregated for productive amino acid sequences.  Remove unproductive sequences first using the function productiveSeq with the aggregate parameter set to 'aminoAcid'.", call. = FALSE)
    }
    # Look up the frequency of each sequence of interest in every sample; 
    # sequences absent from a sample yield NA.
    sequence.matrix <- plyr::ldply(productive.aa, function(x) 
        x[match(sequences, x$aminoAcid), "frequencyCount"])
    rownames(sequence.matrix) <- sequence.matrix$.id
    sequence.matrix$.id <- NULL
    colnames(sequence.matrix) <- sequences
    # Transpose so that sequences are rows and samples are columns.
    sequence.matrix <- as.data.frame(t(sequence.matrix))
    sequence.matrix[is.na(sequence.matrix)] <- 0
    # Count the samples in which each sequence appears, then sort by that count.
    sequence.matrix$numberSamples <- apply(sequence.matrix, 1, function(x) 
        length(which(x > 0)))
    sequence.matrix <- sequence.matrix[order(sequence.matrix$numberSamples, 
                                             decreasing = TRUE), ]
    sequence.matrix$aminoAcid <- rownames(sequence.matrix)
    rownames(sequence.matrix) <- NULL
    # Move the identifier columns to the front, followed by the sample columns.
    sequence.matrix <- sequence.matrix[c("aminoAcid", "numberSamples", 
        setdiff(names(sequence.matrix), c("aminoAcid", "numberSamples")))]
    return(sequence.matrix)
}
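
The lookup-and-reshape logic of seqMatrix can be tried without the bundled example data. The following is a minimal base R sketch, not the package's implementation: the sample names, sequences, and frequencies are made up for illustration, `toy.matrix` and `freq` are hypothetical names, and `vapply` is swapped in for `plyr::ldply`. It assumes more than one sequence of interest, so that `vapply` returns a matrix.

```r
# Toy input: two hypothetical samples, already aggregated by amino acid.
# The column names mirror those used by seqMatrix; the values are invented.
productive.aa <- list(
    sampleA = data.frame(aminoAcid = c("CASSLGF", "CASSPQF", "CASSQETQYF"),
                         frequencyCount = c(50, 30, 20),
                         stringsAsFactors = FALSE),
    sampleB = data.frame(aminoAcid = c("CASSLGF", "CASSYEQYF"),
                         frequencyCount = c(60, 40),
                         stringsAsFactors = FALSE)
)
sequences <- c("CASSLGF", "CASSPQF", "CASSYEQYF")

# For every sample, look up the frequency of each sequence of interest.
# match() returns NA for a sequence that is absent from the sample.
# The result is a matrix with one row per sequence and one column per sample.
freq <- vapply(productive.aa, function(x)
    x$frequencyCount[match(sequences, x$aminoAcid)],
    FUN.VALUE = numeric(length(sequences)))

# Treat absent sequences as frequency 0.
freq[is.na(freq)] <- 0

# Assemble the result: identifier columns first, then one column per sample.
toy.matrix <- data.frame(aminoAcid = sequences,
                         numberSamples = rowSums(freq > 0),
                         freq,
                         stringsAsFactors = FALSE)

# Sequences found in the most samples come first.
toy.matrix <- toy.matrix[order(toy.matrix$numberSamples, decreasing = TRUE), ]
rownames(toy.matrix) <- NULL

print(toy.matrix)
```

In this toy data only "CASSLGF" occurs in both samples, so it sorts to the top with numberSamples equal to 2, while the other two sequences each have a 0 in the sample where they are missing.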

LymphoSeq documentation built on Nov. 8, 2020, 8:09 p.m.