R/AbForests_SubRepertoiresByUniqueSeq.R

Defines functions AbForests_SubRepertoiresByUniqueSeq

Documented in AbForests_SubRepertoiresByUniqueSeq

#' Split single cell immune repertoire into sub-repertoires by isotype based on number of unique sequences

#' @description AbForests_SubRepertoiresByUniqueSeq separates the single cell immune repertoire into 5 sub-repertoires, taking into account only unique sequences. The goal is to determine the majority isotype of each network in the immune repertoire. Each sub-repertoire therefore collects the networks dominated by IGG, IGA, IGM or other isotypes, plus an IGA-IGG category for networks with an equal number of IGA and IGG isotypes. The remaining ties are resolved as follows: a tie between IGA and IGM classifies the network as IGA-majority, and a tie between IGG and IGM classifies it as IGG-majority.
#' @param list a list of data.frames. Each data.frame represents a clonal lineage and contains 2 columns: one with the antibody sequences and one with the type of information considered in the analysis (e.g. isotype). This list of data.frames is generated by the ConvertStructure function from the initial data input or from the output of CsvToDf, according to the user's preferences.
#' @param opt a string with options "isotype" and "cluster". Use "isotype" for an isotype analysis and "cluster" for an analysis based on the transcriptome.
#' @param distance_mat a custom integer distance matrix, or NULL to use the default distance matrix (calculated from the Levenshtein distance, which counts the number of mutations between sequences). Given a phylogenetic tree, a custom distance matrix can be produced with the PlyloToMatrix function.
#' @param tie_flag a string, one of 'rand', 'full', 'close_to_germ', 'far_from_germ', 'close_path_to_germ', 'far_path_from_germ', 'most_expanded' or 'least_expanded', that determines which edges are removed when the distance matrix contains ties (equal distances).
#' 'rand' prunes one of the tied nodes at random; 'full' keeps all nodes; 'close_to_germ' prunes the node(s) farthest from the germline (by number of intermediate nodes); 'far_from_germ' prunes the node(s) closest to the germline (by number of intermediate nodes);
#' 'close_path_to_germ' prunes the node(s) farthest from the germline (by edge path length); 'far_path_from_germ' prunes the node(s) closest to the germline (by edge path length); 'most_expanded' prunes the node(s) with the lowest B cell count (clonal frequency); and 'least_expanded' prunes the node(s) with the highest B cell count (clonal frequency). If a tie remains after these rules, a random node is selected.
#' @param weight a logical. If FALSE, the weights of the outgoing edges from the germline node are set to 1. If TRUE, each such weight is set to the number of mutations between the germline and the connected node (the corresponding value in the distance matrix) minus the absolute difference between their sequence lengths. In both cases, the weights of all remaining edges are taken from the distance matrix.
#' @param random.seed a random seed, specified by the user, used whenever sequences are sampled at random in any of the cases described under the tie_flag argument.
#' @param alg_opt a string denoting the version of the edge selection algorithm used in the construction of networks. "0" means the naive version and "1" the advanced one.
#' @param cdr3 0 to select full-length sequences, 1 to select CDR3 sequences only (both only when the input is a list of csv files), and NULL otherwise.
#' @return a nested list of 5 sub-lists of data.frames. Each sub-list holds the set of networks that share a majority isotype: list[[1]] or list$list_IGHG contains the networks (as data.frames) with more IGG isotypes, list[[2]] or list$list_IGHA those with more IGA isotypes, list[[3]] or list$list_IGHM those with more IGM isotypes, list[[4]] or list$list_IGAG those with a tie between IGA and IGG isotypes, and list[[5]] or list$list_other those with other isotypes apart from the aforementioned combinations.
#' @export
#' @seealso AntibodyForest, ConvertStructure, CsvToDf, PlyloToMatrix
#' @examples
#' \dontrun{
#' AbForests_SubRepertoiresByUniqueSeq(list,opt="isotype",distance_mat=NULL,
#' tie_flag='close_to_germ',weight=TRUE,random.seed=165,alg_opt="naive",cdr3=NULL)
#'}


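AbForests_SubRepertoiresByUniqueSeq<-function(list,opt,distance_mat,tie_flag,weight,random.seed,alg_opt,cdr3){

  # Convert the input into the internal list-of-data.frames structure;
  # cdr3 is only forwarded when it was supplied.
  if(length(cdr3)>0){
    list<-AbForests_ConvertStructure(list,opt,cdr3)
  }else{
    list<-AbForests_ConvertStructure(list,opt,NULL)
  }

  if (any(sapply(list, sapply,is.list))) {
    # Nested input (a list of lists of lineages): split every lineage into
    # majority-isotype sub-repertoires.
    list<-lapply(list,lapply, function(k) .SPLITCOMBINATION(k,opt,distance_mat,tie_flag,weight,random.seed,alg_opt))
    # utils::head() with its default n keeps at most the first 6 elements of each result.
    list<-lapply(list, lapply,function(x) utils::head(x))
    # Drop lineages that produced no result.
    list<-lapply(list, function(x) Filter(function(k) length(k)>0, x))
    # Combine the results position-wise with c(), once per nesting level.
    list <- do.call(Map, c(c,list))
    list<-do.call(Map,c(c,list))
    # Regroup each flattened sub-repertoire into consecutive chunks of n = 2 elements.
    # Note: the anonymous function is passed positionally to split()'s `drop`
    # argument, which is not evaluated when the grouping vector is not a factor,
    # so the chunks are returned unchanged rather than converted to data.frames.
    n <- 2
    list <- lapply(list, function(x) split(x, as.integer(gl(length(x), n, length(x))),
                                          function(z) lapply(z, function(y) data.frame(x=z$Seq,y=z$isotype))))
    # Drop empty sub-repertoires.
    list<-lapply(list, function(x) Filter(length, x))
  }else{
    # Flat input (a list of lineages): split every lineage into sub-repertoires.
    list<-lapply(list, function(k) .SPLITCOMBINATION(k,opt,distance_mat,tie_flag,weight,random.seed,alg_opt))
    # Regroup by sub-repertoire name: for each name, collect that element from every lineage.
    n <- unique(unlist(lapply(list, names)))
    names(n) <- n
    list<-lapply(n, function(ni) (lapply(list, `[[`, ni)))
    # Drop lineages that contribute nothing to a given sub-repertoire.
    list<-lapply(list, function(x) Filter(length, x))
  }
  return(list)
}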
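The majority-isotype rule in the @description can be illustrated with a small standalone function. This is a minimal sketch, not the package's internal .SPLITCOMBINATION logic. The helper classify_network() is hypothetical. It assumes that "unique sequences" means each distinct sequence/isotype pair is counted once, and that the "other" category covers networks containing no IGG, IGA or IGM sequence.

```r
# Hypothetical helper, not part of Platypus: assign one network to a
# sub-repertoire using the tie rules from the @description.
# Assumptions: each distinct (sequence, isotype) pair is counted once, and
# "other" means the network has no IGG, IGA or IGM sequence.
classify_network <- function(seq, isotype) {
  # Keep unique sequences only (cells sharing a sequence count once)
  u <- unique(data.frame(seq = seq, isotype = isotype, stringsAsFactors = FALSE))
  n_g <- sum(u$isotype == "IGG")
  n_a <- sum(u$isotype == "IGA")
  n_m <- sum(u$isotype == "IGM")
  top <- max(n_g, n_a, n_m)
  if (top == 0) return("other")
  if (n_a == top && n_g == top) return("IGA-IGG")  # IGA/IGG tie: own category
  if (n_a == top) return("IGA")                     # also wins an IGA/IGM tie
  if (n_g == top) return("IGG")                     # also wins an IGG/IGM tie
  "IGM"
}

# Two cells share the IGG sequence "AC", so IGG and IGA each count once.
classify_network(c("AC", "AC", "GT"), c("IGG", "IGG", "IGA"))  # returns "IGA-IGG"
```

Counting cells instead of unique sequences would give IGG a 2-to-1 majority in the last call.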

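The default distance matrix described for distance_mat is based on the Levenshtein distance. As an illustration only, base R's utils::adist() computes that distance for all pairs of sequences. The node names and sequences below are made up, and whether AbForests builds its default matrix exactly this way is an assumption.

```r
# Illustration: pairwise Levenshtein distances between a germline and three
# made-up sequences, stored as an integer matrix (the form distance_mat takes).
seqs <- c(germline = "CAGGTGCAG", s1 = "CAGGTGCAA", s2 = "CTGGTGCAA", s3 = "CAGGTG")
dist_mat <- utils::adist(seqs)            # generalized Levenshtein distance
storage.mode(dist_mat) <- "integer"
dimnames(dist_mat) <- list(names(seqs), names(seqs))
dist_mat["germline", ]                    # germline 0, s1 1, s2 2, s3 3
```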

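The weight = TRUE rule for edges leaving the germline can be written out directly. The sketch below is an illustration under assumptions. germline_edge_weight() is a hypothetical helper, and utils::adist() stands in for the value that the distance matrix would supply.

```r
# Hypothetical helper, not part of Platypus: weights of the edges from the
# germline node, following the description of the `weight` argument.
germline_edge_weight <- function(germline, seqs, weight = TRUE) {
  if (!weight) return(rep(1L, length(seqs)))       # weight = FALSE: all set to 1
  d <- as.vector(utils::adist(germline, seqs))     # stand-in for distance_mat values
  len_diff <- abs(nchar(germline) - nchar(seqs))   # |length(germline) - length(node)|
  as.integer(d - len_diff)
}

germline_edge_weight("CAGGTGCAG", c("CAGGTGCAA", "CAGGTG"))  # 1 0
```

For the shorter sequence, all 3 edits are accounted for by the length difference, so its weight drops to 0.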
Platypus documentation built on Aug. 15, 2022, 9:08 a.m.