R/Echidna_data.R

#' hum_b_h
#'
#' Human germline IgH (heavy chain v, d, j gene segments). When multiple alleles were present, the
#' first one was included. These names and sequences can be
#' customized by changing this data frame. Additionally, repeating elements
#' can give certain germline gene elements a larger probability of being used
#' during repertoire evolution.
#'
#' @format A list including 3 elements (data frames): v gene, d gene and j gene, respectively.
#' \describe{
#' [[1]]
#'   \item{gene}{The v gene name}
#'   \item{seq}{The corresponding sequence}
#' [[2]]
#'   \item{gene}{The d gene name}
#'   \item{seq}{The corresponding sequence}
#' [[3]]
#'   \item{gene}{The j gene name}
#'   \item{seq}{The corresponding sequence}
#' }
#' @source IMGT
#' @usage data("hum_b_h")
"hum_b_h"

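# A minimal sketch of how repeating rows can raise a segment's usage share.
# It uses a toy list with the same shape as the germline data sets (v, d, j
# data frames with `gene` and `seq` columns). The gene names and sequences
# are made up, and uniform sampling over rows is an assumption, not a
# statement about how the package draws segments.

```r
# Toy stand-in for a germline list (hypothetical names and sequences).
toy_germline <- list(
  data.frame(gene = c("V1", "V2"), seq = c("ACGT", "TTGA"),
             stringsAsFactors = FALSE),
  data.frame(gene = "D1", seq = "GGG", stringsAsFactors = FALSE),
  data.frame(gene = c("J1", "J2"), seq = c("CCA", "TGG"),
             stringsAsFactors = FALSE)
)

# Repeat the first v gene so it appears four times and V2 once.
v <- toy_germline[[1]]
toy_germline[[1]] <- v[c(1, 1, 1, 1, 2), ]
rownames(toy_germline[[1]]) <- NULL

# Assuming rows are drawn uniformly, the usage share is the row frequency.
v_share <- table(toy_germline[[1]]$gene) / nrow(toy_germline[[1]])
print(v_share)
```
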
#' hum_b_l
#'
#' Human germline Ig light chain (v, j gene segments). When multiple alleles were present, the
#' first one was included. These names and sequences can be
#' customized by changing this data frame. Additionally, repeating elements
#' can give certain germline gene elements a larger probability of being used
#' during repertoire evolution.
#'
#' @format A list including 2 elements (data frames): v gene and j gene, respectively.
#' \describe{
#' [[1]]
#'   \item{gene}{The v gene name}
#'   \item{seq}{The corresponding sequence}
#' [[2]]
#'   \item{gene}{The j gene name}
#'   \item{seq}{The corresponding sequence}
#' }
#' @source IMGT
#' @usage data("hum_b_l")
"hum_b_l"

#' hum_t_h
#'
#' Human germline TRB (heavy chain v, d, j gene segments). When multiple alleles were present, the
#' first one was included. These names and sequences can be
#' customized by changing this data frame. Additionally, repeating elements
#' can give certain germline gene elements a larger probability of being used
#' during repertoire evolution.
#'
#' @format A list including 3 elements (data frames): v gene, d gene and j gene, respectively.
#' \describe{
#' [[1]]
#'   \item{gene}{The v gene name}
#'   \item{seq}{The corresponding sequence}
#' [[2]]
#'   \item{gene}{The d gene name}
#'   \item{seq}{The corresponding sequence}
#' [[3]]
#'   \item{gene}{The j gene name}
#'   \item{seq}{The corresponding sequence}
#' }
#' @source IMGT
#' @usage data("hum_t_h")
"hum_t_h"

#' hum_t_l
#'
#' Human germline TRA (light chain v, j gene segments). When multiple alleles were present, the
#' first one was included. These names and sequences can be
#' customized by changing this data frame. Additionally, repeating elements
#' can give certain germline gene elements a larger probability of being used
#' during repertoire evolution.
#'
#' @format A list including 2 elements (data frames): v gene and j gene, respectively.
#' \describe{
#' [[1]]
#'   \item{gene}{The v gene name}
#'   \item{seq}{The corresponding sequence}
#' [[2]]
#'   \item{gene}{The j gene name}
#'   \item{seq}{The corresponding sequence}
#' }
#' @source IMGT
#' @usage data("hum_t_l")
"hum_t_l"

#' mus_b_h
#'
#' C57BL/6 germline IgH (heavy chain v, d, j gene segments). When multiple alleles were present, the
#' first one was included. These names and sequences can be
#' customized by changing this data frame. Additionally, repeating elements
#' can give certain germline gene elements a larger probability of being used
#' during repertoire evolution.
#'
#' @format A list including 3 elements (data frames): v gene, d gene and j gene, respectively.
#' \describe{
#' [[1]]
#'   \item{gene}{The v gene name}
#'   \item{seq}{The corresponding sequence}
#' [[2]]
#'   \item{gene}{The d gene name}
#'   \item{seq}{The corresponding sequence}
#' [[3]]
#'   \item{gene}{The j gene name}
#'   \item{seq}{The corresponding sequence}
#' }
#' @source IMGT
#' @usage data("mus_b_h")
"mus_b_h"

#' mus_b_l
#'
#' C57BL/6 germline Ig light chain (v, j gene segments). When multiple alleles were present, the
#' first one was included. These names and sequences can be
#' customized by changing this data frame. Additionally, repeating elements
#' can give certain germline gene elements a larger probability of being used
#' during repertoire evolution.
#'
#' @format A list including 2 elements (data frames): v gene and j gene, respectively.
#' \describe{
#' [[1]]
#'   \item{gene}{The v gene name}
#'   \item{seq}{The corresponding sequence}
#' [[2]]
#'   \item{gene}{The j gene name}
#'   \item{seq}{The corresponding sequence}
#' }
#' @source IMGT
#' @usage data("mus_b_l")
"mus_b_l"

#' mus_t_h
#'
#' C57BL/6 germline TRB (heavy chain v, d, j gene segments). When multiple alleles were present, the
#' first one was included. These names and sequences can be
#' customized by changing this data frame. Additionally, repeating elements
#' can give certain germline gene elements a larger probability of being used
#' during repertoire evolution.
#'
#' @format A list including 3 elements (data frames): v gene, d gene and j gene, respectively.
#' \describe{
#' [[1]]
#'   \item{gene}{The v gene name}
#'   \item{seq}{The corresponding sequence}
#' [[2]]
#'   \item{gene}{The d gene name}
#'   \item{seq}{The corresponding sequence}
#' [[3]]
#'   \item{gene}{The j gene name}
#'   \item{seq}{The corresponding sequence}
#' }
#' @source IMGT
#' @usage data("mus_t_h")
"mus_t_h"

#' mus_t_l
#'
#' C57BL/6 germline TRA (light chain v, j gene segments). When multiple alleles were present, the
#' first one was included. These names and sequences can be
#' customized by changing this data frame. Additionally, repeating elements
#' can give certain germline gene elements a larger probability of being used
#' during repertoire evolution.
#'
#' @format A list including 2 elements (data frames): v gene and j gene, respectively.
#' \describe{
#' [[1]]
#'   \item{gene}{The v gene name}
#'   \item{seq}{The corresponding sequence}
#' [[2]]
#'   \item{gene}{The j gene name}
#'   \item{seq}{The corresponding sequence}
#' }
#' @source IMGT
#' @usage data("mus_t_l")
"mus_t_l"

#' one_spot_df
#'
#' WRC hotspot mutations taken from Yaari et al., Frontiers in Immunology, 2013.
#' These include only the mutations following the WRC pattern
#' (where W equals A or T and R equals A or G). Custom mutation hotspots can be supplied
#' by modifying this data frame. Repeating particular hotspot entries allows
#' the hotspot to mutate more than one time per SHM event.
#'
#' @format A data frame with 32 rows and 6 variables:
#' \describe{
#'   \item{pattern}{Character array where each entry corresponds to a 5 base motif. The
#'   mutation probabilities correspond to the middle nucleotide in each 5mer.}
#'   \item{toA}{The probability for the middle nucleotide in "pattern" to mutate to an adenine}
#'   \item{toC}{The probability for the middle nucleotide in "pattern" to mutate to a cytosine}
#'   \item{toG}{The probability for the middle nucleotide in "pattern" to mutate to a guanine}
#'   \item{toT}{The probability for the middle nucleotide in "pattern" to mutate to a thymine}
#'   \item{Source}{How this motif was discovered: either Inferred or Experimental}
#' }
#' @source Yaari et al., Frontiers in Immunology, 2013
#' @usage data("one_spot_df")
"one_spot_df"

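# A minimal sketch of how one row of a hotspot table could drive a mutation
# of the middle nucleotide. The row below is a toy stand-in with the column
# names documented above; its probabilities are hypothetical, and the helper
# `mutate_middle` is not a package function.

```r
# Toy hotspot row: the 5mer "AACGT" starts with A (W), A (R), C, so the
# middle nucleotide is the C of a WRC motif. Probabilities are made up.
toy_row <- data.frame(pattern = "AACGT",
                      toA = 0, toC = 0, toG = 0, toT = 1,
                      stringsAsFactors = FALSE)

# Replace the middle base of a 5mer by sampling from the toA..toT columns.
mutate_middle <- function(kmer, row) {
  probs <- unlist(row[c("toA", "toC", "toG", "toT")])
  new_base <- sample(c("A", "C", "G", "T"), size = 1, prob = probs)
  substr(kmer, 3, 3) <- new_base
  kmer
}

mutated <- mutate_middle(toy_row$pattern, toy_row)
print(mutated)  # "AATGT", because toT is the only non-zero probability
```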

#' hotspot_df
#'
#' Hotspot mutations taken from Yaari et al., Frontiers in Immunology, 2013.
#' This contains transition probabilities for all 5mer combinations based
#' on high throughput sequencing data. The transition probabilities are for
#' the middle nucleotide in each 5mer. Custom mutation hotspots can be supplied
#' by modifying this data frame. Repeating particular hotspot entries allows
#' the hotspot to mutate more than one time per SHM event.
#'
#' @format A data frame with 1024 rows and 6 variables:
#' \describe{
#'   \item{pattern}{Character array where each entry corresponds to a 5 base motif. The
#'   mutation probabilities correspond to the middle nucleotide in each 5mer.}
#'   \item{toA}{The probability for the middle nucleotide in "pattern" to mutate to an adenine}
#'   \item{toC}{The probability for the middle nucleotide in "pattern" to mutate to a cytosine}
#'   \item{toG}{The probability for the middle nucleotide in "pattern" to mutate to a guanine}
#'   \item{toT}{The probability for the middle nucleotide in "pattern" to mutate to a thymine}
#'   \item{Source}{How this motif was discovered: either Inferred or Experimental}
#' }
#' @source Yaari et al., Frontiers in Immunology, 2013
#' @usage data("hotspot_df")
"hotspot_df"

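# The 1024 rows match the number of possible 5mers over A, C, G and T
# (4^5). The sketch below enumerates them in base R and extracts the middle
# nucleotide of each; it does not load the packaged data, and the variable
# names are illustrative only.

```r
bases <- c("A", "C", "G", "T")

# All combinations of five positions, one column per position.
grid <- expand.grid(rep(list(bases), 5), stringsAsFactors = FALSE)

# Paste the five columns together into 5mer strings.
all_5mers <- do.call(paste0, grid)

# The nucleotide that the transition probabilities refer to.
middle <- substr(all_5mers, 3, 3)

length(all_5mers)  # 4^5 = 1024
table(middle)      # each base is the middle nucleotide of 256 5mers
```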

#' mus_b_trans
#'
#' A data frame containing average mouse B cell gene expression for multiple cell types, with rows representing gene names and columns representing cell type names.
#' The original single cell sequencing data were retrieved from 10x Genomics and combined with experimental data.
#' The expression level for each cell type was obtained by calculating the average expression after sorting the original data by the markers shown below.
#' @format A data frame with 26538 rows and 4 variables, with rows representing gene names and columns representing cell type names.
#' \describe{
#' \item{NaiveBcell}{Cd19+;Cd27-;Cd38-}
#' \item{GerminalcenterBcell}{Fas+;Cd19+}
#' \item{Plasmacell}{Sdc1+}
#' \item{MemoryBcell}{Cd38+;Fas-}
#' }
#' @source https://support.10xgenomics.com/single-cell-vdj/datasets/3.0.0/vdj_v1_mm_c57bl6_pbmc_5gex; https://support.10xgenomics.com/single-cell-vdj/datasets/3.0.0/vdj_v1_mm_balbc_pbmc_5gex
#' @usage data("mus_b_trans")
"mus_b_trans"

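# A minimal sketch of marker-based averaging on a toy count matrix, using
# the NaiveBcell markers listed above (Cd19+;Cd27-;Cd38-). The counts are
# made up, "Actb" is an arbitrary extra gene, and treating "+" as a count
# above zero and "-" as a count of zero is an assumption; the thresholds
# used to build the packaged data are not stated in the source.

```r
# Toy count matrix: rows are genes, columns are cells (hypothetical values).
counts <- matrix(
  c(3, 0, 2, 0,   # Cd19
    0, 0, 0, 1,   # Cd27
    0, 1, 0, 0,   # Cd38
    5, 2, 7, 4),  # Actb
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Cd19", "Cd27", "Cd38", "Actb"),
                  paste0("cell", 1:4))
)

# Select cells matching the naive B cell marker profile.
is_naive <- counts["Cd19", ] > 0 &
  counts["Cd27", ] == 0 &
  counts["Cd38", ] == 0

# Average expression per gene over the selected cells.
naive_mean <- rowMeans(counts[, is_naive, drop = FALSE])
print(naive_mean)
```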

#' trans_switch_prob_b
#'
#' The probabilities of B cell transcriptome state switching. The row names of the matrix are the cell states the cell is switching from, and the column names are the cell states the cell is switching to.
#' @format A 4*4 matrix. The row and column names are: "GerminalcenterBcell","NaiveBcell","Plasmacell","MemoryBcell".
#' The probability for a cell to switch from "GerminalcenterBcell" to "Plasmacell" is the value at trans_switch_prob_b[1,3].
#' @usage data("trans_switch_prob_b")
"trans_switch_prob_b"

#' trans_switch_prob_t
#'
#' The probabilities of T cell transcriptome state switching. The row names of the matrix are the cell states the cell is switching from, and the column names are the cell states the cell is switching to.
#' @format A 7*7 matrix. The row and column names are: "NaiveCd4","ActivatedCd4","MemoryCd4","NaiveCd8","EffectorCd8","MemoryCd8","ExhaustedCd8".
#' The probability for a cell to switch from "NaiveCd4" to "ActivatedCd4" is the value at trans_switch_prob_t[1,2]. Since Cd4 expressing T cells normally don't turn into Cd8 expressing T cells, the default probabilities for Cd4 T cells switching to Cd8 T cells are set to 0.
#' @usage data("trans_switch_prob_t")
"trans_switch_prob_t"

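# A minimal sketch of reading a state switching matrix by row (from) and
# column (to). It builds a toy 4*4 matrix with the B cell state names
# documented above; the probabilities are hypothetical and `next_state` is
# an illustrative helper, not a package function.

```r
states <- c("GerminalcenterBcell", "NaiveBcell", "Plasmacell", "MemoryBcell")

# Start with every state staying as it is, then let germinal center cells
# switch to plasma cells with a made-up probability of 0.3.
toy_switch <- diag(4)
dimnames(toy_switch) <- list(states, states)
toy_switch["GerminalcenterBcell", ] <- c(0.7, 0, 0.3, 0)

# Positional and named indexing address the same cell.
toy_switch[1, 3]
toy_switch["GerminalcenterBcell", "Plasmacell"]

# Draw the next state using the row of the current state as weights.
next_state <- function(current, m) {
  sample(colnames(m), size = 1, prob = m[current, ])
}

next_state("NaiveBcell", toy_switch)  # "NaiveBcell": its row has no other option
```
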
#' class_switch_prob_mus
#'
#' The probability matrix of class switching for mouse B cells. The row names of the matrix are the isotypes the cell is switching from, and the column names are the isotypes the cell is switching to. All B cells start from IGHM, and switch to one of the other isotypes or remain the same.
#' @format A 9*9 matrix. The row and column names are "IGHM","IGHD","IGHG1","IGHG2A","IGHG2B","IGHG2C","IGHG3","IGHE","IGHA".
#' The probability for a cell to switch from "IGHM" to "IGHD" is the value at class_switch_prob_mus[1,2].
#' @usage data("class_switch_prob_mus")
"class_switch_prob_mus"

#' class_switch_prob_hum
#'
#' The probability matrix of class switching for human B cells. The row names of the matrix are the isotypes the cell is switching from, and the column names are the isotypes the cell is switching to. All B cells start from IGHM, and switch to one of the other isotypes or remain the same.
#' @format An 8*8 matrix. The row and column names are "IGHM","IGHD","IGHG1","IGHG2","IGHG3","IGHG4","IGHE","IGHA".
#' The probability for a cell to switch from "IGHM" to "IGHD" is the value at class_switch_prob_hum[1,2].
#' @usage data("class_switch_prob_hum")
"class_switch_prob_hum"

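# A minimal sketch of a sanity check for a custom class switching matrix.
# It assumes each row is meant to be read as a probability distribution
# over target isotypes; the source does not state whether rows must sum
# to 1. The values are hypothetical and `check_switch_matrix` is an
# illustrative helper, not a package function.

```r
isotypes <- c("IGHM", "IGHD", "IGHG1", "IGHG2", "IGHG3", "IGHG4", "IGHE", "IGHA")

# Toy matrix: every isotype stays as it is, except IGHM, which is given a
# made-up distribution over all eight isotypes.
toy_csr <- diag(8)
dimnames(toy_csr) <- list(isotypes, isotypes)
toy_csr["IGHM", ] <- c(0.3, 0.1, 0.2, 0.1, 0.1, 0.05, 0.05, 0.1)

# TRUE if the matrix is square, consistently named, non-negative, and
# every row sums to 1 within a small numerical tolerance.
check_switch_matrix <- function(m) {
  is.matrix(m) &&
    nrow(m) == ncol(m) &&
    identical(rownames(m), colnames(m)) &&
    all(m >= 0) &&
    all(abs(rowSums(m) - 1) < 1e-8)
}

check_switch_matrix(toy_csr)
```
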
#' special_v
#'
#' A data frame of heavy and light chain v gene combinations and their probability of being selected for expansion.
#' @usage data("special_v")
"special_v"

#' vdj_length_prob
#'
#' A data frame specifying the lengths and probabilities of bases deleted or inserted at each junction site of a VDJ recombination event.
#' @format A data frame:
#' \describe{
#' \item{v3_deletion}{length and probability of deleted bases at 3' end of V segment}
#' \item{d5_deletion}{length and probability of deleted bases at 5' end of D segment}
#' \item{d3_deletion}{length and probability of deleted bases at 3' end of D segment}
#' \item{j5_deletion}{length and probability of deleted bases at 5' end of J segment}
#' \item{dj_insertion}{length and probability of inserted bases between D-J segment}
#' \item{vj_insertion}{length and probability of inserted bases between V-J segment for light or alpha chains}
#' }
#' @usage data("vdj_length_prob")
"vdj_length_prob"

#' iso_SHM_prob
#'
#' A probability data frame specifying SHM.nuc.prob for cells of different isotypes. The first column contains the isotype names, and the second column contains the SHM.nuc.prob of cells of that isotype. Users can define a different SHM.nuc.prob for each isotype.
#' @format A data frame with 2 columns
#' @usage data("iso_SHM_prob")
"iso_SHM_prob"

#' pheno_SHM_prob
#'
#' A probability data frame specifying SHM.nuc.prob for cells of different phenotypes. The first column contains the phenotype names, and the second column contains the SHM.nuc.prob of cells of that phenotype. Users can define a different SHM.nuc.prob for each phenotype.
#' @format A data frame with 2 columns
#' @usage data("pheno_SHM_prob")
"pheno_SHM_prob"

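# A minimal sketch of looking up a per-cell SHM.nuc.prob from a two-column
# table like iso_SHM_prob or pheno_SHM_prob. The column names and values
# below are hypothetical, so the lookup uses column positions (first column
# holds names, second column holds probabilities) as described above.
# `lookup_prob` is an illustrative helper, not a package function.

```r
# Toy table: isotype names and made-up SHM.nuc.prob values.
toy_iso_prob <- data.frame(isotype = c("IGHM", "IGHG1"),
                           SHM.nuc.prob = c(0.0001, 0.0005),
                           stringsAsFactors = FALSE)

# Return the probability in column 2 for each key found in column 1.
lookup_prob <- function(key, tbl) {
  tbl[[2]][match(key, tbl[[1]])]
}

cell_isotypes <- c("IGHG1", "IGHM", "IGHG1")
cell_probs <- lookup_prob(cell_isotypes, toy_iso_prob)
print(cell_probs)
```
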
#' colors
#'
#' A character vector specifying the colors used in the igraph phylogenetic tree. Default colors: "#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#A6D854"
#' @format A character vector
#' @usage data("colors")
"colors"

NULL

Platypus documentation built on Aug. 15, 2022, 9:08 a.m.