View source: R/RepDaAnalysisFns.R
runDaAnalysis | R Documentation |
Performs clustering-based differential abundance analysis of CDR3 sequences between two sample groups, using a repeat resampling strategy. The function first clusters the sequences within each sample by unsupervised clustering on subsequence (k-mer) frequencies, then matches each cluster to its closest counterparts across samples, and finally tests for differential abundance at the level of the matched clusters to identify differentially abundant, condition-associated CDR3 sequences.
runDaAnalysis(repSeqObj, clusterby = "NT", kmerWidth = 4, paired = T, clusterDaPcutoff = 0.1, positionWt = F, distMethod = c("euclidean", "cosine"), useDynamicTreeCut = T, matchingMethod = "km", repeatResample = T, nRepeats = 10, resampleSize = 5000, useProb = T, returnAll = T, nRR = 1000)
repSeqObj |
is an object containing all repertoire sample data |
clusterby |
character; subsequence type to consider, either "NT" (nucleotide) or "AA" (amino acid); default is "NT" |
kmerWidth |
subsequence (k-mer) width to use; default is 4 for clusterby = "NT" and 3 for clusterby = "AA" |
paired |
boolean; whether to perform paired analysis for matched datasets; default is true |
clusterDaPcutoff |
cutoff for sub-repertoire level differential abundance testing; default is 0.1, which works well for our test cases |
positionWt |
boolean; whether to use positional weights for kmer frequencies, default is false |
distMethod |
the distance method used to compute distances between CDR3 feature vectors; use "euclidean" for NT 4-mer and "cosine" for AA 3-mer feature vectors |
useDynamicTreeCut |
boolean; if true (default), the Dynamic Tree Cut algorithm is used to cut clustering dendrograms; if false, findOptimalK is used to find the optimal k |
matchingMethod |
method used to match cluster centroids from all samples in order to identify subrepertoires; default is "km" (k-means). If "hc", hierarchical clustering with dynamic tree cut is used to define clusters. If "og", an in-house algorithm is used that matches each cluster centroid in the first sample to its closest centroids in all samples. |
repeatResample |
boolean; whether to perform repeat resampling; default is true. If false, the full repertoire dataset is used for the analysis without downsampling. |
nRepeats |
number of repeat resample runs to perform if repeatResample is true, default is 10 |
resampleSize |
the downsampling size used in the repeat resample runs; default is 5000 |
useProb |
boolean; if true, downsampling is probabilistic, with the most frequent CDR3s being more likely to be resampled. If false, all CDR3s have an equal chance of being resampled. Default is true. |
returnAll |
boolean; if true, the function returns a list whose first and second elements are the candidate CDR3s from differentially abundant subrepertoires, along with their ranking statistics, from the enrichment and de-enrichment analyses respectively, and whose third element is the directory where all intermediate repeat resample results are written. If false, the location of the intermediate results is not returned. |
nRR |
the number of permutations to perform in the ranking step of candidate DA CDR3s to determine statistical significance; default is 1000 |
analysisName |
prefix to the directory name in which intermediate results from resample runs will be written. |
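To make the kmerWidth and distMethod arguments more concrete, the sketch below builds a k-mer frequency vector for a nucleotide sequence and compares two such vectors with Euclidean and cosine distance. It is a minimal base R illustration, not the package's implementation: kmerCounts, euclideanDist and cosineDist are hypothetical helpers, the input sequences are toy data, and cosine distance is assumed here to mean one minus the cosine similarity.

```r
# Hypothetical helper (not part of the package): count overlapping k-mers
# in a single sequence over a fixed alphabet.
kmerCounts <- function(cdr3, k = 4, alphabet = c("A", "C", "G", "T")) {
  # Enumerate every possible k-mer so all sequences share one feature space.
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  allKmers <- unname(apply(grid, 1, paste, collapse = ""))
  # Slide a window of width k along the sequence.
  n <- nchar(cdr3) - k + 1
  observed <- if (n > 0) substring(cdr3, 1:n, k:nchar(cdr3)) else character(0)
  counts <- table(factor(observed, levels = allKmers))
  setNames(as.numeric(counts), allKmers)
}

# Assumed definitions of the two distances named by distMethod.
euclideanDist <- function(x, y) sqrt(sum((x - y)^2))
cosineDist <- function(x, y) {
  1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Toy nucleotide sequences, for illustration only.
a <- kmerCounts("ACGTACGT", k = 4)
b <- kmerCounts("ACGTTCGT", k = 4)

euclideanDist(a, b)
cosineDist(a, b)
```

With a 4-letter alphabet and k = 4 each sequence maps to a 256-dimensional vector; with 20 amino acids and k = 3 the vector has 8000 dimensions.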
A data frame with all candidate DA CDR3s if returnAll is false; a list with a data frame of candidate DA CDR3s and the address of all intermediate results if returnAll is true.
results <- runDaAnalysis(repObj, clusterby = "NT", kmerWidth = 4, paired = TRUE, clusterDaPcutoff = 0.1, positionWt = FALSE, distMethod = "euclidean", matchingMethod = "km", nRepeats = 2, resampleSize = 1000, useProb = TRUE, returnAll = TRUE, nRR = 1000)
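The difference between the two useProb settings can be illustrated with a small base R sketch. It is an illustration under stated assumptions rather than the package's own resampling code: downsampleClones is a hypothetical helper, the clone sequences and counts are toy data, and sampling with replacement is an assumption.

```r
# Hypothetical helper (not part of the package): draw `size` CDR3s from a
# set of clones, optionally weighting each clone by its count.
downsampleClones <- function(cdr3, count, size, useProb = TRUE) {
  # useProb = TRUE: draw in proportion to clone frequency.
  # useProb = FALSE: every clone is equally likely to be drawn.
  prob <- if (useProb) count / sum(count) else NULL
  sample(cdr3, size = size, replace = TRUE, prob = prob)
}

set.seed(1)
clones <- c("CASSLGQ", "CASSPDR", "CASRTGE")  # toy sequences
counts <- c(900, 90, 10)                      # toy clone counts

weighted <- downsampleClones(clones, counts, size = 1000, useProb = TRUE)
uniform <- downsampleClones(clones, counts, size = 1000, useProb = FALSE)

table(weighted)
table(uniform)
```

In the weighted draw the dominant clone makes up most of the resample, while in the uniform draw the three clones appear in roughly equal numbers.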