knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = "center", out.width = "100%", fig.width = 9, fig.height = 7 )
The NAIR
package contains a number of functions supplementary to buildRepSeqNetwork()
that can be used to perform additional downstream tasks.
We simulate some toy data for demonstration.
We simulate data consisting of two samples with 100 observations each, for a total of 200 observations (rows).
set.seed(42) library(NAIR) dir_out <- tempdir() toy_data <- simulateToyData() head(toy_data)
nrow(toy_data)
Before we go through the various functions, the code below shows an example of how some of them might be used together.
library(magrittr) # For pipe operator (%>%) toy_data %>% filterInputData("CloneSeq", drop_matches = "\\W") %>% buildNet("CloneSeq") %>% addNodeStats("all") %>% addClusterMembership("greedy", cluster_id_name = "cluster_greedy") %>% addClusterMembership("leiden", cluster_id_name = "cluster_leiden") %>% addClusterStats("cluster_leiden", "CloneSeq", "CloneCount") %>% addPlots(color_nodes_by = c("cluster_leiden", "cluster_greedy"), color_scheme = "Viridis" ) %>% labelClusters("cluster_leiden", cluster_id_col = "cluster_leiden") %>% labelClusters("cluster_greedy", cluster_id_col = "cluster_greedy") %>% saveNetwork(output_dir = tempdir(), output_name = "my_network")
addPlots()
addPlots()
can be used to generate plots of a network graph.
net <- buildRepSeqNetwork(toy_data, "CloneSeq") net <- addPlots(net, color_nodes_by = "SampleID")
See this article for more details.
addNodeStats()
addNodeStats()
can be used to compute node-level network properties for a network.
net <- addNodeStats(net, stats_to_include = "all")
See this article for more details.
addClusterStats()
addClusterStats()
can be used to perform cluster analysis for a network. It performs clustering, records the cluster membership of the nodes, and computes cluster properties.
net <- addClusterStats(net, cluster_fun = "walktrap", cluster_id_name = "cluster_walktrap")
See this article for more details.
addClusterMembership()
addClusterMembership()
can be used to perform clustering and record the cluster membership of the nodes without computing cluster properties. It is useful for performing multiple instances of clustering with different algorithms.
net <- addClusterMembership(net, cluster_fun = "leiden", cluster_id_name = "cluster_leiden" ) net <- addClusterMembership(net, cluster_fun = "louvain", cluster_id_name = "cluster_louvain" )
Refer here for more details.
addClusterLabels()
addClusterLabels()
can be used to label the clusters in a plot.
net <- addPlots(net, color_nodes_by = "cluster_louvain", color_scheme = "Viridis" ) net <- labelClusters(net, cluster_id_col = "cluster_louvain", top_n_clusters = 7, size = 7 ) net$plots$cluster_louvain
Refer here for more details.
labelNodes()
labelNodes()
can be used to label the nodes in network plots.
set.seed(42) small_sample <- simulateToyData(1, sample_size = 10, prefix_length = 1) net <- buildNet(small_sample, "CloneSeq", plot_title = NULL) net <- labelNodes(net, "CloneSeq", size = 4) net$plots[[1]]
saveNetwork()
saveNetwork()
can be used to save a list of network objects. Also prints any plots to a PDF containing one plot per page.
saveNetwork(net, output_dir = dir_out, output_type = "individual")
The parameters output_dir
, output_type
, output_name
, pdf_width
and pdf_height
have the same behavior as they do in the buildRepSeqNetwork()
function.
saveNetworkPlots()
saveNetworkPlots()
can be used to save the plots for a network to a PDF containing one plot per page.
saveNetworkPlots(net$plots, outfile = file.path(dir_out, "plots.pdf"))
loadDataFromFileList()
loadDataFromFileList()
can be used to load data from multiple files and combine them into a single data frame.
dat <- loadDataFromFileList(list.files(my_dir), input_type = "rds")
The primary parameter accepts a character vector containing file paths or a list containing file paths and connections. Each element corresponds to a single file. Each file is assumed to contain the data for a single sample, with observations indexed by row, and with the same columns across samples.
The supported values of input_type
are "rds"
, "rda"
, "csv"
, "csv2"
, "tsv"
and "table"
. Each value specifies a different function to load the files. The respective functions are readRDS()
, load()
, read.csv()
, read.csv2()
, read.delim()
, and read.table()
.
For text formats (values of input_type
other than "rds"
and "rda"
), non-default argument values can be specified to the optional parameters of the reading function using the read.args
argument. It accepts a named list of argument values.
loadDataFromFileList(list.files(my_dir), input_type = "table", read.args = list( header = TRUE, sep = " ", dec = ",", na.strings = "NA!", row.names = 1, col.names = c("RowID", "CloneSeq", "CloneFrequency", "CloneCount", "VGene" ) ) )
See ?utils::read.table()
for the parameters and their accepted values. Note that read.csv()
, read.csv2()
and read.delim()
are identical to read.table()
other than their default argument values. Some examples of useful parameters include:
header
: Whether the first row contains column names. sep
: The character separating consecutive values in each row. dec
: The character used as a decimal point. quote
: The character(s) used as quotes within character strings.na.strings
: The character string representing NA values.row.names
: The row names or the column containing the row names.col.names
: The column names, for files without a header row.colClasses
: For manually specifying the class of each column.as.is
: For specifying the character columns to be converted to factors.nrows
: Max number of rows (specification can improve memory usage)For the "rda"
input type, the data_symbols
parameter accepts a character vector specifying the name of each sample's data frame within its respective Rdata file (i.e., the name of the data frame in the R environment). A single character string can be used if each sample's data frame has the same name.
save(df_sample1, file = file_1) save(df_sample2, file = file_2) save(df_sample3, file = file_3) loadDataFromFileList(c(file_1, file_2, file_3), input_type = "rda", data_symbols = c("df_sample1", "df_sample2", "df_sample3" ) )
combineSamples()
combineSamples()
has the same default behavior as loadDataFromFileList()
, but possesses additional parameters that give it extra functionality, such as the ability to filter data and assign sample/subject/group IDs to each data file, which are then included as variables in the combined data frame.
dat <- combineSamples(list.files(my_dir), input_type = "rds", min_seq_length = 7, drop_matches = "[*|_]", subset_cols = c("CloneSeq", "CloneCount", "VGene"), sample_ids = 1:5, subject_ids = c(1, 2, 2, 3, 3), group_ids = c(1, 1, 1, 2, 2) )
filterInputData()
filterInputData()
can be used to filter data prior to performing network analysis.
filtered_data <- filterInputData(toy_data, seq_col = "CloneSeq", min_seq_length = 13, drop_matches = "GGGG", subset_cols = c("CloneFrequency", "SampleID"), count_col = "CloneCount", verbose = TRUE )
The function has parameters data
, seq_col
, min_seq_length
, drop_matches
and subset_cols
, all of which behave in the same manner as seen in buildRepSeqNetwork()
. In addition, the count_col
parameter can be used to specify a column containing the clone count or UMI count. If specified, observations with NA
values in this column will be removed from the data.
aggregateIdenticalClones()
aggregateIdenticalClones()
can be used on bulk AIRR-seq data to aggregate data rows containing the same clone sequence, with the clone counts and clone frequencies being added together. Aggregation can be restricted to being performed only within groups that are defined based on specified grouping variables.
my_data <- data.frame( clone_seq = c("ATCG", rep("ACAC", 2), rep("GGGG", 4)), clone_count = rep(1, 7), clone_freq = rep(1/7, 7), time_point = c("t_0", rep(c("t_0", "t_1"), 3)), subject_id = c(rep(1, 5), rep(2, 2)) ) # group clones by time point and subject ID data_agg_time_subject <- aggregateIdenticalClones(my_data, clone_col = "clone_seq", count_col = "clone_count", freq_col = "clone_freq", grouping_cols = c("subject_id", "time_point") )
getNeighborhood()
getNeighborhood()
can be used to extract a subset of observations with receptor sequences sufficiently similar to a target sequence.
nbd <- getNeighborhood(toy_data, seq_col = "CloneSeq", target_seq = "GGGGGGGAATTGG" )
generateNetworkObjects()
generateNetworkObjects()
can be used to construct the minimal output possible from buildRepSeqNetwork()
. It does not filter the input data, produce plots, compute network properties, perform cluster analysis or save data.
net <- generateNetworkObjects(toy_data, "CloneSeq")
The function has parameters data
, seq_col
, dist_type
, dist_cutoff
and drop_isolated_nodes
, all of which have the same behavior and default values as seen in buildRepSeqNetwork()
.
generateNetworkGraph()
generateNetworkGraph()
can be used to generate the network igraph
from the adjacency matrix.
net$igraph <- generateNetworkGraph(net$adjacency_matrix)
generateAdjacencyMatrix()
generateAdjacencyMatrix()
can be used to compute the network adjacency matrix from the list of receptor sequences.
output$adjacency_matrix <- generateAdjacencyMatrix(toy_data$CloneSeq) # use same settings from original call to buildRepSeqNetwork() net$adjacency_matrix <- generateAdjacencyMatrix( net$node_data$CloneSeq, dist_type = net$details$dist_type, dist_cutoff = net$details$dist_cutoff, drop_isolated_nodes = net$details$drop_isolated_nodes )
The parameters dist_type
, dist_cutoff
and drop_isolated_nodes
have the same behavior and default values as in buildRepSeqNetwork()
.
# clean up temp directory file.remove( file.path( tempdir(), c("MyRepSeqNetwork_NodeMetadata.csv", "MyRepSeqNetwork_ClusterMetadata.csv", "MyRepSeqNetwork.pdf", "MyRepSeqNetwork_EdgeList.txt", "MyRepSeqNetwork_AdjacencyMatrix.mtx", "MyRepSeqNetwork_Details.rds", "MyRepSeqNetwork_Plots.rds", "MyRepSeqNetwork_GraphLayout.txt", "MyRepSeqNetwork.rds" ) ) )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.