import_dataset | R Documentation |
Read an anndata file, which is the output of the Python metacells package,
and import the metacell dataset into MCView. Each project can have multiple datasets,
which can be selected in the app using the right sidebar.
import_dataset(
project,
dataset,
anndata_file,
cell_type_field = NULL,
metacell_types_file = NULL,
cell_type_colors_file = NULL,
outliers_anndata_file = NULL,
cluster_metacells = TRUE,
cluster_k = NULL,
metadata_fields = NULL,
metadata = NULL,
metadata_colors = NULL,
cell_metadata = NULL,
cell_to_metacell = NULL,
gene_modules_file = NULL,
gene_modules_k = NULL,
calc_gg_cor = TRUE,
gene_names = NULL,
metacell_graphs = NULL,
atlas_project = NULL,
atlas_dataset = NULL,
projection_weights_file = NULL,
copy_atlas = TRUE,
minimal_max_log_fraction = -13,
minimal_relative_log_fraction = 2,
umap_anchors = NULL,
umap_config = NULL,
min_umap_log_expr = -14,
genes_per_anchor = 30,
layout = NULL,
default_graph = NULL,
overwrite = TRUE,
copy_source_file = FALSE,
...
)
project |
path to the project |
dataset |
name for the dataset, e.g. "PBMC". The name of the dataset can only contain alphanumeric characters, dots, dashes and underscores. |
anndata_file |
path to |
cell_type_field |
name of a field in the anndata |
metacell_types_file |
path to a tabular file (csv,tsv) with cell type assignment for
each metacell. The file should have a column named "metacell" with the metacell ids and another
column named "cell_type" or "cluster" with the cell type assignment. Metacell ids that do
not exist in the data would be ignored. |
cell_type_colors_file |
path to a tabular file (csv,tsv) with color assignment for
each cell type. The file should have a column named "cell_type" or "cluster" with the
cell types and another column named "color" with the color assignment. |
outliers_anndata_file |
path to anndata file with outliers (optional). This would enable, by default, the following tabs: ["Outliers", "Similar-fold", "Deviant-fold"]. See the metacells python package for more details. |
cluster_metacells |
When TRUE and no metacell type is given (via |
cluster_k |
number of clusters for initial metacell clustering. If NULL - the number of clusters would be determined such that a metacell would contain 16 cells on average. |
metadata_fields |
names of fields in the anndata |
metadata |
can be either a data frame with a column named "metacell" with the metacell id and other metadata columns
or a name of a delimited file which contains such data frame. See |
metadata_colors |
a named list with colors for each metadata column, or a name of a yaml file with such list.
For numerical metadata columns, colors should be given as a list where the first element is a vector of colors and the second element is a vector of breaks. |
cell_metadata |
data frame with a column named "cell_id" with
the cell id and other metadata columns, or a name of a delimited file which
contains such data frame. For activating the "Samples" tab, the data frame should have an additional
column named "samp_id" with a sample identifier per cell (e.g., batch id, patient etc.).
Optionally, a column named "metacell" can be added to the data frame, which will be used instead
of the |
cell_to_metacell |
data frame with a column named "cell_id" with cell id and
another column named "metacell" with the metacell the cell is part of, or a
name of a delimited file which contains such data frame. If NULL, the metacell
will be inferred from the 'metacell' column in |
gene_modules_file |
path to a tabular file (csv,tsv) with assignment of genes to gene modules. Should have a field named "gene" with the gene name and a field named "module" with the name of the gene module. |
gene_modules_k |
number of clusters for initial gene module calculation. If NULL, the number of clusters would be determined such that a gene module would contain 16 genes on average. |
calc_gg_cor |
Calculate top 30 correlated and anti-correlated genes for each gene. This computation can be heavy for large datasets or weaker machines, so you can set |
gene_names |
use alternative gene names (optional). A data frame with a column called 'gene_name' with the original gene name (as it appears in the 'h5ad' file) and another column called 'alt_name' with the gene name to use in MCView. Genes that do not appear in the table would not be changed. |
metacell_graphs |
a named list of metacell graphs or files containing metacell graphs. Each graph should be a data frame with columns named "from", "to" and "weight" with the ids of the metacells and the weight of the edge. If the list is not named, the names would be 'graph1', 'graph2' and so on. Note that a graph cannot be named "metacell" as this is reserved for the metacell graph. |
atlas_project |
path to and |
atlas_dataset |
name of the atlas dataset |
projection_weights_file |
Path to a tabular file (csv,tsv) with the following fields "query", "atlas" and "weight". The file is an output of |
copy_atlas |
copy atlas MCView to the current project. If FALSE - a symbolic link would be created instead. |
minimal_max_log_fraction |
When choosing marker genes: take only genes with at least one value (in log fraction units - normalized egc) above this threshold |
minimal_relative_log_fraction |
When choosing marker genes: take only genes with relative log fraction (mc_fp) above this value |
umap_anchors |
a vector of gene names to use for UMAP calculation. If NULL, the umap from the anndata object would be used. |
umap_config |
a named list with UMAP configuration. See |
min_umap_log_expr |
minimal log2 expression for genes to use for UMAP calculation. |
genes_per_anchor |
number of genes to use for each umap anchor. |
layout |
a data frame with a column named "metacell" with the metacell id and other columns with the x and y coordinates of the metacell. If NULL, the layout would be taken from the anndata object. |
default_graph |
a data frame with columns named "from", "to" and "weight" with the ids of the metacells and the weight of the edge. If NULL, the graph would be taken from the anndata object. |
overwrite |
if a dataset with the same name already exists, overwrite it. Otherwise, an error would be thrown. |
copy_source_file |
if TRUE, copy the source file to the project cache directory. If FALSE, create a symbolic link to the source file. |
... |
Arguments passed on to
|
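The in-memory inputs described above for cell_metadata and metacell_graphs can be sketched in base R as follows. This is only an illustration under stated assumptions: the cell ids, metacell ids, sample ids and the graph name "knn" are made up, and check_metacell_graphs is a hypothetical helper that mirrors the constraints listed in the argument descriptions. It is not part of MCView, and the package's own validation may differ.

```r
# Hypothetical cell and metacell ids, for illustration only.
cell_metadata <- data.frame(
  cell_id = c("cell_1", "cell_2", "cell_3", "cell_4"),
  samp_id = c("batch_A", "batch_A", "batch_B", "batch_B"), # needed for the "Samples" tab
  metacell = c("MC1", "MC1", "MC2", "MC2"),                # optional column
  stringsAsFactors = FALSE
)

# A graph is a data frame with "from", "to" and "weight" columns.
knn_graph <- data.frame(
  from = c("MC1", "MC2"),
  to = c("MC2", "MC1"),
  weight = c(0.8, 0.5),
  stringsAsFactors = FALSE
)
metacell_graphs <- list(knn = knn_graph)

# Hypothetical helper (not an MCView function): checks the constraints
# stated in the metacell_graphs description and returns the named list.
check_metacell_graphs <- function(graphs) {
  stopifnot(is.list(graphs), !is.data.frame(graphs))
  # Unnamed lists get the default names 'graph1', 'graph2', ...
  if (is.null(names(graphs))) {
    names(graphs) <- paste0("graph", seq_along(graphs))
  }
  # "metacell" is reserved for the metacell graph.
  stopifnot(!("metacell" %in% names(graphs)))
  for (g in graphs) {
    stopifnot(all(c("from", "to", "weight") %in% colnames(g)))
  }
  graphs
}

metacell_graphs <- check_metacell_graphs(metacell_graphs)

# The objects could then be passed to import_dataset (not run here):
# import_dataset("PBMC", "PBMC163k", "raw/metacells.h5ad",
#   cell_metadata = cell_metadata,
#   metacell_graphs = metacell_graphs
# )
```

Checking the inputs before the import is a design choice for this sketch: a malformed graph fails fast, before the anndata file is read.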
The function creates a directory under project/cache/dataset, which
contains objects used by the MCView shiny app (such as the metacell matrix).
In addition, you can supply a file with the type assignment for each metacell
(metacell_types_file) and a file with the color assignment for each metacell type
(cell_type_colors_file).
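As a rough sketch of the two optional files mentioned above, the following base R code writes a metacell type table and a cell type color table with the column names the argument descriptions ask for. The metacell ids, cell type labels, colors and temporary file names are assumptions made up for illustration; real metacell ids must match those in the anndata file.

```r
# Hypothetical metacell ids and cell type labels, for illustration only.
metacell_types <- data.frame(
  metacell = c("MC1", "MC2", "MC3"),
  cell_type = c("B cell", "T cell", "T cell"),
  stringsAsFactors = FALSE
)

# One color per cell type; the colors are arbitrary.
cell_type_colors <- data.frame(
  cell_type = c("B cell", "T cell"),
  color = c("#1f77b4", "#d62728"),
  stringsAsFactors = FALSE
)

# Every assigned type should have a color.
stopifnot(all(metacell_types$cell_type %in% cell_type_colors$cell_type))

# Write both tables as csv files in a temporary directory.
types_file <- file.path(tempdir(), "metacell_types.csv")
colors_file <- file.path(tempdir(), "cell_type_colors.csv")
write.csv(metacell_types, types_file, row.names = FALSE)
write.csv(cell_type_colors, colors_file, row.names = FALSE)

# The files could then be passed to import_dataset (not run here):
# import_dataset("PBMC", "PBMC163k", "raw/metacells.h5ad",
#   metacell_types_file = types_file,
#   cell_type_colors_file = colors_file
# )
```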
Invisibly returns an AnnDataR6 object of the read anndata_file.
## Not run:
dir.create("raw")
download.file(
"http://www.wisdom.weizmann.ac.il/~atanay/metac_data/PBMC_processed.tar.gz",
"raw/PBMC_processed.tar.gz"
)
untar("raw/PBMC_processed.tar.gz", exdir = "raw")
import_dataset("PBMC", "PBMC163k", "raw/metacells.h5ad")
## End(Not run)
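The dataset argument only accepts names made of alphanumeric characters, dots, dashes and underscores, so a name can be checked before a long import is started. The helper below is a hypothetical sketch, not an MCView function; it assumes "alphanumeric" means ASCII letters and digits, and the package's own check may differ.

```r
# Hypothetical helper (not part of MCView): TRUE when the name is a single
# string made only of ASCII letters, digits, dots, dashes and underscores.
is_valid_dataset_name <- function(name) {
  is.character(name) && length(name) == 1 && grepl("^[A-Za-z0-9._-]+$", name)
}

is_valid_dataset_name("PBMC163k")  # name used in the example above
is_valid_dataset_name("PBMC 163k") # contains a space
```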