load_mm_data: Load data from matrix market format files.

Description

Load data from matrix market format files.

Usage

load_mm_data(
  mat_path,
  feature_anno_path,
  cell_anno_path,
  header = FALSE,
  feature_metadata_column_names = NULL,
  cell_metadata_column_names = NULL,
  umi_cutoff = 100,
  quote = "\"'",
  sep = "\t",
  verbose = FALSE,
  matrix_control = list()
)

Arguments

mat_path

Path to the Matrix Market .mtx matrix file. The values are read and stored as a sparse matrix with nrows and ncols, as inferred from the file. Required.

feature_anno_path

Path to a feature annotation file. The feature_anno_path file must have nrows lines and at least one column. The values in the first column label the matrix rows, and each value must be distinct within that column. Values in additional columns are stored in the cell_data_set 'gene' metadata. For gene features, we urge the use of official gene IDs for labels, such as Ensembl or Wormbase IDs. In this case, the second column typically has a 'short' gene name. Additional information such as gene_biotype may be stored in additional columns starting with column 3. Required.

cell_anno_path

Path to a cell annotation file. The cell_anno_path file must have ncols lines and at least one column. The values in the first column label the matrix columns and each must be distinct in the column. Values in additional columns are stored in the cell_data_set cells metadata. Required.
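The relationship between the three input files can be sketched as follows. This is a minimal illustration, assuming the Matrix package is installed. The toy counts, file names, and identifiers are invented for the example and do not ship with Monocle3.

```r
library(Matrix)

# Toy counts: 3 features (rows) by 4 cells (columns). All values are made up.
counts <- Matrix(c(5, 0, 2,
                   0, 3, 0,
                   1, 0, 0,
                   0, 0, 7), nrow = 3, sparse = TRUE)

# Hypothetical annotations: the first column holds the unique labels.
features <- data.frame(id = c("ENSG_A", "ENSG_B", "ENSG_C"),
                       short = c("GeneA", "GeneB", "GeneC"))
cells <- data.frame(id = paste0("cell", 1:4))

out_dir <- tempfile("mm_example_")
dir.create(out_dir)
mat_path <- file.path(out_dir, "matrix.mtx")
feature_path <- file.path(out_dir, "features.tsv")
cell_path <- file.path(out_dir, "barcodes.tsv")

# Write the matrix in Matrix Market format and the annotations as
# headerless, tab-separated text.
writeMM(counts, mat_path)
write.table(features, feature_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
write.table(cells, cell_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Check the constraints described above: one annotation line per matrix
# row or column, and distinct labels in the first column.
m <- readMM(mat_path)
stopifnot(nrow(m) == length(readLines(feature_path)),
          ncol(m) == length(readLines(cell_path)),
          !anyDuplicated(features$id),
          !anyDuplicated(cells$id))
```

Files written this way have no header line and use tabs as separators, which matches the header = FALSE and sep = "\t" defaults shown in Usage.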

header

Logical. Set to TRUE if both the feature_anno_path and cell_anno_path files have column headers, or to FALSE if neither file has column headers (only these two cases are supported). A header line may have either as many fields as the file has columns, or one fewer. In both cases, the first column is used as the matrix dimension names. The default is FALSE.
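The two header layouts can be illustrated with base R's read.table, which uses the first column as row names when the header has one fewer field than the data rows. This sketch shows only that base R behavior. It is an assumption that load_mm_data handles headers in a comparable way, and the gene labels are invented.

```r
# Header with as many fields as there are columns.
txt_full <- "id\tgene_short_name\nENSG_A\tGeneA\nENSG_B\tGeneB\n"

# Header with one fewer field: the label column has no name.
txt_short <- "gene_short_name\nENSG_A\tGeneA\nENSG_B\tGeneB\n"

full <- read.table(text = txt_full, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE)
short <- read.table(text = txt_short, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)

# In the full case the labels are an ordinary first column.
print(full$id)

# In the short case read.table promotes the first column to row names.
print(rownames(short))
print(colnames(short))
```

In both layouts the same labels, ENSG_A and ENSG_B, end up identifying the rows. Only the way the metadata column names are recovered differs.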

feature_metadata_column_names

A character vector of feature metadata column names. The number of names must be one less than the number of columns in the feature_anno_path file. These values will replace those read from the feature_anno_path file header, if present. The default is NULL.

cell_metadata_column_names

A character vector of cell metadata column names. The number of names must be one less than the number of columns in the cell_anno_path file. These values will replace those read from the cell_anno_path file header, if present. The default is NULL.

umi_cutoff

UMI per cell cutoff. Columns (cells) with fewer than umi_cutoff total counts are removed from the matrix. The default is 100.
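The effect of the cutoff can be sketched on a toy sparse matrix. This is an illustration of the filtering rule as described above, not Monocle3's internal code. It assumes the Matrix package is installed, and the counts are invented.

```r
library(Matrix)

# Toy counts: 3 genes by 3 cells, with per-cell totals of 120, 30, and 200.
counts <- sparseMatrix(i = c(1, 2, 1, 3),
                       j = c(1, 1, 2, 3),
                       x = c(80, 40, 30, 200),
                       dims = c(3, 3),
                       dimnames = list(paste0("gene", 1:3),
                                       paste0("cell", 1:3)))

umi_cutoff <- 100

# Keep cells whose total count is at least the cutoff; cells with
# fewer than umi_cutoff counts are dropped.
keep <- Matrix::colSums(counts) >= umi_cutoff
filtered <- counts[, keep, drop = FALSE]

print(colnames(filtered))
```

Here cell2 is dropped because its total of 30 is below the cutoff, while cell1 and cell3 are kept.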

quote

A character string specifying the quoting characters used in the feature_anno_path and cell_anno_path files. The default is "\"'".

sep

Field separator character in the annotation files. If sep = "", the separator is white space, that is, one or more spaces, tabs, newlines, or carriage returns. The default is the tab character, for tab-separated-value files.
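The choice of separator matters when a field itself contains a space. The following sketch uses base R's read.table on a single invented annotation line to show the difference. It is an assumption that load_mm_data splits fields the same way.

```r
# One annotation line with three tab-separated fields; the second
# field contains a space.
line <- "ENSG_A\tGene A\tprotein_coding"

tab_split <- read.table(text = line, sep = "\t", header = FALSE)
ws_split <- read.table(text = line, sep = "", header = FALSE)

# With sep = "\t" the space inside "Gene A" is preserved: 3 fields.
print(ncol(tab_split))

# With sep = "" any run of white space separates fields: 4 fields.
print(ncol(ws_split))
```

With sep = "" the line no longer has the expected number of columns, so white-space separation is only safe when no field contains embedded spaces.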

verbose

A logical value that determines whether or not the function writes diagnostic information. The default is FALSE.

matrix_control

An optional list of values that control how matrices are stored in the cell_data_set assays slot. By default, matrices are stored in-memory as dgCMatrix class (compressed sparse matrix) objects, which corresponds to matrix_class="dgCMatrix". A very large matrix can instead be stored in a file and accessed by Monocle3 as if it were in-memory. For this, Monocle3 uses the BPCells R package, and the matrix_control list values are set to matrix_class="BPCells" and matrix_mode="dir". The counts matrix is then stored on-disk in a directory that Monocle3 creates in the directory where you run Monocle3. This directory has a name of the form "monocle.bpcells.*.tmp", where the asterisk is a string of random characters that makes the name unique. Do not remove this directory while Monocle3 is running! If you choose to store the counts matrix as an on-disk BPCells object, you must use the "save_monocle_objects" and "load_monocle_objects" functions to save and restore the cell_data_set. Monocle3 tries to remove the BPCells matrix directory when your R session ends; however, a matrix directory sometimes persists after the session ends, in which case you must remove it yourself. For additional information about the matrix_control list, see the examples below and the set_matrix_control help. Note that for the load_mm_data function the BPCells matrix_mode is "dir", the matrix_type is "double", and the matrix_compress is FALSE.

Value

A cell_data_set (cds) object.

Comments

  • load_mm_data estimates size factors.

Examples

  
    pmat <- system.file("extdata", "matrix.mtx.gz", package = "monocle3")
    prow <- system.file("extdata", "features_c3h0.txt", package = "monocle3")
    pcol <- system.file("extdata", "barcodes_c2h0.txt", package = "monocle3")
    cds <- load_mm_data(pmat, prow, pcol,
                        feature_metadata_column_names =
                          c('gene_short_name', 'gene_biotype'),
                        sep = '')

    # In this example, the features_c3h0.txt file has three columns,
    # separated by spaces. The first column has official gene names, the
    # second has short gene names, and the third has gene biotypes.
    #
    # For typical count matrices with a small to medium number of cells,
    # we suggest that you use the default matrix_control list by not
    # setting the matrix_control parameter. In this case, the counts
    # matrix is stored in-memory as a sparse matrix in the dgCMatrix
    # format, as it has been in the past. It is also possible to
    # set the matrix_control list explicitly to use this in-memory
    # dgCMatrix format by setting the matrix_control parameter to
    #
      load_mm_data(..., matrix_control=list(matrix_class='dgCMatrix'))
    #
    # For large matrices, we suggest that you try storing the count
    # matrix as a BPCells object on-disk by setting the matrix_control
    # parameter list as follows
    #
      load_mm_data(..., matrix_control=list(matrix_class='BPCells'))
    #
  


cole-trapnell-lab/monocle3 documentation built on June 11, 2025, 11:22 p.m.