format_covariates: Format covariates

View source: R/format_covariates.R

format_covariates    R Documentation

Format covariates

Description

This function prepares the cell-level covariates in three main steps. First, it splits the categorical variables (which should be 'factor' variables) into indicator variables (i.e., one-hot encoding), dropping the last level of each. Second, it rescales the numerical variables (see rescale_numeric_variables); by default they are not centered (see bool_center). Third, it computes "Log_UMI" (i.e., the log of each cell's total counts) and adds it as its own column.
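The three steps described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the toy data, the indicator column names (batch_a, batch_b), the use of scale() for rescaling, and the natural logarithm for "Log_UMI" are all assumptions made for illustration.

```r
# Hypothetical toy inputs: 4 cells x 3 genes, plus two covariates.
dat <- matrix(c(1, 0, 3,
                2, 2, 0,
                0, 5, 1,
                4, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("cell", 1:4), paste0("gene", 1:3)))
covariate_df <- data.frame(
  batch = factor(c("a", "b", "c", "a")),
  age = c(30, 45, 60, 50),
  row.names = rownames(dat)
)

# Step 1: one-hot encode the factor, dropping the last level ("c" here).
lv <- levels(covariate_df$batch)
keep <- lv[-length(lv)]
indicators <- sapply(keep, function(l) as.numeric(covariate_df$batch == l))
colnames(indicators) <- paste0("batch_", keep)  # naming scheme is an assumption

# Step 2: rescale the numeric variable without centering.
# With center = FALSE, scale() divides by the root mean square,
# sqrt(sum(x^2) / (n - 1)). Using scale() here is an assumption.
age_scaled <- as.numeric(scale(covariate_df$age, center = FALSE, scale = TRUE))

# Step 3: log of the total counts per cell (natural log is an assumption).
# For a dgCMatrix, Matrix::rowSums() would be used instead.
log_umi <- log(rowSums(dat))

# Combine everything into one matrix with one row per cell.
covariates <- cbind(Log_UMI = log_umi, indicators, age = age_scaled)
rownames(covariates) <- rownames(dat)
```

Dropping one level per factor keeps the indicator columns from being collinear with an intercept term, which is the usual reason for this encoding.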

Usage

format_covariates(
  dat,
  covariate_df,
  bool_center = FALSE,
  rescale_numeric_variables = NULL,
  variables_enumerate_all = NULL
)

Arguments

dat

Dataset (either matrix or dgCMatrix) where the n rows represent cells and p columns represent genes. The rows and columns of the matrix should be named.

covariate_df

data.frame where each row represents a cell, and the columns are the different categorical or numerical variables that you wish to adjust for

bool_center

Boolean indicating whether the numerical variables should be centered around zero. Default is FALSE.

rescale_numeric_variables

A character vector giving the names of the numerical columns in covariate_df that you wish to rescale. Default is NULL.

variables_enumerate_all

If not NULL, this allows you to control specifically which factor variables in covariate_df you would like to split into indicators. By default, this is NULL, meaning all the factor variables are split into indicators

Value

a matrix with the same number of rows as dat
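
Examples

The following is a hedged usage sketch based only on the signature shown under Usage. The toy inputs are hypothetical, and it is an assumption that the function is exported from the eSVD2 package and accepts inputs of this shape; the call is skipped when the package is not installed.

```r
# Hypothetical toy inputs: 4 cells x 3 genes with named rows and columns.
dat <- matrix(c(1, 0, 3,
                2, 2, 0,
                0, 5, 1,
                4, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("cell", 1:4), paste0("gene", 1:3)))

# One row per cell: a factor covariate and a numerical covariate.
covariate_df <- data.frame(
  batch = factor(c("a", "b", "c", "a")),
  age = c(30, 45, 60, 50),
  row.names = rownames(dat)
)

# Only attempt the call when the package is available.
if (requireNamespace("eSVD2", quietly = TRUE)) {
  covariates <- eSVD2::format_covariates(
    dat = dat,
    covariate_df = covariate_df,
    bool_center = FALSE,
    rescale_numeric_variables = "age"
  )
  # Per the Value section, the result should have one row per cell in dat.
  print(dim(covariates))
}
```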


linnykos/eSVD2 documentation built on July 17, 2024, 12:01 a.m.