knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

SBF: A R package for Shared Basis Factorization

Orthogonal Shared Basis Factorization (OSBF) is a joint matrix diagonalization algorithm we developed for cross-species gene expression analysis.

Installation

  1. If you have git installed, clone SBF from Github and install.
git clone https://github.com/amalthomas111/SBF.git

If git is not installed, go to code -> Download ZIP. Unzip the file.

Then, inside an R console

install.packages("<path to SBF>/SBF", repos = NULL, type = "source")

[OR]

  1. Install using devtools. Install devtools if not already installed. Note: devtools requires lot of dependencies.
# check devtools is installed. If not install.
pkgs <- c("devtools")
require_install <- pkgs[!(pkgs %in% row.names(installed.packages()))]
if (length(require_install))
  install.packages(require_install)
library(devtools)
devtools::install_github("amalthomas111/SBF")

Please contact us via e-mail or submit a GitHub issue if there is any trouble with installation.

Quick demo

Load SBF package

# load SBF package
library(SBF)

Create a test dataset

set.seed(1231)
mymat <- createRandomMatrices(n = 4, ncols = 3, nrows = 4:6)
sapply(mymat, dim)
mymat

Note: Depending upon the R versions, the random matrices generated could be different.

The rank of the matrices

sapply(mymat, function(x) {qr(x)$rank})

We will use the test dataset as our $D_i$ matrices.

OSBF examples

The OSBF's shared orthogonal $V$, $U_i$'s with orthonormal columns, and diagonal matrices $\Delta_i$'s can be estimated using the SBF function with argument orthogonal = TRUE.

Check the arguments for the SBF function call using ?SBF.

# OSBF call
osbf <- SBF(matrix_list = mymat, orthogonal = TRUE)
names(osbf)

For OSBF, the factorization error is optimized by default (minimizeError=TRUE), and the optimizeFactorization function is invoked. Optimization could take some time depending on the data matrix (mymat) and initial values.
?SBF help function shows all arguments for the SBF call.

In OSBF, $V$ is orthogonal and columns of $U_i$'s are orthonormal ($U_i^T U_i = I$).

zapsmall(t(osbf$v) %*% osbf$v)
# check the columns of first matrix of U_ortho
zapsmall(t(osbf$u[[names(osbf$u)[[1]]]]) %*%
           osbf$u[[names(osbf$u)[[1]]]])

OSBF is not an exact factorization. The decomposition error is minimized and the final factorization error is stored in osbf$error. The initial factorization error we started with is stored in osbf$error_start.

# initial decomposition error
osbf$error_start
# final decomposition error
osbf$error

The error decreased by a factor of r round(osbf$error_start/osbf$error,2).

The number of iteration taken for optimizing:

cat("For osbf, # iteration =", osbf$error_pos, "final error =", osbf$error)

A detailed explanation of the algorithm and example cases of factorization can be found in the vignettes/docs directory.

Factorization for an example gene expression dataset

# load sample dataset from SBF package
avg_counts <- SBF::TissueExprSpecies
# dimension of matrices for different species
sapply(avg_counts, dim)
# names of different tissue types in humans
names(avg_counts[["Homo_sapiens"]])

OSBF computation based on inter-sample correlation. $V$ is estimated using the mean $R_i^T R_i$.

# OSBF call using correlation matrix
osbf_gem <- SBF(matrix_list = avg_counts, orthogonal = TRUE,
                transform_matrix = TRUE, tol = 1e-4)
# initial decomposition error
osbf_gem$error_start
# final decomposition error
osbf_gem$error

Note: For high-dimensional datasets, vary the tolerance threshold (tol) or maximum number of iteration parameter (max_iter) to reduce the computing time.

When calling SBF, orthogonal = FALSE computes the SBF factorization, an exact joint matrix factorization we developed. In SBF, the estimated columns of $U_i$ are not orthonormal. More details and examples can be found in the vignettes/docs directory.

Contacts

Amal Thomas amalthom@usc.edu

Contact us via e-mail or submit a GitHub issue

Publication

Available on bioRxiv

Copyright

Copyright (C) 2018-2021 Amal Thomas

Authors: Amal Thomas

SBF is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

SBF is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.



amalthomas111/SBF documentation built on Sept. 2, 2022, 11:27 a.m.