Convert Conos Object to ScanPy

Install Auxiliary Data Packages

First, install conosPanel, the auxiliary data package for conos:

install.packages('conosPanel', repos='https://kharchenkolab.github.io/drat/', type='source')

Now load the conos library and take the example data panel from the conosPanel package:

library(conos)
panel <- conosPanel::panel

Next, use pagoda2 for pre-processing:

library(pagoda2)
panel.preprocessed <- lapply(panel, basicP2proc, n.cores=1, min.cells.per.gene=0, n.odgenes=2e3, 
                             get.largevis=FALSE, make.geneknn=FALSE)
## creating space of type angular done
## adding data ... done
## building index ... done
## querying ... done
## creating space of type angular done
## adding data ... done
## building index ... done
## querying ... done
## creating space of type angular done
## adding data ... done
## building index ... done
## querying ... done
## creating space of type angular done
## adding data ... done
## building index ... done
## querying ... done

Now align the datasets:

con <- Conos$new(panel.preprocessed, n.cores=1)
con$buildGraph(k=15, k.self=5, space='PCA', ncomps=30)
## .............

Next, find the clusters and create an embedding:

con$findCommunities()
con$embedGraph(method="UMAP")
## Estimating hitting distances: 05:07:28.
## Done.
## Estimating commute distances: 05:08:20.
## Hashing adjacency list: 05:08:20.
## Done.
## Estimating distances: 05:08:23.
## Done
## Done.
## All done!: 05:08:29.

Now prepare the metadata (which can be any type of clustering of all the cells):

metadata <- data.frame(Cluster=con$clusters$leiden$groups)
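Since the metadata can hold any per-cell annotation, a minimal sketch of combining a clustering with a second label may help. It assumes the cluster assignments are a named factor keyed by cell ID; the cell IDs, the `annotation` vector, and the `CellType` column are hypothetical and only illustrate the pattern.

```r
# Hypothetical cell IDs and labels, standing in for a real clustering result
cell_ids <- c("cellA", "cellB", "cellC")
groups <- setNames(factor(c("1", "2", "1")), cell_ids)

# A second, hypothetical per-cell annotation keyed by the same cell IDs
annotation <- setNames(c("T cell", "B cell", "T cell"), cell_ids)

# Index the annotation by names(groups) so both columns follow the same cell order;
# data.frame() takes its row names from the names of the first named component
metadata_toy <- data.frame(Cluster = groups,
                           CellType = annotation[names(groups)])
print(metadata_toy)
```

Indexing by `names(groups)` keeps the rows aligned even when the two annotations were produced in a different cell order.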

Save data (set exchange_dir to your path):

## use current directory
exchange_dir <- "."
hdf5file <- "example.h5"
saveConosForScanPy(con, output.path=exchange_dir, hdf5_filename=hdf5file, verbose=TRUE)

The data saved to the HDF5 file can then be read back. For example, to access the metadata, run:

library(rhdf5)
metadata <- h5read(file.path(exchange_dir, hdf5file), 'metadata/metadata.df')
head(metadata, 4)
##                                 CellId           Dataset
## 1 MantonBM1_HiSeq_1-TCTATTGGTCTCTCGT-1 MantonBM1_HiSeq_1
## 2 MantonBM1_HiSeq_1-GAATAAGTCACGCATA-1 MantonBM1_HiSeq_1
## 3 MantonBM1_HiSeq_1-ACACCGGTCTAACTTC-1 MantonBM1_HiSeq_1
## 4 MantonBM1_HiSeq_1-TCATTTGGTACGCTGC-1 MantonBM1_HiSeq_1
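To see which groups and datasets a given file actually contains before reading them, one option is `rhdf5::h5ls()`. The sketch below builds a small stand-in file so that it runs on its own; the `example_vector` dataset and the toy metadata values are hypothetical and are not part of the conos output. With real output, point `h5path` at the file written by saveConosForScanPy().

```r
library(rhdf5)

# Stand-in HDF5 file (hypothetical content) so the sketch is self-contained
h5path <- tempfile(fileext = ".h5")
h5createFile(h5path)
h5createGroup(h5path, "metadata")
h5write(data.frame(CellId = c("c1", "c2"), Dataset = c("d1", "d1")),
        h5path, "metadata/metadata.df")
h5write(c(1, 2, 3), h5path, "example_vector")

# h5ls() returns a data.frame with one row per group or dataset in the file
contents <- h5ls(h5path)
print(contents[, c("group", "name", "otype")])

# Release any HDF5 handles that are still open
h5closeAll()
```

The `group` and `name` columns together give the path to pass to `h5read()`, such as 'metadata/metadata.df' above.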

All possible fields included in the output HDF5 file are:

To read the raw count matrix back in as a dgCMatrix, use the Matrix package as follows:

library(rhdf5)
library(Matrix)
rawcountMat = h5read(paste0(exchange_dir, "/example.h5"), 'raw_count_matrix')
raw_count_matrix = sparseMatrix(x = as.numeric(rawcountMat$data),  
    dims = as.numeric(c(rawcountMat$shape[1], rawcountMat$shape[2])), 
    p = as.numeric(rawcountMat$indptr), 
    i = rawcountMat$indices, index1=FALSE)

Note: Set index1=FALSE because the index vectors stored in the file are 0-based. For more details, see the documentation for Matrix::sparseMatrix().
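The effect of index1=FALSE is easiest to check on a small case. The sketch below uses a hypothetical 3 x 3 matrix laid out in the same column-compressed form as the fields read above (data, indices, indptr, shape); the `toy` list is made up for illustration and is not read from the HDF5 file.

```r
library(Matrix)

# Target matrix:
#      [,1] [,2] [,3]
# [1,]    1    .    4
# [2,]    .    3    .
# [3,]    2    .    5
toy <- list(
  data    = c(1, 2, 3, 4, 5),       # non-zero values, column by column
  indices = c(0L, 2L, 1L, 0L, 2L),  # 0-based row index of each value
  indptr  = c(0L, 2L, 3L, 5L),      # offset in `data` where each column starts
  shape   = c(3L, 3L)               # rows, columns
)

# i is given, j is omitted, so p is interpreted as column pointers;
# index1 = FALSE tells sparseMatrix() that i is 0-based
m <- sparseMatrix(x = as.numeric(toy$data),
                  i = toy$indices,
                  p = toy$indptr,
                  dims = toy$shape,
                  index1 = FALSE)
print(m)
```

With the default index1=TRUE the same call would fail, because a row index of 0 is not valid in 1-based indexing.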



kharchenkolab/conos documentation built on Feb. 28, 2024, 6:03 a.m.