In adamnc2/simpleSCDE: Simplified Cell Data Manipulation and SCDE Testing

Simplified Single Cell Differential Expression Analysis

This vignette will show you how to import, prepare, and export data related to SCDE analysis with the simpleSCDE package.

Loading the Package

To begin working with the simpleSCDE package, it must first be installed from GitHub. The following code installs the package directly:

# load devtools and use it to pull simpleSCDE from github
library(devtools)
devtools::install_github("adamnc2/simpleSCDE", build_vignettes = FALSE)

Once the package has been installed from GitHub, the following code should be run to load the package into your R session:

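# lib.loc points at one specific Windows library folder; omit it or adjust it if your packages are installed elsewhere
library("simpleSCDE", lib.loc = "~/R/win-library/3.5")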

If you are working in RStudio (which is recommended), the above step can also be achieved by going into the Packages tab and checking the small box that appears next to the simpleSCDE package.

Importing Data

The format and size of the data used in SCDE analysis can vary greatly, which can make importing data particularly difficult. In order to prepare for as many scenarios as possible, 'simpleSCDE' includes functions for reading CSV, text, and Excel files. Below are examples of how to use all three functions:

muscle_csv <- import_csv(file_path = "C:/Users/exampleUser/Desktop/Fincher_muscle.csv", colnames = TRUE, rownames = TRUE)
muscle_txt <- import_txt(file_path = "C:/Users/exampleUser/Desktop/Fincher_muscle.txt", colnames = TRUE, rownames = TRUE)
muscle_excel <- import_excel(file_path = "C:/Users/exampleUser/Desktop/Fincher_muscle.xlsx", sheet_number = 2, sheet_name = "LatMuscle", row_start = 1, row_stop = 3250, col_start = 5, col_stop = 55, colnames = FALSE, rownames = FALSE)

In the above examples, the data returned by import_csv, import_txt, and import_excel is assigned to the user-defined variables muscle_csv, muscle_txt, and muscle_excel. The result of an import function must be assigned to a variable in this way, otherwise the imported data is not stored.
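
To get a feel for how header and row-name handling affects an imported table, the following base R sketch may help. It does not use simpleSCDE. It assumes that colnames = TRUE and rownames = TRUE correspond to treating the first line as cell names and the first column as gene names, which is an assumption about the package rather than something stated in this vignette. The file contents and names are made up for illustration.

```r
# Write a tiny, made-up expression table to a temporary CSV file.
# Rows are genes, columns are cells (hypothetical names).
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "gene,Cells_Trunk_1,Cells_Trunk_2,Cells_Head_1",
  "geneA,0,3,1",
  "geneB,5,0,2"
), tmp_csv)

# Assumed base R analogue of importing with column and row names:
# header = TRUE uses the first line as column names,
# row.names = 1 uses the first column as row names.
toy_counts <- read.csv(tmp_csv, header = TRUE, row.names = 1)

dim(toy_counts)       # 2 genes by 3 cells
colnames(toy_counts)  # cell names
rownames(toy_counts)  # gene names
```

Checking dim(), colnames(), and rownames() right after an import is a quick way to confirm that names were not read in as data.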

To see the details of a function in R, type a ? before the function name in the console (for example, ?import_excel). The help page gives a description of the function's purpose, a list of its parameters, and often some usage examples.

For the purposes of this vignette, we are going to use data that is bundled with the 'simpleSCDE' package itself. While there are a few different example datasets included in the 'simpleSCDE' package, we are going to use a dataset of planarian muscle cells that was originally created by Fincher et al.:

data("Fincher_muscle")

Manipulating Data Frames

Prior to SCDE testing, any data you may be interested in using must be properly wrangled and isolated. Flexible functions for column selection and row selection are included in the 'simpleSCDE' package. Below, we will use the column selection function to isolate trunk muscle cells from our current data frame of planarian muscle cells:

# defining a new variable trunk_cells to store the data frame of trunk muscle cells selected from all muscle cells
trunk_cells <- colselect(df = Fincher_muscle, colchars = "Cells_Trunk", complete_names = FALSE)

The above code identified trunk cells based on whether or not their names (which are found in colnames(Fincher_muscle)) contained the string "Cells_Trunk". In order to identify cells using a fragment of their cell name, complete_names must be equal to FALSE. If you want to identify cells using their full name(s) only, you can set complete_names equal to TRUE and enter one or more complete cell names in the form of a vector.
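
The difference between fragment matching and full-name matching can be illustrated in base R. This is a minimal sketch on a made-up data frame, not the implementation of colselect; it only mirrors the behaviour described above.

```r
# Toy data frame: rows are genes, columns are cells (made-up names and values).
toy <- data.frame(
  Cells_Trunk_1 = c(0, 5),
  Cells_Trunk_2 = c(3, 0),
  Cells_Head_1  = c(1, 2),
  row.names = c("geneA", "geneB")
)

# Fragment matching (analogous to complete_names = FALSE):
# keep every column whose name contains the fragment.
partial <- toy[, grepl("Cells_Trunk", colnames(toy), fixed = TRUE), drop = FALSE]

# Full-name matching (analogous to complete_names = TRUE):
# keep only columns whose complete name appears in the supplied vector.
wanted <- c("Cells_Trunk_2", "Cells_Head_1")
complete <- toy[, colnames(toy) %in% wanted, drop = FALSE]

colnames(partial)   # "Cells_Trunk_1" "Cells_Trunk_2"
colnames(complete)  # "Cells_Trunk_2" "Cells_Head_1"
```

Using drop = FALSE keeps the result a data frame even when only one column matches.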

After selecting the desired trunk cells, it is a good idea to eliminate any possible duplicate cells (columns) or genes (rows) from your data frame:

# reassign trunk_cells to a copy of the data frame with duplicate cells (columns) removed
trunk_cells <- delete_duplicate_col(trunk_cells)
# reassign trunk_cells to a copy of the data frame with duplicate genes (rows) removed
trunk_cells <- delete_duplicate_rows(trunk_cells)
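
As a rough illustration of what removing duplicates can mean, the base R sketch below drops columns and rows whose values repeat an earlier column or row. Treating "duplicate" as "identical values" is an assumption made for this example; the package functions may define duplicates differently (for instance, by name). The data are made up.

```r
# Toy data frame with one duplicated cell (column) and one duplicated gene (row).
toy <- data.frame(
  cell1 = c(1, 0, 1),
  cell2 = c(2, 4, 2),
  cell3 = c(1, 0, 1),                       # same values as cell1
  row.names = c("geneA", "geneB", "geneC")  # geneC repeats geneA's values
)

# Drop columns whose values repeat an earlier column.
no_dup_cols <- toy[, !duplicated(as.list(toy)), drop = FALSE]

# Drop rows whose values repeat an earlier row.
no_dup_rows <- no_dup_cols[!duplicated(no_dup_cols), , drop = FALSE]

dim(toy)          # 3 genes by 3 cells before
dim(no_dup_rows)  # 2 genes by 2 cells after
```

Comparing dim() before and after shows how many cells or genes were removed.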

Labeling Cell Groups

Now that all preferred cells are chosen and you are sure that there are no duplicate cells or genes present in the data frame, the next step is to create group labels that the SCDE test will compare. The simpleSCDE package includes five methods for cell group labeling:

pos_neg.label (positive and negative cell group based on the expression of a single gene)
pospos_neg.label (positive and negative cell group based on the expression of two specific genes)
pos_pos.label (two cell groups that are each positive for the expression of different single genes)
pospos_pospos.label (two cell groups that are each positive for the expression of different sets of two genes)
label_by_colnumbs (define two cell groups based on column numbers)

For the purpose of this vignette, we are going to use pos_neg.label to label our trunk cells positive or negative for the expression of slit (dd_Smed_v4_12111_0_1):

# assign vector containing positive and negative labels to slit_labels
slit_labels <- pos_neg.label(df = trunk_cells, rowname = "dd_Smed_v4_12111_0_1", x = 5, x.limit = 0, posname = "slit_positive", negname = "slit_negative")
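
The idea behind positive/negative labeling can be sketched in base R. The helper below, label_by_gene, is hypothetical and is not part of simpleSCDE; it uses a single made-up threshold parameter and does not model the x and x.limit arguments of pos_neg.label, whose exact meaning is documented in ?pos_neg.label. The check on the gene name is included because a mistyped gene identifier is an easy error to make.

```r
# Toy expression table: rows are genes, columns are cells (made-up values).
toy <- data.frame(
  cell1 = c(4, 0),
  cell2 = c(0, 7),
  cell3 = c(2, 1),
  row.names = c("geneA", "geneB")
)

# Hypothetical rule: a cell is "positive" if its count for the gene
# exceeds `threshold`, otherwise it is "negative".
label_by_gene <- function(df, gene, threshold = 0,
                          posname = "positive", negname = "negative") {
  stopifnot(gene %in% rownames(df))   # catches mistyped gene identifiers
  expr <- unlist(df[gene, ])          # named vector: one value per cell
  labels <- ifelse(expr > threshold, posname, negname)
  factor(labels, levels = c(posname, negname))
}

geneA_labels <- label_by_gene(toy, "geneA",
                              posname = "geneA_positive",
                              negname = "geneA_negative")
table(geneA_labels)  # number of cells in each group
```

Tabulating the labels before running a test is a simple way to confirm that both groups contain cells.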

If you would like to see descriptions or parameters of any of the functions above, remember that the ? symbol can be used in the console in front of any function name to see the details of that specific function.

Running SCDE Test

Now that your data frame and labels have been prepared, you are ready to run the SCDE test. The run_SCDE function includes the following parameters:

df (data frame)
labels (cell group labels)
write_to_txt (boolean determining if results should be written to an external text file)
file_name (name of file to write results to if write_to_txt = TRUE)
ncores (number of cores to use during test)
min_genes_detected (minimum number of genes detected in a cell)
min_cells_gene_in (minimum number of cells a gene must be seen in)
min_reads_per_gene (minimum number of reads per gene)

When using run_SCDE, errors may arise involving the final three parameters mentioned above. These parameters control which cells and genes are filtered out of your data, and it is easy to set these values too high, eliminating all cells or genes.
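
To see how thresholds like these can empty a dataset, consider the base R sketch below. The function preview_filter is hypothetical: it interprets the three thresholds in one plausible way and is not the filtering code used by run_SCDE. The counts are made up.

```r
# Toy count matrix: rows are genes, columns are cells (made-up values).
counts <- matrix(
  c(0, 2, 0,
    5, 1, 3,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2", "cell3"))
)

# Hypothetical filter in the spirit of the three thresholds described above.
preview_filter <- function(counts, min_genes_detected,
                           min_cells_gene_in, min_reads_per_gene) {
  # keep genes with enough total reads that are seen in enough cells
  keep_genes <- rowSums(counts) >= min_reads_per_gene &
    rowSums(counts > 0) >= min_cells_gene_in
  counts <- counts[keep_genes, , drop = FALSE]
  # keep cells in which enough genes are detected
  keep_cells <- colSums(counts > 0) >= min_genes_detected
  counts[, keep_cells, drop = FALSE]
}

# Lenient thresholds: nothing is removed (3 genes, 3 cells).
dim(preview_filter(counts, min_genes_detected = 1,
                   min_cells_gene_in = 1, min_reads_per_gene = 1))

# min_genes_detected exceeds the number of genes: every cell is removed (3 genes, 0 cells).
dim(preview_filter(counts, min_genes_detected = 5,
                   min_cells_gene_in = 1, min_reads_per_gene = 1))
```

Running a preview of this kind on a small subset of your data can reveal a threshold that is too strict before a long test is started.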

The following code will run SCDE Testing on our trunk muscle cells:

# results of our test will be saved to slit_results
slit_results <- run_SCDE(df = trunk_cells, labels = slit_labels, write_to_txt = TRUE, file_name = "slitTestResults.txt", min_genes_detected = 5, min_cells_gene_in = 1, min_reads_per_gene = 1)

The code above will save test results to slit_results and also create a text file ("slitTestResults.txt", in this case) containing the same test results. Note that this step can take a very long time if your data frame is large or if your computer is slow.
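
Because run time depends partly on how many cores are used, it can be useful to check how many cores your machine reports before choosing a value for the ncores parameter. The sketch below uses the parallel package that ships with R. Leaving one core free is only a common convention, not a recommendation from the package authors.

```r
# detectCores() can return NA on some platforms, so fall back to 1.
available <- parallel::detectCores()
n_cores <- if (is.na(available)) 1L else max(1L, available - 1L)
n_cores  # a candidate value for ncores
```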

Exporting Data

Once the SCDE test is complete, you may want to export your results or any other relevant data to other file types on your computer. The simpleSCDE package includes export options for CSV, text, and Excel file types. Examples of how to use each of these functions are included below:

export_csv(data = Fincher_muscle, file_path = "C:/Users/exampleUser/Desktop/resultsSCDE.csv", colnames = FALSE, rownames = FALSE)
export_txt(data = Fincher_muscle, file_path = "C:/Users/exampleUser/Desktop/resultsSCDE.txt", colnames = TRUE, rownames = FALSE)
export_excel(data = Fincher_muscle, file_path = "C:/Users/exampleUser/Desktop/resultsSCDE.xlsx", colnames = TRUE, rownames = TRUE, name_sheet = "Fincher_muscle", add_to_file = TRUE)
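
The effect of keeping or dropping column and row names on export can be previewed with base R. This sketch does not use the simpleSCDE export functions; it assumes that colnames = FALSE and rownames = FALSE mean the names are left out of the file, which should be confirmed with ?export_csv. The data are made up.

```r
# Toy data frame: rows are genes, columns are cells (made-up values).
toy <- data.frame(cell1 = c(4, 0), cell2 = c(0, 7),
                  row.names = c("geneA", "geneB"))

with_names <- tempfile(fileext = ".csv")
without_names <- tempfile(fileext = ".csv")

# Keep both the header and the gene names
# (col.names = NA writes an empty header cell above the row names).
write.table(toy, with_names, sep = ",", col.names = NA,
            row.names = TRUE, quote = FALSE)

# Drop both: only the values are written.
write.table(toy, without_names, sep = ",", col.names = FALSE,
            row.names = FALSE, quote = FALSE)

readLines(with_names)     # header line plus one line per gene
readLines(without_names)  # values only; cell and gene names are lost
```

If the exported file will be read back in for further analysis, keeping both sets of names avoids losing the cell and gene identifiers.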

Issues or Questions?

Any issues or questions regarding the process detailed above or the simpleSCDE package in general should be reported to https://github.com/adamnc2/simpleSCDE/issues.

adamnc2/simpleSCDE documentation built on May 7, 2019, 7:40 a.m.
