This vignette will show you how to import, prepare, and export data related to SCDE analysis with the simpleSCDE
package.
In order to begin working with the simpleSCDE
package, the package must be uploaded from github. The following code will upload the package directly:
# load devtools and use it to pull simpleSCDE from github library(devtools) devtools::install_github("adamnc2/simpleSCDE", bulid_vignettes = FALSE)
Once the package has been imported from github, the following code should been run to load the package into your R session:
library("simpleSCDE", lib.loc="~/R/win-library/3.5")
If you are working in RStudio (which is recommended), the above step can also be achieved by going into the Packages tab and checking the small box that appears next to the simpleSCDE
package.
The format and size of the data used in SCDE analysis can vary greatly, making mining data particularly difficult. In order to prepare for as many scenarios as possible, 'simpleSCDE' includes methods for reading CSV, text, and excel files. Below are examples of how to use all three functions:
muscle_csv <- import_csv(file_path = "C:/Users/exampleUser/Desktop/Fincher_muscle.xlsx", colnames = TRUE, rownames = TRUE)
muscle_txt <- import_txt(file_path = "C:/Users/exampleUser/Desktop/Fincher_muscle.xlsx", colnames = TRUE, rownames = TRUE)
muscle_excel <- import_excel(file_path = "C:/Users/exampleUser/Desktop/Fincher_muscle.xlsx", sheet_number = 2, sheet_name = "LatMuscle", row_start = 1, row_stop = 3250, col_start = 5, col_stop = 55, colnames = FALSE, rownames = FALSE)
In the above examples, the imported data is being assigned to the values import_csv
, import_txt
, and import_excel
. In order to store the data that is being imported, import functions must be assigned to user-defined variables. In this case, the user-defined variables used are muscle_csv
, muscle_txt
, and muscle_excel
.
To see the details of a function in R, a ?
can be written prior to a function name to see a description of its purpose, a list of its parameters, and often some examples of it being used. (ex: ?import_excel
)
For the purposes of this vignette, we are going to use data that has been attached to thr 'simpleSCDE' package itself. While there are a few different example datasets included in the 'simpleSCDE' package, we are going to use a list of planarian muscle cells that was originally created by Fincher et al.:
data("Fincher_muscle")
Prior to SCDE testing, any data you may be interested in using must be properly wrangled and isolated. Flexible functions for column selection and row selection are included in the 'simpleSCDE' package. Below, we will use the column selection function to isolate trunk muscle cells from our current data frame of planarian muscle cells:
# defining a new variable trunk_cells to store the data frame of trunk muscle cells selected from all muscle cells trunk_cells <- colselect(df = Fincher_muscle, colchars = "Cells_Trunk", complete_names = FALSE)
The above code identified trunk cells based on whether or not their names (which are found in colnames(Fincher_muscle)
) contained the string "Cells_Trunk". In order to identify cells using a fragment of their cell name, complete_names
must be equal to FALSE. If you want to identify cells using their full name(s) only, you can set complete_names
equal to true and enter one or more complete cell names in the form of a vector.
After selecting the desired trunk cells, it is a good idea to elminate any possible duplicate cells(columns) or genes(rows) from your data frame:
# assign trunk_cells to the data frame created when duplicate cells(columns) are deleted from trunk_cells trunk_cells <- delete_duplicate_col(trunk_cells) # assign trunk_cells to the data frame created when duplicate genes(rows) are deleted from trunk_cells trunk_cells <- delete_duplicate_rows(trunk_cells)
Now that all preferred cells are chosen and you are sure that there are no duplicate cells or genes present in the data frame, the next step is to create group labels that the SCDE test will compare. The simpleSCDE
package includes five methods for cell group labeling:
pos_neg.label
(positive and negative cell group based on the expression of a single gene)pospos_neg.label
(positive and negative cell group based on the expression of two specific genes)pos_pos.label
(two cell groups that are each positive for the expression of different single genes)pospos_pospos.label
(two cell groups that are each positive for the expression of different sets of two genes)label_by_colnumbs
(define two cell groups based on column numbers)For the purpose of this vignette, we are going to use pos_neg.label
to label our trunk cells positive or negative for the expression of slit(dd_Smed_v4_1211_0_1):
# assign vector containing positive and negative labels to slit_labels slit_labels <- pos_neg.label(df = trunk_cells, rowname = "dd_Smed_v4_12111_0_1", x = 5, x.limit = 0, posname = "slit_positive", negname = "slit_negative")
If you would like to see descriptions or parameters of any of the functions above, remember that the ?
symbol can be used in the console in front of any function name to see the details of that specific function.
Now that your data frame and labels have been prepared, you are ready to run the SCDE test. The run_SCDE
function includes the following parameters:
df
(data frame)labels
(cell group labels)write_to_txt
(boolean determining if results should be written to an external text file)file_name
(name of file to write results to if write_to_txt = TRUE
)ncores
(number of cores to use during test)min_genes_detected
(minimum number of genes detected in a cell)min_cells_gene_in
(minimum number of cells a gene must be seen in)min_reads_per_gene
(minimum number of reads per gene)When using run_SCDE
, errors may arise involving the final three parameters mentioned above. These parameters control what cells and genes are filtered out of your data, and it is eay to set these values too high, eliminating all of cells or genes.
The following code will run SCDE Testing on our trunk muscle cells:
# results of our test will be saved to slit_results slit_results <- run_SCDE(df = trunk_cells, labels = slit_labels, write_to_txt = TRUE, file_name = "slitTestResults.txt", min_genes_detected = 5, min_cells_gene_in = 1, min_reads_per_gene = 1)
The code above will save test results to slit_results
and also create a text file("slitTestResults.txt", in this case) containing the same test results. Note that this portion of the process can take a very long time if your data frame is large or if your computer is slow.
Once the SCDE test is complete, you may want to export your results or any other relevant data to other file types on your computer. The simpleSCDE
package includes export options for CSV, text, and excel file types. Examples of how to use each of these functions are included below:
export_csv(data = Fincher_muscle, file_path = "C:/Users/exampleUser/Desktop/resultsSCDE.csv", colnames = FALSE, rownames = FALSE)
export_txt(data = Fincher_muscle, file_path = "C:/Users/exampleUser/Desktop/resultsSCDE.txt", colnames = TRUE, rownames = FALSE)
export_excel(data = fincher_muscle, file_path = "C:/Users/exampleUser/Desktop/resultsSCDE.xlsx", colnames = TRUE, rownames = TRUE, name_sheet = "Fincher_muscle", add_to_file = TRUE)
Any issues or questions regarding the process detailed above or the simpleSCDE
package in general should be reported to https://github.com/adamnc2/simpleSCDE/issues.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.