examine_dataset: Generate several plots and objects for dataset exploration
In robAndrewCarter/rnaseqUtils: Utility Functions for RNASeq Data Analysis

This function examines several facets of the dataset, including filtering cells based on metadata, dimension reduction, cluster analysis, and visualization.


examine_dataset(cell_data_dataframe = data.frame(),
  normalized_expression_matrix = matrix(), .species = "mmusculus",
  color_var = character())

`cell_data_dataframe:`	a data.frame containing the sample name of each cell to be examined, plus any other data that might be used for visualizing the resulting cells. It must have a character column named 'sample_name' whose values match the row names of normalized_expression_matrix.
`normalized_expression_matrix:`	a C x G matrix with C cells and G genes. The row names are the cells and must match the sample_name column of cell_data_dataframe. The column names must be ENSEMBL IDs. The entries must be normalized, since they are used as input to tsne.
`.species:`	currently either 'mmusculus' or 'hsapiens'
`color_var:`	variable name to color cells by in overview ggplot images. This must correspond to a column name in cell_data_dataframe.
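
The argument requirements above can be checked before calling the function. The following is a minimal sketch in base R, not part of rnaseqUtils: `check_examine_inputs` is a hypothetical helper, the toy data are made up, and the ENSEMBL pattern is an assumption that only accepts unversioned gene IDs such as ENSMUSG... or ENSG... It also assumes that "match" means the same set of names, not necessarily the same order.

```r
# Toy inputs; all values are made up for illustration.
set.seed(1)
cell_names <- paste0("cell_", 1:6)
gene_ids <- c("ENSMUSG00000000001", "ENSMUSG00000000028", "ENSMUSG00000000037")

cell_data_dataframe <- data.frame(
  sample_name = cell_names,
  batch = rep(c("a", "b"), each = 3),
  stringsAsFactors = FALSE
)

# C x G matrix: cells as row names, ENSEMBL IDs as column names.
normalized_expression_matrix <- matrix(
  rexp(length(cell_names) * length(gene_ids)),
  nrow = length(cell_names),
  dimnames = list(cell_names, gene_ids)
)

color_var <- "batch"

# Hypothetical helper (not part of rnaseqUtils): stops with an error if any
# documented requirement is violated, otherwise returns TRUE invisibly.
check_examine_inputs <- function(cell_df, expr_mat, color_var) {
  stopifnot(
    is.data.frame(cell_df),
    "sample_name" %in% colnames(cell_df),
    is.character(cell_df$sample_name),
    is.matrix(expr_mat),
    !is.null(rownames(expr_mat)),
    # Assumption: matching is by set of names, order is not checked.
    setequal(cell_df$sample_name, rownames(expr_mat)),
    # Assumption: unversioned ENSEMBL gene IDs only.
    all(grepl("^ENS[A-Z]*G[0-9]+$", colnames(expr_mat))),
    color_var %in% colnames(cell_df)
  )
  invisible(TRUE)
}

inputs_ok <- check_examine_inputs(
  cell_data_dataframe, normalized_expression_matrix, color_var
)

# With the package installed, the call would then follow the usage shown above:
# results <- examine_dataset(
#   cell_data_dataframe = cell_data_dataframe,
#   normalized_expression_matrix = normalized_expression_matrix,
#   .species = "mmusculus",
#   color_var = color_var
# )
```

A toy matrix this small is only useful for checking input shapes; the perplexities listed in the return value would presumably need many more cells.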

This function returns a list with the following elements:

top_1000_mat: a matrix of the top 1000 overdispersed genes
tsne_results: a list of Rtsne outputs from the filtered matrix, each from a different perplexity (15, 30, 45, 60)
n50_enriched_genes_by_cluster: a dataframe showing the top 50 enriched genes in each cluster. Enrichment is based on proportion of cells expressing a gene
updated_metadata_df: the original cell_data_dataframe annotated with cluster_id
clusters_scatterplot: a scatterplot coloring the cells by cluster_id
important_genes_heatmap: a heatmap plot showing the enriched genes in each cluster
overview_plots: a list of plots, one per tsne run, colored by color_var
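
To illustrate what "enrichment based on proportion of cells expressing a gene" could mean, here is a small base R sketch. It is not the package's implementation: the toy data, the definition of "expressing" as a non-zero entry, and the score (in-cluster proportion minus the proportion among all other cells) are assumptions chosen for illustration.

```r
# Toy data: 6 cells x 3 genes, two clusters. Values are made up.
expr <- matrix(
  c(2, 0, 0,
    3, 0, 1,
    1, 0, 0,
    0, 4, 0,
    0, 2, 0,
    0, 1, 3),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("cell_", 1:6), c("geneA", "geneB", "geneC"))
)
cluster_id <- c(1, 1, 1, 2, 2, 2)

# Assumption: a cell "expresses" a gene when its entry is non-zero.
expressed <- expr > 0

# Proportion of cells expressing each gene, per cluster (clusters in rows).
n_cells <- table(cluster_id)
prop_by_cluster <- rowsum(expressed * 1, group = cluster_id) / as.vector(n_cells)

# Assumed score: in-cluster proportion minus proportion in all other cells.
total_expressing <- colSums(expressed)
n_top <- 2  # the function documented here reports the top 50
top_genes <- lapply(rownames(prop_by_cluster), function(k) {
  in_prop <- prop_by_cluster[k, ]
  n_in <- as.vector(n_cells[k])
  out_prop <- (total_expressing - in_prop * n_in) / (nrow(expr) - n_in)
  head(names(sort(in_prop - out_prop, decreasing = TRUE)), n_top)
})
names(top_genes) <- rownames(prop_by_cluster)
```

Here `top_genes` is a list with one character vector of gene names per cluster, ordered from most to least enriched under the assumed score.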