create_expt: Wrap Bioconductor's ExpressionSet to include some extra information.
In elsayed-lab/hpgltools: A pile of (hopefully) useful R functions

create_expt

R Documentation

Wrap Bioconductor's ExpressionSet to include some extra information.

Description

Note: You should just be using create_se(). It does everything the expt does, but better.

Usage

create_expt(
  metadata = NULL,
  gene_info = NULL,
  count_dataframe = NULL,
  sanitize_rownames = TRUE,
  sample_colors = NULL,
  title = NULL,
  notes = NULL,
  include_type = "all",
  countdir = NULL,
  include_gff = NULL,
  file_column = "file",
  id_column = NULL,
  savefile = NULL,
  low_files = FALSE,
  handle_na = "drop",
  researcher = "elsayed",
  study_name = NULL,
  file_type = NULL,
  annotation_name = "org.Hs.eg.db",
  tx_gene_map = NULL,
  feature_type = "gene",
  ignore_tx_version = TRUE,
  ...
)

Arguments

`metadata`	Comma-separated file (or Excel spreadsheet) describing the samples, with information such as condition, batch, count_filename, etc.
`gene_info`	Annotation information describing the rows of the data set; this often comes from a call to import.gff(), biomart, or organismdbi.
`count_dataframe`	If one does not wish to read the count tables from the filesystem, they may instead be fed as a data frame here.
`sanitize_rownames`	Clean up weirdly written gene IDs?
`sample_colors`	List of colors by condition; if not provided, the function generates its own colors using colorBrewer.
`title`	Provide a title for the expt?
`notes`	Additional notes?
`include_type`	I have usually assumed that all gff annotations should be used, but that is not always true; this allows one to limit to a specific annotation type.
`countdir`	Directory containing count tables.
`include_gff`	Gff file to help in sorting which features to keep.
`file_column`	Column of the metadata containing the count table filenames; see Details.
`id_column`	Column which contains the sample IDs.
`savefile`	Rdata filename prefix for saving the data of the resulting expt.
`low_files`	Explicitly lowercase the filenames when searching the filesystem?
`handle_na`	How does one wish to deal with NA values in the data?
`researcher`	Used to make the creation of gene sets easier, set the researcher tag.
`study_name`	As with researcher, but sets the study tag.
`file_type`	Explicitly state the type of files containing the count data. I have code which autodetects the method used to import count data; this short-circuits it.
`annotation_name`	As with researcher and study_name, but sets the orgdb (or other annotation) instance.
`tx_gene_map`	Dataframe of transcripts to genes, primarily for tools like salmon.
`feature_type`	Make explicit the type of feature used so it may be printed later.
`...`	More parameters are fun!

Details

The primary innovation of this function is that it will check the metadata for columns containing filenames for the count tables, thus hopefully making the collation and care of metadata/counts easier. For example, I have some data which has been mapped against multiple species. I can use this function and just change the file_column argument to pick up each species' tables.
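
As a rough illustration of the multi-species case, the sketch below builds a small, entirely hypothetical metadata table with one filename column per species and shows how choosing a column selects one set of count tables. The column names (hg38_file, lmajor_file), the file paths, and the helper pick_count_files() are assumptions made for illustration only; they are not part of hpgltools, and create_expt() may resolve filenames differently internally.

```r
## Hypothetical metadata: one row per sample, one filename column per species.
meta <- data.frame(
  sampleid = c("s01", "s02", "s03"),
  condition = c("control", "infected", "infected"),
  hg38_file = c("counts/s01_hg38.tsv", "counts/s02_hg38.tsv",
                "counts/s03_hg38.tsv"),
  lmajor_file = c("counts/s01_lmajor.tsv", "counts/s02_lmajor.tsv",
                  "counts/s03_lmajor.tsv"),
  stringsAsFactors = FALSE
)
rownames(meta) <- meta[["sampleid"]]

## Illustrative helper (not an hpgltools function): return the count table
## paths, named by sample ID, for the chosen metadata column.
pick_count_files <- function(metadata, file_column = "file") {
  if (!file_column %in% colnames(metadata)) {
    stop("The metadata has no column named: ", file_column)
  }
  files <- metadata[[file_column]]
  names(files) <- rownames(metadata)
  files
}

human_files <- pick_count_files(meta, file_column = "hg38_file")
parasite_files <- pick_count_files(meta, file_column = "lmajor_file")
print(human_files)
print(parasite_files)

## With real count tables on disk, the corresponding create_expt() calls would
## differ only in file_column (each would normally also get its own gene_info):
## hs_expt <- create_expt(metadata = meta, file_column = "hg38_file")
## lm_expt <- create_expt(metadata = meta, file_column = "lmajor_file")
```

Keeping every species' filenames in the same metadata table means the sample descriptions (condition, batch, and so on) are written once and reused for each mapping.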

Value

An expt: the experiment, wrapped around a Bioconductor ExpressionSet.

Examples

 cdm_expt_rda <- system.file("share", "cdm_expt.rda", package = "hpgldata")
 load(file = cdm_expt_rda)
 head(cdm_counts)
 head(cdm_metadata)
 ## The gff file has differently labeled locus tags than the count tables, also
 ## the naming standard changed since this experiment was performed, therefore I
 ## downloaded a new gff file.
 example_gff <- system.file("share", "gas.gff", package = "hpgldata")
 gas_gff_annot <- load_gff_annotations(example_gff)
 rownames(gas_gff_annot) <- make.names(gsub(pattern = "(Spy)_", replacement = "\\1",
                                            x = gas_gff_annot[["locus_tag"]]), unique = TRUE)
 mgas_expt <- create_expt(metadata = cdm_metadata, gene_info = gas_gff_annot,
                          count_dataframe = cdm_counts)
 head(pData(mgas_expt))
 ## An example using count tables referenced in the metadata.
 sb_annot <- system.file("share", "sb", "trinotate_head.csv.xz", package = "hpgldata")
 sb_annot <- load_trinotate_annotations(trinotate = sb_annot)
 sb_annot <- as.data.frame(sb_annot)
 rownames(sb_annot) <- make.names(sb_annot[["transcript_id"]], unique = TRUE)
 sb_annot[["rownames"]] <- NULL
 sb_data <- system.file("share", "sb", "preprocessing.tar.xz", package = "hpgldata")
 untarred <- utils::untar(tarfile = sb_data)
 sb_expt <- create_expt(metadata = "preprocessing/kept_samples.xlsx",
                        gene_info = sb_annot)
 dim(exprs(sb_expt))
 dim(fData(sb_expt))
 pData(sb_expt)
 ## There are lots of other ways to use this, for example:
 ## Not run: 
  new_experiment <- create_expt(metadata = "some_csv_file.csv", gene_info = gene_df)
  ## Remember that this depends on an existing data structure of gene annotations.
  meta <- extract_metadata("some_supplementary_materials_xls_file_I_downloaded.xls")
  another_expt <- create_expt(metadata = meta, gene_info = annotations,
                              count_dataframe = df_I_downloaded)
 
## End(Not run)

elsayed-lab/hpgltools documentation built on May 9, 2024, 5:02 a.m.