create_expt: Wrap bioconductor's expressionset to include some extra...

View source: R/expt.R

create_expt R Documentation

Wrap Bioconductor's ExpressionSet to include some extra information.

Description

Note: You should just be using create_se(). It does everything the expt does, but better.

Usage

create_expt(
  metadata = NULL,
  gene_info = NULL,
  count_dataframe = NULL,
  sanitize_rownames = TRUE,
  sample_colors = NULL,
  title = NULL,
  notes = NULL,
  include_type = "all",
  countdir = NULL,
  include_gff = NULL,
  file_column = "file",
  id_column = NULL,
  savefile = NULL,
  low_files = FALSE,
  handle_na = "drop",
  researcher = "elsayed",
  study_name = NULL,
  file_type = NULL,
  annotation_name = "org.Hs.eg.db",
  tx_gene_map = NULL,
  feature_type = "gene",
  ignore_tx_version = TRUE,
  ...
)

Arguments

metadata

Comma-separated file (or Excel spreadsheet) describing the samples, with information such as condition, batch, count_filename, etc. An already-loaded data frame also works, as in the Examples.

gene_info

Annotation information describing the rows of the data set; this often comes from a call to import.gff(), biomart, or organismdbi.

count_dataframe

If one does not wish to read the count tables from the filesystem, they may instead be fed as a data frame here.

sanitize_rownames

Clean up weirdly written gene IDs?

sample_colors

List of colors by condition. If not provided, the function generates its own colors using colorBrewer.

title

Provide a title for the expt?

notes

Additional notes?

include_type

I have usually assumed that all gff annotations should be used, but that is not always true. This argument allows one to limit the data to a specific annotation type.

countdir

Directory containing count tables.

include_gff

Gff file to help in sorting which features to keep.

file_column

Column of the metadata containing the filenames of the count tables (see Details).

id_column

Column which contains the sample IDs.

savefile

Rdata filename prefix for saving the data of the resulting expt.

low_files

Explicitly lowercase the filenames when searching the filesystem?

handle_na

How does one wish to deal with NA values in the data?

researcher

Used to make the creation of gene sets easier, set the researcher tag.

study_name

As with researcher, but sets the study tag.

file_type

Explicitly state the type of files containing the count data. By default the method used to import the count data is autodetected; setting this short-circuits the detection.

annotation_name

As with researcher, but sets the orgdb (or other annotation) instance.

tx_gene_map

Dataframe of transcripts to genes, primarily for tools like salmon.

feature_type

Make explicit the type of feature used so it may be printed later.

...

More parameters are fun!
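
As a hedged illustration of the count_dataframe and tx_gene_map arguments above, the following base R sketch builds small stand-in tables. Every name and value here (counts, sample_sheet, the gene and transcript IDs) is invented. The features-by-samples layout and the two-column transcript/gene layout (borrowed from tximport's tx2gene convention) are assumptions about what create_expt() expects, not something this page states.

```r
## Hypothetical count table: rows are features, columns are samples.
counts <- data.frame(
  s01 = c(10L, 0L, 25L),
  s02 = c(12L, 3L, 30L),
  row.names = c("gene_a", "gene_b", "gene_c"))

## Hypothetical sample sheet whose row names match the count columns.
sample_sheet <- data.frame(
  sampleid = c("s01", "s02"),
  condition = c("control", "treated"),
  batch = c("a", "a"),
  row.names = c("s01", "s02"),
  stringsAsFactors = FALSE)

## Check that the two tables agree before handing them to anything else.
stopifnot(identical(colnames(counts), rownames(sample_sheet)))

## Hypothetical transcript-to-gene map: transcript IDs first, gene IDs second
## (assumed layout, following tximport's tx2gene argument).
tx_gene_map <- data.frame(
  transcript = c("tx1.1", "tx2.3", "tx3.1"),
  gene = c("gene_a", "gene_a", "gene_b"),
  stringsAsFactors = FALSE)

## Transcript IDs with the trailing version suffix removed, which is the kind
## of mismatch that ignore_tx_version is presumably meant to paper over.
unversioned <- sub("\\.[0-9]+$", "", tx_gene_map[["transcript"]])
```

With tables like these in hand, a call along the lines of create_expt(metadata = sample_sheet, count_dataframe = counts, gene_info = some_annotation) would mirror the first example in the Examples section; some_annotation is a placeholder for a real annotation data frame.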

Details

The primary innovation of this function is that it will check the metadata for columns containing filenames for the count tables, thus hopefully making the collation and care of metadata/counts easier. For example, I have some data which has been mapped against multiple species. I can use this function and just change the file_column argument to pick up each species' tables.
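
The multi-species case described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the column names hsapiens_file and lmajor_file, the sample IDs, the paths, and the helper files_for() are all invented, and the create_expt() calls are left as comments because they need real count tables on disk.

```r
## Hypothetical metadata: one row per sample, one filename column per species.
sample_ids <- c("s01", "s02", "s03")
meta <- data.frame(
  sampleid = sample_ids,
  condition = c("uninfected", "infected", "infected"),
  batch = c("a", "a", "b"),
  hsapiens_file = file.path("preprocessing", sample_ids, "hsapiens.count.xz"),
  lmajor_file = file.path("preprocessing", sample_ids, "lmajor.count.xz"),
  stringsAsFactors = FALSE)
rownames(meta) <- meta[["sampleid"]]

## Hypothetical helper: list the count tables a given file_column points at.
files_for <- function(metadata, file_column) {
  stopifnot(file_column %in% colnames(metadata))
  setNames(metadata[[file_column]], rownames(metadata))
}
hs_files <- files_for(meta, "hsapiens_file")
lm_files <- files_for(meta, "lmajor_file")

## With real count tables, only file_column changes between species
## (hs_annot and lm_annot are assumed annotation data frames):
## hs_expt <- create_expt(metadata = meta, gene_info = hs_annot,
##                        file_column = "hsapiens_file")
## lm_expt <- create_expt(metadata = meta, gene_info = lm_annot,
##                        file_column = "lmajor_file")
```

Keeping both filename columns in a single sample sheet means the condition and batch annotations are written once and shared by every mapping.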

Value

The experiment: an expressionset wrapped with the extra information.

See Also

[Biobase] [cdm_expt_rda] [example_gff] [sb_annot] [sb_data] [extract_metadata()] [set_expt_conditions()] [set_expt_batches()] [set_expt_samplenames()] [subset_expt()] [set_expt_colors()] [set_expt_genenames()] [tximport] [load_annotations()]

Examples

 cdm_expt_rda <- system.file("share", "cdm_expt.rda", package = "hpgldata")
 load(file = cdm_expt_rda)
 head(cdm_counts)
 head(cdm_metadata)
 ## The gff file has differently labeled locus tags than the count tables, also
 ## the naming standard changed since this experiment was performed, therefore I
 ## downloaded a new gff file.
 example_gff <- system.file("share", "gas.gff", package = "hpgldata")
 gas_gff_annot <- load_gff_annotations(example_gff)
 rownames(gas_gff_annot) <- make.names(gsub(pattern = "(Spy)_", replacement = "\\1",
                                            x = gas_gff_annot[["locus_tag"]]), unique = TRUE)
 mgas_expt <- create_expt(metadata = cdm_metadata, gene_info = gas_gff_annot,
                          count_dataframe = cdm_counts)
 head(pData(mgas_expt))
 ## An example using count tables referenced in the metadata.
 sb_annot <- system.file("share", "sb", "trinotate_head.csv.xz", package = "hpgldata")
 sb_annot <- load_trinotate_annotations(trinotate = sb_annot)
 sb_annot <- as.data.frame(sb_annot)
 rownames(sb_annot) <- make.names(sb_annot[["transcript_id"]], unique = TRUE)
 sb_annot[["rownames"]] <- NULL
 sb_data <- system.file("share", "sb", "preprocessing.tar.xz", package = "hpgldata")
 ## Extract the archive into the current working directory; this creates preprocessing/.
 untarred <- utils::untar(tarfile = sb_data)
 sb_expt <- create_expt(metadata = "preprocessing/kept_samples.xlsx",
                        gene_info = sb_annot)
 dim(exprs(sb_expt))
 dim(fData(sb_expt))
 pData(sb_expt)
 ## There are lots of other ways to use this, for example:
 ## Not run: 
  new_experiment <- create_expt(metadata = "some_csv_file.csv", gene_info = gene_df)
  ## Remember that this depends on an existing data structure of gene annotations.
  meta <- extract_metadata("some_supplementary_materials_xls_file_I_downloaded.xls")
  another_expt <- create_expt(metadata = meta, gene_info = annotations,
                              count_dataframe = df_I_downloaded)
## End(Not run)

elsayed-lab/hpgltools documentation built on May 9, 2024, 5:02 a.m.