create_augmented_feature_info: Create meta information for the genes and transcripts in the...


Description

This is a preprocessing function that is required to successfully build an Archs4Repository. It is not really intended for use during analyses.

This function creates the feature-level CSV files for the features enumerated in meta/genes in the gene-level hdf5 file, and for the transcript identifiers in meta/transcript in the transcript-level hdf5 file, for the mouse and human files found in datadir.

For this to work, you must download the appropriate human and mouse GTF files from Ensembl and save them in datadir. See the archs4_local_data_dir_validate() function.

For the initial release of the ARCHS4 dataset, Homo_sapiens.GRCh38.90.gtf.gz and Mus_musculus.GRCm38.90.gtf.gz were used.
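As a quick pre-flight check, the sketch below reports which of the two GTF files named above are still missing from a directory. The helper name missing_gtfs is hypothetical and not part of the archs4 package; archs4_local_data_dir_validate() remains the package's own validation route.

```r
# GTF file names taken from the text above (Ensembl release 90).
gtf_files <- c(
  human = "Homo_sapiens.GRCh38.90.gtf.gz",
  mouse = "Mus_musculus.GRCm38.90.gtf.gz"
)

# Hypothetical helper (not part of archs4): returns the named subset of
# `files` that does not yet exist inside `datadir`.
missing_gtfs <- function(datadir, files = gtf_files) {
  files[!file.exists(file.path(datadir, files))]
}
```

An empty result from missing_gtfs(datadir) suggests both GTF files are in place.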

Usage

create_augmented_feature_info(datadir)

Arguments

datadir

The directory that contains the mouse and human expression hdf5 files. SPECIES_FEATURETYPE_augmented_info.csv.gz files will be saved in this directory when this function completes.

Details

This function will write the augmented transcript- and gene-level files in the datadir, using the following pattern: <organism>_<feature_type>_augmented_info.csv.gz
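To make the naming pattern concrete, here is a minimal sketch that builds the expected output paths. The helper augmented_info_path is hypothetical, and the organism and feature-type labels ("human", "mouse", "gene", "transcript") are assumptions; the package may spell them differently.

```r
# Hypothetical helper (not part of archs4): builds output paths following
# <organism>_<feature_type>_augmented_info.csv.gz inside `datadir`.
augmented_info_path <- function(datadir, organism, feature_type) {
  file.path(
    datadir,
    sprintf("%s_%s_augmented_info.csv.gz", organism, feature_type)
  )
}

# Assumed label values, used only for illustration.
combos <- expand.grid(
  organism = c("human", "mouse"),
  feature_type = c("gene", "transcript"),
  stringsAsFactors = FALSE
)

# "/path/to/datadir" is a placeholder directory.
paths <- augmented_info_path(
  "/path/to/datadir", combos$organism, combos$feature_type
)
```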

Gene symbols are the only information provided as row-level identifiers for the gene count matrices. Furthermore, the gene symbols used for mouse are in all uppercase, which is not how mouse genes are conventionally written. To augment the gene symbols with gene-level identifiers and other information, we parse the relatively recent GTF files described above.
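The idea can be illustrated with a small base R sketch: read the gene records from a GTF file, extract a few attributes, and match symbols case-insensitively so that all-uppercase mouse symbols still find their properly cased counterparts. This is not the package's implementation. The helper names (gtf_attribute, read_gtf_genes, annotate_symbols) are hypothetical, and the attribute keys gene_id, gene_name, and gene_biotype are assumed to be present as in Ensembl-style GTF files.

```r
# Hypothetical helper: extract one attribute (e.g. gene_id) from the ninth
# GTF column; returns NA where the key is absent.
gtf_attribute <- function(attributes, key) {
  value <- sub(sprintf('.*%s "([^"]*)".*', key), "\\1", attributes)
  value[!grepl(sprintf('%s "', key), attributes, fixed = TRUE)] <- NA_character_
  value
}

# Hypothetical helper: read gene-level records from a (possibly gzipped) GTF.
read_gtf_genes <- function(gtf_file) {
  lines <- readLines(gtf_file)
  lines <- lines[!startsWith(lines, "#")]  # drop header comment lines
  gtf <- read.delim(
    text = lines, header = FALSE, quote = "", comment.char = "",
    stringsAsFactors = FALSE,
    col.names = c("seqname", "source", "feature", "start", "end",
                  "score", "strand", "frame", "attributes")
  )
  genes <- gtf[gtf$feature == "gene", , drop = FALSE]
  data.frame(
    gene_id = gtf_attribute(genes$attributes, "gene_id"),
    symbol = gtf_attribute(genes$attributes, "gene_name"),
    biotype = gtf_attribute(genes$attributes, "gene_biotype"),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: attach GTF-derived information to a vector of symbols,
# ignoring case; unmatched symbols get NA in the added columns.
annotate_symbols <- function(symbols, gene_info) {
  idx <- match(toupper(symbols), toupper(gene_info$symbol))
  data.frame(
    archs4_symbol = symbols,
    gene_info[idx, , drop = FALSE],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}
```

Matching on uppercased symbols is a simplification: symbols that differ only by case, or that have changed between annotation releases, would need extra handling.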

The files generated by this function are used by the archs4_feature_info() function.

Note that this function will replace existing "augmented" files if they already exist in datadir.
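Because existing files are replaced, it can be useful to list them before rerunning the function. The sketch below is a hypothetical helper, not part of the archs4 package, and it relies only on the file-name suffix documented above.

```r
# Hypothetical helper (not part of archs4): lists files in `datadir` whose
# names end in _augmented_info.csv.gz, i.e. the files that would be replaced.
existing_augmented_files <- function(datadir) {
  list.files(
    datadir,
    pattern = "_augmented_info\\.csv\\.gz$",
    full.names = TRUE
  )
}
```

If the returned vector is not empty, the listed files can be copied elsewhere with file.copy() before the function is run again.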

See Also

archs4_feature_info()


denalitherapeutics/archs4 documentation built on May 17, 2019, 1:29 p.m.