convertAccession: Convert between accession types


View source: R/Accession_Conversion_Functions.R

Description

convertAccession converts a vector of accessions (all belonging to the same accession type) into all possible accession types within SRA and GEO. If no SRA/GEO conversion is possible, all the missing accession types are marked as NA.

Usage

convertAccession(acc_vector)

Arguments

acc_vector

A vector of accessions (all must belong to the same type)

Value

A data frame converting the input accessions into all available accession types

Accepted Accession Types

convertAccession accepts any of the 4 SRA or 2 GEO accession types (see section 'Background Information on Accession Types'). convertAccession accepts only one accession type at a time.

For example, the following queries are NOT allowed:

convertAccession(c("SRR_____", "SRP_____"))

convertAccession(c("GSE_____", "SRP_____"))

To obtain these results, run a separate query for each accession type and, if desired, bind the resulting data frames together (e.g. rbind(convertAccession("SRR_____"), convertAccession("SRP_____"))).

SRA accessions differing only in the first letter belong to the same type, so it is possible to run: convertAccession(c("SRP_____", "ERP_____"))
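The per-type queries described above can be prepared automatically by grouping a mixed vector before calling convertAccession. The following is a minimal base-R sketch. The helper name splitByAccessionType() is hypothetical and not part of SpiderSeqR, and it assumes that the accession type can be read from the first three characters, treating SRA prefixes that differ only in the first letter (S, E or D) as the same type.

```r
# Hypothetical helper (not part of SpiderSeqR): group a mixed vector of
# accessions so that each group contains a single accession type.
splitByAccessionType <- function(acc) {
  prefix <- substr(acc, 1, 3)
  # SRA prefixes differing only in the first letter (S, E or D) denote the
  # same type, so replace that letter with "S" before grouping
  key <- ifelse(grepl("^[SED]R[PSXR]$", prefix),
                sub("^[SED]", "S", prefix),
                prefix)
  split(acc, key)
}

acc <- c("SRR3707942", "SRP134708", "DRP003157", "GSM2027840")
groups <- splitByAccessionType(acc)
groups

# Each group could then be passed to convertAccession() on its own and the
# results combined, e.g.:
# do.call(rbind, lapply(groups, convertAccession))
```

Here "SRP134708" and "DRP003157" end up in the same group, while the SRR and GSM accessions each form their own group.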

Output format

The function outputs a data frame with conversion of the input accessions into all possible types.

In the best-case scenario, i.e. if an accession exists in both the SRA and GEO databases, the output includes all 6 accession types (SRR, SRX, SRS, SRP, GSM, GSE).

If an accession exists only in one of the databases, the conversion will be limited to that one database. For example, if an accession only exists in SRA, only SRA accessions will be provided, whilst the GEO columns will be populated with NAs.
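You can picture this NA-filling behaviour as a left join of SRA records against SRA-to-GEO links. This is an illustrative sketch only: the column names, the placeholder accessions and the use of merge() are assumptions and do not describe how convertAccession is implemented.

```r
# Illustrative sketch only: column names and accession values are made up
# and do not reflect the real output of convertAccession().
sra <- data.frame(
  run_accession   = c("SRR_a", "SRR_b"),
  study_accession = c("SRP_x", "SRP_x"),
  stringsAsFactors = FALSE
)
# Hypothetical SRA-to-GEO links; only SRR_a has a GEO counterpart
links <- data.frame(
  run_accession = "SRR_a",
  gsm           = "GSM_p",
  series_id     = "GSE_q",
  stringsAsFactors = FALSE
)
# A left join keeps every SRA row; rows without a GEO link get NA in the
# GEO columns
out <- merge(sra, links, by = "run_accession", all.x = TRUE)
out

# Share of input rows that have a GEO counterpart
mean(!is.na(out$gsm))
```

In this toy case SRR_b has no GEO link, so its gsm and series_id columns are NA and the share of rows with a GEO counterpart is 0.5.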

Troubleshooting

The conversion between the SRA and GEO databases is based on a custom database generated by the startSpiderSeqR() function. For the best results, make sure you are using the most up-to-date versions of the databases. To improve results, you can do the following:

  1. Download the most up-to-date versions of the SRAmetadb.sqlite and GEOmetadb.sqlite files. This is done by running startSpiderSeqR() with an appropriate expiry period for the database files (e.g. startSpiderSeqR(path = getwd(), general_expiry = 1))

  2. Generate a fresh custom database for conversion between accessions (SRR_GSM.sqlite). This is also done by running startSpiderSeqR() with an appropriate expiry period for that database file

  3. As a last resort, manually search for the missing conversions online
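The expiry-period idea in steps 1 and 2 can be sketched as a file-age check. The helper isDatabaseExpired() is hypothetical. It assumes that expiry is judged by the file's modification time in days, and startSpiderSeqR() may use a different rule.

```r
# Hypothetical helper (not part of SpiderSeqR): report whether a database
# file is missing or older than a given number of days. This only mirrors
# the idea of an expiry period; startSpiderSeqR() may check it differently.
isDatabaseExpired <- function(path, expiry_days) {
  if (!file.exists(path)) return(TRUE)
  age_days <- as.numeric(difftime(Sys.time(), file.info(path)$mtime,
                                  units = "days"))
  age_days > expiry_days
}

# Example: check the conversion database in the working directory
isDatabaseExpired(file.path(getwd(), "SRR_GSM.sqlite"), expiry_days = 1)
```

A missing file is treated as expired, so a fresh copy would be generated in either case.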

NOTE: because the SRR_GSM.sqlite database is machine-generated, it may miss some conversions if they were recorded in the source databases in a non-standard way. If in doubt, check the accession page online. However, users should be aware that the overlap between SRA and GEO is only about 20% (at the time of writing), so most entries will not have corresponding accession numbers in the other database.

Background Information on Accession Types

The two lists below include accession types within SRA and GEO respectively.

All of these are supported by the convertAccession function.

SRA

  1. SRP or DRP or ERP - project_accession

  2. SRS or DRS or ERS - sample_accession

  3. SRX or DRX or ERX - experiment_accession

  4. SRR or DRR or ERR - run_accession

NOTE: depending on the database that issued them (NCBI, EBI or DDBJ), these accessions begin with a different letter (S, E or D respectively), so the accession levels can be SRP/SRX/SRS/SRR, ERP/ERX/ERS/ERR or DRP/DRX/DRS/DRR. Accessions beginning with 'S' are by far the most common.
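Because the prefix encodes both the issuing database and the accession level, both can be read off directly. Here is a minimal sketch, assuming accessions of the form [S|E|D]R[P|S|X|R] followed by digits. decodeSraPrefix() is a hypothetical helper, not a SpiderSeqR function.

```r
# Hypothetical helper (not part of SpiderSeqR): decode an SRA accession
# prefix into the database that issued it and its accession level.
decodeSraPrefix <- function(acc) {
  archive <- c(S = "NCBI", E = "EBI", D = "DDBJ")
  level <- c(P = "project_accession", S = "sample_accession",
             X = "experiment_accession", R = "run_accession")
  # Anything that is not an SRA-style accession (e.g. GEO) gets NA
  ok <- grepl("^[SED]R[PSXR][0-9]+$", acc)
  data.frame(
    accession = acc,
    archive   = ifelse(ok, archive[substr(acc, 1, 1)], NA),
    level     = ifelse(ok, level[substr(acc, 3, 3)], NA),
    stringsAsFactors = FALSE
  )
}

decodeSraPrefix(c("SRR3707942", "DRP003157", "GSM2027840"))
```

The first two accessions decode to NCBI/run_accession and DDBJ/project_accession. The GSM accession is not an SRA accession, so both of its fields are NA.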

GEO

  1. GSE - series_id

  2. GSM - sample

NOTE: the GEO accession system is further complicated by the existence of 'superseries', which act as higher-level series. In these cases a given GSM belongs to multiple (at least two) GSEs: its own series and the superseries.
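A superseries shows up as one GSM mapping to more than one GSE. The sketch below uses a made-up mapping table (placeholder identifiers and assumed column names gsm and gse) to count how many series each sample belongs to.

```r
# Illustrative sketch with made-up identifiers: a sample-to-series mapping
# in which one GSM belongs to both its own series and a superseries.
gsm_gse <- data.frame(
  gsm = c("GSM_1", "GSM_1", "GSM_2"),
  gse = c("GSE_series", "GSE_superseries", "GSE_series"),
  stringsAsFactors = FALSE
)
# Number of series each sample belongs to; counts above 1 indicate
# membership of more than one series, such as a superseries
series_per_sample <- table(gsm_gse$gsm)
series_per_sample
names(series_per_sample)[series_per_sample > 1]
```

Only GSM_1 is reported, because it appears under both its own series and the superseries.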

See Also

Other Workflow functions: addMissingSamples(), filterByTermByAccessionLevel(), filterByTerm(), searchForAccession()

Other Core functions: searchForAccession()

Examples

# Set up the SpiderSeqR environment first (for real analyses, use startSpiderSeqR() rather than the demo)
startSpiderSeqRDemo()

convertAccession("SRP134708")

convertAccession("SRR3707942")

convertAccession("GSM2027840")

# Note that DRP, ERP and SRP are of the same accession type (study level)
convertAccession(c("DRP003157", "SRP061795"))

ss-lab-cancerunit/SpiderSeqR documentation built on Nov. 2, 2020, 12:18 a.m.