searchAnywhere: Search Anywhere within SRA and GEO databases


View source: R/Search_Anywhere.R

Description

Search Anywhere within SRA and GEO databases

Usage

searchAnywhere(
  query_all,
  acc_levels = c("run", "experiment", "sample", "gsm"),
  category_both = NULL,
  SRA_library_strategy = NULL,
  SRA_other_library_strategy = c("OTHER", "NA", "NULL"),
  GEO_type = NULL,
  SRA_query,
  GEO_query,
  GSM_query,
  GSE_query,
  call_output = FALSE
)

Arguments

query_all

Search term for both SRA and GEO (gse and gsm tables)

acc_levels

Accession levels at which the search is conducted. Possible options include run, sample, experiment, study, gsm, gse. Defaults to c("run", "experiment", "sample", "gsm")

category_both

A character with category for SRA library strategy and GEO type

SRA_library_strategy

A character with SRA library strategy

SRA_other_library_strategy

A logical indicating whether to include unclassified entries

GEO_type

A character with GEO type

SRA_query

Search term for SRA only

GEO_query

Search term for GEO only

GSM_query

Search term for gsm table only (GEO)

GSE_query

Search term for gse table only (GEO)

call_output

A logical indicating whether to produce a call record

Value

A data frame with results of the search

Argument requirements

Either query_all, or both SRA_query and GEO_query, must be provided. The separate arguments facilitate column-specific search in the databases: if you wish to search within specific columns, provide SRA_query and GEO_query with the appropriate column names.
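The requirement above can be sketched as a small check. Note that queries_are_sufficient() is a hypothetical helper written for this illustration and is not part of SpiderSeqR; it also assumes that an omitted argument is represented as NULL, which may differ from how the package detects missing arguments. The searchAnywhere() calls use only the arguments listed under Usage and run only if the package is installed.

```r
# Hypothetical helper (NOT part of SpiderSeqR): mirrors the documented rule that
# either query_all, or both SRA_query and GEO_query, must be supplied.
# Assumption: an omitted argument is represented here as NULL.
queries_are_sufficient <- function(query_all = NULL,
                                   SRA_query = NULL,
                                   GEO_query = NULL) {
  !is.null(query_all) || (!is.null(SRA_query) && !is.null(GEO_query))
}

queries_are_sufficient(query_all = "sir3")                      # TRUE
queries_are_sufficient(SRA_query = "sir3", GEO_query = "sir3")  # TRUE
queries_are_sufficient(SRA_query = "sir3")                      # FALSE

# The corresponding searches, attempted only if SpiderSeqR is available
if (requireNamespace("SpiderSeqR", quietly = TRUE)) {
  library(SpiderSeqR)
  startSpiderSeqRDemo()

  # One term applied to both databases
  res_all <- searchAnywhere(query_all = "sir3")

  # Separate terms for SRA and GEO
  res_split <- searchAnywhere(SRA_query = "sir3", GEO_query = "sir3")
}
```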

Query arguments

Query arguments include query_all, SRA_query, GEO_query, GSM_query and GSE_query.

In the simplest case, it is recommended to use only query_all, which applies to all searches across the databases. However, users who need more fine-tuning can use the other query arguments (e.g. to search within specific columns of each database table; this is mostly appropriate for fts search). Only the highest-level query arguments will be considered. Hence the following combinations of arguments are accepted (any extra query arguments will be ignored):

Accession levels

Each accession level is associated with its own set of information. Sometimes the information is replicated across levels, sometimes it is unique to the level. Only information associated with the specified accession levels will be searched. For example, study abstracts commonly mention many genes or proteins that were not a direct object of the study; a search across all levels will therefore include studies with a mere mention of a gene.

Restricting accession levels, e.g.

searchAnywhere(query_all = "p53", acc_levels = c("run", "experiment", "sample", "gsm"))

will help avoid including these cases. However, always consider using a broader search and comparing the results to the more refined one.

Another use of accession levels is to restrict search to only one database. To do so, only list accession levels specific to one database: SRA (run, experiment, sample, study) or GEO (gsm, gse).
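As a rough sketch of the points above, the snippet below groups the accession levels by database and compares a broad search with a restricted one. The helper databases_searched() is hypothetical (not part of SpiderSeqR) and simply encodes the level-to-database mapping stated in the text. The comparison assumes that SpiderSeqR is installed and that the demo database contains matches for "sir3", as in the Examples section.

```r
# Accession levels per database, as listed in the text above
sra_levels <- c("run", "experiment", "sample", "study")
geo_levels <- c("gsm", "gse")

# Hypothetical helper (NOT part of SpiderSeqR): reports which databases
# a given acc_levels vector would involve, based on the mapping above.
databases_searched <- function(acc_levels) {
  c(SRA = any(acc_levels %in% sra_levels),
    GEO = any(acc_levels %in% geo_levels))
}

databases_searched(c("run", "experiment", "sample", "gsm"))  # both TRUE
databases_searched(geo_levels)                               # GEO only

# Compare a broad search with a more refined one (needs SpiderSeqR)
if (requireNamespace("SpiderSeqR", quietly = TRUE)) {
  library(SpiderSeqR)
  startSpiderSeqRDemo()

  broad <- searchAnywhere("sir3", acc_levels = c(sra_levels, geo_levels))
  refined <- searchAnywhere("sir3",
                            acc_levels = c("run", "experiment", "sample", "gsm"))

  # The Value section states that a data frame is returned,
  # so row counts give a quick comparison of the two searches
  c(broad = nrow(broad), refined = nrow(refined))
}
```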

Category_both, SRA_library_strategy and GEO_type

SRA and GEO have distinct ways of specifying the type of their data (e.g. RNA-Seq, ChIP-Seq or microarray expression experiments). SRA stores that information as *library_strategy*, GEO records *types*. For users' convenience, a data frame with the conversion between the commonest *library_strategies* and *types* is provided in SRA_GEO_Category_Conversion (for more details, please examine SRA_GEO_Category_Conversion or its documentation, ?SRA_GEO_Category_Conversion).

Hence, it is possible to specify *category*, which refers to either one or both of SRA and GEO (some categories exist within both SRA and GEO, some only in one of the databases; e.g. only GEO stores microarray data).

Similarly to query arguments, the highest level argument will be taken into account and if lower-level arguments exist, they will be ignored.

Hence, the user can provide the following combinations of arguments:

* If only one of the SRA_library_strategy and GEO_type is provided, no search will be undertaken in the database corresponding to the missing argument. The same is the case if the supplied category_both refers only to one of the databases (e.g. search in SRA only if category_both = "DNA NGS" (DNA sequencing))
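The precedence rule described above can be illustrated with a minimal sketch. The helper effective_type_arguments() is hypothetical (not part of SpiderSeqR) and only restates the rule that category_both, when given, overrides the lower-level arguments. In the package calls, "DNA NGS" comes from the text above, whereas "ChIP-Seq" and the GEO type string are assumed values that are not taken from this page; check SRA_GEO_Category_Conversion for the values actually recognised.

```r
# Hypothetical helper (NOT part of SpiderSeqR): returns the names of the
# type-related arguments that would take effect under the documented rule
# "the highest level argument is used, lower-level ones are ignored".
effective_type_arguments <- function(category_both = NULL,
                                     SRA_library_strategy = NULL,
                                     GEO_type = NULL) {
  if (!is.null(category_both)) {
    return("category_both")
  }
  as.character(c(
    if (!is.null(SRA_library_strategy)) "SRA_library_strategy",
    if (!is.null(GEO_type)) "GEO_type"
  ))
}

# category_both wins over a lower-level argument
effective_type_arguments(category_both = "DNA NGS", GEO_type = "some type")

# Only the SRA argument is given, so only SRA would be searched
effective_type_arguments(SRA_library_strategy = "ChIP-Seq")

if (requireNamespace("SpiderSeqR", quietly = TRUE)) {
  library(SpiderSeqR)
  startSpiderSeqRDemo()

  # Inspect the conversion table mentioned in the text
  head(SRA_GEO_Category_Conversion)

  # A category that, per the text, refers to SRA only
  res_category <- searchAnywhere("sir3", category_both = "DNA NGS")

  # Lower-level arguments for each database (assumed example values)
  res_types <- searchAnywhere(
    "sir3",
    SRA_library_strategy = "ChIP-Seq",
    GEO_type = "Genome binding/occupancy profiling by high throughput sequencing"
  )
}
```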

Examples

startSpiderSeqRDemo()
searchAnywhere("*sir3*") # The broadest search
searchAnywhere("sir3") # omits entries with characters before/after sir3
searchAnywhere("sir3 OR sir3p") # Can list synonyms

## Only search for matches in SRA
searchAnywhere("sir3", acc_levels = c("run", "sample", "experiment", "study"))

## Only search for matches in GEO
searchAnywhere("sir3", acc_levels = c("gsm", "gse"))

ss-lab-cancerunit/SpiderSeqR documentation built on Nov. 2, 2020, 12:18 a.m.