Parse a document and place its content in a DocSet.
parseDoc(csv, DocSetInstance = new("DocSet"), doctitle = NA_character_,
docabst = NA_character_, rec_id_field = "experiment.accession",
exclude_fields = c("study.accession"),
substrings_to_omit = c("http://purl.obolibrary.org/obo/"),
patterns_to_kill = "....-..-..|.*...,...",
token_fixups = list(c("t''", "t'"), c(":$", "")), max_tok_nchar = 25,
min_tok_nchar = 4, cleanFields = list("..*id$", ".name$", "_name$",
"checksum", "isolate", "filename", "^ID$", "barcode", "Sample.Name"))
csv: character(1), path to a CSV file.
DocSetInstance: a DocSet instance. If missing, a new DocSet is initialized by this function; otherwise the supplied instance is updated with the new content.
doctitle: character(1), the document title.
docabst: character(1), the document abstract.
rec_id_field: character(1), the CSV field that identifies records.
exclude_fields: character vector of fields to ignore while parsing.
substrings_to_omit: character vector of strings removed from candidate keywords via gsub().
patterns_to_kill: character(1) regular expression identifying tokens to omit from the keyword set.
token_fixups: a list of character(2) vectors, each a (pattern, replacement) pair used with gsub() to repair irregularities. For example, c("t''", "t'") transforms "Burkitt''s" to "Burkitt's".
max_tok_nchar: numeric(1), default 25. Tokens with more characters are truncated to this length and suffixed with an ellipsis.
min_tok_nchar: numeric(1), default 4. Tokens shorter than this are not included in the index.
cleanFields: list of regular expressions identifying fields to ignore.
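The argument descriptions above can be illustrated with a small base-R sketch. The helpers filter_fields() and clean_tokens() are hypothetical and are not part of ssrch; they only mirror the documented semantics (gsub()-based substring removal and fixups, regexp-based token and field filtering, length limits) using parseDoc's default values. The order of the steps and the exact ellipsis string ("...") are assumptions, not the package's implementation.

```r
# Hypothetical helpers, NOT part of ssrch: they only mimic the documented
# semantics of parseDoc's arguments. Step order is an assumption.

# Drop columns listed in exclude_fields or matching any cleanFields regexp.
filter_fields <- function(df, exclude_fields = "study.accession",
                          cleanFields = list("..*id$", ".name$", "_name$",
                            "checksum", "isolate", "filename", "^ID$",
                            "barcode", "Sample.Name")) {
  nms <- names(df)
  by_regex <- Reduce(`|`, lapply(cleanFields, grepl, x = nms),
                     init = rep(FALSE, length(nms)))
  df[, !(nms %in% exclude_fields | by_regex), drop = FALSE]
}

# Clean a character vector of candidate keywords.
clean_tokens <- function(tokens,
                         substrings_to_omit = "http://purl.obolibrary.org/obo/",
                         patterns_to_kill = "....-..-..|.*...,...",
                         token_fixups = list(c("t''", "t'"), c(":$", "")),
                         max_tok_nchar = 25, min_tok_nchar = 4) {
  # remove unwanted substrings with gsub()
  for (s in substrings_to_omit) tokens <- gsub(s, "", tokens)
  # apply each c(pattern, replacement) fixup in turn
  for (fx in token_fixups) tokens <- gsub(fx[1], fx[2], tokens)
  # drop tokens matching patterns_to_kill (date-like or comma-containing)
  tokens <- tokens[!grepl(patterns_to_kill, tokens)]
  # drop tokens shorter than min_tok_nchar
  tokens <- tokens[nchar(tokens) >= min_tok_nchar]
  # truncate long tokens; "..." as the ellipsis is an assumption
  long <- nchar(tokens) > max_tok_nchar
  tokens[long] <- paste0(substr(tokens[long], 1, max_tok_nchar), "...")
  unique(tokens)
}

# Illustrative (made-up) metadata and tokens
meta <- data.frame(experiment.accession = "SRX0", study.accession = "SRP0",
                   taxon_id = 9606, sample_name = "s1", disease = "lymphoma")
names(filter_fields(meta))
# "experiment.accession" "disease"

toks <- c("http://purl.obolibrary.org/obo/UBERON_0002048", "Burkitt''s",
          "lymphoma:", "2015-03-02", "1,234,567", "lung", "RNA",
          "a_very_long_token_exceeding_limit")
clean_tokens(toks)
# "UBERON_0002048" "Burkitt's" "lymphoma" "lung" "a_very_long_token_exceedi..."
```

Note that the default cleanFields patterns are unanchored regular expressions, so ".name$" also removes a field such as "sample_name", and "..*id$" removes any field ending in "id" (for example "taxon_id").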
Returns an instance of DocSet containing the parsed content.
The expected use case is updating 'DocSetInstance' in a loop. Environments can end up shared across multiple DocSet instances, and unexpected behavior may follow. Note also that many of the parseDoc parameter defaults are tailored to processing SRA metadata.
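Environment sharing follows from R's reference semantics for environments: copying an object copies a reference to its environment, not the environment's contents. One common way this arises in S4 code is an environment created once in a class prototype and therefore shared by every object built with new(). Whether DocSet is defined this way is an assumption we have not checked; the classes ToyDocSet and ToyDocSet2 below are hypothetical stand-ins.

```r
library(methods)

# Hypothetical class (not ssrch's DocSet): the prototype creates ONE
# environment at class-definition time, which new() then reuses.
setClass("ToyDocSet", slots = c(docs = "environment"),
         prototype = list(docs = new.env()))

a <- new("ToyDocSet")
b <- new("ToyDocSet")
assign("doc1", "qwerty", envir = a@docs)
# b was never updated, yet it sees doc1: both share one environment
exists("doc1", envir = b@docs, inherits = FALSE)

# One possible fix: create a fresh environment for every new instance.
setClass("ToyDocSet2", slots = c(docs = "environment"))
setMethod("initialize", "ToyDocSet2",
  function(.Object, ..., docs = new.env()) {
    callNextMethod(.Object, ..., docs = docs)
  })

c1 <- new("ToyDocSet2")
c2 <- new("ToyDocSet2")
assign("doc1", "qwerty", envir = c1@docs)
exists("doc1", envir = c2@docs, inherits = FALSE)
```

In the first class, updating one instance silently updates every other instance created from the same prototype; the initialize() method in the second class evaluates new.env() on each call, so instances stay independent.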
library(ssrch)
myob = ssrch::docset_cancer68
td = tempdir()
alld = ls(docs2kw(myob))           # document identifiers in the DocSet
r1 = retrieve_doc(alld[1], myob)   # content of the first document
csvpath = file.path(td, "expo.csv")
write.csv(r1, csvpath)             # export it so parseDoc can read it back
pd = parseDoc(csvpath, doctitle=ssrch::titles68[alld[1]],
docabst="qwerty")
pd
searchDocs("quer", pd) # query will fail
searchDocs("qwer", pd) # should succeed: matches the abstract "qwerty"