av | R Documentation |
avtables() describes tables available in a workspace. Tables can be
visualized under the DATA tab, TABLES item.
avtable() returns an AnVIL table.
avtable_paged() retrieves an AnVIL table by requesting the table in
'chunks', and may be appropriate for large tables.
avtable_import() imports a data.frame to an AnVIL table.
avtable_import_set() imports set membership (i.e., a subset of an
existing table) information to an AnVIL table.
avtable_delete_values() removes rows from an AnVIL table.
avtable_import_status() queries for the status of an 'asynchronous'
table import.
avdata() returns key-value tables representing the information
visualized under the DATA tab, 'REFERENCE DATA' and 'OTHER DATA' items.
avdata_import() updates (modifies or creates new, but does not delete)
rows in 'REFERENCE DATA' or 'OTHER DATA' tables.
avbucket() returns the workspace bucket, i.e., the google bucket
associated with a workspace. Bucket content can be visualized under the
'DATA' tab, 'Files' item.
avfiles_ls() returns the paths of files in the workspace bucket.
avfiles_backup() copies files from the compute node file system to the
workspace bucket.
avfiles_restore() copies files from the workspace bucket to the compute
node file system.
avfiles_rm() removes files or directories from the workspace bucket.
avruntimes() returns a tibble containing information about runtimes
(notebooks or RStudio instances, for example) that the current user has
access to.
avruntime() returns a tibble with the runtimes associated with a
particular google project and account number; usually there is a single
runtime satisfying these criteria, and it is the runtime active in AnVIL.
avdisks() returns a tibble containing information about persistent disks
associated with the current user.
avtables(namespace = avworkspace_namespace(), name = avworkspace_name())
avtable(
table,
namespace = avworkspace_namespace(),
name = avworkspace_name(),
na = c("", "NA")
)
avtable_paged(
table,
n = Inf,
page = 1L,
pageSize = 1000L,
sortField = "name",
sortDirection = c("asc", "desc"),
filterTerms = character(),
filterOperator = c("and", "or"),
namespace = avworkspace_namespace(),
name = avworkspace_name(),
na = c("", "NA")
)
avtable_import(
.data,
entity = names(.data)[[1]],
namespace = avworkspace_namespace(),
name = avworkspace_name(),
delete_empty_values = FALSE,
na = "NA",
n = Inf,
page = 1L,
pageSize = NULL
)
avtable_import_set(
.data,
origin,
set = names(.data)[[1]],
member = names(.data)[[2]],
namespace = avworkspace_namespace(),
name = avworkspace_name(),
delete_empty_values = FALSE,
na = "NA",
n = Inf,
page = 1L,
pageSize = NULL
)
avtable_import_status(
job_status,
namespace = avworkspace_namespace(),
name = avworkspace_name()
)
avtable_delete_values(
table,
values,
namespace = avworkspace_namespace(),
name = avworkspace_name()
)
avdata(namespace = avworkspace_namespace(), name = avworkspace_name())
avdata_import(
.data,
namespace = avworkspace_namespace(),
name = avworkspace_name()
)
avbucket(
namespace = avworkspace_namespace(),
name = avworkspace_name(),
as_path = TRUE
)
avfiles_ls(
path = "",
full_names = FALSE,
recursive = FALSE,
namespace = avworkspace_namespace(),
name = avworkspace_name()
)
avfiles_backup(
source,
destination = "",
recursive = FALSE,
parallel = TRUE,
namespace = avworkspace_namespace(),
name = avworkspace_name()
)
avfiles_restore(
source,
destination = ".",
recursive = FALSE,
parallel = TRUE,
namespace = avworkspace_namespace(),
name = avworkspace_name()
)
avfiles_rm(
source,
recursive = FALSE,
parallel = TRUE,
namespace = avworkspace_namespace(),
name = avworkspace_name()
)
avruntimes()
avruntime(project = gcloud_project(), account = gcloud_account())
avdisks()
namespace |
character(1) AnVIL workspace namespace as returned
by, e.g., |
name |
character(1) AnVIL workspace name as returned by, e.g.,
|
table |
character(1) table name as returned by, e.g., |
na |
in |
n |
numeric(1) maximum number of rows to return |
page |
integer(1) first page of iteration |
pageSize |
integer(1) number of records per page. Generally, larger page sizes are more efficient. |
sortField |
character(1) field used to sort records when determining page order. Default is the entity field. |
sortDirection |
character(1) direction to sort entities
( |
filterTerms |
character(1) string literal to select rows with an exact (substring) match in a column. |
filterOperator |
character(1) operator to use when multiple
terms in |
.data |
A tibble or data.frame for import as an AnVIL table. |
entity |
|
delete_empty_values |
logical(1) when |
origin |
character(1) name of the entity (table) used to create the set, e.g., "sample", "participant", etc. |
set |
|
member |
|
job_status |
tibble() of job identifiers, returned by
|
values |
vector of values in the entity (key) column of
|
as_path |
logical(1) when TRUE (default) return bucket with
prefix |
path |
For |
full_names |
logical(1) return names relative to |
recursive |
logical(1) list files recursively? |
source |
character() file paths. for |
destination |
character(1) a google bucket
( |
parallel |
logical(1) backup files using parallel transfer?
See |
project |
|
account |
|
The treatment of missing values in avtable(), avtable_paged() and
avtable_import() is controlled by the na parameter.
avtable() may sometimes result in a curl error 'Error in
curl::curl_fetch_memory' or an 'Internal Server Error (HTTP 500)'. This
may be due to a server time-out when trying to read a large (more than
50,000 rows?) table; using avtable_paged() may address this problem.
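
The fallback suggested above can be sketched as a small wrapper. This is a minimal illustration only: read_with_fallback() is a hypothetical helper, not part of the AnVIL package, and the two stand-in functions simulate a failing whole-table read and a successful paged read so that the code runs outside AnVIL.

```r
## Hypothetical helper (not part of AnVIL): try a whole-table read
## first, and fall back to a paged read if the request signals an error.
read_with_fallback <- function(read_all, read_paged) {
    tryCatch(
        read_all(),
        error = function(e) {
            message("whole-table read failed: ", conditionMessage(e))
            read_paged()
        }
    )
}

## Stand-ins so the sketch runs without an AnVIL workspace
failing_read <- function() stop("Internal Server Error (HTTP 500)")
paged_read <- function() head(mtcars, 3)

result <- read_with_fallback(failing_read, paged_read)
```

Within AnVIL, the same wrapper could presumably be called as read_with_fallback(function() avtable("sample"), function() avtable_paged("sample")), where "sample" is an assumed table name.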
For avtable() and avtable_paged(), the default na = c("", "NA") treats
empty cells or cells containing "NA" in a Terra data table as
NA_character_ in R. Use na = character() to indicate no missing values,
or na = "NA" to retain the distinction between "" and NA_character_.
For avtable_import(), the default na = "NA" records NA_character_ in R
as the character string "NA" in an AnVIL data table.
The default setting (na = "NA" in avtable_import(), na = c("", "NA") in
avtable()) is appropriate to 'round-trip' data from R to AnVIL and back
when the only missing-like values in character vectors are
NA_character_. Use na = "NA" in both functions to round-trip data
containing both NA_character_ and "". Use a distinct string, e.g.,
na = "__MISSING_VALUE__", for both arguments if the data contains a
string "NA" as well as NA_character_.
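
The three round-trip cases can be simulated in base R. This is a sketch under stated assumptions: to_anvil() and from_anvil() are hypothetical stand-ins that only mimic how the na argument is described as mapping values on import and on read; they do not call AnVIL.

```r
## Hypothetical stand-ins (not AnVIL functions) for the na mapping
to_anvil <- function(x, na = "NA") {
    x[is.na(x)] <- na               # NA_character_ is written as the 'na' string
    x
}
from_anvil <- function(x, na = c("", "NA")) {
    x[x %in% na] <- NA_character_   # strings listed in 'na' are read as NA
    x
}

## Case 1: only NA_character_ present; the defaults round-trip
x1 <- c("a", NA_character_)
rt1 <- from_anvil(to_anvil(x1))

## Case 2: "" and NA_character_ present; the defaults lose "",
## but na = "NA" in both directions keeps it
x2 <- c("a", "", NA_character_)
rt2_default <- from_anvil(to_anvil(x2))
rt2 <- from_anvil(to_anvil(x2, na = "NA"), na = "NA")

## Case 3: the literal string "NA" is present; use a distinct sentinel
x3 <- c("a", "NA", "", NA_character_)
sentinel <- "__MISSING_VALUE__"
rt3 <- from_anvil(to_anvil(x3, na = sentinel), na = sentinel)
```

Comparing each result with its input using identical() shows which settings preserve the data.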
avtable_import() tries to work around limitations in .data size in the
AnVIL platform, using pageSize (number of rows) to import so that
approximately 1500000 elements (rows x columns) are uploaded per chunk.
For large .data, a progress bar summarizes progress on the import.
Individual chunks may nonetheless fail to upload, with common reasons
being an internal server error (HTTP error code 500) or transient
authorization failure (HTTP 401). In these and other cases
avtable_import() reports the failed page(s) as warnings. The user can
attempt to import these individually using the page argument. If many
pages fail to import, a strategy might be to provide an explicit
pageSize less than the automatically determined size.
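
The arithmetic behind the automatic page size can be sketched as follows. Both helpers are hypothetical and only approximate the rule described above (about 1500000 elements per chunk); the calculation actually used by avtable_import() may differ in detail.

```r
## Hypothetical helper: rows per page so that rows x columns stays
## near the stated limit of about 1500000 elements per chunk
approx_page_size <- function(.data, max_elements = 1500000) {
    max(1L, as.integer(floor(max_elements / ncol(.data))))
}

## Hypothetical helper: 'from' and 'to' rows covered by each page
page_ranges <- function(n_rows, page_size) {
    from <- seq(1L, n_rows, by = page_size)
    to <- pmin(from + page_size - 1L, n_rows)
    data.frame(page = seq_along(from), from_row = from, to_row = to)
}

auto_size <- approx_page_size(mtcars)                  # mtcars has 11 columns
pages <- page_ranges(nrow(mtcars), page_size = 10L)    # explicit, smaller pages
```

A table like pages makes it easier to see which rows a failed page covered before retrying with the page argument.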
avtable_import_set() creates new rows in a table <origin>_set. One row
will be created for each distinct value in the column identified by set.
Each row entry has a corresponding column <origin> linking to one or
more rows in the <origin> table, as given in the member column. The
operation is somewhat like split(member, set).
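
The split(member, set) analogy can be seen with a small base R example. The membership data below are made up for illustration; no AnVIL table is read or written.

```r
## Made-up membership data: each row assigns one member to one set
membership <- data.frame(
    set = c("A", "B", "A", "B", "A"),
    member = paste0("sample_", 1:5),
    stringsAsFactors = FALSE
)

## One list element per distinct set, holding that set's members;
## this mirrors one row per set in the <origin>_set table
sets <- split(membership$member, membership$set)
```

Here sets has two elements, "A" and "B", just as the resulting set table would have two rows.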
avfiles_backup() can be used to back up individual files or entire
directories, recursively. When recursive = FALSE, files are backed up to
the bucket with names approximately
paste0(destination, "/", basename(source)). When recursive = TRUE and
source is a directory 'path/to/foo/', files are backed up to bucket
names that include the directory name, approximately
paste0(destination, "/", dir(basename(source), full.names = TRUE)).
Naming conventions are described in detail in gsutil_help("cp").
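
The approximate naming rules can be previewed locally without copying anything. In this sketch the bucket name "gs://hypothetical-bucket" and the file names are assumptions for illustration, and the names computed are only the approximations described above, not a guarantee of what gsutil produces.

```r
destination <- "gs://hypothetical-bucket"   # assumed bucket, for illustration

## recursive = FALSE: the directory part of each source path is dropped
files <- c("path/to/foo/a.txt", "path/to/bar/b.txt")
flat_names <- paste0(destination, "/", basename(files))

## recursive = TRUE: the directory name itself is kept in the bucket name
root <- tempfile("backup-demo-")
source_dir <- file.path(root, "foo")
dir.create(source_dir, recursive = TRUE)
invisible(file.create(file.path(source_dir, c("a.txt", "b.txt"))))
recursive_names <- paste0(
    destination, "/", file.path(basename(source_dir), dir(source_dir))
)
unlink(root, recursive = TRUE)              # remove the temporary directory
```

Note that with recursive = FALSE two source files sharing a basename would map to the same bucket name.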
avfiles_restore() behaves in a manner analogous to avfiles_backup(),
copying files from the workspace bucket to the compute node file system.
avtables(): a tibble with columns identifying the table, the number of
records, and the column names.
avtable(): a tibble of data corresponding to the AnVIL table given by
the table argument in the specified workspace.
avtable_paged(): a tibble of data corresponding to the AnVIL table given
by the table argument in the specified workspace.
avtable_import() returns a tibble() containing the page number, 'from'
and 'to' rows included in the page, job identifier, initial status of
the uploaded 'chunks', and any (error) messages generated during status
check. Use avtable_import_status() to query current status.
avtable_import_set() returns a character(1) name of the imported AnVIL
tibble.
avtable_delete_values() returns a tibble representing deleted entities,
invisibly.
avdata() returns a tibble with five columns: "type" represents the
origin of the data from the 'REFERENCE' or 'OTHER' data menus. "table"
is the table name in the REFERENCE menu, or 'workspace' for the table in
the 'OTHER' menu. The remaining three columns are the key used to access
the data element, the value label associated with the data element, and
the value (e.g., google bucket) of the element.
avdata_import() returns, invisibly, the subset of the input table used
to update the AnVIL tables.
avbucket() returns a character(1) bucket identifier, prefixed with
gs:// if as_path = TRUE.
avfiles_ls() returns a character vector of files in the workspace
bucket.
avfiles_backup() returns, invisibly, the status code of the gsutil_cp()
command used to back up the files.
avfiles_rm(), on success, returns a list of the return codes of
gsutil_rm(), invisibly.
avruntimes() returns a tibble with columns
id: integer() runtime identifier.
googleProject: character() billing account.
tool: character() e.g., "Jupyter", "RStudio".
status: character() e.g., "Stopped", "Running".
creator: character() AnVIL account, typically "user@gmail.com".
createdDate: character() creation date.
destroyedDate: character() destruction date, or NA.
dateAccessed: character() date of (first?) access.
runtimeName: character().
clusterServiceAccount: character() service ('pet') account for this runtime.
masterMachineType: character(). (It is unclear which 'tool' populates which of the machineType columns.)
workerMachineType: character().
machineType: character().
persistentDiskId: integer() identifier of persistent disk (see avdisks()), or NA.
avruntime() returns a tibble with the same structure as the return
value of avruntimes().
avdisks() returns a tibble with columns
id: character() disk identifier.
googleProject: character() billing account.
status: e.g., "Ready".
size: integer() in GB.
diskType: character().
blockSize: integer().
creator: character() AnVIL account, typically "user@gmail.com".
createdDate: character() creation date.
destroyedDate: character() destruction date, or NA.
dateAccessed: character() date of (first?) access.
zone: character() e.g., "us-central1-a".
name: character().
## Not run:
library(dplyr) # provides %>% and mutate() used below
## editable copy of '1000G-high-coverage-2019' workspace
avworkspace("bioconductor-rpci-anvil/1000G-high-coverage-2019")
sample <-
avtable("sample") %>% # existing table
mutate(set = sample(head(LETTERS), nrow(.), TRUE)) # arbitrary groups
sample %>% # new 'participant_set' table
avtable_import_set("participant", "set", "participant")
sample %>% # new 'sample_set' table
avtable_import_set("sample", "set", "name")
## End(Not run)
if (gcloud_exists() && nzchar(avworkspace_name())) {
## from within AnVIL
data <- avdata()
data
}
## Not run:
avdata_import(data)
## End(Not run)
if (gcloud_exists() && nzchar(avworkspace_name()))
## From within AnVIL...
bucket <- avbucket() # discover bucket
## Not run:
path <- file.path(bucket, "mtcars.tab")
gsutil_ls(dirname(path)) # no 'mtcars.tab'...
write.table(mtcars, gsutil_pipe(path, "w")) # write to bucket
gsutil_stat(path) # yep, there!
read.table(gsutil_pipe(path, "r")) # read from bucket
## End(Not run)
if (gcloud_exists() && nzchar(avworkspace_name()))
avfiles_ls()
## Not run:
## backup all files in the current directory
## default buckets are gs://<bucket-id>/<file-names>
avfiles_backup(dir())
## backup working directory, recursively
## default buckets are gs://<bucket-id>/<basename(getwd())>/...
avfiles_backup(getwd(), recursive = TRUE)
## End(Not run)
if (gcloud_exists())
## from within AnVIL
avruntimes()
if (gcloud_exists())
## from within AnVIL
avdisks()