updateFromDF: Easier to use wrapper of 'updateValues'

View source: R/update-special.R

updateFromDF    R Documentation

Easier to use wrapper of updateValues

Description

A wrapper of updateValues which takes in updates in the form of a data frame, or a csv or tsv file, with rows = records and columns = attributes.

Usage

updateFromDF(
  target,
  projectName,
  modelName,
  df,
  table.method = NULL,
  autolink = FALSE,
  dryRun = FALSE,
  separator = ",",
  show.df = TRUE,
  auto.proceed = FALSE,
  revisions.only = FALSE,
  template = NULL,
  ...
)

Arguments

target

A list, which can be created using magmaRset, containing your authorization 'token' (a string), a 'url' of magma to target (a string), and optional 'opts' for specifying additional parameters for curl requests (a named list).

projectName

Single string. The name of the project you would like to interact with. For options, see retrieveProjects.

modelName

Single string. The name of the data structure within the project to interact with; such structures are referred to as 'models' in magma. For options, see retrieveModels or https://timur.ucsf.edu/<projectName>/map.

df

A dataframe, containing the data to upload to magma.

Alternatively, a string specifying the file path of a file containing such data.

See below for additional formatting details.

table.method

"replace" or "append". Sets the methodology used for building the revisions to request for table-type model updates:

  • "append": Add the currently defined records, attaching them to parents defined in the df.

  • "replace": Add the currently defined records, attaching them to parents defined in the df, AND unlink / remove all current records, of modelName, attached to parent records defined in the df.

autolink

Logical. FALSE by default for safety, but often you will want to set it to TRUE. Passed through to magma, this parameter controls whether the system will attempt to connect all targeted records up with the project's root. Specifically, this means the system will 1) determine parent records of all targeted records if it can, based on the project's gnomon grammar, 2) continue parent determination for those parent records, repeating this process until reaching the project's root (the project record), then 3) create any of these records that don't currently exist, and finally 4) create all the assumed parent-child linkages.

dryRun

Logical. FALSE by default. Passed through to magma, this parameter controls whether the system will only test whether the update is valid without making changes to the database.

separator

String indicating the field separator to use if providing df as a file path. Default = ",". Use "\t" for tsvs.

show.df

Logical which sets whether the df-data should be printed out.

auto.proceed

Logical. When set to TRUE, the function does not ask for confirmation before proceeding with the 'magma/update'.

revisions.only

Logical. For troubleshooting purposes, when set to TRUE, no data will be sent to magma. Instead, the list structure that would have been passed to the revisions input of updateValues is returned as output.

template

For internal use in minimizing excess http requests to magma, NULL or the return of retrieveTemplate(target, projectName).

...

Additional parameters passed along to the internal '.retrieve()', '.query()', or '.update()' functions, for troubleshooting or advanced-user purposes only:

  • request.only (Logical) & json.params.only (Logical) which 1) stop the function before its main curl request to magma and 2) return the values that would have been sent to magma in either of two formats.

  • verbose (Logical) sets whether to report the status of the curl request after it is performed.

Details

This function provides a simple method for updating multiple attributes of multiple magma records provided as a rectangular dataframe, or equivalent file structure. It utilizes the magma/update function, documented here https://mountetna.github.io/magma.html#update, to upload data after converting to the format required by that function.

The user-indicated df is read in, presented to the user for inspection, then transformed to the necessary format and passed along to updateValues.
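
As a rough illustration of this transformation, the base-R sketch below turns a small data frame into the nested list shape that updateValues documents for its revisions input (model name, then record name, then attribute). The helper name df_to_revisions and the record identifiers are hypothetical, and this is not the package's internal code; setting revisions.only = TRUE shows the structure that updateFromDF actually builds.

```r
# Hypothetical helper: illustrates the nested revisions shape only.
# It is NOT the code used inside magmaR.
df_to_revisions <- function(df, modelName) {
    ids <- as.character(df[[1]])   # first column = record identifiers
    attrs <- names(df)[-1]         # remaining columns = attribute names
    recs <- lapply(seq_len(nrow(df)), function(i) {
        # One named list of attribute values per record (row)
        lapply(df[i, attrs, drop = FALSE], function(x) x[[1]])
    })
    names(recs) <- ids
    # Top level of the structure is named by model
    stats::setNames(list(recs), modelName)
}

df <- data.frame(
    tube_name = c("EXAMPLE-REC-1", "EXAMPLE-REC-2"),  # made-up identifiers
    cell_number = c(1000, 2500),
    stringsAsFactors = FALSE
)
revs <- df_to_revisions(df, "rna_seq")
str(revs)
```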

The updateValues() function will then summarize records to be updated and allow the user to double-check this information before proceeding.

This user-prompt step can be bypassed (useful when running in a non-interactive way) by setting auto.proceed = TRUE, but NOTE: It is a good idea to always check carefully before proceeding, if possible. Data can be overwritten with NAs or zeros or the like, or disconnected from parent records, but improperly named records cannot be easily removed.

For "standard" models with explicit identifiers, the function targets the df's row-indicated records and column-indicated attributes of the modelName model of projectName project. In such cases, the first column of df must contain the identifiers of the records you wish to update.

For table-type models which do not have explicit identifiers, the function creates one record per row of the given df, with the requested values filled in for column-indicated attributes of the modelName model of projectName project, and attaches these records to the indicated parent records. In such cases, a column named as the parent model must exist in df to provide the parent identifiers of all requested new data, and table.method must also be given as either "append" or "replace".

df can be provided either as a data.frame directly, or as a file path pointing to a file containing such data. If given as a file path, the separator input can be used to adjust for whether the file is a csv (the default, separator = ","), a tsv (separator = "\t"), or some other format.
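
A minimal sketch of the file-based route, using only base R: it writes a small tab-separated file and reads it back to confirm the layout before the path is handed to updateFromDF. The record identifiers are made up, and how updateFromDF reads the file internally is not described here, so read.delim serves only as a check of our own. The updateFromDF call is left commented out because it needs a valid token and write permissions.

```r
# Made-up records, for illustration only
df <- data.frame(
    tube_name = c("EXAMPLE-REC-1", "EXAMPLE-REC-2"),
    cell_number = c(1000, 2500),
    stringsAsFactors = FALSE
)

# Write a tab-separated file with a header row and no row names
tsv_path <- tempfile(fileext = ".tsv")
write.table(df, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back to confirm the header and the number of records
check <- read.delim(tsv_path, stringsAsFactors = FALSE)
print(check)

# The path and the matching separator could then be supplied, e.g.:
# updateFromDF(magma, "example", "rna_seq", df = tsv_path, separator = "\t")
```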

The df data structure when targeting 'standard' models:

  • Rows = records, with the first column indicating the record identifiers.

  • Columns = the values to set for each attribute.

  • Column Names (or the top row when providing a file) = attribute names. Except for the first column (ignored as this column's data are used as identifiers), all column names must be valid attribute names of the target modelName.
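
For instance, a df for a 'standard' model might be assembled as in the sketch below. The attribute names follow the "rna_seq" example used elsewhere on this page, while the identifiers and values are made up. The duplicate and NA checks are optional sanity checks of our own, not something updateFromDF is stated to require.

```r
df <- data.frame(
    # First column: record identifiers (this column's name is ignored)
    tube_name = c("EXAMPLE-REC-1", "EXAMPLE-REC-2"),
    # Remaining columns: named after the attributes to update
    biospecimen = c("EXAMPLE-BSP-1", "EXAMPLE-BSP-2"),
    cell_number = c(1000, 2500),
    stringsAsFactors = FALSE
)

# Optional checks (our own assumption): identifiers present and unique
stopifnot(!anyNA(df[[1]]), !anyDuplicated(df[[1]]))

# These names must all be valid attributes of the target model
attribute_columns <- names(df)[-1]
attribute_columns
```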

The df data structure when targeting table-type models:

  • Rows = records, but no identifiers are needed.

  • Columns = the values to set for each attribute.

  • Column Names (or the top row when providing a file) = attribute names. At least one column must be named after the parent model and must represent parent model identifiers.
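
A comparable sketch for a table-type model, reusing the "demographic" model and its "subject" parent from the Examples section. All identifiers and values are made up. The final updateFromDF call is commented out because it needs a valid token and write permissions.

```r
table_df <- data.frame(
    # Column named after the parent model, holding parent identifiers
    subject = c("EXAMPLE-SUBJ-1", "EXAMPLE-SUBJ-1", "EXAMPLE-SUBJ-2"),
    # Remaining columns: attributes of the table model
    name = c("age", "sex", "age"),
    value = c("34", "F", "51"),
    stringsAsFactors = FALSE
)

# The parent column must exist, though it need not come first
parent_model <- "subject"
stopifnot(parent_model %in% names(table_df))

# One new table record is requested per row; count them per parent
rows_per_parent <- table(table_df[[parent_model]])
rows_per_parent

# table.method must be given for table-type models, e.g.:
# updateFromDF(magma, "example", "demographic", df = table_df,
#              table.method = "append")
```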

Value

None directly.

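The function sends data to magma, and the only outputs are information reported via the console. The exception is revisions.only = TRUE, in which case nothing is sent and the revisions list is returned instead.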

Use Case. Using this function to change records' identifiers

To do so, provide a file or dataframe where 1) the first column, named anything (its name will be ignored), contains current identifiers; 2) some other column, named as the attribute which is treated as the identifier for the model, contains the new identifiers.

To determine the identifier attribute's name, you can use retrieveTemplate:

retrieveTemplate(<target>, <projectName>)$models$<modelName>$template$identifier.
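
The sketch below shows how such a renaming df might be assembled. The template object is a hand-built stand-in that mirrors only the path shown above (it is not real retrieveTemplate output), and all identifiers are made up.

```r
# Stand-in for the return of retrieveTemplate(target, projectName);
# only the path to the identifier attribute's name is mirrored here.
template <- list(
    models = list(
        rna_seq = list(template = list(identifier = "tube_name"))
    )
)
modelName <- "rna_seq"
id_attr <- template$models[[modelName]]$template$identifier

rename_df <- data.frame(
    current_id = c("EXAMPLE-OLD-1", "EXAMPLE-OLD-2"),  # first column: current identifiers
    new_id = c("EXAMPLE-NEW-1", "EXAMPLE-NEW-2"),      # new identifiers
    stringsAsFactors = FALSE
)
# Name the second column after the model's identifier attribute
names(rename_df)[2] <- id_attr
rename_df
```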

See Also

updateMatrix for uploading matrix data

updateValues for a more direct replica of magma/update which is more flexible, though a bit more complicated to use.

https://mountetna.github.io/magma.html#update for documentation of the underlying magma/update function.

Examples


if (interactive()) {
    # First, we use magmaRset to create an object which will tell other magmaR
    #  functions our authentication token (as well as some other optional bits).
    # When run in this way, it will ask you to give your token.
    magma <- magmaRset()
    
    ### Note that you likely do not have write-permissions for the 'example'
    # project, so this code can be expected to give an authorization error,
    # yet still provides a good example of how to structure your code.
    
    ### Case 1: A 'standard' model which has an identifier:
    # Retrieving some example data from magma to use as our update
    df <- retrieve(
        magma, projectName = "example", modelName = "rna_seq",
        recordNames = "all",
        attributeNames = c("tube_name", "biospecimen", "cell_number")
        )
    df
    # Keys to note:
    # - the first column of this df holds the identifiers of all records we
    #     wish to target.  (The fact that this column is properly named as
    #     'tube_name' does not matter.)
    # - all subsequent columns hold the new values and are named as the
    #     attributes we wish to update.
    
    # To update values of the 'standard'-type "rna_seq" model
    updateFromDF(
        target = magma,
        projectName = "example",
        modelName = "rna_seq",
        df = df)

    ### Case 2: A 'table' model which has no identifiers:
    # Retrieving some example data from magma to use as our update
    table <- retrieve(
        magma, projectName = "example", modelName = "demographic",
        recordNames = "all",
        attributeNames = c("subject", "name", "value")
        )
    table
    # Keys to note:
    # - the first column of this df holds the parent record identifiers we
    #     wish to target. The fact that this column is properly named as
    #     'subject' does matter, but this column does not need to have
    #     been the first column.
    # - all subsequent columns hold the new values and are named as the
    #     attributes we wish to update.

    ## Key decision:
    # For table models, you must additionally decide whether to 'append'
    #  (add your update's records alongside all previous records which
    #  attach to the same parents) or 'replace' (clear all previous
    #  records attaching to the same parents this update hits, so that
    #  only the values within this update will exist).
    # This choice is given to the 'table.method' input.

    # To update values of the 'table'-type "demographic" model
    updateFromDF(
        target = magma,
        projectName = "example",
        modelName = "demographic",
        table.method = "replace",
        df = table)

}


magmaR documentation built on June 8, 2025, 10 a.m.