(Adapted from https://style.tidyverse.org/)
The styler
package is highly recommended for automatically applying some (but not all) of the style recommendations here. styler
is available as a stand-alone R package, but also comes with a handy RStudio add-in.
We use camelCase in R. Function and variable names all start with lowercase. Package names start with uppercase.
Examples:
cohortData <- loadCohortData("myFolder")
SqlRender
packageFunction names typically start with a verb. Variable names are typically nouns. Do not encode the data type in the variable names. Also, everything is data, so no need to say that unless unavoidable.
Good
fitOutcomeModel
function.computeCovariateBalance
function.population
argument.Bad
sampling
as variable name (not a noun)namesVector
, covariatesDf
(encodes the data type)getResultData
(everything is data)Place spaces around all infix operators (=
, +
, -
, <-
, etc.). The same rule applies when using =
in function calls. Always put a space after a comma, and never before (just like in regular English).
Good
average <- mean(feet / 12 + inches, na.rm = TRUE)
Bad
average<-mean(feet/12+inches,na.rm=TRUE)
There’s a small exception to this rule: :
, ::
and :::
don’t need spaces around them.
Good
x <- 1:10
base::get
Bad
x <- 1 : 10 base :: get
Place a space before left parentheses, except in a function call.
Good
if (debug) { do(x) } plot(x, y)
Bad
if(debug){ do(x) } plot (x, y)
Extra spacing (i.e., more than one space in a row) is ok if it improves alignment of equal signs or assignments (<-
).
Do not place spaces around code in parentheses or square brackets (unless there’s a comma, in which case see above).
Good
if (debug) { do(x) } diamonds[5, ]
Bad
if ( debug ) { # No spaces around debug do(x) } x[1,] # Needs a space after the comma x[1 ,] # Space goes after comma not beforeCurly braces
An opening curly brace should never go on its own line and should always be followed by a new line. A closing curly brace should always go on its own line, unless it’s followed by else.
Always indent the code inside curly braces. It’s ok to leave very short statements on the same line:
if (y < 0 && debug) { message("Y is negative") }
Strive to limit your code to 100 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.
When indenting your code, use tabs. Never use spaces or mix tabs and spaces.
Hint: In RStudio you can use ctrl-i to automatically indent the code for you.
Use <-, not =, for assignment.
Good
x <- 5
Bad
x = 5
If-then-else clauses should always use curly brackets, even if there's only one clause and it's one statement.
Good
if (a == b) { doSomething() }
Bad
if (a == b) doSomething()
When calling a function that has more than one argument, make sure to refer to each argument by name instead of relying on the order of arguments.
Good
translateSql(sql = "COMMIT", targetDialect = "PDW")
Bad
translateSql("COMMIT", "PDW")
Comment your code only where the intent is not immediately obvious. Each line of a comment should begin with the comment symbol and a single space: #
. Comments should explain the why, not the what.
Use commented lines of -
to break up your file into easily readable chunks, for example:
## Load data --------------------------- x <- readRDS("data.rds") ## Plot data --------------------------- plot(x)
Opening curly brackets should precede a new line. A closing curly bracket should be followed by a new line except when it is followed by else
or a closing parenthesis.
Good
if (a == b) { doSomething() } else { doSomethingElse() }
Bad
if (a == b) { doSomething() } else { doSomethingElse() }
Pipes should always be at the end of the line.
Good
foo %>% filter(x > 0) %>% group_by(y) %>% summarize(total = sum(x))
Bad
foo %>% filter(x > 0) %>% group_by(y) %>% summarize(total = sum(x))
Dplyr joins and merge statements should always have a 'by' argument.
Good
foo %>% inner_join(bar, by = "covariateId")
Bad
foo %>% inner_join(bar)
The OHDSI code style for SQL is heavily inspired by the Poor Man's T-SQL Formatter, which is available as a NotePad++ plugin. The only difference with the default settings is that in OHDSI, commas are trailing. You can automatically format your SQL correctly by using the Poor Man's T-SQL Formatter Online Tool (but don't forget to set Trailing Commas).
Because several database platforms are case-insensitive and tend to convert table and field names to either uppercase (e.g. Oracle) or lowercase (e.g. PostgreSQL), we use snake_case. All names should be in lowercase. Reserved words should be in upper case.
Good
SELECT COUNT(*) AS person_count FROM person
Bad
SELECT COUNT(*) AS personCount FROM person SELECT COUNT(*) AS Person_Count FROM person SELECT COUNT(*) AS PERSON_COUNT FROM person select count(*) as person_count from person
Commas should be trailing.
Good
SELECT COUNT(*) AS person_count, condition_concept_id, condition_type_concept_id FROM condition_era GROUP BY condition_concept_id, condition_type_concept_id
Bad
SELECT COUNT(*) AS person_count ,condition_concept_id ,condition_type_concept_id FROM condition_era GROUP BY condition_concept_id ,condition_type_concept_id
Indentation is done using tabs. Field definitions are followed by a new line.
Good
SELECT COUNT(*) AS person_count, condition_type_concept_id FROM ( SELECT * FROM condition_era WHERE condition_concept_id = 123 ) tmp GROUP BY condition_type_concept_id;
Bad
SELECT COUNT(*) AS person_count, condition_type_concept_id FROM (SELECT * FROM condition_era WHERE condition_concept_id = 123) tmp GROUP BY condition_type_concept_id;
In R we use camel case, and in SQL we use snake case. Therefore, on the interface between the two languages we must convert from one convention to the other. To facilitate this, the OHDSI SqlRender
and DatabaseConnector
packages provide various features.
In general, in R we can convert from one case to another using the camelCaseToSnakeCase
and snakeCaseToCamelcase
functions in the SqlRender
package:
data <- data.frame(cohortId = 1, cohortName = "test") colnames(data) <- camelCaseToSnakeCase(colnames(data)) colnames(data) # [1] "cohort_id" "cohort name" colnames(data) <- snakeCaseToCamelcase(colnames(data)) colnames(data) # [1] "cohortId" "cohortName"
When downloading data, a shortcut is to use the snakeCaseToCamelcase
argument of the querySql
function in the DatabaseConnector
package:
sql <- "SELECT cohort_definition_id, subject_id FROM cdm.cohort;" cohort <- querySql(connection = connection, sql = sql, snakeCaseToCamelcase = TRUE)
Where cohort
will be a data frame with columns cohortDefinitionId
and subjectId
.
When uploading data, a shortcut is to use the camelCaseToSnakeCase
argument of the insertTable
in the DatabaseConnector
package:
dataToInsert <- data.frame(cohortId = 1, cohortName = "test") insertTable(connection = connection, tableName = "my_table", data = dataToInsert, camelCaseToSnakeCase = TRUE)
Where the table called my_table
will have columns named cohort_id
and cohort_name
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.