knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(pttdatahaku)

Browsing databases

The path to the database is stored in pttdatahaku:::db_path.

To list all data sets in the database, use ptt_search_data without arguments. To narrow the list down, pass a search term.

ptt_search_data()
ptt_search_data("tyonv")

To list all databases, use ptt_search_db without arguments:

ptt_search_db()

Saving Data

Preferably, do not save data to the database manually or interactively. Use database lists instead, because they make the data reproducible and easy to update. See the section Creating Databases. Sometimes, however, you need to save data manually. For this, use ptt_save_data:

test_df <- data.frame(x = letters[1:5], y = rnorm(5))
ptt_save_data(test_df)

Reading data

Use ptt_read_data with the name of the data set to read data from a database.

data <- ptt_read_data("test_df")
data

Importing Data from Statistics Finland

To import data, you need a URL that tells where the data is. The function statfi_parse_url transforms an address from the PxWeb GUI into the corresponding address of the PxWeb API:

gui_url <- "https://pxweb2.stat.fi/PxWeb/pxweb/fi/StatFin/StatFin__vkour/statfin_vkour_pxt_12bq.px/"
api_url <- statfi_parse_url(gui_url)
api_url

Given a URL, you can construct queries using pxweb_print_full_query:

my_url <- api_url
pxweb_print_full_query(my_url)

Given a URL, you can print code that imports the data at that URL:

pxweb_print_code_full_query(my_url)
pxweb_print_code_full_query_c(my_url)

The latter also copies the code to the clipboard, so you can paste it wherever you wish with Ctrl+V. With a query and a URL, you can import data. Let's simplify the query a bit:

my_query <- list(
  "Vuosi" = c("1970", "1975"),
  "Alue" = c("SSS", "KU020", "KU005"),
  "Ik\U00E4" = c("SSS", "15-19"),
  "Sukupuoli" = c("SSS", "1", "2"),
  "Koulutusaste" = c("SSS", "3T8"),
  "Tiedot" = c("vaesto")
)
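Before sending a query, it can help to check how much data it asks for. The following sketch uses only base R and assumes, as is usual for PxWeb tables, that the result contains one cell for every combination of the selected values. The name n_cells is introduced here for illustration only.

```r
# The simplified query: one character vector of selected codes per variable.
my_query <- list(
  "Vuosi" = c("1970", "1975"),
  "Alue" = c("SSS", "KU020", "KU005"),
  "Ik\U00E4" = c("SSS", "15-19"),
  "Sukupuoli" = c("SSS", "1", "2"),
  "Koulutusaste" = c("SSS", "3T8"),
  "Tiedot" = c("vaesto")
)

# Number of selected values per variable.
lengths(my_query)

# Assumption: one cell per combination of selected values, so the size
# of the request is the product of the counts above.
n_cells <- prod(lengths(my_query))
n_cells
```

A small number here suggests the query is safe to run interactively; a very large one is a hint to select fewer values.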

Use function ptt_get_statfi to import data:

ptt_get_statfi(my_url, my_query)

Creating Databases

Each database has a database list. Database lists contain the code that accesses the Statistics Finland data API and imports the data to the database. For each data set there is an entry in a database list. Database lists are used to create and update databases.

Use the function ptt_save_db_list to create a database list. Note that we are adding an object to the database. Thus, first create the object and then pass it to ptt_save_db_list without quotation marks. Database lists are list objects.

test_db <- list()
ptt_save_db_list(test_db)

To retrieve a database list, use the function ptt_read_db_list. Note that we are now fetching an object from the database by its name. Hence, use quotation marks.

ptt_read_db_list("test_db")
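The difference between passing an object and passing a name can be illustrated with base R. The helpers save_obj and read_obj below are hypothetical and are not how pttdatahaku is implemented; they only sketch, under that assumption, why saving takes an unquoted object while reading takes a quoted name.

```r
# Hypothetical helpers, not part of pttdatahaku: a temporary folder
# plays the role of the database.
db_dir <- tempfile("db")
dir.create(db_dir)

save_obj <- function(x) {
  # deparse(substitute(x)) recovers the name of the object passed in,
  # so the file is named after the object.
  name <- deparse(substitute(x))
  saveRDS(x, file.path(db_dir, paste0(name, ".rds")))
  invisible(name)
}

read_obj <- function(name) {
  # Reading needs only the name, given as a string.
  readRDS(file.path(db_dir, paste0(name, ".rds")))
}

example_list <- list(a = 1)
save_obj(example_list)                 # the object, no quotation marks
restored <- read_obj("example_list")   # the name, in quotation marks
identical(restored, example_list)
```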

This database list is still empty. Database lists store queries. A query is the set of information required to import a data set. To add a query, you need the URL of the data set, the query itself, and a call. The query and the URL are the same objects that are passed as arguments to ptt_get_statfi. The third element, the call, is the expression that invokes ptt_get_statfi. The call may also contain further operations, as ptt_get_statfi cannot possibly handle all data in the wild.

To add a query to a database list, use the function ptt_add_query with arguments that give the name of the database list to which you wish to add the query, the URL, the query, and the call.
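To make the role of the call concrete, the sketch below shows one way a call stored as a string could be evaluated against a stored url and query. It is an assumption about the general mechanism, not a description of pttdatahaku internals. The function fake_get_statfi is a hypothetical stand-in for ptt_get_statfi so that the example runs without network access, and the URL is a placeholder.

```r
# Hypothetical stand-in for ptt_get_statfi, so the sketch runs offline.
fake_get_statfi <- function(url, query) {
  data.frame(url = url, n_vars = length(query))
}

# One entry of a database list, assumed here to be a plain list that
# holds the url, the query and the call as a string.
entry <- list(
  url = "https://example.org/table.px",
  query = list("Vuosi" = c("1970", "1975")),
  # The call wraps the import in a further operation (transform).
  call = "transform(fake_get_statfi(url, query), source = 'example')"
)

# Evaluate the stored call in an environment where `url` and `query`
# refer to the stored objects.
env <- list2env(entry[c("url", "query")], parent = environment())
result <- eval(parse(text = entry$call), envir = env)
result
```

Storing the call as text is what lets an entry carry extra processing steps alongside the import itself.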

ptt_add_query("test_db", url = my_url, query = my_query, call = c("ptt_get_statfi(url, query)"))

Now our test database is not empty anymore:

db_list <- ptt_read_db_list("test_db")
db_list

Use the base R function names to see all the data sets that the database list manages:

names(db_list)

Now the database list has all the information it needs to import the data. To add the data to the database, use the function ptt_db_update. It takes the name of the database list as its argument, and it imports and updates all data sets on which the database list has information.

ptt_db_update("test_db")

To keep better track of your data, you might want to add information about it to your database. To add metadata from Statistics Finland:

ptt_add_pxweb_metadata("test_db")
ptt_read_db_list("test_db")$vkour_12bq$pxweb_metadata

You can also create complete databases using simply the URL locations of the required data:

my_new_db_list <- c("StatFin__atp/statfin_atp_pxt_11my.px/",
                    "https://pxweb2.stat.fi/PxWeb/pxweb/fi/StatFin/StatFin__vkour/statfin_vkour_pxt_12bq.px/")
ptt_create_db_list(my_new_db_list, overwrite = TRUE)
names(ptt_read_db_list("my_new_db_list"))

To summarize a database list:

ptt_glimpse_db("my_new_db_list")

Codes and names

df <- ptt_read_data("tyonv_12u4")
df
label_key <- get_codes_names("tyonv_12u4", "tyovoimapalvelut_db")
df <- statficlassifications::key_recode(df, label_key)
df

Switching between codes and names is straightforward. Applying the same key again switches back:

df <- statficlassifications::key_recode(df, label_key)
df
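The idea behind recoding with a key can be sketched in base R with a named vector. This is not the implementation of statficlassifications::key_recode; the key, the labels and the data below are illustrative assumptions.

```r
# Hypothetical key: names are codes, values are the corresponding labels.
key <- c("SSS" = "Total", "1" = "Men", "2" = "Women")

df <- data.frame(
  sukupuoli = c("1", "2", "SSS"),
  value = c(10, 12, 22),
  stringsAsFactors = FALSE
)

# Codes to names: look each code up in the key.
df$sukupuoli <- unname(key[df$sukupuoli])
df

# Names to codes: invert the key and look up again.
reverse_key <- setNames(names(key), key)
df_codes <- df
df_codes$sukupuoli <- unname(reverse_key[df_codes$sukupuoli])
df_codes
```

Because the key maps codes to names one to one, inverting it is enough to go back, which is why the same key can serve both directions.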


pttry/pttdatahaku documentation built on Jan. 25, 2025, 10:37 a.m.