lsa.convert.data | R Documentation |
lsa.convert.data converts datasets from large-scale assessments from their original formats (SPSS or ASCII text) into .RData files. print prints the properties of an lsa.data object on screen. lsa.select.countries.PISA lets the user select PISA data from specific countries for analysis.
lsa.convert.data(
inp.folder,
PISApre15 = FALSE,
ISO,
missing.to.NA = FALSE,
out.folder
)
## S3 method for class 'lsa.data'
print(x, col.nums, ...)
lsa.select.countries.PISA(data.file, data.object, cnt.names, output.file)
inp.folder |
The folder containing the IEA-like SPSS data files or ASCII text files and
|
PISApre15 |
When converting PISA files, set to |
ISO |
Vector containing character ISO codes of the countries' data files to
convert (e.g. |
missing.to.NA |
Should the user-defined missing values be recoded to |
out.folder |
Path to the folder where the converted files will be stored. If omitted,
same as the |
x |
( |
col.nums |
( |
... |
( |
data.file |
( |
data.object |
( |
cnt.names |
( |
output.file |
( |
The lsa.convert.data function converts the originally provided data files into .RData sets. RALSA adds its own method for printing lsa.data objects on screen. lsa.select.countries.PISA is a utility function that allows the user to select countries of interest from a converted PISA data file (or a PISA object residing in memory) and remove the data of all other countries. This is useful when the user does not want to analyze the data from all countries in a PISA file.
lsa.convert.data
IEA studies, as well as OECD TALIS and some studies conducted by other organizations, provide their data in SPSS .sav format with the same or a very similar structure: one file per country and type of respondent (e.g. school principal, student, teacher, etc.) per population. For IEA studies and OECD TALIS, use the ISO argument to specify the three-letter ISO codes of the countries whose data is to be converted. The three-letter ISO codes for each country can be found in the user guide for the study in scope. For example, the ISO codes of the countries participating in PIRLS 2016 can be found in its user guide on pages 52-54. To convert the files from all countries in the downloaded data from IEA studies and OECD TALIS, simply omit the ISO argument.
Cycles of OECD PISA prior to 2015, on the other hand, do not provide SPSS .sav or other binary files, but ASCII text files accompanied by SPSS syntax (.sps) files that are used to import the text files into SPSS. There is one such file per type of respondent, containing the data from all countries. The data from PISA 2015 and later is provided in SPSS format (all countries in one file per type of respondent). The lsa.convert.data function converts the data from either source, ensuring that the structure of the output .RData files is the same even though the structure of the input files differs (SPSS binary files vs. ASCII text files plus import .sps files). Thus, the PISApre15 argument needs to be set to TRUE when converting data sets from PISA cycles prior to 2015. The default for the PISApre15 argument is FALSE, which means that the function expects to find in the inp.folder directory either IEA-like SPSS binary files per country and type of respondent, or OECD PISA 2015 (or later) SPSS .sav files. If PISApre15 = TRUE and country codes are provided to ISO, they will be ignored because PISA files contain data from all countries together.
The files to be converted must be in a folder of their own, from a single study, single cycle and single population. In addition, if there is more than one file type per study, cycle and population, each type must also be in a separate folder. For example, in TIMSS 2019 the grade 8 data files are main (end with "m7", electronic version of the paper administered items), bridge (end with "b7", paper administration with trend items for countries participating in previous TIMSS cycles) and Problem Solving and Inquiry (PSI) tasks (end with "z7", electronic administration only, optional for countries). These different types must be in separate folders. In the case of OECD PISA prior to 2015, the folder must contain both the ASCII text files and the SPSS .sps import syntax files. If the folder contains data sets from more than one study or cycle, the operation will break with error messages.
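Before converting, it can help to check that a folder does not mix file types. The sketch below is one possible pre-flight check in base R. The helper group_by_suffix is hypothetical (it is not part of RALSA), the file names are made up for illustration, and the check assumes that the file type is identified by the last two characters before the extension, as in the TIMSS 2019 example above.

```r
# Hypothetical helper (not part of RALSA): group file names by the two
# characters that precede the file extension, e.g. "m7", "b7" or "z7".
group_by_suffix <- function(file.names) {
  stems <- tools::file_path_sans_ext(basename(file.names))
  suffixes <- tolower(substr(stems, nchar(stems) - 1, nchar(stems)))
  split(file.names, suffixes)
}

# Made-up file names, for illustration only. For a real folder the vector
# could come from list.files(path = "some/folder", pattern = "\\.sav$").
files <- c("bsgausm7.sav", "bsgsvnm7.sav", "bsgausb7.sav", "bsgausz7.sav")
groups <- group_by_suffix(files)

# More than one group means the folder mixes file types and should be
# split into separate folders before conversion.
if (length(groups) > 1) {
  message("Mixed file types found: ", paste(names(groups), collapse = ", "))
}
```

Each element of groups lists the files that would belong together in one input folder.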
If the path for the inp.folder argument is not specified, the function will search for files in the working directory (i.e. as returned by getwd()). If the folder path for the out.folder argument is not specified, it will be taken from inp.folder and the files will be stored there. If both the inp.folder and out.folder arguments are missing, the directory returned by getwd() will be used to search, convert and store files.
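The fallback rules for the two folder arguments can be sketched in base R as follows. This only mirrors the behavior described above; resolve_folders is a hypothetical function written for illustration and is not RALSA's actual implementation.

```r
# Hypothetical illustration of the documented fallback rules
# (not the package's own code).
resolve_folders <- function(inp.folder, out.folder) {
  # No input folder given: fall back to the working directory
  if (missing(inp.folder)) inp.folder <- getwd()
  # No output folder given: store the files next to the input files
  if (missing(out.folder)) out.folder <- inp.folder
  list(inp.folder = inp.folder, out.folder = out.folder)
}

only.input <- resolve_folders(inp.folder = "C:/TIMSS_2011_G8")
neither <- resolve_folders()
```

Here only.input$out.folder equals the input folder, and both elements of neither equal the working directory.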
If missing.to.NA is set to TRUE, all user-defined missing values from the SPSS files will be imported as NA, which is R's only kind of missing value. This will most often be the case when analyzing these data, since the reason why a response is missing is usually irrelevant. However, if the reasons for missing responses need to be known, as when analyzing achievement items (i.e. not administered vs. omitted or not reached), the argument should be set to FALSE (the default for this argument), which will import all user-defined missing values as valid ones.
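The practical difference between the two settings can be illustrated on a plain numeric vector. This is a base R sketch of the idea only, not of how RALSA performs the conversion; the response values and the missing codes 6 and 9 are made up.

```r
# Made-up item responses; 6 and 9 stand in for user-defined missing codes
# (for example "not reached" and "omitted").
responses <- c(1, 3, 9, 2, 6, 4)
user.missings <- c(6, 9)

# Analogous to missing.to.NA = TRUE: the reason for missingness is lost
as.na <- replace(responses, responses %in% user.missings, NA)

# Analogous to missing.to.NA = FALSE: the codes stay in the data as valid
# values, so the different reasons can still be counted separately
reason.counts <- table(responses[responses %in% user.missings])
```

With the codes kept, reason.counts shows how often each missing code occurs; with NA, only the total number of missing responses is recoverable.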
print
RALSA uses its own method for printing objects of class lsa.data on screen. Passing just the object name to the console will print summarized information about the study's data and the first six columns of the dataset (see the Value section). The col.nums argument specifies which columns from the dataset shall be included in the output (see examples).
lsa.select.countries.PISA
lsa.select.countries.PISA lets the user take a PISA dataset, either a converted file or an lsa.data object in memory, and reduce the number of countries in it by passing the names of the countries to be kept as a character vector to the cnt.names argument. If the full path (including the file name) to the resulting file is specified in the output.file argument, the result will be written to disk. If not, the data will be written to an lsa.object in memory with the same name as the input file. See the examples.
lsa.convert.data
.RData data files, each containing an object of class lsa.data, an extension of the data.table class. The data.table object has the same name as the .RData file it is saved in. The object has additional attributes: study name (study), study cycle (cycle), and respondent file type (file.type). Each variable has its own additional attribute: the variable label, if one existed in the source SPSS file. If missing.to.NA was set to TRUE, each variable also has an attribute missings, containing the user-defined missing values from the SPSS files.
The object in the .RData
file is keyed on the country ID variable.
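The object-level attributes can be read with base R once a converted file is loaded. The sketch below uses a stand-in file so that it runs on its own: the data frame, the column name IDCNTRY, and all attribute values are made up, and a real converted file would hold an lsa.data object instead of a plain data frame.

```r
# Stand-in for a converted file: a plain data frame carrying the three
# attributes named above. All values here are invented placeholders.
demo <- data.frame(IDCNTRY = c(36, 705), x = c(1.5, 2.5))
attr(demo, "study") <- "TIMSS"
attr(demo, "cycle") <- "2011"
attr(demo, "file.type") <- "bsg"
tmp <- file.path(tempdir(), "demo.RData")
save(demo, file = tmp)

# Load into a separate environment so nothing in the workspace is
# overwritten; load() returns the names of the objects it restored.
e <- new.env()
obj.name <- load(tmp, envir = e)
obj <- get(obj.name, envir = e)

# Collect the study-level attributes into a named list
study.info <- attributes(obj)[c("study", "cycle", "file.type")]
```

Because the object in a converted file has the same name as the file, obj.name can also serve as a quick check that the expected file was loaded.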
print
Prints the information of an lsa.data object (study, cycle, respondent type, number of countries, the key (country ID), and whether the variables have user-defined missing values) and a preview of the data. The default preview (when no col.nums are specified) will include the first six columns.
lsa.select.countries.PISA
Writes a file containing an lsa.object with the data for the countries passed to the cnt.names argument, if the output.file argument is used. If the output.file argument is not used, the lsa.object will be written to memory with the same name as the file name in data.file.
When downloading the files (ASCII text and control .sps) for OECD PISA cycles prior to 2015 (e.g. from http://www.oecd.org/pisa/pisaproducts/pisa2009database-downloadabledata.htm), save them without changing their names and without modifying the file contents. The function will look for the files as they were named originally.
Different studies and cycles define the "I don't know" (or similar) category of discrete variables in different ways, either as a valid or as a missing value. The lsa.convert.data function sets all such or similar codes to missing values. If this has to be changed, lsa.recode.vars can be used (also see lsa.vars.dict).
Foy, P. (Ed.). (2018). PIRLS 2016 User Guide for the International Database. TIMSS & PIRLS International Study Center.
lsa.merge.data
, lsa.vars.dict
, lsa.recode.vars
# Convert all IEA-like SPSS files in the working directory, setting all user-defined missing
# values to NA
## Not run:
lsa.convert.data(missing.to.NA = TRUE)
## End(Not run)
# Convert IEA TIMSS 2011 grade 8 data from Australia and Slovenia, keeping all user-defined
# missing values as valid ones, and specifying custom input and output directories
## Not run:
lsa.convert.data(inp.folder = "C:/TIMSS_2011_G8", ISO = c("aus", "svn"), missing.to.NA = FALSE,
                 out.folder = "C:/Data")
## End(Not run)
# Convert OECD PISA 2009 files, converting all user-defined missing values to NA
# and using custom input and output directories
## Not run:
lsa.convert.data(inp.folder = "/media/PISA_2009", PISApre15 = TRUE, missing.to.NA = TRUE,
                 out.folder = "/tmp")
## End(Not run)
# Print the 20th to 25th columns of the PISA 2018 student questionnaire dataset loaded into memory
## Not run:
print(x = cy07_msu_stu_qqq, col.nums = 20:25)
## End(Not run)
# Select data from Albania and Slovenia from the PISA 2018 student questionnaire dataset
# and save it under the same file name in a different folder
## Not run:
lsa.select.countries.PISA(data.file = "C:/PISA/cy07_msu_stu_qqq.RData",
                          cnt.names = c("Albania", "Slovenia"),
                          output.file = "C:/PISA/Reduced/cy07_msu_stu_qqq.RData")
## End(Not run)