knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
There are several functions in base R to pivot data. These include the t()
and reshape()
functions. There are also well-designed functions in the
tidyverse for transposing data. Nevertheless, some users will prefer
the syntax of proc_transpose()
. This function provides control over output
column naming, and an intuitive set of parameters.
To explore proc_transpose
, let's first create some sample data:
# Create input data dat <- read.table(header = TRUE, text = ' Name Subject Semester1 Semester2 Samma Maths 96 94 Sandy English 76 51 Devesh German 76 95 Rakesh Maths 50 63 Priya English 62 80 Kranti Maths 92 92 William German 87 75') # View data dat # Name Subject Semester1 Semester2 # 1 Samma Maths 96 94 # 2 Sandy English 76 51 # 3 Devesh German 76 95 # 4 Rakesh Maths 50 63 # 5 Priya English 62 80 # 6 Kranti Maths 92 92 # 7 William German 87 75
The proc_tranpose()
function may be executed without any parameters. The
default usage will tranpose all numeric variables and construct generic
column names for the new columns:
# No parameters res <- proc_transpose(dat) # View result res # NAME COL1 COL2 COL3 COL4 COL5 COL6 COL7 # 1 Semester1 96 76 76 50 62 92 87 # 2 Semester2 94 51 95 63 80 92 75
If you didn't want all numeric variables transposed, you could specify which
variable(s) to transpose using the var
parameter. Multiple variables can be
passed as a vector:
# Var parameter res <- proc_transpose(dat, var = "Semester1") # View result res # NAME COL1 COL2 COL3 COL4 COL5 COL6 COL7 # 1 Semester1 96 76 76 50 62 92 87
You may control the output column names using the prefix
,
suffix
, and name
parameters. The prefix
and suffix
will be used to
construct the new transposed column names. The name
parameter value will
be used as the name of the generic "NAME" column, which identify
the var
parameter values:
# With prefix res <- proc_transpose(dat, name = VarName, prefix = Student) # View result res VarName Student1 Student2 Student3 Student4 Student5 Student6 Student7 1 Semester1 96 76 76 50 62 92 87 2 Semester2 94 51 95 63 80 92 75
Here is the same function call with a suffix
:
# With suffix res <- proc_transpose(dat, name = VarName, prefix = S, suffix = Score) # View result res VarName S1Score S2Score S3Score S4Score S5Score S6Score S7Score 1 Semester1 96 76 76 50 62 92 87 2 Semester2 94 51 95 63 80 92 75
Note that since a variable name in R cannot start with a number, some sort of prefix is required.
If your data contains a column with appropriate labels for the transposed columns,
you can assign it to the id
parameter. The id
values will then be used for
the new column names.
# Assign column names from data res <- proc_transpose(dat, name = VarName, id = Name) # View result res VarName Samma Sandy Devesh Rakesh Priya Kranti William 1 Semester1 96 76 76 50 62 92 87 2 Semester2 94 51 95 63 80 92 75
Using two id
parameters tells the function that you want columns that are
combinations of the two variables:
# Two id variables res <- proc_transpose(dat, id = v(Name, Subject)) res # NAME Samma.Maths Sandy.English Devesh.German Rakesh.Maths Priya.English Kranti.Maths William.German # 1 Semester1 96 76 76 50 62 92 87 # 2 Semester2 94 51 95 63 80 92 75
The default delimiter shown above is a dot ("."). The delimiter may be changed
with the delimiter
parameter:
# Underscore delimiter res <- proc_transpose(dat, id = v(Name, Subject), delimiter = "_") res # NAME Samma_Maths Sandy_English Devesh_German Rakesh_Maths Priya_English Kranti_Maths William_German # 1 Semester1 96 76 76 50 62 92 87 # 2 Semester2 94 51 95 63 80 92 75
The by
parameter tells the function to group by the by
variable
before transposing. As you can see below, the by
varible is then retained
on the output dataset so you can identify which rows belong to which group.
# By variable res <- proc_transpose(dat, by = Name, id = Subject, name = Semester) # View result res # Name Semester German English Maths # 1 Devesh Semester1 76 NA NA # 2 Devesh Semester2 95 NA NA # 3 Kranti Semester1 NA NA 92 # 4 Kranti Semester2 NA NA 92 # 5 Priya Semester1 NA 62 NA # 6 Priya Semester2 NA 80 NA # 7 Rakesh Semester1 NA NA 50 # 8 Rakesh Semester2 NA NA 63 # 9 Samma Semester1 NA NA 96 # 10 Samma Semester2 NA NA 94 # 11 Sandy Semester1 NA 76 NA # 12 Sandy Semester2 NA 51 NA # 13 William Semester1 87 NA NA # 14 William Semester2 75 NA NA
The by
parameter is a valuable feature of proc_transpose()
. The by variable
can have a significant effect on the shape of the output data. Let's see
what happens when we use a different by variable.
# By variable res <- proc_transpose(dat, by = Subject, id = Name) # Subject NAME Sandy Priya Devesh William Samma Rakesh Kranti # 1 English Semester1 76 62 NA NA NA NA NA # 2 English Semester2 51 80 NA NA NA NA NA # 3 German Semester1 NA NA 76 87 NA NA NA # 4 German Semester2 NA NA 95 75 NA NA NA # 5 Maths Semester1 NA NA NA NA 96 50 92 # 6 Maths Semester2 NA NA NA NA 94 63 92
Now let's use two by variables:
# Two by variables res <- proc_transpose(dat, by = v(Name, Subject)) # View results res # Name Subject NAME COL1 # 1 Priya English Semester1 62 # 2 Priya English Semester2 80 # 3 Sandy English Semester1 76 # 4 Sandy English Semester2 51 # 5 Devesh German Semester1 76 # 6 Devesh German Semester2 95 # 7 William German Semester1 87 # 8 William German Semester2 75 # 9 Kranti Maths Semester1 92 # 10 Kranti Maths Semester2 92 # 11 Rakesh Maths Semester1 50 # 12 Rakesh Maths Semester2 63 # 13 Samma Maths Semester1 96 # 14 Samma Maths Semester2 94
By transposing one more time on the results of the previous example, you can nearly restore the original data frame.
# Restore original data shape res2 <- proc_transpose(res, by = v(Name, Subject), id = NAME) # View results res2[ , c("Name", "Subject", "Semester1", "Semester2")] # Name Subject Semester1 Semester2 # 1 Priya English 62 80 # 2 Sandy English 76 51 # 3 Devesh German 76 95 # 4 William German 87 75 # 5 Kranti Maths 92 92 # 6 Rakesh Maths 50 63 # 7 Samma Maths 96 94
The proc_sort()
function in the procs package attempts to recreate the most
basic functionality of PROC SORT. The function accepts an input data frame
and returns the sorted result. Before examining the function, first let's create
some sample data:
# Create sample data dat <- read.table(header = TRUE, text = ' ID Name Score 1 David 74 2 Sam 45 3 Bane 87 3 Mary 92 4 Dane 23 5 Jenny 87 6 Simran 63 8 Priya 72 1 David 45 2 Ram 54 3 Bane 87 5 Ken 87') # View data dat # ID Name Score # 1 1 David 74 # 2 2 Sam 45 # 3 3 Bane 87 # 4 3 Mary 92 # 5 4 Dane 23 # 6 5 Jenny 87 # 7 6 Simran 63 # 8 8 Priya 72 # 9 1 David 45 # 10 2 Ram 54 # 11 3 Bane 87 # 12 5 Ken 87
Like proc_transpose()
, the function has a default
usage, which is to sort by all variables:
# Default sort res <- proc_sort(dat) # View results res # ID Name Score # 9 1 David 45 # 1 1 David 74 # 10 2 Ram 54 # 2 2 Sam 45 # 3 3 Bane 87 # 11 3 Bane 87 # 4 3 Mary 92 # 5 4 Dane 23 # 6 5 Jenny 87 # 12 5 Ken 87 # 7 6 Simran 63 # 8 8 Priya 72
To sort by a specific variable, you can use the by
parameter:
# Sort By res <- proc_sort(dat, by = Score) # View results res # ID Name Score # 5 4 Dane 23 # 2 2 Sam 45 # 9 1 David 45 # 10 2 Ram 54 # 7 6 Simran 63 # 8 8 Priya 72 # 1 1 David 74 # 3 3 Bane 87 # 6 5 Jenny 87 # 11 3 Bane 87 # 12 5 Ken 87 # 4 3 Mary 92
You can also sort by two variables:
# Sort By res <- proc_sort(dat, by = v(Score, Name)) # View results res # ID Name Score # 5 4 Dane 23 # 9 1 David 45 # 2 2 Sam 45 # 10 2 Ram 54 # 7 6 Simran 63 # 8 8 Priya 72 # 1 1 David 74 # 3 3 Bane 87 # 11 3 Bane 87 # 6 5 Jenny 87 # 12 5 Ken 87 # 4 3 Mary 92
Notice that when the scores are tied, the data is now sorted alphabetically by name.
To put the highest scores on top, you can sort descending using the order
parameter.
# Sort By res <- proc_sort(dat, by = Score, order = descending) # View results res # ID Name Score # 4 3 Mary 92 # 3 3 Bane 87 # 6 5 Jenny 87 # 11 3 Bane 87 # 12 5 Ken 87 # 1 1 David 74 # 8 8 Priya 72 # 7 6 Simran 63 # 10 2 Ram 54 # 2 2 Sam 45 # 9 1 David 45 # 5 4 Dane 23
The order
instructions can be abbreviated "a" for ascending and "d" for
descending. These abbreviations are convenient when there is more than
one by
variable.
# Sort By res <- proc_sort(dat, by = v(Score, Name), order = v(d, a)) # View results res # ID Name Score # 4 3 Mary 92 # 3 3 Bane 87 # 11 3 Bane 87 # 6 5 Jenny 87 # 12 5 Ken 87 # 1 1 David 74 # 8 8 Priya 72 # 7 6 Simran 63 # 10 2 Ram 54 # 9 1 David 45 # 2 2 Sam 45 # 5 4 Dane 23
The keep
parameter allows you to select only some of the variables for the
output dataset:
# Keep Name and Score only res <- proc_sort(dat, by = Name, keep = v(Name, Score)) # View results res # Name Score # 3 Bane 87 # 11 Bane 87 # 5 Dane 23 # 1 David 74 # 9 David 45 # 6 Jenny 87 # 12 Ken 87 # 4 Mary 92 # 8 Priya 72 # 10 Ram 54 # 2 Sam 45 # 7 Simran 63
Another convenient feature of proc_sort()
is the nodupkey
option. This
option will eliminate duplicates based on the unique combination of values
of the by
variables. For instance, to get a unique list of students, you
could use keep
and options = nodupkey
:
# Keep and Nodupkey res <- proc_sort(dat, by = Name, keep = Name, options = nodupkey) # View results res # Name # 3 Bane # 5 Dane # 1 David # 6 Jenny # 12 Ken # 4 Mary # 8 Priya # 10 Ram # 2 Sam # 7 Simran
There is also a dupkey
option to retain only records with duplicate
key values. This feature
would allow you to easily find students who took the exam twice, and show
the scores for each attempt.
# Keep and dupkey res <- proc_sort(dat, by = Name, options = dupkey) # View results res # ID Name Score # 1 3 Bane 87 # 2 3 Bane 87 # 3 1 David 74 # 4 1 David 45
Most of the time, you will sort alphabetically or numerically. But sometimes you may have character data that you want to sort in a specific order that is not alphabetic. Let's return to the class data we used above:
# Create input data dat <- read.table(header = TRUE, text = ' Name Subject Semester1 Semester2 Samma Maths 96 94 Sandy English 76 51 Devesh German 76 95 Rakesh Maths 50 63 Priya English 62 80 Kranti Maths 92 92 William German 87 75') # View data dat # Name Subject Semester1 Semester2 # 1 Samma Maths 96 94 # 2 Sandy English 76 51 # 3 Devesh German 76 95 # 4 Rakesh Maths 50 63 # 5 Priya English 62 80 # 6 Kranti Maths 92 92 # 7 William German 87 75
What if we wanted to sort "Maths" to the top, followed by "English" and "German"? How could that be accomplished?
To perform a sort in a specific, non-alphabetic order, first create a factor on your desired sort column and order the levels by the intended sort. For instance:
# Create factor with ordered levels dat$Subject <- factor(dat$Subject, levels = c("Maths", "English", "German")) # View attributes attributes(dat$Subject) # $levels # [1] "Maths" "English" "German" # # $class # [1] "factor"
Then use proc_sort()
to sort by the factor variable:
# Sort by factor variable dat2 <- proc_sort(dat, by = Subject) # View result dat2 # Name Subject Semester1 Semester2 # 1 Samma Maths 96 94 # 4 Rakesh Maths 50 63 # 6 Kranti Maths 92 92 # 2 Sandy English 76 51 # 5 Priya English 62 80 # 3 Devesh German 76 95 # 7 William German 87 75
Note that after the sort operation, the "Subject" variable is still a factor:
class(dat2$Subject) # [1] "factor"
If desired, we can turn the variable back into a character using the "as.character" parameter. Following the sort, this parameter will automatically cast any factors on the by parameter to character variables:
# Sort by factor variable dat2 <- proc_sort(dat, by = Subject, as.character = TRUE) # Same results dat2 # Name Subject Semester1 Semester2 # 1 Samma Maths 96 94 # 4 Rakesh Maths 50 63 # 6 Kranti Maths 92 92 # 2 Sandy English 76 51 # 5 Priya English 62 80 # 3 Devesh German 76 95 # 7 William German 87 75 # But now Subject is a character class(dat2$Subject) # [1] "character"
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.