Introduction to legco

knitr::opts_chunk$set(message = FALSE)
library(httptest)
start_vignette("introduction")

legco is an R package to fetch data from the Hong Kong Legislative Council (LegCo) API. It provides easy access to LegCo’s open data in an integrated manner without having to dig deep into the structures and parameters of the API. Each function corresponds to a data endpoint of the API. Data is retrieved in json format and presented in a dataframe.

Installing legco

install.packages("devtools")
devtools::install_github("elgarteo/legco")

And to load it before every use:

library(legco)

Understanding the LegCo API

The LegCo API consists of different databases with separate structures and identifiers. The functions in this package correspond to the various data endpoints of these databases. The databases supported by this package and the associating functions are as follow.

  1. Bills Database

This database contains key information of all the bills that have been presented in LegCo since 1906. This database is accessible with the following function:

Compared to bills() of the Hansard Database, all_bills() provides much more comprehensive information of bills, including all key dates and the proposer.

See here for more information of this database.

  1. Hansard Database

This database contains information on matters discussed in Council meeting since the fifth term of LegCo. Hinting by its name, the database is built upon the PDF hansard files. Relying on the bookmarks and section codes of the hansard files, the data endpoints retrieve data from the files directly. This database is accessible with the following functions/data:

When using in combination, these functions are essentially hansard files crawler.

See here for more information of this database.

  1. Meeting Attendance Database

This database contains the attendance record of Council and committee open meetings since the sixth term.

This database is accessible with the following function:

See here for more information of this database.

  1. Meeting Schedule Database

This database contains the member list and the meeting schedule of the Council and every committee since the fifth term of LegCo. This database is accessible with the following functions:

While the purpose of member() is similar to speakers() of the Hansard Database, member() returns only LegCo members. Their identifiers, i.e. Speaker ID and Member ID, are unique and not interchangeable.

See here for more information of this database.

  1. Voting Results Database

This database contains voting results of the Council, the House Committee and the Finance Committee and its subcomittees starting from the fifth term of LegCo. This database is accessible with the following function:

While voting_results() of the Hansard Database returns only the overall result of votes conducted in Council meetings, voting_record() returns much more detailed information of votes conducted in Council and other four committees and subcommittees meetings.

See here for more information of this database.


A generic function legco_api() is available for access to other databases and endpoints of the LegCo API that are not specified here.

All functions can also be accessed with the prefix legco_, e.g. legco_speakers() and speakers() return the same result.

API Limits

Rate Limit

LegCo API has a rate limit of approximately 1000 requests per IP per hour. Once the limit is reached, you will not be able to retrieve any data for awhile. If you run into the following error message while pretty sure you have entered the correct parameters, you have probably reached the limit:

# Generate error message by requesting non-existence id
speakers(0)
change_state()
Sys.sleep(2)

Node Count Limit

Another limit is the node count limit which is associated with the setting of the LegCo API server. The limit is at most 100 nodes per request, or approximately 20 filtering conditions in most cases. For instance, the following code contains 21 filtering conditions and thus reach the node count limit:

# A request with 21 filtering conditions
x <- c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 
       23, 25, 27, 29, 31, 33, 35, 37, 39, 41)
speakers(speaker_id = x)

If you run into the error message above, you need to break it down to multiple smaller requests, like one of the examples below.

Note that numeric filtering conditions that are sequential are not subjected to this limit:

# A request with sequential numeric filtering conditions
x <- c(1:50)
speakers(speaker_id = x)

The code above will execute just fine without reaching the node count limit even though it effectively contains 50 filtering conditions.

Common Connection Problems

It is common for the connection to the LegCo API to experience SSL error from time to time, especially during repeated requests. This can usually be resolved simply by retrying. This package automatically retries the request once when an SSL error occurs with the following messages:

message("Encountered Error in open.connection(con, \"rb\"): LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to app.legco.gov.hk:443")
message("Possible common connection problem resolvable by retrying.")
message("Retrying...")

If the same error happens again after retrying, the process will be terminated with an error message. In this case, the cause of the problem is likely to be a genuine SSL error.

Another common problem is that the LegCo API sometimes returns an empty json file when it is not supposed to. Again, this can usually be resolved by retrying. This package automatically retries the request once to make sure that an invalid search query or rate limit is not the cause of the problem. The following messages will be shown if this problem occurs:

message("The request did not return any data.")
message("Possible common connection problem resolvable by retrying.")
message("Retrying...")

If the API returns an empty result again, the process will be terminated with an error message. In this case, the cause is likely to be invalid search parameters or that you have exceeded the rate limit (see here).

Examples

The following code shows all the Council meetings held from May 1, 2018 to May 31, 2018.

hansard(from = "2018-05-01", to = "2018-05-31")
change_state()
Sys.sleep(2)

To fetch the items on the agenda of the meeting on May 23, 2018:

subjects(hansard_id = 2451)
subjects(hansard_id = 2451, to = "2020-03-27")
change_state()
Sys.sleep(2)

In this example, we will look into the motion Not forgetting the 4 June incident.

The section codes indicate the type of the business. For example, the section code "mbm" of the Not forgetting the 4 June incident means that it is a member's motion:

legco_section_type[legco_section_type$SectionCode == "mbm", ]

The RundownID indicates the location of the first paragraph of the matter concerned in the hansard file. In other words, paragraphs of RundownID between 828783 and 828826 (one RundownID prior to the next item on the agenda) are the records of the matter the Not forgetting the 4 June incident during the meeting. The full text of these paragraphs can be fetched by:

w <- rundown(rundown_id = 828783:828826)
# Order by RundownID, i.e. chronological order
w <- w[order(w$RundownID), ]
rownames(w) <- 1:nrow(w)
# Some two paragraphs of the hansard relevant to the discussion of the motion
w$Content[41:42]
w <- rundown(rundown_id = 828783:828826, to = "2020-03-27")
# Order by RundownID, i.e. chronological order
w <- w[order(w$RundownID), ]
rownames(w) <- 1:nrow(w)
# Some two paragraphs of the hansard relevant to the discussion of the motion
w$Content[41:42]
change_state()
Sys.sleep(2)

To check which members have spoken about the motion and to look up some of their names:

x <- unique(w$SpeakerID[!is.na(w$SpeakerID)])
speakers(speaker_id = x)

The code above produces 27 filtering conditions which exceed the node count limit. If you need have parameters longer than that, break them down into multiple requests:

y1 <- speakers(speaker_id = x[1:19])
y2 <- speakers(speaker_id = x[20:27])
y <- rbind(y1, y2)
# Show members who have spoken about this motion
head(y[y$Type == "MB", ])
change_state()
Sys.sleep(2)

To find out the voting results of the motion:

v <- voting_results(from = w$MeetingDate[1], to = w$MeetingDate[1], section_code = "mbm")
v[v$Subject == "Not forgetting the 4 June incident", ]
change_state()
Sys.sleep(2)

It is possible to fetch the detailed voting record of each member on this motion. To get a list of members who voted yes:

v <- voting_record(from = w$MeetingDate[1], to = w$MeetingDate[1])
v[v$Vote == "Yes", 32:33]
change_state()
Sys.sleep(2)

To check the attendance of this meeting:

# Find out MeetID of the Council meeting on May 23, 2018
meeting(from = w$MeetingDate[1], to = w$MeetingDate[1])
change_state()
Sys.sleep(2)
z <- attendance(meet_id = 126590)
# Show members who were present
z[z$PresentAbsent == "Present", 7:9] 
z <- attendance(meet_id = 126590, to = "2020-03-27")
# Show members who were present
z[z$PresentAbsent == "Present", 7:9] 
end_vignette()


Try the legco package in your browser

Any scripts or data that you put into this service are public.

legco documentation built on Oct. 16, 2021, 5:09 p.m.