knitr::opts_chunk$set(message = FALSE) library(httptest) start_vignette("introduction")
legco
is an R package to fetch data from the Hong Kong Legislative Council (LegCo) API. It provides
easy access to LegCo’s open data in an integrated manner without having to dig deep into the structures
and parameters of the API. Each function corresponds to a data endpoint of the API. Data is retrieved
in json format and presented in a dataframe.
install.packages("devtools") devtools::install_github("elgarteo/legco")
And to load it before every use:
library(legco)
The LegCo API consists of different databases with separate structures and identifiers. The functions in this package correspond to the various data endpoints of these databases. The databases supported by this package and the associating functions are as follow.
This database contains key information of all the bills that have been presented in LegCo since 1906. This database is accessible with the following function:
all_bills()
fetches the information of bills that have gone through LegCo.Compared to bills()
of the Hansard Database, all_bills()
provides much more comprehensive information
of bills, including all key dates and the proposer.
See here for more information of this database.
This database contains information on matters discussed in Council meeting since the fifth term of LegCo. Hinting by its name, the database is built upon the PDF hansard files. Relying on the bookmarks and section codes of the hansard files, the data endpoints retrieve data from the files directly. This database is accessible with the following functions/data:
hansard()
fetches the information of the hansard files.legco_section_type
is a data frame that explains that meaning of the section codes used in the
hansard files.subjects()
fetches the subjects in the hansard files, which are essentially items in the meeting agenda.speakers()
fetches the speakers in meetings, including members, government officials and secretariat
staff.rundown()
fetches the full text of the hansard files.questions()
fetches the questions put on the government by the members during Council meetings.bills()
fetches bills that have been debated in Council meetings. motions()
fetches motions that have been debated in Council meetings.petitions()
fetches petitions initiated by members in Council meetings.addresses()
fetches addresses made by members or government officials while presenting papers
to the Council.statements()
fetches statements made by government officials in Council meetings.voting_results()
fetches the results of votes conducted in Council meetings.summoning_bells()
fetches instances in which the summoning bell have been rung in Council meetings.When using in combination, these functions are essentially hansard files crawler.
See here for more information of this database.
This database contains the attendance record of Council and committee open meetings since the sixth term.
This database is accessible with the following function:
attendance()
fetches the attendance of members in Council and committee meetings.See here for more information of this database.
This database contains the member list and the meeting schedule of the Council and every committee since the fifth term of LegCo. This database is accessible with the following functions:
term()
fetches the terms in LegCo.session()
fetches the sessions in LegCo.committee()
fetches a list of LegCo committees. membership()
fetches the members of each committee in the form of Member ID.member()
fetches the members in LegCo. member_term()
fetches the terms LegCo members have served in.meeting()
fetches the meeting schedule of LegCo committees in the form of Meet ID.meeting_committee()
fetches which committee holds a meeting.While the purpose of member()
is similar to speakers()
of the Hansard Database, member()
returns only
LegCo members. Their identifiers, i.e. Speaker ID and Member ID, are unique and not interchangeable.
See here for more information of this database.
This database contains voting results of the Council, the House Committee and the Finance Committee and its subcomittees starting from the fifth term of LegCo. This database is accessible with the following function:
voting_record()
fetches the information of votes conducted in the meetings, including the voting
record of each member.While voting_results()
of the Hansard Database returns only the overall result of votes conducted in
Council meetings, voting_record()
returns much more detailed information of votes conducted in Council
and other four committees and subcommittees meetings.
See here for more information of this database.
A generic function legco_api()
is available for access to other databases and endpoints of the LegCo API that
are not specified here.
All functions can also be accessed with the prefix legco_
, e.g. legco_speakers()
and speakers()
return the same result.
LegCo API has a rate limit of approximately 1000 requests per IP per hour. Once the limit is reached, you will not be able to retrieve any data for awhile. If you run into the following error message while pretty sure you have entered the correct parameters, you have probably reached the limit:
# Generate error message by requesting non-existence id speakers(0)
change_state() Sys.sleep(2)
Another limit is the node count limit which is associated with the setting of the LegCo API server. The limit is at most 100 nodes per request, or approximately 20 filtering conditions in most cases. For instance, the following code contains 21 filtering conditions and thus reach the node count limit:
# A request with 21 filtering conditions x <- c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41) speakers(speaker_id = x)
If you run into the error message above, you need to break it down to multiple smaller requests, like one of the examples below.
Note that numeric filtering conditions that are sequential are not subjected to this limit:
# A request with sequential numeric filtering conditions x <- c(1:50) speakers(speaker_id = x)
The code above will execute just fine without reaching the node count limit even though it effectively contains 50 filtering conditions.
It is common for the connection to the LegCo API to experience SSL error from time to time, especially during repeated requests. This can usually be resolved simply by retrying. This package automatically retries the request once when an SSL error occurs with the following messages:
message("Encountered Error in open.connection(con, \"rb\"): LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to app.legco.gov.hk:443") message("Possible common connection problem resolvable by retrying.") message("Retrying...")
If the same error happens again after retrying, the process will be terminated with an error message. In this case, the cause of the problem is likely to be a genuine SSL error.
Another common problem is that the LegCo API sometimes returns an empty json file when it is not supposed to. Again, this can usually be resolved by retrying. This package automatically retries the request once to make sure that an invalid search query or rate limit is not the cause of the problem. The following messages will be shown if this problem occurs:
message("The request did not return any data.") message("Possible common connection problem resolvable by retrying.") message("Retrying...")
If the API returns an empty result again, the process will be terminated with an error message. In this case, the cause is likely to be invalid search parameters or that you have exceeded the rate limit (see here).
The following code shows all the Council meetings held from May 1, 2018 to May 31, 2018.
hansard(from = "2018-05-01", to = "2018-05-31")
change_state() Sys.sleep(2)
To fetch the items on the agenda of the meeting on May 23, 2018:
subjects(hansard_id = 2451)
subjects(hansard_id = 2451, to = "2020-03-27")
change_state() Sys.sleep(2)
In this example, we will look into the motion Not forgetting the 4 June incident.
The section codes indicate the type of the business. For example, the section code "mbm" of the Not forgetting the 4 June incident means that it is a member's motion:
legco_section_type[legco_section_type$SectionCode == "mbm", ]
The RundownID indicates the location of the first paragraph of the matter concerned in the hansard file. In other words, paragraphs of RundownID between 828783 and 828826 (one RundownID prior to the next item on the agenda) are the records of the matter the Not forgetting the 4 June incident during the meeting. The full text of these paragraphs can be fetched by:
w <- rundown(rundown_id = 828783:828826) # Order by RundownID, i.e. chronological order w <- w[order(w$RundownID), ] rownames(w) <- 1:nrow(w) # Some two paragraphs of the hansard relevant to the discussion of the motion w$Content[41:42]
w <- rundown(rundown_id = 828783:828826, to = "2020-03-27") # Order by RundownID, i.e. chronological order w <- w[order(w$RundownID), ] rownames(w) <- 1:nrow(w) # Some two paragraphs of the hansard relevant to the discussion of the motion w$Content[41:42]
change_state() Sys.sleep(2)
To check which members have spoken about the motion and to look up some of their names:
x <- unique(w$SpeakerID[!is.na(w$SpeakerID)]) speakers(speaker_id = x)
The code above produces 27 filtering conditions which exceed the node count limit. If you need have parameters longer than that, break them down into multiple requests:
y1 <- speakers(speaker_id = x[1:19]) y2 <- speakers(speaker_id = x[20:27]) y <- rbind(y1, y2) # Show members who have spoken about this motion head(y[y$Type == "MB", ])
change_state() Sys.sleep(2)
To find out the voting results of the motion:
v <- voting_results(from = w$MeetingDate[1], to = w$MeetingDate[1], section_code = "mbm") v[v$Subject == "Not forgetting the 4 June incident", ]
change_state() Sys.sleep(2)
It is possible to fetch the detailed voting record of each member on this motion. To get a list of members who voted yes:
v <- voting_record(from = w$MeetingDate[1], to = w$MeetingDate[1]) v[v$Vote == "Yes", 32:33]
change_state() Sys.sleep(2)
To check the attendance of this meeting:
# Find out MeetID of the Council meeting on May 23, 2018 meeting(from = w$MeetingDate[1], to = w$MeetingDate[1])
change_state() Sys.sleep(2)
z <- attendance(meet_id = 126590) # Show members who were present z[z$PresentAbsent == "Present", 7:9]
z <- attendance(meet_id = 126590, to = "2020-03-27") # Show members who were present z[z$PresentAbsent == "Present", 7:9]
end_vignette()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.