Copyright licenses (or copyleft licenses!) are a core part of how the internet functions - they're what allows us to reuse, modify and inspect code, including the R programming language.
The folks at the Open Source Initiative have built an API containing the metadata about every copyright license they track, including keywords associated with it, its approval status, whether it's been superseded, and various pointers to places where the actual license content can be obtained. This R package acts as a connector to that API and provides a few other goodies.
The core of the package is retrieving metadata about all the licenses the OSI lists. This can be done in one of three ways. The broadest is license_list(); this gets all the OSI licenses (90 as of writing) and their associated metadata.
For a more precise filter you can look at license_by_keyword(), which just retrieves the licenses with certain keywords in their metadata - the full list of supported keywords can be seen on the help page for that function.
Finally, if you already have the ID of a license ("GPL-2.0", say) you can use license_by_id() to get just that license's metadata. It's fully vectorised, so plugging in multiple IDs is completely fine.
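As a rough sketch of the three retrieval routes described above (it needs the osi package installed and a working connection to the OSI API; the "copyleft" keyword and the "MIT" ID are assumptions chosen for illustration, so check the help pages for what is actually supported):

```r
library(osi)

# Everything the OSI tracks, with its metadata
all_licenses <- license_list()

# Only the licenses carrying a particular keyword.
# "copyleft" is an assumed keyword; see ?license_by_keyword for the real list.
copyleft_licenses <- license_by_keyword("copyleft")

# Specific licenses by ID; several IDs can be passed in one call
picked <- license_by_id(c("GPL-2.0", "MIT"))
```

Each call hits the API, so if you need several views of the data it is probably kinder to fetch once with license_list() and work from that.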
Once you've got your metadata, we move on to...
Extract things out of it! License metadata is a complex thing, and comes back from the API as a list. While some effort is made to tidy it up, it's always going to be somewhat jumbled, simply because it represents a lot of different concepts at once.
To partially ameliorate this and provide some convenience, the osi package contains various functions designed to extract specific components from a whole set of licenses' metadata, including:
the license ID, with extract_id();
the license name, with extract_name();
the ID of the license that superseded it (or NA if it hasn't been), with extract_superseded(), and;
the keywords associated with the license, with extract_keywords().
All these functions are fully documented; if you see more you'd like, head on down to the 'Feedback' section below.
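A minimal sketch of the extractors in use, assuming each one takes the list returned by the retrieval functions and gives back one entry per license, in the same order (that ordering is an assumption, not something spelled out here, so check the function documentation before relying on it):

```r
library(osi)

licenses <- license_list()

license_ids   <- extract_id(licenses)
license_names <- extract_name(licenses)
superseded_by <- extract_superseded(licenses)  # NA where nothing has superseded it
keywords      <- extract_keywords(licenses)

# IDs of licenses that have been superseded by something newer
# (assumes license_ids and superseded_by line up element for element)
license_ids[!is.na(superseded_by)]
```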
One thing the OSI API doesn't have is the actual text of each license, which is obviously fairly useful to have to hand (we're talking about legal questions, after all). Luckily what it does have, a lot of the time, is pointers to where the plaintext content of the license lives, and we can use the metadata to go and grab it.
The license_text() function lets you do just that; if you give it metadata from the OSI API, it goes through grabbing the plain text version of each license wherever possible, producing a data.frame of three columns - license ID, the location of the plain text version, and the actual text itself. In the event it can't retrieve a license's text (because no link to it was available), it just provides an NA instead. Again, this is fully vectorised and can be used over a whole set of metadata from a whole set of licenses.
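Something like the following should work as a sketch, assuming a live connection and that the columns come back in the order described above (ID, location, text). The column names aren't given here, so the example indexes by position rather than guessing them, and "MIT" is just an illustrative ID:

```r
library(osi)

# Grab metadata for a couple of licenses, then fetch their plain text
metadata <- license_by_id(c("GPL-2.0", "MIT"))
texts <- license_text(metadata)

# One row per license, three columns
str(texts)

# IDs of any licenses whose text could not be retrieved
# (column 3 is assumed to hold the text, column 1 the ID)
texts[is.na(texts[[3]]), 1]
```

Since every license means at least one more download, running this over the full output of license_list() may take a little while.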
If you have ideas for other things you'd like to see that would make playing with this data easier (or, heck, even other features for the API; I can pass them up the chain), the best approach is to either request them or add them!