Publishing to Scribd from R

This vignette walks through the basics of publishing documents from R to Scribd using rscribd. Scribd describes itself as "the world’s largest collection of e-books and written works", consisting of premium, publisher-supplied content as well as "millions" of user-uploaded documents.

About rscribd

rscribd enables useRs to upload documents directly to Scribd, either making them publicly viewable or using Scribd as a personal library of private digital content. Scribd allows users to organize content into public or private "collections" and to tag documents to make them easily searchable by others. It supports all standard content licenses (full copyright, public domain, and all Creative Commons licenses) and provides features for publishing paid and DRM-protected content.

The tutorial below demonstrates how to upload a document as part of a knitr reproducible document workflow, but possible use cases are numerous. Scribd and rscribd allow users to upload documents stored locally as well as on a remote server, and the service can import files in a wide variety of formats. (Note, however, that image formats are not allowed; to store images online, consider using the imguR package.)

Installation

You can install rscribd from CRAN:

if(!require("rscribd")){
    install.packages("rscribd")
    library("rscribd")
}

A development version of rscribd is also available on GitHub, which can be installed using:

library("devtools")
install_github("leeper/rscribd")

Setup

To use rscribd, you need a Scribd API key. This can be passed as the api_key argument to all rscribd functions. It is also a good idea, from a security perspective, to pass an api_secret_key argument, which means that API requests are additionally signed with an MD5 hash to prevent unauthorized use of an API key. To sign requests, your API account must be configured in the Scribd API options to have "Require API Signature" set to "Require signature". We can set up both of these arguments globally using options:

load("apikeys.Rdata") # load API keys from `save`d local copy
options('scribd_api_key' = apikey)
options('scribd_secret_key' = secretkey)
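To give a feel for what request signing involves, here is a minimal sketch in base R. It assumes a common scheme in which the signature is the MD5 hash of the secret key followed by the alphabetically sorted parameter names and values. That scheme, the helper names md5_string and sign_request, and the example parameters are all assumptions made for illustration, so check the Scribd API documentation for the actual rules. In normal use rscribd takes care of this when api_secret_key is supplied.

```r
# Hypothetical helper (not part of rscribd): MD5 of a string using base R.
# tools::md5sum() works on files, so the string is written to a temp file.
md5_string <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(x), tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical helper: sign a named list of request parameters.
# ASSUMPTION: signature = MD5(secret + name1 + value1 + name2 + value2 + ...)
# with parameters sorted alphabetically by name.
sign_request <- function(params, secret) {
  params <- params[order(names(params))]
  payload <- paste0(secret,
                    paste0(names(params), unlist(params), collapse = ""))
  md5_string(payload)
}

# Example with made-up values
params <- list(method = "docs.getList", api_key = "demo-key")
sign_request(params, secret = "demo-secret")
```

Because the parameters are sorted before hashing, the signature does not depend on the order in which they were supplied.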

If you are trying to build a public-facing app on top of rscribd, you may also want to leverage a user-specific session key returned by scribd_login. This allows user-specific operations without requiring users to register an API key. Once a session key is obtained, it can be passed to any rscribd function using the session_key argument (or set globally through the scribd_session_key option using options). The session key is reasonably long-lived. If no session key is specified, all operations are performed (by default) on the account associated with the API key.
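The fallback order described above (explicit argument first, then the global option, then no session key at all) can be sketched in a few lines of base R. The helper resolve_session_key is hypothetical and is not part of rscribd; it only illustrates the pattern of using getOption as a default argument.

```r
# Hypothetical helper (not part of rscribd) illustrating the fallback order:
# explicit argument > global option > NULL (use the API key's own account).
resolve_session_key <- function(session_key = getOption("scribd_session_key")) {
  if (is.null(session_key) || !nzchar(session_key)) {
    return(NULL)  # nothing set: operations run on the API key's account
  }
  session_key
}

old <- options(scribd_session_key = NULL)  # remember the previous setting
resolve_session_key()                      # no key set anywhere
options(scribd_session_key = "session-from-login")
resolve_session_key()                      # picked up from the global option
resolve_session_key("explicit-key")        # an explicit argument wins
options(old)                               # restore the previous setting
```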

The Scribd API also supports a my_user_id argument, which associates a particular API request with a "phantom" user account within an application. In other words, if you want to use a single API account for multiple users (perhaps because you are integrating rscribd into another user-facing application, such as a Shiny app) you can affiliate particular API requests (and thus particular documents or collections) with multiple distinct users without creating an API key for each user. You can read more about this type of authentication in the Scribd API documentation.

Creating a document

To demonstrate a basic document upload, we will create a simple knitr-based literate programming document, which we will then send to Scribd. We'll use the knitr-minimal.Rnw example file from knitr. We can start by knitting the document to PDF:

library("knitr")
invisible(file.copy(system.file("examples/knitr-minimal.Rnw", package = "knitr"),
                    "knitr-minimal.Rnw"))
system('Rscript -e "knitr::knit2pdf(\'knitr-minimal.Rnw\', quiet=TRUE)"')
doc_output <- "knitr-minimal.pdf"
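Before uploading, it can be worth checking that the output file actually exists and is not an image, since image formats are not accepted. The following is a minimal sketch in base R; the helper can_upload and the list of image extensions are assumptions for illustration, not part of rscribd or an official list of rejected formats.

```r
# ASSUMPTION: an illustrative, incomplete list of image file extensions
image_exts <- c("png", "jpg", "jpeg", "gif", "bmp", "tif", "tiff")

# Hypothetical pre-upload check: the file must exist and must not be an image
can_upload <- function(path) {
  file.exists(path) && !(tolower(tools::file_ext(path)) %in% image_exts)
}

# Try it on two empty temporary files
pdf_path <- tempfile(fileext = ".pdf")
png_path <- tempfile(fileext = ".png")
invisible(file.create(pdf_path, png_path))

can_upload(pdf_path)  # a PDF that exists passes the check
can_upload(png_path)  # an image is rejected
```

In the workflow above, one could call can_upload(doc_output) and stop with an informative error if it returns FALSE.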

Then, to upload the document to Scribd, we only need one line of code:

mydoc <- doc_upload(doc_output)

By default, this creates a public-facing Scribd document. It can also be made private using the access = "private" argument to doc_upload. Note how the api_key and api_secret_key are also included by default from the global options we specified above. We can take a look at the response object created by doc_upload:

print(mydoc)

This provides basic details about the document, most importantly the doc_id that we can use to refer to our document in other function calls. For example, we can use doc_settings to retrieve metadata about a document:

doc_settings(mydoc$doc_id)

If we want to change any of these settings, we could use doc_change:

doc_change(mydoc$doc_id, title = "My first rscribd upload", description = "A knitr/rscribd example")

One could also upload a revision of a document using doc_upload with the doc argument specified, like this:

doc_upload(doc_output, doc = mydoc)

This maintains the Scribd document_id and metadata but replaces the physical document. Using doc_change to update the edition metadata setting for the document additionally enables versioning of the document.

Collections

If you upload multiple documents and wish to organize them, you can use Scribd collections to put like documents together. (You can also tag and categorize documents using doc_change, perhaps to make them easier for others to find.) A collection is simply a named container for one or more documents.

To create a collection, use coll_create:

mycoll <- coll_create("rscribd documents", "Collection of rscribd documents")

You can use coll_update to modify the title, description, or public access status of a collection after it is created. Use coll_add to add a document to a collection and coll_docs to list the documents stored in a collection:

coll_add(mycoll, mydoc)
coll_docs(mycoll)

Removing a document from a collection works similarly to coll_add:

coll_remove(mycoll, mydoc)

To delete a collection entirely, you can simply call: coll_delete(mycoll).
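When several files need to go into the same collection, the upload and add steps can be combined in a loop. The sketch below is hypothetical: upload_to_collection is not an rscribd function, and the upload and add steps are passed in as arguments so that the example runs offline with stand-in functions. In real use one would presumably pass doc_upload and coll_add instead.

```r
# Hypothetical helper: upload each file, then add it to a collection.
# upload_fun and add_fun are injected so the sketch runs without an API key.
upload_to_collection <- function(files, collection, upload_fun, add_fun) {
  docs <- lapply(files, upload_fun)
  for (d in docs) add_fun(collection, d)
  invisible(docs)
}

# Offline stand-ins that mimic the shape of the real calls (assumptions)
fake_upload <- function(path) list(doc_id = basename(path))
added <- character(0)
fake_add <- function(collection, doc) {
  added <<- c(added, doc$doc_id)  # record which documents were added
}

docs <- upload_to_collection(c("report-a.pdf", "report-b.pdf"),
                             collection = "demo collection",
                             upload_fun = fake_upload,
                             add_fun = fake_add)
added
```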

Embedding Scribd documents

Once a document has been uploaded to Scribd, it can be embedded in an HTML page:

cat(doc_embed(mydoc)) # print the embed code for the document

# clean up the files created during this tutorial
unlink("knitr-minimal*")
unlink("./figure", recursive=TRUE)
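To use the embed code outside of a knitr document, it can be written into a standalone HTML file. This is a minimal sketch in base R: the embed_html string is a made-up placeholder standing in for whatever doc_embed returns, and the output file name is arbitrary.

```r
# PLACEHOLDER: stands in for the string returned by doc_embed(mydoc)
embed_html <- "<iframe src='https://example.com/embed/placeholder'></iframe>"

# Wrap the embed code in a minimal HTML page
page <- c("<!DOCTYPE html>",
          "<html>",
          "<head><title>Embedded document</title></head>",
          "<body>",
          embed_html,
          "</body>",
          "</html>")

out_file <- file.path(tempdir(), "embedded.html")
writeLines(page, out_file)
out_file  # open this file in a browser to view the page
```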


cloudyr/rscribd documentation built on May 13, 2019, 8:22 p.m.