two_layer_neural_network

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Source: https://github.com/jcjohnson/pytorch-examples#pytorch-nn

In this example we use the torch nn package to implement our two-layer network:

In R

Select device

library(rTorch)

device = torch$device('cpu')

# device = torch$device('cuda')  # Uncomment this line to run on GPU
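
As an aside (not part of the original example), the device can also be chosen at run time: torch$cuda$is_available() is the standard PyTorch check, reached here through rTorch. A minimal sketch, assuming a CUDA-enabled build of PyTorch is installed:

# Sketch: pick the GPU when CUDA is available, otherwise stay on the CPU
device <- if (torch$cuda$is_available()) torch$device('cuda') else torch$device('cpu')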
In the next chunk, N is the batch size, D_in the input dimension, H the hidden dimension, and D_out the output dimension.

Create datasets

torch$manual_seed(0)

N <- 64L; D_in <- 1000L; H <- 100L; D_out <- 10L

# Create random Tensors to hold inputs and outputs
x = torch$randn(N, D_in, device=device)
y = torch$randn(N, D_out, device=device)

Define the model

We use the nn package to define our model as a sequence of layers. torch$nn$Sequential is a Module that contains other Modules and applies them in sequence to produce its output. Each Linear Module computes its output from the input with a linear function, and holds internal Tensors for its weight and bias. After constructing the model we use the $to() method to move it to the desired device.

model = torch$nn$Sequential(
          torch$nn$Linear(D_in, H),
          torch$nn$ReLU(),
          torch$nn$Linear(H, D_out))$to(device)
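
As noted above, each Linear Module stores its weight and bias as internal Tensors. A small sketch (not part of the original vignette) walks over model$parameters() with iterate(), the same helper the training loop below uses, and prints each parameter's shape:

# Sketch: inspect the learnable parameters of the model
# (expect a 100 x 1000 weight, a 100 bias, a 10 x 100 weight and a 10 bias)
for (param in iterate(model$parameters())) {
  print(param$shape)
}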

Loss function

The nn package also contains definitions of popular loss functions; in this case we use mean squared error (MSE) as our loss function. Setting reduction = 'sum' means that we compute the sum of squared errors rather than the mean; this keeps the result consistent with the earlier examples in this series, where the loss is computed manually. In practice it is more common to use the mean squared error by setting reduction = 'mean', the default (older PyTorch releases called this 'elementwise_mean').

loss_fn = torch$nn$MSELoss(reduction = 'sum')
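
To make the effect of reduction = 'sum' concrete, here is a minimal sketch (not from the original vignette) comparing the summed loss against a manual sum of squared errors; dividing either value by the number of elements in y would give what reduction = 'mean' returns:

# Sketch: the summed MSE loss equals a manual sum of squared errors
y_hat <- model(x)                    # forward pass with the untrained model
d     <- y_hat - y                   # difference, via the R generic `-`
sse   <- torch$sum(torch$mul(d, d))  # manual sum of squared errors
cat("nn loss:", loss_fn(y_hat, y)$item(), " manual:", sse$item(), "\n")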

Training loop

learning_rate = 1e-4

for (t in 1:500) {
  # Forward pass: compute predicted y by passing x to the model. Module objects
  # override the __call__ operator so you can call them like functions. When
  # doing so you pass a Tensor of input data to the Module and it produces
  # a Tensor of output data.
  y_pred = model(x)

  # Compute and print loss. We pass Tensors containing the predicted and true
  # values of y, and the loss function returns a Tensor containing the loss.
  loss = loss_fn(y_pred, y)

  cat(t, "\t")
  cat(loss$item(), "\n")

  # Zero the gradients before running the backward pass.
  model$zero_grad()

  # Backward pass: compute gradient of the loss with respect to all the learnable
  # parameters of the model. Internally, the parameters of each Module are stored
  # in Tensors with requires_grad=True, so this call will compute gradients for
  # all learnable parameters in the model.
  loss$backward()

  # Update the weights using gradient descent. Each parameter is a Tensor, so
  # we can access its data and gradients like we did before.
  with(torch$no_grad(), {
      for (param in iterate(model$parameters())) {
        # in Python this code is much simpler. In R we have to do some conversions

        # param$data <- torch$sub(param$data,
        #                         torch$mul(param$grad$float(),
        #                           torch$scalar_tensor(learning_rate)))

        param$data <- param$data - param$grad * learning_rate
      }
   })
}  

These two expressions are equivalent. The first is the long form, translating the PyTorch calls (torch$sub, torch$mul, torch$scalar_tensor) directly into R; the second relies on the generics that rTorch defines for tensors, so the R operators for subtraction and multiplication, and the scalar conversion, dispatch to the same torch operations.

param$data <- torch$sub(param$data,
                        torch$mul(param$grad$float(),
                                  torch$scalar_tensor(learning_rate)))

param$data <- param$data - param$grad * learning_rate

