This notebook contains the code samples found in Chapter 6, Section 2 of Deep Learning with R. Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.


A first recurrent layer in Keras

The process you just naively implemented in R corresponds to an actual Keras layer – layer_simple_rnn().

layer_simple_rnn(units = 32)

There is one minor difference: layer_simple_rnn() processes batches of sequences, like all other Keras layers, not a single sequence as in the R example. This means it takes inputs of shape (batch_size, timesteps, input_features), rather than (timesteps, input_features).
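
To make that shape contract concrete, here is a minimal sketch (not part of the original listing; the batch size, sequence length, and feature count are arbitrary) that runs a random batch through a standalone layer_simple_rnn() and checks the output shape:

library(keras)
# A hypothetical batch: 16 sequences, 100 timesteps, 64 features per timestep
inputs <- array(rnorm(16 * 100 * 64), dim = c(16, 100, 64))
model <- keras_model_sequential() %>%
  layer_simple_rnn(units = 32, input_shape = c(100, 64))
outputs <- model %>% predict(inputs)
dim(outputs)  # 16 32 -- one 32-dimensional output vector per input sequence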

Like all recurrent layers in Keras, layer_simple_rnn() can be run in two different modes: it can return either the full sequences of successive outputs for each timestep (a 3D tensor of shape (batch_size, timesteps, output_features)) or only the last output for each input sequence (a 2D tensor of shape (batch_size, output_features)). These two modes are controlled by the return_sequences constructor argument. Let’s look at an example that uses layer_simple_rnn() and returns the last state:

library(keras)
model <- keras_model_sequential() %>% 
  layer_embedding(input_dim = 10000, output_dim = 32) %>% 
  layer_simple_rnn(units = 32)
summary(model)
_______________________________________________________________________________________________________________
Layer (type)                                     Output Shape                                 Param #          
===============================================================================================================
embedding_6 (Embedding)                          (None, None, 32)                             320000           
_______________________________________________________________________________________________________________
simple_rnn_8 (SimpleRNN)                         (None, 32)                                   2080             
===============================================================================================================
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_______________________________________________________________________________________________________________
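
The parameter counts above can be verified by hand: the embedding layer stores one 32-dimensional vector per token, and the SimpleRNN layer has an input kernel, a recurrent kernel, and a bias. A quick back-of-the-envelope check (not part of the original listing):

# Embedding: one output_dim-sized vector for each of the input_dim tokens
10000 * 32               # 320000
# SimpleRNN: input kernel (32 x 32) + recurrent kernel (32 x 32) + bias (32)
32 * 32 + 32 * 32 + 32   # 2080
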
model <- keras_model_sequential() %>% 
  layer_embedding(input_dim = 10000, output_dim = 32) %>% 
  layer_simple_rnn(units = 32, return_sequences = TRUE)
summary(model)
_______________________________________________________________________________________________________________
Layer (type)                                     Output Shape                                 Param #          
===============================================================================================================
embedding_7 (Embedding)                          (None, None, 32)                             320000           
_______________________________________________________________________________________________________________
simple_rnn_9 (SimpleRNN)                         (None, None, 32)                             2080             
===============================================================================================================
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_______________________________________________________________________________________________________________

It is sometimes useful to stack several recurrent layers one after the other in order to increase the representational power of a network. In such a setup, you have to get all intermediate layers to return full sequences:

model <- keras_model_sequential() %>% 
  layer_embedding(input_dim = 10000, output_dim = 32) %>% 
  layer_simple_rnn(units = 32, return_sequences = TRUE) %>% 
  layer_simple_rnn(units = 32, return_sequences = TRUE) %>%
  layer_simple_rnn(units = 32, return_sequences = TRUE) %>%
  layer_simple_rnn(units = 32)  # This last layer only returns the last outputs.
summary(model)
_______________________________________________________________________________________________________________
Layer (type)                                     Output Shape                                 Param #          
===============================================================================================================
embedding_8 (Embedding)                          (None, None, 32)                             320000           
_______________________________________________________________________________________________________________
simple_rnn_10 (SimpleRNN)                        (None, None, 32)                             2080             
_______________________________________________________________________________________________________________
simple_rnn_11 (SimpleRNN)                        (None, None, 32)                             2080             
_______________________________________________________________________________________________________________
simple_rnn_12 (SimpleRNN)                        (None, None, 32)                             2080             
_______________________________________________________________________________________________________________
simple_rnn_13 (SimpleRNN)                        (None, 32)                                   2080             
===============================================================================================================
Total params: 328,320
Trainable params: 328,320
Non-trainable params: 0
_______________________________________________________________________________________________________________

Now let’s try to use such a model on the IMDB movie review classification problem. First, let’s preprocess the data:

library(keras)
max_features <- 10000  # Number of words to consider as features
maxlen <- 500  # Cuts off texts after this many words (among the max_features most common words)
batch_size <- 32
cat("Loading data...\n")
Loading data...
imdb <- dataset_imdb(num_words = max_features)
c(c(input_train, y_train), c(input_test, y_test)) %<-% imdb 
cat(length(input_train), "train sequences\n")
25000 train sequences
cat(length(input_test), "test sequences")
25000 test sequences
cat("Pad sequences (samples x time)\n")
Pad sequences (samples x time)
input_train <- pad_sequences(input_train, maxlen = maxlen)
input_test <- pad_sequences(input_test, maxlen = maxlen)
cat("input_train shape:", dim(input_train), "\n")
input_train shape: 25000 500 
cat("input_test shape:", dim(input_test), "\n")
input_test shape: 25000 500 
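
If you want to check what pad_sequences() produced, you can inspect a single row of the resulting matrix; by default, sequences shorter than maxlen are padded with zeros at the start and longer ones are truncated. A quick sanity check (not part of the original listing):

# Every row now has exactly maxlen entries
length(input_train[1, ])   # 500
# Leading entries are zeros whenever the original review had fewer than 500 words
input_train[1, 1:10]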

Let’s train a simple recurrent network using layer_embedding() and layer_simple_rnn().

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = max_features, output_dim = 32) %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
history <- model %>% fit(
  input_train, y_train,
  epochs = 10,
  batch_size = 128,
  validation_split = 0.2
)

Let’s display the training and validation loss and accuracy:

plot(history)

As a reminder, in chapter 3, the first naive approach to this dataset got you to a test accuracy of 88%. Unfortunately, this small recurrent network doesn’t perform well compared to this baseline (only 84% validation accuracy). Part of the problem is that your inputs only consider the first 500 words, rather than full sequences – hence the RNN has access to less information than the earlier baseline model. The remainder of the problem is that layer_simple_rnn() isn’t good at processing long sequences, such as text. Other types of recurrent layers perform much better. Let’s look at some more advanced layers.

A concrete LSTM example in Keras

Now let’s switch to more practical concerns: we will set up a model using an LSTM layer and train it on the IMDB data. Here’s the network, similar to the one with layer_simple_rnn() that we just presented. We only specify the output dimensionality of the LSTM layer and leave every other argument (there are lots) at the Keras defaults. Keras has good defaults, and things will almost always “just work” without you having to spend time tuning parameters by hand.

model <- keras_model_sequential() %>% 
  layer_embedding(input_dim = max_features, output_dim = 32) %>% 
  layer_lstm(units = 32) %>% 
  layer_dense(units = 1, activation = "sigmoid")
model %>% compile(
  optimizer = "rmsprop", 
  loss = "binary_crossentropy", 
  metrics = c("acc")
)
history <- model %>% fit(
  input_train, y_train,
  epochs = 10,
  batch_size = 128,
  validation_split = 0.2
)
plot(history)
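
To compare against the 88% test accuracy of the chapter 3 baseline, you can also evaluate the trained LSTM on the held-out test set. A minimal sketch (the exact numbers will depend on your run):

# Loss and accuracy on the 25,000 test reviews
model %>% evaluate(input_test, y_test)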
