Q1: I am following this tutorial on Recurrent Neural Networks, and I am wondering why you need to create the feed_dict in the following part of the code:
def run_epoch(session, model, eval_op=None, verbose=False):
    state = session.run(model.initial_state)
    fetches = {
        "cost": model.cost,
        "final_state": model.final_state,
    }
    if eval_op is not None:
        fetches["eval_op"] = eval_op
    for step in range(model.input.epoch_size):
        feed_dict = {}
        for i, (c, h) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h
        vals = session.run(fetches, feed_dict)
I tested it, and it seems that if you remove this part of the code, the code still runs:
def run_epoch(session, model, eval_op=None, verbose=False):
    fetches = {
        "cost": model.cost,
        "final_state": model.final_state,
    }
    if eval_op is not None:
        fetches["eval_op"] = eval_op
    for step in range(model.input.epoch_size):
        vals = session.run(fetches)
So my question is: why do you need to reset the initial state to zeros after feeding a new batch of data?
Q2: Also, from what I understand, using feed_dict is considered slow, which is why it is recommended to feed data through the tf.data API instead. Is using feed_dict also a performance issue in this case? If so, how could one avoid using feed_dict in this example?
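To make Q2 concrete, here is a minimal sketch of the pattern I have in mind (my own toy example, not code from the tutorial, and it does not reproduce the tutorial's exact batch layout):

import numpy as np
import tensorflow as tf

raw_data = np.arange(120, dtype=np.int32)       # toy data, the same 120 values as in the example below

# Build an input pipeline instead of pushing arrays through feed_dict:
dataset = tf.data.Dataset.from_tensor_slices(raw_data)
dataset = dataset.batch(10)                     # sequences of 10 time steps
dataset = dataset.batch(5)                      # group 5 such sequences into a batch
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()                # the model would read this tensor directly

with tf.Session() as sess:
    print(sess.run(next_batch))                 # shape (5, 10), no feed_dict needed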
UPD: Thank you very much, @jdehesa, for your detailed response. It helps a lot! Just before I close this question and accept your answer, could you clarify one point that you mentioned when answering Q1?
I now see the purpose of the feed_dict. However, I am not sure that this is actually what the tutorial implements. From what you say:
> At the beginning of each epoch, the code first takes the default "zero state" and then goes on to a loop where the current state is given as initial, the model is run and the output state is set as new current state for the next iteration.
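If I understand that description correctly, I would expect the loop to look roughly like this (my own sketch of what I think you describe, not code copied from the tutorial):

state = session.run(model.initial_state)          # default "zero state", once per epoch
for step in range(model.input.epoch_size):
    feed_dict = {}
    for i, (c, h) in enumerate(model.initial_state):
        feed_dict[c] = state[i].c                 # current state is fed as the initial state
        feed_dict[h] = state[i].h
    vals = session.run(fetches, feed_dict)
    state = vals["final_state"]                   # output state becomes the current state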
I just looked again at the source code of the tutorial, and I do not see where the output state is set as the new current state for the next iteration (the last line of my sketch above). Is it done somewhere implicitly, or am I missing something?
I may also be missing something on the theoretical side. Just to make sure that I understand it correctly, here is a quick example. Assume the input data is an array that stores the 120 integer values from 0 to 119. We set the batch size to 5, so each batch row holds 120 / 5 = 24 data points, and the number of time steps in the unrolled RNN is 10. In this case, you only use the data points at time positions 0 to 20 of each row, and you process the data in two steps (model.input.epoch_size = 2). When you iterate over model.input.epoch_size:
state = session.run(model.initial_state)
# ...
for step in range(model.input.epoch_size):
    feed_dict = {}
    for i, (c, h) in enumerate(model.initial_state):
        feed_dict[c] = state[i].c
        feed_dict[h] = state[i].h
    vals = session.run(fetches, feed_dict)
you feed a batch of data like this:
> Iteration (step) 1:
x:
[[ 0 1 2 3 4 5 6 7 8 9]
[ 24 25 26 27 28 29 30 31 32 33]
[ 48 49 50 51 52 53 54 55 56 57]
[ 72 73 74 75 76 77 78 79 80 81]
[ 96 97 98 99 100 101 102 103 104 105]]
y:
[[ 1 2 3 4 5 6 7 8 9 10]
[ 25 26 27 28 29 30 31 32 33 34]
[ 49 50 51 52 53 54 55 56 57 58]
[ 73 74 75 76 77 78 79 80 81 82]
[ 97 98 99 100 101 102 103 104 105 106]]
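The x and y above match what I get from this small NumPy reconstruction of the batching (again my own sketch, not code from the tutorial's reader):

import numpy as np

raw_data = np.arange(120)                          # 0 .. 119
batch_size, num_steps = 5, 10

batch_len = len(raw_data) // batch_size            # 24 data points per batch row
data = raw_data[:batch_size * batch_len].reshape(batch_size, batch_len)
epoch_size = (batch_len - 1) // num_steps          # 2 iterations per epoch

for step in range(epoch_size):
    x = data[:, step * num_steps:(step + 1) * num_steps]
    y = data[:, step * num_steps + 1:(step + 1) * num_steps + 1]
    print("step", step)                            # step 0 prints exactly the x and y above
    print("x:\n", x)
    print("y:\n", y)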
At each iteration, you construct a new feed_dict with the initial state of the recurrent units set to zero. So you assume at each step that you start processing the sequence from scratch. Is that correct?