Full Stack Deep Learning

1. Sequence Problems

Q) Why not use feedforward networks?

  1. Variable-length inputs

→ add padding to every sequence so they all match the max length

  2. Memory scaling

→ this requires memory

→ which grows linearly in the number of timesteps

  3. Overkill

→ a matrix mapping the flattened sequence to the output is overkill

→ it ignores the sequential nature of the problem
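The points above can be sketched with a toy example: padding variable-length sequences to the max length, and counting how the weights of a dense layer on the flattened input grow with sequence length (the sequences and hidden size below are made-up values for illustration):

```python
import numpy as np

# Hypothetical toy sequences of varying length (assumed feature dim 1 for simplicity)
sequences = [[1, 2], [3, 4, 5, 6], [7]]

# (1) Variable-length inputs: zero-pad every sequence up to the max length
max_len = max(len(s) for s in sequences)
padded = np.array([s + [0] * (max_len - len(s)) for s in sequences])
print(padded.shape)  # (3, 4): batch of 3, each padded to length 4

# (2) Memory scaling: a dense layer mapping the flattened sequence to a
# hidden layer of size `hidden` needs T * hidden weights, so parameters
# grow linearly in the number of timesteps T
hidden = 128
for T in (10, 100, 1000):
    print(f"T={T}: {T * hidden} weights")
```

Note how one giant input-to-output matrix treats timestep 1 and timestep 1000 as unrelated features, which is the sense in which it ignores the structure of the problem.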

2. RNNs

3. Vanishing gradients and LSTMs

3.1 Vanishing gradients