Transformers

Full Stack Deep Learning

1. Transfer Learning in Computer Vision

Task: build a new classifier from 10,000 images

→ ResNet-50 would be expected to perform well

→ But a model that large can overfit on so little data

→ Solution: fine-tuning
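What is transfer learning?
- Start from a large model trained on a large dataset (= pretrained model)
- Take the pretrained model, add or replace layers, and train it on the new task
  
  → Trains faster and more accurately with less data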
Model zoo
- Collections of pretrained models
- Available for both TensorFlow and PyTorch

2. Embeddings and Language Models

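In NLP the raw input is words, but deep learning models operate on vectors
How do we turn words into vectors?
- One-hot encoding
- Problem) It works, but scales poorly with vocabulary size
  
  → Neural networks do not work well on very high-dimensional sparse vectors
  
  → Violates what we know about word similarity (every pair of distinct one-hot vectors is equally far apart)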
- dense vector
  
  → embedding matrix
  
  Problem) How do we find the values of the embedding matrix?
  - Learn as part of the task
  - Learn a Language Model
    
    → N-Grams
    
    → Skip-grams (Look on both sides of the target word)
  - How can we speed up training?
    
    → Binary instead of multi-class
    
    → Word2Vec
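
The ideas in this list can be sketched in plain Python. This is an illustrative sketch, not the Word2Vec implementation: the toy vocabulary, the toy matrix, and all function names are assumptions made for the example. It shows (1) that multiplying a one-hot vector by an embedding matrix is the same as looking up one row, and (2) how skip-gram (target, context) pairs can be turned into a binary "real pair or not" task by adding randomly drawn negative examples.

```python
import random

# Toy vocabulary and a toy 5 x 2 embedding matrix (both hypothetical).
vocab = ["the", "cat", "sat", "on", "mat"]
matrix = [[float(r), float(r) + 0.5] for r in range(len(vocab))]


def one_hot(index, size):
    """Sparse representation: all zeros except a single 1 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def one_hot_times_matrix(vec, emb):
    """Multiply a one-hot row vector by the embedding matrix."""
    dim = len(emb[0])
    return [sum(vec[r] * emb[r][c] for r in range(len(emb))) for c in range(dim)]


def embed(index, emb):
    """Dense representation: simply pick row `index` of the matrix."""
    return emb[index]


def skipgram_pairs(tokens, window=2):
    """Pair each target word with the words on both sides of it."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def with_negative_samples(pairs, words, k, rng):
    """Label real pairs 1 and k randomly drawn pairs 0 (binary task).

    Simplification: a random word may happen to be a true context word.
    """
    examples = []
    for target, context in pairs:
        examples.append((target, context, 1))
        for _ in range(k):
            examples.append((target, rng.choice(words), 0))
    return examples
```

A classifier trained on such labeled pairs only has to make a yes/no decision per pair instead of computing a softmax over the whole vocabulary, which is where the speed-up comes from.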

3. "NLP's ImageNet Moment": ELMo/ULMFiT

Around 2017-2018
ELMo
- SQuAD (question answering)
- SNLI (natural language inference)
- GLUE (multi-task language understanding benchmark)
ULMFiT
- Similar to ELMo

4. Transformers

Paper

→ Attention Is All You Need (2017) ~~read at the next paper study session~~
- Encoder-decoder with only attention and fully-connected layers
- The actual mechanism
  
  Focus just on the encoder
(Masked) Self-attention
Positional encoding
Layer normalization

4.1 Attention in detail

Basic self-attention

No learned weights
Order of the sequence does not affect the result of the computation: permuting the inputs only permutes the outputs in the same way
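
Basic self-attention as described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function names are made up for the example, and the attention weights are taken to be a softmax over raw dot products between input vectors, with no learned parameters. Each output is a weighted average of all inputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def basic_self_attention(xs):
    """Map a sequence of vectors to a sequence of weighted averages.

    For each x_i, the weights are softmax(x_i . x_j over all j), and the
    output y_i is the weighted sum of every x_j. Nothing is learned.
    """
    outputs = []
    for x_i in xs:
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in xs]
        weights = softmax(scores)
        dim = len(x_i)
        outputs.append(
            [sum(w * x_j[d] for w, x_j in zip(weights, xs)) for d in range(dim)]
        )
    return outputs


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = basic_self_attention(xs)
ys_reversed_input = basic_self_attention(xs[::-1])
```

Feeding the sequence in reverse order yields the same output vectors in reverse order (up to floating-point rounding), which is why a Transformer needs positional encoding to make use of word order.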