[Question] What are the most important papers/posts/resources to read to understand more of GPT-3?

https://www.lesswrong.com/posts/m23A54nFL5yeDRukL/what-are-the-most-important-papers-post-resources-to-read-to

I’m way more used to thinking about weird maths, distributed algorithms, or abstract philosophical problems than about concrete machine learning architectures. But based on everything I see about GPT-3, it seems like a good idea to learn more about it, if only to participate in the discussion without spouting nonsense. So I’m asking for what you think are the must-reads on GPT-3 specifically, along with any prerequisites for understanding them.

Comment

https://www.lesswrong.com/posts/m23A54nFL5yeDRukL/what-are-the-most-important-papers-post-resources-to-read-to?commentId=cEyZhhg6vFjmgKEXE

nostalgebraist’s blog is a must-read regarding GPT-x, including GPT-3. Perhaps start here ("the transformer… ‘explained’?"), which helps contextualize GPT-x within the history of machine learning.

(Though I should note that nostalgebraist holds a contrarian "bearish" position on GPT-3 in particular; for the "bullish" case, read Gwern instead.)

Comment

https://www.lesswrong.com/posts/m23A54nFL5yeDRukL/what-are-the-most-important-papers-post-resources-to-read-to?commentId=nG6JopEihgWwLQogP

Thanks for the answer! I knew about the "transformer explained" post, but I was not aware of its author’s position on GPT-3.

https://www.lesswrong.com/posts/m23A54nFL5yeDRukL/what-are-the-most-important-papers-post-resources-to-read-to?commentId=LuoeiZkmzwt2Nf72b

Here’s a list of resources that may be of use to you. The GPT-3 paper isn’t very specific about implementation details, because the changes that led to GPT-3 were rather incremental (especially relative to GPT-2, and increasingly so the farther back we look in the Transformer lineage). So the reading required to understand GPT-3 is broader than one might expect.

  • https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/nlp/01_Exploring_Word_Embeddings.ipynb
  • http://www.peterbloem.nl/blog/transformers
  • http://jalammar.github.io/illustrated-transformer/
  • https://amaarora.github.io/2020/02/18/annotatedGPT2.html
  • http://jalammar.github.io/illustrated-gpt2/
  • http://jalammar.github.io/how-gpt3-works-visualizations-animations/
  • https://arxiv.org/pdf/1409.0473.pdf Attention (initial)
  • https://arxiv.org/pdf/1706.03762.pdf Attention Is All You Need
  • http://nlp.seas.harvard.edu/2018/04/03/attention.html (The Annotated Transformer)
  • https://www.arxiv-vanity.com/papers/1904.02679/ Visualizing Attention
  • https://stats.stackexchange.com/questions/421935/what-exactly-are-keys-queries-and-values-in-attention-mechanisms (see the attention sketch after this list)
  • https://arxiv.org/pdf/1807.03819.pdf Universal Transformers
  • https://arxiv.org/pdf/2007.14062.pdf Big Bird (see appendices)
  • https://www.reddit.com/r/MachineLearning/comments/hxvts0/d_breaking_the_quadratic_attention_bottleneck_in/ (see the block-sparse mask sketch after this list)
  • https://www.tensorflow.org/tutorials/text/transformer
  • https://www.tensorflow.org/tutorials/text/nmt_with_attention
  • https://cdn.openai.com/blocksparse/blocksparsepaper.pdf
  • https://openai.com/blog/block-sparse-gpu-kernels/
  • https://github.com/pbloem/former/blob/master/former/transformers.py
  • https://github.com/openai/blocksparse/blob/master/examples/transformer/enwik8.py
  • https://github.com/google/trax/blob/master/trax/models/transformer.py
  • https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_gpt2.py
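Since several of the links above center on the query/key/value mechanism, here is a minimal NumPy sketch of single-head causal self-attention, the core operation inside GPT-style decoders. Everything here (shapes, weights, names) is illustrative only, not taken from any of the linked implementations:

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def causal_self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model) token embeddings.
        # Wq, Wk, Wv: (d_model, d_head) learned projections.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        d_head = Q.shape[-1]
        # Scaled dot-product scores: how much each query matches each key.
        scores = Q @ K.T / np.sqrt(d_head)
        # Causal mask: position i may only attend to positions <= i,
        # so the decoder cannot peek at future tokens.
        scores[np.triu(np.ones_like(scores, dtype=bool), k=1)] = -1e9
        weights = softmax(scores, axis=-1)  # one attention distribution per position
        return weights @ V                  # weighted sum of value vectors

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 16))                               # 5 tokens, d_model=16
    Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))  # d_head=8
    print(causal_self_attention(X, Wq, Wk, Wv).shape)          # (5, 8)

And since the block-sparse links (the OpenAI paper and blog post, and the "quadratic attention bottleneck" thread) revolve around restricting which positions may attend to which, here is an equally rough sketch of a block-local causal mask. It only illustrates the sparsity-pattern idea; real kernels, and Big Bird’s window/global/random blocks, are considerably more involved:

    import numpy as np

    def block_local_causal_mask(seq_len, block_size):
        # True where attention is allowed. Each query attends only within
        # its own block (causally), so cost scales with seq_len * block_size
        # instead of seq_len ** 2.
        mask = np.zeros((seq_len, seq_len), dtype=bool)
        for start in range(0, seq_len, block_size):
            end = min(start + block_size, seq_len)
            mask[start:end, start:end] = np.tril(
                np.ones((end - start, end - start), dtype=bool))
        return mask

    print(block_local_causal_mask(8, 4).astype(int))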

Comment

https://www.lesswrong.com/posts/m23A54nFL5yeDRukL/what-are-the-most-important-papers-post-resources-to-read-to?commentId=P6MgM6hsQ9FZpNhrG

Thanks! I’ll try to read that.