Skip to content

Latest commit

 

History

History
7 lines (4 loc) · 630 Bytes

README.md

File metadata and controls

7 lines (4 loc) · 630 Bytes

minutegpt

This is a minor generative pre trained transformer built which generates more text based on the data set trained on, (which is a Shakespeare here) based on the transformer architecture, from the paper attention all you need.

Couldnt really get the final loss prediction displaying the more.txt file with the generated text of the gpt, as i didnt have access to a good GPU, running this on a macbook

This is basically an implementation of the self attention, batch norm/layer norm and the feed forward mechanishm of the decoder part of the transformer architecture. You can use this with any other different dataset