Breaking News: GPT-4 Model Architecture Revealed - Boasting 1.8 Trillion Parameters and a Mixture-of-Experts Design!


SemiAnalysis, a technology research publication, recently reported details about OpenAI's GPT-4 large language model, which was released in March this year.

The report covers GPT-4's architecture, training and inference infrastructure, parameter count, training dataset, token count, cost, Mixture of Experts (MoE) design, and other specifics.

According to the report, GPT-4 has about 1.8 trillion parameters across 120 layers, roughly ten times GPT-3's 175 billion. To keep costs reasonable, OpenAI built GPT-4 as a Mixture of Experts (MoE) model. Instead of one monolithic network, an MoE model contains several "expert" sub-networks, and a small learned router sends each token to only a few of them, so only a fraction of the total parameters is used for any given token.

The outputs of the selected experts are then combined into a single result. GPT-4 reportedly uses 16 experts with about 111 billion parameters each, and each forward pass is routed through two of them.
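To make the routing idea concrete, here is a minimal pure-Python sketch of top-2 Mixture-of-Experts routing. OpenAI has not published its implementation, so everything here is an illustrative assumption: the linear router, the softmax taken over only the two selected scores, and the toy "experts" that merely scale their input. The names `moe_forward`, `router_scores` and `make_scaling_expert` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def router_scores(x: Vector, router_weights: List[Vector]) -> Vector:
    """Hypothetical linear router: one score per expert (dot product with x)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]


def moe_forward(x: Vector, router_weights: List[Vector],
                experts: List[Expert], k: int = 2) -> Vector:
    """Send x to the k best-scoring experts and return their weighted sum."""
    scores = router_scores(x, router_weights)
    # Indices of the k experts with the highest router scores.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Mixing weights: a softmax over the selected experts' scores only.
    weights = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # the unselected experts are never evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


def make_scaling_expert(scale: float) -> Expert:
    """Toy stand-in for an expert network: multiplies its input by a constant."""
    return lambda v: [scale * vi for vi in v]


if __name__ == "__main__":
    num_experts, dim = 16, 2
    experts = [make_scaling_expert(float(i)) for i in range(num_experts)]
    # Hypothetical router that scores experts 3 and 7 highest for this input.
    router = [[0.0] * dim for _ in range(num_experts)]
    router[3] = [1.0, 1.0]
    router[7] = [1.0, 1.0]
    print(moe_forward([1.0, 2.0], router, experts))  # [5.0, 10.0]
```

Only two of the 16 experts run for each input, which is the property that lets an MoE model hold far more parameters than it actually uses per token.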

GPT-4 also reportedly has about 55 billion shared attention parameters. The model was trained on a dataset of about 13 trillion tokens; this count includes repeated tokens, so data seen in more than one training epoch is counted once per epoch.
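The reported parameter figures can be cross-checked with simple arithmetic. The sketch below assumes the 16 experts and the 55 billion shared attention parameters are separate pools that simply add up, which the report as summarized here does not state explicitly; `MoEConfig` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    num_experts: int            # total number of experts
    params_per_expert_b: float  # billions of parameters per expert
    shared_params_b: float      # billions of parameters used by every token
    experts_per_token: int      # experts each forward pass is routed through

    def total_params_b(self) -> float:
        """Parameters stored: every expert plus the shared part."""
        return self.num_experts * self.params_per_expert_b + self.shared_params_b

    def active_params_b(self) -> float:
        """Parameters actually used in one token's forward pass."""
        return self.experts_per_token * self.params_per_expert_b + self.shared_params_b


# Figures as reported (billions); treating them as additive is an assumption.
gpt4_reported = MoEConfig(num_experts=16, params_per_expert_b=111,
                          shared_params_b=55, experts_per_token=2)

if __name__ == "__main__":
    total = gpt4_reported.total_params_b()
    active = gpt4_reported.active_params_b()
    print(f"total:  {total:.0f}B")   # total:  1831B
    print(f"active: {active:.0f}B")  # active: 277B
    print(f"share used per token: {active / total:.1%}")  # share used per token: 15.1%
```

Under that assumption the total comes to roughly 1.83 trillion, consistent with the headline 1.8 trillion figure, while only about 277 billion parameters (around 15%) are touched per token.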

The pre-training stage used a context length of 8k tokens, and the 32k version was created by fine-tuning the 8k model. Running a model this large is also expensive: according to the report, a server with eight H100 GPUs could not serve a dense model with this many parameters at 33.33 tokens per second.
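A rough, memory-bandwidth-only estimate shows why a dense model of this size is hard to serve at 33.33 tokens per second on eight H100s. The sketch assumes 16-bit weights (2 bytes per parameter), batch-size-1 decoding in which every generated token streams all weights from GPU memory once, and the H100 SXM 80 GB variant at about 3.35 TB/s of memory bandwidth. It ignores the KV cache, batching, quantization and interconnect overhead, so treat it as an illustration rather than a serving model; `dense_decode_estimate` is a hypothetical helper.

```python
from typing import NamedTuple


class DecodeEstimate(NamedTuple):
    weights_tb: float        # size of all weights, in terabytes
    hbm_tb: float            # combined GPU memory, in terabytes
    fits_in_memory: bool     # do the weights fit at all?
    max_tokens_per_s: float  # bandwidth-bound ceiling for batch-size-1 decoding


def dense_decode_estimate(num_params: float, bytes_per_param: float,
                          num_gpus: int, hbm_bytes_per_gpu: float,
                          bandwidth_bytes_per_s_per_gpu: float) -> DecodeEstimate:
    """Back-of-envelope limits, assuming each token reads every weight once."""
    weight_bytes = num_params * bytes_per_param
    total_hbm = num_gpus * hbm_bytes_per_gpu
    total_bw = num_gpus * bandwidth_bytes_per_s_per_gpu
    return DecodeEstimate(
        weights_tb=weight_bytes / 1e12,
        hbm_tb=total_hbm / 1e12,
        fits_in_memory=weight_bytes <= total_hbm,
        max_tokens_per_s=total_bw / weight_bytes,
    )


if __name__ == "__main__":
    est = dense_decode_estimate(
        num_params=1.8e12,                      # reported size, treated as dense
        bytes_per_param=2,                      # assumption: 16-bit weights
        num_gpus=8,                             # "8x H100"
        hbm_bytes_per_gpu=80e9,                 # assumption: H100 SXM, 80 GB
        bandwidth_bytes_per_s_per_gpu=3.35e12,  # assumption: ~3.35 TB/s per GPU
    )
    print(f"weights: {est.weights_tb:.1f} TB vs HBM: {est.hbm_tb:.2f} TB")  # weights: 3.6 TB vs HBM: 0.64 TB
    print(f"ceiling: {est.max_tokens_per_s:.1f} tokens/s")  # ceiling: 7.4 tokens/s
    needed = 1.8e12 * 2 * 33.33 / 1e12
    print(f"bandwidth needed for 33.33 tokens/s: {needed:.0f} TB/s")  # 120 TB/s
```

Under these assumptions the weights do not even fit in the eight GPUs' memory, and the bandwidth-bound ceiling sits well below 33.33 tokens per second.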

Training was costly as well. According to the report, GPT-4 was trained on A100 GPUs in the cloud, and at about $1 per A100-hour this training run alone would have cost about $63 million (about 5.16 billion INR).

The report adds that the same pre-training could be done today on newer H100 GPUs for approximately $21.5 million (about 1.76 billion INR), because the faster chips need far fewer GPU-hours even at a higher hourly price.
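The dollar figures rest on simple GPU-hour arithmetic: cost equals GPU-hours multiplied by the hourly rate. The sketch below inverts the $63 million estimate at the $1-per-hour rate quoted above (read as a per-GPU rate), spreads the result over a hypothetical 25,000-GPU cluster (an illustrative assumption, not a figure from this article), and checks that both rupee conversions use the same exchange rate. The function names are hypothetical.

```python
USD_PER_GPU_HOUR = 1.0          # rate quoted in the article, read per GPU
TRAINING_COST_USD = 63e6        # headline training-run estimate
HYPOTHETICAL_NUM_GPUS = 25_000  # illustrative cluster size (assumption)


def implied_gpu_hours(cost_usd: float, usd_per_gpu_hour: float) -> float:
    """Total GPU-hours a budget buys at a flat hourly rate."""
    return cost_usd / usd_per_gpu_hour


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Days of training if the GPU-hours are spread evenly over num_gpus."""
    return gpu_hours / num_gpus / 24


def implied_fx_rate(amount_inr: float, amount_usd: float) -> float:
    """Exchange rate (INR per USD) implied by a pair of quoted amounts."""
    return amount_inr / amount_usd


if __name__ == "__main__":
    hours = implied_gpu_hours(TRAINING_COST_USD, USD_PER_GPU_HOUR)
    print(f"{hours / 1e6:.0f} million GPU-hours")                       # 63 million GPU-hours
    print(f"{wall_clock_days(hours, HYPOTHETICAL_NUM_GPUS):.0f} days")  # 105 days
    # Both INR figures imply roughly the same rate, about 81.9 INR per USD.
    print(f"{implied_fx_rate(5.16e9, 63e6):.1f}")    # 81.9
    print(f"{implied_fx_rate(1.76e9, 21.5e6):.1f}")  # 81.9
```

The same formula explains why a faster, pricier GPU can still lower the bill: if it cuts total GPU-hours by more than its hourly price rises, the product falls.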



Deepanker Verma

Deepanker Verma is a well-known technology blogger and gadget reviewer based in India. He has been writing about tech for over a decade.
