On May 14, 2024, an OpenAI employee confirmed on the social platform X that the mysterious “gpt2-chatbot” models that had been performing so strongly in the LMSYS Chatbot Arena are in fact versions of the company’s newly released AI model, GPT-4o.


This model topped the leaderboard in the tests, achieving the highest score ever.

“GPT-4o is our most advanced cutting-edge model,” stated the OpenAI employee on X. “We have been testing a version of this model in the arena under the name ‘im-also-a-good-gpt2-chatbot’.”


The Chatbot Arena is a website where visitors can converse with two random AI language models without knowing which is which and then choose the one that provides better responses.


Starting in April 2024, OpenAI tested several versions of GPT-4o in the arena. The first appeared under the name “gpt2-chatbot,” followed by “im-a-good-gpt2-chatbot,” and finally “im-also-a-good-gpt2-chatbot”.

Since the official release of GPT-4o, insiders have revealed that the model has significantly outperformed its competitors on the LMSYS internal leaderboard, surpassing the previously top-ranked models Claude 3 Opus and GPT-4 Turbo.

The official LMSYS account shared an internal screenshot showing the “gpt2-chatbot” series soaring to the top of the leaderboard, with a significant advantage (about 50 Elo points) over all other models, making it the strongest model in the arena.

“The public version of ‘gpt-4o’ has entered the arena and will soon be on the public leaderboard!”

As of this publication, “im-also-a-good-gpt2-chatbot” has an Elo score of 1309, ahead of GPT-4-Turbo-2024-04-09 at 1253 points and Claude 3 Opus at 1246 points.
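
To put those gaps in perspective, an Elo difference can be converted into an expected head-to-head win rate. The sketch below assumes the conventional Elo formula (base 10, scale 400). Whether the arena’s scores follow exactly this scale is an assumption here, and the function name `expected_win_rate` is purely illustrative.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional Elo logistic model.

    Assumption: base-10 logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores quoted in the article.
leader_name = "im-also-a-good-gpt2-chatbot"
leader_rating = 1309
rivals = {
    "GPT-4 Turbo": 1253,
    "Claude 3 Opus": 1246,
}

for name, rating in rivals.items():
    p = expected_win_rate(leader_rating, rating)
    print(f"{leader_name} vs {name}: expected win rate {p:.1%}")
```

Under these assumptions, a lead of 56 to 63 points corresponds to winning a little under 60% of direct comparisons. That is a clear edge, but not a rout.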

Before the arrival of the three “gpt2-chatbot” models, Claude 3 Opus and GPT-4 Turbo were competing for the top spot.

