In the next 12 months, will a Large Language Model built by a Chinese organization rank in the top 3 overall on the Chatbot Arena LLM Leaderboard?

Started Oct 7, 2024 08:00PM
Closed Jan 23, 2025 05:00PM

Topics

Science & Technology Artificial Intelligence

Tags

Cybersecurity

Seasons

2024 Season 2025 Season

The development of large language models (LLMs) has been a key area of competition among global AI organizations. The Chatbot Arena LLM Leaderboard is a benchmark platform that ranks LLMs through side-by-side comparisons in which participants vote on which model gives the better response in various scenarios (KDnuggets). LLMs developed by organizations such as OpenAI and Anthropic have dominated the top positions on the leaderboard. However, Chinese organizations have advanced significantly in AI research and development, presenting a challenge to the global leaders in this space (CNBC, PYMNTS).

Resolution Criteria:

This question will resolve as "Yes" if, within the next 12 months, an LLM built by a Chinese organization (e.g., Alibaba, 01.AI, Zhipu AI, or others) ranks within the top 3 models on the overall Chatbot Arena LLM Leaderboard. For a model to resolve this question:

“Overall” must be selected in the leaderboard “Category” dropdown.
The model’s “Rank” must be 1, 2, or 3.
The organization listed under the “Organization” column must be a Chinese organization.

For the purposes of this question, a "Chinese organization" is one that meets at least one of the following criteria:

The organization is headquartered in mainland China.
The majority of the organization’s research, development, and production related to LLMs occurs in mainland China.
At least 50% of the organization is owned or controlled by entities, shareholders, or government bodies based in mainland China.

Organizations that do not meet any of the above criteria but have subsidiary operations in China will not count unless the specific model is developed primarily through its Chinese subsidiary.
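
The criteria above can be read as a simple decision rule. The following is a minimal sketch of that rule, not an official resolution tool: the class names, field names, and the idea of passing in rows already filtered to the "Overall" category are all assumptions made for illustration. In particular, whether an organization meets each criterion is treated as a yes/no input that a human would have to determine.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Organization:
    # All fields are hypothetical inputs; the question text does not say how to measure them.
    name: str
    hq_in_mainland_china: bool
    majority_llm_work_in_mainland_china: bool
    mainland_china_ownership_share: float  # fraction between 0.0 and 1.0


@dataclass
class LeaderboardRow:
    # One row of the leaderboard with "Overall" selected in the "Category" dropdown.
    rank: int
    model: str
    organization: Organization
    # Exception clause: model developed primarily through a Chinese subsidiary.
    built_primarily_by_chinese_subsidiary: bool = False


def is_chinese_organization(org: Organization) -> bool:
    """True if the organization meets at least one of the three stated criteria."""
    return (
        org.hq_in_mainland_china
        or org.majority_llm_work_in_mainland_china
        or org.mainland_china_ownership_share >= 0.5
    )


def row_counts(row: LeaderboardRow) -> bool:
    """True if this row would count toward a "Yes" resolution."""
    in_top_three = row.rank in (1, 2, 3)
    qualifies = (
        is_chinese_organization(row.organization)
        or row.built_primarily_by_chinese_subsidiary
    )
    return in_top_three and qualifies


def resolves_yes(overall_rows: Iterable[LeaderboardRow]) -> bool:
    """True if any row in the Overall category satisfies the criteria."""
    return any(row_counts(row) for row in overall_rows)
```

As a usage example with made-up organizations, a lab headquartered in mainland China at rank 3 would resolve the question as "Yes", while the same lab at rank 4 would not.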

Possible Answer	Correct?	Final Crowd Forecast
Yes		30%
No		70%

Crowd Forecast Profile

Participation Level
Number of Forecasters	88
Average for questions older than 6 months: 61
Number of Forecasts	231
Average for questions older than 6 months: 229

Accuracy
Participants in this question vs. all forecasters	better than average

Scored Periods

Scores for forecasts between	Final Crowd Forecast
Oct 7, 2024 08:00PM - Jan 23, 2025 05:00PM	30%