In the next 12 months, will a Large Language Model built by a Chinese organization rank in the top 3 overall on the Chatbot Arena LLM Leaderboard? (Scores for forecasts between Oct 7, 2024 and Jan 23, 2025)

Started Oct 7, 2024 08:00PM
Closed Jan 23, 2025 05:00PM

The development of large language models (LLMs) has been a key area of competition among global AI organizations. The Chatbot Arena LLM Leaderboard is a benchmark platform that ranks LLMs using side-by-side comparisons in which users vote on which model gives the better response across a range of scenarios (KDnuggets). LLMs from organizations such as OpenAI and Anthropic have dominated the top positions on the leaderboard. However, Chinese organizations have made significant advances in AI research and development and now present a challenge to the global leaders in this space (CNBC, PYMNTS).

Resolution Criteria:
This question will resolve as "Yes" if, within the next 12 months, an LLM built by a Chinese organization (e.g., Alibaba, 01.AI, Zhipu AI, or others) ranks within the top 3 models on the overall Chatbot Arena LLM Leaderboard. For a model to resolve this question as "Yes":
  • “Overall” must be selected in the leaderboard “Category” dropdown
  • The model’s “Rank” must be 1, 2, or 3
  • The organization listed under the “Organization” column must be a Chinese organization.

For the purposes of this question, a "Chinese organization" is one that meets at least one of the following criteria: 
  • The organization is headquartered in mainland China.
  • The majority of the organization’s research, development, and production related to LLMs occurs in mainland China.
  • At least 50% of the organization is owned or controlled by entities, shareholders, or government bodies based in mainland China.

Organizations that do not meet any of the above criteria but have subsidiary operations in China will not count unless the specific model was developed primarily by their Chinese subsidiary.

Resolution Notes

DeepSeek-R1, an LLM by the Chinese organization DeepSeek, ranked 3rd overall on the Chatbot Arena LLM Leaderboard as of 2025-01-23.


This question is a resolved time period of the question "In the next 12 months, will a Large Language Model built by a Chinese organization rank in the top 3 overall on the Chatbot Arena LLM Leaderboard?"

Possible Answer    Correct?    Final Crowd Forecast
Yes                Yes         30%
No                 No          70%

Crowd Forecast Profile

Participation Level
Number of Forecasters: 88 (average for questions in their first 6 months: 57)
Number of Forecasts: 231 (average for questions in their first 6 months: 168)

Accuracy
Participants in this question vs. all forecasters: better than average

Most Accurate (Relative Brier Score)

1. -0.785265
2. -0.508967
3. -0.447612
4. -0.349402
5. -0.322565