Forecasting Activity
| | Past Week | Past Month | Past Year | This Season | All Time |
|---|---|---|---|---|---|
| Forecasts | 0 | 0 | 17 | 0 | 17 |
| Comments | 0 | 0 | 1 | 0 | 1 |
| Questions Forecasted | 0 | 0 | 7 | 0 | 7 |
| Upvotes on Comments By This User | 0 | 0 | 0 | 0 | 0 |

New Prediction
| Probability | Answer |
|---|---|
| 5% (0%) | Kharkiv |
| 1% (0%) | Kyiv |
| 0% (0%) | Odesa |

New Prediction
| Probability | Answer |
|---|---|
| 0% (0%) | Yes |
| 100% (0%) | No |

Confirmed previous forecast

New Prediction
| Probability | Answer |
|---|---|
| 0% (0%) | Yes |
| 100% (0%) | No |

Confirmed previous forecast

New Prediction
| Probability | Answer |
|---|---|
| 0% (0%) | Estonia |
| 0% (0%) | Latvia |
| 0% (0%) | Lithuania |

Confirmed previous forecast

New Prediction
| Probability | Answer |
|---|---|
| 0% (0%) | Moldova |
| 0% (0%) | Armenia |
| 0% (0%) | Georgia |
| 0% (0%) | Kazakhstan |

Confirmed previous forecast

New Prediction
| Probability | Answer |
|---|---|
| 0% (0%) | Yes |
| 100% (0%) | No |

Confirmed previous forecast

New Prediction
This forecast expired on Jan 17, 2025 02:24AM
| Probability | Answer | Forecast Window |
|---|---|---|
| 0% (0%) | Yes | Dec 17, 2024 to Jun 17, 2025 |
| 100% (0%) | No | Dec 17, 2024 to Jun 17, 2025 |

Confirmed previous forecast

New Prediction
| Probability | Answer |
|---|---|
| 5% (+3%) | Kharkiv |
| 1% (0%) | Kyiv |
| 0% (0%) | Odesa |
That doesn't mean the cities' infrastructure and factories are safe or protected.
The competitiveness of the mainland's LLMs cannot be estimated by watching these rankings. The researcher is advised to discourage the interpretation or assumption that these leaderboards are more than entertainment products, like keeping score in an imagined Sino-American rivalry, akin to fantasy football.
Reason one is power consumption. A million tokens from a high-quality LLM with the best answers might cost so much that it only pays to run it where cooling and electricity remain very cheap or free. CPUs offer an analogy, though they are not software: certain vendors' CPUs are more energy-hungry than the competition, yet those same power-inefficient CPUs are designed for maximum compute power and achieve top ranks. The rankings seldom weigh performance against the effort or cost expended, such as power consumption.
Reason two is the upfront cost of the hardware an LLM requires. High-quality LLMs currently require large amounts of VRAM, unusually large power supply units, and unusual cabling and cooling systems. The purchase alone has become a problem: many AI hobbyists and startups are waiting for datacenters to sell off their last-generation hardware, say, H100 accelerators. An LLM that produces reasonable results on old but cheaper hardware is more desirable than the latest LLM that requires the latest hardware for its best results. The leaderboard doesn't capture this reality.
Reason three is the absence of a productive use case and application. Not all tasks require the same kind of all-purpose LLM. Many task-specific LLMs won't produce good answers to many types of questions but might be excellent in a narrowly defined application. China's English-language Tongyi (Qwen2.5) is said to be the programmers' favorite because of its reasonable or excellent results in mathematics and programming, despite lower hardware requirements and flaws in other tasks. The leaderboard assumes excellence as a generalist, while the need for computer software is usually specialized.
Reason four is that some of the mainland's LLMs are not taking part, including some by major corporations, as well as LLMs that are not designed for an English-first audience; only Tongyi (aka Qwen) was explicitly made for English. It has also been widely shared and widely employed, even though the leaderboard doesn't capture its popularity among developers of LLMs.
Reason five is trivial, though: the mainland reads and writes Chinese, while this leaderboard and its audience don't, nor do they use even German or French. What the Chinese deem intelligent responses or capacities may not be shared by Americans, for example wit, demeanor, use of culturally specific sayings, attempts to negotiate, or presentation.
In short, this is a flawed attempt to estimate the progress of LLMs outside Silicon Valley, and even more flawed if it is meant to estimate the effectiveness of sabotage.