Will there be an AI language model that strongly surpasses ChatGPT and other OpenAI models at the end of 2025?
Dec 31
67%
chance

This question is about any current or future OpenAI models versus any competitor models.

If a language model exists that is undoubtedly the most accurate, reliable, capable, and powerful, that model will win. If there is dispute as to which is more powerful, a significant popularity/accessibility advantage will decide the winner. There must be public access for it to be eligible.

See previous market for more insight into my resolution plan: /Gen/will-there-be-an-ai-language-model

2024 recap: capabilities were "similar". Both Google and OpenAI models tied for first place on LLM Arena. OpenAI won because of their popularity/market dominance.

  • Update 2025-11-27 (PST) (AI summary of creator comment): Creator will not resolve based on current Gemini 3 lead alone. Will wait until end of year to allow for:

    • Potential new OpenAI model releases

    • Further discussion on whether the lead is "strong enough"

    • Assessment of whether there is dispute about which model is more powerful

Creator leans toward YES if OpenAI releases no new models by end of year.

  • Update 2025-11-27 (PST) (AI summary of creator comment): Creator will not resolve early despite Gemini 3.0's current lead. The bar for early resolution is higher than the bar for determining a winner at end-of-year assessment. Creator still leans YES but will wait before resolving.

  • Update 2025-12-06 (PST) (AI summary of creator comment): Creator distinguishes between early resolution criteria vs end-of-year resolution criteria:

    • Early resolution requires a model that is so obviously better it takes a huge chunk of market share from ChatGPT (which still has ~80% market share)

    • End-of-year resolution (Dec 31) will be based on whatever is the best model, with popularity only acting as a tie-breaker rather than a necessary component

Creator acknowledges Gemini/Claude dominate ChatGPT for top-end use, but notes most people either don't know or don't care that they are better.

  • Update 2025-12-08 (PST) (AI summary of creator comment): Title updated to reflect end-of-year resolution: "At the end of 2025"

Current assessment: Creator believes Gemini and Claude are sufficiently ahead of ChatGPT based on all metrics.

Resolution plan:

  • Market will resolve YES unless OpenAI releases a new model that top scores before year end

  • Not resolving early to give OpenAI time to release a potential new model

  • Market is meant to measure if OpenAI has been "strongly surpassed" - would be inappropriate to resolve YES if OpenAI releases a superior model (e.g., GPT-6) shortly after, as that would indicate they were only "beat to release" rather than truly surpassed

Evidence considered: Benchmarks, stock market activity, and OpenAI's "code red" all indicate OpenAI knows they are no longer the leader

  • Update 2025-12-08 (PST) (AI summary of creator comment): Creator clarifies the "strongly surpassed" criterion:

    • Market will resolve YES unless OpenAI releases a new public model before year-end that demonstrates they were never really "strongly surpassed"

    • If OpenAI was merely "beat to release" but maintained their lead behind the scenes, this would resolve NO

    • There is no specific metric (absolute or relative) for determining "strongly surpassed"

    • The current lead by competitors (Gemini/Claude) is sufficient to resolve YES if OpenAI cannot produce evidence of having something better in development

    • Example: If a competitor releases something only 0.1% better and OpenAI releases a superior model shortly after, OpenAI was never truly "strongly surpassed" - they were just beat to release

  • Update 2025-12-11 (PST) (AI summary of creator comment): For OpenAI's GPT-5.2 (or any new OpenAI model) to affect resolution:

    • Must be publicly released

    • Must be independently tested/scored

    • Must be at least top scoring to show OpenAI wasn't strongly surpassed

If the new OpenAI model is only barely best in class, resolution may be more complicated. If these conditions aren't met, the market resolves YES.

  • Update 2025-12-11 (PST) (AI summary of creator comment): Creator clarifies how GPT-5.2 (or any new OpenAI model) will affect resolution:

    • Since Gemini 3.0 has been out for a while and 5.2 is a "catch up model," if Gemini is ahead on benchmarks, this shows Gemini strongly surpassed OpenAI when it was released

    • If Google releases an update same day that beats GPT-5.2, that would also count

    • At year end, if there is a clear non-OpenAI leader, market resolves YES

    • If GPT-5.2 is a really close number 1 model where nobody can determine which is better, creator will probably still resolve YES

Creator notes the "strongly surpassed" language is meant to avoid situations where OpenAI has achieved AGI/ASI but loses due to release schedules. Since OpenAI has shown signs they're worried (code red, etc.), if the new model isn't a top scorer, they know OpenAI was passed.

  • Update 2025-12-11 (PST) (AI summary of creator comment): Creator clarifies the "strongly surpassed" criterion:

    • OpenAI must prove they weren't surpassed behind the scenes (i.e., their unreleased models are better than competitors' new releases)

    • If OpenAI releases a new model that isn't a top scorer, this proves they were surpassed

    • The "strongly" qualifier means definitely knowing OpenAI was passed, not just tied for first or ahead behind the scenes but withholding for a big release

    • Last year resolved in OpenAI's favor despite Claude having a few point lead because OpenAI hadn't released in a while

    • This time there will be a clear picture of OpenAI's best most recent capabilities - if it's not top tier, they were clearly surpassed

  • Update 2025-12-11 (PST) (AI summary of creator comment): Creator clarifies evaluation approach for GPT-5.2 (or any new OpenAI model):

    • Will grade OpenAI harder since they have the most recent release and most time to tune

    • Evaluation is holistic but only on language capabilities (no image/video/etc.)

    • No specific weights to any particular benchmarks

    • If experts/industry leaders consensus says GPT-5.2 is inferior to Gemini/Claude, market can resolve YES even if GPT-5.2 is comparable on lmarena

  • Update 2025-12-12 (PST) (AI summary of creator comment): Creator clarifies that GPT-5.2's higher rank on WebDev alone is not sufficient to resolve NO. The model is not #1 on the relevant leaderboard at this time.

sold Ṁ820 YES

Yeah, I'll cut my losses :P

I wouldn't say that GPT-5.2 is the best model now, but neither would I say it's strongly surpassed.

Ok let's vibe bet a little... To me it seems most observers are pretty much underwhelmed by 5.2

bought Ṁ100 YES

False fall - GPT-5.2 hadn't made it to the top; OpenAI models are still strongly surpassed by other models as of the end of 2025

@1bets GPT-5.2 hasn't even been evaluated on the text leaderboard yet lmao

Yeah it's expected in roughly 5 days

Is GPT-5.2's higher rank on WebDev enough for this to resolve NO?

@ItsMe higher than what? It’s not even #1 atp

@FaroukRice it needs to be publicly released, independently tested/scored, and be good enough to show that they weren’t strongly surpassed (i.e. it needs to be at least top scoring)

Otherwise we still resolve YES. If it’s kind of good but only barely best in class, things might get a little more complicated.. hopefully it’s either very impressive or hot trash

@Gen Is there any outcome where ChatGPT models are surpassed, but it doesn’t meet the threshold of “strongly surpassed”? Or is any surpassing sufficient?

Ex: If Gemini 3.0 is very slightly better than 5.2 on some benchmarks but tied on others, is this sufficient for “strongly surpassed” and a YES?

@Gen Related, how possible is a 50/50 if Gemini 3 was obviously better than GPT 5/5.1 but 5.2 is rather close?

bought Ṁ200 NO

@Panfilo my interpretation of the resolution criteria is that it doesn’t matter, since we are looking at any sufficient openAI model, i.e. right now 5.2.

@Gen what is your definition of strongly vs weakly surpassed?

@FaroukRice because Gemini has been out for a while and 5.2 is a catch up model, if Gemini is ahead on benchmarks that will show to me that when Gemini was released it strongly surpassed openAI (was largely indisputably better than any model OpenAI were developing for public release)

if Google released an update same day that was ahead of GPT5.2, that would count. At the end of the year, if there is a clear non openAI leader then this market will resolve YES.

The “strongly surpassed” language encapsulates a bunch of weird rules from last year, where we resolved NO even though Claude was ahead on benchmarks by 1-2 points because openAI were still leaders behind the scenes. I’m trying to be as precise as possible about real outcomes rather than explaining the language at this point, but there’s a lot you can go back and read if you want. One of the guiding principles has always been to avoid a situation where openAI have essentially achieved AGI/ASI but this resolves against them because of release schedules. They have basically done everything we thought they could (code red, etc) to indicate they’re worried, so if the new one isn’t a top scorer, we know 100% they were passed.

If it’s a really close number 1 model and nobody can really determine which is better, I’ll probably still resolve YES... I hope people can follow the reasoning

Happy to continue to discuss, I recommend not making huge bets if you’re not comfortable with how I’m explaining things

> clarification

> i get more confused

@Bayesian yeah, I shouldn’t have written a half-baked reply, my bad

Bottom line is, at this point openAI have to prove they weren’t surpassed behind the scenes (that is, their unreleased models are better than the new releases by competitors). If they release something that isn’t a top scorer, that’s obviously not the case.

Last year it resolved in their favour despite Claude or whoever having a few-point lead because openAI hadn’t released in a while. This time we will have a clear picture of their best, most recent capabilities, and if it’s not top tier then clearly they were surpassed

This is where the “strongly” comes from, it is more about definitely knowing they were passed, and not just tied for first, or ahead behind the scenes but withholding for a big release

@Gen In this case, Gemini 3 and Opus 4.5 were both released within the past few weeks. If GPT-5.2 scores roughly similar to both of them, would this count as other labs surpassing or merely catching up with OpenAI?

I also think all of these models will outperform the other two in certain tasks. Is your evaluation holistic or based off a few key benchmarks (ex: lmarena)?

@SolarFlare “roughly similar” is hard to say, because it’ll depend on what that looks like. I’m inclined to go harder when grading openAI because they have the most recent release and the most time to tune things

The evaluation is holistic but only on language capabilities (no image/video/etc.). No specific weights to any particular benchmarks. If all of the experts (or, all of the people on the TIME cover) held hands and said GPT5.2 was trash, and that Gemini/Claude have it beat, then it can still resolve YES even if it’s comparable on lmarena

@Gen I think the most likely outcome is that it is barely best in class

Market clarification, you should read this:

The title is honestly not well written at this point; we are following a bunch of rules defined in the previous year's market. The important ones are:

  • This market will resolve YES if there is an obvious better chatbot at the end of the year, or

  • it could have early resolved YES if there was an unambiguous non-openAI leader prior to the end of the year (but there was discussion about giving openAI time to release a retaliation model, provided it happens in the same year)

This is what we are following right now. The year is basically over, so I'm updating the title to say "At the end of 2025".

I believe that Gemini and Claude are sufficiently ahead of ChatGPT. All metrics point to this being the case. I expect that unless openAI release a new model which top scores, it should resolve YES.

If this is highly contested I am happy to discuss it, but I think it's very reasonable given the prior year criteria which we follow, and the only reason I am not resolving it early is because there is still time for openAI to prove that they are "top of class". It would be extremely cringe for this market to resolve based on a temporary lead by competitors, when it is supposed to measure the fall of openAI as a leader. If openAI released GPT6 next week and crushed benchmarks, it wouldn't make sense to say they were "strongly surpassed" for this transient period between releases.

Among benchmarks, stock market activity, and the "code red", it seems clear that openAI knows they are no longer the leader, and this market will resolve YES unless they produce something (a new public model) that indicates they were never really "strongly surpassed", i.e. they were beat to release, but behind the scenes, they maintained their lead.

@Gen "have early resolved YES if there was an unambiguous non-openAI leader" - that would be better (my opinion)

opened a Ṁ750 YES at 69% order

This market now is essentially a bet on whether openAI's response to Altman's red alert will bear fruit before year's end

@LeonardoKr in my opinion this possibility is being overrated

@LeonardoKr doesn't it ask "strongly surpasses before the end of the year"? So even if openai catches back up to google, if gemini 3 strongly surpassed openai at the time of release, it still resolves YES? I might have misunderstood that nuance

@Bayesian I feel like "strongly surpassed" is once again a poorly defined criterion. That being said, "strongly" in my personal opinion carries an aspect of time. If openAI lost the lead for a couple of days or weeks and came out with something surpassing everyone else, I would concede. I say that because a small lead by competitors over openAI, yet sustained over long periods of time, would amount to "strongly surpassed" for me as well (it would be just as lame or cringe to say, meh, they are only slightly behind, not strongly)

@LeonardoKr it could just be that capabilities increase very quickly. so despite there being large absolute and even relative gaps between models, capabilities increase so quickly that the newest model is almost always frontier by a lot. i don't think this is too interesting of a distinction though and regardless genzy has clarified what he meant and openai being close enough to catching up that it is reasonably said that it is not "strongly surpassed" by gemini 3 would seem to mean that this market would resolve NO, which I think will happen.

@Bayesian @Gen clearly stated that if the current lead holds (>2% LLM arena leaderboard points) this market will resolve YES.

So, if capabilities change quickly, well then openAI will have its shot until New Year's Eve.

If not, well we surely will have new markets to bet on these things next year

@LeonardoKr

> (>2% LLM arena leaderboard points)

where?

@Bayesian the most decision relevant part of what I said was this:

> this market will resolve YES unless [openAI] produce something (a new public model) that indicates they were never really "strongly surpassed", i.e. they were beat to release, but behind the scenes, they maintained their lead.

there is no specific metric in absolute or relative terms, but I did say that the lead held so far is sufficient to resolve YES if openAI can't produce evidence that they have something cooking behind the scenes.

This condition was formed last year when people were worried that the market would resolve YES on the first day that openAI lost their #1 spot on one of the benchmarks. IIRC Claude beat GPT by 3 points on the arena at the end of last year, and was arguably better (insufficiently so). This was never the intended purpose, because if someone follows GPT3 with something 0.1% better 6 months into its release, and a week later openAI release GPT4, they were never really strongly surpassed or dethroned as the leader, despite transiently not having the best public model

@Gen well, they definitely have and had something cooking behind the scenes, but that doesn't mean much; google also has stuff cooking behind the scenes. regardless, yeah, openai will release gpt5.2 this week and it will be better than gpt5.1. it will indicate they were never really strongly surpassed. but at the same time, it's inaccurate to say they were "beat to release". They rushed gpt5.2 explicitly because of gemini3 and claude opus 4.5, but would have waited otherwise. It will likely be less polished than their usual models but more polished in many ways than gemini 3 is, because google is beat in terms of many facets of post-training (character, tool-use, situational awareness).

It'll be clear that they are not "the lab in the lead" anymore, that no lab is the lab in the lead, that they are trading off and specializing. You may want to count that as "they are no longer in the lead" or "they are still in the lead wrt stuff they are focused on". As such it's kind of a tricky situation
