Top SWE-Bench Pro public dataset score by January 1, 2026
Jan 1
62.1% expected

  • 0.00% - 29.99%: 1.4%

  • 30.00% - 39.99%: 1.3%

  • 40.00% - 54.99%: 30%

  • 55%+: 68%

This market predicts what the highest score on the SWE-Bench Pro public dataset leaderboard will be as of January 1, 2026.

Current top performers on the SWE-Bench Pro public dataset (as of September 24, 2025):

  • OpenAI GPT-5: 23.26%

  • Claude Opus 4.1: 22.71%

Resolution Criteria: This market will resolve to the score range that contains the highest score on the official SWE-Bench Pro public dataset leaderboard (https://scale.com/leaderboard/swe_bench_pro_public) as of January 1, 2026.

  • Update 2025-12-12 (PST) (AI summary of creator comment): The market will resolve based on Scale AI's verified scores on the official SWE-Bench Pro public dataset leaderboard, not self-reported scores from model creators.

    • Self-reported scores (like Claude Opus 4.5's 52.0% or GPT 5.2 Thinking's 55.6%) will only count if Scale AI independently verifies them

    • Example: Claude Opus 4.5 reported 52.0% but Scale AI evaluated it at 45.89%, so it would count as 45.89%, which falls in the 40.00% - 54.99% range
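
The resolution rule above can be sketched in a few lines of Python. This is only an illustration, not anything Manifold or Scale AI runs: the function name `resolve_range` and the `RANGES` table are hypothetical, the scores are the ones quoted on this page, and treating each range as a half-open interval (so that a score like 29.995 would land in the lower range) is an assumption the market description does not spell out.

```python
from typing import Optional

# Answer ranges from this market, modeled as half-open intervals [low, high).
# Assumption: the market does not say how a score between 29.99 and 30.00
# would be handled; here it falls into the lower range.
RANGES = [
    (0.0, 30.0, "0.00% - 29.99%"),
    (30.0, 40.0, "30.00% - 39.99%"),
    (40.0, 55.0, "40.00% - 54.99%"),
    (55.0, float("inf"), "55%+"),
]


def resolve_range(verified_scores: list[Optional[float]]) -> str:
    """Return the answer range containing the highest Scale-AI-verified score.

    Entries that are None stand for models with only a self-reported score;
    they are ignored, as the creator's update describes.
    """
    scores = [s for s in verified_scores if s is not None]
    if not scores:
        raise ValueError("no verified scores available")
    top = max(scores)
    for low, high, label in RANGES:
        if low <= top < high:
            return label
    raise ValueError(f"score {top} is outside every range")


# Scores quoted on this page: GPT-5 (23.26), Claude Opus 4.1 (22.71),
# Claude Opus 4.5 as evaluated by Scale AI (45.89), and GPT 5.2 Thinking,
# whose 55.6% is self-reported only, so it is passed as None.
print(resolve_range([23.26, 22.71, 45.89, None]))  # 40.00% - 54.99%
```

If Scale AI were to verify a score of 55.6%, the same call with that value in place of `None` would return `55%+`.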

While Claude Opus 4.5 reported 52.0% on SWE-Bench Pro, Scale AI evaluated it at 45.89%. OpenAI reports that GPT 5.2 Thinking scored 55.6%, but this market will only resolve to 55%+ if Scale AI verifies it.

@Jolliest very relevant information, thanks!

Would love a 2027 market of this

Market might be a little scuffed because I'm cheap with mana. Feel free to make another, more precise market for SWE-Bench Pro.
