This market will resolve to the highest 50% time horizon, as reported by METR, for any DeepSeek R2 model released within a month of the first R2 announcement.
50% time horizon is a measure of AI autonomy based on the length of tasks that AI can do: roughly, it is the time that humans take to complete tasks that an AI system can successfully do 50% of the time. See METR's "Measuring AI Ability to Complete Long Tasks" for the technical definition. Claude 3.7 Sonnet, released in February 2025, was the leading model at the time of that paper, with a 50% time horizon of 59 minutes.
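A rough sketch of how a 50% time horizon can be read off a fitted curve, assuming (as the METR paper describes at a high level) a logistic regression of task success against log2 of human completion time. This is not METR's code: the task data are hypothetical, chosen to be symmetric around 60 minutes so that the fitted 50% point should land at 60 minutes by construction, and the paper's actual procedure includes further details not reproduced here.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iters=50):
    """Maximum-likelihood fit of P(success) = sigmoid(a*x + b) via Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = 0.0            # gradient of the log-likelihood
        haa = hab = hbb = 0.0    # entries of the negative Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(a * x + b)
            ga += (y - p) * x
            gb += y - p
            w = p * (1.0 - p)
            haa += w * x * x
            hab += w * x
            hbb += w
        det = haa * hbb - hab * hab
        # Newton step: solve [[haa, hab], [hab, hbb]] @ [da, db] = [ga, gb]
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

# Hypothetical tasks: (human completion time in minutes, 1 = AI succeeded, 0 = failed).
TASKS = [
    (7.5, 1), (7.5, 1), (7.5, 1), (7.5, 0),
    (15, 1), (15, 1), (15, 0),
    (30, 1), (30, 1), (30, 0),
    (120, 0), (120, 0), (120, 1),
    (240, 0), (240, 0), (240, 1),
    (480, 0), (480, 0), (480, 0), (480, 1),
]

def time_horizon_50(tasks):
    xs = [math.log2(minutes) for minutes, _ in tasks]
    ys = [float(success) for _, success in tasks]
    a, b = fit_logistic(xs, ys)
    # sigmoid(a*x + b) = 0.5 exactly where a*x + b = 0, i.e. x = -b/a (in log2 minutes).
    return 2 ** (-b / a), a

horizon, slope = time_horizon_50(TASKS)
print(f"50% time horizon: {horizon:.1f} minutes (slope {slope:.3f} log-odds per doubling)")
```

Working in log2 of human time means the slope reads as the change in log-odds of success per doubling of task length, which is why a horizon that doubles every few months shows up as a steady shift along this axis.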

Left bounds inclusive, right bounds exclusive.
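As an illustration of the bucket rule, a minimal sketch using hypothetical answer ranges (the real ones are the options listed on the market): each range is half-open, [left, right), so a value exactly on a boundary resolves to the higher range.

```python
import bisect
import math

# Hypothetical bucket edges in minutes; the market's actual answer ranges may differ.
EDGES = [0, 30, 60, 120, 240]

def bucket_for(horizon_minutes: float) -> tuple[float, float]:
    """Return the [left, right) range containing horizon_minutes."""
    if horizon_minutes < EDGES[0]:
        raise ValueError("horizon must be non-negative")
    # bisect_right puts a value equal to an edge to the right of it,
    # which makes the left bound inclusive and the right bound exclusive.
    i = bisect.bisect_right(EDGES, horizon_minutes) - 1
    right = EDGES[i + 1] if i + 1 < len(EDGES) else math.inf
    return (EDGES[i], right)

print(bucket_for(60))    # a horizon of exactly 60 minutes
print(bucket_for(59.9))  # just under the boundary
```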
See also:
/jim/claude-45-opuss-metr50-horizon (jim's version)
/Bayesian/claude-opus-45s-metr50-time-horizon (my version)
/Bayesian/gemini-3s-50-time-horizon-per-metr
/Bayesian/gpt5s-50-time-horizon-per-metr
/Bayesian/grok-5s-50-time-horizon-per-metr
/Bayesian/r2s-50-time-horizon-per-metr (this market)
If that happens, @traders, do you agree it's fair to make it about V4 instead? I.e., if V4 is a reasoning model, R2 would refer to V4-thinking for the purpose of this market?
DeepSeek-R1: 27 mins, released 01-20-25 (the SOTA at the time, set in December, was 39 mins)
DeepSeek-R1-0528: 31 mins, released 4 months later (the SOTA at the time, set in April, was 1.5 hours)
Quadrupling from 31 mins to over 2 hours in another 4 months seems (very) unlikely. I'm not betting more because of uncertainty over when (if ever) it'll be released.
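The arithmetic behind that comment can be made explicit. A minimal sketch, assuming smooth exponential growth and treating both gaps as exactly 4 months (the comments only say roughly that); the variable names are illustrative.

```python
import math

def doubling_time(start: float, end: float, months: float) -> float:
    """Months per doubling implied by exponential growth from start to end."""
    return months * math.log(2) / math.log(end / start)

# Figures quoted in the comments (minutes); the 4-month spacing is an approximation.
R1, R1_0528 = 27.0, 31.0
SOTA_DEC, SOTA_APR = 39.0, 90.0
MONTHS = 4.0
TARGET = 120.0  # "over 2 hours"

r1_doubling = doubling_time(R1, R1_0528, MONTHS)
sota_doubling = doubling_time(SOTA_DEC, SOTA_APR, MONTHS)
needed_doubling = doubling_time(R1_0528, TARGET, MONTHS)
needed_factor = TARGET / R1_0528
# Where R2 would land if it repeated the R1 -> R1-0528 ratio over another 4 months.
projected = R1_0528 * (R1_0528 / R1)

print(f"R1 -> R1-0528 doubling time: {r1_doubling:.1f} months")
print(f"SOTA (Dec -> Apr) doubling time: {sota_doubling:.1f} months")
print(f"Needed to reach 2 hours: x{needed_factor:.2f}, doubling every {needed_doubling:.1f} months")
print(f"Projection at R1's pace: {projected:.1f} minutes")
```

On these numbers DeepSeek's own pace implies a doubling time of roughly 20 months, the frontier's roughly 3.3 months, while reaching 2 hours would need a doubling roughly every 2 months.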