Grok 4.20's METR 50% time horizon
Dec 31
<1.5h: 4%
1.5h - 2h: 10%
2h - 2.5h: 22%
2.5h - 3h: 23%
3h - 3.5h: 15%
3.5h - 4h: 8%
4h - 4.5h: 5%
4.5h - 5h: 4%
5h - 5.5h: 3%
5.5h - 6h: 2%
Other: 4%

This market resolves to the first 50% time horizon that METR reports for Grok 4.2 / Grok 4.20. If Grok 4.2 is never released, this market resolves N/A. A coding-focused or otherwise specialized version (e.g. Grok 4.20 Code) will not count for the purposes of this market. A model expected to be called Grok 4.20 Heavy (using parallel test-time compute) would not count either, but Grok 4.20 Thinking, Grok 4.20 High, or Grok 4.20 xhigh would all count.

The 50% time horizon is a measure of AI autonomy based on the length of tasks that an AI can do: roughly, it is the time that humans take to complete tasks that an AI system can successfully do 50% of the time. See METR's "Measuring AI Ability to Complete Long Tasks" for the technical definition. Claude 3.7 Sonnet, released in February 2025, was the leading model with a 50% horizon of 59 minutes.
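As a rough illustration of the definition above, the sketch below assumes success probability is modelled as a logistic curve in the base-2 logarithm of human completion time, and solves for the task length at which that curve crosses 50%. The function names and the coefficients in the usage note are hypothetical, chosen only for illustration. They are not METR's code or fitted values; the referenced paper remains the authoritative definition.

```python
import math


def success_probability(human_minutes: float, intercept: float, slope: float) -> float:
    """Modelled chance that the AI succeeds on a task taking humans `human_minutes`.

    Assumes a logistic curve in log2(human time); `intercept` and `slope`
    are hypothetical fitted coefficients, not values published by METR.
    """
    z = intercept + slope * math.log2(human_minutes)
    return 1.0 / (1.0 + math.exp(-z))


def horizon_minutes(intercept: float, slope: float, p: float = 0.5) -> float:
    """Task length (in human minutes) at which the modelled success rate equals p."""
    if slope >= 0:
        raise ValueError("slope must be negative: longer tasks should be harder")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    logit = math.log(p / (1.0 - p))  # equals 0 when p = 0.5
    return 2.0 ** ((logit - intercept) / slope)
```

With the made-up coefficients `intercept=6.0` and `slope=-1.0`, `horizon_minutes(6.0, -1.0)` gives 64 minutes, which would fall in the `<1.5h` bin of this market.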

Left bounds inclusive, right bounds exclusive.
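To make the boundary rule concrete, here is a minimal sketch that maps a reported horizon (in hours) to one of the answer bins listed above, treating each bin as a half-open interval. The function name `bucket_for` is hypothetical, and the sketch assumes that any value of 6h or more falls under "Other"; the market description does not spell that out.

```python
import math

# Bin edges in hours, taken from the answer list on this page.
EDGES = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]


def bucket_for(horizon_hours: float) -> str:
    """Return the answer label for a horizon, using [left, right) intervals."""
    if math.isnan(horizon_hours) or horizon_hours < 0:
        raise ValueError("horizon must be a non-negative number of hours")
    if horizon_hours < EDGES[0]:
        return "<1.5h"
    for left, right in zip(EDGES, EDGES[1:]):
        # Left bound inclusive, right bound exclusive.
        if left <= horizon_hours < right:
            return f"{left:g}h - {right:g}h"
    # Assumption: anything at or above the last edge resolves to "Other".
    return "Other"
```

For example, a reported horizon of exactly 2h lands in "2h - 2.5h" rather than "1.5h - 2h", because the right bound of each bin is excluded.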

See also:

/jim/gpt-52-metr

/jim/claude-45-opuss-metr50-horizon (jim's version)

/Bayesian/claude-opus-45s-metr50-time-horizon (my version)
/Bayesian/gemini-3s-50-time-horizon-per-metr

/Bayesian/grok-420s-metr-50-time-horizon (this market)

/Bayesian/grok-5s-50-time-horizon-per-metr

/Bayesian/r2s-50-time-horizon-per-metr
