This market resolves to the first 50% time horizon that METR reports for Moonshot AI's Kimi K3 Thinking. If METR evaluates a model in the Kimi K3 family that can reason before providing an answer (that is, a reasoning model) but whose name does not contain "Thinking" (as Kimi K2 Thinking's did), it still counts as Kimi K3 Thinking for the purpose of this market. For example, Kimi K3 Code or Kimi K3 Heavy would count if it is the first such model to be evaluated and reported on by METR.
The 50% time horizon is a measure of AI autonomy based on the length of tasks that an AI can do: roughly, it is the time that humans take to complete tasks that the AI system can successfully complete 50% of the time. See METR's "Measuring AI Ability to Complete Long Tasks" for the technical definition. For reference, Claude 3.7 Sonnet, released in February 2025, was the leading model at the time, with a 50% horizon of 59 minutes.
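As a rough illustration of the definition above, the sketch below assumes success probability is modeled as a logistic function of log2 task length in human minutes, and solves for the length at which that probability is 50%. The function name and the parameter values are hypothetical and are not METR's code or fitted numbers; see the paper for the actual methodology.

```python
import math


def time_horizon_minutes(intercept: float, slope: float, p: float = 0.5) -> float:
    """Task length (in human minutes) at which modeled success probability equals p.

    Assumed model (illustrative only):
        P(success) = sigmoid(intercept + slope * log2(minutes))
    """
    if slope == 0:
        raise ValueError("slope must be nonzero")
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    logit_p = math.log(p / (1 - p))  # equals 0 when p = 0.5
    return 2 ** ((logit_p - intercept) / slope)


# Hypothetical fit: success falls as tasks get longer, so the slope is negative.
h50 = time_horizon_minutes(intercept=6.0, slope=-1.0)
print(h50)  # 64.0
```

With these made-up parameters the 50% horizon is 64 minutes; asking for a higher success rate (for example p=0.8) gives a shorter horizon.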

Answer ranges include their left bound and exclude their right bound.
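To make the bounds rule concrete, here is a minimal sketch of mapping a reported horizon to a half-open range [low, high). The bucket edges below are hypothetical placeholders, not this market's actual answer ranges.

```python
from bisect import bisect_right

# Hypothetical bucket edges in minutes; the market's real ranges may differ.
EDGES = [0, 60, 120, 240, 480]


def bucket_for(value, edges=EDGES):
    """Return (low, high) such that low <= value < high.

    high is None when value is at or above the last edge (open-ended top bucket).
    """
    i = bisect_right(edges, value) - 1
    if i < 0:
        raise ValueError("value is below the lowest bucket edge")
    high = edges[i + 1] if i + 1 < len(edges) else None
    return edges[i], high


print(bucket_for(120))    # (120, 240): a value exactly on an edge goes to the higher range
print(bucket_for(119.9))  # (60, 120)
```

Under this rule a reported horizon of exactly 120 minutes falls in the range starting at 120, not the one ending at 120.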
See also:
/jim/claude-45-opuss-metr50-horizon (jim's version)
/Bayesian/claude-opus-45s-metr50-time-horizon (my version)
/Bayesian/gemini-3s-50-time-horizon-per-metr
/Bayesian/grok-420s-metr-50-time-horizon
/Bayesian/grok-5s-50-time-horizon-per-metr