Last Updated on September 29, 2025 by Becky Halls
Julian Schrittwieser, a prominent AI researcher and Member of Technical Staff at Anthropic, has published a detailed analysis countering claims of stagnation in AI development. Known for his pivotal role in creating landmark systems such as AlphaGo, AlphaZero, and MuZero, Schrittwieser released his latest findings on September 27, 2025, presenting evidence of continued exponential growth in AI capabilities.
His analysis draws parallels to misconceptions seen during the early days of the COVID-19 pandemic, when the exponential nature of virus transmission was misunderstood. Schrittwieser remarked, “Long after the timing and scale of the coming global pandemic was obvious from extrapolating the exponential trends, politicians, journalists and most public commentators kept treating it as a remote possibility or a localized phenomenon.”
Key Data Reveals Consistent Growth
Schrittwieser’s report relies heavily on two evaluation frameworks: METR (Model Evaluation & Threat Research) and OpenAI’s GDPval. METR’s findings indicate that AI systems can now autonomously complete software engineering tasks that take human engineers up to two hours, at a 50% success rate. This marks consistent progress, with task-length capability doubling approximately every seven months. Notably, models such as Grok 4, Opus 4.1, and GPT-5 have exceeded performance expectations.
Complementing this, OpenAI’s GDPval evaluation, which spans 44 occupations across nine industries, provides additional validation. The study assessed 1,320 tasks designed by professionals with an average of 14 years of experience. Using a blinded comparison methodology, evaluators graded the output of various AI systems against human-generated solutions. While GPT-5 delivered high accuracy in multiple industries, Claude Opus 4.1 outperformed it, matching expert-level human performance on numerous tasks. Schrittwieser commended these results, stating, “I want to especially commend OpenAI here for releasing an eval that shows a model from another lab outperforming their own model – this is a good sign of integrity and caring about beneficial AI outcomes.”
Exponential Trends Across Industries
According to METR, AI’s task-completion capabilities have improved dramatically since GPT-2’s ability to handle one-second tasks in 2020. By 2025, newer models such as Sonnet 3.7, Grok 4, Opus 4.1, and GPT-5 demonstrate capacity for tasks lasting two hours or more. Schrittwieser projects that AI systems will achieve autonomous work capabilities for full eight-hour tasks by mid-2026 and may surpass human expert performance across multiple industries before the end of 2026.
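The projection above follows directly from the doubling trend. As a rough illustration (the exact start date and current task length are assumptions for this sketch, not figures taken from METR’s raw data), growth from a two-hour task horizon at a seven-month doubling time can be extrapolated like this:

```python
from datetime import date, timedelta
import math

def months_to_reach(current_hours: float, target_hours: float,
                    doubling_months: float = 7.0) -> float:
    """Months until task-length capability reaches the target,
    assuming steady exponential growth with the given doubling time."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months

# Assumed starting point for illustration: ~2-hour tasks (at 50%
# success) as of mid-2025, per the trend described in the article.
start = date(2025, 7, 1)
months = months_to_reach(current_hours=2, target_hours=8)  # 2 doublings
eta = start + timedelta(days=months * 30.44)               # avg month length

print(f"{months:.1f} months -> ~{eta.isoformat()}")
```

Two doublings (2 h → 4 h → 8 h) take about fourteen months under this assumption, landing the eight-hour milestone in mid-to-late 2026 – broadly consistent with Schrittwieser’s projection, though sensitive to the assumed start date and doubling time.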
The GDPval evaluation highlights progress beyond software engineering, encompassing areas such as healthcare, finance, manufacturing, and legal analysis. Tasks ranged from regulatory compliance and strategic planning to real estate management and technical engineering, designed to mirror realistic workplace scenarios.
Industry-Wide Insights and Challenges
The report underscores performance variations among leading AI models, with some, like Grok 4 and Gemini 2.5 Pro, underperforming relative to initial benchmarks. These discrepancies emphasize the critical importance of standardized evaluation methodologies.
Despite the progress, Schrittwieser’s analysis also acknowledges limitations. METR tasks, for instance, are rated on a 16-point “messiness” scale, with an average score of 3, while real-world software engineering tasks often score between 7 and 8. Similarly, GDPval tasks are structured for digital-only scenarios with complete instructions, which do not fully reflect the complexities of organizational settings, such as ambiguity, multi-team coordination, or iterative processes.
Broader Implications and Looking Ahead
Schrittwieser argues that many misconceptions about AI’s development arise from a focus on surface-level interactions rather than structured evaluations. He emphasizes the importance of understanding exponential growth trends, citing historical examples of technological adoption such as the internet and mobile devices. “Mathematical extrapolation often provides more accurate predictions than expert intuition in rapidly changing technical domains,” he explained.
The findings are particularly timely as debates around AI progress and investment intensify. Schrittwieser’s work suggests that the apparent “slowdown” in AI is a misinterpretation and that exponential advancement remains on track. His projections and data highlight the transformative potential of AI across industries, with significant milestones expected in the near future.
Overall, the analysis reinforces the need for objective frameworks like METR and GDPval to accurately measure AI’s capabilities and ensure transparency in reporting development progress. As the industry continues to evolve, Schrittwieser’s call for a deeper understanding of exponential trends may prove critical for guiding investments, policies, and strategies in the AI space.