📊 Full opportunity report: Engineering Is Automated. Research Is the Residual. on ThorstenMeyerAI.com — validation score, market gap, and execution plan.
TL;DR
AI systems are now capable of automating core engineering tasks in AI research, reaching near-saturation on several benchmarks. However, research activities still involve residual, less automatable components, though this gap is narrowing.
Recent assessments of AI progress suggest that the engineering side of AI research is nearing complete automation, while the research side retains a residual of less automatable work. This shift has implications for the pace of AI development and for the role of human researchers.
According to Thorsten Meyer’s analysis of Jack Clark’s recent work, six key benchmarks measuring AI’s core research skills are approaching saturation, with several achieving or surpassing 95% performance. For example, CORE-Bench, which tests AI’s ability to reproduce research papers, reached 95.5% in December 2025, with its author declaring it ‘solved.’ Similarly, MLE-Bench, which assesses AI on Kaggle competitions, hit 64.4% in February 2026, approaching mid-tier human performance. These benchmarks indicate that AI can now handle many of the engineering tasks involved in research, such as reproducing experiments and optimizing code, at a level comparable to competent human researchers.
However, the same analysis suggests that some aspects of research, particularly creativity, hypothesis generation, and interpretation, may remain less automatable. Clark’s assessment leaves open whether research itself is becoming a form of scaled engineering, which would accelerate the closing of this residual gap. Institutions responding to this progress should not assume that inspiration or novelty will remain a permanent moat.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
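The optimistic reading can be made concrete with a small sketch. It assumes each exploration attempt is an independent trial with a fixed, small success probability; that assumption is exactly what the pessimistic reading disputes, so this illustrates the argument rather than supporting it. The function names and the value of `p` are hypothetical.

```python
def prob_at_least_one(p: float, attempts: int) -> float:
    """Chance of one or more discoveries in `attempts` independent tries."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p) ** attempts


def expected_discoveries(p: float, attempts: int) -> float:
    """Mean number of discoveries under the same independence assumption."""
    return p * attempts


if __name__ == "__main__":
    p = 0.001  # hypothetical per-attempt success rate, not a measured value
    for attempts in (10, 1_000, 100_000):
        print(
            f"attempts={attempts:>7}  "
            f"P(at least one)={prob_at_least_one(p, attempts):.3f}  "
            f"expected={expected_discoveries(p, attempts):.2f}"
        )
```

If the pessimistic reading is right, `p` is effectively zero for some class of problems, and no number of attempts changes the outcome. The two readings therefore differ on whether `p` is small or absent.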

Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Figure: the two readings compared. Surviving panel labels: “Productivity multiplier years” and “Recursive loop operational”.]

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
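The asymmetric-cost argument above has the shape of a minimax-regret decision, which can be sketched as follows. The payoff numbers are illustrative placeholders chosen only to mirror the text's qualitative claims (capacity is "still useful" if the moat holds and "necessary" if it closes); they are not taken from the source and carry no empirical weight.

```python
# Hypothetical payoffs in arbitrary units: outer keys are choices,
# inner keys are the two possible worlds described in the text.
PAYOFFS = {
    "build capacity now": {"moat holds": 2, "moat closes": 8},
    "wait": {"moat holds": 3, "moat closes": -10},
}


def regret_table(payoffs):
    """Regret = best achievable payoff in a world minus the payoff received."""
    worlds = list(next(iter(payoffs.values())).keys())
    best = {w: max(row[w] for row in payoffs.values()) for w in worlds}
    return {
        choice: {w: best[w] - row[w] for w in worlds}
        for choice, row in payoffs.items()
    }


def minimax_regret_choice(payoffs):
    """Pick the choice whose worst-case regret is smallest."""
    regrets = regret_table(payoffs)
    return min(regrets, key=lambda choice: max(regrets[choice].values()))


if __name__ == "__main__":
    for choice, row in regret_table(PAYOFFS).items():
        print(choice, row)
    print("minimax-regret choice:", minimax_regret_choice(PAYOFFS))
```

The conclusion depends entirely on the assumed asymmetry: building has a small cost if the moat holds, while waiting has a large cost if it closes. Different placeholder values could reverse the result.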
Implications of Near-Automation in AI Engineering
The near-complete automation of engineering tasks in AI research could accelerate AI development cycles, reduce costs, and improve reproducibility. It could also reduce the human effort needed for routine research activities, freeing researchers to focus on higher-level creative and strategic work. That in turn raises the question of whether human intuition and hypothesis generation become the remaining bottleneck in AI progress. Taken together, the trend points to a change in how AI research is conducted, with automation dominating the engineering side.
Recent Advances in AI Engineering Capabilities
Over the past two years, multiple AI benchmarks related to research reproduction, Kaggle competition performance, and kernel optimization have shown rapid progress. CORE-Bench, which measures research reproduction, improved from 21.5% in September 2024 to 95.5% in December 2025. MLE-Bench, which assesses Kaggle competition success, moved from 16.9% in October 2024 to 64.4% in February 2026. Research papers and tools also report continued improvements in automated kernel design, GPU optimization, and infrastructure automation. The pattern indicates that AI systems are approaching or reaching saturation on engineering tasks critical to research workflows.
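The pace implied by the two benchmark figures above can be worked out with simple arithmetic. The sketch below uses only the scores and months quoted in the text; the day of the month is not given in the source, so the first of each month is an assumption, and the helper names are hypothetical. A straight-line average is a rough summary, not a forecast.

```python
from datetime import date

# (benchmark, start date, start score %, end date, end score %) as quoted above.
# Day-of-month is assumed to be the 1st; the source gives only month and year.
READINGS = [
    ("CORE-Bench", date(2024, 9, 1), 21.5, date(2025, 12, 1), 95.5),
    ("MLE-Bench", date(2024, 10, 1), 16.9, date(2026, 2, 1), 64.4),
]


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def summarize(name, start, start_score, end, end_score):
    """Percentage-point gain, average monthly gain, and remaining headroom."""
    months = months_between(start, end)
    gain = end_score - start_score
    return {
        "benchmark": name,
        "months": months,
        "gain_pp": round(gain, 1),
        "pp_per_month": round(gain / months, 2),
        "headroom_pp": round(100.0 - end_score, 1),
    }


if __name__ == "__main__":
    for reading in READINGS:
        print(summarize(*reading))
```

On these figures CORE-Bench has little headroom left, while MLE-Bench still has substantial room to move, which is why the two are not equally close to saturation.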
“Clark’s conclusion is correct and possibly understated for engineering. The residual research question is real but may be less binding than the framing suggests.”
— Thorsten Meyer
Remaining Uncertainties in AI Research Automation
It is still unclear to what extent AI can automate the creative, hypothesis-driven components of research, such as generating novel ideas or interpreting results. The structural question of whether research itself is becoming a form of scaled engineering remains open, and the pace at which residual research tasks will fully automate is uncertain. Additionally, the institutional and ethical implications of this shift are still emerging and require further analysis.
Next Steps in AI Research Automation Development
Over the coming 12 to 24 months, focus will likely be on refining AI’s capabilities in less automatable research areas, understanding the implications of near-complete engineering automation, and developing frameworks to manage the transition. Monitoring the progression of benchmarks and the emergence of new tools will be critical to assess how quickly the residual research activities diminish. Researchers and institutions should prepare for a landscape where automation plays an increasingly central role in AI development.
Key Questions
What does near-saturation of AI benchmarks mean for human researchers?
It suggests many routine engineering tasks in AI research can be automated, potentially reducing the need for human effort in these areas and shifting human focus toward creative and strategic aspects.
Are there aspects of AI research that cannot be automated yet?
Yes, components involving hypothesis generation, interpretation, and creative problem-solving remain less automatable, though progress is ongoing.
How quickly might residual research tasks become fully automated?
Based on current trajectories, significant automation could occur within the next 2 years, but some aspects may require longer or face fundamental challenges.
What are the potential risks of automating AI research so extensively?
Risks include over-reliance on automation, reduced human oversight, and ethical concerns about transparency and accountability in AI development.
Source: ThorstenMeyerAI.com