Alibaba Qwen3.7-Max Agent: 35-Hour Autonomous Tasks, China #1

On May 20, 2026, Alibaba Cloud did something no Chinese cloud provider has done before: it announced a complete, top-to-bottom rebuild of its technology stack for the agentic AI era. Not a feature update. Not a model refresh. A chip-to-inference architecture designed from the ground up so AI agents can run 24/7, call tools autonomously, and self-improve without human intervention.

The centerpiece is Qwen3.7-Max — a flagship model that just landed at the top of Chinese AI benchmarks and, in a live demonstration, autonomously completed a 35-hour task on an unfamiliar chip platform, writing code and making over 1,000 tool calls along the way. By the end, it had rewritten a critical kernel to run 10x faster. No human touched the code.

What Just Happened at the 2026 Alibaba Cloud Summit

The summit in Hangzhou on May 20 was not a typical cloud keynote. Alibaba Cloud SVP Liu Weiguang opened with a blunt thesis: "After agents cross the critical threshold, they can work 24/7 non-stop, and their demand for AI and cloud resources becomes infinite." Then the company backed it up with four major announcements:

Autonomous AI agent working inside a glowing chip platform

Qwen3.7-Max: A new flagship model purpose-built for agent workloads, ranking #1 among Chinese models on the Arena global blind test leaderboard — ahead of Kimi-K2.6, DeepSeek-v4-pro, and GLM-5.1, and within striking distance of GPT, Claude, and Gemini.
Zhenwu M890 chip: A next-generation AI chip deployed in supernode servers, forming the compute foundation for agent workloads.
Qianwen Cloud (qianwenai.com): A new AI product portal offering API access to 150+ models — including competitors like GLM, Kimi, and DeepSeek — packaged as Skills and CLI tools that agents can directly consume.
Bailian platform open to rivals: Alibaba's model-as-a-service platform now hosts third-party models from Zhipu (GLM-5.1), MiniMax (M2.7), and Moonshot AI (Kimi-K2.6), positioned as a one-stop agent development marketplace.

This is the first time a Chinese cloud vendor has built an entire product lineup around the assumption that AI agents — not chatbots, not copilots — are the primary workload.

Why This Matters: Agent Is the Future, and Alibaba Just Bet the Farm on It

Let's be clear about what's happening here. We're watching a landmark shift — the moment AI moves from answering questions to executing work. This isn't a chatbot upgrade. It's the difference between asking a colleague for advice and hiring someone to run your operations while you sleep.

The Qwen3.7-Max benchmark results tell a story that goes beyond the numbers. On Terminal Bench 2.0-Terminus — a test specifically designed to measure an AI's ability to operate like a software engineer — it scored 69.7, beating both DeepSeek-v4-pro-Max and Claude-Opus4.6. On GPQA Diamond, HLE, and HMMT 2026, it exceeded Claude-Opus4 across the board.

But the real proof wasn't in the benchmarks. It was in the demo: Qwen3.7-Max was dropped onto a completely unfamiliar chip platform — one it had never seen — and asked to optimize a kernel. Over 35 hours, it wrote code, tested, debugged, and rewrote until the kernel ran 10x faster. Over a thousand tool calls. Zero human intervention. That's not a model demonstrating intelligence; that's an agent demonstrating autonomy.

This is what the "foundation-to-application evolution" looks like in practice. When Microsoft rolled out autonomous agents inside Dynamics 365, it signaled that enterprise software was being rebuilt around agents. Alibaba's move signals the same for the cloud infrastructure layer beneath it. The shift is happening at every level of the stack simultaneously.

Key Details: What Makes Qwen3.7-Max Different

Built for Agent Workloads, Not Just Benchmarks

Unlike previous Qwen releases that competed on general language understanding, Qwen3.7 was designed specifically for agentic scenarios: programming, reasoning, tool calling, and sustained long-horizon tasks. The model family includes Qwen3.7-Plus for multimodal reasoning and visual understanding — covering the full spectrum from coding agents to visual agents.

The 35-Hour Milestone

The autonomous kernel optimization task reveals three capabilities that don't show up in standard benchmarks:

Tool orchestration at scale: Making 1,000+ coherent tool calls without derailing requires persistent context management that goes far beyond chat memory.
Self-correction loops: The agent encountered failures, debugged, and iterated — not by being told to, but by detecting its own output wasn't optimal.
Domain transfer: Dropped onto unknown hardware, it adapted. This matters because real-world enterprise deployments rarely happen on ideal, well-documented systems.

Opening the Gates to Competitors

The most surprising move was Alibaba opening Bailian and Qianwen Cloud to rival models. GLM-5.1, Kimi-K2.6, and MiniMax M2.7 are now available alongside Qwen on Alibaba's infrastructure. It's the cloud-provider equivalent of a supermarket stocking competitors' products — a bet that controlling the agent development workflow (the platform) matters more than controlling which model runs on it.

It also mirrors what ByteDance achieved with the Seedance model's open posture: Chinese AI giants are learning that platform plays win bigger than walled gardens when developers have more model choices than ever.

AI agent making thousands of tool calls simultaneously

What This Means For Different Audiences

For Developers and AI Engineers

You now have a domestic model that competes with GPT and Claude on agent-specific benchmarks, running on infrastructure purpose-built for agent workloads. The "packaged as Skills and CLI tools" approach means agents can directly consume model APIs without custom integration layers. If you're building agents on Chinese cloud infrastructure, the friction just dropped significantly.

For Enterprises

A full-stack chip-to-inference agent pipeline from a major cloud vendor means you can deploy agents without stitching together infrastructure from five different vendors. The 35-hour autonomous demo isn't a parlor trick — it's a preview of what "set it and forget it" agent workflows look like in production. For any company running repetitive analytical or operational workflows, the ROI math is starting to look compelling.

For the Global AI Competition

Alibaba's move puts pressure on AWS, Google Cloud, and Azure to match this level of agent-native infrastructure integration. The Qianwen Cloud model — hosting competitor models as first-class citizens — is an approach no Western cloud provider has attempted at this scale. It's a distinctly Chinese competitive strategy: use openness to capture the workflow, then monetize the compute.

The Bigger Picture: This Is a Landmark Moment

There are moments in technology where the direction becomes unmistakable. The transition from on-premise servers to cloud computing was one. The shift from desktop to mobile was another. We're now watching the same class of transition — from AI that chats to AI that acts.

Alibaba committing its entire technology stack to this vision — from silicon to model to developer platform — isn't just a product announcement. It's a declaration that the agent era has arrived at the infrastructure level. When the people who build the cloud start redesigning chips for agent workloads, the conversation is no longer about whether agents will change how we work. It's about how fast.

Gartner projects that by the end of 2026, 40% of enterprise applications will embed AI agents — up from less than 5% in 2025. Alibaba's full-stack bet is an attempt to make its cloud the default home for those 40%. The 35-hour autonomous task isn't just a neat demo; it's the opening move in a race to prove whose infrastructure can keep agents running longest, cheapest, and most reliably.

FAQ

Is Qwen3.7-Max actually better than GPT and Claude?

On agent-specific benchmarks like Terminal Bench 2.0-Terminus, yes — it scored 69.7, ahead of Claude-Opus4.6 and DeepSeek-v4-pro-Max. On general benchmarks, it's competitive but not universally superior. The gap is narrowing fast, and the real differentiator is its optimization for sustained autonomous tasks rather than single-turn chat quality.

What does "35-hour autonomous task" actually mean?

Alibaba demonstrated Qwen3.7-Max optimizing a software kernel on an unfamiliar chip platform. Over 35 hours, the model wrote code, tested results, debugged failures, and rewrote sections to achieve a 10x speed improvement — all without human intervention. It made over 1,000 tool calls during the process, managing context and recovering from errors on its own.

Can I use Qwen3.7-Max now?

Yes, the model API will be available through Alibaba Cloud's Bailian platform and the new Qianwen Cloud portal (qianwenai.com). Developers can also access it through Skills and CLI tools designed specifically for agent integration.

Why is Alibaba selling competitor models on its own platform?

It's a platform strategy: Alibaba wants Qianwen Cloud and Bailian to be the default development environment for AI agents, regardless of which model developers prefer. By hosting GLM, Kimi, and DeepSeek alongside Qwen, Alibaba captures the agent development workflow — and monetizes the compute, storage, and networking underneath — even when developers choose rival models.

Conclusion

Alibaba Cloud's full-stack agent pivot is a signal that the AI industry is entering its next phase. Chip makers, cloud providers, and model builders are no longer optimizing for chat — they're rebuilding for autonomous execution. Qwen3.7-Max's 35-hour autonomous run is a preview of what happens when agents stop asking for permission and start delivering results. The infrastructure is catching up to the ambition, and that changes the timeline for every team building with AI agents today.

Breaking

Alibaba Cloud Goes All-In on Agents: Qwen3.7-Max Tops Chinese Benchmarks, Runs 35-Hour Autonomous Tasks

What Just Happened at the 2026 Alibaba Cloud Summit

Why This Matters: Agent Is the Future, and Alibaba Just Bet the Farm on It

Key Details: What Makes Qwen3.7-Max Different

Built for Agent Workloads, Not Just Benchmarks

The 35-Hour Milestone

Opening the Gates to Competitors

What This Means For Different Audiences

For Developers and AI Engineers

For Enterprises

For the Global AI Competition

The Bigger Picture: This Is a Landmark Moment

FAQ

Is Qwen3.7-Max actually better than GPT and Claude?

What does "35-hour autonomous task" actually mean?

Can I use Qwen3.7-Max now?

Why is Alibaba selling competitor models on its own platform?

Conclusion

Related Reading

由 Allen Zeng

《Alibaba Cloud Goes All-In on Agents: Qwen3.7-Max Tops Chinese Benchmarks, Runs 35-Hour Autonomous Tasks》有3条评论

您错过了

OpenAI Jalapeño Slashes Inference Costs 50%, Rivals NVIDIA

OpenAI Jalapeño Slashes Inference Costs 50%, Rivals NVIDIA

Sakana AI Fugu Rivals Fable 5 Using Multi-Agent Orchestration

Nobel Winner Jumper Joins Anthropic, DeepMind Ranked 5th in AI

About

Tags

Categories

Latest Posts

Archives

Categories

Alibaba Cloud Goes All-In on Agents: Qwen3.7-Max Tops Chinese Benchmarks, Runs 35-Hour Autonomous Tasks

What Just Happened at the 2026 Alibaba Cloud Summit

Why This Matters: Agent Is the Future, and Alibaba Just Bet the Farm on It

Key Details: What Makes Qwen3.7-Max Different

Built for Agent Workloads, Not Just Benchmarks

The 35-Hour Milestone

Opening the Gates to Competitors

What This Means For Different Audiences

For Developers and AI Engineers

For Enterprises

For the Global AI Competition

The Bigger Picture: This Is a Landmark Moment

FAQ

Is Qwen3.7-Max actually better than GPT and Claude?

What does "35-hour autonomous task" actually mean?

Can I use Qwen3.7-Max now?

Why is Alibaba selling competitor models on its own platform?

Conclusion

Related Reading

由 Allen Zeng

相关文章

Claude Orbit Leaked: Is Anthropic Building the Anti-Filter-Bubble Machine We Need?

Claude Code Found a Linux Vulnerability Hidden for 23 Years — Should AI Replace Security Researchers?

Kimi K2.6: How Moonshot AI’s Open-Weight Model Challenges the Closed-Source Pricing Model

《Alibaba Cloud Goes All-In on Agents: Qwen3.7-Max Tops Chinese Benchmarks, Runs 35-Hour Autonomous Tasks》有3条评论

您错过了

OpenAI Jalapeño Slashes Inference Costs 50%, Rivals NVIDIA

OpenAI Jalapeño Slashes Inference Costs 50%, Rivals NVIDIA

Sakana AI Fugu Rivals Fable 5 Using Multi-Agent Orchestration

Nobel Winner Jumper Joins Anthropic, DeepMind Ranked 5th in AI