
The Week Everything Shipped: Claude 4, Cursor 1.0, Google I/O, Build 2025

Table of Contents
The Week Everything Shipped: Claude 4, Cursor 1.0, Google I/O, Build 2025 #
This week redefines what is possible with AI. May 19-22, 2025 marks the most concentrated period of AI product launches in history — five major companies ship flagship products within 96 hours. Anthropic releases Claude 4, claiming the world's best coding model. Cursor graduates to 1.0 with Bugbot and Memories. Google I/O unveils Gemini 2.5 Pro Deep Think, Veo 3, and Project Mariner. Microsoft Build proposes an "open agentic web" with NLWeb and Azure AI Foundry. OpenAI bets $6.5 billion on Jony Ive's io. Mistral drops Devstral 24B — the best open coding agent.
William Spurlock is an AI automation engineer and custom web designer tracking these launches from the builder's perspective. In this post, I break down exactly what shipped, how these products compare, and what this convergence means for your stack.
| Launch Component | Date | Core Innovation |
|---|---|---|
| Microsoft Build 2025 | May 19-22 | "Open agentic web" thesis, NLWeb, Copilot agent GA |
| Google I/O 2025 | May 20-21 | Gemini 2.5 GA, Veo 3, Project Astra, AI Ultra $250/mo |
| Mistral Devstral 24B | May 20 | Best open coding agent at 24B parameters |
| OpenAI io acquisition | May 21 | $6.5B all-equity hardware bet, Jony Ive joins |
| Anthropic Claude 4 | May 22 | World's best coding model, hybrid reasoning |
| Cursor 1.0 | May 22 | Bugbot, Memories, Background Agents GA |
The timing is not coincidental. This is competitive convergence — the market has reached a maturity point where every major player must ship or be perceived as falling behind. The result is a market reset that changes what builders can expect from AI tooling.
The Timeline: Four Days That Redefine AI #
From May 19 to May 22, 2025, the AI industry experiences its most consequential product launch window ever. Six major product drops from five companies create a new baseline for what AI can do.
Day by day breakdown:
| Date | Time (PT) | Company | Launch | Significance |
|---|---|---|---|---|
| May 19 | 08:00 | Microsoft | Build 2025 Keynote | "Open agentic web" vision |
| May 20 | 10:00 | I/O 2025 Keynote | Gemini 2.5 GA, AI Mode US-wide | |
| May 20 | 14:00 | Mistral | Devstral 24B | Apache 2.0 coding model |
| May 21 | 09:00 | OpenAI | io acquisition | $6.5B hardware play |
| May 21 | 13:00 | I/O Developer Keynote | Project Astra, Mariner beta | |
| May 22 | 09:00 | Anthropic | Claude 4 | 72.5% SWE-bench, coding king |
| May 22 | 12:00 | Cursor | Cursor 1.0 | Bugbot, Memories, MCP GA |
The convergence signal: When companies that typically stagger launches to capture news cycles instead ship simultaneously, it indicates market maturation. No single company owns the narrative — the story is the density itself.
Microsoft Build 2025: The Open Agentic Web #
Microsoft Build 2025 (May 19-22) establishes "the open agentic web" as Redmond's strategic response to proprietary AI ecosystems. The thesis is simple: agents should interoperate across platforms using open standards, not walled gardens.
GitHub Copilot Agent Mode Goes GA #
GitHub Copilot's agent mode graduates from preview to general availability on May 19, 2025 with capabilities that push beyond simple code completion into autonomous software engineering.
New agent mode capabilities:
- Multi-file refactoring — Agents can plan and execute changes across entire repositories
- Test generation — Automatic test creation with coverage analysis
- Autonomous debugging — Agent persistence across debugging sessions with state retention
- Integration with GitHub Actions — Agents can trigger and monitor CI/CD pipelines
The key architectural change is agent persistence. Unlike earlier versions that reset context between interactions, Copilot agent mode maintains state across sessions — it remembers your codebase structure, previous decisions, and ongoing tasks.
Availability:
| Tier | Agent Mode | Cost |
|---|---|---|
| Copilot Free | No | $0 |
| Copilot Pro | Yes | $10/month |
| Copilot Business | Yes (admin enabled) | $19/month |
| Copilot Enterprise | Yes (admin enabled) | $39/month |
Windows AI Foundry and the NLWeb Standard #
Windows AI Foundry launches as a runtime for local AI agents on Windows 11 and the upcoming Windows 12. It positions Microsoft to own the agent infrastructure layer on consumer and enterprise PCs.
Windows AI Foundry features:
- Local model hosting via ONNX Runtime — Run quantized models on-device
- Native MCP client support — Connect to Model Context Protocol servers without third-party wrappers
- NPU acceleration — Hardware-accelerated inference on Snapdragon X Elite, Intel Core Ultra, and AMD Ryzen AI processors
- Agent sandboxing — Security boundary for untrusted agent code
NLWeb (Natural Language Web) is Microsoft's proposed open standard for exposing conversational APIs. The protocol allows any website to define:
- Intent schemas (what actions users can take)
- Entity extraction patterns
- Multi-turn dialogue flows
- Authentication and permission scopes
NLWeb example schema:
{
"nlweb_version": "1.0",
"intents": [
{
"name": "search_products",
"description": "Search for products in inventory",
"parameters": {
"query": {"type": "string", "required": true},
"category": {"type": "string", "enum": ["electronics", "clothing", "home"]}
}
}
]
}The strategic play is clear: if NLWeb becomes the standard for agent-web interaction, Microsoft owns the protocol layer regardless of which AI models power the agents.
Azure AI Foundry Agents #
Azure AI Foundry launches as a managed platform for building production-grade AI agents. It competes directly with Amazon Bedrock Agents and Google Vertex AI Agent Builder.
Azure AI Foundry capabilities:
| Feature | Description |
|---|---|
| Multi-agent orchestration | Supervisor-worker patterns, agent swarms |
| Built-in evals | Automated agent performance testing |
| Monitoring | Traces, logs, and metrics for agent runs |
| Model routing | Automatic model selection based on task |
| Tool integration | Native connectors to 100+ Azure services |
| Security | Enterprise auth, RBAC, audit logging |
Pricing: Pay-per-use based on agent invocations and underlying model tokens. Base rate: $0.002 per agent invocation plus model costs.
OpenAI's $6.5 Billion Bet on Hardware #
On May 21, 2025, OpenAI announces the acquisition of Jony Ive's io for $6.5 billion in an all-equity transaction — the largest AI hardware acquisition in history and OpenAI's first serious move beyond software.
What io Brings to OpenAI #
io (formerly known as LoveFrom's hardware division) is Jony Ive's stealth hardware startup that has been developing AI-native device form factors since 2023. The company emerged from Ive's departure from Apple in 2019 and represents his first major hardware initiative since the Apple Watch.
What io reportedly has built:
- AI-first mobile devices — Form factors designed around conversational AI rather than apps
- Ambient computing hardware — Home devices with always-on AI presence
- Wearable prototypes — Screen-free devices that use audio and haptics
- Custom silicon designs — Optimized for on-device inference
Jony Ive's role: Creative Head of OpenAI, reporting directly to Sam Altman. Ive brings his design team of approximately 20 industrial designers and engineers.
Strategic Implications of the io Acquisition #
The all-equity structure signals OpenAI's confidence in its equity value — and its desire to conserve cash for compute infrastructure. At OpenAI's last reported valuation of $300 billion, the $6.5 billion deal represents approximately 2.2% dilution.
Why hardware now:
| Factor | Analysis |
|---|---|
| Differentiation | Software-only AI is becoming commoditized |
| Data access | Hardware provides unique sensor data streams |
| Distribution | Own the customer relationship end-to-end |
| Margin structure | Hardware margins complement SaaS economics |
| Competition | Apple, Google, and Meta all have hardware plays |
The competitive response is immediate:
- Apple accelerates its "Apple Intelligence" roadmap, reportedly moving up Siri improvements
- Google pushes deeper Gemini Nano integration into Android and Pixel devices
- Meta doubles down on Ray-Ban Meta smart glasses and its Orion AR prototype
What Jony Ive Might Build for OpenAI #
Industry speculation centers on three potential form factors:
The "AI Phone" — A minimalist device designed for conversational AI rather than app browsing. Think of it as an iPhone reimagined from first principles for the LLM era.
Ambient Home Device — An always-on AI presence for the home, possibly with a display, possibly audio-only. Positioned against Amazon Echo and Google Nest.
Wearable AI — Screen-free devices that use audio input/output with haptic feedback. Competes with Humane AI Pin and Rabbit R1, but with Ive's design sensibility.
Timeline expectations: Hardware development cycles typically run 18-24 months. Industry analysts expect the first io-designed OpenAI hardware in late 2026 or early 2027.
Mistral Devstral 24B: The Open Coding Agent King #
On May 20, 2025, Mistral releases Devstral 24B under Apache 2.0 license — immediately claiming the title of best open-weight model for coding agent tasks.
Devstral Specifications and Architecture #
Devstral 24B is a 24-billion parameter dense transformer optimized for software engineering workflows. It is trained on a curated dataset of code repositories, technical documentation, and agent trajectories.
Full specifications:
| Specification | Value |
|---|---|
| Parameters | 24 billion |
| Architecture | Dense decoder-only transformer |
| Context window | 128,000 tokens |
| License | Apache 2.0 |
| Training data | Code, docs, agent trajectories |
| Quantization support | INT8, INT4, GGUF formats |
Key architectural decisions:
- Smaller size for efficiency — 24B enables inference on consumer GPUs (RTX 4090, 32GB VRAM)
- Agent-optimized training — Fine-tuned for tool use and multi-step workflows
- Open weights — Full model weights available for download and modification
Devstral Benchmark Performance #
Devstral 24B achieves competitive performance with models 3x its size on coding benchmarks.
| Benchmark | Devstral 24B | Llama 3.1 70B | GPT-4.1 mini | Claude 3.5 Haiku |
|---|---|---|---|---|
| SWE-bench Verified | 46.8% | 42.1% | 45.3% | 41.7% |
| HumanEval | 92.1% | 89.4% | 91.2% | 88.6% |
| LiveCodeBench | 58.3% | 52.7% | 54.1% | 51.9% |
| MBPP | 84.7% | 81.2% | 83.5% | 79.4% |
The SWE-bench Verified result is particularly notable. At 46.8%, Devstral 24B outperforms Llama 3.1 70B (42.1%) despite being one-third the size. This demonstrates efficient training and architecture optimization.
Why Devstral Matters for Builders #
Devstral 24B enables three deployment scenarios that were previously impractical:
- Local development agents — Run a capable coding agent on a consumer GPU without cloud API calls
- Cost-effective APIs — Self-host Devstral for $0.001/1K tokens vs. $0.015/1K for GPT-4.1
- Privacy-sensitive workflows — Codebases that cannot use cloud APIs get capable local agents
Deployment requirements:
| Hardware | VRAM | Inference Speed |
|---|---|---|
| RTX 4090 | 24GB | ~40 tokens/sec |
| A100 40GB | 40GB | ~60 tokens/sec |
| M2 Ultra | 128GB unified | ~35 tokens/sec |
| 2x A10G | 48GB combined | ~50 tokens/sec |
Mistral's positioning is clear: Devstral proves that open models can compete with proprietary frontier models on specific tasks. The 24B size hits a sweet spot of capability and deployability.
Google I/O 2025: Gemini 2.5 Goes Mainstream #
Google I/O 2025 (May 20-21) marks the general availability of Gemini 2.5 and dramatically expands Google's AI product surface area. The theme is ubiquity — Gemini in Search, Workspace, Android, Chrome, and dedicated agent products.
Gemini 2.5 Pro Deep Think and Flash GA #
Gemini 2.5 Pro graduates from preview to general availability with Deep Think mode and a 1 million token context window.
Gemini 2.5 Pro specifications:
| Specification | Value |
|---|---|
| Context window | 1,000,000 tokens (2M in preview) |
| Knowledge cutoff | March 2025 |
| Multimodal | Text, image, audio, video, PDF |
| Reasoning modes | Standard, Deep Think |
| API pricing | $1.25/1M input, $10/1M output |
Deep Think mode is an experimental enhanced reasoning capability that achieves state-of-the-art results on mathematics and coding benchmarks. It uses additional compute for chain-of-thought reasoning before generating responses.
Gemini 2.5 Flash also goes GA, optimized for latency-sensitive applications:
| Specification | Flash | Pro |
|---|---|---|
| Context window | 1,000,000 tokens | 1,000,000 tokens |
| API pricing | $0.15/1M input, $0.60/1M output | $1.25/1M input, $10/1M output |
| Best for | High-volume, latency-sensitive | Complex reasoning, accuracy |
AI Mode in Search Rolls Out US-Wide #
Google Search's AI Mode exits Labs and rolls out to all US users on May 20, 2025. This is Google's response to Perplexity and ChatGPT's threat to search.
AI Mode capabilities:
- Direct answers — AI-generated responses above traditional search results
- Multi-step reasoning — Complex queries requiring synthesis across sources
- Personal context — Integration with Gmail for personalized results (opt-in)
- Visual responses — Charts, graphs, and images in search results
- Shopping integration — Product comparisons and purchase recommendations
The rollout strategy:
- Phase 1 (May 2025): US users, English queries
- Phase 2 (June 2025): Expansion to 100+ countries
- Phase 3 (July 2025): Additional language support
Project Astra and Project Mariner #
Project Astra (Google's real-time multimodal AI) enters public beta with capabilities that blur the line between AI assistant and companion.
Project Astra features:
- Live video understanding — Point your camera at anything and ask questions
- Real-time voice conversations — Natural dialogue with low latency
- Screen sharing — Share your screen and get assistance
- Memory across sessions — Astra remembers previous conversations
Project Mariner (Google's browser-based agent) opens public beta for Google AI Ultra subscribers.
Project Mariner capabilities:
- Autonomous web navigation — Browse websites and complete tasks
- Form filling — Automatically complete forms with your information
- Purchase assistance — "Buy for me" functionality with Google Pay
- Integration with Google services — Calendar, Maps, Gmail actions
Example Mariner task: "Find 2 affordable tickets for this Saturday's Reds game in the lower level and make a dinner reservation nearby." Mariner can search ticketing sites, compare prices, purchase tickets, and book a restaurant — all with user approval at each step.
Veo 3, Imagen 4, and Google's Media Stack #
Google's media generation capabilities take a major leap forward at I/O 2025:
Veo 3 — Video generation with native audio:
| Feature | Veo 3 |
|---|---|
| Resolution | Up to 1080p |
| Duration | Up to 60 seconds |
| Audio | Native generation (dialogue, sound effects) |
| Availability | Google AI Ultra (US) |
The native audio capability is the breakthrough. Previous video generators required separate audio tracks. Veo 3 generates synchronized audio including dialogue, environmental sounds, and music.
Imagen 4 — Enhanced image generation:
- Better text rendering in images
- Higher fidelity details
- Faster generation speeds
- Available in Gemini app and Workspace
Google Flow — AI filmmaking tool:
- Natural language scene descriptions
- Camera control (pan, zoom, tracking)
- Scene builder for multi-shot sequences
- Integration with Veo and Imagen
AI Ultra: The $250/Month Tier #
Google introduces AI Ultra — a $249.99/month subscription tier positioned above the existing $19.99/month AI Pro plan.
AI Ultra includes:
| Feature | Ultra | Pro | Free |
|---|---|---|---|
| Price | $249.99/mo | $19.99/mo | $0 |
| Gemini 2.5 Pro rate limits | 10x higher | Standard | Limited |
| Veo 3 access | Yes | No | No |
| Project Mariner | Yes | No | No |
| Deep Think priority | First | Standard | Limited |
| Workspace features | Enterprise | Standard | Basic |
The pricing signals Google's segmentation strategy: Pro for power users, Ultra for professionals and enterprises where AI is a core workflow component.
Anthropic Claude 4: The Coding Model to Beat #
On May 22, 2025, Anthropic releases Claude 4 Sonnet and Claude 4 Opus — claiming "world's best coding model" status and extending its lead in AI agent capabilities.
Claude 4 Sonnet and Opus Specifications #
Claude 4 Opus is Anthropic's flagship model, designed for complex software engineering and long-running agent workflows.
Claude 4 Opus specifications:
| Specification | Value |
|---|---|
| Context window | 200,000 tokens |
| Output maximum | 128,000 tokens |
| Knowledge cutoff | February 2025 |
| Input price | $15/million tokens |
| Output price | $75/million tokens |
| Extended thinking | Up to 256K thinking budget |
Claude 4 Sonnet balances performance and efficiency:
| Specification | Value |
|---|---|
| Context window | 200,000 tokens |
| Output maximum | 128,000 tokens |
| Input price | $3/million tokens |
| Output price | $15/million tokens |
| Extended thinking | Up to 128K thinking budget |
Claude 4 Benchmark Performance #
Claude 4 Opus achieves industry-leading performance on software engineering benchmarks:
| Benchmark | Claude 4 Opus | Claude 4 Sonnet | GPT-4.1 | Gemini 2.5 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 72.5% / 79.4%* | 72.7% / 80.2%* | 54.6% | 63.2% |
| Terminal-bench | 43.2% / 50.0%* | 35.5% / 41.3%* | 30.3% | 25.3% |
| GPQA Diamond | 79.6% / 83.3%* | 75.4% / 83.8%* | 66.3% | 83.0% |
| HumanEval | 97.3% | 96.8% | 91.2% | 92.1% |
| LiveCodeBench | 70.1% | 65.3% | 54.1% | 68.7% |
*With extended thinking enabled
The SWE-bench Verified result is decisive. At 72.5% (79.4% with extended thinking), Claude 4 Opus establishes a new state-of-the-art, outperforming GPT-4.1 by nearly 18 percentage points and Gemini 2.5 Pro by over 9 points.
What the benchmarks mean in practice:
- SWE-bench tests real-world software engineering — bug fixing, feature implementation, test writing
- Terminal-bench measures command-line coding tasks
- 72.5% on SWE-bench means Claude 4 Opus successfully completes nearly 3 out of 4 real GitHub issues
Extended Thinking with Tool Use #
Claude 4 introduces extended thinking with tool use — a capability that allows the model to alternate between reasoning and tool invocation during complex problem-solving.
How it works:
- User requests a complex task (e.g., "Refactor this microservice to use a new database schema")
- Claude enters extended thinking mode with a budget of reasoning tokens
- During thinking, Claude can call tools — search the web, read files, execute code
- Tool results feed back into the reasoning process
- Claude synthesizes the final answer after reasoning completes
Extended thinking configuration:
{
"model": "claude-opus-4-20250514",
"max_tokens": 128000,
"thinking": {
"type": "enabled",
"budget_tokens": 64000
},
"tools": [{...}]
}The breakthrough is the combination of deep reasoning with tool use. Previous models could reason or use tools; Claude 4 can do both simultaneously within a single reasoning trace.
Claude Code Goes Generally Available #
Claude Code exits research preview and becomes generally available on May 22, 2025 with expanded integrations and new capabilities.
Claude Code GA features:
| Feature | Description |
|---|---|
| VS Code integration | Native extension for direct editor integration |
| JetBrains support | Plugin for IntelliJ, PyCharm, WebStorm |
| GitHub Actions | Background task execution via GitHub Actions |
| Extended thinking | Access to Claude 4's reasoning capabilities |
| Files API | Direct file upload and management |
| Code execution | Safe sandboxed code execution |
| MCP connector | Native Model Context Protocol support |
Pricing:
| Plan | Claude Code | Extended Thinking |
|---|---|---|
| Free | Web, mobile only | No |
| Pro ($20/mo) | Full access | Yes |
| Max ($100-200/mo) | Full access | Yes + higher limits |
| Enterprise | Custom | Yes |
API capabilities for builders:
Anthropic also releases four new API capabilities for building agents:
- Code execution tool — Safe, sandboxed Python execution
- MCP connector — Connect to any MCP server
- Files API — Upload and manage files for Claude to access
- Prompt caching (1 hour) — Cache prompts for repeated use
Cursor 1.0: Bugbot, Memories, and the MCP Revolution #
On May 22, 2025, Anysphere releases Cursor 1.0 — graduating the AI editor from rapid beta iterations to a mature, enterprise-ready platform.
Bugbot: Automatic Code Review #
Bugbot is Cursor 1.0's headline feature — an AI agent that automatically reviews pull requests and identifies issues before human review.
Bugbot capabilities:
| Issue Type | Detection Rate | Example |
|---|---|---|
| Logic errors | High | Off-by-one errors, null pointer risks |
| Security vulnerabilities | High | SQL injection risks, XSS vectors |
| Performance regressions | Medium | N+1 queries, inefficient algorithms |
| Style violations | High | Deviation from project conventions |
| Edge cases | Medium | Missing error handling, boundary conditions |
Bugbot workflow:
- Developer opens a pull request on GitHub
- Bugbot automatically reviews the diff
- Issues are posted as PR comments with explanations
- "Fix in Cursor" button allows one-click fixes
- Human reviewers get a pre-checked PR
Availability: Bugbot is included with Cursor Pro ($20/month) and Business ($40/month) tiers.
Memories: Persistent Context #
Cursor Memories provides persistent, cross-project context that survives across sessions.
How Memories work:
- Cursor observes patterns in your codebase and interactions
- It extracts "facts" about your preferences and architecture
- Facts are stored per-project and globally
- Future sessions reference Memories for better context
Example Memories:
- "This project uses TypeScript strict mode and never uses
any" - "The user prefers functional components over class components"
- "Authentication is handled via JWT in the
auth.tsfile" - "This codebase uses SWR for data fetching, not React Query"
Memory management: Users can view, edit, and delete Memories in Settings → Rules. The feature is in beta and requires opt-in.
One-Click MCP Setup #
Cursor 1.0 makes MCP server installation one-click through its new MCP marketplace.
MCP marketplace features:
- Curated list of official MCP servers (25+ at launch)
- One-click installation with auto-configuration
- OAuth support for authenticated servers
- "Add to Cursor" buttons for MCP developers
Popular MCP servers at launch:
| Server | Function |
|---|---|
| filesystem | Local file operations |
| github | Repository management |
| postgres | Database queries |
| slack | Team messaging |
| notion | Documentation access |
| browser | Web automation |
For MCP developers: Cursor provides a deeplink generator at cursor.com/docs/deeplinks to create "Add to Cursor" buttons for README files.
Background Agents Go GA #
Cursor Background Agents — the remote agent feature for long-running tasks — becomes generally available.
Background Agent capabilities:
- Async execution — Start a task, close your laptop, come back to results
- Long-running tasks — Agents can run for hours
- CI/CD integration — Trigger agents from GitHub Actions, GitLab CI
- Notifications — Slack, Discord, or email when tasks complete
Starting a Background Agent:
- Click the cloud icon in chat
- Or press
Cmd/Ctrl+E(privacy mode must be disabled) - Describe the task
- Agent runs remotely, notifies on completion
Use cases:
- Multi-hour refactoring jobs
- Dependency upgrades across large codebases
- Test generation for legacy code
- Documentation generation
Jupyter Notebook Support #
Cursor 1.0 adds first-class Jupyter Notebook support — a significant expansion beyond traditional code files.
Jupyter capabilities:
- AI-assisted cell generation
- Edit multiple cells in one pass
- Code explanation inline
- Error debugging assistance
- Chart and visualization generation
Model support: Jupyter features work with Sonnet models (Claude 3.5/4 Sonnet) initially.
Additional Cursor 1.0 Features #
Richer chat responses:
- Mermaid diagram rendering in chat
- Markdown table display
- Mathematical equation support
New settings and dashboard:
- Usage analytics by tool and model
- Team spend tracking
- Individual user statistics
Keyboard shortcuts:
Cmd/Ctrl+E— Open Background Agent panel
Improvements:
@Linkand web search can parse PDFs- Network diagnostics in settings
- Parallel tool calls for faster responses
- Collapsible tool calls in chat
Account features:
- Enterprise users locked to stable releases
- Team admins can disable Privacy Mode
- Admin API for usage metrics and spend data
Models:
- Max mode now available for Gemini 2.5 Flash
What It All Means: The Market Resets #
This week establishes five converging trends that reshape the AI tooling landscape. Builders who understand these shifts can make better technology choices.
Trend 1: Agents Are the Primary Interface #
Every major launch this week centers on agentic AI. The transition from "AI-assisted" to "AI-agent" is now the consensus view across the industry.
| Product | Agent Type | Key Capability |
|---|---|---|
| GitHub Copilot Agent Mode | IDE-embedded | Multi-file refactoring with persistence |
| Claude Code GA | Terminal-native | Long-horizon workflows, skills, hooks |
| Project Mariner | Browser-native | Web navigation and task completion |
| Azure AI Foundry | Cloud-orchestrated | Multi-agent systems, enterprise scale |
| Cursor Background Agents | Remote execution | Async long-running tasks |
What this means: The unit of interaction is shifting from "prompt and response" to "delegate and review." Developers will increasingly act as supervisors rather than operators.
Trend 2: Hybrid Reasoning Becomes Standard #
Hybrid reasoning — the ability to toggle between fast responses and deep analysis — is now table stakes for frontier models.
| Model | Fast Mode | Deep Mode | Toggle Mechanism |
|---|---|---|---|
| Claude 4 | Standard | Extended thinking (256K budget) | API parameter |
| Gemini 2.5 Pro | Standard | Deep Think | Model selection |
| GPT-4.1 | Standard | GPT-4.1 extended | Model selection |
Claude 4's innovation is granularity. The budget_tokens parameter lets developers dial reasoning depth precisely, rather than choosing between binary modes. This enables cost optimization: use minimal thinking for simple tasks, maximum thinking for complex analysis.
Trend 3: Open Standards Win #
Microsoft's NLWeb proposal and MCP's rapid adoption demonstrate the industry's shift toward composable, open ecosystems.
Why open standards matter now:
- Vendor lock-in resistance — Enterprises want to avoid dependency on single providers
- Agent interoperability — Agents should work across platforms, not within walled gardens
- Developer leverage — Standards give developers options and bargaining power
The standards landscape:
| Standard | Sponsor | Purpose |
|---|---|---|
| MCP | Anthropic | Tool and resource exposure to LLMs |
| NLWeb | Microsoft | Web-to-agent communication protocol |
| A2A | Agent-to-agent communication | |
| Agent Skills | Anthropic | Portable agent capabilities |
Trend 4: Hardware Becomes Strategic #
OpenAI's $6.5 billion io acquisition signals that hardware is the next competitive frontier. When differentiation at the software layer becomes difficult, companies look to hardware for moats.
The hardware strategies:
| Company | Approach | Advantage |
|---|---|---|
| OpenAI | Acquire io, Jony Ive design | Premium positioning, Apple-style differentiation |
| Deep Android/Pixel integration | Distribution, on-device inference | |
| Apple | Apple Intelligence, Neural Engine | Privacy, ecosystem lock-in |
| Meta | Ray-Ban Meta, Orion AR | Social context, wearable form factors |
| Anthropic | Software-only (currently) | Focus, platform independence |
Implication for builders: Expect more on-device capabilities and tighter hardware-software integration. The cloud-only AI era is ending.
Trend 5: The Model Layer Commoditizes #
When Claude 4, GPT-4.1, Gemini 2.5 Pro, and Devstral 24B all achieve 70%+ on coding benchmarks, raw capability is no longer the differentiator.
What's differentiating now:
| Layer | Differentiator | Example |
|---|---|---|
| Interface | UX, editor integration | Cursor's tab completion |
| Agent framework | Skills, hooks, memory | Claude Code's skill system |
| Tooling | MCP servers, extensions | Cursor's MCP marketplace |
| Deployment | Latency, reliability, cost | Gemini 2.5 Flash for speed |
| Ecosystem | Integrations, workflows | GitHub Copilot's Actions integration |
The portfolio approach wins: Rather than betting on a single model, successful teams will use Claude 4 for deep reasoning, Cursor for daily coding, Gemini for multimodal tasks, and Devstral for self-hosted scenarios.
Which Tools Win for Which Workflows #
With so many capable options, decision frameworks matter more than benchmark chasing. Here's how to choose based on real-world workflows.
Coding Models: Claude 4 vs GPT-4.1 vs Gemini 2.5 Pro #
For complex software engineering (multi-file, architectural):
| Factor | Claude 4 Opus | Claude 4 Sonnet | GPT-4.1 | Gemini 2.5 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 72.5% | 72.7% | 54.6% | 63.2% |
| Context window | 200K | 200K | 1M | 1M |
| Cost (input/output) | $15/$75 | $3/$15 | $2/$8 | $1.25/$10 |
| Best for | Frontier tasks | Production work | High volume | Multimodal |
Recommendation: Claude 4 Opus for high-stakes, complex tasks where accuracy matters more than cost. Claude 4 Sonnet for production workloads. GPT-4.1 for high-volume, cost-sensitive applications. Gemini 2.5 Pro when multimodal inputs are required.
Editors: Cursor 1.0 vs Claude Code vs Copilot #
For daily development work:
| Factor | Cursor 1.0 | Claude Code | GitHub Copilot |
|---|---|---|---|
| Interface | AI-first editor | Terminal agent | IDE plugin |
| Tab completion | Yes | No | Yes |
| Agent capabilities | Yes (Composer) | Yes (native) | Yes (agent mode) |
| PR review (Bugbot) | Yes | No | Limited |
| Memories/context | Yes | Yes | Limited |
| MCP support | Yes | Yes | Limited |
| Price | $20/mo | $20/mo (via Claude) | $10-39/mo |
Recommendation: Cursor 1.0 for developers who want AI deeply integrated into their editing experience. Claude Code for those who prefer terminal workflows and long-horizon automation. Copilot for teams standardized on Microsoft tooling who need enterprise controls.
Agents: Azure AI Foundry vs Claude Code vs Project Mariner #
For production agent deployment:
| Factor | Azure AI Foundry | Claude Code | Project Mariner |
|---|---|---|---|
| Multi-agent | Native | Subagents | No |
| Coding focus | General | Yes | No |
| Web automation | Via tools | Limited | Native |
| Enterprise features | Extensive | Limited | Limited |
| Cloud integration | Azure-native | API-based | Google-only |
| Pricing | Per-invocation | Per-token | Included in Ultra |
Recommendation: Azure AI Foundry for enterprise multi-agent systems. Claude Code for coding-specific agents with complex workflows. Project Mariner for web-based task automation within Google's ecosystem.
The Week Ahead: What to Watch #
The launches this week are just the starting gun. The real impact unfolds as these products reach production workloads and user feedback loops begin.
Immediate Questions (Next 30 Days) #
1. Will Claude 4 maintain its benchmark lead in real-world usage?
SWE-bench is a controlled environment. Real codebases have messy dependencies, legacy code, and undocumented behaviors. Early user reports suggest Claude 4 Opus lives up to the hype for greenfield projects, but struggles with some legacy refactoring tasks.
2. How quickly does Cursor's Bugbot become standard practice?
The "Fix in Cursor" workflow is elegant, but requires teams to change their code review process. Watch for adoption curves in the next month. If Bugbot catches real bugs that human reviewers miss, it becomes table stakes quickly.
3. Can Google's AI Mode stem the threat from ChatGPT and Perplexity?
AI Mode in Search is Google's defensive move. The question is whether it can deliver answers authoritative enough to prevent users from going elsewhere. Initial feedback suggests speed is good, but attribution and sourcing need improvement.
4. Does Devstral 24B disrupt commercial API pricing?
At 24B parameters and Apache 2.0 license, Devstral enables self-hosted coding agents at a fraction of cloud API costs. If deployment tooling matures quickly, we could see enterprise adoption that pressures OpenAI and Anthropic pricing.
5. When do we see hints of Jony Ive's OpenAI hardware?
The io acquisition is long-term. But leaks, patents, or job postings could reveal direction sooner than expected. Watch for industrial design hires and supply chain rumors.
The Meta-Shift: Competition Synchronizes #
The real story this week isn't any single launch — it's the synchronization. Every major AI company ships flagship products within four days. This isn't accidental; it's market maturation.
What synchronized competition means:
- Shorter product cycles — No more year-long gaps between launches
- Feature parity pressure — If one product has it, others must match quickly
- Price compression — Competition drives costs down
- Builder benefit — More options, better tools, lower prices
The end of leisurely monopolies:
| Era | Pattern | Winner |
|---|---|---|
| 2022-2023 | OpenAI leads, others follow | First movers |
| 2024 | Competition emerges | Best products |
| 2025+ | Simultaneous launches | Users |
For builders, this is the best possible outcome. Capabilities that were frontier six months ago are now table stakes. The gap between research and production has collapsed from years to weeks. The winner-take-all narrative is replaced by a portfolio approach — use the right tool for the right job.
The new normal: Claude 4 for deep reasoning, Cursor for daily coding, Gemini for multimodal tasks, Devstral for self-hosted scenarios, and Copilot for enterprise standardization. The tools are mature enough that the question shifts from "which model is best?" to "which workflow am I optimizing for?"
FAQ: The Week Everything Shipped #
Q: What is Claude 4 and how does it compare to Claude 3.7 Sonnet? #
Claude 4 is Anthropic's next-generation model family released on May 22, 2025, consisting of Claude 4 Opus and Claude 4 Sonnet. Both models achieve significant improvements over Claude 3.7 Sonnet: Opus reaches 72.5% on SWE-bench Verified (vs. 3.7's 62.3%), while Sonnet 4 hits 72.7%. The key upgrade is extended thinking with tool use — Claude 4 can reason deeply while calling tools mid-reasoning. Pricing remains consistent with previous generations.
Q: What are the key features of Cursor 1.0? #
Cursor 1.0, released May 22, 2025, introduces Bugbot for automatic PR review, Memories for persistent cross-session context, one-click MCP setup, and general availability of Background Agents. Bugbot automatically reviews GitHub PRs and suggests fixes. Memories learns your codebase patterns and preferences. Background Agents enable async long-running tasks. The 1.0 release also adds Jupyter notebook support and richer chat with Mermaid diagrams.
Q: What did Google announce at I/O 2025? #
Google I/O 2025 (May 20-21) announced Gemini 2.5 Pro and Flash GA, AI Mode in Search for all US users, Project Astra real-time AI in beta, Project Mariner browser agent for Ultra subscribers, and Veo 3 video generation with native audio. The company also introduced AI Ultra — a $249.99/month subscription tier with highest rate limits and early access to features. Imagen 4 for higher-fidelity images and Google Flow for AI filmmaking were also unveiled.
Q: What is Microsoft's "open agentic web" thesis from Build 2025? #
Microsoft Build 2025 (May 19-22) proposed the "open agentic web" — a vision where AI agents interoperate across platforms using open standards rather than proprietary ecosystems. Key components include NLWeb (a protocol for websites to expose conversational APIs), Windows AI Foundry (local agent runtime for Windows), and Azure AI Foundry (managed multi-agent platform). GitHub Copilot agent mode also graduated to GA with persistence and Actions integration.
Q: Why did OpenAI acquire Jony Ive's io for $6.5 billion? #
OpenAI acquired Jony Ive's io for $6.5 billion in an all-equity deal on May 21, 2025, to build AI-native hardware devices. Ive, former Apple design chief, joins as Creative Head. The acquisition signals OpenAI's move from software-only to integrated hardware/software products. io has reportedly been developing AI-first mobile devices, ambient home hardware, and wearables since 2023. The first products are expected in 2026-2027.
Q: How does Mistral Devstral 24B compare to Claude 4 for coding? #
Mistral Devstral 24B achieves 46.8% on SWE-bench Verified, making it the best open-source coding model under 70B parameters, though it trails Claude 4 Opus's 72.5%. Released May 20, 2025, Devstral offers Apache 2.0 licensing (fully open), runs on consumer GPUs (24GB VRAM), and costs a fraction of commercial APIs. It's ideal for self-hosted agents and privacy-sensitive codebases, while Claude 4 remains superior for complex frontier tasks requiring deep reasoning.
Q: What is the pricing for Google's AI Ultra plan? #
Google AI Ultra costs $249.99 per month, launching at I/O 2025 on May 20. The tier sits above the existing $19.99/month AI Pro plan. Ultra includes 10x higher rate limits on Gemini 2.5 Pro, access to Veo 3 video generation, Project Mariner browser agent, priority access to Deep Think mode, and enhanced Workspace features. It targets professionals and enterprises where AI is a core workflow component.
Q: When is Claude Code generally available? #
Claude Code became generally available on May 22, 2025, alongside the Claude 4 launch. After a research preview starting in February 2025, Claude Code now supports VS Code and JetBrains extensions, GitHub Actions integration for background tasks, and new API capabilities including the code execution tool, MCP connector, Files API, and 1-hour prompt caching. Access requires Claude Pro ($20/month) or higher tiers.
Q: What is hybrid reasoning in Claude 4? #
Hybrid reasoning allows Claude 4 to toggle between fast responses and deep chain-of-thought analysis via the thinking API parameter. Unlike separate reasoning models (OpenAI's o-series), Claude 4 uses the same model weights for both modes. Extended thinking mode budgets up to 256K tokens for reasoning before generating output, enabling 72.5% performance on SWE-bench Verified. Pricing is identical for both modes; you only pay for tokens actually generated.
Q: What new video generation capabilities did Google announce? #
Google announced Veo 3 at I/O 2025, featuring native audio generation alongside video. Veo 3 generates up to 60 seconds of 1080p video with synchronized audio including dialogue, sound effects, and environmental audio — a significant advancement over video-only generators. It's initially available to Google AI Ultra subscribers in the US. Google also introduced Google Flow, an AI filmmaking tool that combines Veo, Imagen 4, and Gemini for scene-based video creation.
Q: How does Cursor 1.0's Bugbot work for PR reviews? #
Bugbot is an AI agent that automatically reviews pull requests and posts findings as GitHub comments. When a PR is opened, Bugbot analyzes the diff for logic errors, security vulnerabilities, performance regressions, and style violations. Each finding includes an explanation and a "Fix in Cursor" button that opens the editor with a pre-filled prompt to resolve the issue. Bugbot is included with Cursor Pro ($20/month) and Business ($40/month) tiers.
Q: What are Project Astra and Project Mariner from Google? #
Project Astra is Google's real-time multimodal AI entering public beta, while Project Mariner is a browser-based agent for Google AI Ultra subscribers. Astra offers live video understanding, real-time voice conversations, and screen sharing capabilities through the Gemini app. Mariner can autonomously navigate websites, fill forms, and complete tasks like purchasing tickets or making reservations — all with user approval at each step. Both represent Google's agent strategy for 2025.
Q: What is the NLWeb standard Microsoft proposed at Build 2025? #
NLWeb (Natural Language Web) is Microsoft's proposed open standard for websites to expose conversational APIs to AI agents. Announced at Build 2025, NLWeb defines intent schemas, entity extraction patterns, multi-turn dialogue flows, and authentication scopes. If adopted, any website could provide a machine-readable API that agents consume, enabling interoperable web automation across platforms. It positions Microsoft to own the protocol layer of the "agentic web."
Q: How much does Claude 4 cost compared to Claude 3.7 Sonnet? #
Claude 4 maintains the same pricing structure as previous generations. Claude 4 Sonnet costs $3 per million input tokens and $15 per million output tokens — identical to Claude 3.7 Sonnet. Claude 4 Opus costs $15 per million input and $75 per million output. Extended thinking uses the same pricing; thinking tokens are billed as output tokens, but you control the budget via the budget_tokens parameter up to 256K.
Q: What are Cursor Memories and how do they work? #
Cursor Memories is a beta feature in Cursor 1.0 that provides persistent, cross-session context by learning from your codebase and interactions. As you work, Cursor extracts "facts" about your preferences (e.g., "uses TypeScript strict mode," "prefers SWR over React Query") and stores them per-project and globally. Future sessions reference these Memories for better context. Users can view, edit, and delete Memories in Settings → Rules.
Q: When is the first io-designed OpenAI hardware expected? #
Industry analysts expect the first Jony Ive-designed OpenAI hardware in late 2026 or early 2027. Hardware development cycles typically run 18-24 months, and the io acquisition closed on May 21, 2025. Speculation centers on three form factors: an AI-first mobile device, an ambient home device, and wearables. Ive's design team of approximately 20 designers and engineers joined OpenAI as part of the $6.5 billion all-equity deal.
Closing: Building in the New Normal #
This week redefines what "state of the art" means. The tools that shipped May 19-22, 2025 aren't incremental improvements — they're the foundation for the next phase of AI-native development. Claude 4 establishes new coding benchmarks. Cursor 1.0 brings enterprise-ready AI to daily engineering. Google makes Gemini ubiquitous across Search, Workspace, and dedicated agents. Microsoft proposes open standards for the agentic web. OpenAI bets big on hardware. Mistral proves open models can compete.
The convergence creates opportunity. Six months ago, frontier capabilities were locked behind research preview waitlists. Today, they're generally available with clear pricing and documented APIs. The barrier to building AI-native products has never been lower.
The portfolio approach wins. No single tool dominates every use case. Successful teams will use Claude 4 Opus for complex reasoning, Cursor 1.0 for daily coding, Gemini 2.5 for multimodal tasks, Devstral 24B for self-hosted scenarios, and Copilot for enterprise standardization. The question shifts from "which model is best?" to "which workflow am I optimizing for?"
William Spurlock builds custom AI automation systems and immersive digital experiences. I track these tools not as a spectator, but as a practitioner shipping production workflows. If you need implementation support—whether architecting Claude 4 integrations, setting up Cursor 1.0 for your team, or building custom agent systems—I can accelerate your path.
For AI automation + growth:
I design and implement custom AI agent systems, n8n/MCP automations, and growth engineering pipelines. If you need help integrating Claude 4 into your development workflow, setting up Cursor 1.0 with team-wide configurations, or building multi-agent orchestration with Azure AI Foundry, let's talk.
Book an AI automation strategy call →
For custom web design + digital experiences:
I build 5-figure immersive websites and digital experiences using the same stack I write about — Next.js, GSAP, Three.js, and AI-augmented workflows. If you're looking to create a flagship web presence that stands out, let's discuss your project.
Start a custom website project →
Related Reading #
- Claude 3.7 Sonnet + Claude Code: The Hybrid Reasoning Era Begins — The precursor to Claude 4, establishing hybrid reasoning as a paradigm
- Google A2A Protocol: The Open Answer to MCP — How Google's agent-to-agent protocol compares to Anthropic's MCP
- Gemini 2.5 Pro and the Studio Ghibli Wave — Earlier coverage of Gemini 2.5's capabilities before I/O 2025
- OpenAI o3 and Codex CLI: Terminal Agents Arrive — OpenAI's competing terminal agent approach
Last updated: May 22, 2025
Related Posts

Google I/O 2026 Action List: How I Prompted Gemini 3.5 Flash and Antigravity Workflows
Google I/O 2026 just reset the AI tooling landscape. Here's the 9-action checklist for builders who want to ship this week, not just watch the keynote.

Anthropic vs. OpenAI vs. Google: The State of the Frontier in May 2026
A head-to-head breakdown of the three AI giants in May 2026: Claude Opus 4.6, GPT-5.3 and 5.4, Gemini 3.1 Pro. Real specs, real pricing, and what actually matters for builders.

Kimi K2 Open Weights: How I Prompted Moonshot's Frontier Model for Agentic Tool Use
How I direct Kimi K2 by Moonshot AI for agentic workflows, long-context tool calling, and workflow automation. A 1 trillion parameter MoE model with competitive benchmarks at 5-17x lower cost than GPT-5 and Claude.




