Keynote Summary · Frameworks · Applied Guide for ITGIRL / Agenicore / Cocode / Filmteams
The keynote was not a model drop — it was a strategy signal. Anthropic's CPO Ami Vora opened by declaring: "Today is about how we are making our products work better for you." The throughline was orchestration, agent reliability, and giving serious builders infrastructure they can run for hours, not minutes.
Anthropic is now using the full capacity of SpaceX's Colossus data center — 300 MW, 220,000+ NVIDIA GPUs. The direct result: rate limits for Claude Code Pro/Max/Enterprise users doubled immediately.
Opus 4.7 can be called on-demand by Sonnet to provide advice on hard decisions — "advisor mode." One team (eve) got frontier-model quality at one-fifth the cost using this pattern.
Multi-agent orchestration (public beta), Outcomes (public beta), and Dreaming (research preview) all dropped. This is Anthropic's answer to "how do teams ship 10× faster."
Routines are "higher-order prompts" — you set up async automations and wake up to merge-ready PRs. The head of Claude Code says most of his code is now built by routines.
Opus 4.7 has genuine visual design judgment. Claude Design was announced (labs.anthropic.com). Planning mode on amp switched to Opus 4.7 — better visual outputs across the board.
This time last year, agents worked for minutes. Today builders run them for hours. Shopify and Mercado Libre (23,000 engineers!) are targeting 90% autonomous coding by Q3 2025.
These are the repeatable frameworks and strategies Anthropic's leaders surfaced — extracted and translated for your work.
Instead of running Opus for everything (expensive) or Sonnet for everything (lower quality), you architect a two-tier system. Sonnet handles execution — writing code, drafting content, processing tasks. Opus serves as on-call advisor for high-judgment moments: reviewing output quality, making tricky decisions, setting strategy.
The result: frontier-level quality at dramatically lower cost. One production team cut costs fivefold while maintaining its quality benchmarks.
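The routing logic behind this pattern can be sketched in a few lines. Everything below is illustrative: the model calls are stubbed out (a real system would call the Anthropic Messages API), and the self-reported confidence score, the toy length heuristic, and the 0.7 threshold are assumed conventions for the sketch, not part of any Claude API.

```python
# Minimal sketch of the two-tier "executor + on-call advisor" pattern.
CONFIDENCE_THRESHOLD = 0.7
CALL_LOG = []  # records (model, task) pairs so spend can be audited

def execute(model, task, advice=None):
    """Stub for a model call. Returns text plus a self-reported
    confidence; here confidence is derived from task length as a
    toy stand-in for a real self-assessment step."""
    CALL_LOG.append((model, task))
    base = 0.9 if len(task) < 40 else 0.5  # illustrative heuristic only
    if advice:
        base += 0.3  # advisor input raises execution quality
    return {"text": f"[{model}] {task}", "confidence": min(base, 1.0)}

def run_task(task):
    """The cheap executor tier does the work; the expensive advisor
    tier is consulted only when the executor reports low confidence."""
    draft = execute("sonnet-executor", task)
    if draft["confidence"] < CONFIDENCE_THRESHOLD:
        advice = execute("opus-advisor", "Advise on: " + task)
        draft = execute("sonnet-executor", task, advice=advice["text"])
    return draft
```

The design choice that matters: the expensive model never executes, it only advises, so its tokens are spent only on the small fraction of decisions the cheap tier flags as hard.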
Dianne Penn from Anthropic Research dropped this as "classic advice": build things that don't quite work today on the assumption that they'll start working with the next model upgrade. Don't let current model limits define your architecture ceiling.
This is the AI builder's version of Moore's Law thinking. If a workflow fails at 70% reliability today but needs 90%, ship the scaffold and trust the model curve.
"Design for the next model. Build things that don't quite work today on the assumption that they'll start working with a model upgrade in the future."
— Dianne Penn, Head of Product for Research, Anthropic
Anthropic's Dianne Penn shared what separates the teams getting the most value from Claude from everyone else. Three things:
1. They don't manually review outputs. They build automated evaluation pipelines that score Claude's work against defined criteria at every step. You can't improve what you don't measure systematically.
2. They resist the urge to over-engineer agent frameworks. Clean, minimal scaffolding that's easy to iterate on. The power is in the model + the task definition, not the orchestration complexity.
3. They find use cases others haven't figured out yet. Not the obvious AI features — the adjacent, creative applications that unlock disproportionate value for their specific domain.
Anthropic's "Dreaming" feature for Managed Agents introduces a self-improvement loop: an agent reviews its own previous sessions overnight, identifies what it missed or did poorly, and generates new memory or playbooks. In the demo, the agent created a descent-playbook.md from reviewing past drone-landing sessions.
Even if you can't use Dreaming directly yet (research preview), the framework is something you can implement manually or prompt-engineer into your own loops.
1. The agent executes multi-step work and logs its session context.
2. It reads its own session history and identifies gaps, errors, or missing context.
3. It generates an updated playbook/memory file that improves future runs automatically.
4. Each cycle, the agent gets smarter about its own domain without manual prompt tuning.
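That loop can be approximated today with a plain post-session script. This is a minimal sketch under stated assumptions — a line-based log format and keyword failure markers are inventions for illustration; Dreaming's actual mechanics haven't been published:

```python
# Manual "Dreaming"-style loop: scan past session logs for failure
# lines and append deduplicated lessons to a playbook file that the
# agent loads on its next run.
from pathlib import Path

FAILURE_MARKERS = ("ERROR", "retry", "missing context")  # assumed markers

def dream(session_logs, playbook_path):
    """Review past session logs, extract lines that look like failures,
    and append new (not-yet-recorded) lessons to the playbook."""
    playbook = Path(playbook_path)
    known = set(playbook.read_text().splitlines()) if playbook.exists() else set()
    lessons = []
    for log in session_logs:
        for line in log.splitlines():
            if any(marker in line for marker in FAILURE_MARKERS):
                lesson = f"Avoid repeat of: {line.strip()}"
                if lesson not in known:
                    lessons.append(lesson)
                    known.add(lesson)
    if lessons:
        with playbook.open("a") as f:
            f.write("\n".join(lessons) + "\n")
    return lessons
```

Because the playbook is checked before appending, running `dream` nightly is idempotent: each failure becomes one lesson, once, and the file only grows when the agent hits something new.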
Boris Cherny (who created Claude Code) defined this cleanly: Routines are higher-order prompts — not one-shot instructions but standing, reusable prompt patterns that trigger autonomous workflows. You define the routine once; it runs on schedule or trigger, produces PR-ready output, and you review/merge async.
The key insight: a lot of code going forward will be written asynchronously. You're not supervising a session — you're reviewing work that ran while you were elsewhere.
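The "higher-order prompt" idea can be sketched as data: a standing template plus a trigger, defined once and fired many times. Everything here is an illustrative assumption — real Routines are configured inside Claude Code, not through this (hypothetical) class, and the cron-style schedule string is just one plausible trigger convention:

```python
# Sketch: a routine as a reusable prompt template + trigger, versus a
# one-shot instruction typed into a session.
from dataclasses import dataclass, field

@dataclass
class Routine:
    name: str
    template: str            # prompt with {placeholders}, defined once
    schedule: str            # trigger, e.g. a cron expression (assumed)
    runs: list = field(default_factory=list)

    def fire(self, **context):
        """Render the standing template into a concrete prompt. In a
        real system this prompt would kick off an autonomous session
        whose output is a merge-ready PR to review asynchronously."""
        prompt = self.template.format(**context)
        self.runs.append(prompt)
        return prompt

# Hypothetical example: a nightly dependency-bump routine.
nightly_deps = Routine(
    name="bump-deps",
    template="Update {repo}'s dependencies, run the test suite, "
             "and open a PR only if all tests pass.",
    schedule="0 3 * * *",  # every night at 03:00
)
```

The template is the durable asset: you tune it once, and every subsequent run — across repos, across nights — inherits the improvement without you re-prompting.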
"The person who owns the PR is never going to see a red X. Claude is prompting Claude Code on its own."
— Boris Cherny, Head of Claude Code, Anthropic
Feature status as announced May 6, 2026:
| Feature | Status | What It Does | Relevant To You |
|---|---|---|---|
| Multi-Agent Orchestration | Public Beta | Create Commander/Detector/Navigator-style agent fleets for complex tasks | Cocode, Agenicore |
| Outcomes | Public Beta | Define what success looks like; Claude iterates until it gets there | All projects |
| Dreaming | Research Preview | Agent reviews past sessions, generates self-improvement playbooks overnight | Agenicore, ITGIRL |
| Routines | Live | Async, higher-order prompt automations; wake up to merge-ready PRs | Cocode, Filmteams |
| CI Auto-Fix | Live | Claude automatically files fixes against failing PRs | Cocode dev pipeline |
| Security Reviews | Live | Automated security review pass on your codebase | ITGIRL (Supabase/Stripe) |
| Claude Design | New / Labs | Opus 4.7's visual design taste applied to UI generation | Filmteams UI, ITGIRL |
| Claude Code Desktop | Live | Full-screen GUI with preview pane; multiple parallel sessions | All coding projects |
| Remote Agents | Live | Control your dev laptop from your phone | Mobile-first workflow |
| Doubled Rate Limits | Live (today) | 5-hour Claude Code limit doubled for Pro/Max/Enterprise | All projects |
Check items off as you go; they are ordered by impact and current availability.