Too good to deploy

Happy Tuesday ⚡️

If Claude has been ignoring your setup instructions, half-finishing tasks, or asking "should I continue?" every five minutes — you're not losing your mind. Someone did the research.

An AMD Senior AI Director analyzed Claude's session logs from January through March and shared what she found: median thinking depth dropped from ~2,200 to ~600 characters between February and March, a decline of roughly 73%. API requests went up 80x, not from higher usage, but because the model was failing more often and retrying. At peak hours (5pm and 7pm PST), it's noticeably slower and shallower than at midnight. The takeaway: the thinking budget is GPU-load-sensitive.

Today, we're talking about:

The model Anthropic built and can't release yet — and what that says about where AI actually is
The agent that finds homes without pools, renders one in the backyard, and mails a postcard (no salesperson involved)
Claude getting quietly throttled, Anthropic's leaked Lovable competitor, and a Cowork tutorial for anyone who's still trying to figure out where to start

The Model They Can't Ship Yet

Anthropic launched Project Glasswing this week — a 40-plus partner initiative built around Mythos Preview, a frontier model so capable at coding and reasoning it became, almost incidentally, the best vulnerability-hunter alive. It's already found thousands of critical flaws across every major OS and browser, including a 27-year-old bug in OpenBSD.

The UK's AI Security Institute ran its most rigorous evaluation and confirmed that Mythos passed every stage, the first model to do so. That evaluation is the benchmark for "this thing can do real damage."

Anthropic is being explicit that this version of Mythos won't ship publicly — not yet, and maybe not ever in this form. Glasswing had to be assembled first because a model this capable couldn't exist in the open without turning critical infrastructure into a wider attack surface. The workflows and systems designed to absorb it didn't exist. They had to be built from scratch. And by the time a version of Mythos does reach the public, Anthropic's teams will already be well past it.

The teams that stopped designing around model limitations a while ago don't have to do anything special when Mythos eventually ships — and that's exactly why they keep pulling ahead. Each model upgrade is free acceleration. Their workflows get faster, more capable, more autonomous without anyone touching them. The ones still treating this as a pilot are on a six-month rewrite cycle, chasing a capability curve that doesn't slow down for anyone. There's no historical playbook for building on infrastructure that upgrades this fast — the closest thing is compounding interest, except the rate keeps going up.

Mythos can't ship yet because the world wasn't ready for it. The question is whether your workflows will be.

Need help building AI into your engineering and growth workflows?

Tenex is the team behind this awesome newsletter. We embed with your team to design, build, and ship AI systems that actually work—from agentic engineering pipelines to AI-powered growth engines.

Talk to our team →

The Business That Runs While You Sleep

Someone used OpenClaw, an open-source AI agent that runs on your machine, to build a fully automated sales machine for pool installations. It finds homes worth $500k–$1.2M without pools, renders a pool in the backyard using AI, and mails a physical before-and-after postcard to the homeowner. The whole operation runs without a salesperson or an ad budget: just an agent, property records, and the postal service.

Greg Isenberg shared the example this week alongside 10 more like it, all running the same underlying logic: surface a gap in public data, build the solution before anyone asks for it, and show up with the math already done.

The shift worth naming: These aren't workflows where AI helps a human do more. They're businesses where the agent does the prospecting, the analysis, and the pitch — and a human shows up for the close. If you have a sales or BD function with a repeatable outreach motion, this is roughly the shape of what replaces your top-of-funnel in the next few years. The pool agent doesn't assist a salesperson. It replaces the entire front end of the sales process.

Here's how the framework works, pulled from Isenberg's best examples:

1. Find the data gap. Every industry has publicly available data that someone should be acting on but isn't. Commercial buildings with flat roofs in sunny states have calculable solar savings. Properties near EV corridors are missing charging infrastructure. Medical practices billing under specific CPT codes are leaving reimbursement on the table. The gap already exists — the agent surfaces it.

2. Build before they ask. The pool agent doesn't wait for a homeowner to call a pool company. It renders the pool, calculates the value add, and shows up uninvited with evidence.

3. Show up with the math done. Every pitch in Isenberg's list leads with a number: the solar ROI, the 40% cut in software spend, the recovered reimbursement revenue. "Here's what you're leaving on the table" closes faster than "here's what we could do." The agent calculates it before any human gets on a call.
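For the builders reading: the three steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how the pool agent was actually built. The `Property` record, the function names, the sample addresses, and especially the 5% value-add figure are all hypothetical placeholders. The rendering and mailing steps are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Property:
    """Hypothetical shape of one public property record."""
    address: str
    assessed_value: int  # USD
    has_pool: bool


def find_prospects(records, min_value=500_000, max_value=1_200_000):
    """Step 1: surface the gap (homes in the target value band with no pool)."""
    return [
        r for r in records
        if min_value <= r.assessed_value <= max_value and not r.has_pool
    ]


def build_pitch(prop, value_add_pct=5):
    """Steps 2 and 3: prepare the pitch up front and lead with a number.

    value_add_pct is a made-up placeholder, not a real market estimate.
    """
    value_add = prop.assessed_value * value_add_pct // 100
    return f"{prop.address}: a pool could add roughly ${value_add:,} in value"


# Made-up sample records, for illustration only.
records = [
    Property("12 Elm St", 750_000, False),  # in band, no pool: a prospect
    Property("9 Oak Ave", 900_000, True),   # already has a pool
    Property("4 Pine Rd", 300_000, False),  # below the value band
]

prospects = find_prospects(records)
for p in prospects:
    print(build_pitch(p))
```

The point of the shape: the filter and the number are computed before any human is involved, which is what lets the pitch lead with "here's what you're leaving on the table."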

Full list of 10 ideas and the framework behind them: Isenberg's thread.

Claude is being throttled by GPU load — An AMD Senior AI Director tracked Claude's performance from January through March and found median thinking depth dropped from ~2,200 to ~600 characters. API requests went up 80x — not more usage, more failures and retries. The fix a lot of people landed on: switch to Sonnet, avoid Opus at peak hours, and don't run sessions that have been idle for more than an hour. Read the full analysis

Anthropic is building a Lovable competitor — Leaked screenshots this week showed a full-stack app builder coming to Claude — drag-and-drop, build and deploy, the whole thing. The original post said "sneak peek at something coming soon to Claude :)" and pulled 5.2 million views. If you've built a product in the "make it easy to build apps with Claude" category, this week changed your competitive picture. See the leak

Google's desktop agent, before I/O — A new Agent tab has appeared in Gemini Enterprise: set a goal, connect your apps, toggle human review. The design closely mirrors Claude Cowork. Still in testing, but Google I/O is in a few weeks. Full breakdown

Turning agent output into a video, automatically — HeyGen launched a tool that takes whatever your AI agent produces and turns it into a video with an avatar presenter. The CEO's personal setup: an OpenClaw agent does overnight research, HeyGen turns it into a morning briefing delivered by a custom AI avatar. Every agent eventually hands off to a human — and right now that handoff is a wall of text. See how it works

Cowork 101: automate your workday without writing code — A practical walkthrough of how to set up Claude Cowork from scratch and start automating real workflows — no technical background needed. Good starting point if you or someone on your team has been meaning to actually do this. Watch free

Three ways to share Cowork skills across your team — How to get consistent AI workflows across an org instead of everyone rebuilding the same setups independently — three methods, each suited to a different team size. Watch free

Open roles:

AI Strategist
Forward Deployed Engineer
Applied AI Engineer
Engagement Manager

Salary ranges vary by role and experience. Additional comp based on output. Must be NY-based.

JOIN US
