Open source caught up

It’s Tuesday.

DeepSeek open-sourced V4 this week — a model the company says rivals the world's top closed offerings, with a 1M context window and free to run yourself. It's the latest data point in a longer trend we dig into below: the open model tier is catching up to the closed tier faster than most vendor strategies account for.

Today, we're talking about:

Why open-source AI is catching up to the frontier — and how to tell when it actually makes sense to switch
How to turn Claude Code into a video production studio (3-5 minutes per scene, no editing software)
A developer's model switch, OpenAI's AGI framework, and the first consumer Level 4 car you can actually buy

DeepSeek V4 vs. Opus vs. GPT — When the Cost Case Actually Pencils Out

DeepSeek V4 shipped this week with a 1M context window, frontier-level benchmark scores, and a price tag that should make every CFO with a Claude or OpenAI bill ask the same question: do we keep paying premium for closed models, or is open finally good enough? For most exec teams the right answer is "some of both" — but only if you know which workloads belong on which side of the line.

The head-to-head. On reasoning and coding benchmarks, V4 is within striking distance of Opus 4.7 and GPT-5 — close enough that for most enterprise tasks, you wouldn't notice the difference in output quality. The cost spread is where the gap shows up. Opus 4.7 runs $5/$25 per million tokens (input/output). GPT-5 sits in a similar band. V4 served on Bedrock or Vertex lands roughly 10–20x cheaper at the API layer, and self-hosted is cheaper still if you have the infrastructure. Where closed still pulls ahead: agentic tool use, long-horizon reasoning chains, and customer-facing work where one bad answer costs you a relationship. Those gaps are real and they aren't closing as fast as the benchmark scores suggest.

Where the cost case wins for an exec team: high-volume internal workloads where quality is "good enough" rather than "best possible." Document classification, extraction from PDFs, internal search and Q&A, summarization, first-draft content, support ticket triage, log analysis. These are the workloads where you're burning Opus tokens to do work that doesn't need Opus. Move them to V4 on Bedrock and the math is brutal — a team running $50k/month on Claude for extraction work could be running $3–5k for output a non-technical reviewer can't tell apart.
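Here is a back-of-the-envelope sketch of that math. It assumes the 10–20x API price spread quoted above applies uniformly to the workload and that token volume stays the same after the move. The function name and the flat-ratio model are illustrative assumptions, not vendor pricing.

```python
def projected_monthly_cost(current_spend: float, price_ratio: float) -> float:
    """Estimate monthly spend after moving a workload to a model that is
    `price_ratio` times cheaper per token, assuming unchanged token volume."""
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return current_spend / price_ratio


if __name__ == "__main__":
    current = 50_000  # USD per month, the extraction example above
    best_case = projected_monthly_cost(current, 20)   # 20x cheaper
    worst_case = projected_monthly_cost(current, 10)  # 10x cheaper
    print(f"Projected spend: ${best_case:,.0f} to ${worst_case:,.0f} per month")
```

On these assumptions the projected range is $2,500 to $5,000 a month, in line with the $3–5k figure above. Real bills will also depend on the input/output token mix and any hosting overhead, which this sketch ignores.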

Where the cost case fails: anything where a wrong answer costs you money or trust. Customer-facing chatbots, agent workflows that take real actions (booking, purchasing, sending emails on your behalf), legal or compliance review, anything regulated. The frontier labs still earn their premium on the long tail of edge cases — and that's exactly where customer-facing work lives. The other failure mode is operational: if you don't have an MLOps function, the cost of standing up and babysitting an open model deployment will eat the savings. Bedrock and Vertex make this easier, but "easier" isn't "free."

Our call: don't migrate your stack, segment your workloads. Run an honest audit of the top three things your team uses Claude or GPT for. The high-volume, lower-stakes ones likely belong on V4 by Q3. The customer-facing and agentic ones stay on the frontier — for now. The exec teams that get this segmentation right cut their AI spend 40–60% without anyone outside the engineering org noticing the difference. The ones that don't are paying frontier prices for commodity work, and that gap compounds every month the open tier keeps catching up.
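One way to make the audit concrete is to write the segmentation rule down. The sketch below is a minimal illustration of the criteria in this section. The `Workload` fields, the `recommend_tier` function, and the sample workloads are hypothetical, and a real audit would weigh volume, quality checks, and operational cost as well.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    customer_facing: bool      # a wrong answer reaches a customer
    takes_real_actions: bool   # agentic: books, purchases, sends email
    regulated: bool            # legal, compliance, or other regulated review


def recommend_tier(w: Workload) -> str:
    """Return 'frontier' when a wrong answer costs money or trust, else 'open'."""
    if w.customer_facing or w.takes_real_actions or w.regulated:
        return "frontier"
    return "open"


if __name__ == "__main__":
    audit = [
        Workload("PDF extraction", False, False, False),
        Workload("Support ticket triage", False, False, False),
        Workload("Customer-facing chatbot", True, False, False),
        Workload("Booking agent", True, True, False),
    ]
    for w in audit:
        print(f"{w.name}: {recommend_tier(w)}")
```

The rule is deliberately conservative: any one high-stakes flag keeps a workload on the frontier tier, which matches the "for now" caution above.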

We build the stuff we write about.

Tenex is the team behind this newsletter. If something in today's issue made you think "we need someone to build that for us," that's literally us. We do agentic pipelines, AI workflows, forward-deployed AI engineers, AI audits, training, and enablement.

Talk to our team →

Make Pro Motion Graphics With Claude Code and HyperFrames

HyperFrames is HeyGen's motion graphics engine: think After Effects, except scenes are described in plain English and rendered through HTML and CSS instead of a timeline GUI. The closest comparison is Remotion, which does the same job but has you write scenes in React instead of HTML. HyperFrames lowers the floor: because it's HTML-native, an AI coding agent like Claude Code or Codex can read, write, and modify scenes the same way it edits any web codebase. You still edit the final video; what changes is how fast motion graphics, effects, and captions get made. Work that used to take an hour in After Effects takes minutes when an agent is generating the scenes.

The setup is HyperFrames for the motion graphics and Claude Code (or Codex) as the operator. The result is a finished motion graphic scene in 3–5 minutes, no designer, no After Effects license, no terminal commands the user has to type. Here's how to get there:

1. Install HyperFrames and the media production kit. Point Claude (or Codex) at the kit's README and let the agent handle the install end-to-end — no command-line knowledge required. The kit is a plain-English spec that teaches the agent how HyperFrames works, and it ships with a custom plugin for generating music tracks alongside your scenes. One-time setup, under 30 minutes on a standard Mac.

2. Load your brand style guide into the project. Feed the agent your colors, fonts, spacing, and component patterns once per project. It stores them and applies them to every scene automatically. This is the entire difference between generic AI video output and something that looks on-brand — skip it and nothing else in the workflow saves you.

3. Tell the agent what scene you want. Describe the scene you have in mind — a stat callout, a product reveal, a quote card, an animated chart — and Claude Code builds it. Or drop in a section of your script and let the agent generate the scenes for it directly.

4. Render and bring into your editor of choice. Export from HyperFrames as a video clip and drop it into whatever timeline you already use — Premiere, Final Cut, Descript, CapCut. Total time per scene once the brand guide is in place: 3–5 minutes.

Full walkthrough here — the 22-minute version shows the whole stack end to end.

Musk v. Altman hits court — Jury selection started Monday in Oakland on Elon Musk's lawsuit against OpenAI and Sam Altman. The core claim: Altman broke an original agreement when he converted OpenAI from a nonprofit into a for-profit company, and Musk wants the conversion unwound and Altman removed from leadership. Trial runs four weeks; witnesses include Musk, Altman, Satya Nadella, and current and former OpenAI board members. Worth following — if Musk wins, it could derail OpenAI's IPO and force a structural unwind of the most valuable AI company in the world. If he loses, it settles the question of whether AI labs can reorganize however they want without accountability to their founding charters. Either outcome matters. Full coverage

A developer's model switch worth noting — McKay Wrigley, a developer and AI builder with a large following among practitioners, went from 80/20 Claude to 80/20 GPT for code in under three months. His take: Claude still wins for non-coding agent work, but OpenAI's Codex "feels like an engineer" for pure coding. If your team is standardized on one model for development work, the leading tool six months ago may not be the leading tool today.

OpenAI published its AGI principles — The most candid line in the document is Altman's own: OpenAI is materially larger than in 2018 and may have to "trade off some empowerment for more resilience" as it scales. That's the CEO of the most powerful AI company saying out loud that user autonomy might give way to platform constraints. Worth one read now and another read in six months when you notice what changed.

ChatGPT Images 2.0 is in a different tier — Reasoning-native image generation, 2K resolution, and a benchmark lead so large it has no close competitors right now. Replaces DALL-E 3 and GPT Image 1.5. The text rendering is the unlock — character-accurate multilingual output at a level no image model has reliably hit before. For teams doing brand content, product mockups, or presentation visuals without a designer, the gap closed in one release.

The full breakdown on Cursor's $60B deal — The Contrary Research piece on the SpaceX/Cursor acquisition option is worth reading in full: the margin data, the compute dependency chain, and what it implies for any AI-native company whose infrastructure runs through a single vendor. A useful read alongside this week's Future Proof — vendor concentration risk shows up in more than just model choice.

Level 4 autonomous vehicles are shipping this year — Tensor Robocar is taking orders for end-of-2026 delivery — the first personal Level 4 vehicle available for private purchase, with Lyft reserving hundreds for fleet ops. Worth tracking as a proof point: AI that enterprise roadmaps treated as a 2028-or-later story is showing up in driveways in 2026. The timeline compression happening in software is starting to show up in hardware too.

Open roles:

AI Strategist
Forward Deployed Engineer
Applied AI Engineer
Engagement Manager

Salary ranges vary by role and experience. Additional compensation is based on output. Must be NY-based.

JOIN US
