⏩ tl;dr
Zuck is adding “AI-driven impact” to Meta performance reviews next year. Lucky for you, this counts as workplace training.
Ryan Carson’s viral “ship code faster, idiot” kit
AI that ends slide-deck suffering (RSVP for FREE)
Logan Kilpatrick explains Gemini 3 like you’re 5 (RSVP for FREE)
Karpathy predicts your career funeral
Evals. WTF are they?
Reply + we’ll send a link we’d get fired for sharing.

Ship Software Without Touching Your Keyboard (Seriously)
the problem: You’re getting steaming-hot code slop because you talk to your AI agent like an all-knowing being + expect miracles. Most people toss a vague one-liner at the model and hope it magically understands the assignment.
the solution: Instead, treat your AI agent/ChatGPT chat like a junior hire. Say the steps you’d take, name the constraints of the project, and make your tool ask you clarifying questions before generating anything. This same logic applies to all AI tools.
Better context = better output
Better context ≠ context dumps or one-liners
Lazier input = lazier output
Ryan Carson builds and ships software features using Amp, but his strat is stack-agnostic. He helped us build this 5-step loop for getting better outputs + shipping features to prod faster.
step 1: Talk your feature into existence using a terminal-integrated recorder (like Wispr Flow).
Why voice?
when we type, we over-optimize for brevity and precision
when we talk, we naturally include more context, assumptions, and edge cases
it also gives the agent more surface area to ask good questions later
pro tip: map Wispr Flow to a hot key
Say something like this:
“We need to ship a feature that [high-level goal or outcome].
Our stack is [frameworks/languages/infra/tools].
We’re using [key dependencies like auth or payments tools].
“Generate a rough PRD for this feature, but do not finalize it yet. Also, ask me some questions.”
step 2: Answer those questions, then copy Ryan Carson’s pretty famous GitHub prompt to build a real PRD that:
forces even more clarifying questions
standardizes everything into a junior-dev-ready format
pro tip: Save Ryan’s files as slash commands or keep them handy in a folder
step 3: Next, convert your PRD into a task list that the agent can check off. Ryan’s other prompt makes the agent:
work with parent tasks (0.0 create feature branch, 1.0 backend, 2.0 frontend)
flesh out concrete subtasks (step 0.1, 0.2, etc.)
show its progress via a checklist
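For a feel of what that output looks like, here’s an illustrative checklist (our mock-up, not Ryan’s exact template — branch names and subtasks are made up):

```markdown
## Tasks
- [x] 0.0 Create feature branch
  - [x] 0.1 Branch off main, name it `feat/password-reset`
- [ ] 1.0 Backend
  - [ ] 1.1 Add reset-token endpoint
  - [ ] 1.2 Write unit tests for token expiry
- [ ] 2.0 Frontend
  - [ ] 2.1 Build the "Forgot password" form
```

The agent checks boxes as it goes, so you (the reviewer) always know exactly where it is in the plan.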
step 4: What’s with all the questions and checklists? Without them, the code falls apart. Here’s what else these prompts are doing for you:
limiting blast radius (focus on shipping one small feature per change)
running static analysis early (treat warnings as to-do items in the draft PR)
shipping w/ receipts (includes a small, human-reviewable changelog and a step-by-step rollout plan)
step 5: Teach the full flow to your team to become a pod of 10x engineers.
➿ Grab the complete playbook (with all Ryan Carson’s prompts + a full breakdown of each step).
What’s the hardest part about using AI at work right now?

There's a reason they have 70M users: Stop wasting hours making decks (build a swanky one in minutes w/ Gamma AI)
Guest: Gamma CPO, Jon Noronha
Day: Wednesday, November 19
Time: 4:00 PM - 5:00 PM EST
You keep hearing ‘Gemini is insane.’ Here’s what that actually means for your job
Guest: Lead product eng (Gemini’s API + Google AI Studio), Logan Kilpatrick
Day: Tuesday, November 25
Time: 2:00 PM - 3:00 PM EST

ultrathought: Software 1.0 was the old programming model where humans wrote rules by hand. Software 2.0 is the AI era where systems learn the rules themselves—as long as there is a reliable way to check whether the output is correct.
Andrej Karpathy (No. 1 AI guy you should know) argues that the real predictor of automation is not repetition, but verification. If the system can verify its own output, AI improves fast. If it cannot, the slope slows.
Creative direction is a good example. It is hard to verify automatically, which means the model cannot train at scale without humans telling it what “good” looks like.
So, if you are worried about AI taking over your job, the right question is not “Is my work repetitive?” but rather “How easy is it to verify the outcome?”
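Here’s the verification idea in one tiny sketch (our illustration, not Karpathy’s code): sorting is easy to verify automatically, so a model can grade its own attempts and improve without a human in the loop. “Was this creative direction good?” has no `verify()` function, so it can’t.

```python
# Hypothetical sketch: "verifiable" work is work a program can grade on its own.
def candidate_sort(xs):
    # pretend this output came from a model
    return sorted(xs)

def verify(original, result):
    # an automatic verifier: no human needed, so the training loop runs at scale
    return sorted(original) == result

assert verify([3, 1, 2], candidate_sort([3, 1, 2]))
```

If `verify` exists and is cheap, the slope of improvement is steep. If the only verifier is a human with taste, it isn’t.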
more thoughts: This post is both right + wrong.
For example, @ Tenex, we recently had a client who needed help migrating from one SaaS to another. So we did the migration work. But the next step we want to offer them is their own custom software.
Execs need to know this:
It used to be crazy to build your own software because it was $$$
People used to strictly buy
Now, people have the option to buy SaaS, the option to labor it out, and the option to build their own
speaking of data: Most orgs run on SaaS tools that lock their operational data inside someone else’s database. Think CRMs, PM tools, transcription apps, analytics dashboards—the whole stack.
Jordan’s point is that when AI speeds up tool churn, the only companies that can adapt are the ones that can swap tools without breaking workflows. That only happens if you own the data layer on which those tools sit.
Owning your data infrastructure just means all operational data lands in your warehouse (Postgres, Snowflake, BigQuery), and SaaS tools read/write from there. Switch a CRM and nothing crashes because the CRM isn’t the system of record—your database is.
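A minimal sketch of that pattern (illustrative only — the table, adapter functions, and in-memory SQLite stand in for your real warehouse and CRM integrations): both tools read/write the same table you own, so swapping one out doesn’t touch the data.

```python
import sqlite3

# Your warehouse is the system of record; SaaS tools are just adapters on top.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE contacts (email TEXT PRIMARY KEY, name TEXT)")

def crm_a_sync(db, records):
    # CRM A writes into YOUR table, not its own silo
    db.executemany("INSERT OR REPLACE INTO contacts VALUES (?, ?)", records)

def crm_b_read(db):
    # Later you swap to CRM B; it reads the same table, nothing breaks
    return db.execute("SELECT email, name FROM contacts").fetchall()

crm_a_sync(db, [("ada@example.com", "Ada")])
print(crm_b_read(db))  # the contact survived the tool swap
```

That’s the whole trick: churn through tools all you want, because none of them is the system of record.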
If you don’t own that layer, every AI-driven category reset could become a painful migration with broken automations, lost history, and weeks of downtime. You move more slowly. You lose more clients.
model news: Gemini 3 Pro is out!!!
Logan Kilpatrick—the public face of Gemini for developers—is joining us on Human in the Loop next Tuesday to show us how it works.

background: Karpathy’s Software 2.0 only works when the machine can verify its own work. Evals are how you create that verification. Think of them as drug trials for prompts and AI products: controlled inputs, measurable outcomes, and no room for guesswork/vibe coding.
in plain english: Dan, an applied AI engineer @ Tenex (the company that put ultrathink on the map), says evals—or evaluations—are the backbone of AI reliability. They let devs check whether a model, prompt, or workflow actually works the way it is supposed to (every time, not just when the demo gods cooperate).
OpenAI, Anthropic, Google, and every serious AI lab publish eval documentation because evals are how they prove their models are safe, accurate, and production-ready.
here’s the part most people miss: Traditional software is deterministic (“if this, then that”), but LLMs are probabilistic (“if this, follow these guidelines”). That means there’s way more variance at play, just like giving instructions to a junior employee.
Evals are basically the scorecard that lets you inspect their work, give feedback, and tune the prompts until the model behaves consistently.
think: If your company is building a customer support AI chatbot, how well is it actually answering support tickets? How do you score that?
Well, there are a few flavors of eval:
accuracy tests (did it get the right answer?)
quality tests (was it clear, relevant, on-brand?)
safety tests (did it avoid saying something that gets someone fired?)
robustness tests (does it break when inputs aren’t perfect?)
And there’s the increasingly common one: model-on-model evals, where one LLM grades another because, at scale, humans are slow and inconsistent.
apply it: Pick what “good” means—relevance, safety, creativity, correctness—then score responses 1–5. Let one model judge the other.
Without evals, you’re shipping untested code into your business + hoping for the best.
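What does that scorecard look like in practice? Here’s a minimal accuracy-eval sketch (the `model` function is a stand-in stub — in real life you’d call your actual chatbot and keep the scoring loop exactly the same):

```python
# Minimal eval sketch: run a fixed test set through the model, score the results.
def model(ticket):
    # stub standing in for your support chatbot
    canned = {"How do I reset my password?": "Click 'Forgot password' on the login page."}
    return canned.get(ticket, "I'm not sure.")

eval_set = [
    # (input ticket, substring a correct answer must contain)
    ("How do I reset my password?", "Forgot password"),
    ("Where is my invoice?", "Billing"),
]

def run_evals(cases):
    passed = sum(expected in model(q) for q, expected in cases)
    return passed / len(cases)  # accuracy score for the scorecard

print(run_evals(eval_set))  # 0.5: one ticket passes, one fails
```

Swap the substring check for an LLM judge and you’ve got the model-on-model flavor; the loop doesn’t change, only the grader does.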
Build your AI engine. Win the next decade.
Tenex, the team behind this awesome newsletter, helps companies architect, staff, and ship real AI systems that move the P&L, not the hype cycle.
Get started →


