May 19, 2026·9 min read

Sierra Killed the Coding Round. Now Source for Spikes, Not Titles.

Sierra's new AI-native interview loop grades spikes, not weaknesses. Here's how to source for it when "agent engineer" isn't a real LinkedIn title.

AI-native interview loop · hiring for spikes not weaknesses · agent engineer sourcing · Sierra AI hiring · coding agent era recruiting

On April 22, Sierra published "The AI-native interview," a post that quietly retired the canonical engineering loop most of your hiring managers still run. Two coding rounds, algorithms, system design, culture fit: gone. In its place, an open-ended onsite that grades candidates on spikes instead of an absence of weakness. If you're a recruiter or talent lead, the bigger problem isn't redesigning your loop. It's that your sourcing filters were calibrated for the dead one.

What Sierra actually changed

Sierra's old loop was the one everyone runs: two coding interviews, plus algorithms, plus system design, plus culture fit, plus references. Their own words: "a well-understood, scalable approach." The reason they tore it down is the part worth quoting in your next hiring-manager sync. Much of the signal, they wrote, was about mechanics. Typing syntax into an editor. Remembering algorithm details. Stitching frameworks together. That felt dissonant from how their engineers actually work, which is with coding agents in the loop, all day, every day.

The new format removes the coding and algorithms interviews entirely. The onsite opens with a Plan phase, a working session where the candidate drives ideation on a product to build while interviewers probe and stress-test the design. A debugging round is also in pilot: candidates pull down a medium-sized codebase plus a draft PR introducing a cross-cutting feature, then review and improve it, iterating with coding agents the same way they would on the job.

The hiring philosophy underneath is the part that breaks sourcing. Sierra is explicit: they're "hiring for strengths, not just an absence of weakness." Debriefs surface spikes and gaps. Someone who excels at product strategy and initiative but has holes in deep system understanding can still get hired, with a plan for where they'd thrive. The debrief question shifted from "should we hire this person?" to "where would this person thrive, and how do we support them?"

This is not a Sierra invention. The "spike" vocabulary traces back to McKinsey through the recruiting community: "Not everyone does everything brilliantly, that's fine. Everyone should have something that they do the best, that's their thing, that's their spike." What Sierra has done is operationalize it for the coding agent era, and they have the runway to make it stick.

Why this is a sourcing problem, not an interview problem

Sierra closed a $950M round on May 4 at a $15.8B post-money, led by Tiger Global and GV. They crossed $100M ARR seven quarters after launching in February 2024, and over 40% of the Fortune 50 are customers. When a company at that scale and visibility publishes their interview loop, AI-native shops copy it within a quarter. Sakana AI is already running a converging format: a month-long take-home rewarding depth, and a technical interview that's a conversation about real experience, not a coding exam. Anthropic has long said roughly half their technical staff come from non-ML backgrounds and that they weight "direct evidence of ability" over credentials. Same philosophy, different wrapper.

Your hiring managers want "the engineer who can take an agent-built 0→1 prototype to 1→N in a messy codebase," but your job description still says "Senior Software Engineer, 5+ years, distributed systems." Those two sentences do not describe the same person, and your funnel is going to feel it.

7,751 US profiles tagged "AI agents" in Refolk's index. Top titles are dominated by founder and CEO variants, not "AI Engineer" or "Agent Engineer."

The pool is real. It's just hiding. A US scan of profiles tagged "AI agents" in our index returns 7,751 people, and the dominant current titles are "Co-founder & CEO," "Founder," and "Founder & CEO." The actual operators building agents day to day are scattered across "Senior Software Engineer," "Solutions Engineer," "Developer Experience," and self-defined titles. A Boolean search on title="AI Engineer" undercounts this pool by close to an order of magnitude. If you run that search and report back to a hiring manager that the market is thin, you're wrong, and the next recruiter will eat your lunch.
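To make the undercount concrete, here is a toy sketch in Python. The six profiles, the field names, and the two helper functions are invented for illustration; they are not Refolk's schema or data, and the ratio they produce says nothing about the real index.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    name: str
    title: str
    tags: set = field(default_factory=set)


# Hypothetical sample: most people tagged "AI agents" do not carry an AI title.
PROFILES = [
    Profile("A", "Co-founder & CEO", {"AI agents"}),
    Profile("B", "Senior Software Engineer", {"AI agents", "distributed systems"}),
    Profile("C", "Solutions Engineer", {"AI agents"}),
    Profile("D", "AI Engineer", {"AI agents"}),
    Profile("E", "Developer Experience", {"AI agents"}),
    Profile("F", "Senior Software Engineer", {"distributed systems"}),
]


def title_search(profiles, title):
    """Exact (case-insensitive) title match, like a Boolean title filter."""
    return [p for p in profiles if p.title.lower() == title.lower()]


def tag_search(profiles, tag):
    """Match on what the profile is tagged with, ignoring the title."""
    return [p for p in profiles if tag in p.tags]


by_title = title_search(PROFILES, "AI Engineer")
by_tag = tag_search(PROFILES, "AI agents")
print(f"title match: {len(by_title)}, tag match: {len(by_tag)}")
```

In this made-up sample the title filter finds one person while the tag filter finds five. The point is only the shape of the gap: the title filter cannot see anyone whose label differs from the query.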

What "source for spikes" actually looks like

The spike loop quietly raises the bar on the top of the funnel. Sierra admits the old format pushed managers to lean on referrals when calibrated signal was weak. The new format trades calibration for richness, which means recruiters can't hand a hiring manager a shortlist of profiles that look right on paper and call it a day. The work moves left. You have to bring pre-evidenced spikes into the funnel.

What counts as pre-evidenced? A shipped agent with a public demo. A PR that landed in LangChain, LlamaIndex, or an MCP server. A blog post that walks through a real tradeoff, not a tutorial. A Hugging Face Space with non-trivial usage. A Cursor or Codex workflow shared on X with a working repo behind it. These are artifacts. They are also exactly what title and YOE filters can't see.

This is the friction we built Refolk for. You describe the spike in plain English, the kind of evidence you'd want a candidate to defend in a Plan phase, and you get a ranked shortlist back across GitHub, LinkedIn, and the open web. Not a title match. A behavior match.

Title search asks who someone calls themselves. Spike search asks what they actually shipped last quarter.

Three sourcing briefs you can run this week

Brief one: "the agent shipper." Plain English: engineers in the US who have shipped a non-trivial agent in the last 12 months, have a public repo or demo, and write occasionally about tradeoffs. Notice what's missing: years of experience, current title, current employer. This is the brief Sierra would actually give you if they could give you one.

Brief two: "the full-stack 0→1→N." Sierra says the leverage now comes from "combining technical ability with product thinking and business context. They don't just write code; they define scope, make tradeoffs, and iterate with customers." Source for engineers who have a product surface attached to their name. A side project with users. A Show HN that landed. A small consulting practice. These are spikes that a Plan phase will surface and a LeetCode loop will miss.

Brief three: "the infra engineer who builds product." Sierra explicitly extended the new loop to infrastructure roles because "many infrastructure engineers now build full-stack tools or agents and work closely with product to vertically integrate with what customers need." The pool you want is SREs and platform engineers who have shipped internal-tools agents or developer-facing products, not the ones who only show up in incident retros.

The contrarian reads worth taking to your hiring managers

YOE is now actively misleading. When one engineer can build across the stack with agents, a 10-YOE backend specialist at FAANG is not obviously stronger than a 3-YOE generalist who ships agent-built products end-to-end. Sierra's framing reads like a sourcing brief, not a JD. Treat it that way.

"Agent engineer" is a fake title. Almost nobody on LinkedIn uses it. The 7,751 number above is a floor. The operator pool is fragmented across at least a dozen self-defined labels, and the only reliable way to find them is to search on what they've built, not what they're called. This is also why the macro signal from Salesforce matters: Marc Benioff has said Salesforce paused some software engineering hiring in 2025, and their roughly 15,000 developers are now leveraging Claude, Codex, Cursor, and Agentforce while slowly moving into supervisory positions managing AI-generated code. The titles inside Salesforce haven't caught up to the work either.

The new loop favors candidates evaluated in public. Plan, build, defend onsites reward people who already do that in the open. Writing about tradeoffs. Shipping side-project agents. Contributing to LangChain, LlamaIndex, MCP. Recruiters who only ping passive FAANG engineers will lose to those sourcing from Hugging Face, GitHub trending, and AI-builder Discords. This is also where Refolk indexes deeply, because LinkedIn alone won't surface the contributor graph behind an MCP server.

There's a candidate-experience arbitrage. Sierra openly says many candidates called the new loop "the most fun interview they've ever had." If you're competing for the same spike-y engineers with a five-round LeetCode gauntlet, you are selling a measurably worse product. Pitch the loop itself as part of the offer. If your company hasn't redesigned its loop yet, at least be honest with candidates about which rounds are mechanics and which are evaluating the work.

How the funnel changes, concretely

The old funnel was a wide net plus a tight rubric. Sourcers brought volume, the loop calibrated it down. The new funnel is the inverse. The loop is richer but less calibrated, so sourcers have to bring evidence, not just profiles. Three operational changes follow.

First, kill the title-first search. Start every brief with a behavior or artifact, then layer location and seniority on after. If your ATS only lets you search on titles and YOE, you are working with a tool calibrated for the old loop. Refolk exists because that's the gap: ask for the person you want in plain English, get the shortlist back.
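One way to picture "artifact first, then location and seniority" is as an ordered filter. The sketch below is a minimal Python illustration under stated assumptions: the `Candidate` and `Brief` classes, the artifact labels, and the sample people are all hypothetical, and this is not how Refolk or any particular ATS works.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    name: str
    title: str
    location: str
    years_experience: int
    artifacts: list = field(default_factory=list)  # hypothetical labels


@dataclass
class Brief:
    required_artifacts: set           # step 1: the behavior or artifact
    location: Optional[str] = None    # step 2: layered on afterwards
    min_years: Optional[int] = None   # step 3: optional, often left unset


def run_brief(candidates, brief):
    # Start from evidence: keep only people who have every required artifact.
    matched = [c for c in candidates
               if brief.required_artifacts <= set(c.artifacts)]
    # Then narrow by location and seniority, if the brief asks for them.
    if brief.location is not None:
        matched = [c for c in matched if c.location == brief.location]
    if brief.min_years is not None:
        matched = [c for c in matched if c.years_experience >= brief.min_years]
    return matched


CANDIDATES = [
    Candidate("Ana", "Founder", "US", 3,
              ["shipped_agent", "public_repo", "tradeoff_post"]),
    Candidate("Ben", "Senior Software Engineer", "US", 10, []),
    Candidate("Chen", "Solutions Engineer", "UK", 6,
              ["shipped_agent", "public_repo"]),
    Candidate("Dee", "Developer Experience", "US", 5,
              ["shipped_agent", "public_repo"]),
]

brief = Brief(required_artifacts={"shipped_agent", "public_repo"}, location="US")
shortlist = run_brief(CANDIDATES, brief)
print([c.name for c in shortlist])
```

Note that `run_brief` never reads `title`, and `min_years` stays unset unless someone deliberately adds it. In this invented sample the 10-YOE profile with no artifacts drops out at the first step, while a founder and a developer-experience engineer make the shortlist.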

Second, attach the artifact to the profile when you hand it to the hiring manager. A link to the repo, the demo, the post. The Plan phase is going to ask the candidate to defend their thinking. Your shortlist should already have done some of that work.

Third, change the debrief question on your side too. Stop asking "is this person a fit for the role?" Start asking "what spike does this person have, and is it one we need this quarter?" That's the same shift Sierra made internally, and it will reshape who you advance.

FAQ

Is the Sierra loop actually replicable, or is it a Bret Taylor luxury?

It's replicable, but it requires more senior interviewer time per candidate and a hiring manager bench that can run open-ended sessions without falling back on rubrics. The sourcing work, however, is replicable by anyone willing to source on artifacts instead of titles. You can adopt the spike-sourcing philosophy without adopting the full onsite, and most teams probably should start there.

Won't sourcing on artifacts bias toward public builders and miss great engineers who don't post?

Yes, partially, and that's a real cost. The mitigation is to combine artifact signal with referral networks and to weight private-company evidence (internal tooling that shipped, named customer wins) when candidates surface it. The point isn't that public artifacts are the only signal. It's that title and YOE are now the weakest signal, and you need a primary sieve that isn't either.

How do I sell "spike hiring" to a hiring manager who wants a senior generalist?

Reframe the request. Ask them which two or three spikes the team is currently weakest on and which one this hire needs to bring. Most "senior generalist" briefs are actually three spike briefs in a trench coat. Once you separate them, you can run three different sourcing passes and bring back candidates the hiring manager would never have surfaced from a title search.

What's the fastest way to find candidates who'd actually thrive in a Plan-phase interview?

Look for engineers who have driven a product from scoping through ship in public, regardless of title. Side projects with users, Show HN posts with traction, MCP or LangChain contributions with a design discussion attached. That behavior maps directly to what Sierra's Plan phase is grading. Tools like Refolk let you describe that behavior in plain English and get the people back, which is faster than scraping GitHub by hand.