May 15, 2026·9 min read

Standard Intelligence Raised $75M With 6 Engineers. Title Search Finds 1.

Standard Intelligence just raised $75M from Sequoia and Spark with six engineers. Here's how to source the next 24 hires by skill, not LinkedIn title.



On April 30, 2026, a six-person San Francisco lab called Standard Intelligence announced a $75M Series A led by Sequoia and Spark Capital, with Andrej Karpathy, Stanley Druckenmiller, and ex-Tesla Optimus lead Milan Kovac on the cap table. Their model, FDM-1, learns to control software by watching video instead of labeled screenshots. The next 24 hires will define whether that thesis ships, and the people who can actually build it are not titled anything close to what a recruiter would type into LinkedIn.

If you are sourcing for this brief, or competing with it, the search bar is the wrong tool. The right tool is a citation graph.

The actual talent pool is a few hundred people, globally

Standard Intelligence is doing three things at once: video pretraining at petabyte scale, GUI grounding in pixel space, and computer-use evaluation against benchmarks like OSWorld. The intersection of those three skills is not a job category. It is roughly the author lists of about a dozen arXiv papers from 2024 and 2025, plus the commit histories on a handful of GitHub repos.
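Those author lists can be pulled programmatically from the public arXiv API, which returns an Atom feed. The sketch below is a minimal version. Two parts of it are illustrative assumptions: the title-search terms and the choice to key results by paper title. Neither Standard Intelligence nor Refolk is known to build its lists this way.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "https://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used by arXiv's Atom responses


def arxiv_query_url(title_term: str, max_results: int = 5) -> str:
    """Build a title search (the ti: field prefix) against the arXiv API."""
    params = {"search_query": f"ti:{title_term}", "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_authors(atom_xml) -> dict:
    """Map each returned paper title to its ordered list of author names."""
    root = ET.fromstring(atom_xml)
    papers = {}
    for entry in root.findall(f"{ATOM}entry"):
        # arXiv titles can contain line breaks; collapse all runs of whitespace.
        title = " ".join(entry.findtext(f"{ATOM}title", default="").split())
        papers[title] = [
            author.findtext(f"{ATOM}name", default="").strip()
            for author in entry.findall(f"{ATOM}author")
        ]
    return papers


def fetch_authors(title_term: str, max_results: int = 5) -> dict:
    """Network call: query arXiv and return {title: [authors]}."""
    with urllib.request.urlopen(arxiv_query_url(title_term, max_results)) as resp:
        return parse_authors(resp.read())
```

Calling `fetch_authors("OSWorld")` returns every matching title with its authors. A title search can also surface unrelated papers that share the term, so check each title before its authors go on the list. Space out repeated calls, as arXiv asks of API users.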

To stress-test that, we ran a senior-researcher query against professional-network data for people who describe themselves as working on "computer use agent multimodal." The top result was one Research Scientist at Google DeepMind. That was effectively the entire useful return.

One useful senior profile globally matches "computer use agent multimodal" by headline: a title-based LinkedIn search returns essentially nobody for the exact skill set Standard Intelligence is hiring against.

That is not a tooling failure. It is a structural one. The researchers building this stuff are titled "Research Scientist," "Member of Technical Staff," "PhD Student," or, in the Microsoft Research case, "Principal Researcher." None of them carry "GUI Agent Engineer" in their headline because the field is eighteen months old and no HR system has caught up.

If you only search by title, you will miss the field.

Where the signal actually lives

Sourcing foundation model researchers in this niche is a paper-trail exercise. Four lists are doing more work than any LinkedIn filter:

OSWorld authors and leaderboard contributors. OSWorld was released in April 2024 by XLANG Lab at the University of Hong Kong, with collaborators at Salesforce Research, Carnegie Mellon, and the University of Waterloo. The author list, which includes Tianbao Xie, Tao Yu, Victor Zhong, Shuyan Zhou, Yiheng Xu, Caiming Xiong, and Silvio Savarese, is a literal sourcing list. So is every team that has since submitted a result to the leaderboard.

ByteDance UI-TARS. Yujia Qin and team shipped the leading open-weight grounding model that most OSWorld evaluations are now run against. The author list is small and named.

Microsoft Research Magma. Authors include Jianwei Yang, Reuben Tan, Qianhui Wu, and Jianfeng Gao. Magma is pretrained on images, videos, and robotics data using Set-of-Mark and Trace-of-Mark. It is the closest published analog to FDM-1's thesis, which means it is also the closest analog to FDM-1's hiring pool.

Shanghai AI Lab / OS-Atlas. Zhiyong Wu et al. built a cross-platform corpus of over 13 million GUI elements and reported gains across six benchmarks spanning mobile, desktop, and web. Read the acknowledgments section, not the abstract.

The state of the art these teams are chasing is still painfully early. In results published in early 2025, Claude 3.7 Sonnet hits 28% on OSWorld, Agent S2 with Claude 3.7 Sonnet reaches 34.5%, and OpenAI Operator scores 58% on WebArena and 38% on OSWorld. There is enormous headroom, which is exactly why the pool of people who can move the number is so small and so over-courted.

Why title-based search is structurally broken for this hire

Pull up LinkedIn Recruiter and type "GUI agent." You get a handful of post-2025 startup hires who adopted the term in marketing-adjacent roles. The researchers who define the field are titled by their employer's HR taxonomy, which predates the field by a decade.

This is the same pattern we saw with "prompt engineer" in 2023 and "LLM researcher" in 2022. Title catches up to skill on a two-year lag. If you wait for the title to exist, the hire is already gone.

Skill-based sourcing, not titles, is the only thing that works here. You are looking for people whose work intersects three vectors: video encoders, pixel-space policy models, and desktop or web-agent evaluation. Most of them publish. Almost none of them advertise.
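To make the title-versus-skill contrast concrete, here is a toy filter. It checks which of the three vectors a profile's published work touches and compares that with a plain headline search. Everything in it is hypothetical and for illustration only: the keyword sets, the profiles, and the two-vector threshold. It is not how Refolk or any real search product scores candidates.

```python
from dataclasses import dataclass, field

# Hypothetical keyword sets for the three skill vectors named above.
# A real system would need a far richer taxonomy; these are illustrative only.
SKILL_VECTORS = {
    "video_encoders": {"video pretraining", "video representation learning", "v-jepa", "videomae", "magvit"},
    "pixel_space_policy": {"gui grounding", "pixel-space policy", "imitation learning", "ui-tars", "magma"},
    "agent_evaluation": {"osworld", "webarena", "os-atlas", "computer use"},
}


@dataclass
class Profile:
    name: str
    title: str  # employer HR title, e.g. "Research Scientist"
    topics: set = field(default_factory=set)  # drawn from papers, repos, talks


def title_match(profile: Profile, query: str) -> bool:
    """What a title-indexed search does: substring match on the headline."""
    return query.lower() in profile.title.lower()


def vectors_covered(profile: Profile) -> set:
    """Return which of the three skill vectors this profile's work touches."""
    topics = {t.lower() for t in profile.topics}
    return {name for name, keywords in SKILL_VECTORS.items() if topics & keywords}


def skill_match(profile: Profile, min_vectors: int = 2) -> bool:
    """Skill-based filter: require coverage of at least min_vectors vectors."""
    return len(vectors_covered(profile)) >= min_vectors


# Hypothetical profiles for illustration only.
profiles = [
    Profile("Candidate A", "Research Scientist", {"video pretraining", "OSWorld"}),
    Profile("Candidate B", "Member of Technical Staff", {"GUI grounding", "WebArena"}),
    Profile("Candidate C", "GUI Agent Engineer", {"prompt engineering"}),
    Profile("Candidate D", "PhD Student", {"VideoMAE"}),
]

title_hits = [p.name for p in profiles if title_match(p, "GUI agent")]
skill_hits = [p.name for p in profiles if skill_match(p)]
print("title search:", title_hits)  # only the marketing-titled profile
print("skill search:", skill_hits)
```

Lowering `min_vectors` to 1 admits Candidate D, the video-only researcher that the next section argues is a strong hire. Where to set that threshold is a judgment call the sketch does not make for you.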

This is exactly the kind of brief we built Refolk for. You describe the person in plain English ("research scientist who has published on video pretraining and has any commit history on OSWorld, UI-TARS, OS-Atlas, or Magma") and get back a ranked shortlist across GitHub, LinkedIn, and the open web. The title field is one signal among many, not the only one.

The video-pretraining angle widens the pool in a non-obvious direction

Most recruiters chasing this brief will only look at the agent crowd. They will miss half the talent.

FDM-1 is trained on raw video rather than annotated screenshots. Standard Intelligence says its video encoder is 100 times more efficient than OpenAI's alternative. That claim, true or not, tells you what kind of researcher they need: someone who has spent the last three years on video representation learning, not on prompting agents.

That widens the pool to V-JEPA contributors at Meta FAIR, the VideoMAE authors, and the MAGVIT team at Google. None of those people have "GUI" anywhere in their work history. All of them are exactly the hire.


A video-model ML researcher who has never touched a desktop agent benchmark is a stronger FDM-1 candidate than a prompt-engineering generalist who has. The thesis is in the substrate, not the surface.

The Kovac signal: source from robotics, not NLP

Milan Kovac led Tesla Optimus engineering before angeling this round. Karpathy, who previously led AI and Autopilot vision at Tesla, is also on the cap table. That is not just a press-release flex. It is a sourcing map.

The kind of engineer they admire (and the kind their network will refer) is an end-to-end vision-policy person from Tesla Autopilot, Figure, 1X, or Waymo perception. Not a classical NLP or agent-framework researcher. The driving demo in FDM-1's launch post used a web-based steering interface built on top of openpilot's joystick mode. The openpilot contributor graph on GitHub is, again, a literal list of viable candidates.
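GitHub's public REST API exposes a repository's contributor list at `GET /repos/{owner}/{repo}/contributors`. The sketch below reads that list under a few assumptions: the repository is `commaai/openpilot`, the contribution threshold is arbitrary, and dropping accounts whose `type` is not `"User"` is simply one way to filter out bots. It illustrates the idea and is not a production sourcing tool.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def contributors_url(owner: str, repo: str, page: int = 1, per_page: int = 100) -> str:
    """GitHub REST endpoint listing a repository's contributors (max 100 per page)."""
    query = urllib.parse.urlencode({"per_page": per_page, "page": page})
    return f"{API_ROOT}/repos/{owner}/{repo}/contributors?{query}"


def human_contributors(rows: list, min_contributions: int = 1) -> list:
    """Drop bot accounts and keep (login, contributions) at or above a threshold."""
    return [
        (row["login"], row["contributions"])
        for row in rows
        if row.get("type") == "User" and row.get("contributions", 0) >= min_contributions
    ]


def fetch_contributors(owner, repo, min_contributions=1, max_pages=5, token=None):
    """Network call: page through the endpoint until it returns an empty page."""
    results = []
    for page in range(1, max_pages + 1):
        request = urllib.request.Request(
            contributors_url(owner, repo, page),
            headers={"Accept": "application/vnd.github+json"},
        )
        if token:  # unauthenticated requests get a low hourly rate limit
            request.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(request) as response:
            body = response.read()
        rows = json.loads(body) if body else []  # empty repos return no body
        if not rows:
            break
        results.extend(human_contributors(rows, min_contributions))
    # Sort by contribution count (descending), then login, after filtering.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))
```

A call such as `fetch_contributors("commaai", "openpilot", min_contributions=5)` returns logins ranked by total contributions. Those counts are lifetime totals, so recency still has to be checked separately.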

If you are recruiting against Standard Intelligence, or competing for the same people, do not start with the agent crowd. Start with the imitation-learning and policy-learning crowd. They are fewer in number and far less contacted.

The infra-taste filter most recruiters miss

A six-person team built a 30-petabyte storage cluster in San Francisco for under $500,000. That is roughly 20 times cheaper than the equivalent on a hyperscaler.
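Working the stated figures through shows what "20 times cheaper" implies. Below is a minimal check that uses only the numbers above. It assumes decimal units (1 PB = 1,000 TB), which is an assumption about how the capacity was quoted.

```python
# Figures as stated in the article. The time horizon of the hyperscaler
# comparison is not given, so the implied figures only restate what "20x" means.
CLUSTER_CAPACITY_PB = 30
CLUSTER_COST_USD = 500_000   # "under $500,000", treated here as an upper bound
CLAIMED_MULTIPLE = 20        # "roughly 20 times cheaper"
TB_PER_PB = 1_000            # decimal units (assumption)


def cost_per_tb(cost_usd: float, capacity_pb: float) -> float:
    """Dollars per terabyte of raw capacity."""
    return cost_usd / (capacity_pb * TB_PER_PB)


self_built_per_tb = cost_per_tb(CLUSTER_COST_USD, CLUSTER_CAPACITY_PB)
implied_hyperscaler_total = CLUSTER_COST_USD * CLAIMED_MULTIPLE
implied_hyperscaler_per_tb = self_built_per_tb * CLAIMED_MULTIPLE

print(f"self-built: at most ${self_built_per_tb:,.2f} per TB")
print(f"implied 20x: ${implied_hyperscaler_total:,.0f} total, ${implied_hyperscaler_per_tb:,.2f} per TB")
```

That works out to at most about $16.67 per terabyte for the cluster, against an implied $10 million (about $333 per terabyte) for the same capacity on a hyperscaler. The article does not say over what period the hyperscaler figure is measured, so read it as an order-of-magnitude comparison.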


That number changes the hiring profile. The next data engineer at Standard Intelligence is not a generic "ML platform" person. They have shipped video data pipelines at petabyte scale on commodity hardware. That filter points at ex-YouTube infra, ex-Tesla data engine, and ex-Waymo perception data, not at people whose résumé is mostly Databricks and Snowflake.

Hiring GUI agent engineers at this scale is a two-axis problem: research taste and infra taste. Most recruiters only filter on one.
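The two axes can be kept apart by tagging each candidate on both and routing them to separate outreach lists. The toy sketch below does that. Its signal lists and candidates are hypothetical, and a résumé that hits neither axis (for example, one built mostly on Databricks and Snowflake) simply drops out.

```python
# Hypothetical signal lists for the two axes; illustrative, not a validated rubric.
RESEARCH_SIGNALS = {"osworld", "ui-tars", "os-atlas", "magma", "v-jepa", "videomae", "magvit"}
INFRA_SIGNALS = {
    "petabyte-scale video pipeline",
    "commodity storage cluster",
    "tesla data engine",
    "youtube infra",
    "waymo perception data",
}


def axes(signals: set) -> tuple:
    """Return (research_taste, infra_taste) as booleans for one candidate."""
    lowered = {s.lower() for s in signals}
    return bool(lowered & RESEARCH_SIGNALS), bool(lowered & INFRA_SIGNALS)


def route(candidates: dict) -> dict:
    """Split candidates into separate outreach lists by the axes they satisfy."""
    lists = {"research": [], "infra": [], "both": []}
    for name, signals in candidates.items():
        research, infra = axes(signals)
        if research and infra:
            lists["both"].append(name)
        elif research:
            lists["research"].append(name)
        elif infra:
            lists["infra"].append(name)
        # Candidates matching neither axis are left off every list.
    return lists


# Hypothetical candidates for illustration only.
candidates = {
    "Candidate E": {"Magma", "OSWorld"},
    "Candidate F": {"Tesla data engine", "petabyte-scale video pipeline"},
    "Candidate G": {"Databricks", "Snowflake"},
}
shortlists = route(candidates)
print(shortlists)
```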

The Atlas Fellowship angle

The co-founders, Galen Mead (21) and Devansh Pandey (20), met through the Atlas Fellowship. Mead left the University of Toronto. The Atlas Fellowship alumni network is small, identifiable, and full of AI-alignment-pilled young researchers who would self-select into this mission.

For a recruiter, that is a sourcing list of a few hundred people, almost all of whom are at the start of their careers and most of whom are not yet on the radar of any agent lab. Half the value of the round is that it puts Standard Intelligence in a position to hire the next five Atlas alumni before OpenAI or Anthropic notice.

A practical Standard Intelligence hiring playbook

If you are running the next 24 hires at this company, or competing for the same talent, here is the order of operations the evidence above supports:

Start with arXiv, not LinkedIn. Pull author lists for OSWorld, UI-TARS, OS-Atlas, Magma, V-JEPA, VideoMAE, MAGVIT, and CUA-Suite (VideoCUA, UI-Vision, GroundCUA, March 2026). That is your universe.
Cross-reference with GitHub. Filter to people with commit history on the corresponding repos plus openpilot. Recency matters more than total commits.
Map current employer. XLANG Lab at HKU, Microsoft Research, ByteDance, Shanghai AI Lab, Salesforce Research, Meta FAIR, Google DeepMind, and the agent teams at OpenAI and Anthropic. That accounts for most of the pool.
Run the infra filter separately. Ex-Tesla data engine, ex-YouTube infra, ex-Waymo perception. These are different people from the researchers and they need a different message.
Send a researcher-grade first message. Reference a specific paper, not a generic "saw your background." This is a community small enough that template outreach is recognized within one message.
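Steps one through three can be sketched as a small pipeline:

1. Invert paper author lists into a person-to-papers map.
2. Rank people with recent commits ahead of everyone else, so recency outweighs volume.
3. Group the ranked list by current employer.

Every name, date, and employer assignment below is a hypothetical placeholder. The one-year recency window is also an assumption. The sketch further assumes that paper authors have already been matched to GitHub accounts, which is the hard part in practice. Steps four and five stay manual.

```python
from collections import defaultdict
from datetime import date, timedelta

# Step 1 input: author lists per paper (hypothetical placeholder names).
PAPER_AUTHORS = {
    "OSWorld": ["Researcher A", "Researcher B"],
    "UI-TARS": ["Researcher C"],
    "V-JEPA": ["Researcher B", "Researcher D"],
}

# Step 2 input: most recent commit per person on the matching repos (hypothetical).
LAST_COMMIT = {
    "Researcher A": date(2026, 4, 2),
    "Researcher B": date(2024, 11, 20),
    "Researcher D": date(2026, 3, 15),
}

# Step 3 input: current employer (hypothetical assignments).
EMPLOYER = {
    "Researcher A": "XLANG Lab (HKU)",
    "Researcher B": "Meta FAIR",
    "Researcher C": "ByteDance",
    "Researcher D": "Meta FAIR",
}


def build_universe(paper_authors: dict) -> dict:
    """Step 1: invert paper -> authors into person -> set of papers."""
    universe = defaultdict(set)
    for paper, authors in paper_authors.items():
        for author in authors:
            universe[author].add(paper)
    return dict(universe)


def is_recent(name: str, today: date, window_days: int = 365) -> bool:
    """Step 2: has this person committed within the window?"""
    last = LAST_COMMIT.get(name)
    return last is not None and today - last <= timedelta(days=window_days)


def shortlist(today: date):
    """Rank recent committers first, then by paper count; group by employer."""
    universe = build_universe(PAPER_AUTHORS)
    ranked = sorted(
        universe,
        key=lambda name: (not is_recent(name, today), -len(universe[name]), name),
    )
    by_employer = defaultdict(list)
    for name in ranked:
        by_employer[EMPLOYER.get(name, "unknown")].append(name)
    return ranked, dict(by_employer)


ranked, by_employer = shortlist(today=date(2026, 5, 15))
print(ranked)
print(by_employer)
```

Researcher B appears on two papers but still ranks below the two recent committers. That ordering is deliberate: it encodes "recency matters more than total commits" directly in the sort key.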

Steps one through three are the part that breaks for most recruiters, because the tools they use are title-indexed. Refolk is built for exactly this shape of brief: describe the person in plain English, get a ranked shortlist across GitHub, LinkedIn, and the open web, with the citation graph already wired in. You do not have to know in advance that V-JEPA contributors belong on the same shortlist as OSWorld authors. The query handles it.

What this means for the rest of the field

Standard Intelligence hiring is a leading indicator. Every lab working on computer use agents is about to discover that their next 24 hires live on the same author lists. Sequoia and Spark just put $75M behind the bet that a six-person team can win the race. They can, if they hire faster than the others find the list.

The recruiters who win this cycle will be the ones who stopped searching by title in 2025.

FAQ

Why is title-based LinkedIn search so bad for this hire?

Because the field is younger than the job-title taxonomy. Researchers working on GUI agents, computer use, and video-pretrained policy models are titled "Research Scientist," "Member of Technical Staff," or "PhD Student" by their employers. The skill exists. The title does not. Searching by title returns roughly one useful profile globally, which is a structural problem, not a tooling complaint.

Who should Standard Intelligence actually be poaching from?

XLANG Lab at HKU (the OSWorld team), ByteDance's UI-TARS group, Microsoft Research's Magma team, Shanghai AI Lab's OS-Atlas team, plus the video-pretraining crowds at Meta FAIR (V-JEPA), Google (MAGVIT), and the VideoMAE authors. Add openpilot contributors and ex-Tesla Optimus and Autopilot policy engineers for the Kovac side of the network.

Is the video-pretraining angle really separate from the GUI agent angle?

Yes, and that is the most expensive thing for recruiters to get wrong. FDM-1's thesis is that video is the right substrate for learning computer use. The people who can move that needle are video-encoder researchers who have never published a GUI paper. If you only source from the agent crowd, you miss the half of the team that makes the thesis work.

How do I source this without spending six months reading arXiv?

You describe the person in plain English (skills, papers, repos, adjacent fields) and let a sourcing tool resolve the citation graph for you. That is the brief Refolk was built for: find anyone, just ask, across GitHub, LinkedIn, and the open web. For a pool this narrow, it is the difference between a shortlist on Monday and a hire that goes to a competitor by Friday.