Refolk
June 29, 2026·10 min read

Fabric's 19,368 Interviews: 61% of Cheaters Still Passed. Redesign the Loop.

Fabric's 19,368-interview dataset shows 38.5% of candidates triggered cheating signals and 61% still passed. Why detection loses and what to run instead.

AI interview cheating detection · Cluely interview tool · live coding interview alternatives · technical screening 2026 · junior engineer hiring signal

If you still run a Zoom screen-share with a shared IDE as your primary technical screen, your loop is leaking signal in a way you cannot see. Cluely's overlay is drawn on the candidate's display above the layer that screen capture reads, so the cheat is invisible to the interviewer watching the share. Fabric's January 2026 dataset puts a number on the damage: 38.5% of candidates trip cheating signals, and 61% of those still clear the ≥7.0 approval bar.

The mechanic interviewers keep underestimating

Cluely (the rebranded Interview Coder, built by Columbia dropouts Roy Lee and Neel Shanmugam) does not screen-scrape or inject keystrokes. It hooks the OS graphics pipeline directly: DirectX on Windows, Metal on macOS. The answer renders on the candidate's local display as a transparent overlay above the IDE. Zoom, Teams, and Meet capture the framebuffer beneath the overlay. The interviewer sees a clean editor. The candidate sees the solution.

This is not a "tell the candidate to share their full desktop" problem. The overlay sits above everything the conferencing software is allowed to read. The only way to catch it from the interviewer side is a second device pointed at the screen, and that is not a hiring process; it is a deposition.

The Cluely interview tool is the loudest example, but it is not the only one. Final Round AI and Leetcode Wizard run the same play. Fabric counted at least six new Cluely-class tools and three open-source clones in 2025 alone. When detection vendors announce they are "specifically tuned for Cluely," the cheating market just ships the next overlay. You are not winning that race.

What the Fabric numbers actually say

Fabric analyzed 19,368 AI-powered interviews on its platform between July 2025 and January 2026, with automated behavioral and technical signal detection on every session. The platform sells detection, so read the absolute rate with appropriate skepticism. The shape of the data is harder to argue with.

38.5%
Candidates triggering AI cheating signals
Fabric's analysis of 19,368 interviews, July 2025 to January 2026.

The split by role is where the format problem becomes obvious: 48% in technical roles versus 12% in sales. Cheating is not a character trait of engineers. It is a response to a format that asks them to solve LeetCode in a window while pretending the assistant they use eight hours a day does not exist.

Juniors (zero to five years of experience) cheat at roughly twice the rate of seniors. That ratio is not moral; it is economic. A $20 to $50 monthly Cluely subscription against a $150K offer in the worst entry-level market in a decade is a rational bet. Sunday interviews hit 47.1%, the highest rate of any day of the week, which tells you the marginal cheater is a tired candidate at the end of a weekend grind, not a sociopath.
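The economics can be made concrete with a back-of-the-envelope break-even calculation. The sketch below is illustrative only: the subscription price and offer value come from the figures above, while the `break_even_uplift` helper, the one-month subscription, and the risk-neutral candidate are assumptions added for illustration, not Fabric data.

```python
def break_even_uplift(monthly_cost: float, months: int, offer_value: float) -> float:
    """Smallest increase in offer probability that pays for the subscription.

    Hypothetical model: the candidate is risk-neutral and the only cost is the
    subscription fee. Ignoring any penalty for being caught flatters the bet.
    """
    return (monthly_cost * months) / offer_value


# $50/month (top of the quoted range) for one month against a $150K offer.
uplift = break_even_uplift(monthly_cost=50, months=1, offer_value=150_000)
print(f"{uplift:.4%}")  # 0.0333%
```

Under those assumptions the tool pays for itself if it raises the chance of an offer by about 0.03 percentage points, which is why the bet looks rational from the candidate's side.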

Adoption more than doubled from 15% in June 2025 to 35% by December 2025. Fabric projects it becomes the median behavior by late 2026. Gartner projects one in four candidate profiles will be entirely synthetic by 2028. The trendline is not your friend.

The 38.5% figure is a statistic about the format, not about the candidates.

The 61% is the real scandal

The headline number from the dataset is not that 38.5% cheated. It is that 61% of detected cheaters still passed the ≥7.0 bar. That means more than half the time, your rubric is grading the model's output, not the candidate's thinking. The cost does not show up on the scorecard. It shows up in month three when the new hire cannot debug their own code, cannot read a stack trace, and cannot use git without asking the model.
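Multiplying the two headline figures shows how much of the pool the problem touches. This sketch assumes the 38.5% flag rate applies to all 19,368 interviews and that the 61% pass rate applies uniformly to flagged sessions; both are simplifying assumptions, and the variable names are illustrative.

```python
TOTAL_INTERVIEWS = 19_368      # size of Fabric's dataset
FLAG_RATE = 0.385              # share of sessions that tripped cheating signals
PASS_RATE_IF_FLAGGED = 0.61    # share of flagged sessions scoring >= 7.0

flagged = TOTAL_INTERVIEWS * FLAG_RATE
flagged_and_passed = flagged * PASS_RATE_IF_FLAGGED
share_of_all = FLAG_RATE * PASS_RATE_IF_FLAGGED

print(round(flagged))             # 7457
print(round(flagged_and_passed))  # 4549
print(f"{share_of_all:.1%}")      # 23.5%
```

On those assumptions roughly 4,500 sessions, about 23.5% of everything in the dataset, were both flagged and approved.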

The Amazon intern story that put Roy Lee in the headlines is the canonical version of this failure mode. He cleared a real technical loop using the tool he then sold to everyone else. The loop did not catch it. Neither will yours.

Why detection is the wrong investment

Karat's 2026 survey of 400 engineering leaders across the US, India, and China found 71% say AI is making it harder to assess candidates' technical skills. Mohit Bhende, Karat's CEO, put the diagnosis bluntly: the popular theory is that cheating is the problem, but the real problem is that the fundamental job has changed and the interview format has not.

If your loop bans the tool every working engineer uses every day, you are not measuring skill. You are measuring compliance with an obsolete protocol. The candidates who comply are not necessarily better. They are often just less informed about what is available, or more risk-averse, or earlier in their job search. None of those are job-relevant traits.

Cluely itself is already pivoting toward enterprise meeting-assistant use cases. When the category leader of the cheating market is fleeing the use case, designing your interview loop around defeating it is a strategic error. The 83,000-user Cluely breach is a useful footnote here: every candidate who used the tool is now in a leaked dataset. That is a problem for them, not a sourcing solution for you.

Split the prescription by level

The single biggest mistake hiring teams make right now is running one format for everyone. The cheating delta is largest at the junior end. The judgment signal is largest at the senior end. Those need different loops.

Junior loop: supervised work-sample in a hardened environment

For zero to five years of experience, the failure mode is grading model output as candidate skill. The fix is to run a supervised, in-environment work-sample test where the candidate is observed working on a small, real codebase and asked to explain decisions as they go. Think a 45-minute session in a sandbox you control, with a webcam on, screen recording on, and the interviewer asking "why did you do that" every few minutes.

The point is not to catch cheaters. The point is that the format makes the cheat useless. An overlay cannot answer "why did you pick a HashMap here and not a TreeMap" in real time with the interviewer's specific follow-up. You are measuring reasoning under interactive load, not output correctness.

For a junior engineer hiring signal that actually predicts performance, the cleanest read comes from a candidate explaining a tradeoff they did not anticipate. That is the moment the model-as-crutch breaks.

Senior loop: AI-allowed pair-programming on ambiguous problems

For senior and staff levels, ban the AI and you have banned the actual job. Allow it and grade the judgment. Meta began piloting an AI-enabled coding interview in October 2025 that replaces one of the two onsite coding rounds: 60 minutes in a CoderPad environment with Claude Sonnet 4.5, GPT-5, or Gemini 2.5 Pro built in. Google is piloting a Gemini-assisted round, initially targeting junior and mid-level roles on select US teams. Sundar Pichai said in his April 22, 2026 blog post that 75% of new code at Google is now AI-generated and approved by engineers, up from 50% the prior fall.

If three quarters of new code at Google is AI-generated, an interview that forbids the model is testing for the 25% slice. The senior loop should look like the job: ambiguous problem, AI assistant available, candidate has to decide what to ask the model, when to trust it, when to override it, and how to verify. Shopify (Farhan Thawar has been vocal here), Stripe, Canva, Rippling, and Red Hat are running variants of this.

The grading rubric changes. You are no longer scoring "did the code work." You are scoring prompt quality, verification rigor, willingness to push back on a wrong suggestion, and architectural reasoning. Those are the live-coding interview alternatives that actually generalize to 2026 production work.
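One way to make that rubric concrete is a weighted scorecard over the four dimensions. The sketch below is an illustration under stated assumptions: the dimension names mirror the paragraph above, but the equal weights, the 0 to 10 scale, and the `score_senior_round` helper are hypothetical, not a rubric used by Meta, Google, or any other company named here.

```python
from dataclasses import dataclass

# Hypothetical equal weights; tune them to what your team values.
WEIGHTS = {
    "prompt_quality": 0.25,
    "verification_rigor": 0.25,
    "pushback_on_wrong_suggestions": 0.25,
    "architectural_reasoning": 0.25,
}


@dataclass(frozen=True)
class SeniorRoundScores:
    """Interviewer ratings on an assumed 0-10 scale."""
    prompt_quality: float
    verification_rigor: float
    pushback_on_wrong_suggestions: float
    architectural_reasoning: float


def score_senior_round(scores: SeniorRoundScores) -> float:
    """Weighted average of the four judgment dimensions."""
    for name in WEIGHTS:
        value = getattr(scores, name)
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return sum(getattr(scores, name) * w for name, w in WEIGHTS.items())


example = SeniorRoundScores(
    prompt_quality=8,
    verification_rigor=6,
    pushback_on_wrong_suggestions=9,
    architectural_reasoning=7,
)
print(score_senior_round(example))  # 7.5
```

"Did the code work" is deliberately absent as a dimension: in this sketch, correctness only matters through how the candidate verified it.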

The sourcing implication

This is where the problem doubles back on the top of the funnel. If your screen is bleeding 38.5% noise and 61% of bad signal passes through, the only way to keep your engineering bar is to start with a higher-quality pool. Inbound, ranked by an ATS that prioritizes recency, is not that pool. Public signal, ranked by what someone actually built, is.

Sourcing on plain-English descriptions of demonstrated work is the part of the pipeline that does not get easier with detection vendors. It gets easier when you can ask for what you actually want and get back people whose public output already shows it. That is the bet behind Refolk: describe the engineer in plain English, get the ranked shortlist across GitHub, LinkedIn, and the open web, and walk into the interview already knowing the candidate has shipped the thing.

For technical screening in 2026, the order of operations flips. The interview is no longer the first real filter. The sourcing query is. Refolk lets you front-load the signal you used to hope a 45-minute screen-share would extract, which means the interview can do the one thing it is still good at: watching a human reason through ambiguity in real time.

If your loop is collapsing, your sourcing query has to do more work.

What to stop doing on Monday

Stop running the Zoom-shared-IDE LeetCode screen as your primary filter. It is not catching the overlay, the rubric is grading model output, and the pass-through rate of detected cheaters is 61%. Karat's data shows 63% of US employers still use automated code tests and 45% still use take-homes, with confidence in both falling. You do not need to be the last team running the format whose own vendors are losing faith in it.

Stop investing in AI interview cheating detection as a primary line of defense. It is a useful secondary signal. It is not a strategy. Detection vendors are training against named tools and the market forks the next overlay within weeks.

Start with format redesign by level. Junior: supervised, observed, conversational work-sample. Senior: AI-allowed pair-programming on an ambiguous problem, rubric scoring judgment. And start earlier, at sourcing, where the work of finding people who have already demonstrated the skill in public has gotten dramatically more tractable. Refolk's plain-English queries against GitHub, LinkedIn, and the open web are how a small team gets to a 10-person shortlist that already passes the bar before the interview starts.

The teams that win the next two years of engineering hiring are the ones who stop trying to defeat Cluely and start running loops where Cluely does not matter.

FAQ

Does Cluely actually work against Zoom, Teams, and Meet screen-share?

Yes, by design. Cluely renders through the OS graphics pipeline (DirectX on Windows, Metal on macOS) as an overlay above the framebuffer that conferencing software captures. The interviewer sees a clean IDE. The candidate sees the model's answer over the top of it. Asking the candidate to share their entire desktop does not help, because the overlay sits above the layer Zoom is permitted to read. The only reliable detection from the interviewer side is a second camera pointed at the candidate's screen, which is not a scalable hiring process.

If detection does not work, what is the single highest-leverage change to make?

Split your loop by level and change what you are measuring. For juniors, run a supervised work-sample in an environment you control, with the interviewer asking "why" questions in real time. For seniors, allow the AI and grade prompt quality, verification, and architectural judgment. Meta's October 2025 CoderPad-with-AI pilot and Google's Gemini-assisted round are the live templates. The point is to make the cheat irrelevant, not to catch it.

How should sourcing change in response to this?

Move the signal earlier in the funnel. If the interview cannot reliably distinguish skill from model output, your shortlist has to arrive with the skill already demonstrated in public. That means sourcing on shipped repos, technical writing, and conference talks, not on self-declared skills or keyword matches on a resume. Tools like Refolk let you describe the engineer in plain English and rank candidates by what they have actually built across GitHub, LinkedIn, and the open web.

Is it fair to allow AI in some rounds but not others?

It is more fair than the current state, where the candidate who knows about Cluely beats the one who does not, and both get graded on a rubric that pretends the tool does not exist. Being explicit (AI-allowed in this round, AI-prohibited in that round, here is what we are grading) is a clearer contract than the implicit honor system the industry has been running. Candidates respond well to clarity. Loops that publish their AI policy upfront also see better completion rates.
