Over the past few weeks, we ran manual codebase scans for 10 early-stage startups. All voluntarily. All anonymized below. The deal was simple: connect your repo, our scanner runs in an isolated sandbox, you get a findings report. We never see your code. No sales pitch attached.
We were not looking for everything. We were looking for the things that actually blow up in production. The things a fast-moving team with AI-assisted PRs would miss because the reviewer was tired, or because the reviewer was also the author, or because there was no reviewer at all.
Here is what we found.
By the numbers
9/10 codebases had at least one high-severity finding
6/10 had a secret that had already touched a production branch
8/10 had a dependency with a known critical CVE
7/10 had at least one migration that could not be cleanly rolled back
Finding 01: Leaked secrets in production branches
Six out of ten. That is the number that surprised us most. These were not secrets buried in an old feature branch. They were in main. A few were in commit history that had been squash-merged and forgotten. One was a live Stripe secret key that had been rotated but not removed from the repo.
The pattern we see most often: a developer hardcodes a key locally to test something quickly, then that file gets staged, committed, and pushed in a batch commit at 11pm. The PR looks fine. Nobody checks the diff on config files carefully.
# what we actually found, paraphrased
STRIPE_SECRET_KEY=sk_live_4xR9...redacted # committed 47 days ago
AWS_ACCESS_KEY_ID=AKIA...redacted # committed 61 days ago
DATABASE_URL=postgres://admin:p4ss...redacted # still in HEAD

None of these would have been caught by a standard linter. They were caught by a scan that was specifically looking for them.
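To make that concrete, here is a minimal sketch of the kind of pattern matching a secret scan does. The patterns and names below are illustrative only, not our actual ruleset; a real scanner also checks entropy, file paths, and full commit history.

```python
import re

# Illustrative patterns for a few well-known key formats.
# A production ruleset is far larger and also scans git history.
SECRET_PATTERNS = {
    "stripe_live_key": re.compile(r"sk_live_[0-9a-zA-Z]{10,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "postgres_url_with_password": re.compile(r"postgres://[^:\s]+:[^@\s]+@"),
}

def scan_text(text):
    """Return (pattern_name, matched_text) for every hit in a blob of text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits

# Example: a connection string with an embedded password is flagged.
sample = "DATABASE_URL=postgres://admin:p4ssword@db.internal:5432/app"
findings = scan_text(sample)
```

The point is that these are mechanical checks. Nothing here requires judgment, which is exactly why a dedicated scan catches what a tired reviewer does not.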
Finding 02: Vulnerable dependencies nobody noticed
Eight out of ten codebases had at least one dependency flagged with a critical CVE. The average time since the vulnerability was published: 94 days. The average number of PRs merged into the codebase in that window: 38.
The dependency that came up most often was a JSON parsing library used as a transitive dependency three layers deep. Nobody installed it directly. It showed up because something else pulled it in. It had a known prototype pollution vulnerability. It was easy to miss and easy to fix once you knew it was there.
The problem is not that teams are careless. The problem is that dependency trees are opaque and most review workflows do not have a dedicated step for checking what changed below the surface.
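A toy version of the problem: walking a dependency graph to surface a vulnerable package that nobody installed directly. Package names and the graph below are made up for illustration; a real check resolves the actual lockfile and queries a CVE feed.

```python
from collections import deque

# Toy dependency graph: package -> direct dependencies.
# "tiny-json" is three layers down and never appears in the
# top-level manifest, which is how it goes unnoticed.
DEPENDENCY_GRAPH = {
    "app": ["web-framework", "http-client"],
    "web-framework": ["template-engine"],
    "template-engine": ["tiny-json"],
    "http-client": [],
    "tiny-json": [],
}

KNOWN_VULNERABLE = {"tiny-json"}  # stand-in for a CVE feed lookup

def find_vulnerable(root):
    """BFS the tree, recording the full path to each vulnerable package."""
    findings = []
    queue = deque([(root, [root])])
    seen = set()
    while queue:
        pkg, path = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        if pkg in KNOWN_VULNERABLE:
            findings.append(" -> ".join(path))
        for dep in DEPENDENCY_GRAPH.get(pkg, []):
            queue.append((dep, path + [dep]))
    return findings
```

Reporting the full path, not just the package name, is what makes the finding actionable: it tells the team which direct dependency to bump or pin.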
Finding 03: Schema drift and migration risk
Seven out of ten had at least one migration that could not be rolled back without data loss. Three of those had already been applied to a staging database that shared the same schema as production.
Schema drift is the category where we see the clearest gap between what teams think is safe and what actually is. Most teams that write careful application code write fast migrations. The migration is an afterthought. It ships in the same PR as the feature.
“The migration ran fine in staging. It would have failed against 180,000 rows. The rollback would not have been clean.”
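A rough sketch of the check behind that flag. The operation names are invented for the example; a real scanner parses the actual DDL. The core idea is simple: a migration rolls back cleanly only if no step in it discards data.

```python
# Operations that destroy data the down-migration cannot recreate.
# Names are illustrative, not any real migration framework's API.
DESTRUCTIVE_OPS = {
    "drop_column",
    "drop_table",
    "narrow_type",          # e.g. TEXT -> VARCHAR(50) truncates
}

def destructive_steps(operations):
    """Return the steps that make a migration non-reversible without data loss."""
    return [op for op in operations if op in DESTRUCTIVE_OPS]

# A migration that adds an index and renames a column is fine;
# the drop_column in the middle is what gets flagged.
risky = destructive_steps(["rename_column", "drop_column", "add_index"])
```

The safe pattern is the usual two-phase one: stop writing to the column in one release, drop it in a later release once a rollback no longer needs it.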
Finding 04: Unsafe merges from AI-generated code
This one was harder to quantify, so we are being careful with the number. In five out of ten codebases, we found code patterns consistent with AI generation that had been merged without substantive review. Not because the teams were sloppy. Because the code looked fine on the surface and the PR was small.
What we actually flagged in this category: logic that handled the happy path correctly but had no error boundary for an obvious edge case. Input validation that was present in the UI layer but absent in the API route. Auth checks that existed on the route but not on a background job that hit the same data.
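The auth asymmetry is worth spelling out, because it is the easiest of these to miss in review. The sketch below uses hypothetical names throughout; it shows the shape of the gap we flagged, not any team's actual code.

```python
def require_role(user, role):
    """Raise unless the user carries the required role."""
    if role not in user.get("roles", []):
        raise PermissionError(f"user {user['id']} lacks role {role!r}")

def fetch_invoices(account_id):
    # Stand-in for a real billing-data query.
    return [{"account": account_id, "amount_cents": 4200}]

def export_invoices_route(user, account_id):
    """The HTTP route checks authorization before touching billing data."""
    require_role(user, "billing")
    return fetch_invoices(account_id)

def nightly_export_job(account_id):
    """The background job reads the same data with no check at all.
    This is exactly the asymmetry the scan flags: the route looks
    secure in review, and the job ships in a different PR."""
    return fetch_invoices(account_id)
```

Reviewed in isolation, each function looks reasonable. The finding only exists when you compare every access path to the same data, which is a whole-codebase check, not a diff check.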
This is the blast radius problem. A single unsafe merge is not usually catastrophic on its own. It is catastrophic when it is in a path that touches billing, user data, or access control.
What this tells us

These are not bad engineering teams. Most of them are moving fast with small crews, using AI tooling to ship faster, which is rational. The problem is that the generation tools and the review layer are not independent. The same model that wrote the code is, at best, being asked to review a diff of the code. That is not a review. That is a spell-check.
What these codebases needed was not more suggestions. They needed something to block the merge until the finding was resolved. That is what we are building.
We are opening early beta scans. Free.
We are running free codebase scans for early-stage startups right now. The scan is now fully automated. Connect your repo, the pipeline runs, you get a report. No queue. No batch. No waiting on us to find time in our day.
Fully blackboxed. We never touch your code.
The scan runs inside an isolated, ephemeral sandbox. Your code is never transmitted to us, stored on our infrastructure, or accessible to anyone on the autter team. The scanner executes, produces findings, and the environment is destroyed. What you get is a report. Not a relationship where someone at a startup has read your codebase.
This is not a footnote. For most of the teams we spoke to, it was the deciding factor.
No sales call required. No commitment after. If you want to talk through the findings, there is a link below to book 30 minutes. If you just want the report, that works too.
Get your free scan

P.S. Tanvi reviewed this post before it went out and said the stat on AI-generated code was "too soft and needs a harder number." She is right. We are working on that. For now, five out of ten is what we can stand behind without overfitting to a pattern. We will publish the harder number when we have it.

