Synthetic audience testing · built on published methodology
Find out if your page converts before you launch it.
Paste your landing page or App Store listing. We sit 150 strangers from your selected peer lane in front of it, capture their unfiltered reactions, and convert them into the same purchase-intent metrics Fortune-500 consumer brands use — plus every objection, in their own words.
Report in ~5 minutes · From $29 · No signup until you confirm the lane
“How likely are you to subscribe to this?” (scale: def. not · unsure · def. yes)
“The free tier sounds generous, but I can't tell what the paid plan actually adds. I'd try it and probably never upgrade.”
How it works
A consumer research panel, rebuilt in software.
The same four stages a research agency runs over a month, compressed into minutes, at a price an indie budget can survive.
We read your page like a customer does
Screenshot + copy extraction of your landing page or store listing. The panel reacts to exactly what visitors see — headline, screenshots, pricing, all of it.
You approve the lane
We suggest the peer lane your page should be benchmarked against. Nothing runs until you confirm the fixed panel and baseline.
150 strangers respond in their own words
Each persona reacts in free text, with no forced ratings. Research shows that asking AI for direct numeric ratings produces unrealistic distributions; natural reactions are where the signal lives.¹
Reactions become research-grade metrics
Semantic Similarity Rating maps every reaction onto a purchase-intent distribution, benchmarked against hundreds of scored pages — plus the top objections, quoted.
The science
Not vibes. A published method, validated against 9,300 real consumers.
150 Strangers implements Semantic Similarity Rating (SSR) — a technique developed by researchers at PyMC Labs and Colgate-Palmolive and tested against 57 real consumer surveys.
“LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings”
Maier, Aslak, Fiaschi, Rismal, Fletcher, Luhmann, Dow, Pappas & Wiecki (2025). arXiv:2510.08338 · open-source reference implementation on GitHub
- Asking AI for a 1–5 rating directly fails. Models cluster on “safe” middle answers and produce distributions nothing like real consumers. The study measured it: only 0.26 distributional similarity.
- Free-text reactions, mapped by meaning, work. SSR embeds each reaction and measures its semantic distance to calibrated anchor statements — recovering realistic distributions (similarity > 0.85) and product rankings at ~90% of what a repeated human panel achieves.
- Detailed personas are non-negotiable. With rich conditioning the method reached ~90% reliability; without it, signal collapsed to ~50%. That's why every lane has a fixed, detailed panel in place before we run anything.
- Comparison is where it shines. The method's strength is ranking — which variant, which competitor, which message wins. We engineered the whole product around that, instead of pretending one number predicts your conversion rate.
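To make the "mapped by meaning" step concrete, here is a minimal sketch of the idea, not the published implementation or our production code. The anchor statements, the toy bag-of-words `embed` function (a stand-in for a real sentence-embedding model), and the softmax with a `temperature` parameter are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical anchor statements, one per point on a 5-point intent scale.
ANCHORS = [
    "I would definitely not subscribe to this",
    "I would probably not subscribe to this",
    "I am unsure whether I would subscribe to this",
    "I would probably subscribe to this",
    "I would definitely subscribe to this",
]

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def reaction_to_pmf(reaction, temperature=0.1):
    """Turn one free-text reaction into a probability distribution over the scale.

    Assumption: similarities are converted with a softmax; other
    normalisations are possible.
    """
    sims = [cosine(embed(reaction), embed(anchor)) for anchor in ANCHORS]
    weights = [math.exp(s / temperature) for s in sims]
    total = sum(weights)
    return [w / total for w in weights]

def panel_distribution(reactions):
    """Average the per-reaction distributions into one panel-level distribution."""
    pmfs = [reaction_to_pmf(r) for r in reactions]
    return [sum(p[k] for p in pmfs) / len(pmfs) for k in range(len(ANCHORS))]

if __name__ == "__main__":
    reactions = [
        "I'd try it and probably never upgrade.",
        "I would definitely subscribe to this",
    ]
    print([round(p, 3) for p in panel_distribution(reactions)])
```

Each reaction contributes a full distribution rather than a single forced number, so hedged or ambivalent reactions spread their weight across neighbouring scale points.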
Straight answers
What this can and can't tell you.
A research tool you can't trust is worthless. So here is exactly where the method is strong — and where we'll refuse to oversell it.
Reliable for
- Ranking variants: which headline, pricing frame, or screenshot set your audience prefers
- Competitive position: how your page lands next to up to 3 competitors, same panel, same question
- Objection mining: the recurring reasons skeptics say no — quoted, clustered, segmented
- Message clarity: whether your value proposition is even understood at a glance
- Segment fit: which of your audience segments responds — and whether it's the one you expected
Not built for
- Predicting your conversion rate: synthetic intent is directional, not a revenue forecast — anyone claiming otherwise is selling you something
- Replacing real usage data: retention, churn, and pricing elasticity need actual customers
- Truly novel domains: if your market has no footprint of real customer conversation online, we flag the report as low-confidence — visibly
- Fine-grained demographic claims: the research found subgroup fidelity uneven; we report segments by behavior, not by census box
Pricing
Cheaper than one hour of a researcher's time.
One-time payments. No subscription. If the scrape fails or the report can't be produced, it's auto-refunded.
Diagnostic
- 150-stranger panel on one URL
- Purchase-intent distribution + benchmark percentile
- Top 5 objections, quoted & segmented
- Copy fixes suggested per objection
Variant duel
- 2–4 versions of your page or copy
- Same panel reacts to every variant
- Ranked results — the method's strongest mode
- Per-segment winner breakdown
Competitor scan
- Your page vs. up to 3 competitors
- Where you win, where you bleed
- Objections unique to your page
- Positioning gaps in their copy you can claim
Questions
Asked by people who should be skeptical.
Aren't AI survey respondents just made up?
Naively, yes — ask a model for a 1–5 rating and you get useless, middle-clustered answers. That's exactly what the underlying research demonstrated, and why we don't do it. SSR elicits natural-language reactions and maps them to ratings by semantic meaning. Validated against 9,300 real respondents across 57 surveys, it recovered ~90% of the reliability a repeated human panel achieves. Synthetic panels aren't a replacement for talking to customers — they're a way to arrive at those conversations with a sharper page.
How do you know which audience to use?
We read your page and suggest a positioning lane, then stop and show it to you. You can switch lanes before a single persona is generated. The scored panel is fixed for that lane, so your percentile compares against peer pages judged by the same buyer types.
Why won't you give me a predicted conversion rate?
Because the method can't honestly deliver one, and we'd rather be trusted than impressive. The validated strength of SSR is relative measurement — rankings, percentiles against a benchmark corpus, and the qualitative why. Synthetic intent distributions are systematically wider and slightly more critical than human ones, which makes them great discriminators and bad absolute forecasters.
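As a rough illustration of what "relative measurement" means here, the sketch below ranks one page's score against a benchmark set. The function name `percentile_rank`, the mid-rank handling of ties, and the example scores are assumptions made up for this example, not real benchmark data.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(score, benchmark_scores):
    """Percentage of benchmark pages scoring below `score`, counting ties as half."""
    if not benchmark_scores:
        raise ValueError("benchmark_scores must not be empty")
    ordered = sorted(benchmark_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)

if __name__ == "__main__":
    # Hypothetical mean-intent scores for five peer pages in the same lane.
    peers = [2.1, 2.8, 3.0, 3.6, 4.2]
    print(percentile_rank(3.4, peers))  # 60.0
```

The output says where a page sits among its peers, which is a comparison, not a forecast of how many visitors will convert.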
What if my product is in a really niche space?
The method works because models have absorbed enormous amounts of real customer conversation about most consumer domains. If your domain has thin coverage — deep tech, novel B2B categories — synthetic reactions get less trustworthy. We detect this and put a low-confidence banner on the report rather than hiding it. If a report is flagged and you don't find it useful, ask for a refund.
What do you do with my page and data?
We screenshot and extract copy from the URL you give us, run the analysis, and store your report so you can revisit it. We don't train models on your pages, don't resell your data, and only ever test publicly accessible URLs you submit.
Has this been tested on itself?
Yes. This page exists because variant B beat variant A in our own panel, and we then verified the prediction with a live A/B test on real traffic. We publish those self-tests as we run them — a method that can't survive its own scrutiny doesn't deserve yours.
Five minutes from now
You could know what 150 strangers think.
Or you could keep guessing, launch, and find out from a flat conversion chart three weeks from now.
$29 · auto-refund if we can't produce your report