Hey everyone,
I wanted to share a few lessons I’ve learned running A/B tests using Amplitude Experiment and open up a conversation around how others are approaching it in real-world settings.
We’ve been using Amplitude for product analytics for a while, but recently moved some of our experimentation flows over to Amplitude Experiment—mainly to keep everything in one place and tie our test results more closely to user behavior.
Here’s a quick breakdown of what worked, what didn’t, and where I’d love to hear advice from others.
✅ 1. Getting Targeting Right Is Everything
We ran into issues early when our target segments weren’t clearly defined. One test aimed at new users actually hit a mix of new and returning users due to how our user property was being updated (or not updated fast enough).
Tips:
- Use real-time flags sparingly, and be cautious about when properties get assigned (especially in onboarding); see the sketch after this list.
- We now double-check everything in a "pre-launch debug dashboard" to verify who would actually qualify for the test.
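To make that concrete, here’s a rough sketch of the client-side guard we ended up with. It assumes the Amplitude Experiment JS client (`@amplitude/experiment-js-client`) with its initialize/fetch/variant calls; the `onboarding-test` flag key, the `is_new_user` property, and the 7-day cutoff are made-up placeholders for illustration.

```typescript
import { Experiment } from '@amplitude/experiment-js-client';

// Placeholder deployment key.
const experiment = Experiment.initialize('client-deployment-key');

async function getOnboardingVariant(userId: string, signupDate: Date) {
  // Compute the targeting property locally instead of relying on an
  // asynchronous user-property update that may not have landed yet.
  const isNewUser = Date.now() - signupDate.getTime() < 7 * 24 * 60 * 60 * 1000;

  if (!isNewUser) {
    // Returning users never even fetch, so they can't be bucketed here.
    return null;
  }

  // Send the property with the fetch so targeting evaluates against the
  // value we just computed, not a stale server-side copy.
  await experiment.fetch({
    user_id: userId,
    user_properties: { is_new_user: true },
  });

  return experiment.variant('onboarding-test');
}
```

The point is less the specific SDK calls and more that eligibility gets decided from data we control at render time.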
🔁 2. Split by User ID — Not Device ID (Most of the Time)
If your users log in across multiple devices, splitting by Device ID will backfire. We had a test where someone saw Variant A on mobile and Variant B on desktop. Ouch.
Lesson learned: Stick to User ID split for authenticated flows, and only use Device ID if you’re absolutely sure it’s a one-device experience.
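A sketch of what that looks like in practice, again assuming the Experiment JS client and a made-up flag key: the split follows whichever identifier you pass when you fetch variants, so authenticated flows should fetch with `user_id` and only anonymous flows should fall back to `device_id`.

```typescript
import { Experiment } from '@amplitude/experiment-js-client';

const experiment = Experiment.initialize('client-deployment-key');

async function fetchVariant(user: { userId?: string; deviceId: string }) {
  if (user.userId) {
    // Authenticated flow: bucket by user_id so the same person sees the
    // same variant on mobile and desktop.
    await experiment.fetch({ user_id: user.userId });
  } else {
    // Anonymous flow: device_id is all we have, so keep such experiments
    // limited to genuinely single-device, pre-login experiences.
    await experiment.fetch({ device_id: user.deviceId });
  }
  return experiment.variant('checkout-redesign'); // hypothetical flag key
}
```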
📈 3. Don’t Stop Tests Too Early
Guilty of this one: our team stopped a test after 4 days because it looked like one variant was “winning.” But after checking the confidence intervals in Amplitude’s significance testing tools, it turned out the results weren’t statistically reliable.
What we do now:
- Minimum of 7 days per test, even with high traffic
- Set thresholds before launching (e.g., p < 0.05, minimum detectable effect, etc.); see the quick sample-size sketch below
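On setting thresholds before launch: this is the kind of back-of-the-envelope sample-size check we run now (plain TypeScript, standard two-proportion normal approximation, nothing Amplitude-specific; the baseline rate and lift are made-up numbers) to sanity-check whether 7 days of traffic can even detect the effect we care about.

```typescript
// Required sample size per variant for a two-sided two-proportion z-test,
// using the standard normal approximation.
function sampleSizePerVariant(
  baselineRate: number,        // current conversion rate, 0..1
  minDetectableEffect: number, // absolute lift we care about, e.g. 0.02
  zAlpha = 1.96,               // two-sided alpha = 0.05
  zBeta = 0.84                 // power = 0.80
): number {
  const p1 = baselineRate;
  const p2 = baselineRate + minDetectableEffect;
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / minDetectableEffect ** 2);
}

// Example: 10% baseline conversion, want to detect an absolute +2% lift.
const needed = sampleSizePerVariant(0.10, 0.02);
console.log(`~${needed} users per variant`); // roughly 3,800 per variant

// If daily eligible traffic is ~400 users per variant, that's ~10 days,
// so a 7-day stop would be underpowered no matter how it "looks".
```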
🎯 4. Tie Experiments to Core Metrics, Not Vanity Metrics
Our first few tests focused on click-throughs and impressions—easy wins, right? But over time, we learned it’s way more valuable to track downstream outcomes (e.g., completed signups, conversions, retention after 1 week).
Now we ask:
“What user behavior do we ultimately care about, and is this test really influencing it?”
Amplitude makes this easier than most platforms by letting you hook experiments directly into funnel and retention views. That was a game changer for us.
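On the instrumentation side, a minimal sketch assuming the Amplitude Browser analytics SDK (`@amplitude/analytics-browser`); the event names are hypothetical. The idea is just that the downstream events your funnel and retention charts are built on actually fire at the right moments, so the experiment analysis has something real to join against.

```typescript
import * as amplitude from '@amplitude/analytics-browser';

amplitude.init('ANALYTICS_API_KEY'); // placeholder key

// Track the downstream outcomes you actually care about, not just clicks.
function onSignupCompleted(plan: string) {
  amplitude.track('Signup Completed', { plan });
}

function onFirstProjectCreated() {
  amplitude.track('First Project Created');
}

// Click-throughs are still worth logging, but as a secondary metric.
function onCtaClicked(location: string) {
  amplitude.track('CTA Clicked', { location });
}
```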
🧪 5. Use Holdouts When Possible
This was new for us, but using holdouts (groups kept out of the experiment entirely, who just see the default experience) helped reveal whether any of the test variants were really moving the needle or whether we were just seeing noise.
Surprising result: In one case, both Variant A and B performed worse than the control group 🤯. Without a holdout, we would’ve never known.
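Code-wise, the holdout mostly falls out of defensive variant handling. A sketch, with the same assumed Experiment client and a hypothetical flag key: anyone who doesn’t come back with an explicit treatment value, including holdout users who get no assignment at all, lands on the untouched control experience.

```typescript
import { Experiment } from '@amplitude/experiment-js-client';

const experiment = Experiment.initialize('client-deployment-key');

async function renderPricingPage(userId: string) {
  await experiment.fetch({ user_id: userId });

  // If a user is in the holdout (or otherwise not targeted), we expect no
  // treatment value back, so the default branch is the control experience.
  const variant = experiment.variant('pricing-page-test'); // hypothetical flag

  switch (variant.value) {
    case 'variant-a':
      return renderVariantA();
    case 'variant-b':
      return renderVariantB();
    default:
      return renderControl(); // control, holdout, or not targeted
  }
}

// Hypothetical render helpers, stubbed so the sketch is self-contained.
function renderVariantA() { /* ... */ }
function renderVariantB() { /* ... */ }
function renderControl() { /* ... */ }
```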
💬 Let’s Share Tips
I’m still learning the ropes with Amplitude Experiment, but it’s already helping us move faster and stay aligned across data, product, and marketing.
If you’re also using it, I’d love to hear:
- How do you pick your success metrics?
- Do you use remote configs to roll out winning variants?
- Any automation tricks (e.g., triggering feature flags via LaunchDarkly, Segment, etc.)?
Let’s trade war stories!