
Join us for this month's Coffee Chat, featuring Amplitude's very own Head of Product for Experiment, Wil Pong. Enjoy a coffee on us while chatting about product experiments.

Share your questions and Wil will be posting answers here after the event! 

Here are a few to get us started: 

  • How would you leverage your data to run your experiments? 
  • Where are you at with your experimentation journey?
  • Have you just started or have you done this before?

Note that to see the latest posts you need to refresh this page. We can’t wait to hear from you! 

Some good books on experimentation: https://www.quora.com/What-are-some-good-books-to-read-on-the-practical-side-of-online-experimental-design/answer/Ronny-Kohavi
 

If you’re interested in learning more about the culture of experimentation and many real-life examples, I teach a 10-hour interactive Zoom class: https://bit.ly/ABClassRKLI, partly based on the book https://experimentguide.com

 


Thank you @ronnyk! These books and your course are amazing resources! 😃


Amazing Coffee Chat this morning with Wil and Rachel! Adding some of the interesting questions we didn’t get to in the chat here for continued discussion :)

First question: Can you use historical data instead of running an online experiment to prove causation?


Another great question from our chat: How do you usually decide the % of users to roll out the experiment to?


Also from our chat: How do you know when to close out an experiment? How many results are good enough?


> Can you use historical data instead of running an online experiment to prove causation?
No. You cannot prove causality from observational data without significant assumptions (which hold only in very limited settings, that is, not in real-life business scenarios).
Recommender systems have tried to find offline metrics that correlate well with online results; see this great answer: https://www.quora.com/What-offline-evaluation-metric-for-recommender-systems-better-correlates-with-online-AB-test-results

This is why A/B tests, or controlled experiments, are the gold standard in science.

For slides about famous claims from observational data that were later proven wrong, see https://bit.ly/CODE2018Kohavi
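
To make the contrast with observational data concrete, here is a minimal sketch of how a randomized A/B test on a conversion metric is typically analyzed: a simple two-proportion z-test. The counts and variant names below are made up for illustration; because assignment was randomized, a significant difference here supports a causal reading, which no amount of slicing historical data can give you.

```python
import math

def two_proportion_ztest(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)    # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))                # two-sided normal p-value
    return p_b - p_a, z, p_value

# Hypothetical results: 10,000 randomized users per variant.
lift, z, p = two_proportion_ztest(500, 10_000, 560, 10_000)
print(f"observed lift: {lift:.4f}, z = {z:.2f}, p = {p:.4f}")
```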


 

> How do you know when to close out an experiment? How many results are good enough?

See https://www.quora.com/How-long-should-you-run-an-A-B-test-on-your-site-before-you-declare-one-a-winner/answer/Ronny-Kohavi
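
A common fixed-horizon approach (a rough sketch, not anything Amplitude-specific) is to decide the duration before you start: compute the sample size needed to detect the smallest lift you care about, run for at least one full weekly cycle, and only then read the results. Stopping as soon as the dashboard looks significant ("peeking") inflates false positives. For a conversion metric, the standard two-sample formula works out to roughly 16·σ²/δ² at the usual α = 0.05 and 80% power:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, min_detectable_lift, alpha=0.05, power=0.80):
    """Approximate users per variant: n = 2 * (z_{1-alpha/2} + z_power)^2 * sigma^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    sigma_sq = baseline_rate * (1 - baseline_rate)   # variance of a Bernoulli (conversion) metric
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma_sq / min_detectable_lift ** 2)

# Hypothetical example: 5% baseline conversion, want to detect a 0.5-point absolute lift.
print(sample_size_per_variant(0.05, 0.005))   # on the order of 30,000 users per variant
```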


> How do you usually decide the % of users to roll out the experiment to?

The simple answer is to split all your users evenly across the variants (this gives you the maximum statistical power to detect small differences).
But don’t start an experiment that way, as new code is more likely to be buggy. Start with a small percentage (e.g., 1%-5%), make sure there are no egregious issues, then ramp up to 50%/50% (or one third each for A/B/C). See https://experimentguide.com
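
For what it’s worth, here is a minimal sketch of how such a ramp is often implemented (a generic, assumed approach, not Amplitude’s actual assignment logic): users are hashed deterministically into buckets, the ramp percentage controls how many buckets are exposed, and exposed users are split evenly across variants. Because exposure and variant use independent hashes, raising the ramp from 5% to 50% only adds new users; everyone already in the experiment keeps their variant.

```python
import hashlib

BUCKETS = 10_000  # resolution of the ramp (basis points of traffic)

def _bucket(user_id: str, salt: str) -> int:
    """Deterministically map a user to one of BUCKETS buckets."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % BUCKETS

def assign(user_id: str, experiment: str, ramp_pct: float,
           variants=("control", "treatment")):
    """Return the user's variant, or None if they are outside the current ramp."""
    if _bucket(user_id, f"{experiment}:exposure") >= ramp_pct * BUCKETS / 100:
        return None                                   # not in the experiment yet
    idx = _bucket(user_id, f"{experiment}:variant") % len(variants)
    return variants[idx]

# Hypothetical usage: start the experiment at 5%, later ramp it to 50%.
print(assign("user-123", "new-checkout-flow", ramp_pct=5))
print(assign("user-123", "new-checkout-flow", ramp_pct=50))
```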

 


Thank you @ronnyk. Great stuff! 😀

