I’m attempting to run a segmentation report for the total number of a specific event for the month of August. I noticed that the number jumped the day after I initially ran the report on 9/4. My question is, does Amplitude do some sort of post-processing or QA/QC? That might explain the different tally I’m getting.
Does Amplitude post process data?
Hey
- If you group the event by the Library Amplitude user property, which source are you ingesting the data from? Some upstream source vendors such as Segment offer automatic retries to queue up the data and batch send it to the destination.
- Did any filters change (event filters, or segment filters) upon inspecting the first report count on 9/4 vs 9/5?
- If you duplicate the event on your event segmentation chart and set the following conditions on the event, does this event total count match the first event? (Did your org backfill any data for the month of August?)
  - on the duplicated event, set the Server upload time Amplitude user property to ≥ 2023-08-01 00:00:00
  - and set the Server upload time < 2023-09-01 00:00:00
The above will help identify what the total count of events was for August that were also uploaded to Amplitude in August.
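If you would rather check this outside the chart, here is a minimal Python sketch of the same comparison run against exported raw events. It assumes each event is a dict with `event_type`, `event_time`, and `server_upload_time` string fields in `YYYY-MM-DD HH:MM:SS` form. Those field names and the `Purchase` event are assumptions for illustration, so adjust them to whatever your export actually contains.

```python
from datetime import datetime

# Half-open window for August 2023: start inclusive, end exclusive.
AUG_START = datetime(2023, 8, 1)
SEP_START = datetime(2023, 9, 1)


def in_august(ts: str) -> bool:
    """Return True if the timestamp string falls inside August 2023."""
    return AUG_START <= datetime.fromisoformat(ts) < SEP_START


def august_counts(events, event_type):
    """Count events timestamped in August, and the subset also uploaded in August."""
    total = 0
    uploaded_in_august = 0
    for e in events:
        if e["event_type"] != event_type or not in_august(e["event_time"]):
            continue
        total += 1
        if in_august(e["server_upload_time"]):
            uploaded_in_august += 1
    return total, uploaded_in_august


# Hypothetical sample data, not real export output.
sample = [
    {"event_type": "Purchase", "event_time": "2023-08-15 10:00:00",
     "server_upload_time": "2023-08-15 10:00:05"},
    {"event_type": "Purchase", "event_time": "2023-08-31 23:59:00",
     "server_upload_time": "2023-09-01 00:02:00"},  # arrived just after month end
    {"event_type": "Purchase", "event_time": "2023-08-20 08:00:00",
     "server_upload_time": "2023-09-05 12:00:00"},  # arrived days later
    {"event_type": "Login", "event_time": "2023-08-02 09:00:00",
     "server_upload_time": "2023-08-02 09:00:01"},
]

total, uploaded = august_counts(sample, "Purchase")
print(f"timestamped in August: {total}, also uploaded in August: {uploaded}")
```

If the two numbers differ, the gap is the set of August events that reached Amplitude after the month ended, which is the kind of late arrival that could change a tally between two runs of the same report.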
From what I can tell, the filters were probably changed, which is why the numbers were off the following day.
But you brought something to my attention that I had never considered. If the data source is Segment, would it have been better to use the ‘Library’ and ‘Server upload time’ user properties for a more accurate number of events? I was unaware that a source could retry events if they were unsuccessful.
Hey
I took a look at Segment’s doc on retries to Destinations, and it states the following:
"Segment’s internal systems retry failed destination API calls for up to 4 hours"
Given it’s just a 4-hour window, I don’t think it’s critical to filter on server upload time when pulling event counts, but there certainly could be some latency in ingested data due to this.
I generally lean on including server upload time in event queries when the company is backfilling data into the project to differentiate between the data ingested during a certain time window vs data timestamped during a certain time window.
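To make that distinction concrete, here is a rough Python sketch that splits events by how long they took to reach Amplitude after they were timestamped. The 4-hour threshold comes from the Segment quote above. The `event_time` and `server_upload_time` field names are assumptions about the exported data, and the split itself is only a heuristic, since client-side batching or offline devices can also delay uploads.

```python
from datetime import datetime, timedelta

# Threshold taken from the Segment retry window quoted above.
RETRY_WINDOW = timedelta(hours=4)


def upload_lag(event) -> timedelta:
    """Time between when the event happened and when it was uploaded."""
    uploaded = datetime.fromisoformat(event["server_upload_time"])
    occurred = datetime.fromisoformat(event["event_time"])
    return uploaded - occurred


def split_by_lag(events, window=RETRY_WINDOW):
    """Separate events that arrived within the window from those that arrived later."""
    within, beyond = [], []
    for e in events:
        (within if upload_lag(e) <= window else beyond).append(e)
    return within, beyond


# Hypothetical sample data, not real export output.
sample = [
    {"event_time": "2023-08-15 10:00:00", "server_upload_time": "2023-08-15 10:00:05"},
    {"event_time": "2023-08-31 22:30:00", "server_upload_time": "2023-09-01 01:30:00"},
    {"event_time": "2023-08-20 08:00:00", "server_upload_time": "2023-09-05 12:00:00"},
]

within, beyond = split_by_lag(sample)
print(f"within retry window: {len(within)}, later than retry window: {len(beyond)}")
```

Events in the second group arrived too late to be explained by a 4-hour retry window alone, so they are the ones worth checking against any backfill your company ran.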
I hope this helped!