Solved

Does Amplitude post process data?

  • 6 September 2023
  • 3 replies
  • 176 views

I’m attempting to run a segmentation report for the total count of a specific event for the month of August. I noticed that the number jumped the day after I initially ran the report on 9/4. My question is: does Amplitude do some sort of post-processing or QA/QC? That might explain the different tally I’m getting.

Hey @Mario Federis, a few follow-up questions:

  • If you group the event by the Library Amplitude user property, which source is the data being ingested from? Some upstream source vendors, such as Segment, offer automatic retries that queue up the data and batch send it to the destination.
  • Did any filters change (event filters, or segment filters) upon inspecting the first report count on 9/4 vs 9/5?
  • If you duplicate the event on your event segmentation chart and set the following conditions on the event, does this event total count match the first event? (Did your org backfill any data for the month of August?) 
    • on the duplicated event, set the Server upload time Amplitude user property to ≥ 2023-08-01 00:00:00
    • and set the Server upload time < 2023-09-01 00:00:00

The above will help identify the total count of events for August that were also uploaded to Amplitude in August.
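To illustrate the difference between the two counts, here is a minimal Python sketch. The sample rows and the `(event_time, server_upload_time)` tuple layout are made up for illustration; this is not an Amplitude API call, just the same filter logic applied to local data.

```python
from datetime import datetime

# Hypothetical sample rows: (event_time, server_upload_time)
events = [
    (datetime(2023, 8, 3, 10, 0), datetime(2023, 8, 3, 10, 1)),    # normal upload
    (datetime(2023, 8, 31, 23, 50), datetime(2023, 9, 1, 0, 5)),   # arrived just after month end
    (datetime(2023, 8, 15, 9, 0), datetime(2023, 9, 5, 12, 0)),    # backfilled in September
]

START = datetime(2023, 8, 1)  # inclusive lower bound
END = datetime(2023, 9, 1)    # exclusive upper bound


def in_august(ts):
    """True if the timestamp falls in [2023-08-01, 2023-09-01)."""
    return START <= ts < END


# Events timestamped in August, regardless of when they were uploaded.
timestamped_in_august = [e for e in events if in_august(e[0])]

# Events timestamped in August AND uploaded in August
# (the duplicated event with the two Server upload time conditions).
also_uploaded_in_august = [e for e in timestamped_in_august if in_august(e[1])]

print(len(timestamped_in_august), len(also_uploaded_in_august))
```

If the two counts differ, the gap is made up of events that carry an August timestamp but reached the project after August ended.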


From what I can tell, the filters were probably changed, which is why the numbers were off the following day.

But you brought something to my attention that I had never considered. If the data source were Segment, would it have been better to use the ‘library’ and ‘server upload time’ user properties for a more accurate number of events? I was unaware that a source could retry events that were unsuccessful.


Hey @Mario Federis, thanks so much for confirming that filter changes between the 2 charts may have been what resulted in the different counts.

I took a look at Segment’s doc on retries to Destinations, and it states the following:

Segment’s internal systems retry failed destination API calls for up to 4 hours 

Given it’s just a 4-hour window, I don’t think it’s critical to set the server upload time when pulling event counts, but there could certainly be some latency in the data ingested because of this.
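As a rough illustration of that latency, the sketch below counts how many events would be visible at two different report times. The timestamps are invented, and the only figure taken from Segment's doc is the 4-hour retry window; `visible_count` is a hypothetical helper, not part of any Amplitude or Segment library.

```python
from datetime import datetime, timedelta

RETRY_WINDOW = timedelta(hours=4)  # retry duration quoted from Segment's doc

# Hypothetical sample rows: (event_time, server_upload_time)
events = [
    (datetime(2023, 8, 31, 20, 0), datetime(2023, 8, 31, 20, 0)),  # delivered on first try
    (datetime(2023, 8, 31, 23, 0), datetime(2023, 9, 1, 2, 0)),    # delivered after 3 hours of retries
]


def visible_count(rows, report_time):
    """Count the events that had already been uploaded when the report ran."""
    return sum(1 for _event_time, upload_time in rows if upload_time <= report_time)


early = visible_count(events, datetime(2023, 9, 1, 1, 0))
later = visible_count(events, datetime(2023, 9, 1, 6, 0))
print(early, later)

# Assuming each event is first sent at its event time, retried August events
# should have arrived by this point.
latest_arrival = datetime(2023, 9, 1) + RETRY_WINDOW
```

Under these assumptions, retries alone would only change an August total for a report run within a few hours of month end, which is why a jump between 9/4 and 9/5 points to something else, such as filter changes or a backfill.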

I generally lean toward including server upload time in event queries when the company is backfilling data into the project, to differentiate between data ingested during a certain time window and data timestamped during that window.

I hope this helped!

