
Hello! I’m a PM who is new to Amplitude, which we use in combination with Segment track calls. We have a particular metric we track with data that begins on Jan 1 and goes through the present. The data from May 6 through the present has been coming in correctly; however, the data between Jan 1 and May 5 is partially missing, resulting in much lower numbers than there should be.

What would be the best way to correct this? Is there a way to delete only the bad data between Jan 1 and May 5 and then replay it through Segment to populate in Amplitude, or is there a better way to backfill this data so it will be correct? 

Any guidance is very much appreciated. Thank you!

Hey @slooper 

Data ingested into Amplitude is immutable. We can only modify the data cosmetically at query time using some Govern tools and drop filters, but that doesn’t affect the raw data.

If the issue is missing events that can be uniquely identified from the already ingested data, then you can backfill those using the Batch Event Upload API.
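For reference, here is a minimal sketch of what a backfill request to the Batch Event Upload API could look like, using only the Python standard library. It assumes the missing events have already been exported from Segment (or your warehouse) as flattened rows with the hypothetical keys `user_id`, `event`, `timestamp_ms`, `message_id`, and `properties`; the field mapping and the US endpoint URL should be checked against the current Amplitude docs for your region. The key points are sending the original event time (not "now") and a stable `insert_id`, so an accidental retry can be deduplicated.

```python
import json
import urllib.request

# Batch Event Upload endpoint (US data center); confirm the URL for your region.
BATCH_URL = "https://api2.amplitude.com/batch"


def to_amplitude_event(row: dict) -> dict:
    """Map one exported source row (hypothetical shape) to an Amplitude event."""
    return {
        "user_id": row["user_id"],
        "event_type": row["event"],
        # Original event time in milliseconds since the epoch, so the event
        # lands on the right day (e.g. in the Jan 1 - May 5 window).
        "time": row["timestamp_ms"],
        # Stable id taken from the source message, so a retry of the same
        # event carries the same insert_id.
        "insert_id": row["message_id"],
        "event_properties": row.get("properties", {}),
    }


def build_batch_payload(api_key: str, rows: list[dict]) -> dict:
    """Build the JSON body for one batch request."""
    return {"api_key": api_key, "events": [to_amplitude_event(r) for r in rows]}


def chunks(items: list, size: int):
    """Split events into fixed-size batches; pick `size` from the documented limits."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def send_batch(payload: dict) -> int:
    """POST one batch and return the HTTP status code."""
    req = urllib.request.Request(
        BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage would be something like `for batch in chunks(missing_rows, 1000): send_batch(build_batch_payload(API_KEY, batch))`, where the batch size of 1000 is only a placeholder; try it against a test project first.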

If you have your data stored in a warehouse, you can set up a data source import for the missing data. Make sure you ingest only the data that is missing and not all of the data for Jan 1 - May 5; otherwise you will ingest duplicate data. Deduplication (of events sharing the same insert_id) only happens if the data is reingested within 7 days.
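To ingest only what is missing, you need a key that uniquely identifies an event in both the source data and what Amplitude already has. The sketch below assumes (hypothetically) that the composite `(user_id, event_type, time)` is unique for this metric and that you have already pulled the set of ingested keys, for example from an Amplitude data export. It keeps only source rows inside the Jan 1 - May 5 window (treated as UTC, with the year passed in because the post does not state it) whose key Amplitude does not have yet, and also drops duplicates within the source itself.

```python
from datetime import datetime, timezone


def utc_ms(year: int, month: int, day: int) -> int:
    """Midnight UTC on the given date, in milliseconds since the epoch."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


def find_missing(source_rows: list[dict], ingested_keys: set, start_ms: int, end_ms: int) -> list[dict]:
    """Return source rows in [start_ms, end_ms) whose key is not already ingested.

    Rows use the same hypothetical shape as the export: user_id, event, timestamp_ms.
    """
    missing = []
    seen = set()
    for row in source_rows:
        if not (start_ms <= row["timestamp_ms"] < end_ms):
            continue  # outside the broken window; May 6 onward is already correct
        key = (row["user_id"], row["event"], row["timestamp_ms"])
        if key in ingested_keys or key in seen:
            continue  # already in Amplitude, or a duplicate within the source
        seen.add(key)
        missing.append(row)
    return missing
```

For a window of Jan 1 through May 5 inclusive, pass `start_ms=utc_ms(year, 1, 1)` and `end_ms=utc_ms(year, 5, 6)`; the end bound is exclusive, so events on May 6, which are already correct, are never resent.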

If you need to correct the ingested event data itself (e.g., adding event properties and user properties retrospectively), that’s currently one of the elusive features, given Amplitude’s data immutability. I personally had to reingest the entire cleaned and fixed-up data into a new project just to avoid the caveat circus.

Hope this helps.

