@aela welcome and thanks for writing in! In Amplitude, we do provide a deduplicated view of the data. This is done by creating a table function within the provided dataset named deduplicated_<appid>
. This function helps to manage potential duplicate events, especially if a backfill exists for a set of already-exported data.
Here's an example of how to use the deduplicate table function:
SELECT * FROM `testProject.testDataset.deduplicated_EVENTS_12345`('2022-01-01', '2022-02-10');
Please replace testProject.testDataset.deduplicated_EVENTS_12345
with your actual project and dataset name, and adjust the dates accordingly.
If you're not seeing this deduplicated view, it might be worth checking your setup and ensuring that the data sync is working correctly.
Thanks @Esther Trapadoux for your help. Indeed, the normal view is working without any issues. However,, I do not see the deduplicated view and I have followed the Amplitude documentation for the BigQuery Destination setup.
Hi @aela!
I have the same issue now. Please let me know if you managed to find out the cause of the problem? Сould you please share the solution?
Hi @aela @Mariia Kyselova sorry for the late response!
To resolve this issue, you can use the following statement to create the table function.
* Please replace APP ID
with your project ID.
CREATE OR REPLACE TABLE FUNCTION `amplitude_bq_ingestion`.deduplicated_EVENTS_APP ID(start_date DATE, end_date DATE)
AS
SELECT *
FROM
(SELECT *, ROW_NUMBER() OVER (
PARTITION BY uuid
) rn
FROM `amplitude_bq_ingestion`.`EVENTS_APP ID`
WHERE DATE(event_time) >= start_date and DATE(event_time) <= end_date) t
WHERE rn = 1;
I hope this helps. Please let me know if you have any questions.
Best regards,
Thao