Hi @Imploxion ,
You are free to use the S3 Export here if you wish: https://www.docs.developers.amplitude.com/data/destinations/amazon-s3/#run-a-manual-export
That said, most people who use the Redshift export do not have an issue with their event times. However, I do see and understand your case, and if you do want to add that field to your data table, I think Snowflake would be a better option.
If the source table in the data warehouse had a timestamp field representing when the event arrived in the data warehouse, you could use this as the timestamp_column.
Creating an Insertion Timestamp Column in Snowflake

- To add a snowflake_insertion_time column to a Snowflake table, you need to specify a column with a default value.
- The Snowflake function that returns the current timestamp in UTC is SYSDATE(). You will use the SYSDATE() function as the expression in the column's default statement.
- Unfortunately, today you cannot use ALTER TABLE to add a column that defaults to a non-deterministic expression. This means that the snowflake_insertion_time column has to be added to the table when it is created.
- In your CREATE TABLE command, specify a column snowflake_insertion_time timestamp default SYSDATE().
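For example, a minimal sketch of what that CREATE TABLE could look like (every table and column name here other than snowflake_insertion_time is just a placeholder for illustration, not your actual schema):

```sql
-- Illustrative sketch only: table/column names are placeholders.
CREATE TABLE my_events (
    event_id     VARCHAR,
    event_type   VARCHAR,
    user_id      VARCHAR,
    event_time   TIMESTAMP,
    -- Populated automatically with the current UTC timestamp when a row is inserted.
    snowflake_insertion_time TIMESTAMP DEFAULT SYSDATE()
);
```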
If you do add an insertion timestamp, it is advised to create a new source (and possibly a new project, depending on how new it is).

- Then create a new project.
- In the new project, create a new source with the same SQL as before, except that you will also select the new insertion timestamp column you created in the source table, and you will specify that column as the timestamp column (see the sketch after this list).
- Test and enable the source. All existing data in the table should be imported in the first ingestion run of the source, and any data loaded afterwards should be imported in successive job runs.
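As a rough sketch of what the new source's SQL could look like (the table and column names other than snowflake_insertion_time are placeholders; adapt them to your actual schema):

```sql
-- Hypothetical import SQL: the same fields as before, plus the insertion timestamp column,
-- which you then pick as the timestamp column when configuring the source.
SELECT
    event_id,
    event_type,
    user_id,
    event_time,
    snowflake_insertion_time
FROM my_events;
```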
I hope this helps!
Kind Regards,
Denis
Hello @Denis Holmes and thanks for your answer.
Actually, moving to Snowflake is not an option. I tried the same in Redshift and the export failed, but I don't know exactly whether it is because of the COPY command used to send the data.
Wouldn't there be any option to do the same, or something similar, with Redshift? Maybe I didn't add this column correctly and we should load all the data with this column from the beginning (i.e. edit the DDL before we start sending data).
Thank you.
Hello @Imploxion ,
Thanks for the reply! With Redshift, I do not believe you are allowed to edit the export schema in any way, and I don't believe we allow exceptions either. The main issue is that you have Redshift and this is a data source, correct? And sometimes not all the data will be available at the source at once. You want to have a column in Redshift that determines the time/order of the events? Please correct me if I am misunderstanding. I will then ask my team internally for clarification on whether anything can be done.
Kind Regards,
Denis
Hi @Denis Holmes and thank you.
Yes, we are using Amplitude as a data source to get the event data into our Redshift cluster, and we are using your Redshift Export tool. We have investigated this destination and also S3, because with S3 we can explore the possibility of a column with the insertion timestamp in Redshift. It would be a column in Redshift that indicates the time at which a given row is written to Redshift, in addition to the client_event_time and event_time that any event has.
We have also tried adding this column to the DDL of the Amplitude table in Redshift, but this made the export fail. It would be great if you could talk with your team about this.
One thing that would help us is an example of what happens when the export sends delayed data, because using event_time as the order key could cause that data to get lost. The event_time description is:
Amplitude timestamp (UTC) which is the client_event_time adjusted by the difference between server_received_time and client_upload_time, specifically: event_time = client_event_time + (server_received_time - client_upload_time). Amplitude uses this timestamp to organize events on Amplitude charts. NOTE: If the difference between server_received_time and client_upload_time is less than 60 seconds, the event_time will not be adjusted and will equal the client_event_time.
But this doesn't clarify too much for us about how it works as an order key.
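For instance, if we read the formula literally with made-up timestamps: with client_event_time = 10:00:00, client_upload_time = 10:00:05 and server_received_time = 10:02:05, the difference is 120 seconds (more than 60), so event_time = 10:00:00 + 120 s = 10:02:00; if that difference were under 60 seconds, event_time would simply stay equal to client_event_time.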
Thank you.
Hi @Imploxion ,
Let me ask the team and I will get back to you :)
Kind Regards,
Denis
Hi @Imploxion ,
As confirmed by the engineers, you can refer to the server_upload_time column. It is the value we use for determining the export (we look back several hours to see what data are ready for export based on server_upload_time), and all data are organized based on server_upload_time within Amplitude. There shouldn't be any issues with missing data. I hope that helps!
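In case it is useful, here is a rough sketch (the table name is just a placeholder, not your actual export table) of how you could read rows on your side in arrival order based on server_upload_time:

```sql
-- Placeholder table name; the point is filtering/ordering on server_upload_time.
SELECT *
FROM amplitude.events
WHERE server_upload_time > '2023-01-01 00:00:00'  -- replace with the last value you processed
ORDER BY server_upload_time;
```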
Kind Regards,
Denis