BigQuery Import Best Practices

  • 23 September 2022

Hi community,

I plan to use the BigQuery Import feature to populate data as part of a conversion from GA Universal to Amplitude. The feature is in beta, and I have a few questions I thought best to post here.

  1. This process requires I set up a GCS storage bucket, but there is no discussion about size constraints. Are there any? I’m moving a few years’ worth of data!
     
  2. Is it best to break up your historic session tables into monthly temp tables first (which also gives me the option to remove unnecessary events from sessions if I choose)?
     
  3. Using the “Time-based” import, if I run multiple imports by monthly tables as mentioned above, do I assume the data is simply additive by date and does not replace other data with each import? Do I need to go from oldest date to newest?

    I also see “SQL query” listed in the documentation -- am I designing the SQL to match up to a known schema or does the import assume the UA (or GA4) schema as part of the design of this process?

Many thanks for answers/thoughts/ideas.

 


1 reply


Hey @plear,

Looks like we were able to assist you via a support ticket, where we’ll continue the conversation, but I wanted to provide some context here as well since you’ve also asked on our community.

To your questions: 

* This process requires I set up a GCS storage bucket, but there is no discussion about size constraints. Are there any? I’m moving a few years’ worth of data!

 
There aren't constraints on the amount of data you're able to import into Amplitude with the GCS Import. However, you are contractually able to ingest a certain number of events per month (depending on your contract), so you should review with your account admins how much data you want to send into Amplitude.
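If it helps with that sizing exercise, you can estimate your monthly event volume directly in BigQuery before importing anything. Here is a minimal sketch, assuming the standard Universal Analytics export schema (sharded `ga_sessions_YYYYMMDD` tables, one event per entry in the `hits` array); `my-project.my_dataset` is a placeholder for your export dataset:

```sql
-- Rough per-month event (hit) count from the UA BigQuery export.
SELECT
  FORMAT_DATE('%Y-%m', PARSE_DATE('%Y%m%d', _TABLE_SUFFIX)) AS month,
  COUNT(*) AS event_count
FROM `my-project.my_dataset.ga_sessions_*`,
  UNNEST(hits) AS hit
WHERE REGEXP_CONTAINS(_TABLE_SUFFIX, r'^\d{8}$')  -- skip intraday shards
GROUP BY month
ORDER BY month;
```

You can then compare those monthly counts against your contracted event volume.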
 

* Is it best to break up your historic session tables into monthly temp tables first (which also gives me the option to remove unnecessary events from sessions if I choose)?

 
While I am able to provide best practices on how to utilize the Amplitude platform and general event structure, I can't recommend whether or not you should break up your session tables into a specific format. However, since we take a precision tracking approach, I would suggest removing any irrelevant events that aren't important to your analyses; see the sketch below for one way to do that.
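To make that concrete, here's a minimal sketch of the monthly-staging idea against the UA export schema. The table names and the filtered hit type (`TIMING`) are illustrative placeholders, not a recommendation of what to drop:

```sql
-- Materialize January 2022 sessions into a staging table, dropping
-- hit types you don't plan to analyze (here, timing hits).
CREATE TABLE `my-project.staging.ga_sessions_202201` AS
SELECT
  s.* REPLACE (
    ARRAY(
      SELECT AS STRUCT h.*
      FROM UNNEST(s.hits) AS h
      WHERE h.type != 'TIMING'  -- keep PAGE, EVENT, etc.
    ) AS hits)
FROM `my-project.my_dataset.ga_sessions_*` AS s
WHERE _TABLE_SUFFIX BETWEEN '20220101' AND '20220131';
```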
 

* Using the “Time-based” import, if I run multiple imports by monthly tables as mentioned above, do I assume the data is simply additive by date and does not replace other data with each import? Do I need to go from oldest date to newest?

 
Chronological event uploads (oldest to newest) would generally be recommended. If you run multiple imports, you will just need to specify the cadence at which you'd like each sync to occur.
 
[Screenshot: specifying the sync cadence in the import setup]
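If you end up scripting those monthly staging builds yourself, a small BigQuery scripting loop can keep them in oldest-to-newest order. This is just a sketch under the same placeholder names as above, not part of the Amplitude connector:

```sql
-- Hypothetical backfill: build one staging table per month, oldest first.
FOR rec IN (
  SELECT m FROM UNNEST(['202201', '202202', '202203']) AS m ORDER BY m
) DO
  EXECUTE IMMEDIATE FORMAT("""
    CREATE OR REPLACE TABLE `my-project.staging.ga_sessions_%s` AS
    SELECT * FROM `my-project.my_dataset.ga_sessions_*`
    WHERE _TABLE_SUFFIX LIKE '%s%%'
  """, rec.m, rec.m);
END FOR;
```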
 

* I also see “SQL query” listed in the documentation -- am I designing the SQL to match up to a known schema or does the import assume the UA (or GA4) schema as part of the design of this process?

 
Would you be able to clarify where you are seeing this SQL query listed in the documentation? If you are referring to our SQL query helper: BigQuery doesn't have great support for JSON, so the documentation describes how to use this SQL helper to make sure your data is exported from BigQuery and imported into Amplitude without errors.
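For a feel of what such a query can look like, here's a hypothetical sketch that flattens the UA export to one row per event and packs the properties into a JSON string with `TO_JSON_STRING`. The output column names are illustrative only; check the import documentation and the SQL helper for the exact schema the connector expects:

```sql
-- Flatten UA sessions to one row per event; TO_JSON_STRING packs the
-- event properties, working around BigQuery's limited JSON support.
SELECT
  s.fullVisitorId AS user_id,
  h.eventInfo.eventAction AS event_type,
  TIMESTAMP_MILLIS(s.visitStartTime * 1000 + h.time) AS event_time,
  TO_JSON_STRING(STRUCT(
    h.page.pagePath AS page_path,
    h.eventInfo.eventCategory AS event_category
  )) AS event_properties
FROM `my-project.my_dataset.ga_sessions_*` AS s,
  UNNEST(s.hits) AS h
WHERE h.type = 'EVENT';
```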
 
We have some tooltips within the UI when setting up the BigQuery integration, and our documentation also covers some query examples here.

Hope this helps!
