Solved

Data Discrepancy

  • 20 April 2021
  • 6 replies
  • 289 views

Badge

We have an amplitude connector extracting data from our app. The data is available on the amplitude dashboard and also on our snowflake api. We compared the data loaded via the amplitude connector to snowflake with that in the amplitude dashboard and we are seeing differences in the sessions measure. Also the session_id is not converting to an accurate timestamp of the relevant session.

icon

Best answer by belinda.chiu 21 April 2021, 01:08

View original

6 replies

Userlevel 6
Badge +8

Hi @g.owusu 

Happy to help here! To clarify, did you purchase Amplitude’s Snowflake package or are you using a different way to load Amplitude data into Snowflake? 

Regarding to session_id, if you are using our native SDKs, Amplitude automatically generates a session ID for each new session; that ID is the session's start time in milliseconds since epoch (also known as the Unix timestamp). All events within the same session share the same session ID. https://help.amplitude.com/hc/en-us/articles/115002323627#how-amplitude-defines-sessions

If the behavior mentioned above is not what you are seeing for your session_ids, can you clarify what you mean by the session_id is not converting to an accurate timestamp of the relevant session? 

 

Badge

Hi @belinda.chiu 

We are using a different approach to loading Amplitude data into Snowflake. No filter has been applied on the data. According to this Unix timestamp epoch time is 1 January 1970. If we want to convert session_id using the function TO_TIMESTAMP_TZ(session_id) we should take into account the epoch time right?

Userlevel 6
Badge +8

Hi @g.owusu ! 

If you do want to convert session_id to a timestamp, then yes you should take into account the epoch time! I recommend cross-checking the conversion with https://www.epochconverter.com/ which is what I use whenever I convert Session ID to a time. 

Badge

Hi @belinda.chiu 

Thank you for that conversion method. I still have questions about the discrepancy in the data. We are seeing a difference of 1 on some days between the amplitude dashboard and the snowflake load from the amplitude connector. Can you please provide a troubleshooting approach to resolving the problem or to enable us to deep dive to the source of the problem?

Badge

Hello again @belinda.chiu 

Further investigation was undertaken and we noticed that sessions with events both in the Production App and Test App are excluded in the user session analyses in the Amplitude Dashboard thus resulting in the discrepancy I spoke of. Why are session_ids in the Test App appearing in the Production App? Also how does Amplitude define a user session in it’s analytical dashboard? How does a session have events in the Test App and also in the Production App? 

Userlevel 6
Badge +8

Hi @g.owusu ! Usually I would ask customers to start with this guide: https://help.amplitude.com/hc/en-us/articles/115002383247 when it comes to troubleshooting data discrepancies between Amplitude and other vendors (or anything that is not part of the official Amplitude integrations). 

To clarify, is it that the count of sessions is different? Is the count of sessions higher in Amplitude UI or higher in your Snowflake database? 

 

Here’s some information about sessions in Amplitude:

  1. If you use Amplitude’s SDKs, one project can have multiple users with the same session_ids so long as those users start a session at the same time. Remember session_ids are just the epoch timestamp of the first event in a session so if multiple users start at the same time then there will be multiple session_ids.
  2. If two projects have the same session_ids and those same session_ids also have the same events, then that means your organization is sending the same data to both Production and Test Apps. If this shouldn’t be the case, then you will need to check in with your Instrumentation team on why this is the case. 
  3. In the User Sessions chart (HC doc is here), user sessions are measured by unique session_ids per unique users of the project that you are querying in. So for example, the Total Sessions chart will give you to total number of unique sessions across unique users in the AmpliTunes project: https://analytics.amplitude.com/demo/chart/new/u2qjm3c?source=redirect%3A+new+chart+created
  4. If for any reason you are using a PROPCOUNT metric to count sessions aka https://analytics.amplitude.com/demo/chart/new/t9q42p5, please note that this will give you a different result than the User Sessions chart. In this chart, the same session_ids will be deduplicated. Notice how this chart is less than the Total Sessions chart above. 
  5. I’m not sure how you are comparing sessions but in Amplitude UI, each chart should be relative to the project you are querying in. Therefore, sessions with events both in the Production App and Test App shouldn’t be excluded from a Production App chart because we will just look at sessions in the Production App for the chart built in Production App.

Given the information you have provided thus far, the discrepancy is likely coming from how you are querying the session data - so long as you are using Amplitude data in the database. That said, since you are using a different way to connect Amplitude data to your Snowflake account rather than the official Amplitude-Snowflake integration, I won’t be able to rule out the error potentially stemming from the connector. 

If the guide I shared above and the information above still doesn’t help resolve the discrepancy you are seeing, it may be best to transfer this to a Support ticket so I can get specifics. Let me know if you would like to do so and an Admin will help us transfer this to a Support ticket! 

Reply