Solved

Redshift query to find unique users

  • 24 March 2023
  • 2 replies
  • 177 views

Badge

How can I modify my Redshift query to find a unique user that includes the merged Amplitude ID in addition to the WHERE condition?

The results of the query from the Amplitude segmentation and the unique user I found in Redshift are different, even though the WHERE condition and duration are the same. The number of the unique users from Redshift more then amplitude’s result.

I suspect that the difference may be due to the merged Amplitude ID, device ID and user ID.

Could you please help me with this issue?


I’ve already read below articles.
https://help.amplitude.com/hc/en-us/articles/115003135607

https://help.amplitude.com/hc/en-us/articles/206964247-Redshift-Active-users

icon

Best answer by Saish Redkar 24 March 2023, 16:43

View original

2 replies

Userlevel 7
Badge +10

Hi @dykim 

Here’s how the merged user id table will look like in your exported data along with the event table schema

 

To take into account merged ids, try using a JOIN condition for joining the event data table with the merged user table -  the amplitude_id being the join key. Then a distinct count on the user_id should help remove the duplicates introduced as a result of the JOINed dataset.

Let me know if this works out.

Badge

Thank you for your response!

Your solution has been helpful in removing duplicate amplitude_ids. However, there still remains a discrepancy between the unique users identified by the Amplitude segmentation board and the results from my Redshift query.

Here is my refined query that eliminates duplicate users. It has been successful in identifying and removing approximately 80% of the duplicated users.

select a.user_id, a.amplitude_id  
from events_1234 a
        left join merge_ids_1234 b on a.amplitude_id = b.amplitude_id
WHERE 
      DATE(event_time) between '2023-01-01' and '2023-01-07'
      AND platform IN ('iOS', 'Android')
      and user_id is not null
group by a.user_id, a.amplitude_id 


Maybe I missed unmerged amplitude_ids. How can I find that?

I've analyzed the unique user data from Amplitude by following the steps below:

  1. Obtained user information from the Segmentation dashboard by clicking the 'Download Users' button.
  2. Exported the query results from Redshift.
  3. Compared the Amplitude IDs from both datasets to identify any discrepancies or overlaps.

These steps helped me to effectively evaluate the unique users in Amplitude and cross-reference them with the query results from Redshift.

 

 

Reply