Solved

Identity Resolution of Off-platform Events

  • 8 March 2024
  • 5 replies
  • 67 views


Amplitude is a CDP. As a CDP, Amplitude is responsible for ensuring its platform can consume multiple data sources to form an accurate picture of the customer. The primary function of a CDP is identity resolution across these sources. There is a very common use case that I can’t seem to figure out with Amplitude. Consider this example:

A company decides to hold a webinar, hoping it will create engagement that increases conversion. Leads sign up for the webinar on a separate platform, e.g. Zoom or RingCentral. After the webinar, the company wants to know how attending the webinar impacted conversion. They are able to export a CSV of the attendee list with each attendee’s name, phone number, and email address. They upload the CSV to S3 to use the native Amplitude S3 integration.

Upon setting up S3 as a source, there’s a problem: there is no device ID or user ID in the CSV, since these leads didn’t sign up on a platform where Amplitude could collect them.

I’ve tested this example without success. I generated a random device ID in the CSV data and assumed that Amplitude would perform identity resolution on the supplied email and phone user properties. Amplitude did not recognize phone or email as a unique user property and will not merge users based on these fields. Can you please help me understand how to properly identify these leads in Amplitude by their email address or phone number?

Note: I have already reviewed the following guides:

https://help.amplitude.com/hc/en-us/articles/115003135607-Track-unique-users

https://amplitude.com/blog/identity-resolution-insights

https://www.docs.developers.amplitude.com/data/sources/amazon-s3/#create-amazon-s3-import-source

https://www.docs.developers.amplitude.com/analytics/apis/aliasing-api/


Best answer by Saish Redkar 8 March 2024, 21:03


Hi @dargelies 

Identity resolution in Amplitude is currently based only on the device ID / user ID. I don’t think it will work based on just email and phone. If these are user properties stored on a user in Amplitude, I would:

  • reconcile the email and phone user properties and associate the user_id where applicable.
    • if all attendees of this webinar are using your app with an assigned user ID, you should have 100% coverage here. If not, the non-user attendees will be anonymous users.
  • create an event for the webinar attendance with the appropriate ingest schema, attributed to the correct user_id.
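
As a rough illustration of the two steps above, here is a minimal Python sketch. It assumes you already have your own email-to-user-ID lookup (the `user_ids_by_email` dict is hypothetical), and the event name `Webinar Attended` and property `webinar_name` are made up for the example. The field names `user_id`, `event_type`, `time` (milliseconds since epoch) and `event_properties` follow the usual Amplitude event shape, but check them against the schema your import source expects.

```python
from datetime import datetime, timezone


def build_attendance_events(attendees, user_ids_by_email, webinar_name, attended_at):
    """Return (events for known users, attendees with no matching user ID)."""
    # Event time expressed in milliseconds since the Unix epoch.
    time_ms = int(attended_at.timestamp() * 1000)
    events, unmatched = [], []
    for attendee in attendees:
        # Normalize the email so lookups are not case- or whitespace-sensitive.
        email = attendee["email"].strip().lower()
        user_id = user_ids_by_email.get(email)
        if user_id is None:
            # Lead without a user ID: cannot be attributed this way.
            unmatched.append(attendee)
            continue
        events.append({
            "user_id": user_id,
            "event_type": "Webinar Attended",  # hypothetical event name
            "time": time_ms,
            "event_properties": {"webinar_name": webinar_name},
        })
    return events, unmatched


if __name__ == "__main__":
    attendees = [{"email": " Ana@Example.com "}, {"email": "bob@example.com"}]
    known = {"ana@example.com": "u-1"}  # hypothetical lookup from your own system
    when = datetime(2024, 3, 8, tzinfo=timezone.utc)
    events, unmatched = build_attendance_events(attendees, known, "Spring Demo", when)
    print(events)
    print(unmatched)
```

Attendees that end up in `unmatched` are exactly the leads without a user ID, so they need some other identifier before they can be ingested.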

Hi Saish,

Thank you for the reply. Yes, I’ve tested this and Amplitude does not reconcile based on just email and phone number.

The users are not in our system and do not have a user id. They are leads. We don’t believe it’s a good practice to create a user in your database for every lead that happens to show some interest in your product (consider every person that lands on your website). We create them after they’ve followed through with the intent to become a customer. Then it makes sense to create a user in our database and assign an ID to them.

From what I understand the non-user attendees are not created as anonymous users in Amplitude. They cannot be ingested at all, because there is no device ID or user ID associated with them. The entire S3 integration fails. I believe they should be created as anonymous users in Amplitude and then reconciled when they become identified users at the conversion event based on their deterministic matching identifiers: email and phone number.

Am I understanding correctly that Amplitude is not capable of reconciling users when given deterministic matching identifiers? If so, this seems like a pretty large gap in the functionality of a CDP, especially since this blog post discusses how Amplitude uses deterministic identifiers for reconciliation.

Hi @dargelies - apologies for the late reply, but I wanted to make sure you got an answer. Amplitude does indeed use deterministic identifiers for resolution. You may have noticed in this article that we merge on user ID only. Unfortunately, that means that unless you are willing to set the phone number or email as the “user ID”, users will remain anonymous (and unmerged). However, if you do not want to set the email as the user ID, you can hash the email and set it as the device ID, and keep the phone number and email as user properties. This will require you to add another column to your CSV, but it should solve your problem.

A consistent hash will allow anonymous behavior to stay merged (say, if they attend 3 different webinars). This approach will also allow you to maintain consistency with your internal systems and not assign a true user ID until they follow through.
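
For anyone trying this, a minimal Python sketch of the extra CSV column might look like the following. The choice of SHA-256, the lowercase/trim normalization, and the column names `email` and `device_id` are assumptions for illustration, not something Amplitude prescribes. What matters is that the same input always yields the same device ID.

```python
import csv
import hashlib
import io


def normalize_email(email):
    """Trim and lowercase so the same address always hashes identically."""
    return email.strip().lower()


def hashed_device_id(identifier):
    """Deterministic device ID: hex SHA-256 of the normalized identifier."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def add_device_id_column(csv_text, email_column="email"):
    """Return the CSV text with a device_id column derived from the email."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["device_id"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["device_id"] = hashed_device_id(normalize_email(row[email_column]))
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    attendees_csv = "name,email\nAna, Ana@Example.com \n"
    print(add_device_id_column(attendees_csv))
```

Because the hash depends only on the normalized email, the same lead attending several webinars gets the same device ID each time.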

 

Hope this helps!


Hey @audrey.xu, thanks, interesting solution here. If we use a solution like this, we’d just need to be able to consistently generate the same hash when we identify the user, so that those previous anonymous events are associated with the newly identified user. I’d assume we could also do the same with just a phone number, as that may be all we have for some of these sources. Tactically, does this just mean that when we do finally call identify, we’ll call it several times? Consider the following example:

  • User is anonymously browsing on our website with Device ID assigned by Amplitude as a UUID in cookies
  • User attends an off-platform webinar event with Device ID assigned as hash of email
  • User makes an off-platform inbound call with Device ID assigned as hash of phone number

When this user now completes the signup flow on the website we will:

  • Call identify with our identifier as the User ID and Device ID assigned as the Amplitude UUID
  • Call identify again with our identifier as the User ID and Device ID as the hashed email
  • Call identify a third time with our identifier as User ID and Device ID as the hashed phone number

Will this then merge these three Amplitude users into a single user so that we capture all on-platform and off-platform events? Which user will they be associated with (by Device ID)?

Would we need to continue calling identify three times afterwards (e.g. on login) to associate off-platform events that have occurred since the last login, or would this initial linking keep associating these “device IDs” with the same Amplitude user? For example, if the user attends another webinar after the signup flow under the email-hash Device ID, does that event get associated with the same Amplitude user as the initial identify call on signup?

Hey @dargelies !

In the scenario you listed, correct: you’ll call identify 3 times, once for each of the 3 disparate device IDs, to associate them with the same user ID. The events will then be keyed primarily off of the Amplitude ID + user ID (and all previous Amplitude IDs will be associated with it), but you’ll still be able to see the device ID change if you look in the user stream. You will not need to call identify again; those device IDs will now be linked to the user ID. To your point, if the hash is consistent then you will not need to keep calling identify: once a device ID is associated with a known user ID, it will remain so until you force it to be associated with another user ID. That would be done by passing the device ID together with a new user ID in a subsequent call. It doesn’t sound like you would run into this scenario though!
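
To make the three identify calls concrete, here is a hedged Python sketch that only builds the payloads and does not send anything. It assumes the server-side Identify API, which as far as I know accepts form parameters `api_key` and `identification` (a JSON array) at `https://api2.amplitude.com/identify`; please verify that against the current docs. The user ID, cookie UUID, email and phone values are hypothetical, and the hashing must match whatever you used when importing the off-platform events.

```python
import hashlib
import json
from urllib.parse import urlencode


def hashed_device_id(identifier):
    """Same deterministic hash that was used for the off-platform imports."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def build_identifications(user_id, device_ids):
    """One identification object per device ID, all sharing the same user ID."""
    return [{"user_id": user_id, "device_id": device_id} for device_id in device_ids]


def build_identify_form_body(api_key, identifications):
    """URL-encoded form body; the identification list is sent as a JSON string."""
    return urlencode({
        "api_key": api_key,
        "identification": json.dumps(identifications),
    })


if __name__ == "__main__":
    device_ids = [
        "3f2b8c1e-0000-4000-8000-000000000000",  # hypothetical cookie UUID
        hashed_device_id("ana@example.com"),      # webinar import (normalized email)
        hashed_device_id("+15555550100"),         # inbound call (normalized phone)
    ]
    identifications = build_identifications("user-123", device_ids)
    print(build_identify_form_body("YOUR_API_KEY", identifications))
```

Per the answer above, this linking should only be needed once per device ID, for example at signup.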
