Solved

Stitching issue with Shopify store event data



Hey Everyone,

I recently set up Amplitude to capture Shopify store event data, and I am seeing some unexpected behavior. Amplitude does seem to be capturing all events on our website, but I’m having trouble piecing together the full user journey for customers who go all the way through the funnel and make a purchase on the site.

 

Problem: For purchasing customers, the event stream is split across two different users, which I’ll refer to as the pre-checkout user and the checkout user.

 

Pre-Checkout User

This stream has the most events and captures all the website browsing events (page view, product clicked, product details viewed, product added, etc.). This user has a device_id in the form of an alphanumeric string (e.g. jHdTf2LNvsXrKDkwnq2Yto) along with information about the device type, and is properly assigned an amplitude_id, a session_id, and everything else I would expect. The user_id is generally null because the customer hasn’t signed in to their account or created one yet.

 

Checkout User

The weird behavior starts when that same pre-checkout user begins the checkout process: the checkout started, purchase item, and order created events all get logged to a new user (different device_id and amplitude_id). The device_id is now in the form of a UUID (e.g. 31518f28-9d40-59be-aa34-a1e5633f0f01), with no information about the device type, platform, etc. From my understanding of the docs, the amplitude_id is generated based on the device_id and user_id. Since the device_id is now different, it makes sense that the amplitude_id would be different too, but I am trying to understand why the device_id suddenly changes.

Some other points to note for the checkout user:

  • user_id now has a value — email (this is expected)
  • session_id = -1 (unexpected)
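For anyone who wants to separate the two streams in a raw export, here is a rough Python sketch of the pattern described above. It assumes each event is a dict with `device_id` and `session_id` keys, and the function names are made up for illustration. It is only a heuristic based on what I am seeing in my own data, not a documented Amplitude rule.

```python
import uuid


def is_uuid_device_id(device_id) -> bool:
    """Return True if device_id is a canonical hyphenated UUID string."""
    try:
        parsed = uuid.UUID(device_id)
    except (ValueError, TypeError, AttributeError):
        # Not parseable as a UUID (e.g. a 22-character alphanumeric ID or None).
        return False
    # Require the hyphenated form, so bare 32-character hex strings do not match.
    return str(parsed) == device_id.lower()


def looks_like_checkout_stream(event: dict) -> bool:
    """Heuristic: UUID-style device_id combined with the out-of-session marker (-1)."""
    return is_uuid_device_id(event.get("device_id")) and event.get("session_id") == -1
```

Events for which `looks_like_checkout_stream` returns True would correspond to what I am calling the checkout user.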

Has anyone in the community run into this problem before when setting up Amplitude on their Shopify store? To me, it seems like something may be happening with the cookie that is stored on the user’s device to recognize their device_id. My hypothesis is that when the customer starts a checkout, the checkout process happens in some other window or browser context that is not aware of the original cookie. Is there any further setup I should be doing to ensure the device_id from the pre-checkout user gets passed to the checkout user?

 

Thanks in advance, and excited to join the community!

 

Sean

 

 


Best answer by Denis Holmes 6 April 2022, 10:00


13 replies


Hi @sean.meehan ,

 

Thanks for writing in! May I ask why the events are set as out-of-session events (session ID -1) for the checkout user? I would try to keep the session ID consistent so that Amplitude can keep track of certain properties. Regardless, my understanding is that everything works as expected until the pre-checkout user (PCU) gets a new Device ID, and therefore a new Amplitude ID, when they start checkout. They are then assigned a user ID, which is expected; the new Device ID is not. How are you sending the data to Amplitude? I would make sure the implementation is right; you should be able to track the events and pass the same Device ID when the user starts checkout.

 

If you can share with me a funnel chart, I can look into it further and see what I can find.

 

Kind Regards,
Denis

 


Hey @Denis Holmes ,

 

Appreciate the reply.

I’m not sure why the events are coming in as out of session (-1) for the checkout user; this is not anything that I intentionally set.

I have the Amplitude plug-in installed on our Shopify store and am sending data to Amplitude that way. I followed the basic installation instructions here, but did not do any additional SDK configuration.

I was doing some more digging last night, and I believe the distinction between the pre-checkout user and the checkout user is related to client-side vs. server-side events. Could this make sense? The checkout user seems to have only server-side events (I think this is denoted by the green square in their event stream; see image). Is there some further setup I need to do to pass the device_id from the client-side events to the server-side events?

 

In regards to the funnel chart, what is the best way for me to share that with you? Are you just looking for a picture, or do you need a link so you can click around? If I give you a link will you be able to view it or is there some other setting I need to tweak? Bear with me — super new to the platform so trying to get a handle on the best way to share in these sorts of troubleshooting scenarios.

 

Sean


Hi @sean.meehan ,

 

No need for the funnel chart, as I understand the issue now. So yes, since the checkout user's events are being sent server-side, you would need to set the identifying data for that user manually in the HTTP API payload. I would recommend reading this article, which states:

“Amplitude automatically generates a session ID for each new session; that ID is the session's start time in milliseconds since epoch (also known as the Unix timestamp). All events within the same session share the same session ID. If you are using Amplitude's SDKs, this happens automatically. However, if you are sending data to Amplitude using the HTTP API, you will have to explicitly set the session ID field in order to track sessions.”

 

I would review the server-side part of your implementation and make sure you are sending the correct data for that user. You will need to look at where the events are being sent from (server side) and ensure the right identifiers are included so that the pre-checkout and checkout users are merged into one. I hope that helps for now!
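As a rough illustration of setting these fields explicitly, here is a minimal Python sketch of a server-side event sent to the HTTP V2 API. It assumes you control the server code that sends the event and that the client-side `device_id` and `session_id` have already been made available to it somehow (how to get them there is not covered here). The helper names `build_event`, `build_payload`, and `send` are made up for this example.

```python
import json
import time
import urllib.request

AMPLITUDE_HTTP_API_URL = "https://api2.amplitude.com/2/httpapi"


def build_event(event_type, device_id, session_id, user_id=None, event_properties=None):
    """Build one event that reuses the identifiers seen on the client side."""
    event = {
        "event_type": event_type,
        "device_id": device_id,    # same Device ID as the browsing events
        "session_id": session_id,  # session start time in ms since epoch, not -1
        "time": int(time.time() * 1000),
    }
    if user_id is not None:
        event["user_id"] = user_id
    if event_properties:
        event["event_properties"] = event_properties
    return event


def build_payload(api_key, events):
    """Wrap events in the request body expected by the HTTP V2 API."""
    return {"api_key": api_key, "events": list(events)}


def send(payload):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        AMPLITUDE_HTTP_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, `build_payload("YOUR_API_KEY", [build_event("order created", "jHdTf2LNvsXrKDkwnq2Yto", 1649239200000, user_id="customer@example.com")])` would produce a body in which the checkout event carries the same Device ID and session ID as the browsing events. The API key, session ID, and email here are placeholders.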


“No need for the funnel chart as I understand the issue now. So yes, the fact that the checked In User is sending their events server-side, that would mean you would need to manually send the data for that user in the HTTP API payload. I would recommend a read of this article which states”

 

Thank you for the article link.


@Denis Holmes appreciate the information. I had a feeling this might be the problem, but wanted some verification that I wasn’t overthinking it before going down this rabbit hole. I’m planning to work on this with the web team tomorrow and will let you know how it goes. I imagine that in addition to passing through the client-side session_id, I’ll also want to pass through the device_id? Or will passing just the session_id be enough for Amplitude to attach the rest of the user properties as they are known for that session?


@sean.meehan I would also pass through the Device ID to be sure that the user properties will sync. Pass both session ID and Device ID.


Hi @Denis Holmes ,

I wanted to give you an update, as a lot of work has gone into this over the past week, first on the initial set-up and then on validation. The good news is that we have full event stream data flowing now. The bad news is that Shopify does not give us access to modify their server-side events, so we cannot pass any identifying information into them (e.g. session_id, device_id, etc.), which I believe would have been the ideal path we discussed here.


Because of this limitation, we took the approach of generating our own custom checkout_started event. This isn’t ideal, as it’s one extra thing we have to manage right now (there was a lot of work and testing needed to ensure it fires in various scenarios; it’s still not perfect in all of them, but it is working in most). In that sense, it’s more of a band-aid patch, in the hope that the Amplitude app on Shopify will be better able to tie in these server-side events in the future. I understand the Shopify plugin is relatively new? Happy to help with more feedback any way that I can.


In our own checkout_started event we are logging a cart_id. The cart_id is the critical bit of information that allows me to stitch users together with their server-side events in our data warehouse (I am loading the raw Amplitude data in via Fivetran). This was the only tie-in I could think of for the time being.

I am also noticing that Amplitude often later figures out by itself that these users should be stitched together (in most cases, but not all). I haven’t had much time to dig into this yet, but do you think Amplitude’s stitching could be leveraging this cart_id match as well? Or would it be figuring it out some other way (e.g. a customer completes an order, gets sent a thank-you confirmation email, and then clicks a link in that email which takes them back to our store, where some sort of cookie ties the server-side order completion back to their client)? For downstream modeling purposes: in cases where Amplitude is able to stitch two users together, do you know how long it usually takes before Amplitude figures this out and updates the data (a few hours, a day, etc.)?
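To make the cart_id approach concrete, here is a simplified Python sketch of the stitching logic. It is not our actual warehouse model: it assumes each event has been flattened into a dict with `event_type`, `device_id`, and `cart_id` keys, and that the server-side events also carry the same cart identifier. The function and field names are illustrative only.

```python
def build_device_map(events, client_event_type="checkout_started"):
    """Map each checkout-side device_id to the browsing device_id sharing its cart_id."""
    # First pass: remember which browsing device started each cart.
    cart_to_client_device = {}
    for event in events:
        cart_id = event.get("cart_id")
        if event["event_type"] == client_event_type and cart_id:
            cart_to_client_device.setdefault(cart_id, event["device_id"])

    # Second pass: any other device seen with that cart_id maps back to it.
    device_map = {}
    for event in events:
        client_device = cart_to_client_device.get(event.get("cart_id"))
        if client_device and event["device_id"] != client_device:
            device_map[event["device_id"]] = client_device
    return device_map


def stitch(events, device_map):
    """Return copies of the events with a stitched_device_id column added."""
    return [
        {**event, "stitched_device_id": device_map.get(event["device_id"], event["device_id"])}
        for event in events
    ]
```

Grouping by `stitched_device_id` instead of `device_id` then gives one stream per customer, covering both the browsing events and the server-side checkout events, for every cart where the custom checkout_started event fired.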

 


Hi @sean.meehan ,

 

Thanks for the reply. I hear your feedback and will pass it on to the relevant team, thank you! It is a shame that you cannot edit the server-side events. One suggestion: could you see whether Shopify will provide the payload without sending the event, so that you can make your own edits to the payload and send it yourself? Just a thought, but that might work! Implementing the standalone JS SDK might also be a better fit, but I am not very familiar with the Shopify interface.

No, Amplitude should not be using the custom cart_id in the event to merge users. It would use the Device ID, User ID, and Amplitude ID, as well as some other metadata, but it should not merge on the cart_id. The email link sounds like the most reasonable explanation, as I know cookies come into play when it comes to keeping track of users. The merge should happen relatively quickly once Amplitude has enough data to judge the two users to be the same one.

 

I hope this helps! Let me know if you have any further queries!

 

Regards,
Denis

 


Thanks for all this info @Denis Holmes

 

“It is a shame you cannot edit the server-side events which is problematic.”

 

So this is a limitation of the Amplitude Shopify App? It’s not intended to maintain context through the checkout phase?


Unfortunately, if Shopify does not give you access to modify their server-side events, then that is an issue. It would be a limitation on their part, in that they do not allow users to edit the server payloads that are sent. You would need to check with them whether they will let you edit it, but I assume not. You would need to pass the Device ID or User ID in the server-side event to help it match to the right user.

 

Have you spoken with Shopify about this issue? I am sure they have had others with a similar setup who must be successful! 


Hi @Denis Holmes

 

“You would need to pass the Device or User ID in the server side event to help it match to the right user.

Have you spoken with Shopify about this issue? I am sure they have had others with a similar setup who must be successful!”

 

I am facing the same problem.

Our site does not seem to be able to get user IDs, because our store's setup does not allow a customer account page (“My Page”) to be created.

It might be possible to link them by device ID, but I do not know how to do that. What is the best way? (Looking at Amplitude, it seems we are getting device IDs, but the IDs on the checkout page are different from those on the top page.)

 

I contacted Shopify, and they said that the integration should be handled by whoever is responsible for the app...


@sean.meehan I stumbled across this and was wondering if you have solved this properly now? Things have changed on the Shopify side and I have been able to get a complete stream of user data attributed to a single identifier including checkout without any workarounds.


hey @timothy-permutable, that’s really good to hear. I’m actually no longer with the company that I hooked this up for so I couldn’t tell you how the data looks these days, but appreciate having this insight for future ref. 
