Solved

Understanding how the "Language" key is populated

  • 30 March 2023
  • 8 replies
  • 377 views

Hi there, we're trying to understand how the values for the key "Language" are set.

A bit of context: our data is piped from Segment on Android using the Segment Android SDK.

For the "Language" key, we see several kinds of values stored (e.g. 'es-MX', 'es-US', 'en-US', 'Spanish' and 'English'). We're trying to work out when 'Spanish' or 'English' is set vs. something like 'es-MX'.

We found some documentation that we thought might have to do with this called 'Canonicalization'?

https://www.docs.developers.amplitude.com/experiment/general/evaluation/remote-evaluation/?h=language#user-enrichment

If so, does anyone understand it in more detail? Not understanding this is limiting our ability to understand our customers and debug issues on our end, and it is blocking work from being completed.
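
To make the two kinds of values in the question concrete, here is a minimal Python sketch that separates code-like values such as 'es-MX' from display names such as 'Spanish'. The pattern and the helper name `classify_language_value` are assumptions for illustration. The pattern is a loose simplification of locale tags, not a full validator, and nothing here reflects how Amplitude itself processes the field.

```python
import re

# Loose pattern for tags like "es", "es-MX" or "en_US": a 2-3 letter language
# part, optionally followed by a 2-letter region. Simplified on purpose.
LOCALE_TAG = re.compile(r"^[A-Za-z]{2,3}(?:[-_][A-Za-z]{2})?$")


def classify_language_value(value: str) -> str:
    """Return "locale_tag" for code-like values, "display_name" otherwise."""
    return "locale_tag" if LOCALE_TAG.match(value.strip()) else "display_name"


# Values reported in the question above.
observed = ["es-MX", "es-US", "en-US", "Spanish", "English"]

by_kind = {}
for value in observed:
    by_kind.setdefault(classify_language_value(value), []).append(value)

print(by_kind)
```

Running something like this over an export could show how often each kind of value appears and whether it correlates with platform, SDK version, or ingestion source.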

Best answer by Jeremie Gluckman 4 April 2023, 23:26

Userlevel 6
Badge +9

Thanks for posting here @samelliott8889, and excuse the delay. The language value is pulled from what the device provides in the `navigator.languages` field. Here is the source code for the JS SDK, which is the bundled integration for Segment. I believe the user could change this value by changing their device or browser settings.

Badge

Thank you so much for this reply. 

We are trying to work out why Amplitude sometimes displays ‘Spanish’ or ‘English’ vs. ‘en-US’ or ‘es-MX’ in the dashboard or when data is exported.

Are there other data fields that Amplitude canonicalizes to decide whether to present ‘Spanish’ vs. ‘es-MX’, for example?

Best,

Sam

Userlevel 6
Badge +9

Sure thing @samelliott8889. Focusing just on [Amplitude] Language: if a user is ingested through the JS SDK, the language is pulled from what the device provides in the navigator.languages field (here is the source code for the JS SDK). For the Android and iOS SDKs, the value should also come from the device itself, i.e. the OS (here is the source code for the Android SDK, and here is the source code for the iOS SDK). So, if the user is coming through our native SDKs, then yes, the Language should be representative of the user's OS/device. If a user is ingested through the HTTP API, then this value was explicitly set by your organization, so it could mean anything.

On a separate note, the language your organization uses to send events to Amplitude is completely separate from the language your users use on their device or app. If your organization wanted to send events to us in Spanish, your Engineering team would have had to code the event names etc. in Spanish.
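
To illustrate the HTTP API case mentioned above, here is a hedged Python sketch of a request body in which the sender sets the language explicitly. The field names (`api_key`, `events`, `user_id`, `event_type`, `language`) follow my understanding of Amplitude's HTTP V2 API event format and should be checked against the official reference. The helper `build_event`, the user IDs, and the event name are hypothetical. Nothing is sent over the network.

```python
import json


def build_event(user_id: str, event_type: str, language: str) -> dict:
    """Build one event dict with an explicitly chosen language value."""
    return {"user_id": user_id, "event_type": event_type, "language": language}


body = {
    "api_key": "YOUR_API_KEY",  # placeholder, not a real key
    "events": [
        # The sender decides what goes here, so both styles are possible.
        build_event("user-1", "App Opened", "es-MX"),
        build_event("user-2", "App Opened", "Spanish"),
    ],
}

print(json.dumps(body, indent=2))
```

Because the value is free-form on this path, any server-side pipeline that posts events could be one source of mixed values, which is worth ruling out when auditing.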

Badge

Thanks again for your reply.

Below is a JSON example of the data sent from Segment to Amplitude. We use the Segment SDK on Android. We are assuming the Language key is set from the “locale” key.

So in your reply, are you saying that it is likely the Android devices that send ‘Spanish’ or ‘es-MX’, rather than some data transformation done on Amplitude's side?
 

{
  "_metadata": {
    "bundled": [],
    "bundledIds": [],
    "unbundled": []
  },
  "channel": "client",
  "context": {
    "app": {
      "build": "31",
      "name": "Empower MX",
      "namespace": "finance.empower.mx",
      "version": "1.4.0.67"
    },
    "device": {
      "id": "89836dc615f2656799c56e4017bef80a8788c063e489a56ce17437faa0005862",
      "manufacturer": "samsung",
      "model": "SM-A045M",
      "name": "a04",
      "type": "android"
    },
    "ip": "201.173.66.49",
    "library": {
      "name": "analytics-kotlin",
      "version": "1.8.0"
    },
    "locale": "es-US", // assuming this is the key that's used
    "network": {
      "bluetooth": false,
      "cellular": false,
      "wifi": true
    },
    "os": {
      "name": "Android",
      "version": "12"
    },
    "screen": {
      "density": 2.125,
      "height": 1453,
      "width": 720
    },
    "timezone": "America/Mexico_City",
    "userAgent": "Dalvik/2.1.0 (Linux; U; Android 12; SM-A045M Build/SP1A.210812.016)"
  },
  "integrations": {
    "Actions Amplitude": {
      "session_id": 1680584103709
    }
  }
}
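
As a small aid for checking what Segment actually carries in this field, here is a Python sketch that reads `context.locale` from a logged event and splits it into language and region. It uses a trimmed copy of the payload above without the inline comment, since strict JSON parsers reject comments. The helper name `extract_locale` is an assumption for illustration.

```python
import json

# Trimmed copy of the payload above, reduced to the fields used here.
raw_event = """
{
  "context": {
    "library": {"name": "analytics-kotlin", "version": "1.8.0"},
    "locale": "es-US",
    "timezone": "America/Mexico_City"
  }
}
"""


def extract_locale(event: dict):
    """Return context.locale, or None when the key is absent."""
    return event.get("context", {}).get("locale")


event = json.loads(raw_event)
locale = extract_locale(event)

# Split "es-US" into its language part and its region part.
language, _, region = (locale or "").partition("-")
print(locale, language, region)
```

Comparing this value event by event with the Language shown in Amplitude for the same user could help narrow down where 'Spanish' first appears.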

 

Badge

For some additional context, Segment never sends a value of ‘Spanish’ or ‘English’; it's always a locale code made of an ISO 639-1 language code plus a region (e.g. ‘es-MX’).

We log our Segment data, and we only ever see these locale codes.

Below is a count by the “context_locale” key in our Segment data. We never see 'Spanish', for example.

es-US 739688
es-MX 364326
es-ES 26197
en-US 7599
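
For a quick view of these counts by base language, the Python sketch below rolls the four locale codes up by their language part. The numbers are copied from the list above; the variable names are illustrative.

```python
from collections import Counter

# Counts copied from the context_locale tally above.
locale_counts = {
    "es-US": 739688,
    "es-MX": 364326,
    "es-ES": 26197,
    "en-US": 7599,
}

# Roll up by the language part that precedes the hyphen.
by_language = Counter()
for locale, count in locale_counts.items():
    language = locale.split("-")[0]
    by_language[language] += count

total = sum(locale_counts.values())
for language, count in by_language.most_common():
    print(f"{language}: {count} ({count / total:.1%})")
```

If the share of 'Spanish' vs. 'English' in Amplitude looks similar to this split, that would be consistent with the display names being derived from the same locale data, though it would not prove where the conversion happens.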

Userlevel 6
Badge +9

Thanks for the additional context @samelliott8889. I’m going to forward these details to our support team, who can provide additional assistance.

Badge

Thanks, I already have a thread going with them too; I just wanted to ask the community as well in case anyone had seen this before.

 

Thanks again for your replies.

Userlevel 6
Badge +9

Sure thing @samelliott8889. Keep us posted if you need anything!
