Skip to main content

Hello !

I would like to measure MAU/DAU with S3 Exported data format.

 

Define MAU such that

MAU(/DAU) : Monthly(/Daily) Unique Active Event WHERE user_id != None

 

And here is the exported data JSON object schema:

{
  "server_received_time": UTC ISO-8601 timestamp,
  "app": int,
  "device_carrier": string,
  "$schema":int,
  "city": string,
  "user_id": string,
  "uuid": UUID,
  "event_time": UTC ISO-8601 timestamp,
  "platform": string,
  "os_version": string,
  "amplitude_id": long,
  "processed_time": UTC ISO-8601 timestamp,
  "user_creation_time": UTC ISO-8601 timestamp,
  "version_name": string,
  "ip_address": string,
  "paying": boolean,
  "dma": string,
  "group_properties": dict,
  "user_properties": dict,
  "client_upload_time": UTC ISO-8601 timestamp,
  "$insert_id": string,
  "event_type": string,
  "library":string,
  "amplitude_attribution_ids": string,
  "device_type": string,
  "device_manufacturer": string,
  "start_version": string,
  "location_lng": float,
  "server_upload_time": UTC ISO-8601 timestamp,
  "event_id": int,
  "location_lat": float,
  "os_name": string,
  "amplitude_event_type": string,
  "device_brand": string,
  "groups": dict,
  "event_properties": dict,
  "data": dict,
  "device_id": string,
  "language": string,
  "device_model": string,
  "country": string,
  "region": string,
  "is_attribution_event": bool,
  "adid": string,
  "session_id": long,
  "device_family": string,
  "sample_rate": null,
  "idfa": string,
  "client_event_time": UTC ISO-8601 timestamp,
 }
 

Can you help me figure out which column to refer to in order to calculate MAU metric?

Also if there are any specification for this JSON object schema, please let me know.


Just to let you know, I have tried following queries but all of these gave an output which differed from what amplitude application showed.

  • SELECT(DISTINCT uuid) FROM table WHERE user_id IS NOT NULL
  • SELECT(DISTINCT user_id) FROM table WHERE user_id IS NOT NULL
  • SELECT(DISTINCT event_id) FROM table WHERE user_id IS NOT NULL
  • etc.

 

Thank you !

 

Amazon S3 Exported data format link
https://www.docs.developers.amplitude.com/data/destinations/amazon-s3/#exported-data-format:~:text=Here%20is%20the%20exported%20data%20JSON%20object%20schema%3A

Hi @naian 

This behavior is expected since a direct SQL query on the event JSON dataset won’t be enough to get the results shown in Amplitude charts. You will have to query the merged user id dataset in combination with the event dataset to remove any duplicated user ids to get accurate enough results.

I would go with the “SELECT count(DISTINCT user_id) “ query on the joined dataset along with the event_time for MAU calculations.


Thank you for the answer ! @Saish Redkar 

 

“The merged user id dataset” that you mentioned means this dataset right?

https://www.docs.developers.amplitude.com/data/destinations/redshift/#redshift-export-format:~:text=version.%20Example%3A%201.0.0-,Merged%20User%20table%20schema,-%C2%B6
 

{

“amplitude_id”: NUMBER(38,0)

“merge_event_time”: TIMESTAMP

“merge_server_time”: TIMESTAMP

“merged_amplitude_id”: NUMBER(38,0)

}
 

If it is right, then “SELECT count(DISTINCT user_id) “ query is right too?

I wonder about why “SELECT count(DISTINCT user_id) “ needs merged user id dataset,

cause there are no any columns to be used. (eg. “amplitude_id”, “merge_event_time”, “merge_server_time”, “merged_amplitude_id”)

 

Thank you once again !


Here’s a callout from the docs re. Redshift export
 

 


Reply