Skip to main content
Solved

How can I measure DAU/MAU in S3 Exported data format ?

  • November 15, 2023
  • 3 replies
  • 304 views

Hello !

I would like to measure MAU/DAU with S3 Exported data format.

 

Define MAU such that

MAU(/DAU) : Monthly(/Daily) Unique Active Event WHERE user_id != None

 

And here is the exported data JSON object schema:

{
  "server_received_time": UTC ISO-8601 timestamp,
  "app": int,
  "device_carrier": string,
  "$schema":int,
  "city": string,
  "user_id": string,
  "uuid": UUID,
  "event_time": UTC ISO-8601 timestamp,
  "platform": string,
  "os_version": string,
  "amplitude_id": long,
  "processed_time": UTC ISO-8601 timestamp,
  "user_creation_time": UTC ISO-8601 timestamp,
  "version_name": string,
  "ip_address": string,
  "paying": boolean,
  "dma": string,
  "group_properties": dict,
  "user_properties": dict,
  "client_upload_time": UTC ISO-8601 timestamp,
  "$insert_id": string,
  "event_type": string,
  "library":string,
  "amplitude_attribution_ids": string,
  "device_type": string,
  "device_manufacturer": string,
  "start_version": string,
  "location_lng": float,
  "server_upload_time": UTC ISO-8601 timestamp,
  "event_id": int,
  "location_lat": float,
  "os_name": string,
  "amplitude_event_type": string,
  "device_brand": string,
  "groups": dict,
  "event_properties": dict,
  "data": dict,
  "device_id": string,
  "language": string,
  "device_model": string,
  "country": string,
  "region": string,
  "is_attribution_event": bool,
  "adid": string,
  "session_id": long,
  "device_family": string,
  "sample_rate": null,
  "idfa": string,
  "client_event_time": UTC ISO-8601 timestamp,
 }
 

Can you help me figure out which column to refer to in order to calculate MAU metric?

Also if there are any specification for this JSON object schema, please let me know.


Just to let you know, I have tried following queries but all of these gave an output which differed from what amplitude application showed.

  • SELECT(DISTINCT uuid) FROM table WHERE user_id IS NOT NULL
  • SELECT(DISTINCT user_id) FROM table WHERE user_id IS NOT NULL
  • SELECT(DISTINCT event_id) FROM table WHERE user_id IS NOT NULL
  • etc.

 

Thank you !

 

Amazon S3 Exported data format link
https://www.docs.developers.amplitude.com/data/destinations/amazon-s3/#exported-data-format:~:text=Here%20is%20the%20exported%20data%20JSON%20object%20schema%3A

Best answer by Saish Redkar

Hi @naian 

This behavior is expected since a direct SQL query on the event JSON dataset won’t be enough to get the results shown in Amplitude charts. You will have to query the merged user id dataset in combination with the event dataset to remove any duplicated user ids to get accurate enough results.

I would go with the “SELECT count(DISTINCT user_id) “ query on the joined dataset along with the event_time for MAU calculations.

View original
Did this topic help you find an answer to your question?

3 replies

Saish Redkar
Expert
Forum|alt.badge.img+10
  • Expert
  • 1380 replies
  • Answer
  • November 15, 2023

Hi @naian 

This behavior is expected since a direct SQL query on the event JSON dataset won’t be enough to get the results shown in Amplitude charts. You will have to query the merged user id dataset in combination with the event dataset to remove any duplicated user ids to get accurate enough results.

I would go with the “SELECT count(DISTINCT user_id) “ query on the joined dataset along with the event_time for MAU calculations.


  • Author
  • New Member
  • 1 reply
  • November 17, 2023

Thank you for the answer ! @Saish Redkar 

 

“The merged user id dataset” that you mentioned means this dataset right?

https://www.docs.developers.amplitude.com/data/destinations/redshift/#redshift-export-format:~:text=version.%20Example%3A%201.0.0-,Merged%20User%20table%20schema,-%C2%B6
 

{

“amplitude_id”: NUMBER(38,0)

“merge_event_time”: TIMESTAMP

“merge_server_time”: TIMESTAMP

“merged_amplitude_id”: NUMBER(38,0)

}
 

If it is right, then “SELECT count(DISTINCT user_id) “ query is right too?

I wonder about why “SELECT count(DISTINCT user_id) “ needs merged user id dataset,

cause there are no any columns to be used. (eg. “amplitude_id”, “merge_event_time”, “merge_server_time”, “merged_amplitude_id”)

 

Thank you once again !


Saish Redkar
Expert
Forum|alt.badge.img+10

Here’s a callout from the docs re. Redshift export
 

 


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings