Alex Magnusson is a leading expert in the field of product analytics, specializing in helping companies leverage the power of Amplitude to make data-driven decisions. He is the co-founder and CEO of Magnusson Analytica, a consultancy that provides Amplitude implementation, training, and strategic guidance to clients worldwide. If you want 1:1 help and a free chat, book time with Alex here!
In this Amplitude Messy Data Cleanup Hour, we covered the essential steps for maintaining clean, reliable, and actionable data in Amplitude. We also covered common challenges such as overwhelming event lists, inconsistent naming conventions, data validation, and the ongoing task of aligning tracking with evolving business objectives.
For more help, and to chat with Alex and other users, join our Slack community!
Amplitude Data Cleanup Checklist
Here's a handy checklist to help you audit, clean, and govern your Amplitude data:
I. Establish a Data-Driven Mindset:
- Build Strong Habits: Data cleanup is an ongoing process. Just like keeping a house clean, Amplitude projects require consistent attention to keep data from becoming messy and unusable.
- Set a Target Event Limit: Aim to keep your event count manageable, such as 50 events or fewer. This simplifies analysis and makes it easier to keep your tracking consistent.
II. Conduct a Thorough Data Audit:
- Download Event and Property Lists: Export your event list, including properties, as a CSV file to get a comprehensive view of your current tracking setup.
- Calculate Event "Power Scores": Consider developing a formula that combines event volume (how often an event is fired) and query count (how often it is used in analysis) into a "Power Score." This metric can help you prioritize events for review and potential removal. The session did not prescribe a specific formula.
- Analyze Event Usage: Identify events that have very low usage or haven't been queried recently (e.g., in the last 180 days).
- Review Properties: Identify properties with inconsistent naming or data types. More on event naming conventions and taxonomy here.
- Align Tracking with Business Objectives: Review your business objectives and key performance indicators (KPIs). Determine which events directly support those objectives and KPIs.
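The session did not define how to compute a Power Score or flag stale events, so the following is only one possible sketch, assuming you have already exported per-event volume, query count, and last-queried date. The `EventStats` fields, the 0.3/0.7 weights (which favor usage over raw volume), and the sample events are all hypothetical; map them to whatever columns your export actually contains.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class EventStats:
    name: str
    volume: int                   # hypothetical: times the event was fired in the export window
    query_count: int              # hypothetical: times the event was used in charts or queries
    last_queried: Optional[date]  # None if the event was never queried


def power_scores(events, volume_weight=0.3, query_weight=0.7):
    """Hypothetical Power Score: weighted sum of volume and query count,
    each normalized to 0..1 by the largest value in the list."""
    max_volume = max((e.volume for e in events), default=0) or 1
    max_queries = max((e.query_count for e in events), default=0) or 1
    return {
        e.name: round(volume_weight * e.volume / max_volume
                      + query_weight * e.query_count / max_queries, 3)
        for e in events
    }


def stale_events(events, today, days=180):
    """Names of events never queried, or not queried within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return sorted(e.name for e in events
                  if e.last_queried is None or e.last_queried < cutoff)


events = [
    EventStats("View Page", 120_000, 40, date(2024, 5, 1)),
    EventStats("Click Banner", 30_000, 0, None),
    EventStats("Legacy Signup", 500, 2, date(2023, 6, 1)),
]
scores = power_scores(events)
print(sorted(scores, key=scores.get))  # lowest-scoring events first: review candidates
print(stale_events(events, today=date(2024, 6, 1)))  # ['Click Banner', 'Legacy Signup']
```

A low score or a stale flag is a prompt for review, not an automatic deletion: check each candidate against your KPIs before removing it.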
III. Implement Robust Data Governance:
- Document Naming Conventions: Clearly define your event and property naming conventions, including case (e.g., title case for events, snake case for properties), word separators (e.g., spaces, underscores, or no separator as in camel case), and any prefixes or suffixes you'll use. Here’s a handy taxonomy spreadsheet to get you started.
- Share and Enforce Conventions: Store your naming conventions in a centrally accessible location (e.g., Confluence, a shared document, or a team wiki).
- Configure Amplitude's Data Governance settings:
- In your development environment, set rules to flag events or properties that deviate from your conventions. This provides immediate feedback to developers, especially if you set up email notifications to send error alerts.
- In your production environment, consider setting stricter rules to block non-compliant events and properties to prevent inconsistencies from polluting your data.
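Amplitude's governance settings apply these rules inside Amplitude. If you also want developers to catch violations before they ship (for example, in a unit test or CI step), a local check can mirror the same conventions. This is a minimal sketch assuming title case for event names and snake_case for property names, as in the example conventions above; the regular expressions and the `naming_violations` helper are hypothetical, not part of any Amplitude API.

```python
import re

# Hypothetical conventions: "View Page" style event names, "page_name" style properties.
EVENT_NAME = re.compile(r"[A-Z][a-z0-9]*( [A-Z][a-z0-9]*)*")
PROPERTY_NAME = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def naming_violations(event_name, properties):
    """Return human-readable problems for one event and its property names."""
    problems = []
    if not EVENT_NAME.fullmatch(event_name):
        problems.append(f"event '{event_name}' is not Title Case")
    for prop in properties:
        if not PROPERTY_NAME.fullmatch(prop):
            problems.append(f"property '{prop}' is not snake_case")
    return problems


print(naming_violations("View Page", {"page_name": "Home"}))  # []
print(naming_violations("viewPage", {"pageName": "Home", "platform": "iOS"}))
```

The title-case pattern deliberately rejects all-caps words such as "FAQ"; loosen it if your taxonomy allows acronyms in event names.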
IV. Clean Up and Optimize Event and Property Structure:
- Consolidate Redundant Events: Merge events that capture very similar actions, using event properties to distinguish variations instead of creating separate event types.
- For example, instead of having separate events for "View Product Page," "View Homepage," and "View Checkout Page," use a single "View Page" event with a "page_name" property to specify the page being viewed.
- Merge Inconsistent Properties: Use Amplitude's Transform feature to merge properties that have different names but represent the same data point. This ensures data consistency and simplifies analysis. Be sure to delete the old, inconsistently named properties once the new, consolidated property has accumulated enough historical data.
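Amplitude's Transform feature does this merging inside Amplitude. The same two ideas (one "View Page" event with a `page_name` property, and one naming style for properties) can also be applied to raw event data before it is sent or re-imported. This is a sketch only: it assumes events are dictionaries with `event_type` and `event_properties` keys, and the `LEGACY_PAGE_EVENTS` mapping and helper names are hypothetical.

```python
import re

# Hypothetical mapping from legacy page-specific events to a page_name value.
LEGACY_PAGE_EVENTS = {
    "View Product Page": "product",
    "View Homepage": "home",
    "View Checkout Page": "checkout",
}


def to_snake_case(name):
    """'pageName' -> 'page_name', 'Page Name' -> 'page_name', 'userID' -> 'user_id'."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # split camelCase boundaries
    name = re.sub(r"[\s\-]+", "_", name)                  # spaces/hyphens become underscores
    return name.lower()


def consolidate(event):
    """Return a copy of a raw event with page events merged and property names normalized."""
    name = event["event_type"]
    props = {to_snake_case(k): v for k, v in event.get("event_properties", {}).items()}
    if name in LEGACY_PAGE_EVENTS:
        props.setdefault("page_name", LEGACY_PAGE_EVENTS[name])
        name = "View Page"
    return {**event, "event_type": name, "event_properties": props}


raw = {"event_type": "View Product Page", "user_id": "u1",
       "event_properties": {"productId": "sku-42"}}
print(consolidate(raw))
# {'event_type': 'View Page', 'user_id': 'u1', 'event_properties': {'product_id': 'sku-42', 'page_name': 'product'}}
```

If one event carries both `pageName` and `page_name`, the later key silently wins, so check for collisions before bulk-renaming a large property list.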
V. Prioritize the Amplitude Tracking Plan:
- Collaborate with Developers Early: Work closely with developers before they start writing tracking code to make sure they understand your naming conventions, expected data types, and other requirements outlined in your tracking plan.
- Enforce Data Type Consistency: Be explicit about the data type (e.g., string, integer, boolean) expected for each property in your tracking plan. Communicate these requirements clearly to your developers to prevent errors that can arise from mismatched data types.
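One way to make expected data types explicit is to keep them in machine-readable form and check sample payloads against them, which also catches cases like the same property arriving as a boolean in one place and a number in another. This is a minimal sketch; the `TRACKING_PLAN` contents and type labels are hypothetical examples, not an Amplitude format.

```python
# Hypothetical tracking plan: property name -> expected data type.
TRACKING_PLAN = {
    "page_name": "string",
    "item_count": "integer",
    "is_trial": "boolean",
}


def type_name(value):
    # Check bool before int: in Python, True and False are also ints.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def type_mismatches(properties, plan=TRACKING_PLAN):
    """Map each planned property with the wrong type to (expected, actual)."""
    return {
        key: (plan[key], type_name(value))
        for key, value in properties.items()
        if key in plan and type_name(value) != plan[key]
    }


print(type_mismatches({"page_name": "home", "item_count": "3", "is_trial": 1}))
# {'item_count': ('integer', 'string'), 'is_trial': ('boolean', 'integer')}
```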
VI. Test New Tracking in a Development Environment:
- Use a Development API Key: Configure your tracking code to send data to a separate Amplitude project using a development API key. This will keep test data isolated from your production data.
- Leverage Testing Tools: Use Amplitude's User Lookup feature and the Amplitude Event Explorer Chrome extension to verify that events and properties are being sent correctly and that data is accurate and consistent.
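To keep test data out of production, the tracking code needs to pick the right API key per environment. This sketch shows one approach under stated assumptions: the `APP_ENV`, `AMPLITUDE_DEV_API_KEY`, and `AMPLITUDE_PROD_API_KEY` environment variable names are hypothetical, and the function does not call any Amplitude SDK.

```python
import os


def amplitude_api_key(env=None):
    """Return the API key for the current environment (hypothetical variable names).

    Defaults to development so a missing APP_ENV never sends test data to production.
    """
    env = env if env is not None else os.environ.get("APP_ENV", "development")
    keys = {
        "development": os.environ.get("AMPLITUDE_DEV_API_KEY"),
        "production": os.environ.get("AMPLITUDE_PROD_API_KEY"),
    }
    if env not in keys:
        raise ValueError(f"unknown environment: {env!r}")
    key = keys[env]
    if not key:
        raise RuntimeError(f"no Amplitude API key configured for {env!r}")
    return key
```

Pass the returned key to whichever SDK initialization call your platform uses. Failing loudly when a key is missing is safer than falling back to the other environment's key.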
Timestamped Version:
- 0:00 - Introductions and Session Overview
- 2:00 - The Importance of Regular Data Cleanup
- 3:45 - Too Many Events: Streamlining Your Event List
- 10:45 - Inconsistent Naming: Standardizing Conventions and Enforcing Them
- 15:50 - Finding Incorrect Data and What to Do
- 18:30 - Alignment with Objectives (plus taxonomy spreadsheet)
- 20:38 - Bugs and Errors: Identifying and Fixing Tracking Issues
- 22:35 - Get 1:1 help or help from the user community
- 24:58 - Question: I have a page view event, but it doesn't contain enough detail, so I end up adding separate events for specific page views, such as a product page view. Is this correct?
- 26:06 - Question: Should I keep the page view event?
- 28:43 - Question: How do you fix errors when properties are passed through in different formats (e.g., the same property receives both boolean and number values)?
- 32:13 - Question: How do you get devs to follow a specific naming convention?
- 36:46 - Question: Can you show how to easily fix mismatches such as Boolean versus string values, or a platform property with "Android" (capital A), "android" (lowercase a), and so on? And is it possible to generalize the fix?
- 40:02 - Question: How do you fix raw event data? Do you recommend exporting it all and fixing it manually, or fixing it and importing the corrected data into Amplitude?
- 46:50 - Question: We have inconsistent naming for properties and would like to merge all camelCase and snake_case properties to be considered snake_case. With >100 properties, is there an easy way to do this?
- 49:25 - Question: Is there a way to build a formula or calculation between two events that share the same property, for example, to calculate the conversion rate between the two events while holding that property constant?
- 54:30 - Question: Would you recommend the same pattern you explained for handling forms? That is, Forms Started and Forms Submitted events with a property defining the form name. The goal would be to build a funnel grouped by form name to compare conversions across different forms.
- 57:11 - Question: How do you balance maintaining a clean event architecture with still being able to derive insights that may not be the focus for the current quarter?
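The answers to the form-funnel questions are not summarized here. The sketch below only illustrates the arithmetic behind "conversion rate from Forms Started to Forms Submitted, per form name, holding the form name constant." It is a simplified, hypothetical calculation (no conversion window, each user counted once per form), not how Amplitude computes funnels; the event dictionary shape is assumed.

```python
from collections import defaultdict


def conversion_by_property(events, start="Forms Started", end="Forms Submitted",
                           key="form_name"):
    """Share of users who did `start` and later `end` with the same `key` value."""
    # Earliest start time per (property value, user).
    first_start = {}
    for e in events:
        if e["event_type"] == start:
            group = (e["properties"].get(key), e["user_id"])
            first_start[group] = min(e["time"], first_start.get(group, e["time"]))

    # A user converts only if the end event has the SAME property value and comes later.
    converted = defaultdict(set)
    for e in events:
        group = (e["properties"].get(key), e["user_id"])
        if e["event_type"] == end and group in first_start and e["time"] >= first_start[group]:
            converted[group[0]].add(group[1])

    started = defaultdict(set)
    for value, user in first_start:
        started[value].add(user)
    return {value: len(converted[value]) / len(users) for value, users in started.items()}


events = [
    {"user_id": "a", "event_type": "Forms Started", "time": 1, "properties": {"form_name": "Signup"}},
    {"user_id": "a", "event_type": "Forms Submitted", "time": 2, "properties": {"form_name": "Signup"}},
    {"user_id": "b", "event_type": "Forms Started", "time": 1, "properties": {"form_name": "Signup"}},
    {"user_id": "b", "event_type": "Forms Submitted", "time": 3, "properties": {"form_name": "Contact"}},
    {"user_id": "c", "event_type": "Forms Started", "time": 2, "properties": {"form_name": "Contact"}},
]
print(conversion_by_property(events))  # {'Signup': 0.5, 'Contact': 0.0}
```

User "b" started the Signup form but submitted the Contact form, so that submission does not count toward either form's conversion: this is what holding the property constant means in the calculation.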