Hello friends! We’d like to say massive thanks to all of you who joined the Tracking Plan and Taxonomy Office Hours on February 8th!
We appreciate your passion for a clean analytics setup.
And a huge thank you to our fearless experts, Matthew New, AppFit cofounder, and Anuj Sharma, Senior Customer Success Architect at Amplitude.
If you'd like 1:1 help, Matthew has offered a few free sessions to look at your Amplitude setup in more depth. Book a free consult here.
And now, onto the recap: (link to full recording)
Q: How can you ensure your tracking plan remains consistent and adheres to the taxonomy over time? Any tips for maintaining documentation and taxonomy effectively?
TL;DR:
- Maintain a consistent and organized tracking plan by establishing a single source of truth, auditing regularly, assigning ownership, integrating analytics into development, and documenting changes effectively.
- Utilize Amplitude features like "Observe" and AI Data Assistant tools for data management, encourage team members to subscribe to updates, establish a governance function for data review, and maintain thorough documentation within internal tools.
Matthew’s tips:
- Establish a single source of truth for the tracking plan, utilizing tools like Amplitude's Data tab, as it directly reflects the codebase.
- Regularly audit and update the tracking plan to ensure accuracy and relevance.
- Assign clear ownership of different parts of the tracking plan to specific roles, ideally product managers, as they rely on the data for decision-making.
- Integrate analytics as a fundamental part of the development process, including them in epics or stories.
- Utilize tools like Confluence and Jira to document and track analytics and changes to the tracking plan, ensuring it's easy for team members to understand updates and the rationale behind event tracking.
Anuj’s tips:
Use Amplitude's features for effective data management and establish a governance function. Key recommendations:
- Leverage Amplitude's "Observe" feature to monitor expected versus unexpected data, which helps maintain data hygiene.
- Take advantage of the new AI Data Assistant tools in the Data tab to identify and fix issues.
- Encourage team members, especially PMs interested in data quality, to subscribe to updates for regular alerts on data anomalies.
- Establish a governance team or function that regularly reviews Amplitude data to identify unused or irrelevant events and metrics.
- Sort events by criteria such as last seen date, volume, and usage to assess their relevance and utility (see the sketch after this list).
- Maintain thorough documentation of the tracking plan and its changes within internal tools like Confluence or Jira, ensuring it's accessible and comprehensible for all stakeholders.
- Regularly clean and update the tracking plan based on these reviews to keep the data management system efficient and relevant.
- Make instrumentation a regular part of your release process.
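To make the event review concrete, here's a minimal TypeScript sketch of the kind of governance pass Anuj describes: it flags events that look stale or low-volume so the team can review them. The `EventSummary` shape, the thresholds, and the sample data are all illustrative assumptions; in practice you'd populate it from an export of your Data tab or tracking plan.

```typescript
// Governance review helper: flag events that look stale or low-volume.
interface EventSummary {
  name: string;       // event type as it appears in the tracking plan
  lastSeen: Date;     // most recent time the event was ingested
  volume30d: number;  // event count over the last 30 days
}

// Assumed thresholds -- tune these to your own traffic.
const STALE_AFTER_DAYS = 90;
const LOW_VOLUME_FLOOR = 100;

function flagForReview(events: EventSummary[], now: Date = new Date()): EventSummary[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  return events.filter((e) => {
    const daysSinceSeen = (now.getTime() - e.lastSeen.getTime()) / msPerDay;
    return daysSinceSeen > STALE_AFTER_DAYS || e.volume30d < LOW_VOLUME_FLOOR;
  });
}

// Illustrative data: one healthy event, one likely cleanup candidate.
const review = flagForReview([
  { name: 'Song Played', lastSeen: new Date('2024-02-01'), volume30d: 52000 },
  { name: 'Legacy Banner Clicked', lastSeen: new Date('2023-09-12'), volume30d: 3 },
]);
review.forEach((e) => console.log(`Review candidate: ${e.name}`));
```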
Q: Who should play the role of the data quality cop? Is it the Product, IT, Eng, Data team? Other?
TL;DR:
- In smaller companies, product teams are primarily responsible for data quality due to limited resources.
- Product teams, regardless of company size, play a pivotal role in owning data quality within Amplitude due to their involvement in feature planning and event tracking.
- The close relationship between product teams and the development process underscores their responsibility for data accuracy and rectifying any discrepancies.
Matthew's tips:
- Product teams should own data quality, especially in smaller companies where resources for a dedicated governance team are limited.
- Even in companies with governance teams, product should have a significant say, as they are closely tied to the metrics.
- Because product teams sit closest to development and depend directly on the data for decision-making, they are best positioned to ensure data accuracy and address issues arising from code changes.
Anuj’s tips:
- Product teams often emerge as the primary owners of data within Amplitude due to their involvement in feature planning and event tracking.
- The size of the company can influence the structure and dynamics of data ownership, with product teams consistently playing a key role regardless of the organization's scale.
- The close relationship between product teams and the development process underpins their responsibility for data quality, as they are well-placed to understand and rectify any discrepancies.
Q: After migrating to Amplitude in October, we've encountered discrepancies between our standard BI reports from the Snowflake data warehouse and Amplitude's analytics, particularly with retention analysis. As the primary data expert, I often get questions from PMs about these mismatches. The issue seems to stem from incomplete data backfilling in Amplitude, as tracking wasn't initially established there, unlike in our Snowflake warehouse. What are the best practices for ensuring data consistency between Amplitude and our data warehouse, especially for tracking user segments?
Anuj’s tips:
- Work Closely with Analysts: Determine the necessary historical data extent. Consider event volume implications, as backfilling in Amplitude can impact costs due to increased event volume.
- Relevance of Historical Data: Assess the need for extensive historical data. You might not need several years' worth; a few months might suffice.
- Data Structure Alignment: Ensure the historical data structure aligns with your current setup in Amplitude. If the structures don't match, especially if migrating from another tool, backfilling might not be practical.
- Consistency of Data Source: If you're continuously feeding data from the same source (like a warehouse) to Amplitude, backfilling is feasible. However, if the data source changes or the historical data cannot be accurately mapped to Amplitude's schema, reconsider backfilling.
- Focus on Key Data Elements: Often, backfilling user properties is sufficient instead of the entire event data. Consider what specific data elements are essential for your analysis (see the sketch after this list).
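To illustrate that last point, here's a minimal sketch of backfilling user properties rather than full event history, using Amplitude's Identify API. The endpoint and payload shape follow Amplitude's public Identify API docs as we understand them; verify against the current docs, and treat the API key handling, user ID, and property names as placeholders.

```typescript
// Backfill user properties (not full event history) via Amplitude's Identify API.
const AMPLITUDE_API_KEY = process.env.AMPLITUDE_API_KEY ?? '';

async function backfillUserProperties(
  userId: string,
  props: Record<string, unknown>,
): Promise<void> {
  const body = new URLSearchParams({
    api_key: AMPLITUDE_API_KEY,
    // identification accepts a JSON array of { user_id, user_properties } objects
    identification: JSON.stringify([{ user_id: userId, user_properties: props }]),
  });
  const res = await fetch('https://api2.amplitude.com/identify', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body,
  });
  if (!res.ok) throw new Error(`Identify call failed: ${res.status}`);
}

// Example: copy a plan tier computed in the warehouse onto the user profile.
backfillUserProperties('user-123', { plan: 'premium', signup_cohort: '2021-Q3' })
  .catch(console.error);
```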
Q: Curious about how Amplitude deals with user properties showing up only after an event. There's a disconnect between the first event in Amplitude and what users actually do in our app. We were thinking about backfilling data to fix this, but it's not going as planned. Any tips on how to tackle this?
TL;DR: Prioritize forward-looking user behavior tracking over dwelling on historical data; find a balance between past insights and present focus so you can address current challenges effectively.
Matthew’s tips:
- Avoid Getting Hung Up on the Past: While historical data can be valuable, keep your goal in mind and don't overly fixate on it. Dwelling on the past may impede the team's ability to address current challenges and opportunities.
Q: How can we handle data inaccuracies when events we send turn out to be wrong? Is there a way to edit or remove these data points?
TL;DR: Once data is sent to Amplitude, it's challenging to delete. Use features like transformation for duplicate data, drop filters to exclude incorrect data from analysis, block filters to prevent future ingestion of flawed events, and replay corrected data. Drop filters can also exclude historical data based on specific criteria.
- Note: Once data is sent to Amplitude, it can't be easily deleted or scrubbed.
- Transformation and Merging Features: Amplitude's transformation and merging features help handle duplicate events or properties.
- Block Filters: Block filters prevent future ingestion of events with incorrect values.
- Drop Filters: Drop filters exclude data from analysis, including historical data, based on criteria like timestamp or event properties.
- Replaying Corrected Data: Corrected data should be replayed to overwrite erroneous values (see the sketch after this list).
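As a concrete illustration of the replay step, here's a hedged TypeScript sketch that resends a corrected event through Amplitude's HTTP V2 API. The endpoint and body shape follow Amplitude's public HTTP V2 docs as we understand them; the event name, timestamp, properties, and `insert_id` scheme are illustrative. Note that Amplitude deduplicates on `insert_id`, so a replay needs a fresh one.

```typescript
// Replay a corrected event through Amplitude's HTTP V2 API.
async function replayCorrectedEvent(): Promise<void> {
  const res = await fetch('https://api2.amplitude.com/2/httpapi', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      api_key: process.env.AMPLITUDE_API_KEY,
      events: [
        {
          user_id: 'user-123',
          event_type: 'Order Completed',       // illustrative event name
          time: 1707350400000,                 // keep the original timestamp (ms)
          insert_id: 'order-987-v2',           // fresh insert_id so dedup doesn't drop the replay
          event_properties: { amount: 29.99 }, // the corrected value
        },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Replay failed: ${res.status}`);
}

replayCorrectedEvent().catch(console.error);
```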
Q: What is "identify" in Amplitude? Additionally, what are "auto-collected" user properties and event properties?
TL;DR: The "identify" event in Amplitude is not for analysis but to assign user properties or identities. It's different from events used for analysis. User attributes are for user profiles, while event properties are for specific actions. Amplitude's SDKs capture device and location info by default, but it's optional and can be configured.
- Identify Event in Amplitude:
  - An "identify" event in Amplitude is not an event used for analysis but a call made to Amplitude to assign properties or identities to a user.
  - Typically used to add user properties to a user's profile or to identify a user, including properties like user IDs.
  - It's initiated via network requests when sending data into Amplitude.
  - It's separate from the actual events used for analysis.
- User Attributes vs. Event Properties:
  - "Identify" calls are usually made when a session starts, whereas events are sent using a track method.
  - User attributes (or user properties) are associated with the user profile, like subscription plan, while event properties pertain to specific actions, like item purchased or amount spent.
  - Clarifying the difference between user attributes and event properties helps prevent confusion when working with data.
- Auto-Collected Properties:
  - Amplitude's SDKs automatically capture certain device and location information, such as IP address, device model, OS, and platform.
  - These properties are optional and can be enabled or disabled in the SDK configuration options (see the sketch after this list).
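A short sketch may make the distinction concrete. Using the `@amplitude/analytics-browser` SDK, an Identify call sets user-level attributes while `track()` records an analysis event with event-level properties, and `trackingOptions` is where auto-collected properties can be toggled. The option and method names follow the Browser SDK docs as we understand them; the API key, plan, and purchase values are placeholders.

```typescript
import * as amplitude from '@amplitude/analytics-browser';

// Auto-collected properties can be toggled via trackingOptions at init time.
amplitude.init('YOUR_API_KEY', {
  trackingOptions: { ipAddress: false, language: true, platform: true },
});

// An Identify call attaches user-level attributes to the profile...
const id = new amplitude.Identify();
id.set('plan', 'premium'); // user property: lives on the user profile
amplitude.identify(id);

// ...whereas track() records an analysis event with event-level properties.
amplitude.track('Item Purchased', { item: 'T-shirt', amount: 29.99 });
```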
Q: How do you duplicate data from a development project to a production project in Amplitude without having to export and import data each time?
TL;DR: Separate projects with distinct API keys for Dev and Prod ensure data integrity, while using a single tracking plan and copying a clean version between environments maintains a single source of truth.
Anuj’s tips:
- Maintain consistency: Having the same tracking plan across development (dev) and production environments is recommended.
- Utilize separate projects: Create separate projects for dev, analytics, and production to effectively manage data segregation.
- Change API keys: When transitioning between dev and production, ensure that API keys are changed to maintain security and integrity.
- Copy tracking plans: Utilize the option to copy tracking plans between projects to avoid manual event creation.
- Be mindful of API key management: Amplitude projects have no environment flag, so manage two separate API keys for your dev and production environments (a sketch follows below).
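For example, here's a minimal sketch of the two-key setup, selecting the project API key by environment at init time. The environment variable names are placeholders; wire them up however your build pipeline injects secrets.

```typescript
import * as amplitude from '@amplitude/analytics-browser';

// Amplitude has no environment flag on a project, so dev and prod are simply
// separate projects with separate API keys. Env var names are placeholders.
const API_KEYS = {
  development: process.env.AMPLITUDE_DEV_API_KEY ?? '',
  production: process.env.AMPLITUDE_PROD_API_KEY ?? '',
} as const;

const env = process.env.NODE_ENV === 'production' ? 'production' : 'development';
amplitude.init(API_KEYS[env]);
```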