Best Practices Related to Personally Identifiable Information
In preparation for the General Data Protection Regulation (GDPR), which took effect in May 2018, we brushed up on best practices for handling Personally Identifiable Information (PII). Keeping PII from leaking is a vast topic that can be discussed from many angles, but in this article we focus specifically on its relationship to Google Analytics and on how Google Analytics can assist with PII discovery and suppression. Remember, while Google Analytics can be a helpful tool, you should always aim to address problems in your code so that PII never leaks out in the first place.
What is considered PII?
Any data that can be used to identify an individual is considered PII, including:
- Obvious things like names, phone numbers, addresses, social security numbers, and email addresses
- GPS coordinates that can pinpoint an area more specific than one square mile
- User IDs and any other IDs that can be traced to an individual
How can I use Google Analytics to detect and handle PII in real time?
There is an excellent guide from Brian Clifton that walks through the process of using Google Analytics to detect PII.
Can I hash/encrypt my PII instead of redacting it?
Google allows you to hash PII and send it, so long as the data is hashed with SHA-256 at a minimum and salted with a salt of at least 8 characters. Regardless of how you hash or salt the data, you may not send Google Analytics encrypted Protected Health Information (as defined under HIPAA).
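As a rough illustration of the hashing requirement above, here is a minimal Python sketch. The function name hash_identifier, the lowercase/trim normalization, and the choice to prepend the salt to the value are all assumptions made for this example, not anything Google prescribes. Only the SHA-256 minimum and the 8-character salt minimum come from the policy described above.

```python
import hashlib

# Minimum salt length taken from the requirement described above.
MIN_SALT_LENGTH = 8


def hash_identifier(value: str, salt: str) -> str:
    """Return a salted SHA-256 hex digest of an identifier.

    Hypothetical helper: the normalization and the salt-prefix scheme
    are illustrative assumptions, not a documented Google requirement.
    """
    if len(salt) < MIN_SALT_LENGTH:
        raise ValueError(f"salt must be at least {MIN_SALT_LENGTH} characters")
    # Normalize so the same identifier always produces the same hash.
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
```

Keep the salt secret and out of client-side code. If the salt ships to the browser, anyone can recompute the hashes for a list of known email addresses.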
If you were to completely remove every trace of the PII from the query string, it would be a lot tougher to search Google Analytics for potential leaks. However, if you redact PII and leave behind markers like [REDACTED_PHONE_NUMBER] in your Analytics data, you can periodically search that data for these markers, pinpoint the scripts that are leaking PII, and take steps to prevent the data from being exposed in the first place.
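To make the redaction idea concrete, here is a small Python sketch that replaces PII in query string values with markers before a URL is recorded. Treat it as a starting point under stated assumptions: the function names are hypothetical, the [REDACTED_EMAIL] marker is invented alongside the [REDACTED_PHONE_NUMBER] marker mentioned above, and the patterns are deliberately simple (the phone pattern only covers common US-style formats).

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Deliberately simple patterns; real traffic will need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def redact_text(text: str) -> str:
    """Replace email addresses and phone numbers with searchable markers."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE_NUMBER]", text)


def redact_url(url: str) -> str:
    """Redact PII found in the query string values of a URL."""
    parts = urlsplit(url)
    # parse_qsl percent-decodes values, so encoded PII is caught too.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    cleaned = [(key, redact_text(value)) for key, value in pairs]
    # Keep the brackets unescaped so the markers stay easy to search for.
    return urlunsplit(parts._replace(query=urlencode(cleaned, safe="[]")))
```

For example, redact_url("https://example.com/thanks?email=jane%40example.com&phone=555-123-4567&page=2") keeps the page parameter intact and swaps the other two values for markers.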
How can I search my Google Analytics history for leaked PII or redacted PII markers?
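One possible approach, sketched here in Python, is to export your page paths and scan them offline for redaction markers. Everything about the input is an assumption: the CSV layout, the page_path column name, and the function name find_marker_pages are hypothetical, so adapt them to whatever export you actually have. The marker pattern assumes your markers follow the [REDACTED_...] convention described above.

```python
import csv
import io
import re
from collections import Counter
from urllib.parse import unquote

# Matches markers such as [REDACTED_PHONE_NUMBER].
MARKER_RE = re.compile(r"\[REDACTED_[A-Z_]+\]")


def find_marker_pages(csv_text: str, path_column: str = "page_path") -> Counter:
    """Count redaction markers per page path in an exported CSV.

    Assumes a header row containing `path_column` (a hypothetical name).
    """
    hits: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Decode percent-escapes in case brackets were exported as %5B / %5D.
        value = unquote(row.get(path_column) or "")
        path_only = value.split("?", 1)[0]
        for marker in MARKER_RE.findall(value):
            hits[(path_only, marker)] += 1
    return hits
```

The pages with the most hits are the first places to look for scripts that put PII into URLs.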
What happens if PII does make it to Google Analytics?
If Google discovers any PII in your analytics data, it may delete your entire data set. However, word through the grapevine (blog posts) is that if you discover the PII first, carefully quarantine it, and approach Google proactively to ask whether the quarantined data can simply be trimmed away, Google may oblige.
It is important to prevent PII from leaking to Google Analytics, but it is even more important to prevent PII from leaking at all. Google Analytics is just one of many tools (e.g. server logs, mail filters, data migration tools) that you can use to detect PII leaks, which you should then fix in your source code.
I hope this information helps you in your journey. As web developers, we shoulder a huge amount of trust and responsibility for handling PII. Let’s stay vigilant and keep it safe and secure.