The European Union’s rapidly approaching General Data Protection Regulation, known far and wide simply as “GDPR,” presents a terrifying “what-if” scenario for businesses. By the letter of the law, if you control personally identifiable information (PII) and can’t find it, you’re running a huge risk.
Can you really say today that you know the location of absolutely every piece of user data across your systems? Not just credit card numbers and national identification numbers, either: everything. If you can’t, you might have a problem. Fortunately, there’s a solution. But first, let’s examine the issue at hand.
Imagine this scenario: An individual in the EU contacts your company to invoke the Right to Be Forgotten, which forces “the data controller [to] erase his/her personal data, cease further dissemination of the data, and potentially have third parties halt processing of the data.” Simple, right? Find John in the system, hit delete, confirm the erasure and go on with your day.
But not so fast. Where is John’s data? Are there old customer records with his address? What has he shared with your team via email? Are there service desk records? What about duplicates created by misspellings or something as simple as not capitalizing his last name?
Right now, you probably can’t find it all. If you can’t find it, you can’t erase it. If you can’t erase it, you run the risk of being caught violating GDPR. And that can carry a fine of up to 4% of global annual revenue or €20 million (about $25 million), whichever is greater. Like we said: problem.
Intelligent Search & GDPR
The solution is at once logical and extremely complex. There are tons of enterprise search solutions on the market, most of which are currently making noise about how they can help you find stray PII and facilitate GDPR compliance. The thing is, only a few actually can.
Any search application can find John’s name when it’s mentioned in a document or an email. But when things get tough (complex, open-ended search strings, unstructured content, connections to both cloud-based and on-premises systems, etc.), they tend to fall flat. And things get tough fast: Businesses as small as 25 or so employees probably have terabytes of content (Word® documents, emails, Slack messages, PDFs, images, etc.) spread over multiple systems in multiple environments. It’s not a needle in a haystack; it’s a needle in dozens of constantly moving haystacks, some of which are in locked rooms. Can’t find the needle? Kiss your bank account goodbye.
The best insurance against GDPR non-compliance due to stray user data is not standard enterprise search—it’s intelligent search. The differences are three-fold:
- Enterprise search thrives on structured content like spreadsheets and can perform some basic text extraction from unstructured documents. Intelligent search utilizes advanced machine learning technology to reach deep into just about any content. It can pull live text from images, transcribe audio files and even identify the subjects of a video.
- Most enterprise search applications look at text as little more than a series of symbols—if you search for “John,” they return results that include “John.” Intelligent search can find exact matches, but also identify patterns (e.g., XXX-XX-XXXX for a Social Security number) and read like a human being (we call this “natural language processing”) to understand context.
- User engagement with traditional enterprise search is usually transactional: you look for something, it tries to find it. Intelligent search can run one-off queries, but also “wake up” at specified intervals to look for certain types of information and alert users with the results.
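To make the pattern-matching idea concrete, here is a minimal Python sketch that looks for identifier shapes (such as XXX-XX-XXXX) rather than a literal name. The pattern set and the `find_pii` helper are illustrative assumptions, not any product's implementation, and a real detector would add validation and context on top of this.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
# Shape matching alone produces false positives in real content.
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text):
    """Return (kind, matched_text, start_offset) tuples, ordered by offset."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


sample = "John Smith (john.smith@example.com) gave his SSN as 123-45-6789."
for kind, value, offset in find_pii(sample):
    print(kind, value, offset)
```

A search for “John” alone would miss the Social Security number in this sample entirely; matching on the shape of the data is what surfaces it.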
GDPR Compliance in the Real World
What does this look like in practice? Here’s how Docxonomy handles PII identification for GDPR compliance:
- Users select the types of data they want to find, the systems they want to query, and what they want to happen if escalation is required (e.g., a credit card number is identified in a salesperson’s email archive).
- Docxonomy runs an initial search to identify the exact location of all PII in the organization and alerts the data protection officer or other stakeholders so they can take action.
- Docxonomy repeats the process daily (or however often the user specifies) to ensure no stray data ends up where it shouldn’t be. If anything is found, it sends escalation messages so knowledge workers can take action.
Sound easy? It is, for the user. Behind the scenes, the intelligent search platform brings to bear a stunning amount of technology and know-how, from machine learning, natural language processing and optical character recognition to advanced indexing and classification techniques. The result, however, is quite simple: Companies know exactly where PII is located in their systems, and knowledge workers are empowered to take whatever action they deem necessary.
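The three-step workflow above can be sketched in a few lines of Python. Everything here is hypothetical and chosen for illustration: the `Finding` type, the `DETECTORS` table, and the `scan` and `run_once` functions are not Docxonomy's API, and the in-memory dictionaries stand in for real connectors to mail archives or file shares.

```python
import re
from dataclasses import dataclass


# Hypothetical types and names; this is not Docxonomy's API.
@dataclass
class Finding:
    system: str
    location: str
    kind: str


# Step 1: the data types to look for (one illustrative pattern each).
DETECTORS = {
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan(systems, kinds):
    """Step 2: check every document in every selected system."""
    findings = []
    for system, documents in systems.items():
        for location, text in documents.items():
            for kind in kinds:
                if DETECTORS[kind].search(text):
                    findings.append(Finding(system, location, kind))
    return findings


def run_once(systems, kinds, escalate):
    """Steps 2 and 3: scan, then pass each finding to the escalation hook."""
    findings = scan(systems, kinds)
    for finding in findings:
        escalate(finding)
    return findings


# In-memory stand-ins for real connectors (mail archive, file share).
systems = {
    "email": {"sales/inbox/1042": "Card on file: 4111 1111 1111 1111"},
    "fileshare": {"hr/notes.txt": "Nothing sensitive here."},
}
alerts = []
run_once(systems, ["credit_card", "us_ssn"], alerts.append)
print(alerts)
```

The daily repeat is left out on purpose: in this sketch, an external scheduler such as cron would simply call `run_once` at whatever interval the user specifies, and the `escalate` hook would send a message instead of appending to a list.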
GDPR Compliance: Top Three Takeaways
Above all else, remember that:
- GDPR presents a clear and present danger for any firm that does business in the EU, no matter how tangentially and no matter where it’s based.
- One of the most pressing risks for GDPR non-compliance is misplaced or misfiled PII, which can leave a company unable to comply with several of the regulation’s provisions without even realizing it.
- Intelligent search technology—not traditional enterprise search—can root out stray PII, inform knowledge workers of its presence and location and allow them to take pre-emptive action.