Three Strategies for Big Data Security

12 June 2018




The 2018 Thales Data Threat Report (DTR) has great information on Big Data use and security. We surveyed more than 1,200 senior security executives from around the world, and virtually all (99%) report they plan to use Big Data this year.

Top Big Data Security Concerns

But they rightly have concerns. As the report notes:

The top Big Data security issue is that sensitive data can be anywhere – and therefore everywhere – a concern expressed by 34% of global and U.S. respondents. A related concern is that Big Data-generated reports could contain sensitive data (33% global and U.S.), while concerns over privacy regulations at 30% global round out the list of top issues.

In my experience, these are the right things to be worried about. Big Data is unstructured. Unlike a traditional database, it has no schema, so any kind of sensitive data can go into the data lake and then show up in a report. The data could be regulated data, payment card information, personally identifiable information, or sensitive for any number of other reasons. The Big Data platform doesn’t care what kind of data you put into it. And any global company likely holds sensitive data about citizens of the EU as well as many other countries that have data privacy laws, which puts these enterprises at risk of non-compliance.

Top Responses to Concerns

The DTR also gives us some insight into what these security executives are doing to secure their Big Data.

  1. Stronger authentication and access controls (38% global, 37% in the U.S.)
  2. Improved monitoring and reporting tools (36% global, 34% U.S.)
  3. Encryption and access controls for underlying platforms (35% global and U.S.)

These are great tactics, but I’d like to discuss the three most common overarching strategies I see in Big Data security today.

Strategies for Securing Big Data

There are a number of different paths to Big Data security.

1. Protect at the System Level

The easiest path might be to encrypt the entire data lake and harden the platform itself with features such as authentication, user identity management, access monitoring, and security information and event management (SIEM) systems. The point of this hardening is to carefully control ingress and egress, and it would include all three of the responses listed above. Thales eSecurity’s Vormetric Transparent Encryption for data at rest is ideal for this approach, and with it, we would expect no noticeable degradation in system response.

2. Selectively Protect Data on Entry

However, the best practice is to secure data before it goes into the lake or Big Data platform, so those doing the data analytics don’t see the sensitive data. This approach employs techniques such as tokenization and application encryption to selectively protect sensitive information (first names, last names, addresses, payment card information, etc.) coming into your platform. When you use this approach, though, you need to make sure your data scientists and data analysts understand they are working with protected information. They need to know they won’t, for example, see an actual Social Security number but a surrogate that looks and acts like one instead. They then use the token values to run the analytics that generate business insights, such as identifying target customer segments or market trends. The point is to find ways not to use the sensitive data unless the organization needs it for specific cases, such as an e-mail campaign.

This approach takes more time, money, and effort. It may also have an impact on response times, but this depends on the data flow and data access requirements for your system.
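To make the idea of protecting data on entry more concrete, here is a minimal Python sketch of tokenizing numeric identifiers before a record reaches the data lake. It is an illustration under stated assumptions, not how any particular product works: the key, the field names `ssn` and `card_number`, and the `TokenVault` class are all hypothetical, and the keyed-hash surrogate used here stands in for a vetted tokenization or format-preserving encryption service.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would fetch keys from a key manager.
SECRET_KEY = b"demo-key-not-for-production"

# Hypothetical names of the fields treated as sensitive in this sketch.
SENSITIVE_FIELDS = ("ssn", "card_number")


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace each digit with a keyed pseudo-random digit.

    Length and punctuation are preserved, so the surrogate still looks
    like the original format. The same input and key always give the
    same token, which keeps joins and group-bys working on tokens.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    surrogate = []
    used = 0
    for ch in value:
        if ch.isdigit():
            surrogate.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            surrogate.append(ch)  # keep dashes, spaces, and other characters
    return "".join(surrogate)


class TokenVault:
    """In-memory stand-in for a secured token vault (assumption for the demo)."""

    def __init__(self, key: bytes = SECRET_KEY) -> None:
        self._key = key
        self._originals = {}

    def protect(self, record: dict) -> dict:
        """Return a copy of the record with sensitive fields tokenized."""
        protected = dict(record)
        for field in SENSITIVE_FIELDS:
            if field in record:
                token = tokenize(str(record[field]), self._key)
                # A real vault would also detect and resolve token collisions.
                self._originals[token] = record[field]
                protected[field] = token
        return protected

    def detokenize(self, token: str) -> str:
        """Recover the original value for an approved use case."""
        return self._originals[token]


if __name__ == "__main__":
    vault = TokenVault()
    incoming = {"customer": "C-1001", "ssn": "123-45-6789", "segment": "gold"}
    print(vault.protect(incoming))  # only this protected copy goes to the lake
```

Analysts would work only with the output of `protect`, while `detokenize` would sit behind separate access controls for the specific cases where the real value is needed.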

3. Do Nothing

While neither of these two approaches to securing Big Data is perfect, both are better than the third approach, which is to do nothing. Unfortunately, too often I see organizations sticking their heads in the sand and doing nothing until they absolutely have to. By then they have most likely been damaged by loss of capital (fines), reputation, market share, market value and more.

The DTR tells us:

… More than two-thirds (67%) of global organizations and nearly three fourths (71%) in the U.S. have been breached at some point in the past. Further, nearly half (46%) of U.S. respondents reported a breach just in the previous 12 months, nearly double the 24% response from last year, while over one-third (36%) of global respondents suffered a similar fate. In addition to the massive Equifax breach that exposed personal information of 143 million individuals, other noted breaches last year included the education platform Edmodo (77 million records hacked); Verizon (14 million subscribers possibly hacked); and America’s JobLink (nearly 5 million records compromised).

The consensus now is that the question is not if you will be breached, but when. In the face of statistics like these, doing nothing should not be an option.

Thales will be joining the co-located AI, IoT and Blockchain Expo in Europe (27-28th June) this year, where they will be speaking and exhibiting. Find them at booth 183, and listen to Kelvin Cusack, Senior Sales Engineer, in the IoT Privacy & Security conference track on Day 2 (28th June) at 14.10.
