Building a modern data platform – what have we learned?

As I reach the end of this series, it raises the question “what have we learned?”. If you’ve read through it all, you’ve learned you are patient and I’ve learned that writing a series of posts actually takes quite a bit of time. But I digress!

Let’s start at the beginning – what is a modern data platform?

I’ve used the term throughout, but what does it mean? In the introductory post I stated “In today’s modern world however, storing our data is no longer enough, we need to consider much more”. That’s true: organisations now want their data to provide competitive edge and insights, so we also need to ensure we are “developing an appropriate data strategy and building a data platform that is fit for today’s business needs”. In essence, those two areas neatly define a modern data platform. Storing data is no longer enough; our platform needs to fit today’s rapidly changing demands, integrate with new technologies and give us the scale and flexibility we need to turn our data into an asset, all while ensuring our data maintains its privacy and security and we maintain governance and control.

It’s not storage

While storage plays an important part in any data strategy (our data has to live somewhere), it’s important to realise that when we talk about a data platform, it’s not about storage. The right storage partner plays a crucial part, but the choice isn’t driven by media types, IOPS or the colour of the bezel; it’s about a wider strategy, ensuring our technology choices give us the scale, flexibility and security a modern platform demands.

Break down walls

We have also learned that data cannot be stored in silos, be that an on-prem storage repository or its modern equivalent, the “cloud silo”. Placing our data somewhere without considering how we move it, so we can do what we need to with it quickly and easily, is not designing a modern data platform.

Data Insight is crucial

Where our data is held, and on what, while important, pales when compared to the importance of insight into how our data is used. Our modern data platform must provide visibility into the who, where, when, what and why of data usage: who’s accessing it, where is it, when (if ever) is it accessed, what are they accessing and why. Knowing this is critical for a modern data platform. It allows us to build retention, security and compliance policies, to start to build effective data leak protections, and to be more efficient with our storage, controlling the costs and challenges that come with our ever-increasing reliance on data.

Without this insight you don’t have a modern data platform.

Data is everywhere

We have also learned that our data is everywhere, it no longer resides in the protected walls of our data centers, it’s living on a range of devices both sat inside and outside of those walls. That’s not just the data we have, it’s also the increasing range of devices creating data for us, our platform needs to be able to ingest, process and control all of it. Protecting data on the very edges of our network to the same degree that we protect, secure and govern that which sits inside our data centers is crucial.

Cloud, cloud and more cloud

Just a few years ago the prognosis for the data industry was that cloud was going to swallow it all, and that those who clung to “traditional” thinking around data would be swept away by the cloud-driven tide.

Now, while cloud is unlikely to wipe out all data life as we know it, it should certainly play a part in your data strategy. It has many of the attributes that make it an ideal repository; its flexibility, scale and even its commercial models make it an attractive proposition.

But it has limits. Ensuring our data platform can integrate cloud where appropriate, while maintaining all of the enterprise control we need, is a core part of a modern platform; you can’t design a modern platform without considering cloud.

It’s a platform

The reason I used the word platform is because that is what it is. It’s not one component; it is built up of multiple components, as I’ve shown here: storage, data management, governance and control, be it in the datacentre, on the edges of your network or utilising the cloud.

The days of our data just being about one element are gone; we need a strategy that looks at how we use data in its entirety.

Building a modern data platform

The point of this series has been to provide some practical examples of the tools and technologies I’ve used building modern data platforms. Not every platform uses all of these technologies all of the time, and you don’t have to use these specific ones to build your platform. What is more important is the concept of a data platform, and hopefully this series has introduced you to some areas you may not have considered previously and will help you design a platform to get the very best from your data assets.

If you have any questions, please leave a comment on the site, or contact me on Twitter @techstringy or LinkedIn.

If you’ve missed any of the series head back to the introduction where you’ll find links to all of the parts of the series.

Thanks for reading.


Building a modern data platform – exploiting the cloud

No modern data platform would be complete if we didn’t talk about the use of public cloud. Public cloud can play a very important part in building a modern data platform and provide us with capabilities we couldn’t get any other way.

In this part of our series we look at the benefits of public cloud, the challenges of adoption and how to overcome them and ensure we can embrace cloud as part of our platform.

Why is public cloud useful for our data?

If we look at the challenges normally associated with traditional approaches to data storage (scale, flexibility, data movement, commercials), it quickly becomes clear how cloud can be valuable.

While these challenges are common in traditional approaches, these are the areas where public cloud is strongest. It gives us almost infinite scale, a consumption model where we pay for what we need as we need it and, of course, flexibility: the ability to take our data and do interesting things with it once it’s within the public cloud. From analytics and AI to the more mundane backup and DR, flexibility is one of the most compelling reasons for considering public cloud at all.

While the benefits are clear, why are more organisations not falling over themselves to move to cloud?

What’s it lacking?

It’s not what public cloud can do, but what it doesn’t, that tends to stop organisations wholeheartedly embracing it when it comes to data assets.

As we’ve worked through the different areas of building a modern data platform, we’ve seen that our approach to data is about more than storage; it’s insight, protection, availability, security and privacy, and these are things not normally associated with native cloud storage. We don’t want our move to cloud to mean we lose all of those capabilities, or have to implement and learn a new set of tools to deliver them.

Of course there is also the “data gravity” problem. We can’t have our cloud-based data siloed away from the rest of our platform; it has to be part of it. We need to be able to move data into the cloud, out again and between cloud providers, all while retaining enterprise control and management.

So how do we overcome these challenges?

How to make the cloud feel like the enterprise?

When it comes to modern data platforms, NetApp have developed into an ideal partner for integrating public cloud storage. In part one of this series (Building a modern data platform – the storage) we discussed NetApp’s data services, which are built into their ONTAP operating system, making it the cornerstone of their data fabric strategy. What makes ONTAP that cornerstone is that, as a piece of software, it can be installed anywhere, which today also means public cloud.

Taking ONTAP and its data services into the cloud provides us with massive advantages. It allows us to deliver enterprise storage efficiencies and performance guarantees, and to use the enterprise tools we have made a key part of our platform with our cloud-based data as well.

NetApp has two ways to deploy ONTAP into public cloud. The first is Cloud Volumes ONTAP, a full ONTAP deployment on top of native cloud storage, providing all of the same enterprise data services we have on-prem, extending them into the cloud and seamlessly integrating them with our on-prem data stores.

An alternative, and even more straightforward, approach is having ONTAP delivered as a native service, with no ONTAP deployment or experience necessary. You order your service, enter a size and performance characteristics and away you go, with no concern at all for the underlying infrastructure, how it works or how it’s managed. You are provided with enterprise-class storage with data protection, storage efficiencies and performance service levels previously unheard of in native cloud storage, in seconds.

It’s not a strategy without integration

While adding enterprise capabilities is great, the idea of a modern data platform relies on having our data in the location we need it, when we need it, while maintaining management and control. This is where the use of NetApp’s technology provides real advantage. The use of ONTAP as a consistent endpoint provides the platform for integration, allowing us to use the same tools, policies and procedures at the core of our data platform and extend them to our data in the public cloud.

NetApp’s SnapMirror provides us with a data movement engine, so we can simply move data into, out of and between clouds. Replicating data in this way means that while our on-prem version can be the authoritative copy, it doesn’t have to be the only one. Replicating a copy of our data to a location for a one-off task, which once completed can then be destroyed, is a powerful capability and an important element of simplifying the extension of our platform into the cloud.
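As a rough sketch of that pattern (replicate, run a one-off task against the copy, destroy the replica), the snippet below stands in for the replication engine with a plain directory copy; in practice SnapMirror would handle the data movement, and the function name here is purely illustrative:

```python
import shutil
import tempfile
from pathlib import Path

def with_replica(source_dir: str, task):
    """Replicate a dataset to a disposable location, run a one-off task
    against the copy, then destroy the replica. The source remains the
    authoritative copy throughout and is never touched by the task."""
    replica = Path(tempfile.mkdtemp(prefix="replica-"))
    try:
        # Stand-in for the replication engine: copy the source dataset.
        shutil.copytree(source_dir, replica / "data")
        return task(replica / "data")
    finally:
        # The replica served its purpose; remove it entirely.
        shutil.rmtree(replica)
```

The task only ever sees the copy, so an analytics run or test restore can be as destructive as it likes without risking the authoritative data.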

Summary

Throughout this series we have asked the question “do we have to use technology X to deliver this service?”. The reality is, of course, no, but NetApp are a key element of my modern data platforms because of this cloud integration capability. The option to provide consistent data services across multiple locations is extremely powerful, allowing us to take advantage of cloud while maintaining our enterprise controls.

While I’ve not seen any other data services provider come close to what NetApp are doing in this space, the important thing in your design strategy, if it is to include public cloud, is to ensure you have appropriate access to data services, integration, management and control. It’s crucial that you don’t put data at risk, or diminish the capabilities of your data platform, by using cloud.

This is part 6 in a series of posts on building a modern data platform, you can find the introduction and other parts of this series here.

Building a modern data platform – Out on the edge

In this series so far we have concentrated on the data under our control, held in our datacentres and managed clouds and protected by enterprise data protection tools.

However, the reality of a modern data platform is that not all of our data lives in those safe and secure locations. Today most organisations expect mobility; we want access to our key applications and data on any device and from any location.

This “edge data” presents a substantial challenge when building a modern data platform. Not only is the mobility of data a security problem, it’s a significant management and compliance headache.

How do we go about managing this problem?

The aim of this series is to give examples of tools that I’ve used to solve modern data platform challenges; however, with edge data it’s not that simple. It’s not only the type and location of data, but also the almost infinite range of devices that hold it.

Therefore, rather than present a single solution, we are going to look at some of the basics of edge data management and some tools you may wish to consider.

Manage the device

The fundamental building block of edge data protection is maintaining control of our mobile devices. They are repositories for our data assets and should be treated as any other in our organisation.

When we say control, what do we mean? In this case control comes from strong endpoint security.

Strong security is essential for our mobile devices; their very nature means they carry a significant risk of loss, and therefore data breach, so it’s critical we get the security baseline right.

To do this, mobile device management tools like Microsoft Intune can help us build secure baseline policies, which may, for example, demand secure logon, provide application isolation and, in the event of device loss, ensure we can secure the data on that device to help minimise the threat of data leak and compliance breach.

Protecting the data

As critical as managing and securing our mobile data repositories is, protecting the data on them is just as crucial. We can take three general approaches to controlling our edge data:

  • No data on the device
  • All data synchronised to a secure location
  • Enforce edge data protection

Which approach you use depends on both the type of data and the working practices of your organisation.

For example, if your mobile users only access data over good remote links (a home office, for example), then keeping data solely within our controlled central repositories, and never on the device, is fine.

That, however, is not always practical, so a hybrid approach that allows us to cache local copies of that data on our devices may be more appropriate; think OneDrive for Business, Dropbox, or build-your-own sync tools such as CentreStack.

These tools allow users access to a cached local copy of the data housed in our central data stores regardless of connectivity, with managed synchronisation back to these stores when possible.

This provides up-to-date copies of data for users’ convenience, while we maintain a central data repository, ensuring the authoritative copy resides under our control.
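The caching pattern these tools implement can be sketched minimally as below; the class and method names are illustrative, not any vendor’s API:

```python
class CachedStore:
    """Minimal sketch of the hybrid approach: users read a locally cached
    copy of centrally held data, and local edits are synchronised back to
    the central (authoritative) store whenever connectivity allows."""

    def __init__(self, central: dict):
        self.central = central      # authoritative copy (central store)
        self.cache = dict(central)  # local cached copy on the device
        self.pending = {}           # edits made while disconnected

    def read(self, name):
        # Reads are served from the cache, so they work offline.
        return self.cache[name]

    def write(self, name, content):
        # Writes land in the cache and are queued for synchronisation.
        self.cache[name] = content
        self.pending[name] = content

    def sync(self):
        # When online, push pending edits back to the central store,
        # then refresh the cache from the authoritative copy.
        self.central.update(self.pending)
        self.pending.clear()
        self.cache = dict(self.central)
```

The key property is that offline edits never make the device the authoritative copy; they are only provisional until sync() lands them centrally.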

Enforce Data Protection

However, this hybrid approach relies on users placing data in the correct folder locations; if they don’t, this presents a data security and compliance risk.

To overcome this, we can protect all of the data on these devices by extending our enterprise data protection solution. For example, we can use Veeam Agents to protect our Windows workloads, or a specialised edge data tool such as Druva inSync, which can help us protect edge data on a range of devices and operating systems.

This goes beyond synchronisation of a set of predefined folders and allows us to protect as much of the data and configuration of our mobile devices as we need to.

Understand the edge

While the device and the data must be robustly protected, our modern platform also demands insight into our data: where it is, how it is used and, importantly, how to find it when needed.

This is a real challenge with edge data. How do we know whose mobile device has certain data types on it? If we lose a device, can we identify what was on it? The ability to find and identify data across our organisation, including that on the edge, is essential to the requirements of our modern data platform.

Ensuring we have a copy of that data, held securely, indexed and searchable, should be a priority.

Druva inSync, for example, allows you to run compliance searches across all of your protected mobile devices, so you can find the content on a device even if that device is lost.
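Conceptually, such a search works against a central index built from device backups, not against the devices themselves, which is why it still works when a device is lost. A toy sketch (data model and function names entirely illustrative):

```python
def build_index(device_backups):
    """Invert per-device backups into a searchable index mapping each
    term to the (device, file) pairs in which it appears."""
    index = {}
    for device, files in device_backups.items():
        for path, text in files.items():
            for term in set(text.lower().split()):
                index.setdefault(term, []).append((device, path))
    return index

def compliance_search(index, term):
    """List every (device, file) holding a term, whether or not the
    device itself is still reachable."""
    return index.get(term.lower(), [])
```

A real product indexes far richer metadata, but the principle is the same: the answer to “what was on that device?” lives in the central copy.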

Centralising content via enterprise backup or synchronisation tools also provides this capability. How you do it will depend on your own platform and working practices; doing it, however, should be seen as a crucial element of your modern data platform.

In Summary

Keeping our data controlled, even when it spends much of its time on the very edges of our networks, is crucial to our modern data strategy. When it is, we can be sure all of our business security and compliance rules are applied to it, and we can ensure it’s protected, recoverable and always available.

Managing the data on the edges of our network is a difficult challenge, but by ensuring we have strong management of devices, robust data protection and insight into that data, we can ensure edge data is as core a part of our data platform as that in our datacentre.

This is part 5 in a series of posts on building a modern data platform, the previous parts of the series can be found below.

Introduction
The Storage
Availability
Control
Prevention (Office365)

Building a modern data platform – Prevention (Office365)

In this series so far, we have looked at getting our initial foundations right, ensuring we have insight into and control of our data, and at the components I use to help achieve this. This time, however, we are looking at something many organisations are already using, which has a wide range of capabilities to help manage and control data but which is often underutilised.

For ever-increasing numbers of us, Office365 has become the primary data and communications repository. However, I often find organisations are unaware of many powerful capabilities within their subscription which can greatly reduce the risk of data breach.

Tucked away within Office365 is the Security and Compliance section (protection.office.com), the gateway to several powerful features that should be part of your modern data strategy.

In this article we are going to focus on two such features, “Data Loss Prevention” and “Data Governance”. Both offer powerful capabilities that can be deployed quickly across your organisation and can significantly mitigate the risk of data breach.

Data Loss Prevention (DLP)

DLP is an important weapon in our data management arsenal. DLP policies are designed to ensure sensitive information does not leave our organisation in ways that it shouldn’t, and Office365 makes it straightforward for us to get started.

We can quickly create policies to apply across our organisation that help identify the types of data we hold. Several predefined options already exist, including ones that identify financial data, personally identifiable information (PII), social security numbers, health records, passport numbers and so on, with templates for a number of countries and regions across the world.

Once the policies that identify our data types are created, we can apply rules governing how that data can be used. We can apply several rules and, depending on requirements, make them increasingly stringent.

The importance of DLP rules should not be underestimated. While it’s important we understand who has access to and uses our data, too often we feel this is enough and don’t take the next crucial step of controlling the use and movement of that data.

We shouldn’t forget that those with the right access to the right data, may accidentally or maliciously do the wrong thing with it!

Data Governance

Governance should be a cornerstone of a modern data platform. It defines the way we use, manage, secure, classify and retain our data, and can impact the cost of our data storage, its security and our ability to deliver compliance to our organisations.

Office365 provides two key governance capabilities.

Labels

Labels allow us to apply classifications to our data so we can start to understand what is important and what isn’t. We can highlight what is for public consumption and what is private, sensitive, commercial in confidence, or any of the other classifications you use within your organisation.

Classification is a crucial part of delivering a successful data compliance capability, giving us granular control over exactly how we handle data of all types.

Labels can be applied automatically based on the contents of the data we have stored, applied by users as they create content, or applied in conjunction with the DLP rules we discussed earlier.

For example, a DLP policy can identify a document containing credit card details, then automatically apply a rule that labels it as sensitive information.
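To make the mechanics concrete, here is a toy sketch of that kind of rule (illustrative only, not Microsoft’s implementation): a pattern match finds candidate card numbers, a Luhn checksum filters out arbitrary digit strings, and a matching document receives the sensitive label:

```python
import re

def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to distinguish plausible card numbers
    from arbitrary runs of digits."""
    digits = [int(d) for d in number][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def label_document(text: str) -> str:
    """Label a document 'sensitive' if it appears to contain a
    credit card number, otherwise 'general'."""
    for match in re.findall(r"\b(?:\d[ -]?){13,19}\b", text):
        digits = re.sub(r"[ -]", "", match)
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            return "sensitive"
    return "general"
```

The two-stage approach (pattern, then checksum) is what keeps false positives down, which matters when the label automatically triggers downstream restrictions.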

Retention

Once we have classified our data into what is important and what isn’t, we can then, with retention policies, define what we keep and for how long.

These policies allow us to effectively manage and govern our information, reducing the risk of litigation or security breach by either retaining data for a period defined by a regulatory requirement or, importantly, permanently deleting old content that we’re no longer required to keep.

The policies can be assigned automatically based on classifications or can be applied manually by a user as they generate new data.

For example, a user creates a new document containing financial data which must be retained for 7 years; that user can classify the data accordingly, ensuring that both our DLP and retention rules are applied as needed.
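The retention decision itself is simple to sketch; the policy table and labels below are hypothetical (the 7-year figure comes from the example above), not the Office365 retention engine:

```python
from datetime import date, timedelta

# Hypothetical retention policies keyed by classification label:
# (retention period, action once the period has expired).
POLICIES = {
    "financial": (timedelta(days=365 * 7), "retain"),  # keep for 7 years
    "general":   (timedelta(days=365 * 2), "delete"),  # delete after 2 years
}

def retention_action(label: str, created: date, today: date) -> str:
    """Decide what to do with a document given its label and age:
    'hold' while inside the retention window, then the policy's action."""
    period, action = POLICIES[label]
    if today - created < period:
        return "hold"
    return action
```

Driving this from the classification label is what lets one user action (labelling the document) enforce both the DLP and retention behaviour automatically.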

Management

Alongside these capabilities Office365 provides us with two management tools, disposition and supervision.

Disposition is our holding pen for data to be deleted, so we can review any deletions before actioning them.

Supervision is a powerful capability allowing us to capture employee communications for examination by internal or external reviewers.

These tools are important in allowing us to show we have auditable processes and control within our platform and are taking the steps necessary to protect our data assets as we should.

Summary

The ability to govern and control our data wherever we hold it is a critical part of a modern data platform. If you use Office365 and are not using these capabilities then you are missing out.

The importance of governance is only going to grow as ever more stringent data privacy and security regulations develop. Governance can allow us to greatly reduce many of the risks associated with data breach, and services such as Office365 have taken things that were traditionally difficult to achieve and made them a whole lot easier.

If you are building a modern data platform then compliance and governance should be at the heart of your strategy.

This is part 4 in a series of posts on building a modern data platform, the previous parts of the series can be found below.

Introduction
The Storage
Availability
Control

Building a modern data platform – Control

In the first parts of this series we have looked at ensuring the building blocks of our platform are right so that our data is sitting on strong foundations.

In this part we look at bringing management, security and compliance to our data platform.

As our data, the demands we place on it, and the amount of regulation controlling it continue to grow, gaining deep insight into how it is used can no longer be a “nice to have”; it has to be an integral part of our strategy.

If you look at the way we have traditionally managed data growth, you can see the basics of the problem: we have added file servers, storage arrays and cloud repositories as demanded, because adding more has been easier than managing the problem.

However, this is no longer the case. As we see our data as more of an asset, we need to make sure it is in good shape; holding poor-quality data is not in our interest, the cost of storing it is no longer going unnoticed, and we can no longer go to the business every 12 months needing more. And while I have no intention of making this a piece about the EU General Data Protection Regulation (GDPR), it, and regulation like it, is forcing us to rethink how we view the management of our data.

So what do I use in my data platforms to manage and control data better?

Varonis


I came across Varonis and their data management suite about 4 years ago, and it was the catalyst for a fundamental shift in the way I think and talk about data. It opened up brand-new insights into how unstructured data in a business was being used and highlighted the flaws in the way people were traditionally managing it.

With that in mind, how do I start to build management into my data platform?

It starts by finding answers to two questions:

Who, Where and When?

Without understanding this point it will be impossible to properly build management into our platform.

If we don’t know who is accessing data how can we be sure only the right people have access to our assets?

If we don’t know where the data is, how are we supposed to control its growth, secure it and govern access?

And, of course, when is the data accessed? Or even, is it accessed? Let’s face it, if no one is accessing our data, why are we holding it at all?

What’s in it?

However, there are lots of tools that tell me the who, where and when of data access; that’s not really the reason I include Varonis in my platform designs.

While who, where and when are important, they do not include a crucial component: the what. What type of information is stored in my data?

If I’m building management policies and procedures, I can’t do that without knowing what is contained in my data. Is it sensitive information such as finances, intellectual property or customer details? And as we look at regulation such as GDPR, knowing where we hold private and sensitive data about individuals is increasingly crucial.

Without this knowledge we cannot ensure our data and business compliance strategies are fit for purpose.

Building Intelligence into our system

In my opinion, one of the most crucial parts of a modern data platform is the inclusion of behavioural analytics. As our platforms grow ever more diverse, complex and large, one of the common refrains I hear is “this information is great, but who is going to look at it, let alone action it?”. This is a very fair point, and a real problem.

Behavioural analytics tools can help address this and supplement our IT teams. These technologies are capable of understanding and learning the normal behaviour of our data platform and, when those norms are deviated from, can warn us quickly and allow us to address the issue.
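The core idea (learn a baseline, flag deviations) can be sketched with something as simple as a z-score test on daily file-access counts; this is illustrative only, real products such as Varonis use far richer behavioural models:

```python
from statistics import mean, stdev

def is_anomalous(history, today, threshold=3.0):
    """Flag today's activity if it deviates from the learned norm by
    more than `threshold` standard deviations (a simple z-score test)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

# A user who normally touches a few dozen files a day suddenly
# modifies thousands: the kind of spike that suggests ransomware
# encrypting files, or bulk data theft.
normal_days = [40, 35, 52, 48, 38, 45, 41]
```

The point is the shape of the approach: no human has to stare at the raw activity logs, because the system only surfaces behaviour that breaks the learned pattern.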

This kind of behavioural understanding offers significant benefits, from knowing who the owners of a data set are to helping us spot malicious activity, from ransomware to data theft.

In my opinion this kind of technology is the only realistic way of maintaining security, control and compliance in a modern data platform.

Strategy

As discussed in parts one and two, it is crucial the vendors who make up a data platform have a vision that addresses the challenges businesses see when it comes to data.

There should be no surprise, then, that Varonis’s strategy aligns very well with those challenges; they were one of the first companies I came across that brought real forethought to the management, control and governance of our data assets.

That vision continues, with new tools and capabilities continually delivered, such as Varonis Edge and the recent addition of a new automation engine, a significant enhancement to the Varonis portfolio: the tools now don’t only warn of deviations from the norm, but can also act upon them to remediate the threat.

All of this, tied in with Varonis’s continued extension of its integration with on-prem and cloud storage and service providers, ensures they will continue to play a significant role in bringing management to a modern data platform.

Regardless of whether you choose Varonis or not, it is crucial you have intelligent management and analytics built into your environment, because without them it will be almost impossible to deliver the kind of data platform fit for a modern, data-driven business.

You can find the other posts from this series below:

Introduction
Part One – The Storage
Part Two – Availability