Building a modern data platform – Out on the edge

In this series so far we have concentrated on data under our control, held in our datacentres and managed clouds and protected by enterprise data protection tools.

However, the reality of a modern data platform is that not all of our data lives in those safe and secure locations. Today most organisations expect mobility; we want access to our key applications and data on any device and from any location.

This “edge data” presents a substantial challenge when building a modern data platform: not only is the mobility of data a security problem, it’s also a significant management and compliance headache.

How do we go about managing this problem?

The aim of this series is to give examples of tools that I’ve used to solve modern data platform challenges; however, with edge data it’s not that simple. The challenge lies not only in the type and location of the data, but also in the almost infinite range of devices that hold it.

Therefore, rather than present a single solution, we are going to look at some of the basics of edge data management and some tools you may wish to consider.

Manage the device

The fundamental building block of edge data protection is maintaining control of our mobile devices; they are repositories for our data assets and should be treated like any other in our organisation.

When we say control, what do we mean? In this case control comes from strong endpoint security.

Strong security is essential for our mobile devices; their very nature means they carry a significant risk of loss, and therefore data breach, so it’s critical we get the security baseline right.

To do this, mobile device management tools like Microsoft Intune can help us build secure baseline policies which may, for example, demand secure logon, provide application isolation and, in the event of device loss, ensure we can secure the data on that device to minimise the threat of data leak and compliance breach.
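
As a rough illustration of what such a baseline looks like as code, the sketch below posts a minimal Windows 10 compliance policy to the Microsoft Graph device management endpoint. The token, display name and property values are placeholders, and the schema reflects my reading of the Graph API documentation, so treat it as a starting point rather than a definitive policy.

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
TOKEN = "<token with DeviceManagementConfiguration.ReadWrite.All>"  # placeholder

# A minimal Windows 10 compliance baseline: require a logon password and disk encryption
policy = {
    "@odata.type": "#microsoft.graph.windows10CompliancePolicy",
    "displayName": "Edge device baseline",
    "passwordRequired": True,
    "passwordMinimumLength": 8,
    "storageRequireEncryption": True,
    # Graph expects at least one scheduled action; block non-compliant devices immediately
    "scheduledActionsForRule": [{
        "ruleName": "PasswordRequired",
        "scheduledActionConfigurations": [
            {"actionType": "block", "gracePeriodHours": 0}
        ],
    }],
}

resp = requests.post(
    f"{GRAPH}/deviceManagement/deviceCompliancePolicies",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=policy,
)
resp.raise_for_status()
print("Created policy:", resp.json().get("id"))
```

A policy created this way still needs assigning to device groups, and remote wipe is a separate action, but the principle of a repeatable, scripted baseline holds.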

Protecting the data

Ensuring our mobile data repositories are managed and secure is critical, but so is protecting the data on them. We can take three general approaches to controlling our edge data:

  • No data on the device
  • All data synchronised to a secure location
  • Enforce edge data protection

Which approach you use depends on both the type of data and the working practices of your organisation.

For example, if your mobile users only ever access data over good remote links, from a home office for example, then keeping data solely within our controlled central repositories, and never on the device, is fine.

That, however, is not always practical, so a hybrid approach that allows us to cache local copies of that data on our devices may be more appropriate; think OneDrive for Business, Dropbox, or build-your-own sync tools such as Centrestack.

These tools give users access to a cached local copy of the data housed in our central data stores regardless of connectivity, with managed synchronisation back to those stores when possible.

This gives users up-to-date copies of the data for convenience, while the authoritative copy remains in a central repository under our control.
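
To make the hybrid pattern concrete, here is a minimal sketch of the “sync back when possible” loop these tools implement, assuming a hypothetical local cache folder and a central store reachable as a mounted path; real products add delta transfer, conflict handling and encryption on top.

```python
import shutil
from pathlib import Path

# Hypothetical locations; real sync tools talk to APIs, not mounted paths
LOCAL_CACHE = Path.home() / "WorkDocs"
CENTRAL_STORE = Path("/mnt/central/WorkDocs")

def sync_up():
    """Copy files that are new, or newer, in the local cache back to the central store."""
    for local in LOCAL_CACHE.rglob("*"):
        if not local.is_file():
            continue
        remote = CENTRAL_STORE / local.relative_to(LOCAL_CACHE)
        if not remote.exists() or local.stat().st_mtime > remote.stat().st_mtime:
            remote.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(local, remote)  # copy2 preserves timestamps

if __name__ == "__main__":
    sync_up()  # run whenever connectivity to the central store is available
```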

Enforce Data Protection

However, this hybrid approach relies upon users placing data in the correct folder locations; if they don’t, this presents a data security and compliance risk.

To overcome this we can protect all of the data on these devices by extending our enterprise data protection solution; for example, we can use Veeam Agents to protect our Windows workloads, or a specialised edge data tool such as Druva InSync, which can help us protect edge data across a range of devices and operating systems.

This goes beyond synchronisation of a set of predefined folders and allows us to protect as much of the data and configuration of our mobile devices as we need to.

Understand the edge

Beyond ensuring the device and data are robustly protected, our modern platform also demands insight into our data: where it is, how it is used and, importantly, how to find it when needed.

This is a real challenge with edge data: how do we know whose mobile device has certain data types on it? If we lose a device, can we identify what was on it? The ability to find and identify data across our organisation, including on the edge, is essential to the requirements of our modern data platform.

Ensuring we have a copy of that data that is held securely, indexed and searchable should be a priority.

Druva InSync, for example, allows you to do compliance searches across all of the protected mobile devices, so you can find the content on a device, even if that device is lost.

Centralising content via enterprise backup or synchronisation tools also provides this capability. How you do it will depend on your own platform and working practices; doing it, however, should be seen as a crucial element of your modern data platform.
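
As an illustration of why the central, indexed copy matters, here is a minimal sketch of a searchable metadata index over protected devices, with entirely hypothetical records; products like those above build far richer, content-level indexes.

```python
from dataclasses import dataclass

@dataclass
class FileRecord:
    device: str
    path: str
    data_type: str  # a classification applied at backup time

# Hypothetical index, populated as each device is protected
INDEX = [
    FileRecord("laptop-042", "C:/Users/ab/Documents/payroll.xlsx", "financial"),
    FileRecord("laptop-017", "C:/Users/cd/Desktop/customers.csv", "pii"),
]

def compliance_search(data_type: str):
    """Find every protected file of a given type, even if the device itself is lost."""
    return [r for r in INDEX if r.data_type == data_type]

for hit in compliance_search("pii"):
    print(hit.device, hit.path)
```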

In Summary

Keeping our data controlled, even when it spends much of its time on the very edges of our networks, is crucial to our modern data strategy. When it is controlled, we can be sure all of our business security and compliance rules are applied to it, and we can ensure it’s protected, recoverable and always available.

Managing the data on the edges of our network is a difficult challenge, but by ensuring we have strong management of devices, robust data protection and insight into that data, we can ensure edge data is as core a part of our data platform as that in our datacentre.

This is part 5 in a series of posts on building a modern data platform, the previous parts of the series can be found below.

Introduction
Part One – The Storage
Part Two – Availability
Part Three – Control
Part Four – Prevention (Office365)

Building a modern data platform – Prevention (Office365)

In this series so far we have looked at getting our initial foundations right and ensuring we have insight into, and control of, our data, along with the components I use to help achieve this. This time, however, we are looking at something many organisations are already using: a service with a wide range of capabilities to manage and control data, capabilities that are often underutilised.

For ever-increasing numbers of us, Office365 has become the primary data and communications repository. However, I often find organisations are unaware of many powerful capabilities within their subscription that can greatly reduce the risk of data breach.

Tucked away within Office365 is the Security and Compliance Center (protection.office.com), the gateway to several powerful features that should be part of your modern data strategy.

In this article we are going to focus on two such features, “Data Loss Prevention” and “Data Governance”, both of which offer powerful capabilities that can be deployed quickly across your organisation and help significantly mitigate the risk of data breach.

Data Loss Prevention (DLP)

DLP is an important weapon in our data management arsenal; DLP policies are designed to ensure sensitive information does not leave our organisation in ways that it shouldn’t, and Office365 makes it straightforward to get started.

We can quickly create policies to apply across our organisation that help identify the types of data we hold. Several predefined options already exist, including ones that identify financial data, personally identifiable information (PII), social security numbers, health records and passport numbers, with templates for a number of countries and regions across the world.
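
To give a feel for what such a sensitive information type does under the hood, here is a minimal sketch of a credit card detector, pairing a pattern match with a Luhn checksum to cut false positives; this illustrates the general technique, not how Office365 implements it.

```python
import re

# 13-16 digits, optionally separated by spaces or hyphens
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum used to validate card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def find_card_numbers(text: str):
    """Return candidate card numbers that also pass the Luhn check."""
    return [m.group().strip() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
```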

Once the policies that identify our data types are created, we can apply rules to that data governing how it can be used; we can apply several rules and, depending on requirements, make them increasingly stringent.

The importance of DLP rules should not be underestimated. While it’s important we understand who has access to and uses our data, too many times we feel this is enough and don’t take the next crucial step of controlling the use and movement of that data.

We shouldn’t forget that those with the right access to the right data may accidentally or maliciously do the wrong thing with it!

Data Governance

Governance should be a cornerstone of a modern data platform. It defines the way we use, manage, secure, classify and retain our data, and it can impact the cost of our data storage, its security and our ability to deliver compliance to our organisations.

Office365 provides two key governance capabilities.

Labels

Labels allow us to apply classifications to our data so we can start to understand what is important and what isn’t. We can highlight what is for public consumption and what is private, sensitive, commercial in confidence, or any of the other potential classifications you have within your organisation.

Classification is a crucial part of delivering a successful data compliance capability, giving us granular control over exactly how we handle data of all types.

Labels can be applied automatically based on the contents of the data we store, applied by users as they create content, or applied in conjunction with the DLP rules we discussed earlier.

For example, a DLP policy can identify a document containing credit card details and automatically apply a rule that labels it as sensitive information.
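
Continuing the earlier sketch, auto-labelling is conceptually just the detector driving a classification; the document type and label names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Document:
    name: str
    body: str
    label: str = "general"

def auto_label(doc: Document) -> Document:
    """Escalate the label when the card detector from the DLP sketch fires."""
    if find_card_numbers(doc.body):
        doc.label = "sensitive"
    return doc

doc = auto_label(Document("order.txt", "Card: 4111 1111 1111 1111"))
print(doc.label)  # sensitive
```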

Retention

Once we have classified our data into what is important and what isn’t, we can then, with retention policies, define what we keep and for how long.

These policies allow us to effectively manage and govern our information, and subsequently reduce the risk of litigation or security breach, by either retaining data for a period defined by a regulatory requirement or, importantly, permanently deleting old content that we are no longer required to keep.

The policies can be assigned automatically based on classifications or can be applied manually by a user as they generate new data.

For example, a user creates a new document containing financial data which must be retained for 7 years; that user can classify the data accordingly, ensuring that both our DLP and retention rules are applied as needed.
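
The retention logic itself is simple enough to sketch; the schedule below is hypothetical and stands in for the policies an administrator would define in the portal.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: label -> retention period in days
RETENTION_DAYS = {
    "financial": 7 * 365,  # e.g. retain financial records for 7 years
    "general": 2 * 365,
}

def disposition_due(created: date, label: str) -> date:
    """The date after which content may be reviewed for deletion."""
    return created + timedelta(days=RETENTION_DAYS.get(label, 2 * 365))

print(disposition_due(date(2018, 4, 1), "financial"))  # 2025-03-30
```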

Management

Alongside these capabilities, Office365 provides us with two management tools: disposition and supervision.

Disposition is our holding pen for data to be deleted, so we can review any deletions before actioning them.

Supervision is a powerful capability allowing us to capture employee communications for examination by internal or external reviewers.

These tools are important in allowing us to show we have auditable processes and controls within our platform and are taking the steps necessary to protect our data assets as we should.

Summary

The ability to govern and control our data wherever we hold it is a critical part of a modern data platform. If you use Office365 and are not using these capabilities then you are missing out.

The importance of governance is only going to grow as ever more stringent data privacy and security regulations develop. Governance can greatly reduce many of the risks associated with data breach, and services such as Office365 have taken things that were traditionally difficult to achieve and made them a whole lot easier.

If you are building a modern data platform then compliance and governance should be at the heart of your strategy.

This is part 4 in a series of posts on building a modern data platform, the previous parts of the series can be found below.

Introduction
Part One – The Storage
Part Two – Availability
Part Three – Control

Building a modern data platform – Control

In the first parts of this series we looked at ensuring the building blocks of our platform are right, so that our data sits on strong foundations.

In this part we look at bringing management, security and compliance to our data platform.

As our data, the demands we place on it and the amount of regulation controlling it continue to grow, gaining deep insight into how it is used can no longer be a “nice to have”; it has to be an integral part of our strategy.

If you look at the way we have traditionally managed data growth you can see the basics of the problem: we have added file servers, storage arrays and cloud repositories as demanded, because adding more has been easier than managing the problem.

However, this is no longer the case. As we come to see our data as an asset, we need to make sure it is in good shape; holding poor quality data is not in our interest, the cost of storing it no longer goes unnoticed, and we can no longer go to the business every 12 months needing more. While I have no intention of making this a piece about the EU General Data Protection Regulation (GDPR), it, and regulation like it, is forcing us to rethink how we manage our data.

So what do I use in my data platforms to manage and control data better?

Varonis


I came across Varonis and their data management suite about 4 years ago, and it was the catalyst for a fundamental shift in the way I think and talk about data, opening up brand new insights into how unstructured data in a business was being used and highlighting the flaws in the way people were traditionally managing it.

With that in mind, how do I start to build management into my data platform?

It starts by finding answers to two questions:

Who, Where and When?

Without understanding these things, it will be impossible to properly build management into our platform.

If we don’t know who is accessing data how can we be sure only the right people have access to our assets?

If we don’t know where the data is, how are we supposed to control its growth, secure it and govern access?

And of course, when is the data accessed, or even, is it accessed at all? Let’s face it, if no one is accessing our data then why are we holding it?

What’s in it?

However, there are lots of tools that tell me the who, where and when of data access; that’s not really the reason I include Varonis in my platform designs.

While who, where and when are important, they do not include a crucial component: the what. What type of information is stored in my data?

If I’m building management policies and procedures, I can’t do that without knowing what is contained in my data. Is it sensitive information like finances, intellectual property or customer details? And, as we look at regulation such as GDPR, knowing where we hold private and sensitive data about individuals is increasingly crucial.

Without this knowledge we cannot ensure our data and business compliance strategies are fit for purpose.

Building Intelligence into our system

In my opinion, one of the most crucial parts of a modern data platform is the inclusion of behavioural analytics. As our platforms grow ever more diverse, complex and large, one of the common refrains I hear is “this information is great, but who is going to look at it, let alone action it?”. This is a very fair point and a real problem.

Behavioural analytics tools can help address this and supplement our IT teams. These technologies are capable of understanding and learning the normal behaviour of our data platform and, when those norms are deviated from, can warn us quickly and allow us to address the issue.

This kind of behavioural understanding offers significant benefits, from identifying the owners of a data set to spotting malicious activity, whether ransomware or data theft.
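
To make this less abstract, here is a minimal sketch of the kind of baseline-and-deviation logic such tools build on, using a simple per-user statistical threshold; real products model far more signals than file-touch counts.

```python
from collections import defaultdict, deque
import statistics

class AccessAnomalyDetector:
    """Flags users whose file-access rate deviates sharply from their learned baseline."""

    def __init__(self, window=30, threshold=3.0):
        self.threshold = threshold  # std-devs above the mean that counts as anomalous
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, user, files_touched):
        """Record one interval's activity; return True if it looks anomalous."""
        baseline = self.history[user]
        anomalous = False
        if len(baseline) >= 5:  # need some history before judging
            mean = statistics.mean(baseline)
            stdev = statistics.pstdev(baseline) or 1.0
            anomalous = files_touched > mean + self.threshold * stdev
        baseline.append(files_touched)
        return anomalous

detector = AccessAnomalyDetector()
for count in [20, 22, 19, 25, 21, 23, 5000]:  # a ransomware-like spike at the end
    print(detector.observe("alice", count))
```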

In my opinion this kind of technology is the only realistic way of maintaining security, control and compliance in a modern data platform.

Strategy

As discussed in parts one and two, it is crucial the vendors who make up a data platform have a vision that addresses the challenges businesses see when it comes to data.

It should be no surprise then that Varonis’ strategy aligns very well with those challenges; they were one of the first companies I came across to deliver real forethought to the management, control and governance of our data assets.

That vision continues, with new tools and capabilities continually delivered, such as Varonis Edge and the recent addition of a new automation engine, a significant enhancement to the Varonis portfolio: the tools now not only warn of deviations from the norm, but can also act upon them to remediate the threat.

All of this, tied in with Varonis’ continued extension of its integrations with on-prem and cloud storage and service providers, ensures they will continue to play a significant role in bringing management to a modern data platform.

Regardless of whether you choose Varonis or not, it is crucial you have intelligent management and analytics built into your environment, because without them it will be almost impossible to deliver the kind of data platform fit for a modern, data-driven business.

You can find the other posts from this series below:

Introduction
Part One – The Storage
Part Two – Availability

Building a modern data platform – Availability

In part one we discussed the importance of getting our storage platform right; in part two we look at availability.

The idea that availability is a crucial part of a modern platform was something I first heard from a friend of mine, Michael Cade from Veeam, who introduced me to “availability as part of digital transformation” and how this was changing Veeam’s focus.

This shift is absolutely right. As we build our modern platforms, backup and recovery remains a crucial requirement; however, a focus on availability is at least as, if not more, crucial. Today nobody in your business really cares how quickly you can recover a system; what our digitally driven businesses demand is that our systems are always there, and downtime in ever more competitive environments is not tolerated.

With that in mind why do I choose Veeam to deliver availability to my modern data platform?

Keep it simple

Whenever I meet a Veeam customer, their first comment on Veeam is “it just works”, and the power of this rather simple statement should not be underestimated when you are protecting key assets. Too often data protection solutions have been overly complex, inefficient and unreliable, something I have always found unacceptable. For businesses big or small, you need a data protection solution you can deploy, then forget, trusting it just does what you ask; this is perhaps Veeam’s greatest strength, a crucial driver behind its popularity, and what makes it such a good component of a data platform.

I would actually say Veeam are a bit like the Apple of availability: much of what they do has been done by others (Veeam didn’t invent data protection, in the same way Apple didn’t invent the smartphone), but what they have done is make it simple, usable and something that just works and can be trusted. Don’t underestimate the importance of this.

Flexibility

If ever there was a byword for modern IT, flexibility could well be it; it’s crucial that any solution and platform we build has the flexibility to react to ever-changing business and technological demands. Look at how business needs for technology, and the technology itself, have changed in the last 10 years, and how much our platforms have needed to change to keep up: flash storage, web-scale applications, mobility, cloud; the list goes on.

The following statement sums up Veeam’s view on flexibility perfectly:

“Veeam Availability Platform provides businesses and enterprises of all sizes with the means to ensure availability for any application and any data, across any cloud infrastructure”

It is this focus on flexibility that makes Veeam such an attractive proposition in the modern data platform, allowing me to design a solution flexible enough to meet my different needs, providing availability across my data platform with the same familiar toolset regardless of location, workload type or recovery needs.

Integration

As mentioned in part one, no modern data platform will be built with just one vendor’s tools, not if you want to deliver the control of, and insight into, your data that we demand as a modern business. Veeam, like NetApp, have built a very strong partner ecosystem allowing them to integrate tightly with many vendors; more than just integrating, Veeam deliver additional value, allowing me to simplify and do more with my platform (take a look at this blog about how Veeam allows you to get more from NetApp snapshots). Veeam are continuously delivering new integrations, not only with on-prem vendors but, as mentioned earlier, with a vast range of cloud providers.

This ability to extend the capabilities and simplify the integration of multiple components in a multi-platform, multi-cloud world is very powerful and a crucial part of my data platform architecture.

Strategy

As with NetApp, over the last 18 months it has been the shift in Veeam’s overall strategy that has impressed me more than anything else. Although seemingly a simple change, the shift from talking about backup and recovery to talking about availability is significant.

As I said at the opening of this article, in our modern IT platforms nobody is interested in how quickly you can recover something; it’s about the availability of crucial systems. A key part of Veeam’s strategy is to “deliver the next generation of availability for the Always-On Enterprise”, and you can see this in everything Veeam are doing: focussing on simplicity, ensuring you can have your workload where you need it, when you need it, and move those workloads seamlessly between on-prem, cloud and back again.

They have also been very smart, employing a strong leadership team and, as with NetApp, investing in ensuring that cloud services don’t leave a traditionally on-premises focussed technology provider adrift.

The Veeam and NetApp strategies are very similar, and it is this similarity that makes them attractive components in my data platform. I need my component providers to understand technology trends and changes so they, as well as our data platforms, can move and change with them.

Does it have to be Veeam?

In the same way it doesn’t have to be NetApp, of course it doesn’t have to be Veeam. But if you are building a platform for your data, make sure your platform components deliver the kinds of things we have discussed in the first two parts of this series: the flexibility we need, integration with components across your platform, and a strategic vision you are comfortable with. If you have that, you will have rock-solid foundations to build on.

In Part Three of this series we will look at building insight, compliance and governance into our data platform.

You can find the Introduction and Part One – “The Storage” below.

The Introduction
Part One – The Storage

Building a modern data platform – The Series – Introduction

For those of you who read my blog posts (thank you) or listen to the Tech Interviews podcast (thanks again!), you’ll know talking about data is something I enjoy. It has played a significant part in my career over the last 20 years, but today data is more central than ever to what so many of us are trying to achieve.

In today’s modern world, however, storing our data is no longer enough; we need to consider much more. Yes, storing it effectively and efficiently is important, but so are its availability, security and privacy, and of course finding ways to extract value from it. Whether that’s production data, archive or backup, we are looking at how we can make our data do more (for examples of what I mean, read this article from my friend Matt Watts introducing the concept of Data Amplification Ratio) and deliver a competitive edge to our organisations.

To do this effectively means developing an appropriate data strategy and building a data platform that is fit for today’s business needs. This is something I’ve written and spoken about on many occasions; however, one question I get asked regularly is “we understand the theory, but how do we build this in practice? What technology do you use to build a modern data platform?”

That’s a good question. The theory is all great and important; however, seeing practical examples of how you deliver these strategies can be very useful. With that in mind, I’ve put together this series of blogs to go through the elements of a data strategy and share some of the practical technology components I use to help organisations build a platform that will allow them to get the best from their data assets.

Over this series we’ll discuss how these components deliver flexibility, maintain security and privacy, and provide governance, control and insight, as well as how they interact with hyperscale cloud providers to ensure you can exploit analytics, AI and machine learning.

So settle back; over the next few weeks I hope to provide some practical examples of the technology you can use to deliver a modern data strategy. Parts one and two are live now and can be accessed via the links below; the other links will go live as I post them, so do keep an eye out for them.

Part One – The Storage
Part Two – Availability
Part Three – Control
Part Four – Prevention (Office365)

I hope you enjoy the series and find these practical examples useful. Remember, these are just some of the technologies I’ve used; they are not the only technologies available, and you certainly don’t have to use any of them to meet your data strategy goals. The aim of this series is to help you understand the art of the possible; if these exact solutions aren’t for you, don’t worry, go and find technology partners and solutions that are, and use them to help you meet your goals.

Good Luck and happy building!

Coming Soon:

Part Five – out on the edges

Part Six – Exploiting the Cloud

Part Seven – A strategic approach

Building a modern data platform – The Storage

It probably isn’t a surprise to anyone who has read my blogs previously that, when it comes to the storage part of our platform, NetApp are still first choice. But why?

While it is important to get the storage right, getting it right is much more than just having somewhere to store data; it’s important, even at the base level, that you can do more with it. As we move through the different elements of our platform we will look at other areas where we can apply insight and analytics; however, it should not be forgotten that there is significant value in having data services available at all levels of a data platform.

What are data services?

These services provide capabilities beyond a simple storage repository; they may provide security, storage efficiency, data protection or the ability to extract value from data. NetApp provide these services as standard with their ONTAP operating system, bringing considerable value whether data capacity needs are large or small. The ability to provide capabilities beyond just storing data is crucial to our modern data platform.

Many storage providers offer data services on their platforms, often not as comprehensive as those provided in ONTAP, but they are there. So if that is the case, why else do I choose NetApp as a foundation of a data platform?

Data Fabric

“Data Fabric” is the simple answer (I won’t go into detail here; I’ve written about the fabric before, for example in Data Fabric – What is it good for?). When we think about data platforms we cannot think about them in isolation; we need considerably more flexibility than that. We may have data in our datacentre on primary storage, but we may also want that data in another location, maybe with a public cloud provider; we may want it stored on a different platform, or in a different format altogether, object storage for example. However, to manage our data effectively and securely, we can’t afford for it to be stored in different locations that need a plethora of separate management tools, policies and procedures to keep control.

The “Data Fabric” is why NetApp continue to be the base storage element of my data platform designs. The key to the fabric is the ONTAP operating system and its flexibility, which goes beyond an OS installed on a traditional controller; ONTAP can be consumed as a software service within a virtual machine or from AWS or Azure, providing the same data services, managed by the same tools, deployed in all kinds of different ways, and allowing me to move my data between these repositories while maintaining the same management and controls.

Beyond that, the ability to move data between NetApp’s other portfolio platforms, such as Solidfire and StorageGrid (their object storage solution), as well as to third-party storage such as Amazon S3 and Azure Blob, ensures I can build a fabric that allows me to place data where I need it, when I need it. The ability to do this while maintaining security, control and management with the same tools regardless of location is hugely powerful and beneficial.
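
To give a feel for the object side of such a fabric, here is a minimal sketch that tiers a file to an S3-compatible endpoint with boto3; the endpoint, bucket and credentials are placeholders, and this stands in for the richer, policy-driven data movement the fabric tools themselves provide.

```python
import boto3

# Hypothetical endpoint and credentials; StorageGrid and AWS S3 both speak the S3 API
s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example.com",  # omit to target AWS itself
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Tier a cold file out to object storage...
s3.upload_file("/data/archive/2017-q4.tar.gz", "cold-tier", "archive/2017-q4.tar.gz")

# ...and recall it when needed
s3.download_file("cold-tier", "archive/2017-q4.tar.gz", "/data/restore/2017-q4.tar.gz")
```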


APIs and Integration

When we look to build a data platform, it would be ridiculous to assume it will only ever contain the components of a single provider; as we build through the layers of our platform, integration between those layers is crucial and plays a part in the selection of the components I use.

APIs are increasingly important in the modern datacentre as we look for different ways to automate and integrate our components. Again, this is an area where NetApp are strong, providing great third-party integrations with partners such as Microsoft, Veeam, VMware and Varonis (some of which we’ll explore in other parts of the series), as well as options to drive many elements of their different storage platforms via APIs, so we can automate the delivery of our infrastructure.
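
As a small illustration of driving storage via an API, the sketch below queries volumes over ONTAP’s REST interface; the cluster address and credentials are placeholders, and the endpoint reflects my understanding of the REST API introduced in recent ONTAP releases, so check it against your version’s documentation.

```python
import requests

CLUSTER = "https://cluster.example.com"  # placeholder cluster management address
AUTH = ("apiuser", "password")           # use a least-privilege account in practice

def list_volumes():
    """List volumes and their sizes via the ONTAP REST API."""
    resp = requests.get(
        f"{CLUSTER}/api/storage/volumes",
        auth=AUTH,
        params={"fields": "name,size,svm.name"},
        verify=False,  # lab use only; validate certificates in production
    )
    resp.raise_for_status()
    for vol in resp.json()["records"]:
        print(vol["svm"]["name"], vol["name"], vol["size"])

if __name__ == "__main__":
    list_volumes()
```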

Can it grow with me?

One of the key reasons we need a more strategic view of data platforms is the continued growth of our data and the demands we put on it; therefore scalability and performance are hugely important when we choose the storage components of our platform.

NetApp deliver this across their portfolio. ONTAP allows me to scale a storage cluster up to 24 nodes, delivering huge capacity, performance and compute capability. The Solidfire platform, inspired by the needs of service providers, allows simple and quick scaling, with a quality of service engine that lets me guarantee performance levels for applications and data; and that is before we talk about the huge scale of the StorageGrid object platform or the fast and cheap capabilities of E-Series.

Crucially, NetApp’s Data Fabric strategy means I can scale across these platforms, giving me the ability to grow my data platform as I need and not be restricted by a single technology.

Does it have to be NetApp?

Do you have to use NetApp to build a data platform? Of course not, but do make sure whatever you choose as the storage element of your platform ticks the majority of the boxes we’ve discussed: data services, a strategic vision, the ability to move data between repositories and locations, and great integration, while ensuring your platform can meet the performance and scale demands you place on it.

If you can do that, then you’ll have a great start for your modern data platform.

In the next post in this series we’ll look at the importance of availability – that post is coming soon.

Click below to return to “The Intro”

 

Building a modern data platform – The Series – Introduction