Top Challenges + Concerns with Unstructured Data

Posted by Patrick Holden on Aug 18, 2020 10:00:00 AM

In prior blog posts, I’ve explained the difference between unstructured, structured, and semi-structured data, and shared the value of unstructured data. In this blog post, I would like to focus on what I believe are the primary barriers preventing organizations from tapping into their unstructured data for analysis.

Legacy NAS Filer and File System Architectures Are Unable to Keep Up

In short, yesterday’s NAS filers were not designed to handle the capacity and performance requirements of modern unstructured data sets. Most legacy systems were not built for big data at scale. Their capacity and/or file count limits are too low, resulting in multiple repositories (NAS appliances, file systems, namespaces, and so on) scattered across the data center or the cloud.

Performance also becomes an issue because most modern unstructured data sets are generated by IoT devices, social media platforms, and mobile apps. This type of data typically comprises a high number of very small files and requires advanced metadata analysis by the various applications that access it. Most legacy filers were not designed to house many millions of small files while servicing requests for both the data itself and its metadata.
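To gauge how much a given share leans toward the small-file pattern described above, a rough profile of file sizes can help. The following is a minimal sketch rather than a vendor tool: the function names and the 64 KiB cutoff for "small" are assumptions chosen for illustration, not figures from this post.

```python
import os


def file_sizes(root):
    """Collect the size in bytes of every regular file under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                # Skip files that vanish or cannot be read during the walk.
                continue
    return sizes


def small_file_share(sizes, threshold_bytes=64 * 1024):
    """Fraction of files smaller than threshold_bytes (assumed cutoff)."""
    if not sizes:
        return 0.0
    small = sum(1 for size in sizes if size < threshold_bytes)
    return small / len(sizes)
```

Calling `small_file_share(file_sizes("/path/to/share"))` on a real tree returns a value between 0 and 1. A value near 1 suggests a workload that is dominated by metadata operations rather than raw throughput.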

When it comes to upgrades and expansions, most legacy systems follow only the scale-up model, which allows disk storage to be expanded but does not allow additional compute or networking resources to be added to the array. These systems see more and more disks and disk shelves added over time, while compute and network resources remain as-is. Performance begins to degrade as file counts grow exponentially and free capacity on the array shrinks.

Legacy NAS systems are also much less efficient. Most utilize some form of antiquated post-process data reduction and garbage collection mechanisms that, over time as the array fills up, can lead to additional performance issues. Organizations also have to deal with increased vendor warranty and support costs that often rise year after year as these older solutions age. With each passing maintenance renewal, it becomes harder and harder to justify the rising costs to maintain support for such legacy systems.

Unstructured Data Becomes More Difficult to Manage as It Grows and Grows

As an organization’s unstructured data footprint grows, so do some additional pain points. The complexity of day-to-day operational management is one of the first hurdles to face in any environment where the data footprint has been growing at a rapid pace. If an organization is still dependent on legacy solutions to house its big data, then it most likely has acquired more and more physical systems that must be monitored, secured, and protected. Additional operational staff headcount is required to manage this ever-growing pool of disparate systems. The underlying environment becomes more and more complex, until managing these legacy systems is no longer tenable.

Data migrations also become a real pain point as certain arrays fill up and data needs to be migrated over to other physical systems. Maintenance and downtime windows are typically required to facilitate these types of large data migrations. Last, but definitely not least, are the challenges with protecting these large unstructured data sets. Backup jobs take longer and longer, until one run is still in progress when the next is scheduled to start. Data replication to a secondary site also becomes a challenge since it requires a second instance of matching capacity in the disaster recovery environment.
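The backup problem comes down to simple arithmetic: once the time needed to copy the data exceeds the interval between scheduled runs, jobs start to collide. Here is a minimal sketch of that calculation. The function names, the sustained-throughput figure, and the 24-hour interval are illustrative assumptions, and real backups are also affected by incremental schedules, deduplication, and small-file overhead.

```python
def backup_hours(data_tb, throughput_mb_s):
    """Hours needed to copy data_tb terabytes at a sustained throughput.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes a full copy.
    """
    total_mb = data_tb * 1_000_000
    return total_mb / throughput_mb_s / 3600


def overlaps_next_run(data_tb, throughput_mb_s, interval_hours=24.0):
    """True if a full backup would still be running when the next one starts."""
    return backup_hours(data_tb, throughput_mb_s) > interval_hours
```

For example, at an assumed 500 MB/s, 36 TB takes 20 hours and fits inside a nightly window, while 72 TB takes 40 hours and does not.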

Enforcement of Ever-Changing Compliance, Retention, Security, and Data Privacy Requirements

No matter what industry an organization is part of, it is most likely facing more regulations and policy enforcement challenges than in years past with regard to compliance, retention, security, and data privacy. Historically, it has mainly been financial, healthcare, and government entities that had to comply with these types of regulations, but as the broader private sector captures and stores more and more end-customer data, it too is facing these types of challenges.

The regulations themselves appear to evolve quickly, leaving organizations struggling to keep up and retrofit their environments and policies to meet even the basic requirements of some of these new mandates. Specifically, data retention has recently become a key concern with unstructured data. As businesses attempt to store more and more data for analysis, they also tend to retain that same data for longer periods of time and are sometimes storing certain sensitive data much longer than they actually should. The longer they keep this data, the more they risk running afoul of one or more data regulation policies.
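A retention check of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not a compliance tool: the `FileRecord` type, the category names, and the retention limits are all hypothetical, and real limits come from the regulations and policies that apply to a given organization.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical catalog entry for one stored file."""
    path: str
    category: str
    created: date


# Hypothetical retention limits in days; not taken from any regulation.
RETENTION_DAYS = {"customer_pii": 365, "logs": 90}


def past_retention(records, today, policy=RETENTION_DAYS):
    """Return records held longer than the limit for their category.

    Records whose category has no entry in the policy are not flagged.
    """
    overdue = []
    for record in records:
        limit = policy.get(record.category)
        if limit is not None and today - record.created > timedelta(days=limit):
            overdue.append(record)
    return overdue
```

Running a check like this on a schedule against a central file catalog is one way to surface sensitive data that has been kept longer than intended, instead of leaving the decision to each data owner.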

Historically, we have seen most organizations handle their unstructured data in a very decentralized way. In the past, it was basically left up to the individual user or group that owns the data internally within the organization to decide who has access and how long the data should be stored and retained. This is fine until there is some kind of public data breach and it becomes the business’s problem much more so than the individual or group that initially managed the data. It should be clear to all that as a company’s unstructured data footprint grows, so does its chance of being in violation of one or more of these regulations. Various compliance and security solutions that used to be seen as optional are now becoming mandatory for organizations that want to store and tap into their unstructured data sets.



Have You Taken Advantage of Unstructured Data’s Value to Your Business?


This ConvergeOne white paper will answer the following questions:

  • What exactly is unstructured data (also known as qualitative data)?
  • How is it different from structured data (also known as quantitative data)?
  • What are the latest trends and inherent challenges businesses need to be aware of concerning unstructured data?

Download the white paper to learn why it is becoming more critical than ever to extract value from unstructured data.


Topics: Data Center