Anyone who has read the business press lately knows that data is a critical driver of business success and transformation, and the pressure to act on it keeps increasing. Our world is driven by massive amounts of data that continue to grow explosively and to diversify. How can we use all this data as a strategic asset?
Compounding this challenge are the expanding sources of data. Data-driven initiatives look beyond typical structured data to unstructured data such as emails, internal files, mobile and social content, readings from IoT devices and sensors, and web content, video and images. Unlike structured data, which is organized, manageable and ready for integration into a database, unstructured data is raw, unorganized and often cumbersome and costly to manage.
By some industry estimates, the world’s data doubles roughly every two years, and unstructured data drives much of that growth. As a result, more and more organizations are searching for effective ways to collect, manage, and leverage these exploding volumes of new and traditional data sources in order to enable advanced analytics and derive value from big data.
The data lake has emerged as a popular and intriguing concept to meet this business requirement, but it remains widely misunderstood in both its definition and its use.
Let’s end the confusion.
Data Lakes Defined
It might be easiest to start with what a data lake is not. A data lake is not a platform for data, nor is it a piece of software. Rather, it is a collection of data organized by a data-driven design pattern that is capable of capturing a wide range of data varieties on a large scale. It is an approach to organizing, cataloguing and retrieving data that can leverage a technology platform, but it must be organized and thought of as independent of any tool.
A data-driven design pattern you’re likely familiar with is data warehouse architecture. The data lake has come forward as the newest data-driven design pattern optimized for fast consumption of raw data and real-time processing of data for analytics.
Use a Data Lake to Complement Your Data Management Environment
The need to run advanced analytics and to extract value from big data is the primary driver of data lake adoption. However, a data lake should be thought of as a complement to an existing data management environment, one that addresses data’s evolving nature. It should not be a replacement.
Take, for example, the data warehouse vs. data lake debate. There isn’t one winner. In fact, the two can coexist and complement each other. A data lake should supplement the data warehouse and serve as a staging area for it. The lake allows diverse, unstructured data to be prepared on the fly, without the complex ETL practices that a data warehouse requires. Additionally, using a data lake (particularly in the cloud) allows for cost-effective storage that keeps up with increasing volumes of data.
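To make "on-the-fly preparation" concrete, here is a minimal sketch in Python of applying structure when data is read rather than when it is loaded. Everything in it is an assumption for illustration: the sample records, the field names (device, temp_c) and the read_temperatures function are hypothetical, and a real lake would sit on a storage platform rather than an in-memory list.

```python
import json

# Hypothetical raw events, kept in the lake exactly as they arrived.
# Note the mixed shapes: two sensor readings and one unrelated email record.
RAW_EVENTS = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": "2017-05-01T10:00:00"}',
    '{"device": "sensor-2", "temp_c": 19, "note": "recalibrated"}',
    '{"user": "alice", "subject": "Q2 forecast"}',
]


def read_temperatures(raw_lines):
    """Apply a schema at read time: keep records that have a device and a temperature."""
    readings = []
    for line in raw_lines:
        record = json.loads(line)
        if "device" in record and "temp_c" in record:
            # Normalize the temperature to a float, whether it arrived as text or a number.
            readings.append((record["device"], float(record["temp_c"])))
    return readings


temperatures = read_temperatures(RAW_EVENTS)
print(temperatures)
```

Nothing was reshaped or rejected when the records were stored. The analysis that needs temperatures decides what a valid reading looks like, and a different analysis could read the same raw records with a different schema.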
Don’t End Up with a Data Swamp
It’s easy for a data lake to become a data dumping ground. Without data governance best practices in place, however, the lake will be incredibly difficult (if not impossible) to navigate or use. Metadata management, for example, becomes essential as a growing variety of unstructured data sources enters the lake. When left ungoverned, the lake quickly becomes a data swamp. This could leave a business without meaningful business intelligence and jeopardize its big data and advanced analytics initiatives.
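As a rough illustration of what metadata management can mean in practice, the Python sketch below keeps a small catalog of what is in the lake and flags entries that lack basic metadata. The CatalogEntry fields, the REQUIRED_FIELDS list, the ungoverned function and the sample paths are all hypothetical; a real governance program would define its own required metadata and would typically rely on a dedicated catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One dataset in the lake, described by hypothetical metadata fields."""
    path: str
    source: str = ""
    data_format: str = ""
    owner: str = ""
    tags: list = field(default_factory=list)


# Assumed governance rule: every entry must name its source, format and owner.
REQUIRED_FIELDS = ("source", "data_format", "owner")


def ungoverned(entries):
    """Return the paths of entries that are missing any required metadata field."""
    return [
        entry.path
        for entry in entries
        if any(not getattr(entry, name) for name in REQUIRED_FIELDS)
    ]


catalog = [
    CatalogEntry("raw/iot/2017/05/", source="plant sensors", data_format="json", owner="ops"),
    CatalogEntry("raw/email/export.mbox", source="mail server"),
]
print(ungoverned(catalog))
```

Running a check like this on a regular basis is one simple way to spot the parts of a lake that are drifting toward swamp: data that nobody can place, describe or vouch for.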
The Fastest On-Ramp to Big Data & Advanced Analytics
A data lake is not an end in itself. Instead, it is part of a bigger analytics platform. The emergence and popularity of data lakes present an opportunity to expand analytics programs, draw business value from new data sources, and modernize the data management environment. However, strong data governance is vital to success.
If you’d like to discuss data lakes in more detail, I’d love to chat. On June 7th I’ll be hosting a discussion with experts from Microsoft Canada where we’ll be talking about big data and how it's impacting businesses today. You can find the event details and registration here.