EMC Dips Deeper Into The Shallow End of The Data Lake

Barely a month after making its first big splash in the Data Lake, EMC is back at it with an all-in-one Big Data analytics solution (hardware, software and services), with availability and pricing to be determined later. The Federation Business Data Lake packages and Big Data analytics technologies from EMC Information Infrastructure, , and VMware, together with services, to accelerate and automate deployment of Data Lakes and clear ‘the path for new insights and disruptive differentiation.’

It greatly simplifies the massively complex task of building a Data Lake and is designed for speed, self-service and scalability for the enterprise, enabling organizations to begin making better-informed business decisions using Big Data analytics, said Aidan O’Brien, Senior Director, EMC. “This is the industry’s first enterprise-grade Data Lake,” he said.

It seems like everybody’s use case varies slightly, stated O’Brien. So what we’ve tried to do is to increase standardization and flexibility. No company has all the technologies required (EMC, for example, lacks a visualization component), he said, but what they’ve done is provide some ready-made apps, a data scientist in a box, that can be implemented in as little as seven days. For the more adventurous, there’s also a software development kit.

According to Gartner, a Data Lake is a collection of storage instances of various data assets additional to the originating data sources. These assets are stored in a near-exact, or even exact, copy of the source format. The purpose of a Data Lake is to present an unrefined view of data to only the most highly skilled analysts, to help them explore their data refinement and analysis techniques independent of any of the system-of-record compromises that may exist in a traditional analytic data store (such as a data mart or data warehouse).

EMC gets more granular, defining the Data Lake as a modern approach to data analytics that takes advantage of the processing and cost advantages of Hadoop. It allows you to store all of the data that you think might be important in a central repository as is, and because the data stays in its raw form, you don’t need a pre-determined schema or ‘schema on load’. Schema on load is a data warehousing process that optimizes a query, but also strips the data of information that could be useful for analysis. This flexibility then allows the data lake to feed all downstream applications such as a data warehouse, analytic sandboxes, and other analytic environments.
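To make the difference between schema on load and keeping data in its raw form concrete, here is a minimal Python sketch. It is not EMC's implementation and it does not touch Hadoop: the event records, the field names and the two helper functions are all hypothetical, chosen only to show what a fixed schema discards at load time and what a raw store keeps available for later questions.

```python
import json

# Hypothetical raw events, kept exactly as they arrived (no schema applied).
RAW_EVENTS = [
    '{"user": "a1", "action": "click", "ms": 120, "referrer": "search"}',
    '{"user": "b2", "action": "view", "ms": 340}',
    '{"user": "a1", "action": "click", "ms": 95, "campaign": "spring"}',
]

def schema_on_load(lines, columns):
    """Warehouse-style load: keep only the predefined columns, drop the rest."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({c: record.get(c) for c in columns})
    return rows

def schema_on_read(lines, columns):
    """Lake-style read: parse the raw records and pick columns at query time."""
    for line in lines:
        record = json.loads(line)
        yield {c: record.get(c) for c in columns}

# The warehouse table was designed before anyone cared about campaigns,
# so that field is stripped when the data is loaded.
warehouse = schema_on_load(RAW_EVENTS, ["user", "action", "ms"])

# The raw store can still answer a question nobody planned for.
lake_view = list(schema_on_read(RAW_EVENTS, ["user", "campaign"]))
```

In this toy example the `campaign` field survives only in the raw store, which is the trade-off the paragraph above describes: the fixed schema is tidy and query-friendly, but anything outside it is gone.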

It’s still early days for the Data Lake concept, but Suresh Sathyamurthy, Sr. Director, Product Marketing for EMC’s Emerging Technologies Division, said his company was the market share leader, even though they’re just “scratching the surface”. EMC is looking at 3-4x growth next year, he said.

Nick Kirsch, EMC’s VP & Chief Technology Officer, ETD, recently divided the Data Lake market into two segments: utilizing intelligent software-defined storage resource management to store petabytes of data — and making that data available with multiprotocol access; and, a hyper-converged Data Lake that’s complete with apps, compute resources, and networks — delivered as an integrated appliance. In both cases, the decision is based on the unique challenges businesses face in delivering performance, managing growth, and gaining insights from their data.

EMC’s Pivotal was placed in the Visionary quadrant of the recent – and inaugural – Gartner Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics. Leaders were , Teradata, IBM, , and ; challengers included , Cloudera and MapR Technologies.

The Big Data technology and services market is exploding, according to IDC. This segment will grow at a 26.4% compound annual growth rate to $41.5 billion through 2018, or about six times the growth rate of the overall information technology market. Additionally, by 2020 IDC believes that line of business buyers will help drive analytics beyond its historical sweet spot of relational (performance management) to the double-digit growth rates of real-time intelligence and exploration/discovery of the unstructured worlds.

O’Brien agreed that this is an early-stage market, but the market is moving so quickly, it’s become a technology arms race. That’s why EMC has taken this platform approach. “We basically want to avoid a nuclear arms race.”

With this offering, customers don’t have to spend 4-6 months trying to integrate all the pieces together. By taking away all the integration headaches customers can instead focus on the strategic elements, he said.

However, regardless of where customers are – undecided, motivated or ready – they will need to be educated about what’s possible, and why. Education will be key, which is why it is one of EMC’s major Data Lake/Big Data thrusts, he said.

The Fiddly Bits & Bytes

The Federation Business Data Lake is a fully engineered solution that can be rapidly and automatically provisioned, enabling IT organizations to lead the needs of the business. The analytics layer is completely virtualized with VMware running on with predefined analytics use cases and automated provisioning and configuration. EMC Isilon provides the Data Lake Storage Foundation, delivering the ideal balance of capacity and performance.

The analytics layer is comprised of the , including , featuring the world’s leading SQL-on-Hadoop engine, HAWQ. provides enterprise-class SQL, which allows for seamless integration and interoperability with top analytics platforms such as SAS, Tableau and others, over data stored in Hadoop. EMC is also delivering two additional Business Data Lakes to enable integration with customer choice of Hadoop distribution including Cloudera and Hortonworks, along with any future Open Data Platform-based Hadoop distribution.

A full suite of services and education is available, including:

EMC Technology Onboarding Service: For customers who are ready to deploy a Data Lake, the EMC Technology Onboarding Service offers full consulting services to install and deploy the Federation Business Data Lake, optimize the analytics environment and configure and customize data requirements.

EMC Proof of Value Service: For customers who know the use case they want to address but are looking for help implementing the latest big data analytic and rapid application development tools and techniques, the Proof of Value Service demonstrates the ROI of a targeted use case using real customer data.

EMC Big Data Vision Workshop: For customers who are undecided about how to start infusing Big Data into their business strategy, the EMC Big Data Vision Workshop analyzes an organization’s strategy and business goals, then prioritizes a target use case for the start of its Big Data journey.

Education Services: In addition to the service offerings above, EMC offers training and certification to develop fundamental as well as advanced Big Data and Data Science understanding and skills required by business leaders and Big Data practitioners.

The Federation Business Data Lake will be offered in Directed Availability in April 2015 with General Availability at the end of Q3 or early Q4.

Author: Steve Wexler
