Apache Hadoop, the open-source software framework that supports data-intensive distributed applications, is hot, with the market expected to expand at a compound annual growth rate (CAGR) of 55.63% between 2012 and 2016. Not only is it a Big Data play, but it is also benefitting from the everything-as-a-service groundswell, with Hadoop-as-a-Service predicted to enjoy an even higher CAGR over the same period, shooting up 95.16%. The dominant vendors are Amazon Web Services, IBM, EMC, and Cloudera, but there are a host of other vendors in this space, ranging from MapR Technologies and Hortonworks to Hewlett-Packard and Dell.
Organisations are drowning in data, and the amount just keeps increasing. Deriving value from this deluge, everything from unstructured data to live event streams, requires solutions beyond the traditional data warehouse, such as Hadoop and NoSQL databases, to manage Big Data workloads.
“Hadoop represents a paradigm shift, bringing compute and data together,” said MapR CMO Jack Norris in a recent interview. “Hadoop is one of the most significant, disruptive enterprise architectures that have come forward in my lifetime… disruptive to storage, disruptive to data warehouse, disruptive to enterprise compute stack. It’s arriving at a time when data is growing faster than Moore’s Law… and the desire to process that data in an automated fashion is growing exponentially.”
According to Norris, Hadoop is changing the data warehousing landscape, helping users lower costs by archiving data and offloading ETL processing to a Hadoop platform that, he claims, is 50 times cheaper than an enterprise data warehouse. This makes it possible to scale out as transaction volumes grow.
The developer of Hadoop-based enterprise software (MapR is short for MapReduce, the programming model originally developed by Google for processing large data sets with a parallel, distributed algorithm on a cluster) says its customers achieve a 50x cost advantage on a per-terabyte basis with Hadoop versus data warehouses. One customer reported a 10x performance improvement over traditional ETL (Extract, Transform and Load) processing and a 100x cost saving, he said.
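To make the MapReduce model concrete, here is a minimal, illustrative sketch of its two phases (map and reduce, with the framework's shuffle step in between) using the classic word-count example. This is a single-process toy, not the Hadoop MapReduce API; real jobs implement Mapper and Reducer classes and run distributed across a cluster.

```python
# Toy illustration of the MapReduce programming model (word count).
# Assumption: a real Hadoop job would express map_phase/reduce_phase
# via the Hadoop MapReduce API, not plain Python functions.
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data big insights", "big cluster"]
pairs = chain.from_iterable(map_phase(d) for d in documents)
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # -> 3
```

Because each map call touches only its own input split and each reduce call only its own key, both phases parallelise naturally, which is what lets Hadoop spread a job across commodity machines.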
The switch to online channels is driving unprecedented volumes of transaction and clickstream data, which drive up the cost of data warehouses used as staging areas for ETL processing, he said. Combining Hadoop with ETL processing opens up the prospect of a low-cost, multi-way enterprise data management hub that manages and accelerates data processing across an end-to-end Big Data analytical ecosystem.
Businesses also want to analyze new and more complex high-value data types, such as clickstream and un-modeled multi-structured data, to add new insights to what they already know. Norris said Hadoop is designed to handle a wide variety of data that traditional data warehousing solutions cannot. It can also serve as a long-term store for Big Data and for archived data warehouse data, and as an analytical platform for workloads that are unlikely to run in traditional data warehouses.
The important thing about data warehouse offload is that it is not necessarily an either/or situation, he said. It is not about ripping out existing infrastructure, but about a platform that can be deployed alongside it. MapR believes Hadoop will become the high-scale enterprise hub from which most data management activities and analyses will either integrate or originate, he said.
Norris paints a rosy picture for Hadoop, but the current enthusiasm has not yet translated into broad corporate acceptance. According to new data from Gartner, an “intractable third of the marketplace” (the companies with no plans to invest in big data projects) remained essentially unchanged from 2012’s results. The only significant shift was a sharp decline in respondents who did not know whether their company had a big data plan: from 11 percent in 2012 to 5 percent in 2013.
We are in the early moments of Hadoop being accepted as a vehicle for moving forward, said Gartner analyst Merv Adrian, but there are big gaps in security and governance. “We have to have confidence about the security of the platform, that’s the number one issue.”
Hadoop has a long way to go before meeting the standards set forth by enterprise-grade software, according to MapR chief application architect Ted Dunning. “We’re seeing that open software is moving from the science fair phase where the people who were using it were the very early adopters…into an enterprise area, and the expectations are becoming very different: the expectations for continuity, maturity of implementation, [and] level of support. All of those notch up a lot when you sell to large organizations and when their business depends on it,” the executive noted.
There may be issues troubling enterprises, but the market data indicates that waiting is not going to be one of them. Another survey reports that the Hadoop market — performance monitoring software, management software, application software and packaged software — will reach $13.9 billion by 2017, with North America leading the way. The North American market accounted for 53.85% of the total in 2012, at $0.84 billion, and is expected to reach $6.92 billion by 2017.