Big Data and Apache Hadoop are racing towards respectability, and Concurrent wants both to accelerate the journey and to make it easier with two announcements: Cascading 2.5 and Cascading Lingual. Despite Hadoop’s rapid enterprise adoption, developing and building applications on it has proven difficult, said Chris Wensel, Concurrent founder and CTO. He added that the company is focused on forging a simpler path to mass Hadoop adoption by delivering a framework for building powerful, reliable, data-oriented applications that support data-driven business models quickly and easily.
“Hadoop is a mess, a total mess,” said CEO Gary Nakamura. “It’s whack-a-mole as far as solving problems.” Hence the company’s focus on making it easier for mainstream enterprises to build these applications and continue with their Big Data strategies, he said.
Concurrent, which bills itself as the enterprise Big Data application platform company, said the new release of Cascading supports the recently announced Hadoop 2 (general availability in October), including YARN (MapReduce 2.0, or MRv2). With Cascading Lingual (billed as “Hadoop for Everyone Else”), an open source project that provides ANSI-compatible SQL, enterprises using business intelligence (BI) tools such as Pentaho, Jaspersoft and Cognos can now access their data on Hadoop in a matter of hours rather than weeks. The new offerings provide a number of benefits, said Wensel, including easier migration of applications to Hadoop and the ability to create Cascading applications on the fly.
According to Gartner, Big Data is still making its way up to the Peak of Inflated Expectations, but Hadoop is well on its way to the bottom of the Trough of Disillusionment. That means that while there is still a lot of hype and confusion about Big Data, the technology is quickly becoming useful and more widely adopted.
It’s still early days for Big Data and analytics, but the future looks very bright. With organizations drowning in data and missing out on opportunities, real-time operational intelligence systems are moving from ‘nice to have’ to ‘must have for survival’, said Gartner. It predicts that analytics will reach 50% of potential users by 2014 and 75% by 2020. After 2020, we’ll be heading toward 100% of potential users and into the realm of the Internet of Everything.
Hadoop may not be the only way to address Big Data, but it is growing strongly, at a compound annual growth rate (CAGR) of 55.63% between 2012 and 2016. Hadoop-as-a-Service is growing even faster, with a CAGR of 95.16% over the same period. The Hadoop market is expected to reach $13.9 billion by 2017, led by North America, which accounted for 53.85% of the overall market in 2012.
A new IDC study, commissioned by Red Hat, found that virtually all of the companies surveyed (99%) have either deployed or plan to deploy Hadoop. About a third (32%) have already deployed Hadoop, 31% intend to deploy it in the next 12 months, and 36% plan to deploy it in more than a year.
“We’re very cognizant that Hadoop is not the be-all and end-all of Big Data,” said Nakamura. Going forward, Concurrent plans to support other computational platforms, with another platform offering already in the works, he said.
Under The Hood
Cascading 2.5, which will be publicly available soon and freely licensable under the Apache License 2.0, offers the following features and benefits:
-support for Hadoop 2 and its new features, including YARN, so Cascading users can upgrade to Hadoop 2 and seamlessly migrate their applications; Big Data applications written in domain-specific languages (DSLs) such as Scalding (Scala on Cascading), Cascalog (Clojure on Cascading) and PyCascading (Jython on Cascading) will migrate to Hadoop 2 just as easily;
-performance improvements for complex join operations, plus optimizations that dynamically partition processed data and store it more efficiently on HDFS; and,
-broader compatibility with Hadoop vendors and Hadoop-as-a-Service providers, including Cloudera, Hortonworks, MapR, Intel, Altiscale, Qubole and Amazon EMR.
Billed as “Hadoop for Everyone Else”, Cascading Lingual enables virtually anyone familiar with SQL to instantly work with data stored on Hadoop using their JDBC-compliant BI or desktop tool of choice, according to Concurrent. Offering a true ANSI-standard SQL interface, it is compatible with all major Hadoop distributions, whether on-premises or in the cloud. Use-case examples include:
-data analysts, scientists and developers can now simply ‘cut and paste’ existing ANSI SQL code to instantly access data locked on a Hadoop cluster, or to migrate applications to one;
-developers can use a standard Java JDBC interface to create new Hadoop applications, or use the Cascading APIs to build applications with a mix of SQL and custom Java, Scala or Clojure code; and,
-companies can now query and export data from Hadoop directly into traditional BI tools.