“Big data, huh, what is it good for?…

The mood of this week’s Hadoop Summit has felt wonderfully diverse. There is a cognitive disconnect between the incremental progress of dot-release feature sets and the revolutionary new business and societal applications of the technology. In the same keynote session the topics can swerve from optimizing cluster utilization to optimizing marketing yields to finding a cure for cancer. The technical lectures were packed, while the expo floor was focused. A loud rock and roll string trio in (unnecessarily short) black dresses exits the stage to be replaced by serious talk of open-source projects and community. One day a presenter will explain how clever they are to be able to apply pervasive surveillance of drivers for more profits; the next day a keynote is focused on rallying the audience to develop their ethics and fight the forces of ignorance. Is the Hadoop Summit about vendor announcements or business outcomes? Is it about infrastructure or applications? Is it “sex sells” or cubicle contributions? Is it profit or politics? Ultimately, it seems the big data industry is struggling to define who will benefit from our new technologies, and how. If I had to pick some winners and losers, here would be my list. To read the complete article, CLICK...

Psycho query: qu’est-ce que c’est?

Any Talking Heads fans reading this blog? Take any French classes in high school? No? Never mind then. I get asked a lot about SQL on Hadoop, and I know what you’re thinking: “this guy must have the coolest friends and go to all the best parties.” And you’re right, I do. Lenny Kravitz by a rooftop pool in Vegas. Fitz and the Tantrums. Duran Duran. The Astoria Middle School Marching Band on Loyalty Day. (10-year-old daughter with a shiny new flute…) What were we talking about? Oh right: how do you find the important and relevant information you need in a very random data lake? You use SQL on Hadoop. Way easier for the average user than trying to code up the same functionality in MapReduce, at least if you are a DBA, analyst, or BI developer already familiar with SQL. Which, according to our latest study, more than half of you are. If we take the respondents (shown below) as representative of those engaged in big data and analytics initiatives, then you’re probably going down this road now. To read the complete article, CLICK ON AUTHOR’S...

Schrödinger’s Cat and Analytics Accessibility

Everyone loves the concept of Schrödinger’s cat, with the possible exception of a few serious PETA members. The metaphor that an entrapped feline can be both poisoned and not poisoned until directly observed is a catchy way to understand uncertainty around various possible states and outcomes. I see a similar problem with big data. Companies are going to great lengths to gather data and build sophisticated workflows to analyze it all. The goal, of course, being nothing less than omniscience about the business. Intimately know every project, every product, every customer, every dollar. The problem is sometimes the black box isn’t open yet. Access to the information is limited. Business decisions are still guesses. The promised insight remains elusive. We are no closer to seeing if that poor cat is dead or alive. To read the complete article, CLICK...

EMC Dips Deeper Into The Shallow End of The Data Lake
Mar23

Barely a month after making its first big splash in the Data Lake, EMC is back at it with an all-in-one Big Data analytics solution (hardware, software and services), with availability and pricing to be determined later. The Federation Business Data Lake packages storage and Big Data analytics technologies from EMC Information Infrastructure, Pivotal, and VMware, together with services, to accelerate and automate deployment of Data Lakes and clear ‘the path for new insights and disruptive differentiation.’ It greatly simplifies the massively complex task of building a Data Lake and is designed for speed, self-service and scalability for the enterprise, enabling organizations to begin making better-informed business decisions using Big Data analytics, said Aidan O’Brien, Senior Director, EMC. “This is the industry’s first enterprise-grade Data Lake,” he said. “It seems like everybody’s use case varies slightly,” stated O’Brien. “So what we’ve tried to do is to increase standardization and flexibility.” No company has all the technologies required (EMC, for example, lacks a visualization component), he said, but what they’ve done is provide some ready-made apps, a data scientist in a box, that can be implemented in as little as seven days. For the more adventurous, there’s also a software development kit.
According to Gartner, a Data Lake is a collection of storage instances of various data assets additional to the originating data sources. These assets are stored in a near-exact, or even exact, copy of the source format. The purpose of a Data Lake is to present an unrefined view of data to only the most highly skilled analysts, to help them explore their data refinement and analysis techniques independent of any of the system-of-record compromises that may exist in a traditional analytic data store (such as a data mart or data warehouse).
EMC gets more granular, defining the Data Lake as a modern approach to data analytics that takes advantage of the processing and cost advantages of Hadoop. It allows you to store all of the data that you think might be important in a central repository as is, and by leaving the data in its raw form, you don’t need a pre-determined schema or ‘schema on load’. Schema on load is a data warehousing process that optimizes a query, but also strips the data of information that could be useful for analysis. This flexibility then allows the data lake to feed all downstream applications such as a data warehouse, analytic sandboxes, and other analytic environments. It’s still early days for the Data Lake concept, but Suresh Sathyamurthy, Sr. Director, Product Marketing for EMC’s Emerging Technologies Division, said his company was the market share leader, even though...

The Federation Business Data Lake and the “One Pile Method”

EMC and its federation sister companies (Pivotal, VMware, and RSA) have now worked together to make the one pile, ahem, business data lake something that actually works for the business. Their efforts are worthy. They make big data not just feasible, but something you actually want to live with. To read the complete article, CLICK HERE