The Rising Importance Of Text Analytics
The talk these days is all about big data, but extracting insights that lead to value is the role of analytics, not of big data alone. And since a lot of that data is textual in nature, the responsibility for delivering value falls upon text analytics. That is a big deal.
Recently, I attended the Text & Social Media Summit (which was co-located and had some joint sessions with the Customer Analytics Summit) in Cambridge, Massachusetts. As usual with such meetings, the speakers covered a broad range of topics. One thing that struck me was that the speakers and audience were primarily made up of practitioners, and that everyone wanted to learn from others so that they could not only continue doing what was successful for them, but also learn new approaches and strategies.
Let me touch upon just a few of the ideas that permeated the meeting.
Good analytics is needed to derive value from big data
One of the most thought-provoking presentations was by Gary King of Harvard University, with the challenging title of "Big Data Is Not About the Data." His thesis was that the value of big data actually lies in the analytics. One example that he used was an examination of the solvency of the U.S. Social Security Administration (SSA). The SSA had used essentially the same statistical methods for 75 years, and overall its forecasts were inaccurate, inconsistent, and overly optimistic.
Through the use of customized analytics that King's group at Harvard developed, forecasts using publicly available information showed that the SSA Trust needs over $1 trillion more than it thought. He felt that this type of analysis would also apply to the insurance industry, public health, etc. His argument on the value of analytics over data seems to be that the data was already available, but extracting value depended upon building analytical techniques that could unlock that value.
Although King makes a strong point, the answer is that both data and analytics are important. All the analytics in the world will be of no help if the data does not exist or you cannot access it. Still, King's thesis really speaks to the need for creativity in the use of analytics to take advantage of data.
Integrating structured and unstructured data concepts together is a wave of the future
Traditional analytics has tended to focus on structured data, i.e., data held in relational databases (such as analyses run against traditional data warehouses). Much of big data tends to fall into the unstructured data category (I distinguish between semi-structured and unstructured data, but I won't push the difference here).
That unstructured data tends to respond to analytical techniques such as text analytics, rather than the analytics typically applied to SQL data. That has led to the thesis that the two are separate and distinct (as well as to the thesis that non-SQL techniques will dominate). Ralph Winters of Emblem Health (and other speakers, such as IDC) vigorously disagreed with that point of view.
In his presentation "Practical Text Mining with SQL-using Relational Databases," Mr. Winters clearly showed the value of mapping unstructured data to structured data: a full-text search led to a weighted word matrix and other types of structured analyses that could be used to spot churn or conduct a sentiment analysis. Tying a relational database management system (RDBMS) to such things as a Hadoop connector, open source text mining tools, and file interfaces can lead to increased analytical richness.
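To make the idea of a weighted word matrix concrete, here is a minimal sketch of how free text might be mapped into relational rows and weighted with plain SQL. It is not Mr. Winters's method, which the presentation summary above does not spell out. It assumes TF-IDF as the weighting scheme, an in-memory SQLite database, and a very simple tokenizer. The table name term_occurrence, the function build_weighted_word_matrix, the helper py_log, and the sample comments are all hypothetical.

```python
import math
import re
import sqlite3


def build_weighted_word_matrix(docs):
    """Return {(doc_id, term): weight} using TF-IDF computed in SQL.

    Assumption: weight = term count in the document * ln(N / document frequency).
    Documents that yield no tokens are not counted in N.
    """
    conn = sqlite3.connect(":memory:")
    # Register a natural-log helper, since not every SQLite build ships math functions.
    conn.create_function("py_log", 1, math.log)
    conn.execute("CREATE TABLE term_occurrence (doc_id INTEGER, term TEXT)")

    # Step 1: turn unstructured text into structured rows, one row per word occurrence.
    for doc_id, text in docs.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        conn.executemany(
            "INSERT INTO term_occurrence VALUES (?, ?)",
            [(doc_id, token) for token in tokens],
        )

    # Step 2: compute the weights with ordinary relational operations.
    rows = conn.execute(
        """
        WITH tf AS (
            SELECT doc_id, term, COUNT(*) AS n
            FROM term_occurrence
            GROUP BY doc_id, term
        ),
        df AS (
            SELECT term, COUNT(DISTINCT doc_id) AS docs_with_term
            FROM term_occurrence
            GROUP BY term
        ),
        corpus AS (
            SELECT COUNT(DISTINCT doc_id) AS n_docs FROM term_occurrence
        )
        SELECT tf.doc_id, tf.term,
               tf.n * py_log(1.0 * corpus.n_docs / df.docs_with_term) AS weight
        FROM tf
        JOIN df ON df.term = tf.term
        CROSS JOIN corpus
        """
    ).fetchall()
    conn.close()
    return {(doc_id, term): weight for doc_id, term, weight in rows}


if __name__ == "__main__":
    # Hypothetical customer comments, invented for illustration only.
    comments = {
        1: "I want to cancel my plan",
        2: "Great service, love my plan",
        3: "Cancel now, terrible service",
    }
    matrix = build_weighted_word_matrix(comments)
    for (doc_id, term), weight in sorted(matrix.items()):
        print(doc_id, term, round(weight, 3))
```

Words that appear in only one comment get the highest weight, while a word that appeared in every comment would get a weight of zero. Once the weights sit in a table, they can be joined to customer records like any other structured data, which is the kind of integration the presentation argued for.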
The whole field of text (and other) analytics continues to evolve. Integrating analytic concepts that have traditionally been applied with structured data along with techniques that have traditionally been applied to unstructured data shows great promise.
Text analytics has a large number of practical uses
Many examples were discussed during the summit, so I hesitate to focus on just one presentation, but Dr. Sergei Ananyan of Megaputer Intelligence did a good job of discussing the business applications of text analytics. His first title slide read Bisunses Appliltacons of Txet Anlatycis, which clearly showed that humans can still make sense of scrambled text and recover the correct title, Business Applications of Text Analytics.
In the 21st century, text analytics is taking advantage of machine learning, semantic analysis, and deep linguistic parsing. All that can lead to useful applications, such as loan default analyses and sentiment analyses. One of the more important areas is medical diagnostics, where early diagnostics can eliminate a common source of error. Another is the use of text analytics in eDiscovery, which is the examination of electronic information for evidence in a legal case. These are just a few examples, but they illustrate the range of areas to which text analytics can be applied.
For most of our lives, we have been subject to an application-driven software intelligence perspective of IT, where applications have dominated our sense of where the value in IT comes from. So a data-driven software intelligence perspective, such as big data, where value in IT is squeezed from the data itself, is not only unfamiliar and hard to comprehend, but also a little uncomfortable. Yet the world of data-driven software intelligence is the world of text analytics, and it will transform our view of how to get value from the IT infrastructure.
I could only touch upon a few points, such as the value of analytics, the evolution of unstructured data into a more structured world, and the breadth of text analytics. Pay attention to what is happening, as it will affect your business life more and more. Meetings such as the Text & Social Analytics Summit are one conduit to learning more about that impactful technology.
The Mesabi Group (www.mesabigroup.com) helps organizations make their complex storage, storage management, and interrelated IT infrastructure decisions easier by making the choices simpler and clearer to understand.
NOTE: This column was originally published in the Pund-IT Review.