Friday, April 12, 2013

The Big Hype Begins for Big Data

Back in May 2012, we started looking into "big data" as it began making the rounds of university alumni magazines, technomags, and the occasional CEO speech, as with HP. In this construction, big data was a stepchild of "cloud computing," the other big marketing concept.

You can be sure that big data has arrived as a big-time concept when academics jump in and, as the New York Times reports, they have done so in a big way.

If you want to see how foggy the concept is, check out this video from the Institute for Data Sciences and Engineering, Fu Foundation School of Engineering at Columbia University. Between them, Professor McKeowen, one of the heads of the IDSE, and Dean Goldfarb stumble around the edges without being able to cogently describe what exactly the Institute will be teaching and training practitioners to do.

The Times reports today that data scientists will be among the most highly recruited graduates in the coming years. Among the leading commercial companies employing "big data" concepts are Google and Amazon. Most of what these future graduates will be working on will be fairly boring and routine. They will become like the "Microserfs" who worked on the early code for Microsoft Office: important work, but mind-numbing. Of course, many of those engineers got wealthy through stock options in the then-smaller company.

On the other hand, applying statistical, computer science, and decision science concepts to bioinformatics and personalized medicine will be interesting, but the data scientists will probably function like grunts on a multidisciplinary team headed up by physicians or biological researchers.

Mike Loukides on the O'Reilly Radar has a good introduction to the broad contours of data sciences.

Coincidentally, about two weeks ago, I finished an interesting book by Samuel Arbesman, "The Half-Life of Facts." Dr. Arbesman's Ph.D. is in computational biology, but his book covers a collection of interrelated issues and makes good reading.

Arbesman warns against the "big data" hype by saying that the utility of oceans of data for determining the optimal pattern of traffic light timing in Rio ("Smart Cities") is rather limited. The thorniest problems will require "long data," i.e. time series and a different set of analytics.

If you're buying HP, Cisco, IBM, VMware or other stocks for the big data play: relax, you're WAY early.
