Steve Lohr on the Origins of the Term “Big Data”

Data hounds will appreciate reading Steve Lohr’s concise but informative article in the February 1 edition of the New York Times, in which he takes a look at the origins of the moniker “big data.” It’s fun insofar as the term has drifted into common parlance after being mentioned here and there, but it may not be so easy to find a single individual to credit for its creation. The first time I ever regarded it seriously was when it appeared in an NBER Working Paper that addressed future career opportunities for economists in big data (I’ll add the cite once I track it down again).

It reminds me of a local story involving moniker-manufacturing on a grand scale. During the late 1970s, the Oakland-Berkeley regional newspaper East Bay Express published an article by humorist Alice Kahn. In the article, Ms. Kahn coined the term “Yuppie.” So far as anyone could tell, she was the first person to use the term, which meme-exploded across the USA in a few months. In subsequent issues of the Express she turned it into an ongoing gag, because everybody she knew kept telling her, “We think you should sue” for rights to the term. Humor being an “open source” product first and foremost, she didn’t sue, but did “work it” for what it was worth.

Back to big data. Here’s a quote from the article, given by Fred R. Shapiro, Associate Librarian at Yale Law School and editor of the Yale Book of Quotations:

“The Web…opens up new terrain. What you’re seeing is a marriage of structured databases and novel, less structured materials. It can be a powerful tool to see far more.”

This is exactly the point that Autonomy and other e-discovery firms such as Recommind make: to analyze the full output of a given company or legal case, you now have to look at all of the data. That includes the easier-to-parse world of structured data, but more and more it also includes social media, email, recorded telephone conversations and many other casual (but critical) information resources.