Speech Technology Magazine


What Made AI Mainstream, Part 1: Big Data (Video)

Paco Nathan of O'Reilly Media's R & D Group discusses the role of big data in the commoditization of AI in this clip from SpeechTEK 2018.
By The Editors of Speech Technology - Posted Nov 2, 2018


Read the complete transcript of this clip:

Paco Nathan: With AI, there were really three factors that came together that enabled commercialization. The first thing is Big Data, and the next is Big Compute, and the third is Big Models.

To give you a little bit of background, big data really started out in Q3 of 1997. There were four teams struggling with the same problem: the teams at what would become Google, at Yahoo, at eBay, and at Amazon, which was already going. So, four teams, and they've all documented it. Greg Linden at Amazon, Randy Shoup at eBay, and on and on; they've documented what they were thinking and how they approached it.

Basically, they'd come to the conclusion that eCommerce was about ready to really take off, but at the time they were running on single servers, or close to it. eBay in '97 was four servers. All of eBay. What they realized in Q3 was that they had to get ready for holiday shopping, and the four teams independently distributed their workloads. They got away from buying bigger and bigger Oracle licenses and moved the workloads onto more open source approaches, Linux on commodity servers, so they could run what were basically private clouds. They took the workloads and parallelized them to spread them across a lot of little servers.

What happened was, first, those four companies were ready for Q4 of 1997, which was the first really big success for eCommerce. But then what they saw was that when you have a bunch of servers running all these eCommerce transactions, you get a lot of log files. So these companies, and others that started joining them, began aggregating the log files, bringing them back together and then running machine learning on the aggregate. From that, you could produce what came to be called data products: classifiers, indexing, recommendation systems. All of that would feed back into the web apps that were generating the data in the first place.

So you could see a kind of virtuous cycle emerging: the eCommerce transactions created data, you aggregated it, and you used machine learning to improve the smarts, the intelligence, in the eCommerce apps. That allowed these companies to become giants, but it also set the model for others to follow. It was really the origin of big data.

One of the things we see with this kind of analysis is... how many people have ever run across J-curve forecasting? I don't know if that's an obscure term, but basically, for these kinds of technologies, we see a window of about 10 to 14 years from the point where they're invented to the point where they become commoditized. And the midpoint, right in between, is where the early adopters really come in in a serious way. You generally see an inflection point there.

So, for big data, 1997 is when it started. Roll the clock forward about 12 years and you get to 2009. Am I doing my math correctly? And what you find is that's when big data first started coming into common parlance in the industry. It was about 2009. Data science and big data took off then. That's when we launched the Strata Data Conference about big data.

And then you roll the clock forward further, you get to about 2017, and you see airport advertisements for big data. So it's really commoditized by now.
