
'The Good Thing about Big Data is You Don’t Have to Start Big'

Whether it’s the annual ‘hype cycle’ or fog in users’ minds, Big Data has bumped off cloud computing to top the charts. Werner Vogels, chief technology officer of Amazon and one of the Big Data generators and solution providers, was in Bangalore recently and spoke to Forbes India.

Published: Jul 25, 2013 07:34:49 AM IST
Updated: Jul 25, 2013 05:02:36 PM IST

Q. A lot of businesses use analytics and some of them confuse it with Big Data. What is the difference, according to you?
Big Data is different [from analytics] because in traditional business intelligence, you knew beforehand what kind of questions you wanted to ask and what kind of data you needed to answer them, and then you collected that data. In the world of Big Data, this is turned on its head. We don’t really know what exactly we need to ask. You just collect as much information as possible; you may combine that with other data streams. You don’t really have a data model.

Why don’t I start looking at customers in different segments, geographies; or customers that only spend time researching for books versus customers who are looking at DVDs? The data model is created on the fly. We ask, for example, how many steps a customer takes before he buys something or if the customer comes in and types in ‘Sony’, does that mean he knows what he has come to buy or is he in the research stage. This kind of understanding of the customer may help you understand that if the customer is doing these two things, he is going to buy. So you make the process easy and seamless for him. If he takes these two actions, he is on a research path—he is comparing products and so make [the comparison] easier for him.  What we have today and what we didn’t have in the past is the location information. Whether a customer is at home or if he is on the road talking about books with his friends or whether he is at a store looking at DVDs—his actions are different depending on where he is. So, understanding the customers’ behaviour is a big driver of Big Data.
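To make the "how many steps before he buys" question concrete, here is a minimal sketch of how it might be computed from a clickstream. The event format, the action names, and the function `steps_before_purchase` are assumptions made for illustration; the interview does not describe how Amazon actually does this.

```python
from collections import defaultdict

# Hypothetical clickstream, already in time order: (customer_id, action).
events = [
    ("c1", "search"), ("c1", "view"), ("c1", "compare"), ("c1", "buy"),
    ("c2", "search"), ("c2", "buy"),
    ("c3", "search"), ("c3", "view"),  # still researching, no purchase yet
]

def steps_before_purchase(events):
    """Return {customer_id: number of actions taken before the first 'buy'}."""
    counts = defaultdict(int)
    result = {}
    for customer, action in events:
        if customer in result:
            continue  # first purchase already recorded; ignore later events
        if action == "buy":
            result[customer] = counts[customer]
        else:
            counts[customer] += 1
    return result

steps = steps_before_purchase(events)
```

Customers with few steps look like buyers who already know what they want; customers with many steps, or none ending in a purchase, look like researchers who may benefit from easier comparison.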

Q. That is from a business point of view. What needs to be done on the technology front?
In reality, Big Data is not just analytics. It’s actually a whole pipeline of things. How do we collect the data, how do we get the data into this pipeline, where does this data come from: From our website, from the brick-and-mortar stores, from the devices? How do we get it to the place where we want to operate on it and how do we store it? We are talking about large amounts of information here. Does it go into a relational database, does it go into a key value store? All these different options, as a technologist, involve issues we have to think about. Then we get to the next step, how to organise the data—there are multiple data streams that come from different places, different devices, different applications. And the last piece of the pipeline is: How do we share this data? This whole pipeline requires a lot of innovation.

Q. Which segment of this pipeline has most challenges?
I think it’s the analytics; we are still looking for more tools. The more technology-savvy companies have figured it out themselves, but that cannot be said of a majority of enterprises. We see a whole new set of tools coming up this year and next. We will see tremendous innovation happening there.

We will see more platforms targeted at a particular segment. A good one in our case is a platform for life sciences companies to do Big Data. All these life sciences companies use more or less the same style of processing. A company like Unilever makes use of platforms to develop toothpastes and deodorants. You would think deodorants are just research into what smells nice, but it turns out that’s not the case. There is a lot of genomics research going on to understand the microbes, how they interact with particular genes and how they trigger sweat production. Things like these are actually Big Data genomics problems.

Again, in gum decay, it turns out that genetics plays a role in the interaction between the toothpaste and microbes. Making toothpaste is a genomics problem these days. In the past, companies like Unilever used to take two to three years; they could leisurely develop new products. But the new economy is so competitive that two to three years is no longer a possibility, so they are looking to speed up every possible part of product development.

Q. Tell us how automotive companies, especially in India, are looking at Big Data.
Let’s take Tata Motors. They are instrumenting all their trucks with GPS, with sensors, with telecom equipment to stream the data packs to the Amazon cloud for analytics, to understand not only where the trucks are but also the health of the trucks. Should we be doing preventive maintenance? If we can avoid truck breakdowns, the cost the company saves is tremendous. And so here we have thousands of trucks and that’s a Big Data problem.

Q. Do businesses understand what they can do with Big Data?
Companies in general don’t understand that the data they are sitting on can give huge benefits. For example, Heritage Network, a healthcare insurance network in the US, has data sets with 5 million claims records. They are just claims records, not health information. They have challenged data scientists to find a predictive model based on these claims records that can determine the probability of these individuals ending up in the hospital in the coming year. Based on a Big Data predictive model like that, they are able to provide preventive healthcare. Now, this is not really looking at healthcare information of individuals; it is actually looking at financial records, and this demonstrates some of the power of Big Data. It benefits both sides: individuals don’t end up going to hospital and insurance companies don’t end up paying. What we often see in the Big Data world is that it’s a win-win [situation]. It’s not a zero-sum game.

Q. Are conventional, brick-and-mortar companies interested?
I am not a part of brick and mortar. I am seeing that competition is tough and the economy is not flourishing. So, for any company, winning the hearts of customers is important, and to be more consumer-driven is important. The loyalty of customers is rapidly disappearing.

Q. Would you like to bust any myths?
The good thing about Big Data – the term is a bit misleading – is that you don’t need to start big. Can you find a way to better segment your customers? In that case, instead of targeting all your customers with advertisements, maybe you can just target a specific group. You can do it as a small project without investing too much in hardware or anything like that.
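As an illustration of how small such a first project can be, the sketch below groups customers by the product category they spend the most on, using only the Python standard library. The order records, category names, and the function `segment_by_top_category` are hypothetical; a real segmentation would likely use richer features than spend per category.

```python
from collections import defaultdict

# Hypothetical order records: (customer_id, category, amount spent).
orders = [
    ("a", "books", 20.0),
    ("a", "dvds", 5.0),
    ("b", "dvds", 15.0),
    ("c", "books", 7.5),
    ("b", "books", 3.0),
]

def segment_by_top_category(orders):
    """Group customers by the category on which they spent the most."""
    spend = defaultdict(lambda: defaultdict(float))
    for customer, category, amount in orders:
        spend[customer][category] += amount

    segments = defaultdict(list)
    for customer, by_category in spend.items():
        top = max(by_category, key=by_category.get)  # highest-spend category
        segments[top].append(customer)
    return {category: sorted(ids) for category, ids in segments.items()}

segments = segment_by_top_category(orders)
```

A campaign could then target only `segments["dvds"]` with a DVD promotion rather than advertising to every customer.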

Q. Is more data better? Statistician Nate Silver (who accurately predicted the winners of all 50 states in the 2012 US presidential election) says ‘No’.
Often, if you collect more data you will get better results. What Nate focuses on are unexpected occurrences. Some of them are very difficult to predict. They are black swans, but there are many white swans. You might be in the business of predicting black swans. For that, you will need different techniques. Big doesn’t always mean huge amounts of storage. What I mean is the more data you can collect, the better your results will be. For example, type ‘funny Amazon recommendations’ into Google and you will find that when we didn’t have enough data, we were not able to give good recommendations to our customers. Our recommendation engine would go astray. If you don’t have sufficient information, the algorithms that you use cannot produce high-quality results. It doesn’t mean terabytes of data; it means you need to have sufficient amounts of data.

Q. Is Big Data all hype, just a bubble?
One of the analysts comes up with the yearly hype cycles. I was happy that cloud computing got knocked off the top by Big Data. I think the hype cycle has more to do with you guys than with us, to be honest: how things are written about, with a focus on the future rather than on what they are at the moment. Cloud computing has reached a point where there is no longer hype. That means it’s easier for us to do business, to talk about what the values and benefits are, rather than having to deal with hype, because hype also leads to misinformation. We can focus on reality as business moves forward, and this can happen to Big Data as well. The fog clears, it starts to accelerate, and the benefits are clear.


  • Ulf Mattsson

    I agree that “The Good Thing about Big Data is You Don’t Have to Start Big”, but I think that you have to start with security in mind. Unfortunately, many organizations have rushed into Big Data focused solely on ROI, and privacy is an afterthought. Many companies are now collecting data files into Big Data environments without fully understand what specific information that is hidden in those files. In many cases they do not have the resources to analyze before collecting huge volumes of data files. I think that the CISO/CSO/CPO should also be involved. Privacy issues and risk should be discussed when using big data in this scenario. There is also a shortage in Big Data skills and an industry-wide shortage in data security personnel, so many organizations don’t even know they are doing anything wrong from a security perspective. I think that many organizations shortly will be struggling with a major big data barrier: 1. I think a big data security crisis is likely to occur very soon and few organizations have the ability to deal with it. 2. We have little knowledge about data loss or theft in big data environments. 3. I imagine it is happening today but has not been disclosed to the public. The good news is that some organizations are proactive and successfully using new approaches to address issues with security and privacy in Big Data environments. These new security approaches are required since Big Data is based on new and different architecture. Ulf Mattsson, CTO Protegrity

    on Jul 25, 2013