Big Data – They unveiled the secret


Still in the whirlwind of the Microsoft Tech Day 2013 in Paris, where two smart guys were interviewed about Big Data and innovation.

Why were they chosen? The first is Cyrille Vincey (@cyrvin), founder of Qunb, a startup offering data visualization services that created a good buzz during Paris Le Web. The second is Antoine Durieux, CEO of Chef Jerome (@Chef_Jerome). The interview was conducted by PEG (@pegobry).

What is big data, in real, real reality? These guys set the context. Big data is not new: it is simply the principle of storing a very laaarge amount of data and being able to run queries on it. Nothing really new; even the algorithms used to manage the operations are old. The major skill required to be a significant player in this area is the ability to store a large amount of data across different servers in parallel, with reasonable scalability.

As an example, Chef Jérôme is a linker of big data: recipes on one hand, supermarket catalogs on the other. Chef Jérôme makes the link for the user, offering an aggregated view. The little magic behind it is the following: recipes are scanned, analysed and cut into basic elements such as title, picture, list of ingredients, weight and quantity; supermarket catalogs get the same treatment, and it is done. Given the amount of data manipulated, this is what can be called big data.
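
To make the linking idea concrete, here is a minimal sketch in Python. It is not Chef Jérôme's actual code: the `Ingredient` and `Product` types, the sample data and the naive exact-name matching are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ingredient:
    # One basic element extracted from a recipe (hypothetical structure).
    name: str
    quantity: float
    unit: str


@dataclass
class Product:
    # One entry of a supermarket catalog (hypothetical structure).
    label: str
    price: float


def normalize(text):
    """Lower-case and trim a name so both sides can be compared."""
    return text.strip().lower()


def link_recipe_to_catalog(ingredients, catalog):
    """Pair each ingredient with the catalog product of the same name, or None."""
    index = {normalize(product.label): product for product in catalog}
    return [(ing, index.get(normalize(ing.name))) for ing in ingredients]


# Made-up sample data, for illustration only.
recipe = [
    Ingredient("Flour", 250, "g"),
    Ingredient("Butter", 125, "g"),
    Ingredient("Saffron", 1, "pinch"),
]
catalog = [Product("flour", 1.20), Product("butter", 2.10)]

basket = link_recipe_to_catalog(recipe, catalog)
for ingredient, product in basket:
    print(ingredient.name, "->", product.label if product else "no match")
```

A real service would need fuzzier matching than an exact name lookup, but the shape is the same: cut both sources into basic elements, then join them.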

Limitation. Calculations on large amounts of data are complex. Why? Because the data is fragmented. Because a calculation takes an unpredictable time to complete. Because calculations are done asynchronously. To make sure you can use the data despite those obstacles, results are pre-calculated, which has the drawback of making real time impossible.
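
The trade-off can be sketched in a few lines of Python. This is only an illustration under assumed names (`events`, `run_batch`, `query` are hypothetical): answers come from a table computed in advance, so anything that arrives after the last batch stays invisible until the next one.

```python
from collections import Counter

# Made-up purchase log: (customer, product).
events = [("alice", "bread"), ("bob", "milk"), ("alice", "milk")]


def run_batch(event_log):
    """Pre-calculate purchase counts per product from a snapshot of the log."""
    return Counter(product for _, product in event_log)


# The slow, asynchronous job runs once and stores its result.
precomputed = run_batch(list(events))


def query(product):
    """Answer instantly from the pre-calculated table, not from the raw log."""
    return precomputed.get(product, 0)


# A new event arrives after the batch has run.
events.append(("carol", "milk"))
stale = query("milk")  # the new purchase is not counted yet

# Only the next batch makes it visible.
precomputed = run_batch(list(events))
fresh = query("milk")
```

Queries are fast because the work was done beforehand, and that is exactly why they cannot be real time.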

Solutions. Oversimplifying, there are two kinds of technical solution. The first is the distributed model, running on several servers, asynchronous and thus pre-calculated, which has the advantage of being open source. The second is based on storage on a single machine, which allows real time but has the disadvantage of being licensed software.
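
The interview did not name specific products, so the following Python sketch only shows the general pattern behind the distributed model: each server computes a partial result on its own fragment of the data, and the partial results are merged afterwards. The `partitions` lists stand in for separate servers, and all names are assumptions.

```python
from collections import Counter
from functools import reduce

# Each inner list stands in for the data fragment held by one server.
partitions = [
    ["bread", "milk", "bread"],
    ["milk", "eggs"],
    ["bread"],
]


def map_partition(rows):
    """Work done locally on one server: count products in its fragment."""
    return Counter(rows)


def merge(left, right):
    """Combine two partial results into one."""
    return left + right


partial_counts = [map_partition(rows) for rows in partitions]
totals = reduce(merge, partial_counts, Counter())
```

Here everything runs in one process; in a real cluster each `map_partition` call would run on a different machine and finish at a different time, which is where the asynchronous, pre-calculated nature comes from.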

What has changed since everyone started talking about big data?

One. Startups can talk to investors and decision makers, who feel they should be part of history by joining the big data movement. But in reality (shhh, don't tell them) big data is just a statistical tool for dummies.

Two. Companies applying big data principles to their internal information systems now have support for making micro-decisions. Big data may not help you build your innovation and strategic three-year plan, but it may help companies fine-tune some decisions by statistically analyzing the usage patterns of their employees, of their customers…

Three. Work is starting on prediction based on Big Data. By spotting weak signals, you may be able to predict some behaviors or underlying trends, or detect customer demand that you are not addressing.

Four. There is an unlimited number of ways to try to correlate data. Imagination has no limit, except processing power and the silos between the actors holding the data.

The philosophical questions. If we believe that big data tools will be deployed in all areas of business, one should ask whether this is really attractive for citizens (and their privacy), for example in the health domain. Another question: when are we going to open up the data from different sectors? A last amazing observation: why is it that statisticians do not join the movement? (Well, we can guess they might feel uncomfortable with everyone starting to play with statistics.)

Definitely a technology to monitor. Thanks again to the two guys, who were innovative and demystified the big data magic for us.

Note: Picture by h de c, under a Creative Commons license.

