Big Data, the book by Viktor Mayer-Schönberger and Kenneth Cukier, takes its title from the authors’ term for our new ability to handle huge amounts of information and draw conclusions from it. Because big data lets us investigate far more data than in the past, the authors argue that we can stop expecting exactness and stop fixating on causation. However, the book leans heavily on the idea that correlation is good enough (the reason you would not need causation) without explaining where that idea actually applies.
Tim Harford’s article “Big data: Are we making a big mistake?” in FT Magazine states that “a theory-free analysis of mere correlations is inevitably fragile. If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down.” When you use analysis to describe something, for example how ice cream sales vary with the season or the temperature (as temperature goes up, ice cream sales go up, which is a real-world example), correlation is basically all you need. But that is because all you are trying to do is find out about something that already exists. If you want to use data to predict or forecast something, correlation alone might not be sufficient for the answer you are trying to get.
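As a rough illustration of Harford's point, here is a minimal Python sketch of the ice cream example. Every number in it is invented, and the rule inside `true_sales` (ten units sold per degree, capped at 300 by limited stock) is a hypothetical mechanism chosen only for this illustration. A forecaster who sees the mild-weather data finds a near-perfect correlation, but a straight line fitted to that data overshoots badly on a 40-degree day because the cap was never visible in the data.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def true_sales(temp_c):
    """Hypothetical mechanism: 10 units per degree, capped at 300 by stock."""
    return min(10 * temp_c, 300)


# Data the forecaster actually observed: mild days only (10 to 25 degrees).
observed_temps = list(range(10, 26))
observed_sales = [true_sales(t) for t in observed_temps]

r = pearson(observed_temps, observed_sales)
slope, intercept = fit_line(observed_temps, observed_sales)

# Extrapolate the correlation to a heat wave the data never covered.
forecast_40 = slope * 40 + intercept
actual_40 = true_sales(40)

print("correlation on observed days:", round(r, 3))
print("forecast at 40 degrees:", round(forecast_40, 1))
print("actual at 40 degrees:", actual_40)
```

The correlation describes the observed days well, which is all a descriptive question needs. The forecast fails because it depends on a mechanism the correlation knows nothing about.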
Sometimes in big data so much data has been collected that you are no longer really sampling from a population. You are actually measuring the whole population. This is called the n=all approach, which Mayer-Schönberger talks about a lot. Even then, sometimes the best way to think of the population you have in front of you (in the real world) is as one outcome drawn from a model of all the possible ways that the world could have turned out. That is to say, there are times when even having all the data from the real world is only having a sample of the possible data.
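One way to picture this is with a small simulation, sketched below under invented assumptions: a hypothetical town of 1,000 residents, each of whom buys a product with probability 0.3. The function `measure_whole_town` and its parameters are made up for this illustration. Each seed stands for one way the world could have turned out, and in each world every resident is measured, so there is no sampling error in the usual sense.

```python
import random


def measure_whole_town(seed, residents=1000, buy_probability=0.3):
    """Simulate one possible world and measure every resident in it (n=all)."""
    rng = random.Random(seed)
    purchases = [rng.random() < buy_probability for _ in range(residents)]
    return sum(purchases) / residents


# Five possible worlds, each measured completely.
rates = [measure_whole_town(seed) for seed in range(5)]

for seed, rate in enumerate(rates):
    print(f"world {seed}: purchase rate among all residents = {rate:.3f}")
```

Each printed rate is the exact figure for its own world, yet the rates differ from world to world. If the goal is to learn the underlying tendency (the 0.3 in this toy setup) rather than to describe one particular town, the complete data set still behaves like a sample.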
The chapters toward the end of the book talk about privacy issues and related concerns. This was a big takeaway and, in my opinion, a slight contradiction, considering how the earlier chapters talk about how awesome “Big Data” is and how we should find a way to jump on the “Big Data” bandwagon.
Nevertheless, I found the sections on Google's use of data for various purposes and the discussion of the value chain for the "big data" industry particularly interesting. Businesses that receive this information are now trying to put it to use and make profits.
Just as companies are looking for these quick profits, Harford explains that corporations don’t care about causation or sampling bias, which makes Mayer-Schönberger’s view of the overall mission of “Big Data” questionable. Is it just a new, or not so new, buzzword that people hear and go “gaga” over because they don’t fully understand the meaning behind it? Information is being thrown at us so fast and in such volume that we look to any opportunity or method to help us digest this ever-changing digital world.
Overall, this is a good entry-level book for a novice like myself. It helped me understand the “meat and potatoes” behind Big Data. However, there is much more to be discovered before readers can move beyond an entry-level understanding, and there are many gray areas that have to be examined before we can all confidently jump on the Big Data bandwagon.
References
Harford, T. (2014). Big data: Are we making a big mistake? Financial Times.
http://www.ft.com/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz2ziUgQIoH
Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. New York: Houghton Mifflin Harcourt Publishing.