Data Security: A talk with Gregory Piatetsky-Shapiro

Published July 22, 2014  |  George Hill

As a key protagonist in the Big Data movement, Gregory Piatetsky-Shapiro has made his website KDnuggets one of the most influential places to find new information and developments in data. As a prominent figure in the data revolution, he has seen himself and his website placed on Forbes’ ‘Top Influencers in Big Data’ list.
Gregory was ahead of his time when he created KDnuggets (short for Knowledge Discovery nuggets) and has had his finger on the pulse of all things data for the past 10 years. I was interested in his ideas about how the growth of data has changed data privacy and the effects it will have in the future.
When discussing these matters, despite being one of the faces of the Big Data revolution, Gregory manages to adopt a stance that is impartial and informed. He recognises that many people see Big Data, and the data privacy questions that surround it, as a way forward for humanity: a greater way of understanding society, and a set of technologies that will push people forward. On the flip side, however, he knows that a balance needs to be struck between privacy and usable data. When exploring these concepts with Gregory, one of the main things that comes across is his deep knowledge of current trends, from general media coverage through to in-depth scholarly work. It is this kind of thoroughness that has kept KDnuggets at the centre of Big Data communities for the past decade, and the same level of detail is still in evidence today.
No discussion of the ethics of data privacy today would be complete without the NSA scandal, and Gregory tells me that recent work by students at Stanford University has shown that metadata alone can be used to identify medical conditions, financial and legal connections, and even whether somebody owns a gun. That seemingly ‘safe’ data can be used in this way shows how much information can be gathered on you without your ever typing it into Facebook or knowingly entering it on any other website.
As this kind of information can be gleaned from sources where only the metadata is visible, it could have a major impact on another emerging trend, according to Gregory: the Internet of Things.
People are relatively aware of some of the data they create: they know that when they send texts or make phone calls, data is being generated, regardless of who sees it. But when people create data they are not conscious of, it is potentially more open to misuse and reveals more about the individual. Fitness trackers, for instance, are worn throughout the day to record an individual’s general health, their levels of movement and the kinds of activity they perform. Some more advanced models even allow heart rates to be monitored continuously. In the wrong hands, this kind of information can be more sensitive than knowing what you like to buy or the places you like to go: it is a record of your life.
According to Gregory, the use and storage of this kind of information needs to be closely monitored and put only to ethical uses. If the information were made public through unsecured storage, things like medical issues and details of an individual’s private life could easily be identified.
I was also interested in hearing Gregory’s opinion on the news that, as part of Google’s recent acquisition of DeepMind, the company was required to establish an ethics board with the power to stop any project deemed unethical.
One of the interesting points he made was that a primary use of this board would be tangential to the actual regulation of Google and DeepMind: it would provide a body to define ‘what is evil?’. Although this may seem a relatively ambiguous phrasing, it comes from Google’s overall mantra on the subject, ‘don’t be evil’. After all, if there is no definition of what evil is, how can people avoid it? It is an attempt to make technological morality universal rather than subjective.
Gregory believes that most major companies currently don’t care about ethics, but that a minority of cases can have a major impact. The US, for instance, has systems in place that allow legal action to be taken against a company even if only a small minority of those affected voice a concern. A classic example is AOL’s release of supposedly anonymised data, which unintentionally allowed an individual in a huge data set to be identified. Although this was just a single person in a silo of thousands, the resulting PR was a disaster.
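The AOL incident illustrates a general mechanism worth spelling out: records stripped of names can still be re-identified by joining them with a public source on shared ‘quasi-identifiers’ such as location and age. The sketch below shows that linkage in miniature; the records, names and the `reidentify` helper are invented for illustration and are not drawn from the actual AOL data.

```python
# Illustrative sketch of re-identification by linking quasi-identifiers.
# All records below are made up for the example.

released = [
    # "Anonymised" data: names removed, but quasi-identifiers remain.
    {"user_id": 4417749, "zip": "30060", "age": 62},
    {"user_id": 5551212, "zip": "10001", "age": 29},
]

public = [
    # A hypothetical public directory that still carries names.
    {"name": "T. Arnold", "zip": "30060", "age": 62},
    {"name": "J. Smith", "zip": "90210", "age": 35},
]

def reidentify(released, public):
    """Match anonymised rows to named rows on shared quasi-identifiers."""
    matches = []
    for r in released:
        candidates = [p for p in public
                      if p["zip"] == r["zip"] and p["age"] == r["age"]]
        if len(candidates) == 1:
            # A unique match defeats the anonymisation outright.
            matches.append((r["user_id"], candidates[0]["name"]))
    return matches

print(reidentify(released, public))
```

The point of the sketch is that no single released field is identifying on its own; it is the combination, joined against an outside source, that names the ‘anonymous’ user.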
This kind of unintentional data leak is of interest in other areas, and I was curious to know whether the NSA revelations would affect the ways in which companies collect and store data in the future. Gregory believes they will have very little effect on commercial companies such as Facebook and Google. The kind of limitations the Obama administration is putting into effect will hamper only unlawful government collection, not wider data-gathering activities. He even believes that the kinds of information these companies are not allowed to ask for (race, sexual preference, credit score, etc.) can be inferred from other information held on the individual. We are seeing companies hold more and more information on their customers, and even on their potential customers. To increase trust, Gregory believes, ‘Big data requires transparency. Companies should allow people to see what companies know about them and give people some benefit from their data – this will reduce potential conflict’.
One of the elements this requires, and one that Gregory is keen to discuss, is the security of data once it has been collected. Although the majority of data is securely held, there have been several high-profile examples where this hasn’t been the case, such as Target’s well-publicised breach, in which the credit card information of over 40 million customers was stolen and shared.
With this kind of well-publicised data loss in mind, I wanted Gregory’s opinion on whether it would be a catalyst for consumers to stop sharing information so freely. He believes it won’t. The reality is that most of the information collected is not shared because people intrinsically want to share it; it is given to make the task at hand easier. For instance, Amazon’s 1-Click ordering requires that credit card numbers be held, and if a new service offers sign-up via Facebook rather than a form to fill out, people are likely simply to share their Facebook data. Gregory remains adamant that as long as people are ‘rewarded’ for sharing their information and doing so creates additional convenience, the amount of data shared will not drop.
One of the key points that ran through the entire conversation was that, despite the recent furore over the use of data and its protection, levels of sharing will continue to increase. On the NSA revelations, it is clear that changes need to be made in the way some data and personal information is collected, but in reality this will happen at national government level rather than among commercial companies. Despite being very pro-data for commercial uses, Gregory also advocates increasing the accountability of companies that hold large data sets, while improving transparency with their customers.
As a leader in this field, with a heavyweight voice in these matters and one of the most effective vehicles for communicating them to the data community, it is good to see that Gregory takes these issues seriously while actively encouraging others to do the same. He not only discussed them with me but later sent an email outlining many of the key points, complete with references to the scholarly and popular articles he had mentioned. It is an attention to detail that defines not only him but the wider data community, and it is this level of detail that will be needed to move these issues forward.
This article appeared in the 9th issue of Big Data Innovation Magazine. Republished with permission.