How to build trust into big data and AI: Share access
Should most datasets and AI technologies be made open source? Sharing access is the first of five ways to build trust into big data and AI. Anthony Buck’s weekly comment explores this in the fourth instalment of his series on trust in a world of big data and AI.
This article was written by Anthony Buck and reflects his personal analyses and opinions, rather than those of EARS.
How do you know that you are truly friends with another person? What is the point at which you realise you really have reached a level of mutual trust? For me it is when I walk into their kitchen, grab a glass and get myself some water. Actually, it is probably when I feel comfortable raiding their fridge without asking. That is when I know we have truly built trust: when they feel comfortable coming to my house and eating my food without asking, and I feel the same at theirs. Access to the kitchen is shared freely.
This moment does not happen all at once. Rather, it builds as friends get to know each other and offer access to the kitchen many times, sometimes unprompted. Sharing access builds trust, and eventually that trust just becomes part of the way the friendship works. That is the solution I am going to suggest today for how we can build trust into the way big data and AI work.
In previous comments here I have outlined why we need to build trust into big data and AI, especially as it relates to religion. First we asked: ‘Can religions trust Big Tech and governments with Big Data and AI?’ and then ‘Can religions trust themselves with big data and AI?’ So we have seen that the powerful digital technologies of big data and AI are terrifying in their scope and in their potential for misuse, whether they are in the hands of Big Tech (e.g. Google, Facebook, Amazon),[1] [2] governments,[3] [4] [5] or even religions.[6] [7] Not only is there an ongoing risk of abuse;[8] there has already been some concerning and unethical behaviour.[9] [10] [11] [12] [13] [14] [15] In the third article, we turned to ask whether there was a better way forward than tacking trust on to digital technologies as an afterthought. I argue there is: we need to build trust directly into how big data and AI technologies work. There are five things we can do to change the future of how the digital landscape operates. The first is to make these technologies share access.
Why share access?
About 500 years ago, there was a situation similar to today’s. The average person in medieval Europe did not have access to the datasets of their day. They were known as books. The wealth of knowledge that had been collected over the centuries of human history was held not in server farms and data centers but in books. Even if an average person went to church, where one might expect to find at least some of that knowledge shared, they often could not truly access it. There were three main reasons for this: 1) much if not all teaching, learning, writing, and even religious services were in Latin, not the local language;[16] 2) many people could not read at all;[17] 3) many books were still relatively expensive, because the printing press was still a relatively new technology and the books available had been copied painstakingly by hand.[18] [19] [20] One of the main features of the Protestant Reformation was the realisation that by keeping most knowledge inaccessible to the common person, a great imbalance of power had developed between the people who had the money and time to learn Latin and the common person. The reformers’ solution was to share access to that knowledge by translating the Bible into the common local language and conducting services and teaching in that language too, in addition to placing a strong emphasis on literacy for the everyday person.[21] In this sense, sharing access is not a strange new concept, but a tried and proven method to empower the everyday person against abusive imbalances of power.
Why are all datasets not already open access?
This is perhaps my most mundane suggestion for building trust into big data and AI, in part because it is already being done. Making massive datasets widely available is an important part of building trust into big data. There are already some datasets aimed at wide distribution.[22] In fact, tech firms (e.g. Google,[23] [24] [25] Amazon,[26] Microsoft[27]), central governments (e.g. UK,[28] EU,[29] and USA[30] [31]), and NGOs (e.g. World Bank,[32] WHO,[34] UNICEF[33]), among other organisations (e.g. media,[35] universities,[36] [37] the Church of England[38]), are making some of their massive datasets available to the public. Yet, while this is a great start, not all organisations are making their datasets public, and even those that do publish datasets are not opening all of them to the public.
On the one hand, not all datasets should be made public, since some may contain sensitive personal information or lack consent to use and publish the information. On the other hand, many datasets are kept private not for these reasons but because the dataset is the product that the company sells. Tech firms like Google and Facebook ultimately sell their datasets indirectly to advertisers who pay to have their ads go to targeted demographics.[39] [40]
But these are not the only possible reasons. Another, some might argue more legitimate, reason is the matter of proprietary information. Sometimes the data itself can contain information about how a technology functions that would enable anyone to copy it. Yet a more shameful motivation for failing to publish is that the datasets themselves could carry evidence of unethical behaviour. A Google ethicist was reportedly fired after raising concerns when Google pulled a publication indicating there might be bias and environmental costs to consider with the company’s AI systems, and a second dismissal followed recently.[41] [42] [43] [44]
GDPR is more about helping Big Tech get and keep your data and less the reverse
While official GDPR legislation does give every individual a set of rights over their data,[45] [46] as it stands these rights are in many senses more hypothetical than real. First, because in practice the legislation mainly applies to for-profit businesses and not to governments.[47] Second, because the arrangement of rights assumes that priority of access belongs to the organisation rather than the individual; that is, all things being equal, the laws exist to facilitate the organisation’s collection of information as much as they give the individual rights over it. For example, individuals must always initiate the request for data related to them to be reported to them or deleted,[48] but the individual is not always aware that data has been collected or that they have rights.[49] [50] [51] Nor does a deletion request mean the data has to be deleted from everywhere, just from the organisation’s European sites.[52]
Moreover, since many digital technology firms hold near monopolies in the global market and exert a self-interested shaping force over the direction in which the basic structures of society are moving (digitalisation), they can effectively make any demands they want, and the average person has little choice.[53] How do you build trust into a system if access to information is weighted so heavily in favour of the powerful? Is not trust rendered irrelevant if there is only one game in town and you have to play it or face a burdensome daily existence?
Improve GDPR to protect the public from data and AI exploitation
Thus, to begin to build trust into big data and AI, we have to truly democratise access to these powerful datasets and technologies. Just being able to access the information is not the same as being able to use it. Knowing that someone is using your data is also not the same as knowing how and why they are using it. If Big Tech, governments, and religions want or ought to build trust into how these technologies work, then the first step is sharing access in a more rigorous and expansive way. Perhaps all datasets, and the tools to read and use them, need to be made available to the public by law. If one is going to hold data on a person or a group, then maybe there should be a requirement to make all that data easily available and accessible. Instead of having to request information on what data is held and how it is being used, perhaps every institution should have to send a weekly report to those whose data they have and are using, saying what data they have collected during that time and how they have been using it. Some might suggest this could be expensive in terms of time and/or money, but if these AI technologies are as powerful, efficient, and cost-effective as we have already seen, I doubt that is really the problem.
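To give a sense of how small the technical side of such a weekly report could be, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `DataEvent` record, its fields, the `weekly_reports` function, and the sample names are hypothetical and do not describe any real institution’s system or any legal reporting format. It only shows the core step of grouping one week of logged data events by the person they concern.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataEvent:
    """One hypothetical record of an institution collecting or using personal data."""
    person: str    # whose data it is
    day: date      # when the event happened
    category: str  # what kind of data, e.g. "location"
    purpose: str   # how it was used, e.g. "ad targeting"


def weekly_reports(events, week_start):
    """Group the events of one seven-day window into a plain-text report per person."""
    week_end = week_start + timedelta(days=7)
    per_person = defaultdict(list)
    for event in events:
        # Keep only events inside the half-open window [week_start, week_end).
        if week_start <= event.day < week_end:
            per_person[event.person].append(event)

    reports = {}
    for person, items in per_person.items():
        lines = [f"Data report for {person}, week of {week_start.isoformat()}"]
        for event in sorted(items, key=lambda e: (e.day, e.category)):
            lines.append(
                f"- {event.day.isoformat()}: {event.category} used for {event.purpose}"
            )
        reports[person] = "\n".join(lines)
    return reports


if __name__ == "__main__":
    # Invented sample log; the last entry falls in the following week and is excluded.
    log = [
        DataEvent("alice", date(2021, 3, 1), "location", "ad targeting"),
        DataEvent("alice", date(2021, 3, 3), "search history", "recommendations"),
        DataEvent("bob", date(2021, 3, 2), "purchase history", "ad targeting"),
        DataEvent("alice", date(2021, 3, 9), "location", "ad targeting"),
    ]
    for report in weekly_reports(log, date(2021, 3, 1)).values():
        print(report)
        print()
```

A real system would also have to deliver these reports securely and decide what counts as a reportable event; the sketch leaves both questions open.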
Let me suggest that these kinds of steps will rebalance the power of big data and AI back towards the public. It might cause a headache for tech firms, governments, businesses, banks, religious organisations, and others, but not because it is hard for them to have AI send out a weekly automated email with a breakdown of the data collected and how it has been used in that week.[54] Just consider all the marketing newsletters you already get. The problem is not that this would be expensive or even impossible. The problem is that it would be inconvenient, mainly for those marketing your data. I think that when most people see how they have become the products being sold, or the sheer volume of information held, they will realise how great the chasm of trustworthiness has been. Trust is the best way forward and the EU sees it.[55] That is why the first step for building trust into big data and AI is to build shared access into them.
500 years ago, sharing access to knowledge and giving the everyday person the ability to access and use it was how the Protestant reformers sought to democratise power during their revolution of information and communication. They achieved this by increasing literacy and requiring church services and Bibles to be in the local spoken language. Today’s digital revolution may not need to find a new answer as much as find the courage to reapply religion’s answer in a new age. It may not be popular with those in power, just as the Reformation was not, but refusing to share access to datasets and AI technologies is like inventing the printing press, teaching no one to read, and printing everything in Latin. Sharing access is only the first part of building trust into big data and AI, yet for the average person in a digital age it is something we should not have to live without.
Next time, we will take up the theme of Sharing Development, and it is likely to be even more provocative than today’s on Sharing Access. Until then, I would love to hear in the comments how you might suggest building trust into these technologies. Oh, and go raid a friend’s fridge.
[1] Algorithms drive online discrimination, academic warns
[2] Analysis | The Technology 202: Nuns are leading the charge to pressure Amazon to get religion on facial recognition
[3] Perspective | China’s alarming AI surveillance of Muslims should wake us up
[4] Opinion | China’s Orwellian War on Religion
[5] Your body is a passport – POLITICO
[6] Iglesias brasileñas adquieren tecnología de reconocimiento facial para controlar la asistencia y emociones de sus fieles
[7] Kuzzma Inteligência Artificial
[8] EUROPEAN COMMISSION Brussels, 19.2.2020 COM(2020) 65 final WHITE PAPER On Artificial Intelligence – A European approach to excellence and trust.
[9] EU companies selling surveillance tools to China’s human rights abusers
[10] Sandra Wachter, ‘Affinity Profiling and Discrimination by Association in Online Behavioural Advertising’, Berkeley Technology Law Journal 35.2 (2020 Forthcoming). Accessed 16 Dec 2020. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3388639#
[11] Indonesian app to report ‘misguided’ religion ‘risks dividing neighbors’
[12] India’s founding values are threatened by sinister new forms of oppression | Madhav Khosla
[13] Margaret Mitchell: Google fires AI ethics founder
[14] Google fires second AI ethics researcher following internal investigation
[15] Google fires top AI ethicist
[17] The Reformation and education | Office of Religious Affairs
[18] H.E. Bell, ‘The Price Of Books In Medieval England’, The Library s4-17, no. 3 (Dec 1936): 312–332.
[19] The Art of the Book in the Middle Ages | Essay
[20] New Media New Knowledge – How the printing press led to a transformation of European thought
[21] On the Reformation’s 500th anniversary, remembering Martin Luther’s contribution to literacy
[22] Learn more about UK Biobank
[23] Open Images Dataset – opensource.google
[24] Datasets – Google Research
[25] DeepMind Research (which is owned by Google)
[26] Registry of Open Data on AWS
[27] Microsoft Azure Open Datasets
[28] Find open data – data.gov.uk
[29] Welcome | European Union Open Data Portal
[32] World Bank Open Data | Data
[33] UNICEF DATA – Child Statistics
[34] Global Health Observatory Data (GHO)
[36] https://www.uni-muenster.de/LODUM/
[37] UCI Machine Learning Repository
[38] Church of England – Research and Statistics
[39] Sandra Wachter, ‘Affinity Profiling and Discrimination by Association in Online Behavioural Advertising’, Berkeley Technology Law Journal 35.2 (2020 Forthcoming). Accessed 16 Dec 2020.
[40] Algorithms drive online discrimination, academic warns
[41] https://www.wired.com/story/behind-paper-led-google-researchers-firing/
[42] Margaret Mitchell: Google fires AI ethics founder
[43] Google fires second AI ethics researcher following internal investigation
[44] Google fires top AI ethicist
[45] Guide to the UK General Data Protection Regulation (UK GDPR)
[46] General Data Protection Regulation (GDPR) Compliance Guidelines
[47] ICO guide to GDPR: Exemptions
[48] Everything you need to know about the “Right to be forgotten”
[49] Personal Data Collection: The Complete WIRED Guide
[50] European Digital Rights (EDRi)
[51] Help Me Out – your digital rights – CBBC
[52] Google wins landmark right to be forgotten case
[53] Your Data Is Shared and Sold…What’s Being Done About It? – Knowledge@Wharton
[54] Consider the wealth of AI email assistants already on offer, including How to Use Intelligent Email Virtual Assistants in 2021
[55] Europe attempts to take leading role in regulating uses of AI