Why the Open Data and Algorithm Communities Need to Work Together

The rate at which everyday life is being augmented by artificial intelligence continues to increase rapidly. Efforts to understand the impacts of this augmentation on society are underway, but these discussions have been missing voices from all parts of the machine decision-making pipeline. That’s why, earlier this year, I brought together experts from the open data field and from the artificial intelligence policy sector to discuss what can be done to make sure that daily life isn’t negatively augmented beyond repair.

From the open data field, the Center for Open Data Enterprise, Development Gateway, and Open Data Watch joined. From the AI and algorithm policy side, Access Now, the World Wide Web Foundation, Upturn, the National Democratic Institute, Human Rights Watch, and the Center for Democracy and Technology participated.

Let’s back up first. For algorithms to learn, they need one thing: data. The more data that goes in, the more accurate the algorithm can become. Except that hasn’t always been the case. The issue with many of these datasets is that they’re inaccurate, biased, or incomplete; or, as the now-frequent phrase puts it, “garbage in, garbage out”. If a government trains an algorithm to predict areas with higher infant mortality rates, yet its data grossly under-represents women, it will not get an accurate picture, wasting resources, money, and time. Thankfully, open data organizations are starting to push for more representative data, particularly gender data.
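To make “garbage in, garbage out” concrete, here is a minimal sketch in Python. The groups, rates, and sample sizes are all made up for illustration, not real mortality data; the point is only that an unrepresentative sample produces a misleading estimate.

```python
# A minimal sketch (hypothetical groups and rates, illustrative only) of how
# an unrepresentative sample skews an estimate. We simulate a population where
# an outcome occurs at different rates in two groups, then "collect" a dataset
# that under-samples one group and compare the resulting estimates.
import random

random.seed(42)

# Hypothetical ground truth: the outcome rate differs by group.
TRUE_RATE = {"group_a": 0.02, "group_b": 0.08}

def observed_rate(n_a, n_b):
    """Return the outcome rate seen in a sample with n_a / n_b members per group."""
    outcomes = [random.random() < TRUE_RATE["group_a"] for _ in range(n_a)]
    outcomes += [random.random() < TRUE_RATE["group_b"] for _ in range(n_b)]
    return sum(outcomes) / len(outcomes)

# Representative sample: both groups are half the population.
print("representative:", round(observed_rate(50_000, 50_000), 3))  # ~0.05
# Skewed sample: the higher-risk group is barely collected at all.
print("under-sampled :", round(observed_rate(95_000, 5_000), 3))   # ~0.023
```

Any model trained on the skewed sample inherits the same blind spot: it simply never sees enough of the higher-risk group to learn anything about it.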

Now we know that accurate and representative data is one of the keys to more responsible and accountable AI. This is why I brought the open data experts to the table to discuss what they have learned and what still needs to be done. According to the open data participants, whereas the first 5-10 years of the open data movement focused primarily on opening up government datasets and on transparency, the field is now focusing on accountability and on applying open data to public services.

But open data practitioners have learned that the data being used has the potential to exacerbate inequality around the world. While over 50% of the world is now online, those who aren’t are often the poorest or, more often than not, women. This means that the majority of data being collected, whether online or through government, over-represents those who are wealthier, and likely men. That inequality carries over into the discussions taking place around AI: is there diversity in who gets to share their concerns, and do those who are affected have a voice?

This struck a chord with the AI policy experts in the room. One issue that has also arisen in their work is redress and grievance mechanisms. While redress is one of the core principles of the international human rights system, it is tricky to implement when it comes to data and AI. Some of the potential barriers to redress mentioned in the meeting came down to a lack of knowledge: if there is a system for redress, are people even aware of it, or aware that they were negatively affected by AI in the first place? Who would you even complain to? Are companies or governments under enough pressure to deal with redress? There are no clear pathways forward. Yet.

Openness of data and transparency about how algorithms are used are essential to charting a less societally harmful path. The best and most efficient way to audit an algorithm is when it is built on open data: anyone can then go in and test the data to make sure it is representative and accurate, as in the sketch below. This is why it is important for the open data movement and the responsible AI movement to continue to work together and to keep pushing for openness. During the meeting, the need for feedback loops between those providing the data and those using it for algorithms became very apparent. Understanding the end use of the data would greatly shape how that data is collected, just as understanding what data is collected would shape how the algorithm is built.
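As a sketch of what such an audit might look like, here is a small Python example. The field name, reference shares, and tolerance are all hypothetical; in practice the reference shares would come from something like a census, and the records from a published open dataset.

```python
# A minimal sketch (hypothetical field name and reference shares) of a
# representativeness audit anyone can run once training data is open:
# compare each group's share of the dataset against a reference population.

# Hypothetical reference shares, e.g. from a census (assumed values).
REFERENCE = {"women": 0.50, "men": 0.50}

# Hypothetical open training records; only the field being audited is shown.
records = [{"gender": "men"}] * 780 + [{"gender": "women"}] * 220

def audit_representation(records, field, reference, tolerance=0.05):
    """Flag groups whose share of the data drifts more than `tolerance`
    away from their share of the reference population."""
    total = len(records)
    for group, expected in reference.items():
        observed = sum(1 for r in records if r[field] == group) / total
        flag = "OK" if abs(observed - expected) <= tolerance else "MISREPRESENTED"
        print(f"{group}: data={observed:.2f} reference={expected:.2f} -> {flag}")

audit_representation(records, "gender", REFERENCE)
# women: data=0.22 reference=0.50 -> MISREPRESENTED
# men: data=0.78 reference=0.50 -> MISREPRESENTED
```

Run by the data publisher before release, the same check becomes exactly the kind of feedback loop the meeting called for: those collecting the data learn what the algorithm builders need, and vice versa.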

The responsible AI movement is going through the growing pains that the open data movement went through 10 years ago. What principles should we all adhere to? Should it be this set, or that one, or another one entirely? How can we hold governments and companies accountable? How can citizens have a voice and be represented accurately in their data? What do diversity and inclusion look like? It’s a wild time, full of uncertainty and change, but with continued collaboration and knowledge sharing between these two communities, I believe we might find some answers together.