Trustworthy AI: Check Where the Machine Learning Algorithm is Learning From


We do care what our children learn, but we do not yet care what our robots learn from. One key idea behind trustworthy AI is that you verify which data sources your machine learning algorithms can learn from. As we have emphasised in our forthcoming academic paper and in our experiments, a key reason you see too few artists from small countries, or too few womxn, in the charts is that big tech recommendation systems and other autonomous systems learn from historically biased or patchy data.


A key mission of our Digital Music Observatory, our modern, subjective take on what the future European Music Observatory should look like, is to provide high-quality data not only on the music economy, the diversity of music, and the audience of music, but also on metadata. The quality, availability, and interoperability of metadata (information about how the data should be used) are key to building trustworthy AI systems.
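To make the metadata quality point concrete, here is a minimal sketch of a completeness check on a single track record. The field names are hypothetical (real music metadata standards such as DDEX are far richer); only ISRC, the International Standard Recording Code, is a real-world identifier.

```python
# Hypothetical required fields for a track record; real standards define many more.
REQUIRED_FIELDS = ["title", "artist", "country", "language", "isrc"]

def metadata_completeness(record: dict) -> float:
    """Return the share of required fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)

# A made-up record with two gaps: no language tag, no ISRC.
track = {"title": "Mélange", "artist": "Example Band",
         "country": "SK", "language": "", "isrc": None}
print(metadata_completeness(track))  # 0.6
```

A system that learns from records like this one has no way of knowing the track is Slovak-language, which is exactly how patchy metadata turns into biased recommendations.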

Traitors in a war used to be executed by firing squad, and it was a psychologically burdensome task for soldiers to have to shoot former comrades. When a 10-marksman squad fired eight blanks and two live rounds, the traitor would be 100% dead, and the soldiers would walk away with a semblance of consolation in the fact that each had an 80% chance of not having been the one who killed a former comrade. This is a textbook example of assigning responsibility and blame in systems.

AI-driven systems such as the YouTube or Spotify recommendation systems, the shelf organization of Amazon books, or the workings of a stock photo agency come together through complex processes, and when they produce undesirable results, or, on the contrary, improve life, it is difficult to assign blame or credit [..] If you do not see enough women on streaming charts, or if you think that the percentage of European films on your favorite streaming provider, or of Slovak music on your music streaming service, is too low, you have to be able to distribute the blame in more precise terms than just saying "it's the system" that is stacked against women, small countries, or other groups. We need to be able to point the blame more precisely in order to effect change through economic incentives or legal constraints.
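The firing-squad arithmetic above can be checked with a short simulation: with 2 live rounds distributed at random among 10 rifles, each soldier's chance of having fired a live round is 2/10 = 20%. This is a sketch for illustration only; the function name and parameters are ours.

```python
import random

def prob_fired_live(n_soldiers=10, n_live=2, trials=100_000, seed=42):
    """Estimate the chance that one given soldier's rifle held a live round."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Deal the live rounds to n_live rifles chosen at random.
        live_rifles = rng.sample(range(n_soldiers), n_live)
        if 0 in live_rifles:  # follow soldier number 0
            hits += 1
    return hits / trials

# Exact value is n_live / n_soldiers = 0.2, so each soldier walks away
# with an 80% chance of not having fired a live round.
```

The point of the analogy is that diffusing responsibility across many components, whether rifles or algorithms, is precisely what makes blame hard to assign afterwards.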

Assigning and avoiding blame: read the earlier blogpost here.

This is precisely the type of work we are doing with the continued support of the Slovak national rightsholder organizations. In our work in Slovakia, we reverse-engineered some of these undesirable outcomes. Popular video and music streaming recommendation systems have at least three major components based on machine learning. The problem is usually not that an algorithm is nasty and malicious; algorithms are often trained through "machine learning" techniques, and the machines often "learn" from biased, faulty, or low-quality information. Our Slovak musicologist data curator, Dominika Semaňáková, explains how we want to teach machine learning algorithms to learn more about Slovak music in her introductory interview.

Read more about our Slovak music use case here.

These undesirable outcomes are sometimes illegal, as they may go against non-discrimination or competition law. (See our ideas on what can go wrong in Music Streaming: Is It a Level Playing Field?) They may undermine national or EU-level cultural policy goals, media regulation, child protection rules, and fundamental rights protections against unjustified discrimination. They may also mean that Slovak artists earn significantly less than American artists.

In our academic (pre-print) paper, we argue for new regulatory considerations to create a better and more accountable playing field for deploying algorithms in quasi-autonomous systems, and we suggest further research to align economic incentives with the creation of higher-quality and less biased metadata. The need for further research on how these large systems affect fundamental rights, consumer or competition rights, or cultural and media policy goals cannot be overstated.

Incentives and investments into metadata

The first step is to open up and understand these autonomous systems, and this is our mission with the Digital Music Observatory: a fully automated, open-source, open-data observatory that links public datasets to provide a comprehensive view of the European music industry. It produces key business and policy indicators, as well as data for research experiments, following the data pillars laid out in the Feasibility study for the establishment of a European Music Observatory.
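The dataset-linking idea can be sketched in a few lines: join two country-level tables on a shared key and derive an indicator. The numbers and variable names below are made up for illustration; the observatory's real pipelines draw on actual open datasets.

```python
# Hypothetical country-level inputs (all figures invented for illustration).
streams = {"SK": 120, "CZ": 340, "AT": 410}      # streams, millions
population = {"SK": 5.4, "CZ": 10.5, "AT": 9.0}  # inhabitants, millions

# Link the two tables on their shared country codes and derive an indicator.
streams_per_capita = {
    country: streams[country] / population[country]
    for country in streams.keys() & population.keys()
}
```

Even a toy join like this shows why interoperable keys (here, country codes) matter: without a shared identifier, the datasets cannot be linked at all.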

Join our Digital Music Observatory as a user, curator, developer or help building our business case.

Join our open collaboration Music Data Observatory team as a data curator, developer, or business developer. More interested in antitrust, innovation policy, or economic impact analysis? Try our Economy Data Observatory team! Or does your interest lie more in climate change mitigation or climate action? Check out our Green Deal Data Observatory team!

Read More on Data & Lyrics

This blogpost appeared originally on our blog, Data & Lyrics.
