Metadata | Daniel Antal's homepage

Workshop on Metadata Sharing for Small Labels, Libraries, and Collectors

Fri, 12 Sep 2025 15:00:00 +0200

Join us on 12 September 2025 at the House of Hungarian Music, Budapest for a hands-on workshop on how small labels, music libraries, and private collectors can connect their catalogues and archives to the new Open Music Europe / OpenMusE data sharing space.

We will show how the Slovak and Hungarian music data spaces — federated through the Open Music Observatory — make it easier to:

Share and repair metadata across archives, libraries, labels, and streaming platforms
Use identifiers (ISRC, ISWC, VIAF, etc.) to improve visibility and royalty distribution
Manage voluntary deposits and digital surrogates for private collections
Connect local catalogues to international platforms like Spotify, YouTube, Wikidata, MusicBrainz

Who should attend?

Independent and small labels seeking better visibility in digital distribution
Music libraries and archives aiming for cross-platform metadata integration
Private collectors interested in digitising and sharing their holdings responsibly

Why attend?

Learn how open-source tools like Wikibase make metadata sharing affordable and sustainable
Discover how services such as Unlabel help bring hidden catalogues into global circulation
Network with peers from Hungary, Slovakia, and beyond who face similar challenges

The event language is Hungarian, with support available in English. Further reading: A szlovák adatkicserélési tér magyarországi föderációjának lehetőségei; Federating the Slovak Music Dataspace: Replication in Hungary

Practical details

📅 Date: Friday, 12 September 2025
📍 Location: House of Hungarian Music, Városliget, Budapest
🕑 Time: 10:00–16:00 (followed by informal networking)

Participation is free, but registration is required as places are limited. On the registration form, please write a few sentences about what you collect, what types of collections you manage, and what your primary interest is.

👉 Register here (link to be added)

Federating Music Library Data in Hungary – A Call to Action

Mon, 21 Jul 2025 08:45:00 +0200

How can music library services be modernized to compete with platforms like Spotify, YouTube, or Apple Music Classical? How can we make it easy for music students, educators, amateurs, or professional musicians to find the sheet music of a piece that interests them? And how can schoolchildren explore the music of their town or region—with the help of local musicians, teachers, or librarians—given the limited financial resources of university and public libraries worldwide?

Our tools are open-source and free to test, and we are happy to support those interested in exploring them. We are also planning a meetup in Budapest at the end of August or beginning of September to discuss these ideas further with IAML Hungary members.

🇭🇺 Read this post in Hungarian.

Daniel Antal, co-founder of Reprex, originally presented at the IAML 2025 Congress alongside librarian Anna Mester and Anna Žilková, head of IAML Slovakia. Their presentation and poster explored the legal, organizational, and information science aspects of the data-sharing infrastructure behind the Slovak Comprehensive Music Database (SKCMDb). A year earlier, the Hungarian professional community encountered this work at the Networkshop 2024: Digital Transformation of Education, Research, and Public Collections conference, in the talk and paper titled A szlovák adatkicserélési tér magyarországi föderációjának lehetőségei [Opportunities for Federating the Slovak Data Exchange Space in Hungary].

At the IAML Congress, we invited international partners to test, critique, and help develop a live demo service. Thanks to Salzburg’s geographical and cultural proximity, we were joined by many Hungarian colleagues.

The Reprex team is explaining our collaborative work to music metadata experts in Salzburg. Photo: Anna Žilková.

Our project’s ambitious goal is to make all music created within the territory of present-day Slovakia accessible through a semantic database. This includes a user-friendly graphical interface for individuals and API access for libraries and other institutions. The database connects known works and their variants, manuscript and published scores, and demo, archival, and commercial recordings. It also links composers and performers to secondary sources in libraries and archives that provide musicological context.

In Slovakia, we are integrating materials from the Music Fund, the Slovak Music Centre, and the SOZA database (sister organization to Artisjus). This enables us to substantially improve the data quality of public music library catalogues, which often lack detailed, work-level descriptions of scores. Metadata is frequently incomplete, outdated, or even incorrect. For example, a catalogue may refer to a composer’s “collected works” or “selected sonatas,” but rarely indicate which edition contains a particular sonata or movement a listener may have just encountered on a streaming playlist.

Our poster, presented by Daniel Antal, Anna Márta Mester (librarian-data steward), and Anna Žilková (chairperson of IAML Slovakia) on 8 July 2025 at IAML 2025. You can download our poster in PDF here.

Our Salzburg presentation explored key questions:

How does data governance work between private and public organizations—such as between a rights management society or streaming platform and a university or public library?
How can we reduce redundant workflows through data exchange?
How can deeper, semantically enriched search improve music library services?
What transferable insights from information and library science can help libraries, archives, museums, and private actors mutually improve their data?

The development of this system is supported by the European Union’s Horizon Europe programme, through the Open Music Europe project. Our aim is to offer a federated, decentralized alternative to the centralized but never built European Music Observatory model: one that builds on existing national, regional, and pan-European data systems in a networked way and leaves data ownership and control in place following the subsidiarity principle.

Sneak peek: http://135.181.91.51:3007/en/

If you care about interoperability, cultural equity, and the future of library relevance in the streaming era—this is your moment to get involved.

Missed IAML2025?
👉 Presentation
👉 Poster
👉 Please contact us directly to try out our system.
👉 Blogpost about the wider context of our work, relevant for IAML as a whole, not only for national chapters and individual members.

Help Us Build a Truly Inclusive European Music Observatory

Thu, 19 Jun 2025 18:45:00 +0200

Across Europe, music libraries are under pressure: greater expectations for digital services, growing metadata burdens, and increasingly fragmented infrastructure. At the same time, vital parts of our musical heritage—especially regional or minority repertoires—remain hidden from search, discovery, and policy.

Please meet us at 👉 IAML2025 in Salzburg on 7 or 8 July. Our presentation takes place in the session Music Libraries of Tomorrow: Reaching out to Wider Audiences, in the E.001 HS Thomas Bernhard room of the Mozarteum University, on 7 July 2025, 16:00–17:30. The day after, you can meet us in the Gallery for the poster session.

We initiated the Open Music Europe project because we believe that in the music ecosystem, data centralisation always fails, and a new kind of cooperation is needed: one that respects local control while enabling international reuse.

Our Slovak pilot, the SKCMDb, connects libraries, music centres, rights organisations, and platforms through a shared metadata backbone based on open ontologies. Built as a national data sharing space, it enables coordinated cataloguing and discovery across public and private systems—from streaming services and printed scores to CD loans and digital archives.

Please visit our poster and talk with our team members, Daniel Antal, Anna Márta Mester (librarian-data steward), and Anna Žilková (chairperson of IAML Slovakia), on 8 July 2025, 10:30–11:00, in the Gallery. You can download our poster in PDF here.

But we also know that cultural and music policy is not only national. It is often regional, local, or community-based. That’s why we follow the principle of subsidiarity: letting decisions and innovation happen at the lowest competent level, close to the collections and communities themselves.

Our Finno-Ugric Data Sharing Space, including the LīvMDb (Livonian Music Database), shows how even the smallest communities—without formal cultural infrastructure—can take part in high-quality metadata production and digital discovery. We provide the tools and models to empower local custodians, in their language, on their terms, and without the need for large institutional support.

Please check out the demo version of the Finno-Ugric Dataspace or read the long-form project description.

Now we invite IAML members—national libraries, regional centres, municipal collections, and independent music librarians—to join us in building a federated, decentralised European Music Observatory. One that reflects Europe’s diversity. One that reduces data curation costs and improves visibility. One that connects music libraries with the open data and open science infrastructures already transforming other sectors.

Our platform is open-source, built on FAIR principles and the European Interoperability Framework. We use tools like Wikibase, Blazegraph, Sampo-UI, and R—packaged to work for libraries with limited technical capacity.

Sneak peek: http://135.181.91.51:3007/en/

If you care about interoperability, cultural equity, and the future of library relevance in the streaming era—this is your moment to get involved.

Not present at IAML2025?
👉 Presentation
👉 Poster
👉 Please contact us directly.

Let’s ensure music libraries remain vital entry points to Europe’s rich and evolving cultural soundscape.

Metadata Groundhog Day: What a Moribund Language Can Teach Spotify and Shopify

Thu, 19 Jun 2025 18:45:00 +0200

And if you want to fix these errors, you may find that you are back to the Data Sisyphus.

When you build systems in the cloud, or in your local architecture, at some point you will realise that naming things (places, people, products) or updating their whereabouts is probably the most time-consuming, most expensive, and most error-prone workflow.

In this blogpost, we want to talk about what seems like the easiest part of a location: the name of the city, town, or village.

Mazirbe Is Missing Again

We recently built a multilingual gazetteer — essentially a reconciled database of place names — for a tiny stretch of the Livonian coast in Latvia. At first glance, this might seem like a project rooted deeply in the digital humanities.

But here’s the twist: the very same problems we tackled here are the ones plaguing the music industry, global e-commerce platforms, and enterprise software stacks.

This is Gross-Irben 👉 Lielirbe / Īra / Irben / Suur-Irben 👉 Familiar with RDF: see in TTL; Klein-Irben will be near Gross-Irben, and Irē is almost Īra!

Mazirbe is a small, big place. It definitely exists, and it is the cultural center of a small nation: the Livonians. Yet, when you are looking for clothing, music, or photographs that should come from Mazirbe in a relevant database, you often find nothing. Not even the place.

But Mazirbe exists!

Depending on the record, it might appear as:

Mazirbe (Latvian) • Irē (Livonian) • Мазирбе (Russian) • Klein-Irben (German) • Suur-Irben (Finnish-German hybrid) • Mazirbė (Lithuanian)

Meyer‘s Zeitungsatlas 050 – Russland- Gouvernement Sankt Petersburg, Esthland, Liefland, Kurland

This kind of variation isn’t just a cultural footnote — it breaks databases, mismatches search results, and silently corrupts analytics.

If you’re in music metadata, this is your “JAY Z” vs. “Jay-Z” vs. “Shawn Carter” problem.

If you’re in e-commerce, it’s “Red Crewneck XXL” vs. “Crewneck, crimson, 2XL”.

Same data structure. Same unresolved chaos.

A Gazetteer That Works Like Real Life

We created a semantic, multilingual, multiscript gazetteer for the Livonian coast. Each place entry includes:

All known name variants across time, languages, and scripts
Structured links to global authority services (Wikidata, VIAF, GeoNames)
Canonical IDs, multilingual labels, and machine-readable formats (RDF, TTL, etc.)
Context about administrative boundaries, historical changes, and source provenance
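
To make the structure above concrete, here is a minimal sketch of how such an entry and its alias lookup might be modelled. It is not the Wikibase data model itself: the `Place` class, the `place:` identifiers, and the language tags are illustrative assumptions, and only the name variants come from this post. Names with no language stated in the post are tagged `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Place:
    """One gazetteer entry: a stable ID plus every known name variant."""
    place_id: str  # hypothetical local identifier, not a real Wikibase QID
    # name variant -> language tag (None when the language is not stated)
    names: Dict[str, Optional[str]] = field(default_factory=dict)


def build_alias_index(places) -> Dict[str, Set[str]]:
    """Map each name variant, case-insensitively, to all entries that carry it."""
    index: Dict[str, Set[str]] = {}
    for place in places:
        for name in place.names:
            index.setdefault(name.casefold(), set()).add(place.place_id)
    return index


mazirbe = Place("place:mazirbe", {
    "Mazirbe": "lv", "Irē": "liv", "Мазирбе": "ru",
    "Klein-Irben": "de", "Mazirbė": "lt", "Suur-Irben": None,
})
lielirbe = Place("place:lielirbe", {
    "Lielirbe": None, "Īra": None, "Irben": None,
    "Gross-Irben": None, "Suur-Irben": None,
})

index = build_alias_index([mazirbe, lielirbe])
print(index["klein-irben"])          # {'place:mazirbe'}
print(sorted(index["suur-irben"]))   # ['place:lielirbe', 'place:mazirbe']
```

The lookup returns a set rather than a single ID on purpose: a variant such as Suur-Irben, which this post lists under both villages, stays ambiguous in the data instead of being silently resolved.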

Try us:

👉 Mazirbe / Irē / Klein-Irben / Мазирбе / Mazirbė 👉 Familiar with RDF: see in TTL

We published it using Wikibase — the same technology that powers Wikidata. It’s not just a spreadsheet; it’s a small, dynamic knowledge graph.

And we also put it into Blazegraph, so you can find all these villages, and also the music, the clothing, or photographs that come from them.

So What?

Here’s why this matters outside the northern shores of Kurzeme, or beyond the borders of Latvia:

In global supply chains, location names and vendor names drift constantly. While country boundaries are relatively stable, subnational boundary changes — counties, parishes, provinces, municipal borders — happen thousands of times per year, even within Europe.
In streaming metadata, artists get duplicated, misspelled, or transliterated inconsistently. It’s not unusual to find dozens of same-named artists in a distributor’s or rights manager’s roster.
In CRM systems, customers have multiple entries because of one diacritic. Irē becomes Ire if the user didn’t have ē installed.
In museum heritage databases and webshops, items disappear because their place of origin changed names three times since the accession record was created.
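
The diacritic and spelling drift described above can be illustrated with a small matching key. This is a sketch using only Python's standard library; the `fold` helper is a name introduced here for illustration, and a key like this is meant for finding candidate duplicates, not for replacing the stored name.

```python
import unicodedata


def fold(name: str) -> str:
    """Build a loose matching key: drop diacritics, case, spaces, and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in without_marks.casefold() if ch.isalnum())


print(fold("Irē") == fold("Ire"))             # True
print(fold("JAY Z") == fold("Jay-Z"))         # True
print(fold("Jay-Z") == fold("Shawn Carter"))  # False
```

The last line shows the limit of string normalisation: a legal name and a stage name can only be connected through an explicit alias record, which is what a curated gazetteer or authority file provides.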

Our little example was created to accompany a digital humanities publication, but it’s not just a “humanities” problem. It’s a cross-sector, multilingual, historical, bureaucratic, data problem.

And we’re all living in it.

Lessons We Took Away

Don’t fight ambiguity. Model it.
Linked data models (RDF, Wikibase) handle aliases and variants with elegance.
Small, local, curated vocabularies can scale conceptually to global systems.
Top-down standardization fails in diverse data ecosystems — context wins.
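
As a rough illustration of the second lesson, the sketch below writes one place as Turtle, with a preferred label and the other variants as alternative labels. It assumes the common `rdfs:label` and `skos:altLabel` properties and a made-up `https://example.org/` subject IRI; the Turtle actually published by our Wikibase instance contains more statements and uses its own entity IRIs.

```python
def to_turtle(subject_iri, preferred, aliases):
    """Serialise one place as Turtle.

    `preferred` is a (name, language_tag) pair; `aliases` is a list of such pairs.
    """
    def literal(name, lang):
        # Escape backslashes and double quotes inside the string literal.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"@{lang}'

    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{subject_iri}>",
    ]
    if aliases:
        alt = " ,\n        ".join(literal(name, lang) for name, lang in aliases)
        lines.append(f"    rdfs:label {literal(*preferred)} ;")
        lines.append(f"    skos:altLabel {alt} .")
    else:
        lines.append(f"    rdfs:label {literal(*preferred)} .")
    return "\n".join(lines)


turtle = to_turtle(
    "https://example.org/place/mazirbe",   # hypothetical IRI
    ("Mazirbe", "lv"),
    [("Irē", "liv"), ("Klein-Irben", "de"), ("Мазирбе", "ru"), ("Mazirbė", "lt")],
)
print(turtle)
```

Every variant keeps its language tag, so a consumer can choose the label that fits its users without losing the link to the same place.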

See It / Fork It / Repurpose It

You can explore the full Livonian Gazetteer here:

Web UI: https://reprexbase.eu/fu/Main_Page
RDF example: https://reprexbase.eu/fu/Special:EntityData/Q4429.ttl
Also check out our TextileBase project — same model, but for 19th-century Latvian shirts and skirts

If your stack includes messy location names, user-generated labels, non-English content, or legacy records — maybe this can help.

And if you feel like you’ve seen this movie before… you have.

It’s Data Sisyphus all over again.

👉 https://reprex.nl/post/2021-07-08-data-sisyphus/

Open Music Observatory Technical Report (Versioned)

Tue, 30 May 2023 00:00:00 +0000

About this Release

This report presents the first technical foundations of the Open Music Observatory.
It was written before consortium partners supplied their datasets and before real data pipelines were stress-tested.

The document outlines the Observatory’s architecture, data governance approach, and integration strategy, but it remains an early edition.

Note: This version is preliminary and should not be cited as a final technical reference. A new, data-driven edition will be released in 2025 once the Observatory has been validated with live data from partners.

Next Steps

The upcoming edition will integrate real-world metadata, copyright, and economic indicators, stress-tested through operational pipelines, and will provide a more complete technical baseline for Europe’s music data space.

How We Add Value to Public Data With Better Curation And Documentation?

Mon, 08 Nov 2021 09:00:00 +0000

In this example, we show a simple indicator: the Turnover in Radio Broadcasting Enterprises in many European countries. This is an important demand driver in the Music economy pillar of our Digital Music Observatory, and an important indicator in our more general Cultural & Creative Sectors and Industries Observatory. Of course, if you work with competition policy or antitrust, then any industry may be interesting to you, but not all of them are well served with data.

This dataset comes from a public datasource, the data warehouse of the European statistical agency, Eurostat. Yet it is not trivial to use: unless you are familiar with national accounts, you will not find this dataset on the Eurostat website.

The data can be retrieved from the Annual detailed enterprise statistics for services NACE Rev.2 H-N and S95 Eurostat folder.

Our version of this statistical indicator is documented following the FAIR principles: our data assets are findable, accessible, interoperable, and reusable. While the Eurostat data warehouse partly fulfills these important data quality expectations, we can improve on them significantly. We can also improve the dataset itself, as we will show in the next blogpost.

Findable Data

Our data observatories add value by curating the data: we bring this indicator to light with a more descriptive name, and we place it in a domain-specific context with our Digital Music Observatory and Cultural & Creative Sectors and Industries Observatory and a policy-specific context with our Competition Data Observatory and Green Deal Data Observatory. While many people in the creative sectors, or among cultural policy designers, may need this dataset, most of them have no training in working with national accounts, which involves deciphering the data codes of records that measure economic activity at a national level. Our curated data observatories bring together many available data around important domains. Our Digital Music Observatory, for example, aims to form an ecosystem of music data users and producers.

We added descriptive metadata that help you find our data and match it with other relevant data sources. For example, we add keywords and standardized metadata identifiers from the Library of Congress Linked Data Services, probably the world’s largest standardized system of library subject descriptions. This ensures that you can find relevant data around the same key term ("Radio broadcasting") in addition to our turnover data. It also allows connecting our dataset unambiguously with other information sources that use the same concept but may be listed under different keywords, such as Radio–Broadcasting, or Radio industry and trade, or maybe Hörfunkveranstalter in German, Emitiranje radijskog programa in Croatian, or Actividades de radiodifusão in Portuguese.

Accessible Data

Our data is accessible in two forms: in csv tabular format (which can be read with Excel, OpenOffice, Numbers, SPSS and many similar spreadsheet or statistical applications) and in JSON for automated importing into your databases. We can also provide our users with SQLite databases, which are fully functional, single user relational databases.
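
The three access routes can be sketched with Python's standard library alone. The column names (`geo`, `time`, `value`) and the figures below are placeholders chosen for illustration, not the real layout or values of our indicator files.

```python
import csv
import io
import json
import sqlite3

# Hypothetical tidy extract: placeholder country codes and values.
CSV_TEXT = """geo,time,value
AA,2014,100.0
AA,2015,110.0
BB,2014,50.0
"""

# 1. Read the CSV and convert the text fields to proper types.
rows = csv.DictReader(io.StringIO(CSV_TEXT))
records = [
    {"geo": r["geo"], "time": int(r["time"]), "value": float(r["value"])}
    for r in rows
]

# 2. The same records as JSON, ready for automated importing.
as_json = json.dumps(records, indent=2)

# 3. The same records in a single-user SQLite database (in memory here).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE indicator (geo TEXT, time INTEGER, value REAL)")
con.executemany("INSERT INTO indicator VALUES (:geo, :time, :value)", records)

total_2014 = con.execute(
    "SELECT SUM(value) FROM indicator WHERE time = 2014"
).fetchone()[0]
print(total_2014)  # 150.0
```

Because the table is tidy, the same three columns serve all three formats without any reshaping.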

Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. This makes the data easier to clean, and far easier to use in a much wider range of applications than the original data we used. In theory, this is a simple objective, yet we find that even governmental statistical agencies, and even scientific publications, often publish untidy data. This poses a significant problem that implies productivity losses: tidying data requires long hours of work, and if a reproducible workflow is not used, data integrity can also be compromised, because the process of tidying may overwrite, delete, or omit a data point or a label.

While the original data source, the Eurostat data warehouse, is accessible too, we added value by bringing the data into a tidy format. Tidy data can immediately be imported into a statistical application like SPSS or STATA, or into your own database. It is immediately available for plotting in Excel, OpenOffice or Numbers.
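
As an illustration of what tidying means in practice, the sketch below reshapes a "wide" table, with one column per year, into the tidy layout of one row per observation. The table, the `to_tidy` helper, and the values are hypothetical; they do not reproduce the Eurostat layout or our production workflow.

```python
# Hypothetical wide table: one row per country, one column per year.
wide = [
    {"geo": "AA", "2014": 100.0, "2015": None, "2016": 120.0},
    {"geo": "BB", "2014": 50.0, "2015": 55.0, "2016": 60.0},
]


def to_tidy(rows, id_column="geo"):
    """Turn year columns into (geo, time, value) rows; missing cells stay as None."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy.append({id_column: row[id_column], "time": int(column), "value": value})
    return tidy


tidy = to_tidy(wide)
print(len(tidy))  # 6
print(tidy[1])    # {'geo': 'AA', 'time': 2015, 'value': None}
```

Note that the missing 2015 cell survives as an explicit row with an empty value, so nothing is silently dropped during reshaping.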

Interoperability

Our data can be easily imported together with, or joined to, data from other internal or external sources.

All our indicators come with standardized descriptive metadata, and statistical (processing) metadata. See our API

All our indicators come with standardized descriptive metadata, following two important standards, the Dublin Core and DataCite–implementing not only the mandatory, but the recommended descriptions, too. This will make it far easier to connect the data with other data sources, e.g. turnover with the number of radio broadcasting enterprises or radio stations within specific territories.
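
A simple completeness check shows what "mandatory" means here. The list below reflects our reading of the mandatory properties in the DataCite Metadata Schema (version 4.x); the record itself is a placeholder, and `missing_mandatory` is a helper named for this sketch rather than part of any DataCite tooling.

```python
# Mandatory properties as we read them in the DataCite Metadata Schema 4.x.
DATACITE_MANDATORY = (
    "Identifier", "Creator", "Title", "Publisher", "PublicationYear", "ResourceType",
)


def missing_mandatory(record):
    """Return the mandatory properties that are absent or empty in a record."""
    return [name for name in DATACITE_MANDATORY if not record.get(name)]


# Placeholder record: the identifier and creator are invented for the example.
record = {
    "Identifier": "10.0000/placeholder",
    "Creator": "Example Observatory",
    "Title": "Turnover in Radio Broadcasting Enterprises",
    "Publisher": "",
    "PublicationYear": 2021,
    "ResourceType": "Dataset",
}

print(missing_mandatory(record))  # ['Publisher']
```

The recommended properties can be checked the same way with a second list; they are the ones that carry most of the linking information, such as subjects and related identifiers.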

Our passion for documentation standards and best practices goes much further: our data uses Statistical Data and Metadata eXchange standardized codebooks, unit descriptions and other statistical and administrative metadata.

Reuse

All our datasets come with standardized information about reusability. We add citation, attribution data, and licensing terms. Most of our datasets can be used without commercial restriction after acknowledging the source, but we sometimes work with less permissive data licenses.

In the case presented here, we added further value to encourage re-use. In addition to tidying, we significantly increased the usability of public data by handling missing cases. This is the subject of our next blogpost.

Are you a data user? How could we serve you better?

Shall we do some further automatic data enhancements with our datasets? Document with different metadata? Link more information for business, policy, or academic use? Please get in touch with us!

How We Add Value to Public Data With Imputation and Forecasting

Mon, 08 Nov 2021 10:00:00 +0100

Public data sources are often plagued by missing values. Naively, you may think that you can ignore them, but think twice: in most cases, missing data in a table is not missing information, but rather malformatted information. Ignoring or dropping missing values will not be feasible or robust when you want to make a beautiful visualization, or use data in a business forecasting model, a machine learning (AI) application, or a more complex scientific model. All of the above require complete datasets, and naively discarding missing data points amounts to an excessive waste of information. In this example, we continue with the not-so-easy-to-find public dataset from the previous blogpost.

In the previous blogpost we explained how we added value by documenting data following the FAIR principles and with the professional curatorial work of placing the data in context, and linking it to other information sources, such as other datasets, books, and publications, regardless of their natural language (i.e., whether these sources are described in English, German, Portuguese or Croatian). Photo: Jack Sloop.

Completing missing datapoints requires statistical production information (why might the data be missing?) and data science know-how (how to impute the missing value). If you do not have a good statistician or data scientist on your team, you will need high-quality, complete datasets. This is what our automated data observatories provide.

Why is data missing?

International organizations offer many statistical products, but usually on an ‘as-is’ basis. For example, Eurostat is the world’s premier statistical agency, but it has no right to overrule whatever data the member states of the European Union, and some other cooperating European countries, give to it. And it cannot force these countries to hand over data if they fail to do so. As a result, there will be many data points that are missing, and often data points that have wrong (obsolete) descriptions or geographical dimensions. We will show the geographical aspect of the problem in a separate blogpost; for now, we only focus on missing data.

Some countries have only recently started providing data to the Eurostat umbrella organization, and it is likely that you will find few datapoints for North Macedonia or Bosnia-Herzegovina. Other countries provide data with some delay, and the last one or two years are missing. And there are gaps in some countries’ data, too.

See the authoritative copy of the dataset.

This is a headache if you want to use the data in some machine learning application or in a multiple or panel regression model. You can, of course, discard countries or years where you do not have full data coverage, but this approach usually wastes too much information: if you work with 12 years, and only one data point is missing, you would be discarding an entire country’s 11 years’ worth of data. Another option is to estimate the values, or otherwise impute the missing data, when this is possible with reasonable precision. This is where things get tricky, and you will likely need a statistician or a data scientist on board.

What can we improve?

Consider that the data is only missing from one year for a particular country, 2015. The naive solution would be to omit 2015 or the country at hand from the dataset. This is pretty destructive, because we know a lot about the radio market turnover in this country and in this year! But leaving 2015 blank will not look good on a chart, and will make your machine learning application or your regression model stop.

A statistician or a radio market expert will tell you that you more or less know the missing information: the total turnover was certainly not zero in that year. With some statistical or radio domain-specific knowledge, you will use the 2014 or 2016 value, or a combination of the two, and keep the country and year in the dataset.
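
The simplest version of this idea can be sketched in a few lines: interior gaps are filled by linear interpolation between the nearest known neighbours, and gaps at the edges take the next or last known value. This is only an illustration of the principle with placeholder figures; it is not the time series modelling used for our published indicator, and `fill_gaps` is a helper named for this example.

```python
def fill_gaps(series):
    """Fill None gaps in a list of yearly values.

    Interior gaps are linearly interpolated between the nearest known
    neighbours; leading and trailing gaps take the next or last known value.
    """
    known = [i for i, value in enumerate(series) if value is not None]
    if not known:
        return list(series)  # nothing to anchor an estimate on
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            lo, hi = before[-1], after[0]
            weight = (i - lo) / (hi - lo)
            filled[i] = series[lo] + weight * (series[hi] - series[lo])
        elif before:
            filled[i] = series[before[-1]]  # carry the last known value forward
        else:
            filled[i] = series[after[0]]    # carry the next known value backward
    return filled


# Placeholder turnover for 2014, 2015 (missing), 2016.
print(fill_gaps([100.0, None, 120.0]))  # [100.0, 110.0, 120.0]
```

Even this crude estimate keeps the country and the year in the dataset, which is the point of the paragraph above; a proper time series model would replace the interpolation step.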

Our improved dataset adds backcasted (using the best time series model fitting the country’s actually present data), forecasted (again, using the best time series model), and approximated data (using linear approximation). In a few cases, we add the last or next known value. To give a few quantitative indicators about our work:

Increased number of observations: 65%
Reduced missing values: -48.1%
Increased non-missing subset for regression or AI: +66.67%

If your organization is working with panel (longitudinal multiple) regressions or various machine learning applications, then your team knows that not having the +66.67% gain would be a deal-breaker in the choice of models and the accuracy of estimates, KPIs, or other quantitative products. And that they would spend about 90% of their data resources on achieving this +66.67% gain in usability.

If you happen to work in an NGO, a business unit or a research institute that does not employ data scientists, then it is likely that you can never achieve this improvement, and you have to give up on a number of quantitative tools or visualizations. If you have a data scientist onboard, that professional can use our work as a starting point.

Can you trust our data?

We believe that you can trust our data better than the original public source. We use statistical expertise to find out why data may be missing. Often, it is present in a wrong location (for example, the name of a region changed.)

If you are reluctant to use estimates, think about discarding known actual data from your forecast or visualization, because one data point is missing. How do you provide more accurate information? By hiding known actual data, because one point is missing, or by using all known data and an estimate?

Our codebooks and our API use the Statistical Data and Metadata eXchange documentation standards to clearly indicate which data is observed, which is missing, which is estimated, and of course, also how it is estimated. This highlights another important aspect of data trustworthiness: if you have a better idea, you can replace our estimates with a better one.
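
To show why this labelling matters, here is a minimal sketch of an observation record that carries its status and estimation method, and of a replacement rule that never touches observed values. The `Observation` class, the plain-language status labels, and the method names are assumptions made for this example; they stand in for, and are not, the actual SDMX code lists used in our codebooks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    geo: str
    time: int
    value: Optional[float]
    status: str             # "observed", "estimated", or "missing"
    method: Optional[str]   # how an estimate was produced, if any


def replace_estimate(obs, new_value, method):
    """Return a copy with a user-supplied estimate; observed values are protected."""
    if obs.status == "observed":
        raise ValueError("refusing to overwrite an observed value")
    return Observation(obs.geo, obs.time, new_value, "estimated", method)


# Placeholder values for illustration only.
ours = Observation("AA", 2015, 110.0, "estimated", "linear interpolation")
yours = replace_estimate(ours, 108.5, "expert judgement")
print(yours.status, yours.method)  # estimated expert judgement
```

Because the status travels with each value, a downstream user can filter out estimates, swap in their own, or audit how a figure was produced.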

Our indicators come with standardized codebooks that contain not only the descriptive metadata, but also administrative metadata about the history of the indicator values. You will find very important information about the statistical method we used to fill in the data gaps, and even links to the reliable, peer-reviewed scientific statistical software that made the calculations. For data scientists, we record plenty of information about the computing environment, too; this can come in handy if your estimates need external authentication, or if you suspect a bug.

Avoid the data Sisyphus

If you work in an academic institution, in an NGO or a consultancy, you can never be sure who downloaded the Annual detailed enterprise statistics for services (NACE Rev. 2 H-N and S95) Eurostat folder from Eurostat. Did they modify the dataset? Did they already make corrections with the missing data? What method did they use? To prevent many potential problems, you will likely download it again, and again, and again…

See our blogpost, The Data Sisyphus.

We have a better solution. You can always rely on our API to import directly the latest, best data, but if you want to be sure, you can use our regular backups on Zenodo. Zenodo is an open science repository managed by CERN and supported by the European Union. On Zenodo, you can find an authoritative copy of our indicator (and its previous versions) with a digital object identifier, in this case, 10.5281/zenodo.5652118. These datasets will be preserved for decades, and nobody can manipulate them. You cannot accidentally overwrite them, and we have no backdoor access to modify them.

Get the data

How can we do better?

Are you a data user?

Shall we do some further automatic data enhancements with our datasets? Document with different metadata? Link more information for business, policy, or academic use? Please get in touch with us!

Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies

Sat, 13 Feb 2021 18:10:00 +0200

The majority of music sales in the world is driven by AI-algorithm powered robots that create personalized playlists and recommendations and help program radio music streams or festival lineups. It is critically important that an artist’s work is documented and described in a way that the algorithm can work with.

In our research paper – soon to be published – made for the Listen Local Initiative we found that 15% of Dutch, Estonian, Hungarian, or Slovak artists had no chance to be recommended, and they usually end up on Forgetify, an app that lists never-played songs of Spotify. In another project with rights management organizations, we found that about half of the rightsholders are at risk of not getting all their royalties from the platforms because of poor documentation.

But how come distributors give streaming platforms songs that are not properly documented? What sort of information is missing for the European repertoire’s visibility? Reprex is exploring this problem in a practical cooperation with SOZA, the Slovak Performing and Mechanical Rights Society, and in an academic cooperation that involves leading researchers in the field. In a manuscript co-authored with Martin Senftleben, director of the Institute for Information Law in Amsterdam, and eminent researchers in copyright law and music economics, Reprex’s co-founder makes the case that Europe must invest public money to resolve this problem, because in the current scenario, the documentation costs of a song exceed the expected income from streaming platforms.

In the European Strategy for Data, the European Commission highlighted the EU’s ambition to acquire a leading role in the data economy. At the same time, the Commission conceded that the EU would have to increase its pools of quality data available for use and re-use. In the creative industries, this need for enhanced data quality and interoperability is particularly strong. Without data improvement, unprecedented opportunities for monetising the wide variety of EU creative content and making this content available for new technologies, such as artificial intelligence training systems, will most probably be lost. The problem has a worldwide dimension. While the US have already taken steps to provide an integrated data space for music as of 1 January 2021, the EU is facing major obstacles not only in the field of music but also in other creative industry sectors. Weighing costs and benefits, there can be little doubt that new data improvement initiatives and sufficient investment in a better copyright data infrastructure should play a central role in EU copyright policy. A trade-off between data harmonisation and interoperability on the one hand, and transparency and accountability of content recommender systems on the other, could pave the way for successful new initiatives. Download the manuscript from SSRN

Our Slovak Demo Music Database project is a prime example of this. We started to systematically collect publicly available information from Slovak artists (in our write-in process) and to ask them for further, GDPR-protected data (in our opt-in process) to create a comprehensive database that can help recommendation engines as well as market-targeting or educational AI apps.

We believe that one of the problems of current AI algorithms is that they work solely or almost solely with English-language documentation, putting other repertoires, particularly those in small languages, at risk of being buried below well-documented music mainly arriving from the United States.

We are looking for rightsholders and their organizations, artists, researchers to work with us to find out how we can increase the visibility of European music.

Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies

Sat, 13 Feb 2021 11:00:00 +0000

This article, published in JIPITEC in 2022, remains one of our most cited works on copyright, metadata, and cultural policy.

The paper shows how fragmented copyright metadata undermines the visibility of European creative works, causes royalty losses for artists, and limits the ability of European industries to compete globally in emerging areas like AI training and recommender systems.

Using the music industry as a central case study, the article highlights why improved metadata and licensing infrastructures are vital. Its findings directly connect to our current projects on trustworthy AI, cultural data spaces, and fair remuneration systems.

📄 Read the published version in JIPITEC: Full text PDF
📄 Preprint version available on SSRN: SSRN abstract