<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Metadata | Antal Dániel honlapja</title><link>https://danielantal.eu/hu/tag/metadata/</link><atom:link href="https://danielantal.eu/hu/tag/metadata/index.xml" rel="self" type="application/rss+xml"/><description>Metadata</description><generator>Wowchemy (https://wowchemy.com)</generator><language>hu</language><lastBuildDate>Fri, 12 Sep 2025 15:00:00 +0200</lastBuildDate><image><url>https://danielantal.eu/media/icon_hub9491570ac57158c0eeecc95c95b13e5_20247_512x512_fill_lanczos_center_3.png</url><title>Metadata</title><link>https://danielantal.eu/hu/tag/metadata/</link></image><item><title>Workshop on Metadata Sharing for Small Labels, Libraries, and Collectors</title><link>https://danielantal.eu/hu/event/2025-09-12-magyarzenehaza/</link><pubDate>Fri, 12 Sep 2025 15:00:00 +0200</pubDate><guid>https://danielantal.eu/hu/event/2025-09-12-magyarzenehaza/</guid><description>&lt;p>Join us on &lt;strong>12 September 2025&lt;/strong> at the &lt;strong>House of Hungarian Music, Budapest&lt;/strong> for a hands-on workshop on how &lt;strong>small labels, music libraries, and private collectors&lt;/strong> can connect their catalogues and archives to the new Open Music Europe / OpenMusE data sharing space.&lt;/p>
&lt;p>We will show how the &lt;strong>Slovak and Hungarian music data spaces&lt;/strong> — federated through the &lt;strong>Open Music Observatory&lt;/strong> — make it easier to:&lt;/p>
&lt;ul>
&lt;li>Share and repair metadata across archives, libraries, labels, and streaming platforms&lt;/li>
&lt;li>Use identifiers (ISRC, ISWC, VIAF, etc.) to improve visibility and royalty distribution&lt;/li>
&lt;li>Manage voluntary deposits and digital surrogates for private collections&lt;/li>
&lt;li>Connect local catalogues to international platforms like &lt;strong>Spotify, YouTube, Wikidata, MusicBrainz&lt;/strong>&lt;/li>
&lt;/ul>
&lt;h2 id="who-should-attend">Who should attend?&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Independent and small labels&lt;/strong> seeking better visibility in digital distribution&lt;/li>
&lt;li>&lt;strong>Music libraries and archives&lt;/strong> aiming for cross-platform metadata integration&lt;/li>
&lt;li>&lt;strong>Private collectors&lt;/strong> interested in digitising and sharing their holdings responsibly&lt;/li>
&lt;/ul>
&lt;h2 id="why-attend">Why attend?&lt;/h2>
&lt;ul>
&lt;li>Learn how open-source tools like &lt;strong>Wikibase&lt;/strong> make metadata sharing affordable and sustainable&lt;/li>
&lt;li>Discover how services such as &lt;strong>Unlabel&lt;/strong> help bring hidden catalogues into global circulation&lt;/li>
&lt;li>Network with peers from Hungary, Slovakia, and beyond who face similar challenges&lt;/li>
&lt;/ul>
&lt;div class="alert alert-note">
&lt;div>
The event language is &lt;strong>Hungarian&lt;/strong>, with support available in &lt;strong>English&lt;/strong>. Further reading:
&lt;a href="https://zenodo.org/records/14640180" target="_blank" rel="noopener">A szlovák adatkicserélési tér magyarországi föderációjának lehetőségei&lt;/a>;
&lt;a href="https://reprex.nl/event/2024-06-14_eltedh/" target="_blank" rel="noopener">Federating the Slovak Music Dataspace: Replication in Hungary&lt;/a>
&lt;/div>
&lt;/div>
&lt;h2 id="practical-details">Practical details&lt;/h2>
&lt;ul>
&lt;li>📅 &lt;strong>Date:&lt;/strong> Friday, 12 September 2025&lt;/li>
&lt;li>📍 &lt;strong>Location:&lt;/strong> House of Hungarian Music, Városliget, Budapest&lt;/li>
&lt;li>🕑 &lt;strong>Time:&lt;/strong> 10:00–16:00 (followed by informal networking)&lt;/li>
&lt;/ul>
&lt;p>Participation is free, but &lt;a href="https://docs.google.com/forms/d/1AovCrxFfxFpZUmH4yRzqQNbaQiJJTvi65BaGOUnjiC8/edit" target="_blank" rel="noopener">registration is required&lt;/a> as places are limited. On the registration form, please write a few sentences about what you collect, what type of collections you manage, and what your primary interest is.&lt;/p>
&lt;p>👉 &lt;a href="#">Register here&lt;/a> (link to be added)&lt;/p></description></item><item><title>Federating Music Library Data in Hungary – A Call to Action</title><link>https://danielantal.eu/hu/post/2025-07-21-hu-music-collections/</link><pubDate>Mon, 21 Jul 2025 08:45:00 +0200</pubDate><guid>https://danielantal.eu/hu/post/2025-07-21-hu-music-collections/</guid><description>&lt;p>How can music library services be modernized to compete with platforms like Spotify, YouTube, or Apple Music Classical? How can we make it easy for music students, educators, amateurs, or professional musicians to find the sheet music of a piece that interests them? And how can schoolchildren explore the music of their town or region—with the help of local musicians, teachers, or librarians—given the limited financial resources of university and public libraries worldwide?&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>Our tools are open-source and free to test, and we are happy to support those interested in exploring them. We are also planning a meetup in Budapest at the end of August or beginning of September to discuss these ideas further with IAML Hungary members.&lt;/p>
&lt;p>🇭🇺 Read the &lt;a href="https://danielantal.eu/documents/IAML-HU/IAML.html">Hungarian version&lt;/a> of this post.&lt;/p>
&lt;/div>
&lt;/div>
&lt;p>Daniel Antal, co-founder of Reprex, presented this work at the IAML 2025 Congress
alongside librarian Anna Mester and Anna Žilková, head of IAML Slovakia.
Their presentation and poster explored the legal, organizational, and information science
aspects of the data-sharing infrastructure behind the
Slovak Comprehensive Music Database (&lt;a href="https://reprex.nl/project/skcmdb/" target="_blank" rel="noopener">SKCMDb&lt;/a>).
A year earlier, the Hungarian professional community encountered this work at
the &lt;em>Networkshop 2024: Digital Transformation of Education, Research, and Public Collections&lt;/em>
conference, in the talk and paper
titled &lt;a href="https://zenodo.org/records/14640180" target="_blank" rel="noopener">A szlovák adatkicserélési tér magyarországi föderációjának lehetőségei&lt;/a>
[&lt;em>Opportunities for Federating the Slovak Data Exchange Space in Hungary&lt;/em>].&lt;/p>
&lt;p>At the IAML Congress, we invited international partners to test, critique, and help develop a live demo service. Thanks to Salzburg’s geographical and cultural proximity, we were joined by many Hungarian colleagues.&lt;/p>
&lt;figure id="figure-the-reprex-team-is-explaining-our-collaborative-work-to-music-metadata-experts-in-salzburg-photo-anna-žilková">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="The Reprex team is explaining our collaborative work to music metadata experts in Salzburg. Photo: Anna Žilková." srcset="
/media/jpg/2025/2025-07-08_IAML_poster_hu07bed0aef01bea6c76ac00e45963c6a4_422821_6170fc05bf3aad1612dcc57b6a029550.webp 400w,
/media/jpg/2025/2025-07-08_IAML_poster_hu07bed0aef01bea6c76ac00e45963c6a4_422821_924855189df74798c90efcd10e76a3c7.webp 760w,
/media/jpg/2025/2025-07-08_IAML_poster_hu07bed0aef01bea6c76ac00e45963c6a4_422821_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://danielantal.eu/media/jpg/2025/2025-07-08_IAML_poster_hu07bed0aef01bea6c76ac00e45963c6a4_422821_6170fc05bf3aad1612dcc57b6a029550.webp"
width="760"
height="507"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
The Reprex team is explaining our collaborative work to music metadata experts in Salzburg. Photo: Anna Žilková.
&lt;/figcaption>&lt;/figure>
&lt;p>Our project’s ambitious goal is to make all music created within the territory of present-day Slovakia accessible through a semantic database. This includes a user-friendly graphical interface for individuals and API access for libraries and other institutions. The database connects known works and their variants, manuscript and published scores, and demo, archival, and commercial recordings. It also links composers and performers to secondary sources in libraries and archives that provide musicological context.&lt;/p>
&lt;p>In Slovakia, we are integrating materials from the Music Fund, the Slovak Music Centre, and the SOZA database (sister organization to Artisjus). This enables us to substantially improve the data quality of public music library catalogues, which often lack detailed, work-level descriptions of scores. Metadata is frequently incomplete, outdated, or even incorrect. For example, a catalogue may refer to a composer’s “collected works” or “selected sonatas,” but rarely indicate which edition contains a particular sonata or movement a listener may have just encountered on a streaming playlist.&lt;/p>
&lt;figure id="figure-our-poster-presented-by-daniel-antal-anna-márta-mester-librarian-data-steward-and-anna-žilková-chairperson-of-iaml-slovakia-on-8-july-2025-on-iaml-2025-you-can-download-our-poster-in-pdf-herehttpszenodoorgrecords15814286">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Our poster presented by Daniel Antal, Anna Márta Mester (librarian-data steward) and Anna Žilková (chairperson of IAML Slovakia) on 8 July 2025 on IAML 2025. You can download our poster in PDF [here](https://zenodo.org/records/15814286)." srcset="
/media/posters/IAML-reprex-poster-2025_hu29650834c1f466e3d24bbd103225d8fb_3179691_01169f53faf069cb2891f0b0e3cd648a.webp 400w,
/media/posters/IAML-reprex-poster-2025_hu29650834c1f466e3d24bbd103225d8fb_3179691_1e6b7c9ba9c6aa5d98bc5e96b49da985.webp 760w,
/media/posters/IAML-reprex-poster-2025_hu29650834c1f466e3d24bbd103225d8fb_3179691_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://danielantal.eu/media/posters/IAML-reprex-poster-2025_hu29650834c1f466e3d24bbd103225d8fb_3179691_01169f53faf069cb2891f0b0e3cd648a.webp"
width="538"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Our poster, presented by Daniel Antal, Anna Márta Mester (librarian-data steward) and Anna Žilková (chairperson of IAML Slovakia) on 8 July 2025 at IAML 2025. You can download our poster in PDF &lt;a href="https://zenodo.org/records/15814286" target="_blank" rel="noopener">here&lt;/a>.
&lt;/figcaption>&lt;/figure>
&lt;p>Our Salzburg presentation explored key questions:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>How does data governance work between private and public organizations—such as between a rights management society or streaming platform and a university or public library?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>How can we reduce redundant workflows through data exchange?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>How can deeper, semantically enriched search improve music library services?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What transferable insights from information and library science can help libraries, archives, museums, and private actors mutually improve their data?&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>The development of this system is supported by the European Union’s Horizon Europe programme through the &lt;a href="https://openmuse.eu/" target="_blank" rel="noopener">Open Music Europe&lt;/a> project. Our aim is to offer a federated, decentralized alternative to the centralized European Music Observatory model, which was never built: one that builds on existing national, regional, and pan-European data systems in a networked way and, following the subsidiarity principle, leaves data ownership and control in place.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-sneak-peak-http13518191513007enhttp13518191513007en">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sneak peek: http://135.181.91.51:3007/en/" srcset="
/media/png/skcmdb/skcmdb-library-access_huf3155c55aaf98b7cdd63ae10eaa747e0_109899_44f93947517fcf22ddbe6ef033eed7cf.webp 400w,
/media/png/skcmdb/skcmdb-library-access_huf3155c55aaf98b7cdd63ae10eaa747e0_109899_2cb506361c74ce14745f98b518ae5453.webp 760w,
/media/png/skcmdb/skcmdb-library-access_huf3155c55aaf98b7cdd63ae10eaa747e0_109899_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://danielantal.eu/media/png/skcmdb/skcmdb-library-access_huf3155c55aaf98b7cdd63ae10eaa747e0_109899_44f93947517fcf22ddbe6ef033eed7cf.webp"
width="760"
height="723"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Sneak peek: &lt;a href="http://135.181.91.51:3007/en/" target="_blank" rel="noopener">http://135.181.91.51:3007/en/&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;/td>
&lt;p>If you care about interoperability, cultural equity, and the future of
library relevance in the streaming era—this is your moment to get involved.&lt;/p>
&lt;p>Missed &lt;code>IAML2025&lt;/code>?&lt;br>
👉 &lt;a href="https://danielantal.eu/slides/20250707-reprex-iaml2025/">Presentation&lt;/a>&lt;br>
👉 &lt;a href="https://zenodo.org/records/15814286" target="_blank" rel="noopener">Poster&lt;/a>&lt;br>
👉 Please &lt;a href="https://reprex.nl/contact/" target="_blank" rel="noopener">contact us&lt;/a> directly to try out our system.&lt;br>
👉 &lt;a href="https://danielantal.eu/post/2025-07-05-iaml-2025/">Blogpost&lt;/a> about the wider context of our work, relevant to IAML as a whole, not only to national chapters and individual members.&lt;/p></description></item><item><title>Help Us Build a Truly Inclusive European Music Observatory</title><link>https://danielantal.eu/hu/post/2025-07-05-iaml-2025/</link><pubDate>Thu, 19 Jun 2025 18:45:00 +0200</pubDate><guid>https://danielantal.eu/hu/post/2025-07-05-iaml-2025/</guid><description>&lt;p>Across Europe, music libraries are under pressure: greater expectations for
digital services, growing metadata burdens, and increasingly fragmented infrastructure.
At the same time, vital parts of our musical heritage—especially regional or
minority repertoires—remain hidden from search, discovery, and policy.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Please meet us at 👉 &lt;a href="https://danielantal.eu/event/2025-07-07-iaml2025/">IAML2025&lt;/a> in Salzburg on 7 or 8 July. Our presentation takes place in the session
&lt;strong>Music Libraries of Tomorrow: Reaching out to Wider Audiences&lt;/strong> in room
E.001 HS Thomas Bernhard of the Mozarteum University on 7 July 2025, 16:00–17:30.
The following day, you can meet us in the Gallery for the poster session.
&lt;/div>
&lt;/div>
&lt;p>We initiated the Open Music Europe project because we believe that in the music
ecosystem, data centralisation always fails, and a new kind of cooperation
is needed—one that respects local control while enabling international reuse.&lt;/p>
&lt;p>Our Slovak pilot, the &lt;a href="https://reprex.nl/project/skcmdb/" target="_blank" rel="noopener">SKCMDb&lt;/a>, connects libraries, music centres, rights organisations,
and platforms through a shared metadata backbone based on open ontologies.
Built as a national data sharing space, it enables coordinated cataloguing and
discovery across public and private systems—from streaming services and
printed scores to CD loans and digital archives.&lt;/p>
&lt;figure id="figure-please-visit-our-poster-and-talk-with-our-team-members-daniel-antal-anna-márta-mester-librarian-data-steward-and-anna-zilkova-chairperson-of-iaml-slovakia-on-8-july-2025-10301100-in-the-gallery-you-can-download-our-poster-in-pdf-herehttpszenodoorgrecords15814286">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Please visit our poster and talk with our team members, Daniel Antal, Anna Márta Mester (librarian-data steward) and Anna Zilkova (chairperson of IAML Slovakia) on 8 July 2025 10:30–11:00 in the Gallery. You can download our poster in PDF [here](https://zenodo.org/records/15814286)." srcset="
/media/posters/IAML-reprex-poster-2025_hu29650834c1f466e3d24bbd103225d8fb_3179691_01169f53faf069cb2891f0b0e3cd648a.webp 400w,
/media/posters/IAML-reprex-poster-2025_hu29650834c1f466e3d24bbd103225d8fb_3179691_1e6b7c9ba9c6aa5d98bc5e96b49da985.webp 760w,
/media/posters/IAML-reprex-poster-2025_hu29650834c1f466e3d24bbd103225d8fb_3179691_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://danielantal.eu/media/posters/IAML-reprex-poster-2025_hu29650834c1f466e3d24bbd103225d8fb_3179691_01169f53faf069cb2891f0b0e3cd648a.webp"
width="538"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Please visit our poster and talk with our team members, Daniel Antal, Anna Márta Mester (librarian-data steward) and Anna Žilková (chairperson of IAML Slovakia), on 8 July 2025, 10:30–11:00, in the Gallery. You can download our poster in PDF &lt;a href="https://zenodo.org/records/15814286" target="_blank" rel="noopener">here&lt;/a>.
&lt;/figcaption>&lt;/figure>
&lt;p>But we also know that cultural and music policy is not only national.
It is often regional, local, or community-based. That’s why we follow the principle
of subsidiarity: letting decisions and innovation happen at the lowest competent
level, close to the collections and communities themselves.&lt;/p>
&lt;p>Our &lt;a href="https://reprex.nl/project/finnougricdataspace/" target="_blank" rel="noopener">Finno-Ugric Data Sharing Space&lt;/a>, including the LīvMDb (Livonian Music Database),
shows how even the smallest communities—without formal cultural
infrastructure—can take part in high-quality metadata production and digital
discovery. We provide the tools and models to empower local custodians,
in their language, on their terms, and without the need for large institutional support.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-please-check-out-the-demo-version-of-the-finno-ugric-dataspacehttpsreprexbaseeufuindexphptitlemain_page-or-read-the-long-form-project-descriptionhttpsreprexnldocumentsfufu">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Please check out the demo version of the [Finno-Ugric Dataspace](https://reprexbase.eu/fu/index.php?title=Main_Page) or read the long-form [project description](https://reprex.nl/documents/fu/fu)." srcset="
/media/png/dataspace/finnougric/Finno-Ugric-Sampo-20250705_16x9_hudbcd3518b03f17b8a68d3531b004eb1f_965309_de77c0577d5b27f353d35e18a5ecd92d.webp 400w,
/media/png/dataspace/finnougric/Finno-Ugric-Sampo-20250705_16x9_hudbcd3518b03f17b8a68d3531b004eb1f_965309_137c852e1e9d796114cc57a570eda16f.webp 760w,
/media/png/dataspace/finnougric/Finno-Ugric-Sampo-20250705_16x9_hudbcd3518b03f17b8a68d3531b004eb1f_965309_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://danielantal.eu/media/png/dataspace/finnougric/Finno-Ugric-Sampo-20250705_16x9_hudbcd3518b03f17b8a68d3531b004eb1f_965309_de77c0577d5b27f353d35e18a5ecd92d.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Please check out the demo version of the &lt;a href="https://reprexbase.eu/fu/index.php?title=Main_Page" target="_blank" rel="noopener">Finno-Ugric Dataspace&lt;/a> or read the long-form &lt;a href="https://reprex.nl/documents/fu/fu" target="_blank" rel="noopener">project description&lt;/a>.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>Now we invite IAML members—national libraries, regional centres, municipal collections,
and independent music librarians—to join us in building a federated,
decentralised European Music Observatory. One that reflects Europe’s diversity.
One that reduces data curation costs and improves visibility.
One that connects music libraries with the open data and open science infrastructures
already transforming other sectors.&lt;/p>
&lt;p>Our platform is open-source, built on FAIR principles and the
European Interoperability Framework. We use tools like Wikibase, Blazegraph,
Sampo-UI, and R—packaged to work for libraries with limited technical capacity.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-sneak-peak-http13518191513007enhttp13518191513007en">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sneak peek: http://135.181.91.51:3007/en/" srcset="
/media/png/skcmdb/skcmdb-library-access_huf3155c55aaf98b7cdd63ae10eaa747e0_109899_44f93947517fcf22ddbe6ef033eed7cf.webp 400w,
/media/png/skcmdb/skcmdb-library-access_huf3155c55aaf98b7cdd63ae10eaa747e0_109899_2cb506361c74ce14745f98b518ae5453.webp 760w,
/media/png/skcmdb/skcmdb-library-access_huf3155c55aaf98b7cdd63ae10eaa747e0_109899_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://danielantal.eu/media/png/skcmdb/skcmdb-library-access_huf3155c55aaf98b7cdd63ae10eaa747e0_109899_44f93947517fcf22ddbe6ef033eed7cf.webp"
width="760"
height="723"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Sneak peek: &lt;a href="http://135.181.91.51:3007/en/" target="_blank" rel="noopener">http://135.181.91.51:3007/en/&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;/td>
&lt;p>If you care about interoperability, cultural equity, and the future of
library relevance in the streaming era—this is your moment to get involved.&lt;/p>
&lt;p>Not present at &lt;code>IAML2025&lt;/code>?&lt;br>
👉 &lt;a href="https://danielantal.eu/slides/20250707-reprex-iaml2025/">Presentation&lt;/a>&lt;br>
👉 &lt;a href="https://zenodo.org/records/15814286" target="_blank" rel="noopener">Poster&lt;/a>&lt;br>
👉 Please &lt;a href="https://reprex.nl/contact/" target="_blank" rel="noopener">contact us&lt;/a> directly.&lt;/p>
&lt;p>Let’s ensure music libraries remain vital entry points to Europe’s rich and evolving cultural soundscape.&lt;/p></description></item><item><title>Metadata Groundhog Day: What a Moribund Language Can Teach Spotify and Shopify</title><link>https://danielantal.eu/hu/post/2025-06-19-gazetteer/</link><pubDate>Thu, 19 Jun 2025 18:45:00 +0200</pubDate><guid>https://danielantal.eu/hu/post/2025-06-19-gazetteer/</guid><description>&lt;p>And if you want to fix these errors, you may find that you are back to the &lt;strong>Data Sisyphus&lt;/strong>.&lt;/p>
&lt;p>When you build systems in the cloud or in your local architecture, at some point you will realise that naming things (places, people, products) and keeping their whereabouts up to date is probably the most time-consuming, most expensive, and most error-prone workflow.&lt;/p>
&lt;p>In this blogpost, we want to talk about what seems like the easiest part of a location: the name of the city, town, or village.&lt;/p>
&lt;h2 id="mazirbe-is-missing-again">Mazirbe Is Missing Again&lt;/h2>
&lt;p>We recently built a multilingual gazetteer — essentially a reconciled database of place names — for a tiny stretch of the Livonian coast in Latvia. At first glance, this might seem like a project rooted deeply in the digital humanities.&lt;/p>
&lt;p>But here’s the twist: the very same problems we tackled here are the ones plaguing the music industry, global e-commerce platforms, and enterprise software stacks.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-this-is-gross-irben--lielirbe--īra--irben--suur-irbenhttpsreprexbaseeufuitemq4429--familiar-with-rdf-see-in-ttlhttpsreprexbaseeufuspecialentitydataq4429ttl-klein-irben-will-be-near-gross-irben-and-irē-is-almost-īra">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="This is Gross-Irben 👉 [Lielirbe / Īra / Irben / Suur-Irben](https://reprexbase.eu/fu/Item:Q4429) 👉 [Familiar with RDF: see in TTL](https://reprexbase.eu/fu/Special:EntityData/Q4429.ttl); Klein-Irben will be near Gross-Irben, and Irē is almost Īra!" srcset="
/media/png/identifiers/geonames_lielirbe_2x1_hu2f3d53746350179bbf98c0f697c64400_111140_32eeb639bc704a3ad38976cf6b4a1e06.webp 400w,
/media/png/identifiers/geonames_lielirbe_2x1_hu2f3d53746350179bbf98c0f697c64400_111140_df330965d5c718527e1c1756ff9f6b0c.webp 760w,
/media/png/identifiers/geonames_lielirbe_2x1_hu2f3d53746350179bbf98c0f697c64400_111140_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://danielantal.eu/media/png/identifiers/geonames_lielirbe_2x1_hu2f3d53746350179bbf98c0f697c64400_111140_32eeb639bc704a3ad38976cf6b4a1e06.webp"
width="760"
height="380"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
This is Gross-Irben 👉 &lt;a href="https://reprexbase.eu/fu/Item:Q4429" target="_blank" rel="noopener">Lielirbe / Īra / Irben / Suur-Irben&lt;/a> 👉 &lt;a href="https://reprexbase.eu/fu/Special:EntityData/Q4429.ttl" target="_blank" rel="noopener">Familiar with RDF: see in TTL&lt;/a>; Klein-Irben will be near Gross-Irben, and Irē is almost Īra!
&lt;/figcaption>&lt;/figure>
&lt;/td>
&lt;p>Mazirbe is a small, big place. It definitely exists, and it is the cultural center of a small nation: the Livonians. Yet, when you are looking for clothing, music, or photographs that should come from Mazirbe in a relevant database, you often find nothing. Not even the place.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;h4 id="but-mazirbe-exists">But Mazirbe exists!&lt;/h4>
&lt;p>Depending on the record, it might appear as:&lt;/p>
&lt;ul>
&lt;li>Mazirbe (Latvian) • Irē (Livonian) • Мазирбе (Russian) • Klein-Irben (German) • Suur-Irben (Finnish-German hybrid) • Mazirbė (Lithuanian)&lt;/li>
&lt;/ul>
&lt;td style="text-align: center;">
&lt;figure id="figure-meyers-zeitungsatlas-050--russland--gouvernement-sankt-petersburg-esthland-liefland-kurlandhttpsuploadwikimediaorgwikipediacommons004meyere28098s_zeitungsatlas_050_e28093_russland-_gouvernement_sankt_petersburg2c_esthland2c_liefland2c_kurlandjpg">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="[Meyer‘s Zeitungsatlas 050 – Russland- Gouvernement Sankt Petersburg, Esthland, Liefland, Kurland](https://upload.wikimedia.org/wikipedia/commons/0/04/Meyer%E2%80%98s_Zeitungsatlas_050_%E2%80%93_Russland-_Gouvernement_Sankt_Petersburg%2C_Esthland%2C_Liefland%2C_Kurland.jpg)" srcset="
/media/webp/identifiers/old_map_of_courland_hue01bb2cd2f71c02c57a3f3e8212ca966_977008_0c287770dd041e1a97efbed60791a980.webp 400w,
/media/webp/identifiers/old_map_of_courland_hue01bb2cd2f71c02c57a3f3e8212ca966_977008_9d4478bc71dd6069b90b91b75e983c5e.webp 760w,
/media/webp/identifiers/old_map_of_courland_hue01bb2cd2f71c02c57a3f3e8212ca966_977008_1200x1200_fit_q75_h2_lanczos_2.webp 1200w"
src="https://danielantal.eu/media/webp/identifiers/old_map_of_courland_hue01bb2cd2f71c02c57a3f3e8212ca966_977008_0c287770dd041e1a97efbed60791a980.webp"
width="760"
height="522"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
&lt;a href="https://upload.wikimedia.org/wikipedia/commons/0/04/Meyer%E2%80%98s_Zeitungsatlas_050_%E2%80%93_Russland-_Gouvernement_Sankt_Petersburg%2C_Esthland%2C_Liefland%2C_Kurland.jpg" target="_blank" rel="noopener">Meyer‘s Zeitungsatlas 050 – Russland- Gouvernement Sankt Petersburg, Esthland, Liefland, Kurland&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;/td>
&lt;/div>
&lt;/div>
&lt;p>This kind of variation isn’t just a cultural footnote — it breaks databases, mismatches search results, and silently corrupts analytics.&lt;/p>
&lt;p>If you&amp;rsquo;re in music metadata, this is your &lt;strong>&amp;ldquo;JAY Z&amp;rdquo; vs. &amp;ldquo;Jay-Z&amp;rdquo; vs. &amp;ldquo;Shawn Carter&amp;rdquo;&lt;/strong> problem.&lt;/p>
&lt;p>If you&amp;rsquo;re in e-commerce, it’s &lt;strong>“Red Crewneck XXL” vs. “Crewneck, crimson, 2XL”&lt;/strong>.&lt;/p>
&lt;p>Same data structure. Same unresolved chaos.&lt;/p>
&lt;h2 id="a-gazetteer-that-works-like-real-life">A Gazetteer That Works Like Real Life&lt;/h2>
&lt;p>We created a semantic, multilingual, multiscript gazetteer for the Livonian coast. Each place entry includes:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>All known name variants across time, languages, and scripts&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Structured links to global authority services (Wikidata, VIAF, GeoNames)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Canonical IDs, multilingual labels, and machine-readable formats (RDF, TTL, etc.)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Context about administrative boundaries, historical changes, and source provenance&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Try us:&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure--mazirbe--irē--klein-irben--мазирбе--mazirbėhttpsreprexbaseeufuitemq4202--familiar-with-rdf-see-in-ttlhttpsreprexbaseeufuspecialentitydataq4202ttl">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="👉 [Mazirbe / Irē / Klein-Irben / Мазирбе / Mazirbė](https://reprexbase.eu/fu/Item:Q4202) 👉 [Familiar with RDF: see in TTL](https://reprexbase.eu/fu/Special:EntityData/Q4202.ttl)" srcset="
/media/png/identifiers/fuds_mazirbe_2x1_hue147fb84f0dcec7ccc29241d53d4804a_93374_a8f87b64beac38f2fa0357c52941c885.webp 400w,
/media/png/identifiers/fuds_mazirbe_2x1_hue147fb84f0dcec7ccc29241d53d4804a_93374_806e2facbbacdd3d0e7b238e4f7b504b.webp 760w,
/media/png/identifiers/fuds_mazirbe_2x1_hue147fb84f0dcec7ccc29241d53d4804a_93374_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://danielantal.eu/media/png/identifiers/fuds_mazirbe_2x1_hue147fb84f0dcec7ccc29241d53d4804a_93374_a8f87b64beac38f2fa0357c52941c885.webp"
width="760"
height="380"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
👉 &lt;a href="https://reprexbase.eu/fu/Item:Q4202" target="_blank" rel="noopener">Mazirbe / Irē / Klein-Irben / Мазирбе / Mazirbė&lt;/a> 👉 &lt;a href="https://reprexbase.eu/fu/Special:EntityData/Q4202.ttl" target="_blank" rel="noopener">Familiar with RDF: see in TTL&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;/td>
&lt;p>We published it using &lt;strong>Wikibase&lt;/strong> — the same technology that powers Wikidata. It&amp;rsquo;s not just a spreadsheet; it&amp;rsquo;s a small, dynamic knowledge graph.&lt;/p>
&lt;p>We also loaded it into &lt;strong>Blazegraph&lt;/strong>, so you can find all these villages, as well as the music, clothing, or photographs that come from them.&lt;/p>
&lt;h2 id="so-what">So What?&lt;/h2>
&lt;p>Here’s why this matters outside the northern shores of Kurzeme, or beyond the borders of Latvia:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>In global &lt;strong>supply chains&lt;/strong>, location names and vendor names drift constantly. While country boundaries are relatively stable, subnational boundary changes — counties, parishes, provinces, municipal borders — happen &lt;strong>thousands of times per year&lt;/strong>, even within Europe.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In &lt;strong>streaming metadata&lt;/strong>, artists get duplicated, misspelled, or transliterated inconsistently. It’s not unusual to find &lt;strong>dozens of same-named artists&lt;/strong> in a distributor’s or rights manager’s roster.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In &lt;strong>CRM systems&lt;/strong>, customers have multiple entries because of one diacritic. &lt;em>Irē&lt;/em> becomes &lt;em>Ire&lt;/em> if the user’s keyboard layout could not type &lt;code>ē&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In &lt;strong>museum heritage databases&lt;/strong> and &lt;strong>webshops&lt;/strong>, items disappear because their place of origin has changed its name three times since the accession record was created.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Our little example was created to accompany a digital humanities publication, but it&amp;rsquo;s &lt;strong>not just a “humanities” problem&lt;/strong>. It’s a &lt;strong>cross-sector, multilingual, historical, bureaucratic, data problem&lt;/strong>.&lt;/p>
&lt;p>And we’re all living in it.&lt;/p>
&lt;h2 id="lessons-we-took-away">Lessons We Took Away&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Don’t fight ambiguity. &lt;strong>Model it.&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Linked data models (RDF, Wikibase) handle aliases and variants with elegance.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Small, local, curated vocabularies can scale conceptually to global systems.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Top-down standardization fails in diverse data ecosystems — &lt;strong>context wins&lt;/strong>.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="see-it--fork-it--repurpose-it">See It / Fork It / Repurpose It&lt;/h2>
&lt;p>You can explore the full Livonian Gazetteer here:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Web UI: &lt;a href="https://reprexbase.eu/fu/Main_Page" target="_blank" rel="noopener">https://reprexbase.eu/fu/Main_Page&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>RDF example: &lt;a href="https://reprexbase.eu/fu/Special:EntityData/Q4429.ttl" target="_blank" rel="noopener">https://reprexbase.eu/fu/Special:EntityData/Q4429.ttl&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Also check out our &lt;a href="https://reprexbase.eu/textilebase/" target="_blank" rel="noopener">TextileBase&lt;/a> project — same model, but for 19th-century Latvian shirts and skirts&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>If your stack includes &lt;strong>messy location names, user-generated labels, non-English content, or legacy records&lt;/strong> — maybe this can help.&lt;/p>
&lt;p>And if you feel like you’ve seen this movie before… you have.&lt;/p>
&lt;p>It’s &lt;strong>Data Sisyphus&lt;/strong> all over again.&lt;/p>
&lt;p>👉 &lt;a href="https://reprex.nl/post/2021-07-08-data-sisyphus/" target="_blank" rel="noopener">https://reprex.nl/post/2021-07-08-data-sisyphus/&lt;/a>&lt;/p></description></item><item><title>Open Music Observatory Technical Report (Versioned)</title><link>https://danielantal.eu/hu/publication/2023_omo_report/</link><pubDate>Tue, 30 May 2023 00:00:00 +0000</pubDate><guid>https://danielantal.eu/hu/publication/2023_omo_report/</guid><description>&lt;h2 id="about-this-release">About this Release&lt;/h2>
&lt;p>This report presents the &lt;strong>first technical foundations&lt;/strong> of the Open Music Observatory.&lt;br>
It was written before consortium partners supplied their datasets and before &lt;strong>real data pipelines were stress-tested&lt;/strong>.&lt;/p>
&lt;p>The document outlines the Observatory’s architecture, data governance approach, and integration strategy, but it remains an &lt;strong>early edition&lt;/strong>.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;strong>Note:&lt;/strong> This version is preliminary and should not be cited as a final technical reference. A new, data-driven edition will be released in 2025 once the Observatory has been validated with live data from partners.
&lt;/div>
&lt;/div>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>The upcoming edition will integrate &lt;strong>real-world metadata, copyright, and economic indicators&lt;/strong>, stress-tested through operational pipelines, and will provide a more complete technical baseline for Europe’s music data space.&lt;/p></description></item><item><title>How We Add Value to Public Data With Better Curation And Documentation?</title><link>https://danielantal.eu/hu/post/2021-11-08-indicator_findable/</link><pubDate>Mon, 08 Nov 2021 09:00:00 +0000</pubDate><guid>https://danielantal.eu/hu/post/2021-11-08-indicator_findable/</guid><description>&lt;p>In this example, we show a simple indicator: the &lt;em>Turnover in Radio Broadcasting Enterprises&lt;/em> in many European countries. This is an important demand driver in the &lt;em>Music economy&lt;/em> pillar of our &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a>, and an important indicator in our more general &lt;a href="https://ccsi.dataobservatory.eu/" target="_blank" rel="noopener">Cultural &amp;amp; Creative Sectors and Industries Observatory&lt;/a>. Of course, if you work with competition policy or antitrust, then any industry may be interesting to you&amp;ndash;but not all of them are well served with data.&lt;/p>
&lt;p>This dataset comes from a public datasource, the data warehouse of the
European statistical agency, Eurostat. Yet it is not trivial to use:
unless you are familiar with national accounts, you will not find &lt;a href="https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=sbs_na_1a_se_r2&amp;amp;lang=en" target="_blank" rel="noopener">this dataset&lt;/a> on the Eurostat website.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-the-data-can-be-retrieved-from-the-annual-detailed-enterprise-statistics-for-services-nace-rev2-h-n-and-s95-eurostat-folder">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://danielantal.eu/img/blogposts_2021/eurostat_radio_broadcasting_turnover.png" alt="The data can be retrieved from the Annual detailed enterprise statistics for services NACE Rev.2 H-N and S95 Eurostat folder." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
The data can be retrieved from the Annual detailed enterprise statistics for services NACE Rev.2 H-N and S95 Eurostat folder.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>Our version of this statistical indicator is documented following the &lt;a href="https://www.go-fair.org/fair-principles/" target="_blank" rel="noopener">FAIR principles&lt;/a>: our data assets
are findable, accessible, interoperable, and reusable. While the
Eurostat data warehouse partly fulfills these important data quality
expectations, we can meet them far better. We can also
improve the dataset itself, as we will show in the &lt;a href="https://danielantal.eu/post/2021-11-06-indicator_value_added/">next blogpost&lt;/a>.&lt;/p>
&lt;details class="toc-inpage d-print-none " open>
&lt;summary class="font-weight-bold">Table of Contents&lt;/summary>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#findable-data">Findable Data&lt;/a>&lt;/li>
&lt;li>&lt;a href="#accessible-data">Accessible Data&lt;/a>&lt;/li>
&lt;li>&lt;a href="#interoperability">Interoperability&lt;/a>&lt;/li>
&lt;li>&lt;a href="#reuse">Reuse&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;/details>
&lt;h2 id="findable-data">Findable Data&lt;/h2>
&lt;p>Our data observatories add value by curating the data: we bring this
indicator to light with a more descriptive name, and we place it in a domain-specific context with our &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a> and &lt;a href="https://ccsi.dataobservatory.eu/" target="_blank" rel="noopener">Cultural &amp;amp; Creative Sectors and Industries Observatory&lt;/a> and a policy-specific context with our &lt;em>Competition Data Observatory&lt;/em> and &lt;em>Green Deal Data Observatory&lt;/em>. While many people in the creative sectors, or among cultural policy designers, may need this dataset, most of them have no training in working with
national accounts, which means deciphering the data codes of records that measure economic activity at a national level. Our curated data observatories bring together the many datasets available around important domains. Our &lt;code>Digital Music Observatory&lt;/code>, for example, aims to form an ecosystem of music data users and producers.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-we-added-descriptive-metadatahttpszenodoorgrecord5652113yykvbwdmkuk-that-help-you-find-our-data-and-match-it-with-other-relevant-data-sources">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://danielantal.eu/img/blogposts_2021/zenodo_metadata_eurostat_radio_broadcasting_turnover.png" alt="We [added descriptive metadata](https://zenodo.org/record/5652113#.YYkVBWDMKUk) that help you find our data and match it with other relevant data sources." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
We &lt;a href="https://zenodo.org/record/5652113#.YYkVBWDMKUk" target="_blank" rel="noopener">added descriptive metadata&lt;/a> that help you find our data and match it with other relevant data sources.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>We added descriptive metadata that help you find our data and match it
with other relevant data sources. For example, we add keywords and
standardized metadata identifiers from the Library of Congress Linked
Data Services, probably the world’s largest set of standardized
library descriptions. This ensures that you can find relevant data
around the same key term (&amp;quot;&lt;a href="https://id.loc.gov/authorities/subjects/sh85110448.html" target="_blank" rel="noopener">Radio broadcasting&lt;/a>&amp;quot;)
in addition to our turnover data. It also lets you connect our dataset unambiguously
with other information sources that use the same concept but are listed under
different keywords, such as &lt;em>Radio–Broadcasting&lt;/em>, or &lt;em>Radio industry and
trade&lt;/em>, or &lt;em>Hörfunkveranstalter&lt;/em> in German, &lt;em>Emitiranje
radijskog programa&lt;/em> in Croatian, or &lt;em>Actividades de radiodifusão&lt;/em> in
Portuguese.&lt;/p>
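&lt;p>To make the idea of a shared identifier concrete, here is a minimal Python sketch. It assumes a small in-memory catalogue in which each record carries both a free-text keyword and a concept identifier; the record titles and the field names &lt;code>keyword&lt;/code> and &lt;code>concept&lt;/code> are hypothetical, while the identifier is the Library of Congress page linked above.&lt;/p>

```python
from collections import defaultdict

# The subject heading page linked in the text above.
RADIO = "https://id.loc.gov/authorities/subjects/sh85110448.html"

# Hypothetical catalogue records: the keywords differ, the concept is shared.
catalogue = [
    {"title": "Turnover in radio broadcasting enterprises",
     "keyword": "Radio broadcasting", "concept": RADIO},
    {"title": "hypothetical German record",
     "keyword": "Hörfunkveranstalter", "concept": RADIO},
    {"title": "hypothetical Croatian record",
     "keyword": "Emitiranje radijskog programa", "concept": RADIO},
    {"title": "hypothetical record without an identifier",
     "keyword": "Radio", "concept": None},
]

# Group keywords by concept identifier instead of by spelling.
by_concept = defaultdict(list)
for record in catalogue:
    if record["concept"] is not None:
        by_concept[record["concept"]].append(record["keyword"])

print(by_concept[RADIO])
```

&lt;p>A keyword search for "Radio broadcasting" would find only the first record; a lookup on the identifier finds all three that share the concept, whatever language they are described in.&lt;/p>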
&lt;h2 id="accessible-data">Accessible Data&lt;/h2>
&lt;p>Our data is accessible in two forms: in &lt;code>csv&lt;/code> tabular format (which can be
read with Excel, OpenOffice, Numbers, SPSS and many similar spreadsheet
or statistical applications) and in &lt;code>JSON&lt;/code> for automated importing into
your databases. We can also provide our users with SQLite databases,
which are fully functional, single-user relational databases.&lt;/p>
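&lt;p>As an illustration of the automated import route, the following Python sketch loads JSON records into an in-memory SQLite database and queries them. It is a sketch under assumptions: the field names &lt;code>geo&lt;/code>, &lt;code>time&lt;/code> and &lt;code>value&lt;/code>, the table name, and the numbers are hypothetical and are not taken from our API.&lt;/p>

```python
import json
import sqlite3

# Hypothetical JSON payload; a real export may use other field names.
payload = json.dumps([
    {"geo": "AT", "time": 2015, "value": None},
    {"geo": "AT", "time": 2016, "value": 120.0},
    {"geo": "HU", "time": 2015, "value": 42.0},
    {"geo": "HU", "time": 2016, "value": 45.0},
])

records = json.loads(payload)

# SQLite needs no server: the whole database lives in one file or in memory.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE indicator (geo TEXT, time INTEGER, value REAL)")
con.executemany(
    "INSERT INTO indicator VALUES (:geo, :time, :value)", records
)

rows_2016 = con.execute(
    "SELECT geo, value FROM indicator WHERE time = 2016 ORDER BY geo"
).fetchall()
print(rows_2016)
```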
&lt;p>Tidy datasets are easy to manipulate, model and visualize, and have a
specific structure: each variable is a column, each observation is a
row, and each type of observational unit is a table. This makes the data
easier to clean, and far easier to use in a much wider range of
applications than the original data we used. In theory, this is a simple objective,
yet we find that even governmental statistical agencies&amp;ndash;and even scientific
publications&amp;ndash;often publish untidy data. This is a significant problem that causes
productivity losses: tidying data requires long hours of work, and if
a reproducible workflow is not used, data integrity can also be compromised,
because the process of tidying may overwrite, delete, or omit a data point or a label.&lt;/p>
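&lt;p>The reshaping step itself can be sketched in a few lines of Python. This is a minimal, hypothetical example and not our production workflow: it assumes a wide table with one column per year, and the column names and values are invented for illustration.&lt;/p>

```python
import csv
import io

# Hypothetical untidy input: years are spread across columns.
wide_csv = """geo,2014,2015,2016
AT,100,,120
HU,40,42,45
"""


def to_tidy(text):
    """Turn a wide year-per-column table into one row per observation."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        geo = record.pop("geo")
        for year, value in record.items():
            rows.append({
                "geo": geo,
                "time": int(year),
                # Keep the gap explicit instead of silently dropping the row.
                "value": float(value) if value else None,
            })
    return rows


tidy = to_tidy(wide_csv)
for row in tidy:
    print(row)
```

&lt;p>Because the transformation is a function of the input file, it can be rerun and checked, which is the point of a reproducible workflow: no cell is edited by hand, and the missing 2015 value stays visible as a gap.&lt;/p>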
&lt;td style="text-align: center;">
&lt;figure id="figure-tidy-datasetshttpsr4dshadconztidy-datahtml-are-easy-to-manipulate-model-and-visualize-and-have-a-specific-structure-each-variable-is-a-column-each-observation-is-a-row-and-each-type-of-observational-unit-is-a-table">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://danielantal.eu/img/blogposts_2021/tidy-8.png" alt="[Tidy datasets](https://r4ds.had.co.nz/tidy-data.html) are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
&lt;a href="https://r4ds.had.co.nz/tidy-data.html" target="_blank" rel="noopener">Tidy datasets&lt;/a> are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>While the original data source, the Eurostat data warehouse, is
accessible, too, we added value by bringing the data into a &lt;a href="https://www.jstatsoft.org/article/view/v059i10" target="_blank" rel="noopener">tidy
format&lt;/a>. Tidy data can
immediately be imported into a statistical application like SPSS or
Stata, or into your own database. It is immediately available for
plotting in Excel, OpenOffice or Numbers.&lt;/p>
&lt;h2 id="interoperability">Interoperability&lt;/h2>
&lt;p>Our data can easily be imported alongside, or joined with, data from other internal or external sources.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-all-our-indicators-come-with-standardized-descriptive-metadata-and-statistical-processing-metadata-see-our-apihttpsapimusicdataobservatoryeudatabasemetadata">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://danielantal.eu/img/observatory_screenshots/DMO_API_metadata_table.png" alt="All our indicators come with standardized descriptive metadata, and statistical (processing) metadata. See our [API](https://api.music.dataobservatory.eu/database/metadata/) " loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
All our indicators come with standardized descriptive metadata, and statistical (processing) metadata. See our &lt;a href="https://api.music.dataobservatory.eu/database/metadata/" target="_blank" rel="noopener">API&lt;/a>
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>All our indicators come with standardized descriptive metadata,
following two important standards, the &lt;a href="https://dublincore.org/" target="_blank" rel="noopener">Dublin Core&lt;/a> and
&lt;a href="https://datacite.org/" target="_blank" rel="noopener">DataCite&lt;/a>, implementing not only the mandatory
but also the recommended descriptions. This makes it far easier to
connect the data with other data sources, e.g. turnover with the number of radio broadcasting enterprises or radio stations within specific territories.&lt;/p>
&lt;p>Our passion for documentation standards and best practices goes much further: our data uses &lt;a href="https://sdmx.org/?page_id=3215/" target="_blank" rel="noopener">Statistical Data and Metadata eXchange&lt;/a> standardized codebooks, unit descriptions and other statistical and administrative metadata.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-we-participate-in-scientific-workhttpsreprexnlpublicationeuropean_visibilitiy_2021-related-to-data-interoperability">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://danielantal.eu/img/reports/european_visbility_publication.png" alt="We participate in [scientific work](https://reprex.nl/publication/european_visibilitiy_2021/) related to data interoperability." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
We participate in &lt;a href="https://reprex.nl/publication/european_visibilitiy_2021/" target="_blank" rel="noopener">scientific work&lt;/a> related to data interoperability.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;h2 id="reuse">Reuse&lt;/h2>
&lt;p>All our datasets come with standardized information about reusability.
We add citation, attribution data, and licensing terms. Most of our
datasets can be used without commercial restriction after acknowledging
the source, but we sometimes work with less permissive data licenses.&lt;/p>
&lt;p>In the case presented here, we added further value to encourage re-use. In addition to tidying, we significantly increased the usability of public data by handling
missing values. This is the subject of our &lt;a href="https://danielantal.eu/post/2021-11-06-indicator_value_added/">next blogpost&lt;/a>.&lt;/p>
&lt;details class="spoiler " id="spoiler-6">
&lt;summary>Are you a data user? How could we serve you better?&lt;/summary>
&lt;p>&lt;em>Shall we do some further automatic data enhancements with our datasets? Document with different metadata? Link more information for business, policy, or academic use? Please get in touch with &lt;a href="https://reprex.nl/#contact" target="_blank" rel="noopener">us&lt;/a>!&lt;/em>&lt;/p>
&lt;/details></description></item><item><title>How We Add Value to Public Data With Imputation and Forecasting</title><link>https://danielantal.eu/hu/post/2021-11-06-indicator_value_added/</link><pubDate>Mon, 08 Nov 2021 10:00:00 +0100</pubDate><guid>https://danielantal.eu/hu/post/2021-11-06-indicator_value_added/</guid><description>&lt;p>Public data sources are often plagued by missing values. Naively you may think that you can ignore them, but think twice: in most cases, missing data in a table is not missing information, but rather malformatted information. Ignoring or dropping missing values will not be feasible or robust when you want to make a beautiful visualization, or use data in a business forecasting model, a machine learning (AI) application, or a more complex scientific model. All of the above require complete datasets, and naively discarding missing data points amounts to an excessive waste of information. In this post we continue the example of a not-so-easy-to-find public dataset.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-in-the-previous-blogpostpost2021-11-08-indicator_findable-we-explained-how-we-added-value-by-documenting-data-following-the-fair-principle-and-with-the-professional-curatorial-work-of-placing-the-data-in-context-and-linking-it-to-other-information-sources-such-as-other-datasets-books-and-publications-regardless-of-their-natural-language-ie-whether-these-sources-are-described-in-english-german-portugese-or-croatian-photo-jack-sloophttpsunsplashcomphotoseywn81spkj8">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://danielantal.eu/img/blogposts_2021/jack-sloop-eYwn81sPkJ8-unsplash.jpg" alt="[In the previous blogpost](/post/2021-11-08-indicator_findable/) we explained how we added value by documenting data following the *FAIR* principles and with the professional curatorial work of placing the data in context, and linking it to other information sources, such as other datasets, books, and publications, regardless of their natural language (i.e., whether these sources are described in English, German, Portuguese or Croatian). Photo: [Jack Sloop](https://unsplash.com/photos/eYwn81sPkJ8)." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
&lt;a href="https://danielantal.eu/post/2021-11-08-indicator_findable/">In the previous blogpost&lt;/a> we explained how we added value by documenting data following the &lt;em>FAIR&lt;/em> principles and with the professional curatorial work of placing the data in context, and linking it to other information sources, such as other datasets, books, and publications, regardless of their natural language (i.e., whether these sources are described in English, German, Portuguese or Croatian). Photo: &lt;a href="https://unsplash.com/photos/eYwn81sPkJ8" target="_blank" rel="noopener">Jack Sloop&lt;/a>.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>Completing missing datapoints requires statistical production information (why might the data be missing?) and data science know-how (how to impute the missing value). If you do not have a good statistician or data scientist on your team, you will need high-quality, complete datasets. This is what our automated data observatories provide.&lt;/p>
&lt;details class="toc-inpage d-print-none " open>
&lt;summary class="font-weight-bold">Table of Contents&lt;/summary>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#why-is-data-missing">Why is data missing?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#what-can-we-improve">What can we improve?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#can-you-trust-our-data">Can you trust our data?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#avoid-the-data-sisyphus">Avoid the data Sisyphus&lt;/a>&lt;/li>
&lt;li>&lt;a href="#get-the-data">Get the data&lt;/a>&lt;/li>
&lt;li>&lt;a href="#how-can-we-do-better">How can we do better?&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;/details>
&lt;h2 id="why-is-data-missing">Why is data missing?&lt;/h2>
&lt;p>International organizations offer many statistical products, but usually on an ‘as-is’ basis. For example, Eurostat is the world’s premier statistical agency, but it has no right to overrule whatever data the member states of the European Union, and some other cooperating European countries, give to it. And it cannot force these countries to hand over data if they fail to do so. As a result, there will be many data points that are missing, and often data points that have wrong (obsolete) descriptions or geographical dimensions. We will show the geographical aspect of the problem in a separate blogpost; for now, we only focus on missing data.&lt;/p>
&lt;p>Some countries have only recently started providing data to the Eurostat umbrella organization, and it is likely that you will find few datapoints for North Macedonia or Bosnia-Herzegovina. Other countries provide data with some delay, and the last one or two years are missing. And there are gaps in some countries’ data, too.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-see-the-authoritative-copy-of-the-datasethttpszenodoorgrecord5652118yykhvmdmkuk">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://danielantal.eu/img/blogposts_2021/trb_plot.png" alt="See the authoritative copy of the [dataset](https://zenodo.org/record/5652118#.YYkhVmDMKUk)." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
See the authoritative copy of the &lt;a href="https://zenodo.org/record/5652118#.YYkhVmDMKUk" target="_blank" rel="noopener">dataset&lt;/a>.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>This is a headache if you want to use the data in some machine learning application or in a multiple or panel regression model. You can, of course, discard countries or years where you do not have full data coverage, but this approach usually wastes too much information&amp;ndash;if you work with 12 years, and only one data point is missing, you would be discarding an entire country’s 11 years’ worth of data. Another option is to estimate the values, or otherwise impute the missing data, when this is possible with reasonable precision. This is where things get tricky, and you will likely need a statistician or a data scientist on board.&lt;/p>
&lt;h2 id="what-can-we-improve">What can we improve?&lt;/h2>
&lt;p>Consider that the data is only missing from one year for a particular country, 2015. The naive solution would be to omit 2015 or the country at hand from the dataset. This is pretty destructive, because we know a lot about the radio market turnover in this country and in this year! But leaving 2015 blank will not look good on a chart, and will make your machine learning application or your regression model stop.&lt;/p>
&lt;p>A statistician or a radio market expert will tell you that you know more-or-less the missing information: the total turnover was certainly not zero in that year. With some statistical or radio domain-specific knowledge you will use the 2014, or 2016 value, or a combination of the two and keep the country and year in the dataset.&lt;/p>
&lt;p>Our improved dataset adds backcasted values (using the best time series model fitting the country&amp;rsquo;s actually present data), forecasted values (again, using the best time series model), and approximated values (using linear approximation). In a few cases, we add the last or next known value. To give a few quantitative indicators about our work:&lt;/p>
&lt;ul>
&lt;li>Increased number of observations: +65%&lt;/li>
&lt;li>Reduced missing values: -48.1%&lt;/li>
&lt;li>Increased non-missing subset for regression or AI: +66.67%&lt;/li>
&lt;/ul>
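&lt;p>The two simplest techniques named above, linear approximation and carrying the last or next known value, can be sketched in Python as follows. This is a minimal illustration under assumptions, not the code behind our indicator: it deliberately leaves out the time series models used for backcasting and forecasting, and the function name &lt;code>impute&lt;/code>, the flag labels, and the numbers are hypothetical.&lt;/p>

```python
def impute(series):
    """Fill None gaps in a yearly series and record how each value was obtained.

    Gaps between two known values are linearly interpolated; gaps at the
    edges take the nearest known value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return list(series), ["missing"] * len(series)

    values, flags = [], []
    for i, v in enumerate(series):
        if v is not None:
            values.append(v)
            flags.append("observed")
            continue
        before = [k for k in known if i > k]
        after = [k for k in known if k > i]
        if before and after:
            lo, hi = before[-1], after[0]
            weight = (i - lo) / (hi - lo)
            values.append(series[lo] + weight * (series[hi] - series[lo]))
            flags.append("interpolated")
        elif before:
            values.append(series[before[-1]])
            flags.append("last value carried forward")
        else:
            values.append(series[after[0]])
            flags.append("next value carried back")
    return values, flags


# Hypothetical turnover series for 2013-2017 with three gaps.
filled, how = impute([None, 10.0, None, 20.0, None])
print(filled)
print(how)
```

&lt;p>Returning the flags next to the values matters as much as the arithmetic: a user can always tell an observed figure from an estimated one and can drop or replace the estimates.&lt;/p>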
&lt;p>If your organization is working with panel (longitudinal multiple) regressions or various machine learning applications, then your team knows that not having the +66.67% gain would be a deal-breaker in the choice of models and in the precision of estimates, KPIs, or other quantitative products. They also know that they would spend about 90% of their data resources on achieving this +66.67% gain in usability.&lt;/p>
&lt;p>If you happen to work in an NGO, a business unit or a research institute that does not employ data scientists, then it is likely that you can never achieve this improvement, and you have to give up on a number of quantitative tools or visualizations. If you have a data scientist onboard, that professional can use our work as a starting point.&lt;/p>
&lt;h2 id="can-you-trust-our-data">Can you trust our data?&lt;/h2>
&lt;p>We believe that you can trust our data more than the original public source. We use statistical expertise to find out why data may be missing. Often, it is present in the wrong location (for example, because the name of a region changed).&lt;/p>
&lt;p>If you are reluctant to use estimates, think about discarding known actual data from your forecast or visualization, because one data point is missing. How do you provide more accurate information? By hiding known actual data, because one point is missing, or by using all known data and an estimate?&lt;/p>
&lt;p>Our codebooks and our API use the &lt;a href="https://sdmx.org/?page_id=3215/" target="_blank" rel="noopener">Statistical Data and Metadata eXchange&lt;/a> documentation standards to clearly indicate which data is observed, which is missing, which is estimated, and of course, also how it is estimated.
This highlights another important aspect of data trustworthiness: if you have a better idea, you can replace our estimates with better ones.&lt;/p>
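&lt;p>What such status flags allow in practice can be shown with a short Python sketch. It rests on assumptions: the one-letter codes are modelled on the SDMX observation status code list and should be checked against the codebook of the dataset you use, and the field names, the helper functions, and the numbers are hypothetical.&lt;/p>

```python
# Codes modelled on the SDMX observation status code list (assumption: the
# dataset's own codebook may use different or additional codes).
OBS_STATUS = {"A": "normal value", "E": "estimated value", "M": "missing value"}

# Hypothetical observations with a status flag and a method note.
rows = [
    {"geo": "HU", "time": 2014, "value": 40.0, "obs_status": "A", "method": "observed"},
    {"geo": "HU", "time": 2015, "value": 42.5, "obs_status": "E", "method": "linear approximation"},
    {"geo": "HU", "time": 2016, "value": 45.0, "obs_status": "A", "method": "observed"},
]


def observed_only(records):
    """Keep only the values reported by the original source."""
    return [r for r in records if r["obs_status"] == "A"]


def replace_estimate(records, geo, time, value, method):
    """Return a copy in which one estimated value is swapped for your own."""
    updated = []
    for r in records:
        if (r["geo"], r["time"], r["obs_status"]) == (geo, time, "E"):
            r = {**r, "value": value, "method": method}
        updated.append(r)
    return updated


mine = replace_estimate(rows, "HU", 2015, 43.0, "own expert estimate")
print(len(observed_only(rows)), mine[1])
```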
&lt;p>Our indicators come with standardized codebooks that contain not only the descriptive metadata, but also administrative metadata about the history of the indicator values. You will find very important information about the statistical method we used to fill in the data gaps, and even links to the reliable, peer-reviewed scientific statistical software that made the calculations. For data scientists, we record plenty of information about the computing environment, too. This can come in handy if your estimates need external authentication, or if you suspect a bug.&lt;/p>
&lt;h2 id="avoid-the-data-sisyphus">Avoid the data Sisyphus&lt;/h2>
&lt;p>If you work in an academic institution, in an NGO or a consultancy, you can never be sure who downloaded the &lt;a href="https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=sbs_na_1a_se_r2&amp;amp;lang=en" target="_blank" rel="noopener">Annual detailed enterprise statistics for services (NACE Rev. 2 H-N and S95)&lt;/a> folder from Eurostat. Did they modify the dataset? Did they already make corrections for the missing data? What method did they use? To prevent many potential problems, you will likely download it again, and again, and again&amp;hellip;&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-see-our-the-data-sisyphushttpsreprexnlpost2021-07-08-data-sisyphus-blogpost">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://danielantal.eu/img/blogposts_2021/Sisyphus_Bodleian_Library.png" alt="See our [The Data Sisyphus](https://reprex.nl/post/2021-07-08-data-sisyphus/) blogpost." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
See our &lt;a href="https://reprex.nl/post/2021-07-08-data-sisyphus/" target="_blank" rel="noopener">The Data Sisyphus&lt;/a> blogpost.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>We have a better solution. You can always rely on our API to import directly the latest, best data, but if you want to be sure, you can use our &lt;a href="https://zenodo.org/record/5652118#.YYhGOGDMLIU" target="_blank" rel="noopener">regular backups&lt;/a> on Zenodo. Zenodo is an open science repository managed by CERN and supported by the European Union. On Zenodo, you can find an authoritative copy of our indicator (and its previous versions) with a digital object identifier, in this case, &lt;a href="https://doi.org/10.5281/zenodo.5652118" target="_blank" rel="noopener">10.5281/zenodo.5652118&lt;/a>. These datasets will be preserved for decades, and nobody can manipulate them. You cannot accidentally overwrite them, and we have no backdoor access to modify them.&lt;/p>
&lt;h2 id="get-the-data">Get the data&lt;/h2>
&lt;p>&lt;a href="https://doi.org/10.5281/zenodo.5652118" target="_blank" rel="noopener">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://zenodo.org/badge/DOI/10.5281/zenodo.5652118.svg" alt="DOI" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/a>&lt;/p>
&lt;h2 id="how-can-we-do-better">How can we do better?&lt;/h2>
&lt;details class="spoiler " id="spoiler-4">
&lt;summary>Are you a data user?&lt;/summary>
&lt;p>&lt;em>Shall we do some further automatic data enhancements with our datasets? Document with different metadata? Link more information for business, policy, or academic use? Please get in touch with &lt;a href="https://reprex.nl/#contact" target="_blank" rel="noopener">us&lt;/a>!&lt;/em>&lt;/p>
&lt;/details></description></item><item><title>Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies</title><link>https://danielantal.eu/hu/post/2021-02-13-european-visibility/</link><pubDate>Sat, 13 Feb 2021 18:10:00 +0200</pubDate><guid>https://danielantal.eu/hu/post/2021-02-13-european-visibility/</guid><description>&lt;p>The majority of music sales in the world is driven by AI-powered algorithms that create personalized playlists and recommendations, and help program radio music streams or festival lineups. It is critically important that an artist’s work is documented and described in a way that the algorithm can work with.&lt;/p>
&lt;p>In our research paper – soon to be published – made for the Listen Local Initiative we found that 15% of Dutch, Estonian, Hungarian, or Slovak artists had no chance to be recommended, and they usually end up on &lt;a href="post/2020-11-17-recommendation-analysis/">Forgetify&lt;/a>, an app that lists never-played songs of Spotify. In another project with rights management organizations, we found that about half of the rightsholders are at risk of not getting all their royalties from the platforms because of poor documentation.&lt;/p>
&lt;p>But how come distributors give streaming platforms songs that are not properly documented? What sort of information is missing for the European repertoire’s visibility? Reprex is exploring this problem in a practical cooperation with SOZA, the Slovak Performing and Mechanical Rights Society, and in an academic cooperation that involves leading researchers in the field. In a manuscript co-authored with Martin Senftleben, director of the &lt;a href="https://www.ivir.nl/" target="_blank" rel="noopener">Institute for Information Law&lt;/a> in Amsterdam, and eminent researchers in copyright law and music economics, Reprex’s co-founder makes the case that Europe must invest public money to resolve this problem, because in the current scenario, the documentation costs of a song exceed the expected income from streaming platforms.&lt;/p>
&lt;blockquote>
&lt;p>In the European Strategy for Data, the European Commission highlighted the EU’s ambition to acquire a leading role in the data economy. At the same time, the Commission conceded that the EU would have to increase its pools of quality data available for use and re-use. In the creative industries, this need for enhanced data quality and interoperability is particularly strong. Without data improvement, unprecedented opportunities for monetising the wide variety of EU creative and making this content available for new technologies, such as artificial intelligence training systems, will most probably be lost. The problem has a worldwide dimension. While the US have already taken steps to provide an integrated data space for music as of 1 January 2021, the EU is facing major obstacles not only in the field of music but also in other creative industry sectors. Weighing costs and benefits, there can be little doubt that new data improvement initiatives and sufficient investment in a better copyright data infrastructure should play a central role in EU copyright policy. A trade-off between data harmonisation and interoperability on the one hand, and transparency and accountability of content recommender systems on the other, could pave the way for successful new initiatives. &lt;a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3785272" target="_blank" rel="noopener">Download the manuscript from SSRN&lt;/a>&lt;/p>
&lt;/blockquote>
&lt;p>Our &lt;a href="post/2020-12-17-demo-slovak-music-database/">Slovak Demo Music Database&lt;/a> project is a good example of this. We started to systematically collect publicly available information from Slovak artists (in our write-in process) and to ask them for further, GDPR-protected data (in our opt-in process) to create a comprehensive database that can help recommendation engines as well as market-targeting or educational AI apps.&lt;/p>
&lt;p>We believe that one of the problems with current AI algorithms is that they work solely, or almost solely, with English-language documentation. This puts other repertoires, particularly those in small languages, at risk of being buried beneath well-documented music that arrives mainly from the United States.&lt;/p>
&lt;p>&lt;em>We are looking for rightsholders and their organizations, artists, and
researchers to work with us to find out how we can increase the visibility of European music.&lt;/em>&lt;/p></description></item><item><title>Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies</title><link>https://danielantal.eu/hu/publication/european_visibilitiy_2022/</link><pubDate>Sat, 13 Feb 2021 11:00:00 +0000</pubDate><guid>https://danielantal.eu/hu/publication/european_visibilitiy_2022/</guid><description>&lt;p>This article, published in &lt;em>JIPITEC&lt;/em> in 2022, remains one of our most cited works on copyright, metadata, and cultural policy.&lt;/p>
&lt;p>The paper shows how &lt;strong>fragmented copyright metadata&lt;/strong> undermines the visibility of European creative works, causes &lt;strong>royalty losses&lt;/strong> for artists, and limits the ability of European industries to compete globally in emerging areas like &lt;strong>AI training&lt;/strong> and &lt;strong>recommender systems&lt;/strong>.&lt;/p>
&lt;p>Using the &lt;strong>music industry&lt;/strong> as a central case study, the article highlights why improved metadata and licensing infrastructures are vital. Its findings directly connect to our current projects on &lt;strong>trustworthy AI, cultural data spaces, and fair remuneration systems&lt;/strong>.&lt;/p>
&lt;p>📄 &lt;strong>Read the published version&lt;/strong> in JIPITEC: &lt;a href="https://www.jipitec.eu/jipitec/article/view/345/338" target="_blank" rel="noopener">Full text PDF&lt;/a>&lt;br>
📄 &lt;strong>Preprint version&lt;/strong> available on SSRN: &lt;a href="https://ssrn.com/abstract=3785272" target="_blank" rel="noopener">SSRN abstract&lt;/a>&lt;/p>
&lt;hr></description></item></channel></rss>