IDCC24 Lightning Talk Session

Tue, 20 Feb 2024 16:15:00 +0000

Open Music Europe is a Horizon Europe project that aims to build a working prototype of the planned European Music Observatory.

Interested in our automated data observatories? Let us meet in Edinburgh at the 18th International Digital Curation Conference and discuss how you could use our open-source, collaborative data infrastructure and know-how.

The EU, the UN and other international bodies have recognised or initiated at least 60 data observatories that carry out long-term data collection in various domains, yet we have not found any good policies or practices for placing these observatories on data infrastructures that are interoperable with open science and open government. We are creating a data management and governance model and a working MVP (minimum viable product) that coordinates data collection and statistical data production among scientific, private and official statistical actors.

Our most important pilot project aims to showcase best practice in using privately held data, i.e., data from music organisations and surveys carried out by scientific and business actors, to improve the quality of government statistics. We show how guidelines on using private data as an ‘administrative data source’, together with the ex-ante harmonisation of governmental surveys with open scientific surveys, can produce high-quality datasets that fully complement existing official statistical products and commercial products.

As a coordination tool, we started developing a Data Management Plan to increase transparency from the outset. Apart from applying Horizon Europe’s OpenAIRE recommendations and the FAIR requirements, we use the Open Policy Analysis Guidelines to bring open science transparency into the less standardised area of policy analysis. We implement this following various UN/EU guidelines on statistical production, creating three-way reconciliation and interoperability among scientific research, public policy design and official statistics.

Click through to our working paper (available in PDF, epub, and html).

Our work contributes to sharing outputs earlier on Open Research platforms because we are building a framework, supported by research automation, that integrates open science, business and official governmental data. We are developing a software ecosystem that complements the R statistical environment and language, the lingua franca of official and scientific statistics, to make data curation, pre-processing, processing and the eventual quality-controlled statistical data release open, transparent and much timelier.

Our project follows an open collaboration framework, designed so that private music NGOs and enterprises, statistical offices and open science research groups can work together on the curation, design, production, release and use of data assets in the cultural domain. By opening the statistical infrastructure with our open-source production code, and by implementing the statistical data and metadata exchange standards alongside other metadata standards and standardisation techniques such as ex-ante and retrospective survey harmonisation, we hope to combine them in novel ways while making the data available sooner.
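Retrospective survey harmonisation can be pictured as recoding each survey's answer categories into one shared codebook. The sketch below is a minimal illustration only: the two surveys, their answer labels and the three-level harmonised codebook (`HARMONISED`, `SURVEY_A_MAP`, `SURVEY_B_MAP`) are hypothetical, not part of our published code. Answers without a mapping are kept as `None` so they can be reviewed rather than silently dropped.

```python
from typing import Dict, List, Optional

# Hypothetical harmonised codebook shared by all surveys (illustrative only).
HARMONISED: Dict[str, str] = {
    "0": "never",
    "1": "1-2 times a year",
    "2": "3 or more times a year",
}

# Hypothetical mappings from each survey's original answer labels
# to the harmonised codes above.
SURVEY_A_MAP: Dict[str, str] = {
    "Never": "0",
    "Once or twice": "1",
    "Three times or more": "2",
}
SURVEY_B_MAP: Dict[str, str] = {"no": "0", "rarely": "1", "often": "2"}


def harmonise(responses: List[str], mapping: Dict[str, str]) -> List[Optional[str]]:
    """Recode raw answers into harmonised codes; unmapped answers become None."""
    return [mapping.get(answer) for answer in responses]


def label(codes: List[Optional[str]]) -> List[Optional[str]]:
    """Translate harmonised codes back into human-readable labels."""
    return [HARMONISED.get(code) if code is not None else None for code in codes]


# Pool answers from both surveys on the common scale.
pooled = harmonise(["Never", "Three times or more"], SURVEY_A_MAP) + harmonise(
    ["often", "sometimes"], SURVEY_B_MAP
)
print(pooled)
```

Once both surveys share one codebook, their answers can be pooled or compared directly; the unmapped "sometimes" answer surfaces as `None` for a human decision.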

https://music.dataobservatory.eu/documents/open_music_europe/slovakia/slovak-cult-stat-pilot.html

Our showcase product will be a twin linked open data resource: the Slovak Comprehensive Music Database. It will connect, in unprecedented detail, information about musical works and their sound recordings and notations held by music libraries, heritage organisations, and individual and collective rights management organisations. From this linked open resource we will derive the Slovak Music Industry Registry, which we will convert into a structural business register satellite: an interface between the privately held data of music management and music heritage institutions and the national/satellite account system of the Slovak Republic, particularly the Slovak Cultural and Creative Satellite Accounts.

Let’s get in touch if you are interested.

How We Add Value to Public Data With Better Curation And Documentation?

Mon, 08 Nov 2021 09:00:00 +0000

In this example, we show a simple indicator: the turnover in radio broadcasting enterprises in many European countries. This is an important demand driver in the Music economy pillar of our Digital Music Observatory, and an important indicator in our more general Cultural & Creative Sectors and Industries Observatory. Of course, if you work in competition policy or antitrust, then any industry may be of interest to you, but not all of them are well served with data.

This dataset comes from a public data source, the data warehouse of Eurostat, the statistical office of the European Union. Yet it is not trivial to use: unless you are familiar with national accounts, you will not find this dataset on the Eurostat website.

The data can be retrieved from the Annual detailed enterprise statistics for services NACE Rev.2 H-N and S95 Eurostat folder.

Our version of this statistical indicator is documented following the FAIR principles: our data assets are findable, accessible, interoperable and reusable. While the Eurostat data warehouse partly fulfils these important data quality expectations, we can improve on them significantly. We can also improve the dataset itself, as we will show in the next blogpost.


Findable Data

Our data observatories add value by curating the data: we bring this indicator to light with a more descriptive name, and we place it in a domain-specific context with our Digital Music Observatory and Cultural & Creative Sectors and Industries Observatory, and in a policy-specific context with our Competition Data Observatory and Green Deal Data Observatory. While many people in the creative sectors or among cultural policy designers may need this dataset, most of them have no training in working with national accounts, which involves deciphering the data codes of records that measure economic activity at the national level. Our curated data observatories bring together many available datasets around important domains. Our Digital Music Observatory, for example, aims to form an ecosystem of music data users and producers.
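To illustrate what deciphering such codes involves, here is a minimal sketch that turns a coded record into readable labels. The codelists are tiny hand-typed excerpts: `J6010` corresponds to NACE Rev. 2 class 60.10 (radio broadcasting), and we assume `V12110` is Eurostat's structural business statistics code for turnover; verify both against the official codelists before reuse. The record's value is made up.

```python
from typing import Dict

# Tiny, hand-typed codelist excerpts (assumed; check the official codelists).
NACE_R2 = {"J6010": "Radio broadcasting"}
INDIC_SB = {"V12110": "Turnover or gross premiums written"}
GEO = {"SK": "Slovakia", "HU": "Hungary"}


def decode(record: Dict[str, str]) -> Dict[str, str]:
    """Replace codes with labels, falling back to the raw code if unknown."""
    return {
        "indicator": INDIC_SB.get(record["indic_sb"], record["indic_sb"]),
        "activity": NACE_R2.get(record["nace_r2"], record["nace_r2"]),
        "country": GEO.get(record["geo"], record["geo"]),
        "year": record["time"],
        "value": record["value"],
    }


# A coded record as it might appear in a raw extract (value is made up).
raw = {"indic_sb": "V12110", "nace_r2": "J6010", "geo": "SK",
       "time": "2018", "value": "123.4"}
print(decode(raw))
```

Falling back to the raw code, rather than raising an error, keeps unknown codes visible so they can be added to the codelist later.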


We added descriptive metadata that help you find our data and match it with other relevant data sources. For example, we add keywords and standardized metadata identifiers from the Library of Congress Linked Data Services, probably the world’s largest collection of standardized library descriptions. This ensures that, in addition to our turnover data, you can find relevant data around the same key term ("Radio broadcasting"). It also connects our dataset unambiguously with other information sources that use the same concept but list it under different keywords, such as Radio–Broadcasting, Radio industry and trade, Hörfunkveranstalter in German, Emitiranje radijskog programa in Croatian or Actividades de radiodifusão in Portuguese.
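Matching different keywords to one concept can be sketched as a lookup from normalised labels to a single identifier. In this minimal sketch `CONCEPT_ID` is a placeholder, not the actual Library of Congress identifier, and the dataset catalogue is hypothetical; only the labels are taken from the paragraph above.

```python
from typing import Dict, List, Optional

# Placeholder: substitute the real Library of Congress Linked Data
# Services URI for the "Radio broadcasting" concept.
CONCEPT_ID = "lcsh:PLACEHOLDER-radio-broadcasting"

# Different keywords (and languages) that point to the same concept.
LABELS: Dict[str, str] = {
    label.casefold(): CONCEPT_ID
    for label in [
        "Radio broadcasting",
        "Radio industry and trade",
        "Hörfunkveranstalter",
        "Emitiranje radijskog programa",
        "Actividades de radiodifusão",
    ]
}

# Hypothetical catalogue: each dataset is tagged with concept identifiers.
CATALOGUE = [
    {"title": "Turnover in radio broadcasting enterprises", "subjects": [CONCEPT_ID]},
    {"title": "Number of radio broadcasting enterprises", "subjects": [CONCEPT_ID]},
    {"title": "Number of concert venues", "subjects": ["lcsh:PLACEHOLDER-concerts"]},
]


def find_concept(keyword: str) -> Optional[str]:
    """Return the concept identifier, ignoring case and surrounding whitespace."""
    return LABELS.get(keyword.strip().casefold())


def datasets_for(keyword: str) -> List[str]:
    """List titles of datasets tagged with the keyword's concept."""
    concept = find_concept(keyword)
    if concept is None:
        return []
    return [d["title"] for d in CATALOGUE if concept in d["subjects"]]


print(datasets_for("Hörfunkveranstalter"))
```

A German-language search thus returns the same datasets as an English one, because both resolve to the same identifier rather than to matching strings.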

Accessible Data

Our data is accessible in two forms: in CSV tabular format (which can be read with Excel, OpenOffice, Numbers, SPSS and many similar spreadsheet or statistical applications) and in JSON for automated importing into your databases. We can also provide SQLite databases, which are fully functional, single-user relational databases.
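Producing all three formats from one tidy table takes little code. A minimal sketch using only the Python standard library, assuming a three-column table (`geo`, `time`, `value`) with made-up values; the column names and the SQLite table name `indicator` are illustrative.

```python
import csv
import io
import json
import sqlite3
from typing import Dict, List, Union

Row = Dict[str, Union[str, int, float]]

# A few tidy observations (made-up values, for illustration only).
ROWS: List[Row] = [
    {"geo": "SK", "time": 2018, "value": 1.5},
    {"geo": "HU", "time": 2018, "value": 2.5},
]


def to_csv(rows: List[Row]) -> str:
    """Serialise tidy rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["geo", "time", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: List[Row]) -> str:
    """Serialise tidy rows as a JSON array of objects."""
    return json.dumps(rows, ensure_ascii=False)


def to_sqlite(rows: List[Row], path: str = ":memory:") -> sqlite3.Connection:
    """Load tidy rows into a single-table SQLite database."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE indicator (geo TEXT, time INTEGER, value REAL)")
    con.executemany("INSERT INTO indicator VALUES (:geo, :time, :value)", rows)
    con.commit()
    return con


print(to_csv(ROWS))
con = to_sqlite(ROWS)
print(con.execute("SELECT geo, value FROM indicator ORDER BY geo").fetchall())
```

Because the table is already tidy, each export is a direct serialisation with no reshaping, so the three formats cannot drift apart.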

Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. This makes the data easier to clean, and far easier to use in a much wider range of applications than the original data we used. In theory, this is a simple objective, yet we find that even governmental statistical agencies, and even scientific publications, often publish untidy data. This causes significant productivity losses: tidying data requires long hours of work, and if a reproducible workflow is not used, data integrity can also be compromised, because the tidying process may overwrite, delete or omit a data point or a label.
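Many statistical tables are published wide, with years spread across columns. The sketch below reshapes such a table into tidy rows, one per country and year. It is a minimal illustration assuming a single identifier column and year-named columns, with made-up values; it keeps missing cells (marked ":" as in Eurostat tables) as explicit `None` values so that the reshaping step cannot silently drop an observation.

```python
from typing import Dict, List, Optional

# An untidy, spreadsheet-style extract: one row per country, one column per
# year. Values are made up; ":" marks a missing value, as in Eurostat tables.
WIDE: List[Dict[str, str]] = [
    {"geo": "SK", "2017": "10.5", "2018": ":"},
    {"geo": "HU", "2017": "20.1", "2018": "21.0"},
]


def tidy(
    wide: List[Dict[str, str]], id_col: str = "geo", missing: str = ":"
) -> List[Dict[str, object]]:
    """Reshape wide rows into one row per (country, year) observation.

    Missing values are kept as explicit None rather than dropped,
    so no observation or label disappears during tidying.
    """
    rows: List[Dict[str, object]] = []
    for record in wide:
        for column, raw in record.items():
            if column == id_col:
                continue  # the identifier column is not a year
            value: Optional[float] = None if raw.strip() == missing else float(raw)
            rows.append({id_col: record[id_col], "time": int(column), "value": value})
    return rows


for row in tidy(WIDE):
    print(row)
```

Running the reshaping as code, rather than by hand in a spreadsheet, is what makes the workflow reproducible: the same input always yields the same tidy output.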


While the original data source, the Eurostat data warehouse, is accessible, too, we added value by bringing the data into a tidy format. Tidy data can be imported immediately into a statistical application like SPSS or STATA, or into your own database, and is immediately available for plotting in Excel, OpenOffice or Numbers.

Interoperability

Our data can easily be imported into, or joined with, data from other internal or external sources.
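Because tidy tables share key columns, joining them is a short operation. A minimal sketch, assuming two hypothetical tidy tables keyed by `geo` and `time`, with made-up values. It performs an inner join, so the 2019 enterprise count without a matching turnover row is dropped.

```python
from typing import Dict, List, Tuple

# Two tidy tables sharing the key columns "geo" and "time"
# (made-up values, for illustration only).
TURNOVER = [
    {"geo": "SK", "time": 2018, "turnover": 100.0},
    {"geo": "HU", "time": 2018, "turnover": 250.0},
]
ENTERPRISES = [
    {"geo": "SK", "time": 2018, "enterprises": 20},
    {"geo": "HU", "time": 2018, "enterprises": 50},
    {"geo": "SK", "time": 2019, "enterprises": 22},
]


def inner_join(
    left: List[Dict], right: List[Dict], keys: Tuple[str, ...] = ("geo", "time")
) -> List[Dict]:
    """Combine rows whose key columns match; unmatched rows are dropped.

    Assumes the key columns uniquely identify rows in the right table.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            joined.append({**row, **match})
    return joined


for row in inner_join(TURNOVER, ENTERPRISES):
    # Derived indicator: average turnover per enterprise.
    print(row["geo"], row["time"], row["turnover"] / row["enterprises"])
```

The derived ratio in the loop shows why shared keys matter: a new indicator comes from two sources without any manual matching.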

All our indicators come with standardized descriptive metadata and statistical (processing) metadata. See our API.

All our indicators come with standardized descriptive metadata following two important standards, Dublin Core and DataCite, implementing not only the mandatory but also the recommended properties. This makes it far easier to connect the data with other data sources, e.g., turnover with the number of radio broadcasting enterprises or radio stations within specific territories.
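Checking a record against a standard's required properties is easy to automate. The sketch below assumes the mandatory and recommended property names of the DataCite Metadata Schema 4.x (verify against the current schema version) and uses a hypothetical record whose identifier, creator and publisher are placeholders. It reports which properties are still missing.

```python
from typing import Dict, List, Sequence

# Mandatory and recommended properties of the DataCite Metadata Schema 4.x
# (as we understand the schema; check the current version).
DATACITE_MANDATORY = ("identifier", "creator", "title", "publisher",
                      "publicationYear", "resourceType")
DATACITE_RECOMMENDED = ("subject", "contributor", "date", "relatedIdentifier",
                        "description", "geoLocation")

# Hypothetical record; identifier, creator and publisher are placeholders.
record: Dict[str, object] = {
    "identifier": "10.0000/PLACEHOLDER",
    "creator": ["PLACEHOLDER CREATOR"],
    "title": "Turnover in Radio Broadcasting Enterprises",
    "publisher": "PLACEHOLDER PUBLISHER",
    "publicationYear": "2021",
    "resourceType": "Dataset",
    "subject": ["Radio broadcasting"],
    "description": "Annual turnover of radio broadcasting enterprises by country.",
}


def missing_properties(meta: Dict[str, object], required: Sequence[str]) -> List[str]:
    """Return the properties that are absent or empty in a metadata record."""
    return [prop for prop in required if not meta.get(prop)]


print("Missing mandatory:", missing_properties(record, DATACITE_MANDATORY))
print("Missing recommended:", missing_properties(record, DATACITE_RECOMMENDED))
```

Running the same check for recommended properties turns "we implement the recommended descriptions, too" into something that can be verified for every release.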

Our passion for documentation standards and best practices goes much further: our data uses Statistical Data and Metadata eXchange (SDMX) standardized codebooks, unit descriptions and other statistical and administrative metadata.
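One practical benefit of codified dimensions is that every value can be validated automatically. A minimal sketch, assuming tiny hand-typed codelist excerpts (`MIO_EUR` follows the Eurostat convention for million euro, but treat all codes here as illustrative); a real pipeline would load the full codelists from the data provider.

```python
from typing import Dict, List, Set, Tuple

# Tiny, illustrative excerpts of SDMX-style codelists (not the full lists).
CODELISTS: Dict[str, Set[str]] = {
    "geo": {"SK", "HU", "CZ"},
    "unit": {"MIO_EUR"},
}


def invalid_codes(
    rows: List[Dict[str, str]], codelists: Dict[str, Set[str]]
) -> List[Tuple[int, str, object]]:
    """Report (row number, dimension, value) for every value not in its codelist."""
    problems = []
    for i, row in enumerate(rows):
        for dimension, allowed in codelists.items():
            value = row.get(dimension)
            if value not in allowed:
                problems.append((i, dimension, value))
    return problems


rows = [
    {"geo": "SK", "unit": "MIO_EUR"},
    {"geo": "Slovakia", "unit": "MIO_EUR"},  # free-text label instead of a code
    {"geo": "HU"},                           # unit missing entirely
]
print(invalid_codes(rows, CODELISTS))
```

Free-text labels and missing units are exactly the errors that make joined datasets unreliable; checking against codelists catches them before release.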

Reuse

All our datasets come with standardized information about reusability. We add citation and attribution data, and licensing terms. Most of our datasets can be used without commercial restriction after acknowledging the source, but we sometimes work with data under less permissive licenses.

In the case presented here, we added further value to encourage re-use. In addition to tidying, we significantly increased the usability of public data by handling missing cases. This is the subject of our next blogpost.
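The handling of missing cases is the subject of the next blogpost, so the sketch below is not our method. It shows one common, simple option: linear interpolation of interior gaps in an annual series, assuming equally spaced years and made-up values. Imputed values are flagged so users can tell observed and estimated data apart, and gaps at either end are left missing rather than extrapolated.

```python
from typing import List, Optional, Tuple


def interpolate(
    values: List[Optional[float]],
) -> Tuple[List[Optional[float]], List[bool]]:
    """Fill interior gaps linearly and flag which values were imputed.

    Assumes consecutive, equally spaced periods (e.g. years). Leading and
    trailing gaps stay None because there is nothing to interpolate between.
    """
    filled = list(values)
    imputed = [False] * len(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of neighbouring observed points.
    for start, end in zip(known, known[1:]):
        step = (values[end] - values[start]) / (end - start)
        for i in range(start + 1, end):
            filled[i] = values[start] + step * (i - start)
            imputed[i] = True
    return filled, imputed


series = [None, 2.0, None, None, 8.0, None]  # made-up annual values
print(interpolate(series))
```

Returning the flags alongside the values keeps the imputation transparent, which matters more for official-statistics users than the choice of any particular imputation method.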

Are you a data user? How could we serve you better?

Shall we do some further automatic data enhancements with our datasets? Document with different metadata? Link more information for business, policy, or academic use? Please get in touch with us!