<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>metadata | Green Deal Data Observatory</title>
    <link>https://greendeal.dataobservatory.eu/tag/metadata/</link>
      <atom:link href="https://greendeal.dataobservatory.eu/tag/metadata/index.xml" rel="self" type="application/rss+xml" />
    <description>metadata</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Wed, 29 Jun 2022 08:12:00 +0100</lastBuildDate>
    <image>
      <url>https://greendeal.dataobservatory.eu/media/icon_hu15ef3b829c0a4063327dbf09185a10cc_70008_512x512_fill_lanczos_center_3.png</url>
      <title>metadata</title>
      <link>https://greendeal.dataobservatory.eu/tag/metadata/</link>
    </image>
    
    <item>
      <title>statcodelists: use standard, language-independent variable codes to help international data interoperability and machine reuse in R</title>
      <link>https://greendeal.dataobservatory.eu/post/2022-06-29-statcodelists/</link>
      <pubDate>Wed, 29 Jun 2022 08:12:00 +0100</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2022-06-29-statcodelists/</guid>
      <description>&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-visit-the-documentation-website-of-statcodelists-on-statcodelistsdataobservatoryeuhttpsstatcodelistsdataobservatoryeu&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Visit the documentation website of statcodelists on [statcodelists.dataobservatory.eu/](https://statcodelists.dataobservatory.eu/).&#34; srcset=&#34;
               /media/img/blogposts_2022/statcodelists_website_huef7e1379be389a62e3a47c5a8502e55c_102481_0b514d80337ede30bff4c26cee6a6f11.webp 400w,
               /media/img/blogposts_2022/statcodelists_website_huef7e1379be389a62e3a47c5a8502e55c_102481_1416f7a0950b1cecac8097850d995432.webp 760w,
               /media/img/blogposts_2022/statcodelists_website_huef7e1379be389a62e3a47c5a8502e55c_102481_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2022/statcodelists_website_huef7e1379be389a62e3a47c5a8502e55c_102481_0b514d80337ede30bff4c26cee6a6f11.webp&#34;
               width=&#34;760&#34;
               height=&#34;428&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Visit the documentation website of statcodelists on &lt;a href=&#34;https://statcodelists.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;statcodelists.dataobservatory.eu/&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;!-- badges: start --&gt;
&lt;p&gt;&lt;a href=&#34;https://dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;















&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;https://img.shields.io/badge/ecosystem-dataobservatory.eu-3EA135.svg&#34; alt=&#34;dataobservatory&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;/a&gt;&lt;/p&gt;
&lt;!-- badges: end --&gt;
&lt;p&gt;The goal of &lt;code&gt;statcodelists&lt;/code&gt; is to promote the reuse and exchange of statistical information and related metadata by making the internationally standardized SDMX code lists available to R users. SDMX, the &lt;a href=&#34;https://sdmx.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Statistical Data and Metadata eXchange&lt;/a&gt;, has been published as an ISO International Standard (ISO 17369). The metadata definitions, including the code lists, are updated regularly according to the standard. The authoritative version of the code lists made available in this package can be found at &lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://sdmx.org/?page_id=3215/&lt;/a&gt;.&lt;/p&gt;
&lt;details class=&#34;spoiler &#34;  id=&#34;spoiler-1&#34;&gt;
  &lt;summary&gt;Click to expand table of contents of the post&lt;/summary&gt;
  &lt;p&gt;&lt;details class=&#34;toc-inpage d-print-none  &#34; open&gt;
  &lt;summary class=&#34;font-weight-bold&#34;&gt;Table of Contents&lt;/summary&gt;
  &lt;nav id=&#34;TableOfContents&#34;&gt;
  &lt;ul&gt;
    &lt;li&gt;&lt;a href=&#34;#purpose&#34;&gt;Purpose&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#installation&#34;&gt;Installation&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href=&#34;#code-of-conduct&#34;&gt;Code of Conduct&lt;/a&gt;&lt;/li&gt;
  &lt;/ul&gt;
&lt;/nav&gt;
&lt;/details&gt;
&lt;/p&gt;
&lt;/details&gt;
&lt;h2 id=&#34;purpose&#34;&gt;Purpose&lt;/h2&gt;
&lt;p&gt;Cross-domain concepts in the SDMX framework describe concepts relevant to many, if not all, statistical domains. SDMX recommends using these concepts whenever feasible in SDMX structures and messages to promote the reuse and exchange of statistical information and related metadata between organisations.&lt;/p&gt;
&lt;p&gt;Code lists are predefined sets of terms from which some statistical coded concepts take their values. SDMX cross-domain code lists are used to support cross-domain concepts. What are these cross-domain coded concepts?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Geographical codes, like &lt;code&gt;NL&lt;/code&gt;:  the Netherlands in the &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/CL_AREA.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CL_AREA&lt;/a&gt; code list.&lt;/li&gt;
&lt;li&gt;Standard industry codes, like &lt;code&gt;J631&lt;/code&gt; for Data processing, hosting and related activities in &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/CL_ACTIVITY_NACE2.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NACE Rev 2&lt;/a&gt;, which is used in Europe. Beware: the code is &lt;code&gt;J592&lt;/code&gt; in Australia and New Zealand, see &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/CL_ACTIVITY_ANZSIC06.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CL_ACTIVITY_ANZSIC06&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Occupations, like &lt;code&gt;OC2521&lt;/code&gt; for &lt;code&gt;Database designers and administrators&lt;/code&gt; in &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/CL_OCCUPATION.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CL_OCCUPATION&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Time formatting standards, like &lt;code&gt;CCYY&lt;/code&gt; for annual data series in &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/CL_TIME_FORMAT.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CL_TIME_FORMAT&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Check out the available code lists on the &lt;a href=&#34;https://statcodelists.dataobservatory.eu/reference/index.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;package homepage&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The use of common code lists will help users to work even more efficiently, easing the maintenance of and reducing the need for mapping systems and interfaces delivering data and metadata to them. A very obvious advantage of using the code systems is that you can retrieve data from national sources regardless of the natural language used in North Macedonia, Japan, the U.S. or the Netherlands. While the data labels may change to be locally human-readable, computers and geeks can read the codes and understand them immediately, provided that they use the standard codes.&lt;/p&gt;
&lt;p&gt;Our data observatories are rolling out SDMX coding across all datasets to help data ingestion and interoperability, data findability and data reuse. &lt;code&gt;statcodelists&lt;/code&gt; can help you use standard SDMX codes in your R workflow, both for downloading data from statistical agencies and for producing publication-ready datasets that the rest of the world (and even APIs) will understand.&lt;/p&gt;
&lt;h2 id=&#34;installation&#34;&gt;Installation&lt;/h2&gt;
&lt;p&gt;You can install &lt;code&gt;statcodelists&lt;/code&gt; from CRAN:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;install.packages(&amp;#34;statcodelists&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Further recommended code values for expressing general statistical concepts like &lt;code&gt;not applicable&lt;/code&gt;, etc., can be found in section &lt;code&gt;Generic codes&lt;/code&gt; of the &lt;a href=&#34;https://sdmx.org/?page_id=4345&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Guidelines for the creation and management of SDMX Cross-Domain Code Lists&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For further code lists that are used by reliable statistical agencies but are not harmonized at the SDMX level, please consult the &lt;a href=&#34;https://registry.sdmx.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SDMX Global Registry&lt;/a&gt; &lt;a href=&#34;https://registry.sdmx.org/items/codelist.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Codelists&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;The creator of this package is not affiliated with SDMX, and this package has not been endorsed by SDMX.&lt;/p&gt;
&lt;h2 id=&#34;code-of-conduct&#34;&gt;Code of Conduct&lt;/h2&gt;
&lt;p&gt;Please note that the &lt;code&gt;statcodelists&lt;/code&gt; project is released with a &lt;a href=&#34;https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Contributor Code of Conduct&lt;/a&gt;. By contributing to this project, you agree to abide by its terms.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>How We Add Value to Public Data With Imputation and Forecasting?</title>
      <link>https://greendeal.dataobservatory.eu/post/2021-11-08-indicator_value_added/</link>
      <pubDate>Mon, 08 Nov 2021 10:00:00 +0100</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2021-11-08-indicator_value_added/</guid>
      <description>&lt;p&gt;Public data sources are often plagued by missing values. Naively you may think that you can ignore them, but think twice: in most cases, missing data in a table is not missing information, but rather malformatted information. Ignoring or dropping missing values will not be feasible or robust when you want to make a beautiful visualization, or use data in a business forecasting model, a machine learning (AI) application, or a more complex scientific model. All of the above require complete datasets, and naively discarding missing data points amounts to an excessive waste of information. In this example we continue working with a not-so-easy-to-find public dataset.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-in-the-previous-blogpost-we-explained-how-we-added-value-with-documenting-the-data-following-the-fair-principle-and-with-the-professional-curatorial-work-of-placing-the-data-in-context-and-linking-it-to-other-information-sources-that-are-not-depending-on-the-english-language-and-can-connect-our-radio-dataset-to-other-data-books-publications-regardless-if-they-are-described-in-english-or-in-german-or-slovak-photo-atmospheric-research-observatory-south-pole-antarctica-photo-noaahttpsunsplashcomphotoswwvd4wxrx38&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;In the previous blogpost we explained how we added value with documenting the data following the *FAIR* principle and with the professional curatorial work of placing the data in context, and linking it to other information sources that are not depending on the English language, and can connect our radio dataset to other data, books, publications, regardless if they are described in English, or in German, or Slovak. Photo: Atmospheric Research Observatory, South Pole, Antarctica Photo: [NOAA](https://unsplash.com/photos/WWVD4wXRX38).&#34; srcset=&#34;
               /media/img/blogposts_2021/noaa-WWVD4wXRX38-unsplash-edited_huc1de598e48bcf2ca9302064c36ee3048_2297404_13a19cc7308f7f90fb71ae2c524e8fe6.webp 400w,
               /media/img/blogposts_2021/noaa-WWVD4wXRX38-unsplash-edited_huc1de598e48bcf2ca9302064c36ee3048_2297404_4c70859ff3bfdb7160714dc07c4d5305.webp 760w,
               /media/img/blogposts_2021/noaa-WWVD4wXRX38-unsplash-edited_huc1de598e48bcf2ca9302064c36ee3048_2297404_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/noaa-WWVD4wXRX38-unsplash-edited_huc1de598e48bcf2ca9302064c36ee3048_2297404_13a19cc7308f7f90fb71ae2c524e8fe6.webp&#34;
               width=&#34;760&#34;
               height=&#34;504&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      In the previous blogpost we explained how we added value by documenting the data following the &lt;em&gt;FAIR&lt;/em&gt; principles, and by the professional curatorial work of placing the data in context and linking it to other information sources that do not depend on the English language, so that our dataset can be connected to other data, books, and publications, regardless of whether they are described in English, German, or Slovak. Photo: Atmospheric Research Observatory, South Pole, Antarctica, by &lt;a href=&#34;https://unsplash.com/photos/WWVD4wXRX38&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NOAA&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;Completing missing datapoints requires statistical production information (why might the data be missing?) and data science knowhow (how to impute the missing value.) If you do not have a good statistician or data scientist in your team, you will need high-quality, complete datasets. This is what our automated data observatories provide.&lt;/p&gt;
&lt;h2 id=&#34;why-is-data-missing&#34;&gt;Why is data missing?&lt;/h2&gt;
&lt;p&gt;International organizations offer many statistical products, but usually they are on an ‘as-is’ basis. For example, Eurostat is the world’s premier statistical agency, but it has no right to overrule whatever data the member states of the European Union, and some other cooperating European countries, give to it. And it cannot force these countries to hand over data if they fail to do so. As a result, there will be many data points that are missing, and often data points that have wrong (obsolete) descriptions or geographical dimensions. We will show the geographical aspect of the problem in a separate blogpost; for now, we only focus on missing data.&lt;/p&gt;
&lt;p&gt;Some countries have only recently started providing data to the Eurostat umbrella organization, and it is likely that you will find few datapoints for North Macedonia or Bosnia-Herzegovina. Other countries provide data with some delay, and the last one or two years are missing. And there are gaps in some countries’ data, too.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-see-the-authoritative-copy-of-the-datasethttpszenodoorgrecord4775787yyqevmdmliu&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;See the authoritative copy of the [dataset](https://zenodo.org/record/4775787#.YYqevmDMLIU).&#34; srcset=&#34;
               /media/img/blogposts_2021/gbard_environment_expenditure_plot_hu092519695c5c8c0c293bf2a5eeefe580_292114_a4f175ef26eb4fd64901b7fec564a2d4.webp 400w,
               /media/img/blogposts_2021/gbard_environment_expenditure_plot_hu092519695c5c8c0c293bf2a5eeefe580_292114_99b295653ecf8ec6dbf89153a188c1fa.webp 760w,
               /media/img/blogposts_2021/gbard_environment_expenditure_plot_hu092519695c5c8c0c293bf2a5eeefe580_292114_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/gbard_environment_expenditure_plot_hu092519695c5c8c0c293bf2a5eeefe580_292114_a4f175ef26eb4fd64901b7fec564a2d4.webp&#34;
               width=&#34;760&#34;
               height=&#34;507&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      See the authoritative copy of the &lt;a href=&#34;https://zenodo.org/record/4775787#.YYqevmDMLIU&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataset&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;This is a headache if you want to use the data in some machine learning application or in a multiple or panel regression model. You can, of course, discard countries or years where you do not have full data coverage, but this approach usually wastes too much information&amp;ndash;if you work with 12 years of data, and only one data point is missing, you would be discarding an entire country’s 11 years’ worth of data. Another option is to estimate the values, or otherwise impute the missing data, when this is possible with reasonable precision. This is where things get tricky, and you will likely need a statistician or a data scientist onboard.&lt;/p&gt;
&lt;h2 id=&#34;what-can-we-improve&#34;&gt;What can we improve?&lt;/h2&gt;
&lt;p&gt;Consider that the data is only missing from one year for a particular country, 2015. The naive solution would be to omit 2015 or the country at hand from the dataset. This is pretty destructive, because we know a lot about the R&amp;amp;D allocations in this country and in this year! But leaving 2015 blank will not look good on a chart, and will make your machine learning application or your regression model stop.&lt;/p&gt;
&lt;p&gt;A statistician or an innovation expert will tell you that you more or less know the missing information: the total allocation was most likely not zero in that year. With some statistical, innovation, or public finance specific knowledge you will use the 2014 or 2016 value, or a combination of the two, and keep the country and year in the dataset.&lt;/p&gt;
&lt;p&gt;Our improved dataset added backcasted (using the best time series model fitting the country&amp;rsquo;s actually present data), forecasted (again, using the best time series model), and approximated data (using linear approximation). In a few cases, we add the last or next known value. To give a few quantitative indicators about our work:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Increased number of observations: 29.2%&lt;/li&gt;
&lt;li&gt;Reduced missing values: -26.4%&lt;/li&gt;
&lt;li&gt;Increased non-missing subset for regression or AI: +64.7%&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your organization is working with panel (longitudinal multiple) regressions or various machine learning applications, then your team knows that not having the +66.67% gain would be a deal-breaker in the choice of models and the precision of estimates, KPIs, or other quantitative products. And that they would spend about 90% of their data resources on achieving this +66.67% gain in usability.&lt;/p&gt;
&lt;p&gt;If you happen to work in an NGO, a business unit or a research institute that does not employ data scientists, then it is likely that you can never achieve this improvement, and you have to give up on a number of quantitative tools or visualizations. If you  have a data scientist onboard, that professional can use our work as a starting point.&lt;/p&gt;
&lt;h2 id=&#34;can-you-trust-our-data&#34;&gt;Can you trust our data?&lt;/h2&gt;
&lt;p&gt;We believe that you can trust our data more than the original public source. We use statistical expertise to find out why data may be missing. Often, it is present in the wrong location (for example, because the name of a region changed).&lt;/p&gt;
&lt;p&gt;If you are reluctant to use estimates, think about discarding known actual data from your forecast or visualization, because one data point is missing.  How do you provide more accurate information? By hiding known actual data, because one point is missing, or by using all known data and an estimate?&lt;/p&gt;
&lt;p&gt;Our codebooks and our API use the &lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Statistical Data and Metadata eXchange&lt;/a&gt; documentation standards to clearly indicate which data is observed, which is missing, which is estimated, and of course, also how it is estimated.
This highlights another important aspect of data trustworthiness: if you have a better idea, you can replace our estimates with better ones.&lt;/p&gt;
&lt;p&gt;Our indicators come with standardized codebooks that contain not only descriptive metadata, but also administrative metadata about the history of the indicator values. You will find very important information about the statistical method we used to fill in the data gaps, and even links to the reliable, peer-reviewed scientific statistical software that made the calculations. For data scientists, we record plenty of information about the computing environment, too; this can come in handy if your estimates need external authentication, or you suspect a bug.&lt;/p&gt;
&lt;h2 id=&#34;avoid-the-data-sisyphus&#34;&gt;Avoid the data Sisyphus&lt;/h2&gt;
&lt;p&gt;If you work in an academic institution, in an NGO or a consultancy, you can never be sure who downloaded the &lt;a href=&#34;http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=gba_nabsfin07&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GBARD by socioeconomic objectives (NABS 2007)&lt;/a&gt; Eurostat folder from Eurostat. Did they modify the dataset? Did they already make corrections with the missing data? What method did they use? To prevent many potential problems, you will likely download it again, and again, and again&amp;hellip;&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-see-our-the-data-sisyphushttpsreprexnlpost2021-07-08-data-sisyphus-blogpost&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;See our [The Data Sisyphus](https://reprex.nl/post/2021-07-08-data-sisyphus/) blogpost.&#34; srcset=&#34;
               /media/img/blogposts_2021/Sisyphus_Bodleian_Library_hu99f0c1d6c82963b9538437670b4d339d_1662894_cd48a6c374c9ff68a08abe79a6abf2f4.webp 400w,
               /media/img/blogposts_2021/Sisyphus_Bodleian_Library_hu99f0c1d6c82963b9538437670b4d339d_1662894_a6eb1b13ff33a5c73aba34550964ff52.webp 760w,
               /media/img/blogposts_2021/Sisyphus_Bodleian_Library_hu99f0c1d6c82963b9538437670b4d339d_1662894_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/Sisyphus_Bodleian_Library_hu99f0c1d6c82963b9538437670b4d339d_1662894_cd48a6c374c9ff68a08abe79a6abf2f4.webp&#34;
               width=&#34;760&#34;
               height=&#34;507&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      See our &lt;a href=&#34;https://reprex.nl/post/2021-07-08-data-sisyphus/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The Data Sisyphus&lt;/a&gt; blogpost.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;We have a better solution. You can always rely on our API to import directly the latest, best data, but if you want to be sure, you can use our &lt;a href=&#34;https://zenodo.org/record/5652118#.YYhGOGDMLIU&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regular backups&lt;/a&gt; on Zenodo. Zenodo is an open science repository managed by CERN and supported by the European Union. On Zenodo, you can find an authoritative copy of our indicator (and its previous versions) with a digital object identifier, in this case, &lt;a href=&#34;https://doi.org/10.5281/zenodo.5661169&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;10.5281/zenodo.5661169&lt;/a&gt;. These datasets will be preserved for decades, and nobody can manipulate them. You cannot accidentally overwrite them, and we have no backdoor to modify them.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.5661169&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;















&lt;figure  &gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.5661169.svg&#34; alt=&#34;DOI&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;/figure&gt;
&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Are you a data user? Give us some feedback! Shall we do some further automatic data enhancements with our datasets? Document with different metadata? Link more information for business, policy, or academic use? Please  give us any &lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;feedback&lt;/a&gt;!&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>How We Add Value to Public Data With Better Curation And Documentation?</title>
      <link>https://greendeal.dataobservatory.eu/post/2021-11-08-indicator_findable/</link>
      <pubDate>Mon, 08 Nov 2021 09:00:00 +0100</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2021-11-08-indicator_findable/</guid>
      <description>&lt;p&gt;In this example, we show a simple indicator: the &lt;em&gt;Government Budget Allocations for R&amp;amp;D in Environment&lt;/em&gt; in many European countries. (In our &lt;em&gt;Digital Music Observatory&lt;/em&gt; we give a more relevant &lt;a href=&#34;https://music.dataobservatory.eu/post/2021-11-08-indicator_findable/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;example&lt;/a&gt; about the turnover of the radio industry in Europe.)&lt;/p&gt;
&lt;p&gt;This dataset comes from a public datasource, the data warehouse of the
European statistical agency, Eurostat. Yet it is not trivial to use:
unless you are familiar with the &lt;em&gt;nomenclature for the analysis and comparison of scientific programmes and budgets&lt;/em&gt; or the &lt;a href=&#34;https://www.oecd.org/sti/frascati-manual-2015-9789264239012-en.htm&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Frascati Manual&lt;/a&gt;, you will probably not find &lt;a href=&#34;http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=gba_nabsfin07&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this dataset&lt;/a&gt; on the Eurostat website.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-the-raw-data-can-be-retrieved-gbard-by-socioeconomic-objectives-nabs-2007gba_nabsfin07-eurostat-folder-if-you-find-it&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;The raw data can be retrieved from the GBARD by socioeconomic objectives (NABS 2007) [gba_nabsfin07] Eurostat folder (if you can find it).&#34; srcset=&#34;
               /media/img/blogposts_2021/gbard_environment_expenditure_plot_hu092519695c5c8c0c293bf2a5eeefe580_292114_a4f175ef26eb4fd64901b7fec564a2d4.webp 400w,
               /media/img/blogposts_2021/gbard_environment_expenditure_plot_hu092519695c5c8c0c293bf2a5eeefe580_292114_99b295653ecf8ec6dbf89153a188c1fa.webp 760w,
               /media/img/blogposts_2021/gbard_environment_expenditure_plot_hu092519695c5c8c0c293bf2a5eeefe580_292114_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/gbard_environment_expenditure_plot_hu092519695c5c8c0c293bf2a5eeefe580_292114_a4f175ef26eb4fd64901b7fec564a2d4.webp&#34;
               width=&#34;760&#34;
               height=&#34;507&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      The raw data can be retrieved from the GBARD by socioeconomic objectives (NABS 2007) [gba_nabsfin07] Eurostat folder (if you can find it).
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;Our version of this statistical indicator is documented following the &lt;a href=&#34;https://www.go-fair.org/fair-principles/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FAIR principles&lt;/a&gt;: our data assets
are findable, accessible, interoperable, and reusable. While the
Eurostat data warehouse partly fulfills these important data quality
expectations, we can improve them significantly. And we can
improve the dataset, too, as we will show in the &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-11-06-indicator_value_added/&#34;&gt;next blogpost&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;findable-data&#34;&gt;Findable Data&lt;/h2&gt;
&lt;p&gt;Our data observatories add value by curating the data&amp;ndash;we bring this
indicator to light with a more descriptive name, and we place it in
context with our &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Green Deal Data Observatory&lt;/a&gt;.
While many people in environmental policy organizations, NGOs, science journalism, or research may need this dataset, most of them have no training in the nomenclatures of scientific and R&amp;amp;D spending or public budget accounts. Our curated data observatories bring together much of the
available data around important domains. Our &lt;em&gt;Green Deal Data Observatory&lt;/em&gt;, for example, aims to form an ecosystem of climate policy and climate change mitigation data users and producers.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-we-added-descriptive-metadatahttpszenodoorgrecord5658849yyqicwdmliu-that-help-you-find-our-data-and-match-it-with-other-relevant-data-sources&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;We [added descriptive metadata](https://zenodo.org/record/5658849#.YYqicWDMLIU) that help you find our data and match it with other relevant data sources.&#34; srcset=&#34;
               /media/img/blogposts_2021/zenodo_gbard_environment_expenditure_metadata_hu466af5eda667e61c992cbc3770f1c27b_194619_94393f82400c1139d76477a52a1af13a.webp 400w,
               /media/img/blogposts_2021/zenodo_gbard_environment_expenditure_metadata_hu466af5eda667e61c992cbc3770f1c27b_194619_2b0d6d8f077aaaeca31f7fc768a35e03.webp 760w,
               /media/img/blogposts_2021/zenodo_gbard_environment_expenditure_metadata_hu466af5eda667e61c992cbc3770f1c27b_194619_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/zenodo_gbard_environment_expenditure_metadata_hu466af5eda667e61c992cbc3770f1c27b_194619_94393f82400c1139d76477a52a1af13a.webp&#34;
               width=&#34;760&#34;
               height=&#34;428&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      We &lt;a href=&#34;https://zenodo.org/record/5658849#.YYqicWDMLIU&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;added descriptive metadata&lt;/a&gt; that help you find our data and match it with other relevant data sources.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;We added descriptive metadata that help you find our data and match it
with other relevant data sources. For example, we add keywords and
standardized metadata identifiers from the Library of Congress Linked
Data Services, probably the world’s largest standardized knowledge
library description. This makes sure that you can find relevant data
about the same concept (&lt;a href=&#34;https://id.loc.gov/authorities/subjects/sh85044203.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;environmental protection&lt;/a&gt;)
besides our turnover data. This helps unambiguously connect our dataset
with other information sources that use the same concept, but maybe
different keywords, such as &lt;em&gt;Protection of environment&lt;/em&gt;, or maybe &lt;em&gt;Umweltschutz&lt;/em&gt; in German, or &lt;em&gt;Ochrana životného prostredia&lt;/em&gt; in Slovak. It also helps avoid confusion with &lt;em&gt;Human environment&lt;/em&gt;.&lt;/p&gt;
&lt;h2 id=&#34;accessible-data&#34;&gt;Accessible Data&lt;/h2&gt;
&lt;p&gt;Our data is accessible in two forms: in &lt;code&gt;csv&lt;/code&gt; tabular format (which can be
read with Excel, OpenOffice, Numbers, SPSS and many similar spreadsheet
or statistical applications) and in &lt;code&gt;JSON&lt;/code&gt; for automated importing into
your databases. We can also provide our users with SQLite databases,
which are fully functional, single user relational databases.&lt;/p&gt;
&lt;p&gt;Tidy datasets are easy to manipulate, model and visualize, and have a
specific structure: each variable is a column, each observation is a
row, and each type of observational unit is a table. This makes the data
easier to clean, and far easier to use in a much wider range of
applications than the original data we used. In theory, this is a simple objective,
yet we find that even governmental statistical agencies&amp;ndash;and even scientific
publications&amp;ndash;often publish untidy data. This poses a significant problem that implies
productivity losses: tidying data requires long hours of investment, and if
a reproducible workflow is not used, data integrity can also be compromised:
chances are that the process of tidying will overwrite, delete, or omit a data point or a label.&lt;/p&gt;
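&lt;p&gt;The tidy structure described above can be illustrated with a minimal sketch. It assumes a hypothetical untidy table with one column per year; the column names, the helper &lt;code&gt;to_tidy&lt;/code&gt; and the figures are placeholders for illustration, not our production code or real statistics.&lt;/p&gt;

```python
# Hypothetical untidy ("wide") table: one row per country, one column per year.
# The figures are made-up placeholders, not real statistics.
wide_rows = [
    {"geo": "NL", "2019": 10.0, "2020": 11.5},
    {"geo": "SK", "2019": 4.0, "2020": None},
]


def to_tidy(rows, id_column="geo"):
    """Reshape wide rows so that each output row is exactly one observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row instead
            tidy.append(
                {id_column: row[id_column], "year": int(column), "value": value}
            )
    return tidy


tidy_rows = to_tidy(wide_rows)
for observation in tidy_rows:
    print(observation)
```

&lt;p&gt;Each variable (&lt;code&gt;geo&lt;/code&gt;, &lt;code&gt;year&lt;/code&gt;, &lt;code&gt;value&lt;/code&gt;) is now a column and each observation is a row. The missing value is kept as an explicit observation rather than silently dropped.&lt;/p&gt;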
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-tidy-datasetshttpsr4dshadconztidy-datahtml-are-easy-to-manipulate-model-and-visualize-and-have-a-specific-structure-each-variable-is-a-column-each-observation-is-a-row-and-each-type-of-observational-unit-is-a-table&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;[Tidy datasets](https://r4ds.had.co.nz/tidy-data.html) are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.&#34; srcset=&#34;
               /media/img/blogposts_2021/tidy-8_hub5468e0441f3c23e1be9aa13622e5d1a_299553_840d5597bab1e4d7c2b314453bf83608.webp 400w,
               /media/img/blogposts_2021/tidy-8_hub5468e0441f3c23e1be9aa13622e5d1a_299553_f01845e0e6967cc9a3a2b53cf12edd0a.webp 760w,
               /media/img/blogposts_2021/tidy-8_hub5468e0441f3c23e1be9aa13622e5d1a_299553_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/tidy-8_hub5468e0441f3c23e1be9aa13622e5d1a_299553_840d5597bab1e4d7c2b314453bf83608.webp&#34;
               width=&#34;760&#34;
               height=&#34;355&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;https://r4ds.had.co.nz/tidy-data.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Tidy datasets&lt;/a&gt; are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;While the original data source, the Eurostat data warehouse, is
accessible, too, we added value by bringing the data into a &lt;a href=&#34;https://www.jstatsoft.org/article/view/v059i10&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tidy
format&lt;/a&gt;. Tidy data can
immediately be imported into a statistical application like SPSS or
STATA, or into your own database. It is immediately available for
plotting in Excel, OpenOffice or Numbers.&lt;/p&gt;
&lt;h2 id=&#34;interoperability&#34;&gt;Interoperability&lt;/h2&gt;
&lt;p&gt;Our data can be easily imported, or joined with data from other internal or external sources.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-all-our-indicators-come-with-standardized-descriptive-metadata-and-statistical-processing-metadata-see-our-apihttpsapigreendealdataobservatoryeudatabasemetadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;All our indicators come with standardized descriptive metadata, and statistical (processing) metadata. See our [API](https://api.greendeal.dataobservatory.eu/database/metadata/) &#34; srcset=&#34;
               /media/img/observatory_screenshots/GDO_API_metadata_table_hu31b494a33d5ae09272643545372dbd1d_100491_225afcd2a785db051b89c7c36fdc28b9.webp 400w,
               /media/img/observatory_screenshots/GDO_API_metadata_table_hu31b494a33d5ae09272643545372dbd1d_100491_5807feecbd17bee02fd8c68fad87b1d7.webp 760w,
               /media/img/observatory_screenshots/GDO_API_metadata_table_hu31b494a33d5ae09272643545372dbd1d_100491_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/observatory_screenshots/GDO_API_metadata_table_hu31b494a33d5ae09272643545372dbd1d_100491_225afcd2a785db051b89c7c36fdc28b9.webp&#34;
               width=&#34;760&#34;
               height=&#34;428&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      All our indicators come with standardized descriptive metadata, and statistical (processing) metadata. See our &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/database/metadata/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;API&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;All our indicators come with standardized descriptive metadata,
following two important standards, the &lt;a href=&#34;https://dublincore.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dublin Core&lt;/a&gt; and
&lt;a href=&#34;https://datacite.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DataCite&lt;/a&gt;, implementing not only the mandatory
but also the recommended descriptions. This will make it far easier to
connect the data with other data sources, e.g. turnover with the number of radio broadcasting enterprises or radio stations within specific territories.&lt;/p&gt;
&lt;p&gt;Our passion for documentation standards and best practices goes much further: our data uses &lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Statistical Data and Metadata eXchange&lt;/a&gt; standardized codebooks, unit descriptions and other statistical and administrative metadata.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-we-participate-in-scientific-workhttpsreprexnlpublicationeuropean_visibilitiy_2021-related-to-data-interoperability&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;We participate in [scientific work](https://reprex.nl/publication/european_visibilitiy_2021/) related to data interoperability.&#34; srcset=&#34;
               /media/img/reports/european_visbility_publication_hu9fd9bf0ebbda97354d76a2e1b9589f6b_264884_25232c9bd0c86814e3e3337261110ea4.webp 400w,
               /media/img/reports/european_visbility_publication_hu9fd9bf0ebbda97354d76a2e1b9589f6b_264884_93fa43b83c3a299d78a1afed7bc4f820.webp 760w,
               /media/img/reports/european_visbility_publication_hu9fd9bf0ebbda97354d76a2e1b9589f6b_264884_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/reports/european_visbility_publication_hu9fd9bf0ebbda97354d76a2e1b9589f6b_264884_25232c9bd0c86814e3e3337261110ea4.webp&#34;
               width=&#34;760&#34;
               height=&#34;506&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      We participate in &lt;a href=&#34;https://reprex.nl/publication/european_visibilitiy_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;scientific work&lt;/a&gt; related to data interoperability.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;h2 id=&#34;reuse&#34;&gt;Reuse&lt;/h2&gt;
&lt;p&gt;All our datasets come with standardized information about reusability.
We add citation, attribution data, and licensing terms. Most of our
datasets can be used without commercial restriction after acknowledging
the source, but we sometimes work with less permissive data licenses.&lt;/p&gt;
&lt;p&gt;In the case presented here, we added further value to encourage re-use. In addition to tidying, we
significantly increased the usability of public data by handling
missing cases. This is the subject of our &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-11-06-indicator_value_added/&#34;&gt;next blogpost&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Are you a data user? Give us some feedback! Shall we do some further
automatic data enhancements with our datasets? Document with different
metadata? Link more information for business, policy, or academic use? Please
give us any &lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;feedback&lt;/a&gt;!&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The Data Sisyphus</title>
      <link>https://greendeal.dataobservatory.eu/post/2021-07-08-data-sisyphus/</link>
      <pubDate>Thu, 08 Jul 2021 09:00:00 +0000</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2021-07-08-data-sisyphus/</guid>
      <description>&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-sisyphus-was-punished-by-being-forced-to-roll-an-immense-boulder-up-a-hill-only-for-it-to-roll-down-every-time-it-neared-the-top-repeating-this-action-for-eternity--this-is-the-price-that-project-managers-and-analysts-pay-for-the-inadequate-documentation-of-their-data-assets&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Sisyphus was punished by being forced to roll an immense boulder up a hill only for it to roll down every time it neared the top, repeating this action for eternity.  This is the price that project managers and analysts pay for the inadequate documentation of their data assets.&#34; srcset=&#34;
               /media/img/blogposts_2021/Sisyphus_Bodleian_Library_hu99f0c1d6c82963b9538437670b4d339d_1662894_cd48a6c374c9ff68a08abe79a6abf2f4.webp 400w,
               /media/img/blogposts_2021/Sisyphus_Bodleian_Library_hu99f0c1d6c82963b9538437670b4d339d_1662894_a6eb1b13ff33a5c73aba34550964ff52.webp 760w,
               /media/img/blogposts_2021/Sisyphus_Bodleian_Library_hu99f0c1d6c82963b9538437670b4d339d_1662894_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/blogposts_2021/Sisyphus_Bodleian_Library_hu99f0c1d6c82963b9538437670b4d339d_1662894_cd48a6c374c9ff68a08abe79a6abf2f4.webp&#34;
               width=&#34;760&#34;
               height=&#34;507&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Sisyphus was punished by being forced to roll an immense boulder up a hill only for it to roll down every time it neared the top, repeating this action for eternity.  This is the price that project managers and analysts pay for the inadequate documentation of their data assets.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;&lt;em&gt;When was a file downloaded from the internet?  What has happened to it since?  Are there updates? Was the bibliographical reference made for quotations?  Were missing values imputed?  Were currencies converted? Who knows about it – who created the dataset, who contributed to it?  Which is an intermediate version of a spreadsheet file, and which is the final one, checked and approved by a senior manager?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Big data creates inequality and injustice. One aspect of this inequality is the cost of data processing and documentation – a greatly underestimated and usually unreported cost item. In small organizations, where there are no separate data science and data engineering roles, data is usually supposed to be processed and documented by (junior) analysts or researchers.  This is a very important source of the gap between Big Tech and them: the data usually ends up very expensive, ill-formatted, and not readable by computers that use machine learning and AI. Usually the documentation steps are completely omitted.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Data is potential information, analogous to potential energy: work is required to release it.” &amp;ndash; Jeffrey Pomerantz&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Metadata, which is information about the history of the data and about how it can be technically and legally reused, has a hidden cost. Cheap or low-quality external data comes with poor or no metadata, and small organizations lack the resources to add high-quality metadata to their datasets. However, this only perpetuates the problem.&lt;/p&gt;
&lt;h2 id=&#34;metadata-unbillable-hours&#34;&gt;The hidden cost item behind the unbillable hours&lt;/h2&gt;
&lt;p&gt;As we have shown with our research partners, such metadata problems are not unique to data analysis.  Independent artists and small labels are suffering on music or book sales platforms, because their copyrighted content is not well documented.  If you automatically document tens of thousands of songs or datasets, the documentation cost is very small per item. If you do it manually, the cost may be higher than the expected revenue from the song, or the total cost of the dataset itself. (See our research consortium&amp;rsquo;s preprint paper: &lt;a href=&#34;https://dataandlyrics.com/publication/european_visibilitiy_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In the short run, small consultancies, NGOs, or as a matter of fact, musicians, seem to logically give up on high-quality documentation and logging.  In the long run, this has two devastating consequences. First, computers, such as machine learning algorithms, cannot read their documents, data, or songs.  Second, as memory fades, the ill-documented resources need to be re-created, re-checked, and reformatted.  Often, they are even hard to find on your internal server or laptop archive.&lt;/p&gt;
&lt;p&gt;Metadata is a hidden destroyer of the competitiveness of corporate or academic research, or independent content management.   It is never quoted on external data vendor invoices, and it is not planned as a cost item, because metadata, the description of a dataset, a document, a presentation, or a song, is meaningless without the resource that it describes. You never buy metadata.  But if your dataset comes without proper metadata documentation, you are bound, like Sisyphus, to search for it, to re-arrange it, to check its currency units, its digits, its formatting.  Data analysts are reported to spend about 80% of their working hours on data processing and not data analysis &amp;ndash; partly because data processing is a very laborious task that computers can do far more cheaply at scale, and partly because they do not know if the person who sat before them at the same desk has already performed these tasks, or if the person responsible for quality control checked for errors.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-uncut-diamonds-need-to-be-cut-polished-and-you-have-to-make-sure-that-they-come-from-a-legal-source-data-is-similar-it-needs-to-be-tidied-up-checked-and-documented-before-use-photo-dave-fischer&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Uncut diamonds need to be cut, polished, and you have to make sure that they come from a legal source. Data is similar: it needs to be tidied up, checked and documented before use. Photo: Dave Fischer.&#34; srcset=&#34;
               /media/img/gems/Uncut-diamond_Edit_hu4573f19f53e1306ad88770fc5e491871_409761_0317c281e0aba727eb8e1a81805de459.webp 400w,
               /media/img/gems/Uncut-diamond_Edit_hu4573f19f53e1306ad88770fc5e491871_409761_1470967ea871e5c3f6f247c839f6d52a.webp 760w,
               /media/img/gems/Uncut-diamond_Edit_hu4573f19f53e1306ad88770fc5e491871_409761_1200x1200_fit_q75_h2_lanczos.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/gems/Uncut-diamond_Edit_hu4573f19f53e1306ad88770fc5e491871_409761_0317c281e0aba727eb8e1a81805de459.webp&#34;
               width=&#34;760&#34;
               height=&#34;506&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Uncut diamonds need to be cut, polished, and you have to make sure that they come from a legal source. Data is similar: it needs to be tidied up, checked and documented before use. Photo: Dave Fischer.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;Undocumented data is hardly informative – it may be a page in a book, a file in an obsolete file format on a governmental server, an Excel sheet that you do not remember to have checked for updates.  Most data are useless, because we do not know how it can inform us, or we do not know if we can trust it.  The processing can be a daunting task, not to mention the most boring and often neglected documentation duties after the dataset is final and pronounced error-free by the person in charge of quality control.&lt;/p&gt;
&lt;h2 id=&#34;observatory-metadata-services&#34;&gt;Our observatory automatically processes and documents the data&lt;/h2&gt;
&lt;p&gt;The good news about documentation and data validation costs is that they can be shared.  If many users need GDP/capita data from all over the world in euros, then it is enough if only one entity, a data observatory, collects all GDP and population data expressed in dollars, korunas, and euros, and makes sure that the latest data is correctly translated to euros, and then correctly divided by the latest population figures. These tasks are error-prone, and should not be repeated by every data journalist, NGO employee, PhD student or junior analyst.  This is one of the services of our data observatory.&lt;/p&gt;
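&lt;p&gt;As a rough sketch of the GDP/capita example above: the exchange rates, country codes and figures below are invented placeholders, and the function name &lt;code&gt;gdp_per_capita_eur&lt;/code&gt; is hypothetical. It only illustrates why the conversion is worth doing once, in one place, with the failure cases handled explicitly.&lt;/p&gt;

```python
# Illustrative placeholder figures only: none of these numbers are real statistics.
# Hypothetical exchange rates: euros per one unit of the local currency.
EUR_PER_UNIT = {"EUR": 1.0, "USD": 0.8, "CZK": 0.04}

observations = [
    {"geo": "AA", "gdp": 500_000.0, "currency": "USD", "population": 100},
    {"geo": "BB", "gdp": 2_500_000.0, "currency": "CZK", "population": 50},
    {"geo": "CC", "gdp": 300_000.0, "currency": "EUR", "population": 0},
    {"geo": "DD", "gdp": 100_000.0, "currency": "XXX", "population": 10},
]


def gdp_per_capita_eur(obs):
    """Return GDP per capita in euros, or None when it cannot be computed."""
    rate = EUR_PER_UNIT.get(obs["currency"])
    if rate is None or not obs["population"]:
        # Unknown currency, or missing/zero population: report a missing
        # value instead of guessing or dividing by zero.
        return None
    return obs["gdp"] * rate / obs["population"]


results = {obs["geo"]: gdp_per_capita_eur(obs) for obs in observations}
print(results)
```

&lt;p&gt;Returning an explicit missing value for the unknown currency and the zero population keeps the error visible to every downstream user, instead of leaving each of them to rediscover it.&lt;/p&gt;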
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The tidy data format means that the data has a uniform and clear data structure and semantics, therefore it can be automatically validated for many common errors and can be automatically documented by either our software or any other professional data science application. It is not as strict as the schema for a relational database, but it is strict enough to make, among other things, importing into a database easy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The descriptive metadata contains information on how to find the data, access the data, join it with other data (interoperability) and use it, and reuse it, even years from now. Among others, it contains file format information and intellectual property rights information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The processing metadata makes the data usable in strictly regulated professional environments, such as in public administration, law firms, investment consultancies, or in scientific research. We give you the entire processing history of the data, which makes peer-review or external audit much easier and cheaper.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The authoritative copy is held at an independent repository; it has a globally unique identifier that protects you from accidental data loss and from mix-ups with unfinished and untested versions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-cutting-the-dataset-to-a-format-with-clear-semantics-and-documenting-it-with-the-fair-metadata-concep-exponentially-increases-the-value-of-data-it-can-be-publisehd-or-sold-at-a-premium-photo-andere-andrehttpscommonswikimediaorgwindexphpcurid4770037&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Cutting the dataset to a format with clear semantics and documenting it with the FAIR metadata concept exponentially increases the value of data. It can be published or sold at a premium. Photo: [Andere Andre](https://commons.wikimedia.org/w/index.php?curid=4770037).&#34; srcset=&#34;
               /media/img/gems/Diamond_Polisher_hu2b5ca0e8d1290dc6b290d6b4669a6259_449722_27278366bdb30735ec3edb5dd68ce37b.webp 400w,
               /media/img/gems/Diamond_Polisher_hu2b5ca0e8d1290dc6b290d6b4669a6259_449722_2022c9c74076769b68c8f788b6835f99.webp 760w,
               /media/img/gems/Diamond_Polisher_hu2b5ca0e8d1290dc6b290d6b4669a6259_449722_1200x1200_fit_q75_h2_lanczos.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/gems/Diamond_Polisher_hu2b5ca0e8d1290dc6b290d6b4669a6259_449722_27278366bdb30735ec3edb5dd68ce37b.webp&#34;
               width=&#34;760&#34;
               height=&#34;506&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Cutting the dataset to a format with clear semantics and documenting it with the FAIR metadata concept exponentially increases the value of data. It can be published or sold at a premium. Photo: &lt;a href=&#34;https://commons.wikimedia.org/w/index.php?curid=4770037&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Andere Andre&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;While humans are much better at analysing the information and human agency is required for trustworthy AI, computers are much better at processing and documenting data.  We apply important concepts to our data service: we always process the data to the tidy format, we create an authoritative copy, and we always automatically add descriptive and processing metadata.&lt;/p&gt;
&lt;h2 id=&#34;value-of-metadata&#34;&gt;The value of metadata&lt;/h2&gt;
&lt;p&gt;Metadata is often more valuable and more costly to make than the data itself, yet it remains an elusive concept for senior or financial management.  Metadata is information about how to correctly use the data and has no value without the data itself.  Data acquisition, such as buying from a data vendor, or paying an opinion polling company, or external data consultants appears among the material costs, but metadata is never sold alone, and you do not see its cost.&lt;/p&gt;
&lt;p&gt;In most cases, the reason why &lt;a href=&#34;https://dataandlyrics.com/post/2021-06-18-gold-without-rush/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;there is no gold rush for open data&lt;/a&gt; is the fact that, while the EU member states release billions of euros&amp;rsquo; worth of data for free, or at very low cost, annually, it comes without proper metadata.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;















&lt;figure  id=&#34;figure-data-as-serviceservicesdata-as-servicereusable-legal-easy-to-import-interoperable-always-fresh-data-in-tidy-formats-with-a-modern-api-photo-edgar-sotohttpsunsplashcomphotosgb0bzgae1nk&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;[Data-as-Service](/services/data-as-service/)Reusable, legal, easy-to-import, interoperable, always fresh data in tidy formats with a modern API. Photo: [Edgar Soto](https://unsplash.com/photos/gb0BZGae1Nk).&#34; srcset=&#34;
               /media/img/gems/edgar-soto-gb0BZGae1Nk-unsplash_hu885793c483f74753314f6c800c67a06f_204775_81b97d34c1ccb0eb3994b312d0747e63.webp 400w,
               /media/img/gems/edgar-soto-gb0BZGae1Nk-unsplash_hu885793c483f74753314f6c800c67a06f_204775_b3ddf8e86873a66ce16e8636fadc3357.webp 760w,
               /media/img/gems/edgar-soto-gb0BZGae1Nk-unsplash_hu885793c483f74753314f6c800c67a06f_204775_1200x1200_fit_q75_h2_lanczos.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/gems/edgar-soto-gb0BZGae1Nk-unsplash_hu885793c483f74753314f6c800c67a06f_204775_81b97d34c1ccb0eb3994b312d0747e63.webp&#34;
               width=&#34;760&#34;
               height=&#34;506&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;https://greendeal.dataobservatory.eu/services/data-as-service/&#34;&gt;Data-as-Service&lt;/a&gt;&lt;/br&gt;&lt;/br&gt;Reusable, legal, easy-to-import, interoperable, always fresh data in tidy formats with a modern API. Photo: &lt;a href=&#34;https://unsplash.com/photos/gb0BZGae1Nk&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Edgar Soto&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;If the data source is cheap or of low quality, you do not even get the metadata.  If you do not have it, its cost will show up as a human resource cost in research (when your analysts or junior researchers spend countless hours finding the missing metadata on the correct use of the data) or in sales costs (when you try to reuse a research, consulting or legal product and you have to comb through your archive and retest elements again and again).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The data, together with the descriptive and administrative metadata, and links to the use license and the authoritative copy can be found in our API. Try it out!&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Metadata</title>
      <link>https://greendeal.dataobservatory.eu/services/metadata/</link>
      <pubDate>Wed, 07 Jul 2021 00:00:00 +0000</pubDate>
      <guid>https://greendeal.dataobservatory.eu/services/metadata/</guid>
      <description>&lt;p&gt;&lt;em&gt;Adding metadata exponentially increases the value of data. Did your region add a new town to its boundaries? How do you adjust old data to conform to constantly changing geographic boundaries? What are some practical ways of combining satellite sensory data with my organization&amp;rsquo;s records? And do I have the right to do so? Metadata logs the history of data, providing instructions on how to reuse it, also setting the terms of use. We automate this labor-intensive process applying the FAIR data concept.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In our observatory we apply the concept of &lt;a href=&#34;#FAIR&#34;&gt;FAIR&lt;/a&gt; (&lt;strong&gt;f&lt;/strong&gt;indable, &lt;strong&gt;a&lt;/strong&gt;ccessible, &lt;strong&gt;i&lt;/strong&gt;nteroperable, and &lt;strong&gt;r&lt;/strong&gt;eusable digital assets) in our APIs and in our open-source statistical software packages.&lt;/p&gt;
&lt;h2 id=&#34;the-hidden-cost-item&#34;&gt;The hidden cost item&lt;/h2&gt;
&lt;p&gt;Metadata gets less attention than data, because it is never acquired separately, it is not on the invoice, and therefore it remains a hidden cost, even though it is more important from a budgeting and a usability point of view than the data itself. Poor metadata is responsible for non-billable hours in industry and uncredited working hours in academia. Poor data documentation, a lack of reproducible processing and testing logs, inconsistent use of currencies and keywords, and storing &lt;a href=&#34;#messy-data&#34;&gt;messy data&lt;/a&gt; make reuse, interoperability, and integration with other information impossible.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&#34;#FAIR-data&#34;&gt;FAIR Data and the Added Value of Rich Metadata&lt;/a&gt; we introduce how we apply the concept of &lt;a href=&#34;#FAIR&#34;&gt;FAIR&lt;/a&gt; (&lt;strong&gt;f&lt;/strong&gt;indable, &lt;strong&gt;a&lt;/strong&gt;ccessible, &lt;strong&gt;i&lt;/strong&gt;nteroperable, and &lt;strong&gt;r&lt;/strong&gt;eusable digital assets) in our APIs.&lt;/p&gt;
&lt;p&gt;Organizations pay many times for the same, repeated work, because these boring tasks, which often consist of tens of thousands of microtasks, are neglected. Our solution creates automatic documentation and metadata for your own historical internal data or for acquisitions from data vendors. We apply the more general &lt;a href=&#34;#Dublin-Core&#34;&gt;Dublin Core&lt;/a&gt; and the more specific, mandatory and recommended values of &lt;a href=&#34;#DataCite&#34;&gt;DataCite&lt;/a&gt; for datasets &amp;ndash; these are new requirements in EU-funded research from 2021. But they are just the minimal steps, and there is a lot more to do to create a diamond ring from an uncut gem.&lt;/p&gt;
&lt;h2 id=&#34;map-your-data-bibliographis-catalogues-codebooks-versioning&#34;&gt;Map your data: bibliographies, catalogues, codebooks, versioning&lt;/h2&gt;
&lt;p&gt;Updating descriptive metadata, such as bibliographic citation files, descriptions and sources of data files downloaded from the internet, and versioning spreadsheet documents and presentations, is usually a hated and often neglected task within organizations, and rightly so: these boring and error-prone tasks are best left to computers.&lt;/p&gt;
















&lt;figure  id=&#34;figure-already-adjusted-spreadsheets-are-re-adjusted-and-re-checked-hours-are-spent-on-looking-for-the-right-document-with-the-rigth-version-duplicates-multiply-already-downloaded-data-is-downloaded-again-and-miscategorized-again-finding-the-data-without-map-is-a-treasure-hunt-photo--nhttpsunsplashcomphotosrfid0_7kep4utm_sourceunsplash&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Already adjusted spreadsheets are re-adjusted and re-checked. Hours are spent on looking for the right document with the right version. Duplicates multiply. Already downloaded data is downloaded again, and miscategorized, again. Finding the data without a map is a treasure hunt. Photo: © [N.](https://unsplash.com/photos/RFId0_7kep4?utm_source=unsplash)&#34; srcset=&#34;
               /media/img/gems/n-RFId0_7kep4-unsplash_huee7d1c00c98fa72789543cf5e3e81601_230600_4ef39edbabd2ce5d60369717f173740b.webp 400w,
               /media/img/gems/n-RFId0_7kep4-unsplash_huee7d1c00c98fa72789543cf5e3e81601_230600_b62329f523c1d5825fbad91ef6374d1a.webp 760w,
               /media/img/gems/n-RFId0_7kep4-unsplash_huee7d1c00c98fa72789543cf5e3e81601_230600_1200x1200_fit_q75_h2_lanczos.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/gems/n-RFId0_7kep4-unsplash_huee7d1c00c98fa72789543cf5e3e81601_230600_4ef39edbabd2ce5d60369717f173740b.webp&#34;
               width=&#34;760&#34;
               height=&#34;506&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Already adjusted spreadsheets are re-adjusted and re-checked. Hours are spent on looking for the right document with the right version. Duplicates multiply. Already downloaded data is downloaded again, and miscategorized, again. Finding the data without a map is a treasure hunt. Photo: © &lt;a href=&#34;https://unsplash.com/photos/RFId0_7kep4?utm_source=unsplash&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;N.&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Over time, the lack of time and resources spent on documentation reduces reusability and significantly increases the costs of data processing, supervision, and auditing.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; Our observatory metadata is compliant with the &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/cross-domain-attribute/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dublin Core Cross-Domain Attribute Set&lt;/a&gt; metadata standard, but we use different formatting. We offer simple re-formatting from the richer DataCite to Dublin Core for interoperability with a wider set of data sources.&lt;/li&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; We use all &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mandatory&lt;/a&gt; DataCite metadata fields, and all the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;recommended and optional&lt;/a&gt; ones.&lt;/li&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; It complies with the tidy data principles.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words: the data is very easy to import into your databases or join with other databases, and the information is easy to find. Corrections and updates can be managed automatically.&lt;/p&gt;
&lt;h2 id=&#34;what-happened-with-the-data-before&#34;&gt;What happened with the data before?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; We are creating codebooks that follow the SDMX statistical metadata codelists and resemble the SDMX concepts used by international statistical agencies. (See more technical information &lt;a href=&#34;https://r.dataobservatory.eu/articles/codebook.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Small organizations often cannot afford to have data engineers and data scientists on staff, and they employ analysts who work with Excel, OpenOffice, PowerBI, SPSS or Stata. The problem with these applications is that they often require the user to manually adjust the data, with keyboard entries or mouse clicks. Furthermore, they do not provide a precise log of the data processing and manipulation history.
Manual data processing and manipulation is very error-prone, and it makes complex, high-value resources, such as harmonized surveys or symmetric input-output tables, to name two important sources we deal with, impossible to use. Using these high-value data sources often requires tens of thousands of data processing steps: no human can do it faultlessly.&lt;/p&gt;
&lt;p&gt;What is even more problematic is that simple analysis applications do not provide a log of these manipulation steps: dragging a column with the mouse, renaming a row, adding a zero to an empty cell. This makes senior supervisory oversight and external audits very costly.&lt;/p&gt;
&lt;p&gt;Our data comes with full history: all changes are visible, and we even open the code or algorithm that processed the raw data.  Your analysts can still use their favourite spreadsheet or statistical software application, but they can start from a clean, tidy dataset, with all data wrangling, currency and unit conversion, imputation and other low-priority but important tasks done and logged.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Developing an Open API is the Right Direction</title>
      <link>https://greendeal.dataobservatory.eu/post/2021-06-08-developer-botond-vitos/</link>
      <pubDate>Mon, 07 Jun 2021 20:00:00 +0000</pubDate>
      <guid>https://greendeal.dataobservatory.eu/post/2021-06-08-developer-botond-vitos/</guid>
      <description>&lt;p&gt;&lt;em&gt;Botond Vitos, PhD is responsible for maintaining our &lt;a href=&#34;https://greendeal.dataobservatory.eu/data/api/&#34;&gt;API&lt;/a&gt;. He first collaborated with our &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; and its trustworthy AI project.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;as-data-engineer-what-type-of-data-do-you-usually-use-in-your-projects&#34;&gt;As a data engineer, what type of data do you usually use in your projects?&lt;/h2&gt;
&lt;p&gt;Coming from a cultural studies background, my main research interest has been grassroots music scenes and festival cultures, which I hope to extend to my current projects as a data engineer and data scientist. The scope of my prior research was mainly qualitative and focused on the inside views and stories of scene participants and stakeholders, which was invaluable for understanding specialized stylistic vocabularies. At the same time, I was interested in the “bigger picture,” which can be approximated through algorithmic approaches and data analysis. With both interests together, I shifted towards data science and engineering.&lt;/p&gt;
















&lt;figure  id=&#34;figure-see-our-trustworthy-ai-driven-music-export-case-study-for-slovakiahttpsmusicdataobservatoryeupublicationlisten_local_2020&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;See our trustworthy AI-driven music export case study for [Slovakia](https://music.dataobservatory.eu/publication/listen_local_2020/)&#34; srcset=&#34;
               /media/img/streaming/listen_local_SK_EN_hue3bbdd36723034473d5308625670dcc8_550932_8e1b9f713792380fd59264a40e5b9362.webp 400w,
               /media/img/streaming/listen_local_SK_EN_hue3bbdd36723034473d5308625670dcc8_550932_990e882f700e82da59356785ef840ceb.webp 760w,
               /media/img/streaming/listen_local_SK_EN_hue3bbdd36723034473d5308625670dcc8_550932_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/streaming/listen_local_SK_EN_hue3bbdd36723034473d5308625670dcc8_550932_8e1b9f713792380fd59264a40e5b9362.webp&#34;
               width=&#34;760&#34;
               height=&#34;507&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      See our trustworthy AI-driven music export case study for &lt;a href=&#34;https://music.dataobservatory.eu/publication/listen_local_2020/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Slovakia&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;I was recently involved with the development of a classification algorithm that detected stylistic directions within the music genres of electronic dance music labels found on Bandcamp. The &lt;a href=&#34;https://medium.com/data-lyrics/how-to-speak-about-music-in-the-digital-age-from-taxonomies-to-folksonomies-ac2d25ed29f7&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Bandcamp Librarian&lt;/a&gt; project makes use of the genre taxonomy offered by the industry website Beatport, which is a very top-down approach on electronic dance music genres, often resisted by the artists themselves (many of the more niche subgenres don’t even appear on the Beatport site). Accordingly, the project defined genre clusters within each Bandcamp label, which show up as combinations of Beatport subgenres. Also, it indicated some of the folksonomies (bottom-up stylistic definitions and tags) propagated by the musicians themselves.&lt;/p&gt;
















&lt;figure  id=&#34;figure-screenshot-of-the-first-verison-of-the-demo-app&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Screenshot of the first version of the demo app.&#34; srcset=&#34;
               /media/img/streaming/listen_local_app_1_hu098db0e3c2b2943b540798ab81deb1b0_117013_98cf3836f56fdd9aae930cde9bb5a3e5.webp 400w,
               /media/img/streaming/listen_local_app_1_hu098db0e3c2b2943b540798ab81deb1b0_117013_50e29da19d86792d96fd18dc07a23aa1.webp 760w,
               /media/img/streaming/listen_local_app_1_hu098db0e3c2b2943b540798ab81deb1b0_117013_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/streaming/listen_local_app_1_hu098db0e3c2b2943b540798ab81deb1b0_117013_98cf3836f56fdd9aae930cde9bb5a3e5.webp&#34;
               width=&#34;760&#34;
               height=&#34;309&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Screenshot of the first version of the demo app.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;In addition, working with Reprex, I became involved in the development of the Listen Local initiative. The system aims to protect the rights of small local artists by offering recommendation algorithms that prioritize local talent for consideration and enable users to find it. The current playlist recommendations of streaming industry giants, such as Spotify, prioritize big labels and big names, blocking access to the output of smaller, local musicians. Naturally, I looked at this project as a possible continuation of my previous work, and we are currently &lt;a href=&#34;https://bvitos.medium.com/bandcamp-librarian-part-ii-57adc160d13f&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;extending the scope of the Bandcamp Librarian&lt;/a&gt; to fit this initiative.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In an ideal data world, what would be the ultimate dataset or datasets that you would like to see in the Digital Music Observatory?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As my answer to the previous question suggests, my main concern is the development of a trustworthy AI framework. Acknowledging the national and cultural diversity of the European Union, it is essential to enable access to data that takes into account such diversities and the priorities of smaller stakeholders as well. This type of data needs to be comprehensive and well-maintained, and I believe that with curators&amp;rsquo; priorities and the development of an easily accessible, open API, we are moving in the right direction.&lt;/p&gt;
















&lt;figure  id=&#34;figure-our-apihttpsapigreendealdataobservatoryeu-contains-rich-processing-and-descriptive-metadata-besides-our-high-quality-indicators&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img alt=&#34;Our [API](https://api.greendeal.dataobservatory.eu/) contains rich processing and descriptive metadata besides our high-quality indicators.&#34; srcset=&#34;
               /media/img/observatory_screenshots/GDO_API_metadata_table_hu31b494a33d5ae09272643545372dbd1d_100491_225afcd2a785db051b89c7c36fdc28b9.webp 400w,
               /media/img/observatory_screenshots/GDO_API_metadata_table_hu31b494a33d5ae09272643545372dbd1d_100491_5807feecbd17bee02fd8c68fad87b1d7.webp 760w,
               /media/img/observatory_screenshots/GDO_API_metadata_table_hu31b494a33d5ae09272643545372dbd1d_100491_1200x1200_fit_q75_h2_lanczos_3.webp 1200w&#34;
               src=&#34;https://greendeal.dataobservatory.eu/media/img/observatory_screenshots/GDO_API_metadata_table_hu31b494a33d5ae09272643545372dbd1d_100491_225afcd2a785db051b89c7c36fdc28b9.webp&#34;
               width=&#34;760&#34;
               height=&#34;428&#34;
               loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      Our &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;API&lt;/a&gt; contains rich processing and descriptive metadata besides our high-quality indicators.
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;read-more-on-data--lyrics&#34;&gt;Read More on Data &amp;amp; Lyrics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://dataandlyrics.com/post/2021-05-16-recommendation-outcomes/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Recommendation Systems: What can Go Wrong with the Algorithm?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;join-us&#34;&gt;Join us&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Join our open collaboration Green Deal Data Observatory team as a &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/curator&#34;&gt;data curator&lt;/a&gt;, &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/developer&#34;&gt;developer&lt;/a&gt; or &lt;a href=&#34;https://greendeal.dataobservatory.eu/authors/team&#34;&gt;business developer&lt;/a&gt;. More interested in antitrust, innovation policy or economic impact analysis? Try our &lt;a href=&#34;https://economy.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt; team! Or does your interest lie more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href=&#34;https://music.dataobservatory.eu/#contributors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; team!&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Metadata</title>
      <link>https://greendeal.dataobservatory.eu/data/metadata/</link>
      <pubDate>Tue, 01 Jun 2021 11:00:00 +0000</pubDate>
      <guid>https://greendeal.dataobservatory.eu/data/metadata/</guid>
      <description>&lt;p&gt;Our observatory has a new data API which allows access to our daily refreshing open data. You can access the API via &lt;a href=&#34;http://api.economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;api.economy.dataobservatory.eu&lt;/a&gt; (&lt;em&gt;apologies for the ugly, temporary subdomain masking!&lt;/em&gt;).&lt;/p&gt;
&lt;p&gt;All the data and the metadata are available as open data, without database use restrictions, under the &lt;a href=&#34;https://opendatacommons.org/licenses/odbl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ODbL&lt;/a&gt; license. However, the metadata contents are not finalized yet. We are currently working on a solution that applies the &lt;a href=&#34;http://www.nature.com/articles/sdata201618&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FAIR Guiding Principles for scientific data management and stewardship&lt;/a&gt; and fulfills the mandatory requirements of the Dublin Core metadata standards, together with the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mandatory requirements&lt;/a&gt; and most of the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;recommended requirements&lt;/a&gt; of DataCite. These changes will be effective before 1 July 2021.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Competition Data Observatory&lt;/strong&gt; temporarily shares an API with the &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt;, which serves as an incubator for similar economy-oriented reproducible research resources.&lt;/p&gt;
















&lt;figure  id=&#34;figure-apieconomydataobservatoryeu-processing-metadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;https://greendeal.dataobservatory.eu/img/observatory_screenshots/EDO_API_metadata_table.png&#34; alt=&#34;api.economy.dataobservatory.eu: processing metadata&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      api.economy.dataobservatory.eu: processing metadata
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;descriptive-metadata&#34;&gt;Descriptive Metadata&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Identifier&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;An unambiguous reference to the resource within a given context. (Dublin Core item.) Several identifiers are allowed, and we will use several of them.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Creator&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The main researchers involved in producing the data, or the authors of the publication, in priority order. To supply multiple creators, repeat this property. (Extends the Dublin Core with multiple authors, and legal persons, and adds affiliation data.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Title&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;A name given to the resource. Extends Dublin Core with alternative title, subtitle, translated Title, and other title(s).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Publisher&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role. For software, use Publisher for the code repository. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Publication Year&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The year when the data was or will be made publicly available.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Resource Type&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We publish Datasets, Images, Reports, and Data Papers. (Dublin Core item with controlled vocabulary.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;recommended-for-discovery&#34;&gt;Recommended for discovery&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Recommended&lt;/strong&gt; (R) properties are optional, but strongly recommended for interoperability.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Subject&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The topic of the resource. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Contributor&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource. (Extends the Dublin Core with multiple authors, and legal persons, and adds affiliation data.) When applicable, we add Distributor (of the datasets and images), Contact Person, Data Collector, Data Curator, Data Manager, Hosting Institution, Producer (for images), Project Manager, Researcher, Research Group, Rightsholder, Sponsor, Supervisor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Date&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;A point or period of time associated with an event in the lifecycle of the resource. Besides the Dublin Core minimum, we add Collected, Created, Issued, Updated, and, if necessary, Withdrawn dates to our datasets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Related Identifier&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Identifiers of related resources. These must be globally unique identifiers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Rights&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give &lt;a href=&#34;https://spdx.org/licenses/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SPDX License List&lt;/a&gt; standards rights description with URLs to the actual license. (Dublin Core item: Rights Management)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Description&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Recommended for discovery. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;GeoLocation&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Similar to Dublin Core item Coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;Subject&lt;/code&gt; property: we need to set standard coding schemas for each observatory.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Contributor&lt;/code&gt; property:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;DataCurator&lt;/code&gt; the curator of the dataset, who sets the mandatory properties.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DataManager&lt;/code&gt; the person who keeps the dataset up-to-date.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ContactPerson&lt;/code&gt; the person who can be contacted for reuse requests or bug reports.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Date&lt;/code&gt; property contains the following dates, which are set automatically by the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Updated&lt;/code&gt; when the dataset was updated;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;EarliestObservation&lt;/code&gt;, which is the earliest observation that is not backcasted, estimated or imputed;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LatestObservation&lt;/code&gt;, which is the latest observation that is not forecasted, estimated or imputed;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;UpdatedatSource&lt;/code&gt;, when the raw data source was last updated.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;GeoLocation&lt;/code&gt; is automatically created by the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Description&lt;/code&gt; property has optional elements, and we adopted them as follows for the observatories:
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;Abstract&lt;/code&gt; is a short, textual description; we try to automate its creation as much as possible, but some curatorial input is necessary.&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;TechnicalInfo&lt;/code&gt; sub-field, we automatically record the &lt;code&gt;utils::sessionInfo()&lt;/code&gt; output for computational reproducibility. This is automatically created by the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;Other&lt;/code&gt; sub-field, we record the keywords for structuring the observatory.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;optional&#34;&gt;Optional&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Optional&lt;/strong&gt; (O) properties are optional and provide richer description. For findability they are not so important, but to create a web service, they are essential. In the mandatory and recommended fields, we are following other metadata standards and codelists, but in the optional fields we have to build up our own system for the observatories.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Language&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;A language of the resource. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Alternative Identifier&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;An identifier or identifiers other than the primary Identifier applied to the resource being registered.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Size&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give the CSV, downloadable dataset size in bytes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Format&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give file format information. We mainly use CSV and JSON, and occasionally rds and SPSS types. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Version&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The version number of the resource.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Rights&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give &lt;a href=&#34;https://spdx.org/licenses/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SPDX License List&lt;/a&gt; standards rights description with URLs to the actual license. (Dublin Core item: Rights Management)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Funding Reference&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We provide the funding reference information when applicable. This is usually mandatory with public funds.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Related Item&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give information about our observatory partners&amp;rsquo; related research products, awards, grants (also Dublin Core item as Relation.) We particularly include source information when the dataset is derived from another resource (which is a Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;ul&gt;
&lt;li&gt;In the &lt;code&gt;Language&lt;/code&gt; we only use English (eng) at the moment.&lt;/li&gt;
&lt;li&gt;By default, we do not use the &lt;code&gt;Alternative Identifier&lt;/code&gt; property. We will use it when the same dataset is used in several observatories.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Size&lt;/code&gt; property is measured in bytes for the CSV representation of the dataset. During creation, the software writes a temporary CSV file to check that the dataset has no writing problems, and measures the dataset size.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Version&lt;/code&gt; property needs further work. For a daily refreshing API, we need to find an applicable versioning system.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Funding reference&lt;/code&gt; will contain information for donors, sponsors, and co-financing partners.&lt;/li&gt;
&lt;li&gt;Our default setting for &lt;code&gt;Rights&lt;/code&gt; is the &lt;a href=&#34;https://spdx.org/licenses/CC-BY-NC-SA-4.0.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CC-BY-NC-SA-4.0&lt;/a&gt; license, and we provide a URI for the license document.&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;RelatedItem&lt;/code&gt; we give information about:
&lt;ul&gt;
&lt;li&gt;The original (raw) data source.&lt;/li&gt;
&lt;li&gt;Methodological bibliographic reference, when needed.&lt;/li&gt;
&lt;li&gt;The open-source statistical software code that processed the data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;administrative-processing-metadata&#34;&gt;Administrative (Processing) Metadata&lt;/h2&gt;
&lt;p&gt;As with diamonds, it is better to know the history of a dataset, too. Our administrative metadata contains codelists that follow the SDMX statistical metadata standards, and similarly structured information about the processing history of the dataset.&lt;/p&gt;
&lt;p&gt;For further reference, see &lt;a href=&#34;https://r.dataobservatory.eu/articles/codebook.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The codebook Class&lt;/a&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Observation Status&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;SDMX Code list for &lt;a href=&#34;https://sdmx.org/?sdmx_news=new-version-of-code-list-for-observation-status-version-2-2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Observation Status 2.2&lt;/a&gt; (CL_OBS_STATUS), such as actual, missing, imputed, etc. values.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Method&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;If the value is estimated, we provide modelling information.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Unit&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We provide the measurement unit of the data (when applicable.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Frequency&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;&lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SDMX Code list for Frequency 2.1 (CL_FREQ)&lt;/a&gt; frequency values&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Codelist&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Euro-SDMX codelist entries for the observational units, such as sex, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Imputation&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;SDMX Code list for imputation methods (CL_IMPUT_METH) imputation values&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Estimation&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The estimation methodology of data that we calculated, together with citation information and URI to the actual processing code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Related Item&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give information about the software code that processed the data (both Dublin Core and DataCite compliant.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;See an example in the article &lt;a href=&#34;https://r.dataobservatory.eu/articles/codebook.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The codebook Class&lt;/a&gt; of the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
