MetaX Blog

We are building, in an agile way, Finland's metadata repository for research datasets, which will serve the research data services of OKM, the Finnish Ministry of Education and Culture (e.g. TPAS, IDA and Etsin). This blog covers topical and often still open questions in the development work, and nothing presented here represents any official policy. Instead, we hope for feedback and open discussion.

Friday, 2 March 2018

A Fair(y)data service user Tale

My wonderful colleagues suddenly felt an urge to tell a nice story in English about how they envision their mission of providing a convenient research data service. It presents a user perspective and we hope it can help us spot issues in the integration processes. Please feel free to comment, honoured colleagues from near and far!

So, buckle up: Once upon a time ...


1. There are three wise researchers: one from the University of Turku, one from Tampere and one from the Jyväskylä University of Applied Sciences. They have gathered amazing data about one special flea species that lives in house sparrows. They (the researchers, not the sparrows) are now finalizing an article and they want to include a data citation to their data, to give it the visibility it deserves. Therefore, they need a persistent identifier for their dataset. The researchers have a common storage space in the far-famed IDA service. To gather the data they use their IDA project’s staging area, which is a folder with full editing rights for all project members. Each researcher, of course, uses sensible file names and well-organised folder structures to make it easy to keep track of data files.

However, when they’re ready to publish their final results they feel that they could reorganize their data once more. No worries: all project members are free to rename and rearrange data in the staging area. 

After deciding to publish the sparrow-flea data, the project members carefully arrange the data under one root folder in the staging area. Once they’re happy with the new folder structure and file names, one of the researchers chooses the root folder of the ready data and clicks on the button "Freeze".



2. The freezing feature moves all data under the chosen root folder to the project’s frozen area and makes it read-only. The file metadata is stored in MetaX in a background operation, which makes the file metadata available to other services in the ecosystem.

The other two researchers go to check the files in the project’s frozen area and download them to their own computers. They both see that it’s the final version of the data and everything’s good to go. The researchers are now ready to publish the data. Hooray!
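For the technically curious, the freezing step could be sketched roughly as follows. This is only a toy model under our own assumptions, not the actual IDA or MetaX code: `Project`, `freeze` and the plain dict standing in for MetaX are hypothetical names, and SHA-256 is an assumed checksum algorithm.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class Project:
    """Toy model of an IDA project (hypothetical, not the real IDA data model)."""
    staging: dict = field(default_factory=dict)  # path -> bytes, freely editable
    frozen: dict = field(default_factory=dict)   # path -> bytes, meant to be read-only


def freeze(project, root, metadata_store):
    """Move every file under `root` from staging to frozen and record its metadata."""
    prefix = root.rstrip("/") + "/"
    selected = [path for path in project.staging if path.startswith(prefix)]
    for path in selected:
        content = project.staging.pop(path)
        project.frozen[path] = content
        # In the story this happens in a background operation against MetaX;
        # here a plain dict stands in for the metadata store.
        metadata_store[path] = {
            "path": path,
            "name": path.rsplit("/", 1)[-1],
            "size": len(content),
            "checksum": hashlib.sha256(content).hexdigest(),  # assumed algorithm
        }
    return selected


project = Project(staging={
    "/fleas/final/counts.csv": b"sparrow,fleas\n1,3\n",
    "/fleas/raw/notes.txt": b"draft notes",
})
metax = {}
freeze(project, "/fleas/final", metax)
```

After the call, only the file under `/fleas/final` has moved to the frozen area and has a metadata record; the raw notes stay editable in staging. The sketch does not enforce the read-only rule, which the real frozen area does.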








3. One of the researchers clicks the “create a new dataset” button in IDA and is taken to Qvain. She’s presented with a metadata editor where she can fill out metadata about the dataset she’s about to publish. She clicks the “Get persistent identifier” button and the UI now shows a PID she can send over to her colleague for the article. She then fills in the required fields and even adds geospatial data about the locations where the data was gathered. “Pretty neat”, she thinks, and hits a button in Qvain called “IDA file picker”.





4. The researcher is now presented with a file system view similar to the one she has in IDA. One big difference is that she only sees the data that is in the project’s frozen area. The older versions and raw data that the researchers had stored in the staging area are not visible. The file picker is actually not showing IDA itself, but the file metadata (file path, name, size, checksum etc.) that was stored in MetaX when the files were frozen in IDA.

The user selects the root folder of the frozen data, which automatically selects all files and subfolders that are under it. She sets the data as freely accessible. This means that once the dataset metadata is published, anyone browsing the dataset can download the files linked to it on their own computer.
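The "select a folder, get everything under it" behaviour of the file picker can be illustrated with a small sketch. It works purely on file metadata records, as the picker does in the story. The function name, the record fields and the `access` value are our own assumptions, not Qvain's actual data model.

```python
def select_folder(file_metadata, folder):
    """Return the metadata records of every file under `folder`, subfolders included."""
    prefix = folder.rstrip("/") + "/"
    return [record for record in file_metadata if record["path"].startswith(prefix)]


# Hypothetical records of the kind stored when the files were frozen.
frozen_files = [
    {"path": "/fleas/final/counts.csv", "size": 2048},
    {"path": "/fleas/final/sites/turku.csv", "size": 512},
    {"path": "/fleas/other/readme.txt", "size": 100},
]

selection = select_folder(frozen_files, "/fleas/final")

# A draft dataset description linking to the selected files (illustrative fields only).
dataset_draft = {
    "title": "Sparrow flea data",
    "access": "open",  # stands for "freely accessible"
    "files": [record["path"] for record in selection],
}
```

Matching on the folder path plus a trailing slash keeps a sibling folder such as `/fleas/final-old` from being selected by accident.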





5. The researcher is a bit unsure about what licence they should use for the dataset. She hits “Save as Draft” (and not “Save and Publish”), which saves a local copy of the dataset description in Qvain. She goes to talk to her colleague in the next room. The colleague tells her that Qvain’s default, CC-BY-4.0, is a good and recommended option for research data.







6. The researcher is happy with the way the dataset description looks and clicks the “Save and publish” button. She’s presented with a link to the Etsin research data finder to view the published data. What she doesn’t see is that the dataset metadata and the links to the IDA file metadata have now been stored in MetaX.

All the dataset metadata, including the links to file metadata that MetaX knows about, is shown by Etsin. However, metadata about files in IDA’s frozen area that are not linked to any dataset is neither shown nor searchable in Etsin.
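The visibility rule above fits in a few lines. This is a sketch under our own assumptions (the record layout and the function name are invented for illustration), not how Etsin or MetaX actually implement it.

```python
def files_visible_in_search(datasets, file_metadata):
    """Keep only the file records that at least one dataset links to."""
    linked_ids = {file_id for dataset in datasets for file_id in dataset["file_ids"]}
    return [record for record in file_metadata if record["id"] in linked_ids]


# Hypothetical contents of the metadata store after publishing one dataset.
datasets = [{"title": "Sparrow flea data", "file_ids": ["f1", "f2"]}]
file_metadata = [
    {"id": "f1", "path": "/fleas/final/counts.csv"},
    {"id": "f2", "path": "/fleas/final/sites/turku.csv"},
    {"id": "f3", "path": "/fleas/other/readme.txt"},  # frozen, but not linked
]

visible = files_visible_in_search(datasets, file_metadata)
```

Here `f3` is frozen and known to the metadata store, but it stays out of `visible` because no dataset links to it.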







7. The researcher clicks the link that takes her to Etsin and sees a page that is called a dataset landing page. The page shows the metadata and the file links that she created using Qvain. Next to the information about the data files there’s a button that says “Download all”. The researcher clicks the button and her browser starts to download the files. When she clicks the button, the dataset identifier and the internal identifiers of the files are sent to the Fairdata data access service, which then queries MetaX. The Fairdata data access service needs to know that the dataset identifier and the file identifiers exist, and that the dataset truly is downloadable by all users. They do and it is, so MetaX tells the Fairdata data access service where the files are located in IDA. The Fairdata data access service then uses a special API in IDA to fetch the files.
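The checks in this step could look roughly like the sketch below. It is only an illustration under our own assumptions: the dict layout of the metadata store, the names `resolve_download` and `download_all`, and the `fetch_from_ida` callable standing in for IDA's special API are all hypothetical.

```python
class AccessDenied(Exception):
    """Raised when a dataset is not downloadable by all users."""


def resolve_download(metax, dataset_id, file_ids):
    """Check the identifiers and the access rule, then return the file locations."""
    dataset = metax["datasets"].get(dataset_id)
    if dataset is None:
        raise LookupError(f"unknown dataset: {dataset_id}")
    unknown = [f for f in file_ids
               if f not in metax["files"] or f not in dataset["file_ids"]]
    if unknown:
        raise LookupError(f"files not part of the dataset: {unknown}")
    if dataset["access"] != "open":
        raise AccessDenied(f"dataset {dataset_id} is not freely downloadable")
    return [metax["files"][f]["ida_path"] for f in file_ids]


def download_all(metax, dataset_id, file_ids, fetch_from_ida):
    """Resolve the locations first, then fetch each file through the given callable."""
    locations = resolve_download(metax, dataset_id, file_ids)
    return [fetch_from_ida(path) for path in locations]


# Hypothetical metadata store contents and a fake fetcher, for illustration only.
metax = {
    "datasets": {"pid-123": {"access": "open", "file_ids": ["f1", "f2"]}},
    "files": {
        "f1": {"ida_path": "/fleas/final/counts.csv"},
        "f2": {"ida_path": "/fleas/final/sites/turku.csv"},
    },
}
downloaded = download_all(metax, "pid-123", ["f1", "f2"], lambda path: f"<bytes of {path}>")
```

Keeping the checks in one function that runs before any fetch means nothing is requested from IDA unless the dataset and every file identifier have been verified.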


8. The researcher now sees that anyone can download their data on their own computer and knows how to use and cite it! 


-- Updated 7 March: step 7, details about downloading files

5 comments:

  1. It immediately occurred to me that giving the Etsin link at step 7 is tricky (maybe?), because of resolving. :)

  2. It can be given as a direct link to Etsin, in which case resolving is not yet needed at that point.

  3. And what happens if a researcher needs only a few of the files that belong to the dataset? Can these be narrowed down in Etsin, i.e. can one form a subset that is then downloaded from IDA?

  4. Thanks for the question! Our intention is that the researcher would be able to select a directory (= folder), a single file or the whole dataset for download in one go. At least to begin with. But we have not yet done a more detailed specification, so this may change. Any thoughts, Mietta?
