Stimulating the use of FAIR data sharing in the life sciences


Why do Aspergillus researchers need better data sharing?

Aspergillus fumigatus is an opportunistic human pathogen that usually lives saprotrophically in agricultural waste (Figure 1). In modern agriculture, fungicides are commonly used to combat diseases such as Botrytis in tulips, and many of these fungicides are azoles. The fungicides end up in compost heaps together with A. fumigatus. These compost heaps are quite extreme: inside the heap, temperatures can reach up to 50°C. Luckily for A. fumigatus (and unluckily for us), A. fumigatus thrives at these temperatures, so it ends up dominating the heap. Because the azole concentration in the heap is high, A. fumigatus quickly develops azole resistance.

Figure 1: Spread and effects of A. fumigatus. Image from Keller (2017).

Azole resistance in A. fumigatus is a serious problem. The mould can be lethal in patients with a suppressed immune system, and because azoles are the main drugs used against it, infections with azole-resistant strains are difficult to treat. Many researchers are investigating how resistance works, how it arises and where it occurs, and much has been learned using modern biological methods such as CRISPR-Cas9 and sequencing technologies.

However, there is an issue with data sharing: researchers usually put their data in an Excel sheet or zip file and place it on the publisher's website. While this is perfectly suitable for judging the quality of the scientific work, and perhaps for reanalysing the data for a different question, it makes it very difficult to get an overview of these disconnected datasets. The lack of overview causes unnecessary work: people may collect “new” data that already exists, and a lot of effort goes into merging datasets.

The main way to remove this limitation is to adopt the FAIR (Findable, Accessible, Interoperable and Reusable) data principles within the Aspergillus fumigatus domain. According to FAIR, data should be indexed at a searchable location, this location must be openly accessible (possibly with authentication), and the stored data must be described in a standard format so that it can be combined with other FAIR data sources.

Figure 2: Diagram of FAIR, the 5-star model for FAIR data quality (from NLM).

What has been done about it?

Mariana Santos, Anna Fensel and I have been working on a system to ease data sharing for Aspergillus researchers, called the Aspergillus fumigatus azole resistance knowledge repository (ASPAR_KR). We have identified the FAIR Data Station (FAIRDS) as the tool that will FAIRify the data coming into the repository.

The programme models each aspect of a general experimental workflow (Figure 3). The Investigation describes the paper for which all the experiments are done. A Study describes one of the subquestions of the investigation. An Observation unit describes what was observed to answer that question. The Sample and Assay classes are used to record the results.

Figure 3: The main data classes of the FAIRDS.
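To make the nesting in Figure 3 concrete, here is a minimal Python sketch of the hierarchy: an Investigation contains Studies, each Study contains Observation units, each Observation unit contains Samples, and each Sample has Assays. The class fields, identifiers and example values are hypothetical and greatly simplified; the real FAIRDS classes carry many more (and different) metadata fields.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of the FAIRDS hierarchy (not the real schema).

@dataclass
class Assay:
    identifier: str
    description: str  # the measurement performed on a sample

@dataclass
class Sample:
    identifier: str
    assays: List[Assay] = field(default_factory=list)

@dataclass
class ObservationUnit:
    identifier: str
    samples: List[Sample] = field(default_factory=list)

@dataclass
class Study:
    identifier: str
    question: str  # the subquestion this study answers
    observation_units: List[ObservationUnit] = field(default_factory=list)

@dataclass
class Investigation:
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)

def count_assays(inv: Investigation) -> int:
    """Walk the whole hierarchy and count the assays in an investigation."""
    return sum(
        len(sample.assays)
        for study in inv.studies
        for unit in study.observation_units
        for sample in unit.samples
    )

# Made-up example data, for illustration only.
inv = Investigation(
    identifier="INV1",
    title="Azole resistance in compost isolates",
    studies=[
        Study(
            identifier="STU1",
            question="Which isolates are resistant?",
            observation_units=[
                ObservationUnit(
                    identifier="OU1",
                    samples=[
                        Sample("SAM1", [Assay("ASY1", "susceptibility test"),
                                        Assay("ASY2", "sequencing")]),
                    ],
                )
            ],
        )
    ],
)
print(count_assays(inv))  # 2
```

Because every level only points to its children, a question such as "how many assays belong to this paper?" becomes a simple walk down the tree.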

Once the user has filled in this information in an Excel sheet, an RDF file is generated that can be uploaded to a graph database or analysed with an RDF library in R or Python.
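The FAIRDS performs this conversion itself. Purely to illustrate the idea of turning sheet rows into RDF, the sketch below uses only the Python standard library and assumes the sheet has been exported as CSV with an `identifier` column. Each row becomes one subject, and each other non-empty column becomes one triple in N-Triples format. The base IRI `http://example.org/aspar/` and the property names are hypothetical and are not the vocabulary the FAIRDS actually uses.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base IRI, for illustration only (not the FAIRDS ontology).
BASE = "http://example.org/aspar/"

def escape_literal(value: str) -> str:
    """Escape a string so it is a valid N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))

def rows_to_ntriples(sheet_csv: str, id_column: str = "identifier") -> str:
    """Turn each sheet row into triples: one subject per row and
    one predicate per column, skipping the id column and empty cells."""
    lines = []
    for row in csv.DictReader(io.StringIO(sheet_csv)):
        subject = f"<{BASE}sample/{quote(row[id_column])}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column)}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines) + "\n"

# Made-up sheet contents; SAM2 has an empty "source" cell.
sheet = """identifier,isolate,source
SAM1,isolate-A,compost
SAM2,isolate-B,
"""
print(rows_to_ntriples(sheet), end="")
```

The output is plain N-Triples, which a triple store can load directly, or which can be parsed in Python with an RDF library such as rdflib. Keeping one triple per line makes files from different sources easy to concatenate before loading.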

Where can I find more information?