Data Curation#

Data curation is one of the key aspects of Research Data Management (RDM). It involves the capturing, appraisal, disposal, description, preservation, access, reuse and transformation of research data. Data curation promotes the use of data from the point of creation to ensuring that its used for the purpose it is intended for. Data curation also enables you to easily access datasets and information that you need in an organised format, once shared in a Data Repository [def]. You can imagine that if you don’t follow the data curation process, it becomes difficult to access that data.

A cartoon illustration describing the seven steps outlined below of the Data Curation Pipeline. The first step is capture, where a person is trying to catch some bugs. The second one is appraisal, where there is a looking glass trying to review the data. The third one is disposal, where some trash is being tossed out of the pipeline. The fourth step is description, which is showing some data charts. The fifth step is preservation, where these charts are then stored in glass jars. The sixth step is access, where individuals would have access to these jars with data in them. The last step is transformation, where different datasets are being transformed into a new one.

Fig. 53 The Data Curation pipeline. The Turing Way project illustration by Scriberia. Zenodo.

Data Curation Pipeline#

1. Data capture#

Data capture is the process of gathering or collecting data from different sources, which will be processed and used for certain purposes. These sources may be research articles in electronic format or databases like electronic databases among others. This stage focuses on ensuring that the data captured is actually fit for purpose and ready for curation.

2. Data appraisal#

In data appraisal, you are required to select the appropriate data by entering, digitizing, transcribing, checking, validating and cleaning the data. You may also need to anonymise or pseudoanonymise the data where necessary. Your appraisal and selection policy should ensure consistency, transparency, and accountable decision making.

3. Data disposal#

Data disposal involves disposing of data that has not be selected for retention. You need to have an appraisal policy which will guide you on the data required for archiving, redeployment, transfer of custody or ownership or destroying the research data.

4. Data description#

Data description requires that you are able to interpret the data; derive data; produce research outputs; author publications; data anonymisation; data visualization; data validation and prepare the data for preservation. So it is important that you describe the research data so that it is discoverable and usable over time, Documentation and Metadata for more information. There are also metadata standards that already exist to help you with standardised descriptions.

5. Data preservation#

See Data Repositories and Sharing and Archiving Data for more information.

6. Data access#

Data access entails distributing data, sharing data, publishing data, linking data to outputs, controlling access, establishing copyright and promoting or disseminating the data to wider audiences to access it or re-use it.
You can make the data freely available online to anyone who may be interested in reading it or you may restrict access of the data or provide an option of how to access the data. See Data Repositories and Sharing and Archiving Data for more information.

7. Data transformation#

Data transformation is the practice of examining large datasets to generate new information. You can reanalyze the research data and make relationships that may not have been previously discovered.

Additional resources on data curation#