A vast amount of research data is now available online, either openly or under restricted access. Despite this abundance, finding the right data for your research project or question is often difficult. The tips below may help you find data suitable for your project.
You also need to consider whether the data is actually reusable: does it have a suitable license? And does it come with enough metadata and documentation for reuse?
Finding a dataset
You can find open and restricted datasets by searching their metadata, for example with keyword searches.
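As a rough illustration of a keyword search over repository metadata, the sketch below builds a query URL for Zenodo's public records search endpoint. The endpoint and the `q`, `type`, and `size` parameters reflect our understanding of Zenodo's REST API; the helper name `build_zenodo_search_url` and the example keywords are assumptions made for illustration, and other repositories expose different search interfaces.

```python
from urllib.parse import urlencode

ZENODO_RECORDS_API = "https://zenodo.org/api/records"


def build_zenodo_search_url(keywords, size=10):
    """Return a Zenodo search URL restricted to records of type 'dataset'.

    `keywords` is a list of search terms; they are joined with AND so that
    every term must match the record metadata.
    """
    query = " AND ".join(keywords)
    params = {"q": query, "type": "dataset", "size": size}
    return f"{ZENODO_RECORDS_API}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical keywords; replace them with terms from your own question.
    print(build_zenodo_search_url(["soil", "moisture"]))
```

Opening the resulting URL in a browser, or fetching it with an HTTP client, should return matching records as JSON, which you can then screen by title, description, and license.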
You can find data in several ways:
Browse discipline-specific and multidisciplinary repositories such as Zenodo, Open Science Framework, and Figshare.
Search for discipline-specific data repositories on Re3data or FAIRsharing, or look at this list of data repositories.
See Data Repositories for more information.
Search in data journals and research articles - you can start by looking at our Chapter on Data Articles.
Use your network to find datasets.
Use specific data search tools:
Check the license
Once you have found a dataset, you need to check the license to see if you can actually reuse the data!
The most commonly used open licenses are the Creative Commons licenses, the Open Government Licence, and the Open Data Commons Attribution License. See our chapter on Licensing for more information.
Not all datasets that are available to researchers are open datasets. If you want to use a restricted dataset, you need to check how to apply for access and what restrictions apply to its use. Restricted datasets still carry a license, and there should be a clear application process, such as a data request form or an email address for access inquiries.
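The license check can be partly automated when a repository exposes the license in its metadata. The following is a minimal sketch, assuming each record is a dictionary with a `license` field that holds an SPDX-style identifier; the field name, the function `check_license`, and the short list of identifiers are illustrative assumptions, not a complete or authoritative list of open licenses.

```python
# SPDX identifiers for a few common open licenses (an illustrative subset,
# not a complete list).
OPEN_LICENSES = {
    "CC0-1.0",
    "CC-BY-4.0",
    "CC-BY-SA-4.0",
    "ODC-By-1.0",
    "OGL-UK-3.0",
}


def check_license(record):
    """Classify a metadata record by its license field.

    Returns 'open' for a recognised open license, 'missing' when no license
    is stated, and 'check-terms' for anything else (for example a custom or
    restricted-access agreement that needs to be read in full).
    """
    license_id = (record.get("license") or "").strip()
    if not license_id:
        return "missing"
    known = {name.lower() for name in OPEN_LICENSES}
    if license_id.lower() in known:
        return "open"
    return "check-terms"
```

A result of `check-terms` or `missing` does not mean the data cannot be reused; it signals that you need to read the stated terms or contact the data provider before going further.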
Check the metadata and documentation for reusability
Once a first look at the metadata suggests the data may be useful to you, you’ll need to evaluate the dataset more closely.
The following questions may help you to do so:
What was the original research question?
How was the data collected?
Who collected the data? See Liu et al. (2024) for a discussion of why it matters who collected the data.
Are the collection and processing methods appropriate to answer my research question?
Is the data collection process well documented? Which instruments were used? What settings/parameters?
Are protocols of the data collection shared?
Is there sufficient information available to understand the dataset and its context/origin?
Is the information complete, understandable and consistent?
Giving credit for use of data
Once you have used someone else’s dataset, you’ll need to cite the data to give credit to the original data creator(s)!
You need to do this clearly in your research documentation as well as in any research articles you publish. See Citing your own Research Objects for more information about how to properly cite datasets.
Always check how the original dataset should be cited: sometimes researchers want you to cite the accompanying publication instead of the dataset itself. This information is generally available in README files or in the metadata of the repository.
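When no preferred citation is given, a dataset citation can be assembled from the repository metadata. The sketch below is one possible way to do this, loosely following the common "Creators (Year). Title. Repository. Identifier" pattern; the function name `format_dataset_citation` and the example record are hypothetical, and your journal or institution may require a different style.

```python
def format_dataset_citation(creators, year, title, repository, doi):
    """Format a dataset citation as 'Creators (Year). Title. Repository. DOI link'.

    `creators` is a list of names already written as 'Family, Initials'.
    """
    if not creators:
        raise ValueError("at least one creator is required")
    if len(creators) > 1:
        # Join all but the last name with commas, then add the last with '&'.
        authors = ", ".join(creators[:-1]) + ", & " + creators[-1]
    else:
        authors = creators[0]
    return f"{authors} ({year}). {title}. {repository}. https://doi.org/{doi}"


if __name__ == "__main__":
    # Hypothetical record, used only to show the output format.
    print(
        format_dataset_citation(
            ["Doe, J.", "Roe, R."],
            2024,
            "Example survey responses",
            "Example Repository",
            "10.1234/example.5678",
        )
    )
```

If the README or repository metadata states how the creators want to be cited, use that wording rather than a generated citation.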
More information
- Liu, Z., Luo, P., Tang, X., Wang, J., & Nie, L. (2024). Unfolding the downloads of datasets: A multifaceted exploration of influencing factors. Scientific Data, 11(1). 10.1038/s41597-024-03591-8
- Gregory, K., Khalsa, S. J., Michener, W. K., Psomopoulos, F. E., de Waard, A., & Wu, M. (2018). Eleven quick tips for finding research data. PLOS Computational Biology. 10.1371/journal.pcbi.1006038
- Plomp, E., & Obileye, O. (2024). AREN - Reusing Open Data. 10.5281/ZENODO.11862587