
Data Wranglers can be viewed as a specialised type of data scientist, primarily working in the space between data generators and data analysts. There are many activities that a data scientist might undertake, for example, data collection, wrangling, analysis, modelling, visualisation and communication. How these activities map onto different job titles is domain-specific and will vary between projects and organisations.

Hand-drawn illustration by Scriberia. A person representing a Data Wrangler is in a central bubble. On their left there are links to three clusters of connected shapes representing unstructured data. The Data Wrangler ‘unravels’ these clusters and re-structures them to form three distinct presents, labelled ‘Research Ready Data’, shown to the right of the central bubble. Attached to the central bubble there are smaller individual bubbles labelled ‘Connecting Specialists’ (shown by a triangle of linked people), ‘Data Privacy and Security’ (padlock) and ‘Data Quality and Standards’ (sparkling diamond).

Figure 1: A Data Wrangler collaborates with multiple specialists to provide research-ready data whilst upholding data privacy and domain-specific standards. Created by Scriberia with The Turing Way community. Used under a CC-BY 4.0 licence. DOI: The Turing Way Community & Scriberia (2024).

What do Data Wranglers do?

In a data science project, data wrangling tasks are commonly observed to take up the majority of the time (Anaconda, 2020), more than data analysis and modelling. Traditionally, data wrangling tasks involve cleaning, restructuring and filtering data into analysis-ready formats. However, for the Data Wrangler as a profession, day-to-day tasks and objectives can be much more diverse.
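As a rough illustration of the three traditional tasks named above (cleaning, restructuring and filtering), here is a minimal Python sketch using only the standard library. The sample data, column names and function names are all hypothetical and chosen for illustration; real projects will usually involve larger data and dedicated tooling.

```python
import csv
import io

# Hypothetical raw export: inconsistent whitespace, casing and a missing value.
RAW_CSV = """participant_id,site,age
 P001 ,london,34
P002,Leeds ,
p003,LONDON,51
"""

def clean_rows(raw_text):
    """Cleaning: strip whitespace, normalise case, convert age to int or None."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        age = row["age"].strip()
        rows.append({
            "participant_id": row["participant_id"].strip().upper(),
            "site": row["site"].strip().title(),
            "age": int(age) if age else None,
        })
    return rows

def filter_complete(rows):
    """Filtering: keep only rows in which every field has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def group_by_site(rows):
    """Restructuring: reshape flat rows into a site -> participant ids mapping."""
    grouped = {}
    for r in rows:
        grouped.setdefault(r["site"], []).append(r["participant_id"])
    return grouped

cleaned = clean_rows(RAW_CSV)
complete = filter_complete(cleaned)
by_site = group_by_site(complete)
print(by_site)
```

Whether incomplete rows should be dropped, flagged or imputed depends on the research question, which is one reason these decisions are best made together with the data generators and analysts.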

Data Wranglers work primarily in the space between data generators and the data analysts who are addressing the research question of interest. Because they understand how the data will be used in the analysis, Data Wranglers are well placed to help data generators improve their data collection methods. Similarly, Data Wranglers conduct preliminary analysis to check that the data are complete and ready for analysis, acting as a proxy for the data generator’s knowledge during the data analysis process. A key focus of a Data Wrangler’s role is the preparation of analysis-ready or research-ready data (Stewart et al., 2022), in which data security, data management and FAIR standards (Mons et al., 2017) are all core priorities.
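A preliminary completeness check of the kind described above might be sketched as follows. This is only an illustrative example: the records, field names and the `completeness_report` helper are hypothetical, and treating `None` or an empty string as "missing" is an assumption that would need to be agreed with the data generators.

```python
def completeness_report(records, required_fields):
    """Return the fraction of records with a non-missing value for each field."""
    total = len(records)
    report = {}
    for field in required_fields:
        # Assumption: None and the empty string both count as missing.
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = present / total if total else 0.0
    return report

# Hypothetical records standing in for a newly delivered dataset.
records = [
    {"participant_id": "P001", "site": "London", "age": 34},
    {"participant_id": "P002", "site": "Leeds", "age": None},
    {"participant_id": "P003", "site": "", "age": 51},
    {"participant_id": "P004", "site": "Leeds", "age": 29},
]

report = completeness_report(records, ["participant_id", "site", "age"])

# Fields that are not fully populated are candidates to raise with the data generator.
flagged = [field for field, share in report.items() if share < 1.0]
print(report, flagged)
```

A report like this gives the Data Wrangler something concrete to take back to the data generator before analysis begins.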

Examples of day-to-day tasks:

What qualifications or skills do you need to be a Data Wrangler?

Data Wranglers should have experience with programming (no specific language is required, but R and Python are the most widely adopted), database querying (SQL) and data analysis. Their educational background should equip them to engage with the specific research data objects relevant to the projects they will work on, so they will typically have undergraduate and postgraduate degrees, or equivalent experience. As with many data science and research infrastructure roles, further relevant training and specialisation can happen on the job. They need good problem-solving skills, along with curiosity and a willingness to learn. Lastly, good interpersonal skills are required in order to work with people from many different backgrounds, with different skillsets and priorities.

Challenges for Data Wranglers

Some key challenges of a Data Wrangler role are:

Ideally, some of these challenges can be mitigated by encouraging and facilitating communication with Data Wranglers near the start of a project.

Benefits of having Data Wranglers

Who employs Data Wranglers?

Here are some examples of places that employ Data Wranglers in the UK:

Please get in touch if you know of other organisations that you would like to add to this list. This list, and this Data Wrangler page in general, started off with a focus on Data Wrangler roles within an academic research context, but these roles will also exist within other contexts.

Data Wranglers: Summary

The Data Wrangler position is becoming recognised as a crucial part of any project that involves large amounts of complex data, particularly in a research context. Data Wranglers have a diverse set of technical and interpersonal skills. They bring dedicated time and resources to increasing data quality whilst facilitating collaboration, ultimately resulting in more efficient and impactful project outcomes.

References
  1. The Turing Way Community, & Scriberia. (2024). Illustrations from The Turing Way: Shared under CC-BY 4.0 for reuse. Zenodo. https://doi.org/10.5281/zenodo.3332807
  2. Anaconda, Inc. (2020). 2020 State of Data Science (pp. 11–12). https://know.anaconda.com/rs/387-XNW-688/images/Anaconda-SODS-Report-2020-Final.pdf
  3. Stewart, B., Oliver, E., & McGrath-Lone, L. (2022). What is ‘research-ready’ data? https://www.adruk.org/fileadmin/uploads/adruk/Documents/What_is_research_ready_data__A_roundtable_report_June_2022.pdf
  4. Mons, B., Neylon, C., Velterop, J., Dumontier, M., Bonino da Silva Santos, L. O., & Wilkinson, M. (2017). Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding principles for the European Open Science Cloud. Information Services & Use, 37, 1–8. https://doi.org/10.3233/ISU-170824