Persistent identifiers (PIDs) are long-lasting references to digital resources that remain valid even if the resource’s location changes. They are fundamental to making research findable, accessible, and citable in the long term. Throughout this book, you’ll see recommendations to “assign a DOI” or use persistent identifiers - this chapter explains what PIDs are, how they work, and how they enable FAIR and open research practices.

What are Persistent Identifiers?

A persistent identifier is a unique string of characters that reliably points to a specific digital resource. Unlike a web address (URL) that can break when a website is reorganized or shut down, a PID is designed to remain functional indefinitely.

Research outputs shared online face a significant challenge: link rot. One large study found that one in five scholarly articles suffers from reference rot, where the web resources it cites become inaccessible, often within just a few years (Klein et al., 2014). When a dataset moves from one repository to another, or when a university reorganizes its web infrastructure, ordinary web links break. This makes it difficult, and sometimes impossible, for other researchers to find and verify the original work.

How PIDs Solve This

PIDs use a resolution service that acts as a lookup system:

  1. The PID (for example, 10.5281/zenodo.3332807) doesn’t change even if the resource moves

  2. When someone uses the PID, it is sent to a resolver (like https://doi.org)

  3. The resolver looks up where the resource currently lives and redirects the user to the correct location

  4. If the resource moves, only the resolver’s records need updating - the PID itself stays the same

This is similar to how a phone number can stay the same even if you move to a new address.
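The lookup described above can be sketched as a tiny in-memory table. This is only an illustration of the idea, not how doi.org is implemented: the `ToyResolver` class and both repository URLs are hypothetical.

```python
class ToyResolver:
    """In-memory stand-in for a PID resolution service (illustrative only)."""

    def __init__(self):
        self._locations = {}  # maps PID -> current URL of the resource

    def register(self, pid, url):
        """Create or update the record of where `pid` currently lives."""
        self._locations[pid] = url

    def resolve(self, pid):
        """Return the current location (a real resolver would redirect to it)."""
        return self._locations[pid]


resolver = ToyResolver()
pid = "10.5281/zenodo.3332807"

# Initial registration, using a made-up landing page URL.
resolver.register(pid, "https://old-repository.example/records/42")
first_location = resolver.resolve(pid)

# The resource moves: only the resolver's record changes, never the PID.
resolver.register(pid, "https://new-repository.example/datasets/42")
second_location = resolver.resolve(pid)

print(pid, "->", first_location)
print(pid, "->", second_location)
```

Anyone who cited the PID before the move still reaches the resource afterwards, because the citation contains the PID rather than either URL.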

Open Scholarly Infrastructure Ecosystem

Modern research relies on an ecosystem of interconnected PID systems. These systems work together to create a connected scholarly graph where research outputs, people, organizations, and funding can all be reliably identified and linked.

PIDs for Research Outputs

DataCite and Crossref are the two main providers of Digital Object Identifiers (DOIs) for research:

DataCite specialises in assigning DOIs to diverse research outputs including:

Crossref primarily handles:

Both systems use the same DOI infrastructure (the Handle System), so all DOIs work the same way regardless of which organization issued them. The main difference is in their communities and the types of metadata they specialize in collecting.

When you use a trusted repository (see our chapter on data repositories), it will typically assign a DOI through one of these providers automatically. You don’t usually need to choose between DataCite and Crossref yourself - the repository or publisher handles this for you.

PIDs for People

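ORCID (Open Researcher and Contributor ID) provides unique identifiers for researchers. An ORCID iD is a 16-character identifier, usually written in four hyphenated blocks; the final character is a checksum that can be a digit or the letter X. It distinguishes you from every other researcher, even those with identical names.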
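The last character of an ORCID iD is a check character computed from the first 15 digits using the ISO 7064 MOD 11-2 algorithm, which lets software catch most typing errors. The sketch below shows one way to validate an iD; the function names are our own, and 0000-0002-1825-0097 is a sample iD widely used in ORCID examples.

```python
import re


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 16-character shape and the final check character."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD, passes the check
print(is_valid_orcid("0000-0002-1825-0098"))  # last character altered, fails
```

A passing checksum only shows that the iD is well formed. It does not confirm that the iD has been registered or that it belongs to a particular person.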

For comprehensive guidance on ORCID, see our dedicated chapter on ORCID.

Key benefits:

PIDs for Organizations

ROR (Research Organization Registry) provides identifiers for research institutions. Every university, research institute, and funding organization can have a unique ROR ID.

Examples:

ROR IDs appear in research output metadata to indicate:

PIDs for Funders

The Crossref Funder Registry and ROR provide identifiers for funding organizations. These enable researchers to formally cite the grants that supported their work, not just acknowledge them in text.

This creates a traceable connection between:

For more on citing funding, see the section on connection metadata in linking research outputs.

How These Systems Work Together

These PID systems don’t operate in isolation - they’re designed to interconnect:

This creates a rich, queryable network of relationships that makes research more discoverable and its impacts more measurable. Funders, institutions, and researchers can trace the full story of research from funding to outputs to reuse.
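To make the idea of a queryable network concrete, here is a minimal sketch that stores links between identifiers as edges and then asks which research outputs are connected to a person or a funder. Every identifier and relationship name below is a made-up placeholder, and real PID graphs are far richer than this.

```python
from collections import defaultdict

# Each edge is (subject, relationship, object). All values are placeholders.
EDGES = [
    ("doi:10.1234/example-dataset", "created_by", "orcid:0000-0002-1234-5678"),
    ("orcid:0000-0002-1234-5678", "affiliated_with", "ror:02abcdef9"),
    ("doi:10.1234/example-dataset", "funded_by", "funder:example-funder"),
    ("doi:10.1234/example-paper", "cites", "doi:10.1234/example-dataset"),
    ("doi:10.1234/example-paper", "created_by", "orcid:0000-0002-1234-5678"),
]


def build_index(edges):
    """Index every edge in both directions so any PID can be the starting point."""
    index = defaultdict(list)
    for subject, relation, obj in edges:
        index[subject].append((relation, obj))
        index[obj].append((f"inverse:{relation}", subject))
    return index


def outputs_linked_to(index, pid):
    """Return the sorted DOIs directly connected to `pid`."""
    return sorted({other for _, other in index[pid] if other.startswith("doi:")})


index = build_index(EDGES)
outputs_by_person = outputs_linked_to(index, "orcid:0000-0002-1234-5678")
outputs_by_funder = outputs_linked_to(index, "funder:example-funder")

print(outputs_by_person)
print(outputs_by_funder)
```

Because each node is a PID rather than a free-text name, records contributed by different systems can be joined without guessing whether two spellings refer to the same person or organization.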

PIDs and FAIR Principles

Persistent identifiers are not just convenient - they’re fundamental to making research FAIR (Findable, Accessible, Interoperable, and Reusable).

Findable

PIDs make research objects findable in multiple ways:

When you assign a PID to a dataset, it becomes discoverable not just on the repository where it lives, but across the entire scholarly ecosystem.

Accessible

PIDs point to access methods, even for restricted resources:

This aligns with the principle that data should be “as open as possible, as closed as necessary.” See our chapters on open data and sharing data for guidance on when and how to share research outputs.

Interoperable

PIDs use standard systems that work across platforms:

This means tools and services can be built on top of PID infrastructure, creating value beyond what any single repository could provide.

Reusable

PIDs enable persistent citation and credit:

When combined with open licenses, PIDs make it clear what can be reused and by whom.

PID Metadata

When a PID is created, it’s accompanied by metadata - structured information about the resource the PID identifies. This metadata is what makes research discoverable and understandable.

What is PID Metadata?

PID metadata is:

Core PID Metadata Properties

While different PID providers have their own schemas, core properties typically include:

Essential (usually required):

Important for discoverability:

For linking and attribution:

For comprehensive guidance on metadata, see our chapter on documentation and metadata.

Relationship to Domain-Specific Metadata

It’s important to understand that PID metadata and domain-specific metadata serve different but complementary purposes:

PID metadata enables discovery:

Domain-specific metadata enables reuse:

Both are needed: Think of PID metadata as the catalog card that helps someone find a book in a library, while domain-specific metadata is the detailed table of contents and index inside the book.

Our metadata chapter discusses resources like FAIRsharing that help you find the right domain-specific standards for your field. When you deposit in a repository, you’ll typically provide both:

  1. Core PID metadata through the repository’s submission form

  2. Domain-specific metadata as part of your documentation and data files

Metadata Completeness: Minimal vs. Rich

While some fields are required to create a PID, providing rich metadata substantially increases the value of your research outputs:

Minimal PID metadata (required fields only) allows:

Rich PID metadata (many optional fields completed) enables:

Example of minimal vs. rich metadata:

Minimal:

Title: Field Survey Data
Creator: J. Smith
Publisher: Generic Repository
Year: 2024
Type: Dataset

Rich:

Title: Soil Carbon Content Survey Data from Temperate Grasslands 2022-2023
Creators: Jane Smith (ORCID: 0000-0002-1234-5678)
          Alex Johnson (ORCID: 0000-0003-8765-4321)
Affiliations: University of Example (ROR: 02abcdef9)
Publisher: Field Science Data Repository
Year: 2024
Type: Dataset
Description: Soil samples collected monthly from 15 grassland sites in
             Oxfordshire, UK, analyzed for total organic carbon using
             loss-on-ignition method. Part of the Grassland Carbon
             Monitoring project.
Subjects: Soil Science; Carbon Cycle; Grassland Ecology
Related Identifiers:
  - IsSupplementTo: doi:10.1234/example-paper (the paper)
  - IsCompiledBy: doi:10.5281/zenodo.1234567 (the analysis code)
Funding: Natural Environment Research Council (Grant NE/X012345/1)
Rights: Creative Commons Attribution 4.0 International
Version: 1.0
Geolocation: Oxfordshire, UK (51.7°N, 1.2°W)

The rich metadata tells a much more complete story and enables many more discovery pathways.
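One way to see the difference is to measure it. The sketch below encodes the two example records as dictionaries and reports which required fields are missing and how many optional fields are filled in. The field names simply reuse the labels from the examples above (with "Creator" normalised to "Creators"); they are not the property names of any specific metadata schema, and the split into required and optional fields is an assumption based on the minimal example.

```python
# Illustrative labels taken from the examples above, not a real schema.
REQUIRED_FIELDS = ["Title", "Creators", "Publisher", "Year", "Type"]
OPTIONAL_FIELDS = [
    "Affiliations", "Description", "Subjects", "Related Identifiers",
    "Funding", "Rights", "Version", "Geolocation",
]


def completeness_report(record):
    """Summarise missing required fields and the number of optional fields filled."""
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    filled = [name for name in OPTIONAL_FIELDS if record.get(name)]
    return {
        "missing_required": missing,
        "optional_filled": len(filled),
        "optional_total": len(OPTIONAL_FIELDS),
    }


minimal = {
    "Title": "Field Survey Data",
    "Creators": ["J. Smith"],
    "Publisher": "Generic Repository",
    "Year": 2024,
    "Type": "Dataset",
}

rich = {
    "Title": "Soil Carbon Content Survey Data from Temperate Grasslands 2022-2023",
    "Creators": ["Jane Smith", "Alex Johnson"],
    "Affiliations": ["University of Example"],
    "Publisher": "Field Science Data Repository",
    "Year": 2024,
    "Type": "Dataset",
    "Description": "Soil samples collected monthly from 15 grassland sites.",
    "Subjects": ["Soil Science", "Carbon Cycle", "Grassland Ecology"],
    "Related Identifiers": ["doi:10.1234/example-paper"],
    "Funding": "Natural Environment Research Council (Grant NE/X012345/1)",
    "Rights": "Creative Commons Attribution 4.0 International",
    "Version": "1.0",
    "Geolocation": "Oxfordshire, UK",
}

minimal_report = completeness_report(minimal)
rich_report = completeness_report(rich)
print("minimal:", minimal_report)
print("rich:", rich_report)
```

Both records are valid in the sense that nothing required is missing, but only the rich one gives search services subjects, funding, and related identifiers to work with.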

How Repositories Generate PID Metadata

When you deposit research outputs in a trusted repository (see our chapter on repositories), the repository handles PID creation and much of the metadata collection automatically:

  1. You provide information through the repository’s upload form (title, description, creators, and so on)

  2. The repository generates a PID (usually a DOI) through its relationship with DataCite or Crossref

  3. The repository constructs properly formatted PID metadata from your information

  4. The repository registers the PID and metadata with the PID provider

  5. The PID provider makes the metadata publicly searchable through their services

  6. The metadata is harvested by aggregators and discovery services

This automated process is one of the key benefits of using established repositories rather than just hosting files on a personal or institutional website.
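The deposit workflow can be sketched as a toy simulation. Everything here is hypothetical: the `ToyRepository` and `ToyPidProvider` classes, the `10.1234` prefix, and the URLs are placeholders, and real repositories register PIDs through their provider's own services rather than an in-memory dictionary. Step 6 (harvesting by aggregators) is not modelled.

```python
import itertools


class ToyPidProvider:
    """Stand-in for a PID provider's registry (hypothetical, in-memory)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.records = {}  # PID -> {"metadata": ..., "landing_page": ...}

    def register(self, pid, metadata, landing_page):
        self.records[pid] = {"metadata": metadata, "landing_page": landing_page}

    def search(self, term):
        """Step 5: return PIDs whose title or description contains `term`."""
        term = term.lower()
        return [
            pid
            for pid, record in self.records.items()
            if term in record["metadata"]["title"].lower()
            or term in record["metadata"]["description"].lower()
        ]


class ToyRepository:
    """Stand-in for a repository that mints PIDs on deposit."""

    def __init__(self, provider, base_url):
        self.provider = provider
        self.base_url = base_url
        self._ids = itertools.count(1)

    def deposit(self, form):
        record_id = next(self._ids)
        # Step 2: generate a PID under the provider's prefix.
        pid = f"{self.provider.prefix}/example.{record_id}"
        # Step 3: build tidy metadata from what the depositor typed.
        metadata = {
            "title": form["title"].strip(),
            "creators": list(form["creators"]),
            "description": form.get("description", "").strip(),
        }
        landing_page = f"{self.base_url}/records/{record_id}"
        # Step 4: register the PID and its metadata with the provider.
        self.provider.register(pid, metadata, landing_page)
        return pid


provider = ToyPidProvider(prefix="10.1234")
repository = ToyRepository(provider, "https://repository.example")

# Step 1: the researcher fills in the upload form.
new_pid = repository.deposit({
    "title": "  Soil Carbon Content Survey Data from Temperate Grasslands 2022-2023 ",
    "creators": ["Jane Smith", "Alex Johnson"],
    "description": "Monthly soil samples from 15 grassland sites.",
})
matches = provider.search("soil carbon")
print(new_pid, matches)
```

The depositor only completes step 1; the repository and provider take care of the rest.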

The repository also typically creates a landing page for your PID that displays the metadata in human-readable format and provides access to the resource itself.

Practical Guidance

When to Create PIDs for Your Research Outputs

PIDs are valuable for nearly any research output that you want others to be able to find, access, and cite. Consider creating PIDs when:

During research planning:

During research execution:

When preparing publications:

After publication:

PIDs are not just for “final” outputs - see our chapter on research objects for more on sharing throughout the research lifecycle.

Repository-Based vs. Direct PID Minting

There are two main ways to obtain PIDs:

Repository-based (recommended for most researchers):

See our chapter on selecting repositories for guidance on choosing the right one.

Direct minting (for specialized cases):

For most researchers, repository-based PID creation is the appropriate choice.

Understanding DOI Resolution

When you or someone else uses a DOI, here’s what happens:

  1. A DOI is shared (in a citation, link, or reference): doi:10.5281/zenodo.3332807

  2. It’s formatted as a URL to make it clickable: https://doi.org/10.5281/zenodo.3332807

  3. The doi.org resolver looks up where the resource currently lives

  4. You’re redirected to the resource’s current location (the landing page at Zenodo, in this example)

  5. The landing page shows metadata and provides access to the resource itself
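Step 2 above, turning a DOI into a clickable URL, can be sketched as a small helper. The `doi_to_url` function is our own illustration, and the pattern it uses is a deliberately loose check for the common `10.<registrant>/<suffix>` shape rather than a complete definition of DOI syntax. It also does not percent-encode unusual characters in the suffix.

```python
import re

# Loose shape check: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly put in front of a DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(value: str) -> str:
    """Normalise a DOI written in any common form to an https://doi.org/ URL."""
    doi = value.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {value!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.5281/zenodo.3332807"))
print(doi_to_url("10.5281/zenodo.3332807"))
```

Both calls produce the same URL, https://doi.org/10.5281/zenodo.3332807, which is the form to use in online documents.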

This is why DOIs should always be expressed as full URLs in online contexts:

Maintaining PIDs

One of the key advantages of using repository-based PIDs is that you don’t need to maintain them yourself. The repository ensures:

If you need to update metadata (for example, to add a link to a publication that cites your data), contact the repository where the resource is hosted. Most repositories provide forms or help systems for metadata updates.

What if a Resource Truly Disappears?

In rare cases, a resource may need to be removed (for example, if it’s found to contain sensitive data that shouldn’t have been shared). Even then, the PID is not deleted:

This is much better than a link that simply returns “404 Not Found” - it provides context and maintains the integrity of the citation network.

Connecting Research Through PIDs

PIDs are most powerful when they’re used to connect related research outputs together. For detailed guidance on how to link your research outputs, versions, and funding through PID metadata, see our chapter on linking research objects.

Key connections you can make:

Additional Resources

Learn More About PIDs

PID Services and Tools

References
  1. Klein, M., Van de Sompel, H., Sanderson, R., Shankar, H., Balakireva, L., Zhou, K., & Tobin, R. (2014). Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE, 9(12), e115253. https://doi.org/10.1371/journal.pone.0115253
  2. (2021). Templeton World Charity Organization. https://doi.org/10.54224/20568