Minting ARKs with EZID

What is an ARK?

An ARK (Archival Resource Key) is a type of persistent identifier designed to provide long-term access to digital information. ARKs are used by libraries, archives, data centers, and publishers to ensure that digital objects remain accessible over time, even if their storage location changes.

EZID and ARKs

EZID is a service provided by the California Digital Library that enables users to create and manage persistent identifiers, including ARKs and DOIs. EZID provides a straightforward API for minting and updating ARKs, making it easier for institutions to integrate persistent identifiers into their digital asset management workflows.

Why ARKs are Important

ARKs are critical for:

  • Persistence: They remain stable even when digital objects move or change over time.

  • Interoperability: ARKs can be resolved globally using services like N2T.net.

  • Curation: Metadata embedded in ARKs, particularly using the erc: element, helps describe and manage resources over time.

  • Transparency: ARKs can convey information about the resource’s provider and its status through qualifiers and metadata.

The ERC Element

EZID supports metadata using the erc: (Electronic Resource Citation) tag, a simple and standardized schema used to describe digital resources. The erc: format is often used in ARK records to provide essential citation metadata.

Typical ERC elements include:

  • who: The agent or creator responsible for the resource.

  • what: The title or name of the resource.

  • when: The date associated with the resource.

  • where: A pointer to the resource’s location, often a URL.

Example:

erc.who: Mark Baggett
erc.what: Avalon Metadata Mapping Guide
erc.when: 2025-08-01
erc.where: https://example.org/avalon/mapping

EZID allows these erc: fields to be updated over time, supporting good stewardship and curation practices for digital collections.

What To Do When You Have Limited Metadata

Occasionally, we may have an issue where we are missing a certain who, what, when, or where value. In these cases, do the following.

Who

Who should be similar to dc:creator. When we don’t know, put Unknown. If we aren’t sure but it’s done by someone at a specific institution, list the institution.

What

What should be similar to dc:title. We need this value to help us identify what was originally described. This value should always be descriptive even if it’s not a title. For instance, it could be a filename or something else. If needed, it is also valid to concatenate multiple values together like title - filename.tif.

When

When should be similar to dc:date. When we don’t know, put Unknown or an edtf value approximating the date.

Where

Where should be the url of the resource online. This will normally be a work page, but could also be another form of url.

Generating ARKs Programmatically

To generate ARKs in batches, you can simply create a CSV with your standard erc fields. Then you can run code like below that will generate each ARK and record them on a spreadsheet:

import csv
import requests
import os


class EZIDARKGenerator:
    def __init__(self, shoulder_url='https://ezid.cdlib.org/shoulder/ark:/81423/d2'):
        self.url = shoulder_url
        self.headers = {'Content-Type': 'text/plain'}
        self.auth = (os.getenv("EZID_USER"), os.getenv("EZID_PASSWORD"))
        self.completed = []

    def create_metadata(self, who, what, when, where):
        """Create metadata content string for EZID request.

        Args:
            who (str): the agent responsible for the resource — typically the creator, author, or contributor.
            what (str): the title of the work
            when (str): the date of publication for the original work
            where (str): URL or current location of the resource

        Returns:
            dict: The data formatted for the Post and Creation of the Ark
        """
        return (
            f'erc.who: {who}\n'
            f'erc.what: {what}\n'
            f'erc.when: {when}\n'
            f'_target: {where}\n'
            f'_status: reserved\n'
        )

    def create_ark(self, who, what, when, where):
        """Create a single ARK identifier.

        Args:
            who (str): the agent responsible for the resource — typically the creator, author, or contributor.
            what (str): the title of the work
            when (str): the date of publication for the original work
            where (str): URL or current location of the resource

        Returns:
            dict: Data sent to the ARK with the ARK returned
        """
        metadata_content = self.create_metadata(who, what, when, where)
        data = metadata_content.encode('utf-8')

        response = requests.post(self.url, data=data, headers=self.headers, auth=self.auth)
        # https://n2t.net/ark:/81423/d2tg6j
        full_message = response.content.decode('utf-8')
        ark = ""
        if "success" in  full_message:
            ark = f"https://n2t.net/{full_message.split(' ')[-1]}"
        return {
            'who': who,
            'what': what,
            'when': when,
            'where': where,
            'message': full_message,
            'ark': ark,
        }

    def process_csv(self, input_file):
        """Process CSV file and create ARKs for each row.

        Args:

            input_file (str): The CSV that contains your ARK information with appropriate headings.
        """
        with open(input_file, 'r', newline='') as csvfile:
            reader = csv.DictReader(csvfile)

            for row in reader:
                result = self.create_ark(
                    row['who'],
                    row['what'],
                    row['when'],
                    row['where']
                )
                self.completed.append(result)

    def save_results(self, output_file):
        """Save completed results to CSV file."""
        fieldnames = ['who', 'what', 'when', 'where', 'message', 'ark']
        with open(output_file, 'w', newline='') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
            for row in self.completed:
                writer.writerow(row)

    def run(self, input_file, output_file):
        """Main method to process input and save results."""
        self.process_csv(input_file)
        self.save_results(output_file)
        return self.completed


if __name__ == "__main__":
    input_csv = "quick.csv"
    output_csv = "output3.csv"
    generator = EZIDARKGenerator()
    results = generator.run(input_csv, output_csv)
    print(f"Processed {len(results)} records")