RDFizing DSPACE Works to Generate Manifests

irIIIFService requires RDF graphs in order to generate manifests for items in DSPACE. This document describes how to build a list of handles and then generate Turtle for each of them so that irIIIFService can use it.

Making a List

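Assuming you have a DSPACE collection and want to create RDF for everything in it, you can use Python to get a list of handles. The script below needs the httpx and rdflib packages. In its __main__ block (lines 48 and 49 of the script), set the uri passed to DspaceItem to the RDF URI of your collection and the filename passed to prep_for_rdfizer to the file where the handles should be written.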

import httpx
from rdflib import Graph, URIRef


class Namespaces:
    def __init__(self):
        self.dc = "http://purl.org/dc/elements/1.1/"
        self.dcterms = "http://purl.org/dc/terms/"
        self.void = "http://rdfs.org/ns/void#"
        self.rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        self.xsd = "http://www.w3.org/2001/XMLSchema#"
        self.bibo = "http://purl.org/ontology/bibo/"
        self.foaf = "http://xmlns.com/foaf/0.1/"
        self.dspace = "http://digital-repositories.org/ontologies/dspace/0.1.0#"


class DspaceItem:
    def __init__(self, uri):
        self.uri = uri
        self.namespaces = Namespaces()
        self.g = self.__get_graph()

    def __get_graph(self):
        try:
            r = httpx.get(self.uri, timeout=60)
        except httpx.HTTPError as e:
            # Network-level failure (DNS, timeout, refused connection, ...)
            raise RuntimeError(f"Failed to download {self.uri}: {e}") from e
        if r.status_code != 200:
            raise RuntimeError(f"Failed to download {self.uri}. Status code: {r.status_code}")
        g = Graph()
        g.parse(data=r.content, format="turtle")
        return g

    def get_children(self):
        return [str(o) for s,p,o in self.g if p == URIRef(f"{self.namespaces.dcterms}hasPart")]

    def prep_for_rdfizer(self, output):
        children = self.get_children()
        handles = [f"{child.split('/')[-2]}/{child.split('/')[-1]}" for child in children]
        with open(output, "w") as fh:
            for handle in handles:
                fh.write(f"{handle}\n")



if __name__ == "__main__":
    x = DspaceItem(uri="https://oaktrust-pre.library.tamu.edu/server/rdf/handle/1969.1/86434")
    x.prep_for_rdfizer("maps.txt")
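
The handle extraction inside prep_for_rdfizer takes the last two path segments of each child URI. As a minimal sketch of that step in isolation, the helper below does the same thing with the standard library and also tolerates a trailing slash. The name handle_from_uri is hypothetical and not part of the script above, and the sketch assumes that child URIs always end in prefix/suffix.

```python
from urllib.parse import urlparse


def handle_from_uri(uri: str) -> str:
    """Return the "prefix/suffix" handle formed by the last two path segments.

    handle_from_uri is a hypothetical helper used for illustration only.
    """
    # Drop empty segments so a trailing slash does not shift the result.
    segments = [s for s in urlparse(uri).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"No handle found in {uri!r}")
    return "/".join(segments[-2:])
```

For the collection URI used above, handle_from_uri returns 1969.1/86434, which is the form written to maps.txt.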

Copying your List and a Process to DSPACE

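To script rdfizer, you need three things: a shell in a container that has the DSPACE CLI, a script that runs rdfizer once per handle, and the handle list you created above.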

Steps

  1. First, navigate to a running container with the DSPACE CLI. The DSPACE CLI is usually at /dspace/bin/dspace but not in PATH.

  2. The running container also likely does not have a text editor that you can use. To create a bash script and get your files to the container, you can use cat. This isn’t ideal, but you can write a shell script like process.sh by running cat > process.sh and then pasting this:

#!/bin/bash

# Check if a filename was provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <input_file>"
    exit 1
fi

INPUT_FILE="$1"

# Check if the file exists
if [ ! -f "$INPUT_FILE" ]; then
    echo "Error: File '$INPUT_FILE' not found!"
    exit 1
fi

# Read the file line by line (including a final line with no trailing newline)
while IFS= read -r line || [ -n "$line" ]; do
    # Trim any leading/trailing whitespace
    VALUE=$(echo "$line" | xargs)

    # Skip empty lines
    if [ -n "$VALUE" ]; then
        echo "Processing: $VALUE"
        /dspace/bin/dspace rdfizer -i "$VALUE"
    fi
done < "$INPUT_FILE"

  3. Save the file by pressing CTRL + D on an empty line.

  4. Make the script executable with chmod +x process.sh. Running chmod 777 process.sh also works, but it grants read, write, and execute permissions to every user, which is more than you need.

  5. Copy your handle list to the container the same way: run cat > files.txt, paste the handles you generated earlier (maps.txt in the example above), and press CTRL + D. Then run the script with ./process.sh files.txt.
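
Because the list is pasted by hand, it can help to check it locally before copying it over. The following is a minimal sketch, not part of the original workflow: clean_handles and HANDLE_RE are hypothetical names, and the pattern assumes handles look like 1969.1/86434 (a numeric prefix, a slash, and a suffix without whitespace).

```python
import re

# Assumed handle shape: "<numeric prefix>/<suffix>", e.g. "1969.1/86434".
HANDLE_RE = re.compile(r"^[0-9][0-9.]*/\S+$")


def clean_handles(lines):
    """Return (valid, rejected) from an iterable of raw lines.

    Valid handles are stripped and deduplicated in first-seen order;
    blank lines are ignored; anything else is reported as rejected.
    """
    valid, rejected, seen = [], [], set()
    for raw in lines:
        value = raw.strip()
        if not value:
            continue
        if not HANDLE_RE.match(value):
            rejected.append(value)
        elif value not in seen:
            seen.add(value)
            valid.append(value)
    return valid, rejected
```

Passing the lines of maps.txt to clean_handles gives back the handles that are safe to paste and a separate list of anything that does not look like a handle.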

Monitoring Long Tasks

Sometimes tasks take a long time and you need to detach or run them in the background. To do this, add an & to the end of your command, like ./process.sh files.txt &. In an interactive shell, this prints the job number and the process ID.

You can then check on the process with ps aux | grep 2640, where 2640 is your process ID.

You can also refresh this automatically with watch -n 1 "ps aux | grep 2640", where 1 is the refresh interval in seconds.
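
If Python is available where the job runs (the container described above may not provide it), the same check can be scripted. This is a sketch under that assumption; pid_is_running is a hypothetical helper, and it relies on the POSIX convention that sending signal 0 only tests whether the process exists.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        # Signal 0 delivers nothing; it only performs the existence check.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

Calling pid_is_running(2640) in a loop with a short sleep gives roughly the same information as the watch command.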