OAKTrust and DSpace Statistics

Accessing Statistics via Solr

DSpace traffic is written to a Solr core called statistics. Each document looks like:

{
    "ip":"162.158.174.218",
    "referrer":"https://oaktrust.library.tamu.edu/collections/5375f882-869a-418f-b40e-0191ba379fa3",
    "dns":"162.158.174.218",
    "userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
    "isBot":false,
    "continent":"NA",
    "countryCode":"US",
    "city":"Dallas",
    "latitude":32.7767,
    "longitude":-96.797,
    "id":"22ab66ce-cfcc-4a8e-ac3f-3ce2238d1da3",
    "type":0,
    "owningItem":["7ed85e6e-21f9-4373-ac88-758ba0d579b2"],
    "owningColl":["5375f882-869a-418f-b40e-0191ba379fa3"],
    "owningComm":["ee1165a9-f6b9-4b93-8471-9e4bbee03d04",
      "e55ccac8-4d31-431f-9320-058cc3a708ab",
      "ed9a1370-076a-4cc0-bf87-25ae04053a36"],
    "time":"2025-08-01T20:58:28.617Z",
    "bundleName":["THUMBNAIL"],
    "statistics_type":"view",
    "uid":"8db5cc7c-1cb6-47c0-8bdd-06b86fac8d42"
}

Based on this document, there are several strategies for getting data about a community, collection, or specific item. The sections below cover reproducible ways of creating these reports.
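As a rough illustration of those strategies, the sketch below builds a Solr query clause that scopes traffic by one of the ownership fields shown in the sample document (owningComm, owningColl, owningItem). The helper name scope_query and the SCOPE_FIELDS mapping are hypothetical, and wrapping the UUID in quotes is a precaution rather than a documented requirement.

```python
# Field names are taken from the sample document; each holds one or more UUIDs.
SCOPE_FIELDS = {
    "community": "owningComm",
    "collection": "owningColl",
    "item": "owningItem",
}


def scope_query(level, uuid):
    """Return a Solr clause matching traffic under one community, collection, or item."""
    field = SCOPE_FIELDS[level]  # raises KeyError for an unknown level
    return f'{field}:"{uuid}"'


# Example: traffic for the collection referenced in the sample document.
print(scope_query("collection", "5375f882-869a-418f-b40e-0191ba379fa3"))
```

A clause built this way can be combined with other clauses using AND, in the same way the time and isBot filters are combined elsewhere on this page.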

Getting Time Based Stats

Suppose we want all OAKTrust traffic between 2025-08-01 and 2025-08-10. We can search on the time field with time:[2025-08-01T00:00:00Z TO 2025-08-10T23:59:59Z]. Searching for this string alone covers traffic across all communities and returns 174282 documents formatted like:

{
    "ip":"162.158.155.223",
    "referrer":"https://oaktrust.library.tamu.edu/bitstreams/ab61984f-c44a-4b78-bfd9-661a1c6557c4/download",
    "dns":"162.158.155.223",
    "userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36",
    "isBot":false,
    "continent":"NA",
    "countryCode":"US",
    "city":"Newark",
    "latitude":40.7357,
    "longitude":-74.1724,
    "id":"ab61984f-c44a-4b78-bfd9-661a1c6557c4",
    "type":0,
    "owningItem":["8cdd8f6a-dbb3-4a91-96f5-c8a3c81549b1"],
    "owningColl":["a213b1c3-40ff-49a8-a954-38b802e1aa1f",
      "f1fab458-eb4f-4acd-aba9-9c7b08f26730"],
    "owningComm":["5396a3df-85e9-4aef-ae08-13e1e164e55c",
      "9d351faf-09e6-469f-9a26-24bca6d907f6",
      "af0dfacd-b0f2-4f51-b2dd-e63f17519b4f",
      "ed9a1370-076a-4cc0-bf87-25ae04053a36"],
    "time":"2025-08-01T00:00:50.557Z",
    "bundleName":["ORIGINAL"],
    "statistics_type":"view",
    "uid":"4b4d537c-a48c-4486-9a2d-852de113f330"
}
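The time range search above can also be assembled programmatically. This is a minimal sketch, assuming a hypothetical endpoint (SELECT_URL) that you would replace with the select URL of your own statistics core. The function names are illustrative; q, rows, and wt are standard Solr request parameters.

```python
from urllib.parse import urlencode, quote

# Hypothetical endpoint; replace with the select URL of your statistics core.
SELECT_URL = "http://localhost:8983/solr/statistics/select"


def time_range_query(start, end):
    """Build an inclusive Solr range clause on the time field."""
    return f"time:[{start} TO {end}]"


def select_url(query, rows=10, base=SELECT_URL):
    """Percent-encode the query and attach it to the select endpoint."""
    params = {"q": query, "rows": rows, "wt": "json"}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base + "?" + urlencode(params, quote_via=quote)


q = time_range_query("2025-08-01T00:00:00Z", "2025-08-10T23:59:59Z")
print(select_url(q, rows=0))
```

Setting rows=0 asks Solr for the match count only, which is usually enough when the goal is a traffic total rather than the individual documents.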

Eliminating Bots

You can remove bot traffic by expanding the query to time:[2025-08-01T00:00:00Z TO 2025-08-10T23:59:59Z] AND isBot:false. This reduces the result to 110246 documents.

If you want to avoid the Solr interface entirely, you can issue the same query directly by URL:

https://rancher.library.tamu.edu/k8s/clusters/c-kd2s7/api/v1/namespaces/oaktrust/services/http:dspace-solr:80/proxy/solr/statistics/select?indent=true&q.op=OR&q=time%3A%5B2025-08-01T00%3A00%3A00Z%20TO%202025-08-10T23%3A59%3A59Z%5D%20AND%20isBot%3Afalse
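The same request can be scripted so the count is reproducible. The sketch below assumes the endpoint returns standard Solr JSON with a response.numFound field, and it does not handle authentication, which a proxied URL like the one above may require. The function names and the base argument are hypothetical.

```python
import json
from urllib.parse import urlencode, quote
from urllib.request import urlopen


def count_url(base, start, end, exclude_bots=True):
    """Build a select URL that counts traffic in a time range."""
    query = f"time:[{start} TO {end}]"
    if exclude_bots:
        query += " AND isBot:false"
    # rows=0 asks Solr for the match count only, without document bodies.
    params = {"q": query, "rows": 0, "wt": "json"}
    return base + "?" + urlencode(params, quote_via=quote)


def num_found(body):
    """Extract the total match count from a Solr JSON response body."""
    return json.loads(body)["response"]["numFound"]


def count_hits(base, start, end, exclude_bots=True):
    """Fetch the count over HTTP. Authentication is not handled here."""
    with urlopen(count_url(base, start, end, exclude_bots)) as resp:
        return num_found(resp.read())
```

Calling count_hits once with exclude_bots=True and once with exclude_bots=False against the same select URL gives the two totals needed to see how much of the traffic was flagged as bots.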

Getting Item Based Stats

Let’s pretend we want to get all traffic