$ dspace metadata-export -i 10568/16814 -f /tmp/iwmi.csv
-$ csvcut -c 'cg.creator.identifier,cg.creator.identifier[en_US]' ~/Downloads/2024-02-06-iwmi.csv \
- | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' \
- | sort -u \
- | tee /tmp/iwmi-orcids.txt \
- | wc -l
-353
-$ ./ilri/resolve_orcids.py -i /tmp/iwmi-orcids.txt -o /tmp/iwmi-orcids-names.csv -d
+
+Continue testing and debugging the cgspace-java-helpers on DSpace 7
+Work on IFPRI ISNAR archive cleanup
+
+2024-01-03
+
+- I haven’t heard from the Handle admins so I’m preparing a backup solution using nginx streams
+- This seems to work in my simple tests (this must be outside the http {} block):
+
+stream {
+ upstream handle_tcp_9000 {
+ server 188.34.177.10:9000;
+ }
+
+ server {
+ listen 9000;
+ proxy_connect_timeout 1s;
+ proxy_timeout 3s;
+ proxy_pass handle_tcp_9000;
+ }
+}
+
+- Here I forwarded a test TCP port 9000 from one server to another and was able to retrieve a test HTML page that was being served on the target
+
+- I will have to do TCP and UDP on port 2641, and TCP/HTTP on port 8000.
+
+
+- I did some more minor work on the IFPRI ISNAR archive
+
+- I got some PDFs from the UMN AgEcon search and fixed some metadata
+- Then I did some duplicate checking and found five items already on CGSpace
+
+
+
+2024-01-04
+
+- Upload 692 items for the ISNAR archive to CGSpace: https://cgspace.cgiar.org/handle/10568/136192
+- Help Peter proof and upload 252 items from the 2023 Gender conference to CGSpace
+- Meeting with IFPRI to discuss their migration to CGSpace
+
+- We agreed to add two new fields, one for IFPRI project and one for IFPRI publication ranking
+- Most likely we will use cg.identifier.project as a general field and consolidate other project fields there
+- Not sure which field to use for the publication rank…
+
+
+
+2024-01-05
+
+- Proof and upload 51 items in bulk for IFPRI
+- I did a big cleanup of user groups in anticipation of complaints about slow workflow tasks etc in DSpace 7
+
+- I removed ILRI editors from all the dozens of CCAFS community and collection groups, and I should do the same for other CRPs since they have been closed for two years now
+
+
+
+2024-01-06
+
+- Migrate CGSpace to DSpace 7
+
+2024-01-07
+
+- High load on the server and UptimeRobot saying the frontend is flapping
+
+- I noticed tons of logs from pm2 in the systemd journal, so I disabled those in the systemd unit because they are available from pm2’s log directory anyway
+- I also noticed the same for Solr, so I disabled stdout for that systemd unit as well
+
+
+- I spent a lot of time bringing back the nginx rate limits we used in DSpace 6 and it seems to have helped
+- I see some client doing weird HEAD requests to search pages:
+
+47.76.35.19 - - [07/Jan/2024:00:00:02 +0100] "HEAD /search/?f.accessRights=Open+Access%2Cequals&f.actionArea=Resilient+Agrifood+Systems%2Cequals&f.author=Burkart%2C+Stefan%2Cequals&f.country=Kenya%2Cequals&f.impactArea=Climate+adaptation+and+mitigation%2Cequals&f.itemtype=Brief%2Cequals&f.publisher=CGIAR+System+Organization%2Cequals&f.region=Asia%2Cequals&f.sdg=SDG+12+-+Responsible+consumption+and+production%2Cequals&f.sponsorship=CGIAR+Trust+Fund%2Cequals&f.subject=environmental+factors%2Cequals&spc.page=1 HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.2504.63 Safari/537.36"
+
+- I will add their network blocks (AS45102) and regenerate my list of bot networks:
+
+$ wget https://asn.ipinfo.app/api/text/list/AS16276 \
+ https://asn.ipinfo.app/api/text/list/AS23576 \
+ https://asn.ipinfo.app/api/text/list/AS24940 \
+ https://asn.ipinfo.app/api/text/list/AS13238 \
+ https://asn.ipinfo.app/api/text/list/AS14061 \
+ https://asn.ipinfo.app/api/text/list/AS12876 \
+ https://asn.ipinfo.app/api/text/list/AS55286 \
+ https://asn.ipinfo.app/api/text/list/AS203020 \
+ https://asn.ipinfo.app/api/text/list/AS204287 \
+ https://asn.ipinfo.app/api/text/list/AS50245 \
+ https://asn.ipinfo.app/api/text/list/AS6939 \
+ https://asn.ipinfo.app/api/text/list/AS45102 \
+ https://asn.ipinfo.app/api/text/list/AS21859
+$ cat AS* | sort | uniq | wc -l
+4897
+$ cat AS* | ~/go/bin/mapcidr -a > /tmp/networks.txt
+$ wc -l /tmp/networks.txt
+2017 /tmp/networks.txt
-- I noticed some similar looking names in our list so I clustered them in OpenRefine and manually checked a dozen or so to update our list
+- I’m surprised to see the number of networks reduced from my current ones… hmmm.
+- I will also update my list of Bing networks:
-2024-02-07
+$ ./ilri/bing-networks-to-ips.sh
+$ ~/go/bin/mapcidr -a < /tmp/bing-ips.txt > /tmp/bing-networks.txt
+$ wc -l /tmp/bing-networks.txt
+250 /tmp/bing-networks.txt
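
The aggregation step above can be approximated with Python's standard `ipaddress` module. This is only a rough sketch of how adjacent or overlapping CIDR blocks collapse into fewer networks; it is not how mapcidr is implemented, and the function name `aggregate` and the sample prefixes (documentation ranges) are made up for illustration.

```python
import ipaddress


def aggregate(cidrs):
    """Collapse CIDR strings into the smallest set of covering networks."""
    v4, v6 = [], []
    for line in cidrs:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. from concatenated files
        net = ipaddress.ip_network(line, strict=False)
        # collapse_addresses() refuses mixed IP versions, so keep them apart
        (v4 if net.version == 4 else v6).append(net)
    merged = list(ipaddress.collapse_addresses(v4))
    merged += list(ipaddress.collapse_addresses(v6))
    return [str(net) for net in merged]


if __name__ == "__main__":
    # Hypothetical sample input using documentation prefixes
    sample = [
        "192.0.2.0/25",
        "192.0.2.128/25",   # adjacent to the previous /25
        "198.51.100.0/24",
        "198.51.100.7/32",  # already covered by the /24 above
    ]
    for network in aggregate(sample):
        print(network)
```

A file such as one of the downloaded AS lists could be passed in directly, since an open text file iterates line by line.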
+
2024-01-08
-- Maria asked me about the “missing” item from last week again
-
-- I can see it when I used the Admin search, but not in her workflow
-- It was submitted by TIP so I checked that user’s workspace and found it there
-- After depositing, it went into the workflow so Maria should be able to see it now
+- Export list of publishers for Peter to select a subset to use as a controlled vocabulary:
-
-
-2024-02-09
-
-- Minor edits to CGSpace submission form
-- Upload 55 ISNAR book chapters to CGSpace from Peter
-
-2024-02-19
-
-- Looking into the collection mapping issue on CGSpace
-
-
-
-2024-02-20
-
-- Minor work on OpenRXV to fix a bug in the ng-select drop downs
-- Minor work on the DSpace 7 nginx configuration to allow requesting robots.txt and sitemaps without hitting rate limits
-
-2024-02-21
-
-- Minor updates on OpenRXV, including one bug fix for missing mapped collections
-
-- Salem had to re-work the harvester for DSpace 7 since the mapped collections and parent collection list are separate!
-
-
-
-2024-02-22
-
-- Discuss tagging of datasets and re-work the submission form to encourage use of DOI field for any item that has a DOI, and the normal URL field if not
-
-- The “cg.identifier.dataurl” field will be used for “related” datasets
-- I still have to check and move some metadata for existing datasets
-
-
-
-2024-02-23
-
-- This morning Tomcat died due to an OOM kill from the kernel:
-
-kernel: Out of memory: Killed process 698 (java) total-vm:14151300kB, anon-rss:9665812kB, file-rss:320kB, shmem-rss:0kB, UID:997 pgtables:20436kB oom_score_adj:0
+localhost/dspace7= ☘ \COPY (SELECT DISTINCT text_value AS "dcterms.publisher", count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id = 178 GROUP BY "dcterms.publisher" ORDER BY count DESC) to /tmp/2024-01-publishers.csv WITH CSV HEADER;
+COPY 4332
-- I don’t see any abnormal pattern in my Grafana graphs, for JVM or system load… very weird
-- I updated the submission form on CGSpace to include the new changes to URLs for datasets
+
- Address some feedback on DSpace 7 from users, including filing some issues on GitHub
+
+- The Alliance TIP team was having issues posting to one collection via the legacy DSpace 6 REST API
+
+- In the DSpace logs I see the same issue that they had last month:
-2024-02-25
+ERROR unknown unknown org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
+
2024-01-09
-- This morning Tomcat died while I was doing a CSV export, with an OOM kill from the kernel:
-
-kernel: Out of memory: Killed process 720768 (java) total-vm:14079976kB, anon-rss:9301684kB, file-rss:152kB, shmem-rss:0kB, UID:997 pgtables:19488kB oom_score_adj:0
-
-- I don’t know why this is happening so often recently…
-
-2024-02-27
+I restarted Tomcat to see if it helps the REST issue
+After talking with Peter about publishers we decided to get a clean list of the top ~100 publishers and then make sure all CGIAR centers, Initiatives, and Impact Platforms are there as well
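
One possible way to build that list from the exported CSV, as a minimal sketch: it assumes the file has the `dcterms.publisher` and `count` columns produced by the export above. The function name `top_publishers`, the `required` parameter, and the sample rows are hypothetical, and the names would still need manual cleaning before use.

```python
import csv
import io


def top_publishers(csv_file, limit=100, required=()):
    """Return the `limit` most frequent publishers, then any required names not already present."""
    reader = csv.DictReader(csv_file)
    rows = [(row["dcterms.publisher"], int(row["count"])) for row in reader]
    # Sort by count, highest first, in case the export order was changed
    rows.sort(key=lambda pair: pair[1], reverse=True)
    selected = [name for name, _ in rows[:limit]]
    seen = set(selected)
    for name in required:
        if name not in seen:
            selected.append(name)
            seen.add(name)
    return selected


if __name__ == "__main__":
    # Hypothetical sample data standing in for /tmp/2024-01-publishers.csv
    sample = io.StringIO(
        "dcterms.publisher,count\n"
        "Publisher A,120\n"
        "Publisher B,45\n"
        "Publisher C,3\n"
    )
    print(top_publishers(sample, limit=2, required=["Publisher C"]))
```

Appending the required names after the top entries keeps the frequency ranking visible while guaranteeing that the centers, Initiatives, and Impact Platforms are not dropped by the cutoff.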
-- IFPRI sent me a list of authors to add to our list for now, until we can find a better way of doing it
-
-- I extracted the existing authors from our controlled vocabulary and combined them with IFPRI’s:
-
-
-
-$ xmllint --xpath '//node/isComposedBy/node()' dspace/config/controlled-vocabularies/dc-contributor-author.xml \
- | grep -oE 'label=".*"' \
- | sed -e 's/label="//' -e 's/"$//' > /tmp/authors
-$ cat /tmp/authors /tmp/ifpri-authors | sort -u > /tmp/new-authors
-
2024-02-28
-
-- I figured out a way to add a new Angular component to handle all our relation fields
-
-2024-02-29
-
-- Clean up a bunch of metadata on CGSpace
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-