diff --git a/docs/2020-06/index.html b/docs/2020-06/index.html
index 16b869257..a0a60f7a3 100644
--- a/docs/2020-06/index.html
+++ b/docs/2020-06/index.html
@@ -19,7 +19,7 @@ I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Tes
-
+
@@ -43,9 +43,9 @@ I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Tes
"@type": "BlogPosting",
"headline": "June, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-06/",
- "wordCount": "846",
+ "wordCount": "2127",
"datePublished": "2020-06-01T13:55:39+03:00",
- "dateModified": "2020-06-04T14:43:40+03:00",
+ "dateModified": "2020-06-06T13:33:23+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -256,6 +256,199 @@ sys 3m13.929s
Unfortunately the item is still not showing up in the item mapper…
Something happened to AReS Explorer (linode20) so I ran all system updates and rebooted it
+2020-06-07
+
+- Peter said he was annoyed with a CSV export from CGSpace because of the different
text_lang
attributes and asked if we can fix it
+- The last time I normalized these was in 2019-06, and currently it looks like this:
+
+dspace=# SELECT DISTINCT text_lang, count(text_lang) FROM metadatavalue WHERE resource_type_id=2 GROUP BY text_lang ORDER BY count DESC;
+ text_lang | count
+-------------+---------
+ en_US | 2158377
+ en | 149540
+ | 49206
+ es_ES | 18
+ fr | 4
+ Peer Review | 1
+ | 0
+(7 rows)
+
+- In theory we can have different languages for metadata fields but in practice we don’t do that, so we might as well normalize everything to “en_US” (and perhaps I should make a curation task to do this)
+- For now I will do it manually on CGSpace and DSpace Test:
+
+dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2;
+UPDATE 2414738
+
+dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE dspace_object_id IN (SELECT uuid FROM item);
+
+- Peter asked if it was possible to find all ILRI items that have “zoonoses” or “zoonotic” in their titles and check if they have the ILRI subject “ZOONOTIC DISEASES” (and add it if not)
+
+- Unfortunately the only way we have currently would be to export the entire ILRI community as a CSV and filter/edit it in OpenRefine
+
+
+
+2020-06-08
+
+- I manually mapped the two Big Data items that Maria had asked about last week by exporting their metadata to CSV and re-importing it
+
+- I still need to look into the underlying issue there, seems to be something in Solr
+- Something strang is that, when I search for part of the title in Discovery I get 2,000 results on CGSpace, while on my local DSpace 5.8 environment I get 2!
+
+
+
+
+
+
+- On DSpace Test, which is currently running DSpace 6, I get 2,000 results but the top one is the correct match and the item does show up in the item mapper
+
+- Interestingly, if I search directly in the Solr
search
core on CGSpace with a query like handle:10568/108315
I don’t see the item, but on my local Solr I see them!
+
+
+- Peter asked if it was easy for me to add ILRI subject “ZOONOTIC DISEASES” to any items in the ILRI community that had “zoonotic” or “zoonoses” in their title, but were missing the ILRI subject
+
+- I exported the ILRI community metadata, cut the three fields I needed, and then filtered and edited the CSV in OpenRefine:
+
+
+
+$ dspace metadata-export -i 10568/1 -f /tmp/2020-06-08-ILRI.csv
+$ csvcut -c 'id,cg.subject.ilri[en_US],dc.title[en_US]' ~/Downloads/2020-06-08-ILRI.csv > /tmp/ilri.csv
+
+- Moayad asked why he’s getting HTTP 500 errors on CGSpace
+
+- I looked in the Nginx logs and I see some HTTP 500 responses, but nothing in nginx’s error.log
+- Looking in Tomcat’s log I see there are many:
+
+
+
+# journalctl --since=today -u tomcat7 | grep -c 'Internal Server Error'
+482
+
+- They are all related to the REST API, like:
+
+Jun 07 02:00:27 linode18 tomcat7[6286]: SEVERE: Mapped exception to response: 500 (Internal Server Error)
+Jun 07 02:00:27 linode18 tomcat7[6286]: javax.ws.rs.WebApplicationException
+Jun 07 02:00:27 linode18 tomcat7[6286]: at org.dspace.rest.Resource.processException(Resource.java:151)
+Jun 07 02:00:27 linode18 tomcat7[6286]: at org.dspace.rest.ItemsResource.getItems(ItemsResource.java:195)
+Jun 07 02:00:27 linode18 tomcat7[6286]: at sun.reflect.GeneratedMethodAccessor548.invoke(Unknown Source)
+Jun 07 02:00:27 linode18 tomcat7[6286]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+Jun 07 02:00:27 linode18 tomcat7[6286]: at java.lang.reflect.Method.invoke(Method.java:498)
+Jun 07 02:00:27 linode18 tomcat7[6286]: at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
+...
+
+Jun 08 09:28:29 linode18 tomcat7[6286]: SEVERE: Mapped exception to response: 500 (Internal Server Error)
+Jun 08 09:28:29 linode18 tomcat7[6286]: javax.ws.rs.WebApplicationException
+Jun 08 09:28:29 linode18 tomcat7[6286]: at org.dspace.rest.Resource.processFinally(Resource.java:169)
+Jun 08 09:28:29 linode18 tomcat7[6286]: at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
+Jun 08 09:28:29 linode18 tomcat7[6286]: at sun.reflect.GeneratedMethodAccessor360.invoke(Unknown Source)
+Jun 08 09:28:29 linode18 tomcat7[6286]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+Jun 08 09:28:29 linode18 tomcat7[6286]: at java.lang.reflect.Method.invoke(Method.java:498)
+
+Jun 06 08:19:54 linode18 tomcat7[6286]: SEVERE: Mapped exception to response: 500 (Internal Server Error)
+Jun 06 08:19:54 linode18 tomcat7[6286]: javax.ws.rs.WebApplicationException
+Jun 06 08:19:54 linode18 tomcat7[6286]: at org.dspace.rest.Resource.processException(Resource.java:151)
+Jun 06 08:19:54 linode18 tomcat7[6286]: at org.dspace.rest.CollectionsResource.getCollectionItems(CollectionsResource.java:289)
+Jun 06 08:19:54 linode18 tomcat7[6286]: at sun.reflect.GeneratedMethodAccessor598.invoke(Unknown Source)
+Jun 06 08:19:54 linode18 tomcat7[6286]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+Jun 06 08:19:54 linode18 tomcat7[6286]: at java.lang.reflect.Method.invoke(Method.java:498)
+
+- Looking back, I see ~800 of these errors since I changed the database configuration last week:
+
+# journalctl --since=2020-06-04 --until=today -u tomcat7 | grep -c 'javax.ws.rs.WebApplicationException'
+795
+
+- And only ~280 in the entire month before that…
+
+# journalctl --since=2020-05-01 --until=2020-06-04 -u tomcat7 | grep -c 'javax.ws.rs.WebApplicationException'
+286
+
+- So it seems to be related to the database, perhaps that there are less connections in the pool?
+
+- … and on that note, working without the JDBC driver and DSpace’s built-in connection pool since 2020-06-04 hasn’t actually solved anything, the issue with locks and idle in transaction connections is creeping up again!
+
+
+
+
+
+
+- It seems to have started today around 10:00 AM… I need to pore over the logs to see if there is a correlation
+
+- I think there is some kind of attack going on because I see a bunch of requests for sequential Handles from a similar IP range in a datacenter in Sweden where the user does not re-use their DSpace
session_id
+- Looking in the nginx logs I see most (all?) of these requests are using the following user agent:
+
+
+
+Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36
+
+- Looking at the nginx access logs I see that, other than something that seems like Google Feedburner, all hosts using this user agent are all in Sweden!
+
+# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.*.gz | grep 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36' | grep -v '/feed' | awk '{print $1}' | sort | uniq -c | sort -n
+ 1624 192.36.136.246
+ 1627 192.36.241.95
+ 1629 192.165.45.204
+ 1636 192.36.119.28
+ 1641 192.36.217.7
+ 1648 192.121.146.160
+ 1648 192.36.23.35
+ 1651 192.36.109.94
+ 1659 192.36.24.93
+ 1665 192.36.154.13
+ 1679 192.36.137.125
+ 1682 192.176.249.42
+ 1682 192.36.166.120
+ 1683 192.36.172.86
+ 1683 192.36.198.145
+ 1689 192.36.226.212
+ 1702 192.121.136.49
+ 1704 192.36.207.54
+ 1740 192.36.121.98
+ 1774 192.36.173.93
+
+- The earliest I see any of these hosts is 2020-06-05 (three days ago)
+- I will purge them from the Solr statistics and add them to abusive IPs ipset in the Ansible deployment scripts
+
+$ ./check-spider-ip-hits.sh -f /tmp/ips -s statistics -p
+Purging 1423 hits from 192.36.136.246 in statistics
+Purging 1387 hits from 192.36.241.95 in statistics
+Purging 1398 hits from 192.165.45.204 in statistics
+Purging 1413 hits from 192.36.119.28 in statistics
+Purging 1418 hits from 192.36.217.7 in statistics
+Purging 1418 hits from 192.121.146.160 in statistics
+Purging 1416 hits from 192.36.23.35 in statistics
+Purging 1449 hits from 192.36.109.94 in statistics
+Purging 1440 hits from 192.36.24.93 in statistics
+Purging 1465 hits from 192.36.154.13 in statistics
+Purging 1447 hits from 192.36.137.125 in statistics
+Purging 1453 hits from 192.176.249.42 in statistics
+Purging 1462 hits from 192.36.166.120 in statistics
+Purging 1499 hits from 192.36.172.86 in statistics
+Purging 1457 hits from 192.36.198.145 in statistics
+Purging 1467 hits from 192.36.226.212 in statistics
+Purging 1489 hits from 192.121.136.49 in statistics
+Purging 1478 hits from 192.36.207.54 in statistics
+Purging 1502 hits from 192.36.121.98 in statistics
+Purging 1544 hits from 192.36.173.93 in statistics
+
+Total number of bot hits purged: 29025
+
+- Skype with Enrico, Moayad, Jane, Peter, and Abenet to see the latest OpenRXV/AReS developments
+
+- One thing Enrico mentioned to me during the call was that they had issues with Altmetric’s user agents, and he said they are apparently using
Altmetribot
and Postgenomic V2
+- I looked in our logs and indeed we have those, so I will add them to the nginx rate limit bypass
+- I checked the Solr stats and it seems there are only a few thousand in 2016 and a few hundred in other years so I won’t bother adding it to the DSpace robot user agents list
+
+
+- Atmire sent an updated pull request for the Font Awesome 5 update for CUA (#445) so I filed feedback on their tracker
+
diff --git a/docs/2020/06/cgspace-discovery-search.png b/docs/2020/06/cgspace-discovery-search.png
new file mode 100644
index 000000000..0e2f0a7ad
Binary files /dev/null and b/docs/2020/06/cgspace-discovery-search.png differ
diff --git a/docs/2020/06/localhost-discovery-search.png b/docs/2020/06/localhost-discovery-search.png
new file mode 100644
index 000000000..2acfc2fda
Binary files /dev/null and b/docs/2020/06/localhost-discovery-search.png differ
diff --git a/docs/2020/06/postgres_connections_ALL-day2.png b/docs/2020/06/postgres_connections_ALL-day2.png
new file mode 100644
index 000000000..7ecf16653
Binary files /dev/null and b/docs/2020/06/postgres_connections_ALL-day2.png differ
diff --git a/docs/2020/06/postgres_connections_ALL-week2.png b/docs/2020/06/postgres_connections_ALL-week2.png
new file mode 100644
index 000000000..a88f81dba
Binary files /dev/null and b/docs/2020/06/postgres_connections_ALL-week2.png differ
diff --git a/docs/categories/index.html b/docs/categories/index.html
index d7ff5a7f7..b2c09b4b8 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 4bf4b52e3..0e04ebdc4 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index b8aa23895..cc7847a3a 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 94c5f1a03..864d84542 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 2d9e95171..1d66bf8f3 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/page/2/index.html b/docs/categories/page/2/index.html
index 8b87db25a..a57de89cb 100644
--- a/docs/categories/page/2/index.html
+++ b/docs/categories/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/page/3/index.html b/docs/categories/page/3/index.html
index da6f3e509..d32160d11 100644
--- a/docs/categories/page/3/index.html
+++ b/docs/categories/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/page/4/index.html b/docs/categories/page/4/index.html
index 2d985a90c..7f1a8a149 100644
--- a/docs/categories/page/4/index.html
+++ b/docs/categories/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/page/5/index.html b/docs/categories/page/5/index.html
index 451f5e818..3384f6a11 100644
--- a/docs/categories/page/5/index.html
+++ b/docs/categories/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/page/6/index.html b/docs/categories/page/6/index.html
index 3da669000..2660b9e1d 100644
--- a/docs/categories/page/6/index.html
+++ b/docs/categories/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 41ca72a5c..f78fea4f0 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 65eaca68f..1167f20ca 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index f4c47129c..447a1ff56 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 72ab934e0..a7c5eb4df 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 61faf0161..1bfd9d79f 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index f23a2166a..2d90d0b68 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index cee47a757..7d14dbe2f 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index d5a70d58e..555c25a81 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 9081e649e..41c396e12 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 1e976774f..56d1ea966 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 15615edaa..cd092b59d 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index cabe54fd0..0a522813c 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 0b8446c5b..b0fc8a834 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
https://alanorth.github.io/cgspace-notes/categories/
- 2020-06-04T14:43:40+03:00
+ 2020-06-06T13:33:23+03:00
https://alanorth.github.io/cgspace-notes/
- 2020-06-04T14:43:40+03:00
+ 2020-06-06T13:33:23+03:00
https://alanorth.github.io/cgspace-notes/2020-06/
- 2020-06-04T14:43:40+03:00
+ 2020-06-06T13:33:23+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2020-06-04T14:43:40+03:00
+ 2020-06-06T13:33:23+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2020-06-04T14:43:40+03:00
+ 2020-06-06T13:33:23+03:00
diff --git a/static/2020/06/cgspace-discovery-search.png b/static/2020/06/cgspace-discovery-search.png
new file mode 100644
index 000000000..0e2f0a7ad
Binary files /dev/null and b/static/2020/06/cgspace-discovery-search.png differ
diff --git a/static/2020/06/localhost-discovery-search.png b/static/2020/06/localhost-discovery-search.png
new file mode 100644
index 000000000..2acfc2fda
Binary files /dev/null and b/static/2020/06/localhost-discovery-search.png differ
diff --git a/static/2020/06/postgres_connections_ALL-day2.png b/static/2020/06/postgres_connections_ALL-day2.png
new file mode 100644
index 000000000..7ecf16653
Binary files /dev/null and b/static/2020/06/postgres_connections_ALL-day2.png differ
diff --git a/static/2020/06/postgres_connections_ALL-week2.png b/static/2020/06/postgres_connections_ALL-week2.png
new file mode 100644
index 000000000..a88f81dba
Binary files /dev/null and b/static/2020/06/postgres_connections_ALL-week2.png differ