curl: (22) The requested URL returned error: 401 Unauthorized
```
- The DSpace log shows the item ID (because I modified the error text):
```
2019-05-01 11:41:11,069 ERROR org.dspace.rest.ItemsResource @ User(anonymous) has not permission to read item(id=77708)!
```
- If I delete that one I get another, so the list of problematic item IDs so far is:
- 74648
- 77708
- 85079
- Some are in the `workspaceitem` table (pre-submission), others are in the `workflowitem` table (submitted), and others were actually approved but then withdrawn...
- This is actually a worthless exercise, because the real issue is that the `/items/find-by-metadata-value` endpoint is poorly designed: it shouldn't fail fatally when the search returns items the user doesn't have permission to access
- It would take far too long to fix the broken items that are in limbo by deleting them in SQL, and it wouldn't actually fix the problem anyway, because some items are *submitted* but *withdrawn*, so they have handles and everything
- I think the solution is to recommend that people not use the `/items/find-by-metadata-value` endpoint
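To illustrate the behavior I'd expect from such an endpoint, here is a minimal Python sketch (not DSpace's actual Java code; `Item`, `readable_by`, `find_by_metadata_value`, and item 12345 are all hypothetical) that silently drops matches the user can't read instead of failing the whole request with a 401:

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical stand-in for a DSpace item, not the real DSpace model
    item_id: int
    title: str
    readable_by: set = field(default_factory=set)


def find_by_metadata_value(items, title, user):
    """Return matching items the user may read, skipping the rest.

    Instead of aborting the whole request when one match is not
    readable, unreadable matches are simply left out of the result.
    """
    matches = [item for item in items if item.title == title]
    return [item for item in matches if user in item.readable_by]


items = [
    Item(74648, "Maize report", readable_by={"admin"}),
    Item(77708, "Maize report", readable_by={"admin"}),
    Item(12345, "Maize report", readable_by={"admin", "anonymous"}),
]
visible = find_by_metadata_value(items, "Maize report", "anonymous")
print([item.item_id for item in visible])  # [12345]
```

An anonymous user then gets a partial (but valid) result, while an admin still sees all three items.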
- CIP is asking about embedding PDF thumbnail images in their RSS feeds again
- They asked in 2018-09 as well and I told them it wasn't possible
- To make sure, I looked at [the documentation for RSS media feeds](https://wiki.duraspace.org/display/DSPACE/Enable+Media+RSS+Feeds) and tried it, but couldn't get it to work
- It seems to be geared towards iTunes and podcasts... I dunno
- Run all system updates on DSpace Test (linode19) and reboot it
- Merge changes into the `5_x-prod` branch of CGSpace:
- Updates to remove deprecated social media websites (Google+ and Delicious), update Twitter share intent, and add item title to Twitter and email links ([#421](https://github.com/ilri/DSpace/pull/421))
- Add new CCAFS Phase II project tags ([#420](https://github.com/ilri/DSpace/pull/420))
- Add item ID to REST API error logging ([#422](https://github.com/ilri/DSpace/pull/422))
- Re-deploy CGSpace from `5_x-prod` branch
- Run all system updates on CGSpace (linode18) and reboot it
- Strangely enough, I *do* see the statistics-2018, statistics-2017, etc. cores in the Admin UI...
- I restarted Tomcat a few times (and even deleted all the Solr write locks) and at least five times there were issues loading one statistics core, causing the Atmire stats to be incomplete
- Also, I tried to increase the `writeLockTimeout` in `solrconfig.xml` from the default of 1000ms to 10000ms
- Eventually the Atmire stats started working, despite errors about "Error opening new searcher" in the Solr Admin UI
- I wrote to the dspace-tech mailing list again on the thread from March 2019
- The number of unique IP addresses from 2 to 6 AM this morning is already several times higher than the average for that time of the morning this past week.
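Counting unique IPs in a time window like that can be sketched in Python as below, assuming nginx's default "combined" log format and a log that covers a single day (only the hour is compared, not the date). The function name and the sample lines (documentation IPs) are hypothetical:

```python
import re
from datetime import datetime

# Matches the client IP and the [dd/Mon/yyyy:HH:MM:SS +zone] timestamp
# at the start of an nginx "combined" log line
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def unique_ips_between(lines, start_hour=2, end_hour=6):
    """Count distinct client IPs with requests in [start_hour, end_hour)."""
    ips = set()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip malformed lines
        ip, timestamp = match.groups()
        when = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S %z")
        # Hour as written in the log (server local time); date is ignored
        if start_hour <= when.hour < end_hour:
            ips.add(ip)
    return len(ips)


sample = [
    '192.0.2.1 - - [06/May/2019:02:15:01 +0200] "GET /handle/10568/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [06/May/2019:03:20:00 +0200] "GET /discover HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [06/May/2019:05:59:59 +0200] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [06/May/2019:06:00:00 +0200] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(unique_ips_between(sample))  # 2
```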
- I'm not exactly sure what happened this morning, but it looks like legitimate user traffic; perhaps someone launched a new publication and it got a bunch of hits?
- Looking again, I see 84,000 requests to `/handle` this morning (not counting the logs for library.cgiar.org, because those requests get an HTTP 301 redirect to CGSpace and therefore already appear here in `access.log`).
- But it would be difficult to find a pattern in those requests, because they cover 78,000 *unique* Handles (i.e. direct browsing of items, collections, or communities) and only 2,492 discover/browse requests (total, not unique).
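A breakdown like this (total `/handle` requests, unique Handles, and discover/browse requests) can be sketched in Python as below. It assumes the request line sits in the first quoted field of each log line, and it counts discover/browse pages only when they are under a Handle (e.g. `/handle/10568/2/discover`); the function name and sample lines are hypothetical:

```python
import re

# Request line inside the first quoted field, e.g. "GET /handle/10568/1 HTTP/1.1"
REQUEST_PATTERN = re.compile(r'"[A-Z]+ (\S+) HTTP/[^"]*"')
HANDLE_PATTERN = re.compile(r'^/handle/(\d+/\d+)')


def summarize_handle_requests(lines):
    """Return (handle requests, unique handles, discover/browse requests)."""
    handle_requests = 0
    unique_handles = set()
    discover_browse = 0
    for line in lines:
        match = REQUEST_PATTERN.search(line)
        if not match:
            continue
        path = match.group(1)
        handle = HANDLE_PATTERN.match(path)
        if not handle:
            continue  # e.g. /bitstream/handle/... or /discover
        handle_requests += 1
        unique_handles.add(handle.group(1))
        if "/discover" in path or "/browse" in path:
            discover_browse += 1
    return handle_requests, len(unique_handles), discover_browse


sample = [
    '192.0.2.1 - - [06/May/2019:02:15:01 +0200] "GET /handle/10568/1 HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.2 - - [06/May/2019:02:16:01 +0200] "GET /handle/10568/1 HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.3 - - [06/May/2019:02:17:01 +0200] "GET /handle/10568/2/discover?query=maize HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.4 - - [06/May/2019:02:18:01 +0200] "GET /bitstream/handle/10568/3/file.pdf HTTP/1.1" 200 512 "-" "-"',
]
print(summarize_handle_requests(sample))  # (3, 2, 1)
```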
- I finally had time to analyze the 7,000 IPs from the major traffic spike on 2019-05-06 after several runs of my `resolve-addresses.py` script (ipapi.co has a limit of 1,000 requests per day)
- Resolving the unique IP addresses to organization and AS names reveals some pretty big abusers:
- 1213 from Region40 LLC (AS200557)
- 697 from Trusov Ilya Igorevych (AS50896)
- 687 from UGB Hosting OU (AS206485)
- 620 from UAB Rakrejus (AS62282)
- 491 from Dedipath (AS35913)
- 476 from Global Layer B.V. (AS49453)
- 333 from QuadraNet Enterprises LLC (AS8100)
- 278 from GigeNET (AS32181)
- 261 from Psychz Networks (AS40676)
- 196 from Cogent Communications (AS174)
- 125 from Blockchain Network Solutions Ltd (AS43444)
- 118 from Silverstar Invest Limited (AS35624)
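Tallying resolved addresses per network like the list above can be sketched with `collections.Counter`. I'm assuming a CSV with hypothetical `ip`, `org`, and `asn` columns (adjust them to whatever `resolve-addresses.py` actually writes); the sample rows use documentation IPs and ASNs, not real data:

```python
import csv
import io
from collections import Counter


def top_networks(csv_text, limit=10):
    """Count IPs per (organization, ASN) from a CSV of resolved addresses.

    Assumes hypothetical columns "ip", "org", and "asn".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row["org"], row["asn"]) for row in reader)
    return counts.most_common(limit)


resolved = """ip,org,asn
192.0.2.1,Example Net A,AS64500
192.0.2.2,Example Net A,AS64500
198.51.100.7,Example Net B,AS64501
"""
for (org, asn), count in top_networks(resolved):
    print(f"{count} from {org} ({asn})")
# 2 from Example Net A (AS64500)
# 1 from Example Net B (AS64501)
```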
- All of the IPs from these networks use generic user agents like this one (and MANY more), and they change them frequently:
```
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2703.0 Safari/537.36"
```
- I found a [blog post from 2018 detailing an attack from a DDoS service](https://www.qurium.org/alerts/azerbaijan/azerbaijan-and-the-region40-ddos-service/) that matches our pattern exactly
- They specifically mention:
<pre>The attack that targeted the “Search” functionality of the website, aimed to bypass our mitigation by performing slow but simultaneous searches from 5500 IP addresses.</pre>
- So this was definitely an attack of some sort... only God knows why
- I noticed a few new bots that don't use the word "bot" in their user agent and therefore don't match Tomcat's Crawler Session Manager Valve.
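To show why such bots slip through, here is a Python sketch of the valve's check, assuming the default `crawlerUserAgents` regex that Tomcat uses when the attribute isn't set (our `server.xml` may configure a different one). The second user agent string is made up:

```python
import re

# Default crawlerUserAgents regex of Tomcat's CrawlerSessionManagerValve
# (used when the attribute is not set in server.xml)
DEFAULT_CRAWLER_UA = re.compile(r".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*")


def is_crawler(user_agent):
    # Tomcat uses Java's Matcher.matches(), which must match the whole
    # string, so fullmatch() is the closest Python equivalent
    return DEFAULT_CRAWLER_UA.fullmatch(user_agent) is not None


print(is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(is_crawler("ExampleCrawler/1.0 (+https://example.com)"))  # False
```

Any crawler that identifies itself as "crawler", "spider", etc. without "bot" gets a new Tomcat session per request unless the regex is extended.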
- Tezira says she's having issues with email reports for approved submissions, but I received an email about collection subscriptions this morning, and I tested with `dspace test-email` and it's also working...
- Send a list of DSpace build tips to Panagis from AgroKnow
- Export a list of all investors (`dc.description.sponsorship`) for Peter to look through and correct:
```
dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 29 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-05-16-investors.csv WITH CSV HEADER;
COPY 995
```
- Fork the [ICARDA AReS v1 repository](https://github.com/icarda-git/AReS) to [ILRI's GitHub](https://github.com/ilri/AReS) and give access to CodeObia guys
- I was going to make a new controlled vocabulary of the top 100 terms after these corrections, but I noticed a bunch of duplicates and variations when I sorted them alphabetically
- Instead, I exported a new list and asked Peter to look at it again
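Sorting alphabetically only puts some variants next to each other. A minimal sketch of another approach is to group values by a normalized key (case, accents, punctuation, "&" vs "and", and whitespace folded away). The function names and sample sponsor strings here are illustrative, not taken from the actual export; in practice the values would come from the `text_value` column of the CSV:

```python
import re
import unicodedata
from collections import defaultdict


def normalize(value):
    """Fold case, accents, punctuation, '&', and extra whitespace."""
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    value = value.casefold().replace("&", " and ")
    value = re.sub(r"[^\w\s]", " ", value)
    return " ".join(value.split())


def find_variants(values):
    """Group values that only differ in case, accents, or punctuation."""
    groups = defaultdict(list)
    for value in values:
        groups[normalize(value)].append(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


sponsors = [
    "Bill & Melinda Gates Foundation",
    "Bill and Melinda Gates Foundation",
    "BILL & MELINDA GATES FOUNDATION",
    "CGIAR Research Program on Livestock",
]
for group in find_variants(sponsors):
    print(group)
```

This only catches mechanical variants; abbreviations and misspellings still need a human (i.e. Peter) to review.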
- Add "ISI journal" to item view sidebar at the request of Maria Garruccio
- Update `fix-metadata-values.py` and `delete-metadata-values.py` scripts to add some basic checking of CSV fields and colorize shell output using Colorama
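The CSV field checking can be sketched with `csv.DictReader` as below. This is not the scripts' actual code: `missing_fields` and the column names are hypothetical, and the Colorama output coloring is left out:

```python
import csv
import io


def missing_fields(csv_file, required):
    """Return the required column names absent from the CSV header row."""
    reader = csv.DictReader(csv_file)
    # fieldnames reads the header row lazily; it is None for an empty file
    present = reader.fieldnames or []
    return [name for name in required if name not in present]


sample = io.StringIO('dc.contributor.author,correct\nSmith J,"Smith, J."\n')
print(missing_fields(sample, ["dc.contributor.author", "correct"]))  # []
print(missing_fields(io.StringIO(""), ["correct"]))  # ['correct']
```

A script can then exit early with a clear message instead of failing later with a `KeyError`.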
- Convert some dates from Excel serial numbers to strings
- Trim whitespace on all fields
- Correct and standardize affiliations
- Validate subject terms against AGROVOC
- Add rights information to all items
- Correct and standardize sponsors
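For the date conversion, a minimal Python sketch (the function name is hypothetical), assuming the spreadsheet uses Excel's default 1900 date system (not the 1904 Mac system) and stores whole days:

```python
from datetime import date, timedelta

# Excel's 1900 date system treats 1900 as a leap year, so for serial
# numbers from 61 (1900-03-01) onward, counting days from 1899-12-30
# gives the correct calendar date
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_iso(serial):
    """Convert an Excel serial day number (e.g. 43466) to 'YYYY-MM-DD'."""
    return (EXCEL_EPOCH + timedelta(days=int(serial))).isoformat()


print(excel_serial_to_iso(43466))  # 2019-01-01
```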
- Generate Simple Archive Format bundle with SAFBuilder and import into the [AfricaRice Articles in Journals](https://cgspace.cgiar.org/handle/10568/101106) collection on CGSpace:
```
$ dspace import -a -e me@cgiar.org -m 2019-05-25-AfricaRice.map -s /tmp/SimpleArchiveFormat
```
- Export new list of all authors from CGSpace database to send to Peter:
```
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-05-27-all-authors.csv with csv header;
COPY 64871
```
- Run all system updates on DSpace Test (linode19) and reboot it
- A CIMMYT user was having problems registering or logging into CGSpace
- I tried to register her and it gave an error, then I remembered that CGIAR LDAP users just need to log in, and DSpace will automatically create an eperson for them
- I told her to try to log in with the LDAP login method and let me know what happens (then I can look in the logs too)
- I see the following error in the DSpace log when the user tries to log in with her CGIAR email and password on the LDAP login:
```
2019-05-30 07:19:35,166 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=A5E0C836AF8F3ABB769FE47107AE1CFF:ip_addr=185.71.4.34:failed_login:no DN found for user sa.saini@cgiar.org
```
- For now I just created an eperson with her personal email address until I have time to check LDAP to see what's up with her CGIAR account:
```
$ dspace user -a -m blah@blah.com -g Sakshi -s Saini -p 'sknflksnfksnfdls'