Add notes for 2023-03-24

2023-03-24 13:19:13 +03:00
parent 534f0d9cf8
commit 11646971a9
31 changed files with 117 additions and 36 deletions

@@ -455,4 +455,41 @@ $ psql dspace < /tmp/reindex.sql
- After playing with WebP at Q82 and Q92, I see it has lower ssimulacra2 scores than JPEG Q92 for the dozen test files
- Could it just be something with ImageMagick?
## 2023-03-22
- I updated csv-metadata-quality to use pandas 2.0.0rc1 and everything seems to work...?
- So the issues with nulls (isna) when I tried the first release candidate a few weeks ago were resolved?
- Meeting with Jawoo and others about a "ChatGPT-like" thing for CGIAR data using CGSpace documents and metadata
## 2023-03-23
- Add a missing ORCID identifier for an IFPRI author to CGSpace and tag his items
- A super unscientific comparison between csv-metadata-quality's pytest regimen using Pandas 1.5.3 and Pandas 2.0.0rc1
- The data was gathered using [rusage](https://justine.lol/rusage), and these are the results of the last of three consecutive runs:
```
# Pandas 1.5.3
RL: took 1,585,999µs wall time
RL: ballooned to 272,380kb in size
RL: needed 2,093,947µs cpu (25% kernel)
RL: caused 55,856 page faults (100% memcpy)
RL: 699 context switches (1% consensual)
RL: performed 0 reads and 16 write i/o operations
# Pandas 2.0.0rc1
RL: took 1,625,718µs wall time
RL: ballooned to 262,116kb in size
RL: needed 2,148,425µs cpu (24% kernel)
RL: caused 63,934 page faults (100% memcpy)
RL: 461 context switches (2% consensual)
RL: performed 0 reads and 16 write i/o operations
```
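- So it seems that Pandas 2.0.0rc1 used about ten megabytes less RAM at peak (262,116kb vs 272,380kb), though wall time and CPU time were both slightly higher... interesting to see that the PyArrow-backed dtypes might make a measurable difference even on my small test set (assuming they are actually in use, since they are opt-in in Pandas 2.0)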
- I should try to compare runs of larger input files
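To make the comparison less error-prone when repeating it on larger input files, the deltas can be computed from the rusage output instead of being read off by eye. This is a minimal sketch using only the standard library. The names `parse_rusage` and `compare` are hypothetical helpers, and the regular expressions assume the report format shown in the two runs above.

```python
import re

# Patterns assume rusage lines like "RL: ballooned to 272,380kb in size".
# \S+ matches the unit suffix so the micro sign does not need to be typed.
FIELDS = {
    "wall_us": re.compile(r"took ([\d,]+)\S+ wall time"),
    "cpu_us": re.compile(r"needed ([\d,]+)\S+ cpu"),
    "peak_kb": re.compile(r"ballooned to ([\d,]+)kb"),
}


def parse_rusage(report: str) -> dict[str, int]:
    """Extract wall time, CPU time, and peak size from one rusage report."""
    result = {}
    for name, pattern in FIELDS.items():
        match = pattern.search(report)
        if match is None:
            raise ValueError(f"missing field: {name}")
        # Strip the thousands separators before converting
        result[name] = int(match.group(1).replace(",", ""))
    return result


def compare(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return after minus before for every field (negative means a reduction)."""
    return {name: after[name] - before[name] for name in before}


# Figures copied from the two runs recorded above
pandas_1_5_3 = """\
RL: took 1,585,999µs wall time
RL: ballooned to 272,380kb in size
RL: needed 2,093,947µs cpu (25% kernel)
"""
pandas_2_0_0rc1 = """\
RL: took 1,625,718µs wall time
RL: ballooned to 262,116kb in size
RL: needed 2,148,425µs cpu (24% kernel)
"""

delta = compare(parse_rusage(pandas_1_5_3), parse_rusage(pandas_2_0_0rc1))
print(delta)  # {'wall_us': 39719, 'cpu_us': 54478, 'peak_kb': -10264}
```

With only one run per version the timing differences are well within noise, so the peak size is the only figure worth reading much into here.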
## 2023-03-24
- I added a Flyway SQL migration for the PNG bitstream format registry changes on DSpace 7.6
<!-- vim: set sw=2 ts=2: -->