Update notes for 2017-02-28

This commit is contained in:
Alan Orth 2017-02-28 22:58:29 +02:00
parent a3f0d88945
commit 56a24bf456
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
5 changed files with 59 additions and 9 deletions

@@ -310,4 +310,14 @@ dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where r
 COPY 1968
 ```
-- And then using awk or uniq to either remove or print the lines that have a duplicate `resource_id` (meaning they belong to the same item in DSpace and are therefore duplicates), and then using the `metadata_value_id` to delete them
+- And then use awk to print the duplicate lines to a separate file:
+```
+$ awk -F',' 'seen[$1]++' /tmp/ciat.csv > /tmp/ciat-dupes.csv
+```
+- From that file I can create a list of 279 deletes and put them in a batch script like:
+```
+delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
+```
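A note on the awk one-liner added above: `seen[$1]++` is a pattern with no action, and because the post-increment returns the old value, the expression is false (0) the first time a given first field (`resource_id`) appears and true on every repeat, so only the second and later lines for each item are printed; the first occurrence of each `resource_id` never reaches /tmp/ciat-dupes.csv. As a sketch of the next step, the dupes file could be expanded into the batch of deletes mechanically; the output path and the assumption that `metadata_value_id` is the second CSV column are mine, not part of the commit:

```
$ # sketch only: assumes column 2 of ciat-dupes.csv is metadata_value_id
$ awk -F',' '{printf "delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=%s;\n", $2}' /tmp/ciat-dupes.csv > /tmp/ciat-deletes.sql
```

The resulting file could then be run in one psql session, for example with `\i /tmp/ciat-deletes.sql`, which would match the "batch script" wording in the note.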

@@ -90,7 +90,7 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
 "headline": "February, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-02/",
-"wordCount": "2019",
+"wordCount": "2028",
 "datePublished": "2017-02-07T07:04:52-08:00",
@@ -522,9 +522,19 @@ COPY 1968
 </code></pre>
 <ul>
-<li>And then using awk or uniq to either remove or print the lines that have a duplicate <code>resource_id</code> (meaning they belong to the same item in DSpace and are therefore duplicates), and then using the <code>metadata_value_id</code> to delete them</li>
+<li>And then use awk to print the duplicate lines to a separate file:</li>
 </ul>
+<pre><code>$ awk -F',' 'seen[$1]++' /tmp/ciat.csv &gt; /tmp/ciat-dupes.csv
+</code></pre>
+<ul>
+<li>From that file I can create a list of 279 deletes and put them in a batch script like:</li>
+</ul>
+<pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
+</code></pre>

@@ -372,8 +372,18 @@ COPY 1968
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
-&lt;li&gt;And then using awk or uniq to either remove or print the lines that have a duplicate &lt;code&gt;resource_id&lt;/code&gt; (meaning they belong to the same item in DSpace and are therefore duplicates), and then using the &lt;code&gt;metadata_value_id&lt;/code&gt; to delete them&lt;/li&gt;
-&lt;/ul&gt;</description>
+&lt;li&gt;And then use awk to print the duplicate lines to a separate file:&lt;/li&gt;
+&lt;/ul&gt;
+&lt;pre&gt;&lt;code&gt;$ awk -F&#39;,&#39; &#39;seen[$1]++&#39; /tmp/ciat.csv &amp;gt; /tmp/ciat-dupes.csv
+&lt;/code&gt;&lt;/pre&gt;
+&lt;ul&gt;
+&lt;li&gt;From that file I can create a list of 279 deletes and put them in a batch script like:&lt;/li&gt;
+&lt;/ul&gt;
+&lt;pre&gt;&lt;code&gt;delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
+&lt;/code&gt;&lt;/pre&gt;</description>
 </item>
 <item>

@@ -372,8 +372,18 @@ COPY 1968
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
-&lt;li&gt;And then using awk or uniq to either remove or print the lines that have a duplicate &lt;code&gt;resource_id&lt;/code&gt; (meaning they belong to the same item in DSpace and are therefore duplicates), and then using the &lt;code&gt;metadata_value_id&lt;/code&gt; to delete them&lt;/li&gt;
-&lt;/ul&gt;</description>
+&lt;li&gt;And then use awk to print the duplicate lines to a separate file:&lt;/li&gt;
+&lt;/ul&gt;
+&lt;pre&gt;&lt;code&gt;$ awk -F&#39;,&#39; &#39;seen[$1]++&#39; /tmp/ciat.csv &amp;gt; /tmp/ciat-dupes.csv
+&lt;/code&gt;&lt;/pre&gt;
+&lt;ul&gt;
+&lt;li&gt;From that file I can create a list of 279 deletes and put them in a batch script like:&lt;/li&gt;
+&lt;/ul&gt;
+&lt;pre&gt;&lt;code&gt;delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
+&lt;/code&gt;&lt;/pre&gt;</description>
 </item>
 <item>

@@ -371,8 +371,18 @@ COPY 1968
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
-&lt;li&gt;And then using awk or uniq to either remove or print the lines that have a duplicate &lt;code&gt;resource_id&lt;/code&gt; (meaning they belong to the same item in DSpace and are therefore duplicates), and then using the &lt;code&gt;metadata_value_id&lt;/code&gt; to delete them&lt;/li&gt;
-&lt;/ul&gt;</description>
+&lt;li&gt;And then use awk to print the duplicate lines to a separate file:&lt;/li&gt;
+&lt;/ul&gt;
+&lt;pre&gt;&lt;code&gt;$ awk -F&#39;,&#39; &#39;seen[$1]++&#39; /tmp/ciat.csv &amp;gt; /tmp/ciat-dupes.csv
+&lt;/code&gt;&lt;/pre&gt;
+&lt;ul&gt;
+&lt;li&gt;From that file I can create a list of 279 deletes and put them in a batch script like:&lt;/li&gt;
+&lt;/ul&gt;
+&lt;pre&gt;&lt;code&gt;delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
+&lt;/code&gt;&lt;/pre&gt;</description>
 </item>
 <item>