Opened 2 years ago

Closed 17 months ago

Last modified 17 months ago

#941 closed defect (fixed)

Truncated history revision for revisions

Reported by: adeiana Owned by: adeiana
Priority: major Milestone:
Component: *installation* Version:
Keywords: Cc: alessio.deiana@…

Description

We have a problem over here in inspire where some revisions for our records are truncated because they are longer than 65535 which is the max length for BLOBs in MySQL.
I would suggest to converting the blob to mediumblob.

r = run_sql('SELECT marcxml FROM hstRECORD WHERE id_bibrec=%s AND job_date=%s', ('1083313', '20120106133541'))

len(r[0][0])

65535

zlib.decompress(r[0][0])

Traceback (most recent call last):

File "<stdin>", line 1, in ?

zlib.error: Error -5 while decompressing data

Attachments (3)

clean_history_tables.py (562 bytes) - added by adeiana 2 years ago.
Script for checking old history rows
0001-Installation-updates-hstRECORD-table-with-bigger-mar.patch (864 bytes) - added by adeiana 2 years ago.
installation update patch
0001-BibEdit-updates-hstRECORD-table-with-bigger-marcxml-.patch (4.5 KB) - added by adeiana 2 years ago.
more complete patch

Download all attachments as: .zip

Change History (22)

comment:1 Changed 2 years ago by adeiana

  • Cc alessio.deiana@… added

comment:2 Changed 2 years ago by simko

  • Component changed from *general* to *installation*
  • Status changed from new to infoneeded_new

I fully agree, of course. Will you prepare a patch against master?

We may want to check other BLOB occurrences as well, e.g. firerole or job details:

$ git grep -Hni blob modules/miscutil/sql/ | grep -Ev '(mediumblob|longblob)'
modules/miscutil/sql/tabcreate.sql:2950:  password blob NOT NULL,
modules/miscutil/sql/tabcreate.sql:2952:  settings blob default NULL,
modules/miscutil/sql/tabcreate.sql:2989:  firerole_def_ser blob NULL,
modules/miscutil/sql/tabcreate.sql:3004:  data blob NOT NULL,
modules/miscutil/sql/tabcreate.sql:3149:  reply_order_cached_data blob NULL default NULL,
modules/miscutil/sql/tabcreate.sql:3199:  reply_order_cached_data blob NULL default NULL,
modules/miscutil/sql/tabcreate.sql:3636:  marcxml blob NOT NULL,
modules/miscutil/sql/tabcreate.sql:3641:  job_details blob NOT NULL,
modules/miscutil/sql/tabcreate.sql:3662:  job_details blob NULL default NULL,

comment:3 Changed 2 years ago by adeiana

  • Status changed from infoneeded_new to new

I will prepare a patch

comment:4 Changed 2 years ago by adeiana

  • Summary changed from Trucated history revision for revisions to Truncated history revision for revisions

Changed 2 years ago by adeiana

Script for checking old history rows

Changed 2 years ago by adeiana

installation update patch

comment:5 Changed 2 years ago by adeiana

  • Status changed from new to in_merge

I was originally planning to have a check when creating the revision.
However I do not think we need it anymore. I used longblob which is the
same column format used by bibfmt and as a result we will never have
a record longer than that except if it something really weird is going on.

comment:6 Changed 2 years ago by simko

  • Status changed from in_merge to assigned

Thanks. Can you please do the following:

  • update Makefile.am's target update-db-from-v1.0-to-v1.1 with appropriate ALTER TABLE statement;
  • turn your clean_history_tables script into a new CLI option for the bibedit tool, such as bibedit --check-revisions [recid], that would detect bad revisions for given record (or for all records if given asterisk, say) and that would not only list troublesome revisions, but perhaps also interactively ask whether to delete any found troublesome revisions. (BTW, when checking, beware that there may be bibupload processes running.) See bibedit --help for other existing options WRT record revisions; you can emulate what these do and edit bibeditcli.py in this respect.

Changed 2 years ago by adeiana

more complete patch

comment:7 Changed 2 years ago by adeiana

  • Status changed from assigned to in_merge

Did those changes.

comment:8 Changed 23 months ago by adeiana

  • Status changed from in_merge to assigned

comment:9 Changed 23 months ago by adeiana

  • Owner set to adeiana

comment:10 Changed 23 months ago by adeiana

  • Status changed from assigned to in_merge

comment:11 Changed 17 months ago by adeiana

The new version is here adeiana/941-hst

comment:12 Changed 17 months ago by skaplun

Hi Alessio,

it would be cool if:

  • you indeed give the possibility for users to interactively (/globally) delete corrupted histories (e.g. with e.g. a --delete-corrupted-history additional flag) as Tibor was mentioning in #comment:8)
  • print errors as in
    print >> sys.stderr, "ERROR: foo"
    

to make it clear that it is an error

  • (minor) you can wrap the results of run_sql in intbitset as in:
    for i, recid in enumerate(intbitset(run_sql("SELECT id FROM bibrec")))
    

to keep memory compact (instead of having a list of one million integers.

  • You can actually quick guess all the potential corrupted blobs: just look for all the blob whose length happen to exactly the length of the max size of the blob column:

i.e. 216 bytes if I well understand the reference at: http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html
In this way you could in principle speed up the overall check process and plug-it to the end of post_upgrade() hook :-)

comment:13 Changed 17 months ago by adeiana

I agree with the changes I updated the patch in consequence.

I would only sys.stderr for a script error, finding corrupted revisions is the normal behavior of the script so should be stdout to me.
So that change I left out.

You got it right I went for the speedy procedure in the invenio upgrade. I kept the only working way for bibeditcli because it is generic, it'll check for uncompression errors which can be due to other corruptions and not just len.

comment:14 Changed 17 months ago by Alessio Deiana <alessio.deiana@…>

  • Resolution set to fixed
  • Status changed from in_merge to closed

In c6f2a15d7e48e18d99edf966dc7ed91a587a271a:

installation: bigger hstRECORD.marcxml size

  • Updates table structure of the history table from blob to longblob. (fixes #941)
  • Adds a new cli option to bibedit --check-revisions to check for invalid revisions.

Co-authored-by: Tibor Simko <tibor.simko@…>

comment:15 Changed 17 months ago by Alessio Deiana <alessio.deiana@…>

In c6f2a15d7e48e18d99edf966dc7ed91a587a271a:

installation: bigger hstRECORD.marcxml size

  • Updates table structure of the history table from blob to longblob. (fixes #941)
  • Adds a new cli option to bibedit --check-revisions to check for invalid revisions.

Co-authored-by: Tibor Simko <tibor.simko@…>

comment:16 Changed 17 months ago by Alessio Deiana <alessio.deiana@…>

In c6f2a15d7e48e18d99edf966dc7ed91a587a271a:

installation: bigger hstRECORD.marcxml size

  • Updates table structure of the history table from blob to longblob. (fixes #941)
  • Adds a new cli option to bibedit --check-revisions to check for invalid revisions.

Co-authored-by: Tibor Simko <tibor.simko@…>

comment:17 Changed 17 months ago by Alessio Deiana <alessio.deiana@…>

In c6f2a15d7e48e18d99edf966dc7ed91a587a271a:

installation: bigger hstRECORD.marcxml size

  • Updates table structure of the history table from blob to longblob. (fixes #941)
  • Adds a new cli option to bibedit --check-revisions to check for invalid revisions.

Co-authored-by: Tibor Simko <tibor.simko@…>

comment:18 Changed 17 months ago by Alessio Deiana <alessio.deiana@…>

In c6f2a15d7e48e18d99edf966dc7ed91a587a271a:

installation: bigger hstRECORD.marcxml size

  • Updates table structure of the history table from blob to longblob. (fixes #941)
  • Adds a new cli option to bibedit --check-revisions to check for invalid revisions.

Co-authored-by: Tibor Simko <tibor.simko@…>

comment:19 Changed 17 months ago by Alessio Deiana <alessio.deiana@…>

In c6f2a15d7e48e18d99edf966dc7ed91a587a271a:

installation: bigger hstRECORD.marcxml size

  • Updates table structure of the history table from blob to longblob. (fixes #941)
  • Adds a new cli option to bibedit --check-revisions to check for invalid revisions.

Co-authored-by: Tibor Simko <tibor.simko@…>

Note: See TracTickets for help on using tickets.