Opened 4 years ago

Closed 3 years ago

#248 closed defect (fixed)

Database dump failure

Reported by: tbrooks
Owned by: simko
Priority: major
Milestone:
Component: MiscUtil
Version:
Keywords: INSPIRE OpenAIRE
Cc:

Description

As I mentioned on the last EVO, I'm taking some dumps of the db on dev and beta. Tibor suggested that max_allowed_packet was the culprit for the unclean dumps of the db.

This is true for dev, I believe: I finally took a snapshot of the db there with no errors after raising max_allowed_packet in /etc/my.cnf to 500M. I am still loading that dump on my machine here to make sure I am right.

However, on beta, I set max_allowed_packet to 500M and then 1G (on pcudssw1502) and still have a problem:

2010-08-16 20:15:01 --> Task #6337 started.
2010-08-16 20:15:01 --> Reading parameters started
2010-08-16 20:15:01 --> Reading parameters ended
2010-08-16 20:15:01 --> Database dump started
2010-08-16 20:15:01 --> ... writing /opt/tbrooks/data-dumps//inspirehep-dbdump-2010-08-16_20:15:01.sql
2010-08-16 20:24:53 --> ERROR: mysqldump exit code is 768.
2010-08-16 20:24:54 --> Task #6337 finished. [ERROR]

I can't find anything obvious about 768 error codes from MySQL.
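One plausible reading (an assumption on my part, since the log does not show how mysqldump is invoked) is that 768 is not a MySQL error code at all but a raw POSIX wait status, as returned by Python's os.system(): the actual exit code sits in the high byte, so 768 >> 8 = 3. A minimal decoding sketch, with decode_wait_status being a hypothetical helper:

import os

def decode_wait_status(status):
    # Split a raw os.system()/os.wait() status into (exit_code, signal).
    # If the dbdump task logs the undecoded status, 768 would mean
    # mysqldump exited with code 3, not with some exotic error 768.
    if os.WIFSIGNALED(status):
        return (None, os.WTERMSIG(status))
    return (os.WEXITSTATUS(status), None)

print(decode_wait_status(768))  # -> (3, None)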

This is not the same error that dev was giving when its dumps failed, so it may be a different issue; note, though, that the dump file gets to about 15G and then ends with:

--
-- Dumping data for table `rnkCITATIONDATA`
--

LOCK TABLES `rnkCITATIONDATA` WRITE;
/*!40000 ALTER TABLE `rnkCITATIONDATA` DISABLE KEYS */;

Change History (16)

comment:1 Changed 4 years ago by simko

  • Status changed from new to infoneeded_new

You seem to have changed one value only; the value in the [mysqldump]
section (the one the mysqldump client actually reads) still stayed low:

PCUDSSW1502> grep max_a /etc/my.cnf
max_allowed_packet = 1G
max_allowed_packet = 160M

comment:2 Changed 4 years ago by tbrooks

  • Status changed from infoneeded_new to new

Thanks, good point, I had missed that. But after fixing it:

[tbrooks@pcudssw1502 ~]$ grep packet /etc/my.cnf
max_allowed_packet = 1G
max_allowed_packet = 1G

I still get:

2010-08-16 20:15:01 --> Task #6337 started.
2010-08-16 20:15:01 --> Reading parameters started
2010-08-16 20:15:01 --> Reading parameters ended
2010-08-16 20:15:01 --> Database dump started
2010-08-16 20:15:01 --> ... writing /opt/tbrooks/data-dumps//inspirehep-dbdump-2010-08-16_20:15:01.sql
2010-08-16 20:24:53 --> ERROR: mysqldump exit code is 768.
2010-08-16 20:24:54 --> Task #6337 finished. [ERROR]

comment:3 Changed 4 years ago by simko

  • Owner set to simko
  • Status changed from new to assigned

comment:4 Changed 4 years ago by simko

  • Status changed from assigned to in_work

comment:5 Changed 4 years ago by simko

  • Status changed from in_work to in_merge

Apparently fixed by setting max_allowed_packet on the client side
as well; a fresh DB dump is available at
/opt/cds-invenio/var/log/inspirehep-dbdump-2010-08-16_23:28:32.sql.
Closing the ticket; let's hold off on further analysis unless the problem
reappears.

comment:6 Changed 4 years ago by simko

  • Resolution set to fixed
  • Status changed from in_merge to closed

comment:7 Changed 4 years ago by tbrooks

For future reference, note that this is set on the client side: not in the arguments to mysqldump, but in my.cnf on the application server (1506, as opposed to 1502, the db server):

[tbrooks@pcudssw1506:tbrooks/data-dumps]$ more /etc/my.cnf

<snip>

[mysqldump]
max_allowed_packet = 1G
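
A related option, in case the client box's my.cnf cannot be touched: mysqldump also accepts this value on its command line, which makes the dump independent of whichever application server runs it. A minimal sketch of such an invocation from Python (host, credentials, database name and output path are placeholders, not the actual Invenio dbdump code):

import subprocess

cmd = [
    "mysqldump",
    "--host=pcudssw1502",         # db server (placeholder)
    "--user=dumpuser",            # placeholder credentials
    "--max_allowed_packet=1G",    # overrides the [mysqldump] my.cnf value
    "inspirehep",                 # database name (placeholder)
]
with open("/opt/tbrooks/data-dumps/inspirehep-dbdump.sql", "w") as out:
    # subprocess.call returns the decoded exit code directly,
    # unlike the raw wait status from os.system().
    status = subprocess.call(cmd, stdout=out)
print("mysqldump exit code: %d" % status)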

comment:8 Changed 4 years ago by simko

Yes, that's exactly what I did.

comment:9 Changed 3 years ago by skaplun

  • Keywords OpenAIRE added
  • Resolution fixed deleted
  • Status changed from closed to new

Mmh. I have the same issue in OpenAIRE. I tried to fix it by setting 1G in every instance of max_allowed_packet, without success. Moreover, it is the first time that this error happens (I think), and OpenAIRE has only 3 records :-) So I think it's an error unrelated to max_allowed_packet.

Could it be due to AFS?

P.S. I found this ticket by googling around for MySQL exit code 768... We seem to be almost the only ones on the web hitting this issue :-)

comment:10 Changed 3 years ago by simko

  • Status changed from new to in_merge

After further adjustments of MySQL parameters, the dumper has been
performing well for the past few weeks, so I'm closing this ticket.

comment:11 Changed 3 years ago by simko

  • Resolution set to fixed
  • Status changed from in_merge to closed

comment:12 Changed 3 years ago by jlavik

  • Resolution fixed deleted
  • Status changed from closed to new

This issue seems to be back on Inspire, as last night's dbdump failed in this exact way. Does anyone have any further information about it?

comment:13 follow-up: ↓ 14 Changed 3 years ago by lmarian

dbdump failed on CDS as well last night, but with a different exit code: 1280. I don't know what that one means either, and did not find any useful hints googling it.
On the other hand, two nights ago dbdump failed on CDS with exit code 768 (as the original ticket reports). I modified my.cnf (as suggested above), but I am not convinced this was causing the problem... it might be AFS playing tricks on us :(
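
If the wait-status reading suggested earlier in this ticket is right, both numbers decode to small, ordinary mysqldump exit codes rather than anything exotic:

>>> 768 >> 8, 1280 >> 8
(3, 5)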

comment:14 in reply to: ↑ 13 Changed 3 years ago by simko

  • Status changed from new to assigned

Replying to lmarian:

I modified my.cnf (as suggested above), but I am not convinced this was causing the problem... it might be AFS playing tricks on us :(

I had increased max_allowed_packet on INSPIRE boxes because of the big citation dictionaries, where ~1GB of blob data is stored in one row. This definitely helped to remove the dumping blocker at the time, although the dump error re-appeared from time to time afterwards.

On CDS, the dumps were working fine in the past even with a smaller max_allowed_packet value, because there are no big blobs such as the citation dictionaries. So it should not be necessary to tweak this particular parameter. Still, some of the tables are huge, so we may need to tweak some other buffer parameters; see the sketch below.
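
To tell big single rows (a max_allowed_packet concern) apart from merely big tables (a buffer concern), one can ask information_schema directly. A hedged sketch using the MySQLdb module; connection parameters and the schema name are placeholders:

import MySQLdb

conn = MySQLdb.connect(host="pcudssw1502", user="reader",
                       passwd="secret", db="cdsweb")  # placeholders
cur = conn.cursor()
# Largest tables first; AVG_ROW_LENGTH hints at whether huge rows
# or huge tables are the problem.
cur.execute("""SELECT TABLE_NAME, DATA_LENGTH, AVG_ROW_LENGTH
               FROM information_schema.TABLES
               WHERE TABLE_SCHEMA = %s
               ORDER BY DATA_LENGTH DESC LIMIT 10""", ("cdsweb",))
for name, data_len, avg_row in cur.fetchall():
    print("%s: %d bytes total, %d bytes/row on average"
          % (name, data_len, avg_row))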

I'll try to look at this problem more closely now that it recurs more frequently again. FWIW, dumping to the local file system also caused trouble on INSPIRE when I was testing this a year ago, although that may have been before the max_allowed_packet and buffer tweaks. I'll have a look.

comment:15 Changed 3 years ago by simko

  • Status changed from assigned to in_merge

So I've had a look, and both the CDS and INSPIRE dumping problems were due to a disk space issue. The dump files are bigger than 33G now, so there was no longer enough room on the dump partition to (temporarily) hold three dumps. I cleaned up the disk space last night and the dump succeeded on both CDS and INSPIRE; see for example RT ticket 143960.

To be safer in the future, I have increased the AFS quota for the dbdump volume; see RT ticket 144096. Moreover, dbdump will soon check for available space before it runs, so that it won't stop the queue unnecessarily; see ticket:578 and the sketch below.
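
The preflight check planned for ticket:578 could be as simple as comparing the free space on the dump partition against the size of the previous dump. A rough sketch (the threshold logic is my assumption, not the actual ticket:578 implementation):

import os

def enough_space_for_dump(dump_dir, expected_bytes):
    # Compare free bytes on dump_dir's filesystem to the expected
    # dump size before launching mysqldump at all.
    st = os.statvfs(dump_dir)
    return st.f_bavail * st.f_frsize >= expected_bytes

# e.g. require room for one more ~33G dump before starting:
if not enough_space_for_dump("/opt/cds-invenio/var/log", 33 * 1024 ** 3):
    raise SystemExit("not enough disk space for dbdump; skipping run")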

I'm therefore closing this concrete ticket here again.

comment:16 Changed 3 years ago by simko

  • Resolution set to fixed
  • Status changed from in_merge to closed