Potential Variable Length Data Corruption Prevented During Automatic Recovery
5 October 2009
Affected Builds: All builds prior to 091005
Criteria: Variable Length Files under Transaction Control that have Undergone Automatic Recovery After a Failed Server Process
Indications: Error 37 during recovery and/or evidence of data or index corruption
On systems where the c-treeACE Server had previously gone through automatic recovery, data and index files were later found to contain unexpected 10-byte headers beginning with 0xfbbf marks. The headers appeared at unexpected locations, sometimes overwriting data records or index nodes, and sometimes at the physical end of the file, preceded by a region of 0x00 bytes.
A second observed symptom was automatic recovery failing with error 37 due to an invalid file descriptor, as indicated by the following entries in CTSTATUS.FCS:
- User# 00002 trandat: scanning log 815
Wed Sep 23 15:44:25 2009
- User# 00002 WRITE_ERR: ???? at 0:1457a4ex sysiocod=6 bufsiz=10 bytes written=0[0] ioLoc=0: 37
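To check a status log for this symptom, a small script can look for WRITE_ERR lines that end in error 37. The following is a minimal sketch, not a FairCom utility: the pattern is modelled only on the single entry quoted above, the field names are taken from that entry, and other builds may format the line differently.

```python
import re

# Pattern modelled only on the WRITE_ERR entry quoted above (an assumption);
# other server builds may lay the line out differently.
WRITE_ERR = re.compile(
    r"WRITE_ERR:.*?sysiocod=(?P<sysiocod>-?\d+)"
    r"\s+bufsiz=(?P<bufsiz>\d+)"
    r".*?ioLoc=(?P<ioloc>\d+):\s*(?P<error>\d+)\s*$"
)


def find_write_errors(lines, error_code=37):
    """Return (line_number, fields) for each WRITE_ERR line ending in error_code."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = WRITE_ERR.search(line)
        if match and int(match.group("error")) == error_code:
            fields = {name: int(value) for name, value in match.groupdict().items()}
            hits.append((number, fields))
    return hits


# The three entries quoted above, used as sample input.
sample = [
    "- User# 00002 trandat: scanning log 815",
    "Wed Sep 23 15:44:25 2009",
    "- User# 00002 WRITE_ERR: ???? at 0:1457a4ex sysiocod=6 bufsiz=10 "
    "bytes written=0[0] ioLoc=0: 37",
]
hits = find_write_errors(sample)
```

In practice the lines would come from reading CTSTATUS.FCS, for example with `open(path, errors="replace")`. A hit only indicates that the symptom may be present; it does not by itself confirm corruption.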
Automatic recovery invalidates the space management index for variable-length data files, and it queues entries for the space reclamation thread to process. These entries trigger the space reclamation thread to reconstruct the space management index by physically scanning the variable-length data files for deleted records and adding keys to the space management index for each deleted region found in the file.
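The rebuild described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the server's implementation: the `Record` layout, the `(length, offset)` key shape, and both function names are hypothetical, since the on-disk format and index key layout are not given here.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical in-memory stand-in for one variable-length record."""
    offset: int
    length: int
    deleted: bool


def rebuild_space_index(records):
    """Scan every record and return sorted (length, offset) keys for deleted regions."""
    return sorted((r.length, r.offset) for r in records if r.deleted)


def find_free_region(index, needed):
    """Return the offset of the smallest free region holding `needed` bytes, or None."""
    for length, offset in index:
        if length >= needed:
            return offset
    return None


# A toy data file: two live records and two deleted regions.
data_file = [
    Record(offset=0, length=64, deleted=False),
    Record(offset=64, length=128, deleted=True),
    Record(offset=192, length=32, deleted=True),
    Record(offset=224, length=256, deleted=False),
]
index = rebuild_space_index(data_file)
```

Keying on length first is one plausible choice because it lets a later insert find the smallest deleted region that fits; the actual key order used by the server is not stated in this notice.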
Changes to the space management index can place entries, each carrying an associated file number, into the transaction log. One such entry was written to the server's preImage space but not immediately written to the transaction logs. A later transaction could then commit this entry with a transaction file number no longer associated with the original file, because the original file could have been physically closed and its file number reassigned to another physical file.
Should the server then fail abnormally and automatic recovery take place, the entry is processed from the transaction log with a transaction file number associated with the wrong physical file. An attempt is then made to write this entry to the incorrect file, causing either data to be overwritten or a failed write due to an invalid file descriptor.
To prevent this behavior, these transaction log entries are now written immediately and directly to the physical transaction log, ensuring a correctly associated transaction file number upon recovery.
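The sequence described above, and the effect of writing the entry immediately, can be modelled with a toy log replay. This is a simplified sketch, not c-treeACE code: `ToyServer`, `recover`, `scenario`, the file names, and the assumption that the log records open and close events in order are all hypothetical and exist only to show why a deferred entry can resolve to the wrong file.

```python
class ToyServer:
    """Hypothetical model: a log of ordered events plus a deferred (preImage) area."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.pre_image = []  # entries held back until a later commit
        self.log = []        # ordered ("open" | "close" | "entry", file_number, detail)

    def open_file(self, number, name):
        self.log.append(("open", number, name))

    def close_file(self, number):
        self.log.append(("close", number, None))

    def space_index_change(self, number, payload):
        entry = ("entry", number, payload)
        if self.write_through:
            self.log.append(entry)        # fixed behavior: log it immediately
        else:
            self.pre_image.append(entry)  # old behavior: defer it

    def commit(self):
        self.log.extend(self.pre_image)
        self.pre_image.clear()


def recover(log):
    """Replay the log and return (target file or None, payload) for each entry."""
    numbers, writes = {}, []
    for kind, number, detail in log:
        if kind == "open":
            numbers[number] = detail
        elif kind == "close":
            numbers.pop(number, None)
        else:
            writes.append((numbers.get(number), detail))
    return writes


def scenario(write_through, reassign=True):
    server = ToyServer(write_through)
    server.open_file(7, "orders.dat")
    server.space_index_change(7, "key for orders.dat")
    server.close_file(7)
    if reassign:
        server.open_file(7, "customers.dat")  # file number 7 is reused
    server.commit()                           # a later transaction commits
    return recover(server.log)


deferred = scenario(write_through=False)
unassigned = scenario(write_through=False, reassign=False)
immediate = scenario(write_through=True)
```

In this model the deferred entry is replayed against `customers.dat` (the overwrite case), or against no file at all when the number was not reassigned (comparable to the failed write), while the immediately logged entry is replayed against `orders.dat`.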
FairCom customers on current maintenance can request an updated V9 c-treeACE line at any time. Please contact your nearest FairCom office should you have any concerns that you are impacted by this update.