Multics Technical Bulletin                                MTB-568
DM: Rollback

To:       Distribution

From:     Andre Bensoussan

Date:     06/23/83

Subject:  Data Management: Rollback

ABSTRACT

This MTB describes how the recovery system rolls back unfinished transactions during normal operation, and how it rolls back all unfinished transactions after a system crash.

During normal operation, a transaction may be rolled back by the process that started the transaction, if it is still alive; otherwise, it is rolled back by the Data Management Daemon process.

After a system crash, the Multics system is first initialized; then various daemons are logged in, in particular the Data Management Daemon process. Its first task is to check whether any transactions were in progress at the time of the crash and, if so, to roll them back.

Comments should be sent to the author:

via Multics Mail:
   Bensoussan.Multics on System M.

via US Mail:
   André Bensoussan
   Honeywell Information Systems, Inc.
   575 Tech Square
   Cambridge, Massachusetts 02139

via telephone:
   (HVN) 261-9334, or
   (617) 492-9334

_________________________________________________________________

Multics project internal working documentation. Not to be reproduced or distributed outside the Multics project.

CONTENTS

   Abstract
   1 Introduction
   2 Rolling back a transaction
      2.1 Summary of what the rollback procedure does
      2.2 Environment of the rollback procedure
      2.3 File identification
      2.4 How the rollback procedure does its job
   3 Rolling back after crash
      3.1 Invoking the rollback_after_crash
      3.2 Finding all Journals and Files
      3.3 Finding the end of each before journal
      3.4 Finding the end of each after journal
      3.5 Phasing before and after journals
      3.6 Finding all unfinished transactions
      3.7 Rolling back all unfinished transactions
      3.8 Cleaning up
      3.9 Accepting users again

1 INTRODUCTION

The rollback description contained in this memo is the logical continuation of the Before Journal Manager Design document (MTB-560). It is published as a separate MTB only for reasons of size, and can be viewed as Part II of the Before Journal Manager Design. Part I (MTB-560) describes what information is stored in the journal, and how it is stored, so that it can be used later if needed. Part II (this MTB) describes how rollback uses the information stored in the journal.

The first portion of this memo describes how the "rollback" primitive of the before journal manager rolls back a single transaction during normal system operation. This rollback may be performed by the process that was executing the transaction, if it is still alive, or by the Data Management Daemon process.

The second portion describes how crash recovery determines, after a crash, what the state of the system was at the time of the crash, and how it rolls back all transactions that were in progress at that time. This job is always done by the Data Management Daemon process.

2 ROLLING BACK A TRANSACTION

Rolling back a transaction consists of several operations executed by the before journal, after journal, file and lock managers, orchestrated by the transaction manager. The transaction manager may perform a rollback because the transaction has to be aborted, or because it has to be restarted from the beginning or from a given checkpoint.

To roll back a transaction, the transaction manager takes the following steps:

(1) Call before_journal_manager_$rollback, to undo the modifications made by the transaction, back to the beginning or back to a specified checkpoint.
(2) Call file_manager_$flush_modified_ci, to flush all control intervals modified by the rollback procedure while undoing the original modifications.

(3) Call after_journal_manager_$flush_transaction, to flush all after images produced by the transaction being rolled back, including the after images produced by the rollback procedure.

(4) Call before_journal_manager_$write_rolled_back_mark, to write a mark in the before journal used by the transaction, indicating that the transaction has been rolled back and how far it has been rolled back.

(4a) Call before_journal_manager_$write_aborted_mark, to write a mark in the before journal used by the transaction, indicating that the transaction has been aborted. This step is taken instead of step (4) when the transaction manager is rolling back the transaction in order to abort it.

(5) Call lock_manager_$unlock_all, to release all locks set by the transaction, or by the portion of it that has been rolled back.

The rest of this memo is concerned with the before_journal_manager_$rollback procedure, which does most of the work; it is referred to as the "rollback procedure" from here on.

2.1 Summary of what the rollback procedure does

The rollback procedure reads all before journal records produced by the transaction in reverse chronological order, from the last record back to the begin mark record (or to the checkpoint mark record specified by the caller). For each record it reads, it performs the appropriate action to undo what the transaction had done.

To undo the modifications made to a control interval of a protected file, the rollback procedure must write into this control interval again. It does so by calling the special entry point file_manager_$unput, which restores the control interval to its original value and causes this write, made by rollback, to be journalized in the after journal associated with the file.
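The reverse scan summarized above, together with the per-record actions that step (6) of section 2.4 details, can be sketched in Python as below. This is a minimal in-memory model, not the before journal manager's code: the Record fields, the use of a list index as a record id, the `prev` back-pointer linking the records of one transaction, and the `unput` stand-in (which restores the old bytes and appends an after image, but takes no before image) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical in-memory before journal record (not the real bj layout)."""
    kind: str                        # "begin", "before_image", "rollback_handler",
                                     # "checkpoint", "rolled_back", "committed", "aborted"
    prev: Optional[int] = None       # record id of the previous record of the same transaction
    ci: int = 0                      # before_image: control interval number
    offset: int = 0                  # before_image: offset of the modified bytes in the CI
    old: bytes = b""                 # before_image: original contents of those bytes
    checkpoint_no: int = 0           # checkpoint: checkpoint number
    target: Optional[int] = None     # rolled_back: record where the earlier rollback stopped
    handler: Optional[Callable[[bytes], None]] = None  # rollback_handler: procedure to call
    data: bytes = b""                # rollback_handler: input bit string for the handler


AfterImage = Tuple[int, int, bytes]


def unput(cis: Dict[int, bytearray], after_images: List[AfterImage],
          ci: int, offset: int, old: bytes) -> None:
    """Stand-in for an unput-style write: restore the old bytes and journalize
    an after image of the restored bytes; no before image is taken."""
    cis[ci][offset:offset + len(old)] = old
    after_images.append((ci, offset, old))


def rollback(journal: List[Record], last: int, stop_checkpoint: int,
             cis: Dict[int, bytearray], after_images: List[AfterImage]) -> int:
    """Undo one transaction backwards from record `last`; return the id of the
    begin or checkpoint mark where the rollback stopped (step 7)."""
    if journal[last].kind in ("committed", "aborted"):          # step (5)
        raise RuntimeError(f"transaction already {journal[last].kind}")
    i: Optional[int] = last
    while i is not None:                                       # step (6)
        rec = journal[i]
        if rec.kind == "before_image":                         # (a)
            unput(cis, after_images, rec.ci, rec.offset, rec.old)
            i = rec.prev
        elif rec.kind == "rollback_handler":                   # (b)
            rec.handler(rec.data)
            i = rec.prev
        elif rec.kind in ("committed", "aborted"):             # (c)
            raise RuntimeError("system error: mark inside an unfinished transaction")
        elif rec.kind == "rolled_back":                        # (d) skip already-undone records
            i = rec.target
        elif rec.kind == "checkpoint" and rec.checkpoint_no > stop_checkpoint:  # (e)
            i = rec.prev
        elif rec.kind == "begin" or (rec.kind == "checkpoint"
                                     and rec.checkpoint_no == stop_checkpoint):  # (f)
            return i
        else:
            raise RuntimeError(f"unexpected record {rec.kind!r} at {i}")
    raise RuntimeError("begin mark not found")
```

The `rolled_back` branch is what keeps a second rollback from reprocessing records that an earlier rollback already undid.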
After images produced during the rollback logically cancel out the original after images produced while the transaction was in progress. No before images are produced during rollback.

2.2 Environment of the rollback procedure

The rollback procedure may be executed by the process that was executing the transaction, or by the Data Management Daemon process, a daemon process associated with the data management system. While a rollback is being performed, other transactions may be in progress, and several transactions may be rolled back concurrently by several processes.

To work properly, the rollback procedure expects all tables used by the file, transaction, before journal, after journal and lock managers to be in a consistent state.

2.3 File identification

Each before image record is produced by a transaction before it modifies a file, and contains the identification of the file in two forms: the file opening id and the file unique id (uid).

When the rollback is performed by the process that was executing the transaction, the rollback procedure uses the file opening id to refer to the file when calling the file manager. However, when the rollback is performed by the daemon process, the file opening id cannot be used, since it is meaningful only in the original process. Instead, the file uid is used to search a uid-to-pathname conversion table, in which all protected files are registered for as long as the rollback mechanism may need them.

This table is maintained by the open primitive of the file manager. Since rollback cannot be done without it, it must be as safe as the before journal itself. Ideally, it should be implemented as an index in a protected file whose modifications are journalized in "well known" before and after journals; in the first release, it will be implemented as a segment in virtual memory, carefully modified and flushed after each modification.
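A minimal sketch of the uid-to-pathname table and of the opening-id check that section 2.4 applies to both before journals and files. It assumes a JSON file rewritten atomically (write a temporary copy, fsync it, then os.replace) as a stand-in for a segment that is "carefully modified and flushed after each modification"; the class, the function, and the per-process oid maps are hypothetical illustrations, not Multics interfaces.

```python
import json
import os
import tempfile
from typing import Callable, Dict


class UidPathnameTable:
    """Hypothetical uid -> pathname table, flushed to disk after every change."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: Dict[str, str] = {}
        if os.path.exists(path):               # reload after a crash or restart
            with open(path) as f:
                self.entries = json.load(f)

    def register(self, uid: str, pathname: str) -> None:
        """Called by the (hypothetical) open primitive for each protected file."""
        self.entries[uid] = pathname
        self._flush()

    def lookup(self, uid: str) -> str:
        return self.entries[uid]

    def _flush(self) -> None:
        # Write a full new copy, force it to disk, then atomically replace the old
        # file, so a crash leaves either the old table or the new one, never a mix.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.entries, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def resolve_oid(oid: int, uid: str, oid_to_uid: Dict[int, str],
                uid_to_oid: Dict[str, int], table: UidPathnameTable,
                open_by_pathname: Callable[[str], int]) -> int:
    """Return an opening id that is valid in this process for the object `uid`."""
    if oid_to_uid.get(oid) == uid:            # oid is bound to uid here: use it
        return oid
    if uid in uid_to_oid:                     # already opened here under another oid
        return uid_to_oid[uid]
    new_oid = open_by_pathname(table.lookup(uid))   # e.g. the daemon's case
    oid_to_uid[new_oid] = uid
    uid_to_oid[uid] = new_oid
    return new_oid
```

A fully durable version would also fsync the containing directory after the rename; the sketch omits that step.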
2.4 How the rollback procedure does its job

The calling sequence of the rollback procedure is:

   call before_journal_manager_$rollback (txn_id, txn_ix, checkpoint_no, code);

where:
   txn_id         is the transaction id of the transaction to be rolled back;
   txn_ix         is the index of the entry assigned to the transaction in the transaction table;
   checkpoint_no  is the checkpoint number at which the rollback is to stop;
   code           is a standard system error code.

The major steps of the rollback procedure are as follows:

(1) Locate the bj_txte info structure for the transaction to be rolled back. This structure is an entry in the bj_txt table and contains before journal information about this transaction.

(2) Get the bj_oid and the bj_uid from the bj_txte info. The bj_oid must be validated against the bj_uid to determine whether the process doing the rollback can use it to reference the before journal. When the rollback is done by the Data Management Daemon process, the bj_oid will be found invalid, because it belongs to the original process. In any event, when the bj_oid is not bound to the bj_uid in the process doing the rollback, this process must acquire a valid one. It does so by using the bj_uid to find the pathname of the before journal in the system table that lists all before journals opened in the system. With this pathname, it opens the journal and enters the new bj_oid in the bj_txte info.

(3) Get the record id of the last record stored in the before journal by the transaction, from the bj_txte info.

(4) Flush the before journal up to this last record, to guarantee that all records needed for the rollback are in the file in which the journal is written, and that none of them is still only in the main memory buffer used by the before journal manager.

(5) Read the last record produced by the transaction by calling:

   call bj_storage_get (bj_oid, record_id,....)
If the last record produced by the transaction is a committed or aborted mark, return a status code to the caller indicating that the transaction has been committed or aborted and cannot be rolled back. This case may occur if the process executing the transaction lost control while the transaction was being committed, after the commit mark was logically written in the journal but before the transaction manager could be informed that the commit mark was physically on disk.

(6) Analyse the record just read from the journal and take the appropriate action according to its type:

   (a) If it is a "before_image" record, use its contents to undo the modification it describes; then read the previous record produced by the transaction in this journal and go back to step (6): "Analyse the record just read...".

   To undo the modification associated with this before image record, the rollback procedure must call the file manager to write into a control interval. The file is identified in the before image record by its file_oid and file_uid. The file_oid must be validated to make sure it is bound to the file_uid. If the rollback is done by the Data Management Daemon process, the file_oid will in general be invalid, and the file_oid of the file in the daemon process must be used when calling the file manager to write into the control interval.

   If the file is not open in the process that does the rollback, it has to be opened: the file_uid found in the before image is used to search the table listing all protected files open in the system (or open at the time of the crash, as explained in the next section) to determine the pathname of the file; the pathname is then used to open the file, and the new file_oid is used instead of the file_oid stored in the before image.
   The rollback procedure can now call the file manager to write the appropriate portions of the control interval, with the understanding that this is a rollback action: no before image must be taken, but an after image must be taken, as for any other modification, in order to cancel out the after image produced when the transaction itself made the modification. The file manager provides the special entry point file_manager_$unput for rolling back modifications.

   To take an after image, the file manager must call the after journal manager with the aj_oid of the after journal. It can find the pathname and aj_uid of the after journal in the file attributes stored in control interval zero of the file. If the after journal is not open in the process doing the rollback, it must be opened, and the resulting aj_oid is used in subsequent references to this after journal.

   (b) If it is a "rollback_handler" record, the name of the procedure to be called is extracted from the record, an entry variable is initialized to this entry point, and the entry point is called with the bit string representation of the input data it needs to do its job; this bit string is also extracted from the before journal record. When the handler returns, the previous record produced by the transaction in the before journal is read, and control goes back to step (6): "Analyse the record just read...".

   (c) If it is a "committed" or an "aborted" mark, this is a system error, unless this record is the last record produced by the transaction, as explained above in step (5).

   (d) If it is a "rolled_back" mark, the transaction has already been rolled back to a checkpoint, or to the beginning. This mark contains a pointer to the record up to which the transaction has already been rolled back.
   When it encounters a rolled_back mark, the rollback procedure therefore skips all the earlier records already processed by a previous rollback and goes directly to the checkpoint or begin mark where that rollback stopped: it reads the record pointed to by the rolled_back mark and goes back to step (6): "Analyse the record just read...".

   (e) If it is a "checkpoint" mark whose checkpoint number is greater than the checkpoint number at which the rollback procedure is to stop, read the previous record produced by the transaction and go back to step (6): "Analyse the record just read...".

   (f) If it is a "begin" mark, or a "checkpoint" mark whose checkpoint number equals the checkpoint number at which the rollback procedure is to stop, no more records need to be read, and control goes to the next step. (The begin mark is equivalent to the mark for checkpoint 0.)

(7) Remember, in the bj_txte info structure for this transaction, the record id of the last record read, which is either a begin mark or a checkpoint mark. This record id will later be stored in the rolled_back record, indicating that the rollback has been physically completed. Then return to the caller, i.e., the transaction manager.

As explained earlier, the transaction manager must now flush all control intervals modified during the rollback, flush all after journal records produced during the rollback, and wait for all I/O operations to complete. Finally, it appends a rolled_back mark at the end of the before journal, flushes the mark, and waits for it to be physically on disk.

3 ROLLING BACK AFTER CRASH

As described in MTB-564, the system will guarantee that a modification made to a CI of a protected file is never written to disk before its before image is physically on disk.
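The write-ahead rule stated above can be sketched as follows, assuming each control interval remembers the record id of its most recent before image and the journal tracks the highest record id known to be on disk. All names here are hypothetical; MTB-564 describes the actual mechanism.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BeforeJournal:
    """Hypothetical before journal: records are buffered, then flushed to disk."""
    records: List[bytes] = field(default_factory=list)
    on_disk: int = -1           # highest record id known to be physically on disk

    def append(self, before_image: bytes) -> int:
        self.records.append(before_image)
        return len(self.records) - 1

    def flush_through(self, rec_id: int) -> None:
        # Stand-in for writing buffered records up to rec_id and waiting for the I/O.
        self.on_disk = max(self.on_disk, rec_id)


@dataclass
class ControlInterval:
    data: bytearray
    bi_rec_id: int = -1         # record id of this CI's most recent before image


def modify(ci: ControlInterval, bj: BeforeJournal, offset: int, new: bytes) -> None:
    """Journalize the before image first, then change the CI in memory."""
    ci.bi_rec_id = bj.append(bytes(ci.data[offset:offset + len(new)]))
    ci.data[offset:offset + len(new)] = new


def write_ci(ci: ControlInterval, bj: BeforeJournal, disk: List[bytes]) -> None:
    """Write-ahead rule: the CI reaches disk only after its before image does."""
    if ci.bi_rec_id > bj.on_disk:
        bj.flush_through(ci.bi_rec_id)
    disk.append(bytes(ci.data))
```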
Because of this guarantee, it will be possible to roll back after any system crash, whether or not emergency shutdown (ESD) was successful, provided no data was damaged by a media failure. A complete description of recovery after a system crash can be found in MTB-603: "Data Management - Crash Recovery".

3.1 Invoking the rollback_after_crash

After the Multics system has been initialized, the Multics initializer process logs in the Data Management Daemon process. This daemon is responsible for initializing the Data Management System but, before doing so, it determines whether any transactions were left unfinished in the previous invocation of the Multics system and, if so, rolls them back.

If the system crashed and ESD executed successfully, all information contained in the various tables used by the transaction manager, before journal manager, after journal manager, file manager and lock manager has been written to disk and could be used by the Data Management Daemon. If ESD failed, these tables cannot be trusted, and the Daemon process must be able to recover without them.

The description that follows assumes that these tables are lost. Some of the steps described here could be skipped or simplified when the tables are available, if one decided to take advantage of that knowledge. In the current implementation, no table is assumed to be valid, regardless of whether ESD was successful, except for the uid-to-pathname tables maintained by the file and journal managers.

3.2 Finding all Journals and Files

The first thing the Daemon process has to do is find out which journals were in use at the time of the crash, and open them again for its own use. The "open" primitive of the before journal manager maintains a table containing the pathnames and uids of all before journals opened in the system, i.e., opened in at least one process. This table is flushed after every modification and is available after a system crash, even if ESD fails.
The Daemon knows the pathname of the segment containing this table; it initiates the segment and opens, for itself, all before journals listed in the table by calling the special before journal manager entry point "$open_all_after_crash".

A similar table, maintained by the "open" primitive of the after journal manager, contains the pathnames and uids of all after journals that were opened in the system. The Daemon uses it to open, for itself, all after journals that were open at the time of the crash.

A third table, maintained by the "open" primitive of the file manager, contains the pathnames and uids of all protected files that were open in the system at the time of the crash. The Daemon process initiates this table but does not open all the files listed in it. The table is used during the actual rollback to convert the file uids found in before images into pathnames.

These three tables must always be consistent, and available after a crash even when ESD fails. They are necessary to roll back after a system crash, and must be as safe as the journals themselves.

3.3 Finding the end of each before journal

For each before journal, the Daemon must find the last record that is physically written in the journal and such that all records produced before it are also physically on disk.

Since the before journal manager tables are assumed to be unavailable, the end of the before journal has to be found using two facts: the journal is written sequentially, and each control interval contains the time at which it was written in the journal. The header of the before journal, stored in CI zero, contains the first CI number and the last CI number of the journal. A search on the time stored in each CI determines the most recently written CI of the journal. Then, within this CI, the last logical record is located.
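One way to implement the time-based search is a binary search for the newest CI. This sketch assumes, beyond what is stated above, that the CIs between the first and last CI numbers are reused circularly, that write times strictly increase in write order, and that a never-written CI reads as time 0; `newest_ci` is a hypothetical name. Under those assumptions, the CIs written no earlier than the first CI form a prefix of the range, so the newest CI is the last position where that predicate holds.

```python
from typing import Optional, Sequence


def newest_ci(times: Sequence[int], first_ci: int, last_ci: int) -> Optional[int]:
    """Return the number of the most recently written CI in [first_ci, last_ci].

    times[n] is the write time stored in CI n (0 if never written).
    Assumes circular reuse with strictly increasing write times.
    """
    if times[first_ci] == 0:
        return None                      # nothing written yet
    lo, hi = first_ci, last_ci           # invariant: times[lo] >= times[first_ci]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if times[mid] >= times[first_ci]:
            lo = mid                     # mid written after first_ci: newest is at or past mid
        else:
            hi = mid - 1                 # mid older or unwritten: newest is before mid
    return lo
```

Locating the last logical record inside the returned CI is not modeled here.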
The storage manager module of the before journal manager provides the appropriate services for the Daemon process to find the end of each before journal.

3.4 Finding the end of each after journal

For each after journal, the Daemon must find the last record that is physically written in the journal and such that all after journal records produced before it are also physically in the journal. If the after journal is on disk, a method similar to the one described for the before journal can be used. If the after journal is on tape, the end of the recorded data must be found and the tape positioned there. The after journal manager will provide a utility procedure to do just that, which the Daemon process will call to find the end of each after journal.

3.5 Phasing before and after journals

The strategy chosen for the after journal manager when rolling forward is to post every single after image found in the after journal, without trying to determine whether it was produced by a committed or an aborted transaction. This strategy requires "taking after images during rollback", as explained in the description of the rollback procedure. However, this alone is not sufficient.

Since the before and after journals are not phased during normal operation, an after image may be physically written in the after journal before the corresponding before image is physically written in the before journal. After a crash, the after journal may therefore contain after images that have no before image counterpart in the before journal. Taking after images during rollback_after_crash would not cancel out these after images, since rollback only undoes modifications for which it finds a before image.

GCOS solves this problem by phasing the before and after journals during normal operation, which guarantees that this situation cannot occur. It is difficult to use the same method in Multics. Instead, after images and before images are physically journalized without any attempt to phase them.
After a system crash, the ends of all journals are examined and analysed, and all after images that have no before image are eliminated from the after journals. A detailed description of how this is done can be found in MTB-569: "DM: Phasing before and after journals". The after journal manager provides a procedure to do this job, and the Daemon calls this procedure to clean up the end of every after journal.

3.6 Finding all unfinished transactions

Each before journal contains before images of both finished and unfinished transactions. The only information available so far is the record id of the last record of each journal. By reading the before journal in reverse chronological order, from the most recent record to the least recent one, it is possible to determine which transactions have been committed or aborted and which ones were still in progress at the time of the crash. While reading the before journal in reverse order, one can build the list of all unfinished transactions, with the record id of the last record produced by each of them.

Reading the entire before journal to find out which transactions were in progress would considerably lengthen the real time it takes to roll back after a crash. Several alternatives make it possible to find all transactions in progress without reading the entire journal. They all consist of writing historical information in the journal showing that, at a particular point in time, only N transactions were in progress. When it reaches that point while reading the journal backwards, the rollback_after_crash procedure can start a countdown until it finds the corresponding N begin marks. The more frequently this historical information is stored, the sooner the countdown can start, and the shorter the search.
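The backward scan, with the countdown started at the very last record, can be sketched as follows. It assumes, hypothetically, that every record carries the number of transactions in progress in its journal just after it was written, and a flat list stands in for the journal; the record layout and the `find_unfinished` name are illustrative only. Reading backwards, the first record met for a transaction is its last record, and a committed or aborted mark there means it finished. Once the begin marks of all N unfinished transactions have been seen, no older record can belong to an unfinished transaction, so the scan stops.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class BjRecord:
    """Hypothetical before journal record as seen during rollback_after_crash."""
    txn_id: int
    kind: str           # "begin", "before_image", "committed", "aborted", ...
    in_progress: int    # transactions in progress in this journal after this record


def find_unfinished(journal: List[BjRecord], last: int) -> Dict[int, int]:
    """Read backwards from record `last`; map each unfinished txn_id to the
    record id of its last record."""
    remaining = journal[last].in_progress   # countdown of begin marks still to find
    finished: Set[int] = set()
    unfinished: Dict[int, int] = {}
    i = last
    while i >= 0 and remaining > 0:
        rec = journal[i]
        if rec.txn_id not in finished and rec.txn_id not in unfinished:
            # First record met for this transaction: it is the transaction's last record.
            if rec.kind in ("committed", "aborted"):
                finished.add(rec.txn_id)
            else:
                unfinished[rec.txn_id] = i
        if rec.kind == "begin" and rec.txn_id in unfinished:
            remaining -= 1                  # this unfinished transaction is fully accounted for
        i -= 1
    return unfinished
```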
One could:

(1) Periodically store in the header of each before journal the number of transactions in progress in this journal and the time of this observation, or

(2) Maintain for each before journal a count of transactions in progress, incrementing it at each write_begin_mark operation and decrementing it at each write_committed_mark and write_aborted_mark operation, and store this count in each begin, committed and aborted record, or

(3) Store this count in every before journal record.

The current implementation uses method (3).

3.7 Rolling back all unfinished transactions

At this point, the Daemon has the list of all transactions in progress and the record id of the last record produced by each of them. In addition, the after journals have been cleared of every after image that had no before image counterpart. Rolling back these transactions can therefore start safely.

The rollback procedure described in section 2 can be used to roll back these unfinished transactions one after the other, if it is provided with the environment it expects; that is, all tables used by the transaction manager, before journal manager, after journal manager, file manager and lock manager must be initialized to give the rollback procedure the impression that it is called during normal operation. This technique will be used instead of writing another rollback procedure.

It is also possible to use the transaction manager to roll back or abort each transaction; this causes "rolled_back" or "aborted" marks to be written in the before journal, after all the appropriate flushing operations have been done. Since the checkpoint facility is not provided in the current system implementation, all unfinished transactions are aborted.

Rolling back all transactions can be described as follows:

(1) Initialize all tables to show that N transactions are in progress.
(2) For each transaction in progress, call the transaction manager to abort the transaction, as if during normal system operation.

3.8 Cleaning up

After all transactions have been aborted, all before journals, after journals and protected files opened by the Daemon for its rollback task are closed by calling the "close" primitives of the before journal manager, after journal manager and file manager.

3.9 Accepting users again

The Daemon process now enables the Data Management System for all users by renaming the directory in which the various tables reside to its appropriate name. It then goes to sleep, waiting for requests to execute (see MTB-603 and MTB-604).
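To tie the two parts together, the sketch below shows rollback_after_crash reusing the normal-operation abort path of section 2, steps (1), (2), (3), (4a) and (5), for each unfinished transaction, then closing (3.8) and enabling users (3.9). The `call` dispatcher, the "initialize_tables", "$close" and "enable_data_management" entry names, and the choice to continue past a failing transaction and report it are all hypothetical; only the section 2 entry point names come from the text.

```python
from typing import Callable, Dict, List

# Hypothetical dispatcher: call(entry_name, *args) invokes the named entry point
# and returns a standard error code (0 means success).
Caller = Callable[..., int]


def abort_transaction(txn_id: int, call: Caller) -> int:
    """Section 2 abort path: steps (1), (2), (3), (4a), (5)."""
    code = call("before_journal_manager_$rollback", txn_id, 0)  # (1) checkpoint 0 = begin mark
    if code != 0:
        return code                                             # e.g. already committed
    for entry in ("file_manager_$flush_modified_ci",            # (2)
                  "after_journal_manager_$flush_transaction",   # (3)
                  "before_journal_manager_$write_aborted_mark", # (4a), used instead of (4)
                  "lock_manager_$unlock_all"):                  # (5)
        code = call(entry, txn_id)
        if code != 0:
            return code
    return 0


def rollback_after_crash(unfinished: Dict[int, int], call: Caller) -> List[int]:
    """Abort every unfinished transaction (3.7), close (3.8), enable users (3.9).

    `unfinished` maps txn_id -> record id of its last record (section 3.6).
    Returns the ids of transactions whose abort failed."""
    call("initialize_tables", unfinished)              # 3.7 step (1), hypothetical entry
    failed: List[int] = []
    for txn_id in sorted(unfinished):                  # 3.7 step (2)
        if abort_transaction(txn_id, call) != 0:
            failed.append(txn_id)
    for entry in ("before_journal_manager_$close",     # 3.8, hypothetical entry names
                  "after_journal_manager_$close",
                  "file_manager_$close"):
        call(entry)
    call("enable_data_management")                     # 3.9: rename the table directory (hypothetical)
    return failed
```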