Multics Technical Bulletin MTB-548 DM: Record Manager Design To: Distribution From: Lindsey Leroy Spratt and Matthew Clement Pierret Date: 04/27/82 Subject: Data Managment: Record Manager Design 1 ABSTRACT This document describes various facets of the implementation of the record manager. The record manager uses three utilities to do its work, data_mgmt_util_ and rm_search_, which are described here, and collection_manager_. The data_mgmt_util_ utility understands the format of records; the rm_search_ utility positions in a record collection; collection_manager_ handles all of the storage for a collection of records. Comments should be sent to the author: via Multics Mail: Spratt.Multics on either MIT Multics or System M. via US Mail: Lindsey Spratt Honeywell Information Systems, inc. 4 Cambridge Center Cambridge, Massachusetts 02142 via telephone: (HVN) 261-9321, or (617) 492-9321 _________________________________________________________________ Multics project internal working documentation. Not to be reproduced or distributed outside the Multics project without the consent of the author or the author's management. CONTENTS Page 1 Abstract . . . . . . . . . . . . . . i 2 Record Manager Utilities . . . . . . 1 3 Record Storage . . . . . . . . . . . 1 4 Record Layout . . . . . . . . . . . 2 4.1 The field_table structure . . . 2 5 Allocation of Records . . . . . . . 4 5.1 Allocation of Records to Control Intervals . . . . . . . . 4 5.1.1 Overlength Records . . . . 4 5.2 Allocation of a Record within a Control Interval . . . . . . . . . 5 6 Rewriting Records . . . . . . . . . 5 Appendix 1 Search Algorithm . . . . . . . . . . . 1 Multics Technical Bulletin MTB-548 DM: Record Manager Design 2 RECORD MANAGER UTILITIES The record manager uses three utilities, collection_manager_, data_mgmt_util_, and vector_util_. The storage management utility, collection_manager_, deals directly with the file manager and talks in terms of "elements" of the record collection. These elements are undifferentiated bit strings; they are allocated in "control intervals". A control interval is 1024 words long, one Multics page. At some point in the future this size will become settable on a per-page file basis, approximately in the range of 512 to 4096 words. The collection_manager_ is documented in MTBs 551, "Data Management: Collection Manager Functional Specification.", and 552, "Data Management: Collection Manager Design.". The record data is presented to the caller of the record_manager_ as a typed_vector structure (see MTB-541 "Toward a unification of data manipulation on Multics." for a brief discussion of data vectors). The vector_util_ module provides tools for creating and manipulating vectors of data. The internal representation of records is as a continuous, peculiarly formatted, bit string. This representation is sometimes referred to as the "string" representation, in contrast to the "vector" representation used to communicate with the caller of the record_manager_. Converting between the "vector" representation and the "string" representation, and manipulating and interpreting the contents of the "string" representation are all functions performed by data_mgmt_util_ (which is also used for this purpose by the index_manager_). 3 RECORD STORAGE Records are stored as elements in collecitons using collection_manager_. The elements are a storage abstraction managed as undifferentiated bit strings. The control intervals in a record collection all contain nothing but the elements used to store the records, i.e., there is no control information kept inside of a record collection. When a record is stored in the record collection, it is broken up into as many datums as necessary, where each datum is contained within a control interval. All of the datums together make a single "element" of the record collection. A set of collection_manager_ primitives exists for allocating records, freeing them, rewriting and retrieving. The record is known to these primitives as only a string of bits, an "element". The structure of the record as a set of fields is known and used by the data_mgmt_util_ utility. MTB-548 Multics Technical Bulletin DM: Record Manager Design 4 RECORD LAYOUT The record is layed out within the element. The layout of the contents is described in the "field_table" structure, described below. The layout of the record is divided into the fixed portion of the record and the variable portion. The fixed portion is composed of all of the fixed-length fields, and the length data for the variable fields. This minimizes the expense of accessing the fixed fields, as intermingling fixed-length and varying-length data requires calculating each field's position for each record and separating the fixed and varying field data eliminates the need for calculating the location of the fixed fields. The variable portion of the record consists only of the contents of the variable length fields. The field table describes the beginning location for the fixed length fields and the location of the length value for the variable length fields. It also describes the type of each field, and the location where the first variable field's contents start. 4.1 The field_table structure The field_table structure is stored in the record_manager_'s per-record-collection header information. The field_table is declared in dm_field_table.incl.pl1. Multics Technical Bulletin MTB-548 DM: Record Manager Design dcl 1 field_table aligned based (field_table_ptr), 2 version fixed bin (17) unal, 2 number_of_fields fixed bin (17) unal, 2 maximum_field_name_length fixed bin (17) unal, 2 location_of_first_varying_field fixed bin (17) unal, 2 field (ft_number_of_fields refer (field_table.number_of_fields)), 3 name char (ft_maximum_field_name_length refer (field_table .maximum_field_name_length)) var, 3 flags unal, 4 descriptor_is_varying bit (1) unal, 4 length_is_in_characters bit (1), 4 must_be_zero bit (7) unal, 3 location fixed bin (35), 3 descriptor bit (36) aligned, 3 length_in_bits fixed bin (35), 2 varying_field_map (ft_number_of_fields refer (field_table .number_of_fields)), 3 field_id fixed bin (17) unal, 3 varying_field_index fixed bin (17) unal; The field_table.field array has one entry for each field in the record, varying and non-varying. The varying and non-varying fields may be intermingled in this array. The field_table.field.location variable, for a non-varying field, identifies the starting offset in the record_string of the data for the field. field_table.field.type identifes the data type of the field which is sufficient to know the size of the value for most types. For some types, e.g. char_nonvar, the field_table.field.size value is used to determine the length of the value. For a varying field, field_table.field.location identifies the location of the length value for the field (in the record_string). The actual value of the field is found (for the first varying field) at field_table.location_of_first_varying_field in the record_string. The second varying field is found at field_table.location_of_first_varying_field + (length of first varying field). The length and contents of varying fields are separated out in this fashion to help speed references to specific varying fields, by allowing quicker references to the lengths of the MTB-548 Multics Technical Bulletin DM: Record Manager Design preceding varying fields than that allowed by the obvious alternative of storing: length1 + contents1 + length2 + contents2 + length3 + contents3 + ... where: to pick up contents3 one positions to length1, reads it, adds it in and positions to length2, reads it, skips over contents2, reads length3, reads contents3. Also, this allocation technique allows storing the fields in the same order as their definition by the caller at index-creation time. The varying_field_map array speeds the location of any given field, which is of a varying type. The varying_field_map (ordinality_of_varying_field).field_id is used to determine the index into field_table.field array for the Nth varying field. The varying_field_map (field_array_index).varying_field_index is used to determine the ordinality among varying fields of a varying field given its field_id, its position in the field_table.field array. 5 ALLOCATION OF RECORDS The technique for allocating records has two parts; finding an appropriate control interval and allocating records within control intervals. 5.1 Allocation of Records to Control Intervals The basic scheme for determining which control interval in which to allocate a record follows: The tree is searched to determine the correct position for the new record. This results in a control interval id and a slot being selected for the new record. If there is sufficient room for the new record in the control interval selected, a single element containing the record is allocated in this control interval at the selected slot. 5.1.1 OVERLENGTH RECORDS The above scenario is somewhat complicated by the possibility that a record will be too long to fit in one control interval. To store an overlength record it is necessary to break it into multiple elements. The number of elements required to store a record is equal to the least greater integer of the length of the record divided by the maximum allowed element size. Multics Technical Bulletin MTB-548 DM: Record Manager Design (i.e., number_of_elements = ceil (record_length / maximum_allowed_element_size) If record_length is not evenly divisible by the maximum_allowed_element_size, then one of the elements of the record is less than maximum_allowed_element_size in length. Otherwise, all of the elements are maximum sized. If there is a less-than-maximum sized element, it is allocated first, and in the same fashion as an element which is the entire record as discussed above. For the maximum-sized elements, which necessarily occupy entire control intervals, a set of empty control intervals is used. 5.2 Allocation of a Record within a Control Interval Once the control interval has been selected in which the record is to be inserted, it may be necessary to alter the construction of the control interval to make room for the record contents. The need to make room for the record's contents arises when there is insufficient room in the "unused" space of the control interval for its contents, but enough room if the "unused" space and the scatterred free space are combined. The scattered free space comes from deleting (or rewriting larger) records and simply leaving the storage their contents occupy behind to be recovered later. This is solved by "compacting" the control interval. Scatterred free space compaction is accomplished by making a "logical" copy of the control interval by inserting each of the records in the control interval in a temp control interval, then physically copying this "compacted" version of the control interval back into the original control interval. This increases the unused space of the control interval because there was scattered free space in the original copy. 6 REWRITING RECORDS When a record has been allocated, it is assigned an identifier corresponding to its location. The identifier contains the control interval number in which the record is stored and an index in an array of offsets within that control interval. The physical location of the record inside of the control interval may change as storage requirements dictate, but once allocated, a record holds on to the array slot. Fulfilling this requirement entails some non-obvious behavior in certain cases of rewriting records. In the normal case, the record is stored in a single control interval. Rewriting a record with a value not greater in length than the old entails overwriting the old record. Excess space is MTB-548 Multics Technical Bulletin DM: Record Manager Design left to be reclaimed later. Rewriting a record with a value greater in length than the old may require allocating new space in the control interval and freeing the old. If there is not enough room in the free pool to allocate the new value, but the amount of free space plus the amount of scatterred space plus the length of the old value is enough to allocate the new value, a scatterred free space compaction is performed, and the new value is stored in the new free pool. If even a compaction would not yield enough free space in which to allocate the new value, the value is allocated in another control interval in the same manner as allocating a new record. There is one exception: the identifier of the record does not change to indicate the new location of the record. Instead, a "pointer" to the new value is stored in the old control interval. The indirection now required to get to the record is sacrificed in order to gain permanency of record identifiers. This scheme is curther complicated by records which span control intervals. If a record value is larger than the maximum size for an element, the trailing maximum sized portions are allocted in control intervals, just as when allocating a new record. If the record already spans control intervals, the control intervals already in use are reused, with unneeded control intervals freed (if the record has shrunk) or extra control intervals allocated (if the record has grown). 1 SEARCH ALGORITHM