MULTICS TECHNICAL BULLETIN MTB-625

To: MTB Distribution
From: Rich Coppola
Date: 05/27/83
Subject: The SSF and Multics

- ABSTRACT -

This MTB outlines the capabilities of the DPS88 System Support Facility (SSF) and the perceived security, integrity and reliability issues that arise when Multics uses the DPS88 hardware platform. It is presumed that the reader has some knowledge of the DPS88 hardware architecture.

Comments should be sent to the author:

via Multics Mail:
  To: Coppola.SysMaint on System M.

via Mail:
  R. L. Coppola
  Honeywell Information Systems
  Large Information Systems Division
  P.O. Box 8000, Mail Station Z-30
  Phoenix, AZ 85066

________________________________________
Multics Project internal working documentation. Not to be reproduced or distributed outside the Multics Project.

INTRODUCTION

The intent of this MTB is to address the perceived security, integrity and reliability issues raised by the System Support Facility (SSF) when Multics uses the DPS88 hardware platform. This perception of the SSF, its capabilities and design, is based on the SSF EPS and Technical Design Memos (TDMs). This paper does not address the many fine features of the SSF, as they are considered non-issues; its intent is to state the capabilities of the SSF and their possible negative impact on Multics security and integrity. That impact may be due to a failure in the communications protocol between the SSF and Multics, or to a successful breach of SSF, and therefore Multics, security by a hostile or benign maintenance user. In no way is this paper meant to impugn the intentions or capabilities of the SSF developers.

The reader should be cognizant of the basic premise of a secure computing environment. Simply stated, everything outside of the secure Kernel cannot be trusted. Studies have proven that software alone cannot provide adequate levels of protection in a secure environment; the software must be supported by a proven hardware design.
Without proper hardware support it is impossible for any software development house, including Multics, to provide a secure computing environment.

OVERVIEW

The System Support Facility (SSF) is a free-standing computer system that is used to perform all hardware diagnosis on the DPS88 hardware complex. The EPS claims that the SSF is a support facility for the entire DPS88 system; it is, however, really a hardware support facility. It provides minimal facilities for the support of the other critical elements of the system (e.g., software and network).

Architecturally the SSF consists of a standard Honeywell Level-6 with one special adapter board for interfacing to the DPS88's Intercomputer Controller (ICC). The software structure consists of a modified MOD-400 base which hosts a special executive called System Maintainability and Availability Software (SMAS). All maintenance functions, both non-functional (off-line/dead-system) and functional (on-line/operational Multics), are performed through the auspices of SMAS. The SSF therefore requires, and the DPS88 has been designed to provide, direct access to all resources in the system, including memory and peripheral devices. In essence the SSF is more privileged than the RING-0 Kernel.

The SSF initializes the DPS88 hardware and initiates the 'boot' process for the operating system. Once Multics is operational, a multi-layer HDSA Common Exchange Interface (CXI) dialogue is established between the OS and the SSF using standard HDSA session control over a direct channel (DI) on the IOX. This interface is used by the SSF to request host system resources for T&D requests, inform the OS that a central system component has failed, pass central system error records from the SSF's files to the syserr_log, etc. A 'deadman' hyper-connect protocol is used to keep Multics and the SSF informed of the operational status of each system.
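The deadman exchange just described can be modeled, purely for illustration, as a heartbeat with a silence threshold. This sketch is not SMAS or Multics code; the class name, interval and threshold values are all hypothetical.

```python
import time

class DeadmanMonitor:
    """Illustrative model of a 'deadman' protocol: each side sends
    periodic heartbeats and presumes its peer inoperative after a
    fixed number of missed intervals. Parameters are hypothetical."""

    def __init__(self, interval=5.0, missed_limit=3):
        self.interval = interval            # seconds between heartbeats
        self.missed_limit = missed_limit    # intervals missed before alarm
        self.last_heard = time.monotonic()

    def heartbeat_received(self):
        """Record that the peer (e.g., Multics) is still alive."""
        self.last_heard = time.monotonic()

    def peer_presumed_down(self, now=None):
        """True once missed_limit intervals elapse with no heartbeat."""
        now = time.monotonic() if now is None else now
        return (now - self.last_heard) > self.interval * self.missed_limit

mon = DeadmanMonitor(interval=5.0, missed_limit=3)
mon.heartbeat_received()
assert not mon.peer_presumed_down()
# Simulate 20 seconds of silence: the peer is presumed down.
assert mon.peer_presumed_down(now=mon.last_heard + 20.0)
```

Note that in such a scheme a failure of the link itself is indistinguishable from a failure of the peer, which is precisely the hazard discussed later in this paper.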
These dialogues are the only means the SSF has to determine the operational status of Multics.

All maintenance software (e.g., T&D, NFTs) is resident on the SSF system. All requests to run tests on a mainframe or peripheral device are routed to SMAS, whether they originate from a terminal logged in to the SSF or from Multics. In addition to alarm processing there are two aspects of testing from the SSF: peripheral testing and mainframe testing.

PERIPHERAL TESTING

For peripheral, memory or FNP testing, SMAS negotiates with Multics to acquire the target resource, and 80KW of memory. Once Multics has deconfigured the target resource and granted the T&D request, the SSF loads the Functional Test System (FTS), which runs in the BASIC (GCOS3) decor, into the memory and initiates shared-processor mode or hyper-switching on the DPS88 CPU. FTS may have to be modified to run in the Multics decor, as BASIC decor functionality may have to be removed to support Multics. Although it is possible to dedicate a CPU to FTS in a multi-CPU configuration, it is doubtful that any of our customers would accept the dedication of such a valuable resource for peripheral testing. FTS issues all of its own I/O and processes all resultant interrupts. There is no mechanism available to allow Multics to validate or issue the I/O on behalf of FTS.

Memory testing is also performed by FTS, under the MOLTS subsystem. The memory frame to be tested is acquired from Multics using the standard protocol. Once SMAS has been told by Multics to proceed with testing, the hyper-page table(s) for the memory frame(s) are assigned to FTS and testing proceeds in the same manner as peripheral testing.

Front-end processor testing follows the same protocol sequence described above. FTS is assigned one of the logical channels on the DI interface to the FNP, in much the same manner as it would be if it were testing a peripheral on an MPC.
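The resource-negotiation handshake described above can be sketched as a small state machine. All names, states and message shapes here are hypothetical; the real exchange is carried over the CXI dialogue and is considerably richer.

```python
# Illustrative sketch of the T&D resource-negotiation protocol:
# the SSF may begin testing only after Multics has deconfigured
# the target resource and granted the request.

FREE, REQUESTED, DECONFIGURED, TESTING = range(4)

class Resource:
    def __init__(self, name):
        self.name = name
        self.state = FREE

def ssf_request(res):
    """SMAS asks the host for the target resource (plus test memory)."""
    if res.state != FREE:
        raise RuntimeError(f"{res.name} is in use; request denied")
    res.state = REQUESTED

def multics_grant(res):
    """Multics deconfigures the resource and grants the T&D request."""
    assert res.state == REQUESTED
    res.state = DECONFIGURED

def ssf_begin_test(res):
    """Only after the grant may FTS be loaded and testing begin."""
    if res.state != DECONFIGURED:
        raise RuntimeError("testing before grant violates protocol")
    res.state = TESTING

tape = Resource("tape_01")
ssf_request(tape)
multics_grant(tape)
ssf_begin_test(tape)
assert tape.state == TESTING
```

The concern raised in this paper is that nothing in the DPS88 hardware forces the SSF to honor the `ssf_begin_test` precondition; the check exists only in software convention.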
MAINFRAME TESTING

If the target of a test request is a mainframe component (CPU, IOX or CIU; not a peripheral, FNP or block of memory), SMAS negotiates with Multics to acquire the target resource. Since all test programs and media reside in the SSF, it does not require additional DPS88 resources to support the test activity. Once Multics grants the test request and deletes the resource, the SSF may begin testing. Any isolation of the target resource must be performed by the SSF, since Multics does not have access to the hardware 'switches' to perform the isolation itself. In all cases, once testing is initiated Multics never receives control until SMAS informs it that it may or may not have the resource back.

ALARM PROCESSING

Alarm processing is the result of a central system component sending a signal to the SSF signifying that the component has detected an error requiring direct intervention by the SSF. The contract of the SSF is to attempt to help the hardware recover from the failure to keep Multics operational. There are essentially two types of alarms. The first is the result of certain internal errors detected in a unit that allow it to come to an orderly halt; these may be retryable. The second type, primarily failures in the memory hierarchy logic, causes all clocks to be stopped immediately. The majority of failures in this area are non-recoverable and will usually result in a system crash. For both types of alarms the SSF determines what action is required (e.g., run NFTs, retry the operation) and informs Multics of the result by signalling a fault. The SSF also places the machine conditions in reserved memory so that Multics may retry the operation on another CPU, if retry on the CPU that originally signalled the alarm does not succeed.
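The two alarm classes and their dispositions can be summarized in a short sketch. The names and return values are hypothetical; the point is only that clock-stop alarms are rarely recoverable, while orderly-halt alarms may be retried, and that in both cases the machine conditions are saved so the host can attempt retry elsewhere.

```python
# Illustrative sketch of SSF alarm classification (hypothetical names).

ORDERLY_HALT, CLOCK_STOP = "orderly_halt", "clock_stop"

def process_alarm(kind, retryable, save_conditions):
    """Decide the SSF's action for an alarm and record the machine
    conditions so the host may retry, possibly on another CPU."""
    save_conditions()          # machine conditions go to reserved memory
    if kind == CLOCK_STOP:
        return "crash"         # memory hierarchy failures: usually fatal
    return "retry" if retryable else "run_nfts"

saved = []
assert process_alarm(ORDERLY_HALT, True, lambda: saved.append("mc")) == "retry"
assert process_alarm(CLOCK_STOP, False, lambda: saved.append("mc")) == "crash"
assert saved == ["mc", "mc"]
```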
If the SSF 'thinks' that Multics is inoperable, due to a failure in the hyper-connect protocol or possibly in the CXI dialogue, it will send a message to the SSF console informing the operator that it thinks the host is down and to initiate re-boot activities.

Although there are plans to support NFTs and dynamic failure analysis/recovery for the CIU and IOX, only the CPU will have alarm recovery support at FCS. When the entire central system will be fully supported is unknown at this time.

Since the IOX does not have a console channel, the system console is a part of the SSF hardware complement (the SSF console may also be the Initializer's console). The console, a standard VIP7802, is connected to the L6 hardware BUS by a standard L6 MLCP. Multics can communicate with the console using standard HDSA session control or the ICC console emulator. If the customer only purchases one 'console', the SSF must multiplex the operator and maintenance functions/users on that one console.

To summarize, the relationship between the SSF and the rest of the DPS88 system may be conceptualized as closely-coupled with respect to the hardware complex and loosely-coupled with respect to operating system software (in this case Multics). Although the SSF and Multics are loosely coupled, the SSF always maintains absolute control of the hardware complex.

To understand the problems presented to Multics by this design, a brief overview of how a maintenance process is controlled in the Multics environment is necessary. The standard I/O interfacer (ioi_) and isolts_ are used as examples.

CURRENT_ISOLATION_MECHANISMS: isolts_

The ISOLATED Online Test Subsystem (ISOLTS) provides a means of testing Multics processors online, in an ISOLATED environment, using the TOLTS executive program. The target CPU must have been released from the service system by Multics prior to the test request.
The isolation portion of the isolts_ mechanism, which resides in the inner ring, then ensures that the CPU is ISOLATED from the system by performing the following tasks:

ISOLATION_STEP_1

The operator is told to reconfigure the target CPU such that it has access to one and only one SCU, with the base 64KW of memory on that SCU, for an extent of 64KW, being dedicated to the isolts_ process. The Multics Development Center would prefer that the entire reconfiguration process be performed by the software, but the current CPUs do not provide software control of the configuration panel switches. However, the isolation process has been designed to intercept manual reconfiguration errors and inform the operator of the error before any testing is initiated.

ISOLATION_STEP_2

All memory in the dedicated SCU is then removed from the system so that any reconfiguration test failures do not result in the corruption of data or a catastrophic system failure.

ISOLATION_STEP_3

The inner-ring isolation/reconfiguration logic then ensures that the target CPU is configured properly, and is indeed bound by the configuration switch settings.

ISOLATION_STEP_4

If all of these checks are successful, the rest of the memory on the SCU is returned to the system, isolts_ is allowed to proceed, and the standard 'off-line' CPU T&Ds are executed on the target CPU.

In order to run isolts_ the maintenance process must also have access to several privileged hardcore gates.

The I/O Interface

A Multics ring-4 process cannot perform I/O directly, as the CIOC instruction is classed as a privileged instruction. Therefore the maintenance process, whether it wants to use TOLTS (T&D) or a Multics tool such as load_mpc, must accomplish all I/O operations by interfacing with the standard I/O interfacer (ioi_). Unlike GCOS, Multics does not allow the Initializer's (system) console to be used to run T&Ds.
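Before turning to the I/O protocol itself, the four ISOLATION_STEP checks above can be sketched as a single sequence in which the inner-ring logic refuses to proceed unless every check passes. This is an illustration only; all names are hypothetical and the real logic lives in inner-ring Multics code.

```python
from types import SimpleNamespace

def isolate_cpu(cpu, scu, remove_memory, verify_switches, restore_memory):
    """Illustrative sketch of the four isolts_ isolation steps.
    A manual reconfiguration error is caught before any test runs."""
    # Step 1: operator reconfigures the CPU to see one and only one SCU.
    if cpu.visible_scus != [scu]:
        return "reconfiguration error: CPU sees more than one SCU"
    # Step 2: remove all memory on the dedicated SCU from the system,
    # so reconfiguration test failures cannot corrupt live data.
    remove_memory(scu)
    # Step 3: verify the CPU really is bound by the switch settings.
    if not verify_switches(cpu, scu):
        return "reconfiguration error: switch check failed"
    # Step 4: return the non-dedicated memory and allow testing.
    restore_memory(scu)
    return "isolated"

cpu = SimpleNamespace(visible_scus=["scu_a"])
log = []
result = isolate_cpu(
    cpu, "scu_a",
    remove_memory=lambda s: log.append(("remove", s)),
    verify_switches=lambda c, s: True,
    restore_memory=lambda s: log.append(("restore", s)))
assert result == "isolated"
assert log == [("remove", "scu_a"), ("restore", "scu_a")]
```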
The protocol followed by all Multics maintenance tools, including T&D, is as follows:

The target device is attached using the Resource Control Package (RCP) rcp_ or rcp_$priv_attach subroutines (the process must have access to the rcp_sys_ gate to use the $priv_attach entry). RCP determines whether or not the requesting process may attach the resource by checking its access on the access control segment for that resource. Assuming that the process has the proper access, the operator is asked to mount the requested media, if appropriate. If the target is assigned to the system or another process, RCP will deny the request.

The channel program is set up (IDCW/DCW list) and a call to ioi_$connect is made to have ioi_ perform the I/O operation. If the I/O operation is directed towards an MPC and may affect its operation (e.g., running ITRs), a call to ioi_$suspend_devices is required to allow all current I/O to be drained, and to prevent further I/O until ioi_ is notified that all other I/O may be resumed (this is accomplished via a call to ioi_$release_devices).

The ioi_ interfacer validates the first IDCW of the channel program and, if valid, performs the I/O on behalf of the maintenance process. Since the hardware will not allow a channel program to change the target of the I/O operation, this validation is only performed on the first IDCW of the channel program.

The Multics T&D interface, tolts_, follows the protocol described above. The only difference is that it translates specific MMEs in the T&D test-pages into calls to the appropriate ioi_ entrypoints.

In all cases the maintenance process is able to attach a specific system resource only if the system operator and system administrator allow it. The operator controls access to a resource by authenticating or denying the request from the maintenance process; the system administrator, by providing access to the required gates.
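The first-IDCW check described above can be illustrated as follows. The encodings here are hypothetical (real IDCWs are packed hardware words, not dictionaries); the sketch only shows why validating the first IDCW suffices when the hardware guarantees a channel program cannot redirect itself at another device.

```python
# Illustrative sketch of ioi_'s channel-program validation
# (hypothetical encodings and names).

def validate_first_idcw(channel_program, attached_device, allowed_commands):
    """Check only the first IDCW: the hardware prevents the channel
    program from retargeting the I/O at a different device."""
    first = channel_program[0]
    if first["device"] != attached_device:
        return False   # I/O aimed at a device the process never attached
    if first["command"] not in allowed_commands:
        return False   # e.g., a write where only reads were granted
    return True

prog = [{"device": "tape_01", "command": "read"},
        {"device": "tape_01", "command": "read"}]
assert validate_first_idcw(prog, "tape_01", {"read"})
assert not validate_first_idcw(prog, "disk_02", {"read"})
```

The DPS88 concern, developed below, is that FTS issues its own I/O directly, so no analogue of this validation step exists on that platform.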
Additionally, the maintenance process is bound by the fact that it resides in ring-4 and is completely controlled by the Multics kernel and the hardware ring mechanism.

In addition to all of the above, the sharing of the target resource between Multics and the maintenance process is prohibited. It is not possible to run T&D on a resource that is currently 'owned' by Multics or some other process; system storage devices (disks) are a good example. The reasoning behind this restriction is twofold: first, such sharing presents a security breach; second, it is not logical to trust a device which is believed to be broken to follow protocol and only write on the T&D cylinder, or not write anywhere at all.

These examples should make it obvious that a maintenance process, just like any other Multics process, is governed by an established set of hardware-enforced accesses and gates which may be dynamically controlled by the system administrator depending on the immediate needs of the system. The critical areas of resource ISOLATION, channel program validation, etc. are jointly controlled by the hardware and the inner-ring (most trusted) software. This design ensures the security and integrity of customer and system data while allowing concurrent maintenance on ISOLATED system resources. These abilities (validation of channel programs, hardware enforcement of access and rings, and the ability to ISOLATE the unit under test) cannot be emphasized enough.

The_SSF:

The SSF provides all facilities that are necessary to perform maintenance on the DPS88 hardware complex and the SSF itself. Some of these facilities include: the ability to dump SSF memory, update SMAS, read/write memory and cache as well as hardware registers, patch SMAS bound units and MPC firmware, write to the CIU calendar clock, and disable/clear the SMAS journal.
The SSF is also perfectly capable of performing I/O to any device on the IOX, including the servicing of any resultant interrupts, or of writing to the memory hierarchy, without the consultation or concurrence of Multics. It is not suggested that SMAS will intentionally circumvent established protocol. However, it is possible that a user of SMAS may do so intentionally or, more likely, that SMAS may not 'think' Multics is operational due to a failure in the inter-system interface, and allow direct memory hierarchy or peripheral device operations.

Access to the SSF is controlled by the SMAS log-in facility. With the exception of the system console, Multics has no knowledge of any SSF log-in activity. SSF facilities (e.g., maintenance activities, operator functions, SMAS updates/patching) are controlled by SMAS with a set of access control lists that appear to be very similar to those used in Multics. The important difference is that there is no hardware implementation in the L6 to enforce the access control lists; all access control is enforced by SMAS, which is a newly written, untested, assembly language operating system. This limited access control is diminished even further by the fact that the SSF and its users share the same primitive, single-layer file system (there is no hierarchical structure). The SSF provides maintenance and operational facilities to users concurrently, with different levels of access and privileges (multi-level access control processing/multiplexing). Since the SSF has a minimal access control mechanism and file system, it will be extremely difficult, perhaps impossible, for it to adequately control and prevent accidental or intentional modification and/or destruction of the file system or its own object code.

Maintenance on the SSF itself is accomplished using the standard CSD maintenance strategies for the L6.
A connection is made to the L6 System Control Facility (SCF), providing absolute control of the L6 to the maintainer, who may or may not be connected over a dial-up line. The SCF connects the dial-up line to the maintenance console; it makes no differentiation between the local console and the dial-up line. The only means of discerning where input came from is by perusal of the hard-copy ROSY output for a leading space. Since the ROSY printer is an option, this hard-copy audit may not be available.

When the SCF is enabled for L6 maintenance, the ICC connection to the DPS88 mainframe complex is disabled. This is to prevent any disruptions to the DPS88 system that may result from maintenance activities on the L6 itself. If an alarm is signalled by the DPS88 mainframe complex during this period, it cannot be processed until the SSF is returned to service. A reload of MOD 400 and the SMAS software is forced when the ICC interface is re-enabled after maintenance activity on the L6. However, there are no requirements or forced hardware functions which would cause the SSF disk(s) to be dismounted when the SCF is enabled. The assumption, based on SSF access control, is that the disk-resident software will not/cannot be modified when the SCF is enabled.

ERROR_RECOVERY

When a central system hardware component detects an error, it signals an alarm to the SSF. Once the alarm is signalled, the unit is usually inoperable until the SSF services the alarm and, if the error was recoverable, returns the unit to the host. The SSF interface to the DPS88 system is a bit-serial shift path controlled by the L6. Since all maintenance functions are performed over this interface, it is not inconceivable that most error retry/recovery attempts will take up to 20 seconds. The EPS states that successful retries are invisible to the process in control of the CPU at the time of the error, but does not specifically state that Multics will be informed.
Multics must be informed of this activity so that it can adjust accounting and any other pertinent data. DPS88 HW systems engineering states that Multics will be informed and that the SSF will place recovery information in the Multics address space. According to DPS88 HW systems engineering, only specific internal CPU parity errors will be retried. All CIU and memory hierarchy failures are currently treated as fatal and will result in a system crash. IOX failures are, from all available data, logged only; the SSF EPS states otherwise.

INSTRUCTION_RETRY

The EPS is not at all clear on this subject. It does not address which instructions are retryable, how retry is performed, how the memory hierarchy or CPU registers are reverted to their contents before the instruction is retried, or how/when Multics is informed of the recovery attempt. The potential for corrupting shared data bases, especially locked system data bases, is significant and requires a very detailed description and review of this operation for the Multics environment.

THE_HYPERVISOR

The hypervisor consists of software that runs in the DPS88 hardware at the most privileged level possible. SMAS may, through the auspices of the hypervisor, dispatch from one OS to another (Multics to FTS) without any negotiation whatsoever. The SSF may instantly reassign memory that is assigned to the second OS by reloading the hyper-page table. Memory may, in fact, overlap, or be taken from one OS and made available to another by manipulating the hyper-page table. This is not meant to imply that the hypervisor software or SMAS will intentionally disregard protocol, or intentionally make Multics memory available to FTS. As stated earlier, the inter-system communications interface may fail, causing SMAS to make a wrong decision. The intent is to make the reader aware of the capabilities of the SSF.
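The hyper-page table hazard can be made concrete with a small sketch. The table structures here are hypothetical stand-ins (a mapping from OS page to physical frame); the point is only that a single table reload can give FTS a live Multics frame with no negotiation and no notification.

```python
# Illustrative sketch (hypothetical structures) of hyper-page table
# overlap: if SMAS reloads the table so that FTS maps frames still
# assigned to Multics, FTS can read or write Multics memory directly.

def frames_of(hyper_page_table):
    """The set of physical frames an OS can reach through its table."""
    return {frame for frame in hyper_page_table.values()}

multics_hpt = {0: 100, 1: 101, 2: 102}   # OS page -> physical frame
fts_hpt     = {0: 200, 1: 201}

assert not (frames_of(multics_hpt) & frames_of(fts_hpt))  # disjoint: safe

# A single reload creates an overlap without Multics' knowledge:
fts_hpt[2] = 102
overlap = frames_of(multics_hpt) & frames_of(fts_hpt)
assert overlap == {102}   # FTS now shares a live Multics frame
```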
It does not appear that the problems of hyper-switching and "multi-OS" or "multi-computer" operation have been adequately resolved. The SSF EPS does not address this subject adequately for a technical evaluation. Resource sharing (e.g., CPUs, channels, memory, etc.) and the attendant accounting problems, time 'skips' and event notification are not addressed adequately at all.

SUMMARY:

To understand the Multics Development Center's concerns with the SSF, it is necessary that those unfamiliar with Multics have an understanding of the security it offers and what is expected by the majority of the Multics PARC. The reader should also be aware that Multics has been certified by the Department of Defense for multi-level processing; it is the only commercial system to achieve that rating. The Air Force, although it does not assign ratings, has declared that Multics is the most secure commercial system available. The Multics Development Center does not want to jeopardize or lose that status, a very important selling point to security-conscious commercial sites as well as the federal marketplace.

The inability to control the facilities provided by the SSF with certified hardware and software presents definite security and integrity problems for the Multics PARC, especially the FSD and National Defense Agency segments. The SSF abrogates the architectural design of Multics and is, without a doubt, the least secure element in the DPS88 system. Since the security of a system is measured by the least secure element, HW or SW, in that system, Multics will most likely acquire the classification of a Level 6.

We agree with the DPS88 developers that once access to the physical machine room is granted to someone, that person must be considered 'secure' by the site. However, there are levels of security within the machine room. The operator has specific duties and restrictions.
The operator is normally logged into the system as a standard user, and is governed by the same access controls as all other users. Although the operator does have unlimited access to the Initializer's console, the functions provided are restricted to what is necessary to operate the system. The operator's access to highly privileged system functions is restricted by the fact that he/she must know the password to enter 'admin mode'. A CSR in the machine room has the same restrictions as the operator, if not more. The CSR's process is governed by standard access controls, usually including a process overseer, established by the site, as Multics does not allow T&Ds to be run from the Initializer's (system) console.

Although anyone in a current product line machine room may push the wrong button and crash the system, it is extremely difficult, if not impossible, for them to breach security or compromise sensitive data by manipulating maintenance panels without being noticed by site personnel or the system itself. The current product line also has the capability to lock out maintenance panel functions with the test/normal switch. It is also possible to ISOLATE hardware resources to allow concurrent maintenance without affecting the security and integrity of the system.

The SSF provides a greater potential for a breach of security than the current product line because it has the capacity to access any hardware resource in the hardware complex without the knowledge of Multics. Our primary concerns with the SSF are with its unrestricted access to the entire hardware complex and with the reliability and integrity of the SSF hardware and software. MDC is also concerned with the ability of the SSF to multiplex operator and maintenance functions properly. The SSF and SMAS software must be capable of performing multi-level access control to keep the two functions/users isolated from each other.
The most current TDMs and the SSF EPS indicate that there are problems in this area, especially when one terminal is used as an operator and maintenance console concurrently. Another major concern is that the functions provided by the SSF far exceed those required of a maintenance or support facility, and are in fact much more complicated than necessary. These concerns are caused by the perceived lack of a total systems integration of the SSF, the DPS88 hardware and Multics.

To provide adequate control of the SSF and its maintenance processes, Multics must be able to validate all channel programs and I/O accesses, and be able to ISOLATE, or verify the ISOLATION of, any and all resources being tested by the SSF. Multics should also have a means of ISOLATING all of its hardware resources from the SSF. The SSF should be a passive facility, performing activities dictated by the operating system or alarm processing.

RECOMMENDATIONS:

The following recommendations address the major areas of concern. If they are accepted, the gaps in security and integrity will be minimized to the point that the Multics Development Center would no longer consider the SSF a crucial issue in developing software for the DPS88 hardware platform.

1. Eliminate peripheral and FNP testing from FTS while the host is operational. This function belongs in the domain of the host.

2. Develop a mechanism, in the hardware, that will allow the OS to lock the SSF's access to the hardware complex. This hardware lock would be reset by the hardware when an alarm is signalled to the SSF. It could be re-locked, by the OS, when the SSF notifies it that the SSF is returning the resource. To perform the necessary isolation, Multics must be able to lock access to the hardware complex on a component basis.

3. The SSF must not directly write into or read from the Multics address space while Multics is operational.
All retry information and any other data must be placed into the memory that is used for the CXI interface.

4. Reduce the HDSA CXI session control interface to a single-layer, low-level interface. There are several reasons for this; the most important is the reduction of complexity. This extremely important inter-system connection must be as simple as possible to reduce the likelihood of failure.

5. Develop a mechanism, in the hardware, that will force a write-protect or a cycle-down of the SSF disk when the SCF is enabled. This will force physical operator intervention to allow any modifications to the SSF disk.

6. Until the L6 firmware for the SCF is changed to differentiate between local and remote users, the SCF must be disabled. The SCF should be enabled only when maintenance on the L6 is being performed. This could be accomplished by using the maintenance enable switch.

7. While Multics is operational, direct logins to the SSF cannot be allowed. All maintenance activity should be routed through Multics when it is operational. The maintainer would log into Multics and be serviced by a process overseer, much as is done today. This mechanism would allow Multics to validate the user's access to commands and functions before passing them to the SSF for execution. If Multics is down, the maintainer would be able to log into the SSF directly.

8. To allow secure remote maintenance capabilities, the CSD TAC should install a Multics support system. This would address several security problems raised by TDM-RAS-125 and TDM-RAS-132. The Multics support system would obviate the need for TAC personnel to know customer phone numbers or passwords. It would also be able to enforce the various levels of access for each skill level.

9. The 'deadman' protocol used to determine the health of the SSF and Multics must be defined so that it is as fail-safe as possible.

10.
All data transmissions to or from the SSF should utilize standard communications methods to validate that the data was transmitted correctly. This is important not only for file-to-file transfers, but also for normal interactive transmissions.
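Recommendation 10 can be illustrated with a simple framing sketch. The framing format is hypothetical, and CRC-32 stands in for whatever check the link would actually use; the point is only that the receiver verifies every transmission, interactive traffic included, and rejects corrupted data for retransmission.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to an outgoing transmission."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def unframe(message: bytes) -> bytes:
    """Verify and strip the CRC; reject corrupted transmissions."""
    payload, crc = message[:-4], int.from_bytes(message[-4:], "big")
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch: retransmission required")
    return payload

msg = frame(b"syserr record")
assert unframe(msg) == b"syserr record"

# A single flipped bit is detected and the transmission rejected.
corrupted = bytes([msg[0] ^ 0xFF]) + msg[1:]
try:
    unframe(corrupted)
    assert False, "corruption went undetected"
except ValueError:
    pass
```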