MULTICS TECHNICAL BULLETIN MTB-621 To: MTB Distribution From: Roger D. Lackey, L. A. Dupuis, Ronald B. Harvey, Ronald A. Barstad, Noah S. Davids, Michael S. Kubicar, David J. Schimke, Craig L. Senft, Paul W. Benjamin Date: 05/11/83 Subject: An Architecture for End User Data Access Forum Meeting: >udd>Demo>dbm_test>con>End_User_Data_Access (euda) ABSTRACT This paper proposes a long range plan for the implementation of a facility that will provide an end-user interface to various types of files, including but not restricted to, MRDS databases, Lister databases, ASCII text files, and structured files. Major aspects are: o A new End User Facility (EUF) to replace LINUS. o Retrievals across databases and across database types. o A selection expression parser that uses the SQL selection language, and produces Canonical Query Language (CQL). o A single uniform subroutine interface to MRDS and Lister databases, replacing dsl_ and lister_. _________________________________________________________________ Multics Project internal working documentation. Not to be reproduced or distributed outside the Multics Project. MTB-621 End User Data Access LINUS Starting from the top, let's look at LINUS. What does it do? And what are the alternatives? It processes requests, or at least used to do so. The subsystem utilities now do that for the most part. It offers a macro facility via the invoke request. The exec_com feature supplied by ssu_ does a much better job of that. It offers LILA. What does LILA do? It performs very crude editing functions and parses selection expressions. There is a proposal for a common parser that would support the MRDS, LINUS and SQL selection languages. Editors such as ted or qedx do a much better job of editing than LILA could hope to accomplish. LINUS produces reports, sort of. There is an interface to MRPG, but MRPG is generally recognized not to be a good interface for end users. There is a method of creating Lister files, but the user is left with escaping from LINUS and using Lister commands directly to produce reports. There is the LINUS print request, which merely displays the data in an inflexible format. And then there is the new LINUS report writer, which is much better than any of these, and fortunately modularized so that it could be extricated from LINUS and stand on its own. And finally, LINUS accesses MRDS databases. In some cases, it communicates properly with the appropriate MRDS subroutine entrypoints. In others, it does the inexcusable and looks at MRDS files and structures directly. The New EUF We propose a new LINUS, similar in task, simpler in design, and broader in scope. The standard subsystem tasks would be handled by ssu_, the selection expression editing by the user's editor of choice, the selection expression parsing by a common selection expression parser, report generation by the (formerly LINUS) report writer and file access by something we will call the View Manager. Precisely what that EUF would look like will be the topic of another paper. There are various approaches that could be taken. One is to have it essentially similar to LINUS. Another is to mimic IBM's ISQL interface (or MEDC's IQ which is similar). That point need not be worked out at this stage of the discussion. End User Data Access MTB-621 View Manager At the center of this proposal is something called the View Manager (VM). VM will act as a traffic cop of sorts for database accesses. It will have entrypoints such as open, close, get, put, get_description, get_population, etc. These entrypoints are named for sake of argument, not as an actual proposal, the actual implementation should include the full repertoire of database functions. VM's task includes determining the type (MRDS, Lister, etc.) of the database in question and routing the request to the appropriate database manager. To simplify VM's task, those routines that want to be supported by VM will provide a uniform set of entrypoints that is a subset of the VM entrypoints. This subset will consist of primitives for database manipulation. For example, VM will have an "open" entrypoint, as will MRDS, Lister, et al. More exotic entrypoints, however, such as one that would parse a selection expression and retrieve a tuple in one call, would be provided only in VM. There are 2 reasons for this. It isolates as much functionality as possible in VM, making it unnecessary to add new features in more than one place. It also makes VM more palatable as the "approved" interface. The dsl_ interface to MRDS would continue to be supported for some period of releases, but no work beyond bugfixes would be applied to them. The direct subroutine interfaces to MRDS, Lister, etc. would be "internal interfaces". The VM interface would be the approved, recommended, documented interface. This is not to say, however, that the Lister commands, for example, would not continute to be supported. So, in the simple case, the user asks the EUF to open a MRDS database, EUF calls, for example, VM$open who sees that the object in question is MRDSish and in turn calls mrds_$open. Whenever the task involves only one database and involves nothing more than the primitive operation, VM acts as a transfer vector of sorts. Note however, that EUF cares not what type of object is being manipulated. It can be MRDS, Lister, whatever. EUF handles it in the same fashion. So, with this simple approach, the EUF is made much more powerful. Now, consider the situation where the user has opened more than one database and requests a retrieval whose selection criterion involves more than one database. The parser translates the expression to Canonical Query Language (discussed in more detail later in this paper) which is returned to the EUF. EUF then calls VM$get (or whatever) with this CQL. VM determines that the selection involves more than one database and breaks the CQL down into appropriate parts. Retrievals are made from both databases, comparisons are made by VM, and the requested data is returned to the user. MTB-621 End User Data Access Canonical Query Language A selection expression parser reads user-supplied selection expressions, e.g. "select foo from bar" and translates it into a form suitable for processing by the database manager's selection mechanism. Currently on Multics we have such parsers for MRDS, LINUS and Lister (plus tr_query but this proposal does not speak to that). Each accepts a different selection language as input and creates a different internal representation as output. This paper proposes that the various parsers produce the same internal representation as output. This internal representation is known as Canonical Query Language or CQL. There are several relational or quasi-relational database packages under development for various Honeywell operating systems. Work is underway on DPS6 in Billerica, DPS7 in France, GCOS 3/8 at the Multi-Environment Development Center (MEDC) in Phoenix, CP-6 in Los Angeles and at CCSC in Minneapolis. There has been a fair amount of contact between these developers in hopes of establishing a Canonical Query Language. The idea was that any user-visible selection language should/would be reduced to an internal representation that would be common to all. The MRDS group in Phoenix was not directly involved in these discussions. The effort to establish a CQL that spanned product lines was singularly unsuccessful. One group was not far enough along in their work to agree to the proposal of another while a third could not wait for resolution, having to get a product out the door. While there exists no agreement, there still appears to be a sincere desire to conform to such a standard should it be established. The MEDC group in Phoenix, in developing RAM for GCOS, was unable to wait and established their own protocol. RAM has been announced and was demonstrated at the Toronto HLSUA (4/83). The original intent of this proposal was to attempt to conform to the Honeywell standard. Since their is no such standard, the proposal is now to look at the RAM CQL and attempt to use that. Initial contact with MEDC has indicated a willingness to make some modifications to their CQL if it was necessary to do so for MRDS. This CQL could well provide a basis for the hoped-for company-wide standard. Company-wide standardization is a lofty ideal and one that should not get lost in this process. However, if we were unable to come to an agreement even with the GCOSians, a Multics-only implementation would still be a win. It would make it easier to implement new end-user interfaces such as the Query By Example package that is wanted so badly by Marketing. The Multics implementation should be structured in a fashion that would allow modifications if and when a Honeywell standard were chosen. End User Data Access MTB-621 Parsers Regardless of what CQL is chosen (even if Multics rolls its own), there would be a new parser. The language that this parser would process would be essentially the ISQL language, but the parser would be equipped to handle both the existing LINUS and MRDS languages. Our documentation would be upgraded to describe only the ISQL dialect, as that is rapidly becoming the industry standard. This proposal is not new. Jim Gray wrote a paper in 1981 describing it and at least one BNF grammar exists. It would be implemented using LALR. The parser would serve 3 purposes: 1) It would be used by the new EUF to process selection expressions. Since the proposal calls for MRDS and Lister to be supported under EUF/VM, both MRDS and Lister would have to be taught to accept CQL. This would mean that the user would use the ISQL language for Lister selections within the EUF. Since the existing Lister command interface would have to be retained, at least for a period of time, it might also be necessary to alter lister_compile_select_ to produce CQL. If there was strong sentiment for the Lister commands to continue indefinitely, it may be useful to have them optionally accept either the existing selection syntax or the new syntax. 2) It would be used by VM itself to process selection expressions. Currently, the user can call dsl_$retrieve to get a tuple, passing in an unparsed selection expression. Although the EUF would not use the capability, such functionality must be provided to the user. Therefore, for each function in VM that required selection criterion (retrieve, modify and delete operations, whatever they are called) there would be 2 entrypoints, one that took CQL as input and one that took an unparsed selection expression. In the latter case, VM would have to call the parser. 3) The parser would be available as a system subroutine for anyone wishing to write an application that accepted the standard syntax as input and desired CQL. MTB-621 End User Data Access The Database Managers MRDS and Lister have been used as examples of facilities that would supply the appropriate entrypoints for access by VM. The conversion of these modules to provide the entrypoints is a mandatory part of this proposal. MRDS has a myriad of subroutine entries, but unless the dsl_ entries are used as a model for the definition of the VM entries (not likely), new entrypoints will have to be provided. Lister, for all practical purposes, has no subroutine interface. You can't do a whole lot with lister_. It exists only for the purposes of the LINUS create_list request. Two more examples of system routines for file handling are for ASCII files and structured files. A manager (with appropriate entrypoints) could easily be written to allow support of ASCII files under VM, perhaps having a file defined like card images (the first 2 positions are field A, the next 7 field B, etc.), or perhaps using delimiters a la the LINUS write and store requests. A structured file would be one not limited to ASCII data which could be described by a PL/1 structure declaration. The potential for support of other objects is intriguing. An interface to RDMS or JANUS would be feasible. Anybody who can provide the appropriate entrypoints can conceivably have their package and database type supported under VM. Consider a subsystem that communicated with the Intel database machine. So long as the interface is correct, it works. Questions Needing Answers This paper does not purport to have all of the details worked out as to how such an architecture would be implemented. We are aware of several things that need to be worked out. They are presented here to stimulate discussion and will hopefully be resolved in future MTBs: 1) When the users have more than one database open and are doing cross-database activities, what they are conceptually accessing is a "virtual database" or "view". How should this view be defined? How should it be established? There are potential naming conflicts within a view (relations/files and/or attributes/fields with the same name). How should these conflicts be resolved? One approach (best we've come up with this far) is to have a view identified by some sort of descriptor. This descriptor could be a separate segment ("open foo.view"). It could be an entry in a data dictionary (the authors shudder to use that term since it can mean so many things). Or perhaps a data dictionary could be just an archive of .view segments. Taking the single segment approach for sake of argument, it could created by a create_view command that took as input ASCII End User Data Access MTB-621 source and "compiled" the view. Any name conflicts would be flagged as errors. There could also be a command to recreate the ASCII source from the compiled version. Within the EUF, perhaps the user could open several separate databases and then issue the "establish_view" request. That request would create a view, prompting the user for resolution of naming conflicts, and then open the view. Another useful request would be "save_view" request where this dynamically created view could be preserved in a .view segment (or data dictionary or whatever we decide on). 2) Let's start thinking of any opening (single database, for example) as a view. The user can have more than one view open at a time. How will the view to be acted upon be identified by the user in a request. Database indices (view indices?) are not to thrilling from a Schneidermanian viewpoint. How about reference names of sorts, i.e. "open foo" opens foo and then the user refers to it as foo in subsequent requests, whereas "open foo bar" opens foo which is refered to as bar. Is it cumbersome to have to specify the view name in all requests? Probably. How about a use request? That speaks to the EUF, how about the subroutine interface to VM? Should we use view indices, a la MRDS? 3) As implied earlier, we have no idea what the EUF will look like. Should it look much like LINUS (and Multics, for that matter) and have ctl_args, standard request names, etc., for compatibility with other Multics subsystems or should it follow a different drummer? A case could certainly be made by Marketing for it to look like ISQL. When the LINUS Report Writer design review was held, the consensus was that it should use ctl_args where the original design said no. Does that mean that the same logic calls for this new EUF to look like the rest of Multics? Probably not. A good case for ctl_args in LRW was that it was part of a subsystem that already used clt_args. If an ISQL-like or other human factors oriented interface was chosen, it would only need to be internally consistent with regard conventions. So, what it probably boils down to is who the audience for this EUF will be. Will it be primarily for systems programmers? Engineers? Clerical people? Seasoned Multicians? And so on. Should it be a video and/or menu interface? If is not an interface that requires intelligent video terminals, should it have such things built into it as options or alternate approaches or should a video/menu interface be a separate EUF?