Multics Technical Bulletin MTB770 MRDS Query Language To: Distribution From: John Hergert Date: 11/05/87 Subject: MRDS DSL Query Language Changes 1 Abstract This document describes a proposal for a new query language designed to replace Multics Relational Data Store's current Data Sub-Language (DSL). This new language incorporates many of the features considered mandatory for a relational database manager today. Discussion should take place in the System-M Forum meeting: >udd>Demo>dbmt>con>MRDS_Development.forum or the Ford_A meeting: >udd>MRDS>meetings>MRDS or comments should be sent to the author: via Multics Mail: Hergert@System-M Hergert on Ford_A via US Mail: John Hergert Ford Motor Company 20000 Rotunda Drive ECC Row 207-5 MD-1 Dearborn Mich. 48121 via telephone: John Hergert (313) 845-9351 _________________________________________________________________ Multics project internal working documentation. Not to be reproduced or distributed outside the Multics project without the consent of the author or the author's management. MTB770 Multics Technical Bulletin MRDS Query Language Multics Technical Bulletin MTB770 MRDS Query Language 2 Introduction This document describes a new parser and grammar as a replacement to the DSL subroutine interface to MRDS. This grammar is intended to be upward compatible with the existing query language so as to be completely transparent to any user who does not wish to be affected by it. At the same time it offers the flexibility and sophistication of the current trend in database query languages. This merger produced a hybrid query language consisting of the best points of the original language and those of the current "state of the art" query languages based on SQL. The query language is described in detail in the following sections with the BNF of the grammar listed in the appendix. MRDS will not support all of these new features when this new language is first released. The intent however, is to provide language support for any and all possible future endeavors. Some of the features, like not having to use parentheses in where clause, will be available in the first version since they come for "free" with the LALR parser generator tool used to construct the parser. MTB770 Multics Technical Bulletin MRDS Query Language 3 Language Definition 3.1 Selection Expressions A selection expression can consist of either one of two groups of text strings. The first group called, "control orders", allows a selection expression to be any of "-another, -compiled, or -current". The -another order requests the dsl retrieve entry point to return "another" tuple based on the previously supplied selection expression. The "-compiled" order requests the dsl entrypoint to use a previously compiled selection expression, and the "-current" order requests the dsl entrypoint to use the same tuple it used for the most recent call to dsl. The other group is known as a "query" and allows a selection expression to be an actual 'selection expression'. That is, the string that one uses to query a database, usually a range-select-where combination, called a query clause. A query clause may be followed by one of three set operators, "-inter, -differ, or -union" and then another query clause to form another query. A query may be followed by another query clause. Parentheses are optional and may be placed around a query. They are used to indicate precedence of set operators since none is implied. Evaluation is from left to right. Example: RSW is a range-select-where combination, or query clause. Any of the following constructs is valid. RSW RSW -inter RSW RSW -inter RSW -union RSW ((RSW) -inter (RSW)) -union (RSW -inter RSW) 3.2 Query Clause A query clause must consist of a range clause followed by a select clause. Optionally, a where clause may follow these, or a group clause may follow. If a where clause follows, then a group clause may optionally follow it. Example: These are the only constructs allowed. Multics Technical Bulletin MTB770 MRDS Query Language RANGE_CLAUSE SELECT_CLAUSE RANGE_CLAUSE SELECT_CLAUSE GROUP_CLAUSE RANGE_CLAUSE SELECT_CLAUSE WHERE_CLAUSE RANGE_CLAUSE SELECT_CLAUSE WHERE_CLAUSE GROUP_CLAUSE 3.3 Range Clause A range clause is used to define the range of relations that the selection expression will be operating on. The range clause must start with the keyword "-range". The keyword must be followed by a list of range items. Each item in the list can be represented by an open parentheses followed by a relation label followed by a relation followed by a close parentheses, or just a relation. Any item in the list may optionally be followed by a plus sign "+". This indicates that it is participating in an outer join comparison. A relation can be specified by using its name, a .V. argument substitution character used to identify a temporary relation, or a label.relation reference where the label identifies the database that the relation belongs to. Example: Any of the following constructs is valid. The current syntax: -range (p parts) The parts relation with a label and the vehicle relation: -range (p parts) vehicle The parts relation with no label: -range parts The parts and vehicle relation in an outer join: -range parts vehicle+ A temp relation and a label for the parts relation in a a foreign db named prod, in an outer join situation: -range (t .V.)+ (p prod.parts) 3.4 Select Clause A select clause is used to define what the query is to return or "select" from the database. The select clause must MTB770 Multics Technical Bulletin MRDS Query Language start with the keyword "-select". It may optionally be followed by the keyword "-dup", indicating that duplicate tuples should be returned. Then a select item list is required. This may consist of a single star "*" to indicate all attributes should be returned, or any number of attribute identifiers separated by spaces. An attribute identifier is the attributes name, or the label or relation name supplied in the range clause followed by a period "." followed by the attributes name. The second type (a.b format) is only required if the attributes name is not unique across all relations specified in the range clause. A star may be appended to this second type when defining a temporary relation, to indicate that the attribute is to be a key in the temporary relation. A select item list may also consist of expressions. See the section entitled "Expressions" for details on the construction of expressions. Expressions in select clauses may not contain .V. or .X. argument substitution characters. They must also be enclosed in parentheses. A select item list may be followed by the string "::" and some name. This name becomes an alias and can be used to rename attributes when defining temporary relations. Example: Any of the following constructs is valid. Select the attribute p, or all attributes in the relation p, depending on the context: -select p Select the attributes number and date from the relation identified by p: -select p.number p.date Select the attribute number from the relation p and use it as a key to a new temp relation. Call it new_number: -select p.number* :: new_number Select the attributes count and date from relation p. Multiply count by 2: -select (2*p.count) p.date Select the attributes name and address from the relation e. Concatenate the string "MR. " before the name, take the first Multics Technical Bulletin MTB770 MRDS Query Language 5 characters of address: -select ("MR. "||e.name) (substr(e.address 1 5)) Select the attribute number from the relation, multiply it by 3, add 2 and use it as a key for a temporary relation. Call it new_number: -select (2+3*number)*::new_number 3.5 Where Clause The where clause is used to qualify tuples that are selected. If no where clause is supplied, all tuples selected will be returned. The where clause must start with the keyword "-where" and be followed by a qualifier. A qualifier is a list of terms separated by either of the logical operator keywords "-or", or "-and". The logical operator symbols "|" and "&" are also valid. Any term can be preceded by the logical operator keyword "-not" or symbol "^". Evaluation precedence is: "not" first, "and" second, "or" last. A qualifier may optionally have parentheses around it to change the default evaluation precedence. A term has 4 valid syntactical constructs. A "comparison" construct A "like" construct A "null" construct A "between" construct In the following explanations, a reference to an expression refers to the explanation under the section entitled "Expressions". A comparison construct allows comparisons between values to be done. It must have an item followed by a comparison operator followed by another item. A comparison operator can be a relational operator or a relational operator followed by either "-any_of" or "-all_of". A comparison can also be either of the keywords "-is_in" or "-is_not_in". These are all for list operations. A relational operator is the normal "=", "^=", "<", etc. An item in a comparison construct can be an expression, or a list or constants in parentheses or a selection expression in parentheses, or the current style expression that has brackets around it. It can also be the argument substitution character ".X.". MTB770 Multics Technical Bulletin MRDS Query Language Example: Where "part" in relation p is equal to "part" in relation d: -where p.part = d.part Where the attribute "part" is not less than 123 -where part ^< 123 Where the attribute "number" is equal to either 1, 2 or 3: -where p.number -is_in (1,2,3) Where the attribute "part" times 4 plus 6 is equal to 1-6: -where 2*3+4*p.part = -any_of (1,2,3,4,5,6) Where "part" is equal to any value returned by the second selection expression: -where p.part = (-range parts -select part -where p.number = 2) The "like" construct allows regular expression searching in a where clause. It must start with an attribute definition or a function. Functions are explained in the Function section. Following these must be either of the keywords "-is_like" or "-is_not_like" and finally a regular expression string, whose syntax is yet to be defined. Example: Where the value of the attribute "part" ends in "valve": -where p.part -is_like "*valve" Where the value of the attribute "part" does not end in "valve": -where p.part -is_not_like "*valve" The "null" construct allows checking for "NULL" values in the database. This must start with an attribute definition and be followed by either of the keywords "-is_null" or "-is_not_null". Example: Where the value of the attribute "part" is null: -where p.part -is_null Where the value of the attribute "part" is not null: -where p.part -is_not_null Multics Technical Bulletin MTB770 MRDS Query Language The "between" construct allows checking to see if a value is between two others. That is, it is greater than the first but less than the second, or in the not_between case, less than the first or greater than the second. It must start with an expression, be followed by either of the keywords "-is_between" or "-is_not_between", then followed by another expression, the keyword "-and" or the operator "&", and finally another expression. Example: Where the attribute "part" is greater than 2 times the attribute "c" and less than 4 times "c": -where p.part -is_between 2*p.c -and 4*p.c Where the first two characters of the attribute "name" is between two user supplied values: -where substr(p.name, 1, 2) -is_between .v. & .v. 3.5.1 Expressions An expression consists of an item followed by an arithmetic operator followed by another item, or just an item by itself. An arithmetic operator is either a "+","-","*",or "/". Precedence of evaluation is "*" and "/" first, "+" and "-" last, done left to right. Parentheses may be used around an expression to change the order of evaluation. An item in an expression can be a function, a constant, a .V., or an attribute definition. Any of these may be followed by the symbol "||" and then another item. This symbol indicates concatenation. Any of these items may also be preceeded by either the symbol "+", or "-", to indicate a positive or negative value. An attribute definition can be either an attribute name, a relation name followed by a period then an attribute name, or a database label followed by a period, a relation name, and an attribute name. Example: Two times a user supplied value, plus 1: 2*.V.+1 MTB770 Multics Technical Bulletin MRDS Query Language The first occurrence of 0 in the foreign attribute "parts": -index(prod.p.parts, "0") Just the attribute "part": p.part Concatenate a "MR. " before the attribute "name": "MR. "||p.name 3.5.2 Functions A function must start with a function name, be followed by an open parentheses, one or more expressions separated by commas, and ended with a close parentheses. Example: The first occurrence of 2 in the attribute part: index(p.part, "2") Count the different values of the attribute "p": count(p) Return the substring of the attribute "name" starting at a user supplied value: substr(e.name, .V.) 3.6 Order Clause The order clause is used to sort the data selected by the selection expression. The order clause must start with the keyword "-order_by". It must be followed by one or more expressions separated by commas. The order in which these expressions are supplied dictates the sort order. These expressions must specify items that were selected in the select clause. Any of them may be followed by either of the keywords "-ascending" or "-descending". If neither of these are supplied "-ascending" is assumed. .V. or .X. are not allowed in the order clause. Example: Sort by the attribute "parts" in descending order: -order_by p.parts -descending Multics Technical Bulletin MTB770 MRDS Query Language Sort by the attribute "number" in ascending order and within number, sort by 2 times the attribute "parts" in descending order: -order_by p.number, 2*p.parts -descending 3.7 Group Clause The group clause is used to elevate set values specified in the select clause to scalar values. It is only useful when both set and scalar values are specified in the same select clause. The effect is to collect the many values returned by an attribute reference into separate groups. The group clause must start with the keyword "-group_by", and be followed by an expression. It may optionally be followed by a "having_clause", which serves to qualify the groups specified in the group clause. A "having_clause" is exactly the same as a WHERE clause except the initial keyword must be "-having". The substitution characters .V. and .X. are not allowed in a group clause. Example: List the average salary for all job categories: -range emp -select job (avg(salary)) -group_by job List the average salary for programmers and analysts: -range emp -select job (avg(salary)) -group_by job -having job = "PROG" | job = "ANAL" MTB770 Multics Technical Bulletin MRDS Query Language 4. Implementation Details 4.1 Introduction The new parser system consists of three logical parts. These are a control part, an error reporting part, and the actual parser part. The control part allows controlling the actions and operation of the parser and error reporting part. The error reporting part handles the condition mechanism and error message formatting and displaying mechanism. The parser part consists of the actual scanner, parser, semantics analyzer and related support routines. 4.2 Control Section A new command has been added to enable the user to more directly control MRDS. The command is called set_mrds_options and has a short name of smo. The smo command allows the setting, resetting and listing of various options relating to MRDS selection expressions. With the smo command the user has the option of turning off or on the print_search_order and no_optimize flags which are currently only settable by the -pso and -no_ot keywords in the selection expression. He also has the option of turning off or on two new flags, print_selection_expression and error_report. Turning on the print_selection_expression or pse flag causes a successfully parsed selection expression to be formatted and displayed over the user_i/o switch. The error_report or er switch enables or disables the new error reporting mechanism including the signalling of mrds_se_error_ when a selection expression error is detected. Resetting this switch causes MRDS to report selection expression errors only via the returned error code as it has always done. These switches are set per-process and consequently affect all selection expressions evaluated across all database openings in a process. They are intended mainly for debug purposes. A -reset control argument resets all of the values to their defaults, and the -list control argument displays the current settings for all of the switches. A -pso or -no_ot in the selection expression will take precedence over any setting via the smo command unless the -force control argument was used when setting the switch. The complete info segment for set_mrds_options can be found in appendix E. 4.3 Error Reporting Multics Technical Bulletin MTB770 MRDS Query Language 4.3.1 Introduction MRDS error reporting has always been a source of much controversy and many complaints. With the new parser comes an opportunity to create an error reporting mechanism that is actually helpful. To be truly helpful MRDS should offer more information than something such as "Syntax error in selection expression" when an error is discovered during the parsing of the selection expression. It should return as much accurate information as possible to the user to help him in diagnosing the problem. If MRDS "knows" what the problem is, that fact should be reported. To this end, much time was spent in designing the error reporting mechanism. The LALR parser generating system which was used to generate the new parser offers 2 methods of error detection and correction. These are local recovery and skip recovery. Local error recovery attempts to "guess" at what the orginal intent was based upon context. Skip error recovery tries to ignore the error and continue on at some point after the error where it is more sure of itself. Both of these methods are intended to detect as many errors as possible during one pass. While this is certainly an admirable and worthwhile goal, it proved to difficult and dangerous to fold into MRDS in this project time frame. The limiting factor was in integrating severity of error into MRDS and then reporting it. While it is generally safe to assume that a missing right parentheses in a range clause is harmless and should cause no problem in evaluating the selection expression, the same assumption can not be made about a missing right parentheses in the where clause. For these reasons it was decided to implement the parser with only single error detection and reporting. 4.3.2 General Overview The MRDS selection expression error reporting system will be based upon a new condition called "mrds_se_error_". This will require a change to default_error_handler_ to deal with the new condition. The condition will be signalled whenever an error is detected in the parsing of the selection expression. This can be a syntax error, or a logic error such as specifying a relation that doesnt exist. Error codes will continue to be returned as they are now. The mrds_se_error_ condition info structure is defined in appendix D. MTB770 Multics Technical Bulletin MRDS Query Language If the user does not handle the condition the default operation will be to print an error message on the error_output switch. The error message will contain a formatted version of the selection expression, the text version of the error code returned, and an optional string used to clarify the error. The error message has the following format: Error: MRDS <1> error. <2>. <3>. <4> Where <1> is derived from mrds_se_error_info.error_type and will be one of the following strings: Selection Expression Range Clause Select Clause Where Clause Where Clause Function Where Clause Expression Access Violation Internal Logic <2> will be the text version of the error code contained in mrds_se_error_info.header.status_code <3> is an optional message that is used to clarify the error. It is contained in mrds_se_error_info.error_msg <4> is the selection expression displayed in a format where keywords start on a new line and any line greater than 79 characters is broken and started on the next line. A pointer, the carat (^), is used to point at the offending token. It is derived from mrds_se_error_info.formatted_select_expr.se_ptr and mrds_se_error_info.formatted_select_expr.se_len. An error message might look like the following: Error: MRDS Range Clause error. A specified relation name is undefined in the submodel. The relation 'foo' is unknown in this opening. -range (f foo) ^ -select f.far 4.3.3 Scanner Errors The only errors the scanner will complain about are those that prevent it from returning a token. These are incomplete character strings and numbers, or tokens that are too long. All Multics Technical Bulletin MTB770 MRDS Query Language other errors are left to the parser and semantic analyzer to discover and report. All errors discovered in the scanner will be reported via the mrds_se_error_ condition mechanism. The variable mrds_se_error_info.error_type and the field labeled <1> in the error message will always contain the string "Selection Expression". The variable mrds_se_error_info.error_msg and the field labeled <3> in the error message will always contain the offending token wrapped in single quotes. 4.3.4 Parser Errors Error messages the parser can return are of two types. Parser logic errors, where the parser fails due to some internal limitation, or grammatical syntax errors. Parser logic errors should be very few and may or may not cause the mrds_se_error_ condition to be signalled. Grammar syntax errors will always cause the mrds_se_error_ condition to be signalled. The value of mrds_se_error_info.error_type will always be "Selection Expression", and the mrds_se_error_info.header.status_code will always be mrds_error_$sell_syntax (A syntax error has been detected within the selection expression). The mrds_se_error_info.error_msg will contain a message that identifies the incorrect token and labels it as a symbol, number, bit_string, string, keyword, or operator. A parser error might look like one of the following. Error: MRDS Selection Expression error. A syntax error has been detected within the selection expression. The unexpected operator "(" was encountered. -range ((a one) ^ -select a.CH1 Error: MRDS Selection Expression error. A syntax error has been detected within the selection expression. The unexpected end of the selection expression was encountered. -range (a one) -select a.CH1 -where ^ 4.3.5 Semantic Errors MTB770 Multics Technical Bulletin MRDS Query Language The semantic routines, depending on the specific error, may or may not signal the mrds_se_error_ condition. Errors returned by the semantic routines can be of any type but will generally report either unimplemented request errors, where the requested syntax has not been written, or a general usage error. The semantic routines will cause the contents of mrds_se_error_info.error_type to contain any of the Range Clause, Select Clause, Where Clause, etc. strings as describe above depending on the actual error. Generally the mrds_se_error_info.error_msg variable will contain a descriptive narrative of the problem. 4.4 Parser The new parser is designed and created with the parser generator system known as LALR. LALR has provisions for creating and debugging a grammar, and then for generating a parser and skeletons of scanner and semantic analysis procedures and related data structures to support the parser. The existing MRDS parser is fairly structured and modularized. This allows us to basically "carve" out the old parser and replace it with a new one, relatively cleanly and easily. The old parser consists of seven procedures. They are detailed here: mrds_dsl_translate called by higher level routines such as mrds_dsl_retrieve and mrds_dsl_modify. Serves to control parse. mrds_dsl_range_clause called by mrds_dsl_translate to parse and translate a range clause. mrds_dsl_select_clause called by mrds_dsl_translate to parse and translate a select clause. mrds_dsl_where_clause called by mrds_dsl_translate to parse and translate a where clause. mrds_dsl_expr called by mrds_dsl_translate to parse and translate an expression in a where clause. Multics Technical Bulletin MTB770 MRDS Query Language mrds_dsl_func called by mrds_dsl_translate to parse and translate a function in a where clause. mrds_dsl_get_token serves sa scanner and is called by all of the above procedures to return the next token in the selection expression. For the most part, the algorithms and associated working code in all of these modules is retained. The new parser consists of the following 8 procedures. They are detailed here. mrds_dsl_translate called by higher level routines such as mrds_dsl_retrieve and mrds_dsl_modify. This module handles simple selection expressions such as -compiled and -another. All other selection expressions are passed along to mrds_dsl_parser. mrds_dsl_parser This module is the parser engine. It includes the scanner and all of the state tables to make the parser function as a finite state machine. It detects parser syntax errors and causes the reporting of these and all errors detected during parsing. This is called by mrds_dsl_translate. mrds_dsl_semantics This module either does the actual work when a production is reduced by the parser or calls the appropriate utility module to do the work. This is called by mrds_dsl_parser. mrds_dsl_select_clause_ This module was the old mrds_dsl_select_clause. It was restructered but is functionally the same. Most of the internal procedures were made into entry points that are callable from mrds_dsl_semantics. mrds_dsl_where_clause_ This module was the old mrds_dsl_where_clause. It was restructered but is functionally the same. Most of the internal procedures were made into entry points that are callable from mrds_dsl_semantics. mrds_dsl_expr_ This module was the old mrds_dsl_expr. It was restructered but is functionally the same. Most of the internal procedures were made into entry points that are callable from mrds_dsl_semantics. MTB770 Multics Technical Bulletin MRDS Query Language mrds_dsl_func_ This module was the old mrds_dsl_. It was restructered but is functionally the same. Most of the internal procedures were made into entry points that are callable from mrds_dsl_semantics. mu_print_error_ This module is called to report errors. It formats the selection expression, builds the condition_info structure and signals the mrds_se_error_ condition. It is only called by mrds_dsl_parser. The old module mrds_dsl_get_token was eliminated. Parts of its functionality were included in mrds_dsl_parser$scanner and mrds_dsl_semantics. The old module mrds_dsl_range_clause was folded into mrds_dsl_semantics. The old modules mrds_dsl_where_clause, mrds_dsl_select_clause, mrds_dsl_func and mrds_dsl_expr were reorganized and had there names changed to include a trailing underscore. Their functionality and general algorithms have been retained. 4.4.1 Scanner The scanner is driven from a set of state tables that reside in mrds_scanner_tables.incl.pl1. In this include file are declarations for token_type_list which assigns a type to tokens, char_class_list which assigns classes to all characters, and a num_state_table which is used to driver number parsing. This include file is documented in Appendix F. For all practical purposes an unreserved keyword scanner could not be built for MRDS using the LALR parser generator. This being the case, the new scanner currently recognizes 56 keywords. They are declared in the include file mrds_dsl_keywords.incl.pl1 which is contained in Appendix F. The scanner tables generated by the LALR parser generator allows for synonyms to be defined for any of the keywords. This allowed us to keep the list of reserved words down but expand MRDS' vocabulary. Following is a list of the synonymns that are currently defined. Token Synonyms <= =< ^> = ^<> ^>< ^= <> >< > ^<= ^=< >= => ^< < ^>= ^=> Multics Technical Bulletin MTB770 MRDS Query Language .V. .v. .X. .x. -range -from & -and | -or ^ -not 5. Conclusion This new parser was beta tested for 6 months before going into production on all 3 of Fords Multics systems. It has been in production for over a year as of this writing. Initial start up grumblings were of two varieties. The first type actually broke applications. The new parser was designed to be upward compatible with the old parser. However the old parser, due solely to its design, allowed certain mal-formed selection expressions to be evaluated properly. Most common of these were where clauses that ended in logical operators, such as: -where (a.b=c.d) & (d.e=f.g) & Errors of this type were the result of dynamic selection expression building programs. This example while syntactily incorrect was evaluated properly by the old parser. The new parser complains about it. The other most common problem of this type centered around token delimination. The new parser requires white space around keywords where the old one did not. Thus selection expression such as: -range (a b) -select a-where a.b=2 were accepted by the old parser. The new parser will complain about the attribute a-where not being valid in the a relation. This problem is centered around the fact that MRDS allows hyphens in attribute names and also uses them to delimit keywords and serve as the subtraction operator. Large amounts of heuristics in the old parser allowed the hyphen to be interpreted properly. The new parser put the restriction that all key words must be preceeded by white space. The other two cases, subtraction and hyphens in attributes are determine heuristically, with no restrictions. The other major problem, while not fatal to applications, caused general unrest in one of our larger applications. The application was designed to "guess" at what the user wanted. In doing this, it would select relation and attribute combinations MTB770 Multics Technical Bulletin MRDS Query Language that were not always correct and build a selection expression around them. The application would then catch the error code returned from MRDS and try again with a different set of relations and attributes. This was its normal and accepted operation. When the new parser discovered the error it would report it by signalling the mrds_se_error_ condition and generating the output error message. This problem was solved by having the user turn the error reporting off via the smo -er off command. These problems were the most visible but affected a very small percentage of the user base. All of the users were amiable to changing their code to work properly with the new parser under the guise that their code was wrong and depended on inherent bugs in the old parser to function. They also were eagerly awaiting not having to use "all those parentheses". Multics Technical Bulletin MTB770 MRDS Query Language Appendix A Points of Interest These are new features that the parser is now capable of. They are not all implemented, but the framework is supplied for them. Only items 7, and 8 are implemented in the first version. 1) Sorting provisions by the ORDER clause. 2) Grouping provisions by the GROUP clause. 3) The use of expressions in the SELECT clause. 4) The use of aliases in the SELECT clause, to name new attributes in the definition of temporary relations. 5) When a specified attribute or relation is obvious, a tuple variable is not required. It is optional though. 6) Reflexive properties for WHERE clause primaries. For example: a.b=.v is the same as .v.=a.b 7) Operator precedence for logical and arithmetic operators without the use of parentheses. Parentheses are optional. 8) Expanded set of relational operators. 9) Complete regular expression searching in WHERE clause. 10) Additional expression capabilities in the WHERE clause. 11) Simplification of naming relations in the RANGE clause. Relation labels are only required when referenced attributes are not unique. 12) Outer join capabilities in the RANGE clause. 13) Multi database support when referencing attributes. 14) Expanded WHERE clause capabilities using the new keywords, -is_like, -is_null, and -is_between. 15) Expanded WHERE clause relational capabilities using the new keywords, -any_of, and -all_of. MTB770 Multics Technical Bulletin MRDS Query Language Appendix B BNF Tutorial The database query language presented here are described in a syntax known as BNF. BNF is a language that is used to describe grammars. It consists of a set of symbols known as terminals, variables and delimiters. Together, these form what are called productions. A collection of productions is known as a rule. A collection of rules form a grammar which defines the language. As an example of BNF, the following grammar defines an integer. The delimiter "::=" should be read as "is defined as", and the delimiter "|" as "or". The delimiter "!" indicates the end of the rule. The terminals in this grammar are the symbols +, -, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0. These tokens usually are not enclosed in the brackets < >. They are the lowest element in the grammar and have no further derivation. The variables are <signed_integer>, <digits> and <digit>. The start symbol is the symbol <integer>. The production and rule numbers are displayed for informational purposes only. Production number Rule number <integer> ::= <signed_integer> ! 1 1 <signed integer> ::= + <digits> | 2 2 - <digits> | 3 <digits> ! 4 <digits> ::= <digit> | 5 3 <digits> <digit> ! 6 <digit> ::= 0 | 1 | 2 | 3 | 4 | 7-16 4 5 | 6 | 7 | 8 | 9 ! Rule 1 says an <integer> is a <signed_integer>. Rule 2 says a <signed_integer> is a plus sign followed by <digits>, or a minus sign followed by <digits>, or just <digits>. Rule 3 says <digits> is a <digit>, or a <digits> followed by <digit>. Rule 4 says that a <digit> is any one of the ten terminals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The string "-123" would be identified as an integer by the following production sequence. Scanning the string from left to right, the minus would go on the stack as a terminal. The "1" would be identified by production 8 as a <digit> and then reduced Multics Technical Bulletin MTB770 MRDS Query Language to production 5, <digits>. The 2 would then be read and identified by production 9 as a <digit> and then reduced to production 6 because we already had a <digits> and 6 says we can have a <digits> followed by a <digit>. The 3 would be handled the same as the 2. Upon detecting the end of the information, production 5, <digits>, would reduce to 3, <signed_integer>, then to 1 <integer>. Any token not defined in the grammar would signal an error condition. MTB770 Multics Technical Bulletin MRDS Query Language Appendix C BNF of Grammar (* COMMENTS To add a new production to the language, just insert the production in the appropriate place in this description of the grammar. Also create a new "prod (%%%%N): return;" sequence in the correct place, where N is the relative production number in the rule. Then use the command "lalr mrds_dsl". New tables will be created, a new mrds_dsl_semantics.pl1 and a few other things. To test the new grammar use the command "lalrp mrds_dsl -trace". This invokes an interpretor that will wait for terminal input. Enter a sample grammar and end it with the string "EOI". After the grammar is proven correct the old mrds_dsl_semantics.pl1 will have to be modified to reflect the new grammar. Usually some of the productions have to be renumbered and code must be installed to handle the new ones. If a new kewyword was added then a new keyword table must also be created. This is done with the kwsl command: "kwsl mrds_dsl". It will create a new mrds_dsl.incl.pl1. This must renamed to mrds_dsl_keywords.incl.pl1 and modified by changing the size of keyword.name and adding the new level major_keyword according to the old mrds_dsl_keywords.incl.pl1 Due to a bug in the LALR compiler the production <attrubute_spec> ::= <relation>.<attribute> has to be represented as <attribute_spec> ::= <symbol>.<symbol> which is syntactically equivalent. -pso and -no_ot have been removed. Since they currently can exist anyplace in a selection expression, to maintain upward compatibility, their parsing was moved to the scanner. -another and -compiled have been removed from <control_order>. They will be handled outside of the parser. <control_order> was renamed <current_clause> *) (* SELECTION EXPRESSION *) <selection_expression> ::= <current_clause> | <query> -order_by <order_spec_list> | <query> ! Multics Technical Bulletin MTB770 MRDS Query Language <current_clause> ::= <current_header> | <current_header> <select_list> ! <current_header> ::= -current ! <query> ::= <query_expression> | <query> <set_op> <query_expression> ! <set_op> ::= -union | -inter | -differ ! <query_expression> ::= <query_clause> | ( <query> ) ! <query_clause> ::= <required_clause> | <required_clause> <optional_clauses> ! <required_clause> ::= <range_clause> <select_clause> | <select_clause> <range_clause> ! <optional_clauses> ::= <where_clause> | <group_clause> | <where_clause> <group_clause> ! <group_clause> ::= -group_by <simple_expr> | -group_by <simple_expr> -having <qualifier> ! <order_spec_list> ::= <simple_expr> | <simple_expr> <direction> | <order_spec_list> , <simple_expr> | <order_spec_list> , <simple_expr> <direction> ! <direction> ::= -ascending | -descending ! (* RANGE CLAUSE *) MTB770 Multics Technical Bulletin MRDS Query Language <range_clause> ::= -range <range_def_list> ! <range_def_list> ::= <sql_range_def_list> | <dsl_range_def_list> ! <sql_range_def_list> ::= <sql_range> | <sql_range_def_list> , <sql_range> ! <sql_range> ::= <sql_range_item> | <sql_range_item> (+) ! <sql_range_item> ::= <relation_id> | <relation_id> <relation_label> ! <dsl_range_def_list> ::= <dsl_range_item> | <dsl_range_def_list> <dsl_range_item> ! <dsl_range_item> ::= ( <relation_label> <relation_id>) ! <relation_id> ::= <relation> | <v_arg_substitution> | <db_label>.<relation> ! (* SELECT CLAUSE *) <select_clause> ::= <select_header> * | <select_header> <select_list> ! <select_header> ::= -select | -select -distinct | -select -dup ! Multics Technical Bulletin MTB770 MRDS Query Language <select_list> ::= <select_item> | <select_list> <select_item> ! <select_item> ::= <select_id> | <select_id> :: <alias> ! <select_id> ::= <attribute> | ( <simple_expr> ) | ( <simple_expr> ) * | <relation> . <attribute> | <relation> . <attribute> * ! (* WHERE CLAUSE *) <where_clause> ::= -where <qualifier> ! <qualifier> ::= <qualifier_term> | <qualifier> <or_op> <qualifier_term> ! <qualifier_term> ::= <qualifier_factor> | <qualifier_term> <and_op> <qualifier_factor> ! <qualifier_factor> ::= <predicate> | <not_op> <predicate> ! <predicate> ::= <term> | (<qualifier>) ! <term> ::= <expr_or_query> <comp_op> <expr_or_query> | <expr_or_query> <is_in> ( <query_literal> ) | <like_item> <like> <regular_expr> | <attribute_spec> <is_null> | <expr> <between> <expr> <and_op> <expr> ! <expr_or_query> ::= ( <query_literal> ) | [ <old_func_or_expr> ] | MTB770 Multics Technical Bulletin MRDS Query Language <expr> | <x_arg_substitution> ! <query_literal> ::= <query> | <constant_list> ! <old_func_or_expr> ::= <old_function> | <old_expr> ! <like_item> ::= <attribute_spec> | <function> ! <is_in> ::= -is_in | -is_not_in ! <like> ::= -is_like | -is_not_like ! <is_null> ::= -is_null | -is_not_null ! <between> ::= -is_between | -is_not_between ! <comp_op> ::= <rel_op> | <rel_op> -any_of | (* ^= -any_of is always true *) <rel_op> -all_of ! (* = -all_of is always false *) <rel_op> ::= = | ^= | > | >= | '< | '<= ! <regular_expr> ::= <expr> | <x_arg_substitution> ! (* EXPRESSIONS *) (* SIMPLE EXPR I.E. no arg substitution *) Multics Technical Bulletin MTB770 MRDS Query Language <simple_expr> ::= <simple_arith_term> | <simple_expr> <add_op> <simple_arith_term> ! <simple_arith_term> ::= <simple_arith_factor> | <simple_arith_term> <mult_op> <simple_arith_factor> ! <simple_arith_factor> ::= <simple_primary> | <simple_arith_factor> '|'| <simple_primary> | <add_op> <simple_primary> ! <simple_primary> ::= <attribute_spec> | <function> | (<simple_expr>) | <constant> ! (* STANDARD EXPR I.E. arg substitution *) <expr> ::= <arith_term> | <expr> <add_op> <arith_term> ! <arith_term> ::= <arith_factor> | <arith_term> <mult_op> <arith_factor> ! <arith_factor> ::= <primary> | <arith_factor> '|'| <primary> | <add_op> <primary> ! <primary> ::= <attribute_spec> | <function> | (<expr>) | <constant> | <v_arg_substitution> ! (* OLD EXPR I.E. BRACKETS AROUND EXPRESSIONS *) <old_expr> ::= <old_primary> <arith_op> <old_primary> ! MTB770 Multics Technical Bulletin MRDS Query Language <old_primary> ::= <relation>.<attribute> | <old_function> | <constant> | ( <old_expr> ) | <v_arg_substitution> ! <old_function> ::= <fn_name> (<old_arg_list>) ! <old_arg_list> ::= <old_arg> | <old_arg_list> <old_arg> ! <old_arg> ::= [ <old_func_or_expr> ] | <old_function> | <relation>.<attribute> | <constant> | <v_arg_substitution> ! (* COMMON PRODUCTIONS *) <function> ::= <fn_name> (<arg_list>) ! <arg_list> ::= <expr> | <arg_list> , <expr> ! <attribute_spec> ::= <attribute> | <symbol> . <symbol> | <db_label> . <relation> . <attribute> ! <constant_list> ::= <constant_item> , <constant_item> | <constant_list> , <constant_item> ! <constant_item> ::= <constant> | <v_arg_substitution> | <x_arg_substitution> ! <constant> ::= <string> | <integer> | <bit_string> ! <arith_op> ::= <mult_op> | <add_op> ! Multics Technical Bulletin MTB770 MRDS Query Language <add_op> ::= + | - ! <mult_op> ::= * | / ! <and_op> ::= & ! <or_op> ::= '| ! <not_op> ::= ^ ! <v_arg_substitution> ::= .V. ! <x_arg_substitution> ::= .X. ! <db_label> ::= <symbol> ! <relation> ::= <symbol> ! <relation_label> ::= <symbol> ! <attribute> ::= <symbol> ! <fn_name> ::= <symbol> ! <alias> ::= <symbol> ! MTB770 Multics Technical Bulletin MRDS Query Language Appendix D mrds_se_error_ Info Condition Structure This include file contains the info structure for the "mrds_se_error_" condition. This condition is signalled by MRDS when certain errors are discovered in a selection expression. 1 mrds_se_error_info aligned based (mrds_se_error_info_ptr), 2 header aligned like condition_info_header, 2 error_type char (32), /* range, select... */ 2 error_msg char (256), /* informational msg */ 2 token char (mrds_data_$max_token_size), /* last known token */ 2 raw_select_expr, 3 se_ptr ptr, /* ptr to the actual se */ 3 se_len fixed bin, /* length of actual se */ 3 token_position fixed bin, /* where the token starts in raw_select_expr */ 2 formatted_select_expr, 3 se_ptr ptr, /* ptr to "pretty" se */ 3 se_len fixed bin, /* length of "pretty" se */ 3 token_position fixed bin; /* where the token starts in formatted_select_expr */ Multics Technical Bulletin MTB770 MRDS Query Language Appendix E set_mrds_options Info Segment 09/04/85 set_mrds_options, smo Syntax: set_mrds_options -control_args Function: Allows the setting, reseting and displaying the state of various options relating to MRDS selection expressions. Control arguments: -force, -fc Specifies that options existing in a MRDS selection expression are to be overridden by those specified by the set_mrds_option command. The default is to not override options specified in a selection expression. -optimize, -ot off | on Controls the optimization of a MRDS selection expression via the MRDS optimizer. The default is to optimize the selection expression. -print_search_order, -pso off | on Controls the printing of the search order when a MRDS selection expression is evaluated. The default is to not print the search order. -print_selection_expression, -pse off | on Causes all selection expressions that are not "-another", "-compiled", or "-current" to be displayed when they are translated. The default is to not print the selection expression. -error_report, -er off | on Controls how selection expression errors are reported to the user. This is done either by just returning a code, (-er off), or by by a long diagnostic message (-er on). This control argument does not affect error reporting that is done via sub_err_ signalling. -list, -ls Displays the state of all of the switches. -reset, -rs Resets the option values to their defaults. See the section on Default values. MTB770 Multics Technical Bulletin MRDS Query Language Default values: The default values that MRDS uses if set_mrds_options is never invoked, or it is invoked with the -reset control argument are: -print_search_order OFF -optimize ON -print_select_expression OFF -error_report ON -force OFF Notes: The command "smo -pso on" will cause all MRDS selection expressions evaluated in a process to be done as if the keyword "-pso" was actually in each selection expression. This behaviour will continue until the command "smo -pso off" is executed or a new process is initiated, either via new_proc or a logout;login sequence. The command "smo -ot off" will cause all MRDS selection expressions evaluated in a process to be done as if the keyword "-no_ot" was actually in each selection expression. This behaviour will continue until the command "smo -ot on" is executed or a new process is initiated, either via new_proc or a logout;login sequence. If the -force argument is not provided the options specified are used only as default values. That is, they do not affect selection expressions that have defined the options in the selection expression. The -reset and -list arguments are incompatible with any other control arguments. Multics Technical Bulletin MTB770 MRDS Query Language Appendix F mrds_scanner_tables.incl.pl1 /* mrds_scanner_tables.incl.pl1 These are the tables that drive the scanner for MRDS selection expression parsing. Originally created 08-04-85 J. Hergert The following declaration is used by the scanner to type the token it is about to parse. The first character determines what the token could be. The codes in the following declaration are deciphered as follows: Any code > 0 is a token type, indicating the type of token we will try to find. Codes less than 0 are negated keyword encode values. These token types are simple one character tokens and no further searching need be done when we find one. */ dcl token_type_list (22) fixed bin internal static options(constant) init ( /* CHARACTER CLASSES 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 num . +- Ee i ^ = < > let * / | ( ) & , 17 18 19 20 21 22 whsp " oth : ] [ */ 1, 2, 3, 4, 4, 5, 6, 7, 8, 4, 14, 9, 14, 14, 14, 14, 10, 11, 12, 13, 14, 14); /* The following dcl sets up a list that is used to categorize each character into one of 22 classes. The classes are defined below. The character classes are the following: 1 digits 2 . 3 +- 4 Ee 5 i 6 ^ 7 = 8 < 9 > 10 letters MTB770 Multics Technical Bulletin MRDS Query Language 11 * / 12 | 13 ( 14 ) 15 & 16 , 17 SP TAB NL FF VT CR (white space) 18 " 19 all others 20 : 21 ] 22 [ */ /* each of 512 ascii characters classified into the above groups */ dcl char_class_list (0:511) fixed bin internal static options(constant) init( /* whsp ! " #$% & ' ( ) * + , - */ (9)19, (5)17, (18)19, 17, 19, 18, (3)19, 15, 19, 13, 14, 11, 3, 16, 3, /* . / nums : ; < = > ? @ ABCD E FGH I J-U */ 2, 11, (10)1, 20, 19, 8, 7, 9, 19, 19, (4)10, 4, (3)10, 5, (12)10, /* V W X Y Z [ ] ^ _ ` abcd e fgh i j-u */ 10, 10, 10, 10, 10, 22, 19, 21, 6, 19, 19, (4)10, 4, (3)10, 5, (12)10, /* v w x y z { | */ 10, 10, 10, 10, 10, 19, 12, (387)19); /* the table declared below is a state table used to parse a number. The rows are states, and the columns are character classes. These are obtained from the char_class_list above. The columns are organized so that the six correspond to classes 1-6 above. A positive value in the table is the next state to goto, given the current character class. A negative value means the scan is finished. -1 means the scan is finished, a token has been found. -2 means an error has been detected. -3 means the scan is finished, a token has been found, the cursor must be bumped by one. */ dcl num_state_table (6,6) fixed bin internal static options (constant) init( /* class 1 2 3 4 5 6 num . +- Ee i other states */ Multics Technical Bulletin MTB770 MRDS Query Language /* 1 */ 1, 2, -1, 3, -3, -1, /* 2 */ 4, -2, -2, -2, -2, -2, /* 3 */ 6, -2, 5, -2, -2, -2, /* 4 */ 4, -1, -1, 3, -3, -1, /* 5 */ 6, -2, -2, -2, -2, -2, /* 6 */ 6, -1, -1, -1, -3, -1); MTB770 Multics Technical Bulletin MRDS Query Language Appendix G mrds_dsl_keywords.incl.pl1 /* BEGIN INCLUDE FILE ..... mrds_dsl_keywords.incl.pl1 ..... 03/17/87 1851.1 est Tue kwsl (generated) */ /* Keywords from >user_dir_dir>MRDS>Hergert>p>mrds_dsl.grammar Generated from >udd>MRDS>Hergert>p>mrds_dsl.lalr by Hergert.MRDS.a at Ford ECC Multics A on 03/17/87 1825.8 est Tue */ /* this is the list of keywords and their synonyms that are known to the scanner. */ dcl 1 keyword aligned internal static options (constant), 2 name (74) unaligned char (24) init( /* 1 */ "&", /* 1 & */ /* 2 */ "(", /* 2 ( */ /* 3 */ "(+)", /* 3 (+) */ /* 4 */ ")", /* 4 ) */ /* 5 */ "*", /* 5 * */ /* 6 */ "+", /* 6 + */ /* 7 */ ",", /* 7 , */ /* 8 */ "-", /* 8 - */ /* 9 */ "-all_of", /* 9 -all_of */ /* 10 */ "-and", /* 1 & */ /* 11 */ "-any_of", /* 10 -any_of */ /* 12 */ "-ascending", /* 11 -ascending */ /* 13 */ "-current", /* 12 -current */ /* 14 */ "-descending", /* 13 -descending */ /* 15 */ "-differ", /* 14 -differ */ /* 16 */ "-distinct", /* 15 -distinct */ /* 17 */ "-dup", /* 16 -dup */ /* 18 */ "-from", /* 33 -range */ /* 19 */ "-group_by", /* 17 -group_by */ /* 20 */ "-having", /* 18 -having */ /* 21 */ "-inter", /* 19 -inter */ /* 22 */ "-is_between", /* 20 -is_between */ /* 23 */ "-is_in", /* 21 -is_in */ /* 24 */ "-is_like", /* 22 -is_like */ /* 25 */ "-is_not_between", /* 23 -is_not_between */ /* 26 */ "-is_not_in", /* 24 -is_not_in */ /* 27 */ "-is_not_like", /* 25 -is_not_like */ /* 28 */ "-is_not_null", /* 26 -is_not_null */ /* 29 */ "-is_null", /* 27 -is_null */ /* 30 */ "-no_optimize", /* 28 -no_optimize */ /* 31 */ "-no_ot", /* 29 -no_ot */ /* 32 */ "-not", /* 53 ^ */ /* 33 */ "-or", /* 55 | */ /* 34 */ "-order_by", /* 30 -order_by */ Multics Technical Bulletin MTB770 MRDS Query Language /* 35 */ "-print_search_order", /* 31 -print_search_order */ /* 36 */ "-pso", /* 32 -pso */ /* 37 */ "-range", /* 33 -range */ /* 38 */ "-select", /* 34 -select */ /* 39 */ "-union", /* 35 -union */ /* 40 */ "-where", /* 36 -where */ /* 41 */ ".", /* 37 . */ /* 42 */ ".V.", /* 38 .V. */ /* 43 */ ".X.", /* 39 .X. */ /* 44 */ ".v.", /* 38 .V. */ /* 45 */ ".x.", /* 39 .X. */ /* 46 */ "/", /* 40 / */ /* 47 */ "::", /* 41 :: */ /* 48 */ "<", /* 42 < */ /* 49 */ "<=", /* 43 <= */ /* 50 */ "<>", /* 54 ^= */ /* 51 */ "<bit_string>", /* 44 <bit_string> */ /* 52 */ "<integer>", /* 45 <integer> */ /* 53 */ "<string>", /* 46 <string> */ /* 54 */ "<symbol>", /* 47 <symbol> */ /* 55 */ "=", /* 48 = */ /* 56 */ "=<", /* 43 <= */ /* 57 */ "=>", /* 50 >= */ /* 58 */ ">", /* 49 > */ /* 59 */ "><", /* 54 ^= */ /* 60 */ ">=", /* 50 >= */ /* 61 */ "[", /* 51 [ */ /* 62 */ "]", /* 52 ] */ /* 63 */ "^", /* 53 ^ */ /* 64 */ "^<", /* 50 >= */ /* 65 */ "^<=", /* 49 > */ /* 66 */ "^<>", /* 48 = */ /* 67 */ "^=", /* 54 ^= */ /* 68 */ "^=<", /* 49 > */ /* 69 */ "^=>", /* 42 < */ /* 70 */ "^>", /* 43 <= */ /* 71 */ "^><", /* 48 = */ /* 72 */ "^>=", /* 42 < */ /* 73 */ "|", /* 55 | */ /* 74 */ "||"), /* 56 || */ /* This table contains the codes that are known to the parser. One code per token. Synonomous tokens have the same code. */ 2 value (74) fixed bin init ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 10, 11, 12, 13, 14, 15, 16, 33, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 53, 55, 30, 31, MTB770 Multics Technical Bulletin MRDS Query Language 32, 33, 34, 35, 36, 37, 38, 39, 38, 39, 40, 41, 42, 43, 54, 44, 45, 46, 47, 48, 43, 50, 49, 54, 50, 51, 52, 53, 50, 49, 48, 54, 49, 42, 43, 48, 42, 55, 56), /* This table is used for error reporting. It decides that the keyword is a "major" one. These currently are -differ, -range, -group_by, -having, -inter, -order_by, -from, -select, -union, -where */ 2 major_keyword (74) bit unaligned init ( /* 1-7 */ "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, /* 8-14 */ "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, /* 15-21 */ "1"b, "0"b, "0"b, "1"b, "1"b, "1"b, "1"b, /* 22-28 */ "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, /* 29-35 */ "0"b, "0"b, "0"b, "0"b, "0"b, "1"b, "0"b, /* 36-42 */ "0"b, "1"b, "1"b, "1"b, "1"b, "0"b, "0"b, /* 43-49 */ "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, /* 50-56 */ "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, /* 57-63 */ "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, /* 64-70 */ "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, "0"b, /* 71-74 */ "0"b, "0"b, "0"b, "0"b); /* END INCLUDE FILE ..... mrds_dsl_keywords.incl.pl1 ..... */ Page 38