Multics Technical Bulletin MTB-729 Basic Math Functions To: Distribution From: Hal Hoover Date: 28 August 1985 Subject: BFP and HFP Basic Mathematical Functions 1. Abstract This MTB presents a discussion of the algorithms used in rewriting several ALM coded math functions presently in bound_library_wired_. The paper describes the algorithms used to implement the single and double precision, Binary Floating Point (BFP) and Hexadecimal Floating Point (HFP) basic mathematical functions: inverse sine and inverse cosine, inverse tangent, exponential, logarithm, sine and cosine, square root, tangent and cotangent. It is a more detailed discussion of the numerical analysis methods mentioned in MTB 695 - ALM Math Routines. _________________________________________________________________ Multics project internal documentation; not to be reproduced or distributed outside the Multics project. MTB-729 Multics Technical Bulletin Basic Math Functions Comments on this MTB should be sent to the author - via Multics mail to: Hoover.Multics on System M via posted mail to: Hal Hoover Advanced Computing Technology Centre The University of Calgary Foothills Professional Building Room #301, 1620 - 29th Street N.W. Calgary, Alberta T2N 4L7 CANADA via telephone to: (403)-270-5400 Multics Technical Bulletin MTB-729 Basic Math Functions TABLE OF CONTENTS Section Page Subject ======= ==== ======= 1 i Abstract 2 1 Overview 3 3 Minmax Polynomials 4 5 Minmax Rational Functions 5 9 Accuracy 6 11 The Algorithms 6.1 11 . . 'arc_sine_' 6.2 18 . . 'double_arc_sine_' 6.3 24 . . 'arc_tangent_' 6.4 33 . . 'double_arc_tangent_' 6.5 39 . . 'exponential_' 6.6 45 . . 'double_exponential_' 6.7 51 . . 'logarithm_' 6.8 59 . . 'double_logarithm_' 6.9 66 . . 'principal_angle_' 6.10 72 . . 'double_principal_angle_' 6.11 75 . . 'sine_' 6.12 80 . . 'double_sine_' 6.13 84 . . 'square_root_' 6.14 89 . . 'double_square_root_' 6.15 92 . . 'tangent_' 6.16 98 . . 'double_tangent_' Multics Technical Bulletin MTB-729 Basic Math Functions 2. Overview This paper describes the algorithms used to implement the single and double precision, Binary Floating Point (BFP) and Hexadecimal Floating Point (HFP) basic mathematical functions: inverse sine and inverse cosine, inverse tangent, exponential, logarithm, sine and cosine, square root, tangent and cotangent. The routines which implement these basic math functions are coded in ALM. Although this makes the routines harder to maintain, it makes for much faster code. It also provides improved accuracy by virtue of the ability to keep critical values in the overlength EAQ. This paper does not document the ALM routines which implement the basic math functions, but rather the algorithms used in those routines. For each function, the derivation of the algorithms to approximate that function in single and double precision, BFP or HFP mode is described. The PL/I implementation for each BFP routine is given. If a corresponding HFP routine requires a different algorithm, a pseudo-PL/I implementation (since PL/I does not support HFP) is also given. The ALM routines which actually implement the basic math routines are essentially translations of the PL/I routines presented here. The algorithms for implementing the various math functions consist of three parts: 1. Perform the appropriate type of error handling if the argument of the function is invalid. For example, in BFP we can not calculate the value of 'exp(x)' if 'x' is bigger than about 88.029 because the result would be larger than the largest BFP number; thus, we simply signal an overflow and return the largest BFP number for the value of the function. MTB-729 Multics Technical Bulletin Basic Math Functions 2. Use mathematical properties of the function to reduce the calculation of the function to that of a more easily evaluated function. For example, the symmetry and periodicity of the sine function make it very easy to reduce the calculation of the sine of an arbitrary number of radians to that of a derived number of radians between -pi/2 and pi/2. 3. Provide an efficient and accurate method of approximating the target function of step 2. For example, the sine function can be approximated to single precision accuracy between -pi/2 and pi/2 by a polynomial of degree 11 having no even powers. The hardest part is, of course, deriving the method of approximation used in step 3. Except for the square root function (which uses Newton's method), the method of approximation used is either a minmax polynomial or a minmax rational function. Thus, in order to better understand the algorithms, some knowledge of minmax polynomials and rational functions is required. Multics Technical Bulletin MTB-729 Basic Math Functions 3. Minmax Polynomials According to Weierstrass's Theorem from Calculus, if f(x) is a continuous function on a closed interval [a, b], and if e is an arbitrary positive number, then there exists a polynomial P(x) such that for all x in [a, b], |P(x) - f(x)| < e. This means that for any of the basic math functions, it is possible to find a polynomial that will approximate that function to any desired amount of accuracy over any closed interval of the domain. This is a nice result because polynomials are easy to evaluate on a computer. The catch is that the order of the polynomial may be so high that round off error in the calculation makes it impossible to actually attain the desired accuracy. From Weierstrass's Theorem it is possible to prove that for any function f(x) which is continuous over a closed interval [a, b], among all polynomials of order n or less, there is a unique polynomial P(x) which minimizes the maximum value of |P(x) - f(x)| for all x in (a, b). This polynomial is called the minmax polynomial. Unfortunately, the theory does not provide a way to directly determine the minmax polynomial; however, it does provide a characterization of the minmax polynomial which can be used to test if any particular polynomial is in fact the minmax polynomial. That characterization is based on something called the equal-error property. A function p(x) has the equal-error property with respect to another function f(x) over a set of points if at each consecutive pair of points u and v in the set (when the set is ordered by increasing value), p(u) - f(u) = f(v) - p(v) (i.e. the magnitude of the difference between p and f is the same at all the points, but the sign of the difference alternates). A polynomial p(x) is the minmax approximation to a function f(x) over an interval (a, b) if and only if there is a set of n+2 points over which p(x) has the equal-error property and at which the maximum difference over (a, b) between p(x) and f(x) occurs. The above characterization can be used to find the minmax polynomial P(x) for a function f(x) over an interval (a, b) by an iterative technique called the Exchange Method. The Exchange Method starts with the arbitrary selection of n+2 points in (a, b). From this set a polynomial p(x) of degree n which has the equal-error property with respect to f(x) is generated. The maximum difference between p and f over (a, b) is calculated. If the maximum difference is the same as the difference at the elements of the set of n+2 points, then p(x) is the desired minmax polynomial. Otherwise, p(x) is only an approximation to the minmax polynomial. A better approximation can be obtained by MTB-729 Multics Technical Bulletin Basic Math Functions replacing one of the n+2 points with the point at which the maximum error occurs in such a way that the sign of p(x) - f(x) alternates over the new ordered set of n+2 points. The new set of n+2 points is then used to start the next iteration. The theory guarantees that the process will terminate in a finite number of steps with the minmax polynomial. Calculation of the polynomial p(x) which has the equal-error property with respect to f(x) over the set of n+2 points is easy. If the points are sorted into ascending order and then named x0, x1, x2, etc., then the coefficients p0, p1, ... pn of the polynomial p(x) are obtained by solving the following system of linear equations: p0 + p1*x0 + p2*x0**2 + ... + pn*x0**n - e = f(x0) p0 + p1*x1 + p2*x1**2 + ... + pn*x1**n + e = f(x1) p0 + p1*x2 + p2*x2**2 + ... + pn*x2**n - e = f(x2) p0 + p1*x3 + p2*x3**2 + ... + pn*x3**n + e = f(x3) etc. Note that the solution of this system of equations also gives us the value of e, which is the amount by which p(x) differs from f(x) at the selected points. If p(x) is the minmax polynomial, then e will be the maximum amount that p(x) differs from f(x) over (a, b). Thus, at the same time that we determine what the minmax polynomial is, we also find out how accurately it approximates f(x). (If it is not as accurate as we desire, then we can solve for the next higher order minmax polynomial to see if it will do; while if it is much more accurate than we need, we can check if the next lower order minmax polynomial will do.) The hard part of the Exchange Method is determining where the maximum difference between p(x) and f(x) occurs. Since all the basic math functions are differentiable and polynomials are differentiable, we can use the result from Calculus that the maximum difference occurs where the derivatives of the functions are equal (i.e. at the points in (a, b) where p'(x) - f'(x) = 0). Unfortunately, it is not very easy to accurately solve nonlinear equations such as p'(x) - f'(x) = 0. The problem can be simplified somewhat by replacing f'(x) with a Taylor polynomial (i.e. the first so-many terms of the Taylor series) for f'(x) of sufficient order to reasonably approximate f'(x). This reduces the problem to finding the real roots of a polynomial, which can be accomplished using, for example, Sturm sequences. Of course, the roots obtained won't be exactly where the maximum differences in p(x) - f(x) occur due to round-off errors and the fact that we approximated f'(x) by a Taylor polynomial. However, the roots will be close and we can search nearby points to find out where the maximum differences really occur. Multics Technical Bulletin MTB-729 Basic Math Functions 4. Minmax Rational Functions Any function which is continuous over a closed interval can be approximated over that interval to as much precision as desired by a polynomial. However, the degree of the polynomial may be so high that this is not a practical way to approximate the function. For example, this occurs when a function with a pole must be approximated near the pole (e.g. the tangent function near pi/2 radians). Such problems can be resolved by using a more complex approximating function than a polynomial, such as a rational function. A rational function is simply the quotient of two polynomials. They have two properties that make them very attractive for use in computers to approximate more complicated functions: They are easy to evaluate, and they adhere to a theory that parallels that for polynomials, as will shortly be apparent. A function which is continuous over a closed interval can be approximated over that interval to as much accuracy as is desired by a rational function. This is an immediate consequence of Weierstrass's Theorem, since polynomials are simply rational functions whose denominator is of degree 0. Moreover, it can be shown that for any function f(x) which is continuous over a closed interval [a, b], among all rational functions whose numerator is of degree m or less and whose denominator is of degree n or less, there exists a unique rational function R(x) which minimizes the maximum value of |R(x) - f(x)| over all x in [a, b]. This rational function is called the minmax rational function. As with the minmax polynomial, the theory yields a characterization by which we can tell if a particular rational function is the minmax rational function, but it does not give a method for directly calculating the minmax rational function. That characterization is virtually the same as the one for the minmax polynomial: A rational function r(x) whose numerator is of degree m and denominator is of degree n, is the minmax approximation to a function f(x) over an interval (a, b) if and only if there is a set of m+n+2 points over which r(x) has the equal-error property and at which the maximum difference over (a, b) between r(x) and f(x) occurs. The above characterization can be used to find the minmax rational function R(x) (of desired order) for a function f(x) over an interval (a, b) by a variation of the Exchange Method used to find the minmax polynomial. This technique starts with the aribitrary selection of m+n+2 points in (a, b). From this set a rational function r(x) (whose numerator is of degree m or MTB-729 Multics Technical Bulletin Basic Math Functions less and whose denominator is of degree n or less) which has the equal-error property with respect to f(x) is generated. The maximum difference between r and f over (a, b) is calculated. If the maximum difference is the same as the difference at the elements of the set of m+n+2 points, then r(x) is the desired minmax rational function. Otherwise, r(x) is only an approximation to the minmax rational function. A better approximation can be obtained by replacing one of the m+n+2 points with the point at which the maximum error occurs in such a way that the sign of r(x) - f(x) alternates over the new ordered set of m+n+2 points. The new set of m+n+2 points is then used to start the next iteration. The theory guarantees that the process will terminate in a finite number of steps with the minmax rational function. Calculation of a rational function r(x) which has the equal-error property with respect to f(x) over the set of m+n+2 points is more complicated than calculating a polynomial with the same property. The calculation consists of two phases. In the first phase, the magnitude of the difference between r(x) and f(x) at the m+n+2 points is calculated. (This can be done without knowing the values of the coefficients of r(x).) In the second phase, the value calculated in the first phase is used to determine the coefficients of the numerator and denominator of r(x). The details of the calculations performed in each phase follow. In the first phase, we calculate a number E such that r(x(i)) - f(x(i)) is equal to (-1)**i * E, for i = 0, 1, 2, ... m+n+1, where x(0), x(1), ... x(m+n+1) are the m+n+2 generating points, sorted into increasing order. Let the numerator of r(x) be the order m polynomial p(x) = p0 + p1*x + p2*x**2 + ... pm*x**m, and let the denominator be the order n polynomial q(x) = q0 + q1*x + q2*x**2 + ... + qn*x**n. Then some simple algebra yields the system of m+n+2 equations: p(x(i)) - (f(x(i)) + (-1)**i * E)*q(x(i)) = 0, i = 0, 1, ... m+n+1 If we knew the value of E, the above system of equations would be reduced to m+n+2 linear equations in the m+n+2 unknowns p0, p1, ..., pm, q0, q1, ... qn. From linear algebra we know that such a system can only have a trivial solution (i.e. all the unknowns are zero) unless the determinant of the system is zero. Thus, we must choose the value of E so that the determinant of the system is zero. Multics Technical Bulletin MTB-729 Basic Math Functions The coefficients of the first m+1 columns of the determinant of the system are all constants, while the coefficients of the remaining n+1 columns are all linear expressions of the variable E. Thus the value of the determinant is a polynomial of order n in the variable E. Any root of this polynomial will make the determinant of the system zero. However we want to minimize the difference between f(x) and the r(x) we are calculating, so we want to choose E to be the real root of smallest magnitude. Once we have found E, we can substitute it in the system of equations. This yields a system of m+n+2 linear equations in m+n+2 unknowns which has a determinant of zero. Of course, a determinant of zero means that at least one of the equations in the system is redundant (being merely a linear combination of some or all of the rest), and at least one of the variables is arbitrary. This is not surprising, since the variables of the system are the coefficients of the numerator and denominator of the rational function we are trying to construct. Since it is a rational function (i.e. the product of two polynomials), we can divide all the coefficients by any nonzero coefficient without changing the value of the rational function, and yet doing so eliminates one of the coefficients (the one we divide by). Thus although the rational function appears to have m+n+2 coefficients, it can be "normalized" to have only m+n+1 coefficients. Apparently, the way to solve the system is to discard one of the equations and to set one of the variables to one. It turns out not to matter which equation we discard, but it does matter which coefficient we set to one. It matters because we must be sure that we do not choose to normalize by the coefficient of a power that should be missing from the rational function we are seeking. Fortunately, it is not hard to tell which coefficient to normalize by. Although we can normalize by any nonzero coefficient of the rational function, in practice we always normalize by either p0 or q0 (i.e. the constant terms of the numerator and denominator). This is because we know that one of them must be nonzero (since if they were both zero we could simplify the rational function by dividing top and bottom by 'x'). If zero is in the closure of the domain over which the rational function is to approximate the given function, we can easily determine if either p0 or q0 could be zero: p0 is zero if and only if the function is zero at zero, and q0 is zero if and only if the function has a pole at zero. If zero is not in the closure of the domain of approximation, we can arbitrarily pick one of p0 or MTB-729 Multics Technical Bulletin Basic Math Functions q0 to normalize by and see what happens; if we are unlucky enough to choose to normalize by a coefficient that should be zero, calculation of the coefficients will either break down or we will get values of very large magnitude for the other coefficients (so that if we renormalized by one of the large value, the coefficient that should be zero would become very small). Once we have picked the coefficient (p0 or q0) to normalize by, the calculation of the remaining coefficients is simply a matter of solving m+n+1 linear equations in m+n+1 unknowns. After calculating the coefficients of the approximating rational function, the next step in the Exchange Method is to test whether we have generated the minmax rational function. This involves finding the point in the domain of approximation at which the rational function has the greatest divergence from the function we are approximating. From Calculus, we know that the greatest divergence occurs where the derivatives of the two functions are equal. Now the derivative of a rational function r(x) = p(x)/q(x) is (p'(x)*q(x) - q'(x)*p(x))/q(x)**2, so the points of greatest divergence are solutions of the equation: p'(x)*q(x) - q'(x)*p(x) - (q(x)**2)*f'(x) = 0 An approximate solution to the above equation can be found by replacing f'(x) by a polynomial that approximates f'(x) suitably closely. (An obvious candidate for such an approximating polynomial is to take enough terms of the Taylor series for the derivative of f(x) to give an error over the domain of approximation which is smaller than the accuracy to which we want r(x) to approximate f(x).) If we replace f'(x) by a polynomial of order 'k', the lefthand side of the equation is just a polynomial s(x) of order max(m+n-1, 2*n+k-1) and we need only find the roots of that polynomial to have a close approximation to the points of maximum difference between r(x) and f(x). The strategy, then, is to choose from the points very near to the roots of s(x) which are in the domain of approximation the point 'z' such that the difference between r(z) and f(z) is greatest. If the magnitude of that difference is greater than the value E previously calculated, we then know that r(x) is not the minmax approximation to f(x). We further know that if we replace one of the points in the set of m+n+2 generating points in the way already described for finding the minmax polynomial, we can use the new set of m+n+2 generating points to obtain a closer rational approximation to f(x). Multics Technical Bulletin MTB-729 Basic Math Functions 5. Accuracy The purpose of generating a minmax polynomial or rational function is to approximate a given function to a specified accuracy in a computationally efficient way. Theoretically, minmax polynomials and rational functions meet both goals: They are easy to calculate on a computer, and the the method for finding them yields a bound on the accuracy of the approximation. In practice, however, there are some problems. Once a minmax approximation of suitable theoretic accuracy has been found, we must implement the approximation. It is not practical to do exact calculations, so we must implement the approximation in floating point. This means that we must worry about errors due to round-off. These are the errors that in practice determine how good an approximation is. There are two kinds of rounding error that must be considered. The first kind is in the conversion of the coefficients of the minmax approximation to floating point. This is no problem if we are performing an approximation that needs single precision accuracy because any critical coefficients can be stored in double precision. However, if we are trying to attain double precision accuracy using double precision coefficients, rounding errors in the coefficients can be critical. The second type of rounding error is that which occurs during evaluation of the approximating function. If we are performing an approximation that needs single precision accuracy, this problem can usually be avoided by doing critical calculations in double precision. However, if we are trying to attain double precision accuracy, double precision arithmetic may not be precise enough. The overlength EAQ does afford eight extra bits of precision for those calculations that can be performed without the need to store partial results. Thus we can expect to get slightly better results from a double precision polynomial approximation than from a double precision rational approximation. In the implementation of the various Multics math routines, we have strived to code the single precision routines so that they are as fast as possible (i.e. minimize use of double precision) and yet have a maximum of 1 bit of error in the BFP case and 3 bits of error in the HFP case. In the double precision case, we have aimed more at accuracy than at speed. MTB-729 Multics Technical Bulletin Basic Math Functions We have done extensive testing on the math routines in order to make them work the best we can. The accuracy of the math routines has been significantly improved. As an example, with an input of PI radians, the old double precision sine routine would return NO significant bits in the mantissa. For the same input, the the new routine returns a value in which the entire mantissa is significant. The new routines are generally faster in the BFP case and are always faster in the HFP case. With the installation of the new routines, there has been a 5% increase in the number of Fortran whetstones that can be calculated per second in BFP mode. The increase is about 20% with HFP! Multics Technical Bulletin MTB-729 Basic Math Functions 6. The Algorithms 6.1. 'arc_sine_' The 'arc_sine_' module calculates the inverse sine and inverse cosine functions to single precision accuracy, in degrees or radians, in either BFP or HFP mode. There are eight entry points corresponding to the eight functions implemented by this module. They are: arc_cosine_degrees_ hfp_arc_cosine_degrees_ arc_cosine_radians_ hfp_arc_cosine_radians_ arc_sine_degrees_ hfp_arc_sine_degrees_ arc_sine_radians_ hfp_arc_sine_radians_ The only differences between the BFP and the corresponding HFP entry points to this module are in the bit representations of the floating point constants used. Therefore we need only describe the four BFP entry points. The module is called 'arc_sine_' because each of the entry points is defined in terms of an internal procedure, 'arcsin', which calculates the inverse sine function in radians. If the module were written in PL/I, the entry points would look like this: arc_cosine_degrees_: entry (x) returns (float bin); return (90 - one_radian*arcsin (x)); arc_cosine_radians_: entry (x) returns (float bin); return (half_pi - arcsin (x)); arc_sine_degrees_: entry (x) returns (float bin); return (one_radian*arcsin (x)); arc_sine_radians_: entry (x) returns (float bin); return (arcsin (x)); where 'one_radian' is the constant specifying the number of degrees in one radian (i.e. 180/pi), and 'half_pi' is the constant pi/2. The inverse sine function is defined over the interval [-1, 1]. It can be approximated well by a rational function. For values near zero, the order of the rational function is low, but for values with magnitude near 1, a high order rational function is needed. Fortunately, there are some simple identities which allow us to calculate arcsines of values near one in terms of arcsines of values near zero. MTB-729 Multics Technical Bulletin Basic Math Functions The strategy used in the internal 'arcsin' procedure is to divide the domain into several ranges. The first range corresponds to values whose magnitude is sin(pi/6) or less (i.e. the interval [-1/2, 1/2]). On this range, the inverse sine function is directly approximated by a rational function. It turns out that single precision accuracy can be obtained over this range with a rational function whose numerator is an odd polynomial of degree 5 and whose denominator is an even polynomial of degree 4. The second range corresponds to values whose magnitude lie in the interval (sin(pi/6), sin(pi/4)] (i.e. 1/2 to sqrt(2)/2). It is only necessary to deal with the positive values in this range, since arcsin(-x) = -arcsin(x). Over this range we can make use of the identity: arcsin(x) = 0.5*arcsin(2*x**2 - 1) + pi/4 This formula is derived from the definition of the second order Chebyshev polynomial: cos(2*arccos(x)) = 2*x**2 - 1 by taking the arccosine of both sides and using the fact that: arccos(x) = pi/2 - arcsin(x) An examination of the graph of '2*x**2 - 1' shows that it is strictly increasing over (sin(pi/6), sin(pi/4)]. Thus the value we calculate for '2*x**2 - 1' will be between the values the function takes at the two endpoints of the interval, i.e. between -1/2 and 1/2. This is the range over which we can approximate arcsine with our rational function. The third range consists of values whose magnitude lie in the interval (sin(pi/4), sin(5*pi/12)] (i.e. sqrt(2)/2 through (sqrt(3)+1)/(2*sqrt(2))). As with range 2, we use 'arcsin(-x) = -arcsin(x)' to deal with negative numbers in the range. The positive numbers are dealt with by: arcsin(x) = 0.25*arcsin(8*x**4 - 8*x**2 + 1) + 3*pi/8 This formula is derived from the definition of the fourth order Chebyshev polynomial: cos(4*arccos(x)) = 8*x**4 - 8*x**2 + 1 Multics Technical Bulletin MTB-729 Basic Math Functions The function '8*x**4 - 8*x**2 + 1' is strictly increasing over the interval (sin(pi/4), sin(5*pi/12)], so its value over that interval lies between the values at the endpoints, i.e. -1/2 and 1/2. Again, this is the range over which we can approximate arcsine with our rational function. The fourth range consists of values whose magnitude lie in the interval (sin(5*pi/12), sin(pi/2)] (i.e. (sqrt(3)+1)/(2*sqrt(2)) through 1). As before, we use the identity 'arcsin(-x) = -arcsin(x)' to deal with negative values in this range. The positive values in this range are handled by: arcsin(x) = pi/2 - 2*arcsin(sqrt((1-x)/2)) This formula is derived as follows. Let 'x' be any value in [-1, 1], the domain of the arcsine function. Let 'y' be '(1-x)/2' (so 'x = 1 - 2*y**2'). Since 'x' is in [-1, 1], 'y' is in [0, 1] and we can take the arcsine of it. Let 'z = arcsin(y)' (so 'y = sin(z)'). Then: asin(x) = pi/2 - acos(x) = pi/2 - acos(1 - 2*y**2) = pi/2 - acos(1 - 2*(sin(z))**2) = pi/2 - acos(cos(2*z)) = pi/2 - 2*z = pi/2 - 2*asin(y) = pi/2 - 2*asin(sqrt((1-x)/2)) The function 'sqrt((1-x)/2)' is strictly decreasing on the interval (sin(5*pi/12), sin(pi/2)], so its value lies between the values at the endpoints, sin(pi/24) and 0. This is well within the interval [-1/2, 1/2] on which we can approximate arcsine by our rational function. The fifth and final range corresponds to values outside the domain of the arcsine function (i.e. to values whose magnitude exceeds 1). Attempts to evaluate the arcsine of a value in this range result in the signalling of math error 58. If the routine is restarted, zero is arbitrarily chosen as the value of the arcsine. It only remains for us to define the rational function used to approximate arcsine to single precision accuracy over the interval [-1/2, 1/2]. That rational function is: r(x) = x*p(x*x)/q(x*x) where 'p' and 'q' are the quadratic polynomials: MTB-729 Multics Technical Bulletin Basic Math Functions p(y) = p0 + p1*y + p2*y**2 q(y) = q0 + q1*y + y**2 with: p0 = +5.603629044813127 p1 = -4.6145309466645 p2 = +0.49559947478731 q0 = +5.603629030606043 q1 = -5.54846659934668 One problem with using the above rational function to approximate arcsine is that it will suffer underflows for values whose magnitude is about 5.445e-20 or less. This is not really much of a problem since the arcsine of a value is the same as that value, to 71 bits of precision, if the magnitude of the value is less than 5.7031627e-10. Thus we need only use the rational function for values whose magnitude is much larger than the threshold for underflows. Solving the problem of underflows by reducing the order of the approximating function suggests a possible optimization: We could use a lower order rational approximation for values close enough to zero to not require the above rational approximation. It is particularly tempting to look for a simpler approximation for values whose magnitude is sin(pi/24) (i.e. about 0.13052619) or less, as that would speed up the 'arcsin' routine for values in range 4. It turns out that the following simpler rational approximation gives us single precision accuracy for values whose magnitude is sin(pi/24) or less: rr(x) = x*pp(x*x)/qq(x*x) where 'pp' and 'qq' are the linear functions: pp(y) = pp0 + y*pp1 qq(y) = qq0 + y with: pp0 = -2.21393498174243 pp1 = +0.63101484054356 qq0 = -2.21393497792717 The 'arc_sine_' module is implemented in ALM for efficiency and support of the HFP routines. A BFP-only version could be implemented in PL/I. It would look like this: Multics Technical Bulletin MTB-729 Basic Math Functions arc_sine_: proc options (support); dcl x float bin; dcl math_error_ entry (fixed bin); dcl half_pi float bin (63) static options (constant) init (1.570796326794896619), one_radian float bin (63) static options (constant) init (57.29577951308232088), pi float bin (63) static options (constant) init (3.141592653589793238); arcsine_domain_error: call math_error_ (58); return (0); arc_cosine_degrees_: entry (x) returns (float bin); return (90 - one_radian * arcsin (x)); arc_cosine_radians_: entry (x) returns (float bin); return (half_pi - arcsin (x)); arc_sine_degrees_: entry (x) returns (float bin); return (one_radian * arcsin (x)); arc_sine_radians_: entry (x) returns (float bin); return (arcsin (x)); arcsin: proc (x) returns (float bin); dcl x float bin; dcl abs_x float bin (63), radians float bin (63), y float bin (63); dcl three_pi_by_two float bin (63) init (3 * pi / 2); MTB-729 Multics Technical Bulletin Basic Math Functions abs_x = abs (x); if abs_x <= 0.5 then radians = part_arcsin (float (x, 63)); else do; if abs_x <= 0.866025404 /* sin(pi/3) */ then do; y = 2 * abs_x ** 2 - 1; radians = 0.5 * (part_arcsin (y) + half_pi); end; else if abs_x <= 0.965925826 /* sin(5*pi/12) */ then do; y = 8 * abs_x ** 2 * (abs_x ** 2 - 1) + 1; radians = 0.25 * (part_arcsin (y) + three_pi_by_two); end; else if abs_x <= 1 then do; y = sqrt (.5 - .5 * abs_x); radians = half_pi - 2 * part_arcsin (y); end; else goto arcsine_domain_error; if x < 0 then radians = -radians; end; return (radians); end arcsin; /* Subroutine 'part_arcsin' calculates the arcsine of a value in the range [0, .5]. */ part_arcsin: proc (y) returns (float bin); dcl y float bin (63); dcl p float bin (63), pp float bin (63), q float bin (63), qq float bin (63), yy float bin (63); Multics Technical Bulletin MTB-729 Basic Math Functions dcl p0 float bin (63) static options (constant) init (+5.603629044813127), p1 float bin (63) static options (constant) init (-4.6145309466645), p2 float bin (63) static options (constant) init (+0.49559947478731), q0 float bin (63) static options (constant) init (+5.603629030606043), q1 float bin (63) static options (constant) init (-5.54846659934668), pp0 float bin (63) static options (constant) init (-2.21393498174243), pp1 float bin (63) static options (constant) init (+0.63101484054356), qq0 float bin (63) static options (constant) init (-2.21393497792717); if abs (y) >= 0.13052619 then do; yy = y * y; p = p0 + yy * (p1 + yy * p2); q = q0 + yy * (q1 + yy); return (y * p / q); end; else if abs (y) >= 5.7031627e-10 then do; yy = y * y; pp = pp0 + yy * pp1; qq = qq0 + yy; return (y * pp / qq); end; else return (y); end part_arcsin; end arc_sine_; MTB-729 Multics Technical Bulletin Basic Math Functions 6.2. 'double_arc_sine_' The 'double_arc_sine_' module calculates the inverse sine and inverse cosine functions to double precision accuracy, in degrees or radians, in either BFP or HFP mode. There are eight entry points corresponding to the eight functions implemented by this module. They are: double_arc_cosine_degrees_ hfp_double_arc_cosine_degrees_ double_arc_cosine_radians_ hfp_double_arc_cosine_radians_ double_arc_sine_degrees_ hfp_double_arc_sine_degrees_ double_arc_sine_radians_ hfp_double_arc_sine_radians_ The algorithm used to implement these functions is identical to that used for the single precision 'arc_sine_' module, except that the rational functions 'r(x)' and 'rr(x)' which approximate arcsine over the intervals [-1/2, 1/2] and [-sin(pi/24), sin(pi/24)] are replaced by higher order ones which give double precision accuracy. The rational function 'r' is replaced by: r(x) = x*p(x*x)/q(x*x) where 'p' and 'q' are the polynomials: p(y) = p0 + p1*y + p2*y**2 + p3*y**3 + p4*y**4 + p5*y**5 + p6*y**6 q(y) = q0 + q1*y + q2*y**2 + q3*y**3 + q4*y**4 + q5*y**5 + y**6 with: p0 = 5.3849190607669366114e+2 p1 = -1.5568739285411701684e+3 p2 = 1.6924399892857508174e+3 p3 = -8.5268159839800034482e+2 p4 = 1.9645634637912159609e+2 p5 = -1.7029424630829249399e+1 p6 = 2.7091596623264652521e-1 q0 = 5.3849190607669366114e+2 q1 = -1.6466225795539524453e+3 q2 = 1.9264901929223241968e+3 q3 = -1.0743064209874076849e+3 q4 = 2.8817015748752908509e+2 q5 = -3.2508459966449899385e+1 The rational function 'rr' is replaced by: rr(x) = x*pp(x*x)/qq(x*x) Multics Technical Bulletin MTB-729 Basic Math Functions where 'pp' and 'qq' are the polynomials: pp(y) = pp0 + pp1*y + pp2*y**2 + pp3*y**3 qq(y) = qq0 + qq1*y + qq2*y**2 + qq3*y**3 + y**4 with: pp0 = 4.4005608326359226844e+2 pp1 = -6.7036113799980663993e+2 pp2 = 2.8631086818069079154e+2 pp3 = -3.0034170270689843770e+1 qq0 = 4.4005608326359226844e+2 qq1 = -7.4370381854373868468e+2 qq2 = 3.7725729835987782917e+2 qq3 = -5.6777961133209015623e+1 The 'double_arc_sine_' module is implemented in ALM for efficiency and support of the HFP routines. A BFP-only version could be implemented in PL/I. It would look like this: double_arc_sine_: proc options (support); dcl x float bin (63); dcl math_error_ entry (fixed bin); dcl half_pi float bin (63) static options (constant) init (1.570796326794896619), one_radian float bin (63) static options (constant) init (57.29577951308232088), pi float bin (63) static options (constant) init (3.141592653589793238); arcsine_domain_error: call math_error_ (58); return (0); double_arc_cosine_degrees_: entry (x) returns (float bin (63)); if abs (x) < 0.866025404 then return (90 - one_radian * arcsin (x)); else if abs (x) > 1 then goto arcsine_domain_error; else return (one_radian * arcsin (sqrt (1 - x ** 2))); MTB-729 Multics Technical Bulletin Basic Math Functions double_arc_cosine_radians_: entry (x) returns (float bin (63)); if abs (x) < 0.866025404 then return (half_pi - arcsin (x)); else if abs (x) > 1 then goto arcsine_domain_error; else return (arcsin (sqrt (1 - x ** 2))); double_arc_sine_degrees_: entry (x) returns (float bin (63)); return (one_radian * arcsin (x)); double_arc_sine_radians_: entry (x) returns (float bin (63)); return (arcsin (x)); arcsin: proc (x) returns (float bin (63)); dcl x float bin (63); dcl abs_x float bin (63), radians float bin (63), y float bin (63); dcl three_pi_by_two float bin (63) init (3 * pi / 2); abs_x = abs (x); if abs_x <= 0.5 then radians = part_arcsin (float (x, 63)); else do; if abs_x <= 0.866025404 /* sin(pi/3) */ then do; y = 2 * abs_x ** 2 - 1; radians = 0.5 * (part_arcsin (y) + half_pi); end; else if abs_x <= 0.965925826 /* sin(5*pi/12) */ then do; y = 8 * abs_x ** 2 * (abs_x ** 2 - 1) + 1; radians = 0.25 * (part_arcsin (y) + three_pi_by_two); end; else if abs_x <= 1 Multics Technical Bulletin MTB-729 Basic Math Functions then do; y = sqrt (.5 - .5 * abs_x); radians = half_pi - 2 * part_arcsin (y); end; else goto arcsine_domain_error; if x < 0 then radians = -radians; end; return (radians); end arcsin; /* Subroutine 'part_arcsin' calculates the arcsine of a value in the range [0, .5]. */ part_arcsin: proc (y) returns (float bin (63)); dcl y float bin (63); dcl p float bin (63), pp float bin (63), q float bin (63), qq float bin (63), yy float bin (63); dcl p0 float bin (63) static options (constant) init (5.3849190607669366114e+2), p1 float bin (63) static options (constant) init (-1.5568739285411701684e+3), p2 float bin (63) static options (constant) init (1.6924399892857508174e+3), p3 float bin (63) static options (constant) init (-8.5268159839800034482e+2), p4 float bin (63) static options (constant) init (1.9645634637912159609e+2), p5 float bin (63) static options (constant) init (-1.7029424630829249399e+1), p6 float bin (63) static options (constant) init (2.7091596623264652521e-1), q0 float bin (63) static options (constant) init (5.3849190607669366114e+2), MTB-729 Multics Technical Bulletin Basic Math Functions q1 float bin (63) static options (constant) init (-1.6466225795539524453e+3), q2 float bin (63) static options (constant) init (1.9264901929223241968e+3), q3 float bin (63) static options (constant) init (-1.0743064209874076849e+3), q4 float bin (63) static options (constant) init (2.8817015748752908509e+2), q5 float bin (63) static options (constant) init (-3.2508459966449899385e+1), pp0 float bin (63) static options (constant) init (4.4005608326359226844e+2), pp1 float bin (63) static options (constant) init (-6.7036113799980663993e+2), pp2 float bin (63) static options (constant) init (2.8631086818069079154e+2), pp3 float bin (63) static options (constant) init (-3.0034170270689843770e+1), qq0 float bin (63) static options (constant) init (4.4005608326359226844e+2), qq1 float bin (63) static options (constant) init (-7.4370381854373868468e+2), qq2 float bin (63) static options (constant) init (3.7725729835987782917e+2), qq3 float bin (63) static options (constant) init (-5.6777961133209015623e+1); if abs (y) > 0.13052619 then do; yy = y * y; p = p0 + yy * (p1 + yy * (p2 + yy * (p3 + yy * (p4 + yy * (p5 + yy * p6))) Multics Technical Bulletin MTB-729 Basic Math Functions )); q = q0 + yy * (q1 + yy * (q2 + yy * (q3 + yy * (q4 + yy * (q5 + yy))))); return (y * p / q); end; else if abs (y) >= 5.7031627e-10 then do; yy = y * y; pp = pp0 + yy * (pp1 + yy * (pp2 + yy * pp3)); qq = qq0 + yy * (qq1 + yy * (qq2 + yy * (qq3 + yy))); return (y * pp / qq); end; else return (y); end part_arcsin; end double_arc_sine_; MTB-729 Multics Technical Bulletin Basic Math Functions 6.3. 'arc_tangent_' The 'arc_tangent_' module calculates the inverse tangent function of one or two arguments to single precision accuracy, in degrees or radians, in either BFP or HFP mode. There are eight entry points corresponding to the eight functions implemented by this module. They are: arc_tangent_degrees_ hfp_arc_tangent_degrees_ arc_tangent_degrees_2_ hfp_arc_tangent_degrees_2_ arc_tangent_radians_ hfp_arc_tangent_radians_ arc_tangent_radians_2_ hfp_arc_tangent_radians_2_ The only differences between the BFP and the corresponding HFP entry points to this module are in the bit representations of the floating point constants used. Therefore we need only describe the four BFP entry points. The entry points whose names end with '2_' take two arguments, while the other entry points take a single argument. Those entry points which take a single argument calculate the principle arctangent of that argument, while the entry points with two arguments calculate an arctangent (not necessarily the principle one) of the quotient of the arguments. If 'arc_tangent_' was coded in PL/I, the entry points would be defined as: arc_tangent_degrees_: entry (x) returns (float bin); return (one_radian*arctan (x)); arc_tangent_degrees_2_: entry (x, y) returns (float bin); return (one_radian*arctan2 (x, y)); arc_tangent_radians_: entry (x) returns (float bin); return (arctan (x)); arc_tangent_radians_2_: entry (x, y) returns (float bin); return (arctan2 (x, y)); where 'arctan' is a function that calculates to single precision accuracy the angle in radians in [-pi/2, pi/2] whose tangent is its argument, and 'arctan2' is a function which calculates to single precision accuracy an angle in radians in (-pi, pi/2] whose tangent is the quotient of its arguments. Multics Technical Bulletin MTB-729 Basic Math Functions The angle returned by 'arctan2' is determined as follows: If the value of the first argument is positive, the result is positive. If the value of the first argument is zero, the result is zero if the second argument is positive, and pi if the second argument is negative. If the value of the first argument is negative, the result is negative. If the value of the second argument is zero, the absolute value of the result is pi/2, unless the first argument is also zero, in which case math error 11 is signalled and the result is set to zero. When neither argument to 'arctan2' is zero, the result is determined by calling 'arctan' to calculate the inverse tangent of the quotient in the range [-pi/2, pi/2], and then using the periodicity of the tangent function (i.e. tan(x) = tan(x + n*pi) for any integer 'n') to find the equivalent angle in (-pi, pi] which satisfies the above constraints. Thus the meat of the 'arc_tangent_' module is the calculation of the 'arctan' function. Now let us consider how we can implement the 'arctan' function. Arctangent maps the interval (-infinity, infinity) into [-pi/2, pi/2]. It is an odd function (i.e. arctan(-x) = -arctan(x)), so we can restrict ourselves to finding the arctangent of nonnegative numbers. If we are asked for the arctangent of a negative number we simply return the negative of the arctangent of the absolute value of the number. It would take a very high degree polynomial to give a good approximation to the arctangent function for an arbitrary positive number, so we want to find some way to reduce the range over which we need to approximate arctangent. With that in mind, we subdivide the nonnegative numbers into the following 9 ranges: range 0: [0, tan(pi/32)) ~ [0, 0.0985) range 1: [tan(pi/32), tan(3*pi/32)) ~ [0.0985, 0.303) range 2: [tan(3*pi/32, tan(5*pi/32)) ~ [0.303, 0.535) range 3: [tan(5*pi/32), tan(7*pi/32)) ~ [0.535, 0.821) range 4: [tan(7*pi/32), tan(9*pi/32)) ~ [0.821, 1.22) range 5: [tan(9*pi/32), tan(11*pi/32)) ~ [1.22, 1.87) range 6: [tan(11*pi/32), tan(13*pi/32)) ~ [1.87, 3.30) range 7: [tan(13*pi/32), tan(15*pi/32)) ~ [3.30, 10.2) range 8: [tan(15*pi/32), infinity) ~ [10.2, infinity) As we shall shortly see, the calculation of the arctangent of a value in any of the above ranges can be reduced to the calculation of an arctangent in the range [-pi/32, pi/32]. For this reason, we shall now define an function 'part_arctan' which will calculate to single precision accuracy the arctangent in radians of a value in that range. MTB-729 Multics Technical Bulletin Basic Math Functions It turns out that for 'z' in the range [-pi/32, pi/32], 'arctan(z)' can be approximated to better than single precision accuracy by the function 'z*p(z*z)', where 'p(x)' is the cubic polynomial: p(x) = ((p3*x + p2)*x + p1)*x + p0 with: p0 = 0.9999999999924517 p1 = -.33333330840148 p2 = 0.199987124164 p3 = -.14072538 There is one problem with using the function 'z*p(z*z)' to approximate arctan: For values sufficiently close to zero, we will get one or more underflows. This turns out to be a simple problem to overcome. Underflow only occurs for values whose magnitude is less than 1.02183098e-19, while the arctangent of a value is the same as the value to 71 bits of precision for values less than 5.7031627e-10 (as can be deduced from the Taylor series expansion: arctan(x) = x - x**3/3 + x**5/5 - x**7/7 + ...). To evaluate the arctan of values in range 0, we need only call 'part_arctan'. The task now is to figure out how to map the other 8 ranges into that range. First let's consider the last range, [tan(15*pi/32), infinity). It is the largest range, so it would seem to be the hardest to deal with. That is not in fact so, because: arctan(x) = pi/2 - arctan(1/x) This means that we can compute the arctangent of a very large number in terms of the arctangent of a very small number. For example, if we want the arctangent of 1000, we merely calculate the arctangent of 1/1000 and subtract the result from pi/2. Notice that the largest arctangent we need to calculate occurs for the smallest value that we are interested in applying this formula to. We have a way to calculate arctangents of numbers as large as tan(pi/32), so we can use this formula for all numbers greater than or equal to 1/tan(pi/32). But what is that number? 1/tan(pi/32) = cot(pi/32) = tan(pi/2 - pi/32) = tan(15*pi/32) Thus, on range 8, we can approximate 'arctan(x)' simply by 'half_pi - part_arctan(1/x)'. Note that '1/x' can neither underflow nor overflow for any 'x' in range 8. Multics Technical Bulletin MTB-729 Basic Math Functions Ranges 1 through 7 can all be reduced to range 0 by applying the identity: arctan(z) = arctan(t) + arctan(u) where: t = 1/u - (1/u**2 + 1)/(1/u + z) This identity tells us how to break an arctangent into the sum of two other arctangents. The way we use this is as follows. Suppose we want to find the arctangent of 'z', where 'z' is somewhere in range 'i' (1 <= 'i' <= 7). If you go back to the table that defines the various ranges, you will see that saying that 'z' is in range 'i' is the same as saying that: tan((2*i-1)*pi/32) <= z < tan((2*i+1)*pi/32) Choose 'u' to be the value 'tan(i*pi/16)'. This is pretty close to the midpoint of range 'i'. Then, by the above identity, arctan(z) = arctan(t) + arctan(u) = arctan(t) + arctan(tan(i*pi/16)) = arctan(t) + i*pi/16 Thus, to calculate 'arctan(z)', all we have to do is calculate 'arctan(t)' and add 'i*pi/16' (which we can obtain from a table look-up). The question we must ask is: Is the value of 't' in the range [-pi/32, pi/32] where we can approximate 'arctan(t)' by 'part_arctan(t)'? The answer is, of course, yes, as we shall now see. Since arctan is a strictly increasing function, we need only make sure that the 't' we get for largest and smallest 'z' in the range is okay. let 'zmax' and 'zmin' be the largest and smallest values 'z' can be. Then 'zmax = tan((2*i+1)*pi/32)' and 'zmin = tan((2*i-1)*pi/32)'. We know that 'arctan(z) = arctan(t) + arctan(u)', so 'arctan(t) = arctan(z) - arctan(u)'. Thus: arctan(t) < arctan(zmax) - arctan(u) = arctan(tan(2*(i+1)*pi/32)) - arctan(tan(i*pi/16)) = 2*(i+1)*pi/32 - i*pi/16 = pi/32 Similarly, 'arctan(t)' is greater than or equal to -pi/32, so we can validly approximate 'tan(t)' by 'part_arctan(t)'. MTB-729 Multics Technical Bulletin Basic Math Functions There is one more thing to look at: What is the best way to evaluate 't'? Remember, t = 1/u - (1/u**2 + 1)/(1/u + z), where u = tan(i*pi/16). What we need is a seven element table called 'one_over_u' which contains the values 1/tan(pi/16), 1/tan(2*pi/16), ..., 1/tan(7*pi/16), and a second seven element table called 'one_plus_one_over_u_squared' which contains one more than the square of the corresponding elements of 'one_over_u'. Armed with these tables, the calculation of 't' is just: t = one_over_u(i) - one_plus_one_over_u_squared(i)/(one_over_u(i) + z) The 'arc_tangent_' module is implemented in ALM for efficiency and support of the HFP routines. A BFP-only version could be implemented in PL/I. It would look like this: arc_tangent_: proc options (support); dcl x float bin (27), y float bin (27); dcl half_pi float bin (63) static options (constant) init (1.570796326794896619), one_radian float bin (63) static options (constant) init (57.29577951308232088), pi float bin (63) static options (constant) init (3.141592653589793238); arc_tangent_degrees_: entry (x) returns (float bin); return (one_radian * arctan ((x))); arc_tangent_degrees_2_: entry (x, y) returns (float bin); return (one_radian * arctan2 ((x), (y))); arc_tangent_radians_: entry (x) returns (float bin); return (arctan ((x))); arc_tangent_radians_2_: entry (x, y) returns (float bin); return (arctan2 ((x), (y))); Multics Technical Bulletin MTB-729 Basic Math Functions arctan: proc (z) returns (float bin (63)); dcl z float bin (63); dcl t float bin (63), radians float bin (63), range fixed bin; dcl tan_pi_by_32 float bin (27) static options (constant) init (.98491403e-1), tan_3_pi_by_32 float bin (27) static options (constant) init (.30334668), tan_5_pi_by_32 float bin (27) static options (constant) init (.53451114), tan_7_pi_by_32 float bin (27) static options (constant) init (.82067879), tan_9_pi_by_32 float bin (27) static options (constant) init (1.2185035), tan_11_pi_by_32 float bin (27) static options (constant) init (1.8708684), tan_13_pi_by_32 float bin (27) static options (constant) init (3.2965582), tan_15_pi_by_32 float bin (27) static options (constant) init (10.153170); dcl one_over_u (7) float bin (63) static options (constant) init ( 5.0273394921258481045e0, 2.4142135623730950488e0, 1.4966057626654890176e0, 1.0e0, .66817863791929891999e0, .41421356237309504880e0, .19891236737965800691e0); dcl one_plus_one_over_u_squared (7) float bin (63) static options (constant) init ( 0.26274142369088180356e02, 0.68284271247461900976e01, 0.32398288088435500410e01, 0.20e1, 0.14464626921716895685e01, 0.11715728752538099024e01, 0.10395661298965800348e01); MTB-729 Multics Technical Bulletin Basic Math Functions dcl arctan_of_u (7) float bin (63) static options (constant) init ( .19634954084936207740, .39269908169872415481, .58904862254808623221, .78539816339744830962, .98174770424681038702, 1.17809724509617246442, 1.37444678594553454182); if abs (z) < tan_7_pi_by_32 then if abs (z) < tan_3_pi_by_32 then if abs (z) < tan_pi_by_32 then range = 0; else range = 1; else if abs (z) < tan_5_pi_by_32 then range = 2; else range = 3; else if abs (z) < tan_15_pi_by_32 then if abs (z) < tan_11_pi_by_32 then if abs (z) < tan_9_pi_by_32 then range = 4; else range = 5; else if abs (z) < tan_13_pi_by_32 then range = 6; else range = 7; else range = 8; if range = 0 then radians = part_arctan (abs (z)); else if range = 8 then if abs (z) < 1e71b then radians = half_pi - part_arctan (1 / abs (z)); else radians = half_pi; else do; t = one_over_u (range) - one_plus_one_over_u_squared (range) / (one_over_u (range) + abs (z)); radians = part_arctan (t) + arctan_of_u (range); end; if z < 0 then radians = -radians; return (radians); end arctan; arctan2: proc (x, y) returns (float bin (63)); Multics Technical Bulletin MTB-729 Basic Math Functions dcl x float bin (63), y float bin (63); dcl radians float bin (63), z float bin (63); dcl math_error_ entry (fixed bin); dcl (overflow, underflow) condition; if y = 0 then if x = 0 then goto arctan2_domain_error; else radians = half_pi; else do; on overflow goto quotient_too_large; on underflow goto quotient_too_small; z = abs (x / y); revert overflow, underflow; radians = arctan (z); end; set_quadrant: if y < 0 then radians = pi - radians; if x < 0 then radians = -radians; return (radians); arctan2_domain_error: call math_error_ (11); return (0); quotient_too_large: radians = half_pi; goto set_quadrant; quotient_too_small: radians = 0; goto set_quadrant; end arctan2; part_arctan: proc (z) returns (float bin (63)); dcl z float bin (63); dcl zz float bin (63); MTB-729 Multics Technical Bulletin Basic Math Functions dcl p0 float bin (63) static options (constant) init (0.9999999999924517), p1 float bin (63) static options (constant) init (-.33333330840148), p2 float bin (63) static options (constant) init (0.199987124164), p3 float bin (63) static options (constant) init (-.14072538); if abs (z) < 5.7031627e-10 then return (z); else do; zz = z * z; return (z * (p0 + zz * (p1 + zz * (p2 + zz * p3))) ); end; end part_arctan; end arc_tangent_; Multics Technical Bulletin MTB-729 Basic Math Functions 6.4. 'double_arc_tangent_' The 'double_arc_tangent_' module calculates the inverse tangent function of one or two arguments to single precision accuracy, in degrees or radians, in either BFP or HFP mode. There are twelve entry points corresponding to the eight functions implemented by this module. They are: double_arc_tan_degrees_ hfp_double_arc_tan_degrees_ double_arc_tangent_degrees_ double_arc_tan_degrees_2_ hfp_double_arc_tan_degrees_2_ double_arc_tangent_degrees_2_ double_arc_tan_radians_ hfp_double_arc_tan_radians_ double_arc_tangent_radians_ double_arc_tan_radians_2_ hfp_double_arc_tan_radians_2_ double_arc_tangent_radians_2_ The names containing the word 'tangent' are obsolete names retained for compatibility with previous versions. They are merely synonyms for the entry points whose names are obtained by replacing 'tangent' with 'tan'. The algorithm used to implement these functions is identical to that used for the single precision 'arc_tangent_' module, except that the polynomial 'p(x)' which approximates arctangent over the interval [-pi/32, pi/32] is replaced by a higher order polynomial which gives double precision accuracy. The polynomial 'p' is replaced by: p(x) = 1 + x*(p1 + x*(p2 + x*(p3 + x*(p4 + x*(p5 + x*(p6 + x*p7)))))) with: p1 = -0.33333333333333333154 p2 = 0.19999999999999612046 p3 = -0.14285714285394000547 p4 = 0.1111111098121285609 p5 = -0.09090880462996335 p6 = 0.076888077127566 p7 = -0.064430854376 The 'double_arc_tangent_' module is implemented in ALM for efficiency and support of the HFP routines. A BFP-only version could be implemented in PL/I. It would look like this: double_arc_tangent_: proc options (support); MTB-729 Multics Technical Bulletin Basic Math Functions dcl x float bin (63), y float bin (63); dcl half_pi float bin (63) static options (constant) init (1.570796326794896619), one_radian float bin (63) static options (constant) init (57.29577951308232088), pi float bin (63) static options (constant) init (3.141592653589793238); double_arc_tangent_degrees_: entry (x) returns (float bin (63)); double_arc_tan_degrees_: entry (x) returns (float bin (63)); return (one_radian * arctan ((x))); double_arc_tangent_degrees_2_: entry (x, y) returns (float bin (63)); double_arc_tan_degrees_2_: entry (x, y) returns (float bin (63)); return (one_radian * arctan2 ((x), (y))); double_arc_tangent_radians_: entry (x) returns (float bin (63)); double_arc_tan_radians_: entry (x) returns (float bin (63)); return (arctan ((x))); double_arc_tangent_radians_2_: entry (x, y) returns (float bin (63)); double_arc_tan_radians_2_: entry (x, y) returns (float bin (63)); return (arctan2 ((x), (y))); arctan: proc (z) returns (float bin (63)); dcl z float bin (63); dcl abs_z float bin (63), t float bin (63), radians float bin (63), range fixed bin; Multics Technical Bulletin MTB-729 Basic Math Functions dcl tan_pi_by_32 float bin (63) static options (constant) init (.98491403e-1), tan_3_pi_by_32 float bin (63) static options (constant) init (.30334668), tan_5_pi_by_32 float bin (63) static options (constant) init (.53451114), tan_7_pi_by_32 float bin (63) static options (constant) init (.82067879), tan_9_pi_by_32 float bin (63) static options (constant) init (1.2185035), tan_11_pi_by_32 float bin (63) static options (constant) init (1.8708684), tan_13_pi_by_32 float bin (63) static options (constant) init (3.2965582), tan_15_pi_by_32 float bin (63) static options (constant) init (10.153170); dcl u (7) float bin (63) static options (constant) init ( .198912367379658006912, .414213562373095048802, .668178637919298919998, 1.0, 1.49660576266548901760, 2.41421356237309504880, 5.02733949212584810451); dcl arctan_of_u (7) float bin (63) static options (constant) init ( .19634954084936207740, .39269908169872415481, .58904862254808623221, .78539816339744830962, .98174770424681038702, 1.17809724509617246442, 1.37444678594553454182); if abs (z) < tan_7_pi_by_32 then if abs (z) < tan_3_pi_by_32 then if abs (z) < tan_pi_by_32 then range = 0; else range = 1; else if abs (z) < tan_5_pi_by_32 then range = 2; else range = 3; else if abs (z) < tan_15_pi_by_32 then if abs (z) < tan_11_pi_by_32 then if abs (z) < tan_9_pi_by_32 MTB-729 Multics Technical Bulletin Basic Math Functions then range = 4; else range = 5; else if abs (z) < tan_13_pi_by_32 then range = 6; else range = 7; else range = 8; if range = 0 then radians = part_arctan (abs (z)); else if range = 8 then if abs (z) < 1e71b then radians = half_pi - part_arctan (1 / abs (z)); else radians = half_pi; else do; abs_z = abs (z); t = (abs_z - u (range)) / (1 + abs_z * u (range)); radians = part_arctan (t) + arctan_of_u (range); end; if z < 0 then radians = -radians; return (radians); end arctan; arctan2: proc (x, y) returns (float bin (63)); dcl x float bin (63), y float bin (63); dcl radians float bin (63), z float bin (63); dcl math_error_ entry (fixed bin); dcl (overflow, underflow) condition; Multics Technical Bulletin MTB-729 Basic Math Functions if y = 0 then if x = 0 then goto arctan2_domain_error; else radians = half_pi; else do; on overflow goto quotient_too_large; on underflow goto quotient_too_small; z = abs (x / y); revert overflow, underflow; radians = arctan (z); end; set_quadrant: if y < 0 then radians = pi - radians; if x < 0 then radians = -radians; return (radians); arctan2_domain_error: call math_error_ (24); return (0); quotient_too_large: radians = half_pi; goto set_quadrant; quotient_too_small: radians = 0; goto set_quadrant; end arctan2; part_arctan: proc (z) returns (float bin (63)); dcl z float bin (63); dcl zz float bin (63); dcl p1 float bin (63) static options (constant) init (-0.33333333333333333154), p2 float bin (63) static options (constant) init (0.19999999999999612046), p3 float bin (63) static options (constant) init (-0.14285714285394000547), p4 float bin (63) static MTB-729 Multics Technical Bulletin Basic Math Functions options (constant) init (0.1111111098121285609), p5 float bin (63) static options (constant) init (-0.09090880462996335), p6 float bin (63) static options (constant) init (0.076888077127566), p7 float bin (63) static options (constant) init (-0.064430854376); if abs (z) < 5.7031627e-10 then return (z); else do; zz = z * z; return (z * (1 + zz * (p1 + zz * (p2 + zz * (p3 + zz * (p4 + zz * (p5 + zz * (p6 + zz * p7))) ))))); end; end part_arctan; end double_arc_tangent_; Multics Technical Bulletin MTB-729 Basic Math Functions 6.5. 'exponential_' The 'exponential_' module calculates the exponential function 'e**x' to single precision accuracy in either BFP or HFP mode. There are two entry points into the module: exponential_ hfp_exponential_ The value of 'e**x' is smaller than the smallest normalized positive floating point number if 'x' is less than the logarithm of that number. Similarly, the value of 'e**x' is larger than the largest positive floating point number if 'x' is greater than the logarithm of that number. Thus the 'exponential_' module can only approximate 'e**x' on a small portion of the range of floating point numbers. If we are asked to evaluate 'e**x' for an 'x' that is below this range, we just return zero. If we are asked to evaluate 'e**x' for an 'x' above this range, we signal an overflow fault, and if we are restarted, we return the largest single precision positive number. In BFP mode, the smallest normalized positive number is 2**(-129), and the largest positive single precision number is 2**127 - 2**100. Thus the range of numbers over which we can approximate 'e**x' is -89.41598629 through 88.02969192. In HFP mode, the smallest normalized positive number is 16**(-129), and the largest positive single precision number is 16**127 - 16**100. Thus the range of numbers over which we can approximate 'e**x' is -357.6639451 to 352.1187677 (the upper bound is excluded since the nearest HFP number to it is slightly larger). The 'exponential_' module doesn't actually calculate 'e**x'. In BFP mode, it calculates '2**y' where 'y' is chosen so that '2**y' has the same value as 'e**x' (i.e. 'y' is 'x*log2(e)'). Similarly, in HFP mode, '16**y' is calculated, where 'y' is chosen so that '16**y' has the same value as 'e**x' (i.e. 'y' is 'x*log16(e)'). This is done so that we can take advantage of the ease with which integer powers of the base can be manipulated in floating point arithmetic. In the BFP case, we want to evaluate '2**y', where 'y' is in the range [-129, 127). From the laws of exponents, we know that 2**(m+n) is the same as (2**m)*(2**n). This suggests that, since it is trivial to multiply by a power of 2 in BFP mode, that we should break 'y' into the sum of an integer 'iy' and a real number 'ry', such that we can approximate '2**ry' by a MTB-729 Multics Technical Bulletin Basic Math Functions polynomial. We want to choose 'iy' and 'ry' so that '2**iy' fits in the range of floating point numbers, and so that the variation in 'ry' is small, so as to minimize the degree of polynomial needed to approximate '2**ry'. These conditions are satisfied if we choose 'iy' to be one more than the floor of 'y', for then 'iy' is in [-128, 127] and 'ry' is in [-1, 0). The function '2**z' can be approximated to single precision accuracy over [-1, 0) by the degree 7 polynomial: p(z) = p0 + p1*z + p2*z**2 + ... + p7*z**7 where: p0 = 0.999999999959788989221 p1 = 0.693147175773076184335 p2 = 0.240226411617528907564 p3 = 0.055503374633869439843 p4 = 0.009615319129350436459 p5 = 0.001327438181098387966 p6 = 0.000147007243118869978 p7 = 0.000010749381848696467 Evaluation of the polynomial 'p(z)' will result in underflows if the magnitude of 'z' is 1.3669325e-34 or less. We can avoid this easily since '2**z' is just 1 to 71 bits of precision for |z| less than 1.56417309e-19. In HFP mode, we want to calculate '16**y'. We do this analogously to how '2**y' is approximated in BFP mode: Break 'y' into the sum of an integer 'iy' and a real number 'ry', where 'iy' is one more than the floor of 'y', so that 'iy' is in [-128, 127] and 'ry' is in [-1, 0). Then, since '16**y' is the same as the product of '16**iy' and '16**ry', all we have to do is find a way to calculate '16**ry'. A little thought will show that we already have a way to calculate '16**ry' efficiently, since '16**ry' is the same as '(2**ry)**4' and '2**ry' can be calculated by the internal procedure 'part_exp2'! The 'exponential_' module is implemented in ALM for efficiency and support of the HFP routines. A BFP-only version could be implemented in PL/I. It would look like this: exponential_: proc (x) returns (float bin); dcl x float bin; Multics Technical Bulletin MTB-729 Basic Math Functions dcl log_2_of_e float bin (63) static options (constant) init (1.442695040888963407359), max_value float bin (27) static options (constant) init (1.701411834604691217e38); dcl result float bin, ry float bin, y float bin; dcl iy fixed bin; dcl expon fixed bin (7) unaligned based; if x < -89.41598629 then result = 0; else if x > 88.02969192 then do; /* force overflow */ result = max_value + max_value; result = max_value; end; else do; y = x * log_2_of_e; iy = floor (y + 1); ry = y - iy; result = part_exp2 (ry); addr (result) -> expon = addr (result) -> expon + iy; end; return (result); part_exp2: proc (z) returns (float bin (63)); dcl z float bin; dcl p0 float bin (63) static options (constant) init (0.999999999959788989221), p1 float bin (63) static options (constant) init (0.693147175773076184335), p2 float bin (63) static options (constant) init (0.240226411617528907564), p3 float bin (63) static options (constant) MTB-729 Multics Technical Bulletin Basic Math Functions init (0.055503374633869439843), p4 float bin (63) static options (constant) init (0.009615319129350436459), p5 float bin (63) static options (constant) init (0.001327438181098387966), p6 float bin (63) static options (constant) init (0.000147007243118869978), p7 float bin (63) static options (constant) init (0.000010749381848696467); if abs (z) < 1.56417309e-19 then return (1); return (p0 + z * (p1 + z * (p1 + z * (p3 + z * (p4 + z * (p5 + z * (p6 + z * p7))))))); end part_exp2; end exponential_; If PL/I supported HFP, the HFP algorithm would be coded like this: hfp_exponential_: proc (x) returns (float hex); dcl x float hex; dcl log_2_of_e float hex (63) static options (constant) init (1.442695040888963407359), max_value float hex (27) static options (constant) init (1.701411834604691217e38); dcl result float hex, ry float hex, y float hex; dcl iy fixed hex; dcl expon fixed hex (7) unaligned based; Multics Technical Bulletin MTB-729 Basic Math Functions if x < -357.6639451 then result = 0; else if x >= 352.1187677 then do; /* force overflow */ result = max_value + max_value; result = max_value; end; else if abs (x) < 1.0842021e-19 then result = 1.0; else do; y = x * log_16_of_e; iy = floor (y + 1); ry = 4 * (y - iy); scale = 1; do while (ry < -1.0); scale = 0.5 * scale; ry = ry + 1; end; result = scale * hfp_part_exp2 (ry); addr (result) -> expon = addr (result) -> expon + iy; end; return (result); hfp_part_exp2: proc (z) returns (float hex (63)); dcl z float hex; dcl p1 float hex (63) static options (constant) init (0.999999999959788989221), p2 float hex (63) static options (constant) init (0.693147175773076184335), p3 float hex (63) static options (constant) init (0.240226411617528907564), p4 float hex (63) static options (constant) init (0.055503374633869439843), p5 float hex (63) static options (constant) init (0.001327438181098387966), MTB-729 Multics Technical Bulletin Basic Math Functions p6 float hex (63) static options (constant) init (0.000147007243118869978), p7 float hex (63) static options (constant) init (0.000010749381848696467); if abs (z) < 1.56417309e-19 then return (1); return (p0 + z * (p1 + z * (p1 + z * (p3 + z * (p4 + z * (p5 + z * (p6 + z * p7))))))); end hfp_part_exp2; end hfp_exponential_; Multics Technical Bulletin MTB-729 Basic Math Functions 6.6. 'double_exponential_' The 'double_exponential_' module calculates the exponential function 'e**x' to double precision accuracy in either BFP or HFP mode. There are two entry points into the module: double_exponential_ hfp_double_exponential_ We can only approximate the exponential function over a small portion of the range of floating point numbers. In BFP mode, that range is from log(2**(-129)) to log(2**127 - 2**64) (i.e. about -89.4159862922329449148 to 88.0296919311130542959). In HFP mode, that range is from log(16**(-129)) to log(16**127 - 16**64) (i.e. about -357.663945168931779659 to 352.1187677244522171839). In either mode, the exponential of a value below the lower bound is too small to represent and we can just return a zero. Similarly, the exponential of a value above the upper bound is too large to represent, so we should signal an overflow and, if we are restarted, return the largest floating point value that can be represented. As in the single precision 'exponential_' routine, we do not actually calculate 'e**x'. In the BFP case we calculate '2**y' where 'y' has the value 'x*log2(e)'. In the HFP case, we calculate '16**y', where 'y' has the value 'x*log16(e)'. In the single precision 'exponential_' routine, we calculated '2**y' by breaking 'y' into the sum of an integer 'iy' and a real number 'ry' such that 'iy' was one more than the floor of 'y'. Then 'iy' was in the range [-128, 127] and 'ry' was in the range [-1, 0). By the laws of exponents, '2**y' is the same as the product of '2**iy' with '2**ry', so the crux of the routine was to evaluate '2**ry'. This was accomplished by means of a degree 7 polynomial. We could do the same kind of thing in 'double_exponential_', except use a higher degree polynomial to approximate '2**ry' to double precision accuracy. However, it turns out that it takes less computation to approximate '2**z' to double precision accuracy with a rational function instead of a polynomial, provided the range of the approximation is centered at zero. Fortunately, it is not difficult to alter the strategy used in 'exponential_' to work with an approximation of '2**z' on the range [-0.5, 0.5) instead of [-1, 0): If 'part_exp2(z)' approximates '2**z' to double precision accuracy on [-0.5, 0.5), then we can use 'part_exp2(ry)' to approximate '2**ry' on [-0.5, 0] and we can use '0.5*part_exp2(ry+1)' to approximate it on [-1, -0.5), since '0.5*2**(ry+1)' has the same value as '2**ry'. MTB-729 Multics Technical Bulletin Basic Math Functions It turns out that we can approximate '2**z' for 'z' in [-0.5, 0.5] to double precision accuracy with the rational function: r(z) = (q(z*z) + z*p(z*z)) / (q(z*z) - z*p(z*z)) where 'p' and 'q' are the polynomials: p(z) = p0 + p1*z + p2*z**2 q(z) = q0 + q1*z + q2*z**2 + z**3 with: p0 = 2.0803843466946630014e6 p1 = 3.0286971697440362990e4 p2 = 6.0614853300610808416e1 q0 = 6.0027203602388325282e6 q1 = 3.2772515180829144230e5 q2 = 1.7492876890930764038e3 If we try to calculate 'r(z)' for a 'z' of very small magnitude (about 2.657e-20 or less), we will obtain one or more underflows. This is easy to avoid, though, since '2**z' is just 1 to 71 bits of precision for |z| < 1.56417309e-19. In HFP mode, we want to calculate '16**y'. We do this analogously to how '2**y' is approximated in BFP mode: Break 'y' into the sum of an integer 'iy' and a real number 'ry', where 'iy' is one more than the floor of 'y', so that 'iy' is in [-128, 127] and 'ry' is in [-1, 0). Then, since '16**y' is the same as the product of '16**iy' and '16**ry', all we have to do is find a way to calculate '16**ry'. In the single precision 'hfp_exponential_' routine, we were able to approximate '16**ry' by 'part_exp2(ry)**4', since '16**z' is the same as '(2**z)**4'. This was numerically sound because we used double precision arithmetic to calculate the 4th power, and we only needed the result to be accurate to single precision. We would prefer to avoid this type of calculation in the double precision case, since calculating the 4th power of a double precision number with double precision arithmetic can yield an answer that is wrong (due to round-off errors) in the last 4 bits. Fortunately, there is another way to calculate '16**ry' through 'part_exp2' which introduces much less error, based on the fact that the product of '2**s' and '2**(4*ry-s)' has the same value as '16**ry', for any 's'. The strategy will be to choose an 's' based on the value of 'ry' in such a way that '2**s' is easy to calculate and '4*ry-s' is in the range [-0.5, 0.5), so that '2**(4*ry-s)' can be approximated by 'part_exp2(4*ry-s)'. A Multics Technical Bulletin MTB-729 Basic Math Functions little thought will show that if we choose 's' to be the floor of '4*ry+0.5' then '4*ry-s' will lie in [-0.5, 0.5) and, since 'ry' is in [-1, 0), 's' will be one of -4, -3, -2, -1 or 0. Since 's' can only take on 5 values and they are all integers, we can calculate '2**s' by a table look-up that uses 's' as the table index. The 'double_exponential_' module is implemented in ALM for efficiency and support of the HFP routines. A BFP-only version could be implemented in PL/I. It would look like this: double_exponential_: proc (x) returns (float bin (63)); dcl x float bin (63); dcl log_2_of_e float bin (63) static options (constant) init (1.442695040888963407359), max_value float bin (63) static options (constant) init (1.701411834604691217e38); dcl result float bin (63), ry float bin (63), y float bin (63); dcl iy fixed bin; dcl expon fixed bin (7) unaligned based; if x <= -89.4159862922329449148 then result = 0; else if x >= 88.0296919311130542959 then do; /* force overflow */ result = max_value + max_value; result = max_value; end; else do; y = x * log_2_of_e; iy = floor (y + 1); ry = y - iy; if ry < -0.5 then result = 0.5 * part_exp2 (ry + 1); else result = part_exp2 (ry); addr (result) -> expon = addr (result) -> expon + iy; end; return (result); MTB-729 Multics Technical Bulletin Basic Math Functions part_exp2: proc (z) returns (float bin (63)); dcl z float bin (63); dcl p float bin (63), q float bin (63), zz float bin (63); dcl p0 float bin (63) static options (constant) init (2.0803843466946630014e6), p1 float bin (63) static options (constant) init (3.0286971697440362990e4), p2 float bin (63) static options (constant) init (6.0614853300610808416e1), q0 float bin (63) static options (constant) init (6.0027203602388325282e6), q1 float bin (63) static options (constant) init (3.2772515180829144230e5), q2 float bin (63) static options (constant) init (1.7492876890930764038e3); if abs (z) < 1.56417309e-19 then return (1); zz = z * z; p = z * (p0 + zz * (p1 + zz * p2)); q = q0 + zz * (q1 + zz * (q2 + zz)); return ((q + p) / (q - p)); end part_exp2; end double_exponential_; If PL/I supported HFP, the HFP algorithm would be coded like this: hfp_double_exponential_: proc (x) returns (float hex (63)); dcl x float hex (63); Multics Technical Bulletin MTB-729 Basic Math Functions dcl log_2_of_e float hex (63) static options (constant) init (1.442695040888963407359), max_value float hex (63) static options (constant) init (1.701411834604691217e38); dcl two_to_the (-4:0) static init (0.0625, 0.125, 0.25, 0.5, 1); dcl result float hex (63), ry float hex (63), y float hex (63); dcl iy fixed hex, s fixed hex; dcl expon fixed hex (7) unaligned based; if x <= -357.663945168931779659 then result = 0; else if x >= 352.1187677244522171839 then do; /* force overflow */ result = max_value + max_value; result = max_value; end; else if abs (x) < 1.08420217248550443e-19 then result = 1.0; else do; y = x * log_2_of_e; iy = floor (y + 1); ry = y - iy; s = floor (4 * ry + 0.5); result = two_to_the (s) * part_exp2 (4 * ry - s); addr (result) -> expon = addr (result) -> expon + iy; end; return (result); hfp_part_exp2: proc (z) returns (float hex (63)); dcl z float hex (63); dcl p float hex (63), q float hex (63), zz float hex (63); MTB-729 Multics Technical Bulletin Basic Math Functions dcl p0 float hex (63) static options (constant) init (2.0803843466946630014e6), p1 float hex (63) static options (constant) init (3.0286971697440362990e4), p2 float hex (63) static options (constant) init (6.0614853300610808416e1), q0 float hex (63) static options (constant) init (6.0027203602388325282e6), q1 float hex (63) static options (constant) init (3.2772515180829144230e5), q2 float hex (63) static options (constant) init (1.7492876890930764038e3); if abs (z) < 1.56417309e-19 then return (1); zz = z * z; p = z * (p0 + zz * (p1 + zz * p2)); q = q0 + zz * (q1 + zz * (q2 + zz)); return ((q + p) / (q - p)); end hfp_part_exp2; end hfp_double_exponential_; Multics Technical Bulletin MTB-729 Basic Math Functions 6.7. 'logarithm_' The 'logarithm_' module calculates base 10, 2 and 'e' logarithms to single precision accuracy, in either BFP or HFP mode. There are six entry points corresponding to the six functions implemented by this module. They are: log_base_10_ hfp_log_base_10_ log_base_2_ hfp_log_base_2_ log_base_e_ hfp_log_base_e_ The domain of the logarithm_ function is the positive real numbers. If asked to calculate the logarithm of zero, we signal the 'error' condition with the diagnostic 'log(0), log2(0), log10(0) not allowed. Type "start" to set result to -max_real.' If asked to calculate the logarithm of a negative number, we signal the 'error' condition with the diagnostic 'log(-x), log2(-x), log10(-x) not allowed. Type "start" to set result to -max_real.' Our handling of requests for the logarithm of zero is consistent with previous versions, although we have changed the diagnostic slightly: It used to give the constant '-.17014118e+39' where we use '-max_real'. This change makes the diagnostic independent of whether we are in BFP or HFP mode. Our handling of requests for the logarithm of a negative number differs from that in previous versions. The version of 'logarithm_' in the system since MR9 would signal the 'error' condition with a diagnostic of 'alog(-x), alog2(-x), alog10(-x) not allowed. Type "start" to evaluate for +x.' However, if you did type 'start', the logarithm of one (i.e. zero) would be evaluated. Obviously since the diagnostic did not agree with the actual return value, there is no sense in maintaining compatibility with the previous version. Since negative numbers are not in the domain of the logarithm function, the value returned when the user types 'start' is purely arbitrary. However, if the value is only slightly less than zero, perhaps it was the result of a calculation that should have given a slightly positive number but didn't because of round-off errors. In that case it is clearly superior to return a result consistent with a very small positive number, which is why we have changed the return value from zero to '-max_real'. The theory of logarithms tells us that it is easy to calculate logarithms in some base in terms of logarithms in a different base. Specifically, if we can compute base N logarithms and we want to compute base M logarithms, we use the formula: logM(x) = logM(N) * logN(x) MTB-729 Multics Technical Bulletin Basic Math Functions Thus, even though the 'logarithm_' module has entry points to calculate logarithms in three different bases, we need only be able to calculate logarithms in one of those bases. We will choose to calculate base 2 logaritms because the base 2 logarithm of a BFP is just the logarithm of the mantissa plus the value of the exponent. It turns out that it takes much less computation to approximate 'log2((1+y)/(1-y))' than it does to calculate 'log2(x)'. Thus when we are called upon to evaluate 'log2(x)', we will calculate a 'y' such that '(1+y)/(1-y)' has the same value as 'x' (i.e. 'y' is '(x-1)/(x+1)'), and then approximate 'log2((1+y)/(1-y))'. The function 'log2((1+y)/(1-y))' can be approximated to better than single precision accuracy for 'y' in [-0.171573, 0.171573] (i.e. for '(1+y)/(1-y)' in [0.5*2**0.5, 2**0.5]) by 'y*p(y*y)', where 'p' is the cubic polynomial: p(z) = p0 + p1*z + p2*z**2 + p3*z**3 with: p0 = 2.88539007275213810 p1 = 0.961800759210250522 p2 = 0.576584541348266310 p3 = 0.434255940790007142 If the magnitude of 'y' is less than 5.081691e-20, evaluation of 'y*p(y*y)' will result in one or more underflows. This can easily be avoided since 'y*p(y*y)' is just 'y*p0' to 71 bits of precision for |y| < 4.1968417e-11, and 'y*p0' cannot underflow since 'p0' > 1. The above tells us how we can calculate 'log2(x)' if 'x' is in the range [0.5*2**0.5, 2**0.5]. The PL/I code for this (ignoring for the moment the code to prevent underflows) would look like: y = (x-1)/(x+1); yy = y*y; log2_of_x = y*(p0 + yy*(p1 + yy*(p2 + yy*p3))); Now let us consider how to handle values outside of the range for which the above code works. We will consider the BFP case first. Let 'b' be any positive normalized BFP number. The hardware represents 'b' as 'bm * 2**be', where 'bm' is a binary fraction in the range [0.5, 1), and 'be' is an integer in the range [-128, 127]. By the laws of logarithms, log2(b) = log2(bm * 2**be) = log2(bm) + log2(2**be) = log2(bm) + be Multics Technical Bulletin MTB-729 Basic Math Functions Thus we can easily calculate the logarithm of any positive normalized BFP number if we can calculate the logarithm of its mantissa. Unfortunately, the mantissa is in the range [0.5, 1), but we only know how to approximate logarithms on about [0.707, 1.414]. This is easy to remedy, since: log2(bm) = log2(bm * 2**0.5 / 2**0.5) = log2(bm * 2**0.5) - log2(2**0.5) = log2(bm * 2**0.5) - 0.5 Since 'bm' is in [0.5, 1), 'bm * 2**0.5' is in [0.5*2**0.5, 2**0.5), and we can thus approximate 'log2(bm * 2**0.5)'. It is not actually necessary to form the product of 'bm' and 2**0.5 because all we are going to do with it is use it to calculate 'y' such that '(1+y)/(1-y)' has the same value as the product. In other words: y = (bm*2**0.5 - 1)/(bm*2**0.5 + 1) = (bm - 1/2**0.5)/(bm + 1/2**0.5) = (bm - 0.5*2**0.5)/(bm + 0.5*2**0.5) What this means in the general case is that 'log2(x/y)' can be approximated to single precision accuracy by 'z*p(z*z)', where 'z' is '(x - y)/(x + y)', providing that 'x/y' is in the range [0.5*2**0.5, 2**0.5]. For this reason, we shall make the basis of our code an internal procedure that, given 'x' and 'y' such that 'x/y' is in [0.5*2**0.5, 2**0.5], calculates 'log2(x/y)'. The HFP case is slightly more complicated for values outside the range where we can directly apply 'part_log_of_ratio'. Let 'h' be any positive normalized HFP number. The hardware represents 'h' as 'hm * 16**he', where 'hm' is a hexadecimal fraction in the range [0.0625, 1), and 'he' is an integer in the range [-128, 127]. By the laws of logarithms: log2(h) = log2(hm * 16**he) = log2(hm) + log2(16**he) = log2(hm) + 4*he We can calculate 'log2(hm)' by introducing a scaling factor '2**s': log2(hm) = log2(hm * 2**s / 2**s) = log2(hm * 2**s) - log2(2**s) = log2(hm * 2**s) - s MTB-729 Multics Technical Bulletin Basic Math Functions The value of 'hm' is somewhere in [0.0625, 1). We want the value of 'hm * 2**s' to be in [0.5*2**0.5, 2**0.5]. This is easily done: If 'hm' is in [0.0625, 0.125), 's' is 3.5; if 'hm' is in [0.125, 0.25), 's' is 2.5; if 'hm' is in [0.25, 0.5), 'hm' is 1.5; and if 'hm' is in [0.5, 1), 's' is 0.5. It is not necessary to actually figure out which interval 'hm' is in, as we can do it with the following loop: s = 0.5; do while (hm < 0.5); s = s + 1; hm = 2*hm; end; The 'logarithm_' module is implemented in ALM for efficiency and support of the HFP routines. A BFP-only version could be implemented in PL/I. It would look like this: logarithm_: proc options (support); dcl x float bin; dcl log_10_of_2 float bin (63) static options (constant) init (3.010299956639811952137e-01), log_e_of_2 float bin (63) static options (constant) init (6.931471805599453094172e-01), max_value float bin (27) static options (constant) init (1.701411834604691217e38); dcl math_error_ entry (fixed bin); log_base_10_: entry (x) returns (float bin); return (log_10_of_2 * log2 (x)); log_base_2_: entry (x) returns (float bin); return (log2 (x)); log_base_e_: entry (x) returns (float bin); return (log_e_of_2 * log2 (x)); Multics Technical Bulletin MTB-729 Basic Math Functions log_of_negative: call math_error_ (10); return (-max_value); log_of_zero: call math_error_ (9); return (-max_value); log2: proc (x) returns (float bin (63)); dcl x float bin; dcl bias float bin; dcl result float bin (63), xm float bin (63); dcl xe fixed bin; dcl expon fixed bin (7) unaligned based; dcl square_root_half float bin (63) static options (constant) init (.7071067811865475244008), square_root_two float bin (63) static options (constant) init (1.414213562373095048801); if x < 0 then goto log_of_negative; if x = 0 then goto log_of_zero; if square_root_half <= x & x <= square_root_two then result = part_log2_of_ratio (x, 1); else do; xe = addr (x) -> expon; xm = x; addr (xm) -> expon = 0; bias = float (xe) - 0.5; result = part_log2_of_ratio (xm, square_root_half) + bias; end; return (result); end log2; /* part_log2_of_ratio (x, y) calculates log2(x/y), where x/y is in the range [0.5*2**0.5, 2**0.5]. */ part_log2_of_ratio: proc (x, y) returns (float bin (63)); MTB-729 Multics Technical Bulletin Basic Math Functions dcl x float bin (63), y float bin (63); dcl z float bin (63), zz float bin (63); dcl p0 float bin (63) static options (constant) init (2.88539007275213810), p1 float bin (63) static options (constant) init (0.961800759210250522), p2 float bin (63) static options (constant) init (0.576584541348266310), p3 float bin (63) static options (constant) init (0.434255940790007142); z = (x - y) / (x + y); if abs (z) < 4.1968417e-11 then return (z * p0); zz = z * z; return (z * (p0 + zz * (p1 + zz * (p2 + zz * p3)))); end part_log2_of_ratio; end logarithm_; If PL/I supported HFP, the HFP algorithm would be coded like this: hfp_logarithm_: proc options (support); dcl x float hex; dcl log_10_of_2 float hex (63) static options (constant) init (3.010299956639811952137e-01), log_e_of_2 float hex (63) static options (constant) init (6.931471805599453094172e-01), max_value float hex (27) static options (constant) init (1.701411834604691217e38); dcl math_error_ entry (fixed hex); Multics Technical Bulletin MTB-729 Basic Math Functions hfp_log_base_10_: entry (x) returns (float hex); return (hfp_log_10_of_2 * hfp_log2 (x)); hfp_log_base_2_: entry (x) returns (float hex); return (hfp_log2 (x)); hfp_log_base_e_: entry (x) returns (float hex); return (hfp_log_e_of_2 * hfp_log2 (x)); hfp_log_of_negative: call math_error_ (10); return (-max_value); hfp_log_of_zero: call math_error_ (9); return (-max_value); hfp_log2: proc (x) returns (float hex (63)); dcl x float hex; dcl bias float hex; dcl result float hex (63), xm float hex (63); dcl xe fixed hex, shift fixed hex; dcl expon fixed hex (7) unaligned based; dcl square_root_half float hex (63) static options (constant) init (.7071067811865475244008), square_root_two float hex (63) static options (constant) init (1.414213562373095048801); if x < 0 then goto hfp_log_of_negative; if x = 0 then goto hfp_log_of_zero; if square_root_half <= x & x <= square_root_two then result = hfp_part_log2_of_ratio (x, 1); else do; xe = addr (x) -> expon; xm = x; addr (xm) -> expon = 0; shift = 0; MTB-729 Multics Technical Bulletin Basic Math Functions do while (xm < 0.5); xm = 2 * xm; shift = shift + 1; end; bias = float (4 * xe - shift) - 0.5; result = hfp_part_log2_of_ratio (xm, square_root_half) + bias; end; return (result); end hfp_log2; /* hfp_part_log2_of_ratio (x, y) calculates log2(x/y), where x/y is in the range [0.5*2**0.5, 2**0.5]. */ hfp_part_log2_of_ratio: proc (x, y) returns (float hex (63)); dcl x float hex (63), y float hex (63); dcl z float hex (63), zz float hex (63); dcl p0 float hex (63) static options (constant) init (2.88539007275213810), p1 float hex (63) static options (constant) init (0.961800759210250522), p2 float hex (63) static options (constant) init (0.576584541348266310), p3 float hex (63) static options (constant) init (0.434255940790007142); z = (x - y) / (x + y); if abs (z) < 4.1968417e-11 then return (z * p0); zz = z * z; return (z * (p0 + zz * (p1 + zz * (p2 + zz * p3)))); end hfp_part_log2_of_ratio; end hfp_logarithm_; Multics Technical Bulletin MTB-729 Basic Math Functions 6.8. 'double_logarithm_' The 'double_logarithm_' module calculates base 10, 2 and 'e' logarithms to double precision accuracy, in either BFP or HFP mode. There are six entry points corresponding to the six functions implemented by this module. They are: double_log_base_10_ hfp_double_log_base_10_ double_log_base_2_ hfp_double_log_base_2_ double_log_base_e_ hfp_double_log_base_e_ The algorithm used to implement these functions is identical to that used for the single precision 'logarithm_' module, except that the polynomial 'p(x)' used to approximate 'log2((1+z)/(1-z))' to single precision accuracy is replaced by a rational function that gives double precision accuracy. The function 'log2((1+z)/(1-z))' can be approximated to double precision accuracy over [-0.171573, 0.171573] by the rational function 'z*p(z*z)/q(z*z)', where 'p' and 'q' are the polynomials: p(x) = p0 + p1*x + p2*x**2 + p3*x**3 q(x) = q0 + q1*x + q2*x**2 + q3*x**3 + x**4 with: p0 = 513.90458864923992069 p1 = -792.10250577344319906 p2 = 340.70763364903118663 p3 = -35.419160305337449948 q0 = 178.10575834951956203 q1 = -333.89039541217149928 q2 = 193.75591463035879517 q3 = -35.526251110400238735 One or more underflows will occur in the evaluation of 'z*p(z*z)/q(z*z)' if the magnitude of 'z' is less than 3.8332335e-20. This is easy to avoid, since 'z*p(z*z)/q(z*z)' has the same value as 'z*p0/q0' to 71 bits of precision if the magnitude of 'z' is less than 1.27420168e-11. The 'double_logarithm_' module is implemented in ALM for efficiency and support of the HFP routines. A BFP-only version could be implemented in PL/I. It would look like this: double_logarithm_: proc options (support); dcl x float bin (63); MTB-729 Multics Technical Bulletin Basic Math Functions dcl log_10_of_2 float bin (63) static options (constant) init (3.010299956639811952137e-01), log_e_of_2 float bin (63) static options (constant) init (6.931471805599453094172e-01), max_value float bin (63) static options (constant) init (1.701411834604691217e38); dcl math_error_ entry (fixed bin); double_log_base_10_: entry (x) returns (float bin (63)); return (log_10_of_2 * log2 (x)); double_log_base_2_: entry (x) returns (float bin (63)); return (log2 (x)); double_log_base_e_: entry (x) returns (float bin (63)); return (log_e_of_2 * log2 (x)); log_of_negative: call math_error_ (21); return (-max_value); log_of_zero: call math_error_ (20); return (-max_value); log2: proc (x) returns (float bin (63)); dcl x float bin (63); dcl bias float bin (63), result float bin (63), xm float bin (63); dcl xe fixed bin; dcl expon fixed bin (7) unaligned based; dcl square_root_half float bin (63) static options (constant) init (.7071067811865475244008), square_root_two float bin (63) static options (constant) init (1.414213562373095048801); Multics Technical Bulletin MTB-729 Basic Math Functions if x < 0 then goto log_of_negative; if x = 0 then goto log_of_zero; if square_root_half <= x & x <= square_root_two then result = part_log2_of_ratio (x, 1); else do; xe = addr (x) -> expon; xm = x; addr (xm) -> expon = 0; bias = float (xe) - 0.5; result = part_log2_of_ratio (xm, square_root_half) + bias; end; return (result); end log2; /* part_log2_of_ratio (x, y) calculates log2(x/y), where x/y is in the range [0.5*2**0.5, 2**0.5]. */ part_log2_of_ratio: proc (x, y) returns (float bin (63)); dcl x float bin (63), y float bin (63); dcl p float bin (63), q float bin (63), z float bin (63), zz float bin (63); dcl p0 float bin (63) static options (constant) init (513.90458864923992069), p1 float bin (63) static options (constant) init (-792.10250577344319906), p2 float bin (63) static options (constant) init (340.70763364903118663), p3 float bin (63) static options (constant) init (-35.419160305337449948), q0 float bin (63) static options (constant) init (178.10575834951956203), q1 float bin (63) static options (constant) init (-333.89039541217149928), MTB-729 Multics Technical Bulletin Basic Math Functions q2 float bin (63) static options (constant) init (193.75591463035879517), q3 float bin (63) static options (constant) init (-35.526251110400238735); z = (x - y) / (x + y); if abs (z) < 1.27420168e-11 then do; p = p0; q = q0; end; else do; zz = z * z; p = p0 + zz * (p1 + zz * (p2 + zz * p3)); q = q0 + zz * (q1 + zz * (q2 + zz * (q3 + zz))); end; return (z * p / q); end part_log2_of_ratio; end double_logarithm_; If PL/I supported HFP, the HFP algorithm would be coded like this: hfp_double_logarithm_: proc options (support); dcl x float hex (63); dcl log_10_of_2 float hex (63) static options (constant) init (3.010299956639811952137e-01), log_e_of_2 float hex (63) static options (constant) init (6.931471805599453094172e-01), max_value float hex (63) static options (constant) init (1.701411834604691217e38); dcl math_error_ entry (fixed hex); hfp_double_log_base_10_: entry (x) returns (float hex); return (hfp_log_10_of_2 * hfp_log2 (x)); Multics Technical Bulletin MTB-729 Basic Math Functions hfp_double_log_base_2_: entry (x) returns (float hex); return (hfp_log2 (x)); hfp_double_log_base_e_: entry (x) returns (float hex); return (hfp_log_e_of_2 * hfp_log2 (x)); hfp_log_of_negative: call math_error_ (21); return (-max_value); hfp_log_of_zero: call math_error_ (20); return (-max_value); hfp_log2: proc (x) returns (float hex (63)); dcl x float hex (63); dcl bias float hex (63), result float hex (63), xm float hex (63); dcl xe fixed hex; dcl expon fixed hex (7) unaligned based; dcl square_root_half float hex (63) static options (constant) init (.7071067811865475244008), square_root_two float hex (63) static options (constant) init (1.414213562373095048801); if x < 0 then goto hfp_log_of_negative; if x = 0 then goto hfp_log_of_zero; if square_root_half <= x & x <= square_root_two then result = hfp_part_log2_of_ratio (x, 1); else do; xe = addr (x) -> expon; xm = x; addr (xm) -> expon = 0; shift = 0; do while (xm < 0.5); xm = 2 * xm; shift = shift + 1; end; MTB-729 Multics Technical Bulletin Basic Math Functions bias = float (4 * xe - shift) - 0.5; result = hfp_part_log2_of_ratio (xm, hfp_square_root_half) + bias; end; return (result); end hfp_log2; /* hfp_part_log2_of_ratio (x, y) calculates log2(x/y), where x/y is in the range [0.5*2**0.5, 2**0.5]. */ hfp_part_log2_of_ratio: proc (x, y) returns (float hex (63)); dcl x float hex (63), y float hex (63); dcl p float hex (63), q float hex (63), z float hex (63), zz float hex (63); dcl p0 float hex (63) static options (constant) init (513.90458864923992069), p1 float hex (63) static options (constant) init (-792.10250577344319906), p2 float hex (63) static options (constant) init (340.70763364903118663), p3 float hex (63) static options (constant) init (-35.419160305337449948), q0 float hex (63) static options (constant) init (178.10575834951956203), q1 float hex (63) static options (constant) init (-333.89039541217149928), q2 float hex (63) static options (constant) init (193.75591463035879517), q3 float hex (63) static options (constant) init (-35.526251110400238735); Multics Technical Bulletin MTB-729 Basic Math Functions z = (x - y) / (x + y); if abs (z) < 1.27420168e-11 then do; p = p0; q = q0; end; else do; zz = z * z; p = p0 + zz * (p1 + zz * (p2 + zz * p3)); q = q0 + zz * (q1 + zz * (q2 + zz * (q3 + zz))); end; return (z * p / q); end hfp_part_log2_of_ratio; end hfp_double_logarithm_; MTB-729 Multics Technical Bulletin Basic Math Functions 6.9. 'principal_angle_' The trigonometric math functions are periodic with a period of pi radians (tangent and cotangent) or 2*pi radians (sine and cosine). This periodicity is used to reduce the input angle to an equivalent angle in the range of -pi/4 to 7*pi/4 radians (or -45 to 315 degrees, if we are dealing in degrees). The code to perform this reduction for the single precision trigonometric math routines is provided by the 'principal_angle_' module. The 'principal_angle_' module has four entry points, corresponding to which mode (BFP or HFP) and what units (degrees or radians) that the input angle is specified in. They are: principal_degrees_ hfp_principal_degrees_ principal_radians_ hfp_principal_radians_ It may seem strange that the range of the equivalent angle starts at -pi/4 radians rather than zero. This is because the equivalent angle is returned in two parts. The first part is an angle (called the principal angle) in the range of -pi/4 to pi/4 radians (or -45 to 45 degrees, if we are dealing in degrees). The second part is an integer (called the shift), in the range 0 to 3. The equivalent angle is the sum of the principal angle and 'shift' right angles. The mathematics needed to calculate the principal angle and shift are straightforward. Let 'n' be the number of right angles in the input angle, to the nearest integer: n = floor(angle/right_angle + 0.5) where 'right_angle' is 90 if we are dealing in degrees and pi/2 if we are dealing in radians. Then the principal angle and the shift are simply: principal_angle = angle - n*right_angle shift = mod(n, 4) The simplicity of the above specification for calculating the principal angle and shift may mislead one into believing that the implementation is equally simple. For example, if the calculation was being done in PL/I, one might be tempted to simply code: n = floor(divide(angle, right_angle, 63) + 0.5); principal_angle = angle - multiply(n, angle, 63); shift = mod(n, 4); Multics Technical Bulletin MTB-729 Basic Math Functions At first glance, the above PL/I code looks very reasonable, especially considering that all floating point operations have been forced to be carried out in double precision. However, there are two serious problems with this code. The first problem occurs in the calculation of 'n'. It is crucial that 'n' be calculated exactly, for if it is not, the value calculated for 'principal_angle' will not be in the range of -pi/4 to pi/4 radians (-45 to 45 degrees). The problem is that if 'angle' is too large, then 0.5 will be additively insignificant compared to the result of dividing 'angle' by 'right_angle'. The solution is to limit the magnitude of 'angle'. In 'principal_angle_', we limit the magnitude of the angle to less than 2**54 in BFP mode and 2**48 in HFP mode. If an angle of larger magnitude is supplied, an error condition will be signalled. (Actually, these bounds are more conservative than are needed for correct calculation of 'n'. However, they are required for the calculation of the principal angle when we are dealing in radians, and they are adopted in the degrees case for convenience.) The second (and more serious) problem occurs only when dealing in radians. It occurs in the calculation of the principal angle. The problem is that the amount of precision needed in the calculation is actually higher than double precision. The reason that higher than double precision accuracy is needed is that the measure of a right angle (i.e. pi/2 radians) cannot be expressed exactly as a floating point number, no matter how much precision is used. An example using decimal floating point arithmetic may make this clearer. Suppose that we are restricted to using 3 digit floating decimal arithmetic and want to calculate the principal angle of 16. The value of pi/2 to 3 significant digits is 1.57, so we would calculate 'n' as floor(16/1.57 + 0.5) = 10 and the principal angle would be 16 - 10*1.57 = 0.3. The value calculated for 'n' is correct, but the value calculated for the principal angle has only 1 significant digit; it should be 0.292. In order to get an answer with three significant digits, we must approximate pi/2 by 1.5708 and use 5 digit arithmetic. As a further example, again suppose that we are restricted to using 3 digit floating decimal arithmetic and want to calculate the principal angle of 157. We would find that 'n' is 100 and the principal angle is 0. The value of 'n' is correct, but the principal angle has no significant digits; it should be 0.0796. In order to get the correct answer, we would need to approximate pi/2 by 1.570796 and use 7 digit arithmetic! MTB-729 Multics Technical Bulletin Basic Math Functions The above decimal examples suggest that if the input angle has 'k' digits of precision and we desire 'l' digits of precision for the principal angle, then we must use approximately 'k + l' digit arithmetic in the calculation. Since our input angle has a precision of 27 bits and double precision arithmetic is accurate to 63 bits, this would seem to indicate that double precision calculations should be sufficient, if we wish the principal angle to be accurate to 27 bits. The problem with this reasoning is that we must actually return the value of the principal angle accurate to 63 bits! It may seem strange to require the principal angle to have 63 bits of precision when the input angle has only 27 bits of precision. The reasoning behind this is that the routines that call the 'principal_angle_' module want a result that has double precision accuracy because they may use other properties of the trigonometric functions to perform further transformations of the angle. If the principal angle were calculated to only single precision accuracy, these transformations could result in an angle that had little or no accuracy. The task of calculating the principal angle is broken into two cases according to the magnitude of 'n'. If the magnitude of 'n' is less than 2**27 in BFP mode or 2**24 in HFP mode, 'angle - n*pi/2' is calculated with arithmetic that simulates 81 bits of precision. Otherwise, it is calculated with arithmetic that simulates 108 bits of precision. The calculation of 'angle - n*pi/2' to higher than double precision accuracy may seem a little daunting. However, note that we only need the result to have double precision accuracy. This means that the calculation can be simplified to finding a set of floating point numbers whose sum (if summed with sufficiently high precision) is n*pi/2 to the desired precision. The members of this set can then be subtracted from 'angle' one at a time, in decreasing order of magnitude, using ordinary double precision subtraction. Consider the first case, where the magnitude of 'n' is less than 2**27 in BFP mode or 2**24 in HFP mode. Let 'n1' be the value 'n' expressed as a single precision floating point number, and let 'half_pi1', 'half_pi2', and 'half_pi3' be single precision floating point numbers whose sum represents pi/2 to 81 bits of precision. Then 'n*pi/2' is represented to 81 bits of precision by the expression: n1*(half_p1 + half_pi2 + half_pi3) This expression is equivalent to the sum t1 + t2 + t3, where: Multics Technical Bulletin MTB-729 Basic Math Functions t1 = n1*half_pi1 t2 = n1*half_pi2 t3 = n1*half_pi3 Each of the values t1, t2 and t3 can be held exactly in double precision (actually only float bin (54) is needed), so the principal angle can be calculated using ordinary double precision subtraction by: angle - t1 - t2 - t3 Now consider the case where 'n' is between 2**27 and 2**54 in BFP mode, or between 2**24 and 2**48 in HFP mode. Let 'n1' and 'n2' be a pair of single precision floating point numbers whose sum exactly represents the value of 'n' (i.e. 'n1' contains the most significant 27 bits of the floating point representation of 'n', and 'n2' contains the remaining bits). Similarly, let 'half_pi1', 'half_pi2', 'half_pi3' and 'half_pi4' be four single precision floating point numbers whose sum represents pi/2 to 108 bits of accuracy. Then 'n*pi/2' is presented to 108 bits of accuracy by the expression: (n1 + n2)*(half_pi1 + half_pi2 + half_pi3 + half_pi4) Some simple algebra will show that this sum is equivalent to t1 + t2 + t3 + t4 + t5, where: t1 = n1*half_pi1 t2 = n1*half_pi2 + n2*half_pi1 t3 = n1*half_pi3 + n2*half_pi2 t4 = n1*half_pi4 + n2*half_pi3 t5 = n2*half_pi4 Each of the above values can be held exactly in double precision. (In fact, t1 and t5 are only float bin (54) values, and t2, t3 and t4 are only float bin (55) values.) Their exact sum has a precision of 54 + 108 = 162 bits. Except for a possible carry, the t5 term does not contribute anything to the first 108 bits of the sum. Thus, since we are only interested in 108 bit accuracy, the principal_angle can be calculated using standard double precision subtraction by the expression: angle - t1 - t2 - t3 - t4 It is possible to code the BFP version of 'principal_angle_' in PL/I, but not the HFP version, since PL/I does not support HFP. However, if PL/I did support HFP, the differences between the BFP and HFP versions would be minimal. The following PL/I code shows how the BFP version would look, and includes comments where the HFP version would differ. MTB-729 Multics Technical Bulletin Basic Math Functions principal_angle_: proc options (support); dcl angle float bin (27) parameter, principal_angle float bin (63) parameter, shift fixed bin (1) parameter; dcl half_pi float bin (63) static init (1.57079632679489661923), half_pi1 float bin (27) static init (1.57079632580280304), half_pi2 float bin (27) static init (9.920935739593517155e-10), half_pi3 float bin (27) static init (5.721188709663574904e-18), half_pi4 float bin (27) static init (1.644625689296520716e-26), one_over_half_pi float bin (63) static init (6.3661977236758134307553e-1); dcl n fixed bin (71), n1 float bin (63), n2 float bin (63), t1 float bin (63), t2 float bin (63), t3 float bin (63), t4 float bin (63); dcl math_error_ entry (fixed bin); principal_degrees_: entry (angle, principal_angle, shift); if abs (angle) < 1e54b /* HFP: 1e48b */ then do; n = floor (divide (angle, 90, 63) + 0.5); shift = mod (n, 4); principal_angle = angle - multiply (n, 90, 63); end; else do; call math_error_ (70); /* HFP: call math_error_ (71); */ principal_angle, shift = 0; end; return; Multics Technical Bulletin MTB-729 Basic Math Functions principal_radians_: entry (angle, principal_angle, shift); if abs (angle) < 1e54b /* HFP: 1e48b */ then do; n = floor (angle * one_over_half_pi + 0.5); shift = mod (n, 4); if abs (n) < 1e27b /* HFP: 1e24b */ then do; n1 = n; t1 = multiply (n1, half_pi1, 63); t2 = multiply (n1, half_pi2, 63); t3 = multiply (n1, half_pi3, 63); principal_angle = angle - t1 - t2 - t3; end; else do; n1 = n; n2 = n - n1; t1 = multiply (n1, half_pi1, 63); t2 = multiply (n1, half_pi2, 63) + multiply (n2, half_pi1, 63); t3 = multiply (n1, half_pi3, 63) + multiply (n2, half_pi2, 63); t4 = multiply (n1, half_pi4, 63) + multiply (n2, half_pi3, 63); principal_angle = angle - t1 - t2 - t3 - t4; end; end; else do; call math_error_ (70); /* HFP: call math_error_ (71); */ principal_angle, shift = 0; end; end principal_angle_; MTB-729 Multics Technical Bulletin Basic Math Functions 6.10. 'double_principal_angle_' The trigonometric math functions are periodic with a period of pi radians (tangent and cotangent) or 2*pi radians (sine and cosine). This periodicity is used to reduce the input angle to an equivalent angle in the range of -pi/4 to 7*pi/4 radians (or -45 to 315 degrees, if we are dealing in degrees). The code to perform this reduction for the double precision trigonometric math routines is provided by the 'double_principal_angle_' module. The 'double_principal_angle_' module has four entry points, corresponding to which mode (BFP or HFP) and what units (degrees or radians) that the input angle is specified in. They are: double_principal_degrees_ hfp_double_principal_degrees_ double_principal_radians_ hfp_double_principal_radians_ The algorithms to implement the entry points involving degrees are the same as for the single precision 'principal_angle_' module. The algorithms to implement the entry points involving radians are analogous to those used in 'principal_angle_', but are of higher precision. In 'principal_angle_', the code for calculating the principal angle when dealing in radians is broken into two cases according to the magnitude of the integer 'n' (i.e. the number of right angles in the angle, to the nearest integer). In the first case (where the magnitude of 'n' is less than 2**27 in BFP mode and 2**24 in HFP mode) the value of the principal angle is calculated with arithmetic that simulates 81 bits of precision. In the second case, the computation is performed with arithmetic that simulates 108 bits of precision. In 'double_principal_angle_', the code for calculating the principal angle when dealing with radians is broken into the same two cases according to the magnitude of 'n'. Both cases calculate the value of the principal angle with arithmetic that simulates 135 bits of precision. It is possible to code the BFP version of 'double_principal_angle_' in PL/I, but not the HFP version, since PL/I does not support HFP. However, if PL/I did support HFP, the differences between the BFP and HFP versions would be minimal. The following PL/I code shows how the BFP version would look, and includes comments where the HFP version would differ. double_principal_angle_: proc options (support); Multics Technical Bulletin MTB-729 Basic Math Functions dcl angle float bin (63) parameter, principal_angle float bin (63) parameter, shift fixed bin (1) parameter; dcl half_pi float bin (63) static init (1.57079632679489661923), half_pi1 float bin (27) static init (1.57079632580280304), half_pi2 float bin (27) static init (9.920935739593517155e-10), half_pi3 float bin (27) static init (5.721188709663574904e-18), half_pi4 float bin (27) static init (1.644625689296520716e-26), half_pi5 float bin (27) static init (4.334905080264810101e-35), one_over_half_pi float bin (63) static init (6.3661977236758134307553e-1); dcl n fixed bin (71), n1 float bin (63), n2 float bin (63), t1 float bin (63), t2 float bin (63), t3 float bin (63), t4 float bin (63), t5 float bin (63); dcl math_error_ entry (fixed bin); double_principal_degrees_: entry (angle, principal_angle, shift); if abs (angle) < 1e54b /* HFP: 1e48b */ then do; n = floor (divide (angle, 90, 63) + 0.5); shift = mod (n, 4); principal_angle = angle - multiply (n, 90, 63); end; else do; call math_error_ (72); /* HFP: call math_error_ (73); */ principal_angle, shift = 0; end; return; MTB-729 Multics Technical Bulletin Basic Math Functions double_principal_radians_: entry (angle, principal_angle, shift); if abs (angle) < 1e54b /* HFP: 1e48b */ then do; n = floor (angle * one_over_half_pi + 0.5); shift = mod (n, 4); if abs (n) < 1e27b /* HFP: 1e24b */ then do; n1 = n; t1 = multiply (n1, half_pi1, 63); t2 = multiply (n1, half_pi2, 63); t3 = multiply (n1, half_pi3, 63); t4 = multiply (n1, half_pi4, 63); t5 = multiply (n1, half_pi5, 63); principal_angle = angle - t1 - t2 - t3 - t4 - t5; end; else do; n1 = n; n2 = n - n1; t1 = multiply (n1, half_pi1, 63); t2 = multiply (n1, half_pi2, 63) + multiply (n2, half_pi1, 63); t3 = multiply (n1, half_pi3, 63) + multiply (n2, half_pi2, 63); t4 = multiply (n1, half_pi4, 63) + multiply (n2, half_pi3, 63); t5 = multiply (n1, half_pi5, 63) + multiply (n2, half_pi4, 63); principal_angle = round (angle - t1 - t2 - t3 - t4 - t5, 63); end; end; else do; call math_error_ (72); /* HFP: call math_error_ (73); */ principal_angle, shift = 0; end; end double_principal_angle_; Multics Technical Bulletin MTB-729 Basic Math Functions 6.11. 'sine_' The 'sine_' module calculates to single precision accuracy the sine or cosine of an angle given in degrees or radians, in either BFP or HFP mode. There are eight entry points corresponding to the eight functions implemented by this module. They are: cosine_degrees_ hfp_cosine_degrees_ cosine_radians_ hfp_cosine_radians_ sine_degrees_ hfp_sine_degrees_ sine_radians_ hfp_sine_radians_ The only differences between the BFP and corresponding HFP entry points to this module are in the bit representations of the floating point constants used. Therefore, we need only describe the four BFP entry points. The strategy used to implement each of the entry points consists of two parts. First, the properties of the sine and cosine functions are used to calculate, from the input angle, an angle in the range of [-pi/2, pi/2] radians or [-90, 90] degrees whose sine is equal to the desired result. Second, that angle is converted to radians (if it is not already in radians) and passed to an internal procedure 'part_sine' which approximates the sine function to single precision accuracy over [-pi/2, pi/2]. If we are approximating sine and the input angle is in the range [-pi/2, pi/2] radians or [-90, 90] degrees, no transformation is needed, and we can go directly to step 2. If we are approximating cosine and the input angle is in the range of [-pi, pi] radians (or [-180, 180] degrees), we can transform the angle to pi/2 - abs(angle) radians (or 90 - abs(angle) degrees), since for any angle 'a', sin(pi/2 - a) = sin(pi/2 + a) = cos(a). If we are neither approximating sine over [-pi/2, pi/2] nor cosine over [-pi, pi], then we use the fact that sine and cosine are periodic functions with a period of 2*pi to calculate an equivalent angle in the range [-pi/4, 7*pi/4] radians (or [-45, 315] degrees). This calculation is performed by the 'principal_angle_' module. It returns a floating point value called the principal angle (in the range [-pi/4, pi/4] radians or [-45, 45] degrees) and an integer called the shift (between 0 and 3, inclusive) such that the equivalent angle is the sum 'principal_angle + shift*right_angle', where 'right_angle' is pi/2 if we are dealing in radians and 90 if we are dealing in degrees. MTB-729 Multics Technical Bulletin Basic Math Functions If we are approximating the sine function and 'shift' is zero, then 'principal_angle' is the angle needed for step 2. Otherwise, we must transform 'principal_angle' using one of the following identities: sin(principal_angle + right_angle) = sin(right_angle - abs(principal_angle)) sin(principal_angle + 2*right_angle) = sin(-principal_angle) sin(principal_angle + 3*right_angle) = sin(abs(principal_angle) - right_angle) cos(principal_angle) = sin(right_angle - abs(principal_angle)) cos(principal_angle + right_angle) = sin(-principal_angle) cos(principal_angle + 2*right_angle) = sin(abs(principal_angle) - right_angle) cos(principal_angle + 3*right_angle) = sin(principal_angle) It only remains to define the algorithm used in step 2 to approximate the sine function over [-pi/2, pi/2] radians to single precision accuracy. This is accomplished by evaluating 'x*p(x**2)', where 'p(x)' is the fifth degree polynomial: p(x) = p0 + p1*x + p2*x**2 + ... + p9*x**9 where: p0 = 9.999999999788e-1 p1 = -1.6666666608826e-1 p2 = 8.333330720556e-3 p3 = -1.98408328231e-4 p4 = 2.7523971068e-6 p5 = -2.386834641e-8 For values of 'x' near zero, evaluation of 'x*p(x**2)' will produce one or more underflows. This can easily be avoided since the sine of an angle is just the angle in radians (to 71 bits of precision) if the magnitude of the angle is less than about 5e-10. If coded in PL/I, the BFP-only version of the 'sine_' module would look like this: Multics Technical Bulletin MTB-729 Basic Math Functions sine_: proc options (support); dcl angle float bin (27) parameter; dcl principal_degrees_ entry (float bin (27), float bin (63), fixed bin (1)), principal_radians_ entry (float bin (27), float bin (63), fixed bin (1)); dcl half_pi float bin (63) static init (1.57079632679489661923), half_pi1 float bin (63) static init (1.570796326794896619), half_pi2 float bin (63) static init (8.333742918520878328e-20), one_degree float bin (63) static init (0.0174532925199432957692), pi float bin (63) static init (3.14159265358979323846); dcl abs_angle float bin (27), degrees float bin (63), shift fixed bin (1), radians float bin (63); cosine_radians_: entry (angle) returns (float bin); if abs (angle) <= pi then do; radians = angle; goto case_radians (1); end; call principal_radians_ (angle, radians, shift); goto case_radians (shift + 1); sine_radians_: entry (angle) returns (float bin); if abs (angle) <= half_pi then do; radians = angle; goto case_radians (0); end; call principal_radians_ (angle, radians, shift); goto case_radians (shift); MTB-729 Multics Technical Bulletin Basic Math Functions cosine_degrees_: entry (angle) returns (float bin); if abs (angle) <= 180 then do; degrees = angle; goto case_degrees (1); end; call principal_degrees_ (angle, degrees, shift); goto case_degrees (shift + 1); sine_degrees_: entry (angle) returns (float bin); if abs (angle) <= 90 then do; degrees = angle; goto case_degrees (0); end; call principal_degrees_ (angle, degrees, shift); goto case_degrees (shift); case_radians (0): case_radians (4): return (part_sine (radians)); case_radians (1): return ( part_sine (half_pi1 + half_pi2 - abs (radians))); case_radians (2): return (part_sine (-radians)); case_radians (3): return ( part_sine (abs (radians) - half_pi1 - half_pi2)); case_degrees (1): degrees = 90 - abs (degrees); goto case_degrees (0); case_degrees (2): degrees = -degrees; goto case_degrees (0); case_degrees (3): degrees = abs (degrees) - 90; goto case_degrees (0); Multics Technical Bulletin MTB-729 Basic Math Functions case_degrees (0): case_degrees (4): if abs (degrees) < 8.418858142948452884e-38 /* HFP: < 2.670821537926801391e-154 */ then radians = 0; else radians = degrees * one_degree; return (part_sine (radians)); part_sine: proc (x) returns (float bin (63)); /* Calculate 'sin(x)' for 'x' in [-pi/2, pi/2]. */ dcl x float bin (63); dcl xx float bin (63); dcl p0 float bin (63) static init (9.999999999788e-1), p1 float bin (63) static init (-1.6666666608826e-1), p2 float bin (63) static init (8.333330720556e-3), p3 float bin (63) static init (-1.98408328231e-4), p4 float bin (63) static init (2.7523971068e-6), p5 float bin (63) static init (-2.386834641e-8); if abs (x) < 5e-10 then return (x); xx = x * x; return ( round (x * (p0 + xx * (p1 + xx * (p2 + xx * (p3 + xx * (p4 + xx * p5))))), 27)); end part_sine; end sine_; MTB-729 Multics Technical Bulletin Basic Math Functions 6.12. 'double_sine_' The 'double_sine_' module calculates to double precision accuracy the sine or cosine of an angle given in degrees or radians, in either BFP or HFP mode. There are eight entry points corresponding to the eight functions implemented by this module. They are: double_cosine_degrees_ hfp_double_cosine_degrees_ double_cosine_radians_ hfp_double_cosine_radians_ double_sine_degrees_ hfp_double_sine_degrees_ double_sine_radians_ hfp_double_sine_radians_ The only difference in the algorithms used in the 'double_sine_' module from those used in the 'sine_' module is that the polynomial 'p(x)' used to approximate sine over the interval [-pi/2, pi/2] radians is replaced by a higher degree polynomial which gives double precision accuracy. Specifically, 'p(x)' is replaced by: p(x) = p0 + p1*x + p2*x**2 + ... + p9*x**9 where: p0 = 9.9999999999999999998e-1 p1 = -1.6666666666666666664e-1 p2 = 8.333333333333332952e-3 p3 = -1.9841269841269648946e-4 p4 = 2.7557319223936401884e-6 p5 = -2.5052108378101760587e-8 p6 = 1.60590431721336921e-10 p7 = -7.647126379076958e-13 p8 = 2.8101852815318e-15 p9 = -7.9798971356e-18 The BFP and corresponding HFP routines differ only in the bit representations of the floating point constants used, so we will only present the PL/I code for the BFP version. It would look like this: double_sine_: proc options (support); dcl angle float bin (63) parameter; dcl double_principal_degrees_ entry (float bin (63), float bin (63), fixed bin (1)), double_principal_radians_ entry (float bin (63), float bin (63), fixed bin (1)); Multics Technical Bulletin MTB-729 Basic Math Functions dcl half_pi float bin (63) static init (1.57079632679489661923), half_pi1 float bin (63) static init (1.570796326794896619), half_pi2 float bin (63) static init (8.333742918520878328e-20), one_degree float bin (63) static init (0.0174532925199432957692), pi float bin (63) static init (3.14159265358979323846); dcl abs_angle float bin (63), degrees float bin (63), shift fixed bin (1), radians float bin (63); double_cosine_radians_: entry (angle) returns (float bin (63)); if abs (angle) <= pi then do; radians = angle; goto case_radians (1); end; call double_principal_radians_ (angle, radians, shift); goto case_radians (shift + 1); double_sine_radians_: entry (angle) returns (float bin (63)); if abs (angle) <= half_pi then do; radians = angle; goto case_radians (0); end; call double_principal_radians_ (angle, radians, shift); goto case_radians (shift); double_cosine_degrees_: entry (angle) returns (float bin (63)); if abs (angle) <= 180 then do; degrees = angle; goto case_degrees (1); end; call double_principal_degrees_ (angle, degrees, shift); goto case_degrees (shift + 1); MTB-729 Multics Technical Bulletin Basic Math Functions double_sine_degrees_: entry (angle) returns (float bin (63)); if abs (angle) <= 90 then do; degrees = angle; goto case_degrees (0); end; call double_principal_degrees_ (angle, degrees, shift); goto case_degrees (shift); case_radians (0): case_radians (4): return (part_sine (radians)); case_radians (1): return ( part_sine (half_pi1 + half_pi2 - abs (radians))); case_radians (2): return (part_sine (-radians)); case_radians (3): return ( part_sine (abs (radians) - half_pi1 - half_pi2)); case_degrees (1): degrees = 90 - abs (degrees); goto case_degrees (0); case_degrees (2): degrees = -degrees; goto case_degrees (0); case_degrees (3): degrees = abs (degrees) - 90; goto case_degrees (0); case_degrees (0): case_degrees (4): if abs (degrees) < 8.418858142948452884e-38 /* HFP: < 2.670821537926801391e-154 */ then radians = 0; else radians = degrees * one_degree; return (part_sine (radians)); part_sine: proc (x) returns (float bin (63)); /* Calculate 'sin(x)' for 'x' in [-pi/2, pi/2]. */ Multics Technical Bulletin MTB-729 Basic Math Functions dcl x float bin (63); dcl xx float bin (63); dcl p0 float bin (63) static init (9.9999999999999999998e-1), p1 float bin (63) static init (-1.6666666666666666664e-1), p2 float bin (63) static init (8.333333333333332952e-3), p3 float bin (63) static init (-1.9841269841269648946e-4), p4 float bin (63) static init (2.7557319223936401884e-6), p5 float bin (63) static init (-2.5052108378101760587e-8), p6 float bin (63) static init (1.60590431721336921e-10), p7 float bin (63) static init (-7.647126379076958e-13), p8 float bin (63) static init (2.8101852815318e-15), p9 float bin (63) static init (-7.9798971356e-18); if abs (x) < 5e-10 then return (x); xx = x * x; return ( round (x * (p0 + xx * (p1 + xx * (p2 + xx * (p3 + xx * (p4 + xx * (p5 + xx * (p6 + xx * (p7 + xx * (p8 + xx * p9)))))))) ), 63)); end part_sine; end double_sine_; MTB-729 Multics Technical Bulletin Basic Math Functions 6.13. 'square_root_' The 'square_root_' module calculates to single precision accuracy the square root of a single precision BFP or HFP number. There are two entry points into the module: square_root_ hfp_square_root_ In BFP mode, the problem of finding the square root of an arbitrary positive floating point number can easily be reduced to that of finding the square root of a related number in the range of [0.25, 1). To see how this is done, let 'x' be any positive normalized floating point number. Then 'x' is represented in the hardware as 'm * 2**e', where 'm' is a binary fraction in the range [0.5, 1), and 'e' is a binary integer in the range [-128, 127]. There are two cases to consider, according to whether 'e' is even or odd. Suppose that 'e' is even. Let 'k' be the integer such that e = k + k. Then: sqrt(x) = sqrt(m * 2**e) = sqrt(m) * sqrt(2**(k + k)) = sqrt(m) * 2**k On the other hand, suppose 'e' is odd. Let 'k' be the integer such that e = k + k - 1. Then: sqrt(x) = sqrt(m * 2**e) = sqrt(0.5*m * 2**(e+1)) = sqrt(0.5*m) * sqrt(2**(k + k)) = sqrt(0.5*m) * 2**k Thus, in order to find the square root of a BFP number whose exponent is even, we need only calculate the square root of the mantissa and divide the exponent by two. And to find the square root of a BFP number whose exponent is odd, we need only calculate the square root of half the mantissa and divide the exponent plus one by two. In the first case, we need find the square root of a number in the range [0.5, 1), and in the second case we must find the square root of a number in [0.25, 0.5). Thus the problem is reduced to finding the square root of numbers in the range [0.25, 1). There is an interative technique, called Newton's method, for finding square roots which doubles the number of correct digits on each iteration. Specifically, if 'r' is an approximation to the square root of 'x', then 0.5*(r + x/r) is twice as good an approximation. Multics Technical Bulletin MTB-729 Basic Math Functions All that is needed to apply Newton's method to finding square roots is to choose the initial approximation. The 'square_root_' module uses a quadratic minmax polynomial approximation to the square root function over [0.25, 1) to provide the initial value. This polynomial has an accuracy of 8 or more bits over that interval, so it only requires two iterations of Newton's method to get the square accurate to better than single precision (about 32 bits, to be specific). The polynomial for calculating the initial approximation is: p(x) = p0 + p1*x + p2*x**2 where: p0 = 0.25927688 p1 = 1.0521212 p2 = -0.31632214 There is no danger of overflow or underflow in evaluating this polynomial, since we are only interested in applying it over the interval [0.25, 1). If the algorithm for calculating BFP square roots were coded in PL/I, it would look like this: square_root_: proc (x) returns (float bin); dcl x float bin; dcl math_error_ entry (fixed bin); dcl p0 float bin (27) static init (2.5927688e-1), p1 float bin (27) static init (1.0521212e0), p2 float bin (27) static init (-3.1632214e-1); dcl e fixed bin, m float bin, root_m float bin, root_x float bin; dcl expon fixed bin (7) unaligned based; MTB-729 Multics Technical Bulletin Basic Math Functions if x = 0 then return (0); if x < 0 then do; call math_error_ (13); x = -x; end; e = addr (x) -> expon; m = x; addr (m) -> expon = 0; if mod (e, 2) = 1 then m = 0.5 * m; root_m = p0 + m * (p1 + m * p2); root_m = 0.5 * (root_m + m / root_m); root_m = 0.5 * (root_m + float (m, 63) / root_m); root_x = root_m; addr (root_x) -> expon = addr (root_x) -> expon + divide (e + 1, 2, 7); return (root_x); end square_root_; It is only slightly more complicated to calculate single precision square roots in HFP mode. Again, it is done by reducing the problem to one of finding square roots in the range [0.25, 1). However, the reduction takes a slightly different form. Let 'x' be any normalized positive HFP number. Then 'x' is represented in the hardware as 'm * 16**e', where 'm' is a binary fraction in the range [0.0625, 1) and 'e' is a binary integer in the range [-128, 127). Again there are two cases to consider, according to whether 'e' is even or odd. Suppose that 'e' is even. Let 'k' be the integer such that e = k + k. Then: sqrt(x) = sqrt(m * 16**e) = sqrt(m) * sqrt(16**(k + k)) = sqrt(m) * 16**k On the other hand, suppose 'e' is odd. Let 'k' be the integer such that e = k + k - 1. Then: sqrt(x) = sqrt(m * 16**e) = sqrt(0.0625*m * 16**(e+1)) = 0.25*sqrt(m) * 16**k Multics Technical Bulletin MTB-729 Basic Math Functions There are two differences between this reduction and the one used in the BFP case: 'm' is in the range [0.0625, 1) instead of [0.25, 1), and there is a scale factor of 0.25 in the case that 'e' is odd. If 'm' is in the range [0.25, 1), we can calculate sqrt(m) in the same way as we do in the BFP case (i.e. use the minmax polynomial to get an approximate root accurate to 8 bits, and then apply two iterations of Newton's method). If 'm' is in the range [0.0625, 0.25), we can use the identity sqrt(m) = 0.5*sqrt(4*m), to reduce the problem to finding the square root of a number in [0.25, 1). It is not possible to code this algorithm in PL/I, since PL/I does not support HFP mode. However, if PL/I did support HFP mode, the code to implement this algorithm would look like this: hfp_square_root_: proc (x) returns (float hex); dcl x float hex; dcl math_error_ entry (fixed bin); dcl p0 float hex (27) static init (2.5927688e-1), p1 float hex (27) static init (1.0521212e0), p2 float hex (27) static init (-3.1632214e-1); dcl e fixed bin, m float hex, root_m float hex, root_x float hex, scale float hex; dcl expon fixed bin (7) unaligned based; if x = 0 then return (0); if x < 0 then do; call math_error_ (13); x = -x; end; e = addr (x) -> expon; m = x; addr (m) -> expon = 0; MTB-729 Multics Technical Bulletin Basic Math Functions if m >= 0.25 then scale = 0.5; else do; m = 4 * m; scale = 0.25; end; if mod (e, 2) = 1 then scale = 0.25 * scale; root_m = p0 + m * (p1 + m * p2); root_m = 0.5 * (root_m + m / root_m); root_m = scale * (root_m + float (m, 63) / root_m); root_x = root_m; addr (root_x) -> expon = addr (root_x) -> expon + divide (e + 1, 2, 7); return (root_x); end hfp_square_root_; Multics Technical Bulletin MTB-729 Basic Math Functions 6.14. 'double_square_root_' The 'double_square_root_' module calculates to double precision accuracy the square root of a double precision BFP or HFP number. There are two entry points into the module: double_square_root_ hfp_double_square_root_ The algorithms used to implement these routines differ from their single precision counterparts only in that an extra iteration of Newton's method is performed to increase the precision of the root from 32 to 64 bits. If coded in PL/I, the algorithm for finding double precision square roots in BFP mode would look like this: double_square_root_: proc (x) returns (float bin (63)); dcl x float bin (63); dcl math_error_ entry (fixed bin); dcl p0 float bin (27) static init (2.5927688e-1), p1 float bin (27) static init (1.0521212e0), p2 float bin (27) static init (-3.1632214e-1); dcl e fixed bin, m float bin (63), root_m float bin (63), root_x float bin (63); dcl expon fixed bin (7) unaligned based, m_top float bin (27) based (addr (m)), root_m_top float bin (27) based (addr (root_m)); if x = 0 then return (0); e = addr (x) -> expon; m = x; addr (m) -> expon = 0; if m < 0 then do; call math_error_ (22); m = -m; MTB-729 Multics Technical Bulletin Basic Math Functions end; if mod (e, 2) = 1 then m = 0.5 * m; root_m_top = p0 + m_top * (p1 + m_top * p2); root_m = 0.5 * (root_m_top + m_top / root_m_top); root_m = 0.5 * (root_m + m / root_m); root_m = 0.5 * (root_m + m / root_m); root_x = root_m; addr (root_x) -> expon = addr (root_x) -> expon + divide (e + 1, 2, 7); return (root_x); end double_square_root_; If PL/I supported HFP mode, the implementation of the double precision square root algorithm would look like this: hfp_double_square_root_: proc (x) returns (float hex (63)); dcl x float hex (63); dcl math_error_ entry (fixed bin); dcl p0 float hex (27) static init (2.5927688e-1), p1 float hex (27) static init (1.0521212e0), p2 float hex (27) static init (-3.1632214e-1); dcl e fixed bin, m float hex (63), root_m float hex (63), root_x float hex (63), scale float hex (27); dcl expon fixed bin (7) unaligned based, m_top float hex (27) based (addr (m)), root_m_top float hex (27) based (addr (root_m)); if x = 0 then return (0); e = addr (x) -> expon; m = x; addr (m) -> expon = 0; if m < 0 then do; Multics Technical Bulletin MTB-729 Basic Math Functions call math_error_ (22); m = -m; end; if m >= 0.25 then scale = 0.5; else do; m = 4 * m; scale = 0.25; end; if mod (e, 2) = 1 then scale = 0.25 * scale; root_m_top = p0 + m_top * (p1 + m_top * p2); root_m = 0.5 * (root_m_top + m_top / root_m_top); root_m = 0.5 * (root_m + m / root_m); root_m = scale * (root_m + m / root_m); root_x = root_m; addr (root_x) -> expon = addr (root_x) -> expon + divide (e + 1, 2, 7); return (root_x); end hfp_double_square_root_; MTB-729 Multics Technical Bulletin Basic Math Functions 6.15. 'tangent_' The 'tangent_' module approximates, to single precision accuracy, the tangent and cotangent functions of an angle expressed in either degrees or radians, in either BFP or HFP mode. The module has eight entry points corresponding to the eight functions implemented. They are: cotangent_degrees_ hfp_cotangent_degrees_ cotangent_radians_ hfp_cotangent_radians_ tangent_degrees_ hfp_tangent_degrees_ tangent_radians_ hfp_tangent_radians_ The only differences between the HFP routines and the corresponding BFP routines is in the bit representation of constants, so it is sufficient to describe only the BFP routines. Each routine consists of two parts. In the first part, the input angle is transformed into an equivalent angle in the range of [-pi/4, pi/4] radians or [-45, 45] degrees whose tangent or cotangent gives the desired result. In the second part, this angle is converted to radians (if it is not already in radians) and passed to an internal procedure which aproximates the tangent or cotangent function to single precision accuracy over [-pi/4, pi/4]. The transformation of the input angle to a corresponding one in the range of [-pi/4, pi/4] radians (or [45, 45] degrees) is carried out as follows. First a flag is set to indicate whether we are approximating tangent or cotangent. Next the magnitude of the input angle is checked. If it is in the range [-pi/4, pi/4] radians (or [-45, 45] degrees), no transformation is needed, and we can immediately approximate the tangent or cotangent, as specified by the function flag. Otherwise, we use the fact that tangent and cotangent or periodic functions with a period of pi radians (or 180 degrees) to calculate an angle in the range [-pi/4, 3*pi/4] radians (or [-45, 135] degrees) whose tangent and cotangent are the same as those of the input angle. The calculation of this equivalent angle is carried out by the 'principal_angle_' module. It assumes a period of 2*pi rather than pi so that it can also be used for the sine and cosine functions. The equivalent angle it returns is thus in the interval [-pi/4, 7*pi/4] radians (or [-45, 315] degrees). However, the way this angle is specified makes it easy to find the equivalent angle in the range [-pi/4, 3*pi/4] radians: 'principal_angle_' returns an angle, called the principal angle, in the range [-pi/4, pi/4] radians (or [-45, 45] degrees), and an Multics Technical Bulletin MTB-729 Basic Math Functions integer, called the shift, in the range 0 through 3, such that the equivalent angle is given by the sum 'principal_angle + shift*right_angle', where 'right_angle' is pi/2 if we are dealing in radians, and 90 if we are dealing in degrees. The equivalent angle that we desire is 'principal_angle + mod(shift, 2)*right_angle'. If 'shift' is even, the equivalent angle is just the principal angle, which is already in the range required for step two. However, if 'shift' is odd, the equivalent angle is a right angle bigger than the principal angle, and further reduction is required. The reduction is very simple; change the function flag to indicate the cofunction, and use the negative of the principal angle as the angle. This reduction is based on the identities: cot(a + pi/2) = cot(a - pi/2) = tan(-a) tan(a + pi/2) = tan(a - pi/2) = cot(-a) Tangent and cotangent or not only cofunctions, but also reciprocal functions (i.e. tan(x) = 1/cot(x)). Thus if we find a single precision approximation to tangent over [-pi/4, pi/4], the reciprocal will give us a single precision approximation to cotangent over that interval. The tangent function can be approximated to single precision accuracy over [-pi/4, pi/4] by the rational function r(x) = x*p(x**2)/q(x**2) where 'p' and 'q' are the polynomials: p(x) = p0 + p1*x + p2*x**2 q(x) = q0 + q1*x + x**2 where: p0 = 6.26041119547433196e1 p1 = -6.97168400629442048e0 p2 = 6.73091025875915e-2 q0 = 6.260411195336057284e1 q1 = -2.78397212200427089e1 Evaluation of 'p' and 'q' close to zero can cause underflows. This is easily avoided though, since tan(x) is just 'x' (to 71 bits of precision) and cot(x) is just 1/x (to 71 bits of precision) if the magnitude of 'x' is less than about 5e-10. MTB-729 Multics Technical Bulletin Basic Math Functions Cotangent has a pole at zero, so if we are asked to find the cotangent of a value too close to zero, the result will overflow. We must check for this condition. If it occurs, we signal an overflow and return the largest floating point number of appropriate sign that can be represented. If asked to evaluate cotangent at zero (where the sign of the function is undefined), we arbitrarily assume the result to be positive infinity. The only differences between the BFP and corresponding HFP version of each routine is the bit representation of the floating point constants. Thus it is only necessary to show the PL/I implementation of the BFP version. It would look like this: tangent_: proc options (support); dcl angle float bin (27) parameter; dcl principal_degrees_ entry (float bin (27), float bin (63), fixed bin (1)), principal_radians_ entry (float bin (27), float bin (63), fixed bin (1)); dcl Cotangent fixed bin (1) static init (-1), Tangent fixed bin (1) static init (1), eps1 float bin (63) static init (8.41885814294845879e-38), eps2 float bin (63) static init (5e-10), one_degree float bin (63) static init (1.74532925199432957692e-2), quarter_pi float bin (63) static init (0.785398163397448309615); dcl degrees float bin (63), radians float bin (63), shift fixed bin (1); cotangent_degrees_: entry (angle) returns (float bin (27)); shift = 0; if abs (angle) <= 45 then degrees = angle; else call principal_degrees_ (angle, degrees, shift); if abs (degrees) < eps1 then return (infinity (degrees)); else if (shift = 0 | shift = 2) then return ( Multics Technical Bulletin MTB-729 Basic Math Functions part_tan_or_cot (Cotangent, degrees * one_degree)); else /* if (shift = 1 | shift = 3) */ return ( part_tan_or_cot (Tangent, -(degrees * one_degree))); cotangent_radians_: entry (angle) returns (float bin (27)); shift = 0; if abs (angle) <= quarter_pi then radians = angle; else call principal_radians_ (angle, radians, shift); if (shift = 0 | shift = 2) then return (part_tan_or_cot (Cotangent, radians)); else /* if (shift = 1 | shift = 3) */ return (part_tan_or_cot (Tangent, -radians)); tangent_degrees_: entry (angle) returns (float bin (27)); shift = 0; if abs (angle) <= 45 then degrees = angle; else call principal_degrees_ (angle, degrees, shift); if abs (degrees) < eps1 then return (0); else if (shift = 0 | shift = 2) then return ( part_tan_or_cot (Tangent, degrees * one_degree)); else /* if (shift = 1 | shift = 3) */ return ( part_tan_or_cot (Cotangent, -(degrees * one_degree))); tangent_radians_: entry (angle) returns (float bin (27)); shift = 0; if abs (angle) <= quarter_pi then radians = angle; else call principal_radians_ (angle, radians, shift); if (shift = 0 | shift = 2) then return (part_tan_or_cot (Tangent, radians)); else /* if (shift = 1 | shift = 3) */ return (part_tan_or_cot (Cotangent, -radians)); MTB-729 Multics Technical Bulletin Basic Math Functions infinity: proc (sign) returns (float bin (63)); dcl sign float bin (63); dcl max_real float bin (27) static init ( 1.11111111111111111111111111e126b); dcl result float bin (27); /* Signal overflow. */ result = max_real + max_real; if sign < 0 then return (-max_real); else return (max_real); end infinity; part_tan_or_cot: proc (function, x) returns (float bin (27)); /* Calculate either 'tan(x)' or 'cot(x)', to single precision accuracy, for 'x' in [-pi/4, pi/4]. */ dcl function fixed bin (1), x float bin (63); dcl p0 float bin (63) static init (6.26041119547433196e1), p1 float bin (63) static init (-6.97168400629442048e0), p2 float bin (63) static init (6.73091025875915e-2), q0 float bin (63) static init (6.260411195336057284e1), q1 float bin (63) static init (-2.78397212200427089e1); dcl p float bin (63), q float bin (63), result float bin (63), xx float bin (63); if abs (x) < eps2 then do; result = x; if function = Cotangent then if abs (result) <= 1.87085736509965619556e-39 Multics Technical Bulletin MTB-729 Basic Math Functions then result = infinity ((result)); else result = 1 / result; end; else do; xx = x * x; q = q0 + xx * (q1 + xx); p = x * (p0 + xx * (p1 + xx * p2)); if function = Tangent then result = p / q; else result = q / p; end; return (round (result, 27)); end part_tan_or_cot; end tangent_; MTB-729 Multics Technical Bulletin Basic Math Functions 6.16. 'double_tangent_' The 'double_tangent_' module calculates to double precision accuracy the tangent or cotangent of an angle given in degrees or radians, in either BFP or HFP mode. There are eight entry points corresponding to the eight functions implemented by this module. They are: double_cotangent_degrees_ hfp_double_cotangent_degrees_ double_cotangent_radians_ hfp_double_cotangent_radians_ double_tangent_degrees_ hfp_double_tangent_degrees_ double_tangent_radians_ hfp_double_tangent_radians_ The only difference in the algorithms used in the 'double_tangent_' module from those used in the 'tangent_' module is that the polynomials 'p(x)' and 'q(x)' are replaced by ones of higher order which give double precision accuracy. The new polynomials are: p(x) = p0 + p1*x + p2*x**2 + p3*x**3 + p4*x**4 q(x) = q0 + q1*x + q2*x**2 + q3*x**3 + x**4 where: p0 = 7.61637646334745151e5 p1 = -1.045644297708972282e5 p2 = 2.990139654186652781e3 p3 = -2.195407375452258719e1 p4 = 2.229548191006262686e-2 q0 = 7.61637646334745151e5 q1 = -3.584436452158122785e5 q2 = 2.091966854815805879e4 q3 = -3.069448235422934591e2 The only differences between the BFP and corresponding HFP version of each routine is the bit representation of the floating point constants. Thus it is only necessary to show the PL/I implementation of the BFP version. It would look like this: double_tangent_: proc options (support); dcl angle float bin (63) parameter; dcl double_principal_degrees_ entry (float bin (63), float bin (63), fixed bin (1)), double_principal_radians_ entry (float bin (63), float bin (63), fixed bin (1)); Multics Technical Bulletin MTB-729 Basic Math Functions dcl Cotangent fixed bin (1) static init (-1), Tangent fixed bin (1) static init (1), eps1 float bin (63) static init (8.41885814294845879e-38), eps2 float bin (63) static init (5e-10), one_degree float bin (63) static init (1.74532925199432957692e-2), quarter_pi float bin (63) static init (0.785398163397448309615); dcl degrees float bin (63), radians float bin (63), shift fixed bin (1); double_cotangent_degrees_: entry (angle) returns (float bin (63)); shift = 0; if abs (angle) <= 45 then degrees = angle; else call double_principal_degrees_ (angle, degrees, shift); if abs (degrees) < eps1 then return (infinity (degrees)); else if (shift = 0 | shift = 2) then return ( part_tan_or_cot (Cotangent, degrees * one_degree)); else /* if (shift = 1 | shift = 3) */ return ( part_tan_or_cot (Tangent, -(degrees * one_degree))); double_cotangent_radians_: entry (angle) returns (float bin (63)); shift = 0; if abs (angle) <= quarter_pi then radians = angle; else call double_principal_radians_ (angle, radians, shift); if (shift = 0 | shift = 2) then return (part_tan_or_cot (Cotangent, radians)); else /* if (shift = 1 | shift = 3) */ return (part_tan_or_cot (Tangent, -radians)); MTB-729 Multics Technical Bulletin Basic Math Functions double_tangent_degrees_: entry (angle) returns (float bin (63)); shift = 0; if abs (angle) <= 45 then degrees = angle; else call double_principal_degrees_ (angle, degrees, shift); if abs (degrees) < eps1 then return (0); else if (shift = 0 | shift = 2) then return ( part_tan_or_cot (Tangent, degrees * one_degree)); else /* if (shift = 1 | shift = 3) */ return ( part_tan_or_cot (Cotangent, -(degrees * one_degree))); double_tangent_radians_: entry (angle) returns (float bin (63)); shift = 0; if abs (angle) <= quarter_pi then radians = angle; else call double_principal_radians_ (angle, radians, shift); if (shift = 0 | shift = 2) then return (part_tan_or_cot (Tangent, radians)); else /* if (shift = 1 | shift = 3) */ return (part_tan_or_cot (Cotangent, -radians)); infinity: proc (sign) returns (float bin (63)); dcl sign float bin (63); dcl max_real float bin (63) static init ( 1.11111111111111111111111111e126b); dcl result float bin (63); /* Signal overflow. */ result = max_real + max_real; if sign < 0 then return (-max_real); else return (max_real); end infinity; Multics Technical Bulletin MTB-729 Basic Math Functions part_tan_or_cot: proc (function, x) returns (float bin (63)); /* Calculate either 'tan(x)' or 'cot(x)', to single precision accuracy, for 'x' in [-pi/4, pi/4]. */ dcl function fixed bin (1), x float bin (63); dcl p0 float bin (63) static init (7.61637646334745151e5), p1 float bin (63) static init (-1.045644297708972282e5), p2 float bin (63) static init (2.990139654186652781e3), p3 float bin (63) static init (-2.195407375452258719e1), p4 float bin (63) static init (2.229548191006262686e-2), q0 float bin (63) static init (7.61637646334745151e5), q1 float bin (63) static init (-3.584436452158122785e5), q2 float bin (63) static init (2.091966854815805879e4), q3 float bin (63) static init (-3.069448235422934591e2); dcl p float bin (63), q float bin (63), result float bin (63), xx float bin (63); if abs (x) < eps2 then do; result = x; if function = Cotangent then if abs (result) <= 1.87085736509965619556e-39 then result = infinity ((result)); else result = 1 / result; end; else do; xx = x * x; q = q0 + xx * (q1 + xx * (q2 + xx * (q3 + xx))); p = x * (p0 + xx MTB-729 Multics Technical Bulletin Basic Math Functions * (p1 + xx * (p2 + xx * (p3 + xx * p4))) ); if function = Tangent then result = p / q; else result = q / p; end; return (round (result, 63)); end part_tan_or_cot; end double_tangent_;