Report No. 291
COO-1018-1158

LEFT: A LANGUAGE FOR EDITING AND FORMATING TEXT

by

Stanley J. Flowerdew

October 30, 1968

Department of Computer Science
University of Illinois
Urbana, Illinois 61801

Supported in part by Contract AT(11-1)-1018 with the U.S. Atomic Energy Commission and the Advanced Research Projects Agency.

1. INTRODUCTION

This language describes the logical and typographical organization of text rather than its natural language content. In doing so, LEFT embodies a distinction between the medium-independent content of text and the purely typographical, or packaging, description. The packaging description is contained in commands of the LEFT language, and the medium-independent content is represented by character strings. It should be emphasized that these strings need not be digitally encoded as alphanumeric characters but may occur as video records referred to by pointers. Ideally, any item of text can be replaced by an equivalent LEFT program. It is hoped that LEFT will provide the link between two aspects of text which until now have been only intuitively related, viz. the printed document and the style manual description of that document.

The report can conveniently be split into three parts:

1) Sections 1 and 2 explain the motivation for the development of LEFT and its program structure.

2) Sections 3, 4 and 5 form the main part of the report and describe the formatting and editing commands.

3) Sections 6, 7 and 8 are supplementary and contain a BNF syntax for LEFT, LEFT object code and a bibliography.

1.1 Role of the Language in Text Reformatting

The motivation for the development of the LEFT language is the need to convert a document from its original typographic style to a form satisfying the specifications of a new style manual. For this purpose, it is convenient to think of text as being specified in terms of a major structure such as book, catalogue, journal, brochure, etc., which can in turn be expressed in terms of minor structures such as bibliographies, indexes, chapters, etc. The syntactic and semantic descriptions of text will be submitted to a universal translator to produce a compiler for the text language. By use of the universal translator, future modifications to the syntactic descriptions of the major and minor structures can be readily incorporated.

In the conversion from one style manual to another, five transition stages can be identified in the reformatting of text:

(1) Typographical recognition is performed on the document, which is converted to an equivalent LEFT description. The depth to which recognition is performed is determined by the complexity of the style change; e.g., if it is necessary to change only a folio, then recognition need separate just the folio from the remaining page text. The LEFT description will ideally contain all data pertaining to the format of the original document. Knowledge or ready determination of the input style will greatly speed up the recognition process.

(2) The LEFT description is parsed by means of Floyd productions into the equivalent textual description, which contains a categorization of such entities as heading, footnote, etc. The language corresponding to the Floyd productions is largely context free, but provision for occasional context-sensitive features is also incorporated.

(3) The style conversion per se occurs in this transition. The textual description of the original document is translated into the new textual description as governed by the object style manual.
This can be effected either by (A) permuting constructs in the right-hand sides of productions (q.v. Schwarzenberger, Adv. in Computer Typesetting, 1966) or by (B) table look-up, in which each table corresponds to a particular style and each row of each table corresponds to some keyword such as author.

(4) The new textual description is translated to its equivalent LEFT object code by means of BNF productions.

(5) The LEFT object program is translated to the photocomposition language of the printer.

A heuristic procedure can be applied to a series of documents printed according to a fixed but unknown style manual in order to identify and classify features which are characteristic of that style, and so aid in the recognition of future documents.

Another area in which LEFT might be applied is the automatic indexing of serials. The output from stage (2) described above would provide essential data for the cataloging of a document.

1.2 Modes of Usage of the Language

Textual input can be considered to be (1) generated line-by-line in a syntax-directed dialog mode, (2) a syntactically correct string where character, line or paragraph recognition has been done directly from the input text image, or (3) a complete program in LEFT which does not make use of the syntactic recognition procedures described above. Mode (1) requires on-line communication, while Modes (2) and (3) can be implemented using conventional batch processing techniques. In Modes (1) and (2), the parsed item of text will then be translated into a LEFT program by means of a selected style manual which automatically generates the formatting commands and delimiters.

1) Dialog Mode
The user chooses an input format as specified by a style manual and generates or edits text either by teletype or by cursor motion at a console. The dialog instructions help the user to prepare the correct form of input for his text. Syntactical analysis is performed by the computer to generate LEFT object code.

2) Automatic Mode
Syntactical recognition from the input text is essentially automatic. The user may assist manually in the syntax recognition process by generating or selecting syntactical items. The user may or may not indicate the input style manual.

3) Precoded Mode
The user constructs a complete program in LEFT consisting of text or text pointers interspersed with LEFT commands.

2. PROGRAM STRUCTURE

2.1 Text Delimiters

Commands are distinguished from the text by two methods. In the explicit method, a delimiter is inserted before and after each string of text which explicitly appears in the program. In this report, the delimiter will be indicated by the symbol ('). Restrictions on the use of delimiters in text are the same as for PL/I.

In the implicit method, text need not explicitly appear in the program but may be selected by a text pointer. The general form of text is then a text pointer enclosed between two $ symbols. The text pointer may refer to text which is in video or analog form.

2.2 Command Separators

Two successive commands are separated from one another by a semicolon.

2.3 Procedures

A procedure is a block of the form

    X: PROCEDURE (FP1, FP2, ..., FPn); END X;

which is called repeatedly from the main program. The above sequence of statements declares the label X to be the name of a procedure possessing formal parameters FPi (1 ≤ i ≤ n). In general, the identifier X can be a string of maximum length 31 over the set of alphanumeric characters and the break character. Control is transferred to the above procedure by a call statement, which takes the form

    CALL X (AP1, AP2, ..., APn);

where APi (1 ≤ i ≤ n) are the actual parameters.

2.4 Block Structure

A block is a continuous piece of program starting with BEGIN or PROCEDURE and finishing with END, where the keywords BEGIN, PROCEDURE and END occur in the language. Each complete program is itself a block and consists of a nested structure of blocks. Any block may be named by preceding it with a label and a colon.
The label of a named block may follow the corresponding END statement. BEGIN is always followed immediately by a semicolon. Block structure makes it possible to format short sections requiring unusual commands without having to respecify the commands for the main section when the short section is completed.

2.5 Comments

Comments can be inserted wherever a blank may occur in the program. A comment is a string enclosed between /* and */; the string may be of any length but must not contain the character pair */.

3. FORMAT COMMANDS

In general, each format command refers to following text rather than preceding text. When a format command has been specified, it remains in effect until the end of the block in which it appears unless superseded by a later command. The format commands are listed below, using the following notation for arguments:

mn, op    A distance, where mn is the number of picas and op is the number of points (1 inch = 6 picas = 72 points; 1 point = 0.0138 inch).

qr        A number where, depending on context, qr is the number of lines or the number of pages.

3.1 Font Command

Command    Interpretation

F n        Identifies the font to be used, where n is the code of the new font.

3.2 Line-Width Command

W          Specifies its distance argument as the line-width, or measure.

3.3 Margin Commands

LM         Specifies the position of a non-standard left margin. The command takes a plus or minus sign and the distance between the standard and non-standard left margins. A plus sign indicates that the non-standard margin is to the right of the standard margin, and conversely for a minus sign.

RM         Specifies the position of a non-standard right margin. The command takes a plus or minus sign and the distance between the standard and non-standard right margins. A plus sign indicates that the non-standard margin is to the right of the standard margin, and conversely for a minus sign.

NM         Restores standard margins after the use of LM or RM commands.

3.4 Justification Commands

SJ         Suppresses justification. (Text is always justified unless an SJ command is specified.)

RJ         Restores justification after a non-justified passage.

3.5 Page-Length Command

PL         Specifies the page-length given by its argument.

3.6 New Page Command

PG         Begins a new page immediately following the PG command.

3.7 Quad Commands

QL         Quad left (has the same effect as SJ).

QC         Quad center (centers each line between the margins in use).

QR         Quad right (moves each line of text to abut the right margin, leaving the left end ragged).

RJ         Restores justification after the use of QL, QC or QR.

3.8 New Line/New Paragraph Commands

NL         When this command is encountered, all text on the line being composed will be quadded to the left according to the quad command in use, and a new line will be started at the current left margin. A line which is ended by an NL command is not justified.

NP         Specifies the start of a new paragraph. If the indentation argument is unspecified, a standard indentation of 11 points, regardless of the font in use, will be applied to the first word of each paragraph. If specified, the argument is the indentation for the new paragraph. The NP command immediately precedes the first word of the new paragraph or succeeds the final word of the old paragraph. The last line of a paragraph is not justified.

3.9 Folio Commands

PN         Selects the position of page numbering according to its argument:
           1: at center and foot of page;
           2: at center and top of page;
           3: at top on right for recto and at top on left for verso;
           4: at foot on right for recto and at foot on left for verso.

PNN        Specifies the next page number, given by its argument.

FN n       Specifies the font to be used for page numbering.

3.10 Running Head Commands

RH 'text' ERH        Specifies that the delimited text occurs at the top of each page as a running head. The folio at the top of a page containing a running head is set on the same line as the running head. For the first page of each chapter, the running head is omitted and the type page is usually shortened and dropped. The running head should not appear above full-page text illustrations or upright tables when the latter are set wider than the text matter.

VRH 'text' EVRH      Specifies the delimited text, which is usually the title of the work or main section, to occur at the top of each verso page.

RRH 'text' ERRH      Specifies the delimited text, which is usually the title of the chapter or subsection, to occur at the top of each recto page.

URH 'text' EURH
UVRH 'text' EUVRH
URRH 'text' EURRH    Similar to RH, VRH and RRH respectively, except that the running head is underlined.

3.11 Subscript and Superscript Commands

SB 'text' ESB        Specifies the delimited text to be a subscript.

SP 'text' ESP        Specifies the delimited text to be a superscript.

3.12 Column Commands

TWO
TWO (p,q)
TWO (p,q,r,s)        Print page in two columns. Standard margins will be used unless specified differently in parentheses after the TWO command. TWO (p,q) identifies the positions of the right margin of the left column and the left margin of the right column. TWO (p,q,r,s) can be used to specify all four margins.

THREE
THREE (p,q,r,s)
THREE (p,q,r,s,t,u)  Print page in three columns, with the parameters in parentheses specifying the non-standard margins.

ONE                  End TWO or THREE command, i.e., print page in one column.

3.13 Distance Between Base-Lines

BL         Specifies the distance between base-lines of successive lines of type. This command overrules the standard distance between base-lines which is implied by the current font command.

SB         Suppresses the effect of a BL command and makes the spacing depend upon the font in use.

3.14 Blank Line Commands

VL         A number of blank lines, specified by the argument, are left before a non-indented line.

NL; VL     Identical to VL.

NP; VL     A number of blank lines, specified by the argument, are left before a new paragraph.

VL         Similar to VL.

3.15 Waiting Commands

WALn       Delays the execution of the following commands, up to an EWALn command, by the number of lines given in the argument. The delay begins at the start of the line in which the WALn command appears.

EWALn      Terminates the effect of a WALn command.

WAPn       Similar to WALn except that the delay is expressed in terms of pages rather than lines.

EWAPn      Terminates the effect of a WAPn command.

N.B. ::= , n ::= .

For example, the sequence

    WAP1 <1>; F3; EWAP1

will cause the font in use to be changed to F3 at the beginning of the next page. (The waiting commands circumvent the problem of not knowing which word begins a particular line or page.)

3.16 Commands for Hanging Paragraphs

H n        Hang paragraph. This command causes the following paragraphs to be set with hanging indentation, i.e., the first line of the paragraph is set without an indent and all remaining lines are indented on the left. The number of letter spaces of indent is specified by n. If n is unspecified, the indentation is 11 points.

CH         Cancels an H n command.

3.17 Miscellaneous Commands

CP         Occurs at the beginning of a fragment which must be continued below the usual page-end to avoid breaking it.

AP         Occurs at the end of a fragment which must be continued below the usual page-end to avoid breaking it. If the material which occurs after a CP command and before an AP command causes output to be set below the usual page-end, then the material immediately following the AP command will be started on a new page.

T mn s     Skip the number of horizontal spaces specified by mn and, if s is stated, fill the intervening space with the symbol s. If s is not stated, the intervening space is left blank.

RA         Rotate counterclockwise. This command takes two arguments. It causes the characters of the font in use to be rotated ninety degrees counterclockwise and the following text to be set vertically, starting at the horizontal position specified by the first argument and the vertical position specified by the second argument.

RC         Similar to the RA command except that the font is rotated clockwise.

ER         End RA or RC command.

V          This command causes the following text to be set vertically without any rotation of the font.

EV         End V command.

IT         The following words, up to an EIT command, are to be in italics.

EIT        End IT command.

FT         A footnote follows, which must be set at the foot of the page in which the FT command appears.

EFT        End footnote.

LIG        Ligature of the succeeding two letters.

3.18 Default Interpretation of Commands

A particular feature of the commands is the default interpretation which may be given to many of them, often making it unnecessary to specify certain formatting commands. For instance, if no commands are given regarding font, margin positions, justification and page-length, then the text will be automatically justified, and the standard font, margin positions and page-length will be automatically selected.

4. EDITING COMMANDS

It is necessary that the text be ordered in some hierarchical fashion, thus making it possible to use the language for an information retrieval system. This will be achieved by the use of pointers as in IBAL (see IBAL Manual 2.5.1). In general, five pieces of data are necessary to reference a portion of text in which the primitive is a word. These are document, chapter, paragraph or item, line and word.

The first name in the pointer is that of the document or the unique file name. If subscripts follow the name, the name is called a subscripted name. The first subscript specifies the number of a chapter of the document, and the following subscripts represent paragraph, line and word respectively.

If other names (separated by periods) follow the first name, the pointer is called a qualified name. Any of these subsidiary nodes may be connected to a lower level node either by a subscript or by a name, until a terminal node is reached. The second name, if it occurs, is the name of a chapter; the third name is the name or first word of a paragraph; the fourth name is the first word of a line; and the fifth name is the word itself. When using the first word of a paragraph etc., care must be taken that no other chapter begins with the same word.
In the case of ambiguity, the subscripted form of pointer must be used.

4.1 Absolute and Relative Commands

There are two methods of referencing text to be edited. In the absolute method, the subscripts and names referring to a portion of the text give the absolute location of that piece of text. In the relative method, minus or plus signs may be inserted before any of the subscripts; the location of the text to be edited is then specified relative to the editing command.

It is necessary to specify two pointers for most editing commands. The first pointer indicates the item of text at which the editing starts, and the second pointer indicates the item of text at which the editing ends. In general, there are six possible combinations of pointers for each editing command requiring two pointers, viz. absolute, absolute; absolute, relative; relative, absolute; relative, relative; absolute, increment; relative, increment. An incremental pointer is simply an increment starting at the previously specified absolute or relative pointer.

4.2 List of Editing Commands

Command    Explanation

E          Erase the text specified by the pointers.

I          Insert the given text in the position specified by the pointer.

R          Replace by the given text in the position specified by the pointers.

E#         Erase the previous word.

E##        Erase the present line.

U          Underline the text specified by the pointers.

Standard teletype commands can be added to the above list. If text is erased, then the following text is closed up. In the above table, replace is equivalent to erase and insert, i.e., replace is not necessarily one to one.

Special commands are convenient for the editing of editing commands. These will be called editing editing commands.

4.3 List of Editing Editing Commands

EC         Erase all commands in the item specified by the pointers.

IC         Insert the given commands in the position specified by the pointer.

RC         Replace by the given commands in the position specified by the two pointers.
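
The pointer combinations of Section 4.1 and the erase, insert and replace behaviour of Section 4.2 can be illustrated by a minimal Python sketch. It is not part of LEFT and rests on several assumptions made only for illustration: the text is flattened to a single list of words instead of the five-level document hierarchy, positions are counted from zero, a pointer pair covers both end words, and insertion places the new words before the addressed word. The names Pointer, resolve, span, erase, insert and replace are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    kind: str    # "abs" (absolute), "rel" (relative) or "inc" (increment)
    value: int   # word index, signed offset, or increment


def resolve(ptr, command_at, start=None):
    """Turn a pointer into a 0-based word index (assumed flat addressing)."""
    if ptr.kind == "abs":
        return ptr.value
    if ptr.kind == "rel":
        # Relative pointers are measured from the editing command itself.
        return command_at + ptr.value
    if ptr.kind == "inc":
        # An increment is only meaningful after a start pointer.
        if start is None:
            raise ValueError("an increment needs a preceding start pointer")
        return start + ptr.value
    raise ValueError(f"unknown pointer kind: {ptr.kind!r}")


def span(words, first, second, command_at):
    """Resolve a pointer pair to an inclusive (lo, hi) range of word indices."""
    lo = resolve(first, command_at)
    hi = resolve(second, command_at, start=lo)
    if not 0 <= lo <= hi < len(words):
        raise IndexError(f"span {lo}..{hi} lies outside the text")
    return lo, hi


def erase(words, first, second, command_at=0):
    """Erase the addressed words; the following text is closed up."""
    lo, hi = span(words, first, second, command_at)
    return words[:lo] + words[hi + 1:]


def insert(words, new_words, at, command_at=0):
    """Insert new words before the addressed position (an assumption)."""
    i = resolve(at, command_at)
    if not 0 <= i <= len(words):
        raise IndexError(f"position {i} lies outside the text")
    return words[:i] + list(new_words) + words[i:]


def replace(words, new_words, first, second, command_at=0):
    """Replace = erase followed by insert, so it need not be one to one."""
    lo, _ = span(words, first, second, command_at)
    erased = erase(words, first, second, command_at)
    return insert(erased, new_words, Pointer("abs", lo))


if __name__ == "__main__":
    text = "the quick brown fox jumps".split()
    # Absolute start, incremental end: replaces words 1 and 2.
    print(replace(text, ["slow", "grey"], Pointer("abs", 1), Pointer("inc", 1)))
    # Relative start and end, with the editing command sitting at word 3.
    print(erase(text, Pointer("rel", -1), Pointer("rel", 0), command_at=3))
```

Writing replace in terms of erase and insert mirrors the remark that a replacement is not necessarily one to one: the new material may be longer or shorter than the span it displaces. An increment used as the first pointer is rejected, since the six combinations listed in Section 4.1 allow an increment only in the second position.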

The IC and RC commands are similar to the I and R commands, with the restriction that the inserted material must be one or more commands and no other text.

5. GRAPHICAL EDITING

In this section the construction of graphics will not be considered, but rather the overall manipulations which can be performed by treating each illustration as a 'black-box' or window.

5.1 Graphical Commands

Command    Explanation

EG         Erase the illustration denoted by the argument.

MG         Magnify the illustration denoted by the argument by the linear multiplication factor N. (N may be less than unity, resulting in demagnification.)

IG         Insert the illustration denoted by the first argument in the position denoted by the second argument.

RG         Replace the illustration denoted by the first argument by the new illustration denoted by the second argument.

MOG        Move the illustration denoted by the first argument to the position in the text indicated by the second argument.

CG         Crop the illustration denoted by the argument by the factor M1 on the vertical sides and M2 on the horizontal sides.

ROG C      Rotate the illustration denoted by the argument 90 degrees clockwise.

ROG A      Rotate the illustration denoted by the argument 90 degrees counterclockwise.

ROG I      Invert the illustration denoted by the argument.

IC 'text' EIC    Insert the caption given by the delimited text into the illustration denoted by the argument.

IL 'text' EIL    Insert the legend given by the delimited text into the illustration denoted by the argument.

6. LEFT SYNTAX

::= ::= BEGIN; _[_ ; END ::=