1. [1] Introduction. This is XƎTEX, a program derived from and extending the capabilities of TEX, a document compiler intended to produce typesetting of high quality. The Pascal program that follows is the definition of TEX82, a standard version of TEX that is designed to be highly portable so that identical output will be obtainable on a great variety of computers.

The main purpose of the following program is to explain the algorithms of TEX as clearly as possible. As a result, the program will not necessarily be very efficient when a particular Pascal compiler has translated it into a particular machine language. However, the program has been written so that it can be tuned to run efficiently in a wide variety of operating environments by making comparatively few changes. Such flexibility is possible because the documentation that follows is written in the WEB language, which is at a higher level than Pascal; the preprocessing step that converts WEB to Pascal is able to introduce most of the necessary refinements. Semi-automatic translation to other languages is also feasible, because the program below does not make extensive use of features that are peculiar to Pascal.

A large piece of software like TEX has inherent complexity that cannot be reduced below a certain level of difficulty, although each individual part is fairly simple by itself. The WEB language is intended to make the algorithms as readable as possible, by reflecting the way the individual program pieces fit together and by providing the cross-references that connect different parts. Detailed comments about what is going on, and about why things were done in certain ways, have been liberally sprinkled throughout the program. These comments explain features of the implementation, but they rarely attempt to explain the TEX language itself, since the reader is supposed to be familiar with The TEXbook.

2. The present implementation has a long ancestry, beginning in the summer of 1977, when Michael F. Plass and Frank M. Liang designed and coded a prototype based on some specifications that the author had made in May of that year. This original protoTEX included macro definitions and elementary manipulations on boxes and glue, but it did not have line-breaking, page-breaking, mathematical formulas, alignment routines, error recovery, or the present semantic nest; furthermore, it used character lists instead of token lists, so that a control sequence like \halign was represented by a list of seven characters. A complete version of TEX was designed and coded by the author in late 1977 and early 1978; that program, like its prototype, was written in the SAIL language, for which an excellent debugging system was available. Preliminary plans to convert the SAIL code into a form somewhat like the present “web” were developed by Luis Trabb Pardo and the author at the beginning of 1979, and a complete implementation was created by Ignacio A. Zabala in 1979 and 1980. The TEX82 program, which was written by the author during the latter part of 1981 and the early part of 1982, also incorporates ideas from the 1979 implementation of TEX in MESA that was written by Leonidas Guibas, Robert Sedgewick, and Douglas Wyatt at the Xerox Palo Alto Research Center. Several hundred refinements were introduced into TEX82 based on the experiences gained with the original implementations, so that essentially every part of the system has been substantially improved. After the appearance of “Version 0” in September 1982, this program benefited greatly from the comments of many other people, notably David R. Fuchs and Howard W. Trickey. A final revision in September 1989 extended the input character set to eight-bit codes and introduced the ability to hyphenate words from different languages, based on some ideas of Michael J. Ferguson.

No doubt there still is plenty of room for improvement, but the author is firmly committed to keeping TEX82 “frozen” from now on; stability and reliability are to be its main virtues.

On the other hand, the WEB description can be extended without changing the core of TEX82 itself, and the program has been designed so that such extensions are not extremely difficult to make. The banner string defined here should be changed whenever TEX undergoes any modifications, so that it will be clear which version of TEX might be the guilty party when a problem arises.

This program contains code for various features extending TEX; it is therefore called ‘XƎTEX’ and not ‘TEX’, because the official name ‘TEX’ by itself is reserved for software systems that are fully compatible with each other. A special test suite called the “TRIP test” is available for helping to determine whether a particular implementation deserves to be known as ‘TEX’ [cf. Stanford Computer Science report CS1027, November 1984].

MLTEX adds new primitives that change the behaviour of TEX, so strictly speaking the banner string would have to be changed. Instead of changing the banner string, we output an additional line to make clear that this is a modified TEX version.

A similar test suite called the “e-TRIP test” is available for helping to determine whether a particular implementation deserves to be known as ‘𝜀-TEX’.

@define eTeX_version => 2 // \.{\\eTeXversion}
// \.{\\eTeXrevision}
@define eTeX_revision => strpool!(".6")
// current \eTeX\ version
@define eTeX_version_string => "-2.6"
@define XeTeX_version => 0 // \.{\\XeTeXversion}
// \.{\\XeTeXrevision}
@define XeTeX_revision => strpool!(".999994")
// current \XeTeX\ version
@define XeTeX_version_string => "-0.999994"
// printed when \XeTeX\ starts
@define XeTeX_banner =>
    "This is XeTeX, Version 3.141592653",
    eTeX_version_string,
    XeTeX_version_string
// printed when \TeX\ starts
@define TeX_banner_k => "This is TeXk, Version 3.141592653"
// printed when \TeX\ starts
@define TeX_banner => "This is TeX, Version 3.141592653"
@define banner => XeTeX_banner
@define banner_k => XeTeX_banner
@define TEX => XETEX // change program name into XETEX 
@define TeXXeT_code => 0 // the \TeXXeT\ feature is optional
// non-zero to enable breaks after en- and em-dashes
@define XeTeX_dash_break_code => 1
// non-zero if the main vertical list is being built upwards
@define XeTeX_upwards_code => 2
// non-zero to use exact glyph height/depth
@define XeTeX_use_glyph_metrics_code => 3
// non-zero to enable \.{\\XeTeXinterchartokens} insertion
@define XeTeX_inter_char_tokens_code => 4
// normalization mode: 1 for NFC, 2 for NFD, else none
@define XeTeX_input_normalization_code => 5
// input mode for newly opened files
@define XeTeX_default_input_mode_code => 6
@define XeTeX_input_mode_auto => 0
@define XeTeX_input_mode_utf8 => 1
@define XeTeX_input_mode_utf16be => 2
@define XeTeX_input_mode_utf16le => 3
@define XeTeX_input_mode_raw => 4
@define XeTeX_input_mode_icu_mapping => 5
//  str_number of encoding name if mode == ICU
@define XeTeX_default_input_encoding_code => 7
// non-zero to log native fonts used
@define XeTeX_tracing_fonts_code => 8
// controls shaping of space chars in context when using 
// native fonts; set to 1 for contextual adjustment of space 
// width only, and 2 for full cross-space shaping (e.g. 
// multi-word ligatures)
@define XeTeX_interword_space_shaping_code => 9
// controls output of /ActualText for native-word nodes
@define XeTeX_generate_actual_text_code => 10
// sets maximum hyphenatable word length
@define XeTeX_hyphenatable_length_code => 11
// number of \eTeX\ state variables in eqtb 
@define eTeX_states => 12
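
As a rough illustration of how the banner is put together from the pieces defined above, the following Python sketch joins the three strings in the order listed in XeTeX_banner. The Python names are hypothetical stand-ins for the WEB macros; in the real program the pieces are combined when the WEB source is preprocessed, not at run time.

```python
# Hypothetical Python stand-ins for the WEB macros defined above.
TEX_PART = "This is XeTeX, Version 3.141592653"
ETEX_VERSION_STRING = "-2.6"        # eTeX_version_string
XETEX_VERSION_STRING = "-0.999994"  # XeTeX_version_string


def xetex_banner() -> str:
    """Join the pieces in the order listed in XeTeX_banner."""
    return TEX_PART + ETEX_VERSION_STRING + XETEX_VERSION_STRING


print(xetex_banner())
```

Anyone who modifies the program would change one of these pieces, so that the first line of output identifies the modified version.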

3. Different Pascals have slightly different conventions, and the present program expresses TEX in terms of the Pascal that was available to the author in 1982. Constructions that apply to this particular compiler, which we shall call Pascal-H, should help the reader see how to make an appropriate interface for other systems if necessary. (Pascal-H is Charles Hedrick’s modification of a compiler for the DECsystem-10 that was originally developed at the University of Hamburg; cf. Software—Practice and Experience 6 (1976), 29–42. The TEX program below is intended to be adaptable, without extensive changes, to most other versions of Pascal, so it does not fully use the admirable features of Pascal-H. Indeed, a conscious effort has been made here to avoid using several idiosyncratic features of standard Pascal itself, so that most of the code can be translated mechanically into other high-level languages. For example, the ‘with’ and ‘new’ features are not used, nor are pointer types, set types, or enumerated scalar types; there are no ‘var’ parameters, except in the case of files, although 𝜀-TEX does use ‘var’ parameters for the reverse function; there are no tag fields on variant records; there are no assignments real = integer; no procedures are declared local to other procedures.)

The portions of this program that involve system-dependent code, where changes might be necessary because of differences between Pascal compilers and/or differences between operating systems, can be identified by looking at the sections whose numbers are listed under ‘system dependencies’ in the index. Furthermore, the index entries for ‘dirty Pascal’ list all places where the restrictions of Pascal have not been followed perfectly, for one reason or another.

Incidentally, Pascal’s standard round function can be problematical, because it disagrees with the IEEE floating-point standard. Many implementors have therefore chosen to substitute their own home-grown rounding procedure.
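
The disagreement presumably concerns ties: standard Pascal's round moves a value ending in .5 away from zero, whereas the IEEE default rounds such ties to the nearest even integer. The Python sketch below contrasts the two; pascal_round is a hypothetical helper written for this illustration, and Python's built-in round is used as the ties-to-even reference.

```python
import math


def pascal_round(x: float) -> int:
    """Round half away from zero, as standard Pascal defines round."""
    if x >= 0:
        return math.trunc(x + 0.5)
    return math.trunc(x - 0.5)


# Python's built-in round() rounds ties to the nearest even integer.
for value in (0.5, 1.5, 2.5, -2.5):
    print(value, pascal_round(value), round(value))
```

The two functions agree on 1.5 but differ on 0.5, 2.5, and -2.5, which is the kind of discrepancy that leads implementors to supply their own rounding procedure.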

4. The program begins with a normal Pascal program heading, whose components will be filled in later, using the conventions of WEB. For example, the portion of the program called ‘⟦13 Global variables⟧’ below will be replaced by a sequence of variable declarations that starts in §13 of this documentation. In this way, we are able to define each individual global variable when we are prepared to understand what it means; we do not have to define all of the globals at once. Cross references in §13, where it says “See also sections 20, 26, …,” also make it possible to look at the set of all global variables, if desired. Similar remarks apply to the other portions of the program heading.

// this is a \.{WEB} coding trick:
@define mtype => paste!(t, paste!(y, paste!(p, e)))
@format mtype ~ type; // `\&{mtype}' will be equivalent to 
// `\&{type}'
@format type ~ @define; // but ` type ' will not be treated 
// as a reserved word
⟦9 Compiler directives⟧

program

TEX // all file names are defined dynamically

const
    ⟦11 Constants in the outer block⟧

type
    ⟦18 Types in the outer block⟧

var
    ⟦13 Global variables⟧

// this procedure gets things started properly
function initialize() {
    var ⟦19 Local variables for initialization⟧;
    
    ⟦8 Initialize whatever \TeX\ might access⟧
}

⟦57 Basic printing procedures⟧

⟦82 Error handling procedures⟧

5. The overall TEX program begins with the heading just shown, after which comes a bunch of procedure declarations and function declarations. Finally we will get to the main program, which begins with the comment ‘start_here ’. If you want to skip down to the main program now, you can look up ‘start_here ’ in the index. But the author suggests that the best way to understand this program is to follow pretty much the order of TEX’s components as they appear in the WEB description you are now reading, since the present ordering is intended to combine the advantages of the “bottom up” and “top down” approaches to the problem of understanding a somewhat complicated system.

6. For Web2c, labels are not declared in the main program, but we still have to declare the symbolic names.

// go here when \TeX's variables are initialized
@define start_of_TEX => 1
// this label marks the ending of the program
@define final_end => 9999

7. Some of the code below is intended to be used only when diagnosing the strange behavior that sometimes occurs when TEX is being installed or when system wizards are fooling around with TEX without quite knowing what they are doing. Such code will not normally be compiled; it is delimited by the codewords ‘debug … gubed’, with apologies to people who wish to preserve the purity of English.

Similarly, there is some conditional code delimited by ‘stat … tats’ that is intended for use when statistics are to be kept about TEX’s memory usage. The stat … tats code also implements diagnostic information for \tracingparagraphs, \tracingpages, and \tracingrestores.

@define debug => ifdef("TEXMF_DEBUG")
@define gubed => endif("TEXMF_DEBUG")
@format debug ~ begin;
@format gubed ~ end;
@define stat => ifdef("STAT")
@define tats => endif("STAT")
@format stat ~ begin;
@format tats ~ end;

8. This program has two important variations: (1) There is a long and slow version called INITEX, which does the extra calculations needed to initialize TEX’s internal tables; and (2) there is a shorter and faster production version, which cuts the initialization to a bare minimum. Parts of the program that are needed in (1) but not in (2) are delimited by the codewords ‘init … tini’ for declarations and by the codewords ‘Init … Tini’ for executable code. This distinction is helpful for implementations where a run-time switch differentiates between the two versions of the program.

@define init => ifdef("INITEX")
@define tini => endif("INITEX")
@define Init =>
    init!{
        if ini_version {
@define Tini =>
    }
    ;
@format Init ~ begin;
@format Tini ~ end;
@format init ~ begin;
@format tini ~ end;
⟦8 Initialize whatever \TeX\ might access⟧ = ⟦
    ⟦23 Set initial values of key variables⟧

    Init!{
        ⟦189 Initialize table entries (done by \.{INITEX} only)⟧
    }
⟧
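
The effect of the ‘Init … Tini’ bracket can be pictured as a run-time test on ini_version. The following Python sketch is only a schematic: the function body and the step descriptions are hypothetical, and the real sections 23 and 189 do considerably more work.

```python
def initialize(ini_version: bool) -> list[str]:
    """Return the initialization steps that would run, in order."""
    # Section 23 runs in both INITEX and the production version.
    steps = ["set initial values of key variables"]
    # Section 189 is guarded by Init ... Tini, so only INITEX runs it.
    if ini_version:
        steps.append("initialize table entries")
    return steps


print(initialize(ini_version=False))
print(initialize(ini_version=True))
```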

9. If the first character of a Pascal comment is a dollar sign, Pascal-H treats the comment as a list of “compiler directives” that will affect the translation of this program into machine language. The directives shown below specify full checking and inclusion of the Pascal debugger when TEX is being debugged, but they cause range checking and other redundant code to be eliminated when the production system is being generated. Arithmetic overflow will be detected in all cases.

⟦9 Compiler directives⟧ = ⟦
    // no range check, catch arithmetic overflow, no debug 
    // overhead
    pascal_preprocessor!(/*$C-,A+,D-*/);

    debug!{
        pascal_preprocessor!(/*$C+,D+*/);
        ;
        // but turn everything on when debugging
    }
⟧

10. This TEX implementation conforms to the rules of the Pascal User Manual published by Jensen and Wirth in 1975, except where system-dependent code is necessary to make a useful system program, and except in another respect where such conformity would unnecessarily obscure the meaning and clutter up the code: We assume that case statements may include a default case that applies if no matching label is found. Thus, we shall use constructions like

    case x of
    1: ⟨code for x = 1⟩;
    3: ⟨code for x = 3⟩;
    othercases ⟨code for x != 1 and x != 3⟩
    endcases

since most Pascal compilers have plugged this hole in the language by incorporating some sort of default mechanism. For example, the Pascal-H compiler allows ‘others:’ as a default label, and other Pascals allow syntaxes like ‘else’ or ‘otherwise’ or ‘otherwise:’, etc. The definitions of othercases and endcases should be changed to agree with local conventions. Note that no semicolon appears before endcases in this program, so the definition of endcases should include a semicolon if the compiler wants one. (Of course, if no default mechanism is available, the case statements of TEX will have to be laboriously extended by listing all remaining cases. People who are stuck with such Pascals have, in fact, done this, successfully but not happily!)

// default for cases not listed explicitly
@define othercases => others:
// follows the default case in an extended case statement
@define endcases => end
@format othercases ~ else;
@format endcases ~ end;
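
For readers more used to modern languages, the extended case statement behaves like a chain of tests with a final catch-all branch. A minimal Python sketch of the example from the text follows; the function name and the returned strings are placeholders for the three pieces of code.

```python
def dispatch(x: int) -> str:
    """Mimic `case x of 1: ...; 3: ...; othercases ... endcases`."""
    if x == 1:
        return "code for x = 1"
    elif x == 3:
        return "code for x = 3"
    else:
        # This branch plays the role of othercases.
        return "code for x != 1 and x != 3"


for x in (1, 2, 3):
    print(x, dispatch(x))
```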

11. The following parameters can be changed at compile time to extend or reduce TEX’s capacity. They may have different values in INITEX and in production versions of TEX.

@define file_name_size => maxint
@define ssup_error_line => 255
// Larger values than 65536 cause the arrays to consume much 
// more memory.
@define ssup_max_strings => 2097151
@define ssup_trie_opcode => 65535
@define ssup_trie_size => 0x3fffff
// Changing this requires changing (un)dumping!
@define ssup_hyph_size => 65535
// Must be not less than hyph_prime !
@define iinf_hyphen_size => 610
// maximum number of internal fonts; this can be increased, 
// but hash_size + max_font_max should not exceed 29000.
@define max_font_max => 9000
// smallest internal font number; must be >= min_quarterword 
// ; do not change this without modifying the dynamic 
// definition of the font arrays.
@define font_base => 0
⟦11 Constants in the outer block⟧ = ⟦
    // smallest index in hash array, i.e., hash_base 
    // Use hash_offset == 0 for compilers which cannot 
    // decrement pointers.
    const hash_offset = 514;

    // space for ``opcodes'' in the hyphenation patterns; 
    // best if relatively prime to 313, 361, and 1009.
    const trie_op_size = 35111;

    // for lower trie_op_hash array bound; must be equal to 
    // - trie_op_size .
    const neg_trie_op_size = -35111;

    // first possible trie op code for any language
    const min_trie_op = 0;

    // largest possible trie opcode for any language
    const max_trie_op = ssup_trie_opcode;

    // this is configurable, for the sake of ML-\TeX
    // string of length file_name_size ; tells where the 
    // string pool appears
    const pool_name = TEXMF_POOL_NAME;

    // the name of this engine
    const engine_name = TEXMF_ENGINE_NAME;

    const inf_mem_bot = 0;

    const sup_mem_bot = 1;

    const inf_main_memory = 3000;

    const sup_main_memory = 256000000;

    const inf_trie_size = 8000;

    const sup_trie_size = ssup_trie_size;

    const inf_max_strings = 3000;

    const sup_max_strings = ssup_max_strings;

    const inf_strings_free = 100;

    const sup_strings_free = sup_max_strings;

    const inf_buf_size = 500;

    const sup_buf_size = 30000000;

    const inf_nest_size = 40;

    const sup_nest_size = 4000;

    const inf_max_in_open = 6;

    const sup_max_in_open = 127;

    const inf_param_size = 60;

    const sup_param_size = 32767;

    const inf_save_size = 600;

    const sup_save_size = 30000000;

    const inf_stack_size = 200;

    const sup_stack_size = 30000;

    const inf_dvi_buf_size = 800;

    const sup_dvi_buf_size = 65536;

    const inf_font_mem_size = 20000;

    //  integer -limited, so 2 could be prepended?
    const sup_font_mem_size = 147483647;

    const sup_font_max = max_font_max;

    // could be smaller, but why?
    const inf_font_max = 50;

    const inf_pool_size = 32000;

    const sup_pool_size = 40000000;

    const inf_pool_free = 1000;

    const sup_pool_free = sup_pool_size;

    const inf_string_vacancies = 8000;

    const sup_string_vacancies = sup_pool_size - 23000;

    const sup_hash_extra = sup_max_strings;

    const inf_hash_extra = 0;

    const sup_hyph_size = ssup_hyph_size;

    // Must be not less than hyph_prime !
    const inf_hyph_size = iinf_hyphen_size;

    const inf_expand_depth = 10;

    const sup_expand_depth = 10000000;
⟧
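
Each inf_/sup_ pair evidently brackets the legal range of one capacity parameter. This section does not say what happens to a value outside its range; the Python sketch below assumes, purely for illustration, that such a value is clamped to the nearest bound. The BOUNDS table copies a few pairs from the constants above, and clamp_to_bounds is a hypothetical helper.

```python
# A few (inf, sup) pairs copied from the constants above.
BOUNDS = {
    "buf_size": (500, 30000000),
    "nest_size": (40, 4000),
    "max_in_open": (6, 127),
    "dvi_buf_size": (800, 65536),
}


def clamp_to_bounds(name: str, requested: int) -> int:
    """Force a requested value into the [inf, sup] range (assumed policy)."""
    lower, upper = BOUNDS[name]
    return max(lower, min(requested, upper))


print(clamp_to_bounds("nest_size", 10))     # below inf_nest_size
print(clamp_to_bounds("nest_size", 5000))   # above sup_nest_size
print(clamp_to_bounds("max_in_open", 50))   # already in range
```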

12. Like the preceding parameters, the following quantities can be changed at compile time to extend or reduce TEX’s capacity. But if they are changed, it is necessary to rerun the initialization program INITEX to generate new tables for the production TEX program. One can’t simply make helter-skelter changes to the following constants, since certain rather complex initialization numbers are computed from them. They are defined here using WEB macros, instead of being put into Pascal’s const list, in order to emphasize this distinction.

// maximum number of control sequences; it should be at most 
// about ( mem_max - mem_min ) / 10 ; see also font_max 
@define hash_size => 15000
// a prime number equal to about 85% of hash_size 
@define hash_prime => 8501
// another prime for hashing \.{\\hyphenation} exceptions; 
// if you change this, you should also change 
// iinf_hyphen_size .
@define hyph_prime => 607
// the largest allowed character number; must be <= 
// max_quarterword , this refers to UTF16 codepoints that we 
// store in strings, etc; actual character codes can exceed 
// this range, up to biggest_usv 
@define biggest_char => 65535
@define too_big_char => 65536 //  biggest_char + 1 
// the largest Unicode Scalar Value
@define biggest_usv => 0x10ffff
@define too_big_usv => 0x110000 //  biggest_usv + 1 
@define number_usvs => 0x110000 //  biggest_usv + 1 
@define special_char => 0x110001 //  biggest_usv + 2 
// the largest allowed register number; must be <= 
// max_quarterword 
@define biggest_reg => 255
@define number_regs => 256 //  biggest_reg + 1 
@define font_biggest => 255 // the real biggest font
@define number_fonts => font_biggest - font_base + 2
@define number_math_families => 256
@define number_math_fonts =>
    
        number_math_families
        + number_math_families + number_math_families
@define math_font_biggest => number_math_fonts - 1
// size code for the largest size in a family
@define text_size => 0
// size code for the medium size in a family
@define script_size => number_math_families
// size code for the smallest size in a family
@define script_script_size =>
    number_math_families + number_math_families
// the largest hyphenation language
@define biggest_lang => 255
@define too_big_lang => 256 //  biggest_lang + 1 
// hard limit for hyphenatable length; runtime value is 
// max_hyphenatable_length 
@define hyphenatable_length_limit => 4095
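
Several of the macros above are defined in terms of others. Working the arithmetic through in Python makes the resulting values explicit; the variable names simply mirror the WEB macros.

```python
font_base = 0
font_biggest = 255
number_math_families = 256
biggest_char = 65535
biggest_usv = 0x10FFFF

number_fonts = font_biggest - font_base + 2          # 257
number_math_fonts = 3 * number_math_families         # 768
math_font_biggest = number_math_fonts - 1            # 767
text_size = 0
script_size = number_math_families                   # 256
script_script_size = 2 * number_math_families        # 512
too_big_char = biggest_char + 1                      # 65536
too_big_usv = biggest_usv + 1                        # 0x110000
special_char = biggest_usv + 2                       # 0x110001

print(number_fonts, number_math_fonts, math_font_biggest)
print(script_size, script_script_size, hex(special_char))
```

The three size codes are spaced number_math_families apart, so a size code plus a family number in the range 0 to 255 always stays at or below math_font_biggest.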

13. In case somebody has inadvertently made bad settings of the “constants,” TEX checks them using a global variable called bad.

This is the first of many sections of TEX where global variables are defined.

⟦13 Global variables⟧ = ⟦
    // is some ``constant'' wrong?
    var bad: integer;
⟧

14. Later on we will say ‘if (mem_max >= max_halfword) { bad = 14; } ’, or something similar. (We can’t do that until max_halfword has been defined.)

⟦14 Check the ``constant'' values for consistency⟧ = ⟦
    bad = 0;

    if (
        (half_error_line < 30)
        || (half_error_line > error_line - 15)
    ) {
        bad = 1;
    }

    if (max_print_line < 60) {
        bad = 2;
    }

    if (dvi_buf_size % 8 != 0) {
        bad = 3;
    }

    if (mem_bot + 1100 > mem_top) {
        bad = 4;
    }

    if (hash_prime > hash_size) {
        bad = 5;
    }

    if (max_in_open >= 128) {
        bad = 6;
    }

    if (mem_top < 256 + 11) {
        // we will want null_list > 255 
        bad = 7;
    }
⟧
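
The checks above can be exercised in isolation. The Python sketch below transcribes the seven tests into a function; the default values are sample settings chosen for this illustration and are not taken from any particular TEX installation. As in the original, the tests run in sequence, so when several constants are wrong the last failing test determines the reported value of bad.

```python
# Sample settings for illustration only.
SAMPLE = dict(
    half_error_line=50, error_line=79, max_print_line=79,
    dvi_buf_size=16384, mem_bot=0, mem_top=5000000,
    hash_prime=8501, hash_size=15000, max_in_open=15,
)


def check_constants(**overrides: int) -> int:
    """Return 0 if the settings are consistent, else the code of the last failure."""
    c = {**SAMPLE, **overrides}
    bad = 0
    if c["half_error_line"] < 30 or c["half_error_line"] > c["error_line"] - 15:
        bad = 1
    if c["max_print_line"] < 60:
        bad = 2
    if c["dvi_buf_size"] % 8 != 0:
        bad = 3
    if c["mem_bot"] + 1100 > c["mem_top"]:
        bad = 4
    if c["hash_prime"] > c["hash_size"]:
        bad = 5
    if c["max_in_open"] >= 128:
        bad = 6
    if c["mem_top"] < 256 + 11:
        bad = 7
    return bad


print(check_constants())                    # consistent sample settings
print(check_constants(dvi_buf_size=801))    # not a multiple of 8
```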

15. Labels are given symbolic names by the following definitions, so that occasional goto statements will be meaningful. We insert the label ‘exit’ just before the ‘end’ of a procedure in which we have used the ‘return’ statement defined below; the label ‘restart’ is occasionally used at the very beginning of a procedure; and the label ‘reswitch’ is occasionally used just prior to a case statement in which some cases change the conditions and we wish to branch to the newly applicable case. Loops that are set up with the loop construction defined below are commonly exited by going to ‘done’ or to ‘found’ or to ‘not_found’, and they are sometimes repeated by going to ‘continue’. If two or more parts of a subroutine start differently but end up the same, the shared code may be gathered together at ‘common_ending’.

Incidentally, this program never declares a label that isn’t actually used, because some fussy Pascal compilers will complain about redundant labels.

@define exit => 10 // go here to leave a procedure
@define restart => 20 // go here to start a procedure again
// go here to start a case statement again
@define reswitch => 21
@define continue => 22 // go here to resume a loop
@define done => 30 // go here to exit a loop
// like done , when there is more than one loop
@define done1 => 31
// for exiting the second loop in a long block
@define done2 => 32
// for exiting the third loop in a very long block
@define done3 => 33
// for exiting the fourth loop in an extremely long block
@define done4 => 34
// for exiting the fifth loop in an immense block
@define done5 => 35
@define done6 => 36 // for exiting the sixth loop in a block
@define found => 40 // go here when you've found it
// like found , when there's more than one per routine
@define found1 => 41
// like found , when there's more than two per routine
@define found2 => 42
@define not_found => 45 // go here when you've found nothing
// like not_found , when there's more than one
@define not_found1 => 46
// like not_found , when there's more than two
@define not_found2 => 47
// like not_found , when there's more than three
@define not_found3 => 48
// like not_found , when there's more than four
@define not_found4 => 49
// go here when you want to merge with another branch
@define common_ending => 50

16. Here are some macros for common programming idioms.

@define negate(#) => # = -# // change the sign of a variable
// repeat over and over until a goto happens
@define loop => while true {
@format loop ~ xclause; // \.{WEB}'s xclause acts like 
// `\ignorespaces while true do \unskip'
@define do_nothing => /*nothing*/ // empty statement
@define return => goto exit // terminate a procedure call
@format return ~ nil;
@define empty => 0 // symbolic name for a null constant
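
In a language without goto, the loop macro together with the ‘found’ and ‘not_found’ labels corresponds roughly to an endless loop that is left by break. The Python sketch below shows one such search; the function and its task are invented for illustration and do not appear in TEX.

```python
def find_first_negative(values: list[int]):
    """Return the index of the first negative entry, or None if there is none."""
    i = 0
    while True:                    # the `loop` macro: repeat until a jump happens
        if i >= len(values):
            result = None          # corresponds to `goto not_found`
            break
        if values[i] < 0:
            result = i             # corresponds to `goto found`
            break
        i += 1                     # otherwise go around again
    return result                  # corresponds to `return`, i.e. `goto exit`


print(find_first_negative([3, -1, 2]))
print(find_first_negative([1, 2]))
```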

17. [2] The character set. In order to make TEX readily portable to a wide variety of computers, all of its input text is converted to an internal eight-bit code that includes standard ASCII, the “American Standard Code for Information Interchange.” This conversion is done immediately when each character is read in. Conversely, characters are converted from ASCII to the user’s external representation just before they are output to a text file.

Such an internal code is relevant to users of TEX primarily because it governs the positions of characters in the fonts. For example, the character ‘A’ has ASCII code 65 = 0x41, and when TEX typesets this letter it specifies character number 65 in the current font. If that font actually has ‘A’ in a different position, TEX doesn’t know what the real position is; the program that does the actual printing from TEX’s device-independent files is responsible for converting from ASCII to a particular font encoding.

TEX’s internal code also defines the value of constants that begin with a reverse apostrophe; and it provides an index to the \catcode, \mathcode, \uccode, \lccode, and \delcode tables.

18. Characters of text that have been converted to TEX’s internal form are said to be of type ASCII_code , which is a subrange of the integers. For xetex, we rename ASCII_code as UTF16_code . But we also have a new type UTF8_code , used when we construct filenames to pass to the system libraries.

@define ASCII_code => UTF16_code
@define packed_ASCII_code => packed_UTF16_code
⟦18 Types in the outer block⟧ = ⟦
    // 16-bit numbers
    type ASCII_code = 0 .. biggest_char;

    // 8-bit numbers
    type UTF8_code = 0 .. 255;

    // Unicode scalars
    type UnicodeScalar = 0 .. biggest_usv;
⟧
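
Because ASCII_code stops at biggest_char while UnicodeScalar runs up to biggest_usv, a scalar above 0xFFFF cannot fit into a single 16-bit unit. The Python sketch below shows the standard UTF-16 encoding of such a value as a surrogate pair. It is an assumption of this example that stored strings use exactly this scheme; the function name is hypothetical, and no check is made for scalars that fall inside the surrogate range.

```python
BIGGEST_CHAR = 0xFFFF      # largest value of ASCII_code (UTF16_code)
BIGGEST_USV = 0x10FFFF     # largest value of UnicodeScalar


def to_utf16_units(usv: int) -> list[int]:
    """Encode one Unicode scalar value as one or two 16-bit code units."""
    if not 0 <= usv <= BIGGEST_USV:
        raise ValueError("not a Unicode scalar value")
    if usv <= BIGGEST_CHAR:
        return [usv]                       # fits in a single unit
    offset = usv - 0x10000                 # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return [high, low]


print([hex(u) for u in to_utf16_units(0x41)])
print([hex(u) for u in to_utf16_units(0x1F600)])
```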

19. The original Pascal compiler was designed in the late 60s, when six-bit character sets were common, so it did not make provision for lowercase letters. Nowadays, of course, we need to deal with both capital and small letters in a convenient way, especially in a program for typesetting; so the present specification of TEX has been written under the assumption that the Pascal compiler and run-time system permit the use of text files with more than 64 distinguishable characters. More precisely, we assume that the character set contains at least the letters and symbols associated with ASCII codes 0x20 through 0x7E; all of these characters are now available on most computer terminals.

Since we are dealing with more characters than were present in the first Pascal compilers, we have to decide what to call the associated data type. Some Pascals use the original name char for the characters in text files, even though there now are more than 64 such characters, while other Pascals consider char to be a 64-element subrange of a larger data type that has some other name.

In order to accommodate this difference, we shall use the name text_char to stand for the data type of the characters that are converted to and from ASCII_code when they are input and output. We shall also assume that text_char consists of the elements chr(first_text_char) through chr(last_text_char) , inclusive. The following definitions should be adjusted if necessary.

// the data type of characters in text files
@define text_char => ASCII_code
// ordinal number of the smallest element of text_char 
@define first_text_char => 0
// ordinal number of the largest element of text_char 
@define last_text_char => biggest_char
⟦19 Local variables for initialization⟧ = ⟦
    var i: integer;
⟧

20. The TEX processor converts between ASCII code and the user’s external character set by means of arrays xord and xchr that are analogous to Pascal’s ord and chr functions.

⟦13 Global variables⟧ += ⟦
    // dummy variable so tangle doesn't complain; not 
    // actually used
    var xchr: ^text_char;
⟧

21. Since we are assuming that our Pascal system is able to read and write the visible characters of standard ASCII (although not necessarily using the ASCII codes to represent them), the following assignment statements initialize the standard part of the xchr array properly, without needing any system-dependent changes. On the other hand, it is possible to implement TEX with less complete character sets, and in such cases it will be necessary to change something here.

22. Some of the ASCII codes without visible characters have been given symbolic names in this program because they are used with a special meaning.

@define null_code => 0x0 // ASCII code that might disappear
// ASCII code used at end of line
@define carriage_return => 0xd
// ASCII code that many systems prohibit in text files
@define invalid_code => 0x7f

23. The ASCII code is “standard” only to a certain extent, since many computer installations have found it advantageous to have ready access to more than 94 printing characters. Appendix C of The TEXbook gives a complete specification of the intended correspondence between characters and TEX’s internal representation.

If TEX is being used on a garden-variety Pascal for which only standard ASCII codes will appear in the input and output files, it doesn’t really matter what codes are specified in xchr[0 .. 0x1f] , but the safest policy is to blank everything out by using the code shown below.

However, other settings of xchr will make TEX more friendly on computers that have an extended character set, so that users can type things like ‘^^Z’ instead of ‘\ne’. People with extended character sets can assign codes arbitrarily, giving an xchr equivalent to whatever characters the users of TEX are allowed to have in their input files. It is best to make the codes correspond to the intended interpretations as shown in Appendix C whenever possible; but this is not necessary. For example, in countries with an alphabet of more than 26 letters, it is usually best to map the additional letters into codes less than 0x20. To get the most “permissive” character set, change " " on the right of these assignment statements to chr(i) .

⟦23 Set initial values of key variables⟧ = ⟦
    /*nothing*/
⟧

24. The following system-independent code makes the xord array contain a suitable inverse to the information in xchr . Note that if xchr[i] == xchr[j] where i < j < 0x7f , the value of xord[xchr[i]] will turn out to be j or more; hence, standard ASCII code numbers will be used instead of codes below 0x20 in case there is a coincidence.

⟦23 Set initial values of key variables⟧ += ⟦
    /*nothing*/
⟧
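
As a rough illustration of the last two sections, the following Python sketch blanks out an xchr table and then inverts it into xord. The names make_xchr and make_xord are hypothetical, and the loop order (codes 0x80 to 0xff first, then 0 to 0x7e) is an assumption chosen so that the coincidence rule stated above holds. It is not the code of this implementation, whose modules are empty here.

```python
INVALID_CODE = 0x7F  # same value as invalid_code above


def make_xchr(permissive=False):
    """Return a 256-entry table mapping internal codes to external characters."""
    if permissive:
        # the "most permissive" setting: chr(i) on the right of every assignment
        return [chr(i) for i in range(256)]
    xchr = [" "] * 256                  # blank everything out first
    for i in range(0x20, 0x7F):         # the 95 visible ASCII characters
        xchr[i] = chr(i)
    return xchr


def make_xord(xchr):
    """Invert xchr; when two codes share a character, the later assignment wins."""
    xord = {chr(c): INVALID_CODE for c in range(256)}
    for i in range(0x80, 0x100):        # high codes first, so low codes override them
        xord[xchr[i]] = i
    for i in range(0x00, 0x7F):         # codes 0 .. 0x7e in increasing order
        xord[xchr[i]] = i
    return xord
```

With the blanked-out table every code below 0x20 shares the character " " with code 0x20, and the inverse settles on the standard ASCII number 0x20.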

25. [3] Input and output. The bane of portability is the fact that different operating systems treat input and output quite differently, perhaps because computer scientists have not given sufficient attention to this problem. People have felt somehow that input and output are not part of “real” programming. Well, it is true that some kinds of programming are more fun than others. With existing input/output conventions being so diverse and so messy, the only sources of joy in such parts of the code are the rare occasions when one can find a way to make the program a little less bad than it might have been. We have two choices, either to attack I/O now and get it over with, or to postpone I/O until near the end. Neither prospect is very attractive, so let’s get it over with.

The basic operations we need to do are (1) inputting and outputting of text, to or from a file or the user’s terminal; (2) inputting and outputting of eight-bit bytes, to or from a file; (3) instructing the operating system to initiate (“open”) or to terminate (“close”) input or output from a specified file; (4) testing whether the end of an input file has been reached.

TEX needs to deal with two kinds of files. We shall use the term alpha_file for a file that contains textual data, and the term byte_file for a file that contains eight-bit binary information. These two types turn out to be the same on many computers, but sometimes there is a significant distinction, so we shall be careful to distinguish between them. Standard protocols for transferring such files from computer to computer, via high-speed networks, are now becoming available to more and more communities of users.

The program actually makes use also of a third kind of file, called a word_file , when dumping and reloading base information for its own initialization. We shall define a word file later; but it will be possible for us to specify simple operations on word files before they are defined.

⟦18 Types in the outer block⟧ += ⟦
    // unsigned one-byte quantity
    type eight_bits = 0 .. 255;

    // files that contain textual data
    type alpha_file = packed file of text_char;

    // files that contain binary data
    type byte_file = packed file of eight_bits;
⟧

26. Most of what we need to do with respect to input and output can be handled by the I/O facilities that are standard in Pascal, i.e., the routines called get , put , eof , and so on. But standard Pascal does not allow file variables to be associated with file names that are determined at run time, so it cannot be used to implement TEX; some sort of extension to Pascal’s ordinary reset and rewrite is crucial for our purposes. We shall assume that name_of_file is a variable of an appropriate type such that the Pascal run-time system being used to implement TEX can open a file whose external name is specified by name_of_file .

⟦13 Global variables⟧ += ⟦
    // we build filenames in utf8 to pass to the OS
    var name_of_file: ^UTF8_code;

    // but sometimes we need a UTF16 version of the name
    var name_of_file16: ^UTF16_code;

    // this many characters are actually relevant in 
    // name_of_file (the rest are blank)
    var name_length: 0 .. file_name_size;

    var name_length16: 0 .. file_name_size;
⟧

27. All of the file opening functions are defined in C.

28. All of the file closing routines are defined in C as well.

29. Binary input and output are done with Pascal’s ordinary get and put procedures, so we don’t have to make any other special arrangements for binary I/O. Text output is also easy to do with standard Pascal routines. The treatment of text input is more difficult, however, because of the necessary translation to ASCII_code values. TEX’s conventions should be efficient, and they should blend nicely with the user’s operating environment.

30. Input from text files is read one line at a time, using a routine called input_ln . This function is defined in terms of global variables called buffer , first , and last that will be described in detail later; for now, it suffices for us to know that buffer is an array of ASCII_code values, and that first and last are indices into this array representing the beginning and ending of a line of text.

⟦13 Global variables⟧ += ⟦
    // lines of characters being read
    var buffer: ^UnicodeScalar;

    // the first unused position in buffer 
    var first: 0 .. buf_size;

    // end of the line just input to buffer 
    var last: 0 .. buf_size;

    // largest index used in buffer 
    var max_buf_stack: 0 .. buf_size;
⟧

31. The input_ln function brings the next line of input from the specified file into available positions of the buffer array and returns the value true , unless the file has already been entirely read, in which case it returns false and sets last = first . In general, the ASCII_code numbers that represent the next line of the file are input into buffer[first] , buffer[first + 1] , …, buffer[last - 1] ; and the global variable last is set equal to first plus the length of the line. Trailing blanks are removed from the line; thus, either last == first (in which case the line was entirely blank) or buffer[last - 1] != ord!(" ") .

An overflow error is given, however, if the normal actions of input_ln would make last >= buf_size ; this is done so that other parts of TEX can safely look at the contents of buffer[last + 1] without overstepping the bounds of the buffer array. Upon entry to input_ln , the condition first < buf_size will always hold, so that there is always room for an “empty” line.

The variable max_buf_stack , which is used to keep track of how large the buf_size parameter must be to accommodate the present job, is also kept up to date by input_ln .

If the bypass_eoln parameter is true , input_ln will do a get before looking at the first character of the line; this skips over an eoln that was in f^ . The procedure does not do a get when it reaches the end of the line; therefore it can be used to acquire input from the user’s terminal as well as from ordinary text files.

Standard Pascal says that a file should have eoln immediately before eof , but TEX needs only a weaker restriction: If eof occurs in the middle of a line, the system function eoln should return a true result (even though f^ will be undefined).

Since the inner loop of input_ln is part of TEX’s “inner loop”—each character of input comes in at this place—it is wise to reduce system overhead by making use of special routines that read in an entire array of characters at once, if such routines are available. The following code uses standard Pascal to illustrate what needs to be done, but finer tuning is often possible at well-developed Pascal sites.

We define input_ln in C, for efficiency. Nevertheless we quote the module ‘Report overflow of the input buffer, and abort’ here in order to make WEAVE happy, since part of that module is needed by e-TeX.

/*
    ⟦35 Report overflow of the input buffer, and abort⟧
*/
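
The contract of input_ln described above can be modelled with a short Python sketch. The class LineBuffer and the use of an iterator of lines in place of a file are assumptions made for illustration. The bypass_eoln parameter is not modelled, and the bookkeeping of max_buf_stack is simplified to a high-water mark of last.

```python
class LineBuffer:
    """Toy model of the globals buffer, first, last and max_buf_stack."""

    def __init__(self, buf_size):
        self.buf_size = buf_size
        self.buffer = [0] * (buf_size + 1)   # indices 0 .. buf_size
        self.first = 0
        self.last = 0
        self.max_buf_stack = 0

    def input_ln(self, lines):
        """Read the next line from the iterator `lines` into buffer[first:last]."""
        line = next(lines, None)
        if line is None:                     # the file has been entirely read
            self.last = self.first
            return False
        text = line.rstrip("\r\n").rstrip(" ")   # drop the line end, then trailing blanks
        if self.first + len(text) >= self.buf_size:
            raise OverflowError("buffer size")   # stands in for overflow(...)
        for offset, ch in enumerate(text):
            self.buffer[self.first + offset] = ord(ch)
        self.last = self.first + len(text)
        self.max_buf_stack = max(self.max_buf_stack, self.last)
        return True
```

Because an overflow is raised as soon as last would reach buf_size, a caller of this sketch can always inspect buffer[last + 1] without leaving the array.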

32. The user’s terminal acts essentially like other files of text, except that it is used both for input and for output. When the terminal is considered an input file, the file variable is called term_in , and when it is considered an output file the file variable is term_out .

@define term_out => stdout // the terminal as an output file
⟦13 Global variables⟧ += ⟦
    init!{
        // are we INITEX?
        var ini_version: boolean;
        // was the dump name option used?
        var dump_option: boolean;
        // was a %&format line seen?
        var dump_line: boolean;
    }

    // format name for terminal display
    var dump_name: const_cstring;

    var term_in: unicode_file;

    // temporary for setup
    var bound_default: integer;

    // temporary for setup
    var bound_name: const_cstring;

    // smallest index in the mem array dumped by INITEX;
    // must not be less than mem_min
    var mem_bot: integer;

    // total memory words allocated in INITEX
    var main_memory: integer;

    // mem_min = mem_bot - extra_mem_bot except in
    // INITEX
    var extra_mem_bot: integer;

    // smallest index in TeX's internal mem array; must be
    // min_halfword or more; must be equal to mem_bot in
    // INITEX, otherwise <= mem_bot
    var mem_min: integer;

    // largest index in the mem array dumped by INITEX;
    // must be substantially larger than mem_bot, equal to
    // mem_max in INITEX, else not greater than mem_max
    var mem_top: integer;

    // mem_max = mem_top + extra_mem_top except in
    // INITEX
    var extra_mem_top: integer;

    // greatest index in TeX's internal mem array; must be
    // strictly less than max_halfword; must be equal to
    // mem_top in INITEX, otherwise >= mem_top
    var mem_max: integer;

    // width of context lines on terminal error messages
    var error_line: integer;

    // width of first lines of contexts in terminal error 
    // messages; should be between 30 and error_line - 15 
    var half_error_line: integer;

    // width of longest text lines output; should be at 
    // least 60
    var max_print_line: integer;

    // maximum number of strings; must not exceed 
    // max_halfword 
    var max_strings: integer;

    // strings available after format loaded
    var strings_free: integer;

    // the minimum number of characters that should be
    // available for the user's control sequences and font
    // names, after TeX's own error messages are stored
    var string_vacancies: integer;

    // maximum number of characters in strings, including
    // all error messages and help texts, and the names of
    // all fonts and control sequences; must exceed
    // string_vacancies by the total length of TeX's own
    // strings, which is currently about 23000
    var pool_size: integer;

    // pool space free after format loaded
    var pool_free: integer;

    // number of words of font_info for all fonts
    var font_mem_size: integer;

    // maximum internal font number; ok to exceed 
    // max_quarterword and must be at most font_base + 
    // max_font_max 
    var font_max: integer;

    // loop variable for initialization
    var font_k: integer;

    // maximum number of hyphen exceptions
    var hyph_size: integer;

    // space for hyphenation patterns; should be larger for
    // INITEX than it is in production versions of TeX.
    // 50000 is needed for English, German, and Portuguese.
    var trie_size: integer;

    // maximum number of characters simultaneously present
    // in current lines of open files and in control
    // sequences between \csname and \endcsname;
    // must not exceed max_halfword
    var buf_size: integer;

    // maximum number of simultaneous input sources
    var stack_size: integer;

    // maximum number of input files and error insertions 
    // that can be going on simultaneously
    var max_in_open: integer;

    // maximum number of simultaneous macro parameters
    var param_size: integer;

    // maximum number of semantic levels simultaneously 
    // active
    var nest_size: integer;

    // space for saving values outside of current group; 
    // must be at most max_halfword 
    var save_size: integer;

    // size of the output buffer; must be a multiple of 8
    var dvi_buf_size: integer;

    // limits recursive calls to the expand procedure
    var expand_depth: integer;

    // parse the first line for options
    var parse_first_line_p: cinttype;

    // format messages as file:line:error
    var file_line_error_style_p: cinttype;

    // make all characters printable by default
    var eight_bit_p: cinttype;

    // stop at first error
    var halt_on_error_p: cinttype;

    // current filename is quoted
    var quoted_filename: boolean;

    // Variables for source specials

    // Whether src_specials are enabled at all
    var src_specials_p: boolean;

    var insert_src_special_auto: boolean;

    var insert_src_special_every_par: boolean;

    var insert_src_special_every_parend: boolean;

    var insert_src_special_every_cr: boolean;

    var insert_src_special_every_math: boolean;

    var insert_src_special_every_hbox: boolean;

    var insert_src_special_every_vbox: boolean;

    var insert_src_special_every_display: boolean;
⟧

33. Here is how to open the terminal files. t_open_out does nothing. t_open_in , on the other hand, does the work of “rescanning,” or getting any command line arguments the user has provided. It’s defined in C.

// output already open for text output
@define t_open_out => /*nothing*/

34. Sometimes it is necessary to synchronize the input/output mixture that happens on the user’s terminal, and three system-dependent procedures are used for this purpose. The first of these, update_terminal , is called when we want to make sure that everything we have output to the terminal so far has actually left the computer’s internal buffers and been sent. The second, clear_terminal , is called when we wish to cancel any input that the user may have typed ahead (since we are about to issue an unexpected error message). The third, wake_up_terminal , is supposed to revive the terminal if the user has disabled it by some instruction to the operating system. The following macros show how these operations can be specified with UNIX. update_terminal does an fflush . clear_terminal is redefined to do nothing, since the user should control the terminal.

@define update_terminal => fflush(term_out)
@define clear_terminal => do_nothing
// cancel the user's cancellation of output
@define wake_up_terminal => do_nothing

35. We need a special routine to read the first line of TEX input from the user’s terminal. This line is different because it is read before we have opened the transcript file; there is sort of a “chicken and egg” problem here. If the user types ‘\input paper’ on the first line, or if some macro invoked by that line does such an \input, the transcript file will be named ‘paper.log’; but if no \input commands are performed during the first line of terminal input, the transcript file will acquire its default name ‘texput.log’. (The transcript file will not contain error messages generated by the first line before the first \input command.)

The first line is even more special if we are lucky enough to have an operating system that treats TEX differently from a run-of-the-mill Pascal object program. It’s nice to let the user start running a TEX job by typing a command line like ‘tex paper’; in such a case, TEX will operate as if the first line of input were ‘paper’, i.e., the first line will consist of the remainder of the command line, after the part that invoked TEX.

The first line is special also because it may be read before TEX has input a format file. In such cases, normal error messages cannot yet be given. The following code uses concepts that will be explained later. (If the Pascal compiler does not support non-local goto , the statement ‘goto final_end ’ should be replaced by something that quietly terminates the program.)

The routine itself is implemented in C; part of this module is, however, needed for e-TeX.

⟦35 Report overflow of the input buffer, and abort⟧ = ⟦
    {
        cur_input.loc_field = first;
        cur_input.limit_field = last - 1;
        overflow(strpool!("buffer size"), buf_size);
    }
⟧

36. Different systems have different ways to get started. But regardless of what conventions are adopted, the routine that initializes the terminal should satisfy the following specifications:

1) It should open file term_in for input from the terminal. (The file term_out will already be open for output to the terminal.)

2) If the user has given a command line, this line should be considered the first line of terminal input. Otherwise the user should be prompted with ‘**’, and the first line of input should be whatever is typed in response.

3) The first line of input, which might or might not be a command line, should appear in locations first to last - 1 of the buffer array.

4) The global variable loc should be set so that the character to be read next by TEX is in buffer[loc] . This character should not be blank, and we should have loc < last .

(It may be necessary to prompt the user several times before a non-blank line comes in. The prompt is ‘**’ instead of the later ‘*’ because the meaning is slightly different: ‘\input’ need not be typed immediately after ‘**’.)

// location of first unread character in buffer 
@define loc => cur_input.loc_field

37. The following program does the required initialization. Iff anything has been specified on the command line, then t_open_in will return with last > first .

// gets the terminal input started
function init_terminal(): boolean {
    label exit;
    
    t_open_in;
    if (last > first) {
        loc = first;
        while ((loc < last) && (buffer[loc] == ord!(" "))) {
            incr(loc);
        }
        if (loc < last) {
            init_terminal = true;
            goto exit;
        }
    }
    loop {
        wake_up_terminal;
        write(term_out, "**");
        update_terminal;
        // this shouldn't happen
        if (!input_ln(term_in, true)) {
            write_ln(term_out);
            write_ln(
              term_out,
              "! End of file on the terminal... why?",
            );
            init_terminal = false;
            return;
        }
        loc = first;
        while ((loc < last) && (buffer[loc] == ord!(" "))) {
            incr(loc);
        }
        if (loc < last) {
            init_terminal = true;
            // return unless the line was all blank
            return;
        }
        write_ln(
          term_out,
          "Please type the name of your input file.",
        );
    }
  exit:
}

38. [4] String handling. Control sequence names and diagnostic messages are variable-length strings of eight-bit characters. Since Pascal does not have a well-developed string mechanism, TEX does all of its string processing by homegrown methods.

Elaborate facilities for dynamic strings are not needed, so all of the necessary operations can be handled with a simple data structure. The array str_pool contains all of the (eight-bit) ASCII codes in all of the strings, and the array str_start contains indices of the starting points of each string. Strings are referred to by integer numbers, so that string number s comprises the characters str_pool[j] for str_start_macro(s) <= j < str_start_macro(s + 1). Additional integer variables pool_ptr and str_ptr indicate the number of entries used so far in str_pool and str_start, respectively; locations str_pool[pool_ptr] and str_start_macro(str_ptr) are ready for the next string to be allocated.

String numbers 0 to 255 are reserved for strings that correspond to single ASCII characters. This is in accordance with the conventions of WEB, which converts single-character strings into the ASCII code number of the single character involved, while it converts other strings into integers and builds a string pool file. Thus, when the string constant "." appears in the program below, WEB converts it into the integer 46, which is the ASCII code for a period, while WEB will convert a string like "hello" into some integer greater than 255. String number 46 will presumably be the single character ‘.’; but some ASCII codes have no standard visible representation, and TEX sometimes needs to be able to print an arbitrary ASCII character, so the first 256 strings are used to specify exactly what should be printed for each of the 256 possibilities.

Elements of the str_pool array must be ASCII codes that can actually be printed; i.e., they must have an xchr equivalent in the local character set. (This restriction applies only to preloaded strings, not to those generated dynamically by the user.)

Some Pascal compilers won’t pack integers into a single byte unless the integers lie in the range -128 .. 127 . To accommodate such systems we access the string pool only via macros that can easily be redefined.

// convert from ASCII_code to packed_ASCII_code 
@define si(#) => #
// convert from packed_ASCII_code to ASCII_code 
@define so(#) => #
@define str_start_macro(#) => str_start[(#) - too_big_char]
⟦18 Types in the outer block⟧ += ⟦
    // for variables that point into str_pool 
    type pool_pointer = integer;

    // for variables that point into str_start 
    type str_number = 0 .. ssup_max_strings;

    // elements of str_pool array
    type packed_ASCII_code = 0 .. biggest_char;
⟧

39.

⟦13 Global variables⟧ += ⟦
    // the characters
    var str_pool: ^packed_ASCII_code;

    // the starting pointers
    var str_start: ^pool_pointer;

    // first unused position in str_pool 
    var pool_ptr: pool_pointer;

    // number of the current string being created
    var str_ptr: str_number;

    // the starting value of pool_ptr 
    var init_pool_ptr: pool_pointer;

    // the starting value of str_ptr 
    var init_str_ptr: str_number;
⟧
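
The layout of str_pool and str_start, including the shift performed by str_start_macro, can be pictured with a small Python sketch. The value 0x10000 for too_big_char is an assumption inferred from the length function in this part, and the helper string_text and the two sample strings are hypothetical.

```python
TOO_BIG_CHAR = 0x10000  # assumed value: the first string number that is actually stored


def str_start_macro(str_start, s):
    """Mirror of the str_start_macro(#) macro: shift s down by too_big_char."""
    return str_start[s - TOO_BIG_CHAR]


def string_text(str_pool, str_start, s):
    """Characters of stored string number s, for s >= TOO_BIG_CHAR."""
    begin = str_start_macro(str_start, s)
    end = str_start_macro(str_start, s + 1)
    return "".join(chr(code) for code in str_pool[begin:end])


# Two stored strings, "buffer size" and "pool size", packed back to back.
str_pool = [ord(c) for c in "buffer sizepool size"]
str_start = [0, 11, 20]   # the last entry plays the role of str_start_macro(str_ptr)
```

Here string number 0x10000 occupies positions 0 to 10 of str_pool and string number 0x10001 occupies positions 11 to 19; no separate length field is stored.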

40. Several of the elementary string operations are performed using WEB macros instead of Pascal procedures, because many of the operations are done quite frequently and we want to avoid the overhead of procedure calls. The length of a string, however, is computed here by a function, because string numbers below 0x10000 are not stored in the pool and need special cases.

// the number of characters in string number s 
function length(s: str_number): integer {
    if ((s >= 0x10000)) {
        length = str_start_macro(s + 1) - str_start_macro(s);
    } else if ((s >= 0x20) && (s < 0x7f)) {
        length = 1;
    } else if ((s <= 0x7f)) {
        length = 3;
    } else if ((s < 0x100)) {
        length = 4;
    } else {
        length = 8;
    }
}

41. The length of the current string is called cur_length :

@define cur_length => (pool_ptr - str_start_macro(str_ptr))

42. Strings are created by appending character codes to str_pool . The append_char macro, defined here, does not check to see if the value of pool_ptr has gotten too high; this test is supposed to be made before append_char is used. There is also a flush_char macro, which erases the last character appended.

To test if there is room to append l more characters to str_pool , we shall write str_room(l) , which aborts TEX and gives an apologetic error message if there isn’t enough room.

// put ASCII_code # at the end of str_pool
@define append_char(#) =>
    {
        str_pool[pool_ptr] = si(#);
        incr(pool_ptr);
    }
// forget the last character in the pool
@define flush_char => decr(pool_ptr)
// make sure that the pool hasn't overflowed
@define str_room(#) =>
    {
        if (pool_ptr + # > pool_size) {
            overflow(
              strpool!("pool size"),
              pool_size - init_pool_ptr,
            );
        }
    }

43. Once a sequence of characters has been appended to str_pool , it officially becomes a string when the function make_string is called. This function returns the identification number of the new string as its value.

// current string enters the pool
function make_string(): str_number {
    if (str_ptr == max_strings) {
        overflow(
          strpool!("number of strings"),
          max_strings - init_str_ptr,
        );
    }
    incr(str_ptr);
    str_start_macro(str_ptr) = pool_ptr;
    make_string = str_ptr - 1;
}

44. To destroy the most recently made string, we say flush_string .

@define flush_string =>
    {
        decr(str_ptr);
        pool_ptr = str_start_macro(str_ptr);
    }
// append an existing string to the current string
function append_str(s: str_number) {
    var i: integer, j: pool_pointer;
    
    i = length(s);
    str_room(i);
    j = str_start_macro(s);
    while ((i > 0)) {
        append_char(str_pool[j]);
        incr(j);
        decr(i);
    }
}
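
The life cycle of a string (reserve room, append characters, make the string, possibly flush it again) can be sketched in Python as follows. The class StringPool is hypothetical and deliberately simplified: it ignores the too_big_char offset and the simulated single-character strings, and it raises OverflowError where the program calls overflow.

```python
class StringPool:
    """Simplified model of str_pool, str_start, pool_ptr and str_ptr."""

    def __init__(self, pool_size, max_strings):
        self.pool_size = pool_size
        self.max_strings = max_strings
        self.str_pool = [0] * pool_size
        self.str_start = [0] * (max_strings + 1)
        self.pool_ptr = 0   # first unused position in str_pool
        self.str_ptr = 0    # number of the string being created

    def str_room(self, n):
        if self.pool_ptr + n > self.pool_size:
            raise OverflowError("pool size")

    def append_char(self, code):
        # no bounds check here; str_room must be called first
        self.str_pool[self.pool_ptr] = code
        self.pool_ptr += 1

    def make_string(self):
        if self.str_ptr == self.max_strings:
            raise OverflowError("number of strings")
        self.str_ptr += 1
        self.str_start[self.str_ptr] = self.pool_ptr
        return self.str_ptr - 1

    def flush_string(self):
        self.str_ptr -= 1
        self.pool_ptr = self.str_start[self.str_ptr]

    def length(self, s):
        return self.str_start[s + 1] - self.str_start[s]

    def add(self, text):
        """Convenience wrapper: str_room, append_char for each character, make_string."""
        self.str_room(len(text))
        for ch in text:
            self.append_char(ord(ch))
        return self.make_string()
```

Flushing only moves the two pointers back, which is why it can undo the most recently made string and no other.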

45. The following subroutine compares string s with another string of the same length that appears in buffer starting at position k ; the result is true if and only if the strings are equal. Empirical tests indicate that str_eq_buf is used in such a way that it tends to return true about 80 percent of the time.

// test equality of strings
function str_eq_buf(s: str_number, k: integer): boolean {
    label not_found; // loop exit
    var
      j: pool_pointer, // running index
      result: boolean; // result of comparison
    
    j = str_start_macro(s);
    while (j < str_start_macro(s + 1)) {
        if (buffer[k] >= 0x10000) {
            if (
                so(str_pool[j])
                != 0xd800 + (buffer[k] - 0x10000) div 0x400
            ) {
                result = false;
                goto not_found;
            } else if (
                so(str_pool[j + 1])
                != 0xdc00 + (buffer[k] - 0x10000) % 0x400
            ) {
                result = false;
                goto not_found;
            } else {
                incr(j);
            }
        } else if (so(str_pool[j]) != buffer[k]) {
            result = false;
            goto not_found;
        }
        incr(j);
        incr(k);
    }
    result = true;
  not_found:
    str_eq_buf = result;
}
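
The constants 0xd800 and 0xdc00 in str_eq_buf are those of UTF-16 surrogate pairs, so the routine appears to compare Unicode scalar values in buffer with UTF-16 code units in the pool. The Python sketch below restates that comparison on plain lists. The names to_utf16_units and the list-based signature are assumptions, and the extra bounds check on the low surrogate is not in the original.

```python
def to_utf16_units(scalar):
    """Split a Unicode scalar value into one or two UTF-16 code units."""
    if scalar >= 0x10000:
        offset = scalar - 0x10000
        return [0xD800 + offset // 0x400, 0xDC00 + offset % 0x400]
    return [scalar]


def str_eq_buf(units, buffer, k):
    """Compare UTF-16 code units with the scalar values buffer[k], buffer[k+1], ..."""
    j = 0
    while j < len(units):
        scalar = buffer[k]
        if scalar >= 0x10000:
            high, low = to_utf16_units(scalar)
            if units[j] != high:
                return False
            # guard added for the sketch: a lone high surrogate cannot match
            if j + 1 >= len(units) or units[j + 1] != low:
                return False
            j += 1                      # the pair consumed two code units
        elif units[j] != scalar:
            return False
        j += 1
        k += 1
    return True
```

A character outside the Basic Multilingual Plane thus advances the pool index by two while the buffer index advances by one.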

46. Here is a similar routine, but it compares two strings in the string pool, and it does not assume that they have the same length.

// test equality of strings
function str_eq_str(s, t: str_number): boolean {
    label not_found; // loop exit
    var
      j, k: pool_pointer, // running indices
      result: boolean; // result of comparison
    
    result = false;
    if (length(s) != length(t)) {
        goto not_found;
    }
    if ((length(s) == 1)) {
        if (s < 65536) {
            if (t < 65536) {
                if (s != t) {
                    goto not_found;
                }
            } else {
                if (s != str_pool[str_start_macro(t)]) {
                    goto not_found;
                }
            }
        } else {
            if (t < 65536) {
                if (str_pool[str_start_macro(s)] != t) {
                    goto not_found;
                }
            } else {
                if (
                    str_pool[str_start_macro(s)]
                    != str_pool[str_start_macro(t)]
                ) {
                    goto not_found;
                }
            }
        }
    } else {
        j = str_start_macro(s);
        k = str_start_macro(t);
        while (j < str_start_macro(s + 1)) {
            if (str_pool[j] != str_pool[k]) {
                goto not_found;
            }
            incr(j);
            incr(k);
        }
    }
    result = true;
  not_found:
    str_eq_str = result;
}

47. The initial values of str_pool , str_start , pool_ptr , and str_ptr are computed by the INITEX program, based in part on the information that WEB has output while processing TEX.

⟦1685 Declare additional routines for string recycling⟧

init!{
    // initializes the string pool, but returns false if 
    // something goes wrong
    function get_strings_started(): boolean {
        label done, exit;
        var
          g: str_number; // garbage
        
        pool_ptr = 0;
        str_ptr = 0;
        str_start[0] = 0;
        ⟦48 Make the first 256 strings⟧
        ⟦51 Read the other strings from the \.{TEX.POOL} file and return |true|, or give an error message and return |false|⟧
      exit:
    }
}

48. The first 65536 strings will consist of a single character only. But we don’t actually make them; they’re simulated on the fly.

⟦48 Make the first 256 strings⟧ = ⟦
    {
        str_ptr = too_big_char;
        str_start_macro(str_ptr) = pool_ptr;
    }
⟧

49. The first 128 strings will contain 95 standard ASCII characters, and the other 33 characters will be printed in three-symbol form like ‘^^A’ unless a system-dependent change is made here. Installations that have an extended character set, where for example xchr[0x1a] == "^^Z" , would like string 0x1A to be printed as the single character 0x1A instead of the three characters 0x5E, 0x5E, 0x5A (^^Z). On the other hand, even people with an extended character set will want to represent string 0xD by ^^M, since 0xD is carriage_return ; the idea is to produce visible strings instead of tabs or line-feeds or carriage-returns or bell-rings or characters that are treated anomalously in text files.

Unprintable characters of codes 128–255 are, similarly, rendered ^^80–^^ff.

The boolean expression defined here should be true unless TEX internal code number k corresponds to a non-troublesome visible symbol in the local character set. An appropriate formula for the extended character set recommended in The TEXbook would, for example, be ‘k in [0, 0x8 .. 0xa, 0xc, 0xd, 0x1b, 0x7f .. 0xff] ’. If character k cannot be printed, and k < 0x80 , then character k + 0x40 or k - 0x40 must be printable; moreover, ASCII codes [0x21 .. 0x26, 0x30 .. 0x39, 0x5e, 0x61 .. 0x66, 0x70 .. 0x79] must be printable. Thus, at least 80 printable characters are needed.
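
The printed forms described in this section can be sketched in Python. The predicate below is the default implied by the first sentence (95 visible ASCII characters, everything else unprintable), and the function names are hypothetical. An installation with an extended character set would substitute its own predicate.

```python
def is_unprintable(k):
    """Default rule: only the 95 visible ASCII characters are printable."""
    return k < 0x20 or k > 0x7E


def printed_form(k):
    """Printed form of the one-character string with code k, for 0 <= k < 256."""
    if not is_unprintable(k):
        return chr(k)
    if k < 0x40:
        return "^^" + chr(k + 0x40)    # e.g. carriage_return 0xd becomes ^^M
    if k < 0x80:
        return "^^" + chr(k - 0x40)    # e.g. invalid_code 0x7f becomes ^^?
    return "^^" + format(k, "02x")     # two lowercase hexadecimal digits
```

The three-symbol form works only because k + 0x40 or k - 0x40 is itself printable, which is the requirement stated above.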

50. When the WEB system program called TANGLE processes the TEX.WEB description that you are now reading, it outputs the Pascal program TEX.PAS and also a string pool file called TEX.POOL. The INITEX program reads the latter file, where each string appears as a two-digit decimal length followed by the string itself, and the information is recorded in TEX’s string memory.

⟦13 Global variables⟧ += ⟦
    init!{
        // the string-pool file output by TANGLE
        var pool_file: alpha_file;
    }
⟧
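
A reader for the line format just described (a two-digit decimal length followed by the string itself) might look like the following Python sketch. The function name parse_pool_lines is hypothetical, and anything else a real pool file may contain is not modelled. This implementation loads its strings through loadpoolstrings instead.

```python
def parse_pool_lines(lines):
    """Turn lines such as '05hello' into the strings they describe."""
    strings = []
    for raw in lines:
        line = raw.rstrip("\n")
        length = int(line[:2])             # two-digit decimal length
        text = line[2:2 + length]
        if len(text) != length:
            raise ValueError("line too short for its declared length")
        strings.append(text)
    return strings
```

The explicit length makes trailing blanks in a string unambiguous, since the reader never has to guess where the string ends.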

51.

⟦51 Read the other strings from the \.{TEX.POOL} file and return |true|, or give an error message and return |false|⟧ = ⟦
    g = loadpoolstrings((pool_size - string_vacancies))

    if (g == 0) {
        wake_up_terminal;
        write_ln(
          term_out,
          "! You have to increase POOLSIZE.",
        );
        get_strings_started = false;
        return;
    }

    get_strings_started = true
⟧

52. Empty module

53. Empty module

54. [5] On-line and off-line printing. Messages that are sent to a user’s terminal and to the transcript-log file are produced by several ‘print ’ procedures. These procedures will direct their output to a variety of places, based on the setting of the global variable selector , which has the following possible values:

term_and_log , the normal setting, prints on the terminal and on the transcript file.

log_only , prints only on the transcript file.

term_only , prints only on the terminal.

no_print , doesn’t print at all. This is used only in rare cases before the transcript file is open.

pseudo , puts output into a cyclic buffer that is used by the show_context routine; when we get to that routine we shall discuss the reasoning behind this curious mode.

new_string , appends the output to the current string in the string pool.

0 to 15, prints on one of the sixteen files for \write output.

The symbolic names ‘term_and_log ’, etc., have been assigned numeric codes that satisfy the convenient relations no_print + 1 == term_only , no_print + 2 == log_only , term_only + 2 == log_only + 1 == term_and_log .

Three additional global variables, tally and term_offset and file_offset , record the number of characters that have been printed since they were most recently cleared to zero. We use tally to record the length of (possibly very long) stretches of printing; term_offset and file_offset , on the other hand, keep track of how many characters have appeared so far on the current line that has been output to the terminal or to the transcript file, respectively.

//  selector setting that makes data disappear
@define no_print => 16
// printing is destined for the terminal only
@define term_only => 17
// printing is destined for the transcript file only
@define log_only => 18
@define term_and_log => 19 // normal selector setting
// special selector setting for show_context 
@define pseudo => 20
// printing is deflected to the string pool
@define new_string => 21
@define max_selector => 21 // highest selector setting
⟦13 Global variables⟧ += ⟦
    // transcript of \TeX\ session
    var log_file: alpha_file;

    // where to print a message
    var selector: 0 .. max_selector;

    // digits in a number being output
    var dig: array [0 .. 22] of 0 .. 15;

    // the number of characters recently printed
    var tally: integer;

    // the number of characters on the current terminal line
    var term_offset: 0 .. max_print_line;

    // the number of characters on the current file line
    var file_offset: 0 .. max_print_line;

    // circular buffer for pseudoprinting
    var trick_buf: array [0 .. ssup_error_line] of
      ASCII_code;

    // threshold for pseudoprinting, explained later
    var trick_count: integer;

    // another variable for pseudoprinting
    var first_count: integer;
⟧

55.

⟦55 Initialize the output routines⟧ = ⟦
    selector = term_only

    tally = 0

    term_offset = 0

    file_offset = 0
⟧

56. Macro abbreviations for output to the terminal and to the log file are defined here for convenience. Some systems need special conventions for terminal output, and it is possible to adhere to those conventions by changing wterm , wterm_ln , and wterm_cr in this section.

@define wterm(#) => write(term_out, #)
@define wterm_ln(#) => write_ln(term_out, #)
@define wterm_cr => write_ln(term_out)
@define wlog(#) => write(log_file, #)
@define wlog_ln(#) => write_ln(log_file, #)
@define wlog_cr => write_ln(log_file)

57. To end a line of text output, we call print_ln .

⟦57 Basic printing procedures⟧ = ⟦
    // prints an end-of-line
    function print_ln() {
        case selector {
          term_and_log:
            wterm_cr;
            wlog_cr;
            term_offset = 0;
            file_offset = 0;
          log_only:
            wlog_cr;
            file_offset = 0;
          term_only:
            wterm_cr;
            term_offset = 0;
          no_print, pseudo, new_string:
            do_nothing;
          othercases:
            write_ln(write_file[selector]);
        }
        //  tally is not affected
    }
⟧

58. The print_raw_char procedure sends one character to the desired destination, using the xchr array to map it into an external character compatible with input_ln. All printing comes through print_ln, print_char, or print_visible_char. When printing a multi-byte character, the boolean parameter incr_offset is set false for every byte except the last, so that print_ln is never called in the middle of such a character.

⟦57 Basic printing procedures⟧ += ⟦
    // prints a single character
    function print_raw_char(
      s: ASCII_code,
      incr_offset: boolean,
    ) {
        // label is not used but nonetheless kept (for other 
        // changes?)
        label exit;
        
        case selector {
          term_and_log:
            wterm(xchr[s]);
            wlog(xchr[s]);
            if (incr_offset) {
                incr(term_offset);
                incr(file_offset);
            }
            if (term_offset == max_print_line) {
                wterm_cr;
                term_offset = 0;
            }
            if (file_offset == max_print_line) {
                wlog_cr;
                file_offset = 0;
            }
          log_only:
            wlog(xchr[s]);
            if (incr_offset) {
                incr(file_offset);
            }
            if (file_offset == max_print_line) {
                print_ln;
            }
          term_only:
            wterm(xchr[s]);
            if (incr_offset) {
                incr(term_offset);
            }
            if (term_offset == max_print_line) {
                print_ln;
            }
          no_print:
            do_nothing;
          pseudo:
            if (tally < trick_count) {
                trick_buf[tally % error_line] = s;
            }
          new_string:
            if (pool_ptr < pool_size) {
                append_char(s);
            }
            // we drop characters if the string space is 
            // full
          othercases:
            write(write_file[selector], xchr[s]);
        }
        incr(tally);
      exit:
    }
⟧
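
The effect of incr_offset on line breaking can be illustrated outside the program. The following Python sketch is only an illustration under simplifying assumptions: it models a single output stream, and the function name wrap and its list-of-byte-lists input are hypothetical, not part of TEX. It shows that the offset counts characters rather than bytes, so a multi-byte character is never split across lines.

```python
def wrap(chars, max_print_line):
    """Split characters into output lines the way print_raw_char would.

    Each element of chars is the list of bytes for one character.
    The offset is advanced only after the last byte of a character
    (the byte for which incr_offset would be true).
    """
    lines, current, offset = [], [], 0
    for char_bytes in chars:
        for i, b in enumerate(char_bytes):
            current.append(b)
            incr_offset = (i == len(char_bytes) - 1)
            if incr_offset:
                offset += 1
            if offset == max_print_line:  # same test as in print_raw_char
                lines.append(bytes(current))
                current, offset = [], 0
    if current:
        # an unfinished line; in the program it stays open until print_ln
        lines.append(bytes(current))
    return lines
```

With max_print_line set to 2, the three characters a, é (two bytes), and b come out as one full line holding three bytes and one unfinished line holding a single byte.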

59. The print_char procedure sends one character to the desired destination. Control sequence names, file names, and strings constructed with \string might contain ASCII_code values that can’t be printed using print_raw_char. These characters will be printed in three- or four-symbol form like ‘^^A’ or ‘^^e4’, unless the -8bit option is enabled. Output that goes to the terminal and/or log file is treated differently when it comes to determining whether a character is printable.

@define print_visible_char(#) => print_raw_char(#, true)
@define print_lc_hex(#) =>
    l = #;
    if (l < 10) {
        print_visible_char(l + ord!("0"));
    } else {
        print_visible_char(l - 10 + ord!("a"));
    }
⟦57 Basic printing procedures⟧ += ⟦
    // prints a single character
    function print_char(s: integer) {
        label exit;
        var l: small_number;
        
        // ``printing'' to a new string, encode as UTF-16 
        // rather than UTF-8
        if ((selector > pseudo) && (!doing_special)) {
            if (s >= 0x10000) {
                print_visible_char(
                  0xd800 + (s - 0x10000) div 0x400,
                );
                print_visible_char(
                  0xdc00 + (s - 0x10000) % 0x400,
                );
            } else {
                print_visible_char(s);
            }
            return;
        }
        if (⟦270 Character |s| is the current new-line character⟧) {
            if (selector < pseudo) {
                print_ln;
                return;
            }
        }
        // control char: ^^X
        if (
            (s < 32)
            && (eight_bit_p == 0) && (!doing_special)
        ) {
            print_visible_char(ord!("^"));
            print_visible_char(ord!("^"));
            print_visible_char(s + 64);
        } else // printable ASCII
        if (s < 127) {
            print_visible_char(s);
        } else // DEL
        if ((s == 127)) {
            if ((eight_bit_p == 0) && (!doing_special)) {
                print_visible_char(ord!("^"));
                print_visible_char(ord!("^"));
                print_visible_char(ord!("?"));
            } else {
                print_visible_char(s);
            }
        } else // C1 controls: ^^xx
        if (
            (s < 0xa0)
            && (eight_bit_p == 0) && (!doing_special)
        ) {
            print_visible_char(ord!("^"));
            print_visible_char(ord!("^"));
            print_lc_hex((s % 0x100) div 0x10);
            print_lc_hex(s % 0x10);
        } else if (selector == pseudo) {
            // Don't UTF8-encode text in trick_buf , we'll 
            // handle that when printing error context.
            print_visible_char(s);
        } else {
            //  char >= 128 : encode as UTF8
            if (s < 0x800) {
                print_raw_char(0xc0 + s div 0x40, false);
                print_raw_char(0x80 + s % 0x40, true);
            } else if (s < 0x10000) {
                print_raw_char(
                  0xe0 + (s div 0x1000),
                  false,
                );
                print_raw_char(
                  0x80 + (s % 0x1000) div 0x40,
                  false,
                );
                print_raw_char(0x80 + (s % 0x40), true);
            } else {
                print_raw_char(
                  0xf0 + (s div 0x40000),
                  false,
                );
                print_raw_char(
                  0x80 + (s % 0x40000) div 0x1000,
                  false,
                );
                print_raw_char(
                  0x80 + (s % 0x1000) div 0x40,
                  false,
                );
                print_raw_char(0x80 + (s % 0x40), true);
            }
        }
      exit:
    }
⟧
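
The arithmetic in print_char can be checked against an independent UTF-8 encoder. The Python sketch below is an illustration only: it covers just the case where eight_bit_p == 0, doing_special is false, the selector is below pseudo, and s is not the new-line character. The function name visible_form is hypothetical.

```python
def visible_form(s):
    """Return the bytes print_char would emit for character code s."""
    caret = [ord("^"), ord("^")]
    if s < 32:                       # control character: ^^X
        return caret + [s + 64]
    if s < 127:                      # printable ASCII
        return [s]
    if s == 127:                     # DEL: ^^?
        return caret + [ord("?")]
    if s < 0xA0:                     # C1 control: ^^xx in lowercase hex
        hexdigits = "0123456789abcdef"
        hi, lo = (s % 0x100) // 0x10, s % 0x10
        return caret + [ord(hexdigits[hi]), ord(hexdigits[lo])]
    if s < 0x800:                    # two-byte UTF-8
        return [0xC0 + s // 0x40, 0x80 + s % 0x40]
    if s < 0x10000:                  # three-byte UTF-8
        return [0xE0 + s // 0x1000,
                0x80 + (s % 0x1000) // 0x40,
                0x80 + s % 0x40]
    return [0xF0 + s // 0x40000,     # four-byte UTF-8
            0x80 + (s % 0x40000) // 0x1000,
            0x80 + (s % 0x1000) // 0x40,
            0x80 + s % 0x40]
```

For codes of 0xA0 and above (surrogates excluded), the result agrees with Python's own chr(s).encode("utf-8").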

60.

@define native_room(#) =>
    while (native_text_size <= native_len + #) {
        native_text_size = native_text_size + 128;
        native_text = xrealloc(
          native_text,
          native_text_size * sizeof(UTF16_code),
        );
    }
@define append_native(#) =>
    {
        native_text[native_len] = #;
        incr(native_len);
    }

61.

⟦13 Global variables⟧ += ⟦
    var doing_special: boolean;

    // buffer for collecting native-font strings
    var native_text: ^UTF16_code;

    // size of buffer
    var native_text_size: integer;

    var native_len: integer;

    var save_native_len: integer;
⟧

62.

⟦23 Set initial values of key variables⟧ += ⟦
    doing_special = false

    native_text_size = 128

    native_text = xmalloc(
      native_text_size * sizeof(UTF16_code),
    )
⟧

63. An entire string is output by calling print . Note that if we are outputting the single standard ASCII character c, we could call print(ord!("c")) , since ord!("c") == 99 is the number of a single-character string, as explained above. But print_char(ord!("c")) is quicker, so TEX goes directly to the print_char routine when it knows that this is safe. (The present implementation assumes that it is always safe to print a visible ASCII character.)

⟦57 Basic printing procedures⟧ += ⟦
    // prints string s 
    function print(s: integer) {
        label exit;
        var
          j: pool_pointer, // current character code 
          // position
          nl: integer; // new-line character to restore
        
        if (s >= str_ptr) {
            // this can't happen
            s = strpool!("???");
        } else if (s < biggest_char) {
            if (s < 0) {
                // can't happen
                s = strpool!("???");
            } else {
                if (selector > pseudo) {
                    print_char(s);
                    // internal strings are not expanded
                    return;
                }
                if ((⟦270 Character |s| is the current new-line character⟧)) {
                    if (selector < pseudo) {
                        print_ln;
                        return;
                    }
                }
                nl = new_line_char;
                new_line_char = -1;
                print_char(s);
                new_line_char = nl;
                return;
            }
        }
        j = str_start_macro(s);
        while (j < str_start_macro(s + 1)) {
            if (
                (so(str_pool[j]) >= 0xd800)
                && (so(str_pool[j]) <= 0xdbff)
                && (j + 1 < str_start_macro(s + 1))
                && (so(str_pool[j + 1]) >= 0xdc00)
                && (so(str_pool[j + 1]) <= 0xdfff)
            ) {
                print_char(
                  
                      0x10000
                      + (so(str_pool[j]) - 0xd800)
                      * 0x400 + so(str_pool[j + 1]) - 0xdc00
                  ,
                );
                j = j + 2;
            } else {
                print_char(so(str_pool[j]));
                incr(j);
            }
        }
      exit:
    }
⟧
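
The surrogate-pair arithmetic used by print, and its inverse in print_char when the selector exceeds pseudo, can be sketched in Python as follows. This is an illustration, not the program's code; the function names are hypothetical and a plain list of integers stands in for the relevant slice of str_pool.

```python
def to_surrogates(s):
    """Split a code s >= 0x10000 into two UTF-16 units, as print_char does."""
    return (0xD800 + (s - 0x10000) // 0x400,
            0xDC00 + (s - 0x10000) % 0x400)

def from_surrogates(hi, lo):
    """Recombine a surrogate pair, as print does while scanning a string."""
    return 0x10000 + (hi - 0xD800) * 0x400 + lo - 0xDC00

def decode_units(units):
    """Walk a list of UTF-16 units the way the loop in print walks str_pool."""
    out, j = [], 0
    while j < len(units):
        if (0xD800 <= units[j] <= 0xDBFF and j + 1 < len(units)
                and 0xDC00 <= units[j + 1] <= 0xDFFF):
            out.append(from_surrogates(units[j], units[j + 1]))
            j += 2
        else:
            out.append(units[j])    # includes unpaired surrogates
            j += 1
    return out
```

An unpaired high surrogate at the end of the string falls through to the else branch and is passed on unchanged, which matches the bounds test j + 1 < str_start_macro(s + 1) in the procedure.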

64. Old versions of TEX needed a procedure called slow_print whose function is now subsumed by print and the new functionality of print_char and print_visible_char . We retain the old name slow_print here as a possible aid to future software archæologists.

@define slow_print => print

65. Here is the very first thing that TEX prints: a headline that identifies the version number and format package. The term_offset variable is temporarily incorrect, but the discrepancy is not serious since we assume that this part of the program is system dependent.

⟦55 Initialize the output routines⟧ += ⟦
    if (
        src_specials_p
        || file_line_error_style_p || parse_first_line_p
    ) {
        wterm(banner_k);
    } else {
        wterm(banner);
    }

    wterm(version_string)

    if (format_ident == 0) {
        wterm_ln(" (preloaded format=", dump_name, ")");
    } else {
        slow_print(format_ident);
        print_ln;
    }

    if (shellenabledp) {
        wterm(" ");
        if (restrictedshell) {
            wterm("restricted ");
        }
        wterm_ln("\\write18 enabled.");
    }

    if (src_specials_p) {
        wterm_ln(" Source specials enabled.");
    }

    if (translate_filename) {
        wterm(" (WARNING: translate-file \"");
        fputs(translate_filename, stdout);
        wterm_ln("\" ignored)");
    }

    update_terminal
⟧

66. The procedure print_nl is like print , but it makes sure that the string appears at the beginning of a new line.

⟦57 Basic printing procedures⟧ += ⟦
    // prints string s at beginning of line
    function print_nl(s: str_number) {
        if (
            ((term_offset > 0) && (odd(selector)))
            || ((file_offset > 0) && (selector >= log_only))
        ) {
            print_ln;
        }
        print(s);
    }
⟧
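
The condition tested by print_nl depends on the numeric codes assigned to the selector settings in section 54. For the four settings no_print through term_and_log, odd(selector) holds exactly when the terminal receives output, and selector >= log_only holds exactly when the transcript file does. A small Python sketch of the test (the function name needs_newline is hypothetical) makes this concrete:

```python
# Selector codes as defined in section 54.
NO_PRINT, TERM_ONLY, LOG_ONLY, TERM_AND_LOG, PSEUDO, NEW_STRING = range(16, 22)

def needs_newline(selector, term_offset, file_offset):
    """Mirror of the condition under which print_nl calls print_ln."""
    return ((term_offset > 0 and selector % 2 == 1)
            or (file_offset > 0 and selector >= LOG_ONLY))
```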

67. The procedure print_esc prints a string that is preceded by the user’s escape character (which is usually a backslash).

⟦57 Basic printing procedures⟧ += ⟦
    // prints escape character, then s 
    function print_esc(s: str_number) {
        var
          c: integer; // the escape character code
        
        ⟦269 Set variable |c| to the current escape character⟧
        if (c >= 0) {
            if (c <= biggest_usv) {
                print_char(c);
            }
        }
        slow_print(s);
    }
⟧

68. An array of digits in the range 0 .. 15 is printed by print_the_digs .

⟦57 Basic printing procedures⟧ += ⟦
    // prints dig[k - 1] ... dig[0]
    function print_the_digs(k: eight_bits) {
        while (k > 0) {
            decr(k);
            if (dig[k] < 10) {
                print_char(ord!("0") + dig[k]);
            } else {
                print_char(ord!("A") - 10 + dig[k]);
            }
        }
    }
⟧

69. The following procedure, which prints out the decimal representation of a given integer n , has been written carefully so that it works properly if n == 0 or if (-n) would cause overflow. It does not apply % or div to negative arguments, since such operations are not implemented consistently by all Pascal compilers.

⟦57 Basic printing procedures⟧ += ⟦
    // prints an integer in decimal form
    function print_int(n: integer) {
        var
          k: 0 .. 23, // index to current digit; we assume 
          // that $\vert n\vert<10^{23}$
          m: integer; // used to negate n in possibly 
          // dangerous cases
        
        k = 0;
        if (n < 0) {
            print_char(ord!("-"));
            if (n > -100000000) {
                negate(n);
            } else {
                m = -1 - n;
                n = m div 10;
                m = (m % 10) + 1;
                k = 1;
                if (m < 10) {
                    dig[0] = m;
                } else {
                    dig[0] = 0;
                    incr(n);
                }
            }
        }
        repeat {
            dig[k] = n % 10;
            n = n div 10;
            incr(k);
        } until (n == 0);
        print_the_digs(k);
    }
⟧
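
The order of operations in print_int can be followed in the Python sketch below. It is an illustration only: Python integers cannot overflow, so the sketch demonstrates the steps rather than the hazard, and the function name decimal_digits is hypothetical. Note that no negative number is ever divided, and -n is computed only when n > -100000000.

```python
def decimal_digits(n):
    """Return the text that print_int would print for n."""
    sign = ""
    dig = []                  # least significant digit first, like dig[0..k-1]
    if n < 0:
        sign = "-"
        if n > -100000000:
            n = -n            # safe: the magnitude is small
        else:
            m = -1 - n        # nonnegative, and safe even for the most negative value
            n = m // 10
            m = (m % 10) + 1
            if m < 10:
                dig.append(m)
            else:             # the last digit is 0, so carry into n
                dig.append(0)
                n += 1
    while True:               # repeat ... until n == 0
        dig.append(n % 10)
        n //= 10
        if n == 0:
            break
    return sign + "".join(str(d) for d in reversed(dig))
```

For example, with n equal to -2147483648 (the most negative 32-bit value, assumed here only for illustration) the sketch sets m to 2147483647, stores the final digit 8 in dig[0], and continues with 214748364.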

70. Here is a trivial procedure to print two digits; it is usually called with a parameter in the range 0 <= n <= 99 .

// prints two least significant digits
function print_two(n: integer) {
    n = abs(n) % 100;
    print_char(ord!("0") + (n div 10));
    print_char(ord!("0") + (n % 10));
}

71. Hexadecimal printing of nonnegative integers is accomplished by print_hex .

// prints a positive integer in hexadecimal form
function print_hex(n: integer) {
    var
      k: 0 .. 22; // index to current digit; we assume that 
      // $0\le n<16^{22}$
    
    k = 0;
    print_char(ord!("\""));
    repeat {
        dig[k] = n % 16;
        n = n div 16;
        incr(k);
    } until (n == 0);
    print_the_digs(k);
}

72. Old versions of TEX needed a procedure called print_ASCII whose function is now subsumed by print . We retain the old name here as a possible aid to future software archæologists.

@define print_ASCII => print

73. Roman numerals are produced by the print_roman_int routine. Readers who like puzzles might enjoy trying to figure out how this tricky code works; therefore no explanation will be given. Notice that 1990 yields mcmxc, not mxm.

function print_roman_int(n: integer) {
    label exit;
    var
      j, k: pool_pointer, // mysterious indices into 
      // str_pool 
      u, v: nonnegative_integer; // mysterious numbers
    
    j = str_start_macro(strpool!("m2d5c2l5x2v5i"));
    v = 1000;
    loop {
        while (n >= v) {
            print_char(so(str_pool[j]));
            n = n - v;
        }
        if (n <= 0) {
            // nonpositive input produces no output
            return;
        }
        k = j + 2;
        u = v div (so(str_pool[k - 1]) - ord!("0"));
        if (str_pool[k - 1] == si(ord!("2"))) {
            k = k + 2;
            u = u div (so(str_pool[k - 1]) - ord!("0"));
        }
        if (n + u >= v) {
            print_char(so(str_pool[k]));
            n = n + u;
        } else {
            j = j + 2;
            v = v div (so(str_pool[j - 1]) - ord!("0"));
        }
    }
  exit:
}
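
Readers who would rather experiment than solve the puzzle on paper can run the following Python transcription of print_roman_int. It is a direct, line-by-line sketch that offers no explanation either; a Python string replaces the str_pool entry, and the function name roman is hypothetical.

```python
def roman(n):
    """Return the lowercase roman numeral that print_roman_int prints."""
    pool = "m2d5c2l5x2v5i"
    out = []
    j, v = 0, 1000
    while True:
        while n >= v:
            out.append(pool[j])
            n -= v
        if n <= 0:
            return "".join(out)   # nonpositive input produces no output
        k = j + 2
        u = v // int(pool[k - 1])
        if pool[k - 1] == "2":
            k += 2
            u //= int(pool[k - 1])
        if n + u >= v:
            out.append(pool[k])
            n += u
        else:
            j += 2
            v //= int(pool[j - 1])
```

Calling roman(1990) returns mcmxc, in agreement with the remark above.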

74. The print subroutine will not print a string that is still being created. The following procedure will.

// prints a yet-unmade string
function print_current_string() {
    var
      j: pool_pointer; // points to current character code
    
    j = str_start_macro(str_ptr);
    while (j < pool_ptr) {
        print_char(so(str_pool[j]));
        incr(j);
    }
}

75. Here is a procedure that asks the user to type a line of input, assuming that the selector setting is either term_only or term_and_log . The input is placed into locations first through last - 1 of the buffer array, and echoed on the transcript file if appropriate.

This procedure is never called when interaction < scroll_mode .

@define prompt_input(#) =>
    {
        wake_up_terminal;
        print(#);
        term_input;
        // prints a string and gets a line of input
    }
// gets a line from the terminal
function term_input() {
    var
      k: 0 .. buf_size; // index into buffer 
    
    // now the user sees the prompt for sure
    update_terminal;
    if (!input_ln(term_in, true)) {
        limit = 0;
        fatal_error(
          strpool!("End of file on the terminal!"),
        );
    }
    // the user's line ended with <return>
    term_offset = 0;
    // prepare to echo the input
    decr(selector);
    if (last != first) {
        for (k in first to last - 1) {
            print(buffer[k]);
        }
    }
    print_ln;
    // restore previous status
    incr(selector);
}

76. [6] Reporting errors. When something anomalous is detected, TEX typically does something like this:

print_err(strpool!("Something anomalous has been detected"));
help3(
  strpool!("This is the first line of my offer to help."),
)(
  strpool!("This is the second line. I'm trying to"),
)(
  strpool!("explain the best way for you to proceed."),
);
error;

A two-line help message would be given using help2 , etc.; these informal helps should use simple vocabulary that complements the words used in the official error message that was printed. (Outside the U.S.A., the help messages should preferably be translated into the local vernacular. Each line of help is at most 60 characters long, in the present implementation, so that max_print_line will not be exceeded.)

The print_err procedure supplies a ‘!’ before the official message, and makes sure that the terminal is awake if a stop is going to occur. The error procedure supplies a ‘.’ after the official message, then it shows the location of the error; and if interaction == error_stop_mode , it also enters into a dialog with the user, during which time the help message may be printed.

77. The global variable interaction has four settings, representing increasing amounts of user interaction:

// omits all stops and omits terminal output
@define batch_mode => 0
@define nonstop_mode => 1 // omits all stops
@define scroll_mode => 2 // omits error stops
// stops at every opportunity to interact
@define error_stop_mode => 3
// extra value for command-line switch
@define unspecified_mode => 4
@define print_err(#) =>
    {
        if (interaction == error_stop_mode) {
            wake_up_terminal;
        }
        if (file_line_error_style_p) {
            print_file_line;
        } else {
            print_nl(strpool!("! "));
        }
        print(#);
    }
⟦13 Global variables⟧ += ⟦
    // current level of interaction
    var interaction: batch_mode .. error_stop_mode;

    // set from command line
    var interaction_option: batch_mode .. unspecified_mode;
⟧

78.

⟦23 Set initial values of key variables⟧ += ⟦
    if (interaction_option == unspecified_mode) {
        interaction = error_stop_mode;
    } else {
        interaction = interaction_option;
    }
⟧

79. TEX is careful not to call error when the print selector setting might be unusual. The only possible values of selector at the time of error messages are

no_print (when interaction == batch_mode and log_file not yet open);

term_only (when interaction > batch_mode and log_file not yet open);

log_only (when interaction == batch_mode and log_file is open);

term_and_log (when interaction > batch_mode and log_file is open).

⟦79 Initialize the print |selector| based on |interaction|⟧ = ⟦
    if (interaction == batch_mode) {
        selector = no_print;
    } else {
        selector = term_only;
    }
⟧
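
The four cases listed above combine with the relations among the selector codes from section 54 into simple arithmetic. The Python sketch below is an illustration under an assumption: the boolean parameter log_opened, indicating whether the transcript file is open, and the function name error_selector are introduced here and do not appear in this part of the program.

```python
# Selector codes from section 54 and interaction levels from section 77.
NO_PRINT, TERM_ONLY, LOG_ONLY, TERM_AND_LOG = 16, 17, 18, 19
BATCH_MODE, NONSTOP_MODE, SCROLL_MODE, ERROR_STOP_MODE = range(4)

def error_selector(interaction, log_opened):
    """Selector value expected when an error message is printed."""
    selector = NO_PRINT if interaction == BATCH_MODE else TERM_ONLY
    if log_opened:
        selector += 2   # no_print + 2 == log_only, term_only + 2 == term_and_log
    return selector
```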

80. A global variable deletions_allowed is set false if the get_next routine is active when error is called; this ensures that get_next and related routines like get_token will never be called recursively. A similar interlock is provided by set_box_allowed .

The global variable history records the worst level of error that has been detected. It has five possible values: spotless , warning_issued , error_message_issued , fatal_error_stop , and output_failure .

Another global variable, error_count , is increased by one when an error occurs without an interactive dialog, and it is reset to zero at the end of every paragraph. If error_count reaches 100, TEX decides that there is no point in continuing further.

//  history value when nothing has been amiss yet
@define spotless => 0
//  history value when begin_diagnostic has been called
@define warning_issued => 1
//  history value when error has been called
@define error_message_issued => 2
//  history value when termination was premature
@define fatal_error_stop => 3
//  history value when output driver returned an error
@define output_failure => 4
⟦13 Global variables⟧ += ⟦
    // is it safe for error to call get_token ?
    var deletions_allowed: boolean;

    // is it safe to do a \setbox assignment?
    var set_box_allowed: boolean;

    // has the source input been clean so far?
    var history: spotless .. output_failure;

    // the number of scrolled errors since the last 
    // paragraph ended
    var error_count: -1 .. 100;
⟧

81. The value of history is initially fatal_error_stop , but it will be changed to spotless if TEX survives the initialization process.

⟦23 Set initial values of key variables⟧ += ⟦
    deletions_allowed = true

    set_box_allowed = true

    error_count = 0 //  history is initialized elsewhere
⟧

82. Since errors can be detected almost anywhere in TEX, we want to declare the error procedures near the beginning of the program. But the error procedures in turn use some other procedures, which need to be declared forward before we get to error itself.

It is possible for error to be called recursively if some error arises when get_token is being used to delete a token, and/or if some fatal error occurs while TEX is trying to fix a non-fatal one. But such recursion is never more than two levels deep.

⟦82 Error handling procedures⟧ = ⟦
    forward_declaration normalize_selector();

    forward_declaration get_token();

    forward_declaration term_input();

    forward_declaration show_context();

    forward_declaration begin_file_reading();

    forward_declaration open_log_file();

    forward_declaration close_files_and_terminate();

    forward_declaration clear_for_error_prompt();

    forward_declaration give_err_help();

    debug!{
        forward_declaration debug_help();
    }
⟧

83. Individual lines of help are recorded in the array help_line , which contains entries in positions 0 .. (help_ptr - 1) . They should be printed in reverse order, i.e., with help_line[0] appearing last.

@define hlp1(#) =>
    /*... opened earlier ...*/
        help_line[0] = #;
    }
@define hlp2(#) =>
    help_line[1] = #;
    hlp1
@define hlp3(#) =>
    help_line[2] = #;
    hlp2
@define hlp4(#) =>
    help_line[3] = #;
    hlp3
@define hlp5(#) =>
    help_line[4] = #;
    hlp4
@define hlp6(#) =>
    help_line[5] = #;
    hlp5
@define help0 =>
    // sometimes there might be no help
    help_ptr = 0
@define help1 =>
    {
        help_ptr = 1;
        // use this with one help line
        hlp1
    /* ... closed later ... */
@define help2 =>
    {
        help_ptr = 2;
        // use this with two help lines
        hlp2
    /* ... closed later ... */
@define help3 =>
    {
        help_ptr = 3;
        // use this with three help lines
        hlp3
    /* ... closed later ... */
@define help4 =>
    {
        help_ptr = 4;
        // use this with four help lines
        hlp4
    /* ... closed later ... */
@define help5 =>
    {
        help_ptr = 5;
        // use this with five help lines
        hlp5
    /* ... closed later ... */
@define help6 =>
    {
        help_ptr = 6;
        // use this with six help lines
        hlp6
    /* ... closed later ... */
⟦13 Global variables⟧ += ⟦
    // helps for the next error 
    var help_line: array [0 .. 5] of str_number;

    // the number of help lines present
    var help_ptr: 0 .. 6;

    // should the err_help list be shown?
    var use_err_help: boolean;
⟧
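
The reversed storage order can be seen in a small Python sketch. It is an illustration only: the functions store_help and printed_order are hypothetical stand-ins for the help macros and for the printing loop of section 93.

```python
def store_help(lines):
    """Store help lines as help1 .. help6 do: the first line given goes
    into the highest occupied slot, the last line into help_line[0]."""
    help_ptr = len(lines)
    help_line = [None] * 6
    for i, text in enumerate(lines):
        help_line[help_ptr - 1 - i] = text
    return help_line, help_ptr

def printed_order(help_line, help_ptr):
    """Return the lines in the order the loop of section 93 prints them."""
    out = []
    while help_ptr > 0:
        help_ptr -= 1
        out.append(help_line[help_ptr])
    return out
```

Storing three lines places the third one in help_line[0], and printing from help_ptr - 1 down to 0 restores the order in which they were written.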

84.

⟦23 Set initial values of key variables⟧ += ⟦
    help_ptr = 0

    use_err_help = false
⟧

85. The jump_out procedure just cuts across all active procedure levels. The body of jump_out simply calls ‘close_files_and_terminate;’ followed by a call on some system procedure that quietly terminates the program.

@format noreturn ~ procedure;
@define do_final_end =>
    {
        update_terminal;
        ready_already = 0;
        if (
            (history != spotless)
            && (history != warning_issued)
        ) {
            uexit(1);
        } else {
            uexit(0);
        }
    }
⟦82 Error handling procedures⟧ += ⟦
    noreturn

    function jump_out() {
        close_files_and_terminate;
        do_final_end;
    }
⟧

86. Here now is the general error routine.

⟦82 Error handling procedures⟧ += ⟦
    // completes the job of error reporting
    function error() {
        label continue, exit;
        var
          c: ASCII_code, // what the user types
          s1, s2, s3, s4: integer; // used to save global 
          // variables when deleting tokens
        
        if (history < error_message_issued) {
            history = error_message_issued;
        }
        print_char(ord!("."));
        show_context;
        if ((halt_on_error_p)) {
            history = fatal_error_stop;
            jump_out;
        }
        if (interaction == error_stop_mode) {
            ⟦87 Get user's advice and |return|⟧
        }
        incr(error_count);
        if (error_count == 100) {
            print_nl(
              strpool!("(That makes 100 errors; please try again.)"),
            );
            history = fatal_error_stop;
            jump_out;
        }
        ⟦94 Put help message on the transcript file⟧
      exit:
    }
⟧
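
Two small facts about history follow from the code above and from do_final_end in section 85: a call of error never improves the recorded history, and the exit status is zero only when nothing worse than a warning has occurred. As a Python sketch (the function names are hypothetical):

```python
SPOTLESS, WARNING_ISSUED, ERROR_MESSAGE_ISSUED, FATAL_ERROR_STOP, OUTPUT_FAILURE = range(5)

def after_error(history):
    """The first step of error: history only moves toward worse values."""
    return max(history, ERROR_MESSAGE_ISSUED)

def exit_status(history):
    """The exit code that do_final_end passes to uexit."""
    return 0 if history in (SPOTLESS, WARNING_ISSUED) else 1
```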

87.

⟦87 Get user's advice and |return|⟧ = ⟦
    loop {
      continue:
        if (interaction != error_stop_mode) {
            return;
        }
        clear_for_error_prompt;
        prompt_input(strpool!("? "));
        if (last == first) {
            return;
        }
        c = buffer[first];
        if (c >= ord!("a")) {
            // convert to uppercase
            c = c + ord!("A") - ord!("a");
        }
        ⟦88 Interpret code |c| and |return| if done⟧
    }
⟧

88. It is desirable to provide an ‘E’ option here that gives the user an easy way to return from TEX to the system editor, with the offending line ready to be edited. We do this by calling the external procedure call_edit with a pointer to the filename, its length, and the line number. However, here we just set up the variables that will be used as arguments, since we don’t want to do the switch-to-editor until after TEX has closed its files.

There is a secret ‘D’ option available when the debugging routines haven’t been commented out.

@define edit_file => input_stack[base_ptr]
⟦88 Interpret code |c| and |return| if done⟧ = ⟦
    case c {
      ord!("0"),
      ord!("1"),
      ord!("2"),
      ord!("3"),
      ord!("4"),
      ord!("5"),
      ord!("6"),
      ord!("7"),
      ord!("8"),
      ord!("9"):
        if (deletions_allowed) {
            ⟦92 Delete \(c)|c-"0"| tokens and |goto continue|⟧
        }
      debug!{
          ord!("D"):
            debug_help;
            goto continue;
      }
      ord!("E"):
        if (base_ptr > 0) {
            if (input_stack[base_ptr].name_field >= 256) {
                edit_name_start = str_start_macro(
                  edit_file.name_field,
                );
                edit_name_length = 
                    str_start_macro(
                      edit_file.name_field + 1,
                    )
                    - str_start_macro(edit_file.name_field)
                ;
                edit_line = line;
                jump_out;
            }
        }
      ord!("H"):
        ⟦93 Print the help information and |goto continue|⟧
      ord!("I"):
        ⟦91 Introduce new material from the terminal and |return|⟧
      ord!("Q"), ord!("R"), ord!("S"):
        ⟦90 Change the interaction level and |return|⟧
      ord!("X"):
        interaction = scroll_mode;
        jump_out;
      othercases:
        do_nothing;
    }

    ⟦89 Print the menu of available options⟧
⟧

89.

⟦89 Print the menu of available options⟧ = ⟦
    {
        print(
          strpool!("Type <return> to proceed, S to scroll future error messages,"),
        );
        print_nl(
          strpool!("R to run without stopping, Q to run quietly,"),
        );
        print_nl(strpool!("I to insert something, "));
        if (base_ptr > 0) {
            if (input_stack[base_ptr].name_field >= 256) {
                print(strpool!("E to edit your file,"));
            }
        }
        if (deletions_allowed) {
            print_nl(
              strpool!("1 or ... or 9 to ignore the next 1 to 9 tokens of input,"),
            );
        }
        print_nl(strpool!("H for help, X to quit."));
    }
⟧

90. Here the author of TEX apologizes for making use of the numerical relation between ord!("Q") , ord!("R") , ord!("S") , and the desired interaction settings batch_mode , nonstop_mode , scroll_mode .

⟦90 Change the interaction level and |return|⟧ = ⟦
    {
        error_count = 0;
        interaction = batch_mode + c - ord!("Q");
        print(strpool!("OK, entering "));
        case c {
          ord!("Q"):
            print_esc(strpool!("batchmode"));
            decr(selector);
          ord!("R"):
            print_esc(strpool!("nonstopmode"));
          ord!("S"):
            print_esc(strpool!("scrollmode"));// there are 
          // no other cases
        }
        print(strpool!("..."));
        print_ln;
        update_terminal;
        return;
    }
⟧
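
The numerical relation in question is simply that Q, R, and S are consecutive character codes, just as batch_mode, nonstop_mode, and scroll_mode are consecutive integers. A Python sketch of the mapping (the function name interaction_for is hypothetical):

```python
BATCH_MODE, NONSTOP_MODE, SCROLL_MODE, ERROR_STOP_MODE = range(4)

def interaction_for(c):
    """c is 'Q', 'R', or 'S', already converted to uppercase."""
    return BATCH_MODE + ord(c) - ord("Q")
```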

91. When the following code is executed, buffer[(first + 1) .. (last - 1)] may contain the material inserted by the user; otherwise another prompt will be given. In order to understand this part of the program fully, you need to be familiar with TEX’s input stacks.

⟦91 Introduce new material from the terminal and |return|⟧ = ⟦
    {
        // enter a new syntactic level for terminal input
        begin_file_reading;
        // now state == mid_line , so an initial blank space 
        // will count as a blank
        if (last > first + 1) {
            loc = first + 1;
            buffer[first] = ord!(" ");
        } else {
            prompt_input(strpool!("insert>"));
            loc = first;
        }
        first = last;
        // no end_line_char ends this line
        cur_input.limit_field = last - 1;
        return;
    }
⟧

92. We allow deletion of up to 99 tokens at a time.

⟦92 Delete \(c)|c-"0"| tokens and |goto continue|⟧ = ⟦
    {
        s1 = cur_tok;
        s2 = cur_cmd;
        s3 = cur_chr;
        s4 = align_state;
        align_state = 1000000;
        OK_to_interrupt = false;
        if (
            (last > first + 1)
            && (buffer[first + 1] >= ord!("0"))
            && (buffer[first + 1] <= ord!("9"))
        ) {
            c = c * 10 + buffer[first + 1] - ord!("0") * 11;
        } else {
            c = c - ord!("0");
        }
        while (c > 0) {
            // one-level recursive call of error is possible
            get_token;
            decr(c);
        }
        cur_tok = s1;
        cur_cmd = s2;
        cur_chr = s3;
        align_state = s4;
        OK_to_interrupt = true;
        help2(
          strpool!("I have just deleted some text, as you asked."),
        )(
          strpool!("You can now delete more, or insert, or whatever."),
        );
        show_context;
        goto continue;
    }
⟧
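
The expression c * 10 + buffer[first + 1] - ord!("0") * 11 looks odd until it is regrouped: c still holds a character code at that point, so both digits must be shifted down by the code of "0", once for the units digit and ten times for the tens digit. A small sketch of the same arithmetic follows; the function name deletion_count and the use of -1 for "no second character" are conventions of this sketch, not of the program.

```python
def deletion_count(c: int, next_char: int = -1) -> int:
    """c is the code of the first digit typed; next_char is the code of the
    character after it, or -1 if there is none (a convention of this sketch)."""
    zero = ord("0")
    if zero <= next_char <= ord("9"):
        # c*10 + next_char - 11*zero == 10*(c - zero) + (next_char - zero)
        return c * 10 + next_char - zero * 11
    return c - zero


print(deletion_count(ord("7")))
print(deletion_count(ord("4"), ord("2")))
```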

93.

⟦93 Print the help information and |goto continue|⟧ = ⟦
    {
        if (use_err_help) {
            give_err_help;
            use_err_help = false;
        } else {
            if (help_ptr == 0) {
                help2(
                  strpool!("Sorry, I don't know how to help in this situation."),
                )(
                  strpool!("Maybe you should try asking a human?"),
                );
            }
            repeat {
                decr(help_ptr);
                print(help_line[help_ptr]);
                print_ln;
            } until (help_ptr == 0);
        }
        help4(
          strpool!("Sorry, I already gave what help I could..."),
        )(strpool!("Maybe you should try asking a human?"))(
          strpool!("An error might have occurred before I noticed any problems."),
        )(
          strpool!("``If all else fails, read the instructions.''"),
        );
        goto continue;
    }
⟧

94.

⟦94 Put help message on the transcript file⟧ = ⟦
    if (interaction > batch_mode) {
        // avoid terminal output
        decr(selector);
    }

    if (use_err_help) {
        print_ln;
        give_err_help;
    } else {
        while (help_ptr > 0) {
            decr(help_ptr);
            print_nl(help_line[help_ptr]);
        }
    }

    print_ln

    if (interaction > batch_mode) {
        // re-enable terminal output
        incr(selector);
    }

    print_ln
⟧

95. A dozen or so error messages end with a parenthesized integer, so we save a teeny bit of program space by declaring the following procedure:

function int_error(n: integer) {
    print(strpool!(" ("));
    print_int(n);
    print_char(ord!(")"));
    error;
}

96. In anomalous cases, the print selector might be in an unknown state; the following subroutine is called to fix things just enough to keep running a bit longer.

function normalize_selector() {
    if (log_opened) {
        selector = term_and_log;
    } else {
        selector = term_only;
    }
    if (job_name == 0) {
        open_log_file;
    }
    if (interaction == batch_mode) {
        decr(selector);
    }
}

97. The following procedure prints TEX’s last words before dying.

@define succumb =>
    {
        if (interaction == error_stop_mode) {
            // no more interaction
            interaction = scroll_mode;
        }
        if (log_opened) {
            error;
        }
        debug!{
            if (interaction > batch_mode) {
                debug_help;
            }
        }
        history = fatal_error_stop;
        // irrecoverable error
        jump_out;
    }
⟦82 Error handling procedures⟧ += ⟦
    noreturn

    // prints s , and that's it
    function fatal_error(s: str_number) {
        normalize_selector;
        print_err(strpool!("Emergency stop"));
        help1(s);
        succumb;
    }
⟧

98. Here is the most dreaded error message.

⟦82 Error handling procedures⟧ += ⟦
    noreturn

    // stop due to finiteness
    function overflow(s: str_number, n: integer) {
        normalize_selector;
        print_err(
          strpool!("TeX capacity exceeded, sorry ["),
        );
        print(s);
        print_char(ord!("="));
        print_int(n);
        print_char(ord!("]"));
        help2(
          strpool!("If you really absolutely need more capacity,"),
        )(strpool!("you can ask a wizard to enlarge me."));
        succumb;
    }
⟧

99. The program might sometime run completely amok, at which point there is no choice but to stop. If no previous error has been detected, that’s bad news; a message is printed that is really intended for the TEX maintenance person instead of the user (unless the user has been particularly diabolical). The index entries for ‘this can’t happen’ may help to pinpoint the problem.

⟦82 Error handling procedures⟧ += ⟦
    noreturn

    // consistency check violated; s tells where
    function confusion(s: str_number) {
        normalize_selector;
        if (history < error_message_issued) {
            print_err(strpool!("This can't happen ("));
            print(s);
            print_char(ord!(")"));
            help1(
              strpool!("I'm broken. Please show this to someone who can fix can fix"),
            );
        } else {
            print_err(
              strpool!("I can't go on meeting you like this"),
            );
            help2(
              strpool!("One of your faux pas seems to have wounded me deeply..."),
            )(
              strpool!("in fact, I'm barely conscious. Please fix it and try again."),
            );
        }
        succumb;
    }
⟧

100. Users occasionally want to interrupt TEX while it’s running. If the Pascal runtime system allows this, one can implement a routine that sets the global variable interrupt to some nonzero value when such an interrupt is signalled. Otherwise there is probably at least a way to make interrupt nonzero using the Pascal debugger.

@define check_interrupt =>
    {
        if (interrupt != 0) {
            pause_for_instructions;
        }
    }
⟦13 Global variables⟧ += ⟦
    // should \TeX\ pause for instructions?
    var interrupt: integer;

    // should interrupts be observed?
    var OK_to_interrupt: boolean;
⟧

101.

⟦23 Set initial values of key variables⟧ += ⟦
    interrupt = 0

    OK_to_interrupt = true
⟧

102. When an interrupt has been detected, the program goes into its highest interaction level and lets the user have nearly the full flexibility of the error routine. TEX checks for interrupts only at times when it is safe to do this.

function pause_for_instructions() {
    if (OK_to_interrupt) {
        interaction = error_stop_mode;
        if ((selector == log_only) || (selector == no_print)) {
            incr(selector);
        }
        print_err(strpool!("Interruption"));
        help3(strpool!("You rang?"))(
          strpool!("Try to insert an instruction for me (e.g., `I\\showlists'),"),
        )(
          strpool!("unless you just want to quit by typing `X'."),
        );
        deletions_allowed = false;
        error;
        deletions_allowed = true;
        interrupt = 0;
    }
}

103. [7] Arithmetic with scaled dimensions. The principal computations performed by TEX are done entirely in terms of integers less than $2^{31}$ in magnitude; and divisions are done only when both dividend and divisor are nonnegative. Thus, the arithmetic specified in this program can be carried out in exactly the same way on a wide variety of computers, including some small ones. Why? Because the arithmetic calculations need to be spelled out precisely in order to guarantee that TEX will produce identical output on different machines. If some quantities were rounded differently in different implementations, we would find that line breaks and even page breaks might occur in different places. Hence the arithmetic of TEX has been designed with care, and systems that claim to be implementations of TEX82 should follow precisely the calculations as they appear in the present program.

(Actually there are three places where TEX uses div with a possibly negative numerator. These are harmless; see div in the index. Also if the user sets the \time or the \year to a negative value, some diagnostic information will involve negative-numerator division. The same remarks apply for % as well as for div.)

104. Here is a routine that calculates half of an integer, using an unambiguous convention with respect to signed odd numbers.

function half(x: integer): integer {
    if (odd(x)) {
        half = (x + 1) div 2;
    } else {
        half = x div 2;
    }
}
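
A short sketch shows why the convention is unambiguous: when x is odd, x + 1 is even, so the division is exact and the result cannot depend on how a particular language rounds negative quotients. The Python below is an illustration only; it relies on the fact that Python's // agrees with Pascal's div whenever the division is exact.

```python
def half(x: int) -> int:
    """Half of x, with odd values rounded toward plus infinity."""
    if x % 2 != 0:
        # x + 1 is even here, so no rounding takes place in the division.
        return (x + 1) // 2
    return x // 2


for x in (3, -3, 4, -4, 1, -1):
    print(x, half(x))
```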

105. Fixed-point arithmetic is done on scaled integers that are multiples of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit positions from the right end of a binary computer word.

@define unity => 0x10000 // $2^{16}$, represents 1.00000
@define two => 0x20000 // $2^{17}$, represents 2.00000
⟦18 Types in the outer block⟧ += ⟦
    // this type is used for scaled integers
    type scaled = integer;

    // $0\L x<2^{31}$
    type nonnegative_integer = 0 .. 0x7fffffff;

    // this type is self-explanatory
    type small_number = 0 .. hyphenatable_length_limit;
⟧

106. The following function is used to create a scaled integer from a given decimal fraction $(.d_0d_1\ldots d_{k-1})$, where 0 <= k <= 17. The digit $d_i$ is given in dig[i], and the calculation produces a correctly rounded result.

// converts a decimal fraction
function round_decimals(k: small_number): scaled {
    var
      a: integer; // the accumulator
    
    a = 0;
    while (k > 0) {
        decr(k);
        a = (a + dig[k] * two) div 10;
    }
    round_decimals = (a + 1) div 2;
}
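
The loop can be tried out with a small transcription. This sketch passes the digits as a list instead of reading the global dig array, which is a simplification made here; all intermediate values are nonnegative, so Python's // behaves like div.

```python
UNITY = 1 << 16  # represents 1.00000
TWO = 1 << 17    # represents 2.00000


def round_decimals(digits):
    """Convert the decimal fraction .d0 d1 ... d(k-1) to a scaled integer."""
    a = 0
    for d in reversed(digits):      # least significant digit first
        a = (a + d * TWO) // 10
    return (a + 1) // 2             # a holds twice the value; round to nearest


print(round_decimals([5]))      # the fraction .5
print(round_decimals([2, 5]))   # the fraction .25
print(round_decimals([1]))      # the fraction .1
```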

107. Conversely, here is a procedure analogous to print_int. If the output of this procedure is subsequently read by TEX and converted by the round_decimals routine above, it turns out that the original value will be reproduced exactly; the “simplest” such decimal number is output, but there is always at least one digit following the decimal point.

The invariant relation in the repeat loop is that a sequence of decimal digits yet to be printed will yield the original number if and only if they form a fraction $f$ in the range $s-\delta\le 10\cdot2^{16}f<s$. We can stop if and only if $f=0$ satisfies this condition; the loop will terminate before $s$ can possibly become zero.

// prints scaled real, rounded to five digits
function print_scaled(s: scaled) {
    var
      delta: scaled; // amount of allowable inaccuracy
    
    if (s < 0) {
        print_char(ord!("-"));
        // print the sign, if negative
        negate(s);
    }
    // print the integer part
    print_int(s div unity);
    print_char(ord!("."));
    s = 10 * (s % unity) + 5;
    delta = 10;
    repeat {
        if (delta > unity) {
            // round the last digit
            s = s + 0x8000 - 50000;
        }
        print_char(ord!("0") + (s div unity));
        s = 10 * (s % unity);
        delta = delta * 10;
    } until (s <= delta);
}
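
The round-trip property can be checked experimentally. The sketch below transcribes print_scaled so that it returns a string instead of printing, and pairs it with a list-based round_decimals; both adaptations are assumptions of this illustration rather than part of the program.

```python
UNITY = 1 << 16
TWO = 1 << 17


def round_decimals(digits):
    a = 0
    for d in reversed(digits):
        a = (a + d * TWO) // 10
    return (a + 1) // 2


def print_scaled(s):
    """Return the shortest decimal that converts back to s exactly."""
    out = []
    if s < 0:
        out.append("-")
        s = -s
    out.append(str(s // UNITY))     # integer part
    out.append(".")
    s = 10 * (s % UNITY) + 5
    delta = 10                      # amount of allowable inaccuracy
    while True:
        if delta > UNITY:
            s = s + 0x8000 - 50000  # round the last digit
        out.append(chr(ord("0") + s // UNITY))
        s = 10 * (s % UNITY)
        delta *= 10
        if s <= delta:
            break
    return "".join(out)


for value in (UNITY, UNITY // 2, 1, -3 * UNITY // 2):
    print(value, print_scaled(value))
```

Feeding the printed fraction digits back into round_decimals should give the original fractional part for every value below unity.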

108. Physical sizes that a TEX user specifies for portions of documents are represented internally as scaled points. Thus, if we define an ‘sp’ (scaled point) as a unit equal to $2^{-16}$ printer’s points, every dimension inside of TEX is an integer number of sp. There are exactly 4,736,286.72 sp per inch. Users are not allowed to specify dimensions larger than $2^{30}-1$ sp, which is a distance of about 18.892 feet (5.7583 meters); two such quantities can be added without overflow on a 32-bit computer.
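
These figures can be rechecked with exact rational arithmetic. The sketch assumes 72.27 printer's points per inch, which is what the stated 4,736,286.72 sp per inch implies, and 0.0254 meters per inch.

```python
from fractions import Fraction

SP_PER_PT = 2 ** 16
PT_PER_INCH = Fraction(7227, 100)   # assumed: 72.27 printer's points per inch

sp_per_inch = PT_PER_INCH * SP_PER_PT
max_dimen_sp = 2 ** 30 - 1          # largest dimension a user may specify

max_inches = Fraction(max_dimen_sp) / sp_per_inch
max_feet = max_inches / 12
max_meters = max_inches * Fraction(254, 10000)

print(float(sp_per_inch))
print(float(max_feet), float(max_meters))
# Two maximal dimensions still fit in a signed 32-bit word when added.
print(2 * max_dimen_sp < 2 ** 31)
```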

The present implementation of TEX does not check for overflow when dimensions are added or subtracted. This could be done by inserting a few dozen tests of the form ‘if x >= 0x40000000 then ’, but the chance of overflow is so remote that such tests do not seem worthwhile.

TEX needs to do only a few arithmetic operations on scaled quantities, other than addition and subtraction, and the following subroutines do most of the work. A single computation might use several subroutine calls, and it is desirable to avoid producing multiple error messages in case of arithmetic overflow; so the routines set the global variable arith_error to true instead of reporting errors directly to the user. Another global variable, remainder, holds the remainder after a division.

@define remainder => tex_remainder
⟦13 Global variables⟧ += ⟦
    // has arithmetic overflow occurred recently?
    var arith_error: boolean;

    // amount subtracted to get an exact division
    var remainder: scaled;
⟧

109. The first arithmetical subroutine we need computes $nx+y$, where x and y are scaled and n is an integer. We will also use it to multiply integers.

@define nx_plus_y(#) => mult_and_add(#, 0x3fffffff)
@define mult_integers(#) => mult_and_add(#, 0, 0x7fffffff)
function mult_and_add(
  n: integer,
  x, y, max_answer: scaled,
): scaled {
    if (n < 0) {
        negate(x);
        negate(n);
    }
    if (n == 0) {
        mult_and_add = y;
    } else if ((
        (x <= (max_answer - y) div n)
        && (-x <= (max_answer + y) div n)
    )) {
        mult_and_add = n * x + y;
    } else {
        arith_error = true;
        mult_and_add = 0;
    }
}
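
The overflow test works by dividing the remaining headroom by n before any multiplication is attempted. A minimal sketch follows; it returns the arith_error flag alongside the value instead of setting a global, and pascal_div is a hypothetical helper that truncates toward zero the way Pascal's div is assumed to.

```python
def pascal_div(a, b):
    """Integer division truncating toward zero (assumed Pascal semantics)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def mult_and_add(n, x, y, max_answer):
    """Return (n*x + y, arith_error); the value is 0 when overflow is detected."""
    if n < 0:
        x, n = -x, -n
    if n == 0:
        return y, False
    if (x <= pascal_div(max_answer - y, n)
            and -x <= pascal_div(max_answer + y, n)):
        return n * x + y, False
    return 0, True


print(mult_and_add(3, 65536, 10, 0x3FFFFFFF))      # nx_plus_y(3, 1.0, 10sp)
print(mult_and_add(2, 0x40000000, 0, 0x7FFFFFFF))  # mult_integers overflow
```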

110. We also need to divide scaled dimensions by integers.

function x_over_n(x: scaled, n: integer): scaled {
    var
      negative: boolean; // should remainder be negated?
    
    negative = false;
    if (n == 0) {
        arith_error = true;
        x_over_n = 0;
        remainder = x;
    } else {
        if (n < 0) {
            negate(x);
            negate(n);
            negative = true;
        }
        if (x >= 0) {
            x_over_n = x div n;
            remainder = x % n;
        } else {
            x_over_n = -((-x) div n);
            remainder = -((-x) % n);
        }
    }
    if (negative) {
        negate(remainder);
    }
}
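
The sign handling is easiest to see on small numbers. In this sketch the quotient, the remainder, and the error flag are returned as a tuple, which is a convention chosen here; the program itself stores the last two in globals. In every non-error case x equals quotient * n + remainder.

```python
def x_over_n(x, n):
    """Return (quotient, remainder, arith_error) for x divided by n."""
    if n == 0:
        return 0, x, True
    negative = False                 # should the remainder be negated?
    if n < 0:
        x, n, negative = -x, -n, True
    if x >= 0:
        quotient, rem = x // n, x % n
    else:
        # divide the magnitude, so only nonnegative operands are divided
        quotient, rem = -((-x) // n), -((-x) % n)
    if negative:
        rem = -rem
    return quotient, rem, False


for x, n in ((7, 2), (-7, 2), (7, -2), (-7, -2)):
    print(x, n, x_over_n(x, n))
```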

111. Then comes the multiplication of a scaled number by a fraction n / d, where n and d are nonnegative integers $\le2^{16}$ and d is positive. It would be too dangerous to multiply by n and then divide by d, in separate operations, since overflow might well occur; and it would be too inaccurate to divide by d and then multiply by n. Hence this subroutine simulates 1.5-precision arithmetic.

function xn_over_d(x: scaled, n, d: integer): scaled {
    var
      positive: boolean, // was x >= 0 ?
      t, u, v: nonnegative_integer; // intermediate 
      // quantities
    
    if (x >= 0) {
        positive = true;
    } else {
        negate(x);
        positive = false;
    }
    t = (x % 0x8000) * n;
    u = (x div 0x8000) * n + (t div 0x8000);
    v = (u % d) * 0x8000 + (t % 0x8000);
    if (u div d >= 0x8000) {
        arith_error = true;
    } else {
        u = 0x8000 * (u div d) + (v div d);
    }
    if (positive) {
        xn_over_d = u;
        remainder = v % d;
    } else {
        xn_over_d = -u;
        remainder = -(v % d);
    }
}
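
The 1.5-precision idea is to split x into a high part and a 15-bit low part, so that no intermediate product needs more than 31 bits. The sketch below transcribes the routine and returns (quotient, remainder, arith_error) as a tuple, a convention of this illustration. Since Python integers are unbounded, the result can be compared with the exact value of x * n divided by d.

```python
def xn_over_d(x, n, d):
    """Return (quotient, remainder, arith_error) for x * n / d."""
    positive = x >= 0
    if not positive:
        x = -x
    t = (x % 0x8000) * n                     # low 15 bits of x, times n
    u = (x // 0x8000) * n + t // 0x8000      # high part plus carry
    v = (u % d) * 0x8000 + t % 0x8000        # what is left to divide by d
    arith_error = u // d >= 0x8000
    if not arith_error:
        u = 0x8000 * (u // d) + v // d
    rem = v % d
    if positive:
        return u, rem, arith_error
    return -u, -rem, arith_error


print(xn_over_d(100000, 7, 3))
print(divmod(100000 * 7, 3))   # exact reference computed with big integers
```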

112. The next subroutine is used to compute the “badness” of glue, when a total t is supposed to be made from amounts that sum to s. According to The TEXbook, the badness of this situation is $100(t/s)^3$; however, badness is simply a heuristic, so we need not squeeze out the last drop of accuracy when computing it. All we really want is an approximation that has similar properties.

The actual method used to compute the badness is easier to read from the program than to describe in words. It produces an integer value that is a reasonably close approximation to $100(t/s)^3$, and all implementations of TEX should use precisely this method. Any badness of $2^{13}$ or more is treated as infinitely bad, and represented by 10000.

It is not difficult to prove that

badness(t+1,s)>=badness(t,s)>=badness(t,s+1).

The badness function defined here is capable of computing at most 1095 distinct values, but that is plenty.

@define inf_bad => 10000 // infinitely bad value
// compute badness, given t >= 0 
function badness(t, s: scaled): halfword {
    var
      r: integer; // approximation to $\alpha t/s$, where 
      // $\alpha^3\approx 100\cdot2^{18}$
    
    if (t == 0) {
        badness = 0;
    } else if (s <= 0) {
        badness = inf_bad;
    } else {
        if (t <= 7230584) {
            // $297^3=99.94\times2^{18}$
            r = (t * 297) div s;
        } else if (s >= 1663497) {
            r = t div (s div 297);
        } else {
            r = t;
        }
        if (r > 1290) {
            // $1290^3<2^{31}<1291^3$
            badness = inf_bad;
        } else {
            badness = (r * r * r + 0x20000) div 0x40000;
        }
        // that was $r^3/2^{18}$, rounded to the nearest 
        // integer
    }
}
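
A direct transcription makes it easy to compare the heuristic with $100(t/s)^3$ on a few inputs. This is only a sketch for experimentation; all operands are nonnegative, so Python's // matches div.

```python
INF_BAD = 10000   # infinitely bad value


def badness(t, s):
    """Approximate 100*(t/s)**3 for t >= 0, capped at INF_BAD."""
    if t == 0:
        return 0
    if s <= 0:
        return INF_BAD
    if t <= 7230584:
        r = (t * 297) // s          # 297**3 is about 100 * 2**18
    elif s >= 1663497:
        r = t // (s // 297)
    else:
        r = t
    if r > 1290:                    # r**3 would no longer fit in 31 bits
        return INF_BAD
    return (r * r * r + 0x20000) // 0x40000   # r**3 / 2**18, rounded


for t, s in ((65536, 65536), (131072, 65536), (65536, 131072)):
    print(t, s, badness(t, s), 100 * (t / s) ** 3)
```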

113. When TEX “packages” a list into a box, it needs to calculate the proportionality ratio by which the glue inside the box should stretch or shrink. This calculation does not affect TEX’s decision making, so the precise details of rounding, etc., in the glue calculation are not of critical importance for the consistency of results on different computers.

We shall use the type glue_ratio for such proportionality ratios. A glue ratio should take the same amount of memory as an integer (usually 32 bits) if it is to blend smoothly with TEX’s other data structures. Thus glue_ratio should be equivalent to short_real in some implementations of Pascal. Alternatively, it is possible to deal with glue ratios using nothing but fixed-point arithmetic; see TUGboat 3,1 (March 1982), 10–27. (But the routines cited there must be modified to allow negative glue ratios.)

@define set_glue_ratio_zero(#) =>
    // store the representation of zero ratio
    # = 0.0
@define set_glue_ratio_one(#) =>
    // store the representation of unit ratio
    # = 1.0
// convert from glue_ratio to type real 
@define float(#) => #
// convert from real to type glue_ratio 
@define unfloat(#) => #
// convert integer constant to real 
@define float_constant(#) => #.0
⟦18 Types in the outer block⟧ += ⟦
    /*nothing*/
⟧

114. [7b] Random numbers.

This section is (almost) straight from MetaPost. I had to change the types (use integer instead of fraction), but that should not have any influence on the actual calculations (the original comments refer to quantities like fraction_four ($2^{30}$), and that is the same as the numeric representation of maxdimen).

I’ve copied the low-level variables and routines that are needed, but only those (e.g. m_log), not the accompanying ones like m_exp. Most of the following low-level numeric routines are only needed within the calculation of norm_rand. I’ve been forced to rename make_fraction to make_frac because TeX already has a routine by that name with a wholly different function (it creates a fraction_noad for math typesetting) – Taco

And now let’s complete our collection of numeric utility routines by considering random number generation. MetaPost generates pseudo-random numbers with the additive scheme recommended in Section 3.6 of The Art of Computer Programming; however, the results are random fractions between 0 and fraction_one - 1, inclusive.

There’s an auxiliary array randoms that contains 55 pseudo-random fractions. Using the recurrence $x_n=(x_{n-55}-x_{n-31})\bmod 2^{28}$, we generate batches of 55 new $x_n$’s at a time by calling new_randoms. The global variable j_random tells which element has most recently been consumed.

⟦13 Global variables⟧ += ⟦
    // the last 55 random values generated
    var randoms: array [0 .. 54] of integer;

    // the number of unused randoms 
    var j_random: 0 .. 54;

    // the default random seed
    var random_seed: scaled;
⟧

115. A small bit of metafont is needed.

// $2^{27}$, represents 0.50000000
@define fraction_half => 0x8000000
// $2^{28}$, represents 1.00000000
@define fraction_one => 0x10000000
// $2^{30}$, represents 4.00000000
@define fraction_four => 0x40000000
// $2^{31}-1$, the largest value that \MP\ likes
@define el_gordo => 0x7fffffff
@define halfp(#) => (#) div 2
@define double(#) => # = # + # // multiply a variable by two

116. The make_frac routine produces the fraction equivalent of p / q, given integers p and q; it computes the integer $f=\lfloor2^{28}p/q+{1\over2}\rfloor$, when $p$ and $q$ are positive. If p and q are both of the same scaled type t, the “type relation” make_frac(t, t) == fraction is valid; and it’s also possible to use the subroutine “backwards,” using the relation make_frac(t, fraction) == t between scaled types.

If the result would have magnitude $2^{31}$ or more, make_frac sets arith_error = true. Most of MetaPost’s internal computations have been designed to avoid this sort of error.

If this subroutine were programmed in assembly language on a typical machine, we could simply compute (2^{28} * p) div q, since a double-precision product can often be input to a fixed-point division instruction. But when we are restricted to Pascal arithmetic it is necessary either to resort to multiple-precision maneuvering or to use a simple but slow iteration. The multiple-precision technique would be about three times faster than the code adopted here, but it would be comparatively long and tricky, involving about sixteen additional multiplications and divisions.

This operation is part of MetaPost’s “inner loop”; indeed, it will consume nearly 10% of the running time (exclusive of input and output) if the code below is left unchanged. A machine-dependent recoding will therefore make MetaPost run faster. The present implementation is highly portable, but slow; it avoids multiplication and division except in the initial stage. System wizards should be careful to replace it with a routine that is guaranteed to produce identical results in all cases.

As noted below, a few more routines should also be replaced by machine-dependent code, for efficiency. But when a procedure is not part of the “inner loop,” such changes aren’t advisable; simplicity and robustness are preferable to trickery, unless the cost is too high.

function make_frac(p, q: integer): integer {
    var
      f: integer, // the fraction bits, with a leading 1 bit
      n: integer, // the integer part of $\vert p/q\vert$
      negative: boolean, // should the result be negated?
      be_careful: integer; // disables certain compiler 
      // optimizations
    
    if (p >= 0) {
        negative = false;
    } else {
        negate(p);
        negative = true;
    }
    if (q <= 0) {
        debug!{
            if (q == 0) {
                confusion(ord!("/"));
            }
        }
        negate(q);
        negative = !negative;
    }
    n = p div q;
    p = p % q;
    if (n >= 8) {
        arith_error = true;
        if (negative) {
            make_frac = -el_gordo;
        } else {
            make_frac = el_gordo;
        }
    } else {
        n = (n - 1) * fraction_one;
        ⟦117 Compute $f=\lfloor 2^{28}(1+p/q)+{1\over2}\rfloor$⟧
        if (negative) {
            make_frac = -(f + n);
        } else {
            make_frac = f + n;
        }
    }
}

117. The repeat loop here preserves the following invariant relations between f, p, and q: (i) 0 <= p < q; (ii) $fq+p=2^k(q+p_0)$, where $k$ is an integer and $p_0$ is the original value of $p$.

Notice that the computation specifies (p - q) + p instead of (p + p) - q, because the latter could overflow. Let us hope that optimizing compilers do not miss this point; a special variable be_careful is used to emphasize the necessary order of computation. Optimizing compilers should keep be_careful in a register, not store it in memory.

⟦117 Compute $f=\lfloor 2^{28}(1+p/q)+{1\over2}\rfloor$⟧ = ⟦
    f = 1

    repeat {
        be_careful = p - q;
        p = be_careful + p;
        if (p >= 0) {
            f = f + f + 1;
        } else {
            double(f);
            p = p + q;
        }
    } until (f >= fraction_one)

    be_careful = p - q

    if (be_careful + p >= 0) {
        incr(f);
    }
⟧
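
The shift-and-subtract loop can be exercised in isolation. The sketch below merges make_frac and this module into one Python function and returns the arith_error flag as a second result; both choices are conventions of the illustration. Python integers cannot overflow, so the (p - q) + p ordering is kept only to mirror the program. For positive arguments the value can be compared with $\lfloor2^{28}p/q+{1\over2}\rfloor$ computed exactly.

```python
FRACTION_ONE = 1 << 28
EL_GORDO = (1 << 31) - 1


def make_frac(p, q):
    """Return (round(2**28 * p / q), arith_error); q must be nonzero."""
    if q == 0:
        raise ZeroDivisionError("make_frac: q is zero")
    negative = p < 0
    p = abs(p)
    if q < 0:
        q, negative = -q, not negative
    n, p = divmod(p, q)
    if n >= 8:
        return (-EL_GORDO if negative else EL_GORDO), True
    n = (n - 1) * FRACTION_ONE
    f = 1                                 # fraction bits, with a leading 1 bit
    while True:
        p = (p - q) + p                   # doubles p and subtracts q
        if p >= 0:
            f = f + f + 1                 # next quotient bit is 1
        else:
            f = f + f                     # next quotient bit is 0
            p = p + q
        if f >= FRACTION_ONE:
            break
    if (p - q) + p >= 0:                  # final rounding step
        f += 1
    result = f + n
    return (-result if negative else result), False


print(make_frac(1, 2), make_frac(1, 3), make_frac(-1, 4))
```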

118.

function take_frac(q: integer, f: integer): integer {
    var
      p: integer, // the fraction so far
      negative: boolean, // should the result be negated?
      n: integer, // additional multiple of $q$
      be_careful: integer; // disables certain compiler 
      // optimizations
    
    ⟦119 Reduce to the case that |f>=0| and |q>0|⟧
    if (f < fraction_one) {
        n = 0;
    } else {
        n = f div fraction_one;
        f = f % fraction_one;
        if (q <= el_gordo div n) {
            n = n * q;
        } else {
            arith_error = true;
            n = el_gordo;
        }
    }
    f = f + fraction_one;
    ⟦120 Compute $p=\lfloor qf/2^{28}+{1\over2}\rfloor-q$⟧
    be_careful = n - el_gordo;
    if (be_careful + p > 0) {
        arith_error = true;
        n = el_gordo - p;
    }
    if (negative) {
        take_frac = -(n + p);
    } else {
        take_frac = n + p;
    }
}

119.

⟦119 Reduce to the case that |f>=0| and |q>0|⟧ = ⟦
    if (f >= 0) {
        negative = false;
    } else {
        negate(f);
        negative = true;
    }

    if (q < 0) {
        negate(q);
        negative = !negative;
    }
⟧

120. The invariant relations in this case are (i) $\lfloor(qf+p)/2^k\rfloor=\lfloor qf_0/2^{28}+{1\over2}\rfloor$, where $k$ is an integer and $f_0$ is the original value of $f$; (ii) $2^k\le f<2^{k+1}$.

⟦120 Compute $p=\lfloor qf/2^{28}+{1\over2}\rfloor-q$⟧ = ⟦
    // that's $2^{27}$; the invariants hold now with $k=28$
    p = fraction_half

    if (q < fraction_four) {
        repeat {
            if (odd(f)) {
                p = halfp(p + q);
            } else {
                p = halfp(p);
            }
            f = halfp(f);
        } until (f == 1);
    } else {
        repeat {
            if (odd(f)) {
                p = p + halfp(q - p);
            } else {
                p = halfp(p);
            }
            f = halfp(f);
        } until (f == 1);
    }
⟧
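
Putting take_frac and its two modules together gives a routine that can be checked against exact arithmetic. In this sketch the error flag is returned rather than stored in a global, and only the first of the two loops is kept: Python integers do not overflow, and halfp(p + q) equals p + halfp(q - p) when q >= p, so the second loop is needed only where 32-bit overflow is a concern.

```python
FRACTION_HALF = 1 << 27
FRACTION_ONE = 1 << 28
EL_GORDO = (1 << 31) - 1


def take_frac(q, f):
    """Return (round(q * f / 2**28), arith_error)."""
    negative = f < 0
    f = abs(f)
    if q < 0:
        q, negative = -q, not negative
    arith_error = False
    if f < FRACTION_ONE:
        n = 0
    else:
        n, f = divmod(f, FRACTION_ONE)   # whole multiples of q
        if q <= EL_GORDO // n:
            n = n * q
        else:
            arith_error, n = True, EL_GORDO
    f += FRACTION_ONE                    # leading 1 bit marks the end
    p = FRACTION_HALF                    # invariants hold now with k = 28
    while True:
        if f % 2:
            p = (p + q) // 2
        else:
            p = p // 2
        f //= 2
        if f == 1:
            break
    if (n - EL_GORDO) + p > 0:
        arith_error, n = True, EL_GORDO - p
    result = n + p
    return (-result if negative else result), arith_error


print(take_frac(10, FRACTION_HALF), take_frac(3, FRACTION_HALF))
```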

121. The subroutines for logarithm and exponential involve two tables. The first is simple: two_to_the[k] equals $2^k$. The second involves a bit more calculation, which the author claims to have done correctly: spec_log[k] is $2^{27}$ times $\ln\bigl(1/(1-2^{-k})\bigr)=2^{-k}+{1\over2}2^{-2k}+{1\over3}2^{-3k}+\cdots\,$, rounded to the nearest integer.

⟦13 Global variables⟧ += ⟦
    // powers of two
    var two_to_the: array [0 .. 30] of integer;

    // special logarithms
    var spec_log: array [1 .. 28] of integer;
⟧

122.

⟦23 Set initial values of key variables⟧ += ⟦
    two_to_the[0] = 1

    for (k in 1 to 30) {
        two_to_the[k] = 2 * two_to_the[k - 1];
    }

    spec_log[1] = 93032640

    spec_log[2] = 38612034

    spec_log[3] = 17922280

    spec_log[4] = 8662214

    spec_log[5] = 4261238

    spec_log[6] = 2113709

    spec_log[7] = 1052693

    spec_log[8] = 525315

    spec_log[9] = 262400

    spec_log[10] = 131136

    spec_log[11] = 65552

    spec_log[12] = 32772

    spec_log[13] = 16385

    for (k in 14 to 27) {
        spec_log[k] = two_to_the[27 - k];
    }

    spec_log[28] = 1
⟧
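
The table can be recomputed from its defining formula as a sanity check. The sketch below uses double-precision floating point, which is an assumption of this illustration and not how the constants were originally derived; math.log1p keeps the small-argument logarithms accurate.

```python
import math

# The table as initialized above.
SPEC_LOG = {
    1: 93032640, 2: 38612034, 3: 17922280, 4: 8662214, 5: 4261238,
    6: 2113709, 7: 1052693, 8: 525315, 9: 262400, 10: 131136,
    11: 65552, 12: 32772, 13: 16385,
}
for k in range(14, 28):
    SPEC_LOG[k] = 2 ** (27 - k)
SPEC_LOG[28] = 1


def spec_log_from_formula(k):
    """2**27 * ln(1 / (1 - 2**-k)), rounded to the nearest integer."""
    value = -math.log1p(-(2.0 ** -k)) * 2 ** 27
    return math.floor(value + 0.5)


mismatches = [k for k in range(1, 29) if spec_log_from_formula(k) != SPEC_LOG[k]]
print("entries that disagree with the formula:", mismatches)
```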

123.

function m_log(x: integer): integer {
    var
      y, z: integer, // auxiliary registers
      k: integer; // iteration counter
    
    if (x <= 0) {
        ⟦125 Handle non-positive logarithm⟧
    } else {
        // $14\times2^{27}\ln2\approx1302456956.421063$
        y = 1302456956 + 4 - 100;
        // and $2^{16}\times .421063\approx 27595$
        z = 27595 + 6553600;
        while (x < fraction_four) {
            double(x);
            y = y - 93032639;
            z = z - 48782;
            // $2^{27}\ln2\approx 93032639.74436163$ and 
            // $2^{16}\times.74436163\approx 48782$
        }
        y = y + (z div unity);
        k = 2;
        while (x > fraction_four + 4) {
            ⟦124 Increase |k| until |x| can be multiplied by a factor of $2^{-k}$, and adjust $y$ accordingly⟧
        }
        m_log = y div 8;
    }
}

124.

⟦124 Increase |k| until |x| can be multiplied by a factor of $2^{-k}$, and adjust $y$ accordingly⟧ = ⟦
    {
        // $z=\lceil x/2^k\rceil$
        z = ((x - 1) div two_to_the[k]) + 1;
        while (x < fraction_four + z) {
            z = halfp(z + 1);
            k = k + 1;
        }
        y = y + spec_log[k];
        x = x - z;
    }
⟧
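
Judging from the comments in norm_rand, m_log(x) approximates $2^{24}\ln(x/2^{16})$. The sketch below combines m_log with this module so the approximation can be compared with a floating-point logarithm. Raising an exception for non-positive input and the helper name trunc_div are choices of this sketch; the program reports an error and returns 0 instead.

```python
import math

UNITY = 1 << 16
FRACTION_FOUR = 1 << 30

SPEC_LOG = [0, 93032640, 38612034, 17922280, 8662214, 4261238, 2113709,
            1052693, 525315, 262400, 131136, 65552, 32772, 16385]
SPEC_LOG += [2 ** (27 - k) for k in range(14, 28)] + [1]   # entries 14..28


def trunc_div(a, b):
    """Division truncating toward zero, for b > 0 (assumed Pascal div)."""
    return a // b if a >= 0 else -((-a) // b)


def m_log(x):
    if x <= 0:
        raise ValueError("logarithm of a non-positive number")
    y = 1302456956 + 4 - 100        # 14 * 2**27 * ln 2, with guard adjustments
    z = 27595 + 6553600             # low-order part of the same constant
    while x < FRACTION_FOUR:        # normalize x into [2**30, 2**31)
        x += x
        y -= 93032639               # subtract 2**27 * ln 2
        z -= 48782
    y += z // UNITY
    k = 2
    while x > FRACTION_FOUR + 4:
        z = (x - 1) // 2 ** k + 1   # ceiling of x / 2**k
        while x < FRACTION_FOUR + z:
            z = (z + 1) // 2
            k += 1
        y += SPEC_LOG[k]            # account for the factor 1 - 2**-k
        x -= z
    return trunc_div(y, 8)


for x in (UNITY, 2 * UNITY, 3 * UNITY):
    print(x, m_log(x), 2 ** 24 * math.log(x / UNITY))
```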

125.

⟦125 Handle non-positive logarithm⟧ = ⟦
    {
        print_err(strpool!("Logarithm of "));
        print_scaled(x);
        print(strpool!(" has been replaced by 0"));
        help2(
          strpool!("Since I don't take logs of non-positive numbers,"),
        )(
          strpool!("I'm zeroing this one. Proceed, with fingers crossed."),
        );
        error;
        m_log = 0;
    }
⟧

126. The following somewhat different subroutine tests rigorously if $ab$ is greater than, equal to, or less than $cd$, given integers $(a,b,c,d)$. In most cases a quick decision is reached. The result is $+1$, 0, or $-1$ in the three respective cases.

@define return_sign(#) =>
    {
        ab_vs_cd = #;
        return;
    }
function ab_vs_cd(a, b, c, d: integer): integer {
    label exit;
    var
      q, r: integer; // temporary registers
    
    ⟦127 Reduce to the case that |a,c>=0|, |b,d>0|⟧
    loop {
        q = a div d;
        r = c div b;
        if (q != r) {
            if (q > r) {
                return_sign(1);
            } else {
                return_sign(-1);
            }
        }
        q = a % d;
        r = c % b;
        if (r == 0) {
            if (q == 0) {
                return_sign(0);
            } else {
                return_sign(1);
            }
        }
        if (q == 0) {
            return_sign(-1);
        }
        a = b;
        b = q;
        c = d;
        d = r;
        // now a > d > 0 and c > b > 0 
    }
  exit:
}

127.

⟦127 Reduce to the case that |a,c>=0|, |b,d>0|⟧ = ⟦
    if (a < 0) {
        negate(a);
        negate(b);
    }

    if (c < 0) {
        negate(c);
        negate(d);
    }

    if (d <= 0) {
        if (b >= 0) {
            if (
                ((a == 0) || (b == 0))
                && ((c == 0) || (d == 0))
            ) {
                return_sign(0);
            } else {
                return_sign(1);
            }
        }
        if (d == 0) {
            if (a == 0) {
                return_sign(0);
            } else {
                return_sign(-1);
            }
        }
        q = a;
        a = c;
        c = q;
        q = -b;
        b = -d;
        d = q;
    } else if (b <= 0) {
        if (b < 0) {
            if (a > 0) {
                return_sign(-1);
            }
        }
        if (c == 0) {
            return_sign(0);
        } else {
            return_sign(-1);
        }
    }
⟧
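
The comparison never forms the products themselves; after the sign reduction it compares a / d with c / b by a continued-fraction style exchange of quotients and remainders. The sketch below transcribes both parts into one Python function, with return statements standing in for return_sign. Because Python integers are unbounded, the answer can be checked against the sign of a * b - c * d.

```python
def ab_vs_cd(a, b, c, d):
    """Return +1, 0 or -1 according to whether a*b is >, == or < c*d."""
    # Reduce to the case a, c >= 0 and b, d > 0.
    if a < 0:
        a, b = -a, -b
    if c < 0:
        c, d = -c, -d
    if d <= 0:
        if b >= 0:
            if (a == 0 or b == 0) and (c == 0 or d == 0):
                return 0
            return 1
        if d == 0:
            return 0 if a == 0 else -1
        a, c = c, a              # both products negative: swap the roles
        b, d = -d, -b
    elif b <= 0:
        if b < 0 and a > 0:
            return -1
        return 0 if c == 0 else -1
    # Main loop: compare a/d with c/b by quotient, then by remainder.
    while True:
        q, r = a // d, c // b
        if q != r:
            return 1 if q > r else -1
        q, r = a % d, c % b
        if r == 0:
            return 0 if q == 0 else 1
        if q == 0:
            return -1
        a, b, c, d = b, q, d, r


print(ab_vs_cd(3, 5, 4, 4), ab_vs_cd(2, 3, 3, 2), ab_vs_cd(-1, 2, 1, -3))
```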

128. To consume a random integer, the program below will say ‘next_random’ and then it will fetch randoms[j_random].

@define next_random =>
    if (j_random == 0) {
        new_randoms;
    } else {
        decr(j_random);
    }
function new_randoms() {
    var
      k: 0 .. 54, // index into randoms 
      x: integer; // accumulator
    
    for (k in 0 to 23) {
        x = randoms[k] - randoms[k + 31];
        if (x < 0) {
            x = x + fraction_one;
        }
        randoms[k] = x;
    }
    for (k in 24 to 54) {
        x = randoms[k] - randoms[k - 24];
        if (x < 0) {
            x = x + fraction_one;
        }
        randoms[k] = x;
    }
    j_random = 54;
}

129. To initialize the randoms table, we call the following routine.

function init_randoms(seed: integer) {
    var
      j, jj, k: integer, // more or less random integers
      i: 0 .. 54; // index into randoms 
    
    j = abs(seed);
    while (j >= fraction_one) {
        j = halfp(j);
    }
    k = 1;
    for (i in 0 to 54) {
        jj = k;
        k = j - k;
        j = jj;
        if (k < 0) {
            k = k + fraction_one;
        }
        randoms[(i * 21) % 55] = j;
    }
    new_randoms;
    new_randoms;
    // ``warm up'' the array
    new_randoms;
}

130. To produce a uniform random number in the range 0 <= u < x or 0 >= u > x or 0 == u == x , given a scaled value x , we proceed as shown here.

Note that the call of take_frac will produce the values 0 and x with about half the probability that it will produce any other particular values between 0 and x , because it rounds its answers.

function unif_rand(x: integer): integer {
    var
      y: integer; // trial value
    
    next_random;
    y = take_frac(abs(x), randoms[j_random]);
    if (y == abs(x)) {
        unif_rand = 0;
    } else if (x > 0) {
        unif_rand = y;
    } else {
        unif_rand = -y;
    }
}
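
The folding of the endpoint can be seen in a small Python sketch. It assumes that take_frac(q, f) is approximately round(q * f / 2^28) for non-negative arguments; the rounding rule used here (round half up) is an assumption, and the parameter r stands in for randoms[j_random].

```python
FRACTION_ONE = 1 << 28  # assumed value of fraction_one (2^28)

def take_frac(q, f):
    """Approximation of take_frac for q, f >= 0: round(q * f / 2^28)."""
    return (q * f + (FRACTION_ONE >> 1)) >> 28

def unif_rand(x, r):
    """r plays the role of randoms[j_random], with 0 <= r < 2^28."""
    y = take_frac(abs(x), r)
    if y == abs(x):
        return 0          # fold the upper endpoint back onto 0
    return y if x > 0 else -y
```

Because the values 0 and abs(x) each arise about half as often as the others, mapping y == abs(x) to 0 appears to be what gives 0 roughly the same share as every other value in the range.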

131. Finally, a normal deviate with mean zero and unit standard deviation can readily be obtained with the ratio method (Algorithm 3.4.1R in The Art of Computer Programming).

function norm_rand(): integer {
    var
      x, u, l: integer; // what the book would call 
      // $2^{16}X$, $2^{28}U$, and $-2^{24}\ln U$
    
    repeat {
        repeat {
            next_random;
            // $2^{16}\sqrt{8/e}\approx 112428.82793$
            x = take_frac(
              112429,
              randoms[j_random] - fraction_half,
            );
            next_random;
            u = randoms[j_random];
        } until (abs(x) < u);
        x = make_frac(x, u);
        // $2^{24}\cdot12\ln2\approx139548959.6165$
        l = 139548960 - m_log(u);
    } until (ab_vs_cd(1024, l, x, x) >= 0);
    norm_rand = x;
}
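
In floating point the same ratio method is much shorter. Since x holds 2^16 X and l holds -2^24 ln U, the test ab_vs_cd(1024, l, x, x) >= 0 says 2^10 * 2^24 * (-ln U) >= 2^32 * X^2, that is, X^2 <= -4 ln U. The Python sketch below uses that acceptance test directly; the rng parameter (any random.Random instance) is an assumption made for the illustration.

```python
import math
import random

def norm_rand(rng):
    """Ratio method in floating point; returns one normal deviate."""
    c = math.sqrt(8.0 / math.e)
    while True:
        u = 1.0 - rng.random()            # uniform on (0, 1], avoids log(0)
        x = c * (rng.random() - 0.5) / u  # candidate deviate
        if x * x <= -4.0 * math.log(u):   # accept when X^2 <= -4 ln U
            return x
```

The inner test abs(x) < u of the fixed-point routine has no counterpart here; it seems to exist only to keep make_frac from overflowing.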

132. [8] Packed data. In order to make efficient use of storage space, TEX bases its major data structures on a memory_word , which contains either a (signed) integer, possibly scaled, or a (signed) glue_ratio , or a small number of fields that are one half or one quarter of the size used for storing integers.

If x is a variable of type memory_word , it contains up to four fields that can be referred to as follows:

x.int (an integer)
x.sc (a scaled integer)
x.gr (a glue_ratio)
x.hh.lh, x.hh.rh (two halfword fields)
x.hh.b0, x.hh.b1, x.hh.rh (two quarterword fields, one halfword field)
x.qqqq.b0, x.qqqq.b1, x.qqqq.b2, x.qqqq.b3 (four quarterword fields)

This is somewhat cumbersome to write, and not very readable either, but macros will be used to make the notation shorter and more transparent. The Pascal code below gives a formal definition of memory_word and its subsidiary types, using packed variant records. TEX makes no assumptions about the relative positions of the fields within a word.
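
The idea of several views onto one word can be illustrated with Python's struct module. This is only a sketch: it assumes a 4-byte little-endian word with 16-bit halves and 8-bit quarters, which is not the layout XƎTEX actually uses (that one comes from texmfmem.h and is not reproduced here), and the function name views is hypothetical.

```python
import struct

def views(word_bytes):
    """Interpret the same 4 bytes in each of the memory_word variants."""
    assert len(word_bytes) == 4
    (as_int,) = struct.unpack("<i", word_bytes)   # x.int / x.sc
    (as_gr,) = struct.unpack("<f", word_bytes)    # x.gr
    halves = struct.unpack("<HH", word_bytes)     # two halfwords
    mixed = struct.unpack("<BBH", word_bytes)     # two quarterwords, one halfword
    quarters = struct.unpack("<4B", word_bytes)   # four quarterwords
    return as_int, as_gr, halves, mixed, quarters
```

Which bytes end up in which field depends on the byte order chosen here; the program itself makes no such assumption.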

Since we are assuming 32-bit integers, a halfword must contain at least 32 bits, and a quarterword must contain at least 16 bits. But it doesn’t hurt to have more bits; for example, with enough 36-bit words you might be able to have mem_max as large as 262142, which is eight times as much memory as anybody had during the first four years of TEX’s existence.

N.B.: Valuable memory space will be dreadfully wasted unless TEX is compiled by a Pascal that packs all of the memory_word variants into the space of a single integer. This means, for example, that glue_ratio words should be short_real instead of real on some computers. Some Pascal compilers will pack an integer whose subrange is ‘0 .. 255 ’ into an eight-bit field, but others insist on allocating space for an additional sign bit; on such systems you can get 256 values into a quarterword only if the subrange is ‘-128 .. 127 ’.

The present implementation tries to accommodate as many variations as possible, so it makes few assumptions. If integers having the subrange ‘min_quarterword .. max_quarterword ’ can be packed into a quarterword, and if integers having the subrange ‘min_halfword .. max_halfword ’ can be packed into a halfword, everything should work satisfactorily.

It is usually most efficient to have min_quarterword == min_halfword == 0 , so one should try to achieve this unless it causes a severe problem. The values defined here are recommended for most 32-bit computers.

// smallest allowable value in a quarterword 
@define min_quarterword => 0
// largest allowable value in a quarterword 
@define max_quarterword => 0xffff
// smallest allowable value in a halfword 
@define min_halfword => -0xfffffff
// largest allowable value in a halfword 
@define max_halfword => 0x3fffffff

133. Here are the inequalities that the quarterword and halfword values must satisfy (or rather, the inequalities that they mustn’t satisfy):

⟦14 Check the ``constant'' values for consistency⟧ += ⟦
    init!{
        if ((mem_min != mem_bot) || (mem_max != mem_top)) {
            bad = 10;
        }
    }

    if ((mem_min > mem_bot) || (mem_max < mem_top)) {
        bad = 10;
    }

    if ((min_quarterword > 0) || (max_quarterword < 0x7fff)) {
        bad = 11;
    }

    if ((min_halfword > 0) || (max_halfword < 0x3fffffff)) {
        bad = 12;
    }

    if (
        (min_quarterword < min_halfword)
        || (max_quarterword > max_halfword)
    ) {
        bad = 13;
    }

    if (
        (mem_bot - sup_main_memory < min_halfword)
        || (mem_top + sup_main_memory >= max_halfword)
    ) {
        bad = 14;
    }

    if (
        (max_font_max < min_halfword)
        || (max_font_max > max_halfword)
    ) {
        bad = 15;
    }

    if (font_max > font_base + max_font_max) {
        bad = 16;
    }

    if (
        (save_size > max_halfword)
        || (max_strings > max_halfword)
    ) {
        bad = 17;
    }

    if (buf_size > max_halfword) {
        bad = 18;
    }

    if (max_quarterword - min_quarterword < 0xffff) {
        bad = 19;
    }
⟧
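
The four tests that involve only the quarterword and halfword bounds can be exercised in isolation. The Python sketch below copies those conditions (codes 11, 12, 13, and 19); the function name is hypothetical, and, as in the program, a later failing test overwrites an earlier code.

```python
def check_word_ranges(min_q, max_q, min_h, max_h):
    """Return the last failing code among 11, 12, 13, 19, or 0 if none fail."""
    bad = 0
    if min_q > 0 or max_q < 0x7FFF:
        bad = 11
    if min_h > 0 or max_h < 0x3FFFFFFF:
        bad = 12
    if min_q < min_h or max_q > max_h:
        bad = 13
    if max_q - min_q < 0xFFFF:
        bad = 19
    return bad
```

With the values recommended above, check_word_ranges(0, 0xFFFF, -0xFFFFFFF, 0x3FFFFFFF) reports no problem.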

134. The operation of adding or subtracting min_quarterword occurs quite frequently in TEX, so it is convenient to abbreviate this operation by using the macros qi and qo for input and output to and from quarterword format.

The inner loop of TEX will run faster with respect to compilers that don’t optimize expressions like ‘x + 0 ’ and ‘x - 0 ’, if these macros are simplified in the obvious way when min_quarterword == 0 . So they have been simplified here in the obvious way.

The WEB source for TEX defines hi(#) => # + min_halfword which can be simplified when min_halfword == 0 . The Web2C implementation of TEX can use hi(#) => # together with min_halfword < 0 as long as max_halfword is sufficiently large.

// to put an eight_bits item into a quarterword
@define qi(#) => #
// to take an eight_bits item from a quarterword
@define qo(#) => #
// to put a sixteen-bit item into a halfword
@define hi(#) => #
// to take a sixteen-bit item from a halfword
@define ho(#) => #

135. The reader should study the following definitions closely:

@define sc => int //  scaled data is equivalent to integer 
⟦18 Types in the outer block⟧ += ⟦
    type quarterword = min_quarterword .. max_quarterword;

    type halfword = min_halfword .. max_halfword;

    // used when there are two variants in a record
    type two_choices = 1 .. 2;

    // used when there are four variants in a record
    type four_choices = 1 .. 4;

    verbatim!{#include "texmfmem.h";}

    const word_file = gzFile;
⟧

136. When debugging, we may want to print a memory_word without knowing what type it is; so we print it in all modes.

debug!{
    // prints w in all ways
    function print_word(w: memory_word) {
        print_int(w.int);
        print_char(ord!(" "));
        print_scaled(w.sc);
        print_char(ord!(" "));
        print_scaled(round(unity * float(w.gr)));
        print_ln;
        print_int(w.hh.lh);
        print_char(ord!("="));
        print_int(w.hh.b0);
        print_char(ord!(":"));
        print_int(w.hh.b1);
        print_char(ord!(";"));
        print_int(w.hh.rh);
        print_char(ord!(" "));
        print_int(w.qqqq.b0);
        print_char(ord!(":"));
        print_int(w.qqqq.b1);
        print_char(ord!(":"));
        print_int(w.qqqq.b2);
        print_char(ord!(":"));
        print_int(w.qqqq.b3);
    }
}

137. [9] Dynamic memory allocation. The TEX system does nearly all of its own memory allocation, so that it can readily be transported into environments that do not have automatic facilities for strings, garbage collection, etc., and so that it can be in control of what error messages the user receives. The dynamic storage requirements of TEX are handled by providing a large array mem in which consecutive blocks of words are used as nodes by the TEX routines.

Pointer variables are indices into this array, or into another array called eqtb that will be explained later. A pointer variable might also be a special flag that lies outside the bounds of mem , so we allow pointers to assume any halfword value. The minimum halfword value represents a null pointer. TEX does not assume that mem[null] exists.

// a flag or a location in mem or eqtb 
@define pointer => halfword
@define null => min_halfword // the null pointer
⟦13 Global variables⟧ += ⟦
    // a pointer variable for occasional emergency use
    var temp_ptr: pointer;
⟧

138. The mem array is divided into two regions that are allocated separately, but the dividing line between these two regions is not fixed; they grow together until finding their “natural” size in a particular job. Locations less than or equal to lo_mem_max are used for storing variable-length records consisting of two or more words each. This region is maintained using an algorithm similar to the one described in exercise 2.5–19 of The Art of Computer Programming. However, no size field appears in the allocated nodes; the program is responsible for knowing the relevant size when a node is freed. Locations greater than or equal to hi_mem_min are used for storing one-word records; a conventional AVAIL stack is used for allocation in this region.

Locations of mem between mem_bot and mem_top may be dumped as part of preloaded format files, by the INITEX preprocessor. Production versions of TEX may extend the memory at both ends in order to provide more space; locations between mem_min and mem_bot are always used for variable-size nodes, and locations between mem_top and mem_max are always used for single-word nodes.

The key pointers that govern mem allocation have a prescribed order:

null <= mem_min <= mem_bot < lo_mem_max < hi_mem_min < mem_top <= mem_end <= mem_max.

Empirical tests show that the present implementation of TEX tends to spend about 9% of its running time allocating nodes, and about 6% deallocating them after their use.

⟦13 Global variables⟧ += ⟦
    // the big dynamic storage area
    var yzmem: ^memory_word;

    // the big dynamic storage area
    var zmem: ^memory_word;

    // the largest location of variable-size memory in use
    var lo_mem_max: pointer;

    // the smallest location of one-word memory in use
    var hi_mem_min: pointer;
⟧

139. In order to study the memory requirements of particular applications, it is possible to prepare a version of TEX that keeps track of current and maximum memory usage. When code between the delimiters stat ... tats is not “commented out,” TEX will run a bit slower but it will report these statistics when tracing_stats is sufficiently large.

⟦13 Global variables⟧ += ⟦
    // how much memory is in use
    var var_used, dyn_used: integer;
⟧

140. Let’s consider the one-word memory region first, since it’s the simplest. The pointer variable mem_end holds the highest-numbered location of mem that has ever been used. The free locations of mem that occur between hi_mem_min and mem_end , inclusive, are of type two_halves , and we write info(p) and link(p) for the lh and rh fields of mem[p] when it is of this type. The single-word free locations form a linked list

avail, link(avail), link(link(avail)), ...
terminated by null .

// the link field of a memory word
@define link(#) => mem[#].hh.rh
// the info field of a memory word
@define info(#) => mem[#].hh.lh
⟦13 Global variables⟧ += ⟦
    // head of the list of available one-word nodes
    var avail: pointer;

    // the last one-word node used in mem 
    var mem_end: pointer;
⟧

141. If memory is exhausted, it might mean that the user has forgotten a right brace. We will define some procedures later that try to help pinpoint the trouble.

⟦322 Declare the procedure called |show_token_list|⟧

⟦336 Declare the procedure called |runaway|⟧

142. The function get_avail returns a pointer to a new one-word node whose link field is null. However, TEX will halt if there is no more room left.

If the available-space list is empty, i.e., if avail == null , we try first to increase mem_end . If that cannot be done, i.e., if mem_end == mem_max , we try to decrease hi_mem_min . If that cannot be done, i.e., if hi_mem_min == lo_mem_max + 1 , we have to quit.

// single-word node allocation
function get_avail(): pointer {
    var
      p: pointer; // the new node being got
    
    // get top location in the avail stack
    p = avail;
    if (p != null) {
        // and pop it off
        avail = link(avail);
    } else // or go into virgin territory
    if (mem_end < mem_max) {
        incr(mem_end);
        p = mem_end;
    } else {
        decr(hi_mem_min);
        p = hi_mem_min;
        if (hi_mem_min <= lo_mem_max) {
            // if memory is exhausted, display possible 
            // runaway text
            runaway;
            // quit; all one-word nodes are busy
            overflow(
              strpool!("main memory size"),
              mem_max + 1 - mem_min,
            );
        }
    }
    // provide an oft-desired initialization of the new node
    link(p) = null;
    stat!{
        incr(dyn_used);
        // maintain statistics
    }
    get_avail = p;
}

143. Conversely, a one-word node is recycled by calling free_avail . This routine is part of TEX’s “inner loop,” so we want it to be fast.

// single-word node liberation
@define free_avail(#) =>
    {
        link(#) = avail;
        avail = #;
        stat!{
            decr(dyn_used);
        }
    }
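
The three-way choice made by get_avail (pop the stack, extend upward, extend downward) can be sketched in Python as follows. The class OneWordPool, the dictionary used for link fields, and the use of -1 for null and MemoryError for overflow are all assumptions of the sketch, not part of the program.

```python
NULL = -1  # stand-in for null

class OneWordPool:
    """Toy model of the one-word region of mem."""

    def __init__(self, lo_mem_max, hi_mem_min, mem_max):
        self.link = {}
        self.avail = NULL
        self.lo_mem_max = lo_mem_max
        self.hi_mem_min = hi_mem_min
        self.mem_end = hi_mem_min   # highest location used so far
        self.mem_max = mem_max
        self.dyn_used = 0

    def get_avail(self):
        p = self.avail
        if p != NULL:
            self.avail = self.link[p]          # pop the avail stack
        elif self.mem_end < self.mem_max:
            self.mem_end += 1                  # go into virgin territory
            p = self.mem_end
        else:
            self.hi_mem_min -= 1               # grow downward instead
            p = self.hi_mem_min
            if self.hi_mem_min <= self.lo_mem_max:
                raise MemoryError("main memory size")
        self.link[p] = NULL
        self.dyn_used += 1
        return p

    def free_avail(self, p):
        self.link[p] = self.avail              # push p onto the avail stack
        self.avail = p
        self.dyn_used -= 1
```

A freed location is the first one handed out again, which is the usual behavior of a stack-based free list.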

144. There’s also a fast_get_avail routine, which saves the procedure-call overhead at the expense of extra programming. This routine is used in the places that would otherwise account for the most calls of get_avail .

@define fast_get_avail(#) =>
    {
        // avoid get_avail if possible, to save time
        # = avail;
        if (# == null) {
            # = get_avail;
        } else {
            avail = link(#);
            link(#) = null;
            stat!{
                incr(dyn_used);
            }
        }
    }

145. The procedure flush_list(p) frees an entire linked list of one-word nodes that starts at position p .

// makes list of single-word nodes available
function flush_list(p: pointer) {
    var
      q, r: pointer; // list traversers
    
    if (p != null) {
        r = p;
        repeat {
            q = r;
            r = link(r);
            stat!{
                decr(dyn_used);
            }// now q is the last node on the list
        } until (r == null);
        link(q) = avail;
        avail = p;
    }
}
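
The routine walks the list once to find its last node and then splices the whole list onto the front of the available-space list in constant time. A Python sketch of that splice follows; the dictionary link, the value -1 for null, and the convention of returning the new head of avail are assumptions of the sketch, and the statistics counter is omitted.

```python
NULL = -1  # stand-in for null

def flush_list(link, avail, p):
    """Splice the list starting at p in front of avail; return the new head."""
    if p == NULL:
        return avail
    q = p
    while link[q] != NULL:   # find the last node of the list
        q = link[q]
    link[q] = avail          # the old avail list follows the last node
    return p
```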

146. The available-space list that keeps track of the variable-size portion of mem is a nonempty, doubly-linked circular list of empty nodes, pointed to by the roving pointer rover .

Each empty node has size 2 or more; the first word contains the special value max_halfword in its link field and the size in its info field; the second word contains the two pointers for double linking.

Each nonempty node also has size 2 or more. Its first word is of type two_halves , and its link field is never equal to max_halfword . Otherwise there is complete flexibility with respect to the contents of its other fields and its other words.

(We require mem_max < max_halfword because terrible things can happen when max_halfword appears in the link field of a nonempty node.)

// the link of an empty variable-size node
@define empty_flag => max_halfword
@define is_empty(#) =>
    (link(#) == empty_flag) // tests for empty node
// the size field in empty variable-size nodes
@define node_size => info
// left link in doubly-linked list of empty nodes
@define llink(#) => info(# + 1)
// right link in doubly-linked list of empty nodes
@define rlink(#) => link(# + 1)
⟦13 Global variables⟧ += ⟦
    // points to some node in the list of empties
    var rover: pointer;
⟧

147. A call to get_node with argument s returns a pointer to a new node of size s , which must be 2 or more. The link field of the first word of this new node is set to null. An overflow stop occurs if no suitable space exists.

If get_node is called with s = 2^30, it simply merges adjacent free areas and returns the value max_halfword.

// variable-size node allocation
function get_node(s: integer): pointer {
    label found, exit, restart;
    var
      p: pointer, // the node currently under inspection
      q: pointer, // the node physically after node p 
      r: integer, // the newly allocated node, or a 
      // candidate for this honor
      t: integer; // temporary register
    
  restart:
    // start at some free node in the ring
    p = rover;
    repeat {
        ⟦149 Try to allocate within node |p| and its physical successors, and |goto found| if allocation was possible⟧
        // move to the next node in the ring
        p = rlink(p);// repeat until the whole list has been 
        // traversed
    } until (p == rover);
    if (s == 0x40000000) {
        get_node = max_halfword;
        return;
    }
    if (lo_mem_max + 2 < hi_mem_min) {
        if (lo_mem_max + 2 <= mem_bot + max_halfword) {
            ⟦148 Grow more variable-size memory and |goto restart|⟧
        }
    }
    // sorry, nothing satisfactory is left
    overflow(
      strpool!("main memory size"),
      mem_max + 1 - mem_min,
    );
  found:
    // this node is now nonempty
    link(r) = null;
    stat!{
        // maintain usage statistics
        var_used = var_used + s;
    }
    ⟦1714 Initialize bigger nodes with {\sl Sync\TeX} information⟧
    get_node = r;
  exit:
}

148. The lower part of mem grows by 1000 words at a time, unless we are very close to going under. When it grows, we simply link a new node into the available-space list. This method of controlled growth helps to keep the mem usage consecutive when TEX is implemented on “virtual memory” systems.

⟦148 Grow more variable-size memory and |goto restart|⟧ = ⟦
    {
        if (hi_mem_min - lo_mem_max >= 1998) {
            t = lo_mem_max + 1000;
        } else {
            //  lo_mem_max + 2 <= t < hi_mem_min 
            t = 
                lo_mem_max
                + 1 + (hi_mem_min - lo_mem_max) div 2
            ;
        }
        p = llink(rover);
        q = lo_mem_max;
        rlink(p) = q;
        llink(rover) = q;
        if (t > mem_bot + max_halfword) {
            t = mem_bot + max_halfword;
        }
        rlink(q) = rover;
        llink(q) = p;
        link(q) = empty_flag;
        node_size(q) = t - lo_mem_max;
        lo_mem_max = t;
        link(lo_mem_max) = null;
        info(lo_mem_max) = null;
        rover = q;
        goto restart;
    }
⟧
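
The choice of the new boundary t can be checked separately. The Python sketch below copies the arithmetic of this section into a hypothetical function grow_target; it assumes the caller has already verified lo_mem_max + 2 < hi_mem_min, as get_node does.

```python
def grow_target(lo_mem_max, hi_mem_min, mem_bot, max_halfword):
    """Return the new lo_mem_max chosen when the lower region grows."""
    if hi_mem_min - lo_mem_max >= 1998:
        t = lo_mem_max + 1000                              # normal growth
    else:
        t = lo_mem_max + 1 + (hi_mem_min - lo_mem_max) // 2  # split the gap
    return min(t, mem_bot + max_halfword)
```

For a gap of at least 3 words the second branch always gives lo_mem_max + 2 <= t < hi_mem_min, so the new empty node has at least two words and the two regions never collide.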

149. Empirical tests show that the routine in this section performs a node-merging operation about 0.75 times per allocation, on the average, after which it finds that r > p + 1 about 95% of the time.

⟦149 Try to allocate within node |p| and its physical successors, and |goto found| if allocation was possible⟧ = ⟦
    q = p + node_size(p); // find the physical successor

    // merge node p with node q 
    while (is_empty(q)) {
        t = rlink(q);
        if (q == rover) {
            rover = t;
        }
        llink(t) = llink(q);
        rlink(llink(q)) = t;
        q = q + node_size(q);
    }

    r = q - s;

    if (r > p + 1) {
        ⟦150 Allocate from the top of node |p| and |goto found|⟧
    }

    if (r == p) {
        if (rlink(p) != p) {
            ⟦151 Allocate entire node |p| and |goto found|⟧
        }
    }

    node_size(p) = q - p; // reset the size in case it grew
⟧

150.

⟦150 Allocate from the top of node |p| and |goto found|⟧ = ⟦
    {
        // store the remaining size
        node_size(p) = r - p;
        // start searching here next time
        rover = p;
        goto found;
    }
⟧

151. Here we delete node p from the ring, and let rover rove around.

⟦151 Allocate entire node |p| and |goto found|⟧ = ⟦
    {
        rover = rlink(p);
        t = llink(p);
        llink(rover) = t;
        rlink(t) = rover;
        goto found;
    }
⟧

152. Conversely, when some variable-size node p of size s is no longer needed, the operation free_node(p, s) will make its words available, by inserting p as a new empty node just before where rover now points.

// variable-size node liberation
function free_node(p: pointer, s: halfword) {
    var
      q: pointer; //  llink ( rover ) 
    
    node_size(p) = s;
    link(p) = empty_flag;
    q = llink(rover);
    llink(p) = q;
    // set both links
    rlink(p) = rover;
    llink(rover) = p;
    // insert p into the ring
    rlink(q) = p;
    stat!{
        var_used = var_used - s;
        // maintain statistics
    }
}
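
The interplay of get_node and free_node is easier to follow on a toy model. The Python sketch below keeps the info and link fields in two lists and starts with a single empty node; it omits memory growth, statistics, and the SyncTEX initialization. The class VarMem, the stand-in values for empty_flag and null, and the use of MemoryError for overflow are assumptions made for the illustration.

```python
EMPTY = 10**9  # stand-in for empty_flag (max_halfword)
NULL = -1      # stand-in for null

class VarMem:
    """Toy variable-size region: initially one empty node covering [0, size)."""

    def __init__(self, size):
        self.info = [0] * (size + 1)
        self.link = [NULL] * (size + 1)           # link[size] stays NULL: a sentinel
        self.info[0], self.link[0] = size, EMPTY  # node_size and empty_flag
        self.info[1], self.link[1] = 0, 0         # llink = rlink = itself
        self.rover = 0

    def get_node(self, s):
        info, link = self.info, self.link
        p = self.rover
        while True:
            q = p + info[p]                       # physical successor of p
            while link[q] == EMPTY:               # merge p with empty successors
                t = link[q + 1]
                if q == self.rover:
                    self.rover = t
                info[t + 1] = info[q + 1]         # llink(t) = llink(q)
                link[info[q + 1] + 1] = t         # rlink(llink(q)) = t
                q += info[q]
            r = q - s
            if r > p + 1:                         # allocate from the top of p
                info[p] = r - p
                self.rover = p
                break
            if r == p and link[p + 1] != p:       # allocate all of p
                self.rover = link[p + 1]
                t = info[p + 1]
                info[self.rover + 1] = t
                link[t + 1] = self.rover
                break
            info[p] = q - p                       # reset the size in case it grew
            p = link[p + 1]
            if p == self.rover:                   # whole ring traversed
                raise MemoryError("main memory size")
        link[r] = NULL
        return r

    def free_node(self, p, s):
        info, link = self.info, self.link
        info[p], link[p] = s, EMPTY
        q = info[self.rover + 1]                  # llink(rover)
        info[p + 1], link[p + 1] = q, self.rover  # set both links of p
        info[self.rover + 1] = p
        link[q + 1] = p
```

In this model a request that would consume the only remaining empty node is refused, which reflects the requirement that the ring of empty nodes never becomes empty.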

153. Just before INITEX writes out the memory, it sorts the doubly linked available space list. The list is probably very short at such times, so a simple insertion sort is used. The smallest available location will be pointed to by rover , the next-smallest by rlink(rover) , etc.

init!{
    // sorts the available variable-size nodes by location
    function sort_avail() {
        var
          p, q, r: pointer, // indices into mem 
          old_rover: pointer; // initial rover setting
        
        // merge adjacent free areas
        p = get_node(0x40000000);
        p = rlink(rover);
        rlink(rover) = max_halfword;
        old_rover = rover;
        while (p != old_rover) {
            ⟦154 Sort \(p)|p| into the list starting at |rover| and advance |p| to |rlink(p)|⟧
        }
        p = rover;
        while (rlink(p) != max_halfword) {
            llink(rlink(p)) = p;
            p = rlink(p);
        }
        rlink(p) = rover;
        llink(rover) = p;
    }
}

154. The following while loop is guaranteed to terminate, since the list that starts at rover ends with max_halfword during the sorting procedure.

⟦154 Sort \(p)|p| into the list starting at |rover| and advance |p| to |rlink(p)|⟧ = ⟦
    if (p < rover) {
        q = p;
        p = rlink(q);
        rlink(q) = rover;
        rover = q;
    } else {
        q = rover;
        while (rlink(q) < p) {
            q = rlink(q);
        }
        r = rlink(p);
        rlink(p) = rlink(q);
        rlink(q) = p;
        p = r;
    }
⟧
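
The sorting step can be followed on a small ring. The Python sketch below keeps rlink and llink in dictionaries and uses a large constant in place of max_halfword; the function signature and the convention of returning the new rover are assumptions of the sketch, and the preliminary merging call to get_node is left out.

```python
END = 10**9  # stand-in for max_halfword, larger than any location

def sort_avail(rlink, llink, rover):
    """Sort the circular list of empty nodes by location; return the new rover."""
    p = rlink[rover]
    rlink[rover] = END            # the sorted list is a chain ending in END
    old_rover = rover
    while p != old_rover:
        if p < rover:             # p becomes the new smallest element
            q = p
            p = rlink[q]
            rlink[q] = rover
            rover = q
        else:                     # find the insertion point for p
            q = rover
            while rlink[q] < p:
                q = rlink[q]
            r = rlink[p]
            rlink[p] = rlink[q]
            rlink[q] = p
            p = r
    p = rover
    while rlink[p] != END:        # rebuild the left links
        llink[rlink[p]] = p
        p = rlink[p]
    rlink[p] = rover              # close the ring again
    llink[rover] = p
    return rover
```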

155. [10] Data structures for boxes and their friends. From the computer’s standpoint, TEX’s chief mission is to create horizontal and vertical lists. We shall now investigate how the elements of these lists are represented internally as nodes in the dynamic memory.

A horizontal or vertical list is linked together by link fields in the first word of each node. Individual nodes represent boxes, glue, penalties, or special things like discretionary hyphens; because of this variety, some nodes are longer than others, and we must distinguish different kinds of nodes. We do this by putting a ‘type ’ field in the first word, together with the link and an optional ‘subtype ’.

// identifies what kind of node this is
@define type(#) => mem[#].hh.b0
// secondary identification in some cases
@define subtype(#) => mem[#].hh.b1

156. A char_node , which represents a single character, is the most important kind of node because it accounts for the vast majority of all boxes. Special precautions are therefore taken to ensure that a char_node does not take up much memory space. Every such node is one word long, and in fact it is identifiable by this property, since other kinds of nodes have at least two words, and they appear in mem locations less than hi_mem_min . This makes it possible to omit the type field in a char_node , leaving us room for two bytes that identify a font and a character within that font.

Note that the format of a char_node allows for up to 256 different fonts and up to 256 characters per font; but most implementations will probably limit the total number of fonts to fewer than 75 per job, and most fonts will stick to characters whose codes are less than 128 (since higher codes are more difficult to access on most keyboards).

Extensions of TEX intended for oriental languages will need even more than 256×256 possible characters, when we consider different sizes and styles of type. It is suggested that Chinese and Japanese fonts be handled by representing such characters in two consecutive char_node entries: The first of these has font == font_base , and its link points to the second; the second identifies the font and the character dimensions. The saving feature about oriental characters is that most of them have the same box dimensions. The character field of the first char_node is a “\charext” that distinguishes between graphic symbols whose dimensions are identical for typesetting purposes. (See the manual.) Such an extension of TEX would not be difficult; further details are left to the reader.

In order to make sure that the character code fits in a quarterword, TEX adds the quantity min_quarterword to the actual code.

Character nodes appear only in horizontal lists, never in vertical lists.

@define is_char_node(#) =>
    (# >= hi_mem_min) // does the argument point to a 
    // char_node ?
@define font => type // the font code in a char_node 
// the character code in a char_node 
@define character => subtype

157. An hlist_node stands for a box that was made from a horizontal list. Each hlist_node is seven words long, and contains the following fields (in addition to the mandatory type and link , which we shall not mention explicitly when discussing the other node types): The height and width and depth are scaled integers denoting the dimensions of the box. There is also a shift_amount field, a scaled integer indicating how much this box should be lowered (if it appears in a horizontal list), or how much it should be moved to the right (if it appears in a vertical list). There is a list_ptr field, which points to the beginning of the list from which this box was fabricated; if list_ptr is null , the box is empty. Finally, there are three fields that represent the setting of the glue: glue_set(p) is a word of type glue_ratio that represents the proportionality constant for glue setting; glue_sign(p) is stretching or shrinking or normal depending on whether or not the glue should stretch or shrink or remain rigid; and glue_order(p) specifies the order of infinity to which glue setting applies (normal , fil , fill , or filll ). The subtype field is not used in TEX. In 𝜀-TEX the subtype field records the box direction mode box_lr .

// Declare the {\sl Sync\TeX} field size to store the {\sl 
// Sync\TeX} information: we will put file tag and line into 
// lh and rh fields of one word
@define synctex_field_size => 1
// The tag subfield
@define sync_tag(#) => mem[# - synctex_field_size].hh.lh
// The line subfield
@define sync_line(#) => mem[# - synctex_field_size].hh.rh
@define hlist_node => 0 //  type of hlist nodes
// number of words to allocate for a box node
@define box_node_size => 7 + synctex_field_size
// position of width field in a box node
@define width_offset => 1
// position of depth field in a box node
@define depth_offset => 2
// position of height field in a box node
@define height_offset => 3
// width of the box, in sp
@define width(#) => mem[# + width_offset].sc
// depth of the box, in sp
@define depth(#) => mem[# + depth_offset].sc
// height of the box, in sp
@define height(#) => mem[# + height_offset].sc
// repositioning distance, in sp
@define shift_amount(#) => mem[# + 4].sc
// position of list_ptr field in a box node
@define list_offset => 5
// beginning of the list inside the box
@define list_ptr(#) => link(# + list_offset)
// applicable order of infinity
@define glue_order(#) => subtype(# + list_offset)
// stretching or shrinking
@define glue_sign(#) => type(# + list_offset)
// the most common case when several cases are named
@define normal => 0
// glue setting applies to the stretch components
@define stretching => 1
// glue setting applies to the shrink components
@define shrinking => 2
// position of glue_set in a box node
@define glue_offset => 6
// a word of type glue_ratio for glue setting
@define glue_set(#) => mem[# + glue_offset].gr

158. The new_null_box function returns a pointer to an hlist_node in which all subfields have the values corresponding to ‘\hbox{}’. (The subtype field is set to min_quarterword , for historic reasons that are no longer relevant.)

// creates a new box node
function new_null_box(): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(box_node_size);
    type(p) = hlist_node;
    subtype(p) = min_quarterword;
    width(p) = 0;
    depth(p) = 0;
    height(p) = 0;
    shift_amount(p) = 0;
    list_ptr(p) = null;
    glue_sign(p) = normal;
    glue_order(p) = normal;
    set_glue_ratio_zero(glue_set(p));
    new_null_box = p;
}
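
The field layout of a box node can be pictured with a toy model in which each memory word is a Python dictionary. This is only a sketch under stated assumptions: the keys b0, b1, rh, sc, and gr mimic the field names, -1 stands for null, the node is assumed to be already allocated at p, and the extra SyncTEX word is not modelled.

```python
NULL = -1       # stand-in for null
HLIST_NODE = 0
NORMAL = 0
WIDTH_OFFSET, DEPTH_OFFSET, HEIGHT_OFFSET = 1, 2, 3
SHIFT_OFFSET, LIST_OFFSET, GLUE_OFFSET = 4, 5, 6

def new_null_box(mem, p):
    """Fill the seven words starting at p as an empty hlist node."""
    mem[p] = {"b0": HLIST_NODE, "b1": 0, "rh": NULL}  # type, subtype, link
    for off in (WIDTH_OFFSET, DEPTH_OFFSET, HEIGHT_OFFSET, SHIFT_OFFSET):
        mem[p + off] = {"sc": 0}
    # glue_sign (b0), glue_order (b1), and list_ptr (rh) share one word
    mem[p + LIST_OFFSET] = {"b0": NORMAL, "b1": NORMAL, "rh": NULL}
    mem[p + GLUE_OFFSET] = {"gr": 0.0}
    return p

def list_ptr(mem, p):
    return mem[p + LIST_OFFSET]["rh"]

def glue_sign(mem, p):
    return mem[p + LIST_OFFSET]["b0"]

def width(mem, p):
    return mem[p + WIDTH_OFFSET]["sc"]
```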

159. A vlist_node is like an hlist_node in all respects except that it contains a vertical list.

@define vlist_node => 1 //  type of vlist nodes

160. A rule_node stands for a solid black rectangle; it has width, depth, and height fields just as in an hlist_node. However, if any of these dimensions is -2^30, the actual value will be determined by running the rule up to the boundary of the innermost enclosing box. This is called a “running dimension.” The width is never running in an hlist; the height and depth are never running in a vlist.

@define rule_node => 2 //  type of rule nodes
// number of words to allocate for a rule node
@define rule_node_size => 4 + synctex_field_size
// $-2^{30}$, signifies a missing item
@define null_flag => -0x40000000
@define is_running(#) =>
    (# == null_flag) // tests for a running dimension
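
A sentinel far outside the range of sensible dimensions is needed because 0 is itself a legitimate width, height, or depth. The Python sketch below shows the test; the helper resolve and its parameter enclosing are hypothetical and only hint at how a running dimension might later be replaced by the size of the enclosing box.

```python
NULL_FLAG = -(1 << 30)  # -2^30, the value of null_flag

def is_running(d):
    """True if the dimension d is a running dimension."""
    return d == NULL_FLAG

def resolve(d, enclosing):
    """Use the enclosing box's dimension when d is running (illustrative only)."""
    return enclosing if is_running(d) else d
```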

161. A new rule node is delivered by the new_rule function. It makes all the dimensions “running,” so you have to change the ones that are not allowed to run.

function new_rule(): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(rule_node_size);
    type(p) = rule_node;
    // the subtype is not used
    subtype(p) = 0;
    width(p) = null_flag;
    depth(p) = null_flag;
    height(p) = null_flag;
    new_rule = p;
}

162. Insertions are represented by ins_node records, where the subtype indicates the corresponding box number. For example, ‘\insert 250’ leads to an ins_node whose subtype is 250 + min_quarterword . The height field of an ins_node is slightly misnamed; it actually holds the natural height plus depth of the vertical list being inserted. The depth field holds the split_max_depth to be used in case this insertion is split, and the split_top_ptr points to the corresponding split_top_skip . The float_cost field holds the floating_penalty that will be used if this insertion floats to a subsequent page after a split insertion of the same class. There is one more field, the ins_ptr , which points to the beginning of the vlist for the insertion.

@define ins_node => 3 //  type of insertion nodes
// number of words to allocate for an insertion
@define ins_node_size => 5
// the floating_penalty to be used
@define float_cost(#) => mem[# + 1].int
// the vertical list to be inserted
@define ins_ptr(#) => info(# + 4)
// the split_top_skip to be used
@define split_top_ptr(#) => link(# + 4)

163. A mark_node has a mark_ptr field that points to the reference count of a token list that contains the user’s \mark text. In addition there is a mark_class field that contains the mark class.

@define mark_node => 4 //  type of a mark node
// number of words to allocate for most node types
@define small_node_size => 2
// number of words to allocate for synchronized node types 
// like math, kern, glue and penalty nodes
@define medium_node_size =>
    small_node_size + synctex_field_size
// head of the token list for a mark
@define mark_ptr(#) => link(# + 1)
@define mark_class(#) => info(# + 1) // the mark class

164. An adjust_node , which occurs only in horizontal lists, specifies material that will be moved out into the surrounding vertical list; i.e., it is used to implement TEX’s ‘\vadjust’ operation. The adjust_ptr field points to the vlist containing this material.

@define adjust_node => 5 //  type of an adjust node
@define adjust_pre =>
    //  if subtype != 0 it is pre-adjustment
    subtype;
    //  append_list is used to append a list to tail 
@define append_list(#) =>
    {
        link(tail) = link(#);
        append_list_end
    /* ... closed later ... */
@define append_list_end(#) =>
    /*... opened earlier ...*/
        tail = #;
    }
// vertical list to be moved out of horizontal list
@define adjust_ptr(#) => mem[# + 1].int

165. A ligature_node , which occurs only in horizontal lists, specifies a character that was fabricated from the interaction of two or more actual characters. The second word of the node, which is called the lig_char word, contains font and character fields just as in a char_node . The characters that generated the ligature have not been forgotten, since they are needed for diagnostic messages and for hyphenation; the lig_ptr field points to a linked list of character nodes for all original characters that have been deleted. (This list might be empty if the characters that generated the ligature were retained in other nodes.)

The subtype field is 0, plus 2 and/or 1 if the original source of the ligature included implicit left and/or right boundaries.

@define ligature_node => 6 //  type of a ligature node
// the word where the ligature is to be found
@define lig_char(#) => # + 1
// the list of characters
@define lig_ptr(#) => link(lig_char(#))
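
The subtype arithmetic can be read as two independent flag bits. The following Python sketch is only an illustration: the names LEFT_BOUNDARY and RIGHT_BOUNDARY are hypothetical (the program uses the bare numbers 2 and 1), and it assumes that 2 marks the left boundary and 1 the right, following the order in which the sentence above lists them.

```python
# Hypothetical flag names; the program itself only uses the raw numbers.
LEFT_BOUNDARY = 2   # ligature absorbed an implicit left boundary (assumed mapping)
RIGHT_BOUNDARY = 1  # ligature absorbed an implicit right boundary (assumed mapping)


def ligature_subtype(left: bool, right: bool) -> int:
    """Build the subtype value: 0, plus 2 and/or 1."""
    return (LEFT_BOUNDARY if left else 0) + (RIGHT_BOUNDARY if right else 0)


def has_left_boundary(subtype: int) -> bool:
    return subtype > 1


def has_right_boundary(subtype: int) -> bool:
    return subtype % 2 == 1
```

Under this reading a freshly made ligature node (subtype 0, as set by new_ligature) has neither boundary.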

166. The new_ligature function creates a ligature node having given contents of the font , character , and lig_ptr fields. We also have a new_lig_item function, which returns a two-word node having a given character field. Such nodes are used for temporary processing as ligatures are being created.

function new_ligature(
  f: internal_font_number,
  c: quarterword,
  q: pointer,
): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(small_node_size);
    type(p) = ligature_node;
    font(lig_char(p)) = f;
    character(lig_char(p)) = c;
    lig_ptr(p) = q;
    subtype(p) = 0;
    new_ligature = p;
}

function new_lig_item(c: quarterword): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(small_node_size);
    character(p) = c;
    lig_ptr(p) = null;
    new_lig_item = p;
}

167. A disc_node , which occurs only in horizontal lists, specifies a “discretionary” line break. If such a break occurs at node p , the text that starts at pre_break(p) will precede the break, the text that starts at post_break(p) will follow the break, and text that appears in the next replace_count(p) nodes will be ignored. For example, an ordinary discretionary hyphen, indicated by ‘\-’, yields a disc_node with pre_break pointing to a char_node containing a hyphen, post_break == null , and replace_count == 0 . All three of the discretionary texts must be lists that consist entirely of character, kern, box, rule, and ligature nodes.

If pre_break(p) == null , the ex_hyphen_penalty will be charged for this break. Otherwise the hyphen_penalty will be charged. The texts will actually be substituted into the list by the line-breaking algorithm if it decides to make the break, and the discretionary node will disappear at that time; thus, the output routine sees only discretionaries that were not chosen.

@define disc_node => 7 //  type of a discretionary node
// how many subsequent nodes to replace
@define replace_count => subtype
// text that precedes a discretionary break
@define pre_break => llink
// text that follows a discretionary break
@define post_break => rlink
// creates an empty disc_node 
function new_disc(): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(small_node_size);
    type(p) = disc_node;
    replace_count(p) = 0;
    pre_break(p) = null;
    post_break(p) = null;
    new_disc = p;
}
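
To make the roles of pre_break, post_break, and replace_count concrete, here is a small Python model of what happens when a discretionary break is taken. It is a sketch, not the line-breaking code of TEX: nodes are modeled as plain strings, and Disc, take_break, and break_penalty are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Disc:
    pre_break: List[str] = field(default_factory=list)   # text that precedes the break
    post_break: List[str] = field(default_factory=list)  # text that follows the break
    replace_count: int = 0  # following nodes ignored if the break is taken


def take_break(items: list, i: int) -> Tuple[list, list]:
    """Split items at the discretionary items[i], as if that break were chosen."""
    d = items[i]
    line_end = items[:i] + d.pre_break
    next_line = d.post_break + items[i + 1 + d.replace_count:]
    return line_end, next_line


def break_penalty(d: Disc, hyphen_penalty: int, ex_hyphen_penalty: int) -> int:
    """An empty pre-break text is charged ex_hyphen_penalty, otherwise hyphen_penalty."""
    return ex_hyphen_penalty if not d.pre_break else hyphen_penalty


# "difficult": the ffi ligature is replaced by "f-" and "fi" if the break is taken.
word = ["d", "i", Disc(["f", "-"], ["fi"], 1), "ffi", "c", "u", "l", "t"]
```

If the break is not taken, the discretionary simply stays in the list and the next replace_count nodes are typeset as usual.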

168. A whatsit_node is a wild card reserved for extensions to TEX. The subtype field in its first word says what ‘\whatsit’ it is, and implicitly determines the node size (which must be 2 or more) and the format of the remaining words. When a whatsit_node is encountered in a list, special actions are invoked; knowledgeable people who are careful not to mess up the rest of TEX are able to make TEX do new things by adding code at the end of the program. For example, there might be a ‘TEXnicolor’ extension to specify different colors of ink, and the whatsit node might contain the desired parameters.

The present implementation of TEX treats the features associated with ‘\write’ and ‘\special’ as if they were extensions, in order to illustrate how such routines might be coded. We shall defer further discussion of extensions until the end of this program.

//  type of special extension nodes
@define whatsit_node => 8

169. To support “native” fonts, we build native_word_node s, which are variable size whatsits. These have the same width , depth , and height fields as a box_node , at offsets 1-3, and then a word containing a size field for the node, a font number, a length, and a glyph count. Then there is a field containing a C pointer to a glyph info array; this and the glyph count are set by set_native_metrics . Copying and freeing of these nodes needs to take account of this! This is followed by 2 * length bytes, for the actual characters of the string (in UTF-16).

So native_node_size , which does not include any space for the actual text, is 6.

Whatsit subtypes 0-3 are used for open, write, close, and special; 4 is language; pdfTEX uses subtypes up through 30-something, so we use subtypes starting from 40.

There are also glyph_node s; these are like native_word_node s in having width , depth , and height fields, but then they contain a glyph ID rather than size and length fields, and there’s no subsidiary C pointer.

//  subtype of whatsits that hold native_font words
@define native_word_node => 40
// a native_word_node that should output ActualText
@define native_word_node_AT => 41
@define is_native_word_subtype(#) =>
    (
        (subtype(#) >= native_word_node)
        && (subtype(#) <= native_word_node_AT)
    )
//  subtype in whatsits that hold glyph numbers
@define glyph_node => 42
// size of a native_word node (plus the actual chars) -- see 
// also \.{xetex.h}
@define native_node_size => 6
@define glyph_node_size => 5
@define native_size(#) => mem[# + 4].qqqq.b0
@define native_font(#) => mem[# + 4].qqqq.b1
@define native_length(#) => mem[# + 4].qqqq.b2
@define native_glyph_count(#) => mem[# + 4].qqqq.b3
@define native_glyph_info_ptr(#) => mem[# + 5].ptr
// number of bytes of info per glyph: 16-bit glyph ID, 
// 32-bit x and y coords
@define native_glyph_info_size => 10
// in glyph_node s, we store the glyph number here
@define native_glyph => native_length
@define free_native_glyph_info(#) =>
    {
        if (native_glyph_info_ptr(#) != null_ptr) {
            libc_free(native_glyph_info_ptr(#));
            native_glyph_info_ptr(#) = null_ptr;
            native_glyph_count(#) = 0;
        }
    }
function copy_native_glyph_info(
  src: pointer,
  dest: pointer,
) {
    var glyph_count: integer;
    
    if (native_glyph_info_ptr(src) != null_ptr) {
        glyph_count = native_glyph_count(src);
        native_glyph_info_ptr(dest) = xmalloc_array(
          char,
          glyph_count * native_glyph_info_size,
        );
        memcpy(
          native_glyph_info_ptr(dest),
          native_glyph_info_ptr(src),
          glyph_count * native_glyph_info_size,
        );
        native_glyph_count(dest) = glyph_count;
    }
}
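
The size bookkeeping for these nodes can be sketched in Python. This is an illustration under stated assumptions: BYTES_PER_WORD = 8 is an assumed size for one word of mem (it depends on the build), the text is assumed to be padded up to a whole number of words, and both helper names are hypothetical. Only native_node_size = 6, the 2 * length bytes of UTF-16 text, and native_glyph_info_size = 10 come from the description above.

```python
NATIVE_NODE_SIZE = 6         # words before the text, from the definitions above
NATIVE_GLYPH_INFO_SIZE = 10  # bytes per glyph: 2 (glyph ID) + 4 (x) + 4 (y)
BYTES_PER_WORD = 8           # ASSUMPTION: size in bytes of one mem word


def native_word_node_words(length: int, bytes_per_word: int = BYTES_PER_WORD) -> int:
    """Words needed for a native word node holding `length` UTF-16 code units."""
    text_bytes = 2 * length
    text_words = -(-text_bytes // bytes_per_word)  # round up to whole words
    return NATIVE_NODE_SIZE + text_words


def glyph_info_bytes(glyph_count: int) -> int:
    """Bytes in the separately allocated glyph info array."""
    return glyph_count * NATIVE_GLYPH_INFO_SIZE
```

The second helper computes the same product that copy_native_glyph_info passes to xmalloc_array and memcpy.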

170. Picture files are handled with nodes that include fields for the transform associated with the picture, and a pathname for the picture file itself. They also have the width , depth , and height fields of a box_node at offsets 1-3. (depth will always be zero, as it happens.)

So pic_node_size, which does not include any space for the picture file pathname, is 9 (the fields defined below occupy words 4 through 8).

A pdf_node is just like a pic_node, but it generates a different XDV file code.

//  subtype in whatsits that hold picture file references
@define pic_node => 43
//  subtype in whatsits that hold PDF page references
@define pdf_node => 44
@define pic_node_size => 9 // must sync with \.{xetex.h}
@define pic_path_length(#) => mem[# + 4].hh.b0
@define pic_page(#) => mem[# + 4].hh.b1
@define pic_transform1(#) => mem[# + 5].hh.lh
@define pic_transform2(#) => mem[# + 5].hh.rh
@define pic_transform3(#) => mem[# + 6].hh.lh
@define pic_transform4(#) => mem[# + 6].hh.rh
@define pic_transform5(#) => mem[# + 7].hh.lh
@define pic_transform6(#) => mem[# + 7].hh.rh
@define pic_pdf_box(#) => mem[# + 8].hh.b0

171. A math_node, which occurs only in horizontal lists, appears before and after mathematical formulas. The subtype field is before in the node that precedes the formula and after in the node that follows it. There is a width field, which represents the amount of surrounding space inserted by \mathsurround.

In addition a math_node with subtype > after and width == 0 will be (ab)used to record a regular math_node reinserted after being discarded at a line break or one of the text direction primitives ( \beginL, \endL, \beginR, and \endR ).

@define math_node => 9 //  type of a math node
//  subtype for math node that introduces a formula
@define before => 0
//  subtype for math node that winds up a formula
@define after => 1
@define M_code => 2
//  subtype for \.{\\beginM} node
@define begin_M_code => M_code + before
//  subtype for \.{\\endM} node
@define end_M_code => M_code + after
@define L_code => 4
//  subtype for \.{\\beginL} node
@define begin_L_code => L_code + begin_M_code
//  subtype for \.{\\endL} node
@define end_L_code => L_code + end_M_code
@define R_code => L_code + L_code
//  subtype for \.{\\beginR} node
@define begin_R_code => R_code + begin_M_code
//  subtype for \.{\\endR} node
@define end_R_code => R_code + end_M_code
@define end_LR(#) => odd(subtype(#))
@define end_LR_type(#) =>
    (L_code * (subtype(#) div L_code) + end_M_code)
@define begin_LR_type(#) => (# - after + before)
function new_math(w: scaled, s: small_number): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(medium_node_size);
    type(p) = math_node;
    subtype(p) = s;
    width(p) = w;
    new_math = p;
}
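
The subtype codes above are built by addition so that the low bit says "begin or end" and the multiples of L_code say which kind of group is meant. A short Python sketch of the arithmetic follows; the constants mirror the definitions, while the helper functions take a subtype value directly instead of a node pointer, which is a simplification made for this example.

```python
BEFORE, AFTER = 0, 1
M_CODE = 2
L_CODE = 4
R_CODE = L_CODE + L_CODE

BEGIN_M_CODE = M_CODE + BEFORE        # 2
END_M_CODE = M_CODE + AFTER           # 3
BEGIN_L_CODE = L_CODE + BEGIN_M_CODE  # 6
END_L_CODE = L_CODE + END_M_CODE      # 7
BEGIN_R_CODE = R_CODE + BEGIN_M_CODE  # 10
END_R_CODE = R_CODE + END_M_CODE      # 11


def end_LR(subtype: int) -> bool:
    """Odd subtypes close a group."""
    return subtype % 2 == 1


def end_LR_type(subtype: int) -> int:
    """The end code that matches the group this subtype belongs to."""
    return L_CODE * (subtype // L_CODE) + END_M_CODE


def begin_LR_type(subtype: int) -> int:
    """The begin code that matches a given end code."""
    return subtype - AFTER + BEFORE
```

For instance, end_LR_type maps both begin_R_code and end_R_code to end_R_code, because the integer division discards the low two bits.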

172. TEX makes use of the fact that hlist_node , vlist_node , rule_node , ins_node , mark_node , adjust_node , ligature_node , disc_node , whatsit_node , and math_node are at the low end of the type codes, by permitting a break at glue in a list if and only if the type of the previous node is less than math_node . Furthermore, a node is discarded after a break if its type is math_node or more.

@define precedes_break(#) => (type(#) < math_node)
@define non_discardable(#) => (type(#) < math_node)
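
The two macros can be exercised with a small table of type codes. In this Python sketch the codes 3 through 12 come from the definitions in these sections; hlist_node = 0, vlist_node = 1, and rule_node = 2 are taken from earlier parts of the program, and glue_break_allowed is a hypothetical helper name.

```python
TYPE = {
    "hlist_node": 0, "vlist_node": 1, "rule_node": 2, "ins_node": 3,
    "mark_node": 4, "adjust_node": 5, "ligature_node": 6, "disc_node": 7,
    "whatsit_node": 8, "math_node": 9, "glue_node": 10, "kern_node": 11,
    "penalty_node": 12,
}
MATH_NODE = TYPE["math_node"]


def precedes_break(node_type: int) -> bool:
    return node_type < MATH_NODE


def non_discardable(node_type: int) -> bool:
    return node_type < MATH_NODE


def glue_break_allowed(prev_type: int) -> bool:
    """A break at glue is permitted only after a node of low type code."""
    return precedes_break(prev_type)
```

So glue that follows a box or a discretionary is a legal breakpoint, while glue that follows a kern, a penalty, other glue, or a math node is not.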

173. A glue_node represents glue in a list. However, it is really only a pointer to a separate glue specification, since TEX makes use of the fact that many essentially identical nodes of glue are usually present. If p points to a glue_node , glue_ptr(p) points to another packet of words that specify the stretch and shrink components, etc.

Glue nodes also serve to represent leaders; the subtype is used to distinguish between ordinary glue (which is called normal ) and the three kinds of leaders (which are called a_leaders , c_leaders , and x_leaders ). The leader_ptr field points to a rule node or to a box node containing the leaders; it is set to null in ordinary glue nodes.

Many kinds of glue are computed from TEX’s “skip” parameters, and it is helpful to know which parameter has led to a particular glue node. Therefore the subtype is set to indicate the source of glue, whenever it originated as a parameter. We will be defining symbolic names for the parameter numbers later (e.g., line_skip_code == 0 , baseline_skip_code == 1 , etc.); it suffices for now to say that the subtype of parametric glue will be the same as the parameter number, plus one.

In math formulas there are two more possibilities for the subtype in a glue node: mu_glue denotes an \mskip (where the units are scaled mu instead of scaled pt); and cond_math_glue denotes the ‘\nonscript’ feature that cancels the glue node immediately following if it appears in a subscript.

//  type of node that points to a glue specification
@define glue_node => 10
// special subtype to suppress glue in the next node
@define cond_math_glue => 98
@define mu_glue => 99 //  subtype for math glue
@define a_leaders => 100 //  subtype for aligned leaders
@define c_leaders => 101 //  subtype for centered leaders
@define x_leaders => 102 //  subtype for expanded leaders
@define glue_ptr => llink // pointer to a glue specification
// pointer to box or rule node for leaders
@define leader_ptr => rlink

174. A glue specification has a halfword reference count in its first word, representing null plus the number of glue nodes that point to it (less one). Note that the reference count appears in the same position as the link field in list nodes; this is the field that is initialized to null when a node is allocated, and it is also the field that is flagged by empty_flag in empty nodes.

Glue specifications also contain three scaled fields, for the width , stretch , and shrink dimensions. Finally, there are two one-byte fields called stretch_order and shrink_order ; these contain the orders of infinity (normal , fil , fill , or filll ) corresponding to the stretch and shrink values.

// number of words to allocate for a glue specification
@define glue_spec_size => 4
// reference count of a glue specification
@define glue_ref_count(#) => link(#)
// the stretchability of this glob of glue
@define stretch(#) => mem[# + 2].sc
// the shrinkability of this glob of glue
@define shrink(#) => mem[# + 3].sc
// order of infinity for stretching
@define stretch_order => type
// order of infinity for shrinking
@define shrink_order => subtype
@define fil => 1 // first-order infinity
@define fill => 2 // second-order infinity
@define filll => 3 // third-order infinity
⟦18 Types in the outer block⟧ += ⟦
    // infinity to the 0, 1, 2, or 3 power
    type glue_ord = normal .. filll;
⟧

175. Here is a function that returns a pointer to a copy of a glue spec. The reference count in the copy is null , because there is assumed to be exactly one reference to the new specification.

// duplicates a glue specification
function new_spec(p: pointer): pointer {
    var
      q: pointer; // the new spec
    
    q = get_node(glue_spec_size);
    mem[q] = mem[p];
    glue_ref_count(q) = null;
    width(q) = width(p);
    stretch(q) = stretch(p);
    shrink(q) = shrink(p);
    new_spec = q;
}

176. And here’s a function that creates a glue node for a given parameter identified by its code number; for example, new_param_glue(line_skip_code) returns a pointer to a glue node for the current \lineskip.

function new_param_glue(n: small_number): pointer {
    var
      p: pointer, // the new node
      q: pointer; // the glue specification
    
    p = get_node(medium_node_size);
    type(p) = glue_node;
    subtype(p) = n + 1;
    leader_ptr(p) = null;
    q = ⟦250 Current |mem| equivalent of glue parameter number |n|⟧;
    glue_ptr(p) = q;
    incr(glue_ref_count(q));
    new_param_glue = p;
}

177. Glue nodes that are more or less anonymous are created by new_glue , whose argument points to a glue specification.

function new_glue(q: pointer): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(medium_node_size);
    type(p) = glue_node;
    subtype(p) = normal;
    leader_ptr(p) = null;
    glue_ptr(p) = q;
    incr(glue_ref_count(q));
    new_glue = p;
}

178. Still another subroutine is needed: This one is sort of a combination of new_param_glue and new_glue . It creates a glue node for one of the current glue parameters, but it makes a fresh copy of the glue specification, since that specification will probably be subject to change, while the parameter will stay put. The global variable temp_ptr is set to the address of the new spec.

function new_skip_param(n: small_number): pointer {
    var
      p: pointer; // the new node
    
    temp_ptr = new_spec(
      ⟦250 Current |mem| equivalent of glue parameter number |n|⟧,
    );
    p = new_glue(temp_ptr);
    glue_ref_count(temp_ptr) = null;
    subtype(p) = n + 1;
    new_skip_param = p;
}
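
The difference between sharing a specification and copying it can be seen in a small Python model of the three glue-node constructors. This is a sketch under assumptions: the field extra_refs stands for glue_ref_count minus null (that is, the number of references less one), the parameter table is modeled as a Python list called params, and node allocation in mem is replaced by ordinary objects.

```python
from dataclasses import dataclass


@dataclass
class GlueSpec:
    width: int = 0
    stretch: int = 0
    shrink: int = 0
    stretch_order: int = 0
    shrink_order: int = 0
    extra_refs: int = 0  # models glue_ref_count - null: references minus one


@dataclass
class GlueNode:
    spec: GlueSpec
    subtype: int = 0  # 0 plays the role of `normal`


def new_spec(p: GlueSpec) -> GlueSpec:
    """Copy a spec; the copy starts with exactly one (assumed) reference."""
    return GlueSpec(p.width, p.stretch, p.shrink, p.stretch_order, p.shrink_order, 0)


def new_glue(q: GlueSpec) -> GlueNode:
    q.extra_refs += 1
    return GlueNode(q, 0)


def new_param_glue(params: list, n: int) -> GlueNode:
    q = params[n]          # share the parameter's own specification
    q.extra_refs += 1
    return GlueNode(q, n + 1)


def new_skip_param(params: list, n: int) -> GlueNode:
    temp = new_spec(params[n])  # private copy that may be changed later
    p = new_glue(temp)
    temp.extra_refs = 0         # undo the increment: the node is the only owner
    p.subtype = n + 1
    return p
```

In this model new_param_glue leaves the node pointing at the parameter's specification, whereas new_skip_param leaves it pointing at a copy whose count says "one reference".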

179. A kern_node has a width field to specify a (normally negative) amount of spacing. This spacing correction appears in horizontal lists between letters like A and V when the font designer said that it looks better to move them closer together or further apart. A kern node can also appear in a vertical list, when its ‘width ’ denotes additional spacing in the vertical direction. The subtype is either normal (for kerns inserted from font information or math mode calculations) or explicit (for kerns inserted from \kern and \/ commands) or acc_kern (for kerns inserted from non-math accents) or mu_glue (for kerns inserted from \mkern specifications in math formulas).

@define kern_node => 11 //  type of a kern node
//  subtype of kern nodes from \.{\\kern} and \.{\\/}
@define explicit => 1
@define acc_kern => 2 //  subtype of kern nodes from accents
@define space_adjustment =>
    //  subtype of kern nodes from 
    // \.{\\XeTeXinterwordspaceshaping} adjustment
    3;
    // memory structure for marginal kerns
@define margin_kern_node => 40
@define margin_kern_node_size => 3
@define margin_char(#) =>
    // unused for now; relevant for font expansion
    info(# + 2);
    //  subtype of marginal kerns
@define left_side => 0
@define right_side => 1
// base for lp/rp codes starts from 2: 0 for hyphen_char , 1 
// for skew_char 
@define lp_code_base => 2
@define rp_code_base => 3
@define max_hlist_stack =>
    // maximum fill level for hlist_stack 
    512;
    // maybe good if larger than 2 * max_quarterword , so 
    // that box nesting level would overflow first

180. The new_kern function creates a kern node having a given width.

function new_kern(w: scaled): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(medium_node_size);
    type(p) = kern_node;
    subtype(p) = normal;
    width(p) = w;
    new_kern = p;
}

181.

⟦13 Global variables⟧ += ⟦
    var last_leftmost_char: pointer;

    var last_rightmost_char: pointer;

    // stack for find_protchar_left ( ) and 
    // find_protchar_right ( ) 
    var hlist_stack: array [0 .. max_hlist_stack] of pointer;

    // fill level for hlist_stack 
    var hlist_stack_level: 0 .. max_hlist_stack;

    // to access the first node of the paragraph
    var first_p: pointer;

    // to access prev_p in line_break ; should be kept in 
    // sync with prev_p by update_prev_p 
    var global_prev_p: pointer;
⟧

182. A penalty_node specifies the penalty associated with line or page breaking, in its penalty field. This field is a fullword integer, but the full range of integer values is not used: Any penalty >= 10000 is treated as infinity, and no break will be allowed for such high values. Similarly, any penalty <= -10000 is treated as negative infinity, and a break will be forced.

@define penalty_node => 12 //  type of a penalty node
@define inf_penalty => inf_bad // ``infinite'' penalty value
// ``negatively infinite'' penalty value
@define eject_penalty => -inf_penalty
// the added cost of breaking a list here
@define penalty(#) => mem[# + 1].int
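
The two thresholds divide the integers into three ranges. A minimal Python sketch, assuming inf_bad has the value 10000 stated in the text; classify_penalty is a hypothetical helper, not a routine of the program.

```python
INF_BAD = 10000            # value of inf_bad, as stated in the text above
INF_PENALTY = INF_BAD      # "infinite" penalty value
EJECT_PENALTY = -INF_PENALTY


def classify_penalty(m: int) -> str:
    """Say how a penalty value is treated at a potential breakpoint."""
    if m >= INF_PENALTY:
        return "no break"
    if m <= EJECT_PENALTY:
        return "forced break"
    return "optional break"
```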

183. Anyone who has been reading the last few sections of the program will be able to guess what comes next.

function new_penalty(m: integer): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(medium_node_size);
    type(p) = penalty_node;
    // the subtype is not used
    subtype(p) = 0;
    penalty(p) = m;
    new_penalty = p;
}

184. You might think that we have introduced enough node types by now. Well, almost, but there is one more: An unset_node has nearly the same format as an hlist_node or vlist_node ; it is used for entries in \halign or \valign that are not yet in their final form, since the box dimensions are their “natural” sizes before any glue adjustment has been made. The glue_set word is not present; instead, we have a glue_stretch field, which contains the total stretch of order glue_order that is present in the hlist or vlist being boxed. Similarly, the shift_amount field is replaced by a glue_shrink field, containing the total shrink of order glue_sign that is present. The subtype field is called span_count ; an unset box typically contains the data for qo(span_count) + 1 columns. Unset nodes will be changed to box nodes when alignment is completed.

@define unset_node => 13 //  type for an unset node
// total stretch in an unset node
@define glue_stretch(#) => mem[# + glue_offset].sc
// total shrink in an unset node
@define glue_shrink => shift_amount
// indicates the number of spanned columns
@define span_count => subtype

185. In fact, there are still more types coming. When we get to math formula processing we will see that a style_node has type == 14 ; and a number of larger type codes will also be defined, for use in math mode only.

186. Warning: If any changes are made to these data structure layouts, such as changing any of the node sizes or even reordering the words of nodes, the copy_node_list procedure and the memory initialization code below may have to be changed. Such potentially dangerous parts of the program are listed in the index under ‘data structure assumptions’. However, other references to the nodes are made symbolically in terms of the WEB macro definitions above, so that format changes will leave TEX’s other algorithms intact.

187. [11] Memory layout. Some areas of mem are dedicated to fixed usage, since static allocation is more efficient than dynamic allocation when we can get away with it. For example, locations mem_bot to mem_bot + 3 are always used to store the specification for glue that is ‘0pt plus 0pt minus 0pt’. The following macro definitions accomplish the static allocation by giving symbolic names to the fixed positions. Static variable-size nodes appear in locations mem_bot through lo_mem_stat_max , and static single-word nodes appear in locations hi_mem_stat_min through mem_top , inclusive. It is harmless to let lig_trick and garbage share the same location of mem .

// specification for \.{0pt plus 0pt minus 0pt}
@define zero_glue => mem_bot
// \.{0pt plus 1fil minus 0pt}
@define fil_glue => zero_glue + glue_spec_size
// \.{0pt plus 1fill minus 0pt}
@define fill_glue => fil_glue + glue_spec_size
// \.{0pt plus 1fil minus 1fil}
@define ss_glue => fill_glue + glue_spec_size
// \.{0pt plus -1fil minus 0pt}
@define fil_neg_glue => ss_glue + glue_spec_size
// largest statically allocated word in the variable-size 
// mem 
@define lo_mem_stat_max => fil_neg_glue + glue_spec_size - 1
// list of insertion data for current page
@define page_ins_head => mem_top
// vlist of items not yet on current page
@define contrib_head => mem_top - 1
@define page_head => mem_top - 2 // vlist for current page
// head of a temporary list of some kind
@define temp_head => mem_top - 3
// head of a temporary list of another kind
@define hold_head => mem_top - 4
// head of adjustment list returned by hpack 
@define adjust_head => mem_top - 5
// head of active list in line_break , needs two words
@define active => mem_top - 7
// head of preamble list for alignments
@define align_head => mem_top - 8
// tail of spanned-width lists
@define end_span => mem_top - 9
// a constant token list
@define omit_template => mem_top - 10
@define null_list => mem_top - 11 // permanently empty list
// a ligature masquerading as a char_node 
@define lig_trick => mem_top - 12
// used for scrap information
@define garbage => mem_top - 12
// head of token list built by scan_keyword 
@define backup_head => mem_top - 13
// head of pre-adjustment list returned by hpack 
@define pre_adjust_head => mem_top - 14
// smallest statically allocated word in the one-word mem 
@define hi_mem_stat_min => mem_top - 14
// the number of one-word nodes always present
@define hi_mem_stat_usage => 15
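
The fixed positions can be tabulated for any choice of bounds. The Python sketch below recomputes a few of them from the definitions above; the function static_layout and the sample bounds mem_bot = 0 and mem_top = 1000 are hypothetical and serve only to check the arithmetic.

```python
GLUE_SPEC_SIZE = 4


def static_layout(mem_bot: int, mem_top: int) -> dict:
    """Addresses of the static glue specs and the bounds of the static areas."""
    zero_glue = mem_bot
    fil_glue = zero_glue + GLUE_SPEC_SIZE
    fill_glue = fil_glue + GLUE_SPEC_SIZE
    ss_glue = fill_glue + GLUE_SPEC_SIZE
    fil_neg_glue = ss_glue + GLUE_SPEC_SIZE
    return {
        "zero_glue": zero_glue,
        "fil_glue": fil_glue,
        "fill_glue": fill_glue,
        "ss_glue": ss_glue,
        "fil_neg_glue": fil_neg_glue,
        "lo_mem_stat_max": fil_neg_glue + GLUE_SPEC_SIZE - 1,
        "active": mem_top - 7,
        "pre_adjust_head": mem_top - 14,
        "hi_mem_stat_min": mem_top - 14,
        "hi_mem_stat_usage": 15,
    }


layout = static_layout(mem_bot=0, mem_top=1000)  # hypothetical bounds
```

With these bounds the five glue specifications fill locations 0 through 19, and the one-word static area from hi_mem_stat_min to mem_top contains exactly hi_mem_stat_usage words.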

188. The following code gets mem off to a good start, when TEX is initializing itself the slow way.

⟦19 Local variables for initialization⟧ += ⟦
    // index into mem , eqtb , etc.
    var k: integer;
⟧

189.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ = ⟦
    for (k in mem_bot + 1 to lo_mem_stat_max) {
        // all glue dimensions are zeroed
        mem[k].sc = 0;
    }

    k = mem_bot

    // set first words of glue specifications
    while (k <= lo_mem_stat_max) {
        glue_ref_count(k) = null + 1;
        stretch_order(k) = normal;
        shrink_order(k) = normal;
        k = k + glue_spec_size;
    }

    stretch(fil_glue) = unity

    stretch_order(fil_glue) = fil

    stretch(fill_glue) = unity

    stretch_order(fill_glue) = fill

    stretch(ss_glue) = unity

    stretch_order(ss_glue) = fil

    shrink(ss_glue) = unity

    shrink_order(ss_glue) = fil

    stretch(fil_neg_glue) = -unity

    stretch_order(fil_neg_glue) = fil

    rover = lo_mem_stat_max + 1

    // now initialize the dynamic memory
    link(rover) = empty_flag

    // which is a 1000-word available node
    node_size(rover) = 1000

    llink(rover) = rover

    rlink(rover) = rover

    lo_mem_max = rover + 1000

    link(lo_mem_max) = null

    info(lo_mem_max) = null

    for (k in hi_mem_stat_min to mem_top) {
        // clear list heads
        mem[k] = mem[lo_mem_max];
    }

    ⟦838 Initialize the special list heads and constant nodes⟧

    avail = null

    mem_end = mem_top

    // initialize the one-word memory
    hi_mem_min = hi_mem_stat_min

    var_used = lo_mem_stat_max + 1 - mem_bot

    dyn_used = hi_mem_stat_usage // initialize statistics

190. If TEX is extended improperly, the mem array might get screwed up. For example, some pointers might be wrong, or some “dead” nodes might not have been freed when the last reference to them disappeared. Procedures check_mem and search_mem are available to help diagnose such problems. These procedures make use of two arrays called free and was_free that are present only if TEX’s debugging routines have been included. (You may want to decrease the size of mem while you are debugging.)

@define free => free_arr
⟦13 Global variables⟧ += ⟦
    // The debug memory arrays have not been mallocated yet.
    debug!{
        // free cells
        var free: packed array [0 .. 9] of boolean;
        // previously free cells
        var was_free: packed array [0 .. 9] of boolean;
        // previous mem_end , lo_mem_max , and hi_mem_min 
        var was_mem_end, was_lo_max, was_hi_min: pointer;
        // do we want to check memory constantly?
        var panicking: boolean;
    }
⟧

191.

⟦23 Set initial values of key variables⟧ += ⟦
    debug!{
        // indicate that everything was previously free
        was_mem_end = mem_min;
        was_lo_max = mem_min;
        was_hi_min = mem_max;
        panicking = false;
    }
⟧

192. Procedure check_mem makes sure that the available space lists of mem are well formed, and it optionally prints out all locations that are reserved now but were free the last time this procedure was called.

debug!{
    function check_mem(print_locs: boolean) {
        label done1, done2; // loop exits
        var
          p, q: pointer, // current locations of interest in 
          // mem 
          clobbered: boolean; // is something amiss?
        
        for (p in mem_min to lo_mem_max) {
            // you can probably do this faster
            free[p] = false;
        }
        for (p in hi_mem_min to mem_end) {
            // ditto
            free[p] = false;
        }
        ⟦193 Check single-word |avail| list⟧
        ⟦194 Check variable-size |avail| list⟧
        ⟦195 Check flags of unavailable nodes⟧
        if (print_locs) {
            ⟦196 Print newly busy locations⟧
        }
        for (p in mem_min to lo_mem_max) {
            was_free[p] = free[p];
        }
        for (p in hi_mem_min to mem_end) {
            //  was_free = free might be faster
            was_free[p] = free[p];
        }
        was_mem_end = mem_end;
        was_lo_max = lo_mem_max;
        was_hi_min = hi_mem_min;
    }
}

193.

⟦193 Check single-word |avail| list⟧ = ⟦
    p = avail

    q = null

    clobbered = false

    while (p != null) {
        if ((p > mem_end) || (p < hi_mem_min)) {
            clobbered = true;
        } else if (free[p]) {
            clobbered = true;
        }
        if (clobbered) {
            print_nl(strpool!("AVAIL list clobbered at "));
            print_int(q);
            goto done1;
        }
        free[p] = true;
        q = p;
        p = link(q);
    }

    done1:
⟧

194.

⟦194 Check variable-size |avail| list⟧ = ⟦
    p = rover

    q = null

    clobbered = false

    repeat {
        if ((p >= lo_mem_max) || (p < mem_min)) {
            clobbered = true;
        } else if (
            (rlink(p) >= lo_mem_max)
            || (rlink(p) < mem_min)
        ) {
            clobbered = true;
        } else if (
            !is_empty(p)
            || (node_size(p) < 2)
            || (p + node_size(p) > lo_mem_max)
            || (llink(rlink(p)) != p)
        ) {
            clobbered = true;
        }
        if (clobbered) {
            print_nl(
              strpool!("Double-AVAIL list clobbered at "),
            );
            print_int(q);
            goto done2;
        }
        // mark all locations free
        for (q in p to p + node_size(p) - 1) {
            if (free[q]) {
                print_nl(
                  strpool!("Doubly free location at "),
                );
                print_int(q);
                goto done2;
            }
            free[q] = true;
        }
        q = p;
        p = rlink(p);
    } until (p == rover)

    done2:
⟧
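
The consistency test applied to the ring of variable-size free nodes can be sketched in Python. This is a simplified model, not the procedure itself: free nodes are entries of a dictionary keyed by address (so "is_empty" becomes "is a key of the dictionary"), and instead of printing a message the hypothetical function check_free_ring returns a status together with the location that would have been printed.

```python
def check_free_ring(rover: int, nodes: dict, mem_min: int, lo_mem_max: int):
    """Walk the ring starting at rover and verify it is well formed.

    nodes maps an address to {"size": ..., "llink": ..., "rlink": ...}.
    """
    free = set()
    q = None  # last node that passed the tests
    p = rover
    while True:
        node = nodes.get(p)
        ok = (
            node is not None
            and mem_min <= p < lo_mem_max
            and mem_min <= node["rlink"] < lo_mem_max
            and node["size"] >= 2
            and p + node["size"] <= lo_mem_max
            and nodes.get(node["rlink"], {}).get("llink") == p
        )
        if not ok:
            return ("clobbered", q)
        for loc in range(p, p + node["size"]):
            if loc in free:
                return ("doubly free", loc)
            free.add(loc)
        q, p = p, node["rlink"]
        if p == rover:
            return ("ok", free)
```

The back-link test (the llink of the successor must point to the current node) is what catches most damage, since a single overwritten word usually breaks one of the two directions of the ring.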

195.

⟦195 Check flags of unavailable nodes⟧ = ⟦
    p = mem_min

    // node p should not be empty
    while (p <= lo_mem_max) {
        if (is_empty(p)) {
            print_nl(strpool!("Bad flag at "));
            print_int(p);
        }
        while ((p <= lo_mem_max) && !free[p]) {
            incr(p);
        }
        while ((p <= lo_mem_max) && free[p]) {
            incr(p);
        }
    }
⟧

196.

⟦196 Print newly busy locations⟧ = ⟦
    {
        print_nl(strpool!("New busy locs:"));
        for (p in mem_min to lo_mem_max) {
            if (
                !free[p]
                && ((p > was_lo_max) || was_free[p])
            ) {
                print_char(ord!(" "));
                print_int(p);
            }
        }
        for (p in hi_mem_min to mem_end) {
            if (
                !free[p]
                && (
                    (p < was_hi_min)
                    || (p > was_mem_end) || was_free[p]
                )
            ) {
                print_char(ord!(" "));
                print_int(p);
            }
        }
    }
⟧

197. The search_mem procedure attempts to answer the question “Who points to node p ?” In doing so, it fetches link and info fields of mem that might not be of type two_halves . Strictly speaking, this is undefined in Pascal, and it can lead to “false drops” (words that seem to point to p purely by coincidence). But for debugging purposes, we want to rule out the places that do not point to p , so a few false drops are tolerable.

debug!{
    // look for pointers to p 
    function search_mem(p: pointer) {
        var
          q: integer; // current position being searched
        
        for (q in mem_min to lo_mem_max) {
            if (link(q) == p) {
                print_nl(strpool!("LINK("));
                print_int(q);
                print_char(ord!(")"));
            }
            if (info(q) == p) {
                print_nl(strpool!("INFO("));
                print_int(q);
                print_char(ord!(")"));
            }
        }
        for (q in hi_mem_min to mem_end) {
            if (link(q) == p) {
                print_nl(strpool!("LINK("));
                print_int(q);
                print_char(ord!(")"));
            }
            if (info(q) == p) {
                print_nl(strpool!("INFO("));
                print_int(q);
                print_char(ord!(")"));
            }
        }
        ⟦281 Search |eqtb| for equivalents equal to |p|⟧
        ⟦315 Search |save_stack| for equivalents that point to |p|⟧
        ⟦987 Search |hyph_list| for pointers to |p|⟧
    }
}
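
The scan can be pictured on a toy memory array. The following Python sketch is only an illustration: representing mem as a list of (link, info) pairs and the name who_points_to are assumptions made for the example, and the searches of eqtb, save_stack, and hyph_list are left out.

```python
def who_points_to(mem, p):
    """Return labels such as 'LINK(3)' for every word whose link or info equals p."""
    hits = []
    for q, (link, info) in enumerate(mem):
        if link == p:
            hits.append(f"LINK({q})")
        if info == p:
            hits.append(f"INFO({q})")
    return hits

# Word 2 holds unrelated data whose info half happens to equal 4: a false drop.
mem = [(4, 0), (0, 1), (7, 4), (0, 0), (0, 0)]
print(who_points_to(mem, 4))
```

Every word that does not appear in the result is certainly not a pointer to p, which is the property the debugger relies on.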

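198. Some routines for character protrusion: pdf_error reports a fatal error, prev_rightmost finds the node that precedes a given node in a list, and round_xn_over_d computes x times n divided by d, rounded to the nearest integer.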

function pdf_error(t, p: str_number) {
    normalize_selector;
    print_err(strpool!("Error"));
    if (t != 0) {
        print(strpool!(" ("));
        print(t);
        print(ord!(")"));
    }
    print(strpool!(": "));
    print(p);
    succumb;
}

// finds the node preceding the rightmost node e; s is some
// node before e
function prev_rightmost(s, e: pointer): pointer {
    var p: pointer;
    
    prev_rightmost = null;
    p = s;
    if (p == null) {
        return;
    }
    while (link(p) != e) {
        p = link(p);
        if (p == null) {
            return;
        }
    }
    prev_rightmost = p;
}

function round_xn_over_d(x: scaled, n, d: integer): scaled {
    var
      positive: boolean, // was x >= 0 ?
      t, u, v: nonnegative_integer; // intermediate 
      // quantities
    
    if (x >= 0) {
        positive = true;
    } else {
        negate(x);
        positive = false;
    }
    t = (x % 0x8000) * n;
    u = (x div 0x8000) * n + (t div 0x8000);
    v = (u % d) * 0x8000 + (t % 0x8000);
    if (u div d >= 0x8000) {
        arith_error = true;
    } else {
        u = 0x8000 * (u div d) + (v div d);
    }
    v = v % d;
    if (2 * v >= d) {
        incr(u);
    }
    if (positive) {
        round_xn_over_d = u;
    } else {
        round_xn_over_d = -u;
    }
}
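
The arithmetic of round_xn_over_d can be checked outside the program. The sketch below transcribes it into Python and returns the arith_error flag as a second result instead of setting a global; it assumes n >= 0 and d > 0, as the nonnegative intermediate quantities suggest. Splitting x into 15-bit pieces keeps every intermediate product small, and the final values of u and v are the exact quotient and remainder of x*n by d.

```python
def round_xn_over_d(x, n, d):
    """Return (round(x*n/d), arith_error) using 15-bit pieces of x."""
    positive = x >= 0
    x = abs(x)
    t = (x % 0x8000) * n
    u = (x // 0x8000) * n + (t // 0x8000)
    v = (u % d) * 0x8000 + (t % 0x8000)
    arith_error = False
    if u // d >= 0x8000:
        arith_error = True          # the quotient would need more than 30 bits
    else:
        u = 0x8000 * (u // d) + (v // d)
    v = v % d
    if 2 * v >= d:                  # remainder is at least half of d: round up
        u += 1
    return (u if positive else -u), arith_error

print(round_xn_over_d(65536, 2, 3))
```

When no overflow occurs, the result for x >= 0 agrees with (2*x*n + d) // (2*d) computed in unbounded integer arithmetic.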

⟦1411 Declare procedures that need to be declared forward for \pdfTeX⟧

199. [12] Displaying boxes. We can reinforce our knowledge of the data structures just introduced by considering two procedures that display a list in symbolic form. The first of these, called short_display, is used in “overfull box” messages to give the top-level description of a list. The other one, called show_node_list, prints a detailed description of exactly what is in the data structure.

The philosophy of short_display is to ignore the fine points about exactly what is inside boxes, except that ligatures and discretionary breaks are expanded. As a result, short_display is a recursive procedure, but the recursion is never more than one level deep.

A global variable font_in_short_display keeps track of the font code that is assumed to be present when short_display begins; deviations from this font will be printed.

⟦13 Global variables⟧ += ⟦
    // an internal font number
    var font_in_short_display: integer;
⟧

200. Boxes, rules, inserts, whatsits, marks, and things in general that are sort of “complicated” are indicated only by printing ‘[]’.

// prints highlights of list p 
function short_display(p: integer) {
    var
      n: integer; // for replacement counts
    
    while (p > mem_min) {
        if (is_char_node(p)) {
            if (p <= mem_end) {
                if (font(p) != font_in_short_display) {
                    if ((font(p) > font_max)) {
                        print_char(ord!("*"));
                    } else {
                        ⟦297 Print the font identifier for |font(p)|⟧
                    }
                    print_char(ord!(" "));
                    font_in_short_display = font(p);
                }
                print_ASCII(qo(character(p)));
            }
        } else {
            ⟦201 Print a short indication of the contents of node |p|⟧
        }
        p = link(p);
    }
}

201.

⟦201 Print a short indication of the contents of node |p|⟧ = ⟦
    case type(p) {
      hlist_node,
      vlist_node,
      ins_node,
      mark_node,
      adjust_node,
      unset_node:
        print(strpool!("[]"));
      whatsit_node:
        case subtype(p) {
          native_word_node, native_word_node_AT:
            if (native_font(p) != font_in_short_display) {
                print_esc(font_id_text(native_font(p)));
                print_char(ord!(" "));
                font_in_short_display = native_font(p);
            }
            print_native_word(p);
          othercases:
            print(strpool!("[]"));
        }
      rule_node:
        print_char(ord!("|"));
      glue_node:
        if (glue_ptr(p) != zero_glue) {
            print_char(ord!(" "));
        }
      math_node:
        if (subtype(p) >= L_code) {
            print(strpool!("[]"));
        } else {
            print_char(ord!("$"));
        }
      ligature_node:
        short_display(lig_ptr(p));
      disc_node:
        short_display(pre_break(p));
        short_display(post_break(p));
        n = replace_count(p);
        while (n > 0) {
            if (link(p) != null) {
                p = link(p);
            }
            decr(n);
        }
      othercases:
        do_nothing;
    }
⟧
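
The effect of short_display is easiest to see on a small list. The Python sketch below is a simplification under stated assumptions: nodes are tuples tagged with a kind, font changes and whatsits are ignored, every glue node prints a space (the program omits it for zero_glue), and every math node prints ‘$’ (the program prints ‘[]’ for subtypes at or above L_code).

```python
def short_display(nodes):
    """Summarize a node list; each node is a tuple whose first item names its kind."""
    out = []
    i = 0
    while i < len(nodes):
        kind, *fields = nodes[i]
        if kind == "char":
            out.append(fields[0])
        elif kind in ("hlist", "vlist", "ins", "mark", "adjust", "unset"):
            out.append("[]")
        elif kind == "rule":
            out.append("|")
        elif kind == "glue":
            out.append(" ")
        elif kind == "math":
            out.append("$")
        elif kind == "ligature":
            out.append(short_display(fields[0]))    # the original characters
        elif kind == "disc":
            pre, post, replace_count = fields
            out.append(short_display(pre))
            out.append(short_display(post))
            i += replace_count      # skip the nodes that a break would replace
        i += 1
    return "".join(out)

line = [("char", "o"), ("char", "f"), ("disc", [("char", "-")], [], 0),
        ("ligature", [("char", "f"), ("char", "i")]), ("glue",), ("hlist",)]
print(repr(short_display(line)))
```

The recursive calls are made only for the sublists of ligatures and discretionaries, which contain characters, so the recursion stays one level deep.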

202. The show_node_list routine requires some auxiliary subroutines: one to print a font-and-character combination, one to print a token list without its reference count, and one to print a rule dimension.

// prints char_node data
function print_font_and_char(p: integer) {
    if (p > mem_end) {
        print_esc(strpool!("CLOBBERED."));
    } else {
        if ((font(p) > font_max)) {
            print_char(ord!("*"));
        } else {
            ⟦297 Print the font identifier for |font(p)|⟧
        }
        print_char(ord!(" "));
        print_ASCII(qo(character(p)));
    }
}

// prints token list data in braces
function print_mark(p: integer) {
    print_char(ord!("{"));
    if ((p < hi_mem_min) || (p > mem_end)) {
        print_esc(strpool!("CLOBBERED."));
    } else {
        show_token_list(link(p), null, max_print_line - 10);
    }
    print_char(ord!("}"));
}

// prints dimension in rule node
function print_rule_dimen(d: scaled) {
    if (is_running(d)) {
        print_char(ord!("*"));
    } else {
        print_scaled(d);
    }
}

203. Then there is a subroutine that prints glue stretch and shrink, possibly followed by the name of finite units:

// prints a glue component
function print_glue(
  d: scaled,
  order: integer,
  s: str_number,
) {
    print_scaled(d);
    if ((order < normal) || (order > filll)) {
        print(strpool!("foul"));
    } else if (order > normal) {
        print(strpool!("fil"));
        while (order > fil) {
            print_char(ord!("l"));
            decr(order);
        }
    } else if (s != 0) {
        print(s);
    }
}

204. The next subroutine prints a whole glue specification.

// prints a glue specification
function print_spec(p: integer, s: str_number) {
    if ((p < mem_min) || (p >= lo_mem_max)) {
        print_char(ord!("*"));
    } else {
        print_scaled(width(p));
        if (s != 0) {
            print(s);
        }
        if (stretch(p) != 0) {
            print(strpool!(" plus "));
            print_glue(stretch(p), stretch_order(p), s);
        }
        if (shrink(p) != 0) {
            print(strpool!(" minus "));
            print_glue(shrink(p), shrink_order(p), s);
        }
    }
}
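
The output conventions of print_glue and print_spec can be summarized in a few lines of Python. This is a sketch, not the program's code: the numeric values of the glue orders are assumptions, str applied to a float stands in for print_scaled, an empty unit string stands in for s == 0, and the range check on p is omitted.

```python
NORMAL, FIL, FILL, FILLL = 0, 1, 2, 3   # assumed numeric values of the glue orders

def glue_text(amount, order, unit=""):
    """Format one stretch or shrink component."""
    text = str(amount)
    if order < NORMAL or order > FILLL:
        return text + "foul"            # an impossible order: data is corrupted
    if order > NORMAL:
        return text + "fil" + "l" * (order - FIL)
    return text + unit

def spec_text(width, stretch, stretch_order, shrink, shrink_order, unit=""):
    """Format a whole glue specification, omitting zero components."""
    text = str(width) + unit
    if stretch != 0:
        text += " plus " + glue_text(stretch, stretch_order, unit)
    if shrink != 0:
        text += " minus " + glue_text(shrink, shrink_order, unit)
    return text

print(spec_text(10.0, 3.0, NORMAL, 1.0, FILL, "pt"))
```

Note that the finite unit is printed only for components of order normal; an infinite component carries its own ‘fil’ unit.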

205. We also need to declare some procedures that appear later in this documentation.

⟦733 Declare procedures needed for displaying the elements of mlists⟧

⟦251 Declare the procedure called |print_skip_param|⟧

206. Since boxes can be inside of boxes, show_node_list is inherently recursive, up to a given maximum number of levels. The history of nesting is indicated by the current string, which will be printed at the beginning of each line; the length of this string, namely cur_length, is the depth of nesting.

Recursive calls on show_node_list therefore use the following pattern:

@define node_list_display(#) =>
    {
        append_char(ord!("."));
        show_node_list(#);
        flush_char;
        //  str_room need not be checked; see show_box below
    }

207. A global variable called depth_threshold is used to record the maximum depth of nesting for which show_node_list will show information. If we have depth_threshold == 0, for example, only the top level information will be given and no sublists will be traversed. Another global variable, called breadth_max, tells the maximum number of items to show at each level; breadth_max had better be positive, or you won’t see anything.

⟦13 Global variables⟧ += ⟦
    // maximum nesting depth in box displays
    var depth_threshold: integer;

    // maximum number of items shown at the same list level
    var breadth_max: integer;
⟧

208. Now we are ready for show_node_list itself. This procedure has been written to be “extra robust” in the sense that it should not crash or get into a loop even if the data structures have been messed up by bugs in the rest of the program. You can safely call its parent routine show_box(p) for arbitrary values of p when you are debugging TEX. However, in the presence of bad data, the procedure may fetch a memory_word whose variant is different from the way it was stored; for example, it might try to read mem[p].hh when mem[p] contains a scaled integer, if p is a pointer that has been clobbered or chosen at random.

// prints a node list symbolically
function show_node_list(p: integer) {
    label exit;
    var
      n: integer, // the number of items already printed at 
      // this level
      i: integer, // temp index for printing chars of 
      // picfile paths
      g: real; // a glue ratio, as a floating point number
    
    if (cur_length > depth_threshold) {
        if (p > null) {
            // indicate that there's been some truncation
            print(strpool!(" []"));
        }
        return;
    }
    n = 0;
    while (p > mem_min) {
        print_ln;
        // display the nesting history
        print_current_string;
        // pointer out of range
        if (p > mem_end) {
            print(strpool!("Bad link, display aborted."));
            return;
        }
        incr(n);
        // time to stop
        if (n > breadth_max) {
            print(strpool!("etc."));
            return;
        }
        ⟦209 Display node |p|⟧
        p = link(p);
    }
  exit:
}
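
The interplay of the prefix string, depth_threshold, and breadth_max can be tried on a toy tree. In the Python sketch below each node is an assumed (label, children) pair, the prefix is passed as an argument rather than kept in the string pool, and depth_threshold is assumed to be nonnegative; the range checks on p are omitted.

```python
def show_box(box, depth_threshold=1, breadth_max=3):
    """Return the display lines for a nested list of (label, children) pairs."""
    lines = [""]                        # lines[-1] is the line being written

    def show_node_list(nodes, prefix):
        if len(prefix) > depth_threshold:
            if nodes:
                lines[-1] += " []"      # indicate that there's been some truncation
            return
        for n, (label, children) in enumerate(nodes, start=1):
            if n > breadth_max:         # time to stop
                lines.append(prefix + "etc.")
                return
            lines.append(prefix + label)
            show_node_list(children, prefix + ".")

    show_node_list(box, "")
    return lines[1:]

box = [("\\hbox", [("a", []), ("\\vbox", [("b", [])]), ("c", []), ("d", []), ("e", [])])]
print("\n".join(show_box(box)))
```

With the default settings the inner \vbox is shown as ‘.\vbox []’ because its contents lie below the depth threshold, and the fourth item at that level is replaced by ‘.etc.’.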

209.

⟦209 Display node |p|⟧ = ⟦
    if (is_char_node(p)) {
        print_font_and_char(p);
    } else {
        case type(p) {
          hlist_node, vlist_node, unset_node:
            ⟦210 Display box |p|⟧
          rule_node:
            ⟦213 Display rule |p|⟧
          ins_node:
            ⟦214 Display insertion |p|⟧
          whatsit_node:
            ⟦1416 Display the whatsit node |p|⟧
          glue_node:
            ⟦215 Display glue |p|⟧
          kern_node:
            ⟦217 Display kern |p|⟧
          margin_kern_node:
            print_esc(strpool!("kern"));
            print_scaled(width(p));
            if (subtype(p) == left_side) {
                print(strpool!(" (left margin)"));
            } else {
                print(strpool!(" (right margin)"));
            }
          math_node:
            ⟦218 Display math node |p|⟧
          ligature_node:
            ⟦219 Display ligature |p|⟧
          penalty_node:
            ⟦220 Display penalty |p|⟧
          disc_node:
            ⟦221 Display discretionary |p|⟧
          mark_node:
            ⟦222 Display mark |p|⟧
          adjust_node:
            ⟦223 Display adjustment |p|⟧
          ⟦732 Cases of |show_node_list| that arise in mlists only⟧
          othercases:
            print(strpool!("Unknown node type!"));
        }
    }
⟧

210.

⟦210 Display box |p|⟧ = ⟦
    {
        if (type(p) == hlist_node) {
            print_esc(ord!("h"));
        } else if (type(p) == vlist_node) {
            print_esc(ord!("v"));
        } else {
            print_esc(strpool!("unset"));
        }
        print(strpool!("box("));
        print_scaled(height(p));
        print_char(ord!("+"));
        print_scaled(depth(p));
        print(strpool!(")x"));
        print_scaled(width(p));
        if (type(p) == unset_node) {
            ⟦211 Display special fields of the unset node |p|⟧
        } else {
            ⟦212 Display the value of |glue_set(p)|⟧
            if (shift_amount(p) != 0) {
                print(strpool!(", shifted "));
                print_scaled(shift_amount(p));
            }
            if (eTeX_ex) {
                ⟦1514 Display if this box is never to be reversed⟧
            }
        }
        // recursive call
        node_list_display(list_ptr(p));
    }
⟧

211.

⟦211 Display special fields of the unset node |p|⟧ = ⟦
    {
        if (span_count(p) != min_quarterword) {
            print(strpool!(" ("));
            print_int(qo(span_count(p)) + 1);
            print(strpool!(" columns)"));
        }
        if (glue_stretch(p) != 0) {
            print(strpool!(", stretch "));
            print_glue(glue_stretch(p), glue_order(p), 0);
        }
        if (glue_shrink(p) != 0) {
            print(strpool!(", shrink "));
            print_glue(glue_shrink(p), glue_sign(p), 0);
        }
    }
⟧

212. The code will have to change in this place if glue_ratio is a structured type instead of an ordinary real. Note that this routine should avoid arithmetic errors even if the glue_set field holds an arbitrary random value. The following code assumes that a properly formed nonzero real number has absolute value 2^20 or more when it is regarded as an integer; this precaution was adequate to prevent floating point underflow on the author’s computer.

⟦212 Display the value of |glue_set(p)|⟧ = ⟦
    g = float(glue_set(p))

    if ((g != float_constant(0)) && (glue_sign(p) != normal)) {
        print(strpool!(", glue set "));
        if (glue_sign(p) == shrinking) {
            // The Unix pc folks removed this restriction
            // with a remark that invalid bit patterns were
            // vanishingly improbable, so we follow their
            // example without really understanding it.
            // The disabled test was:
            //   if abs(mem[p + glue_offset].int) < 0x100000
            //   then print("?.?") else
            print(strpool!("- "));
        }
        if (fabs(g) > float_constant(20000)) {
            if (g > float_constant(0)) {
                print_char(ord!(">"));
            } else {
                print(strpool!("< -"));
            }
            print_glue(20000 * unity, glue_order(p), 0);
        } else {
            print_glue(round(unity * g), glue_order(p), 0);
        }
    }
⟧
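
The clamping at 20000 can be illustrated as follows. This Python sketch assumes numeric values for normal, stretching, and shrinking, uses str(round(g, 5)) as a stand-in for print_scaled, and takes the text of the glue order (such as ‘fil’) as a ready-made argument.

```python
NORMAL, STRETCHING, SHRINKING = 0, 1, 2   # assumed values of glue_sign

def glue_set_text(g, glue_sign, order_suffix=""):
    """Return the ', glue set ...' text for glue ratio g, or '' if nothing is shown."""
    if g == 0.0 or glue_sign == NORMAL:
        return ""
    text = ", glue set "
    if glue_sign == SHRINKING:
        text += "- "
    if abs(g) > 20000.0:
        # a huge ratio is shown only as a bound, so no overflow can occur
        text += ">" if g > 0 else "< -"
        text += "20000.0"
    else:
        text += str(round(g, 5))
    return text + order_suffix

print(glue_set_text(3e9, STRETCHING))
```

Because the magnitude is tested before any scaling, a random bit pattern in glue_set produces at worst an odd-looking bound, never an arithmetic error.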

213.

⟦213 Display rule |p|⟧ = ⟦
    {
        print_esc(strpool!("rule("));
        print_rule_dimen(height(p));
        print_char(ord!("+"));
        print_rule_dimen(depth(p));
        print(strpool!(")x"));
        print_rule_dimen(width(p));
    }
⟧

214.

⟦214 Display insertion |p|⟧ = ⟦
    {
        print_esc(strpool!("insert"));
        print_int(qo(subtype(p)));
        print(strpool!(", natural size "));
        print_scaled(height(p));
        print(strpool!("; split("));
        print_spec(split_top_ptr(p), 0);
        print_char(ord!(","));
        print_scaled(depth(p));
        print(strpool!("); float cost "));
        print_int(float_cost(p));
        // recursive call
        node_list_display(ins_ptr(p));
    }
⟧

215.

⟦215 Display glue |p|⟧ = ⟦
    if (subtype(p) >= a_leaders) {
        ⟦216 Display leaders |p|⟧
    } else {
        print_esc(strpool!("glue"));
        if (subtype(p) != normal) {
            print_char(ord!("("));
            if (subtype(p) < cond_math_glue) {
                print_skip_param(subtype(p) - 1);
            } else if (subtype(p) == cond_math_glue) {
                print_esc(strpool!("nonscript"));
            } else {
                print_esc(strpool!("mskip"));
            }
            print_char(ord!(")"));
        }
        if (subtype(p) != cond_math_glue) {
            print_char(ord!(" "));
            if (subtype(p) < cond_math_glue) {
                print_spec(glue_ptr(p), 0);
            } else {
                print_spec(glue_ptr(p), strpool!("mu"));
            }
        }
    }
⟧

216.

⟦216 Display leaders |p|⟧ = ⟦
    {
        print_esc(strpool!(""));
        if (subtype(p) == c_leaders) {
            print_char(ord!("c"));
        } else if (subtype(p) == x_leaders) {
            print_char(ord!("x"));
        }
        print(strpool!("leaders "));
        print_spec(glue_ptr(p), 0);
        // recursive call
        node_list_display(leader_ptr(p));
    }
⟧

217. An “explicit” kern value is indicated implicitly by an explicit space.

⟦217 Display kern |p|⟧ = ⟦
    if (subtype(p) != mu_glue) {
        print_esc(strpool!("kern"));
        if (subtype(p) != normal) {
            print_char(ord!(" "));
        }
        print_scaled(width(p));
        if (subtype(p) == acc_kern) {
            print(strpool!(" (for accent)"));
        } else if (subtype(p) == space_adjustment) {
            print(strpool!(" (space adjustment)"));
        }
    } else {
        print_esc(strpool!("mkern"));
        print_scaled(width(p));
        print(strpool!("mu"));
    }
⟧

218.

⟦218 Display math node |p|⟧ = ⟦
    if (subtype(p) > after) {
        if (end_LR(p)) {
            print_esc(strpool!("end"));
        } else {
            print_esc(strpool!("begin"));
        }
        if (subtype(p) > R_code) {
            print_char(ord!("R"));
        } else if (subtype(p) > L_code) {
            print_char(ord!("L"));
        } else {
            print_char(ord!("M"));
        }
    } else {
        print_esc(strpool!("math"));
        if (subtype(p) == before) {
            print(strpool!("on"));
        } else {
            print(strpool!("off"));
        }
        if (width(p) != 0) {
            print(strpool!(", surrounded "));
            print_scaled(width(p));
        }
    }
⟧

219.

⟦219 Display ligature |p|⟧ = ⟦
    {
        print_font_and_char(lig_char(p));
        print(strpool!(" (ligature "));
        if (subtype(p) > 1) {
            print_char(ord!("|"));
        }
        font_in_short_display = font(lig_char(p));
        short_display(lig_ptr(p));
        if (odd(subtype(p))) {
            print_char(ord!("|"));
        }
        print_char(ord!(")"));
    }
⟧

220.

⟦220 Display penalty |p|⟧ = ⟦
    {
        print_esc(strpool!("penalty "));
        print_int(penalty(p));
    }
⟧

221. The post_break list of a discretionary node is indicated by a prefixed ‘|’ instead of the ‘.’ before the pre_break list.

⟦221 Display discretionary |p|⟧ = ⟦
    {
        print_esc(strpool!("discretionary"));
        if (replace_count(p) > 0) {
            print(strpool!(" replacing "));
            print_int(replace_count(p));
        }
        // recursive call
        node_list_display(pre_break(p));
        append_char(ord!("|"));
        // recursive call
        show_node_list(post_break(p));
        flush_char;
    }
⟧

222.

⟦222 Display mark |p|⟧ = ⟦
    {
        print_esc(strpool!("mark"));
        if (mark_class(p) != 0) {
            print_char(ord!("s"));
            print_int(mark_class(p));
        }
        print_mark(mark_ptr(p));
    }
⟧

223.

⟦223 Display adjustment |p|⟧ = ⟦
    {
        print_esc(strpool!("vadjust"));
        if (adjust_pre(p) != 0) {
            print(strpool!(" pre "));
        }
        // recursive call
        node_list_display(adjust_ptr(p));
    }
⟧

224. The recursive machinery is started by calling show_box .

function show_box(p: pointer) {
    ⟦262 Assign the values |depth_threshold:=show_box_depth| and |breadth_max:=show_box_breadth|⟧
    if (breadth_max <= 0) {
        breadth_max = 5;
    }
    if (pool_ptr + depth_threshold >= pool_size) {
        // now there's enough room for prefix string
        depth_threshold = pool_size - pool_ptr - 1;
    }
    // the show starts at p 
    show_node_list(p);
    print_ln;
}

// prints highlights of list p 
function short_display_n(p, m: integer) {
    breadth_max = m;
    depth_threshold = pool_size - pool_ptr - 1;
    // the show starts at p 
    show_node_list(p);
}

225. [13] Destroying boxes. When we are done with a node list, we are obliged to return it to free storage, including all of its sublists. The recursive procedure flush_node_list does this for us.

226. First, however, we shall consider two non-recursive procedures that do simpler tasks. The first of these, delete_token_ref, is called when a pointer to a token list’s reference count is being removed. This means that the token list should disappear if the reference count was null; otherwise the count should be decreased by one.

// reference count preceding a token list
@define token_ref_count(#) => info(#)
//  p points to the reference count of a token list that is 
// losing one reference
function delete_token_ref(p: pointer) {
    if (token_ref_count(p) == null) {
        flush_list(p);
    } else {
        decr(token_ref_count(p));
    }
}

227. Similarly, delete_glue_ref is called when a pointer to a glue specification is being withdrawn.

@define fast_delete_glue_ref(#) =>
    {
        if (glue_ref_count(#) == null) {
            free_node(#, glue_spec_size);
        } else {
            decr(glue_ref_count(#));
        }
    }
//  p points to a glue specification
function delete_glue_ref(p: pointer) {
    fast_delete_glue_ref(p);
}
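
Both procedures follow the same convention: the stored count is one less than the number of references, so a count of null means that exactly one reference remains. A minimal Python sketch of that convention follows; the class Shared and the use of 0 in place of null are assumptions made for the example.

```python
class Shared:
    """A shared object whose count field holds (number of references - 1)."""
    def __init__(self):
        self.ref_count = 0      # plays the role of null: exactly one reference
        self.freed = False

def add_ref(obj):
    obj.ref_count += 1

def delete_ref(obj):
    if obj.ref_count == 0:
        obj.freed = True        # the last reference is gone: release the storage
    else:
        obj.ref_count -= 1

spec = Shared()
add_ref(spec)                   # a second pointer now shares the object
delete_ref(spec)
print(spec.freed, spec.ref_count)
delete_ref(spec)
print(spec.freed)
```

The object survives the first deletion and is released by the second, mirroring the two branches of delete_token_ref and fast_delete_glue_ref.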

228. Now we are ready to delete any node list, recursively. In practice, the nodes deleted are usually charnodes (about 2/3 of the time), and they are glue nodes in about half of the remaining cases.

// erase list of nodes starting at p 
function flush_node_list(p: pointer) {
    label done; // go here when node p has been freed
    var
      q: pointer; // successor to node p 
    
    while (p != null) {
        q = link(p);
        if (is_char_node(p)) {
            free_avail(p);
        } else {
            case type(p) {
              hlist_node, vlist_node, unset_node:
                flush_node_list(list_ptr(p));
                free_node(p, box_node_size);
                goto done;
              rule_node:
                free_node(p, rule_node_size);
                goto done;
              ins_node:
                flush_node_list(ins_ptr(p));
                delete_glue_ref(split_top_ptr(p));
                free_node(p, ins_node_size);
                goto done;
              whatsit_node:
                ⟦1418 Wipe out the whatsit node |p| and |goto done|⟧
              glue_node:
                fast_delete_glue_ref(glue_ptr(p));
                if (leader_ptr(p) != null) {
                    flush_node_list(leader_ptr(p));
                }
                free_node(p, medium_node_size);
                goto done;
              kern_node, math_node, penalty_node:
                free_node(p, medium_node_size);
                goto done;
              margin_kern_node:
                free_node(p, margin_kern_node_size);
                goto done;
              ligature_node:
                flush_node_list(lig_ptr(p));
              mark_node:
                delete_token_ref(mark_ptr(p));
              disc_node:
                flush_node_list(pre_break(p));
                flush_node_list(post_break(p));
              adjust_node:
                flush_node_list(adjust_ptr(p));
              ⟦740 Cases of |flush_node_list| that arise in mlists only⟧
              othercases:
                confusion(strpool!("flushing"));
            }
            free_node(p, small_node_size);
          done:
        }
        p = q;
    }
}
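
The shape of flush_node_list, iterative along the list and recursive into sublists, can be sketched in Python. The Node class and the freed log are assumptions made for the illustration; the per-type node sizes and the reference-counted fields are left out.

```python
class Node:
    def __init__(self, kind, sublist=None, link=None):
        self.kind = kind
        self.sublist = sublist      # head of a nested list, or None
        self.link = link            # next node in this list, or None

def flush_node_list(p, freed):
    """Release the list starting at p, logging each released node kind in freed."""
    while p is not None:
        q = p.link                  # save the successor before p is released
        if p.sublist is not None:
            flush_node_list(p.sublist, freed)
        freed.append(p.kind)
        p.link = p.sublist = None
        p = q
    return freed

inner = Node("char", link=Node("glue"))
head = Node("char", link=Node("hlist", sublist=inner, link=Node("rule")))
print(flush_node_list(head, []))
```

The successor q must be fetched first, because the link field of a freed node can no longer be trusted.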

229. [14] Copying boxes. Another recursive operation that acts on boxes is sometimes needed: The procedure copy_node_list returns a pointer to another node list that has the same structure and meaning as the original. Note that since glue specifications and token lists have reference counts, we need not make copies of them. Reference counts can never get too large to fit in a halfword, since each pointer to a node is in a different memory address, and the total number of memory addresses fits in a halfword.

(Well, there actually are also references from outside mem; if the save_stack is made arbitrarily large, it would theoretically be possible to break TEX by overflowing a reference count. But who would want to do that?)

// new reference to a token list
@define add_token_ref(#) => incr(token_ref_count(#))
// new reference to a glue spec
@define add_glue_ref(#) => incr(glue_ref_count(#))

230. The copying procedure copies words en masse without bothering to look at their individual fields. If the node format changes—for example, if the size is altered, or if some link field is moved to another relative position—then this code may need to be changed too.

// makes a duplicate of the node list that starts at p and 
// returns a pointer to the new list
function copy_node_list(p: pointer): pointer {
    var
      h: pointer, // temporary head of copied list
      q: pointer, // previous position in new list
      r: pointer, // current node being fabricated for new 
      // list
      words: 0 .. 5; // number of words remaining to be 
      // copied
    
    h = get_avail;
    q = h;
    while (p != null) {
        ⟦231 Make a copy of node |p| in node |r|⟧
        link(q) = r;
        q = r;
        p = link(p);
    }
    link(q) = null;
    q = link(h);
    free_avail(h);
    copy_node_list = q;
}
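
The temporary-head technique used by copy_node_list can be sketched in Python. The Node class is an assumption made for the illustration: every node carries a value that is shared by reference (as glue specifications and token lists are) and an optional sublist that is copied recursively.

```python
class Node:
    def __init__(self, value, sublist=None, link=None):
        self.value = value          # shared by reference, not duplicated
        self.sublist = sublist      # head of a nested list, or None
        self.link = link            # next node in this list, or None

def copy_node_list(p):
    h = Node(None)                  # temporary head; dropped before returning
    q = h
    while p is not None:
        r = Node(p.value, copy_node_list(p.sublist))
        q.link = r                  # append the new node to the copy
        q = r
        p = p.link
    q.link = None
    return h.link

def values(p):
    """Flatten a list into nested Python lists for easy comparison."""
    out = []
    while p is not None:
        out.append([p.value, values(p.sublist)])
        p = p.link
    return out

original = Node("a", link=Node("box", sublist=Node("b"), link=Node("c")))
duplicate = copy_node_list(original)
print(values(duplicate) == values(original), duplicate is not original)
```

Starting from a dummy head removes the special case for the first node: every new node is appended with the same two assignments.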

231.

⟦231 Make a copy of node |p| in node |r|⟧ = ⟦
    // this setting occurs in more branches than any other
    words = 1

    if (is_char_node(p)) {
        r = get_avail;
    } else {
        ⟦232 Case statement to copy different types and set |words| to the number of initial words not yet copied⟧
    }

    while (words > 0) {
        decr(words);
        mem[r + words] = mem[p + words];
    }
⟧

232.

⟦232 Case statement to copy different types and set |words| to the number of initial words not yet copied⟧ = ⟦
    case type(p) {
      hlist_node, vlist_node, unset_node:
        r = get_node(box_node_size);
        ⟦1733 Copy the box {\sl Sync\TeX} information⟧
        // copy the last two words
        mem[r + 6] = mem[p + 6];
        mem[r + 5] = mem[p + 5];
        // this affects mem [ r + 5 ] 
        list_ptr(r) = copy_node_list(list_ptr(p));
        words = 5;
      rule_node:
        r = get_node(rule_node_size);
        // {\sl Sync\TeX}: do not let \TeX\ copy the {\sl 
        // Sync\TeX} information
        words = rule_node_size - synctex_field_size;
        ⟦1734 Copy the rule {\sl Sync\TeX} information⟧
      ins_node:
        r = get_node(ins_node_size);
        mem[r + 4] = mem[p + 4];
        add_glue_ref(split_top_ptr(p));
        // this affects mem [ r + 4 ] 
        ins_ptr(r) = copy_node_list(ins_ptr(p));
        words = ins_node_size - 1;
      whatsit_node:
        ⟦1417 Make a partial copy of the whatsit node |p| and make |r| point to it; set |words| to the number of initial words not yet copied⟧
      glue_node:
        r = get_node(medium_node_size);
        add_glue_ref(glue_ptr(p));
        ⟦1735 Copy the medium sized node {\sl Sync\TeX} information⟧
        glue_ptr(r) = glue_ptr(p);
        leader_ptr(r) = copy_node_list(leader_ptr(p));
      kern_node, math_node, penalty_node:
        r = get_node(medium_node_size);
        words = medium_node_size;
      margin_kern_node:
        r = get_node(margin_kern_node_size);
        words = margin_kern_node_size;
      ligature_node:
        r = get_node(small_node_size);
        // copy font and character 
        mem[lig_char(r)] = mem[lig_char(p)];
        lig_ptr(r) = copy_node_list(lig_ptr(p));
      disc_node:
        r = get_node(small_node_size);
        pre_break(r) = copy_node_list(pre_break(p));
        post_break(r) = copy_node_list(post_break(p));
      mark_node:
        r = get_node(small_node_size);
        add_token_ref(mark_ptr(p));
        words = small_node_size;
      adjust_node:
        r = get_node(small_node_size);
        adjust_ptr(r) = copy_node_list(adjust_ptr(p));
        //  words == 1 == small_node_size - 1 
      othercases:
        confusion(strpool!("copying"));
    }
⟧

233. [15] The command codes. Before we can go any further, we need to define symbolic names for the internal code numbers that represent the various commands obeyed by TEX. These codes are somewhat arbitrary, but not completely so. For example, the command codes for character types are fixed by the language, since a user says, e.g., ‘\catcode `\$ = 3’ to make $ a math delimiter, and the command code math_shift is equal to 3. Some other codes have been made adjacent so that case statements in the program need not consider cases that are widely spaced, or so that case statements can be replaced by if statements.

At any rate, here is the list, for future reference. First come the “catcode” commands, several of which share their numeric codes with ordinary commands when the catcode cannot emerge from TEX’s scanning routine.

// escape delimiter (called \.\\ in {\sl The \TeX book\/})
@define escape => 0
@define relax => 0 // do nothing ( \.{\\relax} )
@define left_brace => 1 // beginning of a group ( \.\{ )
@define right_brace => 2 // ending of a group ( \.\} )
// mathematics shift character ( \.\$ )
@define math_shift => 3
// alignment delimiter ( \.\&, \.{\\span} )
@define tab_mark => 4
// end of line ( carriage_return , \.{\\cr}, \.{\\crcr} )
@define car_ret => 5
@define out_param => 5 // output a macro parameter
@define mac_param => 6 // macro parameter symbol ( \.\# )
@define sup_mark => 7 // superscript ( \.{\char'136} )
@define sub_mark => 8 // subscript ( \.{\char'137} )
@define ignore => 9 // characters to ignore ( \.{\^\^@@} )
@define endv => 9 // end of \<v_j> list in alignment 
// template
// characters equivalent to blank space ( \.{\ } )
@define spacer => 10
// characters regarded as letters ( \.{A..Z}, \.{a..z} )
@define letter => 11
// none of the special character types
@define other_char => 12
// characters that invoke macros ( \.{\char`\~} )
@define active_char => 13
@define par_end => 13 // end of paragraph ( \.{\\par} )
@define match => 13 // match a macro parameter
// characters that introduce comments ( \.\% )
@define comment => 14
@define end_match => 14 // end of parameters to macro
@define stop => 14 // end of job ( \.{\\end}, \.{\\dump} )
// characters that shouldn't appear ( \.{\^\^?} )
@define invalid_char => 15
// specify delimiter numerically ( \.{\\delimiter} )
@define delim_num => 15
// largest catcode for individual characters
@define max_char_code => 15
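
As a quick reference, the sixteen category codes can be tabulated by number. The Python sketch below lists the catcode names defined above and looks up the characters that the comments give as examples; the dictionary EXAMPLE_CATCODES and the function catcode are assumptions made for illustration, not tables that exist in the program.

```python
CATCODE_NAMES = [
    "escape", "left_brace", "right_brace", "math_shift", "tab_mark", "car_ret",
    "mac_param", "sup_mark", "sub_mark", "ignore", "spacer", "letter",
    "other_char", "active_char", "comment", "invalid_char",
]

# Characters mentioned in the comments of the definitions above.
EXAMPLE_CATCODES = {"\\": 0, "{": 1, "}": 2, "$": 3, "&": 4, "#": 6,
                    "^": 7, "_": 8, " ": 10, "~": 13, "%": 14}

def catcode(ch):
    """Return the example category code of a single character."""
    if ch in EXAMPLE_CATCODES:
        return EXAMPLE_CATCODES[ch]
    if ch.isascii() and ch.isalpha():
        return 11                   # letter
    return 12                       # other_char

for ch in "\\$a1":
    print(repr(ch), catcode(ch), CATCODE_NAMES[catcode(ch)])
```

Names such as relax, out_param, endv, par_end, match, end_match, stop, and delim_num reuse numbers from this range; they are command codes rather than categories of individual characters.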

234. Next are the ordinary run-of-the-mill command codes. Codes that are min_internal or more represent internal quantities that might be expanded by ‘\the’.

// character specified numerically ( \.{\\char} )
@define char_num => 16
// explicit math code ( \.{\\mathchar} )
@define math_char_num => 17
@define mark => 18 // mark definition ( \.{\\mark} )
// peek inside of \TeX\ ( \.{\\show}, \.{\\showbox}, etc.~)
@define xray => 19
// make a box ( \.{\\box}, \.{\\copy}, \.{\\hbox}, etc.~)
@define make_box => 20
// horizontal motion ( \.{\\moveleft}, \.{\\moveright} )
@define hmove => 21
// vertical motion ( \.{\\raise}, \.{\\lower} )
@define vmove => 22
// unglue a box ( \.{\\unhbox}, \.{\\unhcopy} )
@define un_hbox => 23
@define un_vbox =>
    // unglue a box ( \.{\\unvbox}, \.{\\unvcopy} )
    24;
    // ( or \.{\\pagediscards}, \.{\\splitdiscards} )
// nullify last item ( \.{\\unpenalty}, \.{\\unkern}, 
// \.{\\unskip} )
@define remove_item => 25
// horizontal glue ( \.{\\hskip}, \.{\\hfil}, etc.~)
@define hskip => 26
// vertical glue ( \.{\\vskip}, \.{\\vfil}, etc.~)
@define vskip => 27
@define mskip => 28 // math glue ( \.{\\mskip} )
@define kern => 29 // fixed space ( \.{\\kern} )
@define mkern => 30 // math kern ( \.{\\mkern} )
// use a box ( \.{\\shipout}, \.{\\leaders}, etc.~)
@define leader_ship => 31
// horizontal table alignment ( \.{\\halign} )
@define halign => 32
@define valign =>
    // vertical table alignment ( \.{\\valign} )
    33;
    // or text direction directives ( \.{\\beginL}, etc.~)
// temporary escape from alignment ( \.{\\noalign} )
@define no_align => 34
@define vrule => 35 // vertical rule ( \.{\\vrule} )
@define hrule => 36 // horizontal rule ( \.{\\hrule} )
// vlist inserted in box ( \.{\\insert} )
@define insert => 37
// vlist inserted in enclosing paragraph ( \.{\\vadjust} )
@define vadjust => 38
// gobble spacer tokens ( \.{\\ignorespaces} )
@define ignore_spaces => 39
// save till assignment is done ( \.{\\afterassignment} )
@define after_assignment => 40
// save till group is done ( \.{\\aftergroup} )
@define after_group => 41
// additional badness ( \.{\\penalty} )
@define break_penalty => 42
// begin paragraph ( \.{\\indent}, \.{\\noindent} )
@define start_par => 43
@define ital_corr => 44 // italic correction ( \.{\\/} )
// attach accent in text ( \.{\\accent} )
@define accent => 45
// attach accent in math ( \.{\\mathaccent} )
@define math_accent => 46
// discretionary texts ( \.{\\-}, \.{\\discretionary} )
@define discretionary => 47
// equation number ( \.{\\eqno}, \.{\\leqno} )
@define eq_no => 48
// variable delimiter ( \.{\\left}, \.{\\right} )
// ( or \.{\\middle} )
@define left_right => 49
// component of formula ( \.{\\mathbin}, etc.~)
@define math_comp => 50
// diddle limit conventions ( \.{\\displaylimits}, etc.~)
@define limit_switch => 51
// generalized fraction ( \.{\\above}, \.{\\atop}, etc.~)
@define above => 52
// style specification ( \.{\\displaystyle}, etc.~)
@define math_style => 53
// choice specification ( \.{\\mathchoice} )
@define math_choice => 54
// conditional math glue ( \.{\\nonscript} )
@define non_script => 55
// vertically center a vbox ( \.{\\vcenter} )
@define vcenter => 56
// force specific case ( \.{\\lowercase}, \.{\\uppercase}~)
@define case_shift => 57
// send to user ( \.{\\message}, \.{\\errmessage} )
@define message => 58
// extensions to \TeX\ ( \.{\\write}, \.{\\special}, etc.~)
@define extension => 59
// files for reading ( \.{\\openin}, \.{\\closein} )
@define in_stream => 60
// begin local grouping ( \.{\\begingroup} )
@define begin_group => 61
// end local grouping ( \.{\\endgroup} )
@define end_group => 62
@define omit => 63 // omit alignment template ( \.{\\omit} )
@define ex_space => 64 // explicit space ( \.{\\\ } )
// suppress boundary ligatures ( \.{\\noboundary} )
@define no_boundary => 65
// square root and similar signs ( \.{\\radical} )
@define radical => 66
// end control sequence ( \.{\\endcsname} )
@define end_cs_name => 67
// the smallest code that can follow \.{\\the}
@define min_internal => 68
// character code defined by \.{\\chardef}
@define char_given => 68
// math code defined by \.{\\mathchardef}
@define math_given => 69
// extended math code defined by \.{\\Umathchardef}
@define XeTeX_math_given => 70
// most recent item ( \.{\\lastpenalty}, \.{\\lastkern}, 
// \.{\\lastskip} )
@define last_item => 71
// largest command code that can't be \.{\\global}
@define max_non_prefixed_command => 71

235. The next codes are special; they all relate to mode-independent assignment of values to TEX’s internal registers or tables. Codes that are max_internal or less represent internal quantities that might be expanded by ‘\the’.

// token list register ( \.{\\toks} )
@define toks_register => 72
// special token list ( \.{\\output}, \.{\\everypar}, etc.~)
@define assign_toks => 73
// user-defined integer ( \.{\\tolerance}, \.{\\day}, etc.~)
@define assign_int => 74
// user-defined length ( \.{\\hsize}, etc.~)
@define assign_dimen => 75
// user-defined glue ( \.{\\baselineskip}, etc.~)
@define assign_glue => 76
// user-defined muglue ( \.{\\thinmuskip}, etc.~)
@define assign_mu_glue => 77
// user-defined font dimension ( \.{\\fontdimen} )
@define assign_font_dimen => 78
// user-defined font integer ( \.{\\hyphenchar}, 
// \.{\\skewchar} )
@define assign_font_int => 79
// specify state info ( \.{\\spacefactor}, \.{\\prevdepth} )
@define set_aux => 80
// specify state info ( \.{\\prevgraf} )
@define set_prev_graf => 81
// specify state info ( \.{\\pagegoal}, etc.~)
@define set_page_dimen => 82
// specify state info ( \.{\\deadcycles}, 
// \.{\\insertpenalties} )
// ( or \.{\\interactionmode} )
@define set_page_int => 83
// change dimension of box ( \.{\\wd}, \.{\\ht}, \.{\\dp} )
@define set_box_dimen => 84
// specify fancy paragraph shape ( \.{\\parshape} )
// ( or \.{\\interlinepenalties}, etc.~)
@define set_shape => 85
// define a character code ( \.{\\catcode}, etc.~)
@define def_code => 86
// \.{\\Umathcode}, \.{\\Udelcode}
@define XeTeX_def_code => 87
// declare math fonts ( \.{\\textfont}, etc.~)
@define def_family => 88
// set current font ( font identifiers )
@define set_font => 89
@define def_font => 90 // define a font file ( \.{\\font} )
// internal register ( \.{\\count}, \.{\\dimen}, etc.~)
@define register => 91
// the largest code that can follow \.{\\the}
@define max_internal => 91
// advance a register or parameter ( \.{\\advance} )
@define advance => 92
// multiply a register or parameter ( \.{\\multiply} )
@define multiply => 93
// divide a register or parameter ( \.{\\divide} )
@define divide => 94
// qualify a definition ( \.{\\global}, \.{\\long}, 
// \.{\\outer} )
// ( or \.{\\protected} )
@define prefix => 95
// assign a command code ( \.{\\let}, \.{\\futurelet} )
@define let => 96
// code definition ( \.{\\chardef}, \.{\\countdef}, 
// etc.~)
// or \.{\\charsubdef}
@define shorthand_def => 97
// read into a control sequence ( \.{\\read} )
// ( or \.{\\readline} )
@define read_to_cs => 98
// macro definition ( \.{\\def}, \.{\\gdef}, \.{\\xdef}, 
// \.{\\edef} )
@define def => 99
@define set_box => 100 // set a box ( \.{\\setbox} )
// hyphenation data ( \.{\\hyphenation}, \.{\\patterns} )
@define hyph_data => 101
// define level of interaction ( \.{\\batchmode}, etc.~)
@define set_interaction => 102
// the largest command code seen at big_switch 
@define max_command => 102
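
The boundary codes defined in the last two sections partition the command codes into ranges. The following Python sketch shows one way to read those boundaries; the function names `can_follow_the` and `accepts_prefix` are hypothetical helpers introduced only for illustration, and the numeric values are copied from the definitions above.

```python
# Boundary codes copied from the definitions above.
MIN_INTERNAL = 68              # smallest code that can follow \the
MAX_NON_PREFIXED_COMMAND = 71  # largest code that can't be \global
MAX_INTERNAL = 91              # largest code that can follow \the
MAX_COMMAND = 102              # largest code seen at big_switch


def can_follow_the(cmd: int) -> bool:
    """True if cmd is in the range of internal quantities."""
    return MIN_INTERNAL <= cmd <= MAX_INTERNAL


def accepts_prefix(cmd: int) -> bool:
    """True if cmd is above max_non_prefixed_command but still a real command."""
    return MAX_NON_PREFIXED_COMMAND < cmd <= MAX_COMMAND
```

For example, char_given (68) can follow \the but cannot be prefixed, assign_int (74) satisfies both tests, and advance (92) can be prefixed but cannot follow \the.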

236. The remaining command codes are extra special, since they cannot get through TEX’s scanner to the main control routine. They have been given values higher than max_command so that their special nature is easily discernible. The “expandable” commands come first.

// initial state of most eq_type fields
@define undefined_cs => max_command + 1
// special expansion ( \.{\\expandafter} )
@define expand_after => max_command + 2
// special nonexpansion ( \.{\\noexpand} )
@define no_expand => max_command + 3
// input a source file ( \.{\\input}, \.{\\endinput} )
// ( or \.{\\scantokens} )
@define input => max_command + 4
// conditional text ( \.{\\if}, \.{\\ifcase}, etc.~)
@define if_test => max_command + 5
// delimiters for conditionals ( \.{\\else}, etc.~)
@define fi_or_else => max_command + 6
// make a control sequence from tokens ( \.{\\csname} )
@define cs_name => max_command + 7
// convert to text ( \.{\\number}, \.{\\string}, etc.~)
@define convert => max_command + 8
// expand an internal quantity ( \.{\\the} )
// ( or \.{\\unexpanded}, \.{\\detokenize} )
@define the => max_command + 9
// inserted mark ( \.{\\topmark}, etc.~)
@define top_bot_mark => max_command + 10
// non-long, non-outer control sequence
@define call => max_command + 11
// long, non-outer control sequence
@define long_call => max_command + 12
// non-long, outer control sequence
@define outer_call => max_command + 13
// long, outer control sequence
@define long_outer_call => max_command + 14
// end of an alignment template
@define end_template => max_command + 15
// the following token was marked by \.{\\noexpand}
@define dont_expand => max_command + 16
// the equivalent points to a glue specification
@define glue_ref => max_command + 17
// the equivalent points to a parshape specification
@define shape_ref => max_command + 18
// the equivalent points to a box node, or is null 
@define box_ref => max_command + 19
// the equivalent is simply a halfword number
@define data => max_command + 20

237. [16] The semantic nest. TEX is typically in the midst of building many lists at once. For example, when a math formula is being processed, TEX is in math mode and working on an mlist; this formula has temporarily interrupted TEX from being in horizontal mode and building the hlist of a paragraph; and this paragraph has temporarily interrupted TEX from being in vertical mode and building the vlist for the next page of a document. Similarly, when a \vbox occurs inside of an \hbox, TEX is temporarily interrupted from working in restricted horizontal mode, and it enters internal vertical mode. The “semantic nest” is a stack that keeps track of what lists and modes are currently suspended.

At each level of processing we are in one of six modes:

vmode stands for vertical mode (the page builder);

hmode stands for horizontal mode (the paragraph builder);

mmode stands for displayed formula mode;

-vmode stands for internal vertical mode (e.g., in a \vbox);

-hmode stands for restricted horizontal mode (e.g., in an \hbox);

-mmode stands for math formula mode (not displayed).

The mode is temporarily set to zero while processing \write texts.

Numeric values are assigned to vmode, hmode, and mmode so that TEX’s “big semantic switch” can select the appropriate thing to do by computing the value abs(mode) + cur_cmd, where mode is the current mode and cur_cmd is the current command code.

@define vmode => 1 // vertical mode
@define hmode => vmode + max_command + 1 // horizontal mode
@define mmode => hmode + max_command + 1 // math mode
// prints the mode represented by m 
function print_mode(m: integer) {
    if (m > 0) {
        case m div (max_command + 1) {
          0:
            print(strpool!("vertical mode"));
          1:
            print(strpool!("horizontal mode"));
          2:
            print(strpool!("display math mode"));
        }
    } else if (m == 0) {
        print(strpool!("no mode"));
    } else {
        case (-m) div (max_command + 1) {
          0:
            print(strpool!("internal vertical mode"));
          1:
            print(strpool!("restricted horizontal mode"));
          2:
            print(strpool!("math mode"));
        }
    }
}

// prints the mode represented by m, preceded by "' in " (closing a quoted command name)
function print_in_mode(m: integer) {
    if (m > 0) {
        case m div (max_command + 1) {
          0:
            print(strpool!("' in vertical mode"));
          1:
            print(strpool!("' in horizontal mode"));
          2:
            print(strpool!("' in display math mode"));
        }
    } else if (m == 0) {
        print(strpool!("' in no mode"));
    } else {
        case (-m) div (max_command + 1) {
          0:
            print(strpool!("' in internal vertical mode"));
          1:
            print(
              strpool!("' in restricted horizontal mode"),
            );
          2:
            print(strpool!("' in math mode"));
        }
    }
}
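
The mode encoding can be checked with a small Python sketch. It assumes max_command = 102 as defined earlier; `mode_name` and `switch_key` are hypothetical names that mirror print_mode and the abs(mode) + cur_cmd computation, not routines of the program itself.

```python
MAX_COMMAND = 102  # from the command-code definitions above

VMODE = 1
HMODE = VMODE + MAX_COMMAND + 1  # 104
MMODE = HMODE + MAX_COMMAND + 1  # 207

_POSITIVE = ("vertical mode", "horizontal mode", "display math mode")
_NEGATIVE = ("internal vertical mode", "restricted horizontal mode", "math mode")


def mode_name(m: int) -> str:
    """Return the name that print_mode would print for mode m."""
    if m == 0:
        return "no mode"
    names = _POSITIVE if m > 0 else _NEGATIVE
    # Integer division by max_command + 1 maps 1, 104, 207 to 0, 1, 2.
    return names[abs(m) // (MAX_COMMAND + 1)]


def switch_key(mode: int, cur_cmd: int) -> int:
    """Combine a mode and a command code into one selector value."""
    return abs(mode) + cur_cmd
```

Because consecutive mode values are spaced max_command + 1 apart, every pair of (absolute mode, command code) with a command code in 0 .. max_command yields a distinct selector.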

238. The state of affairs at any semantic level can be represented by five values:

mode is the number representing the semantic mode, as just explained.

head is a pointer to a list head for the list being built; link(head) therefore points to the first element of the list, or to null if the list is empty.

tail is a pointer to the final node of the list being built; thus, tail == head if and only if the list is empty.

prev_graf is the number of lines of the current paragraph that have already been put into the present vertical list.

aux is an auxiliary memory_word that gives further information that is needed to characterize the situation.

In vertical mode, aux is also known as prev_depth; it is the scaled value representing the depth of the previous box, for use in baseline calculations, or it is <= -1000pt if the next box on the vertical list is to be exempt from baseline calculations. In horizontal mode, aux is also known as space_factor and clang; it holds the current space factor used in spacing calculations, and the current language used for hyphenation. (The value of clang is undefined in restricted horizontal mode.) In math mode, aux is also known as incompleat_noad; if not null, it points to a record that represents the numerator of a generalized fraction for which the denominator is currently being formed in the current list.

There is also a sixth quantity, mode_line, which correlates the semantic nest with the user’s input; mode_line contains the source line number at which the current level of nesting was entered. The negative of this line number is the mode_line at the level of the user’s output routine.

A seventh quantity, eTeX_aux, is used by the extended features of 𝜀-TEX. In vertical modes it is known as LR_save and holds the LR stack when a paragraph is interrupted by a displayed formula. In display math mode it is known as LR_box and holds a pointer to a prototype box for the display. In math mode it is known as delim_ptr and points to the most recent left_noad or middle_noad of a math_left_group.

In horizontal mode, the prev_graf field is used for initial language data.

The semantic nest is an array called nest that holds the mode, head, tail, prev_graf, aux, and mode_line values for all semantic levels below the currently active one. Information about the currently active level is kept in the global quantities mode, head, tail, prev_graf, aux, and mode_line, which live in a Pascal record that is ready to be pushed onto nest if necessary.

//  prev_depth value that is ignored
@define ignore_depth => -65536000
⟦18 Types in the outer block⟧ += ⟦
    type list_state_record = record {
        mode_field: -mmode .. mmode,
        head_field, tail_field: pointer,
        eTeX_aux_field: pointer,
        pg_field, ml_field: integer,
        aux_field: memory_word,
    };
⟧

239.

@define mode => cur_list.mode_field // current mode
// header node of current list
@define head => cur_list.head_field
// final node on current list
@define tail => cur_list.tail_field
// auxiliary data for \eTeX
@define eTeX_aux => cur_list.eTeX_aux_field
// LR stack when a paragraph is interrupted
@define LR_save => eTeX_aux
@define LR_box => eTeX_aux // prototype box for display
// most recent left or middle noad of a math left group
@define delim_ptr => eTeX_aux
// number of paragraph lines accumulated
@define prev_graf => cur_list.pg_field
// auxiliary data about the current list
@define aux => cur_list.aux_field
// the name of aux in vertical mode
@define prev_depth => aux.sc
// part of aux in horizontal mode
@define space_factor => aux.hh.lh
// the other part of aux in horizontal mode
@define clang => aux.hh.rh
// the name of aux in math mode
@define incompleat_noad => aux.int
// source file line number at beginning of list
@define mode_line => cur_list.ml_field
⟦13 Global variables⟧ += ⟦
    var nest: ^list_state_record;

    // first unused location of nest 
    var nest_ptr: 0 .. nest_size;

    // maximum of nest_ptr when pushing
    var max_nest_stack: 0 .. nest_size;

    // the ``top'' semantic state
    var cur_list: list_state_record;

    // most recent mode shown by \.{\\tracingcommands}
    var shown_mode: -mmode .. mmode;
⟧

240. Here is a common way to make the current list grow:

@define tail_append(#) =>
    {
        link(tail) = #;
        tail = link(tail);
    }
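
A minimal Python sketch of the same idiom, assuming a singly linked list with a separate list-head node. The classes `Node` and `CurList` are hypothetical stand-ins for memory nodes and for the cur_list record; None plays the role of null.

```python
class Node:
    """A list node; the link attribute plays the role of link(p)."""

    def __init__(self, value=None):
        self.value = value
        self.link = None


class CurList:
    def __init__(self):
        self.head = Node()     # list head (sentinel), not a list element
        self.tail = self.head  # tail == head means the list is empty

    def tail_append(self, node):
        self.tail.link = node       # link(tail) = #
        self.tail = self.tail.link  # tail = link(tail)

    def is_empty(self):
        return self.tail is self.head

    def values(self):
        out, p = [], self.head.link
        while p is not None:
            out.append(p.value)
            p = p.link
        return out
```

The sentinel head is what lets tail_append work without a special case for the empty list: there is always a node whose link field can be assigned.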

241. We will see later that the vertical list at the bottom semantic level is split into two parts; the “current page” runs from page_head to page_tail, and the “contribution list” runs from contrib_head to tail of semantic level zero. The idea is that contributions are first formed in vertical mode, then “contributed” to the current page (during which time the page-breaking decisions are made). For now, we don’t need to know any more details about the page-building process.

⟦23 Set initial values of key variables⟧ += ⟦
    nest_ptr = 0

    max_nest_stack = 0

    mode = vmode

    head = contrib_head

    tail = contrib_head

    eTeX_aux = null

    prev_depth = ignore_depth

    mode_line = 0

    prev_graf = 0

    shown_mode = 0

    // The following piece of code is a copy of module 991:
    page_contents = empty

    page_tail = page_head //  link ( page_head ) = null ; 

    last_glue = max_halfword

    last_penalty = 0

    last_kern = 0

    last_node_type = -1

    page_depth = 0

    page_max_depth = 0

242. When TEX’s work on one level is interrupted, the state is saved by calling push_nest. This routine changes head and tail so that a new (empty) list is begun; it does not change mode or aux.

// enter a new semantic level, save the old
function push_nest() {
    if (nest_ptr > max_nest_stack) {
        max_nest_stack = nest_ptr;
        if (nest_ptr == nest_size) {
            overflow(
              strpool!("semantic nest size"),
              nest_size,
            );
        }
    }
    // stack the record
    nest[nest_ptr] = cur_list;
    incr(nest_ptr);
    head = get_avail;
    tail = head;
    prev_graf = 0;
    mode_line = line;
    eTeX_aux = null;
}

243. Conversely, when TEX is finished on the current level, the former state is restored by calling pop_nest. This routine will never be called at the lowest semantic level, nor will it be called unless head is a node that should be returned to free memory.

// leave a semantic level, re-enter the old
function pop_nest() {
    free_avail(head);
    decr(nest_ptr);
    cur_list = nest[nest_ptr];
}
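
The push and pop discipline can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's implementation: `NEST_SIZE` is a deliberately tiny hypothetical limit, plain `object()` instances stand in for nodes obtained from get_avail, and OverflowError stands in for the overflow routine.

```python
from dataclasses import dataclass, replace

NEST_SIZE = 4  # hypothetical, deliberately tiny


@dataclass
class ListStateRecord:
    mode: int = 1
    head: object = None
    tail: object = None
    prev_graf: int = 0
    mode_line: int = 0
    etex_aux: object = None
    aux: int = 0


class SemanticNest:
    def __init__(self):
        self.nest = [None] * (NEST_SIZE + 1)
        self.nest_ptr = 0
        self.max_nest_stack = 0
        sentinel = object()
        self.cur_list = ListStateRecord(head=sentinel, tail=sentinel)

    def push_nest(self, line):
        if self.nest_ptr > self.max_nest_stack:
            self.max_nest_stack = self.nest_ptr
            if self.nest_ptr == NEST_SIZE:
                raise OverflowError("semantic nest size")
        self.nest[self.nest_ptr] = replace(self.cur_list)  # stack a copy
        self.nest_ptr += 1
        new_head = object()  # stands in for get_avail
        self.cur_list.head = new_head
        self.cur_list.tail = new_head
        self.cur_list.prev_graf = 0
        self.cur_list.mode_line = line
        self.cur_list.etex_aux = None
        # mode and aux are left unchanged, as in the original routine

    def pop_nest(self):
        assert self.nest_ptr > 0, "never called at the lowest level"
        self.nest_ptr -= 1
        self.cur_list = self.nest[self.nest_ptr]
```

Note that the caller is responsible for setting mode and aux after push_nest, since the routine deliberately carries the old values over.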

244. Here is a procedure that displays what TEX is working on, at all levels.

forward_declaration print_totals();

function show_activities() {
    var
      p: 0 .. nest_size, // index into nest 
      m: -mmode .. mmode, // mode
      a: memory_word, // auxiliary
      q, r: pointer, // for showing the current page
      t: integer; // ditto
    
    // put the top level into the array
    nest[nest_ptr] = cur_list;
    print_nl(strpool!(""));
    print_ln;
    for (p in nest_ptr downto 0) {
        m = nest[p].mode_field;
        a = nest[p].aux_field;
        print_nl(strpool!("### "));
        print_mode(m);
        print(strpool!(" entered at line "));
        print_int(abs(nest[p].ml_field));
        if (m == hmode) {
            if (nest[p].pg_field != 0x830000) {
                print(strpool!(" (language"));
                print_int(nest[p].pg_field % 0x10000);
                print(strpool!(":hyphenmin"));
                print_int(nest[p].pg_field div 0x400000);
                print_char(ord!(","));
                print_int(
                  (nest[p].pg_field div 0x10000) % 0x40,
                );
                print_char(ord!(")"));
            }
        }
        if (nest[p].ml_field < 0) {
            print(strpool!(" (\\output routine)"));
        }
        if (p == 0) {
            ⟦1040 Show the status of the current page⟧
            if (link(contrib_head) != null) {
                print_nl(
                  strpool!("### recent contributions:"),
                );
            }
        }
        show_box(link(nest[p].head_field));
        ⟦245 Show the auxiliary field, |a|⟧
    }
}
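
In horizontal mode, show_activities unpacks three numbers from pg_field with div and mod. The Python sketch below reproduces that arithmetic; the function names are hypothetical, the interpretation of the two "hyphenmin" numbers as left and right minimums is an assumption based on the printed label, and `pack_language_data` is simply the arithmetic inverse, shown for illustration.

```python
def unpack_language_data(pg_field: int):
    """Split pg_field the way show_activities does in horizontal mode."""
    language = pg_field % 0x10000
    first_hyphen_min = pg_field // 0x400000
    second_hyphen_min = (pg_field // 0x10000) % 0x40
    return language, first_hyphen_min, second_hyphen_min


def pack_language_data(language: int, first_min: int, second_min: int) -> int:
    """Inverse of unpack_language_data for values that fit their fields."""
    return (first_min * 0x40 + second_min) * 0x10000 + language
```

The value 0x830000 that show_activities treats as the unremarkable case decodes to language 0 with minimums 2 and 3, which is why nothing is printed for it.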

245.

⟦245 Show the auxiliary field, |a|⟧ = ⟦
    case abs(m) div (max_command + 1) {
      0:
        print_nl(strpool!("prevdepth "));
        if (a.sc <= ignore_depth) {
            print(strpool!("ignored"));
        } else {
            print_scaled(a.sc);
        }
        if (nest[p].pg_field != 0) {
            print(strpool!(", prevgraf "));
            print_int(nest[p].pg_field);
            if (nest[p].pg_field != 1) {
                print(strpool!(" lines"));
            } else {
                print(strpool!(" line"));
            }
        }
      1:
        print_nl(strpool!("spacefactor "));
        print_int(a.hh.lh);
        if (m > 0) {
            if (a.hh.rh > 0) {
                print(strpool!(", current language "));
                print_int(a.hh.rh);
            }
        }
      2:
        if (a.int != null) {
            print(
              strpool!("this will be denominator of:"),
            );
            show_box(a.int);
        } // there are no other cases
    }
⟧

246. [17] The table of equivalents. Now that we have studied the data structures for TEX’s semantic routines, we ought to consider the data structures used by its syntactic routines. In other words, our next concern will be the tables that TEX looks at when it is scanning what the user has written.

The biggest and most important such table is called eqtb. It holds the current “equivalents” of things; i.e., it explains what things mean or what their current values are, for all quantities that are subject to the nesting structure provided by TEX’s grouping mechanism. There are six parts to eqtb:

1) eqtb[active_base .. (hash_base - 1)] holds the current equivalents of single-character control sequences.

2) eqtb[hash_base .. (glue_base - 1)] holds the current equivalents of multiletter control sequences.

3) eqtb[glue_base .. (local_base - 1)] holds the current equivalents of glue parameters like the current baselineskip.

4) eqtb[local_base .. (int_base - 1)] holds the current equivalents of local halfword quantities like the current box registers, the current “catcodes,” the current font, and a pointer to the current paragraph shape. Additionally region 4 contains the table with MLTEX’s character substitution definitions.

5) eqtb[int_base .. (dimen_base - 1)] holds the current equivalents of fullword integer parameters like the current hyphenation penalty.

6) eqtb[dimen_base .. eqtb_size] holds the current equivalents of fullword dimension parameters like the current hsize or amount of hanging indentation.

Note that, for example, the current amount of baselineskip glue is determined by the setting of a particular location in region 3 of eqtb, while the current meaning of the control sequence ‘\baselineskip’ (which might have been changed by \def or \let) appears in region 2.
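
Since the six regions are contiguous, the region of an address is determined by the base values alone. The Python sketch below looks up the region with a binary search; all numeric boundaries here are hypothetical toy values, because the real ones depend on quantities such as number_usvs and hash_size.

```python
from bisect import bisect_right

# Hypothetical toy boundaries: (first address, symbolic base, contents).
REGIONS = [
    (1, "active_base", "single-character control sequences"),
    (10, "hash_base", "multiletter control sequences"),
    (42, "glue_base", "glue parameters"),
    (60, "local_base", "local halfword quantities"),
    (90, "int_base", "fullword integer parameters"),
    (120, "dimen_base", "fullword dimension parameters"),
]
EQTB_SIZE = 150  # hypothetical last valid address

_STARTS = [start for start, _, _ in REGIONS]


def region_of(address: int) -> int:
    """Return the 1-based region number that contains an eqtb address."""
    if not _STARTS[0] <= address <= EQTB_SIZE:
        raise IndexError(f"address {address} is outside eqtb")
    return bisect_right(_STARTS, address)
```

With these toy values, the location holding the baselineskip glue would have an address in 42 .. 59 (region 3), while the control sequence named \baselineskip would have an address in 10 .. 41 (region 2).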

247. Each entry in eqtb is a memory_word. Most of these words are of type two_halves, and subdivided into three fields:

1) The eq_level (a quarterword) is the level of grouping at which this equivalent was defined. If the level is level_zero, the equivalent has never been defined; level_one refers to the outer level (outside of all groups), and this level is also used for global definitions that never go away. Higher levels are for equivalents that will disappear at the end of their group.

2) The eq_type (another quarterword) specifies what kind of entry this is. There are many types, since each TEX primitive like \hbox, \def, etc., has its own special code. The list of command codes above includes all possible settings of the eq_type field.

3) The equiv (a halfword) is the current equivalent value. This may be a font number, a pointer into mem, or a variety of other things.

@define eq_level_field(#) => #.hh.b1
@define eq_type_field(#) => #.hh.b0
@define equiv_field(#) => #.hh.rh
// level of definition
@define eq_level(#) => eq_level_field(eqtb[#])
// command code for equivalent
@define eq_type(#) => eq_type_field(eqtb[#])
@define equiv(#) => equiv_field(eqtb[#]) // equivalent value
// level for undefined quantities
@define level_zero => min_quarterword
// outermost level for defined quantities
@define level_one => level_zero + 1
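
As a rough illustration of the three fields, here is a Python sketch of a single entry. The class `EqtbEntry` and the helper `describe_level` are hypothetical; the sketch assumes min_quarterword is 0 and uses 103 (max_command + 1, i.e. undefined_cs) as the default eq_type.

```python
from dataclasses import dataclass

LEVEL_ZERO = 0             # stands in for min_quarterword (assumed 0 here)
LEVEL_ONE = LEVEL_ZERO + 1
UNDEFINED_CS = 102 + 1     # max_command + 1


@dataclass
class EqtbEntry:
    eq_level: int = LEVEL_ZERO    # grouping level of the definition
    eq_type: int = UNDEFINED_CS   # command code
    equiv: int = 0                # equivalent value (font number, pointer, ...)


def describe_level(entry: EqtbEntry) -> str:
    """Classify an entry by its eq_level, following the prose above."""
    if entry.eq_level == LEVEL_ZERO:
        return "never defined"
    if entry.eq_level == LEVEL_ONE:
        return "outer level or global"
    return "local to a group"
```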

248. Many locations in eqtb have symbolic names. The purpose of the next paragraphs is to define these names, and to set up the initial values of the equivalents.

In the first region we have number_usvs equivalents for “active characters” that act as control sequences, followed by number_usvs equivalents for single-character control sequences.

Then comes region 2, which corresponds to the hash table that we will define later. The maximum address in this region is used for a dummy control sequence that is perpetually undefined. There also are several locations for control sequences that are perpetually defined (since they are used in error recovery).

// beginning of region 1, for active character equivalents
@define active_base => 1
// equivalents of one-character control sequences
@define single_base => active_base + number_usvs
// equivalent of \.{\\csname\\endcsname}
@define null_cs => single_base + number_usvs
// beginning of region 2, for the hash table
@define hash_base => null_cs + 1
// for error recovery
@define frozen_control_sequence => hash_base + hash_size
// inaccessible but definable
@define frozen_protection => frozen_control_sequence
// permanent `\.{\\cr}'
@define frozen_cr => frozen_control_sequence + 1
// permanent `\.{\\endgroup}'
@define frozen_end_group => frozen_control_sequence + 2
// permanent `\.{\\right}'
@define frozen_right => frozen_control_sequence + 3
// permanent `\.{\\fi}'
@define frozen_fi => frozen_control_sequence + 4
// permanent `\.{\\endtemplate}'
@define frozen_end_template => frozen_control_sequence + 5
// second permanent `\.{\\endtemplate}'
@define frozen_endv => frozen_control_sequence + 6
// permanent `\.{\\relax}'
@define frozen_relax => frozen_control_sequence + 7
// permanent `\.{\\endwrite}'
@define end_write => frozen_control_sequence + 8
// permanent `\.{\\notexpanded:}'
@define frozen_dont_expand => frozen_control_sequence + 9
@define prim_size => 2100 // maximum number of primitives
// permanent `\.{\\special}'
@define frozen_special => frozen_control_sequence + 10
// permanent `\.{\\nullfont}'
@define frozen_null_font =>
    frozen_control_sequence + 12 + prim_size
// permanent `\.{\\pdfprimitive}'
@define frozen_primitive => frozen_control_sequence + 11
@define prim_eqtb_base => frozen_primitive + 1
// begins table of 257 permanent font identifiers
@define font_id_base => frozen_null_font - font_base
// dummy location
@define undefined_control_sequence =>
    frozen_null_font + max_font_max + 1
// beginning of region 3
@define glue_base => undefined_control_sequence + 1
⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    eq_type(undefined_control_sequence) = undefined_cs

    equiv(undefined_control_sequence) = null

    eq_level(undefined_control_sequence) = level_zero

    for (k in active_base to eqtb_top) {
        eqtb[k] = eqtb[undefined_control_sequence];
    }
⟧
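
The address arithmetic of regions 1 and 2 is easier to follow with concrete numbers. The Python sketch below evaluates the definitions above for caller-supplied sizes; the function name `region12_layout` is hypothetical, and the small sizes used in the example are toy values, not the ones XƎTEX is built with.

```python
def region12_layout(number_usvs, hash_size, font_base, max_font_max, prim_size=2100):
    """Evaluate the symbolic addresses of regions 1 and 2 for given sizes."""
    active_base = 1
    single_base = active_base + number_usvs
    null_cs = single_base + number_usvs
    hash_base = null_cs + 1
    frozen_control_sequence = hash_base + hash_size
    frozen_primitive = frozen_control_sequence + 11
    prim_eqtb_base = frozen_primitive + 1
    frozen_null_font = frozen_control_sequence + 12 + prim_size
    font_id_base = frozen_null_font - font_base
    undefined_control_sequence = frozen_null_font + max_font_max + 1
    glue_base = undefined_control_sequence + 1
    return {
        "active_base": active_base,
        "single_base": single_base,
        "null_cs": null_cs,
        "hash_base": hash_base,
        "frozen_control_sequence": frozen_control_sequence,
        "frozen_primitive": frozen_primitive,
        "prim_eqtb_base": prim_eqtb_base,
        "frozen_null_font": frozen_null_font,
        "font_id_base": font_id_base,
        "undefined_control_sequence": undefined_control_sequence,
        "glue_base": glue_base,
    }


# Toy sizes chosen only to keep the numbers readable.
layout = region12_layout(number_usvs=4, hash_size=10, font_base=0,
                         max_font_max=3, prim_size=5)
```

One thing the arithmetic makes visible: the gap between prim_eqtb_base and frozen_null_font is exactly prim_size locations, so the definition of frozen_null_font reserves room for a table of primitives after frozen_primitive.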

249. Here is a routine that displays the current meaning of an eqtb entry in region 1 or 2. (Similar routines for the other regions will appear below.)

⟦249 Show equivalent |n|, in region 1 or 2⟧ = ⟦
    {
        sprint_cs(n);
        print_char(ord!("="));
        print_cmd_chr(eq_type(n), equiv(n));
        if (eq_type(n) >= call) {
            print_char(ord!(":"));
            show_token_list(link(equiv(n)), null, 32);
        }
    }
⟧

250. Region 3 of eqtb contains the number_regs \skip registers, as well as the glue parameters defined here. It is important that the “muskip” parameters have larger numbers than the others.

// interline glue if baseline_skip is infeasible
@define line_skip_code => 0
// desired glue between baselines
@define baseline_skip_code => 1
// extra glue just above a paragraph
@define par_skip_code => 2
// extra glue just above displayed math
@define above_display_skip_code => 3
// extra glue just below displayed math
@define below_display_skip_code => 4
// glue above displayed math following short lines
@define above_display_short_skip_code => 5
// glue below displayed math following short lines
@define below_display_short_skip_code => 6
// glue at left of justified lines
@define left_skip_code => 7
// glue at right of justified lines
@define right_skip_code => 8
@define top_skip_code => 9 // glue at top of main pages
// glue at top of split pages
@define split_top_skip_code => 10
@define tab_skip_code => 11 // glue between aligned entries
// glue between words (if not zero_glue )
@define space_skip_code => 12
// glue after sentences (if not zero_glue )
@define xspace_skip_code => 13
// glue on last line of paragraph
@define par_fill_skip_code => 14
// glue introduced at potential linebreak location
@define XeTeX_linebreak_skip_code => 15
// thin space in math formula
@define thin_mu_skip_code => 16
// medium space in math formula
@define med_mu_skip_code => 17
// thick space in math formula
@define thick_mu_skip_code => 18
@define glue_pars => 19 // total number of glue parameters
// table of number_regs ``skip'' registers
@define skip_base => glue_base + glue_pars
// table of number_regs ``muskip'' registers
@define mu_skip_base => skip_base + number_regs
// beginning of region 4
@define local_base => mu_skip_base + number_regs
//  mem location of glue specification
@define skip(#) => equiv(skip_base + #)
//  mem location of math glue spec
@define mu_skip(#) => equiv(mu_skip_base + #)
//  mem location of glue specification
@define glue_par(#) => equiv(glue_base + #)
@define line_skip => glue_par(line_skip_code)
@define baseline_skip => glue_par(baseline_skip_code)
@define par_skip => glue_par(par_skip_code)
@define above_display_skip =>
    glue_par(above_display_skip_code)
@define below_display_skip =>
    glue_par(below_display_skip_code)
@define above_display_short_skip =>
    glue_par(above_display_short_skip_code)
@define below_display_short_skip =>
    glue_par(below_display_short_skip_code)
@define left_skip => glue_par(left_skip_code)
@define right_skip => glue_par(right_skip_code)
@define top_skip => glue_par(top_skip_code)
@define split_top_skip => glue_par(split_top_skip_code)
@define tab_skip => glue_par(tab_skip_code)
@define space_skip => glue_par(space_skip_code)
@define xspace_skip => glue_par(xspace_skip_code)
@define par_fill_skip => glue_par(par_fill_skip_code)
@define XeTeX_linebreak_skip =>
    glue_par(XeTeX_linebreak_skip_code)
@define thin_mu_skip => glue_par(thin_mu_skip_code)
@define med_mu_skip => glue_par(med_mu_skip_code)
@define thick_mu_skip => glue_par(thick_mu_skip_code)
⟦250 Current |mem| equivalent of glue parameter number |n|⟧ = ⟦
    glue_par(n)
⟧
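
The layout of region 3 follows directly from glue_pars and number_regs. In the Python sketch below, `glue_region_layout` and `is_mu_glue_param` are hypothetical helpers; the second one shows one plausible reason the “muskip” parameters are numbered last, namely that a single comparison separates them from ordinary glue. The values passed in the tests (glue_base = 42, number_regs = 256) are illustrative assumptions.

```python
GLUE_PARS = 19          # total number of glue parameters
THIN_MU_SKIP_CODE = 16  # first of the "muskip" parameters


def glue_region_layout(glue_base: int, number_regs: int):
    """Return (skip_base, mu_skip_base, local_base) for region 3."""
    skip_base = glue_base + GLUE_PARS
    mu_skip_base = skip_base + number_regs
    local_base = mu_skip_base + number_regs
    return skip_base, mu_skip_base, local_base


def is_mu_glue_param(code: int) -> bool:
    """True for parameter codes that denote math glue (mu units)."""
    if not 0 <= code < GLUE_PARS:
        raise ValueError(f"{code} is not a glue parameter code")
    return code >= THIN_MU_SKIP_CODE
```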

251. Sometimes we need to convert TEX’s internal code numbers into symbolic form. The print_skip_param routine gives the symbolic name of a glue parameter.

⟦251 Declare the procedure called |print_skip_param|⟧ = ⟦
    function print_skip_param(n: integer) {
        case n {
          line_skip_code:
            print_esc(strpool!("lineskip"));
          baseline_skip_code:
            print_esc(strpool!("baselineskip"));
          par_skip_code:
            print_esc(strpool!("parskip"));
          above_display_skip_code:
            print_esc(strpool!("abovedisplayskip"));
          below_display_skip_code:
            print_esc(strpool!("belowdisplayskip"));
          above_display_short_skip_code:
            print_esc(strpool!("abovedisplayshortskip"));
          below_display_short_skip_code:
            print_esc(strpool!("belowdisplayshortskip"));
          left_skip_code:
            print_esc(strpool!("leftskip"));
          right_skip_code:
            print_esc(strpool!("rightskip"));
          top_skip_code:
            print_esc(strpool!("topskip"));
          split_top_skip_code:
            print_esc(strpool!("splittopskip"));
          tab_skip_code:
            print_esc(strpool!("tabskip"));
          space_skip_code:
            print_esc(strpool!("spaceskip"));
          xspace_skip_code:
            print_esc(strpool!("xspaceskip"));
          par_fill_skip_code:
            print_esc(strpool!("parfillskip"));
          XeTeX_linebreak_skip_code:
            print_esc(strpool!("XeTeXlinebreakskip"));
          thin_mu_skip_code:
            print_esc(strpool!("thinmuskip"));
          med_mu_skip_code:
            print_esc(strpool!("medmuskip"));
          thick_mu_skip_code:
            print_esc(strpool!("thickmuskip"));
          othercases:
            print(strpool!("[unknown glue parameter!]"));
        }
    }
⟧
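
Because the glue parameter codes are the consecutive integers 0 .. 18, the same mapping can be expressed as a table lookup. The Python sketch below is an alternative formulation, not the program's method; `skip_param_name` is a hypothetical name, and it assumes the escape character printed by print_esc is a backslash.

```python
# Names in code order, taken from the case statement above.
GLUE_PARAM_NAMES = [
    "lineskip", "baselineskip", "parskip",
    "abovedisplayskip", "belowdisplayskip",
    "abovedisplayshortskip", "belowdisplayshortskip",
    "leftskip", "rightskip", "topskip", "splittopskip",
    "tabskip", "spaceskip", "xspaceskip", "parfillskip",
    "XeTeXlinebreakskip",
    "thinmuskip", "medmuskip", "thickmuskip",
]


def skip_param_name(n: int) -> str:
    """Return the symbolic name of glue parameter n."""
    if 0 <= n < len(GLUE_PARAM_NAMES):
        return "\\" + GLUE_PARAM_NAMES[n]  # assumes escape character is a backslash
    return "[unknown glue parameter!]"
```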

252. The symbolic names for glue parameters are put into TEX’s hash table by using the routine called primitive, defined below. Let us enter them now, so that we don’t have to list all those parameter names anywhere else.

⟦252 Put each of \TeX's primitives into the hash table⟧ = ⟦
    primitive(
      strpool!("lineskip"),
      assign_glue,
      glue_base + line_skip_code,
    )

    primitive(
      strpool!("baselineskip"),
      assign_glue,
      glue_base + baseline_skip_code,
    )

    primitive(
      strpool!("parskip"),
      assign_glue,
      glue_base + par_skip_code,
    )

    primitive(
      strpool!("abovedisplayskip"),
      assign_glue,
      glue_base + above_display_skip_code,
    )

    primitive(
      strpool!("belowdisplayskip"),
      assign_glue,
      glue_base + below_display_skip_code,
    )

    primitive(
      strpool!("abovedisplayshortskip"),
      assign_glue,
      glue_base + above_display_short_skip_code,
    )

    primitive(
      strpool!("belowdisplayshortskip"),
      assign_glue,
      glue_base + below_display_short_skip_code,
    )

    primitive(
      strpool!("leftskip"),
      assign_glue,
      glue_base + left_skip_code,
    )

    primitive(
      strpool!("rightskip"),
      assign_glue,
      glue_base + right_skip_code,
    )

    primitive(
      strpool!("topskip"),
      assign_glue,
      glue_base + top_skip_code,
    )

    primitive(
      strpool!("splittopskip"),
      assign_glue,
      glue_base + split_top_skip_code,
    )

    primitive(
      strpool!("tabskip"),
      assign_glue,
      glue_base + tab_skip_code,
    )

    primitive(
      strpool!("spaceskip"),
      assign_glue,
      glue_base + space_skip_code,
    )

    primitive(
      strpool!("xspaceskip"),
      assign_glue,
      glue_base + xspace_skip_code,
    )

    primitive(
      strpool!("parfillskip"),
      assign_glue,
      glue_base + par_fill_skip_code,
    )

    primitive(
      strpool!("XeTeXlinebreakskip"),
      assign_glue,
      glue_base + XeTeX_linebreak_skip_code,
    )

    primitive(
      strpool!("thinmuskip"),
      assign_mu_glue,
      glue_base + thin_mu_skip_code,
    )

    primitive(
      strpool!("medmuskip"),
      assign_mu_glue,
      glue_base + med_mu_skip_code,
    )

    primitive(
      strpool!("thickmuskip"),
      assign_mu_glue,
      glue_base + thick_mu_skip_code,
    )
⟧
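The registrations above follow one pattern, so they can be pictured as a loop over a table of names. The following Python sketch is only an illustration: `hash_table` and this `primitive` are hypothetical stand-ins, the value of `GLUE_BASE` is a placeholder, and it assumes that the `*_skip_code` values are consecutive in the order in which section 251 lists them.

```python
GLUE_BASE = 1000  # placeholder for glue_base; the real value is defined elsewhere

# Assumption: the *_skip_code values run 0, 1, 2, ... in this order.
GLUE_PARAM_NAMES = [
    "lineskip", "baselineskip", "parskip", "abovedisplayskip",
    "belowdisplayskip", "abovedisplayshortskip", "belowdisplayshortskip",
    "leftskip", "rightskip", "topskip", "splittopskip", "tabskip",
    "spaceskip", "xspaceskip", "parfillskip", "XeTeXlinebreakskip",
    "thinmuskip", "medmuskip", "thickmuskip",
]
MU_NAMES = {"thinmuskip", "medmuskip", "thickmuskip"}

hash_table = {}


def primitive(name, cmd, chr_code):
    """Stand-in for TeX's primitive: bind a name to a (command, operand) pair."""
    hash_table[name] = (cmd, chr_code)


for code, name in enumerate(GLUE_PARAM_NAMES):
    # The three math glue parameters use assign_mu_glue, all others assign_glue.
    cmd = "assign_mu_glue" if name in MU_NAMES else "assign_glue"
    primitive(name, cmd, GLUE_BASE + code)
```

The operand stored with each name is the eqtb location of the parameter, which is why every call adds the parameter code to glue_base.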

253.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ = ⟦
    assign_glue, assign_mu_glue:
      if (chr_code < skip_base) {
          print_skip_param(chr_code - glue_base);
      } else if (chr_code < mu_skip_base) {
          print_esc(strpool!("skip"));
          print_int(chr_code - skip_base);
      } else {
          print_esc(strpool!("muskip"));
          print_int(chr_code - mu_skip_base);
      }
⟧
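The three-way test in section 253 tells glue parameters apart from \skip and \muskip registers purely by where chr_code falls in region 3. A minimal Python sketch of that decoding follows; the sizes `GLUE_PARS` and `NUMBER_REGS` and the base values are placeholders chosen for illustration, `describe_glue_chr` is a hypothetical name, and the escape character is assumed to be a backslash.

```python
# Placeholder layout of region 3; the real bases are defined in earlier sections.
GLUE_BASE = 0
GLUE_PARS = 19        # number of glue parameters (assumed)
NUMBER_REGS = 256     # registers per table (assumed)
SKIP_BASE = GLUE_BASE + GLUE_PARS
MU_SKIP_BASE = SKIP_BASE + NUMBER_REGS


def describe_glue_chr(chr_code, param_name):
    """Mirror the test order: parameter, then skip register, then muskip register.

    `param_name` stands in for print_skip_param: it maps a glue
    parameter code to the parameter's name.
    """
    if chr_code < SKIP_BASE:
        return "\\" + param_name(chr_code - GLUE_BASE)
    if chr_code < MU_SKIP_BASE:
        return "\\skip" + str(chr_code - SKIP_BASE)
    return "\\muskip" + str(chr_code - MU_SKIP_BASE)
```

The order of the comparisons matters: each branch relies on the earlier ones having excluded the lower part of the region.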

254. All glue parameters and registers are initially ‘0pt plus0pt minus0pt’.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    equiv(glue_base) = zero_glue

    eq_level(glue_base) = level_one

    eq_type(glue_base) = glue_ref

    for (k in glue_base + 1 to local_base - 1) {
        eqtb[k] = eqtb[glue_base];
    }

    glue_ref_count(zero_glue) = 
        glue_ref_count(zero_glue)
        + local_base - glue_base
⟧
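The initialization in section 254 lets every entry of region 3 share the single zero_glue specification and then adjusts its reference count once, by the number of entries. The Python sketch below illustrates that bookkeeping under simplifying assumptions: `GlueSpec` is a hypothetical class, the region bounds are placeholders, and the count here simply starts at zero, which need not match TeX's internal convention.

```python
from dataclasses import dataclass


@dataclass
class GlueSpec:
    ref_count: int = 0  # plays the role of glue_ref_count


GLUE_BASE, LOCAL_BASE = 0, 10  # placeholder bounds of region 3

zero_glue = GlueSpec()
eqtb = [None] * LOCAL_BASE

# Every slot points to the same specification; nothing is copied.
for k in range(GLUE_BASE, LOCAL_BASE):
    eqtb[k] = zero_glue

# All new references are accounted for in one addition.
zero_glue.ref_count += LOCAL_BASE - GLUE_BASE
```

Adding local_base - glue_base in one step gives the same count as incrementing once per slot inside the loop.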

255.

⟦255 Show equivalent |n|, in region 3⟧ = ⟦
    if (n < skip_base) {
        print_skip_param(n - glue_base);
        print_char(ord!("="));
        if (n < glue_base + thin_mu_skip_code) {
            print_spec(equiv(n), strpool!("pt"));
        } else {
            print_spec(equiv(n), strpool!("mu"));
        }
    } else if (n < mu_skip_base) {
        print_esc(strpool!("skip"));
        print_int(n - skip_base);
        print_char(ord!("="));
        print_spec(equiv(n), strpool!("pt"));
    } else {
        print_esc(strpool!("muskip"));
        print_int(n - mu_skip_base);
        print_char(ord!("="));
        print_spec(equiv(n), strpool!("mu"));
    }
⟧

256. Region 4 of eqtb contains the local quantities defined here. The bulk of this region is taken up by six tables that are indexed by character codes, each with number_usvs entries; these tables are important to both the syntactic and semantic portions of TeX. There are also a bunch of special things like font and token parameters, as well as the tables of \toks and \box registers.

// specifies paragraph shape
@define par_shape_loc => local_base
// points to token list for \.{\\output}
@define output_routine_loc => local_base + 1
// points to token list for \.{\\everypar}
@define every_par_loc => local_base + 2
// points to token list for \.{\\everymath}
@define every_math_loc => local_base + 3
// points to token list for \.{\\everydisplay}
@define every_display_loc => local_base + 4
// points to token list for \.{\\everyhbox}
@define every_hbox_loc => local_base + 5
// points to token list for \.{\\everyvbox}
@define every_vbox_loc => local_base + 6
// points to token list for \.{\\everyjob}
@define every_job_loc => local_base + 7
// points to token list for \.{\\everycr}
@define every_cr_loc => local_base + 8
// points to token list for \.{\\errhelp}
@define err_help_loc => local_base + 9
// end of \TeX's token list parameters
@define tex_toks => local_base + 10
// base for \eTeX's token list parameters
@define etex_toks_base => tex_toks
// points to token list for \.{\\everyeof}
@define every_eof_loc => etex_toks_base
// not really used, but serves as a flag
@define XeTeX_inter_char_loc => every_eof_loc + 1
// end of \eTeX's token list parameters
@define etex_toks => XeTeX_inter_char_loc + 1
// table of number_regs token list registers
@define toks_base => etex_toks
// start of table of \eTeX's penalties
@define etex_pen_base => toks_base + number_regs
// additional penalties between lines
@define inter_line_penalties_loc => etex_pen_base
// penalties for creating club lines
@define club_penalties_loc => etex_pen_base + 1
// penalties for creating widow lines
@define widow_penalties_loc => etex_pen_base + 2
// ditto, just before a display
@define display_widow_penalties_loc => etex_pen_base + 3
// end of table of \eTeX's penalties
@define etex_pens => etex_pen_base + 4
// table of number_regs box registers
@define box_base => etex_pens
// internal font number outside math mode
@define cur_font_loc => box_base + number_regs
// table of number_math_fonts math font numbers
@define math_font_base => cur_font_loc + 1
// table of number_usvs command codes (the ``catcodes'')
@define cat_code_base => math_font_base + number_math_fonts
// table of number_usvs lowercase mappings
@define lc_code_base => cat_code_base + number_usvs
// table of number_usvs uppercase mappings
@define uc_code_base => lc_code_base + number_usvs
// table of number_usvs spacefactor mappings
@define sf_code_base => uc_code_base + number_usvs
// table of number_usvs math mode mappings
@define math_code_base => sf_code_base + number_usvs
// table of character substitutions
@define char_sub_code_base => math_code_base + number_usvs
// beginning of region 5
@define int_base => char_sub_code_base + number_usvs
@define par_shape_ptr => equiv(par_shape_loc)
@define output_routine => equiv(output_routine_loc)
@define every_par => equiv(every_par_loc)
@define every_math => equiv(every_math_loc)
@define every_display => equiv(every_display_loc)
@define every_hbox => equiv(every_hbox_loc)
@define every_vbox => equiv(every_vbox_loc)
@define every_job => equiv(every_job_loc)
@define every_cr => equiv(every_cr_loc)
@define err_help => equiv(err_help_loc)
@define toks(#) => equiv(toks_base + #)
@define box(#) => equiv(box_base + #)
@define cur_font => equiv(cur_font_loc)
@define fam_fnt(#) => equiv(math_font_base + #)
@define cat_code(#) => equiv(cat_code_base + #)
@define lc_code(#) => equiv(lc_code_base + #)
@define uc_code(#) => equiv(uc_code_base + #)
@define sf_code(#) => equiv(sf_code_base + #)
// Note: math_code(c) is the true math code plus min_halfword
@define math_code(#) => equiv(math_code_base + #)
// Note: char_sub_code(c) is the true substitution info plus min_halfword
@define char_sub_code(#) => equiv(char_sub_code_base + #)
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("output"),
      assign_toks,
      output_routine_loc,
    )

    primitive(
      strpool!("everypar"),
      assign_toks,
      every_par_loc,
    )

    primitive(
      strpool!("everymath"),
      assign_toks,
      every_math_loc,
    )

    primitive(
      strpool!("everydisplay"),
      assign_toks,
      every_display_loc,
    )

    primitive(
      strpool!("everyhbox"),
      assign_toks,
      every_hbox_loc,
    )

    primitive(
      strpool!("everyvbox"),
      assign_toks,
      every_vbox_loc,
    )

    primitive(
      strpool!("everyjob"),
      assign_toks,
      every_job_loc,
    )

    primitive(
      strpool!("everycr"),
      assign_toks,
      every_cr_loc,
    )

    primitive(
      strpool!("errhelp"),
      assign_toks,
      err_help_loc,
    )
⟧
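The location definitions of section 256 form a chain in which each table begins where the previous one ends. The following Python sketch computes such a layout from a list of sizes. The helper `layout`, the grouped field names `tex_token_params` and `etex_token_params`, and the three size constants are assumptions made for illustration; only the relative offsets are taken from the definitions above.

```python
def layout(start, fields):
    """Assign consecutive offsets: each (name, size) begins where the last ended."""
    offsets = {}
    pos = start
    for name, size in fields:
        offsets[name] = pos
        pos += size
    return offsets, pos


# Placeholder sizes; the actual constants are defined elsewhere.
NUMBER_REGS, NUMBER_MATH_FONTS, NUMBER_USVS = 256, 768, 0x110000
LOCAL_BASE = 0

fields = [
    ("par_shape_loc", 1),
    ("tex_token_params", 9),    # output_routine_loc .. err_help_loc
    ("etex_token_params", 2),   # every_eof_loc, XeTeX_inter_char_loc
    ("toks_base", NUMBER_REGS),
    ("etex_pen_base", 4),
    ("box_base", NUMBER_REGS),
    ("cur_font_loc", 1),
    ("math_font_base", NUMBER_MATH_FONTS),
    ("cat_code_base", NUMBER_USVS),
    ("lc_code_base", NUMBER_USVS),
    ("uc_code_base", NUMBER_USVS),
    ("sf_code_base", NUMBER_USVS),
    ("math_code_base", NUMBER_USVS),
    ("char_sub_code_base", NUMBER_USVS),
]
offsets, int_base = layout(LOCAL_BASE, fields)
```

Because every base is derived from its predecessor, changing one size moves all later tables, including the start of region 5, without any other edit.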

257.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    assign_toks:

    if (chr_code >= toks_base) {
        print_esc(strpool!("toks"));
        print_int(chr_code - toks_base);
    } else {
        case chr_code {
          output_routine_loc:
            print_esc(strpool!("output"));
          every_par_loc:
            print_esc(strpool!("everypar"));
          every_math_loc:
            print_esc(strpool!("everymath"));
          every_display_loc:
            print_esc(strpool!("everydisplay"));
          every_hbox_loc:
            print_esc(strpool!("everyhbox"));
          every_vbox_loc:
            print_esc(strpool!("everyvbox"));
          every_job_loc:
            print_esc(strpool!("everyjob"));
          every_cr_loc:
            print_esc(strpool!("everycr"));
          ⟦1468 Cases of |assign_toks| for |print_cmd_chr|⟧
          othercases:
            print_esc(strpool!("errhelp"));
        }
    }
⟧

258. We initialize most things to null or undefined values. An undefined font is represented by the internal code font_base.

However, the character code tables are given initial values based on the conventional interpretation of ASCII code. These initial values should not be changed when TeX is adapted for use with non-English languages; all changes to the initialization conventions should be made in format packages, not in TeX itself, so that global interchange of formats is possible.

@define null_font => font_base
@define var_fam_class => 7
@define active_math_char => 0x1fffff
@define is_active_math_char(#) =>
    math_char_field(#) == active_math_char
@define is_var_family(#) => math_class_field(#) == 7
⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    par_shape_ptr = null

    eq_type(par_shape_loc) = shape_ref

    eq_level(par_shape_loc) = level_one

    for (k in etex_pen_base to etex_pens - 1) {
        eqtb[k] = eqtb[par_shape_loc];
    }

    for (k in output_routine_loc to 
        toks_base
        + number_regs - 1
    ) {
        eqtb[k] = eqtb[undefined_control_sequence];
    }

    box(0) = null

    eq_type(box_base) = box_ref

    eq_level(box_base) = level_one

    for (k in box_base + 1 to box_base + number_regs - 1) {
        eqtb[k] = eqtb[box_base];
    }

    cur_font = null_font

    eq_type(cur_font_loc) = data

    eq_level(cur_font_loc) = level_one

    for (k in math_font_base to 
        math_font_base
        + number_math_fonts - 1
    ) {
        eqtb[k] = eqtb[cur_font_loc];
    }

    equiv(cat_code_base) = 0

    eq_type(cat_code_base) = data

    eq_level(cat_code_base) = level_one

    for (k in cat_code_base + 1 to int_base - 1) {
        eqtb[k] = eqtb[cat_code_base];
    }

    for (k in 0 to number_usvs - 1) {
        cat_code(k) = other_char;
        math_code(k) = hi(k);
        sf_code(k) = 1000;
    }

    cat_code(carriage_return) = car_ret

    cat_code(ord!(" ")) = spacer

    cat_code(ord!("\\")) = escape

    cat_code(ord!("%")) = comment

    cat_code(invalid_code) = invalid_char

    cat_code(null_code) = ignore

    for (k in ord!("0") to ord!("9")) {
        math_code(k) = hi(
          k + set_class_field(var_fam_class),
        );
    }

    for (k in ord!("A") to ord!("Z")) {
        cat_code(k) = letter;
        cat_code(k + ord!("a") - ord!("A")) = letter;
        math_code(k) = hi(
          k + set_family_field(1) + set_class_field(var_fam_class),
        );
        math_code(k + ord!("a") - ord!("A")) = hi(
          k
              + ord!("a")
              - ord!("A")
              + set_family_field(1)
              + set_class_field(var_fam_class),
        );
        lc_code(k) = k + ord!("a") - ord!("A");
        lc_code(k + ord!("a") - ord!("A")) = k + ord!("a") - ord!("A");
        uc_code(k) = k;
        uc_code(k + ord!("a") - ord!("A")) = k;
        sf_code(k) = 999;
    }
⟧
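The effect of the letter loop in section 258 on the case-mapping and space-factor tables can be reproduced on a small scale. The Python sketch below uses a 128-entry table for illustration (the tables in the program have number_usvs entries) and symbolic strings in place of the numeric category codes; the math codes and the special category codes for the space, backslash, and percent characters are left out.

```python
N = 128  # small table for illustration only

cat_code = ["other_char"] * N  # every character starts as other_char
lc_code = [0] * N              # no case mapping by default
uc_code = [0] * N
sf_code = [1000] * N           # default space factor

for k in range(ord("A"), ord("Z") + 1):
    lower = k + ord("a") - ord("A")
    cat_code[k] = cat_code[lower] = "letter"
    lc_code[k] = lc_code[lower] = lower  # both cases map down to lowercase
    uc_code[k] = uc_code[lower] = k      # both cases map up to uppercase
    sf_code[k] = 999                     # uppercase letters only
```

Note that only the uppercase letters receive the space factor 999; the lowercase letters keep the default of 1000, and non-letters keep lc_code and uc_code equal to zero.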

259.

⟦259 Show equivalent |n|, in region 4⟧ = ⟦
    if (
        (n == par_shape_loc)
        || ((n >= etex_pen_base) && (n < etex_pens))
    ) {
        print_cmd_chr(set_shape, n);
        print_char(ord!("="));
        if (equiv(n) == null) {
            print_char(ord!("0"));
        } else if (n > par_shape_loc) {
            print_int(penalty(equiv(n)));
            print_char(ord!(" "));
            print_int(penalty(equiv(n) + 1));
            if (penalty(equiv(n)) > 1) {
                print_esc(strpool!("ETC."));
            }
        } else {
            print_int(info(par_shape_ptr));
        }
    } else if (n < toks_base) {
        print_cmd_chr(assign_toks, n);
        print_char(ord!("="));
        if (equiv(n) != null) {
            show_token_list(link(equiv(n)), null, 32);
        }
    } else if (n < box_base) {
        print_esc(strpool!("toks"));
        print_int(n - toks_base);
        print_char(ord!("="));
        if (equiv(n) != null) {
            show_token_list(link(equiv(n)), null, 32);
        }
    } else if (n < cur_font_loc) {
        print_esc(strpool!("box"));
        print_int(n - box_base);
        print_char(ord!("="));
        if (equiv(n) == null) {
            print(strpool!("void"));
        } else {
            depth_threshold = 0;
            breadth_max = 1;
            show_node_list(equiv(n));
        }
    } else if (n < cat_code_base) {
        ⟦260 Show the font identifier in |eqtb[n]|⟧
    } else {
        ⟦261 Show the halfword code in |eqtb[n]|⟧
    }
⟧
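The first branch of section 259 abbreviates a penalty array to two numbers, followed by \ETC. when more entries exist. The Python sketch below illustrates that formatting. It represents the words at equiv(n) as a Python list, with `None` for a null pointer, and it reads penalty(equiv(n)) as the number of entries; that reading is an inference from the `> 1` test, not something stated in this section. The name `show_penalties` is hypothetical, and the escape character is assumed to be a backslash.

```python
def show_penalties(node):
    """Format a penalty array the way the first branch of section 259 does.

    `node` is a list standing in for the memory words at equiv(n):
    node[0] is taken to be the count and node[1] the first penalty.
    None stands in for a null pointer.
    """
    if node is None:
        return "0"
    text = f"{node[0]} {node[1]}"
    if node[0] > 1:
        text += "\\ETC."  # more entries follow but are not shown
    return text
```

For example, `show_penalties([3, 100, 200, 300])` shows only the count and the first value before the \ETC. marker.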

260.

⟦260 Show the font identifier in |eqtb[n]|⟧ = ⟦
    {
        if (n == cur_font_loc) {
            print(strpool!("current font"));
        } else if (n < math_font_base + script_size) {
            print_esc(strpool!("textfont"));
            print_int(n - math_font_base);
        } else if (n < math_font_base + script_script_size) {
            print_esc(strpool!("scriptfont"));
            print_int(n - math_font_base - script_size);
        } else {
            print_esc(strpool!("scriptscriptfont"));
            print_int(
              n - math_font_base - script_script_size,
            );
        }
        print_char(ord!("="));
        // that's font_id_text(equiv(n))
        print_esc(hash[font_id_base + equiv(n)].rh);
    }
⟧
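Section 260 recovers both the size class and the family number of a math font entry from its offset within the table. A minimal Python sketch of that computation follows; the values of `SCRIPT_SIZE` and `SCRIPT_SCRIPT_SIZE` are assumptions for illustration, `math_font_label` is a hypothetical name, and the current-font case is omitted.

```python
SCRIPT_SIZE = 256          # assumed offset of the \scriptfont block
SCRIPT_SCRIPT_SIZE = 512   # assumed offset of the \scriptscriptfont block


def math_font_label(offset):
    """Map n - math_font_base to a primitive name and a family number."""
    if offset < SCRIPT_SIZE:
        return "\\textfont", offset
    if offset < SCRIPT_SCRIPT_SIZE:
        return "\\scriptfont", offset - SCRIPT_SIZE
    return "\\scriptscriptfont", offset - SCRIPT_SCRIPT_SIZE
```

The size constants thus serve two purposes: they separate the three blocks and they are subtracted to obtain the family number within a block.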

261.

⟦261 Show the halfword code in |eqtb[n]|⟧ = ⟦
    if (n < math_code_base) {
        if (n < lc_code_base) {
            print_esc(strpool!("catcode"));
            print_int(n - cat_code_base);
        } else if (n < uc_code_base) {
            print_esc(strpool!("lccode"));
            print_int(n - lc_code_base);
        } else if (n < sf_code_base) {
            print_esc(strpool!("uccode"));
            print_int(n - uc_code_base);
        } else {
            print_esc(strpool!("sfcode"));
            print_int(n - sf_code_base);
        }
        print_char(ord!("="));
        print_int(equiv(n));
    } else {
        print_esc(strpool!("mathcode"));
        print_int(n - math_code_base);
        print_char(ord!("="));
        print_int(ho(equiv(n)));
    }
⟧
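The chain of comparisons in section 261 finds the table that contains location n. The Python sketch below obtains the same classification with a binary search over the sorted table bases, which is a substitute technique and not what the program does. The table width and the value of `CAT_CODE_BASE` are placeholders, and `code_table_entry` is a hypothetical name.

```python
from bisect import bisect_right

NUMBER_USVS = 0x110000  # assumed width of each table
CAT_CODE_BASE = 0       # placeholder
NAMES = ["catcode", "lccode", "uccode", "sfcode", "mathcode"]
BASES = [CAT_CODE_BASE + i * NUMBER_USVS for i in range(len(NAMES))]


def code_table_entry(n):
    """Return the primitive name and character index for location n."""
    # The last base that is <= n identifies the table containing n.
    i = bisect_right(BASES, n) - 1
    return "\\" + NAMES[i], n - BASES[i]
```

As in the original, every location at or above math_code_base is reported as a \mathcode entry.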

262. Region 5 of eqtb contains the integer parameters and registers defined here, as well as the del_code table. The latter table differs from the cat_code .. math_code tables that precede it, since delimiter codes are fullword integers while the other kinds of codes occupy at most a halfword. This is what makes region 5 different from region 4. We will store the eq_level information in an auxiliary array of quarterwords that will be defined later.

// badness tolerance before hyphenation
@define pretolerance_code => 0
// badness tolerance after hyphenation
@define tolerance_code => 1
// added to the badness of every line
@define line_penalty_code => 2
// penalty for break after discretionary hyphen
@define hyphen_penalty_code => 3
// penalty for break after explicit hyphen
@define ex_hyphen_penalty_code => 4
// penalty for creating a club line
@define club_penalty_code => 5
// penalty for creating a widow line
@define widow_penalty_code => 6
// ditto, just before a display
@define display_widow_penalty_code => 7
// penalty for breaking a page at a broken line
@define broken_penalty_code => 8
// penalty for breaking after a binary operation
@define bin_op_penalty_code => 9
// penalty for breaking after a relation
@define rel_penalty_code => 10
// penalty for breaking just before a displayed formula
@define pre_display_penalty_code => 11
// penalty for breaking just after a displayed formula
@define post_display_penalty_code => 12
// additional penalty between lines
@define inter_line_penalty_code => 13
// demerits for double hyphen break
@define double_hyphen_demerits_code => 14
// demerits for final hyphen break
@define final_hyphen_demerits_code => 15
// demerits for adjacent incompatible lines
@define adj_demerits_code => 16
@define mag_code => 17 // magnification ratio
// ratio for variable-size delimiters
@define delimiter_factor_code => 18
// change in number of lines for a paragraph
@define looseness_code => 19
@define time_code => 20 // current time of day
@define day_code => 21 // current day of the month
@define month_code => 22 // current month of the year
@define year_code => 23 // current year of our Lord
// nodes per level in show_box 
@define show_box_breadth_code => 24
// maximum level in show_box 
@define show_box_depth_code => 25
// hboxes exceeding this badness will be shown by hpack 
@define hbadness_code => 26
// vboxes exceeding this badness will be shown by vpack 
@define vbadness_code => 27
// pause after each line is read from a file
@define pausing_code => 28
// show diagnostic output on terminal
@define tracing_online_code => 29
// show macros as they are being expanded
@define tracing_macros_code => 30
// show memory usage if \TeX\ knows it
@define tracing_stats_code => 31
// show line-break calculations
@define tracing_paragraphs_code => 32
// show page-break calculations
@define tracing_pages_code => 33
// show boxes when they are shipped out
@define tracing_output_code => 34
// show characters that aren't in the font
@define tracing_lost_chars_code => 35
// show command codes at big_switch 
@define tracing_commands_code => 36
// show equivalents when they are restored
@define tracing_restores_code => 37
// hyphenate words beginning with a capital letter
@define uc_hyph_code => 38
// penalty found at current page break
@define output_penalty_code => 39
// bound on consecutive dead cycles of output
@define max_dead_cycles_code => 40
// hanging indentation changes after this many lines
@define hang_after_code => 41
// penalty for insertions held over after a split
@define floating_penalty_code => 42
// override \.{\\global} specifications
@define global_defs_code => 43
@define cur_fam_code => 44 // current family
// escape character for token output
@define escape_char_code => 45
// value of \.{\\hyphenchar} when a font is loaded
@define default_hyphen_char_code => 46
// value of \.{\\skewchar} when a font is loaded
@define default_skew_char_code => 47
// character placed at the right end of the buffer
@define end_line_char_code => 48
// character that prints as print_ln 
@define new_line_char_code => 49
@define language_code => 50 // current hyphenation table
// minimum left hyphenation fragment size
@define left_hyphen_min_code => 51
// minimum right hyphenation fragment size
@define right_hyphen_min_code => 52
// do not remove insertion nodes from \.{\\box255}
@define holding_inserts_code => 53
// maximum intermediate line pairs shown
@define error_context_lines_code => 54
// total number of \TeX's integer parameters
@define tex_int_pars => 55
// base for web2c's integer parameters
@define web2c_int_base => tex_int_pars
// smallest value in the charsubdef list
@define char_sub_def_min_code => web2c_int_base
// largest value in the charsubdef list
@define char_sub_def_max_code => web2c_int_base + 1
// traces changes to a charsubdef def
@define tracing_char_sub_def_code => web2c_int_base + 2
// tracing input_stack level if tracingmacros positive
@define tracing_stack_levels_code => web2c_int_base + 3
// total number of web2c's integer parameters
@define web2c_int_pars => web2c_int_base + 4
// base for \eTeX's integer parameters
@define etex_int_base => web2c_int_pars
// show assignments
@define tracing_assigns_code => etex_int_base
// show save/restore groups
@define tracing_groups_code => etex_int_base + 1
// show conditionals
@define tracing_ifs_code => etex_int_base + 2
// show pseudo file open and close
@define tracing_scan_tokens_code => etex_int_base + 3
// show incomplete groups and ifs within files
@define tracing_nesting_code => etex_int_base + 4
// text direction preceding a display
@define pre_display_direction_code => etex_int_base + 5
// adjustment for last line of paragraph
@define last_line_fit_code => etex_int_base + 6
// save items discarded from vlists
@define saving_vdiscards_code => etex_int_base + 7
// save hyphenation codes for languages
@define saving_hyph_codes_code => etex_int_base + 8
// suppress errors for missing fonts
@define suppress_fontnotfound_error_code => etex_int_base + 9
// string number of locale to use for linebreak locations
@define XeTeX_linebreak_locale_code => etex_int_base + 10
// penalty to use at locale-dependent linebreak locations
@define XeTeX_linebreak_penalty_code => etex_int_base + 11
// protrude chars at left/right edge of paragraphs
@define XeTeX_protrude_chars_code => etex_int_base + 12
// \eTeX\ state variables
@define eTeX_state_code => etex_int_base + 13
// total number of \eTeX's integer parameters
@define etex_int_pars => eTeX_state_code + eTeX_states
@define synctex_code => etex_int_pars
// total number of integer parameters
@define int_pars => synctex_code + 1
// number_regs user \.{\\count} registers
@define count_base => int_base + int_pars
// number_usvs delimiter code mappings
@define del_code_base => count_base + number_regs
// beginning of region 6
@define dimen_base => del_code_base + number_usvs
@define del_code(#) => eqtb[del_code_base + #].int
@define count(#) => eqtb[count_base + #].int
// an integer parameter
@define int_par(#) => eqtb[int_base + #].int
@define pretolerance => int_par(pretolerance_code)
@define tolerance => int_par(tolerance_code)
@define line_penalty => int_par(line_penalty_code)
@define hyphen_penalty => int_par(hyphen_penalty_code)
@define ex_hyphen_penalty => int_par(ex_hyphen_penalty_code)
@define club_penalty => int_par(club_penalty_code)
@define widow_penalty => int_par(widow_penalty_code)
@define display_widow_penalty =>
    int_par(display_widow_penalty_code)
@define broken_penalty => int_par(broken_penalty_code)
@define bin_op_penalty => int_par(bin_op_penalty_code)
@define rel_penalty => int_par(rel_penalty_code)
@define pre_display_penalty =>
    int_par(pre_display_penalty_code)
@define post_display_penalty =>
    int_par(post_display_penalty_code)
@define inter_line_penalty =>
    int_par(inter_line_penalty_code)
@define double_hyphen_demerits =>
    int_par(double_hyphen_demerits_code)
@define final_hyphen_demerits =>
    int_par(final_hyphen_demerits_code)
@define adj_demerits => int_par(adj_demerits_code)
@define mag => int_par(mag_code)
@define delimiter_factor => int_par(delimiter_factor_code)
@define looseness => int_par(looseness_code)
@define time => int_par(time_code)
@define day => int_par(day_code)
@define month => int_par(month_code)
@define year => int_par(year_code)
@define show_box_breadth => int_par(show_box_breadth_code)
@define show_box_depth => int_par(show_box_depth_code)
@define hbadness => int_par(hbadness_code)
@define vbadness => int_par(vbadness_code)
@define pausing => int_par(pausing_code)
@define tracing_online => int_par(tracing_online_code)
@define tracing_macros => int_par(tracing_macros_code)
@define tracing_stats => int_par(tracing_stats_code)
@define tracing_paragraphs =>
    int_par(tracing_paragraphs_code)
@define tracing_pages => int_par(tracing_pages_code)
@define tracing_output => int_par(tracing_output_code)
@define tracing_lost_chars =>
    int_par(tracing_lost_chars_code)
@define tracing_commands => int_par(tracing_commands_code)
@define tracing_restores => int_par(tracing_restores_code)
@define uc_hyph => int_par(uc_hyph_code)
@define output_penalty => int_par(output_penalty_code)
@define max_dead_cycles => int_par(max_dead_cycles_code)
@define hang_after => int_par(hang_after_code)
@define floating_penalty => int_par(floating_penalty_code)
@define global_defs => int_par(global_defs_code)
@define cur_fam => int_par(cur_fam_code)
@define escape_char => int_par(escape_char_code)
@define default_hyphen_char =>
    int_par(default_hyphen_char_code)
@define default_skew_char => int_par(default_skew_char_code)
@define end_line_char => int_par(end_line_char_code)
@define new_line_char => int_par(new_line_char_code)
@define language => int_par(language_code)
@define left_hyphen_min => int_par(left_hyphen_min_code)
@define right_hyphen_min => int_par(right_hyphen_min_code)
@define holding_inserts => int_par(holding_inserts_code)
@define error_context_lines =>
    int_par(error_context_lines_code)
@define synctex => int_par(synctex_code)
@define char_sub_def_min => int_par(char_sub_def_min_code)
@define char_sub_def_max => int_par(char_sub_def_max_code)
@define tracing_char_sub_def =>
    int_par(tracing_char_sub_def_code)
@define tracing_stack_levels =>
    int_par(tracing_stack_levels_code)
@define tracing_assigns => int_par(tracing_assigns_code)
@define tracing_groups => int_par(tracing_groups_code)
@define tracing_ifs => int_par(tracing_ifs_code)
@define tracing_scan_tokens =>
    int_par(tracing_scan_tokens_code)
@define tracing_nesting => int_par(tracing_nesting_code)
@define pre_display_direction =>
    int_par(pre_display_direction_code)
@define last_line_fit => int_par(last_line_fit_code)
@define saving_vdiscards => int_par(saving_vdiscards_code)
@define saving_hyph_codes => int_par(saving_hyph_codes_code)
@define suppress_fontnotfound_error =>
    int_par(suppress_fontnotfound_error_code)
@define XeTeX_linebreak_locale =>
    int_par(XeTeX_linebreak_locale_code)
@define XeTeX_linebreak_penalty =>
    int_par(XeTeX_linebreak_penalty_code)
@define XeTeX_protrude_chars =>
    int_par(XeTeX_protrude_chars_code)
⟦262 Assign the values |depth_threshold:=show_box_depth| and |breadth_max:=show_box_breadth|⟧ = ⟦
    depth_threshold = show_box_depth

    breadth_max = show_box_breadth
⟧
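The parameter codes of section 262 are numbered in three consecutive groups: TeX's own codes, then the web2c additions, then the e-TeX and XeTeX additions, each group starting where the previous one stops. The Python sketch below reproduces that numbering for the two later groups. The helper `chain` is a hypothetical name; the starting value 55 and the order of the names are taken from the definitions above.

```python
TEX_INT_PARS = 55  # total number of TeX's integer parameters


def chain(base, names):
    """Number `names` consecutively from `base`; also return the next free code."""
    codes = {name: base + i for i, name in enumerate(names)}
    return codes, base + len(names)


web2c, web2c_int_pars = chain(TEX_INT_PARS, [
    "char_sub_def_min", "char_sub_def_max",
    "tracing_char_sub_def", "tracing_stack_levels",
])
etex, etex_state_code = chain(web2c_int_pars, [
    "tracing_assigns", "tracing_groups", "tracing_ifs",
    "tracing_scan_tokens", "tracing_nesting", "pre_display_direction",
    "last_line_fit", "saving_vdiscards", "saving_hyph_codes",
    "suppress_fontnotfound_error", "XeTeX_linebreak_locale",
    "XeTeX_linebreak_penalty", "XeTeX_protrude_chars",
])
```

The second value returned by each call corresponds to web2c_int_pars and eTeX_state_code respectively, so a parameter added to one group shifts every later code automatically.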

263. We can print the symbolic name of an integer parameter as follows.

function print_param(n: integer) {
    case n {
      pretolerance_code:
        print_esc(strpool!("pretolerance"));
      tolerance_code:
        print_esc(strpool!("tolerance"));
      line_penalty_code:
        print_esc(strpool!("linepenalty"));
      hyphen_penalty_code:
        print_esc(strpool!("hyphenpenalty"));
      ex_hyphen_penalty_code:
        print_esc(strpool!("exhyphenpenalty"));
      club_penalty_code:
        print_esc(strpool!("clubpenalty"));
      widow_penalty_code:
        print_esc(strpool!("widowpenalty"));
      display_widow_penalty_code:
        print_esc(strpool!("displaywidowpenalty"));
      broken_penalty_code:
        print_esc(strpool!("brokenpenalty"));
      bin_op_penalty_code:
        print_esc(strpool!("binoppenalty"));
      rel_penalty_code:
        print_esc(strpool!("relpenalty"));
      pre_display_penalty_code:
        print_esc(strpool!("predisplaypenalty"));
      post_display_penalty_code:
        print_esc(strpool!("postdisplaypenalty"));
      inter_line_penalty_code:
        print_esc(strpool!("interlinepenalty"));
      double_hyphen_demerits_code:
        print_esc(strpool!("doublehyphendemerits"));
      final_hyphen_demerits_code:
        print_esc(strpool!("finalhyphendemerits"));
      adj_demerits_code:
        print_esc(strpool!("adjdemerits"));
      mag_code:
        print_esc(strpool!("mag"));
      delimiter_factor_code:
        print_esc(strpool!("delimiterfactor"));
      looseness_code:
        print_esc(strpool!("looseness"));
      time_code:
        print_esc(strpool!("time"));
      day_code:
        print_esc(strpool!("day"));
      month_code:
        print_esc(strpool!("month"));
      year_code:
        print_esc(strpool!("year"));
      show_box_breadth_code:
        print_esc(strpool!("showboxbreadth"));
      show_box_depth_code:
        print_esc(strpool!("showboxdepth"));
      hbadness_code:
        print_esc(strpool!("hbadness"));
      vbadness_code:
        print_esc(strpool!("vbadness"));
      pausing_code:
        print_esc(strpool!("pausing"));
      tracing_online_code:
        print_esc(strpool!("tracingonline"));
      tracing_macros_code:
        print_esc(strpool!("tracingmacros"));
      tracing_stats_code:
        print_esc(strpool!("tracingstats"));
      tracing_paragraphs_code:
        print_esc(strpool!("tracingparagraphs"));
      tracing_pages_code:
        print_esc(strpool!("tracingpages"));
      tracing_output_code:
        print_esc(strpool!("tracingoutput"));
      tracing_lost_chars_code:
        print_esc(strpool!("tracinglostchars"));
      tracing_commands_code:
        print_esc(strpool!("tracingcommands"));
      tracing_restores_code:
        print_esc(strpool!("tracingrestores"));
      uc_hyph_code:
        print_esc(strpool!("uchyph"));
      output_penalty_code:
        print_esc(strpool!("outputpenalty"));
      max_dead_cycles_code:
        print_esc(strpool!("maxdeadcycles"));
      hang_after_code:
        print_esc(strpool!("hangafter"));
      floating_penalty_code:
        print_esc(strpool!("floatingpenalty"));
      global_defs_code:
        print_esc(strpool!("globaldefs"));
      cur_fam_code:
        print_esc(strpool!("fam"));
      escape_char_code:
        print_esc(strpool!("escapechar"));
      default_hyphen_char_code:
        print_esc(strpool!("defaulthyphenchar"));
      default_skew_char_code:
        print_esc(strpool!("defaultskewchar"));
      end_line_char_code:
        print_esc(strpool!("endlinechar"));
      new_line_char_code:
        print_esc(strpool!("newlinechar"));
      language_code:
        print_esc(strpool!("language"));
      left_hyphen_min_code:
        print_esc(strpool!("lefthyphenmin"));
      right_hyphen_min_code:
        print_esc(strpool!("righthyphenmin"));
      holding_inserts_code:
        print_esc(strpool!("holdinginserts"));
      error_context_lines_code:
        print_esc(strpool!("errorcontextlines"));
      char_sub_def_min_code:
        print_esc(strpool!("charsubdefmin"));
      char_sub_def_max_code:
        print_esc(strpool!("charsubdefmax"));
      tracing_char_sub_def_code:
        print_esc(strpool!("tracingcharsubdef"));
      tracing_stack_levels_code:
        print_esc(strpool!("tracingstacklevels"));
      XeTeX_linebreak_penalty_code:
        print_esc(strpool!("XeTeXlinebreakpenalty"));
      XeTeX_protrude_chars_code:
        print_esc(strpool!("XeTeXprotrudechars"));
      ⟦1707 synctex case for |print_param|⟧
      ⟦1469 Cases for |print_param|⟧
      othercases:
        print(strpool!("[unknown integer parameter!]"));
    }
}
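The case statement in print_param is in effect a lookup table with a fallback for unknown codes. A minimal Python sketch of that behavior follows; `PARAM_NAMES` holds only a handful of entries for illustration, `param_name` is a hypothetical name, and the escape character is assumed to be a backslash.

```python
# Excerpt only; the codes are those defined in section 262.
PARAM_NAMES = {
    0: "pretolerance",
    1: "tolerance",
    2: "linepenalty",
    17: "mag",
    54: "errorcontextlines",
}


def param_name(n, escape="\\"):
    """Return the text that print_param would print for code n."""
    name = PARAM_NAMES.get(n)
    if name is None:
        # The fallback is printed as plain text, without an escape character.
        return "[unknown integer parameter!]"
    return escape + name
```

The fallback branch uses print where the named cases use print_esc, which is why the bracketed message carries no escape character.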

264. The integer parameter names must be entered into the hash table.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("pretolerance"),
      assign_int,
      int_base + pretolerance_code,
    )

    primitive(
      strpool!("tolerance"),
      assign_int,
      int_base + tolerance_code,
    )

    primitive(
      strpool!("linepenalty"),
      assign_int,
      int_base + line_penalty_code,
    )

    primitive(
      strpool!("hyphenpenalty"),
      assign_int,
      int_base + hyphen_penalty_code,
    )

    primitive(
      strpool!("exhyphenpenalty"),
      assign_int,
      int_base + ex_hyphen_penalty_code,
    )

    primitive(
      strpool!("clubpenalty"),
      assign_int,
      int_base + club_penalty_code,
    )

    primitive(
      strpool!("widowpenalty"),
      assign_int,
      int_base + widow_penalty_code,
    )

    primitive(
      strpool!("displaywidowpenalty"),
      assign_int,
      int_base + display_widow_penalty_code,
    )

    primitive(
      strpool!("brokenpenalty"),
      assign_int,
      int_base + broken_penalty_code,
    )

    primitive(
      strpool!("binoppenalty"),
      assign_int,
      int_base + bin_op_penalty_code,
    )

    primitive(
      strpool!("relpenalty"),
      assign_int,
      int_base + rel_penalty_code,
    )

    primitive(
      strpool!("predisplaypenalty"),
      assign_int,
      int_base + pre_display_penalty_code,
    )

    primitive(
      strpool!("postdisplaypenalty"),
      assign_int,
      int_base + post_display_penalty_code,
    )

    primitive(
      strpool!("interlinepenalty"),
      assign_int,
      int_base + inter_line_penalty_code,
    )

    primitive(
      strpool!("doublehyphendemerits"),
      assign_int,
      int_base + double_hyphen_demerits_code,
    )

    primitive(
      strpool!("finalhyphendemerits"),
      assign_int,
      int_base + final_hyphen_demerits_code,
    )

    primitive(
      strpool!("adjdemerits"),
      assign_int,
      int_base + adj_demerits_code,
    )

    primitive(
      strpool!("mag"),
      assign_int,
      int_base + mag_code,
    )

    primitive(
      strpool!("delimiterfactor"),
      assign_int,
      int_base + delimiter_factor_code,
    )

    primitive(
      strpool!("looseness"),
      assign_int,
      int_base + looseness_code,
    )

    primitive(
      strpool!("time"),
      assign_int,
      int_base + time_code,
    )

    primitive(
      strpool!("day"),
      assign_int,
      int_base + day_code,
    )

    primitive(
      strpool!("month"),
      assign_int,
      int_base + month_code,
    )

    primitive(
      strpool!("year"),
      assign_int,
      int_base + year_code,
    )

    primitive(
      strpool!("showboxbreadth"),
      assign_int,
      int_base + show_box_breadth_code,
    )

    primitive(
      strpool!("showboxdepth"),
      assign_int,
      int_base + show_box_depth_code,
    )

    primitive(
      strpool!("hbadness"),
      assign_int,
      int_base + hbadness_code,
    )

    primitive(
      strpool!("vbadness"),
      assign_int,
      int_base + vbadness_code,
    )

    primitive(
      strpool!("pausing"),
      assign_int,
      int_base + pausing_code,
    )

    primitive(
      strpool!("tracingonline"),
      assign_int,
      int_base + tracing_online_code,
    )

    primitive(
      strpool!("tracingmacros"),
      assign_int,
      int_base + tracing_macros_code,
    )

    primitive(
      strpool!("tracingstats"),
      assign_int,
      int_base + tracing_stats_code,
    )

    primitive(
      strpool!("tracingparagraphs"),
      assign_int,
      int_base + tracing_paragraphs_code,
    )

    primitive(
      strpool!("tracingpages"),
      assign_int,
      int_base + tracing_pages_code,
    )

    primitive(
      strpool!("tracingoutput"),
      assign_int,
      int_base + tracing_output_code,
    )

    primitive(
      strpool!("tracinglostchars"),
      assign_int,
      int_base + tracing_lost_chars_code,
    )

    primitive(
      strpool!("tracingcommands"),
      assign_int,
      int_base + tracing_commands_code,
    )

    primitive(
      strpool!("tracingrestores"),
      assign_int,
      int_base + tracing_restores_code,
    )

    primitive(
      strpool!("uchyph"),
      assign_int,
      int_base + uc_hyph_code,
    )

    primitive(
      strpool!("outputpenalty"),
      assign_int,
      int_base + output_penalty_code,
    )

    primitive(
      strpool!("maxdeadcycles"),
      assign_int,
      int_base + max_dead_cycles_code,
    )

    primitive(
      strpool!("hangafter"),
      assign_int,
      int_base + hang_after_code,
    )

    primitive(
      strpool!("floatingpenalty"),
      assign_int,
      int_base + floating_penalty_code,
    )

    primitive(
      strpool!("globaldefs"),
      assign_int,
      int_base + global_defs_code,
    )

    primitive(
      strpool!("fam"),
      assign_int,
      int_base + cur_fam_code,
    )

    primitive(
      strpool!("escapechar"),
      assign_int,
      int_base + escape_char_code,
    )

    primitive(
      strpool!("defaulthyphenchar"),
      assign_int,
      int_base + default_hyphen_char_code,
    )

    primitive(
      strpool!("defaultskewchar"),
      assign_int,
      int_base + default_skew_char_code,
    )

    primitive(
      strpool!("endlinechar"),
      assign_int,
      int_base + end_line_char_code,
    )

    primitive(
      strpool!("newlinechar"),
      assign_int,
      int_base + new_line_char_code,
    )

    primitive(
      strpool!("language"),
      assign_int,
      int_base + language_code,
    )

    primitive(
      strpool!("lefthyphenmin"),
      assign_int,
      int_base + left_hyphen_min_code,
    )

    primitive(
      strpool!("righthyphenmin"),
      assign_int,
      int_base + right_hyphen_min_code,
    )

    primitive(
      strpool!("holdinginserts"),
      assign_int,
      int_base + holding_inserts_code,
    )

    primitive(
      strpool!("errorcontextlines"),
      assign_int,
      int_base + error_context_lines_code,
    )

    if (mltex_p) {
        // enable character substitution
        mltex_enabled_p = true;
        // remove the if-clause to enable \charsubdefmin
        if (false) {
            primitive(
              strpool!("charsubdefmin"),
              assign_int,
              int_base + char_sub_def_min_code,
            );
        }
        primitive(
          strpool!("charsubdefmax"),
          assign_int,
          int_base + char_sub_def_max_code,
        );
        primitive(
          strpool!("tracingcharsubdef"),
          assign_int,
          int_base + tracing_char_sub_def_code,
        );
    }

    primitive(
      strpool!("tracingstacklevels"),
      assign_int,
      int_base + tracing_stack_levels_code,
    )

    primitive(
      strpool!("XeTeXlinebreakpenalty"),
      assign_int,
      int_base + XeTeX_linebreak_penalty_code,
    )

    primitive(
      strpool!("XeTeXprotrudechars"),
      assign_int,
      int_base + XeTeX_protrude_chars_code,
    )
⟧

265.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    assign_int:

    if (chr_code < count_base) {
        print_param(chr_code - int_base);
    } else {
        print_esc(strpool!("count"));
        print_int(chr_code - count_base);
    }
⟧
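
The test on chr_code in the case above can be illustrated with a small Python sketch. The numeric values of INT_BASE and COUNT_BASE are invented for the example (the real ones follow from the layout of eqtb), the name table is cut down to four entries, and the escape character is assumed to be a backslash.

```python
# Hypothetical region-5 layout; the real int_base and count_base are
# derived from the sizes of the earlier eqtb regions.
INT_BASE = 1000
PARAM_NAMES = ["pretolerance", "tolerance", "linepenalty", "hyphenpenalty"]
COUNT_BASE = INT_BASE + len(PARAM_NAMES)

def describe_assign_int(chr_code: int) -> str:
    """Return the symbolic name shown for an assign_int command."""
    if chr_code < COUNT_BASE:
        code = chr_code - INT_BASE
        if 0 <= code < len(PARAM_NAMES):
            return "\\" + PARAM_NAMES[code]
        return "[unknown integer parameter!]"
    # Anything at or above count_base is a \count register.
    return "\\count" + str(chr_code - COUNT_BASE)
```

Under these assumed values, describe_assign_int(INT_BASE + 1) names \tolerance, while describe_assign_int(COUNT_BASE + 7) names \count7.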

266. The integer parameters should really be initialized by a macro package; the following initialization does the minimum to keep TEX from complete failure.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    for (k in int_base to del_code_base - 1) {
        eqtb[k].int = 0;
    }

    char_sub_def_min = 256

    // allow \charsubdef for char 0
    // tracing_char_sub_def = 0 is already done
    char_sub_def_max = -1

    mag = 1000

    tolerance = 10000

    hang_after = 1

    max_dead_cycles = 25

    escape_char = ord!("\\")

    end_line_char = carriage_return

    for (k in 0 to number_usvs - 1) {
        del_code(k) = -1;
    }

    // this null delimiter is used in error recovery
    del_code(ord!(".")) = 0
⟧

267. The following procedure, which is called just before TEX initializes its input and output, establishes the initial values of the date and time. It calls a date_and_time C macro (a.k.a. dateandtime), which calls the C function get_date_and_time, passing it the addresses of sys_time, etc., so they can be set by the routine. get_date_and_time also sets up interrupt catching if that is conditionally compiled in the C code.

We have to initialize the sys_ variables because their values are what gets written on the first line of the log file. (New in 2021.)

function fix_date_and_time() {
    date_and_time(sys_time, sys_day, sys_month, sys_year);
    // minutes since midnight
    time = sys_time;
    // day of the month
    day = sys_day;
    // month of the year
    month = sys_month;
    // Anno Domini
    year = sys_year;
}
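
The units of the four values can be made concrete with a short Python sketch. The function here is only a stand-in for the C routine get_date_and_time, whose code is not shown in this program; it takes the moment as an argument so that the result is reproducible.

```python
from datetime import datetime

def date_and_time(now: datetime) -> tuple:
    """Return (time, day, month, year) in the units TeX uses."""
    # \time counts minutes since midnight, not seconds.
    minutes_since_midnight = now.hour * 60 + now.minute
    return minutes_since_midnight, now.day, now.month, now.year
```

For example, half past noon on 4 July 2021 gives a time value of 750.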

268.

⟦268 Show equivalent |n|, in region 5⟧ = ⟦
    {
        if (n < count_base) {
            print_param(n - int_base);
        } else if (n < del_code_base) {
            print_esc(strpool!("count"));
            print_int(n - count_base);
        } else {
            print_esc(strpool!("delcode"));
            print_int(n - del_code_base);
        }
        print_char(ord!("="));
        print_int(eqtb[n].int);
    }
⟧

269.

⟦269 Set variable |c| to the current escape character⟧ = ⟦
    c = escape_char
⟧

270.

⟦270 Character |s| is the current new-line character⟧ = ⟦
    s == new_line_char
⟧

271. TEX is occasionally supposed to print diagnostic information that goes only into the transcript file, unless tracing_online is positive. Here are two routines that adjust the destination of print commands:

// prepare to do some tracing
function begin_diagnostic() {
    old_setting = selector;
    if ((tracing_online <= 0) && (selector == term_and_log)) {
        decr(selector);
        if (history == spotless) {
            history = warning_issued;
        }
    }
}

// restore proper conditions after tracing
function end_diagnostic(blank_line: boolean) {
    print_nl(strpool!(""));
    if (blank_line) {
        print_ln;
    }
    selector = old_setting;
}
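
How these two routines redirect output can be sketched in Python. The numeric selector and history codes are assumptions made for the example (they are defined in other parts of the program), and the printing of the final newline and optional blank line in end_diagnostic is left out.

```python
# Assumed codes; only their ordering matters here:
# log_only must be exactly one less than term_and_log.
TERM_ONLY, LOG_ONLY, TERM_AND_LOG = 17, 18, 19
SPOTLESS, WARNING_ISSUED = 0, 1

class Printer:
    def __init__(self, tracing_online: int = 0):
        self.selector = TERM_AND_LOG
        self.old_setting = TERM_AND_LOG
        self.tracing_online = tracing_online
        self.history = SPOTLESS

    def begin_diagnostic(self) -> None:
        self.old_setting = self.selector
        if self.tracing_online <= 0 and self.selector == TERM_AND_LOG:
            self.selector -= 1  # term_and_log becomes log_only
            if self.history == SPOTLESS:
                self.history = WARNING_ISSUED

    def end_diagnostic(self) -> None:
        # Restore whatever destination was active before tracing began.
        self.selector = self.old_setting
```

With tracing_online positive the selector is left alone, so the diagnostic goes to both the terminal and the transcript file.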

272. Of course we had better declare a few more global variables, if the previous routines are going to work.

⟦13 Global variables⟧ += ⟦
    var old_setting: 0 .. max_selector;

    // date and time supplied by external system
    var sys_time, sys_day, sys_month, sys_year: integer;
⟧

273. The final region of eqtb contains the dimension parameters defined here, and the number_regs \dimen registers.

@define par_indent_code => 0 // indentation of paragraphs
@define math_surround_code => 1 // space around math in text
// threshold for line_skip instead of baseline_skip 
@define line_skip_limit_code => 2
@define hsize_code => 3 // line width in horizontal mode
@define vsize_code => 4 // page height in vertical mode
// maximum depth of boxes on main pages
@define max_depth_code => 5
// maximum depth of boxes on split pages
@define split_max_depth_code => 6
// maximum depth of explicit vboxes
@define box_max_depth_code => 7
// tolerance for overfull hbox messages
@define hfuzz_code => 8
// tolerance for overfull vbox messages
@define vfuzz_code => 9
// maximum amount uncovered by variable delimiters
@define delimiter_shortfall_code => 10
// blank space in null delimiters
@define null_delimiter_space_code => 11
// extra space after subscript or superscript
@define script_space_code => 12
// length of text preceding a display
@define pre_display_size_code => 13
// length of line for displayed equation
@define display_width_code => 14
// indentation of line for displayed equation
@define display_indent_code => 15
// width of rule that identifies overfull hboxes
@define overfull_rule_code => 16
// amount of hanging indentation
@define hang_indent_code => 17
// amount of horizontal offset when shipping pages out
@define h_offset_code => 18
// amount of vertical offset when shipping pages out
@define v_offset_code => 19
// reduces badnesses on final pass of line-breaking
@define emergency_stretch_code => 20
// page width of the PDF output
@define pdf_page_width_code => 21
// page height of the PDF output
@define pdf_page_height_code => 22
// total number of dimension parameters
@define dimen_pars => 23
// table of number_regs user-defined \dimen registers
@define scaled_base => dimen_base + dimen_pars
// largest subscript of eqtb 
@define eqtb_size => scaled_base + biggest_reg
@define dimen(#) => eqtb[scaled_base + #].sc
// a scaled quantity
@define dimen_par(#) => eqtb[dimen_base + #].sc
@define par_indent => dimen_par(par_indent_code)
@define math_surround => dimen_par(math_surround_code)
@define line_skip_limit => dimen_par(line_skip_limit_code)
@define hsize => dimen_par(hsize_code)
@define vsize => dimen_par(vsize_code)
@define max_depth => dimen_par(max_depth_code)
@define split_max_depth => dimen_par(split_max_depth_code)
@define box_max_depth => dimen_par(box_max_depth_code)
@define hfuzz => dimen_par(hfuzz_code)
@define vfuzz => dimen_par(vfuzz_code)
@define delimiter_shortfall =>
    dimen_par(delimiter_shortfall_code)
@define null_delimiter_space =>
    dimen_par(null_delimiter_space_code)
@define script_space => dimen_par(script_space_code)
@define pre_display_size => dimen_par(pre_display_size_code)
@define display_width => dimen_par(display_width_code)
@define display_indent => dimen_par(display_indent_code)
@define overfull_rule => dimen_par(overfull_rule_code)
@define hang_indent => dimen_par(hang_indent_code)
@define h_offset => dimen_par(h_offset_code)
@define v_offset => dimen_par(v_offset_code)
@define emergency_stretch =>
    dimen_par(emergency_stretch_code)
@define pdf_page_width => dimen_par(pdf_page_width_code)
@define pdf_page_height => dimen_par(pdf_page_height_code)
function print_length_param(n: integer) {
    case n {
      par_indent_code:
        print_esc(strpool!("parindent"));
      math_surround_code:
        print_esc(strpool!("mathsurround"));
      line_skip_limit_code:
        print_esc(strpool!("lineskiplimit"));
      hsize_code:
        print_esc(strpool!("hsize"));
      vsize_code:
        print_esc(strpool!("vsize"));
      max_depth_code:
        print_esc(strpool!("maxdepth"));
      split_max_depth_code:
        print_esc(strpool!("splitmaxdepth"));
      box_max_depth_code:
        print_esc(strpool!("boxmaxdepth"));
      hfuzz_code:
        print_esc(strpool!("hfuzz"));
      vfuzz_code:
        print_esc(strpool!("vfuzz"));
      delimiter_shortfall_code:
        print_esc(strpool!("delimitershortfall"));
      null_delimiter_space_code:
        print_esc(strpool!("nulldelimiterspace"));
      script_space_code:
        print_esc(strpool!("scriptspace"));
      pre_display_size_code:
        print_esc(strpool!("predisplaysize"));
      display_width_code:
        print_esc(strpool!("displaywidth"));
      display_indent_code:
        print_esc(strpool!("displayindent"));
      overfull_rule_code:
        print_esc(strpool!("overfullrule"));
      hang_indent_code:
        print_esc(strpool!("hangindent"));
      h_offset_code:
        print_esc(strpool!("hoffset"));
      v_offset_code:
        print_esc(strpool!("voffset"));
      emergency_stretch_code:
        print_esc(strpool!("emergencystretch"));
      pdf_page_width_code:
        print_esc(strpool!("pdfpagewidth"));
      pdf_page_height_code:
        print_esc(strpool!("pdfpageheight"));
      othercases:
        print(strpool!("[unknown dimen parameter!]"));
    }
}

274.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("parindent"),
      assign_dimen,
      dimen_base + par_indent_code,
    )

    primitive(
      strpool!("mathsurround"),
      assign_dimen,
      dimen_base + math_surround_code,
    )

    primitive(
      strpool!("lineskiplimit"),
      assign_dimen,
      dimen_base + line_skip_limit_code,
    )

    primitive(
      strpool!("hsize"),
      assign_dimen,
      dimen_base + hsize_code,
    )

    primitive(
      strpool!("vsize"),
      assign_dimen,
      dimen_base + vsize_code,
    )

    primitive(
      strpool!("maxdepth"),
      assign_dimen,
      dimen_base + max_depth_code,
    )

    primitive(
      strpool!("splitmaxdepth"),
      assign_dimen,
      dimen_base + split_max_depth_code,
    )

    primitive(
      strpool!("boxmaxdepth"),
      assign_dimen,
      dimen_base + box_max_depth_code,
    )

    primitive(
      strpool!("hfuzz"),
      assign_dimen,
      dimen_base + hfuzz_code,
    )

    primitive(
      strpool!("vfuzz"),
      assign_dimen,
      dimen_base + vfuzz_code,
    )

    primitive(
      strpool!("delimitershortfall"),
      assign_dimen,
      dimen_base + delimiter_shortfall_code,
    )

    primitive(
      strpool!("nulldelimiterspace"),
      assign_dimen,
      dimen_base + null_delimiter_space_code,
    )

    primitive(
      strpool!("scriptspace"),
      assign_dimen,
      dimen_base + script_space_code,
    )

    primitive(
      strpool!("predisplaysize"),
      assign_dimen,
      dimen_base + pre_display_size_code,
    )

    primitive(
      strpool!("displaywidth"),
      assign_dimen,
      dimen_base + display_width_code,
    )

    primitive(
      strpool!("displayindent"),
      assign_dimen,
      dimen_base + display_indent_code,
    )

    primitive(
      strpool!("overfullrule"),
      assign_dimen,
      dimen_base + overfull_rule_code,
    )

    primitive(
      strpool!("hangindent"),
      assign_dimen,
      dimen_base + hang_indent_code,
    )

    primitive(
      strpool!("hoffset"),
      assign_dimen,
      dimen_base + h_offset_code,
    )

    primitive(
      strpool!("voffset"),
      assign_dimen,
      dimen_base + v_offset_code,
    )

    primitive(
      strpool!("emergencystretch"),
      assign_dimen,
      dimen_base + emergency_stretch_code,
    )

    primitive(
      strpool!("pdfpagewidth"),
      assign_dimen,
      dimen_base + pdf_page_width_code,
    )

    primitive(
      strpool!("pdfpageheight"),
      assign_dimen,
      dimen_base + pdf_page_height_code,
    )
⟧

275.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    assign_dimen:

    if (chr_code < scaled_base) {
        print_length_param(chr_code - dimen_base);
    } else {
        print_esc(strpool!("dimen"));
        print_int(chr_code - scaled_base);
    }
⟧

276.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    for (k in dimen_base to eqtb_size) {
        eqtb[k].sc = 0;
    }
⟧

277.

⟦277 Show equivalent |n|, in region 6⟧ = ⟦
    {
        if (n < scaled_base) {
            print_length_param(n - dimen_base);
        } else {
            print_esc(strpool!("dimen"));
            print_int(n - scaled_base);
        }
        print_char(ord!("="));
        print_scaled(eqtb[n].sc);
        print(strpool!("pt"));
    }
⟧
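
The display format for region 6 can be tried out with the following Python sketch. The routine print_scaled is declared in a part of the program that is not reproduced here; the digit loop is written from memory of the usual TEX82 routine and should be read as an assumption, as should the backslash used for the escape character.

```python
UNITY = 0x10000  # 2**16 scaled points make one printer's point

def scaled_to_text(s: int) -> str:
    """Render a scaled value with just enough decimal digits."""
    out = ""
    if s < 0:
        out += "-"
        s = -s
    out += str(s // UNITY) + "."
    s = 10 * (s % UNITY) + 5
    delta = 10
    while True:
        if delta > UNITY:
            s = s + 0x8000 - 50000  # round the last digit
        out += str(s // UNITY)
        s = 10 * (s % UNITY)
        delta *= 10
        if s <= delta:
            break
    return out

def show_dimen(name: str, value: int) -> str:
    """Mimic the region-6 display: name, equals sign, value, unit."""
    return "\\" + name + "=" + scaled_to_text(value) + "pt"
```

A value of 20 * UNITY stored under the name parindent is shown as \parindent=20.0pt.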

278. Here is a procedure that displays the contents of eqtb[n] symbolically.

⟦328 Declare the procedure called |print_cmd_chr|⟧

stat!{
    function show_eqtb(n: pointer) {
        if (n < active_base) {
            // this can't happen
            print_char(ord!("?"));
        } else if (
            (n < glue_base)
            || ((n > eqtb_size) && (n <= eqtb_top))
        ) {
            ⟦249 Show equivalent |n|, in region 1 or 2⟧
        } else if (n < local_base) {
            ⟦255 Show equivalent |n|, in region 3⟧
        } else if (n < int_base) {
            ⟦259 Show equivalent |n|, in region 4⟧
        } else if (n < dimen_base) {
            ⟦268 Show equivalent |n|, in region 5⟧
        } else if (n <= eqtb_size) {
            ⟦277 Show equivalent |n|, in region 6⟧
        } else {
            // this can't happen either
            print_char(ord!("?"));
        }
    }
}
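
The chain of comparisons in show_eqtb amounts to a region classifier. Here is a Python sketch of it; every boundary value is invented so that the example can run, and only the order of the tests is taken from the procedure above.

```python
# Hypothetical region boundaries, chosen only to make the example runnable.
ACTIVE_BASE = 1
GLUE_BASE = 100
LOCAL_BASE = 200
INT_BASE = 300
DIMEN_BASE = 400
EQTB_SIZE = 499
EQTB_TOP = 599  # extra hash entries occupy EQTB_SIZE+1 .. EQTB_TOP

def eqtb_region(n: int) -> str:
    if n < ACTIVE_BASE:
        return "?"
    # Entries above eqtb_size belong to the extended hash, so they are
    # shown like ordinary control sequences.
    if n < GLUE_BASE or EQTB_SIZE < n <= EQTB_TOP:
        return "region 1 or 2"
    if n < LOCAL_BASE:
        return "region 3"
    if n < INT_BASE:
        return "region 4"
    if n < DIMEN_BASE:
        return "region 5"
    if n <= EQTB_SIZE:
        return "region 6"
    return "?"
```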

279. The last two regions of eqtb have fullword values instead of the three fields eq_level, eq_type, and equiv. An eq_type is unnecessary, but TEX needs to store the eq_level information in another array called xeq_level.

⟦13 Global variables⟧ += ⟦
    var zeqtb: ^memory_word;

    var xeq_level: array [int_base .. eqtb_size] of
      quarterword;
⟧

280.

⟦23 Set initial values of key variables⟧ += ⟦
    for (k in int_base to eqtb_size) {
        xeq_level[k] = level_one;
    }
⟧

281. When the debugging routine search_mem is looking for pointers having a given value, it is interested only in regions 1 to 3 of eqtb, and in the first part of region 4.

⟦281 Search |eqtb| for equivalents equal to |p|⟧ = ⟦
    for (q in active_base to box_base + biggest_reg) {
        if (equiv(q) == p) {
            print_nl(strpool!("EQUIV("));
            print_int(q);
            print_char(ord!(")"));
        }
    }
⟧

282. [18] The hash table. Control sequences are stored and retrieved by means of a fairly standard hash table algorithm called the method of “coalescing lists” (cf. Algorithm 6.4C in The Art of Computer Programming). Once a control sequence enters the table, it is never removed, because there are complicated situations involving \gdef where the removal of a control sequence at the end of a group would be a mistake preventable only by the introduction of a complicated reference-count mechanism.

The actual sequence of letters forming a control sequence identifier is stored in the str_pool array together with all the other strings. An auxiliary array hash consists of items with two halfword fields per word. The first of these, called next(p), points to the next identifier belonging to the same coalesced list as the identifier corresponding to p; and the other, called text(p), points to the str_start entry for p’s identifier. If position p of the hash table is empty, we have text(p) == 0; if position p is either empty or the end of a coalesced hash list, we have next(p) == 0. An auxiliary pointer variable called hash_used is maintained in such a way that all locations p >= hash_used are nonempty. The global variable cs_count tells how many multiletter control sequences have been defined, if statistics are being kept.

A global boolean variable called no_new_control_sequence is set to true during the time that new hash table entries are forbidden.

@define next(#) => hash[#].lh // link for coalesced lists
// string number for control sequence name
@define text(#) => hash[#].rh
// test if all positions are occupied
@define hash_is_full => (hash_used == hash_base)
// a frozen font identifier's name
@define font_id_text(#) => text(font_id_base + #)
⟦13 Global variables⟧ += ⟦
    // the hash table
    var hash: ^two_halves;

    // auxiliary pointer for freeing hash
    var yhash: ^two_halves;

    // allocation pointer for hash 
    var hash_used: pointer;

    // number of extra hash entries above eqtb_size
    var hash_extra: pointer;

    // maximum of the hash array
    var hash_top: pointer;

    // maximum of the eqtb 
    var eqtb_top: pointer;

    // pointer to next high hash location
    var hash_high: pointer;

    // are new identifiers legal?
    var no_new_control_sequence: boolean;

    // total number of known identifiers
    var cs_count: integer;
⟧
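
The interplay of next, text, and hash_used may be easier to follow in a toy model. The Python sketch below is not the program's own code: the class name is invented, Python strings take the place of string numbers, None marks an empty slot, hash_base is taken to be 1, and the hash function is a deliberately crude stand-in.

```python
class CoalescedTable:
    def __init__(self, size: int, prime: int):
        self.prime = prime                 # must not exceed size
        self.text = [None] * (size + 1)    # slot 0 unused: 0 means "no link"
        self.next = [0] * (size + 1)
        self.hash_used = size + 1          # all slots >= hash_used are nonempty

    def _hash(self, name: str) -> int:
        return sum(ord(ch) for ch in name) % self.prime  # stand-in hash

    def lookup(self, name: str, insert: bool = True) -> int:
        p = self._hash(name) + 1
        while True:
            if self.text[p] == name:
                return p
            if self.next[p] == 0:
                break                      # end of the coalesced list
            p = self.next[p]
        if not insert:
            return 0                       # plays the role of "undefined"
        if self.text[p] is not None:
            # The home slot is taken: claim the highest empty slot instead.
            while True:
                if self.hash_used == 1:
                    raise OverflowError("hash size")
                self.hash_used -= 1
                if self.text[self.hash_used] is None:
                    break
            self.next[p] = self.hash_used
            p = self.hash_used
        self.text[p] = name
        return p
```

Because colliding names are placed from the top of the table downward, lists that start at different home slots can merge, which is what "coalescing" refers to.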

283. Primitive support needs a few extra variables and definitions.

// about 85% of prim_size
@define prim_prime => 1777
@define prim_base => 1
// link for coalesced lists
@define prim_next(#) => prim[#].lh
// string number for control sequence name, plus one
@define prim_text(#) => prim[#].rh
// test if all positions are occupied
@define prim_is_full => (prim_used == prim_base)
@define prim_eq_level_field(#) => #.hh.b1
@define prim_eq_type_field(#) => #.hh.b0
@define prim_equiv_field(#) => #.hh.rh
// level of definition
@define prim_eq_level(#) =>
    prim_eq_level_field(eqtb[prim_eqtb_base + #])
// command code for equivalent
@define prim_eq_type(#) =>
    prim_eq_type_field(eqtb[prim_eqtb_base + #])
// equivalent value
@define prim_equiv(#) =>
    prim_equiv_field(eqtb[prim_eqtb_base + #])
@define undefined_primitive => 0
⟦13 Global variables⟧ += ⟦
    // the primitives table
    var prim: array [0 .. prim_size] of two_halves;

    // allocation pointer for prim 
    var prim_used: pointer;
⟧

284.

⟦23 Set initial values of key variables⟧ += ⟦
    // new identifiers are usually forbidden
    no_new_control_sequence = true

    prim_next(0) = 0

    prim_text(0) = 0

    for (k in 1 to prim_size) {
        prim[k] = prim[0];
    }
⟧

285.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    prim_used = prim_size // nothing is used

    hash_used = frozen_control_sequence // nothing is used

    hash_high = 0

    cs_count = 0

    eq_type(frozen_dont_expand) = dont_expand

    text(frozen_dont_expand) = strpool!("notexpanded:")

    eq_type(frozen_primitive) = ignore_spaces

    equiv(frozen_primitive) = 1

    eq_level(frozen_primitive) = level_one

    text(frozen_primitive) = strpool!("primitive")
⟧

286. Here is the subroutine that searches the hash table for an identifier that matches a given string of length l > 0 appearing in buffer[j .. (j + l - 1)]. If the identifier is found, the corresponding hash table address is returned. Otherwise, if the global variable no_new_control_sequence is true, the dummy address undefined_control_sequence is returned. Otherwise the identifier is inserted into the hash table and its location is returned.

// search the hash table
function id_lookup(j, l: integer): pointer {
    label found; // go here if you found it
    var
      h: integer, // hash code
      d: integer, // number of characters in incomplete 
      // current string
      p: pointer, // index in hash array
      k: pointer, // index in buffer array
      ll: integer; // length in UTF16 code units
    
    ⟦288 Compute the hash code |h|⟧
    // we start searching here; note that 0 <= h < 
    // hash_prime 
    p = h + hash_base;
    ll = l;
    for (d in 0 to l - 1) {
        if (buffer[j + d] >= 0x10000) {
            incr(ll);
        }
    }
    loop {
        if (text(p) > 0) {
            if (length(text(p)) == ll) {
                if (str_eq_buf(text(p), j)) {
                    goto found;
                }
            }
        }
        if (next(p) == 0) {
            if (no_new_control_sequence) {
                p = undefined_control_sequence;
            } else {
                ⟦287 Insert a new control sequence after |p|, then make |p| point to it⟧
            }
            goto found;
        }
        p = next(p);
    }
  found:
    id_lookup = p;
}
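
The variable ll in id_lookup exists because buffer holds one Unicode code point per entry while the string pool holds UTF-16 code units. A short Python sketch of the length computation (the function name is made up for the illustration):

```python
def utf16_length(code_points: list) -> int:
    """Length in UTF-16 code units of a sequence of code points."""
    ll = len(code_points)
    for c in code_points:
        if c >= 0x10000:
            ll += 1  # characters beyond the BMP need a surrogate pair
    return ll
```

A two-character name such as "a" followed by U+1D400 therefore has l == 2 but ll == 3, and it is ll that must agree with the stored string's length.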

287.

⟦287 Insert a new control sequence after |p|, then make |p| point to it⟧ = ⟦
    {
        if (text(p) > 0) {
            if (hash_high < hash_extra) {
                incr(hash_high);
                next(p) = hash_high + eqtb_size;
                p = hash_high + eqtb_size;
            } else {
                repeat {
                    if (hash_is_full) {
                        overflow(
                          strpool!("hash size"),
                          hash_size + hash_extra,
                        );
                    }
                    // search for an empty location in hash
                    decr(hash_used);
                } until (text(hash_used) == 0);
                next(p) = hash_used;
                p = hash_used;
            }
        }
        str_room(ll);
        d = cur_length;
        while (pool_ptr > str_start_macro(str_ptr)) {
            decr(pool_ptr);
            str_pool[pool_ptr + l] = str_pool[pool_ptr];
            // move current string up to make room for 
            // another
        }
        for (k in j to j + l - 1) {
            if (buffer[k] < 0x10000) {
                append_char(buffer[k]);
            } else {
                append_char(
                  0xd800 + (buffer[k] - 0x10000) div 0x400,
                );
                append_char(
                  0xdc00 + (buffer[k] - 0x10000) % 0x400,
                );
            }
        }
        text(p) = make_string;
        pool_ptr = pool_ptr + d;
        stat!{
            incr(cs_count);
        }
    }
⟧
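
The two append_char calls in the else branch above build a UTF-16 surrogate pair. The same arithmetic in Python, as a sketch with a made-up function name:

```python
def to_utf16_units(code_points: list) -> list:
    """Convert code points to UTF-16 code units, as the insertion loop does."""
    units = []
    for c in code_points:
        if c < 0x10000:
            units.append(c)
        else:
            # High surrogate carries the top 10 bits, low surrogate the rest.
            units.append(0xD800 + (c - 0x10000) // 0x400)
            units.append(0xDC00 + (c - 0x10000) % 0x400)
    return units
```

For instance, U+1D400 becomes the pair 0xD835, 0xDC00.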

288. The value of hash_prime should be roughly 85% of hash_size, and it should be a prime number. The theory of hashing tells us to expect fewer than two table probes, on the average, when the search is successful. [See J. S. Vitter, Journal of the ACM 30 (1983), 231–258.]

⟦288 Compute the hash code |h|⟧ = ⟦
    h = 0

    for (k in j to j + l - 1) {
        h = h + h + buffer[k];
        while (h >= hash_prime) {
            h = h - hash_prime;
        }
    }
⟧
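
Since h is kept below hash_prime after every step, the loop above evaluates the characters as a polynomial in 2 modulo hash_prime. A Python sketch makes this easy to check; the function name and whatever prime the caller passes in are illustrative assumptions, not values fixed by this section.

```python
def hash_code(code_points: list, hash_prime: int) -> int:
    """Doubling hash with reduction by repeated subtraction."""
    h = 0
    for c in code_points:
        h = h + h + c
        while h >= hash_prime:
            h -= hash_prime
    return h

def hash_code_closed_form(code_points: list, hash_prime: int) -> int:
    """The same value written as an explicit polynomial modulo hash_prime."""
    n = len(code_points)
    total = sum(c * 2 ** (n - 1 - i) for i, c in enumerate(code_points))
    return total % hash_prime
```

The subtraction loop avoids a division; for character codes much larger than hash_prime it may iterate many times, but the result is the same as taking the remainder.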

289. Here is the subroutine that searches the primitive table for an identifier.

// search the primitives table
function prim_lookup(s: str_number): pointer {
    label found; // go here if you found it
    var
      h: integer, // hash code
      p: pointer, // index in hash array
      k: pointer, // index in string pool
      j, l: integer;
    
    if (s <= biggest_char) {
        if (s < 0) {
            p = undefined_primitive;
            goto found;
        } else {
            // we start searching here
            p = (s % prim_prime) + prim_base;
        }
    } else {
        j = str_start_macro(s);
        if (s == str_ptr) {
            l = cur_length;
        } else {
            l = length(s);
        }
        ⟦291 Compute the primitive code |h|⟧
        // we start searching here; note that 0 <= h < 
        // prim_prime 
        p = h + prim_base;
    }
    loop {
        //  p points to a multi-letter primitive
        if (prim_text(p) > 1 + biggest_char) {
            if (length(prim_text(p) - 1) == l) {
                if (str_eq_str(prim_text(p) - 1, s)) {
                    goto found;
                }
            }
        } else if (prim_text(p) == 1 + s) {
            //  p points to a single-letter primitive
            goto found;
        }
        if (prim_next(p) == 0) {
            if (no_new_control_sequence) {
                p = undefined_primitive;
            } else {
                ⟦290 Insert a new primitive after |p|, then make |p| point to it⟧
            }
            goto found;
        }
        p = prim_next(p);
    }
  found:
    prim_lookup = p;
}

290.

⟦290 Insert a new primitive after |p|, then make |p| point to it⟧ = ⟦
    {
        if (prim_text(p) > 0) {
            repeat {
                if (prim_is_full) {
                    overflow(
                      strpool!("primitive size"),
                      prim_size,
                    );
                }
                // search for an empty location in prim
                decr(prim_used);
            } until (prim_text(prim_used) == 0);
            prim_next(p) = prim_used;
            p = prim_used;
        }
        prim_text(p) = s + 1;
    }
⟧

291. The value of prim_prime should be roughly 85% of prim_size, and it should be a prime number.

⟦291 Compute the primitive code |h|⟧ = ⟦
    h = str_pool[j]

    for (k in j + 1 to j + l - 1) {
        h = h + h + str_pool[k];
        while (h >= prim_prime) {
            h = h - prim_prime;
        }
    }
⟧

292. Single-character control sequences do not need to be looked up in a hash table, since we can use the character code itself as a direct address. The procedure print_cs prints the name of a control sequence, given a pointer to its address in eqtb. A space is printed after the name unless it is a single nonletter or an active character. This procedure might be invoked with invalid data, so it is “extra robust.” The individual characters must be printed one at a time using print, since they may be unprintable.

⟦57 Basic printing procedures⟧ += ⟦
    // prints a purported control sequence
    function print_cs(p: integer) {
        // single character
        if (p < hash_base) {
            if (p >= single_base) {
                if (p == null_cs) {
                    print_esc(strpool!("csname"));
                    print_esc(strpool!("endcsname"));
                    print_char(ord!(" "));
                } else {
                    print_esc(p - single_base);
                    if (cat_code(p - single_base) == letter) {
                        print_char(ord!(" "));
                    }
                }
            } else if (p < active_base) {
                print_esc(strpool!("IMPOSSIBLE."));
            } else {
                print_char(p - active_base);
            }
        } else if (
            (
                (p >= undefined_control_sequence)
                && (p <= eqtb_size)
            )
            || (p > eqtb_top)
        ) {
            print_esc(strpool!("IMPOSSIBLE."));
        } else if ((text(p) >= str_ptr)) {
            print_esc(strpool!("NONEXISTENT."));
        } else {
            if (
                (p >= prim_eqtb_base)
                && (p < frozen_null_font)
            ) {
                print_esc(
                  prim_text(p - prim_eqtb_base) - 1,
                );
            } else {
                print_esc(text(p));
            }
            print_char(ord!(" "));
        }
    }
⟧

293. Here is a similar procedure; it avoids the error checks, and it never prints a space after the control sequence.

⟦57 Basic printing procedures⟧ += ⟦
    // prints a control sequence
    function sprint_cs(p: pointer) {
        if (p < hash_base) {
            if (p < single_base) {
                print_char(p - active_base);
            } else if (p < null_cs) {
                print_esc(p - single_base);
            } else {
                print_esc(strpool!("csname"));
                print_esc(strpool!("endcsname"));
            }
        } else if (
            (p >= prim_eqtb_base)
            && (p < frozen_null_font)
        ) {
            print_esc(prim_text(p - prim_eqtb_base) - 1);
        } else {
            print_esc(text(p));
        }
    }
⟧

294. We need to put TEX’s “primitive” control sequences into the hash table, together with their command code (which will be the eq_type ) and an operand (which will be the equiv ). The primitive procedure does this, in a way that no TEX user can. The global value cur_val contains the new eqtb pointer after primitive has acted.

init!{
    function primitive(
      s: str_number,
      c: quarterword,
      o: halfword,
    ) {
        var
          k: pool_pointer, // index into str_pool 
          j: 0 .. buf_size, // index into buffer 
          l: small_number, // length of the string
          prim_val: integer; // needed to fill prim_eqtb 
        
        if (s < 256) {
            cur_val = s + single_base;
            prim_val = prim_lookup(s);
        } else {
            k = str_start_macro(s);
            // we will move s into the (possibly non-empty) 
            // buffer 
            l = str_start_macro(s + 1) - k;
            if (first + l > buf_size + 1) {
                overflow(strpool!("buffer size"), buf_size);
            }
            for (j in 0 to l - 1) {
                buffer[first + j] = so(str_pool[k + j]);
            }
            //  no_new_control_sequence is false 
            cur_val = id_lookup(first, l);
            flush_string;
            // we don't want to have the string twice
            text(cur_val) = s;
            prim_val = prim_lookup(s);
        }
        eq_level(cur_val) = level_one;
        eq_type(cur_val) = c;
        equiv(cur_val) = o;
        prim_eq_level(prim_val) = level_one;
        prim_eq_type(prim_val) = c;
        prim_equiv(prim_val) = o;
    }
}

295. Many of TEX’s primitives need no equiv , since they are identifiable by their eq_type alone. These primitives are loaded into the hash table as follows:

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(ord!(" "), ex_space, 0)

    primitive(ord!("/"), ital_corr, 0)

    primitive(strpool!("accent"), accent, 0)

    primitive(strpool!("advance"), advance, 0)

    primitive(
      strpool!("afterassignment"),
      after_assignment,
      0,
    )

    primitive(strpool!("aftergroup"), after_group, 0)

    primitive(strpool!("begingroup"), begin_group, 0)

    primitive(strpool!("char"), char_num, 0)

    primitive(strpool!("csname"), cs_name, 0)

    primitive(strpool!("delimiter"), delim_num, 0)

    primitive(strpool!("XeTeXdelimiter"), delim_num, 1)

    primitive(strpool!("Udelimiter"), delim_num, 1)

    primitive(strpool!("divide"), divide, 0)

    primitive(strpool!("endcsname"), end_cs_name, 0)

    primitive(strpool!("endgroup"), end_group, 0)

    text(frozen_end_group) = strpool!("endgroup")

    eqtb[frozen_end_group] = eqtb[cur_val]

    primitive(strpool!("expandafter"), expand_after, 0)

    primitive(strpool!("font"), def_font, 0)

    primitive(strpool!("fontdimen"), assign_font_dimen, 0)

    primitive(strpool!("halign"), halign, 0)

    primitive(strpool!("hrule"), hrule, 0)

    primitive(strpool!("ignorespaces"), ignore_spaces, 0)

    primitive(strpool!("insert"), insert, 0)

    primitive(strpool!("mark"), mark, 0)

    primitive(strpool!("mathaccent"), math_accent, 0)

    primitive(strpool!("XeTeXmathaccent"), math_accent, 1)

    primitive(strpool!("Umathaccent"), math_accent, 1)

    primitive(strpool!("mathchar"), math_char_num, 0)

    primitive(
      strpool!("XeTeXmathcharnum"),
      math_char_num,
      1,
    )

    primitive(strpool!("Umathcharnum"), math_char_num, 1)

    primitive(strpool!("XeTeXmathchar"), math_char_num, 2)

    primitive(strpool!("Umathchar"), math_char_num, 2)

    primitive(strpool!("mathchoice"), math_choice, 0)

    primitive(strpool!("multiply"), multiply, 0)

    primitive(strpool!("noalign"), no_align, 0)

    primitive(strpool!("noboundary"), no_boundary, 0)

    primitive(strpool!("noexpand"), no_expand, 0)

    primitive(strpool!("primitive"), no_expand, 1)

    primitive(strpool!("nonscript"), non_script, 0)

    primitive(strpool!("omit"), omit, 0)

    primitive(
      strpool!("parshape"),
      set_shape,
      par_shape_loc,
    )

    primitive(strpool!("penalty"), break_penalty, 0)

    primitive(strpool!("prevgraf"), set_prev_graf, 0)

    primitive(strpool!("radical"), radical, 0)

    primitive(strpool!("XeTeXradical"), radical, 1)

    primitive(strpool!("Uradical"), radical, 1)

    primitive(strpool!("read"), read_to_cs, 0)

    // cf.\ scan_file_name 
    primitive(strpool!("relax"), relax, too_big_usv)

    text(frozen_relax) = strpool!("relax")

    eqtb[frozen_relax] = eqtb[cur_val]

    primitive(strpool!("setbox"), set_box, 0)

    primitive(strpool!("the"), the, 0)

    primitive(strpool!("toks"), toks_register, mem_bot)

    primitive(strpool!("vadjust"), vadjust, 0)

    primitive(strpool!("valign"), valign, 0)

    primitive(strpool!("vcenter"), vcenter, 0)

    primitive(strpool!("vrule"), vrule, 0)
⟧

296. Each primitive has a corresponding inverse, so that it is possible to display the cryptic numeric contents of eqtb in symbolic form. Every call of primitive in this program is therefore accompanied by some straightforward code that forms part of the print_cmd_chr routine below.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    accent:

    print_esc(strpool!("accent"))

    advance:

    print_esc(strpool!("advance"))

    after_assignment:

    print_esc(strpool!("afterassignment"))

    after_group:

    print_esc(strpool!("aftergroup"))

    assign_font_dimen:

    print_esc(strpool!("fontdimen"))

    begin_group:

    print_esc(strpool!("begingroup"))

    break_penalty:

    print_esc(strpool!("penalty"))

    char_num:

    print_esc(strpool!("char"))

    cs_name:

    print_esc(strpool!("csname"))

    def_font:

    print_esc(strpool!("font"))

    delim_num:

    if (chr_code == 1) {
        print_esc(strpool!("Udelimiter"));
    } else {
        print_esc(strpool!("delimiter"));
    }

    divide:

    print_esc(strpool!("divide"))

    end_cs_name:

    print_esc(strpool!("endcsname"))

    end_group:

    print_esc(strpool!("endgroup"))

    ex_space:

    print_esc(ord!(" "))

    expand_after:

    if (chr_code == 0) {
        print_esc(strpool!("expandafter"));
    }

    ⟦1574 Cases of |expandafter| for |print_cmd_chr|⟧

    halign:

    print_esc(strpool!("halign"))

    hrule:

    print_esc(strpool!("hrule"))

    ignore_spaces:

    if (chr_code == 0) {
        print_esc(strpool!("ignorespaces"));
    } else {
        print_esc(strpool!("primitive"));
    }

    insert:

    print_esc(strpool!("insert"))

    ital_corr:

    print_esc(ord!("/"))

    mark:

    {
        print_esc(strpool!("mark"));
        if (chr_code > 0) {
            print_char(ord!("s"));
        }
    }

    math_accent:

    if (chr_code == 1) {
        print_esc(strpool!("Umathaccent"));
    } else {
        print_esc(strpool!("mathaccent"));
    }

    math_char_num:

    if (chr_code == 2) {
        print_esc(strpool!("Umathchar"));
    } else if (chr_code == 1) {
        print_esc(strpool!("Umathcharnum"));
    } else {
        print_esc(strpool!("mathchar"));
    }

    math_choice:

    print_esc(strpool!("mathchoice"))

    multiply:

    print_esc(strpool!("multiply"))

    no_align:

    print_esc(strpool!("noalign"))

    no_boundary:

    print_esc(strpool!("noboundary"))

    no_expand:

    if (chr_code == 0) {
        print_esc(strpool!("noexpand"));
    } else {
        print_esc(strpool!("primitive"));
    }

    non_script:

    print_esc(strpool!("nonscript"))

    omit:

    print_esc(strpool!("omit"))

    radical:

    if (chr_code == 1) {
        print_esc(strpool!("Uradical"));
    } else {
        print_esc(strpool!("radical"));
    }

    read_to_cs:

    if (chr_code == 0) {
        print_esc(strpool!("read"));
    }

    ⟦1571 Cases of |read| for |print_cmd_chr|⟧

    relax:

    print_esc(strpool!("relax"))

    set_box:

    print_esc(strpool!("setbox"))

    set_prev_graf:

    print_esc(strpool!("prevgraf"))

    set_shape:

    case chr_code {
      par_shape_loc:
        print_esc(strpool!("parshape"));
      ⟦1676 Cases of |set_shape| for |print_cmd_chr|⟧
      // there are no other cases
    }

    the:

    if (chr_code == 0) {
        print_esc(strpool!("the"));
    }

    ⟦1497 Cases of |the| for |print_cmd_chr|⟧

    toks_register:

    ⟦1644 Cases of |toks_register| for |print_cmd_chr|⟧

    vadjust:

    print_esc(strpool!("vadjust"))

    valign:

    if (chr_code == 0) {
        print_esc(strpool!("valign"));
    }

    ⟦1512 Cases of |valign| for |print_cmd_chr|⟧

    vcenter:

    print_esc(strpool!("vcenter"))

    vrule:

    print_esc(strpool!("vrule"))
⟧
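The relation between the primitive calls and their print_cmd_chr inverses can be pictured as a forward table and a derived reverse table. The sketch below copies a few rows from section 295; the names `PRIMITIVES` and `build_inverse` are hypothetical, and the rule that a later row overwrites an earlier one is a property of this sketch only, since the real print_cmd_chr spells out the chosen name in each case.

```python
# Rows copied from the calls to `primitive` in section 295:
# (control sequence name, command code, operand).
PRIMITIVES = [
    ("delimiter", "delim_num", 0),
    ("XeTeXdelimiter", "delim_num", 1),
    ("Udelimiter", "delim_num", 1),
    ("radical", "radical", 0),
    ("XeTeXradical", "radical", 1),
    ("Uradical", "radical", 1),
    ("mathchar", "math_char_num", 0),
    ("XeTeXmathcharnum", "math_char_num", 1),
    ("Umathcharnum", "math_char_num", 1),
    ("XeTeXmathchar", "math_char_num", 2),
    ("Umathchar", "math_char_num", 2),
]


def build_inverse(table):
    """Map (command, chr_code) back to one printable name.

    Several names may share a (command, chr_code) pair; in this sketch
    the row that appears last in the table is the one that is kept.
    """
    inverse = {}
    for name, cmd, chr_code in table:
        inverse[(cmd, chr_code)] = name
    return inverse


FORWARD = {name: (cmd, chr_code) for name, cmd, chr_code in PRIMITIVES}
INVERSE = build_inverse(PRIMITIVES)
```

In this table the XeTeX-prefixed names and the U-prefixed names share a command code and operand, so only one of them can come back out; the cases above print the U-prefixed form.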

297. We will deal with the other primitives later, at some point in the program where their eq_type and equiv values are more meaningful. For example, the primitives for math mode will be loaded when we consider the routines that deal with formulas. It is easy to find where each particular primitive was treated by looking in the index at the end; for example, the section where strpool!("radical") entered eqtb is listed under ‘\radical primitive’. (Primitives consisting of a single nonalphabetic character, like ‘\/’, are listed under ‘Single-character primitives’.)

Meanwhile, this is a convenient place to catch up on something we were unable to do before the hash table was defined:

⟦297 Print the font identifier for |font(p)|⟧ = ⟦
    print_esc(font_id_text(font(p)))
⟧

298. [19] Saving and restoring equivalents. The nested structure provided by ‘{}’ groups in TEX means that eqtb entries valid in outer groups should be saved and restored later if they are overridden inside the braces. When a new eqtb value is being assigned, the program therefore checks to see if the previous entry belongs to an outer level. In such a case, the old value is placed on the save_stack just before the new value enters eqtb . At the end of a grouping level, i.e., when the right brace is sensed, the save_stack is used to restore the outer values, and the inner ones are destroyed.

Entries on the save_stack are of type memory_word . The top item on this stack is save_stack[p] , where p == save_ptr - 1 ; it contains three fields called save_type , save_level , and save_index , and it is interpreted in one of five ways:

1) If save_type(p) == restore_old_value , then save_index(p) is a location in eqtb whose current value should be destroyed at the end of the current group and replaced by save_stack[p - 1] . Furthermore if save_index(p) >= int_base , then save_level(p) should replace the corresponding entry in xeq_level .

2) If save_type(p) == restore_zero , then save_index(p) is a location in eqtb whose current value should be destroyed at the end of the current group, when it should be replaced by the value of eqtb[undefined_control_sequence] .

3) If save_type(p) == insert_token , then save_index(p) is a token that should be inserted into TEX’s input when the current group ends.

4) If save_type(p) == level_boundary , then save_level(p) is a code explaining what kind of group we were previously in, and save_index(p) points to the level boundary word at the bottom of the entries for that group. Furthermore, in extended 𝜀-TEX mode, save_stack[p - 1] contains the source line number at which the current level of grouping was entered.

5) If save_type(p) == restore_sa , then sa_chain points to a chain of sparse array entries to be restored at the end of the current group. Furthermore save_index(p) and save_level(p) should replace the values of sa_chain and sa_level respectively.

// classifies a save_stack entry
@define save_type(#) => save_stack[#].hh.b0
// saved level for regions 5 and 6, or group code
@define save_level(#) => save_stack[#].hh.b1
//  eqtb location or token or save_stack location
@define save_index(#) => save_stack[#].hh.rh
//  save_type when a value should be restored later
@define restore_old_value => 0
//  save_type when an undefined entry should be restored
@define restore_zero => 1
//  save_type when a token is being saved for later use
@define insert_token => 2
//  save_type corresponding to beginning of group
@define level_boundary => 3
//  save_type when sparse array entries should be restored
@define restore_sa => 4
⟦314 Declare \eTeX\ procedures for tracing and input⟧

299. Here are the group codes that are used to discriminate between different kinds of groups. They allow TEX to decide what special actions, if any, should be performed when a group ends.

Some groups are not supposed to be ended by right braces. For example, the ‘$’ that begins a math formula causes a math_shift_group to be started, and this should be terminated by a matching ‘$’. Similarly, a group that starts with \left should end with \right, and one that starts with \begingroup should end with \endgroup.

// group code for the outside world
@define bottom_level => 0
// group code for local structure only
@define simple_group => 1
@define hbox_group => 2 // code for `\.{\\hbox}\grp'
// code for `\.{\\hbox}\grp' in vertical mode
@define adjusted_hbox_group => 3
@define vbox_group => 4 // code for `\.{\\vbox}\grp'
@define vtop_group => 5 // code for `\.{\\vtop}\grp'
// code for `\.{\\halign}\grp', `\.{\\valign}\grp'
@define align_group => 6
@define no_align_group => 7 // code for `\.{\\noalign}\grp'
@define output_group => 8 // code for output routine
// code for, e.g., `\.{\char'136}\grp'
@define math_group => 9
// code for `\.{\\discretionary}\grp\grp\grp'
@define disc_group => 10
// code for `\.{\\insert}\grp', `\.{\\vadjust}\grp'
@define insert_group => 11
@define vcenter_group => 12 // code for `\.{\\vcenter}\grp'
// code for `\.{\\mathchoice}\grp\grp\grp\grp'
@define math_choice_group => 13
// code for `\.{\\begingroup...\\endgroup}'
@define semi_simple_group => 14
@define math_shift_group => 15 // code for `\.{\$...\$}'
// code for `\.{\\left...\\right}'
@define math_left_group => 16
@define max_group_code => 16
⟦18 Types in the outer block⟧ += ⟦
    //  save_level for a level boundary
    type group_code = 0 .. max_group_code;
⟧

300. The global variable cur_group keeps track of what sort of group we are currently in. Another global variable, cur_boundary , points to the topmost level_boundary word. And cur_level is the current depth of nesting. The routines are designed to preserve the condition that no entry in the save_stack or in eqtb ever has a level greater than cur_level .

301.

⟦13 Global variables⟧ += ⟦
    var save_stack: ^memory_word;

    // first unused entry on save_stack 
    var save_ptr: 0 .. save_size;

    // maximum usage of save stack
    var max_save_stack: 0 .. save_size;

    // current nesting level for groups
    var cur_level: quarterword;

    // current group type
    var cur_group: group_code;

    // where the current level begins
    var cur_boundary: 0 .. save_size;
⟧

302. At this time it might be a good idea for the reader to review the introduction to eqtb that was given above just before the long lists of parameter names. Recall that the “outer level” of the program is level_one , since undefined control sequences are assumed to be “defined” at level_zero .

⟦23 Set initial values of key variables⟧ += ⟦
    save_ptr = 0

    cur_level = level_one

    cur_group = bottom_level

    cur_boundary = 0

    max_save_stack = 0
⟧

303. The following macro is used to test if there is room for up to seven more entries on save_stack . By making a conservative test like this, we can get by with testing for overflow in only a few places.

@define check_full_save_stack =>
    if (save_ptr > max_save_stack) {
        max_save_stack = save_ptr;
        if (max_save_stack > save_size - 7) {
            overflow(strpool!("save size"), save_size);
        }
    }

304. Procedure new_save_level is called when a group begins. The argument is a group identification code like ‘hbox_group ’. After calling this routine, it is safe to put five more entries on save_stack .

In some cases integer-valued items are placed onto the save_stack just below a level_boundary word, because this is a convenient place to keep information that is supposed to “pop up” just when the group has finished. For example, when ‘\hbox to 100pt{...}’ is being treated, the 100pt dimension is stored on save_stack just before new_save_level is called.

We use the notation saved(k) to stand for an integer item that appears in location save_ptr + k of the save stack.

@define saved(#) => save_stack[save_ptr + #].int
// begin a new level of grouping
function new_save_level(c: group_code) {
    check_full_save_stack;
    if (eTeX_ex) {
        saved(0) = line;
        incr(save_ptr);
    }
    save_type(save_ptr) = level_boundary;
    save_level(save_ptr) = cur_group;
    save_index(save_ptr) = cur_boundary;
    if (cur_level == max_quarterword) {
        // quit if ( cur_level + 1 ) is too big to be stored 
        // in eqtb 
        overflow(
          strpool!("grouping levels"),
          max_quarterword - min_quarterword,
        );
    }
    cur_boundary = save_ptr;
    cur_group = c;
    stat!{
        if (tracing_groups > 0) {
            group_trace(false);
        }
    }
    incr(cur_level);
    incr(save_ptr);
}
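The bookkeeping done by new_save_level, and undone at the end of unsave, can be sketched as follows. This is a simplified model under stated assumptions: `GroupState` and `leave_level` are hypothetical names, level_one is taken to be 1, stack entries are tuples rather than memory words, and the overflow checks and tracing calls are left out.

```python
LEVEL_ONE = 1  # assumed numeric value of level_one


class GroupState:
    """Hypothetical container for the globals declared in section 301."""

    def __init__(self, line=0):
        self.save_stack = []
        self.cur_level = LEVEL_ONE
        self.cur_group = 0      # bottom_level
        self.cur_boundary = 0
        self.line = line        # current source line, recorded only in eTeX mode


def new_save_level(st, group_code, etex_ex=False):
    """Push a level boundary that remembers the enclosing group."""
    if etex_ex:
        st.save_stack.append(("line", st.line))
    st.save_stack.append(("level_boundary", st.cur_group, st.cur_boundary))
    st.cur_boundary = len(st.save_stack) - 1  # index of the boundary word
    st.cur_group = group_code
    st.cur_level += 1


def leave_level(st, etex_ex=False):
    """Pop the boundary again; restoration of eqtb entries is not modeled here."""
    kind, old_group, old_boundary = st.save_stack.pop()
    assert kind == "level_boundary"
    st.cur_group, st.cur_boundary = old_group, old_boundary
    if etex_ex:
        st.save_stack.pop()  # discard the saved line number
    st.cur_level -= 1
```

The boundary word stores the group code and boundary of the enclosing group, not of the new one, so that leaving a group needs nothing beyond the stack itself.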

305. Just before an entry of eqtb is changed, the following procedure should be called to update the other data structures properly. It is important to keep in mind that reference counts in mem include references from within save_stack , so these counts must be handled carefully.

// gets ready to forget w 
function eq_destroy(w: memory_word) {
    var
      q: pointer; //  equiv field of w 
    
    case eq_type_field(w) {
      call, long_call, outer_call, long_outer_call:
        delete_token_ref(equiv_field(w));
      glue_ref:
        delete_glue_ref(equiv_field(w));
      shape_ref:
        // we need to free a \.{\\parshape} block
        q = equiv_field(w);
        if (q != null) {
            free_node(q, info(q) + info(q) + 1);
        }
        // such a block is 2 n + 1 words long, where n == 
        // info ( q ) 
      box_ref:
        flush_node_list(equiv_field(w));
      ⟦1645 Cases for |eq_destroy|⟧
      othercases:
        do_nothing;
    }
}

306. To save a value of eqtb[p] that was established at level l , we can use the following subroutine.

// saves eqtb [ p ] 
function eq_save(p: pointer, l: quarterword) {
    check_full_save_stack;
    if (l == level_zero) {
        save_type(save_ptr) = restore_zero;
    } else {
        save_stack[save_ptr] = eqtb[p];
        incr(save_ptr);
        save_type(save_ptr) = restore_old_value;
    }
    save_level(save_ptr) = l;
    save_index(save_ptr) = p;
    incr(save_ptr);
}
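The two branches of eq_save leave different footprints on the stack: a value established at level_zero needs only a control word, while any other value needs the old eqtb word followed by a control word. A minimal sketch, assuming level_zero is 0 and using tagged tuples in place of memory words (the tags "value" and "control" are hypothetical):

```python
RESTORE_OLD_VALUE, RESTORE_ZERO = 0, 1  # save_type codes from section 298
LEVEL_ZERO, LEVEL_ONE = 0, 1            # assumed numeric values


def eq_save(save_stack, eqtb, p, level):
    """Append the entries that eq_save writes for eqtb[p] established at `level`."""
    if level == LEVEL_ZERO:
        # Nothing to remember: the entry will be reset to "undefined".
        save_type = RESTORE_ZERO
    else:
        save_stack.append(("value", eqtb[p]))  # the old eqtb word itself
        save_type = RESTORE_OLD_VALUE
    save_stack.append(("control", save_type, level, p))
```

This layout is why unsave, on meeting a restore_old_value control word, steps down one more position to find the saved word.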

307. The procedure eq_define defines an eqtb entry having specified eq_type and equiv fields, and saves the former value if appropriate. This procedure is used only for entries in the first four regions of eqtb , i.e., only for entries that have eq_type and equiv fields. After calling this routine, it is safe to put four more entries on save_stack , provided that there was room for four more entries before the call, since eq_save makes the necessary test.

@define assign_trace(#) =>
    stat!{
        if (tracing_assigns > 0) {
            restore_trace(#);
        }
    }
// new data for eqtb 
function eq_define(
  p: pointer,
  t: quarterword,
  e: halfword,
) {
    label exit;
    
    if (eTeX_ex && (eq_type(p) == t) && (equiv(p) == e)) {
        assign_trace(p, strpool!("reassigning"));
        eq_destroy(eqtb[p]);
        return;
    }
    assign_trace(p, strpool!("changing"));
    if (eq_level(p) == cur_level) {
        eq_destroy(eqtb[p]);
    } else if (cur_level > level_one) {
        eq_save(p, eq_level(p));
    }
    eq_level(p) = cur_level;
    eq_type(p) = t;
    equiv(p) = e;
    assign_trace(p, strpool!("into"));
  exit:
}

308. The counterpart of eq_define for the remaining (fullword) positions in eqtb is called eq_word_define . Since xeq_level[p] >= level_one for all p , a ‘restore_zero ’ will never be used in this case.

function eq_word_define(p: pointer, w: integer) {
    label exit;
    
    if (eTeX_ex && (eqtb[p].int == w)) {
        assign_trace(p, strpool!("reassigning"));
        return;
    }
    assign_trace(p, strpool!("changing"));
    if (xeq_level[p] != cur_level) {
        eq_save(p, xeq_level[p]);
        xeq_level[p] = cur_level;
    }
    eqtb[p].int = w;
    assign_trace(p, strpool!("into"));
  exit:
}

309. The eq_define and eq_word_define routines take care of local definitions. Global definitions are done in almost the same way, but there is no need to save old values, and the new value is associated with level_one .

// global eq_define 
function geq_define(
  p: pointer,
  t: quarterword,
  e: halfword,
) {
    assign_trace(p, strpool!("globally changing"));
    {
        eq_destroy(eqtb[p]);
        eq_level(p) = level_one;
        eq_type(p) = t;
        equiv(p) = e;
    }
    assign_trace(p, strpool!("into"));
}

// global eq_word_define 
function geq_word_define(p: pointer, w: integer) {
    assign_trace(p, strpool!("globally changing"));
    {
        eqtb[p].int = w;
        xeq_level[p] = level_one;
    }
    assign_trace(p, strpool!("into"));
}

310. Subroutine save_for_after puts a token on the stack for save-keeping.

function save_for_after(t: halfword) {
    if (cur_level > level_one) {
        check_full_save_stack;
        save_type(save_ptr) = insert_token;
        save_level(save_ptr) = level_zero;
        save_index(save_ptr) = t;
        incr(save_ptr);
    }
}

311. The unsave routine goes the other way, taking items off of save_stack . This routine takes care of restoration when a level ends; everything belonging to the topmost group is cleared off of the save stack.

forward_declaration back_input();

// pops the top level off the save stack
function unsave() {
    label done;
    var
      p: pointer, // position to be restored
      l: quarterword, // saved level, if in fullword regions 
      // of eqtb 
      t: halfword, // saved value of cur_tok 
      a: boolean; // have we already processed an 
      // \.{\\aftergroup} ?
    
    a = false;
    if (cur_level > level_one) {
        decr(cur_level);
        ⟦312 Clear off top level from |save_stack|⟧
    } else {
        //  unsave is not used when cur_group == 
        // bottom_level 
        confusion(strpool!("curlevel"));
    }
}

312.

⟦312 Clear off top level from |save_stack|⟧ = ⟦
    loop {
        decr(save_ptr);
        if (save_type(save_ptr) == level_boundary) {
            goto done;
        }
        p = save_index(save_ptr);
        if (save_type(save_ptr) == insert_token) {
            ⟦356 Insert token |p| into \TeX's input⟧
        } else if (save_type(save_ptr) == restore_sa) {
            sa_restore;
            sa_chain = p;
            sa_level = save_level(save_ptr);
        } else {
            if (save_type(save_ptr) == restore_old_value) {
                l = save_level(save_ptr);
                decr(save_ptr);
            } else {
                save_stack[save_ptr] = eqtb[
                  undefined_control_sequence,
                ];
            }
            ⟦313 Store \(s)|save_stack[save_ptr]| in |eqtb[p]|, unless |eqtb[p]| holds a global value⟧
        }
    }

    done:

    stat!{
        if (tracing_groups > 0) {
            group_trace(true);
        }
    }

    if (grp_stack[in_open] == cur_boundary) {
        // groups possibly not properly nested with files
        group_warning;
    }

    cur_group = save_level(save_ptr)

    cur_boundary = save_index(save_ptr)

    if (eTeX_ex) {
        decr(save_ptr);
    }
⟧

313. A global definition, which sets the level to level_one , will not be undone by unsave . If at least one global definition of eqtb[p] has been carried out within the group that just ended, the last such definition will therefore survive.

⟦313 Store \(s)|save_stack[save_ptr]| in |eqtb[p]|, unless |eqtb[p]| holds a global value⟧ = ⟦
    if ((p < int_base) || (p > eqtb_size)) {
        if (eq_level(p) == level_one) {
            // destroy the saved value
            eq_destroy(save_stack[save_ptr]);
            stat!{
                if (tracing_restores > 0) {
                    restore_trace(p, strpool!("retaining"));
                }
            }
        } else {
            // destroy the current value
            eq_destroy(eqtb[p]);
            // restore the saved value
            eqtb[p] = save_stack[save_ptr];
            stat!{
                if (tracing_restores > 0) {
                    restore_trace(p, strpool!("restoring"));
                }
            }
        }
    } else if (xeq_level[p] != level_one) {
        eqtb[p] = save_stack[save_ptr];
        xeq_level[p] = l;
        stat!{
            if (tracing_restores > 0) {
                restore_trace(p, strpool!("restoring"));
            }
        }
    } else {
        stat!{
            if (tracing_restores > 0) {
                restore_trace(p, strpool!("retaining"));
            }
        }
    }
⟧
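The interplay of local definitions, global definitions, and unsave can be tried out on a toy model. The sketch below is not the program's data layout: `Scopes` and its method names are hypothetical, eqtb is a dictionary from names to (level, value) pairs, level_zero and level_one are taken to be 0 and 1, and reference counting is ignored.

```python
LEVEL_ZERO, LEVEL_ONE = 0, 1        # assumed numeric values
UNDEFINED = (LEVEL_ZERO, None)      # stands in for eqtb[undefined_control_sequence]


class Scopes:
    """Toy model of eq_define, geq_define and unsave."""

    def __init__(self):
        self.eqtb = {}              # name -> (level, value)
        self.save_stack = []
        self.cur_level = LEVEL_ONE

    def begin_group(self):
        self.save_stack.append(("boundary",))
        self.cur_level += 1

    def define(self, name, value):
        """Local definition: save the old entry unless it belongs to this level."""
        old = self.eqtb.get(name, UNDEFINED)
        if old[0] != self.cur_level and self.cur_level > LEVEL_ONE:
            self.save_stack.append(("restore", name, old))
        self.eqtb[name] = (self.cur_level, value)

    def global_define(self, name, value):
        """Global definition: nothing is saved, and the level becomes level_one."""
        self.eqtb[name] = (LEVEL_ONE, value)

    def end_group(self):
        if self.cur_level == LEVEL_ONE:
            raise RuntimeError("no group to end")
        self.cur_level -= 1
        while True:
            entry = self.save_stack.pop()
            if entry[0] == "boundary":
                break
            _, name, saved = entry
            if self.eqtb[name][0] == LEVEL_ONE:
                continue            # "retaining": a global value survives
            self.eqtb[name] = saved  # "restoring"

    def value(self, name):
        return self.eqtb.get(name, UNDEFINED)[1]
```

A global definition made inside a group marks the entry with level_one, and that mark is what makes end_group skip every saved value for the same name.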

314.

⟦314 Declare \eTeX\ procedures for tracing and input⟧ = ⟦
    stat!{
        //  eqtb [ p ] has just been restored or retained
        function restore_trace(p: pointer, s: str_number) {
            begin_diagnostic;
            print_char(ord!("{"));
            print(s);
            print_char(ord!(" "));
            show_eqtb(p);
            print_char(ord!("}"));
            end_diagnostic(false);
        }
    }
⟧

315. When looking for possible pointers to a memory location, it is helpful to look for references from eqtb that might be waiting on the save stack. Of course, we might find spurious pointers too; but this routine is merely an aid when debugging, and at such times we are grateful for any scraps of information, even if they prove to be irrelevant.

⟦315 Search |save_stack| for equivalents that point to |p|⟧ = ⟦
    if (save_ptr > 0) {
        for (q in 0 to save_ptr - 1) {
            if (equiv_field(save_stack[q]) == p) {
                print_nl(strpool!("SAVE("));
                print_int(q);
                print_char(ord!(")"));
            }
        }
    }
⟧

316. Most of the parameters kept in eqtb can be changed freely, but there’s an exception: The magnification should not be used with two different values during any TEX job, since a single magnification is applied to an entire run. The global variable mag_set is set to the current magnification whenever it becomes necessary to “freeze” it at a particular value.

⟦13 Global variables⟧ += ⟦
    // if nonzero, this magnification should be used 
    // henceforth
    var mag_set: integer;
⟧

317.

⟦23 Set initial values of key variables⟧ += ⟦
    mag_set = 0
⟧

318. The prepare_mag subroutine is called whenever TEX wants to use mag for magnification.

function prepare_mag() {
    if ((mag_set > 0) && (mag != mag_set)) {
        print_err(strpool!("Incompatible magnification ("));
        print_int(mag);
        print(strpool!(");"));
        print_nl(
          strpool!(" the previous value will be retained"),
        );
        help2(
          strpool!("I can handle only one magnification ratio per job. So I've"),
        )(
          strpool!("reverted to the magnification you used earlier on this run."),
        );
        int_error(mag_set);
        //  mag = mag_set 
        geq_word_define(int_base + mag_code, mag_set);
    }
    if ((mag <= 0) || (mag > 32768)) {
        print_err(
          strpool!("Illegal magnification has been changed to 1000"),
        );
        help1(
          strpool!("The magnification ratio must be between 1 and 32768."),
        );
        int_error(mag);
        geq_word_define(int_base + mag_code, 1000);
    }
    mag_set = mag;
}
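The decision logic of prepare_mag can be isolated from the error-reporting machinery. In this sketch the function takes the current values as arguments and returns the results instead of updating globals; the return shape and the message list are assumptions made for illustration.

```python
def prepare_mag(mag, mag_set):
    """Return (effective mag, new mag_set, messages) following section 318."""
    messages = []
    if mag_set > 0 and mag != mag_set:
        messages.append(
            f"Incompatible magnification ({mag}); the previous value will be retained"
        )
        mag = mag_set  # corresponds to the global redefinition of mag
    if mag <= 0 or mag > 32768:
        messages.append("Illegal magnification has been changed to 1000")
        mag = 1000
    return mag, mag, messages
```

Once the first call has frozen a value, every later call with a different magnification falls back to the frozen one, so the range check can only trigger on the first use.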

319. [20] Token lists. A TEX token is either a character or a control sequence, and it is represented internally in one of two ways: (1) A character whose ASCII code number is c and whose command code is m is represented as the number 2^{21}m+c; the command code is in the range 1 <= m <= 14 . (2) A control sequence whose eqtb address is p is represented as the number cs_token_flag + p . Here cs_token_flag == 2^{25}-1 is larger than 2^{21}m+c, yet it is small enough that cs_token_flag + p < max_halfword ; thus, a token fits comfortably in a halfword.

A token t represents a left_brace command if and only if t < left_brace_limit ; it represents a right_brace command if and only if we have left_brace_limit <= t < right_brace_limit ; and it represents a match or end_match command if and only if match_token <= t <= end_match_token . The following definitions take care of these token-oriented constants and a few others.

// amount added to the eqtb location in a token that stands
// for a control sequence; is a multiple of 0x10000, less 1
@define cs_token_flag => 0x1ffffff
// to separate char and command code
@define max_char_val => 0x200000
// $2^{21}\cdot left_brace $
@define left_brace_token => 0x200000
// $2^{21}\cdot( left_brace +1)$
@define left_brace_limit => 0x400000
// $2^{21}\cdot right_brace $
@define right_brace_token => 0x400000
// $2^{21}\cdot( right_brace +1)$
@define right_brace_limit => 0x600000
// $2^{21}\cdot math_shift $
@define math_shift_token => 0x600000
@define tab_token => 0x800000 // $2^{21}\cdot tab_mark $
// $2^{21}\cdot out_param $
@define out_param_token => 0xa00000
// $2^{21}\cdot spacer + ord!(" ") $
@define space_token => 0x1400020
@define letter_token => 0x1600000 // $2^{21}\cdot letter $
// $2^{21}\cdot other_char $
@define other_token => 0x1800000
@define match_token => 0x1a00000 // $2^{21}\cdot match $
// $2^{21}\cdot end_match $
@define end_match_token => 0x1c00000
// $2^{21}\cdot end_match +1$
@define protected_token => end_match_token + 1
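The packing described above is easy to check numerically. The sketch below uses the constants from the definitions; the helper names (`char_token`, `cs_token`, `decode`, and the two brace tests) are hypothetical, and the command codes used in the usage note (1 for left_brace, 2 for right_brace, 10 for spacer, 11 for letter) are read off from the token constants themselves.

```python
MAX_CHAR_VAL = 0x200000        # 2**21, separates character code from command code
CS_TOKEN_FLAG = 0x1FFFFFF      # 2**25 - 1
LEFT_BRACE_LIMIT = 0x400000
RIGHT_BRACE_LIMIT = 0x600000


def char_token(cmd, code):
    """Pack a character token as 2**21 * cmd + code."""
    if not 1 <= cmd <= 14:
        raise ValueError("command code must be in 1..14")
    if not 0 <= code < MAX_CHAR_VAL:
        raise ValueError("character code out of range")
    return cmd * MAX_CHAR_VAL + code


def cs_token(p):
    """Pack a control sequence token from its eqtb address p."""
    return CS_TOKEN_FLAG + p


def decode(tok):
    """Split a token back into its parts."""
    if tok >= CS_TOKEN_FLAG:
        return ("cs", tok - CS_TOKEN_FLAG)
    return ("char", tok // MAX_CHAR_VAL, tok % MAX_CHAR_VAL)


def is_left_brace(tok):
    return tok < LEFT_BRACE_LIMIT


def is_right_brace(tok):
    return LEFT_BRACE_LIMIT <= tok < RIGHT_BRACE_LIMIT
```

For example, `char_token(10, 0x20)` reproduces space_token, and the largest possible character token, `char_token(14, MAX_CHAR_VAL - 1)`, is still below cs_token_flag, so the two kinds of token never collide.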

320.

⟦14 Check the ``constant'' values for consistency⟧ += ⟦
    if (
        cs_token_flag
        + eqtb_size + hash_extra > max_halfword
    ) {
        bad = 21;
    }

    if ((hash_offset < 0) || (hash_offset > hash_base)) {
        bad = 42;
    }
⟧

321. A token list is a singly linked list of one-word nodes in mem , where each word contains a token and a link. Macro definitions, output-routine definitions, marks, \write texts, and a few other things are remembered by TEX in the form of token lists, usually preceded by a node with a reference count in its token_ref_count field. The token stored in location p is called info(p) .

Three special commands appear in the token lists of macro definitions. When m == match , it means that TEX should scan a parameter for the current macro; when m == end_match , it means that parameter matching should end and TEX should start reading the macro text; and when m == out_param , it means that TEX should insert parameter number c into the text at this point.

The enclosing { and } characters of a macro definition are omitted, but an output routine will be enclosed in braces.

Here is an example macro definition that illustrates these conventions. After TEX processes the text

\def\mac a#1#2 \b {#1\-a ##1#2 #2}
the definition of \mac is represented as a token list containing
(reference count), letter a, match #, match #, spacer ␣, \b, end_match, out_param 1, \-, letter a, spacer ␣, mac_param #, other_char 1, out_param 2, spacer ␣, out_param 2.
The procedure scan_toks builds such token lists, and macro_call does the parameter matching.

Examples such as

\def\m{\def\m{a}b}
explain why reference counts would be needed even if TEX had no \let operation: When the token list for \m is being read, the redefinition of \m changes the eqtb entry before the token list has been fully consumed, so we dare not simply destroy a token list when its control sequence is being redefined.
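
A small Python model may make the hazard concrete. It is a sketch under loud assumptions: TokenList, add_ref, delete_ref, define and expand are hypothetical stand-ins for the mem nodes and eqtb machinery, and the count here starts at 1 for the table's own reference, which need not match the convention used in token_ref_count.

```python
class TokenList:
    """A token list with a reference count (hypothetical stand-in for mem nodes)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.ref_count = 0
        self.freed = False


def add_ref(tl):
    tl.ref_count += 1


def delete_ref(tl):
    """Drop one reference; storage is reclaimed only when none remain."""
    tl.ref_count -= 1
    if tl.ref_count == 0:
        tl.freed = True


eqtb = {}  # maps a control sequence name to its current token list


def define(name, tokens):
    """Redefine `name`, releasing the table's reference to the old list."""
    old = eqtb.get(name)
    new = TokenList(tokens)
    add_ref(new)              # the table itself holds one reference
    eqtb[name] = new
    if old is not None:
        delete_ref(old)


def expand(name):
    """Read the list for `name`, tolerating a redefinition in mid-read."""
    tl = eqtb[name]
    add_ref(tl)               # the reader holds a reference while scanning
    out = []
    for tok in tl.tokens:
        if isinstance(tok, tuple) and tok[0] == "def":
            define(tok[1], tok[2])   # may replace eqtb[name] right now
        else:
            out.append(tok)
    survived = not tl.freed   # the list was still intact while being read
    delete_ref(tl)
    return out, survived


# Model of \def\m{\def\m{a}b}: the body redefines \m, then yields b.
define("m", [("def", "m", ["a"]), "b"])
first, ok = expand("m")
second, _ = expand("m")
```

In this model the first expansion still delivers b after the redefinition, and the old list is reclaimed only once the reader lets go of it.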

If the parameter-matching part of a definition ends with ‘#{’, the corresponding token list will have ‘{’ just before the ‘end_match ’ and also at the very end. The first ‘{’ is used to delimit the parameter; the second one keeps the first from disappearing.

322. The procedure show_token_list , which prints a symbolic form of the token list that starts at a given node p , illustrates these conventions. The token list being displayed should not begin with a reference count. However, the procedure is intended to be robust, so that if the memory links are awry or if p is not really a pointer to a token list, nothing catastrophic will happen.

An additional parameter q is also given; this parameter is either null or it points to a node in the token list where a certain magic computation takes place that will be explained later. (Basically, q is non-null when we are printing the two-line context information at the time of an error message; q marks the place corresponding to where the second line should begin.)

For example, if p points to the node containing the first a in the token list above, then show_token_list will print the string

a#1#2 \b ->#1\-a ##1#2 #2;
and if q points to the node containing the second a, the magic computation will be performed just before the second a is printed.

The generation will stop, and ‘\ETC.’ will be printed, if the length of printing exceeds a given limit l . Anomalous entries are printed in the form of control sequences that are not followed by a blank space, e.g., ‘\BAD.’; this cannot be confused with actual control sequences because a real control sequence named BAD would come out ‘\BAD ’.

⟦322 Declare the procedure called |show_token_list|⟧ = ⟦
    function show_token_list(p, q: integer, l: integer) {
        label exit;
        var
          m, c: integer, // pieces of a token
          match_chr: integer, // character used in a `match'
          n: ASCII_code; // the highest parameter number, as an ASCII digit
        
        match_chr = ord!("#");
        n = ord!("0");
        tally = 0;
        while ((p != null) && (tally < l)) {
            if (p == q) {
                ⟦350 Do magic computation⟧
            }
            ⟦323 Display token |p|, and |return| if there are problems⟧
            p = link(p);
        }
        if (p != null) {
            print_esc(strpool!("ETC."));
        }
      exit:
    }
⟧

323.

⟦323 Display token |p|, and |return| if there are problems⟧ = ⟦
    if ((p < hi_mem_min) || (p > mem_end)) {
        print_esc(strpool!("CLOBBERED."));
        return;
    }

    if (info(p) >= cs_token_flag) {
        print_cs(info(p) - cs_token_flag);
    } else {
        m = info(p) div max_char_val;
        c = info(p) % max_char_val;
        if (info(p) < 0) {
            print_esc(strpool!("BAD."));
        } else {
            ⟦324 Display the token $(|m|,|c|)$⟧
        }
    }
⟧

324. The procedure usually “learns” the character code used for macro parameters by seeing one in a match command before it runs into any out_param commands.

⟦324 Display the token $(|m|,|c|)$⟧ = ⟦
    case m {
      left_brace,
      right_brace,
      math_shift,
      tab_mark,
      sup_mark,
      sub_mark,
      spacer,
      letter,
      other_char:
        print_char(c);
      mac_param:
        print_char(c);
        print_char(c);
      out_param:
        print_char(match_chr);
        if (c <= 9) {
            print_char(c + ord!("0"));
        } else {
            print_char(ord!("!"));
            return;
        }
      match:
        match_chr = c;
        print_char(c);
        incr(n);
        print_char(n);
        if (n > ord!("9")) {
            return;
        }
      end_match:
        if (c == 0) {
            print(strpool!("->"));
        }
      othercases:
        print_esc(strpool!("BAD."));
    }
⟧
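
The display rules of the last three sections can be tried out with a simplified Python sketch. It assumes tokens are given as (command, value) pairs rather than packed words in mem, omits the q parameter and the CLOBBERED check, and models print_cs by the assumption that a control word prints with a trailing space while a control symbol does not.

```python
def show_token_list(tokens, limit=1000):
    """Return the symbolic form of a list of (command, value) pairs."""
    simple = {"left_brace", "right_brace", "math_shift", "tab_mark",
              "sup_mark", "sub_mark", "spacer", "letter", "other_char"}
    out = []
    match_chr = "#"   # character used in a match; relearned from each match
    n = ord("0")      # highest parameter number so far, as an ASCII digit
    tally = 0         # number of characters printed
    i = 0
    while i < len(tokens) and tally < limit:
        cmd, c = tokens[i]
        if cmd == "cs":
            # Assumption of this sketch: control words get a trailing space.
            piece = "\\" + c + (" " if c.isalpha() else "")
        elif cmd in simple:
            piece = c
        elif cmd == "mac_param":
            piece = c + c
        elif cmd == "out_param":
            if c > 9:
                return "".join(out) + match_chr + "!"
            piece = match_chr + str(c)
        elif cmd == "match":
            match_chr = c
            n += 1
            piece = c + chr(n)
            if n > ord("9"):
                return "".join(out) + piece
        elif cmd == "end_match":
            piece = "->" if c == 0 else ""
        else:
            piece = "\\BAD."
        out.append(piece)
        tally += len(piece)
        i += 1
    if i < len(tokens):
        out.append("\\ETC.")
    return "".join(out)


# The token list of the \mac example, without its reference count.
MAC = [
    ("letter", "a"), ("match", "#"), ("match", "#"), ("spacer", " "),
    ("cs", "b"), ("end_match", 0), ("out_param", 1), ("cs", "-"),
    ("letter", "a"), ("spacer", " "), ("mac_param", "#"),
    ("other_char", "1"), ("out_param", 2), ("spacer", " "), ("out_param", 2),
]
```

Calling show_token_list(MAC) yields the string quoted in the discussion of show_token_list, and a small limit shows the \ETC. cutoff.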

325. Here’s the way we sometimes want to display a token list, given a pointer to its reference count; the pointer may be null.

function token_show(p: pointer) {
    if (p != null) {
        show_token_list(link(p), null, 10000000);
    }
}

326. The print_meaning subroutine displays cur_cmd and cur_chr in symbolic form, including the expansion of a macro or mark.

function print_meaning() {
    print_cmd_chr(cur_cmd, cur_chr);
    if (cur_cmd >= call) {
        print_char(ord!(":"));
        print_ln;
        token_show(cur_chr);
    } else if (
        (cur_cmd == top_bot_mark)
        && (cur_chr < marks_code)
    ) {
        print_char(ord!(":"));
        print_ln;
        token_show(cur_mark[cur_chr]);
    }
}

327. [21] Introduction to the syntactic routines. Let’s pause a moment now and try to look at the Big Picture. The TEX program consists of three main parts: syntactic routines, semantic routines, and output routines. The chief purpose of the syntactic routines is to deliver the user’s input to the semantic routines, one token at a time. The semantic routines act as an interpreter responding to these tokens, which may be regarded as commands. And the output routines are periodically called on to convert box-and-glue lists into a compact set of instructions that will be sent to a typesetter. We have discussed the basic data structures and utility routines of TEX, so we are good and ready to plunge into the real activity by considering the syntactic routines.

Our current goal is to come to grips with the get_next procedure, which is the keystone of TEX’s input mechanism. Each call of get_next sets the value of three variables cur_cmd , cur_chr , and cur_cs , representing the next input token.

cur_cmd denotes a command code from the long list of codes given above;
cur_chr denotes a character code or other modifier of the command code;
cur_cs is the eqtb location of the current control sequence, if the current token was a control sequence, otherwise it's zero.
Underlying this external behavior of get_next is all the machinery necessary to convert from character files to tokens. At a given time we may be only partially finished with the reading of several files (for which \input was specified), and partially finished with the expansion of some user-defined macros and/or some macro parameters, and partially finished with the generation of some text in a template for \halign, and so on. When reading a character file, special characters must be classified as math delimiters, etc.; comments and extra blank spaces must be removed, paragraphs must be recognized, and control sequences must be found in the hash table. Furthermore there are occasions in which the scanning routines have looked ahead for a word like ‘plus’ but only part of that word was found, hence a few characters must be put back into the input and scanned again.

To handle these situations, which might all be present simultaneously, TEX uses various stacks that hold information about the incomplete activities, and there is a finite state control for each level of the input mechanism. These stacks record the current state of an implicitly recursive process, but the get_next procedure is not recursive. Therefore it will not be difficult to translate these algorithms into low-level languages that do not support recursion.

⟦13 Global variables⟧ += ⟦
    // current command set by get_next 
    var cur_cmd: eight_bits;

    // operand of current command
    var cur_chr: halfword;

    // control sequence found here, zero if none found
    var cur_cs: pointer;

    // packed representative of cur_cmd and cur_chr 
    var cur_tok: halfword;
⟧

328. The print_cmd_chr routine prints a symbolic interpretation of a command code and its modifier. This is used in certain ‘You can't’ error messages, and in the implementation of diagnostic routines like \show.

The body of print_cmd_chr is a rather tedious listing of print commands, and most of it is essentially an inverse to the primitive routine that enters a TEX primitive into eqtb . Therefore much of this procedure appears elsewhere in the program, together with the corresponding primitive calls.

@define chr_cmd(#) =>
    {
        print(#);
        if (chr_code < 0x10000) {
            print_ASCII(chr_code);
        } else {
            // non-Plane 0 Unicodes can't be sent through 
            // print_ASCII 
            print_char(chr_code);
        }
    }
⟦328 Declare the procedure called |print_cmd_chr|⟧ = ⟦
    function print_cmd_chr(
      cmd: quarterword,
      chr_code: halfword,
    ) {
        var
          n: integer, // temp variable
          font_name_str: str_number, // local vars for 
          // \.{\\fontname} quoting extension
          quote_char: UTF16_code;
        
        case cmd {
          left_brace:
            chr_cmd(strpool!("begin-group character "));
          right_brace:
            chr_cmd(strpool!("end-group character "));
          math_shift:
            chr_cmd(strpool!("math shift character "));
          mac_param:
            chr_cmd(strpool!("macro parameter character "));
          sup_mark:
            chr_cmd(strpool!("superscript character "));
          sub_mark:
            chr_cmd(strpool!("subscript character "));
          endv:
            print(strpool!("end of alignment template"));
          spacer:
            chr_cmd(strpool!("blank space "));
          letter:
            chr_cmd(strpool!("the letter "));
          other_char:
            chr_cmd(strpool!("the character "));
          ⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧
          othercases:
            print(strpool!("[unknown command code!]"));
        }
    }
⟧

329. Here is a procedure that displays the current command.

function show_cur_cmd_chr() {
    var
      n: integer, // level of \.{\\if...\\fi} nesting
      l: integer, // line where \.{\\if} started
      p: pointer;
    
    begin_diagnostic;
    print_nl(ord!("{"));
    if (mode != shown_mode) {
        print_mode(mode);
        print(strpool!(": "));
        shown_mode = mode;
    }
    print_cmd_chr(cur_cmd, cur_chr);
    if (tracing_ifs > 0) {
        if (cur_cmd >= if_test) {
            if (cur_cmd <= fi_or_else) {
                print(strpool!(": "));
                if (cur_cmd == fi_or_else) {
                    print_cmd_chr(if_test, cur_if);
                    print_char(ord!(" "));
                    n = 0;
                    l = if_line;
                } else {
                    n = 1;
                    l = line;
                }
                p = cond_ptr;
                while (p != null) {
                    incr(n);
                    p = link(p);
                }
                print(strpool!("(level "));
                print_int(n);
                print_char(ord!(")"));
                print_if_line(l);
            }
        }
    }
    print_char(ord!("}"));
    end_diagnostic(false);
}

330. [22] Input stacks and states. This implementation of TEX uses two different conventions for representing sequential stacks.

1) If there is frequent access to the top entry, and if the stack is essentially never empty, then the top entry is kept in a global variable (even better would be a machine register), and the other entries appear in the array stack[0 .. (ptr - 1)]. For example, the semantic stack described above is handled this way, and so is the input stack that we are about to study.

2) If there is infrequent top access, the entire stack contents are in the array stack[0 .. (ptr - 1)]. For example, the save_stack is treated this way, as we have seen.
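
The two conventions can be contrasted in a short Python sketch. The class names are hypothetical, and a Python list stands in for the fixed-size array together with its pointer (len(stack) plays the role of ptr).

```python
class TopInVariableStack:
    """Convention (1): the top entry lives in `cur`; older ones in stack[0 .. ptr-1]."""

    def __init__(self, initial):
        self.cur = initial    # the stack is essentially never empty
        self.stack = []       # len(self.stack) plays the role of ptr

    def push(self, new_top):
        self.stack.append(self.cur)   # save the old top below
        self.cur = new_top

    def pop(self):
        self.cur = self.stack.pop()   # restore the previous top


class ArrayStack:
    """Convention (2): every entry, including the top, is in stack[0 .. ptr-1]."""

    def __init__(self):
        self.stack = []

    def push(self, entry):
        self.stack.append(entry)

    def pop(self):
        return self.stack.pop()


s = TopInVariableStack("terminal")
s.push("paper.tex")
top_while_reading = s.cur
saved_below = list(s.stack)
s.pop()
```

With convention (1), reading the top costs a plain variable access, which is why it suits the input stack, whose top entry is consulted constantly.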

The state of TEX’s input mechanism appears in the input stack, whose entries are records with six fields, called state , index , start , loc , limit , and name . This stack is maintained with convention (1), so it is declared in the following way:

⟦18 Types in the outer block⟧ += ⟦
    type in_state_record = record {
        state_field, index_field: quarterword,
        start_field, loc_field, limit_field, name_field: halfword,
        // stack the tag of the current file
        synctex_tag_field: integer,
    };
⟧

331.

⟦13 Global variables⟧ += ⟦
    var input_stack: ^in_state_record;

    // first unused location of input_stack 
    var input_ptr: 0 .. stack_size;

    // largest value of input_ptr when pushing
    var max_in_stack: 0 .. stack_size;

    // the ``top'' input state, according to convention (1)
    var cur_input: in_state_record;
⟧

332. We’ve already defined the special variable loc => cur_input.loc_field in our discussion of basic input-output routines. The other components of cur_input are defined in the same way:

// current scanner state
@define state => cur_input.state_field
// reference for buffer information
@define index => cur_input.index_field
// starting position in buffer 
@define start => cur_input.start_field
// end of current line in buffer 
@define limit => cur_input.limit_field
// name of the current file
@define name => cur_input.name_field
// {\sl Sync\TeX} tag of the current file
@define synctex_tag => cur_input.synctex_tag_field

333. Let’s look more closely now at the control variables (state , index , start , loc , limit , name ), assuming that TEX is reading a line of characters that have been input from some file or from the user’s terminal. There is an array called buffer that acts as a stack of all lines of characters that are currently being read from files, including all lines on subsidiary levels of the input stack that are not yet completed. TEX will return to the other lines when it is finished with the present input file.

(Incidentally, on a machine with byte-oriented addressing, it might be appropriate to combine buffer with the str_pool array, letting the buffer entries grow downward from the top of the string pool and checking that these two tables don’t bump into each other.)

The line we are currently working on begins in position start of the buffer; the next character we are about to read is buffer[loc] ; and limit is the location of the last character present. If loc > limit , the line has been completely read. Usually buffer[limit] is the end_line_char , denoting the end of a line, but this is not true if the current line is an insertion that was entered on the user’s terminal in response to an error message.

The name variable is a string number that designates the name of the current file, if we are reading a text file. It is zero if we are reading from the terminal; it is n + 1 if we are reading from input stream n , where 0 <= n <= 16 . (Input stream 16 stands for an invalid stream number; in such cases the input is actually from the terminal, under control of the procedure read_toks .) Finally 18 <= name <= 19 indicates that we are reading a pseudo file created by the \scantokens command.

The state variable has one of three values, when we are scanning such files:

1) state == mid_line is the normal state.
2) state == skip_blanks is like mid_line, but blanks are ignored.
3) state == new_line is the state at the beginning of a line.
These state values are assigned numeric codes so that if we add the state code to the next character’s command code, we get distinct values. For example, ‘mid_line + spacer ’ stands for the case that a blank space character occurs in the middle of a line when it is not being ignored; after this case is processed, the next value of state will be skip_blanks .

//  state code when scanning a line of characters
@define mid_line => 1
//  state code when ignoring blanks
@define skip_blanks => 2 + max_char_code
//  state code at start of line
@define new_line => 3 + max_char_code + max_char_code
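
The claim that state plus command code gives distinct values can be checked mechanically. The sketch below assumes max_char_code = 15, the value used in TEX82; that constant is defined elsewhere in the program, not in this section.

```python
MAX_CHAR_CODE = 15  # assumption: largest category code, as in TeX82

MID_LINE = 1
SKIP_BLANKS = 2 + MAX_CHAR_CODE
NEW_LINE = 3 + MAX_CHAR_CODE + MAX_CHAR_CODE

STATES = {"mid_line": MID_LINE, "skip_blanks": SKIP_BLANKS, "new_line": NEW_LINE}

# Map every sum state + cat back to the (state, cat) pairs that produce it.
combined = {}
for state_name, state in STATES.items():
    for cat in range(MAX_CHAR_CODE + 1):
        combined.setdefault(state + cat, []).append((state_name, cat))

# True when no two (state, cat) pairs collide on the same sum.
all_distinct = all(len(pairs) == 1 for pairs in combined.values())
```

Under that assumption the three states occupy the disjoint ranges 1..16, 17..32 and 33..48, so a single case statement on state + cur_cmd can dispatch on both at once.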

334. Additional information about the current line is available via the index variable, which counts how many lines of characters are present in the buffer below the current level. We have index == 0 when reading from the terminal and prompting the user for each line; then if the user types, e.g., ‘\input paper’, we will have index == 1 while reading the file paper.tex. However, it does not follow that index is the same as the input stack pointer, since many of the levels on the input stack may come from token lists. For example, the instruction ‘\input paper’ might occur in a token list.

The global variable in_open is equal to the index value of the highest non-token-list level. Thus, the number of partially read lines in the buffer is in_open + 1 , and we have in_open == index when we are not reading a token list.

If we are not currently reading from the terminal, or from an input stream, we are reading from the file variable input_file[index] . We use the notation terminal_input as a convenient abbreviation for name == 0 , and cur_file as an abbreviation for input_file[index] .

The global variable line contains the line number in the topmost open file, for use in error messages. If we are not reading from the terminal, line_stack[index] holds the line number for the enclosing level, so that line can be restored when the current file has been read. Line numbers should never be negative, since the negative of the current line number is used to identify the user’s output routine in the mode_line field of the semantic nest entries.

If more information about the input state is needed, it can be included in small arrays like those shown here. For example, the current page or segment number in the input file might be put into a variable page , maintained for enclosing levels in ‘var page_stack: array [1 .. max_in_open] of integer; ’ by analogy with line_stack .

@define terminal_input =>
    (name == 0) // are we reading from the terminal?
// the current alpha_file variable
@define cur_file => input_file[index]
⟦13 Global variables⟧ += ⟦
    // the number of lines in the buffer, less one
    var in_open: 0 .. max_in_open;

    // the number of open text files
    var open_parens: 0 .. max_in_open;

    var input_file: ^unicode_file;

    // current line number in the current source file
    var line: integer;

    var line_stack: ^integer;

    var source_filename_stack: ^str_number;

    var full_source_filename_stack: ^str_number;
⟧

335. Users of TEX sometimes forget to balance left and right braces properly, and one of the ways TEX tries to spot such errors is by considering an input file as broken into subfiles by control sequences that are declared to be \outer.

A variable called scanner_status tells TEX whether or not to complain when a subfile ends. This variable has six possible values:

normal , means that a subfile can safely end here without incident.

skipping , means that a subfile can safely end here, but not a file, because we’re reading past some conditional text that was not selected.

defining , means that a subfile shouldn’t end now because a macro is being defined.

matching , means that a subfile shouldn’t end now because a macro is being used and we are searching for the end of its arguments.

aligning , means that a subfile shouldn’t end now because we are not finished with the preamble of an \halign or \valign.

absorbing , means that a subfile shouldn’t end now because we are reading a balanced token list for \message, \write, etc.

If the scanner_status is not normal , the variable warning_index points to the eqtb location for the relevant control sequence name to print in an error message.

//  scanner_status when passing conditional text
@define skipping => 1
//  scanner_status when reading a macro definition
@define defining => 2
//  scanner_status when reading macro arguments
@define matching => 3
//  scanner_status when reading an alignment preamble
@define aligning => 4
//  scanner_status when reading a balanced text
@define absorbing => 5
⟦13 Global variables⟧ += ⟦
    // can a subfile end now?
    var scanner_status: normal .. absorbing;

    // identifier relevant to non- normal scanner status
    var warning_index: pointer;

    // reference count of token list being defined
    var def_ref: pointer;
⟧

336. Here is a procedure that uses scanner_status to print a warning message when a subfile has ended, and at certain other crucial times:

⟦336 Declare the procedure called |runaway|⟧ = ⟦
    function runaway() {
        var
          p: pointer; // head of runaway list
        
        if (scanner_status > skipping) {
            case scanner_status {
              defining:
                print_nl(strpool!("Runaway definition"));
                p = def_ref;
              matching:
                print_nl(strpool!("Runaway argument"));
                p = temp_head;
              aligning:
                print_nl(strpool!("Runaway preamble"));
                p = hold_head;
              absorbing:
                print_nl(strpool!("Runaway text"));
                p = def_ref;
              // there are no other cases
            }
            print_char(ord!("?"));
            print_ln;
            show_token_list(link(p), null, error_line - 10);
        }
    }
⟧
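
The dispatch in runaway amounts to a small table, sketched here in Python. The numeric codes follow the definitions above, with normal taken to be 0 as the subrange normal .. absorbing implies; runaway_message is a hypothetical helper that returns the heading instead of printing it and does not display the token list.

```python
NORMAL, SKIPPING, DEFINING, MATCHING, ALIGNING, ABSORBING = range(6)

# status -> (heading, name of the list head whose contents would be shown)
RUNAWAY = {
    DEFINING: ("Runaway definition", "def_ref"),
    MATCHING: ("Runaway argument", "temp_head"),
    ALIGNING: ("Runaway preamble", "hold_head"),
    ABSORBING: ("Runaway text", "def_ref"),
}


def runaway_message(scanner_status):
    """Return the heading line, or None when nothing is reported."""
    if scanner_status <= SKIPPING:
        return None          # normal and skipping produce no runaway report
    heading, _head = RUNAWAY[scanner_status]
    return heading + "?"
```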

337. However, all this discussion about input state really applies only to the case that we are inputting from a file. There is another important case, namely when we are currently getting input from a token list. In this case state == token_list , and the conventions about the other state variables are different:

loc is a pointer to the current node in the token list, i.e., the node that will be read next. If loc == null , the token list has been fully read.

start points to the first node of the token list; this node may or may not contain a reference count, depending on the type of token list involved.

token_type , which takes the place of index in the discussion above, is a code number that explains what kind of token list is being scanned.

name points to the eqtb address of the control sequence being expanded, if the current token list is a macro.

param_start , which takes the place of limit , tells where the parameters of the current macro begin in the param_stack , if the current token list is a macro.

The token_type can take several values, depending on where the current token list came from:

parameter , if a parameter is being scanned;

u_template , if the u_j part of an alignment template is being scanned;

v_template , if the v_j part of an alignment template is being scanned;

backed_up , if the token list being scanned has been inserted as ‘to be read again’;

inserted , if the token list being scanned has been inserted as the text expansion of a \count or similar variable;

macro , if a user-defined control sequence is being scanned;

output_text , if an \output routine is being scanned;

every_par_text , if the text of \everypar is being scanned;

every_math_text , if the text of \everymath is being scanned;

every_display_text , if the text of \everydisplay is being scanned;

every_hbox_text , if the text of \everyhbox is being scanned;

every_vbox_text , if the text of \everyvbox is being scanned;

every_job_text , if the text of \everyjob is being scanned;

every_cr_text , if the text of \everycr is being scanned;

mark_text , if the text of a \mark is being scanned;

write_text , if the text of a \write is being scanned.

The codes for output_text , every_par_text , etc., are equal to a constant plus the corresponding codes for token list parameters output_routine_loc , every_par_loc , etc. The token list begins with a reference count if and only if token_type >= macro .

Since 𝜀-TEX’s additional token list parameters precede toks_base , the corresponding token types must precede write_text .

//  state code when scanning a token list
@define token_list => 0
@define token_type => index // type of current token list
// base of macro parameters in param_stack 
@define param_start => limit
@define parameter => 0 //  token_type code for parameter
//  token_type code for \<u_j> template
@define u_template => 1
//  token_type code for \<v_j> template
@define v_template => 2
//  token_type code for text to be reread
@define backed_up => 3
// special code for backed-up char from \\XeTeXinterchartoks 
// hook
@define backed_up_char => 4
@define inserted => 5 //  token_type code for inserted texts
//  token_type code for defined control sequences
@define macro => 6
//  token_type code for output routines
@define output_text => 7
//  token_type code for \.{\\everypar}
@define every_par_text => 8
//  token_type code for \.{\\everymath}
@define every_math_text => 9
//  token_type code for \.{\\everydisplay}
@define every_display_text => 10
//  token_type code for \.{\\everyhbox}
@define every_hbox_text => 11
//  token_type code for \.{\\everyvbox}
@define every_vbox_text => 12
//  token_type code for \.{\\everyjob}
@define every_job_text => 13
//  token_type code for \.{\\everycr}
@define every_cr_text => 14
//  token_type code for \.{\\topmark}, etc.
@define mark_text => 15
@define eTeX_text_offset => output_routine_loc - output_text
//  token_type code for \.{\\everyeof}
@define every_eof_text => every_eof_loc - eTeX_text_offset
//  token_type code for \.{\\XeTeXinterchartoks}
@define inter_char_text =>
    XeTeX_inter_char_loc - eTeX_text_offset
//  token_type code for \.{\\write}
@define write_text => toks_base - eTeX_text_offset

338. The param_stack is an auxiliary array used to hold pointers to the token lists for parameters at the current level and subsidiary levels of input. This stack is maintained with convention (2), and it grows at a different rate from the others.

⟦13 Global variables⟧ += ⟦
    // token list pointers for parameters
    var param_stack: ^pointer;

    // first unused entry in param_stack 
    var param_ptr: 0 .. param_size;

    // largest value of param_ptr , will be <= param_size + 
    // 9 
    var max_param_stack: integer;
⟧

339. The input routines must also interact with the processing of \halign and \valign, since the appearance of tab marks and \cr in certain places is supposed to trigger the beginning of special v_j template text in the scanner. This magic is accomplished by an align_state variable that is increased by 1 when a ‘{’ is scanned and decreased by 1 when a ‘}’ is scanned. The align_state is nonzero during the u_j template, after which it is set to zero; the v_j template begins when a tab mark or \cr occurs at a time that align_state == 0 .

⟦13 Global variables⟧ += ⟦
    // group level with respect to current alignment
    var align_state: integer;
⟧
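
The brace-counting idea can be sketched in Python as follows. The sketch assumes that '&' is the tab mark, that '{' and '}' are the only group delimiters, and that align_state has already been set to zero at the end of the u_j template; find_entry_end is a hypothetical helper, and \cr is not modeled.

```python
def find_entry_end(text):
    """Return the index of the first tab mark seen while align_state == 0.

    Returns None if no such tab mark occurs in `text`.
    """
    align_state = 0
    for i, ch in enumerate(text):
        if ch == "{":
            align_state += 1
        elif ch == "}":
            align_state -= 1
        elif ch == "&" and align_state == 0:
            return i          # here the v_j template would begin
    return None
```

In the entry text a{b&c}d&e, the first '&' is seen with align_state == 1 and is passed over; only the second one, at index 7, ends the entry.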

340. Thus, the “current input state” can be very complicated indeed; there can be many levels and each level can arise in a variety of ways. The show_context procedure, which is used by TEX’s error-reporting routine to print out the current input state on all levels down to the most recent line of characters from an input file, illustrates most of these conventions. The global variable base_ptr contains the lowest level that was displayed by this procedure.

⟦13 Global variables⟧ += ⟦
    // shallowest level shown by show_context 
    var base_ptr: 0 .. stack_size;
⟧

341. The status at each level is indicated by printing two lines, where the first line indicates what was read so far and the second line shows what remains to be read. The context is cropped, if necessary, so that the first line contains at most half_error_line characters, and the second contains at most error_line . Non-current input levels whose token_type is ‘backed_up ’ are shown only if they have not been fully read.

// prints where the scanner is
function show_context() {
    label done;
    var
      old_setting: 0 .. max_selector, // saved selector 
      // setting
      nn: integer, // number of contexts shown so far, less 
      // one
      bottom_line: boolean, // have we reached the final 
      // context to be shown?
      ⟦345 Local variables for formatting calculations⟧;
    
    base_ptr = input_ptr;
    // store current state
    input_stack[base_ptr] = cur_input;
    nn = -1;
    bottom_line = false;
    loop {
        // enter into the context
        cur_input = input_stack[base_ptr];
        if ((state != token_list)) {
            if ((name > 19) || (base_ptr == 0)) {
                bottom_line = true;
            }
        }
        if (
            (base_ptr == input_ptr)
            || bottom_line || (nn < error_context_lines)
        ) {
            ⟦342 Display the current context⟧
        } else if (nn == error_context_lines) {
            print_nl(strpool!("..."));
            // omitted if error_context_lines < 0 
            incr(nn);
        }
        if (bottom_line) {
            goto done;
        }
        decr(base_ptr);
    }
  done:
    // restore original state
    cur_input = input_stack[input_ptr];
}
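
The selection of levels to display can be explored with a Python sketch of the loop above. It is a simplification: each level is a dict with hypothetical keys, the current level is the last list element (whereas the program stores cur_input at input_stack[input_ptr]), level 0 is assumed not to be a token list, and the function returns indices instead of printing.

```python
BACKED_UP = 3   # token_type code from the definitions above
MACRO = 6


def contexts_to_show(input_stack, error_context_lines):
    """Return the displayed stack indices, with "..." marking the omitted gap.

    Each level is a dict with keys token_list (bool), name (int),
    token_type (int) and exhausted (bool, meaning loc == null).
    """
    shown = []
    input_ptr = len(input_stack) - 1     # index of the current level here
    base_ptr = input_ptr
    nn = -1                              # contexts shown so far, less one
    while True:
        lvl = input_stack[base_ptr]
        bottom_line = (not lvl["token_list"]) and (
            lvl["name"] > 19 or base_ptr == 0)
        if base_ptr == input_ptr or bottom_line or nn < error_context_lines:
            # fully read backed-up lists are omitted on non-current levels
            omitted = (base_ptr != input_ptr and lvl["token_list"]
                       and lvl["token_type"] == BACKED_UP and lvl["exhausted"])
            if not omitted:
                shown.append(base_ptr)
                nn += 1
        elif nn == error_context_lines:
            shown.append("...")
            nn += 1
        if bottom_line or base_ptr == 0:  # second test is a guard for this sketch
            break
        base_ptr -= 1
    return shown


def macro_level():
    return {"token_list": True, "name": 0, "token_type": MACRO, "exhausted": False}


def file_level(name):
    return {"token_list": False, "name": name, "token_type": 0, "exhausted": False}


# terminal, then a file, then four nested macro expansions
stack = [file_level(0), file_level(25)] + [macro_level() for _ in range(4)]
```

With error_context_lines = 1 this stack gives [5, 4, "...", 1]: the current level, one more context, an ellipsis, and the enclosing file line.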

342.

⟦342 Display the current context⟧ = ⟦
    {
        // we omit backed-up token lists that have already 
        // been read
        if (
            (base_ptr == input_ptr)
            || (state != token_list)
            || (token_type != backed_up) || (loc != null)
        ) {
            // get ready to count characters
            tally = 0;
            old_setting = selector;
            if (state != token_list) {
                ⟦343 Print location of current line⟧
                ⟦348 Pseudoprint the line⟧
            } else {
                ⟦344 Print type of token list⟧
                ⟦349 Pseudoprint the token list⟧
            }
            // stop pseudoprinting
            selector = old_setting;
            ⟦347 Print two lines using the tricky pseudoprinted information⟧
            incr(nn);
        }
    }
⟧

343. This routine should be changed, if necessary, to give the best possible indication of where the current line resides in the input file. For example, on some systems it is best to print both a page and line number.

⟦343 Print location of current line⟧ = ⟦
    if (name <= 17) {
        if (terminal_input) {
            if (base_ptr == 0) {
                print_nl(strpool!("<*>"));
            } else {
                print_nl(strpool!("<insert> "));
            }
        } else {
            print_nl(strpool!("<read "));
            if (name == 17) {
                print_char(ord!("*"));
            } else {
                print_int(name - 1);
            }
            print_char(ord!(">"));
        }
    } else {
        print_nl(strpool!("l."));
        if (index == in_open) {
            print_int(line);
        } else {
            // input from a pseudo file
            print_int(line_stack[index + 1]);
        }
    }

    print_char(ord!(" "))
⟧
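
The same decision tree reads as follows in Python. This is a sketch only: location_prefix is a hypothetical helper that returns the label as a string, and it takes the line number as an argument instead of choosing between line and line_stack[index + 1].

```python
def location_prefix(name, base_ptr, line):
    """Return the label printed before a line of context."""
    if name <= 17:
        if name == 0:                       # terminal_input
            label = "<*>" if base_ptr == 0 else "<insert> "
        elif name == 17:                    # stream 16: invalid stream number
            label = "<read *>"
        else:
            label = "<read %d>" % (name - 1)
    else:
        label = "l.%d" % line
    return label + " "                      # the final print_char(" ")
```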

344.

⟦344 Print type of token list⟧ = ⟦
    case token_type {
      parameter:
        print_nl(strpool!("<argument> "));
      u_template, v_template:
        print_nl(strpool!("<template> "));
      backed_up, backed_up_char:
        if (loc == null) {
            print_nl(strpool!("<recently read> "));
        } else {
            print_nl(strpool!("<to be read again> "));
        }
      inserted:
        print_nl(strpool!("<inserted text> "));
      macro:
        print_ln;
        print_cs(name);
      output_text:
        print_nl(strpool!("<output> "));
      every_par_text:
        print_nl(strpool!("<everypar> "));
      every_math_text:
        print_nl(strpool!("<everymath> "));
      every_display_text:
        print_nl(strpool!("<everydisplay> "));
      every_hbox_text:
        print_nl(strpool!("<everyhbox> "));
      every_vbox_text:
        print_nl(strpool!("<everyvbox> "));
      every_job_text:
        print_nl(strpool!("<everyjob> "));
      every_cr_text:
        print_nl(strpool!("<everycr> "));
      mark_text:
        print_nl(strpool!("<mark> "));
      every_eof_text:
        print_nl(strpool!("<everyeof> "));
      inter_char_text:
        print_nl(strpool!("<XeTeXinterchartoks> "));
      write_text:
        print_nl(strpool!("<write> "));
      othercases:
        // this should never happen
        print_nl(ord!("?"));
    }
⟧

345. Here it is necessary to explain a little trick. We don’t want to store a long string that corresponds to a token list, because that string might take up lots of memory; and we are printing during a time when an error message is being given, so we dare not do anything that might overflow one of TEX’s tables. So ‘pseudoprinting’ is the answer: We enter a mode of printing that stores characters into a buffer of length error_line, where character k + 1 is placed into trick_buf[k % error_line] if k < trick_count, otherwise the character is dropped. Initially we set tally = 0 and trick_count = 1000000; then when we reach the point where transition from line 1 to line 2 should occur, we set first_count = tally and trick_count = max(error_line, tally + 1 + error_line - half_error_line). At the end of the pseudoprinting, the values of first_count, tally, and trick_count give us all the information we need to print the two lines, and all of the necessary text is in trick_buf.

Namely, let l be the length of the descriptive information that appears on the first line. The length of the context information gathered for that line is k = first_count, and the length of the context information gathered for line 2 is m = min(tally, trick_count) - k. If l + k <= h, where h = half_error_line, we print trick_buf[0 .. k - 1] after the descriptive information on line 1, and set n = l + k; here n is the length of line 1. If l + k > h, some cropping is necessary, so we set n = h and print ‘...’ followed by

trick_buf[(l + k - h + 3) .. k - 1],
where subscripts of trick_buf are circular modulo error_line. The second line consists of n spaces followed by trick_buf[k .. (k + m - 1)], unless n + m > error_line; in the latter case, further cropping is done. This is easier to program than to explain.
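
The bookkeeping just described can be tried out in isolation. The following Python sketch is only a toy model, not part of the program: the class name PseudoPrinter and the method two_lines are hypothetical, the descriptive information is represented only by its length l, and the small values of error_line and half_error_line in the usage example are chosen for illustration.

```python
class PseudoPrinter:
    """Toy model of pseudoprinting (hypothetical helper, not TeX's code)."""

    def __init__(self, error_line=72, half_error_line=42):
        self.error_line = error_line
        self.half_error_line = half_error_line
        self.trick_buf = [" "] * error_line
        self.tally = 0
        self.trick_count = 1000000
        self.first_count = 0

    def print_char(self, ch):
        # Character number tally+1 is stored circularly unless it is dropped.
        if self.tally < self.trick_count:
            self.trick_buf[self.tally % self.error_line] = ch
        self.tally += 1

    def set_trick_count(self):
        # Called at the point where line 1 should turn into line 2.
        self.first_count = self.tally
        self.trick_count = max(
            self.error_line,
            self.tally + 1 + self.error_line - self.half_error_line,
        )

    def two_lines(self, l):
        """Return (context for line 1, all of line 2); l is the prefix length."""
        if self.trick_count == 1000000:
            self.set_trick_count()
        k = self.first_count
        m = min(self.tally, self.trick_count) - k
        if l + k <= self.half_error_line:
            p, n, line1 = 0, l + k, ""
        else:
            # Crop the left end of line 1 so that its total length is h.
            p = l + k - self.half_error_line + 3
            n, line1 = self.half_error_line, "..."
        line1 += "".join(self.trick_buf[q % self.error_line] for q in range(p, k))
        if m + n <= self.error_line:
            end, tail = k + m, ""
        else:
            # Crop the right end of line 2.
            end, tail = k + (self.error_line - n - 3), "..."
        body = "".join(self.trick_buf[q % self.error_line] for q in range(k, end))
        return line1, " " * n + body + tail


pp = PseudoPrinter(error_line=20, half_error_line=10)
for ch in "abcdef":
    pp.print_char(ch)
pp.set_trick_count()          # the text read so far ends here
for ch in "ghij":
    pp.print_char(ch)
first, second = pp.two_lines(l=4)
print("l.1 " + first)
print(second)
```

Only error_line characters are ever stored, however long the token list may be; that is the whole point of the trick.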

⟦345 Local variables for formatting calculations⟧ = ⟦
    // index into buffer 
    var i: 0 .. buf_size;

    // end of current line in buffer 
    var j: 0 .. buf_size;

    // length of descriptive information on line 1
    var l: 0 .. half_error_line;

    // context information gathered for line 2
    var m: integer;

    // length of line 1
    var n: 0 .. error_line;

    // starting or ending place in trick_buf 
    var p: integer;

    // temporary index
    var q: integer;
⟧

346. The following code sets up the print routines so that they will gather the desired information.

@define begin_pseudoprint =>
    {
        l = tally;
        tally = 0;
        selector = pseudo;
        trick_count = 1000000;
    }
@define set_trick_count =>
    {
        first_count = tally;
        trick_count = 
            tally
            + 1 + error_line - half_error_line
        ;
        if (trick_count < error_line) {
            trick_count = error_line;
        }
    }

347. And the following code uses the information after it has been gathered.

⟦347 Print two lines using the tricky pseudoprinted information⟧ = ⟦
    if (trick_count == 1000000) {
        //  set_trick_count must be performed
        set_trick_count;
    }

    if (tally < trick_count) {
        m = tally - first_count;
    } else {
        // context on line 2
        m = trick_count - first_count;
    }

    if (l + first_count <= half_error_line) {
        p = 0;
        n = l + first_count;
    } else {
        print(strpool!("..."));
        p = l + first_count - half_error_line + 3;
        n = half_error_line;
    }

    for (q in p to first_count - 1) {
        print_char(trick_buf[q % error_line]);
    }

    print_ln

    for (q in 1 to n) {
        // print n spaces to begin line 2
        print_visible_char(ord!(" "));
    }

    if (m + n <= error_line) {
        p = first_count + m;
    } else {
        p = first_count + (error_line - n - 3);
    }

    for (q in first_count to p - 1) {
        print_char(trick_buf[q % error_line]);
    }

    if (m + n > error_line) {
        print(strpool!("..."));
    }
⟧

348. But the trick is distracting us from our current goal, which is to understand the input state. So let’s concentrate on the data structures that are being pseudoprinted as we finish up the show_context procedure.

⟦348 Pseudoprint the line⟧ = ⟦
    begin_pseudoprint

    if (buffer[limit] == end_line_char) {
        j = limit;
    } else {
        // determine the effective end of the line
        j = limit + 1;
    }

    if (j > 0) {
        for (i in start to j - 1) {
            if (i == loc) {
                set_trick_count;
            }
            print_char(buffer[i]);
        }
    }
⟧

349.

⟦349 Pseudoprint the token list⟧ = ⟦
    begin_pseudoprint

    if (token_type < macro) {
        show_token_list(start, loc, 100000);
    } else {
        // avoid reference count
        show_token_list(link(start), loc, 100000);
    }
⟧

350. Here is the missing piece of show_token_list that is activated when the token beginning line 2 is about to be shown:

⟦350 Do magic computation⟧ = ⟦
    set_trick_count
⟧

351. [23] Maintaining the input stacks. The following subroutines change the input status in commonly needed ways.

First comes push_input, which stores the current state and creates a new level (having, initially, the same properties as the old).

// enter a new input level, save the old
@define push_input =>
    {
        if (input_ptr > max_in_stack) {
            max_in_stack = input_ptr;
            if (input_ptr == stack_size) {
                overflow(
                  strpool!("input stack size"),
                  stack_size,
                );
            }
        }
        // stack the record
        input_stack[input_ptr] = cur_input;
        incr(input_ptr);
    }

352. And of course what goes up must come down.

// leave an input level, re-enter the old
@define pop_input =>
    {
        decr(input_ptr);
        cur_input = input_stack[input_ptr];
    }
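
The stack discipline of push_input and pop_input can be modeled in a few lines. The following Python sketch is an illustration under stated assumptions: the class InputStack is hypothetical, the input record is a plain dictionary with made-up fields, and overflow is modeled by raising an exception (in TEX an overflow is fatal, so nothing can follow it).

```python
class InputStack:
    """Toy model of the input stack (hypothetical, for illustration only)."""

    def __init__(self, stack_size):
        self.stack_size = stack_size
        # One extra slot, on the assumption that indices 0..stack_size exist.
        self.input_stack = [None] * (stack_size + 1)
        self.input_ptr = 0
        self.max_in_stack = 0
        self.cur_input = {"state": "new_line", "start": 1, "loc": 1, "name": 0}

    def push_input(self):
        # The overflow test is made only when a new high-water mark is reached.
        if self.input_ptr > self.max_in_stack:
            self.max_in_stack = self.input_ptr
            if self.input_ptr == self.stack_size:
                raise OverflowError("input stack size")
        # Stack a copy of the record; the new level starts with the same values.
        self.input_stack[self.input_ptr] = dict(self.cur_input)
        self.input_ptr += 1

    def pop_input(self):
        self.input_ptr -= 1
        self.cur_input = self.input_stack[self.input_ptr]


s = InputStack(stack_size=2)
s.cur_input["loc"] = 5
s.push_input()
s.cur_input["loc"] = 9      # the new level may now diverge from the old
s.pop_input()
print(s.cur_input["loc"])
```

Nesting the overflow test inside the high-water test keeps the common case down to a single comparison.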

353. Here is a procedure that starts a new level of token-list input, given a token list p and its type t. If t == macro, the calling routine should set name and loc.

// backs up a simple token list
@define back_list(#) => begin_token_list(#, backed_up)
// inserts a simple token list
@define ins_list(#) => begin_token_list(#, inserted)
function begin_token_list(p: pointer, t: quarterword) {
    push_input;
    state = token_list;
    start = p;
    token_type = t;
    // the token list starts with a reference count
    if (t >= macro) {
        add_token_ref(p);
        if (t == macro) {
            param_start = param_ptr;
        } else {
            loc = link(p);
            if (tracing_macros > 1) {
                begin_diagnostic;
                print_nl(strpool!(""));
                case t {
                  mark_text:
                    print_esc(strpool!("mark"));
                  write_text:
                    print_esc(strpool!("write"));
                  othercases:
                    print_cmd_chr(
                      assign_toks,
                      t - output_text + output_routine_loc,
                    );
                }
                print(strpool!("->"));
                token_show(p);
                end_diagnostic(false);
            }
        }
    } else {
        loc = p;
    }
}

354. When a token list has been fully scanned, the following computations should be done as we leave that level of input. The token_type tends to be equal to either backed_up or inserted about 2/3 of the time.

// leave a token-list input level
function end_token_list() {
    // token list to be deleted
    if (token_type >= backed_up) {
        if (token_type <= inserted) {
            flush_list(start);
        } else {
            // update reference count
            delete_token_ref(start);
            // parameters must be flushed
            if (token_type == macro) {
                while (param_ptr > param_start) {
                    decr(param_ptr);
                    flush_list(param_stack[param_ptr]);
                }
            }
        }
    } else if (token_type == u_template) {
        if (align_state > 500000) {
            align_state = 0;
        } else {
            fatal_error(
              strpool!("(interwoven alignment preambles are not allowed)"),
            );
        }
    }
    pop_input;
    check_interrupt;
}

355. Sometimes TEX has read too far and wants to “unscan” what it has seen. The back_input procedure takes care of this by putting the token just scanned back into the input stream, ready to be read again. This procedure can be used only if cur_tok represents the token to be replaced. Some applications of TEX use this procedure a lot, so it has been slightly optimized for speed.

// undoes one token of input
function back_input() {
    var
      p: pointer; // a token list of length one
    
    while (
        (state == token_list)
        && (loc == null) && (token_type != v_template)
    ) {
        // conserve stack space
        end_token_list;
    }
    p = get_avail;
    info(p) = cur_tok;
    if (cur_tok < right_brace_limit) {
        if (cur_tok < left_brace_limit) {
            decr(align_state);
        } else {
            incr(align_state);
        }
    }
    push_input;
    state = token_list;
    start = p;
    token_type = backed_up;
    // that was back_list(p), without procedure overhead
    loc = p;
}
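
The effect of back_input on the input levels and on align_state can be sketched as follows. This Python model is a simplification made for illustration: the class TokenReader is hypothetical, tokens are single-character strings, and the special treatment of v_template levels is omitted.

```python
LEFT_BRACE, RIGHT_BRACE = "{", "}"


class TokenReader:
    """Toy token source with one-token backup (hypothetical sketch)."""

    def __init__(self, tokens):
        self.levels = [list(tokens)]   # stack of token lists; the top is read first
        self.align_state = 1000000
        self.cur_tok = None

    def _drop_finished_levels(self):
        while self.levels and not self.levels[-1]:
            self.levels.pop()

    def get_token(self):
        self._drop_finished_levels()
        if not self.levels:
            raise EOFError("no more tokens")
        self.cur_tok = self.levels[-1].pop(0)
        if self.cur_tok == LEFT_BRACE:
            self.align_state += 1
        elif self.cur_tok == RIGHT_BRACE:
            self.align_state -= 1
        return self.cur_tok

    def back_input(self):
        # Conserve stack space: completed levels are removed before pushing.
        self._drop_finished_levels()
        # Undo the brace bookkeeping that reading cur_tok had caused.
        if self.cur_tok == LEFT_BRACE:
            self.align_state -= 1
        elif self.cur_tok == RIGHT_BRACE:
            self.align_state += 1
        self.levels.append([self.cur_tok])   # a token list of length one


r = TokenReader(["a", "{", "b"])
r.get_token()
r.get_token()       # reads the left brace
r.back_input()      # and puts it back
print(r.get_token(), r.align_state)
```

Since reading a brace changes align_state, putting the brace back must reverse that change, or the token would be counted twice when it is read again.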

356.

⟦356 Insert token |p| into \TeX's input⟧ = ⟦
    {
        t = cur_tok;
        cur_tok = p;
        if (a) {
            p = get_avail;
            info(p) = cur_tok;
            link(p) = loc;
            loc = p;
            start = p;
            if (cur_tok < right_brace_limit) {
                if (cur_tok < left_brace_limit) {
                    decr(align_state);
                } else {
                    incr(align_state);
                }
            }
        } else {
            back_input;
            a = eTeX_ex;
        }
        cur_tok = t;
    }
⟧

357. The back_error routine is used when we want to replace an offending token just before issuing an error message. This routine, like back_input, requires that cur_tok has been set. We disable interrupts during the call of back_input so that the help message won’t be lost.

// back up one token and call error 
function back_error() {
    OK_to_interrupt = false;
    back_input;
    OK_to_interrupt = true;
    error;
}

// back up one inserted token and call error 
function ins_error() {
    OK_to_interrupt = false;
    back_input;
    token_type = inserted;
    OK_to_interrupt = true;
    error;
}

358. The begin_file_reading procedure starts a new level of input for lines of characters to be read from a file, or as an insertion from the terminal. It does not take care of opening the file, nor does it set loc or limit or line.

function begin_file_reading() {
    if (in_open == max_in_open) {
        overflow(
          strpool!("text input levels"),
          max_in_open,
        );
    }
    if (first == buf_size) {
        overflow(strpool!("buffer size"), buf_size);
    }
    incr(in_open);
    push_input;
    index = in_open;
    source_filename_stack[index] = 0;
    full_source_filename_stack[index] = 0;
    eof_seen[index] = false;
    grp_stack[index] = cur_boundary;
    if_stack[index] = cond_ptr;
    line_stack[index] = line;
    start = first;
    state = mid_line;
    //  terminal_input is now true 
    name = 0;
    ⟦1717 Prepare terminal input {\sl Sync\TeX} information⟧
}

359. Conversely, the variables must be downdated when such a level of input is finished:

function end_file_reading() {
    first = start;
    line = line_stack[index];
    if ((name == 18) || (name == 19)) {
        pseudo_close;
    } else if (name > 17) {
        // forget it
        u_close(cur_file);
    }
    pop_input;
    decr(in_open);
}

360. In order to keep the stack from overflowing during a long sequence of inserted ‘\show’ commands, the following routine removes completed error-inserted lines from memory.

function clear_for_error_prompt() {
    while (
        (state != token_list)
        && terminal_input
        && (input_ptr > 0) && (loc > limit)
    ) {
        end_file_reading;
    }
    print_ln;
    clear_terminal;
}

361. To get TEX’s whole input mechanism going, we perform the following actions.

⟦361 Initialize the input routines⟧ = ⟦
    {
        input_ptr = 0;
        max_in_stack = 0;
        source_filename_stack[0] = 0;
        full_source_filename_stack[0] = 0;
        in_open = 0;
        open_parens = 0;
        max_buf_stack = 0;
        grp_stack[0] = 0;
        if_stack[0] = null;
        param_ptr = 0;
        max_param_stack = 0;
        first = buf_size;
        repeat {
            buffer[first] = 0;
            decr(first);
        } until (first == 0);
        scanner_status = normal;
        warning_index = null;
        first = 1;
        state = new_line;
        start = 1;
        index = 0;
        line = 0;
        name = 0;
        force_eof = false;
        align_state = 1000000;
        if (!init_terminal) {
            goto final_end;
        }
        limit = last;
        //  init_terminal has set loc and last 
        first = last + 1;
    }
⟧

362. [24] Getting the next token. The heart of TEX’s input mechanism is the get_next procedure, which we shall develop in the next few sections of the program. Perhaps we shouldn’t actually call it the “heart,” however, because it really acts as TEX’s eyes and mouth, reading the source files and gobbling them up. And it also helps TEX to regurgitate stored token lists that are to be processed again.

The main duty of get_next is to input one token and to set cur_cmd and cur_chr to that token’s command code and modifier. Furthermore, if the input token is a control sequence, the eqtb location of that control sequence is stored in cur_cs; otherwise cur_cs is set to zero.

Underlying this simple description is a certain amount of complexity because of all the cases that need to be handled. However, the inner loop of get_next is reasonably short and fast.

When get_next is asked to get the next token of a \read line, it sets cur_cmd == cur_chr == cur_cs == 0 in the case that no more tokens appear on that line. (There might not be any tokens at all, if the end_line_char has ignore as its catcode.)

363. The value of par_loc is the eqtb address of ‘\par’. This quantity is needed because a blank line of input is supposed to be exactly equivalent to the appearance of \par; we must set cur_cs = par_loc when detecting a blank line.

⟦13 Global variables⟧ += ⟦
    // location of `\par' in eqtb
    var par_loc: pointer;

    // token representing `\par'
    var par_token: halfword;
⟧

364.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    // cf. scan_file_name
    primitive(strpool!("par"), par_end, too_big_usv)

    par_loc = cur_val

    par_token = cs_token_flag + par_loc
⟧

365.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    par_end:

    print_esc(strpool!("par"))
⟧

366. Before getting into get_next, let’s consider the subroutine that is called when an ‘\outer’ control sequence has been scanned or when the end of a file has been reached. These two cases are distinguished by cur_cs, which is zero at the end of a file.

function check_outer_validity() {
    var
      p: pointer, // points to inserted token list
      q: pointer; // auxiliary pointer
    
    if (scanner_status != normal) {
        deletions_allowed = false;
        ⟦367 Back up an outer control sequence so that it can be reread⟧
        if (scanner_status > skipping) {
            ⟦368 Tell the user what has run away and try to recover⟧
        } else {
            print_err(strpool!("Incomplete "));
            print_cmd_chr(if_test, cur_if);
            print(
              strpool!("; all text was ignored after line "),
            );
            print_int(skip_line);
            help3(
              strpool!("A forbidden control sequence occurred in skipped text."),
            )(
              strpool!("This kind of error happens when you say `\\if...' and forget"),
            )(
              strpool!("the matching `\\fi'. I've inserted a `\\fi'; this might work."),
            );
            if (cur_cs != 0) {
                cur_cs = 0;
            } else {
                help_line[2] = strpool!("The file ended while I was skipping conditional text.");
            }
            cur_tok = cs_token_flag + frozen_fi;
            ins_error;
        }
        deletions_allowed = true;
    }
}

367. An outer control sequence that occurs in a \read will not be reread, since the error recovery for \read is not very powerful.

⟦367 Back up an outer control sequence so that it can be reread⟧ = ⟦
    if (cur_cs != 0) {
        if (
            (state == token_list)
            || (name < 1) || (name > 17)
        ) {
            p = get_avail;
            info(p) = cs_token_flag + cur_cs;
            // prepare to read the control sequence again
            back_list(p);
        }
        cur_cmd = spacer;
        // replace it by a space
        cur_chr = ord!(" ");
    }
⟧

368.

⟦368 Tell the user what has run away and try to recover⟧ = ⟦
    {
        // print a definition, argument, or preamble
        runaway;
        if (cur_cs == 0) {
            print_err(strpool!("File ended"));
        } else {
            cur_cs = 0;
            print_err(
              strpool!("Forbidden control sequence found"),
            );
        }
        ⟦369 Print either `\.{definition}' or `\.{use}' or `\.{preamble}' or `\.{text}', and insert tokens that should lead to recovery⟧
        print(strpool!(" of "));
        sprint_cs(warning_index);
        help4(
          strpool!("I suspect you have forgotten a `}', causing me"),
        )(
          strpool!("to read past where you wanted me to stop."),
        )(
          strpool!("I'll try to recover; but if the error is serious,"),
        )(
          strpool!("you'd better type `E' or `X' now and fix your file."),
        );
        error;
    }
⟧

369. The recovery procedure can’t be fully understood without knowing more about the TEX routines that should be aborted, but we can sketch the ideas here: For a runaway definition or a runaway balanced text we will insert a right brace; for a runaway preamble, we will insert a special \cr token and a right brace; and for a runaway argument, we will set long_state to outer_call and insert \par.

⟦369 Print either `\.{definition}' or `\.{use}' or `\.{preamble}' or `\.{text}', and insert tokens that should lead to recovery⟧ = ⟦
    p = get_avail

    case scanner_status {
      defining:
        print(strpool!(" while scanning definition"));
        info(p) = right_brace_token + ord!("}");
      matching:
        print(strpool!(" while scanning use"));
        info(p) = par_token;
        long_state = outer_call;
      aligning:
        print(strpool!(" while scanning preamble"));
        info(p) = right_brace_token + ord!("}");
        q = p;
        p = get_avail;
        link(p) = q;
        info(p) = cs_token_flag + frozen_cr;
        align_state = -1000000;
      absorbing:
        print(strpool!(" while scanning text"));
        info(p) = right_brace_token + ord!("}");
      // there are no other cases
    }

    ins_list(p)
⟧

370. We need to mention a procedure here that may be called by get_next.

forward_declaration firm_up_the_line();

371. Now we’re ready to take the plunge into get_next itself. Parts of this routine are executed more often than any other instructions of TEX.

@define switch => 25 // a label in get_next 
@define start_cs => 26 // another
@define not_exp => 27
// sets cur_cmd , cur_chr , cur_cs to next token
function get_next() {
    label
        restart, // go here to get the next input token
        switch, // go here to eat the next character from a 
        // file
        reswitch, // go here to digest it again
        start_cs, // go here to start looking for a control 
        // sequence
        found, // go here when a control sequence has been 
        // found
        not_exp, // go here when ^^ turned out not to start 
        // an expanded code
        exit; // go here when the next input token has been 
        // got
    var
      k: 0 .. buf_size, // an index into buffer 
      t: halfword, // a token
      cat: 0 .. max_char_code, // cat_code(cur_chr),
      // usually
      c: UnicodeScalar, // constituent of a possible 
      // expanded code
      lower: UTF16_code, // lower surrogate of a possible 
      // UTF-16 compound
      d: small_number, // number of excess characters in an 
      // expanded code
      sup_count: small_number; // number of identical 
      // sup_mark characters
    
  restart:
    cur_cs = 0;
    if (state != token_list) {
        ⟦373 Input from external file, |goto restart| if no input found⟧
    } else {
        ⟦387 Input from token list, |goto restart| if end of list or if a parameter needs to be expanded⟧
    }
    ⟦372 If an alignment entry has just ended, take appropriate action⟧
  exit:
}

372. An alignment entry ends when a tab or \cr occurs, provided that the current level of braces is the same as the level that was present at the beginning of that alignment entry; i.e., provided that align_state has returned to the value it had after the u_j template for that entry.

⟦372 If an alignment entry has just ended, take appropriate action⟧ = ⟦
    if (cur_cmd <= car_ret) {
        if (cur_cmd >= tab_mark) {
            if (align_state == 0) {
                ⟦837 Insert the \(v)\<v_j> template and |goto restart|⟧
            }
        }
    }
⟧

373.

⟦373 Input from external file, |goto restart| if no input found⟧ = ⟦
    {
        
      switch:
        // current line not yet finished
        if (loc <= limit) {
            cur_chr = buffer[loc];
            incr(loc);
            if (
                (cur_chr >= 0xd800)
                && (cur_chr < 0xdc00)
                && (loc <= limit)
                && (buffer[loc] >= 0xdc00)
                && (buffer[loc] < 0xe000)
            ) {
                lower = buffer[loc] - 0xdc00;
                incr(loc);
                cur_chr = 
                    0x10000
                    + (cur_chr - 0xd800) * 1024 + lower
                ;
            }
          reswitch:
            cur_cmd = cat_code(cur_chr);
            ⟦374 Change state if necessary, and |goto switch| if the current character should be ignored, or |goto reswitch| if the current character changes to another⟧
        } else {
            state = new_line;
            ⟦390 Move to next line of file, or |goto restart| if there is no next line, or |return| if a \.{\\read} line has finished⟧
            check_interrupt;
            goto switch;
        }
    }
⟧
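
The arithmetic that the code above applies to a UTF-16 surrogate pair in the buffer can be checked separately. The following Python sketch uses a hypothetical function name, combine_surrogates; it mirrors the range tests and the formula of the module, and compares the result with Python’s own UTF-16 encoder.

```python
def combine_surrogates(upper, lower):
    """Combine a UTF-16 surrogate pair into one Unicode scalar value."""
    if not (0xD800 <= upper < 0xDC00 and 0xDC00 <= lower < 0xE000):
        raise ValueError("not a surrogate pair")
    return 0x10000 + (upper - 0xD800) * 1024 + (lower - 0xDC00)


# Cross-check against the standard encoder for one supplementary-plane character.
ch = "\U0001D538"
raw = ch.encode("utf-16-be")
upper = int.from_bytes(raw[0:2], "big")
lower = int.from_bytes(raw[2:4], "big")
print(hex(combine_surrogates(upper, lower)), hex(ord(ch)))
```

Each half of the pair contributes ten bits, which is why the upper half is multiplied by 1024.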

374. The following 48-way switch accomplishes the scanning quickly, assuming that a decent Pascal compiler has translated the code. Note that the numeric values for mid_line, skip_blanks, and new_line are spaced apart from each other by max_char_code + 1, so we can add a character’s command code to the state to get a single number that characterizes both.

@define any_state_plus(#) =>
    mid_line + #,
    skip_blanks + #,
    new_line + #
⟦374 Change state if necessary, and |goto switch| if the current character should be ignored, or |goto reswitch| if the current character changes to another⟧ = ⟦
    case state + cur_cmd {
      ⟦375 Cases where character is ignored⟧:
        goto switch;
      any_state_plus(escape):
        ⟦384 Scan a control sequence and set |state:=skip_blanks| or |mid_line|⟧
      any_state_plus(active_char):
        ⟦383 Process an active-character control sequence and set |state:=mid_line|⟧
      any_state_plus(sup_mark):
        ⟦382 If this |sup_mark| starts an expanded character like~\.{\^\^A} or~\.{\^\^df}, then |goto reswitch|, otherwise set |state:=mid_line|⟧
      any_state_plus(invalid_char):
        ⟦376 Decry the invalid character and |goto restart|⟧
      ⟦377 Handle situations involving spaces, braces, changes of state⟧
      othercases:
        do_nothing;
    }
⟧
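
The encoding of a (state, command) pair as a single number can be verified by a small experiment. The numeric values in the following Python sketch are assumptions taken from the standard TEX82 definitions (max_char_code = 15, mid_line = 1, and so on), since they are not defined in this part of the program; the function names switch_key and split_key are hypothetical.

```python
# Assumed values, as in the standard TeX82 definitions.
MAX_CHAR_CODE = 15
MID_LINE = 1
SKIP_BLANKS = 2 + MAX_CHAR_CODE
NEW_LINE = 3 + 2 * MAX_CHAR_CODE
STATES = (MID_LINE, SKIP_BLANKS, NEW_LINE)
SPAN = MAX_CHAR_CODE + 1          # spacing between consecutive states


def switch_key(state, cur_cmd):
    """The single number on which the case statement branches."""
    return state + cur_cmd


def split_key(key):
    """Recover (state, cur_cmd) from a key, showing that no information is lost."""
    state = MID_LINE + ((key - MID_LINE) // SPAN) * SPAN
    return state, key - state


keys = {switch_key(s, c) for s in STATES for c in range(MAX_CHAR_CODE + 1)}
print(len(keys), min(keys), max(keys))
```

Three states times sixteen command codes give the 48 cases, and because the states are SPAN apart no two pairs collide.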

375.

⟦375 Cases where character is ignored⟧ = ⟦
    any_state_plus(ignore),
    skip_blanks + spacer,
    new_line + spacer
⟧

376. We go to restart instead of to switch, because state might equal token_list after the error has been dealt with (cf. clear_for_error_prompt).

⟦376 Decry the invalid character and |goto restart|⟧ = ⟦
    {
        print_err(
          strpool!("Text line contains an invalid character"),
        );
        help2(
          strpool!("A funny symbol that I can't read has just been input."),
        )(
          strpool!("Continue, and I'll forget that it ever happened."),
        );
        deletions_allowed = false;
        error;
        deletions_allowed = true;
        goto restart;
    }
⟧

377.

@define add_delims_to(#) =>
    # + math_shift,
    # + tab_mark,
    # + mac_param,
    # + sub_mark,
    # + letter,
    # + other_char
⟦377 Handle situations involving spaces, braces, changes of state⟧ = ⟦
    mid_line + spacer:
      ⟦379 Enter |skip_blanks| state, emit a space⟧

    mid_line + car_ret:
      ⟦378 Finish line, emit a space⟧

    skip_blanks + car_ret, any_state_plus(comment):
      ⟦380 Finish line, |goto switch|⟧

    new_line + car_ret:
      ⟦381 Finish line, emit a \.{\\par}⟧

    mid_line + left_brace:
      incr(align_state);

    skip_blanks + left_brace, new_line + left_brace:
      state = mid_line;
      incr(align_state);

    mid_line + right_brace:
      decr(align_state);

    skip_blanks + right_brace, new_line + right_brace:
      state = mid_line;
      decr(align_state);

    add_delims_to(skip_blanks), add_delims_to(new_line):
      state = mid_line;
⟧

378. When a character of type spacer gets through, its character code is changed to " " = 0x20. This means that the ASCII codes for tab and space, and for the space inserted at the end of a line, will be treated alike when macro parameters are being matched. We do this since such characters are indistinguishable on most computer terminal displays.

⟦378 Finish line, emit a space⟧ = ⟦
    {
        loc = limit + 1;
        cur_cmd = spacer;
        cur_chr = ord!(" ");
    }
⟧

379. The following code is performed only when cur_cmd == spacer.

⟦379 Enter |skip_blanks| state, emit a space⟧ = ⟦
    {
        state = skip_blanks;
        cur_chr = ord!(" ");
    }
⟧

380.

⟦380 Finish line, |goto switch|⟧ = ⟦
    {
        loc = limit + 1;
        goto switch;
    }
⟧

381.

⟦381 Finish line, emit a \.{\\par}⟧ = ⟦
    {
        loc = limit + 1;
        cur_cs = par_loc;
        cur_cmd = eq_type(cur_cs);
        cur_chr = equiv(cur_cs);
        if (cur_cmd >= outer_call) {
            check_outer_validity;
        }
    }
⟧
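
The treatment of spaces and ends of lines in the preceding sections can be summarized by a small state machine. The following Python sketch is a drastic simplification made under stated assumptions: the function scan_lines is hypothetical, only spaces, tabs, and ends of lines receive special treatment (tab is assumed to have category spacer), every other character is passed through as if it were a letter, and each line is assumed to end with a single car_ret character.

```python
def scan_lines(lines):
    """Return the list of 'tokens' that a very small TeX-like scanner would emit."""
    tokens = []
    for line in lines:
        state = "new_line"                 # every line begins in this state
        for ch in line + "\r":             # "\r" plays the role of car_ret
            if ch in " \t":                # category spacer
                if state == "mid_line":
                    state = "skip_blanks"
                    tokens.append(" ")     # the code is normalized to a space
                # spacers are ignored in skip_blanks and new_line
            elif ch == "\r":               # category car_ret
                if state == "mid_line":
                    tokens.append(" ")     # finish line, emit a space
                elif state == "new_line":
                    tokens.append("\\par") # blank line, emit \par
                break                      # the rest of the line is discarded
            else:
                state = "mid_line"
                tokens.append(ch)
    return tokens


print(scan_lines(["a  b", "c ", "", "d"]))
```

A run of blanks therefore yields one space, a space before the end of a line adds nothing, and a line that is empty or all blank yields \par.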

382. Notice that a code like ^^8 becomes x if not followed by a hex digit.

@define is_hex(#) =>
    (
        ((# >= ord!("0")) && (# <= ord!("9")))
        || ((# >= ord!("a")) && (# <= ord!("f")))
    )
@define hex_to_cur_chr =>
    if (c <= ord!("9")) {
        cur_chr = c - ord!("0");
    } else {
        cur_chr = c - ord!("a") + 10;
    }
    if (cc <= ord!("9")) {
        cur_chr = 16 * cur_chr + cc - ord!("0");
    } else {
        cur_chr = 16 * cur_chr + cc - ord!("a") + 10;
    }
@define long_hex_to_cur_chr =>
    if (c <= ord!("9")) {
        cur_chr = c - ord!("0");
    } else {
        cur_chr = c - ord!("a") + 10;
    }
    if (cc <= ord!("9")) {
        cur_chr = 16 * cur_chr + cc - ord!("0");
    } else {
        cur_chr = 16 * cur_chr + cc - ord!("a") + 10;
    }
    if (ccc <= ord!("9")) {
        cur_chr = 16 * cur_chr + ccc - ord!("0");
    } else {
        cur_chr = 16 * cur_chr + ccc - ord!("a") + 10;
    }
    if (cccc <= ord!("9")) {
        cur_chr = 16 * cur_chr + cccc - ord!("0");
    } else {
        cur_chr = 16 * cur_chr + cccc - ord!("a") + 10;
    }
⟦382 If this |sup_mark| starts an expanded character like~\.{\^\^A} or~\.{\^\^df}, then |goto reswitch|, otherwise set |state:=mid_line|⟧ = ⟦
    {
        if (cur_chr == buffer[loc]) {
            if (loc < limit) {
                // we have ^ ^ and another char; check how 
                // many ^ s we have altogether, up to a max 
                // of 6
                sup_count = 2;
                while (
                    (sup_count < 6)
                    && (loc + 2 * sup_count - 2 <= limit)
                    && (
                        cur_chr
                        == buffer[loc + sup_count - 1]
                    )
                ) {
                    incr(sup_count);
                }
                // check whether we have enough hex chars for
                // the number of ^ s
                for (d in 1 to sup_count) {
                    if (!is_hex(
                      buffer[loc + sup_count - 2 + d],
                    )) {
                        // found a non-hex char, so do single
                        // ^ ^ X style
                        c = buffer[loc + 1];
                        if (c < 0x80) {
                            loc = loc + 2;
                            if (c < 0x40) {
                                cur_chr = c + 0x40;
                            } else {
                                cur_chr = c - 0x40;
                            }
                            goto reswitch;
                        }
                        goto not_exp;
                    }
                }
                // there were the right number of hex chars,
                // so convert them
                cur_chr = 0;
                for (d in 1 to sup_count) {
                    c = buffer[loc + sup_count - 2 + d];
                    if (c <= ord!("9")) {
                        cur_chr =
                            16
                            * cur_chr + c - ord!("0")
                        ;
                    } else {
                        cur_chr =
                            16
                            * cur_chr + c - ord!("a") + 10
                        ;
                    }
                }
                // check the resulting value is within the
                // valid range
                if (cur_chr > biggest_usv) {
                    cur_chr = buffer[loc];
                    goto not_exp;
                }
                loc = loc + 2 * sup_count - 1;
                goto reswitch;
            }
        }
      not_exp:
        state = mid_line;
    }
⟧
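
The decoding rules of the module above can be exercised on a few sample lines. The following Python sketch is a model written for illustration, not the program itself: the function expand_sup is hypothetical, the line is given as a string whose last character stands at position limit, loc is the position just after the sup_mark that has been read, and the buffer is padded with zeros so that a lookahead past limit is harmless (in TEX the buffer simply extends further).

```python
def is_hex(c):
    """True for the codes of 0-9 and lowercase a-f only."""
    return ord("0") <= c <= ord("9") or ord("a") <= c <= ord("f")


def expand_sup(line, loc, biggest_usv=0x10FFFF):
    """Return (cur_chr, new loc) for an expanded code, or None if there is none."""
    buffer = [ord(ch) for ch in line] + [0] * 12   # zero padding is an assumption
    limit = len(line) - 1
    cur_chr = buffer[loc - 1]                      # the sup_mark just read
    if not (cur_chr == buffer[loc] and loc < limit):
        return None
    # Count identical sup_mark characters, up to 6, while room for digits remains.
    sup_count = 2
    while (sup_count < 6
           and loc + 2 * sup_count - 2 <= limit
           and cur_chr == buffer[loc + sup_count - 1]):
        sup_count += 1
    digits = buffer[loc + sup_count - 1 : loc + 2 * sup_count - 1]
    if not all(is_hex(c) for c in digits):
        # Fall back to the single-character form, offset by 64.
        c = buffer[loc + 1]
        if c < 0x80:
            return (c + 0x40 if c < 0x40 else c - 0x40), loc + 2
        return None
    value = 0
    for c in digits:
        value = 16 * value + (c - ord("0") if c <= ord("9") else c - ord("a") + 10)
    if value > biggest_usv:
        return None
    return value, loc + 2 * sup_count - 1


for sample in ("^^41x", "^^8 ", "^^M", "^^^^03b1"):
    print(sample, expand_sup(sample, 1))
```

The second sample shows the remark at the beginning of this section: since ‘8’ is not followed by a hex digit, the single-character rule applies and the code of ‘8’ plus 64 is the code of ‘x’.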

383.

⟦383 Process an active-character control sequence and set |state:=mid_line|⟧ = ⟦
    {
        cur_cs = cur_chr + active_base;
        cur_cmd = eq_type(cur_cs);
        cur_chr = equiv(cur_cs);
        state = mid_line;
        if (cur_cmd >= outer_call) {
            check_outer_validity;
        }
    }
⟧

384. Control sequence names are scanned only when they appear in some line of a file; once they have been scanned the first time, their eqtb location serves as a unique identification, so TEX doesn’t need to refer to the original name any more except when it prints the equivalent in symbolic form.

The program that scans a control sequence has been written carefully in order to avoid the blowups that might otherwise occur if a malicious user tried something like ‘\catcode 15=0’. The algorithm might look at buffer[limit + 1], but it never looks at buffer[limit + 2].

If expanded characters like ‘^^A’ or ‘^^df’ appear in or just following a control sequence name, they are converted to single characters in the buffer and the process is repeated, slowly but surely.

⟦384 Scan a control sequence and set |state:=skip_blanks| or |mid_line|⟧ = ⟦
    {
        if (loc > limit) {
            //  state is irrelevant in this case
            cur_cs = null_cs;
        } else {
          start_cs:
            k = loc;
            cur_chr = buffer[k];
            cat = cat_code(cur_chr);
            incr(k);
            if (cat == letter) {
                state = skip_blanks;
            } else if (cat == spacer) {
                state = skip_blanks;
            } else {
                state = mid_line;
            }
            if ((cat == letter) && (k <= limit)) {
                ⟦386 Scan ahead in the buffer until finding a nonletter; if an expanded code is encountered, reduce it and |goto start_cs|; otherwise if a multiletter control sequence is found, adjust |cur_cs| and |loc|, and |goto found|⟧
            } else {
                ⟦385 If an expanded code is present, reduce it and |goto start_cs|⟧
            }
            // At this point, we have a single-character 
            // cs name in the buffer. But if the 
            // character code is > 0xFFFF, we treat 
            // it like a multiletter name for string 
            // purposes, because we use UTF-16 in the 
            // string pool.
            if (buffer[loc] > 0xffff) {
                cur_cs = id_lookup(loc, 1);
                incr(loc);
                goto found;
            }
            cur_cs = single_base + buffer[loc];
            incr(loc);
        }
      found:
        cur_cmd = eq_type(cur_cs);
        cur_chr = equiv(cur_cs);
        if (cur_cmd >= outer_call) {
            check_outer_validity;
        }
    }
⟧
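The state bookkeeping above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: `str.isalpha` stands in for "category code is letter", a plain space stands in for the spacer category, expanded codes such as ^^A are ignored, and the function name `scan_control_sequence` is hypothetical.

```python
def scan_control_sequence(line: str, loc: int) -> tuple[str, int, str]:
    """Scan the name that follows an escape character.

    loc indexes the first character after the escape. Returns the
    tuple (name, new_loc, state). Assumption: isalpha() plays the
    role of cat == letter and " " the role of cat == spacer.
    """
    if loc >= len(line):
        # Nothing follows the escape: this is the null control sequence.
        return "", loc, "irrelevant"
    k = loc
    ch = line[k]
    k += 1
    if ch.isalpha():
        # Multiletter name: run ahead to the first nonletter.
        while k < len(line) and line[k].isalpha():
            k += 1
        return line[loc:k], k, "skip_blanks"
    # Single-character name: only a spacer makes later blanks skippable.
    state = "skip_blanks" if ch == " " else "mid_line"
    return ch, k, state
```

For instance, scanning `hbox to` from position 0 yields the name `hbox` and leaves the scanner in the blank-skipping state, whereas a name like `%` leaves it in mid-line state.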

385. Whenever we reach the following piece of code, we will have cur_chr == buffer[k - 1], k <= limit + 1, and cat == cat_code(cur_chr). If an expanded code like ^^A or ^^df appears in buffer[(k - 1) .. (k + 1)] or buffer[(k - 1) .. (k + 2)], we will store the corresponding code in buffer[k - 1] and shift the rest of the buffer left two or three places.

⟦385 If an expanded code is present, reduce it and |goto start_cs|⟧ = ⟦
    {
        if (
            (cat == sup_mark)
            && (buffer[k] == cur_chr) && (k < limit)
        ) {
            // we have ^^ and another char; check how many 
            // ^s we have altogether, up to a max of 6
            sup_count = 2;
            while (
                (sup_count < 6)
                && (k + 2 * sup_count - 2 <= limit)
                && (buffer[k + sup_count - 1] == cur_chr)
            ) {
                incr(sup_count);
            }
            // check whether we have enough hex chars 
            // for the number of ^s
            for (d in 1 to sup_count) {
                if (!is_hex(buffer[k + sup_count - 2 + d])) {
                    // found a non-hex char, so do single ^^X 
                    // style
                    c = buffer[k + 1];
                    if (c < 0x80) {
                        if (c < 0x40) {
                            buffer[k - 1] = c + 0x40;
                        } else {
                            buffer[k - 1] = c - 0x40;
                        }
                        d = 2;
                        limit = limit - d;
                        while (k <= limit) {
                            buffer[k] = buffer[k + d];
                            incr(k);
                        }
                        goto start_cs;
                    } else {
                        sup_count = 0;
                    }
                }
            }
            // there were the right number of hex chars, so 
            // convert them
            if (sup_count > 0) {
                cur_chr = 0;
                for (d in 1 to sup_count) {
                    c = buffer[k + sup_count - 2 + d];
                    if (c <= ord!("9")) {
                        cur_chr = 
                            16
                            * cur_chr + c - ord!("0")
                        ;
                    } else {
                        cur_chr = 
                            16
                            * cur_chr + c - ord!("a") + 10
                        ;
                    }
                }
                // check the resulting value is within 
                // the valid range
                if (cur_chr > biggest_usv) {
                    cur_chr = buffer[k];
                } else {
                    buffer[k - 1] = cur_chr;
                    // shift the rest of the buffer left by 
                    // d chars
                    d = 2 * sup_count - 1;
                    limit = limit - d;
                    while (k <= limit) {
                        buffer[k] = buffer[k + d];
                        incr(k);
                    }
                    goto start_cs;
                }
            }
        }
    }
⟧
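The reduction step can be mirrored in a short Python sketch, given here only as an illustration. It makes several assumptions: the buffer is a Python list of code points whose last index plays the role of limit, the caller has already checked that buf[k - 1] has the superscript category, biggest_usv is taken to be 0x10FFFF, positions beyond the end of the list count as non-hex, and the name `reduce_expanded_code` is hypothetical.

```python
BIGGEST_USV = 0x10FFFF  # assumed value of biggest_usv


def is_hex(c: int) -> bool:
    """True for the lowercase hex digits 0-9 and a-f."""
    return ord("0") <= c <= ord("9") or ord("a") <= c <= ord("f")


def reduce_expanded_code(buf: list[int], k: int) -> bool:
    """Reduce an expanded code whose first superscript char is buf[k - 1].

    Rewrites buf in place and returns True if a reduction was made.
    """
    limit = len(buf) - 1
    sup = buf[k - 1]
    if not (k < limit and buf[k] == sup):
        return False
    # Count the superscript characters, up to a maximum of 6.
    sup_count = 2
    while (sup_count < 6 and k + 2 * sup_count - 2 <= limit
           and buf[k + sup_count - 1] == sup):
        sup_count += 1
    # Positions that would have to hold sup_count hex digits.
    digits = range(k + sup_count - 1, k + 2 * sup_count - 1)
    if all(i <= limit and is_hex(buf[i]) for i in digits):
        value = 0
        for i in digits:
            c = buf[i]
            digit = c - ord("0") if c <= ord("9") else c - ord("a") + 10
            value = 16 * value + digit
        if value > BIGGEST_USV:
            return False  # out of range: leave the buffer alone
        buf[k - 1] = value
        del buf[k:k + 2 * sup_count - 1]  # shift the rest left
        return True
    # Single ^^X style: offset an ASCII character by 0x40.
    c = buf[k + 1]
    if c >= 0x80:
        return False
    buf[k - 1] = c + 0x40 if c < 0x40 else c - 0x40
    del buf[k:k + 2]
    return True
```

With this sketch, the buffer for `^^df!` shrinks to the two codes 0xDF and `!`, and `^^M` shrinks to the single code 0x0D.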

386.

⟦386 Scan ahead in the buffer until finding a nonletter; if an expanded code is encountered, reduce it and |goto start_cs|; otherwise if a multiletter control sequence is found, adjust |cur_cs| and |loc|, and |goto found|⟧ = ⟦
    {
        repeat {
            cur_chr = buffer[k];
            cat = cat_code(cur_chr);
            incr(k);
        } until ((cat != letter) || (k > limit));
        ⟦385 If an expanded code is present, reduce it and |goto start_cs|⟧
        if (cat != letter) {
            // now k points to first nonletter
            decr(k);
        }
        // multiletter control sequence has been scanned
        if (k > loc + 1) {
            cur_cs = id_lookup(loc, k - loc);
            loc = k;
            goto found;
        }
    }
⟧

387. Let’s consider now what happens when get_next is looking at a token list.

⟦387 Input from token list, |goto restart| if end of list or if a parameter needs to be expanded⟧ = ⟦
    // list not exhausted
    if (loc != null) {
        t = info(loc);
        // move to next
        loc = link(loc);
        // a control sequence token
        if (t >= cs_token_flag) {
            cur_cs = t - cs_token_flag;
            cur_cmd = eq_type(cur_cs);
            cur_chr = equiv(cur_cs);
            if (cur_cmd >= outer_call) {
                if (cur_cmd == dont_expand) {
                    ⟦388 Get the next token, suppressing expansion⟧
                } else {
                    check_outer_validity;
                }
            }
        } else {
            cur_cmd = t div max_char_val;
            cur_chr = t % max_char_val;
            case cur_cmd {
              left_brace:
                incr(align_state);
              right_brace:
                decr(align_state);
              out_param:
                ⟦389 Insert macro parameter and |goto restart|⟧
              othercases:
                do_nothing;
            }
        }
    } else {
        // we are done with this token list
        end_token_list;
        // resume previous level
        goto restart;
    }
⟧

388. The present point in the program is reached only when the expand routine has inserted a special marker into the input. In this special case, info(loc) is known to be a control sequence token, and link(loc) == null .

// this characterizes a special variant of relax 
@define no_expand_flag => special_char
⟦388 Get the next token, suppressing expansion⟧ = ⟦
    {
        cur_cs = info(loc) - cs_token_flag;
        loc = null;
        cur_cmd = eq_type(cur_cs);
        cur_chr = equiv(cur_cs);
        if (cur_cmd > max_command) {
            cur_cmd = relax;
            cur_chr = no_expand_flag;
        }
    }
⟧

389.

⟦389 Insert macro parameter and |goto restart|⟧ = ⟦
    {
        begin_token_list(
          param_stack[param_start + cur_chr - 1],
          parameter,
        );
        goto restart;
    }
⟧

390. All of the easy branches of get_next have now been taken care of. There is one more branch.

@define end_line_char_inactive =>
    (end_line_char < 0) || (end_line_char > 255)
⟦390 Move to next line of file, or |goto restart| if there is no next line, or |return| if a \.{\\read} line has finished⟧ = ⟦
    if (name > 17) {
        ⟦392 Read next line of file into |buffer|, or |goto restart| if the file has ended⟧
    } else {
        // \.{\\read} line has ended
        if (!terminal_input) {
            cur_cmd = 0;
            cur_chr = 0;
            return;
        }
        // text was inserted during error recovery
        if (input_ptr > 0) {
            end_file_reading;
            // resume previous level
            goto restart;
        }
        if (selector < log_only) {
            open_log_file;
        }
        if (interaction > nonstop_mode) {
            if (end_line_char_inactive) {
                incr(limit);
            }
            // previous line was empty
            if (limit == start) {
                print_nl(
                  strpool!("(Please type a command or say `\\end')"),
                );
            }
            print_ln;
            first = start;
            // input on-line into buffer 
            prompt_input(ord!("*"));
            limit = last;
            if (end_line_char_inactive) {
                decr(limit);
            } else {
                buffer[limit] = end_line_char;
            }
            first = limit + 1;
            loc = start;
        } else {
            // nonstop mode, which is intended for overnight 
            // batch processing, never waits for on-line 
            // input
            fatal_error(
              strpool!("*** (job aborted, no legal \\end found)"),
            );
        }
    }
⟧

391. The global variable force_eof is normally false ; it is set true by an \endinput command.

⟦13 Global variables⟧ += ⟦
    // should the next \.{\\input} be aborted early?
    var force_eof: boolean;
⟧

392.

⟦392 Read next line of file into |buffer|, or |goto restart| if the file has ended⟧ = ⟦
    {
        incr(line);
        first = start;
        if (!force_eof) {
            if (name <= 19) {
                // not end of file
                if (pseudo_input) {
                    // this sets limit 
                    firm_up_the_line;
                } else if (
                    (every_eof != null)
                    && !eof_seen[index]
                ) {
                    limit = first - 1;
                    // fake one empty line
                    eof_seen[index] = true;
                    begin_token_list(
                      every_eof,
                      every_eof_text,
                    );
                    goto restart;
                } else {
                    force_eof = true;
                }
            } else {
                // not end of file
                if (input_ln(cur_file, true)) {
                    // this sets limit 
                    firm_up_the_line;
                } else if (
                    (every_eof != null)
                    && !eof_seen[index]
                ) {
                    limit = first - 1;
                    // fake one empty line
                    eof_seen[index] = true;
                    begin_token_list(
                      every_eof,
                      every_eof_text,
                    );
                    goto restart;
                } else {
                    force_eof = true;
                }
            }
        }
        if (force_eof) {
            if (tracing_nesting > 0) {
                if (
                    (grp_stack[in_open] != cur_boundary)
                    || (if_stack[in_open] != cond_ptr)
                ) {
                    // give warning for some unfinished 
                    // groups and/or conditionals
                    file_warning;
                }
            }
            if (name >= 19) {
                print_char(ord!(")"));
                decr(open_parens);
                // show user that file has been read
                update_terminal;
            }
            force_eof = false;
            // resume previous level
            end_file_reading;
            check_outer_validity;
            goto restart;
        }
        if (end_line_char_inactive) {
            decr(limit);
        } else {
            buffer[limit] = end_line_char;
        }
        first = limit + 1;
        // ready to read
        loc = start;
    }
⟧

393. If the user has set the pausing parameter to some positive value, and if nonstop mode has not been selected, each line of input is displayed on the terminal and the transcript file, followed by ‘=>’. TEX waits for a response. If the response is simply carriage_return, the line is accepted as it stands; otherwise the line typed is used instead of the line in the file.

function firm_up_the_line() {
    var
      k: 0 .. buf_size; // an index into buffer 
    
    limit = last;
    if (pausing > 0) {
        if (interaction > nonstop_mode) {
            wake_up_terminal;
            print_ln;
            if (start < limit) {
                for (k in start to limit - 1) {
                    print(buffer[k]);
                }
            }
            first = limit;
            // wait for user response
            prompt_input(strpool!("=>"));
            if (last > first) {
                // move line down in buffer
                for (k in first to last - 1) {
                    buffer[k + start - first] = buffer[k];
                }
                limit = start + last - first;
            }
        }
    }
}
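The effect of firm_up_the_line can be summarized by a small Python sketch. This is a behavioral model rather than a transcription: the line is a string instead of a buffer slice, `interactive` stands for "interaction > nonstop_mode", and the `ask` callback is a hypothetical stand-in for prompt_input.

```python
from typing import Callable


def firm_up_the_line(line: str, pausing: int, interactive: bool,
                     ask: Callable[[str], str]) -> str:
    """Return the line that the scanner should actually read.

    ask(prompt) is assumed to return the user's reply without the
    terminating carriage return.
    """
    if pausing > 0 and interactive:
        reply = ask(line + "=>")
        if reply:
            # A nonempty reply replaces the line from the file.
            return reply
    # Otherwise the line is accepted as it stands.
    return line
```

A call such as `firm_up_the_line("abc", 1, True, input)` would show `abc=>` and wait; pressing only the return key keeps `abc`.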

394. Since get_next is used so frequently in TEX, it is convenient to define three related procedures that do a little more:

get_token not only sets cur_cmd and cur_chr , it also sets cur_tok , a packed halfword version of the current token.

get_x_token , meaning “get an expanded token,” is like get_token , but if the current token turns out to be a user-defined control sequence (i.e., a macro call), or a conditional, or something like \topmark or \expandafter or \csname, it is eliminated from the input by beginning the expansion of the macro or the evaluation of the conditional.

x_token is like get_x_token except that it assumes that get_next has already been called.

In fact, these three procedures account for almost every use of get_next .

395. No new control sequences will be defined except during a call of get_token , or when \csname compresses a token list, because no_new_control_sequence is always true at other times.

// sets cur_cmd , cur_chr , cur_tok 
function get_token() {
    no_new_control_sequence = false;
    get_next;
    no_new_control_sequence = true;
    if (cur_cs == 0) {
        cur_tok = (cur_cmd * max_char_val) + cur_chr;
    } else {
        cur_tok = cs_token_flag + cur_cs;
    }
}
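The packing rule used by get_token, and the inverse unpacking that get_next applies to token-list entries, can be sketched in Python as follows. The two constants are assumed values chosen for illustration, and the helper names `pack_token` and `unpack_token` are hypothetical.

```python
MAX_CHAR_VAL = 0x200000    # assumed value of max_char_val
CS_TOKEN_FLAG = 0x1FFFFFF  # assumed value of cs_token_flag


def pack_token(cur_cmd: int, cur_chr: int, cur_cs: int) -> int:
    """Build cur_tok the way get_token does."""
    if cur_cs == 0:
        # A character token: command code and character code share a word.
        return cur_cmd * MAX_CHAR_VAL + cur_chr
    # A control sequence token: only its eqtb location is recorded.
    return CS_TOKEN_FLAG + cur_cs


def unpack_token(t: int):
    """Return (cur_cmd, cur_chr, cur_cs) for a packed token.

    For a control sequence token the command and character would come
    from eqtb, which this sketch does not model, so they are None.
    """
    if t >= CS_TOKEN_FLAG:
        return None, None, t - CS_TOKEN_FLAG
    return t // MAX_CHAR_VAL, t % MAX_CHAR_VAL, 0
```

Packing and then unpacking a character token returns the original command and character codes, which is why a single halfword suffices to store a token in a list.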

396. [25] Expanding the next token. Only a dozen or so command codes >max_command can possibly be returned by get_next ; in increasing order, they are undefined_cs , expand_after , no_expand , input , if_test , fi_or_else , cs_name , convert , the , top_bot_mark , call , long_call , outer_call , long_outer_call , and end_template .

The expand subroutine is used when cur_cmd > max_command . It removes a “call” or a conditional or one of the other special operations just listed. It follows that expand might invoke itself recursively. In all cases, expand destroys the current token, but it sets things up so that the next get_next will deliver the appropriate next token. The value of cur_tok need not be known when expand is called.

Since several of the basic scanning routines communicate via global variables, their values are saved as local variables of expand so that recursive calls don’t invalidate them.

⟦423 Declare the procedure called |macro_call|⟧

⟦413 Declare the procedure called |insert_relax|⟧

⟦1563 Declare \eTeX\ procedures for expanding⟧

forward_declaration pass_text();

forward_declaration start_input();

forward_declaration conditional();

forward_declaration get_x_token();

forward_declaration conv_toks();

forward_declaration ins_the_toks();

function expand() {
    label reswitch;
    var
      t: halfword, // token that is being ``expanded after''
      b: boolean, // keep track of nested csnames
      p, q, r: pointer, // for list manipulation
      j: 0 .. buf_size, // index into buffer 
      cv_backup: integer, // to save the global quantity cur_val
      cvl_backup, radix_backup, co_backup: small_number, // to save cur_val_level, etc.
      backup_backup: pointer, // to save link(backup_head)
      save_scanner_status: small_number; // temporary storage of scanner_status
    
    incr(expand_depth_count);
    if (expand_depth_count >= expand_depth) {
        overflow(strpool!("expansion depth"), expand_depth);
    }
    cv_backup = cur_val;
    cvl_backup = cur_val_level;
    radix_backup = radix;
    co_backup = cur_order;
    backup_backup = link(backup_head);
  reswitch:
    if (cur_cmd < call) {
        ⟦399 Expand a nonmacro⟧
    } else if (cur_cmd < end_template) {
        macro_call;
    } else {
        ⟦409 Insert a token containing |frozen_endv|⟧
    }
    cur_val = cv_backup;
    cur_val_level = cvl_backup;
    radix = radix_backup;
    cur_order = co_backup;
    link(backup_head) = backup_backup;
    decr(expand_depth_count);
}
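The save-and-restore discipline of expand, together with its depth counter, can be modeled in Python roughly as follows. This is a sketch under assumptions: the `ScanState` class, the `work` callback, and the default `max_depth` are hypothetical, and the restoration of link(backup_head) is omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScanState:
    cur_val: int = 0
    cur_val_level: int = 0
    radix: int = 0
    cur_order: int = 0
    depth: int = 0        # plays the role of expand_depth_count
    max_depth: int = 100  # hypothetical stand-in for expand_depth


def expand(state: ScanState, work: Callable[[ScanState], None]) -> None:
    """Run work(state) while protecting the scanner's globals."""
    state.depth += 1
    if state.depth >= state.max_depth:
        raise RecursionError("expansion depth")
    # Copy the globals into locals, as expand does on entry.
    saved = (state.cur_val, state.cur_val_level,
             state.radix, state.cur_order)
    work(state)  # may clobber the globals or call expand recursively
    # Put the saved values back before returning.
    (state.cur_val, state.cur_val_level,
     state.radix, state.cur_order) = saved
    state.depth -= 1
```

Because each activation keeps its own copies, a nested call can overwrite the shared values without disturbing what the outer call sees once the inner one returns.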

397.

⟦13 Global variables⟧ += ⟦
    var is_in_csname: boolean;
⟧

398.

⟦23 Set initial values of key variables⟧ += ⟦
    is_in_csname = false
⟧

399.

⟦399 Expand a nonmacro⟧ = ⟦
    {
        if (tracing_commands > 1) {
            show_cur_cmd_chr;
        }
        case cur_cmd {
          top_bot_mark:
            ⟦420 Insert the \(a)appropriate mark text into the scanner⟧
          expand_after:
            if (cur_chr == 0) {
                ⟦400 Expand the token after the next token⟧
            } else {
                ⟦1576 Negate a boolean conditional and |goto reswitch|⟧
            }
          no_expand:
            if (cur_chr == 0) {
                ⟦401 Suppress expansion of the next token⟧
            } else {
                ⟦402 Implement \.{\\primitive}⟧
            }
          cs_name:
            ⟦406 Manufacture a control sequence name⟧
          convert:
            // this procedure is discussed in Part 27 below
            conv_toks;
          the:
            // this procedure is discussed in Part 27 below
            ins_the_toks;
          if_test:
            // this procedure is discussed in Part 28 below
            conditional;
          fi_or_else:
            ⟦545 Terminate the current conditional and skip to \.{\\fi}⟧
          input:
            ⟦412 Initiate or terminate input from a file⟧
          othercases:
            ⟦404 Complain about an undefined macro⟧
        }
    }
⟧

400. It takes only a little shuffling to do what TEX calls \expandafter.

⟦400 Expand the token after the next token⟧ = ⟦
    {
        get_token;
        t = cur_tok;
        get_token;
        if (cur_cmd > max_command) {
            expand;
        } else {
            back_input;
        }
        cur_tok = t;
        back_input;
    }
⟧
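The shuffling for \expandafter can be tried out on a toy token stream in Python. The sketch assumes tokens are plain strings held in a deque, and that membership in the hypothetical `MACROS` table stands in for the test cur_cmd > max_command; real expansion is of course far richer.

```python
from collections import deque

# Hypothetical macro table: \b expands to the two tokens x and y.
MACROS = {"\\b": ["x", "y"]}


def get_token(stream: deque) -> str:
    return stream.popleft()


def back_input(stream: deque, tok: str) -> None:
    stream.appendleft(tok)


def expand(stream: deque, tok: str) -> None:
    """Put the macro body at the front of the input."""
    for t in reversed(MACROS[tok]):
        back_input(stream, t)


def expand_after(stream: deque) -> None:
    t = get_token(stream)    # the token to be set aside
    nxt = get_token(stream)  # the token to expand once
    if nxt in MACROS:
        expand(stream, nxt)
    else:
        back_input(stream, nxt)
    back_input(stream, t)    # put the first token back in front
```

Starting from the stream `\a \b z`, one call leaves `\a x y z`: the second token has been expanded once and the first token is back at the front, untouched.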

401. The implementation of \noexpand is a bit trickier, because it is necessary to insert a special ‘dont_expand ’ marker into TEX’s reading mechanism. This special marker is processed by get_next , but it does not slow down the inner loop.

Since \outer macros might arise here, we must also clear the scanner_status temporarily.

⟦401 Suppress expansion of the next token⟧ = ⟦
    {
        save_scanner_status = scanner_status;
        scanner_status = normal;
        get_token;
        scanner_status = save_scanner_status;
        t = cur_tok;
        // now start and loc point to the backed-up token t 
        back_input;
        if (t >= cs_token_flag) {
            p = get_avail;
            info(p) = cs_token_flag + frozen_dont_expand;
            link(p) = loc;
            start = p;
            loc = p;
        }
    }
⟧

402. The \primitive handling. If the primitive meaning of the next token is an expandable command, it suffices to replace the current token with the primitive one and restart expand.

Otherwise, the token we just read has to be pushed back, together with a token matching the internal form of \primitive, which is sneaked in as an alternate form of ignore_spaces.

Simply pushing back a token that matches the correct internal command does not work, because that approach would not survive round-tripping to a temporary file.

⟦402 Implement \.{\\primitive}⟧ = ⟦
    {
        save_scanner_status = scanner_status;
        scanner_status = normal;
        get_token;
        scanner_status = save_scanner_status;
        if (cur_cs < hash_base) {
            cur_cs = prim_lookup(cur_cs - single_base);
        } else {
            cur_cs = prim_lookup(text(cur_cs));
        }
        if (cur_cs != undefined_primitive) {
            t = prim_eq_type(cur_cs);
            if (t > max_command) {
                cur_cmd = t;
                cur_chr = prim_equiv(cur_cs);
                cur_tok = (cur_cmd * max_char_val) + cur_chr;
                cur_cs = 0;
                goto reswitch;
            } else {
                // now loc and start point to a one-item 
                // list
                back_input;
                p = get_avail;
                info(p) = cs_token_flag + frozen_primitive;
                link(p) = loc;
                loc = p;
                start = p;
            }
        }
    }
⟧

403. This block deals with an unexpandable \primitive appearing at a spot where an integer or an internal value should have been found. It fetches the next token, then resets cur_cmd, cur_cs, and cur_tok based on the primitive value of that token. No expansion takes place, because the next token may be all sorts of things, and expanding it could trigger further expansion and create new errors.

⟦403 Reset |cur_tok| for unexpandable primitives, goto restart⟧ = ⟦
    {
        get_token;
        if (cur_cs < hash_base) {
            cur_cs = prim_lookup(cur_cs - single_base);
        } else {
            cur_cs = prim_lookup(text(cur_cs));
        }
        if (cur_cs != undefined_primitive) {
            cur_cmd = prim_eq_type(cur_cs);
            cur_chr = prim_equiv(cur_cs);
            cur_cs = prim_eqtb_base + cur_cs;
            cur_tok = cs_token_flag + cur_cs;
        } else {
            cur_cmd = relax;
            cur_chr = 0;
            cur_tok = cs_token_flag + frozen_relax;
            cur_cs = frozen_relax;
        }
        goto restart;
    }
⟧

404.

⟦404 Complain about an undefined macro⟧ = ⟦
    {
        print_err(strpool!("Undefined control sequence"));
        help5(
          strpool!("The control sequence at the end of the top line"),
        )(
          strpool!("of your error message was never \\def'ed. If you have"),
        )(
          strpool!("misspelled it (e.g., `\\hobx'), type `I' and the correct"),
        )(
          strpool!("spelling (e.g., `I\\hbox'). Otherwise just continue,"),
        )(
          strpool!("and I'll forget about whatever was undefined."),
        );
        error;
    }
⟧

405. The expand procedure and some other routines that construct token lists find it convenient to use the following macros, which are valid only if the variables p and q are reserved for token-list building.

@define store_new_token(#) =>
    {
        q = get_avail;
        link(p) = q;
        info(q) = #;
        //  link ( p ) is null 
        p = q;
    }
@define fast_store_new_token(#) =>
    {
        fast_get_avail(q);
        link(p) = q;
        info(q) = #;
        //  link ( p ) is null 
        p = q;
    }

406.

⟦406 Manufacture a control sequence name⟧ = ⟦
    {
        r = get_avail;
        // head of the list of characters
        p = r;
        b = is_in_csname;
        is_in_csname = true;
        repeat {
            get_x_token;
            if (cur_cs == 0) {
                store_new_token(cur_tok);
            }
        } until (cur_cs != 0);
        if (cur_cmd != end_cs_name) {
            ⟦407 Complain about missing \.{\\endcsname}⟧
        }
        is_in_csname = b;
        ⟦408 Look up the characters of list |r| in the hash table, and set |cur_cs|⟧
        flush_list(r);
        if (eq_type(cur_cs) == undefined_cs) {
            // N.B.: The save_stack might change
            eq_define(cur_cs, relax, too_big_usv);
            // the control sequence will now match 
            // `\.{\\relax}'
        }
        cur_tok = cur_cs + cs_token_flag;
        back_input;
    }
⟧

407.

⟦407 Complain about missing \.{\\endcsname}⟧ = ⟦
    {
        print_err(strpool!("Missing "));
        print_esc(strpool!("endcsname"));
        print(strpool!(" inserted"));
        help2(
          strpool!("The control sequence marked <to be read again> should"),
        )(
          strpool!("not appear between \\csname and \\endcsname."),
        );
        back_error;
    }
⟧

408.

⟦408 Look up the characters of list |r| in the hash table, and set |cur_cs|⟧ = ⟦
    j = first

    p = link(r)

    while (p != null) {
        if (j >= max_buf_stack) {
            max_buf_stack = j + 1;
            if (max_buf_stack == buf_size) {
                overflow(strpool!("buffer size"), buf_size);
            }
        }
        buffer[j] = info(p) % max_char_val;
        incr(j);
        p = link(p);
    }

    if ((j > first + 1) || (buffer[first] > 0xffff)) {
        no_new_control_sequence = false;
        cur_cs = id_lookup(first, j - first);
        no_new_control_sequence = true;
    } else if (j == first) {
        // the list is empty
        cur_cs = null_cs;
    } else {
        // the list has length one
        cur_cs = single_base + buffer[first];
    }
⟧
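The three-way decision made after the characters have been copied into the buffer can be restated as a small Python sketch. It assumes the name is available as a list of code points and returns a label instead of an eqtb location; the function name `classify_name` is hypothetical, and the emptiness check is done before inspecting the first character, which is a defensive choice of this sketch.

```python
def classify_name(codes: list[int]) -> str:
    """Say which lookup path a \\csname ... \\endcsname name takes."""
    if not codes:
        return "null_cs"      # the list is empty
    if len(codes) > 1 or codes[0] > 0xFFFF:
        # Multiletter names, and single characters beyond the BMP,
        # go through the hash table.
        return "id_lookup"
    return "single_base"      # one character, directly indexed
```

A single character above 0xFFFF is sent to the hash table because the string pool holds UTF-16, where such a character occupies two code units.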

409. An end_template command is effectively changed to an endv command by the following code. (The reason for this is discussed below; the frozen_end_template at the end of the template has passed the check_outer_validity test, so its mission of error detection has been accomplished.)

⟦409 Insert a token containing |frozen_endv|⟧ = ⟦
    {
        cur_tok = cs_token_flag + frozen_endv;
        back_input;
    }
⟧

410. The processing of \input involves the start_input subroutine, which will be declared later; the processing of \endinput is trivial.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("input"), input, 0)

    primitive(strpool!("endinput"), input, 1)
⟧

411.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    input:

    if (chr_code == 0) {
        print_esc(strpool!("input"));
    }

    ⟦1559 Cases of |input| for |print_cmd_chr|⟧

    else

    print_esc(strpool!("endinput"))
⟧

412.

⟦412 Initiate or terminate input from a file⟧ = ⟦
    if (cur_chr == 1) {
        force_eof = true;
    }

    ⟦1560 Cases for |input|⟧

    else

    if (name_in_progress) {
        insert_relax;
    } else {
        start_input;
    }
⟧

413. Sometimes the expansion looks too far ahead, so we want to insert a harmless \relax into the user’s input.

⟦413 Declare the procedure called |insert_relax|⟧ = ⟦
    function insert_relax() {
        cur_tok = cs_token_flag + cur_cs;
        back_input;
        cur_tok = cs_token_flag + frozen_relax;
        back_input;
        token_type = inserted;
    }
⟧

414. Here is a recursive procedure that is TEX’s usual way to get the next token of input. It has been slightly optimized to take account of common cases.

// sets cur_cmd , cur_chr , cur_tok , and expands macros
function get_x_token() {
    label restart, done;
    
  restart:
    get_next;
    if (cur_cmd <= max_command) {
        goto done;
    }
    if (cur_cmd >= call) {
        if (cur_cmd < end_template) {
            macro_call;
        } else {
            cur_cs = frozen_endv;
            cur_cmd = endv;
            //  cur_chr == null_list 
            goto done;
        }
    } else {
        expand;
    }
    goto restart;
  done:
    if (cur_cs == 0) {
        cur_tok = (cur_cmd * max_char_val) + cur_chr;
    } else {
        cur_tok = cs_token_flag + cur_cs;
    }
}
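The restart loop of get_x_token can be imitated on a toy token stream in Python. The sketch assumes every expandable token is a parameterless macro listed in the hypothetical `MACROS` table; conditionals, \csname, and the end_template case are left out.

```python
from collections import deque

# Hypothetical macros: \greet expands to \hi !, and \hi expands to h i.
MACROS = {"\\greet": ["\\hi", "!"], "\\hi": ["h", "i"]}


def get_x_token(stream: deque) -> str:
    """Return the next unexpandable token, expanding macros on the way."""
    while True:                 # plays the role of goto restart
        tok = stream.popleft()  # get_next
        if tok not in MACROS:
            return tok          # cur_cmd <= max_command: done
        # Feed the macro body to the scanner and try again.
        stream.extendleft(reversed(MACROS[tok]))
```

Reading four tokens from the stream `\greet ?` delivers `h`, `i`, `!`, and `?` in that order, with the nested macro expanded along the way.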

415. The get_x_token procedure is essentially equivalent to two consecutive procedure calls: ‘get_next; x_token’.

//  get_x_token without the initial get_next 
function x_token() {
    while (cur_cmd > max_command) {
        expand;
        get_next;
    }
    if (cur_cs == 0) {
        cur_tok = (cur_cmd * max_char_val) + cur_chr;
    } else {
        cur_tok = cs_token_flag + cur_cs;
    }
}

416. A control sequence that has been \def’ed by the user is expanded by TEX’s macro_call procedure.

Before we get into the details of macro_call , however, let’s consider the treatment of primitives like \topmark, since they are essentially macros without parameters. The token lists for such marks are kept in a global array of five pointers; we refer to the individual entries of this array by symbolic names top_mark , etc. The value of top_mark is either null or a pointer to the reference count of a token list.

@define marks_code => 5 // add this for \.{\\topmarks} etc.
// the mark in effect at the previous page break
@define top_mark_code => 0
// the first mark between top_mark and bot_mark 
@define first_mark_code => 1
// the mark in effect at the current page break
@define bot_mark_code => 2
// the first mark found by \.{\\vsplit}
@define split_first_mark_code => 3
// the last mark found by \.{\\vsplit}
@define split_bot_mark_code => 4
@define top_mark => cur_mark[top_mark_code]
@define first_mark => cur_mark[first_mark_code]
@define bot_mark => cur_mark[bot_mark_code]
@define split_first_mark => cur_mark[split_first_mark_code]
@define split_bot_mark => cur_mark[split_bot_mark_code]
⟦13 Global variables⟧ += ⟦
    // token lists for marks
    var cur_mark: array [
      top_mark_code .. split_bot_mark_code,
    ] of pointer;
⟧

417.

⟦23 Set initial values of key variables⟧ += ⟦
    top_mark = null

    first_mark = null

    bot_mark = null

    split_first_mark = null

    split_bot_mark = null
⟧

418.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("topmark"),
      top_bot_mark,
      top_mark_code,
    )

    primitive(
      strpool!("firstmark"),
      top_bot_mark,
      first_mark_code,
    )

    primitive(
      strpool!("botmark"),
      top_bot_mark,
      bot_mark_code,
    )

    primitive(
      strpool!("splitfirstmark"),
      top_bot_mark,
      split_first_mark_code,
    )

    primitive(
      strpool!("splitbotmark"),
      top_bot_mark,
      split_bot_mark_code,
    )
⟧

419.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    top_bot_mark:

    {
        case (chr_code % marks_code) {
          first_mark_code:
            print_esc(strpool!("firstmark"));
          bot_mark_code:
            print_esc(strpool!("botmark"));
          split_first_mark_code:
            print_esc(strpool!("splitfirstmark"));
          split_bot_mark_code:
            print_esc(strpool!("splitbotmark"));
          othercases:
            print_esc(strpool!("topmark"));
        }
        if (chr_code >= marks_code) {
            print_char(ord!("s"));
        }
    }
⟧
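The naming rule for the mark primitives can be written out as a short Python sketch. It assumes the escape character is a backslash; the list `NAMES` and the function `mark_name` are hypothetical names introduced for illustration.

```python
MARKS_CODE = 5  # added to a mark code for the \topmarks-style variants

NAMES = ["topmark", "firstmark", "botmark",
         "splitfirstmark", "splitbotmark"]


def mark_name(chr_code: int) -> str:
    """Return the symbolic name printed for a top_bot_mark command."""
    name = NAMES[chr_code % MARKS_CODE]
    if chr_code >= MARKS_CODE:
        name += "s"  # the variant that takes a mark class number
    return "\\" + name
```

Thus code 1 prints as `\firstmark`, while code 6, which is first_mark_code plus marks_code, prints as `\firstmarks`.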

420. The following code is activated when cur_cmd == top_bot_mark and when cur_chr is a code like top_mark_code .

⟦420 Insert the \(a)appropriate mark text into the scanner⟧ = ⟦
    {
        t = cur_chr % marks_code;
        if (cur_chr >= marks_code) {
            scan_register_num;
        } else {
            cur_val = 0;
        }
        if (cur_val == 0) {
            cur_ptr = cur_mark[t];
        } else {
            ⟦1635 Compute the mark pointer for mark type |t| and class |cur_val|⟧
        }
        if (cur_ptr != null) {
            begin_token_list(cur_ptr, mark_text);
        }
    }
⟧
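The decoding of cur_chr shared by sections 419 and 420 can be sketched in Python. The numeric values of the five mark codes and of marks_code are not stated in this part of the program; the values used below (0 to 4, and 5) are assumptions for illustration only, and decode_mark and mark_primitive_name are hypothetical helper names.

```python
# Assumed values; they are not given in this section of the program.
TOP_MARK_CODE = 0
FIRST_MARK_CODE = 1
BOT_MARK_CODE = 2
SPLIT_FIRST_MARK_CODE = 3
SPLIT_BOT_MARK_CODE = 4
MARKS_CODE = 5

_NAMES = {
    FIRST_MARK_CODE: "firstmark",
    BOT_MARK_CODE: "botmark",
    SPLIT_FIRST_MARK_CODE: "splitfirstmark",
    SPLIT_BOT_MARK_CODE: "splitbotmark",
}


def decode_mark(chr_code):
    """Split chr_code into (mark type, needs a class number)."""
    return chr_code % MARKS_CODE, chr_code >= MARKS_CODE


def mark_primitive_name(chr_code):
    """Mimic the print_cmd_chr case: base name, plus 's' for the class form."""
    mark_type, with_class = decode_mark(chr_code)
    name = _NAMES.get(mark_type, "topmark")  # othercases: topmark
    if with_class:
        name += "s"
    return "\\" + name
```

Under these assumed values, a code below MARKS_CODE selects one of the five entries of cur_mark directly, while a code of MARKS_CODE or more names the same mark type but expects a class number to follow.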

421. Now let’s consider macro_call itself, which is invoked when TEX is scanning a control sequence whose cur_cmd is either call, long_call, outer_call, or long_outer_call. The control sequence definition appears in the token list whose reference count is in location cur_chr of mem.

The global variable long_state will be set to call or to long_call, depending on whether or not the control sequence disallows \par in its parameters. The get_next routine will set long_state to outer_call and emit \par, if a file ends or if an \outer control sequence occurs in the midst of an argument.

⟦13 Global variables⟧ += ⟦
    // governs the acceptance of \.{\\par}
    var long_state: call .. long_outer_call;
⟧

422. The parameters, if any, must be scanned before the macro is expanded. Parameters are token lists without reference counts. They are placed on an auxiliary stack called pstack while they are being scanned, since the param_stack may be losing entries during the matching process. (Note that param_stack can’t be gaining entries, since macro_call is the only routine that puts anything onto param_stack, and it is not recursive.)

⟦13 Global variables⟧ += ⟦
    // arguments supplied to a macro
    var pstack: array [0 .. 8] of pointer;
⟧

423. After parameter scanning is complete, the parameters are moved to the param_stack. Then the macro body is fed to the scanner; in other words, macro_call places the defined text of the control sequence at the top of TEX’s input stack, so that get_next will proceed to read it next.

The global variable cur_cs contains the eqtb address of the control sequence being expanded, when macro_call begins. If this control sequence has not been declared \long, i.e., if its command code in the eq_type field is not long_call or long_outer_call, its parameters are not allowed to contain the control sequence \par. If an illegal \par appears, the macro call is aborted, and the \par will be rescanned.

⟦423 Declare the procedure called |macro_call|⟧ = ⟦
    // invokes a user-defined control sequence
    function macro_call() {
        label exit, continue, done, done1, found;
        var
          // current node in the macro's token list
          r: pointer,
          // current node in parameter token list being built
          p: pointer,
          // new node being put into the token list
          q: pointer,
          // backup pointer for parameter matching
          s: pointer,
          // cycle pointer for backup recovery
          t: pointer,
          // auxiliary pointers for backup recovery
          u, v: pointer,
          // one step before the last right_brace token
          rbrace_ptr: pointer,
          // the number of parameters scanned
          n: small_number,
          // unmatched left braces in current parameter
          unbalance: halfword,
          // the number of tokens or groups (usually)
          m: halfword,
          // start of the token list
          ref_count: pointer,
          // scanner_status upon entry
          save_scanner_status: small_number,
          // warning_index upon entry
          save_warning_index: pointer,
          // character used in parameter
          match_chr: ASCII_code;
        
        save_scanner_status = scanner_status;
        save_warning_index = warning_index;
        warning_index = cur_cs;
        ref_count = cur_chr;
        r = link(ref_count);
        n = 0;
        if (tracing_macros > 0) {
            ⟦435 Show the text of the macro being expanded⟧
        }
        if (info(r) == protected_token) {
            r = link(r);
        }
        if (info(r) != end_match_token) {
            ⟦425 Scan the parameters and make |link(r)| point to the macro body; but |return| if an illegal \.{\\par} is detected⟧
        }
        ⟦424 Feed the macro body and its parameters to the scanner⟧
      exit:
        scanner_status = save_scanner_status;
        warning_index = save_warning_index;
    }
⟧

424. Before we put a new token list on the input stack, it is wise to clean off all token lists that have recently been depleted. Then a user macro that ends with a call to itself will not require unbounded stack space.

⟦424 Feed the macro body and its parameters to the scanner⟧ = ⟦
    while (
        (state == token_list)
        && (loc == null) && (token_type != v_template)
    ) {
        // conserve stack space
        end_token_list;
    }

    begin_token_list(ref_count, macro)

    name = warning_index

    loc = link(r)

    if (n > 0) {
        if (param_ptr + n > max_param_stack) {
            max_param_stack = param_ptr + n;
            if (max_param_stack > param_size) {
                overflow(
                  strpool!("parameter stack size"),
                  param_size,
                );
            }
        }
        for (m in 0 to n - 1) {
            param_stack[param_ptr + m] = pstack[m];
        }
        param_ptr = param_ptr + n;
    }
⟧
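The final step above, which moves the scanned parameters from pstack to param_stack, can be sketched in Python as follows. This is only an illustration: the function push_params and the exception ParamStackOverflow are hypothetical names, the stack is modelled as a plain list indexed 0 to param_size, and the overflow procedure is replaced by raising an exception.

```python
class ParamStackOverflow(Exception):
    """Stands in for overflow("parameter stack size", param_size)."""


def push_params(param_stack, param_ptr, pstack, n, param_size, max_param_stack):
    """Copy pstack[0..n-1] onto param_stack starting at param_ptr.

    Returns the updated (param_ptr, max_param_stack).
    """
    if n > 0:
        if param_ptr + n > max_param_stack:
            # Record the new high-water mark before checking the limit.
            max_param_stack = param_ptr + n
            if max_param_stack > param_size:
                raise ParamStackOverflow(param_size)
        for m in range(n):
            param_stack[param_ptr + m] = pstack[m]
        param_ptr += n
    return param_ptr, max_param_stack
```

Note that the capacity check happens before any entry is copied, so a failed call leaves the stack contents untouched.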

425. At this point, the reader will find it advisable to review the explanation of token list format that was presented earlier, since many aspects of that format are of importance chiefly in the macro_call routine.

The token list might begin with a string of compulsory tokens before the first match or end_match. In that case the macro name is supposed to be followed by those tokens; the following program will set s == null to represent this restriction. Otherwise s will be set to the first token of a string that will delimit the next parameter.

⟦425 Scan the parameters and make |link(r)| point to the macro body; but |return| if an illegal \.{\\par} is detected⟧ = ⟦
    {
        scanner_status = matching;
        unbalance = 0;
        long_state = eq_type(cur_cs);
        if (long_state >= outer_call) {
            long_state = long_state - 2;
        }
        repeat {
            link(temp_head) = null;
            if (
                (info(r) >= end_match_token)
                || (info(r) < match_token)
            ) {
                s = null;
            } else {
                match_chr = info(r) - match_token;
                s = link(r);
                r = s;
                p = temp_head;
                m = 0;
            }
            // now info(r) is a token whose command code
            // is either match or end_match
            ⟦426 Scan a parameter until its delimiter string has been found; or, if |s=null|, simply scan the delimiter string⟧
        } until (info(r) == end_match_token);
    }
⟧

426. If info(r) is a match or end_match command, it cannot be equal to any token found by get_token. Therefore an undelimited parameter (i.e., a match that is immediately followed by match or end_match) will always fail the test ‘cur_tok == info(r)’ in the following algorithm.

⟦426 Scan a parameter until its delimiter string has been found; or, if |s=null|, simply scan the delimiter string⟧ = ⟦
    continue:

    // set cur_tok to the next token of input
    get_token

    if (cur_tok == info(r)) {
        ⟦428 Advance \(r)|r|; |goto found| if the parameter delimiter has been fully matched, otherwise |goto continue|⟧
    }

    ⟦431 Contribute the recently matched tokens to the current parameter, and |goto continue| if a partial match is still in effect; but abort if |s=null|⟧

    if (cur_tok == par_token) {
        if (long_state != long_call) {
            ⟦430 Report a runaway argument and abort⟧
        }
    }

    if (cur_tok < right_brace_limit) {
        if (cur_tok < left_brace_limit) {
            ⟦433 Contribute an entire group to the current parameter⟧
        } else {
            ⟦429 Report an extra right brace and |goto continue|⟧
        }
    } else {
        ⟦427 Store the current token, but |goto continue| if it is a blank space that would become an undelimited parameter⟧
    }

    incr(m)

    if (info(r) > end_match_token) {
        goto continue;
    }

    if (info(r) < match_token) {
        goto continue;
    }

    found:

    if (s != null) {
        ⟦434 Tidy up the parameter just scanned, and tuck it away⟧
    }
⟧

427.

⟦427 Store the current token, but |goto continue| if it is a blank space that would become an undelimited parameter⟧ = ⟦
    {
        if (cur_tok == space_token) {
            if (info(r) <= end_match_token) {
                if (info(r) >= match_token) {
                    goto continue;
                }
            }
        }
        store_new_token(cur_tok);
    }
⟧

428. A slightly subtle point arises here: When the parameter delimiter ends with ‘#{’, the token list will have a left brace both before and after the end_match. Only one of these should affect the align_state, but both will be scanned, so we must make a correction.

⟦428 Advance \(r)|r|; |goto found| if the parameter delimiter has been fully matched, otherwise |goto continue|⟧ = ⟦
    {
        r = link(r);
        if (
            (info(r) >= match_token)
            && (info(r) <= end_match_token)
        ) {
            if (cur_tok < left_brace_limit) {
                decr(align_state);
            }
            goto found;
        } else {
            goto continue;
        }
    }
⟧

429.

⟦429 Report an extra right brace and |goto continue|⟧ = ⟦
    {
        back_input;
        print_err(strpool!("Argument of "));
        sprint_cs(warning_index);
        print(strpool!(" has an extra }"));
        help6(
          strpool!("I've run across a `}' that doesn't seem to match anything."),
        )(
          strpool!("For example, `\\def\\a#1{...}' and `\\a}' would produce"),
        )(
          strpool!("this error. If you simply proceed now, the `\\par' that"),
        )(
          strpool!("I've just inserted will cause me to report a runaway"),
        )(
          strpool!("argument that might be the root of the problem. But if"),
        )(
          strpool!("your `}' was spurious, just type `2' and it will go away."),
        );
        incr(align_state);
        long_state = call;
        cur_tok = par_token;
        ins_error;
        goto continue;
        // a white lie; the \.{\\par} won't always trigger a 
        // runaway
    }
⟧

430. If long_state == outer_call, a runaway argument has already been reported.

⟦430 Report a runaway argument and abort⟧ = ⟦
    {
        if (long_state == call) {
            runaway;
            print_err(strpool!("Paragraph ended before "));
            sprint_cs(warning_index);
            print(strpool!(" was complete"));
            help3(
              strpool!("I suspect you've forgotten a `}', causing me to apply this"),
            )(
              strpool!("control sequence to too much text. How can we recover?"),
            )(
              strpool!("My plan is to forget the whole thing and hope for the best."),
            );
            back_error;
        }
        pstack[n] = link(temp_head);
        align_state = align_state - unbalance;
        for (m in 0 to n) {
            flush_list(pstack[m]);
        }
        return;
    }
⟧

431. When the following code becomes active, we have matched tokens from s to the predecessor of r, and we have found that cur_tok != info(r). An interesting situation now presents itself: If the parameter is to be delimited by a string such as ‘ab’, and if we have scanned ‘aa’, we want to contribute one ‘a’ to the current parameter and resume looking for a ‘b’. The program must account for such partial matches and for others that can be quite complex. But most of the time we have s == r and nothing needs to be done.

Incidentally, it is possible for \par tokens to sneak in to certain parameters of non-\long macros. For example, consider a case like ‘\def\a#1\par!{...}’ where the first \par is not followed by an exclamation point. In such situations it does not seem appropriate to prohibit the \par, so TEX keeps quiet about this bending of the rules.

⟦431 Contribute the recently matched tokens to the current parameter, and |goto continue| if a partial match is still in effect; but abort if |s=null|⟧ = ⟦
    if (s != r) {
        if (s == null) {
            ⟦432 Report an improper use of the macro and abort⟧
        } else {
            t = s;
            repeat {
                store_new_token(info(t));
                incr(m);
                u = link(t);
                v = s;
                loop {
                    if (u == r) {
                        if (cur_tok != info(v)) {
                            goto done;
                        } else {
                            r = link(v);
                            goto continue;
                        }
                    }
                    if (info(u) != info(v)) {
                        goto done;
                    }
                    u = link(u);
                    v = link(v);
                }
              done:
                t = link(t);
            } until (t == r);
            // at this point, no tokens are recently matched
            r = s;
        }
    }
⟧
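The observable effect of this backup logic on a delimited parameter can be modelled by a much simpler Python sketch. It does not reproduce the pointer manipulation with s, t, u, v and r; instead it assumes that tokens are single-character strings, that the delimiter is a nonempty list of non-brace tokens, and that the result should be the text up to the first occurrence of the delimiter outside any brace group. The name scan_delimited is hypothetical, and error recovery is replaced by exceptions.

```python
def scan_delimited(tokens, delim):
    """Return (argument, remaining) for the first top-level match of delim.

    tokens and delim are lists of one-character strings; delim is nonempty.
    """
    arg = []
    i = 0
    while i < len(tokens):
        if tokens[i:i + len(delim)] == delim:
            return arg, tokens[i + len(delim):]
        if tokens[i] == "{":
            # Contribute an entire group; delimiters inside it do not count.
            depth = 0
            while True:
                if i >= len(tokens):
                    raise ValueError("unbalanced braces")
                tok = tokens[i]
                arg.append(tok)
                i += 1
                if tok == "{":
                    depth += 1
                elif tok == "}":
                    depth -= 1
                    if depth == 0:
                        break
        elif tokens[i] == "}":
            raise ValueError("extra right brace")
        else:
            arg.append(tokens[i])
            i += 1
    raise ValueError("delimiter not found")
```

With the delimiter ‘ab’ and the input ‘aab’, the sketch contributes one ‘a’ to the argument and then matches ‘ab’, which is the outcome described in the text above.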

432.

⟦432 Report an improper use of the macro and abort⟧ = ⟦
    {
        print_err(strpool!("Use of "));
        sprint_cs(warning_index);
        print(strpool!(" doesn't match its definition"));
        help4(
          strpool!("If you say, e.g., `\\def\\a1{...}', then you must always"),
        )(
          strpool!("put `1' after `\\a', since control sequence names are"),
        )(
          strpool!("made up of letters only. The macro here has not been"),
        )(
          strpool!("followed by the required stuff, so I'm ignoring it."),
        );
        error;
        return;
    }
⟧

433.

⟦433 Contribute an entire group to the current parameter⟧ = ⟦
    {
        unbalance = 1;
        loop {
            fast_store_new_token(cur_tok);
            get_token;
            if (cur_tok == par_token) {
                if (long_state != long_call) {
                    ⟦430 Report a runaway argument and abort⟧
                }
            }
            if (cur_tok < right_brace_limit) {
                if (cur_tok < left_brace_limit) {
                    incr(unbalance);
                } else {
                    decr(unbalance);
                    if (unbalance == 0) {
                        goto done1;
                    }
                }
            }
        }
      done1:
        rbrace_ptr = p;
        store_new_token(cur_tok);
    }
⟧

434. If the parameter consists of a single group enclosed in braces, we must strip off the enclosing braces. That’s why rbrace_ptr was introduced.

⟦434 Tidy up the parameter just scanned, and tuck it away⟧ = ⟦
    {
        if ((m == 1) && (info(p) < right_brace_limit)) {
            link(rbrace_ptr) = null;
            free_avail(p);
            p = link(temp_head);
            pstack[n] = link(p);
            free_avail(p);
        } else {
            pstack[n] = link(temp_head);
        }
        incr(n);
        if (tracing_macros > 0) {
            if (
                (tracing_stack_levels == 0)
                || (input_ptr < tracing_stack_levels)
            ) {
                begin_diagnostic;
                print_nl(match_chr);
                print_int(n);
                print(strpool!("<-"));
                show_token_list(pstack[n - 1], null, 1000);
                end_diagnostic(false);
            }
        }
    }
⟧
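The brace-stripping rule can be illustrated with a short Python sketch. It assumes the parameter is held as a list of one-character tokens and that m counts the top-level tokens or groups scanned, as in the program; tidy_parameter is a hypothetical name, and the tracing output is not modelled.

```python
def tidy_parameter(tokens, m):
    """Strip the enclosing braces when the parameter is exactly one group."""
    if m == 1 and tokens and tokens[-1] == "}":
        # A single item ending in a right brace must be a whole group.
        return tokens[1:-1]
    return tokens
```

For instance, a parameter scanned as the single group ‘{ab}’ is stored as ‘ab’, while ‘{a}{b}’ (two groups, so m is 2) keeps all of its braces.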

435.

⟦435 Show the text of the macro being expanded⟧ = ⟦
    {
        begin_diagnostic;
        if (tracing_stack_levels > 0) {
            if (input_ptr < tracing_stack_levels) {
                v = input_ptr;
                print_ln;
                print_char(ord!("~"));
                while (v > 0) {
                    print_char(ord!("."));
                    decr(v);
                }
                print_cs(warning_index);
                token_show(ref_count);
            } else {
                print_char(ord!("~"));
                print_char(ord!("~"));
                print_cs(warning_index);
            }
        } else {
            print_ln;
            print_cs(warning_index);
            token_show(ref_count);
        }
        end_diagnostic(false);
    }
⟧

436. [26] Basic scanning subroutines. Let’s turn now to some procedures that TEX calls upon frequently to digest certain kinds of patterns in the input. Most of these are quite simple; some are quite elaborate. Almost all of the routines call get_x_token, which can cause them to be invoked recursively.

437. The scan_left_brace routine is called when a left brace is supposed to be the next non-blank token. (The term “left brace” means, more precisely, a character whose catcode is left_brace.) TEX allows \relax to appear before the left_brace.

// reads a mandatory left_brace
function scan_left_brace() {
    ⟦438 Get the next non-blank non-relax non-call token⟧
    if (cur_cmd != left_brace) {
        print_err(strpool!("Missing { inserted"));
        help4(
          strpool!("A left brace was mandatory here, so I've put one in."),
        )(
          strpool!("You might want to delete and/or insert some corrections"),
        )(
          strpool!("so that I will find a matching right brace soon."),
        )(
          strpool!("(If you're confused by all this, try typing `I}' now.)"),
        );
        back_error;
        cur_tok = left_brace_token + ord!("{");
        cur_cmd = left_brace;
        cur_chr = ord!("{");
        incr(align_state);
    }
}

438.

⟦438 Get the next non-blank non-relax non-call token⟧ = ⟦
    repeat {
        get_x_token;
    } until ((cur_cmd != spacer) && (cur_cmd != relax))
⟧

439. The scan_optional_equals routine looks for an optional ‘=’ sign preceded by optional spaces; ‘\relax’ is not ignored here.

function scan_optional_equals() {
    ⟦440 Get the next non-blank non-call token⟧
    if (cur_tok != other_token + ord!("=")) {
        back_input;
    }
}

440.

⟦440 Get the next non-blank non-call token⟧ = ⟦
    repeat {
        get_x_token;
    } until (cur_cmd != spacer)
⟧
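The behaviour of scan_optional_equals can be sketched in Python under strong simplifications: the input is a list of already expanded tokens, a blank is the string " ", and the current position is an index into that list. The function name and this token representation are assumptions for illustration; macro expansion by get_x_token is not modelled.

```python
def scan_optional_equals(tokens, pos):
    """Skip blanks, then consume one '=' if present; return the new position."""
    while pos < len(tokens) and tokens[pos] == " ":
        pos += 1  # get the next non-blank token
    if pos < len(tokens) and tokens[pos] == "=":
        return pos + 1  # the '=' is consumed
    return pos  # anything else is put back (back_input)
```

Because only blanks are skipped, a \relax token stops the scan and a following ‘=’ is left in the input.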

441. In case you are getting bored, here is a slightly less trivial routine: Given a string of lowercase letters, like ‘pt’ or ‘plus’ or ‘width’, the scan_keyword routine checks to see whether the next tokens of input match this string. The match must be exact, except that uppercase letters will match their lowercase counterparts; uppercase equivalents are determined by subtracting ord!("a") - ord!("A"), rather than using the uc_code table, since TEX uses this routine only for its own limited set of keywords.

If a match is found, the characters are effectively removed from the input and true is returned. Otherwise false is returned, and the input is left essentially unchanged (except for the fact that some macros may have been expanded, etc.).

// look for a given string
function scan_keyword(s: str_number): boolean {
    label exit;
    var
      p: pointer, // tail of the backup list
      // new node being added to the token list via store_new_token
      q: pointer,
      k: pool_pointer, // index into str_pool
      save_cur_cs: pointer; // to save cur_cs
    
    p = backup_head;
    link(p) = null;
    if (s < too_big_char) {
        while (true) {
            // recursion is possible here
            get_x_token;
            if (
                (cur_cs == 0)
                && (
                    (cur_chr == s)
                    || (cur_chr == s - ord!("a") + ord!("A"))
                )
            ) {
                store_new_token(cur_tok);
                flush_list(link(backup_head));
                scan_keyword = true;
                return;
            } else if (
                (cur_cmd != spacer)
                || (p != backup_head)
            ) {
                back_input;
                if (p != backup_head) {
                    back_list(link(backup_head));
                }
                scan_keyword = false;
                return;
            }
        }
    }
    k = str_start_macro(s);
    save_cur_cs = cur_cs;
    while (k < str_start_macro(s + 1)) {
        // recursion is possible here
        get_x_token;
        if (
            (cur_cs == 0)
            && (
                (cur_chr == so(str_pool[k]))
                || (
                    cur_chr
                    == so(str_pool[k])
                    - ord!("a") + ord!("A")
                )
            )
        ) {
            store_new_token(cur_tok);
            incr(k);
        } else if ((cur_cmd != spacer) || (p != backup_head)) {
            back_input;
            if (p != backup_head) {
                back_list(link(backup_head));
            }
            cur_cs = save_cur_cs;
            scan_keyword = false;
            return;
        }
    }
    flush_list(link(backup_head));
    scan_keyword = true;
  exit:
}
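The matching rule of scan_keyword can be sketched in Python, assuming the input is a list of one-character tokens and the keyword consists of lowercase ASCII letters. The function below is an illustration only: it returns a position instead of using a backup list, it does not model macro expansion or the cur_cs == 0 test, and its signature is hypothetical.

```python
def scan_keyword(tokens, pos, keyword):
    """Return (matched, new_pos) for a case-insensitive keyword match.

    Leading blanks are discarded even when the match fails, mirroring the
    fact that the routine above never puts them back.
    """
    i = pos
    k = 0
    start = pos  # where the input resumes if the match fails
    while k < len(keyword):
        if i >= len(tokens):
            return False, start
        tok = tokens[i]
        want = keyword[k]
        upper = chr(ord(want) - (ord("a") - ord("A")))
        if tok == want or tok == upper:
            i += 1
            k += 1
        elif tok == " " and k == 0:
            i += 1  # blanks before the first matched letter are skipped
            start = i
        else:
            return False, start  # matched tokens are given back
    return True, i
```

A blank in the middle of a keyword ends the match, because blanks are skipped only while nothing has been matched yet.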

442. Here is a procedure that sounds an alarm when mu and non-mu units are being switched.

function mu_error() {
    print_err(strpool!("Incompatible glue units"));
    help1(
      strpool!("I'm going to assume that 1mu=1pt when they're mixed."),
    );
    error;
}

443. The next routine ‘scan_something_internal’ is used to fetch internal numeric quantities like ‘\hsize’, and also to handle the ‘\the’ when expanding constructions like ‘\the\toks0’ and ‘\the\baselineskip’. Soon we will be considering the scan_int procedure, which calls scan_something_internal; on the other hand, scan_something_internal also calls scan_int, for constructions like ‘\catcode`\$’ or ‘\fontdimen 3 \ff’. So we have to declare scan_int as a forward procedure. A few other procedures are also declared at this point.

// scans an integer value
forward_declaration scan_int();

⟦467 Declare procedures that scan restricted classes of integers⟧

⟦1492 Declare \eTeX\ procedures for scanning⟧

⟦612 Declare procedures that scan font-related stuff⟧

444. TEX doesn’t know exactly what to expect when scan_something_internal begins. For example, an integer or dimension or glue value could occur immediately after ‘\hskip’; and one can even say \the with respect to token lists in constructions like ‘\xdef\o{\the\output}’. On the other hand, only integers are allowed after a construction like ‘\count’. To handle the various possibilities, scan_something_internal has a level parameter, which tells the “highest” kind of quantity that scan_something_internal is allowed to produce. Six levels are distinguished, namely int_val, dimen_val, glue_val, mu_val, ident_val, and tok_val.

The output of scan_something_internal (and of the other routines scan_int, scan_dimen, and scan_glue below) is put into the global variable cur_val, and its level is put into cur_val_level. The highest values of cur_val_level are special: mu_val is used only when cur_val points to something in a “muskip” register, or to one of the three parameters \thinmuskip, \medmuskip, \thickmuskip; ident_val is used only when cur_val points to a font identifier; tok_val is used only when cur_val points to null or to the reference count of a token list. The last two cases are allowed only when scan_something_internal is called with level == tok_val.

If the output is glue, cur_val will point to a glue specification, and the reference count of that glue will have been updated to reflect this reference; if the output is a nonempty token list, cur_val will point to its reference count, but in this case the count will not have been updated. Otherwise cur_val will contain the integer or scaled value in question.

@define int_val => 0 // integer values
@define dimen_val => 1 // dimension values
@define glue_val => 2 // glue specifications
@define mu_val => 3 // math glue specifications
@define ident_val => 4 // font identifier
@define tok_val => 5 // token lists
// inter-character (class) token lists
@define inter_char_val => 6
⟦13 Global variables⟧ += ⟦
    // value returned by numeric scanners
    var cur_val: integer;

    // value returned by numeric scanners
    var cur_val1: integer;

    // the ``level'' of this value
    var cur_val_level: int_val .. tok_val;
⟧

445. The hash table is initialized with ‘\count’, ‘\dimen’, ‘\skip’, and ‘\muskip’ all having register as their command code; they are distinguished by the chr_code, which is either int_val, dimen_val, glue_val, or mu_val more than mem_bot (dynamic variable-size nodes cannot have these values).

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("count"),
      register,
      mem_bot + int_val,
    )

    primitive(
      strpool!("dimen"),
      register,
      mem_bot + dimen_val,
    )

    primitive(
      strpool!("skip"),
      register,
      mem_bot + glue_val,
    )

    primitive(
      strpool!("muskip"),
      register,
      mem_bot + mu_val,
    )
⟧

446.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    register:

    ⟦1643 Cases of |register| for |print_cmd_chr|⟧
⟧

447. OK, we’re ready for scan_something_internal itself. A second parameter, negative, is set true if the value that is found should be negated. It is assumed that cur_cmd and cur_chr represent the first token of the internal quantity to be scanned; an error will be signalled if cur_cmd < min_internal or cur_cmd > max_internal.

@define scanned_result_end(#) =>
    /*... opened earlier ...*/
        cur_val_level = #;
    }
@define scanned_result(#) =>
    {
        cur_val = #;
        scanned_result_end
    /* ... closed later ... */
@define char_class_limit => 0x1000
@define char_class_ignored => char_class_limit
@define char_class_boundary => (char_class_ignored - 1)
// fetch an internal parameter
function scan_something_internal(
  level: small_number,
  negative: boolean,
) {
    label exit, restart;
    var
      m: halfword, //  chr_code part of the operand token
      n, k, kk: integer, // accumulators
      q, r: pointer, // general purpose indices
      tx: pointer, // effective tail node
      i: four_quarters, // character info
      p: 0 .. nest_size; // index into nest 
    
  restart:
    m = cur_chr;
    case cur_cmd {
      def_code:
        ⟦448 Fetch a character code from some table⟧
      XeTeX_def_code:
        scan_usv_num;
        if (m == sf_code_base) {
            scanned_result(
              ho(sf_code(cur_val) div 0x10000),
            )(int_val);
        } else if (m == math_code_base) {
            scanned_result(ho(math_code(cur_val)))(int_val);
        } else if (m == math_code_base + 1) {
            print_err(
              strpool!("Can't use \\Umathcode as a number (try \\Umathcodenum)"),
            );
            help2(
              strpool!("\\Umathcode is for setting a mathcode from separate values;"),
            )(
              strpool!("use \\Umathcodenum to access them as single values."),
            );
            error;
            scanned_result(0)(int_val);
        } else if (m == del_code_base) {
            scanned_result(ho(del_code(cur_val)))(int_val);
        } else {
            print_err(
              strpool!("Can't use \\Udelcode as a number (try \\Udelcodenum)"),
            );
            help2(
              strpool!("\\Udelcode is for setting a delcode from separate values;"),
            )(
              strpool!("use \\Udelcodenum to access them as single values."),
            );
            error;
            scanned_result(0)(int_val);
        }
      toks_register,
      assign_toks,
      def_family,
      set_font,
      def_font:
        ⟦449 Fetch a token list or font identifier, provided that |level=tok_val|⟧
      assign_int:
        scanned_result(eqtb[m].int)(int_val);
      assign_dimen:
        scanned_result(eqtb[m].sc)(dimen_val);
      assign_glue:
        scanned_result(equiv(m))(glue_val);
      assign_mu_glue:
        scanned_result(equiv(m))(mu_val);
      set_aux:
        ⟦452 Fetch the |space_factor| or the |prev_depth|⟧
      set_prev_graf:
        ⟦456 Fetch the |prev_graf|⟧
      set_page_int:
        ⟦453 Fetch the |dead_cycles| or the |insert_penalties|⟧
      set_page_dimen:
        ⟦455 Fetch something on the |page_so_far|⟧
      set_shape:
        ⟦457 Fetch the |par_shape| size⟧
      set_box_dimen:
        ⟦454 Fetch a box dimension⟧
      char_given, math_given, XeTeX_math_given:
        scanned_result(cur_chr)(int_val);
      assign_font_dimen:
        ⟦459 Fetch a font dimension⟧
      assign_font_int:
        ⟦460 Fetch a font integer⟧
      register:
        ⟦461 Fetch a register⟧
      last_item:
        ⟦458 Fetch an item in the current node, if appropriate⟧
        // trap unexpandable primitives
      ignore_spaces:
        if (cur_chr == 1) {
            ⟦403 Reset |cur_tok| for unexpandable primitives, goto restart⟧
        }
      othercases:
        ⟦462 Complain that \.{\\the} can't do this; give zero result⟧
    }
    while (cur_val_level > level) {
        ⟦463 Convert \(c)|cur_val| to a lower level⟧
    }
    ⟦464 Fix the reference count, if any, and negate |cur_val| if |negative|⟧
  exit:
}
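The loop near the end of scan_something_internal, which lowers cur_val_level until it no longer exceeds level, can be sketched in Python. The conversion step itself is defined in section 463, which lies outside this part of the program; the sketch assumes that a glue value is replaced by its width and that a mu_val triggers the mu_error report before the level is decreased. The dictionary representation of glue, the warnings list, and the name coerce_down are all hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    INT_VAL = 0
    DIMEN_VAL = 1
    GLUE_VAL = 2
    MU_VAL = 3
    IDENT_VAL = 4
    TOK_VAL = 5


def coerce_down(value, value_level, level, warnings):
    """Lower value_level step by step until it does not exceed level."""
    while value_level > level:
        if value_level == Level.GLUE_VAL:
            value = value["width"]  # assumed: glue degrades to its width
        elif value_level == Level.MU_VAL:
            warnings.append("Incompatible glue units")  # stands in for mu_error
        value_level = Level(value_level - 1)
    return value, value_level
```

The sketch is meant for values at glue_val, mu_val, or below; font identifiers and token lists are produced only when the caller asked for tok_val, so they never enter the loop.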

448.

⟦448 Fetch a character code from some table⟧ = ⟦
    {
        scan_usv_num;
        if (m == math_code_base) {
            cur_val1 = ho(math_code(cur_val));
            if (is_active_math_char(cur_val1)) {
                cur_val1 = 0x8000;
            } else if (
                (math_class_field(cur_val1) > 7)
                || (math_fam_field(cur_val1) > 15)
                || (math_char_field(cur_val1) > 255)
            ) {
                print_err(
                  strpool!("Extended mathchar used as mathchar"),
                );
                help2(
                  strpool!("A mathchar number must be between 0 and \"7FFF."),
                )(strpool!("I changed this one to zero."));
                int_error(cur_val1);
                cur_val1 = 0;
            }
            cur_val1 = 
                (math_class_field(cur_val1) * 0x1000)
                + (math_fam_field(cur_val1) * 0x100)
                + math_char_field(cur_val1)
            ;
            scanned_result(cur_val1)(int_val);
        } else if (m == del_code_base) {
            cur_val1 = del_code(cur_val);
            if (cur_val1 >= 0x40000000) {
                print_err(
                  strpool!("Extended delcode used as delcode"),
                );
                help2(
                  strpool!("A delimiter code must be between 0 and \"7FFFFFF."),
                )(strpool!("I changed this one to zero."));
                error;
                scanned_result(0)(int_val);
            } else {
                scanned_result(cur_val1)(int_val);
            }
        } else if (m < sf_code_base) {
            scanned_result(equiv(m + cur_val))(int_val);
        } else if (m < math_code_base) {
            scanned_result(equiv(m + cur_val) % 0x10000)(
              int_val,
            );
        } else {
            scanned_result(eqtb[m + cur_val].int)(int_val);
        }
    }
⟧
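The packing arithmetic used for \mathcode values above can be checked with a small Python sketch. It covers only the range test and the final combination of class, family, and character; the active-math-character case and the error report are left out, and pack_mathchar is a hypothetical name.

```python
def pack_mathchar(math_class, fam, char):
    """Combine class, family and character into a classic mathchar number.

    Out-of-range fields yield 0, as in the error branch of the program.
    """
    if math_class > 7 or fam > 15 or char > 255:
        return 0  # "Extended mathchar used as mathchar"
    return math_class * 0x1000 + fam * 0x100 + char
```

The largest value this can produce is 0x7FFF (class 7, family 15, character 255), which agrees with the bound quoted in the help message.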

449.

⟦449 Fetch a token list or font identifier, provided that |level=tok_val|⟧ = ⟦
    if (level != tok_val) {
        print_err(
          strpool!("Missing number, treated as zero"),
        );
        help3(
          strpool!("A number should have been here; I inserted `0'."),
        )(
          strpool!("(If you can't figure out why I needed to see a number,"),
        )(
          strpool!("look up `weird error' in the index to The TeXbook.)"),
        );
        back_error;
        scanned_result(0)(dimen_val);
    } else if (cur_cmd <= assign_toks) {
        //  cur_cmd == toks_register 
        if (cur_cmd < assign_toks) {
            if (m == mem_bot) {
                scan_register_num;
                if (cur_val < 256) {
                    cur_val = equiv(toks_base + cur_val);
                } else {
                    find_sa_element(
                      tok_val,
                      cur_val,
                      false,
                    );
                    if (cur_ptr == null) {
                        cur_val = null;
                    } else {
                        cur_val = sa_ptr(cur_ptr);
                    }
                }
            } else {
                cur_val = sa_ptr(m);
            }
        } else if (cur_chr == XeTeX_inter_char_loc) {
            scan_char_class_not_ignored;
            cur_ptr = cur_val;
            scan_char_class_not_ignored;
            find_sa_element(
              inter_char_val,
              cur_ptr * char_class_limit + cur_val,
              false,
            );
            if (cur_ptr == null) {
                cur_val = null;
            } else {
                cur_val = sa_ptr(cur_ptr);
            }
        } else {
            cur_val = equiv(m);
        }
        cur_val_level = tok_val;
    } else {
        back_input;
        scan_font_ident;
        scanned_result(font_id_base + cur_val)(ident_val);
    }
⟧

450. Users refer to ‘\the\spacefactor’ only in horizontal mode, and to ‘\the\prevdepth’ only in vertical mode; so we put the associated mode in the modifier part of the set_aux command. The set_page_int command has modifier 0 or 1, for ‘\deadcycles’ and ‘\insertpenalties’, respectively. The set_box_dimen command is modified by either width_offset, height_offset, or depth_offset. And the last_item command is modified by either int_val, dimen_val, glue_val, input_line_no_code, or badness_code. 𝜀-TEX inserts last_node_type_code after glue_val and adds the codes for its extensions: eTeX_version_code, … .

// code for \.{\\lastnodetype}
@define last_node_type_code => glue_val + 1
// code for \.{\\inputlineno}
@define input_line_no_code => glue_val + 2
// code for \.{\\badness}
@define badness_code => input_line_no_code + 1
// base for \pdfTeX's command codes
@define pdftex_first_rint_code => badness_code + 1
// code for \.{\\pdflastxpos}
@define pdf_last_x_pos_code => pdftex_first_rint_code + 6
// code for \.{\\pdflastypos}
@define pdf_last_y_pos_code => pdftex_first_rint_code + 7
// code for \.{\\elapsedtime}
@define elapsed_time_code => pdftex_first_rint_code + 10
// code for \.{\\shellescape}
@define pdf_shell_escape_code => pdftex_first_rint_code + 11
// code for \.{\\randomseed}
@define random_seed_code => pdftex_first_rint_code + 12
// end of \pdfTeX's command codes
@define pdftex_last_item_codes => pdftex_first_rint_code + 12
// first of \eTeX\ codes for integers
@define eTeX_int => pdftex_last_item_codes + 1
// base for \XeTeX's command codes
@define XeTeX_int => eTeX_int + 8
// code for \.{\\XeTeXversion}
@define XeTeX_version_code => XeTeX_int + 0
// code for \.{\\XeTeXcountglyphs}
@define XeTeX_count_glyphs_code => XeTeX_int + 1
// Deprecated
@define XeTeX_count_variations_code => XeTeX_int + 2
// Deprecated
@define XeTeX_variation_code => XeTeX_int + 3
// Deprecated
@define XeTeX_find_variation_by_name_code => XeTeX_int + 4
// Deprecated
@define XeTeX_variation_min_code => XeTeX_int + 5
// Deprecated
@define XeTeX_variation_max_code => XeTeX_int + 6
// Deprecated
@define XeTeX_variation_default_code => XeTeX_int + 7
// code for \.{\\XeTeXcountfeatures}
@define XeTeX_count_features_code => XeTeX_int + 8
// code for \.{\\XeTeXfeaturecode}
@define XeTeX_feature_code_code => XeTeX_int + 9
// code for \.{\\XeTeXfindfeaturebyname}
@define XeTeX_find_feature_by_name_code => XeTeX_int + 10
// code for \.{\\XeTeXisexclusivefeature}
@define XeTeX_is_exclusive_feature_code => XeTeX_int + 11
// code for \.{\\XeTeXcountselectors}
@define XeTeX_count_selectors_code => XeTeX_int + 12
// code for \.{\\XeTeXselectorcode}
@define XeTeX_selector_code_code => XeTeX_int + 13
// code for \.{\\XeTeXfindselectorbyname}
@define XeTeX_find_selector_by_name_code => XeTeX_int + 14
// code for \.{\\XeTeXisdefaultselector}
@define XeTeX_is_default_selector_code => XeTeX_int + 15
// code for \.{\\XeTeXOTcountscripts}
@define XeTeX_OT_count_scripts_code => XeTeX_int + 16
// code for \.{\\XeTeXOTcountlanguages}
@define XeTeX_OT_count_languages_code => XeTeX_int + 17
// code for \.{\\XeTeXOTcountfeatures}
@define XeTeX_OT_count_features_code => XeTeX_int + 18
// code for \.{\\XeTeXOTscripttag}
@define XeTeX_OT_script_code => XeTeX_int + 19
// code for \.{\\XeTeXOTlanguagetag}
@define XeTeX_OT_language_code => XeTeX_int + 20
// code for \.{\\XeTeXOTfeaturetag}
@define XeTeX_OT_feature_code => XeTeX_int + 21
// code for \.{\\XeTeXcharglyph}
@define XeTeX_map_char_to_glyph_code => XeTeX_int + 22
// code for \.{\\XeTeXglyphindex}
@define XeTeX_glyph_index_code => XeTeX_int + 23
// code for \.{\\XeTeXfonttype}
@define XeTeX_font_type_code => XeTeX_int + 24
// code for \.{\\XeTeXfirstfontchar}
@define XeTeX_first_char_code => XeTeX_int + 25
// code for \.{\\XeTeXlastfontchar}
@define XeTeX_last_char_code => XeTeX_int + 26
// code for \.{\\XeTeXpdfpagecount}
@define XeTeX_pdf_page_count_code => XeTeX_int + 27
// end of \XeTeX's command codes
@define XeTeX_last_item_codes => XeTeX_int + 27
// first of \XeTeX\ codes for dimensions
@define XeTeX_dim => XeTeX_last_item_codes + 1
// code for \.{\\XeTeXglyphbounds}
@define XeTeX_glyph_bounds_code => XeTeX_dim + 0
// end of \XeTeX's command codes
@define XeTeX_last_dim_codes => XeTeX_dim + 0
// first of \eTeX\ codes for dimensions
@define eTeX_dim => XeTeX_last_dim_codes + 1
// first of \eTeX\ codes for glue
@define eTeX_glue => eTeX_dim + 9
// first of \eTeX\ codes for muglue
@define eTeX_mu => eTeX_glue + 1
// first of \eTeX\ codes for expressions
@define eTeX_expr => eTeX_mu + 1
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("spacefactor"), set_aux, hmode)

    primitive(strpool!("prevdepth"), set_aux, vmode)

    primitive(strpool!("deadcycles"), set_page_int, 0)

    primitive(strpool!("insertpenalties"), set_page_int, 1)

    primitive(strpool!("wd"), set_box_dimen, width_offset)

    primitive(strpool!("ht"), set_box_dimen, height_offset)

    primitive(strpool!("dp"), set_box_dimen, depth_offset)

    primitive(strpool!("lastpenalty"), last_item, int_val)

    primitive(strpool!("lastkern"), last_item, dimen_val)

    primitive(strpool!("lastskip"), last_item, glue_val)

    primitive(
      strpool!("inputlineno"),
      last_item,
      input_line_no_code,
    )

    primitive(strpool!("badness"), last_item, badness_code)

    primitive(
      strpool!("pdflastxpos"),
      last_item,
      pdf_last_x_pos_code,
    )

    primitive(
      strpool!("pdflastypos"),
      last_item,
      pdf_last_y_pos_code,
    )

    primitive(
      strpool!("elapsedtime"),
      last_item,
      elapsed_time_code,
    )

    primitive(
      strpool!("shellescape"),
      last_item,
      pdf_shell_escape_code,
    )

    primitive(
      strpool!("randomseed"),
      last_item,
      random_seed_code,
    )
⟧

451.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    set_aux:

    if (chr_code == vmode) {
        print_esc(strpool!("prevdepth"));
    } else {
        print_esc(strpool!("spacefactor"));
    }

    set_page_int:

    if (chr_code == 0) {
        print_esc(strpool!("deadcycles"));
    }

    ⟦1503 Cases of |set_page_int| for |print_cmd_chr|⟧

    else

    print_esc(strpool!("insertpenalties"))

    set_box_dimen:

    if (chr_code == width_offset) {
        print_esc(strpool!("wd"));
    } else if (chr_code == height_offset) {
        print_esc(strpool!("ht"));
    } else {
        print_esc(strpool!("dp"));
    }

    last_item:

    case chr_code {
      int_val:
        print_esc(strpool!("lastpenalty"));
      dimen_val:
        print_esc(strpool!("lastkern"));
      glue_val:
        print_esc(strpool!("lastskip"));
      input_line_no_code:
        print_esc(strpool!("inputlineno"));
      ⟦1453 Cases of |last_item| for |print_cmd_chr|⟧
      pdf_last_x_pos_code:
        print_esc(strpool!("pdflastxpos"));
      pdf_last_y_pos_code:
        print_esc(strpool!("pdflastypos"));
      elapsed_time_code:
        print_esc(strpool!("elapsedtime"));
      pdf_shell_escape_code:
        print_esc(strpool!("shellescape"));
      random_seed_code:
        print_esc(strpool!("randomseed"));
      othercases:
        print_esc(strpool!("badness"));
    }
⟧

452.

⟦452 Fetch the |space_factor| or the |prev_depth|⟧ = ⟦
    if (abs(mode) != m) {
        print_err(strpool!("Improper "));
        print_cmd_chr(set_aux, m);
        help4(
          strpool!("You can refer to \\spacefactor only in horizontal mode;"),
        )(
          strpool!("you can refer to \\prevdepth only in vertical mode; and"),
        )(
          strpool!("neither of these is meaningful inside \\write. So"),
        )(
          strpool!("I'm forgetting what you said and using zero instead."),
        );
        error;
        if (level != tok_val) {
            scanned_result(0)(dimen_val);
        } else {
            scanned_result(0)(int_val);
        }
    } else if (m == vmode) {
        scanned_result(prev_depth)(dimen_val);
    } else {
        scanned_result(space_factor)(int_val);
    }
⟧

453.

⟦453 Fetch the |dead_cycles| or the |insert_penalties|⟧ = ⟦
    begin

    if (m == 0) {
        cur_val = dead_cycles;
    }

    ⟦1504 Cases for `Fetch the |dead_cycles| or the |insert_penalties|'⟧

    else

    cur_val = insert_penalties

    cur_val_level = int_val

    end
⟧

454.

⟦454 Fetch a box dimension⟧ = ⟦
    {
        scan_register_num;
        fetch_box(q);
        if (q == null) {
            cur_val = 0;
        } else {
            cur_val = mem[q + m].sc;
        }
        cur_val_level = dimen_val;
    }
⟧

455. Inside an \output routine, a user may wish to look at the page totals that were present at the moment when output was triggered.

@define max_dimen => 0x3fffffff // $2^{30}-1$
⟦455 Fetch something on the |page_so_far|⟧ = ⟦
    {
        if ((page_contents == empty) && (!output_active)) {
            if (m == 0) {
                cur_val = max_dimen;
            } else {
                cur_val = 0;
            }
        } else {
            cur_val = page_so_far[m];
        }
        cur_val_level = dimen_val;
    }
⟧

456.

⟦456 Fetch the |prev_graf|⟧ = ⟦
    if (mode == 0) {
        //  prev_graf == 0 within \.{\\write}
        scanned_result(0)(int_val);
    } else {
        nest[nest_ptr] = cur_list;
        p = nest_ptr;
        while (abs(nest[p].mode_field) != vmode) {
            decr(p);
        }
        scanned_result(nest[p].pg_field)(int_val);
    }
⟧

457.

⟦457 Fetch the |par_shape| size⟧ = ⟦
    {
        if (m > par_shape_loc) {
            ⟦1677 Fetch a penalties array element⟧
        } else if (par_shape_ptr == null) {
            cur_val = 0;
        } else {
            cur_val = info(par_shape_ptr);
        }
        cur_val_level = int_val;
    }
⟧

458. Here is where \lastpenalty, \lastkern, \lastskip, and \lastnodetype are implemented. The reference count for \lastskip will be updated later.

We also handle \inputlineno and \badness here, because they are legal in similar contexts.

The macro find_effective_tail_eTeX sets tx to the last non-\endM node of the current list.

@define find_effective_tail_eTeX =>
    tx = tail;
    if (!is_char_node(tx)) {
        if (
            (type(tx) == math_node)
            && (subtype(tx) == end_M_code)
        ) {
            r = head;
            repeat {
                q = r;
                r = link(q);
            } until (r == tx);
            tx = q;
        }
    }
@define find_effective_tail => find_effective_tail_eTeX
⟦458 Fetch an item in the current node, if appropriate⟧ = ⟦
    if (m >= input_line_no_code) {
        if (m >= eTeX_glue) {
            ⟦1591 Process an expression and |return|⟧
        } else if (m >= XeTeX_dim) {
            case m {
              ⟦1458 Cases for fetching a dimension value⟧
              // there are no other cases
            }
            cur_val_level = dimen_val;
        } else {
            case m {
              input_line_no_code:
                cur_val = line;
              badness_code:
                cur_val = last_badness;
              elapsed_time_code:
                cur_val = get_microinterval;
              random_seed_code:
                cur_val = random_seed;
              pdf_shell_escape_code:
                if (shellenabledp) {
                    if (restrictedshell) {
                        cur_val = 2;
                    } else {
                        cur_val = 1;
                    }
                } else {
                    cur_val = 0;
                }
              ⟦1454 Cases for fetching an integer value⟧
              // there are no other cases
            }
            cur_val_level = int_val;
        }
    } else {
        if (cur_chr == glue_val) {
            cur_val = zero_glue;
        } else {
            cur_val = 0;
        }
        find_effective_tail;
        if (cur_chr == last_node_type_code) {
            cur_val_level = int_val;
            if ((tx == head) || (mode == 0)) {
                cur_val = -1;
            }
        } else {
            cur_val_level = cur_chr;
        }
        if (!is_char_node(tx) && (mode != 0)) {
            case cur_chr {
              int_val:
                if (type(tx) == penalty_node) {
                    cur_val = penalty(tx);
                }
              dimen_val:
                if (type(tx) == kern_node) {
                    cur_val = width(tx);
                }
              glue_val:
                if (type(tx) == glue_node) {
                    cur_val = glue_ptr(tx);
                    if (subtype(tx) == mu_glue) {
                        cur_val_level = mu_val;
                    }
                }
              last_node_type_code:
                if (type(tx) <= unset_node) {
                    cur_val = type(tx) + 1;
                } else {
                    cur_val = unset_node + 2;
                } // there are no other cases
            }
        } else if ((mode == vmode) && (tx == head)) {
            case cur_chr {
              int_val:
                cur_val = last_penalty;
              dimen_val:
                cur_val = last_kern;
              glue_val:
                if (last_glue != max_halfword) {
                    cur_val = last_glue;
                }
              last_node_type_code:
                cur_val = last_node_type;
              // there are no other cases
            }
        }
    }
⟧
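
The outer tests in this section work only because the modifier codes of §450 are laid out in ascending blocks. The following Python sketch recomputes that chain and reproduces the range test. It assumes the value levels int_val = 0, dimen_val = 1, glue_val = 2, which are defined elsewhere in the program; the function name `classify` and its string labels are hypothetical and serve only as illustration.

```python
# Value levels, assumed from definitions outside this section.
INT_VAL, DIMEN_VAL, GLUE_VAL = 0, 1, 2

# Modifier codes, following the @define chain of section 450.
LAST_NODE_TYPE_CODE = GLUE_VAL + 1
INPUT_LINE_NO_CODE = GLUE_VAL + 2
BADNESS_CODE = INPUT_LINE_NO_CODE + 1
PDFTEX_FIRST_RINT_CODE = BADNESS_CODE + 1
PDFTEX_LAST_ITEM_CODES = PDFTEX_FIRST_RINT_CODE + 12
ETEX_INT = PDFTEX_LAST_ITEM_CODES + 1
XETEX_INT = ETEX_INT + 8
XETEX_LAST_ITEM_CODES = XETEX_INT + 27
XETEX_DIM = XETEX_LAST_ITEM_CODES + 1
XETEX_LAST_DIM_CODES = XETEX_DIM + 0
ETEX_DIM = XETEX_LAST_DIM_CODES + 1
ETEX_GLUE = ETEX_DIM + 9


def classify(m):
    """Name the branch of section 458 that handles modifier m (hypothetical helper)."""
    if m >= INPUT_LINE_NO_CODE:
        if m >= ETEX_GLUE:
            return "section 1591"      # glue, muglue, and expression codes
        if m >= XETEX_DIM:
            return "dimension"         # result level becomes dimen_val
        return "integer"               # result level becomes int_val
    return "last item"                 # \lastpenalty, \lastkern, \lastskip, \lastnodetype
```

Under these assumptions `classify(BADNESS_CODE)` yields `"integer"` and `classify(XETEX_DIM)` yields `"dimension"`; inserting a new code in the middle of a block would shift every later boundary, which is why the definitions are chained rather than written as literal numbers.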

459.

⟦459 Fetch a font dimension⟧ = ⟦
    {
        find_font_dimen(false);
        font_info[fmem_ptr].sc = 0;
        scanned_result(font_info[cur_val].sc)(dimen_val);
    }
⟧

460.

⟦460 Fetch a font integer⟧ = ⟦
    {
        scan_font_ident;
        if (m == 0) {
            scanned_result(hyphen_char[cur_val])(int_val);
        } else if (m == 1) {
            scanned_result(skew_char[cur_val])(int_val);
        } else {
            n = cur_val;
            if (is_native_font(n)) {
                scan_glyph_number(n);
            } else {
                scan_char_num;
            }
            k = cur_val;
            case m {
              lp_code_base:
                scanned_result(
                  get_cp_code(n, k, left_side),
                )(int_val);
              rp_code_base:
                scanned_result(
                  get_cp_code(n, k, right_side),
                )(int_val);
            }
        }
    }
⟧

461.

⟦461 Fetch a register⟧ = ⟦
    {
        if ((m < mem_bot) || (m > lo_mem_stat_max)) {
            cur_val_level = sa_type(m);
            if (cur_val_level < glue_val) {
                cur_val = sa_int(m);
            } else {
                cur_val = sa_ptr(m);
            }
        } else {
            scan_register_num;
            cur_val_level = m - mem_bot;
            if (cur_val > 255) {
                find_sa_element(
                  cur_val_level,
                  cur_val,
                  false,
                );
                if (cur_ptr == null) {
                    if (cur_val_level < glue_val) {
                        cur_val = 0;
                    } else {
                        cur_val = zero_glue;
                    }
                } else if (cur_val_level < glue_val) {
                    cur_val = sa_int(cur_ptr);
                } else {
                    cur_val = sa_ptr(cur_ptr);
                }
            } else {
                case cur_val_level {
                  int_val:
                    cur_val = count(cur_val);
                  dimen_val:
                    cur_val = dimen(cur_val);
                  glue_val:
                    cur_val = skip(cur_val);
                  mu_val:
                    cur_val = mu_skip(cur_val);
                  // there are no other cases
                }
            }
        }
    }
⟧

462.

⟦462 Complain that \.{\\the} can't do this; give zero result⟧ = ⟦
    {
        print_err(strpool!("You can't use `"));
        print_cmd_chr(cur_cmd, cur_chr);
        print(strpool!("' after "));
        print_esc(strpool!("the"));
        help1(
          strpool!("I'm forgetting what you said and using zero instead."),
        );
        error;
        if (level != tok_val) {
            scanned_result(0)(dimen_val);
        } else {
            scanned_result(0)(int_val);
        }
    }
⟧

463. When a glue_val changes to a dimen_val, we use the width component of the glue; there is no need to decrease the reference count, since it has not yet been increased. When a dimen_val changes to an int_val, we use scaled points so that the value doesn’t actually change. And when a mu_val changes to a glue_val, the value doesn’t change either.

⟦463 Convert \(c)|cur_val| to a lower level⟧ = ⟦
    {
        if (cur_val_level == glue_val) {
            cur_val = width(cur_val);
        } else if (cur_val_level == mu_val) {
            mu_error;
        }
        decr(cur_val_level);
    }
⟧
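
The three conversions described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the levels are numbered int_val = 0 through mu_val = 3, the `GlueSpec` class and the `errors` list are hypothetical stand-ins for a glue specification node and for the mu_error routine, and the loop in `coerce` assumes that the caller repeats the step until the requested level is reached.

```python
from dataclasses import dataclass

# Value levels, assumed from definitions outside this section.
INT_VAL, DIMEN_VAL, GLUE_VAL, MU_VAL = 0, 1, 2, 3


@dataclass
class GlueSpec:
    """Hypothetical stand-in for a glue specification (all fields in scaled points)."""
    width: int
    stretch: int
    shrink: int


def lower_level(value, level, errors):
    """One conversion step: return (value, level) at the next lower level."""
    if level == GLUE_VAL:
        value = value.width                 # glue -> dimen keeps only the width
    elif level == MU_VAL:
        errors.append("mu_error")           # mu -> glue: value unchanged, error reported
    # dimen -> int: the scaled value is reused as an integer, unchanged
    return value, level - 1


def coerce(value, level, target, errors):
    """Repeat the step until the requested level is reached (assumed caller loop)."""
    while level > target:
        value, level = lower_level(value, level, errors)
    return value, level
```

For example, a glue value whose width is 12pt (786432sp) coerced to int_val gives the integer 786432.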

464. If cur_val points to a glue specification at this point, the reference count for the glue does not yet include the reference by cur_val. If negative is true, cur_val_level is known to be <= mu_val.

⟦464 Fix the reference count, if any, and negate |cur_val| if |negative|⟧ = ⟦
    if (negative) {
        if (cur_val_level >= glue_val) {
            cur_val = new_spec(cur_val);
            ⟦465 Negate all three glue components of |cur_val|⟧
        } else {
            negate(cur_val);
        }
    } else if (
        (cur_val_level >= glue_val)
        && (cur_val_level <= mu_val)
    ) {
        add_glue_ref(cur_val);
    }
⟧

465.

⟦465 Negate all three glue components of |cur_val|⟧ = ⟦
    {
        negate(width(cur_val));
        negate(stretch(cur_val));
        negate(shrink(cur_val));
    }
⟧

466. Our next goal is to write the scan_int procedure, which scans anything that TEX treats as an integer. But first we might as well look at some simple applications of scan_int that have already been made inside of scan_something_internal.

467.

⟦467 Declare procedures that scan restricted classes of integers⟧ = ⟦
    // scan a glyph ID for native font f, identified by
    // Unicode value or name or glyph number
    function scan_glyph_number(f: internal_font_number) {
        // set cp value by glyph name
        if (scan_keyword(ord!("/"))) {
            // result is in nameoffile 
            scan_and_pack_name;
            scanned_result(map_glyph_to_index(f))(int_val);
        } else // set cp value by unicode
        if (scan_keyword(ord!("u"))) {
            scan_char_num;
            scanned_result(map_char_to_glyph(f, cur_val))(
              int_val,
            );
        } else {
            scan_int;
        }
    }

    function scan_char_class() {
        scan_int;
        if ((cur_val < 0) || (cur_val > char_class_limit)) {
            print_err(strpool!("Bad character class"));
            help2(
              strpool!("A character class must be between 0 and 4096."),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }

    function scan_char_class_not_ignored() {
        scan_int;
        if ((cur_val < 0) || (cur_val > char_class_limit)) {
            print_err(strpool!("Bad character class"));
            help2(
              strpool!("A class for inter-character transitions must be between 0 and 4095."),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }

    function scan_eight_bit_int() {
        scan_int;
        if ((cur_val < 0) || (cur_val > 255)) {
            print_err(strpool!("Bad register code"));
            help2(
              strpool!("A register code or char class must be between 0 and 255."),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }
⟧

468.

⟦467 Declare procedures that scan restricted classes of integers⟧ += ⟦
    function scan_usv_num() {
        scan_int;
        if ((cur_val < 0) || (cur_val > biggest_usv)) {
            print_err(strpool!("Bad character code"));
            help2(
              strpool!("A Unicode scalar value must be between 0 and \"10FFFF."),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }

    function scan_char_num() {
        scan_int;
        if ((cur_val < 0) || (cur_val > biggest_char)) {
            print_err(strpool!("Bad character code"));
            help2(
              strpool!("A character number must be between 0 and 65535."),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }
⟧

469. While we’re at it, we might as well deal with similar routines that will be needed later.

⟦467 Declare procedures that scan restricted classes of integers⟧ += ⟦
    function scan_xetex_math_char_int() {
        scan_int;
        if (is_active_math_char(cur_val)) {
            if (cur_val != active_math_char) {
                print_err(
                  strpool!("Bad active XeTeX math code"),
                );
                help2(
                  strpool!("Since I ignore class and family for active math chars,"),
                )(
                  strpool!("I changed this one to \"1FFFFF."),
                );
                int_error(cur_val);
                cur_val = active_math_char;
            }
        } else if (math_char_field(cur_val) > biggest_usv) {
            print_err(
              strpool!("Bad XeTeX math character code"),
            );
            help2(
              strpool!("Since I expected a character number between 0 and \"10FFFF,"),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }

    function scan_math_class_int() {
        scan_int;
        if ((cur_val < 0) || (cur_val > 7)) {
            print_err(strpool!("Bad math class"));
            help2(
              strpool!("Since I expected to read a number between 0 and 7,"),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }

    function scan_math_fam_int() {
        scan_int;
        if (
            (cur_val < 0)
            || (cur_val > number_math_families - 1)
        ) {
            print_err(strpool!("Bad math family"));
            help2(
              strpool!("Since I expected to read a number between 0 and 255,"),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }

    function scan_four_bit_int() {
        scan_int;
        if ((cur_val < 0) || (cur_val > 15)) {
            print_err(strpool!("Bad number"));
            help2(
              strpool!("Since I expected to read a number between 0 and 15,"),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }
⟧

470.

⟦467 Declare procedures that scan restricted classes of integers⟧ += ⟦
    function scan_fifteen_bit_int() {
        scan_int;
        if ((cur_val < 0) || (cur_val > 0x7fff)) {
            print_err(strpool!("Bad mathchar"));
            help2(
              strpool!("A mathchar number must be between 0 and 32767."),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }
⟧

471.

⟦467 Declare procedures that scan restricted classes of integers⟧ += ⟦
    function scan_delimiter_int() {
        scan_int;
        if ((cur_val < 0) || (cur_val > 0x7ffffff)) {
            print_err(strpool!("Bad delimiter code"));
            help2(
              strpool!("A numeric delimiter code must be between 0 and 2^{27}-1."),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }
⟧
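
All of the restricted scanners in §467 through §471 share one shape: scan an integer, test it against a closed range starting at zero, and on failure report the offending value and substitute zero. A minimal Python sketch of that shared pattern follows; the helper name `scan_bounded`, its parameters, and the `errors` list are hypothetical and do not appear in the program, which instead writes each procedure out separately so that each can carry its own help text.

```python
def scan_bounded(value, upper, what, errors):
    """Return value if 0 <= value <= upper; otherwise record an error and return 0.

    `value` stands for the result of scan_int, `upper` for the inclusive limit,
    and `what` for the noun used in the error message.
    """
    if value < 0 or value > upper:
        errors.append(f"Bad {what} ({value})")
        return 0
    return value


def scan_four_bit_int(value, errors):
    """Illustrative counterpart of scan_four_bit_int: limit 15."""
    return scan_bounded(value, 15, "number", errors)


def scan_fifteen_bit_int(value, errors):
    """Illustrative counterpart of scan_fifteen_bit_int: limit 0x7fff."""
    return scan_bounded(value, 0x7FFF, "mathchar", errors)
```

Note that the limit is inclusive in every case, so a scanner whose limit is 15 accepts exactly sixteen values.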

472. An integer number can be preceded by any number of spaces and ‘+’ or ‘-’ signs. Then comes either a decimal constant (i.e., radix 10), an octal constant (i.e., radix 8, preceded by '), a hexadecimal constant (radix 16, preceded by "), an alphabetic constant (preceded by `), or an internal variable. After scanning is complete, cur_val will contain the answer, which must be at most $2^{31}-1=2147483647$ in absolute value. The value of radix is set to 10, 8, or 16 in the cases of decimal, octal, or hexadecimal constants, otherwise radix is set to zero. An optional space follows a constant.

// apostrophe, indicates an octal constant
@define octal_token => other_token + ord!("'")
// double quote, indicates a hex constant
@define hex_token => other_token + ord!("\"")
// reverse apostrophe, precedes alpha constants
@define alpha_token => other_token + ord!("`")
// decimal point
@define point_token => other_token + ord!(".")
// decimal point, Eurostyle
@define continental_point_token => other_token + ord!(",")
⟦13 Global variables⟧ += ⟦
    //  scan_int sets this to 8, 10, 16, or zero
    var radix: small_number;
⟧

473. We initialize the following global variables just in case expand comes into action before any of the basic scanning routines has assigned them a value.

⟦23 Set initial values of key variables⟧ += ⟦
    cur_val = 0

    cur_val_level = int_val

    radix = 0

    cur_order = normal
⟧

474. The scan_int routine is used also to scan the integer part of a fraction; for example, the ‘3’ in ‘3.14159’ will be found by scan_int. The scan_dimen routine assumes that cur_tok == point_token after the integer part of such a fraction has been scanned by scan_int, and that the decimal point has been backed up to be scanned again.

// sets cur_val to an integer
function scan_int() {
    label done, restart;
    var
      negative: boolean, // should the answer be negated?
      m: integer, // $2^{31}$ div radix, the threshold of danger
      d: small_number, // the digit just scanned
      vacuous: boolean, // have no digits appeared?
      OK_so_far: boolean; // has an error message been issued?
    
    radix = 0;
    OK_so_far = true;
    ⟦475 Get the next non-blank non-sign token; set |negative| appropriately⟧
  restart:
    if (cur_tok == alpha_token) {
        ⟦476 Scan an alphabetic character code into |cur_val|⟧
    } else if (cur_tok == cs_token_flag + frozen_primitive) {
        ⟦403 Reset |cur_tok| for unexpandable primitives, goto restart⟧
    } else if (
        (cur_cmd >= min_internal)
        && (cur_cmd <= max_internal)
    ) {
        scan_something_internal(int_val, false);
    } else {
        ⟦478 Scan a numeric constant⟧
    }
    if (negative) {
        negate(cur_val);
    }
}

475.

⟦475 Get the next non-blank non-sign token; set |negative| appropriately⟧ = ⟦
    negative = false

    repeat {
        ⟦440 Get the next non-blank non-call token⟧
        if (cur_tok == other_token + ord!("-")) {
            negative = !negative;
            cur_tok = other_token + ord!("+");
        }
    } until (cur_tok != other_token + ord!("+"))
⟧
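
The loop above treats every ‘-’ as a toggle and then disguises it as a ‘+’ so that a single exit test suffices. A minimal Python sketch of the same idea over a plain string follows; it assumes that each character is one token and ignores macro expansion, and the name `scan_signs` is hypothetical.

```python
def scan_signs(text):
    """Skip leading blanks and signs; return (negative, rest_of_text).

    Each '-' flips the sign, each '+' and each blank is skipped, so only
    the parity of the number of minus signs matters.
    """
    negative = False
    i = 0
    while i < len(text) and text[i] in (" ", "+", "-"):
        if text[i] == "-":
            negative = not negative
        i += 1
    return negative, text[i:]
```

Thus an input such as ‘- -+ 7’ leaves negative false, because the two minus signs cancel.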

476. A space is ignored after an alphabetic character constant, so that such constants behave like numeric ones.

⟦476 Scan an alphabetic character code into |cur_val|⟧ = ⟦
    {
        // suppress macro expansion
        get_token;
        if (cur_tok < cs_token_flag) {
            cur_val = cur_chr;
            if (cur_cmd <= right_brace) {
                if (cur_cmd == right_brace) {
                    incr(align_state);
                } else {
                    decr(align_state);
                }
            }
        } else if (cur_tok < cs_token_flag + single_base) {
            cur_val = cur_tok - cs_token_flag - active_base;
        } else {
            cur_val = cur_tok - cs_token_flag - single_base;
        }
        if (cur_val > biggest_usv) {
            print_err(
              strpool!("Improper alphabetic constant"),
            );
            help2(
              strpool!("A one-character control sequence belongs after a ` mark."),
            )(
              strpool!("So I'm essentially inserting \\0 here."),
            );
            cur_val = ord!("0");
            back_error;
        } else {
            ⟦477 Scan an optional space⟧
        }
    }
⟧

477.

⟦477 Scan an optional space⟧ = ⟦
    {
        get_x_token;
        if (cur_cmd != spacer) {
            back_input;
        }
    }
⟧

478.

⟦478 Scan a numeric constant⟧ = ⟦
    {
        radix = 10;
        m = 214748364;
        if (cur_tok == octal_token) {
            radix = 8;
            m = 0x10000000;
            get_x_token;
        } else if (cur_tok == hex_token) {
            radix = 16;
            m = 0x8000000;
            get_x_token;
        }
        vacuous = true;
        cur_val = 0;
        ⟦479 Accumulate the constant until |cur_tok| is not a suitable digit⟧
        if (vacuous) {
            ⟦480 Express astonishment that no number was here⟧
        } else if (cur_cmd != spacer) {
            back_input;
        }
    }
⟧

479.

// the largest positive value that \TeX\ knows
@define infinity => 0x7fffffff
// zero, the smallest digit
@define zero_token => other_token + ord!("0")
// the smallest special hex digit
@define A_token => letter_token + ord!("A")
// special hex digit of type other_char 
@define other_A_token => other_token + ord!("A")
⟦479 Accumulate the constant until |cur_tok| is not a suitable digit⟧ = ⟦
    loop {
        if (
            (cur_tok < zero_token + radix)
            && (cur_tok >= zero_token)
            && (cur_tok <= zero_token + 9)
        ) {
            d = cur_tok - zero_token;
        } else if (radix == 16) {
            if (
                (cur_tok <= A_token + 5)
                && (cur_tok >= A_token)
            ) {
                d = cur_tok - A_token + 10;
            } else if (
                (cur_tok <= other_A_token + 5)
                && (cur_tok >= other_A_token)
            ) {
                d = cur_tok - other_A_token + 10;
            } else {
                goto done;
            }
        } else {
            goto done;
        }
        vacuous = false;
        if (
            (cur_val >= m)
            && ((cur_val > m) || (d > 7) || (radix != 10))
        ) {
            if (OK_so_far) {
                print_err(strpool!("Number too big"));
                help2(
                  strpool!("I can only go up to 2147483647='17777777777=\"7FFFFFFF,"),
                )(
                  strpool!("so I'm using that number instead of yours."),
                );
                error;
                cur_val = infinity;
                OK_so_far = false;
            }
        } else {
            cur_val = cur_val * radix + d;
        }
        get_x_token;
    }

    done:
⟧
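
The overflow test deserves a second look: the threshold m is $2^{31}$ div radix, and a further digit is safe when cur_val < m, or when cur_val = m in radix 10 and the digit is at most 7 (since 2147483647 ends in 7). In radix 8 and 16 the threshold divides $2^{31}$ exactly, so reaching m already means overflow. The following Python sketch mirrors that test on a list of digit values; the function name `accumulate` and its return convention are hypothetical.

```python
INFINITY = 0x7FFFFFFF  # the largest positive value, as defined in this section


def accumulate(digits, radix):
    """Accumulate digit values the way section 479 does.

    Returns (value, overflowed). After the first overflow the value is
    pinned to INFINITY and all further digits are ignored.
    """
    m = 2**31 // radix  # 214748364, 0x10000000, or 0x8000000
    value = 0
    ok_so_far = True
    for d in digits:
        if value >= m and (value > m or d > 7 or radix != 10):
            if ok_so_far:
                value = INFINITY
                ok_so_far = False
        else:
            value = value * radix + d
    return value, not ok_so_far
```

With this sketch the decimal digits of 2147483647 are accepted, while those of 2147483648 trigger the overflow branch at the last digit.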

480.

⟦480 Express astonishment that no number was here⟧ = ⟦
    {
        print_err(
          strpool!("Missing number, treated as zero"),
        );
        help3(
          strpool!("A number should have been here; I inserted `0'."),
        )(
          strpool!("(If you can't figure out why I needed to see a number,"),
        )(
          strpool!("look up `weird error' in the index to The TeXbook.)"),
        );
        back_error;
    }
⟧

481. The scan_dimen routine is similar to scan_int, but it sets cur_val to a scaled value, i.e., an integral number of sp. One of its main tasks is therefore to interpret the abbreviations for various kinds of units and to convert measurements to scaled points.

There are three parameters: mu is true if the finite units must be ‘mu’, while mu is false if ‘mu’ units are disallowed; inf is true if the infinite units ‘fil’, ‘fill’, ‘filll’ are permitted; and shortcut is true if cur_val already contains an integer and only the units need to be considered.

The order of infinity that was found in the case of infinite glue is returned in the global variable cur_order.

⟦13 Global variables⟧ += ⟦
    // order of infinity found by scan_dimen 
    var cur_order: glue_ord;
⟧

482. Constructions like ‘- 77 pt’ are legal dimensions, so scan_dimen may begin with scan_int. This explains why it is convenient to use scan_int also for the integer part of a decimal fraction.

Several branches of scan_dimen work with cur_val as an integer and with an auxiliary fraction f, so that the actual quantity of interest is $\mathit{cur\_val}+f/2^{16}$. At the end of the routine, this “unpacked” representation is put into the single word cur_val, which suddenly switches significance from integer to scaled.

// go here to pack cur_val and f into cur_val 
@define attach_fraction => 88
// go here when cur_val is correct except perhaps for sign
@define attach_sign => 89
@define scan_normal_dimen => scan_dimen(false, false, false)
// sets cur_val to a dimension
function xetex_scan_dimen(
  mu, inf, shortcut, requires_units: boolean,
) {
    label
        done,
        done1,
        done2,
        found,
        not_found,
        attach_fraction,
        attach_sign;
    var
      negative: boolean, // should the answer be negated?
      f: integer, // numerator of a fraction whose 
      // denominator is $2^{16}$
      ⟦485 Local variables for dimension calculations⟧;
    
    f = 0;
    arith_error = false;
    cur_order = normal;
    negative = false;
    if (!shortcut) {
        ⟦475 Get the next non-blank non-sign token; set |negative| appropriately⟧
        if (
            (cur_cmd >= min_internal)
            && (cur_cmd <= max_internal)
        ) {
            ⟦484 Fetch an internal dimension and |goto attach_sign|, or fetch an internal integer⟧
        } else {
            back_input;
            if (cur_tok == continental_point_token) {
                cur_tok = point_token;
            }
            if (cur_tok != point_token) {
                scan_int;
            } else {
                radix = 10;
                cur_val = 0;
            }
            if (cur_tok == continental_point_token) {
                cur_tok = point_token;
            }
            if ((radix == 10) && (cur_tok == point_token)) {
                ⟦487 Scan decimal fraction⟧
            }
        }
    }
    // in this case f == 0 
    if (cur_val < 0) {
        negative = !negative;
        negate(cur_val);
    }
    if (requires_units) {
        ⟦488 Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$, where there are |x| sp per unit; |goto attach_sign| if the units are internal⟧
        ⟦477 Scan an optional space⟧
    } else {
        if (cur_val >= 0x4000) {
            arith_error = true;
        } else {
            cur_val = cur_val * unity + f;
        }
    }
  attach_sign:
    if (arith_error || (abs(cur_val) >= 0x40000000)) {
        ⟦495 Report that this dimension is out of range⟧
    }
    if (negative) {
        negate(cur_val);
    }
}

function scan_dimen(mu, inf, shortcut: boolean) {
    xetex_scan_dimen(mu, inf, shortcut, true);
}
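
The packing step at attach_fraction and the range check at attach_sign can be illustrated with a small Python sketch. It is not the program's code: the function name attach is hypothetical, and the value of max_dimen (2^30 - 1) is assumed from the standard TeX definitions rather than shown in this section.

```python
UNITY = 0x10000         # 2**16 scaled points per point
MAX_DIMEN = 0x3FFFFFFF  # assumed value of max_dimen, 2**30 - 1


def attach(cur_val, f, negative=False):
    """Pack integer part cur_val and fraction f/2**16 into one scaled value."""
    arith_error = cur_val >= 0x4000  # integer part too large to shift by 16 bits
    if not arith_error:
        cur_val = cur_val * UNITY + f
    if arith_error or abs(cur_val) >= 0x40000000:
        cur_val = MAX_DIMEN  # out of range: substitute the largest dimension
    return -cur_val if negative else cur_val
```

With this sketch, attach(77, 0, negative=True) yields -5046272, the scaled form of ‘- 77 pt’, and any integer part of 16384 or more is clamped to the assumed max_dimen.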

483. For XeTeX, we have an additional version scan_decimal , like scan_dimen but without any scanning of units.

// sets cur_val to a quantity expressed as a decimal 
// fraction
function scan_decimal() {
    xetex_scan_dimen(false, false, false, false);
}

484.

⟦484 Fetch an internal dimension and |goto attach_sign|, or fetch an internal integer⟧ = ⟦
    if (mu) {
        scan_something_internal(mu_val, false);
        ⟦486 Coerce glue to a dimension⟧
        if (cur_val_level == mu_val) {
            goto attach_sign;
        }
        if (cur_val_level != int_val) {
            mu_error;
        }
    } else {
        scan_something_internal(dimen_val, false);
        if (cur_val_level == dimen_val) {
            goto attach_sign;
        }
    }
⟧

485.

⟦485 Local variables for dimension calculations⟧ = ⟦
    // conversion ratio for the scanned units
    var num, denom: 1 .. 65536;

    // number of digits in a decimal fraction
    var k, kk: small_number;

    // top of decimal digit stack
    var p, q: pointer;

    // an internal dimension
    var v: scaled;

    // temporary storage of cur_val 
    var save_cur_val: integer;
⟧

486. The following code is executed when scan_something_internal was called asking for mu_val, when we really wanted a “mudimen” instead of “muglue.”

⟦486 Coerce glue to a dimension⟧ = ⟦
    if (cur_val_level >= glue_val) {
        v = width(cur_val);
        delete_glue_ref(cur_val);
        cur_val = v;
    }
⟧

487. When the following code is executed, we have cur_tok == point_token, but this token has been backed up using back_input; we must first discard it.

It turns out that a decimal point all by itself is equivalent to ‘0.0’. Let’s hope people don’t use that fact.

⟦487 Scan decimal fraction⟧ = ⟦
    {
        k = 0;
        p = null;
        //  point_token is being re-scanned
        get_token;
        loop {
            get_x_token;
            if (
                (cur_tok > zero_token + 9)
                || (cur_tok < zero_token)
            ) {
                goto done1;
            }
            // digits for k >= 17 cannot affect the result
            if (k < 17) {
                q = get_avail;
                link(q) = p;
                info(q) = cur_tok - zero_token;
                p = q;
                incr(k);
            }
        }
      done1:
        for (kk in k downto 1) {
            dig[kk - 1] = info(p);
            q = p;
            p = link(p);
            free_avail(q);
        }
        f = round_decimals(k);
        if (cur_cmd != spacer) {
            back_input;
        }
    }
⟧
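
The digits collected above are handed to round_decimals, which is defined elsewhere in the program. The Python sketch below follows the standard TeX definition as recalled here, so it should be taken as an assumption rather than as the text of that routine; the digits are passed as a list instead of through the dig array.

```python
TWO = 1 << 17  # 2**17, so that the final halving rounds to the nearest 2**-16


def round_decimals(digits):
    """Approximate 0.d1 d2 ... dk as a multiple of 2**-16, rounded."""
    a = 0
    # Work from the least significant digit toward the decimal point;
    # digits beyond the 17th cannot affect the result.
    for d in reversed(digits[:17]):
        a = (a + d * TWO) // 10
    return (a + 1) // 2
```

An empty digit list gives 0, which matches the remark that a decimal point all by itself is equivalent to ‘0.0’; the digits [2, 5] give 16384, i.e., one quarter of 2^16.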

488. Now comes the harder part: At this point in the program, cur_val is a nonnegative integer and $f/2^{16}$ is a nonnegative fraction less than 1; we want to multiply the sum of these two quantities by the appropriate factor, based on the specified units, in order to produce a scaled result, and we want to do the calculation with fixed point arithmetic that does not overflow.

⟦488 Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$, where there are |x| sp per unit; |goto attach_sign| if the units are internal⟧ = ⟦
    if (inf) {
        ⟦489 Scan for \(f)\.{fil} units; |goto attach_fraction| if found⟧
    }

    ⟦490 Scan for \(u)units that are internal dimensions; |goto attach_sign| with |cur_val| set if found⟧

    if (mu) {
        ⟦491 Scan for \(m)\.{mu} units and |goto attach_fraction|⟧
    }

    if (scan_keyword(strpool!("true"))) {
        ⟦492 Adjust \(f)for the magnification ratio⟧
    }

    if (scan_keyword(strpool!("pt"))) {
        // the easy case
        goto attach_fraction;
    }

    ⟦493 Scan for \(a)all other units and adjust |cur_val| and |f| accordingly; |goto done| in the case of scaled points⟧

    attach_fraction:

    if (cur_val >= 0x4000) {
        arith_error = true;
    } else {
        cur_val = cur_val * unity + f;
    }

    done:
⟧

489. A specification like ‘filllll’ or ‘fill L L L’ will lead to two error messages (one for each additional keyword "l").

⟦489 Scan for \(f)\.{fil} units; |goto attach_fraction| if found⟧ = ⟦
    if (scan_keyword(strpool!("fil"))) {
        cur_order = fil;
        while (scan_keyword(ord!("l"))) {
            if (cur_order == filll) {
                print_err(
                  strpool!("Illegal unit of measure ("),
                );
                print(strpool!("replaced by filll)"));
                help1(
                  strpool!("I dddon't go any higher than filll."),
                );
                error;
            } else {
                incr(cur_order);
            }
        }
        goto attach_fraction;
    }
⟧
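
The counting behind the remark about ‘filllll’ can be checked with a short Python sketch. It works on a plain string instead of TeX tokens, and the function name scan_fil_order is hypothetical; blanks before each extra "l" are skipped and case is ignored, which is how the keyword scanner is assumed to behave.

```python
ORDER_NAMES = ("fil", "fill", "filll")


def scan_fil_order(text):
    """Return (order_name, error_count), or None if text does not start with fil."""
    s = text.lstrip().lower()
    if not s.startswith("fil"):
        return None
    rest = s[3:]
    order, errors = 1, 0  # 1 = fil, 2 = fill, 3 = filll
    while rest.lstrip().startswith("l"):
        rest = rest.lstrip()[1:]  # consume one keyword "l"
        if order == 3:
            errors += 1  # already at filll: each extra "l" is an error
        else:
            order += 1
    return ORDER_NAMES[order - 1], errors
```

Both scan_fil_order("filllll") and scan_fil_order("fill L L L") return ("filll", 2): two keywords raise the order to filll and the remaining two each produce an error.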

490.

⟦490 Scan for \(u)units that are internal dimensions; |goto attach_sign| with |cur_val| set if found⟧ = ⟦
    save_cur_val = cur_val

    ⟦440 Get the next non-blank non-call token⟧

    if ((cur_cmd < min_internal) || (cur_cmd > max_internal)) {
        back_input;
    } else {
        if (mu) {
            scan_something_internal(mu_val, false);
            ⟦486 Coerce glue to a dimension⟧
            if (cur_val_level != mu_val) {
                mu_error;
            }
        } else {
            scan_something_internal(dimen_val, false);
        }
        v = cur_val;
        goto found;
    }

    if (mu) {
        goto not_found;
    }

    if (scan_keyword(strpool!("em"))) {
        v = (⟦593 The em width for |cur_font|⟧);
    } else if (scan_keyword(strpool!("ex"))) {
        v = (⟦594 The x-height for |cur_font|⟧);
    } else {
        goto not_found;
    }

    ⟦477 Scan an optional space⟧

    found:

    cur_val = nx_plus_y(
      save_cur_val,
      v,
      xn_over_d(v, f, 0x10000),
    )

    goto attach_sign

    not_found:
⟧

491.

⟦491 Scan for \(m)\.{mu} units and |goto attach_fraction|⟧ = ⟦
    if (scan_keyword(strpool!("mu"))) {
        goto attach_fraction;
    } else {
        print_err(strpool!("Illegal unit of measure ("));
        print(strpool!("mu inserted)"));
        help4(
          strpool!("The unit of measurement in math glue must be mu."),
        )(
          strpool!("To recover gracefully from this error, it's best to"),
        )(
          strpool!("delete the erroneous units; e.g., type `2' to delete"),
        )(
          strpool!("two letters. (See Chapter 27 of The TeXbook.)"),
        );
        error;
        goto attach_fraction;
    }
⟧

492.

⟦492 Adjust \(f)for the magnification ratio⟧ = ⟦
    {
        prepare_mag;
        if (mag != 1000) {
            cur_val = xn_over_d(cur_val, 1000, mag);
            f = (1000 * f + 0x10000 * remainder) div mag;
            cur_val = cur_val + (f div 0x10000);
            f = f % 0x10000;
        }
    }
⟧

493. The necessary conversion factors can all be specified exactly as fractions whose numerator and denominator sum to 32768 or less. According to the definitions here, $2660\,\mathrm{dd}\approx1000.33297\,\mathrm{mm}$; this agrees well with the value $1000.333\,\mathrm{mm}$ cited by Bosshard in Technische Grundlagen zur Satzherstellung (Bern, 1980).

@define set_conversion_end(#) =>
    /*... opened earlier ...*/
        denom = #;
    }
@define set_conversion(#) =>
    {
        num = #;
        set_conversion_end
    /* ... closed later ... */
⟦493 Scan for \(a)all other units and adjust |cur_val| and |f| accordingly; |goto done| in the case of scaled points⟧ = ⟦
    if (scan_keyword(strpool!("in"))) {
        set_conversion(7227)(100);
    } else if (scan_keyword(strpool!("pc"))) {
        set_conversion(12)(1);
    } else if (scan_keyword(strpool!("cm"))) {
        set_conversion(7227)(254);
    } else if (scan_keyword(strpool!("mm"))) {
        set_conversion(7227)(2540);
    } else if (scan_keyword(strpool!("bp"))) {
        set_conversion(7227)(7200);
    } else if (scan_keyword(strpool!("dd"))) {
        set_conversion(1238)(1157);
    } else if (scan_keyword(strpool!("cc"))) {
        set_conversion(14856)(1157);
    } else if (scan_keyword(strpool!("sp"))) {
        goto done;
    } else {
        ⟦494 Complain about unknown unit and |goto done2|⟧
    }

    cur_val = xn_over_d(cur_val, num, denom)

    f = (num * f + 0x10000 * remainder) div denom

    cur_val = cur_val + (f div 0x10000)

    f = f % 0x10000

    done2:
⟧
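
The arithmetic of this section can be replayed in Python as a sketch. Python integers do not overflow, so xn_over_d is replaced here by a plain divmod; the helper names convert, to_sp, and dd_run_in_mm are hypothetical, and the table simply repeats the ratios listed above.

```python
from fractions import Fraction

# (num, denom) pairs: one unit equals num/denom points.
UNITS = {
    "in": (7227, 100), "pc": (12, 1), "cm": (7227, 254), "mm": (7227, 2540),
    "bp": (7227, 7200), "dd": (1238, 1157), "cc": (14856, 1157),
}


def convert(cur_val, f, unit):
    """Scale cur_val + f/2**16 by num/denom, keeping integer and fraction apart."""
    num, denom = UNITS[unit]
    cur_val, remainder = divmod(cur_val * num, denom)  # stands in for xn_over_d
    f = (num * f + 0x10000 * remainder) // denom
    cur_val += f // 0x10000
    f %= 0x10000
    return cur_val, f


def to_sp(cur_val, f, unit):
    """Convert and then pack the result into scaled points."""
    cur_val, f = convert(cur_val, f, unit)
    return cur_val * 0x10000 + f


def dd_run_in_mm(n_dd):
    """Exact length of n_dd didot points in millimetres, using the ratios above."""
    return Fraction(n_dd * 1238, 1157) / Fraction(7227, 2540)
```

Under these assumptions to_sp(1, 0, "in") gives 4736286 and to_sp(1, 0, "bp") gives 65781, while float(dd_run_in_mm(2660)) is about 1000.33297, in line with the figure quoted at the start of this section.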

494.

⟦494 Complain about unknown unit and |goto done2|⟧ = ⟦
    {
        print_err(strpool!("Illegal unit of measure ("));
        print(strpool!("pt inserted)"));
        help6(
          strpool!("Dimensions can be in units of em, ex, in, pt, pc,"),
        )(
          strpool!("cm, mm, dd, cc, bp, or sp; but yours is a new one!"),
        )(
          strpool!("I'll assume that you meant to say pt, for printer's points."),
        )(
          strpool!("To recover gracefully from this error, it's best to"),
        )(
          strpool!("delete the erroneous units; e.g., type `2' to delete"),
        )(
          strpool!("two letters. (See Chapter 27 of The TeXbook.)"),
        );
        error;
        goto done2;
    }
⟧

495.

⟦495 Report that this dimension is out of range⟧ = ⟦
    {
        print_err(strpool!("Dimension too large"));
        help2(
          strpool!("I can't work with sizes bigger than about 19 feet."),
        )(
          strpool!("Continue and I'll use the largest value I can."),
        );
        error;
        cur_val = max_dimen;
        arith_error = false;
    }
⟧

496. The final member of TEX’s value-scanning trio is scan_glue, which makes cur_val point to a glue specification. The reference count of that glue spec will take account of the fact that cur_val is pointing to it.

The level parameter should be either glue_val or mu_val.

Since scan_dimen was so much more complex than scan_int, we might expect scan_glue to be even worse. But fortunately, it is very simple, since most of the work has already been done.

// sets cur_val to a glue spec pointer
function scan_glue(level: small_number) {
    label exit;
    var
      negative: boolean, // should the answer be negated?
      q: pointer, // new glue specification
      mu: boolean; // does level == mu_val ?
    
    mu = (level == mu_val);
    ⟦475 Get the next non-blank non-sign token; set |negative| appropriately⟧
    if (
        (cur_cmd >= min_internal)
        && (cur_cmd <= max_internal)
    ) {
        scan_something_internal(level, negative);
        if (cur_val_level >= glue_val) {
            if (cur_val_level != level) {
                mu_error;
            }
            return;
        }
        if (cur_val_level == int_val) {
            scan_dimen(mu, false, true);
        } else if (level == mu_val) {
            mu_error;
        }
    } else {
        back_input;
        scan_dimen(mu, false, false);
        if (negative) {
            negate(cur_val);
        }
    }
    ⟦497 Create a new glue specification whose width is |cur_val|; scan for its stretch and shrink components⟧
  exit:
}

⟦1593 Declare procedures needed for expressions⟧

497.

⟦497 Create a new glue specification whose width is |cur_val|; scan for its stretch and shrink components⟧ = ⟦
    q = new_spec(zero_glue)

    width(q) = cur_val

    if (scan_keyword(strpool!("plus"))) {
        scan_dimen(mu, true, false);
        stretch(q) = cur_val;
        stretch_order(q) = cur_order;
    }

    if (scan_keyword(strpool!("minus"))) {
        scan_dimen(mu, true, false);
        shrink(q) = cur_val;
        shrink_order(q) = cur_order;
    }

    cur_val = q
⟧

498. Here’s a similar procedure that returns a pointer to a rule node. This routine is called just after TEX has seen \hrule or \vrule; therefore cur_cmd will be either hrule or vrule. The idea is to store the default rule dimensions in the node, then to override them if ‘height’ or ‘width’ or ‘depth’ specifications are found (in any order).

@define default_rule => 26214 // 0.4\thinspace pt
function scan_rule_spec(): pointer {
    label reswitch;
    var
      q: pointer; // the rule node being created
    
    // width, depth, and height all equal null_flag now
    q = new_rule;
    if (cur_cmd == vrule) {
        width(q) = default_rule;
    } else {
        height(q) = default_rule;
        depth(q) = 0;
    }
  reswitch:
    if (scan_keyword(strpool!("width"))) {
        scan_normal_dimen;
        width(q) = cur_val;
        goto reswitch;
    }
    if (scan_keyword(strpool!("height"))) {
        scan_normal_dimen;
        height(q) = cur_val;
        goto reswitch;
    }
    if (scan_keyword(strpool!("depth"))) {
        scan_normal_dimen;
        depth(q) = cur_val;
        goto reswitch;
    }
    scan_rule_spec = q;
}
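
The default-then-override idea can be sketched in Python. This is an illustration only: the rule node is modelled as a dictionary, the input is a list of already-scanned (keyword, dimension) pairs, and the value of null_flag (-2^30) is assumed from the standard TeX definitions rather than shown in this section.

```python
NULL_FLAG = -(2 ** 30)  # assumed sentinel meaning "dimension not specified"
DEFAULT_RULE = 26214    # 0.4pt expressed in scaled points


def scan_rule_spec(is_vrule, specs):
    """Build rule dimensions from (keyword, dimension-in-sp) pairs in input order."""
    rule = {"width": NULL_FLAG, "height": NULL_FLAG, "depth": NULL_FLAG}
    if is_vrule:
        rule["width"] = DEFAULT_RULE
    else:
        rule["height"] = DEFAULT_RULE
        rule["depth"] = 0
    for keyword, dimen in specs:
        if keyword not in rule:
            break  # scanning stops at the first thing that is not a rule keyword
        rule[keyword] = dimen  # a later specification overrides an earlier one
    return rule
```

Because the loop restarts after every keyword, the specifications may come in any order and may repeat; the last value given for a dimension is the one that remains.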

499. [27] Building token lists. The token lists for macros and for other things like \mark and \output and \write are produced by a procedure called scan_toks.

Before we get into the details of scan_toks, let’s consider a much simpler task, that of converting the current string into a token list. The str_toks function does this; it classifies spaces as type spacer and everything else as type other_char.

The token list created by str_toks begins at link(temp_head) and ends at the value p that is returned. (If p == temp_head, the list is empty.)

The str_toks_cat function is the same, except that the catcode cat is stamped on all the characters, unless zero is passed, in which case it chooses spacer or other_char automatically.

⟦1493 Declare \eTeX\ procedures for token lists⟧

// changes the string str_pool[b .. pool_ptr] to a token
// list
function str_toks_cat(
  b: pool_pointer,
  cat: small_number,
): pointer {
    var
      p: pointer, // tail of the token list
      q: pointer, // new node being added to the token list 
      // via store_new_token 
      t: halfword, // token being appended
      k: pool_pointer; // index into str_pool 
    
    str_room(1);
    p = temp_head;
    link(p) = null;
    k = b;
    while (k < pool_ptr) {
        t = so(str_pool[k]);
        if ((t == ord!(" ")) && (cat == 0)) {
            t = space_token;
        } else {
            if (
                (t >= 0xd800)
                && (t <= 0xdbff)
                && (k + 1 < pool_ptr)
                && (so(str_pool[k + 1]) >= 0xdc00)
                && (so(str_pool[k + 1]) <= 0xdfff)
            ) {
                incr(k);
                t = 
                    0x10000
                    + (t - 0xd800)
                    * 0x400 + (so(str_pool[k]) - 0xdc00)
                ;
            }
            if (cat == 0) {
                t = other_token + t;
            } else if (cat == active_char) {
                t = cs_token_flag + active_base + t;
            } else {
                t = max_char_val * cat + t;
            }
        }
        fast_store_new_token(t);
        incr(k);
    }
    pool_ptr = b;
    str_toks_cat = p;
}

function str_toks(b: pool_pointer): pointer {
    str_toks = str_toks_cat(b, 0);
}
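
The surrogate handling and catcode stamping in str_toks_cat can be sketched in Python. The constants below (max_char_val = 0x200000, spacer = 10, other_char = 12, active_char = 13) are defined elsewhere in the program and are assumed here; the function names are hypothetical, and the active_char case, which needs cs_token_flag and active_base, is deliberately left out.

```python
MAX_CHAR_VAL = 0x200000  # assumed: tokens are cat * MAX_CHAR_VAL + character code
SPACER, OTHER_CHAR, ACTIVE_CHAR = 10, 12, 13  # assumed catcode numbers
SPACE_TOKEN = MAX_CHAR_VAL * SPACER + ord(" ")


def utf16_units(s):
    """Split a Python string into UTF-16 code units, as stored in the string pool."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def str_to_tokens(units, cat=0):
    """Turn UTF-16 code units into token numbers, joining surrogate pairs."""
    if cat == ACTIVE_CHAR:
        raise NotImplementedError("active characters are not covered by this sketch")
    tokens = []
    k = 0
    while k < len(units):
        t = units[k]
        if t == 0x20 and cat == 0:
            tokens.append(SPACE_TOKEN)
        else:
            if (0xD800 <= t <= 0xDBFF and k + 1 < len(units)
                    and 0xDC00 <= units[k + 1] <= 0xDFFF):
                k += 1  # consume the low surrogate as well
                t = 0x10000 + (t - 0xD800) * 0x400 + (units[k] - 0xDC00)
            tokens.append(MAX_CHAR_VAL * (cat or OTHER_CHAR) + t)
        k += 1
    return tokens
```

For example, the pair 0xD83D 0xDE00 is joined into the single character code 0x1F600 before the catcode is applied, so one token results instead of two.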

500. The main reason for wanting str_toks is the next function, the_toks, which has similar input/output characteristics.

This procedure is supposed to scan something like ‘\skip\count12’, i.e., whatever can follow ‘\the’, and it constructs a token list containing something like ‘-3.0pt minus 0.5fill’.

function the_toks(): pointer {
    label exit;
    var
      old_setting: 0 .. max_selector, // holds selector 
      // setting
      p, q, r: pointer, // used for copying a token list
      b: pool_pointer, // base of temporary string
      c: small_number; // value of cur_chr 
    
    ⟦1498 Handle \.{\\unexpanded} or \.{\\detokenize} and |return|⟧
    get_x_token;
    scan_something_internal(tok_val, false);
    if (cur_val_level >= ident_val) {
        ⟦501 Copy the token list⟧
    } else {
        old_setting = selector;
        selector = new_string;
        b = pool_ptr;
        case cur_val_level {
          int_val:
            print_int(cur_val);
          dimen_val:
            print_scaled(cur_val);
            print(strpool!("pt"));
          glue_val:
            print_spec(cur_val, strpool!("pt"));
            delete_glue_ref(cur_val);
          mu_val:
            print_spec(cur_val, strpool!("mu"));
            delete_glue_ref(cur_val);
          // there are no other cases
        }
        selector = old_setting;
        the_toks = str_toks(b);
    }
  exit:
}

501.

⟦501 Copy the token list⟧ = ⟦
    {
        p = temp_head;
        link(p) = null;
        if (cur_val_level == ident_val) {
            store_new_token(cs_token_flag + cur_val);
        } else if (cur_val != null) {
            // do not copy the reference count
            r = link(cur_val);
            while (r != null) {
                fast_store_new_token(info(r));
                r = link(r);
            }
        }
        the_toks = p;
    }
⟧

502. Here’s part of the expand subroutine that we are now ready to complete:

function ins_the_toks() {
    link(garbage) = the_toks;
    ins_list(link(temp_head));
}

503. The primitives \number, \romannumeral, \string, \meaning, \fontname, and \jobname are defined as follows.

𝜀-TEX adds \eTeXrevision such that job_name_code remains last.

@define number_code => 0 // command code for \.{\\number}
// command code for \.{\\romannumeral}
@define roman_numeral_code => 1
@define string_code => 2 // command code for \.{\\string}
@define meaning_code => 3 // command code for \.{\\meaning}
// command code for \.{\\fontname}
@define font_name_code => 4
// base for \eTeX's command codes
@define etex_convert_base => 5
// command code for \.{\\eTeXrevision}
@define eTeX_revision_code => etex_convert_base
// end of \eTeX's command codes
@define etex_convert_codes => etex_convert_base + 1
// command code for \.{\\expanded}
@define expanded_code => etex_convert_codes
// base for \pdfTeX's command codes
@define pdftex_first_expand_code => expanded_code + 1
// command code for \.{\\leftmarginkern}
@define left_margin_kern_code => pdftex_first_expand_code + 9
// command code for \.{\\rightmarginkern}
@define right_margin_kern_code =>
    pdftex_first_expand_code + 10
// command code for \.{\\strcmp}
@define pdf_strcmp_code => pdftex_first_expand_code + 11
// command code for \.{\\creationdate}
@define pdf_creation_date_code =>
    pdftex_first_expand_code + 15
// command code for \.{\\filemoddate}
@define pdf_file_mod_date_code =>
    pdftex_first_expand_code + 16
// command code for \.{\\filesize}
@define pdf_file_size_code => pdftex_first_expand_code + 17
// command code for \.{\\mdfivesum}
@define pdf_mdfive_sum_code => pdftex_first_expand_code + 18
// command code for \.{\\filedump}
@define pdf_file_dump_code => pdftex_first_expand_code + 19
// command code for \.{\\uniformdeviate}
@define uniform_deviate_code => pdftex_first_expand_code + 22
// command code for \.{\\normaldeviate}
@define normal_deviate_code => pdftex_first_expand_code + 23
// end of \pdfTeX's command codes
@define pdftex_convert_codes => pdftex_first_expand_code + 26
// base for \XeTeX's command codes
@define XeTeX_first_expand_code => pdftex_convert_codes
// command code for \.{\\XeTeXrevision}
@define XeTeX_revision_code => XeTeX_first_expand_code + 0
// command code for \.{\\XeTeXvariationname}
@define XeTeX_variation_name_code =>
    XeTeX_first_expand_code + 1
// command code for \.{\\XeTeXfeaturename}
@define XeTeX_feature_name_code =>
    XeTeX_first_expand_code + 2
// command code for \.{\\XeTeXselectorname}
@define XeTeX_selector_name_code =>
    XeTeX_first_expand_code + 3
// command code for \.{\\XeTeXglyphname}
@define XeTeX_glyph_name_code => XeTeX_first_expand_code + 4
// command code for \.{\\Uchar}
@define XeTeX_Uchar_code => XeTeX_first_expand_code + 5
// command code for \.{\\Ucharcat}
@define XeTeX_Ucharcat_code => XeTeX_first_expand_code + 6
// end of \XeTeX's command codes
@define XeTeX_convert_codes => XeTeX_first_expand_code + 7
// command code for \.{\\jobname}
@define job_name_code => XeTeX_convert_codes
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("number"), convert, number_code)

    primitive(
      strpool!("romannumeral"),
      convert,
      roman_numeral_code,
    )

    primitive(strpool!("string"), convert, string_code)

    primitive(strpool!("meaning"), convert, meaning_code)

    primitive(strpool!("fontname"), convert, font_name_code)

    primitive(strpool!("expanded"), convert, expanded_code)

    primitive(
      strpool!("leftmarginkern"),
      convert,
      left_margin_kern_code,
    )

    primitive(
      strpool!("rightmarginkern"),
      convert,
      right_margin_kern_code,
    )

    primitive(
      strpool!("creationdate"),
      convert,
      pdf_creation_date_code,
    )

    primitive(
      strpool!("filemoddate"),
      convert,
      pdf_file_mod_date_code,
    )

    primitive(
      strpool!("filesize"),
      convert,
      pdf_file_size_code,
    )

    primitive(
      strpool!("mdfivesum"),
      convert,
      pdf_mdfive_sum_code,
    )

    primitive(
      strpool!("filedump"),
      convert,
      pdf_file_dump_code,
    )

    primitive(strpool!("strcmp"), convert, pdf_strcmp_code)

    primitive(
      strpool!("uniformdeviate"),
      convert,
      uniform_deviate_code,
    )

    primitive(
      strpool!("normaldeviate"),
      convert,
      normal_deviate_code,
    )

    primitive(strpool!("jobname"), convert, job_name_code)

    primitive(strpool!("Uchar"), convert, XeTeX_Uchar_code)

    primitive(
      strpool!("Ucharcat"),
      convert,
      XeTeX_Ucharcat_code,
    )
⟧
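
Since the command codes are defined as a chain of offsets, it can help to evaluate them. The Python sketch below copies the definitions of this section into a hypothetical function convert_codes and returns the resulting numbers, which makes it easy to confirm that job_name_code remains last.

```python
def convert_codes():
    """Evaluate the chained command-code definitions of this section."""
    etex_convert_base = 5
    etex_convert_codes = etex_convert_base + 1
    expanded_code = etex_convert_codes
    pdftex_first = expanded_code + 1        # pdftex_first_expand_code
    pdftex_convert_codes = pdftex_first + 26
    xetex_first = pdftex_convert_codes      # XeTeX_first_expand_code
    xetex_convert_codes = xetex_first + 7
    return {
        "number_code": 0, "roman_numeral_code": 1, "string_code": 2,
        "meaning_code": 3, "font_name_code": 4,
        "eTeX_revision_code": etex_convert_base,
        "expanded_code": expanded_code,
        "left_margin_kern_code": pdftex_first + 9,
        "right_margin_kern_code": pdftex_first + 10,
        "pdf_strcmp_code": pdftex_first + 11,
        "pdf_creation_date_code": pdftex_first + 15,
        "pdf_file_mod_date_code": pdftex_first + 16,
        "pdf_file_size_code": pdftex_first + 17,
        "pdf_mdfive_sum_code": pdftex_first + 18,
        "pdf_file_dump_code": pdftex_first + 19,
        "uniform_deviate_code": pdftex_first + 22,
        "normal_deviate_code": pdftex_first + 23,
        "XeTeX_revision_code": xetex_first + 0,
        "XeTeX_variation_name_code": xetex_first + 1,
        "XeTeX_feature_name_code": xetex_first + 2,
        "XeTeX_selector_name_code": xetex_first + 3,
        "XeTeX_glyph_name_code": xetex_first + 4,
        "XeTeX_Uchar_code": xetex_first + 5,
        "XeTeX_Ucharcat_code": xetex_first + 6,
        "job_name_code": xetex_convert_codes,
    }
```

Evaluated this way, the pdfTeX codes start at 7, the XeTeX codes occupy 33 through 39, and job_name_code comes out as 40, the largest value in the table.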

504.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    convert:

    case chr_code {
      number_code:
        print_esc(strpool!("number"));
      roman_numeral_code:
        print_esc(strpool!("romannumeral"));
      string_code:
        print_esc(strpool!("string"));
      meaning_code:
        print_esc(strpool!("meaning"));
      font_name_code:
        print_esc(strpool!("fontname"));
      eTeX_revision_code:
        print_esc(strpool!("eTeXrevision"));
      expanded_code:
        print_esc(strpool!("expanded"));
      left_margin_kern_code:
        print_esc(strpool!("leftmarginkern"));
      right_margin_kern_code:
        print_esc(strpool!("rightmarginkern"));
      pdf_creation_date_code:
        print_esc(strpool!("creationdate"));
      pdf_file_mod_date_code:
        print_esc(strpool!("filemoddate"));
      pdf_file_size_code:
        print_esc(strpool!("filesize"));
      pdf_mdfive_sum_code:
        print_esc(strpool!("mdfivesum"));
      pdf_file_dump_code:
        print_esc(strpool!("filedump"));
      pdf_strcmp_code:
        print_esc(strpool!("strcmp"));
      uniform_deviate_code:
        print_esc(strpool!("uniformdeviate"));
      normal_deviate_code:
        print_esc(strpool!("normaldeviate"));
      ⟦1459 Cases of |convert| for |print_cmd_chr|⟧
      othercases:
        print_esc(strpool!("jobname"));
    }
⟧

505. The procedure conv_toks uses str_toks to insert the token list for convert functions into the scanner; ‘\outer’ control sequences are allowed to follow ‘\string’ and ‘\meaning’.

The extra temp string u is needed because scan_pdf_ext_toks incorporates any pending string in its output. In order to save such a pending string, we have to create a temporary string that is destroyed immediately afterward.

@define save_cur_string =>
    if (str_start_macro(str_ptr) < pool_ptr) {
        u = make_string;
    } else {
        u = 0;
    }
@define restore_cur_string =>
    if (u != 0) {
        decr(str_ptr);
    }
function conv_toks() {
    var
      old_setting: 0 .. max_selector, // holds selector 
      // setting
      save_warning_index, save_def_ref: pointer,
      boolvar: boolean, // temp boolean
      s: str_number,
      u: str_number,
      j: integer,
      c: small_number, // desired type of conversion
      save_scanner_status: small_number, //  scanner_status 
      // upon entry
      b: pool_pointer, // base of temporary string
      fnt, arg1, arg2: integer, // args for \XeTeX\ 
      // extensions
      font_name_str: str_number, // local vars for 
      // \.{\\fontname} quoting extension
      i: small_number,
      quote_char: UTF16_code,
      cat: small_number, // desired catcode, or 0 for 
      // automatic spacer / other_char selection
      saved_chr: UnicodeScalar,
      p, q: pointer;
    
    cat = 0;
    c = cur_chr;
    ⟦506 Scan the argument for command |c|⟧
    old_setting = selector;
    selector = new_string;
    b = pool_ptr;
    ⟦507 Print the result of command |c|⟧
    selector = old_setting;
    link(garbage) = str_toks_cat(b, cat);
    ins_list(link(temp_head));
}

506. Not all catcode values are allowed by \Ucharcat:

@define illegal_Ucharcat_catcode(#) =>
    
        (# < left_brace)
        || (# > active_char)
        || (# == out_param) || (# == ignore)
⟦506 Scan the argument for command |c|⟧ = ⟦
    case c {
      number_code, roman_numeral_code:
        scan_int;
      string_code, meaning_code:
        save_scanner_status = scanner_status;
        scanner_status = normal;
        get_token;
        scanner_status = save_scanner_status;
      font_name_code:
        scan_font_ident;
      eTeX_revision_code:
        do_nothing;
      expanded_code:
        save_scanner_status = scanner_status;
        save_warning_index = warning_index;
        save_def_ref = def_ref;
        save_cur_string;
        scan_pdf_ext_toks;
        warning_index = save_warning_index;
        scanner_status = save_scanner_status;
        ins_list(link(def_ref));
        free_avail(def_ref);
        def_ref = save_def_ref;
        restore_cur_string;
        return;
      left_margin_kern_code, right_margin_kern_code:
        scan_register_num;
        fetch_box(p);
        if ((p == null) || (type(p) != hlist_node)) {
            pdf_error(
              strpool!("marginkern"),
              strpool!("a non-empty hbox expected"),
            );
        }
      pdf_creation_date_code:
        b = pool_ptr;
        getcreationdate;
        link(garbage) = str_toks(b);
        ins_list(link(temp_head));
        return;
      pdf_file_mod_date_code:
        save_scanner_status = scanner_status;
        save_warning_index = warning_index;
        save_def_ref = def_ref;
        save_cur_string;
        scan_pdf_ext_toks;
        if (selector == new_string) {
            pdf_error(
              strpool!("tokens"),
              strpool!("tokens_to_string() called while selector = new_string"),
            );
        }
        old_setting = selector;
        selector = new_string;
        show_token_list(
          link(def_ref),
          null,
          pool_size - pool_ptr,
        );
        selector = old_setting;
        s = make_string;
        delete_token_ref(def_ref);
        def_ref = save_def_ref;
        warning_index = save_warning_index;
        scanner_status = save_scanner_status;
        b = pool_ptr;
        getfilemoddate(s);
        link(garbage) = str_toks(b);
        if (flushable(s)) {
            flush_string;
        }
        ins_list(link(temp_head));
        restore_cur_string;
        return;
      pdf_file_size_code:
        save_scanner_status = scanner_status;
        save_warning_index = warning_index;
        save_def_ref = def_ref;
        save_cur_string;
        scan_pdf_ext_toks;
        if (selector == new_string) {
            pdf_error(
              strpool!("tokens"),
              strpool!("tokens_to_string() called while selector = new_string"),
            );
        }
        old_setting = selector;
        selector = new_string;
        show_token_list(
          link(def_ref),
          null,
          pool_size - pool_ptr,
        );
        selector = old_setting;
        s = make_string;
        delete_token_ref(def_ref);
        def_ref = save_def_ref;
        warning_index = save_warning_index;
        scanner_status = save_scanner_status;
        b = pool_ptr;
        getfilesize(s);
        link(garbage) = str_toks(b);
        if (flushable(s)) {
            flush_string;
        }
        ins_list(link(temp_head));
        restore_cur_string;
        return;
      pdf_mdfive_sum_code:
        save_scanner_status = scanner_status;
        save_warning_index = warning_index;
        save_def_ref = def_ref;
        save_cur_string;
        boolvar = scan_keyword(strpool!("file"));
        scan_pdf_ext_toks;
        if (selector == new_string) {
            pdf_error(
              strpool!("tokens"),
              strpool!("tokens_to_string() called while selector = new_string"),
            );
        }
        old_setting = selector;
        selector = new_string;
        show_token_list(
          link(def_ref),
          null,
          pool_size - pool_ptr,
        );
        selector = old_setting;
        s = make_string;
        delete_token_ref(def_ref);
        def_ref = save_def_ref;
        warning_index = save_warning_index;
        scanner_status = save_scanner_status;
        b = pool_ptr;
        getmd5sum(s, boolvar);
        link(garbage) = str_toks(b);
        if (flushable(s)) {
            flush_string;
        }
        ins_list(link(temp_head));
        restore_cur_string;
        return;
      pdf_file_dump_code:
        save_scanner_status = scanner_status;
        save_warning_index = warning_index;
        save_def_ref = def_ref;
        // scan offset
        save_cur_string;
        cur_val = 0;
        if ((scan_keyword(strpool!("offset")))) {
            scan_int;
            if ((cur_val < 0)) {
                print_err(strpool!("Bad file offset"));
                help2(
                  strpool!("A file offset must be between 0 and 2^{31}-1,"),
                )(strpool!("I changed this one to zero."));
                int_error(cur_val);
                cur_val = 0;
            }
        }
        // scan length
        i = cur_val;
        cur_val = 0;
        if ((scan_keyword(strpool!("length")))) {
            scan_int;
            if ((cur_val < 0)) {
                print_err(strpool!("Bad dump length"));
                help2(
                  strpool!("A dump length must be between 0 and 2^{31}-1,"),
                )(strpool!("I changed this one to zero."));
                int_error(cur_val);
                cur_val = 0;
            }
        }
        // scan file name
        j = cur_val;
        scan_pdf_ext_toks;
        if (selector == new_string) {
            pdf_error(
              strpool!("tokens"),
              strpool!("tokens_to_string() called while selector = new_string"),
            );
        }
        old_setting = selector;
        selector = new_string;
        show_token_list(
          link(def_ref),
          null,
          pool_size - pool_ptr,
        );
        selector = old_setting;
        s = make_string;
        delete_token_ref(def_ref);
        def_ref = save_def_ref;
        warning_index = save_warning_index;
        scanner_status = save_scanner_status;
        b = pool_ptr;
        getfiledump(s, i, j);
        link(garbage) = str_toks(b);
        if (flushable(s)) {
            flush_string;
        }
        ins_list(link(temp_head));
        restore_cur_string;
        return;
      pdf_strcmp_code:
        save_scanner_status = scanner_status;
        save_warning_index = warning_index;
        save_def_ref = def_ref;
        save_cur_string;
        compare_strings;
        def_ref = save_def_ref;
        warning_index = save_warning_index;
        scanner_status = save_scanner_status;
        restore_cur_string;
      XeTeX_Uchar_code:
        scan_usv_num;
      XeTeX_Ucharcat_code:
        scan_usv_num;
        saved_chr = cur_val;
        scan_int;
        if (illegal_Ucharcat_catcode(cur_val)) {
            print_err(strpool!("Invalid code ("));
            print_int(cur_val);
            print(
              strpool!("), should be in the ranges 1..4, 6..8, 10..13"),
            );
            help1(
              strpool!("I'm going to use 12 instead of that illegal code value."),
            );
            error;
            cat = 12;
        } else {
            cat = cur_val;
        }
        cur_val = saved_chr;
      ⟦1460 Cases of `Scan the argument for command |c|'⟧
      job_name_code:
        if (job_name == 0) {
            open_log_file;
        }
      uniform_deviate_code:
        scan_int;
      normal_deviate_code:
        do_nothing;
      // there are no other cases
    }
⟧

507.

⟦507 Print the result of command |c|⟧ = ⟦
    case c {
      number_code:
        print_int(cur_val);
      roman_numeral_code:
        print_roman_int(cur_val);
      string_code:
        if (cur_cs != 0) {
            sprint_cs(cur_cs);
        } else {
            print_char(cur_chr);
        }
      meaning_code:
        print_meaning;
      font_name_code:
        font_name_str = font_name[cur_val];
        if (is_native_font(cur_val)) {
            quote_char = ord!("\"");
            for (i in 0 to length(font_name_str) - 1) {
                if (
                    str_pool[
                      str_start_macro(font_name_str) + i,
                    ]
                    == ord!("\"")
                ) {
                    quote_char = ord!("'");
                }
            }
            print_char(quote_char);
            print(font_name_str);
            print_char(quote_char);
        } else {
            print(font_name_str);
        }
        if (font_size[cur_val] != font_dsize[cur_val]) {
            print(strpool!(" at "));
            print_scaled(font_size[cur_val]);
            print(strpool!("pt"));
        }
      eTeX_revision_code:
        print(eTeX_revision);
      left_margin_kern_code:
        p = list_ptr(p);
        while (
            (p != null)
            && (
                cp_skipable(p)
                || (
                    (!is_char_node(p))
                    && (type(p) == glue_node)
                    && (subtype(p) == left_skip_code + 1)
                )
            )
        ) {
            p = link(p);
        }
        if (
            (p != null)
            && (!is_char_node(p))
            && (type(p) == margin_kern_node)
            && (subtype(p) == left_side)
        ) {
            print_scaled(width(p));
        } else {
            print(ord!("0"));
        }
        print(strpool!("pt"));
      right_margin_kern_code:
        q = list_ptr(p);
        p = prev_rightmost(q, null);
        while (
            (p != null)
            && (
                cp_skipable(p)
                || (
                    (!is_char_node(p))
                    && (type(p) == glue_node)
                    && (subtype(p) == right_skip_code + 1)
                )
            )
        ) {
            p = prev_rightmost(q, p);
        }
        if (
            (p != null)
            && (!is_char_node(p))
            && (type(p) == margin_kern_node)
            && (subtype(p) == right_side)
        ) {
            print_scaled(width(p));
        } else {
            print(ord!("0"));
        }
        print(strpool!("pt"));
      pdf_strcmp_code:
        print_int(cur_val);
      uniform_deviate_code:
        print_int(unif_rand(cur_val));
      normal_deviate_code:
        print_int(norm_rand);
      XeTeX_Uchar_code, XeTeX_Ucharcat_code:
        print_char(cur_val);
      ⟦1461 Cases of `Print the result of command |c|'⟧
      job_name_code:
        print_file_name(job_name, 0, 0);
      // there are no other cases
    }
⟧
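
The quoting rule in the font_name_code case above can be sketched in Python. This is only an illustration under simplifying assumptions: the function name `quote_font_name` is hypothetical, and an ordinary Python string stands in for the string pool entry.

```python
def quote_font_name(name: str, is_native: bool) -> str:
    """Illustrative model of how a native font name is wrapped in quotes."""
    if not is_native:
        # Non-native font names are printed without quotes.
        return name
    # Start with a double quote; switch to a single quote if the
    # name itself contains a double quote.
    quote_char = "'" if '"' in name else '"'
    return quote_char + name + quote_char
```

For instance, a native font named `My"Font` would come out wrapped in single quotes, while a name without double quotes is wrapped in double quotes.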

508. Now we can’t postpone the difficulties any longer; we must bravely tackle scan_toks. This function returns a pointer to the tail of a new token list, and it also makes def_ref point to the reference count at the head of that list.

There are two boolean parameters, macro_def and xpand. If macro_def is true, the goal is to create the token list for a macro definition; otherwise the goal is to create the token list for some other TEX primitive: \mark, \output, \everypar, \lowercase, \uppercase, \message, \errmessage, \write, or \special. In the latter cases a left brace must be scanned next; this left brace will not be part of the token list, nor will the matching right brace that comes at the end. If xpand is false, the token list will simply be copied from the input using get_token. Otherwise all expandable tokens will be expanded until unexpandable tokens are left, except that the results of expanding ‘\the’ are not expanded further. If both macro_def and xpand are true, the expansion applies only to the macro body (i.e., to the material following the first left_brace character).

The value of cur_cs when scan_toks begins should be the eqtb address of the control sequence to display in “runaway” error messages.

function scan_toks(macro_def, xpand: boolean): pointer {
    label found, continue, done, done1, done2;
    var
      t: halfword, // token representing the highest 
      // parameter number
      s: halfword, // saved token
      p: pointer, // tail of the token list being built
      q: pointer, // new node being added to the token list 
      // via store_new_token 
      unbalance: halfword, // number of unmatched left 
      // braces
      hash_brace: halfword; // possible `\.{\#\{}' token
    
    if (macro_def) {
        scanner_status = defining;
    } else {
        scanner_status = absorbing;
    }
    warning_index = cur_cs;
    def_ref = get_avail;
    token_ref_count(def_ref) = null;
    p = def_ref;
    hash_brace = 0;
    t = zero_token;
    if (macro_def) {
        ⟦509 Scan and build the parameter part of the macro definition⟧
    } else {
        // remove the compulsory left brace
        scan_left_brace;
    }
    ⟦512 Scan and build the body of the token list; |goto found| when finished⟧
  found:
    scanner_status = normal;
    if (hash_brace != 0) {
        store_new_token(hash_brace);
    }
    scan_toks = p;
}

509.

⟦509 Scan and build the parameter part of the macro definition⟧ = ⟦
    {
        loop {
          continue:
            // set cur_cmd , cur_chr , cur_tok 
            get_token;
            if (cur_tok < right_brace_limit) {
                goto done1;
            }
            if (cur_cmd == mac_param) {
                ⟦511 If the next character is a parameter number, make |cur_tok| a |match| token; but if it is a left brace, store `|left_brace|, |end_match|', set |hash_brace|, and |goto done|⟧
            }
            store_new_token(cur_tok);
        }
      done1:
        store_new_token(end_match_token);
        if (cur_cmd == right_brace) {
            ⟦510 Express shock at the missing left brace; |goto found|⟧
        }
      done:
    }
⟧

510.

⟦510 Express shock at the missing left brace; |goto found|⟧ = ⟦
    {
        print_err(strpool!("Missing { inserted"));
        incr(align_state);
        help2(
          strpool!("Where was the left brace? You said something like `\\def\\a}',"),
        )(
          strpool!("which I'm going to interpret as `\\def\\a{}'."),
        );
        error;
        goto found;
    }
⟧

511.

⟦511 If the next character is a parameter number, make |cur_tok| a |match| token; but if it is a left brace, store `|left_brace|, |end_match|', set |hash_brace|, and |goto done|⟧ = ⟦
    {
        s = match_token + cur_chr;
        get_token;
        if (cur_tok < left_brace_limit) {
            hash_brace = cur_tok;
            store_new_token(cur_tok);
            store_new_token(end_match_token);
            goto done;
        }
        if (t == zero_token + 9) {
            print_err(
              strpool!("You already have nine parameters"),
            );
            help2(
              strpool!("I'm going to ignore the # sign you just used,"),
            )(
              strpool!("as well as the token that followed it."),
            );
            error;
            goto continue;
        } else {
            incr(t);
            if (cur_tok != t) {
                print_err(
                  strpool!("Parameters must be numbered consecutively"),
                );
                help2(
                  strpool!("I've inserted the digit you should have used after the #."),
                )(
                  strpool!("Type `1' to delete what you did use."),
                );
                back_error;
            }
            cur_tok = s;
        }
    }
⟧
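
The parameter-scanning logic of sections 509 and 511 can be approximated by the following Python sketch. It is a simplified model, not the program's own code: tokens are single-character strings, the names `scan_parameter_text`, `MATCH`, and `END_MATCH` are hypothetical, and back_error is modelled by stepping the read position back one token so that the offending token is read again.

```python
def scan_parameter_text(tokens):
    """Return (stored, errors) for a parameter text that ends at a brace."""
    stored, errors = [], []
    t = 0  # highest parameter number seen so far
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok in ("{", "}"):
            # Corresponds to done1: the parameter part is finished.
            stored.append("END_MATCH")
            break
        if tok == "#":
            nxt = tokens[i]  # assumes some token follows the '#'
            i += 1
            if nxt == "{":
                # The '#{' case: store the brace and end_match, then stop.
                stored += ["{", "END_MATCH"]
                break
            if t == 9:
                errors.append("You already have nine parameters")
                continue  # ignore both '#' and the token after it
            t += 1
            if nxt != str(t):
                errors.append("Parameters must be numbered consecutively")
                i -= 1  # model of back_error: the token will be reread
            tok = "MATCH"
        stored.append(tok)
    return stored, errors
```

With the input `#2{`, the sketch records one match token, reports the numbering error, and then stores the reread `2` as an ordinary delimiter token, which mirrors the recovery described in the help message.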

512.

⟦512 Scan and build the body of the token list; |goto found| when finished⟧ = ⟦
    unbalance = 1

    loop {
        if (xpand) {
            ⟦513 Expand the next part of the input⟧
        } else {
            get_token;
        }
        if (cur_tok < right_brace_limit) {
            if (cur_cmd < right_brace) {
                incr(unbalance);
            } else {
                decr(unbalance);
                if (unbalance == 0) {
                    goto found;
                }
            }
        } else if (cur_cmd == mac_param) {
            if (macro_def) {
                ⟦514 Look for parameter number or \.{\#\#}⟧
            }
        }
        store_new_token(cur_tok);
    }
⟧
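
The brace counting in section 512 can be illustrated with a small Python sketch. It assumes tokens are plain single-character strings and ignores expansion and macro parameters; the name `scan_body` is hypothetical.

```python
def scan_body(tokens):
    """Collect tokens up to the right brace matching an already-consumed '{'."""
    unbalance = 1  # one left brace has been scanned before the body starts
    body = []
    for tok in tokens:
        if tok == "{":
            unbalance += 1
        elif tok == "}":
            unbalance -= 1
            if unbalance == 0:
                # The matching right brace is not stored in the list.
                return body
        body.append(tok)
    raise ValueError("input ended before the braces balanced")
```

Inner brace pairs are kept in the result, while the final matching right brace is dropped, just as the closing brace is never stored by store_new_token in the loop above.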

513. Here we insert an entire token list created by the_toks without expanding it further.

⟦513 Expand the next part of the input⟧ = ⟦
    {
        loop {
            get_next;
            if (cur_cmd >= call) {
                if (info(link(cur_chr)) == protected_token) {
                    cur_cmd = relax;
                    cur_chr = no_expand_flag;
                }
            }
            if (cur_cmd <= max_command) {
                goto done2;
            }
            if (cur_cmd != the) {
                expand;
            } else {
                q = the_toks;
                if (link(temp_head) != null) {
                    link(p) = link(temp_head);
                    p = q;
                }
            }
        }
      done2:
        x_token;
    }
⟧

514.

⟦514 Look for parameter number or \.{\#\#}⟧ = ⟦
    {
        s = cur_tok;
        if (xpand) {
            get_x_token;
        } else {
            get_token;
        }
        if (cur_cmd != mac_param) {
            if ((cur_tok <= zero_token) || (cur_tok > t)) {
                print_err(
                  strpool!("Illegal parameter number in definition of "),
                );
                sprint_cs(warning_index);
                help3(
                  strpool!("You meant to type ## instead of #, right?"),
                )(
                  strpool!("Or maybe a } was forgotten somewhere earlier, and things"),
                )(
                  strpool!("are all screwed up? I'm going to assume that you meant ##."),
                );
                back_error;
                cur_tok = s;
            } else {
                cur_tok = 
                    out_param_token
                    - ord!("0") + cur_chr
                ;
            }
        }
    }
⟧

515. Another way to create a token list is via the \read command. The sixteen files potentially usable for reading appear in the following global variables. The value of read_open[n] will be closed if stream number n has not been opened or if it has been fully read; just_open if an \openin but not a \read has been done; and normal if it is open and ready to read the next line.

@define closed => 2 // not open, or at end of file
// newly opened, first line not yet read
@define just_open => 1
⟦13 Global variables⟧ += ⟦
    // used for \.{\\read}
    var read_file: array [0 .. 15] of unicode_file;

    // state of read_file [ n ] 
    var read_open: array [0 .. 16] of normal .. closed;
⟧

516.

⟦23 Set initial values of key variables⟧ += ⟦
    for (k in 0 to 16) {
        read_open[k] = closed;
    }
⟧

517. The read_toks procedure constructs a token list like that for any macro definition, and makes cur_val point to it. Parameter r points to the control sequence that will receive this token list.

function read_toks(n: integer, r: pointer, j: halfword) {
    label done;
    var
      p: pointer, // tail of the token list
      q: pointer, // new node being added to the token list 
      // via store_new_token 
      s: integer, // saved value of align_state 
      m: small_number; // stream number
    
    scanner_status = defining;
    warning_index = r;
    def_ref = get_avail;
    token_ref_count(def_ref) = null;
    // the reference count
    p = def_ref;
    store_new_token(end_match_token);
    if ((n < 0) || (n > 15)) {
        m = 16;
    } else {
        m = n;
    }
    s = align_state;
    // disable tab marks, etc.
    align_state = 1000000;
    repeat {
        ⟦518 Input and store tokens from the next line of the file⟧
    } until (align_state == 1000000);
    cur_val = def_ref;
    scanner_status = normal;
    align_state = s;
}
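
The reason read_open has seventeen entries while read_file has sixteen can be seen in a short Python sketch. The constant values for closed and just_open come from section 515; treating normal as 0 is an assumption of this sketch, and `stream_index` is a hypothetical helper name.

```python
NORMAL, JUST_OPEN, CLOSED = 0, 1, 2  # NORMAL = 0 is assumed here

# Entries 0..15 track real files; entry 16 is never opened, so it
# always stays CLOSED.
read_open = [CLOSED] * 17

def stream_index(n: int) -> int:
    """Map a stream number to the index used by read_toks."""
    return 16 if n < 0 or n > 15 else n
```

Because index 16 always reports closed, any out-of-range stream number falls into the branch that reads from the terminal.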

518.

⟦518 Input and store tokens from the next line of the file⟧ = ⟦
    begin_file_reading

    name = m + 1

    if (read_open[m] == closed) {
        ⟦519 Input for \.{\\read} from the terminal⟧
    } else if (read_open[m] == just_open) {
        ⟦520 Input the first line of |read_file[m]|⟧
    } else {
        ⟦521 Input the next line of |read_file[m]|⟧
    }

    limit = last

    if (end_line_char_inactive) {
        decr(limit);
    } else {
        buffer[limit] = end_line_char;
    }

    first = limit + 1

    loc = start

    state = new_line

    ⟦1572 Handle \.{\\readline} and |goto done|⟧

    loop {
        get_token;
        if (cur_tok == 0) {
            //  cur_cmd == cur_chr == 0 will occur at the 
            // end of the line
            goto done;
        }
        // unmatched `\.\}' aborts the line
        if (align_state < 1000000) {
            repeat {
                get_token;
            } until (cur_tok == 0);
            align_state = 1000000;
            goto done;
        }
        store_new_token(cur_tok);
    }

  done:
    end_file_reading
⟧
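
The interplay between read_toks and section 518 (keep reading lines until align_state returns to its starting value) can be modelled roughly in Python. This sketch makes several simplifying assumptions: characters stand in for tokens, the space normally contributed by the end-of-line character is ignored, and `read_balanced` is a hypothetical name.

```python
BASE = 1000000  # the value the source assigns to align_state

def read_balanced(lines):
    """Accumulate characters from successive lines until braces balance."""
    align_state = BASE
    stored = []
    for line in lines:
        for ch in line:
            if ch == "{":
                align_state += 1
            elif ch == "}":
                align_state -= 1
            if align_state < BASE:
                # Unmatched right brace: discard it and the rest of the line.
                align_state = BASE
                break
            stored.append(ch)
        if align_state == BASE:
            return "".join(stored)
    raise ValueError("file ended with unbalanced braces")
```

A line that opens a group without closing it causes the next line to be read as well, which is the multi-line behaviour the repeat loop in read_toks provides.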

519. Here we input on-line into the buffer array, prompting the user explicitly if n >= 0. The value of n is set negative so that additional prompts will not be given in the case of multi-line input.

⟦519 Input for \.{\\read} from the terminal⟧ = ⟦
    if (interaction > nonstop_mode) {
        if (n < 0) {
            prompt_input(strpool!(""));
        } else {
            wake_up_terminal;
            print_ln;
            sprint_cs(r);
            prompt_input(ord!("="));
            n = -1;
        }
    } else {
        limit = 0;
        fatal_error(
          strpool!("*** (cannot \\read from terminal in nonstop modes)"),
        );
    }
⟧

520. The first line of a file must be treated specially, since input_ln must be told not to start with get.

⟦520 Input the first line of |read_file[m]|⟧ = ⟦
    if (input_ln(read_file[m], false)) {
        read_open[m] = normal;
    } else {
        u_close(read_file[m]);
        read_open[m] = closed;
    }
⟧

521. An empty line is appended at the end of a read_file.

⟦521 Input the next line of |read_file[m]|⟧ = ⟦
    {
        if (!input_ln(read_file[m], true)) {
            u_close(read_file[m]);
            read_open[m] = closed;
            if (align_state != 1000000) {
                runaway;
                print_err(strpool!("File ended within "));
                print_esc(strpool!("read"));
                help1(
                  strpool!("This \\read has unbalanced braces."),
                );
                align_state = 1000000;
                limit = 0;
                error;
            }
        }
    }
⟧

522. [28] Conditional processing. We consider now the way TEX handles various kinds of \if commands.

// amount added for `\.{\\unless}' prefix
@define unless_code => 32
@define if_char_code => 0 // `\.{\\if}'
@define if_cat_code => 1 // `\.{\\ifcat}'
@define if_int_code => 2 // `\.{\\ifnum}'
@define if_dim_code => 3 // `\.{\\ifdim}'
@define if_odd_code => 4 // `\.{\\ifodd}'
@define if_vmode_code => 5 // `\.{\\ifvmode}'
@define if_hmode_code => 6 // `\.{\\ifhmode}'
@define if_mmode_code => 7 // `\.{\\ifmmode}'
@define if_inner_code => 8 // `\.{\\ifinner}'
@define if_void_code => 9 // `\.{\\ifvoid}'
@define if_hbox_code => 10 // `\.{\\ifhbox}'
@define if_vbox_code => 11 // `\.{\\ifvbox}'
@define ifx_code => 12 // `\.{\\ifx}'
@define if_eof_code => 13 // `\.{\\ifeof}'
@define if_true_code => 14 // `\.{\\iftrue}'
@define if_false_code => 15 // `\.{\\iffalse}'
@define if_case_code => 16 // `\.{\\ifcase}'
@define if_primitive_code => 21 // `\.{\\ifprimitive}'
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("if"), if_test, if_char_code)

    primitive(strpool!("ifcat"), if_test, if_cat_code)

    primitive(strpool!("ifnum"), if_test, if_int_code)

    primitive(strpool!("ifdim"), if_test, if_dim_code)

    primitive(strpool!("ifodd"), if_test, if_odd_code)

    primitive(strpool!("ifvmode"), if_test, if_vmode_code)

    primitive(strpool!("ifhmode"), if_test, if_hmode_code)

    primitive(strpool!("ifmmode"), if_test, if_mmode_code)

    primitive(strpool!("ifinner"), if_test, if_inner_code)

    primitive(strpool!("ifvoid"), if_test, if_void_code)

    primitive(strpool!("ifhbox"), if_test, if_hbox_code)

    primitive(strpool!("ifvbox"), if_test, if_vbox_code)

    primitive(strpool!("ifx"), if_test, ifx_code)

    primitive(strpool!("ifeof"), if_test, if_eof_code)

    primitive(strpool!("iftrue"), if_test, if_true_code)

    primitive(strpool!("iffalse"), if_test, if_false_code)

    primitive(strpool!("ifcase"), if_test, if_case_code)

    primitive(
      strpool!("ifprimitive"),
      if_test,
      if_primitive_code,
    )
⟧

523.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    if_test:

    {
        if (chr_code >= unless_code) {
            print_esc(strpool!("unless"));
        }
        case chr_code % unless_code {
          if_cat_code:
            print_esc(strpool!("ifcat"));
          if_int_code:
            print_esc(strpool!("ifnum"));
          if_dim_code:
            print_esc(strpool!("ifdim"));
          if_odd_code:
            print_esc(strpool!("ifodd"));
          if_vmode_code:
            print_esc(strpool!("ifvmode"));
          if_hmode_code:
            print_esc(strpool!("ifhmode"));
          if_mmode_code:
            print_esc(strpool!("ifmmode"));
          if_inner_code:
            print_esc(strpool!("ifinner"));
          if_void_code:
            print_esc(strpool!("ifvoid"));
          if_hbox_code:
            print_esc(strpool!("ifhbox"));
          if_vbox_code:
            print_esc(strpool!("ifvbox"));
          ifx_code:
            print_esc(strpool!("ifx"));
          if_eof_code:
            print_esc(strpool!("ifeof"));
          if_true_code:
            print_esc(strpool!("iftrue"));
          if_false_code:
            print_esc(strpool!("iffalse"));
          if_case_code:
            print_esc(strpool!("ifcase"));
          if_primitive_code:
            print_esc(strpool!("ifprimitive"));
          ⟦1575 Cases of |if_test| for |print_cmd_chr|⟧
          othercases:
            print_esc(strpool!("if"));
        }
    }
⟧
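
The way the \unless prefix is folded into chr_code can be sketched in Python. The table below lists only the codes defined in section 522; the further cases contributed by module 1575 are not modelled, so they fall through to the default exactly as unknown codes would. The name `describe_if_test` is hypothetical.

```python
UNLESS_CODE = 32  # amount added for the \unless prefix

IF_NAMES = {
    0: "if", 1: "ifcat", 2: "ifnum", 3: "ifdim", 4: "ifodd",
    5: "ifvmode", 6: "ifhmode", 7: "ifmmode", 8: "ifinner",
    9: "ifvoid", 10: "ifhbox", 11: "ifvbox", 12: "ifx",
    13: "ifeof", 14: "iftrue", 15: "iffalse", 16: "ifcase",
    21: "ifprimitive",
}

def describe_if_test(chr_code: int) -> str:
    """Return the printed form of an if_test command code."""
    prefix = "\\unless" if chr_code >= UNLESS_CODE else ""
    # The remainder modulo UNLESS_CODE selects the underlying test.
    return prefix + "\\" + IF_NAMES.get(chr_code % UNLESS_CODE, "if")
```

A single integer therefore carries both the kind of test and whether its result is to be negated.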

524. Conditions can be inside conditions, and this nesting has a stack that is independent of the save_stack.

Four global variables represent the top of the condition stack: cond_ptr points to pushed-down entries, if any; if_limit specifies the largest code of a fi_or_else command that is syntactically legal; cur_if is the name of the current type of conditional; and if_line is the line number at which it began.

If no conditions are currently in progress, the condition stack has the special state cond_ptr == null, if_limit == normal, cur_if == 0, if_line == 0. Otherwise cond_ptr points to a two-word node; the type, subtype, and link fields of the first word contain if_limit, cur_if, and cond_ptr at the next level, and the second word contains the corresponding if_line.

// number of words in stack entry for conditionals
@define if_node_size => 2
@define if_line_field(#) => mem[# + 1].int
@define if_code => 1 // code for \.{\\if...} being evaluated
@define fi_code => 2 // code for \.{\\fi}
@define else_code => 3 // code for \.{\\else}
@define or_code => 4 // code for \.{\\or}
⟦13 Global variables⟧ += ⟦
    // top of the condition stack
    var cond_ptr: pointer;

    // upper bound on fi_or_else codes
    var if_limit: normal .. or_code;

    // type of conditional being worked on
    var cur_if: small_number;

    // line where that conditional began
    var if_line: integer;
⟧

525.

⟦23 Set initial values of key variables⟧ += ⟦
    cond_ptr = null

    if_limit = normal

    cur_if = 0

    if_line = 0
⟧

526.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("fi"), fi_or_else, fi_code)

    text(frozen_fi) = strpool!("fi")

    eqtb[frozen_fi] = eqtb[cur_val]

    primitive(strpool!("or"), fi_or_else, or_code)

    primitive(strpool!("else"), fi_or_else, else_code)
⟧

527.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    fi_or_else:

    if (chr_code == fi_code) {
        print_esc(strpool!("fi"));
    } else if (chr_code == or_code) {
        print_esc(strpool!("or"));
    } else {
        print_esc(strpool!("else"));
    }
⟧

528. When we skip conditional text, we keep track of the line number where skipping began, for use in error messages.

⟦13 Global variables⟧ += ⟦
    // skipping began here
    var skip_line: integer;
⟧

529. Here is a procedure that ignores text until coming to an \or, \else, or \fi at the current level of \if...\fi nesting. After it has acted, cur_chr will indicate the token that was found, but cur_tok will not be set (because this makes the procedure run faster).

function pass_text() {
    label done;
    var
      l: integer, // level of $\.{\\if}\ldots\.{\\fi}$ 
      // nesting
      save_scanner_status: small_number; //  scanner_status 
      // upon entry
    
    save_scanner_status = scanner_status;
    scanner_status = skipping;
    l = 0;
    skip_line = line;
    loop {
        get_next;
        if (cur_cmd == fi_or_else) {
            if (l == 0) {
                goto done;
            }
            if (cur_chr == fi_code) {
                decr(l);
            }
        } else if (cur_cmd == if_test) {
            incr(l);
        }
    }
  done:
    scanner_status = save_scanner_status;
    if (tracing_ifs > 0) {
        show_cur_cmd_chr;
    }
}
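
The level counting in pass_text can be illustrated by a Python sketch over a list of token names. It assumes, as a simplification, that every token whose name begins with \if is an if_test command; the real procedure looks at cur_cmd instead of names. The Python function and its `start` parameter are illustrative only.

```python
def pass_text(tokens, start=0):
    """Return the index of the or/else/fi token found at nesting level 0."""
    level = 0
    for i in range(start, len(tokens)):
        tok = tokens[i]
        if tok in ("\\or", "\\else", "\\fi"):
            if level == 0:
                return i
            if tok == "\\fi":
                level -= 1  # a nested conditional has ended
        elif tok.startswith("\\if"):
            level += 1  # a nested conditional begins
    raise ValueError("input ended while skipping conditional text")
```

An \else that belongs to a nested conditional is passed over, because the level is still positive when it is seen.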

530. When we begin to process a new \if, we set if_limit = if_code; then if \or or \else or \fi occurs before the current \if condition has been evaluated, \relax will be inserted. For example, a sequence of commands like ‘\ifvoid1\else...\fi’ would otherwise require something after the ‘1’.

⟦530 Push the condition stack⟧ = ⟦
    {
        p = get_node(if_node_size);
        link(p) = cond_ptr;
        type(p) = if_limit;
        subtype(p) = cur_if;
        if_line_field(p) = if_line;
        cond_ptr = p;
        cur_if = cur_chr;
        if_limit = if_code;
        if_line = line;
    }
⟧

531.

⟦531 Pop the condition stack⟧ = ⟦
    {
        if (if_stack[in_open] == cond_ptr) {
            // conditionals possibly not properly nested 
            // with files
            if_warning;
        }
        p = cond_ptr;
        if_line = if_line_field(p);
        cur_if = subtype(p);
        if_limit = type(p);
        cond_ptr = link(p);
        free_node(p, if_node_size);
    }
⟧
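
The push and pop operations of sections 530 and 531 amount to a linked stack whose top entry lives in four global variables. A minimal Python sketch, assuming normal is 0 and leaving out the if_warning check, might look like this; the class names `IfNode` and `CondStack` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

NORMAL, IF_CODE, FI_CODE, ELSE_CODE, OR_CODE = 0, 1, 2, 3, 4  # NORMAL = 0 assumed

@dataclass
class IfNode:
    if_limit: int              # stored in the type field
    cur_if: int                # stored in the subtype field
    if_line: int               # stored in the second word
    link: Optional["IfNode"]   # cond_ptr at the next level

class CondStack:
    def __init__(self):
        # The special state when no conditions are in progress.
        self.cond_ptr = None
        self.if_limit = NORMAL
        self.cur_if = 0
        self.if_line = 0

    def push(self, cur_chr: int, line: int) -> None:
        # Save the current top of stack in a new node, then start a new level.
        self.cond_ptr = IfNode(self.if_limit, self.cur_if, self.if_line, self.cond_ptr)
        self.cur_if = cur_chr
        self.if_limit = IF_CODE
        self.if_line = line

    def pop(self) -> None:
        # Restore the saved values from the topmost node.
        p = self.cond_ptr
        self.if_line, self.cur_if, self.if_limit = p.if_line, p.cur_if, p.if_limit
        self.cond_ptr = p.link
```

After as many pops as pushes, the sketch returns to the special empty state described in section 524.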

532. Here’s a procedure that changes the if_limit code corresponding to a given value of cond_ptr.

function change_if_limit(l: small_number, p: pointer) {
    label exit;
    var q: pointer;
    
    if (p == cond_ptr) {
        // that's the easy case
        if_limit = l;
    } else {
        q = cond_ptr;
        loop {
            if (q == null) {
                confusion(strpool!("if"));
            }
            if (link(q) == p) {
                type(q) = l;
                return;
            }
            q = link(q);
        }
    }
  exit:
}

533. A condition is started when the expand procedure encounters an if_test command; in that case expand reduces to conditional, which is a recursive procedure.

function conditional() {
    label exit, common_ending;
    var
      b: boolean, // is the condition true?
      e: boolean, // keep track of nested csnames
      r: ord!("<") .. ord!(">"), // relation to be evaluated
      m, n: integer, // to be tested against the second 
      // operand
      p, q: pointer, // for traversing token lists in 
      // \.{\\ifx} tests
      save_scanner_status: small_number, //  scanner_status 
      // upon entry
      save_cond_ptr: pointer, //  cond_ptr corresponding to 
      // this conditional
      this_if: small_number, // type of this conditional
      is_unless: boolean; // was this if preceded by 
      // `\.{\\unless}' ?
    
    if (tracing_ifs > 0) {
        if (tracing_commands <= 1) {
            show_cur_cmd_chr;
        }
    }
    ⟦530 Push the condition stack⟧
    save_cond_ptr = cond_ptr;
    is_unless = (cur_chr >= unless_code);
    this_if = cur_chr % unless_code;
    ⟦536 Either process \.{\\ifcase} or set |b| to the value of a boolean condition⟧
    if (is_unless) {
        b = !b;
    }
    if (tracing_commands > 1) {
        ⟦537 Display the value of |b|⟧
    }
    if (b) {
        change_if_limit(else_code, save_cond_ptr);
        // wait for \.{\\else} or \.{\\fi}
        return;
    }
    ⟦535 Skip to \.{\\else} or \.{\\fi}, then |goto common_ending|⟧
  common_ending:
    if (cur_chr == fi_code) {
        ⟦531 Pop the condition stack⟧
    } else {
        // wait for \.{\\fi}
        if_limit = fi_code;
    }
  exit:
}

534. In a construction like ‘\if\iftrue abc\else d\fi’, the first \else that we come to after learning that the \if is false is not the \else we’re looking for. Hence the following curious logic is needed.

535.

⟦535 Skip to \.{\\else} or \.{\\fi}, then |goto common_ending|⟧ = ⟦
    loop {
        pass_text;
        if (cond_ptr == save_cond_ptr) {
            if (cur_chr != or_code) {
                goto common_ending;
            }
            print_err(strpool!("Extra "));
            print_esc(strpool!("or"));
            help1(
              strpool!("I'm ignoring this; it doesn't match any \\if."),
            );
            error;
        } else if (cur_chr == fi_code) {
            ⟦531 Pop the condition stack⟧
        }
    }
⟧

536.

⟦536 Either process \.{\\ifcase} or set |b| to the value of a boolean condition⟧ = ⟦
    case this_if {
      if_char_code, if_cat_code:
        ⟦541 Test if two characters match⟧
      if_int_code, if_dim_code:
        ⟦538 Test relation between integers or dimensions⟧
      if_odd_code:
        ⟦539 Test if an integer is odd⟧
      if_vmode_code:
        b = (abs(mode) == vmode);
      if_hmode_code:
        b = (abs(mode) == hmode);
      if_mmode_code:
        b = (abs(mode) == mmode);
      if_inner_code:
        b = (mode < 0);
      if_void_code, if_hbox_code, if_vbox_code:
        ⟦540 Test box register status⟧
      ifx_code:
        ⟦542 Test if two tokens match⟧
      if_eof_code:
        scan_four_bit_int_or_18;
        if (cur_val == 18) {
            b = !shellenabledp;
        } else {
            b = (read_open[cur_val] == closed);
        }
      if_true_code:
        b = true;
      if_false_code:
        b = false;
      ⟦1577 Cases for |conditional|⟧
      if_case_code:
        ⟦544 Select the appropriate case and |return| or |goto common_ending|⟧
      if_primitive_code:
        save_scanner_status = scanner_status;
        scanner_status = normal;
        get_next;
        scanner_status = save_scanner_status;
        if (cur_cs < hash_base) {
            m = prim_lookup(cur_cs - single_base);
        } else {
            m = prim_lookup(text(cur_cs));
        }
        b = (
            (cur_cmd != undefined_cs)
            && (m != undefined_primitive)
            && (cur_cmd == prim_eq_type(m))
            && (cur_chr == prim_equiv(m))
        );
      // there are no other cases
    }
⟧

537.

⟦537 Display the value of |b|⟧ = ⟦
    {
        begin_diagnostic;
        if (b) {
            print(strpool!("{true}"));
        } else {
            print(strpool!("{false}"));
        }
        end_diagnostic(false);
    }
⟧

538. Here we use the fact that ord!("<"), ord!("="), and ord!(">") are consecutive ASCII codes.

⟦538 Test relation between integers or dimensions⟧ = ⟦
    {
        if (this_if == if_int_code) {
            scan_int;
        } else {
            scan_normal_dimen;
        }
        n = cur_val;
        ⟦440 Get the next non-blank non-call token⟧
        if (
            (cur_tok >= other_token + ord!("<"))
            && (cur_tok <= other_token + ord!(">"))
        ) {
            r = cur_tok - other_token;
        } else {
            print_err(strpool!("Missing = inserted for "));
            print_cmd_chr(if_test, this_if);
            help1(
              strpool!("I was expecting to see `<', `=', or `>'. Didn't."),
            );
            back_error;
            r = ord!("=");
        }
        if (this_if == if_int_code) {
            scan_int;
        } else {
            scan_normal_dimen;
        }
        case r {
          ord!("<"):
            b = (n < cur_val);
          ord!("="):
            b = (n == cur_val);
          ord!(">"):
            b = (n > cur_val);
        }
    }
⟧
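
The range test used here can be illustrated with a minimal Python sketch. The names `OTHER_TOKEN`, `parse_relation`, and `compare` are hypothetical, and the value of `OTHER_TOKEN` is an arbitrary offset chosen for illustration; only the idea (one range check covers all three relation characters) is taken from the code above.

```python
# Illustrative offset only; not asserted to be the constant the program uses.
OTHER_TOKEN = 0x1800000

def parse_relation(cur_tok):
    """Return (relation, was_missing) for a token code.

    ord('<'), ord('='), ord('>') are 60, 61, 62, so a single range
    check recognises all three relation characters.
    """
    if OTHER_TOKEN + ord("<") <= cur_tok <= OTHER_TOKEN + ord(">"):
        return chr(cur_tok - OTHER_TOKEN), False
    # Any other token: behave as if '=' had been inserted.
    return "=", True

def compare(n, relation, m):
    """Evaluate n <relation> m for one of '<', '=', '>'."""
    return {"<": n < m, "=": n == m, ">": n > m}[relation]
```

The fallback to `=` mirrors the "Missing = inserted" recovery; the sketch does not model pushing the offending token back onto the input.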

539.

⟦539 Test if an integer is odd⟧ = ⟦
    {
        scan_int;
        b = odd(cur_val);
    }
⟧

540.

⟦540 Test box register status⟧ = ⟦
    {
        scan_register_num;
        fetch_box(p);
        if (this_if == if_void_code) {
            b = (p == null);
        } else if (p == null) {
            b = false;
        } else if (this_if == if_hbox_code) {
            b = (type(p) == hlist_node);
        } else {
            b = (type(p) == vlist_node);
        }
    }
⟧

541. An active character will be treated as category 13 following \if\noexpand or following \ifcat\noexpand. We use the fact that active characters have the smallest tokens, among all control sequences.

@define get_x_token_or_active_char =>
    {
        get_x_token;
        if (cur_cmd == relax) {
            if (cur_chr == no_expand_flag) {
                cur_cmd = active_char;
                cur_chr = 
                    cur_tok
                    - cs_token_flag - active_base
                ;
            }
        }
    }
⟦541 Test if two characters match⟧ = ⟦
    {
        get_x_token_or_active_char;
        // not a character
        if (
            (cur_cmd > active_char)
            || (cur_chr > biggest_usv)
        ) {
            m = relax;
            n = too_big_usv;
        } else {
            m = cur_cmd;
            n = cur_chr;
        }
        get_x_token_or_active_char;
        if (
            (cur_cmd > active_char)
            || (cur_chr > biggest_usv)
        ) {
            cur_cmd = relax;
            cur_chr = too_big_usv;
        }
        if (this_if == if_char_code) {
            b = (n == cur_chr);
        } else {
            b = (m == cur_cmd);
        }
    }
⟧

542. Note that ‘\ifx’ will declare two macros different if one is \long or \outer and the other isn’t, even though the texts of the macros are the same.

We need to reset scanner_status, since \outer control sequences are allowed, but we might be scanning a macro definition or preamble.

⟦542 Test if two tokens match⟧ = ⟦
    {
        save_scanner_status = scanner_status;
        scanner_status = normal;
        get_next;
        n = cur_cs;
        p = cur_cmd;
        q = cur_chr;
        get_next;
        if (cur_cmd != p) {
            b = false;
        } else if (cur_cmd < call) {
            b = (cur_chr == q);
        } else {
            ⟦543 Test if two macro texts match⟧
        }
        scanner_status = save_scanner_status;
    }
⟧

543. Note also that ‘\ifx’ decides that macros \a and \b are different in examples like this:

\def\a{\c}\def\c{}\def\b{\d}\def\d{}

⟦543 Test if two macro texts match⟧ = ⟦
    {
        p = link(cur_chr);
        // omit reference counts
        q = link(equiv(n));
        if (p == q) {
            b = true;
        } else {
            while ((p != null) && (q != null)) {
                if (info(p) != info(q)) {
                    p = null;
                } else {
                    p = link(p);
                    q = link(q);
                }
            }
            b = ((p == null) && (q == null));
        }
    }
⟧
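
As a rough illustration of why \a and \b compare as different, the following Python sketch models each macro body as a list of token names and compares the bodies without expanding them. The dictionary `defs` and the function `ifx_macros_match` are hypothetical stand-ins for the linked token lists walked above; reference counts and the \long/\outer distinction are not modelled.

```python
# Hypothetical model of: \def\a{\c}\def\c{}\def\b{\d}\def\d{}
defs = {
    "a": ["\\c"],
    "c": [],
    "b": ["\\d"],
    "d": [],
}

def ifx_macros_match(x, y):
    """Compare two macro bodies token by token, without expansion."""
    p, q = defs[x], defs[y]
    if p is q:
        # Same underlying list: trivially equal, like the p == q shortcut.
        return True
    # Bodies match only if they have the same length and equal tokens.
    return len(p) == len(q) and all(s == t for s, t in zip(p, q))
```

In this model \c and \d match each other because both bodies are empty, while \a and \b differ because the tokens \c and \d are different tokens, even though both would expand to nothing.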

544.

⟦544 Select the appropriate case and |return| or |goto common_ending|⟧ = ⟦
    {
        scan_int;
        //  n is the number of cases to pass
        n = cur_val;
        if (tracing_commands > 1) {
            begin_diagnostic;
            print(strpool!("{case "));
            print_int(n);
            print_char(ord!("}"));
            end_diagnostic(false);
        }
        while (n != 0) {
            pass_text;
            if (cond_ptr == save_cond_ptr) {
                if (cur_chr == or_code) {
                    decr(n);
                } else {
                    goto common_ending;
                }
            } else if (cur_chr == fi_code) {
                ⟦531 Pop the condition stack⟧
            }
        }
        change_if_limit(or_code, save_cond_ptr);
        // wait for \.{\\or}, \.{\\else}, or \.{\\fi}
        return;
    }
⟧
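
The case-skipping loop can be sketched in Python under strong simplifying assumptions: the input is a plain list of token strings, any token starting with `\if` opens a nested conditional, and the function returns the selected branch instead of leaving the scanner positioned in the input. The names `pass_text` and `select_case` follow the text, but their signatures here are invented for the sketch.

```python
def pass_text(tokens, i):
    """Return the index of the next \\or, \\else or \\fi at nesting level 0,
    scanning forward from index i."""
    level = 0
    while True:
        tok = tokens[i]
        if tok.startswith("\\if"):
            level += 1
        elif tok in ("\\or", "\\else", "\\fi"):
            if level == 0:
                return i
            if tok == "\\fi":
                level -= 1
        i += 1

def select_case(n, tokens):
    """tokens: everything after '\\ifcase <n>' up to and including its \\fi."""
    i = 0
    while n != 0:
        i = pass_text(tokens, i)
        if tokens[i] != "\\or":
            if tokens[i] == "\\fi":
                return []                 # ran out of cases, no \else
            start = i + 1                 # take the \else branch
            return tokens[start:pass_text(tokens, start)]
        n -= 1                            # one case passed
        i += 1
    return tokens[i:pass_text(tokens, i)]
```

Because the loop runs while `n != 0` and only ever decrements, a negative `n` can never reach zero, so it falls through to the `\else` branch (or to nothing), just as an out-of-range positive value does.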

545. The processing of conditionals is complete except for the following code, which is actually part of expand. It comes into play when \or, \else, or \fi is scanned.

⟦545 Terminate the current conditional and skip to \.{\\fi}⟧ = ⟦
    {
        if (tracing_ifs > 0) {
            if (tracing_commands <= 1) {
                show_cur_cmd_chr;
            }
        }
        if (cur_chr > if_limit) {
            if (if_limit == if_code) {
                // condition not yet evaluated
                insert_relax;
            } else {
                print_err(strpool!("Extra "));
                print_cmd_chr(fi_or_else, cur_chr);
                help1(
                  strpool!("I'm ignoring this; it doesn't match any \\if."),
                );
                error;
            }
        } else {
            while (cur_chr != fi_code) {
                // skip to \.{\\fi}
                pass_text;
            }
            ⟦531 Pop the condition stack⟧
        }
    }
⟧

546. [29] File names. It’s time now to fret about file names. Besides the fact that different operating systems treat files in different ways, we must cope with the fact that completely different naming conventions are used by different groups of people. The following programs show what is required for one particular operating system; similar routines for other systems are not difficult to devise.

TEX assumes that a file name has three parts: the name proper; its “extension”; and a “file area” where it is found in an external file system. The extension of an input file or a write file is assumed to be ‘.tex’ unless otherwise specified; it is ‘.log’ on the transcript file that records each run of TEX; it is ‘.tfm’ on the font metric files that describe characters in the fonts TEX uses; it is ‘.dvi’ on the output files that specify typesetting information; and it is ‘.fmt’ on the format files written by INITEX to initialize TEX. The file area can be arbitrary on input files, but files are usually output to the user’s current area. If an input file cannot be found on the specified area, TEX will look for it on a special system area; this special area is intended for commonly used input files like webmac.tex.

Simple uses of TEX refer only to file names that have no explicit extension or area. For example, a person usually says ‘\input paper’ or ‘\font\tenrm = helvetica’ instead of ‘\input paper.new’ or ‘\font\tenrm = <csd.knuth>test’. Simple file names are best, because they make the TEX source files portable; whenever a file name consists entirely of letters and digits, it should be treated in the same way by all implementations of TEX. However, users need the ability to refer to other files in their environment, especially when responding to error messages concerning unopenable files; therefore we want to let them use the syntax that appears in their favorite operating system.

547. In order to isolate the system-dependent aspects of file names, the system-independent parts of TEX are expressed in terms of three system-dependent procedures called begin_name, more_name, and end_name. In essence, if the user-specified characters of the file name are $c_1 \ldots c_n$, the system-independent driver program does the operations

$$\mathit{begin\_name};\ \mathit{more\_name}(c_1);\ \ldots;\ \mathit{more\_name}(c_n);\ \mathit{end\_name}.$$
These three procedures communicate with each other via global variables. Afterwards the file name will appear in the string pool as three strings called cur_name, cur_area, and cur_ext; the latter two are null (i.e., strpool!("")), unless they were explicitly specified by the user.

Actually the situation is slightly more complicated, because TEX needs to know when the file name ends. The more_name routine is a function (with side effects) that returns true on the calls more_name($c_1$), …, more_name($c_{n-1}$). The final call more_name($c_n$) returns false; or, it returns true and the token following $c_n$ is something like ‘\hbox’ (i.e., not a character). In other words, more_name is supposed to return true unless it is sure that the file name has been completely scanned; and end_name is supposed to be able to finish the assembly of cur_name, cur_area, and cur_ext regardless of whether more_name($c_n$) returned true or false.

⟦13 Global variables⟧ += ⟦
    // name of file just scanned
    var cur_name: str_number;

    // file area just scanned, or \.{""}
    var cur_area: str_number;

    // file extension just scanned, or \.{""}
    var cur_ext: str_number;
⟧

548. The file names we shall deal with have the following structure: If the name contains ‘/’ or ‘:’ (for Amiga only), the file area consists of all characters up to and including the final such character; otherwise the file area is null. If the remaining file name contains ‘.’, the file extension consists of all characters from the last ‘.’ to the end; otherwise the file extension is null.

We can scan such file names easily by using two global variables that keep track of the occurrences of area and extension delimiters:

⟦13 Global variables⟧ += ⟦
    // the most recent `\./', if any
    var area_delimiter: pool_pointer;

    // the most recent `\..', if any
    var ext_delimiter: pool_pointer;

    var file_name_quote_char: UTF16_code;
⟧
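
The splitting rule just described can be sketched on a complete string in Python. The function `split_file_name` and its `dir_seps` parameter are hypothetical; the program itself scans character by character and records only the two delimiter positions.

```python
def split_file_name(s, dir_seps="/"):
    """Split s into (area, name, ext) following the rule described above."""
    # Area: everything up to and including the last directory separator.
    area_end = max(s.rfind(ch) for ch in dir_seps) + 1
    area, rest = s[:area_end], s[area_end:]
    # Extension: from the last '.' of the remainder to the end.
    dot = rest.rfind(".")
    if dot < 0:
        return area, rest, ""
    return area, rest[:dot], rest[dot:]
```

For example, `split_file_name("/usr/share/paper.new.tex")` yields the area `/usr/share/`, the name `paper.new`, and the extension `.tex`; a dot inside the area, as in `a.b/c`, does not count as an extension delimiter.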

549. Input files that can’t be found in the user’s area may appear in a standard system area called TEX_area. Font metric files whose areas are not given explicitly are assumed to appear in a standard system area called TEX_font_area. These system area names will, of course, vary from place to place.

In C, the default paths are specified separately.

550. Here now is the first of the system-dependent routines for file name scanning.

function begin_name() {
    area_delimiter = 0;
    ext_delimiter = 0;
    quoted_filename = false;
    file_name_quote_char = 0;
}

551. And here’s the second. The string pool might change as the file name is being scanned, since a new \csname might be entered; therefore we keep area_delimiter and ext_delimiter relative to the beginning of the current string, instead of assigning an absolute address like pool_ptr to them.

function more_name(c: ASCII_code): boolean {
    if (
        stop_at_space
        && (c == ord!(" ")) && (file_name_quote_char == 0)
    ) {
        more_name = false;
    } else if (
        stop_at_space
        && (file_name_quote_char != 0)
        && (c == file_name_quote_char)
    ) {
        file_name_quote_char = 0;
        more_name = true;
    } else if (
        stop_at_space
        && (file_name_quote_char == 0)
        && ((c == ord!("\"")) || (c == ord!("'")))
    ) {
        file_name_quote_char = c;
        quoted_filename = true;
        more_name = true;
    } else {
        str_room(1);
        // contribute c to the current string
        append_char(c);
        if (IS_DIR_SEP(c)) {
            area_delimiter = cur_length;
            ext_delimiter = 0;
        } else if (c == ord!(".")) {
            ext_delimiter = cur_length;
        }
        more_name = true;
    }
}
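
The interplay of quoting and delimiter tracking in more_name may be easier to follow in a small Python model. This is a sketch under assumptions: the class `NameScanner` and the helper `scan` are invented, only `/` is treated as a directory separator (the real test, IS_DIR_SEP, is system dependent), and plain Python strings stand in for the string pool.

```python
class NameScanner:
    """Toy model of begin_name / more_name / end_name."""

    def begin_name(self):
        self.chars = []
        self.area_delimiter = 0   # 1-based position of the last '/', 0 if none
        self.ext_delimiter = 0    # 1-based position of the last '.', 0 if none
        self.quote_char = ""      # currently open quote, if any

    def more_name(self, c, stop_at_space=True):
        if stop_at_space and c == " " and not self.quote_char:
            return False                      # unquoted space ends the name
        if stop_at_space and self.quote_char and c == self.quote_char:
            self.quote_char = ""              # closing quote is not stored
            return True
        if stop_at_space and not self.quote_char and c in ('"', "'"):
            self.quote_char = c               # opening quote is not stored
            return True
        self.chars.append(c)
        if c == "/":
            self.area_delimiter = len(self.chars)
            self.ext_delimiter = 0            # a '.' before '/' is no extension
        elif c == ".":
            self.ext_delimiter = len(self.chars)
        return True

    def end_name(self):
        s = "".join(self.chars)
        area = s[:self.area_delimiter]
        if self.ext_delimiter == 0:
            return area, s[self.area_delimiter:], ""
        cut = self.ext_delimiter - 1
        return area, s[self.area_delimiter:cut], s[cut:]

def scan(text):
    """Drive the three routines over text and return (area, name, ext)."""
    sc = NameScanner()
    sc.begin_name()
    for c in text:
        if not sc.more_name(c):
            break
    return sc.end_name()
```

Both delimiters are stored as lengths of the name scanned so far, not as absolute positions, which is the property the paragraph above relies on when the string pool changes during scanning.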

552. The third. If a string is already in the string pool, the function slow_make_string does not create a new string but returns this string number, thus saving string space. Because of this new property of the returned string number it is not possible to apply flush_string to these strings.

function end_name() {
    var
      temp_str: str_number, // result of file name cache 
      // lookups
      j: pool_pointer; // running index
    
    if (str_ptr + 3 > max_strings) {
        overflow(
          strpool!("number of strings"),
          max_strings - init_str_ptr,
        );
    }
    if (area_delimiter == 0) {
        cur_area = strpool!("");
    } else {
        cur_area = str_ptr;
        str_start_macro(str_ptr + 1) = 
            str_start_macro(str_ptr)
            + area_delimiter
        ;
        incr(str_ptr);
        temp_str = search_string(cur_area);
        if (temp_str > 0) {
            cur_area = temp_str;
            // no flush_string , pool_ptr will be wrong!
            decr(str_ptr);
            for (j in str_start_macro(str_ptr + 1) to 
                pool_ptr
                - 1
            ) {
                str_pool[j - area_delimiter] = str_pool[j];
            }
            // update pool_ptr 
            pool_ptr = pool_ptr - area_delimiter;
        }
    }
    if (ext_delimiter == 0) {
        cur_ext = strpool!("");
        cur_name = slow_make_string;
    } else {
        cur_name = str_ptr;
        str_start_macro(str_ptr + 1) = 
            str_start_macro(str_ptr)
            + ext_delimiter - area_delimiter - 1
        ;
        incr(str_ptr);
        cur_ext = make_string;
        // undo extension string to look at name part
        decr(str_ptr);
        temp_str = search_string(cur_name);
        if (temp_str > 0) {
            cur_name = temp_str;
            // no flush_string , pool_ptr will be wrong!
            decr(str_ptr);
            for (j in str_start_macro(str_ptr + 1) to 
                pool_ptr
                - 1
            ) {
                str_pool[
                  j - ext_delimiter + area_delimiter + 1,
                ] = str_pool[j];
            }
            // update pool_ptr 
            pool_ptr = 
                pool_ptr
                - ext_delimiter + area_delimiter + 1
            ;
        }
        // remake extension string
        cur_ext = slow_make_string;
    }
}

553. Conversely, here is a routine that takes three strings and prints a file name that might have produced them. (The routine is system dependent, because some operating systems put the file area last instead of first.)

// check if string # needs quoting
@define check_quoted(#) =>
    if (# != 0) {
        j = str_start_macro(#);
        while (
            ((!must_quote) || (quote_char == 0))
            && (j < str_start_macro(# + 1))
        ) {
            if (str_pool[j] == ord!(" ")) {
                must_quote = true;
            } else if (
                (str_pool[j] == ord!("\""))
                || (str_pool[j] == ord!("'"))
            ) {
                must_quote = true;
                quote_char = 
                    ord!("\"")
                    + ord!("'") - str_pool[j]
                ;
            }
            incr(j);
        }
    }
// print string # , omitting quotes
@define print_quoted(#) =>
    if (# != 0) {
        for (j in str_start_macro(#) to 
            str_start_macro(# + 1)
            - 1
        ) {
            if (str_pool[j] == quote_char) {
                print(quote_char);
                quote_char = 
                    ord!("\"")
                    + ord!("'") - quote_char
                ;
                print(quote_char);
            }
            print(str_pool[j]);
        }
    }
⟦57 Basic printing procedures⟧ += ⟦
    function print_file_name(n, a, e: integer) {
        var
          must_quote: boolean, // whether to quote the 
          // filename
          quote_char: integer, // current quote char (single 
          // or double)
          j: pool_pointer; // index into str_pool 
        
        must_quote = false;
        quote_char = 0;
        check_quoted(a);
        check_quoted(n);
        check_quoted(e);
        if (must_quote) {
            if (quote_char == 0) {
                quote_char = ord!("\"");
            }
            print_char(quote_char);
        }
        print_quoted(a);
        print_quoted(n);
        print_quoted(e);
        if (quote_char != 0) {
            print_char(quote_char);
        }
    }
⟧
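
The quoting strategy of print_file_name can be sketched in Python as follows. The function names `other_quote` and `format_file_name` are hypothetical, and the sketch returns a string instead of printing; the logic is meant to mirror check_quoted and print_quoted, including the trick that `ord('"') + ord("'") - q` turns one quote character into the other.

```python
def other_quote(q):
    """Map '"' to "'" and vice versa."""
    return chr(ord('"') + ord("'") - ord(q))

def format_file_name(name, area, ext):
    must_quote = False
    quote_char = ""
    # First pass: decide whether quoting is needed and which quote to open with.
    for part in (area, name, ext):
        for ch in part:
            if must_quote and quote_char:
                break                          # decision already made
            if ch == " ":
                must_quote = True
            elif ch in ('"', "'"):
                must_quote = True
                quote_char = other_quote(ch)   # wrap with the other kind
    out = []
    if must_quote:
        if not quote_char:
            quote_char = '"'
        out.append(quote_char)
    # Second pass: emit characters, switching quote kind when needed.
    for part in (area, name, ext):
        for ch in part:
            if ch == quote_char:
                out.append(quote_char)         # close the current quotes
                quote_char = other_quote(quote_char)
                out.append(quote_char)         # reopen with the other kind
            out.append(ch)
    if quote_char:
        out.append(quote_char)
    return "".join(out)
```

A name containing both kinds of quote is thus printed as adjacent quoted segments, each wrapped in the quote character it does not contain.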

554. Another system-dependent routine is needed to convert three internal TEX strings into the name_of_file value that is used to open files. The present code allows both lowercase and uppercase letters in the file name.

@define append_to_name(#) =>
    {
        c = #;
        incr(k);
        if (k <= file_name_size) {
            if ((c < 128)) {
                name_of_file[k] = c;
            } else if ((c < 0x800)) {
                name_of_file[k] = 0xc0 + c div 0x40;
                incr(k);
                name_of_file[k] = 0x80 + c % 0x40;
            } else {
                name_of_file[k] = 0xe0 + c div 0x1000;
                incr(k);
                name_of_file[k] = 
                    0x80
                    + (c % 0x1000) div 0x40
                ;
                incr(k);
                name_of_file[k] = 0x80 + (c % 0x1000) % 0x40;
            }
        }
    }
function pack_file_name(n, a, e: str_number) {
    var
      k: integer, // number of positions filled in 
      // name_of_file 
      c: ASCII_code, // character being packed
      j: pool_pointer; // index into str_pool 
    
    k = 0;
    if (name_of_file) {
        libc_free(name_of_file);
    }
    name_of_file = xmalloc_array(
      UTF8_code,
      (length(a) + length(n) + length(e)) * 3 + 1,
    );
    for (j in str_start_macro(a) to 
        str_start_macro(a + 1)
        - 1
    ) {
        append_to_name(so(str_pool[j]));
    }
    for (j in str_start_macro(n) to 
        str_start_macro(n + 1)
        - 1
    ) {
        append_to_name(so(str_pool[j]));
    }
    for (j in str_start_macro(e) to 
        str_start_macro(e + 1)
        - 1
    ) {
        append_to_name(so(str_pool[j]));
    }
    if (k <= file_name_size) {
        name_length = k;
    } else {
        name_length = file_name_size;
    }
    name_of_file[name_length + 1] = 0;
}
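
The arithmetic in append_to_name is the standard UTF-8 encoding for values below 0x10000. A minimal Python sketch, with the hypothetical helper name `utf8_bytes`, reproduces the three branches so they can be compared against Python's own encoder:

```python
def utf8_bytes(c):
    """Encode one value below 0x10000 using the same arithmetic as append_to_name."""
    if c < 0x80:
        return [c]                                    # one byte: 0xxxxxxx
    if c < 0x800:
        return [0xC0 + c // 0x40,                     # 110xxxxx
                0x80 + c % 0x40]                      # 10xxxxxx
    return [0xE0 + c // 0x1000,                       # 1110xxxx
            0x80 + (c % 0x1000) // 0x40,              # 10xxxxxx
            0x80 + (c % 0x1000) % 0x40]               # 10xxxxxx
```

Since each stored unit produces at most three bytes, allocating three bytes per character of the area, name, and extension plus one for the terminator, as pack_file_name does, is always enough.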

555. A messier routine is also needed, since format file names must be scanned before TEX’s string mechanism has been initialized. We shall use the global variable TEX_format_default to supply the text for default system areas and extensions related to format files.

Under UNIX we don’t give the area part, instead depending on the path searching that will happen during file opening. Also, the length will be set in the main program.

@define format_area_length => 0 // length of its area part
// length of its `\.{.fmt}' part
@define format_ext_length => 4
// the extension, as a \.{WEB} constant
@define format_extension => strpool!(".fmt")
⟦13 Global variables⟧ += ⟦
    var format_default_length: integer;

    var TEX_format_default: cstring;
⟧

556. We set the name of the default format file and the length of that name in C, instead of Pascal, since we want them to depend on the name of the program.

557.

⟦14 Check the ``constant'' values for consistency⟧ += ⟦
    if (format_default_length > file_name_size) {
        bad = 31;
    }
⟧

558. Here is the messy routine that was just mentioned. It sets name_of_file from the first n characters of TEX_format_default, followed by buffer[a .. b], followed by the last format_ext_length characters of TEX_format_default.

We dare not give error messages here, since TEX calls this routine before the error routine is ready to roll. Instead, we simply drop excess characters, since the error will be detected in another way when a strange file name isn’t found.

function pack_buffered_name(
  n: small_number,
  a, b: integer,
) {
    var
      k: integer, // number of positions filled in 
      // name_of_file 
      c: ASCII_code, // character being packed
      j: integer; // index into buffer or TEX_format_default 
    
    if (n + b - a + 1 + format_ext_length > file_name_size) {
        b = a + file_name_size - n - 1 - format_ext_length;
    }
    k = 0;
    if (name_of_file) {
        libc_free(name_of_file);
    }
    name_of_file = xmalloc_array(
      UTF8_code,
      n + (b - a + 1) + format_ext_length + 1,
    );
    for (j in 1 to n) {
        append_to_name(TEX_format_default[j]);
    }
    for (j in a to b) {
        append_to_name(buffer[j]);
    }
    for (j in format_default_length - format_ext_length + 1 to (
      format_default_length
    )) {
        append_to_name(TEX_format_default[j]);
    }
    if (k <= file_name_size) {
        name_length = k;
    } else {
        name_length = file_name_size;
    }
    name_of_file[name_length + 1] = 0;
}
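
The assembly and silent truncation performed by pack_buffered_name can be sketched in Python. This is an illustration only: it uses 0-based string slicing where the routine uses 1-based arrays, the default name `xelatex.fmt` in the usage below is just an example value, and the `file_name_size` default of 255 is an assumption of the sketch.

```python
def pack_buffered_name(default, n, buffer, a, b, ext_length=4, file_name_size=255):
    """Return the first n characters of default, then buffer[a..b] inclusive,
    then the last ext_length characters of default."""
    if n + (b - a + 1) + ext_length > file_name_size:
        # Drop excess characters from the middle part; no error is reported.
        b = a + file_name_size - n - 1 - ext_length
    return default[:n] + buffer[a:b + 1] + default[len(default) - ext_length:]
```

Calling it with `n = 0` and the text after `&` builds a user-specified format name such as `plain.fmt`; calling it with an empty range (`b = a - 1`) and `n` equal to the default's length minus the extension length reproduces the default name unchanged.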

559. Here is the only place we use pack_buffered_name. This part of the program becomes active when a “virgin” TEX is trying to get going, just after the preliminary initialization, or when the user is substituting another format file by typing ‘&’ after the initial ‘**’ prompt. The buffer contains the first line of input in buffer[loc .. (last - 1)], where loc < last and buffer[loc] != ord!(" ").

⟦559 Declare the function called |open_fmt_file|⟧ = ⟦
    function open_fmt_file(): boolean {
        label found, exit;
        var
          j: 0 .. buf_size; // the first space after the 
          // format file name
        
        j = loc;
        if (buffer[loc] == ord!("&")) {
            incr(loc);
            j = loc;
            buffer[last] = ord!(" ");
            while (buffer[j] != ord!(" ")) {
                incr(j);
            }
            // Kpathsea does everything
            pack_buffered_name(0, loc, j - 1);
            if (w_open_in(fmt_file)) {
                goto found;
            }
            wake_up_terminal;
            wterm("Sorry, I can't find the format `");
            fputs(stringcast(name_of_file + 1), stdout);
            wterm("'; will try `");
            fputs(TEX_format_default + 1, stdout);
            wterm_ln("'.");
            update_terminal;
            // now pull out all the stops: try for the 
            // system \.{plain} file
        }
        pack_buffered_name(
          format_default_length - format_ext_length,
          1,
          0,
        );
        if (!w_open_in(fmt_file)) {
            wake_up_terminal;
            wterm("I can't find the format file `");
            fputs(TEX_format_default + 1, stdout);
            wterm_ln("'!");
            open_fmt_file = false;
            return;
        }
      found:
        loc = j;
        open_fmt_file = true;
      exit:
    }
⟧

560. Operating systems often make it possible to determine the exact name (and possible version number) of a file that has been opened. The following routine, which simply makes a TEX string from the value of name_of_file, should ideally be changed to deduce the full name of file f, which is the file most recently opened, if it is possible to do this in a Pascal program.

This routine might be called after string memory has overflowed, hence we dare not use ‘str_room’.

function make_name_string(): str_number {
    var
      k: 0 .. file_name_size, // index into name_of_file 
      save_area_delimiter, save_ext_delimiter: pool_pointer,
      save_name_in_progress, save_stop_at_space: boolean;
    
    if (
        (pool_ptr + name_length > pool_size)
        || (str_ptr == max_strings) || (cur_length > 0)
    ) {
        make_name_string = ord!("?");
    } else {
        make_utf16_name;
        for (k in 0 to name_length16 - 1) {
            append_char(name_of_file16[k]);
        }
        // At this point we also set cur_name , cur_ext , 
        // and cur_area to match the contents of 
        // name_of_file .
        make_name_string = make_string;
        save_area_delimiter = area_delimiter;
        save_ext_delimiter = ext_delimiter;
        save_name_in_progress = name_in_progress;
        save_stop_at_space = stop_at_space;
        name_in_progress = true;
        begin_name;
        stop_at_space = false;
        k = 0;
        while (
            (k < name_length16)
            && (more_name(name_of_file16[k]))
        ) {
            incr(k);
        }
        stop_at_space = save_stop_at_space;
        end_name;
        name_in_progress = save_name_in_progress;
        area_delimiter = save_area_delimiter;
        ext_delimiter = save_ext_delimiter;
    }
}

function u_make_name_string(
  var f: unicode_file,
): str_number {
    u_make_name_string = make_name_string;
}

function a_make_name_string(var f: alpha_file): str_number {
    a_make_name_string = make_name_string;
}

function b_make_name_string(var f: byte_file): str_number {
    b_make_name_string = make_name_string;
}

function w_make_name_string(var f: word_file): str_number {
    w_make_name_string = make_name_string;
}

561. Now let’s consider the “driver” routines by which TEX deals with file names in a system-independent manner. First comes a procedure that looks for a file name in the input by calling get_x_token for the information.

function scan_file_name() {
    label done;
    var save_warning_index: pointer;
    
    save_warning_index = warning_index;
    // store cur_cs here to remember until later
    warning_index = cur_cs;
    // here the program expands tokens and removes spaces 
    // and \.{\\relax}es from the input. The \.{\\relax} 
    // removal follows LuaTeX''s implementation, and other 
    // cases of balanced text scanning.
    ⟦438 Get the next non-blank non-relax non-call token⟧
    // return the last token to be read by either code path
    back_input;
    if (cur_cmd == left_brace) {
        scan_file_name_braced;
    } else {
        name_in_progress = true;
        begin_name;
        ⟦440 Get the next non-blank non-call token⟧
        loop {
            // not a character
            if (
                (cur_cmd > other_char)
                || (cur_chr > biggest_char)
            ) {
                back_input;
                goto done;
            }
            if (!more_name(cur_chr)) {
                goto done;
            }
            get_x_token;
        }
    }
  done:
    end_name;
    name_in_progress = false;
    // restore warning_index 
    warning_index = save_warning_index;
}

562. The global variable name_in_progress is used to prevent recursive use of scan_file_name, since the begin_name and other procedures communicate via global variables. Recursion would arise only by devious tricks like ‘\input\input f’; such attempts at sabotage must be thwarted. Furthermore, name_in_progress prevents \input from being initiated when a font size specification is being scanned.

Another global variable, job_name, contains the file name that was first \input by the user. This name is extended by ‘.log’ and ‘.dvi’ and ‘.fmt’ in the names of TEX’s output files.

⟦13 Global variables⟧ += ⟦
    // is a file name being scanned?
    var name_in_progress: boolean;

    // principal file name
    var job_name: str_number;

    // has the transcript file been opened?
    var log_opened: boolean;
⟧

563. Initially job_name == 0; it becomes nonzero as soon as the true name is known. We have job_name == 0 if and only if the ‘log’ file has not been opened, except of course for a short time just after job_name has become nonzero.

⟦55 Initialize the output routines⟧ += ⟦
    job_name = 0

    name_in_progress = false

    log_opened = false
⟧

564. Here is a routine that manufactures the output file names, assuming that job_name != 0. It ignores and changes the current settings of cur_area and cur_ext.

@define pack_cur_name =>
    pack_file_name(cur_name, cur_area, cur_ext)
//  s == strpool!(".log") , output_file_extension , or 
// format_extension 
function pack_job_name(s: str_number) {
    cur_area = strpool!("");
    cur_ext = s;
    cur_name = job_name;
    pack_cur_name;
}

565. If some trouble arises when TEX tries to open a file, the following routine calls upon the user to supply another file name. Parameter s is used in the error message to identify the type of file; parameter e is the default extension if none is given. Upon exit from the routine, variables cur_name, cur_area, cur_ext, and name_of_file are ready for another attempt at file opening.

function prompt_file_name(s, e: str_number) {
    label done;
    var
      k: 0 .. buf_size, // index into buffer 
      saved_cur_name: str_number, // to catch empty terminal 
      // input
      saved_cur_ext: str_number, // to catch empty terminal 
      // input
      saved_cur_area: str_number; // to catch empty terminal 
      // input
    
    if (interaction == scroll_mode) {
        wake_up_terminal;
    }
    if (s == strpool!("input file name")) {
        print_err(strpool!("I can't find file `"));
    } else {
        print_err(strpool!("I can't write on file `"));
    }
    print_file_name(cur_name, cur_area, cur_ext);
    print(strpool!("'."));
    if ((e == strpool!(".tex")) || (e == strpool!(""))) {
        show_context;
    }
    print_ln;
    print_c_string(prompt_file_name_help_msg);
    if ((e != strpool!(""))) {
        print(strpool!("; default file extension is `"));
        print(e);
        print(ord!("'"));
    }
    print(ord!(")"));
    print_ln;
    print_nl(strpool!("Please type another "));
    print(s);
    if (interaction < scroll_mode) {
        fatal_error(
          strpool!("*** (job aborted, file error in nonstop mode)"),
        );
    }
    saved_cur_name = cur_name;
    saved_cur_ext = cur_ext;
    saved_cur_area = cur_area;
    clear_terminal;
    prompt_input(strpool!(": "));
    ⟦566 Scan file name in the buffer⟧
    if (
        (length(cur_name) == 0)
        && (cur_ext == strpool!(""))
        && (cur_area == strpool!(""))
    ) {
        cur_name = saved_cur_name;
        cur_ext = saved_cur_ext;
        cur_area = saved_cur_area;
    } else if (cur_ext == strpool!("")) {
        cur_ext = e;
    }
    pack_cur_name;
}
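
The way prompt_file_name treats the user's reply can be summarised by a small Python sketch. The function `resolve_reply` and its tuple-based interface are hypothetical; it models only the final decision, not the terminal interaction or the error reporting.

```python
def resolve_reply(reply, saved, default_ext):
    """reply and saved are (name, area, ext) triples of plain strings."""
    name, area, ext = reply
    if name == "" and ext == "" and area == "":
        # Empty terminal input: retry with the previous file name.
        return saved
    if ext == "":
        # No extension typed: supply the default one.
        ext = default_ext
    return name, area, ext
```

For instance, `resolve_reply(("other", "", ""), ("paper", "", ".tex"), ".tex")` gives `("other", "", ".tex")`, while an entirely empty reply returns the saved triple unchanged.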

566.

⟦566 Scan file name in the buffer⟧ = ⟦
    {
        begin_name;
        k = first;
        while ((buffer[k] == ord!(" ")) && (k < last)) {
            incr(k);
        }
        loop {
            if (k == last) {
                goto done;
            }
            if (!more_name(buffer[k])) {
                goto done;
            }
            incr(k);
        }
      done:
        end_name;
    }
⟧

567. Here’s an example of how these conventions are used. Whenever it is time to ship out a box of stuff, we shall use the macro ensure_dvi_open.

@define log_name => texmf_log_name
@define ensure_dvi_open =>
    if (output_file_name == 0) {
        if (job_name == 0) {
            open_log_file;
        }
        pack_job_name(output_file_extension);
        while (!dvi_open_out(dvi_file)) {
            prompt_file_name(
              strpool!("file name for output"),
              output_file_extension,
            );
        }
        output_file_name = b_make_name_string(dvi_file);
    }
⟦13 Global variables⟧ += ⟦
    var output_file_extension: str_number;

    var no_pdf_output: boolean;

    // the device-independent output goes here
    var dvi_file: byte_file;

    // full name of the output file
    var output_file_name: str_number;

    // full name of the log file
    var log_name: str_number;
⟧

568.

⟦55 Initialize the output routines⟧ += ⟦
    output_file_name = 0

    if (no_pdf_output) {
        output_file_extension = strpool!(".xdv");
    } else {
        output_file_extension = strpool!(".pdf");
    }
⟧

569. The open_log_file routine is used to open the transcript file and to help it catch up to what has previously been printed on the terminal.

function open_log_file() {
    var
      old_setting: 0 .. max_selector, // previous selector 
      // setting
      k: 0 .. buf_size, // index into months and buffer 
      l: 0 .. buf_size, // end of first input line
      months: const_cstring;
    
    old_setting = selector;
    if (job_name == 0) {
        job_name = get_job_name(strpool!("texput"));
    }
    pack_job_name(strpool!(".fls"));
    recorder_change_filename(stringcast(name_of_file + 1));
    pack_job_name(strpool!(".log"));
    while (!a_open_out(log_file)) {
        ⟦570 Try to get a different log file name⟧
    }
    log_name = a_make_name_string(log_file);
    selector = log_only;
    log_opened = true;
    ⟦571 Print the banner line, including the date and time⟧
    if (mltex_enabled_p) {
        wlog_cr;
        wlog("MLTeX v2.2 enabled");
    }
    // make sure bottom level is in memory
    input_stack[input_ptr] = cur_input;
    print_nl(strpool!("**"));
    // last position of first line
    l = input_stack[0].limit_field;
    if (buffer[l] == end_line_char) {
        decr(l);
    }
    for (k in 1 to l) {
        print(buffer[k]);
    }
    // now the transcript file contains the first line of 
    // input
    print_ln;
    //  log_only or term_and_log 
    selector = old_setting + 2;
}

570. Sometimes open_log_file is called at awkward moments when TEX is unable to print error messages or even to show_context . The prompt_file_name routine can result in a fatal_error , but the error routine will not be invoked because log_opened will be false.

The normal idea of batch_mode is that nothing at all should be written on the terminal. However, in the unusual case that no log file could be opened, we make an exception and allow an explanatory message to be seen.

Incidentally, the program always refers to the log file as a ‘transcript file’, because some systems cannot use the extension ‘.log’ for this file.

⟦570 Try to get a different log file name⟧ = ⟦
    {
        selector = term_only;
        prompt_file_name(
          strpool!("transcript file name"),
          strpool!(".log"),
        );
    }
⟧

571.

⟦571 Print the banner line, including the date and time⟧ = ⟦
    {
        if (
            src_specials_p
            || file_line_error_style_p || parse_first_line_p
        ) {
            wlog(banner_k);
        } else {
            wlog(banner);
        }
        wlog(version_string);
        slow_print(format_ident);
        print(strpool!("  "));
        print_int(sys_day);
        print_char(ord!(" "));
        months = " JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC";
        for (k in 3 * sys_month - 2 to 3 * sys_month) {
            wlog(months[k]);
        }
        print_char(ord!(" "));
        print_int(sys_year);
        print_char(ord!(" "));
        print_two(sys_time div 60);
        print_char(ord!(":"));
        print_two(sys_time % 60);
        if (eTeX_ex) {
            wlog_cr;
            wlog("entering extended mode");
        }
        if (shellenabledp) {
            wlog_cr;
            wlog(" ");
            if (restrictedshell) {
                wlog("restricted ");
            }
            wlog("\\write18 enabled.");
        }
        if (src_specials_p) {
            wlog_cr;
            wlog(" Source specials enabled.");
        }
        if (file_line_error_style_p) {
            wlog_cr;
            wlog(
              " file:line:error style messages enabled.",
            );
        }
        if (parse_first_line_p) {
            wlog_cr;
            wlog(" %&-line parsing enabled.");
        }
        if (translate_filename) {
            wlog_cr;
            wlog(" (WARNING: translate-file \"");
            fputs(translate_filename, log_file);
            wlog("\" ignored)");
        }
    }
⟧
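The date arithmetic in the banner code can be checked in isolation. The following is a minimal Python sketch, not part of the program itself; the function names month_abbrev and banner_time are hypothetical, and the sketch assumes that sys_time counts minutes since midnight, as the div 60 and % 60 operations suggest.

```python
# Leading blank makes the table 1-based, as in the months string above.
MONTHS = " JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"


def month_abbrev(sys_month: int) -> str:
    """Return the characters at positions 3m-2 .. 3m (inclusive) of MONTHS."""
    return MONTHS[3 * sys_month - 2 : 3 * sys_month + 1]


def banner_time(sys_time: int) -> str:
    """Format minutes since midnight as two 2-digit fields, HH:MM."""
    return f"{sys_time // 60:02d}:{sys_time % 60:02d}"
```

The padding blank at index 0 is what lets the loop bounds be written as 3 * sys_month - 2 and 3 * sys_month without any further adjustment.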

572. Let’s turn now to the procedure that is used to initiate file reading when an ‘\input’ command is being processed. Beware: For historic reasons, this code foolishly conserves a tiny bit of string pool space; but that can confuse the interactive ‘E’ option.

// TeX will \input something
function start_input() {
    label done;
    var
      temp_str: str_number,
      v: pointer,
      k: 0 .. file_name_size; // index into name_of_file16 
    
    // set cur_name to desired file name
    scan_file_name;
    pack_cur_name;
    loop {
        // set up cur_file and new level of input
        begin_file_reading;
        // Tell open_input we are processing \input.
        // Kpathsea tries all the various ways to get the 
        // file.
        tex_input_type = 1;
        // At this point name_of_file contains the actual 
        // name found, as a UTF8 string. We convert to 
        // UTF16, then extract the cur_area , cur_name , and 
        // cur_ext from it.
        if (
            kpse_in_name_ok(stringcast(name_of_file + 1))
            && u_open_in(
              cur_file,
              kpse_tex_format,
              XeTeX_default_input_mode,
              XeTeX_default_input_encoding,
            )
        ) {
            make_utf16_name;
            name_in_progress = true;
            begin_name;
            stop_at_space = false;
            k = 0;
            while (
                (k < name_length16)
                && (more_name(name_of_file16[k]))
            ) {
                incr(k);
            }
            stop_at_space = true;
            end_name;
            name_in_progress = false;
            goto done;
        }
        // remove the level that didn't work
        end_file_reading;
        prompt_file_name(
          strpool!("input file name"),
          strpool!(""),
        );
    }
  done:
    name = a_make_name_string(cur_file);
    source_filename_stack[in_open] = name;
    full_source_filename_stack[in_open] = (
      make_full_name_string
    );
    // we can try to conserve string pool space now
    if (name == str_ptr - 1) {
        temp_str = search_string(name);
        if (temp_str > 0) {
            name = temp_str;
            flush_string;
        }
    }
    if (job_name == 0) {
        job_name = get_job_name(cur_name);
        open_log_file;
        //  open_log_file doesn't show_context , so limit 
        // and loc needn't be set to meaningful values yet
    }
    if (
        term_offset
        + length(full_source_filename_stack[in_open])
        > max_print_line - 2
    ) {
        print_ln;
    } else if ((term_offset > 0) || (file_offset > 0)) {
        print_char(ord!(" "));
    }
    print_char(ord!("("));
    incr(open_parens);
    slow_print(full_source_filename_stack[in_open]);
    update_terminal;
    if (tracing_stack_levels > 0) {
        begin_diagnostic;
        print_ln;
        print_char(ord!("~"));
        v = input_ptr - 1;
        if (v < tracing_stack_levels) {
            while (v > 0) {
                print_char(ord!("."));
                decr(v);
            }
        } else {
            print_char(ord!("~"));
        }
        slow_print(strpool!("INPUT "));
        slow_print(cur_name);
        slow_print(cur_ext);
        print_ln;
        end_diagnostic(false);
    }
    state = new_line;
    ⟦1716 Prepare new file {\sl Sync\TeX} information⟧
    ⟦573 Read the first line of the new file⟧
}

573. Here we have to remember to tell the input_ln routine not to start with a get . If the file is empty, it is considered to contain a single blank line.

⟦573 Read the first line of the new file⟧ = ⟦
    {
        line = 1;
        if (input_ln(cur_file, false)) {
            do_nothing;
        }
        firm_up_the_line;
        if (end_line_char_inactive) {
            decr(limit);
        } else {
            buffer[limit] = end_line_char;
        }
        first = limit + 1;
        loc = start;
    }
⟧

574. [30] Font metric data. TEX gets its knowledge about fonts from font metric files, also called TFM files; the ‘T’ in ‘TFM’ stands for TEX, but other programs know about them too.

The information in a TFM file appears in a sequence of 8-bit bytes. Since the number of bytes is always a multiple of 4, we could also regard the file as a sequence of 32-bit words, but TEX uses the byte interpretation. The format of TFM files was designed by Lyle Ramshaw in 1980. The intent is to convey a lot of different kinds of information in a compact but useful form.

⟦13 Global variables⟧ += ⟦
    var tfm_file: byte_file;
⟧

575. The first 24 bytes (6 words) of a TFM file contain twelve 16-bit integers that give the lengths of the various subsequent portions of the file. These twelve integers are, in order:

lf = length of the entire file, in words;

lh = length of the header data, in words;

bc = smallest character code in the font;

ec = largest character code in the font;

nw = number of words in the width table;

nh = number of words in the height table;

nd = number of words in the depth table;

ni = number of words in the italic correction table;

nl = number of words in the lig/kern table;

nk = number of words in the kern table;

ne = number of words in the extensible character table;

np = number of font parameter words.

They are all nonnegative and less than $2^{15}$. We must have bc - 1 <= ec <= 255 , and
lf == 6 + lh + (ec - bc + 1) + nw + nh + nd + ni + nl + nk + ne + np.
Note that a font may contain as many as 256 characters (if bc == 0 and ec == 255 ), and as few as 0 characters (if bc == ec + 1 ).

Incidentally, when two or more 8-bit bytes are combined to form an integer of 16 or more bits, the most significant bytes appear first in the file. This is called BigEndian order.
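As an illustration of the constraints just stated, here is a small Python sketch that decodes the twelve length fields and checks them. It is only a sketch under stated assumptions: the function name read_tfm_lengths is hypothetical, and the checks shown are limited to the ones named in this section, not the full validation that TEX performs.

```python
import struct

FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")


def read_tfm_lengths(data: bytes) -> dict:
    """Decode the first 24 bytes of a TFM file and check their consistency."""
    if len(data) < 24:
        raise ValueError("a TFM file starts with 24 bytes of length fields")
    # ">12H" means twelve unsigned 16-bit integers, most significant byte first.
    values = struct.unpack(">12H", data[:24])
    if any(v >= 2**15 for v in values):
        raise ValueError("every length field must be less than 2^15")
    n = dict(zip(FIELDS, values))
    if not (n["bc"] - 1 <= n["ec"] <= 255):
        raise ValueError("need bc - 1 <= ec <= 255")
    expected = (6 + n["lh"] + (n["ec"] - n["bc"] + 1) + n["nw"] + n["nh"]
                + n["nd"] + n["ni"] + n["nl"] + n["nk"] + n["ne"] + n["np"])
    if n["lf"] != expected:
        raise ValueError(f"lf is {n['lf']} but the parts add up to {expected}")
    return n
```

A font with no characters passes the range check because bc == ec + 1 makes the char_info term ec - bc + 1 equal to zero.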

576. The rest of the TFM file may be regarded as a sequence of ten data arrays having the informal specification

header: array [0 .. lh - 1] of stuff

char_info: array [bc .. ec] of char_info_word

width: array [0 .. nw - 1] of fix_word

height: array [0 .. nh - 1] of fix_word

depth: array [0 .. nd - 1] of fix_word

italic: array [0 .. ni - 1] of fix_word

lig_kern: array [0 .. nl - 1] of lig_kern_command

kern: array [0 .. nk - 1] of fix_word

exten: array [0 .. ne - 1] of extensible_recipe

param: array [1 .. np] of fix_word

The most important data type used here is a fix_word , which is a 32-bit representation of a binary fraction. A fix_word is a signed quantity, with the two’s complement of the entire word used to represent negation. Of the 32 bits in a fix_word , exactly 12 are to the left of the binary point; thus, the largest fix_word value is $2048 - 2^{-20}$, and the smallest is $-2048$. We will see below, however, that all but two of the fix_word values must lie between $-16$ and $+16$.
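The fix_word representation can be made concrete with a short Python sketch. The function name fix_word_to_float is hypothetical; the sketch simply reads four bytes as a signed big-endian integer and divides by $2^{20}$, which follows from having 12 bits to the left of the binary point.

```python
def fix_word_to_float(word: bytes) -> float:
    """Interpret four big-endian bytes as a fix_word (20 fraction bits)."""
    if len(word) != 4:
        raise ValueError("a fix_word occupies exactly four bytes")
    raw = int.from_bytes(word, "big", signed=True)  # two's complement
    return raw / 2**20
```

Every 32-bit integer divided by $2^{20}$ is exactly representable as a double-precision float, so this conversion loses no information.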

577. The first data array is a block of header information, which contains general facts about the font. The header must contain at least two words, header[0] and header[1] , whose meaning is explained below. Additional header information of use to other software routines might also be included, but TEX82 does not need to know about such details. For example, 16 more words of header information are in use at the Xerox Palo Alto Research Center; the first ten specify the character coding scheme used (e.g., ‘XEROX text’ or ‘TeX math symbols’), the next five give the font identifier (e.g., ‘HELVETICA’ or ‘CMSY’), and the last gives the “face byte.” The program that converts DVI files to Xerox printing format gets this information by looking at the TFM file, which it needs to read anyway because of other information that is not explicitly repeated in DVI format.

header[0] is a 32-bit check sum that TEX will copy into the DVI output file. Later on when the DVI file is printed, possibly on another computer, the actual font that gets used is supposed to have a check sum that agrees with the one in the TFM file used by TEX. In this way, users will be warned about potential incompatibilities. (However, if the check sum is zero in either the font file or the TFM file, no check is made.) The actual relation between this check sum and the rest of the TFM file is not important; the check sum is simply an identification number with the property that incompatible fonts almost always have distinct check sums.

header[1] is a fix_word containing the design size of the font, in units of TEX points. This number must be at least 1.0; it is fairly arbitrary, but usually the design size is 10.0 for a “10 point” font, i.e., a font that was designed to look best at a 10-point size, whatever that really means. When a TEX user asks for a font ‘at 𝛿 pt’, the effect is to override the design size and replace it by 𝛿, and to multiply the 𝑥 and 𝑦 coordinates of the points in the font image by a factor of 𝛿 divided by the design size. All other dimensions in the TFM file are fix_word numbers in design-size units, with the exception of param[1] (which denotes the slant ratio). Thus, for example, the value of param[6] , which defines the em unit, is often the fix_word value $2^{20} = 1.0$, since many fonts have a design size equal to one em. The other dimensions must be less than 16 design-size units in absolute value; thus, header[1] and param[1] are the only fix_word entries in the whole TFM file whose first byte might be something besides 0 or 255.
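The two header conventions can be sketched as follows. This is an illustrative Python fragment with hypothetical function names; it works in floating-point points rather than in the scaled integers that TEX actually uses, so it shows the rule rather than the exact arithmetic.

```python
def checksums_compatible(tfm_sum: int, font_sum: int) -> bool:
    """No check is made if either check sum is zero; otherwise they must agree."""
    return tfm_sum == 0 or font_sum == 0 or tfm_sum == font_sum


def dimension_in_points(fix_value: float, at_size_pt: float) -> float:
    """Convert a dimension given in design-size units to points.

    at_size_pt is the design size, or the size requested with 'at' if the
    user overrode it. The slant param[1] is a pure ratio and must not be
    passed through this function.
    """
    return fix_value * at_size_pt
```

For example, a quad of 1.0 design-size units in a font loaded at 12pt yields an em of 12pt.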

578. Next comes the char_info array, which contains one char_info_word per character. Each word in this part of the file contains six fields packed into four bytes as follows.

first byte: width_index (8 bits)

second byte: height_index (4 bits) times 16, plus depth_index (4 bits)

third byte: italic_index (6 bits) times 4, plus tag (2 bits)

fourth byte: remainder (8 bits)

The actual width of a character is width[width_index] , in design-size units; this is a device for compressing information, since many characters have the same width. Since it is quite common for many characters to have the same height, depth, or italic correction, the TFM format imposes a limit of 16 different heights, 16 different depths, and 64 different italic corrections.

The italic correction of a character has two different uses. (a) In ordinary text, the italic correction is added to the width only if the TEX user specifies ‘\/’ after the character. (b) In math formulas, the italic correction is always added to the width, except with respect to the positioning of subscripts.

Incidentally, the relation width[0] = height[0] = depth[0] = italic[0] = 0 should always hold, so that an index of zero implies a value of zero. The width_index should never be zero unless the character does not exist in the font, since a character is valid if and only if it lies between bc and ec and has a nonzero width_index .
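The packing of the six fields can be sketched in Python as shown here. The names CharInfo and unpack_char_info are hypothetical and chosen only for this illustration.

```python
from typing import NamedTuple


class CharInfo(NamedTuple):
    width_index: int
    height_index: int
    depth_index: int
    italic_index: int
    tag: int
    remainder: int


def unpack_char_info(word: bytes) -> CharInfo:
    """Split a four-byte char_info_word into its six fields."""
    b0, b1, b2, b3 = word  # raises ValueError unless exactly four bytes
    return CharInfo(
        width_index=b0,
        height_index=b1 // 16,  # upper 4 bits of the second byte
        depth_index=b1 % 16,    # lower 4 bits of the second byte
        italic_index=b2 // 4,   # upper 6 bits of the third byte
        tag=b2 % 4,             # lower 2 bits of the third byte
        remainder=b3,
    )
```

A word whose width_index is zero describes a character that does not exist in the font.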

579. The tag field in a char_info_word has four values that explain how to interpret the remainder field.

tag == 0 (no_tag ) means that remainder is unused.

tag == 1 (lig_tag ) means that this character has a ligature/kerning program starting at position remainder in the lig_kern array.

tag == 2 (list_tag ) means that this character is part of a chain of characters of ascending sizes, and not the largest in the chain. The remainder field gives the character code of the next larger character.

tag == 3 (ext_tag ) means that this character code represents an extensible character, i.e., a character that is built up of smaller pieces so that it can be made arbitrarily large. The pieces are specified in exten[remainder] .

Characters with tag == 2 and tag == 3 are treated as characters with tag == 0 unless they are used in special circumstances in math formulas. For example, the \sum operation looks for a list_tag , and the \left operation looks for both list_tag and ext_tag .

@define no_tag => 0 // vanilla character
// character has a ligature/kerning program
@define lig_tag => 1
// character has a successor in a charlist
@define list_tag => 2
@define ext_tag => 3 // character is extensible
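To illustrate how list_tag links characters of ascending sizes, here is a Python sketch that follows such a chain. The function larger_variants and the dictionary layout (character code mapped to a (tag, remainder) pair) are assumptions made for this example; the cycle guard is also an addition of the sketch, included because a malformed file could otherwise loop forever.

```python
NO_TAG, LIG_TAG, LIST_TAG, EXT_TAG = 0, 1, 2, 3


def larger_variants(char_info: dict, c: int) -> list:
    """Return c followed by its successively larger successors."""
    chain = [c]
    seen = {c}
    while char_info[c][0] == LIST_TAG:
        c = char_info[c][1]  # remainder holds the next larger character
        if c in seen:
            raise ValueError("cycle in charlist")
        seen.add(c)
        chain.append(c)
    return chain
```

The last element of the chain is the largest fixed-size variant; if its tag is ext_tag, even larger sizes can be built from the recipe in exten[remainder] .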

580. The lig_kern array contains instructions in a simple programming language that explains what to do for special letter pairs. Each word in this array is a lig_kern_command of four bytes.

first byte: skip_byte , indicates that this is the final program step if the byte is 128 or more, otherwise the next step is obtained by skipping this number of intervening steps.

second byte: next_char , “if next_char follows the current character, then perform the operation and stop, otherwise continue.”

third byte: op_byte , indicates a ligature step if less than 128, a kern step otherwise.

fourth byte: remainder .

In a kern step, an additional space equal to kern[256 * (op_byte - 128) + remainder] is inserted between the current character and next_char . This amount is often negative, so that the characters are brought closer together by kerning; but it might be positive.

There are eight kinds of ligature steps, having op_byte codes 4𝑎+2𝑏+𝑐 where 0 ≤ 𝑎 ≤ 𝑏+𝑐 and 0 ≤ 𝑏,𝑐 ≤ 1. The character whose code is remainder is inserted between the current character and next_char ; then the current character is deleted if 𝑏=0, and next_char is deleted if 𝑐=0; then we pass over 𝑎 characters to reach the next current character (which may have a ligature/kerning program of its own).

If the very first instruction of the lig_kern array has skip_byte == 255 , the next_char byte is the so-called boundary character of this font; the value of next_char need not lie between bc and ec . If the very last instruction of the lig_kern array has skip_byte == 255 , there is a special ligature/kerning program for a boundary character at the left, beginning at location 256 * op_byte + remainder . The interpretation is that TEX puts implicit boundary characters before and after each consecutive string of characters from the same font. These implicit characters do not appear in the output, but they can affect ligatures and kerning.

If the very first instruction of a character’s lig_kern program has skip_byte > 128 , the program actually begins in location 256 * op_byte + remainder . This feature allows access to large lig_kern arrays, because the first instruction must otherwise appear in a location <=255 .

Any instruction with skip_byte > 128 in the lig_kern array must satisfy the condition

256*op_byte+remainder<nl.
If such an instruction is encountered during normal program execution, it denotes an unconditional halt; no ligature or kerning command is performed.

// value indicating 'STOP' in a lig/kern program
@define stop_flag => qi(128)
@define kern_flag => qi(128) // op code for a kern step
@define skip_byte(#) => #.b0
@define next_char(#) => #.b1
@define op_byte(#) => #.b2
@define rem_byte(#) => #.b3
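The decoding rules for a single lig_kern_command can be summarized in a Python sketch. The function decode_lig_kern and the dictionary it returns are hypothetical; the sketch covers only the per-step interpretation of op_byte and skip_byte , not the boundary-character or restart conventions.

```python
def decode_lig_kern(skip_byte: int, next_char: int,
                    op_byte: int, remainder: int) -> dict:
    """Describe one lig/kern step in terms of the rules stated above."""
    step = {"next_char": next_char, "final": skip_byte >= 128}
    if op_byte >= 128:
        # Kern step: index into the kern array.
        step["kind"] = "kern"
        step["kern_index"] = 256 * (op_byte - 128) + remainder
        return step
    # Ligature step: op_byte == 4a + 2b + c.
    a, low = divmod(op_byte, 4)
    b, c = divmod(low, 2)
    if a > b + c:
        raise ValueError(f"op_byte {op_byte} is not a valid ligature code")
    step.update(kind="ligature", insert=remainder,
                delete_current=(b == 0), delete_next=(c == 0), pass_over=a)
    return step
```

Enumerating op_byte from 0 to 127 with this function accepts exactly the eight codes 0, 1, 2, 3, 5, 6, 7, and 11.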

581. Extensible characters are specified by an extensible_recipe , which consists of four bytes called top , mid , bot , and rep (in this order). These bytes are the character codes of individual pieces used to build up a large symbol. If top , mid , or bot are zero, they are not present in the built-up result. For example, an extensible vertical line is like an extensible bracket, except that the top and bottom pieces are missing.

Let 𝑇, 𝑀, 𝐵, and 𝑅 denote the respective pieces, or an empty box if the piece isn’t present. Then the extensible characters have the form $T R^k M R^k B$ from top to bottom, for some k >= 0 , unless 𝑀 is absent; in the latter case we can have $T R^k B$ for both even and odd values of k . The width of the extensible character is the width of 𝑅; and the height-plus-depth is the sum of the individual height-plus-depths of the components used, since the pieces are butted together in a vertical list.

@define ext_top(#) => #.b0 //  top piece in a recipe
@define ext_mid(#) => #.b1 //  mid piece in a recipe
@define ext_bot(#) => #.b2 //  bot piece in a recipe
@define ext_rep(#) => #.b3 //  rep piece in a recipe
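The shape of a built-up character can be illustrated with a Python sketch that lists the pieces from top to bottom. The function build_extensible is hypothetical, and it takes the repeat count k as given; choosing k to reach a requested height is outside the scope of this example.

```python
def build_extensible(top: int, mid: int, bot: int, rep: int, k: int) -> list:
    """Return the character codes of the pieces, from top to bottom."""
    if k < 0:
        raise ValueError("k must be nonnegative")
    pieces = []
    if top:
        pieces.append(top)
    pieces.extend([rep] * k)
    if mid:
        # With a middle piece the repeated part appears on both sides of it.
        pieces.append(mid)
        pieces.extend([rep] * k)
    if bot:
        pieces.append(bot)
    return pieces
```

With a middle piece present the number of repeated pieces is always even (2k), whereas without it any count is possible.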

582. The final portion of a TFM file is the param array, which is another sequence of fix_word values.

param[1] == slant is the amount of italic slant, which is used to help position accents. For example, slant = 0.25 means that when you go up one unit, you also go .25 units to the right. The slant is a pure number; it’s the only fix_word other than the design size itself that is not scaled by the design size.

param[2] == space is the normal spacing between words in text. Note that character ord!(" ") in the font need not have anything to do with blank spaces.

param[3] == space_stretch is the amount of glue stretching between words.

param[4] == space_shrink is the amount of glue shrinking between words.

param[5] == x_height is the size of one ex in the font; it is also the height of letters for which accents don’t have to be raised or lowered.

param[6] == quad is the size of one em in the font.

param[7] == extra_space is the amount added to param[2] at the ends of sentences.

If fewer than seven parameters are present, TEX sets the missing parameters to zero. Fonts used for math symbols are required to have additional parameter information, which is explained later.

@define slant_code => 1
@define space_code => 2
@define space_stretch_code => 3
@define space_shrink_code => 4
@define x_height_code => 5
@define quad_code => 6
@define extra_space_code => 7
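The rule that missing parameters default to zero can be sketched in Python as follows. The names PARAM_NAMES and named_params are hypothetical, and the sketch looks only at the first seven parameters, ignoring the additional ones that math fonts carry.

```python
PARAM_NAMES = ("slant", "space", "space_stretch", "space_shrink",
               "x_height", "quad", "extra_space")


def named_params(values: list) -> dict:
    """Map param[1..7] to names, padding absent parameters with zero."""
    first_seven = list(values[:7])
    first_seven += [0.0] * (7 - len(first_seven))
    return dict(zip(PARAM_NAMES, first_seven))
```

For instance, the space at the end of a sentence is then params["space"] + params["extra_space"], which degenerates to the normal space when the font supplies fewer than seven parameters.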

583. So that is what TFM files hold. Since TEX has to absorb such information about lots of fonts, it stores most of the data in a large array called font_info . Each item of font_info is a memory_word ; the fix_word data gets converted into scaled entries, while everything else goes into words of type four_quarters .

When the user defines \font\f, say, TEX assigns an internal number to the user’s font \f. Adding this number to font_id_base gives the eqtb location of a “frozen” control sequence that will always select the font.

⟦18 Types in the outer block⟧ += ⟦
    //  font in a char_node 
    const internal_font_number = integer;

    // index into font_info 
    const font_index = integer;

    type nine_bits = min_quarterword .. non_char;
⟧

584. Here now is the (rather formidable) array of font arrays.

@define otgr_font_flag => 0xfffe
@define aat_font_flag => 0xffff
@define is_aat_font(#) => (font_area[#] == aat_font_flag)
@define is_ot_font(#) =>
    (
        (font_area[#] == otgr_font_flag)
        && (usingOpenType(font_layout_engine[#]))
    )
@define is_gr_font(#) =>
    (
        (font_area[#] == otgr_font_flag)
        && (usingGraphite(font_layout_engine[#]))
    )
@define is_otgr_font(#) => (font_area[#] == otgr_font_flag)
@define is_native_font(#) =>
    (is_aat_font(#) || is_otgr_font(#)) // native fonts have 
    // font_area = 65534 or 65535, which would be a string 
    // containing an invalid Unicode character
@define is_new_mathfont(#) =>
    (
        (font_area[#] == otgr_font_flag)
        && (isOpenTypeMathFont(font_layout_engine[#]))
    )
// a halfword code that can't match a real character
@define non_char => qi(too_big_char)
@define non_address => 0 // a spurious bchar_label 
⟦13 Global variables⟧ += ⟦
    // the big collection of font data
    var font_info: ^fmemory_word;

    // first unused word of font_info 
    var fmem_ptr: font_index;

    // largest internal font number in use
    var font_ptr: internal_font_number;

    // check sum
    var font_check: ^four_quarters;

    // "at" size
    var font_size: ^scaled;

    // "design" size
    var font_dsize: ^scaled;

    // how many font parameters are present
    var font_params: ^font_index;

    // name of the font
    var font_name: ^str_number;

    // area of the font
    var font_area: ^str_number;

    // beginning (smallest) character code
    var font_bc: ^UTF16_code;

    // ending (largest) character code
    var font_ec: ^UTF16_code;

    // glue specification for interword space, null if not 
    // allocated
    var font_glue: ^pointer;

    // has a character from this font actually appeared in 
    // the output?
    var font_used: ^boolean;

    // current \hyphenchar values
    var hyphen_char: ^integer;

    // current \skewchar values
    var skew_char: ^integer;

    // start of lig_kern program for left boundary 
    // character, non_address if there is none
    var bchar_label: ^font_index;

    // boundary character, non_char if there is none
    var font_bchar: ^nine_bits;

    //  font_bchar if it doesn't exist in the font, 
    // otherwise non_char 
    var font_false_bchar: ^nine_bits;

    // either a CFDictionaryRef or a XeTeXLayoutEngine
    var font_layout_engine: ^void_pointer;

    //  TECkit_Converter or 0
    var font_mapping: ^void_pointer;

    // flags: 0x01: font_colored 0x02: font_vertical 
    var font_flags: ^char;

    // letterspacing to be applied to the font
    var font_letter_space: ^scaled;

    // used by load_native_font to return mapping, if any
    var loaded_font_mapping: void_pointer;

    // used by load_native_font to return flags
    var loaded_font_flags: char;

    var loaded_font_letter_space: scaled;

    var loaded_font_design_size: scaled;

    // scratch buffer used while applying font mappings
    var mapped_text: ^UTF16_code;

    // scratch buffer used in generating XDV output
    var xdv_buffer: ^char;
⟧

585. Besides the arrays just enumerated, we have directory arrays that make it easy to get at the individual entries in font_info . For example, the char_info data for character c in font f will be in font_info[char_base[f] + c].qqqq ; and if w is the width_index part of this word (the b0 field), the width of the character is font_info[width_base[f] + w].sc . (These formulas assume that min_quarterword has already been added to c and to w , since TEX stores its quarterwords that way.)

⟦13 Global variables⟧ += ⟦
    // base addresses for char_info 
    var char_base: ^integer;

    // base addresses for widths
    var width_base: ^integer;

    // base addresses for heights
    var height_base: ^integer;

    // base addresses for depths
    var depth_base: ^integer;

    // base addresses for italic corrections
    var italic_base: ^integer;

    // base addresses for ligature/kerning programs
    var lig_kern_base: ^integer;

    // base addresses for kerns
    var kern_base: ^integer;

    // base addresses for extensible recipes
    var exten_base: ^integer;

    // base addresses for font parameters
    var param_base: ^integer;
⟧
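The role of the directory arrays can be illustrated with a small Python model. Everything here is a simplified assumption: the class FontMemory is hypothetical, font_info is modeled as a Python list holding either four-element tuples or numbers, only the char_info and width tables are stored, and min_quarterword is taken to be zero.

```python
class FontMemory:
    """Toy model of font_info with char_base and width_base directories."""

    def __init__(self):
        self.font_info = []
        self.char_base = {}
        self.width_base = {}

    def load(self, f, bc, char_words, widths):
        # Bias char_base by bc so that char_base[f] + c works for bc <= c <= ec.
        self.char_base[f] = len(self.font_info) - bc
        self.font_info.extend(char_words)
        self.width_base[f] = len(self.font_info)
        self.font_info.extend(widths)

    def char_width(self, f, c):
        w = self.font_info[self.char_base[f] + c][0]  # b0 is width_index
        return self.font_info[self.width_base[f] + w]
```

Because all fonts share one array, loading a second font simply appends its tables, and the base addresses keep each font's lookups independent.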

586.

⟦23 Set initial values of key variables⟧ += ⟦
    /*nothing*/
⟧

587. TEX always knows at least one font, namely the null font. It has no characters, and its seven parameters are all equal to zero.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    /*nothing*/
⟧

588.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("nullfont"), set_font, null_font)

    text(frozen_null_font) = strpool!("nullfont")

    eqtb[frozen_null_font] = eqtb[cur_val]
⟧

589. Of course we want to define macros that suppress the detail of how font information is actually packed, so that we don’t have to write things like

font_info[width_base[f] + font_info[char_base[f] + c].qqqq.b0].sc
too often. The WEB definitions here make char_info(f)(c) the four_quarters word of font information corresponding to character c of font f . If q is such a word, char_width(f)(q) will be the character’s width; hence the long formula above is at least abbreviated to
char_width(f)(char_info(f)(c)).
Usually, of course, we will fetch q first and look at several of its fields at the same time.

The italic correction of a character will be denoted by char_italic(f)(q) , so it is analogous to char_width . But we will get at the height and depth in a slightly different way, since we usually want to compute both height and depth if we want either one. The value of height_depth(q) will be the 8-bit quantity

b = height_index × 16 + depth_index,
and if b is such a byte we will write char_height(f)(b) and char_depth(f)(b) for the height and depth of the character c for which q == char_info(f)(c) . Got that?

The tag field will be called char_tag(q) ; the remainder byte will be called rem_byte(q) , using a macro that we have already defined above.

Access to a character’s width , height , depth , and tag fields is part of TEX’s inner loop, so we want these macros to produce code that is as fast as possible under the circumstances.

MLTEX will assume that a character c exists iff it either exists in the current font or a character substitution definition for this character was defined using \charsubdef. To avoid the distinction between these two cases, MLTEX introduces the notion of the “effective character” of an input character c . If c exists in the current font, the effective character of c is the character c itself. If it doesn’t exist but a character substitution is defined, the effective character of c is the base character defined in the character substitution. If there is an effective character for a non-existing character c , the “virtual character” c will get appended to the horizontal lists.

The effective character is used within char_info to access appropriate character descriptions in the font. For example, when calculating the width of a box, MLTEX will use the metrics of the effective characters. For the case of a substitution, MLTEX uses the metrics of the base character, ignoring the metrics of the accent character.

If character substitutions are changed, it is possible that a character c neither exists in a font nor has a valid character substitution. To handle these cases, effective_char should be called with its first argument set to true to ensure that it will still return an existing character in the font. If neither c nor the substituted base character in the current character substitution exists, effective_char will output a warning and return the character font_bc[f] (which is incorrect, but cannot be changed within the current framework).

Sometimes character substitutions are unwanted; therefore the original definition of char_info remains available through the macro orig_char_info . Operations in which character substitutions should be avoided are, for example, loading a new font and checking the font metric information in this font, and character accesses in math mode.

@define char_list_exists(#) => (char_sub_code(#) > hi(0))
@define char_list_accent(#) => (ho(char_sub_code(#)) div 256)
@define char_list_char(#) => (ho(char_sub_code(#)) % 256)
@define char_info_end(#) => #)].qqqq
@define char_info(#) =>
    font_info[char_base[#] + effective_char(true, #, char_info_end
@define orig_char_info_end(#) => #].qqqq
@define orig_char_info(#) =>
    font_info[char_base[#] + orig_char_info_end
@define char_width_end(#) => #.b0].sc
@define char_width(#) =>
    font_info[width_base[#] + char_width_end
@define char_exists(#) => (#.b0 > min_quarterword)
@define char_italic_end(#) => (qo(#.b2)) div 4].sc
@define char_italic(#) =>
    font_info[italic_base[#] + char_italic_end
@define height_depth(#) => qo(#.b1)
@define char_height_end(#) => (#) div 16].sc
@define char_height(#) =>
    font_info[height_base[#] + char_height_end
@define char_depth_end(#) => (#) % 16].sc
@define char_depth(#) =>
    font_info[depth_base[#] + char_depth_end
@define char_tag(#) => ((qo(#.b2)) % 4)

590. The global variable null_character is set up to be a word of char_info for a character that doesn’t exist. Such a word provides a convenient way to deal with erroneous situations.

⟦13 Global variables⟧ += ⟦
    // nonexistent character information
    var null_character: four_quarters;
⟧

591.

⟦23 Set initial values of key variables⟧ += ⟦
    null_character.b0 = min_quarterword

    null_character.b1 = min_quarterword

    null_character.b2 = min_quarterword

    null_character.b3 = min_quarterword
⟧

592. Here are some macros that help process ligatures and kerns. We write char_kern(f)(j) to find the amount of kerning specified by kerning command j in font f . If j is the char_info for a character with a ligature/kern program, the first instruction of that program is either i == font_info[lig_kern_start(f)(j)] or font_info[lig_kern_restart(f)(i)] , depending on whether or not skip_byte(i) <= stop_flag .

The constant kern_base_offset should be simplified, for Pascal compilers that do not do local optimization.

@define char_kern_end(#) =>
    256 * op_byte(#) + rem_byte(#)].sc
@define char_kern(#) =>
    font_info[kern_base[#] + char_kern_end
@define kern_base_offset => 256 * (128 + min_quarterword)
// beginning of lig/kern program
@define lig_kern_start(#) => lig_kern_base[#] + rem_byte
@define lig_kern_restart_end(#) =>
    
        256
        * op_byte(#)
        + rem_byte(#) + 32768 - kern_base_offset
@define lig_kern_restart(#) =>
    lig_kern_base[#] + lig_kern_restart_end
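
The effect of kern_base_offset may be easier to see with numbers. The Python sketch below assumes min_quarterword = 0 and uses two hypothetical helper names, `kern_index` and `restart_index`; it only reproduces the arithmetic of the macros above, relative to the start of the kern array and of the lig/kern array respectively.

```python
KERN_BASE_OFFSET = 256 * 128  # kern_base_offset when min_quarterword = 0


def kern_index(op_byte, rem_byte):
    """Position in the kern array selected by a kern instruction."""
    if op_byte < 128:
        raise ValueError("not a kern instruction")
    # char_kern adds 256*op_byte + rem_byte to kern_base[f], and kern_base[f]
    # is stored with kern_base_offset already subtracted from it.
    return 256 * op_byte + rem_byte - KERN_BASE_OFFSET


def restart_index(op_byte, rem_byte):
    """Position in the lig/kern array selected by lig_kern_restart."""
    return 256 * op_byte + rem_byte + 32768 - KERN_BASE_OFFSET


print(kern_index(129, 5), restart_index(1, 2))
```

Under this assumption the terms 32768 and kern_base_offset cancel in lig_kern_restart, which is the simplification that the remark about local optimization has in mind.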

593. Font parameters are referred to as slant(f) , space(f) , etc.

@define param_end(#) => param_base[#]].sc
@define param(#) => font_info[# + param_end
// slant to the right, per unit distance upward
@define slant => param(slant_code)
// normal space between words
@define space => param(space_code)
// stretch between words
@define space_stretch => param(space_stretch_code)
// shrink between words
@define space_shrink => param(space_shrink_code)
@define x_height => param(x_height_code) // one ex
@define quad => param(quad_code) // one em
// additional space at end of sentence
@define extra_space => param(extra_space_code)
⟦593 The em width for |cur_font|⟧ = ⟦
    quad(cur_font)
⟧

594.

⟦594 The x-height for |cur_font|⟧ = ⟦
    x_height(cur_font)
⟧

595. TEX checks the information of a TFM file for validity as the file is being read in, so that no further checks will be needed when typesetting is going on. The somewhat tedious subroutine that does this is called read_font_info . It has four parameters: the user font identifier u , the file name and area strings nom and aire , and the “at” size s . If s is negative, it’s the negative of a scale factor to be applied to the design size; s == -1000 is the normal case. Otherwise s will be substituted for the design size; in this case, s must be positive and less than 2048pt (i.e., it must be less than $2^{27}$ when considered as an integer).

The subroutine opens and closes a global file variable called tfm_file . It returns the value of the internal font number that was just loaded. If an error is detected, an error message is issued and no font information is stored; null_font is returned in this case.

@define bad_tfm => 11 // label for read_font_info 
@define abort =>
    // do this when the \.{TFM} data is wrong
    goto bad_tfm
⟦1694 Declare additional functions for ML\TeX⟧

// input a \.{TFM} file
function read_font_info(
  u: pointer,
  nom, aire: str_number,
  s: scaled,
): internal_font_number {
    label done, bad_tfm, not_found;
    var
      k: font_index, // index into font_info 
      name_too_long: boolean, //  nom or aire exceeds 255 
      // bytes?
      file_opened: boolean, // was tfm_file successfully 
      // opened?
      lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, ne, np: halfword, // 
      // sizes of subfiles
      f: internal_font_number, // the new font's number
      g: internal_font_number, // the number to return
      a, b, c, d: eight_bits, // byte variables
      qw: four_quarters,
      sw: scaled, // accumulators
      bch_label: integer, // left boundary start location, 
      // or infinity
      bchar: 0 .. 256, // boundary character, or 256
      z: scaled, // the design size or the ``at'' size
      alpha: integer,
      beta: 1 .. 16; // auxiliary quantities used in 
      // fixed-point multiplication
    
    g = null_font;
    file_opened = false;
    pack_file_name(nom, aire, cur_ext);
    if (XeTeX_tracing_fonts_state > 0) {
        begin_diagnostic;
        print_nl(strpool!("Requested font \""));
        print_c_string(stringcast(name_of_file + 1));
        print("\"");
        if (s < 0) {
            print(strpool!(" scaled "));
            print_int(-s);
        } else {
            print(strpool!(" at "));
            print_scaled(s);
            print(strpool!("pt"));
        }
        end_diagnostic(false);
    }
    if (quoted_filename) {
        // quoted name, so try for a native font
        g = load_native_font(u, nom, aire, s);
        if (g != null_font) {
            goto done;
        }
        // it was an unquoted name, or not found as an 
        // installed font, so try for a TFM file
    }
    ⟦597 Read and check the font data if file exists; |abort| if the \.{TFM} file is malformed; if there's no room for this font, say so and |goto done|; otherwise |incr(font_ptr)| and |goto done|⟧
    if (g != null_font) {
        goto done;
    }
    if (!quoted_filename) {
        // we failed to find a TFM file, so try for a native 
        // font
        g = load_native_font(u, nom, aire, s);
        if (g != null_font) {
            goto done;
        }
    }
  bad_tfm:
    if (suppress_fontnotfound_error == 0) {
        ⟦596 Report that the font won't be loaded⟧
    }
  done:
    if (file_opened) {
        b_close(tfm_file);
    }
    if (XeTeX_tracing_fonts_state > 0) {
        if (g == null_font) {
            begin_diagnostic;
            print_nl(
              strpool!(" -> font not found, using \"nullfont\""),
            );
            end_diagnostic(false);
        } else if (file_opened) {
            begin_diagnostic;
            print_nl(strpool!(" -> "));
            print_c_string(stringcast(name_of_file + 1));
            end_diagnostic(false);
        }
    }
    read_font_info = g;
}

596. There are programs called TFtoPL and PLtoTF that convert between the TFM format and a symbolic property-list format that can be easily edited. These programs contain extensive diagnostic information, so TEX does not have to bother giving precise details about why it rejects a particular TFM file.

@define start_font_error_message =>
    print_err(strpool!("Font "));
    sprint_cs(u);
    print_char(ord!("="));
    if (file_name_quote_char != 0) {
        print_char(file_name_quote_char);
    }
    print_file_name(nom, aire, cur_ext);
    if (file_name_quote_char != 0) {
        print_char(file_name_quote_char);
    }
    if (s >= 0) {
        print(strpool!(" at "));
        print_scaled(s);
        print(strpool!("pt"));
    } else if (s != -1000) {
        print(strpool!(" scaled "));
        print_int(-s);
    }
⟦596 Report that the font won't be loaded⟧ = ⟦
    start_font_error_message

    if (file_opened) {
        print(
          strpool!(" not loadable: Bad metric (TFM) file"),
        );
    } else if (name_too_long) {
        print(
          strpool!(" not loadable: Metric (TFM) file name too long"),
        );
    } else {
        print(
          strpool!(" not loadable: Metric (TFM) file or installed font not found"),
        );
    }

    help5(
      strpool!("I wasn't able to read the size data for this font,"),
    )(strpool!("so I will ignore the font specification."))(
      strpool!("[Wizards can fix TFM files using TFtoPL/PLtoTF.]"),
    )(
      strpool!("You might try inserting a different font spec;"),
    )(
      strpool!("e.g., type `I\\font<same font id>=<substitute font name>'."),
    )

    error
⟧

597.

⟦597 Read and check the font data if file exists; |abort| if the \.{TFM} file is malformed; if there's no room for this font, say so and |goto done|; otherwise |incr(font_ptr)| and |goto done|⟧ = ⟦
    ⟦598 Open |tfm_file| for input and |begin|⟧

    ⟦600 Read the {\.{TFM}} size fields⟧

    ⟦601 Use size fields to allocate font information⟧

    ⟦603 Read the {\.{TFM}} header⟧

    ⟦604 Read character data⟧

    ⟦606 Read box dimensions⟧

    ⟦608 Read ligature/kern program⟧

    ⟦609 Read extensible character recipes⟧

    ⟦610 Read font parameters⟧

    ⟦611 Make final adjustments and |goto done|⟧

    end
⟧

598.

⟦598 Open |tfm_file| for input and |begin|⟧ = ⟦
    name_too_long = 
        (length(nom) > 255)
        || (length(aire) > 255)

    if (name_too_long) {
        abort;
    }

    //  kpse_find_file will append the ".tfm", and avoid 
    // searching the disk before the font alias files as well.
    pack_file_name(nom, aire, strpool!(""))

    check_for_tfm_font_mapping

    if

    b_open_in(tfm_file)

    then

    begin

    file_opened = true
⟧

599. Note: A malformed TFM file might be shorter than it claims to be; thus eof(tfm_file) might be true when read_font_info refers to tfm_file^ or when it says get(tfm_file) . If such circumstances cause system error messages, you will have to defeat them somehow, for example by defining fget to be ‘begin get(tfm_file); if (eof(tfm_file)) { abort; } end’.

@define fget => tfm_temp = getc(tfm_file)
@define fbyte => tfm_temp
@define read_sixteen(#) =>
    {
        # = fbyte;
        if (# > 127) {
            abort;
        }
        fget;
        # = # * 0x100 + fbyte;
    }
@define store_four_quarters(#) =>
    {
        fget;
        a = fbyte;
        qw.b0 = qi(a);
        fget;
        b = fbyte;
        qw.b1 = qi(b);
        fget;
        c = fbyte;
        qw.b2 = qi(c);
        fget;
        d = fbyte;
        qw.b3 = qi(d);
        # = qw;
    }

600.

⟦600 Read the {\.{TFM}} size fields⟧ = ⟦
    {
        read_sixteen(lf);
        fget;
        read_sixteen(lh);
        fget;
        read_sixteen(bc);
        fget;
        read_sixteen(ec);
        if ((bc > ec + 1) || (ec > 255)) {
            abort;
        }
        //  bc == 256 and ec == 255 
        if (bc > 255) {
            bc = 1;
            ec = 0;
        }
        fget;
        read_sixteen(nw);
        fget;
        read_sixteen(nh);
        fget;
        read_sixteen(nd);
        fget;
        read_sixteen(ni);
        fget;
        read_sixteen(nl);
        fget;
        read_sixteen(nk);
        fget;
        read_sixteen(ne);
        fget;
        read_sixteen(np);
        if (
            lf
            != 6
            + lh
            + (ec - bc + 1)
            + nw + nh + nd + ni + nl + nk + ne + np
        ) {
            abort;
        }
        if ((nw == 0) || (nh == 0) || (nd == 0) || (ni == 0)) {
            abort;
        }
    }
⟧
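
The consistency checks on the size fields can be tried out in isolation. The following Python sketch is only an approximation of the module above: `read_size_fields` is a hypothetical name, the twelve fields are read in one step with `struct.unpack` instead of byte by byte, and every abort is modelled as a `ValueError`.

```python
import struct


def read_size_fields(header: bytes):
    """Validate the twelve 16-bit size fields at the start of a TFM file."""
    if len(header) < 24:
        raise ValueError("file too short")
    fields = struct.unpack(">12H", header[:24])  # big-endian, unsigned
    if any(v > 0x7FFF for v in fields):
        # read_sixteen aborts when the first byte of a field exceeds 127
        raise ValueError("size field out of range")
    lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, ne, np_ = fields
    if bc > ec + 1 or ec > 255:
        raise ValueError("bad character range")
    if bc > 255:  # bc == 256 and ec == 255: an empty font
        bc, ec = 1, 0
    if lf != 6 + lh + (ec - bc + 1) + nw + nh + nd + ni + nl + nk + ne + np_:
        raise ValueError("lf does not match the subfile sizes")
    if 0 in (nw, nh, nd, ni):
        raise ValueError("width, height, depth and italic tables need entry 0")
    return {"lf": lf, "lh": lh, "bc": bc, "ec": ec, "nw": nw, "nh": nh,
            "nd": nd, "ni": ni, "nl": nl, "nk": nk, "ne": ne, "np": np_}


good = struct.pack(">12H", 24, 2, 65, 66, 2, 2, 2, 1, 0, 0, 0, 7)
print(read_size_fields(good)["lf"])
```

In the example header, two characters (65 and 66) and the minimal tables add up to 24 words, so the file passes the length test.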

601. The preliminary settings of the index-offset variables char_base , width_base , lig_kern_base , kern_base , and exten_base will be corrected later by subtracting min_quarterword from them; and we will subtract 1 from param_base too. It’s best to forget about such anomalies until later.

⟦601 Use size fields to allocate font information⟧ = ⟦
    //  lf words should be loaded into font_info 
    lf = lf - 6 - lh

    if (np < 7) {
        // at least seven parameters will appear
        lf = lf + 7 - np;
    }

    if (
        (font_ptr == font_max)
        || (fmem_ptr + lf > font_mem_size)
    ) {
        ⟦602 Apologize for not loading the font, |goto done|⟧
    }

    f = font_ptr + 1

    char_base[f] = fmem_ptr - bc

    width_base[f] = char_base[f] + ec + 1

    height_base[f] = width_base[f] + nw

    depth_base[f] = height_base[f] + nh

    italic_base[f] = depth_base[f] + nd

    lig_kern_base[f] = italic_base[f] + ni

    kern_base[f] = lig_kern_base[f] + nl - kern_base_offset

    exten_base[f] = kern_base[f] + kern_base_offset + nk

    param_base[f] = exten_base[f] + ne
⟧
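
To make the preliminary layout concrete, here is a small Python sketch that computes the same chain of base addresses. The function name `layout` and the key `kern_start` (standing for kern_base[f] + kern_base_offset) are invented for this illustration; the later corrections by min_quarterword and the decrement of param_base are deliberately left out.

```python
def layout(fmem_ptr, bc, ec, nw, nh, nd, ni, nl, nk, ne):
    """Return the preliminary base addresses of the subtables of one font."""
    char_base = fmem_ptr - bc           # so that char_base + bc == fmem_ptr
    width_base = char_base + ec + 1
    height_base = width_base + nw
    depth_base = height_base + nh
    italic_base = depth_base + nd
    lig_kern_base = italic_base + ni
    kern_start = lig_kern_base + nl     # first word of the kern table
    exten_base = kern_start + nk
    param_base = exten_base + ne
    return {"char_base": char_base, "width_base": width_base,
            "height_base": height_base, "depth_base": depth_base,
            "italic_base": italic_base, "lig_kern_base": lig_kern_base,
            "kern_start": kern_start, "exten_base": exten_base,
            "param_base": param_base}


print(layout(100, 65, 66, 2, 2, 2, 1, 0, 0, 0))
```

Each table begins where the previous one ends, so an out-of-range index read from the file would land in a neighbouring table; this is why the indices are validated while the file is read.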

602.

⟦602 Apologize for not loading the font, |goto done|⟧ = ⟦
    {
        start_font_error_message;
        print(
          strpool!(" not loaded: Not enough room left"),
        );
        help4(
          strpool!("I'm afraid I won't be able to make use of this font,"),
        )(
          strpool!("because my memory for character-size data is too small."),
        )(
          strpool!("If you're really stuck, ask a wizard to enlarge me."),
        )(
          strpool!("Or maybe try `I\\font<same font id>=<name of loaded font>'."),
        );
        error;
        goto done;
    }
⟧

603. Only the first two words of the header are needed by TEX82.

⟦603 Read the {\.{TFM}} header⟧ = ⟦
    {
        if (lh < 2) {
            abort;
        }
        store_four_quarters(font_check[f]);
        fget;
        // this rejects a negative design size
        read_sixteen(z);
        fget;
        z = z * 0x100 + fbyte;
        fget;
        z = (z * 0x10) + (fbyte div 0x10);
        if (z < unity) {
            abort;
        }
        while (lh > 2) {
            fget;
            fget;
            fget;
            fget;
            // ignore the rest of the header
            decr(lh);
        }
        font_dsize[f] = z;
        if (s != -1000) {
            if (s >= 0) {
                z = s;
            } else {
                z = xn_over_d(z, -s, 1000);
            }
        }
        font_size[f] = z;
    }
⟧
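
The conversion of the design size and the treatment of s can be checked with a short calculation. This Python sketch uses hypothetical names (`design_size_to_scaled`, `font_size`) and replaces xn_over_d by exact integer arithmetic, which gives the same quotient for the nonnegative operands that occur here.

```python
UNITY = 1 << 16  # one point in scaled units


def design_size_to_scaled(b0, b1, b2, b3):
    """Convert the four bytes of the design-size fix_word to scaled points."""
    if b0 > 127:
        raise ValueError("negative design size")
    z = b0 * 256 + b1
    z = z * 256 + b2
    z = z * 16 + b3 // 16  # drop the four lowest bits: 2^-20 -> 2^-16 units
    if z < UNITY:
        raise ValueError("design size below 1pt")
    return z


def font_size(design, s):
    """Apply the 'at' size or 'scaled' factor encoded in s."""
    if s == -1000:
        return design              # the normal case: use the design size
    if s >= 0:
        return s                   # explicit 'at' size
    return design * (-s) // 1000   # stands in for xn_over_d(z, -s, 1000)


ten_pt = design_size_to_scaled(0x00, 0xA0, 0x00, 0x00)
print(ten_pt, font_size(ten_pt, -1200))
```

A 10pt font requested "scaled 1200" thus comes out at 786432sp, i.e. 12pt.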

604.

⟦604 Read character data⟧ = ⟦
    for (k in fmem_ptr to width_base[f] - 1) {
        store_four_quarters(font_info[k].qqqq);
        if (
            (a >= nw)
            || (b div 0x10 >= nh)
            || (b % 0x10 >= nd) || (c div 4 >= ni)
        ) {
            abort;
        }
        case c % 4 {
          lig_tag:
            if (d >= nl) {
                abort;
            }
          ext_tag:
            if (d >= ne) {
                abort;
            }
          list_tag:
            ⟦605 Check for charlist cycle⟧
          othercases:
            //  no_tag 
            do_nothing;
        }
    }
⟧

605. We want to make sure that there is no cycle of characters linked together by list_tag entries, since such a cycle would get TEX into an endless loop. If such a cycle exists, the routine here detects it when processing the largest character code in the cycle.

@define check_byte_range(#) =>
    {
        if ((# < bc) || (# > ec)) {
            abort;
        }
    }
@define current_character_being_worked_on =>
    k + bc - fmem_ptr
⟦605 Check for charlist cycle⟧ = ⟦
    {
        check_byte_range(d);
        while (d < current_character_being_worked_on) {
            // N.B.: not qi ( d ) , since char_base [ f ] 
            // hasn't been adjusted yet
            qw = orig_char_info(f)(d);
            if (char_tag(qw) != list_tag) {
                goto not_found;
            }
            // next character on the list
            d = qo(rem_byte(qw));
        }
        if (d == current_character_being_worked_on) {
            // yes, there's a cycle
            abort;
        }
      not_found:
    }
⟧
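
The cycle test is easy to misread, so here is a stand-alone Python sketch of the same idea. It assumes a hypothetical dictionary `next_larger` that maps each character code carrying a list_tag to its successor; characters are visited in increasing order, as they are when the file is read, and only links that lead to smaller codes are followed.

```python
def find_charlist_cycle(next_larger, bc, ec):
    """Return the largest code of a charlist cycle, or None if there is none."""
    for k in range(bc, ec + 1):
        if k not in next_larger:
            continue                    # no list_tag on this character
        d = next_larger[k]
        if not bc <= d <= ec:
            raise ValueError("successor out of range")
        while d < k:                    # follow links among smaller codes only
            if d not in next_larger:
                break                   # chain ends: no cycle through k
            d = next_larger[d]
        if d == k:
            return k                    # the chain came back to k
    return None


print(find_charlist_cycle({65: 66, 66: 67, 67: 65}, 0, 255))
```

Because a chain is followed only while it stays below the current character, every cycle is reported exactly once, at its largest member.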

606. A fix_word whose four bytes are $(a,b,c,d)$ from left to right represents the number

$$x=\begin{cases}b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20},&\text{if }a=0;\\-16+b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20},&\text{if }a=255.\end{cases}$$

(No other choices of a are allowed, since the magnitude of a number in design-size units must be less than 16.) We want to multiply this quantity by the integer z , which is known to be less than $2^{27}$. If $z<2^{23}$, the individual multiplications $b\cdot z$, $c\cdot z$, $d\cdot z$ cannot overflow; otherwise we will divide z by 2, 4, 8, or 16, to obtain a multiplier less than $2^{23}$, and we can compensate for this later. If z has thereby been replaced by $z'=z/2^e$, let $\beta=2^{4-e}$; we shall compute

$$\lfloor(b+c\cdot2^{-8}+d\cdot2^{-16})\,z'/\beta\rfloor$$

if $a=0$, or the same quantity minus $\alpha=2^{4+e}z'$ if $a=255$. This calculation must be done exactly, in order to guarantee portability of TEX between computers.

@define store_scaled(#) =>
    {
        fget;
        a = fbyte;
        fget;
        b = fbyte;
        fget;
        c = fbyte;
        fget;
        d = fbyte;
        sw = 
            (
                ((((d * z) div 0x100) + (c * z)) div 0x100)
                + (b * z)
            )
            div beta
        ;
        if (a == 0) {
            # = sw;
        } else if (a == 255) {
            # = sw - alpha;
        } else {
            abort;
        }
    }
⟦606 Read box dimensions⟧ = ⟦
    {
        ⟦607 Replace |z| by $|z|^\prime$ and compute $\alpha,\beta$⟧
        for (k in width_base[f] to lig_kern_base[f] - 1) {
            store_scaled(font_info[k].sc);
        }
        if (font_info[width_base[f]].sc != 0) {
            // \\{width}[0] must be zero
            abort;
        }
        if (font_info[height_base[f]].sc != 0) {
            // \\{height}[0] must be zero
            abort;
        }
        if (font_info[depth_base[f]].sc != 0) {
            // \\{depth}[0] must be zero
            abort;
        }
        if (font_info[italic_base[f]].sc != 0) {
            // \\{italic}[0] must be zero
            abort;
        }
    }
⟧

607.

⟦607 Replace |z| by $|z|^\prime$ and compute $\alpha,\beta$⟧ = ⟦
    {
        alpha = 16;
        while (z >= 0x800000) {
            z = z div 2;
            alpha = alpha + alpha;
        }
        beta = 256 div alpha;
        alpha = alpha * z;
    }
⟧
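
The fixed-point multiplication can be reproduced outside of TEX to see that the scaling by alpha and beta works as described. The Python sketch below mirrors the two pieces of code above under invented names (`prepare`, `store_scaled`); Python integers do not overflow, so the sketch illustrates the arithmetic rather than the overflow protection.

```python
def prepare(z):
    """Replace z by z' < 2^23 and compute the auxiliary alpha and beta."""
    alpha = 16
    while z >= 0x800000:
        z //= 2
        alpha += alpha
    beta = 256 // alpha
    alpha *= z
    return z, alpha, beta


def store_scaled(a, b, c, d, z, alpha, beta):
    """Multiply the fix_word (a, b, c, d) by the size, using z' for z."""
    sw = (((d * z) // 0x100 + c * z) // 0x100 + b * z) // beta
    if a == 0:
        return sw
    if a == 255:
        return sw - alpha
    raise ValueError("fix_word out of range")


z, alpha, beta = prepare(10 * 65536)                    # a 10pt font
print(store_scaled(0, 0x10, 0, 0, z, alpha, beta))      # fix_word 1.0
print(store_scaled(255, 0xF0, 0, 0, z, alpha, beta))    # fix_word -1.0
```

The fix_word 1.0 yields the font size itself and -1.0 yields its negative; the same holds for a 256pt font, where z is halved twice before the multiplication.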

608.

@define check_existence(#) =>
    {
        check_byte_range(#);
        // N.B.: not qi ( # ) 
        qw = orig_char_info(f)(#);
        if (!char_exists(qw)) {
            abort;
        }
    }
⟦608 Read ligature/kern program⟧ = ⟦
    bch_label = 0x7fff

    bchar = 256

    if (nl > 0) {
        for (k in lig_kern_base[f] to 
            kern_base[f]
            + kern_base_offset - 1
        ) {
            store_four_quarters(font_info[k].qqqq);
            if (a > 128) {
                if (256 * c + d >= nl) {
                    abort;
                }
                if (a == 255) {
                    if (k == lig_kern_base[f]) {
                        bchar = b;
                    }
                }
            } else {
                if (b != bchar) {
                    check_existence(b);
                }
                if (c < 128) {
                    // check ligature
                    check_existence(d);
                } else if (256 * (c - 128) + d >= nk) {
                    // check kern
                    abort;
                }
                if (a < 128) {
                    if (k - lig_kern_base[f] + a + 1 >= nl) {
                        abort;
                    }
                }
            }
        }
        if (a == 255) {
            bch_label = 256 * c + d;
        }
    }

    for (k in kern_base[f] + kern_base_offset to 
        exten_base[f]
        - 1
    ) {
        store_scaled(font_info[k].sc);
    }
⟧

609.

⟦609 Read extensible character recipes⟧ = ⟦
    for (k in exten_base[f] to param_base[f] - 1) {
        store_four_quarters(font_info[k].qqqq);
        if (a != 0) {
            check_existence(a);
        }
        if (b != 0) {
            check_existence(b);
        }
        if (c != 0) {
            check_existence(c);
        }
        check_existence(d);
    }
⟧

610. We check to see that the TFM file doesn’t end prematurely; but no error message is given for files having more than lf words.

⟦610 Read font parameters⟧ = ⟦
    {
        for (k in 1 to np) {
            // the slant parameter is a pure number
            if (k == 1) {
                fget;
                sw = fbyte;
                if (sw > 127) {
                    sw = sw - 256;
                }
                fget;
                sw = sw * 0x100 + fbyte;
                fget;
                sw = sw * 0x100 + fbyte;
                fget;
                font_info[param_base[f]].sc = 
                    (sw * 0x10)
                    + (fbyte div 0x10)
                ;
            } else {
                store_scaled(
                  font_info[param_base[f] + k - 1].sc,
                );
            }
        }
        if (feof(tfm_file)) {
            abort;
        }
        for (k in np + 1 to 7) {
            font_info[param_base[f] + k - 1].sc = 0;
        }
    }
⟧
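
The slant parameter is the one value that is not multiplied by the font size; its fix_word is merely shifted from $2^{-20}$ to $2^{-16}$ units. A minimal Python sketch of that special case, with the hypothetical name `slant_to_scaled`:

```python
def slant_to_scaled(b0, b1, b2, b3):
    """Convert the slant fix_word to a scaled value without using z."""
    sw = b0
    if sw > 127:
        sw -= 256              # interpret the first byte as signed
    sw = sw * 0x100 + b1
    sw = sw * 0x100 + b2
    return sw * 0x10 + b3 // 0x10


print(slant_to_scaled(0x00, 0x04, 0x00, 0x00))   # slant 0.25
print(slant_to_scaled(0xFF, 0xFC, 0x00, 0x00))   # slant -0.25
```

A slant of 0.25 becomes 16384, i.e. a quarter of 65536, regardless of the size at which the font is loaded.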

611. Now to wrap it up, we have checked all the necessary things about the TFM file, and all we need to do is put the finishing touches on the data for the new font.

@define adjust(#) =>
    // correct for the excess min_quarterword that was added
    #[f] = qo(#[f])
⟦611 Make final adjustments and |goto done|⟧ = ⟦
    if (np >= 7) {
        font_params[f] = np;
    } else {
        font_params[f] = 7;
    }

    hyphen_char[f] = default_hyphen_char

    skew_char[f] = default_skew_char

    if (bch_label < nl) {
        bchar_label[f] = bch_label + lig_kern_base[f];
    } else {
        bchar_label[f] = non_address;
    }

    font_bchar[f] = qi(bchar)

    font_false_bchar[f] = qi(bchar)

    if (bchar <= ec) {
        if (bchar >= bc) {
            // N.B.: not qi ( bchar ) 
            qw = orig_char_info(f)(bchar);
            if (char_exists(qw)) {
                font_false_bchar[f] = non_char;
            }
        }
    }

    font_name[f] = nom

    font_area[f] = aire

    font_bc[f] = bc

    font_ec[f] = ec

    font_glue[f] = null

    adjust(char_base)

    adjust(width_base)

    adjust(lig_kern_base)

    adjust(kern_base)

    adjust(exten_base)

    decr(param_base[f])

    fmem_ptr = fmem_ptr + lf

    font_ptr = f

    g = f

    font_mapping[f] = load_tfm_font_mapping

    goto done
⟧

612. Before we forget about the format of these tables, let’s deal with two of TEX’s basic scanning routines related to font information.

⟦612 Declare procedures that scan font-related stuff⟧ = ⟦
    function scan_font_ident() {
        var f: internal_font_number, m: halfword;
        
        ⟦440 Get the next non-blank non-call token⟧
        if (cur_cmd == def_font) {
            f = cur_font;
        } else if (cur_cmd == set_font) {
            f = cur_chr;
        } else if (cur_cmd == def_family) {
            m = cur_chr;
            scan_math_fam_int;
            f = equiv(m + cur_val);
        } else {
            print_err(strpool!("Missing font identifier"));
            help2(
              strpool!("I was looking for a control sequence whose"),
            )(
              strpool!("current meaning has been defined by \\font."),
            );
            back_error;
            f = null_font;
        }
        cur_val = f;
    }
⟧

613. The following routine is used to implement ‘\fontdimen n f ’. The boolean parameter writing is set true if the calling program intends to change the parameter value.

⟦612 Declare procedures that scan font-related stuff⟧ += ⟦
    // sets cur_val to font_info location
    function find_font_dimen(writing: boolean) {
        var
          f: internal_font_number,
          n: integer; // the parameter number
        
        scan_int;
        n = cur_val;
        scan_font_ident;
        f = cur_val;
        if (n <= 0) {
            cur_val = fmem_ptr;
        } else {
            if (
                writing
                && (n <= space_shrink_code)
                && (n >= space_code)
                && (font_glue[f] != null)
            ) {
                delete_glue_ref(font_glue[f]);
                font_glue[f] = null;
            }
            if (n > font_params[f]) {
                if (f < font_ptr) {
                    cur_val = fmem_ptr;
                } else {
                    ⟦615 Increase the number of parameters in the last font⟧
                }
            } else {
                cur_val = n + param_base[f];
            }
        }
        ⟦614 Issue an error message if |cur_val=fmem_ptr|⟧
    }
⟧

614.

⟦614 Issue an error message if |cur_val=fmem_ptr|⟧ = ⟦
    if (cur_val == fmem_ptr) {
        print_err(strpool!("Font "));
        print_esc(font_id_text(f));
        print(strpool!(" has only "));
        print_int(font_params[f]);
        print(strpool!(" fontdimen parameters"));
        help2(
          strpool!("To increase the number of font parameters, you must"),
        )(
          strpool!("use \\fontdimen immediately after the \\font is loaded."),
        );
        error;
    }
⟧

615.

⟦615 Increase the number of parameters in the last font⟧ = ⟦
    {
        repeat {
            if (fmem_ptr == font_mem_size) {
                overflow(
                  strpool!("font memory"),
                  font_mem_size,
                );
            }
            font_info[fmem_ptr].sc = 0;
            incr(fmem_ptr);
            incr(font_params[f]);
        } until (n == font_params[f]);
        // this equals param_base [ f ] + font_params [ f ] 
        cur_val = fmem_ptr - 1;
    }
⟧

616. When TEX wants to typeset a character that doesn’t exist, the character node is not created; thus the output routine can assume that characters exist when it sees them. The following procedure prints a warning message unless the user has suppressed it.

⟦616 Declare subroutines for |new_character|⟧ = ⟦
    // cf.~ print_hex 
    function print_ucs_code(n: UnicodeScalar) {
        var
          k: 0 .. 22; // index to current digit; we assume 
          // that $0\le n<16^{22}$
        
        k = 0;
        // prefix with U+ instead of "
        print(strpool!("U+"));
        repeat {
            dig[k] = n % 16;
            n = n div 16;
            incr(k);
        } until (n == 0);
        // pad to at least 4 hex digits
        while (k < 4) {
            dig[k] = 0;
            incr(k);
        }
        print_the_digs(k);
    }

    function char_warning(
      f: internal_font_number,
      c: integer,
    ) {
        var
          old_setting: integer; // saved value of 
          // tracing_online 
        
        if (tracing_lost_chars > 0) {
            old_setting = tracing_online;
            if (eTeX_ex && (tracing_lost_chars > 1)) {
                tracing_online = 1;
            }
            if (tracing_lost_chars > 2) {
                print_err(
                  strpool!("Missing character: There is no "),
                );
            } else {
                begin_diagnostic;
                print_nl(
                  strpool!("Missing character: There is no "),
                );
            }
            if (c < 0x10000) {
                print_ASCII(c);
            } else {
                // non-Plane 0 Unicodes can't be sent 
                // through print_ASCII 
                print_char(c);
            }
            print(strpool!(" ("));
            if (is_native_font(f)) {
                print_ucs_code(c);
            } else {
                print_hex(c);
            }
            print(ord!(")"));
            print(strpool!(" in font "));
            slow_print(font_name[f]);
            if (tracing_lost_chars < 3) {
                print_char(ord!("!"));
            }
            tracing_online = old_setting;
            if (tracing_lost_chars > 2) {
                help0;
                error;
            } else {
                end_diagnostic(false);
            }
            // of tracing_lost_chars > 0 
        }
        // of procedure
    }
⟧
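
The digit handling in print_ucs_code can be imitated in a few lines. The Python sketch below returns the text instead of printing it; the name `ucs_code` is made up for this example, and uppercase hexadecimal digits are assumed.

```python
def ucs_code(n):
    """Format n as 'U+' followed by at least four uppercase hex digits."""
    digits = []
    while True:                 # repeat ... until n == 0
        digits.append(n % 16)
        n //= 16
        if n == 0:
            break
    while len(digits) < 4:      # pad to at least 4 hex digits
        digits.append(0)
    # the digits were collected least significant first
    return "U+" + "".join("0123456789ABCDEF"[d] for d in reversed(digits))


print(ucs_code(0x41), ucs_code(0x1F600))
```

Codes beyond Plane 0 simply produce more than four digits; no padding is added in that case.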

617. The subroutines for new_character have been moved.

618. Here is a function that returns a pointer to a character node for a given character in a given font. If that character doesn’t exist, null is returned instead.

This allows a character node to be used if there is an equivalent in the char_sub_code list.

function new_character(
  f: internal_font_number,
  c: ASCII_code,
): pointer {
    label exit;
    var
      p: pointer, // newly allocated node
      ec: quarterword; // effective character of c 
    
    if (is_native_font(f)) {
        new_character = new_native_character(f, c);
        return;
    }
    ec = effective_char(false, f, qi(c));
    if (font_bc[f] <= qo(ec)) {
        if (font_ec[f] >= qo(ec)) {
            // N.B.: not char_info 
            if (char_exists(orig_char_info(f)(ec))) {
                p = get_avail;
                font(p) = f;
                character(p) = qi(c);
                new_character = p;
                return;
            }
        }
    }
    char_warning(f, c);
    new_character = null;
  exit:
}

619. [31] Device-independent file format. The most important output produced by a run of TEX is the “device independent” (DVI) file that specifies where characters and rules are to appear on printed pages. The form of these files was designed by David R. Fuchs in 1979. Almost any reasonable typesetting device can be driven by a program that takes DVI files as input, and dozens of such DVI-to-whatever programs have been written. Thus, it is possible to print the output of TEX on many different kinds of equipment, using TEX as a device-independent “front end.”

A DVI file is a stream of 8-bit bytes, which may be regarded as a series of commands in a machine-like language. The first byte of each command is the operation code, and this code is followed by zero or more bytes that provide parameters to the command. The parameters themselves may consist of several consecutive bytes; for example, the ‘set_rule ’ command has two parameters, each of which is four bytes long. Parameters are usually regarded as nonnegative integers; but four-byte-long parameters, and shorter parameters that denote distances, can be either positive or negative. Such parameters are given in two’s complement notation. For example, a two-byte-long distance parameter has a value between $-2^{15}$ and $2^{15}-1$. As in TFM files, numbers that occupy more than one byte position appear in BigEndian order.
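
For readers who want to experiment with the byte conventions just described, the following Python sketch decodes a parameter from its raw bytes. The helper name `dvi_param` is an assumption of this example; the decoding itself relies only on the standard `int.from_bytes` method.

```python
def dvi_param(raw: bytes, signed: bool) -> int:
    """Decode a DVI parameter stored most significant byte first."""
    # signed=True selects two's complement interpretation
    return int.from_bytes(raw, byteorder="big", signed=signed)


print(dvi_param(b"\x80\x00", signed=True))    # smallest two-byte distance
print(dvi_param(b"\x7f\xff", signed=True))    # largest two-byte distance
print(dvi_param(b"\x80\x00", signed=False))   # same bytes as a character code
```

The same two bytes thus denote different numbers depending on whether the command treats the parameter as a distance or as a nonnegative quantity.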

XƎTEX extends the format of DVI with its own commands, and thus produces “extended device independent” (XDV) files.

A DVI file consists of a “preamble,” followed by a sequence of one or more “pages,” followed by a “postamble.” The preamble is simply a pre command, with its parameters that define the dimensions used in the file; this must come first. Each “page” consists of a bop command, followed by any number of other commands that tell where characters are to be placed on a physical page, followed by an eop command. The pages appear in the order that TEX generated them. If we ignore nop commands and fnt_def commands (which are allowed between any two commands in the file), each eop command is immediately followed by a bop command, or by a post command; in the latter case, there are no more pages in the file, and the remaining bytes form the postamble. Further details about the postamble will be explained later.

Some parameters in DVI commands are “pointers.” These are four-byte quantities that give the location number of some other byte in the file; the first byte is number 0, then comes number 1, and so on. For example, one of the parameters of a bop command points to the previous bop ; this makes it feasible to read the pages in backwards order, in case the results are being directed to a device that stacks its output face up. Suppose the preamble of a DVI file occupies bytes 0 to 99. Now if the first page occupies bytes 100 to 999, say, and if the second page occupies bytes 1000 to 1999, then the bop that starts in byte 1000 points to 100 and the bop that starts in byte 2000 points to 1000. (The very first bop , i.e., the one starting in byte 100, has a pointer of -1.)
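
The backward chain of bop pointers can be pictured with a toy model. In the Python sketch below, the dictionary `prev_bop` is a hypothetical stand-in for the pointer parameters found in a real file; it uses the byte positions of the example above, with -1 as the pointer of the very first bop.

```python
def pages_in_reverse(prev_bop, last_bop):
    """Return the bop locations from the last page back to the first."""
    order = []
    loc = last_bop
    while loc != -1:            # -1 marks the very first bop
        order.append(loc)
        loc = prev_bop[loc]
    return order


prev_bop = {100: -1, 1000: 100, 2000: 1000}
print(pages_in_reverse(prev_bop, 2000))
```

A driver that must emit the pages last-to-first needs only the location of the final bop to start this walk.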

620. The DVI format is intended to be both compact and easily interpreted by a machine. Compactness is achieved by making most of the information implicit instead of explicit. When a DVI-reading program reads the commands for a page, it keeps track of several quantities: (a) The current font f is an integer; this value is changed only by fnt and fnt_num commands. (b) The current position on the page is given by two numbers called the horizontal and vertical coordinates, h and v . Both coordinates are zero at the upper left corner of the page; moving to the right corresponds to increasing the horizontal coordinate, and moving down corresponds to increasing the vertical coordinate. Thus, the coordinates are essentially Cartesian, except that vertical directions are flipped; the Cartesian version of (h, v) would be (h, -v) . (c) The current spacing amounts are given by four numbers w , x , y , and z , where w and x are used for horizontal spacing and where y and z are used for vertical spacing. (d) There is a stack containing (h, v, w, x, y, z) values; the DVI commands push and pop are used to change the current level of operation. Note that the current font f is not pushed and popped; the stack contains only information about positioning.

The values of h , v , w , x , y , and z are signed integers having up to 32 bits, including the sign. Since they represent physical distances, there is a small unit of measurement such that increasing h by 1 means moving a certain tiny distance to the right. The actual unit of measurement is variable, as explained below; TEX sets things up so that its DVI output is in sp units, i.e., scaled points, in agreement with all the scaled dimensions in TEX’s data structures.
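
A DVI-reading program might hold this state in a small record. The Python class below is only a sketch with an invented name, `DviState`; it shows that push and pop save and restore the six positioning values while leaving the current font alone.

```python
class DviState:
    """Minimal model of the quantities a DVI reader tracks on a page."""

    def __init__(self):
        self.f = None                       # current font, never stacked
        self.h = self.v = 0                 # current position
        self.w = self.x = self.y = self.z = 0   # current spacing amounts
        self.stack = []

    def push(self):
        self.stack.append((self.h, self.v, self.w, self.x, self.y, self.z))

    def pop(self):
        self.h, self.v, self.w, self.x, self.y, self.z = self.stack.pop()


state = DviState()
state.f = 3
state.push()
state.h, state.f = 100, 7
state.pop()
print(state.h, state.f)
```

After the pop, h is back at its saved value, but the font selected in between remains in force.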

621. Here is a list of all the commands that may appear in an XDV file. Each command is specified by its symbolic name (e.g., bop ), its opcode byte (e.g., 139), and its parameters (if any). The parameters are followed by a bracketed number telling how many bytes they occupy; for example, ‘p[4] ’ means that parameter p is four bytes long.

set_char_0 0. Typeset character number 0 from font f such that the reference point of the character is at (h, v) . Then increase h by the width of that character. Note that a character may have zero or negative width, so one cannot be sure that h will advance after this command; but h usually does increase.

set_char_1 through set_char_127 (opcodes 1 to 127). Do the operations of set_char_0 ; but use the character whose number matches the opcode, instead of character 0.

set1 128 c[1] . Same as set_char_0 , except that character number c is typeset. TEX82 uses this command for characters in the range 128 <= c < 256 .

set2 129 c[2] . Same as set1 , except that c is two bytes long, so it is in the range 0 <= c < 65536 . TEX82 never uses this command, but it should come in handy for extensions of TEX that deal with oriental languages.

set3 130 c[3] . Same as set1 , except that c is three bytes long, so it can be as large as 2^{24} - 1 . Not even the Chinese language has this many characters, but this command might prove useful in some yet unforeseen extension.

set4 131 c[4] . Same as set1 , except that c is four bytes long. Imagine that.

set_rule 132 a[4] b[4] . Typeset a solid black rectangle of height a and width b , with its bottom left corner at (h, v) . Then set h = h + b . If either a <= 0 or b <= 0 , nothing should be typeset. Note that if b < 0 , the value of h will decrease even though nothing else happens. See below for details about how to typeset rules so that consistency with METAFONT is guaranteed.

put1 133 c[1] . Typeset character number c from font f such that the reference point of the character is at (h, v) . (The ‘put’ commands are exactly like the ‘set’ commands, except that they simply put out a character or a rule without moving the reference point afterwards.)

put2 134 c[2] . Same as set2 , except that h is not changed.

put3 135 c[3] . Same as set3 , except that h is not changed.

put4 136 c[4] . Same as set4 , except that h is not changed.

put_rule 137 a[4] b[4] . Same as set_rule , except that h is not changed.

nop 138. No operation, do nothing. Any number of nop ’s may occur between DVI commands, but a nop cannot be inserted between a command and its parameters or between two parameters.

bop 139 c_0[4] c_1[4] … c_9[4] p[4] . Beginning of a page: Set (h, v, w, x, y, z) = (0, 0, 0, 0, 0, 0) and set the stack empty. Set the current font f to an undefined value. The ten c_i parameters hold the values of \count0 … \count9 in TEX at the time \shipout was invoked for this page; they can be used to identify pages, if a user wants to print only part of a DVI file. The parameter p points to the previous bop in the file; the first bop has p = -1 .

eop 140. End of page: Print what you have read since the previous bop . At this point the stack should be empty. (The DVI-reading programs that drive most output devices will have kept a buffer of the material that appears on the page that has just ended. This material is largely, but not entirely, in order by v coordinate and (for fixed v ) by h coordinate; so it usually needs to be sorted into some order that is appropriate for the device in question.)

push 141. Push the current values of (h, v, w, x, y, z) onto the top of the stack; do not change any of these values. Note that f is not pushed.

pop 142. Pop the top six values off of the stack and assign them respectively to (h, v, w, x, y, z) . The number of pops should never exceed the number of pushes, since it would be highly embarrassing if the stack were empty at the time of a pop command.

right1 143 b[1] . Set h = h + b , i.e., move right b units. The parameter is a signed number in two’s complement notation, -128 <= b < 128 ; if b < 0 , the reference point moves left.

right2 144 b[2] . Same as right1 , except that b is a two-byte quantity in the range -32768 <= b < 32768 .

right3 145 b[3] . Same as right1 , except that b is a three-byte quantity in the range -2^{23} <= b < 2^{23} .

right4 146 b[4] . Same as right1 , except that b is a four-byte quantity in the range -2^{31} <= b < 2^{31} .

w0 147. Set h = h + w ; i.e., move right w units. With luck, this parameterless command will usually suffice, because the same kind of motion will occur several times in succession; the following commands explain how w gets particular values.

w1 148 b[1] . Set w = b and h = h + b . The value of b is a signed quantity in two’s complement notation, -128 <= b < 128 . This command changes the current w spacing and moves right by b .

w2 149 b[2] . Same as w1 , but b is two bytes long, -32768 <= b < 32768 .

w3 150 b[3] . Same as w1 , but b is three bytes long, -2^{23} <= b < 2^{23} .

w4 151 b[4] . Same as w1 , but b is four bytes long, -2^{31} <= b < 2^{31} .

x0 152. Set h = h + x ; i.e., move right x units. The ‘x ’ commands are like the ‘w ’ commands except that they involve x instead of w .

x1 153 b[1] . Set x = b and h = h + b . The value of b is a signed quantity in two’s complement notation, -128 <= b < 128 . This command changes the current x spacing and moves right by b .

x2 154 b[2] . Same as x1 , but b is two bytes long, -32768 <= b < 32768 .

x3 155 b[3] . Same as x1 , but b is three bytes long, -2^{23} <= b < 2^{23} .

x4 156 b[4] . Same as x1 , but b is four bytes long, -2^{31} <= b < 2^{31} .

down1 157 a[1] . Set v = v + a , i.e., move down a units. The parameter is a signed number in two’s complement notation, -128 <= a < 128 ; if a < 0 , the reference point moves up.

down2 158 a[2] . Same as down1 , except that a is a two-byte quantity in the range -32768 <= a < 32768 .

down3 159 a[3] . Same as down1 , except that a is a three-byte quantity in the range -2^{23} <= a < 2^{23} .

down4 160 a[4] . Same as down1 , except that a is a four-byte quantity in the range -2^{31} <= a < 2^{31} .

y0 161. Set v = v + y ; i.e., move down y units. With luck, this parameterless command will usually suffice, because the same kind of motion will occur several times in succession; the following commands explain how y gets particular values.

y1 162 a[1] . Set y = a and v = v + a . The value of a is a signed quantity in two’s complement notation, -128 <= a < 128 . This command changes the current y spacing and moves down by a .

y2 163 a[2] . Same as y1 , but a is two bytes long, -32768 <= a < 32768 .

y3 164 a[3] . Same as y1 , but a is three bytes long, -2^{23} <= a < 2^{23} .

y4 165 a[4] . Same as y1 , but a is four bytes long, -2^{31} <= a < 2^{31} .

z0 166. Set v = v + z ; i.e., move down z units. The ‘z ’ commands are like the ‘y ’ commands except that they involve z instead of y .

z1 167 a[1] . Set z = a and v = v + a . The value of a is a signed quantity in two’s complement notation, -128 <= a < 128 . This command changes the current z spacing and moves down by a .

z2 168 a[2] . Same as z1 , but a is two bytes long, -32768 <= a < 32768 .

z3 169 a[3] . Same as z1 , but a is three bytes long, -2^{23} <= a < 2^{23} .

z4 170 a[4] . Same as z1 , but a is four bytes long, -2^{31} <= a < 2^{31} .

fnt_num_0 171. Set f = 0 . Font 0 must previously have been defined by a fnt_def instruction, as explained below.

fnt_num_1 through fnt_num_63 (opcodes 172 to 234). Set f = 1 , …, f = 63 , respectively.

fnt1 235 k[1] . Set f = k . TEX82 uses this command for font numbers in the range 64 <= k < 256 .

fnt2 236 k[2] . Same as fnt1 , except that k is two bytes long, so it is in the range 0 <= k < 65536 . TEX82 never generates this command, but large font numbers may prove useful for specifications of color or texture, or they may be used for special fonts that have fixed numbers in some external coding scheme.

fnt3 237 k[3] . Same as fnt1 , except that k is three bytes long, so it can be as large as 2^{24} - 1 .

fnt4 238 k[4] . Same as fnt1 , except that k is four bytes long; this is for the really big font numbers (and for the negative ones).

xxx1 239 k[1] x[k] . This command is undefined in general; it functions as a (k + 2)-byte nop unless special DVI-reading programs are being used. TEX82 generates xxx1 when a short enough \special appears, setting k to the number of bytes being sent. It is recommended that x be a string having the form of a keyword followed by possible parameters relevant to that keyword.

xxx2 240 k[2] x[k] . Like xxx1 , but 0 <= k < 65536 .

xxx3 241 k[3] x[k] . Like xxx1 , but 0 <= k < 2^{24} .

xxx4 242 k[4] x[k] . Like xxx1 , but k can be ridiculously large. TEX82 uses xxx4 when sending a string of length 256 or more.

fnt_def1 243 k[1] c[4] s[4] d[4] a[1] l[1] n[a + l] . Define font k , where 0 <= k < 256 ; font definitions will be explained shortly.

fnt_def2 244 k[2] c[4] s[4] d[4] a[1] l[1] n[a + l] . Define font k , where 0 <= k < 65536 .

fnt_def3 245 k[3] c[4] s[4] d[4] a[1] l[1] n[a + l] . Define font k , where 0 <= k < 2^{24} .

fnt_def4 246 k[4] c[4] s[4] d[4] a[1] l[1] n[a + l] . Define font k , where -2^{31} <= k < 2^{31} .

pre 247 i[1] num[4] den[4] mag[4] k[1] x[k] . Beginning of the preamble; this must come at the very beginning of the file. Parameters i , num , den , mag , k , and x are explained below.

post 248. Beginning of the postamble, see below.

post_post 249. Ending of the postamble, see below.

Commands 250–255 are undefined in normal DVI files, but the following commands are used in XDV files.

define_native_font 252 k[4] s[4] flags[2] l[1] n[l] i[4] if (flags && COLORED) then rgba[4] if (flags && EXTEND) then extend[4] if (flags && SLANT) then slant[4] if (flags && EMBOLDEN) then embolden[4]

set_glyphs 253 w[4] k[2] xy[8k] g[2k] .

set_text_and_glyphs 254 l[2] t[2l] w[4] k[2] xy[8k] g[2k] .

Commands 250 and 255 are undefined in normal XDV files.
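Many of the parameters in the list above are signed quantities of one to four bytes, stored most significant byte first. The following Python sketch decodes such parameters and interprets a tiny subset of the horizontal commands ( right1 to right4 , w0 , and w1 to w4 ). The function names `signed` and `run_horizontal` are made up for this illustration; a real DVI reader handles every opcode.

```python
def signed(data: bytes) -> int:
    """Decode a big-endian two's complement parameter of 1 to 4 bytes."""
    return int.from_bytes(data, byteorder="big", signed=True)


RIGHT1, W0, W1 = 143, 147, 148   # opcodes taken from the list above


def run_horizontal(code: bytes):
    """Interpret only right1..right4, w0 and w1..w4; return the final (h, w)."""
    h = w = 0
    i = 0
    while i < len(code):
        op = code[i]
        i += 1
        if RIGHT1 <= op <= RIGHT1 + 3:       # right1 .. right4
            n = op - RIGHT1 + 1              # number of parameter bytes
            h += signed(code[i:i + n])
            i += n
        elif op == W0:                       # reuse the current w spacing
            h += w
        elif W1 <= op <= W1 + 3:             # w1 .. w4: set w, then move
            n = op - W1 + 1
            w = signed(code[i:i + n])
            h += w
            i += n
        else:
            raise ValueError(f"opcode {op} is not handled by this sketch")
    return h, w


# w1 with b = -5, then w0, then right2 with b = 256
result = run_horizontal(bytes([148, 0xFB, 147, 144, 0x01, 0x00]))
```

Here the one-byte w0 repeats the motion that w1 established, which is exactly the saving that the w , x , y , and z registers are meant to provide.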

622.

@define set_char_0 => 0 // typeset character 0 and move 
// right
@define set1 => 128 // typeset a character and move right
@define set_rule => 132 // typeset a rule and move right
@define put_rule => 137 // typeset a rule
@define nop => 138 // no operation
@define bop => 139 // beginning of page
@define eop => 140 // ending of page
@define push => 141 // save the current positions
@define pop => 142 // restore previous positions
@define right1 => 143 // move right
@define w0 => 147 // move right by w 
@define w1 => 148 // move right and set w 
@define x0 => 152 // move right by x 
@define x1 => 153 // move right and set x 
@define down1 => 157 // move down
@define y0 => 161 // move down by y 
@define y1 => 162 // move down and set y 
@define z0 => 166 // move down by z 
@define z1 => 167 // move down and set z 
@define fnt_num_0 => 171 // set current font to 0
@define fnt1 => 235 // set current font
@define xxx1 => 239 // extension to \.{DVI} primitives
// potentially long extension to \.{DVI} primitives
@define xxx4 => 242
// define the meaning of a font number
@define fnt_def1 => 243
@define pre => 247 // preamble
@define post => 248 // postamble beginning
@define post_post => 249 // postamble ending
@define define_native_font => 252 // define native font
// sequence of glyphs with individual x-y coordinates
@define set_glyphs => 253
// run of Unicode (UTF16) text followed by positioned glyphs
@define set_text_and_glyphs => 254

623. The preamble contains basic information about the file as a whole. As stated above, there are six parameters:

i[1] num[4] den[4] mag[4] k[1] x[k] .
The i byte identifies DVI format; in XƎTEX this byte is set to 7, as we have new DVI opcodes, while in TEX82 it is always set to 2. (The value i == 3 is used for an extended format that allows a mixture of right-to-left and left-to-right typesetting. Older versions of XƎTEX used i == 4 , i == 5 and i == 6 .)

The next two parameters, num and den , are positive integers that define the units of measurement; they are the numerator and denominator of a fraction by which all dimensions in the DVI file could be multiplied in order to get lengths in units of 10^{-7} meters. Since 7227pt = 254cm, and since TEX works with scaled points where there are 2^{16} sp in a point, TEX sets num/den = (254 · 10^5)/(7227 · 2^{16}) = 25400000/473628672.

The mag parameter is what TEX calls \mag, i.e., 1000 times the desired magnification. The actual fraction by which dimensions are multiplied is therefore mag · num/(1000 · den). Note that if a TEX source document does not call for any ‘true’ dimensions, and if you change it only by specifying a different \mag setting, the DVI file that TEX creates will be completely unchanged except for the value of mag in the preamble and postamble. (Fancy DVI-reading programs allow users to override the mag setting when a DVI file is being printed.)
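The unit arithmetic can be checked with exact rational numbers. The sketch below is not taken from TEX; the helper name `sp_to_meters` is hypothetical. It multiplies a dimension by mag · num/(1000 · den) to obtain units of 10^{-7} meters and then converts to meters.

```python
from fractions import Fraction

NUM = 254 * 10**5        # numerator that TEX writes:   25400000
DEN = 7227 * 2**16       # denominator that TEX writes: 473628672


def sp_to_meters(dimension_sp: int, mag: int = 1000) -> Fraction:
    """Convert a DVI dimension (in sp) to meters via mag * num / (1000 * den)."""
    units_of_1e_7_m = Fraction(dimension_sp * mag * NUM, 1000 * DEN)
    return units_of_1e_7_m / 10**7


# 7227pt expressed in sp should come out as 254cm, i.e., 2.54 meters.
length = sp_to_meters(7227 * 2**16)
doubled = sp_to_meters(7227 * 2**16, mag=2000)
```

With the default mag of 1000 the result is exactly 254/100 meters, and with mag set to 2000 it is exactly twice that, although the dimension stored in the file is the same.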

Finally, k and x allow the DVI writer to include a comment, which is not interpreted further. The length of comment x is k , where 0 <= k < 256 .

// identifies the kind of \.{DVI} files described here
@define id_byte => 7
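As an illustration of the preamble layout, the following Python sketch unpacks the six parameters from the first bytes of a file. The function name `parse_preamble` and the sample comment are assumptions made for this example only.

```python
import struct


def parse_preamble(data: bytes) -> dict:
    """Parse `pre i[1] num[4] den[4] mag[4] k[1] x[k]` from the start of a file."""
    # ">" selects big-endian byte order with no padding: 1+1+4+4+4+1 = 15 bytes.
    opcode, i, num, den, mag, k = struct.unpack(">BBiiiB", data[:15])
    if opcode != 247:
        raise ValueError("file does not begin with pre (opcode 247)")
    comment = data[15:15 + k]            # k bytes, not interpreted further
    return {"id": i, "num": num, "den": den, "mag": mag, "comment": comment}


# A hand-built preamble with id byte 7 and an arbitrary five-byte comment.
sample = (bytes([247, 7])
          + struct.pack(">iii", 25400000, 473628672, 1000)
          + bytes([5]) + b"hello")
preamble = parse_preamble(sample)
```

A reader would normally inspect the id field first and refuse files whose format it does not understand.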

624. Font definitions for a given font number k contain further parameters

c[4] s[4] d[4] a[1] l[1] n[a + l] .
The four-byte value c is the check sum that TEX found in the TFM file for this font; c should match the check sum of the font found by programs that read this DVI file.

Parameter s contains a fixed-point scale factor that is applied to the character widths in font k ; font dimensions in TFM files and other font files are relative to this quantity, which is called the “at size” elsewhere in this documentation. The value of s is always positive and less than 2^{27} . It is given in the same units as the other DVI dimensions, i.e., in sp when TEX82 has made the file. Parameter d is similar to s ; it is the “design size,” and (like s ) it is given in DVI units. Thus, font k is to be used at mag · s/(1000 · d) times its normal size.

The remaining part of a font definition gives the external name of the font, which is an ASCII string of length a + l . The number a is the length of the “area” or directory, and l is the length of the font name itself; the standard local system font area is supposed to be used when a == 0 . The n field contains the area in its first a bytes.

Font definitions must appear before the first use of a particular font number. Once font k is defined, it must not be defined again; however, we shall see below that font definitions appear in the postamble as well as in the pages, so in this sense each font number is defined exactly twice, if at all. Like nop commands, font definitions can appear before the first bop , or between an eop and a bop .
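The layout of a font definition can be made concrete with a small parser for the fnt_def1 form. This is a sketch under stated assumptions: `parse_fnt_def1` is a hypothetical name, and the check sum in the sample bytes is an arbitrary placeholder rather than the check sum of any real font.

```python
import struct
from fractions import Fraction


def parse_fnt_def1(data: bytes):
    """Parse `fnt_def1 k[1] c[4] s[4] d[4] a[1] l[1] n[a+l]`.

    Returns the decoded fields and the number of bytes consumed.
    """
    opcode, k, c, s, d, a, l = struct.unpack(">BBIiiBB", data[:16])
    if opcode != 243:
        raise ValueError("not a fnt_def1 command")
    n = data[16:16 + a + l]
    fields = {"k": k, "c": c, "s": s, "d": d,
              "area": n[:a],           # first a bytes: the directory ("area")
              "name": n[a:]}           # remaining l bytes: the font name
    return fields, 16 + a + l


# Font 0, placeholder check sum, at size = design size = 10pt (655360 sp).
sample = (bytes([243, 0])
          + struct.pack(">Iii", 0x12345678, 655360, 655360)
          + bytes([0, 5]) + b"cmr10")
fields, consumed = parse_fnt_def1(sample)

# Relative size at which the font is used: mag * s / (1000 * d), here with mag = 1000.
scale = Fraction(1000 * fields["s"], 1000 * fields["d"])
```

An empty area ( a == 0 ) means that the standard local system font area is to be used.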

625. Sometimes it is desirable to make horizontal or vertical rules line up precisely with certain features in characters of a font. It is possible to guarantee the correct matching between DVI output and the characters generated by METAFONT by adhering to the following principles: (1) The METAFONT characters should be positioned so that a bottom edge or left edge that is supposed to line up with the bottom or left edge of a rule appears at the reference point, i.e., in row 0 and column 0 of the METAFONT raster. This ensures that the position of the rule will not be rounded differently when the pixel size is not a perfect multiple of the units of measurement in the DVI file. (2) A typeset rule of height a > 0 and width b > 0 should be equivalent to a METAFONT-generated character having black pixels in precisely those raster positions whose METAFONT coordinates satisfy 0 <= x < \alpha b and 0 <= y < \alpha a , where \alpha is the number of pixels per DVI unit.
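Principle (2) fixes the size of a rule in pixels: the integer coordinates x with 0 <= x < \alpha b number exactly the ceiling of \alpha b , and likewise for the rows. A minimal Python sketch, in which the function name `rule_pixels` and the sample resolution are assumptions chosen for illustration:

```python
from fractions import Fraction
from math import ceil


def rule_pixels(a: int, b: int, alpha: Fraction):
    """Return (rows, columns) of black pixels for a rule of height a and width b.

    alpha is the number of pixels per DVI unit, kept exact as a Fraction.
    """
    if a <= 0 or b <= 0:
        return (0, 0)                    # nothing is typeset
    # Integers y with 0 <= y < alpha*a, and integers x with 0 <= x < alpha*b.
    return (ceil(alpha * a), ceil(alpha * b))


# With a quarter of a pixel per DVI unit, a rule 8 units high and 10 units wide
# covers 2 rows and 3 columns (10/4 = 2.5 rounds up to 3).
size = rule_pixels(8, 10, Fraction(1, 4))
```

Rounding up rather than to the nearest integer means that a rule with positive height and width never vanishes.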

626. The last page in a DVI file is followed by ‘post ’; this command introduces the postamble, which summarizes important facts that TEX has accumulated about the file, making it possible to print subsets of the data with reasonable efficiency. The postamble has the form

post p[4] num[4] den[4] mag[4] l[4] u[4] s[2] t[2] ⟨font definitions⟩ post_post q[4] i[1] 223’s[>= 4]
Here p is a pointer to the final bop in the file. The next three parameters, num , den , and mag , are duplicates of the quantities that appeared in the preamble.

Parameters l and u give respectively the height-plus-depth of the tallest page and the width of the widest page, in the same units as other dimensions of the file. These numbers might be used by a DVI-reading program to position individual “pages” on large sheets of film or paper; however, the standard convention for output on normal size paper is to position each page so that the upper left-hand corner is exactly one inch from the left and the top. Experience has shown that it is unwise to design DVI-to-printer software that attempts cleverly to center the output; a fixed position of the upper left corner is easiest for users to understand and to work with. Therefore l and u are often ignored.

Parameter s is the maximum stack depth (i.e., the largest excess of push commands over pop commands) needed to process this file. Then comes t , the total number of pages (bop commands) present.

The postamble continues with font definitions, which are any number of fnt_def commands as described above, possibly interspersed with nop commands. Each font number that is used in the DVI file must be defined exactly twice: Once before it is first selected by a fnt command, and once in the postamble.

627. The last part of the postamble, following the post_post byte that signifies the end of the font definitions, contains q , a pointer to the post command that started the postamble. An identification byte, i , comes next; it has the same value as in the preamble, namely 7 in XƎTEX and 2 in TEX82.

The i byte is followed by four or more bytes that are all equal to the decimal number 223 (i.e., 0xDF in hexadecimal). TEX puts out four to seven of these trailing bytes, until the total length of the file is a multiple of four bytes, since this works out best on machines that pack four bytes per word; but any number of 223’s is allowed, as long as there are at least four of them. In effect, 223 is a sort of signature that is added at the very end.

This curious way to finish off a DVI file makes it feasible for DVI-reading programs to find the postamble first, on most computers, even though TEX wants to write the postamble last. Most operating systems permit random access to individual words or bytes of a file, so the DVI reader can start at the end and skip backwards over the 223’s until finding the identification byte. Then it can back up four bytes, read q , and move to byte q of the file. This byte should, of course, contain the value 248 (post ); now the postamble can be read, so the DVI reader can discover all the information needed for typesetting the pages. Note that it is also possible to skip through the DVI file at reasonably high speed to locate a particular page, if that proves desirable. This saves a lot of time, since DVI files used in production jobs tend to be large.
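The backward search just described can be sketched as follows. The sketch works on a file already held in memory; the function name `find_postamble` and the hand-built sample bytes are hypothetical and serve only to illustrate the layout.

```python
import struct


def find_postamble(data: bytes):
    """Return (q, id_byte) by skipping backwards over the trailing 223's."""
    pos = len(data) - 1
    count = 0
    while pos >= 0 and data[pos] == 223:
        pos -= 1
        count += 1
    if count < 4 or pos < 5:
        raise ValueError("missing 223 signature or truncated postamble")
    id_byte = data[pos]                          # the i byte
    if data[pos - 5] != 249:
        raise ValueError("post_post (249) not found before q")
    (q,) = struct.unpack(">i", data[pos - 4:pos])   # back up four bytes, read q
    if not 0 <= q < len(data) or data[q] != 248:
        raise ValueError("q does not point at a post command (248)")
    return q, id_byte


# Three nops, a post command with 28 zeroed parameter bytes and no font
# definitions, then post_post, q = 3, id byte 7, and six 223's.
sample = (bytes([138] * 3) + bytes([248]) + bytes(28)
          + bytes([249]) + struct.pack(">i", 3) + bytes([7])
          + bytes([223] * 6))
q, id_byte = find_postamble(sample)
```

Once q is known, the reader can parse the postamble starting at byte q and then follow the chain of bop pointers backwards through the pages.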

Unfortunately, however, standard Pascal does not include the ability to access a random position in a file, or even to determine the length of a file. Almost all systems nowadays provide the necessary capabilities, so DVI format has been designed to work most efficiently with modern operating systems. But if DVI files have to be processed under the restrictions of standard Pascal, one can simply read them from front to back, since the necessary header information is present in the preamble and in the font definitions. (The l and u and s and t parameters, which appear only in the postamble, are “frills” that are handy but not absolutely necessary.)

628. [32] Shipping pages out. After considering TEX’s eyes and stomach, we come now to the bowels.

The ship_out procedure is given a pointer to a box; its mission is to describe that box in DVI form, outputting a “page” to dvi_file . The DVI coordinates (h, v) = (0, 0) should correspond to the upper left corner of the box being shipped.

Since boxes can be inside of boxes inside of boxes, the main work of ship_out is done by two mutually recursive routines, hlist_out and vlist_out , which traverse the hlists and vlists inside of horizontal and vertical boxes.

As individual pages are being processed, we need to accumulate information about the entire set of pages, since such statistics must be reported in the postamble. The global variables total_pages , max_v , max_h , max_push , and last_bop are used to record this information.

The variable doing_leaders is true while leaders are being output. The variable dead_cycles contains the number of times an output routine has been initiated since the last ship_out .

A few additional global variables are also defined here for use in vlist_out and hlist_out . They could have been local variables, but that would waste stack space when boxes are deeply nested, since the values of these variables are not needed during recursive calls.

⟦13 Global variables⟧ += ⟦
    // the number of pages that have been shipped out
    var total_pages: integer;

    // maximum height-plus-depth of pages shipped so far
    var max_v: scaled;

    // maximum width of pages shipped so far
    var max_h: scaled;

    // deepest nesting of push commands encountered so far
    var max_push: integer;

    // location of previous bop in the \.{DVI} output
    var last_bop: integer;

    // recent outputs that didn't ship anything out
    var dead_cycles: integer;

    // are we inside a leader box?
    var doing_leaders: boolean;

    // character and font in current char_node 
    var c: quarterword;

    var f: internal_font_number;

    // size of current rule being output
    var rule_ht, rule_dp, rule_wd: scaled;

    // current glue specification
    var g: pointer;

    // quantities used in calculations for leaders
    var lq, lr: integer;
⟧

629.

⟦23 Set initial values of key variables⟧ += ⟦
    total_pages = 0

    max_v = 0

    max_h = 0

    max_push = 0

    last_bop = -1

    doing_leaders = false

    dead_cycles = 0

    cur_s = -1
⟧

630. The DVI bytes are output to a buffer instead of being written directly to the output file. This makes it possible to reduce the overhead of subroutine calls, thereby measurably speeding up the computation, since output of DVI bytes is part of TEX’s inner loop. And it has another advantage as well, since we can change instructions in the buffer in order to make the output more compact. For example, a ‘down2 ’ command can be changed to a ‘y2 ’, thereby making a subsequent ‘y0 ’ command possible, saving two bytes.

The output buffer is divided into two parts of equal size; the bytes found in dvi_buf[0 .. half_buf - 1] constitute the first half, and those in dvi_buf[half_buf .. dvi_buf_size - 1] constitute the second. The global variable dvi_ptr points to the position that will receive the next output byte. When dvi_ptr reaches dvi_limit , which is always equal to one of the two values half_buf or dvi_buf_size , the half buffer that is about to be invaded next is sent to the output and dvi_limit is changed to its other value. Thus, there is always at least a half buffer’s worth of information present, except at the very beginning of the job.

Bytes of the DVI file are numbered sequentially starting with 0; the next byte to be generated will be number dvi_offset + dvi_ptr . A byte is present in the buffer only if its number is >= dvi_gone .

⟦18 Types in the outer block⟧ += ⟦
    // an index into the output buffer
    type dvi_index = 0 .. dvi_buf_size;
⟧

631. Some systems may find it more efficient to make dvi_buf a packed array, since output of four bytes at once may be facilitated.

⟦13 Global variables⟧ += ⟦
    // buffer for \.{DVI} output
    var dvi_buf: ^eight_bits;

    // half of dvi_buf_size 
    var half_buf: integer;

    // end of the current half buffer
    var dvi_limit: integer;

    // the next available buffer address
    var dvi_ptr: integer;

    //  dvi_buf_size times the number of times the output 
    // buffer has been fully emptied
    var dvi_offset: integer;

    // the number of bytes already output to dvi_file 
    var dvi_gone: integer;
⟧

632. Initially the buffer is all in one piece; we will output half of it only after it first fills up.

⟦23 Set initial values of key variables⟧ += ⟦
    half_buf = dvi_buf_size div 2

    dvi_limit = dvi_buf_size

    dvi_ptr = 0

    dvi_offset = 0

    dvi_gone = 0
⟧

633. The actual output of dvi_buf[a .. b] to dvi_file is performed by calling write_dvi(a, b) . For best results, this procedure should be optimized to run as fast as possible on each particular system, since it is part of TEX’s inner loop. It is safe to assume that a and b + 1 will both be multiples of 4 when write_dvi(a, b) is called; therefore it is possible on many machines to use efficient methods to pack four bytes per word and to output an array of words with one system call.

In C, we use a macro to call fwrite or write directly, writing all the bytes in one shot, which is even better than writing four bytes at a time.

634. To put a byte in the buffer without paying the cost of invoking a procedure each time, we use the macro dvi_out .

The length of dvi_file should not exceed 0x7fffffff ; we set cur_s = -2 to prevent further DVI output from causing infinite recursion.

@define dvi_out(#) =>
    {
        dvi_buf[dvi_ptr] = #;
        incr(dvi_ptr);
        if (dvi_ptr == dvi_limit) {
            dvi_swap;
        }
    }
// outputs half of the buffer
function dvi_swap() {
    if (dvi_ptr > (0x7fffffff - dvi_offset)) {
        cur_s = -2;
        fatal_error(
          strpool!("dvi length exceeds \"7FFFFFFF"),
        );
    }
    if (dvi_limit == dvi_buf_size) {
        write_dvi(0, half_buf - 1);
        dvi_limit = half_buf;
        dvi_offset = dvi_offset + dvi_buf_size;
        dvi_ptr = 0;
    } else {
        write_dvi(half_buf, dvi_buf_size - 1);
        dvi_limit = dvi_buf_size;
    }
    dvi_gone = dvi_gone + half_buf;
}

635. Here is how we clean out the buffer when TEX is all through; dvi_ptr will be a multiple of 4.

⟦635 Empty the last bytes out of |dvi_buf|⟧ = ⟦
    if (dvi_limit == half_buf) {
        write_dvi(half_buf, dvi_buf_size - 1);
    }

    if (dvi_ptr > (0x7fffffff - dvi_offset)) {
        cur_s = -2;
        fatal_error(
          strpool!("dvi length exceeds \"7FFFFFFF"),
        );
    }

    if (dvi_ptr > 0) {
        write_dvi(0, dvi_ptr - 1);
    }
⟧
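The interplay of dvi_out , dvi_swap , and the final flush is easier to follow in a small simulation. The Python class below is a sketch, not the actual implementation: the name `DviBuffer` is hypothetical, the output file is replaced by an in-memory `written` array, and the check against 0x7fffffff is omitted.

```python
class DviBuffer:
    """Simulates the two-half output buffer with a tiny buffer size."""

    def __init__(self, buf_size: int = 8):
        self.size = buf_size
        self.half = buf_size // 2
        self.buf = bytearray(buf_size)
        self.limit = buf_size        # dvi_limit
        self.ptr = 0                 # dvi_ptr
        self.offset = 0              # dvi_offset
        self.gone = 0                # dvi_gone
        self.written = bytearray()   # stands in for dvi_file

    def write_dvi(self, a: int, b: int):
        self.written += self.buf[a:b + 1]

    def out(self, byte: int):        # the dvi_out macro
        self.buf[self.ptr] = byte
        self.ptr += 1
        if self.ptr == self.limit:
            self.swap()

    def swap(self):                  # dvi_swap: output half of the buffer
        if self.limit == self.size:
            self.write_dvi(0, self.half - 1)
            self.limit = self.half
            self.offset += self.size
            self.ptr = 0
        else:
            self.write_dvi(self.half, self.size - 1)
            self.limit = self.size
        self.gone += self.half

    def flush(self):                 # empty the last bytes out of the buffer
        if self.limit == self.half:
            self.write_dvi(self.half, self.size - 1)
        if self.ptr > 0:
            self.write_dvi(0, self.ptr - 1)


demo = DviBuffer(8)
for value in range(11):
    demo.out(value)
demo.flush()
```

After the flush, `demo.written` holds the eleven bytes in their original order; between swaps, the half that was filled most recently always remains in memory, which is what makes the later patching of movement commands possible.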

636. The dvi_four procedure outputs four bytes in two’s complement notation, without risking arithmetic overflow.

function dvi_four(x: integer) {
    if (x >= 0) {
        dvi_out(x div 0x1000000);
    } else {
        x = x + 0x40000000;
        x = x + 0x40000000;
        dvi_out((x div 0x1000000) + 128);
    }
    x = x % 0x1000000;
    dvi_out(x div 0x10000);
    x = x % 0x10000;
    dvi_out(x div 0x100);
    dvi_out(x % 0x100);
}

function dvi_two(s: UTF16_code) {
    dvi_out(s div 0x100);
    dvi_out(s % 0x100);
}
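A direct Python transcription of dvi_four shows why the negative case adds 0x40000000 twice: on a machine with 32-bit signed integers the constant 2^{31} is not representable, so it is added in two halves. Python integers do not overflow, so the split is kept here only for fidelity; returning the bytes instead of calling dvi_out is a simplification made for this sketch.

```python
def dvi_four(x: int) -> bytes:
    """Return the four bytes that dvi_four would output for x."""
    out = bytearray()
    if x >= 0:
        out.append(x // 0x1000000)
    else:
        x += 0x40000000              # add 2^31 in two steps, as in the original
        x += 0x40000000
        out.append(x // 0x1000000 + 128)   # restore the sign bit in the top byte
    x %= 0x1000000
    out.append(x // 0x10000)
    x %= 0x10000
    out.append(x // 0x100)
    out.append(x % 0x100)
    return bytes(out)


samples = [0, 1, -1, 65536, -65536, 2**31 - 1, -2**31]
encoded = [dvi_four(v) for v in samples]
```

Each result agrees with Python's built-in big-endian two's complement encoding, `v.to_bytes(4, "big", signed=True)`.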

637. A mild optimization of the output is performed by the dvi_pop routine, which issues a pop unless it is possible to cancel a ‘push pop ’ pair. The parameter to dvi_pop is the byte address following the old push that matches the new pop .

function dvi_pop(l: integer) {
    if ((l == dvi_offset + dvi_ptr) && (dvi_ptr > 0)) {
        decr(dvi_ptr);
    } else {
        dvi_out(pop);
    }
}

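638. Here’s a procedure that outputs a font definition. TEX82 uses at most 256 different fonts per job, so it always uses fnt_def1 as the command code; the code below also emits fnt_def2 (written as fnt_def1 + 1 ) for larger font numbers, and it hands native fonts to dvi_native_font_def .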

function dvi_native_font_def(f: internal_font_number) {
    var font_def_length, i: integer;
    
    dvi_out(define_native_font);
    dvi_four(f - font_base - 1);
    font_def_length = make_font_def(f);
    for (i in 0 to font_def_length - 1) {
        dvi_out(xdv_buffer[i]);
    }
}

function dvi_font_def(f: internal_font_number) {
    var
      k: pool_pointer, // index into str_pool 
      l: integer; // length of name without mapping option
    
    if (is_native_font(f)) {
        dvi_native_font_def(f);
    } else {
        if (f <= 256 + font_base) {
            dvi_out(fnt_def1);
            dvi_out(f - font_base - 1);
        } else {
            dvi_out(fnt_def1 + 1);
            dvi_out((f - font_base - 1) div 0x100);
            dvi_out((f - font_base - 1) % 0x100);
        }
        dvi_out(qo(font_check[f].b0));
        dvi_out(qo(font_check[f].b1));
        dvi_out(qo(font_check[f].b2));
        dvi_out(qo(font_check[f].b3));
        dvi_four(font_size[f]);
        dvi_four(font_dsize[f]);
        dvi_out(length(font_area[f]));
        ⟦639 Output the font name whose internal number is |f|⟧
    }
}

639.

⟦639 Output the font name whose internal number is |f|⟧ = ⟦
    l = 0

    // search for colon; we will truncate the name there
    k = str_start_macro(font_name[f])

    while (
        (l == 0)
        && (k < str_start_macro(font_name[f] + 1))
    ) {
        if (so(str_pool[k]) == ord!(":")) {
            l = k - str_start_macro(font_name[f]);
        }
        incr(k);
    }

    if (l == 0) {
        // no colon found
        l = length(font_name[f]);
    }

    dvi_out(l)

    for (k in str_start_macro(font_area[f]) to 
        str_start_macro(font_area[f] + 1)
        - 1
    ) {
        dvi_out(so(str_pool[k]));
    }

    for (k in str_start_macro(font_name[f]) to 
        str_start_macro(font_name[f])
        + l - 1
    ) {
        dvi_out(so(str_pool[k]));
    }

    
⟧
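The truncation of the font name at the first colon can be sketched on ordinary strings. The function name `dvi_font_name` and the sample names are hypothetical. Note one consequence of using l == 0 to mean “no colon found yet”: a colon in the very first position is not treated as a separator.

```python
def dvi_font_name(font_name: str) -> str:
    """Return the part of font_name that would be written to the DVI file."""
    l = 0
    k = 0
    # Search for a colon; the name is truncated there.
    while l == 0 and k < len(font_name):
        if font_name[k] == ":":
            l = k
        k += 1
    if l == 0:
        l = len(font_name)           # no colon found (or colon at position 0)
    return font_name[:l]


with_option = dvi_font_name("cmr10:mapping=foo")
plain = dvi_font_name("cmr10")
leading_colon = dvi_font_name(":abc")
```

In the section above, the length l computed this way is written as a single byte before the area and the truncated name.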

640. Versions of TEX intended for small computers might well choose to omit the ideas in the next few parts of this program, since it is not really necessary to optimize the DVI code by making use of the w0 , x0 , y0 , and z0 commands. Furthermore, the algorithm that we are about to describe does not pretend to give an optimum reduction in the length of the DVI code; after all, speed is more important than compactness. But the method is surprisingly effective, and it takes comparatively little time.

We can best understand the basic idea by first considering a simpler problem that has the same essential characteristics. Given a sequence of digits, say 3141592653589, we want to assign subscripts d, y, or z to each digit so as to maximize the number of “y-hits” and “z-hits”; a y-hit is an instance of two appearances of the same digit with the subscript y, where no y’s intervene between the two appearances, and a z-hit is defined similarly. For example, the sequence above could be decorated with subscripts as follows:

3_z 1_y 4_d 1_y 5_y 9_d 2_d 6_d 5_y 3_z 5_y 8_d 9_d.
There are three y-hits (1_y … 1_y and 5_y … 5_y … 5_y) and one z-hit (3_z … 3_z); there are no d-hits, since the two appearances of 9_d have d’s between them, but we don’t count d-hits so it doesn’t matter how many there are. These subscripts are analogous to the DVI commands called down, y, and z, and the digits are analogous to different amounts of vertical motion; a y-hit or z-hit corresponds to the opportunity to use the one-byte commands y0 or z0 in a DVI file.

TEX’s method of assigning subscripts works like this: Append a new digit, say δ, to the right of the sequence. Now look back through the sequence until one of the following things happens: (a) You see δ_y or δ_z, and this was the first time you encountered a y or z subscript, respectively. Then assign y or z to the new δ; you have scored a hit. (b) You see δ_d, and no y subscripts have been encountered so far during this search. Then change the previous δ_d to δ_y (this corresponds to changing a command in the output buffer), and assign y to the new δ; it’s another hit. (c) You see δ_d, and a y subscript has been seen but not a z. Change the previous δ_d to δ_z and assign z to the new δ. (d) You encounter both y and z subscripts before encountering a suitable δ, or you scan all the way to the front of the sequence. Assign d to the new δ; this assignment may be changed later.

The subscripts 3_z 1_y 4_d … in the example above were, in fact, produced by this procedure, as the reader can verify. (Go ahead and try it.)
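For readers who would rather let a machine try it, here is a minimal Python sketch of rules (a) through (d) applied to the simplified digit problem. The function name `assign_subscripts` is hypothetical, and the sketch deliberately ignores everything that the real algorithm must also handle (the output buffer, dvi_gone , and the push / pop structure).

```python
def assign_subscripts(digits: str) -> str:
    """Label each digit with d, y, or z using the backward scan described above."""
    labels = []                              # subscripts assigned so far
    for pos, delta in enumerate(digits):
        seen_y = seen_z = False
        new_label = "d"                      # case (d) unless the scan succeeds
        for j in range(pos - 1, -1, -1):     # look back through the sequence
            sub = labels[j]
            if digits[j] == delta:
                if sub == "y" and not seen_y:        # case (a): a y-hit
                    new_label = "y"
                    break
                if sub == "z" and not seen_z:        # case (a): a z-hit
                    new_label = "z"
                    break
                if sub == "d" and not seen_y:        # case (b): change d to y
                    labels[j] = "y"
                    new_label = "y"
                    break
                if sub == "d" and not seen_z:        # case (c): change d to z
                    labels[j] = "z"
                    new_label = "z"
                    break
            if sub == "y":
                seen_y = True
            elif sub == "z":
                seen_z = True
            if seen_y and seen_z:                    # case (d): give up
                break
        labels.append(new_label)
    return " ".join(d + s for d, s in zip(digits, labels))


decorated = assign_subscripts("3141592653589")
# decorated == "3z 1y 4d 1y 5y 9d 2d 6d 5y 3z 5y 8d 9d"
```

The result reproduces the decoration shown in the example, with each subscript written directly after its digit.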

641. In order to implement such an idea, TEX maintains a stack of pointers to the down, y, and z commands that have been generated for the current page. And there is a similar stack for right, w , and x commands. These stacks are called the down stack and right stack, and their top elements are maintained in the variables down_ptr and right_ptr .

Each entry in these stacks contains four fields: The width field is the amount of motion down or to the right; the location field is the byte number of the DVI command in question (including the appropriate dvi_offset ); the link field points to the next item below this one on the stack; and the info field encodes the options for possible change in the DVI command.

// number of words per entry in the down and right stacks
@define movement_node_size => 3
// DVI byte number for a movement command
@define location(#) => mem[# + 2].int
⟦13 Global variables⟧ += ⟦
    // heads of the down and right stacks
    var down_ptr, right_ptr: pointer;
⟧

642.

⟦23 Set initial values of key variables⟧ += ⟦
    down_ptr = null

    right_ptr = null
⟧

643. Here is a subroutine that produces a DVI command for some specified downward or rightward motion. It has two parameters: w is the amount of motion, and o is either down1 or right1 . We use the fact that the command codes have convenient arithmetic properties: y1 - down1 == w1 - right1 and z1 - down1 == x1 - right1 .

function movement(w: scaled, o: eight_bits) {
    label exit, found, not_found, 2, 1;
    var
      mstate: small_number, // have we seen a y or z ?
      p, q: pointer, // current and top nodes on the stack
      k: integer; // index into dvi_buf , modulo 
      // dvi_buf_size 
    
    // new node for the top of the stack
    q = get_node(movement_node_size);
    width(q) = w;
    location(q) = dvi_offset + dvi_ptr;
    if (o == down1) {
        link(q) = down_ptr;
        down_ptr = q;
    } else {
        link(q) = right_ptr;
        right_ptr = q;
    }
    ⟦647 Look at the other stack entries until deciding what sort of \.{DVI} command to generate; |goto found| if node |p| is a ``hit''⟧
    ⟦646 Generate a |down| or |right| command for |w| and |return|⟧
  found:
    ⟦645 Generate a |y0| or |z0| command in order to reuse a previous appearance of~|w|⟧
  exit:
}
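
The arithmetic properties of the command codes mentioned above can be checked directly. The following Python lines are only an illustration; the numeric opcode values are those of the standard DVI format, and the helper name `reuse_opcode` is hypothetical.

```python
# Opcode values of the standard DVI format.
RIGHT1, W0, W1, X0, X1 = 143, 147, 148, 152, 153
DOWN1, Y0, Y1, Z0, Z1 = 157, 161, 162, 166, 167


def reuse_opcode(o, kind):
    """One-byte command that reuses a register.

    o is down1 or right1; kind is "y" for the y/w register and
    "z" for the z/x register.
    """
    return o + (Y0 if kind == "y" else Z0) - DOWN1
```

Because the offsets agree for both directions, `reuse_opcode(RIGHT1, "y")` is w0 and `reuse_opcode(DOWN1, "z")` is z0, so a single formula serves both stacks.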

644. The info fields in the entries of the down stack or the right stack have six possible settings: y_here or z_here mean that the DVI command refers to y or z, respectively (or to w or x, in the case of horizontal motion); yz_OK means that the DVI command is down (or right) but can be changed to either y or z (or to either w or x); y_OK means that it is down and can be changed to y but not z; z_OK is similar; and d_fixed means it must stay down.

The four settings yz_OK, y_OK, z_OK, d_fixed would not need to be distinguished from each other if we were simply solving the digit-subscripting problem mentioned above. But in TEX’s case there is a complication because of the nested structure of push and pop commands. Suppose we add parentheses to the digit-subscripting problem, redefining hits so that 𝛿𝑦…𝛿𝑦 is a hit if all 𝑦’s between the 𝛿’s are enclosed in properly nested parentheses, and if the parenthesis level of the right-hand 𝛿𝑦 is deeper than or equal to that of the left-hand one. Thus, ‘(’ and ‘)’ correspond to ‘push’ and ‘pop’. Now if we want to assign a subscript to the final 1 in the sequence

2𝑦7𝑑1𝑑(8𝑧2𝑦8𝑧)1
we cannot change the previous 1𝑑 to 1𝑦, since that would invalidate the 2𝑦…2𝑦 hit. But we can change it to 1𝑧, scoring a hit since the intervening 8𝑧’s are enclosed in parentheses.

The program below removes movement nodes that are introduced after a push , before it outputs the corresponding pop .

//  info when the movement entry points to a y command
@define y_here => 1
//  info when the movement entry points to a z command
@define z_here => 2
//  info corresponding to an unconstrained down command
@define yz_OK => 3
//  info corresponding to a down that can't become a z
@define y_OK => 4
//  info corresponding to a down that can't become a y
@define z_OK => 5
//  info corresponding to a down that can't change
@define d_fixed => 6

645. When the movement procedure gets to the label found , the value of info(p) will be either y_here or z_here . If it is, say, y_here , the procedure generates a y0 command (or a w0 command), and marks all info fields between q and p so that y is not OK in that range.

⟦645 Generate a |y0| or |z0| command in order to reuse a previous appearance of~|w|⟧ = ⟦
    info(q) = info(p)

    if (info(q) == y_here) {
        //  y0 or w0 
        dvi_out(o + y0 - down1);
        while (link(q) != p) {
            q = link(q);
            case info(q) {
              yz_OK:
                info(q) = z_OK;
              y_OK:
                info(q) = d_fixed;
              othercases:
                do_nothing;
            }
        }
    } else {
        //  z0 or x0 
        dvi_out(o + z0 - down1);
        while (link(q) != p) {
            q = link(q);
            case info(q) {
              yz_OK:
                info(q) = y_OK;
              z_OK:
                info(q) = d_fixed;
              othercases:
                do_nothing;
            }
        }
    }
⟧
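
The two case statements above amount to a small transition table on the info values. A Python sketch of that table follows; the function name `restrict` is an assumption made for this illustration, while the numeric codes are the ones defined in the previous section.

```python
Y_HERE, Z_HERE, YZ_OK, Y_OK, Z_OK, D_FIXED = 1, 2, 3, 4, 5, 6


def restrict(info, hit):
    """New info value for a node lying strictly between q and p.

    hit is Y_HERE when a y0 (or w0) command was generated and
    Z_HERE when a z0 (or x0) command was generated.
    """
    if hit == Y_HERE:
        # y is no longer OK in this range
        return {YZ_OK: Z_OK, Y_OK: D_FIXED}.get(info, info)
    # z is no longer OK in this range
    return {YZ_OK: Y_OK, Z_OK: D_FIXED}.get(info, info)
```

A node that starts as yz_OK and is spanned first by a y-hit and then by a z-hit therefore ends up as d_fixed, while y_here and z_here nodes are never altered.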

646.

⟦646 Generate a |down| or |right| command for |w| and |return|⟧ = ⟦
    info(q) = yz_OK

    if (abs(w) >= 0x800000) {
        //  down4 or right4 
        dvi_out(o + 3);
        dvi_four(w);
        return;
    }

    if (abs(w) >= 0x8000) {
        //  down3 or right3 
        dvi_out(o + 2);
        if (w < 0) {
            w = w + 0x1000000;
        }
        dvi_out(w div 0x10000);
        w = w % 0x10000;
        goto 2;
    }

    if (abs(w) >= 0x80) {
        //  down2 or right2 
        dvi_out(o + 1);
        if (w < 0) {
            w = w + 0x10000;
        }
        goto 2;
    }

    dvi_out(o) //  down1 or right1 

    if (w < 0) {
        w = w + 0x100;
    }

    goto 1

    2:

    dvi_out(w div 0x100)

    1:

    dvi_out(w % 0x100)

    return
⟧
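
The byte layout produced by this module can be summarized by a short Python sketch. It is a model under stated assumptions rather than a transcription: the name `encode_motion` is hypothetical, w is assumed to fit in 32 bits, and Python's `int.to_bytes` stands in for the explicit additions of powers of 256 that the code above applies to negative values.

```python
def encode_motion(w, o):
    """Bytes for a motion of w scaled points; o is down1 or right1."""
    if abs(w) >= 0x800000:
        n = 4                 # down4 or right4
    elif abs(w) >= 0x8000:
        n = 3                 # down3 or right3
    elif abs(w) >= 0x80:
        n = 2                 # down2 or right2
    else:
        n = 1                 # down1 or right1
    # The parameter is an n-byte two's-complement number, most
    # significant byte first.
    return bytes([o + n - 1]) + w.to_bytes(n, "big", signed=True)
```

Note that the tests use abs(w), so a value such as -128, which would fit in one signed byte, is nevertheless given two parameter bytes.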

647. As we search through the stack, we are in one of three states, y_seen , z_seen , or none_seen , depending on whether we have encountered y_here or z_here nodes. These states are encoded as multiples of 6, so that they can be added to the info fields for quick decision-making.

// no y_here or z_here nodes have been encountered yet
@define none_seen => 0
@define y_seen => 6 // we have seen y_here but not z_here 
// we have seen z_here but not y_here 
@define z_seen => 12
⟦647 Look at the other stack entries until deciding what sort of \.{DVI} command to generate; |goto found| if node |p| is a ``hit''⟧ = ⟦
    p = link(q)

    mstate = none_seen

    while (p != null) {
        if (width(p) == w) {
            ⟦648 Consider a node with matching width; |goto found| if it's a hit⟧
        } else {
            case mstate + info(p) {
              none_seen + y_here:
                mstate = y_seen;
              none_seen + z_here:
                mstate = z_seen;
              y_seen + z_here, z_seen + y_here:
                goto not_found;
              othercases:
                do_nothing;
            }
        }
        p = link(p);
    }

    not_found:
⟧

648. We might find a valid hit in a y or z byte that is already gone from the buffer. But we can’t change bytes that are gone forever; “the moving finger writes, ….”

⟦648 Consider a node with matching width; |goto found| if it's a hit⟧ = ⟦
    case mstate + info(p) {
      none_seen + yz_OK,
      none_seen + y_OK,
      z_seen + yz_OK,
      z_seen + y_OK:
        if (location(p) < dvi_gone) {
            goto not_found;
        } else {
            ⟦649 Change buffered instruction to |y| or |w| and |goto found|⟧
        }
      none_seen + z_OK, y_seen + yz_OK, y_seen + z_OK:
        if (location(p) < dvi_gone) {
            goto not_found;
        } else {
            ⟦650 Change buffered instruction to |z| or |x| and |goto found|⟧
        }
      none_seen + y_here,
      none_seen + z_here,
      y_seen + z_here,
      z_seen + y_here:
        goto found;
      othercases:
        do_nothing;
    }
⟧
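
The reason for encoding the states as multiples of 6 is that every sum mstate + info(p) then identifies a unique (state, info) pair, so one case statement can dispatch on both at once. A small Python sketch makes this explicit; the function `decode` is hypothetical and exists only to demonstrate that the sum can be inverted.

```python
NONE_SEEN, Y_SEEN, Z_SEEN = 0, 6, 12
INFO = {"y_here": 1, "z_here": 2, "yz_OK": 3,
        "y_OK": 4, "z_OK": 5, "d_fixed": 6}


def decode(total):
    """Recover (state name, info name) from mstate + info(p)."""
    state, index = divmod(total - 1, 6)
    state_name = ("none_seen", "y_seen", "z_seen")[state]
    return state_name, list(INFO)[index]
```

For instance, `decode(Y_SEEN + INFO["z_here"])` gives `("y_seen", "z_here")`, one of the combinations that the search above treats as a hit when the widths match.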

649.

⟦649 Change buffered instruction to |y| or |w| and |goto found|⟧ = ⟦
    {
        k = location(p) - dvi_offset;
        if (k < 0) {
            k = k + dvi_buf_size;
        }
        dvi_buf[k] = dvi_buf[k] + y1 - down1;
        info(p) = y_here;
        goto found;
    }
⟧

650.

⟦650 Change buffered instruction to |z| or |x| and |goto found|⟧ = ⟦
    {
        k = location(p) - dvi_offset;
        if (k < 0) {
            k = k + dvi_buf_size;
        }
        dvi_buf[k] = dvi_buf[k] + z1 - down1;
        info(p) = z_here;
        goto found;
    }
⟧

651. In case you are wondering when all the movement nodes are removed from TEX’s memory, the answer is that they are recycled just before hlist_out and vlist_out finish outputting a box. This restores the down and right stacks to the state they were in before the box was output, except that some info ’s may have become more restrictive.

// delete movement nodes with location >= l 
function prune_movements(l: integer) {
    label done, exit;
    var
      p: pointer; // node being deleted
    
    while (down_ptr != null) {
        if (location(down_ptr) < l) {
            goto done;
        }
        p = down_ptr;
        down_ptr = link(p);
        free_node(p, movement_node_size);
    }
  done:
    while (right_ptr != null) {
        if (location(right_ptr) < l) {
            return;
        }
        p = right_ptr;
        right_ptr = link(p);
        free_node(p, movement_node_size);
    }
  exit:
}
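
Since every new entry is pushed with a location at least as large as those below it, pruning only ever removes entries from the top of a stack. A minimal Python model, in which a list plays the role of the linked stack and the name `prune` is an assumption of this sketch:

```python
def prune(stack, l):
    """Delete entries with location >= l; the top is the last element."""
    while stack and stack[-1]["location"] >= l:
        stack.pop()
    return stack
```

Calling `prune` with the byte location recorded on entry to a box thus removes exactly the entries created while that box was being output.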

652. The actual distances by which we want to move might be computed as the sum of several separate movements. For example, there might be several glue nodes in succession, or we might want to move right by the width of some box plus some amount of glue. More importantly, the baselineskip distances are computed in terms of glue together with the depth and height of adjacent boxes, and we want the DVI file to lump these three quantities together into a single motion.

Therefore, TEX maintains two pairs of global variables: dvi_h and dvi_v are the h and v coordinates corresponding to the commands actually output to the DVI file, while cur_h and cur_v are the coordinates corresponding to the current state of the output routines. Coordinate changes will accumulate in cur_h and cur_v without being reflected in the output, until such a change becomes necessary or desirable; we can call the movement procedure whenever we want to make dvi_h == cur_h or dvi_v == cur_v .

The current font reflected in the DVI output is called dvi_f; there is no need for a ‘cur_f’ variable.

The depth of nesting of hlist_out and vlist_out is called cur_s ; this is essentially the depth of push commands in the DVI output.

For mixed direction text (TeX--XeT) the current text direction is called cur_dir. As the box being shipped out will never be used again and will soon be recycled, we can simply reverse any R-text (i.e., right-to-left) segments of hlist nodes as well as complete hlist nodes embedded in such segments. Moreover, this can be done iteratively rather than recursively. There are, however, two complications related to leaders that require some additional bookkeeping: (1) One and the same hlist node might be used more than once (but never inside both L- and R-text); and (2) leader boxes inside hlists must be aligned with respect to the left edge of the original hlist.

A math node is changed into a kern node whenever the text direction remains the same; it is replaced by an edge_node if the text direction changes. The subtype of an hlist_node inside R-text is changed to reversed once its hlist has been reversed.

// subtype for an hlist_node whose hlist has been reversed
@define reversed => 1
// subtype for an hlist_node from display math mode
@define dlist => 2
@define box_lr(#) =>
    (qo(subtype(#))) // direction mode of a box
@define set_box_lr(#) => subtype(#) = set_box_lr_end
@define set_box_lr_end(#) => qi(#)
@define left_to_right => 0
@define right_to_left => 1
@define reflected => 1 - cur_dir // the opposite of cur_dir 
@define synch_h =>
    if (cur_h != dvi_h) {
        movement(cur_h - dvi_h, right1);
        dvi_h = cur_h;
    }
@define synch_v =>
    if (cur_v != dvi_v) {
        movement(cur_v - dvi_v, down1);
        dvi_v = cur_v;
    }
⟦13 Global variables⟧ += ⟦
    // a DVI reader program thinks we are here
    var dvi_h, dvi_v: scaled;

    // TEX thinks we are here
    var cur_h, cur_v: scaled;

    // the current font
    var dvi_f: internal_font_number;

    // current depth of output box nesting, initially -1
    var cur_s: integer;
⟧
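
The division of labor between cur_h and dvi_h can be illustrated with a small Python model. It is a sketch under simplifying assumptions: only the horizontal coordinate is tracked, the class name `Output` and its methods are hypothetical, and a recorded tuple stands in for the call of movement.

```python
class Output:
    def __init__(self):
        self.dvi_h = 0        # where a DVI reader thinks we are
        self.cur_h = 0        # where the output routine thinks we are
        self.commands = []    # recorded (name, amount) pairs

    def move_right(self, amount):
        """Accumulate a coordinate change without producing output."""
        self.cur_h += amount

    def synch_h(self):
        """Emit one combined motion if the two positions differ."""
        if self.cur_h != self.dvi_h:
            self.commands.append(("right", self.cur_h - self.dvi_h))
            self.dvi_h = self.cur_h
```

Three successive calls of `move_right` with 3, 5, and -2 followed by `synch_h` record the single command `("right", 6)`; a second `synch_h` records nothing.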

653.

⟦653 Initialize variables as |ship_out| begins⟧ = ⟦
    dvi_h = 0

    dvi_v = 0

    cur_h = h_offset

    dvi_f = null_font

    ⟦1428 Calculate page dimensions and margins⟧

    ensure_dvi_open

    if (total_pages == 0) {
        // output the preamble
        dvi_out(pre);
        dvi_out(id_byte);
        // conversion ratio for sp
        dvi_four(25400000);
        dvi_four(473628672);
        // magnification factor is frozen
        prepare_mag;
        dvi_four(mag);
        if (output_comment) {
            l = strlen(output_comment);
            dvi_out(l);
            for (s in 0 to l - 1) {
                dvi_out(output_comment[s]);
            }
        } else {
            // the default code is unchanged
            old_setting = selector;
            selector = new_string;
            print(strpool!(" XeTeX output "));
            print_int(year);
            print_char(ord!("."));
            print_two(month);
            print_char(ord!("."));
            print_two(day);
            print_char(ord!(":"));
            print_two(time div 60);
            print_two(time % 60);
            selector = old_setting;
            dvi_out(cur_length);
            for (s in str_start_macro(str_ptr) to 
                pool_ptr
                - 1
            ) {
                dvi_out(so(str_pool[s]));
            }
            // flush the current string
            pool_ptr = str_start_macro(str_ptr);
        }
    }
⟧
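
The two constants written after the identification byte form a fraction that converts scaled points into the DVI format's basic unit of 10^-7 meters. The following Python lines verify the arithmetic, assuming the usual definitions 1 inch = 2.54 cm = 72.27 pt and 1 pt = 65536 sp; the variable names are chosen for this sketch only.

```python
from fractions import Fraction

NUM, DEN = 25400000, 473628672

# One inch is 254000 units of 10^-7 m and also 72.27 * 65536 sp.
units_per_sp = Fraction(254000) / (Fraction(7227, 100) * 65536)
```

Here `units_per_sp` equals `Fraction(NUM, DEN)` exactly, and the denominator is simply 7227 times 65536.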

654. When hlist_out is called, its duty is to output the box represented by the hlist_node pointed to by temp_ptr . The reference point of that box has coordinates (cur_h, cur_v) .

Similarly, when vlist_out is called, its duty is to output the box represented by the vlist_node pointed to by temp_ptr . The reference point of that box has coordinates (cur_h, cur_v) .

//  hlist_out and vlist_out are mutually recursive
forward_declaration vlist_out();

655. The recursive procedures hlist_out and vlist_out each have local variables save_h and save_v to hold the values of dvi_h and dvi_v just before entering a new level of recursion. In effect, the values of save_h and save_v on TEX’s run-time stack correspond to the values of h and v that a DVI-reading program will push onto its coordinate stack.

// go to this label when advancing past glue or a rule
@define move_past => 13
// go to this label to finish processing a rule
@define fin_rule => 14
// go to this label when finished with node p 
@define next_p => 15
@define check_next => 1236
@define end_node_run => 1237
⟦1431 Declare procedures needed in |hlist_out|, |vlist_out|⟧

// output an hlist_node box
function hlist_out() {
    label
        reswitch,
        move_past,
        fin_rule,
        next_p,
        continue,
        found,
        check_next,
        end_node_run;
    var
      base_line: scaled, // the baseline coordinate for this 
      // box
      left_edge: scaled, // the left coordinate for this box
      save_h, save_v: scaled, // what dvi_h and dvi_v should 
      // pop to
      this_box: pointer, // pointer to containing box
      g_order: glue_ord, // applicable order of infinity for 
      // glue
      g_sign: normal .. shrinking, // selects type of glue
      p: pointer, // current position in the hlist
      save_loc: integer, // DVI byte location upon entry
      leader_box: pointer, // the leader box being 
      // replicated
      leader_wd: scaled, // width of leader box being 
      // replicated
      lx: scaled, // extra space between leader boxes
      outer_doing_leaders: boolean, // were we doing 
      // leaders?
      edge: scaled, // right edge of sub-box or leader space
      prev_p: pointer, // one step behind p 
      len: integer, // length of scratch string for native 
      // word output
      q, r: pointer,
      k, j: integer,
      glue_temp: real, // glue value before rounding
      cur_glue: real, // glue seen so far
      cur_g: scaled; // rounded equivalent of cur_glue times 
      // the glue ratio
    
    cur_g = 0;
    cur_glue = float_constant(0);
    this_box = temp_ptr;
    g_order = glue_order(this_box);
    g_sign = glue_sign(this_box);
    if (XeTeX_interword_space_shaping_state > 1) {
        ⟦656 Merge sequences of words using native fonts and inter-word spaces into single nodes⟧
    }
    p = list_ptr(this_box);
    incr(cur_s);
    if (cur_s > 0) {
        dvi_out(push);
    }
    if (cur_s > max_push) {
        max_push = cur_s;
    }
    save_loc = dvi_offset + dvi_ptr;
    base_line = cur_v;
    prev_p = this_box + list_offset;
    ⟦1524 Initialize |hlist_out| for mixed direction typesetting⟧
    left_edge = cur_h;
    ⟦1725 Start hlist {\sl Sync\TeX} information record⟧
    while (p != null) {
        ⟦658 Output node |p| for |hlist_out| and move to the next node, maintaining the condition |cur_v=base_line|⟧
    }
    ⟦1726 Finish hlist {\sl Sync\TeX} information record⟧
    ⟦1525 Finish |hlist_out| for mixed direction typesetting⟧
    prune_movements(save_loc);
    if (cur_s > 0) {
        dvi_pop(save_loc);
    }
    decr(cur_s);
}

656. Extra stuff for justifiable AAT text; need to merge runs of words and normal spaces.

@define is_native_word_node(#) =>
    (
        ((#) != null)
        && (!is_char_node(#))
        && (type(#) == whatsit_node)
        && (is_native_word_subtype(#))
    )
@define is_glyph_node(#) =>
    ((
        (#)
        != null
        && (!is_char_node(#))
        && (type(#) == whatsit_node)
        && (subtype(#) == glyph_node)
    ))
@define node_is_invisible_to_interword_space(#) =>
    !
        is_char_node(#)
        && (
            (type(#) == penalty_node)
            || (type(#) == ins_node)
            || (type(#) == mark_node)
            || (type(#) == adjust_node)
            || (
                (type(#) == whatsit_node)
                && (subtype(#) <= 4)
            )
        ) // This checks for subtypes in the range 
        // open/write/close/special/language, but the 
        // definitions haven't appeared yet in the .web file 
        // so we cheat.
⟦656 Merge sequences of words using native fonts and inter-word spaces into single nodes⟧ = ⟦
    p = list_ptr(this_box)

    prev_p = this_box + list_offset

    while (p != null) {
        if (link(p) != null) {
            // not worth looking ahead at the end
            if (
                is_native_word_node(p)
                && (font_letter_space[native_font(p)] == 0)
            ) {
                // got a word in an AAT font, might be the 
                // start of a run
                //  r is start of possible run
                r = p;
                k = native_length(r);
                q = link(p);
              check_next:
                ⟦657 Advance |q| past ignorable nodes⟧
                if ((q != null) && !is_char_node(q)) {
                    if (
                        (type(q) == glue_node)
                        && (subtype(q) == normal)
                    ) {
                        if ((
                            glue_ptr(q)
                            == font_glue[native_font(r)]
                        )) {
                            // found a normal space; if the 
                            // next node is another word in 
                            // the same font, we'll merge
                            q = link(q);
                            ⟦657 Advance |q| past ignorable nodes⟧
                            if (
                                is_native_word_node(q)
                                && (
                                    native_font(q)
                                    == native_font(r)
                                )
                            ) {
                                // record new tail of run in 
                                // p 
                                p = q;
                                k = k + 1 + native_length(q);
                                q = link(q);
                                goto check_next;
                            }
                        } else {
                            // we'll also merge if 
                            // space-adjustment was applied 
                            // at this glue, even if it 
                            // wasn't the font's standard 
                            // inter-word space
                            q = link(q);
                        }
                        if (
                            (q != null)
                            && !
                                is_char_node(q)
                                && (type(q) == kern_node)
                                && (
                                    subtype(q)
                                    == space_adjustment
                                )
                        ) {
                            q = link(q);
                            ⟦657 Advance |q| past ignorable nodes⟧
                            if (
                                is_native_word_node(q)
                                && (
                                    native_font(q)
                                    == native_font(r)
                                )
                            ) {
                                // record new tail of run in 
                                // p 
                                p = q;
                                k = k + 1 + native_length(q);
                                q = link(q);
                                goto check_next;
                            }
                        }
                        goto end_node_run;
                    }
                    if (
                        is_native_word_node(q)
                        && (native_font(q) == native_font(r))
                    ) {
                        // record new tail of run in p 
                        p = q;
                        q = link(q);
                        goto check_next;
                    }
                }
              end_node_run:
                // now r points to first native_word_node of 
                // the run, and p to the last
                if (p != r) {
                    // merge nodes from r to p inclusive; 
                    // total text length is k 
                    str_room(k);
                    // now we'll use this as accumulator for 
                    // total width
                    k = 0;
                    q = r;
                    loop {
                        if (type(q) == whatsit_node) {
                            if ((is_native_word_subtype(q))) {
                                for (j in 0 to 
                                    native_length(q)
                                    - 1
                                ) {
                                    append_char(
                                      get_native_char(q, j),
                                    );
                                }
                                k = k + width(q);
                            }
                        } else if (type(q) == glue_node) {
                            append_char(ord!(" "));
                            g = glue_ptr(q);
                            k = k + width(g);
                            if (g_sign != normal) {
                                if (g_sign == stretching) {
                                    if (
                                        stretch_order(g)
                                        == g_order
                                    ) {
                                        k = 
                                            k
                                            + round(
                                              
                                                  float(
                                                    glue_set(
                                                      this_box,
                                                    ),
                                                  )
                                                  * stretch(
                                                    g,
                                                  )
                                              ,
                                            )
                                        ;
                                    }
                                } else {
                                    if (
                                        shrink_order(g)
                                        == g_order
                                    ) {
                                        k = 
                                            k
                                            - round(
                                              
                                                  float(
                                                    glue_set(
                                                      this_box,
                                                    ),
                                                  )
                                                  * shrink(
                                                    g,
                                                  )
                                              ,
                                            )
                                        ;
                                    }
                                }
                            }
                        } else if (type(q) == kern_node) {
                            k = k + width(q);
                        }
                        // discretionary and deleted 
                        // nodes can be discarded here
                        if (q == p) {
                            break;
                        } else {
                            q = link(q);
                        }
                    }
                    // create the new merged node q
                    q = new_native_word_node(
                      native_font(r),
                      cur_length,
                    );
                    subtype(q) = subtype(r);
                    for (j in 0 to cur_length - 1) {
                        set_native_char(
                          q,
                          j,
                          str_pool[
                            str_start_macro(str_ptr) + j,
                          ],
                        );
                    }
                    // impose the required width on q, 
                    // and shape its text accordingly
                    width(q) = k;
                    set_justified_native_glyphs(q);
                    // link q into the list in place of r .. p
                    link(prev_p) = q;
                    link(q) = link(p);
                    link(p) = null;
                    // Extract any "invisible" nodes from 
                    // the old list and insert them after 
                    // the new node, so we don't lose them 
                    // altogether. Note that the first node 
                    // cannot be one of these, as we always 
                    // start merging at a native_word node.
                    prev_p = r;
                    p = link(r);
                    while (p != null) {
                        if (node_is_invisible_to_interword_space(
                          p,
                        )) {
                            link(prev_p) = link(p);
                            link(p) = link(q);
                            link(q) = p;
                            q = p;
                        }
                        prev_p = p;
                        p = link(p);
                    }
                    // discard the remains of the old list
                    flush_node_list(r);
                    // flush the temporary string data
                    pool_ptr = str_start_macro(str_ptr);
                    // clean up and prepare for the next round
                    p = q;
                }
            }
            prev_p = p;
        }
        p = link(p);
    }
⟧

657.

⟦657 Advance |q| past ignorable nodes⟧ = ⟦
    while (
        (q != null)
        && node_is_invisible_to_interword_space(q)
    ) {
        q = link(q);
    }
⟧
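
The overall effect of the merging pass can be seen in a much simplified Python model. This is a sketch under strong assumptions, not a transcription: nodes are plain tuples, a "penalty" tuple stands for any node that is invisible to inter-word space, glue stretching, space-adjustment kerns, and directly adjacent words are ignored, and the name `merge_runs` is hypothetical.

```python
def merge_runs(nodes):
    """Merge same-font words separated by a single space.

    nodes holds ("word", font, text, width), ("space", width) or
    ("penalty",) tuples.  Invisible nodes inside a merged run are
    moved so that they follow the merged word.
    """
    out, i = [], 0
    while i < len(nodes):
        if nodes[i][0] != "word":
            out.append(nodes[i])
            i += 1
            continue
        _, font, text, width = nodes[i]
        moved = []                 # invisible nodes carried past the run
        i += 1
        while True:
            j, skipped = i, []
            while j < len(nodes) and nodes[j][0] == "penalty":
                skipped.append(nodes[j])
                j += 1
            if j == len(nodes) or nodes[j][0] != "space":
                break
            space = nodes[j][1]
            j += 1
            while j < len(nodes) and nodes[j][0] == "penalty":
                skipped.append(nodes[j])
                j += 1
            if j == len(nodes) or nodes[j][0] != "word" or nodes[j][1] != font:
                break
            text += " " + nodes[j][2]      # the space becomes a character
            width += space + nodes[j][3]   # and its width joins the total
            moved += skipped
            i = j + 1
        out.append(("word", font, text, width))
        out.extend(moved)
    return out
```

As in the module above, the merged node carries the text of the whole run with one space character per glue node, and its width is the sum of the word widths and the intervening spaces.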

658. We ought to give special care to the efficiency of one part of hlist_out , since it belongs to TEX’s inner loop. When a char_node is encountered, we save a little time by processing several nodes in succession until reaching a non-char_node . The program uses the fact that set_char_0 == 0 .

In MLTEX this part looks for the existence of a substitution definition for a character c, if c does not exist in the font, and creates appropriate DVI commands. Former versions of MLTEX have spliced appropriate character, kern, and box nodes into the horizontal list. Because the user can change character substitutions or \charsubdefmax on the fly, we have to test again for valid substitutions. (Additionally, it is necessary to be careful: if leaders are used, the current hlist is normally traversed more than once!)

⟦658 Output node |p| for |hlist_out| and move to the next node, maintaining the condition |cur_v=base_line|⟧ = ⟦
    reswitch:

    if (is_char_node(p)) {
        synch_h;
        synch_v;
        repeat {
            f = font(p);
            c = character(p);
            if ((p != lig_trick) && (font_mapping[f] != nil)) {
                c = apply_tfm_font_mapping(
                  font_mapping[f],
                  c,
                );
            }
            if (f != dvi_f) {
                ⟦659 Change font |dvi_f| to |f|⟧
            }
            if (font_ec[f] >= qo(c)) {
                if (font_bc[f] <= qo(c)) {
                    // N.B.: not char_info 
                    if (char_exists(orig_char_info(f)(c))) {
                        if (c >= qi(128)) {
                            dvi_out(set1);
                        }
                        dvi_out(qo(c));
                        cur_h = 
                            cur_h
                            + char_width(f)(
                              orig_char_info(f)(c),
                            )
                        ;
                        goto continue;
                    }
                }
            }
            if (mltex_enabled_p) {
                ⟦1695 Output a substitution, |goto continue| if not possible⟧
            }
          continue:
            // N.B.: not prev_p = p , p might be lig_trick 
            prev_p = link(prev_p);
            p = link(p);
        } until (!is_char_node(p));
        ⟦1728 Record current point {\sl Sync\TeX} information⟧
        dvi_h = cur_h;
    } else {
        ⟦660 Output the non-|char_node| |p| for |hlist_out| and move to the next node⟧
    }
⟧
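
The use made of set_char_0 == 0 in the inner loop can be stated in two lines of Python. The sketch below covers only the choice of bytes for a character code below 256; the opcode value of set1 is that of the standard DVI format, and the function name `set_char_bytes` is an assumption.

```python
SET1 = 128   # set_char_0 through set_char_127 are opcodes 0 through 127


def set_char_bytes(c):
    """Bytes that typeset character code c, where 0 <= c < 256."""
    return bytes([c]) if c < 128 else bytes([SET1, c])
```

A character code below 128 is therefore its own command, and only the upper half of the range costs a second byte.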

659.

⟦659 Change font |dvi_f| to |f|⟧ = ⟦
    {
        if (!font_used[f]) {
            dvi_font_def(f);
            font_used[f] = true;
        }
        if (f <= 64 + font_base) {
            dvi_out(f - font_base - 1 + fnt_num_0);
        } else if (f <= 256 + font_base) {
            dvi_out(fnt1);
            dvi_out(f - font_base - 1);
        } else {
            dvi_out(fnt1 + 1);
            dvi_out((f - font_base - 1) div 0x100);
            dvi_out((f - font_base - 1) % 0x100);
        }
        dvi_f = f;
    }
⟧
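
The three branches of the font-changing code select among the one-byte, two-byte, and three-byte forms of the font command. A Python sketch of the byte sequences follows; the opcode values are those of the standard DVI format, while the value 0 for font_base and the function name `font_change_bytes` are assumptions of this illustration.

```python
FNT_NUM_0, FNT1 = 171, 235
FONT_BASE = 0                # assumed smallest internal font number


def font_change_bytes(f):
    """Bytes that select internal font f, where f > FONT_BASE."""
    n = f - FONT_BASE - 1    # font number as it appears in the DVI file
    if f <= 64 + FONT_BASE:
        return bytes([FNT_NUM_0 + n])
    if f <= 256 + FONT_BASE:
        return bytes([FNT1, n])
    return bytes([FNT1 + 1, n // 0x100, n % 0x100])
```

Under these assumptions the first 64 fonts cost a single byte each, which is the common case for ordinary documents.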

660.

⟦660 Output the non-|char_node| |p| for |hlist_out| and move to the next node⟧ = ⟦
    {
        case type(p) {
          hlist_node, vlist_node:
            ⟦661 Output a box in an hlist⟧
          rule_node:
            rule_ht = height(p);
            rule_dp = depth(p);
            rule_wd = width(p);
            goto fin_rule;
          whatsit_node:
            ⟦1430 Output the whatsit node |p| in an hlist⟧
          glue_node:
            ⟦663 Move right or output leaders⟧
          margin_kern_node:
            cur_h = cur_h + width(p);
          kern_node:
            ⟦1730 Record |kern_node| {\sl Sync\TeX} information⟧
            cur_h = cur_h + width(p);
          math_node:
            ⟦1731 Record |math_node| {\sl Sync\TeX} information⟧
            ⟦1526 Handle a math node in |hlist_out|⟧
          ligature_node:
            ⟦692 Make node |p| look like a |char_node| and |goto reswitch|⟧
          ⟦1530 Cases of |hlist_out| that arise in mixed direction text only⟧
          othercases:
            do_nothing;
        }
        goto next_p;
      fin_rule:
        ⟦662 Output a rule in an hlist⟧
      move_past:
        {
            cur_h = cur_h + rule_wd;
            ⟦1729 Record horizontal |rule_node| or |glue_node| {\sl Sync\TeX} information⟧
        }
      next_p:
        prev_p = p;
        p = link(p);
    }
⟧

661.

⟦661 Output a box in an hlist⟧ = ⟦
    if (list_ptr(p) == null) {
        ⟦1727 Record void list {\sl Sync\TeX} information⟧
        cur_h = cur_h + width(p);
    } else {
        save_h = dvi_h;
        save_v = dvi_v;
        // shift the box down
        cur_v = base_line + shift_amount(p);
        temp_ptr = p;
        edge = cur_h + width(p);
        if (cur_dir == right_to_left) {
            cur_h = edge;
        }
        if (type(p) == vlist_node) {
            vlist_out;
        } else {
            hlist_out;
        }
        dvi_h = save_h;
        dvi_v = save_v;
        cur_h = edge;
        cur_v = base_line;
    }
⟧

662.

⟦662 Output a rule in an hlist⟧ = ⟦
    if (is_running(rule_ht)) {
        rule_ht = height(this_box);
    }

    if (is_running(rule_dp)) {
        rule_dp = depth(this_box);
    }

    // this is the rule thickness
    rule_ht = rule_ht + rule_dp

    // we don't output empty rules
    if ((rule_ht > 0) && (rule_wd > 0)) {
        synch_h;
        cur_v = base_line + rule_dp;
        synch_v;
        dvi_out(set_rule);
        dvi_four(rule_ht);
        dvi_four(rule_wd);
        cur_v = base_line;
        dvi_h = dvi_h + rule_wd;
    }
⟧
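
As a rough illustration of the rule logic above, here is a small Python sketch. It assumes that a "running" dimension is marked by the value -2^30 (TeX's null_flag); the function name hlist_rule and its return convention are hypothetical and only serve the example.

```python
NULL_FLAG = -(2 ** 30)  # assumed marker for a "running" dimension


def hlist_rule(rule_ht, rule_dp, rule_wd, box_ht, box_dp):
    """Return (thickness, width, drop) for a rule in an hlist, or None if empty.

    drop is how far below the baseline the rule's reference point sits.
    """
    if rule_ht == NULL_FLAG:
        rule_ht = box_ht  # running height extends to the enclosing box
    if rule_dp == NULL_FLAG:
        rule_dp = box_dp  # running depth extends to the enclosing box
    thickness = rule_ht + rule_dp
    if thickness > 0 and rule_wd > 0:
        return thickness, rule_wd, rule_dp
    return None  # empty rules are not output
```

A rule whose height and depth cancel (for example 10 and -10) has zero thickness and is therefore skipped, although its width still moves cur_h at move_past.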

663.

@define billion => float_constant(1000000000)
@define vet_glue(#) =>
    glue_temp = #;
    if (glue_temp > billion) {
        glue_temp = billion;
    } else if (glue_temp < -billion) {
        glue_temp = -billion;
    }
@define round_glue =>
    g = glue_ptr(p);
    rule_wd = width(g) - cur_g;
    if (g_sign != normal) {
        if (g_sign == stretching) {
            if (stretch_order(g) == g_order) {
                cur_glue = cur_glue + stretch(g);
                vet_glue(
                  float(glue_set(this_box)) * cur_glue,
                );
                cur_g = round(glue_temp);
            }
        } else if (shrink_order(g) == g_order) {
            cur_glue = cur_glue - shrink(g);
            vet_glue(float(glue_set(this_box)) * cur_glue);
            cur_g = round(glue_temp);
        }
    }
    rule_wd = rule_wd + cur_g
⟦663 Move right or output leaders⟧ = ⟦
    {
        round_glue;
        if (eTeX_ex) {
            ⟦1509 Handle a glue node for mixed direction typesetting⟧
        }
        if (subtype(p) >= a_leaders) {
            ⟦664 Output leaders in an hlist, |goto fin_rule| if a rule or to |next_p| if done⟧
        }
        goto move_past;
    }
⟧
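
The point of round_glue is that rounding is applied to the running total of stretch, not to each glue separately, so rounding errors do not accumulate across a box. The following Python sketch shows the stretching case only. It assumes that every glue in the list stretches at the box's glue order, and it models Pascal's round as rounding halves away from zero; the names pascal_round and stretched_widths are hypothetical.

```python
import math

BILLION = 1_000_000_000.0


def pascal_round(x: float) -> int:
    """Round to nearest, halves away from zero (assumed Pascal behaviour)."""
    return int(math.floor(x + 0.5)) if x >= 0 else -int(math.floor(-x + 0.5))


def stretched_widths(glues, glue_set):
    """glues: list of (natural_width, stretch) pairs; returns the width used for each."""
    cur_glue = 0.0  # total stretch seen so far
    cur_g = 0       # rounded value of glue_set * cur_glue
    widths = []
    for natural, stretch in glues:
        width = natural - cur_g          # remove the previous rounded total
        cur_glue += stretch
        vetted = max(-BILLION, min(BILLION, glue_set * cur_glue))
        cur_g = pascal_round(vetted)
        widths.append(width + cur_g)     # add the new rounded total
    return widths
```

With three glues of natural width 10 and stretch 1 and a glue ratio of 0.5, the widths come out as 11, 10, 11. Their sum is 32, the natural total plus the rounded total stretch, whereas rounding each glue on its own would give 33.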

664.

⟦664 Output leaders in an hlist, |goto fin_rule| if a rule or to |next_p| if done⟧ = ⟦
    {
        leader_box = leader_ptr(p);
        if (type(leader_box) == rule_node) {
            rule_ht = height(leader_box);
            rule_dp = depth(leader_box);
            goto fin_rule;
        }
        leader_wd = width(leader_box);
        if ((leader_wd > 0) && (rule_wd > 0)) {
            // compensate for floating-point rounding
            rule_wd = rule_wd + 10;
            if (cur_dir == right_to_left) {
                cur_h = cur_h - 10;
            }
            edge = cur_h + rule_wd;
            lx = 0;
            ⟦665 Let |cur_h| be the position of the first box, and set |leader_wd+lx| to the spacing between corresponding parts of boxes⟧
            while (cur_h + leader_wd <= edge) {
                ⟦666 Output a leader box at |cur_h|, then advance |cur_h| by |leader_wd+lx|⟧
            }
            if (cur_dir == right_to_left) {
                cur_h = edge;
            } else {
                cur_h = edge - 10;
            }
            goto next_p;
        }
    }
⟧

665. The calculations related to leaders require a bit of care. First, in the case of a_leaders (aligned leaders), we want to move cur_h to left_edge plus the smallest multiple of leader_wd for which the result is not less than the current value of cur_h; i.e., cur_h should become 𝑙𝑒𝑓𝑡_𝑒𝑑𝑔𝑒 + 𝑙𝑒𝑎𝑑𝑒𝑟_𝑤𝑑 × ⌈(𝑐𝑢𝑟_ℎ − 𝑙𝑒𝑓𝑡_𝑒𝑑𝑔𝑒)/𝑙𝑒𝑎𝑑𝑒𝑟_𝑤𝑑⌉. The program here should work in all cases even though some implementations of Pascal give nonstandard results for the div operation when cur_h is less than left_edge.

In the case of c_leaders (centered leaders), we want to increase cur_h by half of the excess space not occupied by the leaders; and in the case of x_leaders (expanded leaders) we increase cur_h by 1/(𝑞+1) of this excess space, where 𝑞 is the number of times the leader box will be replicated. Slight inaccuracies in the division might accumulate; half of this rounding error is placed at each end of the leaders.

⟦665 Let |cur_h| be the position of the first box, and set |leader_wd+lx| to the spacing between corresponding parts of boxes⟧ = ⟦
    if (subtype(p) == a_leaders) {
        save_h = cur_h;
        cur_h = 
            left_edge
            + leader_wd
            * ((cur_h - left_edge) div leader_wd)
        ;
        if (cur_h < save_h) {
            cur_h = cur_h + leader_wd;
        }
    } else {
        // the number of box copies
        lq = rule_wd div leader_wd;
        // the remaining space
        lr = rule_wd % leader_wd;
        if (subtype(p) == c_leaders) {
            cur_h = cur_h + (lr div 2);
        } else {
            lx = lr div (lq + 1);
            cur_h = cur_h + ((lr - (lq - 1) * lx) div 2);
        }
    }
⟧
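
The three leader cases can be tried out with a short Python sketch. It is an illustration under stated assumptions, not the program's code: trunc_div models a div that truncates toward zero (Python's own // floors instead), the kinds "a", "c" and "x" stand for a_leaders, c_leaders and x_leaders, and rule_wd is taken as given, without the 10sp compensation added by the caller.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division truncating toward zero (hypothetical helper)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def first_leader_position(kind, cur_h, left_edge, rule_wd, leader_wd):
    """Return (cur_h, lx) for kind in {"a", "c", "x"}; leader_wd must be positive."""
    lx = 0
    if kind == "a":
        save_h = cur_h
        cur_h = left_edge + leader_wd * trunc_div(cur_h - left_edge, leader_wd)
        if cur_h < save_h:
            cur_h += leader_wd           # round up to the next multiple
    else:
        lq = trunc_div(rule_wd, leader_wd)  # the number of box copies
        lr = rule_wd - lq * leader_wd       # the remaining space
        if kind == "c":
            cur_h += trunc_div(lr, 2)
        else:
            lx = trunc_div(lr, lq + 1)
            cur_h += trunc_div(lr - (lq - 1) * lx, 2)
    return cur_h, lx
```

For example, aligned leaders of width 10 starting from cur_h = 25 with left_edge = 0 begin at 30. The correction step after the division is what makes the result independent of how div treats a negative dividend.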

666. The ‘synch’ operations here (synch_h and synch_v) are intended to decrease the number of bytes needed to specify horizontal and vertical motion in the DVI output.

⟦666 Output a leader box at |cur_h|, then advance |cur_h| by |leader_wd+lx|⟧ = ⟦
    {
        cur_v = base_line + shift_amount(leader_box);
        synch_v;
        save_v = dvi_v;
        synch_h;
        save_h = dvi_h;
        temp_ptr = leader_box;
        if (cur_dir == right_to_left) {
            cur_h = cur_h + leader_wd;
        }
        outer_doing_leaders = doing_leaders;
        doing_leaders = true;
        if (type(leader_box) == vlist_node) {
            vlist_out;
        } else {
            hlist_out;
        }
        doing_leaders = outer_doing_leaders;
        dvi_v = save_v;
        dvi_h = save_h;
        cur_v = base_line;
        cur_h = save_h + leader_wd + lx;
    }
⟧

667. The vlist_out routine is similar to hlist_out, but a bit simpler.

// output a vlist_node box
function vlist_out() {
    label move_past, fin_rule, next_p;
    var
      left_edge: scaled, // the left coordinate for this box
      top_edge: scaled, // the top coordinate for this box
      save_h, save_v: scaled, // what dvi_h and dvi_v should 
      // pop to
      this_box: pointer, // pointer to containing box
      g_order: glue_ord, // applicable order of infinity for 
      // glue
      g_sign: normal .. shrinking, // selects type of glue
      p: pointer, // current position in the vlist
      save_loc: integer, // \.{DVI} byte location upon entry
      leader_box: pointer, // the leader box being 
      // replicated
      leader_ht: scaled, // height of leader box being 
      // replicated
      lx: scaled, // extra space between leader boxes
      outer_doing_leaders: boolean, // were we doing 
      // leaders?
      edge: scaled, // bottom boundary of leader space
      glue_temp: real, // glue value before rounding
      cur_glue: real, // glue seen so far
      cur_g: scaled, // rounded equivalent of cur_glue times 
      // the glue ratio
      upwards: boolean; // whether we're stacking upwards
    
    cur_g = 0;
    cur_glue = float_constant(0);
    this_box = temp_ptr;
    g_order = glue_order(this_box);
    g_sign = glue_sign(this_box);
    p = list_ptr(this_box);
    upwards = (subtype(this_box) == min_quarterword + 1);
    incr(cur_s);
    if (cur_s > 0) {
        dvi_out(push);
    }
    if (cur_s > max_push) {
        max_push = cur_s;
    }
    save_loc = dvi_offset + dvi_ptr;
    left_edge = cur_h;
    ⟦1723 Start vlist {\sl Sync\TeX} information record⟧
    if (upwards) {
        cur_v = cur_v + depth(this_box);
    } else {
        cur_v = cur_v - height(this_box);
    }
    top_edge = cur_v;
    while (p != null) {
        ⟦668 Output node |p| for |vlist_out| and move to the next node, maintaining the condition |cur_h=left_edge|⟧
    }
    ⟦1724 Finish vlist {\sl Sync\TeX} information record⟧
    prune_movements(save_loc);
    if (cur_s > 0) {
        dvi_pop(save_loc);
    }
    decr(cur_s);
}

668.

⟦668 Output node |p| for |vlist_out| and move to the next node, maintaining the condition |cur_h=left_edge|⟧ = ⟦
    {
        if (is_char_node(p)) {
            confusion(strpool!("vlistout"));
        } else {
            ⟦669 Output the non-|char_node| |p| for |vlist_out|⟧
        }
      next_p:
        p = link(p);
    }
⟧

669.

⟦669 Output the non-|char_node| |p| for |vlist_out|⟧ = ⟦
    {
        case type(p) {
          hlist_node, vlist_node:
            ⟦670 Output a box in a vlist⟧
          rule_node:
            rule_ht = height(p);
            rule_dp = depth(p);
            rule_wd = width(p);
            goto fin_rule;
          whatsit_node:
            ⟦1426 Output the whatsit node |p| in a vlist⟧
          glue_node:
            ⟦672 Move down or output leaders⟧
          kern_node:
            if (upwards) {
                cur_v = cur_v - width(p);
            } else {
                cur_v = cur_v + width(p);
            }
          othercases:
            do_nothing;
        }
        goto next_p;
      fin_rule:
        ⟦671 Output a rule in a vlist, |goto next_p|⟧
      move_past:
        if (upwards) {
            cur_v = cur_v - rule_ht;
        } else {
            cur_v = cur_v + rule_ht;
        }
    }
⟧

670. The synch_v here allows the DVI output to use one-byte commands for adjusting v in most cases, since the baselineskip distance will usually be constant.

⟦670 Output a box in a vlist⟧ = ⟦
    if (list_ptr(p) == null) {
        if (upwards) {
            cur_v = cur_v - depth(p);
        } else {
            cur_v = cur_v + height(p);
        }
        ⟦1727 Record void list {\sl Sync\TeX} information⟧
        if (upwards) {
            cur_v = cur_v - height(p);
        } else {
            cur_v = cur_v + depth(p);
        }
    } else {
        if (upwards) {
            cur_v = cur_v - depth(p);
        } else {
            cur_v = cur_v + height(p);
        }
        synch_v;
        save_h = dvi_h;
        save_v = dvi_v;
        if (cur_dir == right_to_left) {
            cur_h = left_edge - shift_amount(p);
        } else {
            // shift the box right
            cur_h = left_edge + shift_amount(p);
        }
        temp_ptr = p;
        if (type(p) == vlist_node) {
            vlist_out;
        } else {
            hlist_out;
        }
        dvi_h = save_h;
        dvi_v = save_v;
        if (upwards) {
            cur_v = save_v - height(p);
        } else {
            cur_v = save_v + depth(p);
        }
        cur_h = left_edge;
    }
⟧
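
The vertical bookkeeping for boxes and kerns in a vlist can be summarized in a few lines of Python. This is only a sketch: the tuple encoding of the list items and the name vlist_baselines are assumptions made for the example, and glue, rules, and leaders are left out.

```python
def vlist_baselines(top, items, upwards=False):
    """items: ("box", height, depth) or ("kern", amount) entries.

    Returns the vertical position of each box's baseline, starting at top.
    """
    cur_v = top
    baselines = []
    sign = -1 if upwards else 1
    for item in items:
        if item[0] == "box":
            _, height, depth = item
            # going down we cross the height first; going up, the depth
            first, second = (depth, height) if upwards else (height, depth)
            cur_v += sign * first
            baselines.append(cur_v)      # this is where the box is output
            cur_v += sign * second
        else:
            cur_v += sign * item[1]
    return baselines
```

In downward mode a box of height 10 and depth 2 at the top of the list has its baseline at 10, and the next item starts at 12.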

671.

⟦671 Output a rule in a vlist, |goto next_p|⟧ = ⟦
    if (is_running(rule_wd)) {
        rule_wd = width(this_box);
    }

    // this is the rule thickness
    rule_ht = rule_ht + rule_dp

    if (upwards) {
        cur_v = cur_v - rule_ht;
    } else {
        cur_v = cur_v + rule_ht;
    }

    // we don't output empty rules
    if ((rule_ht > 0) && (rule_wd > 0)) {
        if (cur_dir == right_to_left) {
            cur_h = cur_h - rule_wd;
        }
        synch_h;
        synch_v;
        dvi_out(put_rule);
        dvi_four(rule_ht);
        dvi_four(rule_wd);
        cur_h = left_edge;
    }

    goto next_p
⟧

672.

⟦672 Move down or output leaders⟧ = ⟦
    {
        g = glue_ptr(p);
        rule_ht = width(g) - cur_g;
        if (g_sign != normal) {
            if (g_sign == stretching) {
                if (stretch_order(g) == g_order) {
                    cur_glue = cur_glue + stretch(g);
                    vet_glue(
                      float(glue_set(this_box)) * cur_glue,
                    );
                    cur_g = round(glue_temp);
                }
            } else if (shrink_order(g) == g_order) {
                cur_glue = cur_glue - shrink(g);
                vet_glue(
                  float(glue_set(this_box)) * cur_glue,
                );
                cur_g = round(glue_temp);
            }
        }
        rule_ht = rule_ht + cur_g;
        if (subtype(p) >= a_leaders) {
            ⟦673 Output leaders in a vlist, |goto fin_rule| if a rule or to |next_p| if done⟧
        }
        goto move_past;
    }
⟧

673.

⟦673 Output leaders in a vlist, |goto fin_rule| if a rule or to |next_p| if done⟧ = ⟦
    {
        leader_box = leader_ptr(p);
        if (type(leader_box) == rule_node) {
            rule_wd = width(leader_box);
            rule_dp = 0;
            goto fin_rule;
        }
        leader_ht = height(leader_box) + depth(leader_box);
        if ((leader_ht > 0) && (rule_ht > 0)) {
            // compensate for floating-point rounding
            rule_ht = rule_ht + 10;
            edge = cur_v + rule_ht;
            lx = 0;
            ⟦674 Let |cur_v| be the position of the first box, and set |leader_ht+lx| to the spacing between corresponding parts of boxes⟧
            while (cur_v + leader_ht <= edge) {
                ⟦675 Output a leader box at |cur_v|, then advance |cur_v| by |leader_ht+lx|⟧
            }
            cur_v = edge - 10;
            goto next_p;
        }
    }
⟧

674.

⟦674 Let |cur_v| be the position of the first box, and set |leader_ht+lx| to the spacing between corresponding parts of boxes⟧ = ⟦
    if (subtype(p) == a_leaders) {
        save_v = cur_v;
        cur_v = 
            top_edge
            + leader_ht * ((cur_v - top_edge) div leader_ht)
        ;
        if (cur_v < save_v) {
            cur_v = cur_v + leader_ht;
        }
    } else {
        // the number of box copies
        lq = rule_ht div leader_ht;
        // the remaining space
        lr = rule_ht % leader_ht;
        if (subtype(p) == c_leaders) {
            cur_v = cur_v + (lr div 2);
        } else {
            lx = lr div (lq + 1);
            cur_v = cur_v + ((lr - (lq - 1) * lx) div 2);
        }
    }
⟧

675. When we reach this part of the program, cur_v indicates the top of a leader box, not its baseline.

⟦675 Output a leader box at |cur_v|, then advance |cur_v| by |leader_ht+lx|⟧ = ⟦
    {
        if (cur_dir == right_to_left) {
            cur_h = left_edge - shift_amount(leader_box);
        } else {
            cur_h = left_edge + shift_amount(leader_box);
        }
        synch_h;
        save_h = dvi_h;
        cur_v = cur_v + height(leader_box);
        synch_v;
        save_v = dvi_v;
        temp_ptr = leader_box;
        outer_doing_leaders = doing_leaders;
        doing_leaders = true;
        if (type(leader_box) == vlist_node) {
            vlist_out;
        } else {
            hlist_out;
        }
        doing_leaders = outer_doing_leaders;
        dvi_v = save_v;
        dvi_h = save_h;
        cur_h = left_edge;
        cur_v = save_v - height(leader_box) + leader_ht + lx;
    }
⟧

676. The hlist_out and vlist_out procedures are now complete, so we are ready for the ship_out routine that gets them started in the first place.

// output the box p 
function ship_out(p: pointer) {
    label done;
    var
      page_loc: integer, // location of the current bop 
      j, k: 0 .. 9, // indices to first ten count registers
      s: pool_pointer, // index into str_pool 
      old_setting: 0 .. max_selector; // saved selector 
      // setting
    
    ⟦1721 Start sheet {\sl Sync\TeX} information record⟧
    {
        if (job_name == 0) {
            open_log_file;
        }
        if (tracing_output > 0) {
            print_nl(strpool!(""));
            print_ln;
            print(
              strpool!("Completed box being shipped out"),
            );
        }
        if (term_offset > max_print_line - 9) {
            print_ln;
        } else if ((term_offset > 0) || (file_offset > 0)) {
            print_char(ord!(" "));
        }
        print_char(ord!("["));
        j = 9;
        while ((count(j) == 0) && (j > 0)) {
            decr(j);
        }
        for (k in 0 to j) {
            print_int(count(k));
            if (k < j) {
                print_char(ord!("."));
            }
        }
        update_terminal;
        if (tracing_output > 0) {
            print_char(ord!("]"));
            begin_diagnostic;
            show_box(p);
            end_diagnostic(true);
        }
        ⟦678 Ship box |p| out⟧
        if (eTeX_ex) {
            ⟦1541 Check for LR anomalies at the end of |ship_out|⟧
        }
        if (tracing_output <= 0) {
            print_char(ord!("]"));
        }
        dead_cycles = 0;
        // progress report
        update_terminal;
        ⟦677 Flush the box from memory, showing statistics if requested⟧
    }
    ⟦1722 Finish sheet {\sl Sync\TeX} information record⟧
}
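
The page label printed after the opening bracket drops trailing zero counts but always shows the first one. A minimal Python sketch of that loop follows; the function name page_label and the list argument standing for the ten count registers are hypothetical.

```python
def page_label(counts):
    """Format ten count values the way the progress report does."""
    j = 9
    while counts[j] == 0 and j > 0:
        j -= 1                           # skip trailing zeros, keep counts[0]
    return ".".join(str(c) for c in counts[: j + 1])
```

Zeros in the middle are kept, so counts of 1, 0, 3 followed by zeros are reported as 1.0.3.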

677.

⟦677 Flush the box from memory, showing statistics if requested⟧ = ⟦
    stat!{
        if (tracing_stats > 1) {
            print_nl(strpool!("Memory usage before: "));
            print_int(var_used);
            print_char(ord!("&"));
            print_int(dyn_used);
            print_char(ord!(";"));
        }
    }

    flush_node_list(p)

    stat!{
        if (tracing_stats > 1) {
            print(strpool!(" after: "));
            print_int(var_used);
            print_char(ord!("&"));
            print_int(dyn_used);
            print(strpool!("; still untouched: "));
            print_int(hi_mem_min - lo_mem_max - 1);
            print_ln;
        }
    }
⟧

678.

⟦678 Ship box |p| out⟧ = ⟦
    ⟦679 Update the values of |max_h| and |max_v|; but if the page is too large, |goto done|⟧

    ⟦653 Initialize variables as |ship_out| begins⟧

    page_loc = dvi_offset + dvi_ptr

    dvi_out(bop)

    for (k in 0 to 9) {
        dvi_four(count(k));
    }

    dvi_four(last_bop)

    last_bop = page_loc

    // generate a pagesize special at start of page
    old_setting = selector

    selector = new_string

    print(strpool!("pdf:pagesize "))

    if ((pdf_page_width > 0) && (pdf_page_height > 0)) {
        print(strpool!("width"));
        print(ord!(" "));
        print_scaled(pdf_page_width);
        print(strpool!("pt"));
        print(ord!(" "));
        print(strpool!("height"));
        print(ord!(" "));
        print_scaled(pdf_page_height);
        print(strpool!("pt"));
    } else {
        print(strpool!("default"));
    }

    selector = old_setting

    dvi_out(xxx1)

    dvi_out(cur_length)

    for (s in str_start_macro(str_ptr) to pool_ptr - 1) {
        dvi_out(so(str_pool[s]));
    }

    pool_ptr = str_start_macro(str_ptr) // erase the string

    // does this need changing for upwards mode ????
    cur_v = height(p) + v_offset

    temp_ptr = p

    if (type(p) == vlist_node) {
        vlist_out;
    } else {
        hlist_out;
    }

    dvi_out(eop)

    incr(total_pages)

    cur_s = -1

    if (!no_pdf_output) {
        fflush(dvi_file);
    }

    ifdef("IPC")

    if (ipc_on > 0) {
        if (dvi_limit == half_buf) {
            write_dvi(half_buf, dvi_buf_size - 1);
            flush_dvi;
            dvi_gone = dvi_gone + half_buf;
        }
        if (dvi_ptr > (0x7fffffff - dvi_offset)) {
            cur_s = -2;
            fatal_error(
              strpool!("dvi length exceeds \"7FFFFFFF"),
            );
        }
        if (dvi_ptr > 0) {
            write_dvi(0, dvi_ptr - 1);
            flush_dvi;
            dvi_offset = dvi_offset + dvi_ptr;
            dvi_gone = dvi_gone + dvi_ptr;
        }
        dvi_ptr = 0;
        dvi_limit = dvi_buf_size;
        ipc_page(dvi_gone);
    }

    endif("IPC")

    done:
⟧
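
The pagesize special built above is an ordinary xxx1 command whose payload is a short text string. The Python sketch below reproduces the string and the framing bytes. It is an approximation under assumptions: print_scaled is written from memory of TeX's routine of the same name, the function pagesize_special is hypothetical, and 239 is the standard DVI opcode for xxx1.

```python
UNITY = 65536  # scaled points per point


def print_scaled(s: int) -> str:
    """Decimal rendering of a scaled value, modeled on TeX's print_scaled."""
    out = ""
    if s < 0:
        out, s = "-", -s
    out += str(s // UNITY) + "."
    s = 10 * (s % UNITY) + 5
    delta = 10
    while True:
        if delta > UNITY:
            s += 0x8000 - 50000          # round the last digit
        out += str(s // UNITY)
        s = 10 * (s % UNITY)
        delta *= 10
        if s <= delta:
            return out


def pagesize_special(page_width: int, page_height: int) -> bytes:
    """Return the xxx1 command carrying the pdf:pagesize special."""
    text = "pdf:pagesize "
    if page_width > 0 and page_height > 0:
        text += f"width {print_scaled(page_width)}pt height {print_scaled(page_height)}pt"
    else:
        text += "default"
    payload = text.encode("ascii")
    if len(payload) >= 256:
        raise ValueError("xxx1 has a one-byte length field")
    return bytes([239, len(payload)]) + payload
```

Because the length is emitted with a single dvi_out, the string must stay under 256 bytes, which the fixed wording of the special guarantees in practice.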

679. Sometimes the user will generate a huge page because other error messages are being ignored. Such pages are not output to the dvi file, since they may confuse the printing software.

⟦679 Update the values of |max_h| and |max_v|; but if the page is too large, |goto done|⟧ = ⟦
    if (
        (height(p) > max_dimen)
        || (depth(p) > max_dimen)
        || (height(p) + depth(p) + v_offset > max_dimen)
        || (width(p) + h_offset > max_dimen)
    ) {
        print_err(
          strpool!("Huge page cannot be shipped out"),
        );
        help2(
          strpool!("The page just created is more than 18 feet tall or"),
        )(
          strpool!("more than 18 feet wide, so I suspect something went wrong."),
        );
        error;
        if (tracing_output <= 0) {
            begin_diagnostic;
            print_nl(
              strpool!("The following box has been deleted:"),
            );
            show_box(p);
            end_diagnostic(true);
        }
        goto done;
    }

    if (height(p) + depth(p) + v_offset > max_v) {
        max_v = height(p) + depth(p) + v_offset;
    }

    if (width(p) + h_offset > max_h) {
        max_h = width(p) + h_offset;
    }
⟧
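
The size test can be checked numerically. In the Python sketch below, MAX_DIMEN is 2^30 - 1 scaled points (TeX's max_dimen), which is a little under 19 feet at 72.27 points per inch; the function name check_page and its tuple result are assumptions for the example.

```python
MAX_DIMEN = 2 ** 30 - 1  # largest legal dimension, in scaled points


def check_page(height, depth, width, h_offset, v_offset, max_h, max_v):
    """Return (ok, max_h, max_v); ok is False for a page too large to ship."""
    total_v = height + depth + v_offset
    total_h = width + h_offset
    if (height > MAX_DIMEN or depth > MAX_DIMEN
            or total_v > MAX_DIMEN or total_h > MAX_DIMEN):
        return False, max_h, max_v       # huge page: maxima are left alone
    return True, max(max_h, total_h), max(max_v, total_v)
```

Note that the offsets count toward the limit, so a page that fits on its own can still be rejected once h_offset or v_offset is added.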

680. At the end of the program, we must finish things off by writing the postamble. If total_pages == 0, the DVI file was never opened. If total_pages >= 65536, the DVI file will lie. And if max_push >= 65536, the user deserves whatever chaos might ensue.

An integer variable k will be declared for use by this routine.

⟦680 Finish the \.{DVI} file⟧ = ⟦
    while (cur_s > -1) {
        if (cur_s > 0) {
            dvi_out(pop);
        } else {
            dvi_out(eop);
            incr(total_pages);
        }
        decr(cur_s);
    }

    if (total_pages == 0) {
        print_nl(strpool!("No pages of output."));
    } else if (cur_s != -2) {
        // beginning of the postamble
        dvi_out(post);
        dvi_four(last_bop);
        //  post location
        last_bop = dvi_offset + dvi_ptr - 5;
        dvi_four(25400000);
        // conversion ratio for sp
        dvi_four(473628672);
        prepare_mag;
        // magnification factor
        dvi_four(mag);
        dvi_four(max_v);
        dvi_four(max_h);
        dvi_out(max_push div 256);
        dvi_out(max_push % 256);
        dvi_out((total_pages div 256) % 256);
        dvi_out(total_pages % 256);
        ⟦681 Output the font definitions for all fonts that were used⟧
        dvi_out(post_post);
        dvi_four(last_bop);
        dvi_out(id_byte);
        ifdef("IPC");
        // the number of 223's
        k = 7 - ((3 + dvi_offset + dvi_ptr) % 4);
        endif("IPC");
        ifndef("IPC");
        // the number of 223's
        k = 4 + ((dvi_buf_size - dvi_ptr) % 4);
        endifn("IPC");
        while (k > 0) {
            dvi_out(223);
            decr(k);
        }
        ⟦635 Empty the last bytes out of |dvi_buf|⟧
        k = dvi_close(dvi_file);
        if (k == 0) {
            print_nl(strpool!("Output written on "));
            print(output_file_name);
            print(strpool!(" ("));
            print_int(total_pages);
            if (total_pages != 1) {
                print(strpool!(" pages"));
            } else {
                print(strpool!(" page"));
            }
            if (no_pdf_output) {
                print(strpool!(", "));
                print_int(dvi_offset + dvi_ptr);
                print(strpool!(" bytes)."));
            } else {
                print(strpool!(")."));
            }
        } else {
            print_nl(strpool!("Error "));
            print_int(k);
            print(strpool!(" ("));
            if (no_pdf_output) {
                print_c_string(strerror(k));
            } else {
                print(strpool!("driver return code"));
            }
            print(strpool!(") generating output;"));
            print_nl(strpool!("file "));
            print(output_file_name);
            print(strpool!(" may not be valid."));
            history = output_failure;
        }
    }
⟧
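
Both formulas for k produce between four and seven bytes of 223 and leave the file length a multiple of four. The Python sketch below checks this. The function names are hypothetical, and the buffered variant assumes that dvi_buf_size and the number of bytes already written are multiples of 4, so that only dvi_ptr matters.

```python
def pad_count(total_length: int) -> int:
    """Number of 223 bytes, following the IPC variant of the formula."""
    return 7 - ((3 + total_length) % 4)


def pad_count_buffered(dvi_buf_size: int, dvi_ptr: int) -> int:
    """Non-IPC variant; valid under the multiple-of-4 assumption stated above."""
    return 4 + ((dvi_buf_size - dvi_ptr) % 4)


def two_bytes(n: int) -> bytes:
    """High and low byte of n modulo 65536, as written for total_pages."""
    return bytes([(n // 256) % 256, n % 256])
```

The last helper shows why the file "will lie" for large documents: a run of 65537 pages is recorded as 1 page in the postamble.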

681.

⟦681 Output the font definitions for all fonts that were used⟧ = ⟦
    while (font_ptr > font_base) {
        if (font_used[font_ptr]) {
            dvi_font_def(font_ptr);
        }
        decr(font_ptr);
    }
⟧

682. [32b] pdfTEX output low-level subroutines (equivalents).

⟦13 Global variables⟧ += ⟦
    var epochseconds: integer;

    var microseconds: integer;
⟧

683. [33] Packaging. We’re essentially done with the parts of TEX that are concerned with the input (get_next) and the output (ship_out). So it’s time to get heavily into the remaining part, which does the real work of typesetting.

After lists are constructed, TEX wraps them up and puts them into boxes. Two major subroutines are given the responsibility for this task: hpack applies to horizontal lists (hlists) and vpack applies to vertical lists (vlists). The main duty of hpack and vpack is to compute the dimensions of the resulting boxes, and to adjust the glue if one of those dimensions is pre-specified. The computed sizes normally enclose all of the material inside the new box; but some items may stick out if negative glue is used, if the box is overfull, or if a \vbox includes other boxes that have been shifted left.

The subroutine call hpack(p, w, m) returns a pointer to an hlist_node for a box containing the hlist that starts at p. Parameter w specifies a width; and parameter m is either ‘exactly’ or ‘additional’. Thus, hpack(p, w, exactly) produces a box whose width is exactly w, while hpack(p, w, additional) yields a box whose width is the natural width plus w. It is convenient to define a macro called ‘natural’ to cover the most common case, so that we can say hpack(p, natural) to get a box that has the natural width of list p.

Similarly, vpack(p, w, m) returns a pointer to a vlist_node for a box containing the vlist that starts at p. In this case w represents a height instead of a width; the parameter m is interpreted as in hpack.

@define exactly => 0 // a box dimension is pre-specified
// a box dimension is increased from the natural one
@define additional => 1
// shorthand for parameters to hpack and vpack 
@define natural => 0, additional
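
The effect of the two modes on the final box size can be stated in a few lines. The Python sketch below is illustrative only; target_size is a hypothetical helper, and its default arguments play the role of the ‘natural’ shorthand (0, additional).

```python
EXACTLY = 0     # a box dimension is pre-specified
ADDITIONAL = 1  # a box dimension is increased from the natural one


def target_size(natural_size: int, w: int = 0, m: int = ADDITIONAL) -> int:
    """Size the packed box should have for parameters (w, m)."""
    if m == EXACTLY:
        return w
    return natural_size + w
```

The difference between the target and the natural size is the amount that the glue must stretch (if positive) or shrink (if negative).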

684. The parameters to hpack and vpack correspond to TEX’s primitives like ‘\hbox to 300pt’, ‘\hbox spread 10pt’; note that ‘\hbox’ with no dimension following it is equivalent to ‘\hbox spread 0pt’. The scan_spec subroutine scans such constructions in the user’s input, including the mandatory left brace that follows them, and it puts the specification onto save_stack so that the desired box can later be obtained by executing the following code:

    save_ptr = save_ptr - 2;
    hpack(p, saved(1), saved(0)).

Special care is necessary to ensure that the special save_stack codes are placed just below the new group code, because scanning can change save_stack when \csname appears.

// scans a box specification and left brace
function scan_spec(c: group_code, three_codes: boolean) {
    label found;
    var
      s: integer, // temporarily saved value
      spec_code: exactly .. additional;
    
    if (three_codes) {
        s = saved(0);
    }
    if (scan_keyword(strpool!("to"))) {
        spec_code = exactly;
    } else if (scan_keyword(strpool!("spread"))) {
        spec_code = additional;
    } else {
        spec_code = additional;
        cur_val = 0;
        goto found;
    }
    scan_normal_dimen;
  found:
    if (three_codes) {
        saved(0) = s;
        incr(save_ptr);
    }
    saved(0) = spec_code;
    saved(1) = cur_val;
    save_ptr = save_ptr + 2;
    new_save_level(c);
    scan_left_brace;
}
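
The push and pop discipline around scan_spec can be imitated with an ordinary list. The Python sketch below is a toy model under several assumptions: the input is a pre-tokenized list such as ["to", 300], the three_codes case and the new save level are omitted, and pop_spec is a hypothetical name for the two statements that later recover the arguments of hpack.

```python
EXACTLY, ADDITIONAL = 0, 1


def scan_spec(words, save_stack):
    """Push spec_code and the dimension for 'to', 'spread', or no keyword."""
    if words[:1] == ["to"]:
        spec_code, value = EXACTLY, words[1]
    elif words[:1] == ["spread"]:
        spec_code, value = ADDITIONAL, words[1]
    else:
        spec_code, value = ADDITIONAL, 0   # plain box: spread 0pt
    save_stack.append(spec_code)  # becomes saved(0) after the pop
    save_stack.append(value)      # becomes saved(1) after the pop


def pop_spec(save_stack):
    """Undo the two pushes and return (w, m), the arguments for hpack."""
    value = save_stack.pop()
    spec_code = save_stack.pop()
    return value, spec_code
```

Whatever was on the stack before the specification is untouched once the two entries have been popped again.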

685. To figure out the glue setting, hpack and vpack determine how much stretchability and shrinkability are present, considering all four orders of infinity. The highest order of infinity that has a nonzero coefficient is then used as if no other orders were present.

For example, suppose that the given list contains six glue nodes with the respective stretchabilities 3pt, 8fill, 5fil, 6pt, −3fil, −8fill. Then the total is essentially 2fil; and if a total additional space of 6pt is to be achieved by stretching, the actual amounts of stretch will be 0pt, 0pt, 15pt, 0pt, −9pt, and 0pt, since only ‘fil’ glue will be considered. (The ‘fill’ glue is therefore not really stretching infinitely with respect to ‘fil’; nobody would actually want that to happen.)

The arrays total_stretch and total_shrink are used to determine how much glue of each kind is present. A global variable last_badness is used to implement \badness.

⟦13 Global variables⟧ += ⟦
    // glue found by hpack or vpack 
    var total_stretch,
      total_shrink: array [glue_ord] of scaled;

    // badness of the most recently packaged box
    var last_badness: integer;
⟧

686. If the global variable adjust_tail is non-null, the hpack routine also removes all occurrences of ins_node, mark_node, and adjust_node items and appends the resulting material onto the list that ends at location adjust_tail.

⟦13 Global variables⟧ += ⟦
    // tail of adjustment list
    var adjust_tail: pointer;
⟧

687.

⟦23 Set initial values of key variables⟧ += ⟦
    adjust_tail = null

    last_badness = 0
⟧

688. Some stuff for character protrusion.

@define left_pw(#) => char_pw(#, left_side)
@define right_pw(#) => char_pw(#, right_side)
function char_pw(p: pointer, side: small_number): scaled {
    var f: internal_font_number, c: integer;
    
    char_pw = 0;
    if (side == left_side) {
        last_leftmost_char = null;
    } else {
        last_rightmost_char = null;
    }
    if (p == null) {
        return;
    }
    // native word
    if (is_native_word_node(p)) {
        if (native_glyph_info_ptr(p) != null_ptr) {
            f = native_font(p);
            char_pw = round_xn_over_d(
              quad(f),
              get_native_word_cp(p, side),
              1000,
            );
        }
        return;
    }
    // glyph node
    if (is_glyph_node(p)) {
        f = native_font(p);
        char_pw = round_xn_over_d(
          quad(f),
          get_cp_code(f, native_glyph(p), side),
          1000,
        );
        return;
    }
    // char node or ligature; same as in pdfTeX
    if (!is_char_node(p)) {
        if (type(p) == ligature_node) {
            p = lig_char(p);
        } else {
            return;
        }
    }
    f = font(p);
    c = get_cp_code(f, character(p), side);
    case side {
      left_side:
        last_leftmost_char = p;
      right_side:
        last_rightmost_char = p;
    }
    if (c == 0) {
        return;
    }
    char_pw = round_xn_over_d(quad(f), c, 1000);
}

function new_margin_kern(
  w: scaled,
  p: pointer,
  side: small_number,
): pointer {
    var k: pointer;
    
    k = get_node(margin_kern_node_size);
    type(k) = margin_kern_node;
    subtype(k) = side;
    width(k) = w;
    new_margin_kern = k;
}
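
In every branch of char_pw the protrusion is the font's quad multiplied by a code in thousandths and divided by 1000. The Python sketch below shows that arithmetic. It assumes that round_xn_over_d rounds to the nearest integer with halves away from zero; round_div and protrusion_width are hypothetical names used only here.

```python
def round_div(numerator: int, denominator: int) -> int:
    """Integer division rounded to nearest, halves away from zero.

    The denominator must be positive.
    """
    q, r = divmod(abs(numerator), denominator)
    if 2 * r >= denominator:
        q += 1
    return q if numerator >= 0 else -q


def protrusion_width(quad: int, cp_code: int) -> int:
    """Protrusion in scaled points for a code given in thousandths of quad."""
    return round_div(quad * cp_code, 1000)
```

With a quad of 10pt (655360sp) and a code of 50, the character may protrude by 32768sp, that is, half a point.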

689. Here now is hpack, which contains few if any surprises.

function hpack(
  p: pointer,
  w: scaled,
  m: small_number,
): pointer {
    label reswitch, common_ending, exit, restart;
    var
      r: pointer, // the box node that will be returned
      q: pointer, // trails behind p 
      h, d, x: scaled, // height, depth, and natural width
      s: scaled, // shift amount
      g: pointer, // points to a glue specification
      o: glue_ord, // order of infinity
      f: internal_font_number, // the font in a char_node 
      i: four_quarters, // font information about a 
      // char_node 
      hd: eight_bits, // height and depth indices for a 
      // character
      pp, ppp: pointer,
      total_chars, k: integer;
    
    last_badness = 0;
    r = get_node(box_node_size);
    type(r) = hlist_node;
    subtype(r) = min_quarterword;
    shift_amount(r) = 0;
    q = r + list_offset;
    link(q) = p;
    h = 0;
    ⟦690 Clear dimensions to zero⟧
    if (TeXXeT_en) {
        ⟦1520 Initialize the LR stack⟧
    }
    while (p != null) {
        ⟦691 Examine node |p| in the hlist, taking account of its effect on the dimensions of the new box, or moving it to the adjustment list; then advance |p| to the next node⟧
    }
    if (adjust_tail != null) {
        link(adjust_tail) = null;
    }
    if (pre_adjust_tail != null) {
        link(pre_adjust_tail) = null;
    }
    height(r) = h;
    depth(r) = d;
    ⟦699 Determine the value of |width(r)| and the appropriate glue setting; then |return| or |goto common_ending|⟧
  common_ending:
    ⟦705 Finish issuing a diagnostic message for an overfull or underfull hbox⟧
  exit:
    if (TeXXeT_en) {
        ⟦1522 Check for LR anomalies at the end of |hpack|⟧
    }
    hpack = r;
}

690.

⟦690 Clear dimensions to zero⟧ = ⟦
    d = 0

    x = 0

    total_stretch[normal] = 0

    total_shrink[normal] = 0

    total_stretch[fil] = 0

    total_shrink[fil] = 0

    total_stretch[fill] = 0

    total_shrink[fill] = 0

    total_stretch[filll] = 0

    total_shrink[filll] = 0
⟧

691.

⟦691 Examine node |p| in the hlist, taking account of its effect on the dimensions of the new box, or moving it to the adjustment list; then advance |p| to the next node⟧ = ⟦
    {
        
      reswitch:
        while (is_char_node(p)) {
            ⟦694 Incorporate character dimensions into the dimensions of the hbox that will contain~it, then move to the next node⟧
        }
        if (p != null) {
            case type(p) {
              hlist_node, vlist_node, rule_node, unset_node:
                ⟦693 Incorporate box dimensions into the dimensions of the hbox that will contain~it⟧
              ins_node, mark_node, adjust_node:
                if (
                    (adjust_tail != null)
                    || (pre_adjust_tail != null)
                ) {
                    ⟦697 Transfer node |p| to the adjustment list⟧
                }
              whatsit_node:
                ⟦1420 Incorporate a whatsit node into an hbox⟧
              glue_node:
                ⟦698 Incorporate glue into the horizontal totals⟧
              kern_node:
                x = x + width(p);
              margin_kern_node:
                x = x + width(p);
              math_node:
                x = x + width(p);
                if (TeXXeT_en) {
                    ⟦1521 Adjust \(t)the LR stack for the |hpack| routine⟧
                }
              ligature_node:
                ⟦692 Make node |p| look like a |char_node| and |goto reswitch|⟧
              othercases:
                do_nothing;
            }
            p = link(p);
        }
    }
⟧

692.

⟦692 Make node |p| look like a |char_node| and |goto reswitch|⟧ = ⟦
    {
        mem[lig_trick] = mem[lig_char(p)];
        link(lig_trick) = link(p);
        p = lig_trick;
        xtx_ligature_present = true;
        goto reswitch;
    }
⟧

693. The code here implicitly uses the fact that running dimensions are indicated by null_flag, which will be ignored in the calculations because it is a highly negative number.

⟦693 Incorporate box dimensions into the dimensions of the hbox that will contain~it⟧ = ⟦
    {
        x = x + width(p);
        if (type(p) >= rule_node) {
            s = 0;
        } else {
            s = shift_amount(p);
        }
        if (height(p) - s > h) {
            h = height(p) - s;
        }
        if (depth(p) + s > d) {
            d = depth(p) + s;
        }
    }
⟧
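
The update in section 693 can be sketched in Python as follows. This is only an illustrative model: the function name `add_box_to_hbox`, the plain-integer arguments, and the value used for `NULL_FLAG` (minus 2^30) are assumptions made for the example. Callers are expected to pass `shift=0` for rules, mirroring the `type(p) >= rule_node` test.

```python
NULL_FLAG = -(2 ** 30)  # assumed value of null_flag, marking a running dimension


def add_box_to_hbox(x, h, d, width, height, depth, shift=0):
    """Return the updated (x, h, d) after a box or rule joins an hbox.

    x is the natural width so far; h and d are the largest height and
    depth seen so far. A positive shift moves the box downward.
    """
    x += width
    h = max(h, height - shift)
    d = max(d, depth + shift)
    return x, h, d
```

Because `NULL_FLAG` is far below any real dimension, a rule with running height and depth never wins either `max` comparison, which is the property the section relies on.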

694. The following code is part of TEX’s inner loop; i.e., adding another character of text to the user’s input will cause each of these instructions to be exercised one more time.

⟦694 Incorporate character dimensions into the dimensions of the hbox that will contain~it, then move to the next node⟧ = ⟦
    {
        f = font(p);
        i = char_info(f)(character(p));
        hd = height_depth(i);
        x = x + char_width(f)(i);
        s = char_height(f)(hd);
        if (s > h) {
            h = s;
        }
        s = char_depth(f)(hd);
        if (s > d) {
            d = s;
        }
        p = link(p);
    }
⟧

695. Although node q is not necessarily the immediate predecessor of node p, it always points to some node in the list preceding p. Thus, we can delete nodes by moving q when necessary. The algorithm takes linear time, and the extra computation does not intrude on the inner loop unless it is necessary to make a deletion.

⟦13 Global variables⟧ += ⟦
    var pre_adjust_tail: pointer;
⟧

696.

⟦23 Set initial values of key variables⟧ += ⟦
    pre_adjust_tail = null
⟧

697. Material in a \vadjust used with the pre keyword is appended to pre_adjust_tail instead of adjust_tail.

@define update_adjust_list(#) =>
    {
        if (# == null) {
            confusion(strpool!("pre vadjust"));
        }
        link(#) = adjust_ptr(p);
        while (link(#) != null) {
            # = link(#);
        }
    }
⟦697 Transfer node |p| to the adjustment list⟧ = ⟦
    {
        while (link(q) != p) {
            q = link(q);
        }
        if (type(p) == adjust_node) {
            if (adjust_pre(p) != 0) {
                update_adjust_list(pre_adjust_tail);
            } else {
                update_adjust_list(adjust_tail);
            }
            p = link(p);
            free_node(link(q), small_node_size);
        } else {
            link(adjust_tail) = p;
            adjust_tail = p;
            p = link(p);
        }
        link(q) = p;
        p = q;
    }
⟧

698.

⟦698 Incorporate glue into the horizontal totals⟧ = ⟦
    {
        g = glue_ptr(p);
        x = x + width(g);
        o = stretch_order(g);
        total_stretch[o] = total_stretch[o] + stretch(g);
        o = shrink_order(g);
        total_shrink[o] = total_shrink[o] + shrink(g);
        if (subtype(p) >= a_leaders) {
            g = leader_ptr(p);
            if (height(g) > h) {
                h = height(g);
            }
            if (depth(g) > d) {
                d = depth(g);
            }
        }
    }
⟧
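
As a rough illustration of section 698, the following Python sketch accumulates glue into per-order totals. The dictionary layout of a glue specification, the function name `add_glue`, and the numbering of the glue orders are assumptions made for the example; the leader handling at the end of the section is left out.

```python
NORMAL, FIL, FILL, FILLL = range(4)  # assumed numbering of the glue orders


def add_glue(x, total_stretch, total_shrink, glue):
    """Add one glue specification to the running totals; return the new x."""
    x += glue["width"]
    total_stretch[glue["stretch_order"]] += glue["stretch"]
    total_shrink[glue["shrink_order"]] += glue["shrink"]
    return x


total_stretch = [0, 0, 0, 0]
total_shrink = [0, 0, 0, 0]
# An ordinary interword space: finite stretch and shrink.
x = add_glue(0, total_stretch, total_shrink,
             {"width": 5, "stretch": 3, "stretch_order": NORMAL,
              "shrink": 1, "shrink_order": NORMAL})
# Something like \hfil: zero width, first-order infinite stretch.
x = add_glue(x, total_stretch, total_shrink,
             {"width": 0, "stretch": 65536, "stretch_order": FIL,
              "shrink": 0, "shrink_order": NORMAL})
```

Keeping one total per order is what later lets the packager pick the highest order that is present and ignore all lower ones.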

699. When we get to the present part of the program, x is the natural width of the box being packaged.

⟦699 Determine the value of |width(r)| and the appropriate glue setting; then |return| or |goto common_ending|⟧ = ⟦
    if (m == additional) {
        w = x + w;
    }

    width(r) = w

    x = w - x // now x is the excess to be made up

    if (x == 0) {
        glue_sign(r) = normal;
        glue_order(r) = normal;
        set_glue_ratio_zero(glue_set(r));
        return;
    } else if (x > 0) {
        ⟦700 Determine horizontal glue stretch setting, then |return| or \hbox{|goto common_ending|}⟧
    } else {
        ⟦706 Determine horizontal glue shrink setting, then |return| or \hbox{|goto common_ending|}⟧
    }
⟧

700.

⟦700 Determine horizontal glue stretch setting, then |return| or \hbox{|goto common_ending|}⟧ = ⟦
    {
        ⟦701 Determine the stretch order⟧
        glue_order(r) = o;
        glue_sign(r) = stretching;
        if (total_stretch[o] != 0) {
            glue_set(r) = unfloat(x / total_stretch[o]);
        } else {
            glue_sign(r) = normal;
            // there's nothing to stretch
            set_glue_ratio_zero(glue_set(r));
        }
        if (o == normal) {
            if (list_ptr(r) != null) {
                ⟦702 Report an underfull hbox and |goto common_ending|, if this box is sufficiently bad⟧
            }
        }
        return;
    }
⟧

701.

⟦701 Determine the stretch order⟧ = ⟦
    if (total_stretch[filll] != 0) {
        o = filll;
    } else if (total_stretch[fill] != 0) {
        o = fill;
    } else if (total_stretch[fil] != 0) {
        o = fil;
    } else {
        o = normal;
    }
⟧
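
Sections 700 and 701 together choose the highest order of infinity that has nonzero stretch and then compute the glue ratio. A minimal Python sketch, assuming the same four-element totals list as the program and using hypothetical names (`stretch_setting`, string values for the glue sign):

```python
NORMAL, FIL, FILL, FILLL = range(4)  # assumed numbering of the glue orders


def stretch_setting(excess, total_stretch):
    """Return (order, sign, ratio) for a positive excess width."""
    order = NORMAL
    for o in (FILLL, FILL, FIL):  # highest order first
        if total_stretch[o] != 0:
            order = o
            break
    if total_stretch[order] != 0:
        return order, "stretching", excess / total_stretch[order]
    # Nothing can stretch, so the glue is left at its natural size.
    return order, "normal", 0.0
```

The shrink case in sections 706 and 707 has the same shape, with `total_shrink` and `-x` in place of `total_stretch` and `x`.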

702.

⟦702 Report an underfull hbox and |goto common_ending|, if this box is sufficiently bad⟧ = ⟦
    {
        last_badness = badness(x, total_stretch[normal]);
        if (last_badness > hbadness) {
            print_ln;
            if (last_badness > 100) {
                print_nl(strpool!("Underfull"));
            } else {
                print_nl(strpool!("Loose"));
            }
            print(strpool!(" \\hbox (badness "));
            print_int(last_badness);
            goto common_ending;
        }
    }
⟧

703. In order to provide a decent indication of where an overfull or underfull box originated, we use a global variable pack_begin_line that is set nonzero only when hpack is being called by the paragraph builder or the alignment finishing routine.

⟦13 Global variables⟧ += ⟦
    // source file line where the current paragraph or 
    // alignment began; a negative value denotes alignment
    var pack_begin_line: integer;
⟧

704.

⟦23 Set initial values of key variables⟧ += ⟦
    pack_begin_line = 0
⟧

705.

⟦705 Finish issuing a diagnostic message for an overfull or underfull hbox⟧ = ⟦
    if (output_active) {
        print(
          strpool!(") has occurred while \\output is active"),
        );
    } else {
        if (pack_begin_line != 0) {
            if (pack_begin_line > 0) {
                print(strpool!(") in paragraph at lines "));
            } else {
                print(strpool!(") in alignment at lines "));
            }
            print_int(abs(pack_begin_line));
            print(strpool!("--"));
        } else {
            print(strpool!(") detected at line "));
        }
        print_int(line);
    }

    print_ln

    font_in_short_display = null_font

    short_display(list_ptr(r))

    print_ln

    begin_diagnostic

    show_box(r)

    end_diagnostic(true)
⟧
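
The way the sign of pack_begin_line selects the wording in section 705 can be sketched as a small Python function. The function name and the idea of returning the text as a string (rather than printing it piece by piece) are assumptions made for the example.

```python
def diagnostic_location(output_active, pack_begin_line, line):
    """Return the tail of an overfull/underfull hbox message."""
    if output_active:
        return ") has occurred while \\output is active"
    if pack_begin_line > 0:
        return f") in paragraph at lines {pack_begin_line}--{line}"
    if pack_begin_line < 0:
        # A negative value denotes an alignment; print its magnitude.
        return f") in alignment at lines {-pack_begin_line}--{line}"
    return f") detected at line {line}"
```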

706.

⟦706 Determine horizontal glue shrink setting, then |return| or \hbox{|goto common_ending|}⟧ = ⟦
    {
        ⟦707 Determine the shrink order⟧
        glue_order(r) = o;
        glue_sign(r) = shrinking;
        if (total_shrink[o] != 0) {
            glue_set(r) = unfloat((-x) / total_shrink[o]);
        } else {
            glue_sign(r) = normal;
            // there's nothing to shrink
            set_glue_ratio_zero(glue_set(r));
        }
        if (
            (total_shrink[o] < -x)
            && (o == normal) && (list_ptr(r) != null)
        ) {
            last_badness = 1000000;
            // use the maximum shrinkage
            set_glue_ratio_one(glue_set(r));
            ⟦708 Report an overfull hbox and |goto common_ending|, if this box is sufficiently bad⟧
        } else if (o == normal) {
            if (list_ptr(r) != null) {
                ⟦709 Report a tight hbox and |goto common_ending|, if this box is sufficiently bad⟧
            }
        }
        return;
    }
⟧

707.

⟦707 Determine the shrink order⟧ = ⟦
    if (total_shrink[filll] != 0) {
        o = filll;
    } else if (total_shrink[fill] != 0) {
        o = fill;
    } else if (total_shrink[fil] != 0) {
        o = fil;
    } else {
        o = normal;
    }
⟧

708.

⟦708 Report an overfull hbox and |goto common_ending|, if this box is sufficiently bad⟧ = ⟦
    if (
        (-x - total_shrink[normal] > hfuzz)
        || (hbadness < 100)
    ) {
        if (
            (overfull_rule > 0)
            && (-x - total_shrink[normal] > hfuzz)
        ) {
            while (link(q) != null) {
                q = link(q);
            }
            link(q) = new_rule;
            width(link(q)) = overfull_rule;
        }
        print_ln;
        print_nl(strpool!("Overfull \\hbox ("));
        print_scaled(-x - total_shrink[normal]);
        print(strpool!("pt too wide"));
        goto common_ending;
    }
⟧
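
The two tests in section 708 are easy to confuse, so here is a hedged Python sketch of just the decision logic. The function name and the returned pair of booleans are hypothetical; `x` is the (negative) excess, as in the program.

```python
def overfull_decision(x, shrink_normal, hfuzz, hbadness, overfull_rule):
    """Return (report, add_rule) for an hbox that cannot shrink enough."""
    overshoot = -x - shrink_normal  # amount by which the box is too wide
    report = overshoot > hfuzz or hbadness < 100
    add_rule = report and overfull_rule > 0 and overshoot > hfuzz
    return report, add_rule
```

Note that a small hbadness can force a report even when the overshoot is within hfuzz, but the marker rule is appended only when the overshoot itself exceeds hfuzz.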

709.

⟦709 Report a tight hbox and |goto common_ending|, if this box is sufficiently bad⟧ = ⟦
    {
        last_badness = badness(-x, total_shrink[normal]);
        if (last_badness > hbadness) {
            print_ln;
            print_nl(strpool!("Tight \\hbox (badness "));
            print_int(last_badness);
            goto common_ending;
        }
    }
⟧

710. The vpack subroutine is actually a special case of a slightly more general routine called vpackage, which has four parameters. The fourth parameter, which is max_dimen in the case of vpack, specifies the maximum depth of the page box that is constructed. The depth is first computed by the normal rules; if it exceeds this limit, the reference point is simply moved down until the limiting depth is attained.

// special case of unconstrained depth
@define vpack(#) => vpackage(#, max_dimen)
function vpackage(
  p: pointer,
  h: scaled,
  m: small_number,
  l: scaled,
): pointer {
    label common_ending, exit;
    var
      r: pointer, // the box node that will be returned
      w, d, x: scaled, // width, depth, and natural height
      s: scaled, // shift amount
      g: pointer, // points to a glue specification
      o: glue_ord; // order of infinity
    
    last_badness = 0;
    r = get_node(box_node_size);
    type(r) = vlist_node;
    if (XeTeX_upwards) {
        subtype(r) = min_quarterword + 1;
    } else {
        subtype(r) = min_quarterword;
    }
    shift_amount(r) = 0;
    list_ptr(r) = p;
    w = 0;
    ⟦690 Clear dimensions to zero⟧
    while (p != null) {
        ⟦711 Examine node |p| in the vlist, taking account of its effect on the dimensions of the new box; then advance |p| to the next node⟧
    }
    width(r) = w;
    if (d > l) {
        x = x + d - l;
        depth(r) = l;
    } else {
        depth(r) = d;
    }
    ⟦714 Determine the value of |height(r)| and the appropriate glue setting; then |return| or |goto common_ending|⟧
  common_ending:
    ⟦717 Finish issuing a diagnostic message for an overfull or underfull vbox⟧
  exit:
    vpackage = r;
}
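
The depth limit applied near the end of vpackage can be sketched in Python as below. The function name `clamp_depth` and the value given for `MAX_DIMEN` (2^30 - 1) are assumptions made for the example.

```python
MAX_DIMEN = 2 ** 30 - 1  # assumed value of max_dimen, used by vpack


def clamp_depth(x, d, limit=MAX_DIMEN):
    """Return (natural_height, depth) after enforcing the depth limit.

    Any depth beyond the limit is moved into the height, which is the
    same as moving the reference point down.
    """
    if d > limit:
        return x + d - limit, limit
    return x, d
```

With the default limit the function leaves its arguments unchanged, which corresponds to the unconstrained vpack case.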

711.

⟦711 Examine node |p| in the vlist, taking account of its effect on the dimensions of the new box; then advance |p| to the next node⟧ = ⟦
    {
        if (is_char_node(p)) {
            confusion(strpool!("vpack"));
        } else {
            case type(p) {
              hlist_node, vlist_node, rule_node, unset_node:
                ⟦712 Incorporate box dimensions into the dimensions of the vbox that will contain~it⟧
              whatsit_node:
                ⟦1419 Incorporate a whatsit node into a vbox⟧
              glue_node:
                ⟦713 Incorporate glue into the vertical totals⟧
              kern_node:
                x = x + d + width(p);
                d = 0;
              othercases:
                do_nothing;
            }
        }
        p = link(p);
    }
⟧

712.

⟦712 Incorporate box dimensions into the dimensions of the vbox that will contain~it⟧ = ⟦
    {
        x = x + d + height(p);
        d = depth(p);
        if (type(p) >= rule_node) {
            s = 0;
        } else {
            s = shift_amount(p);
        }
        if (width(p) + s > w) {
            w = width(p) + s;
        }
    }
⟧

713.

⟦713 Incorporate glue into the vertical totals⟧ = ⟦
    {
        x = x + d;
        d = 0;
        g = glue_ptr(p);
        x = x + width(g);
        o = stretch_order(g);
        total_stretch[o] = total_stretch[o] + stretch(g);
        o = shrink_order(g);
        total_shrink[o] = total_shrink[o] + shrink(g);
        if (subtype(p) >= a_leaders) {
            g = leader_ptr(p);
            if (width(g) > w) {
                w = width(g);
            }
        }
    }
⟧
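
Sections 711 to 713 keep the depth of the most recent box in d and fold it into the height only when something follows. A simplified Python sketch, assuming a hypothetical tuple encoding of vlist items and ignoring widths, shifts, and glue stretch or shrink:

```python
def vlist_natural_size(items):
    """Return (natural_height, depth) of a vertical list.

    Each item is ("box", height, depth), ("glue", width) or
    ("kern", width); this encoding is an assumption of the sketch.
    """
    x = d = 0
    for item in items:
        if item[0] == "box":
            _, height, depth = item
            x += d + height  # previous depth now counts as height
            d = depth
        else:
            x += d + item[1]
            d = 0  # glue and kerns have no depth
    return x, d
```

The depth of the last box is therefore reported separately, while the depth of every earlier box contributes to the natural height.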

714. When we get to the present part of the program, x is the natural height of the box being packaged.

⟦714 Determine the value of |height(r)| and the appropriate glue setting; then |return| or |goto common_ending|⟧ = ⟦
    if (m == additional) {
        h = x + h;
    }

    height(r) = h

    x = h - x // now x is the excess to be made up

    if (x == 0) {
        glue_sign(r) = normal;
        glue_order(r) = normal;
        set_glue_ratio_zero(glue_set(r));
        return;
    } else if (x > 0) {
        ⟦715 Determine vertical glue stretch setting, then |return| or \hbox{|goto common_ending|}⟧
    } else {
        ⟦718 Determine vertical glue shrink setting, then |return| or \hbox{|goto common_ending|}⟧
    }
⟧

715.

⟦715 Determine vertical glue stretch setting, then |return| or \hbox{|goto common_ending|}⟧ = ⟦
    {
        ⟦701 Determine the stretch order⟧
        glue_order(r) = o;
        glue_sign(r) = stretching;
        if (total_stretch[o] != 0) {
            glue_set(r) = unfloat(x / total_stretch[o]);
        } else {
            glue_sign(r) = normal;
            // there's nothing to stretch
            set_glue_ratio_zero(glue_set(r));
        }
        if (o == normal) {
            if (list_ptr(r) != null) {
                ⟦716 Report an underfull vbox and |goto common_ending|, if this box is sufficiently bad⟧
            }
        }
        return;
    }
⟧

716.

⟦716 Report an underfull vbox and |goto common_ending|, if this box is sufficiently bad⟧ = ⟦
    {
        last_badness = badness(x, total_stretch[normal]);
        if (last_badness > vbadness) {
            print_ln;
            if (last_badness > 100) {
                print_nl(strpool!("Underfull"));
            } else {
                print_nl(strpool!("Loose"));
            }
            print(strpool!(" \\vbox (badness "));
            print_int(last_badness);
            goto common_ending;
        }
    }
⟧

717.

⟦717 Finish issuing a diagnostic message for an overfull or underfull vbox⟧ = ⟦
    if (output_active) {
        print(
          strpool!(") has occurred while \\output is active"),
        );
    } else {
        // it's actually negative
        if (pack_begin_line != 0) {
            print(strpool!(") in alignment at lines "));
            print_int(abs(pack_begin_line));
            print(strpool!("--"));
        } else {
            print(strpool!(") detected at line "));
        }
        print_int(line);
        print_ln;
    }

    begin_diagnostic

    show_box(r)

    end_diagnostic(true)
⟧

718.

⟦718 Determine vertical glue shrink setting, then |return| or \hbox{|goto common_ending|}⟧ = ⟦
    {
        ⟦707 Determine the shrink order⟧
        glue_order(r) = o;
        glue_sign(r) = shrinking;
        if (total_shrink[o] != 0) {
            glue_set(r) = unfloat((-x) / total_shrink[o]);
        } else {
            glue_sign(r) = normal;
            // there's nothing to shrink
            set_glue_ratio_zero(glue_set(r));
        }
        if (
            (total_shrink[o] < -x)
            && (o == normal) && (list_ptr(r) != null)
        ) {
            last_badness = 1000000;
            // use the maximum shrinkage
            set_glue_ratio_one(glue_set(r));
            ⟦719 Report an overfull vbox and |goto common_ending|, if this box is sufficiently bad⟧
        } else if (o == normal) {
            if (list_ptr(r) != null) {
                ⟦720 Report a tight vbox and |goto common_ending|, if this box is sufficiently bad⟧
            }
        }
        return;
    }
⟧

719.

⟦719 Report an overfull vbox and |goto common_ending|, if this box is sufficiently bad⟧ = ⟦
    if (
        (-x - total_shrink[normal] > vfuzz)
        || (vbadness < 100)
    ) {
        print_ln;
        print_nl(strpool!("Overfull \\vbox ("));
        print_scaled(-x - total_shrink[normal]);
        print(strpool!("pt too high"));
        goto common_ending;
    }
⟧

720.

⟦720 Report a tight vbox and |goto common_ending|, if this box is sufficiently bad⟧ = ⟦
    {
        last_badness = badness(-x, total_shrink[normal]);
        if (last_badness > vbadness) {
            print_ln;
            print_nl(strpool!("Tight \\vbox (badness "));
            print_int(last_badness);
            goto common_ending;
        }
    }
⟧

721. When a box is being appended to the current vertical list, the baselineskip calculation is handled by the append_to_vlist routine.

function append_to_vlist(b: pointer) {
    var
      d: scaled, // deficiency of space between baselines
      p: pointer, // a new glue node
      upwards: boolean;
    
    upwards = XeTeX_upwards;
    if (prev_depth > ignore_depth) {
        if (upwards) {
            d = width(baseline_skip) - prev_depth - depth(b);
        } else {
            d = 
                width(baseline_skip)
                - prev_depth - height(b)
            ;
        }
        if (d < line_skip_limit) {
            p = new_param_glue(line_skip_code);
        } else {
            p = new_skip_param(baseline_skip_code);
            // temp_ptr == glue_ptr(p)
            width(temp_ptr) = d;
        }
        link(tail) = p;
        tail = p;
    }
    link(tail) = b;
    tail = b;
    if (upwards) {
        prev_depth = height(b);
    } else {
        prev_depth = depth(b);
    }
}
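
The choice between baselineskip and lineskip glue made by append_to_vlist can be sketched as follows. The function name, the tuple return value, and the numeric value of `IGNORE_DEPTH` (minus 1000pt in scaled points) are assumptions made for the example; `next_extent` stands for height(b) in the normal case and depth(b) when XeTeX_upwards is set.

```python
IGNORE_DEPTH = -65536000  # assumed value of ignore_depth


def interline_glue(baseline_skip, line_skip_limit, prev_depth, next_extent):
    """Return None, ("lineskip", None) or ("baselineskip", width)."""
    if prev_depth <= IGNORE_DEPTH:
        return None  # no interline glue is inserted at all
    d = baseline_skip - prev_depth - next_extent
    if d < line_skip_limit:
        return ("lineskip", None)  # boxes would be too close together
    return ("baselineskip", d)
```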

722. [34] Data structures for math mode. When TEX reads a formula that is enclosed between $’s, it constructs an mlist, which is essentially a tree structure representing that formula. An mlist is a linear sequence of items, but we can regard it as a tree structure because mlists can appear within mlists. For example, many of the entries can be subscripted or superscripted, and such “scripts” are mlists in their own right.

An entire formula is parsed into such a tree before any of the actual typesetting is done, because the current style of type is usually not known until the formula has been fully scanned. For example, when the formula ‘$a+b \over c+d$’ is being read, there is no way to tell that ‘a+b’ will be in script size until ‘\over’ has appeared.

During the scanning process, each element of the mlist being built is classified as a relation, a binary operator, an open parenthesis, etc., or as a construct like ‘\sqrt’ that must be built up. This classification appears in the mlist data structure.

After a formula has been fully scanned, the mlist is converted to an hlist so that it can be incorporated into the surrounding text. This conversion is controlled by a recursive procedure that decides all of the appropriate styles by a “top-down” process starting at the outermost level and working in towards the subformulas. The formula is ultimately pasted together using combinations of horizontal and vertical boxes, with glue and penalty nodes inserted as necessary.

An mlist is represented internally as a linked list consisting chiefly of “noads” (pronounced “no-adds”), to distinguish them from the somewhat similar “nodes” in hlists and vlists. Certain kinds of ordinary nodes are allowed to appear in mlists together with the noads; TEX tells the difference by means of the type field, since a noad’s type is always greater than that of a node. An mlist does not contain character nodes, hlist nodes, vlist nodes, math nodes, ligature nodes, or unset nodes; in particular, each mlist item appears in the variable-size part of mem, so the type field is always present.

723. Each noad is four or more words long. The first word contains the type and subtype and link fields that are already so familiar to us; the second, third, and fourth words are called the noad’s nucleus, supscr, and subscr fields.

Consider, for example, the simple formula ‘$x^2$’, which would be parsed into an mlist containing a single element called an ord_noad. The nucleus of this noad is a representation of ‘x’, the subscr is empty, and the supscr is a representation of ‘2’.

The nucleus, subscr, and supscr fields are further broken into subfields. If p points to a noad, and if q is one of its principal fields (e.g., q == subscr(p)), there are several possibilities for the subfields, depending on the math_type of q.

math_type(q) == math_char means that fam(q) refers to one of the sixteen font families, and character(q) is the number of a character within a font of that family, as in a character node.

math_type(q) == math_text_char is similar, but the character is unsubscripted and unsuperscripted and it is followed immediately by another character from the same font. (This math_type setting appears only briefly during the processing; it is used to suppress unwanted italic corrections.)

math_type(q) == empty indicates a field with no value (the corresponding attribute of noad p is not present).

math_type(q) == sub_box means that info(q) points to a box node (either an hlist_node or a vlist_node) that should be used as the value of the field. The shift_amount in the subsidiary box node is the amount by which that box will be shifted downward.

math_type(q) == sub_mlist means that info(q) points to an mlist; the mlist must be converted to an hlist in order to obtain the value of this field.

In the latter case, we might have info(q) == null. This is not the same as math_type(q) == empty; for example, ‘$P_{}$’ and ‘$P$’ produce different results (the former will not have the “italic correction” added to the width of P, but the “script skip” will be added).

The definitions of subfields given here are evidently wasteful of space, since a halfword is being used for the math_type although only three bits would be needed. However, there are hardly ever many noads present at once, since they are soon converted to nodes that take up even more space, so we can afford to represent them in whatever way simplifies the programming.

@define noad_size => 4 // number of words in a normal noad
@define nucleus(#) => # + 1 // the nucleus field of a noad
@define supscr(#) => # + 2 // the supscr field of a noad
@define subscr(#) => # + 3 // the subscr field of a noad
@define math_type => link // a halfword in mem 
// a quarterword in mem 
@define plane_and_fam_field => font
@define fam(#) => (plane_and_fam_field(#) % 0x100)
//  math_type when the attribute is simple
@define math_char => 1
//  math_type when the attribute is a box
@define sub_box => 2
//  math_type when the attribute is a formula
@define sub_mlist => 3
//  math_type when italic correction is dubious
@define math_text_char => 4
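
To make the field layout of section 723 concrete, here is a small Python model of the noad for ‘$x^2$’. It is a sketch only: `mem` is modelled as a dictionary, each field as a `(math_type, value)` pair, the helper `new_ord_noad` is hypothetical, and the value 0 for `EMPTY` is an assumption.

```python
EMPTY, MATH_CHAR, SUB_BOX, SUB_MLIST, MATH_TEXT_CHAR = 0, 1, 2, 3, 4
NOAD_SIZE = 4  # number of words in a normal noad


def nucleus(p):
    return p + 1


def supscr(p):
    return p + 2


def subscr(p):
    return p + 3


def new_ord_noad(mem, p):
    """Fill NOAD_SIZE words starting at p with an empty ord noad."""
    mem[p] = "ord_noad"  # stands in for the type/subtype/link word
    for field in (nucleus(p), supscr(p), subscr(p)):
        mem[field] = (EMPTY, None)
    return p


mem = {}
p = new_ord_noad(mem, 100)
mem[nucleus(p)] = (MATH_CHAR, "x")
mem[supscr(p)] = (MATH_CHAR, "2")
# The subscript field stays empty, as described for $x^2$.
```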

724. Each portion of a formula is classified as Ord, Op, Bin, Rel, Open, Close, Punct, or Inner, for purposes of spacing and line breaking. An ord_noad, op_noad, bin_noad, rel_noad, open_noad, close_noad, punct_noad, or inner_noad is used to represent portions of the various types. For example, an ‘=’ sign in a formula leads to the creation of a rel_noad whose nucleus field is a representation of an equals sign (usually fam == 0, character == 0x3d). A formula preceded by \mathrel also results in a rel_noad. When a rel_noad is followed by an op_noad, say, and possibly separated by one or more ordinary nodes (not noads), TEX will insert a penalty node (with the current rel_penalty) just after the formula that corresponds to the rel_noad, unless there already was a penalty immediately following; and a “thick space” will be inserted just before the formula that corresponds to the op_noad.

A noad of type ord_noad, op_noad, …, inner_noad usually has a subtype == normal. The only exception is that an op_noad might have subtype == limits or no_limits, if the normal positioning of limits has been overridden for this operator.

//  type of a noad classified Ord
@define ord_noad => unset_node + 3
//  type of a noad classified Op
@define op_noad => ord_noad + 1
//  type of a noad classified Bin
@define bin_noad => ord_noad + 2
//  type of a noad classified Rel
@define rel_noad => ord_noad + 3
//  type of a noad classified Open
@define open_noad => ord_noad + 4
//  type of a noad classified Close
@define close_noad => ord_noad + 5
//  type of a noad classified Punct
@define punct_noad => ord_noad + 6
//  type of a noad classified Inner
@define inner_noad => ord_noad + 7
//  subtype of op_noad whose scripts are to be above, below
@define limits => 1
//  subtype of op_noad whose scripts are to be normal
@define no_limits => 2

725. A radical_noad is five words long; the fifth word is the left_delimiter field, which usually represents a square root sign.

A fraction_noad is six words long; it has a right_delimiter field as well as a left_delimiter.

Delimiter fields are of type four_quarters, and they have four subfields called small_fam, small_char, large_fam, large_char. These subfields represent variable-size delimiters by giving the “small” and “large” starting characters, as explained in Chapter 17 of The TEXbook.

A fraction_noad is actually quite different from all other noads. Not only does it have six words, it has thickness, denominator, and numerator fields instead of nucleus, subscr, and supscr. The thickness is a scaled value that tells how thick to make a fraction rule; however, the special value default_code is used to stand for the default_rule_thickness of the current size. The numerator and denominator point to mlists that define a fraction; we always have

    math_type(numerator) == math_type(denominator) == sub_mlist.

The left_delimiter and right_delimiter fields specify delimiters that will be placed at the left and right of the fraction. In this way, a fraction_noad is able to represent all of TEX’s operators \over, \atop, \above, \overwithdelims, \atopwithdelims, and \abovewithdelims.

// first delimiter field of a noad
@define left_delimiter(#) => # + 4
// second delimiter field of a fraction noad
@define right_delimiter(#) => # + 5
//  type of a noad for square roots
@define radical_noad => inner_noad + 1
// number of mem words in a radical noad
@define radical_noad_size => 5
//  type of a noad for generalized fractions
@define fraction_noad => radical_noad + 1
// number of mem words in a fraction noad
@define fraction_noad_size => 6
@define small_fam(#) =>
    (mem[#].qqqq.b0 % 0x100) //  fam for ``small'' delimiter
@define small_char(#) =>
    (mem[#].qqqq.b1 + (mem[#].qqqq.b0 div 0x100) * 0x10000) //  
    // character for ``small'' delimiter
@define large_fam(#) =>
    (mem[#].qqqq.b2 % 0x100) //  fam for ``large'' delimiter
@define large_char(#) =>
    (mem[#].qqqq.b3 + (mem[#].qqqq.b2 div 0x100) * 0x10000) //  
    // character for ``large'' delimiter
@define small_plane_and_fam_field(#) => mem[#].qqqq.b0
@define small_char_field(#) => mem[#].qqqq.b1
@define large_plane_and_fam_field(#) => mem[#].qqqq.b2
@define large_char_field(#) => mem[#].qqqq.b3
//  thickness field in a fraction noad
@define thickness => width
// denotes default_rule_thickness 
@define default_code => 0x40000000
//  numerator field in a fraction noad
@define numerator => supscr
//  denominator field in a fraction noad
@define denominator => subscr
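
The small_fam and small_char macros unpack a family number and a character code that may lie outside the 16-bit range. The Python sketch below mirrors that arithmetic; the packing helper `pack_delimiter_half` is a hypothetical inverse written only for the example, and the fields are passed as plain integers rather than read from mem.

```python
def small_fam(b0):
    """Family number: the low byte of the plane-and-family field."""
    return b0 % 0x100


def small_char(b0, b1):
    """Character code: low 16 bits from b1, plane from the rest of b0."""
    return b1 + (b0 // 0x100) * 0x10000


def pack_delimiter_half(fam, char):
    """Hypothetical inverse: return (b0, b1) for a family and character."""
    return (char >> 16) * 0x100 + fam, char & 0xFFFF


b0, b1 = pack_delimiter_half(3, 0x1D465)  # a character in plane 1
```

The large_fam and large_char macros apply the same arithmetic to the b2 and b3 fields.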

726. The global variable empty_field is set up for initialization of empty fields in new noads. Similarly, null_delimiter is for the initialization of delimiter fields.

⟦13 Global variables⟧ += ⟦
    var empty_field: two_halves;

    var null_delimiter: four_quarters;
⟧

727.

⟦23 Set initial values of key variables⟧ += ⟦
    empty_field.rh = empty

    empty_field.lh = null

    null_delimiter.b0 = 0

    null_delimiter.b1 = min_quarterword

    null_delimiter.b2 = 0

    null_delimiter.b3 = min_quarterword
⟧

728. The new_noad function creates an ord_noad that is completely null.

function new_noad(): pointer {
    var p: pointer;
    
    p = get_node(noad_size);
    type(p) = ord_noad;
    subtype(p) = normal;
    mem[nucleus(p)].hh = empty_field;
    mem[subscr(p)].hh = empty_field;
    mem[supscr(p)].hh = empty_field;
    new_noad = p;
}

729. A few more kinds of noads will complete the set: An under_noad has its nucleus underlined; an over_noad has it overlined. An accent_noad places an accent over its nucleus; the accent character appears as fam(accent_chr(p)) and character(accent_chr(p)). A vcenter_noad centers its nucleus vertically with respect to the axis of the formula; in such noads we always have math_type(nucleus(p)) == sub_box.

And finally, we have left_noad and right_noad types, to implement TEX’s \left and \right as well as 𝜀-TEX’s \middle. The nucleus of such noads is replaced by a delimiter field; thus, for example, ‘\left(’ produces a left_noad such that delimiter(p) holds the family and character codes for all left parentheses. A left_noad never appears in an mlist except as the first element, and a right_noad never appears in an mlist except as the last element; furthermore, we either have both a left_noad and a right_noad, or neither one is present. The subscr and supscr fields are always empty in a left_noad and a right_noad.

//  type of a noad for underlining
@define under_noad => fraction_noad + 1
//  type of a noad for overlining
@define over_noad => under_noad + 1
//  type of a noad for accented subformulas
@define accent_noad => over_noad + 1
//  subtype for non growing math accents
@define fixed_acc => 1
@define bottom_acc => 2 //  subtype for bottom math accents
@define is_bottom_acc(#) =>
    (
        (subtype(#) == bottom_acc)
        || (subtype(#) == bottom_acc + fixed_acc)
    )
// number of mem words in an accent noad
@define accent_noad_size => 5
// the accent_chr field of an accent noad
@define accent_chr(#) => # + 4
//  type of a noad for \vcenter
@define vcenter_noad => accent_noad + 1
//  type of a noad for \left
@define left_noad => vcenter_noad + 1
//  type of a noad for \right
@define right_noad => left_noad + 1
//  delimiter field in left and right noads
@define delimiter => nucleus
//  subtype of right noad representing \middle
@define middle_noad => 1
@define scripts_allowed(#) =>
    (type(#) >= ord_noad) && (type(#) < left_noad)
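
The accent subtypes above behave like two independent flags. The following Python sketch is only an illustration of the is_bottom_acc macro, not part of the program; the helper describe_accent and the reading of subtype 0 as an ordinary top accent that may grow are assumptions made for the example.

```python
# Subtype flags for accent noads, copied from the definitions above.
FIXED_ACC = 1    # accent does not grow with its nucleus
BOTTOM_ACC = 2   # accent is placed below its nucleus


def is_bottom_acc(subtype: int) -> bool:
    """Mirror of the is_bottom_acc macro: true for subtypes 2 and 3 only."""
    return subtype in (BOTTOM_ACC, BOTTOM_ACC + FIXED_ACC)


def describe_accent(subtype: int) -> str:
    """Hypothetical helper: spell out what a subtype in 0..3 means."""
    position = "bottom" if is_bottom_acc(subtype) else "top"
    growth = "fixed" if subtype & FIXED_ACC else "growing"
    return f"{position}, {growth}"


if __name__ == "__main__":
    for subtype in range(4):
        print(subtype, describe_accent(subtype))
```

Testing two explicit values, as the macro does, and testing the bit for BOTTOM_ACC give the same answer as long as subtypes stay in the range 0 to 3.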

730. Math formulas can also contain instructions like \textstyle that override TEX’s normal style rules. A style_node is inserted into the data structure to record such instructions; it is three words long, so it is considered a node instead of a noad. The subtype is either display_style or text_style or script_style or script_script_style . The second and third words of a style_node are not used, but they are present because a choice_node is converted to a style_node .

TEX uses even numbers 0, 2, 4, 6 to encode the basic styles display_style , …, script_script_style , and adds 1 to get the “cramped” versions of these styles. This gives a numerical order that is backwards from the convention of Appendix G in The TEXbook; i.e., a smaller style has a larger numerical value.

//  type of a style node
@define style_node => unset_node + 1
// number of words in a style node
@define style_node_size => 3
//  subtype for \displaystyle
@define display_style => 0
@define text_style => 2 //  subtype for \textstyle
@define script_style => 4 //  subtype for \scriptstyle
//  subtype for \scriptscriptstyle
@define script_script_style => 6
// add this to an uncramped style if you want to cramp it
@define cramped => 1
// create a style node
function new_style(s: small_number): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(style_node_size);
    type(p) = style_node;
    subtype(p) = s;
    width(p) = 0;
    // the width and depth are not used
    depth(p) = 0;
    new_style = p;
}
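
The style encoding can be tried out in a few lines of Python. This is a sketch under the definitions of this section only; the function names cramp, style_name, and is_smaller are invented for the illustration, and style codes are assumed to be nonnegative.

```python
# Style codes as defined above: even numbers, plus 1 for the cramped variant.
DISPLAY_STYLE, TEXT_STYLE, SCRIPT_STYLE, SCRIPT_SCRIPT_STYLE = 0, 2, 4, 6
CRAMPED = 1

STYLE_NAMES = ["displaystyle", "textstyle", "scriptstyle", "scriptscriptstyle"]


def cramp(style: int) -> int:
    """Return the cramped version of a style; a cramped style is unchanged."""
    return style if style % 2 == 1 else style + CRAMPED


def style_name(style: int) -> str:
    """Name of the basic style, ignoring crampedness (assumes style >= 0)."""
    index = style // 2
    if 0 <= index < len(STYLE_NAMES):
        return "\\" + STYLE_NAMES[index]
    return "Unknown style!"


def is_smaller(a: int, b: int) -> bool:
    """A larger basic code means smaller type, the reverse of Appendix G."""
    return a // 2 > b // 2


if __name__ == "__main__":
    print(style_name(cramp(SCRIPT_STYLE)))        # cramped script style
    print(is_smaller(SCRIPT_STYLE, TEXT_STYLE))   # script is smaller than text
```

Dividing by 2 discards the cramped bit, so a style and its cramped variant share one name.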

731. Finally, the \mathchoice primitive creates a choice_node , which has special subfields display_mlist , text_mlist , script_mlist , and script_script_mlist pointing to the mlists for each style.

//  type of a choice node
@define choice_node => unset_node + 2
// mlist to be used in display style
@define display_mlist(#) => info(# + 1)
// mlist to be used in text style
@define text_mlist(#) => link(# + 1)
// mlist to be used in script style
@define script_mlist(#) => info(# + 2)
// mlist to be used in scriptscript style
@define script_script_mlist(#) => link(# + 2)
// create a choice node
function new_choice(): pointer {
    var
      p: pointer; // the new node
    
    p = get_node(style_node_size);
    type(p) = choice_node;
    // the subtype is not used
    subtype(p) = 0;
    display_mlist(p) = null;
    text_mlist(p) = null;
    script_mlist(p) = null;
    script_script_mlist(p) = null;
    new_choice = p;
}

732. Let’s consider now the previously unwritten part of show_node_list that displays the things that can only be present in mlists; this program illustrates how to access the data structures just defined.

In the context of the following program, p points to a node or noad that should be displayed, and the current string contains the “recursion history” that leads to this point. The recursion history consists of a dot for each outer level in which p is subsidiary to some node, or in which p is subsidiary to the nucleus field of some noad; the dot is replaced by ‘_’ or ‘^’ or ‘/’ or ‘\’ if p is descended from the subscr or supscr or denominator or numerator fields of noads. For example, the current string would be ‘.^._/’ if p points to the ord_noad for x in the (ridiculous) formula ‘$\sqrt{a^{\mathinner{b_{c\over x+y}}}}$’.

⟦732 Cases of |show_node_list| that arise in mlists only⟧ = ⟦
    style_node:

    print_style(subtype(p))

    choice_node:

    ⟦737 Display choice node |p|⟧

    
    ord_noad,
    op_noad,
    bin_noad,
    rel_noad,
    open_noad,
    close_noad,
    punct_noad,
    inner_noad,
    radical_noad,
    over_noad,
    under_noad,
    vcenter_noad,
    accent_noad,
    left_noad,
    right_noad:
      ⟦738 Display normal noad |p|⟧

    fraction_noad:

    ⟦739 Display fraction noad |p|⟧
⟧
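
The recursion history of the example formula can be reproduced with a toy model. In this Python sketch, which is an illustration and not TEX's data structure, a noad is a dictionary whose fields hold either a single character or a list of noads; the function history_of and the dictionary layout are assumptions made for the example.

```python
# Character added to the recursion history for each field of a noad.
FIELD_MARK = {
    "nucleus": ".",
    "supscr": "^",
    "subscr": "_",
    "numerator": "\\",
    "denominator": "/",
}


def history_of(mlist, target, prefix=""):
    """Return the history string in force at the noad that holds target."""
    for noad in mlist:
        for field, mark in FIELD_MARK.items():
            value = noad.get(field)
            if isinstance(value, list):
                # Descend one level, extending the history by this field's mark.
                found = history_of(value, target, prefix + mark)
                if found is not None:
                    return found
            elif value == target:
                return prefix
    return None


# $\sqrt{a^{\mathinner{b_{c\over x+y}}}}$ in the toy representation.
x_plus_y = [{"nucleus": "x"}, {"nucleus": "+"}, {"nucleus": "y"}]
fraction = {"numerator": [{"nucleus": "c"}], "denominator": x_plus_y}
b = {"nucleus": "b", "subscr": [fraction]}
inner = {"nucleus": [b]}
a = {"nucleus": "a", "supscr": [inner]}
formula = [{"nucleus": [a]}]

if __name__ == "__main__":
    print(history_of(formula, "x"))
```

Each level of nesting adds exactly one character, so the length of the history is the depth that is later compared with depth_threshold.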

733. Here are some simple routines used in the display of noads.

⟦733 Declare procedures needed for displaying the elements of mlists⟧ = ⟦
    // prints family and character
    function print_fam_and_char(p: pointer) {
        var c: integer;
        
        print_esc(strpool!("fam"));
        print_int(fam(p) % 0x100);
        print_char(ord!(" "));
        c = (
            cast_to_ushort(character(p))
            + ((plane_and_fam_field(p) div 0x100) * 0x10000)
        );
        if (c < 0x10000) {
            print_ASCII(c);
        } else {
            // non-Plane 0 Unicodes can't be sent through 
            // print_ASCII 
            print_char(c);
        }
    }

    // prints a delimiter as 24-bit hex value
    function print_delimiter(p: pointer) {
        var
          a: integer; // accumulator
        
        a = small_fam(p) * 256 + qo(small_char(p));
        a = 
            a
            * 0x1000
            + large_fam(p) * 256 + qo(large_char(p))
        ;
        if (a < 0) {
            // this should never happen
            print_int(a);
        } else {
            print_hex(a);
        }
    }
⟧
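
The arithmetic in these two routines is easy to check by hand. The Python sketch below only restates the formulas; the function names are invented, and the field layout (plane in the high byte of the combined field, family in the low byte) is read off the code above, not taken from a separate specification.

```python
def pack_delimiter(small_fam, small_char, large_fam, large_char):
    """Combine the four delimiter fields as print_delimiter does."""
    a = small_fam * 256 + small_char
    return a * 0x1000 + large_fam * 256 + large_char


def split_fam_and_char(plane_and_fam, character):
    """Recover the family and the full code point, as in print_fam_and_char."""
    fam = plane_and_fam % 0x100
    code_point = character + (plane_and_fam // 0x100) * 0x10000
    return fam, code_point


if __name__ == "__main__":
    # Small variant: family 0, character "28; large variant: family 3, character 0.
    print(format(pack_delimiter(0, 0x28, 3, 0x00), "X"))
    # Plane 1, family 1, low 16 bits "D44E.
    fam, code_point = split_fam_and_char(0x0101, 0xD44E)
    print(fam, format(code_point, "X"))
```

The packed value fits in 24 bits only while both families are below 16 and both characters are below 256; larger field values spill into neighbouring digits.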

734. The next subroutine will descend to another level of recursion when a subsidiary mlist needs to be displayed. The parameter c indicates what character is to become part of the recursion history. An empty mlist is distinguished from a field with math_type(p) == empty , because these are not equivalent (as explained above).

⟦733 Declare procedures needed for displaying the elements of mlists⟧ += ⟦
    //  show_node_list ( info ( temp_ptr ) ) 
    forward_declaration show_info();

    // display a noad field
    function print_subsidiary_data(
      p: pointer,
      c: ASCII_code,
    ) {
        if (cur_length >= depth_threshold) {
            if (math_type(p) != empty) {
                print(strpool!(" []"));
            }
        } else {
            // include c in the recursion history
            append_char(c);
            // prepare for show_info if recursion is needed
            temp_ptr = p;
            case math_type(p) {
              math_char:
                print_ln;
                print_current_string;
                print_fam_and_char(p);
              sub_box:
                // recursive call
                show_info;
              sub_mlist:
                if (info(p) == null) {
                    print_ln;
                    print_current_string;
                    print(strpool!("{}"));
                } else {
                    // recursive call
                    show_info;
                }
              othercases:
                //  empty 
                do_nothing;
            }
            // remove c from the recursion history
            flush_char;
        }
    }
⟧

735. The inelegant introduction of show_info in the code above seems better than the alternative of using Pascal’s strange forward declaration for a procedure with parameters. The Pascal convention about dropping parameters from a post-forward procedure is, frankly, so intolerable to the author of TEX that he would rather stoop to communication via a global temporary variable. (A similar stoopidity occurred with respect to hlist_out and vlist_out above, and it will occur with respect to mlist_to_hlist below.)

// the reader will kindly forgive this
function show_info() {
    show_node_list(info(temp_ptr));
}

736.

⟦733 Declare procedures needed for displaying the elements of mlists⟧ += ⟦
    function print_style(c: integer) {
        case c div 2 {
          0:
            //  display_style == 0 
            print_esc(strpool!("displaystyle"));
          1:
            //  text_style == 2 
            print_esc(strpool!("textstyle"));
          2:
            //  script_style == 4 
            print_esc(strpool!("scriptstyle"));
          3:
            //  script_script_style == 6 
            print_esc(strpool!("scriptscriptstyle"));
          othercases:
            print(strpool!("Unknown style!"));
        }
    }
⟧

737.

⟦737 Display choice node |p|⟧ = ⟦
    {
        print_esc(strpool!("mathchoice"));
        append_char(ord!("D"));
        show_node_list(display_mlist(p));
        flush_char;
        append_char(ord!("T"));
        show_node_list(text_mlist(p));
        flush_char;
        append_char(ord!("S"));
        show_node_list(script_mlist(p));
        flush_char;
        append_char(ord!("s"));
        show_node_list(script_script_mlist(p));
        flush_char;
    }
⟧

738.

⟦738 Display normal noad |p|⟧ = ⟦
    {
        case type(p) {
          ord_noad:
            print_esc(strpool!("mathord"));
          op_noad:
            print_esc(strpool!("mathop"));
          bin_noad:
            print_esc(strpool!("mathbin"));
          rel_noad:
            print_esc(strpool!("mathrel"));
          open_noad:
            print_esc(strpool!("mathopen"));
          close_noad:
            print_esc(strpool!("mathclose"));
          punct_noad:
            print_esc(strpool!("mathpunct"));
          inner_noad:
            print_esc(strpool!("mathinner"));
          over_noad:
            print_esc(strpool!("overline"));
          under_noad:
            print_esc(strpool!("underline"));
          vcenter_noad:
            print_esc(strpool!("vcenter"));
          radical_noad:
            print_esc(strpool!("radical"));
            print_delimiter(left_delimiter(p));
          accent_noad:
            print_esc(strpool!("accent"));
            print_fam_and_char(accent_chr(p));
          left_noad:
            print_esc(strpool!("left"));
            print_delimiter(delimiter(p));
          right_noad:
            if (subtype(p) == normal) {
                print_esc(strpool!("right"));
            } else {
                print_esc(strpool!("middle"));
            }
            print_delimiter(delimiter(p));
        }
        if (type(p) < left_noad) {
            if (subtype(p) != normal) {
                if (subtype(p) == limits) {
                    print_esc(strpool!("limits"));
                } else {
                    print_esc(strpool!("nolimits"));
                }
            }
            print_subsidiary_data(nucleus(p), ord!("."));
        }
        print_subsidiary_data(supscr(p), ord!("^"));
        print_subsidiary_data(subscr(p), ord!("_"));
    }
⟧

739.

⟦739 Display fraction noad |p|⟧ = ⟦
    {
        print_esc(strpool!("fraction, thickness "));
        if (thickness(p) == default_code) {
            print(strpool!("= default"));
        } else {
            print_scaled(thickness(p));
        }
        if (
            (small_fam(left_delimiter(p)) != 0)
            || (
                small_char(left_delimiter(p))
                != min_quarterword
            )
            || (large_fam(left_delimiter(p)) != 0)
            || (
                large_char(left_delimiter(p))
                != min_quarterword
            )
        ) {
            print(strpool!(", left-delimiter "));
            print_delimiter(left_delimiter(p));
        }
        if (
            (small_fam(right_delimiter(p)) != 0)
            || (
                small_char(right_delimiter(p))
                != min_quarterword
            )
            || (large_fam(right_delimiter(p)) != 0)
            || (
                large_char(right_delimiter(p))
                != min_quarterword
            )
        ) {
            print(strpool!(", right-delimiter "));
            print_delimiter(right_delimiter(p));
        }
        print_subsidiary_data(numerator(p), ord!("\\"));
        print_subsidiary_data(denominator(p), ord!("/"));
    }
⟧

740. That which can be displayed can also be destroyed.

⟦740 Cases of |flush_node_list| that arise in mlists only⟧ = ⟦
    style_node:

    {
        free_node(p, style_node_size);
        goto done;
    }

    choice_node:

    {
        flush_node_list(display_mlist(p));
        flush_node_list(text_mlist(p));
        flush_node_list(script_mlist(p));
        flush_node_list(script_script_mlist(p));
        free_node(p, style_node_size);
        goto done;
    }

    
    ord_noad,
    op_noad,
    bin_noad,
    rel_noad,
    open_noad,
    close_noad,
    punct_noad,
    inner_noad,
    radical_noad,
    over_noad,
    under_noad,
    vcenter_noad,
    accent_noad:
      if (math_type(nucleus(p)) >= sub_box) {
          flush_node_list(info(nucleus(p)));
      }
      if (math_type(supscr(p)) >= sub_box) {
          flush_node_list(info(supscr(p)));
      }
      if (math_type(subscr(p)) >= sub_box) {
          flush_node_list(info(subscr(p)));
      }
      if (type(p) == radical_noad) {
          free_node(p, radical_noad_size);
      } else if (type(p) == accent_noad) {
          free_node(p, accent_noad_size);
      } else {
          free_node(p, noad_size);
      }
      goto done;

    left_noad, right_noad:
      free_node(p, noad_size);
      goto done;

    fraction_noad:

    {
        flush_node_list(info(numerator(p)));
        flush_node_list(info(denominator(p)));
        free_node(p, fraction_noad_size);
        goto done;
    }
⟧

741. [35] Subroutines for math mode. In order to convert mlists to hlists, i.e., noads to nodes, we need several subroutines that are conveniently dealt with now.

Let us first introduce the macros that make it easy to get at the parameters and other font information. A size code, which is a multiple of 16, is added to a family number to get an index into the table of internal font numbers for each combination of family and size. (Be alert: Size codes get larger as the type gets smaller.)

⟦57 Basic printing procedures⟧ += ⟦
    function print_size(s: integer) {
        if (s == text_size) {
            print_esc(strpool!("textfont"));
        } else if (s == script_size) {
            print_esc(strpool!("scriptfont"));
        } else {
            print_esc(strpool!("scriptscriptfont"));
        }
    }
⟧

742. Before an mlist is converted to an hlist, TEX makes sure that the fonts in family 2 have enough parameters to be math-symbol fonts, and that the fonts in family 3 have enough parameters to be math-extension fonts. The math-symbol parameters are referred to by using the following macros, which take a size code as their parameter; for example, num1(cur_size) gives the value of the num1 parameter for the current size.

NB: the access functions here must all put the font # into /f/ for mathsy().

The accessors are defined with define_mathsy_accessor(NAME)(fontdimen-number)(NAME) because I can’t see how to only give the name once, with WEB’s limited macro capabilities. This seems a bit ugly, but it works.

// the following are OpenType MATH constant indices for use 
// with OT math fonts
@define total_mathsy_params => 22
@define scriptPercentScaleDown => 0
@define scriptScriptPercentScaleDown => 1
@define delimitedSubFormulaMinHeight => 2
@define displayOperatorMinHeight => 3
@define mathLeading => 4
@define firstMathValueRecord => mathLeading
@define axisHeight => 5
@define accentBaseHeight => 6
@define flattenedAccentBaseHeight => 7
@define subscriptShiftDown => 8
@define subscriptTopMax => 9
@define subscriptBaselineDropMin => 10
@define superscriptShiftUp => 11
@define superscriptShiftUpCramped => 12
@define superscriptBottomMin => 13
@define superscriptBaselineDropMax => 14
@define subSuperscriptGapMin => 15
@define superscriptBottomMaxWithSubscript => 16
@define spaceAfterScript => 17
@define upperLimitGapMin => 18
@define upperLimitBaselineRiseMin => 19
@define lowerLimitGapMin => 20
@define lowerLimitBaselineDropMin => 21
@define stackTopShiftUp => 22
@define stackTopDisplayStyleShiftUp => 23
@define stackBottomShiftDown => 24
@define stackBottomDisplayStyleShiftDown => 25
@define stackGapMin => 26
@define stackDisplayStyleGapMin => 27
@define stretchStackTopShiftUp => 28
@define stretchStackBottomShiftDown => 29
@define stretchStackGapAboveMin => 30
@define stretchStackGapBelowMin => 31
@define fractionNumeratorShiftUp => 32
@define fractionNumeratorDisplayStyleShiftUp => 33
@define fractionDenominatorShiftDown => 34
@define fractionDenominatorDisplayStyleShiftDown => 35
@define fractionNumeratorGapMin => 36
@define fractionNumDisplayStyleGapMin => 37
@define fractionRuleThickness => 38
@define fractionDenominatorGapMin => 39
@define fractionDenomDisplayStyleGapMin => 40
@define skewedFractionHorizontalGap => 41
@define skewedFractionVerticalGap => 42
@define overbarVerticalGap => 43
@define overbarRuleThickness => 44
@define overbarExtraAscender => 45
@define underbarVerticalGap => 46
@define underbarRuleThickness => 47
@define underbarExtraDescender => 48
@define radicalVerticalGap => 49
@define radicalDisplayStyleVerticalGap => 50
@define radicalRuleThickness => 51
@define radicalExtraAscender => 52
@define radicalKernBeforeDegree => 53
@define radicalKernAfterDegree => 54
@define lastMathValueRecord => radicalKernAfterDegree
@define radicalDegreeBottomRaisePercent => 55
@define lastMathConstant => radicalDegreeBottomRaisePercent
@define mathsy(#) => font_info[# + param_base[f]].sc
@define define_mathsy_end(#) =>
    /*... opened earlier ...*/
        # = rval;
    }
@define define_mathsy_body(#) =>
    var
      var f: integer;
      var rval: scaled;
      
      f = fam_fnt(2 + size_code);
      if (is_new_mathfont(f)) {
          rval = get_native_mathsy_param(f, #);
      } else {
          rval = mathsy(#);
      }
      define_mathsy_end;
@define define_mathsy_accessor(#) =>
    function #(size_code: integer): scaled {
        define_mathsy_body
    /* ... continued later ... */
define_mathsy_accessor(math_x_height)(5)(math_x_height)

define_mathsy_accessor(math_quad)(6)(math_quad)

define_mathsy_accessor(num1)(8)(num1)

define_mathsy_accessor(num2)(9)(num2)

define_mathsy_accessor(num3)(10)(num3)

define_mathsy_accessor(denom1)(11)(denom1)

define_mathsy_accessor(denom2)(12)(denom2)

define_mathsy_accessor(sup1)(13)(sup1)

define_mathsy_accessor(sup2)(14)(sup2)

define_mathsy_accessor(sup3)(15)(sup3)

define_mathsy_accessor(sub1)(16)(sub1)

define_mathsy_accessor(sub2)(17)(sub2)

define_mathsy_accessor(sup_drop)(18)(sup_drop)

define_mathsy_accessor(sub_drop)(19)(sub_drop)

define_mathsy_accessor(delim1)(20)(delim1)

define_mathsy_accessor(delim2)(21)(delim2)

define_mathsy_accessor(axis_height)(22)(axis_height)
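
What the accessor macros expand to can be pictured as a function that builds functions. The Python sketch below is only an analogy: the dictionary fam_fnt, the helper native_mathsy_param, the two size codes, and all parameter values are placeholders invented for the illustration.

```python
TEXT_SIZE, SCRIPT_SIZE = 0, 16  # placeholder size codes for this sketch only


def native_mathsy_param(font, k):
    """Hypothetical stand-in for get_native_mathsy_param."""
    return font["native_params"][k]


def define_mathsy_accessor(k, fam_fnt):
    """Return an accessor for math-symbol parameter number k."""
    def accessor(size_code):
        f = fam_fnt[2 + size_code]          # family 2 holds the math-symbol font
        if f["is_new_mathfont"]:
            return native_mathsy_param(f, k)
        return f["fontdimen"][k]            # plays the role of mathsy(k)
    return accessor


# Made-up fonts: a traditional one at text size, a native one at script size.
fam_fnt = {
    2 + TEXT_SIZE: {"is_new_mathfont": False,
                    "fontdimen": {8: 676508, 22: 250000}},
    2 + SCRIPT_SIZE: {"is_new_mathfont": True,
                      "native_params": {8: 500000, 22: 180000}},
}

num1 = define_mathsy_accessor(8, fam_fnt)
axis_height = define_mathsy_accessor(22, fam_fnt)

if __name__ == "__main__":
    print(num1(TEXT_SIZE), axis_height(SCRIPT_SIZE))
```

A closure needs the parameter number only once, which is exactly the convenience that the three-argument macro call cannot offer.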

743. The math-extension parameters have similar macros, but the size code is omitted (since it is always cur_size when we refer to such parameters).

@define total_mathex_params => 13
@define mathex(#) => font_info[# + param_base[f]].sc
@define define_mathex_end(#) =>
    /*... opened earlier ...*/
        # = rval;
    }
@define define_mathex_body(#) =>
    var
      var f: integer;
      var rval: scaled;
      
      f = fam_fnt(3 + cur_size);
      if (is_new_mathfont(f)) {
          rval = get_native_mathex_param(f, #);
      } else {
          rval = mathex(#);
      }
      define_mathex_end;
@define define_mathex_accessor(#) =>
    function #(): scaled {
        define_mathex_body
    /* ... continued later ... */
define_mathex_accessor(default_rule_thickness)(8)(
  default_rule_thickness,
)

define_mathex_accessor(big_op_spacing1)(9)(big_op_spacing1)

define_mathex_accessor(big_op_spacing2)(10)(big_op_spacing2)

define_mathex_accessor(big_op_spacing3)(11)(big_op_spacing3)

define_mathex_accessor(big_op_spacing4)(12)(big_op_spacing4)

define_mathex_accessor(big_op_spacing5)(13)(big_op_spacing5)

744. Native font support requires these additional subroutines.

new_native_word_node creates the node, but does not actually set its metrics; call set_native_metrics(node) if that is required.
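
Two pieces of arithmetic in the routines of this section deserve a quick check: the number of memory words that new_native_word_node reserves for n UTF-16 code units, and the surrogate-pair conversion in new_native_character. The Python sketch below restates both; the byte sizes are assumptions (2 bytes per UTF16_code, 8 bytes per memory_word), and the function names are invented for the illustration.

```python
UTF16_CODE_BYTES = 2    # assumed sizeof(UTF16_code)
MEMORY_WORD_BYTES = 8   # assumed sizeof(memory_word)


def text_words(n: int) -> int:
    """Words needed for n code units, rounded up (added to native_node_size)."""
    return (n * UTF16_CODE_BYTES + MEMORY_WORD_BYTES - 1) // MEMORY_WORD_BYTES


def to_utf16(c: int) -> list:
    """Split a Unicode scalar value into one or two UTF-16 code units."""
    if c > 0xFFFF:
        v = c - 0x10000
        return [v // 1024 + 0xD800, v % 1024 + 0xDC00]
    return [c]


def from_utf16(units: list) -> list:
    """Inverse of to_utf16 over a well-formed sequence of code units."""
    scalars, i = [], 0
    while i < len(units):
        if 0xD800 <= units[i] < 0xDC00:
            # High surrogate: combine it with the low surrogate that follows.
            scalars.append((units[i] - 0xD800) * 1024
                           + units[i + 1] - 0xDC00 + 0x10000)
            i += 2
        else:
            scalars.append(units[i])
            i += 1
    return scalars


if __name__ == "__main__":
    print([text_words(n) for n in (1, 2, 4, 5)])
    print([format(u, "X") for u in to_utf16(0x1D44E)])
```

Under these assumed sizes a single character, even one that needs a surrogate pair, occupies one extra word, which agrees with the request for native_node_size + 1 words in the branch of new_native_character that bypasses the mapping. Like the loop in that routine, from_utf16 trusts that a high surrogate is always followed by another code unit.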
⟦616 Declare subroutines for |new_character|⟧ += ⟦
    function new_native_word_node(
      f: internal_font_number,
      n: integer,
    ): pointer {
        var l: integer, q: pointer;
        
        l = 
            native_node_size
            + (
                n
                * sizeof(UTF16_code)
                + sizeof(memory_word) - 1
            )
            div sizeof(memory_word)
        ;
        q = get_node(l);
        type(q) = whatsit_node;
        if (XeTeX_generate_actual_text_en) {
            subtype(q) = native_word_node_AT;
        } else {
            subtype(q) = native_word_node;
        }
        native_size(q) = l;
        native_font(q) = f;
        native_length(q) = n;
        native_glyph_count(q) = 0;
        native_glyph_info_ptr(q) = null_ptr;
        new_native_word_node = q;
    }

    function new_native_character(
      f: internal_font_number,
      c: UnicodeScalar,
    ): pointer {
        var p: pointer, i, len: integer;
        
        if (font_mapping[f] != 0) {
            if (c > 0xffff) {
                str_room(2);
                append_char(
                  (c - 0x10000) div 1024 + 0xd800,
                );
                append_char((c - 0x10000) % 1024 + 0xdc00);
            } else {
                str_room(1);
                append_char(c);
            }
            len = apply_mapping(
              font_mapping[f],
              addressof(str_pool[str_start_macro(str_ptr)]),
              cur_length,
            );
            // flush the string, as we'll be using the 
            // mapped text instead
            pool_ptr = str_start_macro(str_ptr);
            i = 0;
            while (i < len) {
                if (
                    (mapped_text[i] >= 0xd800)
                    && (mapped_text[i] < 0xdc00)
                ) {
                    c = 
                        (mapped_text[i] - 0xd800)
                        * 1024
                        + mapped_text[i + 1]
                        - 0xdc00 + 0x10000
                    ;
                    if (map_char_to_glyph(f, c) == 0) {
                        char_warning(f, c);
                    }
                    i = i + 2;
                } else {
                    if (
                        map_char_to_glyph(
                          f,
                          mapped_text[i],
                        )
                        == 0
                    ) {
                        char_warning(f, mapped_text[i]);
                    }
                    i = i + 1;
                }
            }
            p = new_native_word_node(f, len);
            for (i in 0 to len - 1) {
                set_native_char(p, i, mapped_text[i]);
            }
        } else {
            if (tracing_lost_chars > 0) {
                if (map_char_to_glyph(f, c) == 0) {
                    char_warning(f, c);
                }
            }
            p = get_node(native_node_size + 1);
            type(p) = whatsit_node;
            subtype(p) = native_word_node;
            native_size(p) = native_node_size + 1;
            native_glyph_count(p) = 0;
            native_glyph_info_ptr(p) = null_ptr;
            native_font(p) = f;
            if (c > 0xffff) {
                native_length(p) = 2;
                set_native_char(
                  p,
                  0,
                  (c - 0x10000) div 1024 + 0xd800,
                );
                set_native_char(
                  p,
                  1,
                  (c - 0x10000) % 1024 + 0xdc00,
                );
            } else {
                native_length(p) = 1;
                set_native_char(p, 0, c);
            }
        }
        set_native_metrics(p, XeTeX_use_glyph_metrics);
        new_native_character = p;
    }

    function font_feature_warning(
      featureNameP: void_pointer,
      featLen: integer,
      settingNameP: void_pointer,
      setLen: integer,
    ) {
        var i: integer;
        
        begin_diagnostic;
        print_nl(strpool!("Unknown "));
        if (setLen > 0) {
            print(strpool!("selector `"));
            print_utf8_str(settingNameP, setLen);
            print(strpool!("' for "));
        }
        print(strpool!("feature `"));
        print_utf8_str(featureNameP, featLen);
        print(strpool!("' in font `"));
        i = 1;
        while (ord(name_of_file[i]) != 0) {
            // this is already UTF-8
            print_visible_char(name_of_file[i]);
            incr(i);
        }
        print(strpool!("'."));
        end_diagnostic(false);
    }

    // 0: just logging; 1: file not found; 2: can't load
    function font_mapping_warning(
      mappingNameP: void_pointer,
      mappingNameLen: integer,
      warningType: integer,
    ) {
        var i: integer;
        
        begin_diagnostic;
        if (warningType == 0) {
            print_nl(strpool!("Loaded mapping `"));
        } else {
            print_nl(strpool!("Font mapping `"));
        }
        print_utf8_str(mappingNameP, mappingNameLen);
        print(strpool!("' for font `"));
        i = 1;
        while (ord(name_of_file[i]) != 0) {
            // this is already UTF-8
            print_visible_char(name_of_file[i]);
            incr(i);
        }
        case warningType {
          1:
            print(strpool!("' not found."));
          2:
            print(strpool!("' not usable;"));
            print_nl(
              strpool!("bad mapping file or incorrect mapping type."),
            );
          othercases:
            print(strpool!("'."));
        }
        end_diagnostic(false);
    }

    function graphite_warning() {
        var i: integer;
        
        begin_diagnostic;
        print_nl(strpool!("Font `"));
        i = 1;
        while (ord(name_of_file[i]) != 0) {
            // this is already UTF-8
            print_visible_char(name_of_file[i]);
            incr(i);
        }
        print(
          strpool!("' does not support Graphite. Trying OpenType layout instead."),
        );
        end_diagnostic(false);
    }

    function load_native_font(
      u: pointer,
      nom, aire: str_number,
      s: scaled,
    ): internal_font_number {
        label done;
        const first_math_fontdimen = 10;
        var
          k, num_font_dimens: integer,
          // really a CFDictionaryRef or XeTeXLayoutEngine
          font_engine: void_pointer,
          // s converted to real size, if it was negative
          actual_size: scaled,
          // for the temporary native_char node we'll create
          p: pointer,
          ascent, descent, font_slant, x_ht, cap_ht: scaled,
          f: internal_font_number,
          full_name: str_number;
        
        // on entry here, the full name is packed into 
        // name_of_file in UTF8 form
        load_native_font = null_font;
        font_engine = find_native_font(name_of_file + 1, s);
        if (font_engine == 0) {
            goto done;
        }
        if (s >= 0) {
            actual_size = s;
        } else {
            if ((s != -1000)) {
                actual_size = xn_over_d(
                  loaded_font_design_size,
                  -s,
                  1000,
                );
            } else {
                actual_size = loaded_font_design_size;
            }
        }
        // look again to see if the font is already 
        // loaded, now that we know its canonical name
        str_room(name_length);
        for (k in 1 to name_length) {
            append_char(name_of_file[k]);
        }
        // not slow_make_string because we'll flush it if 
        // the font was already loaded
        full_name = make_string;
        for (f in font_base + 1 to font_ptr) {
            if (
                (font_area[f] == native_font_type_flag)
                && str_eq_str(font_name[f], full_name)
                && (font_size[f] == actual_size)
            ) {
                release_font_engine(
                  font_engine,
                  native_font_type_flag,
                );
                flush_string;
                load_native_font = f;
                goto done;
            }
        }
        if (
            (native_font_type_flag == otgr_font_flag)
            && isOpenTypeMathFont(font_engine)
        ) {
            num_font_dimens = 
                first_math_fontdimen
                + lastMathConstant
            ;
        } else {
            num_font_dimens = 8;
        }
        if (
            (font_ptr == font_max)
            || (fmem_ptr + num_font_dimens > font_mem_size)
        ) {
            ⟦602 Apologize for not loading the font, |goto done|⟧
        }
        // we've found a valid installed font, and have 
        // room
        incr(font_ptr);
        // set by find_native_font to either aat_font_flag 
        // or otgr_font_flag 
        font_area[font_ptr] = native_font_type_flag;
        // store the canonical name
        font_name[font_ptr] = full_name;
        font_check[font_ptr].b0 = 0;
        font_check[font_ptr].b1 = 0;
        font_check[font_ptr].b2 = 0;
        font_check[font_ptr].b3 = 0;
        font_glue[font_ptr] = null;
        font_dsize[font_ptr] = loaded_font_design_size;
        font_size[font_ptr] = actual_size;
        if ((native_font_type_flag == aat_font_flag)) {
            aat_get_font_metrics(
              font_engine,
              addressof(ascent),
              addressof(descent),
              addressof(x_ht),
              addressof(cap_ht),
              addressof(font_slant),
            );
        } else {
            ot_get_font_metrics(
              font_engine,
              addressof(ascent),
              addressof(descent),
              addressof(x_ht),
              addressof(cap_ht),
              addressof(font_slant),
            );
        }
        height_base[font_ptr] = ascent;
        depth_base[font_ptr] = -descent;
        // we add an extra \fontdimen8 = cap_height; 
        // then OT math fonts have a bunch more
        font_params[font_ptr] = num_font_dimens;
        font_bc[font_ptr] = 0;
        font_ec[font_ptr] = 65535;
        font_used[font_ptr] = false;
        hyphen_char[font_ptr] = default_hyphen_char;
        skew_char[font_ptr] = default_skew_char;
        param_base[font_ptr] = fmem_ptr - 1;
        font_layout_engine[font_ptr] = font_engine;
        // don't use the mapping, if any, when measuring 
        // space here
        font_mapping[font_ptr] = 0;
        // measure the width of the space character and set 
        // up font parameters
        font_letter_space[font_ptr] = (
          loaded_font_letter_space
        );
        p = new_native_character(font_ptr, ord!(" "));
        s = width(p) + loaded_font_letter_space;
        free_node(p, native_size(p));
        //  slant 
        font_info[fmem_ptr].sc = font_slant;
        incr(fmem_ptr);
        //  space = width of space character
        font_info[fmem_ptr].sc = s;
        incr(fmem_ptr);
        //  space_stretch = 1/2 * space
        font_info[fmem_ptr].sc = s div 2;
        incr(fmem_ptr);
        //  space_shrink = 1/3 * space
        font_info[fmem_ptr].sc = s div 3;
        incr(fmem_ptr);
        //  x_height 
        font_info[fmem_ptr].sc = x_ht;
        incr(fmem_ptr);
        //  quad = font size
        font_info[fmem_ptr].sc = font_size[font_ptr];
        incr(fmem_ptr);
        //  extra_space = 1/3 * space
        font_info[fmem_ptr].sc = s div 3;
        incr(fmem_ptr);
        //  cap_height 
        font_info[fmem_ptr].sc = cap_ht;
        incr(fmem_ptr);
        if (
            num_font_dimens
            == first_math_fontdimen + lastMathConstant
        ) {
            // \.{\\fontdimen9} = number of assigned 
            // fontdimens
            font_info[fmem_ptr].int = num_font_dimens;
            incr(fmem_ptr);
            for (k in 0 to lastMathConstant) {
                font_info[fmem_ptr].sc = (
                  get_ot_math_constant
                )(font_ptr, k);
                incr(fmem_ptr);
            }
        }
        font_mapping[font_ptr] = loaded_font_mapping;
        font_flags[font_ptr] = loaded_font_flags;
        load_native_font = font_ptr;
      done:
    }

    function do_locale_linebreaks(
      s: integer,
      len: integer,
    ) {
        var
          offs, prevOffs, i: integer,
          use_penalty, use_skip: boolean;
        
        if ((XeTeX_linebreak_locale == 0) || (len == 1)) {
            link(tail) = new_native_word_node(main_f, len);
            tail = link(tail);
            for (i in 0 to len - 1) {
                set_native_char(
                  tail,
                  i,
                  native_text[s + i],
                );
            }
            set_native_metrics(
              tail,
              XeTeX_use_glyph_metrics,
            );
        } else {
            use_skip = XeTeX_linebreak_skip != zero_glue;
            use_penalty = 
                XeTeX_linebreak_penalty != 0
                || !use_skip
            ;
            linebreak_start(
              main_f,
              XeTeX_linebreak_locale,
              native_text + s,
              len,
            );
            offs = 0;
            repeat {
                prevOffs = offs;
                offs = linebreak_next;
                if (offs > 0) {
                    if (prevOffs != 0) {
                        if (use_penalty) {
                            tail_append(
                              new_penalty(
                                XeTeX_linebreak_penalty,
                              ),
                            );
                        }
                        if (use_skip) {
                            tail_append(
                              new_param_glue(
                                XeTeX_linebreak_skip_code,
                              ),
                            );
                        }
                    }
                    link(tail) = new_native_word_node(
                      main_f,
                      offs - prevOffs,
                    );
                    tail = link(tail);
                    for (i in prevOffs to offs - 1) {
                        set_native_char(
                          tail,
                          i - prevOffs,
                          native_text[s + i],
                        );
                    }
                    set_native_metrics(
                      tail,
                      XeTeX_use_glyph_metrics,
                    );
                }
            } until (offs < 0);
        }
    }

    function bad_utf8_warning() {
        begin_diagnostic;
        print_nl(
          strpool!("Invalid UTF-8 byte or sequence"),
        );
        if (terminal_input) {
            print(strpool!(" in terminal input"));
        } else {
            print(strpool!(" at line "));
            print_int(line);
        }
        print(strpool!(" replaced by U+FFFD."));
        end_diagnostic(false);
    }

    function get_input_normalization_state(): integer {
        if (eqtb == nil) {
            // may be called before eqtb is initialized
            get_input_normalization_state = 0;
        } else {
            get_input_normalization_state = (
              XeTeX_input_normalization_state
            );
        }
    }

    function get_tracing_fonts_state(): integer {
        get_tracing_fonts_state = XeTeX_tracing_fonts_state;
    }
⟧
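
The segmentation loop in do_locale_linebreaks can be sketched in Python as follows. This is only an illustration under stated assumptions: the break iterator is modelled as a plain list of offsets that ends with a negative value, nodes are modelled as tuples, and the name split_at_breaks together with the tuple tags are hypothetical, not part of XeTeX.

```python
def split_at_breaks(text, offsets, penalty=0, use_skip=False):
    """Mimic the repeat ... until (offs < 0) loop over break offsets.

    `offsets` stands in for successive results of linebreak_next; a
    negative value ends the scan. The result lists ("word", str),
    ("penalty", int) and ("skip",) tuples in the order of appending.
    """
    # Same rule as the program: a penalty is used when it is nonzero
    # or when no skip is available to separate the segments.
    use_penalty = penalty != 0 or not use_skip
    nodes = []
    offs = 0
    for next_offs in offsets:
        prev_offs = offs
        offs = next_offs
        if offs > 0:
            if prev_offs != 0:
                # separators go between segments, never before the first
                if use_penalty:
                    nodes.append(("penalty", penalty))
                if use_skip:
                    nodes.append(("skip",))
            nodes.append(("word", text[prev_offs:offs]))
        if offs < 0:
            break
    return nodes
```

With breaks reported at offsets 3 and 7, the text is cut into two word nodes, separated by a penalty, by glue, or by both, depending on the two parameters.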

745. We also need to compute the change in style between mlists and their subsidiaries. The following macros define the subsidiary style for an overlined nucleus (cramped_style), for a subscript or a superscript (sub_style or sup_style), or for a numerator or denominator (num_style or denom_style).

// cramp the style
@define cramped_style(#) => 2 * (# div 2) + cramped
// smaller and cramped
@define sub_style(#) =>
    2 * (# div 4) + script_style + cramped
// smaller
@define sup_style(#) =>
    2 * (# div 4) + script_style + (# % 2)
// smaller unless already script-script
@define num_style(#) => # + 2 - 2 * (# div 6)
// smaller, cramped
@define denom_style(#) =>
    2 * (# div 2) + cramped + 2 - 2 * (# div 6)
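
The arithmetic behind these macros is easy to check by hand. The following Python sketch transcribes them directly; it assumes the usual numbering of styles defined elsewhere in the program (display_style = 0, text_style = 2, script_style = 4, script_script_style = 6, with cramped = 1 added for the cramped variants). The Python function names simply mirror the macro names.

```python
# Style codes, assumed to match the definitions made earlier in the program.
DISPLAY_STYLE, TEXT_STYLE, SCRIPT_STYLE, SCRIPT_SCRIPT_STYLE = 0, 2, 4, 6
CRAMPED = 1

def cramped_style(c):
    return 2 * (c // 2) + CRAMPED              # cramp the style

def sub_style(c):
    return 2 * (c // 4) + SCRIPT_STYLE + CRAMPED   # smaller and cramped

def sup_style(c):
    return 2 * (c // 4) + SCRIPT_STYLE + (c % 2)   # smaller

def num_style(c):
    return c + 2 - 2 * (c // 6)     # smaller unless already script-script

def denom_style(c):
    return 2 * (c // 2) + CRAMPED + 2 - 2 * (c // 6)   # smaller, cramped

if __name__ == "__main__":
    for c in range(8):
        print(c, cramped_style(c), sub_style(c), sup_style(c),
              num_style(c), denom_style(c))
```

For instance, a superscript in display style (0) is set in script style (4), while a subscript is set in cramped script style (5); once script-script style is reached, no further reduction takes place.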

746. When the style changes, the following piece of program computes associated information:

⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧ = ⟦
    {
        if (cur_style < script_style) {
            cur_size = text_size;
        } else {
            cur_size = 
                script_size
                * ((cur_style - text_style) div 2)
            ;
        }
        cur_mu = x_over_n(math_quad(cur_size), 18);
    }
⟧
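
As a rough illustration of this computation, here is a Python sketch. It assumes text_size = 0 and script_size = 256 (the latter value is an assumption about this implementation and is not stated in the present section), and it models math_quad as a caller-supplied function from a size code to a scaled quad width. The helper x_over_n below only reproduces truncation toward zero; it ignores the remainder and the error checks of the real routine.

```python
TEXT_STYLE, SCRIPT_STYLE = 2, 4
TEXT_SIZE = 0
SCRIPT_SIZE = 256   # assumed value of script_size

def x_over_n(x, n):
    """Integer quotient of x by n > 0, truncated toward zero."""
    return x // n if x >= 0 else -((-x) // n)

def size_and_mu(cur_style, math_quad):
    """Return (cur_size, cur_mu) for a style code.

    `math_quad` is a hypothetical callable mapping a size code to the
    quad width (in scaled points) of the math font at that size.
    """
    if cur_style < SCRIPT_STYLE:
        cur_size = TEXT_SIZE
    else:
        cur_size = SCRIPT_SIZE * ((cur_style - TEXT_STYLE) // 2)
    cur_mu = x_over_n(math_quad(cur_size), 18)   # one mu is 1/18 quad
    return cur_size, cur_mu
```

With a 10pt quad (655360 scaled points) in text size, the math unit comes out as 36408 scaled points.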

747. Here is a function that returns a pointer to a rule node having a given thickness t. The rule will extend horizontally to the boundary of the vlist that eventually contains it.

// construct the bar for a fraction
function fraction_rule(t: scaled): pointer {
    var
      p: pointer; // the new node
    
    p = new_rule;
    height(p) = t;
    depth(p) = 0;
    fraction_rule = p;
}

748. The overbar function returns a pointer to a vlist box that consists of a given box b, above which has been placed a kern of height k under a fraction rule of thickness t under additional space of height t.

function overbar(b: pointer, k, t: scaled): pointer {
    var
      p, q: pointer; // nodes being constructed
    
    p = new_kern(k);
    link(p) = b;
    q = fraction_rule(t);
    link(q) = p;
    p = new_kern(t);
    link(p) = q;
    overbar = vpack(p, natural);
}
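
The dimensions of the box that overbar returns can be worked out from the order of the list. The Python sketch below assumes that packing a vlist at its natural size adds up everything above the baseline of the last box and keeps the depth of that last box; overbar_dimensions is a hypothetical helper used only for this illustration.

```python
def overbar_dimensions(box_height, box_depth, k, t):
    """Height and depth of the vlist built by overbar, at natural size."""
    # From top to bottom the list is:
    #   kern t, rule of height t and depth 0, kern k, box b.
    height = t + t + k + box_height
    depth = box_depth
    return height, depth
```

For a box of height 500 and depth 100 with k = 30 and t = 20, the result has height 570 and depth 100.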

749. The var_delimiter function, which finds or constructs a sufficiently large delimiter, is the most interesting of the auxiliary functions that currently concern us. Given a pointer d to a delimiter field in some noad, together with a size code s and a vertical distance v, this function returns a pointer to a box that contains the smallest variant of d whose height plus depth is v or more. (And if no variant is large enough, it returns the largest available variant.) In particular, this routine will construct arbitrarily large delimiters from extensible components, if d leads to such characters.

The value returned is a box whose shift_amount has been set so that the box is vertically centered with respect to the axis in the given size. If a built-up symbol is returned, the height of the box before shifting will be the height of its topmost component.

⟦752 Declare subprocedures for |var_delimiter|⟧

function stack_glyph_into_box(
  b: pointer,
  f: internal_font_number,
  g: integer,
) {
    var p, q: pointer;
    
    p = get_node(glyph_node_size);
    type(p) = whatsit_node;
    subtype(p) = glyph_node;
    native_font(p) = f;
    native_glyph(p) = g;
    set_native_glyph_metrics(p, 1);
    if (type(b) == hlist_node) {
        q = list_ptr(b);
        if (q == null) {
            list_ptr(b) = p;
        } else {
            while (link(q) != null) {
                q = link(q);
            }
            link(q) = p;
            if ((height(b) < height(p))) {
                height(b) = height(p);
            }
            if ((depth(b) < depth(p))) {
                depth(b) = depth(p);
            }
        }
    } else {
        link(p) = list_ptr(b);
        list_ptr(b) = p;
        height(b) = height(p);
        if ((width(b) < width(p))) {
            width(b) = width(p);
        }
    }
}

function stack_glue_into_box(b: pointer, min, max: scaled) {
    var p, q: pointer;
    
    q = new_spec(zero_glue);
    width(q) = min;
    stretch(q) = max - min;
    p = new_glue(q);
    if (type(b) == hlist_node) {
        q = list_ptr(b);
        if (q == null) {
            list_ptr(b) = p;
        } else {
            while (link(q) != null) {
                q = link(q);
            }
            link(q) = p;
        }
    } else {
        link(p) = list_ptr(b);
        list_ptr(b) = p;
        height(b) = height(p);
        width(b) = width(p);
    }
}

// return a box with height/width at least s, using font f,
// with glyph assembly info from a
function build_opentype_assembly(
  f: internal_font_number,
  a: void_pointer,
  s: scaled,
  horiz: boolean,
): pointer {
    var
      b: pointer, // the box we're constructing
      n: integer, // the number of repetitions of each 
      // extender
      i, j: integer, // indexes
      g: integer, // glyph code
      p: pointer, // temp pointer
      s_max, o, oo, prev_o, min_o: scaled,
      no_extenders: boolean,
      nat, str: scaled; // natural size, stretch
    
    b = new_null_box;
    if (horiz) {
        type(b) = hlist_node;
    } else {
        type(b) = vlist_node;
    }
    // figure out how many repeats of each extender to 
    // use
    n = -1;
    no_extenders = true;
    min_o = ot_min_connector_overlap(f);
    repeat {
        // calc max possible size with this number of 
        // extenders
        n = n + 1;
        s_max = 0;
        prev_o = 0;
        for (i in 0 to ot_part_count(a) - 1) {
            if (ot_part_is_extender(a, i)) {
                no_extenders = false;
                for (j in 1 to n) {
                    o = ot_part_start_connector(f, a, i);
                    if (min_o < o) {
                        o = min_o;
                    }
                    if (prev_o < o) {
                        o = prev_o;
                    }
                    s_max = 
                        s_max
                        - o + ot_part_full_advance(f, a, i)
                    ;
                    prev_o = ot_part_end_connector(f, a, i);
                }
            } else {
                o = ot_part_start_connector(f, a, i);
                if (min_o < o) {
                    o = min_o;
                }
                if (prev_o < o) {
                    o = prev_o;
                }
                s_max = 
                    s_max
                    - o + ot_part_full_advance(f, a, i)
                ;
                prev_o = ot_part_end_connector(f, a, i);
            }
        }
    } until ((s_max >= s) || no_extenders);
    // assemble box using n copies of each extender, 
    // with appropriate glue wherever an overlap occurs
    prev_o = 0;
    for (i in 0 to ot_part_count(a) - 1) {
        if (ot_part_is_extender(a, i)) {
            for (j in 1 to n) {
                o = ot_part_start_connector(f, a, i);
                if (prev_o < o) {
                    o = prev_o;
                }
                // max overlap
                oo = o;
                if (min_o < o) {
                    o = min_o;
                }
                if (oo > 0) {
                    stack_glue_into_box(b, -oo, -o);
                }
                g = ot_part_glyph(a, i);
                stack_glyph_into_box(b, f, g);
                prev_o = ot_part_end_connector(f, a, i);
            }
        } else {
            o = ot_part_start_connector(f, a, i);
            if (prev_o < o) {
                o = prev_o;
            }
            // max overlap
            oo = o;
            if (min_o < o) {
                o = min_o;
            }
            if (oo > 0) {
                stack_glue_into_box(b, -oo, -o);
            }
            g = ot_part_glyph(a, i);
            stack_glyph_into_box(b, f, g);
            prev_o = ot_part_end_connector(f, a, i);
        }
    }
    // find natural size and total stretch of the box
    p = list_ptr(b);
    nat = 0;
    str = 0;
    while (p != null) {
        if (type(p) == whatsit_node) {
            if (horiz) {
                nat = nat + width(p);
            } else {
                nat = nat + height(p) + depth(p);
            }
        } else if (type(p) == glue_node) {
            nat = nat + width(glue_ptr(p));
            str = str + stretch(glue_ptr(p));
        }
        p = link(p);
    }
    // set glue so as to stretch the connections if 
    // needed
    o = 0;
    if ((s > nat) && (str > 0)) {
        // don't stretch more than str 
        o = (s - nat);
        if ((o > str)) {
            o = str;
        }
        glue_order(b) = normal;
        glue_sign(b) = stretching;
        glue_set(b) = unfloat(o / str);
        if (horiz) {
            width(b) = nat + round(str * float(glue_set(b)));
        } else {
            height(b) = 
                nat
                + round(str * float(glue_set(b)))
            ;
        }
    } else if (horiz) {
        width(b) = nat;
    } else {
        height(b) = nat;
    }
    build_opentype_assembly = b;
}
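
The first loop of build_opentype_assembly, which decides how many copies of each extender are needed, may be easier to follow in isolation. The Python sketch below models each part of the assembly as a small record; the Part class and the function names are hypothetical, and the connector and advance values would in reality come from the font through the ot_part_... routines.

```python
from dataclasses import dataclass

@dataclass
class Part:
    is_extender: bool
    start_connector: int
    end_connector: int
    full_advance: int

def max_size(parts, n, min_overlap):
    """Largest size obtainable with n copies of each extender."""
    s_max = 0
    prev_o = 0
    for part in parts:
        copies = n if part.is_extender else 1
        for _ in range(copies):
            # the overlap is limited by this part's start connector, by
            # the previous part's end connector, and by the font minimum
            o = min(part.start_connector, min_overlap, prev_o)
            s_max += part.full_advance - o
            prev_o = part.end_connector
    return s_max

def extender_repeats(parts, target, min_overlap):
    """Smallest n whose maximum size reaches target (or 0 if no extenders)."""
    has_extender = any(p.is_extender for p in parts)
    n = 0
    while True:
        s_max = max_size(parts, n, min_overlap)
        if s_max >= target or not has_extender:
            return n, s_max
        n += 1
```

Like the loop it imitates, this sketch assumes that each extender contributes a positive amount per copy; otherwise the search would not terminate.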

function var_delimiter(
  d: pointer,
  s: integer,
  v: scaled,
): pointer {
    label found, continue;
    var
      b: pointer, // the box that will be constructed
      ot_assembly_ptr: void_pointer,
      f, g: internal_font_number, // best-so-far and 
      // tentative font codes
      c, x, y: quarterword, // best-so-far and tentative 
      // character codes
      m, n: integer, // the number of extensible pieces
      u: scaled, // height-plus-depth of a tentative 
      // character
      w: scaled, // largest height-plus-depth so far
      q: four_quarters, // character info
      hd: eight_bits, // height-depth byte
      r: four_quarters, // extensible pieces
      z: integer, // runs through font family members
      large_attempt: boolean; // are we trying the ``large'' 
      // variant?
    
    f = null_font;
    w = 0;
    large_attempt = false;
    z = small_fam(d);
    x = small_char(d);
    ot_assembly_ptr = nil;
    loop {
        ⟦750 Look at the variants of |(z,x)|; set |f| and |c| whenever a better character is found; |goto found| as soon as a large enough variant is encountered⟧
        if (large_attempt) {
            // there were none large enough
            goto found;
        }
        large_attempt = true;
        z = large_fam(d);
        x = large_char(d);
    }
  found:
    if (f != null_font) {
        if (!is_ot_font(f)) {
            ⟦753 Make variable |b| point to a box for |(f,c)|⟧
        } else {
            // for OT fonts, c is the glyph ID to use
            if (ot_assembly_ptr != nil) {
                b = build_opentype_assembly(
                  f,
                  ot_assembly_ptr,
                  v,
                  0,
                );
            } else {
                b = new_null_box;
                type(b) = vlist_node;
                list_ptr(b) = get_node(glyph_node_size);
                type(list_ptr(b)) = whatsit_node;
                subtype(list_ptr(b)) = glyph_node;
                native_font(list_ptr(b)) = f;
                native_glyph(list_ptr(b)) = c;
                set_native_glyph_metrics(list_ptr(b), 1);
                width(b) = width(list_ptr(b));
                height(b) = height(list_ptr(b));
                depth(b) = depth(list_ptr(b));
            }
        }
    } else {
        b = new_null_box;
        // use this width if no delimiter was found
        width(b) = null_delimiter_space;
    }
    shift_amount(b) = 
        half(height(b) - depth(b))
        - axis_height(s)
    ;
    free_ot_assembly(ot_assembly_ptr);
    var_delimiter = b;
}
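
The final centering step of var_delimiter can be checked with a little arithmetic. The Python sketch below assumes that half(x) rounds halves upward for odd x, as the routine of that name does elsewhere in the program, and that a positive shift_amount moves a box downward; centering_shift is a hypothetical name.

```python
def half(x):
    """Half of x; for odd x this is (x + 1) / 2, an exact division."""
    return (x + 1) // 2 if x % 2 else x // 2

def centering_shift(height, depth, axis_height):
    """The shift that centers a box of given height and depth on the axis."""
    return half(height - depth) - axis_height

if __name__ == "__main__":
    h, d, a = 10, 2, 3
    shift = centering_shift(h, d, a)
    # after shifting down by `shift`, the box reaches h - shift above
    # and d + shift below the baseline
    print(shift, h - shift - a, a + d + shift)
```

In this example the shift is 1, and the box then extends 6 units above and 6 units below the axis.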

750. The search process is complicated slightly by the facts that some of the characters might not be present in some of the fonts, and they might not be probed in increasing order of height.

⟦750 Look at the variants of |(z,x)|; set |f| and |c| whenever a better character is found; |goto found| as soon as a large enough variant is encountered⟧ = ⟦
    if ((z != 0) || (x != min_quarterword)) {
        z = z + s + script_size;
        repeat {
            z = z - script_size;
            g = fam_fnt(z);
            if (g != null_font) {
                ⟦751 Look at the list of characters starting with |x| in font |g|; set |f| and |c| whenever a better character is found; |goto found| as soon as a large enough variant is encountered⟧
            }
        } until (z < script_size);
    }
⟧
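
The order in which family members are probed follows from the way z is stepped. Here is a small Python sketch, assuming script_size = 256 (an assumption about this implementation, not stated in the present section); probe_order is a hypothetical name.

```python
SCRIPT_SIZE = 256   # assumed value of script_size

def probe_order(fam, size):
    """Indices passed to fam_fnt for family `fam`, starting at size code `size`."""
    z = fam + size + SCRIPT_SIZE
    order = []
    while True:
        z -= SCRIPT_SIZE
        order.append(z)
        if z < SCRIPT_SIZE:   # text size has just been tried
            break
    return order
```

Starting from the script-script size, family 3 is therefore examined at the script-script, script, and text sizes, in that order.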

751.

⟦751 Look at the list of characters starting with |x| in font |g|; set |f| and |c| whenever a better character is found; |goto found| as soon as a large enough variant is encountered⟧ = ⟦
    if (is_ot_font(g)) {
        x = map_char_to_glyph(g, x);
        f = g;
        c = x;
        w = 0;
        n = 0;
        repeat {
            y = get_ot_math_variant(
              g,
              x,
              n,
              addressof(u),
              0,
            );
            if (u > w) {
                c = y;
                w = u;
                if (u >= v) {
                    goto found;
                }
            }
            n = n + 1;
        } until (u < 0);
        // if we get here, then we didn't find a big 
        // enough glyph; check if the char is extensible
        ot_assembly_ptr = get_ot_assembly_ptr(g, x, 0);
        if (ot_assembly_ptr != nil) {
            goto found;
        }
    } else {
        y = x;
        if ((qo(y) >= font_bc[g]) && (qo(y) <= font_ec[g])) {
          continue:
            q = orig_char_info(g)(y);
            if (char_exists(q)) {
                if (char_tag(q) == ext_tag) {
                    f = g;
                    c = y;
                    goto found;
                }
                hd = height_depth(q);
                u = char_height(g)(hd) + char_depth(g)(hd);
                if (u > w) {
                    f = g;
                    c = y;
                    w = u;
                    if (u >= v) {
                        goto found;
                    }
                }
                if (char_tag(q) == list_tag) {
                    y = rem_byte(q);
                    goto continue;
                }
            }
        }
    }
⟧

752. Here is a subroutine that creates a new box, whose list contains a single character, and whose width includes the italic correction for that character. The height or depth of the box will be negative, if the height or depth of the character is negative; thus, this routine may deliver a slightly different result than hpack would produce.

⟦752 Declare subprocedures for |var_delimiter|⟧ = ⟦
    function char_box(
      f: internal_font_number,
      c: integer,
    ): pointer {
        var
          q: four_quarters,
          hd: eight_bits, //  height_depth byte
          b, p: pointer; // the new box and its character 
          // node
        
        if (is_native_font(f)) {
            b = new_null_box;
            p = new_native_character(f, c);
            list_ptr(b) = p;
            height(b) = height(p);
            width(b) = width(p);
            if (depth(p) < 0) {
                depth(b) = 0;
            } else {
                depth(b) = depth(p);
            }
        } else {
            q = char_info(f)(c);
            hd = height_depth(q);
            b = new_null_box;
            width(b) = char_width(f)(q) + char_italic(f)(q);
            height(b) = char_height(f)(hd);
            depth(b) = char_depth(f)(hd);
            p = get_avail;
            character(p) = c;
            font(p) = f;
        }
        list_ptr(b) = p;
        char_box = b;
    }
⟧

753. When the following code is executed, char_tag(q) will be equal to ext_tag if and only if a built-up symbol is supposed to be returned.

⟦753 Make variable |b| point to a box for |(f,c)|⟧ = ⟦
    if (char_tag(q) == ext_tag) {
        ⟦756 Construct an extensible character in a new box |b|, using recipe |rem_byte(q)| and font |f|⟧
    } else {
        b = char_box(f, c);
    }
⟧

754. When we build an extensible character, it’s handy to have the following subroutine, which puts a given character on top of the characters already in box b:

⟦752 Declare subprocedures for |var_delimiter|⟧ += ⟦
    function stack_into_box(
      b: pointer,
      f: internal_font_number,
      c: quarterword,
    ) {
        var
          p: pointer; // new node placed into b 
        
        p = char_box(f, c);
        link(p) = list_ptr(b);
        list_ptr(b) = p;
        height(b) = height(p);
    }
⟧

755. Another handy subroutine computes the height plus depth of a given character:

⟦752 Declare subprocedures for |var_delimiter|⟧ += ⟦
    function height_plus_depth(
      f: internal_font_number,
      c: quarterword,
    ): scaled {
        var
          q: four_quarters,
          hd: eight_bits; //  height_depth byte
        
        q = char_info(f)(c);
        hd = height_depth(q);
        height_plus_depth = 
            char_height(f)(hd)
            + char_depth(f)(hd)
        ;
    }
⟧

756.

⟦756 Construct an extensible character in a new box |b|, using recipe |rem_byte(q)| and font |f|⟧ = ⟦
    {
        b = new_null_box;
        type(b) = vlist_node;
        r = font_info[exten_base[f] + rem_byte(q)].qqqq;
        ⟦757 Compute the minimum suitable height, |w|, and the corresponding number of extension steps, |n|; also set |width(b)|⟧
        c = ext_bot(r);
        if (c != min_quarterword) {
            stack_into_box(b, f, c);
        }
        c = ext_rep(r);
        for (m in 1 to n) {
            stack_into_box(b, f, c);
        }
        c = ext_mid(r);
        if (c != min_quarterword) {
            stack_into_box(b, f, c);
            c = ext_rep(r);
            for (m in 1 to n) {
                stack_into_box(b, f, c);
            }
        }
        c = ext_top(r);
        if (c != min_quarterword) {
            stack_into_box(b, f, c);
        }
        depth(b) = w - height(b);
    }
⟧

757. The width of an extensible character is the width of the repeatable module. If this module does not have positive height plus depth, we don’t use any copies of it; otherwise we use as few as possible (in groups of two if there is a middle part).

⟦757 Compute the minimum suitable height, |w|, and the corresponding number of extension steps, |n|; also set |width(b)|⟧ = ⟦
    c = ext_rep(r)

    u = height_plus_depth(f, c)

    w = 0

    q = char_info(f)(c)

    width(b) = char_width(f)(q) + char_italic(f)(q)

    c = ext_bot(r)

    if (c != min_quarterword) {
        w = w + height_plus_depth(f, c);
    }

    c = ext_mid(r)

    if (c != min_quarterword) {
        w = w + height_plus_depth(f, c);
    }

    c = ext_top(r)

    if (c != min_quarterword) {
        w = w + height_plus_depth(f, c);
    }

    n = 0

    if (u > 0) {
        while (w < v) {
            w = w + u;
            incr(n);
            if (ext_mid(r) != min_quarterword) {
                w = w + u;
            }
        }
    }
⟧
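
The counting rule just described can be tried out with plain numbers. In the Python sketch below, each piece is represented only by its height plus depth, and a missing piece (min_quarterword in the program) is represented by None; extension_steps is a hypothetical name.

```python
def extension_steps(v, rep, bot=None, mid=None, top=None):
    """Return (n, w): the number of extension steps and the resulting height.

    v   -- the desired height plus depth
    rep -- height plus depth of the repeatable module
    bot, mid, top -- height plus depth of the optional pieces, or None
    """
    w = sum(h for h in (bot, mid, top) if h is not None)
    n = 0
    if rep > 0:
        while w < v:
            w += rep
            n += 1
            if mid is not None:
                w += rep      # with a middle piece, modules come in pairs
    return n, w
```

For a target of 1000 with a repeatable module of size 100 and top and bottom pieces of size 200 each, six steps are needed; if a middle piece of size 150 is present as well, three steps suffice, since each step then adds two modules.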

758. The next subroutine is much simpler; it is used for numerators and denominators of fractions as well as for displayed operators and their limits above and below. It takes a given box b and changes it so that the new box is centered in a box of width w. The centering is done by putting \hss glue at the left and right of the list inside b, then packaging the new box; thus, the actual box might not really be centered, if it already contains infinite glue.

The given box might contain a single character whose italic correction has been added to the width of the box; in this case a compensating kern is inserted.

function rebox(b: pointer, w: scaled): pointer {
    var
      p: pointer, // temporary register for list 
      // manipulation
      f: internal_font_number, // font in a one-character 
      // box
      v: scaled; // width of a character without italic 
      // correction
    
    if ((width(b) != w) && (list_ptr(b) != null)) {
        if (type(b) == vlist_node) {
            b = hpack(b, natural);
        }
        p = list_ptr(b);
        if ((is_char_node(p)) && (link(p) == null)) {
            f = font(p);
            v = char_width(f)(char_info(f)(character(p)));
            if (v != width(b)) {
                link(p) = new_kern(width(b) - v);
            }
        }
        free_node(b, box_node_size);
        b = new_glue(ss_glue);
        link(b) = p;
        while (link(p) != null) {
            p = link(p);
        }
        link(p) = new_glue(ss_glue);
        rebox = hpack(b, w, exactly);
    } else {
        width(b) = w;
        rebox = b;
    }
}

759. Here is a subroutine that creates a new glue specification from another one that is expressed in ‘mu’, given the value of the math unit.

@define mu_mult(#) =>
    nx_plus_y(n, #, xn_over_d(#, f, 0x10000))
function math_glue(g: pointer, m: scaled): pointer {
    var
      p: pointer, // the new glue specification
      n: integer, // integer part of m 
      f: scaled; // fraction part of m 
    
    n = x_over_n(m, 0x10000);
    f = remainder;
    if (f < 0) {
        decr(n);
        f = f + 0x10000;
    }
    p = get_node(glue_spec_size);
    // convert \.{mu} to \.{pt}
    width(p) = mu_mult(width(g));
    stretch_order(p) = stretch_order(g);
    if (stretch_order(p) == normal) {
        stretch(p) = mu_mult(stretch(g));
    } else {
        stretch(p) = stretch(g);
    }
    shrink_order(p) = shrink_order(g);
    if (shrink_order(p) == normal) {
        shrink(p) = mu_mult(shrink(g));
    } else {
        shrink(p) = shrink(g);
    }
    math_glue = p;
}
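
The conversion performed by mu_mult splits the math unit m into an integer part n and a nonnegative fraction f, so that a dimension x in mu becomes n times x plus x times f/65536. The Python sketch below follows the same steps; it assumes that x_over_n and xn_over_d truncate toward zero, and it omits the overflow checks that the real routines perform. The name mu_to_scaled is hypothetical.

```python
UNITY = 0x10000   # 2^16, the scaled representation of 1.0

def x_over_n(x, n):
    """Quotient and remainder of x by n > 0, both truncated toward zero."""
    if x >= 0:
        return x // n, x % n
    return -((-x) // n), -((-x) % n)

def xn_over_d(x, n, d):
    """x * n / d for 0 <= n <= d, truncated toward zero."""
    q = abs(x) * n // d
    return q if x >= 0 else -q

def mu_to_scaled(x, m):
    """Convert x (a scaled multiple of one mu) using the math unit m."""
    n, f = x_over_n(m, UNITY)
    if f < 0:            # make the fraction part nonnegative
        n -= 1
        f += UNITY
    return n * x + xn_over_d(x, f, UNITY)
```

For example, with m = 36408 (one eighteenth of a 10pt quad), a glue width of 3mu, stored as 3 * 65536, becomes 109224 scaled points.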

760. The math_kern subroutine removes mu_glue from a kern node, given the value of the math unit.

function math_kern(p: pointer, m: scaled) {
    var
      n: integer, // integer part of m 
      f: scaled; // fraction part of m 
    
    if (subtype(p) == mu_glue) {
        n = x_over_n(m, 0x10000);
        f = remainder;
        if (f < 0) {
            decr(n);
            f = f + 0x10000;
        }
        width(p) = mu_mult(width(p));
        subtype(p) = explicit;
    }
}

761. Sometimes it is necessary to destroy an mlist. The following subroutine empties the current list, assuming that abs(mode) == mmode.

function flush_math() {
    flush_node_list(link(head));
    flush_node_list(incompleat_noad);
    link(head) = null;
    tail = head;
    incompleat_noad = null;
}

762. [36] Typesetting math formulas. TEX’s most important routine for dealing with formulas is called mlist_to_hlist. After a formula has been scanned and represented as an mlist, this routine converts it to an hlist that can be placed into a box or incorporated into the text of a paragraph. There are three implicit parameters, passed in global variables: cur_mlist points to the first node or noad in the given mlist (and it might be null); cur_style is a style code; and mlist_penalties is true if penalty nodes for potential line breaks are to be inserted into the resulting hlist. After mlist_to_hlist has acted, link(temp_head) points to the translated hlist.

Since mlists can be inside mlists, the procedure is recursive. And since this is not part of TEX’s inner loop, the program has been written in a manner that stresses compactness over efficiency.

⟦13 Global variables⟧ += ⟦
    // beginning of mlist to be translated
    var cur_mlist: pointer;

    // style code at current place in the list
    var cur_style: small_number;

    // size code corresponding to cur_style 
    var cur_size: integer;

    // the math unit width corresponding to cur_size 
    var cur_mu: scaled;

    // should mlist_to_hlist insert penalties?
    var mlist_penalties: boolean;
⟧

763. The recursion in mlist_to_hlist is due primarily to a subroutine called clean_box that puts a given noad field into a box using a given math style; mlist_to_hlist can call clean_box, which can call mlist_to_hlist.

The box returned by clean_box is “clean” in the sense that its shift_amount is zero.

forward_declaration mlist_to_hlist();

function clean_box(p: pointer, s: small_number): pointer {
    label found;
    var
      q: pointer, // beginning of a list to be boxed
      save_style: small_number, //  cur_style to be restored
      x: pointer, // box to be returned
      r: pointer; // temporary pointer
    
    case math_type(p) {
      math_char:
        cur_mlist = new_noad;
        mem[nucleus(cur_mlist)] = mem[p];
      sub_box:
        q = info(p);
        goto found;
      sub_mlist:
        cur_mlist = info(p);
      othercases:
        q = new_null_box;
        goto found;
    }
    save_style = cur_style;
    cur_style = s;
    mlist_penalties = false;
    mlist_to_hlist;
    // recursive call
    q = link(temp_head);
    // restore the style
    cur_style = save_style;
    ⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧
  found:
    if (is_char_node(q) || (q == null)) {
        x = hpack(q, natural);
    } else if (
        (link(q) == null)
        && (type(q) <= vlist_node) && (shift_amount(q) == 0)
    ) {
        // it's already clean
        x = q;
    } else {
        x = hpack(q, natural);
    }
    ⟦764 Simplify a trivial box⟧
    clean_box = x;
}

764. Here we save memory space in a common case.

⟦764 Simplify a trivial box⟧ = ⟦
    q = list_ptr(x)

    if (is_char_node(q)) {
        r = link(q);
        if (r != null) {
            if (link(r) == null) {
                if (!is_char_node(r)) {
                    // unneeded italic correction
                    if (type(r) == kern_node) {
                        free_node(r, medium_node_size);
                        link(q) = null;
                    }
                }
            }
        }
    }
⟧

765. It is convenient to have a procedure that converts a math_char field to an “unpacked” form. The fetch routine sets cur_f, cur_c, and cur_i to the font code, character code, and character information bytes of a given noad field. It also takes care of issuing error messages for nonexistent characters; in such cases, char_exists(cur_i) will be false after fetch has acted, and the field will also have been reset to empty.

// unpack the math_char field a 
function fetch(a: pointer) {
    cur_c = cast_to_ushort(character(a));
    cur_f = fam_fnt(fam(a) + cur_size);
    cur_c = 
        cur_c
        + (plane_and_fam_field(a) div 0x100) * 0x10000
    ;
    if (cur_f == null_font) {
        ⟦766 Complain about an undefined family and set |cur_i| null⟧
    } else if (is_native_font(cur_f)) {
        cur_i = null_character;
    } else {
        if (
            (qo(cur_c) >= font_bc[cur_f])
            && (qo(cur_c) <= font_ec[cur_f])
        ) {
            cur_i = orig_char_info(cur_f)(cur_c);
        } else {
            cur_i = null_character;
        }
        if (!(char_exists(cur_i))) {
            char_warning(cur_f, qo(cur_c));
            math_type(a) = empty;
            cur_i = null_character;
        }
    }
}
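
The way fetch reassembles the character code from two fields can be illustrated as follows. The Python sketch takes the high part of plane_and_fam_field as the plane number, exactly as in the code above; that the low byte holds the family number is an assumption made here for the sake of the example, and unpack_math_char is a hypothetical name.

```python
def unpack_math_char(character, plane_and_fam):
    """Return (fam, cur_c) from the two fields of a math_char.

    character     -- the low 16 bits of the character code
    plane_and_fam -- plane number times 0x100, plus (by assumption) the family
    """
    plane = plane_and_fam // 0x100
    fam = plane_and_fam % 0x100          # assumed layout of the low byte
    cur_c = character + plane * 0x10000
    return fam, cur_c
```

Thus a character field of 0xD44E together with a plane-and-family field of 0x103 yields family 3 and the character code 0x1D44E.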

766.

⟦766 Complain about an undefined family and set |cur_i| null⟧ = ⟦
    {
        print_err(strpool!(""));
        print_size(cur_size);
        print_char(ord!(" "));
        print_int(fam(a));
        print(strpool!(" is undefined (character "));
        print_ASCII(qo(cur_c));
        print_char(ord!(")"));
        help4(
          strpool!("Somewhere in the math formula just ended, you used the"),
        )(
          strpool!("stated character from an undefined font family. For example,"),
        )(
          strpool!("plain TeX doesn't allow \\it or \\sl in subscripts. Proceed,"),
        )(
          strpool!("and I'll try to forget that I needed that character."),
        );
        error;
        cur_i = null_character;
        math_type(a) = empty;
    }
⟧

767. The outputs of fetch are placed in global variables.

⟦13 Global variables⟧ += ⟦
    // the font field of a math_char
    var cur_f: internal_font_number;

    // the character field of a math_char
    var cur_c: integer;

    // the char_info of a math_char, or a lig/kern instruction
    var cur_i: four_quarters;
⟧

768. We need to do a lot of different things, so mlist_to_hlist makes two passes over the given mlist.

The first pass does most of the processing: it removes “mu” spacing from glue, it recursively evaluates all subsidiary mlists so that only the top-level mlist remains to be handled, it puts fractions and square roots and such things into boxes, it attaches subscripts and superscripts, and it computes the overall height and depth of the top-level mlist so that the size of delimiters for a left_noad and a right_noad will be known. The hlist resulting from each noad is recorded in that noad’s new_hlist field, an integer field that replaces the nucleus or thickness.

The second pass eliminates all noads and inserts the correct glue and penalties between nodes.

// the translation of an mlist
@define new_hlist(#) => mem[nucleus(#)].int

769. Here is the overall plan of mlist_to_hlist, and the list of its local variables.

// go here when a noad has been fully translated
@define done_with_noad => 80
// go here when a node has been fully converted
@define done_with_node => 81
// go here to update max_h and max_d 
@define check_dimensions => 82
// go here to delete q and move to the next node
@define delete_q => 83
⟦777 Declare math construction procedures⟧

function mlist_to_hlist() {
    label
        reswitch,
        check_dimensions,
        done_with_noad,
        done_with_node,
        delete_q,
        done;
    var
      mlist: pointer, // beginning of the given list
      penalties: boolean, // should penalty nodes be 
      // inserted?
      style: small_number, // the given style
      save_style: small_number, // holds cur_style during 
      // recursion
      q: pointer, // runs through the mlist
      r: pointer, // the most recent noad preceding q 
      r_type: small_number, // the type of noad r , or 
      // op_noad if r == null 
      t: small_number, // the effective type of noad q 
      // during the second pass
      p, x, y, z: pointer, // temporary registers for list 
      // construction
      pen: integer, // a penalty to be inserted
      s: small_number, // the size of a noad to be deleted
      max_h, max_d: scaled, // maximum height and depth of 
      // the list translated so far
      delta: scaled; // offset between subscript and 
      // superscript
    
    mlist = cur_mlist;
    penalties = mlist_penalties;
    // tuck global parameters away as local variables
    style = cur_style;
    q = mlist;
    r = null;
    r_type = op_noad;
    max_h = 0;
    max_d = 0;
    ⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧
    while (q != null) {
        ⟦770 Process node-or-noad |q| as much as possible in preparation for the second pass of |mlist_to_hlist|, then move to the next item in the mlist⟧
    }
    ⟦772 Convert \(a)a final |bin_noad| to an |ord_noad|⟧
    ⟦808 Make a second pass over the mlist, removing all noads and inserting the proper spacing and penalties⟧
}

770. We use the fact that no character nodes appear in an mlist, hence the field type(q) is always present.

⟦770 Process node-or-noad |q| as much as possible in preparation for the second pass of |mlist_to_hlist|, then move to the next item in the mlist⟧ = ⟦
    {
        ⟦771 Do first-pass processing based on |type(q)|; |goto done_with_noad| if a noad has been fully processed, |goto check_dimensions| if it has been translated into |new_hlist(q)|, or |goto done_with_node| if a node has been fully processed⟧
      check_dimensions:
        z = hpack(new_hlist(q), natural);
        if (height(z) > max_h) {
            max_h = height(z);
        }
        if (depth(z) > max_d) {
            max_d = depth(z);
        }
        free_node(z, box_node_size);
      done_with_noad:
        r = q;
        r_type = type(r);
        if (r_type == right_noad) {
            r_type = left_noad;
            cur_style = style;
            ⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧
        }
      done_with_node:
        q = link(q);
    }
⟧

771. One of the things we must do on the first pass is change a bin_noad to an ord_noad if the bin_noad is not in the context of a binary operator. The values of r and r_type make this fairly easy.

⟦771 Do first-pass processing based on |type(q)|; |goto done_with_noad| if a noad has been fully processed, |goto check_dimensions| if it has been translated into |new_hlist(q)|, or |goto done_with_node| if a node has been fully processed⟧ = ⟦
    reswitch:

    delta = 0;

    case type(q) {
      bin_noad:
        case r_type {
          bin_noad,
          op_noad,
          rel_noad,
          open_noad,
          punct_noad,
          left_noad:
            type(q) = ord_noad;
            goto reswitch;
          othercases:
            do_nothing;
        }
      rel_noad, close_noad, punct_noad, right_noad:
        ⟦772 Convert \(a)a final |bin_noad| to an |ord_noad|⟧
        if (type(q) == right_noad) {
            goto done_with_noad;
        }
      ⟦776 Cases for noads that can follow a |bin_noad|⟧
      ⟦773 Cases for nodes that can appear in an mlist, after which we |goto done_with_node|⟧
      othercases:
        confusion(strpool!("mlist1"));
    }

    ⟦798 Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts⟧
⟧

772.

⟦772 Convert \(a)a final |bin_noad| to an |ord_noad|⟧ = ⟦
    if (r_type == bin_noad) {
        type(r) = ord_noad;
    }
⟧
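
The bin_noad rule of the last two sections can be modeled on a plain list of noad types. This Python sketch is an assumption-laden simplification: it ignores non-noad nodes (which leave r and r_type untouched in the real program), uses strings in place of the type codes, and the name demote_bins is hypothetical.

```python
# Types after which a bin_noad cannot act as a binary operator.
NOT_BEFORE_BIN = {"bin", "op", "rel", "open", "punct", "left"}
# Types before which a preceding bin_noad cannot act as a binary operator.
NOT_AFTER_BIN = {"rel", "close", "punct", "right"}


def demote_bins(types):
    """Return a copy of `types` with out-of-context 'bin' entries changed to 'ord'."""
    out = list(types)
    r, r_type = None, "op"          # the start of the list counts as an op_noad
    for q, t in enumerate(out):
        if t == "bin":
            if r_type in NOT_BEFORE_BIN:
                out[q] = "ord"
        elif t in NOT_AFTER_BIN and r_type == "bin":
            out[r] = "ord"          # the previous bin has no right operand
        r = q
        # a right_noad is remembered as a left_noad, as at done_with_noad
        r_type = "left" if out[q] == "right" else out[q]
    if r_type == "bin":             # a final bin_noad becomes an ord_noad
        out[r] = "ord"
    return out


print(demote_bins(["bin", "ord", "bin", "rel"]))
```

In this model a leading minus sign and an operator stranded before a relation both become ordinary symbols, while the operator in ord, bin, ord is left alone.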

773.

⟦773 Cases for nodes that can appear in an mlist, after which we |goto done_with_node|⟧ = ⟦
    style_node:

    {
        cur_style = subtype(q);
        ⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧
        goto done_with_node;
    }

    choice_node:

    ⟦774 Change this node to a style node followed by the correct choice, then |goto done_with_node|⟧

    
    ins_node,
    mark_node,
    adjust_node,
    whatsit_node,
    penalty_node,
    disc_node:
      goto done_with_node;

    rule_node:

    {
        if (height(q) > max_h) {
            max_h = height(q);
        }
        if (depth(q) > max_d) {
            max_d = depth(q);
        }
        goto done_with_node;
    }

    glue_node:

    {
        ⟦775 Convert \(m)math glue to ordinary glue⟧
        goto done_with_node;
    }

    kern_node:

    {
        math_kern(q, cur_mu);
        goto done_with_node;
    }
⟧

774.

@define choose_mlist(#) =>
    {
        p = #(q);
        #(q) = null;
    }
⟦774 Change this node to a style node followed by the correct choice, then |goto done_with_node|⟧ = ⟦
    {
        case cur_style div 2 {
          0:
            //  display_style == 0 
            choose_mlist(display_mlist);
          1:
            //  text_style == 2 
            choose_mlist(text_mlist);
          2:
            //  script_style == 4 
            choose_mlist(script_mlist);
          3:
            //  script_script_style == 6 
            choose_mlist(script_script_mlist);
          // there are no other cases
        }
        flush_node_list(display_mlist(q));
        flush_node_list(text_mlist(q));
        flush_node_list(script_mlist(q));
        flush_node_list(script_script_mlist(q));
        type(q) = style_node;
        subtype(q) = cur_style;
        width(q) = 0;
        depth(q) = 0;
        if (p != null) {
            z = link(q);
            link(q) = p;
            while (link(p) != null) {
                p = link(p);
            }
            link(p) = z;
        }
        goto done_with_node;
    }
⟧
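
The selection by `cur_style div 2` and the splice that follows can be imitated with ordinary Python lists. This is a sketch under stated assumptions: the dictionary representation of a choice node, the ("style", n) marker, and the function names are all hypothetical, and the odd style values are assumed to be the cramped variants of the even ones.

```python
FIELDS = ("display_mlist", "text_mlist", "script_mlist", "script_script_mlist")


def choice_field(cur_style: int) -> str:
    """Pick the branch of a choice node; style and style+1 share a branch."""
    return FIELDS[cur_style // 2]


def expand_choice(mlist, index, cur_style):
    """Replace the choice node at `index` by a style marker plus the chosen branch."""
    chosen = mlist[index][choice_field(cur_style)]
    return mlist[:index] + [("style", cur_style)] + list(chosen) + mlist[index + 1:]


choice = {
    "display_mlist": ["D"],
    "text_mlist": ["T1", "T2"],
    "script_mlist": [],
    "script_script_mlist": [],
}
print(expand_choice(["a", choice, "b"], 1, 2))
```

The style marker plays the role of the style_node that the program leaves in place of the choice node, so the material after the chosen branch is still processed in the style that was current.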

775. Conditional math glue (‘\nonscript’) results in a glue_node pointing to zero_glue, with subtype(q) == cond_math_glue; in such a case the node following will be eliminated if it is a glue or kern node and if the current size is different from text_size. Unconditional math glue (‘\muskip’) is converted to normal glue by multiplying the dimensions by cur_mu.

⟦775 Convert \(m)math glue to ordinary glue⟧ = ⟦
    if (subtype(q) == mu_glue) {
        x = glue_ptr(q);
        y = math_glue(x, cur_mu);
        delete_glue_ref(x);
        glue_ptr(q) = y;
        subtype(q) = normal;
    } else if (
        (cur_size != text_size)
        && (subtype(q) == cond_math_glue)
    ) {
        p = link(q);
        if (p != null) {
            if (
                (type(p) == glue_node)
                || (type(p) == kern_node)
            ) {
                link(q) = link(p);
                link(p) = null;
                flush_node_list(p);
            }
        }
    }
⟧
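
The effect of ‘\nonscript’ can be shown on a list of tagged tuples. The Python sketch below covers only the conditional branch, not the mu_glue conversion; the tuple representation, the value chosen for TEXT_SIZE, and the name drop_after_nonscript are assumptions made for the illustration.

```python
TEXT_SIZE = 0  # placeholder for text_size; the actual value is defined elsewhere


def drop_after_nonscript(nodes, i, cur_size):
    """nodes[i] is conditional math glue; return the list with its successor
    removed when that successor is glue or a kern and we are not in text size."""
    if cur_size != TEXT_SIZE and i + 1 < len(nodes):
        if nodes[i + 1][0] in ("glue", "kern"):
            return nodes[:i + 1] + nodes[i + 2:]
    return list(nodes)


nodes = [("glue", "cond_math_glue"), ("kern", 5), ("ord", "x")]
print(drop_after_nonscript(nodes, 0, cur_size=1))
print(drop_after_nonscript(nodes, 0, cur_size=TEXT_SIZE))
```

Only one following node is ever examined, so a second glue or kern after the removed one survives.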

776.

⟦776 Cases for noads that can follow a |bin_noad|⟧ = ⟦
    left_noad:

    goto done_with_noad;

    fraction_noad:

    {
        make_fraction(q);
        goto check_dimensions;
    }

    op_noad:

    {
        delta = make_op(q);
        if (subtype(q) == limits) {
            goto check_dimensions;
        }
    }

    ord_noad:

    make_ord(q);

    open_noad, inner_noad:

    do_nothing;

    radical_noad:

    make_radical(q);

    over_noad:

    make_over(q);

    under_noad:

    make_under(q);

    accent_noad:

    make_math_accent(q);

    vcenter_noad:

    make_vcenter(q);
⟧

777. Most of the actual construction work of mlist_to_hlist is done by procedures with names like make_fraction, make_radical, etc. To illustrate the general setup of such procedures, let’s begin with a couple of simple ones.

⟦777 Declare math construction procedures⟧ = ⟦
    function make_over(q: pointer) {
        info(nucleus(q)) = overbar(
          clean_box(nucleus(q), cramped_style(cur_style)),
          3 * default_rule_thickness,
          default_rule_thickness,
        );
        math_type(nucleus(q)) = sub_box;
    }
⟧

778.

⟦777 Declare math construction procedures⟧ += ⟦
    function make_under(q: pointer) {
        var
          p, x, y: pointer, // temporary registers for box 
          // construction
          delta: scaled; // overall height plus depth
        
        x = clean_box(nucleus(q), cur_style);
        p = new_kern(3 * default_rule_thickness);
        link(x) = p;
        link(p) = fraction_rule(default_rule_thickness);
        y = vpack(x, natural);
        delta = 
            height(y)
            + depth(y) + default_rule_thickness
        ;
        height(y) = height(x);
        depth(y) = delta - height(y);
        info(nucleus(q)) = y;
        math_type(nucleus(q)) = sub_box;
    }
⟧

779.

⟦777 Declare math construction procedures⟧ += ⟦
    function make_vcenter(q: pointer) {
        var
          v: pointer, // the box that should be centered 
          // vertically
          delta: scaled; // its height plus depth
        
        v = info(nucleus(q));
        if (type(v) != vlist_node) {
            confusion(strpool!("vcenter"));
        }
        delta = height(v) + depth(v);
        height(v) = axis_height(cur_size) + half(delta);
        depth(v) = delta - height(v);
    }
⟧
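
The arithmetic of make_vcenter keeps the total height plus depth and moves the box so that its midpoint lies on the axis. A small Python sketch follows; the half function is written as I recall it from the arithmetic routines of the program (round to nearest, with odd values rounded upward), and center_on_axis is a hypothetical name.

```python
def half(x: int) -> int:
    """Halve an integer; odd values are rounded up, as with (x + 1) div 2."""
    return (x + 1) // 2 if x % 2 else x // 2


def center_on_axis(height: int, depth: int, axis_height: int):
    """Return the new (height, depth) of a box centered on the math axis."""
    delta = height + depth                  # total vertical extent is preserved
    new_height = axis_height + half(delta)
    return new_height, delta - new_height


print(center_on_axis(10, 0, 3))
print(center_on_axis(7, 2, 3))
```

With an even total the midpoint (height minus depth, halved) equals the axis height exactly; with an odd total the box sits half a unit high.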

780. According to the rules in the DVI file specifications, we ensure alignment between a square root sign and the rule above its nucleus by assuming that the baseline of the square-root symbol is the same as the bottom of the rule. The height of the square-root symbol will be the thickness of the rule, and the depth of the square-root symbol should exceed or equal the height-plus-depth of the nucleus plus a certain minimum clearance clr. The symbol will be placed so that the actual clearance is clr plus half the excess.

⟦777 Declare math construction procedures⟧ += ⟦
    function make_radical(q: pointer) {
        var
          x, y: pointer, // temporary registers for box 
          // construction
          f: internal_font_number,
          rule_thickness: scaled, // rule thickness
          delta, clr: scaled; // dimensions involved in the 
          // calculation
        
        f = fam_fnt(
          small_fam(left_delimiter(q)) + cur_size,
        );
        if (is_new_mathfont(f)) {
            rule_thickness = get_ot_math_constant(
              f,
              radicalRuleThickness,
            );
        } else {
            rule_thickness = default_rule_thickness;
        }
        x = clean_box(nucleus(q), cramped_style(cur_style));
        if (is_new_mathfont(f)) {
            // display style
            if (cur_style < text_style) {
                clr = get_ot_math_constant(
                  f,
                  radicalDisplayStyleVerticalGap,
                );
            } else {
                clr = get_ot_math_constant(
                  f,
                  radicalVerticalGap,
                );
            }
        } else {
            // display style
            if (cur_style < text_style) {
                clr = 
                    rule_thickness
                    + (abs(math_x_height(cur_size)) div 4)
                ;
            } else {
                clr = rule_thickness;
                clr = clr + (abs(clr) div 4);
            }
        }
        y = var_delimiter(
          left_delimiter(q),
          cur_size,
          height(x) + depth(x) + clr + rule_thickness,
        );
        if (is_new_mathfont(f)) {
            depth(y) = height(y) + depth(y) - rule_thickness;
            height(y) = rule_thickness;
        }
        delta = depth(y) - (height(x) + depth(x) + clr);
        if (delta > 0) {
            // increase the actual clearance
            clr = clr + half(delta);
        }
        shift_amount(y) = -(height(x) + clr);
        link(y) = overbar(x, clr, height(y));
        info(nucleus(q)) = hpack(y, natural);
        math_type(nucleus(q)) = sub_box;
    }
⟧
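
The clearance computation for the traditional (non-OpenType) branch of make_radical can be followed with concrete numbers. The Python sketch below is a simplified model: the function names are hypothetical, all dimensions are plain integers, and half is written as I recall it from the arithmetic routines of the program.

```python
def half(x: int) -> int:
    """Halve an integer; odd values are rounded up, as with (x + 1) div 2."""
    return (x + 1) // 2 if x % 2 else x // 2


def minimum_clearance(rule_thickness: int, math_x_height: int, display: bool) -> int:
    """Minimum gap between nucleus and rule when the font has no MATH constants."""
    if display:
        return rule_thickness + abs(math_x_height) // 4
    return rule_thickness + abs(rule_thickness) // 4


def actual_clearance(nucleus_extent: int, clr: int, symbol_depth: int) -> int:
    """Add half of any excess depth of the radical symbol to the clearance."""
    delta = symbol_depth - (nucleus_extent + clr)
    if delta > 0:
        clr += half(delta)
    return clr


clr = minimum_clearance(rule_thickness=4, math_x_height=40, display=False)
print(clr, actual_clearance(nucleus_extent=100, clr=clr, symbol_depth=115))
```

Here nucleus_extent stands for height(x) + depth(x). The excess of 10 units is split evenly, so the nucleus ends up with equal extra space above and below inside the symbol.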

781. Slants are not considered when placing accents in math mode. The accenter is centered over the accentee, and the accent width is treated as zero with respect to the size of the final box.

⟦777 Declare math construction procedures⟧ += ⟦
    function compute_ot_math_accent_pos(
      p: pointer,
    ): scaled {
        var q, r: pointer, s, g: scaled;
        
        if ((math_type(nucleus(p)) == math_char)) {
            fetch(nucleus(p));
            q = new_native_character(cur_f, qo(cur_c));
            g = get_native_glyph(q, 0);
            s = get_ot_math_accent_pos(cur_f, g);
        } else {
            if ((math_type(nucleus(p)) == sub_mlist)) {
                r = info(nucleus(p));
                if ((r != null) && (type(r) == accent_noad)) {
                    s = compute_ot_math_accent_pos(r);
                } else {
                    s = 0x7fffffff;
                }
            } else {
                s = 0x7fffffff;
            }
        }
        compute_ot_math_accent_pos = s;
    }

    function make_math_accent(q: pointer) {
        label done, done1;
        var
          p, x, y: pointer, // temporary registers for box 
          // construction
          a: integer, // address of lig/kern instruction
          c, g: integer, // accent character
          f: internal_font_number, // its font
          i: four_quarters, // its char_info 
          s, sa: scaled, // amount to skew the accent to the 
          // right
          h: scaled, // height of character being accented
          delta: scaled, // space to remove between accent 
          // and accentee
          w, w2: scaled, // width of the accentee, not 
          // including sub/superscripts
          ot_assembly_ptr: void_pointer;
        
        fetch(accent_chr(q));
        x = null;
        ot_assembly_ptr = nil;
        if (is_native_font(cur_f)) {
            c = cur_c;
            f = cur_f;
            if (!is_bottom_acc(q)) {
                s = compute_ot_math_accent_pos(q);
            } else {
                s = 0;
            }
            x = clean_box(
              nucleus(q),
              cramped_style(cur_style),
            );
            w = width(x);
            h = height(x);
        } else if (char_exists(cur_i)) {
            i = cur_i;
            c = cur_c;
            f = cur_f;
            ⟦785 Compute the amount of skew⟧
            x = clean_box(
              nucleus(q),
              cramped_style(cur_style),
            );
            w = width(x);
            h = height(x);
            ⟦784 Switch to a larger accent if available and appropriate⟧
        }
        if (x != null) {
            if (is_new_mathfont(f)) {
                if (is_bottom_acc(q)) {
                    delta = 0;
                } else if (
                    h
                    < get_ot_math_constant(
                      f,
                      accentBaseHeight,
                    )
                ) {
                    delta = h;
                } else {
                    delta = get_ot_math_constant(
                      f,
                      accentBaseHeight,
                    );
                }
            } else if (h < x_height(f)) {
                delta = h;
            } else {
                delta = x_height(f);
            }
            if (
                (math_type(supscr(q)) != empty)
                || (math_type(subscr(q)) != empty)
            ) {
                if (math_type(nucleus(q)) == math_char) {
                    ⟦786 Swap the subscript and superscript into box |x|⟧
                }
            }
            y = char_box(f, c);
            if (is_native_font(f)) {
                // turn the native_word node into a 
                // native_glyph one
                p = get_node(glyph_node_size);
                type(p) = whatsit_node;
                subtype(p) = glyph_node;
                native_font(p) = f;
                native_glyph(p) = get_native_glyph(
                  list_ptr(y),
                  0,
                );
                set_native_glyph_metrics(p, 1);
                free_node(
                  list_ptr(y),
                  native_size(list_ptr(y)),
                );
                list_ptr(y) = p;
                // determine horiz positioning
                ⟦783 Switch to a larger native-font accent if available and appropriate⟧
                if (is_glyph_node(p)) {
                    sa = get_ot_math_accent_pos(
                      f,
                      native_glyph(p),
                    );
                    if (sa == 0x7fffffff) {
                        sa = half(width(y));
                    }
                } else {
                    sa = half(width(y));
                }
                if (is_bottom_acc(q) || (s == 0x7fffffff)) {
                    s = half(w);
                }
                shift_amount(y) = s - sa;
            } else {
                shift_amount(y) = s + half(w - width(y));
            }
            width(y) = 0;
            if (is_bottom_acc(q)) {
                link(x) = y;
                y = vpack(x, natural);
                shift_amount(y) = -(h - height(y));
            } else {
                p = new_kern(-delta);
                link(p) = x;
                link(y) = p;
                y = vpack(y, natural);
                if (height(y) < h) {
                    ⟦782 Make the height of box |y| equal to |h|⟧
                }
            }
            width(y) = width(x);
            info(nucleus(q)) = y;
            math_type(nucleus(q)) = sub_box;
        }
        free_ot_assembly(ot_assembly_ptr);
    }
⟧

782.

⟦782 Make the height of box |y| equal to |h|⟧ = ⟦
    {
        p = new_kern(h - height(y));
        link(p) = list_ptr(y);
        list_ptr(y) = p;
        height(y) = h;
    }
⟧

783.

⟦783 Switch to a larger native-font accent if available and appropriate⟧ = ⟦
    // non growing accent
    if (odd(subtype(q))) {
        set_native_glyph_metrics(p, 1);
    } else {
        c = native_glyph(p);
        a = 0;
        repeat {
            g = get_ot_math_variant(
              f,
              c,
              a,
              addressof(w2),
              1,
            );
            if ((w2 > 0) && (w2 <= w)) {
                native_glyph(p) = g;
                set_native_glyph_metrics(p, 1);
                incr(a);
            }
        } until ((w2 < 0) || (w2 >= w));
        if ((w2 < 0)) {
            ot_assembly_ptr = get_ot_assembly_ptr(f, c, 1);
            if (ot_assembly_ptr != nil) {
                free_node(p, glyph_node_size);
                p = build_opentype_assembly(
                  f,
                  ot_assembly_ptr,
                  w,
                  1,
                );
                list_ptr(y) = p;
                goto found;
            }
        } else {
            set_native_glyph_metrics(p, 1);
        }
    }

    found:

    width(y) = width(p);
    height(y) = height(p);
    depth(y) = depth(p);

    if (is_bottom_acc(q)) {
        if (height(y) < 0) {
            height(y) = 0;
        }
    } else if (depth(y) < 0) {
        depth(y) = 0;
    }
⟧

784.

⟦784 Switch to a larger accent if available and appropriate⟧ = ⟦
    loop {
        if (char_tag(i) != list_tag) {
            goto done;
        }
        y = rem_byte(i);
        i = orig_char_info(f)(y);
        if (!char_exists(i)) {
            goto done;
        }
        if (char_width(f)(i) > w) {
            goto done;
        }
        c = y;
    }

    done:
⟧
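
The loop above walks the font's chain of successively larger characters and stops just before the first one that is wider than the accentee. A Python sketch of the same walk, with dictionaries standing in for the char_info lookups (the names successor, width, and widest_fitting_accent are hypothetical):

```python
def widest_fitting_accent(c, successor, width, w):
    """Follow the successor chain from c while the next character exists
    and is no wider than w; return the last character accepted."""
    while c in successor:          # char_tag(i) == list_tag
        y = successor[c]           # rem_byte(i)
        if y not in width:         # the successor does not exist in the font
            break
        if width[y] > w:           # too wide for the accentee
            break
        c = y
    return c


successor = {1: 2, 2: 3}
width = {1: 5, 2: 8, 3: 12}
print(widest_fitting_accent(1, successor, width, w=10))
```

The starting character is kept even if it is already wider than w, because only successors are tested against the width. The sketch assumes the chain has no cycles.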

785.

⟦785 Compute the amount of skew⟧ = ⟦
    s = 0;

    if (math_type(nucleus(q)) == math_char) {
        fetch(nucleus(q));
        if (char_tag(cur_i) == lig_tag) {
            a = lig_kern_start(cur_f)(cur_i);
            cur_i = font_info[a].qqqq;
            if (skip_byte(cur_i) > stop_flag) {
                a = lig_kern_restart(cur_f)(cur_i);
                cur_i = font_info[a].qqqq;
            }
            loop {
                if (qo(next_char(cur_i)) == skew_char[cur_f]) {
                    if (op_byte(cur_i) >= kern_flag) {
                        if (skip_byte(cur_i) <= stop_flag) {
                            s = char_kern(cur_f)(cur_i);
                        }
                    }
                    goto done1;
                }
                if (skip_byte(cur_i) >= stop_flag) {
                    goto done1;
                }
                a = a + qo(skip_byte(cur_i)) + 1;
                cur_i = font_info[a].qqqq;
            }
        }
    }

    done1:
⟧
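
The scan for a kern against the skew character can be sketched over a list of instructions. This Python model makes several simplifying assumptions: each instruction is a tuple (skip_byte, next_char, op_byte, kern) that carries its kern amount directly instead of going through char_kern, the restart indirection for a first instruction with skip_byte > stop_flag is omitted, and the value 128 for both flags is as I recall it from the font-metric definitions.

```python
STOP_FLAG = 128  # assumed value of stop_flag
KERN_FLAG = 128  # assumed value of kern_flag


def skew_amount(program, start, skew_char):
    """Return the kern between a character and skew_char, or 0 if none is found."""
    a = start
    while True:
        skip_byte, next_char, op_byte, kern = program[a]
        if next_char == skew_char:
            if op_byte >= KERN_FLAG and skip_byte <= STOP_FLAG:
                return kern
            return 0               # a ligature step, not a kern
        if skip_byte >= STOP_FLAG:
            return 0               # end of this character's program
        a += skip_byte + 1


program = [
    (0, ord("a"), 0, 0),           # ligature step, skipped over
    (1, ord("b"), 128, 111),       # kern with 'b'; jump over one instruction
    (0, ord("c"), 128, 999),       # never reached from index 1
    (128, 0x7F, 128, 222),         # kern with the skew character; last step
]
print(skew_amount(program, 0, 0x7F))
```

The first instruction that names the skew character decides the outcome, whether or not it is a kern; later instructions are not consulted.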

786.

⟦786 Swap the subscript and superscript into box |x|⟧ = ⟦
    {
        flush_node_list(x);
        x = new_noad;
        mem[nucleus(x)] = mem[nucleus(q)];
        mem[supscr(x)] = mem[supscr(q)];
        mem[subscr(x)] = mem[subscr(q)];
        mem[supscr(q)].hh = empty_field;
        mem[subscr(q)].hh = empty_field;
        math_type(nucleus(q)) = sub_mlist;
        info(nucleus(q)) = x;
        x = clean_box(nucleus(q), cur_style);
        delta = delta + height(x) - h;
        h = height(x);
    }
⟧

787. The make_fraction procedure is a bit different because it sets new_hlist(q) directly rather than making a sub-box.

⟦777 Declare math construction procedures⟧ += ⟦
    function make_fraction(q: pointer) {
        var
          p, v, x, y, z: pointer, // temporary registers for 
          // box construction
          delta, delta1, delta2, shift_up, shift_down, clr: scaled; // 
          // dimensions for box calculations
        
        if (thickness(q) == default_code) {
            thickness(q) = default_rule_thickness;
        }
        ⟦788 Create equal-width boxes |x| and |z| for the numerator and denominator, and compute the default amounts |shift_up| and |shift_down| by which they are displaced from the baseline⟧
        if (thickness(q) == 0) {
            ⟦789 Adjust \(s)|shift_up| and |shift_down| for the case of no fraction line⟧
        } else {
            ⟦790 Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line⟧
        }
        ⟦791 Construct a vlist box for the fraction, according to |shift_up| and |shift_down|⟧
        ⟦792 Put the \(f)fraction into a box with its delimiters, and make |new_hlist(q)| point to it⟧
    }
⟧

788.

⟦788 Create equal-width boxes |x| and |z| for the numerator and denominator, and compute the default amounts |shift_up| and |shift_down| by which they are displaced from the baseline⟧ = ⟦
    x = clean_box(numerator(q), num_style(cur_style));

    z = clean_box(denominator(q), denom_style(cur_style));

    if (width(x) < width(z)) {
        x = rebox(x, width(z));
    } else {
        z = rebox(z, width(x));
    }

    // display style
    if (cur_style < text_style) {
        shift_up = num1(cur_size);
        shift_down = denom1(cur_size);
    } else {
        shift_down = denom2(cur_size);
        if (thickness(q) != 0) {
            shift_up = num2(cur_size);
        } else {
            shift_up = num3(cur_size);
        }
    }
⟧

789. The numerator and denominator must be separated by a certain minimum clearance, called clr in the following program. The difference between clr and the actual clearance is twice delta.

⟦789 Adjust \(s)|shift_up| and |shift_down| for the case of no fraction line⟧ = ⟦
    {
        if (is_new_mathfont(cur_f)) {
            if (cur_style < text_style) {
                clr = get_ot_math_constant(
                  cur_f,
                  stackDisplayStyleGapMin,
                );
            } else {
                clr = get_ot_math_constant(
                  cur_f,
                  stackGapMin,
                );
            }
        } else {
            if (cur_style < text_style) {
                clr = 7 * default_rule_thickness;
            } else {
                clr = 3 * default_rule_thickness;
            }
        }
        delta = half(
          clr - ((shift_up - depth(x)) - (height(z) - shift_down)),
        );
        if (delta > 0) {
            shift_up = shift_up + delta;
            shift_down = shift_down + delta;
        }
    }
⟧
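
A numeric walk-through may help with the case of no fraction line. In this Python sketch the function name adjust_stack is hypothetical, dimensions are plain integers, and half is written as I recall it from the arithmetic routines of the program.

```python
def half(x: int) -> int:
    """Halve an integer; odd values are rounded up, as with (x + 1) div 2."""
    return (x + 1) // 2 if x % 2 else x // 2


def adjust_stack(shift_up, shift_down, depth_x, height_z, clr):
    """Push numerator and denominator apart equally until the gap is at least clr."""
    gap = (shift_up - depth_x) - (height_z - shift_down)  # actual clearance
    delta = half(clr - gap)
    if delta > 0:
        shift_up += delta
        shift_down += delta
    return shift_up, shift_down


# The gap is (10 - 2) - (8 - 10) = 10; with clr = 16 each shift grows by 3.
print(adjust_stack(10, 10, depth_x=2, height_z=8, clr=16))
```

After the adjustment the gap is (13 - 2) - (8 - 13) = 16, which meets the minimum exactly because the shortfall was even.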

790. In the case of a fraction line, the minimum clearance depends on the actual thickness of the line.

⟦790 Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line⟧ = ⟦
    {
        if (is_new_mathfont(cur_f)) {
            delta = half(thickness(q));
            if (cur_style < text_style) {
                clr = get_ot_math_constant(
                  cur_f,
                  fractionNumDisplayStyleGapMin,
                );
            } else {
                clr = get_ot_math_constant(
                  cur_f,
                  fractionNumeratorGapMin,
                );
            }
            delta1 = 
                clr
                - (
                    (shift_up - depth(x))
                    - (axis_height(cur_size) + delta)
                )
            ;
            if (cur_style < text_style) {
                clr = get_ot_math_constant(
                  cur_f,
                  fractionDenomDisplayStyleGapMin,
                );
            } else {
                clr = get_ot_math_constant(
                  cur_f,
                  fractionDenominatorGapMin,
                );
            }
            delta2 = 
                clr
                - (
                    (axis_height(cur_size) - delta)
                    - (height(z) - shift_down)
                )
            ;
        } else {
            if (cur_style < text_style) {
                clr = 3 * thickness(q);
            } else {
                clr = thickness(q);
            }
            delta = half(thickness(q));
            delta1 = 
                clr
                - (
                    (shift_up - depth(x))
                    - (axis_height(cur_size) + delta)
                )
            ;
            delta2 = 
                clr
                - (
                    (axis_height(cur_size) - delta)
                    - (height(z) - shift_down)
                )
            ;
        }
        if (delta1 > 0) {
            shift_up = shift_up + delta1;
        }
        if (delta2 > 0) {
            shift_down = shift_down + delta2;
        }
    }
⟧
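
With a fraction line the two gaps are adjusted independently, each measured from the nearer edge of the rule. The Python sketch below follows the traditional branch; adjust_fraction is a hypothetical name, dimensions are plain integers, and half is written as I recall it from the arithmetic routines of the program.

```python
def half(x: int) -> int:
    """Halve an integer; odd values are rounded up, as with (x + 1) div 2."""
    return (x + 1) // 2 if x % 2 else x // 2


def adjust_fraction(shift_up, shift_down, depth_x, height_z,
                    axis_height, thickness, display):
    """Enforce the minimum clearance above and below the fraction rule."""
    clr = 3 * thickness if display else thickness
    delta = half(thickness)            # the rule extends delta on each side of the axis
    delta1 = clr - ((shift_up - depth_x) - (axis_height + delta))
    delta2 = clr - ((axis_height - delta) - (height_z - shift_down))
    if delta1 > 0:
        shift_up += delta1
    if delta2 > 0:
        shift_down += delta2
    return shift_up, shift_down


print(adjust_fraction(10, 10, depth_x=2, height_z=8,
                      axis_height=3, thickness=2, display=True))
```

In this example both gaps start at 4 units against a required 6, so each shift grows by 2. Unlike the case without a rule, the shortfall is added in full rather than halved, since each side is corrected on its own.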

791.

⟦791 Construct a vlist box for the fraction, according to |shift_up| and |shift_down|⟧ = ⟦
    v = new_null_box;
    type(v) = vlist_node;
    height(v) = shift_up + height(x);
    depth(v) = depth(z) + shift_down;
    width(v) = width(x); // this also equals width(z)

    if (thickness(q) == 0) {
        p = new_kern(
          (shift_up - depth(x)) - (height(z) - shift_down),
        );
        link(p) = z;
    } else {
        y = fraction_rule(thickness(q));
        p = new_kern(
          (axis_height(cur_size) - delta) - (height(z) - shift_down),
        );
        link(y) = p;
        link(p) = z;
        p = new_kern(
          (shift_up - depth(x)) - (axis_height(cur_size) + delta),
        );
        link(p) = y;
    }

    link(x) = p;
    list_ptr(v) = x;
⟧

792.

⟦792 Put the \(f)fraction into a box with its delimiters, and make |new_hlist(q)| point to it⟧ = ⟦
    if (cur_style < text_style) {
        delta = delim1(cur_size);
    } else {
        delta = delim2(cur_size);
    }

    x = var_delimiter(left_delimiter(q), cur_size, delta);
    link(x) = v;
    z = var_delimiter(right_delimiter(q), cur_size, delta);
    link(v) = z;
    new_hlist(q) = hpack(x, natural);
⟧

793. If the nucleus of an op_noad is a single character, it is to be centered vertically with respect to the axis, after first being enlarged (via a character list in the font) if we are in display style. The normal convention for placing displayed limits is to put them above and below the operator in display style.

The italic correction is removed from the character if there is a subscript and the limits are not being displayed. The make_op routine returns the value that should be used as an offset between subscript and superscript.

After make_op has acted, subtype(q) will be limits if and only if the limits have been set above and below the operator. In that case, new_hlist(q) will already contain the desired final box.

⟦777 Declare math construction procedures⟧ += ⟦
    function make_op(q: pointer): scaled {
        label found;
        var
          delta: scaled, // offset between subscript and 
          // superscript
          p, v, x, y, z: pointer, // temporary registers for 
          // box construction
          c: quarterword,
          i: four_quarters, // registers for character 
          // examination
          shift_up, shift_down: scaled, // dimensions for 
          // box calculation
          h1, h2: scaled, // height of original text-style 
          // symbol and possible replacement
          n, g: integer, // potential variant index and 
          // glyph code
          ot_assembly_ptr: void_pointer,
          save_f: internal_font_number;
        
        if (
            (subtype(q) == normal)
            && (cur_style < text_style)
        ) {
            subtype(q) = limits;
        }
        delta = 0;
        ot_assembly_ptr = nil;
        if (math_type(nucleus(q)) == math_char) {
            fetch(nucleus(q));
            if (!is_ot_font(cur_f)) {
                // make it larger
                if (
                    (cur_style < text_style)
                    && (char_tag(cur_i) == list_tag)
                ) {
                    c = rem_byte(cur_i);
                    i = orig_char_info(cur_f)(c);
                    if (char_exists(i)) {
                        cur_c = c;
                        cur_i = i;
                        character(nucleus(q)) = c;
                    }
                }
                delta = char_italic(cur_f)(cur_i);
            }
            x = clean_box(nucleus(q), cur_style);
            if (is_new_mathfont(cur_f)) {
                p = list_ptr(x);
                if (is_glyph_node(p)) {
                    if (cur_style < text_style) {
                        // try to replace the operator glyph 
                        // with a display-size variant, 
                        // ensuring it is larger than the 
                        // text size
                        h1 = get_ot_math_constant(
                          cur_f,
                          displayOperatorMinHeight,
                        );
                        if (
                            h1
                            < (height(p) + depth(p)) * 5 / 4
                        ) {
                            h1 = 
                                (height(p) + depth(p))
                                * 5 / 4
                            ;
                        }
                        c = native_glyph(p);
                        n = 0;
                        repeat {
                            g = get_ot_math_variant(
                              cur_f,
                              c,
                              n,
                              addressof(h2),
                              0,
                            );
                            if (h2 > 0) {
                                native_glyph(p) = g;
                                set_native_glyph_metrics(
                                  p,
                                  1,
                                );
                            }
                            incr(n);
                        } until ((h2 < 0) || (h2 >= h1));
                        if ((h2 < 0)) {
                            // if we get here, then we 
                            // didn't find a big enough 
                            // glyph; check if the char is 
                            // extensible
                            ot_assembly_ptr = (
                              get_ot_assembly_ptr
                            )(cur_f, c, 0);
                            if (ot_assembly_ptr != nil) {
                                free_node(
                                  p,
                                  glyph_node_size,
                                );
                                p = build_opentype_assembly(
                                  cur_f,
                                  ot_assembly_ptr,
                                  h1,
                                  0,
                                );
                                list_ptr(x) = p;
                                delta = 0;
                                goto found;
                            }
                        } else {
                            set_native_glyph_metrics(p, 1);
                        }
                    }
                    delta = get_ot_math_ital_corr(
                      cur_f,
                      native_glyph(p),
                    );
                  found:
                    width(x) = width(p);
                    height(x) = height(p);
                    depth(x) = depth(p);
                }
            }
            if (
                (math_type(subscr(q)) != empty)
                && (subtype(q) != limits)
            ) {
                // remove italic correction
                width(x) = width(x) - delta;
            }
            // center vertically
            shift_amount(x) = 
                half(height(x) - depth(x))
                - axis_height(cur_size)
            ;
            math_type(nucleus(q)) = sub_box;
            info(nucleus(q)) = x;
        }
        save_f = cur_f;
        if (subtype(q) == limits) {
            ⟦794 Construct a box with limits above and below it, skewed by |delta|⟧
        }
        free_ot_assembly(ot_assembly_ptr);
        make_op = delta;
    }
⟧
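
The variant search inside make_op can be pictured with the following Python sketch. It is only an illustration under simplifying assumptions: the list variant_heights stands in for successive calls to get_ot_math_variant, all sizes are plain integers, and the names pick_display_variant and needs_assembly are hypothetical and do not exist in XeTeX.

```python
from typing import List, Optional, Tuple


def pick_display_variant(
    text_total: int, min_height: int, variant_heights: List[int]
) -> Tuple[Optional[int], bool]:
    """Return (index of the last variant tried, needs_assembly).

    text_total      -- height + depth of the text-size glyph
    min_height      -- stand-in for the font's displayOperatorMinHeight
    variant_heights -- hypothetical list of successive variant sizes,
                       assumed positive and in increasing order
    """
    # The target is at least 5/4 of the text-size glyph (integer
    # arithmetic here is an approximation chosen for the sketch).
    target = max(min_height, text_total * 5 // 4)
    chosen: Optional[int] = None
    for n, h in enumerate(variant_heights):
        chosen = n          # each positive-size variant replaces the glyph
        if h >= target:     # big enough: stop searching
            return chosen, False
    # Ran out of variants: the caller would now look for an
    # extensible assembly, as make_op does with get_ot_assembly_ptr.
    return chosen, True
```

With variants of sizes 900, 1100, 1300, 1600 and a target of 1200, the sketch settles on the third variant; if no variant reaches the target, it reports that an assembly should be tried.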

794. The following program builds a vlist box v for displayed limits. The width of the box is not affected by the fact that the limits may be skewed.
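
As a rough illustration of the width and skew rule, here is a small Python sketch. The function name limits_frame is hypothetical; half follows the rounding rule of the half function in TeX, and all dimensions are plain integers standing in for scaled values.

```python
from typing import Tuple


def half(x: int) -> int:
    """Halve an integer the way TeX's half does: odd values use (x+1) div 2."""
    return (x + 1) // 2 if x % 2 else x // 2


def limits_frame(w_sup: int, w_nucleus: int, w_sub: int, delta: int) -> Tuple[int, int, int]:
    """Return (box width, shift of upper limit, shift of lower limit).

    The box is as wide as its widest component; the skew delta moves
    the two limits in opposite directions without changing that width.
    """
    width = max(w_sup, w_nucleus, w_sub)
    shift_sup = half(delta)
    shift_sub = -shift_sup
    return width, shift_sup, shift_sub
```

For example, limits of widths 30 and 40 around a nucleus of width 50 with delta = 7 give a box of width 50 whose limits are shifted by +4 and -4.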

⟦794 Construct a box with limits above and below it, skewed by |delta|⟧ = ⟦
    {
        x = clean_box(supscr(q), sup_style(cur_style));
        y = clean_box(nucleus(q), cur_style);
        z = clean_box(subscr(q), sub_style(cur_style));
        v = new_null_box;
        type(v) = vlist_node;
        width(v) = width(y);
        if (width(x) > width(v)) {
            width(v) = width(x);
        }
        if (width(z) > width(v)) {
            width(v) = width(z);
        }
        x = rebox(x, width(v));
        y = rebox(y, width(v));
        z = rebox(z, width(v));
        shift_amount(x) = half(delta);
        shift_amount(z) = -shift_amount(x);
        height(v) = height(y);
        depth(v) = depth(y);
        ⟦795 Attach the limits to |y| and adjust |height(v)|, |depth(v)| to account for their presence⟧
        new_hlist(q) = v;
    }
⟧

795. We use shift_up and shift_down in the following program for the amount of glue between the displayed operator y and its limits x and z. The vlist inside box v will consist of x followed by y followed by z, with kern nodes for the spaces between and around them.
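
The arithmetic of this section can be sketched in Python as follows. This is a simplified model, not the actual implementation: boxes are reduced to (height, depth) pairs, a missing limit is represented by None, and the name stack_limits and the tuple sp (holding big_op_spacing1 through big_op_spacing5) are hypothetical.

```python
from typing import Optional, Tuple

Box = Tuple[int, int]  # (height, depth)


def stack_limits(
    y: Box, sup: Optional[Box], sub: Optional[Box], sp: Tuple[int, int, int, int, int]
) -> Tuple[int, int]:
    """Return (height, depth) of the vlist box built around operator y."""
    sp1, sp2, sp3, sp4, sp5 = sp
    height, depth = y
    if sup is not None:
        # gap above the operator: at least sp1, normally sp3 minus the
        # depth of the upper limit
        shift_up = max(sp3 - sup[1], sp1)
        height += sp5 + sup[0] + sup[1] + shift_up
    if sub is not None:
        # gap below the operator: at least sp2, normally sp4 minus the
        # height of the lower limit
        shift_down = max(sp4 - sub[0], sp2)
        depth += sp5 + sub[0] + sub[1] + shift_down
    return height, depth
```

When a limit is absent, the corresponding dimension of the box is simply that of the operator itself.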

⟦795 Attach the limits to |y| and adjust |height(v)|, |depth(v)| to account for their presence⟧ = ⟦
    cur_f = save_f

    if (math_type(supscr(q)) == empty) {
        free_node(x, box_node_size);
        list_ptr(v) = y;
    } else {
        shift_up = big_op_spacing3 - depth(x);
        if (shift_up < big_op_spacing1) {
            shift_up = big_op_spacing1;
        }
        p = new_kern(shift_up);
        link(p) = y;
        link(x) = p;
        p = new_kern(big_op_spacing5);
        link(p) = x;
        list_ptr(v) = p;
        height(v) = 
            height(v)
            + big_op_spacing5
            + height(x) + depth(x) + shift_up
        ;
    }

    if (math_type(subscr(q)) == empty) {
        free_node(z, box_node_size);
    } else {
        shift_down = big_op_spacing4 - height(z);
        if (shift_down < big_op_spacing2) {
            shift_down = big_op_spacing2;
        }
        p = new_kern(shift_down);
        link(y) = p;
        link(p) = z;
        p = new_kern(big_op_spacing5);
        link(z) = p;
        depth(v) = 
            depth(v)
            + big_op_spacing5
            + height(z) + depth(z) + shift_down
        ;
    }
⟧

796. A ligature found in a math formula does not create a ligature_node, because there is no question of hyphenation afterwards; the ligature will simply be stored in an ordinary char_node, after residing in an ord_noad.

The math_type is converted to math_text_char here if we would not want to apply an italic correction to the current character unless it belongs to a math font (i.e., a font with space == 0).

No boundary characters enter into these ligatures.
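
The way make_ord steps through a lig/kern program can be sketched in Python. This is a simplified model: each instruction is a hypothetical tuple (skip_byte, next_char, op_byte, rem_byte), the initial indirection through lig_kern_restart is omitted, and find_instruction is an illustrative name. The value 128 for stop_flag is the one used by the TFM format.

```python
from typing import List, Optional, Tuple

STOP_FLAG = 128  # skip_byte value that marks the last instruction

Instruction = Tuple[int, int, int, int]  # (skip_byte, next_char, op_byte, rem_byte)


def find_instruction(program: List[Instruction], start: int, next_code: int) -> Optional[int]:
    """Return the index of the first live instruction matching next_code."""
    a = start
    while True:
        skip, nxt, _op, _rem = program[a]
        if nxt == next_code and skip <= STOP_FLAG:
            return a            # a kern or ligature applies here
        if skip >= STOP_FLAG:
            return None         # end of the program, nothing matched
        a += skip + 1           # skip_byte says how many steps to jump over
```

Note that a nonzero skip_byte can make later instructions unreachable from a given starting point, which is why the second instruction below is never examined when the first has skip_byte 1.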

⟦777 Declare math construction procedures⟧ += ⟦
    function make_ord(q: pointer) {
        label restart, exit;
        var
          a: integer, // address of lig/kern instruction
          p, r: pointer; // temporary registers for list 
          // manipulation
        
      restart:
        if (math_type(subscr(q)) == empty) {
            if (math_type(supscr(q)) == empty) {
                if (math_type(nucleus(q)) == math_char) {
                    p = link(q);
                    if (p != null) {
                        if (
                            (type(p) >= ord_noad)
                            && (type(p) <= punct_noad)
                        ) {
                            if (
                                math_type(nucleus(p))
                                == math_char
                            ) {
                                if (
                                    fam(nucleus(p))
                                    == fam(nucleus(q))
                                ) {
                                    math_type(nucleus(q)) = (
                                      math_text_char
                                    );
                                    fetch(nucleus(q));
                                    if (
                                        char_tag(cur_i)
                                        == lig_tag
                                    ) {
                                        a = lig_kern_start(
                                          cur_f,
                                        )(cur_i);
                                        cur_c = character(
                                          nucleus(p),
                                        );
                                        cur_i = font_info[
                                          a,
                                        ].qqqq;
                                        if (
                                            skip_byte(
                                              cur_i,
                                            )
                                            > stop_flag
                                        ) {
                                            a = lig_kern_restart(
                                              cur_f,
                                            )(cur_i);
                                            cur_i = (
                                              font_info
                                            )[a].qqqq;
                                        }
                                        loop {
                                            ⟦797 If instruction |cur_i| is a kern with |cur_c|, attach the kern after~|q|; or if it is a ligature with |cur_c|, combine noads |q| and~|p| appropriately; then |return| if the cursor has moved past a noad, or |goto restart|⟧
                                            if (
                                                skip_byte(
                                                  cur_i,
                                                )
                                                >= stop_flag
                                            ) {
                                                return;
                                            }
                                            a = 
                                                a
                                                + qo(
                                                  skip_byte(
                                                    cur_i,
                                                  ),
                                                )
                                                + 1
                                            ;
                                            cur_i = (
                                              font_info
                                            )[a].qqqq;
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
      exit:
    }
⟧

797. Note that a ligature between an ord_noad and another kind of noad is replaced by an ord_noad, when the two noads collapse into one. But we could make a parenthesis (say) change shape when it follows certain letters. Presumably a font designer will define such ligatures only when this convention makes sense.
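
The effect of the different ligature op_byte values on the pair of noads can be summarized by a small Python sketch. It models only the characters involved, not the noad fields; apply_ligature is a hypothetical name, and the second component of the result says whether the program would rescan from q (goto restart) or stop.

```python
from typing import List, Tuple


def apply_ligature(q_char: str, p_char: str, op_byte: int, rem_byte: str) -> Tuple[List[str], bool]:
    """Return (resulting characters, rescan_from_q) for one ligature step."""
    if op_byte in (1, 5):
        chars = [rem_byte, p_char]          # replace the left character
    elif op_byte in (2, 6):
        chars = [q_char, rem_byte]          # replace the right character
    elif op_byte in (3, 7, 11):
        chars = [q_char, rem_byte, p_char]  # insert a new noad between them
    else:
        chars = [rem_byte]                  # plain ligature: the two collapse
    # Only op_byte values up to 3 leave the cursor on q and try again.
    return chars, op_byte <= 3
```

For instance, a plain ligature of "f" and "i" yields a single character and a rescan, while op_byte 7 keeps both characters, inserts one between them, and moves on.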

⟦797 If instruction |cur_i| is a kern with |cur_c|, attach the kern after~|q|; or if it is a ligature with |cur_c|, combine noads |q| and~|p| appropriately; then |return| if the cursor has moved past a noad, or |goto restart|⟧ = ⟦
    if (next_char(cur_i) == cur_c) {
        if (skip_byte(cur_i) <= stop_flag) {
            if (op_byte(cur_i) >= kern_flag) {
                p = new_kern(char_kern(cur_f)(cur_i));
                link(p) = link(q);
                link(q) = p;
                return;
            } else {
                // allow a way out of infinite ligature loop
                check_interrupt;
                case op_byte(cur_i) {
                  qi(1), qi(5):
                    // =:|, =:|>
                    character(nucleus(q)) = rem_byte(cur_i);
                  qi(2), qi(6):
                    // |=:, |=:>
                    character(nucleus(p)) = rem_byte(cur_i);
                  qi(3), qi(7), qi(11):
                    // |=:|, |=:|>, |=:|>>
                    r = new_noad;
                    character(nucleus(r)) = rem_byte(cur_i);
                    plane_and_fam_field(nucleus(r)) = fam(
                      nucleus(q),
                    );
                    link(q) = r;
                    link(r) = p;
                    if (op_byte(cur_i) < qi(11)) {
                        math_type(nucleus(r)) = math_char;
                    } else {
                        // prevent combination
                        math_type(nucleus(r)) = (
                          math_text_char
                        );
                    }
                  othercases:
                    link(q) = link(p);
                    // =:
                    character(nucleus(q)) = rem_byte(cur_i);
                    mem[subscr(q)] = mem[subscr(p)];
                    mem[supscr(q)] = mem[supscr(p)];
                    free_node(p, noad_size);
                }
                if (op_byte(cur_i) > qi(3)) {
                    return;
                }
                math_type(nucleus(q)) = math_char;
                goto restart;
            }
        }
    }
⟧

798. When we get to the following part of the program, we have “fallen through” from cases that did not lead to check_dimensions or done_with_noad or done_with_node. Thus, q points to a noad whose nucleus may need to be converted to an hlist, and whose subscripts and superscripts need to be appended if they are present.

If nucleus(q) is not a math_char, the variable delta is the amount by which a superscript should be moved right with respect to a subscript when both are present.

⟦798 Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts⟧ = ⟦
    case math_type(nucleus(q)) {
      math_char, math_text_char:
        ⟦799 Create a character node |p| for |nucleus(q)|, possibly followed by a kern node for the italic correction, and set |delta| to the italic correction if a subscript is present⟧
      empty:
        p = null;
      sub_box:
        p = info(nucleus(q));
      sub_mlist:
        cur_mlist = info(nucleus(q));
        save_style = cur_style;
        mlist_penalties = false;
        // recursive call
        mlist_to_hlist;
        cur_style = save_style;
        ⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧
        p = hpack(link(temp_head), natural);
      othercases:
        confusion(strpool!("mlist2"));
    }

    new_hlist(q) = p

    if (
        (math_type(subscr(q)) == empty)
        && (math_type(supscr(q)) == empty)
    ) {
        goto check_dimensions;
    }

    make_scripts(q, delta)
⟧

799.

⟦799 Create a character node |p| for |nucleus(q)|, possibly followed by a kern node for the italic correction, and set |delta| to the italic correction if a subscript is present⟧ = ⟦
    {
        fetch(nucleus(q));
        if (is_native_font(cur_f)) {
            z = new_native_character(cur_f, qo(cur_c));
            p = get_node(glyph_node_size);
            type(p) = whatsit_node;
            subtype(p) = glyph_node;
            native_font(p) = cur_f;
            native_glyph(p) = get_native_glyph(z, 0);
            set_native_glyph_metrics(p, 1);
            free_node(z, native_size(z));
            delta = get_ot_math_ital_corr(
              cur_f,
              native_glyph(p),
            );
            if (
                (math_type(nucleus(q)) == math_text_char)
                && (!is_new_mathfont(cur_f) != 0)
            ) {
                // no italic correction in mid-word of text 
                // font
                delta = 0;
            }
            if (
                (math_type(subscr(q)) == empty)
                && (delta != 0)
            ) {
                link(p) = new_kern(delta);
                delta = 0;
            }
        } else if (char_exists(cur_i)) {
            delta = char_italic(cur_f)(cur_i);
            p = new_character(cur_f, qo(cur_c));
            if (
                (math_type(nucleus(q)) == math_text_char)
                && (space(cur_f) != 0)
            ) {
                // no italic correction in mid-word of text 
                // font
                delta = 0;
            }
            if (
                (math_type(subscr(q)) == empty)
                && (delta != 0)
            ) {
                link(p) = new_kern(delta);
                delta = 0;
            }
        } else {
            p = null;
        }
    }
⟧

800. The purpose of make_scripts(q, delta) is to attach the subscript and/or superscript of noad q to the list that starts at new_hlist(q), given that the subscript and superscript aren’t both empty. The superscript will appear to the right of the subscript by a given distance delta.

We set shift_down and shift_up to the minimum amounts to shift the baseline of subscripts and superscripts based on the given nucleus.

⟦777 Declare math construction procedures⟧ += ⟦
    function attach_hkern_to_new_hlist(
      q: pointer,
      delta: scaled,
    ): pointer {
        var
          y, z: pointer; // temporary registers for box 
          // construction
        
        z = new_kern(delta);
        if (new_hlist(q) == null) {
            new_hlist(q) = z;
        } else {
            y = new_hlist(q);
            while (link(y) != null) {
                y = link(y);
            }
            link(y) = z;
        }
        attach_hkern_to_new_hlist = new_hlist(q);
    }

    function make_scripts(q: pointer, delta: scaled) {
        var
          p, x, y, z: pointer, // temporary registers for 
          // box construction
          shift_up, shift_down, clr, sub_kern, sup_kern: scaled, // 
          // dimensions in the calculation
          script_c: pointer, // temporary native character 
          // for sub/superscript
          script_g: quarterword, // temporary register for 
          // sub/superscript native glyph id
          script_f: internal_font_number, // temporary 
          // register for sub/superscript font
          sup_g: quarterword, // superscript native glyph id
          sup_f: internal_font_number, // superscript font
          sub_g: quarterword, // subscript native glyph id
          sub_f: internal_font_number, // subscript font
          t: integer, // subsidiary size code
          save_f: internal_font_number,
          script_head: pointer, // scratch var for OpenType 
          // s*scripts
          script_ptr: pointer, // scratch var for OpenType 
          // s*scripts
          saved_math_style: small_number, // scratch var for 
          // OpenType s*scripts
          this_math_style: small_number; // scratch var for 
          // OpenType s*scripts
        
        p = new_hlist(q);
        script_c = null;
        script_g = 0;
        script_f = 0;
        sup_kern = 0;
        sub_kern = 0;
        if (is_char_node(p) || is_glyph_node(p)) {
            shift_up = 0;
            shift_down = 0;
        } else {
            z = hpack(p, natural);
            if (cur_style < script_style) {
                t = script_size;
            } else {
                t = script_script_size;
            }
            shift_up = height(z) - sup_drop(t);
            shift_down = depth(z) + sub_drop(t);
            free_node(z, box_node_size);
        }
        if (math_type(supscr(q)) == empty) {
            ⟦801 Construct a subscript box |x| when there is no superscript⟧
        } else {
            ⟦802 Construct a superscript box |x|⟧
            if (math_type(subscr(q)) == empty) {
                shift_amount(x) = -shift_up;
            } else {
                ⟦803 Construct a sub/superscript combination box |x|, with the superscript offset by |delta|⟧
            }
        }
        if (new_hlist(q) == null) {
            new_hlist(q) = x;
        } else {
            p = new_hlist(q);
            while (link(p) != null) {
                p = link(p);
            }
            link(p) = x;
        }
    }
⟧

801. When there is a subscript without a superscript, the top of the subscript should not exceed the baseline plus four-fifths of the x-height.
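
For the traditional (non-OpenType) case, the rule can be sketched in Python as below. The function name subscript_only_shift is hypothetical and all dimensions are plain integers standing in for scaled values; the OpenType branch, which uses the font's subscriptTopMax constant instead, is not modeled.

```python
def subscript_only_shift(shift_down: int, sub1: int, height_x: int, x_height: int) -> int:
    """Return the downward shift of a subscript that has no superscript.

    shift_down -- minimum shift derived from the nucleus
    sub1       -- the font's sub1 parameter for the current size
    height_x   -- height of the subscript box
    x_height   -- math x-height for the current size
    """
    shift_down = max(shift_down, sub1)
    # Lower the subscript further if its top would rise above
    # four-fifths of the x-height.
    clr = height_x - abs(x_height * 4) // 5
    return max(shift_down, clr)
```

A short subscript is placed at sub1; only an unusually tall one is pushed lower by the four-fifths rule.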

⟦801 Construct a subscript box |x| when there is no superscript⟧ = ⟦
    {
        script_head = subscr(q);
        ⟦805 Fetch first character of a sub/superscript⟧
        sub_g = script_g;
        sub_f = script_f;
        save_f = cur_f;
        x = clean_box(subscr(q), sub_style(cur_style));
        cur_f = save_f;
        width(x) = width(x) + script_space;
        if (shift_down < sub1(cur_size)) {
            shift_down = sub1(cur_size);
        }
        if (is_new_mathfont(cur_f)) {
            clr = 
                height(x)
                - get_ot_math_constant(
                  cur_f,
                  subscriptTopMax,
                )
            ;
        } else {
            clr = 
                height(x)
                - (abs(math_x_height(cur_size) * 4) div 5)
            ;
        }
        if (shift_down < clr) {
            shift_down = clr;
        }
        shift_amount(x) = shift_down;
        if (is_new_mathfont(cur_f)) {
            ⟦806 Attach subscript OpenType math kerning⟧
        }
    }
⟧

802. The bottom of a superscript should never descend below the baseline plus one-fourth of the x-height.
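
A Python sketch of the traditional (non-OpenType) branch is given below. It assumes the usual TeX style numbering in which display style is 0, text style is 2, and cramped styles are odd; superscript_shift is a hypothetical name and dimensions are plain integers.

```python
TEXT_STYLE = 2  # display = 0, text = 2; cramped variants are odd


def superscript_shift(
    shift_up: int, cur_style: int, sup1: int, sup2: int, sup3: int,
    depth_x: int, x_height: int
) -> int:
    """Return the upward shift of a superscript box of depth depth_x."""
    if cur_style % 2 == 1:
        clr = sup3            # cramped styles
    elif cur_style < TEXT_STYLE:
        clr = sup1            # uncramped display style
    else:
        clr = sup2            # other uncramped styles
    shift_up = max(shift_up, clr)
    # Keep the bottom of the superscript at least a quarter of the
    # x-height above the baseline.
    clr = depth_x + abs(x_height) // 4
    return max(shift_up, clr)
```

The second max only matters for superscripts with large depth; otherwise the style-dependent font parameter decides the shift.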

⟦802 Construct a superscript box |x|⟧ = ⟦
    {
        script_head = supscr(q);
        ⟦805 Fetch first character of a sub/superscript⟧
        sup_g = script_g;
        sup_f = script_f;
        save_f = cur_f;
        x = clean_box(supscr(q), sup_style(cur_style));
        cur_f = save_f;
        width(x) = width(x) + script_space;
        if (odd(cur_style)) {
            clr = sup3(cur_size);
        } else if (cur_style < text_style) {
            clr = sup1(cur_size);
        } else {
            clr = sup2(cur_size);
        }
        if (shift_up < clr) {
            shift_up = clr;
        }
        if (is_new_mathfont(cur_f)) {
            clr = 
                depth(x)
                + get_ot_math_constant(
                  cur_f,
                  superscriptBottomMin,
                )
            ;
        } else {
            clr = 
                depth(x)
                + (abs(math_x_height(cur_size)) div 4)
            ;
        }
        if (shift_up < clr) {
            shift_up = clr;
        }
        if (is_new_mathfont(cur_f)) {
            ⟦807 Attach superscript OpenType math kerning⟧
        }
    }
⟧

803. When both subscript and superscript are present, the subscript must be separated from the superscript by at least four times default_rule_thickness. If this condition would be violated, the subscript moves down, after which both subscript and superscript move up so that the bottom of the superscript is at least as high as the baseline plus four-fifths of the x-height.
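
The two-step adjustment can be traced with a short Python sketch of the traditional (non-OpenType) branch. The name combine_scripts is hypothetical; dimensions are plain integers, x is the superscript box and y the subscript box, and the third result is the kern that ends up between them.

```python
from typing import Tuple


def combine_scripts(
    shift_up: int, shift_down: int, depth_x: int, height_y: int,
    sub2: int, rule_thickness: int, x_height: int
) -> Tuple[int, int, int]:
    """Return (shift_up, shift_down, gap) for a sub/superscript pair."""
    shift_down = max(shift_down, sub2)
    # clr > 0 means the gap between the scripts is too small.
    clr = 4 * rule_thickness - ((shift_up - depth_x) - (height_y - shift_down))
    if clr > 0:
        shift_down += clr  # first move the subscript down
        # then move both up if the superscript bottom is too low
        clr = abs(x_height * 4) // 5 - (shift_up - depth_x)
        if clr > 0:
            shift_up += clr
            shift_down -= clr
    gap = (shift_up - depth_x) - (height_y - shift_down)
    return shift_up, shift_down, gap
```

In the second test case below the gap starts at 2, is widened to 4 by lowering the subscript, and stays at 4 when both scripts are then raised together.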

⟦803 Construct a sub/superscript combination box |x|, with the superscript offset by |delta|⟧ = ⟦
    {
        save_f = cur_f;
        script_head = subscr(q);
        ⟦805 Fetch first character of a sub/superscript⟧
        sub_g = script_g;
        sub_f = script_f;
        y = clean_box(subscr(q), sub_style(cur_style));
        cur_f = save_f;
        width(y) = width(y) + script_space;
        if (shift_down < sub2(cur_size)) {
            shift_down = sub2(cur_size);
        }
        if (is_new_mathfont(cur_f)) {
            clr = 
                get_ot_math_constant(
                  cur_f,
                  subSuperscriptGapMin,
                )
                - (
                    (shift_up - depth(x))
                    - (height(y) - shift_down)
                )
            ;
        } else {
            clr = 
                4
                * default_rule_thickness
                - (
                    (shift_up - depth(x))
                    - (height(y) - shift_down)
                )
            ;
        }
        if (clr > 0) {
            shift_down = shift_down + clr;
            if (is_new_mathfont(cur_f)) {
                clr = 
                    get_ot_math_constant(
                      cur_f,
                      superscriptBottomMaxWithSubscript,
                    )
                    - (shift_up - depth(x))
                ;
            } else {
                clr = 
                    (abs(math_x_height(cur_size) * 4) div 5)
                    - (shift_up - depth(x))
                ;
            }
            if (clr > 0) {
                shift_up = shift_up + clr;
                shift_down = shift_down - clr;
            }
        }
        if (is_new_mathfont(cur_f)) {
            ⟦806 Attach subscript OpenType math kerning⟧
            ⟦807 Attach superscript OpenType math kerning⟧
        } else {
            sup_kern = 0;
            sub_kern = 0;
        }
        // superscript is delta to the right of the 
        // subscript
        shift_amount(x) = sup_kern + delta - sub_kern;
        p = new_kern(
          (shift_up - depth(x)) - (height(y) - shift_down),
        );
        link(x) = p;
        link(p) = y;
        x = vpack(x, natural);
        shift_amount(x) = shift_down;
    }
⟧

804. OpenType math fonts provide an additional adjustment for the horizontal position of sub/superscripts called math kerning.

The following definitions should be kept in sync with XeTeXOTMath.cpp.

// superscript kern type for get_ot_math_kern 
@define sup_cmd => 0
// subscript kern type for get_ot_math_kern 
@define sub_cmd => 1
@define is_valid_pointer(#) =>
    ((# >= mem_min) && (# <= mem_end))

805.

⟦805 Fetch first character of a sub/superscript⟧ = ⟦
    script_c = null

    script_g = qi(0)

    script_f = null_font

    // Loop through the sub_mlist looking for the first 
    // character-like thing. Ignore kerns or glue so that, 
    // for example, changing $P_j$ to $P_{\!j}$ will have a 
    // predictable effect. Intercept style_nodes and 
    // execute them. If we encounter a choice_node, follow 
    // the appropriate branch. Anything else halts the 
    // search and inhibits OpenType kerning.
    // Don't try to do anything clever if the nucleus of the 
    // script_head is empty, e.g., $P_{^j}$ and the like.
    this_math_style = sub_style(cur_style)

    if (math_type(script_head) == sub_mlist) {
        script_ptr = info(script_head);
        script_head = null;
        while (is_valid_pointer(script_ptr)) {
            case type(script_ptr) {
              kern_node, glue_node:
                do_nothing;
              style_node:
                this_math_style = subtype(script_ptr);
              choice_node:
                // see below
                do_nothing;
              ord_noad,
              op_noad,
              bin_noad,
              rel_noad,
              open_noad,
              close_noad,
              punct_noad:
                script_head = nucleus(script_ptr);
                script_ptr = null;
              othercases:
                // end the search
                script_ptr = null;
            }
            if (is_valid_pointer(script_ptr)) {
                if (type(script_ptr) == choice_node) {
                    case this_math_style div 2 {
                      0:
                        script_ptr = display_mlist(
                          script_ptr,
                        );
                      1:
                        script_ptr = text_mlist(script_ptr);
                      2:
                        script_ptr = script_mlist(
                          script_ptr,
                        );
                      3:
                        script_ptr = script_script_mlist(
                          script_ptr,
                        );
                    }
                } else {
                    script_ptr = link(script_ptr);
                }
            }
        }
    }

    if (
        is_valid_pointer(script_head)
        && math_type(script_head) == math_char
    ) {
        save_f = cur_f;
        saved_math_style = cur_style;
        cur_style = this_math_style;
        ⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧
        fetch(script_head);
        if (is_new_mathfont(cur_f)) {
            script_c = new_native_character(
              cur_f,
              qo(cur_c),
            );
            script_g = get_native_glyph(script_c, 0);
            // script font
            script_f = cur_f;
        }
        cur_f = save_f;
        cur_style = saved_math_style;
        ⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧
        // The remaining case is math_type(script_head) 
        // == sub_box. Although it would be possible to 
        // deconstruct the box node to find the first glyph, 
        // it will most likely be from a text font without 
        // MATH kerning, so there's probably no point.
    }
⟧
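
The search performed in section 805 can be modeled in Python roughly as follows. This is a sketch under simplifying assumptions: the mlist is a Python list rather than a linked list in mem, the Node class and its kind strings are hypothetical, and every noad is assumed to carry a plain character nucleus. Style codes follow the TeX convention in which style div 2 selects the choice branch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    kind: str                      # "kern", "glue", "style", "choice", "noad", or other
    char: Optional[str] = None     # nucleus character, for "noad"
    style: int = 0                 # new style, for "style"
    branches: Dict[int, List["Node"]] = field(default_factory=dict)  # for "choice"


def first_script_char(nodes: List[Node], style: int) -> Tuple[Optional[str], int]:
    """Return (first character-like nucleus or None, style in force there)."""
    i = 0
    while i < len(nodes):
        n = nodes[i]
        if n.kind in ("kern", "glue"):
            i += 1                               # ignored
        elif n.kind == "style":
            style = n.style                      # executed
            i += 1
        elif n.kind == "choice":
            nodes = n.branches.get(style // 2, [])  # follow one branch
            i = 0
        elif n.kind == "noad":
            return n.char, style                 # found a candidate
        else:
            return None, style                   # anything else ends the search
    return None, style
```

A leading kern is skipped, a style change is applied before a choice is resolved, and any unrecognized node stops the search so that no kerning is attempted.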

806.

⟦806 Attach subscript OpenType math kerning⟧ = ⟦
    {
        if (is_glyph_node(p)) {
            sub_kern = get_ot_math_kern(
              native_font(p),
              native_glyph(p),
              sub_f,
              sub_g,
              sub_cmd,
              shift_down,
            );
            if (sub_kern != 0) {
                p = attach_hkern_to_new_hlist(q, sub_kern);
            }
        }
    }
⟧

807.

⟦807 Attach superscript OpenType math kerning⟧ = ⟦
    {
        // if there is a superscript the kern will be added 
        // to shift_amount(x)
        if (math_type(subscr(q)) == empty) {
            if (is_glyph_node(p)) {
                sup_kern = get_ot_math_kern(
                  native_font(p),
                  native_glyph(p),
                  sup_f,
                  sup_g,
                  sup_cmd,
                  shift_up,
                );
                if (sup_kern != 0) {
                    p = attach_hkern_to_new_hlist(
                      q,
                      sup_kern,
                    );
                }
            }
        }
    }
⟧

808. We have now tied up all the loose ends of the first pass of mlist_to_hlist. The second pass simply goes through and hooks everything together with the proper glue and penalties. It also handles the left_noad and right_noad that might be present, since max_h and max_d are now known. Variable p points to a node at the current end of the final hlist.

⟦808 Make a second pass over the mlist, removing all noads and inserting the proper spacing and penalties⟧ = ⟦
    p = temp_head

    link(p) = null

    q = mlist

    r_type = 0

    cur_style = style

    ⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧

    while (q != null) {
        ⟦809 If node |q| is a style node, change the style and |goto delete_q|; otherwise if it is not a noad, put it into the hlist, advance |q|, and |goto done|; otherwise set |s| to the size of noad |q|, set |t| to the associated type (|ord_noad.. inner_noad|), and set |pen| to the associated penalty⟧
        ⟦814 Append inter-element spacing based on |r_type| and |t|⟧
        ⟦815 Append any |new_hlist| entries for |q|, and any appropriate penalties⟧
        if (type(q) == right_noad) {
            t = open_noad;
        }
        r_type = t;
      delete_q:
        r = q;
        q = link(q);
        free_node(r, s);
      done:
    }
⟧

809. Just before doing the big case switch in the second pass, the program sets up default values so that most of the branches are short.

⟦809 If node |q| is a style node, change the style and |goto delete_q|; otherwise if it is not a noad, put it into the hlist, advance |q|, and |goto done|; otherwise set |s| to the size of noad |q|, set |t| to the associated type (|ord_noad.. inner_noad|), and set |pen| to the associated penalty⟧ = ⟦
    t = ord_noad

    s = noad_size

    pen = inf_penalty

    case type(q) {
      op_noad,
      open_noad,
      close_noad,
      punct_noad,
      inner_noad:
        t = type(q);
      bin_noad:
        t = bin_noad;
        pen = bin_op_penalty;
      rel_noad:
        t = rel_noad;
        pen = rel_penalty;
      ord_noad, vcenter_noad, over_noad, under_noad:
        do_nothing;
      radical_noad:
        s = radical_noad_size;
      accent_noad:
        s = accent_noad_size;
      fraction_noad:
        s = fraction_noad_size;
      left_noad, right_noad:
        t = make_left_right(q, style, max_d, max_h);
      style_node:
        ⟦811 Change the current style and |goto delete_q|⟧
      whatsit_node,
      penalty_node,
      rule_node,
      disc_node,
      adjust_node,
      ins_node,
      mark_node,
      glue_node,
      kern_node:
        link(p) = q;
        p = q;
        q = link(q);
        link(p) = null;
        goto done;
      othercases:
        confusion(strpool!("mlist3"));
    }
⟧

810. The make_left_right function constructs a left or right delimiter of the required size and returns the value open_noad or close_noad. The right_noad and left_noad will both be based on the original style, so they will have consistent sizes.

We use the fact that right_noad - left_noad == close_noad - open_noad.

⟦777 Declare math construction procedures⟧ += ⟦
    function make_left_right(
      q: pointer,
      style: small_number,
      max_d, max_h: scaled,
    ): small_number {
        var
          delta, delta1, delta2: scaled; // dimensions used 
          // in the calculation
        
        cur_style = style;
        ⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧
        delta2 = max_d + axis_height(cur_size);
        delta1 = max_h + max_d - delta2;
        if (delta2 > delta1) {
            //  delta1 is max distance from axis
            delta1 = delta2;
        }
        delta = (delta1 div 500) * delimiter_factor;
        delta2 = delta1 + delta1 - delimiter_shortfall;
        if (delta < delta2) {
            delta = delta2;
        }
        new_hlist(q) = var_delimiter(
          delimiter(q),
          cur_size,
          delta,
        );
        //  open_noad or close_noad 
        make_left_right = type(q) - (left_noad - open_noad);
    }
⟧
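
The size computation in make_left_right can be traced with ordinary integers. The following Python sketch is an illustration only and not part of the program: the name `delimiter_size` and its parameter names are assumptions, all quantities are taken to be nonnegative integers in scaled points, and Python's `//` stands in for Pascal's `div` under that assumption.

```python
def delimiter_size(max_h, max_d, axis_height, delimiter_factor, delimiter_shortfall):
    """Return the delimiter size requested from var_delimiter.

    Mirrors the arithmetic of make_left_right. All arguments are integers in
    scaled points; nonnegative dimensions are assumed, so // agrees with div.
    """
    delta2 = max_d + axis_height        # extent below the axis
    delta1 = max_h + max_d - delta2     # extent above the axis
    if delta2 > delta1:
        delta1 = delta2                 # delta1 is the max distance from the axis
    delta = (delta1 // 500) * delimiter_factor
    delta2 = delta1 + delta1 - delimiter_shortfall
    if delta < delta2:
        delta = delta2
    return delta


PT = 65536  # scaled points per point
# Sample inputs: height 10pt, depth 4pt, axis height 2.5pt, factor 901, shortfall 5pt.
size = delimiter_size(10 * PT, 4 * PT, 5 * PT // 2, 901, 5 * PT)
```

With these sample inputs the formula is 7.5pt from the axis at its farthest point, so the factor-based size wins over the shortfall-based one. The division by 500 happens before the multiplication, which keeps the intermediate product small at the cost of a little precision.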

811.

⟦811 Change the current style and |goto delete_q|⟧ = ⟦
    {
        cur_style = subtype(q);
        s = style_node_size;
        ⟦746 Set up the values of |cur_size| and |cur_mu|, based on |cur_style|⟧
        goto delete_q;
    }
⟧

812. The inter-element spacing in math formulas depends on an 8×8 table that TEX preloads as a 64-digit string. The elements of this string have the following significance:

    0 means no space;
    1 means a conditional thin space (\nonscript\mskip\thinmuskip);
    2 means a thin space (\mskip\thinmuskip);
    3 means a conditional medium space (\nonscript\mskip\medmuskip);
    4 means a conditional thick space (\nonscript\mskip\thickmuskip);
    * means an impossible case.

This is all pretty cryptic, but The TEXbook explains what is supposed to happen, and the string makes it happen.

A global variable magic_offset is computed so that if a and b are in the range ord_noad .. inner_noad, then str_pool[a * 8 + b + magic_offset] is the digit for spacing between noad types a and b.

If Pascal had provided a good way to preload constant arrays, this part of the program would not have been so strange.

@define math_spacing =>
    strpool!("0234000122*4000133**3**344*0400400*000000234000111*1111112341011");
⟦13 Global variables⟧ += ⟦
    // used to find inter-element spacing
    var magic_offset: integer;
⟧

813.

⟦813 Compute the magic offset⟧ = ⟦
    magic_offset = 
        str_start_macro(math_spacing)
        - 9 * ord_noad
⟧
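
The reason for subtracting 9 * ord_noad becomes visible when the index arithmetic is written out. The Python sketch below is only an illustration: the numeric value 16 for ord_noad, the helper names, and the contents of the pretend string pool are all assumptions; what matters is that the eight noad types are consecutive codes starting at ord_noad.

```python
MATH_SPACING = (
    "02340001" "22*40001" "33**3**3" "44*04004"
    "00*00000" "02340001" "11*11111" "12341011"
)

ORD_NOAD = 16  # assumed numeric code; only the consecutive ordering matters
NOAD_TYPES = ["ord", "op", "bin", "rel", "open", "close", "punct", "inner"]
TYPE_CODE = {name: ORD_NOAD + i for i, name in enumerate(NOAD_TYPES)}


def compute_magic_offset(table_start):
    """table_start is where the 64-digit string begins in the pool."""
    return table_start - 9 * ORD_NOAD


def spacing_digit(pool, offset, a, b):
    """Digit for the spacing between a noad of type a and a following type b."""
    return pool[a * 8 + b + offset]


prefix = "earlier strings"            # hypothetical pool contents before the table
pool = prefix + MATH_SPACING
offset = compute_magic_offset(len(prefix))
digit = spacing_digit(pool, offset, TYPE_CODE["ord"], TYPE_CODE["bin"])
```

When a and b both equal ord_noad, the expression a * 8 + b is exactly 9 * ord_noad, so the subtraction makes that pair land on the first digit of the table.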

814.

⟦814 Append inter-element spacing based on |r_type| and |t|⟧ = ⟦
    // not the first noad
    if (r_type > 0) {
        case so(str_pool[r_type * 8 + t + magic_offset]) {
          ord!("0"):
            x = 0;
          ord!("1"):
            if (cur_style < script_style) {
                x = thin_mu_skip_code;
            } else {
                x = 0;
            }
          ord!("2"):
            x = thin_mu_skip_code;
          ord!("3"):
            if (cur_style < script_style) {
                x = med_mu_skip_code;
            } else {
                x = 0;
            }
          ord!("4"):
            if (cur_style < script_style) {
                x = thick_mu_skip_code;
            } else {
                x = 0;
            }
          othercases:
            confusion(strpool!("mlist4"));
        }
        if (x != 0) {
            y = math_glue(glue_par(x), cur_mu);
            z = new_glue(y);
            glue_ref_count(y) = null;
            link(p) = z;
            p = z;
            // store a symbolic subtype
            subtype(z) = x + 1;
        }
    }
⟧
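
The case analysis above reduces to a small decision: digit 2 always yields glue, digits 1, 3, and 4 yield glue only outside script styles, and 0 yields none. A minimal Python sketch of that decision follows; the function name, the string labels for the glue parameters, and the numeric value 4 for script_style are assumptions made for illustration.

```python
SCRIPT_STYLE = 4  # assumed numeric code; smaller values are display and text styles

CONDITIONAL = {"1": "thin_mu_skip", "3": "med_mu_skip", "4": "thick_mu_skip"}


def spacing_glue(digit, cur_style):
    """Return the name of the glue parameter to insert, or None for no space."""
    if digit == "0":
        return None
    if digit == "2":
        return "thin_mu_skip"                  # unconditional thin space
    if digit in CONDITIONAL:
        if cur_style < SCRIPT_STYLE:
            return CONDITIONAL[digit]
        return None                            # \nonscript suppresses it
    raise ValueError("impossible noad pair")   # corresponds to confusion("mlist4")
```

For example, `spacing_glue("3", 2)` selects the medium space in text style, while the same digit in a script style selects nothing.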

815. We insert a penalty node after the hlist entries of noad q if pen is not an “infinite” penalty, and if the node immediately following q is not a penalty node or a rel_noad or absent entirely.

⟦815 Append any |new_hlist| entries for |q|, and any appropriate penalties⟧ = ⟦
    if (new_hlist(q) != null) {
        link(p) = new_hlist(q);
        repeat {
            p = link(p);
        } until (link(p) == null);
    }

    if (penalties) {
        if (link(q) != null) {
            if (pen < inf_penalty) {
                r_type = type(link(q));
                if (r_type != penalty_node) {
                    if (r_type != rel_noad) {
                        z = new_penalty(pen);
                        link(p) = z;
                        p = z;
                    }
                }
            }
        }
    }
⟧
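
The nested tests for the penalty can be collapsed into one predicate. The Python sketch below is an assumption-laden paraphrase: node types are represented by strings, `next_type` is None when noad q is the last item, and the value 10000 for inf_penalty is assumed.

```python
INF_PENALTY = 10000  # assumed value of inf_penalty


def wants_penalty(penalties, pen, next_type):
    """True if a penalty node should follow the hlist entries of the current noad."""
    if not penalties or next_type is None:
        return False
    if pen >= INF_PENALTY:
        return False
    return next_type not in ("penalty_node", "rel_noad")
```

So a bin_noad with a finite penalty gets a penalty node when an ord_noad follows, but not when an explicit penalty or a relation follows, and never at the end of the list.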

816. [37] Alignment. It’s sort of a miracle whenever \halign and \valign work, because they cut across so many of the control structures of TEX.

Therefore the present page is probably not the best place for a beginner to start reading this program; it is better to master everything else first.

Let us focus our thoughts on an example of what the input might be, in order to get some idea about how the alignment miracle happens. The example doesn’t do anything useful, but it is sufficiently general to indicate all of the special cases that must be dealt with; please do not be disturbed by its apparent complexity and meaninglessness.

    \tabskip 2pt plus 3pt
    \halign to 300pt{u1#v1&
        \tabskip 1pt plus 1fil u2#v2&
        u3#v3\cr
      a1&\omit a2&\vrule\cr
      \noalign{\vskip 3pt}
      b1\span b2\cr
      \omit&c2\span\omit\cr}

Here’s what happens:

(0) When ‘\halign to 300pt{’ is scanned, the scan_spec routine places the 300pt dimension onto the save_stack, and an align_group code is placed above it. This will make it possible to complete the alignment when the matching ‘}’ is found.

(1) The preamble is scanned next. Macros in the preamble are not expanded, except as part of a tabskip specification. For example, if u2 had been a macro in the preamble above, it would have been expanded, since TEX must look for ‘minus...’ as part of the tabskip glue. A “preamble list” is constructed based on the user’s preamble; in our case it contains the following seven items:

    \glue 2pt plus 3pt          (the tabskip preceding column 1)
    \alignrecord, width −∞      (preamble info for column 1)
    \glue 2pt plus 3pt          (the tabskip between columns 1 and 2)
    \alignrecord, width −∞      (preamble info for column 2)
    \glue 1pt plus 1fil         (the tabskip between columns 2 and 3)
    \alignrecord, width −∞      (preamble info for column 3)
    \glue 1pt plus 1fil         (the tabskip following column 3)

These “alignrecord” entries have the same size as an unset_node, since they will later be converted into such nodes. However, at the moment they have no type or subtype fields; they have info fields instead, and these info fields are initially set to the value end_span, for reasons explained below. Furthermore, the alignrecord nodes have no height or depth fields; these are renamed u_part and v_part, and they point to token lists for the templates of the alignment. For example, the u_part field in the first alignrecord points to the token list ‘u1’, i.e., the template preceding the ‘#’ for column 1.

(2) TEX now looks at what follows the \cr that ended the preamble. It is not ‘\noalign’ or ‘\omit’, so this input is put back to be read again, and the template ‘u1’ is fed to the scanner. Just before reading ‘u1’, TEX goes into restricted horizontal mode. Just after reading ‘u1’, TEX will see ‘a1’, and then (when the & is sensed) TEX will see ‘v1’. Then TEX scans an endv token, indicating the end of a column. At this point an unset_node is created, containing the contents of the current hlist (i.e., ‘u1a1v1’). The natural width of this unset node replaces the width field of the alignrecord for column 1; in general, the alignrecords will record the maximum natural width that has occurred so far in a given column.

(3) Since ‘\omit’ follows the ‘&’, the templates for column 2 are now bypassed. Again TEX goes into restricted horizontal mode and makes an unset_node from the resulting hlist; but this time the hlist contains simply ‘a2’. The natural width of the new unset box is remembered in the width field of the alignrecord for column 2.

(4) A third unset_node is created for column 3, using essentially the mechanism that worked for column 1; this unset box contains ‘u3\vrule v3’. The vertical rule in this case has running dimensions that will later extend to the height and depth of the whole first row, since each unset_node in a row will eventually inherit the height and depth of its enclosing box.

(5) The first row has now ended; it is made into a single unset box comprising the following seven items:

    \glue 2pt plus 3pt
    \unsetbox for 1 column: u1a1v1
    \glue 2pt plus 3pt
    \unsetbox for 1 column: a2
    \glue 1pt plus 1fil
    \unsetbox for 1 column: u3\vrule v3
    \glue 1pt plus 1fil

The width of this unset row is unimportant, but it has the correct height and depth, so the correct baselineskip glue will be computed as the row is inserted into a vertical list.

(6) Since ‘\noalign’ follows the current \cr, TEX appends additional material (in this case \vskip 3pt) to the vertical list. While processing this material, TEX will be in internal vertical mode, and no_align_group will be on save_stack.

(7) The next row produces an unset box that looks like this:

    \glue 2pt plus 3pt
    \unsetbox for 2 columns: u1b1v1u2b2v2
    \glue 1pt plus 1fil
    \unsetbox for 1 column: (empty)
    \glue 1pt plus 1fil

The natural width of the unset box that spans columns 1 and 2 is stored in a “span node,” which we will explain later; the info field of the alignrecord for column 1 now points to the new span node, and the info of the span node points to end_span.

(8) The final row produces the unset box

    \glue 2pt plus 3pt
    \unsetbox for 1 column: (empty)
    \glue 2pt plus 3pt
    \unsetbox for 2 columns: u2c2v2
    \glue 1pt plus 1fil

A new span node is attached to the alignrecord for column 2.

(9) The last step is to compute the true column widths and to change all the unset boxes to hboxes, appending the whole works to the vertical list that encloses the \halign. The rules for deciding on the final widths of each unset column box will be explained below.
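
The bookkeeping of steps (2) through (4), in which each alignrecord remembers the largest natural width seen so far in its column, can be sketched in a few lines of Python. This is a simplification under stated assumptions: it covers only entries that occupy a single column (spanned entries go to span nodes instead), the function name is hypothetical, and the value of null_flag is assumed to be -2^30, standing for minus infinity.

```python
NULL_FLAG = -(2 ** 30)  # assumed value of null_flag, i.e. "minus infinity"


def column_widths(num_columns, rows):
    """Track the maximum natural width per column.

    rows is a list of rows; each row is a list of natural widths,
    one per single-column entry, in column order.
    """
    widths = [NULL_FLAG] * num_columns      # every alignrecord starts at null_flag
    for row in rows:
        for col, w in enumerate(row):
            if w > widths[col]:
                widths[col] = w             # a wider entry raises the column width
    return widths


widths = column_widths(3, [[10, 5, 7], [3, 12, 1]])
```

A column that never receives an entry keeps the value NULL_FLAG, which lets the final pass recognize it.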

Note that as \halign is being processed, we fearlessly give up control to the rest of TEX. At critical junctures, an alignment routine is called upon to step in and do some little action, but most of the time these routines just lurk in the background. It’s something like post-hypnotic suggestion.

817. We have mentioned that alignrecords contain no height or depth fields. Their glue_sign and glue_order are pre-empted as well, since it is necessary to store information about what to do when a template ends. This information is called the extra_info field.

// pointer to \<u_j> token list
@define u_part(#) => mem[# + height_offset].int
// pointer to \<v_j> token list
@define v_part(#) => mem[# + depth_offset].int
// info to remember during template
@define extra_info(#) => info(# + list_offset)

818. Alignments can occur within alignments, so a small stack is used to access the alignrecord information. At each level we have a preamble pointer, indicating the beginning of the preamble list; a cur_align pointer, indicating the current position in the preamble list; a cur_span pointer, indicating the value of cur_align at the beginning of a sequence of spanned columns; a cur_loop pointer, indicating the tabskip glue before an alignrecord that should be copied next if the current list is extended; and the align_state variable, which indicates the nesting of braces so that \cr and \span and tab marks are properly intercepted. There also are pointers cur_head and cur_tail to the head and tail of a list of adjustments being moved out from horizontal mode to vertical mode.

The current values of these seven quantities, together with the pre-adjustment pointers cur_pre_head and cur_pre_tail, appear in global variables; when they have to be pushed down, they are stored in 6-word nodes, and align_ptr points to the topmost such node.

// the current preamble list
@define preamble => link(align_head)
// number of mem words to save alignment states
@define align_stack_node_size => 6
⟦13 Global variables⟧ += ⟦
    // current position in preamble list
    var cur_align: pointer;

    // start of currently spanned columns in preamble list
    var cur_span: pointer;

    // place to copy when extending a periodic preamble
    var cur_loop: pointer;

    // most recently pushed-down alignment stack node
    var align_ptr: pointer;

    // adjustment list pointers
    var cur_head, cur_tail: pointer;

    // pre-adjustment list pointers
    var cur_pre_head, cur_pre_tail: pointer;
⟧

819. The align_state and preamble variables are initialized elsewhere.

⟦23 Set initial values of key variables⟧ += ⟦
    align_ptr = null

    cur_align = null

    cur_span = null

    cur_loop = null

    cur_head = null

    cur_tail = null

    cur_pre_head = null

    cur_pre_tail = null
⟧

820. Alignment stack maintenance is handled by a pair of trivial routines called push_alignment and pop_alignment.

function push_alignment() {
    var
      p: pointer; // the new alignment stack node
    
    p = get_node(align_stack_node_size);
    link(p) = align_ptr;
    info(p) = cur_align;
    llink(p) = preamble;
    rlink(p) = cur_span;
    mem[p + 2].int = cur_loop;
    mem[p + 3].int = align_state;
    info(p + 4) = cur_head;
    link(p + 4) = cur_tail;
    info(p + 5) = cur_pre_head;
    link(p + 5) = cur_pre_tail;
    align_ptr = p;
    cur_head = get_avail;
    cur_pre_head = get_avail;
}

function pop_alignment() {
    var
      p: pointer; // the top alignment stack node
    
    free_avail(cur_head);
    free_avail(cur_pre_head);
    p = align_ptr;
    cur_tail = link(p + 4);
    cur_head = info(p + 4);
    cur_pre_tail = link(p + 5);
    cur_pre_head = info(p + 5);
    align_state = mem[p + 3].int;
    cur_loop = mem[p + 2].int;
    cur_span = rlink(p);
    preamble = llink(p);
    cur_align = info(p);
    align_ptr = link(p);
    free_node(p, align_stack_node_size);
}
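
The save and restore discipline of these two routines is that of an ordinary stack of records. The Python sketch below is an illustration under assumptions: the class names are hypothetical, a Python list takes the place of the chain of nodes reached from align_ptr, and empty lists stand in for the fresh list heads obtained from get_avail.

```python
import copy
from dataclasses import dataclass


@dataclass
class AlignmentGlobals:
    preamble: object = None
    cur_align: object = None
    cur_span: object = None
    cur_loop: object = None
    align_state: int = 1000000
    cur_head: object = None
    cur_tail: object = None
    cur_pre_head: object = None
    cur_pre_tail: object = None


class AlignmentStack:
    def __init__(self):
        self.current = AlignmentGlobals()
        self._saved = []  # plays the role of the nodes linked from align_ptr

    def push_alignment(self):
        # Save a snapshot of all nine quantities.
        self._saved.append(copy.copy(self.current))
        # Start fresh adjustment lists, like the two get_avail calls.
        self.current.cur_head = []
        self.current.cur_pre_head = []

    def pop_alignment(self):
        # Restore the enclosing alignment's quantities.
        self.current = self._saved.pop()

    @property
    def depth(self):
        return len(self._saved)


stack = AlignmentStack()
stack.current.cur_align = "outer column 2"
stack.push_alignment()
stack.current.cur_align = "inner column 1"
depth_inside = stack.depth
stack.pop_alignment()
```

After the pop, the outer alignment resumes exactly where it left off, which is what allows alignments to nest.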

821. TEX has eight procedures that govern alignments: init_align and fin_align are used at the very beginning and the very end; init_row and fin_row are used at the beginning and end of individual rows; init_span is used at the beginning of a sequence of spanned columns (possibly involving only one column); init_col and fin_col are used at the beginning and end of individual columns; and align_peek is used after \cr to see whether the next item is \noalign.

We shall consider these routines in the order they are first used during the course of a complete \halign, namely init_align, align_peek, init_row, init_span, init_col, fin_col, fin_row, fin_align.

822. When \halign or \valign has been scanned in an appropriate mode, TEX calls init_align, whose task is to get everything off to a good start. This mostly involves scanning the preamble and putting its information into the preamble list.

⟦830 Declare the procedure called |get_preamble_token|⟧

forward_declaration align_peek();

forward_declaration normal_paragraph();

function init_align() {
    label done, done1, done2, continue;
    var
      save_cs_ptr: pointer, //  warning_index value for 
      // error messages
      p: pointer; // for short-term temporary use
    
    // \halign or \valign, usually
    save_cs_ptr = cur_cs;
    push_alignment;
    // enter a new alignment level
    align_state = -1000000;
    ⟦824 Check for improper alignment in displayed math⟧
    // enter a new semantic level
    push_nest;
    ⟦823 Change current mode to |-vmode| for \.{\\halign}, |-hmode| for \.{\\valign}⟧
    scan_spec(align_group, false);
    ⟦825 Scan the preamble and record it in the |preamble| list⟧
    new_save_level(align_group);
    if (every_cr != null) {
        begin_token_list(every_cr, every_cr_text);
    }
    // look for \noalign or \omit
    align_peek;
}

823. In vertical modes, prev_depth already has the correct value. But if we are in mmode (displayed formula mode), we reach out to the enclosing vertical mode for the prev_depth value that produces the correct baseline calculations.

⟦823 Change current mode to |-vmode| for \.{\\halign}, |-hmode| for \.{\\valign}⟧ = ⟦
    if (mode == mmode) {
        mode = -vmode;
        prev_depth = nest[nest_ptr - 2].aux_field.sc;
    } else if (mode > 0) {
        negate(mode);
    }
⟧

824. When \halign is used as a displayed formula, there should be no other pieces of mlists present.

⟦824 Check for improper alignment in displayed math⟧ = ⟦
    if (
        (mode == mmode)
        && ((tail != head) || (incompleat_noad != null))
    ) {
        print_err(strpool!("Improper "));
        print_esc(strpool!("halign"));
        print(strpool!(" inside $$'s"));
        help3(
          strpool!("Displays can use special alignments (like \\eqalignno)"),
        )(
          strpool!("only if nothing but the alignment itself is between $$'s."),
        )(
          strpool!("So I've deleted the formulas that preceded this alignment."),
        );
        error;
        flush_math;
    }
⟧

825.

⟦825 Scan the preamble and record it in the |preamble| list⟧ = ⟦
    preamble = null

    cur_align = align_head

    cur_loop = null

    scanner_status = aligning

    warning_index = save_cs_ptr

    // at this point, cur_cmd == left_brace 
    align_state = -1000000

    loop {
        ⟦826 Append the current tabskip glue to the preamble list⟧
        if (cur_cmd == car_ret) {
            // \cr ends the preamble
            goto done;
        }
        ⟦827 Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|, looking for changes in the tabskip glue; append an alignrecord to the preamble list⟧
    }

    done:

    scanner_status = normal
⟧

826.

⟦826 Append the current tabskip glue to the preamble list⟧ = ⟦
    link(cur_align) = new_param_glue(tab_skip_code)

    cur_align = link(cur_align)
⟧

827.

⟦827 Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|, looking for changes in the tabskip glue; append an alignrecord to the preamble list⟧ = ⟦
    ⟦831 Scan the template \<u_j>, putting the resulting token list in |hold_head|⟧

    link(cur_align) = new_null_box

    cur_align = link(cur_align) // a new alignrecord

    info(cur_align) = end_span

    width(cur_align) = null_flag

    u_part(cur_align) = link(hold_head)

    ⟦832 Scan the template \<v_j>, putting the resulting token list in |hold_head|⟧

    v_part(cur_align) = link(hold_head)
⟧

828. We enter ‘\span’ into eqtb with tab_mark as its command code, and with span_code as the command modifier. This makes TEX interpret it essentially the same as an alignment delimiter like ‘&’, yet it is recognizably different when we need to distinguish it from a normal delimiter. It also turns out to be useful to give a special cr_code to ‘\cr’, and an even larger cr_cr_code to ‘\crcr’.

The end of a template is represented by two “frozen” control sequences called \endtemplate. The first has the command code end_template, which is > outer_call, so it will not easily disappear in the presence of errors. The get_x_token routine converts the first into the second, which has endv as its command code.

// distinct from any character
@define span_code => special_char
// distinct from span_code and from any character
@define cr_code => span_code + 1
// this distinguishes \crcr from \cr
@define cr_cr_code => cr_code + 1
@define end_template_token =>
    cs_token_flag + frozen_end_template
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("span"), tab_mark, span_code)

    primitive(strpool!("cr"), car_ret, cr_code)

    text(frozen_cr) = strpool!("cr")

    eqtb[frozen_cr] = eqtb[cur_val]

    primitive(strpool!("crcr"), car_ret, cr_cr_code)

    text(frozen_end_template) = strpool!("endtemplate")

    text(frozen_endv) = strpool!("endtemplate")

    eq_type(frozen_endv) = endv

    equiv(frozen_endv) = null_list

    eq_level(frozen_endv) = level_one

    eqtb[frozen_end_template] = eqtb[frozen_endv]

    eq_type(frozen_end_template) = end_template
⟧

829.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    tab_mark:

    if (chr_code == span_code) {
        print_esc(strpool!("span"));
    } else {
        chr_cmd(strpool!("alignment tab character "));
    }

    car_ret:

    if (chr_code == cr_code) {
        print_esc(strpool!("cr"));
    } else {
        print_esc(strpool!("crcr"));
    }
⟧

830. The preamble is copied directly, except that \tabskip causes a change to the tabskip glue, thereby possibly expanding macros that immediately follow it. An appearance of \span also causes such an expansion.

Note that if the preamble contains ‘\global\tabskip’, the ‘\global’ token survives in the preamble and the ‘\tabskip’ defines new tabskip glue (locally).

⟦830 Declare the procedure called |get_preamble_token|⟧ = ⟦
    function get_preamble_token() {
        label restart;
        
      restart:
        get_token;
        while (
            (cur_chr == span_code)
            && (cur_cmd == tab_mark)
        ) {
            // this token will be expanded once
            get_token;
            if (cur_cmd > max_command) {
                expand;
                get_token;
            }
        }
        if (cur_cmd == endv) {
            fatal_error(
              strpool!("(interwoven alignment preambles are not allowed)"),
            );
        }
        if (
            (cur_cmd == assign_glue)
            && (cur_chr == glue_base + tab_skip_code)
        ) {
            scan_optional_equals;
            scan_glue(glue_val);
            if (global_defs > 0) {
                geq_define(
                  glue_base + tab_skip_code,
                  glue_ref,
                  cur_val,
                );
            } else {
                eq_define(
                  glue_base + tab_skip_code,
                  glue_ref,
                  cur_val,
                );
            }
            goto restart;
        }
    }
⟧

831. Spaces are eliminated from the beginning of a template.

⟦831 Scan the template \<u_j>, putting the resulting token list in |hold_head|⟧ = ⟦
    p = hold_head

    link(p) = null

    loop {
        get_preamble_token;
        if (cur_cmd == mac_param) {
            goto done1;
        }
        if (
            (cur_cmd <= car_ret)
            && (cur_cmd >= tab_mark)
            && (align_state == -1000000)
        ) {
            if (
                (p == hold_head)
                && (cur_loop == null)
                && (cur_cmd == tab_mark)
            ) {
                cur_loop = cur_align;
            } else {
                print_err(
                  strpool!("Missing # inserted in alignment preamble"),
                );
                help3(
                  strpool!("There should be exactly one # between &'s, when an"),
                )(
                  strpool!("\\halign or \\valign is being set up. In this case you had"),
                )(
                  strpool!("none, so I've put one in; maybe that will work."),
                );
                back_error;
                goto done1;
            }
        } else if ((cur_cmd != spacer) || (p != hold_head)) {
            link(p) = get_avail;
            p = link(p);
            info(p) = cur_tok;
        }
    }

    done1:
⟧

832.

⟦832 Scan the template \<v_j>, putting the resulting token list in |hold_head|⟧ = ⟦
    p = hold_head

    link(p) = null

    loop {
      continue:
        get_preamble_token;
        if (
            (cur_cmd <= car_ret)
            && (cur_cmd >= tab_mark)
            && (align_state == -1000000)
        ) {
            goto done2;
        }
        if (cur_cmd == mac_param) {
            print_err(
              strpool!("Only one # is allowed per tab"),
            );
            help3(
              strpool!("There should be exactly one # between &'s, when an"),
            )(
              strpool!("\\halign or \\valign is being set up. In this case you had"),
            )(
              strpool!("more than one, so I'm ignoring all but the first."),
            );
            error;
            goto continue;
        }
        link(p) = get_avail;
        p = link(p);
        info(p) = cur_tok;
    }

    done2:

    link(p) = get_avail

    p = link(p)

    // put \endtemplate at the end
    info(p) = end_template_token
⟧
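
The two scanning loops can be imitated on a list of strings. The Python sketch below is a deliberately reduced model, not the program's method: tokens are plain strings, '#', '&', '\cr', and ' ' stand for mac_param, tab_mark, car_ret, and spacer, brace nesting (align_state), \tabskip, and \span are ignored, the token list is assumed to end with '\cr', and the function name and return shape are assumptions.

```python
def scan_preamble(tokens):
    """Split a simplified preamble into (u, v) template pairs.

    Returns (columns, loop_start, errors); loop_start is the index of the
    first column of a periodic preamble, or None.
    """
    columns, errors = [], []
    loop_start = None
    i = 0
    ended_by = None
    while ended_by != "\\cr":
        # Scan the u template, up to the '#'.
        u = []
        while True:
            tok = tokens[i]
            i += 1
            if tok == "#":
                break
            if tok in ("&", "\\cr"):
                if not u and loop_start is None and tok == "&":
                    loop_start = len(columns)   # '&' at the start: periodic preamble
                    continue
                errors.append("Missing # inserted in alignment preamble")
                i -= 1                          # put the delimiter back
                break
            if tok != " " or u:                 # spaces at the start are dropped
                u.append(tok)
        # Scan the v template, up to '&' or '\cr'.
        v = []
        while True:
            tok = tokens[i]
            i += 1
            if tok in ("&", "\\cr"):
                ended_by = tok
                break
            if tok == "#":
                errors.append("Only one # is allowed per tab")
                continue                        # ignore all but the first '#'
            v.append(tok)
        columns.append((u, v + ["\\endtemplate"]))
    return columns, loop_start, errors


example = ["u1", "#", "v1", "&", "u2", "#", "v2", "&", "u3", "#", "v3", "\\cr"]
columns, loop_start, errors = scan_preamble(example)
```

Note that every v template ends with \endtemplate, even when the user's v part is empty; that token is what later tells the scanner that a column has finished.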

833. The tricky part about alignments is getting the templates into the scanner at the right time, and recovering control when a row or column is finished.

We usually begin a row after each \cr has been sensed, unless that \cr is followed by \noalign or by the right brace that terminates the alignment. The align_peek routine is used to look ahead and do the right thing; it either gets a new row started, or gets a \noalign started, or finishes off the alignment.

⟦833 Declare the procedure called |align_peek|⟧ = ⟦
    function align_peek() {
        label restart;
        
      restart:
        align_state = 1000000;
        repeat {
            get_x_or_protected;
        } until (cur_cmd != spacer);
        if (cur_cmd == no_align) {
            scan_left_brace;
            new_save_level(no_align_group);
            if (mode == -vmode) {
                normal_paragraph;
            }
        } else if (cur_cmd == right_brace) {
            fin_align;
        } else if (
            (cur_cmd == car_ret)
            && (cur_chr == cr_cr_code)
        ) {
            // ignore \crcr
            goto restart;
        } else {
            // start a new row
            init_row;
            // start a new column and replace what we peeked 
            // at
            init_col;
        }
    }
⟧
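
The look-ahead performed by align_peek amounts to a four-way dispatch on the first significant token. A minimal Python sketch follows, under the assumptions that tokens are plain strings, that macro expansion (done in the program by get_x_or_protected) is ignored, and that the function name and the returned labels are hypothetical.

```python
def align_peek_action(tokens):
    """Return (action, token) for the first token that is neither a space nor \\crcr.

    Returns None if the tokens run out first.
    """
    for tok in tokens:
        if tok == " " or tok == "\\crcr":
            continue                      # spaces are skipped, \crcr is ignored
        if tok == "\\noalign":
            return ("noalign", tok)       # start a \noalign group
        if tok == "}":
            return ("fin_align", tok)     # the alignment is over
        return ("init_row", tok)          # anything else starts a new row
    return None
```

In the last case the peeked token is not lost: init_col either records it (for \omit) or puts it back to be read after the u template.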

834. To start a row (i.e., a ‘row’ that rhymes with ‘dough’ but not with ‘bough’), we enter a new semantic level, copy the first tabskip glue, and change from internal vertical mode to restricted horizontal mode or vice versa. The space_factor and prev_depth are not used on this semantic level, but we clear them to zero just to be tidy.

⟦835 Declare the procedure called |init_span|⟧

function init_row() {
    push_nest;
    mode = (-hmode - vmode) - mode;
    if (mode == -hmode) {
        space_factor = 0;
    } else {
        prev_depth = 0;
    }
    tail_append(new_glue(glue_ptr(preamble)));
    subtype(tail) = tab_skip_code + 1;
    cur_align = link(preamble);
    cur_tail = cur_head;
    cur_pre_tail = cur_pre_head;
    init_span(cur_align);
}
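
The assignment to mode in init_row swaps the two internal modes with a single subtraction. The Python sketch below checks that identity; the numeric value chosen for HMODE is hypothetical, since any distinct positive codes behave the same way.

```python
VMODE = 1
HMODE = 120  # hypothetical value; the real one depends on max_command


def row_mode(enclosing_mode):
    """Map -vmode to -hmode and -hmode to -vmode, as init_row does."""
    return (-HMODE - VMODE) - enclosing_mode
```

An \halign runs in internal vertical mode, so its rows are built in restricted horizontal mode; for \valign the roles are reversed.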

835. The parameter to init_span is a pointer to the alignrecord where the next column or group of columns will begin. A new semantic level is entered, so that the columns will generate a list for subsequent packaging.

⟦835 Declare the procedure called |init_span|⟧ = ⟦
    function init_span(p: pointer) {
        push_nest;
        if (mode == -hmode) {
            space_factor = 1000;
        } else {
            prev_depth = ignore_depth;
            normal_paragraph;
        }
        cur_span = p;
    }
⟧

836. When a column begins, we assume that cur_cmd is either omit or else the current token should be put back into the input until the u_j template has been scanned. (Note that cur_cmd might be tab_mark or car_ret.) We also assume that align_state is approximately 1000000 at this time. We remain in the same mode, and start the template if it is called for.

function init_col() {
    extra_info(cur_align) = cur_cmd;
    if (cur_cmd == omit) {
        align_state = 0;
    } else {
        back_input;
        begin_token_list(u_part(cur_align), u_template);
        // now align_state == 1000000 
    }
}

837. The scanner sets align_state to zero when the u_j template ends. When a subsequent \cr or \span or tab mark occurs with align_state == 0, the scanner activates the following code, which fires up the v_j template. We need to remember the cur_chr, which is either cr_cr_code, cr_code, span_code, or a character code, depending on how the column text has ended.

This part of the program had better not be activated when the preamble to another alignment is being scanned, or when no alignment preamble is active.

⟦837 Insert the \(v)\<v_j> template and |goto restart|⟧ = ⟦
    {
        if (
            (scanner_status == aligning)
            || (cur_align == null)
        ) {
            fatal_error(
              strpool!("(interwoven alignment preambles are not allowed)"),
            );
        }
        cur_cmd = extra_info(cur_align);
        extra_info(cur_align) = cur_chr;
        if (cur_cmd == omit) {
            begin_token_list(omit_template, v_template);
        } else {
            begin_token_list(v_part(cur_align), v_template);
        }
        align_state = 1000000;
        goto restart;
    }
⟧

838. The token list omit_template just referred to is a constant token list that contains the special control sequence \endtemplate only.

⟦838 Initialize the special list heads and constant nodes⟧ = ⟦
    //  link ( omit_template ) == null 
    info(omit_template) = end_template_token
⟧

839. When the endv command at the end of a v_j template comes through the scanner, things really start to happen; and it is the fin_col routine that makes them happen. This routine returns true if a row as well as a column has been finished.

function fin_col(): boolean {
    label exit;
    var
      p: pointer, // the alignrecord after the current one
      q, r: pointer, // temporary pointers for list 
      // manipulation
      s: pointer, // a new span node
      u: pointer, // a new unset box
      w: scaled, // natural width
      o: glue_ord, // order of infinity
      n: halfword; // span counter
    
    if (cur_align == null) {
        confusion(strpool!("endv"));
    }
    q = link(cur_align);
    if (q == null) {
        confusion(strpool!("endv"));
    }
    if (align_state < 500000) {
        fatal_error(
          strpool!("(interwoven alignment preambles are not allowed)"),
        );
    }
    p = link(q);
    ⟦840 If the preamble list has been traversed, check that the row has ended⟧
    if (extra_info(cur_align) != span_code) {
        unsave;
        new_save_level(align_group);
        ⟦844 Package an unset box for the current column and record its width⟧
        ⟦843 Copy the tabskip glue between columns⟧
        if (extra_info(cur_align) >= cr_code) {
            fin_col = true;
            return;
        }
        init_span(p);
    }
    align_state = 1000000;
    repeat {
        get_x_or_protected;
    } until (cur_cmd != spacer);
    cur_align = p;
    init_col;
    fin_col = false;
  exit:
}

840.

⟦840 If the preamble list has been traversed, check that the row has ended⟧ = ⟦
    if ((p == null) && (extra_info(cur_align) < cr_code)) {
        if (cur_loop != null) {
            ⟦841 Lengthen the preamble periodically⟧
        } else {
            print_err(
              strpool!("Extra alignment tab has been changed to "),
            );
            print_esc(strpool!("cr"));
            help3(
              strpool!("You have given more \\span or & marks than there were"),
            )(
              strpool!("in the preamble to the \\halign or \\valign now in progress."),
            )(
              strpool!("So I'll assume that you meant to type \\cr instead."),
            );
            extra_info(cur_align) = cr_code;
            error;
        }
    }
⟧

841.

⟦841 Lengthen the preamble periodically⟧ = ⟦
    {
        link(q) = new_null_box;
        // a new alignrecord
        p = link(q);
        info(p) = end_span;
        width(p) = null_flag;
        cur_loop = link(cur_loop);
        ⟦842 Copy the templates from node |cur_loop| into node |p|⟧
        cur_loop = link(cur_loop);
        link(p) = new_glue(glue_ptr(cur_loop));
        subtype(link(p)) = tab_skip_code + 1;
    }
⟧

842.

⟦842 Copy the templates from node |cur_loop| into node |p|⟧ = ⟦
    q = hold_head

    r = u_part(cur_loop)

    while (r != null) {
        link(q) = get_avail;
        q = link(q);
        info(q) = info(r);
        r = link(r);
    }

    link(q) = null

    u_part(p) = link(hold_head)

    q = hold_head

    r = v_part(cur_loop)

    while (r != null) {
        link(q) = get_avail;
        q = link(q);
        info(q) = info(r);
        r = link(r);
    }

    link(q) = null

    v_part(p) = link(hold_head)
⟧
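
The copying idiom of section 842 can be sketched outside the mem-array notation. The following Python fragment is only an illustration, not XeTeX's code: the `Token` class, the `copy_list` and `to_values` helpers, and the use of Python objects in place of mem indices are all assumptions made for the example. A temporary head node plays the role of hold_head, so the loop needs no special case for the first copied element.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical stand-in for a one-word node with info and link fields."""
    info: int
    link: Optional["Token"] = None


def copy_list(r: Optional[Token]) -> Optional[Token]:
    """Copy the list starting at r, appending each copy after a dummy head."""
    hold_head = Token(info=0)      # dummy head, analogous to hold_head
    q = hold_head
    while r is not None:
        q.link = Token(info=r.info)  # analogous to link(q) = get_avail
        q = q.link
        r = r.link
    q.link = None
    return hold_head.link          # the copy starts after the dummy head


def to_values(p: Optional[Token]) -> list:
    """Collect the info fields of a list, for inspection only."""
    out = []
    while p is not None:
        out.append(p.info)
        p = p.link
    return out
```

Section 842 runs this loop twice, once for u_part and once for v_part, reusing the same temporary head each time.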

843.

⟦843 Copy the tabskip glue between columns⟧ = ⟦
    tail_append(new_glue(glue_ptr(link(cur_align))))

    subtype(tail) = tab_skip_code + 1
⟧

844.

⟦844 Package an unset box for the current column and record its width⟧ = ⟦
    {
        if (mode == -hmode) {
            adjust_tail = cur_tail;
            pre_adjust_tail = cur_pre_tail;
            u = hpack(link(head), natural);
            w = width(u);
            cur_tail = adjust_tail;
            adjust_tail = null;
            cur_pre_tail = pre_adjust_tail;
            pre_adjust_tail = null;
        } else {
            u = vpackage(link(head), natural, 0);
            w = height(u);
        }
        // this represents a span count of 1
        n = min_quarterword;
        if (cur_span != cur_align) {
            ⟦846 Update width entry for spanned columns⟧
        } else if (w > width(cur_align)) {
            width(cur_align) = w;
        }
        type(u) = unset_node;
        span_count(u) = n;
        ⟦701 Determine the stretch order⟧
        glue_order(u) = o;
        glue_stretch(u) = total_stretch[o];
        ⟦707 Determine the shrink order⟧
        glue_sign(u) = o;
        glue_shrink(u) = total_shrink[o];
        pop_nest;
        link(tail) = u;
        tail = u;
    }
⟧

845. A span node is a 2-word record containing width , info , and link fields. The link field is not really a link, it indicates the number of spanned columns; the info field points to a span node for the same starting column, having a greater extent of spanning, or to end_span , which has the largest possible link field; the width field holds the largest natural width corresponding to a particular set of spanned columns.

A list of the maximum widths so far, for spanned columns starting at a given column, begins with the info field of the alignrecord for that column.

// number of mem words for a span node
@define span_node_size => 2
⟦838 Initialize the special list heads and constant nodes⟧ += ⟦
    link(end_span) = max_quarterword + 1

    info(end_span) = null
⟧

846.

⟦846 Update width entry for spanned columns⟧ = ⟦
    {
        q = cur_span;
        repeat {
            incr(n);
            q = link(link(q));
        } until (q == cur_align);
        if (n > max_quarterword) {
            // this can happen, but won't
            confusion(strpool!("too many spans"));
        }
        q = cur_span;
        while (link(info(q)) < n) {
            q = info(q);
        }
        if (link(info(q)) > n) {
            s = get_node(span_node_size);
            info(s) = info(q);
            link(s) = n;
            info(q) = s;
            width(s) = w;
        } else if (width(info(q)) < w) {
            width(info(q)) = w;
        }
    }
⟧
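
The second half of section 846 is a sorted insertion into the span list described in section 845. A minimal Python sketch of that step follows, under stated assumptions: `SpanNode`, `new_column`, `record_span`, and the constant `MAX_SPAN` are hypothetical names, the field called `count` stands for what the source stores in the link field, and `next` stands for the info field. The sentinel carries the largest possible count, so the search loop always stops.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SPAN = 255  # assumed stand-in for max_quarterword


@dataclass
class SpanNode:
    count: int                      # what the source keeps in link(...)
    width: int
    next: Optional["SpanNode"] = None  # what the source keeps in info(...)


def new_column() -> SpanNode:
    """Column head whose list initially holds only the sentinel."""
    sentinel = SpanNode(count=MAX_SPAN + 1, width=0)  # like end_span
    return SpanNode(count=0, width=0, next=sentinel)


def record_span(head: SpanNode, n: int, w: int) -> None:
    """Keep the largest width seen for span counter n, in increasing order of n."""
    q = head
    while q.next.count < n:
        q = q.next
    if q.next.count > n:
        q.next = SpanNode(count=n, width=w, next=q.next)  # new span node
    elif q.next.width < w:
        q.next.width = w                                  # widen existing node


def spans(head: SpanNode) -> list:
    """List (count, width) pairs, excluding the head and the sentinel."""
    out, q = [], head.next
    while q.count <= MAX_SPAN:
        out.append((q.count, q.width))
        q = q.next
    return out
```

Because the list stays ordered by span counter, a later merge of two such lists can proceed in a single pass.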

847. At the end of a row, we append an unset box to the current vlist (for \halign) or the current hlist (for \valign). This unset box contains the unset boxes for the columns, separated by the tabskip glue. Everything will be set later.

function fin_row() {
    var
      p: pointer; // the new unset box
    
    if (mode == -hmode) {
        p = hpack(link(head), natural);
        pop_nest;
        if (cur_pre_head != cur_pre_tail) {
            append_list(cur_pre_head)(cur_pre_tail);
        }
        append_to_vlist(p);
        if (cur_head != cur_tail) {
            append_list(cur_head)(cur_tail);
        }
    } else {
        p = vpack(link(head), natural);
        pop_nest;
        link(tail) = p;
        tail = p;
        space_factor = 1000;
    }
    type(p) = unset_node;
    glue_stretch(p) = 0;
    if (every_cr != null) {
        begin_token_list(every_cr, every_cr_text);
    }
    align_peek;
    // note that glue_shrink ( p ) == 0 since glue_shrink 
    // === shift_amount 
}

848. Finally, we will reach the end of the alignment, and we can breathe a sigh of relief that memory hasn’t overflowed. All the unset boxes will now be set so that the columns line up, taking due account of spanned columns.

forward_declaration do_assignments();

forward_declaration resume_after_display();

forward_declaration build_page();

function fin_align() {
    var
      p, q, r, s, u, v: pointer, // registers for the list 
      // operations
      t, w: scaled, // width of column
      o: scaled, // shift offset for unset boxes
      n: halfword, // matching span amount
      rule_save: scaled, // temporary storage for 
      // overfull_rule 
      aux_save: memory_word; // temporary storage for aux 
    
    if (cur_group != align_group) {
        confusion(strpool!("align1"));
    }
    // that align_group was for individual entries
    unsave;
    if (cur_group != align_group) {
        confusion(strpool!("align0"));
    }
    // that align_group was for the whole alignment
    unsave;
    if (nest[nest_ptr - 1].mode_field == mmode) {
        o = display_indent;
    } else {
        o = 0;
    }
    ⟦849 Go through the preamble list, determining the column widths and changing the alignrecords to dummy unset boxes⟧
    ⟦852 Package the preamble list, to determine the actual tabskip glue amounts, and let |p| point to this prototype box⟧
    ⟦853 Set the glue in all the unset boxes of the current list⟧
    flush_node_list(p);
    pop_alignment;
    ⟦860 Insert the \(c)current list into its environment⟧
}

⟦833 Declare the procedure called |align_peek|⟧

849. It’s time now to dismantle the preamble list and to compute the column widths. Let $w_{ij}$ be the maximum of the natural widths of all entries that span columns $i$ through $j$, inclusive. The alignrecord for column $i$ contains $w_{ii}$ in its width field, and there is also a linked list of the nonzero $w_{ij}$ for increasing $j$, accessible via the info field; these span nodes contain the value $j-i+\mathit{min\_quarterword}$ in their link fields. The values of $w_{ii}$ were initialized to null_flag, which we regard as $-\infty$.

The final column widths are defined by the formula

$$w_j=\max_{1\le i\le j}\Bigl(w_{ij}-\sum_{i\le k<j}(t_k+w_k)\Bigr),$$

where $t_k$ is the natural width of the tabskip glue between columns $k$ and $k+1$. However, if $w_{ij}=-\infty$ for all i in the range 1 <= i <= j (i.e., if every entry that involved column j also involved column j + 1), we let $w_j=0$, and we zero out the tabskip glue after column j.

TEX computes these values by using the following scheme: First $w_1=w_{11}$. Then replace $w_{2j}$ by $\max(w_{2j},\,w_{1j}-t_1-w_1)$, for all $j>1$. Then $w_2=w_{22}$. Then replace $w_{3j}$ by $\max(w_{3j},\,w_{2j}-t_2-w_2)$ for all $j>2$; and so on. If any $w_j$ turns out to be $-\infty$, its value is changed to zero and so is the next tabskip.
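
The left-to-right scheme just described can be tried on small inputs. The Python sketch below is an illustration under assumptions rather than a transcription of the program: the function name `column_widths`, the dictionary keyed by `(i, j)`, and the use of a floating-point negative infinity in place of null_flag are all choices made for the example.

```python
NEG_INF = float("-inf")  # plays the role of null_flag in this sketch


def column_widths(n, spans, tabskip):
    """Compute final column widths by the incremental scheme.

    n        number of columns, numbered 1..n
    spans    dict mapping (i, j) to the widest entry spanning columns i..j
    tabskip  list where tabskip[k - 1] is the glue width after column k
    Returns (widths, tabskip) with empty columns and their glue zeroed.
    """
    w = dict(spans)     # work on copies so the caller's data is untouched
    t = list(tabskip)
    widths = []
    for i in range(1, n + 1):
        wi = w.get((i, i), NEG_INF)
        if wi == NEG_INF:          # every entry in column i spanned onward
            wi = 0
            t[i - 1] = 0
        widths.append(wi)
        # Push what is left of each span starting at i onto column i + 1.
        for j in range(i + 1, n + 1):
            carried = w.get((i, j), NEG_INF) - t[i - 1] - wi
            if carried > w.get((i + 1, j), NEG_INF):
                w[(i + 1, j)] = carried
    return widths, t
```

For instance, with three columns, single-column widths 10, 5, and 7, one entry of width 50 spanning columns 1 and 2, and tabskip glue of width 2 everywhere, the second column grows to 50 - 2 - 10 = 38 so that the spanning entry fits.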

⟦849 Go through the preamble list, determining the column widths and changing the alignrecords to dummy unset boxes⟧ = ⟦
    q = link(preamble)

    repeat {
        flush_list(u_part(q));
        flush_list(v_part(q));
        p = link(link(q));
        if (width(q) == null_flag) {
            ⟦850 Nullify |width(q)| and the tabskip glue following this column⟧
        }
        if (info(q) != end_span) {
            ⟦851 Merge the widths in the span nodes of |q| with those of |p|, destroying the span nodes of |q|⟧
        }
        type(q) = unset_node;
        span_count(q) = min_quarterword;
        height(q) = 0;
        depth(q) = 0;
        glue_order(q) = normal;
        glue_sign(q) = normal;
        glue_stretch(q) = 0;
        glue_shrink(q) = 0;
        q = p;
    } until (q == null)
⟧

850.

⟦850 Nullify |width(q)| and the tabskip glue following this column⟧ = ⟦
    {
        width(q) = 0;
        r = link(q);
        s = glue_ptr(r);
        if (s != zero_glue) {
            add_glue_ref(zero_glue);
            delete_glue_ref(s);
            glue_ptr(r) = zero_glue;
        }
    }
⟧

851. Merging of two span-node lists is a typical exercise in the manipulation of linearly linked data structures. The essential invariant in the following repeat loop is that we want to dispense with node r , in q ’s list, and u is its successor; all nodes of p ’s list up to and including s have been processed, and the successor of s matches r or precedes r or follows r , according as link(r) == n or link(r) > n or link(r) < n .

⟦851 Merge the widths in the span nodes of |q| with those of |p|, destroying the span nodes of |q|⟧ = ⟦
    {
        t = width(q) + width(glue_ptr(link(q)));
        r = info(q);
        s = end_span;
        info(s) = p;
        n = min_quarterword + 1;
        repeat {
            width(r) = width(r) - t;
            u = info(r);
            while (link(r) > n) {
                s = info(s);
                n = link(info(s)) + 1;
            }
            if (link(r) < n) {
                info(r) = info(s);
                info(s) = r;
                decr(link(r));
                s = r;
            } else {
                if (width(r) > width(info(s))) {
                    width(info(s)) = width(r);
                }
                free_node(r, span_node_size);
            }
            r = u;
        } until (r == end_span);
    }
⟧

852. Now the preamble list has been converted to a list of alternating unset boxes and tabskip glue, where the box widths are equal to the final column sizes. In case of \valign, we change the widths to heights, so that a correct error message will be produced if the alignment is overfull or underfull.

⟦852 Package the preamble list, to determine the actual tabskip glue amounts, and let |p| point to this prototype box⟧ = ⟦
    save_ptr = save_ptr - 2

    pack_begin_line = -mode_line

    if (mode == -vmode) {
        rule_save = overfull_rule;
        // prevent rule from being packaged
        overfull_rule = 0;
        p = hpack(preamble, saved(1), saved(0));
        overfull_rule = rule_save;
    } else {
        q = link(preamble);
        repeat {
            height(q) = width(q);
            width(q) = 0;
            q = link(link(q));
        } until (q == null);
        p = vpack(preamble, saved(1), saved(0));
        q = link(preamble);
        repeat {
            width(q) = height(q);
            height(q) = 0;
            q = link(link(q));
        } until (q == null);
    }

    pack_begin_line = 0
⟧

853.

⟦853 Set the glue in all the unset boxes of the current list⟧ = ⟦
    q = link(head)

    s = head

    while (q != null) {
        if (!is_char_node(q)) {
            if (type(q) == unset_node) {
                ⟦855 Set the unset box |q| and the unset boxes in it⟧
            } else if (type(q) == rule_node) {
                ⟦854 Make the running dimensions in rule |q| extend to the boundaries of the alignment⟧
            }
        }
        s = q;
        q = link(q);
    }
⟧

854.

⟦854 Make the running dimensions in rule |q| extend to the boundaries of the alignment⟧ = ⟦
    {
        if (is_running(width(q))) {
            width(q) = width(p);
        }
        if (is_running(height(q))) {
            height(q) = height(p);
        }
        if (is_running(depth(q))) {
            depth(q) = depth(p);
        }
        if (o != 0) {
            r = link(q);
            link(q) = null;
            q = hpack(q, natural);
            shift_amount(q) = o;
            link(q) = r;
            link(s) = q;
        }
    }
⟧

855. The unset box q represents a row that contains one or more unset boxes, depending on how soon \cr occurred in that row.

⟦855 Set the unset box |q| and the unset boxes in it⟧ = ⟦
    {
        if (mode == -vmode) {
            type(q) = hlist_node;
            width(q) = width(p);
            if (nest[nest_ptr - 1].mode_field == mmode) {
                // for ship_out 
                set_box_lr(q)(dlist);
            }
        } else {
            type(q) = vlist_node;
            height(q) = height(p);
        }
        glue_order(q) = glue_order(p);
        glue_sign(q) = glue_sign(p);
        glue_set(q) = glue_set(p);
        shift_amount(q) = o;
        r = link(list_ptr(q));
        s = link(list_ptr(p));
        repeat {
            ⟦856 Set the glue in node |r| and change it from an unset node⟧
            r = link(link(r));
            s = link(link(s));
        } until (r == null);
    }
⟧

856. A box made from spanned columns will be followed by tabskip glue nodes and by empty boxes as if there were no spanning. This permits perfect alignment of subsequent entries, and it prevents values that depend on floating point arithmetic from entering into the dimensions of any boxes.

⟦856 Set the glue in node |r| and change it from an unset node⟧ = ⟦
    n = span_count(r)

    t = width(s)

    w = t

    u = hold_head

    set_box_lr(r)(0) // for ship_out 

    while (n > min_quarterword) {
        decr(n);
        ⟦857 Append tabskip glue and an empty box to list |u|, and update |s| and |t| as the prototype nodes are passed⟧
    }

    if (mode == -vmode) {
        ⟦858 Make the unset node |r| into an |hlist_node| of width |w|, setting the glue as if the width were |t|⟧
    } else {
        ⟦859 Make the unset node |r| into a |vlist_node| of height |w|, setting the glue as if the height were |t|⟧
    }

    shift_amount(r) = 0

    // append blank boxes to account for spanned nodes
    if (u != hold_head) {
        link(u) = link(r);
        link(r) = link(hold_head);
        r = u;
    }
⟧

857.

⟦857 Append tabskip glue and an empty box to list |u|, and update |s| and |t| as the prototype nodes are passed⟧ = ⟦
    s = link(s)

    v = glue_ptr(s)

    link(u) = new_glue(v)

    u = link(u)

    subtype(u) = tab_skip_code + 1

    t = t + width(v)

    if (glue_sign(p) == stretching) {
        if (stretch_order(v) == glue_order(p)) {
            t = t + round(float(glue_set(p)) * stretch(v));
        }
    } else if (glue_sign(p) == shrinking) {
        if (shrink_order(v) == glue_order(p)) {
            t = t - round(float(glue_set(p)) * shrink(v));
        }
    }

    s = link(s)

    link(u) = new_null_box

    u = link(u)

    t = t + width(s)

    if (mode == -vmode) {
        width(u) = width(s);
    } else {
        type(u) = vlist_node;
        height(u) = width(s);
    }
⟧

858.

⟦858 Make the unset node |r| into an |hlist_node| of width |w|, setting the glue as if the width were |t|⟧ = ⟦
    {
        height(r) = height(q);
        depth(r) = depth(q);
        if (t == width(r)) {
            glue_sign(r) = normal;
            glue_order(r) = normal;
            set_glue_ratio_zero(glue_set(r));
        } else if (t > width(r)) {
            glue_sign(r) = stretching;
            if (glue_stretch(r) == 0) {
                set_glue_ratio_zero(glue_set(r));
            } else {
                glue_set(r) = unfloat(
                  (t - width(r)) / glue_stretch(r),
                );
            }
        } else {
            glue_order(r) = glue_sign(r);
            glue_sign(r) = shrinking;
            if (glue_shrink(r) == 0) {
                set_glue_ratio_zero(glue_set(r));
            } else if (
                (glue_order(r) == normal)
                && (width(r) - t > glue_shrink(r))
            ) {
                set_glue_ratio_one(glue_set(r));
            } else {
                glue_set(r) = unfloat(
                  (width(r) - t) / glue_shrink(r),
                );
            }
        }
        width(r) = w;
        type(r) = hlist_node;
    }
⟧

859.

⟦859 Make the unset node |r| into a |vlist_node| of height |w|, setting the glue as if the height were |t|⟧ = ⟦
    {
        width(r) = width(q);
        if (t == height(r)) {
            glue_sign(r) = normal;
            glue_order(r) = normal;
            set_glue_ratio_zero(glue_set(r));
        } else if (t > height(r)) {
            glue_sign(r) = stretching;
            if (glue_stretch(r) == 0) {
                set_glue_ratio_zero(glue_set(r));
            } else {
                glue_set(r) = unfloat(
                  (t - height(r)) / glue_stretch(r),
                );
            }
        } else {
            glue_order(r) = glue_sign(r);
            glue_sign(r) = shrinking;
            if (glue_shrink(r) == 0) {
                set_glue_ratio_zero(glue_set(r));
            } else if (
                (glue_order(r) == normal)
                && (height(r) - t > glue_shrink(r))
            ) {
                set_glue_ratio_one(glue_set(r));
            } else {
                glue_set(r) = unfloat(
                  (height(r) - t) / glue_shrink(r),
                );
            }
        }
        height(r) = w;
        type(r) = vlist_node;
    }
⟧
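
Sections 858 and 859 make the same three-way decision, once for widths and once for heights. The Python sketch below restates that decision as a pure function; it is an illustration only. The name `glue_setting`, the string labels for the sign, and the boolean `finite_shrink` (standing for the test glue_order(r) == normal) are assumptions of the example.

```python
def glue_setting(target, natural, stretch, shrink, finite_shrink=True):
    """Return (sign, ratio) for setting a box of natural size to target size.

    stretch and shrink are the totals recorded for the order in use.
    """
    if target == natural:
        return ("normal", 0.0)
    if target > natural:
        if stretch == 0:
            return ("stretching", 0.0)
        return ("stretching", (target - natural) / stretch)
    # target < natural: the box has to shrink
    if shrink == 0:
        return ("shrinking", 0.0)
    if finite_shrink and natural - target > shrink:
        return ("shrinking", 1.0)   # finite glue never shrinks past its limit
    return ("shrinking", (natural - target) / shrink)
```

The clamp to a ratio of one applies only to finite shrinkability; with an infinite order the quotient is used as it stands.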

860. We now have a completed alignment, in the list that starts at head and ends at tail . This list will be merged with the one that encloses it. (In case the enclosing mode is mmode , for displayed formulas, we will need to insert glue before and after the display; that part of the program will be deferred until we’re more familiar with such operations.)

In restricted horizontal mode, the clang part of aux is undefined; an over-cautious Pascal runtime system may complain about this.

⟦860 Insert the \(c)current list into its environment⟧ = ⟦
    aux_save = aux

    p = link(head)

    q = tail

    pop_nest

    if (mode == mmode) {
        ⟦1260 Finish an alignment in a display⟧
    } else {
        aux = aux_save;
        link(tail) = p;
        if (p != null) {
            tail = q;
        }
        if (mode == vmode) {
            build_page;
        }
    }
⟧

861. [38] Breaking paragraphs into lines. We come now to what is probably the most interesting algorithm of TEX: the mechanism for choosing the “best possible” breakpoints that yield the individual lines of a paragraph. TEX’s line-breaking algorithm takes a given horizontal list and converts it to a sequence of boxes that are appended to the current vertical list. In the course of doing this, it creates a special data structure containing three kinds of records that are not used elsewhere in TEX. Such nodes are created while a paragraph is being processed, and they are destroyed afterwards; thus, the other parts of TEX do not need to know anything about how line-breaking is done.

The method used here is based on an approach devised by Michael F. Plass and the author in 1977, subsequently generalized and improved by the same two people in 1980. A detailed discussion appears in Software—Practice and Experience 11 (1981), 1119–1184, where it is shown that the line-breaking problem can be regarded as a special case of the problem of computing the shortest path in an acyclic network. The cited paper includes numerous examples and describes the history of line breaking as it has been practiced by printers through the ages. The present implementation adds two new ideas to the algorithm of 1980: Memory space requirements are considerably reduced by using smaller records for inactive nodes than for active ones, and arithmetic overflow is avoided by using “delta distances” instead of keeping track of the total distance from the beginning of the paragraph to the current point.

862. The line_break procedure should be invoked only in horizontal mode; it leaves that mode and places its output into the current vlist of the enclosing vertical mode (or internal vertical mode). There is one explicit parameter: d is true for partial paragraphs preceding display math mode; in this case the amount of additional penalty inserted before the final line is display_widow_penalty instead of widow_penalty .

There are also a number of implicit parameters: The hlist to be broken starts at link(head) , and it is nonempty. The value of prev_graf in the enclosing semantic level tells where the paragraph should begin in the sequence of line numbers, in case hanging indentation or \parshape is in use; prev_graf is zero unless this paragraph is being continued after a displayed formula. Other implicit parameters, such as the par_shape_ptr and various penalties to use for hyphenation, etc., appear in eqtb .

After line_break has acted, it will have updated the current vlist and the value of prev_graf . Furthermore, the global variable just_box will point to the final box created by line_break , so that the width of this line can be ascertained when it is necessary to decide whether to use above_display_skip or above_display_short_skip before a displayed formula.

⟦13 Global variables⟧ += ⟦
    // the hlist_node for the last line of the new paragraph
    var just_box: pointer;
⟧

863. Since line_break is a rather lengthy procedure—sort of a small world unto itself—we must build it up little by little, somewhat more cautiously than we have done with the simpler procedures of TEX. Here is the general outline.

⟦874 Declare subprocedures for |line_break|⟧

function line_break(d: boolean) {
    label
        done,
        done1,
        done2,
        done3,
        done4,
        done5,
        done6,
        continue,
        restart;
    var ⟦910 Local variables for line breaking⟧;
    
    // this is for over/underfull box messages
    pack_begin_line = mode_line;
    ⟦864 Get ready to start line breaking⟧
    ⟦911 Find optimal breakpoints⟧
    ⟦924 Break the paragraph at the chosen breakpoints, justify the resulting lines to the correct widths, and append them to the current vertical list⟧
    ⟦913 Clean up the memory by removing the break nodes⟧
    pack_begin_line = 0;
}

⟦1466 Declare \eTeX\ procedures for use by |main_control|⟧

864. The first task is to move the list from head to temp_head and go into the enclosing semantic level. We also append the \parfillskip glue to the end of the paragraph, removing a space (or other glue node) if it was there, since spaces usually precede blank lines and instances of ‘$$’. The par_fill_skip is preceded by an infinite penalty, so it will never be considered as a potential breakpoint.

This code assumes that a glue_node and a penalty_node occupy the same number of mem words.

⟦864 Get ready to start line breaking⟧ = ⟦
    link(temp_head) = link(head)

    if (is_char_node(tail)) {
        tail_append(new_penalty(inf_penalty));
    } else if (type(tail) != glue_node) {
        tail_append(new_penalty(inf_penalty));
    } else {
        type(tail) = penalty_node;
        delete_glue_ref(glue_ptr(tail));
        flush_node_list(leader_ptr(tail));
        penalty(tail) = inf_penalty;
    }

    link(tail) = new_param_glue(par_fill_skip_code)

    last_line_fill = link(tail)

    init_cur_lang = prev_graf % 0x10000

    init_l_hyf = prev_graf div 0x400000

    init_r_hyf = (prev_graf div 0x10000) % 0x40

    pop_nest
⟧
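
The three assignments that read prev_graf in section 864 unpack bit fields from a single integer. A small Python sketch of that arithmetic follows. The function names are hypothetical, and `pack_hyphenation_state` is an assumed inverse written only to exercise the unpacking; it is not taken from this part of the program.

```python
def unpack_hyphenation_state(prev_graf: int):
    """Split the packed value the way section 864 does."""
    init_cur_lang = prev_graf % 0x10000          # low 16 bits
    init_l_hyf = prev_graf // 0x400000           # bits 22 and up
    init_r_hyf = (prev_graf // 0x10000) % 0x40   # 6 bits above the low 16
    return init_l_hyf, init_r_hyf, init_cur_lang


def pack_hyphenation_state(l_hyf: int, r_hyf: int, lang: int) -> int:
    """Hypothetical inverse, assuming r_hyf < 0x40 and lang < 0x10000."""
    return (l_hyf * 0x40 + r_hyf) * 0x10000 + lang
```

Here `//` is Python's integer division, corresponding to div in the listing.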

865. When looking for optimal line breaks, TEX creates a “break node” for each break that is feasible, in the sense that there is a way to end a line at the given place without requiring any line to stretch more than a given tolerance. A break node is characterized by three things: the position of the break (which is a pointer to a glue_node , math_node , penalty_node , or disc_node ); the ordinal number of the line that will follow this breakpoint; and the fitness classification of the line that has just ended, i.e., tight_fit , decent_fit , loose_fit , or very_loose_fit .

// fitness classification for lines shrinking 0.5 to 1.0 of 
// their shrinkability
@define tight_fit => 3
// fitness classification for lines stretching 0.5 to 1.0 of 
// their stretchability
@define loose_fit => 1
// fitness classification for lines stretching more than 
// their stretchability
@define very_loose_fit => 0
// fitness classification for all other lines
@define decent_fit => 2
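
The comments on these four constants describe the classes by how far a line stretches or shrinks relative to what its glue allows. The Python sketch below turns that description into a function of a signed ratio (negative for shrinking, positive for stretching). It is an assumption-laden illustration: the name `fitness_from_ratio` is hypothetical, the handling of the exact boundary values 0.5 and 1.0 is a guess, and the program itself makes this decision later from a badness value rather than from a ratio.

```python
TIGHT_FIT = 3
DECENT_FIT = 2
LOOSE_FIT = 1
VERY_LOOSE_FIT = 0


def fitness_from_ratio(ratio: float) -> int:
    """Classify a line by its signed adjustment ratio (boundaries assumed)."""
    if ratio < -0.5:
        return TIGHT_FIT        # shrinks by more than half its shrinkability
    if ratio <= 0.5:
        return DECENT_FIT       # everything in between
    if ratio <= 1.0:
        return LOOSE_FIT        # stretches 0.5 to 1.0 of its stretchability
    return VERY_LOOSE_FIT       # stretches beyond its stretchability
```

The numeric values are ordered so that neighbouring classes differ by one, which makes it easy to detect adjacent lines whose classes are far apart.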

866. The algorithm essentially determines the best possible way to achieve each feasible combination of position, line, and fitness. Thus, it answers questions like, “What is the best way to break the opening part of the paragraph so that the fourth line is a tight line ending at such-and-such a place?” However, the fact that all lines are to be the same length after a certain point makes it possible to regard all sufficiently large line numbers as equivalent, when the looseness parameter is zero, and this makes it possible for the algorithm to save space and time.

An “active node” and a “passive node” are created in mem for each feasible breakpoint that needs to be considered. Active nodes are three words long and passive nodes are two words long. We need active nodes only for breakpoints near the place in the paragraph that is currently being examined, so they are recycled within a comparatively short time after they are created.

867. An active node for a given breakpoint contains six fields:

link points to the next node in the list of active nodes; the last active node has link == last_active .

break_node points to the passive node associated with this breakpoint.

line_number is the number of the line that follows this breakpoint.

fitness is the fitness classification of the line ending at this breakpoint.

type is either hyphenated or unhyphenated , depending on whether this breakpoint is a disc_node .

total_demerits is the minimum possible sum of demerits over all lines leading from the beginning of the paragraph to this breakpoint.

The value of link(active) points to the first active node on a linked list of all currently active nodes. This list is in order by line_number , except that nodes with line_number > easy_line may be in any order relative to each other.

// number of words in normal active nodes
@define active_node_size_normal => 3
//  very_loose_fit .. tight_fit on final line for this break
@define fitness => subtype
// pointer to the corresponding passive node
@define break_node => rlink
// line that begins at this breakpoint
@define line_number => llink
// the quantity that TeX minimizes
@define total_demerits(#) => mem[# + 2].int
// the type of a normal active break node
@define unhyphenated => 0
// the type of an active node that breaks at a disc_node 
@define hyphenated => 1
// the active list ends where it begins
@define last_active => active

868.

⟦838 Initialize the special list heads and constant nodes⟧ += ⟦
    type(last_active) = hyphenated

    line_number(last_active) = max_halfword

    // the subtype is never examined by the algorithm
    subtype(last_active) = 0
⟧

869. The passive node for a given breakpoint contains only four fields:

link points to the passive node created just before this one, if any, otherwise it is null .

cur_break points to the position of this breakpoint in the horizontal list for the paragraph being broken.

prev_break points to the passive node that should precede this one in an optimal path to this breakpoint.

serial is equal to n if this passive node is the n th one created during the current pass. (This field is used only when printing out detailed statistics about the line-breaking calculations.)

There is a global variable called passive that points to the most recently created passive node. Another global variable, printed_node , is used to help print out the paragraph when detailed information about the line-breaking computation is being displayed.

// number of words in passive nodes
@define passive_node_size => 2
// in passive node, points to position of this breakpoint
@define cur_break => rlink
// points to passive node that should precede this one
@define prev_break => llink
// serial number for symbolic identification
@define serial => info
⟦13 Global variables⟧ += ⟦
    // most recent node on passive list
    var passive: pointer;

    // most recent node that has been printed
    var printed_node: pointer;

    // the number of passive nodes allocated on this pass
    var pass_number: halfword;
⟧

870. The active list also contains “delta” nodes that help the algorithm compute the badness of individual lines. Such nodes appear only between two active nodes, and they have type == delta_node . If p and r are active nodes and if q is a delta node between them, so that link(p) == q and link(q) == r , then q tells the space difference between lines in the horizontal list that start after breakpoint p and lines that start after breakpoint r . In other words, if we know the length of the line that starts after p and ends at our current position, then the corresponding length of the line that starts after r is obtained by adding the amounts in node q . A delta node contains six scaled numbers, since it must record the net change in glue stretchability with respect to all orders of infinity. The natural width difference appears in mem[q + 1].sc ; the stretch differences in units of pt, fil, fill, and filll appear in mem[q + 2 .. q + 5].sc ; and the shrink difference appears in mem[q + 6].sc . The subtype field of a delta node is not used.

// number of words in a delta node
@define delta_node_size => 7
@define delta_node => 2 //  type field in a delta node

871. As the algorithm runs, it maintains a set of six delta-like registers for the length of the line following the first active breakpoint to the current position in the given hlist. When it makes a pass through the active list, it also maintains a similar set of six registers for the length following the active breakpoint of current interest. A third set holds the length of an empty line (namely, the sum of \leftskip and \rightskip); and a fourth set is used to create new delta nodes.

When we pass a delta node we want to do operations like

    for (k in 1 to 6) { cur_active_width[k] = cur_active_width[k] + mem[q + k].sc; }

and we want to do this without the overhead of for loops. The do_all_six macro makes such six-tuples convenient.

@define do_all_six(#) =>
    #(1);
    #(2);
    #(3);
    #(4);
    #(5);
    #(6)
⟦13 Global variables⟧ += ⟦
    // distance from first active node to cur_p
    var active_width: array [1 .. 6] of scaled;

    // distance from current active node
    var cur_active_width: array [1 .. 6] of scaled;

    // length of an "empty" line
    var background: array [1 .. 6] of scaled;

    // length being computed after current break
    var break_width: array [1 .. 6] of scaled;
⟧

872. Let’s state the principles of the delta nodes more precisely and concisely, so that the following programs will be less obscure. For each legal breakpoint p in the paragraph, we define two quantities $\alpha(p)$ and $\beta(p)$ such that the length of material in a line from breakpoint p to breakpoint q is $\gamma+\beta(q)-\alpha(p)$, for some fixed $\gamma$. Intuitively, $\alpha(p)$ and $\beta(q)$ are the total length of material from the beginning of the paragraph to a point “after” a break at p and to a point “before” a break at q; and $\gamma$ is the width of an empty line, namely the length contributed by \leftskip and \rightskip.

Suppose, for example, that the paragraph consists entirely of alternating boxes and glue skips; let the boxes have widths $x_1\ldots x_n$ and let the skips have widths $y_1\ldots y_n$, so that the paragraph can be represented by $x_1y_1\ldots x_ny_n$. Let $p_i$ be the legal breakpoint at $y_i$; then $\alpha(p_i)=x_1+y_1+\cdots+x_i+y_i$, and $\beta(p_i)=x_1+y_1+\cdots+x_i$. To check this, note that the length of material from $p_2$ to $p_5$, say, is $\gamma+x_3+y_3+x_4+y_4+x_5=\gamma+\beta(p_5)-\alpha(p_2)$.

The quantities $\alpha$, $\beta$, $\gamma$ involve glue stretchability and shrinkability as well as a natural width. If we were to compute $\alpha(p)$ and $\beta(p)$ for each p, we would need multiple precision arithmetic, and the multiprecise numbers would have to be kept in the active nodes. TEX avoids this problem by working entirely with relative differences or “deltas.” Suppose, for example, that the active list contains $a_1\,\delta_1\,a_2\,\delta_2\,a_3$, where the a’s are active breakpoints and the $\delta$’s are delta nodes. Then $\delta_1=\alpha(a_1)-\alpha(a_2)$ and $\delta_2=\alpha(a_2)-\alpha(a_3)$. If the line breaking algorithm is currently positioned at some other breakpoint p, the active_width array contains the value $\gamma+\beta(p)-\alpha(a_1)$. If we are scanning through the list of active nodes and considering a tentative line that runs from $a_2$ to p, say, the cur_active_width array will contain the value $\gamma+\beta(p)-\alpha(a_2)$. Thus, when we move from $a_2$ to $a_3$, we want to add $\alpha(a_2)-\alpha(a_3)$ to cur_active_width; and this is just $\delta_2$, which appears in the active list between $a_2$ and $a_3$. The background array contains $\gamma$. The break_width array will be used to calculate values of new delta nodes when the active list is being updated.
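
The bookkeeping of section 872 can be checked numerically on the box-and-skip example. The Python sketch below tracks only natural widths, whereas the program keeps six components per quantity; the function name `alpha_beta` and the sample numbers are assumptions chosen for illustration.

```python
def alpha_beta(x, y):
    """Return (alpha, beta) lists indexed by breakpoint number 1..n.

    x[i - 1] and y[i - 1] are the widths of box i and skip i; index 0 of
    each returned list is unused padding so that alpha[i] is alpha(p_i).
    """
    alpha, beta = [0], [0]
    total = 0
    for xi, yi in zip(x, y):
        beta.append(total + xi)   # material up to just before skip i
        total += xi + yi
        alpha.append(total)       # material up to just after skip i
    return alpha, beta


x = [10, 20, 30, 40, 50]          # sample box widths
y = [1, 2, 3, 4, 5]               # sample skip widths
gamma = 7                         # sample width of an empty line
alpha, beta = alpha_beta(x, y)

# Length of the line from p_2 to p_5, computed two ways.
direct = gamma + x[2] + y[2] + x[3] + y[3] + x[4]
via_formula = gamma + beta[5] - alpha[2]

# Active list a_1 = p_1, a_2 = p_2, a_3 = p_3, current position p = p_5.
delta_1 = alpha[1] - alpha[2]
delta_2 = alpha[2] - alpha[3]
active_width = gamma + beta[5] - alpha[1]
cur_active_width = active_width + delta_1     # now measures from a_2
after_second_step = cur_active_width + delta_2  # now measures from a_3
```

Each step along the active list adds one small delta, so no running total from the start of the paragraph ever needs to be stored.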

873. Glue nodes in a horizontal list that is being paragraphed are not supposed to include “infinite” shrinkability; that is why the algorithm maintains four registers for stretching but only one for shrinking. If the user tries to introduce infinite shrinkability, the shrinkability will be reset to finite and an error message will be issued. A boolean variable no_shrink_error_yet prevents this error message from appearing more than once per paragraph.

@define check_shrinkage(#) =>
    if ((shrink_order(#) != normal) && (shrink(#) != 0)) {
        # = finite_shrink(#);
    }
⟦13 Global variables⟧ += ⟦
    // have we complained about infinite shrinkage?
    var no_shrink_error_yet: boolean;
⟧

874.

⟦874 Declare subprocedures for |line_break|⟧ = ⟦
    // recovers from infinite shrinkage
    function finite_shrink(p: pointer): pointer {
        var
          q: pointer; // new glue specification
        
        if (no_shrink_error_yet) {
            no_shrink_error_yet = false;
            stat!{
                if (tracing_paragraphs > 0) {
                    end_diagnostic(true);
                }
            }
            print_err(
              strpool!("Infinite glue shrinkage found in a paragraph"),
            );
            help5(
              strpool!("The paragraph just ended includes some glue that has"),
            )(
              strpool!("infinite shrinkability, e.g., `\\hskip 0pt minus 1fil'."),
            )(
              strpool!("Such glue doesn't belong there---it allows a paragraph"),
            )(
              strpool!("of any length to fit on one line. But it's safe to proceed,"),
            )(
              strpool!("since the offensive shrinkability has been made finite."),
            );
            error;
            stat!{
                if (tracing_paragraphs > 0) {
                    begin_diagnostic;
                }
            }
        }
        q = new_spec(p);
        shrink_order(q) = normal;
        delete_glue_ref(p);
        finite_shrink = q;
    }
⟧
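
The interplay between check_shrinkage, finite_shrink, and no_shrink_error_yet can be illustrated as follows. This is a hedged sketch in Python under simplifying assumptions: the GlueSpec record and the ShrinkChecker class are hypothetical stand-ins for glue specifications in mem and for the global flag, and the error is merely recorded in a list instead of being reported interactively.

```python
from dataclasses import dataclass, replace

NORMAL = 0  # order of finite glue (assumed numeric encoding)


@dataclass(frozen=True)
class GlueSpec:
    width: int = 0
    stretch: int = 0
    stretch_order: int = NORMAL
    shrink: int = 0
    shrink_order: int = NORMAL


class ShrinkChecker:
    def __init__(self):
        self.no_shrink_error_yet = True  # reset at the start of each paragraph
        self.messages = []

    def check_shrinkage(self, spec):
        """Return spec unchanged unless it has nonzero infinite shrinkability."""
        if spec.shrink_order != NORMAL and spec.shrink != 0:
            return self.finite_shrink(spec)
        return spec

    def finite_shrink(self, spec):
        """Complain at most once, then return a copy with finite shrink order."""
        if self.no_shrink_error_yet:
            self.no_shrink_error_yet = False
            self.messages.append("Infinite glue shrinkage found in a paragraph")
        return replace(spec, shrink_order=NORMAL)


checker = ShrinkChecker()
bad = GlueSpec(shrink=65536, shrink_order=1)  # in the spirit of `0pt minus 1fil`
fixed_once = checker.check_shrinkage(bad)
fixed_twice = checker.check_shrinkage(bad)
```

Note that only the order is reset; the shrink amount itself is kept, as in finite_shrink above.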

875.

⟦864 Get ready to start line breaking⟧ += ⟦
    no_shrink_error_yet = true

    check_shrinkage(left_skip)

    check_shrinkage(right_skip)

    q = left_skip

    r = right_skip

    background[1] = width(q) + width(r)

    background[2] = 0

    background[3] = 0

    background[4] = 0

    background[5] = 0

    background[2 + stretch_order(q)] = stretch(q)

    background[2 + stretch_order(r)] = 
        background[2 + stretch_order(r)]
        + stretch(r)

    background[6] = shrink(q) + shrink(r)

    ⟦1654 Check for special treatment of last line of paragraph⟧
⟧
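
As an illustration of how the six background slots are filled, here is a small Python sketch. It is not the program's code: GlueSpec and make_background are hypothetical names, and the numeric glue orders are assumed to be normal=0, fil=1, fill=2, filll=3.

```python
from dataclasses import dataclass

NORMAL, FIL, FILL, FILLL = range(4)  # assumed numeric glue orders


@dataclass
class GlueSpec:
    width: int = 0
    stretch: int = 0
    stretch_order: int = NORMAL
    shrink: int = 0


def make_background(left_skip, right_skip):
    """Return a 7-slot list; slots 1..6 mirror background[1..6]."""
    bg = [0] * 7  # slot 0 is unused so that indices match the text
    bg[1] = left_skip.width + right_skip.width
    bg[2 + left_skip.stretch_order] += left_skip.stretch
    bg[2 + right_skip.stretch_order] += right_skip.stretch
    bg[6] = left_skip.shrink + right_skip.shrink
    return bg


left = GlueSpec(width=5, stretch=3, stretch_order=FIL, shrink=1)
right = GlueSpec(width=7, stretch=4, stretch_order=NORMAL, shrink=2)
bg = make_background(left, right)
```

Slots 2 through 5 hold stretchability of increasing order, which is why there are four of them but only a single shrink slot.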

876. A pointer variable cur_p runs through the given horizontal list as we look for breakpoints. This variable is global, since it is used both by line_break and by its subprocedure try_break.

Another global variable called threshold is used to determine the feasibility of individual lines: Breakpoints are feasible if there is a way to reach them without creating lines whose badness exceeds threshold. (The badness is compared to threshold before penalties are added, so that penalty values do not affect the feasibility of breakpoints, except that no break is allowed when the penalty is 10000 or more.) If threshold is 10000 or more, all legal breaks are considered feasible, since the badness function specified above never returns a value greater than 10000.

Up to three passes might be made through the paragraph in an attempt to find at least one set of feasible breakpoints. On the first pass, we have threshold == pretolerance and second_pass == final_pass == false. If this pass fails to find a feasible solution, threshold is set to tolerance, second_pass is set true, and an attempt is made to hyphenate as many words as possible. If that fails too, we add emergency_stretch to the background stretchability and set final_pass == true.

⟦13 Global variables⟧ += ⟦
    // the current breakpoint under consideration
    var cur_p: pointer;

    // is this our second attempt to break this paragraph?
    var second_pass: boolean;

    // is this our final attempt to break this paragraph?
    var final_pass: boolean;

    // maximum badness on feasible lines
    var threshold: integer;
⟧
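
The three-pass strategy described above might be sketched as follows. This is a simplified Python illustration that follows the prose only: try_pass is a hypothetical callback standing in for one complete scan of the paragraph, and special cases handled elsewhere in the program (for example, how the passes are selected when some parameters are non-positive) are deliberately left out.

```python
def run_passes(try_pass, pretolerance, tolerance, emergency_stretch):
    """Call try_pass up to three times; return the number of the first
    pass that reports success, or None if all three report failure."""
    attempts = [
        # first pass: no hyphenation, strict threshold
        dict(threshold=pretolerance, second_pass=False,
             final_pass=False, extra_stretch=0),
        # second pass: hyphenation allowed, looser threshold
        dict(threshold=tolerance, second_pass=True,
             final_pass=False, extra_stretch=0),
        # final pass: emergency stretch added to the background
        dict(threshold=tolerance, second_pass=True,
             final_pass=True, extra_stretch=emergency_stretch),
    ]
    for number, settings in enumerate(attempts, start=1):
        if try_pass(**settings):
            return number
    return None


log = []


def fake_pass(threshold, second_pass, final_pass, extra_stretch):
    """Toy stand-in: succeeds only on the final pass."""
    log.append((threshold, second_pass, final_pass, extra_stretch))
    return final_pass


result = run_passes(fake_pass, pretolerance=100, tolerance=200,
                    emergency_stretch=50)
```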

877. The heart of the line-breaking procedure is ‘try_break’, a subroutine that tests if the current breakpoint cur_p is feasible, by running through the active list to see what lines of text can be made from active nodes to cur_p. If feasible breaks are possible, new break nodes are created. If cur_p is too far from an active node, that node is deactivated.

The parameter pi to try_break is the penalty associated with a break at cur_p; we have pi == eject_penalty if the break is forced, and pi == inf_penalty if the break is illegal.

The other parameter, break_type, is set to hyphenated or unhyphenated, depending on whether or not the current break is at a disc_node. The end of a paragraph is also regarded as ‘hyphenated’; this case is distinguishable by the condition cur_p == null.

@define copy_to_cur_active(#) =>
    cur_active_width[#] = active_width[#]
// go here when node r should be deactivated
@define deactivate => 60
// skipable nodes at the margins during character protrusion
@define cp_skipable(#) =>
    (!
        is_char_node(#)
        && (
            (type(#) == ins_node)
            || (type(#) == mark_node)
            || (type(#) == adjust_node)
            || (type(#) == penalty_node)
            || (
                (type(#) == disc_node)
                && (pre_break(#) == null)
                && (post_break(#) == null)
                && (replace_count(#) == 0)
            ) // an empty disc_node 
            || ((type(#) == math_node) && (width(#) == 0))
            || (
                (type(#) == kern_node)
                && (
                    (width(#) == 0)
                    || (subtype(#) == normal)
                )
            )
            || (
                (type(#) == glue_node)
                && (glue_ptr(#) == zero_glue)
            )
            || (
                (type(#) == hlist_node)
                && (width(#) == 0)
                && (height(#) == 0)
                && (depth(#) == 0) && (list_ptr(#) == null)
            )
        )
    )
⟦874 Declare subprocedures for |line_break|⟧ += ⟦
    function push_node(p: pointer) {
        if (hlist_stack_level > max_hlist_stack) {
            pdf_error(
              strpool!("push_node"),
              strpool!("stack overflow"),
            );
        }
        hlist_stack[hlist_stack_level] = p;
        hlist_stack_level = hlist_stack_level + 1;
    }

    function pop_node(): pointer {
        hlist_stack_level = hlist_stack_level - 1;
        // would point to some bug
        if (hlist_stack_level < 0) {
            pdf_error(
              strpool!("pop_node"),
              strpool!("stack underflow (internal error)"),
            );
        }
        pop_node = hlist_stack[hlist_stack_level];
    }

    // searches left to right from list head l , returns 1st 
    // non-skipable item
    function find_protchar_left(
      l: pointer,
      d: boolean,
    ): pointer {
        var t: pointer, run: boolean;
        
        if (
            (link(l) != null)
            && (type(l) == hlist_node)
            && (width(l) == 0)
            && (height(l) == 0)
            && (depth(l) == 0) && (list_ptr(l) == null)
        ) {
            // for paragraph start with \parindent = 0pt
            l = link(l);
        } else if (d) {
            while (
                (link(l) != null)
                && (!(is_char_node(l) || non_discardable(l)))
            ) {
                // standard discardables at a line break; see 
                // The TeXbook, p. 95
                l = link(l);
            }
        }
        hlist_stack_level = 0;
        run = true;
        repeat {
            t = l;
            while (
                run
                && (type(l) == hlist_node)
                && (list_ptr(l) != null)
            ) {
                push_node(l);
                l = list_ptr(l);
            }
            while (run && cp_skipable(l)) {
                while (
                    (link(l) == null)
                    && (hlist_stack_level > 0)
                ) {
                    // don't visit this node again
                    l = pop_node;
                }
                if (link(l) != null) {
                    l = link(l);
                } else if (hlist_stack_level == 0) {
                    run = false;
                }
            }
        } until (t == l);
        find_protchar_left = l;
    }

    // searches right to left from list tail r to head l , 
    // returns 1st non-skipable item
    function find_protchar_right(l, r: pointer): pointer {
        var t: pointer, run: boolean;
        
        find_protchar_right = null;
        if (r == null) {
            return;
        }
        hlist_stack_level = 0;
        run = true;
        repeat {
            t = r;
            while (
                run
                && (type(r) == hlist_node)
                && (list_ptr(r) != null)
            ) {
                push_node(l);
                push_node(r);
                l = list_ptr(r);
                r = l;
                while (link(r) != null) {
                    r = link(r);
                }
            }
            while (run && cp_skipable(r)) {
                while ((r == l) && (hlist_stack_level > 0)) {
                    // don't visit this node again
                    r = pop_node;
                    l = pop_node;
                }
                if ((r != l) && (r != null)) {
                    r = prev_rightmost(l, r);
                } else if (
                    (r == l)
                    && (hlist_stack_level == 0)
                ) {
                    run = false;
                }
            }
        } until (t == r);
        find_protchar_right = r;
    }

    // returns the total width of character protrusion of a 
    // line; cur_break(break_node(q)) and p are the leftmost 
    // and rightmost nodes, respectively, of the horizontal 
    // list representing the actual line
    function total_pw(q, p: pointer): scaled {
        var l, r: pointer, n: integer;
        
        if (break_node(q) == null) {
            l = first_p;
        } else {
            l = cur_break(break_node(q));
        }
        // get link ( r ) == p 
        // let's look at the right margin first
        r = prev_rightmost(global_prev_p, p);
        // a disc_node with non-empty pre_break , protrude 
        // the last char of pre_break 
        if (
            (p != null)
            && (type(p) == disc_node)
            && (pre_break(p) != null)
        ) {
            r = pre_break(p);
            while (link(r) != null) {
                r = link(r);
            }
        } else {
            r = find_protchar_right(l, r);
        }
        // now the left margin
        if ((l != null) && (type(l) == disc_node)) {
            if (post_break(l) != null) {
                // protrude the first char
                l = post_break(l);
                goto done;
            } else {
                // discard replace_count ( l ) nodes
                n = replace_count(l);
                l = link(l);
                while (n > 0) {
                    if (link(l) != null) {
                        l = link(l);
                    }
                    decr(n);
                }
            }
        }
        l = find_protchar_left(l, true);
      done:
        total_pw = left_pw(l) + right_pw(r);
    }

    function try_break(
      pi: integer,
      break_type: small_number,
    ) {
        label
            exit,
            done,
            done1,
            continue,
            deactivate,
            found,
            not_found;
        var
          r: pointer, // runs through the active list
          prev_r: pointer, // stays a step behind r 
          old_l: halfword, // maximum line number in current 
          // equivalence class of lines
          no_break_yet: boolean, // have we found a feasible 
          // break at cur_p ?
          ⟦878 Other local variables for |try_break|⟧;
        
        ⟦879 Make sure that |pi| is in the proper range⟧
        no_break_yet = true;
        prev_r = active;
        old_l = 0;
        do_all_six(copy_to_cur_active);
        loop {
          continue:
            r = link(prev_r);
            ⟦880 If node |r| is of type |delta_node|, update |cur_active_width|, set |prev_r| and |prev_prev_r|, then |goto continue|⟧
            ⟦883 If a line number class has ended, create new active nodes for the best feasible breaks in that class; then |return| if |r=last_active|, otherwise compute the new |line_width|⟧
            ⟦899 Consider the demerits for a line from |r| to |cur_p|; deactivate node |r| if it should no longer be active; then |goto continue| if a line from |r| to |cur_p| is infeasible, otherwise record a new feasible break⟧
        }
      exit:
        stat!{
            ⟦906 Update the value of |printed_node| for symbolic displays⟧
        }
    }
⟧

878.

⟦878 Other local variables for |try_break|⟧ = ⟦
    // a step behind prev_r , if type ( prev_r ) == 
    // delta_node 
    var prev_prev_r: pointer;

    // runs through nodes ahead of cur_p 
    var s: pointer;

    // points to a new node being created
    var q: pointer;

    // points to a glue specification or a node ahead of 
    // cur_p 
    var v: pointer;

    // node count, if cur_p is a discretionary node
    var t: integer;

    // used in character width calculation
    var f: internal_font_number;

    // line number of current active node
    var l: halfword;

    // should node r remain in the active list?
    var node_r_stays_active: boolean;

    // the current line will be justified to this width
    var line_width: scaled;

    // possible fitness class of test line
    var fit_class: very_loose_fit .. tight_fit;

    // badness of test line
    var b: halfword;

    // demerits of test line
    var d: integer;

    // has d been forced to zero?
    var artificial_demerits: boolean;

    // temporarily holds value of link ( cur_p ) 
    var save_link: pointer;

    // used in badness calculations
    var shortfall: scaled;
⟧

879.

⟦879 Make sure that |pi| is in the proper range⟧ = ⟦
    if (abs(pi) >= inf_penalty) {
        if (pi > 0) {
            // this breakpoint is inhibited by infinite 
            // penalty
            return;
        } else {
            // this breakpoint will be forced
            pi = eject_penalty;
        }
    }
⟧
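
The range check on pi amounts to a small normalization step. A minimal Python sketch, assuming the usual values inf_penalty = 10000 and eject_penalty = −10000; the function name normalize_penalty is hypothetical, and returning None stands in for the early return from try_break:

```python
INF_PENALTY = 10000            # assumed value of inf_penalty
EJECT_PENALTY = -INF_PENALTY   # assumed value of eject_penalty


def normalize_penalty(pi):
    """Return the penalty to use, or None if the break is inhibited."""
    if abs(pi) >= INF_PENALTY:
        if pi > 0:
            return None          # infinite penalty: no break here
        return EJECT_PENALTY     # forced break: clamp to eject_penalty
    return pi
```

For example, normalize_penalty(-25000) yields −10000, so all forced breaks are treated alike.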

880. The following code uses the fact that type(last_active) != delta_node.

@define update_width(#) =>
    cur_active_width[#] = 
        cur_active_width[#]
        + mem[r + #].sc
⟦880 If node |r| is of type |delta_node|, update |cur_active_width|, set |prev_r| and |prev_prev_r|, then |goto continue|⟧ = ⟦
    if (type(r) == delta_node) {
        do_all_six(update_width);
        prev_prev_r = prev_r;
        prev_r = r;
        goto continue;
    }
⟧

881. As we consider various ways to end a line at cur_p, in a given line number class, we keep track of the best total demerits known, in an array with one entry for each of the fitness classifications. For example, minimal_demerits[tight_fit] contains the fewest total demerits of feasible line breaks ending at cur_p with a tight_fit line; best_place[tight_fit] points to the passive node for the break before cur_p that achieves such an optimum; and best_pl_line[tight_fit] is the line_number field in the active node corresponding to best_place[tight_fit]. When no feasible break sequence is known, the minimal_demerits entries will be equal to awful_bad, which is 2³⁰−1. Another variable, minimum_demerits, keeps track of the smallest value in the minimal_demerits array.

// more than a billion demerits
@define awful_bad => 0x3fffffff
⟦13 Global variables⟧ += ⟦
    // best total demerits known for current line class and 
    // position, given the fitness
    var minimal_demerits: array [
      very_loose_fit .. tight_fit,
    ] of integer;

    // best total demerits known for current line class and 
    // position
    var minimum_demerits: integer;

    // how to achieve minimal_demerits 
    var best_place: array [very_loose_fit .. tight_fit] of
      pointer;

    // corresponding line number
    var best_pl_line: array [very_loose_fit .. tight_fit] of
      halfword;
⟧

882.

⟦864 Get ready to start line breaking⟧ += ⟦
    minimum_demerits = awful_bad

    minimal_demerits[tight_fit] = awful_bad

    minimal_demerits[decent_fit] = awful_bad

    minimal_demerits[loose_fit] = awful_bad

    minimal_demerits[very_loose_fit] = awful_bad
⟧

883. The first part of the following code is part of TEX’s inner loop, so we don’t want to waste any time. The current active node, namely node r, contains the line number that will be considered next. At the end of the list we have arranged the data structure so that r == last_active and line_number(last_active) > old_l.

⟦883 If a line number class has ended, create new active nodes for the best feasible breaks in that class; then |return| if |r=last_active|, otherwise compute the new |line_width|⟧ = ⟦
    {
        l = line_number(r);
        if (l > old_l) {
            // now we are no longer in the inner loop
            if (
                (minimum_demerits < awful_bad)
                && (
                    (old_l != easy_line)
                    || (r == last_active)
                )
            ) {
                ⟦884 Create new active nodes for the best feasible breaks just found⟧
            }
            if (r == last_active) {
                return;
            }
            ⟦898 Compute the new line width⟧
        }
    }
⟧

884. It is not necessary to create new active nodes having minimal_demerits greater than minimum_demerits + abs(adj_demerits), since such active nodes will never be chosen in the final paragraph breaks. This observation allows us to omit a substantial number of feasible breakpoints from further consideration.

⟦884 Create new active nodes for the best feasible breaks just found⟧ = ⟦
    {
        if (no_break_yet) {
            ⟦885 Compute the values of |break_width|⟧
        }
        ⟦891 Insert a delta node to prepare for breaks at |cur_p|⟧
        if (
            abs(adj_demerits)
            >= awful_bad - minimum_demerits
        ) {
            minimum_demerits = awful_bad - 1;
        } else {
            minimum_demerits = 
                minimum_demerits
                + abs(adj_demerits)
            ;
        }
        for (fit_class in very_loose_fit to tight_fit) {
            if (
                minimal_demerits[fit_class]
                <= minimum_demerits
            ) {
                ⟦893 Insert a new active node from |best_place[fit_class]| to |cur_p|⟧
            }
            minimal_demerits[fit_class] = awful_bad;
        }
        minimum_demerits = awful_bad;
        ⟦892 Insert a delta node to prepare for the next active node⟧
    }
⟧
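
The pruning rule of this section can be illustrated in isolation. The sketch below is Python, not the program's code: the fitness classes are represented by hypothetical string keys, and classes_to_keep only reports which classes would receive a new active node. The comparison against awful_bad − minimum_demerits mirrors the overflow guard in the code above.

```python
AWFUL_BAD = 0x3FFFFFFF  # 2**30 - 1, as defined for awful_bad


def classes_to_keep(minimal_demerits, adj_demerits):
    """Return the fitness classes whose best demerits are within
    abs(adj_demerits) of the overall minimum."""
    minimum = min(minimal_demerits.values())
    if abs(adj_demerits) >= AWFUL_BAD - minimum:
        limit = AWFUL_BAD - 1   # saturate instead of overflowing
    else:
        limit = minimum + abs(adj_demerits)
    return [c for c, d in minimal_demerits.items() if d <= limit]


found = {
    "very_loose_fit": AWFUL_BAD,   # no feasible break in this class
    "loose_fit": 500,
    "decent_fit": 100,
    "tight_fit": 10200,
}
kept = classes_to_keep(found, adj_demerits=10000)
```

With these numbers the limit is 10100, so tight_fit (10200) is dropped even though it is feasible.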

885. When we insert a new active node for a break at cur_p, suppose this new node is to be placed just before active node a; then we essentially want to insert ‘𝛿 𝑐𝑢𝑟_𝑝 𝛿′’ before a, where 𝛿=𝛼(𝑎)−𝛼(𝑐𝑢𝑟_𝑝) and 𝛿′=𝛼(𝑐𝑢𝑟_𝑝)−𝛼(𝑎) in the notation explained above. The cur_active_width array now holds 𝛾+𝛽(𝑐𝑢𝑟_𝑝)−𝛼(𝑎); so 𝛿 can be obtained by subtracting cur_active_width from the quantity 𝛾+𝛽(𝑐𝑢𝑟_𝑝)−𝛼(𝑐𝑢𝑟_𝑝). The latter quantity can be regarded as the length of a line “from cur_p to cur_p”; we call it the break_width at cur_p.

The break_width is usually negative, since it consists of the background (which is normally zero) minus the width of nodes following cur_p that are eliminated after a break. If, for example, node cur_p is a glue node, the width of this glue is subtracted from the background; and we also look ahead to eliminate all subsequent glue and penalty and kern and math nodes, subtracting their widths as well.

Kern nodes do not disappear at a line break unless they are explicit or space_adjustment.

@define set_break_width_to_background(#) =>
    break_width[#] = background[#]
⟦885 Compute the values of |break_width|⟧ = ⟦
    {
        no_break_yet = false;
        do_all_six(set_break_width_to_background);
        s = cur_p;
        if (break_type > unhyphenated) {
            if (cur_p != null) {
                ⟦888 Compute the discretionary |break_width| values⟧
            }
        }
        while (s != null) {
            if (is_char_node(s)) {
                goto done;
            }
            case type(s) {
              glue_node:
                ⟦886 Subtract glue from |break_width|⟧
              penalty_node:
                do_nothing;
              math_node:
                break_width[1] = break_width[1] - width(s);
              kern_node:
                if (subtype(s) != explicit) {
                    goto done;
                } else {
                    break_width[1] = 
                        break_width[1]
                        - width(s)
                    ;
                }
              othercases:
                goto done;
            }
            s = link(s);
        }
      done:
    }
⟧
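
The look-ahead loop that discards material after a break can be sketched as follows. This Python fragment is illustrative only: it tracks the natural width component (slot 1) alone, ignores discretionary breaks, and represents nodes as hypothetical (kind, width) pairs in which explicit kerns carry the kind "explicit_kern".

```python
def natural_break_width(background_width, following):
    """Subtract the widths of discardable nodes starting at cur_p.

    following: list of (kind, width) pairs for cur_p and the nodes after it.
    Scanning stops at the first node that survives a line break.
    """
    bw = background_width
    for kind, width in following:
        if kind in ("glue", "math", "explicit_kern"):
            bw -= width
        elif kind == "penalty":
            continue             # penalties vanish and have no width
        else:
            break                # a character, box, other kern, etc.
    return bw


nodes = [("glue", 5), ("penalty", 0), ("explicit_kern", 2),
         ("char", 7), ("glue", 5)]
bw = natural_break_width(0, nodes)
```

The result is negative here, in line with the remark that break_width is usually negative when the background is zero.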

886.

⟦886 Subtract glue from |break_width|⟧ = ⟦
    {
        v = glue_ptr(s);
        break_width[1] = break_width[1] - width(v);
        break_width[2 + stretch_order(v)] = 
            break_width[2 + stretch_order(v)]
            - stretch(v)
        ;
        break_width[6] = break_width[6] - shrink(v);
    }
⟧

887. When cur_p is a discretionary break, the length of a line “from cur_p to cur_p” has to be defined properly so that the other calculations work out. Suppose that the pre-break text at cur_p has length 𝑙₀, the post-break text has length 𝑙₁, and the replacement text has length l. Suppose also that q is the node following the replacement text. Then the length of a line from cur_p to q will be computed as 𝛾+𝛽(𝑞)−𝛼(𝑐𝑢𝑟_𝑝), where 𝛽(𝑞)=𝛽(𝑐𝑢𝑟_𝑝)−𝑙₀+𝑙. The actual length will be the background plus 𝑙₁, so the length from cur_p to cur_p should be 𝛾+𝑙₀+𝑙₁−𝑙. If the post-break text of the discretionary is empty, a break may also discard q; in that unusual case we subtract the length of q and any other nodes that will be discarded after the discretionary break.

The value of 𝑙₀ need not be computed, since line_break will put it into the global variable disc_width before calling try_break.

⟦13 Global variables⟧ += ⟦
    // the length of discretionary material preceding a 
    // break
    var disc_width: scaled;
⟧
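
The step from the two stated facts to the break width of a discretionary can be written out explicitly. The following derivation is a sketch using only the quantities named in this section:

```latex
\begin{align*}
\gamma + \beta(q) - \alpha(\mathit{cur\_p}) &= \gamma + l_1
  && \text{(actual length of the line from $\mathit{cur\_p}$ to $q$)} \\
\beta(q) &= \beta(\mathit{cur\_p}) - l_0 + l
  && \text{(drop the pre-break text, add the replacement text)} \\
\Longrightarrow\quad
\gamma + \beta(\mathit{cur\_p}) - \alpha(\mathit{cur\_p})
  &= \gamma + l_0 + l_1 - l
  && \text{(the break width at $\mathit{cur\_p}$)}
\end{align*}
```

Substituting the second line into the first and moving $-l_0 + l$ to the right-hand side gives the third line.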

888.

⟦888 Compute the discretionary |break_width| values⟧ = ⟦
    {
        t = replace_count(cur_p);
        v = cur_p;
        s = post_break(cur_p);
        while (t > 0) {
            decr(t);
            v = link(v);
            ⟦889 Subtract the width of node |v| from |break_width|⟧
        }
        while (s != null) {
            ⟦890 Add the width of node |s| to |break_width|⟧
            s = link(s);
        }
        break_width[1] = break_width[1] + disc_width;
        if (post_break(cur_p) == null) {
            // nodes may be discardable after the break
            s = link(v);
        }
    }
⟧

889. Replacement texts and discretionary texts are supposed to contain only character nodes, kern nodes, ligature nodes, and box or rule nodes.

⟦889 Subtract the width of node |v| from |break_width|⟧ = ⟦
    if (is_char_node(v)) {
        f = font(v);
        break_width[1] = 
            break_width[1]
            - char_width(f)(char_info(f)(character(v)))
        ;
    } else {
        case type(v) {
          ligature_node:
            f = font(lig_char(v));
            xtx_ligature_present = true;
            break_width[1] = 
                break_width[1]
                - char_width(f)(
                  char_info(f)(character(lig_char(v))),
                )
            ;
          hlist_node, vlist_node, rule_node, kern_node:
            break_width[1] = break_width[1] - width(v);
          whatsit_node:
            if (
                (is_native_word_subtype(v))
                || (subtype(v) == glyph_node)
                || (subtype(v) == pic_node)
                || (subtype(v) == pdf_node)
            ) {
                break_width[1] = break_width[1] - width(v);
            } else {
                confusion(strpool!("disc1a"));
            }
          othercases:
            confusion(strpool!("disc1"));
        }
    }
⟧

890.

⟦890 Add the width of node |s| to |break_width|⟧ = ⟦
    if (is_char_node(s)) {
        f = font(s);
        break_width[1] = 
            break_width[1]
            + char_width(f)(char_info(f)(character(s)))
        ;
    } else {
        case type(s) {
          ligature_node:
            f = font(lig_char(s));
            xtx_ligature_present = true;
            break_width[1] = 
                break_width[1]
                + char_width(f)(
                  char_info(f)(character(lig_char(s))),
                )
            ;
          hlist_node, vlist_node, rule_node, kern_node:
            break_width[1] = break_width[1] + width(s);
          whatsit_node:
            if (
                (is_native_word_subtype(s))
                || (subtype(s) == glyph_node)
                || (subtype(s) == pic_node)
                || (subtype(s) == pdf_node)
            ) {
                break_width[1] = break_width[1] + width(s);
            } else {
                confusion(strpool!("disc2a"));
            }
          othercases:
            confusion(strpool!("disc2"));
        }
    }
⟧

891. We use the fact that type(active) != delta_node.

@define convert_to_break_width(#) =>
    mem[prev_r + #].sc = 
        mem[prev_r + #].sc
        - cur_active_width[#] + break_width[#]
@define store_break_width(#) =>
    active_width[#] = break_width[#]
@define new_delta_to_break_width(#) =>
    mem[q + #].sc = break_width[#] - cur_active_width[#]
⟦891 Insert a delta node to prepare for breaks at |cur_p|⟧ = ⟦
    // modify an existing delta node
    if (type(prev_r) == delta_node) {
        do_all_six(convert_to_break_width);
    } else // no delta node needed at the beginning
    if (prev_r == active) {
        do_all_six(store_break_width);
    } else {
        q = get_node(delta_node_size);
        link(q) = r;
        type(q) = delta_node;
        // the subtype is not used
        subtype(q) = 0;
        do_all_six(new_delta_to_break_width);
        link(prev_r) = q;
        prev_prev_r = prev_r;
        prev_r = q;
    }
⟧

892. When the following code is performed, we will have just inserted at least one active node before r, so type(prev_r) != delta_node.

@define new_delta_from_break_width(#) =>
    mem[q + #].sc = cur_active_width[#] - break_width[#]
⟦892 Insert a delta node to prepare for the next active node⟧ = ⟦
    if (r != last_active) {
        q = get_node(delta_node_size);
        link(q) = r;
        type(q) = delta_node;
        // the subtype is not used
        subtype(q) = 0;
        do_all_six(new_delta_from_break_width);
        link(prev_r) = q;
        prev_prev_r = prev_r;
        prev_r = q;
    }
⟧
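
The combined effect of the two delta insertions can be checked on a toy model. The following Python sketch makes several simplifying assumptions: widths are single numbers, not six-component arrays; the active list is a plain Python list of hypothetical ("active", name) and ("delta", value) pairs; and one new active node is inserted per call. The three cases correspond to those distinguished above when preparing for breaks at cur_p.

```python
def widths_at_nodes(nodes, active_width):
    """Walk the list as try_break does; return {name: cur_active_width}."""
    cur = active_width
    seen = {}
    for kind, value in nodes:
        if kind == "delta":
            cur += value
        else:
            seen[value] = cur
    return seen


def insert_before(nodes, i, name, break_width, active_width):
    """Insert active node `name` before nodes[i], which must be an active
    node (or i == len(nodes)); return (new_nodes, new_active_width)."""
    cur = active_width + sum(v for k, v in nodes[:i] if k == "delta")
    has_successor = i < len(nodes)
    new = list(nodes)
    if i > 0 and new[i - 1][0] == "delta":
        # modify an existing delta node
        new[i - 1] = ("delta", new[i - 1][1] - cur + break_width)
    elif i == 0:
        # no delta node needed at the beginning
        active_width = break_width
    else:
        new.insert(i, ("delta", break_width - cur))
        i += 1
    new.insert(i, ("active", name))
    if has_successor:
        # delta node to prepare for the next active node
        new.insert(i + 1, ("delta", cur - break_width))
    return new, active_width


nodes = [("active", "a1"), ("delta", -22), ("active", "a2")]
before = widths_at_nodes(nodes, 149)
new_nodes, new_width = insert_before(nodes, 2, "c", -5, 149)
after = widths_at_nodes(new_nodes, new_width)
```

The old active nodes see the same widths after the insertion as before, and the new node sees exactly the break width.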

893. When we create an active node, we also create the corresponding passive node.

⟦893 Insert a new active node from |best_place[fit_class]| to |cur_p|⟧ = ⟦
    {
        q = get_node(passive_node_size);
        link(q) = passive;
        passive = q;
        cur_break(q) = cur_p;
        stat!{
            incr(pass_number);
            serial(q) = pass_number;
        }
        prev_break(q) = best_place[fit_class];
        q = get_node(active_node_size);
        break_node(q) = passive;
        line_number(q) = best_pl_line[fit_class] + 1;
        fitness(q) = fit_class;
        type(q) = break_type;
        total_demerits(q) = minimal_demerits[fit_class];
        if (do_last_line_fit) {
            ⟦1662 Store \(a)additional data in the new active node⟧
        }
        link(q) = r;
        link(prev_r) = q;
        prev_r = q;
        stat!{
            if (tracing_paragraphs > 0) {
                ⟦894 Print a symbolic description of the new break node⟧
            }
        }
    }
⟧

894.

⟦894 Print a symbolic description of the new break node⟧ = ⟦
    {
        print_nl(strpool!("@@"));
        print_int(serial(passive));
        print(strpool!(": line "));
        print_int(line_number(q) - 1);
        print_char(ord!("."));
        print_int(fit_class);
        if (break_type == hyphenated) {
            print_char(ord!("-"));
        }
        print(strpool!(" t="));
        print_int(total_demerits(q));
        if (do_last_line_fit) {
            ⟦1663 Print additional data in the new active node⟧
        }
        print(strpool!(" -> @@"));
        if (prev_break(passive) == null) {
            print_char(ord!("0"));
        } else {
            print_int(serial(prev_break(passive)));
        }
    }
⟧

895. The length of lines depends on whether the user has specified \parshape or \hangindent. If par_shape_ptr is not null, it points to a (2𝑛+1)-word record in mem, where the info in the first word contains the value of n, and the other 2𝑛 words contain the left margins and line lengths for the first n lines of the paragraph; the specifications for line n apply to all subsequent lines. If par_shape_ptr == null, the shape of the paragraph depends on the value of n == hang_after; if n >= 0, hanging indentation takes place on lines n + 1, n + 2, …, otherwise it takes place on lines 1, …, |𝑛|. When hanging indentation is active, the left margin is hang_indent, if hang_indent >= 0, else it is 0; the line length is hsize − |hang_indent|. The normal setting is par_shape_ptr == null, hang_after == 1, and hang_indent == 0. Note that if hang_indent == 0, the value of hang_after is irrelevant.

⟦13 Global variables⟧ += ⟦
    // line numbers > easy_line are equivalent in break 
    // nodes
    var easy_line: halfword;

    // line numbers > last_special_line all have the same 
    // width
    var last_special_line: halfword;

    // the width of all lines <= last_special_line , if no 
    // \parshape has been specified
    var first_width: scaled;

    // the width of all lines > last_special_line 
    var second_width: scaled;

    // left margin to go with first_width 
    var first_indent: scaled;

    // left margin to go with second_width 
    var second_indent: scaled;
⟧

896. We compute the values of easy_line and the other local variables relating to line length when the line_break procedure is initializing itself.

⟦864 Get ready to start line breaking⟧ += ⟦
    if (par_shape_ptr == null) {
        if (hang_indent == 0) {
            last_special_line = 0;
            second_width = hsize;
            second_indent = 0;
        } else {
            ⟦897 Set line length parameters in preparation for hanging indentation⟧
        }
    } else {
        last_special_line = info(par_shape_ptr) - 1;
        second_width = mem[
          par_shape_ptr + 2 * (last_special_line + 1),
        ].sc;
        second_indent = mem[
          par_shape_ptr + 2 * last_special_line + 1,
        ].sc;
    }

    if (looseness == 0) {
        easy_line = last_special_line;
    } else {
        easy_line = max_halfword;
    }
⟧

897.

⟦897 Set line length parameters in preparation for hanging indentation⟧ = ⟦
    {
        last_special_line = abs(hang_after);
        if (hang_after < 0) {
            first_width = hsize - abs(hang_indent);
            if (hang_indent >= 0) {
                first_indent = hang_indent;
            } else {
                first_indent = 0;
            }
            second_width = hsize;
            second_indent = 0;
        } else {
            first_width = hsize;
            first_indent = 0;
            second_width = hsize - abs(hang_indent);
            if (hang_indent >= 0) {
                second_indent = hang_indent;
            } else {
                second_indent = 0;
            }
        }
    }
⟧
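
The case analysis of sections 896 and 897 can be restated compactly. The following Python sketch is only an illustration: the function name hang_parameters and its tuple result are hypothetical, plain integers stand in for scaled values, and None marks the two values that the program leaves unset when there is no hanging indentation.

```python
def hang_parameters(hsize, hang_indent, hang_after):
    """Return (last_special_line, first_width, first_indent,
    second_width, second_indent) for a paragraph without a paragraph shape.

    Hypothetical helper mirroring sections 896-897; widths are plain ints.
    """
    if hang_indent == 0:
        # No hanging indentation: every line uses the second_* values,
        # so the first_* values are never consulted.
        return 0, None, None, hsize, 0
    last_special_line = abs(hang_after)
    narrow = hsize - abs(hang_indent)
    indent = hang_indent if hang_indent >= 0 else 0
    if hang_after < 0:
        # The first |hang_after| lines are the narrow ones.
        return last_special_line, narrow, indent, hsize, 0
    # The lines after the first hang_after lines are the narrow ones.
    return last_special_line, hsize, 0, narrow, indent
```

For instance, hang_indent = 30 with hang_after = -2 narrows the first two lines to hsize - 30 and shifts them right by 30 units, while a negative hang_indent narrows the lines without shifting them.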

898. When we come to the following code, we have just encountered the first active node r whose line_number field contains l. Thus we want to compute the length of the 𝑙th line of the current paragraph. Furthermore, we want to set old_l to the last number in the class of line numbers equivalent to l.

⟦898 Compute the new line width⟧ = ⟦
    if (l > easy_line) {
        line_width = second_width;
        old_l = max_halfword - 1;
    } else {
        old_l = l;
        if (l > last_special_line) {
            line_width = second_width;
        } else if (par_shape_ptr == null) {
            line_width = first_width;
        } else {
            line_width = mem[par_shape_ptr + 2 * l].sc;
        }
    }
⟧
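
Here is a small Python model of the computation in section 898, offered as a sketch under stated assumptions. The name new_line_width, the list par_shape of (indent, width) pairs standing in for the mem entries, and the value chosen for MAX_HALFWORD are all hypothetical.

```python
MAX_HALFWORD = 0x3FFFFFFF  # stand-in value; the real constant is defined elsewhere


def new_line_width(l, easy_line, last_special_line,
                   first_width, second_width, par_shape=None):
    """Return (line_width, old_l) for line number l (counted from 1).

    par_shape, if given, holds one (indent, width) pair per specified line.
    """
    if l > easy_line:
        # All later lines are equivalent, so they share one class.
        return second_width, MAX_HALFWORD - 1
    if l > last_special_line:
        return second_width, l
    if par_shape is None:
        return first_width, l
    return par_shape[l - 1][1], l
```

With a three-line shape and zero looseness, easy_line equals last_special_line = 2, so every line from the third onward falls into the single class numbered MAX_HALFWORD - 1. With nonzero looseness, easy_line is MAX_HALFWORD and each line number keeps its own class.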

899. The remaining part of try_break deals with the calculation of demerits for a break from r to cur_p.

The first thing to do is calculate the badness, b. This value will always be between zero and inf_bad + 1; the latter value occurs only in the case of lines from r to cur_p that cannot shrink enough to fit the necessary width. In such cases, node r will be deactivated. We also deactivate node r when a break at cur_p is forced, since future breaks must go through a forced break.

⟦899 Consider the demerits for a line from |r| to |cur_p|; deactivate node |r| if it should no longer be active; then |goto continue| if a line from |r| to |cur_p| is infeasible, otherwise record a new feasible break⟧ = ⟦
    {
        artificial_demerits = false;
        // we're this much too short
        shortfall = line_width - cur_active_width[1];
        if (XeTeX_protrude_chars > 1) {
            shortfall = shortfall + total_pw(r, cur_p);
        }
        if (shortfall > 0) {
            ⟦900 Set the value of |b| to the badness for stretching the line, and compute the corresponding |fit_class|⟧
        } else {
            ⟦901 Set the value of |b| to the badness for shrinking the line, and compute the corresponding |fit_class|⟧
        }
        if (do_last_line_fit) {
            ⟦1660 Adjust \(t)the additional data for last line⟧
        }
      found:
        if ((b > inf_bad) || (pi == eject_penalty)) {
            ⟦902 Prepare to deactivate node~|r|, and |goto deactivate| unless there is a reason to consider lines of text from |r| to |cur_p|⟧
        } else {
            prev_r = r;
            if (b > threshold) {
                goto continue;
            }
            node_r_stays_active = true;
        }
        ⟦903 Record a new feasible break⟧
        if (node_r_stays_active) {
            //  prev_r has been set to r 
            goto continue;
        }
      deactivate:
        ⟦908 Deactivate node |r|⟧
    }
⟧

900. When a line must stretch, the available stretchability can be found in the subarray cur_active_width[2 .. 5], in units of points, fil, fill, and filll.

The present section is part of TEX’s inner loop, and it is most often performed when the badness is infinite; therefore it is worth while to make a quick test for large width excess and small stretchability, before calling the badness subroutine.

⟦900 Set the value of |b| to the badness for stretching the line, and compute the corresponding |fit_class|⟧ = ⟦
    if (
        (cur_active_width[3] != 0)
        || (cur_active_width[4] != 0)
        || (cur_active_width[5] != 0)
    ) {
        if (do_last_line_fit) {
            // the last line of a paragraph
            if (cur_p == null) {
                ⟦1657 Perform computations for last line and |goto found|⟧
            }
            shortfall = 0;
        }
        b = 0;
        // infinite stretch
        fit_class = decent_fit;
    } else {
        if (shortfall > 7230584) {
            if (cur_active_width[2] < 1663497) {
                b = inf_bad;
                fit_class = very_loose_fit;
                goto done1;
            }
        }
        b = badness(shortfall, cur_active_width[2]);
        if (b > 12) {
            if (b > 99) {
                fit_class = very_loose_fit;
            } else {
                fit_class = loose_fit;
            }
        } else {
            fit_class = decent_fit;
        }
      done1:
    }
⟧

901. Shrinkability is never infinite in a paragraph; we can shrink the line from r to cur_p by at most cur_active_width[6].

⟦901 Set the value of |b| to the badness for shrinking the line, and compute the corresponding |fit_class|⟧ = ⟦
    {
        if (-shortfall > cur_active_width[6]) {
            b = inf_bad + 1;
        } else {
            b = badness(-shortfall, cur_active_width[6]);
        }
        if (b > 12) {
            fit_class = tight_fit;
        } else {
            fit_class = decent_fit;
        }
    }
⟧
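
The stretching and shrinking cases of sections 900 and 901 can be tried out together in Python. This is a sketch, not the program's own code: the badness function below is written from memory of the routine defined much earlier in the program, so its constants are an assumption to be checked against that section, and the numeric values of the fit classes, the name classify, and the infinite_stretch flag are likewise assumed for illustration. The last-line-fit computations are left out.

```python
INF_BAD = 10000
VERY_LOOSE_FIT, LOOSE_FIT, DECENT_FIT, TIGHT_FIT = 0, 1, 2, 3  # assumed values


def badness(t, s):
    """Approximate 100*(t/s)**3 in integer arithmetic, capped at INF_BAD."""
    if t == 0:
        return 0
    if s <= 0:
        return INF_BAD
    if t <= 7230584:
        r = (t * 297) // s
    elif s >= 1663497:
        r = t // (s // 297)
    else:
        r = t
    if r > 1290:
        return INF_BAD
    return (r * r * r + 0o400000) // 0o1000000


def classify(shortfall, stretch, shrink, infinite_stretch=False):
    """Return (b, fit_class) for a line that is `shortfall` units too short."""
    if shortfall > 0:
        if infinite_stretch:
            return 0, DECENT_FIT
        b = badness(shortfall, stretch)
        if b > 99:
            return b, VERY_LOOSE_FIT
        if b > 12:
            return b, LOOSE_FIT
        return b, DECENT_FIT
    if -shortfall > shrink:
        b = INF_BAD + 1          # the line cannot shrink enough
    else:
        b = badness(-shortfall, shrink)
    if b > 12:
        return b, TIGHT_FIT
    return b, DECENT_FIT
```

The quick test in section 900 is omitted here because, under the assumed definition of badness, a shortfall above 7230584 with stretchability below 1663497 already yields INF_BAD; the test in the program only saves the subroutine call.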

902. During the final pass, we dare not lose all active nodes, lest we lose touch with the line breaks already found. The code shown here makes sure that such a catastrophe does not happen, by permitting overfull boxes as a last resort. This particular part of TEX was a source of several subtle bugs before the correct program logic was finally discovered; readers who seek to “improve” TEX should therefore think thrice before daring to make any changes here.

⟦902 Prepare to deactivate node~|r|, and |goto deactivate| unless there is a reason to consider lines of text from |r| to |cur_p|⟧ = ⟦
    {
        if (
            final_pass
            && (minimum_demerits == awful_bad)
            && (link(r) == last_active)
            && (prev_r == active)
        ) {
            // set demerits zero, this break is forced
            artificial_demerits = true;
        } else if (b > threshold) {
            goto deactivate;
        }
        node_r_stays_active = false;
    }
⟧
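
The control flow of sections 899 and 902 has five possible outcomes, which the following Python sketch enumerates. The function outcome and its boolean parameters are hypothetical: nothing_feasible_yet stands for the test minimum_demerits == awful_bad, and r_is_sole_active stands for the pair of tests link(r) == last_active and prev_r == active. The value of EJECT_PENALTY is assumed to be the negative of the infinite penalty.

```python
INF_BAD = 10000
EJECT_PENALTY = -10000  # assumed value of eject_penalty


def outcome(b, pi, threshold, final_pass,
            nothing_feasible_yet, r_is_sole_active):
    """Return (record_break, r_stays_active, artificial_demerits)."""
    if b > INF_BAD or pi == EJECT_PENALTY:
        if final_pass and nothing_feasible_yet and r_is_sole_active:
            # Last resort: accept the break with zero demerits.
            return True, False, True
        if b > threshold:
            # goto deactivate without recording anything
            return False, False, False
        # Record the break, then deactivate r.
        return True, False, False
    if b > threshold:
        # goto continue: r stays active, nothing is recorded
        return False, True, False
    return True, True, False
```

Only the first branch can produce an overfull box, and it is reached only when every other way of continuing the paragraph has been exhausted on the final pass.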

903. When we get to this part of the code, the line from r to cur_p is feasible, its badness is b, and its fitness classification is fit_class. We don’t want to make an active node for this break yet, but we will compute the total demerits and record them in the minimal_demerits array, if such a break is the current champion among all ways to get to cur_p in a given line-number class and fitness class.

⟦903 Record a new feasible break⟧ = ⟦
    if (artificial_demerits) {
        d = 0;
    } else {
        ⟦907 Compute the demerits, |d|, from |r| to |cur_p|⟧
    }

    stat!{
        if (tracing_paragraphs > 0) {
            ⟦904 Print a symbolic description of this feasible break⟧
        }
    }

    // this is the minimum total demerits from the beginning 
    // to cur_p via r 
    d = d + total_demerits(r)

    if (d <= minimal_demerits[fit_class]) {
        minimal_demerits[fit_class] = d;
        best_place[fit_class] = break_node(r);
        best_pl_line[fit_class] = l;
        if (do_last_line_fit) {
            ⟦1661 Store \(a)additional data for this feasible break⟧
        }
        if (d < minimum_demerits) {
            minimum_demerits = d;
        }
    }
⟧

904.

⟦904 Print a symbolic description of this feasible break⟧ = ⟦
    {
        if (printed_node != cur_p) {
            ⟦905 Print the list between |printed_node| and |cur_p|, then set |printed_node:=cur_p|⟧
        }
        print_nl(ord!("@"));
        if (cur_p == null) {
            print_esc(strpool!("par"));
        } else if (type(cur_p) != glue_node) {
            if (type(cur_p) == penalty_node) {
                print_esc(strpool!("penalty"));
            } else if (type(cur_p) == disc_node) {
                print_esc(strpool!("discretionary"));
            } else if (type(cur_p) == kern_node) {
                print_esc(strpool!("kern"));
            } else {
                print_esc(strpool!("math"));
            }
        }
        print(strpool!(" via @@"));
        if (break_node(r) == null) {
            print_char(ord!("0"));
        } else {
            print_int(serial(break_node(r)));
        }
        print(strpool!(" b="));
        if (b > inf_bad) {
            print_char(ord!("*"));
        } else {
            print_int(b);
        }
        print(strpool!(" p="));
        print_int(pi);
        print(strpool!(" d="));
        if (artificial_demerits) {
            print_char(ord!("*"));
        } else {
            print_int(d);
        }
    }
⟧

905.

⟦905 Print the list between |printed_node| and |cur_p|, then set |printed_node:=cur_p|⟧ = ⟦
    {
        print_nl(strpool!(""));
        if (cur_p == null) {
            short_display(link(printed_node));
        } else {
            save_link = link(cur_p);
            link(cur_p) = null;
            print_nl(strpool!(""));
            short_display(link(printed_node));
            link(cur_p) = save_link;
        }
        printed_node = cur_p;
    }
⟧

906. When the data for a discretionary break is being displayed, we will have printed the pre_break and post_break lists; we want to skip over the third list, so that the discretionary data will not appear twice. The following code is performed at the very end of try_break.

⟦906 Update the value of |printed_node| for symbolic displays⟧ = ⟦
    if (cur_p == printed_node) {
        if (cur_p != null) {
            if (type(cur_p) == disc_node) {
                t = replace_count(cur_p);
                while (t > 0) {
                    decr(t);
                    printed_node = link(printed_node);
                }
            }
        }
    }
⟧

907.

⟦907 Compute the demerits, |d|, from |r| to |cur_p|⟧ = ⟦
    {
        d = line_penalty + b;
        if (abs(d) >= 10000) {
            d = 100000000;
        } else {
            d = d * d;
        }
        if (pi != 0) {
            if (pi > 0) {
                d = d + pi * pi;
            } else if (pi > eject_penalty) {
                d = d - pi * pi;
            }
        }
        if (
            (break_type == hyphenated)
            && (type(r) == hyphenated)
        ) {
            if (cur_p != null) {
                d = d + double_hyphen_demerits;
            } else {
                d = d + final_hyphen_demerits;
            }
        }
        if (abs(fit_class - fitness(r)) > 1) {
            d = d + adj_demerits;
        }
    }
⟧
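
The demerits formula of section 907 is easy to experiment with in isolation. The Python sketch below follows the code above term by term; the function name, the keyword parameters, and the default parameter values (which are meant to resemble the plain TeX settings) are assumptions made for illustration.

```python
EJECT_PENALTY = -10000  # assumed value of eject_penalty


def demerits(b, pi, line_penalty=10,
             both_hyphenated=False, at_paragraph_end=False,
             fit_class=2, prev_fit_class=2,
             double_hyphen_demerits=10000,
             final_hyphen_demerits=5000,
             adj_demerits=10000):
    """Demerits of one line with badness b ending at a break with penalty pi."""
    d = line_penalty + b
    d = 100000000 if abs(d) >= 10000 else d * d
    if pi > 0:
        d += pi * pi
    elif EJECT_PENALTY < pi < 0:
        d -= pi * pi          # negative penalties make a break attractive
    if both_hyphenated:
        # cur_p == null corresponds to the end of the paragraph
        if at_paragraph_end:
            d += final_hyphen_demerits
        else:
            d += double_hyphen_demerits
    if abs(fit_class - prev_fit_class) > 1:
        d += adj_demerits
    return d
```

A forced break (pi equal to the eject penalty) contributes nothing, since every feasible sequence must pay for it anyway.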

908. When an active node disappears, we must delete an adjacent delta node if the active node was at the beginning or the end of the active list, or if it was surrounded by delta nodes. We also must preserve the property that cur_active_width represents the length of material from link(prev_r) to cur_p.

@define combine_two_deltas(#) =>
    mem[prev_r + #].sc = mem[prev_r + #].sc + mem[r + #].sc
@define downdate_width(#) =>
    cur_active_width[#] = 
        cur_active_width[#]
        - mem[prev_r + #].sc
⟦908 Deactivate node |r|⟧ = ⟦
    link(prev_r) = link(r)

    free_node(r, active_node_size)

    if (prev_r == active) {
        ⟦909 Update the active widths, since the first active node has been deleted⟧
    } else if (type(prev_r) == delta_node) {
        r = link(prev_r);
        if (r == last_active) {
            do_all_six(downdate_width);
            link(prev_prev_r) = last_active;
            free_node(prev_r, delta_node_size);
            prev_r = prev_prev_r;
        } else if (type(r) == delta_node) {
            do_all_six(update_width);
            do_all_six(combine_two_deltas);
            link(prev_r) = link(r);
            free_node(r, delta_node_size);
        }
    }
⟧

909. The following code uses the fact that type(last_active) != delta_node. If the active list has just become empty, we do not need to update the active_width array, since it will be initialized when an active node is next inserted.

@define update_active(#) =>
    active_width[#] = active_width[#] + mem[r + #].sc
⟦909 Update the active widths, since the first active node has been deleted⟧ = ⟦
    {
        r = link(active);
        if (type(r) == delta_node) {
            do_all_six(update_active);
            do_all_six(copy_to_cur_active);
            link(active) = link(r);
            free_node(r, delta_node_size);
        }
    }
⟧
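
The bookkeeping of sections 908 and 909 may be easier to follow in a simplified model. The Python sketch below is an assumption-laden illustration, not the program's data structure: the active list is an ordinary Python list of ("active", label) and ("delta", amount) pairs, a single number replaces the six width components, and the helper names widths and deactivate are hypothetical.

```python
def widths(nodes, active_width):
    """Map each active label to the width from its break to cur_p."""
    out, cur = {}, active_width
    for kind, val in nodes:
        if kind == "delta":
            cur += val
        else:
            out[val] = cur
    return out


def deactivate(nodes, active_width, label):
    """Remove active node `label`; return the updated (nodes, active_width)."""
    nodes = list(nodes)
    i = nodes.index(("active", label))
    del nodes[i]                     # nodes[i] is now what followed the node
    if i == 0:
        # First active node deleted: absorb a leading delta.
        if nodes and nodes[0][0] == "delta":
            active_width += nodes[0][1]
            del nodes[0]
    elif nodes[i - 1][0] == "delta":
        if i == len(nodes):
            del nodes[i - 1]         # a trailing delta is no longer needed
        elif nodes[i][0] == "delta":
            # Two adjacent deltas are combined into one.
            nodes[i - 1] = ("delta", nodes[i - 1][1] + nodes[i][1])
            del nodes[i]
    return nodes, active_width
```

In each of the three cases the widths seen by the surviving active nodes are unchanged, which is the property the deletion has to preserve.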

910. [39] Breaking paragraphs into lines, continued. So far we have gotten a little way into the line_break routine, having covered its important try_break subroutine. Now let’s consider the rest of the process.

The main loop of line_break traverses the given hlist, starting at link(temp_head), and calls try_break at each legal breakpoint. A variable called auto_breaking is set to true except within math formulas, since glue nodes are not legal breakpoints when they appear in formulas.

The current node of interest in the hlist is pointed to by cur_p. Another variable, prev_p, is usually one step behind cur_p, but the real meaning of prev_p is this: If type(cur_p) == glue_node then cur_p is a legal breakpoint if and only if auto_breaking is true and prev_p does not point to a glue node, penalty node, explicit kern node, or math node.

The following declarations provide for a few other local variables that are used in special calculations.

⟦910 Local variables for line breaking⟧ = ⟦
    // is node cur_p outside a formula?
    var auto_breaking: boolean;

    // helps to determine when glue nodes are breakpoints
    var prev_p: pointer;

    // miscellaneous nodes of temporary interest
    var q, r, s, prev_s: pointer;

    // used when calculating character widths
    var f: internal_font_number;
⟧

911. The ‘loop’ in the following code is performed at most thrice per call of line_break, since it is actually a pass over the entire paragraph.

@define update_prev_p =>
    {
        prev_p = cur_p;
        global_prev_p = cur_p;
    }
⟦911 Find optimal breakpoints⟧ = ⟦
    threshold = pretolerance

    if (threshold >= 0) {
        stat!{
            if (tracing_paragraphs > 0) {
                begin_diagnostic;
                print_nl(strpool!("@firstpass"));
            }
        }
        second_pass = false;
        final_pass = false;
    } else {
        threshold = tolerance;
        second_pass = true;
        final_pass = (emergency_stretch <= 0);
        stat!{
            if (tracing_paragraphs > 0) {
                begin_diagnostic;
            }
        }
    }

    loop {
        if (threshold > inf_bad) {
            threshold = inf_bad;
        }
        if (second_pass) {
            ⟦939 Initialize for hyphenating a paragraph⟧
        }
        ⟦912 Create an active breakpoint representing the beginning of the paragraph⟧
        cur_p = link(temp_head);
        auto_breaking = true;
        // glue at beginning is not a legal breakpoint
        update_prev_p;
        // to access the first node of paragraph as the 
        // first active node has break_node == null 
        first_p = cur_p;
        while (
            (cur_p != null)
            && (link(active) != last_active)
        ) {
            ⟦914 Call |try_break| if |cur_p| is a legal breakpoint; on the second pass, also try to hyphenate the next word, if |cur_p| is a glue node; then advance |cur_p| to the next node of the paragraph that could possibly be a legal breakpoint⟧
        }
        if (cur_p == null) {
            ⟦921 Try the final line break at the end of the paragraph, and |goto done| if the desired breakpoints have been found⟧
        }
        ⟦913 Clean up the memory by removing the break nodes⟧
        if (!second_pass) {
            stat!{
                if (tracing_paragraphs > 0) {
                    print_nl(strpool!("@secondpass"));
                }
            }
            threshold = tolerance;
            second_pass = true;
            final_pass = (emergency_stretch <= 0);
            // if at first you don't succeed, ...
        } else {
            stat!{
                if (tracing_paragraphs > 0) {
                    print_nl(strpool!("@emergencypass"));
                }
            }
            background[2] = 
                background[2]
                + emergency_stretch
            ;
            final_pass = true;
        }
    }

    done:

    stat!{
        if (tracing_paragraphs > 0) {
            end_diagnostic(true);
            normalize_selector;
        }
    }

    if (do_last_line_fit) {
        ⟦1664 Adjust \(t)the final line of the paragraph⟧
    }
⟧
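
The order of passes implied by section 911 can be summarized as a small Python generator. This is a sketch under assumptions: pass_schedule is a hypothetical name, and each tuple it yields describes a pass that would be attempted only if every earlier pass had failed to find feasible breakpoints.

```python
INF_BAD = 10000


def pass_schedule(pretolerance, tolerance, emergency_stretch):
    """Yield (name, threshold, extra_stretch, final_pass) for each pass."""
    if pretolerance >= 0:
        # First pass: no hyphenation is attempted.
        yield "first", min(pretolerance, INF_BAD), 0, False
    # Second pass: hyphenation is tried; it is final unless an
    # emergency pass is available.
    yield "second", min(tolerance, INF_BAD), 0, emergency_stretch <= 0
    if emergency_stretch > 0:
        # Emergency pass: extra stretch is added to the background.
        yield "emergency", min(tolerance, INF_BAD), emergency_stretch, True
```

A negative pretolerance skips the first pass entirely, and the last tuple yielded always has final_pass set, which is what allows the last-resort logic of section 902 to come into play.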

912. The active node that represents the starting point does not need a corresponding passive node.

@define store_background(#) =>
    active_width[#] = background[#]
⟦912 Create an active breakpoint representing the beginning of the paragraph⟧ = ⟦
    q = get_node(active_node_size)

    type(q) = unhyphenated

    fitness(q) = decent_fit

    link(q) = last_active

    break_node(q) = null

    line_number(q) = prev_graf + 1

    total_demerits(q) = 0

    link(active) = q

    if (do_last_line_fit) {
        ⟦1656 Initialize additional fields of the first active node⟧
    }

    do_all_six(store_background)

    passive = null

    printed_node = temp_head

    pass_number = 0

    font_in_short_display = null_font
⟧

913.

⟦913 Clean up the memory by removing the break nodes⟧ = ⟦
    q = link(active)

    while (q != last_active) {
        cur_p = link(q);
        if (type(q) == delta_node) {
            free_node(q, delta_node_size);
        } else {
            free_node(q, active_node_size);
        }
        q = cur_p;
    }

    q = passive

    while (q != null) {
        cur_p = link(q);
        free_node(q, passive_node_size);
        q = cur_p;
    }
⟧

914. Here is the main switch in the line_break routine, where legal breaks are determined. As we move through the hlist, we need to keep the active_width array up to date, so that the badness of individual lines is readily calculated by try_break. It is convenient to use the short name act_width for the component of active width that represents real width as opposed to glue.

// length from first active node to current node
@define act_width => active_width[1]
@define kern_break =>
    {
        if (!is_char_node(link(cur_p)) && auto_breaking) {
            if (type(link(cur_p)) == glue_node) {
                try_break(0, unhyphenated);
            }
        }
        act_width = act_width + width(cur_p);
    }
⟦914 Call |try_break| if |cur_p| is a legal breakpoint; on the second pass, also try to hyphenate the next word, if |cur_p| is a glue node; then advance |cur_p| to the next node of the paragraph that could possibly be a legal breakpoint⟧ = ⟦
    {
        if (is_char_node(cur_p)) {
            ⟦915 Advance \(c)|cur_p| to the node following the present string of characters⟧
        }
        case type(cur_p) {
          hlist_node, vlist_node, rule_node:
            act_width = act_width + width(cur_p);
          whatsit_node:
            ⟦1422 Advance \(p)past a whatsit node in the \(l)|line_break| loop⟧
          glue_node:
            ⟦916 If node |cur_p| is a legal breakpoint, call |try_break|; then update the active widths by including the glue in |glue_ptr(cur_p)|⟧
            if (second_pass && auto_breaking) {
                ⟦943 Try to hyphenate the following word⟧
            }
          kern_node:
            if (subtype(cur_p) == explicit) {
                kern_break;
            } else {
                act_width = act_width + width(cur_p);
            }
          ligature_node:
            f = font(lig_char(cur_p));
            xtx_ligature_present = true;
            act_width = 
                act_width
                + char_width(f)(
                  char_info(f)(character(lig_char(cur_p))),
                )
            ;
          disc_node:
            ⟦917 Try to break after a discretionary fragment, then |goto done5|⟧
          math_node:
            if (subtype(cur_p) < L_code) {
                auto_breaking = odd(subtype(cur_p));
            }
            kern_break;
          penalty_node:
            try_break(penalty(cur_p), unhyphenated);
          mark_node, ins_node, adjust_node:
            do_nothing;
          othercases:
            confusion(strpool!("paragraph"));
        }
        update_prev_p;
        cur_p = link(cur_p);
      done5:
    }
⟧

915. The code that passes over the characters of words in a paragraph is part of TEX’s inner loop, so it has been streamlined for speed. We use the fact that ‘\parfillskip’ glue appears at the end of each paragraph; it is therefore unnecessary to check if link(cur_p) == null when cur_p is a character node.

⟦915 Advance \(c)|cur_p| to the node following the present string of characters⟧ = ⟦
    {
        update_prev_p;
        repeat {
            f = font(cur_p);
            act_width = 
                act_width
                + char_width(f)(
                  char_info(f)(character(cur_p)),
                )
            ;
            cur_p = link(cur_p);
        } until (!is_char_node(cur_p));
    }
⟧

916. When node cur_p is a glue node, we look at prev_p to see whether or not a breakpoint is legal at cur_p, as explained above.

⟦916 If node |cur_p| is a legal breakpoint, call |try_break|; then update the active widths by including the glue in |glue_ptr(cur_p)|⟧ = ⟦
    if (auto_breaking) {
        if (is_char_node(prev_p)) {
            try_break(0, unhyphenated);
        } else if (precedes_break(prev_p)) {
            try_break(0, unhyphenated);
        } else if (
            (type(prev_p) == kern_node)
            && (subtype(prev_p) != explicit)
        ) {
            try_break(0, unhyphenated);
        }
    }

    check_shrinkage(glue_ptr(cur_p))

    q = glue_ptr(cur_p)

    act_width = act_width + width(q)

    active_width[2 + stretch_order(q)] = 
        active_width[2 + stretch_order(q)]
        + stretch(q)

    active_width[6] = active_width[6] + shrink(q)
⟧
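
The two halves of section 916, the legality test and the width update, can be modeled separately in Python. The sketch below is illustrative only: node kinds are represented by hypothetical strings such as "char" and "glue", the flag prev_kern_explicit is an invented stand-in for the subtype test, and active_width is a seven-element list whose index 0 is unused so that indices 1 through 6 match the text.

```python
def glue_is_breakpoint(prev_kind, auto_breaking, prev_kern_explicit=False):
    """Decide whether a glue node may be a breakpoint, given its predecessor."""
    if not auto_breaking:
        return False                  # inside a math formula
    if prev_kind == "kern":
        return not prev_kern_explicit
    # Glue, penalty, and math nodes never precede a break at glue.
    return prev_kind not in ("glue", "penalty", "math")


def add_glue(active_width, width, stretch, stretch_order, shrink):
    """Accumulate one glue specification into active_width (in place)."""
    active_width[1] += width
    active_width[2 + stretch_order] += stretch
    active_width[6] += shrink
```

The stretch_order argument is assumed to run from 0 (finite stretch) to 3 (filll), so that the stretch lands in one of the components 2 through 5.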

917. The following code knows that discretionary texts contain only character nodes, kern nodes, box nodes, rule nodes, and ligature nodes.

⟦917 Try to break after a discretionary fragment, then |goto done5|⟧ = ⟦
    {
        s = pre_break(cur_p);
        disc_width = 0;
        if (s == null) {
            try_break(ex_hyphen_penalty, hyphenated);
        } else {
            repeat {
                ⟦918 Add the width of node |s| to |disc_width|⟧
                s = link(s);
            } until (s == null);
            act_width = act_width + disc_width;
            try_break(hyphen_penalty, hyphenated);
            act_width = act_width - disc_width;
        }
        r = replace_count(cur_p);
        s = link(cur_p);
        while (r > 0) {
            ⟦919 Add the width of node |s| to |act_width|⟧
            decr(r);
            s = link(s);
        }
        update_prev_p;
        cur_p = s;
        goto done5;
    }
⟧
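
The width arithmetic of section 917 amounts to one temporary addition and one permanent addition. In the hedged Python sketch below, the function disc_widths is hypothetical, and the pre-break text and the replaced nodes are given simply as lists of their widths.

```python
def disc_widths(act_width, pre_break, replaced):
    """Return (width_tried, width_after) for a discretionary node.

    width_tried is what try_break sees if the line ends with the
    pre-break text; width_after is act_width once the nodes counted
    by replace_count have been passed without breaking.
    """
    disc_width = sum(pre_break)
    width_tried = act_width + disc_width   # added only for the trial
    width_after = act_width + sum(replaced)
    return width_tried, width_after
```

An empty pre-break list leaves the trial width equal to act_width, which corresponds to the case where the program calls try_break with ex_hyphen_penalty.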

918.

⟦918 Add the width of node |s| to |disc_width|⟧ = ⟦
    if (is_char_node(s)) {
        f = font(s);
        disc_width = 
            disc_width
            + char_width(f)(char_info(f)(character(s)))
        ;
    } else {
        case type(s) {
          ligature_node:
            f = font(lig_char(s));
            xtx_ligature_present = true;
            disc_width = 
                disc_width
                + char_width(f)(
                  char_info(f)(character(lig_char(s))),
                )
            ;
          hlist_node, vlist_node, rule_node, kern_node:
            disc_width = disc_width + width(s);
          whatsit_node:
            if (
                (is_native_word_subtype(s))
                || (subtype(s) == glyph_node)
                || (subtype(s) == pic_node)
                || (subtype(s) == pdf_node)
            ) {
                disc_width = disc_width + width(s);
            } else {
                confusion(strpool!("disc3a"));
            }
          othercases:
            confusion(strpool!("disc3"));
        }
    }
⟧

919.

⟦919 Add the width of node |s| to |act_width|⟧ = ⟦
    if (is_char_node(s)) {
        f = font(s);
        act_width = 
            act_width
            + char_width(f)(char_info(f)(character(s)))
        ;
    } else {
        case type(s) {
          ligature_node:
            f = font(lig_char(s));
            xtx_ligature_present = true;
            act_width = 
                act_width
                + char_width(f)(
                  char_info(f)(character(lig_char(s))),
                )
            ;
          hlist_node, vlist_node, rule_node, kern_node:
            act_width = act_width + width(s);
          whatsit_node:
            if (
                (is_native_word_subtype(s))
                || (subtype(s) == glyph_node)
                || (subtype(s) == pic_node)
                || (subtype(s) == pdf_node)
            ) {
                act_width = act_width + width(s);
            } else {
                confusion(strpool!("disc4a"));
            }
          othercases:
            confusion(strpool!("disc4"));
        }
    }
⟧

920. The forced line break at the paragraph’s end will reduce the list of breakpoints so that all active nodes represent breaks at cur_p == null. On the first pass, we insist on finding an active node that has the correct “looseness.” On the final pass, there will be at least one active node, and we will match the desired looseness as well as we can.

The global variable best_bet will be set to the active node for the best way to break the paragraph, and a few other variables are used to help determine what is best.

⟦13 Global variables⟧ += ⟦
    // use the passive node of this active node and its
    // predecessors
    var best_bet: pointer;

    // the demerits associated with best_bet 
    var fewest_demerits: integer;

    // line number following the last line of the new 
    // paragraph
    var best_line: halfword;

    // the difference between line_number(best_bet) and
    // the optimum best_line
    var actual_looseness: integer;

    // the difference between the current line number and 
    // the optimum best_line 
    var line_diff: integer;
⟧

921.

⟦921 Try the final line break at the end of the paragraph, and |goto done| if the desired breakpoints have been found⟧ = ⟦
    {
        try_break(eject_penalty, hyphenated);
        if (link(active) != last_active) {
            ⟦922 Find an active node with fewest demerits⟧
            if (looseness == 0) {
                goto done;
            }
            ⟦923 Find the best active node for the desired looseness⟧
            if (
                (actual_looseness == looseness)
                || final_pass
            ) {
                goto done;
            }
        }
    }
⟧

922.

⟦922 Find an active node with fewest demerits⟧ = ⟦
    r = link(active)

    fewest_demerits = awful_bad

    repeat {
        if (type(r) != delta_node) {
            if (total_demerits(r) < fewest_demerits) {
                fewest_demerits = total_demerits(r);
                best_bet = r;
            }
        }
        r = link(r);
    } until (r == last_active)

    best_line = line_number(best_bet)
⟧

923. The adjustment for a desired looseness is a slightly more complicated version of the loop just considered. Note that if a paragraph is broken into segments by displayed equations, each segment will be subject to the looseness calculation, independently of the other segments.

⟦923 Find the best active node for the desired looseness⟧ = ⟦
    {
        r = link(active);
        actual_looseness = 0;
        repeat {
            if (type(r) != delta_node) {
                line_diff = line_number(r) - best_line;
                if (
                    (
                        (line_diff < actual_looseness)
                        && (looseness <= line_diff)
                    )
                    || (
                        (line_diff > actual_looseness)
                        && (looseness >= line_diff)
                    )
                ) {
                    best_bet = r;
                    actual_looseness = line_diff;
                    fewest_demerits = total_demerits(r);
                } else if (
                    (line_diff == actual_looseness)
                    && (total_demerits(r) < fewest_demerits)
                ) {
                    best_bet = r;
                    fewest_demerits = total_demerits(r);
                }
            }
            r = link(r);
        } until (r == last_active);
        best_line = line_number(best_bet);
    }
⟧
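
The two loops of sections 922 and 923 can be combined into one Python function for experimentation. This is a sketch under assumptions: choose_best is a hypothetical name, the surviving active nodes are given as a list of (line_number, total_demerits) pairs, delta nodes are taken to have been filtered out already, and the value of AWFUL_BAD is assumed.

```python
AWFUL_BAD = 0x3FFFFFFF  # assumed value of awful_bad


def choose_best(candidates, looseness):
    """Return (index_of_best_bet, actual_looseness)."""
    best, fewest = None, AWFUL_BAD
    for i, (_, dem) in enumerate(candidates):
        if dem < fewest:
            best, fewest = i, dem
    best_line = candidates[best][0]
    actual = 0
    if looseness != 0:
        for i, (line, dem) in enumerate(candidates):
            diff = line - best_line
            if ((diff < actual and looseness <= diff)
                    or (diff > actual and looseness >= diff)):
                # A step closer to the desired looseness.
                best, actual, fewest = i, diff, dem
            elif diff == actual and dem < fewest:
                best, fewest = i, dem
    return best, actual
```

With candidates ending at lines 5, 6, 6, and 7, a looseness of 1 selects the cheaper of the two six-line candidates, while a looseness of -1 cannot be honored and leaves the optimum choice in place with an actual looseness of 0.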

924. Once the best sequence of breakpoints has been found (hurray), we call on the procedure post_line_break to finish the remainder of the work. (By introducing this subprocedure, we are able to keep line_break from getting extremely long.)

⟦924 Break the paragraph at the chosen breakpoints, justify the resulting lines to the correct widths, and append them to the current vertical list⟧ = ⟦
    post_line_break(d)
⟧

925. The total number of lines that will be set by post_line_break is best_line - prev_graf - 1. The last breakpoint is specified by break_node(best_bet), and this passive node points to the other breakpoints via the prev_break links. The finishing-up phase starts by linking the relevant passive nodes in forward order, changing prev_break to next_break. (The next_break fields actually reside in the same memory space as the prev_break fields did, but we give them a new name because of their new significance.) Then the lines are justified, one by one.

// new name for prev_break after links are reversed
@define next_break => prev_break
⟦874 Declare subprocedures for |line_break|⟧ += ⟦
    function post_line_break(d: boolean) {
        label done, done1;
        var
          q, r, s: pointer, // temporary registers for list 
          // manipulation
          p, k: pointer,
          w: scaled,
          glue_break: boolean, // was a break at glue?
          ptmp: pointer,
          disc_break: boolean, // was the current break at a 
          // discretionary node?
          post_disc_break: boolean, // and did it have a 
          // nonempty post-break part?
          cur_width: scaled, // width of line number 
          // cur_line 
          cur_indent: scaled, // left margin of line number 
          // cur_line 
          t: quarterword, // used for replacement counts in 
          // discretionary nodes
          pen: integer, // use when calculating penalties 
          // between lines
          cur_line: halfword, // the current line number 
          // being justified
          LR_ptr: pointer; // stack of LR codes
        
        LR_ptr = LR_save;
        ⟦926 Reverse the links of the relevant passive nodes, setting |cur_p| to the first breakpoint⟧
        cur_line = prev_graf + 1;
        repeat {
            ⟦928 Justify the line ending at breakpoint |cur_p|, and append it to the current vertical list, together with associated penalties and other insertions⟧
            incr(cur_line);
            cur_p = next_break(cur_p);
            if (cur_p != null) {
                if (!post_disc_break) {
                    ⟦927 Prune unwanted nodes at the beginning of the next line⟧
                }
            }
        } until (cur_p == null);
        if (
            (cur_line != best_line)
            || (link(temp_head) != null)
        ) {
            confusion(strpool!("line breaking"));
        }
        prev_graf = best_line - 1;
        LR_save = LR_ptr;
    }
⟧

926. The job of reversing links in a list is conveniently regarded as the job of taking items off one stack and putting them on another. In this case we take them off a stack pointed to by q and having prev_break fields; we put them on a stack pointed to by cur_p and having next_break fields. Node r is the passive node being moved from stack to stack.

⟦926 Reverse the links of the relevant passive nodes, setting |cur_p| to the first breakpoint⟧ = ⟦
    q = break_node(best_bet)

    cur_p = null

    repeat {
        r = q;
        q = prev_break(q);
        next_break(r) = cur_p;
        cur_p = r;
    } until (q == null)
⟧
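
The two-stack view of link reversal in section 926 translates directly into Python. In this sketch, which is an illustration rather than the program's code, a dictionary named link plays the role of the shared prev_break/next_break field, None plays the role of null, and the function name reverse_in_place is hypothetical.

```python
def reverse_in_place(link, last):
    """Reverse a chain of backward links; return the first node.

    On entry link[x] is the predecessor of x; on exit it is the successor.
    """
    q, cur_p = last, None
    while True:
        r = q
        q = link[q]        # read prev_break before overwriting it
        link[r] = cur_p    # the same slot now serves as next_break
        cur_p = r
        if q is None:
            return cur_p
```

Like the repeat loop above, the function assumes that the chain contains at least one node.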

927. Glue and penalty and kern and math nodes are deleted at the beginning of a line, except in the anomalous case that the node to be deleted is actually one of the chosen breakpoints. Otherwise the pruning done here is designed to match the lookahead computation in try_break, where the break_width values are computed for non-discretionary breakpoints.

⟦927 Prune unwanted nodes at the beginning of the next line⟧ = ⟦
    {
        r = temp_head;
        loop {
            q = link(r);
            if (q == cur_break(cur_p)) {
                //  cur_break ( cur_p ) is the next 
                // breakpoint
                goto done1;
            }
            if (is_char_node(q)) {
                goto done1;
            }
            if (non_discardable(q)) {
                goto done1;
            }
            if (type(q) == kern_node) {
                if (
                    (subtype(q) != explicit)
                    && (subtype(q) != space_adjustment)
                ) {
                    goto done1;
                }
            }
            // now type ( q ) == glue_node , kern_node , 
            // math_node , or penalty_node 
            r = q;
            if (type(q) == math_node) {
                if (TeXXeT_en) {
                    ⟦1518 Adjust \(t)the LR stack for the |post_line_break| routine⟧
                }
            }
        }
      done1:
        if (r != temp_head) {
            link(r) = null;
            flush_node_list(link(temp_head));
            link(temp_head) = q;
        }
    }
⟧

928. The current line to be justified appears in a horizontal list starting at link(temp_head) and ending at cur_break(cur_p). If cur_break(cur_p) is a glue node, we reset the glue to equal the right_skip glue; otherwise we append the right_skip glue at the right. If cur_break(cur_p) is a discretionary node, we modify the list so that the discretionary break is compulsory, and we set disc_break to true. We also append the left_skip glue at the left of the line, unless it is zero.

⟦928 Justify the line ending at breakpoint |cur_p|, and append it to the current vertical list, together with associated penalties and other insertions⟧ = ⟦
    if (TeXXeT_en) {
        ⟦1517 Insert LR nodes at the beginning of the current line and adjust the LR stack based on LR nodes in this line⟧
    }

    ⟦929 Modify the end of the line to reflect the nature of the break and to include \.{\\rightskip}; also set the proper value of |disc_break|⟧

    if (TeXXeT_en) {
        ⟦1519 Insert LR nodes at the end of the current line⟧
    }

    ⟦935 Put the \(l)\.{\\leftskip} glue at the left and detach this line⟧

    ⟦937 Call the packaging subroutine, setting |just_box| to the justified box⟧

    ⟦936 Append the new box to the current vertical list, followed by the list of special nodes taken out of the box by the packager⟧

    ⟦938 Append a penalty node, if a nonzero penalty is appropriate⟧

929. At the end of the following code, q will point to the final node on the list about to be justified.

⟦929 Modify the end of the line to reflect the nature of the break and to include \.{\\rightskip}; also set the proper value of |disc_break|⟧ = ⟦
    q = cur_break(cur_p)

    disc_break = false

    post_disc_break = false

    glue_break = false

    //  q cannot be a char_node 
    if (q != null) {
        if (type(q) == glue_node) {
            delete_glue_ref(glue_ptr(q));
            glue_ptr(q) = right_skip;
            subtype(q) = right_skip_code + 1;
            add_glue_ref(right_skip);
            glue_break = true;
            goto done;
        } else {
            if (type(q) == disc_node) {
                ⟦930 Change discretionary to compulsory and set |disc_break:=true|⟧
            } else if (type(q) == kern_node) {
                width(q) = 0;
            } else if (type(q) == math_node) {
                width(q) = 0;
                if (TeXXeT_en) {
                    ⟦1518 Adjust \(t)the LR stack for the |post_line_break| routine⟧
                }
            }
        }
    } else {
        q = temp_head;
        while (link(q) != null) {
            q = link(q);
        }
    }

    // at this point q is the rightmost breakpoint; the only 
    // exception is the case of a discretionary break with 
    // non-empty pre_break , then q has been changed to the 
    // last node of the pre_break list
    done:

    if (XeTeX_protrude_chars > 0) {
        if (
            disc_break
            && (is_char_node(q) || (type(q) != disc_node)) //  
            // q has been reset to the last node of 
            // pre_break 
        ) {
            p = q;
            ptmp = p;
        } else {
            // get link ( p ) == q 
            p = prev_rightmost(link(temp_head), q);
            ptmp = p;
            p = find_protchar_right(link(temp_head), p);
        }
        w = right_pw(p);
        if (w != 0) {
            // we have found a marginal kern, append it after 
            // ptmp 
            k = new_margin_kern(
              -w,
              last_rightmost_char,
              right_side,
            );
            link(k) = link(ptmp);
            link(ptmp) = k;
            if ((ptmp == q)) {
                q = link(q);
            }
        }
    }

    // if q was not a breakpoint at glue and has been 
    // reset to rightskip then we append rightskip after 
    // q now
    if (!glue_break) {
        ⟦934 Put the \(r)\.{\\rightskip} glue after node |q|⟧
    }
⟧

930.

⟦930 Change discretionary to compulsory and set |disc_break:=true|⟧ = ⟦
    {
        t = replace_count(q);
        ⟦931 Destroy the |t| nodes following |q|, and make |r| point to the following node⟧
        if (post_break(q) != null) {
            ⟦932 Transplant the post-break list⟧
        }
        if (pre_break(q) != null) {
            ⟦933 Transplant the pre-break list⟧
        }
        link(q) = r;
        disc_break = true;
    }
⟧

931.

⟦931 Destroy the |t| nodes following |q|, and make |r| point to the following node⟧ = ⟦
    if (t == 0) {
        r = link(q);
    } else {
        r = q;
        while (t > 1) {
            r = link(r);
            decr(t);
        }
        s = link(r);
        r = link(s);
        link(s) = null;
        flush_node_list(link(q));
        replace_count(q) = 0;
    }
⟧
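
The pointer walk in module 931 can be sketched on a toy singly linked list. This is a hedged illustration only: `Node` and `drop_following` are hypothetical names, and the list of removed names stands in for what flush_node_list would free. As in the module, q's own link is left alone here; it is reset later by the enclosing module 930.

```python
class Node:
    """Hypothetical list node with a name and a link field."""
    def __init__(self, name, link=None):
        self.name = name
        self.link = link


def drop_following(q, t):
    """Detach the t nodes after q; return (r, names of detached nodes),
    where r is the node that followed them."""
    if t == 0:
        return q.link, []
    r = q
    while t > 1:             # advance to the (t-1)-th node after q
        r = r.link
        t -= 1
    s = r.link               # s is the t-th node after q
    r = s.link               # r is the first survivor
    s.link = None            # terminate the doomed sublist
    removed = []
    n = q.link
    while n is not None:     # stands in for flush_node_list(link(q))
        removed.append(n.name)
        n = n.link
    return r, removed
```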

932. We move the post-break list from inside node q to the main list by reattaching it just before the present node r, then resetting r.

⟦932 Transplant the post-break list⟧ = ⟦
    {
        s = post_break(q);
        while (link(s) != null) {
            s = link(s);
        }
        link(s) = r;
        r = post_break(q);
        post_break(q) = null;
        post_disc_break = true;
    }
⟧

933. We move the pre-break list from inside node q to the main list by reattaching it just after the present node q, then resetting q.

⟦933 Transplant the pre-break list⟧ = ⟦
    {
        s = pre_break(q);
        link(q) = s;
        while (link(s) != null) {
            s = link(s);
        }
        pre_break(q) = null;
        q = s;
    }
⟧

934.

⟦934 Put the \(r)\.{\\rightskip} glue after node |q|⟧ = ⟦
    r = new_param_glue(right_skip_code)

    link(r) = link(q)

    link(q) = r

    q = r
⟧

935. The following code begins with q at the end of the list to be justified. It ends with q at the beginning of that list, and with link(temp_head) pointing to the remainder of the paragraph, if any.

⟦935 Put the \(l)\.{\\leftskip} glue at the left and detach this line⟧ = ⟦
    r = link(q)

    link(q) = null

    q = link(temp_head)

    // at this point q is the leftmost node; all discardable 
    // nodes have been discarded
    link(temp_head) = r

    if (XeTeX_protrude_chars > 0) {
        p = q;
        // no more discardables
        p = find_protchar_left(p, false);
        w = left_pw(p);
        if (w != 0) {
            k = new_margin_kern(
              -w,
              last_leftmost_char,
              left_side,
            );
            link(k) = q;
            q = k;
        }
    }

    if (left_skip != zero_glue) {
        r = new_param_glue(left_skip_code);
        link(r) = q;
        q = r;
    }
⟧

936.

⟦936 Append the new box to the current vertical list, followed by the list of special nodes taken out of the box by the packager⟧ = ⟦
    if (pre_adjust_head != pre_adjust_tail) {
        append_list(pre_adjust_head)(pre_adjust_tail);
    }

    pre_adjust_tail = null

    append_to_vlist(just_box)

    if (adjust_head != adjust_tail) {
        append_list(adjust_head)(adjust_tail);
    }

    adjust_tail = null
⟧

937. Now q points to the hlist that represents the current line of the paragraph. We need to compute the appropriate line width, pack the line into a box of this size, and shift the box by the appropriate amount of indentation.

⟦937 Call the packaging subroutine, setting |just_box| to the justified box⟧ = ⟦
    if (cur_line > last_special_line) {
        cur_width = second_width;
        cur_indent = second_indent;
    } else if (par_shape_ptr == null) {
        cur_width = first_width;
        cur_indent = first_indent;
    } else {
        cur_width = mem[par_shape_ptr + 2 * cur_line].sc;
        cur_indent = mem[par_shape_ptr + 2 * cur_line - 1]
          .sc;
    }

    adjust_tail = adjust_head

    pre_adjust_tail = pre_adjust_head

    just_box = hpack(q, cur_width, exactly)

    shift_amount(just_box) = cur_indent
⟧
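
The three-way choice of line width and indentation can be sketched as a small Python function. This is an illustration under assumptions: `line_dimensions` is a hypothetical name, and the paragraph shape is modeled as a Python list of (indent, width) pairs rather than as words of mem, following the order in which module 937 reads them (indent at offset 2 * cur_line - 1, width at 2 * cur_line).

```python
def line_dimensions(cur_line, last_special_line, par_shape,
                    first_width, first_indent,
                    second_width, second_indent):
    """Return (cur_width, cur_indent) for the 1-based line cur_line.

    par_shape is None or a list of (indent, width) pairs, one per
    specially shaped line (a hypothetical stand-in for par_shape_ptr).
    """
    if cur_line > last_special_line:
        return second_width, second_indent
    if par_shape is None:
        return first_width, first_indent
    indent, width = par_shape[cur_line - 1]
    return width, indent
```

For instance, with no paragraph shape and last_special_line equal to 2, lines 1 and 2 use the first width and indent and every later line uses the second pair.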

938. Penalties between the lines of a paragraph come from club and widow lines, from the inter_line_penalty parameter, and from lines that end at discretionary breaks. Breaking between lines of a two-line paragraph gets both club-line and widow-line penalties. The local variable pen will be set to the sum of all relevant penalties for the current line, except that the final line is never penalized.

⟦938 Append a penalty node, if a nonzero penalty is appropriate⟧ = ⟦
    if (cur_line + 1 != best_line) {
        q = inter_line_penalties_ptr;
        if (q != null) {
            r = cur_line;
            if (r > penalty(q)) {
                r = penalty(q);
            }
            pen = penalty(q + r);
        } else {
            pen = inter_line_penalty;
        }
        q = club_penalties_ptr;
        if (q != null) {
            r = cur_line - prev_graf;
            if (r > penalty(q)) {
                r = penalty(q);
            }
            pen = pen + penalty(q + r);
        } else if (cur_line == prev_graf + 1) {
            pen = pen + club_penalty;
        }
        if (d) {
            q = display_widow_penalties_ptr;
        } else {
            q = widow_penalties_ptr;
        }
        if (q != null) {
            r = best_line - cur_line - 1;
            if (r > penalty(q)) {
                r = penalty(q);
            }
            pen = pen + penalty(q + r);
        } else if (cur_line + 2 == best_line) {
            if (d) {
                pen = pen + display_widow_penalty;
            } else {
                pen = pen + widow_penalty;
            }
        }
        if (disc_break) {
            pen = pen + broken_penalty;
        }
        if (pen != 0) {
            r = new_penalty(pen);
            link(tail) = r;
            tail = r;
        }
    }
⟧
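
The way these penalties add up can be sketched in Python. This is a simplified sketch, not the module itself: it covers only the branches taken when the array pointers (inter_line_penalties_ptr, club_penalties_ptr, and the widow arrays) are null, and `line_penalty` and the `params` dictionary are hypothetical names introduced for illustration.

```python
def line_penalty(cur_line, best_line, prev_graf, disc_break, d, params):
    """Sum the penalties that follow line cur_line.

    best_line is one more than the number of the paragraph's last
    line, so cur_line + 1 == best_line identifies the final line.
    Only the scalar-parameter branches of module 938 are modeled.
    """
    if cur_line + 1 == best_line:
        return 0                          # the final line is never penalized
    pen = params["inter_line_penalty"]
    if cur_line == prev_graf + 1:         # break after the first line
        pen += params["club_penalty"]
    if cur_line + 2 == best_line:         # break before the last line
        if d:
            pen += params["display_widow_penalty"]
        else:
            pen += params["widow_penalty"]
    if disc_break:                        # line ended at a discretionary
        pen += params["broken_penalty"]
    return pen


# Illustrative parameter values (assumptions, not defaults from the source).
PARAMS = {
    "inter_line_penalty": 0,
    "club_penalty": 150,
    "widow_penalty": 150,
    "display_widow_penalty": 50,
    "broken_penalty": 100,
}
```

With these values, the single break inside a two-line paragraph (prev_graf = 0, best_line = 3) receives both the club and the widow penalty, as the text states.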

939. [40] Pre-hyphenation. When the line-breaking routine is unable to find a feasible sequence of breakpoints, it makes a second pass over the paragraph, attempting to hyphenate the hyphenatable words. The goal of hyphenation is to insert discretionary material into the paragraph so that there are more potential places to break.

The general rules for hyphenation are somewhat complex and technical, because we want to be able to hyphenate words that are preceded or followed by punctuation marks, and because we want the rules to work for languages other than English. We also must contend with the fact that hyphens might radically alter the ligature and kerning structure of a word.

A sequence of characters will be considered for hyphenation only if it belongs to a “potentially hyphenatable part” of the current paragraph. This is a sequence of nodes $p_0 p_1 \ldots p_m$ where $p_0$ is a glue node, $p_1 \ldots p_{m-1}$ are either character or ligature or whatsit or implicit kern or text direction nodes, and $p_m$ is a glue or penalty or insertion or adjust or mark or whatsit or explicit kern node. (Therefore hyphenation is disabled by boxes, math formulas, and discretionary nodes already inserted by the user.) The ligature nodes among $p_1 \ldots p_{m-1}$ are effectively expanded into the original non-ligature characters; the kern nodes and whatsits are ignored. Each character c is now classified as either a nonletter (if lc_code(c) == 0), a lowercase letter (if lc_code(c) == c), or an uppercase letter (otherwise); an uppercase letter is treated as if it were lc_code(c) for purposes of hyphenation. The characters generated by $p_1 \ldots p_{m-1}$ may begin with nonletters; let $c_1$ be the first letter that is not in the middle of a ligature. Whatsit nodes preceding $c_1$ are ignored; a whatsit found after $c_1$ will be the terminating node $p_m$. All characters that do not have the same font as $c_1$ will be treated as nonletters. The hyphen_char for that font must be between 0 and biggest_char, otherwise hyphenation will not be attempted. TEX looks ahead for as many consecutive letters $c_1 \ldots c_n$ as possible; however, n must be less than max_hyphenatable_length + 1, so a character that would otherwise be $c_{\mathit{max\_hyphenatable\_length}+1}$ is effectively not a letter. Furthermore $c_n$ must not be in the middle of a ligature. In this way we obtain a string of letters $c_1 \ldots c_n$ that are generated by nodes $p_a \ldots p_b$, where 1 <= a <= b + 1 <= m. If n >= l_hyf + r_hyf, this string qualifies for hyphenation; however, uc_hyph must be positive, if $c_1$ is uppercase.

The hyphenation process takes place in three stages. First, the candidate sequence $c_1 \ldots c_n$ is found; then potential positions for hyphens are determined by referring to hyphenation tables; and finally, the nodes $p_a \ldots p_b$ are replaced by a new sequence of nodes that includes the discretionary breaks found.

Fortunately, we do not have to do all this calculation very often, because of the way it has been taken out of TEX’s inner loop. For example, when the second edition of the author’s 700-page book Seminumerical Algorithms was typeset by TEX, only about 1.2 hyphenations needed to be tried per paragraph, since the line breaking algorithm needed to use two passes on only about 5 per cent of the paragraphs.

⟦939 Initialize for hyphenating a paragraph⟧ = ⟦
    {
        init!{
            if (trie_not_ready) {
                init_trie;
            }
        }
        cur_lang = init_cur_lang;
        l_hyf = init_l_hyf;
        r_hyf = init_r_hyf;
        set_hyph_index;
    }
⟧
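
The letter classification and the final qualification test described in this section's prose can be sketched in Python. This is a minimal sketch under assumptions: the lc_code table is modeled as a dictionary `LC` covering only ASCII letters, and `classify` and `qualifies` are hypothetical helper names.

```python
def classify(c, lc_code):
    """Classify character code c using a dict model of the lc_code table."""
    lc = lc_code.get(c, 0)
    if lc == 0:
        return "nonletter"
    return "lowercase" if lc == c else "uppercase"


def qualifies(letters, lc_code, l_hyf, r_hyf, uc_hyph):
    """Check n >= l_hyf + r_hyf, and that uc_hyph is positive
    when the first letter is uppercase."""
    if len(letters) < l_hyf + r_hyf:
        return False
    if classify(letters[0], lc_code) == "uppercase" and uc_hyph <= 0:
        return False
    return True


# Toy lc_code table for ASCII letters only (an assumption for the example).
LC = {ord(ch): ord(ch.lower())
      for ch in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"}
```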

940. The letters $c_1 \ldots c_n$ that are candidates for hyphenation are placed into an array called hc; the number n is placed into hn; pointers to nodes $p_{a-1}$ and $p_b$ in the description above are placed into variables ha and hb; and the font number is placed into hf.

⟦13 Global variables⟧ += ⟦
    // word to be hyphenated
    // note that element 0 needs to be a full UnicodeScalar, 
    // even though we basically work in UTF16
    var hc: array [0 .. (hyphenatable_length_limit + 3)] of
      0 .. number_usvs;

    // the number of positions occupied in hc , 0..64 in TeX
    var hn: small_number;

    // nodes ha .. hb should be replaced by the hyphenated 
    // result
    var ha, hb: pointer;

    // font number of the letters in hc 
    var hf: internal_font_number;

    // like hc , before conversion to lowercase
    var hu: array [0 .. (hyphenatable_length_limit + 1)] of
      0 .. too_big_char;

    // hyphen character of the relevant font
    var hyf_char: integer;

    // current hyphenation table of interest
    var cur_lang, init_cur_lang: 0 .. biggest_lang;

    // limits on fragment sizes
    var l_hyf, r_hyf, init_l_hyf, init_r_hyf: integer;

    // boundary character after $c_n$
    var hyf_bchar: halfword;

    var max_hyph_char: integer;
⟧

941.

⟦23 Set initial values of key variables⟧ += ⟦
    max_hyph_char = too_big_lang
⟧

942. Hyphenation routines need a few more local variables.

⟦910 Local variables for line breaking⟧ += ⟦
    // an index into hc or hu 
    var j: small_number;

    // character being considered for hyphenation
    var c: UnicodeScalar;
⟧

943. When the following code is activated, the line_break procedure is in its second pass, and cur_p points to a glue node.

⟦943 Try to hyphenate the following word⟧ = ⟦
    {
        prev_s = cur_p;
        s = link(prev_s);
        if (s != null) {
            ⟦949 Skip to node |ha|, or |goto done1| if no hyphenation should be attempted⟧
            if (l_hyf + r_hyf > max_hyphenatable_length) {
                goto done1;
            }
            if (is_native_word_node(ha)) {
                ⟦945 Check that nodes after |native_word| permit hyphenation; if not, |goto done1|⟧
                ⟦946 Prepare a |native_word_node| for hyphenation⟧
            } else {
                ⟦950 Skip to node |hb|, putting letters into |hu| and |hc|⟧
            }
            ⟦952 Check that the nodes following |hb| permit hyphenation and that at least |l_hyf+r_hyf| letters have been found, otherwise |goto done1|⟧
            hyphenate;
        }
      done1:
    }
⟧

944.

⟦874 Declare subprocedures for |line_break|⟧ += ⟦
    ⟦960 Declare the function called |reconstitute|⟧

    function hyphenate() {
        label
            common_ending,
            done,
            found,
            found1,
            found2,
            not_found,
            exit;
        var ⟦954 Local variables for hyphenation⟧;
        
        ⟦977 Find hyphen locations for the word in |hc|, or |return|⟧
        ⟦955 If no hyphens were found, |return|⟧
        ⟦956 Replace nodes |ha..hb| by a sequence of nodes that includes the discretionary hyphens⟧
      exit:
    }

    function max_hyphenatable_length(): integer {
        if (
            XeTeX_hyphenatable_length
            > hyphenatable_length_limit
        ) {
            max_hyphenatable_length = (
              hyphenatable_length_limit
            );
        } else {
            max_hyphenatable_length = (
              XeTeX_hyphenatable_length
            );
        }
    }
⟧

945.

⟦945 Check that nodes after |native_word| permit hyphenation; if not, |goto done1|⟧ = ⟦
    s = link(ha)

    loop {
        if (!(is_char_node(s))) {
            case type(s) {
              ligature_node:
                do_nothing;
              kern_node:
                if (subtype(s) != normal) {
                    goto done6;
                }
              whatsit_node,
              glue_node,
              penalty_node,
              ins_node,
              adjust_node,
              mark_node:
                goto done6;
              othercases:
                goto done1;
            }
        }
        s = link(s);
    }

    done6:
⟧

946.

⟦946 Prepare a |native_word_node| for hyphenation⟧ = ⟦
    // note that if there are chars with lccode == 0 , we 
    // split them out into separate native_word nodes

    hn = 0

    restart:

    for (l in 0 to native_length(ha) - 1) {
        c = get_native_usv(ha, l);
        set_lc_code(c);
        if ((hc[0] == 0)) {
            if ((hn > 0)) {
                // we've got some letters, and now found a 
                // non-letter, so break off the tail of the 
                // native_word and link it after this node, 
                // and goto done3
                ⟦947 Split the |native_word_node| at |l| and link the second part after |ha|⟧
                goto done3;
            }
        } else if ((hn == 0) && (l > 0)) {
            // we've found the first letter after some 
            // non-letters, so break off the head of the 
            // native_word and restart
            ⟦947 Split the |native_word_node| at |l| and link the second part after |ha|⟧
            ha = link(ha);
            goto restart;
        } else if ((hn == max_hyphenatable_length)) {
            // reached max hyphenatable length
            goto done3;
        } else {
            // found a letter that is part of a potentially 
            // hyphenatable sequence
            incr(hn);
            if (c < 0x10000) {
                hu[hn] = c;
                hc[hn] = hc[0];
            } else {
                hu[hn] = (c - 0x10000) div 0x400 + 0xd800;
                hc[hn] = 
                    (hc[0] - 0x10000)
                    div 0x400 + 0xd800
                ;
                incr(hn);
                hu[hn] = c % 0x400 + 0xdc00;
                hc[hn] = hc[0] % 0x400 + 0xdc00;
                incr(l);
            }
            hyf_bchar = non_char;
        }
    }
⟧
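
The arithmetic that stores a supplementary-plane character as two entries of hu is the standard UTF-16 surrogate-pair split. A small Python sketch follows; `to_utf16_units` is a hypothetical helper name, and the sketch covers only the hu side (the module applies the same arithmetic to the lowercase code in hc[0]).

```python
def to_utf16_units(usv):
    """Split a Unicode scalar value into one or two UTF-16 code units,
    using the same arithmetic as the module above."""
    if usv < 0x10000:
        return [usv]
    high = (usv - 0x10000) // 0x400 + 0xD800
    low = usv % 0x400 + 0xDC00   # 0x10000 is a multiple of 0x400
    return [high, low]


def utf16_units_via_codec(usv):
    """Reference result obtained from Python's own UTF-16 encoder."""
    data = chr(usv).encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]
```

Because each such character occupies two positions, the loop also advances l by one extra step after storing the pair.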

947.

⟦947 Split the |native_word_node| at |l| and link the second part after |ha|⟧ = ⟦
    q = new_native_word_node(hf, native_length(ha) - l)

    subtype(q) = subtype(ha)

    for (i in l to native_length(ha) - 1) {
        set_native_char(q, i - l, get_native_char(ha, i));
    }

    set_native_metrics(q, XeTeX_use_glyph_metrics)

    link(q) = link(ha)

    link(ha) = q // truncate text in node ha 

    native_length(ha) = l

    set_native_metrics(ha, XeTeX_use_glyph_metrics)
⟧

948.

⟦910 Local variables for line breaking⟧ += ⟦
    var l: integer;

    var i: integer;
⟧

949. The first thing we need to do is find the node ha just before the first letter.

⟦949 Skip to node |ha|, or |goto done1| if no hyphenation should be attempted⟧ = ⟦
    loop {
        if (is_char_node(s)) {
            c = qo(character(s));
            hf = font(s);
        } else if (type(s) == ligature_node) {
            if (lig_ptr(s) == null) {
                goto continue;
            } else {
                q = lig_ptr(s);
                c = qo(character(q));
                hf = font(q);
            }
        } else if (
            (type(s) == kern_node)
            && (subtype(s) == normal)
        ) {
            goto continue;
        } else if (
            (type(s) == math_node)
            && (subtype(s) >= L_code)
        ) {
            goto continue;
        } else if (type(s) == whatsit_node) {
            if ((is_native_word_subtype(s))) {
                // we only consider the node if it contains 
                // at least one letter, otherwise we'll skip 
                // it
                for (l in 0 to native_length(s) - 1) {
                    c = get_native_usv(s, l);
                    if (lc_code(c) != 0) {
                        hf = native_font(s);
                        prev_s = s;
                        if (
                            (lc_code(c) == c)
                            || (uc_hyph > 0)
                        ) {
                            goto done2;
                        } else {
                            goto done1;
                        }
                    }
                    if (c >= 0x10000) {
                        incr(l);
                    }
                }
            }
            ⟦1423 Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop⟧
            goto continue;
        } else {
            goto done1;
        }
        set_lc_code(c);
        if (hc[0] != 0) {
            if ((hc[0] == c) || (uc_hyph > 0)) {
                goto done2;
            } else {
                goto done1;
            }
        }
      continue:
        prev_s = s;
        s = link(prev_s);
    }

    done2:

    hyf_char = hyphen_char[hf]

    if (hyf_char < 0) {
        goto done1;
    }

    if (hyf_char > biggest_char) {
        goto done1;
    }

    ha = prev_s
⟧

950. The word to be hyphenated is now moved to the hu and hc arrays.

⟦950 Skip to node |hb|, putting letters into |hu| and |hc|⟧ = ⟦
    hn = 0

    loop {
        if (is_char_node(s)) {
            if (font(s) != hf) {
                goto done3;
            }
            hyf_bchar = character(s);
            c = qo(hyf_bchar);
            set_lc_code(c);
            if (hc[0] == 0) {
                goto done3;
            }
            if (hc[0] > max_hyph_char) {
                goto done3;
            }
            if (hn == max_hyphenatable_length) {
                goto done3;
            }
            hb = s;
            incr(hn);
            hu[hn] = c;
            hc[hn] = hc[0];
            hyf_bchar = non_char;
        } else if (type(s) == ligature_node) {
            ⟦951 Move the characters of a ligature node to |hu| and |hc|; but |goto done3| if they are not all letters⟧
        } else if (
            (type(s) == kern_node)
            && (subtype(s) == normal)
        ) {
            hb = s;
            hyf_bchar = font_bchar[hf];
        } else {
            goto done3;
        }
        s = link(s);
    }

    done3:
⟧

951. We let j be the index of the character being stored when a ligature node is being expanded, since we do not want to advance hn until we are sure that the entire ligature consists of letters. Note that it is possible to get to done3 with hn == 0 and hb not set to any value.

⟦951 Move the characters of a ligature node to |hu| and |hc|; but |goto done3| if they are not all letters⟧ = ⟦
    {
        if (font(lig_char(s)) != hf) {
            goto done3;
        }
        j = hn;
        q = lig_ptr(s);
        if (q > null) {
            hyf_bchar = character(q);
        }
        while (q > null) {
            c = qo(character(q));
            set_lc_code(c);
            if (hc[0] == 0) {
                goto done3;
            }
            if (hc[0] > max_hyph_char) {
                goto done3;
            }
            if (j == max_hyphenatable_length) {
                goto done3;
            }
            incr(j);
            hu[j] = c;
            hc[j] = hc[0];
            q = link(q);
        }
        hb = s;
        hn = j;
        if (odd(subtype(s))) {
            hyf_bchar = font_bchar[hf];
        } else {
            hyf_bchar = non_char;
        }
    }
⟧

952.

⟦952 Check that the nodes following |hb| permit hyphenation and that at least |l_hyf+r_hyf| letters have been found, otherwise |goto done1|⟧ = ⟦
    if (hn < l_hyf + r_hyf) {
        //  l_hyf and r_hyf are >= 1 
        goto done1;
    }

    loop {
        if (!(is_char_node(s))) {
            case type(s) {
              ligature_node:
                do_nothing;
              kern_node:
                if (subtype(s) != normal) {
                    goto done4;
                }
              whatsit_node,
              glue_node,
              penalty_node,
              ins_node,
              adjust_node,
              mark_node:
                goto done4;
              math_node:
                if (subtype(s) >= L_code) {
                    goto done4;
                } else {
                    goto done1;
                }
              othercases:
                goto done1;
            }
        }
        s = link(s);
    }

    done4:
⟧

953. [41] Post-hyphenation. If a hyphen may be inserted between hc[j] and hc[j + 1], the hyphenation procedure will set hyf[j] to some small odd number. But before we look at TEX’s hyphenation procedure, which is independent of the rest of the line-breaking algorithm, let us consider what we will do with the hyphens it finds, since it is better to work on this part of the program before forgetting what ha and hb, etc., are all about.

⟦13 Global variables⟧ += ⟦
    // odd values indicate discretionary hyphens
    var hyf: array [0 .. (hyphenatable_length_limit + 1)] of
      0 .. 9;

    // list of punctuation characters preceding the word
    var init_list: pointer;

    // does init_list represent a ligature?
    var init_lig: boolean;

    // if so, did the ligature involve a left boundary?
    var init_lft: boolean;
⟧

954.

⟦954 Local variables for hyphenation⟧ = ⟦
    // indices into hc or hu 
    var i, j, l: 0 .. (hyphenatable_length_limit + 2);

    // temporary registers for list manipulation
    var q, r, s: pointer;

    // boundary character of hyphenated word, or non_char 
    var bchar: halfword;
⟧

955. TEX will never insert a hyphen that has fewer than \lefthyphenmin letters before it or fewer than \righthyphenmin after it; hence, a short word has comparatively little chance of being hyphenated. If no hyphens have been found, we can save time by not having to make any changes to the paragraph.

⟦955 If no hyphens were found, |return|⟧ = ⟦
    for (j in l_hyf to hn - r_hyf) {
        if (odd(hyf[j])) {
            goto found1;
        }
    }

    return

    found1:
⟧
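
The early-exit test amounts to scanning hyf over the positions that leave at least l_hyf letters before and r_hyf letters after the hyphen. A hedged Python sketch, with `has_usable_hyphen` as a hypothetical name and hyf modeled as a list indexed from 0 to hn:

```python
def has_usable_hyphen(hyf, hn, l_hyf, r_hyf):
    """True if some position j in l_hyf..hn-r_hyf has an odd hyf[j]."""
    return any(hyf[j] % 2 == 1 for j in range(l_hyf, hn - r_hyf + 1))
```

For a six-letter word with l_hyf = 2 and r_hyf = 3, only positions 2 and 3 are examined, so odd entries at positions 1 or 5 are ignored.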

956. If hyphens are in fact going to be inserted, TEX first deletes the subsequence of nodes between ha and hb. An attempt is made to preserve the effect that implicit boundary characters and punctuation marks had on ligatures inside the hyphenated word, by storing a left boundary or preceding character in hu[0] and by storing a possible right boundary in bchar. We set j = 0 if hu[0] is to be part of the reconstruction; otherwise j = 1. The variable s will point to the tail of the current hlist, and q will point to the node following hb, so that things can be hooked up after we reconstitute the hyphenated word.

⟦956 Replace nodes |ha..hb| by a sequence of nodes that includes the discretionary hyphens⟧ = ⟦
    if (is_native_word_node(ha)) {
        ⟦957 Hyphenate the |native_word_node| at |ha|⟧
    } else {
        q = link(hb);
        link(hb) = null;
        r = link(ha);
        link(ha) = null;
        bchar = hyf_bchar;
        if (is_char_node(ha)) {
            if (font(ha) != hf) {
                goto found2;
            } else {
                init_list = ha;
                init_lig = false;
                hu[0] = qo(character(ha));
            }
        } else if (type(ha) == ligature_node) {
            if (font(lig_char(ha)) != hf) {
                goto found2;
            } else {
                init_list = lig_ptr(ha);
                init_lig = true;
                init_lft = (subtype(ha) > 1);
                hu[0] = qo(character(lig_char(ha)));
                if (init_list == null) {
                    if (init_lft) {
                        hu[0] = max_hyph_char;
                        init_lig = false;
                        // in this case a ligature will be 
                        // reconstructed from scratch
                    }
                }
                free_node(ha, small_node_size);
            }
        } else {
            // no punctuation found; look for left boundary
            if (!is_char_node(r)) {
                if (type(r) == ligature_node) {
                    if (subtype(r) > 1) {
                        goto found2;
                    }
                }
            }
            j = 1;
            s = ha;
            init_list = null;
            goto common_ending;
        }
        // we have cur_p != ha because type ( cur_p ) == 
        // glue_node 
        s = cur_p;
        while (link(s) != ha) {
            s = link(s);
        }
        j = 0;
        goto common_ending;
      found2:
        s = ha;
        j = 0;
        hu[0] = max_hyph_char;
        init_lig = false;
        init_list = null;
      common_ending:
        flush_node_list(r);
        ⟦967 Reconstitute nodes for the hyphenated word, inserting discretionary hyphens⟧
        flush_list(init_list);
    }
⟧

957.

⟦957 Hyphenate the |native_word_node| at |ha|⟧ = ⟦
    // find the node immediately before the word to be 
    // hyphenated

    // we have cur_p != ha because type ( cur_p ) == 
    // glue_node 
    s = cur_p

    while (link(s) != ha) {
        s = link(s);
    }

    // for each hyphen position, create a 
    // native_word_node fragment for the text before 
    // this point, and a disc_node for the break, with 
    // the hyf_char in the pre_break text
    hyphen_passed = 0 // location of last hyphen we saw

    for (j in l_hyf to hn - r_hyf) {
        // if this is a valid break....
        if (odd(hyf[j])) {
            // make a native_word_node for the fragment 
            // before the hyphen
            q = new_native_word_node(hf, j - hyphen_passed);
            subtype(q) = subtype(ha);
            for (i in 0 to j - hyphen_passed - 1) {
                set_native_char(
                  q,
                  i,
                  get_native_char(ha, i + hyphen_passed),
                );
            }
            set_native_metrics(q, XeTeX_use_glyph_metrics);
            // append the new node
            link(s) = q;
            s = q;
            // make the disc_node for the hyphenation point
            q = new_disc;
            pre_break(q) = new_native_character(
              hf,
              hyf_char,
            );
            link(s) = q;
            s = q;
            hyphen_passed = j;
        }
    }

    // make a native_word_node for the last fragment of
    // the word
    // ensure trailing punctuation is not lost!
    hn = native_length(ha)

    q = new_native_word_node(hf, hn - hyphen_passed)

    subtype(q) = subtype(ha)

    for (i in 0 to hn - hyphen_passed - 1) {
        set_native_char(
          q,
          i,
          get_native_char(ha, i + hyphen_passed),
        );
    }

    set_native_metrics(q, XeTeX_use_glyph_metrics)

    link(s) = q // append the new node

    s = q

    q = link(ha)

    link(s) = q

    link(ha) = null

    flush_node_list(ha)
⟧

958. We must now face the fact that the battle is not over, even though the hyphens have been found: The process of reconstituting a word can be nontrivial because ligatures might change when a hyphen is present. The TEXbook discusses the difficulties of the word “difficult”, and the discretionary material surrounding a hyphen can be considerably more complex than that. Suppose abcdef is a word in a font for which the only ligatures are bc, cd, de, and ef. If this word permits hyphenation between b and c, the two patterns with and without hyphenation are a b - cd ef and a bc de f. Thus the insertion of a hyphen might cause effects to ripple arbitrarily far into the rest of the word. A further complication arises if additional hyphens appear together with such rippling, e.g., if the word in the example just given could also be hyphenated between c and d; TEX avoids this by simply ignoring the additional hyphens in such weird cases.

Still further complications arise in the presence of ligatures that do not delete the original characters. When punctuation precedes the word being hyphenated, TEX’s method is not perfect under all possible scenarios, because punctuation marks and letters can propagate information back and forth. For example, suppose the original pre-hyphenation pair *a changes to *y via a |=: ligature, which changes to xy via a =:| ligature; if $p_{a-1}$ = x and $p_a$ = y, the reconstitution procedure isn’t smart enough to obtain xy again. In such cases the font designer should include a ligature that goes from xa to xy.
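The rippling effect in the abcdef example can be reproduced with a small Python sketch. It is only an illustration under simplifying assumptions: the font table `LIGATURES` and the helper names are hypothetical, ligatures are formed greedily from left to right, and a ligature is assumed not to combine any further once formed.

```python
# Hypothetical toy font: the only ligatures are bc, cd, de, and ef.
LIGATURES = {("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")}

def form_ligatures(chars):
    """Greedy left-to-right pairing; a formed ligature is assumed not to
    combine with the characters that follow it."""
    out = []
    i = 0
    while i < len(chars):
        if i + 1 < len(chars) and (chars[i], chars[i + 1]) in LIGATURES:
            out.append(chars[i] + chars[i + 1])
            i += 2
        else:
            out.append(chars[i])
            i += 1
    return out

def with_hyphen_after(word, k):
    """Typeset word[:k], a hyphen, and word[k:] as separate ligature runs."""
    return form_ligatures(list(word[:k])) + ["-"] + form_ligatures(list(word[k:]))

unbroken = form_ligatures(list("abcdef"))
broken = with_hyphen_after("abcdef", 2)
print(unbroken)  # ['a', 'bc', 'de', 'f']
print(broken)    # ['a', 'b', '-', 'cd', 'ef']
```

Every ligature after the break differs between the two results, which is why the discretionary has to carry both branches until they agree again.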

959. The processing is facilitated by a subroutine called reconstitute. Given a string of characters $x_j\ldots x_n$, there is a smallest index $m\ge j$ such that the “translation” of $x_j\ldots x_n$ by ligatures and kerning has the form $y_1\ldots y_t$ followed by the translation of $x_{m+1}\ldots x_n$, where $y_1\ldots y_t$ is some nonempty sequence of character, ligature, and kern nodes. We call $x_j\ldots x_m$ a “cut prefix” of $x_j\ldots x_n$. For example, if $x_1x_2x_3$ = fly, and if the font contains ‘fl’ as a ligature and a kern between ‘fl’ and ‘y’, then $m=2$, $t=2$, and $y_1$ will be a ligature node for ‘fl’ followed by an appropriate kern node $y_2$. In the most common case, $x_j$ forms no ligature with $x_{j+1}$ and we simply have $m=j$, $y_1=x_j$. If $m<n$ we can repeat the procedure on $x_{m+1}\ldots x_n$ until the entire translation has been found.
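The notion of a cut prefix can be made concrete with a short Python sketch of the ‘fly’ example. This is not the reconstitute routine itself: the tables `LIGS` and `KERNS` and the functions `cut_prefix` and `translate` are hypothetical, only plain `=:` ligatures are modeled, and boundary characters and hyphens are ignored.

```python
# Hypothetical toy font: one plain `=:` ligature and one kern.
LIGS = {("f", "l"): "fl"}     # (left, right) -> ligature character
KERNS = {("fl", "y"): -1}     # (left, right) -> kern amount, arbitrary units

def cut_prefix(x, j, n):
    """Return (m, nodes): x[j..m] is the cut prefix of x[j..n] (1-based
    indices) and nodes is its translation y_1 ... y_t."""
    cur, m, absorbed = x[j], j, [x[j]]
    # Keep absorbing the next character while the font has a ligature for the pair.
    while m < n and (cur, x[m + 1]) in LIGS:
        cur = LIGS[(cur, x[m + 1])]
        m += 1
        absorbed.append(x[m])
    if len(absorbed) > 1:
        nodes = [("ligature", cur, tuple(absorbed))]
    else:
        nodes = [("char", cur)]
    # A kern between the finished node and the next character ends the prefix.
    if m < n and (cur, x[m + 1]) in KERNS:
        nodes.append(("kern", KERNS[(cur, x[m + 1])]))
    return m, nodes

def translate(x, n):
    """Repeat cut_prefix until the whole of x[1..n] has been translated."""
    out, j = [], 1
    while j <= n:
        m, nodes = cut_prefix(x, j, n)
        out.extend(nodes)
        j = m + 1
    return out

x = [None, "f", "l", "y"]     # x[0] is padding so that indices start at 1
m, nodes = cut_prefix(x, 1, 3)
print(m, nodes)               # 2 [('ligature', 'fl', ('f', 'l')), ('kern', -1)]
print(translate(x, 3))
```

The first call reports m = 2 and two nodes, matching the values of m and t in the example; the second call to `cut_prefix` inside `translate` handles the remaining ‘y’ as the common single-character case.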

The reconstitute function returns the integer $m$ and puts the nodes $y_1\ldots y_t$ into a linked list starting at link(hold_head), getting the input $x_j\ldots x_n$ from the hu array. If $x_j=256$, we consider $x_j$ to be an implicit left boundary character; in this case j must be strictly less than n. There is a parameter bchar, which is either 256 or an implicit right boundary character assumed to be present just following $x_n$. (The value hu[n + 1] is never explicitly examined, but the algorithm imagines that bchar is there.)

If there exists an index k in the range $j\le k\le m$ such that hyf[k] is odd and such that the result of reconstitute would have been different if $x_{k+1}$ had been hchar, then reconstitute sets hyphen_passed to the smallest such k. Otherwise it sets hyphen_passed to zero.

A special convention is used in the case j == 0: Then we assume that the translation of hu[0] appears in a special list of charnodes starting at init_list; moreover, if init_lig is true, then hu[0] will be a ligature character, involving a left boundary if init_lft is true. This facility is provided for cases when a hyphenated word is preceded by punctuation (like single or double quotes) that might affect the translation of the beginning of the word.

⟦13 Global variables⟧ += ⟦
    // first hyphen in a ligature, if any
    var hyphen_passed: small_number;
⟧

960.

⟦960 Declare the function called |reconstitute|⟧ = ⟦
    function reconstitute(
      j, n: small_number,
      bchar, hchar: halfword,
    ): small_number {
        label continue, done;
        var
          p: pointer, // temporary register for list 
          // manipulation
          t: pointer, // a node being appended to
          q: four_quarters, // character information or a 
          // lig/kern instruction
          cur_rh: halfword, // hyphen character for ligature 
          // testing
          test_char: halfword, // hyphen or other character 
          // for ligature testing
          w: scaled, // amount of kerning
          k: font_index; // position of current lig/kern 
          // instruction
        
        hyphen_passed = 0;
        t = hold_head;
        w = 0;
        // at this point ligature_present == lft_hit == 
        // rt_hit == false 
        link(hold_head) = null;
        ⟦962 Set up data structures with the cursor following position |j|⟧
      continue:
        ⟦963 If there's a ligature or kern at the cursor position, update the data structures, possibly advancing~|j|; continue until the cursor moves⟧
        ⟦964 Append a ligature and/or kern to the translation; |goto continue| if the stack of inserted ligatures is nonempty⟧
        reconstitute = j;
    }
⟧

961. The reconstitution procedure shares many of the global data structures by which TEX has processed the words before they were hyphenated. There is an implied “cursor” between characters cur_l and cur_r; these characters will be tested for possible ligature activity. If ligature_present then cur_l is a ligature character formed from the original characters following cur_q in the current translation list. There is a “ligature stack” between the cursor and character j + 1, consisting of pseudo-ligature nodes linked together by their link fields. This stack is normally empty unless a ligature command has created a new character that will need to be processed later. A pseudo-ligature is a special node having a character field that represents a potential ligature and a lig_ptr field that points to a char_node or is null. We have

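$$\mathit{cur\_r}=\begin{cases}
\mathit{character}(\mathit{lig\_stack}), & \text{if } \mathit{lig\_stack}>\mathit{null};\\
\mathit{qi}(\mathit{hu}[j+1]), & \text{if } \mathit{lig\_stack}=\mathit{null} \text{ and } j<n;\\
\mathit{bchar}, & \text{if } \mathit{lig\_stack}=\mathit{null} \text{ and } j=n.
\end{cases}$$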

⟦13 Global variables⟧ += ⟦
    // characters before and after the cursor
    var cur_l, cur_r: halfword;

    // where a ligature should be detached
    var cur_q: pointer;

    // unfinished business to the right of the cursor
    var lig_stack: pointer;

    // should a ligature node be made for cur_l ?
    var ligature_present: boolean;

    // did we hit a ligature with a boundary character?
    var lft_hit, rt_hit: boolean;
⟧

962.

@define append_charnode_to_t(#) =>
    {
        link(t) = get_avail;
        t = link(t);
        font(t) = hf;
        character(t) = #;
    }
@define set_cur_r =>
    {
        if (j < n) {
            cur_r = qi(hu[j + 1]);
        } else {
            cur_r = bchar;
        }
        if (odd(hyf[j])) {
            cur_rh = hchar;
        } else {
            cur_rh = non_char;
        }
    }
⟦962 Set up data structures with the cursor following position |j|⟧ = ⟦
    cur_l = qi(hu[j])

    cur_q = t

    if (j == 0) {
        ligature_present = init_lig;
        p = init_list;
        if (ligature_present) {
            lft_hit = init_lft;
        }
        while (p > null) {
            append_charnode_to_t(character(p));
            p = link(p);
        }
    } else if (cur_l < non_char) {
        append_charnode_to_t(cur_l);
    }

    lig_stack = null

    set_cur_r
⟧

963. We may want to look at the lig/kern program twice, once for a hyphen and once for a normal letter. (The hyphen might appear after the letter in the program, so we’d better not try to look for both at once.)

⟦963 If there's a ligature or kern at the cursor position, update the data structures, possibly advancing~|j|; continue until the cursor moves⟧ = ⟦
    if (cur_l == non_char) {
        k = bchar_label[hf];
        if (k == non_address) {
            goto done;
        } else {
            q = font_info[k].qqqq;
        }
    } else {
        q = char_info(hf)(cur_l);
        if (char_tag(q) != lig_tag) {
            goto done;
        }
        k = lig_kern_start(hf)(q);
        q = font_info[k].qqqq;
        if (skip_byte(q) > stop_flag) {
            k = lig_kern_restart(hf)(q);
            q = font_info[k].qqqq;
        }
        // now k is the starting address of the lig/kern 
        // program
    }

    if (cur_rh < non_char) {
        test_char = cur_rh;
    } else {
        test_char = cur_r;
    }

    loop {
        if (next_char(q) == test_char) {
            if (skip_byte(q) <= stop_flag) {
                if (cur_rh < non_char) {
                    hyphen_passed = j;
                    hchar = non_char;
                    cur_rh = non_char;
                    goto continue;
                } else {
                    if (hchar < non_char) {
                        if (odd(hyf[j])) {
                            hyphen_passed = j;
                            hchar = non_char;
                        }
                    }
                    if (op_byte(q) < kern_flag) {
                        ⟦965 Carry out a ligature replacement, updating the cursor structure and possibly advancing~|j|; |goto continue| if the cursor doesn't advance, otherwise |goto done|⟧
                    }
                    w = char_kern(hf)(q);
                    // this kern will be inserted below
                    goto done;
                }
            }
        }
        if (skip_byte(q) >= stop_flag) {
            if (cur_rh == non_char) {
                goto done;
            } else {
                cur_rh = non_char;
                goto continue;
            }
        }
        k = k + qo(skip_byte(q)) + 1;
        q = font_info[k].qqqq;
    }

    done:
⟧

964.

@define wrap_lig(#) =>
    if (ligature_present) {
        p = new_ligature(hf, cur_l, link(cur_q));
        if (lft_hit) {
            subtype(p) = 2;
            lft_hit = false;
        }
        if (#) {
            if (lig_stack == null) {
                incr(subtype(p));
                rt_hit = false;
            }
        }
        link(cur_q) = p;
        t = p;
        ligature_present = false;
    }
@define pop_lig_stack =>
    {
        if (lig_ptr(lig_stack) > null) {
            // this is a charnode for hu [ j + 1 ] 
            link(t) = lig_ptr(lig_stack);
            t = link(t);
            incr(j);
        }
        p = lig_stack;
        lig_stack = link(p);
        free_node(p, small_node_size);
        if (lig_stack == null) {
            set_cur_r;
        } else {
            cur_r = character(lig_stack);
        }
        // if lig_stack isn't null we have cur_rh == 
        // non_char 
    }
⟦964 Append a ligature and/or kern to the translation; |goto continue| if the stack of inserted ligatures is nonempty⟧ = ⟦
    wrap_lig(rt_hit)

    if (w != 0) {
        link(t) = new_kern(w);
        t = link(t);
        w = 0;
        // SyncTeX: do nothing, it is too late
        sync_tag(t + medium_node_size) = 0;
    }

    if (lig_stack > null) {
        cur_q = t;
        cur_l = character(lig_stack);
        ligature_present = true;
        pop_lig_stack;
        goto continue;
    }
⟧

965.

⟦965 Carry out a ligature replacement, updating the cursor structure and possibly advancing~|j|; |goto continue| if the cursor doesn't advance, otherwise |goto done|⟧ = ⟦
    {
        if (cur_l == non_char) {
            lft_hit = true;
        }
        if (j == n) {
            if (lig_stack == null) {
                rt_hit = true;
            }
        }
        // allow a way out in case there's an infinite 
        // ligature loop
        check_interrupt;
        case op_byte(q) {
          qi(1), qi(5):
            // =:|, =:|>
            cur_l = rem_byte(q);
            ligature_present = true;
          qi(2), qi(6):
            // |=:, |=:>
            cur_r = rem_byte(q);
            if (lig_stack > null) {
                character(lig_stack) = cur_r;
            } else {
                lig_stack = new_lig_item(cur_r);
                if (j == n) {
                    bchar = non_char;
                } else {
                    p = get_avail;
                    lig_ptr(lig_stack) = p;
                    character(p) = qi(hu[j + 1]);
                    font(p) = hf;
                }
            }
          qi(3):
            // |=:|
            cur_r = rem_byte(q);
            p = lig_stack;
            lig_stack = new_lig_item(cur_r);
            link(lig_stack) = p;
          qi(7), qi(11):
            // |=:|>, |=:|>>
            wrap_lig(false);
            cur_q = t;
            cur_l = rem_byte(q);
            ligature_present = true;
          othercases:
            cur_l = rem_byte(q);
            // =:
            ligature_present = true;
            if (lig_stack > null) {
                pop_lig_stack;
            } else if (j == n) {
                goto done;
            } else {
                append_charnode_to_t(cur_r);
                incr(j);
                set_cur_r;
            }
        }
        if (op_byte(q) > qi(4)) {
            if (op_byte(q) != qi(7)) {
                goto done;
            }
        }
        goto continue;
    }
⟧

966. Okay, we’re ready to insert the potential hyphenations that were found. When the following program is executed, we want to append the word hu[1 .. hn] after node ha, and node q should be appended to the result. During this process, the variable i will be a temporary index into hu; the variable j will be an index to our current position in hu; the variable l will be the counterpart of j, in a discretionary branch; the variable r will point to new nodes being created; and we need a few new local variables:

⟦954 Local variables for hyphenation⟧ += ⟦
    // the end of lists in the main and discretionary 
    // branches being reconstructed
    var major_tail, minor_tail: pointer;

    // character temporarily replaced by a hyphen
    var c: UnicodeScalar;

    // where that character came from
    var c_loc: 0 .. hyphenatable_length_limit;

    // replacement count for discretionary
    var r_count: integer;

    // the hyphen, if it exists
    var hyf_node: pointer;
⟧

967. When the following code is performed, hyf[0] and hyf[hn] will be zero.

⟦967 Reconstitute nodes for the hyphenated word, inserting discretionary hyphens⟧ = ⟦
    repeat {
        l = j;
        j = reconstitute(j, hn, bchar, qi(hyf_char)) + 1;
        if (hyphen_passed == 0) {
            link(s) = link(hold_head);
            while (link(s) > null) {
                s = link(s);
            }
            if (odd(hyf[j - 1])) {
                l = j;
                hyphen_passed = j - 1;
                link(hold_head) = null;
            }
        }
        if (hyphen_passed > 0) {
            ⟦968 Create and append a discretionary node as an alternative to the unhyphenated word, and continue to develop both branches until they become equivalent⟧
        }
    } until (j > hn)

    link(s) = q
⟧

968. In this repeat loop we will insert another discretionary if hyf[j - 1] is odd, when both branches of the previous discretionary end at position j - 1. Strictly speaking, we aren’t justified in doing this, because we don’t know that a hyphen after j - 1 is truly independent of those branches. But in almost all applications we would rather not lose a potentially valuable hyphenation point. (Consider the word ‘difficult’, where the letter ‘c’ is in position j.)

@define advance_major_tail =>
    {
        major_tail = link(major_tail);
        incr(r_count);
    }
⟦968 Create and append a discretionary node as an alternative to the unhyphenated word, and continue to develop both branches until they become equivalent⟧ = ⟦
    repeat {
        r = get_node(small_node_size);
        link(r) = link(hold_head);
        type(r) = disc_node;
        major_tail = r;
        r_count = 0;
        while (link(major_tail) > null) {
            advance_major_tail;
        }
        i = hyphen_passed;
        hyf[i] = 0;
        ⟦969 Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|⟧
        ⟦970 Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|, appending to this list and to |major_tail| until synchronization has been achieved⟧
        ⟦972 Move pointer |s| to the end of the current list, and set |replace_count(r)| appropriately⟧
        hyphen_passed = j - 1;
        link(hold_head) = null;
    } until (!odd(hyf[j - 1]))
⟧

969. The new hyphen might combine with the previous character via ligature or kern. At this point we have l - 1 <= i < j and i < hn .

⟦969 Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|⟧ = ⟦
    minor_tail = null

    pre_break(r) = null

    hyf_node = new_character(hf, hyf_char)

    if (hyf_node != null) {
        incr(i);
        c = hu[i];
        hu[i] = hyf_char;
        free_avail(hyf_node);
    }

    while (l <= i) {
        l = reconstitute(l, i, font_bchar[hf], non_char) + 1;
        if (link(hold_head) > null) {
            if (minor_tail == null) {
                pre_break(r) = link(hold_head);
            } else {
                link(minor_tail) = link(hold_head);
            }
            minor_tail = link(hold_head);
            while (link(minor_tail) > null) {
                minor_tail = link(minor_tail);
            }
        }
    }

    if (hyf_node != null) {
        // restore the character in the hyphen position
        hu[i] = c;
        l = i;
        decr(i);
    }
⟧

970. The synchronization algorithm begins with l == i + 1 <= j .

⟦970 Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|, appending to this list and to |major_tail| until synchronization has been achieved⟧ = ⟦
    minor_tail = null

    post_break(r) = null

    c_loc = 0

    // put left boundary at beginning of new line
    if (bchar_label[hf] != non_address) {
        decr(l);
        c = hu[l];
        c_loc = l;
        hu[l] = max_hyph_char;
    }

    while (l < j) {
        repeat {
            l = reconstitute(l, hn, bchar, non_char) + 1;
            if (c_loc > 0) {
                hu[c_loc] = c;
                c_loc = 0;
            }
            if (link(hold_head) > null) {
                if (minor_tail == null) {
                    post_break(r) = link(hold_head);
                } else {
                    link(minor_tail) = link(hold_head);
                }
                minor_tail = link(hold_head);
                while (link(minor_tail) > null) {
                    minor_tail = link(minor_tail);
                }
            }
        } until (l >= j);
        while (l > j) {
            ⟦971 Append characters of |hu[j..@,]| to |major_tail|, advancing~|j|⟧
        }
    }
⟧

971.

⟦971 Append characters of |hu[j..@,]| to |major_tail|, advancing~|j|⟧ = ⟦
    {
        j = reconstitute(j, hn, bchar, non_char) + 1;
        link(major_tail) = link(hold_head);
        while (link(major_tail) > null) {
            advance_major_tail;
        }
    }
⟧

972. Ligature insertion can cause a word to grow exponentially in size. Therefore we must test the size of r_count here, even though the hyphenated text was at most max_hyphenatable_length characters long.

⟦972 Move pointer |s| to the end of the current list, and set |replace_count(r)| appropriately⟧ = ⟦
    // we have to forget the discretionary hyphen
    if (r_count > 127) {
        link(s) = link(r);
        link(r) = null;
        flush_node_list(r);
    } else {
        link(s) = r;
        replace_count(r) = r_count;
    }

    s = major_tail
⟧

973. [42] Hyphenation. When a word hc[1 .. hn] has been set up to contain a candidate for hyphenation, TEX first looks to see if it is in the user’s exception dictionary. If not, hyphens are inserted based on patterns that appear within the given word, using an algorithm due to Frank M. Liang.

Let’s consider Liang’s method first, since it is much more interesting than the exception-lookup routine. The algorithm begins by setting hyf[j] to zero for all j, and invalid characters are inserted into hc[0] and hc[hn + 1] to serve as delimiters. Then a reasonably fast method is used to see which of a given set of patterns occurs in the word hc[0 .. (hn + 1)]. Each pattern $p_1\ldots p_k$ of length k has an associated sequence of k + 1 numbers $n_0\ldots n_k$; and if the pattern occurs in hc[(j + 1) .. (j + k)], TEX will set hyf[j + i] = max(hyf[j + i], $n_i$) for 0 <= i <= k. After this has been done for each pattern that occurs, a discretionary hyphen will be inserted between hc[j] and hc[j + 1] when hyf[j] is odd, as we have already seen.

The set of patterns $p_1\ldots p_k$ and associated numbers $n_0\ldots n_k$ depends, of course, on the language whose words are being hyphenated, and on the degree of hyphenation that is desired. A method for finding appropriate p’s and n’s, from a given dictionary of words and acceptable hyphenations, is discussed in Liang’s Ph.D. thesis (Stanford University, 1983); TEX simply starts with the patterns and works from there.
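The pattern rule just described can be tried out with a brute-force Python sketch. It scans every pattern at every position instead of using the trie, and the pattern list is a small illustrative set chosen for this one word rather than a real pattern file. The names `PATTERNS`, `parse_pattern`, and `hyphenate` are hypothetical; `l_hyf` and `r_hyf` play the same role as in the program, with assumed default values.

```python
# Illustrative patterns in digit notation: a digit n_i sits between letters,
# and omitted digits count as zero.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

def parse_pattern(pat):
    """Split 'hen5at' into the letters 'henat' and numbers [0, 0, 0, 5, 0, 0]."""
    letters = "".join(ch for ch in pat if not ch.isdigit())
    nums = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            nums[pos] = int(ch)
        else:
            pos += 1
    return letters, nums

def hyphenate(word, patterns, l_hyf=2, r_hyf=3):
    hn = len(word)
    hc = "." + word + "."                  # delimiters in hc[0] and hc[hn + 1]
    hyf = [0] * (hn + 1)                   # hyf[j] sits between hc[j] and hc[j + 1]
    for pat in patterns:
        letters, nums = parse_pattern(pat)
        k = len(letters)
        for j in range(-1, hn + 2 - k):    # pattern would occupy hc[j + 1 .. j + k]
            if hc[j + 1 : j + 1 + k] != letters:
                continue
            for i, n_i in enumerate(nums):
                if 0 <= j + i <= hn:
                    hyf[j + i] = max(hyf[j + i], n_i)
    # Suppress breaks too close to either end of the word.
    for j in range(min(l_hyf, hn + 1)):
        hyf[j] = 0
    for j in range(min(r_hyf, hn + 1)):
        hyf[hn - j] = 0
    # A break is allowed after letter j exactly when hyf[j] is odd.
    return "".join(ch + ("-" if hyf[j] % 2 == 1 else "")
                   for j, ch in enumerate(word, start=1))

print(hyphenate("hyphenation", PATTERNS))  # hy-phen-ation
```

For this word the final values are hyf[2] = 3 and hyf[6] = 5, with even values elsewhere, so only those two positions permit a break.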

974. The patterns are stored in a compact table that is also efficient for retrieval, using a variant of “trie memory” [cf. The Art of Computer Programming 3 (1973), 481–505]. We can find each pattern $p_1\ldots p_k$ by letting $z_0$ be one greater than the relevant language index and then, for 1 <= i <= k, setting $z_i = \mathit{trie\_link}(z_{i-1}) + p_i$; the pattern will be identified by the number $z_k$. Since all the pattern information is packed together into a single trie_link array, it is necessary to prevent confusion between the data from inequivalent patterns, so another table is provided such that $\mathit{trie\_char}(z_i) = p_i$ for all i. There is also a table trie_op($z_k$) to identify the numbers $n_0\ldots n_k$ associated with $p_1\ldots p_k$.
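A packed trie of this kind can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the program: slot 0 stands in for $z_0$, letters a to z are coded 1 to 26, families of sibling nodes are placed by a simple first-fit search, and the arrays have a fixed small size that is only adequate for tiny examples. All function names are hypothetical; the point is only to show why the trie_char check is needed once families overlap in one array.

```python
def code(ch):
    return ord(ch) - ord("a") + 1          # 'a'..'z' -> 1..26

class Node:
    def __init__(self):
        self.kids, self.op = {}, None

def build(patterns):
    """Ordinary (unpacked) trie; patterns maps a word to its payload."""
    root = Node()
    for word, op in patterns.items():
        node = root
        for ch in word:
            node = node.kids.setdefault(ch, Node())
        node.op = op
    return root

def pack(root, size=128):
    trie_link = [0] * size
    trie_char = [None] * size
    trie_op = [None] * size
    taken = set()                          # every family needs its own base

    def place(node):
        base = 1
        while base in taken or any(
            trie_char[base + code(c)] is not None for c in node.kids
        ):
            base += 1
        taken.add(base)
        for ch, kid in node.kids.items():  # claim all slots before recursing
            trie_char[base + code(ch)] = ch
            trie_op[base + code(ch)] = kid.op
        for ch, kid in node.kids.items():
            if kid.kids:
                trie_link[base + code(ch)] = place(kid)
        return base

    trie_link[0] = place(root)             # slot 0 plays the role of z_0
    return trie_link, trie_char, trie_op

def lookup(tables, word):
    trie_link, trie_char, trie_op = tables
    z = 0
    for ch in word:
        z = trie_link[z] + code(ch)        # z_i = trie_link(z_{i-1}) + p_i
        if z >= len(trie_char) or trie_char[z] != ch:
            return None                    # the slot belongs to another family
    return trie_op[z]

tables = pack(build({"hen": "op1", "he": "op2", "nat": "op3", "tio": "op4"}))
print(lookup(tables, "hen"), lookup(tables, "he"), lookup(tables, "ten"))
```

Giving each family a distinct base is what makes the trie_char comparison sufficient: a slot can hold character c only for the one family whose base is the slot number minus the code of c.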

The theory that comparatively few different number sequences $n_0\ldots n_k$ actually occur, since most of the n’s are generally zero, seems to fail at least for the large German hyphenation patterns. Therefore the number sequences cannot any longer be encoded in such a way that trie_op($z_k$) is only one byte long. We have introduced a new constant max_trie_op for the maximum allowable hyphenation operation code value; max_trie_op might be different for TEX and INITEX and must not exceed max_halfword. An opcode will occupy a halfword if max_trie_op exceeds max_quarterword or a quarterword otherwise. If trie_op($z_k$) != min_trie_op, when $p_1\ldots p_k$ has matched the letters in hc[(l - k + 1) .. l] of language t, we perform all of the required operations for this pattern by carrying out the following little program: Set v = trie_op($z_k$). Then set v = v + op_start[t], hyf[l - hyf_distance[v]] = max(hyf[l - hyf_distance[v]], hyf_num[v]), and v = hyf_next[v]; repeat, if necessary, until v == min_trie_op.
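The “little program” can be followed step by step in a short Python sketch. The table contents below are invented for a hypothetical pattern a1b2c (k = 3, with $n_1$ = 1 and $n_2$ = 2) in a single language whose op_start offset is 0; the function name `apply_ops` is likewise an assumption. Each stored operation records the distance k - j of a nonzero $n_j$ from the end of the match.

```python
MIN_TRIE_OP = 0

# Hypothetical op tables for the pattern a1b2c; index 0 is unused because
# opcode MIN_TRIE_OP means "no operation".
hyf_distance = [None, 2, 1]            # k - j for each stored n_j
hyf_num = [None, 1, 2]                 # the value n_j
hyf_next = [None, 2, MIN_TRIE_OP]      # continuation code
op_start = {0: 0}                      # language 0 starts at offset 0

def apply_ops(hyf, l, first_op, lang):
    """Apply the chain of operations starting at first_op for a pattern
    whose last letter matched hc[l]."""
    v = first_op
    while v != MIN_TRIE_OP:
        v += op_start[lang]
        i = l - hyf_distance[v]
        hyf[i] = max(hyf[i], hyf_num[v])
        v = hyf_next[v]
    return hyf

# The pattern matched hc[3..5], so l = 5; hyf[3] already holds a larger value.
hyf = [0, 0, 0, 3, 0, 0, 0]
print(apply_ops(hyf, 5, 1, 0))         # [0, 0, 0, 3, 2, 0, 0]
```

The value 3 already stored in hyf[3] survives because each operation takes a maximum rather than overwriting.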

⟦18 Types in the outer block⟧ += ⟦
    // an index into trie 
    type trie_pointer = 0 .. ssup_trie_size;

    // a trie opcode
    type trie_opcode = 0 .. ssup_trie_opcode;
⟧

975. For more than 255 trie op codes, the three fields trie_link, trie_char, and trie_op will no longer fit into one memory word; thus using web2c we define trie as three arrays instead of an array of records. The variant will be implemented by reusing the opcode field later on with another macro.

// ``downward'' link in a trie
@define trie_link(#) => trie_trl[#]
// character matched at this trie location
@define trie_char(#) => trie_trc[#]
// program for hyphenation at this trie location
@define trie_op(#) => trie_tro[#]
⟦13 Global variables⟧ += ⟦
    // We will dynamically allocate these arrays.

    //  trie_link 
    var trie_trl: ^trie_pointer;

    //  trie_op 
    var trie_tro: ^trie_pointer;

    //  trie_char 
    var trie_trc: ^quarterword;

    // position k - j of $n_j$
    var hyf_distance: array [1 .. trie_op_size] of
      small_number;

    // value of $n_j$
    var hyf_num: array [1 .. trie_op_size] of small_number;

    // continuation code
    var hyf_next: array [1 .. trie_op_size] of trie_opcode;

    // offset for current language
    var op_start: array [0 .. biggest_lang] of
      0 .. trie_op_size;
⟧

976.

⟦954 Local variables for hyphenation⟧ += ⟦
    // an index into trie 
    var z: trie_pointer;

    // an index into hyf_distance , etc.
    var v: integer;
⟧

977. Assuming that these auxiliary tables have been set up properly, the hyphenation algorithm is quite short. In the following code we set hc[hn + 2] to the impossible value 256, in order to guarantee that hc[hn + 3] will never be fetched.

⟦977 Find hyphen locations for the word in |hc|, or |return|⟧ = ⟦
    for (j in 0 to hn) {
        hyf[j] = 0;
    }

    ⟦984 Look for the word |hc[1..hn]| in the exception table, and |goto found| (with |hyf| containing the hyphens) if an entry is found⟧

    if (trie_char(cur_lang + 1) != qi(cur_lang)) {
        // no patterns for cur_lang 
        return;
    }

    hc[0] = 0

    hc[hn + 1] = 0

    hc[hn + 2] = max_hyph_char // insert delimiters

    for (j in 0 to hn - r_hyf + 1) {
        z = trie_link(cur_lang + 1) + hc[j];
        l = j;
        while (hc[l] == qo(trie_char(z))) {
            if (trie_op(z) != min_trie_op) {
                ⟦978 Store \(m)maximum values in the |hyf| table⟧
            }
            incr(l);
            z = trie_link(z) + hc[l];
        }
    }

    found:

    for (j in 0 to l_hyf - 1) {
        hyf[j] = 0;
    }

    for (j in 0 to r_hyf - 1) {
        hyf[hn - j] = 0;
    }
⟧

978.

⟦978 Store \(m)maximum values in the |hyf| table⟧ = ⟦
    {
        v = trie_op(z);
        repeat {
            v = v + op_start[cur_lang];
            i = l - hyf_distance[v];
            if (hyf_num[v] > hyf[i]) {
                hyf[i] = hyf_num[v];
            }
            v = hyf_next[v];
        } until (v == min_trie_op);
    }
⟧

979. The exception table that is built by TEX’s \hyphenation primitive is organized as an ordered hash table [cf. Amble and Knuth, The Computer Journal 17 (1974), 135–142] using linear probing. If $\alpha$ and $\beta$ are words, we will say that $\alpha<\beta$ if $|\alpha|<|\beta|$ or if $|\alpha|=|\beta|$ and $\alpha$ is lexicographically smaller than $\beta$. (The notation $|\alpha|$ stands for the length of $\alpha$.) The idea of ordered hashing is to arrange the table so that a given word $\alpha$ can be sought by computing a hash address $h=h(\alpha)$ and then looking in table positions h, h - 1, …, until encountering the first word $\le\alpha$. If this word is different from $\alpha$, we can conclude that $\alpha$ is not in the table. This is a clever scheme which saves the need for a hash link array. However, it is difficult to increase the size of the hyphen exception arrays. To make this easier, the ordered hash has been replaced by a simple hash, using an additional array hyph_link. The value 0 in hyph_link[k] means that there are no more entries corresponding to the specific hash chain. When hyph_link[k] > 0, the next entry in the hash chain is hyph_link[k] - 1. This value is used because the arrays start at 0.

The words in the table point to lists in mem that specify hyphen positions in their info fields. The list for $c_1\ldots c_n$ contains the number k if the word $c_1\ldots c_n$ has a discretionary hyphen between $c_k$ and $c_{k+1}$.
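The hyph_link convention (0 ends a chain, a positive value k means the next entry is in slot k - 1) can be illustrated with a small Python sketch. The lookup follows the description above, with the language code appended to the word and the hash computed as h = (h + h + c) mod hyph_prime. The insertion routine, the tiny table sizes, and the class name `ExceptionTable` are simplified assumptions and do not reproduce the program's own insertion code; words are assumed to be nonempty and inserted only once.

```python
class ExceptionTable:
    def __init__(self, hyph_prime=3, hyph_size=10):
        self.hyph_prime = hyph_prime
        self.hyph_word = [None] * (hyph_size + 1)
        self.hyph_list = [None] * (hyph_size + 1)
        self.hyph_link = [0] * (hyph_size + 1)
        self.hyph_next = hyph_prime        # first overflow slot (simplified)

    def _key(self, word, lang):
        return [ord(c) for c in word] + [lang]   # language code appended

    def _hash(self, key):
        h = key[0]
        for c in key[1:]:
            h = (h + h + c) % self.hyph_prime
        return h

    def insert(self, word, lang, positions):
        """Simplified stand-in for the real insertion routine."""
        key = self._key(word, lang)
        h = self._hash(key)
        if self.hyph_word[h] is not None:
            while self.hyph_link[h] > 0:          # walk to the end of the chain
                h = self.hyph_link[h] - 1
            self.hyph_link[h] = self.hyph_next + 1  # store slot + 1; 0 means "end"
            h = self.hyph_next
            self.hyph_next += 1
        self.hyph_word[h] = key
        self.hyph_list[h] = positions

    def lookup(self, word, lang):
        key = self._key(word, lang)
        h = self._hash(key)
        while True:
            if self.hyph_word[h] is None:
                return None
            if self.hyph_word[h] == key:
                return self.hyph_list[h]
            h = self.hyph_link[h]
            if h == 0:
                return None
            h -= 1

table = ExceptionTable()
for word, positions in [("table", [2]), ("project", [3]), ("record", [3]),
                        ("associate", [2, 4]), ("present", [4])]:
    table.insert(word, 0, positions)
print(table.lookup("associate", 0), table.lookup("table", 1))
```

With five words and only three hash addresses at least one chain must form, so the lookup exercises the hyph_link traversal; the same word under a different language code is not found because the appended code changes the key.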

⟦18 Types in the outer block⟧ += ⟦
    // index into hyphen exceptions hash table; enlarging 
    // this requires changing (un)dump code
    type hyph_pointer = 0 .. ssup_hyph_size;
⟧

980.

⟦13 Global variables⟧ += ⟦
    // exception words
    var hyph_word: ^str_number;

    // lists of hyphen positions
    var hyph_list: ^pointer;

    // link array for hyphen exceptions hash table
    var hyph_link: ^hyph_pointer;

    // the number of words in the exception dictionary
    var hyph_count: integer;

    // next free slot in hyphen exceptions hash table
    var hyph_next: integer;
⟧

981.

⟦19 Local variables for initialization⟧ += ⟦
    // runs through the exception dictionary
    var z: hyph_pointer;
⟧

982.

⟦23 Set initial values of key variables⟧ += ⟦
    for (z in 0 to hyph_size) {
        hyph_word[z] = 0;
        hyph_list[z] = null;
        hyph_link[z] = 0;
    }

    hyph_count = 0

    hyph_next = hyph_prime + 1

    if (hyph_next > hyph_size) {
        hyph_next = hyph_prime;
    }
⟧

983. The algorithm for exception lookup is quite simple, as soon as we have a few more local variables to work with.

⟦954 Local variables for hyphenation⟧ += ⟦
    // an index into hyph_word and hyph_list 
    var h: hyph_pointer;

    // an index into str_start 
    var k: str_number;

    // an index into str_pool 
    var u: pool_pointer;
⟧

984. First we compute the hash code h, then we search until we either find the word or we don’t. Words from different languages are kept separate by appending the language code to the string.

⟦984 Look for the word |hc[1..hn]| in the exception table, and |goto found| (with |hyf| containing the hyphens) if an entry is found⟧ = ⟦
    h = hc[1]

    incr(hn)

    hc[hn] = cur_lang

    for (j in 2 to hn) {
        h = (h + h + hc[j]) % hyph_prime;
    }

    loop {
        ⟦985 If the string |hyph_word[h]| is less than \(hc)|hc[1..hn]|, |goto not_found|; but if the two strings are equal, set |hyf| to the hyphen positions and |goto found|⟧
        h = hyph_link[h];
        if (h == 0) {
            goto not_found;
        }
        decr(h);
    }

    not_found:

    decr(hn)
⟧

985.

⟦985 If the string |hyph_word[h]| is less than \(hc)|hc[1..hn]|, |goto not_found|; but if the two strings are equal, set |hyf| to the hyphen positions and |goto found|⟧ = ⟦
    // This is now a simple hash list, not an ordered one, 
    // so the module title is no longer descriptive.

    k = hyph_word[h]

    if (k == 0) {
        goto not_found;
    }

    if (length(k) == hn) {
        j = 1;
        u = str_start_macro(k);
        repeat {
            if (so(str_pool[u]) != hc[j]) {
                goto done;
            }
            incr(j);
            incr(u);
        } until (j > hn);
        ⟦986 Insert hyphens as specified in |hyph_list[h]|⟧
        decr(hn);
        goto found;
    }

    done:
⟧

986.

⟦986 Insert hyphens as specified in |hyph_list[h]|⟧ = ⟦
    s = hyph_list[h]

    while (s != null) {
        hyf[info(s)] = 1;
        s = link(s);
    }
⟧

987.

⟦987 Search |hyph_list| for pointers to |p|⟧ = ⟦
    for (q in 0 to hyph_size) {
        if (hyph_list[q] == p) {
            print_nl(strpool!("HYPH("));
            print_int(q);
            print_char(ord!(")"));
        }
    }
⟧

988. We have now completed the hyphenation routine, so the line_break procedure is finished at last. Since the hyphenation exception table is fresh in our minds, it’s a good time to deal with the routine that adds new entries to it.

When TEX has scanned ‘\hyphenation’, it calls on a procedure named new_hyph_exceptions to do the right thing.

@define set_cur_lang =>
    if (language <= 0) {
        cur_lang = 0;
    } else if (language > biggest_lang) {
        cur_lang = 0;
    } else {
        cur_lang = language;
    }
// enters new exceptions
function new_hyph_exceptions() {
    label reswitch, exit, found, not_found, not_found1;
    var
      n: 0 .. (hyphenatable_length_limit + 1), // length of 
      // current word; not always a small_number 
      j: 0 .. (hyphenatable_length_limit + 1), // an index 
      // into hc 
      h: hyph_pointer, // an index into hyph_word and 
      // hyph_list 
      k: str_number, // an index into str_start 
      p: pointer, // head of a list of hyphen positions
      q: pointer, // used when creating a new node for list 
      // p 
      s: str_number, // strings being compared or stored
      u, v: pool_pointer; // indices into str_pool 
    
    // a left brace must follow \.{\\hyphenation}
    scan_left_brace;
    set_cur_lang;
    init!{
        if (trie_not_ready) {
            hyph_index = 0;
            goto not_found1;
        }
    }
    set_hyph_index;
  not_found1:
    ⟦989 Enter as many hyphenation exceptions as are listed, until coming to a right brace; then |return|⟧
  exit:
}

989.

⟦989 Enter as many hyphenation exceptions as are listed, until coming to a right brace; then |return|⟧ = ⟦
    n = 0

    p = null

    loop {
        get_x_token;
      reswitch:
        case cur_cmd {
          letter, other_char, char_given:
            ⟦991 Append a new letter or hyphen⟧
          char_num:
            scan_char_num;
            cur_chr = cur_val;
            cur_cmd = char_given;
            goto reswitch;
          spacer, right_brace:
            if (n > 1) {
                ⟦993 Enter a hyphenation exception⟧
            }
            if (cur_cmd == right_brace) {
                return;
            }
            n = 0;
            p = null;
          othercases:
            ⟦990 Give improper \.{\\hyphenation} error⟧
        }
    }
⟧

990.

⟦990 Give improper \.{\\hyphenation} error⟧ = ⟦
    {
        print_err(strpool!("Improper "));
        print_esc(strpool!("hyphenation"));
        print(strpool!(" will be flushed"));
        help2(
          strpool!("Hyphenation exceptions must contain only letters"),
        )(
          strpool!("and hyphens. But continue; I'll forgive and forget."),
        );
        error;
    }
⟧

991.

⟦991 Append a new letter or hyphen⟧ = ⟦
    if (cur_chr == ord!("-")) {
        ⟦992 Append the value |n| to list |p|⟧
    } else {
        set_lc_code(cur_chr);
        if (hc[0] == 0) {
            print_err(strpool!("Not a letter"));
            help2(
              strpool!("Letters in \\hyphenation words must have \\lccode>0."),
            )(
              strpool!("Proceed; I'll ignore the character I just read."),
            );
            error;
        } else if (n < max_hyphenatable_length) {
            incr(n);
            if (hc[0] < 0x10000) {
                hc[n] = hc[0];
            } else {
                hc[n] = (hc[0] - 0x10000) div 0x400 + 0xd800;
                incr(n);
                hc[n] = hc[0] % 0x400 + 0xdc00;
            }
        }
    }
⟧
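The arithmetic in the last branch above splits a character code of 0x10000 or more into two 16-bit units. A minimal Python sketch of that same arithmetic follows; the function name `to_utf16_units` is an illustrative assumption and does not appear in the program.

```python
def to_utf16_units(code_point):
    """Split a character code into one or two 16-bit units,
    mirroring the arithmetic of the module above."""
    if code_point < 0x10000:
        return [code_point]
    # Same formulas as in the module: `div` becomes //, `%` stays %.
    high = (code_point - 0x10000) // 0x400 + 0xD800
    low = code_point % 0x400 + 0xDC00
    return [high, low]


units_bmp = to_utf16_units(ord("a"))   # stays a single unit
units_astral = to_utf16_units(0x1D400)  # becomes a pair of units
```

Because 0x10000 is a multiple of 0x400, taking `code_point % 0x400` gives the same low ten bits as subtracting 0x10000 first.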

992.

⟦992 Append the value |n| to list |p|⟧ = ⟦
    {
        if (n < max_hyphenatable_length) {
            q = get_avail;
            link(q) = p;
            info(q) = n;
            p = q;
        }
    }
⟧

993.

⟦993 Enter a hyphenation exception⟧ = ⟦
    {
        incr(n);
        hc[n] = cur_lang;
        str_room(n);
        h = 0;
        for (j in 1 to n) {
            h = (h + h + hc[j]) % hyph_prime;
            append_char(hc[j]);
        }
        s = make_string;
        ⟦994 Insert the \(p)pair |(s,p)| into the exception table⟧
    }
⟧

994.

⟦994 Insert the \(p)pair |(s,p)| into the exception table⟧ = ⟦
    if (hyph_next <= hyph_prime) {
        while (
            (hyph_next > 0)
            && (hyph_word[hyph_next - 1] > 0)
        ) {
            decr(hyph_next);
        }
    }

    if ((hyph_count == hyph_size) || (hyph_next == 0)) {
        overflow(
          strpool!("exception dictionary"),
          hyph_size,
        );
    }

    incr(hyph_count)

    while (hyph_word[h] != 0) {
        ⟦995 If the string |hyph_word[h]| is less than \(or)or equal to |s|, interchange |(hyph_word[h],hyph_list[h])| with |(s,p)|⟧
        if (hyph_link[h] == 0) {
            hyph_link[h] = hyph_next;
            if (hyph_next >= hyph_size) {
                hyph_next = hyph_prime;
            }
            if (hyph_next > hyph_prime) {
                incr(hyph_next);
            }
        }
        h = hyph_link[h] - 1;
    }

    found:

    hyph_word[h] = s

    hyph_list[h] = p
⟧

995.

⟦995 If the string |hyph_word[h]| is less than \(or)or equal to |s|, interchange |(hyph_word[h],hyph_list[h])| with |(s,p)|⟧ = ⟦
    // This is now a simple hash list, not an ordered one, 
    // so the module title is no longer descriptive.

    k = hyph_word[h]

    if (length(k) != length(s)) {
        goto not_found;
    }

    u = str_start_macro(k)

    v = str_start_macro(s)

    repeat {
        if (str_pool[u] != str_pool[v]) {
            goto not_found;
        }
        incr(u);
        incr(v);
    } until (u == str_start_macro(k + 1))

    // repeat hyphenation exception; flushing old data
    flush_string

    s = hyph_word[h] // avoid slow_make_string !

    // We could also flush_list(hyph_list[h]), but it
    // interferes with \.{trip.log}.
    decr(hyph_count)

    goto found

    not_found:
⟧
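The last three modules compute a hash of the word (with the language code appended), walk a chain of table slots, and replace the hyphen list when the same word is entered again. The Python sketch below illustrates only that behavior. It is a simplification under stated assumptions: Python lists stand in for the hyph_link overflow chain, and the class name `ExceptionTable` and the default prime 607 are hypothetical.

```python
HYPH_PRIME = 607  # assumed table size, for illustration only


def hyph_hash(codes, prime=HYPH_PRIME):
    """Same recurrence as the module: h = (h + h + hc[j]) % hyph_prime."""
    h = 0
    for c in codes:
        h = (h + h + c) % prime
    return h


class ExceptionTable:
    """Simplified exception dictionary; chains are plain Python lists."""

    def __init__(self, prime=HYPH_PRIME):
        self.prime = prime
        self.buckets = {}  # hash value -> list of (word, positions)
        self.count = 0

    def insert(self, codes, positions):
        key = tuple(codes)
        chain = self.buckets.setdefault(hyph_hash(key, self.prime), [])
        for i, (word, _) in enumerate(chain):
            if word == key:
                # Repeated exception: the new positions replace the old
                # ones and the count stays unchanged.
                chain[i] = (key, list(positions))
                return
        chain.append((key, list(positions)))
        self.count += 1

    def lookup(self, codes):
        key = tuple(codes)
        for word, positions in self.buckets.get(hyph_hash(key, self.prime), []):
            if word == key:
                return positions
        return None


table = ExceptionTable()
word = [ord(ch) for ch in "table"] + [0]  # letters plus language code 0
table.insert(word, [2])                   # "ta-ble": hyphen after 2 letters
```

Entering the same word a second time with different positions leaves `table.count` at 1, which corresponds to the decr(hyph_count) that undoes the earlier incr(hyph_count).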

996. [43] Initializing the hyphenation tables. The trie for TEX’s hyphenation algorithm is built from a sequence of patterns following a \patterns specification. Such a specification is allowed only in INITEX, since the extra memory for auxiliary tables and for the initialization program itself would only clutter up the production version of TEX with a lot of deadwood.

The first step is to build a trie that is linked, instead of packed into sequential storage, so that insertions are readily made. After all patterns have been processed, INITEX compresses the linked trie by identifying common subtries. Finally the trie is packed into the efficient sequential form that the hyphenation algorithm actually uses.

⟦874 Declare subprocedures for |line_break|⟧ += ⟦
    init!{
        ⟦998 Declare procedures for preprocessing hyphenation patterns⟧
    }
⟧

997. Before we discuss trie building in detail, let’s consider the simpler problem of creating the hyf_distance, hyf_num, and hyf_next arrays.

Suppose, for example, that TEX reads the pattern ‘ab2cde1’. This is a pattern of length 5, with $n_0\ldots n_5=002001$ in the notation above. We want the corresponding trie_op code v to have hyf_distance[v] == 3, hyf_num[v] == 2, and hyf_next[v] == v^\prime, where the auxiliary trie_op code v^\prime has hyf_distance[v^\prime] == 0, hyf_num[v^\prime] == 1, and hyf_next[v^\prime] == min_trie_op.

TEX computes an appropriate value v with the new_trie_op subroutine below, by setting

v^\prime = new_trie_op(0, 1, min_trie_op), v = new_trie_op(3, 2, v^\prime).
This subroutine looks up its three parameters in a special hash table, assigning a new value only if these three have not appeared before for the current language.

The hash table is called trie_op_hash, and the number of entries it contains is trie_op_ptr.

⟦13 Global variables⟧ += ⟦
    init!{
        // trie op codes for quadruples
        var trie_op_hash: array [
          neg_trie_op_size .. trie_op_size,
        ] of 0 .. trie_op_size;
        // largest opcode used so far for this language
        var trie_used: array [0 .. biggest_lang] of
          trie_opcode;
        // language part of a hashed quadruple
        var trie_op_lang: array [1 .. trie_op_size] of
          0 .. biggest_lang;
        // opcode corresponding to a hashed quadruple
        var trie_op_val: array [1 .. trie_op_size] of
          trie_opcode;
        // number of stored ops so far
        var trie_op_ptr: 0 .. trie_op_size;
    }

    // largest opcode used for any language
    var max_op_used: trie_opcode;

    // flag used while dumping or undumping
    var small_op: boolean;
⟧

998. It’s tempting to remove the overflow stops in the following procedure; new_trie_op could return min_trie_op (thereby simply ignoring part of a hyphenation pattern) instead of aborting the job. However, that would lead to different hyphenation results on different installations of TEX using the same patterns. The overflow stops are necessary for portability of patterns.

⟦998 Declare procedures for preprocessing hyphenation patterns⟧ = ⟦
    function new_trie_op(
      d, n: small_number,
      v: trie_opcode,
    ): trie_opcode {
        label exit;
        var
          h: neg_trie_op_size .. trie_op_size, // trial hash 
          // location
          u: trie_opcode, // trial op code
          l: 0 .. trie_op_size; // pointer to stored data
        
        h = 
            abs(n + 313 * d + 361 * v + 1009 * cur_lang)
            % (trie_op_size - neg_trie_op_size)
            + neg_trie_op_size
        ;
        loop {
            l = trie_op_hash[h];
            // empty position found for a new op
            if (l == 0) {
                if (trie_op_ptr == trie_op_size) {
                    overflow(
                      strpool!("pattern memory ops"),
                      trie_op_size,
                    );
                }
                u = trie_used[cur_lang];
                if (u == max_trie_op) {
                    overflow(
                      strpool!("pattern memory ops per language"),
                      max_trie_op - min_trie_op,
                    );
                }
                incr(trie_op_ptr);
                incr(u);
                trie_used[cur_lang] = u;
                if (u > max_op_used) {
                    max_op_used = u;
                }
                hyf_distance[trie_op_ptr] = d;
                hyf_num[trie_op_ptr] = n;
                hyf_next[trie_op_ptr] = v;
                trie_op_lang[trie_op_ptr] = cur_lang;
                trie_op_hash[h] = trie_op_ptr;
                trie_op_val[trie_op_ptr] = u;
                new_trie_op = u;
                return;
            }
            if (
                (hyf_distance[l] == d)
                && (hyf_num[l] == n)
                && (hyf_next[l] == v)
                && (trie_op_lang[l] == cur_lang)
            ) {
                new_trie_op = trie_op_val[l];
                return;
            }
            if (h > -trie_op_size) {
                decr(h);
            } else {
                h = trie_op_size;
            }
        }
      exit:
    }
⟧
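The essential behavior of new_trie_op is interning: a quadruple (d, n, v, language) gets a fresh opcode the first time it is seen and the same opcode afterwards, with opcodes numbered separately for each language. The Python sketch below illustrates that behavior only. It replaces the open-addressing hash table with a dictionary, and the class name `TrieOps` and the limit of 255 ops per language are assumptions made for illustration.

```python
MIN_TRIE_OP = 0  # assumed value of min_trie_op for this sketch


class TrieOps:
    """Interns (d, n, v, lang) quadruples, one opcode sequence per language."""

    def __init__(self, max_ops_per_lang=255):
        self.table = {}  # (d, n, v, lang) -> opcode
        self.used = {}   # lang -> largest opcode issued so far
        self.max_ops_per_lang = max_ops_per_lang

    def new_trie_op(self, d, n, v, lang=0):
        key = (d, n, v, lang)
        if key in self.table:
            return self.table[key]  # seen before: reuse the opcode
        u = self.used.get(lang, MIN_TRIE_OP)
        if u == MIN_TRIE_OP + self.max_ops_per_lang:
            # Stop rather than silently drop part of a pattern.
            raise OverflowError("pattern memory ops per language")
        u += 1
        self.used[lang] = u
        self.table[key] = u
        return u


ops = TrieOps()
v_prime = ops.new_trie_op(0, 1, MIN_TRIE_OP)  # auxiliary op for 'ab2cde1'
v = ops.new_trie_op(3, 2, v_prime)            # op stored with the pattern
```

Raising an error on overflow, instead of returning the minimum opcode, keeps the sketch in line with the portability argument given before the procedure.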

999. After new_trie_op has compressed the necessary opcode information, plenty of information is available to unscramble the data into the final form needed by our hyphenation algorithm.

⟦999 Sort \(t)the hyphenation op tables into proper order⟧ = ⟦
    op_start[0] = -min_trie_op

    for (j in 1 to biggest_lang) {
        op_start[j] = op_start[j - 1] + qo(trie_used[j - 1]);
    }

    for (j in 1 to trie_op_ptr) {
        // destination
        trie_op_hash[j] = 
            op_start[trie_op_lang[j]]
            + trie_op_val[j]
        ;
    }

    for (j in 1 to trie_op_ptr) {
        while (trie_op_hash[j] > j) {
            k = trie_op_hash[j];
            t = hyf_distance[k];
            hyf_distance[k] = hyf_distance[j];
            hyf_distance[j] = t;
            t = hyf_num[k];
            hyf_num[k] = hyf_num[j];
            hyf_num[j] = t;
            t = hyf_next[k];
            hyf_next[k] = hyf_next[j];
            hyf_next[j] = t;
            trie_op_hash[j] = trie_op_hash[k];
            trie_op_hash[k] = k;
        }
    }
⟧
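The final loop above rearranges the op tables in place by repeatedly swapping an entry into its destination until every entry is at home. A small Python sketch of the same cycle-following idea is given here; the function name `permute_in_place` and the 1-based layout with an unused slot 0 are assumptions chosen to resemble the arrays in the module.

```python
def permute_in_place(dest, *columns):
    """Move item j to position dest[j] in every column, in place.

    Index 0 is unused; dest[1:] must be a permutation of 1..n.
    On return, dest[j] == j for all j.
    """
    for j in range(1, len(dest)):
        while dest[j] > j:
            k = dest[j]
            for col in columns:
                col[j], col[k] = col[k], col[j]  # item j reaches its home k
            dest[j] = dest[k]  # the item now at j inherits k's destination
            dest[k] = k        # position k is settled


dest = [0, 3, 1, 2]
data = [None, "a", "b", "c"]
permute_in_place(dest, data)
```

Each swap settles one position for good, so the total number of swaps is less than the number of entries.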

1000. Before we forget how to initialize the data structures that have been mentioned so far, let’s write down the code that gets them started.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    for (k in -trie_op_size to trie_op_size) {
        trie_op_hash[k] = 0;
    }

    for (k in 0 to biggest_lang) {
        trie_used[k] = min_trie_op;
    }

    max_op_used = min_trie_op

    trie_op_ptr = 0
⟧

1001. The linked trie that is used to preprocess hyphenation patterns appears in several global arrays. Each node represents an instruction of the form “if you see character c , then perform operation o , move to the next character, and go to node l ; otherwise go to node r .” The four quantities c , o , l , and r are stored in four arrays trie_c , trie_o , trie_l , and trie_r . The root of the trie is trie_l[0] , and the number of nodes is trie_ptr . Null trie pointers are represented by zero. To initialize the trie, we simply set trie_l[0] and trie_ptr to zero. We also set trie_c[0] to some arbitrary value, since the algorithm may access it.

The algorithms maintain the condition

trie_c[trie_r[z]] > trie_c[z] whenever z != 0 and trie_r[z] != 0;
in other words, sibling nodes are ordered by their c fields.

@define trie_root => trie_l[0] // root of the linked trie
⟦13 Global variables⟧ += ⟦
    init!{
        // characters to match
        var trie_c: ^packed_ASCII_code;
        // operations to perform
        var trie_o: ^trie_opcode;
        // left subtrie links
        var trie_l: ^trie_pointer;
        // right subtrie links
        var trie_r: ^trie_pointer;
        // the number of nodes in the trie
        var trie_ptr: trie_pointer;
        // used to identify equivalent subtries
        var trie_hash: ^trie_pointer;
    }
⟧

1002. Let us suppose that a linked trie has already been constructed. Experience shows that we can often reduce its size by recognizing common subtries; therefore another hash table is introduced for this purpose, somewhat similar to trie_op_hash . The new hash table will be initialized to zero.

The function trie_node(p) returns p if p is distinct from other nodes that it has seen, otherwise it returns the number of the first equivalent node that it has seen.

Notice that we might make subtries equivalent even if they correspond to patterns for different languages, in which the trie ops might mean quite different things. That’s perfectly all right.

⟦998 Declare procedures for preprocessing hyphenation patterns⟧ += ⟦
    // converts to a canonical form
    function trie_node(p: trie_pointer): trie_pointer {
        label exit;
        var
          h: trie_pointer, // trial hash location
          q: trie_pointer; // trial trie node
        
        h = 
            abs(
              
                  trie_c[p]
                  + 1009
                  * trie_o[p]
                  + 2718 * trie_l[p] + 3142 * trie_r[p]
              ,
            )
            % trie_size
        ;
        loop {
            q = trie_hash[h];
            if (q == 0) {
                trie_hash[h] = p;
                trie_node = p;
                return;
            }
            if (
                (trie_c[q] == trie_c[p])
                && (trie_o[q] == trie_o[p])
                && (trie_l[q] == trie_l[p])
                && (trie_r[q] == trie_r[p])
            ) {
                trie_node = q;
                return;
            }
            if (h > 0) {
                decr(h);
            } else {
                h = trie_size;
            }
        }
      exit:
    }
⟧

1003. A neat recursive procedure is now able to compress a trie by traversing it and applying trie_node to its nodes in “bottom up” fashion. We will compress the entire trie by clearing trie_hash to zero and then saying ‘trie_root = compress_trie(trie_root) ’.

⟦998 Declare procedures for preprocessing hyphenation patterns⟧ += ⟦
    function compress_trie(p: trie_pointer): trie_pointer {
        if (p == 0) {
            compress_trie = 0;
        } else {
            trie_l[p] = compress_trie(trie_l[p]);
            trie_r[p] = compress_trie(trie_r[p]);
            compress_trie = trie_node(p);
        }
    }
⟧
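The combination of trie_node and compress_trie amounts to bottom-up canonicalization: a node is replaced by the first node seen with the same (c, o, l, r) fields. The Python sketch below shows that idea under simplifying assumptions: a dictionary replaces the trie_hash array, and the four arrays are passed as arguments rather than being global.

```python
def compress_trie(p, trie_c, trie_o, trie_l, trie_r, canon):
    """Return the canonical node equivalent to p, compressing bottom up.

    canon maps (c, o, l, r) to the first node seen with those fields.
    """
    if p == 0:
        return 0
    trie_l[p] = compress_trie(trie_l[p], trie_c, trie_o, trie_l, trie_r, canon)
    trie_r[p] = compress_trie(trie_r[p], trie_c, trie_o, trie_l, trie_r, canon)
    key = (trie_c[p], trie_o[p], trie_l[p], trie_r[p])
    return canon.setdefault(key, p)


# Nodes 3 and 4 are identical leaves reached from nodes 1 and 2.
trie_c = [0, ord("a"), ord("b"), ord("x"), ord("x")]
trie_o = [0, 0, 0, 5, 5]
trie_l = [1, 3, 4, 0, 0]
trie_r = [0, 2, 0, 0, 0]
root = compress_trie(trie_l[0], trie_c, trie_o, trie_l, trie_r, {})
```

After the call, node 2 points at node 3 instead of node 4, so the two equal leaves are shared.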

1004. The compressed trie will be packed into the trie array using a “top-down first-fit” procedure. This is a little tricky, so the reader should pay close attention: The trie_hash array is cleared to zero again and renamed trie_ref for this phase of the operation; later on, trie_ref[p] will be nonzero only if the linked trie node p is the smallest character in a family and if the characters c of that family have been allocated to locations trie_ref[p] + c in the trie array. Locations of trie that are in use will have trie_link == 0 , while the unused holes in trie will be doubly linked with trie_link pointing to the next larger vacant location and trie_back pointing to the next smaller one. This double linking will have been carried out only as far as trie_max , where trie_max is the largest index of trie that will be needed. To save time at the low end of the trie, we maintain array entries trie_min[c] pointing to the smallest hole that is greater than c . Another array trie_taken tells whether or not a given location is equal to trie_ref[p] for some p ; this array is used to ensure that distinct nodes in the compressed trie will have distinct trie_ref entries.

// where linked trie families go into trie 
@define trie_ref => trie_hash
// use the opcode field now for backward links
@define trie_back(#) => trie_tro[#]
⟦13 Global variables⟧ += ⟦
    init!{
        // does a family start here?
        var trie_taken: ^boolean;
        // the first possible slot for each character
        var trie_min: array [ASCII_code] of trie_pointer;
        // largest location used in trie 
        var trie_max: trie_pointer;
        // is the trie still in linked form?
        var trie_not_ready: boolean;
    }
⟧

1005. Each time \patterns appears, it contributes further patterns to the future trie, which will be built only when hyphenation is attempted or when a format file is dumped. The boolean variable trie_not_ready will change to false when the trie is compressed; this will disable further patterns.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    trie_not_ready = true
⟧

1006. Here is how the trie-compression data structures are initialized. If storage is tight, it would be possible to overlap trie_op_hash , trie_op_lang , and trie_op_val with trie , trie_hash , and trie_taken , because we finish with the former just before we need the latter.

⟦1006 Get ready to compress the trie⟧ = ⟦
    ⟦999 Sort \(t)the hyphenation op tables into proper order⟧

    for (p in 0 to trie_size) {
        trie_hash[p] = 0;
    }

    hyph_root = compress_trie(hyph_root)

    // identify equivalent subtries
    trie_root = compress_trie(trie_root)

    for (p in 0 to trie_ptr) {
        trie_ref[p] = 0;
    }

    for (p in 0 to biggest_char) {
        trie_min[p] = p + 1;
    }

    trie_link(0) = 1

    trie_max = 0
⟧

1007. The first_fit procedure finds the smallest hole z in trie such that a trie family starting at a given node p will fit into vacant positions starting at z. If c == trie_c[p], this means that location z - c must not already be taken by some other family, and that z - c + c^\prime must be vacant for all characters c^\prime in the family. The procedure sets trie_ref[p] to z - c when the first fit has been found.

⟦998 Declare procedures for preprocessing hyphenation patterns⟧ += ⟦
    // packs a family into trie 
    function first_fit(p: trie_pointer) {
        label not_found, found;
        var
          h: trie_pointer, // candidate for trie_ref[p]
          z: trie_pointer, // runs through holes
          q: trie_pointer, // runs through the family 
          // starting at p 
          c: ASCII_code, // smallest character in the family
          l, r: trie_pointer, // left and right neighbors
          ll: 1 .. too_big_char; // upper limit of trie_min 
          // updating
        
        c = so(trie_c[p]);
        // get the first conceivably good hole
        z = trie_min[c];
        loop {
            h = z - c;
            ⟦1008 Ensure that |trie_max>=h+max_hyph_char|⟧
            if (trie_taken[h]) {
                goto not_found;
            }
            ⟦1009 If all characters of the family fit relative to |h|, then |goto found|,\30\ otherwise |goto not_found|⟧
          not_found:
            // move to the next hole
            z = trie_link(z);
        }
      found:
        ⟦1010 Pack the family into |trie| relative to |h|⟧
    }
⟧

1008. By making sure that trie_max is at least h + max_hyph_char , we can be sure that trie_max > z , since h == z - c . It follows that location trie_max will never be occupied in trie , and we will have trie_max >= trie_link(z) .

⟦1008 Ensure that |trie_max>=h+max_hyph_char|⟧ = ⟦
    if (trie_max < h + max_hyph_char) {
        if (trie_size <= h + max_hyph_char) {
            overflow(strpool!("pattern memory"), trie_size);
        }
        repeat {
            incr(trie_max);
            trie_taken[trie_max] = false;
            trie_link(trie_max) = trie_max + 1;
            trie_back(trie_max) = trie_max - 1;
        } until (trie_max == h + max_hyph_char);
    }
⟧

1009.

⟦1009 If all characters of the family fit relative to |h|, then |goto found|,\30\ otherwise |goto not_found|⟧ = ⟦
    q = trie_r[p]

    while (q > 0) {
        if (trie_link(h + so(trie_c[q])) == 0) {
            goto not_found;
        }
        q = trie_r[q];
    }

    goto found
⟧

1010.

⟦1010 Pack the family into |trie| relative to |h|⟧ = ⟦
    trie_taken[h] = true

    trie_ref[p] = h

    q = p

    repeat {
        z = h + so(trie_c[q]);
        l = trie_back(z);
        r = trie_link(z);
        trie_back(r) = l;
        trie_link(l) = r;
        trie_link(z) = 0;
        if (l < max_hyph_char) {
            if (z < max_hyph_char) {
                ll = z;
            } else {
                ll = max_hyph_char;
            }
            repeat {
                trie_min[l] = r;
                incr(l);
            } until (l == ll);
        }
        q = trie_r[q];
    } until (q == 0)
⟧
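The first-fit rule of the preceding modules can be illustrated without the doubly linked list of holes. The Python sketch below is a deliberately naive version that scans candidate base addresses one by one. The sets `occupied` and `taken` are assumptions standing in for the trie_link == 0 test and the trie_taken array, and nothing here grows or bounds the table.

```python
def first_fit(family_chars, occupied, taken):
    """Find the smallest base h >= 1 for a family of character codes.

    h must not already be the base of another family, and every slot
    h + c must be free. The chosen slots are then marked as used.
    """
    h = 1
    while h in taken or any(h + c in occupied for c in family_chars):
        h += 1
    taken.add(h)
    occupied.update(h + c for c in family_chars)
    return h


occupied, taken = set(), set()
base_first = first_fit([1, 3], occupied, taken)   # root family
base_second = first_fit([1, 2], occupied, taken)  # must avoid slots 2 and 4
```

The program avoids the repeated rescanning seen here by linking the holes together and by remembering, in trie_min, where the search for each character can start.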

1011. To pack the entire linked trie, we use the following recursive procedure.

⟦998 Declare procedures for preprocessing hyphenation patterns⟧ += ⟦
    // pack subtries of a family
    function trie_pack(p: trie_pointer) {
        var
          q: trie_pointer; // a local variable that need not 
          // be saved on recursive calls
        
        repeat {
            q = trie_l[p];
            if ((q > 0) && (trie_ref[q] == 0)) {
                first_fit(q);
                trie_pack(q);
            }
            p = trie_r[p];
        } until (p == 0);
    }
⟧

1012. When the whole trie has been allocated into the sequential table, we must go through it once again so that trie contains the correct information. Null pointers in the linked trie will be represented by the value 0, which properly implements an “empty” family.

// clear trie [ r ] 
@define clear_trie =>
    {
        trie_link(r) = 0;
        trie_op(r) = min_trie_op;
        // trie_char = qi(0)
        trie_char(r) = min_quarterword;
    }
⟦1012 Move the data into |trie|⟧ = ⟦
    // no patterns were given
    if (trie_max == 0) {
        for (r in 0 to max_hyph_char) {
            clear_trie;
        }
        trie_max = max_hyph_char;
    } else {
        if (hyph_root > 0) {
            trie_fix(hyph_root);
        }
        if (trie_root > 0) {
            // this fixes the non-holes in trie 
            trie_fix(trie_root);
        }
        // now we will zero out all the holes
        r = 0;
        repeat {
            s = trie_link(r);
            clear_trie;
            r = s;
        } until (r > trie_max);
    }

    // make trie_char ( c ) != c for all c 
    trie_char(0) = qi(ord!("?"))
⟧

1013. The fixing-up procedure is, of course, recursive. Since the linked trie usually has overlapping subtries, the same data may be moved several times; but that causes no harm, and at most as much work is done as it took to build the uncompressed trie.

⟦998 Declare procedures for preprocessing hyphenation patterns⟧ += ⟦
    // moves p and its siblings into trie 
    function trie_fix(p: trie_pointer) {
        var
          q: trie_pointer, // a local variable that need not 
          // be saved on recursive calls
          c: ASCII_code, // another one that need not be 
          // saved
          z: trie_pointer; //  trie reference; this local 
          // variable must be saved
        
        z = trie_ref[p];
        repeat {
            q = trie_l[p];
            c = so(trie_c[p]);
            trie_link(z + c) = trie_ref[q];
            trie_char(z + c) = qi(c);
            trie_op(z + c) = trie_o[p];
            if (q > 0) {
                trie_fix(q);
            }
            p = trie_r[p];
        } until (p == 0);
    }
⟧

1014. Now let’s go back to the easier problem, of building the linked trie. When INITEX has scanned the ‘\patterns’ control sequence, it calls on new_patterns to do the right thing.

⟦998 Declare procedures for preprocessing hyphenation patterns⟧ += ⟦
    // initializes the hyphenation pattern data
    function new_patterns() {
        label done, done1;
        var
          k, l: 0 .. (hyphenatable_length_limit + 1), // 
          // indices into hc and hyf ; not always in 
          // small_number range
          digit_sensed: boolean, // should the next digit be 
          // treated as a letter?
          v: trie_opcode, // trie op code
          p, q: trie_pointer, // nodes of trie traversed 
          // during insertion
          first_child: boolean, // is p == trie_l[q]?
          c: ASCII_code; // character being inserted
        
        if (trie_not_ready) {
            set_cur_lang;
            // a left brace must follow \.{\\patterns}
            scan_left_brace;
            ⟦1015 Enter all of the patterns into a linked trie, until coming to a right brace⟧
            if (saving_hyph_codes > 0) {
                ⟦1666 Store hyphenation codes for current language⟧
            }
        } else {
            print_err(strpool!("Too late for "));
            print_esc(strpool!("patterns"));
            help1(
              strpool!("All patterns must be given before typesetting begins."),
            );
            error;
            link(garbage) = scan_toks(false, false);
            flush_list(def_ref);
        }
    }
⟧

1015. Novices are not supposed to be using \patterns, so the error messages are terse. (Note that all error messages appear in TEX’s string pool, even if they are used only by INITEX.)

⟦1015 Enter all of the patterns into a linked trie, until coming to a right brace⟧ = ⟦
    k = 0

    hyf[0] = 0

    digit_sensed = false

    loop {
        get_x_token;
        case cur_cmd {
          letter, other_char:
            ⟦1016 Append a new letter or a hyphen level⟧
          spacer, right_brace:
            if (k > 0) {
                ⟦1017 Insert a new pattern into the linked trie⟧
            }
            if (cur_cmd == right_brace) {
                goto done;
            }
            k = 0;
            hyf[0] = 0;
            digit_sensed = false;
          othercases:
            print_err(strpool!("Bad "));
            print_esc(strpool!("patterns"));
            help1(strpool!("(See Appendix H.)"));
            error;
        }
    }

    done:
⟧

1016.

⟦1016 Append a new letter or a hyphen level⟧ = ⟦
    if (
        digit_sensed
        || (cur_chr < ord!("0")) || (cur_chr > ord!("9"))
    ) {
        if (cur_chr == ord!(".")) {
            // edge-of-word delimiter
            cur_chr = 0;
        } else {
            cur_chr = lc_code(cur_chr);
            if (cur_chr == 0) {
                print_err(strpool!("Nonletter"));
                help1(strpool!("(See Appendix H.)"));
                error;
            }
        }
        if (cur_chr > max_hyph_char) {
            max_hyph_char = cur_chr;
        }
        if (k < max_hyphenatable_length) {
            incr(k);
            hc[k] = cur_chr;
            hyf[k] = 0;
            digit_sensed = false;
        }
    } else if (k < max_hyphenatable_length) {
        hyf[k] = cur_chr - ord!("0");
        digit_sensed = true;
    }
⟧
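The scanning rule above separates a pattern into its letters and its interletter digits, and a digit that immediately follows another digit is treated as a letter. A minimal Python sketch of that rule follows. It makes several assumptions for illustration: `str.lower` stands in for lc_code, the edge marker '.' is kept as a character instead of being mapped to 0, and the length limit and error reports are omitted.

```python
def parse_pattern(text):
    """Return (hc, hyf): the letters p_1..p_k and the levels n_0..n_k."""
    hc = []
    hyf = [0]
    digit_sensed = False
    for ch in text:
        if digit_sensed or ch not in "0123456789":
            hc.append(ch.lower())  # a letter (or a digit used as a letter)
            hyf.append(0)
            digit_sensed = False
        else:
            hyf[-1] = int(ch)      # level for the current interletter gap
            digit_sensed = True
    return hc, hyf


letters, levels = parse_pattern("ab2cde1")
```

For the pattern ‘ab2cde1’ this yields five letters and the level sequence 0, 0, 2, 0, 0, 1, matching the value 002001 quoted in the discussion of new_trie_op.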

1017. When the following code comes into play, the pattern $p_1\ldots p_k$ appears in hc[1 .. k], and the corresponding sequence of numbers $n_0\ldots n_k$ appears in hyf[0 .. k].

⟦1017 Insert a new pattern into the linked trie⟧ = ⟦
    {
        ⟦1019 Compute the trie op code, |v|, and set |l:=0|⟧
        q = 0;
        hc[0] = cur_lang;
        while (l <= k) {
            c = hc[l];
            incr(l);
            p = trie_l[q];
            first_child = true;
            while ((p > 0) && (c > so(trie_c[p]))) {
                q = p;
                p = trie_r[q];
                first_child = false;
            }
            if ((p == 0) || (c < so(trie_c[p]))) {
                ⟦1018 Insert a new trie node between |q| and |p|, and make |p| point to it⟧
            }
            // now node q represents $p_1\ldots p_{l-1}$
            q = p;
        }
        if (trie_o[q] != min_trie_op) {
            print_err(strpool!("Duplicate pattern"));
            help1(strpool!("(See Appendix H.)"));
            error;
        }
        trie_o[q] = v;
    }
⟧
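The insertion loop above walks a left-child, right-sibling trie in which siblings are kept in increasing order of their characters. The Python sketch below mirrors that walk under stated assumptions: the class name `LinkedTrie` is hypothetical, Python lists grow on demand instead of being bounded by trie_size, and a duplicate pattern raises an exception rather than reporting an error and overwriting the old op.

```python
class LinkedTrie:
    """Left-child/right-sibling trie; node 0 is a header, 0 is also null."""

    def __init__(self):
        self.c = [0]  # character of each node
        self.o = [0]  # op code of each node (0 plays the role of min_trie_op)
        self.l = [0]  # first child
        self.r = [0]  # next sibling

    def insert(self, chars, op):
        q = 0
        for ch in chars:
            p = self.l[q]
            first_child = True
            while p > 0 and ch > self.c[p]:  # skip smaller siblings
                q = p
                p = self.r[q]
                first_child = False
            if p == 0 or ch < self.c[p]:     # no node for ch yet: splice in
                self.c.append(ch)
                self.o.append(0)
                self.l.append(0)
                self.r.append(p)
                p = len(self.c) - 1
                if first_child:
                    self.l[q] = p
                else:
                    self.r[q] = p
            q = p
        if self.o[q] != 0:
            raise ValueError("Duplicate pattern")
        self.o[q] = op


trie = LinkedTrie()
trie.insert([0, ord("b"), ord("a")], 1)  # language code 0, then "ba"
trie.insert([0, ord("a")], 2)            # language code 0, then "a"
```

As in the module, the caller supplies the language code as the first element, so patterns for different languages live in different subtries below the root.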

1018.

⟦1018 Insert a new trie node between |q| and |p|, and make |p| point to it⟧ = ⟦
    {
        if (trie_ptr == trie_size) {
            overflow(strpool!("pattern memory"), trie_size);
        }
        incr(trie_ptr);
        trie_r[trie_ptr] = p;
        p = trie_ptr;
        trie_l[p] = 0;
        if (first_child) {
            trie_l[q] = p;
        } else {
            trie_r[q] = p;
        }
        trie_c[p] = si(c);
        trie_o[p] = min_trie_op;
    }
⟧

1019.

⟦1019 Compute the trie op code, |v|, and set |l:=0|⟧ = ⟦
    if (hc[1] == 0) {
        hyf[0] = 0;
    }

    if (hc[k] == 0) {
        hyf[k] = 0;
    }

    l = k

    v = min_trie_op

    loop {
        if (hyf[l] != 0) {
            v = new_trie_op(k - l, hyf[l], v);
        }
        if (l > 0) {
            decr(l);
        } else {
            goto done1;
        }
    }

    done1:
⟧
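The loop above runs from position k down to 0 and wraps each nonzero level in a new op, so the op finally stored with the pattern describes the leftmost nonzero level and points on to the others. The Python sketch below lists the resulting chain as (distance, number) pairs. It is only an illustration: it omits the interning done by new_trie_op and the zeroing of levels next to edge-of-word markers, and the name `op_chain` is an assumption.

```python
def op_chain(hyf):
    """Return the ops for levels hyf[0..k] as (distance, number) pairs,
    in the order in which they would be followed via hyf_next."""
    k = len(hyf) - 1
    chain = []
    for l in range(k, -1, -1):
        if hyf[l] != 0:
            # The op created last is the head of the chain.
            chain.insert(0, (k - l, hyf[l]))
    return chain


chain = op_chain([0, 0, 2, 0, 0, 1])  # levels of the pattern 'ab2cde1'
```

For ‘ab2cde1’ the chain is (3, 2) followed by (0, 1), which agrees with the two calls of new_trie_op shown for that pattern earlier.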

1020. Finally we put everything together: Here is how the trie gets to its final, efficient form. The following packing routine is rigged so that the root of the linked tree gets mapped into location 1 of trie , as required by the hyphenation algorithm. This happens because the first call of first_fit will “take” location 1.

⟦998 Declare procedures for preprocessing hyphenation patterns⟧ += ⟦
    function init_trie() {
        var
          p: trie_pointer, // pointer for initialization
          j, k, t: integer, // all-purpose registers for 
          // initialization
          r, s: trie_pointer; // used to clean up the packed 
          // trie 
        
        incr(max_hyph_char);
        ⟦1006 Get ready to compress the trie⟧
        if (trie_root != 0) {
            first_fit(trie_root);
            trie_pack(trie_root);
        }
        if (hyph_root != 0) {
            ⟦1668 Pack all stored |hyph_codes|⟧
        }
        ⟦1012 Move the data into |trie|⟧
        trie_not_ready = false;
    }
⟧

1021. [44] Breaking vertical lists into pages. The vsplit procedure, which implements TEX’s \vsplit operation, is considerably simpler than line_break because it doesn’t have to worry about hyphenation, and because its mission is to discover a single break instead of an optimum sequence of breakpoints. But before we get into the details of vsplit , we need to consider a few more basic things.

1022. A subroutine called prune_page_top takes a pointer to a vlist and returns a pointer to a modified vlist in which all glue, kern, and penalty nodes have been deleted before the first box or rule node. However, the first box or rule is actually preceded by a newly created glue node designed so that the topmost baseline will be at distance split_top_skip from the top, whenever this is possible without backspacing.

When the second argument s is false, the deleted nodes are destroyed; otherwise they are collected in a list starting at split_disc.

In this routine and those that follow, we make use of the fact that a vertical list contains no character nodes, hence the type field exists for each node in the list.

// adjust top after page break
function prune_page_top(p: pointer, s: boolean): pointer {
    var
      prev_p: pointer, // lags one step behind p 
      q, r: pointer; // temporary variables for list 
      // manipulation
    
    prev_p = temp_head;
    link(temp_head) = p;
    while (p != null) {
        case type(p) {
          hlist_node, vlist_node, rule_node:
            ⟦1023 Insert glue for |split_top_skip| and set~|p:=null|⟧
          whatsit_node, mark_node, ins_node:
            prev_p = p;
            p = link(prev_p);
          glue_node, kern_node, penalty_node:
            q = p;
            p = link(q);
            link(q) = null;
            link(prev_p) = p;
            if (s) {
                if (split_disc == null) {
                    split_disc = q;
                } else {
                    link(r) = q;
                }
                r = q;
            } else {
                flush_node_list(q);
            }
          othercases:
            confusion(strpool!("pruning"));
        }
    }
    prune_page_top = link(temp_head);
}

1023.

⟦1023 Insert glue for |split_top_skip| and set~|p:=null|⟧ = ⟦
    {
        q = new_skip_param(split_top_skip_code);
        link(prev_p) = q;
        // now temp_ptr == glue_ptr(q)
        link(q) = p;
        if (width(temp_ptr) > height(p)) {
            width(temp_ptr) = width(temp_ptr) - height(p);
        } else {
            width(temp_ptr) = 0;
        }
        p = null;
    }
⟧
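
As an illustration only, the pruning logic of the two preceding sections can be sketched in Python on an ordinary list instead of a linked list. The Node class, the string node kinds, and the function's return convention are hypothetical stand-ins chosen for this sketch; memory management and the split_disc global are left out.

```python
from dataclasses import dataclass

# Hypothetical node kinds; the real program uses numeric type codes.
BOXLIKE = {"hlist", "vlist", "rule"}
TRANSPARENT = {"whatsit", "mark", "ins"}
DISCARDABLE = {"glue", "kern", "penalty"}


@dataclass
class Node:
    kind: str
    height: int = 0  # only meaningful for box-like nodes
    width: int = 0   # natural width of a glue node


def prune_page_top(nodes, split_top_skip, keep_discards=False):
    """Return (pruned list, discarded nodes).

    Discardable nodes before the first box or rule are dropped (or
    collected when keep_discards is true), and a glue node is placed
    in front of that first box or rule.
    """
    out, discards = [], []
    for i, node in enumerate(nodes):
        if node.kind in BOXLIKE:
            # Same effect as the width(temp_ptr) adjustment: never negative.
            glue_width = max(split_top_skip - node.height, 0)
            out.append(Node("glue", width=glue_width))
            out.extend(nodes[i:])  # the rest of the list is left untouched
            return out, discards
        if node.kind in TRANSPARENT:
            out.append(node)
        elif node.kind in DISCARDABLE:
            if keep_discards:
                discards.append(node)
        else:
            raise ValueError("pruning: unexpected node kind " + node.kind)
    return out, discards
```

Note that marks, insertions, and whatsits ahead of the first box survive the pruning, so the new glue is not necessarily the first node of the result.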

1024. The next subroutine finds the best place to break a given vertical list so as to obtain a box of height h, with maximum depth d. A pointer to the beginning of the vertical list is given, and a pointer to the optimum breakpoint is returned. The list is effectively followed by a forced break, i.e., a penalty node with the eject_penalty; if the best break occurs at this artificial node, the value null is returned.

An array of six scaled distances is used to keep track of the height from the beginning of the list to the current place, just as in line_break. In fact, we use one of the same arrays, only changing its name to reflect its new significance.

// new name for the six distance variables
@define active_height => active_width
@define cur_height => active_height[1] // the natural height
@define set_height_zero(#) =>
    // initialize the height to zero
    active_height[#] = 0
// go here to record glue in the active_height table
@define update_heights => 90
// finds optimum page break
function vert_break(p: pointer, h, d: scaled): pointer {
    label done, not_found, update_heights;
    var
      prev_p: pointer, // if p is a glue node, type(prev_p)
      // determines whether p is a legal breakpoint
      q, r: pointer, // glue specifications
      pi: integer, // penalty value
      b: integer, // badness at a trial breakpoint
      least_cost: integer, // the smallest badness plus 
      // penalties found so far
      best_place: pointer, // the most recent break that 
      // leads to least_cost 
      prev_dp: scaled, // depth of previous box in the list
      t: small_number; //  type of the node following a kern
    
    // an initial glue node is not a legal breakpoint
    prev_p = p;
    least_cost = awful_bad;
    do_all_six(set_height_zero);
    prev_dp = 0;
    loop {
        ⟦1026 If node |p| is a legal breakpoint, check if this break is the best known, and |goto done| if |p| is null or if the page-so-far is already too full to accept more stuff⟧
        prev_p = p;
        p = link(prev_p);
    }
  done:
    vert_break = best_place;
}

1025. A global variable best_height_plus_depth will be set to the natural size of the box that corresponds to the optimum breakpoint found by vert_break. (This value is used by the insertion-splitting algorithm of the page builder.)

⟦13 Global variables⟧ += ⟦
    // height of the best box, without stretching or 
    // shrinking
    var best_height_plus_depth: scaled;
⟧

1026. A subtle point to be noted here is that the maximum depth d might be negative, so cur_height and prev_dp might need to be corrected even after a glue or kern node.

⟦1026 If node |p| is a legal breakpoint, check if this break is the best known, and |goto done| if |p| is null or if the page-so-far is already too full to accept more stuff⟧ = ⟦
    if (p == null) {
        pi = eject_penalty;
    } else {
        ⟦1027 Use node |p| to update the current height and depth measurements; if this node is not a legal breakpoint, |goto not_found| or |update_heights|, otherwise set |pi| to the associated penalty at the break⟧
    }

    ⟦1028 Check if node |p| is a new champion breakpoint; then \(go)|goto done| if |p| is a forced break or if the page-so-far is already too full⟧

    if ((type(p) < glue_node) || (type(p) > kern_node)) {
        goto not_found;
    }

    update_heights:

    ⟦1030 Update the current height and depth measurements with respect to a glue or kern node~|p|⟧

    not_found:

    if (prev_dp > d) {
        cur_height = cur_height + prev_dp - d;
        prev_dp = d;
    }
⟧
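
The bookkeeping of cur_height and prev_dp, including the correction at not_found, can be followed in a small Python sketch. The tuple encoding of the list items is an assumption made for the example, and dimensions are plain integers.

```python
def accumulate(items, max_depth):
    """Track (cur_height, prev_dp) over a simplified vertical list.

    items holds ("box", height, depth) or ("skip", amount) tuples; the
    latter stands for the natural width of a glue or kern node.
    """
    cur_height = 0
    prev_dp = 0
    for item in items:
        if item[0] == "box":
            _, height, depth = item
            cur_height += prev_dp + height
            prev_dp = depth
        elif item[0] == "skip":
            cur_height += prev_dp + item[1]
            prev_dp = 0
        else:
            raise ValueError("unknown item " + repr(item))
        # The correction made at not_found: excess depth moves into the height.
        if prev_dp > max_depth:
            cur_height += prev_dp - max_depth
            prev_dp = max_depth
    return cur_height, prev_dp
```

With a negative max_depth the correction fires even after a skip, when prev_dp has just been reset to zero; this is the subtle point mentioned above.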

1027.

⟦1027 Use node |p| to update the current height and depth measurements; if this node is not a legal breakpoint, |goto not_found| or |update_heights|, otherwise set |pi| to the associated penalty at the break⟧ = ⟦
    case type(p) {
      hlist_node, vlist_node, rule_node:
        cur_height = cur_height + prev_dp + height(p);
        prev_dp = depth(p);
        goto not_found;
      whatsit_node:
        ⟦1425 Process whatsit |p| in |vert_break| loop, |goto not_found|⟧
      glue_node:
        if (precedes_break(prev_p)) {
            pi = 0;
        } else {
            goto update_heights;
        }
      kern_node:
        if (link(p) == null) {
            t = penalty_node;
        } else {
            t = type(link(p));
        }
        if (t == glue_node) {
            pi = 0;
        } else {
            goto update_heights;
        }
      penalty_node:
        pi = penalty(p);
      mark_node, ins_node:
        goto not_found;
      othercases:
        confusion(strpool!("vertbreak"));
    }
⟧

1028.

// more than inf_bad , but less than awful_bad 
@define deplorable => 100000
⟦1028 Check if node |p| is a new champion breakpoint; then \(go)|goto done| if |p| is a forced break or if the page-so-far is already too full⟧ = ⟦
    if (pi < inf_penalty) {
        ⟦1029 Compute the badness, |b|, using |awful_bad| if the box is too full⟧
        if (b < awful_bad) {
            if (pi <= eject_penalty) {
                b = pi;
            } else if (b < inf_bad) {
                b = b + pi;
            } else {
                b = deplorable;
            }
        }
        if (b <= least_cost) {
            best_place = p;
            least_cost = b;
            best_height_plus_depth = cur_height + prev_dp;
        }
        if ((b == awful_bad) || (pi <= eject_penalty)) {
            goto done;
        }
    }
⟧
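
The way badness and penalty are combined into a single cost can be restated as a small Python function. The numeric constants other than deplorable are defined elsewhere in the program; the values used here are assumed from those definitions, and the function name is hypothetical.

```python
INF_BAD = 10000             # assumed value of inf_bad
INF_PENALTY = 10000         # assumed value of inf_penalty
EJECT_PENALTY = -10000      # assumed value of eject_penalty
AWFUL_BAD = 0o7777777777    # assumed value of awful_bad
DEPLORABLE = 100000         # as defined in this section


def break_cost(b, pi):
    """Return the cost of breaking here, or None if the break is forbidden."""
    if pi >= INF_PENALTY:
        return None             # not a candidate at all
    if b < AWFUL_BAD:
        if pi <= EJECT_PENALTY:
            return pi           # a forced break: the badness is ignored
        if b < INF_BAD:
            return b + pi
        return DEPLORABLE       # underfull beyond measure, yet still feasible
    return b                    # awful_bad: the box is too full
```

Because the comparison against least_cost uses “<=”, a later breakpoint with the same cost replaces an earlier one.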

1029.

⟦1029 Compute the badness, |b|, using |awful_bad| if the box is too full⟧ = ⟦
    if (cur_height < h) {
        if (
            (active_height[3] != 0)
            || (active_height[4] != 0)
            || (active_height[5] != 0)
        ) {
            b = 0;
        } else {
            b = badness(h - cur_height, active_height[2]);
        }
    } else if (cur_height - h > active_height[6]) {
        b = awful_bad;
    } else {
        b = badness(cur_height - h, active_height[6]);
    }
⟧
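
For readers who want to experiment, the case analysis above can be sketched in Python. The badness function is defined in another part of the program; the version below is a reconstruction under that assumption, and page_badness is a hypothetical name. The list active_height is indexed from 1 to 6 as in the text, with index 0 unused.

```python
INF_BAD = 10000             # assumed value of inf_bad
AWFUL_BAD = 0o7777777777    # assumed value of awful_bad


def badness(t, s):
    """Approximate 100*(t/s)**3 in integer arithmetic, assuming t >= 0."""
    if t == 0:
        return 0
    if s <= 0:
        return INF_BAD
    if t <= 7230584:
        r = (t * 297) // s      # 297**3 is close to 100 * 2**18
    elif s >= 1663497:
        r = t // (s // 297)
    else:
        r = t
    if r > 1290:
        return INF_BAD
    return (r * r * r + 0o400000) // 0o1000000


def page_badness(cur_height, h, active_height):
    if cur_height < h:
        # Any infinite stretchability makes an underfull box perfect.
        if any(active_height[i] != 0 for i in (3, 4, 5)):
            return 0
        return badness(h - cur_height, active_height[2])
    if cur_height - h > active_height[6]:
        return AWFUL_BAD        # cannot shrink enough: too full
    return badness(cur_height - h, active_height[6])
```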

1030. Vertical lists that are subject to the vert_break procedure should not contain infinite shrinkability, since that would permit any amount of information to “fit” on one page.

⟦1030 Update the current height and depth measurements with respect to a glue or kern node~|p|⟧ = ⟦
    if (type(p) == kern_node) {
        q = p;
    } else {
        q = glue_ptr(p);
        active_height[2 + stretch_order(q)] = 
            active_height[2 + stretch_order(q)]
            + stretch(q)
        ;
        active_height[6] = active_height[6] + shrink(q);
        if ((shrink_order(q) != normal) && (shrink(q) != 0)) {
            print_err(
              strpool!("Infinite glue shrinkage found in box being split"),
            );
            help4(
              strpool!("The box you are \\vsplitting contains some infinitely"),
            )(
              strpool!("shrinkable glue, e.g., `\\vss' or `\\vskip 0pt minus 1fil'."),
            )(
              strpool!("Such glue doesn't belong there; but you can safely proceed,"),
            )(
              strpool!("since the offensive shrinkability has been made finite."),
            );
            error;
            r = new_spec(q);
            shrink_order(r) = normal;
            delete_glue_ref(q);
            glue_ptr(p) = r;
            q = r;
        }
    }

    cur_height = cur_height + prev_dp + width(q)

    prev_dp = 0
⟧

1031. Now we are ready to consider vsplit itself. Most of its work is accomplished by the two subroutines that we have just considered.

Given the number of a vlist box n, and given a desired page height h, the vsplit function finds the best initial segment of the vlist and returns a box for a page of height h. The remainder of the vlist, if any, replaces the original box, after removing glue and penalties and adjusting for split_top_skip. Mark nodes in the split-off box are used to set the values of split_first_mark and split_bot_mark; we use the fact that split_first_mark == null if and only if split_bot_mark == null.

The original box becomes “void” if and only if it has been entirely extracted. The extracted box is “void” if and only if the original box was void (or if it was, erroneously, an hlist box).

⟦1636 Declare the function called |do_marks|⟧

// extracts a page of height h from box n 
function vsplit(n: halfword, h: scaled): pointer {
    label exit, done;
    var
      v: pointer, // the box to be split
      p: pointer, // runs through the vlist
      q: pointer; // points to where the break occurs
    
    cur_val = n;
    fetch_box(v);
    flush_node_list(split_disc);
    split_disc = null;
    if (sa_mark != null) {
        if (do_marks(vsplit_init, 0, sa_mark)) {
            sa_mark = null;
        }
    }
    if (split_first_mark != null) {
        delete_token_ref(split_first_mark);
        split_first_mark = null;
        delete_token_ref(split_bot_mark);
        split_bot_mark = null;
    }
    ⟦1032 Dispense with trivial cases of void or bad boxes⟧
    q = vert_break(list_ptr(v), h, split_max_depth);
    ⟦1033 Look at all the marks in nodes before the break, and set the final link to |null| at the break⟧
    q = prune_page_top(q, saving_vdiscards > 0);
    p = list_ptr(v);
    free_node(v, box_node_size);
    if (q != null) {
        q = vpack(q, natural);
    }
    // the eq_level of the box stays the same
    change_box(q);
    vsplit = vpackage(p, h, exactly, split_max_depth);
  exit:
}

1032.

⟦1032 Dispense with trivial cases of void or bad boxes⟧ = ⟦
    if (v == null) {
        vsplit = null;
        return;
    }

    if (type(v) != vlist_node) {
        print_err(strpool!(""));
        print_esc(strpool!("vsplit"));
        print(strpool!(" needs a "));
        print_esc(strpool!("vbox"));
        help2(
          strpool!("The box you are trying to split is an \\hbox."),
        )(
          strpool!("I can't split such a box, so I'll leave it alone."),
        );
        error;
        vsplit = null;
        return;
    }
⟧

1033. It’s possible that the box begins with a penalty node that is the “best” break, so we must be careful to handle this special case correctly.

⟦1033 Look at all the marks in nodes before the break, and set the final link to |null| at the break⟧ = ⟦
    p = list_ptr(v)

    if (p == q) {
        list_ptr(v) = null;
    } else {
        loop {
            if (type(p) == mark_node) {
                if (mark_class(p) != 0) {
                    ⟦1638 Update the current marks for |vsplit|⟧
                } else if (split_first_mark == null) {
                    split_first_mark = mark_ptr(p);
                    split_bot_mark = split_first_mark;
                    token_ref_count(split_first_mark) = 
                        token_ref_count(split_first_mark)
                        + 2
                    ;
                } else {
                    delete_token_ref(split_bot_mark);
                    split_bot_mark = mark_ptr(p);
                    add_token_ref(split_bot_mark);
                }
            }
            if (link(p) == q) {
                link(p) = null;
                goto done;
            }
            p = link(p);
        }
    }

    done:
⟧
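
Ignoring reference counts and the nonzero mark classes, the effect of this scan on split_first_mark and split_bot_mark can be sketched in Python. The list-of-tuples encoding and the break index are assumptions of the sketch; a break index of 0 plays the role of the special case p == q.

```python
def split_marks(nodes, break_index):
    """Return (first_mark, bot_mark) for the nodes before the break.

    nodes is a list of (kind, payload) tuples; only class-0 marks are
    modeled here. Both results are None when no mark precedes the break.
    """
    first = bot = None
    for kind, payload in nodes[:break_index]:
        if kind == "mark":
            if first is None:
                first = payload   # the first mark also serves as the bottom mark
            bot = payload         # every later mark replaces the bottom mark
    return first, bot
```

The sketch preserves the invariant stated earlier: first is None if and only if bot is None.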

1034. [45] The page builder. When TEX appends new material to its main vlist in vertical mode, it uses a method something like vsplit to decide where a page ends, except that the calculations are done “on line” as new items come in. The main complication in this process is that insertions must be put into their boxes and removed from the vlist, in a more-or-less optimum manner.

We shall use the term “current page” for that part of the main vlist that is being considered as a candidate for being broken off and sent to the user’s output routine. The current page starts at link(page_head), and it ends at page_tail. We have page_head == page_tail if this list is empty.

Utter chaos would reign if the user kept changing page specifications while a page is being constructed, so the page builder keeps the pertinent specifications frozen as soon as the page receives its first box or insertion. The global variable page_contents is empty when the current page contains only mark nodes and content-less whatsit nodes; it is inserts_only if the page contains only insertion nodes in addition to marks and whatsits. Glue nodes, kern nodes, and penalty nodes are discarded until a box or rule node appears, at which time page_contents changes to box_there. As soon as page_contents becomes non-empty, the current vsize and max_depth are squirreled away into page_goal and page_max_depth; the latter values will be used until the page has been forwarded to the user’s output routine. The \topskip adjustment is made when page_contents changes to box_there.

Although page_goal starts out equal to vsize, it is decreased by the scaled natural height-plus-depth of the insertions considered so far, and by the \skip corrections for those insertions. Therefore it represents the size into which the non-inserted material should fit, assuming that all insertions in the current page have been made.

The global variables best_page_break and least_page_cost correspond respectively to the local variables best_place and least_cost in the vert_break routine that we have already studied; i.e., they record the location and value of the best place currently known for breaking the current page. The value of page_goal at the time of the best break is stored in best_size.

//  page_contents when an insert node has been contributed, 
// but no boxes
@define inserts_only => 1
//  page_contents when a box or rule has been contributed
@define box_there => 2
⟦13 Global variables⟧ += ⟦
    // the final node on the current page
    var page_tail: pointer;

    // what is on the current page so far?
    var page_contents: empty .. box_there;

    // maximum box depth on page being built
    var page_max_depth: scaled;

    // break here to get the best page known so far
    var best_page_break: pointer;

    // the score for this currently best page
    var least_page_cost: integer;

    // its page_goal 
    var best_size: scaled;
⟧

1035. The page builder has another data structure to keep track of insertions. This is a list of four-word nodes, starting and ending at page_ins_head. That is, the first element of the list is node $r_1$ == link(page_ins_head); node $r_j$ is followed by $r_{j+1}$ == link($r_j$); and if there are n items we have $r_{n+1}$ == page_ins_head. The subtype field of each node in this list refers to an insertion number; for example, ‘\insert 250’ would correspond to a node whose subtype is qi(250) (the same as the subtype field of the relevant ins_node). These subtype fields are in increasing order, and subtype(page_ins_head) == qi(255), so page_ins_head serves as a convenient sentinel at the end of the list. A record is present for each insertion number that appears in the current page.

The type field in these nodes distinguishes two possibilities that might occur as we look ahead before deciding on the optimum page break. If type(r) == inserting, then height(r) contains the total of the height-plus-depth dimensions of the box and all its inserts seen so far. If type(r) == split_up, then no more insertions will be made into this box, because at least one previous insertion was too big to fit on the current page; broken_ptr(r) points to the node where that insertion will be split, if TEX decides to split it, broken_ins(r) points to the insertion node that was tentatively split, and height(r) includes also the natural height plus depth of the part that would be split off.

In both cases, last_ins_ptr(r) points to the last ins_node encountered for box qo(subtype(r)) that would be at least partially inserted on the next page; and best_ins_ptr(r) points to the last such ins_node that should actually be inserted, to get the page with minimum badness among all page breaks considered so far. We have best_ins_ptr(r) == null if and only if no insertion for this box should be made to produce this optimum page.

The data structure definitions here use the fact that the height field appears in the fourth word of a box node.

// number of words for a page insertion node
@define page_ins_node_size => 4
// an insertion class that has not yet overflowed
@define inserting => 0
@define split_up => 1 // an overflowed insertion class
// an insertion for this class will break here if anywhere
@define broken_ptr(#) => link(# + 1)
// this insertion might break at broken_ptr 
@define broken_ins(#) => info(# + 1)
// the most recent insertion for this subtype 
@define last_ins_ptr(#) => link(# + 2)
// the optimum most recent insertion
@define best_ins_ptr(#) => info(# + 2)
⟦838 Initialize the special list heads and constant nodes⟧ += ⟦
    subtype(page_ins_head) = qi(255)

    type(page_ins_head) = split_up

    link(page_ins_head) = page_ins_head
⟧
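
To see why the sentinel is convenient, here is a Python sketch of a sorted circular list with a head of subtype 255. The class PageIns and the helper names are hypothetical; the search loop shows how a sentinel-terminated list can be walked without an explicit end test, and the code that TEX actually uses to create these records appears in a later section and may differ in detail.

```python
from dataclasses import dataclass

SENTINEL = 255  # subtype of page_ins_head


@dataclass(eq=False)
class PageIns:
    subtype: int
    type: str = "inserting"
    height: int = 0
    next: "PageIns" = None


def new_page_ins_list():
    head = PageIns(SENTINEL, type="split_up")
    head.next = head  # an empty list links back to its own head
    return head


def find_or_create(head, n):
    """Return the record for insertion class n, creating it in sorted order."""
    if not 0 <= n < SENTINEL:
        raise ValueError("insertion number out of range")
    r = head
    # The sentinel guarantees termination: n is always less than 255.
    while n >= r.next.subtype:
        r = r.next
    if r.subtype != n:
        node = PageIns(n, next=r.next)
        r.next = node
        r = node
    return r


def subtypes(head):
    out, r = [], head.next
    while r is not head:
        out.append(r.subtype)
        r = r.next
    return out
```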

1036. An array page_so_far records the heights and depths of everything on the current page. This array contains six scaled numbers, like the similar arrays already considered in line_break and vert_break; and it also contains page_goal and page_depth, since these values are all accessible to the user via set_page_dimen commands. The value of page_so_far[1] is also called page_total. The stretch and shrink components of the \skip corrections for each insertion are included in page_so_far, but the natural space components of these corrections are not, since they have been subtracted from page_goal.

The variable page_depth records the depth of the current page; it has been adjusted so that it is at most page_max_depth. The variable last_glue points to the glue specification of the most recent node contributed from the contribution list, if this was a glue node; otherwise last_glue == max_halfword. (If the contribution list is nonempty, however, the value of last_glue is not necessarily accurate.) The variables last_penalty, last_kern, and last_node_type are similar. And finally, insert_penalties holds the sum of the penalties associated with all split and floating insertions.

// desired height of information on page being built
@define page_goal => page_so_far[0]
// height of the current page
@define page_total => page_so_far[1]
// shrinkability of the current page
@define page_shrink => page_so_far[6]
// depth of the current page
@define page_depth => page_so_far[7]
⟦13 Global variables⟧ += ⟦
    // height and glue of the current page
    var page_so_far: array [0 .. 7] of scaled;

    // used to implement \lastskip
    var last_glue: pointer;

    // used to implement \lastpenalty
    var last_penalty: integer;

    // used to implement \lastkern
    var last_kern: scaled;

    // used to implement \lastnodetype
    var last_node_type: integer;

    // sum of the penalties for insertions that were held 
    // over
    var insert_penalties: integer;
⟧

1037.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("pagegoal"), set_page_dimen, 0)

    primitive(strpool!("pagetotal"), set_page_dimen, 1)

    primitive(strpool!("pagestretch"), set_page_dimen, 2)

    primitive(strpool!("pagefilstretch"), set_page_dimen, 3)

    primitive(
      strpool!("pagefillstretch"),
      set_page_dimen,
      4,
    )

    primitive(
      strpool!("pagefilllstretch"),
      set_page_dimen,
      5,
    )

    primitive(strpool!("pageshrink"), set_page_dimen, 6)

    primitive(strpool!("pagedepth"), set_page_dimen, 7)
⟧

1038.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    set_page_dimen:

    case chr_code {
      0:
        print_esc(strpool!("pagegoal"));
      1:
        print_esc(strpool!("pagetotal"));
      2:
        print_esc(strpool!("pagestretch"));
      3:
        print_esc(strpool!("pagefilstretch"));
      4:
        print_esc(strpool!("pagefillstretch"));
      5:
        print_esc(strpool!("pagefilllstretch"));
      6:
        print_esc(strpool!("pageshrink"));
      othercases:
        print_esc(strpool!("pagedepth"));
    }
⟧

1039.

@define print_plus_end(#) =>
    /*... opened earlier ...*/
        print(#);
    }
@define print_plus(#) =>
    if (page_so_far[#] != 0) {
        print(strpool!(" plus "));
        print_scaled(page_so_far[#]);
        print_plus_end
    /* ... closed later ... */
function print_totals() {
    print_scaled(page_total);
    print_plus(2)(strpool!(""));
    print_plus(3)(strpool!("fil"));
    print_plus(4)(strpool!("fill"));
    print_plus(5)(strpool!("filll"));
    if (page_shrink != 0) {
        print(strpool!(" minus "));
        print_scaled(page_shrink);
    }
}
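
The macro pair print_plus and print_plus_end is easier to read when unrolled. The Python sketch below builds the same string that print_totals prints; format_scaled is a reconstruction of print_scaled, which is defined elsewhere in the program, so its details should be taken as an assumption.

```python
UNITY = 65536  # one point, in scaled units


def format_scaled(s):
    """Decimal representation of a scaled value, with minimal digits."""
    out = []
    if s < 0:
        out.append("-")
        s = -s
    out.append(str(s // UNITY))
    out.append(".")
    s = 10 * (s % UNITY) + 5
    delta = 10
    while True:
        if delta > UNITY:
            s += 0o100000 - 50000   # round the last digit
        out.append(str(s // UNITY))
        s = 10 * (s % UNITY)
        delta *= 10
        if s <= delta:
            break
    return "".join(out)


def format_totals(page_so_far):
    """page_so_far has eight entries, indexed as described in the text."""
    parts = [format_scaled(page_so_far[1])]
    for index, unit in ((2, ""), (3, "fil"), (4, "fill"), (5, "filll")):
        if page_so_far[index] != 0:
            parts.append(" plus " + format_scaled(page_so_far[index]) + unit)
    if page_so_far[6] != 0:
        parts.append(" minus " + format_scaled(page_so_far[6]))
    return "".join(parts)
```

Each nonzero stretch order is reported separately, while shrinkability has a single component.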

1040.

⟦1040 Show the status of the current page⟧ = ⟦
    if (page_head != page_tail) {
        print_nl(strpool!("### current page:"));
        if (output_active) {
            print(strpool!(" (held over for next output)"));
        }
        show_box(link(page_head));
        if (page_contents > empty) {
            print_nl(strpool!("total height "));
            print_totals;
            print_nl(strpool!(" goal height "));
            print_scaled(page_goal);
            r = link(page_ins_head);
            while (r != page_ins_head) {
                print_ln;
                print_esc(strpool!("insert"));
                t = qo(subtype(r));
                print_int(t);
                print(strpool!(" adds "));
                if (count(t) == 1000) {
                    t = height(r);
                } else {
                    t = x_over_n(height(r), 1000) * count(t);
                }
                print_scaled(t);
                if (type(r) == split_up) {
                    q = page_head;
                    t = 0;
                    repeat {
                        q = link(q);
                        if (
                            (type(q) == ins_node)
                            && (subtype(q) == subtype(r))
                        ) {
                            incr(t);
                        }
                    } until (q == broken_ins(r));
                    print(strpool!(", #"));
                    print_int(t);
                    print(strpool!(" might split"));
                }
                r = link(r);
            }
        }
    }
⟧
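
The amount that an insertion class “adds” is its height scaled by count(t)/1000, with the division done first. A Python sketch follows; x_over_n is reconstructed from its definition elsewhere in the program (quotient only, truncating toward zero), and insertion_adds is a hypothetical name.

```python
def x_over_n(x, n):
    """Integer quotient truncated toward zero; n must be nonzero here."""
    if n == 0:
        raise ZeroDivisionError("the program itself sets arith_error instead")
    negative = n < 0
    if negative:
        x, n = -x, -n
    return x // n if x >= 0 else -((-x) // n)


def insertion_adds(height, count):
    """Scaled amount shown after ' adds ' for one insertion class."""
    if count == 1000:
        return height
    return x_over_n(height, 1000) * count
```

Dividing before multiplying keeps the intermediate value small at the price of a little precision: a height of 655360 with a count of 500 yields 327500 rather than 327680.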

1041. Here is a procedure that is called when the page_contents is changing from empty to inserts_only or box_there.

@define set_page_so_far_zero(#) => page_so_far[#] = 0
function freeze_page_specs(s: small_number) {
    page_contents = s;
    page_goal = vsize;
    page_max_depth = max_depth;
    page_depth = 0;
    do_all_six(set_page_so_far_zero);
    least_page_cost = awful_bad;
    stat!{
        if (tracing_pages > 0) {
            begin_diagnostic;
            print_nl(strpool!("%% goal height="));
            print_scaled(page_goal);
            print(strpool!(", max depth="));
            print_scaled(page_max_depth);
            end_diagnostic(false);
        }
    }
}

1042. Pages are built by appending nodes to the current list in TEX’s vertical mode, which is at the outermost level of the semantic nest. This vlist is split into two parts: the “current page” that we have been talking so much about already, and the “contribution list” that receives new nodes as they are created. The current page contains everything that the page builder has accounted for in its data structures, as described above, while the contribution list contains other things that have been generated by other parts of TEX but have not yet been seen by the page builder. The contribution list starts at link(contrib_head), and it ends at the current node in TEX’s vertical mode.

When TEX has appended new material in vertical mode, it calls the procedure build_page, which tries to catch up by moving nodes from the contribution list to the current page. This procedure will succeed in its goal of emptying the contribution list, unless a page break is discovered, i.e., unless the current page has grown to the point where the optimum next page break has been determined. In the latter case, the nodes after the optimum break will go back onto the contribution list, and control will effectively pass to the user’s output routine.

We make type(page_head) == glue_node, so that an initial glue node on the current page will not be considered a valid breakpoint.

⟦838 Initialize the special list heads and constant nodes⟧ += ⟦
    type(page_head) = glue_node

    // SyncTeX watch point: box(page_head) size >=
    // glue_node size
    subtype(page_head) = normal
⟧

1043. The global variable output_active is true during the time the user’s output routine is driving TEX.

⟦13 Global variables⟧ += ⟦
    // are we in the midst of an output routine?
    var output_active: boolean;
⟧

1044.

⟦23 Set initial values of key variables⟧ += ⟦
    output_active = false

    insert_penalties = 0
⟧

1045. The page builder is ready to start a fresh page if we initialize the following state variables. (However, the page insertion list is initialized elsewhere.)

⟦1045 Start a new current page⟧ = ⟦
    page_contents = empty

    page_tail = page_head

    link(page_head) = null

    last_glue = max_halfword

    last_penalty = 0

    last_kern = 0

    last_node_type = -1

    page_depth = 0

    page_max_depth = 0
⟧

1046. At certain times box 255 is supposed to be void (i.e., null), or an insertion box is supposed to be ready to accept a vertical list. If not, an error message is printed, and the following subroutine flushes the unwanted contents, reporting them to the user.

function box_error(n: eight_bits) {
    error;
    begin_diagnostic;
    print_nl(
      strpool!("The following box has been deleted:"),
    );
    show_box(box(n));
    end_diagnostic(true);
    flush_node_list(box(n));
    box(n) = null;
}

1047. The following procedure guarantees that a given box register does not contain an \hbox.

function ensure_vbox(n: eight_bits) {
    var
      p: pointer; // the box register contents
    
    p = box(n);
    if (p != null) {
        if (type(p) == hlist_node) {
            print_err(
              strpool!("Insertions can only be added to a vbox"),
            );
            help3(
              strpool!("Tut tut: You're trying to \\insert into a"),
            )(
              strpool!("\\box register that now contains an \\hbox."),
            )(
              strpool!("Proceed, and I'll discard its present contents."),
            );
            box_error(n);
        }
    }
}

1048. TEX is not always in vertical mode at the time build_page is called; the current mode reflects what TEX should return to, after the contribution list has been emptied. A call on build_page should be immediately followed by ‘goto big_switch’, which is TEX’s central control point.

// go here to link a node into the current page
@define contribute => 80
⟦1066 Declare the procedure called |fire_up|⟧

// append contributions to the current page
function build_page() {
    label
        exit,
        done,
        done1,
        continue,
        contribute,
        update_heights;
    var
      p: pointer, // the node being appended
      q, r: pointer, // nodes being examined
      b, c: integer, // badness and cost of current page
      pi: integer, // penalty to be added to the badness
      n: min_quarterword .. biggest_reg, // insertion box 
      // number
      delta, h, w: scaled; // sizes used for insertion 
      // calculations
    
    if ((link(contrib_head) == null) || output_active) {
        return;
    }
    repeat {
      continue:
        p = link(contrib_head);
        ⟦1050 Update the values of |last_glue|, |last_penalty|, and |last_kern|⟧
        ⟦1051 Move node |p| to the current page; if it is time for a page break, put the nodes following the break back onto the contribution list, and |return| to the user's output routine if there is one⟧
    } until (link(contrib_head) == null);
    ⟦1049 Make the contribution list empty by setting its tail to |contrib_head|⟧
  exit:
}

1049.

// tail of the contribution list
@define contrib_tail => nest[0].tail_field
⟦1049 Make the contribution list empty by setting its tail to |contrib_head|⟧ = ⟦
    if (nest_ptr == 0) {
        // vertical mode
        tail = contrib_head;
    } else {
        // other modes
        contrib_tail = contrib_head;
    }
⟧

1050.

⟦1050 Update the values of |last_glue|, |last_penalty|, and |last_kern|⟧ = ⟦
    if (last_glue != max_halfword) {
        delete_glue_ref(last_glue);
    }

    last_penalty = 0

    last_kern = 0

    last_node_type = type(p) + 1

    if (type(p) == glue_node) {
        last_glue = glue_ptr(p);
        add_glue_ref(last_glue);
    } else {
        last_glue = max_halfword;
        if (type(p) == penalty_node) {
            last_penalty = penalty(p);
        } else if (type(p) == kern_node) {
            last_kern = width(p);
        }
    }
⟧
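
Setting aside the glue reference counts, the update above amounts to the following Python sketch. The numeric type codes are assumed from the node definitions elsewhere in the program, None stands in for max_halfword, and the dictionary is merely a convenient container for the example.

```python
# Assumed numeric node types (defined in another part of the program).
GLUE_NODE, KERN_NODE, PENALTY_NODE = 10, 11, 12


def update_last_items(node_type, value=None):
    """Return the four 'last' values after seeing a node of the given type.

    value is the glue specification, penalty, or kern width of the node.
    """
    state = {
        "last_glue": None,          # None plays the role of max_halfword
        "last_penalty": 0,
        "last_kern": 0,
        "last_node_type": node_type + 1,
    }
    if node_type == GLUE_NODE:
        state["last_glue"] = value
    elif node_type == PENALTY_NODE:
        state["last_penalty"] = value
    elif node_type == KERN_NODE:
        state["last_kern"] = value
    return state
```

At most one of last_glue, last_penalty, and last_kern is nontrivial after any step, since each node has exactly one type.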

1051. The code here is an example of a many-way switch into routines that merge together in different places. Some people call this unstructured programming, but the author doesn’t see much wrong with it, as long as the various labels have a well-understood meaning.

⟦1051 Move node |p| to the current page; if it is time for a page break, put the nodes following the break back onto the contribution list, and |return| to the user's output routine if there is one⟧ = ⟦
    ⟦1054 If the current page is empty and node |p| is to be deleted, |goto done1|; otherwise use node |p| to update the state of the current page; if this node is an insertion, |goto contribute|; otherwise if this node is not a legal breakpoint, |goto contribute| or |update_heights|; otherwise set |pi| to the penalty associated with this breakpoint⟧

    ⟦1059 Check if node |p| is a new champion breakpoint; then \(if)if it is time for a page break, prepare for output, and either fire up the user's output routine and |return| or ship out the page and |goto done|⟧

    if ((type(p) < glue_node) || (type(p) > kern_node)) {
        goto contribute;
    }

    update_heights:

    ⟦1058 Update the current page measurements with respect to the glue or kern specified by node~|p|⟧

    contribute:

    ⟦1057 Make sure that |page_max_depth| is not exceeded⟧

    ⟦1052 Link node |p| into the current page and |goto done|⟧

    done1:

    ⟦1053 Recycle node |p|⟧

    done:
⟧
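
One way to convince oneself that the labels have a well-understood meaning is to tabulate which steps run for each entry point. The Python sketch below does so; the step names are informal labels invented for this illustration, and it assumes that no page break fires during the champion check (otherwise control leaves this code altogether).

```python
def steps_after_dispatch(entry, is_glue_or_kern=False):
    """List the steps executed for node p, given how the case statement exits.

    entry is "breakpoint" (pi was set), "update_heights", "contribute",
    or "done1".
    """
    if entry == "done1":
        return ["recycle"]
    steps = []
    if entry == "breakpoint":
        steps.append("check_champion")
        # Only glue and kern nodes fall through to update_heights.
        if is_glue_or_kern:
            steps.append("update_heights")
    elif entry == "update_heights":
        steps.append("update_heights")
    elif entry != "contribute":
        raise ValueError("unknown entry point " + entry)
    steps += ["clamp_depth", "link_into_page"]
    return steps
```

For example, a penalty node is checked as a breakpoint and then linked in without touching the height totals, while a glue node that is not a legal breakpoint skips the check but still updates them.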

1052.

⟦1052 Link node |p| into the current page and |goto done|⟧ = ⟦
    link(page_tail) = p

    page_tail = p

    link(contrib_head) = link(p)

    link(p) = null

    goto done
⟧

1053.

⟦1053 Recycle node |p|⟧ = ⟦
    link(contrib_head) = link(p)

    link(p) = null

    if (saving_vdiscards > 0) {
        if (page_disc == null) {
            page_disc = p;
        } else {
            link(tail_page_disc) = p;
        }
        tail_page_disc = p;
    } else {
        flush_node_list(p);
    }
⟧

1054. The title of this section is already so long, it seems best to avoid making it more accurate but still longer, by mentioning the fact that a kern node at the end of the contribution list will not be contributed until we know its successor.

⟦1054 If the current page is empty and node |p| is to be deleted, |goto done1|; otherwise use node |p| to update the state of the current page; if this node is an insertion, |goto contribute|; otherwise if this node is not a legal breakpoint, |goto contribute| or |update_heights|; otherwise set |pi| to the penalty associated with this breakpoint⟧ = ⟦
    case type(p) {
      hlist_node, vlist_node, rule_node:
        if (page_contents < box_there) {
            ⟦1055 Initialize the current page, insert the \.{\\topskip} glue ahead of |p|, and |goto continue|⟧
        } else {
            ⟦1056 Prepare to move a box or rule node to the current page, then |goto contribute|⟧
        }
      whatsit_node:
        ⟦1424 Prepare to move whatsit |p| to the current page, then |goto contribute|⟧
      glue_node:
        if (page_contents < box_there) {
            goto done1;
        } else if (precedes_break(page_tail)) {
            pi = 0;
        } else {
            goto update_heights;
        }
      kern_node:
        if (page_contents < box_there) {
            goto done1;
        } else if (link(p) == null) {
            return;
        } else if (type(link(p)) == glue_node) {
            pi = 0;
        } else {
            goto update_heights;
        }
      penalty_node:
        if (page_contents < box_there) {
            goto done1;
        } else {
            pi = penalty(p);
        }
      mark_node:
        goto contribute;
      ins_node:
        ⟦1062 Append an insertion to the current page and |goto contribute|⟧
      othercases:
        confusion(strpool!("page"));
    }
⟧
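
The three discardable cases above can be summarized in a small Python sketch. It is only an illustration: the function name, the string labels, and the boolean `page_has_box` (standing for `page_contents >= box_there`) are hypothetical, and boxes, rules, whatsits, marks, and insertions are deliberately left out.

```python
def breakpoint_action(kind, page_has_box, tail_precedes_break=False,
                      next_kind=None, penalty=0):
    """Return ("break", pi) for a legal breakpoint, else the label to go to.

    kind is "glue", "kern", or "penalty"; next_kind is the kind of the
    successor node, or None when the successor is not yet known.
    """
    if kind not in ("glue", "kern", "penalty"):
        raise ValueError("only discardable nodes are modeled here")
    if not page_has_box:
        return "done1"                 # discard at the top of an empty page
    if kind == "glue":
        # glue is a breakpoint only after a non-discardable node
        return ("break", 0) if tail_precedes_break else "update_heights"
    if kind == "kern":
        if next_kind is None:
            return "return"            # wait until the successor is known
        return ("break", 0) if next_kind == "glue" else "update_heights"
    return ("break", penalty)          # a penalty is always a candidate
```

The `"return"` outcome is the situation described in the section text: a kern at the end of the contribution list is not contributed until its successor arrives.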

1055.

⟦1055 Initialize the current page, insert the \.{\\topskip} glue ahead of |p|, and |goto continue|⟧ = ⟦
    {
        if (page_contents == empty) {
            freeze_page_specs(box_there);
        } else {
            page_contents = box_there;
        }
        q = new_skip_param(top_skip_code);
        // now temp_ptr == glue_ptr(q)
        if (width(temp_ptr) > height(p)) {
            width(temp_ptr) = width(temp_ptr) - height(p);
        } else {
            width(temp_ptr) = 0;
        }
        link(q) = p;
        link(contrib_head) = q;
        goto continue;
    }
⟧

1056.

⟦1056 Prepare to move a box or rule node to the current page, then |goto contribute|⟧ = ⟦
    {
        page_total = page_total + page_depth + height(p);
        page_depth = depth(p);
        goto contribute;
    }
⟧

1057.

⟦1057 Make sure that |page_max_depth| is not exceeded⟧ = ⟦
    if (page_depth > page_max_depth) {
        page_total = 
            page_total
            + page_depth - page_max_depth
        ;
        page_depth = page_max_depth;
    }
⟧
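
A minimal Python sketch of this clamp, assuming plain integers stand in for scaled dimensions; the function name `clamp_page_depth` is hypothetical. The point it illustrates is that the excess depth is moved into the total, so the sum of the two is unchanged.

```python
def clamp_page_depth(page_total, page_depth, page_max_depth):
    """Return (page_total, page_depth) after enforcing the depth limit."""
    if page_depth > page_max_depth:
        # transfer the excess depth into the running total
        page_total = page_total + page_depth - page_max_depth
        page_depth = page_max_depth
    return page_total, page_depth
```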

1058.

⟦1058 Update the current page measurements with respect to the glue or kern specified by node~|p|⟧ = ⟦
    if (type(p) == kern_node) {
        q = p;
    } else {
        q = glue_ptr(p);
        page_so_far[2 + stretch_order(q)] = 
            page_so_far[2 + stretch_order(q)]
            + stretch(q)
        ;
        page_shrink = page_shrink + shrink(q);
        if ((shrink_order(q) != normal) && (shrink(q) != 0)) {
            print_err(
              strpool!("Infinite glue shrinkage found on current page"),
            );
            help4(
              strpool!("The page about to be output contains some infinitely"),
            )(
              strpool!("shrinkable glue, e.g., `\\vss' or `\\vskip 0pt minus 1fil'."),
            )(
              strpool!("Such glue doesn't belong there; but you can safely proceed,"),
            )(
              strpool!("since the offensive shrinkability has been made finite."),
            );
            error;
            r = new_spec(q);
            shrink_order(r) = normal;
            delete_glue_ref(q);
            glue_ptr(p) = r;
            q = r;
        }
    }

    page_total = page_total + page_depth + width(q)

    page_depth = 0
⟧

1059.

⟦1059 Check if node |p| is a new champion breakpoint; then \(if)if it is time for a page break, prepare for output, and either fire up the user's output routine and |return| or ship out the page and |goto done|⟧ = ⟦
    if (pi < inf_penalty) {
        ⟦1061 Compute the badness, |b|, of the current page, using |awful_bad| if the box is too full⟧
        if (b < awful_bad) {
            if (pi <= eject_penalty) {
                c = pi;
            } else if (b < inf_bad) {
                c = b + pi + insert_penalties;
            } else {
                c = deplorable;
            }
        } else {
            c = b;
        }
        if (insert_penalties >= 10000) {
            c = awful_bad;
        }
        stat!{
            if (tracing_pages > 0) {
                ⟦1060 Display the page break cost⟧
            }
        }
        if (c <= least_page_cost) {
            best_page_break = p;
            best_size = page_goal;
            least_page_cost = c;
            r = link(page_ins_head);
            while (r != page_ins_head) {
                best_ins_ptr(r) = last_ins_ptr(r);
                r = link(r);
            }
        }
        if ((c == awful_bad) || (pi <= eject_penalty)) {
            // output the current page at the best place
            fire_up(p);
            if (output_active) {
                // user's output routine will act
                return;
            }
            // the page has been shipped out by default 
            // output routine
            goto done;
        }
    }
⟧
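
The cost rule above can be tried out in isolation with the following Python sketch. The numeric values of the constants are not shown in this part of the program; the ones below are the values usually quoted for TEX82 and should be treated as assumptions, as should the function name.

```python
INF_BAD = 10000             # assumed value of inf_bad
AWFUL_BAD = 0o7777777777    # assumed value of awful_bad
DEPLORABLE = 100000         # assumed value of deplorable
EJECT_PENALTY = -10000      # assumed value of eject_penalty


def page_break_cost(b, pi, insert_penalties):
    """Cost c of breaking at a node with badness b and penalty pi."""
    if b < AWFUL_BAD:
        if pi <= EJECT_PENALTY:
            c = pi                          # forced break: badness is ignored
        elif b < INF_BAD:
            c = b + pi + insert_penalties
        else:
            c = DEPLORABLE                  # very bad, but not overfull
    else:
        c = b                               # overfull page
    if insert_penalties >= 10000:
        c = AWFUL_BAD
    return c
```

A break is then recorded as the new champion whenever `c <= least_page_cost`, and the page is fired up when `c == AWFUL_BAD` or `pi <= EJECT_PENALTY`.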

1060.

⟦1060 Display the page break cost⟧ = ⟦
    {
        begin_diagnostic;
        print_nl(ord!("%"));
        print(strpool!(" t="));
        print_totals;
        print(strpool!(" g="));
        print_scaled(page_goal);
        print(strpool!(" b="));
        if (b == awful_bad) {
            print_char(ord!("*"));
        } else {
            print_int(b);
        }
        print(strpool!(" p="));
        print_int(pi);
        print(strpool!(" c="));
        if (c == awful_bad) {
            print_char(ord!("*"));
        } else {
            print_int(c);
        }
        if (c <= least_page_cost) {
            print_char(ord!("#"));
        }
        end_diagnostic(false);
    }
⟧

1061.

⟦1061 Compute the badness, |b|, of the current page, using |awful_bad| if the box is too full⟧ = ⟦
    if (page_total < page_goal) {
        if (
            (page_so_far[3] != 0)
            || (page_so_far[4] != 0)
            || (page_so_far[5] != 0)
        ) {
            b = 0;
        } else {
            b = badness(
              page_goal - page_total,
              page_so_far[2],
            );
        }
    } else if (page_total - page_goal > page_shrink) {
        b = awful_bad;
    } else {
        b = badness(page_total - page_goal, page_shrink);
    }
⟧
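
As a worked illustration, here is a Python sketch of the same three-way decision. The `badness` function is defined elsewhere in the program; the version below is a reconstruction of the usual integer approximation to 100(t/s)^3 and should be read as an assumption, as should the constant values and the tuple layout used for the stretch components.

```python
INF_BAD = 10000             # assumed value of inf_bad
AWFUL_BAD = 0o7777777777    # assumed value of awful_bad


def badness(t, s):
    """Approximate 100*(t/s)**3 in integer arithmetic, for t >= 0."""
    if t == 0:
        return 0
    if s <= 0:
        return INF_BAD
    if t <= 7230584:
        r = (t * 297) // s
    elif s >= 1663497:
        r = t // (s // 297)
    else:
        r = t
    if r > 1290:
        return INF_BAD
    return (r * r * r + 0o400000) // 0o1000000


def page_badness(page_total, page_goal, stretch, page_shrink):
    """stretch = (finite, fil, fill, filll), like page_so_far[2..5]."""
    if page_total < page_goal:
        if any(stretch[1:]):
            return 0                        # infinite stretch fills the gap
        return badness(page_goal - page_total, stretch[0])
    if page_total - page_goal > page_shrink:
        return AWFUL_BAD                    # the box is too full
    return badness(page_total - page_goal, page_shrink)
```

For instance, a page that is short by exactly its finite stretchability gets badness 100 under this approximation.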

1062.

⟦1062 Append an insertion to the current page and |goto contribute|⟧ = ⟦
    {
        if (page_contents == empty) {
            freeze_page_specs(inserts_only);
        }
        n = subtype(p);
        r = page_ins_head;
        while (n >= subtype(link(r))) {
            r = link(r);
        }
        n = qo(n);
        if (subtype(r) != qi(n)) {
            ⟦1063 Create a page insertion node with |subtype(r)=qi(n)|, and include the glue correction for box |n| in the current page state⟧
        }
        if (type(r) == split_up) {
            insert_penalties = 
                insert_penalties
                + float_cost(p)
            ;
        } else {
            last_ins_ptr(r) = p;
            // this much room is left if we shrink the 
            // maximum
            delta = 
                page_goal
                - page_total - page_depth + page_shrink
            ;
            if (count(n) == 1000) {
                h = height(p);
            } else {
                // this much room is needed
                h = x_over_n(height(p), 1000) * count(n);
            }
            if (
                ((h <= 0) || (h <= delta))
                && (height(p) + height(r) <= dimen(n))
            ) {
                page_goal = page_goal - h;
                height(r) = height(r) + height(p);
            } else {
                ⟦1064 Find the best way to split the insertion, and change |type(r)| to |split_up|⟧
            }
        }
        goto contribute;
    }
⟧
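
The fit test in the non-split branch can be sketched in Python as follows. The helper `x_over_n` is assumed here to truncate toward zero for a positive divisor, which is how the routine of that name is usually described; the other function names are hypothetical.

```python
def x_over_n(x, n):
    """Integer quotient truncated toward zero; n > 0 is assumed."""
    q = abs(x) // n
    return q if x >= 0 else -q


def scaled_insert_height(height, count):
    """Height charged against the page, weighted by count/1000."""
    if count == 1000:
        return height
    return x_over_n(height, 1000) * count


def insertion_fits(height_p, height_r, count, dimen, delta):
    """True if the whole insertion can be added without splitting."""
    h = scaled_insert_height(height_p, count)
    return (h <= 0 or h <= delta) and (height_p + height_r <= dimen)
```

Note that the division happens before the multiplication, so a height of 12345 with a count of 500 is charged as 6000, not 6172.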

1063. We take note of the value of \skip n and the height plus depth of \box n only when the first \insert n node is encountered for a new page. A user who changes the contents of \box n after that first \insert n had better be either extremely careful or extremely lucky, or both.

⟦1063 Create a page insertion node with |subtype(r)=qi(n)|, and include the glue correction for box |n| in the current page state⟧ = ⟦
    {
        q = get_node(page_ins_node_size);
        link(q) = link(r);
        link(r) = q;
        r = q;
        subtype(r) = qi(n);
        type(r) = inserting;
        ensure_vbox(n);
        if (box(n) == null) {
            height(r) = 0;
        } else {
            height(r) = height(box(n)) + depth(box(n));
        }
        best_ins_ptr(r) = null;
        q = skip(n);
        if (count(n) == 1000) {
            h = height(r);
        } else {
            h = x_over_n(height(r), 1000) * count(n);
        }
        page_goal = page_goal - h - width(q);
        page_so_far[2 + stretch_order(q)] = 
            page_so_far[2 + stretch_order(q)]
            + stretch(q)
        ;
        page_shrink = page_shrink + shrink(q);
        if ((shrink_order(q) != normal) && (shrink(q) != 0)) {
            print_err(
              strpool!("Infinite glue shrinkage inserted from "),
            );
            print_esc(strpool!("skip"));
            print_int(n);
            help3(
              strpool!("The correction glue for page breaking with insertions"),
            )(
              strpool!("must have finite shrinkability. But you may proceed,"),
            )(
              strpool!("since the offensive shrinkability has been made finite."),
            );
            error;
        }
    }
⟧

1064. Here is the code that will split a long footnote between pages, in an emergency. The current situation deserves to be recapitulated: Node p is an insertion into box n; the insertion will not fit, in its entirety, either because it would make the total contents of box n greater than \dimen n, or because it would make the incremental amount of growth h greater than the available space delta, or both. (This amount h has been weighted by the insertion scaling factor, i.e., by \count n over 1000.) Now we will choose the best way to break the vlist of the insertion, using the same criteria as in the \vsplit operation.

⟦1064 Find the best way to split the insertion, and change |type(r)| to |split_up|⟧ = ⟦
    {
        if (count(n) <= 0) {
            w = max_dimen;
        } else {
            w = page_goal - page_total - page_depth;
            if (count(n) != 1000) {
                w = x_over_n(w, count(n)) * 1000;
            }
        }
        if (w > dimen(n) - height(r)) {
            w = dimen(n) - height(r);
        }
        q = vert_break(ins_ptr(p), w, depth(p));
        height(r) = height(r) + best_height_plus_depth;
        stat!{
            if (tracing_pages > 0) {
                ⟦1065 Display the insertion split cost⟧
            }
        }
        if (count(n) != 1000) {
            best_height_plus_depth = 
                x_over_n(best_height_plus_depth, 1000)
                * count(n)
            ;
        }
        page_goal = page_goal - best_height_plus_depth;
        type(r) = split_up;
        broken_ptr(r) = q;
        broken_ins(r) = p;
        if (q == null) {
            insert_penalties = 
                insert_penalties
                + eject_penalty
            ;
        } else if (type(q) == penalty_node) {
            insert_penalties = insert_penalties + penalty(q);
        }
    }
⟧
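
The target height w handed to vert_break can be computed on its own, as in the Python sketch below. The value of `MAX_DIMEN` and the truncating behavior of `x_over_n` are assumptions not shown in this part of the program, and the function name `split_target` is hypothetical.

```python
MAX_DIMEN = 2**30 - 1   # assumed value of max_dimen


def x_over_n(x, n):
    """Integer quotient truncated toward zero; n > 0 is assumed."""
    q = abs(x) // n
    return q if x >= 0 else -q


def split_target(page_goal, page_total, page_depth, count, dimen, height_r):
    """Largest height the first part of a split insertion may have."""
    if count <= 0:
        w = MAX_DIMEN                       # insertion takes no page space
    else:
        w = page_goal - page_total - page_depth
        if count != 1000:
            w = x_over_n(w, count) * 1000   # undo the count/1000 weighting
    # never let box n grow beyond dimen n
    return min(w, dimen - height_r)
```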

1065.

⟦1065 Display the insertion split cost⟧ = ⟦
    {
        begin_diagnostic;
        print_nl(strpool!("% split"));
        print_int(n);
        print(strpool!(" to "));
        print_scaled(w);
        print_char(ord!(","));
        print_scaled(best_height_plus_depth);
        print(strpool!(" p="));
        if (q == null) {
            print_int(eject_penalty);
        } else if (type(q) == penalty_node) {
            print_int(penalty(q));
        } else {
            print_char(ord!("0"));
        }
        end_diagnostic(false);
    }
⟧

1066. When the page builder has looked at as much material as could appear before the next page break, it makes its decision. The break that gave minimum badness will be used to put a completed “page” into box 255, with insertions appended to their other boxes.

We also set the values of top_mark, first_mark, and bot_mark. The program uses the fact that bot_mark != null implies first_mark != null; it also knows that bot_mark == null implies top_mark == first_mark == null.

The fire_up subroutine prepares to output the current page at the best place; then it fires up the user’s output routine, if there is one, or it simply ships out the page. There is one parameter, c, which represents the node that was being contributed to the page when the decision to force an output was made.

⟦1066 Declare the procedure called |fire_up|⟧ = ⟦
    function fire_up(c: pointer) {
        label exit;
        var
          p, q, r, s: pointer, // nodes being examined 
          // and/or changed
          prev_p: pointer, // predecessor of p 
          n: min_quarterword .. biggest_reg, // insertion 
          // box number
          wait: boolean, // should the present insertion be 
          // held over?
          save_vbadness: integer, // saved value of vbadness 
          save_vfuzz: scaled, // saved value of vfuzz 
          save_split_top_skip: pointer; // saved value of 
          // split_top_skip 
        
        ⟦1067 Set the value of |output_penalty|⟧
        if (sa_mark != null) {
            if (do_marks(fire_up_init, 0, sa_mark)) {
                sa_mark = null;
            }
        }
        if (bot_mark != null) {
            if (top_mark != null) {
                delete_token_ref(top_mark);
            }
            top_mark = bot_mark;
            add_token_ref(top_mark);
            delete_token_ref(first_mark);
            first_mark = null;
        }
        ⟦1068 Put the \(o)optimal current page into box 255, update |first_mark| and |bot_mark|, append insertions to their boxes, and put the remaining nodes back on the contribution list⟧
        if (sa_mark != null) {
            if (do_marks(fire_up_done, 0, sa_mark)) {
                sa_mark = null;
            }
        }
        if ((top_mark != null) && (first_mark == null)) {
            first_mark = top_mark;
            add_token_ref(top_mark);
        }
        if (output_routine != null) {
            if (dead_cycles >= max_dead_cycles) {
                ⟦1078 Explain that too many dead cycles have occurred in a row⟧
            } else {
                ⟦1079 Fire up the user's output routine and |return|⟧
            }
        }
        ⟦1077 Perform the default output routine⟧
      exit:
    }
⟧

1067.

⟦1067 Set the value of |output_penalty|⟧ = ⟦
    if (type(best_page_break) == penalty_node) {
        geq_word_define(
          int_base + output_penalty_code,
          penalty(best_page_break),
        );
        penalty(best_page_break) = inf_penalty;
    } else {
        geq_word_define(
          int_base + output_penalty_code,
          inf_penalty,
        );
    }
⟧

1068. As the page is finally being prepared for output, pointer p runs through the vlist, with prev_p trailing behind; pointer q is the tail of a list of insertions that are being held over for a subsequent page.

⟦1068 Put the \(o)optimal current page into box 255, update |first_mark| and |bot_mark|, append insertions to their boxes, and put the remaining nodes back on the contribution list⟧ = ⟦
    if (c == best_page_break) {
        //  c not yet linked in
        best_page_break = null;
    }

    ⟦1069 Ensure that box 255 is empty before output⟧

    // this will count the number of insertions held over
    insert_penalties = 0

    save_split_top_skip = split_top_skip

    if (holding_inserts <= 0) {
        ⟦1072 Prepare all the boxes involved in insertions to act as queues⟧
    }

    q = hold_head

    link(q) = null

    prev_p = page_head

    p = link(prev_p)

    while (p != best_page_break) {
        if (type(p) == ins_node) {
            if (holding_inserts <= 0) {
                ⟦1074 Either insert the material specified by node |p| into the appropriate box, or hold it for the next page; also delete node |p| from the current page⟧
            }
        } else if (type(p) == mark_node) {
            if (mark_class(p) != 0) {
                ⟦1641 Update the current marks for |fire_up|⟧
            } else {
                ⟦1070 Update the values of |first_mark| and |bot_mark|⟧
            }
        }
        prev_p = p;
        p = link(prev_p);
    }

    split_top_skip = save_split_top_skip

    ⟦1071 Break the current page at node |p|, put it in box~255, and put the remaining nodes on the contribution list⟧

    ⟦1073 Delete \(t)the page-insertion nodes⟧
⟧

1069.

⟦1069 Ensure that box 255 is empty before output⟧ = ⟦
    if (box(255) != null) {
        print_err(strpool!(""));
        print_esc(strpool!("box"));
        print(strpool!("255 is not void"));
        help2(
          strpool!("You shouldn't use \\box255 except in \\output routines."),
        )(
          strpool!("Proceed, and I'll discard its present contents."),
        );
        box_error(255);
    }
⟧

1070.

⟦1070 Update the values of |first_mark| and |bot_mark|⟧ = ⟦
    {
        if (first_mark == null) {
            first_mark = mark_ptr(p);
            add_token_ref(first_mark);
        }
        if (bot_mark != null) {
            delete_token_ref(bot_mark);
        }
        bot_mark = mark_ptr(p);
        add_token_ref(bot_mark);
    }
⟧
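
The combined effect of fire_up and the update above on the three class-0 marks can be modeled without reference counts. The following Python sketch is only an illustration under that simplification: the function name is hypothetical, mark texts are plain strings, and `None` plays the role of null.

```python
def marks_for_page(prev_bot, page_marks):
    """Return (top, first, bot) after a page containing page_marks.

    prev_bot is bot_mark as left by the previous page; by the invariant
    stated earlier, prev_bot is None only if top and first were None too.
    """
    top = prev_bot          # top_mark = old bot_mark
    first = None            # first_mark is reset before scanning the page
    bot = prev_bot
    for m in page_marks:
        if first is None:
            first = m       # the first mark on the page
        bot = m             # the last mark seen so far
    if top is not None and first is None:
        first = top         # no marks on this page: first_mark = top_mark
    return top, first, bot
```

A page without marks therefore reports the same text for all three, namely the last mark of an earlier page.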

1071. When the following code is executed, the current page runs from node link(page_head) to node prev_p, and the nodes from p to page_tail are to be placed back at the front of the contribution list. Furthermore the heldover insertions appear in a list from link(hold_head) to q; we will put them into the current page list for safekeeping while the user’s output routine is active. We might have q == hold_head; and p == null if and only if prev_p == page_tail. Error messages are suppressed within vpackage, since the box might appear to be overfull or underfull simply because the stretch and shrink from the \skip registers for inserts are not actually present in the box.

⟦1071 Break the current page at node |p|, put it in box~255, and put the remaining nodes on the contribution list⟧ = ⟦
    if (p != null) {
        if (link(contrib_head) == null) {
            if (nest_ptr == 0) {
                tail = page_tail;
            } else {
                contrib_tail = page_tail;
            }
        }
        link(page_tail) = link(contrib_head);
        link(contrib_head) = p;
        link(prev_p) = null;
    }

    save_vbadness = vbadness

    vbadness = inf_bad

    save_vfuzz = vfuzz

    vfuzz = max_dimen // inhibit error messages

    box(255) = vpackage(
      link(page_head),
      best_size,
      exactly,
      page_max_depth,
    )

    vbadness = save_vbadness

    vfuzz = save_vfuzz

    if (last_glue != max_halfword) {
        delete_glue_ref(last_glue);
    }

    // this sets last_glue = max_halfword 
    ⟦1045 Start a new current page⟧

    if (q != hold_head) {
        link(page_head) = link(hold_head);
        page_tail = q;
    }
⟧

1072. If many insertions are supposed to go into the same box, we want to know the position of the last node in that box, so that we don’t need to waste time when linking further information into it. The last_ins_ptr fields of the page insertion nodes are therefore used for this purpose during the packaging phase.

⟦1072 Prepare all the boxes involved in insertions to act as queues⟧ = ⟦
    {
        r = link(page_ins_head);
        while (r != page_ins_head) {
            if (best_ins_ptr(r) != null) {
                n = qo(subtype(r));
                ensure_vbox(n);
                if (box(n) == null) {
                    box(n) = new_null_box;
                }
                p = box(n) + list_offset;
                while (link(p) != null) {
                    p = link(p);
                }
                last_ins_ptr(r) = p;
            }
            r = link(r);
        }
    }
⟧

1073.

⟦1073 Delete \(t)the page-insertion nodes⟧ = ⟦
    r = link(page_ins_head)

    while (r != page_ins_head) {
        q = link(r);
        free_node(r, page_ins_node_size);
        r = q;
    }

    link(page_ins_head) = page_ins_head
⟧

1074. We will set best_ins_ptr = null and package the box corresponding to insertion node r, just after making the final insertion into that box. If this final insertion is ‘split_up’, the remainder after splitting and pruning (if any) will be carried over to the next page.

⟦1074 Either insert the material specified by node |p| into the appropriate box, or hold it for the next page; also delete node |p| from the current page⟧ = ⟦
    {
        r = link(page_ins_head);
        while (subtype(r) != subtype(p)) {
            r = link(r);
        }
        if (best_ins_ptr(r) == null) {
            wait = true;
        } else {
            wait = false;
            s = last_ins_ptr(r);
            link(s) = ins_ptr(p);
            if (best_ins_ptr(r) == p) {
                ⟦1075 Wrap up the box specified by node |r|, splitting node |p| if called for; set |wait:=true| if node |p| holds a remainder after splitting⟧
            } else {
                while (link(s) != null) {
                    s = link(s);
                }
                last_ins_ptr(r) = s;
            }
        }
        ⟦1076 Either append the insertion node |p| after node |q|, and remove it from the current page, or delete |node(p)|⟧
    }
⟧

1075.

⟦1075 Wrap up the box specified by node |r|, splitting node |p| if called for; set |wait:=true| if node |p| holds a remainder after splitting⟧ = ⟦
    {
        if (type(r) == split_up) {
            if (
                (broken_ins(r) == p)
                && (broken_ptr(r) != null)
            ) {
                while (link(s) != broken_ptr(r)) {
                    s = link(s);
                }
                link(s) = null;
                split_top_skip = split_top_ptr(p);
                ins_ptr(p) = prune_page_top(
                  broken_ptr(r),
                  false,
                );
                if (ins_ptr(p) != null) {
                    temp_ptr = vpack(ins_ptr(p), natural);
                    height(p) = 
                        height(temp_ptr)
                        + depth(temp_ptr)
                    ;
                    free_node(temp_ptr, box_node_size);
                    wait = true;
                }
            }
        }
        best_ins_ptr(r) = null;
        n = qo(subtype(r));
        temp_ptr = list_ptr(box(n));
        free_node(box(n), box_node_size);
        box(n) = vpack(temp_ptr, natural);
    }
⟧

1076.

⟦1076 Either append the insertion node |p| after node |q|, and remove it from the current page, or delete |node(p)|⟧ = ⟦
    link(prev_p) = link(p)

    link(p) = null

    if (wait) {
        link(q) = p;
        q = p;
        incr(insert_penalties);
    } else {
        delete_glue_ref(split_top_ptr(p));
        free_node(p, ins_node_size);
    }

    p = prev_p
⟧

1077. The list of heldover insertions, running from link(page_head) to page_tail, must be moved to the contribution list when the user has specified no output routine.

⟦1077 Perform the default output routine⟧ = ⟦
    {
        if (link(page_head) != null) {
            if (link(contrib_head) == null) {
                if (nest_ptr == 0) {
                    tail = page_tail;
                } else {
                    contrib_tail = page_tail;
                }
            } else {
                link(page_tail) = link(contrib_head);
            }
            link(contrib_head) = link(page_head);
            link(page_head) = null;
            page_tail = page_head;
        }
        flush_node_list(page_disc);
        page_disc = null;
        ship_out(box(255));
        box(255) = null;
    }
⟧

1078.

⟦1078 Explain that too many dead cycles have occurred in a row⟧ = ⟦
    {
        print_err(strpool!("Output loop---"));
        print_int(dead_cycles);
        print(strpool!(" consecutive dead cycles"));
        help3(
          strpool!("I've concluded that your \\output is awry; it never does a"),
        )(
          strpool!("\\shipout, so I'm shipping \\box255 out myself. Next time"),
        )(
          strpool!("increase \\maxdeadcycles if you want me to be more patient!"),
        );
        error;
    }
⟧

1079.

⟦1079 Fire up the user's output routine and |return|⟧ = ⟦
    {
        output_active = true;
        incr(dead_cycles);
        push_nest;
        mode = -vmode;
        prev_depth = ignore_depth;
        mode_line = -line;
        begin_token_list(output_routine, output_text);
        new_save_level(output_group);
        normal_paragraph;
        scan_left_brace;
        return;
    }
⟧
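
The choice made at the end of fire_up among the last three sections can be summarized by a small decision function. This Python sketch is hypothetical in its names and return labels; it models only the branching, not the output itself.

```python
def choose_output_action(has_output_routine, dead_cycles, max_dead_cycles):
    """Say which path fire_up takes once the page is in box 255."""
    if has_output_routine:
        if dead_cycles >= max_dead_cycles:
            # report the loop, then fall through to the default routine
            return "report-loop-then-ship-out"
        return "run-user-output"    # dead_cycles is incremented here
    return "ship-out"               # default output routine
```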

1080. When the user’s output routine finishes, it has constructed a vlist in internal vertical mode, and TEX will do the following:

⟦1080 Resume the page builder after an output routine has come to an end⟧ = ⟦
    {
        if (
            (loc != null)
            || (
                (token_type != output_text)
                && (token_type != backed_up)
            )
        ) {
            ⟦1081 Recover from an unbalanced output routine⟧
        }
        // conserve stack space in case more outputs are 
        // triggered
        end_token_list;
        end_graf;
        unsave;
        output_active = false;
        insert_penalties = 0;
        ⟦1082 Ensure that box 255 is empty after output⟧
        // current list goes after heldover insertions
        if (tail != head) {
            link(page_tail) = link(head);
            page_tail = tail;
        }
        // and both go before heldover contributions
        if (link(page_head) != null) {
            if (link(contrib_head) == null) {
                contrib_tail = page_tail;
            }
            link(page_tail) = link(contrib_head);
            link(contrib_head) = link(page_head);
            link(page_head) = null;
            page_tail = page_head;
        }
        flush_node_list(page_disc);
        page_disc = null;
        pop_nest;
        build_page;
    }
⟧

1081.

⟦1081 Recover from an unbalanced output routine⟧ = ⟦
    {
        print_err(strpool!("Unbalanced output routine"));
        help2(
          strpool!("Your sneaky output routine has problematic {'s and/or }'s."),
        )(
          strpool!("I can't handle that very well; good luck."),
        );
        error;
        repeat {
            get_token;
        } until (loc == null);
        // loops forever if reading from a file, since null 
        // == min_halfword <= 0 
    }
⟧

1082.

⟦1082 Ensure that box 255 is empty after output⟧ = ⟦
    if (box(255) != null) {
        print_err(
          strpool!("Output routine didn't use all of "),
        );
        print_esc(strpool!("box"));
        print_int(255);
        help3(
          strpool!("Your \\output commands should empty \\box255,"),
        )(strpool!("e.g., by saying `\\shipout\\box255'."))(
          strpool!("Proceed; I'll discard its present contents."),
        );
        box_error(255);
    }
⟧

1083. [46] The chief executive. We come now to the main_control routine, which contains the master switch that causes all the various pieces of TEX to do their things, in the right order.

In a sense, this is the grand climax of the program: It applies all the tools that we have worked so hard to construct. In another sense, this is the messiest part of the program: It necessarily refers to other pieces of code all over the place, so that a person can’t fully understand what is going on without paging back and forth to be reminded of conventions that are defined elsewhere. We are now at the hub of the web, the central nervous system that touches most of the other parts and ties them together.

The structure of main_control itself is quite simple. There’s a label called big_switch, at which point the next token of input is fetched using get_x_token. Then the program branches at high speed into one of about 100 possible directions, based on the value of the current mode and the newly fetched command code; the sum abs(mode) + cur_cmd indicates what to do next. For example, the case ‘vmode + letter’ arises when a letter occurs in vertical mode (or internal vertical mode); this case leads to instructions that initialize a new paragraph and enter horizontal mode.

The big case statement that contains this multiway switch has been labeled reswitch, so that the program can goto reswitch when the next token has already been fetched. Most of the cases are quite short; they call an “action procedure” that does the work for that case, and then they either goto reswitch or they “fall through” to the end of the case statement, which returns control back to big_switch. Thus, main_control is not an extremely large procedure, in spite of the multiplicity of things it must do; it is small enough to be handled by Pascal compilers that put severe restrictions on procedure size.

One case is singled out for special treatment, because it accounts for most of TEX’s activities in typical applications. The process of reading simple text and converting it into char_node records, while looking for ligatures and kerns, is part of TEX’s “inner loop”; the whole program runs efficiently when its inner loop is fast, so this part has been written with particular care.
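
The idea of branching on the sum abs(mode) + cur_cmd can be illustrated with a dictionary lookup. In this Python sketch every numeric code and handler label is hypothetical; the only property relied on is that mode codes are spaced farther apart than the largest command code, so that each sum identifies one (mode, command) pair.

```python
# Hypothetical codes: modes are spaced wider than any command code.
VMODE, HMODE = 1, 101
SPACER, LETTER = 10, 11

HANDLERS = {
    VMODE + LETTER: "start a paragraph and enter horizontal mode",
    HMODE + LETTER: "go to the main loop",
    HMODE + SPACER: "append a space",
}


def dispatch(mode, cur_cmd, default="other case"):
    """Pick an action from the current mode and command code.

    A negative mode denotes the internal variant of the same mode,
    which is why the absolute value is taken before adding.
    """
    return HANDLERS.get(abs(mode) + cur_cmd, default)
```

For example, `dispatch(-VMODE, LETTER)` selects the same action as `dispatch(VMODE, LETTER)`, matching the remark that ‘vmode + letter’ covers both vertical mode and internal vertical mode.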

1084. We shall concentrate first on the inner loop of main_control, deferring consideration of the other cases until later.

// go here to branch on the next token of input
@define big_switch => 60
// go here to typeset a string of consecutive characters
@define main_loop => 70
// go here to collect characters in a "native" font string
@define collect_native => 71
@define collected => 72
// go here to finish a character or ligature
@define main_loop_wrapup => 80
// go here to advance the ligature cursor
@define main_loop_move => 90
// same, when advancing past a generated ligature
@define main_loop_move_lig => 95
// go here to bring in another character, if any
@define main_loop_lookahead => 100
// go here to check for ligatures or kerning
@define main_lig_loop => 110
// go here to append a normal space between words
@define append_normal_space => 120
//  pdf_box_type passed to find_pic_file 
@define pdfbox_crop => 1
@define pdfbox_media => 2
@define pdfbox_bleed => 3
@define pdfbox_trim => 4
@define pdfbox_art => 5
@define pdfbox_none => 6
⟦1097 Declare action procedures for use by |main_control|⟧

⟦1122 Declare the procedure called |handle_right_brace|⟧

// governs \TeX's activities
function main_control() {
    label
        big_switch,
        reswitch,
        main_loop,
        main_loop_wrapup,
        main_loop_move,
        main_loop_move + 1,
        main_loop_move + 2,
        main_loop_move_lig,
        main_loop_lookahead,
        main_loop_lookahead + 1,
        main_lig_loop,
        main_lig_loop + 1,
        main_lig_loop + 2,
        collect_native,
        collected,
        append_normal_space,
        exit;
    var
      t: integer; // general-purpose temporary variable
    
    if (every_job != null) {
        begin_token_list(every_job, every_job_text);
    }
  big_switch:
    get_x_token;
  reswitch:
    ⟦1085 Give diagnostic information, if requested⟧
    case abs(mode) + cur_cmd {
      hmode + letter,
      hmode + other_char,
      hmode + char_given:
        goto main_loop;
      hmode + char_num:
        scan_usv_num;
        cur_chr = cur_val;
        goto main_loop;
      hmode + no_boundary:
        get_x_token;
        if (
            (cur_cmd == letter)
            || (cur_cmd == other_char)
            || (cur_cmd == char_given)
            || (cur_cmd == char_num)
        ) {
            cancel_boundary = true;
        }
        goto reswitch;
      othercases:
        if (abs(mode) == hmode) {
            check_for_post_char_toks(big_switch);
        }
        case abs(mode) + cur_cmd {
          hmode + spacer:
            if (space_factor == 1000) {
                goto append_normal_space;
            } else {
                app_space;
            }
          hmode + ex_space, mmode + ex_space:
            goto append_normal_space;
          ⟦1099 Cases of |main_control| that are not part of the inner loop⟧
        }
      // of the big case statement
    }
    goto big_switch;
  main_loop:
    ⟦1088 Append character |cur_chr| and the following characters (if~any) to the current hlist in the current font; |goto reswitch| when a non-character has been fetched⟧
  append_normal_space:
    check_for_post_char_toks(big_switch);
    ⟦1095 Append a normal inter-word space to the current list, then |goto big_switch|⟧
  exit:
}
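
The case selector abs(mode) + cur_cmd in main_control can serve as a single label because the mode codes are spaced widely enough that every (mode, command) pair yields a distinct integer. The Python sketch below illustrates that idea only. It assumes the modes are spaced max_command + 1 apart, as in TeX82; the value of MAX_COMMAND and the helper names dispatch_key and split_key are hypothetical and are not taken from the program.

```python
MAX_COMMAND = 100  # assumed placeholder for the largest command code

# Mode codes spaced MAX_COMMAND + 1 apart (assumption stated in the lead-in).
VMODE = 1
HMODE = VMODE + MAX_COMMAND + 1
MMODE = HMODE + MAX_COMMAND + 1


def dispatch_key(mode, cur_cmd):
    """Combine a mode (negative for the restricted variants) and a command code."""
    if not 0 <= cur_cmd <= MAX_COMMAND:
        raise ValueError("command code out of range")
    return abs(mode) + cur_cmd


def split_key(key):
    """Recover (abs(mode), cur_cmd); possible only because keys never collide."""
    for mode in (MMODE, HMODE, VMODE):
        if key >= mode:
            return mode, key - mode
    raise ValueError("not a valid dispatch key")
```

Because the sign of the mode is discarded, a restricted mode and its unrestricted counterpart select the same case, which matches the use of abs(mode) in the case statement above.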

1085. When a new token has just been fetched at big_switch, we have an ideal place to monitor TEX’s activity.

⟦1085 Give diagnostic information, if requested⟧ = ⟦
    if (interrupt != 0) {
        if (OK_to_interrupt) {
            back_input;
            check_interrupt;
            goto big_switch;
        }
    }

    debug!{
        if (panicking) {
            check_mem(false);
        }
    }

    if (tracing_commands > 0) {
        show_cur_cmd_chr;
    }
⟧

1086. The following part of the program was first written in a structured manner, according to the philosophy that “premature optimization is the root of all evil.” Then it was rearranged into pieces of spaghetti so that the most common actions could proceed with little or no redundancy.

The original unoptimized form of this algorithm resembles the reconstitute procedure, which was described earlier in connection with hyphenation. Again we have an implied “cursor” between characters cur_l and cur_r. The main difference is that the lig_stack can now contain a charnode as well as pseudo-ligatures; that stack is now usually nonempty, because the next character of input (if any) has been appended to it. In main_control we have

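$$\mathit{cur\_r}=\begin{cases}\mathit{character}(\mathit{lig\_stack}),&\text{if }\mathit{lig\_stack}>\mathit{null};\\ \mathit{font\_bchar}[\mathit{cur\_font}],&\text{otherwise};\end{cases}$$
except when character(lig_stack) == font_false_bchar[cur_font]. Several additional global variables are needed.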

⟦13 Global variables⟧ += ⟦
    // the current font
    var main_f: internal_font_number;

    // character information bytes for cur_l 
    var main_i: four_quarters;

    // ligature/kern command
    var main_j: four_quarters;

    // index into font_info 
    var main_k: font_index;

    // temporary register for list manipulation
    var main_p: pointer;

    // more temporary registers for list manipulation
    var main_pp, main_ppp: pointer;

    // temp for hyphen offset in native-font text
    var main_h: pointer;

    // whether the last char seen is the font's hyphenchar
    var is_hyph: boolean;

    var space_class: integer;

    var prev_class: integer;

    // space factor value
    var main_s: integer;

    // boundary character of current font, or non_char 
    var bchar: halfword;

    // nonexistent character matching bchar , or non_char 
    var false_bchar: halfword;

    // should the left boundary be ignored?
    var cancel_boundary: boolean;

    // should we insert a discretionary node?
    var ins_disc: boolean;
⟧

1087. The boolean variables of the main loop are normally false, and always reset to false before the loop is left. That saves us the extra work of initializing each time.

⟦23 Set initial values of key variables⟧ += ⟦
    ligature_present = false

    cancel_boundary = false

    lft_hit = false

    rt_hit = false

    ins_disc = false
⟧

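1088. We leave the space_factor unchanged if sf_code(cur_chr) == 0; otherwise we set it equal to sf_code(cur_chr), except that it should never change from a value less than 1000 to a value exceeding 1000. The most common case is sf_code(cur_chr) == 1000, so we want that case to be fast. In this implementation the space factor occupies the low 16 bits of sf_code(cur_chr); the higher bits carry the character class that is consulted for inter-character tokens.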
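
The space-factor rule just stated can be sketched as a pure function. This Python fragment is an illustration rather than the program's code: it mirrors the branches of adjust_space_factor, takes the current space factor and the raw sf_code value as arguments, and returns the new space factor. The function name is hypothetical.

```python
def adjust_space_factor(space_factor, sf_code):
    """Return the new space factor after a character with the given sf_code."""
    main_s = sf_code % 0x10000      # low 16 bits hold the space factor
    if main_s == 1000:              # the most common case comes first
        return 1000
    if main_s < 1000:
        # a value of 0 leaves the space factor unchanged
        return main_s if main_s > 0 else space_factor
    if space_factor < 1000:
        # never jump from below 1000 to above 1000 in one step
        return 1000
    return main_s
```

For example, a character with space factor code 3000 raises a current value of 1000 to 3000, but it raises a current value of 999 only to 1000.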

The overall structure of the main loop is presented here. Some program labels are inside the individual sections.

@define adjust_space_factor =>
    main_s = sf_code(cur_chr) % 0x10000;
    if (main_s == 1000) {
        space_factor = 1000;
    } else if (main_s < 1000) {
        if (main_s > 0) {
            space_factor = main_s;
        }
    } else if (space_factor < 1000) {
        space_factor = 1000;
    } else {
        space_factor = main_s;
    }
// check for a spacing token list, goto # if found, or 
// big_switch in case of the initial letter of a run
@define check_for_inter_char_toks(#) =>
    cur_ptr = null;
    space_class = sf_code(cur_chr) div 0x10000;
    if (
        XeTeX_inter_char_tokens_en
        && space_class != char_class_ignored
    ) {
        // class 4096 = ignored (for combining marks etc)
        if (prev_class == char_class_boundary) {
            // boundary
            if (
                (state != token_list)
                || (token_type != backed_up_char)
            ) {
                find_sa_element(
                  inter_char_val,
                  
                      char_class_boundary
                      * char_class_limit + space_class
                  ,
                  false,
                );
                if (
                    (cur_ptr != null)
                    && (sa_ptr(cur_ptr) != null)
                ) {
                    if (cur_cmd != letter) {
                        cur_cmd = other_char;
                    }
                    cur_tok = 
                        (cur_cmd * max_char_val)
                        + cur_chr
                    ;
                    back_input;
                    token_type = backed_up_char;
                    begin_token_list(
                      sa_ptr(cur_ptr),
                      inter_char_text,
                    );
                    goto big_switch;
                }
            }
        } else {
            find_sa_element(
              inter_char_val,
              prev_class * char_class_limit + space_class,
              false,
            );
            if (
                (cur_ptr != null)
                && (sa_ptr(cur_ptr) != null)
            ) {
                if (cur_cmd != letter) {
                    cur_cmd = other_char;
                }
                cur_tok = (cur_cmd * max_char_val) + cur_chr;
                back_input;
                token_type = backed_up_char;
                begin_token_list(
                  sa_ptr(cur_ptr),
                  inter_char_text,
                );
                prev_class = char_class_boundary;
                goto #;
            }
        }
        prev_class = space_class;
    }
@define check_for_post_char_toks(#) =>
    if (
        XeTeX_inter_char_tokens_en
        && (space_class != char_class_ignored)
        && (prev_class != char_class_boundary)
    ) {
        prev_class = char_class_boundary;
        // boundary
        find_sa_element(
          inter_char_val,
          
              space_class
              * char_class_limit + char_class_boundary
          ,
          false,
        );
        if ((cur_ptr != null) && (sa_ptr(cur_ptr) != null)) {
            if (cur_cs == 0) {
                if (cur_cmd == char_num) {
                    cur_cmd = other_char;
                }
                cur_tok = (cur_cmd * max_char_val) + cur_chr;
            } else {
                cur_tok = cs_token_flag + cur_cs;
            }
            back_input;
            begin_token_list(
              sa_ptr(cur_ptr),
              inter_char_text,
            );
            goto #;
        }
    }
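
The two macros above look up a token list by combining a left class and a right class into one index, prev_class * char_class_limit + space_class. The Python sketch below shows that indexing scheme with a dictionary standing in for the sparse array searched by find_sa_element. The constant values are assumptions (the comment above gives 4096 for the ignored class; the boundary class is assumed to be one less), and the names inter_char_key, lookup, and the sample table are hypothetical.

```python
CHAR_CLASS_LIMIT = 4096                     # assumed value
CHAR_CLASS_IGNORED = CHAR_CLASS_LIMIT       # per the "class 4096 = ignored" comment
CHAR_CLASS_BOUNDARY = CHAR_CLASS_LIMIT - 1  # assumed value


def inter_char_key(left_class, right_class):
    """Index of the token list inserted between left_class and right_class."""
    return left_class * CHAR_CLASS_LIMIT + right_class


def lookup(table, prev_class, space_class):
    """Return the token list for this transition, or None if there is none."""
    if space_class == CHAR_CLASS_IGNORED:
        # Ignored characters never trigger a lookup; this also avoids a key
        # that would coincide with (prev_class + 1, 0).
        return None
    return table.get(inter_char_key(prev_class, space_class))


# Hypothetical table: tokens between class 1 and class 2, and at the start
# of a run (boundary followed by class 1).
sample = {
    inter_char_key(1, 2): "tokens-1-2",
    inter_char_key(CHAR_CLASS_BOUNDARY, 1): "tokens-start-1",
}
```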
⟦1088 Append character |cur_chr| and the following characters (if~any) to the current hlist in the current font; |goto reswitch| when a non-character has been fetched⟧ = ⟦
    if (((head == tail) && (mode > 0))) {
        if ((insert_src_special_auto)) {
            append_src_special;
        }
    }

    // boundary
    prev_class = char_class_boundary

    // added code for native font support
    if (is_native_font(cur_font)) {
        if (mode > 0) {
            if (language != clang) {
                fix_language;
            }
        }
        main_h = 0;
        main_f = cur_font;
        native_len = 0;
      collect_native:
        adjust_space_factor;
        check_for_inter_char_toks(collected);
        if ((cur_chr > 0xffff)) {
            native_room(2);
            append_native(
              (cur_chr - 0x10000) div 1024 + 0xd800,
            );
            append_native(
              (cur_chr - 0x10000) % 1024 + 0xdc00,
            );
        } else {
            native_room(1);
            append_native(cur_chr);
        }
        is_hyph = 
            (cur_chr == hyphen_char[main_f])
            || (
                XeTeX_dash_break_en
                && (
                    (cur_chr == 0x2014)
                    || (cur_chr == 0x2013)
                )
            )
        ;
        if ((main_h == 0) && is_hyph) {
            main_h = native_len;
        }
        // try to collect as many chars as possible in 
        // the same font
        get_next;
        if (
            (cur_cmd == letter)
            || (cur_cmd == other_char)
            || (cur_cmd == char_given)
        ) {
            goto collect_native;
        }
        x_token;
        if (
            (cur_cmd == letter)
            || (cur_cmd == other_char)
            || (cur_cmd == char_given)
        ) {
            goto collect_native;
        }
        if (cur_cmd == char_num) {
            scan_usv_num;
            cur_chr = cur_val;
            goto collect_native;
        }
        check_for_post_char_toks(collected);
      collected:
        if ((font_mapping[main_f] != 0)) {
            main_k = apply_mapping(
              font_mapping[main_f],
              native_text,
              native_len,
            );
            native_len = 0;
            native_room(main_k);
            main_h = 0;
            for (main_p in 0 to main_k - 1) {
                append_native(mapped_text[main_p]);
                if (
                    (main_h == 0)
                    && (
                        (
                            mapped_text[main_p]
                            == hyphen_char[main_f]
                        )
                        || (
                            XeTeX_dash_break_en
                            && (
                                (
                                    mapped_text[main_p]
                                    == 0x2014
                                )
                                || (
                                    mapped_text[main_p]
                                    == 0x2013
                                )
                            )
                        )
                    )
                ) {
                    main_h = native_len;
                }
            }
        }
        if (tracing_lost_chars > 0) {
            temp_ptr = 0;
            while ((temp_ptr < native_len)) {
                main_k = native_text[temp_ptr];
                incr(temp_ptr);
                if ((main_k >= 0xd800) && (main_k < 0xdc00)) {
                    main_k = 
                        0x10000
                        + (main_k - 0xd800) * 1024
                    ;
                    main_k = 
                        main_k
                        + native_text[temp_ptr] - 0xdc00
                    ;
                    incr(temp_ptr);
                }
                if (map_char_to_glyph(main_f, main_k) == 0) {
                    char_warning(main_f, main_k);
                }
            }
        }
        main_k = native_len;
        main_pp = tail;
        if (mode == hmode) {
            // find node preceding tail, skipping 
            // discretionaries
            main_ppp = head;
            while (
                (main_ppp != main_pp)
                && (link(main_ppp) != main_pp)
            ) {
                if (
                    (!is_char_node(main_ppp))
                    && (type(main_ppp) == disc_node)
                ) {
                    temp_ptr = main_ppp;
                    for (main_p in 1 to replace_count(
                      temp_ptr,
                    )) {
                        main_ppp = link(main_ppp);
                    }
                }
                if (main_ppp != main_pp) {
                    main_ppp = link(main_ppp);
                }
            }
            temp_ptr = 0;
            repeat {
                if (main_h == 0) {
                    main_h = main_k;
                }
                if (
                    is_native_word_node(main_pp)
                    && (native_font(main_pp) == main_f)
                    && (main_ppp != main_pp)
                    && (!is_char_node(main_ppp))
                    && (type(main_ppp) != disc_node)
                ) {
                    // make a new temp string that contains 
                    // the concatenated text of tail + the 
                    // current word/fragment
                    main_k = main_h + native_length(main_pp);
                    native_room(main_k);
                    save_native_len = native_len;
                    for (main_p in 0 to 
                        native_length(main_pp)
                        - 1
                    ) {
                        append_native(
                          get_native_char(main_pp, main_p),
                        );
                    }
                    for (main_p in 0 to main_h - 1) {
                        append_native(
                          native_text[temp_ptr + main_p],
                        );
                    }
                    do_locale_linebreaks(
                      save_native_len,
                      main_k,
                    );
                    // discard the temp string
                    native_len = save_native_len;
                    // and set main_k to remaining length of 
                    // new word
                    main_k = native_len - main_h - temp_ptr;
                    // pointer to remaining fragment
                    temp_ptr = main_h;
                    main_h = 0;
                    while (
                        (main_h < main_k)
                        && (
                            native_text[temp_ptr + main_h]
                            != hyphen_char[main_f]
                        )
                        && (
                            (!XeTeX_dash_break_en)
                            || (
                                (
                                    native_text[
                                      temp_ptr + main_h,
                                    ]
                                    != 0x2014
                                )
                                && (
                                    native_text[
                                      temp_ptr + main_h,
                                    ]
                                    != 0x2013
                                )
                            )
                        )
                    ) {
                        // look for next hyphen or end of 
                        // text
                        incr(main_h);
                    }
                    if ((main_h < main_k)) {
                        incr(main_h);
                    }
                    // remove the preceding node from 
                    // the list
                    link(main_ppp) = link(main_pp);
                    link(main_pp) = null;
                    flush_node_list(main_pp);
                    main_pp = tail;
                    while ((link(main_ppp) != main_pp)) {
                        main_ppp = link(main_ppp);
                    }
                } else {
                    // append fragment of current word
                    do_locale_linebreaks(temp_ptr, main_h);
                    // advance ptr to remaining fragment
                    temp_ptr = temp_ptr + main_h;
                    // decrement remaining length
                    main_k = main_k - main_h;
                    main_h = 0;
                    while (
                        (main_h < main_k)
                        && (
                            native_text[temp_ptr + main_h]
                            != hyphen_char[main_f]
                        )
                        && (
                            (!XeTeX_dash_break_en)
                            || (
                                (
                                    native_text[
                                      temp_ptr + main_h,
                                    ]
                                    != 0x2014
                                )
                                && (
                                    native_text[
                                      temp_ptr + main_h,
                                    ]
                                    != 0x2013
                                )
                            )
                        )
                    ) {
                        // look for next hyphen or end of 
                        // text
                        incr(main_h);
                    }
                    if ((main_h < main_k)) {
                        incr(main_h);
                    }
                }
                if ((main_k > 0) || is_hyph) {
                    // add a break if we aren't at end of 
                    // text (must be a hyphen), or if last 
                    // char in original text was a hyphen
                    tail_append(new_disc);
                    main_pp = tail;
                }
            } until (main_k == 0);
        } else {
            // must be restricted hmode, so no need for 
            // line-breaking or discretionaries
            // but there might already be explicit 
            // disc_nodes in the list
            // find node preceding tail, skipping 
            // discretionaries
            main_ppp = head;
            while (
                (main_ppp != main_pp)
                && (link(main_ppp) != main_pp)
            ) {
                if (
                    (!is_char_node(main_ppp))
                    && (type(main_ppp) == disc_node)
                ) {
                    temp_ptr = main_ppp;
                    for (main_p in 1 to replace_count(
                      temp_ptr,
                    )) {
                        main_ppp = link(main_ppp);
                    }
                }
                if (main_ppp != main_pp) {
                    main_ppp = link(main_ppp);
                }
            }
            if (
                is_native_word_node(main_pp)
                && (native_font(main_pp) == main_f)
                && (main_ppp != main_pp)
                && (!is_char_node(main_ppp))
                && (type(main_ppp) != disc_node)
            ) {
                // total string length for the new merged 
                // whatsit
                link(main_pp) = new_native_word_node(
                  main_f,
                  main_k + native_length(main_pp),
                );
                tail = link(main_pp);
                // copy text from the old one into the new
                for (main_p in 0 to 
                    native_length(main_pp)
                    - 1
                ) {
                    set_native_char(
                      tail,
                      main_p,
                      get_native_char(main_pp, main_p),
                    );
                }
                // append the new text
                for (main_p in 0 to main_k - 1) {
                    set_native_char(
                      tail,
                      main_p + native_length(main_pp),
                      native_text[main_p],
                    );
                }
                set_native_metrics(
                  tail,
                  XeTeX_use_glyph_metrics,
                );
                // remove the preceding node from the list
                main_p = head;
                if (main_p != main_pp) {
                    while (link(main_p) != main_pp) {
                        main_p = link(main_p);
                    }
                }
                link(main_p) = link(main_pp);
                link(main_pp) = null;
                flush_node_list(main_pp);
            } else {
                // package the current string into a 
                // native_word whatsit
                link(main_pp) = new_native_word_node(
                  main_f,
                  main_k,
                );
                tail = link(main_pp);
                for (main_p in 0 to main_k - 1) {
                    set_native_char(
                      tail,
                      main_p,
                      native_text[main_p],
                    );
                }
                set_native_metrics(
                  tail,
                  XeTeX_use_glyph_metrics,
                );
            }
        }
        if (XeTeX_interword_space_shaping_state > 0) {
            //  tail is a word we have just appended. If it 
            // is preceded by another word with a normal 
            // inter-word space between (all in the same 
            // font), then we will measure that space in 
            // context and replace it with an adjusted glue 
            // value if it differs from the font's normal 
            // space.
            // First we look for the most recent native_word 
            // in the list and set main_pp to it. This is 
            // potentially expensive, in the case of very 
            // long paragraphs, but in practice it's 
            // negligible compared to the cost of shaping 
            // and measurement.
            main_p = head;
            main_pp = null;
            while (main_p != tail) {
                if (is_native_word_node(main_p)) {
                    main_pp = main_p;
                }
                main_p = link(main_p);
            }
            if ((main_pp != null)) {
                // check if the font matches; if so, check 
                // the intervening nodes
                if ((native_font(main_pp) == main_f)) {
                    // Skip nodes that should be invisible 
                    // to inter-word spacing, so that e.g., 
                    // `\nobreak\ ' doesn't prevent 
                    // contextual measurement. This loop is 
                    // guaranteed to end safely because 
                    // it'll eventually hit tail, which is 
                    // a native_word node, if nothing else 
                    // intervenes.
                    main_p = link(main_pp);
                    while (node_is_invisible_to_interword_space(
                      main_p,
                    )) {
                        main_p = link(main_p);
                    }
                    if (
                        (!is_char_node(main_p))
                        && (type(main_p) == glue_node)
                    ) {
                        // We found a glue node: we might 
                        // have an inter-word space to deal 
                        // with. Again, skip nodes that 
                        // should be invisible to inter-word 
                        // spacing. We leave main_p pointing 
                        // to the glue node; main_pp is the 
                        // preceding word.
                        main_ppp = link(main_p);
                        while (node_is_invisible_to_interword_space(
                          main_ppp,
                        )) {
                            main_ppp = link(main_ppp);
                        }
                        if (main_ppp == tail) {
                            // We found a candidate 
                            // inter-word space! Collect the 
                            // characters of both words, 
                            // separated by a single space, 
                            // into a native_word node and 
                            // measure its overall width.
                            temp_ptr = new_native_word_node(
                              main_f,
                              
                                  native_length(main_pp)
                                  + 1 + native_length(tail)
                              ,
                            );
                            main_k = 0;
                            for (t in 0 to 
                                native_length(main_pp)
                                - 1
                            ) {
                                set_native_char(
                                  temp_ptr,
                                  main_k,
                                  get_native_char(
                                    main_pp,
                                    t,
                                  ),
                                );
                                incr(main_k);
                            }
                            set_native_char(
                              temp_ptr,
                              main_k,
                              ord!(" "),
                            );
                            incr(main_k);
                            for (t in 0 to 
                                native_length(tail)
                                - 1
                            ) {
                                set_native_char(
                                  temp_ptr,
                                  main_k,
                                  get_native_char(tail, t),
                                );
                                incr(main_k);
                            }
                            set_native_metrics(
                              temp_ptr,
                              XeTeX_use_glyph_metrics,
                            );
                            // The contextual space width is 
                            // the difference between this 
                            // width and the sum of the two 
                            // words measured separately.
                            t = 
                                width(temp_ptr)
                                - width(main_pp)
                                - width(tail)
                            ;
                            free_node(
                              temp_ptr,
                              native_size(temp_ptr),
                            );
                            // If the desired width differs 
                            // from the font's default word 
                            // space, we will insert a 
                            // suitable kern after the 
                            // existing glue. Because kerns 
                            // are discardable, this will 
                            // behave OK during line 
                            // breaking, and it's easier 
                            // than actually 
                            // modifying/replacing the glue 
                            // node.
                            if (
                                t
                                != width(font_glue[main_f])
                            ) {
                                temp_ptr = new_kern(
                                  
                                      t
                                      - width(
                                        font_glue[main_f],
                                      )
                                  ,
                                );
                                subtype(temp_ptr) = (
                                  space_adjustment
                                );
                                link(temp_ptr) = link(
                                  main_p,
                                );
                                link(main_p) = temp_ptr;
                            }
                        }
                    }
                }
            }
        }
        if (cur_ptr != null) {
            goto big_switch;
        } else {
            goto reswitch;
        }
        // End of added code for native fonts
    }

    adjust_space_factor

    check_for_inter_char_toks(big_switch)

    main_f = cur_font

    bchar = font_bchar[main_f]

    false_bchar = font_false_bchar[main_f]

    if (mode > 0) {
        if (language != clang) {
            fix_language;
        }
    }

    fast_get_avail(lig_stack)

    font(lig_stack) = main_f

    cur_l = qi(cur_chr)

    character(lig_stack) = cur_l

    cur_q = tail

    if (cancel_boundary) {
        cancel_boundary = false;
        main_k = non_address;
    } else {
        main_k = bchar_label[main_f];
    }

    if (main_k == non_address) {
        // no left boundary processing
        goto main_loop_move + 2;
    }

    cur_r = cur_l

    cur_l = non_char

    // begin with cursor after left boundary
    goto main_lig_loop + 1

    main_loop_wrapup:

    ⟦1089 Make a ligature node, if |ligature_present|; insert a null discretionary, if appropriate⟧

    main_loop_move:

    ⟦1090 If the cursor is immediately followed by the right boundary, |goto reswitch|; if it's followed by an invalid character, |goto big_switch|; otherwise move the cursor one step to the right and |goto main_lig_loop|⟧

    main_loop_lookahead:

    ⟦1092 Look ahead for another character, or leave |lig_stack| empty if there's none there⟧

    main_lig_loop:

    ⟦1093 If there's a ligature/kern command relevant to |cur_l| and |cur_r|, adjust the text appropriately; exit to |main_loop_wrapup|⟧

    main_loop_move_lig:

    ⟦1091 Move the cursor past a pseudo-ligature, then |goto main_loop_lookahead| or |main_lig_loop|⟧
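
The native-font branch of the section above stores each character beyond 0xffff as two 16-bit units, and the lost-character check later reassembles such pairs. The Python sketch below reproduces that arithmetic, which is the UTF-16 surrogate-pair encoding. The function names are hypothetical, and the decoder, like the loop in the section above, assumes well-formed input in which every high surrogate is followed by a low one.

```python
def to_utf16_units(usv):
    """Split a Unicode scalar value into one or two 16-bit code units."""
    if usv > 0xFFFF:
        offset = usv - 0x10000
        return [offset // 1024 + 0xD800, offset % 1024 + 0xDC00]
    return [usv]


def from_utf16_units(units):
    """Rebuild scalar values from well-formed 16-bit code units."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        i += 1
        if 0xD800 <= unit < 0xDC00:
            # high surrogate: combine with the low surrogate that follows
            unit = 0x10000 + (unit - 0xD800) * 1024 + units[i] - 0xDC00
            i += 1
        result.append(unit)
    return result
```

For instance, to_utf16_units(0x1F600) yields the pair 0xD83D, 0xDE00, and from_utf16_units turns that pair back into 0x1F600.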

1089. If link(cur_q) is nonnull when wrapup is invoked, cur_q points to the list of characters that were consumed while building the ligature character cur_l.

A discretionary break is not inserted for an explicit hyphen when we are in restricted horizontal mode. In particular, this avoids putting discretionary nodes inside of other discretionaries.

// the parameter is either rt_hit or false 
@define pack_lig(#) =>
    {
        main_p = new_ligature(main_f, cur_l, link(cur_q));
        if (lft_hit) {
            subtype(main_p) = 2;
            lft_hit = false;
        }
        if (#) {
            if (lig_stack == null) {
                incr(subtype(main_p));
                rt_hit = false;
            }
        }
        link(cur_q) = main_p;
        tail = main_p;
        ligature_present = false;
    }
@define wrapup(#) =>
    if (cur_l < non_char) {
        if (link(cur_q) > null) {
            if (character(tail) == qi(hyphen_char[main_f])) {
                ins_disc = true;
            }
        }
        if (ligature_present) {
            pack_lig(#);
        }
        if (ins_disc) {
            ins_disc = false;
            if (mode > 0) {
                tail_append(new_disc);
            }
        }
    }
⟦1089 Make a ligature node, if |ligature_present|; insert a null discretionary, if appropriate⟧ = ⟦
    wrapup(rt_hit)
⟧

1090.

⟦1090 If the cursor is immediately followed by the right boundary, |goto reswitch|; if it's followed by an invalid character, |goto big_switch|; otherwise move the cursor one step to the right and |goto main_lig_loop|⟧ = ⟦
    if (lig_stack == null) {
        goto reswitch;
    }

    cur_q = tail

    cur_l = character(lig_stack)

    main_loop_move + 1:
      if (!is_char_node(lig_stack)) {
          goto main_loop_move_lig;
      }

    main_loop_move + 2:
      if (
          (
              qo(
                effective_char(false, main_f, qi(cur_chr)),
              )
              > font_ec[main_f]
          )
          || (
              qo(
                effective_char(false, main_f, qi(cur_chr)),
              )
              < font_bc[main_f]
          )
      ) {
          char_warning(main_f, cur_chr);
          free_avail(lig_stack);
          goto big_switch;
      }

    main_i = effective_char_info(main_f, cur_l)

    if (!char_exists(main_i)) {
        char_warning(main_f, cur_chr);
        free_avail(lig_stack);
        goto big_switch;
    }

    link(tail) = lig_stack

    tail = lig_stack //  main_loop_lookahead is next
⟧

1091. Here we are at main_loop_move_lig. When we begin this code we have cur_q == tail and cur_l == character(lig_stack).

⟦1091 Move the cursor past a pseudo-ligature, then |goto main_loop_lookahead| or |main_lig_loop|⟧ = ⟦
    main_p = lig_ptr(lig_stack)

    if (main_p > null) {
        // append a single character
        tail_append(main_p);
    }

    temp_ptr = lig_stack

    lig_stack = link(temp_ptr)

    // {\sl Sync\TeX} watch point: proper size!
    free_node(temp_ptr, small_node_size)

    main_i = char_info(main_f)(cur_l)

    ligature_present = true

    if (lig_stack == null) {
        if (main_p > null) {
            goto main_loop_lookahead;
        } else {
            cur_r = bchar;
        }
    } else {
        cur_r = character(lig_stack);
    }

    goto main_lig_loop
⟧

1092. The result of \char can participate in a ligature or kern, so we must look ahead for it.

⟦1092 Look ahead for another character, or leave |lig_stack| empty if there's none there⟧ = ⟦
    get_next // set only cur_cmd and cur_chr , for speed

    if (cur_cmd == letter) {
        goto main_loop_lookahead + 1;
    }

    if (cur_cmd == other_char) {
        goto main_loop_lookahead + 1;
    }

    if (cur_cmd == char_given) {
        goto main_loop_lookahead + 1;
    }

    // now expand and set cur_cmd , cur_chr , cur_tok 
    x_token

    if (cur_cmd == letter) {
        goto main_loop_lookahead + 1;
    }

    if (cur_cmd == other_char) {
        goto main_loop_lookahead + 1;
    }

    if (cur_cmd == char_given) {
        goto main_loop_lookahead + 1;
    }

    if (cur_cmd == char_num) {
        scan_char_num;
        cur_chr = cur_val;
        goto main_loop_lookahead + 1;
    }

    if (cur_cmd == no_boundary) {
        bchar = non_char;
    }

    cur_r = bchar

    lig_stack = null

    goto main_lig_loop

    main_loop_lookahead + 1:
      adjust_space_factor;;

    check_for_inter_char_toks(big_switch)

    fast_get_avail(lig_stack)

    font(lig_stack) = main_f

    cur_r = qi(cur_chr)

    character(lig_stack) = cur_r

    if (cur_r == false_bchar) {
        // this prevents spurious ligatures
        cur_r = non_char;
    }
⟧

1093. Even though comparatively few characters have a lig/kern program, several of the instructions here count as part of TEX’s inner loop, since a potentially long sequential search must be performed. For example, tests with Computer Modern Roman showed that about 40 per cent of all characters actually encountered in practice had a lig/kern program, and that about four lig/kern commands were investigated for every such character.

At the beginning of this code we have main_i == char_info(main_f)(cur_l).
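The sequential search mentioned above can be sketched in Python as follows. This is a simplified model, not the program's own data layout: the record type LigKernStep and the function find_lig_kern are hypothetical names, the value 128 for STOP_FLAG is an assumption about a constant defined elsewhere in the program, and the initial indirection through lig_kern_restart is omitted.

```python
from typing import NamedTuple, Optional, Sequence

STOP_FLAG = 128  # assumed value of stop_flag; defined outside this section


class LigKernStep(NamedTuple):
    """One four-byte lig/kern instruction (hypothetical Python rendering)."""
    skip_byte: int
    next_char: int
    op_byte: int
    rem_byte: int


def find_lig_kern(program: Sequence[LigKernStep], start: int, cur_r: int) -> Optional[int]:
    """Return the index of the first live instruction whose next_char is cur_r, else None."""
    k = start
    while True:
        step = program[k]
        if step.next_char == cur_r and step.skip_byte <= STOP_FLAG:
            return k  # a ligature or kern command applies to the current pair
        if step.skip_byte == 0:
            k += 1  # fall through to the following instruction
        elif step.skip_byte >= STOP_FLAG:
            return None  # end of this character's program
        else:
            k += step.skip_byte + 1  # skip over the indicated number of instructions


# A made-up program for a left character that forms ligatures with i, l, and f.
program = [
    LigKernStep(0, ord("i"), 0, 12),
    LigKernStep(0, ord("l"), 0, 13),
    LigKernStep(128, ord("f"), 0, 11),
]
```

With this made-up program, a right character "f" is found only after all three instructions have been examined, which is why the search counts as inner-loop work.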

⟦1093 If there's a ligature/kern command relevant to |cur_l| and |cur_r|, adjust the text appropriately; exit to |main_loop_wrapup|⟧ = ⟦
    if (char_tag(main_i) != lig_tag) {
        goto main_loop_wrapup;
    }

    if (cur_r == non_char) {
        goto main_loop_wrapup;
    }

    main_k = lig_kern_start(main_f)(main_i)

    main_j = font_info[main_k].qqqq

    if (skip_byte(main_j) <= stop_flag) {
        goto main_lig_loop + 2;
    }

    main_k = lig_kern_restart(main_f)(main_j)

    main_lig_loop + 1:
      main_j = font_info[main_k].qqqq;;

    main_lig_loop + 2:
      if (next_char(main_j) == cur_r) {
          if (skip_byte(main_j) <= stop_flag) {
              ⟦1094 Do ligature or kern command, returning to |main_lig_loop| or |main_loop_wrapup| or |main_loop_move|⟧
          }
      }

    if (skip_byte(main_j) == qi(0)) {
        incr(main_k);
    } else {
        if (skip_byte(main_j) >= stop_flag) {
            goto main_loop_wrapup;
        }
        main_k = main_k + qo(skip_byte(main_j)) + 1;
    }

    goto main_lig_loop + 1
⟧

1094. When a ligature or kern instruction matches a character, we know from read_font_info that the character exists in the font, even though we haven’t verified its existence in the normal way.

This section could be made into a subroutine, if the code inside main_control needs to be shortened.

⟦1094 Do ligature or kern command, returning to |main_lig_loop| or |main_loop_wrapup| or |main_loop_move|⟧ = ⟦
    {
        if (op_byte(main_j) >= kern_flag) {
            wrapup(rt_hit);
            tail_append(
              new_kern(char_kern(main_f)(main_j)),
            );
            goto main_loop_move;
        }
        if (cur_l == non_char) {
            lft_hit = true;
        } else if (lig_stack == null) {
            rt_hit = true;
        }
        // allow a way out in case there's an infinite 
        // ligature loop
        check_interrupt;
        case op_byte(main_j) {
          qi(1), qi(5):
            // \.{=:\?}, \.{=:\?>}
            cur_l = rem_byte(main_j);
            main_i = char_info(main_f)(cur_l);
            ligature_present = true;
          qi(2), qi(6):
            // \.{\?=:}, \.{\?=:>}
            cur_r = rem_byte(main_j);
            // right boundary character is being consumed
            if (lig_stack == null) {
                lig_stack = new_lig_item(cur_r);
                bchar = non_char;
            } else //  link ( lig_stack ) == null 
            if (is_char_node(lig_stack)) {
                main_p = lig_stack;
                lig_stack = new_lig_item(cur_r);
                lig_ptr(lig_stack) = main_p;
            } else {
                character(lig_stack) = cur_r;
            }
          qi(3):
            // \.{\?=:\?}
            cur_r = rem_byte(main_j);
            main_p = lig_stack;
            lig_stack = new_lig_item(cur_r);
            link(lig_stack) = main_p;
          qi(7), qi(11):
            // \.{\?=:\?>}, \.{\?=:\?>>}
            wrapup(false);
            cur_q = tail;
            cur_l = rem_byte(main_j);
            main_i = char_info(main_f)(cur_l);
            ligature_present = true;
          othercases:
            cur_l = rem_byte(main_j);
            // \.{=:}
            ligature_present = true;
            if (lig_stack == null) {
                goto main_loop_wrapup;
            } else {
                goto main_loop_move + 1;
            }
        }
        if (op_byte(main_j) > qi(4)) {
            if (op_byte(main_j) != qi(7)) {
                goto main_loop_wrapup;
            }
        }
        if (cur_l < non_char) {
            goto main_lig_loop;
        }
        main_k = bchar_label[main_f];
        goto main_lig_loop + 1;
    }
⟧

1095. The occurrence of blank spaces is almost part of TEX’s inner loop, since we usually encounter about one space for every five non-blank characters. Therefore main_control gives second-highest priority to ordinary spaces.

When a glue parameter like \spaceskip is set to ‘0pt’, we will see to it later that the corresponding glue specification is precisely zero_glue, not merely a pointer to some specification that happens to be full of zeroes. Therefore it is simple to test whether a glue parameter is zero or not.

⟦1095 Append a normal inter-word space to the current list, then |goto big_switch|⟧ = ⟦
    if (space_skip == zero_glue) {
        ⟦1096 Find the glue specification, |main_p|, for text spaces in the current font⟧
        temp_ptr = new_glue(main_p);
    } else {
        temp_ptr = new_param_glue(space_skip_code);
    }

    link(tail) = temp_ptr

    tail = temp_ptr

    goto big_switch
⟧

1096. Having font_glue allocated for each text font saves both time and memory. If any of the three spacing parameters are subsequently changed by the use of \fontdimen, the find_font_dimen procedure deallocates the font_glue specification allocated here.

⟦1096 Find the glue specification, |main_p|, for text spaces in the current font⟧ = ⟦
    {
        main_p = font_glue[cur_font];
        if (main_p == null) {
            main_p = new_spec(zero_glue);
            main_k = param_base[cur_font] + space_code;
            // that's space ( cur_font ) 
            width(main_p) = font_info[main_k].sc;
            // and space_stretch ( cur_font ) 
            stretch(main_p) = font_info[main_k + 1].sc;
            // and space_shrink ( cur_font ) 
            shrink(main_p) = font_info[main_k + 2].sc;
            font_glue[cur_font] = main_p;
        }
    }
⟧
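The caching scheme for font_glue can be modeled in a few lines of Python. This is a sketch under assumptions: the class FontGlueCache, its method names, and the dictionary-based parameter store are hypothetical stand-ins for font_glue, font_info, and find_font_dimen; the numeric values are merely illustrative.

```python
SPACING_PARAMS = ("space", "space_stretch", "space_shrink")


class FontGlueCache:
    """Lazily builds one glue specification per font and reuses it (hypothetical model)."""

    def __init__(self, font_params):
        self._params = font_params  # font name -> dict of font parameters
        self._glue = {}             # font name -> cached (width, stretch, shrink)
        self.allocations = 0        # how many specifications were built

    def glue_for(self, font):
        spec = self._glue.get(font)
        if spec is None:
            p = self._params[font]
            spec = tuple(p[name] for name in SPACING_PARAMS)
            self._glue[font] = spec
            self.allocations += 1
        return spec

    def set_param(self, font, name, value):
        self._params[font][name] = value
        # Changing a spacing parameter invalidates the cached specification.
        if name in SPACING_PARAMS:
            self._glue.pop(font, None)


cache = FontGlueCache(
    {"f": {"space": 218453, "space_stretch": 109226, "space_shrink": 72818, "quad": 655361}}
)
first = cache.glue_for("f")
second = cache.glue_for("f")   # reuses the cached specification
cache.set_param("f", "space", 200000)
third = cache.glue_for("f")    # rebuilt after the parameter change
```

Every space in running text asks for this specification, so building it once per font rather than once per space is where the time and memory are saved.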

1097.

⟦1097 Declare action procedures for use by |main_control|⟧ = ⟦
    // handle spaces when space_factor != 1000 
    function app_space() {
        var
          q: pointer; // glue node
        
        if (
            (space_factor >= 2000)
            && (xspace_skip != zero_glue)
        ) {
            q = new_param_glue(xspace_skip_code);
        } else {
            if (space_skip != zero_glue) {
                main_p = space_skip;
            } else {
                ⟦1096 Find the glue specification, |main_p|, for text spaces in the current font⟧
            }
            main_p = new_spec(main_p);
            ⟦1098 Modify the glue specification in |main_p| according to the space factor⟧
            q = new_glue(main_p);
            glue_ref_count(main_p) = null;
        }
        link(tail) = q;
        tail = q;
    }
⟧

1098.

⟦1098 Modify the glue specification in |main_p| according to the space factor⟧ = ⟦
    if (space_factor >= 2000) {
        width(main_p) = 
            width(main_p)
            + extra_space(cur_font)
        ;
    }

    stretch(main_p) = xn_over_d(
      stretch(main_p),
      space_factor,
      1000,
    )

    shrink(main_p) = xn_over_d(
      shrink(main_p),
      1000,
      space_factor,
    )
⟧
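The arithmetic of the space-factor adjustment can be tried out with a small Python sketch. It rests on assumptions: xn_over_d here is a simplified stand-in that truncates toward zero and performs no overflow checking, adjust_space is a hypothetical helper, and the dimensions are illustrative values in scaled points.

```python
def xn_over_d(x: int, n: int, d: int) -> int:
    """Simplified stand-in for xn_over_d: x*n/d truncated toward zero, no overflow checks."""
    q = abs(x) * n // d
    return q if x >= 0 else -q


def adjust_space(width, stretch, shrink, space_factor, extra_space):
    """Return (width, stretch, shrink) of an inter-word space for the given space factor."""
    if space_factor >= 2000:
        width += extra_space  # extra width after sentence-ending punctuation
    stretch = xn_over_d(stretch, space_factor, 1000)  # stretch grows with the factor
    shrink = xn_over_d(shrink, 1000, space_factor)    # shrink diminishes with the factor
    return width, stretch, shrink


normal = adjust_space(218453, 109226, 72818, 1000, 72818)
after_period = adjust_space(218453, 109226, 72818, 3000, 72818)
```

A space factor of 1000 leaves the specification unchanged, while a factor of 3000 triples the stretch, divides the shrink by three, and adds the extra space to the width.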

1099. Whew—that covers the main loop. We can now proceed at a leisurely pace through the other combinations of possibilities.

// for mode-independent commands
@define any_mode(#) => vmode + #, hmode + #, mmode + #
⟦1099 Cases of |main_control| that are not part of the inner loop⟧ = ⟦
    any_mode(relax),
    vmode + spacer,
    mmode + spacer,
    mmode + no_boundary:
      do_nothing;;

    any_mode(ignore_spaces):
      if (cur_chr == 0) {
          ⟦440 Get the next non-blank non-call token⟧
          goto reswitch;
      } else {
          t = scanner_status;
          scanner_status = normal;
          get_next;
          scanner_status = t;
          if (cur_cs < hash_base) {
              cur_cs = prim_lookup(cur_cs - single_base);
          } else {
              cur_cs = prim_lookup(text(cur_cs));
          }
          if (cur_cs != undefined_primitive) {
              cur_cmd = prim_eq_type(cur_cs);
              cur_chr = prim_equiv(cur_cs);
              cur_tok = 
                  cs_token_flag
                  + prim_eqtb_base + cur_cs
              ;
              goto reswitch;
          }
      }

    vmode + stop:
      if (its_all_over) {
          // this is the only way out
          return;
      }

    ⟦1102 Forbidden cases detected in |main_control|⟧

    any_mode(mac_param):
      report_illegal_case;;

    ⟦1100 Math-only cases in non-math modes, or vice versa⟧:
      insert_dollar_sign;

    ⟦1110 Cases of |main_control| that build boxes and lists⟧

    ⟦1264 Cases of |main_control| that don't depend on |mode|⟧

    ⟦1402 Cases of |main_control| that are for extensions to \TeX⟧

1100. Here is a list of cases where the user has probably gotten into or out of math mode by mistake. TEX will insert a dollar sign and rescan the current token.

@define non_math(#) => vmode + #, hmode + #
⟦1100 Math-only cases in non-math modes, or vice versa⟧ = ⟦
    non_math(sup_mark),
    non_math(sub_mark),
    non_math(math_char_num),
    non_math(math_given),
    non_math(XeTeX_math_given),
    non_math(math_comp),
    non_math(delim_num),
    non_math(left_right),
    non_math(above),
    non_math(radical),
    non_math(math_style),
    non_math(math_choice),
    non_math(vcenter),
    non_math(non_script),
    non_math(mkern),
    non_math(limit_switch),
    non_math(mskip),
    non_math(math_accent),
    mmode + endv,
    mmode + par_end,
    mmode + stop,
    mmode + vskip,
    mmode + un_vbox,
    mmode + valign,
    mmode + hrule
⟧

1101.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function insert_dollar_sign() {
        back_input;
        cur_tok = math_shift_token + ord!("$");
        print_err(strpool!("Missing $ inserted"));
        help2(
          strpool!("I've inserted a begin-math/end-math symbol since I think"),
        )(
          strpool!("you left one out. Proceed, with fingers crossed."),
        );
        ins_error;
    }
⟧

1102. When erroneous situations arise, TEX usually issues an error message specific to the particular error. For example, ‘\noalign’ should not appear in any mode, since it is recognized by the align_peek routine in all of its legitimate appearances; a special error message is given when ‘\noalign’ occurs elsewhere. But sometimes the most appropriate error message is simply that the user is not allowed to do what he or she has attempted. For example, ‘\moveleft’ is allowed only in vertical mode, and ‘\lower’ only in non-vertical modes. Such cases are enumerated here and in the other sections referred to under ‘See also ….’

⟦1102 Forbidden cases detected in |main_control|⟧ = ⟦
    vmode + vmove,
    hmode + hmove,
    mmode + hmove,
    any_mode(last_item),
⟧

1103. The ‘you_cant’ procedure prints a line saying that the current command is illegal in the current mode; it identifies these things symbolically.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function you_cant() {
        print_err(strpool!("You can't use `"));
        print_cmd_chr(cur_cmd, cur_chr);
        print_in_mode(mode);
    }
⟧

1104.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function report_illegal_case() {
        you_cant;
        help4(
          strpool!("Sorry, but I'm not programmed to handle this case;"),
        )(
          strpool!("I'll just pretend that you didn't ask for it."),
        )(
          strpool!("If you're in the wrong mode, you might be able to"),
        )(
          strpool!("return to the right one by typing `I}' or `I$' or `I\\par'."),
        );
        error;
    }
⟧

1105. Some operations are allowed only in privileged modes, i.e., when mode > 0. The privileged function detects violations of this rule; it issues an error message and returns false if the current mode is negative.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function privileged(): boolean {
        if (mode > 0) {
            privileged = true;
        } else {
            report_illegal_case;
            privileged = false;
        }
    }
⟧

1106. Either \dump or \end will cause main_control to enter the endgame, since both of them have ‘stop’ as their command code.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("end"), stop, 0)

    primitive(strpool!("dump"), stop, 1)
⟧

1107.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    stop:

    if (chr_code == 1) {
        print_esc(strpool!("dump"));
    } else {
        print_esc(strpool!("end"));
    }
⟧

1108. We don’t want to leave main_control immediately when a stop command is sensed, because it may be necessary to invoke an \output routine several times before things really grind to a halt. (The output routine might even say ‘\gdef\end{...}’, to prolong the life of the job.) Therefore its_all_over is true only when the current page and contribution list are empty, and when the last output was not a “dead cycle.”

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    // do this when \.{\\end} or \.{\\dump} occurs
    function its_all_over(): boolean {
        label exit;
        
        if (privileged) {
            if (
                (page_head == page_tail)
                && (head == tail) && (dead_cycles == 0)
            ) {
                its_all_over = true;
                return;
            }
            // we will try to end again after ejecting 
            // residual material
            back_input;
            tail_append(new_null_box);
            width(tail) = hsize;
            tail_append(new_glue(fill_glue));
            tail_append(new_penalty(-0x40000000));
            // append \.{\\hbox to 
            // \\hsize\{\}\\vfill\\penalty-'10000000000}
            build_page;
        }
        its_all_over = false;
      exit:
    }
⟧

1109. [47] Building boxes and lists. The most important parts of main_control are concerned with TEX’s chief mission of box-making. We need to control the activities that put entries on vlists and hlists, as well as the activities that convert those lists into boxes. All of the necessary machinery has already been developed; it remains for us to “push the buttons” at the right times.

1110. As an introduction to these routines, let’s consider one of the simplest cases: What happens when ‘\hrule’ occurs in vertical mode, or ‘\vrule’ in horizontal mode or math mode? The code in main_control is short, since the scan_rule_spec routine already does most of what is required; thus, there is no need for a special action procedure.

Note that baselineskip calculations are disabled after a rule in vertical mode, by setting prev_depth = ignore_depth.

⟦1110 Cases of |main_control| that build boxes and lists⟧ = ⟦
    vmode + hrule, hmode + vrule, mmode + vrule:
      tail_append(scan_rule_spec);
      if (abs(mode) == vmode) {
          prev_depth = ignore_depth;
      } else if (abs(mode) == hmode) {
          space_factor = 1000;
      }
⟧

1111. The processing of things like \hskip and \vskip is slightly more complicated. But the code in main_control is very short, since it simply calls on the action routine append_glue. Similarly, \kern activates append_kern.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    vmode + vskip,
    hmode + hskip,
    mmode + hskip,
    mmode + mskip:
      append_glue;;

    any_mode(kern), mmode + mkern:
      append_kern;;
⟧

1112. The hskip and vskip command codes are used for control sequences like \hss and \vfil as well as for \hskip and \vskip. The difference is in the value of cur_chr.

@define fil_code => 0 // identifies \.{\\hfil} and \.{\\vfil}
@define fill_code => 1 // identifies \.{\\hfill} and \.{\\vfill}
@define ss_code => 2 // identifies \.{\\hss} and \.{\\vss}
@define fil_neg_code => 3 // identifies \.{\\hfilneg} and \.{\\vfilneg}
@define skip_code => 4 // identifies \.{\\hskip} and \.{\\vskip}
@define mskip_code => 5 // identifies \.{\\mskip}
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("hskip"), hskip, skip_code)

    primitive(strpool!("hfil"), hskip, fil_code)

    primitive(strpool!("hfill"), hskip, fill_code)

    primitive(strpool!("hss"), hskip, ss_code)

    primitive(strpool!("hfilneg"), hskip, fil_neg_code)

    primitive(strpool!("vskip"), vskip, skip_code)

    primitive(strpool!("vfil"), vskip, fil_code)

    primitive(strpool!("vfill"), vskip, fill_code)

    primitive(strpool!("vss"), vskip, ss_code)

    primitive(strpool!("vfilneg"), vskip, fil_neg_code)

    primitive(strpool!("mskip"), mskip, mskip_code)

    primitive(strpool!("kern"), kern, explicit)

    primitive(strpool!("mkern"), mkern, mu_glue)
⟧
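The way one command code serves a whole family of control sequences, distinguished only by cur_chr, can be pictured as a small table. The Python sketch below is an illustration only; the dictionary SKIP_SUFFIX and the function skip_name are hypothetical, and the numeric codes are the ones defined in this section.

```python
# Modifier codes as defined above: fil_code .. skip_code.
SKIP_SUFFIX = {0: "fil", 1: "fill", 2: "ss", 3: "filneg", 4: "skip"}


def skip_name(cmd: str, chr_code: int) -> str:
    """Name the control sequence for a (command code, modifier) pair."""
    prefix = {"hskip": "h", "vskip": "v"}[cmd]  # same modifiers, different direction
    return "\\" + prefix + SKIP_SUFFIX[chr_code]


names = [skip_name("hskip", code) for code in range(5)]
```

Unlike the printing routine of the program, which treats every unlisted code as the filneg case, this sketch raises KeyError for an unknown code.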

1113.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    hskip:

    case chr_code {
      skip_code:
        print_esc(strpool!("hskip"));
      fil_code:
        print_esc(strpool!("hfil"));
      fill_code:
        print_esc(strpool!("hfill"));
      ss_code:
        print_esc(strpool!("hss"));
      othercases:
        print_esc(strpool!("hfilneg"));
    }

    vskip:

    case chr_code {
      skip_code:
        print_esc(strpool!("vskip"));
      fil_code:
        print_esc(strpool!("vfil"));
      fill_code:
        print_esc(strpool!("vfill"));
      ss_code:
        print_esc(strpool!("vss"));
      othercases:
        print_esc(strpool!("vfilneg"));
    }

    mskip:

    print_esc(strpool!("mskip"))

    kern:

    print_esc(strpool!("kern"))

    mkern:

    print_esc(strpool!("mkern"))
⟧

1114. All the work relating to glue creation has been relegated to the following subroutine. It does not call build_page, because it is used in at least one place where that would be a mistake.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function append_glue() {
        var
          s: small_number; // modifier of skip command
        
        s = cur_chr;
        case s {
          fil_code:
            cur_val = fil_glue;
          fill_code:
            cur_val = fill_glue;
          ss_code:
            cur_val = ss_glue;
          fil_neg_code:
            cur_val = fil_neg_glue;
          skip_code:
            scan_glue(glue_val);
          mskip_code:
            scan_glue(mu_val);
        }
        // now cur_val points to the glue specification
        tail_append(new_glue(cur_val));
        if (s >= skip_code) {
            decr(glue_ref_count(cur_val));
            if (s > skip_code) {
                subtype(tail) = mu_glue;
            }
        }
    }
⟧

1115.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function append_kern() {
        var
          s: quarterword; //  subtype of the kern node
        
        s = cur_chr;
        scan_dimen(s == mu_glue, false, false);
        tail_append(new_kern(cur_val));
        subtype(tail) = s;
    }
⟧

1116. Many of the actions related to box-making are triggered by the appearance of braces in the input. For example, when the user says ‘\hbox to 100pt{hlist}’ in vertical mode, the information about the box size (100pt, exactly) is put onto save_stack with a level boundary word just above it, and cur_group = adjusted_hbox_group; TEX enters restricted horizontal mode to process the hlist. The right brace eventually causes save_stack to be restored to its former state, at which time the information about the box size (100pt, exactly) is available once again; a box is packaged and we leave restricted horizontal mode, appending the new box to the current list of the enclosing mode (in this case to the current list of vertical mode), followed by any vertical adjustments that were removed from the box by hpack.

The next few sections of the program are therefore concerned with the treatment of left and right curly braces.

1117. If a left brace occurs in the middle of a page or paragraph, it simply introduces a new level of grouping, and the matching right brace will not have such a drastic effect. Such grouping affects neither the mode nor the current list.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    non_math(left_brace):
      new_save_level(simple_group);;

    any_mode(begin_group):
      new_save_level(semi_simple_group);;

    any_mode(end_group):
      if (cur_group == semi_simple_group) {
          unsave;
      } else {
          off_save;
      }
⟧

1118. We have to deal with errors in which braces and such things are not properly nested. Sometimes the user makes an error of commission by inserting an extra symbol, but sometimes the user makes an error of omission. TEX can’t always tell one from the other, so it makes a guess and tries to avoid getting into a loop.

The off_save routine is called when the current group code is wrong. It tries to insert something into the user’s input that will help clean off the top level.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function off_save() {
        var
          p: pointer; // inserted token
        
        if (cur_group == bottom_level) {
            ⟦1120 Drop current token and complain that it was unmatched⟧
        } else {
            back_input;
            p = get_avail;
            link(temp_head) = p;
            print_err(strpool!("Missing "));
            ⟦1119 Prepare to insert a token that matches |cur_group|, and print what it is⟧
            print(strpool!(" inserted"));
            ins_list(link(temp_head));
            help5(
              strpool!("I've inserted something that you may have forgotten."),
            )(strpool!("(See the <inserted text> above.)"))(
              strpool!("With luck, this will get me unwedged. But if you"),
            )(
              strpool!("really didn't forget anything, try typing `2' now; then"),
            )(
              strpool!("my insertion and my current dilemma will both disappear."),
            );
            error;
        }
    }
⟧

1119. At this point, link(temp_head) == p, a pointer to an empty one-word node.

⟦1119 Prepare to insert a token that matches |cur_group|, and print what it is⟧ = ⟦
    case cur_group {
      semi_simple_group:
        info(p) = cs_token_flag + frozen_end_group;
        print_esc(strpool!("endgroup"));
      math_shift_group:
        info(p) = math_shift_token + ord!("$");
        print_char(ord!("$"));
      math_left_group:
        info(p) = cs_token_flag + frozen_right;
        link(p) = get_avail;
        p = link(p);
        info(p) = other_token + ord!(".");
        print_esc(strpool!("right."));
      othercases:
        info(p) = right_brace_token + ord!("}");
        print_char(ord!("}"));
    }
⟧
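The choice of inserted tokens amounts to a lookup keyed by the current group code. The Python sketch below shows that idea only; the function closing_tokens and the use of strings for group codes and tokens are hypothetical simplifications.

```python
def closing_tokens(cur_group: str) -> list:
    """Return the tokens that would close a group of the given kind."""
    table = {
        "semi_simple_group": ["\\endgroup"],
        "math_shift_group": ["$"],
        "math_left_group": ["\\right", "."],  # two tokens: \right and a null delimiter
    }
    # Every other group is assumed to be closed by a right brace.
    return table.get(cur_group, ["}"])


inserted = closing_tokens("math_left_group")
```

The bottom_level case never reaches this lookup, since off_save handles it separately by complaining about the extra token.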

1120.

⟦1120 Drop current token and complain that it was unmatched⟧ = ⟦
    {
        print_err(strpool!("Extra "));
        print_cmd_chr(cur_cmd, cur_chr);
        help1(
          strpool!("Things are pretty mixed up, but I think the worst is over."),
        );
        error;
    }
⟧

1121. The routine for a right_brace character branches into many subcases, since a variety of things may happen, depending on cur_group. Some types of groups are not supposed to be ended by a right brace; error messages are given in hopes of pinpointing the problem. Most branches of this routine will be filled in later, when we are ready to understand them; meanwhile, we must prepare ourselves to deal with such errors.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    any_mode(right_brace):
      handle_right_brace;;
⟧

1122.

⟦1122 Declare the procedure called |handle_right_brace|⟧ = ⟦
    function handle_right_brace() {
        var
          p, q: pointer, // for short-term use
          d: scaled, // holds split_max_depth in insert_group
          f: integer; // holds floating_penalty in insert_group
        
        case cur_group {
          simple_group:
            unsave;
          bottom_level:
            print_err(strpool!("Too many }'s"));
            help2(
              strpool!("You've closed more groups than you opened."),
            )(
              strpool!("Such booboos are generally harmless, so keep going."),
            );
            error;
          semi_simple_group,
          math_shift_group,
          math_left_group:
            extra_right_brace;
          ⟦1139 Cases of |handle_right_brace| where a |right_brace| triggers a delayed action⟧
          othercases:
            confusion(strpool!("rightbrace"));
        }
    }
⟧

1123.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function extra_right_brace() {
        print_err(strpool!("Extra }, or forgotten "));
        case cur_group {
          semi_simple_group:
            print_esc(strpool!("endgroup"));
          math_shift_group:
            print_char(ord!("$"));
          math_left_group:
            print_esc(strpool!("right"));
        }
        help5(
          strpool!("I've deleted a group-closing symbol because it seems to be"),
        )(
          strpool!("spurious, as in `$x}$'. But perhaps the } is legitimate and"),
        )(
          strpool!("you forgot something else, as in `\\hbox{$x}'. In such cases"),
        )(
          strpool!("the way to recover is to insert both the forgotten and the"),
        )(
          strpool!("deleted material, e.g., by typing `I$}'."),
        );
        error;
        incr(align_state);
    }
⟧

1124. Here is where we clear the parameters that are supposed to revert to their default values after every paragraph and when internal vertical mode is entered.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function normal_paragraph() {
        if (looseness != 0) {
            eq_word_define(int_base + looseness_code, 0);
        }
        if (hang_indent != 0) {
            eq_word_define(
              dimen_base + hang_indent_code,
              0,
            );
        }
        if (hang_after != 1) {
            eq_word_define(int_base + hang_after_code, 1);
        }
        if (par_shape_ptr != null) {
            eq_define(par_shape_loc, shape_ref, null);
        }
        if (inter_line_penalties_ptr != null) {
            eq_define(
              inter_line_penalties_loc,
              shape_ref,
              null,
            );
        }
    }
⟧

1125. Now let’s turn to the question of how \hbox is treated. We actually need to consider also a slightly larger context, since constructions like ‘\setbox3=\hbox...’ and ‘\leaders\hbox...’ and ‘\lower3.8pt\hbox...’ are supposed to invoke quite different actions after the box has been packaged. Conversely, constructions like ‘\setbox3=’ can be followed by a variety of different kinds of boxes, and we would like to encode such things in an efficient way.

In other words, there are two problems: to represent the context of a box, and to represent its type.

The first problem is solved by putting a “context code” on the save_stack, just below the two entries that give the dimensions produced by scan_spec. The context code is either a (signed) shift amount, or it is a large integer >= box_flag, where box_flag == 2^{30}. Codes box_flag through global_box_flag - 1 represent ‘\setbox0’ through ‘\setbox32767’; codes global_box_flag through ship_out_flag - 1 represent ‘\global\setbox0’ through ‘\global\setbox32767’; code ship_out_flag represents ‘\shipout’; and codes leader_flag through leader_flag + 2 represent ‘\leaders’, ‘\cleaders’, and ‘\xleaders’.

The second problem is solved by giving the command code make_box to all control sequences that produce a box, and by using the following chr_code values to distinguish between them: box_code, copy_code, last_box_code, vsplit_code, vtop_code, vtop_code + vmode, and vtop_code + hmode, where the latter two are used to denote \vbox and \hbox, respectively.
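The ranges of the context code can be decoded mechanically. The following Python sketch uses the four flag values defined in this section; the function describe_box_context and its output strings are hypothetical and serve only to make the encoding concrete.

```python
BOX_FLAG = 0x40000000         # 2^30, context code for \setbox0
GLOBAL_BOX_FLAG = 0x40008000  # context code for \global\setbox0
SHIP_OUT_FLAG = 0x40010000    # context code for \shipout
LEADER_FLAG = 0x40010001      # context code for \leaders


def describe_box_context(code: int) -> str:
    """Translate a context code into a readable description."""
    if code < BOX_FLAG:
        return f"shift {code}"  # a signed shift amount
    if code < GLOBAL_BOX_FLAG:
        return f"\\setbox{code - BOX_FLAG}"
    if code < SHIP_OUT_FLAG:
        return f"\\global\\setbox{code - GLOBAL_BOX_FLAG}"
    if code == SHIP_OUT_FLAG:
        return "\\shipout"
    leaders = ("\\leaders", "\\cleaders", "\\xleaders")
    if LEADER_FLAG <= code <= LEADER_FLAG + 2:
        return leaders[code - LEADER_FLAG]
    raise ValueError(f"not a valid context code: {code}")


examples = [
    describe_box_context(-5),
    describe_box_context(BOX_FLAG + 3),
    describe_box_context(GLOBAL_BOX_FLAG + 32767),
    describe_box_context(SHIP_OUT_FLAG),
    describe_box_context(LEADER_FLAG + 1),
]
```

Because the ranges are contiguous and ordered, a single integer on the save stack suffices to tell all of these contexts apart.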

// context code for `\.{\\setbox0}'
@define box_flag => 0x40000000
// context code for `\.{\\global\\setbox0}'
@define global_box_flag => 0x40008000
// context code for `\.{\\shipout}'
@define ship_out_flag => 0x40010000
// context code for `\.{\\leaders}'
@define leader_flag => 0x40010001
@define box_code => 0 //  chr_code for `\.{\\box}'
@define copy_code => 1 //  chr_code for `\.{\\copy}'
@define last_box_code => 2 //  chr_code for `\.{\\lastbox}'
@define vsplit_code => 3 //  chr_code for `\.{\\vsplit}'
@define vtop_code => 4 //  chr_code for `\.{\\vtop}'
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("moveleft"), hmove, 1)

    primitive(strpool!("moveright"), hmove, 0)

    primitive(strpool!("raise"), vmove, 1)

    primitive(strpool!("lower"), vmove, 0)

    primitive(strpool!("box"), make_box, box_code)

    primitive(strpool!("copy"), make_box, copy_code)

    primitive(strpool!("lastbox"), make_box, last_box_code)

    primitive(strpool!("vsplit"), make_box, vsplit_code)

    primitive(strpool!("vtop"), make_box, vtop_code)

    primitive(strpool!("vbox"), make_box, vtop_code + vmode)

    primitive(strpool!("hbox"), make_box, vtop_code + hmode)

    //  ship_out_flag == leader_flag - 1 
    primitive(
      strpool!("shipout"),
      leader_ship,
      a_leaders - 1,
    )

    primitive(strpool!("leaders"), leader_ship, a_leaders)

    primitive(strpool!("cleaders"), leader_ship, c_leaders)

    primitive(strpool!("xleaders"), leader_ship, x_leaders)
⟧
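
The ranges of context codes described above can be decoded mechanically. The following Python sketch is not part of the program; it restates the four flag values defined in this section, while the function name describe_context and the tuples it returns are invented purely for illustration.

```python
# Flag values copied from the @define lines of this section.
BOX_FLAG = 0x40000000         # \setbox0
GLOBAL_BOX_FLAG = 0x40008000  # \global\setbox0
SHIP_OUT_FLAG = 0x40010000    # \shipout
LEADER_FLAG = 0x40010001      # \leaders


def describe_context(code):
    """Return a (kind, value) pair for a context code (hypothetical helper)."""
    if code < BOX_FLAG:
        return ("shift", code)  # a signed shift amount
    if code < GLOBAL_BOX_FLAG:
        return ("setbox", code - BOX_FLAG)
    if code < SHIP_OUT_FLAG:
        return ("global setbox", code - GLOBAL_BOX_FLAG)
    if code == SHIP_OUT_FLAG:
        return ("shipout", None)
    names = ("leaders", "cleaders", "xleaders")
    index = code - LEADER_FLAG
    if 0 <= index < len(names):
        return (names[index], None)
    raise ValueError("not a valid context code")
```

For instance, describe_context(BOX_FLAG + 3) yields ("setbox", 3), which corresponds to ‘\setbox3=’.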

1126.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    hmove:

    if (chr_code == 1) {
        print_esc(strpool!("moveleft"));
    } else {
        print_esc(strpool!("moveright"));
    }

    vmove:

    if (chr_code == 1) {
        print_esc(strpool!("raise"));
    } else {
        print_esc(strpool!("lower"));
    }

    make_box:

    case chr_code {
      box_code:
        print_esc(strpool!("box"));
      copy_code:
        print_esc(strpool!("copy"));
      last_box_code:
        print_esc(strpool!("lastbox"));
      vsplit_code:
        print_esc(strpool!("vsplit"));
      vtop_code:
        print_esc(strpool!("vtop"));
      vtop_code + vmode:
        print_esc(strpool!("vbox"));
      othercases:
        print_esc(strpool!("hbox"));
    }

    leader_ship:

    if (chr_code == a_leaders) {
        print_esc(strpool!("leaders"));
    } else if (chr_code == c_leaders) {
        print_esc(strpool!("cleaders"));
    } else if (chr_code == x_leaders) {
        print_esc(strpool!("xleaders"));
    } else {
        print_esc(strpool!("shipout"));
    }
⟧

1127. Constructions that require a box are started by calling scan_box with a specified context code. The scan_box routine verifies that a make_box command comes next and then it calls begin_box.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    vmode + hmove, hmode + vmove, mmode + vmove:
      t = cur_chr;
      scan_normal_dimen;
      if (t == 0) {
          scan_box(cur_val);
      } else {
          scan_box(-cur_val);
      }

    any_mode(leader_ship):
      scan_box(leader_flag - a_leaders + cur_chr);;

    any_mode(make_box):
      begin_box(0);;
⟧
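
The arithmetic that turns a command's chr_code into a context code can be checked with a small Python sketch. It is only an illustration: the values 100, 101, and 102 for a_leaders, c_leaders, and x_leaders are assumed here (they are defined elsewhere in the program, not in this section), and the two function names are hypothetical.

```python
SHIP_OUT_FLAG = 0x40010000
LEADER_FLAG = 0x40010001
# Assumed glue subtypes; they are defined outside this section.
A_LEADERS, C_LEADERS, X_LEADERS = 100, 101, 102


def move_context(chr_code, dimen):
    # hmove/vmove: chr_code 0 (moveright, lower) keeps the sign of the
    # scanned dimension, chr_code 1 (moveleft, raise) negates it.
    return dimen if chr_code == 0 else -dimen


def leader_ship_context(chr_code):
    # leader_ship: shipout was entered with chr_code a_leaders - 1,
    # so it lands exactly on ship_out_flag == leader_flag - 1.
    return LEADER_FLAG - A_LEADERS + chr_code
```

With these assumptions, leader_ship_context(A_LEADERS - 1) equals SHIP_OUT_FLAG, which is why a single case of main_control can serve both ‘\shipout’ and the three kinds of leaders.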

1128. The global variable cur_box will point to a newly made box. If the box is void, we will have cur_box == null. Otherwise we will have type(cur_box) == hlist_node or vlist_node or rule_node; the rule_node case can occur only with leaders.

⟦13 Global variables⟧ += ⟦
    // box to be placed into its context
    var cur_box: pointer;
⟧

1129. The box_end procedure does the right thing with cur_box, if box_context represents the context as explained above.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function box_end(box_context: integer) {
        var
          p: pointer, //  ord_noad for new box in math mode
          a: small_number; // global prefix
        
        if (box_context < box_flag) {
            ⟦1130 Append box |cur_box| to the current list, shifted by |box_context|⟧
        } else if (box_context < ship_out_flag) {
            ⟦1131 Store \(c)|cur_box| in a box register⟧
        } else if (cur_box != null) {
            if (box_context > ship_out_flag) {
                ⟦1132 Append a new leader node that uses |cur_box|⟧
            } else {
                ship_out(cur_box);
            }
        }
    }
⟧

1130. The global variable adjust_tail will be non-null if and only if the current box might include adjustments that should be appended to the current vertical list.

⟦1130 Append box |cur_box| to the current list, shifted by |box_context|⟧ = ⟦
    {
        if (cur_box != null) {
            shift_amount(cur_box) = box_context;
            if (abs(mode) == vmode) {
                if (pre_adjust_tail != null) {
                    if (pre_adjust_head != pre_adjust_tail) {
                        append_list(pre_adjust_head)(
                          pre_adjust_tail,
                        );
                    }
                    pre_adjust_tail = null;
                }
                append_to_vlist(cur_box);
                if (adjust_tail != null) {
                    if (adjust_head != adjust_tail) {
                        append_list(adjust_head)(
                          adjust_tail,
                        );
                    }
                    adjust_tail = null;
                }
                if (mode > 0) {
                    build_page;
                }
            } else {
                if (abs(mode) == hmode) {
                    space_factor = 1000;
                } else {
                    p = new_noad;
                    math_type(nucleus(p)) = sub_box;
                    info(nucleus(p)) = cur_box;
                    cur_box = p;
                }
                link(tail) = cur_box;
                tail = cur_box;
            }
        }
    }
⟧

1131.

⟦1131 Store \(c)|cur_box| in a box register⟧ = ⟦
    {
        if (box_context < global_box_flag) {
            cur_val = box_context - box_flag;
            a = 0;
        } else {
            cur_val = box_context - global_box_flag;
            a = 4;
        }
        if (cur_val < 256) {
            define(box_base + cur_val, box_ref, cur_box);
        } else {
            sa_def_box;
        }
    }
⟧
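
The register computation just shown can be mirrored in a few lines of Python. This is a sketch under stated assumptions: the name register_target and the strings "eqtb" and "sparse array" are labels invented here to distinguish the define branch from the sa_def_box branch, and the prefix value 4 is taken to mean ‘\global’ as it does elsewhere in the program.

```python
BOX_FLAG = 0x40000000
GLOBAL_BOX_FLAG = 0x40008000
SHIP_OUT_FLAG = 0x40010000


def register_target(box_context):
    """Return (register number, prefix code a, storage label)."""
    if not (BOX_FLAG <= box_context < SHIP_OUT_FLAG):
        raise ValueError("not a \\setbox context code")
    if box_context < GLOBAL_BOX_FLAG:
        number, prefix = box_context - BOX_FLAG, 0         # local assignment
    else:
        number, prefix = box_context - GLOBAL_BOX_FLAG, 4  # global assignment
    # Registers 0..255 use define; larger numbers go through sa_def_box.
    storage = "eqtb" if number < 256 else "sparse array"
    return number, prefix, storage
```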

1132.

⟦1132 Append a new leader node that uses |cur_box|⟧ = ⟦
    {
        ⟦438 Get the next non-blank non-relax non-call token⟧
        if (
            ((cur_cmd == hskip) && (abs(mode) != vmode))
            || ((cur_cmd == vskip) && (abs(mode) == vmode))
        ) {
            append_glue;
            subtype(tail) = 
                box_context
                - (leader_flag - a_leaders)
            ;
            leader_ptr(tail) = cur_box;
        } else {
            print_err(
              strpool!("Leaders not followed by proper glue"),
            );
            help3(
              strpool!("You should say `\\leaders <box or rule><hskip or vskip>'."),
            )(
              strpool!("I found the <box or rule>, but there's no suitable"),
            )(
              strpool!("<hskip or vskip>, so I'm ignoring these leaders."),
            );
            back_error;
            flush_node_list(cur_box);
        }
    }
⟧

1133. Now that we can see what eventually happens to boxes, we can consider the first steps in their creation. The begin_box routine is called when box_context is a context specification, cur_chr specifies the type of box desired, and cur_cmd == make_box.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function begin_box(box_context: integer) {
        label exit, done;
        var
          p, q: pointer, // run through the current list
          r: pointer, // running behind p 
          fm: boolean, // a final \.{\\beginM} \.{\\endM} 
          // node pair?
          tx: pointer, // effective tail node
          m: quarterword, // the length of a replacement 
          // list
          k: halfword, // 0 or vmode or hmode 
          n: halfword; // a box number
        
        case cur_chr {
          box_code:
            scan_register_num;
            fetch_box(cur_box);
            // the box becomes void, at the same level
            change_box(null);
          copy_code:
            scan_register_num;
            fetch_box(q);
            cur_box = copy_node_list(q);
          last_box_code:
            ⟦1134 If the current list ends with a box node, delete it from the list and make |cur_box| point to it; otherwise set |cur_box:=null|⟧
          vsplit_code:
            ⟦1136 Split off part of a vertical box, make |cur_box| point to it⟧
          othercases:
            ⟦1137 Initiate the construction of an hbox or vbox, then |return|⟧
        }
        // in simple cases, we use the box immediately
        box_end(box_context);
      exit:
    }
⟧

1134. Note that the condition !is_char_node(tail) implies that head != tail, since head is a one-word node.

// extract tx , drop \.{\\beginM} \.{\\endM} pair
@define fetch_effective_tail_eTeX(#) =>
    q = head;
    p = null;
    repeat {
        r = p;
        p = q;
        fm = false;
        if (!is_char_node(q)) {
            if (type(q) == disc_node) {
                for (m in 1 to replace_count(q)) {
                    p = link(p);
                }
                if (p == tx) {
                    #;
                }
            } else if (
                (type(q) == math_node)
                && (subtype(q) == begin_M_code)
            ) {
                fm = true;
            }
        }
        q = link(p);
    } until (q == tx); // found r $\to$ p $\to$ q == tx
    q = link(tx);
    link(p) = q;
    link(tx) = null;
    if (q == null) {
        if (fm) {
            confusion(strpool!("tail1"));
        } else {
            tail = p;
        }
    } else //  r $\to$ p == begin_M $\to$ q == end_M 
    if (fm) {
        tail = r;
        link(r) = null;
        flush_node_list(p);
    }
@define check_effective_tail(#) => find_effective_tail_eTeX(#)
@define fetch_effective_tail => fetch_effective_tail_eTeX
⟦1134 If the current list ends with a box node, delete it from the list and make |cur_box| point to it; otherwise set |cur_box:=null|⟧ = ⟦
    {
        cur_box = null;
        if (abs(mode) == mmode) {
            you_cant;
            help1(
              strpool!("Sorry; this \\lastbox will be void."),
            );
            error;
        } else if ((mode == vmode) && (head == tail)) {
            you_cant;
            help2(
              strpool!("Sorry...I usually can't take things from the current page."),
            )(
              strpool!("This \\lastbox will therefore be void."),
            );
            error;
        } else {
            check_effective_tail(goto done);
            if (!is_char_node(tx)) {
                if (
                    (type(tx) == hlist_node)
                    || (type(tx) == vlist_node)
                ) {
                    ⟦1135 Remove the last box, unless it's part of a discretionary⟧
                }
            }
          done:
        }
    }
⟧

1135.

⟦1135 Remove the last box, unless it's part of a discretionary⟧ = ⟦
    {
        fetch_effective_tail(goto done);
        cur_box = tx;
        shift_amount(cur_box) = 0;
    }
⟧
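
The linear scan performed by fetch_effective_tail can be pictured on an ordinary singly linked list. The Python sketch below is a deliberate simplification, not the program's method: it ignores the \beginM/\endM bookkeeping entirely, assumes that tx is the true last node of the list, and uses a hypothetical Node class in place of the real memory layout.

```python
class Node:
    """A toy list node; kind and replace_count stand in for mem fields."""

    def __init__(self, kind, replace_count=0):
        self.kind = kind
        self.replace_count = replace_count
        self.link = None


def make_list(*nodes):
    """Chain the nodes behind a dummy head node and return the head."""
    head = Node("head")
    prev = head
    for node in nodes:
        prev.link = node
        prev = node
    return head


def remove_tail(head, tx):
    """Unlink tx and return the new tail, or None if tx must stay put."""
    q = head
    while True:
        p = q
        if p.kind == "disc":
            # Skip over the replacement list of a discretionary.
            for _ in range(p.replace_count):
                p = p.link
            if p is tx:
                return None  # tx is part of the discretionary; leave it
        q = p.link
        if q is tx:
            break
    p.link = tx.link
    tx.link = None
    return p
```

The early return plays the role of the macro argument (‘goto done’ or ‘return’), which abandons the operation when the last box belongs to a discretionary.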

1136. Here we deal with things like ‘\vsplit 13 to 100pt’.

⟦1136 Split off part of a vertical box, make |cur_box| point to it⟧ = ⟦
    {
        scan_register_num;
        n = cur_val;
        if (!scan_keyword(strpool!("to"))) {
            print_err(strpool!("Missing `to' inserted"));
            help2(
              strpool!("I'm working on `\\vsplit<box number> to <dimen>';"),
            )(strpool!("will look for the <dimen> next."));
            error;
        }
        scan_normal_dimen;
        cur_box = vsplit(n, cur_val);
    }
⟧

1137. Here is where we enter restricted horizontal mode or internal vertical mode, in order to make a box.

⟦1137 Initiate the construction of an hbox or vbox, then |return|⟧ = ⟦
    {
        k = cur_chr - vtop_code;
        saved(0) = box_context;
        if (k == hmode) {
            if (
                (box_context < box_flag)
                && (abs(mode) == vmode)
            ) {
                scan_spec(adjusted_hbox_group, true);
            } else {
                scan_spec(hbox_group, true);
            }
        } else {
            if (k == vmode) {
                scan_spec(vbox_group, true);
            } else {
                scan_spec(vtop_group, true);
                k = vmode;
            }
            normal_paragraph;
        }
        push_nest;
        mode = -k;
        if (k == vmode) {
            prev_depth = ignore_depth;
            if (every_vbox != null) {
                begin_token_list(
                  every_vbox,
                  every_vbox_text,
                );
            }
        } else {
            space_factor = 1000;
            if (every_hbox != null) {
                begin_token_list(
                  every_hbox,
                  every_hbox_text,
                );
            }
        }
        return;
    }
⟧

1138.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    // the next input should specify a box or perhaps a rule
    function scan_box(box_context: integer) {
        ⟦438 Get the next non-blank non-relax non-call token⟧
        if (cur_cmd == make_box) {
            begin_box(box_context);
        } else if (
            (box_context >= leader_flag)
            && ((cur_cmd == hrule) || (cur_cmd == vrule))
        ) {
            cur_box = scan_rule_spec;
            box_end(box_context);
        } else {
            print_err(
              strpool!("A <box> was supposed to be here"),
            );
            help3(
              strpool!("I was expecting to see \\hbox or \\vbox or \\copy or \\box or"),
            )(
              strpool!("something like that. So you might find something missing in"),
            )(
              strpool!("your output. But keep trying; you can fix this later."),
            );
            back_error;
        }
    }
⟧

1139. When the right brace occurs at the end of an \hbox or \vbox or \vtop construction, the package routine comes into action. We might also have to finish a paragraph that hasn’t ended.

⟦1139 Cases of |handle_right_brace| where a |right_brace| triggers a delayed action⟧ = ⟦
    hbox_group:

    package(0)

    adjusted_hbox_group:

    {
        adjust_tail = adjust_head;
        pre_adjust_tail = pre_adjust_head;
        package(0);
    }

    vbox_group:

    {
        end_graf;
        package(0);
    }

    vtop_group:

    {
        end_graf;
        package(vtop_code);
    }
⟧

1140.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function package(c: small_number) {
        var
          h: scaled, // height of box
          p: pointer, // first node in a box
          d: scaled, // max depth
          u, v: integer; // saved values for upwards mode 
          // flag
        
        d = box_max_depth;
        u = XeTeX_upwards_state;
        unsave;
        save_ptr = save_ptr - 3;
        v = XeTeX_upwards_state;
        XeTeX_upwards_state = u;
        if (mode == -hmode) {
            cur_box = hpack(link(head), saved(2), saved(1));
        } else {
            cur_box = vpackage(
              link(head),
              saved(2),
              saved(1),
              d,
            );
            if (c == vtop_code) {
                ⟦1141 Readjust the height and depth of |cur_box|, for \.{\\vtop}⟧
            }
        }
        XeTeX_upwards_state = v;
        pop_nest;
        box_end(saved(0));
    }
⟧
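
The three save-stack entries that package discards can be pictured with a plain Python list. This sketch rests on an assumption about scan_spec, which is defined elsewhere: that it leaves a specification code and a dimension above the context code stored by begin_box. The class SaveStack, its method names, and the values chosen for EXACTLY and ADDITIONAL are illustrative only.

```python
BOX_FLAG = 0x40000000
EXACTLY, ADDITIONAL = 0, 1  # assumed spec codes ('to' versus 'spread')


class SaveStack:
    """A toy model of the entries that surround one box group."""

    def __init__(self):
        self.entries = []

    def open_box(self, box_context, spec_code, dimen):
        # saved(0) = box_context, followed by the two scan_spec entries.
        self.entries += [box_context, spec_code, dimen]

    def close_box(self):
        # Corresponds to save_ptr = save_ptr - 3; afterwards saved(0),
        # saved(1), saved(2) are the context, spec code, and dimension.
        box_context, spec_code, dimen = self.entries[-3:]
        del self.entries[-3:]
        return box_context, spec_code, dimen
```

Because the entries are pushed and popped in strict stack order, a box nested inside another box recovers its own context code before the outer box recovers the one stored earlier.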

1141. The height of a ‘\vtop’ box is inherited from the first item on its list, if that item is an hlist_node, vlist_node, or rule_node; otherwise the \vtop height is zero.

⟦1141 Readjust the height and depth of |cur_box|, for \.{\\vtop}⟧ = ⟦
    {
        h = 0;
        p = list_ptr(cur_box);
        if (p != null) {
            if (type(p) <= rule_node) {
                h = height(p);
            }
        }
        depth(cur_box) = 
            depth(cur_box)
            - h + height(cur_box)
        ;
        height(cur_box) = h;
    }
⟧
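
A small worked example may make the readjustment concrete. In the Python sketch below, the function name vtop_dimensions is hypothetical, and first_item_height is None whenever the list is empty or begins with something other than a box or rule.

```python
def vtop_dimensions(height, depth, first_item_height=None):
    """Return the (height, depth) of a vtop, given the vbox dimensions."""
    h = 0 if first_item_height is None else first_item_height
    # The total height + depth is preserved; only the split changes.
    return h, depth - h + height
```

A vbox of height 100 and depth 20 whose first item has height 30 thus becomes a vtop of height 30 and depth 90; the sum of height and depth is 120 in both cases.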

1142. A paragraph begins when horizontal-mode material occurs in vertical mode, or when the paragraph is explicitly started by ‘\indent’ or ‘\noindent’.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("indent"), start_par, 1)

    primitive(strpool!("noindent"), start_par, 0)
⟧

1143.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    start_par:

    if (chr_code == 0) {
        print_esc(strpool!("noindent"));
    } else {
        print_esc(strpool!("indent"));
    }
⟧

1144.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    vmode + start_par:
      new_graf(cur_chr > 0);;

    
    vmode + letter,
    vmode + other_char,
    vmode + char_num,
    vmode + char_given,
    vmode + math_shift,
    vmode + un_hbox,
    vmode + vrule,
    vmode + accent,
    vmode + discretionary,
    vmode + hskip,
    vmode + valign,
    vmode + ex_space,
    vmode + no_boundary:
      back_input;
      new_graf(true);
⟧

1145.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function norm_min(h: integer): small_number {
        if (h <= 0) {
            norm_min = 1;
        } else if (h >= 63) {
            norm_min = 63;
        } else {
            norm_min = h;
        }
    }

    function new_graf(indented: boolean) {
        prev_graf = 0;
        if ((mode == vmode) || (head != tail)) {
            tail_append(new_param_glue(par_skip_code));
        }
        push_nest;
        mode = hmode;
        space_factor = 1000;
        set_cur_lang;
        clang = cur_lang;
        prev_graf = 
            (
                norm_min(left_hyphen_min)
                * 0x40 + norm_min(right_hyphen_min)
            )
            * 0x10000 + cur_lang
        ;
        if (indented) {
            tail = new_null_box;
            link(head) = tail;
            width(tail) = par_indent;
            if ((insert_src_special_every_par)) {
                insert_src_special;
            }
        }
        if (every_par != null) {
            begin_token_list(every_par, every_par_text);
        }
        if (nest_ptr == 1) {
            // put par_skip glue on current page
            build_page;
        }
    }
⟧
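
The value that new_graf stores in prev_graf packs three small integers into one. The following Python sketch reproduces that arithmetic; the names pack_prev_graf and unpack_prev_graf are invented here, and the unpacking merely inverts the formula above, on the assumption that cur_lang is less than 0x10000.

```python
def norm_min(h):
    """Clamp a hyphen minimum to the range 1..63, as norm_min does."""
    if h <= 0:
        return 1
    if h >= 63:
        return 63
    return h


def pack_prev_graf(left_hyphen_min, right_hyphen_min, cur_lang):
    mins = norm_min(left_hyphen_min) * 0x40 + norm_min(right_hyphen_min)
    return mins * 0x10000 + cur_lang


def unpack_prev_graf(value):
    """Recover (left minimum, right minimum, language) from a packed value."""
    cur_lang = value % 0x10000
    mins = value // 0x10000
    return mins // 0x40, mins % 0x40, cur_lang
```

Clamping to 63 is what allows each hyphen minimum to fit in the six bits implied by the multiplier 0x40.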

1146.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    hmode + start_par, mmode + start_par:
      indent_in_hmode;;
⟧

1147.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function indent_in_hmode() {
        var p, q: pointer;
        
        // \.{\\indent}
        if (cur_chr > 0) {
            p = new_null_box;
            width(p) = par_indent;
            if (abs(mode) == hmode) {
                space_factor = 1000;
            } else {
                q = new_noad;
                math_type(nucleus(q)) = sub_box;
                info(nucleus(q)) = p;
                p = q;
            }
            tail_append(p);
        }
    }
⟧

1148. A paragraph ends when a par_end command is sensed, or when we are in horizontal mode when reaching the right brace of vertical-mode routines like \vbox, \insert, or \output.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    vmode + par_end:
      normal_paragraph;
      if (mode > 0) {
          build_page;
      }

    hmode + par_end:
      if (align_state < 0) {
          // this tries to recover from an alignment that 
          // didn't end properly
          off_save;
      }
      // this takes us to the enclosing mode, if mode > 0 
      end_graf;
      if (mode == vmode) {
          build_page;
      }

    
    hmode + stop,
    hmode + vskip,
    hmode + hrule,
    hmode + un_vbox,
    hmode + halign:
      head_for_vmode;;
⟧

1149.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function head_for_vmode() {
        if (mode < 0) {
            if (cur_cmd != hrule) {
                off_save;
            } else {
                print_err(strpool!("You can't use `"));
                print_esc(strpool!("hrule"));
                print(
                  strpool!("' here except with leaders"),
                );
                help2(
                  strpool!("To put a horizontal rule in an hbox or an alignment,"),
                )(
                  strpool!("you should use \\leaders or \\hrulefill (see The TeXbook)."),
                );
                error;
            }
        } else {
            back_input;
            cur_tok = par_token;
            back_input;
            token_type = inserted;
        }
    }
⟧
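
The double back_input in head_for_vmode has the effect of slipping a ‘\par’ in front of the command that has just been read. The Python sketch below models the input stack as a list whose last element is read next; this model and the function name are simplifications made for illustration, not the program's actual input mechanism.

```python
def head_for_vmode_tokens(pending, cur_tok, par_token="\\par"):
    """Return the input stack after the two back_input calls."""
    stack = list(pending)    # tokens still waiting to be read
    stack.append(cur_tok)    # first back_input: the offending command
    stack.append(par_token)  # second back_input: the inserted \par
    return stack
```

Reading from the end of the returned list gives ‘\par’ first and the original command second, so the paragraph is finished before the vertical-mode command is executed.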

1150.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function end_graf() {
        if (mode == hmode) {
            if (head == tail) {
                // null paragraphs are ignored
                pop_nest;
            } else {
                line_break(false);
            }
            if (LR_save != null) {
                flush_list(LR_save);
                LR_save = null;
            }
            normal_paragraph;
            error_count = 0;
        }
    }
⟧

1151. Insertion and adjustment and mark nodes are constructed by the following pieces of the program.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    any_mode(insert), hmode + vadjust, mmode + vadjust:
      begin_insert_or_adjust;;

    any_mode(mark):
      make_mark;;
⟧

1152.

⟦1102 Forbidden cases detected in |main_control|⟧ += ⟦
    vmode + vadjust
⟧

1153.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function begin_insert_or_adjust() {
        if (cur_cmd == vadjust) {
            cur_val = 255;
        } else {
            scan_eight_bit_int;
            if (cur_val == 255) {
                print_err(strpool!("You can't "));
                print_esc(strpool!("insert"));
                print_int(255);
                help1(
                  strpool!("I'm changing to \\insert0; box 255 is special."),
                );
                error;
                cur_val = 0;
            }
        }
        saved(0) = cur_val;
        if (
            (cur_cmd == vadjust)
            && scan_keyword(strpool!("pre"))
        ) {
            saved(1) = 1;
        } else {
            saved(1) = 0;
        }
        save_ptr = save_ptr + 2;
        new_save_level(insert_group);
        scan_left_brace;
        normal_paragraph;
        push_nest;
        mode = -vmode;
        prev_depth = ignore_depth;
    }
⟧
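
The two values that begin_insert_or_adjust leaves on the save stack can be summarized as follows. This Python sketch uses strings for the commands and a boolean for the optional keyword; these representations, and the name insert_or_adjust_codes, are assumptions made only for the example.

```python
def insert_or_adjust_codes(command, number=None, has_pre=False):
    """Return the pair (saved(0), saved(1)) for an insert or a vadjust."""
    if command == "vadjust":
        # Class 255 marks a vadjust; 'pre' is recorded in saved(1).
        return 255, (1 if has_pre else 0)
    if number == 255:
        number = 0  # after the error message, \insert255 becomes \insert0
    return number, 0
```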

1154.

⟦1139 Cases of |handle_right_brace| where a |right_brace| triggers a delayed action⟧ += ⟦
    insert_group:

    {
        end_graf;
        q = split_top_skip;
        add_glue_ref(q);
        d = split_max_depth;
        f = floating_penalty;
        unsave;
        // now saved ( 0 ) is the insertion number, or 255 
        // for vadjust 
        save_ptr = save_ptr - 2;
        p = vpack(link(head), natural);
        pop_nest;
        if (saved(0) < 255) {
            tail_append(get_node(ins_node_size));
            type(tail) = ins_node;
            subtype(tail) = qi(saved(0));
            height(tail) = height(p) + depth(p);
            ins_ptr(tail) = list_ptr(p);
            split_top_ptr(tail) = q;
            depth(tail) = d;
            float_cost(tail) = f;
        } else {
            tail_append(get_node(small_node_size));
            type(tail) = adjust_node;
            // the subtype is used for adjust_pre 
            adjust_pre(tail) = saved(1);
            adjust_ptr(tail) = list_ptr(p);
            delete_glue_ref(q);
        }
        free_node(p, box_node_size);
        if (nest_ptr == 0) {
            build_page;
        }
    }

    output_group:

    ⟦1080 Resume the page builder after an output routine has come to an end⟧
⟧

1155.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function make_mark() {
        var
          p: pointer, // new node
          c: halfword; // the mark class
        
        if (cur_chr == 0) {
            c = 0;
        } else {
            scan_register_num;
            c = cur_val;
        }
        p = scan_toks(false, true);
        p = get_node(small_node_size);
        mark_class(p) = c;
        type(p) = mark_node;
        // the subtype is not used
        subtype(p) = 0;
        mark_ptr(p) = def_ref;
        link(tail) = p;
        tail = p;
    }
⟧

1156. Penalty nodes get into a list via the break_penalty command.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    any_mode(break_penalty):
      append_penalty;;
⟧

1157.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function append_penalty() {
        scan_int;
        tail_append(new_penalty(cur_val));
        if (mode == vmode) {
            build_page;
        }
    }
⟧

1158. The remove_item command removes a penalty, kern, or glue node if it appears at the tail of the current list, using a brute-force linear scan. Like \lastbox, this command is not allowed in vertical mode (except internal vertical mode), since the current list in vertical mode is sent to the page builder. But if we happen to be able to implement it in vertical mode, we do.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    any_mode(remove_item):
      delete_last;;
⟧

1159. When delete_last is called, cur_chr is the type of node that will be deleted, if present.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function delete_last() {
        label exit;
        var
          p, q: pointer, // run through the current list
          r: pointer, // running behind p 
          fm: boolean, // a final \.{\\beginM} \.{\\endM} 
          // node pair?
          tx: pointer, // effective tail node
          m: quarterword; // the length of a replacement 
          // list
        
        if ((mode == vmode) && (tail == head)) {
            ⟦1160 Apologize for inability to do the operation now, unless \.{\\unskip} follows non-glue⟧
        } else {
            check_effective_tail(return);
            if (!is_char_node(tx)) {
                if (type(tx) == cur_chr) {
                    fetch_effective_tail(return);
                    flush_node_list(tx);
                }
            }
        }
      exit:
    }
⟧

1160.

⟦1160 Apologize for inability to do the operation now, unless \.{\\unskip} follows non-glue⟧ = ⟦
    {
        if (
            (cur_chr != glue_node)
            || (last_glue != max_halfword)
        ) {
            you_cant;
            help2(
              strpool!("Sorry...I usually can't take things from the current page."),
            )(
              strpool!("Try `I\\vskip-\\lastskip' instead."),
            );
            if (cur_chr == kern_node) {
                help_line[0] = (strpool!("Try `I\\kern-\\lastkern' instead."));
            } else if (cur_chr != glue_node) {
                help_line[0] = (strpool!("Perhaps you can make the output routine do it."));
            }
            error;
        }
    }
⟧
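
The choice of help text above depends on which of the three commands was given. A Python sketch of the decision follows; it represents cur_chr by a string and the test last_glue != max_halfword by a boolean, both of which are simplifications, and the function name is hypothetical.

```python
FIRST_LINE = "Sorry...I usually can't take things from the current page."


def delete_last_complaint(cur_chr, last_glue_is_max_halfword):
    """Return the two help lines, or None if nothing is reported."""
    if cur_chr == "glue" and last_glue_is_max_halfword:
        return None  # \unskip after non-glue: silently do nothing
    if cur_chr == "kern":
        second = "Try `I\\kern-\\lastkern' instead."
    elif cur_chr != "glue":
        second = "Perhaps you can make the output routine do it."
    else:
        second = "Try `I\\vskip-\\lastskip' instead."
    return FIRST_LINE, second
```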

1161.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("unpenalty"),
      remove_item,
      penalty_node,
    )

    primitive(strpool!("unkern"), remove_item, kern_node)

    primitive(strpool!("unskip"), remove_item, glue_node)

    primitive(strpool!("unhbox"), un_hbox, box_code)

    primitive(strpool!("unhcopy"), un_hbox, copy_code)

    primitive(strpool!("unvbox"), un_vbox, box_code)

    primitive(strpool!("unvcopy"), un_vbox, copy_code)
⟧

1162.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    remove_item:

    if (chr_code == glue_node) {
        print_esc(strpool!("unskip"));
    } else if (chr_code == kern_node) {
        print_esc(strpool!("unkern"));
    } else {
        print_esc(strpool!("unpenalty"));
    }

    un_hbox:

    if (chr_code == copy_code) {
        print_esc(strpool!("unhcopy"));
    } else {
        print_esc(strpool!("unhbox"));
    }

    un_vbox:

    if (chr_code == copy_code) {
        print_esc(strpool!("unvcopy"));
    }

    ⟦1673 Cases of |un_vbox| for |print_cmd_chr|⟧

    else

    print_esc(strpool!("unvbox"))
⟧

1163. The un_hbox and un_vbox commands unwrap one of the 256 current boxes.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    vmode + un_vbox, hmode + un_hbox, mmode + un_hbox:
      unpackage;;
⟧

1164.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function unpackage() {
        label done, exit;
        var
          p: pointer, // the box
          r: pointer, // to remove marginal kern nodes
          c: box_code .. copy_code; // should we copy?
        
        if (cur_chr > copy_code) {
            ⟦1674 Handle saved items and |goto done|⟧
        }
        c = cur_chr;
        scan_register_num;
        fetch_box(p);
        if (p == null) {
            return;
        }
        if (
            (abs(mode) == mmode)
            || (
                (abs(mode) == vmode)
                && (type(p) != vlist_node)
            )
            || (
                (abs(mode) == hmode)
                && (type(p) != hlist_node)
            )
        ) {
            print_err(
              strpool!("Incompatible list can't be unboxed"),
            );
            help3(
              strpool!("Sorry, Pandora. (You sneaky devil.)"),
            )(
              strpool!("I refuse to unbox an \\hbox in vertical mode or vice versa."),
            )(
              strpool!("And I can't open any boxes in math mode."),
            );
            error;
            return;
        }
        if (c == copy_code) {
            link(tail) = copy_node_list(list_ptr(p));
        } else {
            link(tail) = list_ptr(p);
            change_box(null);
            free_node(p, box_node_size);
        }
      done:
        while (link(tail) != null) {
            r = link(tail);
            if (!
                is_char_node(r)
                && (type(r) == margin_kern_node)
            ) {
                link(tail) = link(r);
                free_node(r, margin_kern_node_size);
            }
            tail = link(tail);
        }
      exit:
    }
⟧
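
The compatibility test in unpackage and the final loop that discards margin kerns can each be stated in a line or two. In the Python sketch below, modes and node types are represented by strings and the unwrapped list by a Python list; these are assumptions for illustration, not the program's data structures.

```python
def can_unbox(mode, box_type):
    """True if a box of this type may be unwrapped in this (absolute) mode."""
    return (mode, box_type) in {("vertical", "vlist"), ("horizontal", "hlist")}


def strip_margin_kerns(node_types):
    """Drop margin kern nodes from the unwrapped list, keeping the rest."""
    return [t for t in node_types if t != "margin_kern"]
```

Math mode never appears in the set of permitted pairs, which corresponds to the message ‘And I can't open any boxes in math mode.’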

1165.

⟦1102 Forbidden cases detected in |main_control|⟧ += ⟦
    vmode + ital_corr
⟧

1166. Italic corrections are converted to kern nodes when the ital_corr command follows a character. In math mode the same effect is achieved by appending a kern of zero here, since italic corrections are supplied later.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    hmode + ital_corr:
      append_italic_correction;;

    mmode + ital_corr:
      tail_append(new_kern(0));;
⟧

1167.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function append_italic_correction() {
        label exit;
        var
          p: pointer, //  char_node at the tail of the 
          // current list
          f: internal_font_number; // the font in the 
          // char_node 
        
        if (tail != head) {
            if (is_char_node(tail)) {
                p = tail;
            } else if (type(tail) == ligature_node) {
                p = lig_char(tail);
            } else if ((type(tail) == whatsit_node)) {
                if (is_native_word_subtype(tail)) {
                    tail_append(
                      new_kern(
                        get_native_italic_correction(tail),
                      ),
                    );
                    subtype(tail) = explicit;
                } else if ((subtype(tail) == glyph_node)) {
                    tail_append(
                      new_kern(
                        get_native_glyph_italic_correction(
                          tail,
                        ),
                      ),
                    );
                    subtype(tail) = explicit;
                }
                return;
            } else {
                return;
            }
            f = font(p);
            tail_append(
              new_kern(
                char_italic(f)(char_info(f)(character(p))),
              ),
            );
            subtype(tail) = explicit;
        }
      exit:
    }
⟧

1168. Discretionary nodes are easy in the common case ‘\-’, but in the general case we must process three braces full of items.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(ord!("-"), discretionary, 1)

    primitive(strpool!("discretionary"), discretionary, 0)
⟧

1169.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    discretionary:

    if (chr_code == 1) {
        print_esc(ord!("-"));
    } else {
        print_esc(strpool!("discretionary"));
    }
⟧

1170.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    hmode + discretionary, mmode + discretionary:
      append_discretionary;;
⟧

1171. The space factor does not change when we append a discretionary node, but it starts out as 1000 in the subsidiary lists.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function append_discretionary() {
        var
          c: integer; // hyphen character
        
        tail_append(new_disc);
        if (cur_chr == 1) {
            c = hyphen_char[cur_font];
            if (c >= 0) {
                if (c <= biggest_char) {
                    pre_break(tail) = new_character(
                      cur_font,
                      c,
                    );
                }
            }
        } else {
            incr(save_ptr);
            saved(-1) = 0;
            new_save_level(disc_group);
            scan_left_brace;
            push_nest;
            mode = -hmode;
            space_factor = 1000;
        }
    }
⟧

1172. The three discretionary lists are constructed somewhat as if they were hboxes. A subroutine called build_discretionary handles the transitions. (This is sort of fun.)

⟦1139 Cases of |handle_right_brace| where a |right_brace| triggers a delayed action⟧ += ⟦
    disc_group:
      build_discretionary;;
⟧

1173.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function build_discretionary() {
        label done, exit;
        var
          p, q: pointer, // for link manipulation
          n: integer; // length of discretionary list
        
        unsave;
        ⟦1175 Prune the current list, if necessary, until it contains only |char_node|, |kern_node|, |hlist_node|, |vlist_node|, |rule_node|, and |ligature_node| items; set |n| to the length of the list, and set |q| to the list's tail⟧
        p = link(head);
        pop_nest;
        case saved(-1) {
          0:
            pre_break(tail) = p;
          1:
            post_break(tail) = p;
          2:
            ⟦1174 Attach list |p| to the current list, and record its length; then finish up and |return|⟧
          // there are no other cases
        }
        incr(saved(-1));
        new_save_level(disc_group);
        scan_left_brace;
        push_nest;
        mode = -hmode;
        space_factor = 1000;
      exit:
    }
⟧
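The three-pass bookkeeping above can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions: the three brace groups are given as plain lists, the dictionary keys merely echo the field names used above, and the "too long" check of section 1174 is omitted. The function name `build_discretionary_sketch` is hypothetical.

```python
def build_discretionary_sketch(groups, in_math=False):
    """Model how the counter saved(-1) routes three brace groups.

    groups: a sequence of three lists (pre-break, post-break, replacement).
    Returns a dict describing the resulting discretionary plus any errors.
    """
    disc = {"pre_break": [], "post_break": [], "replace_count": 0,
            "following": [], "errors": []}
    for counter, items in enumerate(groups):  # counter plays the role of saved(-1)
        if counter == 0:
            disc["pre_break"] = list(items)
        elif counter == 1:
            disc["post_break"] = list(items)
        else:
            if items and in_math:
                # the third part must be empty in math formulas
                disc["errors"].append("Illegal math \\discretionary")
                items = []
            # the third list is linked after the disc node, not stored in it
            disc["following"] = list(items)
            disc["replace_count"] = len(items)
    return disc


d = build_discretionary_sketch([["f", "-"], ["f"], ["ff"]])
m = build_discretionary_sketch([["-"], [], ["x"]], in_math=True)
```

In the sketch, as in the code above, only the length of the third list is recorded in the discretionary itself; its nodes become ordinary members of the enclosing list.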

1174.

⟦1174 Attach list |p| to the current list, and record its length; then finish up and |return|⟧ = ⟦
    {
        if ((n > 0) && (abs(mode) == mmode)) {
            print_err(strpool!("Illegal math "));
            print_esc(strpool!("discretionary"));
            help2(
              strpool!("Sorry: The third part of a discretionary break must be"),
            )(
              strpool!("empty, in math formulas. I had to delete your third part."),
            );
            flush_node_list(p);
            n = 0;
            error;
        } else {
            link(tail) = p;
        }
        if (n <= max_quarterword) {
            replace_count(tail) = n;
        } else {
            print_err(
              strpool!("Discretionary list is too long"),
            );
            help2(
              strpool!("Wow---I never thought anybody would tweak me here."),
            )(
              strpool!("You can't seriously need such a huge discretionary list?"),
            );
            error;
        }
        if (n > 0) {
            tail = q;
        }
        decr(save_ptr);
        return;
    }
⟧

1175. During this loop, p == link(q) and there are n items preceding p.

⟦1175 Prune the current list, if necessary, until it contains only |char_node|, |kern_node|, |hlist_node|, |vlist_node|, |rule_node|, and |ligature_node| items; set |n| to the length of the list, and set |q| to the list's tail⟧ = ⟦
    q = head

    p = link(q)

    n = 0

    while (p != null) {
        if (!is_char_node(p)) {
            if (type(p) > rule_node) {
                if (type(p) != kern_node) {
                    if (type(p) != ligature_node) {
                        if (
                            (type(p) != whatsit_node)
                            || (!
                                is_native_word_subtype(p)
                                && (subtype(p) != glyph_node)
                            )
                        ) {
                            print_err(
                              strpool!("Improper discretionary list"),
                            );
                            help1(
                              strpool!("Discretionary lists must contain only boxes and kerns."),
                            );
                            error;
                            begin_diagnostic;
                            print_nl(
                              strpool!("The following discretionary sublist has been deleted:"),
                            );
                            show_box(p);
                            end_diagnostic(true);
                            flush_node_list(p);
                            link(q) = null;
                            goto done;
                        }
                    }
                }
            }
        }
        q = p;
        p = link(q);
        incr(n);
    }

    done:
⟧
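As a rough illustration of the pruning loop, the following Python sketch works on a list of node-type names rather than on linked nodes in mem. The set `ALLOWED` and the function name are assumptions made for the example; the real test above also admits native-word and glyph whatsits, which the sketch leaves out.

```python
# Node kinds that may stay in a discretionary list (simplified assumption).
ALLOWED = {"char", "hlist", "vlist", "rule", "kern", "ligature"}


def prune_discretionary(nodes):
    """Keep the longest prefix of allowed nodes.

    Returns (kept, n, deleted): the surviving prefix, its length n, and the
    sublist that would be shown to the user and flushed.
    """
    n = 0
    for kind in nodes:
        if kind not in ALLOWED:
            break  # corresponds to the error branch followed by `goto done`
        n += 1
    return nodes[:n], n, nodes[n:]


kept, n, deleted = prune_discretionary(["char", "kern", "glue", "char"])
```

Everything from the first improper node onward is discarded, including later nodes that would have been acceptable on their own.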

1176. We need only one more thing to complete the horizontal mode routines, namely the \accent primitive.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    hmode + accent:
      make_accent;;
⟧

1177. The positioning of accents is straightforward but tedious. Given an accent of width a, designed for characters of height x and slant s; and given a character of width w, height h, and slant t: We will shift the accent down by x - h, and we will insert kern nodes that have the effect of centering the accent over the character and shifting the accent to the right by $\delta=\frac{1}{2}(w-a)+h\cdot t-x\cdot s$. If either character is absent from the font, we will simply use the other, without shifting.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function make_accent() {
        var
          s, t: real, // amount of slant
          p, q, r: pointer, // character, box, and kern 
          // nodes
          f: internal_font_number, // relevant font
          a, h, x, w, delta, lsb, rsb: scaled, // heights 
          // and widths, as explained above
          i: four_quarters; // character information
        
        scan_char_num;
        f = cur_font;
        p = new_character(f, cur_val);
        if (p != null) {
            x = x_height(f);
            s = slant(f) / float_constant(65536);
            if (is_native_font(f)) {
                a = width(p);
                if (a == 0) {
                    get_native_char_sidebearings(
                      f,
                      cur_val,
                      addressof(lsb),
                      addressof(rsb),
                    );
                }
            } else {
                a = char_width(f)(
                  char_info(f)(character(p)),
                );
            }
            do_assignments;
            ⟦1178 Create a character node |q| for the next character, but set |q:=null| if problems arise⟧
            if (q != null) {
                ⟦1179 Append the accent with appropriate kerns, then set |p:=q|⟧
            }
            link(tail) = p;
            tail = p;
            space_factor = 1000;
        }
    }
⟧

1178.

⟦1178 Create a character node |q| for the next character, but set |q:=null| if problems arise⟧ = ⟦
    q = null

    f = cur_font

    if (
        (cur_cmd == letter)
        || (cur_cmd == other_char)
        || (cur_cmd == char_given)
    ) {
        q = new_character(f, cur_chr);
        cur_val = cur_chr;
    } else if (cur_cmd == char_num) {
        scan_char_num;
        q = new_character(f, cur_val);
    } else {
        back_input;
    }
⟧

1179. The kern nodes appended here must be distinguished from other kerns, lest they be wiped away by the hyphenation algorithm or by a previous line break.

The two kerns are computed with (machine-dependent) real arithmetic, but their sum is machine-independent; the net effect is machine-independent, because the user cannot remove these nodes nor access them via \lastkern.

⟦1179 Append the accent with appropriate kerns, then set |p:=q|⟧ = ⟦
    {
        t = slant(f) / float_constant(65536);
        if (is_native_font(f)) {
            w = width(q);
            // using delta as scratch space for the unneeded 
            // depth value
            get_native_char_height_depth(
              f,
              cur_val,
              addressof(h),
              addressof(delta),
            );
        } else {
            i = char_info(f)(character(q));
            w = char_width(f)(i);
            h = char_height(f)(height_depth(i));
        }
        // the accent must be shifted up or down
        if (h != x) {
            p = hpack(p, natural);
            shift_amount(p) = x - h;
        }
        // special case for non-spacing marks
        if (is_native_font(f) && (a == 0)) {
            delta = round(
              
                  (w - lsb + rsb)
                  / float_constant(2) + h * t - x * s
              ,
            );
        } else {
            delta = round(
              (w - a) / float_constant(2) + h * t - x * s,
            );
        }
        r = new_kern(delta);
        subtype(r) = acc_kern;
        link(tail) = r;
        link(r) = p;
        tail = new_kern(-a - delta);
        subtype(tail) = acc_kern;
        link(p) = tail;
        p = q;
    }
⟧
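The arithmetic of the two accent kerns can be checked with a short Python sketch. It covers only the ordinary (non-native-font) branch, takes all dimensions as integers in scaled units, and uses hypothetical sample values. Python's `round` may differ from the Pascal `round` on exact halves, which is one face of the machine dependence mentioned above.

```python
def accent_kerns(a, w, h, x, s, t):
    """Return (shift, first_kern, second_kern) for an accent.

    a: accent width; w, h: width and height of the accented character;
    x: x-height of the accent's font; s, t: slants of the two fonts.
    """
    shift = x - h  # shift_amount of the boxed accent (used when h != x)
    delta = round((w - a) / 2 + h * t - x * s)
    return shift, delta, -a - delta


# Hypothetical sample values, chosen only to make the arithmetic visible.
a = 300000
shift, k1, k2 = accent_kerns(a=a, w=500000, h=450000, x=430000, s=0.25, t=0.25)
net_advance = k1 + a + k2  # first kern + accent width + second kern
```

Whatever value the rounding produces for the first kern, the second kern cancels it together with the accent's width, so the accent contributes no net horizontal advance.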

1180. When ‘\cr’ or ‘\span’ or a tab mark comes through the scanner into main_control, it might be that the user has foolishly inserted one of them into something that has nothing to do with alignment. But it is far more likely that a left brace or right brace has been omitted, since get_next takes actions appropriate to alignment only when ‘\cr’ or ‘\span’ or tab marks occur with align_state == 0. The following program attempts to make an appropriate recovery.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    any_mode(car_ret), any_mode(tab_mark):
      align_error;;

    any_mode(no_align):
      no_align_error;;

    any_mode(omit):
      omit_error;;
⟧

1181.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function align_error() {
        if (abs(align_state) > 2) {
            ⟦1182 Express consternation over the fact that no alignment is in progress⟧
        } else {
            back_input;
            if (align_state < 0) {
                print_err(strpool!("Missing { inserted"));
                incr(align_state);
                cur_tok = left_brace_token + ord!("{");
            } else {
                print_err(strpool!("Missing } inserted"));
                decr(align_state);
                cur_tok = right_brace_token + ord!("}");
            }
            help3(
              strpool!("I've put in what seems to be necessary to fix"),
            )(
              strpool!("the current column of the current alignment."),
            )(
              strpool!("Try to go on, since this might almost work."),
            );
            ins_error;
        }
    }
⟧
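The decision made by align_error depends only on align_state, and can be summarized in a few lines of Python. This is a sketch of the branching alone; the returned strings are labels invented for the example, and the token insertion and help messages are not modeled.

```python
def align_recovery(align_state):
    """Return (action, new_align_state) following the branches of align_error."""
    if abs(align_state) > 2:
        # far from balance: no alignment seems to be in progress
        return ("misplaced", align_state)
    if align_state < 0:
        # one or two right braces too many: supply a left brace
        return ("insert {", align_state + 1)
    # otherwise supply a right brace
    return ("insert }", align_state - 1)
```

For instance, `align_recovery(-1)` yields `("insert {", 0)`, after which the offending token, having been backed up, is read again with align_state == 0.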

1182.

⟦1182 Express consternation over the fact that no alignment is in progress⟧ = ⟦
    {
        print_err(strpool!("Misplaced "));
        print_cmd_chr(cur_cmd, cur_chr);
        if (cur_tok == tab_token + ord!("&")) {
            help6(
              strpool!("I can't figure out why you would want to use a tab mark"),
            )(
              strpool!("here. If you just want an ampersand, the remedy is"),
            )(
              strpool!("simple: Just type `I\\&' now. But if some right brace"),
            )(
              strpool!("up above has ended a previous alignment prematurely,"),
            )(
              strpool!("you're probably due for more error messages, and you"),
            )(
              strpool!("might try typing `S' now just to see what is salvageable."),
            );
        } else {
            help5(
              strpool!("I can't figure out why you would want to use a tab mark"),
            )(
              strpool!("or \\cr or \\span just now. If something like a right brace"),
            )(
              strpool!("up above has ended a previous alignment prematurely,"),
            )(
              strpool!("you're probably due for more error messages, and you"),
            )(
              strpool!("might try typing `S' now just to see what is salvageable."),
            );
        }
        error;
    }
⟧

1183. The help messages here contain a little white lie, since \noalign and \omit are allowed also after ‘\noalign{...}’.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function no_align_error() {
        print_err(strpool!("Misplaced "));
        print_esc(strpool!("noalign"));
        help2(
          strpool!("I expect to see \\noalign only after the \\cr of"),
        )(
          strpool!("an alignment. Proceed, and I'll ignore this case."),
        );
        error;
    }

    function omit_error() {
        print_err(strpool!("Misplaced "));
        print_esc(strpool!("omit"));
        help2(
          strpool!("I expect to see \\omit only after tab marks or the \\cr of"),
        )(
          strpool!("an alignment. Proceed, and I'll ignore this case."),
        );
        error;
    }
⟧

1184. We’ve now covered most of the abuses of \halign and \valign. Let’s take a look at what happens when they are used correctly.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    vmode + halign:
      init_align;;

    hmode + valign:
      ⟦1513 Cases of |main_control| for |hmode+valign|⟧

    init_align

    mmode + halign:
      if (privileged) {
          if (cur_group == math_shift_group) {
              init_align;
          } else {
              off_save;
          }
      }

    vmode + endv, hmode + endv:
      do_endv;;
⟧

1185. An align_group code is supposed to remain on the save_stack during an entire alignment, until fin_align removes it.

A devious user might force an endv command to occur just about anywhere; we must defeat such hacks.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function do_endv() {
        base_ptr = input_ptr;
        input_stack[base_ptr] = cur_input;
        while (
            (input_stack[base_ptr].index_field != v_template)
            && (input_stack[base_ptr].loc_field == null)
            && (
                input_stack[base_ptr].state_field
                == token_list
            )
        ) {
            decr(base_ptr);
        }
        if (
            (input_stack[base_ptr].index_field != v_template)
            || (input_stack[base_ptr].loc_field != null)
            || (
                input_stack[base_ptr].state_field
                != token_list
            )
        ) {
            fatal_error(
              strpool!("(interwoven alignment preambles are not allowed)"),
            );
        }
        if (cur_group == align_group) {
            end_graf;
            if (fin_col) {
                fin_row;
            }
        } else {
            off_save;
        }
    }
⟧

1186.

⟦1139 Cases of |handle_right_brace| where a |right_brace| triggers a delayed action⟧ += ⟦
    align_group:

    {
        back_input;
        cur_tok = cs_token_flag + frozen_cr;
        print_err(strpool!("Missing "));
        print_esc(strpool!("cr"));
        print(strpool!(" inserted"));
        help1(
          strpool!("I'm guessing that you meant to end an alignment here."),
        );
        ins_error;
    }
⟧

1187.

⟦1139 Cases of |handle_right_brace| where a |right_brace| triggers a delayed action⟧ += ⟦
    no_align_group:

    {
        end_graf;
        unsave;
        align_peek;
    }
⟧

1188. Finally, \endcsname is not supposed to get through to main_control.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    any_mode(end_cs_name):
      cs_error;;
⟧

1189.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function cs_error() {
        print_err(strpool!("Extra "));
        print_esc(strpool!("endcsname"));
        help1(
          strpool!("I'm ignoring this, since I wasn't doing a \\csname."),
        );
        error;
    }
⟧

1190. [48] Building math lists. The routines that TEX uses to create mlists are similar to those we have just seen for the generation of hlists and vlists. But it is necessary to make “noads” as well as nodes, so the reader should review the discussion of math mode data structures before trying to make sense out of the following program.

Here is a little routine that needs to be done whenever a subformula is about to be processed. The parameter is a code like math_group.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function push_math(c: group_code) {
        push_nest;
        mode = -mmode;
        incompleat_noad = null;
        new_save_level(c);
    }
⟧

1191. We get into math mode from horizontal mode when a ‘$’ (i.e., a math_shift character) is scanned. We must check to see whether this ‘$’ is immediately followed by another, in case display math mode is called for.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    hmode + math_shift:
      init_math;;
⟧

1192.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    ⟦1544 Declare subprocedures for |init_math|⟧

    function init_math() {
        label reswitch, found, not_found, done;
        var
          w: scaled, // new or partial pre_display_size 
          j: pointer, // prototype box for display
          x: integer, // new pre_display_direction 
          l: scaled, // new display_width 
          s: scaled, // new display_indent 
          p: pointer, // current node when calculating 
          // pre_display_size 
          q: pointer, // glue specification when calculating 
          // pre_display_size 
          f: internal_font_number, // font in current 
          // char_node 
          n: integer, // scope of paragraph shape 
          // specification
          v: scaled, //  w plus possible glue amount
          d: scaled; // increment to v 
        
        // get_x_token would fail on \ifmmode!
        get_token;
        if ((cur_cmd == math_shift) && (mode > 0)) {
            ⟦1199 Go into display math mode⟧
        } else {
            back_input;
            ⟦1193 Go into ordinary math mode⟧
        }
    }
⟧

1193.

⟦1193 Go into ordinary math mode⟧ = ⟦
    {
        push_math(math_shift_group);
        eq_word_define(int_base + cur_fam_code, -1);
        if ((insert_src_special_every_math)) {
            insert_src_special;
        }
        if (every_math != null) {
            begin_token_list(every_math, every_math_text);
        }
    }
⟧

1194. We get into ordinary math mode from display math mode when ‘\eqno’ or ‘\leqno’ appears. In such cases cur_chr will be 0 or 1, respectively; the value of cur_chr is placed onto save_stack for safe keeping.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + eq_no:
      if (privileged) {
          if (cur_group == math_shift_group) {
              start_eq_no;
          } else {
              off_save;
          }
      }
⟧

1195.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("eqno"), eq_no, 0)

    primitive(strpool!("leqno"), eq_no, 1)
⟧

1196. When TEX is in display math mode, cur_group == math_shift_group, so it is not necessary for the start_eq_no procedure to test for this condition.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function start_eq_no() {
        saved(0) = cur_chr;
        incr(save_ptr);
        ⟦1193 Go into ordinary math mode⟧
    }
⟧

1197.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    eq_no:

    if (chr_code == 1) {
        print_esc(strpool!("leqno"));
    } else {
        print_esc(strpool!("eqno"));
    }
⟧

1198.

⟦1102 Forbidden cases detected in |main_control|⟧ += ⟦
    non_math(eq_no)
⟧

1199. When we enter display math mode, we need to call line_break to process the partial paragraph that has just been interrupted by the display. Then we can set the proper values of display_width and display_indent and pre_display_size.

⟦1199 Go into display math mode⟧ = ⟦
    {
        j = null;
        w = -max_dimen;
        // `\noindent$$' or `$${ }$$'
        if (head == tail) {
            ⟦1543 Prepare for display after an empty paragraph⟧
        } else {
            line_break(true);
            ⟦1200 Calculate the natural width, |w|, by which the characters of the final line extend to the right of the reference point, plus two ems; or set |w:=max_dimen| if the non-blank information on that line is affected by stretching or shrinking⟧
            // now we are in vertical mode, working on the 
            // list that will contain the display
        }
        ⟦1203 Calculate the length, |l|, and the shift amount, |s|, of the display lines⟧
        push_math(math_shift_group);
        mode = mmode;
        eq_word_define(int_base + cur_fam_code, -1);
        eq_word_define(
          dimen_base + pre_display_size_code,
          w,
        );
        LR_box = j;
        if (eTeX_ex) {
            eq_word_define(
              int_base + pre_display_direction_code,
              x,
            );
        }
        eq_word_define(dimen_base + display_width_code, l);
        eq_word_define(dimen_base + display_indent_code, s);
        if (every_display != null) {
            begin_token_list(
              every_display,
              every_display_text,
            );
        }
        if (nest_ptr == 1) {
            build_page;
        }
    }
⟧

1200.

⟦1200 Calculate the natural width, |w|, by which the characters of the final line extend to the right of the reference point, plus two ems; or set |w:=max_dimen| if the non-blank information on that line is affected by stretching or shrinking⟧ = ⟦
    ⟦1545 Prepare for display after a non-empty paragraph⟧

    while (p != null) {
        ⟦1201 Let |d| be the natural width of node |p|; if the node is ``visible,'' |goto found|; if the node is glue that stretches or shrinks, set |v:=max_dimen|⟧
        if (v < max_dimen) {
            v = v + d;
        }
        goto not_found;
      found:
        if (v < max_dimen) {
            v = v + d;
            w = v;
        } else {
            w = max_dimen;
            goto done;
        }
      not_found:
        p = link(p);
    }

    done:

    ⟦1546 Finish the natural width computation⟧
⟧

1201.

⟦1201 Let |d| be the natural width of node |p|; if the node is ``visible,'' |goto found|; if the node is glue that stretches or shrinks, set |v:=max_dimen|⟧ = ⟦
    reswitch:

    if (is_char_node(p)) {
        f = font(p);
        d = char_width(f)(char_info(f)(character(p)));
        goto found;
    }

    case type(p) {
      hlist_node, vlist_node, rule_node:
        d = width(p);
        goto found;
      ligature_node:
        ⟦692 Make node |p| look like a |char_node| and |goto reswitch|⟧
      kern_node:
        d = width(p);
      margin_kern_node:
        d = width(p);
      ⟦1547 Cases of `Let |d| be the natural width' that need special treatment⟧
      glue_node:
        ⟦1202 Let |d| be the natural width of this glue; if stretching or shrinking, set |v:=max_dimen|; |goto found| in the case of leaders⟧
      whatsit_node:
        ⟦1421 Let |d| be the width of the whatsit |p|, and |goto found| if ``visible''⟧
      othercases:
        d = 0;
    }
⟧

1202. We need to be careful that w, v, and d do not depend on any glue_set values, since such values are subject to system-dependent rounding. System-dependent numbers are not allowed to infiltrate parameters like pre_display_size, since TEX82 is supposed to make the same decisions on all machines.

⟦1202 Let |d| be the natural width of this glue; if stretching or shrinking, set |v:=max_dimen|; |goto found| in the case of leaders⟧ = ⟦
    {
        q = glue_ptr(p);
        d = width(q);
        if (glue_sign(just_box) == stretching) {
            if (
                (glue_order(just_box) == stretch_order(q))
                && (stretch(q) != 0)
            ) {
                v = max_dimen;
            }
        } else if (glue_sign(just_box) == shrinking) {
            if (
                (glue_order(just_box) == shrink_order(q))
                && (shrink(q) != 0)
            ) {
                v = max_dimen;
            }
        }
        if (subtype(p) >= a_leaders) {
            goto found;
        }
    }
⟧
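The interplay of v, d, and w in sections 1200 to 1202 can be sketched in Python as follows. The sketch assumes each node has already been reduced to a triple (d, visible, flexible), where flexible marks glue whose stretch or shrink is active in the justified line. The starting offset v0 stands in for the preparation done in section 1545, and the finishing step of section 1546 is not modeled.

```python
MAX_DIMEN = 0x3FFFFFFF  # largest legal dimension, 2**30 - 1


def pre_display_width(nodes, v0):
    """Return w for a line given as (d, visible, flexible) triples."""
    w = -MAX_DIMEN
    v = v0
    for d, visible, flexible in nodes:
        if flexible:
            v = MAX_DIMEN  # position is no longer machine-independent
        if visible:  # the `found` branch
            if v < MAX_DIMEN:
                v += d
                w = v
            else:
                w = MAX_DIMEN
                break  # `goto done`
        elif v < MAX_DIMEN:  # the `not_found` branch
            v += d
    return w


rigid = [(10, True, False), (5, False, False), (7, True, False), (3, False, False)]
stretched = [(10, True, False), (5, False, True), (7, True, False)]
trailing = [(10, True, False), (5, False, True)]
```

With a starting offset of 2, the rigid line gives 24, the line with active glue between visible material gives MAX_DIMEN, and active glue after the last visible node leaves the result at 12, since only visible nodes update w.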

1203. A displayed equation is considered to be three lines long, so we calculate the length and offset of line number prev_graf + 2.

⟦1203 Calculate the length, |l|, and the shift amount, |s|, of the display lines⟧ = ⟦
    if (par_shape_ptr == null) {
        if (
            (hang_indent != 0)
            && (
                (
                    (hang_after >= 0)
                    && (prev_graf + 2 > hang_after)
                )
                || (prev_graf + 1 < -hang_after)
            )
        ) {
            l = hsize - abs(hang_indent);
            if (hang_indent > 0) {
                s = hang_indent;
            } else {
                s = 0;
            }
        } else {
            l = hsize;
            s = 0;
        }
    } else {
        n = info(par_shape_ptr);
        if (prev_graf + 2 >= n) {
            p = par_shape_ptr + 2 * n;
        } else {
            p = par_shape_ptr + 2 * (prev_graf + 2);
        }
        s = mem[p - 1].sc;
        l = mem[p].sc;
    }
⟧
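The choice of l and s can be restated as a small Python function. This is a sketch under the assumption that a paragraph shape is passed as a list of (indent, length) pairs, one per line, instead of being read from mem; `None` plays the role of par_shape_ptr == null. The function name and the sample values are hypothetical.

```python
def display_line(prev_graf, hsize, hang_indent=0, hang_after=1, par_shape=None):
    """Return (l, s): the length and shift of line number prev_graf + 2."""
    if par_shape is None:
        hanging = hang_indent != 0 and (
            (hang_after >= 0 and prev_graf + 2 > hang_after)
            or (prev_graf + 1 < -hang_after)
        )
        if hanging:
            return hsize - abs(hang_indent), max(hang_indent, 0)
        return hsize, 0
    n = len(par_shape)
    k = min(prev_graf + 2, n)  # lines beyond the n-th reuse the last entry
    indent, length = par_shape[k - 1]
    return length, indent
```

For example, `display_line(3, 1000, hang_indent=100, hang_after=2)` gives `(900, 100)`, while a negative hang_indent shortens the line without shifting it.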

1204. Subformulas of math formulas cause a new level of math mode to be entered, on the semantic nest as well as the save stack. These subformulas arise in several ways: (1) A left brace by itself indicates the beginning of a subformula that will be put into a box, thereby freezing its glue and preventing line breaks. (2) A subscript or superscript is treated as a subformula if it is not a single character; the same applies to the nucleus of things like \underline. (3) The \left primitive initiates a subformula that will be terminated by a matching \right. The group codes placed on save_stack in these three cases are math_group, math_group, and math_left_group, respectively.

Here is the code that handles case (1); the other cases are not quite as trivial, so we shall consider them later.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + left_brace:
      tail_append(new_noad);
      back_input;
      scan_math(nucleus(tail));
⟧

1205. Recall that the nucleus, subscr, and supscr fields in a noad are broken down into subfields called math_type and either info or (fam, character). The job of scan_math is to figure out what to place in one of these principal fields; it looks at the subformula that comes next in the input, and places an encoding of that subformula into a given word of mem.

@define fam_in_range =>
    ((cur_fam >= 0) && (cur_fam < number_math_families))
⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function scan_math(p: pointer) {
        label restart, reswitch, exit;
        var
          c: integer; // math character code
        
      restart:
        ⟦438 Get the next non-blank non-relax non-call token⟧
      reswitch:
        case cur_cmd {
          letter, other_char, char_given:
            c = ho(math_code(cur_chr));
            if (is_active_math_char(c)) {
                ⟦1206 Treat |cur_chr| as an active character⟧
                goto restart;
            }
          char_num:
            scan_char_num;
            cur_chr = cur_val;
            cur_cmd = char_given;
            goto reswitch;
          math_char_num:
            if (cur_chr == 2) {
                // \Umathchar
                scan_math_class_int;
                c = set_class_field(cur_val);
                scan_math_fam_int;
                c = c + set_family_field(cur_val);
                scan_usv_num;
                c = c + cur_val;
            } else if (cur_chr == 1) {
                // \Umathcharnum
                scan_xetex_math_char_int;
                c = cur_val;
            } else {
                scan_fifteen_bit_int;
                c = 
                    set_class_field(cur_val div 0x1000)
                    + set_family_field(
                      (cur_val % 0x1000) div 0x100,
                    )
                    + (cur_val % 0x100)
                ;
            }
          math_given:
            c = 
                set_class_field(cur_chr div 0x1000)
                + set_family_field(
                  (cur_chr % 0x1000) div 0x100,
                )
                + (cur_chr % 0x100)
            ;
          XeTeX_math_given:
            c = cur_chr;
          delim_num:
            if (cur_chr == 1) {
                // \Udelimiter <class> <fam> <usv>
                scan_math_class_int;
                c = set_class_field(cur_val);
                scan_math_fam_int;
                c = c + set_family_field(cur_val);
                scan_usv_num;
                c = c + cur_val;
            } else {
                // \delimiter <27-bit delcode>
                scan_delimiter_int;
                // get the `small' delimiter field
                c = cur_val div 0x1000;
                // and convert it to a XeTeX mathchar code
                c = 
                    set_class_field(c div 0x1000)
                    + set_family_field(
                      (c % 0x1000) div 0x100,
                    )
                    + (c % 0x100)
                ;
            }
          othercases:
            ⟦1207 Scan a subformula enclosed in braces and |return|⟧
        }
        math_type(p) = math_char;
        character(p) = qi(c % 0x10000);
        if ((is_var_family(c)) && fam_in_range) {
            plane_and_fam_field(p) = cur_fam;
        } else {
            plane_and_fam_field(p) = (math_fam_field(c));
        }
        plane_and_fam_field(p) = 
            plane_and_fam_field(p)
            + (math_char_field(c) div 0x10000) * 0x100
        ;
      exit:
    }
⟧
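The div and mod steps that scan_math applies to a classic 15-bit math code, and to the small part of a 27-bit delimiter code, can be tried out in Python. The sketch returns the three fields as a tuple and deliberately does not model how set_class_field and set_family_field pack them into a XeTeX math code. The function names and sample codes are chosen for the example.

```python
def split_math_code(code):
    """Split a 15-bit math code into (class, family, character)."""
    return code // 0x1000, (code % 0x1000) // 0x100, code % 0x100


def small_part_of_delimiter(delcode):
    """Drop the low 12 bits (the large variant) of a 27-bit delimiter code."""
    return delcode // 0x1000


letter = split_math_code(0x7161)
paren = split_math_code(small_part_of_delimiter(0x4162362))
```

Here 0x7161 splits into class 7, family 1, character 0x61, and the delimiter code 0x4162362 reduces to the small math code 0x4162, that is, class 4, family 1, character 0x62.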

1206. An active character that is an outer_call is allowed here.

⟦1206 Treat |cur_chr| as an active character⟧ = ⟦
    {
        cur_cs = cur_chr + active_base;
        cur_cmd = eq_type(cur_cs);
        cur_chr = equiv(cur_cs);
        x_token;
        back_input;
    }
⟧

1207. The pointer p is placed on save_stack while a complex subformula is being scanned.

⟦1207 Scan a subformula enclosed in braces and |return|⟧ = ⟦
    {
        back_input;
        scan_left_brace;
        saved(0) = p;
        incr(save_ptr);
        push_math(math_group);
        return;
    }
⟧

1208. The simplest math formula is, of course, ‘$ $’, when no noads are generated. The next simplest cases involve a single character, e.g., ‘$x$’. Even though such cases may not seem to be very interesting, the reader can perhaps understand how happy the author was when ‘$x$’ was first properly typeset by TEX. The code in this section was used.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + letter, mmode + other_char, mmode + char_given:
      set_math_char(ho(math_code(cur_chr)));;

    mmode + char_num:
      scan_char_num;
      cur_chr = cur_val;
      set_math_char(ho(math_code(cur_chr)));

    mmode + math_char_num:
      if (cur_chr == 2) {
          // \.{\\Umathchar}
          scan_math_class_int;
          t = set_class_field(cur_val);
          scan_math_fam_int;
          t = t + set_family_field(cur_val);
          scan_usv_num;
          t = t + cur_val;
          set_math_char(t);
      } else if (cur_chr == 1) {
          // \.{\\Umathcharnum}
          scan_xetex_math_char_int;
          set_math_char(cur_val);
      } else {
          scan_fifteen_bit_int;
          set_math_char(
            set_class_field(cur_val div 0x1000)
            + set_family_field((cur_val % 0x1000) div 0x100)
            + (cur_val % 0x100),
          );
      }

    mmode + math_given:
      set_math_char(
        set_class_field(cur_chr div 0x1000)
        + set_family_field((cur_chr % 0x1000) div 0x100)
        + (cur_chr % 0x100),
      );

    mmode + XeTeX_math_given:
      set_math_char(cur_chr);

    mmode + delim_num:
      if (cur_chr == 1) {
          // \.{\\Udelimiter}
          scan_math_class_int;
          t = set_class_field(cur_val);
          scan_math_fam_int;
          t = t + set_family_field(cur_val);
          scan_usv_num;
          t = t + cur_val;
          set_math_char(t);
      } else {
          scan_delimiter_int;
          // discard the large delimiter code
          cur_val = cur_val div 0x1000;
          set_math_char(
            set_class_field(cur_val div 0x1000)
            + set_family_field((cur_val % 0x1000) div 0x100)
            + (cur_val % 0x100),
          );
      }
⟧
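The legacy branches above (\mathchar, \mathchardef values, and \delimiter after its large part is discarded) all unpack a 15-bit math code with the same arithmetic before repacking it. A minimal Python sketch of that unpacking step follows; the name `split_math_code` is hypothetical, and the repacking macros `set_class_field` and `set_family_field` are not modelled because their definitions lie outside this part of the program.

```python
def split_math_code(code: int) -> tuple[int, int, int]:
    """Split a classic 15-bit math code into (class, family, character).

    Mirrors the arithmetic of the legacy branches:
    class = code div 0x1000, family = (code mod 0x1000) div 0x100,
    character = code mod 0x100.
    """
    if not 0 <= code <= 0x7FFF:
        raise ValueError("expected a 15-bit math code")
    math_class = code // 0x1000
    family = (code % 0x1000) // 0x100
    char = code % 0x100
    return math_class, family, char


if __name__ == "__main__":
    # Class 7, family 1, character 0x30.
    print(split_math_code(0x7130))
```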

1209. The set_math_char procedure creates a new noad appropriate to a given math code, and appends it to the current mlist. However, if the math code is sufficiently large, the cur_chr is treated as an active character and nothing is appended.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function set_math_char(c: integer) {
        var
          p: pointer, // the new noad
          ch: UnicodeScalar;
        
        if (is_active_math_char(c)) {
            ⟦1206 Treat |cur_chr| as an active character⟧
        } else {
            p = new_noad;
            math_type(nucleus(p)) = math_char;
            ch = math_char_field(c);
            character(nucleus(p)) = qi(ch % 0x10000);
            plane_and_fam_field(nucleus(p)) = math_fam_field(
              c,
            );
            if (is_var_family(c)) {
                if (fam_in_range) {
                    plane_and_fam_field(nucleus(p)) = (
                      cur_fam
                    );
                }
                type(p) = ord_noad;
            } else {
                type(p) = ord_noad + math_class_field(c);
            }
            plane_and_fam_field(nucleus(p)) = 
                plane_and_fam_field(nucleus(p))
                + (ch div 0x10000) * 0x100
            ;
            link(tail) = p;
            tail = p;
        }
    }
⟧
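The last assignment in set_math_char splits a Unicode scalar value across two fields: the low 16 bits go into the character field, and the plane number is stored above the family, in units of 0x100. The following Python sketch illustrates only that arithmetic; the function names are hypothetical, and the inverse assumes the family itself is below 0x100.

```python
def split_usv(ch: int, fam: int) -> tuple[int, int]:
    """Return (character, plane_and_fam) in the layout set_math_char uses."""
    if not 0 <= ch <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    character = ch % 0x10000                       # low 16 bits
    plane_and_fam = fam + (ch // 0x10000) * 0x100  # plane sits above the family
    return character, plane_and_fam


def join_usv(character: int, plane_and_fam: int) -> tuple[int, int]:
    """Inverse of split_usv; returns (code point, family)."""
    plane, fam = divmod(plane_and_fam, 0x100)
    return plane * 0x10000 + character, fam
```

For a character in plane 1, such as U+1D465 in family 1, the sketch yields the pair (0xD465, 0x101).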

1210. Primitive math operators like \mathop and \underline are given the command code math_comp, supplemented by the noad type that they generate.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("mathord"), math_comp, ord_noad)

    primitive(strpool!("mathop"), math_comp, op_noad)

    primitive(strpool!("mathbin"), math_comp, bin_noad)

    primitive(strpool!("mathrel"), math_comp, rel_noad)

    primitive(strpool!("mathopen"), math_comp, open_noad)

    primitive(strpool!("mathclose"), math_comp, close_noad)

    primitive(strpool!("mathpunct"), math_comp, punct_noad)

    primitive(strpool!("mathinner"), math_comp, inner_noad)

    primitive(strpool!("underline"), math_comp, under_noad)

    primitive(strpool!("overline"), math_comp, over_noad)

    primitive(
      strpool!("displaylimits"),
      limit_switch,
      normal,
    )

    primitive(strpool!("limits"), limit_switch, limits)

    primitive(strpool!("nolimits"), limit_switch, no_limits)
⟧

1211.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    math_comp:

    case chr_code {
      ord_noad:
        print_esc(strpool!("mathord"));
      op_noad:
        print_esc(strpool!("mathop"));
      bin_noad:
        print_esc(strpool!("mathbin"));
      rel_noad:
        print_esc(strpool!("mathrel"));
      open_noad:
        print_esc(strpool!("mathopen"));
      close_noad:
        print_esc(strpool!("mathclose"));
      punct_noad:
        print_esc(strpool!("mathpunct"));
      inner_noad:
        print_esc(strpool!("mathinner"));
      under_noad:
        print_esc(strpool!("underline"));
      othercases:
        print_esc(strpool!("overline"));
    }

    limit_switch:

    if (chr_code == limits) {
        print_esc(strpool!("limits"));
    } else if (chr_code == no_limits) {
        print_esc(strpool!("nolimits"));
    } else {
        print_esc(strpool!("displaylimits"));
    }
⟧

1212.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + math_comp:
      tail_append(new_noad);
      type(tail) = cur_chr;
      scan_math(nucleus(tail));

    mmode + limit_switch:
      math_limit_switch;
⟧

1213.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function math_limit_switch() {
        label exit;
        
        if (head != tail) {
            if (type(tail) == op_noad) {
                subtype(tail) = cur_chr;
                return;
            }
        }
        print_err(
          strpool!("Limit controls must follow a math operator"),
        );
        help1(
          strpool!("I'm ignoring this misplaced \\limits or \\nolimits command."),
        );
        error;
      exit:
    }
⟧

1214. Delimiter fields of noads are filled in by the scan_delimiter routine. The first parameter of this procedure is the mem address where the delimiter is to be placed; the second tells if this delimiter follows \radical or not.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function scan_delimiter(p: pointer, r: boolean) {
        if (r) {
            if (cur_chr == 1) {
                // \.{\\Uradical}
                // extended delimiter code flag
                cur_val1 = 0x40000000;
                scan_math_fam_int;
                cur_val1 = cur_val1 + cur_val * 0x200000;
                scan_usv_num;
                cur_val = cur_val1 + cur_val;
            } else {
                // radical
                scan_delimiter_int;
            }
        } else {
            ⟦438 Get the next non-blank non-relax non-call token⟧
            case cur_cmd {
              letter, other_char:
                cur_val = del_code(cur_chr);
              delim_num:
                if (cur_chr == 1) {
                    // \.{\\Udelimiter}
                    // extended delimiter code flag
                    cur_val1 = 0x40000000;
                    // discarded
                    scan_math_class_int;
                    scan_math_fam_int;
                    cur_val1 = cur_val1 + cur_val * 0x200000;
                    scan_usv_num;
                    cur_val = cur_val1 + cur_val;
                } else {
                    // normal delimiter
                    scan_delimiter_int;
                }
              othercases:
                cur_val = -1;
            }
        }
        if (cur_val < 0) {
            ⟦1215 Report that an invalid delimiter code is being changed to null; set~|cur_val:=0|⟧
        }
        if (cur_val >= 0x40000000) {
            // extended delimiter code, only one size
            small_plane_and_fam_field(p) = 
                // plane
                ((cur_val % 0x200000) div 0x10000) * 0x100
                // family
                + (cur_val div 0x200000) % 0x100
            ;
            small_char_field(p) = qi(cur_val % 0x10000);
            large_plane_and_fam_field(p) = 0;
            large_char_field(p) = 0;
        } else {
            // standard delimiter code, 4-bit families and 
            // 8-bit char codes
            small_plane_and_fam_field(p) = 
                (cur_val div 0x100000)
                % 16
            ;
            small_char_field(p) = qi(
              (cur_val div 0x1000) % 256,
            );
            large_plane_and_fam_field(p) = 
                (cur_val div 256)
                % 16
            ;
            large_char_field(p) = qi(cur_val % 256);
        }
    }
⟧
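The two delimiter-code layouts handled at the end of scan_delimiter can be summarized in a short Python sketch. It follows the arithmetic of the procedure directly (an extended code carries the flag 0x40000000, a family multiplied by 0x200000, and a Unicode scalar value in the low bits); the function names and the dictionary representation are hypothetical conveniences, not part of the program.

```python
EXTENDED_FLAG = 0x40000000  # marks an extended (single-size) delimiter code


def make_extended(family: int, usv: int) -> int:
    """Build an extended code the way the Uradical/Udelimiter branches do."""
    return EXTENDED_FLAG + family * 0x200000 + usv


def decode_delimiter(code: int) -> dict:
    """Return the four delimiter fields for a non-negative delimiter code."""
    if code < 0:
        raise ValueError("negative codes are reported and replaced by 0 first")
    if code >= EXTENDED_FLAG:
        plane = (code % 0x200000) // 0x10000
        family = (code // 0x200000) % 0x100
        return {
            "small_plane_and_fam": plane * 0x100 + family,
            "small_char": code % 0x10000,
            "large_plane_and_fam": 0,   # extended codes have only one size
            "large_char": 0,
        }
    # standard code: 4-bit families and 8-bit character codes
    return {
        "small_plane_and_fam": (code // 0x100000) % 16,
        "small_char": (code // 0x1000) % 256,
        "large_plane_and_fam": (code // 256) % 16,
        "large_char": code % 256,
    }
```

A standard code such as 0x426830A decodes to small family 2, small character 0x68, large family 3, large character 0x0A; the leading class digit is ignored by this routine.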

1215.

⟦1215 Report that an invalid delimiter code is being changed to null; set~|cur_val:=0|⟧ = ⟦
    {
        print_err(
          strpool!("Missing delimiter (. inserted)"),
        );
        help6(
          strpool!("I was expecting to see something like `(' or `\\{' or"),
        )(
          strpool!("`\\}' here. If you typed, e.g., `{' instead of `\\{', you"),
        )(
          strpool!("should probably delete the `{' by typing `1' now, so that"),
        )(
          strpool!("braces don't get unbalanced. Otherwise just proceed."),
        )(
          strpool!("Acceptable delimiters are characters whose \\delcode is"),
        )(
          strpool!("nonnegative, or you can use `\\delimiter <delimiter code>'."),
        );
        back_error;
        cur_val = 0;
    }
⟧

1216.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + radical:
      math_radical;
⟧

1217.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function math_radical() {
        tail_append(get_node(radical_noad_size));
        type(tail) = radical_noad;
        subtype(tail) = normal;
        mem[nucleus(tail)].hh = empty_field;
        mem[subscr(tail)].hh = empty_field;
        mem[supscr(tail)].hh = empty_field;
        scan_delimiter(left_delimiter(tail), true);
        scan_math(nucleus(tail));
    }
⟧

1218.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + accent, mmode + math_accent:
      math_ac;
⟧

1219.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function math_ac() {
        var c: integer;
        
        if (cur_cmd == accent) {
            ⟦1220 Complain that the user should have said \.{\\mathaccent}⟧
        }
        tail_append(get_node(accent_noad_size));
        type(tail) = accent_noad;
        subtype(tail) = normal;
        mem[nucleus(tail)].hh = empty_field;
        mem[subscr(tail)].hh = empty_field;
        mem[supscr(tail)].hh = empty_field;
        math_type(accent_chr(tail)) = math_char;
        if (cur_chr == 1) {
            if (scan_keyword(strpool!("fixed"))) {
                subtype(tail) = fixed_acc;
            } else if (scan_keyword(strpool!("bottom"))) {
                if (scan_keyword(strpool!("fixed"))) {
                    subtype(tail) = bottom_acc + fixed_acc;
                } else {
                    subtype(tail) = bottom_acc;
                }
            }
            scan_math_class_int;
            c = set_class_field(cur_val);
            scan_math_fam_int;
            c = c + set_family_field(cur_val);
            scan_usv_num;
            cur_val = cur_val + c;
        } else {
            scan_fifteen_bit_int;
            cur_val = 
                set_class_field(cur_val div 0x1000)
                + set_family_field(
                  (cur_val % 0x1000) div 0x100,
                )
                + (cur_val % 0x100)
            ;
        }
        character(accent_chr(tail)) = qi(cur_val % 0x10000);
        if ((is_var_family(cur_val)) && fam_in_range) {
            plane_and_fam_field(accent_chr(tail)) = cur_fam;
        } else {
            plane_and_fam_field(accent_chr(tail)) = 
                math_fam_field(cur_val)
            ;
        }
        plane_and_fam_field(accent_chr(tail)) = 
            plane_and_fam_field(accent_chr(tail))
            + (math_char_field(cur_val) div 0x10000) * 0x100
        ;
        scan_math(nucleus(tail));
    }
⟧
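The keyword handling in the extended branch of math_ac accepts `fixed`, `bottom`, or `bottom fixed`, in that order only. The Python sketch below models that decision on a plain list of words. The constant values are placeholders chosen for illustration (the real fixed_acc and bottom_acc are defined elsewhere), and the inner `scan_keyword` is a much simplified stand-in for the real token scanner.

```python
# Placeholder values, assumed for illustration only.
NORMAL = 0
FIXED_ACC = 1
BOTTOM_ACC = 2


def accent_subtype(words: list[str]) -> tuple[int, list[str]]:
    """Consume leading accent keywords; return (subtype, remaining words)."""
    rest = list(words)

    def scan_keyword(keyword: str) -> bool:
        # Simplified: consume the next word only if it matches exactly.
        if rest and rest[0] == keyword:
            rest.pop(0)
            return True
        return False

    subtype = NORMAL
    if scan_keyword("fixed"):
        subtype = FIXED_ACC
    elif scan_keyword("bottom"):
        if scan_keyword("fixed"):
            subtype = BOTTOM_ACC + FIXED_ACC
        else:
            subtype = BOTTOM_ACC
    return subtype, rest
```

Note that in this model, as in the procedure, `fixed bottom` sets only the fixed subtype and leaves `bottom` unconsumed.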

1220.

⟦1220 Complain that the user should have said \.{\\mathaccent}⟧ = ⟦
    {
        print_err(strpool!("Please use "));
        print_esc(strpool!("mathaccent"));
        print(strpool!(" for accents in math mode"));
        help2(
          strpool!("I'm changing \\accent to \\mathaccent here; wish me luck."),
        )(
          strpool!("(Accents are not the same in formulas as they are in text.)"),
        );
        error;
    }
⟧

1221.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + vcenter:
      scan_spec(vcenter_group, false);
      normal_paragraph;
      push_nest;
      mode = -vmode;
      prev_depth = ignore_depth;
      if ((insert_src_special_every_vbox)) {
          insert_src_special;
      }
      if (every_vbox != null) {
          begin_token_list(every_vbox, every_vbox_text);
      }
⟧

1222.

⟦1139 Cases of |handle_right_brace| where a |right_brace| triggers a delayed action⟧ += ⟦
    vcenter_group:

    {
        end_graf;
        unsave;
        save_ptr = save_ptr - 2;
        p = vpack(link(head), saved(1), saved(0));
        pop_nest;
        tail_append(new_noad);
        type(tail) = vcenter_noad;
        math_type(nucleus(tail)) = sub_box;
        info(nucleus(tail)) = p;
    }
⟧

1223. The routine that inserts a style_node holds no surprises.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("displaystyle"),
      math_style,
      display_style,
    )

    primitive(strpool!("textstyle"), math_style, text_style)

    primitive(
      strpool!("scriptstyle"),
      math_style,
      script_style,
    )

    primitive(
      strpool!("scriptscriptstyle"),
      math_style,
      script_script_style,
    )
⟧

1224.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    math_style:

    print_style(chr_code)
⟧

1225.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + math_style:
      tail_append(new_style(cur_chr));

    mmode + non_script:
      tail_append(new_glue(zero_glue));
      subtype(tail) = cond_math_glue;

    mmode + math_choice:
      append_choices;
⟧

1226. The routine that scans the four mlists of a \mathchoice is very much like the routine that builds discretionary nodes.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function append_choices() {
        tail_append(new_choice);
        incr(save_ptr);
        saved(-1) = 0;
        push_math(math_choice_group);
        scan_left_brace;
    }
⟧

1227.

⟦1139 Cases of |handle_right_brace| where a |right_brace| triggers a delayed action⟧ += ⟦
    math_choice_group:
      build_choices;
⟧

1228.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    ⟦1238 Declare the function called |fin_mlist|⟧

    function build_choices() {
        label exit;
        var
          p: pointer; // the current mlist
        
        unsave;
        p = fin_mlist(null);
        case saved(-1) {
          0:
            display_mlist(tail) = p;
          1:
            text_mlist(tail) = p;
          2:
            script_mlist(tail) = p;
          3:
            script_script_mlist(tail) = p;
            decr(save_ptr);
            return;
          // there are no other cases
        }
        incr(saved(-1));
        push_math(math_choice_group);
        scan_left_brace;
      exit:
    }
⟧
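The counter kept in saved(-1) turns build_choices into a small four-state machine: each closing brace stores one mlist and either opens the next group or finishes the choice node. A hedged Python sketch of that bookkeeping follows; the class and its method names are hypothetical, and mlists are represented as ordinary Python lists.

```python
class ChoiceBuilder:
    """Collects the four mlists of a math choice, one brace group at a time."""

    SLOTS = ("display", "text", "script", "script_script")

    def __init__(self) -> None:
        self.counter = 0                      # plays the role of saved(-1)
        self.mlists: dict[str, list] = {}

    def close_group(self, mlist: list) -> bool:
        """Store a finished group; return True once all four are present."""
        if self.counter >= len(self.SLOTS):
            raise RuntimeError("all four groups were already scanned")
        self.mlists[self.SLOTS[self.counter]] = mlist
        self.counter += 1
        return self.counter == len(self.SLOTS)
```

A caller would keep scanning a new left brace for as long as `close_group` returns False.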

1229. Subscripts and superscripts are attached to the previous nucleus by the action procedure called sub_sup. We use the facts that sub_mark == sup_mark + 1 and subscr(p) == supscr(p) + 1.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + sub_mark, mmode + sup_mark:
      sub_sup;
⟧

1230.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function sub_sup() {
        var
          t: small_number, // type of previous 
          // sub/superscript
          p: pointer; // field to be filled by scan_math 
        
        t = empty;
        p = null;
        if (tail != head) {
            if (scripts_allowed(tail)) {
                //  supscr or subscr 
                p = supscr(tail) + cur_cmd - sup_mark;
                t = math_type(p);
            }
        }
        if ((p == null) || (t != empty)) {
            ⟦1231 Insert a dummy noad to be sub/superscripted⟧
        }
        scan_math(p);
    }
⟧

1231.

⟦1231 Insert a dummy noad to be sub/superscripted⟧ = ⟦
    {
        tail_append(new_noad);
        //  supscr or subscr 
        p = supscr(tail) + cur_cmd - sup_mark;
        if (t != empty) {
            if (cur_cmd == sup_mark) {
                print_err(strpool!("Double superscript"));
                help1(
                  strpool!("I treat `x^1^2' essentially like `x^1{}^2'."),
                );
            } else {
                print_err(strpool!("Double subscript"));
                help1(
                  strpool!("I treat `x_1_2' essentially like `x_1{}_2'."),
                );
            }
            error;
        }
    }
⟧
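The interplay between sub_sup and the dummy noad can be sketched in Python as follows. This is a simplified model, not the program's data structure: noads are small dataclasses, every noad is assumed to allow scripts, and the field is selected by name rather than by the pointer arithmetic supscr(tail) + cur_cmd - sup_mark. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Noad:
    nucleus: Optional[str] = None
    supscr: Optional[str] = None
    subscr: Optional[str] = None


def attach_script(mlist: list, kind: str, content: str) -> Optional[str]:
    """Attach a script to the last noad; kind is 'supscr' or 'subscr'.

    Returns an error message when the slot was already filled, in which
    case a dummy noad with an empty nucleus receives the script instead.
    """
    if kind not in ("supscr", "subscr"):
        raise ValueError("kind must be 'supscr' or 'subscr'")
    error = None
    tail = mlist[-1] if mlist else None
    if tail is None or getattr(tail, kind) is not None:
        if tail is not None:
            error = "Double superscript" if kind == "supscr" else "Double subscript"
        tail = Noad()            # dummy noad to be sub/superscripted
        mlist.append(tail)
    setattr(tail, kind, content)
    return error
```

With this model, `x^1^2` ends up as two noads, the second having an empty nucleus, which matches the help message's reading `x^1{}^2`.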

1232. An operation like ‘\over’ causes the current mlist to go into a state of suspended animation: incompleat_noad points to a fraction_noad that contains the mlist-so-far as its numerator, while the denominator is yet to come. Finally when the mlist is finished, the denominator will go into the incompleat fraction noad, and that noad will become the whole formula, unless it is surrounded by ‘\left’ and ‘\right’ delimiters.

@define above_code => 0 // `\.{\\above}'
@define over_code => 1 // `\.{\\over}'
@define atop_code => 2 // `\.{\\atop}'
@define delimited_code => 3 // `\.{\\abovewithdelims}', etc.
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("above"), above, above_code)

    primitive(strpool!("over"), above, over_code)

    primitive(strpool!("atop"), above, atop_code)

    primitive(
      strpool!("abovewithdelims"),
      above,
      delimited_code + above_code,
    )

    primitive(
      strpool!("overwithdelims"),
      above,
      delimited_code + over_code,
    )

    primitive(
      strpool!("atopwithdelims"),
      above,
      delimited_code + atop_code,
    )
⟧

1233.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    above:

    case chr_code {
      over_code:
        print_esc(strpool!("over"));
      atop_code:
        print_esc(strpool!("atop"));
      delimited_code + above_code:
        print_esc(strpool!("abovewithdelims"));
      delimited_code + over_code:
        print_esc(strpool!("overwithdelims"));
      delimited_code + atop_code:
        print_esc(strpool!("atopwithdelims"));
      othercases:
        print_esc(strpool!("above"));
    }
⟧

1234.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + above:
      math_fraction;
⟧

1235.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function math_fraction() {
        var
          c: small_number; // the type of generalized 
          // fraction we are scanning
        
        c = cur_chr;
        if (incompleat_noad != null) {
            ⟦1237 Ignore the fraction operation and complain about this ambiguous case⟧
        } else {
            incompleat_noad = get_node(fraction_noad_size);
            type(incompleat_noad) = fraction_noad;
            subtype(incompleat_noad) = normal;
            math_type(numerator(incompleat_noad)) = (
              sub_mlist
            );
            info(numerator(incompleat_noad)) = link(head);
            mem[denominator(incompleat_noad)].hh = (
              empty_field
            );
            mem[left_delimiter(incompleat_noad)].qqqq = (
              null_delimiter
            );
            mem[right_delimiter(incompleat_noad)].qqqq = (
              null_delimiter
            );
            link(head) = null;
            tail = head;
            ⟦1236 Use code |c| to distinguish between generalized fractions⟧
        }
    }
⟧

1236.

⟦1236 Use code |c| to distinguish between generalized fractions⟧ = ⟦
    if (c >= delimited_code) {
        scan_delimiter(
          left_delimiter(incompleat_noad),
          false,
        );
        scan_delimiter(
          right_delimiter(incompleat_noad),
          false,
        );
    }

    case c % delimited_code {
      above_code:
        scan_normal_dimen;
        thickness(incompleat_noad) = cur_val;
      over_code:
        thickness(incompleat_noad) = default_code;
      atop_code:
        thickness(incompleat_noad) = 0;
      // there are no other cases
    }
⟧
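Because delimited_code is 3 and the three basic codes are 0, 1, and 2, a single small integer encodes both the kind of fraction and whether delimiters follow. The Python sketch below decodes it with the same comparison and remainder used above; the function name is hypothetical.

```python
ABOVE_CODE, OVER_CODE, ATOP_CODE = 0, 1, 2
DELIMITED_CODE = 3


def describe_fraction(c: int) -> tuple[str, bool]:
    """Return (kind, with_delims) for a generalized-fraction code."""
    if not 0 <= c < 2 * DELIMITED_CODE:
        raise ValueError("not a generalized-fraction code")
    with_delims = c >= DELIMITED_CODE
    kind = ("above", "over", "atop")[c % DELIMITED_CODE]
    return kind, with_delims
```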

1237.

⟦1237 Ignore the fraction operation and complain about this ambiguous case⟧ = ⟦
    {
        if (c >= delimited_code) {
            scan_delimiter(garbage, false);
            scan_delimiter(garbage, false);
        }
        if (c % delimited_code == above_code) {
            scan_normal_dimen;
        }
        print_err(
          strpool!("Ambiguous; you need another { and }"),
        );
        help3(
          strpool!("I'm ignoring this fraction specification, since I don't"),
        )(
          strpool!("know whether a construction like `x \\over y \\over z'"),
        )(
          strpool!("means `{x \\over y} \\over z' or `x \\over {y \\over z}'."),
        );
        error;
    }
⟧

1238. At the end of a math formula or subformula, the fin_mlist routine is called upon to return a pointer to the newly completed mlist, and to pop the nest back to the enclosing semantic level. The parameter to fin_mlist, if not null, points to a right_noad that ends the current mlist; this right_noad has not yet been appended.

⟦1238 Declare the function called |fin_mlist|⟧ = ⟦
    function fin_mlist(p: pointer): pointer {
        var
          q: pointer; // the mlist to return
        
        if (incompleat_noad != null) {
            ⟦1239 Compleat the incompleat noad⟧
        } else {
            link(tail) = p;
            q = link(head);
        }
        pop_nest;
        fin_mlist = q;
    }
⟧

1239.

⟦1239 Compleat the incompleat noad⟧ = ⟦
    {
        math_type(denominator(incompleat_noad)) = sub_mlist;
        info(denominator(incompleat_noad)) = link(head);
        if (p == null) {
            q = incompleat_noad;
        } else {
            q = info(numerator(incompleat_noad));
            if (
                (type(q) != left_noad)
                || (delim_ptr == null)
            ) {
                confusion(strpool!("right"));
            }
            info(numerator(incompleat_noad)) = link(
              delim_ptr,
            );
            link(delim_ptr) = incompleat_noad;
            link(incompleat_noad) = p;
        }
    }
⟧
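The "suspended animation" of an mlist can be modelled in a few lines of Python. This sketch is an assumption-laden simplification: it uses Python lists for mlists and a dictionary for the fraction noad, it ignores delimiters and rule thickness, and it does not model the case in which fin_mlist receives a right_noad. The class and method names are hypothetical.

```python
class MathList:
    """A toy mlist that supports one pending generalized fraction."""

    def __init__(self) -> None:
        self.items: list = []     # the mlist built so far
        self.incompleat = None    # pending fraction, if any

    def append(self, item) -> None:
        self.items.append(item)

    def fraction(self, kind: str) -> bool:
        """Start a fraction; return False if it is ignored as ambiguous."""
        if self.incompleat is not None:
            return False          # a second fraction at this level is ignored
        self.incompleat = {
            "kind": kind,
            "numerator": self.items,   # everything scanned so far
            "denominator": None,       # yet to come
        }
        self.items = []
        return True

    def finish(self) -> list:
        """Return the completed mlist, as fin_mlist(null) would."""
        if self.incompleat is None:
            return self.items
        self.incompleat["denominator"] = self.items
        return [self.incompleat]  # the fraction becomes the whole formula
```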

1240. Now at last we’re ready to see what happens when a right brace occurs in a math formula. Two special cases are simplified here: Braces are effectively removed when they surround a single Ord without sub/superscripts, or when they surround an accent that is the nucleus of an Ord atom.

⟦1139 Cases of |handle_right_brace| where a |right_brace| triggers a delayed action⟧ += ⟦
    math_group:

    {
        unsave;
        decr(save_ptr);
        math_type(saved(0)) = sub_mlist;
        p = fin_mlist(null);
        info(saved(0)) = p;
        if (p != null) {
            if (link(p) == null) {
                if (type(p) == ord_noad) {
                    if (math_type(subscr(p)) == empty) {
                        if (math_type(supscr(p)) == empty) {
                            mem[saved(0)].hh = mem[
                              nucleus(p),
                            ].hh;
                            free_node(p, noad_size);
                        }
                    }
                } else if (type(p) == accent_noad) {
                    if (saved(0) == nucleus(tail)) {
                        if (type(tail) == ord_noad) {
                            ⟦1241 Replace the tail of the list by |p|⟧
                        }
                    }
                }
            }
        }
    }
⟧
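The first of the two simplifications, removing braces around a lone Ord that has no scripts, is sketched below in Python. The model is deliberately small: noads are dataclasses, an empty field is None, and the accent case is not covered. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Noad:
    type: str
    nucleus: Any
    subscr: Any = None
    supscr: Any = None


def close_group(mlist: list) -> Any:
    """Return the value for the field that the braced group fills.

    A single Ord without sub/superscripts contributes its bare nucleus;
    anything else is kept as a sub-mlist.
    """
    if len(mlist) == 1:
        only = mlist[0]
        if only.type == "ord" and only.subscr is None and only.supscr is None:
            return only.nucleus
    return mlist
```

Under this model `{x}` collapses to the nucleus of `x`, while `{x^2}` and `{+}` remain sub-mlists.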

1241.

⟦1241 Replace the tail of the list by |p|⟧ = ⟦
    {
        q = head;
        while (link(q) != tail) {
            q = link(q);
        }
        link(q) = p;
        free_node(tail, noad_size);
        tail = p;
    }
⟧

1242. We have dealt with all constructions of math mode except ‘\left’ and ‘\right’, so the picture is completed by the following sections of the program.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("left"), left_right, left_noad)

    primitive(strpool!("right"), left_right, right_noad)

    text(frozen_right) = strpool!("right")

    eqtb[frozen_right] = eqtb[cur_val]
⟧

1243.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    left_right:

    if (chr_code == left_noad) {
        print_esc(strpool!("left"));
    }

    ⟦1508 Cases of |left_right| for |print_cmd_chr|⟧

    else

    print_esc(strpool!("right"))
⟧

1244.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + left_right:
      math_left_right;
⟧

1245.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function math_left_right() {
        var
          t: small_number, //  left_noad or right_noad 
          p: pointer, // new noad
          q: pointer; // resulting mlist
        
        t = cur_chr;
        if (
            (t != left_noad)
            && (cur_group != math_left_group)
        ) {
            ⟦1246 Try to recover from mismatched \.{\\right}⟧
        } else {
            p = new_noad;
            type(p) = t;
            scan_delimiter(delimiter(p), false);
            if (t == middle_noad) {
                type(p) = right_noad;
                subtype(p) = middle_noad;
            }
            if (t == left_noad) {
                q = p;
            } else {
                q = fin_mlist(p);
                // end of math_left_group 
                unsave;
            }
            if (t != right_noad) {
                push_math(math_left_group);
                link(head) = q;
                tail = p;
                delim_ptr = p;
            } else {
                tail_append(new_noad);
                type(tail) = inner_noad;
                math_type(nucleus(tail)) = sub_mlist;
                info(nucleus(tail)) = q;
            }
        }
    }
⟧
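The overall effect of math_left_right on a stream of \left, \middle, and \right commands can be approximated by a stack of lists, as in the Python sketch below. It is a rough model under stated assumptions: tokens are (kind, value) pairs, a stray \middle or \right is simply recorded and skipped, and an unclosed \left raises an exception because the program's recovery for that case is not modelled. The function name is hypothetical.

```python
def build_mlist(tokens):
    """Return (mlist, errors) for a flat stream of (kind, value) tokens."""
    stack = [[]]   # stack[-1] is the list being built at the current level
    errors = []
    for kind, value in tokens:
        if kind == "left":
            stack.append([("left", value)])          # open a left group
        elif kind in ("middle", "right"):
            if len(stack) == 1:
                errors.append("Extra \\" + kind)     # no matching \left
            elif kind == "middle":
                stack[-1].append(("middle", value))  # the group stays open
            else:
                group = stack.pop()
                group.append(("right", value))
                stack[-1].append(("inner", group))   # group becomes one Inner noad
        else:
            stack[-1].append((kind, value))
    if len(stack) != 1:
        raise ValueError("unclosed left group (recovery is not modelled here)")
    return stack[0], errors
```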

1246.

⟦1246 Try to recover from mismatched \.{\\right}⟧ = ⟦
    {
        if (cur_group == math_shift_group) {
            scan_delimiter(garbage, false);
            print_err(strpool!("Extra "));
            if (t == middle_noad) {
                print_esc(strpool!("middle"));
                help1(
                  strpool!("I'm ignoring a \\middle that had no matching \\left."),
                );
            } else {
                print_esc(strpool!("right"));
                help1(
                  strpool!("I'm ignoring a \\right that had no matching \\left."),
                );
            }
            error;
        } else {
            off_save;
        }
    }
⟧

1247. Here is the only way out of math mode.

⟦1110 Cases of |main_control| that build boxes and lists⟧ += ⟦
    mmode + math_shift:
      if (cur_group == math_shift_group) {
          after_math;
      } else {
          off_save;
      }
⟧

1248.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    ⟦1555 Declare subprocedures for |after_math|⟧

    function after_math() {
        var
          l: boolean, // `\.{\\leqno}' instead of 
          // `\.{\\eqno}'
          danger: boolean, // not enough symbol fonts are 
          // present
          m: integer, //  mmode or - mmode 
          p: pointer, // the formula
          a: pointer, // box containing equation number
          ⟦1252 Local variables for finishing a displayed formula⟧;
        
        danger = false;
        ⟦1553 Retrieve the prototype box⟧
        ⟦1249 Check that the necessary fonts for math symbols are present; if not, flush the current math lists and set |danger:=true|⟧
        m = mode;
        l = false;
        // this pops the nest
        p = fin_mlist(null);
        // end of equation number
        if (mode == -m) {
            ⟦1251 Check that another \.\$ follows⟧
            cur_mlist = p;
            cur_style = text_style;
            mlist_penalties = false;
            mlist_to_hlist;
            a = hpack(link(temp_head), natural);
            set_box_lr(a)(dlist);
            unsave;
            // now cur_group == math_shift_group 
            decr(save_ptr);
            if (saved(0) == 1) {
                l = true;
            }
            danger = false;
            ⟦1553 Retrieve the prototype box⟧
            ⟦1249 Check that the necessary fonts for math symbols are present; if not, flush the current math lists and set |danger:=true|⟧
            m = mode;
            p = fin_mlist(null);
        } else {
            a = null;
        }
        if (m < 0) {
            ⟦1250 Finish math in text⟧
        } else {
            if (a == null) {
                ⟦1251 Check that another \.\$ follows⟧
            }
            ⟦1253 Finish displayed math⟧
        }
    }
⟧

1249.

⟦1249 Check that the necessary fonts for math symbols are present; if not, flush the current math lists and set |danger:=true|⟧ = ⟦
    if (
        (
            (
                font_params[fam_fnt(2 + text_size)]
                < total_mathsy_params
            )
            && (!is_new_mathfont(fam_fnt(2 + text_size)))
        )
        || (
            (
                font_params[fam_fnt(2 + script_size)]
                < total_mathsy_params
            )
            && (!is_new_mathfont(fam_fnt(2 + script_size)))
        )
        || (
            (
                font_params[
                  fam_fnt(2 + script_script_size),
                ]
                < total_mathsy_params
            )
            && (!is_new_mathfont(
              fam_fnt(2 + script_script_size),
            ))
        )
    ) {
        print_err(
          strpool!("Math formula deleted: Insufficient symbol fonts"),
        );
        help3(
          strpool!("Sorry, but I can't typeset math unless \\textfont 2"),
        )(
          strpool!("and \\scriptfont 2 and \\scriptscriptfont 2 have all"),
        )(
          strpool!("the \\fontdimen values needed in math symbol fonts."),
        );
        error;
        flush_math;
        danger = true;
    } else if (
        (
            (
                font_params[fam_fnt(3 + text_size)]
                < total_mathex_params
            )
            && (!is_new_mathfont(fam_fnt(3 + text_size)))
        )
        || (
            (
                font_params[fam_fnt(3 + script_size)]
                < total_mathex_params
            )
            && (!is_new_mathfont(fam_fnt(3 + script_size)))
        )
        || (
            (
                font_params[
                  fam_fnt(3 + script_script_size),
                ]
                < total_mathex_params
            )
            && (!is_new_mathfont(
              fam_fnt(3 + script_script_size),
            ))
        )
    ) {
        print_err(
          strpool!("Math formula deleted: Insufficient extension fonts"),
        );
        help3(
          strpool!("Sorry, but I can't typeset math unless \\textfont 3"),
        )(
          strpool!("and \\scriptfont 3 and \\scriptscriptfont 3 have all"),
        )(
          strpool!("the \\fontdimen values needed in math extension fonts."),
        );
        error;
        flush_math;
        danger = true;
    }
⟧
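The long condition above has a regular shape: for each of the three sizes, a font is insufficient when it has fewer parameters than required and is not a new-style math font; family 2 is checked first, then family 3. The Python sketch below captures that shape. The two constants are assumed to have the values defined in TeX82 (they are not defined in this part of the program), and the `Font` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

TOTAL_MATHSY_PARAMS = 22   # assumed value, defined elsewhere in the program
TOTAL_MATHEX_PARAMS = 13   # assumed value, defined elsewhere in the program


@dataclass
class Font:
    num_params: int
    is_new_mathfont: bool = False


def insufficient(fonts: Sequence[Font], required: int) -> bool:
    """True if any size lacks parameters and is not a new-style math font."""
    return any(f.num_params < required and not f.is_new_mathfont for f in fonts)


def check_math_fonts(symbol_fonts: Sequence[Font],
                     extension_fonts: Sequence[Font]) -> Optional[str]:
    """Return an error message, or None if the formula can be typeset."""
    if insufficient(symbol_fonts, TOTAL_MATHSY_PARAMS):
        return "Math formula deleted: Insufficient symbol fonts"
    if insufficient(extension_fonts, TOTAL_MATHEX_PARAMS):
        return "Math formula deleted: Insufficient extension fonts"
    return None
```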

1250. The unsave is done after everything else here; hence an appearance of ‘\mathsurround’ inside of ‘$...$’ affects the spacing at these particular $’s. This is consistent with the conventions of ‘$$...$$’, since ‘\abovedisplayskip’ inside a display affects the space above that display.

⟦1250 Finish math in text⟧ = ⟦
    {
        tail_append(new_math(math_surround, before));
        cur_mlist = p;
        cur_style = text_style;
        mlist_penalties = (mode > 0);
        mlist_to_hlist;
        link(tail) = link(temp_head);
        while (link(tail) != null) {
            tail = link(tail);
        }
        tail_append(new_math(math_surround, after));
        space_factor = 1000;
        unsave;
    }
⟧

1251. TEX gets to the following part of the program when the first ‘$’ ending a display has been scanned.

⟦1251 Check that another \.\$ follows⟧ = ⟦
    {
        get_x_token;
        if (cur_cmd != math_shift) {
            print_err(
              strpool!("Display math should end with $$"),
            );
            help2(
              strpool!("The `$' that I just saw supposedly matches a previous `$$'."),
            )(
              strpool!("So I shall assume that you typed `$$' both times."),
            );
            back_error;
        }
    }
⟧

1252. We have saved the worst for last: The fussiest part of math mode processing occurs when a displayed formula is being centered and placed with an optional equation number.

⟦1252 Local variables for finishing a displayed formula⟧ = ⟦
    // box containing the equation
    var b: pointer;

    // width of the equation
    var w: scaled;

    // width of the line
    var z: scaled;

    // width of equation number
    var e: scaled;

    // width of equation number plus space to separate from equation
    var q: scaled;

    // displacement of equation in the line
    var d: scaled;

    // move the line right this much
    var s: scaled;

    // glue parameter codes for before and after
    var g1, g2: small_number;

    // kern node used to position the display
    var r: pointer;

    // tail of adjustment list
    var t: pointer;

    // tail of pre-adjustment list
    var pre_t: pointer;
⟧

1253. At this time p points to the mlist for the formula; a is either null or it points to a box containing the equation number; and we are in vertical mode (or internal vertical mode).

⟦1253 Finish displayed math⟧ = ⟦
    cur_mlist = p

    cur_style = display_style

    mlist_penalties = false

    mlist_to_hlist

    p = link(temp_head)

    adjust_tail = adjust_head

    pre_adjust_tail = pre_adjust_head

    b = hpack(p, natural)

    p = list_ptr(b)

    t = adjust_tail

    adjust_tail = null

    pre_t = pre_adjust_tail

    pre_adjust_tail = null

    w = width(b)

    z = display_width

    s = display_indent

    if (pre_display_direction < 0) {
        s = -s - z;
    }

    if ((a == null) || danger) {
        e = 0;
        q = 0;
    } else {
        e = width(a);
        q = e + math_quad(text_size);
    }

    if (w + q > z) {
        ⟦1255 Squeeze the equation as much as possible; if there is an equation number that should go on a separate line by itself, set~|e:=0|⟧
    }

    ⟦1256 Determine the displacement, |d|, of the left edge of the equation, with respect to the line size |z|, assuming that |l=false|⟧

    ⟦1257 Append the glue or equation number preceding the display⟧

    ⟦1258 Append the display and perhaps also the equation number⟧

    ⟦1259 Append the glue or equation number following the display⟧

    ⟦1554 Flush the prototype box⟧

    resume_after_display
⟧

1254.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function resume_after_display() {
        if (cur_group != math_shift_group) {
            confusion(strpool!("display"));
        }
        unsave;
        prev_graf = prev_graf + 3;
        push_nest;
        mode = hmode;
        space_factor = 1000;
        set_cur_lang;
        clang = cur_lang;
        prev_graf = 
            (
                norm_min(left_hyphen_min)
                * 0x40 + norm_min(right_hyphen_min)
            )
            * 0x10000 + cur_lang
        ;
        ⟦477 Scan an optional space⟧
        if (nest_ptr == 1) {
            build_page;
        }
    }
⟧
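
As an illustration only (Python is not the language of this program), the value assigned to prev_graf at the end of resume_after_display can be sketched as follows. The names pack_hyphen_state and unpack_hyphen_state are hypothetical, and norm_min is assumed to clamp its argument to the range 1..63, which is what the multiplier 0x40 suggests.

```python
def norm_min(h):
    """Clamp a hyphenation minimum into 1..63 (assumed behavior of norm_min)."""
    if h <= 0:
        return 1
    if h >= 64:
        return 63
    return h


def pack_hyphen_state(left_hyphen_min, right_hyphen_min, cur_lang):
    """Mirror the expression assigned to prev_graf in resume_after_display."""
    mins = norm_min(left_hyphen_min) * 0x40 + norm_min(right_hyphen_min)
    return mins * 0x10000 + cur_lang


def unpack_hyphen_state(packed):
    """Hypothetical inverse: recover (left_min, right_min, lang)."""
    mins, cur_lang = divmod(packed, 0x10000)
    left_min, right_min = divmod(mins, 0x40)
    return left_min, right_min, cur_lang
```

Because each clamped minimum fits in six bits and the language code sits in the low sixteen bits, the three fields can be recovered independently from the single integer.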

1255. The user can force the equation number to go on a separate line by causing its width to be zero.

⟦1255 Squeeze the equation as much as possible; if there is an equation number that should go on a separate line by itself, set~|e:=0|⟧ = ⟦
    {
        if (
            (e != 0)
            && (
                (w - total_shrink[normal] + q <= z)
                || (total_shrink[fil] != 0)
                || (total_shrink[fill] != 0)
                || (total_shrink[filll] != 0)
            )
        ) {
            free_node(b, box_node_size);
            b = hpack(p, z - q, exactly);
        } else {
            e = 0;
            if (w > z) {
                free_node(b, box_node_size);
                b = hpack(p, z, exactly);
            }
        }
        w = width(b);
    }
⟧
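
The decision made in this section can be restated as a small Python sketch. It is an illustration under simplifying assumptions, not the program's code: the function name squeeze_plan is hypothetical, the three infinite shrink totals are collapsed into one boolean, and the result only describes what hpack would be asked to do.

```python
def squeeze_plan(w, z, e, q, shrink_normal, has_infinite_shrink):
    """Return (target_width, e) for a display with w + q > z.

    target_width is the width the box would be repacked to with `exactly`,
    or None when the box is left at its natural width.
    """
    if e != 0 and (w - shrink_normal + q <= z or has_infinite_shrink):
        # The equation can shrink enough to share the line with its number.
        return z - q, e
    # Otherwise the number goes on a line by itself (e becomes 0), and the
    # equation is repacked only if it is still wider than the line.
    return (z if w > z else None), 0
```

For example, with w = 90, z = 100, e = 10 and q = 15, ten units of finite shrinkability are enough to keep the number on the same line; with no shrinkability the number moves to its own line and the box is left alone.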

1256. We try first to center the display without regard to the existence of the equation number. If that would make it too close (where “too close” means that the space between display and equation number is less than the width of the equation number), we either center it in the remaining space or move it as far from the equation number as possible. The latter alternative is taken only if the display begins with glue, since we assume that the user put glue there to control the spacing precisely.

⟦1256 Determine the displacement, |d|, of the left edge of the equation, with respect to the line size |z|, assuming that |l=false|⟧ = ⟦
    set_box_lr(b)(dlist)

    d = half(z - w)

    // too close
    if ((e > 0) && (d < 2 * e)) {
        d = half(z - w - e);
        if (p != null) {
            if (!is_char_node(p)) {
                if (type(p) == glue_node) {
                    d = 0;
                }
            }
        }
    }
⟧
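
The test d < 2 * e encodes the "too close" rule: a centered display leaves a gap of about d on its right, the number takes e of that gap, so the clear space is d - e, which is less than e exactly when d < 2e. The following Python sketch illustrates the computation; the name displacement and the boolean starts_with_glue are hypothetical stand-ins for the node tests on p, and half is written to agree with the program's rounding on the values used here.

```python
def half(x):
    """Halve an integer, rounding odd values upward as (x + 1) / 2."""
    return (x + 1) // 2 if x % 2 else x // 2


def displacement(z, w, e, starts_with_glue):
    """Left-edge displacement d of an equation of width w in a line of width z."""
    d = half(z - w)              # first try: center, ignoring the number
    if e > 0 and d < 2 * e:      # too close to the equation number
        d = half(z - w - e)      # center in the space the number leaves free
        if starts_with_glue:     # the user's leading glue controls the spacing
            d = 0
    return d
```

With z = 100, w = 70 and e = 10, plain centering would give d = 15, which is too close; the display is then centered in the remaining 20 units, giving d = 10, or placed at d = 0 if it begins with glue.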

1257. If the equation number is set on a line by itself, either before or after the formula, we append an infinite penalty so that no page break will separate the display from its number; and we use the same size and displacement for all three potential lines of the display, even though ‘\parshape’ may specify them differently.

⟦1257 Append the glue or equation number preceding the display⟧ = ⟦
    tail_append(new_penalty(pre_display_penalty))

    // not enough clearance
    if ((d + s <= pre_display_size) || l) {
        g1 = above_display_skip_code;
        g2 = below_display_skip_code;
    } else {
        g1 = above_display_short_skip_code;
        g2 = below_display_short_skip_code;
    }

    // it follows that type(a) == hlist_node
    if (l && (e == 0)) {
        app_display(j, a, 0);
        tail_append(new_penalty(inf_penalty));
    } else {
        tail_append(new_param_glue(g1));
    }
⟧

1258.

⟦1258 Append the display and perhaps also the equation number⟧ = ⟦
    if (e != 0) {
        r = new_kern(z - w - e - d);
        if (l) {
            link(a) = r;
            link(r) = b;
            b = a;
            d = 0;
        } else {
            link(b) = r;
            link(r) = a;
        }
        b = hpack(b, natural);
    }

    app_display(j, b, d)
⟧
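
A short Python sketch may make the kern arithmetic easier to check. It is illustrative only: assemble_display_line is a hypothetical name, and the list of (name, width) pairs stands in for the linked nodes that the program actually packs with hpack.

```python
def assemble_display_line(z, w, e, d, l):
    """Return (items, d): the pieces of the display line from left to right
    and the displacement at which the packed line is appended."""
    if e == 0:
        return [("equation", w)], d
    kern = z - w - e - d         # whatever is left of the line width z
    if l:
        # Number on the left: it starts at the left edge, so d becomes 0.
        return [("number", e), ("kern", kern), ("equation", w)], 0
    return [("equation", w), ("kern", kern), ("number", e)], d
```

When the number is on the right, the displacement plus the three widths always adds up to z, so the number ends flush with the right edge of the line. When it is on the left, the equation ends at z - d, the mirror image of the right-hand case.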

1259.

⟦1259 Append the glue or equation number following the display⟧ = ⟦
    if ((a != null) && (e == 0) && !l) {
        tail_append(new_penalty(inf_penalty));
        app_display(j, a, z - width(a));
        g2 = 0;
    }

    // migrating material comes after equation number
    if (t != adjust_head) {
        link(tail) = link(adjust_head);
        tail = t;
    }

    if (pre_t != pre_adjust_head) {
        link(tail) = link(pre_adjust_head);
        tail = pre_t;
    }

    tail_append(new_penalty(post_display_penalty))

    if (g2 > 0) {
        tail_append(new_param_glue(g2));
    }
⟧

1260. When \halign appears in a display, the alignment routines operate essentially as they do in vertical mode. Then the following program is activated, with p and q pointing to the beginning and end of the resulting list, and with aux_save holding the prev_depth value.

⟦1260 Finish an alignment in a display⟧ = ⟦
    {
        do_assignments;
        if (cur_cmd != math_shift) {
            ⟦1261 Pontificate about improper alignment in display⟧
        } else {
            ⟦1251 Check that another \.\$ follows⟧
        }
        flush_node_list(LR_box);
        pop_nest;
        tail_append(new_penalty(pre_display_penalty));
        tail_append(
          new_param_glue(above_display_skip_code),
        );
        link(tail) = p;
        if (p != null) {
            tail = q;
        }
        tail_append(new_penalty(post_display_penalty));
        tail_append(
          new_param_glue(below_display_skip_code),
        );
        prev_depth = aux_save.sc;
        resume_after_display;
    }
⟧

1261.

⟦1261 Pontificate about improper alignment in display⟧ = ⟦
    {
        print_err(strpool!("Missing $$ inserted"));
        help2(
          strpool!("Displays can use special alignments (like \\eqalignno)"),
        )(
          strpool!("only if nothing but the alignment itself is between $$'s."),
        );
        back_error;
    }
⟧

1262. [49] Mode-independent processing. The long main_control procedure has now been fully specified, except for certain activities that are independent of the current mode. These activities do not change the current vlist or hlist or mlist; if they change anything, it is the value of a parameter or the meaning of a control sequence.

Assignments to values in eqtb can be global or local. Furthermore, a control sequence can be defined to be ‘\long’, ‘\protected’, or ‘\outer’, and it might or might not be expanded. The prefixes ‘\global’, ‘\long’, ‘\protected’, and ‘\outer’ can occur in any order. Therefore we assign binary numeric codes, making it possible to accumulate the union of all specified prefixes by adding the corresponding codes. (Pascal’s set operations could also have been used.)

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("long"), prefix, 1)

    primitive(strpool!("outer"), prefix, 2)

    primitive(strpool!("global"), prefix, 4)

    primitive(strpool!("def"), def, 0)

    primitive(strpool!("gdef"), def, 1)

    primitive(strpool!("edef"), def, 2)

    primitive(strpool!("xdef"), def, 3)
⟧

1263.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    prefix:

    if (chr_code == 1) {
        print_esc(strpool!("long"));
    } else if (chr_code == 2) {
        print_esc(strpool!("outer"));
    }

    ⟦1582 Cases of |prefix| for |print_cmd_chr|⟧

    else

    print_esc(strpool!("global"))

    def:

    if (chr_code == 0) {
        print_esc(strpool!("def"));
    } else if (chr_code == 1) {
        print_esc(strpool!("gdef"));
    } else if (chr_code == 2) {
        print_esc(strpool!("edef"));
    } else {
        print_esc(strpool!("xdef"));
    }
⟧

1264. Every prefix, and every command code that might or might not be prefixed, calls the action procedure prefixed_command. This routine accumulates a sequence of prefixes until coming to a non-prefix, then it carries out the command.

⟦1264 Cases of |main_control| that don't depend on |mode|⟧ = ⟦
    any_mode(toks_register),
    any_mode(assign_toks),
    any_mode(assign_int),
    any_mode(assign_dimen),
    any_mode(assign_glue),
    any_mode(assign_mu_glue),
    any_mode(assign_font_dimen),
    any_mode(assign_font_int),
    any_mode(set_aux),
    any_mode(set_prev_graf),
    any_mode(set_page_dimen),
    any_mode(set_page_int),
    any_mode(set_box_dimen),
    any_mode(set_shape),
    any_mode(def_code),
    any_mode(XeTeX_def_code),
    any_mode(def_family),
    any_mode(set_font),
    any_mode(def_font),
    any_mode(register),
    any_mode(advance),
    any_mode(multiply),
    any_mode(divide),
    any_mode(prefix),
    any_mode(let),
    any_mode(shorthand_def),
    any_mode(read_to_cs),
    any_mode(def),
    any_mode(set_box),
    any_mode(hyph_data),
    any_mode(set_interaction):
      prefixed_command;
⟧

1265. If the user says, e.g., ‘\global\global’, the redundancy is silently accepted.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    ⟦1269 Declare subprocedures for |prefixed_command|⟧

    function prefixed_command() {
        label done, exit;
        var
          a: small_number, // accumulated prefix codes so far
          f: internal_font_number, // identifies a font
          j: halfword, // index into a \parshape specification
          k: font_index, // index into font_info
          p, q: pointer, // for temporary short-term use
          n: integer, // ditto
          e: boolean; // should a definition be expanded? or was \let not done?
        
        a = 0;
        while (cur_cmd == prefix) {
            if (!odd(a div cur_chr)) {
                a = a + cur_chr;
            }
            ⟦438 Get the next non-blank non-relax non-call token⟧
            if (cur_cmd <= max_non_prefixed_command) {
                ⟦1266 Discard erroneous prefixes and |return|⟧
            }
            if (tracing_commands > 2) {
                if (eTeX_ex) {
                    show_cur_cmd_chr;
                }
            }
        }
        ⟦1267 Discard the prefixes \.{\\long} and \.{\\outer} if they are irrelevant⟧
        ⟦1268 Adjust \(f)for the setting of \.{\\globaldefs}⟧
        case cur_cmd {
          ⟦1271 Assignments⟧
          othercases:
            confusion(strpool!("prefix"));
        }
      done:
        ⟦1323 Insert a token saved by \.{\\afterassignment}, if any⟧
      exit:
    }
⟧
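
The test odd(a div cur_chr) asks whether the bit for the current prefix is already present in a, which is why a repeated prefix adds nothing. The Python sketch below illustrates this; accumulate_prefixes is a hypothetical name, and the code 8 for \protected is an assumption inferred from the test a >= 8 that appears later in this part of the program.

```python
LONG, OUTER, GLOBAL = 1, 2, 4
PROTECTED = 8  # assumed value, inferred from the `a >= 8` test


def accumulate_prefixes(prefix_codes):
    """Form the union of prefix codes by addition, ignoring repeats."""
    a = 0
    for code in prefix_codes:
        if (a // code) % 2 == 0:   # bit for this prefix is not yet set
            a += code
    return a
```

Since every code is a distinct power of two, adding a code at most once gives the same result as a bitwise or, and the order of the prefixes does not matter.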

1266.

⟦1266 Discard erroneous prefixes and |return|⟧ = ⟦
    {
        print_err(
          strpool!("You can't use a prefix with `"),
        );
        print_cmd_chr(cur_cmd, cur_chr);
        print_char(ord!("'"));
        help1(
          strpool!("I'll pretend you didn't say \\long or \\outer or \\global."),
        );
        if (eTeX_ex) {
            help_line[0] = strpool!("I'll pretend you didn't say \\long or \\outer or \\global or \\protected.");
        }
        back_error;
        return;
    }
⟧

1267.

⟦1267 Discard the prefixes \.{\\long} and \.{\\outer} if they are irrelevant⟧ = ⟦
    if (a >= 8) {
        j = protected_token;
        a = a - 8;
    } else {
        j = 0;
    }

    if ((cur_cmd != def) && ((a % 4 != 0) || (j != 0))) {
        print_err(strpool!("You can't use `"));
        print_esc(strpool!("long"));
        print(strpool!("' or `"));
        print_esc(strpool!("outer"));
        help1(
          strpool!("I'll pretend you didn't say \\long or \\outer here."),
        );
        if (eTeX_ex) {
            help_line[0] = strpool!("I'll pretend you didn't say \\long or \\outer or \\protected here.");
            print(strpool!("' or `"));
            print_esc(strpool!("protected"));
        }
        print(strpool!("' with `"));
        print_cmd_chr(cur_cmd, cur_chr);
        print_char(ord!("'"));
        error;
    }
⟧

1268. The previous routine does not have to adjust a so that a % 4 == 0, since the following routines test for the \global prefix as follows.

@define global => (a >= 4)
@define define(#) =>
    if (global) {
        geq_define(#);
    } else {
        eq_define(#);
    }
@define word_define(#) =>
    if (global) {
        geq_word_define(#);
    } else {
        eq_word_define(#);
    }
@define word_define1(#) =>
    if (global) {
        geq_word_define1(#);
    } else {
        eq_word_define1(#);
    }
⟦1268 Adjust \(f)for the setting of \.{\\globaldefs}⟧ = ⟦
    if (global_defs != 0) {
        if (global_defs < 0) {
            if (global) {
                a = a - 4;
            }
        } else {
            if (!global) {
                a = a + 4;
            }
        }
    }
⟧
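
The effect of \globaldefs on the accumulated prefix code can be illustrated by the following Python sketch. The function name is hypothetical, and the sketch assumes that the \protected bit has already been removed, so that a is less than 8 and the test a >= 4 identifies \global.

```python
def adjust_for_globaldefs(a, global_defs):
    """Force or suppress the \\global bit according to the sign of global_defs."""
    is_global = a >= 4
    if global_defs < 0 and is_global:
        a -= 4        # negative: every assignment is local
    elif global_defs > 0 and not is_global:
        a += 4        # positive: every assignment is global
    return a
```

A zero setting leaves a untouched, so an explicit \global is honored only in that case or when the setting is positive.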

1269. When a control sequence is to be defined, by \def or \let or something similar, the get_r_token routine will substitute a special control sequence for a token that is not redefinable.

⟦1269 Declare subprocedures for |prefixed_command|⟧ = ⟦
    function get_r_token() {
        label restart;
        
      restart:
        repeat {
            get_token;
        } until (cur_tok != space_token);
        if (
            (cur_cs == 0)
            || (cur_cs > eqtb_top)
            || (
                (cur_cs > frozen_control_sequence)
                && (cur_cs <= eqtb_size)
            )
        ) {
            print_err(
              strpool!("Missing control sequence inserted"),
            );
            help5(
              strpool!("Please don't say `\\def cs{...}', say `\\def\\cs{...}'."),
            )(
              strpool!("I've inserted an inaccessible control sequence so that your"),
            )(
              strpool!("definition will be completed without mixing me up too badly."),
            )(
              strpool!("You can recover graciously from this error, if you're"),
            )(
              strpool!("careful; see exercise 27.2 in The TeXbook."),
            );
            if (cur_cs == 0) {
                back_input;
            }
            cur_tok = cs_token_flag + frozen_protection;
            ins_error;
            goto restart;
        }
    }
⟧

1270.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    text(frozen_protection) = strpool!("inaccessible")
⟧

1271. Here’s an example of the way many of the following routines operate. (Unfortunately, they aren’t all as simple as this.)

⟦1271 Assignments⟧ = ⟦
    set_font:

    define(cur_font_loc, data, cur_chr)
⟧

1272. When a def command has been scanned, cur_chr is odd if the definition is supposed to be global, and cur_chr >= 2 if the definition is supposed to be expanded.

⟦1271 Assignments⟧ += ⟦
    def:

    {
        if (odd(cur_chr) && !global && (global_defs >= 0)) {
            a = a + 4;
        }
        e = (cur_chr >= 2);
        get_r_token;
        p = cur_cs;
        q = scan_toks(true, e);
        if (j != 0) {
            q = get_avail;
            info(q) = j;
            link(q) = link(def_ref);
            link(def_ref) = q;
        }
        define(p, call + (a % 4), def_ref);
    }
⟧
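
The two bits of cur_chr can be read off as in the Python sketch below, which is an illustration rather than the program's code. The name decode_def is hypothetical; a is taken to be the prefix code after the adjustments of the preceding sections, and the codes 0 to 3 stand for \def, \gdef, \edef and \xdef respectively.

```python
def decode_def(cur_chr, a, global_defs):
    """Return (is_global, expand) for a def command with modifier cur_chr."""
    if cur_chr % 2 == 1 and a < 4 and global_defs >= 0:
        a += 4                    # \gdef and \xdef imply \global
    return a >= 4, cur_chr >= 2   # \edef and \xdef expand the replacement text
```

Note that a negative \globaldefs wins over the implicit global of \gdef: decode_def(1, 0, -1) reports a local, unexpanded definition.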

1273. Both \let and \futurelet share the command code let.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("let"), let, normal)

    primitive(strpool!("futurelet"), let, normal + 1)
⟧

1274.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    let:

    if (chr_code != normal) {
        print_esc(strpool!("futurelet"));
    } else {
        print_esc(strpool!("let"));
    }
⟧

1275.

⟦1271 Assignments⟧ += ⟦
    let:

    {
        n = cur_chr;
        get_r_token;
        p = cur_cs;
        if (n == normal) {
            repeat {
                get_token;
            } until (cur_cmd != spacer);
            if (cur_tok == other_token + ord!("=")) {
                get_token;
                if (cur_cmd == spacer) {
                    get_token;
                }
            }
        } else {
            get_token;
            q = cur_tok;
            get_token;
            back_input;
            cur_tok = q;
            // look ahead, then back up
            back_input;
            // note that back_input doesn't affect cur_cmd, cur_chr
        }
        if (cur_cmd >= call) {
            add_token_ref(cur_chr);
        } else if (
            (cur_cmd == register)
            || (cur_cmd == toks_register)
        ) {
            if (
                (cur_chr < mem_bot)
                || (cur_chr > lo_mem_stat_max)
            ) {
                add_sa_ref(cur_chr);
            }
        }
        define(p, cur_cmd, cur_chr);
    }
⟧

1276. A \chardef creates a control sequence whose cmd is char_given; a \mathchardef creates a control sequence whose cmd is math_given; and the corresponding chr is the character code or math code. A \countdef or \dimendef or \skipdef or \muskipdef creates a control sequence whose cmd is assign_int or … or assign_mu_glue, and the corresponding chr is the eqtb location of the internal register in question.

// shorthand_def for \chardef
@define char_def_code => 0
// shorthand_def for \mathchardef
@define math_char_def_code => 1
// shorthand_def for \countdef
@define count_def_code => 2
// shorthand_def for \dimendef
@define dimen_def_code => 3
// shorthand_def for \skipdef
@define skip_def_code => 4
// shorthand_def for \muskipdef
@define mu_skip_def_code => 5
// shorthand_def for \toksdef
@define toks_def_code => 6
// shorthand_def for \charsubdef
@define char_sub_def_code => 7
@define XeTeX_math_char_num_def_code => 8
@define XeTeX_math_char_def_code => 9
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("chardef"),
      shorthand_def,
      char_def_code,
    )

    primitive(
      strpool!("mathchardef"),
      shorthand_def,
      math_char_def_code,
    )

    primitive(
      strpool!("XeTeXmathcharnumdef"),
      shorthand_def,
      XeTeX_math_char_num_def_code,
    )

    primitive(
      strpool!("Umathcharnumdef"),
      shorthand_def,
      XeTeX_math_char_num_def_code,
    )

    primitive(
      strpool!("XeTeXmathchardef"),
      shorthand_def,
      XeTeX_math_char_def_code,
    )

    primitive(
      strpool!("Umathchardef"),
      shorthand_def,
      XeTeX_math_char_def_code,
    )

    primitive(
      strpool!("countdef"),
      shorthand_def,
      count_def_code,
    )

    primitive(
      strpool!("dimendef"),
      shorthand_def,
      dimen_def_code,
    )

    primitive(
      strpool!("skipdef"),
      shorthand_def,
      skip_def_code,
    )

    primitive(
      strpool!("muskipdef"),
      shorthand_def,
      mu_skip_def_code,
    )

    primitive(
      strpool!("toksdef"),
      shorthand_def,
      toks_def_code,
    )

    if (mltex_p) {
        primitive(
          strpool!("charsubdef"),
          shorthand_def,
          char_sub_def_code,
        );
    }
⟧

1277.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    shorthand_def:

    case chr_code {
      char_def_code:
        print_esc(strpool!("chardef"));
      math_char_def_code:
        print_esc(strpool!("mathchardef"));
      XeTeX_math_char_def_code:
        print_esc(strpool!("Umathchardef"));
      XeTeX_math_char_num_def_code:
        print_esc(strpool!("Umathcharnumdef"));
      count_def_code:
        print_esc(strpool!("countdef"));
      dimen_def_code:
        print_esc(strpool!("dimendef"));
      skip_def_code:
        print_esc(strpool!("skipdef"));
      mu_skip_def_code:
        print_esc(strpool!("muskipdef"));
      char_sub_def_code:
        print_esc(strpool!("charsubdef"));
      othercases:
        print_esc(strpool!("toksdef"));
    }

    char_given:

    {
        print_esc(strpool!("char"));
        print_hex(chr_code);
    }

    math_given:

    {
        print_esc(strpool!("mathchar"));
        print_hex(chr_code);
    }

    XeTeX_math_given:

    {
        print_esc(strpool!("Umathchar"));
        print_hex(math_class_field(chr_code));
        print_hex(math_fam_field(chr_code));
        print_hex(math_char_field(chr_code));
    }
⟧

1278. We temporarily define p to be relax, so that an occurrence of p while scanning the definition will simply stop the scanning instead of producing an “undefined control sequence” error or expanding the previous meaning. This allows, for instance, ‘\chardef\foo=123\foo’.

⟦1271 Assignments⟧ += ⟦
    shorthand_def:

    if (cur_chr == char_sub_def_code) {
        scan_char_num;
        p = char_sub_code_base + cur_val;
        scan_optional_equals;
        scan_char_num;
        // accent character in substitution
        n = cur_val;
        scan_char_num;
        if ((tracing_char_sub_def > 0)) {
            begin_diagnostic;
            print_nl(
              strpool!("New character substitution: "),
            );
            print_ASCII(p - char_sub_code_base);
            print(strpool!(" = "));
            print_ASCII(n);
            print_char(ord!(" "));
            print_ASCII(cur_val);
            end_diagnostic(false);
        }
        n = n * 256 + cur_val;
        define(p, data, hi(n));
        if ((p - char_sub_code_base) < char_sub_def_min) {
            word_define(
              int_base + char_sub_def_min_code,
              p - char_sub_code_base,
            );
        }
        if ((p - char_sub_code_base) > char_sub_def_max) {
            word_define(
              int_base + char_sub_def_max_code,
              p - char_sub_code_base,
            );
        }
    } else {
        n = cur_chr;
        get_r_token;
        p = cur_cs;
        define(p, relax, too_big_usv);
        scan_optional_equals;
        case n {
          char_def_code:
            scan_usv_num;
            define(p, char_given, cur_val);
          math_char_def_code:
            scan_fifteen_bit_int;
            define(p, math_given, cur_val);
          XeTeX_math_char_num_def_code:
            scan_xetex_math_char_int;
            define(p, XeTeX_math_given, cur_val);
          XeTeX_math_char_def_code:
            scan_math_class_int;
            n = set_class_field(cur_val);
            scan_math_fam_int;
            n = n + set_family_field(cur_val);
            scan_usv_num;
            n = n + cur_val;
            define(p, XeTeX_math_given, n);
          othercases:
            scan_register_num;
            if (cur_val > 255) {
                // int_val..box_val
                j = n - count_def_code;
                if (j > mu_val) {
                    // int_val..mu_val or tok_val
                    j = tok_val;
                }
                find_sa_element(j, cur_val, true);
                add_sa_ref(cur_ptr);
                if (j == tok_val) {
                    j = toks_register;
                } else {
                    j = register;
                }
                define(p, j, cur_ptr);
            } else {
                case n {
                  count_def_code:
                    define(
                      p,
                      assign_int,
                      count_base + cur_val,
                    );
                  dimen_def_code:
                    define(
                      p,
                      assign_dimen,
                      scaled_base + cur_val,
                    );
                  skip_def_code:
                    define(
                      p,
                      assign_glue,
                      skip_base + cur_val,
                    );
                  mu_skip_def_code:
                    define(
                      p,
                      assign_mu_glue,
                      mu_skip_base + cur_val,
                    );
                  toks_def_code:
                    define(
                      p,
                      assign_toks,
                      toks_base + cur_val,
                    ); // there are no other cases
                }
            }
        }
    }
⟧
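
The register branch of this section chooses between two representations, depending on whether the register number exceeds 255. The Python sketch below illustrates that choice only. The function name shorthand_target is hypothetical, the eqtb bases are represented by their names rather than by numeric locations, and the value-level codes (int_val = 0 up to mu_val = 3, with tok_val = 5) are assumptions taken to match the definitions elsewhere in the program.

```python
COUNT_DEF, DIMEN_DEF, SKIP_DEF, MU_SKIP_DEF, TOKS_DEF = 2, 3, 4, 5, 6
INT_VAL, DIMEN_VAL, GLUE_VAL, MU_VAL, TOK_VAL = 0, 1, 2, 3, 5  # assumed codes

# Low registers live directly in eqtb: (command, name of the base location).
SMALL = {
    COUNT_DEF: ("assign_int", "count_base"),
    DIMEN_DEF: ("assign_dimen", "scaled_base"),
    SKIP_DEF: ("assign_glue", "skip_base"),
    MU_SKIP_DEF: ("assign_mu_glue", "mu_skip_base"),
    TOKS_DEF: ("assign_toks", "toks_base"),
}


def shorthand_target(n, reg):
    """Describe what a \\countdef-like command with code n defines for register reg."""
    if reg > 255:
        kind = n - COUNT_DEF
        if kind > MU_VAL:
            kind = TOK_VAL   # only \toksdef lies beyond mu_val
        cmd = "toks_register" if kind == TOK_VAL else "register"
        return cmd, kind, reg    # sparse-array element of the given kind
    cmd, base = SMALL[n]
    return cmd, base, reg        # eqtb location base + reg
```

For instance, shorthand_target(2, 10) describes an assign_int command at count_base + 10, whereas shorthand_target(6, 300) describes a toks_register command that refers to a sparse-array element.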

1279.

⟦1271 Assignments⟧ += ⟦
    read_to_cs:

    {
        j = cur_chr;
        scan_int;
        n = cur_val;
        if (!scan_keyword(strpool!("to"))) {
            print_err(strpool!("Missing `to' inserted"));
            help2(
              strpool!("You should have said `\\read<number> to \\cs'."),
            )(
              strpool!("I'm going to look for the \\cs now."),
            );
            error;
        }
        get_r_token;
        p = cur_cs;
        read_toks(n, p, j);
        define(p, call, cur_val);
    }
⟧

1280. The token-list parameters, \output and \everypar, etc., receive their values in the following way. (For safety’s sake, we place an enclosing pair of braces around an \output list.)

⟦1271 Assignments⟧ += ⟦
    toks_register, assign_toks:
      q = cur_cs;
      // just in case, will be set true for sparse array elements
      e = false;
      if (cur_cmd == toks_register) {
          if (cur_chr == mem_bot) {
              scan_register_num;
              if (cur_val > 255) {
                  find_sa_element(tok_val, cur_val, true);
                  cur_chr = cur_ptr;
                  e = true;
              } else {
                  cur_chr = toks_base + cur_val;
              }
          } else {
              e = true;
          }
      } else if (cur_chr == XeTeX_inter_char_loc) {
          scan_char_class_not_ignored;
          cur_ptr = cur_val;
          scan_char_class_not_ignored;
          find_sa_element(
            inter_char_val,
            cur_ptr * char_class_limit + cur_val,
            true,
          );
          cur_chr = cur_ptr;
          e = true;
      }
      // p == every_par_loc or output_routine_loc or …
      p = cur_chr;
      scan_optional_equals;
      ⟦438 Get the next non-blank non-relax non-call token⟧
      if (cur_cmd != left_brace) {
          ⟦1281 If the right-hand side is a token parameter or token register, finish the assignment and |goto done|⟧
      }
      back_input;
      cur_cs = q;
      q = scan_toks(false, false);
      // empty list: revert to the default
      if (link(def_ref) == null) {
          sa_define(p, null)(p, undefined_cs, null);
          free_avail(def_ref);
      } else {
          // enclose in curlies
          if ((p == output_routine_loc) && !e) {
              link(q) = get_avail;
              q = link(q);
              info(q) = right_brace_token + ord!("}");
              q = get_avail;
              info(q) = left_brace_token + ord!("{");
              link(q) = link(def_ref);
              link(def_ref) = q;
          }
          sa_define(p, def_ref)(p, call, def_ref);
      }
⟧
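
The final step of this section, in which an empty list reverts to the default and an \output list acquires an enclosing pair of braces, can be illustrated in Python as follows. This is a sketch under simplifying assumptions: the token list is an ordinary Python list of strings rather than a linked list with a reference count, and the name finish_token_list and its boolean parameters are hypothetical.

```python
def finish_token_list(tokens, is_output, is_sparse):
    """Return the list to store, or None when the parameter reverts to its default."""
    if not tokens:
        return None                      # empty list: revert to the default
    if is_output and not is_sparse:
        return ["{"] + tokens + ["}"]    # enclose in curlies for safety's sake
    return tokens
```

The is_sparse flag plays the role of e in the program: a sparse-array element is never treated as the \output routine, even if its pointer happened to equal output_routine_loc.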

1281.

⟦1281 If the right-hand side is a token parameter or token register, finish the assignment and |goto done|⟧ = ⟦
    if (
        (cur_cmd == toks_register)
        || (cur_cmd == assign_toks)
    ) {
        if (cur_cmd == toks_register) {
            if (cur_chr == mem_bot) {
                scan_register_num;
                if (cur_val < 256) {
                    q = equiv(toks_base + cur_val);
                } else {
                    find_sa_element(
                      tok_val,
                      cur_val,
                      false,
                    );
                    if (cur_ptr == null) {
                        q = null;
                    } else {
                        q = sa_ptr(cur_ptr);
                    }
                }
            } else {
                q = sa_ptr(cur_chr);
            }
        } else if (cur_chr == XeTeX_inter_char_loc) {
            scan_char_class_not_ignored;
            cur_ptr = cur_val;
            scan_char_class_not_ignored;
            find_sa_element(
              inter_char_val,
              cur_ptr * char_class_limit + cur_val,
              false,
            );
            if (cur_ptr == null) {
                q = null;
            } else {
                q = sa_ptr(cur_ptr);
            }
        } else {
            q = equiv(cur_chr);
        }
        if (q == null) {
            sa_define(p, null)(p, undefined_cs, null);
        } else {
            add_token_ref(q);
            sa_define(p, q)(p, call, q);
        }
        goto done;
    }
⟧

1282. Similar routines are used to assign values to the numeric parameters.

⟦1271 Assignments⟧ += ⟦
    assign_int:

    {
        p = cur_chr;
        scan_optional_equals;
        scan_int;
        word_define(p, cur_val);
    }

    assign_dimen:

    {
        p = cur_chr;
        scan_optional_equals;
        scan_normal_dimen;
        word_define(p, cur_val);
    }

    assign_glue, assign_mu_glue:
      p = cur_chr;
      n = cur_cmd;
      scan_optional_equals;
      if (n == assign_mu_glue) {
          scan_glue(mu_val);
      } else {
          scan_glue(glue_val);
      }
      trap_zero_glue;
      define(p, glue_ref, cur_val);
⟧

1283. When a glue register or parameter becomes zero, it will always point to zero_glue because of the following procedure. (Exception: The tabskip glue isn’t trapped while preambles are being scanned.)

⟦1269 Declare subprocedures for |prefixed_command|⟧ += ⟦
    function trap_zero_glue() {
        if (
            (width(cur_val) == 0)
            && (stretch(cur_val) == 0)
            && (shrink(cur_val) == 0)
        ) {
            add_glue_ref(zero_glue);
            delete_glue_ref(cur_val);
            cur_val = zero_glue;
        }
    }
⟧
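
The canonicalization performed by trap_zero_glue can be modelled outside the program. The following Python sketch is only an illustration: the GlueSpec class and its ref_count field are hypothetical stand-ins for a glue specification node and its reference count, and the freeing of a node whose count drops to nothing is not modelled.

```python
from dataclasses import dataclass


@dataclass
class GlueSpec:
    """Hypothetical stand-in for a glue specification node."""
    width: int = 0
    stretch: int = 0
    shrink: int = 0
    ref_count: int = 1


ZERO_GLUE = GlueSpec()  # the single shared all-zero specification


def trap_zero_glue(spec: GlueSpec) -> GlueSpec:
    """Return ZERO_GLUE in place of any spec whose three amounts are zero."""
    if spec.width == 0 and spec.stretch == 0 and spec.shrink == 0:
        ZERO_GLUE.ref_count += 1  # plays the role of add_glue_ref(zero_glue)
        spec.ref_count -= 1       # plays the role of delete_glue_ref(cur_val)
        return ZERO_GLUE
    return spec
```

Because every zero value ends up as the same object, later code can test for zero glue by comparing pointers instead of inspecting three fields.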

1284. The various character code tables are changed by the def_code commands, and the font families are declared by def_family.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("catcode"), def_code, cat_code_base)

    primitive(
      strpool!("mathcode"),
      def_code,
      math_code_base,
    )

    primitive(
      strpool!("XeTeXmathcodenum"),
      XeTeX_def_code,
      math_code_base,
    )

    primitive(
      strpool!("Umathcodenum"),
      XeTeX_def_code,
      math_code_base,
    )

    primitive(
      strpool!("XeTeXmathcode"),
      XeTeX_def_code,
      math_code_base + 1,
    )

    primitive(
      strpool!("Umathcode"),
      XeTeX_def_code,
      math_code_base + 1,
    )

    primitive(strpool!("lccode"), def_code, lc_code_base)

    primitive(strpool!("uccode"), def_code, uc_code_base)

    primitive(strpool!("sfcode"), def_code, sf_code_base)

    primitive(
      strpool!("XeTeXcharclass"),
      XeTeX_def_code,
      sf_code_base,
    )

    primitive(strpool!("delcode"), def_code, del_code_base)

    primitive(
      strpool!("XeTeXdelcodenum"),
      XeTeX_def_code,
      del_code_base,
    )

    primitive(
      strpool!("Udelcodenum"),
      XeTeX_def_code,
      del_code_base,
    )

    primitive(
      strpool!("XeTeXdelcode"),
      XeTeX_def_code,
      del_code_base + 1,
    )

    primitive(
      strpool!("Udelcode"),
      XeTeX_def_code,
      del_code_base + 1,
    )

    primitive(
      strpool!("textfont"),
      def_family,
      math_font_base,
    )

    primitive(
      strpool!("scriptfont"),
      def_family,
      math_font_base + script_size,
    )

    primitive(
      strpool!("scriptscriptfont"),
      def_family,
      math_font_base + script_script_size,
    )
⟧

1285.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    def_code:

    if (chr_code == cat_code_base) {
        print_esc(strpool!("catcode"));
    } else if (chr_code == math_code_base) {
        print_esc(strpool!("mathcode"));
    } else if (chr_code == lc_code_base) {
        print_esc(strpool!("lccode"));
    } else if (chr_code == uc_code_base) {
        print_esc(strpool!("uccode"));
    } else if (chr_code == sf_code_base) {
        print_esc(strpool!("sfcode"));
    } else {
        print_esc(strpool!("delcode"));
    }

    XeTeX_def_code:

    if (chr_code == sf_code_base) {
        print_esc(strpool!("XeTeXcharclass"));
    } else if (chr_code == math_code_base) {
        print_esc(strpool!("Umathcodenum"));
    } else if (chr_code == math_code_base + 1) {
        print_esc(strpool!("Umathcode"));
    } else if (chr_code == del_code_base) {
        print_esc(strpool!("Udelcodenum"));
    } else {
        print_esc(strpool!("Udelcode"));
    }

    def_family:

    print_size(chr_code - math_font_base)
⟧

1286. The different types of code values have different legal ranges; the following program is careful to check each case properly.

⟦1271 Assignments⟧ += ⟦
    XeTeX_def_code:

    {
        if (cur_chr == sf_code_base) {
            p = cur_chr;
            scan_usv_num;
            p = p + cur_val;
            n = sf_code(cur_val) % 0x10000;
            scan_optional_equals;
            scan_char_class;
            define(p, data, cur_val * 0x10000 + n);
        } else if (cur_chr == math_code_base) {
            p = cur_chr;
            scan_usv_num;
            p = p + cur_val;
            scan_optional_equals;
            scan_xetex_math_char_int;
            define(p, data, hi(cur_val));
        } else if (cur_chr == math_code_base + 1) {
            p = cur_chr - 1;
            scan_usv_num;
            p = p + cur_val;
            scan_optional_equals;
            scan_math_class_int;
            n = set_class_field(cur_val);
            scan_math_fam_int;
            n = n + set_family_field(cur_val);
            scan_usv_num;
            n = n + cur_val;
            define(p, data, hi(n));
        } else if (cur_chr == del_code_base) {
            p = cur_chr;
            scan_usv_num;
            p = p + cur_val;
            scan_optional_equals;
            //  scan_xetex_del_code_int ; !!FIXME!!
            scan_int;
            word_define(p, hi(cur_val));
        } else {
            p = cur_chr - 1;
            scan_usv_num;
            p = p + cur_val;
            scan_optional_equals;
            // extended delimiter code flag
            n = 0x40000000;
            scan_math_fam_int;
            // extended delimiter code family
            n = n + cur_val * 0x200000;
            scan_usv_num;
            // extended delimiter code USV
            n = n + cur_val;
            word_define(p, hi(n));
        }
    }

    def_code:

    {
        ⟦1287 Let |n| be the largest legal code value, based on |cur_chr|⟧
        p = cur_chr;
        scan_usv_num;
        p = p + cur_val;
        scan_optional_equals;
        scan_int;
        if (
            ((cur_val < 0) && (p < del_code_base))
            || (cur_val > n)
        ) {
            print_err(strpool!("Invalid code ("));
            print_int(cur_val);
            if (p < del_code_base) {
                print(
                  strpool!("), should be in the range 0.."),
                );
            } else {
                print(strpool!("), should be at most "));
            }
            print_int(n);
            help1(
              strpool!("I'm going to use 0 instead of that illegal code value."),
            );
            error;
            cur_val = 0;
        }
        if (p < math_code_base) {
            if (p >= sf_code_base) {
                n = equiv(p) div 0x10000;
                define(p, data, n * 0x10000 + cur_val);
            } else {
                define(p, data, cur_val);
            }
        } else if (p < del_code_base) {
            if (cur_val == 0x8000) {
                cur_val = active_math_char;
            } else {
                // !!FIXME!! check how this is used
                cur_val = 
                    set_class_field(cur_val div 0x1000)
                    + set_family_field(
                      (cur_val % 0x1000) div 0x100,
                    )
                    + (cur_val % 0x100)
                ;
            }
            define(p, data, hi(cur_val));
        } else {
            word_define(p, cur_val);
        }
    }
⟧
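
The field arithmetic in the section above is easier to follow in isolation. The Python sketch below restates three of the packings using the same constants as the code: the shared sfcode table entry (character class in the upper half, space factor in the lower half), the split of a legacy math code into class, family, and character, and the extended delimiter code. The function names are hypothetical, entries are assumed to be nonnegative, and the packing done by set_class_field and set_family_field is not modelled because it is defined elsewhere.

```python
from typing import Tuple


def set_char_class(entry: int, char_class: int) -> int:
    """Replace the upper half of an sfcode entry, keeping the space factor."""
    return char_class * 0x10000 + entry % 0x10000


def set_space_factor(entry: int, space_factor: int) -> int:
    """Replace the lower half of an sfcode entry, keeping the character class."""
    return (entry // 0x10000) * 0x10000 + space_factor


def split_math_code(code: int) -> Tuple[int, int, int]:
    """Split a legacy math code below 0x8000 into (class, family, character)."""
    return code // 0x1000, (code % 0x1000) // 0x100, code % 0x100


def ext_del_code(family: int, usv: int) -> int:
    """Pack an extended delimiter code: flag bit, then family, then USV."""
    return 0x40000000 + family * 0x200000 + usv
```

For example, split_math_code(0x7161) gives class 7, family 1, and character 0x61. The factor 0x200000 in the delimiter code leaves 21 bits for the character, which is enough for any Unicode scalar value.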

1287.

⟦1287 Let |n| be the largest legal code value, based on |cur_chr|⟧ = ⟦
    if (cur_chr == cat_code_base) {
        n = max_char_code;
    } else if (cur_chr == math_code_base) {
        n = 0x8000;
    } else if (cur_chr == sf_code_base) {
        n = 0x7fff;
    } else if (cur_chr == del_code_base) {
        n = 0xffffff;
    } else {
        n = biggest_usv;
    }
⟧
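
The range check that uses these limits can be summarized as a small table lookup. The Python sketch below is a simplified model, not the program's own code: the string keys are hypothetical, and the numeric values 15 for max_char_code and 0x10FFFF for biggest_usv are assumptions, since neither constant is defined in this part of the program. As in the def_code case above, a negative value is rejected for every table except the delimiter codes, and an illegal value is replaced by 0.

```python
# Largest legal value per table; 15 and 0x10FFFF are assumed values of
# max_char_code and biggest_usv.
LIMITS = {
    "catcode": 15,
    "mathcode": 0x8000,
    "sfcode": 0x7FFF,
    "delcode": 0xFFFFFF,
    "lccode": 0x10FFFF,
    "uccode": 0x10FFFF,
}


def checked_code(kind: str, value: int) -> int:
    """Return value if it is legal for the given table, otherwise 0."""
    negative_allowed = kind == "delcode"
    if (value < 0 and not negative_allowed) or value > LIMITS[kind]:
        return 0  # the program reports an error and then uses 0
    return value
```

Under these assumptions checked_code("mathcode", 0x8000) is accepted, while checked_code("sfcode", 0x8000) falls back to 0.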

1288.

⟦1271 Assignments⟧ += ⟦
    def_family:

    {
        p = cur_chr;
        scan_math_fam_int;
        p = p + cur_val;
        scan_optional_equals;
        scan_font_ident;
        define(p, data, cur_val);
    }
⟧

1289. Next we consider changes to TEX’s numeric registers.

⟦1271 Assignments⟧ += ⟦
    register, advance, multiply, divide:
      do_register_command(a);
⟧

1290. We use the fact that register < advance < multiply < divide.

⟦1269 Declare subprocedures for |prefixed_command|⟧ += ⟦
    function do_register_command(a: small_number) {
        label found, exit;
        var
          l, q, r, s: pointer, // for list manipulation
          p: int_val .. mu_val, // type of register involved
          e: boolean, // does l refer to a sparse array 
          // element?
          w: integer; // integer or dimen value of l 
        
        q = cur_cmd;
        // just in case, will be set true for sparse array 
        // elements
        e = false;
        ⟦1291 Compute the register location |l| and its type |p|; but |return| if invalid⟧
        if (q == register) {
            scan_optional_equals;
        } else if (scan_keyword(strpool!("by"))) {
            // optional `\.{by}'
            do_nothing;
        }
        arith_error = false;
        if (q < multiply) {
            ⟦1292 Compute result of |register| or |advance|, put it in |cur_val|⟧
        } else {
            ⟦1294 Compute result of |multiply| or |divide|, put it in |cur_val|⟧
        }
        if (arith_error) {
            print_err(strpool!("Arithmetic overflow"));
            help2(
              strpool!("I can't carry out that multiplication or division,"),
            )(
              strpool!("since the result is out of range."),
            );
            if (p >= glue_val) {
                delete_glue_ref(cur_val);
            }
            error;
            return;
        }
        if (p < glue_val) {
            sa_word_define(l, cur_val);
        } else {
            trap_zero_glue;
            sa_define(l, cur_val)(l, glue_ref, cur_val);
        }
      exit:
    }
⟧

1291. Here we use the fact that the consecutive codes int_val .. mu_val and assign_int .. assign_mu_glue correspond to each other nicely.

⟦1291 Compute the register location |l| and its type |p|; but |return| if invalid⟧ = ⟦
    {
        if (q != register) {
            get_x_token;
            if (
                (cur_cmd >= assign_int)
                && (cur_cmd <= assign_mu_glue)
            ) {
                l = cur_chr;
                p = cur_cmd - assign_int;
                goto found;
            }
            if (cur_cmd != register) {
                print_err(strpool!("You can't use `"));
                print_cmd_chr(cur_cmd, cur_chr);
                print(strpool!("' after "));
                print_cmd_chr(q, 0);
                help1(
                  strpool!("I'm forgetting what you said and not changing anything."),
                );
                error;
                return;
            }
        }
        if (
            (cur_chr < mem_bot)
            || (cur_chr > lo_mem_stat_max)
        ) {
            l = cur_chr;
            p = sa_type(l);
            e = true;
        } else {
            p = cur_chr - mem_bot;
            scan_register_num;
            if (cur_val > 255) {
                find_sa_element(p, cur_val, true);
                l = cur_ptr;
                e = true;
            } else {
                case p {
                  int_val:
                    l = cur_val + count_base;
                  dimen_val:
                    l = cur_val + scaled_base;
                  glue_val:
                    l = cur_val + skip_base;
                  mu_val:
                    l = cur_val + mu_skip_base;// there are 
                  // no other cases
                }
            }
        }
    }

    found:

    if (p < glue_val) {
        if (e) {
            w = sa_int(l);
        } else {
            w = eqtb[l].int;
        }
    } else if (e) {
        s = sa_ptr(l);
    } else {
        s = equiv(l);
    }
⟧

1292.

⟦1292 Compute result of |register| or |advance|, put it in |cur_val|⟧ = ⟦
    if (p < glue_val) {
        if (p == int_val) {
            scan_int;
        } else {
            scan_normal_dimen;
        }
        if (q == advance) {
            cur_val = cur_val + w;
        }
    } else {
        scan_glue(p);
        if (q == advance) {
            ⟦1293 Compute the sum of two glue specs⟧
        }
    }
⟧

1293.

⟦1293 Compute the sum of two glue specs⟧ = ⟦
    {
        q = new_spec(cur_val);
        r = s;
        delete_glue_ref(cur_val);
        width(q) = width(q) + width(r);
        if (stretch(q) == 0) {
            stretch_order(q) = normal;
        }
        if (stretch_order(q) == stretch_order(r)) {
            stretch(q) = stretch(q) + stretch(r);
        } else if (
            (stretch_order(q) < stretch_order(r))
            && (stretch(r) != 0)
        ) {
            stretch(q) = stretch(r);
            stretch_order(q) = stretch_order(r);
        }
        if (shrink(q) == 0) {
            shrink_order(q) = normal;
        }
        if (shrink_order(q) == shrink_order(r)) {
            shrink(q) = shrink(q) + shrink(r);
        } else if (
            (shrink_order(q) < shrink_order(r))
            && (shrink(r) != 0)
        ) {
            shrink(q) = shrink(r);
            shrink_order(q) = shrink_order(r);
        }
        cur_val = q;
    }
⟧
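
The rule for adding two glue specifications is that the higher order of infinity wins, and that a zero amount counts as finite whatever its recorded order. The Python sketch below restates that rule. The Glue class is a hypothetical stand-in for a glue specification, and orders are represented by small integers with NORMAL as the finite order.

```python
from dataclasses import dataclass

NORMAL = 0  # finite order; larger numbers stand for higher orders of infinity


@dataclass
class Glue:
    """Hypothetical stand-in for a glue specification."""
    width: int
    stretch: int = 0
    stretch_order: int = NORMAL
    shrink: int = 0
    shrink_order: int = NORMAL


def _combine(a, a_order, b, b_order):
    """Combine one stretch or shrink component of q (a) with that of r (b)."""
    if a == 0:
        a_order = NORMAL          # a zero amount is treated as finite
    if a_order == b_order:
        return a + b, a_order     # same order: the amounts add
    if a_order < b_order and b != 0:
        return b, b_order         # r dominates q
    return a, a_order             # q dominates r, or r contributes nothing


def add_glue(q: Glue, r: Glue) -> Glue:
    """Return the sum of q and r following the rule described above."""
    stretch, stretch_order = _combine(q.stretch, q.stretch_order,
                                      r.stretch, r.stretch_order)
    shrink, shrink_order = _combine(q.shrink, q.shrink_order,
                                    r.shrink, r.shrink_order)
    return Glue(q.width + r.width, stretch, stretch_order, shrink, shrink_order)
```

For instance, adding a glue with finite stretch 5 to one with stretch 2 of order 1 yields stretch 2 of order 1; the finite amount is discarded.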

1294.

⟦1294 Compute result of |multiply| or |divide|, put it in |cur_val|⟧ = ⟦
    {
        scan_int;
        if (p < glue_val) {
            if (q == multiply) {
                if (p == int_val) {
                    cur_val = mult_integers(w, cur_val);
                } else {
                    cur_val = nx_plus_y(w, cur_val, 0);
                }
            } else {
                cur_val = x_over_n(w, cur_val);
            }
        } else {
            r = new_spec(s);
            if (q == multiply) {
                width(r) = nx_plus_y(width(s), cur_val, 0);
                stretch(r) = nx_plus_y(
                  stretch(s),
                  cur_val,
                  0,
                );
                shrink(r) = nx_plus_y(
                  shrink(s),
                  cur_val,
                  0,
                );
            } else {
                width(r) = x_over_n(width(s), cur_val);
                stretch(r) = x_over_n(stretch(s), cur_val);
                shrink(r) = x_over_n(shrink(s), cur_val);
            }
            cur_val = r;
        }
    }
⟧
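
The arithmetic helpers used here report failure through arith_error instead of raising an exception. The Python sketch below is a simplified model of that convention, under the assumption that the helpers behave as in standard TeX82: multiplication fails when the magnitude of the product exceeds a stated maximum (taken to be 0x7FFFFFFF for integers and 0x3FFFFFFF for dimensions), and division truncates toward zero and fails on a zero divisor. The function names are hypothetical, and each function returns the error flag alongside the value instead of setting a global.

```python
def mult_checked(n: int, x: int, max_answer: int):
    """Return (n * x, False), or (0, True) if the product is out of range."""
    result = n * x
    if abs(result) > max_answer:
        return 0, True
    return result, False


def div_checked(x: int, n: int):
    """Return (x / n truncated toward zero, False), or (0, True) if n is 0."""
    if n == 0:
        return 0, True
    quotient = abs(x) // abs(n)
    if (x < 0) != (n < 0):
        quotient = -quotient
    return quotient, False
```

A caller would test the flag after each operation, in the same way that do_register_command tests arith_error once the whole computation is finished.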

1295. The processing of boxes is somewhat different, because we may need to scan and create an entire box before we actually change the value of the old one.

⟦1271 Assignments⟧ += ⟦
    set_box:

    {
        scan_register_num;
        if (global) {
            n = global_box_flag + cur_val;
        } else {
            n = box_flag + cur_val;
        }
        scan_optional_equals;
        if (set_box_allowed) {
            scan_box(n);
        } else {
            print_err(strpool!("Improper "));
            print_esc(strpool!("setbox"));
            help2(
              strpool!("Sorry, \\setbox is not allowed after \\halign in a display,"),
            )(
              strpool!("or between \\accent and an accented character."),
            );
            error;
        }
    }
⟧

1296. The space_factor or prev_depth settings are changed when a set_aux command is sensed. Similarly, prev_graf is changed in the presence of set_prev_graf, and dead_cycles or insert_penalties in the presence of set_page_int. These definitions are always global.

When some dimension of a box register is changed, the change isn’t exactly global; but TEX does not look at the \global switch.

⟦1271 Assignments⟧ += ⟦
    set_aux:
      alter_aux;

    set_prev_graf:
      alter_prev_graf;

    set_page_dimen:
      alter_page_so_far;

    set_page_int:
      alter_integer;

    set_box_dimen:
      alter_box_dimen;
⟧

1297.

⟦1269 Declare subprocedures for |prefixed_command|⟧ += ⟦
    function alter_aux() {
        var
          c: halfword; //  hmode or vmode 
        
        if (cur_chr != abs(mode)) {
            report_illegal_case;
        } else {
            c = cur_chr;
            scan_optional_equals;
            if (c == vmode) {
                scan_normal_dimen;
                prev_depth = cur_val;
            } else {
                scan_int;
                if ((cur_val <= 0) || (cur_val > 32767)) {
                    print_err(strpool!("Bad space factor"));
                    help1(
                      strpool!("I allow only values in the range 1..32767 here."),
                    );
                    int_error(cur_val);
                } else {
                    space_factor = cur_val;
                }
            }
        }
    }
⟧

1298.

⟦1269 Declare subprocedures for |prefixed_command|⟧ += ⟦
    function alter_prev_graf() {
        var
          p: 0 .. nest_size; // index into nest 
        
        nest[nest_ptr] = cur_list;
        p = nest_ptr;
        while (abs(nest[p].mode_field) != vmode) {
            decr(p);
        }
        scan_optional_equals;
        scan_int;
        if (cur_val < 0) {
            print_err(strpool!("Bad "));
            print_esc(strpool!("prevgraf"));
            help1(
              strpool!("I allow only nonnegative values here."),
            );
            int_error(cur_val);
        } else {
            nest[p].pg_field = cur_val;
            cur_list = nest[nest_ptr];
        }
    }
⟧

1299.

⟦1269 Declare subprocedures for |prefixed_command|⟧ += ⟦
    function alter_page_so_far() {
        var
          c: 0 .. 7; // index into page_so_far 
        
        c = cur_chr;
        scan_optional_equals;
        scan_normal_dimen;
        page_so_far[c] = cur_val;
    }
⟧

1300.

⟦1269 Declare subprocedures for |prefixed_command|⟧ += ⟦
    function alter_integer() {
        var
          c: small_number; // 0 for \.{\\deadcycles}, 1 for 
          // \.{\\insertpenalties}, etc.
        
        c = cur_chr;
        scan_optional_equals;
        scan_int;
        if (c == 0) {
            dead_cycles = cur_val;
        }
        ⟦1506 Cases for |alter_integer|⟧
        else {
            insert_penalties = cur_val;
        }
    }
⟧

1301.

⟦1269 Declare subprocedures for |prefixed_command|⟧ += ⟦
    function alter_box_dimen() {
        var
          c: small_number, //  width_offset or height_offset 
          // or depth_offset 
          b: pointer; // box register
        
        c = cur_chr;
        scan_register_num;
        fetch_box(b);
        scan_optional_equals;
        scan_normal_dimen;
        if (b != null) {
            mem[b + c].sc = cur_val;
        }
    }
⟧

1302. Paragraph shapes are set up in the obvious way.

⟦1271 Assignments⟧ += ⟦
    set_shape:

    {
        q = cur_chr;
        scan_optional_equals;
        scan_int;
        n = cur_val;
        if (n <= 0) {
            p = null;
        } else if (q > par_shape_loc) {
            n = (cur_val div 2) + 1;
            p = get_node(2 * n + 1);
            info(p) = n;
            n = cur_val;
            // number of penalties
            mem[p + 1].int = n;
            for (j in p + 2 to p + n + 1) {
                scan_int;
                // penalty values
                mem[j].int = cur_val;
            }
            if (!odd(n)) {
                // unused
                mem[p + n + 2].int = 0;
            }
        } else {
            p = get_node(2 * n + 1);
            info(p) = n;
            for (j in 1 to n) {
                scan_normal_dimen;
                // indentation
                mem[p + 2 * j - 1].sc = cur_val;
                scan_normal_dimen;
                // width
                mem[p + 2 * j].sc = cur_val;
            }
        }
        define(q, shape_ref, p);
    }
⟧
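
The two node layouts built above can be pictured as flat lists of words. In the Python sketch below, index 0 stands for info(p) and index k for mem[p + k]; the function names are hypothetical, and None stands for the null pointer that results when the count is not positive.

```python
def par_shape_node(pairs):
    """Lay out (indentation, width) pairs as in the paragraph-shape branch."""
    n = len(pairs)
    if n == 0:
        return None
    node = [0] * (2 * n + 1)
    node[0] = n                      # info(p) = n
    for j, (indent, width) in enumerate(pairs, start=1):
        node[2 * j - 1] = indent     # mem[p + 2j - 1].sc
        node[2 * j] = width          # mem[p + 2j].sc
    return node


def penalties_node(penalties):
    """Lay out a penalty list as in the branch taken when q > par_shape_loc."""
    n = len(penalties)
    if n == 0:
        return None
    half = n // 2 + 1
    node = [0] * (2 * half + 1)
    node[0] = half                   # info(p)
    node[1] = n                      # number of penalties
    node[2:n + 2] = penalties        # penalty values
    return node                      # for even n the last word stays 0 (unused)
```

For example, penalties_node([100, 200]) occupies five words, the last of which is the unused padding word that the code sets to zero when n is even.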

1303. Here’s something that isn’t quite so obvious. It guarantees that info(par_shape_ptr) can hold any positive n for which get_node(2 * n + 1) doesn’t overflow the memory capacity.

⟦14 Check the ``constant'' values for consistency⟧ += ⟦
    if (2 * max_halfword < mem_top - mem_min) {
        bad = 41;
    }
⟧

1304. New hyphenation data is loaded by the hyph_data command.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("hyphenation"), hyph_data, 0)

    primitive(strpool!("patterns"), hyph_data, 1)
⟧

1305.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    hyph_data:

    if (chr_code == 1) {
        print_esc(strpool!("patterns"));
    } else {
        print_esc(strpool!("hyphenation"));
    }
⟧

1306.

⟦1271 Assignments⟧ += ⟦
    hyph_data:

    if (cur_chr == 1) {
        Init!{
            new_patterns;
            goto done;
        }
        print_err(
          strpool!("Patterns can be loaded only by INITEX"),
        );
        help0;
        error;
        repeat {
            get_token;// flush the patterns
        } until (cur_cmd == right_brace);
        return;
    } else {
        new_hyph_exceptions;
        goto done;
    }
⟧

1307. All of TEX’s parameters are kept in eqtb except the font information, the interaction mode, and the hyphenation tables; these are strictly global.

⟦1271 Assignments⟧ += ⟦
    assign_font_dimen:

    {
        find_font_dimen(true);
        k = cur_val;
        scan_optional_equals;
        scan_normal_dimen;
        font_info[k].sc = cur_val;
    }

    assign_font_int:

    {
        n = cur_chr;
        scan_font_ident;
        f = cur_val;
        if (n < lp_code_base) {
            scan_optional_equals;
            scan_int;
            if (n == 0) {
                hyphen_char[f] = cur_val;
            } else {
                skew_char[f] = cur_val;
            }
        } else {
            if (is_native_font(f)) {
                // for native fonts, the value is a glyph id
                scan_glyph_number(f);
            } else {
                // for tfm fonts, the value is a character 
                // code, the same as in pdftex
                scan_char_num;
            }
            p = cur_val;
            scan_optional_equals;
            scan_int;
            case n {
              lp_code_base:
                set_cp_code(f, p, left_side, cur_val);
              rp_code_base:
                set_cp_code(f, p, right_side, cur_val);
            }
        }
    }
⟧

1308.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("hyphenchar"), assign_font_int, 0)

    primitive(strpool!("skewchar"), assign_font_int, 1)

    primitive(
      strpool!("lpcode"),
      assign_font_int,
      lp_code_base,
    )

    primitive(
      strpool!("rpcode"),
      assign_font_int,
      rp_code_base,
    )
⟧

1309.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    assign_font_int:

    case chr_code {
      0:
        print_esc(strpool!("hyphenchar"));
      1:
        print_esc(strpool!("skewchar"));
      lp_code_base:
        print_esc(strpool!("lpcode"));
      rp_code_base:
        print_esc(strpool!("rpcode"));
    }
⟧

1310. Here is where the information for a new font gets loaded.

⟦1271 Assignments⟧ += ⟦
    def_font:

    new_font(a)
⟧

1311.

⟦1269 Declare subprocedures for |prefixed_command|⟧ += ⟦
    function new_font(a: small_number) {
        label common_ending;
        var
          u: pointer, // user's font identifier
          s: scaled, // stated ``at'' size, or negative of 
          // scaled magnification
          f: internal_font_number, // runs through existing 
          // fonts
          t: str_number, // name for the frozen font 
          // identifier
          old_setting: 0 .. max_selector; // holds selector 
          // setting
        
        if (job_name == 0) {
            // avoid confusing \.{texput} with the font name
            open_log_file;
        }
        get_r_token;
        u = cur_cs;
        if (u >= hash_base) {
            t = text(u);
        } else if (u >= single_base) {
            if (u == null_cs) {
                t = strpool!("FONT");
            } else {
                t = u - single_base;
            }
        } else {
            old_setting = selector;
            selector = new_string;
            print(strpool!("FONT"));
            print(u - active_base);
            selector = old_setting;
            str_room(1);
            t = make_string;
        }
        define(u, set_font, null_font);
        scan_optional_equals;
        scan_file_name;
        ⟦1312 Scan the font size specification⟧
        ⟦1314 If this font has already been loaded, set |f| to the internal font number and |goto common_ending|⟧
        f = read_font_info(u, cur_name, cur_area, s);
      common_ending:
        define(u, set_font, f);
        eqtb[font_id_base + f] = eqtb[u];
        font_id_text(f) = t;
    }
⟧

1312.

⟦1312 Scan the font size specification⟧ = ⟦
    // this keeps cur_name from being changed
    name_in_progress = true

    if (scan_keyword(strpool!("at"))) {
        ⟦1313 Put the \(p)(positive) `at' size into |s|⟧
    } else if (scan_keyword(strpool!("scaled"))) {
        scan_int;
        s = -cur_val;
        if ((cur_val <= 0) || (cur_val > 32768)) {
            print_err(
              strpool!("Illegal magnification has been changed to 1000"),
            );
            help1(
              strpool!("The magnification ratio must be between 1 and 32768."),
            );
            int_error(cur_val);
            s = -1000;
        }
    } else {
        s = -1000;
    }

    name_in_progress = false
⟧

1313.

⟦1313 Put the \(p)(positive) `at' size into |s|⟧ = ⟦
    {
        scan_normal_dimen;
        s = cur_val;
        if ((s <= 0) || (s >= 0x8000000)) {
            print_err(strpool!("Improper `at' size ("));
            print_scaled(s);
            print(strpool!("pt), replaced by 10pt"));
            help2(
              strpool!("I can only handle fonts at positive sizes that are"),
            )(
              strpool!("less than 2048pt, so I've changed what you said to 10pt."),
            );
            error;
            s = 10 * unity;
        }
    }
⟧

1314. When the user gives a new identifier to a font that was previously loaded, the new name becomes the font identifier of record. Font names ‘xyz’ and ‘XYZ’ are considered to be different.

⟦1314 If this font has already been loaded, set |f| to the internal font number and |goto common_ending|⟧ = ⟦
    for (f in font_base + 1 to font_ptr) {
        if (
            str_eq_str(font_name[f], cur_name)
            && (
                (
                    (cur_area == strpool!(""))
                    && is_native_font(f)
                )
                || str_eq_str(font_area[f], cur_area)
            )
        ) {
            if (s > 0) {
                if (s == font_size[f]) {
                    goto common_ending;
                }
            } else if (
                font_size[f]
                == xn_over_d(font_dsize[f], -s, 1000)
            ) {
                goto common_ending;
            }
        }
        // could be a native font whose "name" ended up 
        // partly in area or extension
        append_str(cur_area);
        append_str(cur_name);
        append_str(cur_ext);
        if (str_eq_str(font_name[f], make_string)) {
            flush_string;
            if (is_native_font(f)) {
                if (s > 0) {
                    if (s == font_size[f]) {
                        goto common_ending;
                    }
                } else if (
                    font_size[f]
                    == xn_over_d(font_dsize[f], -s, 1000)
                ) {
                    goto common_ending;
                }
            }
        } else {
            flush_string;
        }
    }
⟧
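
The variable s carries two kinds of information in one number: a positive value is an explicit 'at' size, and a negative value is the negated magnification. The Python sketch below models how such a value is produced and how it is compared with an already loaded font. It is an illustration under stated assumptions: the function names are hypothetical, sizes are scaled integers with unity taken to be 2**16, design sizes are assumed positive, and error reporting is reduced to silently substituting the default.

```python
UNITY = 0x10000  # assumed number of scaled units per point


def size_spec(keyword, value=0):
    """Return s for 'at', 'scaled', or no keyword (None)."""
    if keyword == "at":
        if value <= 0 or value >= 0x8000000:
            return 10 * UNITY        # improper size, replaced by 10pt
        return value
    if keyword == "scaled":
        if value <= 0 or value > 32768:
            return -1000             # illegal magnification, changed to 1000
        return -value
    return -1000                     # no size given: magnification 1000


def same_size(s, font_size, design_size):
    """Does a loaded font of this size satisfy the size specification s?"""
    if s > 0:
        return s == font_size
    return font_size == design_size * (-s) // 1000
```

With a 10pt design size, size_spec("scaled", 1200) matches a font that was loaded at 12pt, so the font is shared instead of being read again.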

1315.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    set_font:

    {
        print(strpool!("select font "));
        font_name_str = font_name[chr_code];
        if (is_native_font(chr_code)) {
            quote_char = ord!("\"");
            for (n in 0 to length(font_name_str) - 1) {
                if (
                    str_pool[
                      str_start_macro(font_name_str) + n,
                    ]
                    == ord!("\"")
                ) {
                    quote_char = ord!("'");
                }
            }
            print_char(quote_char);
            slow_print(font_name_str);
            print_char(quote_char);
        } else {
            slow_print(font_name_str);
        }
        if (font_size[chr_code] != font_dsize[chr_code]) {
            print(strpool!(" at "));
            print_scaled(font_size[chr_code]);
            print(strpool!("pt"));
        }
    }
⟧
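
The quoting rule for native font names is short enough to restate directly. The Python sketch below uses a hypothetical function name and ordinary strings in place of the string pool: the name is wrapped in double quotes unless it already contains one, in which case single quotes are used.

```python
def quoted_font_name(name: str) -> str:
    """Wrap a native font name in quotes that do not clash with its contents."""
    quote = "'" if '"' in name else '"'
    return quote + name + quote
```

As in the code above, no attempt is made to handle a name that contains both kinds of quote.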

1316.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("batchmode"),
      set_interaction,
      batch_mode,
    )

    primitive(
      strpool!("nonstopmode"),
      set_interaction,
      nonstop_mode,
    )

    primitive(
      strpool!("scrollmode"),
      set_interaction,
      scroll_mode,
    )

    primitive(
      strpool!("errorstopmode"),
      set_interaction,
      error_stop_mode,
    )
⟧

1317.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    set_interaction:

    case chr_code {
      batch_mode:
        print_esc(strpool!("batchmode"));
      nonstop_mode:
        print_esc(strpool!("nonstopmode"));
      scroll_mode:
        print_esc(strpool!("scrollmode"));
      othercases:
        print_esc(strpool!("errorstopmode"));
    }
⟧

1318.

⟦1271 Assignments⟧ += ⟦
    set_interaction:
      new_interaction;
⟧

1319.

⟦1269 Declare subprocedures for |prefixed_command|⟧ += ⟦
    function new_interaction() {
        print_ln;
        interaction = cur_chr;
        if (interaction == batch_mode) {
            kpse_make_tex_discard_errors = 1;
        } else {
            kpse_make_tex_discard_errors = 0;
        }
        ⟦79 Initialize the print |selector| based on |interaction|⟧
        if (log_opened) {
            selector = selector + 2;
        }
    }
⟧
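
The selector adjustment at the end of new_interaction relies on the numeric layout of the selector codes. The Python sketch below spells this out. The constants are assumptions taken from standard TeX82 and are not defined in this part of the program: no_print = 16, term_only = 17, log_only = 18, term_and_log = 19, and batch_mode = 0. The function name is hypothetical.

```python
NO_PRINT, TERM_ONLY, LOG_ONLY, TERM_AND_LOG = 16, 17, 18, 19  # assumed values
BATCH_MODE = 0                                                # assumed value


def selector_for(interaction: int, log_opened: bool) -> int:
    """Choose the print selector for an interaction mode."""
    selector = NO_PRINT if interaction == BATCH_MODE else TERM_ONLY
    if log_opened:
        selector += 2  # no_print becomes log_only, term_only becomes term_and_log
    return selector
```

Under these assumptions, adding 2 is all that is needed to switch on logging in either case.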

1320. The \afterassignment command puts a token into the global variable after_token. This global variable is examined just after every assignment has been performed.

⟦13 Global variables⟧ += ⟦
    // zero, or a saved token
    var after_token: halfword;
⟧

1321.

⟦23 Set initial values of key variables⟧ += ⟦
    after_token = 0
⟧

1322.

⟦1264 Cases of |main_control| that don't depend on |mode|⟧ += ⟦
    any_mode(after_assignment):
      get_token;
      after_token = cur_tok;
⟧

1323.

⟦1323 Insert a token saved by \.{\\afterassignment}, if any⟧ = ⟦
    if (after_token != 0) {
        cur_tok = after_token;
        back_input;
        after_token = 0;
    }
⟧
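
The life cycle of after_token can be modelled in a few lines. In the Python sketch below, the Interpreter class, its method names, and the list standing in for the input stack are all hypothetical; tokens are nonzero integers, and 0 means that nothing is saved.

```python
class Interpreter:
    """Hypothetical model of the after_token mechanism."""

    def __init__(self):
        self.after_token = 0   # zero, or a saved token
        self.pending = []      # stands in for tokens pushed back by back_input

    def after_assignment(self, token: int) -> None:
        """Save a token; an earlier saved token is overwritten."""
        self.after_token = token

    def finish_assignment(self) -> None:
        """Run just after an assignment: reinsert the saved token, if any."""
        if self.after_token != 0:
            self.pending.append(self.after_token)
            self.after_token = 0
```

Since there is a single variable, only the most recently saved token survives until the next assignment, and it is inserted exactly once.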

1324. Here is a procedure that might be called ‘Get the next non-blank non-relax non-call non-assignment token’.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function do_assignments() {
        label exit;
        
        loop {
            ⟦438 Get the next non-blank non-relax non-call token⟧
            if (cur_cmd <= max_non_prefixed_command) {
                return;
            }
            set_box_allowed = false;
            prefixed_command;
            set_box_allowed = true;
        }
      exit:
    }
⟧

1325.

⟦1264 Cases of |main_control| that don't depend on |mode|⟧ += ⟦
    any_mode(after_group):
      get_token;
      save_for_after(cur_tok);
⟧

1326. Files for \read are opened and closed by the in_stream command.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("openin"), in_stream, 1)

    primitive(strpool!("closein"), in_stream, 0)
⟧

1327.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    in_stream:

    if (chr_code == 0) {
        print_esc(strpool!("closein"));
    } else {
        print_esc(strpool!("openin"));
    }
⟧

1328.

⟦1264 Cases of |main_control| that don't depend on |mode|⟧ += ⟦
    any_mode(in_stream):
      open_or_close_in;
⟧

1329.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function open_or_close_in() {
        var
          c: 0 .. 1, // 1 for \.{\\openin}, 0 for 
          // \.{\\closein}
          n: 0 .. 15, // stream number
          k: 0 .. file_name_size; // index into 
          // name_of_file16 
        
        c = cur_chr;
        scan_four_bit_int;
        n = cur_val;
        if (read_open[n] != closed) {
            u_close(read_file[n]);
            read_open[n] = closed;
        }
        if (c != 0) {
            scan_optional_equals;
            scan_file_name;
            pack_cur_name;
            // Tell open_input we are \.{\\openin}.
            tex_input_type = 0;
            if (
                kpse_in_name_ok(
                  stringcast(name_of_file + 1),
                )
                && u_open_in(
                  read_file[n],
                  kpse_tex_format,
                  XeTeX_default_input_mode,
                  XeTeX_default_input_encoding,
                )
            ) {
                make_utf16_name;
                name_in_progress = true;
                begin_name;
                stop_at_space = false;
                k = 0;
                while (
                    (k < name_length16)
                    && (more_name(name_of_file16[k]))
                ) {
                    incr(k);
                }
                stop_at_space = true;
                end_name;
                name_in_progress = false;
                read_open[n] = just_open;
            }
        }
    }
⟧

1330. The user can issue messages to the terminal, regardless of the current mode.

⟦1264 Cases of |main_control| that don't depend on |mode|⟧ += ⟦
    any_mode(message):
      issue_message;
⟧

1331.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("message"), message, 0)

    primitive(strpool!("errmessage"), message, 1)
⟧

1332.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    message:

    if (chr_code == 0) {
        print_esc(strpool!("message"));
    } else {
        print_esc(strpool!("errmessage"));
    }
⟧

1333.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function issue_message() {
        var
          old_setting: 0 .. max_selector, // holds selector 
          // setting
          c: 0 .. 1, // identifies \.{\\message} and 
          // \.{\\errmessage}
          s: str_number; // the message
        
        c = cur_chr;
        link(garbage) = scan_toks(false, true);
        old_setting = selector;
        selector = new_string;
        token_show(def_ref);
        selector = old_setting;
        flush_list(def_ref);
        str_room(1);
        s = make_string;
        if (c == 0) {
            ⟦1334 Print string |s| on the terminal⟧
        } else {
            ⟦1337 Print string |s| as an error message⟧
        }
        flush_string;
    }
⟧

1334.

⟦1334 Print string |s| on the terminal⟧ = ⟦
    {
        if (term_offset + length(s) > max_print_line - 2) {
            print_ln;
        } else if ((term_offset > 0) || (file_offset > 0)) {
            print_char(ord!(" "));
        }
        slow_print(s);
        update_terminal;
    }
⟧
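
The placement rule in section 1334 can be illustrated with a small Python sketch. It is not part of the program: the function name `message_prefix` and the default of 79 for `max_print_line` are assumptions made for illustration. It only reports what would be emitted before the message text.

```python
def message_prefix(term_offset, file_offset, msg_len, max_print_line=79):
    """Return what precedes the message: a newline, a space, or nothing.

    Hypothetical helper mirroring the test in section 1334.
    """
    if term_offset + msg_len > max_print_line - 2:
        return "\n"   # the message would not fit on the current terminal line
    if term_offset > 0 or file_offset > 0:
        return " "    # something is already on the line, so separate with a space
    return ""         # both outputs are at the start of a line
```

Under these assumptions a 10-character message starts a new line once 68 or more characters are already on the terminal line.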

1335. If \errmessage occurs often in scroll_mode, without user-defined \errhelp, we don’t want to give a long help message each time. So we give a verbose explanation only once.

⟦13 Global variables⟧ += ⟦
    // has the long \.{\\errmessage} help been used?
    var long_help_seen: boolean;
⟧

1336.

⟦23 Set initial values of key variables⟧ += ⟦
    long_help_seen = false
⟧

1337.

⟦1337 Print string |s| as an error message⟧ = ⟦
    {
        print_err(strpool!(""));
        slow_print(s);
        if (err_help != null) {
            use_err_help = true;
        } else if (long_help_seen) {
            help1(
              strpool!("(That was another \\errmessage.)"),
            );
        } else {
            if (interaction < error_stop_mode) {
                long_help_seen = true;
            }
            help4(
              strpool!("This error message was generated by an \\errmessage"),
            )(
              strpool!("command, so I can't give any explicit help."),
            )(
              strpool!("Pretend that you're Hercule Poirot: Examine all clues,"),
            )(
              strpool!("and deduce the truth by order and method."),
            );
        }
        error;
        use_err_help = false;
    }
⟧
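
The "verbose only once" logic of sections 1335 to 1337 can be sketched as follows. This is a hedged Python illustration, not the program's code: the class `ErrMessageHelp`, the return labels, and the numeric value 3 for `error_stop_mode` are assumptions (only the ordering of interaction modes matters).

```python
ERROR_STOP_MODE = 3  # assumed numeric value; scroll_mode and below are smaller


class ErrMessageHelp:
    """Hypothetical model of the long_help_seen flag."""

    def __init__(self):
        self.long_help_seen = False

    def choose(self, err_help_defined, interaction):
        """Return which help text would be used: 'user', 'short', or 'long'."""
        if err_help_defined:
            return "user"            # a user-supplied \errhelp takes priority
        if self.long_help_seen:
            return "short"           # "(That was another \errmessage.)"
        if interaction < ERROR_STOP_MODE:
            self.long_help_seen = True  # remember only when not stopping for errors
        return "long"
```

Note that in error_stop_mode the flag is never set, so the long help is given every time the user actually stops to read it.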

1338. The error routine calls on give_err_help if help is requested from the err_help parameter.

function give_err_help() {
    token_show(err_help);
}

1339. The \uppercase and \lowercase commands are implemented by building a token list and then changing the cases of the letters in it.

⟦1264 Cases of |main_control| that don't depend on |mode|⟧ += ⟦
    any_mode(case_shift):
      shift_case;
⟧

1340.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("lowercase"),
      case_shift,
      lc_code_base,
    )

    primitive(
      strpool!("uppercase"),
      case_shift,
      uc_code_base,
    )
⟧

1341.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    case_shift:

    if (chr_code == lc_code_base) {
        print_esc(strpool!("lowercase"));
    } else {
        print_esc(strpool!("uppercase"));
    }
⟧

1342.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function shift_case() {
        var
          b: pointer, //  lc_code_base or uc_code_base 
          p: pointer, // runs through the token list
          t: halfword, // token
          c: integer; // character code
        
        b = cur_chr;
        p = scan_toks(false, false);
        p = link(def_ref);
        while (p != null) {
            ⟦1343 Change the case of the token in |p|, if a change is appropriate⟧
            p = link(p);
        }
        back_list(link(def_ref));
        // omit reference count
        free_avail(def_ref);
    }
⟧

1343. When the case of a chr_code changes, we don’t change the cmd. We also change active characters, using the fact that cs_token_flag + active_base is a multiple of 256.

⟦1343 Change the case of the token in |p|, if a change is appropriate⟧ = ⟦
    t = info(p)

    if (t < cs_token_flag + single_base) {
        c = t % max_char_val;
        if (equiv(b + c) != 0) {
            info(p) = t - c + equiv(b + c);
        }
    }
⟧
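
The arithmetic in section 1343 can be tried out in isolation. The Python sketch below assumes that a token packs its command code above a character field of size `MAX_CHAR_VAL`; the value 0x200000, the dictionary standing in for the `lc_code`/`uc_code` tables, and the `cs_limit` parameter (playing the role of cs_token_flag + single_base) are illustrative assumptions.

```python
MAX_CHAR_VAL = 0x200000  # assumed size of the character-code field


def shift_token(t, code_table, cs_limit):
    """Return token t with its character code mapped through code_table.

    code_table maps a character code to its replacement; 0 or a missing
    entry means "leave unchanged", as in the test equiv(b + c) != 0.
    """
    if t >= cs_limit:
        return t                     # tokens at or above the limit are left alone
    c = t % MAX_CHAR_VAL             # extract the character code
    mapped = code_table.get(c, 0)
    if mapped != 0:
        return t - c + mapped        # swap the character, keep the command part
    return t
```

Because only the low field is replaced, a letter stays a letter and an "other" character stays an "other" character after the shift.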

1344. We come finally to the last pieces missing from main_control, namely the ‘\show’ commands that are useful when debugging.

⟦1264 Cases of |main_control| that don't depend on |mode|⟧ += ⟦
    any_mode(xray):
      show_whatever;
⟧

1345.

@define show_code => 0 // \.{\\show}
@define show_box_code => 1 // \.{\\showbox}
@define show_the_code => 2 // \.{\\showthe}
@define show_lists_code => 3 // \.{\\showlists}
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("show"), xray, show_code)

    primitive(strpool!("showbox"), xray, show_box_code)

    primitive(strpool!("showthe"), xray, show_the_code)

    primitive(strpool!("showlists"), xray, show_lists_code)
⟧

1346.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    xray:

    case chr_code {
      show_box_code:
        print_esc(strpool!("showbox"));
      show_the_code:
        print_esc(strpool!("showthe"));
      show_lists_code:
        print_esc(strpool!("showlists"));
      ⟦1486 Cases of |xray| for |print_cmd_chr|⟧
      othercases:
        print_esc(strpool!("show"));
    }
⟧

1347.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function show_whatever() {
        label common_ending;
        var
          p: pointer, // tail of a token list to show
          t: small_number, // type of conditional being 
          // shown
          m: normal .. or_code, // upper bound on fi_or_else 
          // codes
          l: integer, // line where that conditional began
          n: integer; // level of \.{\\if...\\fi} nesting
        
        case cur_chr {
          show_lists_code:
            begin_diagnostic;
            show_activities;
          show_box_code:
            ⟦1350 Show the current contents of a box⟧
          show_code:
            ⟦1348 Show the current meaning of a token, then |goto common_ending|⟧
          ⟦1487 Cases for |show_whatever|⟧
          othercases:
            ⟦1351 Show the current value of some parameter or register, then |goto common_ending|⟧
        }
        ⟦1352 Complete a potentially long \.{\\show} command⟧
      common_ending:
        if (interaction < error_stop_mode) {
            help0;
            decr(error_count);
        } else if (tracing_online > 0) {
            help3(
              strpool!("This isn't an error message; I'm just \\showing something."),
            )(
              strpool!("Type `I\\show...' to show more (e.g., \\show\\cs,"),
            )(
              strpool!("\\showthe\\count10, \\showbox255, \\showlists)."),
            );
        } else {
            help5(
              strpool!("This isn't an error message; I'm just \\showing something."),
            )(
              strpool!("Type `I\\show...' to show more (e.g., \\show\\cs,"),
            )(
              strpool!("\\showthe\\count10, \\showbox255, \\showlists)."),
            )(
              strpool!("And type `I\\tracingonline=1\\show...' to show boxes and"),
            )(
              strpool!("lists on your terminal as well as in the transcript file."),
            );
        }
        error;
    }
⟧

1348.

⟦1348 Show the current meaning of a token, then |goto common_ending|⟧ = ⟦
    {
        get_token;
        if (interaction == error_stop_mode) {
            wake_up_terminal;
        }
        print_nl(strpool!("> "));
        if (cur_cs != 0) {
            sprint_cs(cur_cs);
            print_char(ord!("="));
        }
        print_meaning;
        goto common_ending;
    }
⟧

1349.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    undefined_cs:

    print(strpool!("undefined"))

    call, long_call, outer_call, long_outer_call:
      n = cmd - call;
      if (info(link(chr_code)) == protected_token) {
          n = n + 4;
      }
      if (odd(n div 4)) {
          print_esc(strpool!("protected"));
      }
      if (odd(n)) {
          print_esc(strpool!("long"));
      }
      if (odd(n div 2)) {
          print_esc(strpool!("outer"));
      }
      if (n > 0) {
          print_char(ord!(" "));
      }
      print(strpool!("macro"));

    end_template:

    print_esc(strpool!("outer endtemplate"))
⟧
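
The bit tests in section 1349 treat n as a three-bit field: bit 0 for \long, bit 1 for \outer, and bit 2 (added when the macro is protected) for \protected. A small Python sketch, with the hypothetical name `macro_description` and assuming the escape character is a backslash, shows the text that results:

```python
def macro_description(n):
    """Build the text printed for a macro whose prefix bits are n (0..7)."""
    text = ""
    if (n // 4) % 2 == 1:
        text += "\\protected"
    if n % 2 == 1:
        text += "\\long"
    if (n // 2) % 2 == 1:
        text += "\\outer"
    if n > 0:
        text += " "      # separate the prefixes from the word "macro"
    return text + "macro"
```

For instance, n = 3 corresponds to long_outer_call and yields the prefixes \long and \outer in that order.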

1350.

⟦1350 Show the current contents of a box⟧ = ⟦
    {
        scan_register_num;
        fetch_box(p);
        begin_diagnostic;
        print_nl(strpool!("> \\box"));
        print_int(cur_val);
        print_char(ord!("="));
        if (p == null) {
            print(strpool!("void"));
        } else {
            show_box(p);
        }
    }
⟧

1351.

⟦1351 Show the current value of some parameter or register, then |goto common_ending|⟧ = ⟦
    {
        p = the_toks;
        if (interaction == error_stop_mode) {
            wake_up_terminal;
        }
        print_nl(strpool!("> "));
        token_show(temp_head);
        flush_list(link(temp_head));
        goto common_ending;
    }
⟧

1352.

⟦1352 Complete a potentially long \.{\\show} command⟧ = ⟦
    end_diagnostic(true)

    print_err(strpool!("OK"))

    if (selector == term_and_log) {
        if (tracing_online <= 0) {
            selector = term_only;
            print(strpool!(" (see the transcript file)"));
            selector = term_and_log;
        }
    }
⟧

1353. [50] Dumping and undumping the tables. After INITEX has seen a collection of fonts and macros, it can write all the necessary information on an auxiliary file so that production versions of TEX are able to initialize their memory at high speed. The present section of the program takes care of such output and input. We shall consider simultaneously the processes of storing and restoring, so that the inverse relation between them is clear.

The global variable format_ident is a string that is printed right after the banner line when TEX is ready to start. For INITEX this string says simply ‘ (INITEX)’; for other versions of TEX it says, for example, ‘ (preloaded format=plain 1982.11.19)’, showing the year, month, and day that the format file was created. We have format_ident == 0 before TEX’s tables are loaded.

⟦13 Global variables⟧ += ⟦
    var format_ident: str_number;
⟧

1354.

⟦23 Set initial values of key variables⟧ += ⟦
    format_ident = 0
⟧

1355.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    if (ini_version) {
        format_ident = strpool!(" (INITEX)");
    }
⟧

1356.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    init!{
        function store_fmt_file() {
            label found1, found2, done1, done2;
            var
              j, k, l: integer, // all-purpose indices
              p, q: pointer, // all-purpose pointers
              x: integer, // something to dump
              format_engine: ^char;
            
            ⟦1358 If dumping is not allowed, abort⟧
            ⟦1382 Create the |format_ident|, open the format file, and inform the user that dumping has begun⟧
            ⟦1361 Dump constants for consistency check⟧
            ⟦1700 Dump ML\TeX-specific data⟧
            ⟦1363 Dump the string pool⟧
            ⟦1365 Dump the dynamic memory⟧
            ⟦1367 Dump the table of equivalents⟧
            ⟦1374 Dump the font information⟧
            ⟦1378 Dump the hyphenation tables⟧
            ⟦1380 Dump a couple more things and the closing check word⟧
            ⟦1383 Close the format file⟧
        }
    }
⟧

1357. Corresponding to the procedure that dumps a format file, we have a function that reads one in. The function returns false if the dumped format is incompatible with the present TEX table sizes, etc.

// go here if the format file is unacceptable
@define bad_fmt => 6666
@define too_small(#) =>
    {
        wake_up_terminal;
        wterm_ln("---! Must increase the ", #);
        goto bad_fmt;
    }
⟦559 Declare the function called |open_fmt_file|⟧

function load_fmt_file(): boolean {
    label bad_fmt, exit;
    var
      j, k: integer, // all-purpose indices
      p, q: pointer, // all-purpose pointers
      x: integer, // something undumped
      format_engine: ^char;
    
    ⟦1362 Undump constants for consistency check⟧
    ⟦1701 Undump ML\TeX-specific data⟧
    ⟦1364 Undump the string pool⟧
    ⟦1366 Undump the dynamic memory⟧
    ⟦1368 Undump the table of equivalents⟧
    ⟦1375 Undump the font information⟧
    ⟦1379 Undump the hyphenation tables⟧
    ⟦1381 Undump a couple more things and the closing check word⟧
    load_fmt_file = true;
    // it worked!
    return;
  bad_fmt:
    wake_up_terminal;
    wterm_ln("(Fatal format file error; I'm stymied)");
    load_fmt_file = false;
  exit:
}

1358. The user is not allowed to dump a format file unless save_ptr == 0. This condition implies that cur_level == level_one, hence the xeq_level array is constant and it need not be dumped.

⟦1358 If dumping is not allowed, abort⟧ = ⟦
    if (save_ptr != 0) {
        print_err(
          strpool!("You can't dump inside a group"),
        );
        help1(strpool!("`{...\\dump}' is a no-no."));
        succumb;
    }
⟧

1359. Format files consist of memory_word items, and we use the following macros to dump words of different types:

⟦13 Global variables⟧ += ⟦
    // for input or output of format information
    var fmt_file: word_file;
⟧
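
The dumping primitives themselves are not shown in this listing. As a rough mental model only, the following Python sketch writes and reads integers as 4-byte little-endian words on a byte stream. The word size, the byte order, and the function bodies are assumptions for illustration; the names merely echo `dump_int`, `dump_things`, `undump_int`, and `undump_things` as they are used in later sections.

```python
import io
import struct


def dump_int(f, value):
    """Write one integer as a 4-byte word (assumed layout)."""
    f.write(struct.pack("<i", value))


def dump_things(f, values):
    """Write a contiguous run of integers, one word each."""
    for v in values:
        dump_int(f, v)


def undump_int(f):
    """Read back one integer written by dump_int."""
    (value,) = struct.unpack("<i", f.read(4))
    return value


def undump_things(f, count):
    """Read back a run of count integers."""
    return [undump_int(f) for _ in range(count)]
```

Dumping and undumping are exact inverses as long as both sides agree on the order and number of items, which is why the program presents each dump section next to its undump counterpart.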

1360. The inverse macros are slightly more complicated, since we need to check the range of the values we are reading in. We say ‘undump(a)(b)(x)’ to read an integer value x that is supposed to be in the range a <= x <= b. System error messages should be suppressed when undumping.

@define undump_end_end(#) =>
    /*... opened earlier ...*/
        # = x;
    }
@define undump_end(#) =>
    /* `if` opened earlier (*/(x > #)) {
        goto bad_fmt;
    } else {
        undump_end_end
    /* ... closed later ... */
@define undump(#) =>
    {
        undump_int(x);
        if ((x < #) || undump_end /* ) { ... continued ... */
@define format_debug_end(#) =>
    /*... opened earlier ...*/
        write_ln(stderr, " = ", #);
    }
@define format_debug(#) =>
    if (debug_format_file) {
        write(stderr, "fmtdebug:", #);
        format_debug_end
    /* ... closed later ... */
@define undump_size_end_end(#) =>
    too_small(#) else format_debug(#)(x);
    undump_end_end
@define undump_size_end(#) =>
    if (x > #) {
        undump_size_end_end;
    }
@define undump_size(#) =>
    {
        undump_int(x);
        if (x < #) {
            goto bad_fmt;
        }
        undump_size_end
    /* ... closed later ... */
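
The effect of `undump` and `undump_size` can be sketched in Python, with an exception standing in for `goto bad_fmt`. This is an illustration under stated assumptions: the 4-byte little-endian encoding, the `BadFormat` exception, and returning the value instead of assigning to a variable are not taken from the program.

```python
import io
import struct


class BadFormat(Exception):
    """Hypothetical stand-in for the jump to bad_fmt."""


def undump_int(f):
    data = f.read(4)
    if len(data) != 4:
        raise BadFormat("format file ended early")
    return struct.unpack("<i", data)[0]


def undump(f, lo, hi):
    """Read an integer and insist that lo <= x <= hi."""
    x = undump_int(f)
    if x < lo or x > hi:
        raise BadFormat("value %d outside %d..%d" % (x, lo, hi))
    return x


def undump_size(f, lo, hi, what):
    """Like undump, but a value above hi names the limit that is too small."""
    x = undump_int(f)
    if x < lo:
        raise BadFormat("value %d below %d" % (x, lo))
    if x > hi:
        raise BadFormat("---! Must increase the " + what)
    return x
```

The difference between the two is only in the diagnostics: an oversized value in `undump_size` tells the user which table size to raise.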

1361. The next few sections of the program should make it clear how we use the dump/undump macros.

⟦1361 Dump constants for consistency check⟧ = ⟦
    // Align engine to 4 bytes with one or more trailing NUL
    // Web2C \TeX's magic constant: "W2TX"
    dump_int(0x57325458)

    x = strlen(engine_name)

    format_engine = xmalloc_array(char, x + 4)

    strcpy(stringcast(format_engine), engine_name)

    for (k in x to x + 3) {
        format_engine[k] = 0;
    }

    x = x + 4 - (x % 4)

    dump_int(x)

    dump_things(format_engine[0], x)

    libc_free(format_engine)

    dump_int(stringpoolchecksum!())

    dump_int(max_halfword)

    dump_int(hash_high)

    ⟦1464 Dump the \eTeX\ state⟧

    dump_int(mem_bot)

    dump_int(mem_top)

    dump_int(eqtb_size)

    dump_int(hash_prime)

    dump_int(hyph_prime)
⟧
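
Two details of section 1361 are easy to check by hand. The magic constant spells "W2TX" when read as four bytes, and the padding formula x + 4 - (x % 4) always adds between one and four NUL bytes. The Python sketch below uses the hypothetical helper `pad_engine_name` to show the padded field:

```python
MAGIC = 0x57325458  # the constant dumped at the start of the format file


def pad_engine_name(name):
    """Pad name (bytes) with NULs to the next multiple of 4, adding at least one."""
    x = len(name)
    padded_len = x + 4 - (x % 4)
    return name + b"\0" * (padded_len - x)
```

A name whose length is already a multiple of 4 still gains four NUL bytes, so the stored string is always terminated.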

1362. Sections of a WEB program that are “commented out” still contribute strings to the string pool; therefore INITEX and TEX will have the same strings. (And it is, of course, a good thing that they do.)

⟦1362 Undump constants for consistency check⟧ = ⟦
    Init!{
        libc_free(font_info);
        libc_free(str_pool);
        libc_free(str_start);
        libc_free(yhash);
        libc_free(zeqtb);
        libc_free(yzmem);
    }

    undump_int(x)

    format_debug("format magic number")(x)

    if (x != 0x57325458) {
        // not a format file
        goto bad_fmt;
    }

    undump_int(x)

    format_debug("engine name size")(x)

    if ((x < 0) || (x > 256)) {
        // corrupted format file
        goto bad_fmt;
    }

    format_engine = xmalloc_array(char, x)

    undump_things(format_engine[0], x)

    // force string termination, just in case
    format_engine[x - 1] = 0

    if (strcmp(engine_name, stringcast(format_engine))) {
        wake_up_terminal;
        wterm_ln(
          "---! ",
          stringcast(name_of_file + 1),
          " was written by ",
          format_engine,
        );
        libc_free(format_engine);
        goto bad_fmt;
    }

    libc_free(format_engine)

    undump_int(x)

    format_debug("string pool checksum")(x)

    if (x != stringpoolchecksum!()) {
        // check that strings are the same
        wake_up_terminal;
        wterm_ln(
          "---! ",
          stringcast(name_of_file + 1),
          " made by different executable version, strings are different",
        );
        goto bad_fmt;
    }

    undump_int(x)

    if (x != max_halfword) {
        // check max_halfword 
        goto bad_fmt;
    }

    undump_int(hash_high)

    if ((hash_high < 0) || (hash_high > sup_hash_extra)) {
        goto bad_fmt;
    }

    if (hash_extra < hash_high) {
        hash_extra = hash_high;
    }

    eqtb_top = eqtb_size + hash_extra

    if (hash_extra == 0) {
        hash_top = undefined_control_sequence;
    } else {
        hash_top = eqtb_top;
    }

    yhash = xmalloc_array(
      two_halves,
      1 + hash_top - hash_offset,
    )

    hash = yhash - hash_offset

    next(hash_base) = 0

    text(hash_base) = 0

    for (x in hash_base + 1 to hash_top) {
        hash[x] = hash[hash_base];
    }

    zeqtb = xmalloc_array(memory_word, eqtb_top + 1)

    eqtb = zeqtb

    eq_type(undefined_control_sequence) = undefined_cs

    equiv(undefined_control_sequence) = null

    eq_level(undefined_control_sequence) = level_zero

    for (x in eqtb_size + 1 to eqtb_top) {
        eqtb[x] = eqtb[undefined_control_sequence];
    }

    ⟦1465 Undump the \eTeX\ state⟧

    undump_int(x)

    format_debug("mem_bot")(x)

    if (x != mem_bot) {
        goto bad_fmt;
    }

    undump_int(mem_top)

    format_debug("mem_top")(mem_top)

    if (mem_bot + 1100 > mem_top) {
        goto bad_fmt;
    }

    head = contrib_head

    tail = contrib_head

    page_tail = page_head // page initialization

    mem_min = mem_bot - extra_mem_bot

    mem_max = mem_top + extra_mem_top

    yzmem = xmalloc_array(
      memory_word,
      mem_max - mem_min + 1,
    )

    // this pointer arithmetic fails with some compilers
    zmem = yzmem - mem_min

    mem = zmem

    undump_int(x)

    if (x != eqtb_size) {
        goto bad_fmt;
    }

    undump_int(x)

    if (x != hash_prime) {
        goto bad_fmt;
    }

    undump_int(x)

    if (x != hyph_prime) {
        goto bad_fmt;
    }
⟧

1363.

@define dump_four_ASCII =>
    w.b0 = qi(so(str_pool[k]));
    w.b1 = qi(so(str_pool[k + 1]));
    w.b2 = qi(so(str_pool[k + 2]));
    w.b3 = qi(so(str_pool[k + 3]));
    dump_qqqq(w)
⟦1363 Dump the string pool⟧ = ⟦
    dump_int(pool_ptr)

    dump_int(str_ptr)

    dump_things(
      str_start_macro(too_big_char),
      str_ptr + 1 - too_big_char,
    )

    dump_things(str_pool[0], pool_ptr)

    print_ln

    print_int(str_ptr)

    print(strpool!(" strings of total length "))

    print_int(pool_ptr)
⟧

1364.

@define undump_four_ASCII =>
    undump_qqqq(w);
    str_pool[k] = si(qo(w.b0));
    str_pool[k + 1] = si(qo(w.b1));
    str_pool[k + 2] = si(qo(w.b2));
    str_pool[k + 3] = si(qo(w.b3))
⟦1364 Undump the string pool⟧ = ⟦
    undump_size(0)(sup_pool_size - pool_free)(
      "string pool size",
    )(pool_ptr)

    if (pool_size < pool_ptr + pool_free) {
        pool_size = pool_ptr + pool_free;
    }

    undump_size(0)(sup_max_strings - strings_free)(
      "sup strings",
    )(str_ptr)

    if (max_strings < str_ptr + strings_free) {
        max_strings = str_ptr + strings_free;
    }

    str_start = xmalloc_array(pool_pointer, max_strings)

    undump_checked_things(
      0,
      pool_ptr,
      str_start_macro(too_big_char),
      str_ptr + 1 - too_big_char,
    )

    str_pool = xmalloc_array(packed_ASCII_code, pool_size)

    undump_things(str_pool[0], pool_ptr)

    init_str_ptr = str_ptr

    init_pool_ptr = pool_ptr
⟧

1365. By sorting the list of available spaces in the variable-size portion of mem, we are usually able to get by without having to dump very much of the dynamic memory.

We recompute var_used and dyn_used, so that INITEX dumps valid information even when it has not been gathering statistics.

⟦1365 Dump the dynamic memory⟧ = ⟦
    sort_avail

    var_used = 0

    dump_int(lo_mem_max)

    dump_int(rover)

    if (eTeX_ex) {
        for (k in int_val to inter_char_val) {
            dump_int(sa_root[k]);
        }
    }

    p = mem_bot

    q = rover

    x = 0

    repeat {
        dump_things(mem[p], q + 2 - p);
        x = x + q + 2 - p;
        var_used = var_used + q - p;
        p = q + node_size(q);
        q = rlink(q);
    } until (q == rover)

    var_used = var_used + lo_mem_max - p

    dyn_used = mem_end + 1 - hi_mem_min

    dump_things(mem[p], lo_mem_max + 1 - p)

    x = x + lo_mem_max + 1 - p

    dump_int(hi_mem_min)

    dump_int(avail)

    dump_things(mem[hi_mem_min], mem_end + 1 - hi_mem_min)

    x = x + mem_end + 1 - hi_mem_min

    p = avail

    while (p != null) {
        decr(dyn_used);
        p = link(p);
    }

    dump_int(var_used)

    dump_int(dyn_used)

    print_ln

    print_int(x)

    print(
      strpool!(" memory locations dumped; current usage is "),
    )

    print_int(var_used)

    print_char(ord!("&"))

    print_int(dyn_used)
⟧
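
The loop over the free list in section 1365 dumps everything up to and including the first two words of each free node and skips the rest of the node. A simplified Python sketch follows; it models `mem` as a list and `node_size` and `rlink` as dictionaries, and it assumes the free list has been sorted so that `rover` is the lowest free node. These representations are assumptions for illustration.

```python
def dump_low_memory(mem, mem_bot, lo_mem_max, rover, node_size, rlink):
    """Return (dumped_words, var_used) for the variable-size region."""
    dumped = []
    var_used = 0
    p, q = mem_bot, rover
    while True:
        dumped.extend(mem[p:q + 2])   # used words plus the 2-word header of free node q
        var_used += q - p             # only the words before q are in use
        p = q + node_size[q]          # skip the body of the free node
        q = rlink[q]
        if q == rover:
            break
    var_used += lo_mem_max - p        # same arithmetic as the source
    dumped.extend(mem[p:lo_mem_max + 1])
    return dumped, var_used
```

With 20 words and free nodes of sizes 5 and 4, only 15 words are written, since the interiors of the free nodes carry no information.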

1366.

⟦1366 Undump the dynamic memory⟧ = ⟦
    undump(lo_mem_stat_max + 1000)(hi_mem_stat_min - 1)(
      lo_mem_max,
    )

    undump(lo_mem_stat_max + 1)(lo_mem_max)(rover)

    if (eTeX_ex) {
        for (k in int_val to inter_char_val) {
            undump(null)(lo_mem_max)(sa_root[k]);
        }
    }

    p = mem_bot

    q = rover

    repeat {
        undump_things(mem[p], q + 2 - p);
        p = q + node_size(q);
        if (
            (p > lo_mem_max)
            || ((q >= rlink(q)) && (rlink(q) != rover))
        ) {
            goto bad_fmt;
        }
        q = rlink(q);
    } until (q == rover)

    undump_things(mem[p], lo_mem_max + 1 - p)

    // make more low memory available
    if (mem_min < mem_bot - 2) {
        p = llink(rover);
        q = mem_min + 1;
        link(mem_min) = null;
        // we don't use the bottom word
        info(mem_min) = null;
        rlink(p) = q;
        llink(rover) = q;
        rlink(q) = rover;
        llink(q) = p;
        link(q) = empty_flag;
        node_size(q) = mem_bot - q;
    }

    undump(lo_mem_max + 1)(hi_mem_stat_min)(hi_mem_min)

    undump(null)(mem_top)(avail)

    mem_end = mem_top

    undump_things(mem[hi_mem_min], mem_end + 1 - hi_mem_min)

    undump_int(var_used)

    undump_int(dyn_used)
⟧

1367.

⟦1367 Dump the table of equivalents⟧ = ⟦
    ⟦1369 Dump regions 1 to 4 of |eqtb|⟧

    ⟦1370 Dump regions 5 and 6 of |eqtb|⟧

    dump_int(par_loc)

    dump_int(write_loc)

    ⟦1372 Dump the hash table⟧
⟧

1368.

⟦1368 Undump the table of equivalents⟧ = ⟦
    ⟦1371 Undump regions 1 to 6 of |eqtb|⟧

    undump(hash_base)(hash_top)(par_loc)

    par_token = cs_token_flag + par_loc

    undump(hash_base)(hash_top)(write_loc)

    ⟦1373 Undump the hash table⟧
⟧

1369. The table of equivalents usually contains repeated information, so we dump it in compressed form: The sequence of $n+2$ values $(n, x_1, \ldots, x_n, m)$ in the format file represents $n+m$ consecutive entries of eqtb, with m extra copies of $x_n$, namely $(x_1, \ldots, x_n, x_n, \ldots, x_n)$.

⟦1369 Dump regions 1 to 4 of |eqtb|⟧ = ⟦
    k = active_base

    repeat {
        j = k;
        while (j < int_base - 1) {
            if (
                (equiv(j) == equiv(j + 1))
                && (eq_type(j) == eq_type(j + 1))
                && (eq_level(j) == eq_level(j + 1))
            ) {
                goto found1;
            }
            incr(j);
        }
        l = int_base;
        //  j == int_base - 1 
        goto done1;
      found1:
        incr(j);
        l = j;
        while (j < int_base - 1) {
            if (
                (equiv(j) != equiv(j + 1))
                || (eq_type(j) != eq_type(j + 1))
                || (eq_level(j) != eq_level(j + 1))
            ) {
                goto done1;
            }
            incr(j);
        }
      done1:
        dump_int(l - k);
        dump_things(eqtb[k], l - k);
        k = j + 1;
        dump_int(k - l);
    } until (k == int_base)
⟧
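
The compression scheme of section 1369 can be exercised on an ordinary list. The Python sketch below follows the same scan as the code above (find the first adjacent repeat, then extend the run), but it works on plain values and represents each group as a pair (literal entries, number of extra copies) instead of writing to a format file. The function names are hypothetical.

```python
def compress(entries):
    """Split entries into groups ([x1..xn], m): n literals, then m copies of xn."""
    groups = []
    end = len(entries)
    k = 0
    while k < end:
        j = k
        # Look for the first position whose successor is identical.
        while j < end - 1 and entries[j] != entries[j + 1]:
            j += 1
        if j == end - 1:
            l = end                  # no repeat found: everything left is literal
        else:
            j += 1
            l = j                    # literals stop after the first element of the run
            while j < end - 1 and entries[j] == entries[j + 1]:
                j += 1
        groups.append((entries[k:l], j + 1 - l))
        k = j + 1
    return groups


def decompress(groups):
    """Inverse of compress, with checks like those made when undumping."""
    result = []
    for literal, extra in groups:
        if len(literal) < 1 or extra < 0:
            raise ValueError("corrupt group")
        result.extend(literal)
        result.extend([literal[-1]] * extra)
    return result
```

For the input [1, 2, 3, 3, 3, 4, 5, 5] the groups are ([1, 2, 3], 2) and ([4, 5], 1), which matches the $(n, x_1, \ldots, x_n, m)$ layout with the counts carried by the list length and the second component.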

1370.

⟦1370 Dump regions 5 and 6 of |eqtb|⟧ = ⟦
    repeat {
        j = k;
        while (j < eqtb_size) {
            if (eqtb[j].int == eqtb[j + 1].int) {
                goto found2;
            }
            incr(j);
        }
        l = eqtb_size + 1;
        //  j == eqtb_size 
        goto done2;
      found2:
        incr(j);
        l = j;
        while (j < eqtb_size) {
            if (eqtb[j].int != eqtb[j + 1].int) {
                goto done2;
            }
            incr(j);
        }
      done2:
        dump_int(l - k);
        dump_things(eqtb[k], l - k);
        k = j + 1;
        dump_int(k - l);
    } until (k > eqtb_size)

    if (hash_high > 0) {
        // dump hash_extra part
        dump_things(eqtb[eqtb_size + 1], hash_high);
    }
⟧

1371.

⟦1371 Undump regions 1 to 6 of |eqtb|⟧ = ⟦
    k = active_base

    repeat {
        undump_int(x);
        if ((x < 1) || (k + x > eqtb_size + 1)) {
            goto bad_fmt;
        }
        undump_things(eqtb[k], x);
        k = k + x;
        undump_int(x);
        if ((x < 0) || (k + x > eqtb_size + 1)) {
            goto bad_fmt;
        }
        for (j in k to k + x - 1) {
            eqtb[j] = eqtb[k - 1];
        }
        k = k + x;
    } until (k > eqtb_size)

    if (hash_high > 0) {
        // undump hash_extra part
        undump_things(eqtb[eqtb_size + 1], hash_high);
    }
⟧

1372. A different scheme is used to compress the hash table, since its lower region is usually sparse. When text(p) != 0 for p <= hash_used, we output two words, p and hash[p]. The hash table is, of course, densely packed for p >= hash_used, so the remaining entries are output in a block.

⟦1372 Dump the hash table⟧ = ⟦
    for (p in 0 to prim_size) {
        dump_hh(prim[p]);
    }

    dump_int(hash_used)

    cs_count = 
        frozen_control_sequence
        - 1 - hash_used + hash_high

    for (p in hash_base to hash_used) {
        if (text(p) != 0) {
            dump_int(p);
            dump_hh(hash[p]);
            incr(cs_count);
        }
    }

    dump_things(
      hash[hash_used + 1],
      undefined_control_sequence - 1 - hash_used,
    )

    if (hash_high > 0) {
        dump_things(hash[eqtb_size + 1], hash_high);
    }

    dump_int(cs_count)

    print_ln

    print_int(cs_count)

    print(strpool!(" multiletter control sequences"))
⟧
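
The two-part layout of section 1372 (sparse pairs up to hash_used, then one dense block) can be sketched in Python. The model below keeps only one value per hash slot, with 0 meaning "empty", and assumes that the slot at hash_used itself is nonempty, which is what lets the reader know where the sparse part ends. Names and data representation are illustrative assumptions.

```python
def dump_hash(text, hash_base, hash_used, undefined_cs):
    """Return (sparse pairs, dense block) for slots hash_base..undefined_cs-1."""
    sparse = [(p, text[p]) for p in range(hash_base, hash_used + 1) if text[p] != 0]
    dense = text[hash_used + 1:undefined_cs]
    return sparse, dense


def undump_hash(sparse, dense, hash_base, hash_used, undefined_cs):
    """Rebuild the table, checking that positions increase and end at hash_used."""
    text = [0] * undefined_cs
    prev = hash_base - 1
    for p, entry in sparse:
        if not (prev + 1 <= p <= hash_used):
            raise ValueError("hash position out of order")
        text[p] = entry
        prev = p
    if prev != hash_used:
        raise ValueError("sparse part must end at hash_used")
    if len(dense) != undefined_cs - 1 - hash_used:
        raise ValueError("dense block has the wrong length")
    text[hash_used + 1:undefined_cs] = dense
    return text
```

The range check on each position corresponds to the call undump(p + 1)(hash_used)(p) in the undumping code of the next section.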

1373.

⟦1373 Undump the hash table⟧ = ⟦
    for (p in 0 to prim_size) {
        undump_hh(prim[p]);
    }

    undump(hash_base)(frozen_control_sequence)(hash_used)

    p = hash_base - 1

    repeat {
        undump(p + 1)(hash_used)(p);
        undump_hh(hash[p]);
    } until (p == hash_used)

    undump_things(
      hash[hash_used + 1],
      undefined_control_sequence - 1 - hash_used,
    )

    if (debug_format_file) {
        print_csnames(
          hash_base,
          undefined_control_sequence - 1,
        );
    }

    if (hash_high > 0) {
        undump_things(hash[eqtb_size + 1], hash_high);
        if (debug_format_file) {
            print_csnames(
              eqtb_size + 1,
              hash_high - (eqtb_size + 1),
            );
        }
    }

    undump_int(cs_count)
⟧

1374.

⟦1374 Dump the font information⟧ = ⟦
    dump_int(fmem_ptr)

    dump_things(font_info[0], fmem_ptr)

    dump_int(font_ptr)

    ⟦1376 Dump the array info for internal font number |k|⟧

    print_ln

    print_int(fmem_ptr - 7)

    print(strpool!(" words of font info for "))

    print_int(font_ptr - font_base)

    if (font_ptr != font_base + 1) {
        print(strpool!(" preloaded fonts"));
    } else {
        print(strpool!(" preloaded font"));
    }
⟧

1375.

⟦1375 Undump the font information⟧ = ⟦
    undump_size(7)(sup_font_mem_size)("font mem size")(
      fmem_ptr,
    )

    if (fmem_ptr > font_mem_size) {
        font_mem_size = fmem_ptr;
    }

    font_info = xmalloc_array(fmemory_word, font_mem_size)

    undump_things(font_info[0], fmem_ptr)

    // This undumps all of the font info, despite the name.
    undump_size(font_base)(font_base + max_font_max)(
      "font max",
    )(font_ptr)

    ⟦1377 Undump the array info for internal font number |k|⟧
⟧

1376.

⟦1376 Dump the array info for internal font number |k|⟧ = ⟦
    {
        dump_things(
          font_check[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          font_size[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          font_dsize[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          font_params[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          hyphen_char[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          skew_char[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          font_name[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          font_area[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          font_bc[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          font_ec[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          char_base[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          width_base[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          height_base[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          depth_base[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          italic_base[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          lig_kern_base[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          kern_base[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          exten_base[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          param_base[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          font_glue[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          bchar_label[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          font_bchar[null_font],
          font_ptr + 1 - null_font,
        );
        dump_things(
          font_false_bchar[null_font],
          font_ptr + 1 - null_font,
        );
        for (k in null_font to font_ptr) {
            print_nl(strpool!("\\font"));
            print_esc(font_id_text(k));
            print_char(ord!("="));
            if (is_native_font(k) || (font_mapping[k] != 0)) {
                print_file_name(
                  font_name[k],
                  strpool!(""),
                  strpool!(""),
                );
                print_err(
                  strpool!("Can't \\dump a format with native fonts or font-mappings"),
                );
                help3(
                  strpool!("You really, really don't want to do this."),
                )(
                  strpool!("It won't work, and only confuses me."),
                )(
                  strpool!("(Load them at runtime, not as part of the format file.)"),
                );
                error;
            } else {
                print_file_name(
                  font_name[k],
                  font_area[k],
                  strpool!(""),
                );
            }
            if (font_size[k] != font_dsize[k]) {
                print(strpool!(" at "));
                print_scaled(font_size[k]);
                print(strpool!("pt"));
            }
        }
    }
⟧
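
To illustrate the pattern above: every call writes the same slice, positions null_font through font_ptr, of one of the parallel font arrays. The following Python sketch mimics that layout under simplifying assumptions. Items are taken to be 32-bit little-endian integers, null_font is assumed to be 0, and the two helper functions merely borrow the names dump_things and undump_things; they are not the macros that the real system supplies.

```python
import io
import struct

NULL_FONT = 0  # assumed value of null_font for this sketch


def dump_things(stream, array, start, count):
    """Write `count` consecutive 32-bit items of `array`, beginning at `start`."""
    stream.write(struct.pack(f"<{count}i", *array[start:start + count]))


def undump_things(stream, array, start, count):
    """Read `count` 32-bit items back into array[start:start + count]."""
    data = stream.read(4 * count)
    if len(data) != 4 * count:
        raise ValueError("format file ended early")
    array[start:start + count] = struct.unpack(f"<{count}i", data)


font_ptr = 2  # three fonts in use: 0 (the null font), 1 and 2
font_size = [0, 655360, 786432, -1, -1]  # slots beyond font_ptr are unused
hyphen_char = [45, 45, -1, -1, -1]

# Dump one array after another, each restricted to the slots in use.
fmt = io.BytesIO()
for arr in (font_size, hyphen_char):
    dump_things(fmt, arr, NULL_FONT, font_ptr + 1 - NULL_FONT)

# Undump in exactly the same order into freshly allocated arrays.
fmt.seek(0)
new_size = [0] * 5
new_hyphen = [0] * 5
for arr in (new_size, new_hyphen):
    undump_things(fmt, arr, NULL_FONT, font_ptr + 1 - NULL_FONT)
```

Because the file carries no field names, the reader must consume the arrays in the order in which the writer produced them; that is why the dump and undump modules list the arrays in the same sequence.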

1377. This module should now be named ‘Undump all the font arrays’.

⟦1377 Undump the array info for internal font number |k|⟧ = ⟦
    {
        // Allocate the font arrays
        font_mapping = xmalloc_array(
          void_pointer,
          font_max,
        );
        font_layout_engine = xmalloc_array(
          void_pointer,
          font_max,
        );
        font_flags = xmalloc_array(char, font_max);
        font_letter_space = xmalloc_array(scaled, font_max);
        font_check = xmalloc_array(four_quarters, font_max);
        font_size = xmalloc_array(scaled, font_max);
        font_dsize = xmalloc_array(scaled, font_max);
        font_params = xmalloc_array(font_index, font_max);
        font_name = xmalloc_array(str_number, font_max);
        font_area = xmalloc_array(str_number, font_max);
        font_bc = xmalloc_array(UTF16_code, font_max);
        font_ec = xmalloc_array(UTF16_code, font_max);
        font_glue = xmalloc_array(halfword, font_max);
        hyphen_char = xmalloc_array(integer, font_max);
        skew_char = xmalloc_array(integer, font_max);
        bchar_label = xmalloc_array(font_index, font_max);
        font_bchar = xmalloc_array(nine_bits, font_max);
        font_false_bchar = xmalloc_array(
          nine_bits,
          font_max,
        );
        char_base = xmalloc_array(integer, font_max);
        width_base = xmalloc_array(integer, font_max);
        height_base = xmalloc_array(integer, font_max);
        depth_base = xmalloc_array(integer, font_max);
        italic_base = xmalloc_array(integer, font_max);
        lig_kern_base = xmalloc_array(integer, font_max);
        kern_base = xmalloc_array(integer, font_max);
        exten_base = xmalloc_array(integer, font_max);
        param_base = xmalloc_array(integer, font_max);
        for (k in null_font to font_ptr) {
            font_mapping[k] = 0;
        }
        undump_things(
          font_check[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          font_size[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          font_dsize[null_font],
          font_ptr + 1 - null_font,
        );
        undump_checked_things(
          min_halfword,
          max_halfword,
          font_params[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          hyphen_char[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          skew_char[null_font],
          font_ptr + 1 - null_font,
        );
        undump_upper_check_things(
          str_ptr,
          font_name[null_font],
          font_ptr + 1 - null_font,
        );
        undump_upper_check_things(
          str_ptr,
          font_area[null_font],
          font_ptr + 1 - null_font,
        );
        // There's no point in checking these values against 
        // the range $[0,255]$, since the data type is 
        // unsigned char , and all values of that type are 
        // in that range by definition.
        undump_things(
          font_bc[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          font_ec[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          char_base[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          width_base[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          height_base[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          depth_base[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          italic_base[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          lig_kern_base[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          kern_base[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          exten_base[null_font],
          font_ptr + 1 - null_font,
        );
        undump_things(
          param_base[null_font],
          font_ptr + 1 - null_font,
        );
        undump_checked_things(
          min_halfword,
          lo_mem_max,
          font_glue[null_font],
          font_ptr + 1 - null_font,
        );
        undump_checked_things(
          0,
          fmem_ptr - 1,
          bchar_label[null_font],
          font_ptr + 1 - null_font,
        );
        undump_checked_things(
          min_quarterword,
          non_char,
          font_bchar[null_font],
          font_ptr + 1 - null_font,
        );
        undump_checked_things(
          min_quarterword,
          non_char,
          font_false_bchar[null_font],
          font_ptr + 1 - null_font,
        );
    }
⟧
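
The checked variants used above differ from plain undump_things only in that each restored item is compared against bounds before it is accepted. A minimal Python sketch of that idea follows. The function bodies and the BadFormat exception are assumptions made for illustration; this section does not show how the real macros report a violation, so the sketch simply raises an exception.

```python
class BadFormat(Exception):
    """Hypothetical error type for an item that fails its range check."""


def undump_checked_things(values, low, high):
    """Return a copy of `values` if every item lies in [low, high]."""
    for v in values:
        if v < low or v > high:
            raise BadFormat(f"item {v} outside [{low}, {high}]")
    return list(values)


def undump_upper_check_things(values, high):
    """Variant that checks only the upper bound."""
    for v in values:
        if v > high:
            raise BadFormat(f"item {v} exceeds {high}")
    return list(values)


fmem_ptr = 100

# Every bchar_label must index a word of font_info that is in use.
bchar_label = undump_checked_things([0, 17, 99], 0, fmem_ptr - 1)

# An index equal to fmem_ptr is one past the end and is rejected.
try:
    undump_checked_things([0, 100], 0, fmem_ptr - 1)
    rejected = False
except BadFormat:
    rejected = True
```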

1378.

⟦1378 Dump the hyphenation tables⟧ = ⟦
    dump_int(hyph_count)

    if (hyph_next <= hyph_prime) {
        hyph_next = hyph_size;
    }

    // minimum value of hyphen_size needed
    dump_int(hyph_next)

    for (k in 0 to hyph_size) {
        if (hyph_word[k] != 0) {
            // assumes number of hyphen exceptions does not 
            // exceed 65535
            dump_int(k + 65536 * hyph_link[k]);
            dump_int(hyph_word[k]);
            dump_int(hyph_list[k]);
        }
    }

    print_ln

    print_int(hyph_count)

    if (hyph_count != 1) {
        print(strpool!(" hyphenation exceptions"));
    } else {
        print(strpool!(" hyphenation exception"));
    }

    if (trie_not_ready) {
        init_trie;
    }

    dump_int(trie_max)

    dump_int(hyph_start)

    dump_things(trie_trl[0], trie_max + 1)

    dump_things(trie_tro[0], trie_max + 1)

    dump_things(trie_trc[0], trie_max + 1)

    dump_int(max_hyph_char)

    dump_int(trie_op_ptr)

    dump_things(hyf_distance[1], trie_op_ptr)

    dump_things(hyf_num[1], trie_op_ptr)

    dump_things(hyf_next[1], trie_op_ptr)

    print_nl(strpool!("Hyphenation trie of length "))

    print_int(trie_max)

    print(strpool!(" has "))

    print_int(trie_op_ptr)

    if (trie_op_ptr != 1) {
        print(strpool!(" ops"));
    } else {
        print(strpool!(" op"));
    }

    print(strpool!(" out of "))

    print_int(trie_op_size)

    for (k in biggest_lang downto 0) {
        if (trie_used[k] > min_quarterword) {
            print_nl(strpool!("  "));
            print_int(qo(trie_used[k]));
            print(strpool!(" for language "));
            print_int(k);
            dump_int(k);
            dump_int(qo(trie_used[k]));
        }
    }
⟧
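
The expression k + 65536 * hyph_link[k] packs a table position and its link into a single integer, which the undumping code later splits again. A small Python sketch of the round trip, with hypothetical function names, assuming as the comment above does that positions stay below 65536:

```python
def pack_hyph_entry(k, link):
    """Combine table position k and hyph_link[k] into one integer."""
    if not 0 <= k < 65536:
        raise ValueError("position does not fit in 16 bits")
    return k + 65536 * link


def unpack_hyph_entry(j):
    """Split a packed integer back into (position, link)."""
    if j < 0:
        raise ValueError("bad format")
    link, k = divmod(j, 65536)
    return k, link


packed = pack_hyph_entry(607, 12)
restored = unpack_hyph_entry(packed)
```

A link of zero leaves the position unchanged, so an entry without a successor is dumped as the bare position.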

1379. Only “nonempty” parts of op_start need to be restored.

⟦1379 Undump the hyphenation tables⟧ = ⟦
    undump_size(0)(hyph_size)("hyph_size")(hyph_count)

    undump_size(hyph_prime)(hyph_size)("hyph_size")(
      hyph_next,
    )

    j = 0

    for (k in 1 to hyph_count) {
        undump_int(j);
        if (j < 0) {
            goto bad_fmt;
        }
        if (j > 65535) {
            hyph_next = j div 65536;
            j = j - hyph_next * 65536;
        } else {
            hyph_next = 0;
        }
        if ((j >= hyph_size) || (hyph_next > hyph_size)) {
            goto bad_fmt;
        }
        hyph_link[j] = hyph_next;
        undump(0)(str_ptr)(hyph_word[j]);
        undump(min_halfword)(max_halfword)(hyph_list[j]);
    }

    // j is now the largest occupied location in hyph_word
    incr(j)

    if (j < hyph_prime) {
        j = hyph_prime;
    }

    hyph_next = j

    if (hyph_next >= hyph_size) {
        hyph_next = hyph_prime;
    } else if (hyph_next >= hyph_prime) {
        incr(hyph_next);
    }

    undump_size(0)(trie_size)("trie size")(j)

    init!{
        trie_max = j;
    }

    undump(0)(j)(hyph_start)

    // These first three haven't been allocated yet unless 
    // we're \.{INITEX}; we do that precisely so we don't 
    // allocate more space than necessary.
    if (!trie_trl) {
        trie_trl = xmalloc_array(trie_pointer, j + 1);
    }

    undump_things(trie_trl[0], j + 1)

    if (!trie_tro) {
        trie_tro = xmalloc_array(trie_pointer, j + 1);
    }

    undump_things(trie_tro[0], j + 1)

    if (!trie_trc) {
        trie_trc = xmalloc_array(quarterword, j + 1);
    }

    undump_things(trie_trc[0], j + 1)

    undump_int(max_hyph_char)

    undump_size(0)(trie_op_size)("trie op size")(j)

    init!{
        trie_op_ptr = j;
    }

    // I'm not sure we have such a strict limitation 
    // (64) on these values, so let's leave them 
    // unchecked.
    undump_things(hyf_distance[1], j)

    undump_things(hyf_num[1], j)

    undump_upper_check_things(max_trie_op, hyf_next[1], j)

    init!{
        for (k in 0 to biggest_lang) {
            trie_used[k] = min_quarterword;
        }
    }

    k = biggest_lang + 1

    while (j > 0) {
        undump(0)(k - 1)(k);
        undump(1)(j)(x);
        init!{
            trie_used[k] = qi(x);
        }
        j = j - x;
        op_start[k] = qo(j);
    }

    init!{
        trie_not_ready = false;
    }
⟧
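
The final loop above rebuilds op_start from the pairs (language, number of ops) that the dumping code wrote from the highest language downward. Starting from the total number of ops and subtracting each count in turn yields the starting offset of each language. The following Python sketch shows that reconstruction; the function names and the list-of-pairs representation are assumptions made for illustration.

```python
def dump_trie_used(trie_used):
    """Emit (language, count) pairs, highest language first, skipping zeros."""
    return [(k, trie_used[k])
            for k in range(len(trie_used) - 1, -1, -1)
            if trie_used[k] > 0]


def undump_op_start(pairs, trie_op_ptr, n_langs):
    """Rebuild op_start for the languages that actually have ops."""
    op_start = [0] * n_langs
    j = trie_op_ptr          # ops not yet accounted for
    k = n_langs              # plays the role of biggest_lang + 1
    it = iter(pairs)
    while j > 0:
        new_k, x = next(it)
        # Languages must strictly decrease; counts must fit in what is left.
        if not 0 <= new_k <= k - 1 or not 1 <= x <= j:
            raise ValueError("bad format")
        k = new_k
        j -= x
        op_start[k] = j
    return op_start


trie_used = [3, 0, 2, 5]              # ops per language 0..3
pairs = dump_trie_used(trie_used)
op_start = undump_op_start(pairs, sum(trie_used), len(trie_used))
```

Language 1 has no ops in this example, so its entry is never written and never restored; only the nonempty parts are touched.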

1380. We have already printed a lot of statistics, so we set tracing_stats = 0 to prevent them from appearing again.

⟦1380 Dump a couple more things and the closing check word⟧ = ⟦
    dump_int(interaction)

    dump_int(format_ident)

    dump_int(69069)

    tracing_stats = 0
⟧

1381.

⟦1381 Undump a couple more things and the closing check word⟧ = ⟦
    undump(batch_mode)(error_stop_mode)(interaction)

    if (interaction_option != unspecified_mode) {
        interaction = interaction_option;
    }

    undump(0)(str_ptr)(format_ident)

    undump_int(x)

    if (x != 69069) {
        goto bad_fmt;
    }
⟧
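
The closing check word guards against a format file that is truncated or misaligned: if any earlier item was read with the wrong size, the value found here will almost certainly not be 69069. A Python sketch of the dump and undump of these last three words follows. The list representation, the function names, and the use of None for an unspecified interaction option are assumptions; the codes 0 and 3 are the values that tex.web gives to batch_mode and error_stop_mode.

```python
CHECK_WORD = 69069
BATCH_MODE, ERROR_STOP_MODE = 0, 3  # interaction codes as numbered in tex.web


class BadFormat(Exception):
    """Raised where the original code would go to bad_fmt."""


def dump_trailer(interaction, format_ident):
    return [interaction, format_ident, CHECK_WORD]


def undump_trailer(words, str_ptr, interaction_option=None):
    interaction, format_ident, x = words
    if not BATCH_MODE <= interaction <= ERROR_STOP_MODE:
        raise BadFormat("interaction out of range")
    if interaction_option is not None:
        interaction = interaction_option  # a command-line choice wins
    if not 0 <= format_ident <= str_ptr:
        raise BadFormat("format_ident out of range")
    if x != CHECK_WORD:
        raise BadFormat("closing check word missing")
    return interaction, format_ident


trailer = dump_trailer(3, 42)
plain = undump_trailer(trailer, str_ptr=100)
overridden = undump_trailer(trailer, str_ptr=100, interaction_option=0)

try:
    undump_trailer([3, 42, 0], str_ptr=100)
    accepted = True
except BadFormat:
    accepted = False
```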

1382.

⟦1382 Create the |format_ident|, open the format file, and inform the user that dumping has begun⟧ = ⟦
    selector = new_string

    print(strpool!(" (preloaded format="))

    print(job_name)

    print_char(ord!(" "))

    print_int(year)

    print_char(ord!("."))

    print_int(month)

    print_char(ord!("."))

    print_int(day)

    print_char(ord!(")"))

    if (interaction == batch_mode) {
        selector = log_only;
    } else {
        selector = term_and_log;
    }

    str_room(1)

    format_ident = make_string

    pack_job_name(format_extension)

    while (!w_open_out(fmt_file)) {
        prompt_file_name(
          strpool!("format file name"),
          format_extension,
        );
    }

    print_nl(strpool!("Beginning to dump on file "))

    slow_print(w_make_name_string(fmt_file))

    flush_string

    print_nl(strpool!(""))

    slow_print(format_ident)
⟧
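
The sequence of print calls at the start of this module assembles the identification string in the string pool. The same text can be produced in Python as follows; make_format_ident is a hypothetical helper, and the job name and date are made-up sample values.

```python
def make_format_ident(job_name, year, month, day):
    """Build the text that the module above stores in format_ident."""
    # print_int writes plain decimal numbers, so nothing is zero-padded.
    return f" (preloaded format={job_name} {year}.{month}.{day})"


ident = make_format_ident("plain", 2024, 3, 7)
```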

1383.

⟦1383 Close the format file⟧ = ⟦
    w_close(fmt_file)
⟧

1384. [51] The main program. This is it: the part of TEX that executes all those procedures we have written.

Well—almost. Let’s leave space for a few more routines that we may have forgotten.

⟦1387 Last-minute procedures⟧

1385. We have noted that there are two versions of TEX82. One, called INITEX, has to be run first; it initializes everything from scratch, without reading a format file, and it has the capability of dumping a format file. The other one is called ‘VIRTEX’; it is a “virgin” program that needs to input a format file in order to get started. VIRTEX typically has more memory capacity than INITEX, because it does not need the space consumed by the auxiliary hyphenation tables and the numerous calls on primitive, etc.

The VIRTEX program cannot read a format file instantaneously, of course; the best implementations therefore allow for production versions of TEX that not only avoid the loading routine for Pascal object code, they also have a format file pre-loaded. This is impossible to do if we stick to standard Pascal; but there is a simple way to fool many systems into avoiding the initialization, as follows: (1) We declare a global integer variable called ready_already. The probability is negligible that this variable holds any particular value like 314159 when VIRTEX is first loaded. (2) After we have read in a format file and initialized everything, we set ready_already = 314159. (3) Soon VIRTEX will print ‘*’, waiting for more input; and at this point we interrupt the program and save its core image in some form that the operating system can reload speedily. (4) When that core image is activated, the program starts again at the beginning; but now ready_already == 314159 and all the other global variables have their initial values too. The former chastity has vanished!

In other words, if we allow ourselves to test the condition ready_already == 314159, before ready_already has been assigned a value, we can avoid the lengthy initialization. Dirty tricks rarely pay off so handsomely.

On systems that allow such preloading, the standard program called TeX should be the one that has plain format preloaded, since that agrees with The TEXbook. Other versions, e.g., AmSTeX, should also be provided for commonly used formats.

⟦13 Global variables⟧ += ⟦
    // a sacrifice of purity for economy
    var ready_already: integer;
⟧
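
The four steps described above can be imitated in a few lines. In this Python sketch a dictionary stands in for the program's global variables, copying the dictionary stands in for saving and reloading a core image, and the function start is a hypothetical stand-in for the opening of the main program. Python cannot read genuinely uninitialized memory, so the "garbage" value of a freshly loaded program is simulated by an arbitrary number.

```python
MAGIC = 314159


def start(image):
    """Run the start-up logic against `image`, a dict of global variables."""
    steps = []
    if image.get("ready_already") != MAGIC:
        steps.append("initialize")      # the lengthy part we hope to skip
        image["format_loaded"] = True
        image["ready_already"] = MAGIC
    steps.append("run")
    return steps


fresh = {"ready_already": 271828}  # whatever happened to be in memory
first = start(fresh)               # steps (1) and (2): initialize, then mark

saved = dict(fresh)                # step (3): save the core image
second = start(saved)              # step (4): restart from the saved image
```

The second start finds the magic value already in place and goes straight to work, with the loaded format still present in the restored globals.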

1386. Now this is really it: TEX starts and ends here.

The initial test involving ready_already should be deleted if the Pascal runtime system is smart enough to detect such a “mistake.”

@define const_chk(#) =>
    {
        if (# < paste!(inf, #)) {
            # = paste!(inf, #);
        } else if (# > paste!(sup, #)) {
            # = paste!(sup, #);
        }
    }
// setup_bound_var stuff duplicated in \.{mf.ch}.
@define setup_bound_var(#) =>
    bound_default = #;
    setup_bound_var_end
@define setup_bound_var_end(#) =>
    bound_name = #;
    setup_bound_var_end_end
@define setup_bound_var_end_end(#) =>
    setup_bound_variable(
      addressof(#),
      bound_name,
      bound_default,
    )
function main_body() {
    //  start_here 
    // Bounds that may be set from the configuration file. 
    // We want the user to be able to specify the names with 
    // underscores, but \.{TANGLE} removes underscores, so 
    // we're stuck giving the names twice, once as a string, 
    // once as the identifier. How ugly.
    setup_bound_var(0)("mem_bot")(mem_bot);
    // memory_words for mem in \.{INITEX}
    setup_bound_var(250000)("main_memory")(main_memory);
    // increase high mem in \.{VIRTEX}
    setup_bound_var(0)("extra_mem_top")(extra_mem_top);
    // increase low mem in \.{VIRTEX}
    setup_bound_var(0)("extra_mem_bot")(extra_mem_bot);
    setup_bound_var(200000)("pool_size")(pool_size);
    setup_bound_var(75000)("string_vacancies")(
      string_vacancies,
    );
    // min pool avail after fmt
    setup_bound_var(5000)("pool_free")(pool_free);
    setup_bound_var(15000)("max_strings")(max_strings);
    // the max_strings value doesn't include the 64K 
    // synthetic strings
    max_strings = max_strings + too_big_char;
    setup_bound_var(100)("strings_free")(strings_free);
    setup_bound_var(100000)("font_mem_size")(font_mem_size);
    setup_bound_var(500)("font_max")(font_max);
    // if ssup_trie_size increases, recompile
    setup_bound_var(20000)("trie_size")(trie_size);
    setup_bound_var(659)("hyph_size")(hyph_size);
    setup_bound_var(3000)("buf_size")(buf_size);
    setup_bound_var(50)("nest_size")(nest_size);
    setup_bound_var(15)("max_in_open")(max_in_open);
    setup_bound_var(60)("param_size")(param_size);
    setup_bound_var(4000)("save_size")(save_size);
    setup_bound_var(300)("stack_size")(stack_size);
    setup_bound_var(16384)("dvi_buf_size")(dvi_buf_size);
    setup_bound_var(79)("error_line")(error_line);
    setup_bound_var(50)("half_error_line")(half_error_line);
    setup_bound_var(79)("max_print_line")(max_print_line);
    setup_bound_var(0)("hash_extra")(hash_extra);
    setup_bound_var(10000)("expand_depth")(expand_depth);
    const_chk(mem_bot);
    const_chk(main_memory);
    Init!{
        extra_mem_top = 0;
        extra_mem_bot = 0;
    }
    if (extra_mem_bot > sup_main_memory) {
        extra_mem_bot = sup_main_memory;
    }
    if (extra_mem_top > sup_main_memory) {
        extra_mem_top = sup_main_memory;
    }
    //  mem_top is an index, main_memory a size
    mem_top = mem_bot + main_memory - 1;
    mem_min = mem_bot;
    mem_max = mem_top;
    // Check other constants against their sup and inf.
    const_chk(trie_size);
    const_chk(hyph_size);
    const_chk(buf_size);
    const_chk(nest_size);
    const_chk(max_in_open);
    const_chk(param_size);
    const_chk(save_size);
    const_chk(stack_size);
    const_chk(dvi_buf_size);
    const_chk(pool_size);
    const_chk(string_vacancies);
    const_chk(pool_free);
    const_chk(max_strings);
    const_chk(strings_free);
    const_chk(font_mem_size);
    const_chk(font_max);
    const_chk(hash_extra);
    if (error_line > ssup_error_line) {
        error_line = ssup_error_line;
    }
    // array memory allocation
    buffer = xmalloc_array(UnicodeScalar, buf_size);
    nest = xmalloc_array(list_state_record, nest_size);
    save_stack = xmalloc_array(memory_word, save_size);
    input_stack = xmalloc_array(
      in_state_record,
      stack_size,
    );
    input_file = xmalloc_array(unicode_file, max_in_open);
    line_stack = xmalloc_array(integer, max_in_open);
    eof_seen = xmalloc_array(boolean, max_in_open);
    grp_stack = xmalloc_array(save_pointer, max_in_open);
    if_stack = xmalloc_array(pointer, max_in_open);
    source_filename_stack = xmalloc_array(
      str_number,
      max_in_open,
    );
    full_source_filename_stack = xmalloc_array(
      str_number,
      max_in_open,
    );
    param_stack = xmalloc_array(halfword, param_size);
    dvi_buf = xmalloc_array(eight_bits, dvi_buf_size);
    hyph_word = xmalloc_array(str_number, hyph_size);
    hyph_list = xmalloc_array(halfword, hyph_size);
    hyph_link = xmalloc_array(hyph_pointer, hyph_size);
    Init!{
        yzmem = xmalloc_array(
          memory_word,
          mem_top - mem_bot + 1,
        );
        // Some compilers require mem_bot == 0 
        zmem = yzmem - mem_bot;
        eqtb_top = eqtb_size + hash_extra;
        if (hash_extra == 0) {
            hash_top = undefined_control_sequence;
        } else {
            hash_top = eqtb_top;
        }
        yhash = xmalloc_array(
          two_halves,
          1 + hash_top - hash_offset,
        );
        // Some compilers require hash_offset == 0 
        hash = yhash - hash_offset;
        next(hash_base) = 0;
        text(hash_base) = 0;
        for (hash_used in hash_base + 1 to hash_top) {
            hash[hash_used] = hash[hash_base];
        }
        zeqtb = xmalloc_array(memory_word, eqtb_top);
        eqtb = zeqtb;
        str_start = xmalloc_array(
          pool_pointer,
          max_strings,
        );
        str_pool = xmalloc_array(
          packed_ASCII_code,
          pool_size,
        );
        font_info = xmalloc_array(
          fmemory_word,
          font_mem_size,
        );
    }
    // in case we quit during initialization
    history = fatal_error_stop;
    // open the terminal for output
    t_open_out;
    if (ready_already == 314159) {
        goto start_of_TEX;
    }
    ⟦14 Check the ``constant'' values for consistency⟧
    if (bad > 0) {
        wterm_ln(
          "Ouch---my internal constants have been clobbered!",
          "---case ",
          bad:1,
        );
        goto final_end;
    }
    // set global variables to their starting values
    initialize;
    Init!{
        if (!get_strings_started) {
            goto final_end;
        }
        // call primitive for each primitive
        init_prim;
        init_str_ptr = str_ptr;
        init_pool_ptr = pool_ptr;
        fix_date_and_time;
    }
    ready_already = 314159;
  start_of_TEX:
    ⟦55 Initialize the output routines⟧
    ⟦1391 Get the first line of input and prepare to start⟧
    // ready to go!
    history = spotless;
    ⟦1711 Initialize synctex primitive⟧
    // come to life
    main_control;
    // prepare for death
    final_cleanup;
    close_files_and_terminate;
  final_end:
    do_final_end;
    //  main_body 
}
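
The const_chk macro used throughout main_body clamps a configurable bound into the interval given by its inf_ and sup_ companions, and mem_top is then derived from a size by subtracting one to obtain an index. A Python sketch of both steps follows; the function takes the two limits as explicit arguments, and the limits passed here are placeholders chosen for the example, not the constants defined elsewhere in the program.

```python
def const_chk(value, inf, sup):
    """Clamp `value` into the closed interval [inf, sup]."""
    if value < inf:
        return inf
    if value > sup:
        return sup
    return value


# Placeholder limits, chosen only for this example.
mem_bot = const_chk(0, 0, 1)
main_memory = const_chk(250000, 3000, 256000000)

# mem_top is an index, main_memory a size, hence the "- 1".
mem_top = mem_bot + main_memory - 1
mem_min = mem_bot
mem_max = mem_top

too_small = const_chk(10, 3000, 256000000)
too_large = const_chk(10**12, 3000, 256000000)
```

Clamping silently, rather than reporting an error, lets a configuration file request an extreme value and still obtain a working program.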

1387. Here we do whatever is needed to complete TEX’s job gracefully on the local operating system. The code here might come into play after a fatal error; it must therefore consist entirely of “safe” operations that cannot produce error messages. For example, it would be a mistake to call str_room or make_string at this time, because a call on overflow might lead to an infinite loop. (Actually there’s one way to get error messages, via prepare_mag; but that can’t cause infinite recursion.)

If final_cleanup is bypassed, this program doesn’t bother to close the input files that may still be open.

⟦1387 Last-minute procedures⟧ = ⟦
    function close_files_and_terminate() {
        var
          k: integer; // all-purpose index
        
        ⟦1441 Finish the extensions⟧
        new_line_char = -1;
        stat!{
            if (tracing_stats > 0) {
                ⟦1388 Output statistics about this job⟧
            }
        }
        wake_up_terminal;
        ⟦680 Finish the \.{DVI} file⟧
        ⟦1719 Close {\sl Sync\TeX} file and write status⟧
        if (log_opened) {
            wlog_cr;
            a_close(log_file);
            selector = selector - 2;
            if (selector == term_only) {
                print_nl(
                  strpool!("Transcript written on "),
                );
                print(log_name);
                print_char(ord!("."));
            }
        }
        print_ln;
        if (
            (edit_name_start != 0)
            && (interaction > batch_mode)
        ) {
            call_edit(
              str_pool,
              edit_name_start,
              edit_name_length,
              edit_line,
            );
        }
    }
⟧

1388. The present section goes directly to the log file instead of using print commands, because there’s no need for these strings to take up str_pool memory when a non-stat version of TEX is being used.

⟦1388 Output statistics about this job⟧ = ⟦
    if (log_opened) {
        wlog_ln(" ");
        wlog_ln(
          "Here is how much of TeX's memory",
          " you used:",
        );
        wlog(" ", str_ptr - init_str_ptr:1, " string");
        if (str_ptr != init_str_ptr + 1) {
            wlog("s");
        }
        wlog_ln(" out of ", max_strings - init_str_ptr:1);
        wlog_ln(
          " ",
          pool_ptr - init_pool_ptr:1,
          " string characters out of ",
          pool_size - init_pool_ptr:1,
        );
        wlog_ln(
          " ",
          lo_mem_max - mem_min + mem_end - hi_mem_min + 2:1,
          " words of memory out of ",
          mem_end + 1 - mem_min:1,
        );
        wlog_ln(
          " ",
          cs_count:1,
          " multiletter control sequences out of ",
          hash_size:1,
          "+",
          hash_extra:1,
        );
        wlog(
          " ",
          fmem_ptr:1,
          " words of font info for ",
          font_ptr - font_base:1,
          " font",
        );
        if (font_ptr != font_base + 1) {
            wlog("s");
        }
        wlog_ln(
          ", out of ",
          font_mem_size:1,
          " for ",
          font_max - font_base:1,
        );
        wlog(" ", hyph_count:1, " hyphenation exception");
        if (hyph_count != 1) {
            wlog("s");
        }
        wlog_ln(" out of ", hyph_size:1);
        wlog_ln(
          " ",
          max_in_stack:1,
          "i,",
          max_nest_stack:1,
          "n,",
          max_param_stack:1,
          "p,",
          max_buf_stack + 1:1,
          "b,",
          max_save_stack + 6:1,
          "s stack positions out of ",
          stack_size:1,
          "i,",
          nest_size:1,
          "n,",
          param_size:1,
          "p,",
          buf_size:1,
          "b,",
          save_size:1,
          "s",
        );
    }
⟧
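
Two small computations recur in the statistics above. The count of memory words in use is the sum of two inclusive ranges, the low region from mem_min to lo_mem_max and the high region from hi_mem_min to mem_end, which accounts for the "+ 2" in the expression. The plural "s" is appended whenever the count differs from one. The following Python sketch spells out both; the helper names are hypothetical and the sample values are made up.

```python
def words_in_use(mem_min, lo_mem_max, hi_mem_min, mem_end):
    """Count the words in the two occupied regions of mem."""
    low = lo_mem_max - mem_min + 1    # inclusive range mem_min..lo_mem_max
    high = mem_end - hi_mem_min + 1   # inclusive range hi_mem_min..mem_end
    return low + high


def plural(count, noun):
    """Format a count the way the log does: an 's' unless count is 1."""
    return f"{count} {noun}" + ("" if count == 1 else "s")


used = words_in_use(mem_min=0, lo_mem_max=99, hi_mem_min=900, mem_end=999)
one = plural(1, "hyphenation exception")
none = plural(0, "hyphenation exception")
```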

1389. We get to the final_cleanup routine when \end or \dump has been scanned and its_all_over.

⟦1387 Last-minute procedures⟧ += ⟦
    function final_cleanup() {
        label exit;
        var
          c: small_number; // 0 for \.{\\end}, 1 for 
          // \.{\\dump}
        
        c = cur_chr;
        if (c != 1) {
            new_line_char = -1;
        }
        if (job_name == 0) {
            open_log_file;
        }
        while (input_ptr > 0) {
            if (state == token_list) {
                end_token_list;
            } else {
                end_file_reading;
            }
        }
        while (open_parens > 0) {
            print(strpool!(" )"));
            decr(open_parens);
        }
        if (cur_level > level_one) {
            print_nl(ord!("("));
            print_esc(strpool!("end occurred "));
            print(strpool!("inside a group at level "));
            print_int(cur_level - level_one);
            print_char(ord!(")"));
            if (eTeX_ex) {
                show_save_groups;
            }
        }
        while (cond_ptr != null) {
            print_nl(ord!("("));
            print_esc(strpool!("end occurred "));
            print(strpool!("when "));
            print_cmd_chr(if_test, cur_if);
            if (if_line != 0) {
                print(strpool!(" on line "));
                print_int(if_line);
            }
            print(strpool!(" was incomplete)"));
            if_line = if_line_field(cond_ptr);
            cur_if = subtype(cond_ptr);
            temp_ptr = cond_ptr;
            cond_ptr = link(cond_ptr);
            free_node(temp_ptr, if_node_size);
        }
        if (history != spotless) {
            if ((
                (history == warning_issued)
                || (interaction < error_stop_mode)
            )) {
                if (selector == term_and_log) {
                    selector = term_only;
                    print_nl(
                      strpool!("(see the transcript file for additional information)"),
                    );
                    selector = term_and_log;
                }
            }
        }
        if (c == 1) {
            Init!{
                for (c in top_mark_code to (
                  split_bot_mark_code
                )) {
                    if (cur_mark[c] != null) {
                        delete_token_ref(cur_mark[c]);
                    }
                }
                if (sa_mark != null) {
                    if (do_marks(destroy_marks, 0, sa_mark)) {
                        sa_mark = null;
                    }
                }
                for (c in last_box_code to vsplit_code) {
                    flush_node_list(disc_ptr[c]);
                }
                if (last_glue != max_halfword) {
                    delete_glue_ref(last_glue);
                }
                store_fmt_file;
                return;
            }
            print_nl(
              strpool!("(\\dump is performed only by INITEX)"),
            );
            return;
        }
      exit:
    }
⟧

1390.

⟦1387 Last-minute procedures⟧ += ⟦
    init!{
        // initialize all the primitives
        function init_prim() {
            no_new_control_sequence = false;
            first = 0;
            ⟦252 Put each of \TeX's primitives into the hash table⟧
            no_new_control_sequence = true;
        }
    }
⟧

1391. When we begin the following code, TEX’s tables may still contain garbage; the strings might not even be present. Thus we must proceed cautiously to get bootstrapped in.

But when we finish this part of the program, TEX is ready to call on the main_control routine to do its work.

⟦1391 Get the first line of input and prepare to start⟧ = ⟦
    {
        ⟦361 Initialize the input routines⟧
        ⟦1451 Enable \eTeX, if requested⟧
        if (
            (format_ident == 0)
            || (buffer[loc] == ord!("&")) || dump_line
        ) {
            if (format_ident != 0) {
                // erase preloaded format
                initialize;
            }
            if (!open_fmt_file) {
                goto final_end;
            }
            if (!load_fmt_file) {
                w_close(fmt_file);
                goto final_end;
            }
            w_close(fmt_file);
            eqtb = zeqtb;
            while (
                (loc < limit)
                && (buffer[loc] == ord!(" "))
            ) {
                incr(loc);
            }
        }
        if (eTeX_ex) {
            wterm_ln("entering extended mode");
        }
        if (end_line_char_inactive) {
            decr(limit);
        } else {
            buffer[limit] = end_line_char;
        }
        if (mltex_enabled_p) {
            wterm_ln("MLTeX v2.2 enabled");
        }
        fix_date_and_time;
        init!{
            if (trie_not_ready) {
                // initex without format loaded
                trie_trl = xmalloc_array(
                  trie_pointer,
                  trie_size,
                );
                trie_tro = xmalloc_array(
                  trie_pointer,
                  trie_size,
                );
                trie_trc = xmalloc_array(
                  quarterword,
                  trie_size,
                );
                trie_c = xmalloc_array(
                  packed_ASCII_code,
                  trie_size,
                );
                trie_o = xmalloc_array(
                  trie_opcode,
                  trie_size,
                );
                trie_l = xmalloc_array(
                  trie_pointer,
                  trie_size,
                );
                trie_r = xmalloc_array(
                  trie_pointer,
                  trie_size,
                );
                trie_hash = xmalloc_array(
                  trie_pointer,
                  trie_size,
                );
                trie_taken = xmalloc_array(
                  boolean,
                  trie_size,
                );
                trie_root = 0;
                trie_c[0] = si(0);
                trie_ptr = 0;
                hyph_root = 0;
                hyph_start = 0;
                // Allocate and initialize font arrays
                font_mapping = xmalloc_array(
                  void_pointer,
                  font_max,
                );
                font_layout_engine = xmalloc_array(
                  void_pointer,
                  font_max,
                );
                font_flags = xmalloc_array(char, font_max);
                font_letter_space = xmalloc_array(
                  scaled,
                  font_max,
                );
                font_check = xmalloc_array(
                  four_quarters,
                  font_max,
                );
                font_size = xmalloc_array(scaled, font_max);
                font_dsize = xmalloc_array(
                  scaled,
                  font_max,
                );
                font_params = xmalloc_array(
                  font_index,
                  font_max,
                );
                font_name = xmalloc_array(
                  str_number,
                  font_max,
                );
                font_area = xmalloc_array(
                  str_number,
                  font_max,
                );
                font_bc = xmalloc_array(
                  UTF16_code,
                  font_max,
                );
                font_ec = xmalloc_array(
                  UTF16_code,
                  font_max,
                );
                font_glue = xmalloc_array(
                  halfword,
                  font_max,
                );
                hyphen_char = xmalloc_array(
                  integer,
                  font_max,
                );
                skew_char = xmalloc_array(
                  integer,
                  font_max,
                );
                bchar_label = xmalloc_array(
                  font_index,
                  font_max,
                );
                font_bchar = xmalloc_array(
                  nine_bits,
                  font_max,
                );
                font_false_bchar = xmalloc_array(
                  nine_bits,
                  font_max,
                );
                char_base = xmalloc_array(
                  integer,
                  font_max,
                );
                width_base = xmalloc_array(
                  integer,
                  font_max,
                );
                height_base = xmalloc_array(
                  integer,
                  font_max,
                );
                depth_base = xmalloc_array(
                  integer,
                  font_max,
                );
                italic_base = xmalloc_array(
                  integer,
                  font_max,
                );
                lig_kern_base = xmalloc_array(
                  integer,
                  font_max,
                );
                kern_base = xmalloc_array(
                  integer,
                  font_max,
                );
                exten_base = xmalloc_array(
                  integer,
                  font_max,
                );
                param_base = xmalloc_array(
                  integer,
                  font_max,
                );
                font_ptr = null_font;
                fmem_ptr = 7;
                font_name[null_font] = strpool!("nullfont");
                font_area[null_font] = strpool!("");
                hyphen_char[null_font] = ord!("-");
                skew_char[null_font] = -1;
                bchar_label[null_font] = non_address;
                font_bchar[null_font] = non_char;
                font_false_bchar[null_font] = non_char;
                font_bc[null_font] = 1;
                font_ec[null_font] = 0;
                font_size[null_font] = 0;
                font_dsize[null_font] = 0;
                char_base[null_font] = 0;
                width_base[null_font] = 0;
                height_base[null_font] = 0;
                depth_base[null_font] = 0;
                italic_base[null_font] = 0;
                lig_kern_base[null_font] = 0;
                kern_base[null_font] = 0;
                exten_base[null_font] = 0;
                font_glue[null_font] = null;
                font_params[null_font] = 7;
                font_mapping[null_font] = 0;
                param_base[null_font] = -1;
                for (font_k in 0 to 6) {
                    font_info[font_k].sc = 0;
                }
            }
        }
        font_used = xmalloc_array(boolean, font_max);
        for (font_k in font_base to font_max) {
            font_used[font_k] = false;
        }
        random_seed = 
            (microseconds * 1000)
            + (epochseconds % 1000000)
        ;
        init_randoms(random_seed);
        ⟦813 Compute the magic offset⟧
        ⟦79 Initialize the print |selector| based on |interaction|⟧
        if (
            (loc < limit)
            && (cat_code(buffer[loc]) != escape)
        ) {
            // \.{\\input} assumed
            start_input;
        }
    }
⟧

1392. [52] Debugging. Once TEX is working, you should be able to diagnose most errors with the \show commands and other diagnostic features. But for the initial stages of debugging, and for the revelation of really deep mysteries, you can compile TEX with a few more aids, including the Pascal runtime checks and its debugger. An additional routine called debug_help will also come into play when you type ‘D’ after an error message; debug_help also occurs just before a fatal error causes TEX to succumb.

The interface to debug_help is primitive, but it is good enough when used with a Pascal debugger that allows you to set breakpoints and to read variables and change their values. After getting the prompt ‘debug #’, you type either a negative number (this exits debug_help), or zero (this goes to a location where you can set a breakpoint, thereby entering into dialog with the Pascal debugger), or a positive number m followed by an argument n. The meaning of m and n will be clear from the program below. (If m == 13, there is an additional argument, l.)

// place where a breakpoint is desirable
@define breakpoint => 888
⟦1387 Last-minute procedures⟧ += ⟦
    debug!{
        // routine to display various things
        function debug_help() {
            label breakpoint, exit;
            var k, l, m, n: integer;
            
            clear_terminal;
            loop {
                wake_up_terminal;
                print_nl(strpool!("debug # (-1 to exit):"));
                update_terminal;
                read(term_in, m);
                if (m < 0) {
                    return;
                } else if (m == 0) {
                    // do something to cause a core dump
                    dump_core;
                } else {
                    read(term_in, n);
                    case m {
                      ⟦1393 Numbered cases for |debug_help|⟧
                      othercases:
                        print(ord!("?"));
                    }
                }
            }
          exit:
        }
    }
⟧

1393.

⟦1393 Numbered cases for |debug_help|⟧ = ⟦
    1:

    print_word(mem[n]) // display mem[n] in all forms

    2:

    print_int(info(n))

    3:

    print_int(link(n))

    4:

    print_word(eqtb[n])

    5:

    {
        print_scaled(font_info[n].sc);
        print_char(ord!(" "));
        print_int(font_info[n].qqqq.b0);
        print_char(ord!(":"));
        print_int(font_info[n].qqqq.b1);
        print_char(ord!(":"));
        print_int(font_info[n].qqqq.b2);
        print_char(ord!(":"));
        print_int(font_info[n].qqqq.b3);
    }

    6:

    print_word(save_stack[n])

    7:

    // show a box, abbreviated by show_box_depth and 
    // show_box_breadth 
    show_box(n)

    8:

    {
        breadth_max = 10000;
        depth_threshold = pool_size - pool_ptr - 10;
        // show a box in its entirety
        show_node_list(n);
    }

    9:

    show_token_list(n, null, 1000)

    10:

    slow_print(n)

    11:

    // check wellformedness; print new busy locations if n > 0
    check_mem(n > 0)

    12:

    search_mem(n) // look for pointers to n 

    13:

    {
        read(term_in, l);
        print_cmd_chr(n, l);
    }

    14:

    for (k in 0 to n) {
        print(buffer[k]);
    }

    15:

    {
        font_in_short_display = null_font;
        short_display(n);
    }

    16:

    panicking = !panicking
⟧
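
The dialog conducted by debug_help can be imitated outside TEX. What follows is only a rough Python sketch under stated assumptions, not the program's own code: the integers come from a list rather than from term_in, and the names debug_dialog and handlers, as well as the returned log, are hypothetical. Only the control flow follows the description above (a negative number exits, zero marks the breakpoint location, a positive m takes an argument n, an unknown m is answered with a question mark). Note that the code above calls dump_core where the sketch merely records "breakpoint".

```python
from typing import Callable, Dict, Iterator, List


def debug_dialog(numbers: List[int],
                 handlers: Dict[int, Callable[[int], str]]) -> List[str]:
    """Replay a debug_help-style dialog over pre-typed integers.

    `numbers` stands in for what the user would type at the prompt;
    `handlers` maps a positive command number m to a function of n.
    Both are hypothetical stand-ins for term_in and the numbered cases.
    """
    log: List[str] = []
    stream: Iterator[int] = iter(numbers)
    for m in stream:
        if m < 0:            # a negative number exits the dialog
            break
        if m == 0:           # zero: the place where a breakpoint would go
            log.append("breakpoint")
            continue
        n = next(stream)     # every positive m is followed by an argument n
        handler = handlers.get(m)
        # unknown command numbers are answered with a question mark
        log.append(handler(n) if handler else "?")
    return log


# Two illustrative handlers; the real cases print parts of TeX's memory.
handlers = {
    2: lambda n: f"info({n})",
    3: lambda n: f"link({n})",
}
print(debug_dialog([2, 7, 99, 1, 0, -1], handlers))
```

A table of handlers keeps the loop itself independent of the individual commands, much as the numbered cases are kept in a module of their own.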

1394. [53] Extensions. The program above includes a bunch of “hooks” that allow further capabilities to be added without upsetting TEX’s basic structure. Most of these hooks are concerned with “whatsit” nodes, which are intended to be used for special purposes; whenever a new extension to TEX involves a new kind of whatsit node, a corresponding change needs to be made to the routines below that deal with such nodes, but it will usually be unnecessary to make many changes to the other parts of this program.

In order to demonstrate how extensions can be made, we shall treat ‘\write’, ‘\openout’, ‘\closeout’, ‘\immediate’, ‘\special’, and ‘\setlanguage’ as if they were extensions. These commands are actually primitives of TEX, and they should appear in all implementations of the system; but let’s try to imagine that they aren’t. Then the program below illustrates how a person could add them.

Sometimes, of course, an extension will require changes to TEX itself; no system of hooks could be complete enough for all conceivable extensions. The features associated with ‘\write’ are almost all confined to the following paragraphs, but there are small parts of the print_ln and print_char procedures that were introduced specifically to \write characters. Furthermore, one of the token lists recognized by the scanner is a write_text; and there are a few other miscellaneous places where we have already provided for some aspect of \write. The goal of a TEX extender should be to minimize alterations to the standard parts of the program, and to avoid them completely if possible. He or she should also be quite sure that there’s no easy way to accomplish the desired goals with the standard features that TEX already has. “Think thrice before extending,” because that may save a lot of work, and it will also keep incompatible extensions of TEX from proliferating.

1395. First let’s consider the format of whatsit nodes that are used to represent the data associated with \write and its relatives. Recall that a whatsit has type == whatsit_node, and the subtype is supposed to distinguish different kinds of whatsits. Each node occupies two or more words; the exact number is immaterial, as long as it is readily determined from the subtype or other data.

We shall introduce five subtype values here, corresponding to the control sequences \openout, \write, \closeout, \special, and \setlanguage. The second word of I/O whatsits has a write_stream field that identifies the write-stream number (0 to 15, or 16 for out-of-range and positive, or 17 for out-of-range and negative). In the case of \write and \special, there is also a field that points to the reference count of a token list that should be sent. In the case of \openout, we need three words and three auxiliary subfields to hold the string numbers for name, area, and extension.

// number of words in a write/whatsit node
@define write_node_size => 2
// number of words in an open/whatsit node
@define open_node_size => 3
//  subtype in whatsits that represent files to 
// \.{\\openout}
@define open_node => 0
//  subtype in whatsits that represent things to \.{\\write}
@define write_node => 1
//  subtype in whatsits that represent streams to 
// \.{\\closeout}
@define close_node => 2
//  subtype in whatsits that represent \.{\\special} things
@define special_node => 3
//  subtype in whatsits that change the current language
@define language_node => 4
// language number, in the range 0 .. 255 
@define what_lang(#) => link(# + 1)
// minimum left fragment, in the range 1 .. 63 
@define what_lhm(#) => type(# + 1)
// minimum right fragment, in the range 1 .. 63 
@define what_rhm(#) => subtype(# + 1)
// reference count of token list to write
@define write_tokens(#) => link(# + 1)
// stream number (0 to 17)
@define write_stream(#) => info(# + 1)
// string number of file name to open
@define open_name(#) => link(# + 1)
// string number of file area for open_name 
@define open_area(#) => info(# + 2)
// string number of file extension for open_name 
@define open_ext(#) => link(# + 2)
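
The field definitions above overlay several meanings on the same halfword slots. As an illustration only, here is a small Python model of a three-word \openout node. The Word class, the list mem, and make_open_node are assumptions of this sketch; in particular the real memory word packs type and subtype differently, so the separate subtype slot is a simplification.

```python
from dataclasses import dataclass
from typing import List

OPEN_NODE = 0        # subtype code for openout whatsits, as defined above
OPEN_NODE_SIZE = 3   # number of words in an open/whatsit node


@dataclass
class Word:
    """A simplified memory word: two halfword fields plus a subtype slot."""
    info: int = 0
    link: int = 0
    subtype: int = 0


def write_stream(mem: List[Word], p: int) -> int:
    return mem[p + 1].info      # stream number lives in info(p + 1)


def write_tokens(mem: List[Word], p: int) -> int:
    return mem[p + 1].link      # token list pointer lives in link(p + 1)


def open_name(mem: List[Word], p: int) -> int:
    return mem[p + 1].link      # same slot as write_tokens, other meaning


def open_area(mem: List[Word], p: int) -> int:
    return mem[p + 2].info


def open_ext(mem: List[Word], p: int) -> int:
    return mem[p + 2].link


def make_open_node(mem: List[Word], stream: int,
                   name: int, area: int, ext: int) -> int:
    """Append a three-word openout node to `mem`; return its address."""
    p = len(mem)
    mem.extend(Word() for _ in range(OPEN_NODE_SIZE))
    mem[p].subtype = OPEN_NODE
    mem[p + 1].info = stream
    mem[p + 1].link = name
    mem[p + 2].info = area
    mem[p + 2].link = ext
    return p


mem: List[Word] = []
p = make_open_node(mem, stream=3, name=101, area=102, ext=103)
print(write_stream(mem, p), open_name(mem, p),
      open_area(mem, p), open_ext(mem, p))
```

Because write_tokens and open_name read the same slot, the subtype must always be consulted before the second word is interpreted.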

1396. The sixteen possible \write streams are represented by the write_file array. The jth file is open if and only if write_open[j] == true. The last two streams are special; write_open[16] represents a stream number greater than 15, while write_open[17] represents a negative stream number, and both of these variables are always false.

⟦13 Global variables⟧ += ⟦
    var write_file: array [0 .. 15] of alpha_file;

    var write_open: array [0 .. 17] of boolean;
⟧

1397.

⟦23 Set initial values of key variables⟧ += ⟦
    for (k in 0 to 17) {
        write_open[k] = false;
    }
⟧

1398. Extensions might introduce new command codes; but it’s best to use extension with a modifier, whenever possible, so that main_control stays the same.

// command modifier for \.{\\immediate}
@define immediate_code => 4
// command modifier for \.{\\setlanguage}
@define set_language_code => 5
@define pdftex_first_extension_code => 6
@define pdf_save_pos_node => pdftex_first_extension_code + 15
@define reset_timer_code => pdftex_first_extension_code + 25
@define set_random_seed_code =>
    pdftex_first_extension_code + 27
// command modifier for \.{\\XeTeXpicfile}, skipping codes 
// pdfTeX might use
@define pic_file_code => 41
// command modifier for \.{\\XeTeXpdffile}
@define pdf_file_code => 42
// command modifier for \.{\\XeTeXglyph}
@define glyph_code => 43
@define XeTeX_input_encoding_extension_code => 44
@define XeTeX_default_encoding_extension_code => 45
@define XeTeX_linebreak_locale_extension_code => 46
⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(strpool!("openout"), extension, open_node)

    primitive(strpool!("write"), extension, write_node)

    write_loc = cur_val

    primitive(strpool!("closeout"), extension, close_node)

    primitive(strpool!("special"), extension, special_node)

    text(frozen_special) = strpool!("special")

    eqtb[frozen_special] = eqtb[cur_val]

    primitive(
      strpool!("immediate"),
      extension,
      immediate_code,
    )

    primitive(
      strpool!("setlanguage"),
      extension,
      set_language_code,
    )

    primitive(
      strpool!("resettimer"),
      extension,
      reset_timer_code,
    )

    primitive(
      strpool!("setrandomseed"),
      extension,
      set_random_seed_code,
    )
⟧

1399. The \XeTeXpicfile and \XeTeXpdffile primitives are only defined in extended mode.

⟦1399 Generate all \eTeX\ primitives⟧ = ⟦
    primitive(
      strpool!("XeTeXpicfile"),
      extension,
      pic_file_code,
    )

    primitive(
      strpool!("XeTeXpdffile"),
      extension,
      pdf_file_code,
    )

    primitive(strpool!("XeTeXglyph"), extension, glyph_code)

    primitive(
      strpool!("XeTeXlinebreaklocale"),
      extension,
      XeTeX_linebreak_locale_extension_code,
    )

    primitive(
      strpool!("XeTeXinterchartoks"),
      assign_toks,
      XeTeX_inter_char_loc,
    )

    primitive(
      strpool!("pdfsavepos"),
      extension,
      pdf_save_pos_node,
    )
⟧

1400. The variable write_loc just introduced is used to provide an appropriate error message in case of “runaway” write texts.

⟦13 Global variables⟧ += ⟦
    //  eqtb address of \.{\\write}
    var write_loc: pointer;
⟧

1401.

⟦253 Cases of |print_cmd_chr| for symbolic printing of primitives⟧ += ⟦
    extension:

    case chr_code {
      open_node:
        print_esc(strpool!("openout"));
      write_node:
        print_esc(strpool!("write"));
      close_node:
        print_esc(strpool!("closeout"));
      special_node:
        print_esc(strpool!("special"));
      immediate_code:
        print_esc(strpool!("immediate"));
      set_language_code:
        print_esc(strpool!("setlanguage"));
      pdf_save_pos_node:
        print_esc(strpool!("pdfsavepos"));
      reset_timer_code:
        print_esc(strpool!("resettimer"));
      set_random_seed_code:
        print_esc(strpool!("setrandomseed"));
      pic_file_code:
        print_esc(strpool!("XeTeXpicfile"));
      pdf_file_code:
        print_esc(strpool!("XeTeXpdffile"));
      glyph_code:
        print_esc(strpool!("XeTeXglyph"));
      XeTeX_linebreak_locale_extension_code:
        print_esc(strpool!("XeTeXlinebreaklocale"));
      XeTeX_input_encoding_extension_code:
        print_esc(strpool!("XeTeXinputencoding"));
      XeTeX_default_encoding_extension_code:
        print_esc(strpool!("XeTeXdefaultencoding"));
      othercases:
        print(strpool!("[unknown extension!]"));
    }
⟧
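
To see how a single command code with modifiers covers all of these primitives, the modifier values can be tabulated. The following Python sketch copies the numeric definitions and primitive names from the sections above; the dictionary EXTENSION_NAMES and the function extension_name are hypothetical names introduced here, and the lookup only imitates the extension case of print_cmd_chr.

```python
# Command modifiers for the `extension` command code, with the values
# given in the definitions above.
OPEN_NODE, WRITE_NODE, CLOSE_NODE, SPECIAL_NODE = 0, 1, 2, 3
IMMEDIATE_CODE = 4
SET_LANGUAGE_CODE = 5
PDFTEX_FIRST_EXTENSION_CODE = 6
PDF_SAVE_POS_NODE = PDFTEX_FIRST_EXTENSION_CODE + 15
RESET_TIMER_CODE = PDFTEX_FIRST_EXTENSION_CODE + 25
SET_RANDOM_SEED_CODE = PDFTEX_FIRST_EXTENSION_CODE + 27
PIC_FILE_CODE, PDF_FILE_CODE, GLYPH_CODE = 41, 42, 43
XETEX_INPUT_ENCODING_EXTENSION_CODE = 44
XETEX_DEFAULT_ENCODING_EXTENSION_CODE = 45
XETEX_LINEBREAK_LOCALE_EXTENSION_CODE = 46

# Hypothetical lookup table mirroring the symbolic printing of primitives.
EXTENSION_NAMES = {
    OPEN_NODE: "openout",
    WRITE_NODE: "write",
    CLOSE_NODE: "closeout",
    SPECIAL_NODE: "special",
    IMMEDIATE_CODE: "immediate",
    SET_LANGUAGE_CODE: "setlanguage",
    PDF_SAVE_POS_NODE: "pdfsavepos",
    RESET_TIMER_CODE: "resettimer",
    SET_RANDOM_SEED_CODE: "setrandomseed",
    PIC_FILE_CODE: "XeTeXpicfile",
    PDF_FILE_CODE: "XeTeXpdffile",
    GLYPH_CODE: "XeTeXglyph",
    XETEX_INPUT_ENCODING_EXTENSION_CODE: "XeTeXinputencoding",
    XETEX_DEFAULT_ENCODING_EXTENSION_CODE: "XeTeXdefaultencoding",
    XETEX_LINEBREAK_LOCALE_EXTENSION_CODE: "XeTeXlinebreaklocale",
}


def extension_name(chr_code: int) -> str:
    """Return the primitive name for a modifier, or the fallback text."""
    return EXTENSION_NAMES.get(chr_code, "[unknown extension!]")


print(PDF_SAVE_POS_NODE, RESET_TIMER_CODE, SET_RANDOM_SEED_CODE)
print(extension_name(WRITE_NODE), extension_name(99))
```

Since the dictionary has fifteen entries, all fifteen modifier values are distinct, which is what a dispatch on the modifier requires.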

1402. When an extension command occurs in main_control, in any mode, the do_extension routine is called.

⟦1402 Cases of |main_control| that are for extensions to \TeX⟧ = ⟦
    any_mode(extension):
      do_extension;
⟧

1403.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    ⟦1404 Declare procedures needed in |do_extension|⟧

    function do_extension() {
        var
          i, j, k: integer, // all-purpose integers
          p: pointer; // all-purpose pointers
        
        case cur_chr {
          open_node:
            ⟦1406 Implement \.{\\openout}⟧
          write_node:
            ⟦1407 Implement \.{\\write}⟧
          close_node:
            ⟦1408 Implement \.{\\closeout}⟧
          special_node:
            ⟦1409 Implement \.{\\special}⟧
          immediate_code:
            ⟦1438 Implement \.{\\immediate}⟧
          set_language_code:
            ⟦1440 Implement \.{\\setlanguage}⟧
          pdf_save_pos_node:
            ⟦1450 Implement \.{\\pdfsavepos}⟧
          reset_timer_code:
            ⟦1414 Implement \.{\\resettimer}⟧
          set_random_seed_code:
            ⟦1413 Implement \.{\\setrandomseed}⟧
          pic_file_code:
            ⟦1442 Implement \.{\\XeTeXpicfile}⟧
          pdf_file_code:
            ⟦1443 Implement \.{\\XeTeXpdffile}⟧
          glyph_code:
            ⟦1444 Implement \.{\\XeTeXglyph}⟧
          XeTeX_input_encoding_extension_code:
            ⟦1446 Implement \.{\\XeTeXinputencoding}⟧
          XeTeX_default_encoding_extension_code:
            ⟦1447 Implement \.{\\XeTeXdefaultencoding}⟧
          XeTeX_linebreak_locale_extension_code:
            ⟦1448 Implement \.{\\XeTeXlinebreaklocale}⟧
          othercases:
            confusion(strpool!("ext1"));
        }
    }
⟧

1404. Here is a subroutine that creates a whatsit node having a given subtype and a given number of words. It initializes only the first word of the whatsit, and appends it to the current list.

⟦1404 Declare procedures needed in |do_extension|⟧ = ⟦
    function new_whatsit(s: small_number, w: small_number) {
        var
          p: pointer; // the new node
        
        p = get_node(w);
        type(p) = whatsit_node;
        subtype(p) = s;
        link(tail) = p;
        tail = p;
    }
⟧
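
The two assignments link(tail) = p and tail = p are the usual way of appending to a singly linked list whose last node is remembered. A minimal Python sketch of the same idea follows. The classes Node and CurrentList are hypothetical, the dummy head node is an assumption made so that tail always exists, and the method returns the new node purely for convenience, whereas the routine above returns nothing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    type: str
    subtype: int
    words: int                      # node size in memory words
    link: Optional["Node"] = None   # next node in the list


class CurrentList:
    """A list with a dummy head, so that `tail` always exists."""

    def __init__(self) -> None:
        self.head = Node("head", 0, 1)
        self.tail = self.head

    def new_whatsit(self, subtype: int, words: int) -> Node:
        # Only the first word (type and subtype) is initialized here;
        # the caller fills in the remaining fields afterwards.
        p = Node("whatsit", subtype, words)
        self.tail.link = p          # link(tail) = p
        self.tail = p               # tail = p
        return p


lst = CurrentList()
a = lst.new_whatsit(subtype=1, words=2)   # a write-like node
b = lst.new_whatsit(subtype=3, words=2)   # a special-like node
print(lst.head.link is a, a.link is b, lst.tail is b)
```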

1405. The next subroutine uses cur_chr to decide what sort of whatsit is involved, and also inserts a write_stream number.

⟦1404 Declare procedures needed in |do_extension|⟧ += ⟦
    function new_write_whatsit(w: small_number) {
        new_whatsit(cur_chr, w);
        if (w != write_node_size) {
            scan_four_bit_int;
        } else {
            scan_int;
            if (cur_val < 0) {
                cur_val = 17;
            } else if ((cur_val > 15) && (cur_val != 18)) {
                cur_val = 16;
            }
        }
        write_stream(tail) = cur_val;
    }
⟧
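
The mapping of scanned integers to stream numbers in the else branch can be checked in isolation. The Python function below is a sketch of that branch only; its name is hypothetical. The value 18 is passed through unchanged exactly as in the code above. The reason is not stated in this section; in Web2C-based systems \write18 is conventionally reserved for shell escape, which is presumably why.

```python
def normalize_write_stream(value: int) -> int:
    """Map an arbitrary integer to a stream number as in the code above."""
    if value < 0:
        return 17                    # out of range and negative
    if value > 15 and value != 18:
        return 16                    # out of range and positive
    return value                     # 0..15, or the special value 18


for v in (-3, 0, 15, 16, 18, 1000):
    print(v, "->", normalize_write_stream(v))
```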

1406.

⟦1406 Implement \.{\\openout}⟧ = ⟦
    {
        new_write_whatsit(open_node_size);
        scan_optional_equals;
        scan_file_name;
        open_name(tail) = cur_name;
        open_area(tail) = cur_area;
        open_ext(tail) = cur_ext;
    }
⟧

1407. When ‘\write 12{...}’ appears, we scan the token list ‘{...}’ without expanding its macros; the macros will be expanded later when this token list is rescanned.

⟦1407 Implement \.{\\write}⟧ = ⟦
    {
        k = cur_cs;
        new_write_whatsit(write_node_size);
        cur_cs = k;
        p = scan_toks(false, false);
        write_tokens(tail) = def_ref;
    }
⟧

1408.

⟦1408 Implement \.{\\closeout}⟧ = ⟦
    {
        new_write_whatsit(write_node_size);
        write_tokens(tail) = null;
    }
⟧

1409. When ‘\special{...}’ appears, we expand the macros in the token list as in \xdef and \mark.

⟦1409 Implement \.{\\special}⟧ = ⟦
    {
        new_whatsit(special_node, write_node_size);
        write_stream(tail) = null;
        p = scan_toks(false, true);
        write_tokens(tail) = def_ref;
    }
⟧

1410.

@define call_func(#) =>
    {
        if (# != 0) {
            do_nothing;
        }
    }
@define flushable(#) => (# == str_ptr - 1)
@define max_integer => 0x7fffffff // $2^{31}-1$
// flush a string if possible
function flush_str(s: str_number) {
    if (flushable(s)) {
        flush_string;
    }
}

// return a string from tokens list
function tokens_to_string(p: pointer): str_number {
    if (selector == new_string) {
        pdf_error(
          strpool!("tokens"),
          strpool!("tokens_to_string() called while selector = new_string"),
        );
    }
    old_setting = selector;
    selector = new_string;
    show_token_list(link(p), null, pool_size - pool_ptr);
    selector = old_setting;
    tokens_to_string = make_string;
}

function scan_pdf_ext_toks() {
    // like \.{\\special}
    call_func(scan_toks(false, true));
}

// to implement \.{\\strcmp}
function compare_strings() {
    label done;
    var
      s1, s2: str_number,
      i1, i2, j1, j2: pool_pointer,
      save_cur_cs: pointer;
    
    save_cur_cs = cur_cs;
    call_func(scan_toks(false, true));
    s1 = tokens_to_string(def_ref);
    delete_token_ref(def_ref);
    cur_cs = save_cur_cs;
    call_func(scan_toks(false, true));
    s2 = tokens_to_string(def_ref);
    delete_token_ref(def_ref);
    i1 = str_start_macro(s1);
    j1 = str_start_macro(s1 + 1);
    i2 = str_start_macro(s2);
    j2 = str_start_macro(s2 + 1);
    while ((i1 < j1) && (i2 < j2)) {
        if (str_pool[i1] < str_pool[i2]) {
            cur_val = -1;
            goto done;
        }
        if (str_pool[i1] > str_pool[i2]) {
            cur_val = 1;
            goto done;
        }
        incr(i1);
        incr(i2);
    }
    if ((i1 == j1) && (i2 == j2)) {
        cur_val = 0;
    } else if (i1 < j1) {
        cur_val = 1;
    } else {
        cur_val = -1;
    }
  done:
    flush_str(s2);
    flush_str(s1);
    cur_val_level = int_val;
}
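
The comparison loop in compare_strings orders two strings by their code units and, when one is a prefix of the other, treats the shorter one as smaller. The following Python sketch restates just that loop; the function name is hypothetical, and plain sequences of integers stand in for the slices of str_pool.

```python
from typing import Sequence


def compare_code_units(a: Sequence[int], b: Sequence[int]) -> int:
    """Return -1, 0, or 1 in the manner of the loop above."""
    i1, i2 = 0, 0
    while i1 < len(a) and i2 < len(b):
        if a[i1] < b[i2]:
            return -1
        if a[i1] > b[i2]:
            return 1
        i1 += 1
        i2 += 1
    if i1 == len(a) and i2 == len(b):
        return 0                     # both strings exhausted together
    return 1 if i1 < len(a) else -1  # the longer string is the greater


print(compare_code_units(b"abc", b"abd"))
print(compare_code_units(b"abcd", b"abc"))
```

The sketch leaves out the bookkeeping around the loop (saving cur_cs, flushing the two temporary strings in reverse order of creation), which matters to the string pool but not to the result.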

1411.

⟦1411 Declare procedures that need to be declared forward for \pdfTeX⟧ = ⟦
    function get_microinterval(): integer {
        var
          s, m: integer; // seconds and microseconds
        
        seconds_and_micros(s, m);
        if ((s - epochseconds) > 32767) {
            get_microinterval = max_integer;
        } else if ((microseconds > m)) {
            get_microinterval = 
                ((s - 1 - epochseconds) * 65536)
                + (
                    ((m + 1000000 - microseconds) / 100)
                    * 65536
                )
                / 10000
            ;
        } else {
            get_microinterval = 
                ((s - epochseconds) * 65536)
                + (((m - microseconds) / 100) * 65536)
                / 10000
            ;
        }
    }
⟧
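
The arithmetic of get_microinterval yields elapsed time in units of 1/65536 of a second; 32767 whole seconds is the most that still fits below 2^31 in those units. Here is a Python sketch of the same formulas, assuming that every division truncates to an integer (written // below, which agrees with truncation because all operands are non-negative). The function name and the explicit parameters, which replace the globals epochseconds and microseconds and the call to seconds_and_micros, are assumptions of this sketch.

```python
MAX_INTEGER = 0x7FFFFFFF  # 2**31 - 1


def microinterval(start_s: int, start_us: int,
                  now_s: int, now_us: int) -> int:
    """Elapsed time since (start_s, start_us) in units of 1/65536 second."""
    if now_s - start_s > 32767:
        return MAX_INTEGER           # would not fit; saturate instead
    if start_us > now_us:
        # Borrow one second so that the microsecond part is non-negative.
        return ((now_s - 1 - start_s) * 65536
                + ((now_us + 1_000_000 - start_us) // 100 * 65536) // 10000)
    return ((now_s - start_s) * 65536
            + ((now_us - start_us) // 100 * 65536) // 10000)


# 1.75 seconds elapsed: 1.75 * 65536 = 114688
print(microinterval(10, 500_000, 12, 250_000))
```

Dividing the microseconds by 100 before multiplying by 65536 keeps the intermediate product below 2^31, at the cost of discarding fractions of 100 microseconds.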

1412.

⟦23 Set initial values of key variables⟧ += ⟦
    seconds_and_micros(epochseconds, microseconds)

    init_start_time
⟧

1413. Negative random seed values are silently converted to positive ones.

⟦1413 Implement \.{\\setrandomseed}⟧ = ⟦
    {
        scan_int;
        if (cur_val < 0) {
            negate(cur_val);
        }
        random_seed = cur_val;
        init_randoms(random_seed);
    }
⟧

1414.

⟦1414 Implement \.{\\resettimer}⟧ = ⟦
    {
        seconds_and_micros(epochseconds, microseconds);
    }
⟧

1415. Each new type of node that appears in our data structure must be capable of being displayed, copied, destroyed, and so on. The routines that we need for write-oriented whatsits are somewhat like those for mark nodes; other extensions might, of course, involve more subtlety here.

⟦57 Basic printing procedures⟧ += ⟦
    function print_write_whatsit(
      s: str_number,
      p: pointer,
    ) {
        print_esc(s);
        if (write_stream(p) < 16) {
            print_int(write_stream(p));
        } else if (write_stream(p) == 16) {
            print_char(ord!("*"));
        } else {
            print_char(ord!("-"));
        }
    }

    function print_native_word(p: pointer) {
        var i, c, cc: integer;
        
        for (i in 0 to native_length(p) - 1) {
            c = get_native_char(p, i);
            if ((c >= 0xd800) && (c <= 0xdbff)) {
                if (i < native_length(p) - 1) {
                    cc = get_native_char(p, i + 1);
                    if ((cc >= 0xdc00) && (cc <= 0xdfff)) {
                        c = 
                            0x10000
                            + (c - 0xd800)
                            * 0x400 + (cc - 0xdc00)
                        ;
                        print_char(c);
                        incr(i);
                    } else {
                        print(ord!("."));
                    }
                } else {
                    print(ord!("."));
                }
            } else {
                print_char(c);
            }
        }
    }
⟧
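
The surrogate handling in print_native_word is the standard UTF-16 decoding rule: a high surrogate followed by a low surrogate combines into one code point at or above 0x10000. A Python sketch of the same loop follows. It collects code points in a list instead of printing them, uses a while loop so that the index can be advanced past the low surrogate, and its name is hypothetical. As in the code above, an unpaired high surrogate is replaced by a period, while any other unit is passed through unchanged.

```python
from typing import List


def decode_native_chars(units: List[int]) -> List[int]:
    """Combine UTF-16 surrogate pairs in `units` into code points."""
    out: List[int] = []
    i = 0
    while i < len(units):
        c = units[i]
        if 0xD800 <= c <= 0xDBFF:                    # high surrogate
            if i < len(units) - 1 and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cc = units[i + 1]                    # matching low surrogate
                out.append(0x10000 + (c - 0xD800) * 0x400 + (cc - 0xDC00))
                i += 1                               # skip the low surrogate
            else:
                out.append(ord("."))                 # unpaired high surrogate
        else:
            out.append(c)
        i += 1
    return out


print([hex(c) for c in decode_native_chars([0x41, 0xD83D, 0xDE00])])
```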

1416.

⟦1416 Display the whatsit node |p|⟧ = ⟦
    case subtype(p) {
      open_node:
        print_write_whatsit(strpool!("openout"), p);
        print_char(ord!("="));
        print_file_name(
          open_name(p),
          open_area(p),
          open_ext(p),
        );
      write_node:
        print_write_whatsit(strpool!("write"), p);
        print_mark(write_tokens(p));
      close_node:
        print_write_whatsit(strpool!("closeout"), p);
      special_node:
        print_esc(strpool!("special"));
        print_mark(write_tokens(p));
      language_node:
        print_esc(strpool!("setlanguage"));
        print_int(what_lang(p));
        print(strpool!(" (hyphenmin "));
        print_int(what_lhm(p));
        print_char(ord!(","));
        print_int(what_rhm(p));
        print_char(ord!(")"));
      pdf_save_pos_node:
        print_esc(strpool!("pdfsavepos"));
      native_word_node, native_word_node_AT:
        print_esc(font_id_text(native_font(p)));
        print_char(ord!(" "));
        print_native_word(p);
      glyph_node:
        print_esc(font_id_text(native_font(p)));
        print(strpool!(" glyph#"));
        print_int(native_glyph(p));
      pic_node, pdf_node:
        if (subtype(p) == pic_node) {
            print_esc(strpool!("XeTeXpicfile"));
        } else {
            print_esc(strpool!("XeTeXpdffile"));
        }
        print(strpool!(" \""));
        for (i in 0 to pic_path_length(p) - 1) {
            print_visible_char(pic_path_byte(p, i));
        }
        print(ord!("\""));
      othercases:
        print(strpool!("whatsit?"));
    }
⟧

1417. Picture nodes are tricky in that they are variable size.

@define total_pic_node_size(#) =>
    (
        pic_node_size
        + (pic_path_length(#) + sizeof(memory_word) - 1)
        div sizeof(memory_word)
    )
⟦1417 Make a partial copy of the whatsit node |p| and make |r| point to it; set |words| to the number of initial words not yet copied⟧ = ⟦
    case subtype(p) {
      open_node:
        r = get_node(open_node_size);
        words = open_node_size;
      write_node, special_node:
        r = get_node(write_node_size);
        add_token_ref(write_tokens(p));
        words = write_node_size;
      close_node, language_node:
        r = get_node(small_node_size);
        words = small_node_size;
      native_word_node, native_word_node_AT:
        words = native_size(p);
        r = get_node(words);
        while (words > 0) {
            decr(words);
            mem[r + words] = mem[p + words];
        }
        native_glyph_info_ptr(r) = null_ptr;
        native_glyph_count(r) = 0;
        copy_native_glyph_info(p, r);
      glyph_node:
        r = get_node(glyph_node_size);
        words = glyph_node_size;
      pic_node, pdf_node:
        words = total_pic_node_size(p);
        r = get_node(words);
      pdf_save_pos_node:
        r = get_node(small_node_size);
      othercases:
        confusion(strpool!("ext2"));
    }
⟧
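
The size of a picture node is its fixed part plus the path length rounded up to whole memory words, which is the familiar ceiling-division idiom. A Python sketch follows. The values 9 for pic_node_size and 8 for sizeof(memory_word) used in the example are assumptions made for illustration, since neither is defined in this part of the program.

```python
def total_pic_node_size(pic_node_size: int, path_length: int,
                        word_size: int) -> int:
    """Fixed node size plus ceil(path_length / word_size) extra words."""
    return pic_node_size + (path_length + word_size - 1) // word_size


# Assumed values: a 9-word fixed part and 8-byte memory words.
for length in (0, 1, 8, 9):
    print(length, total_pic_node_size(9, length, 8))
```

Adding word_size - 1 before dividing means that a path of 1 to 8 bytes costs one extra word, 9 to 16 bytes two, and so on, while an empty path costs none.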

1418.

⟦1418 Wipe out the whatsit node |p| and |goto done|⟧ = ⟦
    {
        case subtype(p) {
          open_node:
            free_node(p, open_node_size);
          write_node, special_node:
            delete_token_ref(write_tokens(p));
            free_node(p, write_node_size);
            goto done;
          close_node, language_node:
            free_node(p, small_node_size);
          native_word_node, native_word_node_AT:
            free_native_glyph_info(p);
            free_node(p, native_size(p));
          glyph_node:
            free_node(p, glyph_node_size);
          pic_node, pdf_node:
            free_node(p, total_pic_node_size(p));
          pdf_save_pos_node:
            free_node(p, small_node_size);
          othercases:
            confusion(strpool!("ext3"));
        }
        goto done;
    }
⟧

1419.

⟦1419 Incorporate a whatsit node into a vbox⟧ = ⟦
    {
        if (
            (subtype(p) == pic_node)
            || (subtype(p) == pdf_node)
        ) {
            x = x + d + height(p);
            d = depth(p);
            if (width(p) > w) {
                w = width(p);
            }
        }
    }
⟧

1420.

⟦1420 Incorporate a whatsit node into an hbox⟧ = ⟦
    {
        case subtype(p) {
          native_word_node, native_word_node_AT:
            // merge with any following word fragments in 
            // same font, discarding discretionary breaks
            if (
                (q != r + list_offset)
                && (type(q) == disc_node)
            ) {
                k = replace_count(q);
            } else {
                k = 0;
            }
            while ((link(q) != p)) {
                decr(k);
                // bring q up in preparation for deletion of 
                // nodes starting at p
                q = link(q);
                if (type(q) == disc_node) {
                    k = replace_count(q);
                }
            }
            pp = link(p);
          restart:
            if (
                (k <= 0)
                && (pp != null) && (!is_char_node(pp))
            ) {
                if (
                    (type(pp) == whatsit_node)
                    && (is_native_word_subtype(pp))
                    && (native_font(pp) == native_font(p))
                ) {
                    pp = link(pp);
                    goto restart;
                } else if ((type(pp) == disc_node)) {
                    ppp = link(pp);
                    if (
                        is_native_word_node(ppp)
                        && (
                            native_font(ppp)
                            == native_font(p)
                        )
                    ) {
                        pp = link(ppp);
                        goto restart;
                    }
                }
                // now pp points to the non- native_word 
                // node that ended the chain, or null
            }
            if ((pp != link(p))) {
                // found a chain of at least two pieces 
                // starting at p
                total_chars = 0;
                // the first fragment
                p = link(q);
                while ((p != pp)) {
                    if ((type(p) == whatsit_node)) {
                        // accumulate char count
                        total_chars = 
                            total_chars
                            + native_length(p)
                        ;
                    }
                    // remember last node seen
                    ppp = p;
                    // point to next fragment or 
                    // discretionary or terminator
                    p = link(p);
                }
                // the first fragment again
                p = link(q);
                // make new node for merged word
                pp = new_native_word_node(
                  native_font(p),
                  total_chars,
                );
                subtype(pp) = subtype(p);
                // link to preceding material
                link(q) = pp;
                // attach remainder of hlist to it
                link(pp) = link(ppp);
                // and detach from the old fragments
                link(ppp) = null;
                // copy the chars into new node
                total_chars = 0;
                ppp = p;
                repeat {
                    if ((type(ppp) == whatsit_node)) {
                        for (k in 0 to 
                            native_length(ppp)
                            - 1
                        ) {
                            set_native_char(
                              pp,
                              total_chars,
                              get_native_char(ppp, k),
                            );
                            incr(total_chars);
                        }
                    }
                    ppp = link(ppp);
                } until ((ppp == null));
                // delete the fragments
                flush_node_list(p);
                // update p to point to the new node
                p = link(q);
                // and measure it (i.e., re-do the OT 
                // layout)
                set_native_metrics(
                  p,
                  XeTeX_use_glyph_metrics,
                );
            }
            // now incorporate the native_word node
            // measurements into the box we're packing
            if (height(p) > h) {
                h = height(p);
            }
            if (depth(p) > d) {
                d = depth(p);
            }
            x = x + width(p);
          glyph_node, pic_node, pdf_node:
            if (height(p) > h) {
                h = height(p);
            }
            if (depth(p) > d) {
                d = depth(p);
            }
            x = x + width(p);
          othercases:
            do_nothing;
        }
    }
⟧
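
The fragment-merging step above can be pictured with a much simplified model. The sketch below is an illustration only, not the routine itself: nodes are hypothetical tuples, `("word", font, text)` for a native word and `("disc",)` for a discretionary, and the `replace_count` bookkeeping (the `k <= 0` test) is deliberately left out. It keeps just the chaining rule: a word absorbs a following word in the same font, and a discretionary is dropped only when it is immediately followed by another word in the same font.

```python
def merge_native_words(nodes):
    """Merge runs of same-font word fragments, dropping discretionaries between them.

    Simplified model: ignores replace_count, metrics, and subtypes.
    """
    out = []
    i = 0
    while i < len(nodes):
        node = nodes[i]
        if node[0] != "word":
            out.append(node)
            i += 1
            continue
        font, text = node[1], node[2]
        j = i + 1
        while j < len(nodes):
            nxt = nodes[j]
            if nxt[0] == "word" and nxt[1] == font:
                # adjacent fragment in the same font: absorb it
                text += nxt[2]
                j += 1
            elif (nxt[0] == "disc" and j + 1 < len(nodes)
                  and nodes[j + 1][0] == "word" and nodes[j + 1][1] == font):
                # discretionary followed by a same-font fragment: skip the
                # discretionary and absorb the fragment
                text += nodes[j + 1][2]
                j += 2
            else:
                break  # chain ends at the first node that does not fit
        out.append(("word", font, text))
        i = j
    return out
```

In the real code the merged node is then re-measured with `set_native_metrics`, since the layout of the joined text need not equal the sum of its parts.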

1421.

⟦1421 Let |d| be the width of the whatsit |p|, and |goto found| if ``visible''⟧ = ⟦
    if (
        (is_native_word_subtype(p))
        || (subtype(p) == glyph_node)
        || (subtype(p) == pic_node)
        || (subtype(p) == pdf_node)
    ) {
        d = width(p);
        goto found;
    } else {
        d = 0;
    }
⟧

1422.

@define adv_past_linebreak(#) =>
    if (subtype(#) == language_node) {
        cur_lang = what_lang(#);
        l_hyf = what_lhm(#);
        r_hyf = what_rhm(#);
        set_hyph_index;
    } else if (
        (is_native_word_subtype(#))
        || (subtype(#) == glyph_node)
        || (subtype(#) == pic_node)
        || (subtype(#) == pdf_node)
    ) {
        act_width = act_width + width(#);
    }
⟦1422 Advance \(p)past a whatsit node in the \(l)|line_break| loop⟧ = ⟦
    adv_past_linebreak(cur_p)
⟧

1423.

@define adv_past_prehyph(#) =>
    if (subtype(#) == language_node) {
        cur_lang = what_lang(#);
        l_hyf = what_lhm(#);
        r_hyf = what_rhm(#);
        set_hyph_index;
    }
⟦1423 Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop⟧ = ⟦
    adv_past_prehyph(s)
⟧

1424.

⟦1424 Prepare to move whatsit |p| to the current page, then |goto contribute|⟧ = ⟦
    {
        if (
            (subtype(p) == pic_node)
            || (subtype(p) == pdf_node)
        ) {
            page_total = page_total + page_depth + height(p);
            page_depth = depth(p);
        }
        goto contribute;
    }
⟧

1425.

⟦1425 Process whatsit |p| in |vert_break| loop, |goto not_found|⟧ = ⟦
    {
        if (
            (subtype(p) == pic_node)
            || (subtype(p) == pdf_node)
        ) {
            cur_height = cur_height + prev_dp + height(p);
            prev_dp = depth(p);
        }
        goto not_found;
    }
⟧
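
Sections 1424 and 1425 use the same accumulation pattern for pictures in a vertical list: the running total absorbs the previous item's depth plus the new item's height, and the new depth is held back until the next item arrives. A minimal sketch of that arithmetic, using a hypothetical list of `(height, depth)` pairs in place of real nodes:

```python
def stack_vertically(boxes, total=0, prev_depth=0):
    """Return (total, prev_depth) after stacking (height, depth) pairs."""
    for height, depth in boxes:
        # mirrors: cur_height = cur_height + prev_dp + height(p)
        total += prev_depth + height
        # mirrors: prev_dp = depth(p)
        prev_depth = depth
    return total, prev_depth
```

The depth of the last item is never added to the total; it stays pending, which is why the code keeps `page_depth` and `prev_dp` as separate variables.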

1426.

⟦1426 Output the whatsit node |p| in a vlist⟧ = ⟦
    {
        case subtype(p) {
          glyph_node:
            cur_v = cur_v + height(p);
            cur_h = left_edge;
            // Sync DVI state to TeX state
            synch_h;
            synch_v;
            f = native_font(p);
            if (f != dvi_f) {
                ⟦659 Change font |dvi_f| to |f|⟧
            }
            dvi_out(set_glyphs);
            // width
            dvi_four(0);
            // glyph count
            dvi_two(1);
            // x-offset as fixed point
            dvi_four(0);
            // y-offset as fixed point
            dvi_four(0);
            dvi_two(native_glyph(p));
            cur_v = cur_v + depth(p);
            cur_h = left_edge;
          pic_node, pdf_node:
            save_h = dvi_h;
            save_v = dvi_v;
            cur_v = cur_v + height(p);
            pic_out(p);
            dvi_h = save_h;
            dvi_v = save_v;
            cur_v = save_v + depth(p);
            cur_h = left_edge;
          pdf_save_pos_node:
            ⟦1427 Save current position to |pdf_last_x_pos|, |pdf_last_y_pos|⟧
          othercases:
            out_what(p);
        }
    }
⟧

1427.

⟦1427 Save current position to |pdf_last_x_pos|, |pdf_last_y_pos|⟧ = ⟦
    {
        pdf_last_x_pos = cur_h + 4736286;
        pdf_last_y_pos = cur_page_height - cur_v - 4736286;
    }
⟧

1428.

⟦1428 Calculate page dimensions and margins⟧ = ⟦
    cur_h_offset = h_offset + (unity * 7227) / 100

    cur_v_offset = v_offset + (unity * 7227) / 100

    if (pdf_page_width != 0) {
        cur_page_width = pdf_page_width;
    } else {
        cur_page_width = width(p) + 2 * cur_h_offset;
    }

    if (pdf_page_height != 0) {
        cur_page_height = pdf_page_height;
    } else {
        cur_page_height = 
            height(p)
            + depth(p) + 2 * cur_v_offset
        ;
    }
⟧
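
The constant 4736286 in section 1427 and the expression `(unity * 7227) / 100` in section 1428 are the same quantity: one inch (72.27pt) expressed in scaled points, with integer division. The sketch below works the arithmetic through, assuming `unity` is 2^16 scaled points per point as in TeX; the function and parameter names are illustrative, not part of the program.

```python
UNITY = 2 ** 16                       # scaled points per point
ONE_INCH_SP = (UNITY * 7227) // 100   # 72.27pt in scaled points


def page_dimensions(box_width, box_height, box_depth,
                    h_offset=0, v_offset=0,
                    pdf_page_width=0, pdf_page_height=0):
    """Mirror section 1428: explicit page sizes win, else box plus margins."""
    cur_h_offset = h_offset + ONE_INCH_SP
    cur_v_offset = v_offset + ONE_INCH_SP
    if pdf_page_width != 0:
        cur_page_width = pdf_page_width
    else:
        cur_page_width = box_width + 2 * cur_h_offset
    if pdf_page_height != 0:
        cur_page_height = pdf_page_height
    else:
        cur_page_height = box_height + box_depth + 2 * cur_v_offset
    return cur_page_width, cur_page_height


def saved_position(cur_h, cur_v, cur_page_height):
    """Mirror section 1427: shift by one inch and flip the vertical axis."""
    return cur_h + ONE_INCH_SP, cur_page_height - cur_v - ONE_INCH_SP
```

The saved vertical position is measured upward from the bottom of the page, which is why `cur_v` is subtracted from `cur_page_height`.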

1429.

⟦13 Global variables⟧ += ⟦
    // width of page being shipped
    var cur_page_width: scaled;

    // height of page being shipped
    var cur_page_height: scaled;

    // horizontal offset of page being shipped
    var cur_h_offset: scaled;

    // vertical offset of page being shipped
    var cur_v_offset: scaled;
⟧

1430.

⟦1430 Output the whatsit node |p| in an hlist⟧ = ⟦
    {
        case subtype(p) {
          native_word_node, native_word_node_AT, glyph_node:
            // Sync DVI state to TeX state
            synch_h;
            synch_v;
            f = native_font(p);
            if (f != dvi_f) {
                ⟦659 Change font |dvi_f| to |f|⟧
            }
            if (subtype(p) == glyph_node) {
                dvi_out(set_glyphs);
                dvi_four(width(p));
                // glyph count
                dvi_two(1);
                // x-offset as fixed point
                dvi_four(0);
                // y-offset as fixed point
                dvi_four(0);
                dvi_two(native_glyph(p));
                cur_h = cur_h + width(p);
            } else {
                if (subtype(p) == native_word_node_AT) {
                    if (
                        (native_length(p) > 0)
                        || (
                            native_glyph_info_ptr(p)
                            != null_ptr
                        )
                    ) {
                        dvi_out(set_text_and_glyphs);
                        len = native_length(p);
                        dvi_two(len);
                        for (k in 0 to len - 1) {
                            dvi_two(get_native_char(p, k));
                        }
                        len = make_xdv_glyph_array_data(p);
                        for (k in 0 to len - 1) {
                            dvi_out(xdv_buffer_byte(k));
                        }
                    }
                } else {
                    if (native_glyph_info_ptr(p) != null_ptr) {
                        dvi_out(set_glyphs);
                        len = make_xdv_glyph_array_data(p);
                        for (k in 0 to len - 1) {
                            dvi_out(xdv_buffer_byte(k));
                        }
                    }
                }
                cur_h = cur_h + width(p);
            }
            dvi_h = cur_h;
          pic_node, pdf_node:
            save_h = dvi_h;
            save_v = dvi_v;
            cur_v = base_line;
            edge = cur_h + width(p);
            pic_out(p);
            dvi_h = save_h;
            dvi_v = save_v;
            cur_h = edge;
            cur_v = base_line;
          pdf_save_pos_node:
            ⟦1427 Save current position to |pdf_last_x_pos|, |pdf_last_y_pos|⟧
          othercases:
            out_what(p);
        }
    }
⟧
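
For a single `glyph_node`, the hlist code above emits a fixed-size record: the opcode, a four-byte width, a two-byte glyph count of 1, two four-byte zero offsets, and the two-byte glyph index. The sketch below packs that record with Python's `struct` module. The numeric value of `set_glyphs` is not given in this section, so it is passed in as a parameter rather than assumed; the function name is hypothetical.

```python
import struct


def encode_single_glyph(set_glyphs_opcode, width, glyph_id):
    """Pack the byte sequence written for one glyph_node in an hlist.

    Layout (big-endian): opcode(1) width(4) count(2) x(4) y(4) glyph(2).
    """
    return struct.pack(">BiHiiH",
                       set_glyphs_opcode,  # dvi_out(set_glyphs)
                       width,              # dvi_four(width(p))
                       1,                  # dvi_two(1): glyph count
                       0,                  # dvi_four(0): x-offset
                       0,                  # dvi_four(0): y-offset
                       glyph_id)           # dvi_two(native_glyph(p))
```

The vlist variant in section 1426 writes the same record with a width of zero, because the horizontal position is reset to `left_edge` afterwards.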

1431. After all this preliminary shuffling, we come finally to the routines that actually send out the requested data. Let’s do \special first (it’s easier).

⟦1431 Declare procedures needed in |hlist_out|, |vlist_out|⟧ = ⟦
    function special_out(p: pointer) {
        var
          old_setting: 0 .. max_selector, // holds print selector
          k: pool_pointer; // index into str_pool
        
        synch_h;
        synch_v;
        doing_special = true;
        old_setting = selector;
        selector = new_string;
        show_token_list(
          link(write_tokens(p)),
          null,
          pool_size - pool_ptr,
        );
        selector = old_setting;
        str_room(1);
        if (cur_length < 256) {
            dvi_out(xxx1);
            dvi_out(cur_length);
        } else {
            dvi_out(xxx4);
            dvi_four(cur_length);
        }
        for (k in str_start_macro(str_ptr) to pool_ptr - 1) {
            dvi_out(so(str_pool[k]));
        }
        // erase the string
        pool_ptr = str_start_macro(str_ptr);
        doing_special = false;
    }
⟧
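
The length test in `special_out` chooses between two DVI commands: `xxx1` carries a one-byte length and `xxx4` a four-byte length. A minimal sketch of that framing, assuming the standard DVI opcode values 239 and 242 (they are not stated in this section) and a hypothetical function name:

```python
import struct

XXX1 = 239  # DVI opcode: special with a 1-byte length (assumed standard value)
XXX4 = 242  # DVI opcode: special with a 4-byte length (assumed standard value)


def encode_special(payload: bytes) -> bytes:
    """Frame a \\special payload the way special_out does."""
    if len(payload) < 256:
        header = bytes([XXX1, len(payload)])
    else:
        header = struct.pack(">Bi", XXX4, len(payload))
    return header + payload
```

The same framing is reused by `pic_out`, which builds a `pdf:image ...` string and ships it as a special.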

1432. To write a token list, we must run it through TEX’s scanner, expanding macros and \the and \number, etc. This might cause runaways, if a delimited macro parameter isn’t matched, and runaways would be extremely confusing since we are calling on TEX’s scanner in the middle of a \shipout command. Therefore we will put a dummy control sequence as a “stopper,” right after the token list. This control sequence is artificially defined to be \outer.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    text(end_write) = strpool!("endwrite")

    eq_level(end_write) = level_one

    eq_type(end_write) = outer_call

    equiv(end_write) = null
⟧

1433.

⟦1431 Declare procedures needed in |hlist_out|, |vlist_out|⟧ += ⟦
    function write_out(p: pointer) {
        var
          old_setting: 0 .. max_selector, // holds print selector
          old_mode: integer, // saved mode
          j: small_number, // write stream number
          k: integer,
          q, r: pointer, // temporary variables for list manipulation
          d: integer, // number of characters in incomplete current string
          clobbered: boolean, // system string is ok?
          runsystem_ret: integer; // return value from runsystem
        
        ⟦1434 Expand macros in the token list and make |link(def_ref)| point to the result⟧
        old_setting = selector;
        j = write_stream(p);
        if (j == 18) {
            selector = new_string;
        } else if (write_open[j]) {
            selector = j;
        } else {
            // write to the terminal if file isn't open
            if ((j == 17) && (selector == term_and_log)) {
                selector = log_only;
            }
            print_nl(strpool!(""));
        }
        token_show(def_ref);
        print_ln;
        flush_list(def_ref);
        if (j == 18) {
            if ((tracing_online <= 0)) {
                // Show what we're doing in the log file.
                selector = log_only;
            } else {
                // Show what we're doing.
                // If the log file isn't open yet, we can 
                // only send output to the terminal. Calling 
                // open_log_file from here seems to result 
                // in bad data in the log.
                selector = term_and_log;
            }
            if (!log_opened) {
                selector = term_only;
            }
            print_nl(strpool!("runsystem("));
            for (d in 0 to cur_length - 1) {
                // print gives up if passed str_ptr, so do it
                // by hand.
                // N.B.: not print_char
                print(
                  so(
                    str_pool[str_start_macro(str_ptr) + d],
                  ),
                );
            }
            print(strpool!(")..."));
            if (shellenabledp) {
                str_room(1);
                // Append a null byte to the expansion.
                append_char(0);
                clobbered = false;
                // Convert to external character set.
                for (d in 0 to cur_length - 1) {
                    if (
                        (
                            str_pool[
                              str_start_macro(str_ptr) + d,
                            ]
                            == null_code
                        )
                        && (d < cur_length - 1)
                    ) {
                        // minimal checking: NUL not allowed
                        // in argument string of system()
                        clobbered = true;
                    }
                }
                if (clobbered) {
                    print(strpool!("clobbered"));
                } else {
                    // We have the command. See if we're 
                    // allowed to execute it, and report in 
                    // the log. We don't check the actual 
                    // exit status of the command, or do 
                    // anything with the output.
                    if (name_of_file) {
                        libc_free(name_of_file);
                    }
                    name_of_file = xmalloc(
                      cur_length * 3 + 2,
                    );
                    k = 0;
                    for (d in 0 to cur_length - 1) {
                        append_to_name(
                          str_pool[
                            str_start_macro(str_ptr) + d,
                          ],
                        );
                    }
                    name_of_file[k + 1] = 0;
                    runsystem_ret = runsystem(
                      conststringcast(name_of_file + 1),
                    );
                    if (runsystem_ret == -1) {
                        print(
                          strpool!("quotation error in system command"),
                        );
                    } else if (runsystem_ret == 0) {
                        print(
                          strpool!("disabled (restricted)"),
                        );
                    } else if (runsystem_ret == 1) {
                        print(strpool!("executed"));
                    } else if (runsystem_ret == 2) {
                        print(
                          strpool!("executed safely (allowed)"),
                        );
                    }
                }
            } else {
                // shellenabledp is false
                print(strpool!("disabled"));
            }
            print_char(ord!("."));
            print_nl(strpool!(""));
            print_ln;
            // erase the string
            pool_ptr = str_start_macro(str_ptr);
        }
        selector = old_setting;
    }
⟧
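
The routing at the top of `write_out` can be summarised as a small decision function. This is a sketch under stated assumptions: selectors are modelled as strings instead of TeX's numeric codes, `write_open` is a hypothetical dict, and the function name is invented. The table of `runsystem` return codes is copied from the messages printed above.

```python
def choose_write_target(j, write_open, selector):
    """Return where the text of \\write<j> goes.

    j == 18 collects the text as a shell command; an open stream gets the
    text directly; anything else falls back to the terminal and/or log.
    """
    if j == 18:
        return "new_string"
    if write_open.get(j, False):
        return j
    if j == 17 and selector == "term_and_log":
        return "log_only"   # stream 17 is kept off the terminal
    return selector


# Messages printed for each runsystem return value, as in write_out above.
RUNSYSTEM_MESSAGES = {
    -1: "quotation error in system command",
    0: "disabled (restricted)",
    1: "executed",
    2: "executed safely (allowed)",
}
```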

1434. The final line of this routine is slightly subtle; at least, the author didn’t think about it until getting burnt! There is a used-up token list on the stack, namely the one that contained end_write_token. (We insert this artificial ‘\endwrite’ to prevent runaways, as explained above.) If it were not removed, and if there were numerous writes on a single page, the stack would overflow.

@define end_write_token => cs_token_flag + end_write
⟦1434 Expand macros in the token list and make |link(def_ref)| point to the result⟧ = ⟦
    q = get_avail

    info(q) = right_brace_token + ord!("}")

    r = get_avail

    link(q) = r

    info(r) = end_write_token

    ins_list(q)

    begin_token_list(write_tokens(p), write_text)

    q = get_avail

    info(q) = left_brace_token + ord!("{")

    // now we're ready to scan `{<token list>} \endwrite'
    ins_list(q)

    old_mode = mode

    // disable \prevdepth, \spacefactor, \lastskip, \prevgraf
    mode = 0

    cur_cs = write_loc

    q = scan_toks(false, true) // expand macros, etc.

    get_token

    if (cur_tok != end_write_token) {
        ⟦1435 Recover from an unbalanced write command⟧
    }

    mode = old_mode

    end_token_list // conserve stack space
⟧

1435.

⟦1435 Recover from an unbalanced write command⟧ = ⟦
    {
        print_err(strpool!("Unbalanced write command"));
        help2(
          strpool!("On this page there's a \\write with fewer real {'s than }'s."),
        )(
          strpool!("I can't handle that very well; good luck."),
        );
        error;
        repeat {
            get_token;
        } until (cur_tok == end_write_token);
    }
⟧
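
The stopper technique of sections 1432 to 1435 can be modelled on a flat list of tokens. The sketch below is a toy, not TeX's scanner: tokens are plain strings, no macro expansion happens, and `END_WRITE` is a stand-in for `end_write_token`. It wraps the list as `{ ... } \endwrite`, scans one balanced group, and then checks that the very next token is the stopper. If it is not, the toy skips forward to the stopper, as the recovery code in section 1435 does. In real TeX the opposite case (too many left braces) is caught because `\endwrite` is `\outer`; here the toy simply stops when it meets the stopper inside the group.

```python
END_WRITE = "\\endwrite"  # stand-in for end_write_token


def scan_write(tokens):
    """Return (body, balanced) for a \\write token list."""
    stream = ["{", *tokens, "}", END_WRITE]
    pos, depth, body = 1, 1, []   # start just after the inserted "{"
    while True:
        tok = stream[pos]
        pos += 1
        if tok == END_WRITE:
            # stopper met inside the group: too many left braces
            return body, False
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
            if depth == 0:
                break             # the group we opened is closed
        body.append(tok)
    balanced = stream[pos] == END_WRITE
    while stream[pos] != END_WRITE:
        pos += 1                  # recovery: discard tokens up to the stopper
    return body, balanced
```

Because the stopper is always the last token, both loops are guaranteed to terminate, which is the point of inserting it.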

1436. The out_what procedure takes care of outputting whatsit nodes for vlist_out and hlist_out.

⟦1431 Declare procedures needed in |hlist_out|, |vlist_out|⟧ += ⟦
    function pic_out(p: pointer) {
        var
          old_setting: 0 .. max_selector, // holds print selector
          i: integer,
          k: pool_pointer; // index into str_pool
        
        synch_h;
        synch_v;
        old_setting = selector;
        selector = new_string;
        print(strpool!("pdf:image "));
        print(strpool!("matrix "));
        print_scaled(pic_transform1(p));
        print(ord!(" "));
        print_scaled(pic_transform2(p));
        print(ord!(" "));
        print_scaled(pic_transform3(p));
        print(ord!(" "));
        print_scaled(pic_transform4(p));
        print(ord!(" "));
        print_scaled(pic_transform5(p));
        print(ord!(" "));
        print_scaled(pic_transform6(p));
        print(ord!(" "));
        print(strpool!("page "));
        print_int(pic_page(p));
        print(ord!(" "));
        case pic_pdf_box(p) {
          pdfbox_crop:
            print(strpool!("pagebox cropbox "));
          pdfbox_media:
            print(strpool!("pagebox mediabox "));
          pdfbox_bleed:
            print(strpool!("pagebox bleedbox "));
          pdfbox_art:
            print(strpool!("pagebox artbox "));
          pdfbox_trim:
            print(strpool!("pagebox trimbox "));
          othercases:
            do_nothing;
        }
        print(ord!("("));
        for (i in 0 to pic_path_length(p) - 1) {
            print_visible_char(pic_path_byte(p, i));
        }
        print(ord!(")"));
        selector = old_setting;
        if (cur_length < 256) {
            dvi_out(xxx1);
            dvi_out(cur_length);
        } else {
            dvi_out(xxx4);
            dvi_four(cur_length);
        }
        for (k in str_start_macro(str_ptr) to pool_ptr - 1) {
            dvi_out(so(str_pool[k]));
        }
        // erase the string
        pool_ptr = str_start_macro(str_ptr);
    }

    function out_what(p: pointer) {
        var
          j: small_number, // write stream number
          old_setting: 0 .. max_selector;
        
        case subtype(p) {
          open_node, write_node, close_node:
            ⟦1437 Do some work that has been queued up for \.{\\write}⟧
          special_node:
            special_out(p);
          language_node:
            do_nothing;
          othercases:
            confusion(strpool!("ext4"));
        }
    }
⟧

1437. We don’t implement \write inside of leaders. (The reason is that the number of times a leader box appears might be different in different implementations, due to machine-dependent rounding in the glue calculations.)

⟦1437 Do some work that has been queued up for \.{\\write}⟧ = ⟦
    if (!doing_leaders) {
        j = write_stream(p);
        if (subtype(p) == write_node) {
            write_out(p);
        } else {
            if (write_open[j]) {
                a_close(write_file[j]);
                write_open[j] = false;
            }
            if (subtype(p) == close_node) {
                // already closed
                do_nothing;
            } else if (j < 16) {
                cur_name = open_name(p);
                cur_area = open_area(p);
                cur_ext = open_ext(p);
                if (cur_ext == strpool!("")) {
                    cur_ext = strpool!(".tex");
                }
                pack_cur_name;
                while (!
                    kpse_out_name_ok(
                      stringcast(name_of_file + 1),
                    )
                    || !a_open_out(write_file[j])
                ) {
                    prompt_file_name(
                      strpool!("output file name"),
                      strpool!(".tex"),
                    );
                }
                write_open[j] = true;
                // If on first line of input, log file is not
                // ready yet, so don't log.
                if (log_opened && texmf_yesno("log_openout")) {
                    old_setting = selector;
                    if ((tracing_online <= 0)) {
                        // Show what we're doing in the log 
                        // file.
                        selector = log_only;
                    } else {
                        // Show what we're doing.
                        selector = term_and_log;
                    }
                    print_nl(strpool!("\\openout"));
                    print_int(j);
                    print(strpool!(" = `"));
                    print_file_name(
                      cur_name,
                      cur_area,
                      cur_ext,
                    );
                    print(strpool!("'."));
                    print_nl(strpool!(""));
                    print_ln;
                    selector = old_setting;
                }
            }
        }
    }
⟧

1438. The presence of ‘\immediate’ causes the do_extension procedure to descend to one level of recursion. Nothing happens unless \immediate is followed by ‘\openout’, ‘\write’, or ‘\closeout’.

⟦1438 Implement \.{\\immediate}⟧ = ⟦
    {
        get_x_token;
        if (
            (cur_cmd == extension)
            && (cur_chr <= close_node)
        ) {
            p = tail;
            // append a whatsit node
            do_extension;
            // do the action immediately
            out_what(tail);
            flush_node_list(tail);
            tail = p;
            link(p) = null;
        } else {
            back_input;
        }
    }
⟧
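
The shape of the `\immediate` code is: remember the tail, let `do_extension` append a whatsit as usual, act on it at once, then remove it so the list looks untouched. A minimal sketch of that pattern on a Python list; `make_node` and `execute` are hypothetical callables standing in for `do_extension` and `out_what`.

```python
def immediate(node_list, make_node, execute):
    """Append a node, execute it immediately, then restore the list."""
    saved_len = len(node_list)      # p = tail
    node_list.append(make_node())   # do_extension appends a whatsit node
    execute(node_list[-1])          # out_what(tail)
    del node_list[saved_len:]       # flush the node; tail = p
```

Reusing the ordinary append path means `\immediate` needs no separate parser for `\openout`, `\write`, and `\closeout`.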

1439. The \language extension is somewhat different. We need a subroutine that comes into play when a character of a non-clang language is being appended to the current paragraph.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function fix_language() {
        var
          l: ASCII_code; // the new current language
        
        if (language <= 0) {
            l = 0;
        } else if (language > 255) {
            l = 0;
        } else {
            l = language;
        }
        if (l != clang) {
            new_whatsit(language_node, small_node_size);
            what_lang(tail) = l;
            clang = l;
            what_lhm(tail) = norm_min(left_hyphen_min);
            what_rhm(tail) = norm_min(right_hyphen_min);
        }
    }
⟧

1440.

⟦1440 Implement \.{\\setlanguage}⟧ = ⟦
    if (abs(mode) != hmode) {
        report_illegal_case;
    } else {
        new_whatsit(language_node, small_node_size);
        scan_int;
        if (cur_val <= 0) {
            clang = 0;
        } else if (cur_val > 255) {
            clang = 0;
        } else {
            clang = cur_val;
        }
        what_lang(tail) = clang;
        what_lhm(tail) = norm_min(left_hyphen_min);
        what_rhm(tail) = norm_min(right_hyphen_min);
    }
⟧
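
Both `fix_language` and `\setlanguage` clamp the language number the same way: values outside 1..255 become 0. The sketch below shows that clamp and the "emit a node only on change" rule of `fix_language`. The `norm_min` helper is not defined in this section; the version here (clamping to 1..63) is my recollection of TeX's definition and should be read as an assumption. All function names are illustrative.

```python
def normalize_language(n):
    """Clamp a \\language value: out-of-range values map to 0."""
    return n if 0 < n <= 255 else 0


def norm_min(h):
    """Assumed behaviour of TeX's norm_min: clamp to the range 1..63."""
    if h <= 0:
        return 1
    if h >= 64:
        return 63
    return h


def fix_language(language, clang, left_hyphen_min, right_hyphen_min):
    """Return (new_clang, node); node is None when the language is unchanged."""
    l = normalize_language(language)
    if l == clang:
        return clang, None
    node = {"lang": l,
            "lhm": norm_min(left_hyphen_min),
            "rhm": norm_min(right_hyphen_min)}
    return l, node
```

Unlike `fix_language`, the `\setlanguage` code above always appends a node, even when the value equals the current `clang`.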

1441.

⟦1441 Finish the extensions⟧ = ⟦
    terminate_font_manager

    for (k in 0 to 15) {
        if (write_open[k]) {
            a_close(write_file[k]);
        }
    }
⟧

1442.

⟦1442 Implement \.{\\XeTeXpicfile}⟧ = ⟦
    if (abs(mode) == mmode) {
        report_illegal_case;
    } else {
        load_picture(false);
    }
⟧

1443.

⟦1443 Implement \.{\\XeTeXpdffile}⟧ = ⟦
    if (abs(mode) == mmode) {
        report_illegal_case;
    } else {
        load_picture(true);
    }
⟧

1444.

⟦1444 Implement \.{\\XeTeXglyph}⟧ = ⟦
    {
        if (abs(mode) == vmode) {
            back_input;
            new_graf(true);
        } else if (abs(mode) == mmode) {
            report_illegal_case;
        } else {
            if (is_native_font(cur_font)) {
                new_whatsit(glyph_node, glyph_node_size);
                scan_int;
                if ((cur_val < 0) || (cur_val > 65535)) {
                    print_err(strpool!("Bad glyph number"));
                    help2(
                      strpool!("A glyph number must be between 0 and 65535."),
                    )(
                      strpool!("I changed this one to zero."),
                    );
                    int_error(cur_val);
                    cur_val = 0;
                }
                native_font(tail) = cur_font;
                native_glyph(tail) = cur_val;
                set_native_glyph_metrics(
                  tail,
                  XeTeX_use_glyph_metrics,
                );
            } else {
                not_native_font_error(
                  extension,
                  glyph_code,
                  cur_font,
                );
            }
        }
    }
⟧

1445. Load a picture file and handle following keywords.
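
The size handling in this section rests on two small computations: the bounding box of the four (possibly rotated) corners, and the scale factors derived from a requested width and/or height. The sketch below restates them in Python with hypothetical names. Corners are `(x, y)` pairs, and a request of `0.0` means "not specified", as in the code of this section. Python's `min` and `max` replace the explicit loop with its 1000000.0 sentinel.

```python
def bounding_box(corners):
    """Return (xmin, xmax, ymin, ymax) of a list of (x, y) corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), max(xs), min(ys), max(ys)


def size_request_scale(corners, x_size_req, y_size_req):
    """Return (sx, sy) as chosen by do_size_requests.

    Only one dimension requested: scale uniformly to preserve aspect ratio.
    Both requested: scale each axis independently.
    """
    xmin, xmax, ymin, ymax = bounding_box(corners)
    if x_size_req == 0.0:
        s = y_size_req / (ymax - ymin)
        return s, s
    if y_size_req == 0.0:
        s = x_size_req / (xmax - xmin)
        return s, s
    return x_size_req / (xmax - xmin), y_size_req / (ymax - ymin)
```

For a 100 by 50 picture, asking only for a width of 200 doubles both axes, while asking for width 200 and height 25 stretches it to `(2.0, 0.5)`.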

@define calc_min_and_max =>
    {
        xmin = 1000000.0;
        xmax = -xmin;
        ymin = xmin;
        ymax = xmax;
        for (i in 0 to 3) {
            if (xCoord(corners[i]) < xmin) {
                xmin = xCoord(corners[i]);
            }
            if (xCoord(corners[i]) > xmax) {
                xmax = xCoord(corners[i]);
            }
            if (yCoord(corners[i]) < ymin) {
                ymin = yCoord(corners[i]);
            }
            if (yCoord(corners[i]) > ymax) {
                ymax = yCoord(corners[i]);
            }
        }
    }
@define update_corners =>
    for (i in 0 to 3) {
        transform_point(
          addressof(corners[i]),
          addressof(t2),
        );
    }
@define do_size_requests =>
    {
        // calculate current width and height
        calc_min_and_max;
        if (x_size_req == 0.0) {
            make_scale(
              addressof(t2),
              y_size_req / (ymax - ymin),
              y_size_req / (ymax - ymin),
            );
        } else if (y_size_req == 0.0) {
            make_scale(
              addressof(t2),
              x_size_req / (xmax - xmin),
              x_size_req / (xmax - xmin),
            );
        } else {
            make_scale(
              addressof(t2),
              x_size_req / (xmax - xmin),
              y_size_req / (ymax - ymin),
            );
        }
        update_corners;
        x_size_req = 0.0;
        y_size_req = 0.0;
        transform_concat(addressof(t), addressof(t2));
    }
⟦1404 Declare procedures needed in |do_extension|⟧ += ⟦
    function load_picture(is_pdf: boolean) {
        var
          pic_path: ^char,
          bounds: real_rect,
          t, t2: transform,
          corners: array [0 .. 3] of real_point,
          x_size_req, y_size_req: real,
          check_keywords: boolean,
          xmin, xmax, ymin, ymax: real,
          i: small_number,
          page: integer,
          pdf_box_type: integer,
          result: integer;
        
        // scan the filename and pack into name_of_file 
        scan_file_name;
        pack_cur_name;
        pdf_box_type = 0;
        page = 0;
        if (is_pdf) {
            if (scan_keyword(strpool!("page"))) {
                scan_int;
                page = cur_val;
            }
            pdf_box_type = pdfbox_none;
            if (scan_keyword(strpool!("crop"))) {
                pdf_box_type = pdfbox_crop;
            } else if (scan_keyword(strpool!("media"))) {
                pdf_box_type = pdfbox_media;
            } else if (scan_keyword(strpool!("bleed"))) {
                pdf_box_type = pdfbox_bleed;
            } else if (scan_keyword(strpool!("trim"))) {
                pdf_box_type = pdfbox_trim;
            } else if (scan_keyword(strpool!("art"))) {
                pdf_box_type = pdfbox_art;
            }
        }
        // access the picture file and check its size
        if (pdf_box_type == pdfbox_none) {
            result = find_pic_file(
              addressof(pic_path),
              addressof(bounds),
              pdfbox_crop,
              page,
            );
        } else {
            result = find_pic_file(
              addressof(pic_path),
              addressof(bounds),
              pdf_box_type,
              page,
            );
        }
        setPoint(
          corners[0],
          xField(bounds),
          yField(bounds),
        );
        setPoint(
          corners[1],
          xField(corners[0]),
          yField(bounds) + htField(bounds),
        );
        setPoint(
          corners[2],
          xField(bounds) + wdField(bounds),
          yField(corners[1]),
        );
        setPoint(
          corners[3],
          xField(corners[2]),
          yField(corners[0]),
        );
        // look for any scaling requests for this picture
        x_size_req = 0.0;
        y_size_req = 0.0;
        make_identity(addressof(t));
        check_keywords = true;
        while (check_keywords) {
            if (scan_keyword(strpool!("scaled"))) {
                scan_int;
                if (
                    (x_size_req == 0.0)
                    && (y_size_req == 0.0)
                ) {
                    make_scale(
                      addressof(t2),
                      float(cur_val) / 1000.0,
                      float(cur_val) / 1000.0,
                    );
                    update_corners;
                    transform_concat(
                      addressof(t),
                      addressof(t2),
                    );
                }
            } else if (scan_keyword(strpool!("xscaled"))) {
                scan_int;
                if (
                    (x_size_req == 0.0)
                    && (y_size_req == 0.0)
                ) {
                    make_scale(
                      addressof(t2),
                      float(cur_val) / 1000.0,
                      1.0,
                    );
                    update_corners;
                    transform_concat(
                      addressof(t),
                      addressof(t2),
                    );
                }
            } else if (scan_keyword(strpool!("yscaled"))) {
                scan_int;
                if (
                    (x_size_req == 0.0)
                    && (y_size_req == 0.0)
                ) {
                    make_scale(
                      addressof(t2),
                      1.0,
                      float(cur_val) / 1000.0,
                    );
                    update_corners;
                    transform_concat(
                      addressof(t),
                      addressof(t2),
                    );
                }
            } else if (scan_keyword(strpool!("width"))) {
                scan_normal_dimen;
                if (cur_val <= 0) {
                    print_err(strpool!("Improper image "));
                    print(strpool!("size ("));
                    print_scaled(cur_val);
                    print(strpool!("pt) will be ignored"));
                    help2(
                      strpool!("I can't scale images to zero or negative sizes,"),
                    )(strpool!("so I'm ignoring this."));
                    error;
                } else {
                    x_size_req = Fix2D(cur_val);
                }
            } else if (scan_keyword(strpool!("height"))) {
                scan_normal_dimen;
                if (cur_val <= 0) {
                    print_err(strpool!("Improper image "));
                    print(strpool!("size ("));
                    print_scaled(cur_val);
                    print(strpool!("pt) will be ignored"));
                    help2(
                      strpool!("I can't scale images to zero or negative sizes,"),
                    )(strpool!("so I'm ignoring this."));
                    error;
                } else {
                    y_size_req = Fix2D(cur_val);
                }
            } else if (scan_keyword(strpool!("rotated"))) {
                scan_decimal;
                if (
                    (x_size_req != 0.0)
                    || (y_size_req != 0.0)
                ) {
                    do_size_requests;
                }
                make_rotation(
                  addressof(t2),
                  Fix2D(cur_val) * 3.141592653589793 / 180.0,
                );
                update_corners;
                calc_min_and_max;
                setPoint(corners[0], xmin, ymin);
                setPoint(corners[1], xmin, ymax);
                setPoint(corners[2], xmax, ymax);
                setPoint(corners[3], xmax, ymin);
                transform_concat(
                  addressof(t),
                  addressof(t2),
                );
            } else {
                check_keywords = false;
            }
        }
        if ((x_size_req != 0.0) || (y_size_req != 0.0)) {
            do_size_requests;
        }
        calc_min_and_max;
        make_translation(
          addressof(t2),
          -xmin * 72 / 72.27,
          -ymin * 72 / 72.27,
        );
        transform_concat(addressof(t), addressof(t2));
        if (result == 0) {
            new_whatsit(
              pic_node,
              
                  pic_node_size
                  + (
                      strlen(pic_path)
                      + sizeof(memory_word) - 1
                  )
                  div sizeof(memory_word)
              ,
            );
            if (is_pdf) {
                subtype(tail) = pdf_node;
            }
            pic_path_length(tail) = strlen(pic_path);
            pic_page(tail) = page;
            pic_pdf_box(tail) = pdf_box_type;
            width(tail) = D2Fix(xmax - xmin);
            height(tail) = D2Fix(ymax - ymin);
            depth(tail) = 0;
            pic_transform1(tail) = D2Fix(aField(t));
            pic_transform2(tail) = D2Fix(bField(t));
            pic_transform3(tail) = D2Fix(cField(t));
            pic_transform4(tail) = D2Fix(dField(t));
            pic_transform5(tail) = D2Fix(xField(t));
            pic_transform6(tail) = D2Fix(yField(t));
            memcpy(
              addressof(mem[tail + pic_node_size]),
              pic_path,
              strlen(pic_path),
            );
            libc_free(pic_path);
        } else {
            print_err(
              strpool!("Unable to load picture or PDF file '"),
            );
            print_file_name(cur_name, cur_area, cur_ext);
            print(ord!("'"));
            if (result == -43) {
                // Mac OS file not found error
                help2(
                  strpool!("The requested image couldn't be read because"),
                )(strpool!("the file was not found."));
            } else {
                // otherwise assume GraphicImport failed
                help2(
                  strpool!("The requested image couldn't be read because"),
                )(
                  strpool!("it was not a recognized image format."),
                );
            }
            error;
        }
    }
⟧
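The `rotated` branch above turns the corner points through the requested angle and then resets them to the axis-aligned rectangle that encloses the result; the final translation moves that rectangle's lower-left corner to the origin, scaled by 72/72.27 (which appears to convert TeX points to big points). A minimal Python sketch of that geometry follows. The names `rotate_corners`, `bounding_box`, and `pt_to_bp` are hypothetical, and the sketch assumes a counter-clockwise rotation about the origin, which this section does not itself state.

```python
import math

def rotate_corners(corners, degrees):
    """Rotate (x, y) points about the origin, counter-clockwise (assumed)."""
    theta = degrees * math.pi / 180.0  # same degree-to-radian factor as above
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in corners]

def bounding_box(corners):
    """Return (xmin, ymin, xmax, ymax) of a list of points."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

def pt_to_bp(value_pt):
    """Scale by 72/72.27, the factor used in the final translation."""
    return value_pt * 72 / 72.27

# A 100 x 50 rectangle rotated by a quarter turn.
corners = [(0.0, 0.0), (0.0, 50.0), (100.0, 50.0), (100.0, 0.0)]
rotated = rotate_corners(corners, 90.0)
xmin, ymin, xmax, ymax = bounding_box(rotated)
# Corners are reset to the enclosing rectangle, in the order used above.
corners = [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin)]
width, height = xmax - xmin, ymax - ymin
shift = (pt_to_bp(-xmin), pt_to_bp(-ymin))
```

Resetting the corners after each rotation means that later keywords operate on the enclosing rectangle rather than on the tilted one.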

1446.

⟦1446 Implement \.{\\XeTeXinputencoding}⟧ = ⟦
    {
        // scan a filename-like arg for the input encoding
        scan_and_pack_name;
        // convert it to mode and encoding values
        i = get_encoding_mode_and_info(addressof(j));
        if (i == XeTeX_input_mode_auto) {
            print_err(
              strpool!("Encoding mode `auto' is not valid for \\XeTeXinputencoding"),
            );
            help2(
              strpool!("You can't use `auto' encoding here, only for \\XeTeXdefaultencoding."),
            )(
              strpool!("I'll ignore this and leave the current encoding unchanged."),
            );
            error;
        } else {
            // apply them to the current input file
            set_input_file_encoding(
              input_file[in_open],
              i,
              j,
            );
        }
    }
⟧

1447.

⟦1447 Implement \.{\\XeTeXdefaultencoding}⟧ = ⟦
    {
        // scan a filename-like arg for the input encoding
        scan_and_pack_name;
        // convert it to mode and encoding values
        i = get_encoding_mode_and_info(addressof(j));
        // store them as defaults for new input files
        XeTeX_default_input_mode = i;
        XeTeX_default_input_encoding = j;
    }
⟧
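Sections 1446 and 1447 differ only in where the scanned mode and encoding end up: the first applies them to the currently open input file and rejects `auto`, while the second stores them as defaults for files opened later. A rough Python model of that split is sketched here. The class `EncodingState`, its method names, the string mode names, and the initial default are all invented for illustration.

```python
class EncodingState:
    """Toy model: per-file encodings plus defaults for files opened later."""

    def __init__(self):
        self.default = ("auto", None)  # assumed initial default, illustration only
        self.files = []                # stack of (mode, encoding) for open files

    def open_file(self):
        self.files.append(self.default)  # a new file starts from the defaults

    def set_input_encoding(self, mode, encoding=None):
        if mode == "auto":
            # Mirrors the error branch: the current encoding is left unchanged.
            return "Encoding mode `auto' is not valid for \\XeTeXinputencoding"
        self.files[-1] = (mode, encoding)
        return None

    def set_default_encoding(self, mode, encoding=None):
        self.default = (mode, encoding)  # `auto` is accepted here

state = EncodingState()
state.set_default_encoding("utf8")
state.open_file()
err = state.set_input_encoding("auto")   # rejected, file keeps "utf8"
```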

1448.

⟦1448 Implement \.{\\XeTeXlinebreaklocale}⟧ = ⟦
    {
        // scan a filename-like arg for the locale name
        scan_file_name;
        if (length(cur_name) == 0) {
            XeTeX_linebreak_locale = 0;
        } else {
            // we ignore the area and extension!
            XeTeX_linebreak_locale = cur_name;
        }
    }
⟧

1449.

⟦13 Global variables⟧ += ⟦
    var pdf_last_x_pos: integer;

    var pdf_last_y_pos: integer;
⟧

1450.

⟦1450 Implement \.{\\pdfsavepos}⟧ = ⟦
    {
        new_whatsit(pdf_save_pos_node, small_node_size);
    }
⟧

1451. [53a] The extended features of 𝜀-TEX. The program has two modes of operation: (1) In TEX compatibility mode it fully deserves the name TEX and there are neither extended features nor additional primitive commands. There are, however, a few modifications that would be legitimate in any implementation of TEX, such as preventing inadequate results of the glue to DVI unit conversion during ship_out. (2) In extended mode there are additional primitive commands and the extended features of 𝜀-TEX are available.

The distinction between these two modes of operation initially takes place when a ‘virgin’ eINITEX starts without reading a format file. Later on the values of all 𝜀-TEX state variables are inherited when eVIRTEX (or eINITEX) reads a format file.

The code below is designed to work for cases where ‘init … tini’ is a run-time switch.

⟦1451 Enable \eTeX, if requested⟧ = ⟦
    init!{
        if (
            (etex_p || (buffer[loc] == ord!("*")))
            && (format_ident == strpool!(" (INITEX)"))
        ) {
            no_new_control_sequence = false;
            ⟦1399 Generate all \eTeX\ primitives⟧
            if (buffer[loc] == ord!("*")) {
                incr(loc);
            }
            // enter extended mode
            eTeX_mode = 1;
            ⟦1624 Initialize variables for \eTeX\ extended mode⟧
        }
    }

    // just entered extended mode ?
    if (!no_new_control_sequence) {
        no_new_control_sequence = true;
    }

    else
⟧
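The test in section 1451 can be read as a small decision function: extended mode is entered only when no format has been loaded yet (the format identifier is still `" (INITEX)"`) and either the `-etex` option was given or the first input character is a star, which is then consumed. The following Python sketch restates that condition. The function name `enable_etex`, the use of a string for `buffer`, and the bounds guard on `loc` are assumptions of the sketch.

```python
INITEX_IDENT = " (INITEX)"

def enable_etex(etex_p, buffer, loc, format_ident):
    """Return (eTeX_mode, new_loc) following the test in section 1451."""
    starred = loc < len(buffer) and buffer[loc] == "*"
    if (etex_p or starred) and format_ident == INITEX_IDENT:
        if starred:
            loc += 1       # consume the star
        return 1, loc      # extended mode
    return 0, loc          # mode stays at its initial value, compatibility
```

A star on the first line has no effect once a format file has been read, because `format_ident` then no longer equals the INITEX identifier.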

1452. The 𝜀-TEX features available in extended mode are grouped into two categories: (1) Some of them are permanently enabled and have no semantic effect as long as none of the additional primitives are executed. (2) The remaining 𝜀-TEX features are optional and can be individually enabled and disabled. For each optional feature there is an 𝜀-TEX state variable named \...state; the feature is enabled by assigning a positive value to that integer and disabled by assigning a non-positive value.

@define eTeX_state_base => int_base + eTeX_state_code
// an \eTeX\ state variable
@define eTeX_state(#) => eqtb[eTeX_state_base + #].int
// code for \.{\\eTeXversion}
@define eTeX_version_code => eTeX_int
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("lastnodetype"),
      last_item,
      last_node_type_code,
    )

    primitive(
      strpool!("eTeXversion"),
      last_item,
      eTeX_version_code,
    )

    primitive(
      strpool!("eTeXrevision"),
      convert,
      eTeX_revision_code,
    )

    primitive(
      strpool!("XeTeXversion"),
      last_item,
      XeTeX_version_code,
    )

    primitive(
      strpool!("XeTeXrevision"),
      convert,
      XeTeX_revision_code,
    )

    primitive(
      strpool!("XeTeXcountglyphs"),
      last_item,
      XeTeX_count_glyphs_code,
    )

    primitive(
      strpool!("XeTeXcountvariations"),
      last_item,
      XeTeX_count_variations_code,
    )

    primitive(
      strpool!("XeTeXvariation"),
      last_item,
      XeTeX_variation_code,
    )

    primitive(
      strpool!("XeTeXfindvariationbyname"),
      last_item,
      XeTeX_find_variation_by_name_code,
    )

    primitive(
      strpool!("XeTeXvariationmin"),
      last_item,
      XeTeX_variation_min_code,
    )

    primitive(
      strpool!("XeTeXvariationmax"),
      last_item,
      XeTeX_variation_max_code,
    )

    primitive(
      strpool!("XeTeXvariationdefault"),
      last_item,
      XeTeX_variation_default_code,
    )

    primitive(
      strpool!("XeTeXcountfeatures"),
      last_item,
      XeTeX_count_features_code,
    )

    primitive(
      strpool!("XeTeXfeaturecode"),
      last_item,
      XeTeX_feature_code_code,
    )

    primitive(
      strpool!("XeTeXfindfeaturebyname"),
      last_item,
      XeTeX_find_feature_by_name_code,
    )

    primitive(
      strpool!("XeTeXisexclusivefeature"),
      last_item,
      XeTeX_is_exclusive_feature_code,
    )

    primitive(
      strpool!("XeTeXcountselectors"),
      last_item,
      XeTeX_count_selectors_code,
    )

    primitive(
      strpool!("XeTeXselectorcode"),
      last_item,
      XeTeX_selector_code_code,
    )

    primitive(
      strpool!("XeTeXfindselectorbyname"),
      last_item,
      XeTeX_find_selector_by_name_code,
    )

    primitive(
      strpool!("XeTeXisdefaultselector"),
      last_item,
      XeTeX_is_default_selector_code,
    )

    primitive(
      strpool!("XeTeXvariationname"),
      convert,
      XeTeX_variation_name_code,
    )

    primitive(
      strpool!("XeTeXfeaturename"),
      convert,
      XeTeX_feature_name_code,
    )

    primitive(
      strpool!("XeTeXselectorname"),
      convert,
      XeTeX_selector_name_code,
    )

    primitive(
      strpool!("XeTeXOTcountscripts"),
      last_item,
      XeTeX_OT_count_scripts_code,
    )

    primitive(
      strpool!("XeTeXOTcountlanguages"),
      last_item,
      XeTeX_OT_count_languages_code,
    )

    primitive(
      strpool!("XeTeXOTcountfeatures"),
      last_item,
      XeTeX_OT_count_features_code,
    )

    primitive(
      strpool!("XeTeXOTscripttag"),
      last_item,
      XeTeX_OT_script_code,
    )

    primitive(
      strpool!("XeTeXOTlanguagetag"),
      last_item,
      XeTeX_OT_language_code,
    )

    primitive(
      strpool!("XeTeXOTfeaturetag"),
      last_item,
      XeTeX_OT_feature_code,
    )

    primitive(
      strpool!("XeTeXcharglyph"),
      last_item,
      XeTeX_map_char_to_glyph_code,
    )

    primitive(
      strpool!("XeTeXglyphindex"),
      last_item,
      XeTeX_glyph_index_code,
    )

    primitive(
      strpool!("XeTeXglyphbounds"),
      last_item,
      XeTeX_glyph_bounds_code,
    )

    primitive(
      strpool!("XeTeXglyphname"),
      convert,
      XeTeX_glyph_name_code,
    )

    primitive(
      strpool!("XeTeXfonttype"),
      last_item,
      XeTeX_font_type_code,
    )

    primitive(
      strpool!("XeTeXfirstfontchar"),
      last_item,
      XeTeX_first_char_code,
    )

    primitive(
      strpool!("XeTeXlastfontchar"),
      last_item,
      XeTeX_last_char_code,
    )

    primitive(
      strpool!("XeTeXpdfpagecount"),
      last_item,
      XeTeX_pdf_page_count_code,
    )
⟧

1453.

⟦1453 Cases of |last_item| for |print_cmd_chr|⟧ = ⟦
    last_node_type_code:

    print_esc(strpool!("lastnodetype"))

    eTeX_version_code:

    print_esc(strpool!("eTeXversion"))

    XeTeX_version_code:

    print_esc(strpool!("XeTeXversion"))

    XeTeX_count_glyphs_code:

    print_esc(strpool!("XeTeXcountglyphs"))

    XeTeX_count_variations_code:

    print_esc(strpool!("XeTeXcountvariations"))

    XeTeX_variation_code:

    print_esc(strpool!("XeTeXvariation"))

    XeTeX_find_variation_by_name_code:

    print_esc(strpool!("XeTeXfindvariationbyname"))

    XeTeX_variation_min_code:

    print_esc(strpool!("XeTeXvariationmin"))

    XeTeX_variation_max_code:

    print_esc(strpool!("XeTeXvariationmax"))

    XeTeX_variation_default_code:

    print_esc(strpool!("XeTeXvariationdefault"))

    XeTeX_count_features_code:

    print_esc(strpool!("XeTeXcountfeatures"))

    XeTeX_feature_code_code:

    print_esc(strpool!("XeTeXfeaturecode"))

    XeTeX_find_feature_by_name_code:

    print_esc(strpool!("XeTeXfindfeaturebyname"))

    XeTeX_is_exclusive_feature_code:

    print_esc(strpool!("XeTeXisexclusivefeature"))

    XeTeX_count_selectors_code:

    print_esc(strpool!("XeTeXcountselectors"))

    XeTeX_selector_code_code:

    print_esc(strpool!("XeTeXselectorcode"))

    XeTeX_find_selector_by_name_code:

    print_esc(strpool!("XeTeXfindselectorbyname"))

    XeTeX_is_default_selector_code:

    print_esc(strpool!("XeTeXisdefaultselector"))

    XeTeX_OT_count_scripts_code:

    print_esc(strpool!("XeTeXOTcountscripts"))

    XeTeX_OT_count_languages_code:

    print_esc(strpool!("XeTeXOTcountlanguages"))

    XeTeX_OT_count_features_code:

    print_esc(strpool!("XeTeXOTcountfeatures"))

    XeTeX_OT_script_code:

    print_esc(strpool!("XeTeXOTscripttag"))

    XeTeX_OT_language_code:

    print_esc(strpool!("XeTeXOTlanguagetag"))

    XeTeX_OT_feature_code:

    print_esc(strpool!("XeTeXOTfeaturetag"))

    XeTeX_map_char_to_glyph_code:

    print_esc(strpool!("XeTeXcharglyph"))

    XeTeX_glyph_index_code:

    print_esc(strpool!("XeTeXglyphindex"))

    XeTeX_glyph_bounds_code:

    print_esc(strpool!("XeTeXglyphbounds"))

    XeTeX_font_type_code:

    print_esc(strpool!("XeTeXfonttype"))

    XeTeX_first_char_code:

    print_esc(strpool!("XeTeXfirstfontchar"))

    XeTeX_last_char_code:

    print_esc(strpool!("XeTeXlastfontchar"))

    XeTeX_pdf_page_count_code:

    print_esc(strpool!("XeTeXpdfpagecount"))
⟧
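Sections 1452 and 1453 have to be kept in step: every `primitive(name, cmd, chr)` registration needs a `print_cmd_chr` case that maps the same (command, code) pair back to the name. One way to picture the relationship is a single table from which both directions are derived, as in the Python sketch here. The integer codes are placeholders rather than the real values, and `print_cmd_chr` returns a string instead of printing.

```python
LAST_ITEM, CONVERT = "last_item", "convert"

# (name, command, character code); the integer codes are placeholders.
PRIMITIVES = [
    ("lastnodetype", LAST_ITEM, 3),
    ("eTeXversion", LAST_ITEM, 4),
    ("XeTeXversion", LAST_ITEM, 5),
    ("eTeXrevision", CONVERT, 5),
    ("XeTeXrevision", CONVERT, 6),
]

# Forward direction: what registering the primitives establishes.
lookup = {name: (cmd, code) for name, cmd, code in PRIMITIVES}
# Reverse direction: what the print_cmd_chr cases spell out by hand.
reverse = {(cmd, code): name for name, cmd, code in PRIMITIVES}

def print_cmd_chr(cmd, code):
    """Return the escaped primitive name for a (command, code) pair."""
    name = reverse.get((cmd, code))
    return "\\" + name if name is not None else "[unknown]"
```

The same character code can occur under two different commands (5 appears twice above), which is why the reverse lookup is keyed on the pair and why the cases for `last_item` and `convert` live in separate modules.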

1454.

⟦1454 Cases for fetching an integer value⟧ = ⟦
    eTeX_version_code:

    cur_val = eTeX_version

    XeTeX_version_code:

    cur_val = XeTeX_version

    XeTeX_count_glyphs_code:

    {
        scan_font_ident;
        n = cur_val;
        if (is_aat_font(n)) {
            cur_val = aat_font_get(
              m - XeTeX_int,
              font_layout_engine[n],
            );
        } else if (is_otgr_font(n)) {
            cur_val = ot_font_get(
              m - XeTeX_int,
              font_layout_engine[n],
            );
        } else {
            cur_val = 0;
        }
    }

    XeTeX_count_features_code:

    {
        scan_font_ident;
        n = cur_val;
        if (is_aat_font(n)) {
            cur_val = aat_font_get(
              m - XeTeX_int,
              font_layout_engine[n],
            );
        } else if (is_gr_font(n)) {
            cur_val = ot_font_get(
              m - XeTeX_int,
              font_layout_engine[n],
            );
        } else {
            cur_val = 0;
        }
    }

    
    XeTeX_variation_code,
    XeTeX_variation_min_code,
    XeTeX_variation_max_code,
    XeTeX_variation_default_code,
    XeTeX_count_variations_code:
      scan_font_ident;
      n = cur_val;
      // Deprecated
      cur_val = 0;

    
    XeTeX_feature_code_code,
    XeTeX_is_exclusive_feature_code,
    XeTeX_count_selectors_code:
      scan_font_ident;
      n = cur_val;
      if (is_aat_font(n)) {
          scan_int;
          k = cur_val;
          cur_val = aat_font_get_1(
            m - XeTeX_int,
            font_layout_engine[n],
            k,
          );
      } else if (is_gr_font(n)) {
          scan_int;
          k = cur_val;
          cur_val = ot_font_get_1(
            m - XeTeX_int,
            font_layout_engine[n],
            k,
          );
      } else {
          not_aat_gr_font_error(last_item, m, n);
          cur_val = -1;
      }

    
    XeTeX_selector_code_code,
    XeTeX_is_default_selector_code:
      scan_font_ident;
      n = cur_val;
      if (is_aat_font(n)) {
          scan_int;
          k = cur_val;
          scan_int;
          cur_val = aat_font_get_2(
            m - XeTeX_int,
            font_layout_engine[n],
            k,
            cur_val,
          );
      } else if (is_gr_font(n)) {
          scan_int;
          k = cur_val;
          scan_int;
          cur_val = ot_font_get_2(
            m - XeTeX_int,
            font_layout_engine[n],
            k,
            cur_val,
          );
      } else {
          not_aat_gr_font_error(last_item, m, n);
          cur_val = -1;
      }

    XeTeX_find_variation_by_name_code:

    {
        scan_font_ident;
        n = cur_val;
        if (is_aat_font(n)) {
            scan_and_pack_name;
            cur_val = aat_font_get_named(
              m - XeTeX_int,
              font_layout_engine[n],
            );
        } else {
            not_aat_font_error(last_item, m, n);
            cur_val = -1;
        }
    }

    XeTeX_find_feature_by_name_code:

    {
        scan_font_ident;
        n = cur_val;
        if (is_aat_font(n)) {
            scan_and_pack_name;
            cur_val = aat_font_get_named(
              m - XeTeX_int,
              font_layout_engine[n],
            );
        } else if (is_gr_font(n)) {
            scan_and_pack_name;
            cur_val = gr_font_get_named(
              m - XeTeX_int,
              font_layout_engine[n],
            );
        } else {
            not_aat_gr_font_error(last_item, m, n);
            cur_val = -1;
        }
    }

    XeTeX_find_selector_by_name_code:

    {
        scan_font_ident;
        n = cur_val;
        if (is_aat_font(n)) {
            scan_int;
            k = cur_val;
            scan_and_pack_name;
            cur_val = aat_font_get_named_1(
              m - XeTeX_int,
              font_layout_engine[n],
              k,
            );
        } else if (is_gr_font(n)) {
            scan_int;
            k = cur_val;
            scan_and_pack_name;
            cur_val = gr_font_get_named_1(
              m - XeTeX_int,
              font_layout_engine[n],
              k,
            );
        } else {
            not_aat_gr_font_error(last_item, m, n);
            cur_val = -1;
        }
    }

    XeTeX_OT_count_scripts_code:

    {
        scan_font_ident;
        n = cur_val;
        if (is_ot_font(n)) {
            cur_val = ot_font_get(
              m - XeTeX_int,
              font_layout_engine[n],
            );
        } else {
            cur_val = 0;
        }
    }

    XeTeX_OT_count_languages_code, XeTeX_OT_script_code:
      scan_font_ident;
      n = cur_val;
      if (is_ot_font(n)) {
          scan_int;
          cur_val = ot_font_get_1(
            m - XeTeX_int,
            font_layout_engine[n],
            cur_val,
          );
      } else {
          not_ot_font_error(last_item, m, n);
          cur_val = -1;
      }

    XeTeX_OT_count_features_code, XeTeX_OT_language_code:
      scan_font_ident;
      n = cur_val;
      if (is_ot_font(n)) {
          scan_int;
          k = cur_val;
          scan_int;
          cur_val = ot_font_get_2(
            m - XeTeX_int,
            font_layout_engine[n],
            k,
            cur_val,
          );
      } else {
          not_ot_font_error(last_item, m, n);
          cur_val = -1;
      }

    XeTeX_OT_feature_code:

    {
        scan_font_ident;
        n = cur_val;
        if (is_ot_font(n)) {
            scan_int;
            k = cur_val;
            scan_int;
            kk = cur_val;
            scan_int;
            cur_val = ot_font_get_3(
              m - XeTeX_int,
              font_layout_engine[n],
              k,
              kk,
              cur_val,
            );
        } else {
            not_ot_font_error(last_item, m, n);
            cur_val = -1;
        }
    }

    XeTeX_map_char_to_glyph_code:

    {
        if (is_native_font(cur_font)) {
            scan_int;
            n = cur_val;
            cur_val = map_char_to_glyph(cur_font, n);
        } else {
            not_native_font_error(last_item, m, cur_font);
            cur_val = 0;
        }
    }

    XeTeX_glyph_index_code:

    {
        if (is_native_font(cur_font)) {
            scan_and_pack_name;
            cur_val = map_glyph_to_index(cur_font);
        } else {
            not_native_font_error(last_item, m, cur_font);
            cur_val = 0;
        }
    }

    XeTeX_font_type_code:

    {
        scan_font_ident;
        n = cur_val;
        if (is_aat_font(n)) {
            cur_val = 1;
        } else if (is_ot_font(n)) {
            cur_val = 2;
        } else if (is_gr_font(n)) {
            cur_val = 3;
        } else {
            cur_val = 0;
        }
    }

    XeTeX_first_char_code, XeTeX_last_char_code:
      scan_font_ident;
      n = cur_val;
      if (is_native_font(n)) {
          cur_val = get_font_char_range(
            n,
            m == XeTeX_first_char_code,
          );
      } else {
          if (m == XeTeX_first_char_code) {
              cur_val = font_bc[n];
          } else {
              cur_val = font_ec[n];
          }
      }

    pdf_last_x_pos_code:

    cur_val = pdf_last_x_pos

    pdf_last_y_pos_code:

    cur_val = pdf_last_y_pos

    XeTeX_pdf_page_count_code:

    {
        scan_and_pack_name;
        cur_val = count_pdf_file_pages;
    }
⟧
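The cases in section 1454 follow two fallback conventions. Counting queries quietly yield 0 for a font of the wrong kind, whereas indexed queries report an error and yield -1. The Python sketch here models both, along with the font-type codes returned for `XeTeX_font_type_code`. The `Font` class, its `engine` tag, and the callback parameters are hypothetical stand-ins for the font tables and layout-engine calls.

```python
from dataclasses import dataclass

@dataclass
class Font:
    name: str
    engine: str = "tfm"  # hypothetical tag: "aat", "ot", "gr", or "tfm"

def font_type(font):
    """1 = AAT, 2 = OpenType, 3 = Graphite, 0 = anything else."""
    return {"aat": 1, "ot": 2, "gr": 3}.get(font.engine, 0)

def count_features(font, engine_query):
    """Counting query: silent fallback to 0 for other fonts."""
    if font.engine in ("aat", "gr"):
        return engine_query(font)
    return 0

def feature_code(font, index, engine_query, errors):
    """Indexed query: record an error and return -1 for other fonts."""
    if font.engine in ("aat", "gr"):
        return engine_query(font, index)
    errors.append(("not_aat_gr_font", font.name))
    return -1
```

With this split, a macro can loop from 0 up to a count without first checking the font type, while a stray indexed query on the wrong kind of font is still reported.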

1455. Slip in an extra procedure here and there....

⟦82 Error handling procedures⟧ += ⟦
    forward_declaration scan_and_pack_name();
⟧

1456.

⟦1404 Declare procedures needed in |do_extension|⟧ += ⟦
    function scan_and_pack_name() {
        scan_file_name;
        pack_cur_name;
    }
⟧

1457.

⟦328 Declare the procedure called |print_cmd_chr|⟧ += ⟦
    function not_aat_font_error(
      cmd, c: integer,
      f: integer,
    ) {
        print_err(strpool!("Cannot use "));
        print_cmd_chr(cmd, c);
        print(strpool!(" with "));
        print(font_name[f]);
        print(strpool!("; not an AAT font"));
        error;
    }

    function not_aat_gr_font_error(
      cmd, c: integer,
      f: integer,
    ) {
        print_err(strpool!("Cannot use "));
        print_cmd_chr(cmd, c);
        print(strpool!(" with "));
        print(font_name[f]);
        print(strpool!("; not an AAT or Graphite font"));
        error;
    }

    function not_ot_font_error(
      cmd, c: integer,
      f: integer,
    ) {
        print_err(strpool!("Cannot use "));
        print_cmd_chr(cmd, c);
        print(strpool!(" with "));
        print(font_name[f]);
        print(strpool!("; not an OpenType Layout font"));
        error;
    }

    function not_native_font_error(
      cmd, c: integer,
      f: integer,
    ) {
        print_err(strpool!("Cannot use "));
        print_cmd_chr(cmd, c);
        print(strpool!(" with "));
        print(font_name[f]);
        print(strpool!("; not a native platform font"));
        error;
    }
⟧
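The four procedures in section 1457 differ only in the closing phrase of the message. As an illustration of the shared shape, the Python sketch here builds the same text from one helper. The helper name `font_kind_error` and the `EXPECTED` table are invented for this sketch, and the text that `print_err` adds at the front of a message is left out.

```python
EXPECTED = {
    "aat": "an AAT font",
    "aat_gr": "an AAT or Graphite font",
    "ot": "an OpenType Layout font",
    "native": "a native platform font",
}

def font_kind_error(command_name, font_name, expected):
    """Build the message body shared by the four procedures above."""
    return f"Cannot use \\{command_name} with {font_name}; not {expected}"

message = font_kind_error("XeTeXglyphname", "cmr10", EXPECTED["native"])
```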

1458.

⟦1458 Cases for fetching a dimension value⟧ = ⟦
    XeTeX_glyph_bounds_code:

    {
        if (is_native_font(cur_font)) {
            scan_int;
            // which edge: 1=left, 2=top, 3=right, 4=bottom
            n = cur_val;
            if ((n < 1) || (n > 4)) {
                print_err(
                  strpool!("\\\\XeTeXglyphbounds requires an edge index from 1 to 4;"),
                );
                print_nl(
                  strpool!("I don't know anything about edge "),
                );
                print_int(n);
                error;
                cur_val = 0;
            } else {
                // glyph number
                scan_int;
                cur_val = get_glyph_bounds(
                  cur_font,
                  n,
                  cur_val,
                );
            }
        } else {
            not_native_font_error(last_item, m, cur_font);
            cur_val = 0;
        }
    }
⟧
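The edge argument of `\XeTeXglyphbounds` is validated before the glyph number is scanned, and an out-of-range edge yields 0 after the error. A small Python sketch of that check follows. The `bounds` dictionary and the returned `(value, error)` pair are assumptions made so the example is self-contained.

```python
EDGES = {1: "left", 2: "top", 3: "right", 4: "bottom"}

def glyph_bound(bounds, edge):
    """Return (value, error); bounds maps edge names to numbers."""
    if edge < 1 or edge > 4:
        return 0, f"I don't know anything about edge {edge}"
    return bounds[EDGES[edge]], None

box = {"left": 1, "top": 2, "right": 3, "bottom": 4}
value, error = glyph_bound(box, 2)
```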

1459.

⟦1459 Cases of |convert| for |print_cmd_chr|⟧ = ⟦
    XeTeX_revision_code:

    print_esc(strpool!("XeTeXrevision"))

    XeTeX_variation_name_code:

    print_esc(strpool!("XeTeXvariationname"))

    XeTeX_feature_name_code:

    print_esc(strpool!("XeTeXfeaturename"))

    XeTeX_selector_name_code:

    print_esc(strpool!("XeTeXselectorname"))

    XeTeX_glyph_name_code:

    print_esc(strpool!("XeTeXglyphname"))

    XeTeX_Uchar_code:

    print_esc(strpool!("Uchar"))

    XeTeX_Ucharcat_code:

    print_esc(strpool!("Ucharcat"))
⟧

1460.

⟦1460 Cases of `Scan the argument for command |c|'⟧ = ⟦
    XeTeX_revision_code:

    do_nothing

    XeTeX_variation_name_code:

    {
        scan_font_ident;
        fnt = cur_val;
        if (is_aat_font(fnt)) {
            scan_int;
            arg1 = cur_val;
            arg2 = 0;
        } else {
            not_aat_font_error(convert, c, fnt);
        }
    }

    XeTeX_feature_name_code:

    {
        scan_font_ident;
        fnt = cur_val;
        if (is_aat_font(fnt) || is_gr_font(fnt)) {
            scan_int;
            arg1 = cur_val;
            arg2 = 0;
        } else {
            not_aat_gr_font_error(convert, c, fnt);
        }
    }

    XeTeX_selector_name_code:

    {
        scan_font_ident;
        fnt = cur_val;
        if (is_aat_font(fnt) || is_gr_font(fnt)) {
            scan_int;
            arg1 = cur_val;
            scan_int;
            arg2 = cur_val;
        } else {
            not_aat_gr_font_error(convert, c, fnt);
        }
    }

    XeTeX_glyph_name_code:

    {
        scan_font_ident;
        fnt = cur_val;
        if (is_native_font(fnt)) {
            scan_int;
            arg1 = cur_val;
        } else {
            not_native_font_error(convert, c, fnt);
        }
    }
⟧

1461.

⟦1461 Cases of `Print the result of command |c|'⟧ = ⟦
    XeTeX_revision_code:

    print(XeTeX_revision)

    XeTeX_variation_name_code:

    if (is_aat_font(fnt)) {
        aat_print_font_name(
          c,
          font_layout_engine[fnt],
          arg1,
          arg2,
        );
    }

    XeTeX_feature_name_code, XeTeX_selector_name_code:
      if (is_aat_font(fnt)) {
          aat_print_font_name(
            c,
            font_layout_engine[fnt],
            arg1,
            arg2,
          );
      } else if (is_gr_font(fnt)) {
          gr_print_font_name(
            c,
            font_layout_engine[fnt],
            arg1,
            arg2,
          );
      }

    XeTeX_glyph_name_code:

    if (is_native_font(fnt)) {
        print_glyph_name(fnt, arg1);
    }
⟧

1462.

// is this extended mode?
@define eTeX_ex => (eTeX_mode == 1)
⟦13 Global variables⟧ += ⟦
    // identifies compatibility and extended mode
    var eTeX_mode: 0 .. 1;

    // was the -etex option specified
    var etex_p: boolean;
⟧

1463.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    eTeX_mode = 0 // initially we are in compatibility mode

    ⟦1623 Initialize variables for \eTeX\ compatibility mode⟧
⟧

1464.

⟦1464 Dump the \eTeX\ state⟧ = ⟦
    // in a deliberate change from e-TeX, we allow non-zero 
    // state variables to be dumped
    dump_int(eTeX_mode)
⟧

1465.

⟦1465 Undump the \eTeX\ state⟧ = ⟦
    undump(0)(1)(eTeX_mode)

    if (eTeX_ex) {
        ⟦1624 Initialize variables for \eTeX\ extended mode⟧
    } else {
        ⟦1623 Initialize variables for \eTeX\ compatibility mode⟧
    }
⟧
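`undump(0)(1)(eTeX_mode)` reads as a range-checked read: the stored integer is accepted only if it lies between 0 and 1. The Python sketch here shows the dump and undump pair under that reading. The four-byte little-endian layout, the `BadFormat` exception, and the function signatures are choices of the sketch and make no claim about the real format-file layout.

```python
import io
import struct

class BadFormat(Exception):
    """Raised when a value read from the format is out of range."""

def dump_int(stream, value):
    stream.write(struct.pack("<i", value))

def undump(stream, low, high):
    """Read one integer and insist that low <= value <= high."""
    (value,) = struct.unpack("<i", stream.read(4))
    if not low <= value <= high:
        raise BadFormat(f"value {value} outside {low}..{high}")
    return value

fmt = io.BytesIO()
dump_int(fmt, 1)                 # written while dumping, extended mode
fmt.seek(0)
eTeX_mode = undump(fmt, 0, 1)    # read back when the format is loaded
```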

1466. The eTeX_enabled function simply returns its first argument as result. This argument is true if an optional 𝜀-TEX feature is currently enabled; otherwise, if the argument is false, the function gives an error message.

⟦1466 Declare \eTeX\ procedures for use by |main_control|⟧ = ⟦
    function eTeX_enabled(
      b: boolean,
      j: quarterword,
      k: halfword,
    ): boolean {
        if (!b) {
            print_err(strpool!("Improper "));
            print_cmd_chr(j, k);
            help1(
              strpool!("Sorry, this optional e-TeX feature has been disabled."),
            );
            error;
        }
        eTeX_enabled = b;
    }
⟧
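Taken together with the rule that an optional feature is enabled by a positive state value, `eTeX_enabled` acts as a guard that passes its argument through and complains only when it is false. A Python sketch of the pair follows. The function names, the `errors` list, and the use of `beginL` as the guarded command are illustrative assumptions.

```python
def is_enabled(state_value):
    """Positive enables, zero or negative disables."""
    return state_value > 0

def etex_enabled(b, command_name, errors):
    """Return b unchanged; record an error message when b is false."""
    if not b:
        errors.append(f"Improper \\{command_name}")
    return b

errors = []
allowed = etex_enabled(is_enabled(0), "beginL", errors)
```

Because the guard returns its argument, a caller can place it directly in a condition and proceed only when the feature is on.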

1467. First we implement the additional 𝜀-TEX parameters in the table of equivalents.

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("everyeof"),
      assign_toks,
      every_eof_loc,
    )

    primitive(
      strpool!("tracingassigns"),
      assign_int,
      int_base + tracing_assigns_code,
    )

    primitive(
      strpool!("tracinggroups"),
      assign_int,
      int_base + tracing_groups_code,
    )

    primitive(
      strpool!("tracingifs"),
      assign_int,
      int_base + tracing_ifs_code,
    )

    primitive(
      strpool!("tracingscantokens"),
      assign_int,
      int_base + tracing_scan_tokens_code,
    )

    primitive(
      strpool!("tracingnesting"),
      assign_int,
      int_base + tracing_nesting_code,
    )

    primitive(
      strpool!("predisplaydirection"),
      assign_int,
      int_base + pre_display_direction_code,
    )

    primitive(
      strpool!("lastlinefit"),
      assign_int,
      int_base + last_line_fit_code,
    )

    primitive(
      strpool!("savingvdiscards"),
      assign_int,
      int_base + saving_vdiscards_code,
    )

    primitive(
      strpool!("savinghyphcodes"),
      assign_int,
      int_base + saving_hyph_codes_code,
    )
⟧

1468.

@define every_eof => equiv(every_eof_loc)
⟦1468 Cases of |assign_toks| for |print_cmd_chr|⟧ = ⟦
    every_eof_loc:

    print_esc(strpool!("everyeof"))

    XeTeX_inter_char_loc:

    print_esc(strpool!("XeTeXinterchartoks"))
⟧

1469.

⟦1469 Cases for |print_param|⟧ = ⟦
    tracing_assigns_code:

    print_esc(strpool!("tracingassigns"))

    tracing_groups_code:

    print_esc(strpool!("tracinggroups"))

    tracing_ifs_code:

    print_esc(strpool!("tracingifs"))

    tracing_scan_tokens_code:

    print_esc(strpool!("tracingscantokens"))

    tracing_nesting_code:

    print_esc(strpool!("tracingnesting"))

    pre_display_direction_code:

    print_esc(strpool!("predisplaydirection"))

    last_line_fit_code:

    print_esc(strpool!("lastlinefit"))

    saving_vdiscards_code:

    print_esc(strpool!("savingvdiscards"))

    saving_hyph_codes_code:

    print_esc(strpool!("savinghyphcodes"))
⟧

1470. In order to handle \everyeof we need an array eof_seen of boolean variables.

⟦13 Global variables⟧ += ⟦
    // has eof been seen?
    var eof_seen: ^boolean;
⟧

1471. The print_group procedure prints the current level of grouping and the name corresponding to cur_group.

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    function print_group(e: boolean) {
        label exit;
        
        case cur_group {
          bottom_level:
            print(strpool!("bottom level"));
            return;
          simple_group, semi_simple_group:
            if (cur_group == semi_simple_group) {
                print(strpool!("semi "));
            }
            print(strpool!("simple"));
          hbox_group, adjusted_hbox_group:
            if (cur_group == adjusted_hbox_group) {
                print(strpool!("adjusted "));
            }
            print(strpool!("hbox"));
          vbox_group:
            print(strpool!("vbox"));
          vtop_group:
            print(strpool!("vtop"));
          align_group, no_align_group:
            if (cur_group == no_align_group) {
                print(strpool!("no "));
            }
            print(strpool!("align"));
          output_group:
            print(strpool!("output"));
          disc_group:
            print(strpool!("disc"));
          insert_group:
            print(strpool!("insert"));
          vcenter_group:
            print(strpool!("vcenter"));
          math_group,
          math_choice_group,
          math_shift_group,
          math_left_group:
            print(strpool!("math"));
            if (cur_group == math_choice_group) {
                print(strpool!(" choice"));
            } else if (cur_group == math_shift_group) {
                print(strpool!(" shift"));
            } else if (cur_group == math_left_group) {
                print(strpool!(" left"));
            }
          // there are no other cases
        }
        print(strpool!(" group (level "));
        print_int(qo(cur_level));
        print_char(ord!(")"));
        if (saved(-1) != 0) {
            if (e) {
                print(strpool!(" entered at line "));
            } else {
                print(strpool!(" at line "));
            }
            print_int(saved(-1));
        }
      exit:
    }
⟧
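
The message assembled by print_group can be illustrated with a short Python sketch. The string keys and the `describe_group` helper are hypothetical stand-ins for the group codes and the printing routines; only the wording of the output follows the code above.

```python
# Hypothetical table: group code name -> text printed before " group".
GROUP_NAMES = {
    "simple_group": "simple",
    "semi_simple_group": "semi simple",
    "hbox_group": "hbox",
    "adjusted_hbox_group": "adjusted hbox",
    "vbox_group": "vbox",
    "vtop_group": "vtop",
    "align_group": "align",
    "no_align_group": "no align",
    "output_group": "output",
    "disc_group": "disc",
    "insert_group": "insert",
    "vcenter_group": "vcenter",
    "math_group": "math",
    "math_choice_group": "math choice",
    "math_shift_group": "math shift",
    "math_left_group": "math left",
}


def describe_group(group, level, line, entering):
    """Build the text that print_group would print for one group."""
    if group == "bottom_level":
        return "bottom level"          # no level or line number is shown
    text = f"{GROUP_NAMES[group]} group (level {level})"
    if line != 0:                      # a zero line number is suppressed
        text += " entered at line " if entering else " at line "
        text += str(line)
    return text
```

For instance, `describe_group("semi_simple_group", 2, 5, True)` yields `semi simple group (level 2) entered at line 5`.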

1472. The group_trace procedure is called when a new level of grouping begins (e == false) or ends (e == true), with saved(-1) containing the line number.

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    stat!{
        function group_trace(e: boolean) {
            begin_diagnostic;
            print_char(ord!("{"));
            if (e) {
                print(strpool!("leaving "));
            } else {
                print(strpool!("entering "));
            }
            print_group(e);
            print_char(ord!("}"));
            end_diagnostic(false);
        }
    }
⟧

1473. The \currentgrouplevel and \currentgrouptype commands return the current level of grouping and the type of the current group respectively.

// code for \.{\\currentgrouplevel}
@define current_group_level_code => eTeX_int + 1
// code for \.{\\currentgrouptype}
@define current_group_type_code => eTeX_int + 2
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("currentgrouplevel"),
      last_item,
      current_group_level_code,
    )

    primitive(
      strpool!("currentgrouptype"),
      last_item,
      current_group_type_code,
    )
⟧

1474.

⟦1453 Cases of |last_item| for |print_cmd_chr|⟧ += ⟦
    current_group_level_code:

    print_esc(strpool!("currentgrouplevel"))

    current_group_type_code:

    print_esc(strpool!("currentgrouptype"))
⟧

1475.

⟦1454 Cases for fetching an integer value⟧ += ⟦
    current_group_level_code:

    cur_val = cur_level - level_one

    current_group_type_code:

    cur_val = cur_group
⟧

1476. The \currentiflevel, \currentiftype, and \currentifbranch commands return the current level of conditionals and the type and branch of the current conditional.

// code for \.{\\currentiflevel}
@define current_if_level_code => eTeX_int + 3
// code for \.{\\currentiftype}
@define current_if_type_code => eTeX_int + 4
// code for \.{\\currentifbranch}
@define current_if_branch_code => eTeX_int + 5
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("currentiflevel"),
      last_item,
      current_if_level_code,
    )

    primitive(
      strpool!("currentiftype"),
      last_item,
      current_if_type_code,
    )

    primitive(
      strpool!("currentifbranch"),
      last_item,
      current_if_branch_code,
    )
⟧

1477.

⟦1453 Cases of |last_item| for |print_cmd_chr|⟧ += ⟦
    current_if_level_code:

    print_esc(strpool!("currentiflevel"))

    current_if_type_code:

    print_esc(strpool!("currentiftype"))

    current_if_branch_code:

    print_esc(strpool!("currentifbranch"))
⟧

1478.

⟦1454 Cases for fetching an integer value⟧ += ⟦
    current_if_level_code:

    {
        q = cond_ptr;
        cur_val = 0;
        while (q != null) {
            incr(cur_val);
            q = link(q);
        }
    }

    current_if_type_code:

    if (cond_ptr == null) {
        cur_val = 0;
    } else if (cur_if < unless_code) {
        cur_val = cur_if + 1;
    } else {
        cur_val = -(cur_if - unless_code + 1);
    }

    current_if_branch_code:

    if ((if_limit == or_code) || (if_limit == else_code)) {
        cur_val = 1;
    } else if (if_limit == fi_code) {
        cur_val = -1;
    } else {
        cur_val = 0;
    }
⟧
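
The encoding returned by \currentiftype and \currentifbranch can be restated as a small Python sketch. The numeric constants below are assumptions chosen for illustration (the real values are defined elsewhere in the program); only the arithmetic is taken from the code above.

```python
# Assumed values, for illustration only.
UNLESS_CODE = 32                      # offset added for an \unless-negated test
FI_CODE, ELSE_CODE, OR_CODE = 2, 3, 4  # possible values of if_limit


def current_if_type(cond_active, cur_if):
    """0 if no conditional is active; positive for a plain test,
    negative for the same test negated by \\unless."""
    if not cond_active:
        return 0
    if cur_if < UNLESS_CODE:
        return cur_if + 1
    return -(cur_if - UNLESS_CODE + 1)


def current_if_branch(if_limit):
    """1 in the 'then' branch, -1 in the 'else' branch, 0 otherwise."""
    if if_limit in (OR_CODE, ELSE_CODE):
        return 1
    if if_limit == FI_CODE:
        return -1
    return 0
```

Under these assumptions a test with code 3 reports type 4, and the same test preceded by \unless reports -4.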

1479. The \fontcharwd, \fontcharht, \fontchardp, and \fontcharic commands return information about a character in a font.

// code for \.{\\fontcharwd}
@define font_char_wd_code => eTeX_dim
// code for \.{\\fontcharht}
@define font_char_ht_code => eTeX_dim + 1
// code for \.{\\fontchardp}
@define font_char_dp_code => eTeX_dim + 2
// code for \.{\\fontcharic}
@define font_char_ic_code => eTeX_dim + 3
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("fontcharwd"),
      last_item,
      font_char_wd_code,
    )

    primitive(
      strpool!("fontcharht"),
      last_item,
      font_char_ht_code,
    )

    primitive(
      strpool!("fontchardp"),
      last_item,
      font_char_dp_code,
    )

    primitive(
      strpool!("fontcharic"),
      last_item,
      font_char_ic_code,
    )
⟧

1480.

⟦1453 Cases of |last_item| for |print_cmd_chr|⟧ += ⟦
    font_char_wd_code:

    print_esc(strpool!("fontcharwd"))

    font_char_ht_code:

    print_esc(strpool!("fontcharht"))

    font_char_dp_code:

    print_esc(strpool!("fontchardp"))

    font_char_ic_code:

    print_esc(strpool!("fontcharic"))
⟧

1481.

⟦1458 Cases for fetching a dimension value⟧ += ⟦
    font_char_wd_code,
    font_char_ht_code,
    font_char_dp_code,
    font_char_ic_code:
      scan_font_ident;
      q = cur_val;
      scan_usv_num;
      if (is_native_font(q)) {
          case m {
            font_char_wd_code:
              cur_val = getnativecharwd(q, cur_val);
            font_char_ht_code:
              cur_val = getnativecharht(q, cur_val);
            font_char_dp_code:
              cur_val = getnativechardp(q, cur_val);
            font_char_ic_code:
              cur_val = getnativecharic(q, cur_val);
            // there are no other cases
          }
      } else {
          if (
              (font_bc[q] <= cur_val)
              && (font_ec[q] >= cur_val)
          ) {
              i = char_info(q)(qi(cur_val));
              case m {
                font_char_wd_code:
                  cur_val = char_width(q)(i);
                font_char_ht_code:
                  cur_val = char_height(q)(height_depth(i));
                font_char_dp_code:
                  cur_val = char_depth(q)(height_depth(i));
                font_char_ic_code:
                  cur_val = char_italic(q)(i);
                // there are no other cases
              }
          } else {
              cur_val = 0;
          }
      }
⟧

1482. The \parshapedimen, \parshapeindent, and \parshapelength commands return the indent and length parameters of the current \parshape specification.

// code for \.{\\parshapelength}
@define par_shape_length_code => eTeX_dim + 4
// code for \.{\\parshapeindent}
@define par_shape_indent_code => eTeX_dim + 5
// code for \.{\\parshapedimen}
@define par_shape_dimen_code => eTeX_dim + 6
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("parshapelength"),
      last_item,
      par_shape_length_code,
    )

    primitive(
      strpool!("parshapeindent"),
      last_item,
      par_shape_indent_code,
    )

    primitive(
      strpool!("parshapedimen"),
      last_item,
      par_shape_dimen_code,
    )
⟧

1483.

⟦1453 Cases of |last_item| for |print_cmd_chr|⟧ += ⟦
    par_shape_length_code:

    print_esc(strpool!("parshapelength"))

    par_shape_indent_code:

    print_esc(strpool!("parshapeindent"))

    par_shape_dimen_code:

    print_esc(strpool!("parshapedimen"))
⟧

1484.

⟦1458 Cases for fetching a dimension value⟧ += ⟦
    par_shape_length_code,
    par_shape_indent_code,
    par_shape_dimen_code:
      q = cur_chr - par_shape_length_code;
      scan_int;
      if ((par_shape_ptr == null) || (cur_val <= 0)) {
          cur_val = 0;
      } else {
          if (q == 2) {
              q = cur_val % 2;
              cur_val = (cur_val + q) div 2;
          }
          if (cur_val > info(par_shape_ptr)) {
              cur_val = info(par_shape_ptr);
          }
          cur_val = mem[par_shape_ptr + 2 * cur_val - q].sc;
      }
      cur_val_level = dimen_val;
⟧
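
The index arithmetic above is compact, so a Python sketch may help. It assumes the \parshape specification is available as a list of (indent, length) pairs, one per line; the function name and the `which` argument are hypothetical, with `which` playing the role of q (0 for \parshapelength, 1 for \parshapeindent, 2 for \parshapedimen).

```python
def par_shape_lookup(shape, n, which):
    """Return the requested parameter of a paragraph shape.

    shape: list of (indent, length) pairs, line 1 first.
    n:     the scanned integer argument.
    which: 0 = length, 1 = indent, 2 = alternating indent/length.
    """
    if not shape or n <= 0:
        return 0
    q = which
    if q == 2:
        q = n % 2             # odd n selects an indent, even n a length
        n = (n + q) // 2      # convert the flat index into a line number
    n = min(n, len(shape))    # beyond the last line, the last line is used
    indent, length = shape[n - 1]
    return indent if q == 1 else length
```

With `shape = [(10, 100), (20, 80)]`, the flat \parshapedimen indices 1, 2, 3, 4 select 10, 100, 20, 80 in turn.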

1485. The \showgroups command displays all currently active grouping levels.

@define show_groups => 4 // \.{\\showgroups}
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("showgroups"), xray, show_groups)
⟧

1486.

⟦1486 Cases of |xray| for |print_cmd_chr|⟧ = ⟦
    show_groups:

    print_esc(strpool!("showgroups"))
⟧

1487.

⟦1487 Cases for |show_whatever|⟧ = ⟦
    show_groups:

    {
        begin_diagnostic;
        show_save_groups;
    }
⟧

1488.

⟦18 Types in the outer block⟧ += ⟦
    // index into save_stack 
    type save_pointer = 0 .. save_size;
⟧

1489. The modifications of TEX required for the display produced by the show_save_groups procedure were first discussed by Donald E. Knuth in TUGboat 11, 165–170 and 499–511, 1990.

In order to understand a group type we also have to know its mode. Since unrestricted horizontal modes are not associated with grouping, they are skipped when traversing the semantic nest.

⟦1466 Declare \eTeX\ procedures for use by |main_control|⟧ += ⟦
    function show_save_groups() {
        label found1, found2, found, done;
        var
          p: 0 .. nest_size, // index into nest 
          m: -mmode .. mmode, // mode
          v: save_pointer, // saved value of save_ptr 
          l: quarterword, // saved value of cur_level 
          c: group_code, // saved value of cur_group 
          a: -1 .. 1, // to keep track of alignments
          i: integer,
          j: quarterword,
          s: str_number;
        
        p = nest_ptr;
        // put the top level into the array
        nest[p] = cur_list;
        v = save_ptr;
        l = cur_level;
        c = cur_group;
        save_ptr = cur_boundary;
        decr(cur_level);
        a = 1;
        print_nl(strpool!(""));
        print_ln;
        loop {
            print_nl(strpool!("### "));
            print_group(true);
            if (cur_group == bottom_level) {
                goto done;
            }
            repeat {
                m = nest[p].mode_field;
                if (p > 0) {
                    decr(p);
                } else {
                    m = vmode;
                }
            } until (m != hmode);
            print(strpool!(" ("));
            case cur_group {
              simple_group:
                incr(p);
                goto found2;
              hbox_group, adjusted_hbox_group:
                s = strpool!("hbox");
              vbox_group:
                s = strpool!("vbox");
              vtop_group:
                s = strpool!("vtop");
              align_group:
                if (a == 0) {
                    if (m == -vmode) {
                        s = strpool!("halign");
                    } else {
                        s = strpool!("valign");
                    }
                    a = 1;
                    goto found1;
                } else {
                    if (a == 1) {
                        print(strpool!("align entry"));
                    } else {
                        print_esc(strpool!("cr"));
                    }
                    if (p >= a) {
                        p = p - a;
                    }
                    a = 0;
                    goto found;
                }
              no_align_group:
                incr(p);
                a = -1;
                print_esc(strpool!("noalign"));
                goto found2;
              output_group:
                print_esc(strpool!("output"));
                goto found;
              math_group:
                goto found2;
              disc_group, math_choice_group:
                if (cur_group == disc_group) {
                    print_esc(strpool!("discretionary"));
                } else {
                    print_esc(strpool!("mathchoice"));
                }
                for (i in 1 to 3) {
                    if (i <= saved(-2)) {
                        print(strpool!("{}"));
                    }
                }
                goto found2;
              insert_group:
                if (saved(-2) == 255) {
                    print_esc(strpool!("vadjust"));
                } else {
                    print_esc(strpool!("insert"));
                    print_int(saved(-2));
                }
                goto found2;
              vcenter_group:
                s = strpool!("vcenter");
                goto found1;
              semi_simple_group:
                incr(p);
                print_esc(strpool!("begingroup"));
                goto found;
              math_shift_group:
                if (m == mmode) {
                    print_char(ord!("$"));
                } else if (nest[p].mode_field == mmode) {
                    print_cmd_chr(eq_no, saved(-2));
                    goto found;
                }
                print_char(ord!("$"));
                goto found;
              math_left_group:
                if (
                    type(nest[p + 1].eTeX_aux_field)
                    == left_noad
                ) {
                    print_esc(strpool!("left"));
                } else {
                    print_esc(strpool!("middle"));
                }
                goto found;
              // there are no other cases
            }
            ⟦1491 Show the box context⟧
          found1:
            print_esc(s);
            ⟦1490 Show the box packaging info⟧
          found2:
            print_char(ord!("{"));
          found:
            print_char(ord!(")"));
            decr(cur_level);
            cur_group = save_level(save_ptr);
            save_ptr = save_index(save_ptr);
        }
      done:
        save_ptr = v;
        cur_level = l;
        cur_group = c;
    }
⟧

1490.

⟦1490 Show the box packaging info⟧ = ⟦
    if (saved(-2) != 0) {
        print_char(ord!(" "));
        if (saved(-3) == exactly) {
            print(strpool!("to"));
        } else {
            print(strpool!("spread"));
        }
        print_scaled(saved(-2));
        print(strpool!("pt"));
    }
⟧

1491.

⟦1491 Show the box context⟧ = ⟦
    i = saved(-4)

    if (i != 0) {
        if (i < box_flag) {
            if (abs(nest[p].mode_field) == vmode) {
                j = hmove;
            } else {
                j = vmove;
            }
            if (i > 0) {
                print_cmd_chr(j, 0);
            } else {
                print_cmd_chr(j, 1);
            }
            print_scaled(abs(i));
            print(strpool!("pt"));
        } else if (i < ship_out_flag) {
            if (i >= global_box_flag) {
                print_esc(strpool!("global"));
                i = i - (global_box_flag - box_flag);
            }
            print_esc(strpool!("setbox"));
            print_int(i - box_flag);
            print_char(ord!("="));
        } else {
            print_cmd_chr(
              leader_ship,
              i - (leader_flag - a_leaders),
            );
        }
    }
⟧

1492. The scan_general_text procedure is much like scan_toks(false, false), but will be invoked via expand, i.e., recursively.

⟦1492 Declare \eTeX\ procedures for scanning⟧ = ⟦
    forward_declaration scan_general_text();
⟧

1493. The token list (balanced text) created by scan_general_text begins at link(temp_head) and ends at cur_val. (If cur_val == temp_head, the list is empty.)

⟦1493 Declare \eTeX\ procedures for token lists⟧ = ⟦
    function scan_general_text() {
        label found;
        var
          s: normal .. absorbing, // to save scanner_status 
          w: pointer, // to save warning_index 
          d: pointer, // to save def_ref 
          p: pointer, // tail of the token list being built
          q: pointer, // new node being added to the token 
          // list via store_new_token 
          unbalance: halfword; // number of unmatched left 
          // braces
        
        s = scanner_status;
        w = warning_index;
        d = def_ref;
        scanner_status = absorbing;
        warning_index = cur_cs;
        def_ref = get_avail;
        token_ref_count(def_ref) = null;
        p = def_ref;
        // remove the compulsory left brace
        scan_left_brace;
        unbalance = 1;
        loop {
            get_token;
            if (cur_tok < right_brace_limit) {
                if (cur_cmd < right_brace) {
                    incr(unbalance);
                } else {
                    decr(unbalance);
                    if (unbalance == 0) {
                        goto found;
                    }
                }
            }
            store_new_token(cur_tok);
        }
      found:
        q = link(def_ref);
        // discard reference count
        free_avail(def_ref);
        if (q == null) {
            cur_val = temp_head;
        } else {
            cur_val = p;
        }
        link(temp_head) = q;
        scanner_status = s;
        warning_index = w;
        def_ref = d;
    }
⟧
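
The brace counting in scan_general_text can be sketched in Python as follows. This is a simplified model, not the program's token machinery: tokens are plain strings, "{" and "}" stand in for tokens of the left-brace and right-brace categories, and errors are raised where the real scanner would attempt recovery.

```python
def scan_balanced_text(tokens):
    """Return the tokens between the outermost pair of braces."""
    it = iter(tokens)
    if next(it, None) != "{":
        # scan_left_brace would report the error and recover instead
        raise ValueError("missing left brace")
    unbalance = 1                  # number of unmatched left braces
    body = []
    for tok in it:
        if tok == "{":
            unbalance += 1
        elif tok == "}":
            unbalance -= 1
            if unbalance == 0:     # the closing brace itself is not stored
                return body
        body.append(tok)
    raise ValueError("input ended before the braces balanced")
```

Inner braces are kept in the result; only the outermost pair is removed.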

1494. The \showtokens command displays a token list.

@define show_tokens => 5 // \.{\\showtokens}, must be odd!
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("showtokens"), xray, show_tokens)
⟧

1495.

⟦1486 Cases of |xray| for |print_cmd_chr|⟧ += ⟦
    show_tokens:

    print_esc(strpool!("showtokens"))
⟧

1496. The \unexpanded primitive prevents expansion of tokens, much like the result of \the applied to a token variable. The \detokenize primitive converts a token list into a list of character tokens, much as if the token list were written to a file. We use the fact that the command modifiers for \unexpanded and \detokenize are odd whereas those for \the and \showthe are even.

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("unexpanded"), the, 1)

    primitive(strpool!("detokenize"), the, show_tokens)
⟧

1497.

⟦1497 Cases of |the| for |print_cmd_chr|⟧ = ⟦
    else

    if (chr_code == 1) {
        print_esc(strpool!("unexpanded"));
    } else {
        print_esc(strpool!("detokenize"));
    }
⟧

1498.

⟦1498 Handle \.{\\unexpanded} or \.{\\detokenize} and |return|⟧ = ⟦
    if (odd(cur_chr)) {
        c = cur_chr;
        scan_general_text;
        if (c == 1) {
            the_toks = cur_val;
        } else {
            old_setting = selector;
            selector = new_string;
            b = pool_ptr;
            p = get_avail;
            link(p) = link(temp_head);
            token_show(p);
            flush_list(p);
            selector = old_setting;
            the_toks = str_toks(b);
        }
        return;
    }
⟧

1499. The \showifs command displays all currently active conditionals.

@define show_ifs => 6 // \.{\\showifs}
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("showifs"), xray, show_ifs)
⟧

1500.

⟦1486 Cases of |xray| for |print_cmd_chr|⟧ += ⟦
    show_ifs:

    print_esc(strpool!("showifs"))
⟧

1501.

@define print_if_line(#) =>
    if (# != 0) {
        print(strpool!(" entered on line "));
        print_int(#);
    }
⟦1487 Cases for |show_whatever|⟧ += ⟦
    show_ifs:

    {
        begin_diagnostic;
        print_nl(strpool!(""));
        print_ln;
        if (cond_ptr == null) {
            print_nl(strpool!("### "));
            print(strpool!("no active conditionals"));
        } else {
            p = cond_ptr;
            n = 0;
            repeat {
                incr(n);
                p = link(p);
            } until (p == null);
            p = cond_ptr;
            t = cur_if;
            l = if_line;
            m = if_limit;
            repeat {
                print_nl(strpool!("### level "));
                print_int(n);
                print(strpool!(": "));
                print_cmd_chr(if_test, t);
                if (m == fi_code) {
                    print_esc(strpool!("else"));
                }
                print_if_line(l);
                decr(n);
                t = subtype(p);
                l = if_line_field(p);
                m = type(p);
                p = link(p);
            } until (p == null);
        }
    }
⟧
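
The traversal performed for \showifs can be modelled with a short Python sketch. The condition stack is represented here as a list of hypothetical (name, in_else_branch, line) tuples with the innermost conditional first; the real code walks linked nodes and prints through print_cmd_chr.

```python
def show_ifs(stack):
    """Return the diagnostic lines for a stack of active conditionals."""
    if not stack:
        return ["### no active conditionals"]
    lines = []
    level = len(stack)             # first pass in the original: count the levels
    for name, in_else, line in stack:
        text = f"### level {level}: {name}"
        if in_else:                # corresponds to if_limit == fi_code
            text += "\\else"
        if line != 0:              # print_if_line suppresses a zero line number
            text += f" entered on line {line}"
        lines.append(text)
        level -= 1
    return lines
```

The innermost conditional is reported first and carries the highest level number.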

1502. The \interactionmode primitive allows one to query and set the interaction mode.

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("interactionmode"), set_page_int, 2)
⟧

1503.

⟦1503 Cases of |set_page_int| for |print_cmd_chr|⟧ = ⟦
    else

    if (chr_code == 2) {
        print_esc(strpool!("interactionmode"));
    }
⟧

1504.

⟦1504 Cases for `Fetch the |dead_cycles| or the |insert_penalties|'⟧ = ⟦
    else

    if (m == 2) {
        cur_val = interaction;
    }
⟧

1505.

⟦1466 Declare \eTeX\ procedures for use by |main_control|⟧ += ⟦
    forward_declaration new_interaction();
⟧

1506.

⟦1506 Cases for |alter_integer|⟧ = ⟦
    else

    if (c == 2) {
        if (
            (cur_val < batch_mode)
            || (cur_val > error_stop_mode)
        ) {
            print_err(strpool!("Bad interaction mode"));
            help2(
              strpool!("Modes are 0=batch, 1=nonstop, 2=scroll, and"),
            )(
              strpool!("3=errorstop. Proceed, and I'll ignore this case."),
            );
            int_error(cur_val);
        } else {
            cur_chr = cur_val;
            new_interaction;
        }
    }
⟧
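
The range check above can be summarised in a few lines of Python. The mode numbers follow the help message (0=batch through 3=errorstop); the function name and the returned tuple are hypothetical conveniences of this sketch.

```python
BATCH_MODE, NONSTOP_MODE, SCROLL_MODE, ERROR_STOP_MODE = range(4)


def alter_interaction(current, requested):
    """Return (new_mode, error); an out-of-range request is ignored."""
    if requested < BATCH_MODE or requested > ERROR_STOP_MODE:
        return current, "Bad interaction mode"
    return requested, None
```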

1507. The middle feature of 𝜀-TEX allows one or several \middle delimiters to appear between \left and \right.

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("middle"), left_right, middle_noad)
⟧

1508.

⟦1508 Cases of |left_right| for |print_cmd_chr|⟧ = ⟦
    else

    if (chr_code == middle_noad) {
        print_esc(strpool!("middle"));
    }
⟧

1509. In constructions such as

\hbox to \hsize{\hskip 0pt plus 0.0001fil ...\hfil\penalty-200\hfilneg ...}
the stretch components of \hfil and \hfilneg compensate; they may, however, get modified in order to prevent arithmetic overflow during hlist_out when each of them is multiplied by a large glue_set value.

Since this “glue rounding” depends on the state variables cur_g and cur_glue, and since TEX--XET is supposed to emulate the behaviour of TeXeT (plus a suitable postprocessor) as closely as possible, the glue rounding cannot be postponed until (segments of) an hlist has been reversed.

The code below is invoked after the effective width, rule_wd, of a glue node has been computed. The glue node is either converted into a kern node or, for leaders, the glue specification is replaced by an equivalent rigid one; the subtype of the glue node remains unchanged.

⟦1509 Handle a glue node for mixed direction typesetting⟧ = ⟦
    if ((
        (
            (g_sign == stretching)
            && (stretch_order(g) == g_order)
        )
        || (
            (g_sign == shrinking)
            && (shrink_order(g) == g_order)
        )
    )) {
        fast_delete_glue_ref(g);
        if (subtype(p) < a_leaders) {
            type(p) = kern_node;
            width(p) = rule_wd;
        } else {
            g = get_node(glue_spec_size);
            stretch_order(g) = filll + 1;
            // will never match
            shrink_order(g) = filll + 1;
            width(g) = rule_wd;
            stretch(g) = 0;
            shrink(g) = 0;
            glue_ptr(p) = g;
        }
    }
⟧

1510. The optional TeXXeT feature of 𝜀-TEX contains the code for mixed left-to-right and right-to-left typesetting. This code is inspired by but different from TeXeT as presented by Donald E. Knuth and Pierre MacKay in TUGboat 8, 14–25, 1987.

In order to avoid confusion with TeXeT the present implementation of mixed direction typesetting is called TEX--XET. It differs from TeXeT in several important aspects: (1) Right-to-left text is reversed explicitly by the ship_out routine and is written to a normal DVI file without any begin_reflect or end_reflect commands; (2) a math_node is (ab)used instead of a whatsit_node to record the \beginL, \endL, \beginR, and \endR text direction primitives in order to keep the influence on the line breaking algorithm for pure left-to-right text as small as possible; (3) right-to-left text interrupted by a displayed equation is automatically resumed after that equation; and (4) the valign command code with a non-zero command modifier is (ab)used for the text direction primitives.

Nevertheless, there is a subtle difference between TEX and TEX--XET that may influence the line breaking algorithm for pure left-to-right text. When a paragraph containing math mode material is broken into lines, TEX may generate lines where math mode material is not enclosed by properly nested \mathon and \mathoff nodes. Unboxing such lines as part of a new paragraph may have the effect that hyphenation is attempted for ‘words’ originating from math mode or that hyphenation is inhibited for words originating from horizontal mode.

In TEX--XET, additional \beginM and \endM math nodes are supplied at the start and end of lines, respectively, such that math mode material inside a horizontal list always starts with either \mathon or \beginM and ends with \mathoff or \endM. These additional nodes are transparent to operations such as \unskip, \lastpenalty, or \lastbox, but they do have the effect that hyphenation is never attempted for ‘words’ originating from math mode and is never inhibited for words originating from horizontal mode.

@define TeXXeT_state => eTeX_state(TeXXeT_code)
@define TeXXeT_en =>
    (TeXXeT_state > 0) // is \TeXXeT\ enabled?
@define XeTeX_upwards_state => eTeX_state(XeTeX_upwards_code)
@define XeTeX_upwards => (XeTeX_upwards_state > 0)
@define XeTeX_use_glyph_metrics_state =>
    eTeX_state(XeTeX_use_glyph_metrics_code)
@define XeTeX_use_glyph_metrics =>
    (XeTeX_use_glyph_metrics_state > 0)
@define XeTeX_inter_char_tokens_state =>
    eTeX_state(XeTeX_inter_char_tokens_code)
@define XeTeX_inter_char_tokens_en =>
    (XeTeX_inter_char_tokens_state > 0)
@define XeTeX_dash_break_state =>
    eTeX_state(XeTeX_dash_break_code)
@define XeTeX_dash_break_en => (XeTeX_dash_break_state > 0)
@define XeTeX_input_normalization_state =>
    eTeX_state(XeTeX_input_normalization_code)
@define XeTeX_tracing_fonts_state =>
    eTeX_state(XeTeX_tracing_fonts_code)
@define XeTeX_interword_space_shaping_state =>
    eTeX_state(XeTeX_interword_space_shaping_code)
@define XeTeX_generate_actual_text_state =>
    eTeX_state(XeTeX_generate_actual_text_code)
@define XeTeX_generate_actual_text_en =>
    (XeTeX_generate_actual_text_state > 0)
@define XeTeX_default_input_mode =>
    eTeX_state(XeTeX_default_input_mode_code)
@define XeTeX_default_input_encoding =>
    eTeX_state(XeTeX_default_input_encoding_code)
@define XeTeX_hyphenatable_length =>
    eTeX_state(XeTeX_hyphenatable_length_code)
⟦1469 Cases for |print_param|⟧ += ⟦
    suppress_fontnotfound_error_code:

    print_esc(strpool!("suppressfontnotfounderror"))

    eTeX_state_code + TeXXeT_code:
      print_esc(strpool!("TeXXeTstate"));

    eTeX_state_code + XeTeX_upwards_code:
      print_esc(strpool!("XeTeXupwardsmode"));

    eTeX_state_code + XeTeX_use_glyph_metrics_code:
      print_esc(strpool!("XeTeXuseglyphmetrics"));

    eTeX_state_code + XeTeX_inter_char_tokens_code:
      print_esc(strpool!("XeTeXinterchartokenstate"));

    eTeX_state_code + XeTeX_dash_break_code:
      print_esc(strpool!("XeTeXdashbreakstate"));

    eTeX_state_code + XeTeX_input_normalization_code:
      print_esc(strpool!("XeTeXinputnormalization"));

    eTeX_state_code + XeTeX_tracing_fonts_code:
      print_esc(strpool!("XeTeXtracingfonts"));

    eTeX_state_code + XeTeX_interword_space_shaping_code:
      print_esc(strpool!("XeTeXinterwordspaceshaping"));

    eTeX_state_code + XeTeX_generate_actual_text_code:
      print_esc(strpool!("XeTeXgenerateactualtext"));

    eTeX_state_code + XeTeX_hyphenatable_length_code:
      print_esc(strpool!("XeTeXhyphenatablelength"));
⟧

1511.

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("suppressfontnotfounderror"),
      assign_int,
      int_base + suppress_fontnotfound_error_code,
    )

    primitive(
      strpool!("TeXXeTstate"),
      assign_int,
      eTeX_state_base + TeXXeT_code,
    )

    primitive(
      strpool!("XeTeXupwardsmode"),
      assign_int,
      eTeX_state_base + XeTeX_upwards_code,
    )

    primitive(
      strpool!("XeTeXuseglyphmetrics"),
      assign_int,
      eTeX_state_base + XeTeX_use_glyph_metrics_code,
    )

    primitive(
      strpool!("XeTeXinterchartokenstate"),
      assign_int,
      eTeX_state_base + XeTeX_inter_char_tokens_code,
    )

    primitive(
      strpool!("XeTeXdashbreakstate"),
      assign_int,
      eTeX_state_base + XeTeX_dash_break_code,
    )

    primitive(
      strpool!("XeTeXinputnormalization"),
      assign_int,
      eTeX_state_base + XeTeX_input_normalization_code,
    )

    primitive(
      strpool!("XeTeXtracingfonts"),
      assign_int,
      eTeX_state_base + XeTeX_tracing_fonts_code,
    )

    primitive(
      strpool!("XeTeXinterwordspaceshaping"),
      assign_int,
      eTeX_state_base + XeTeX_interword_space_shaping_code,
    )

    primitive(
      strpool!("XeTeXgenerateactualtext"),
      assign_int,
      eTeX_state_base + XeTeX_generate_actual_text_code,
    )

    primitive(
      strpool!("XeTeXhyphenatablelength"),
      assign_int,
      eTeX_state_base + XeTeX_hyphenatable_length_code,
    )

    primitive(
      strpool!("XeTeXinputencoding"),
      extension,
      XeTeX_input_encoding_extension_code,
    )

    primitive(
      strpool!("XeTeXdefaultencoding"),
      extension,
      XeTeX_default_encoding_extension_code,
    )

    primitive(strpool!("beginL"), valign, begin_L_code)

    primitive(strpool!("endL"), valign, end_L_code)

    primitive(strpool!("beginR"), valign, begin_R_code)

    primitive(strpool!("endR"), valign, end_R_code)
⟧

1512.

⟦1512 Cases of |valign| for |print_cmd_chr|⟧ = ⟦
    else

    case chr_code {
      begin_L_code:
        print_esc(strpool!("beginL"));
      end_L_code:
        print_esc(strpool!("endL"));
      begin_R_code:
        print_esc(strpool!("beginR"));
      othercases:
        print_esc(strpool!("endR"));
    }
⟧

1513.

⟦1513 Cases of |main_control| for |hmode+valign|⟧ = ⟦
    if (cur_chr > 0) {
        if (eTeX_enabled(TeXXeT_en, cur_cmd, cur_chr)) {
            tail_append(new_math(0, cur_chr));
        }
    }

    else
⟧

1514. An hbox with subtype dlist will never be reversed, even when embedded in right-to-left text.

⟦1514 Display if this box is never to be reversed⟧ = ⟦
    if ((type(p) == hlist_node) && (box_lr(p) == dlist)) {
        print(strpool!(", display"));
    }
⟧

1515. A number of routines are based on a stack of one-word nodes whose info fields contain end_M_code, end_L_code, or end_R_code. The top of the stack is pointed to by LR_ptr.

When the stack manipulation macros of this section are used below, the variable LR_ptr might be the global variable declared here for hpack and ship_out, or might be local to post_line_break.

@define put_LR(#) =>
    {
        temp_ptr = get_avail;
        info(temp_ptr) = #;
        link(temp_ptr) = LR_ptr;
        LR_ptr = temp_ptr;
    }
@define push_LR(#) => put_LR(end_LR_type(#))
@define pop_LR =>
    {
        temp_ptr = LR_ptr;
        LR_ptr = link(temp_ptr);
        free_avail(temp_ptr);
    }
⟦13 Global variables⟧ += ⟦
    // stack of LR codes for hpack, ship_out, and
    // init_math
    var LR_ptr: pointer;

    // counts missing begins and ends
    var LR_problems: integer;

    // current text direction
    var cur_dir: small_number;
⟧
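
The stack discipline of put_LR, push_LR, and pop_LR can be illustrated with a small Python sketch. It is not the XeTeX implementation: the `Node` class and the function names are hypothetical stand-ins for one-word nodes obtained from get_avail, and the numeric codes are assumed from the math-node definitions, which lie outside this section.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed subtype codes (defined elsewhere in the program, not in this section).
L_CODE = 4
END_M_CODE = 3


@dataclass
class Node:
    """Hypothetical stand-in for a one-word node: an info field plus a link."""
    info: int
    link: Optional["Node"] = None


def end_lr_type(subtype: int) -> int:
    """End code that closes a math node of this subtype (assumed definition)."""
    return L_CODE * (subtype // L_CODE) + END_M_CODE


def put_lr(lr_ptr: Optional[Node], code: int) -> Node:
    """Push `code` and return the new top of the stack."""
    return Node(info=code, link=lr_ptr)


def push_lr(lr_ptr: Optional[Node], subtype: int) -> Node:
    """Push the end code that will later close a begin node."""
    return put_lr(lr_ptr, end_lr_type(subtype))


def pop_lr(lr_ptr: Node) -> Optional[Node]:
    """Drop the top entry and return the new top."""
    return lr_ptr.link


stack = None
stack = push_lr(stack, 6)    # a \beginL node (assumed subtype 6)
stack = push_lr(stack, 10)   # a \beginR node (assumed subtype 10)
top_after_two_pushes = stack.info
stack = pop_lr(stack)
top_after_pop = stack.info
```

Storing the expected end code rather than the begin code means a later end node can be matched with a single comparison against the top of the stack.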

1516.

⟦23 Set initial values of key variables⟧ += ⟦
    LR_ptr = null

    LR_problems = 0

    cur_dir = left_to_right
⟧

1517.

⟦1517 Insert LR nodes at the beginning of the current line and adjust the LR stack based on LR nodes in this line⟧ = ⟦
    {
        q = link(temp_head);
        if (LR_ptr != null) {
            temp_ptr = LR_ptr;
            r = q;
            repeat {
                s = new_math(
                  0,
                  begin_LR_type(info(temp_ptr)),
                );
                link(s) = r;
                r = s;
                temp_ptr = link(temp_ptr);
            } until (temp_ptr == null);
            link(temp_head) = r;
        }
        while (q != cur_break(cur_p)) {
            if (!is_char_node(q)) {
                if (type(q) == math_node) {
                    ⟦1518 Adjust \(t)the LR stack for the |post_line_break| routine⟧
                }
            }
            q = link(q);
        }
    }
⟧

1518.

⟦1518 Adjust \(t)the LR stack for the |post_line_break| routine⟧ = ⟦
    if (end_LR(q)) {
        if (LR_ptr != null) {
            if (info(LR_ptr) == end_LR_type(q)) {
                pop_LR;
            }
        }
    } else {
        push_LR(q);
    }
⟧

1519. We use the fact that q now points to the node with \rightskip glue.

⟦1519 Insert LR nodes at the end of the current line⟧ = ⟦
    if (LR_ptr != null) {
        s = temp_head;
        r = link(s);
        while (r != q) {
            s = r;
            r = link(s);
        }
        r = LR_ptr;
        while (r != null) {
            temp_ptr = new_math(0, info(r));
            link(s) = temp_ptr;
            s = temp_ptr;
            r = link(r);
        }
        link(s) = q;
    }
⟧
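
Sections 1517 and 1519 together keep every line balanced: spans still open at a break are closed at the end of the line and reopened at the start of the next. The Python sketch below models this under two assumptions that are not stated in this section: the stack is represented as a list with the top first, and begin_LR_type(c) is taken to be c - 1. The function names are hypothetical.

```python
# Assumed subtype codes from the math-node definitions elsewhere in the program.
BEGIN_L, END_L, BEGIN_R, END_R = 6, 7, 10, 11


def begin_lr_type(end_code: int) -> int:
    """Begin code matching an end code (assumed: the end code minus one)."""
    return end_code - 1


def nodes_at_line_end(lr_stack: list) -> list:
    """End codes appended before the rightskip glue, innermost span first."""
    return list(lr_stack)


def nodes_at_line_start(lr_stack: list) -> list:
    """Begin codes prepended to the next line, outermost span first."""
    begins = []
    for end_code in lr_stack:          # walk from the top of the stack down
        begins.insert(0, begin_lr_type(end_code))
    return begins


# A break inside R-text that is itself nested in L-text; top of stack first.
open_spans = [END_R, END_L]
closing = nodes_at_line_end(open_spans)
reopening = nodes_at_line_start(open_spans)
```

Here the line ends with the codes for \endR then \endL, and the following line starts with \beginL then \beginR, so nesting is preserved on both sides of the break.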

1520.

⟦1520 Initialize the LR stack⟧ = ⟦
    put_LR(before) // this will never match
⟧

1521.

⟦1521 Adjust \(t)the LR stack for the |hpack| routine⟧ = ⟦
    if (end_LR(p)) {
        if (info(LR_ptr) == end_LR_type(p)) {
            pop_LR;
        } else {
            incr(LR_problems);
            type(p) = kern_node;
            subtype(p) = explicit;
        }
    } else {
        push_LR(p);
    }
⟧

1522.

⟦1522 Check for LR anomalies at the end of |hpack|⟧ = ⟦
    {
        if (info(LR_ptr) != before) {
            while (link(q) != null) {
                q = link(q);
            }
            repeat {
                temp_ptr = q;
                q = new_math(0, info(LR_ptr));
                link(temp_ptr) = q;
                LR_problems = LR_problems + 10000;
                pop_LR;
            } until (info(LR_ptr) == before);
        }
        if (LR_problems > 0) {
            ⟦1523 Report LR problems⟧
            goto common_ending;
        }
        pop_LR;
        if (LR_ptr != null) {
            confusion(strpool!("LR1"));
        }
    }
⟧

1523.

⟦1523 Report LR problems⟧ = ⟦
    {
        print_ln;
        print_nl(strpool!("\\endL or \\endR problem ("));
        print_int(LR_problems div 10000);
        print(strpool!(" missing, "));
        print_int(LR_problems % 10000);
        print(strpool!(" extra"));
        LR_problems = 0;
    }
⟧
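
LR_problems packs two counts into one integer: each unmatched end node adds 1 and each missing end node adds 10000, which is why the report divides by 10000 and takes the remainder. The following Python sketch is a simplified model, not the actual hpack code; it replays the bookkeeping of sections 1521 to 1523 on a plain list of math-node subtypes. The numeric codes and the helper names are assumptions, since the program defines the codes outside this section.

```python
BEFORE = 0       # sentinel pushed by "Initialize the LR stack"
L_CODE = 4       # assumed value
END_M_CODE = 3   # assumed value


def end_lr(subtype: int) -> bool:
    """End nodes are assumed to have odd subtypes."""
    return subtype % 2 == 1


def end_lr_type(subtype: int) -> int:
    return L_CODE * (subtype // L_CODE) + END_M_CODE


def count_lr_problems(subtypes: list) -> tuple:
    """Return (missing, extra) for a sequence of math-node subtypes."""
    stack = [BEFORE]
    problems = 0
    for s in subtypes:
        if end_lr(s):
            if stack[-1] == end_lr_type(s):
                stack.pop()
            else:
                problems += 1        # an end node with no matching begin
        else:
            stack.append(end_lr_type(s))
    while stack[-1] != BEFORE:
        problems += 10000            # a begin node that was never closed
        stack.pop()
    return problems // 10000, problems % 10000


balanced = count_lr_problems([6, 10, 11, 7])   # beginL beginR endR endL
crossed = count_lr_problems([6, 10, 7])        # beginL beginR endL
```

In the crossed case the \endL does not match the innermost open span, so it counts as extra, and both open spans are then reported as missing their ends.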

1524.

⟦1524 Initialize |hlist_out| for mixed direction typesetting⟧ = ⟦
    if (eTeX_ex) {
        ⟦1520 Initialize the LR stack⟧
        if (box_lr(this_box) == dlist) {
            if (cur_dir == right_to_left) {
                cur_dir = left_to_right;
                cur_h = cur_h - width(this_box);
            } else {
                set_box_lr(this_box)(0);
            }
        }
        if (
            (cur_dir == right_to_left)
            && (box_lr(this_box) != reversed)
        ) {
            ⟦1531 Reverse the complete hlist and set the subtype to |reversed|⟧
        }
    }
⟧

1525.

⟦1525 Finish |hlist_out| for mixed direction typesetting⟧ = ⟦
    if (eTeX_ex) {
        ⟦1528 Check for LR anomalies at the end of |hlist_out|⟧
        if (box_lr(this_box) == dlist) {
            cur_dir = right_to_left;
        }
    }
⟧

1526.

⟦1526 Handle a math node in |hlist_out|⟧ = ⟦
    {
        if (eTeX_ex) {
            ⟦1527 Adjust \(t)the LR stack for the |hlist_out| routine; if necessary reverse an hlist segment and |goto reswitch|⟧
        }
        cur_h = cur_h + width(p);
    }
⟧

1527. Breaking a paragraph into lines while TeX--XeT is disabled may result in lines with unpaired math nodes. Such hlists are silently accepted in the absence of text direction directives.

@define LR_dir(#) =>
    (subtype(#) div R_code) // text direction of a `math 
    // node'
⟦1527 Adjust \(t)the LR stack for the |hlist_out| routine; if necessary reverse an hlist segment and |goto reswitch|⟧ = ⟦
    {
        if (end_LR(p)) {
            if (info(LR_ptr) == end_LR_type(p)) {
                pop_LR;
            } else {
                if (subtype(p) > L_code) {
                    incr(LR_problems);
                }
            }
        } else {
            push_LR(p);
            if (LR_dir(p) != cur_dir) {
                ⟦1532 Reverse an hlist segment and |goto reswitch|⟧
            }
        }
        type(p) = kern_node;
    }
⟧
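
The LR_dir macro relies on the numeric layout of the math-node subtypes. A brief Python sketch of that arithmetic follows; the concrete values (R_code = 8, the begin and end codes, and the two direction constants) are assumptions taken from definitions outside this section, and `needs_reversal` is a hypothetical helper that names the test made in section 1527.

```python
# Assumed values; the program defines them outside this section.
R_CODE = 8
LEFT_TO_RIGHT, RIGHT_TO_LEFT = 0, 1
BEGIN_L, END_L, BEGIN_R, END_R = 6, 7, 10, 11


def lr_dir(subtype: int) -> int:
    """Text direction of a math node: integer division by R_code."""
    return subtype // R_CODE


def reflected(cur_dir: int) -> int:
    """The opposite of the current direction."""
    return 1 - cur_dir


def needs_reversal(subtype: int, cur_dir: int) -> bool:
    """True when a begin node opens a span against the current direction."""
    return lr_dir(subtype) != cur_dir
```

With these values every L code is below 8 and every R code is at least 8, so one division separates the two directions.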

1528.

⟦1528 Check for LR anomalies at the end of |hlist_out|⟧ = ⟦
    {
        while (info(LR_ptr) != before) {
            if (info(LR_ptr) > L_code) {
                LR_problems = LR_problems + 10000;
            }
            pop_LR;
        }
        pop_LR;
    }
⟧

1529.

// a style_node does not occur in hlists
@define edge_node => style_node
// number of words in an edge node
@define edge_node_size => style_node_size
// new left_edge position relative to cur_h (after width has 
// been taken into account)
@define edge_dist(#) => depth(#)
⟦1431 Declare procedures needed in |hlist_out|, |vlist_out|⟧ += ⟦
    // create an edge node
    function new_edge(s: small_number, w: scaled): pointer {
        var
          p: pointer; // the new node
        
        p = get_node(edge_node_size);
        type(p) = edge_node;
        subtype(p) = s;
        width(p) = w;
        // the edge_dist field will be set later
        edge_dist(p) = 0;
        new_edge = p;
    }
⟧

1530.

⟦1530 Cases of |hlist_out| that arise in mixed direction text only⟧ = ⟦
    edge_node:

    {
        cur_h = cur_h + width(p);
        left_edge = cur_h + edge_dist(p);
        cur_dir = subtype(p);
    }
⟧

1531. We detach the hlist, start a new one consisting of just one kern node, append the reversed list, and set the width of the kern node.

⟦1531 Reverse the complete hlist and set the subtype to |reversed|⟧ = ⟦
    {
        save_h = cur_h;
        temp_ptr = p;
        p = new_kern(0);
        // {\sl Sync\TeX}: do nothing, it is too late
        sync_tag(p + medium_node_size) = 0;
        link(prev_p) = p;
        cur_h = 0;
        link(p) = reverse(this_box, null, cur_g, cur_glue);
        width(p) = -cur_h;
        cur_h = save_h;
        set_box_lr(this_box)(reversed);
    }
⟧

1532. We detach the remainder of the hlist, replace the math node by an edge node, and append the reversed hlist segment to it; the tail of the reversed segment is another edge node and the remainder of the original list is attached to it.

⟦1532 Reverse an hlist segment and |goto reswitch|⟧ = ⟦
    {
        save_h = cur_h;
        temp_ptr = link(p);
        rule_wd = width(p);
        // {\sl Sync\TeX}: p is a math_node 
        free_node(p, medium_node_size);
        cur_dir = reflected;
        p = new_edge(cur_dir, rule_wd);
        link(prev_p) = p;
        cur_h = cur_h - left_edge + rule_wd;
        link(p) = reverse(
          this_box,
          new_edge(reflected, 0),
          cur_g,
          cur_glue,
        );
        edge_dist(p) = cur_h;
        cur_dir = reflected;
        cur_h = save_h;
        goto reswitch;
    }
⟧

1533. The reverse function defined here is responsible for reversing the nodes of an hlist (segment). The first parameter this_box is the enclosing hlist node, the second parameter t is to become the tail of the reversed list, and the global variable temp_ptr is the head of the list to be reversed. Finally, cur_g and cur_glue are the current glue rounding state variables, to be updated by this function. We remove nodes from the original list and add them to the head of the new one.

⟦1431 Declare procedures needed in |hlist_out|, |vlist_out|⟧ += ⟦
    function reverse(
      this_box, t: pointer,
      var cur_g: scaled,
      var cur_glue: real,
    ): pointer {
        label reswitch, next_p, done;
        var
          l: pointer, // the new list
          p: pointer, // the current node
          q: pointer, // the next node
          g_order: glue_ord, // applicable order of infinity 
          // for glue
          g_sign: normal .. shrinking, // selects type of 
          // glue
          glue_temp: real, // glue value before rounding
          m, n: halfword; // count of unmatched math nodes
        
        g_order = glue_order(this_box);
        g_sign = glue_sign(this_box);
        l = t;
        p = temp_ptr;
        m = min_halfword;
        n = min_halfword;
        loop {
            while (p != null) {
                ⟦1534 Move node |p| to the new list and go to the next node; or |goto done| if the end of the reflected segment has been reached⟧
            }
            if (
                (t == null)
                && (m == min_halfword)
                && (n == min_halfword)
            ) {
                goto done;
            }
            p = new_math(0, info(LR_ptr));
            // manufacture one missing math node
            LR_problems = LR_problems + 10000;
        }
      done:
        reverse = l;
    }
⟧
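
The core list manipulation, removing nodes from the original list and adding them to the head of the new one, can be sketched in Python as follows. This is a deliberately reduced model: the `Node` class, `reverse_onto`, and `names` are hypothetical, every node simply carries a width, and the math-node, glue, and ligature handling of the real function is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical hlist node with only a name, a width, and a link."""
    name: str
    width: int
    link: Optional["Node"] = None


def reverse_onto(head: Optional[Node], tail: Optional[Node]):
    """Reverse the list starting at `head`; `tail` ends the result.

    Returns the new head and the total width of the moved nodes.
    """
    new_list = tail
    total_width = 0
    p = head
    while p is not None:
        q = p.link              # remember the next node
        total_width += p.width
        p.link = new_list       # move p to the front of the new list
        new_list = p
        p = q
    return new_list, total_width


def names(p: Optional[Node]) -> list:
    """Collect node names by following the links."""
    out = []
    while p is not None:
        out.append(p.name)
        p = p.link
    return out


c = Node("c", 3)
b = Node("b", 2, c)
a = Node("a", 1, b)
edge = Node("edge", 0)
new_head, moved_width = reverse_onto(a, edge)
```

Starting the new list at the tail parameter is what lets the caller supply an edge node that ends up after the reversed segment.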

1534.

⟦1534 Move node |p| to the new list and go to the next node; or |goto done| if the end of the reflected segment has been reached⟧ = ⟦
    reswitch:

    if (is_char_node(p)) {
        repeat {
            f = font(p);
            c = character(p);
            cur_h = cur_h + char_width(f)(char_info(f)(c));
            q = link(p);
            link(p) = l;
            l = p;
            p = q;
        } until (!is_char_node(p));
    } else {
        ⟦1535 Move the non-|char_node| |p| to the new list⟧
    }
⟧

1535.

⟦1535 Move the non-|char_node| |p| to the new list⟧ = ⟦
    {
        q = link(p);
        case type(p) {
          hlist_node, vlist_node, rule_node, kern_node:
            rule_wd = width(p);
          ⟦1536 Cases of |reverse| that need special treatment⟧
          edge_node:
            confusion(strpool!("LR2"));
          othercases:
            goto next_p;
        }
        cur_h = cur_h + rule_wd;
      next_p:
        link(p) = l;
        if (type(p) == kern_node) {
            if ((rule_wd == 0) || (l == null)) {
                free_node(p, medium_node_size);
                p = l;
            }
        }
        l = p;
        p = q;
    }
⟧

1536. Need to measure native_word and picture nodes when reversing!

⟦1536 Cases of |reverse| that need special treatment⟧ = ⟦
    whatsit_node:

    if (
        (is_native_word_subtype(p))
        || (subtype(p) == glyph_node)
        || (subtype(p) == pic_node)
        || (subtype(p) == pdf_node)
    ) {
        rule_wd = width(p);
    } else {
        goto next_p;
    }
⟧

1537. Here we compute the effective width of a glue node as in hlist_out.

⟦1536 Cases of |reverse| that need special treatment⟧ += ⟦
    glue_node:

    {
        round_glue;
        ⟦1509 Handle a glue node for mixed direction typesetting⟧
    }
⟧

1538. A ligature node is replaced by a char node.

⟦1536 Cases of |reverse| that need special treatment⟧ += ⟦
    ligature_node:

    {
        flush_node_list(lig_ptr(p));
        temp_ptr = p;
        p = get_avail;
        mem[p] = mem[lig_char(temp_ptr)];
        link(p) = q;
        free_node(temp_ptr, small_node_size);
        goto reswitch;
    }
⟧

1539. Math nodes in an inner reflected segment are modified; those at the outer level are changed into kern nodes.

⟦1536 Cases of |reverse| that need special treatment⟧ += ⟦
    math_node:

    {
        rule_wd = width(p);
        if (end_LR(p)) {
            if (info(LR_ptr) != end_LR_type(p)) {
                type(p) = kern_node;
                incr(LR_problems);
            } else {
                pop_LR;
                if (n > min_halfword) {
                    decr(n);
                    // change after into before 
                    decr(subtype(p));
                } else {
                    type(p) = kern_node;
                    if (m > min_halfword) {
                        decr(m);
                    } else {
                        ⟦1540 Finish the reversed hlist segment and |goto done|⟧
                    }
                }
            }
        } else {
            push_LR(p);
            if ((n > min_halfword) || (LR_dir(p) != cur_dir)) {
                incr(n);
                // change before into after 
                incr(subtype(p));
            } else {
                type(p) = kern_node;
                incr(m);
            }
        }
    }
⟧

1540. Finally we have found the end of the hlist segment to be reversed; the final math node is released and the remaining list attached to the edge node terminating the reversed segment.

⟦1540 Finish the reversed hlist segment and |goto done|⟧ = ⟦
    {
        // {\sl Sync\TeX}: p is a kern_node 
        free_node(p, medium_node_size);
        link(t) = q;
        width(t) = rule_wd;
        edge_dist(t) = -cur_h - rule_wd;
        goto done;
    }
⟧

1541.

⟦1541 Check for LR anomalies at the end of |ship_out|⟧ = ⟦
    {
        if (LR_problems > 0) {
            ⟦1523 Report LR problems⟧
            print_char(ord!(")"));
            print_ln;
        }
        if ((LR_ptr != null) || (cur_dir != left_to_right)) {
            confusion(strpool!("LR3"));
        }
    }
⟧

1542. Some special actions are required for displayed equations in paragraphs with mixed-direction text. First of all we have to set the text direction preceding the display.

⟦1542 Set the value of |x| to the text direction before the display⟧ = ⟦
    if (LR_save == null) {
        x = 0;
    } else if (info(LR_save) >= R_code) {
        x = -1;
    } else {
        x = 1;
    }
⟧

1543.

⟦1543 Prepare for display after an empty paragraph⟧ = ⟦
    {
        pop_nest;
        ⟦1542 Set the value of |x| to the text direction before the display⟧
    }
⟧

1544. When calculating the natural width, w, of the final line preceding the display, we may have to copy all or part of its hlist. We copy, however, only those parts of the original list that are relevant for the computation of pre_display_size.

⟦1544 Declare subprocedures for |init_math|⟧ = ⟦
    function just_copy(p, h, t: pointer) {
        label found, not_found;
        var
          r: pointer, // current node being fabricated for 
          // new list
          words: 0 .. 5; // number of words remaining to be 
          // copied
        
        while (p != null) {
            // this setting occurs in more branches than any 
            // other
            words = 1;
            if (is_char_node(p)) {
                r = get_avail;
            } else {
                case type(p) {
                  hlist_node, vlist_node:
                    r = get_node(box_node_size);
                    ⟦1733 Copy the box {\sl Sync\TeX} information⟧
                    mem[r + 6] = mem[p + 6];
                    // copy the last two words
                    mem[r + 5] = mem[p + 5];
                    words = 5;
                    // this affects mem[r + 5]
                    list_ptr(r) = null;
                  rule_node:
                    r = get_node(rule_node_size);
                    words = rule_node_size;
                  ligature_node:
                    // only font and character are needed
                    r = get_avail;
                    mem[r] = mem[lig_char(p)];
                    goto found;
                  kern_node, math_node:
                    // {\sl Sync\TeX}: proper size for math 
                    // and kern
                    words = medium_node_size;
                    r = get_node(words);
                  glue_node:
                    r = get_node(medium_node_size);
                    // {\sl Sync\TeX}: proper size for glue
                    add_glue_ref(glue_ptr(p));
                    ⟦1735 Copy the medium sized node {\sl Sync\TeX} information⟧
                    glue_ptr(r) = glue_ptr(p);
                    leader_ptr(r) = null;
                  whatsit_node:
                    ⟦1417 Make a partial copy of the whatsit node |p| and make |r| point to it; set |words| to the number of initial words not yet copied⟧
                  othercases:
                    goto not_found;
                }
            }
            while (words > 0) {
                decr(words);
                mem[r + words] = mem[p + words];
            }
          found:
            link(h) = r;
            h = r;
          not_found:
            p = link(p);
        }
        link(h) = t;
    }
⟧

1545. When the final line ends with R-text, the value w refers to the line reflected with respect to the left edge of the enclosing vertical list.

⟦1545 Prepare for display after a non-empty paragraph⟧ = ⟦
    if (eTeX_ex) {
        ⟦1551 Let |j| be the prototype box for the display⟧
    }

    v = shift_amount(just_box)

    ⟦1542 Set the value of |x| to the text direction before the display⟧

    if (x >= 0) {
        p = list_ptr(just_box);
        link(temp_head) = null;
    } else {
        v = -v - width(just_box);
        p = new_math(0, begin_L_code);
        link(temp_head) = p;
        just_copy(
          list_ptr(just_box),
          p,
          new_math(0, end_L_code),
        );
        cur_dir = right_to_left;
    }

    v = v + 2 * quad(cur_font)

    if (TeXXeT_en) {
        ⟦1520 Initialize the LR stack⟧
    }
⟧

1546.

⟦1546 Finish the natural width computation⟧ = ⟦
    if (TeXXeT_en) {
        while (LR_ptr != null) {
            pop_LR;
        }
        if (LR_problems != 0) {
            w = max_dimen;
            LR_problems = 0;
        }
    }

    cur_dir = left_to_right

    flush_node_list(link(temp_head))
⟧

1547. In the presence of text direction directives we assume that any LR problems have been fixed by the hpack routine. If the final line contains, however, text direction directives while TeX--XeT is disabled, then we set w = max_dimen.

⟦1547 Cases of `Let |d| be the natural width' that need special treatment⟧ = ⟦
    math_node:

    {
        d = width(p);
        if (TeXXeT_en) {
            ⟦1548 Adjust \(t)the LR stack for the |init_math| routine⟧
        } else if (subtype(p) >= L_code) {
            w = max_dimen;
            goto done;
        }
    }

    edge_node:

    {
        d = width(p);
        cur_dir = subtype(p);
    }
⟧

1548.

⟦1548 Adjust \(t)the LR stack for the |init_math| routine⟧ = ⟦
    if (end_LR(p)) {
        if (info(LR_ptr) == end_LR_type(p)) {
            pop_LR;
        } else if (subtype(p) > L_code) {
            w = max_dimen;
            goto done;
        }
    } else {
        push_LR(p);
        if (LR_dir(p) != cur_dir) {
            just_reverse(p);
            p = temp_head;
        }
    }
⟧

1549.

⟦1544 Declare subprocedures for |init_math|⟧ += ⟦
    function just_reverse(p: pointer) {
        label done;
        var
          l: pointer, // the new list
          t: pointer, // tail of reversed segment
          q: pointer, // the next node
          m, n: halfword; // count of unmatched math nodes
        
        m = min_halfword;
        n = min_halfword;
        if (link(temp_head) == null) {
            just_copy(link(p), temp_head, null);
            q = link(temp_head);
        } else {
            q = link(p);
            link(p) = null;
            flush_node_list(link(temp_head));
        }
        t = new_edge(cur_dir, 0);
        l = t;
        cur_dir = reflected;
        while (q != null) {
            if (is_char_node(q)) {
                repeat {
                    p = q;
                    q = link(p);
                    link(p) = l;
                    l = p;
                } until (!is_char_node(q));
            } else {
                p = q;
                q = link(p);
                if (type(p) == math_node) {
                    ⟦1550 Adjust \(t)the LR stack for the |just_reverse| routine⟧
                }
                link(p) = l;
                l = p;
            }
        }
        goto done;
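        // unreachable after the goto above: section 1550 now performs
        // these steps itself before its own goto done
        width(t) = width(p);
        link(t) = q;
        free_node(p, small_node_size);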
      done:
        link(temp_head) = l;
    }
⟧

1550.

⟦1550 Adjust \(t)the LR stack for the |just_reverse| routine⟧ = ⟦
    if (end_LR(p)) {
        if (info(LR_ptr) != end_LR_type(p)) {
            type(p) = kern_node;
            // {\sl Sync\TeX} node size watch point: 
            // math_node size == kern_node size
            incr(LR_problems);
        } else {
            pop_LR;
            if (n > min_halfword) {
                decr(n);
                // change after into before 
                decr(subtype(p));
            } else {
                if (m > min_halfword) {
                    decr(m);
                } else {
                    width(t) = width(p);
                    link(t) = q;
                    // {\sl Sync\TeX}: no more "goto found", 
                    // and proper node size
                    free_node(p, medium_node_size);
                    goto done;
                }
                // {\sl Sync\TeX} node size watch point: 
                // math_node size == kern_node size
                type(p) = kern_node;
            }
        }
    } else {
        push_LR(p);
        if ((n > min_halfword) || (LR_dir(p) != cur_dir)) {
            incr(n);
            // change before into after 
            incr(subtype(p));
        } else {
            type(p) = kern_node;
            // {\sl Sync\TeX} node size watch point: 
            // math_node size == kern_node size
            incr(m);
        }
    }
⟧

1551. The prototype box is an hlist node with the width, glue set, and shift amount of just_box, i.e., the last line preceding the display. Its hlist reflects the current \leftskip and \rightskip.

⟦1551 Let |j| be the prototype box for the display⟧ = ⟦
    {
        if (right_skip == zero_glue) {
            j = new_kern(0);
        } else {
            j = new_param_glue(right_skip_code);
        }
        if (left_skip == zero_glue) {
            p = new_kern(0);
        } else {
            p = new_param_glue(left_skip_code);
        }
        link(p) = j;
        j = new_null_box;
        width(j) = width(just_box);
        shift_amount(j) = shift_amount(just_box);
        list_ptr(j) = p;
        glue_order(j) = glue_order(just_box);
        glue_sign(j) = glue_sign(just_box);
        glue_set(j) = glue_set(just_box);
    }
⟧

1552. At the end of a displayed equation we retrieve the prototype box.

⟦1252 Local variables for finishing a displayed formula⟧ += ⟦
    // prototype box
    var j: pointer;
⟧

1553.

⟦1553 Retrieve the prototype box⟧ = ⟦
    if (mode == mmode) {
        j = LR_box;
    }
⟧

1554.

⟦1554 Flush the prototype box⟧ = ⟦
    flush_node_list(j)
⟧

1555. The app_display procedure used to append the displayed equation and/or equation number to the current vertical list has three parameters: the prototype box, the hbox to be appended, and the displacement of the hbox in the display line.

⟦1555 Declare subprocedures for |after_math|⟧ = ⟦
    function app_display(j, b: pointer, d: scaled) {
        var
          z: scaled, // width of the line
          s: scaled, // move the line right this much
          e: scaled, // distance from right edge of box to 
          // end of line
          x: integer, //  pre_display_direction 
          p, q, r, t, u: pointer; // for list manipulation
        
        s = display_indent;
        x = pre_display_direction;
        if (x == 0) {
            shift_amount(b) = s + d;
        } else {
            z = display_width;
            p = b;
            ⟦1556 Set up the hlist for the display line⟧
            ⟦1557 Package the display line⟧
        }
        append_to_vlist(b);
    }
⟧

1556. Here we construct the hlist for the display, starting with node p and ending with node q. We also set d and e to the amount of kerning to be added before and after the hlist (adjusted for the prototype box).

⟦1556 Set up the hlist for the display line⟧ = ⟦
    if (x > 0) {
        e = z - d - width(p);
    } else {
        e = d;
        d = z - e - width(p);
    }

    if (j != null) {
        b = copy_node_list(j);
        height(b) = height(p);
        depth(b) = depth(p);
        s = s - shift_amount(b);
        d = d + s;
        e = e + width(b) - z - s;
    }

    if (box_lr(p) == dlist) {
        // display or equation number
        q = p;
    } else {
        // display and equation number
        r = list_ptr(p);
        free_node(p, box_node_size);
        if (r == null) {
            confusion(strpool!("LR4"));
        }
        if (x > 0) {
            p = r;
            repeat {
                q = r;
                // find tail of list
                r = link(r);
            } until (r == null);
        } else {
            p = null;
            q = r;
            repeat {
                t = link(r);
                link(r) = p;
                p = r;
                // reverse list
                r = t;
            } until (r == null);
        }
    }
⟧
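
The arithmetic on d and e at the start of this section can be checked with a short Python sketch. It is an illustration only: `display_kerns` is a hypothetical function, dimensions are plain integers, and the prototype box is reduced to a pair of its shift amount and width.

```python
def display_kerns(x, z, d, box_width, proto=None, s=0):
    """Return (d, e, s): kern before, kern after, and adjusted shift.

    x: pre-display direction (positive or negative), z: line width,
    d: displacement of the box, proto: (shift_amount, width) or None.
    """
    if x > 0:
        e = z - d - box_width
    else:
        e = d
        d = z - e - box_width
    if proto is not None:
        proto_shift, proto_width = proto
        s = s - proto_shift
        d = d + s
        e = e + proto_width - z - s
    return d, e, s


ltr = display_kerns(1, 100, 20, 50)
rtl = display_kerns(-1, 100, 20, 50)
with_proto = display_kerns(1, 100, 20, 50, proto=(5, 90), s=10)
```

Without a prototype box the two kerns and the box fill the line width z exactly, and a negative x merely swaps their roles. With a prototype box the same three pieces add up to the width of that box instead.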

1557. In the presence of a prototype box we use its shift amount and width to adjust the values of kerning and add these values to the glue nodes inserted to cancel the \leftskip and \rightskip. If there is no prototype box (because the display is preceded by an empty paragraph), or if the skip parameters are zero, we just add kerns.

The cancel_glue macro creates and links a glue node that is, together with another glue node, equivalent to a given amount of kerning. We can use j as temporary pointer, since all we need is j != null.

@define cancel_glue(#) =>
    j = new_skip_param(#);
    cancel_glue_cont
@define cancel_glue_cont(#) =>
    link(#) = j;
    cancel_glue_cont_cont
@define cancel_glue_cont_cont(#) =>
    link(j) = #;
    cancel_glue_end
@define cancel_glue_end(#) =>
    j = glue_ptr(#);
    cancel_glue_end_end
@define cancel_glue_end_end(#) =>
    stretch_order(temp_ptr) = stretch_order(j);
    shrink_order(temp_ptr) = shrink_order(j);
    width(temp_ptr) = # - width(j);
    stretch(temp_ptr) = -stretch(j);
    shrink(temp_ptr) = -shrink(j)
⟦1557 Package the display line⟧ = ⟦
    if (j == null) {
        r = new_kern(0);
        // the widths will be set later
        t = new_kern(0);
    } else {
        r = list_ptr(b);
        t = link(r);
    }

    u = new_math(0, end_M_code)

    //  t is \.{\\rightskip} glue
    if (type(t) == glue_node) {
        cancel_glue(right_skip_code)(q)(u)(t)(e);
        link(u) = t;
    } else {
        width(t) = e;
        link(t) = u;
        link(q) = t;
    }

    u = new_math(0, begin_M_code)

    //  r is \.{\\leftskip} glue
    if (type(r) == glue_node) {
        cancel_glue(left_skip_code)(u)(p)(r)(d);
        link(r) = u;
    } else {
        width(r) = d;
        link(r) = p;
        link(u) = r;
        if (j == null) {
            b = hpack(u, natural);
            shift_amount(b) = s;
        } else {
            list_ptr(b) = u;
        }
    }
⟧
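
The idea behind cancel_glue, a second glue item whose components negate the skip so that the pair behaves like a fixed kern, can be shown with a small Python sketch. The `Glue` class and both functions are hypothetical; the orders of infinity are omitted here, whereas the macro copies stretch_order and shrink_order so that the paired components are of the same order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glue:
    """Hypothetical glue specification without orders of infinity."""
    width: int
    stretch: int = 0
    shrink: int = 0


def cancelling_glue(skip: Glue, kern: int) -> Glue:
    """Glue that, placed next to `skip`, leaves a rigid space of `kern`."""
    return Glue(kern - skip.width, -skip.stretch, -skip.shrink)


def combined(a: Glue, b: Glue) -> Glue:
    """Sum of two adjacent glue items, component by component."""
    return Glue(a.width + b.width, a.stretch + b.stretch, a.shrink + b.shrink)


right_skip = Glue(width=10, stretch=5, shrink=2)
extra = cancelling_glue(right_skip, kern=30)
total = combined(right_skip, extra)
```

The combined item has the requested width and neither stretches nor shrinks, which is the sense in which the two glue nodes are equivalent to a kern.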

1558. The scan_tokens feature of 𝜀-TEX defines the \scantokens primitive.

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("scantokens"), input, 2)
⟧

1559.

⟦1559 Cases of |input| for |print_cmd_chr|⟧ = ⟦
    else

    if (chr_code == 2) {
        print_esc(strpool!("scantokens"));
    }
⟧

1560.

⟦1560 Cases for |input|⟧ = ⟦
    else

    if (cur_chr == 2) {
        pseudo_start;
    }
⟧

1561. The global variable pseudo_files is used to maintain a stack of pseudo files. The info field of each pseudo file points to a linked list of variable size nodes representing lines not yet processed: the info field of the first word contains the size of this node, all the following words contain ASCII codes.

⟦13 Global variables⟧ += ⟦
    // stack of pseudo files
    var pseudo_files: pointer;
⟧

1562.

⟦23 Set initial values of key variables⟧ += ⟦
    pseudo_files = null
⟧

1563. The pseudo_start procedure initiates reading from a pseudo file.

⟦1563 Declare \eTeX\ procedures for expanding⟧ = ⟦
    forward_declaration pseudo_start();
⟧

1564.

⟦1493 Declare \eTeX\ procedures for token lists⟧ += ⟦
    function pseudo_start() {
        var
          old_setting: 0 .. max_selector, // holds selector 
          // setting
          s: str_number, // string to be converted into a 
          // pseudo file
          l, m: pool_pointer, // indices into str_pool 
          p, q, r: pointer, // for list construction
          w: four_quarters, // four ASCII codes
          nl, sz: integer;
        
        scan_general_text;
        old_setting = selector;
        selector = new_string;
        token_show(temp_head);
        selector = old_setting;
        flush_list(link(temp_head));
        str_room(1);
        s = make_string;
        ⟦1565 Convert string |s| into a new pseudo file⟧
        flush_string;
        ⟦1566 Initiate input from new pseudo file⟧
    }
⟧

1565.

⟦1565 Convert string |s| into a new pseudo file⟧ = ⟦
    str_pool[pool_ptr] = si(ord!(" "))

    l = str_start_macro(s)

    nl = si(new_line_char)

    p = get_avail

    q = p

    while (l < pool_ptr) {
        m = l;
        while ((l < pool_ptr) && (str_pool[l] != nl)) {
            incr(l);
        }
        sz = (l - m + 7) div 4;
        if (sz == 1) {
            sz = 2;
        }
        r = get_node(sz);
        link(q) = r;
        q = r;
        info(q) = hi(sz);
        while (sz > 2) {
            decr(sz);
            incr(r);
            w.b0 = qi(so(str_pool[m]));
            w.b1 = qi(so(str_pool[m + 1]));
            w.b2 = qi(so(str_pool[m + 2]));
            w.b3 = qi(so(str_pool[m + 3]));
            mem[r].qqqq = w;
            m = m + 4;
        }
        w.b0 = qi(ord!(" "));
        w.b1 = qi(ord!(" "));
        w.b2 = qi(ord!(" "));
        w.b3 = qi(ord!(" "));
        if (l > m) {
            w.b0 = qi(so(str_pool[m]));
            if (l > m + 1) {
                w.b1 = qi(so(str_pool[m + 1]));
                if (l > m + 2) {
                    w.b2 = qi(so(str_pool[m + 2]));
                    if (l > m + 3) {
                        w.b3 = qi(so(str_pool[m + 3]));
                    }
                }
            }
        }
        mem[r + 1].qqqq = w;
        if (str_pool[l] == nl) {
            incr(l);
        }
    }

    info(p) = link(p)

    link(p) = pseudo_files

    pseudo_files = p
⟧
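
The layout of a pseudo file, one node per line with a size word followed by words of four characters each, can be sketched in Python. This is a simplified model under stated assumptions: nodes are plain lists, characters are kept as strings rather than quarterwords, and the function names are hypothetical. The size formula and the padding with spaces follow the code above.

```python
def node_size(length: int) -> int:
    """Words needed for a line of `length` characters, size word included."""
    sz = (length + 7) // 4
    return 2 if sz == 1 else sz


def split_lines(text: str, nl: str = "\n") -> list:
    """Split at `nl` the way the loop above does (no empty final line)."""
    lines = []
    l = 0
    while l < len(text):
        m = l
        while l < len(text) and text[l] != nl:
            l += 1
        lines.append(text[m:l])
        if l < len(text) and text[l] == nl:
            l += 1
    return lines


def pack_line(line: str) -> list:
    """One node: the size, then four-character words padded with spaces."""
    sz = node_size(len(line))
    padded = line.ljust(4 * (sz - 1))
    words = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return [sz] + words


def to_pseudo_file(text: str) -> list:
    return [pack_line(line) for line in split_lines(text)]


nodes = to_pseudo_file("ab\n\ncdefg\n")
```

An empty line still occupies a two-word node filled with spaces, and a newline at the very end of the text does not create an additional empty line.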

1566.

⟦1566 Initiate input from new pseudo file⟧ = ⟦
    // set up cur_file and new level of input
    begin_file_reading

    line = 0

    limit = start

    loc = limit + 1 // force line read

    if (tracing_scan_tokens > 0) {
        if (term_offset > max_print_line - 3) {
            print_ln;
        } else if ((term_offset > 0) || (file_offset > 0)) {
            print_char(ord!(" "));
        }
        name = 19;
        print(strpool!("( "));
        incr(open_parens);
        update_terminal;
    } else {
        name = 18;
        ⟦1718 Prepare pseudo file {\sl Sync\TeX} information⟧
    }
⟧

1567. Here we read a line from the current pseudo file into buffer.

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    // inputs the next line or returns false 
    function pseudo_input(): boolean {
        var
          p: pointer, // current line from pseudo file
          sz: integer, // size of node p 
          w: four_quarters, // four ASCII codes
          r: pointer; // loop index
        
        // cf. Matthew 19:30
        last = first;
        p = info(pseudo_files);
        if (p == null) {
            pseudo_input = false;
        } else {
            info(pseudo_files) = link(p);
            sz = ho(info(p));
            if (4 * sz - 3 >= buf_size - last) {
                ⟦35 Report overflow of the input buffer, and abort⟧
            }
            last = first;
            for (r in p + 1 to p + sz - 1) {
                w = mem[r].qqqq;
                buffer[last] = w.b0;
                buffer[last + 1] = w.b1;
                buffer[last + 2] = w.b2;
                buffer[last + 3] = w.b3;
                last = last + 4;
            }
            if (last >= max_buf_stack) {
                max_buf_stack = last + 1;
            }
            while (
                (last > first)
                && (buffer[last - 1] == ord!(" "))
            ) {
                decr(last);
            }
            free_node(p, sz);
            pseudo_input = true;
        }
    }
⟧
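
The storage scheme used by the pseudo file routines can be pictured with a small Python model. This is only a sketch under stated assumptions: pack_line and unpack_line are hypothetical names, and a list of four-character strings stands in for the four_quarters words kept in mem.

```python
def pack_line(text):
    # Store a line as 4-character words; the last word is padded with
    # spaces, and even an empty line occupies one word.
    n_words = max(1, (len(text) + 3) // 4)
    padded = text.ljust(4 * n_words)
    return [padded[i:i + 4] for i in range(0, len(padded), 4)]


def unpack_line(words):
    # Copy the words back into a buffer and drop trailing spaces,
    # as pseudo_input does by decreasing last.
    return "".join(words).rstrip(" ")
```

Because the padding and any trailing spaces of the original line are indistinguishable, unpack_line(pack_line("ab  ")) yields "ab" in this model.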

1568. When we are done with a pseudo file we ‘close’ it.

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    // close the top level pseudo file
    function pseudo_close() {
        var p, q: pointer;
        
        p = link(pseudo_files);
        q = info(pseudo_files);
        free_avail(pseudo_files);
        pseudo_files = p;
        while (q != null) {
            p = q;
            q = link(p);
            free_node(p, ho(info(p)));
        }
    }
⟧

1569.

⟦1464 Dump the \eTeX\ state⟧ += ⟦
    while (pseudo_files != null) {
        // flush pseudo files
        pseudo_close;
    }
⟧

1570.

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("readline"), read_to_cs, 1)
⟧

1571.

⟦1571 Cases of |read| for |print_cmd_chr|⟧ = ⟦
    else

    print_esc(strpool!("readline"))
⟧

1572.

⟦1572 Handle \.{\\readline} and |goto done|⟧ = ⟦
    if (j == 1) {
        // current line not yet finished
        while (loc <= limit) {
            cur_chr = buffer[loc];
            incr(loc);
            if (cur_chr == ord!(" ")) {
                cur_tok = space_token;
            } else {
                cur_tok = cur_chr + other_token;
            }
            store_new_token(cur_tok);
        }
        goto done;
    }
⟧
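
The token conversion performed for \readline can be sketched in Python as follows. The pairs (category, character) are a hypothetical stand-in for the integer encoding cur_chr + other_token used by the program, and readline_tokens is an illustrative name only.

```python
SPACE_TOKEN = ("space", " ")


def readline_tokens(line):
    # Every space becomes a space token; every other character becomes
    # a token of category "other", whatever its usual meaning.
    return [SPACE_TOKEN if ch == " " else ("other", ch) for ch in line]
```

In this model a backslash or a brace in the line is treated like any other character, so no control sequences or groups are formed.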

1573. Here we define the additional conditionals of 𝜀-TEX as well as the \unless prefix.

@define if_def_code => 17 // `\.{\\ifdefined}'
@define if_cs_code => 18 // `\.{\\ifcsname}'
@define if_font_char_code => 19 // `\.{\\iffontchar}'
@define if_in_csname_code => 20 // `\.{\\ifincsname}'
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("unless"), expand_after, 1)

    primitive(strpool!("ifdefined"), if_test, if_def_code)

    primitive(strpool!("ifcsname"), if_test, if_cs_code)

    primitive(
      strpool!("iffontchar"),
      if_test,
      if_font_char_code,
    )

    primitive(
      strpool!("ifincsname"),
      if_test,
      if_in_csname_code,
    )
⟧

1574.

⟦1574 Cases of |expandafter| for |print_cmd_chr|⟧ = ⟦
    else

    print_esc(strpool!("unless"))
⟧

1575.

⟦1575 Cases of |if_test| for |print_cmd_chr|⟧ = ⟦
    if_def_code:

    print_esc(strpool!("ifdefined"))

    if_cs_code:

    print_esc(strpool!("ifcsname"))

    if_font_char_code:

    print_esc(strpool!("iffontchar"))

    if_in_csname_code:

    print_esc(strpool!("ifincsname"))
⟧

1576. The result of a boolean condition is reversed when the conditional is preceded by \unless.

⟦1576 Negate a boolean conditional and |goto reswitch|⟧ = ⟦
    {
        get_token;
        if (
            (cur_cmd == if_test)
            && (cur_chr != if_case_code)
        ) {
            cur_chr = cur_chr + unless_code;
            goto reswitch;
        }
        print_err(strpool!("You can't use `"));
        print_esc(strpool!("unless"));
        print(strpool!("' before `"));
        print_cmd_chr(cur_cmd, cur_chr);
        print_char(ord!("'"));
        help1(
          strpool!("Continue, and I'll forget that it ever happened."),
        );
        back_error;
    }
⟧

1577. The conditional \ifdefined tests if a control sequence is defined.

We need to reset scanner_status, since \outer control sequences are allowed, but we might be scanning a macro definition or preamble.

⟦1577 Cases for |conditional|⟧ = ⟦
    if_def_code:

    {
        save_scanner_status = scanner_status;
        scanner_status = normal;
        get_next;
        b = (cur_cmd != undefined_cs);
        scanner_status = save_scanner_status;
    }
⟧

1578. The conditional \ifcsname is equivalent to {\expandafter }\expandafter \ifdefined \csname, except that no new control sequence will be entered into the hash table (once all tokens preceding the mandatory \endcsname have been expanded).

⟦1577 Cases for |conditional|⟧ += ⟦
    if_cs_code:

    {
        n = get_avail;
        // head of the list of characters
        p = n;
        e = is_in_csname;
        is_in_csname = true;
        repeat {
            get_x_token;
            if (cur_cs == 0) {
                store_new_token(cur_tok);
            }
        } until (cur_cs != 0);
        if (cur_cmd != end_cs_name) {
            ⟦407 Complain about missing \.{\\endcsname}⟧
        }
        ⟦1579 Look up the characters of list |n| in the hash table, and set |cur_cs|⟧
        flush_list(n);
        b = (eq_type(cur_cs) != undefined_cs);
        is_in_csname = e;
    }
⟧

1579.

⟦1579 Look up the characters of list |n| in the hash table, and set |cur_cs|⟧ = ⟦
    m = first

    p = link(n)

    while (p != null) {
        if (m >= max_buf_stack) {
            max_buf_stack = m + 1;
            if (max_buf_stack == buf_size) {
                overflow(strpool!("buffer size"), buf_size);
            }
        }
        buffer[m] = info(p) % max_char_val;
        incr(m);
        p = link(p);
    }

    if (m > first + 1) {
        //  no_new_control_sequence is true 
        cur_cs = id_lookup(first, m - first);
    } else if (m == first) {
        // the list is empty
        cur_cs = null_cs;
    } else {
        // the list has length one
        cur_cs = single_base + buffer[first];
    }
⟧
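
The difference between \csname and \ifcsname described above can be illustrated with a toy model in Python. A dictionary stands in for the hash table and the strings "relax" and "undefined" for the corresponding meanings; all names here are assumptions made for the illustration.

```python
def csname(name, table):
    # Model of \csname: an unknown name is entered into the table
    # and given the meaning of \relax.
    return table.setdefault(name, "relax")


def ifcsname(name, table):
    # Model of \ifcsname: a pure lookup that never adds an entry.
    return table.get(name, "undefined") != "undefined"
```

After ifcsname("bar", table) the table is unchanged, whereas csname("bar", table) leaves a new entry behind.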

1580. The conditional \iffontchar tests the existence of a character in a font.

⟦1577 Cases for |conditional|⟧ += ⟦
    if_in_csname_code:

    b = is_in_csname

    if_font_char_code:

    {
        scan_font_ident;
        n = cur_val;
        scan_usv_num;
        if (is_native_font(n)) {
            b = (map_char_to_glyph(n, cur_val) > 0);
        } else {
            if (
                (font_bc[n] <= cur_val)
                && (font_ec[n] >= cur_val)
            ) {
                b = char_exists(char_info(n)(qi(cur_val)));
            } else {
                b = false;
            }
        }
    }
⟧

1581. The protected feature of 𝜀-TEX defines the \protected prefix command for macro definitions. Such macros are protected against expansions when lists of expanded tokens are built, e.g., for \edef or during \write.

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("protected"), prefix, 8)
⟧

1582.

⟦1582 Cases of |prefix| for |print_cmd_chr|⟧ = ⟦
    else

    if (chr_code == 8) {
        print_esc(strpool!("protected"));
    }
⟧

1583. The get_x_or_protected procedure is like get_x_token except that protected macros are not expanded.

⟦1492 Declare \eTeX\ procedures for scanning⟧ += ⟦
    // sets cur_cmd , cur_chr , cur_tok , and expands 
    // non-protected macros
    function get_x_or_protected() {
        label exit;
        
        loop {
            get_token;
            if (cur_cmd <= max_command) {
                return;
            }
            if (
                (cur_cmd >= call)
                && (cur_cmd < end_template)
            ) {
                if (info(link(cur_chr)) == protected_token) {
                    return;
                }
            }
            expand;
        }
      exit:
    }
⟧

1584. A group entered (or a conditional started) in one file may end in a different file. Such slight anomalies, although perfectly legitimate, may cause errors that are difficult to locate. In order to be able to give a warning message when such anomalies occur, 𝜀-TEX uses the grp_stack and if_stack arrays to record the initial cur_boundary and cond_ptr values for each input file.

⟦13 Global variables⟧ += ⟦
    // initial cur_boundary 
    var grp_stack: ^save_pointer;

    // initial cond_ptr 
    var if_stack: ^pointer;
⟧

1585. When a group ends that was apparently entered in a different input file, the group_warning procedure is invoked in order to update the grp_stack. If, moreover, \tracingnesting is positive, we want to give a warning message. The situation is, however, somewhat complicated by two facts: (1) There may be grp_stack elements without a corresponding \input file or \scantokens pseudo file (e.g., error insertions from the terminal); and (2) the relevant information is recorded in the name_field of the input_stack, which is only loosely synchronized with the in_open variable indexing grp_stack.

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    function group_warning() {
        var
          i: 0 .. max_in_open, // index into grp_stack 
          w: boolean; // do we need a warning?
        
        base_ptr = input_ptr;
        // store current state
        input_stack[base_ptr] = cur_input;
        i = in_open;
        w = false;
        while ((grp_stack[i] == cur_boundary) && (i > 0)) {
            ⟦1586 Set variable |w| to indicate if this case should be reported⟧
            grp_stack[i] = save_index(save_ptr);
            decr(i);
        }
        if (w) {
            print_nl(strpool!("Warning: end of "));
            print_group(true);
            print(strpool!(" of a different file"));
            print_ln;
            if (tracing_nesting > 1) {
                show_context;
            }
            if (history == spotless) {
                history = warning_issued;
            }
        }
    }
⟧

1586. This code scans the input stack in order to determine the type of the current input file.

⟦1586 Set variable |w| to indicate if this case should be reported⟧ = ⟦
    if (tracing_nesting > 0) {
        while (
            (input_stack[base_ptr].state_field == token_list)
            || (input_stack[base_ptr].index_field > i)
        ) {
            decr(base_ptr);
        }
        if (input_stack[base_ptr].name_field > 17) {
            w = true;
        }
    }
⟧

1587. When a conditional ends that was apparently started in a different input file, the if_warning procedure is invoked in order to update the if_stack. If, moreover, \tracingnesting is positive, we want to give a warning message (with the same complications as above).

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    function if_warning() {
        var
          i: 0 .. max_in_open, // index into if_stack 
          w: boolean; // do we need a warning?
        
        base_ptr = input_ptr;
        // store current state
        input_stack[base_ptr] = cur_input;
        i = in_open;
        w = false;
        while (if_stack[i] == cond_ptr) {
            ⟦1586 Set variable |w| to indicate if this case should be reported⟧
            if_stack[i] = link(cond_ptr);
            decr(i);
        }
        if (w) {
            print_nl(strpool!("Warning: end of "));
            print_cmd_chr(if_test, cur_if);
            print_if_line(if_line);
            print(strpool!(" of a different file"));
            print_ln;
            if (tracing_nesting > 1) {
                show_context;
            }
            if (history == spotless) {
                history = warning_issued;
            }
        }
    }
⟧

1588. Conversely, the file_warning procedure is invoked when a file ends and some groups entered or conditionals started while reading from that file are still incomplete.

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    function file_warning() {
        var
          p: pointer, // saved value of save_ptr or cond_ptr 
          l: quarterword, // saved value of cur_level or 
          // if_limit 
          c: quarterword, // saved value of cur_group or 
          // cur_if 
          i: integer; // saved value of if_line 
        
        p = save_ptr;
        l = cur_level;
        c = cur_group;
        save_ptr = cur_boundary;
        while (grp_stack[in_open] != save_ptr) {
            decr(cur_level);
            print_nl(
              strpool!("Warning: end of file when "),
            );
            print_group(true);
            print(strpool!(" is incomplete"));
            cur_group = save_level(save_ptr);
            save_ptr = save_index(save_ptr);
        }
        save_ptr = p;
        cur_level = l;
        // restore old values
        cur_group = c;
        p = cond_ptr;
        l = if_limit;
        c = cur_if;
        i = if_line;
        while (if_stack[in_open] != cond_ptr) {
            print_nl(
              strpool!("Warning: end of file when "),
            );
            print_cmd_chr(if_test, cur_if);
            if (if_limit == fi_code) {
                print_esc(strpool!("else"));
            }
            print_if_line(if_line);
            print(strpool!(" is incomplete"));
            if_line = if_line_field(cond_ptr);
            cur_if = subtype(cond_ptr);
            if_limit = type(cond_ptr);
            cond_ptr = link(cond_ptr);
        }
        cond_ptr = p;
        if_limit = l;
        cur_if = c;
        // restore old values
        if_line = i;
        print_ln;
        if (tracing_nesting > 1) {
            show_context;
        }
        if (history == spotless) {
            history = warning_issued;
        }
    }
⟧
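
The idea behind file_warning can be reduced to a very small Python model. It assumes, for illustration only, that the open groups are kept in a list and that the list length at the moment the file was opened plays the role of grp_stack[in_open]; incomplete_groups is a hypothetical name.

```python
def incomplete_groups(open_groups, depth_at_file_start):
    # Groups entered while reading the file and still open at its end,
    # innermost first, which is the order of the warnings.
    return open_groups[depth_at_file_start:][::-1]
```

An empty result corresponds to the case in which the loop over the save stack is not executed at all.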

1589. Here are the additional 𝜀-TEX primitives for expressions.

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("numexpr"),
      last_item,
      eTeX_expr - int_val + int_val,
    )

    primitive(
      strpool!("dimexpr"),
      last_item,
      eTeX_expr - int_val + dimen_val,
    )

    primitive(
      strpool!("glueexpr"),
      last_item,
      eTeX_expr - int_val + glue_val,
    )

    primitive(
      strpool!("muexpr"),
      last_item,
      eTeX_expr - int_val + mu_val,
    )
⟧

1590.

⟦1453 Cases of |last_item| for |print_cmd_chr|⟧ += ⟦
    eTeX_expr - int_val + int_val:
      print_esc(strpool!("numexpr"));

    eTeX_expr - int_val + dimen_val:
      print_esc(strpool!("dimexpr"));

    eTeX_expr - int_val + glue_val:
      print_esc(strpool!("glueexpr"));

    eTeX_expr - int_val + mu_val:
      print_esc(strpool!("muexpr"));
⟧

1591. This code for reducing cur_val_level and/or negating the result is similar to the one for all the other cases of scan_something_internal, with the difference that scan_expr has already increased the reference count of a glue specification.

⟦1591 Process an expression and |return|⟧ = ⟦
    {
        if (m < eTeX_mu) {
            case m {
              ⟦1618 Cases for fetching a glue value⟧// 
              // there are no other cases
            }
            cur_val_level = glue_val;
        } else if (m < eTeX_expr) {
            case m {
              ⟦1619 Cases for fetching a mu value⟧// 
              // there are no other cases
            }
            cur_val_level = mu_val;
        } else {
            cur_val_level = m - eTeX_expr + int_val;
            scan_expr;
        }
        while (cur_val_level > level) {
            if (cur_val_level == glue_val) {
                m = cur_val;
                cur_val = width(m);
                delete_glue_ref(m);
            } else if (cur_val_level == mu_val) {
                mu_error;
            }
            decr(cur_val_level);
        }
        if (negative) {
            if (cur_val_level >= glue_val) {
                m = cur_val;
                cur_val = new_spec(m);
                delete_glue_ref(m);
                ⟦465 Negate all three glue components of |cur_val|⟧
            } else {
                negate(cur_val);
            }
        }
        return;
    }
⟧

1592.

⟦1492 Declare \eTeX\ procedures for scanning⟧ += ⟦
    forward_declaration scan_expr();
⟧

1593. The scan_expr procedure scans and evaluates an expression.

⟦1593 Declare procedures needed for expressions⟧ = ⟦
    ⟦1604 Declare subprocedures for |scan_expr|⟧

    // scans and evaluates an expression
    function scan_expr() {
        label restart, continue, found;
        var
          a, b: boolean, // saved values of arith_error 
          l: small_number, // type of expression
          r: small_number, // state of expression so far
          s: small_number, // state of term so far
          o: small_number, // next operation or type of next 
          // factor
          e: integer, // expression so far
          t: integer, // term so far
          f: integer, // current factor
          n: integer, // numerator of combined 
          // multiplication and division
          p: pointer, // top of expression stack
          q: pointer; // for stack manipulations
        
        l = cur_val_level;
        a = arith_error;
        b = false;
        p = null;
        incr(expand_depth_count);
        if (expand_depth_count >= expand_depth) {
            overflow(
              strpool!("expansion depth"),
              expand_depth,
            );
        }
        ⟦1594 Scan and evaluate an expression |e| of type |l|⟧
        decr(expand_depth_count);
        if (b) {
            print_err(strpool!("Arithmetic overflow"));
            help2(
              strpool!("I can't evaluate this expression,"),
            )(
              strpool!("since the result is out of range."),
            );
            error;
            if (l >= glue_val) {
                delete_glue_ref(e);
                e = zero_glue;
                add_glue_ref(e);
            } else {
                e = 0;
            }
        }
        arith_error = a;
        cur_val = e;
        cur_val_level = l;
    }
⟧

1594. Evaluating an expression is a recursive process: When the left parenthesis of a subexpression is scanned we descend to the next level of recursion; the previous level is resumed with the matching right parenthesis.

// \.( seen, or \.( $\langle\it expr\rangle$ \.) seen
@define expr_none => 0
// \.( $\langle\it expr\rangle$ \.+ seen
@define expr_add => 1
// \.( $\langle\it expr\rangle$ \.- seen
@define expr_sub => 2
@define expr_mult => 3 // $\langle\it term\rangle$ \.* seen
@define expr_div => 4 // $\langle\it term\rangle$ \./ seen
// $\langle\it term\rangle$ \.* $\langle\it factor\rangle$ 
// \./ seen
@define expr_scale => 5
⟦1594 Scan and evaluate an expression |e| of type |l|⟧ = ⟦
    restart:

    r = expr_none

    e = 0

    s = expr_none

    t = 0

    n = 0

    continue:

    if (s == expr_none) {
        o = l;
    } else {
        o = int_val;
    }

    ⟦1596 Scan a factor |f| of type |o| or start a subexpression⟧

    found:

    ⟦1595 Scan the next operator and set |o|⟧

    arith_error = b

    ⟦1601 Make sure that |f| is in the proper range⟧

    case s {
      ⟦1602 Cases for evaluation of the current term⟧// 
      // there are no other cases
    }

    if (o > expr_sub) {
        s = o;
    } else {
        ⟦1603 Evaluate the current expression⟧
    }

    b = arith_error

    if (o != expr_none) {
        goto continue;
    }

    if (p != null) {
        ⟦1600 Pop the expression stack and |goto found|⟧
    }
⟧
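
The control flow of this section (the labels restart, continue, and found) can be modelled compactly in Python. This is an illustrative sketch under simplifying assumptions, not a transcription: it accepts only unsigned integer literals, omits every overflow check, treats the end of the input like a closing parenthesis, and relies on Python's unbounded integers. The names eval_expr and rounded_div are hypothetical.

```python
import re


def rounded_div(a, b):
    # Quotient a/b rounded to the nearest integer, ties away from zero.
    sign = -1 if (a < 0) != (b < 0) else 1
    a, b = abs(a), abs(b)
    return sign * ((2 * a + b) // (2 * b))


def eval_expr(text):
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0
    stack = []  # saved (e, r, t, s, n) of the enclosing levels
    e, r, t, s, n = 0, None, 0, None, 0
    while True:  # "continue": scan a factor
        tok = tokens[pos]
        pos += 1
        if tok == "(":  # push the state and "restart"
            stack.append((e, r, t, s, n))
            e, r, t, s, n = 0, None, 0, None, 0
            continue
        f = int(tok)
        while True:  # "found": scan the next operator
            o = tokens[pos] if pos < len(tokens) else ")"
            pos += 1
            if o == ")":
                o = None
            # apply the pending term operator s to the factor f
            if s is None:
                t = f
            elif s == "*":
                if o == "/":
                    n, o = f, "scale"  # combine * and / into one step
                else:
                    t = t * f
            elif s == "/":
                t = rounded_div(t, f)
            else:  # "scale": t * n / f with a single rounding
                t = rounded_div(t * n, f)
            if o in ("*", "/", "scale"):
                s = o
            else:  # the term is complete: fold it into e
                s = None
                if r is None:
                    e = t
                elif r == "+":
                    e = e + t
                else:
                    e = e - t
                r = o
            if o is not None:
                break  # go and scan the next factor
            if not stack:
                return e
            f = e  # pop: the subexpression becomes a factor
            e, r, t, s, n = stack.pop()
```

For example, eval_expr("7/2*2") rounds 7/2 up to 4 before multiplying, while eval_expr("7*15/4") rounds only once because the multiplication and division are combined.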

1595.

⟦1595 Scan the next operator and set |o|⟧ = ⟦
    ⟦440 Get the next non-blank non-call token⟧

    if (cur_tok == other_token + ord!("+")) {
        o = expr_add;
    } else if (cur_tok == other_token + ord!("-")) {
        o = expr_sub;
    } else if (cur_tok == other_token + ord!("*")) {
        o = expr_mult;
    } else if (cur_tok == other_token + ord!("/")) {
        o = expr_div;
    } else {
        o = expr_none;
        if (p == null) {
            if (cur_cmd != relax) {
                back_input;
            }
        } else if (cur_tok != other_token + ord!(")")) {
            print_err(
              strpool!("Missing ) inserted for expression"),
            );
            help1(
              strpool!("I was expecting to see `+', `-', `*', `/', or `)'. Didn't."),
            );
            back_error;
        }
    }
⟧

1596.

⟦1596 Scan a factor |f| of type |o| or start a subexpression⟧ = ⟦
    ⟦440 Get the next non-blank non-call token⟧

    if (cur_tok == other_token + ord!("(")) {
        ⟦1599 Push the expression stack and |goto restart|⟧
    }

    back_input

    if (o == int_val) {
        scan_int;
    } else if (o == dimen_val) {
        scan_normal_dimen;
    } else if (o == glue_val) {
        scan_normal_glue;
    } else {
        scan_mu_glue;
    }

    f = cur_val
⟧

1597.

⟦1492 Declare \eTeX\ procedures for scanning⟧ += ⟦
    forward_declaration scan_normal_glue();

    forward_declaration scan_mu_glue();
⟧

1598. Here we declare two trivial procedures in order to avoid mutually recursive procedures with parameters.

⟦1593 Declare procedures needed for expressions⟧ += ⟦
    function scan_normal_glue() {
        scan_glue(glue_val);
    }

    function scan_mu_glue() {
        scan_glue(mu_val);
    }
⟧

1599. Parenthesized subexpressions can be inside expressions, and this nesting has a stack. Seven local variables represent the top of the expression stack: p points to pushed-down entries, if any; l specifies the type of expression currently being evaluated; e is the expression so far and r is the state of its evaluation; t is the term so far and s is the state of its evaluation; finally n is the numerator for a combined multiplication and division, if any.

// number of words in stack entry for subexpressions
@define expr_node_size => 4
// saved expression so far
@define expr_e_field(#) => mem[# + 1].int
// saved term so far
@define expr_t_field(#) => mem[# + 2].int
@define expr_n_field(#) => mem[# + 3].int // saved numerator
⟦1599 Push the expression stack and |goto restart|⟧ = ⟦
    {
        q = get_node(expr_node_size);
        link(q) = p;
        type(q) = l;
        subtype(q) = 4 * s + r;
        expr_e_field(q) = e;
        expr_t_field(q) = t;
        expr_n_field(q) = n;
        p = q;
        l = o;
        goto restart;
    }
⟧

1600.

⟦1600 Pop the expression stack and |goto found|⟧ = ⟦
    {
        f = e;
        q = p;
        e = expr_e_field(q);
        t = expr_t_field(q);
        n = expr_n_field(q);
        s = subtype(q) div 4;
        r = subtype(q) % 4;
        l = type(q);
        p = link(q);
        free_node(q, expr_node_size);
        goto found;
    }
⟧
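
A stack entry keeps the two small state values s and r in a single subtype field, stored as 4 * s + r and recovered by division and remainder. A minimal Python sketch of this packing follows; the constant values are those of the definitions in the section on scanning expressions, while pack_state and unpack_state are hypothetical names.

```python
EXPR_NONE, EXPR_ADD, EXPR_SUB, EXPR_MULT, EXPR_DIV, EXPR_SCALE = range(6)


def pack_state(s, r):
    # r is expr_none, expr_add, or expr_sub, so it fits below 4.
    assert 0 <= r < 4
    return 4 * s + r


def unpack_state(packed):
    # Returns the pair (s, r).
    return divmod(packed, 4)
```

The packing is unambiguous because r never reaches 4; the largest packed value in this model is pack_state(EXPR_SCALE, EXPR_SUB).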

1601. We want to make sure that each term and (intermediate) result is in the proper range. Integer values must not exceed infinity ($2^{31}-1$) in absolute value, dimensions must not exceed max_dimen ($2^{30}-1$). We avoid the absolute value of an integer, because this might fail for the value $-2^{31}$ using 32-bit arithmetic.

// clear a number or dimension and set arith_error 
@define num_error(#) =>
    {
        arith_error = true;
        # = 0;
    }
// clear a glue spec and set arith_error 
@define glue_error(#) =>
    {
        arith_error = true;
        delete_glue_ref(#);
        # = new_spec(zero_glue);
    }
⟦1601 Make sure that |f| is in the proper range⟧ = ⟦
    if ((l == int_val) || (s > expr_sub)) {
        if ((f > infinity) || (f < -infinity)) {
            num_error(f);
        }
    } else if (l == dimen_val) {
        if (abs(f) > max_dimen) {
            num_error(f);
        }
    } else {
        if (
            (abs(width(f)) > max_dimen)
            || (abs(stretch(f)) > max_dimen)
            || (abs(shrink(f)) > max_dimen)
        ) {
            glue_error(f);
        }
    }
⟧

1602. Applying the factor f to the partial term t (with the operator s) is delayed until the next operator o has been scanned. Here we handle the first factor of a partial term. A glue spec has to be copied unless the next operator is a right parenthesis; this allows us later on to simply modify the glue components.

@define normalize_glue(#) =>
    if (stretch(#) == 0) {
        stretch_order(#) = normal;
    }
    if (shrink(#) == 0) {
        shrink_order(#) = normal;
    }
⟦1602 Cases for evaluation of the current term⟧ = ⟦
    expr_none:

    if ((l >= glue_val) && (o != expr_none)) {
        t = new_spec(f);
        delete_glue_ref(f);
        normalize_glue(t);
    } else {
        t = f;
    }
⟧

1603. When a term t has been completed it is copied to, added to, or subtracted from the expression e.

@define expr_add_sub(#) => add_or_sub(#, r == expr_sub)
@define expr_a(#) => expr_add_sub(#, max_dimen)
⟦1603 Evaluate the current expression⟧ = ⟦
    {
        s = expr_none;
        if (r == expr_none) {
            e = t;
        } else if (l == int_val) {
            e = expr_add_sub(e, t, infinity);
        } else if (l == dimen_val) {
            e = expr_a(e, t);
        } else {
            ⟦1605 Compute the sum or difference of two glue specs⟧
        }
        r = o;
    }
⟧

1604. The function add_or_sub(x, y, max_answer, negative) computes the sum (for negative == false) or difference (for negative == true) of x and y, provided the absolute value of the result does not exceed max_answer; otherwise it sets arith_error and returns zero.

⟦1604 Declare subprocedures for |scan_expr|⟧ = ⟦
    function add_or_sub(
      x, y, max_answer: integer,
      negative: boolean,
    ): integer {
        var
          a: integer; // the answer
        
        if (negative) {
            negate(y);
        }
        if (x >= 0) {
            if (y <= max_answer - x) {
                a = x + y;
            } else {
                num_error(a);
            }
        } else if (y >= -max_answer - x) {
            a = x + y;
        } else {
            num_error(a);
        }
        add_or_sub = a;
    }
⟧
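
The range test in add_or_sub is arranged so that the comparison itself cannot overflow: instead of forming x + y first, it compares y with max_answer - x (or -max_answer - x), both of which stay in range. A Python sketch of the same logic follows. Returning an (answer, overflow) pair instead of setting the global arith_error is an assumption of this model.

```python
def add_or_sub(x, y, max_answer, negative):
    # Returns (answer, overflow); the answer is 0 when overflow is True.
    if negative:
        y = -y
    if x >= 0:
        in_range = y <= max_answer - x
    else:
        in_range = y >= -max_answer - x
    if in_range:
        return x + y, False
    return 0, True
```

With max_answer equal to 2**31 - 1, adding 1 to the largest value is reported as an overflow, while subtracting 7 from 5 simply gives -2.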

1605. We know that stretch_order(e) > normal implies stretch(e) != 0 and shrink_order(e) > normal implies shrink(e) != 0.

⟦1605 Compute the sum or difference of two glue specs⟧ = ⟦
    {
        width(e) = expr_a(width(e), width(t));
        if (stretch_order(e) == stretch_order(t)) {
            stretch(e) = expr_a(stretch(e), stretch(t));
        } else if (
            (stretch_order(e) < stretch_order(t))
            && (stretch(t) != 0)
        ) {
            stretch(e) = stretch(t);
            stretch_order(e) = stretch_order(t);
        }
        if (shrink_order(e) == shrink_order(t)) {
            shrink(e) = expr_a(shrink(e), shrink(t));
        } else if (
            (shrink_order(e) < shrink_order(t))
            && (shrink(t) != 0)
        ) {
            shrink(e) = shrink(t);
            shrink_order(e) = shrink_order(t);
        }
        delete_glue_ref(t);
        normalize_glue(e);
    }
⟧
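
The rule for combining one stretch or shrink component can be sketched in Python. The sketch is restricted to addition and leaves out the overflow check; add_component is a hypothetical name, and the orders are plain integers with 0 standing for normal.

```python
def add_component(a, a_order, b, b_order):
    # Combine one glue component (amount, order of infinity).
    if a_order == b_order:
        total, order = a + b, a_order
    elif a_order < b_order and b != 0:
        total, order = b, b_order  # the higher order wins outright
    else:
        total, order = a, a_order
    if total == 0:
        order = 0  # normalize_glue: a zero amount has normal order
    return total, order
```

A finite amount added to an infinite one therefore disappears, and two infinite amounts of the same order that cancel leave a component of normal order.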

1606. If a multiplication is followed by a division, the two operations are combined into a ‘scaling’ operation. Otherwise the term t is multiplied by the factor f.

@define expr_m(#) => # = nx_plus_y(#, f, 0)
⟦1602 Cases for evaluation of the current term⟧ += ⟦
    expr_mult:

    if (o == expr_div) {
        n = f;
        o = expr_scale;
    } else if (l == int_val) {
        t = mult_integers(t, f);
    } else if (l == dimen_val) {
        expr_m(t);
    } else {
        expr_m(width(t));
        expr_m(stretch(t));
        expr_m(shrink(t));
    }
⟧

1607. Here we divide the term t by the factor f.

@define expr_d(#) => # = quotient(#, f)
⟦1602 Cases for evaluation of the current term⟧ += ⟦
    expr_div:

    if (l < glue_val) {
        expr_d(t);
    } else {
        expr_d(width(t));
        expr_d(stretch(t));
        expr_d(shrink(t));
    }
⟧

1608. The function quotient(n, d) computes the rounded quotient $q=\lfloor n/d+{1\over2}\rfloor$, when $n$ and $d$ are positive.

⟦1604 Declare subprocedures for |scan_expr|⟧ += ⟦
    function quotient(n, d: integer): integer {
        var
          negative: boolean, // should the answer be 
          // negated?
          a: integer; // the answer
        
        if (d == 0) {
            num_error(a);
        } else {
            if (d > 0) {
                negative = false;
            } else {
                negate(d);
                negative = true;
            }
            if (n < 0) {
                negate(n);
                negative = !negative;
            }
            a = n div d;
            n = n - a * d;
            // avoid certain compiler optimizations!
            d = n - d;
            if (d + n >= 0) {
                incr(a);
            }
            if (negative) {
                negate(a);
            }
        }
        quotient = a;
    }
⟧
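
The rounding step of quotient deserves a closer look: after the truncated division, the remainder n is compared with half of d by testing (n - d) + n >= 0 rather than 2 * n >= d, so that no intermediate value exceeds the operands in magnitude. A Python sketch follows; as an assumption of this model it returns an (answer, overflow) pair instead of setting arith_error.

```python
def quotient(n, d):
    # Returns (answer, overflow); division by zero counts as overflow.
    if d == 0:
        return 0, True
    negative = d < 0
    if negative:
        d = -d
    if n < 0:
        n = -n
        negative = not negative
    a = n // d          # truncated quotient of the magnitudes
    n = n - a * d       # remainder, 0 <= n < d
    d = n - d           # negative: remainder minus divisor
    if d + n >= 0:      # same as 2 * remainder >= divisor
        a += 1
    if negative:
        a = -a
    return a, False
```

Halves are rounded away from zero in this scheme, so both quotient(7, 2) and quotient(-7, 2) round to magnitude 4.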

1609. Here the term t is multiplied by the quotient $n/f$.

@define expr_s(#) => # = fract(#, n, f, max_dimen)
⟦1602 Cases for evaluation of the current term⟧ += ⟦
    expr_scale:

    if (l == int_val) {
        t = fract(t, n, f, infinity);
    } else if (l == dimen_val) {
        expr_s(t);
    } else {
        expr_s(width(t));
        expr_s(stretch(t));
        expr_s(shrink(t));
    }
⟧

1610. Finally, the function fract(x, n, d, max_answer) computes the integer $q=\lfloor xn/d+{1\over2}\rfloor$, when $x$, $n$, and $d$ are positive and the result does not exceed max_answer. We can’t use floating point arithmetic since the routine must produce identical results in all cases; and it would be too dangerous to multiply by n and then divide by d, in separate operations, since overflow might well occur. Hence this subroutine simulates double precision arithmetic, somewhat analogous to METAFONT’s make_fraction and take_fraction routines.

@define too_big => 88 // go here when the result is too big
⟦1604 Declare subprocedures for |scan_expr|⟧ += ⟦
    function fract(x, n, d, max_answer: integer): integer {
        label found, found1, too_big, done;
        var
          negative: boolean, // should the answer be 
          // negated?
          a: integer, // the answer
          f: integer, // a proper fraction
          h: integer, // smallest integer such that 2 * h >= 
          // d 
          r: integer, // intermediate remainder
          t: integer; // temp variable
        
        if (d == 0) {
            goto too_big;
        }
        a = 0;
        if (d > 0) {
            negative = false;
        } else {
            negate(d);
            negative = true;
        }
        if (x < 0) {
            negate(x);
            negative = !negative;
        } else if (x == 0) {
            goto done;
        }
        if (n < 0) {
            negate(n);
            negative = !negative;
        }
        t = n div d;
        if (t > max_answer div x) {
            goto too_big;
        }
        a = t * x;
        n = n - t * d;
        if (n == 0) {
            goto found;
        }
        t = x div d;
        if (t > (max_answer - a) div n) {
            goto too_big;
        }
        a = a + t * n;
        x = x - t * d;
        if (x == 0) {
            goto found;
        }
        if (x < n) {
            t = x;
            x = n;
            n = t;
            // now 0 < n <= x < d 
        }
        ⟦1611 Compute \(f)$f=\lfloor xn/d+{1\over2}\rfloor$⟧
        if (f > (max_answer - a)) {
            goto too_big;
        }
        a = a + f;
      found:
        if (negative) {
            negate(a);
        }
        goto done;
      too_big:
        num_error(a);
      done:
        fract = a;
    }
⟧

1611. The loop here preserves the following invariant relations between f, x, n, and r: (i) $f+\lfloor(xn+(r+d))/d\rfloor=\lfloor x_0n_0/d+{1\over2}\rfloor$; (ii) -d <= r < 0 < n <= x < d, where $x_0$, $n_0$ are the original values of $x$ and $n$.

Notice that the computation specifies (x - d) + x instead of (x + x) - d, because the latter could overflow.

⟦1611 Compute \(f)$f=\lfloor xn/d+{1\over2}\rfloor$⟧ = ⟦
    f = 0

    r = (d div 2) - d

    h = -r

    loop {
        if (odd(n)) {
            r = r + x;
            if (r >= 0) {
                r = r - d;
                incr(f);
            }
        }
        n = n div 2;
        if (n == 0) {
            goto found1;
        }
        if (x < h) {
            x = x + x;
        } else {
            t = x - d;
            x = t + x;
            f = f + n;
            if (x < n) {
                if (x == 0) {
                    goto found1;
                }
                t = x;
                x = n;
                n = t;
            }
        }
    }

    found1:
⟧
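The loop can be transcribed almost line for line into a language with unbounded integers. The following Python sketch is an illustration only, not part of the program: the function name `round_xn_over_d` is hypothetical, and it assumes the precondition 0 < n <= x < d that the calling code establishes. The order (x - d) + x is kept even though Python integers cannot overflow.

```python
def round_xn_over_d(x, n, d):
    """Return floor(x*n/d + 1/2), assuming 0 < n <= x < d."""
    assert 0 < n <= x < d
    f = 0
    r = d // 2 - d          # so that r + d == d div 2
    h = -r                  # smallest integer with 2*h >= d
    while True:
        if n % 2 == 1:
            r = r + x
            if r >= 0:
                r = r - d
                f = f + 1
        n = n // 2
        if n == 0:
            break
        if x < h:
            x = x + x
        else:
            t = x - d       # (x - d) + x, never (x + x) - d
            x = t + x
            f = f + n
            if x < n:
                if x == 0:
                    break
                x, n = n, x  # restore n <= x
    return f
```

Comparing the result with the exact expression `(2 * x * n + d) // (2 * d)` over small inputs is a quick way to convince oneself that the invariants hold.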

1612. The \gluestretch, \glueshrink, \gluestretchorder, and \glueshrinkorder commands return the stretch and shrink components and their orders of “infinity” of a glue specification.

// code for \.{\\gluestretchorder}
@define glue_stretch_order_code => eTeX_int + 6
// code for \.{\\glueshrinkorder}
@define glue_shrink_order_code => eTeX_int + 7
// code for \.{\\gluestretch}
@define glue_stretch_code => eTeX_dim + 7
// code for \.{\\glueshrink}
@define glue_shrink_code => eTeX_dim + 8
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("gluestretchorder"),
      last_item,
      glue_stretch_order_code,
    )

    primitive(
      strpool!("glueshrinkorder"),
      last_item,
      glue_shrink_order_code,
    )

    primitive(
      strpool!("gluestretch"),
      last_item,
      glue_stretch_code,
    )

    primitive(
      strpool!("glueshrink"),
      last_item,
      glue_shrink_code,
    )
⟧

1613.

⟦1453 Cases of |last_item| for |print_cmd_chr|⟧ += ⟦
    glue_stretch_order_code:

    print_esc(strpool!("gluestretchorder"))

    glue_shrink_order_code:

    print_esc(strpool!("glueshrinkorder"))

    glue_stretch_code:

    print_esc(strpool!("gluestretch"))

    glue_shrink_code:

    print_esc(strpool!("glueshrink"))
⟧

1614.

⟦1454 Cases for fetching an integer value⟧ += ⟦
    glue_stretch_order_code, glue_shrink_order_code:
      scan_normal_glue;
      q = cur_val;
      if (m == glue_stretch_order_code) {
          cur_val = stretch_order(q);
      } else {
          cur_val = shrink_order(q);
      }
      delete_glue_ref(q);
⟧

1615.

⟦1458 Cases for fetching a dimension value⟧ += ⟦
    glue_stretch_code, glue_shrink_code:
      scan_normal_glue;
      q = cur_val;
      if (m == glue_stretch_code) {
          cur_val = stretch(q);
      } else {
          cur_val = shrink(q);
      }
      delete_glue_ref(q);
⟧

1616. The \mutoglue and \gluetomu commands convert “math” glue into normal glue and vice versa; they make it possible to manipulate math glue with \gluestretch etc.

// code for \.{\\mutoglue}
@define mu_to_glue_code => eTeX_glue
@define glue_to_mu_code => eTeX_mu // code for 
// \.{\\gluetomu}
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("mutoglue"),
      last_item,
      mu_to_glue_code,
    )

    primitive(
      strpool!("gluetomu"),
      last_item,
      glue_to_mu_code,
    )
⟧

1617.

⟦1453 Cases of |last_item| for |print_cmd_chr|⟧ += ⟦
    mu_to_glue_code:

    print_esc(strpool!("mutoglue"))

    glue_to_mu_code:

    print_esc(strpool!("gluetomu"))
⟧

1618.

⟦1618 Cases for fetching a glue value⟧ = ⟦
    mu_to_glue_code:
      scan_mu_glue;
⟧

1619.

⟦1619 Cases for fetching a mu value⟧ = ⟦
    glue_to_mu_code:
      scan_normal_glue;
⟧

1620. 𝜀-TEX (in extended mode) supports 32768 (i.e., $2^{15}$) count, dimen, skip, muskip, box, and token registers. As in TEX the first 256 registers of each kind are realized as arrays in the table of equivalents; the additional registers are realized as tree structures built from variable-size nodes with individual registers existing only when needed. Default values are used for nonexistent registers: zero for count and dimen values, zero_glue for glue (skip and muskip) values, void for boxes, and null for token lists (and current marks discussed below).

Similarly there are 32768 mark classes; the command \marks n creates a mark node for a given mark class 0 <= n <= 32767 (where \marks0 is synonymous with \mark). The page builder (actually the fire_up routine) and the vsplit routine maintain the current values of top_mark , first_mark , bot_mark , split_first_mark , and split_bot_mark for each mark class. They are accessed as \topmarks n etc., and \topmarks0 is again synonymous with \topmark. As in TEX, the five current marks for mark class zero are realized as the cur_mark array. The additional current marks are again realized as a tree structure, with individual mark classes existing only when needed.

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(strpool!("marks"), mark, marks_code)

    primitive(
      strpool!("topmarks"),
      top_bot_mark,
      top_mark_code + marks_code,
    )

    primitive(
      strpool!("firstmarks"),
      top_bot_mark,
      first_mark_code + marks_code,
    )

    primitive(
      strpool!("botmarks"),
      top_bot_mark,
      bot_mark_code + marks_code,
    )

    primitive(
      strpool!("splitfirstmarks"),
      top_bot_mark,
      split_first_mark_code + marks_code,
    )

    primitive(
      strpool!("splitbotmarks"),
      top_bot_mark,
      split_bot_mark_code + marks_code,
    )
⟧

1621. The scan_register_num procedure scans a register number that must not exceed 255 in compatibility mode or 32767 in extended mode.

⟦1563 Declare \eTeX\ procedures for expanding⟧ += ⟦
    forward_declaration scan_register_num();
⟧

1622.

⟦467 Declare procedures that scan restricted classes of integers⟧ += ⟦
    function scan_register_num() {
        scan_int;
        if ((cur_val < 0) || (cur_val > max_reg_num)) {
            print_err(strpool!("Bad register code"));
            help2(max_reg_help_line)(
              strpool!("I changed this one to zero."),
            );
            int_error(cur_val);
            cur_val = 0;
        }
    }
⟧

1623.

⟦1623 Initialize variables for \eTeX\ compatibility mode⟧ = ⟦
    max_reg_num = 255

    max_reg_help_line = strpool!("A register number must be between 0 and 255.")
⟧

1624.

⟦1624 Initialize variables for \eTeX\ extended mode⟧ = ⟦
    max_reg_num = 32767

    max_reg_help_line = strpool!("A register number must be between 0 and 32767.")
⟧
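The range check and its mode-dependent limit can be sketched as follows. This is a simplified Python model, not the program's code: `check_register_num` is a hypothetical name, and instead of calling an error routine it returns the help line alongside the corrected value.

```python
COMPAT_MAX_REG = 255       # limit in compatibility mode
EXTENDED_MAX_REG = 32767   # limit in extended mode


def check_register_num(value, max_reg_num):
    """Return (register number, help line or None).

    An out-of-range value is replaced by zero, as in scan_register_num.
    """
    if value < 0 or value > max_reg_num:
        help_line = f"A register number must be between 0 and {max_reg_num}."
        return 0, help_line
    return value, None
```

The same input can therefore be an error in one mode and valid in the other: 300 is rejected against `COMPAT_MAX_REG` but accepted against `EXTENDED_MAX_REG`.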

1625.

⟦13 Global variables⟧ += ⟦
    // largest allowed register number
    var max_reg_num: halfword;

    // first line of help message
    var max_reg_help_line: str_number;
⟧

1626. There are eight almost identical doubly linked trees, one for the sparse array of the up to 32512 additional registers of each kind, one for inter-character token lists at specified class transitions, and one for the sparse array of the up to 32767 additional mark classes. The root of each such tree, if it exists, is an index node containing 64 pointers to subtrees for $64^4$ consecutive array elements. Similar index nodes are the starting points for all nonempty subtrees for $64^3$, $64^2$, and 64 consecutive array elements. These four levels of index nodes are followed by a fifth level with nodes for the individual array elements.

Each index node is 33 words long. The pointers to the 64 possible subtrees or nodes are kept in the info and link fields of the last 32 words. (It would be both elegant and efficient to declare them as an array; unfortunately, Pascal doesn’t allow this.)

The fields in the first word of each index node and in the nodes for the array elements are closely related. The link field points to the next lower index node and the sa_index field contains six bits of the register number or mark class. For the lowest index node the link field is null and the sa_index field indicates the type of quantity (int_val , dimen_val , glue_val , mu_val , box_val , tok_val , inter_char_val or mark_val ). The sa_used field in the index nodes counts how many of the 64 pointers are non-null.

The sa_index field in the nodes for array elements contains the six bits plus 64 times the type. Therefore such a node represents a count or dimen register if and only if sa_index < dimen_val_limit ; it represents a skip or muskip register if and only if dimen_val_limit <= sa_index < mu_val_limit ; it represents a box register if and only if mu_val_limit <= sa_index < box_val_limit ; it represents a token list register if and only if box_val_limit <= sa_index < tok_val_limit ; finally it represents a mark class if and only if tok_val_limit <= sa_index .
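The range tests in the preceding paragraph can be tried out with a small Python sketch. The values of `BOX_VAL` and `MARK_VAL` and the four limits come from the definitions in this section; the remaining type codes are assumptions chosen to be consistent with those limits, and the function names are hypothetical.

```python
# box_val = 4 and mark_val = 7 are given in the text; the other codes are
# assumed to number the types consecutively from zero.
(INT_VAL, DIMEN_VAL, GLUE_VAL, MU_VAL,
 BOX_VAL, TOK_VAL, INTER_CHAR_VAL, MARK_VAL) = range(8)

DIMEN_VAL_LIMIT = 64 * (DIMEN_VAL + 1)  # 0x80
MU_VAL_LIMIT = 64 * (MU_VAL + 1)        # 0x100
BOX_VAL_LIMIT = 64 * (BOX_VAL + 1)      # 0x140
TOK_VAL_LIMIT = 64 * (TOK_VAL + 1)      # 0x180


def element_sa_index(kind, low_digit):
    """Combine a type code with the lowest six-bit digit of the number."""
    assert 0 <= low_digit < 64
    return 64 * kind + low_digit


def classify(sa_index):
    """Name the kind of element using the range tests from the text."""
    if sa_index < DIMEN_VAL_LIMIT:
        return "count or dimen"
    if sa_index < MU_VAL_LIMIT:
        return "skip or muskip"
    if sa_index < BOX_VAL_LIMIT:
        return "box"
    if sa_index < TOK_VAL_LIMIT:
        return "token list"
    # The text names mark classes for this range; under the assumed codes,
    # inter-character token lists fall here as well.
    return "mark class or inter-character token list"
```

Because the type occupies the bits above the six-bit digit, a single comparison against a limit is enough to separate the kinds of node.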

The new_index procedure creates an index node (returned in cur_ptr ) having given contents of the sa_index and link fields.

@define box_val => 4 // the additional box registers
@define mark_val => 7 // the additional mark classes
@define dimen_val_limit => 0x80 // $2^6\cdot( dimen_val +1)$
@define mu_val_limit => 0x100 // $2^6\cdot( mu_val +1)$
@define box_val_limit => 0x140 // $2^6\cdot( box_val +1)$
@define tok_val_limit => 0x180 // $2^6\cdot( tok_val +1)$
@define index_node_size => 33 // size of an index node
// a six-bit address or a type or both
@define sa_index => type
@define sa_used => subtype // count of non-null pointers
⟦1563 Declare \eTeX\ procedures for expanding⟧ += ⟦
    function new_index(i: quarterword, q: pointer) {
        var
          k: small_number; // loop index
        
        cur_ptr = get_node(index_node_size);
        sa_index(cur_ptr) = i;
        sa_used(cur_ptr) = 0;
        link(cur_ptr) = q;
        // clear all 64 pointers
        for (k in 1 to index_node_size - 1) {
            mem[cur_ptr + k] = sa_null;
        }
    }
⟧
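The layout of an index node, with 64 pointers packed two per word into the info and link halves of words 1 to 32, can be modelled as follows. This Python sketch is an assumption-laden illustration: a node is a dictionary rather than a run of memory words, and `slot` is a hypothetical helper that exposes the address arithmetic used by get_sa_ptr and put_sa_ptr.

```python
INDEX_NODE_SIZE = 33  # one header word plus 32 words of pointers


def new_index_node(sa_index, parent):
    """Build a model index node with all 64 pointers cleared."""
    return {
        "sa_index": sa_index,
        "sa_used": 0,
        "link": parent,
        # words[k - 1] models mem[q + k]; each word is [info, link]
        "words": [[None, None] for _ in range(INDEX_NODE_SIZE - 1)],
    }


def slot(i):
    """Return (word offset, half) for pointer i; half 0 is info, 1 is link."""
    return i // 2 + 1, i % 2


def get_sa_ptr(node, i):
    """Fetch the pointer indexed by i."""
    offset, half = slot(i)
    return node["words"][offset - 1][half]


def add_sa_ptr(node, i, ptr):
    """Store a pointer at index i and count it in sa_used."""
    offset, half = slot(i)
    node["words"][offset - 1][half] = ptr
    node["sa_used"] += 1
```

Even indices land in the info half and odd indices in the link half of the same word, so the 64 indices cover words 1 through 32 exactly once each per half.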

1627. The roots of the eight trees for the additional registers and mark classes are kept in the sa_root array. The first seven locations must be dumped and undumped; the last one is also known as sa_mark .

@define sa_mark => sa_root[mark_val] // root for mark 
// classes
⟦13 Global variables⟧ += ⟦
    // roots of sparse arrays
    var sa_root: array [int_val .. mark_val] of pointer;

    // value returned by new_index and find_sa_element 
    var cur_ptr: pointer;

    // two null pointers
    var sa_null: memory_word;
⟧

1628.

⟦23 Set initial values of key variables⟧ += ⟦
    sa_mark = null

    sa_null.hh.lh = null

    sa_null.hh.rh = null
⟧

1629.

⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    for (i in int_val to inter_char_val) {
        sa_root[i] = null;
    }
⟧

1630. Given a type t and a twenty-four-bit number n , the find_sa_element procedure returns (in cur_ptr ) a pointer to the node for the corresponding array element, or null when no such element exists. The third parameter w is set true if the element must exist, e.g., because it is about to be modified. The procedure has two main branches: one follows the existing tree structure, the other (only used when w is true ) creates the missing nodes.

We use macros to extract the six-bit pieces from a twenty-four-bit register number or mark class and to fetch or store one of the 64 pointers from an index node. (Note that the hex_dig macros are mis-named since the conversion from 4-bit to 6-bit fields for XƎTEX!)

// some tree element is missing
@define if_cur_ptr_is_null_then_return_or_goto(#) =>
    {
        if (cur_ptr == null) {
            if (w) {
                goto #;
            } else {
                return;
            }
        }
    }
// the fourth lowest 6-bit field
@define hex_dig1(#) => # div 0x40000
// the third lowest 6-bit field
@define hex_dig2(#) => (# div 0x1000) % 0x40
// the second lowest 6-bit field
@define hex_dig3(#) => (# div 0x40) % 0x40
@define hex_dig4(#) => # % 0x40 // the lowest 6-bit field
@define get_sa_ptr =>
    if (odd(i)) {
        cur_ptr = link(q + (i div 2) + 1);
    } else {
        // set cur_ptr to the pointer indexed by i from 
        // index node q 
        cur_ptr = info(q + (i div 2) + 1);
    }
@define put_sa_ptr(#) =>
    if (odd(i)) {
        link(q + (i div 2) + 1) = #;
    } else {
        // store the pointer indexed by i in index node q 
        info(q + (i div 2) + 1) = #;
    }
@define add_sa_ptr =>
    {
        put_sa_ptr(cur_ptr);
        incr(sa_used(q));
        // add cur_ptr as the pointer indexed by i in index 
        // node q 
    }
@define delete_sa_ptr =>
    {
        put_sa_ptr(null);
        decr(sa_used(q));
        // delete the pointer indexed by i in index node q 
    }
⟦1563 Declare \eTeX\ procedures for expanding⟧ += ⟦
    // sets cur_ptr to sparse array element location or null 
    function find_sa_element(
      t: small_number,
      n: halfword,
      w: boolean,
    ) {
        label
            not_found,
            not_found1,
            not_found2,
            not_found3,
            not_found4,
            exit;
        var
          q: pointer, // for list manipulations
          i: small_number; // a six bit index
        
        cur_ptr = sa_root[t];
        if_cur_ptr_is_null_then_return_or_goto(not_found);
        q = cur_ptr;
        i = hex_dig1(n);
        get_sa_ptr;
        if_cur_ptr_is_null_then_return_or_goto(not_found1);
        q = cur_ptr;
        i = hex_dig2(n);
        get_sa_ptr;
        if_cur_ptr_is_null_then_return_or_goto(not_found2);
        q = cur_ptr;
        i = hex_dig3(n);
        get_sa_ptr;
        if_cur_ptr_is_null_then_return_or_goto(not_found3);
        q = cur_ptr;
        i = hex_dig4(n);
        get_sa_ptr;
        if ((cur_ptr == null) && w) {
            goto not_found4;
        }
        return;
      not_found:
        // create first level index node
        new_index(t, null);
        sa_root[t] = cur_ptr;
        q = cur_ptr;
        i = hex_dig1(n);
      not_found1:
        // create second level index node
        new_index(i, q);
        add_sa_ptr;
        q = cur_ptr;
        i = hex_dig2(n);
      not_found2:
        // create third level index node
        new_index(i, q);
        add_sa_ptr;
        q = cur_ptr;
        i = hex_dig3(n);
      not_found3:
        // create fourth level index node
        new_index(i, q);
        add_sa_ptr;
        q = cur_ptr;
        i = hex_dig4(n);
      not_found4:
        ⟦1631 Create a new array element of type |t| with index |i|⟧
        link(cur_ptr) = q;
        add_sa_ptr;
      exit:
    }
⟧
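The two branches of the procedure, one that follows existing nodes and one that creates missing ones, can be seen in a compact Python sketch. It is a simplification under stated assumptions: nested dictionaries stand in for the 33-word index nodes, there are no back links or use counts, and the names `six_bit_digits` and `find_element` are hypothetical.

```python
def six_bit_digits(n):
    """Split n into four six-bit fields, highest first (hex_dig1..hex_dig4)."""
    return (n >> 18, (n >> 12) & 63, (n >> 6) & 63, n & 63)


def find_element(roots, kind, n, create):
    """Return the element for number n of the given kind, or None.

    When create is true, missing index levels and the element itself
    are built on the way down, as in the not_found branches.
    """
    node = roots.get(kind)
    if node is None:
        if not create:
            return None
        node = roots[kind] = {}          # first level index node
    d1, d2, d3, d4 = six_bit_digits(n)
    for digit in (d1, d2, d3):           # descend to the fourth level
        child = node.get(digit)
        if child is None:
            if not create:
                return None
            child = node[digit] = {}
        node = child
    if d4 not in node and create:
        node[d4] = {"number": n, "value": 0}  # new element, default value
    return node.get(d4)
```

A lookup with `create` false never changes the tree, which mirrors the early returns taken when w is false.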

1631. The array elements for registers are subject to grouping and have an sa_lev field (quite analogous to eq_level ) instead of sa_used . Since saved values as well as shorthand definitions (created by e.g., \countdef) refer to the location of the respective array element, we need a reference count that is kept in the sa_ref field. An array element can be deleted (together with all references to it) when its sa_ref value is null and its value is the default value.

Skip, muskip, box, and token registers use two-word nodes; their values are stored in the sa_ptr field. Count and dimen registers use three-word nodes; their values are stored in the sa_int or sa_dim field, respectively, in the third word; the sa_ptr field is used under the name sa_num to store the register number. Mark classes use four-word nodes. The last three words contain the five types of current marks.

// grouping level for the current value
@define sa_lev => sa_used
// size of an element with a pointer value
@define pointer_node_size => 2
@define sa_type(#) =>
    (sa_index(#) div 64) // type part of combined type/index
// reference count of a sparse array element
@define sa_ref(#) => info(# + 1)
@define sa_ptr(#) => link(# + 1) // a pointer value
// size of an element with a word value
@define word_node_size => 3
@define sa_num => sa_ptr // the register number
@define sa_int(#) => mem[# + 2].int // an integer
// a dimension (a somewhat esoteric distinction)
@define sa_dim(#) => mem[# + 2].sc
// size of an element for a mark class
@define mark_class_node_size => 4
// fetch box ( cur_val ) 
@define fetch_box(#) =>
    if (cur_val < 256) {
        # = box(cur_val);
    } else {
        find_sa_element(box_val, cur_val, false);
        if (cur_ptr == null) {
            # = null;
        } else {
            # = sa_ptr(cur_ptr);
        }
    }
⟦1631 Create a new array element of type |t| with index |i|⟧ = ⟦
    // a mark class
    if (t == mark_val) {
        cur_ptr = get_node(mark_class_node_size);
        mem[cur_ptr + 1] = sa_null;
        mem[cur_ptr + 2] = sa_null;
        mem[cur_ptr + 3] = sa_null;
    } else {
        // a count or dimen register
        if (t <= dimen_val) {
            cur_ptr = get_node(word_node_size);
            sa_int(cur_ptr) = 0;
            sa_num(cur_ptr) = n;
        } else {
            cur_ptr = get_node(pointer_node_size);
            // a skip or muskip register
            if (t <= mu_val) {
                sa_ptr(cur_ptr) = zero_glue;
                add_glue_ref(zero_glue);
            } else {
                // a box or token list register
                sa_ptr(cur_ptr) = null;
            }
        }
        // all registers have a reference count
        sa_ref(cur_ptr) = null;
    }

    sa_index(cur_ptr) = 64 * t + i

    sa_lev(cur_ptr) = level_one
⟧

1632. The delete_sa_ref procedure is called when a pointer to an array element representing a register is being removed; this means that the reference count should be decreased by one. If the reduced reference count is null and the register has been (globally) assigned its default value, the array element should disappear, possibly together with some index nodes. This procedure will never be used for mark class nodes.

// increase reference count
@define add_sa_ref(#) => incr(sa_ref(#))
// change box ( cur_val ) , the eq_level stays the same
@define change_box(#) =>
    if (cur_val < 256) {
        box(cur_val) = #;
    } else {
        set_sa_box(#);
    }
@define set_sa_box(#) =>
    {
        find_sa_element(box_val, cur_val, false);
        if (cur_ptr != null) {
            sa_ptr(cur_ptr) = #;
            add_sa_ref(cur_ptr);
            delete_sa_ref(cur_ptr);
        }
    }
⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    // reduce reference count
    function delete_sa_ref(q: pointer) {
        label exit;
        var
          p: pointer, // for list manipulations
          i: small_number, // a six bit index
          s: small_number; // size of a node
        
        decr(sa_ref(q));
        if (sa_ref(q) != null) {
            return;
        }
        if (sa_index(q) < dimen_val_limit) {
            if (sa_int(q) == 0) {
                s = word_node_size;
            } else {
                return;
            }
        } else {
            if (sa_index(q) < mu_val_limit) {
                if (sa_ptr(q) == zero_glue) {
                    delete_glue_ref(zero_glue);
                } else {
                    return;
                }
            } else if (sa_ptr(q) != null) {
                return;
            }
            s = pointer_node_size;
        }
        repeat {
            i = hex_dig4(sa_index(q));
            p = q;
            q = link(p);
            free_node(p, s);
            // the whole tree has been freed
            if (q == null) {
                sa_root[i] = null;
                return;
            }
            delete_sa_ptr;
            // node q is an index node
            s = index_node_size;
        } until (sa_used(q) > 0);
      exit:
    }
⟧
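The pruning walk, which frees the element and then every index node that becomes empty, is easier to follow in a reduced model. The Python sketch below is hypothetical in all its names (`Node`, `delete_ref`); it assumes integer-valued elements whose default is zero, uses a dictionary in place of the 64 pointer slots, and relies on garbage collection instead of free_node.

```python
class Node:
    """Index node or element node; `children` stands in for the 64 slots."""

    def __init__(self, parent, digit):
        self.parent = parent    # models the link field
        self.digit = digit      # slot in the parent; the type code for a root
        self.children = {}
        self.ref = 0            # reference count (element nodes only)
        self.value = 0          # register value; 0 is the default


def delete_ref(element, roots):
    """Drop one reference; return True if the element was freed."""
    element.ref -= 1
    if element.ref != 0 or element.value != 0:
        return False
    node = element
    while True:
        parent = node.parent
        if parent is None:               # the whole tree has been freed
            roots[node.digit] = None
            return True
        del parent.children[node.digit]  # plays the role of delete_sa_ptr
        if parent.children:              # sa_used(q) > 0, so stop here
            return True
        node = parent
```

The walk stops at the first index node that still has another pointer in use, so siblings of the freed element are never disturbed.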

1633. The print_sa_num procedure prints the register number corresponding to an array element.

⟦57 Basic printing procedures⟧ += ⟦
    // print register number
    function print_sa_num(q: pointer) {
        var
          n: halfword; // the register number
        
        if (sa_index(q) < dimen_val_limit) {
            // the easy case
            n = sa_num(q);
        } else {
            n = hex_dig4(sa_index(q));
            q = link(q);
            n = n + 64 * sa_index(q);
            q = link(q);
            n = 
                n
                + 64
                * 64
                * (sa_index(q) + 64 * sa_index(link(q)))
            ;
        }
        print_int(n);
    }
⟧
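For registers other than count and dimen, the number is rebuilt from the six-bit digit stored at each level while following the link fields towards the root. The Python sketch below checks that arithmetic in isolation; `split_number` and `join_number` are hypothetical names, and the digits are passed as plain integers rather than read from nodes.

```python
def split_number(n):
    """Six-bit digits of n, highest first (hex_dig1 .. hex_dig4)."""
    return [n // 0x40000, (n // 0x1000) % 0x40, (n // 0x40) % 0x40, n % 0x40]


def join_number(d1, d2, d3, d4):
    """Rebuild n in the same order of operations as print_sa_num."""
    n = d4                             # digit kept in the element node
    n = n + 64 * d3                    # digit kept in the fourth-level node
    n = n + 64 * 64 * (d2 + 64 * d1)   # digits from the two levels above
    return n
```

Splitting a number and joining the digits again returns the original value for every number below 2^24.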

1634. Here is a procedure that displays the contents of an array element symbolically. It is used in circumstances similar to those of restore_trace (together with show_eqtb ) for the quantities kept in the eqtb array.

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    stat!{
        function show_sa(p: pointer, s: str_number) {
            var
              t: small_number; // the type of element
            
            begin_diagnostic;
            print_char(ord!("{"));
            print(s);
            print_char(ord!(" "));
            if (p == null) {
                // this can't happen
                print_char(ord!("?"));
            } else {
                t = sa_type(p);
                if (t < box_val) {
                    print_cmd_chr(register, p);
                } else if (t == box_val) {
                    print_esc(strpool!("box"));
                    print_sa_num(p);
                } else if (t == tok_val) {
                    print_cmd_chr(toks_register, p);
                } else {
                    // this can't happen either
                    print_char(ord!("?"));
                }
                print_char(ord!("="));
                if (t == int_val) {
                    print_int(sa_int(p));
                } else if (t == dimen_val) {
                    print_scaled(sa_dim(p));
                    print(strpool!("pt"));
                } else {
                    p = sa_ptr(p);
                    if (t == glue_val) {
                        print_spec(p, strpool!("pt"));
                    } else if (t == mu_val) {
                        print_spec(p, strpool!("mu"));
                    } else if (t == box_val) {
                        if (p == null) {
                            print(strpool!("void"));
                        } else {
                            depth_threshold = 0;
                            breadth_max = 1;
                            show_node_list(p);
                        }
                    } else if (t == tok_val) {
                        if (p != null) {
                            show_token_list(
                              link(p),
                              null,
                              32,
                            );
                        }
                    } else {
                        // this can't happen either
                        print_char(ord!("?"));
                    }
                }
            }
            print_char(ord!("}"));
            end_diagnostic(false);
        }
    }
⟧

1635. Here we compute the pointer to the current mark of type t and mark class cur_val .

⟦1635 Compute the mark pointer for mark type |t| and class |cur_val|⟧ = ⟦
    {
        find_sa_element(mark_val, cur_val, false);
        if (cur_ptr != null) {
            if (odd(t)) {
                cur_ptr = link(cur_ptr + (t div 2) + 1);
            } else {
                cur_ptr = info(cur_ptr + (t div 2) + 1);
            }
        }
    }
⟧

1636. The current marks for all mark classes are maintained by the vsplit and fire_up routines and are finally destroyed (for INITEX only) by the final_cleanup routine. Apart from updating the current marks when mark nodes are encountered, these routines perform certain actions on all existing mark classes. The recursive do_marks procedure walks through the whole tree or a subtree of existing mark class nodes and performs certain actions indicated by its first parameter a , the action code. The second parameter l indicates the level of recursion (at most four); the third parameter points to a nonempty tree or subtree. The result is true if the complete tree or subtree has been deleted.

// action code for vsplit initialization
@define vsplit_init => 0
// action code for fire_up initialization
@define fire_up_init => 1
// action code for fire_up completion
@define fire_up_done => 2
@define destroy_marks => 3 // action code for final_cleanup 
@define sa_top_mark(#) => info(# + 1) // \.{\\topmarks} n 
// \.{\\firstmarks} n 
@define sa_first_mark(#) => link(# + 1)
@define sa_bot_mark(#) => info(# + 2) // \.{\\botmarks} n 
// \.{\\splitfirstmarks} n 
@define sa_split_first_mark(#) => link(# + 2)
// \.{\\splitbotmarks} n 
@define sa_split_bot_mark(#) => info(# + 3)
⟦1636 Declare the function called |do_marks|⟧ = ⟦
    function do_marks(
      a, l: small_number,
      q: pointer,
    ): boolean {
        var
          i: small_number; // a four bit index
        
        //  q is an index node
        if (l < 4) {
            for (i in 0 to 15) {
                get_sa_ptr;
                if (cur_ptr != null) {
                    if (do_marks(a, l + 1, cur_ptr)) {
                        delete_sa_ptr;
                    }
                }
            }
            if (sa_used(q) == 0) {
                free_node(q, index_node_size);
                q = null;
            }
        } else {
            //  q is the node for a mark class
            case a {
              ⟦1637 Cases for |do_marks|⟧// there are no 
              // other cases
            }
            if (sa_bot_mark(q) == null) {
                if (sa_split_bot_mark(q) == null) {
                    free_node(q, mark_class_node_size);
                    q = null;
                }
            }
        }
        do_marks = (q == null);
    }
⟧

1637. At the start of the vsplit routine the existing split_first_mark and split_bot_mark are discarded.

⟦1637 Cases for |do_marks|⟧ = ⟦
    vsplit_init:

    if (sa_split_first_mark(q) != null) {
        delete_token_ref(sa_split_first_mark(q));
        sa_split_first_mark(q) = null;
        delete_token_ref(sa_split_bot_mark(q));
        sa_split_bot_mark(q) = null;
    }
⟧

1638. We use again the fact that split_first_mark == null if and only if split_bot_mark == null .

⟦1638 Update the current marks for |vsplit|⟧ = ⟦
    {
        find_sa_element(mark_val, mark_class(p), true);
        if (sa_split_first_mark(cur_ptr) == null) {
            sa_split_first_mark(cur_ptr) = mark_ptr(p);
            add_token_ref(mark_ptr(p));
        } else {
            delete_token_ref(sa_split_bot_mark(cur_ptr));
        }
        sa_split_bot_mark(cur_ptr) = mark_ptr(p);
        add_token_ref(mark_ptr(p));
    }
⟧
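The effect of the two vsplit pieces on one mark class can be sketched without any reference counting. In this Python model, marks are plain strings, None plays the role of null, and the class and function names are hypothetical; Python's garbage collector replaces add_token_ref and delete_token_ref.

```python
class SplitMarks:
    """The two split marks of a single mark class."""

    def __init__(self):
        self.split_first = None
        self.split_bot = None


def vsplit_init(m):
    """Discard both split marks at the start of a split."""
    if m.split_first is not None:
        m.split_first = None
        m.split_bot = None


def vsplit_see_mark(m, text):
    """Record a mark node met while splitting."""
    if m.split_first is None:
        m.split_first = text   # the first mark seen stays the first mark
    m.split_bot = text         # every mark seen becomes the bottom mark
```

Both operations keep the property that `split_first` is None exactly when `split_bot` is None, which is the fact the code above relies on.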

1639. At the start of the fire_up routine the old top_mark and first_mark are discarded, whereas the old bot_mark becomes the new top_mark . An empty new top_mark token list is, however, discarded as well in order that mark class nodes can eventually be released. We use again the fact that bot_mark != null implies first_mark != null ; we also know that bot_mark == null implies top_mark == first_mark == null .

⟦1637 Cases for |do_marks|⟧ += ⟦
    fire_up_init:

    if (sa_bot_mark(q) != null) {
        if (sa_top_mark(q) != null) {
            delete_token_ref(sa_top_mark(q));
        }
        delete_token_ref(sa_first_mark(q));
        sa_first_mark(q) = null;
        // an empty token list
        if (link(sa_bot_mark(q)) == null) {
            delete_token_ref(sa_bot_mark(q));
            sa_bot_mark(q) = null;
        } else {
            add_token_ref(sa_bot_mark(q));
        }
        sa_top_mark(q) = sa_bot_mark(q);
    }
⟧

1640.

⟦1637 Cases for |do_marks|⟧ += ⟦
    fire_up_done:

    if (
        (sa_top_mark(q) != null)
        && (sa_first_mark(q) == null)
    ) {
        sa_first_mark(q) = sa_top_mark(q);
        add_token_ref(sa_top_mark(q));
    }
⟧

1641.

⟦1641 Update the current marks for |fire_up|⟧ = ⟦
    {
        find_sa_element(mark_val, mark_class(p), true);
        if (sa_first_mark(cur_ptr) == null) {
            sa_first_mark(cur_ptr) = mark_ptr(p);
            add_token_ref(mark_ptr(p));
        }
        if (sa_bot_mark(cur_ptr) != null) {
            delete_token_ref(sa_bot_mark(cur_ptr));
        }
        sa_bot_mark(cur_ptr) = mark_ptr(p);
        add_token_ref(mark_ptr(p));
    }
⟧
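Taken together, the three fire_up pieces (initialization, update on each mark node, completion) move one mark class through a page as sketched below. This Python model is an illustration under assumptions: marks are strings, the empty string stands for an empty token list, None stands for null, reference counts are left to the garbage collector, and all names are hypothetical.

```python
class PageMarks:
    """The top, first, and bottom marks of a single mark class."""

    def __init__(self):
        self.top = None
        self.first = None
        self.bot = None


def fire_up_init(m):
    """Old bot becomes new top; an empty bot mark is dropped entirely."""
    if m.bot is not None:
        m.first = None
        if m.bot == "":
            m.bot = None
        m.top = m.bot


def fire_up_see_mark(m, text):
    """Record a mark node found on the page being shipped out."""
    if m.first is None:
        m.first = text
    m.bot = text


def fire_up_done(m):
    """A page without marks inherits its first mark from the top mark."""
    if m.top is not None and m.first is None:
        m.first = m.top
```

For example, after a page with marks A and B the state is (None, A, B); a following page with no marks ends as (B, B, B), and a third page with a single mark C ends as (B, C, C).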

1642. Here we use the fact that the five current mark pointers in a mark class node occupy the same locations as the first five pointers of an index node. For systems using a run-time switch to distinguish between VIRTEX and INITEX, the codewords ‘init … tini’ surrounding the following piece of code should be removed.

⟦1637 Cases for |do_marks|⟧ += ⟦
    init!{
        
      destroy_marks:
        for (i in top_mark_code to split_bot_mark_code) {
            get_sa_ptr;
            if (cur_ptr != null) {
                delete_token_ref(cur_ptr);
                put_sa_ptr(null);
            }
        }
    }
⟧

1643. The command code register is used for ‘\count’, ‘\dimen’, etc., as well as for references to sparse array elements defined by ‘\countdef’, etc.

⟦1643 Cases of |register| for |print_cmd_chr|⟧ = ⟦
    {
        if (
            (chr_code < mem_bot)
            || (chr_code > lo_mem_stat_max)
        ) {
            cmd = sa_type(chr_code);
        } else {
            cmd = chr_code - mem_bot;
            chr_code = null;
        }
        if (cmd == int_val) {
            print_esc(strpool!("count"));
        } else if (cmd == dimen_val) {
            print_esc(strpool!("dimen"));
        } else if (cmd == glue_val) {
            print_esc(strpool!("skip"));
        } else {
            print_esc(strpool!("muskip"));
        }
        if (chr_code != null) {
            print_sa_num(chr_code);
        }
    }
⟧

1644. Similarly the command code toks_register is used for ‘\toks’ as well as for references to sparse array elements defined by ‘\toksdef’.

⟦1644 Cases of |toks_register| for |print_cmd_chr|⟧ = ⟦
    {
        print_esc(strpool!("toks"));
        if (chr_code != mem_bot) {
            print_sa_num(chr_code);
        }
    }
⟧

1645. When a shorthand definition for an element of one of the sparse arrays is destroyed, we must reduce the reference count.

⟦1645 Cases for |eq_destroy|⟧ = ⟦
    toks_register, register:
      if (
          (equiv_field(w) < mem_bot)
          || (equiv_field(w) > lo_mem_stat_max)
      ) {
          delete_sa_ref(equiv_field(w));
      }
⟧

1646. The task of maintaining (changing, saving, and restoring) register values is essentially the same whether the register is realized as a sparse array element or as an entry in eqtb . The global variable sa_chain is the head of a linked list of entries saved at the topmost level sa_level ; the lists for lower levels are kept in special save stack entries.

⟦13 Global variables⟧ += ⟦
    // chain of saved sparse array entries
    var sa_chain: pointer;

    // group level for sa_chain 
    var sa_level: quarterword;
⟧

1647.

⟦23 Set initial values of key variables⟧ += ⟦
    sa_chain = null

    sa_level = level_zero
⟧

1648. The individual saved items are kept in pointer or word nodes similar to those used for the array elements: a word node with value zero is, however, saved as a pointer node with the otherwise impossible sa_index value tok_val_limit .

@define sa_loc => sa_ref // location of saved item
⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    // saves value of p 
    function sa_save(p: pointer) {
        var
          q: pointer, // the new save node
          i: quarterword; // index field of node
        
        if (cur_level != sa_level) {
            check_full_save_stack;
            save_type(save_ptr) = restore_sa;
            save_level(save_ptr) = sa_level;
            save_index(save_ptr) = sa_chain;
            incr(save_ptr);
            sa_chain = null;
            sa_level = cur_level;
        }
        i = sa_index(p);
        if (i < dimen_val_limit) {
            if (sa_int(p) == 0) {
                q = get_node(pointer_node_size);
                i = tok_val_limit;
            } else {
                q = get_node(word_node_size);
                sa_int(q) = sa_int(p);
            }
            sa_ptr(q) = null;
        } else {
            q = get_node(pointer_node_size);
            sa_ptr(q) = sa_ptr(p);
        }
        sa_loc(q) = p;
        sa_index(q) = i;
        sa_lev(q) = sa_lev(p);
        link(q) = sa_chain;
        sa_chain = q;
        add_sa_ref(p);
    }
⟧
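The bookkeeping around sa_chain and sa_level can be sketched as follows. This Python model is a simplification and not the program's code: registers are dictionaries, the chain is a list with the newest entry first, a list of tuples stands in for the restore_sa entries on the save stack, and the special treatment of zero-valued word nodes is not modelled. The starting levels 0 and 1 are assumptions standing for level_zero and level_one.

```python
class SaveState:
    """Chain of saved sparse array entries plus a stand-in save stack."""

    def __init__(self):
        self.cur_level = 1     # assumed value of level_one
        self.sa_level = 0      # assumed value of level_zero
        self.sa_chain = []     # entries saved at level sa_level
        self.save_stack = []   # (level, chain) pairs for lower levels


def sa_save(state, element):
    """Save the current value of element before it is changed."""
    if state.cur_level != state.sa_level:
        # first save at this level: park the old chain on the save stack
        state.save_stack.append((state.sa_level, state.sa_chain))
        state.sa_chain = []
        state.sa_level = state.cur_level
    saved = {"loc": element, "value": element["value"], "lev": element["lev"]}
    state.sa_chain.insert(0, saved)   # link(q) = sa_chain; sa_chain = q
    element["ref"] += 1               # the saved entry refers to the element
```

Only the first save at a new group level touches the save stack; later saves at the same level just extend the chain.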

1649.

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    // destroy value of p 
    function sa_destroy(p: pointer) {
        if (sa_index(p) < mu_val_limit) {
            delete_glue_ref(sa_ptr(p));
        } else if (sa_ptr(p) != null) {
            if (sa_index(p) < box_val_limit) {
                flush_node_list(sa_ptr(p));
            } else {
                delete_token_ref(sa_ptr(p));
            }
        }
    }
⟧

1650. The procedure sa_def assigns a new value to sparse array elements, and saves the former value if appropriate. This procedure is used only for skip, muskip, box, and token list registers. The counterpart of sa_def for count and dimen registers is called sa_w_def .

@define sa_define(#) =>
    if (e) {
        if (global) {
            gsa_def(#);
        } else {
            sa_def(#);
        }
    } else {
        define;
    }
// assign cur_box to box ( cur_val ) 
@define sa_def_box =>
    {
        find_sa_element(box_val, cur_val, true);
        if (global) {
            gsa_def(cur_ptr, cur_box);
        } else {
            sa_def(cur_ptr, cur_box);
        }
    }
@define sa_word_define(#) =>
    if (e) {
        if (global) {
            gsa_w_def(#);
        } else {
            sa_w_def(#);
        }
    } else {
        word_define(#);
    }
⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    // new data for sparse array elements
    function sa_def(p: pointer, e: halfword) {
        add_sa_ref(p);
        if (sa_ptr(p) == e) {
            stat!{
                if (tracing_assigns > 0) {
                    show_sa(p, strpool!("reassigning"));
                }
            }
            sa_destroy(p);
        } else {
            stat!{
                if (tracing_assigns > 0) {
                    show_sa(p, strpool!("changing"));
                }
            }
            if (sa_lev(p) == cur_level) {
                sa_destroy(p);
            } else {
                sa_save(p);
            }
            sa_lev(p) = cur_level;
            sa_ptr(p) = e;
            stat!{
                if (tracing_assigns > 0) {
                    show_sa(p, strpool!("into"));
                }
            }
        }
        delete_sa_ref(p);
    }

    function sa_w_def(p: pointer, w: integer) {
        add_sa_ref(p);
        if (sa_int(p) == w) {
            stat!{
                if (tracing_assigns > 0) {
                    show_sa(p, strpool!("reassigning"));
                }
            }
        } else {
            stat!{
                if (tracing_assigns > 0) {
                    show_sa(p, strpool!("changing"));
                }
            }
            if (sa_lev(p) != cur_level) {
                sa_save(p);
            }
            sa_lev(p) = cur_level;
            sa_int(p) = w;
            stat!{
                if (tracing_assigns > 0) {
                    show_sa(p, strpool!("into"));
                }
            }
        }
        delete_sa_ref(p);
    }
⟧

1651. The sa_def and sa_w_def routines take care of local definitions. Global definitions are done in almost the same way, but there is no need to save old values, and the new value is associated with level_one .

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    // global sa_def 
    function gsa_def(p: pointer, e: halfword) {
        add_sa_ref(p);
        stat!{
            if (tracing_assigns > 0) {
                show_sa(p, strpool!("globally changing"));
            }
        }
        sa_destroy(p);
        sa_lev(p) = level_one;
        sa_ptr(p) = e;
        stat!{
            if (tracing_assigns > 0) {
                show_sa(p, strpool!("into"));
            }
        }
        delete_sa_ref(p);
    }

    // global sa_w_def 
    function gsa_w_def(p: pointer, w: integer) {
        add_sa_ref(p);
        stat!{
            if (tracing_assigns > 0) {
                show_sa(p, strpool!("globally changing"));
            }
        }
        sa_lev(p) = level_one;
        sa_int(p) = w;
        stat!{
            if (tracing_assigns > 0) {
                show_sa(p, strpool!("into"));
            }
        }
        delete_sa_ref(p);
    }
⟧

1652. The sa_restore procedure restores the sparse array entries pointed at by sa_chain .

⟦314 Declare \eTeX\ procedures for tracing and input⟧ += ⟦
    function sa_restore() {
        var
          p: pointer; // sparse array element
        
        repeat {
            p = sa_loc(sa_chain);
            if (sa_lev(p) == level_one) {
                if (sa_index(p) >= dimen_val_limit) {
                    sa_destroy(sa_chain);
                }
                stat!{
                    if (tracing_restores > 0) {
                        show_sa(p, strpool!("retaining"));
                    }
                }
            } else {
                if (sa_index(p) < dimen_val_limit) {
                    if (sa_index(sa_chain) < dimen_val_limit) {
                        sa_int(p) = sa_int(sa_chain);
                    } else {
                        sa_int(p) = 0;
                    }
                } else {
                    sa_destroy(p);
                    sa_ptr(p) = sa_ptr(sa_chain);
                }
                sa_lev(p) = sa_lev(sa_chain);
                stat!{
                    if (tracing_restores > 0) {
                        show_sa(p, strpool!("restoring"));
                    }
                }
            }
            delete_sa_ref(p);
            p = sa_chain;
            sa_chain = link(p);
            if (sa_index(p) < dimen_val_limit) {
                free_node(p, word_node_size);
            } else {
                free_node(p, pointer_node_size);
            }
        } until (sa_chain == null);
    }
⟧
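
The interplay of sa_w_def , gsa_w_def , and sa_restore can be illustrated with a small Python model. This is only a sketch under simplifying assumptions: registers live in dictionaries, one save list is kept per open group instead of the single sa_chain plus save stack entries, and reference counts and tracing are left out. The class SparseRegs and its method names are hypothetical and do not appear in the program.

```python
LEVEL_ONE = 1  # stands in for level_one, the outermost level


class SparseRegs:
    """Toy model of word-valued sparse registers with grouping."""

    def __init__(self):
        self.cur_level = LEVEL_ONE
        self.value = {}    # register number -> current value
        self.level = {}    # register number -> level of last assignment
        self.saved = [[]]  # per open group: (reg, old value, old level)

    def begin_group(self):
        self.cur_level += 1
        self.saved.append([])

    def w_def(self, reg, w):
        """Local assignment, in the spirit of sa_w_def."""
        old = self.value.get(reg, 0)
        if old == w:
            return  # "reassigning": nothing changes, nothing is saved
        old_level = self.level.get(reg, LEVEL_ONE)
        if old_level != self.cur_level:
            # first change at this level: remember what to restore
            self.saved[-1].append((reg, old, old_level))
        self.level[reg] = self.cur_level
        self.value[reg] = w

    def gw_def(self, reg, w):
        """Global assignment, in the spirit of gsa_w_def."""
        self.level[reg] = LEVEL_ONE
        self.value[reg] = w

    def end_group(self):
        """In the spirit of sa_restore: undo the local changes."""
        for reg, old, old_level in reversed(self.saved.pop()):
            if self.level.get(reg, LEVEL_ONE) == LEVEL_ONE:
                continue  # "retaining": a global assignment wins
            self.value[reg] = old
            self.level[reg] = old_level
        self.cur_level -= 1
```

In this model a local assignment inside a group is undone by end_group, whereas a value set by gw_def survives the end of the group because its level is level one when the saved entry is examined.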

1653. When the value of last_line_fit is positive, the last line of a (partial) paragraph is treated in a special way and we need additional fields in the active nodes.

// number of words in extended active nodes
@define active_node_size_extended => 5
//  shortfall of this line
@define active_short(#) => mem[# + 3].sc
// corresponding glue stretch or shrink
@define active_glue(#) => mem[# + 4].sc
⟦13 Global variables⟧ += ⟦
    // the par_fill_skip glue node of the new paragraph
    var last_line_fill: pointer;

    // special algorithm for last line of paragraph?
    var do_last_line_fit: boolean;

    // number of words in active nodes
    var active_node_size: small_number;

    // infinite stretch components of par_fill_skip 
    var fill_width: array [0 .. 2] of scaled;

    //  shortfall corresponding to minimal_demerits 
    var best_pl_short: array [
      very_loose_fit .. tight_fit,
    ] of scaled;

    // corresponding glue stretch or shrink
    var best_pl_glue: array [very_loose_fit .. tight_fit] of
      scaled;
⟧

1654. The new algorithm for the last line requires that the stretchability of par_fill_skip is infinite and the stretchability of left_skip plus right_skip is finite.

⟦1654 Check for special treatment of last line of paragraph⟧ = ⟦
    do_last_line_fit = false

    // just in case
    active_node_size = active_node_size_normal

    if (last_line_fit > 0) {
        q = glue_ptr(last_line_fill);
        if ((stretch(q) > 0) && (stretch_order(q) > normal)) {
            if (
                (background[3] == 0)
                && (background[4] == 0)
                && (background[5] == 0)
            ) {
                do_last_line_fit = true;
                active_node_size = active_node_size_extended;
                fill_width[0] = 0;
                fill_width[1] = 0;
                fill_width[2] = 0;
                fill_width[stretch_order(q) - 1] = stretch(
                  q,
                );
            }
        }
    }
⟧

1655.

⟦878 Other local variables for |try_break|⟧ += ⟦
    // glue stretch or shrink of test line, adjustment for 
    // last line
    var g: scaled;
⟧

1656. Here we initialize the additional fields of the first active node representing the beginning of the paragraph.

⟦1656 Initialize additional fields of the first active node⟧ = ⟦
    {
        active_short(q) = 0;
        active_glue(q) = 0;
    }
⟧

1657. Here we compute the adjustment g and badness b for a line from r to the end of the paragraph. When any of the criteria for adjustment is violated we fall through to the normal algorithm.

The last line must be too short, and have infinite stretch entirely due to par_fill_skip .

⟦1657 Perform computations for last line and |goto found|⟧ = ⟦
    {
        if ((active_short(r) == 0) || (active_glue(r) <= 0)) {
            // previous line was neither stretched nor 
            // shrunk, or was infinitely bad
            goto not_found;
        }
        if (
            (cur_active_width[3] != fill_width[0])
            || (cur_active_width[4] != fill_width[1])
            || (cur_active_width[5] != fill_width[2])
        ) {
            // infinite stretch of this line not entirely 
            // due to par_fill_skip 
            goto not_found;
        }
        if (active_short(r) > 0) {
            g = cur_active_width[2];
        } else {
            g = cur_active_width[6];
        }
        if (g <= 0) {
            // no finite stretch or no shrink, respectively
            goto not_found;
        }
        arith_error = false;
        g = fract(
          g,
          active_short(r),
          active_glue(r),
          max_dimen,
        );
        if (last_line_fit < 1000) {
            g = fract(g, last_line_fit, 1000, max_dimen);
        }
        if (arith_error) {
            if (active_short(r) > 0) {
                g = max_dimen;
            } else {
                g = -max_dimen;
            }
        }
        if (g > 0) {
            ⟦1658 Set the value of |b| to the badness of the last line for stretching, compute the corresponding |fit_class|, and |goto found|⟧
        } else if (g < 0) {
            ⟦1659 Set the value of |b| to the badness of the last line for shrinking, compute the corresponding |fit_class|, and |goto found|⟧
        }
      not_found:
    }
⟧

1658. These badness computations are rather similar to those of the standard algorithm, with the adjustment amount g replacing the shortfall .

⟦1658 Set the value of |b| to the badness of the last line for stretching, compute the corresponding |fit_class|, and |goto found|⟧ = ⟦
    {
        if (g > shortfall) {
            g = shortfall;
        }
        if (g > 7230584) {
            if (cur_active_width[2] < 1663497) {
                b = inf_bad;
                fit_class = very_loose_fit;
                goto found;
            }
        }
        b = badness(g, cur_active_width[2]);
        if (b > 12) {
            if (b > 99) {
                fit_class = very_loose_fit;
            } else {
                fit_class = loose_fit;
            }
        } else {
            fit_class = decent_fit;
        }
        goto found;
    }
⟧

1659.

⟦1659 Set the value of |b| to the badness of the last line for shrinking, compute the corresponding |fit_class|, and |goto found|⟧ = ⟦
    {
        if (-g > cur_active_width[6]) {
            g = -cur_active_width[6];
        }
        b = badness(-g, cur_active_width[6]);
        if (b > 12) {
            fit_class = tight_fit;
        } else {
            fit_class = decent_fit;
        }
        goto found;
    }
⟧
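
The arithmetic of the last three sections can be tried out in isolation. The following Python sketch reproduces the standard badness function of TeX and the scaling of the adjustment g ; it is an illustration under stated assumptions, not the program's code. In particular, Python floor division stands in for fract (whose exact rounding and arith_error handling are not modelled), and the function names last_line_adjustment and stretch_fit_class are hypothetical.

```python
INF_BAD = 10000
MAX_DIMEN = 2**30 - 1  # largest legal dimension in scaled points


def badness(t, s):
    """Badness of stretching or shrinking by t when s is available (t >= 0)."""
    if t == 0:
        return 0
    if s <= 0:
        return INF_BAD
    if t <= 7230584:
        r = (t * 297) // s
    elif s >= 1663497:
        r = t // (s // 297)
    else:
        r = t
    if r > 1290:
        return INF_BAD
    return (r * r * r + 0o400000) // 0o1000000


def last_line_adjustment(prev_short, prev_glue, avail, last_line_fit):
    """Scale the previous line's relative adjustment onto the last line.

    prev_short and prev_glue play the roles of active_short(r) and
    active_glue(r); avail is the finite stretch (or shrink) of the last line.
    """
    g = avail * prev_short // prev_glue
    if last_line_fit < 1000:
        g = g * last_line_fit // 1000
    return max(-MAX_DIMEN, min(MAX_DIMEN, g))


def stretch_fit_class(g, shortfall, stretch):
    """Badness and fitness class when the last line is stretched by g."""
    g = min(g, shortfall)
    b = badness(g, stretch)
    if b > 99:
        return b, "very_loose_fit"
    if b > 12:
        return b, "loose_fit"
    return b, "decent_fit"
```

For instance, if the previous line was stretched by half of its available glue, a last line with 30 units of finite stretch receives an adjustment of 15 units when last_line_fit is 1000, and a proportionally smaller one for smaller values of last_line_fit .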

1660. Vanishing values of shortfall and g indicate that the last line is not adjusted.

⟦1660 Adjust \(t)the additional data for last line⟧ = ⟦
    {
        if (cur_p == null) {
            shortfall = 0;
        }
        if (shortfall > 0) {
            g = cur_active_width[2];
        } else if (shortfall < 0) {
            g = cur_active_width[6];
        } else {
            g = 0;
        }
    }
⟧

1661. For each feasible break we record the shortfall and glue stretch or shrink (or adjustment).

⟦1661 Store \(a)additional data for this feasible break⟧ = ⟦
    {
        best_pl_short[fit_class] = shortfall;
        best_pl_glue[fit_class] = g;
    }
⟧

1662. Here we save these data in the active node representing a potential line break.

⟦1662 Store \(a)additional data in the new active node⟧ = ⟦
    {
        active_short(q) = best_pl_short[fit_class];
        active_glue(q) = best_pl_glue[fit_class];
    }
⟧

1663.

⟦1663 Print additional data in the new active node⟧ = ⟦
    {
        print(strpool!(" s="));
        print_scaled(active_short(q));
        if (cur_p == null) {
            print(strpool!(" a="));
        } else {
            print(strpool!(" g="));
        }
        print_scaled(active_glue(q));
    }
⟧

1664. Here we either reset do_last_line_fit or adjust the par_fill_skip glue.

⟦1664 Adjust \(t)the final line of the paragraph⟧ = ⟦
    if (active_short(best_bet) == 0) {
        do_last_line_fit = false;
    } else {
        q = new_spec(glue_ptr(last_line_fill));
        delete_glue_ref(glue_ptr(last_line_fill));
        width(q) = 
            width(q)
            + active_short(best_bet) - active_glue(best_bet)
        ;
        stretch(q) = 0;
        glue_ptr(last_line_fill) = q;
    }
⟧

1665. When reading \patterns while \savinghyphcodes is positive, the current lc_code values are stored together with the hyphenation patterns for the current language. They will later be used instead of the lc_code values for hyphenation purposes.

The lc_code values are stored in the linked trie analogous to patterns p_1 of length 1, with hyph_root == trie_r[0] replacing trie_root and lc_code(p_1) replacing the trie_op code. This allows them to be compressed and packed together with the patterns with minimal changes to the existing code.

// root of the linked trie for hyph_codes 
@define hyph_root => trie_r[0]
⟦189 Initialize table entries (done by \.{INITEX} only)⟧ += ⟦
    // for backward compatibility with standard TeX by 
    // default
    XeTeX_hyphenatable_length = 63
⟧

1666.

⟦1666 Store hyphenation codes for current language⟧ = ⟦
    {
        c = cur_lang;
        first_child = false;
        p = 0;
        repeat {
            q = p;
            p = trie_r[q];
        } until ((p == 0) || (c <= so(trie_c[p])));
        if ((p == 0) || (c < so(trie_c[p]))) {
            ⟦1018 Insert a new trie node between |q| and |p|, and make |p| point to it⟧
        }
        // now node q represents cur_lang 
        q = p;
        ⟦1667 Store all current |lc_code| values⟧
    }
⟧

1667. We store all nonzero lc_code values, overwriting any previously stored values (and possibly wasting a few trie nodes that were used previously and are not needed now). We always store at least one lc_code value such that hyph_index (defined below) will not be zero.

⟦1667 Store all current |lc_code| values⟧ = ⟦
    p = trie_l[q]

    first_child = true

    for (c in 0 to 255) {
        if ((lc_code(c) > 0) || ((c == 255) && first_child)) {
            if (p == 0) {
                ⟦1018 Insert a new trie node between |q| and |p|, and make |p| point to it⟧
            } else {
                trie_c[p] = si(c);
            }
            trie_o[p] = qi(lc_code(c));
            q = p;
            p = trie_r[q];
            first_child = false;
        }
    }

    if (first_child) {
        trie_l[q] = 0;
    } else {
        trie_r[q] = 0;
    }
⟧

1668. We must avoid “taking” location 1, in order to distinguish between lc_code values and patterns.

⟦1668 Pack all stored |hyph_codes|⟧ = ⟦
    {
        if (trie_root == 0) {
            for (p in 0 to 255) {
                trie_min[p] = p + 2;
            }
        }
        first_fit(hyph_root);
        trie_pack(hyph_root);
        hyph_start = trie_ref[hyph_root];
    }
⟧

1669. The global variable hyph_index will point to the hyphenation codes for the current language.

// set hyph_index for current language
@define set_hyph_index =>
    if (trie_char(hyph_start + cur_lang) != qi(cur_lang)) {
        // no hyphenation codes for cur_lang 
        hyph_index = 0;
    } else {
        hyph_index = trie_link(hyph_start + cur_lang);
    }
// set hc [ 0 ] to hyphenation or lc code for # 
@define set_lc_code(#) =>
    if ((hyph_index == 0) || ((#) > 255)) {
        hc[0] = lc_code(#);
    } else if (trie_char(hyph_index + #) != qi(#)) {
        hc[0] = 0;
    } else {
        hc[0] = qo(trie_op(hyph_index + #));
    }
⟦13 Global variables⟧ += ⟦
    // root of the packed trie for hyph_codes 
    var hyph_start: trie_pointer;

    // pointer to hyphenation codes for cur_lang 
    var hyph_index: trie_pointer;
⟧
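
The decision made by set_lc_code can be summarized in a few lines of Python. This is a sketch only: the packed trie is modelled by two dictionaries keyed by trie location, which is an assumption made for brevity, and the function name hyph_or_lc_code is hypothetical.

```python
def hyph_or_lc_code(c, hyph_index, trie_char, trie_op, lc_code):
    """Return the value that set_lc_code would store in hc[0].

    trie_char and trie_op map trie locations to the stored character and
    operation code; lc_code maps character codes to their current \\lccode.
    """
    if hyph_index == 0 or c > 255:
        # no codes were saved for this language, or c is out of range
        return lc_code.get(c, 0)
    if trie_char.get(hyph_index + c) != c:
        # the slot belongs to something else: c has no stored code
        return 0
    return trie_op[hyph_index + c]
```

A stored code thus takes precedence over the current lc_code value whenever hyphenation codes exist for the current language, and a character without a stored code is treated as a non-letter.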

1670. When saving_vdiscards is positive then the glue, kern, and penalty nodes removed by the page builder or by \vsplit from the top of a vertical list are saved in special lists instead of being discarded.

// last item removed by page builder
@define tail_page_disc => disc_ptr[copy_code]
// first item removed by page builder
@define page_disc => disc_ptr[last_box_code]
// first item removed by \.{\\vsplit}
@define split_disc => disc_ptr[vsplit_code]
⟦13 Global variables⟧ += ⟦
    // list pointers
    var disc_ptr: array [copy_code .. vsplit_code] of
      pointer;
⟧

1671.

⟦23 Set initial values of key variables⟧ += ⟦
    page_disc = null

    split_disc = null
⟧

1672. The \pagediscards and \splitdiscards commands share the command code un_vbox with \unvbox and \unvcopy; they are distinguished by their chr_code values last_box_code and vsplit_code . These chr_code values are larger than box_code and copy_code .

⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("pagediscards"),
      un_vbox,
      last_box_code,
    )

    primitive(
      strpool!("splitdiscards"),
      un_vbox,
      vsplit_code,
    )
⟧

1673.

⟦1673 Cases of |un_vbox| for |print_cmd_chr|⟧ = ⟦
    else

    if (chr_code == last_box_code) {
        print_esc(strpool!("pagediscards"));
    } else if (chr_code == vsplit_code) {
        print_esc(strpool!("splitdiscards"));
    }
⟧

1674.

⟦1674 Handle saved items and |goto done|⟧ = ⟦
    {
        link(tail) = disc_ptr[cur_chr];
        disc_ptr[cur_chr] = null;
        goto done;
    }
⟧

1675. The \interlinepenalties, \clubpenalties, \widowpenalties, and \displaywidowpenalties commands allow arrays of penalty values to be defined and used instead of the corresponding single values.

@define inter_line_penalties_ptr =>
    equiv(inter_line_penalties_loc)
@define club_penalties_ptr => equiv(club_penalties_loc)
@define widow_penalties_ptr => equiv(widow_penalties_loc)
@define display_widow_penalties_ptr =>
    equiv(display_widow_penalties_loc)
⟦1399 Generate all \eTeX\ primitives⟧ += ⟦
    primitive(
      strpool!("interlinepenalties"),
      set_shape,
      inter_line_penalties_loc,
    )

    primitive(
      strpool!("clubpenalties"),
      set_shape,
      club_penalties_loc,
    )

    primitive(
      strpool!("widowpenalties"),
      set_shape,
      widow_penalties_loc,
    )

    primitive(
      strpool!("displaywidowpenalties"),
      set_shape,
      display_widow_penalties_loc,
    )
⟧

1676.

⟦1676 Cases of |set_shape| for |print_cmd_chr|⟧ = ⟦
    inter_line_penalties_loc:

    print_esc(strpool!("interlinepenalties"))

    club_penalties_loc:

    print_esc(strpool!("clubpenalties"))

    widow_penalties_loc:

    print_esc(strpool!("widowpenalties"))

    display_widow_penalties_loc:

    print_esc(strpool!("displaywidowpenalties"))
⟧

1677.

⟦1677 Fetch a penalties array element⟧ = ⟦
    {
        scan_int;
        if ((equiv(m) == null) || (cur_val < 0)) {
            cur_val = 0;
        } else {
            if (cur_val > penalty(equiv(m))) {
                cur_val = penalty(equiv(m));
            }
            cur_val = penalty(equiv(m) + cur_val);
        }
    }
⟧
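
The index handling above is easy to get wrong, so here is a minimal Python sketch of it. It assumes that a penalties array is represented as a list whose element 0 holds the number n of penalties that follow, mirroring penalty(equiv(m)) ; the function name fetch_penalty is hypothetical.

```python
def fetch_penalty(array, index):
    """Model of fetching element `index` of a penalties array.

    array is None (the null pointer) or a list [n, p1, ..., pn].
    """
    if array is None or index < 0:
        return 0
    n = array[0]
    # indices beyond the end are clamped to the last element
    return array[min(index, n)]
```

With the array [3, 100, 200, 300] , index 0 yields the count 3, index 2 yields 200, and any index above 3 yields the last value 300.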

1678. [54/web2c] System-dependent changes for Web2c. Here are extra variables for Web2c. (This numbering of the system-dependent section allows easy integration of Web2c and e-TEX, etc.)

⟦13 Global variables⟧ += ⟦
    // where the filename to switch to starts
    var edit_name_start: pool_pointer;

    // what line to start editing at
    var edit_name_length, edit_line: integer;

    // level of IPC action, 0 for none [default]
    var ipc_on: cinttype;

    // whether more_name returns false for space
    var stop_at_space: boolean;
⟧

1679. The edit_name_start will be set to point into str_pool somewhere after its beginning if TEX is supposed to switch to an editor on exit.

⟦23 Set initial values of key variables⟧ += ⟦
    edit_name_start = 0

    stop_at_space = true
⟧

1680. These are used when we regenerate the representation of the first 256 strings.

⟦13 Global variables⟧ += ⟦
    var save_str_ptr: str_number;

    var save_pool_ptr: pool_pointer;

    var shellenabledp: cinttype;

    var restrictedshell: cinttype;

    var output_comment: ^char;

    // used by `Make the first 256 strings', etc.
    var k, l: 0 .. 255;
⟧

1681. When debugging a macro package, it can be useful to see the exact control sequence names in the format file. For example, if ten new csnames appear, it’s nice to know what they are, to help pinpoint where they came from. (This isn’t a truly “basic” printing procedure, but that’s a convenient module in which to put it.)

⟦57 Basic printing procedures⟧ += ⟦
    function print_csnames(
      hstart: integer,
      hfinish: integer,
    ) {
        var c, h: integer;
        
        write_ln(
          stderr,
          "fmtdebug:csnames from ",
          hstart,
          " to ",
          hfinish,
          ":",
        );
        for (h in hstart to hfinish) {
            if (text(h) > 0) {
                // if have anything at this position
                for (c in str_start_macro(text(h)) to 
                    str_start_macro(text(h) + 1)
                    - 1
                ) {
                    // print the characters
                    put_byte(str_pool[c], stderr);
                }
                write_ln(stderr, "|");
            }
        }
    }
⟧

1682. Are we printing extra info as we read the format file?

⟦13 Global variables⟧ += ⟦
    var debug_format_file: boolean;
⟧

1683. A helper for printing file:line:error style messages. Look for a filename in full_source_filename_stack , and if we fail to find one fall back on the non-file:line:error style.

⟦57 Basic printing procedures⟧ += ⟦
    function print_file_line() {
        var level: 0 .. max_in_open;
        
        level = in_open;
        while (
            (level > 0)
            && (full_source_filename_stack[level] == 0)
        ) {
            decr(level);
        }
        if (level == 0) {
            print_nl(strpool!("! "));
        } else {
            print_nl(strpool!(""));
            print(full_source_filename_stack[level]);
            print(ord!(":"));
            if (level == in_open) {
                print_int(line);
            } else {
                print_int(line_stack[level + 1]);
            }
            print(strpool!(": "));
        }
    }
⟧
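
The search for a usable file name can be pictured with the following Python sketch. It assumes that the stacks are lists indexed by input level and that a missing name is represented by None instead of 0; the function returns the prefix as a string instead of printing it, and its name file_line_prefix is hypothetical.

```python
def file_line_prefix(name_stack, line_stack, in_open, line):
    """Build the prefix that print_file_line would print."""
    level = in_open
    # walk down to the innermost level that has a recorded file name
    while level > 0 and not name_stack[level]:
        level -= 1
    if level == 0:
        return "! "  # fall back on the traditional style
    # the current file uses `line`; an enclosing file uses the saved value
    shown = line if level == in_open else line_stack[level + 1]
    return f"{name_stack[level]}:{shown}: "
```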

1684. To be able to determine whether \write18 is enabled from within TEX we also implement \eof18. We sort of cheat by having an additional routine scan_four_bit_int_or_18 which is the same as scan_four_bit_int except it also accepts the value 18.

⟦467 Declare procedures that scan restricted classes of integers⟧ += ⟦
    function scan_four_bit_int_or_18() {
        scan_int;
        if (
            (cur_val < 0)
            || ((cur_val > 15) && (cur_val != 18))
        ) {
            print_err(strpool!("Bad number"));
            help2(
              strpool!("Since I expected to read a number between 0 and 15,"),
            )(strpool!("I changed this one to zero."));
            int_error(cur_val);
            cur_val = 0;
        }
    }
⟧
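
The accepted range can be stated compactly. The following Python sketch mirrors the test above; returning a pair instead of reporting an error through print_err is an assumption made to keep the example self-contained, and the name check_stream_number is hypothetical.

```python
def check_stream_number(value):
    """Return (accepted value, error flag) for a stream number or 18."""
    if value < 0 or (value > 15 and value != 18):
        return 0, True  # out of range: complain and use zero
    return value, False
```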

1685. [54/web2c-string] The string recycling routines. TEX uses 2 to 4 new strings when scanning a filename in an \input, \openin, or \openout operation. These strings are normally lost because the references to them are not saved after finishing the operation. search_string searches through the string pool for the given string and returns either 0 or the found string number.

⟦1685 Declare additional routines for string recycling⟧ = ⟦
    function search_string(search: str_number): str_number {
        label found;
        var
          result: str_number,
          s: str_number, // running index
          len: integer; // length of searched string
        
        result = 0;
        len = length(search);
        // trivial case
        if (len == 0) {
            result = strpool!("");
            goto found;
        } else {
            // start search with newest string below s ; 
            // search > 1 !
            s = search - 1;
            // first 64K strings don't really exist in the 
            // pool!
            while (s > 65535) {
                if (length(s) == len) {
                    if (str_eq_str(s, search)) {
                        result = s;
                        goto found;
                    }
                }
                decr(s);
            }
        }
      found:
        search_string = result;
    }
⟧

1686. The following routine is a variant of make_string . It searches the whole string pool for a string equal to the string currently built and returns a found string. Otherwise a new string is created and returned. Be cautious: you cannot apply flush_string to a replaced string!

⟦1685 Declare additional routines for string recycling⟧ += ⟦
    function slow_make_string(): str_number {
        label exit;
        var
          s: str_number, // result of search_string 
          t: str_number; // new string
        
        t = make_string;
        s = search_string(t);
        if (s > 0) {
            flush_string;
            slow_make_string = s;
            return;
        }
        slow_make_string = t;
      exit:
    }
⟧
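
The recycling idea can be demonstrated with a toy string pool in Python. This is a sketch under simplifying assumptions: strings are stored in a list instead of a packed character array, index 0 is reserved for the empty string, and None (not 0) signals that nothing was found. The class StringPool is hypothetical.

```python
class StringPool:
    """Toy string pool; list indices play the role of str_number."""

    def __init__(self):
        self.strings = [""]  # index 0: the empty string, always present
        self.current = []    # characters of the string being built

    def append(self, text):
        self.current.extend(text)

    def make_string(self):
        self.strings.append("".join(self.current))
        self.current = []
        return len(self.strings) - 1

    def flush_string(self):
        """Discard the most recently made string."""
        self.strings.pop()

    def search_string(self, search):
        """Return the newest earlier string equal to strings[search]."""
        target = self.strings[search]
        if target == "":
            return 0  # trivial case
        for s in range(search - 1, 0, -1):
            if self.strings[s] == target:
                return s
        return None

    def slow_make_string(self):
        t = self.make_string()
        s = self.search_string(t)
        if s is not None:
            self.flush_string()  # t is the newest string, so this is safe
            return s
        return t
```

The model also shows why flush_string must not be applied to a replaced string: the number returned by slow_make_string may refer to an old string that is not at the top of the pool, so flushing would remove some other string.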

1687. [54/web2c] More changes for Web2c. Sometimes, recursive calls to the expand routine may cause exhaustion of the run-time calling stack, resulting in forced execution stops by the operating system. To diminish the chance of this happening, a counter is used to keep track of the recursion depth, in conjunction with a constant called expand_depth .

This does not catch all possible infinite recursion loops, just the ones that exhaust the application calling stack. The actual maximum value of expand_depth is outside of our control, but the initial setting of 10000 should be enough to prevent problems.

⟦13 Global variables⟧ += ⟦
    var expand_depth_count: integer;
⟧
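
This section only declares the counter; how expand uses it is not shown here. A plausible pattern, given purely as an assumption, is to increment the counter on entry, compare it with the limit, and decrement it on exit. The Python sketch below follows that pattern for a toy macro table; the classes Expander and ExpansionTooDeep are hypothetical.

```python
class ExpansionTooDeep(RuntimeError):
    pass


class Expander:
    def __init__(self, macros, expand_depth=10000):
        self.macros = macros              # name -> list of tokens
        self.expand_depth = expand_depth  # assumed limit, as in the text
        self.expand_depth_count = 0

    def expand(self, token):
        if token not in self.macros:
            return [token]  # not a macro: nothing to expand
        self.expand_depth_count += 1
        try:
            if self.expand_depth_count >= self.expand_depth:
                raise ExpansionTooDeep(token)
            out = []
            for t in self.macros[token]:
                out.extend(self.expand(t))
            return out
        finally:
            # the counter is restored on every exit path
            self.expand_depth_count -= 1
```

A self-referential macro then fails with a controlled error once the limit is reached, instead of exhausting the calling stack of the host language.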

1688.

⟦23 Set initial values of key variables⟧ += ⟦
    expand_depth_count = 0
⟧

1689. When scan_file_name starts it looks for a left_brace (skipping \relaxes, as other \toks-like primitives do). If a left_brace is found, then the procedure scans a file name contained in a balanced token list, expanding tokens as it goes. When the scanner finds the balanced token list, it is converted into a string and fed character-by-character to more_name to do its job the same as in the “normal” file name scanning.

function scan_file_name_braced() {
    var
      save_scanner_status: small_number, //  scanner_status 
      // upon entry
      save_def_ref: pointer, //  def_ref upon entry, 
      // important if inside `\.{\\message}
      save_cur_cs: pointer,
      s: str_number, // temp string
      p: pointer, // temp pointer
      i: integer, // loop tally
      save_stop_at_space: boolean, // this should be in 
      // tex.ch
      dummy: boolean; // Initializing
    
    //  scan_toks sets scanner_status to absorbing 
    save_scanner_status = scanner_status;
    //  scan_toks uses def_ref to point to the token list 
    // just read
    save_def_ref = def_ref;
    // we set cur_cs back a few tokens to use in runaway 
    // errors
    save_cur_cs = cur_cs;
    // Scanning a token list:
    // for possible runaway error (mimic call_func from pdfTeX)
    cur_cs = warning_index;
    if (scan_toks(false, true) != 0) {
        // actually do the scanning
        do_nothing;
    }
    old_setting = selector;
    selector = new_string;
    show_token_list(
      link(def_ref),
      null,
      pool_size - pool_ptr,
    );
    selector = old_setting;
    // turn the token list just read into a string to input
    s = make_string;
    // Restoring some variables:
    // remove the token list from memory
    delete_token_ref(def_ref);
    // and restore def_ref 
    def_ref = save_def_ref;
    // restore cur_cs 
    cur_cs = save_cur_cs;
    // restore scanner_status 
    scanner_status = save_scanner_status;
    // Passing the read string to the input machinery:
    // save stop_at_space 
    save_stop_at_space = stop_at_space;
    // set stop_at_space to false to allow spaces in file 
    // names
    stop_at_space = false;
    begin_name;
    for (i in str_start_macro(s) to 
        str_start_macro(s + 1)
        - 1
    ) {
        // add each read character to the current file name
        dummy = more_name(str_pool[i]);
    }
    // restore stop_at_space 
    stop_at_space = save_stop_at_space;
}
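
The final step of the procedure, feeding the string to more_name with stop_at_space temporarily switched off, can be modelled as follows. This Python sketch assumes a drastically simplified more_name that only knows about spaces (the real routine also tracks quotes and the area and extension delimiters); the class NameScanner is hypothetical.

```python
class NameScanner:
    def __init__(self):
        self.stop_at_space = True
        self.name = []

    def begin_name(self):
        self.name = []

    def more_name(self, ch):
        """Return False when ch ends the name, True otherwise."""
        if ch == " " and self.stop_at_space:
            return False
        self.name.append(ch)
        return True

    def scan_braced(self, text):
        """Accept every character of text, spaces included."""
        saved = self.stop_at_space
        self.stop_at_space = False  # allow spaces in file names
        self.begin_name()
        for ch in text:
            self.more_name(ch)      # result ignored, like `dummy`
        self.stop_at_space = saved  # restore the caller's setting
        return "".join(self.name)
```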

1690. [54/MLTEX] System-dependent changes for MLTEX. The boolean variable mltex_p is set by web2c according to the given command line option (or an entry in the configuration file) before any TEX function is called.

⟦13 Global variables⟧ += ⟦
    var mltex_p: boolean;
⟧

1691. The boolean variable mltex_enabled_p is used to enable MLTEX’s character substitution. It is initialized to false . When loading a FMT it is set to the value of the boolean mltex_p saved in the FMT file. Additionally it is set to the value of mltex_p in IniTEX.

⟦13 Global variables⟧ += ⟦
    // enable character substitution
    var mltex_enabled_p: boolean;

    // used by XeTeX font loading code to record which font 
    // technology was used
    var native_font_type_flag: integer;

    // to suppress tfm font mapping of char codes from 
    // ligature nodes (already mapped)
    var xtx_ligature_present: boolean;
⟧

1692.

⟦23 Set initial values of key variables⟧ += ⟦
    mltex_enabled_p = false
⟧

1693. The function effective_char computes the effective character with respect to font information. The effective character is either the character itself or, if the character does not exist in the font, the base character part of a character substitution definition.

Inside effective_char we cannot use char_info , because the macro char_info itself uses effective_char and would call this function a second time with the same arguments.

If the character c does not exist in font f and no character substitution for c was defined, you cannot use the function value as a character offset in char_info because it will access an undefined or invalid font_info entry! Therefore inside char_info and in other places, effective_char ’s boolean parameter err_p is set to true to issue a warning and return the incorrect, but always existing, replacement character font_bc[f] .

⟦1492 Declare \eTeX\ procedures for scanning⟧ += ⟦
    function effective_char(
      err_p: boolean,
      f: internal_font_number,
      c: quarterword,
    ): integer {
        label found;
        var
          base_c: integer, // or eightbits : replacement 
          // base character
          result: integer; // or quarterword 
        
        if (
            (!xtx_ligature_present)
            && (font_mapping[f] != nil)
        ) {
            c = apply_tfm_font_mapping(font_mapping[f], c);
        }
        xtx_ligature_present = false;
        // return c unless it does not exist in the font
        result = c;
        if (!mltex_enabled_p) {
            goto found;
        }
        if (font_ec[f] >= qo(c)) {
            if (font_bc[f] <= qo(c)) {
                // N.B.: not char_info (f)(c)
                if (char_exists(orig_char_info(f)(c))) {
                    goto found;
                }
            }
        }
        if (qo(c) >= char_sub_def_min) {
            if (qo(c) <= char_sub_def_max) {
                if (char_list_exists(qo(c))) {
                    base_c = char_list_char(qo(c));
                    // return base_c 
                    result = qi(base_c);
                    if (!err_p) {
                        goto found;
                    }
                    if (font_ec[f] >= base_c) {
                        if (font_bc[f] <= base_c) {
                            if (char_exists(
                              orig_char_info(f)(qi(base_c)),
                            )) {
                                goto found;
                            }
                        }
                    }
                }
            }
        }
        // print error and return existing character?
        if (err_p) {
            begin_diagnostic;
            print_nl(
              strpool!("Missing character: There is no "),
            );
            print(strpool!("substitution for "));
            print_ASCII(qo(c));
            print(strpool!(" in font "));
            slow_print(font_name[f]);
            print_char(ord!("!"));
            end_diagnostic(false);
            // N.B.: not non-existing character c !
            result = qi(font_bc[f]);
        }
      found:
        effective_char = result;
    }
⟧
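The decision order of effective_char can be easier to follow in a small stand-alone model. The following Python sketch is only an illustration, not the program's code: the Font class, the char_subs dictionary (standing in for char_list_exists and char_list_char), and the log list are assumptions made for the example, and the font-mapping step at the top of the function is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Font:
    """Hypothetical stand-in for a font: a code range plus the codes that exist."""
    name: str
    bc: int                                  # smallest character code (font_bc)
    ec: int                                  # largest character code (font_ec)
    existing: set = field(default_factory=set)

    def has(self, c: int) -> bool:
        # Range check first, then the existence check (char_exists).
        return self.bc <= c <= self.ec and c in self.existing


def effective_char(err_p, font, c, char_subs, mltex_enabled=True, log=None):
    """Return the character code to use for c in font.

    char_subs maps a character code to (accent, base); it is an assumed
    representation of the character substitution definitions.
    """
    if not mltex_enabled or font.has(c):
        return c                             # the character itself
    if c in char_subs:
        _accent, base_c = char_subs[c]
        if not err_p:
            return base_c                    # caller asked for no checking
        if font.has(base_c):
            return base_c                    # valid substitution
    if err_p:
        if log is not None:
            log.append(
                f"Missing character: There is no substitution for {c} "
                f"in font {font.name}!"
            )
        return font.bc                       # wrong glyph, but a valid index
    return c                                 # unchecked: may not exist


if __name__ == "__main__":
    cmr = Font("cmr10", 0, 127, set(range(128)))
    subs = {228: (127, 97)}                  # hypothetical: 228 = accent 127 over base 97
    print(effective_char(True, cmr, 228, subs))
```

With err_p set to false the sketch can return a code that does not exist in the font, which is exactly the case the prose above warns about.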

1694. The function effective_char_info is equivalent to char_info, except that it returns null_character if the character c does not exist in font f and there is no substitution definition for c. (In these cases char_info, which uses effective_char, will access an undefined or invalid font_info entry. See the documentation of effective_char for more information.)

⟦1694 Declare additional functions for ML\TeX⟧ = ⟦
    function effective_char_info(
      f: internal_font_number,
      c: quarterword,
    ): four_quarters {
        label exit;
        var
          ci: four_quarters, // character information bytes 
          // for c 
          base_c: integer; // or eightbits : replacement 
          // base character
        
        if (
            (!xtx_ligature_present)
            && (font_mapping[f] != nil)
        ) {
            c = apply_tfm_font_mapping(font_mapping[f], c);
        }
        xtx_ligature_present = false;
        if (!mltex_enabled_p) {
            effective_char_info = orig_char_info(f)(c);
            return;
        }
        if (font_ec[f] >= qo(c)) {
            if (font_bc[f] <= qo(c)) {
                // N.B.: not char_info (f)(c)
                ci = orig_char_info(f)(c);
                if (char_exists(ci)) {
                    effective_char_info = ci;
                    return;
                }
            }
        }
        if (qo(c) >= char_sub_def_min) {
            if (qo(c) <= char_sub_def_max) {
                if (char_list_exists(qo(c))) {
                    //  effective_char_info = char_info ( f 
                    // ) ( qi ( char_list_char ( qo ( c ) ) 
                    // ) ) ; 
                    base_c = char_list_char(qo(c));
                    if (font_ec[f] >= base_c) {
                        if (font_bc[f] <= base_c) {
                            // N.B.: not char_info (f)(c)
                            ci = orig_char_info(f)(
                              qi(base_c),
                            );
                            if (char_exists(ci)) {
                                effective_char_info = ci;
                                return;
                            }
                        }
                    }
                }
            }
        }
        effective_char_info = null_character;
      exit:
    }

    ⟦616 Declare subroutines for |new_character|⟧
⟧

1695. This code is called for a virtual character c in hlist_out during ship_out. It tries to build a character substitution construct for c, generating appropriate DVI code from the character substitution definition for this character. If a valid character substitution exists, DVI code is created as if make_accent had been used. In all other cases the status of the substitution for this character has been changed between the creation of the character node in the hlist and the output of the page; the created DVI code will be correct, but the visual result will be undefined.

Earlier MLTEX versions replaced the character node by a sequence of character, box, and accent kern nodes, splicing them into the original horizontal list. This version does not do this, in order to avoid a) a memory overflow at this processing stage, b) additional code to add a pointer to the previous node needed for the replacement, and c) wrong code resulting in anomalies because of the use within a \leaders box.

⟦1695 Output a substitution, |goto continue| if not possible⟧ = ⟦
    {
        ⟦1697 Get substitution information, check it, goto |found| if all is ok, otherwise goto |continue|⟧
      found:
        ⟦1698 Print character substitution tracing log⟧
        ⟦1699 Rebuild character using substitution information⟧
    }
⟧

1696. The global variables for the code to substitute a virtual character could be declared as local. Nonetheless we declare them as global to avoid stack overflows, because hlist_out can be called recursively.

⟦13 Global variables⟧ += ⟦
    var accent_c, base_c, replace_c: integer;

    // accent and base character information
    var ia_c, ib_c: four_quarters;

    // amount of slant
    var base_slant, accent_slant: real;

    // accent is designed for characters of this height
    var base_x_height: scaled;

    // height and width for base character
    var base_width, base_height: scaled;

    // height and width for accent
    var accent_width, accent_height: scaled;

    // amount of right shift
    var delta: scaled;
⟧

1697. Get the character substitution information in char_sub_code for the character c. The current code checks that the substitution exists and is valid and that all substitution characters exist in the font, so we cannot substitute a character that is itself used in a substitution. This simplifies the code because we do not have to check for cycles in the character substitution definitions.

⟦1697 Get substitution information, check it, goto |found| if all is ok, otherwise goto |continue|⟧ = ⟦
    if (qo(c) >= char_sub_def_min) {
        if (qo(c) <= char_sub_def_max) {
            if (char_list_exists(qo(c))) {
                base_c = char_list_char(qo(c));
                accent_c = char_list_accent(qo(c));
                if ((font_ec[f] >= base_c)) {
                    if ((font_bc[f] <= base_c)) {
                        if ((font_ec[f] >= accent_c)) {
                            if ((font_bc[f] <= accent_c)) {
                                ia_c = char_info(f)(
                                  qi(accent_c),
                                );
                                ib_c = char_info(f)(
                                  qi(base_c),
                                );
                                if (char_exists(ib_c)) {
                                    if (char_exists(ia_c)) {
                                        goto found;
                                    }
                                }
                            }
                        }
                    }
                }
                begin_diagnostic;
                print_nl(
                  strpool!("Missing character: Incomplete substitution "),
                );
                print_ASCII(qo(c));
                print(strpool!(" = "));
                print_ASCII(accent_c);
                print(ord!(" "));
                print_ASCII(base_c);
                print(strpool!(" in font "));
                slow_print(font_name[f]);
                print_char(ord!("!"));
                end_diagnostic(false);
                goto continue;
            }
        }
    }

    begin_diagnostic

    print_nl(strpool!("Missing character: There is no "))

    print(strpool!("substitution for "))

    print_ASCII(qo(c))

    print(strpool!(" in font "))

    slow_print(font_name[f])

    print_char(ord!("!"))

    end_diagnostic(false)

    goto continue
⟧

1698. For tracinglostchars > 99 the substitution is shown in the log file.

⟦1698 Print character substitution tracing log⟧ = ⟦
    if (tracing_lost_chars > 99) {
        begin_diagnostic;
        print_nl(
          strpool!("Using character substitution: "),
        );
        print_ASCII(qo(c));
        print(strpool!(" = "));
        print_ASCII(accent_c);
        print(ord!(" "));
        print_ASCII(base_c);
        print(strpool!(" in font "));
        slow_print(font_name[f]);
        print_char(ord!("."));
        end_diagnostic(false);
    }
⟧

1699. This outputs the accent and the base character given in the substitution. It uses code virtually identical to the make_accent procedure, but without the node creation steps.

Additionally, if the accent character has to be shifted vertically, it does not create the same code. The original routine in make_accent and earlier versions of MLTEX create a box node resulting in push and pop operations, whereas this code simply produces vertical positioning operations. This can influence the pixel rounding algorithm in some DVI drivers, and therefore will probably be changed in one of the next MLTEX versions.

⟦1699 Rebuild character using substitution information⟧ = ⟦
    base_x_height = x_height(f)

    base_slant = slant(f) / float_constant(65536)

    // slant of accent character font
    accent_slant = base_slant

    base_width = char_width(f)(ib_c)

    base_height = char_height(f)(height_depth(ib_c))

    accent_width = char_width(f)(ia_c)

    accent_height = char_height(f)(height_depth(ia_c))

    // compute necessary horizontal shift (don't forget 
    // slant)
    delta = round(
      (base_width - accent_width) / float_constant(2)
      + base_height * base_slant
      - base_x_height * accent_slant,
    )

    // update dvi_h , similar to the last statement in 
    // module 620
    dvi_h = cur_h

    // 1. For centering/horizontal shifting insert a kern 
    // node.
    cur_h = cur_h + delta

    synch_h

    // 2. Then insert the accent character possibly shifted 
    // up or down.
    if ((
        (base_height != base_x_height)
        && (accent_height > 0)
    )) {
        // the accent must be shifted up or down
        cur_v = base_line + (base_x_height - base_height);
        synch_v;
        if (accent_c >= 128) {
            dvi_out(set1);
        }
        dvi_out(accent_c);
        cur_v = base_line;
    } else {
        synch_v;
        if (accent_c >= 128) {
            dvi_out(set1);
        }
        dvi_out(accent_c);
    }

    cur_h = cur_h + accent_width

    dvi_h = cur_h

    // 3. For centering/horizontal shifting insert another 
    // kern node.
    cur_h = cur_h + (-accent_width - delta)

    // 4. Output the base character.
    synch_h

    synch_v

    if (base_c >= 128) {
        dvi_out(set1);
    }

    dvi_out(base_c)

    cur_h = cur_h + base_width

    // update of dvi_h is unnecessary, will be set in module 
    // 620
    dvi_h = cur_h
⟧
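The arithmetic of this module can be checked with a small worked example. The Python sketch below is an illustration under stated assumptions: dimensions are plain integers standing in for scaled values, the function names are hypothetical, the accent and the base share one slant (as in the code above, where accent_slant is set to base_slant), and tex_round assumes that round rounds halves away from zero.

```python
import math


def tex_round(x: float) -> int:
    """Round to the nearest integer, halves away from zero (assumed behaviour)."""
    if x >= 0:
        return int(math.floor(x + 0.5))
    return -int(math.floor(-x + 0.5))


def accent_shift(base_width, base_height, accent_width, x_height, slant):
    """Horizontal shift (delta) that centres the accent over the base character."""
    return tex_round(
        (base_width - accent_width) / 2.0
        + base_height * slant
        - x_height * slant
    )


def place(start_h, base_width, base_height, accent_width, x_height, slant):
    """Follow the four steps; return (accent_h, base_h, end_h)."""
    delta = accent_shift(base_width, base_height, accent_width, x_height, slant)
    cur_h = start_h + delta            # 1. move right by delta
    accent_h = cur_h                   # 2. the accent is set here
    cur_h += accent_width
    cur_h += -accent_width - delta     # 3. move back to the starting point
    base_h = cur_h                     # 4. the base character is set here
    cur_h += base_width
    return accent_h, base_h, cur_h


if __name__ == "__main__":
    # Upright font: base 500 wide, accent 300 wide, so the accent moves right by 100.
    print(place(1000, 500, 700, 300, 430, 0.0))   # (1100, 1000, 1500)
```

Whatever the value of delta, the two horizontal movements cancel, so the net advance equals base_width, the same as for an ordinary character.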

1700. Dumping MLTEX-related material. This is just the flag in the format that tells us whether MLTEX is enabled.

⟦1700 Dump ML\TeX-specific data⟧ = ⟦
    dump_int(0x4d4c5458) // ML\TeX's magic constant: "MLTX"

    if (mltex_p) {
        dump_int(1);
    } else {
        dump_int(0);
    }
⟧

1701. Undump MLTEX-related material, which is just a flag in the format that tells us whether MLTEX is enabled.

⟦1701 Undump ML\TeX-specific data⟧ = ⟦
    undump_int(x) // check magic constant of ML\TeX

    if (x != 0x4d4c5458) {
        goto bad_fmt;
    }

    // undump mltex_p flag into mltex_enabled_p 
    undump_int(x)

    if (x == 1) {
        mltex_enabled_p = true;
    } else if (x != 0) {
        goto bad_fmt;
    }
⟧
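The dump and undump steps form a simple round trip: a magic word followed by a 0/1 flag. The Python sketch below models the format file as a list of integers; the function names and the BadFormat exception are assumptions made for the illustration. One simplification: the sketch returns False for a flag of 0, whereas the code above leaves mltex_enabled_p untouched in that case.

```python
MLTEX_MAGIC = 0x4D4C5458           # the bytes "MLTX" read as a big-endian integer


class BadFormat(Exception):
    """Stands in for the jump to bad_fmt."""


def dump_mltex(mltex_p: bool) -> list:
    """Return the two words written for MLTEX: magic constant, then the flag."""
    return [MLTEX_MAGIC, 1 if mltex_p else 0]


def undump_mltex(words) -> bool:
    """Check the magic constant and decode the flag, rejecting anything else."""
    magic, flag = words
    if magic != MLTEX_MAGIC:
        raise BadFormat("wrong magic constant")
    if flag == 1:
        return True
    if flag != 0:
        raise BadFormat("flag must be 0 or 1")
    return False


if __name__ == "__main__":
    print(MLTEX_MAGIC.to_bytes(4, "big"))      # b'MLTX'
    print(undump_mltex(dump_mltex(True)))      # True
```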

1702. [54/SyncTeX] The Synchronize TEXnology. This section is devoted to the Synchronize TEXnology, or simply SyncTEX, which is used to synchronize between input and output. It explains how the basics of synchronization are implemented. Before we enter into more technical details, let us recall in a few words what synchronization is.

The TEX typesetting system clearly separates the input and the output material, and synchronization provides a new link between the two that helps text editors and viewers work together. More precisely, forwards synchronization is the ability, given a location in the input source file, to find the corresponding place in the output. Backwards synchronization performs the opposite: given a location in the output, it retrieves the corresponding material in the input source file.
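The two directions can be pictured as two lookups over one table of records. The Python sketch below is purely conceptual: the Record layout, the coordinates, and the nearest-point rule used for the backwards direction are assumptions chosen for the illustration, not a description of the SyncTEX file or of any viewer.

```python
from collections import namedtuple

# One hypothetical record: where a piece of input ended up in the output.
Record = namedtuple("Record", "file line page x y")


def forwards(records, file, line):
    """Input location -> all matching output locations."""
    return [(r.page, r.x, r.y) for r in records
            if r.file == file and r.line == line]


def backwards(records, page, x, y):
    """Output location -> input location of the nearest record on that page."""
    on_page = [r for r in records if r.page == page]
    if not on_page:
        return None
    nearest = min(on_page, key=lambda r: (r.x - x) ** 2 + (r.y - y) ** 2)
    return nearest.file, nearest.line


if __name__ == "__main__":
    records = [
        Record("main.tex", 10, 1, 100, 700),
        Record("main.tex", 11, 1, 100, 680),
        Record("chap.tex", 3, 2, 100, 700),
    ]
    print(forwards(records, "main.tex", 11))
    print(backwards(records, 1, 105, 698))
```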

For better code management and maintenance, we adopt a naming convention. Throughout this program, code related to the Synchronize TEXnology is tagged with the “synctex” keyword. Any code extract where SyncTEX plays its part, either explicitly or implicitly, contains (or should contain) the string “synctex”. This naming convention also holds for external files. Moreover, all the code related to SyncTEX is gathered in this section, except the definitions.

1703. Synchronization should be enabled from the command line; synctexoption is used for that purpose. This global integer variable is declared here but is not used here. It is just a placeholder where the command line controller will put the SyncTEX related options, and from which the SyncTEX controller will read them.

1704.

⟦13 Global variables⟧ += ⟦
    var synctexoption: integer;
⟧

1705. A convenient primitive is provided: \synctex=1 in the input source file enables synchronization, whereas \synctex=0 disables it. Its memory address is synctex_code. It is initialized by the SyncTEX controller to the command-line option, if one is given. The controller may filter some reserved bits.

1706.

⟦252 Put each of \TeX's primitives into the hash table⟧ += ⟦
    primitive(
      strpool!("synctex"),
      assign_int,
      int_base + synctex_code,
    )
⟧

1707.

⟦1707 synctex case for |print_param|⟧ = ⟦
    synctex_code:

    print_esc(strpool!("synctex"))
⟧

1708. In order to give the SyncTEX controller read and write access to the contents of the \synctex primitive, we declare synctexoffset, such that mem[synctexoffset] and \synctex correspond to the same memory storage. synctexoffset is initialized to the correct value once nearly everything else has been initialized.

1709.

⟦13 Global variables⟧ += ⟦
    // holds the true value of synctex_code 
    var synctexoffset: integer;
⟧

1710.

⟦8 Initialize whatever \TeX\ might access⟧ += ⟦
    synctexoffset = int_base + synctex_code
⟧

1711.

⟦1711 Initialize synctex primitive⟧ = ⟦
    synctex_init_command
⟧

1712. Synchronization is achieved with the help of an auxiliary file named ‘jobname.synctex’ (jobname is the contents of the \jobname macro), where a SyncTEX controller implemented in the external synctex.c file will store geometrical information. This SyncTEX controller takes care of all the technical details concerning the SyncTEX file; we only focus on the messages the controller receives from the TEX program.

The most accurate synchronization information should make it possible to map any character of the input source file to the corresponding location in the output, if relevant. Ideally, the synchronization information of the input material consists of the file name and the line and column numbers of every character. The synchronization information in the output is simply the page number and either point coordinates, or box dimensions and position. The problem is that the mapping between these two kinds of information is only known at ship out time, which means that we must keep track of the input synchronization information until the pages are shipped out.

As TEX only knows about file names and line numbers, but forgets the column numbers, we only consider a restricted form of input synchronization information called SyncTEX information. It consists of a unique file name identifier, the SyncTEX file tag, and the line number.

How such information is tracked depends on whether characters or nodes are involved. Actually, only certain nodes are involved in SyncTEX; we call them synchronized nodes. Synchronized nodes store the SyncTEX information in their last two words: the first one contains a SyncTEX file tag uniquely identifying the input file, and the second one contains the current line number, as returned by the \inputlineno primitive. The synctex_field_size macro contains the necessary size to store the SyncTEX information in a node.

When declaring the size of a new node, it is recommended to use the following convention: if the node is synchronized, use a definition similar to my_synchronized_node_size=xxx+synctex_field_size. Moreover, one should expect that the SyncTEX information is always stored in the last two words of a synchronized node.

1713. By default, every node of a sufficiently large size is initialized at creation time in the get_node routine with the current SyncTEX information, whether or not the node is synchronized. One purpose is to set this information very early in order to minimize code dependencies, including forthcoming extensions. Another purpose is to avoid the assumption that every node type has a dedicated getter where initialization should take place. Actually, it appears that some nodes are created directly with the get_node routine and not with a dedicated constructor. And finally, initializing the node in only one place is less error-prone.

1714.

⟦1714 Initialize bigger nodes with {\sl Sync\TeX} information⟧ = ⟦
    if (s >= medium_node_size) {
        sync_tag(r + s) = synctex_tag;
        sync_line(r + s) = line;
    }
⟧
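The effect of this initialization on the memory array can be pictured with a toy allocator. The Python sketch below is an illustration only: it assumes, following the prose, that the SyncTEX information occupies the last two words of a node, with the tag first and the line second; the constant values, the Memory class, and its bump allocator are hypothetical and do not reflect the program's actual definitions of sync_tag, sync_line, or get_node.

```python
SYNCTEX_FIELD_SIZE = 2                                   # assumed: tag word + line word
SMALL_NODE_SIZE = 2                                      # assumed value for the example
MEDIUM_NODE_SIZE = SMALL_NODE_SIZE + SYNCTEX_FIELD_SIZE  # assumed relationship


class Memory:
    """Toy memory array with a trivial bump allocator."""

    def __init__(self, size):
        self.mem = [0] * size
        self.free = 0

    def sync_tag(self, end):
        """Tag word of the node that ends just before index end."""
        return self.mem[end - SYNCTEX_FIELD_SIZE]

    def sync_line(self, end):
        """Line word of the node that ends just before index end."""
        return self.mem[end - SYNCTEX_FIELD_SIZE + 1]

    def get_node(self, s, synctex_tag, line):
        """Allocate s words; stamp nodes that are big enough with (tag, line)."""
        r = self.free
        self.free += s
        if s >= MEDIUM_NODE_SIZE:
            self.mem[r + s - SYNCTEX_FIELD_SIZE] = synctex_tag
            self.mem[r + s - SYNCTEX_FIELD_SIZE + 1] = line
        return r


if __name__ == "__main__":
    m = Memory(32)
    small = m.get_node(SMALL_NODE_SIZE, synctex_tag=7, line=42)    # too small: untouched
    medium = m.get_node(MEDIUM_NODE_SIZE, synctex_tag=7, line=42)  # stamped at creation
    print(m.mem[small:small + SMALL_NODE_SIZE])
    print(m.sync_tag(medium + MEDIUM_NODE_SIZE), m.sync_line(medium + MEDIUM_NODE_SIZE))
```

Addressing the fields from the end of the node, as sync_tag(r + s) does above, is what lets one piece of code serve every node size.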

1715. Instead of storing the input file name, it is better to store just an identifier. Each time TEX opens a new file, it notifies the SyncTEX controller with a synctex_start_input message. This controller will create a new SyncTEX file tag and will update the current input state record accordingly. If the input comes from the terminal or a pseudo file, the synctex_tag is set to 0. This automatically disables synchronization for material input from the terminal or pseudo files.

1716.

⟦1716 Prepare new file {\sl Sync\TeX} information⟧ = ⟦
    // Give control to the {\sl Sync\TeX} controller
    synctex_start_input
⟧
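The tagging scheme of section 1715 amounts to handing out a fresh positive integer per opened file and reserving 0 for unsynchronized input. The Python sketch below illustrates that idea only; the class, its methods, and the choice of 1 as the first tag are assumptions for the example and are not the interface of the controller in synctex.c.

```python
class TagRegistry:
    """Hypothetical model of file tagging: one fresh tag per opened file."""

    NO_SYNC = 0                      # tag used for terminal and pseudo-file input

    def __init__(self):
        self._next_tag = 1           # assumed starting value
        self._names = {}

    def start_input(self, filename: str) -> int:
        """Called when a new file is opened; returns its tag."""
        tag = self._next_tag
        self._next_tag += 1
        self._names[tag] = filename
        return tag

    def name_of(self, tag: int):
        """Resolve a tag back to a file name; None for unsynchronized input."""
        return self._names.get(tag)

    @staticmethod
    def is_synchronized(tag: int) -> bool:
        return tag != TagRegistry.NO_SYNC


if __name__ == "__main__":
    reg = TagRegistry()
    main_tag = reg.start_input("main.tex")
    chap_tag = reg.start_input("chapter1.tex")
    print(main_tag, chap_tag, reg.name_of(chap_tag))
    print(reg.is_synchronized(TagRegistry.NO_SYNC))
```

Storing a small integer in every node instead of a file name keeps the per-node cost at one word, while the name itself is recorded once per file.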

1717.

⟦1717 Prepare terminal input {\sl Sync\TeX} information⟧ = ⟦
    synctex_tag = 0
⟧

1718.

⟦1718 Prepare pseudo file {\sl Sync\TeX} information⟧ = ⟦
    synctex_tag = 0
⟧

1719.

⟦1719 Close {\sl Sync\TeX} file and write status⟧ = ⟦
    // Let the {\sl Sync\TeX} controller close its files.
    synctex_terminate(log_opened)
⟧

1720. Synchronized nodes are boxes, math, kern and glue nodes. Other nodes should be synchronized too, in particular math noads. TEX assumes that math, kern and glue nodes have the same size, which is why they are all synchronized. In the end, only horizontal lists are really used in SyncTEX, but all box nodes are treated the same with respect to synchronization, because the type of a box node is allowed to change at execution time.

The next sections are the various messages sent to the SyncTEX controller. The argument is either the box or the node currently shipped out. The vertical boxes are not recorded, but the code is available for clients.

1721.

⟦1721 Start sheet {\sl Sync\TeX} information record⟧ = ⟦
    synctex_sheet(mag)
⟧

1722.

⟦1722 Finish sheet {\sl Sync\TeX} information record⟧ = ⟦
    synctex_teehs
⟧

1723.

⟦1723 Start vlist {\sl Sync\TeX} information record⟧ = ⟦
    synctex_vlist(this_box)
⟧

1724.

⟦1724 Finish vlist {\sl Sync\TeX} information record⟧ = ⟦
    synctex_tsilv(this_box)
⟧

1725.

⟦1725 Start hlist {\sl Sync\TeX} information record⟧ = ⟦
    synctex_hlist(this_box)
⟧

1726.

⟦1726 Finish hlist {\sl Sync\TeX} information record⟧ = ⟦
    synctex_tsilh(this_box)
⟧

1727.

⟦1727 Record void list {\sl Sync\TeX} information⟧ = ⟦
    if (type(p) == vlist_node) {
        synctex_void_vlist(p, this_box);
    } else {
        synctex_void_hlist(p, this_box);
    }
⟧

1728.

⟦1728 Record current point {\sl Sync\TeX} information⟧ = ⟦
    synctex_current
⟧

1729.

⟦1729 Record horizontal |rule_node| or |glue_node| {\sl Sync\TeX} information⟧ = ⟦
    synctex_horizontal_rule_or_glue(p, this_box)
⟧

1730.

⟦1730 Record |kern_node| {\sl Sync\TeX} information⟧ = ⟦
    synctex_kern(p, this_box)
⟧

1731.

⟦1731 Record |math_node| {\sl Sync\TeX} information⟧ = ⟦
    synctex_math(p, this_box)
⟧

1732. When making a copy of a synchronized node, we might also have to duplicate the SyncTEX information by copying the last two words. This is the case for a box_node and for a glue_node, but not for a math_node or a kern_node. These last two nodes always keep the SyncTEX information they received at creation time.

1733.

⟦1733 Copy the box {\sl Sync\TeX} information⟧ = ⟦
    sync_tag(r + box_node_size) = sync_tag(
      p + box_node_size,
    )

    sync_line(r + box_node_size) = sync_line(
      p + box_node_size,
    )
⟧

1734.

⟦1734 Copy the rule {\sl Sync\TeX} information⟧ = ⟦
    // sync_tag(r + rule_node_size) = sync_tag(p + rule_node_size);
    // sync_line(r + rule_node_size) = sync_line(p + rule_node_size);
⟧

1735.

⟦1735 Copy the medium sized node {\sl Sync\TeX} information⟧ = ⟦
    sync_tag(r + medium_node_size) = sync_tag(
      p + medium_node_size,
    )

    sync_line(r + medium_node_size) = sync_line(
      p + medium_node_size,
    )
⟧
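The difference between nodes whose SyncTEX information is copied and nodes that keep their creation-time information can be shown on a flat array. The Python sketch below is an illustration under assumptions: the information is taken to occupy the last two words of a node, as the prose states, and the helper name, node size, and sample values are hypothetical.

```python
SYNCTEX_FIELD_SIZE = 2               # assumed: tag word followed by line word


def copy_sync_info(mem, p, r, size):
    """Copy the trailing (tag, line) words of node p into node r."""
    for k in range(SYNCTEX_FIELD_SIZE):
        mem[r + size - SYNCTEX_FIELD_SIZE + k] = mem[p + size - SYNCTEX_FIELD_SIZE + k]


def sync_info(mem, p, size):
    """Return the (tag, line) pair stored at the end of node p."""
    return tuple(mem[p + size - SYNCTEX_FIELD_SIZE:p + size])


if __name__ == "__main__":
    size = 4
    # Original node at index 0, created while reading file tag 3, line 120.
    # Its copy at index 4 was stamped at creation with the current state (5, 8).
    mem = [0, 0, 3, 120, 0, 0, 5, 8]

    kept = sync_info(mem, 4, size)       # behaviour described for math and kern nodes
    copy_sync_info(mem, 0, 4, size)      # behaviour described for box and glue nodes
    print(kept, sync_info(mem, 4, size))
```

Copying makes the duplicate point back to the input location of the original, whereas leaving the words alone makes it point to the place where the copy was requested.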

1736. Nota Bene: The SyncTEX code is very close to the memory model. It is not connected to any other part of the code, except for memory management. It is possible to neutralize the SyncTEX code rather simply. The first step is to define a null synctex_field_size. The second step is to comment out the code in “Initialize bigger nodes...” and every “Copy ... SyncTEX information”. The last step is to comment out the synctex_tag_field related code in the definition of synctex_tag and the various “Prepare ... SyncTEX information”. Then all the remaining code should be harmless. The resulting program would behave exactly as if no SyncTEX related code were there at all, including memory management. Of course, all this assumes that SyncTEX is turned off from the command line.

1737. [54] System-dependent changes.

⟦1097 Declare action procedures for use by |main_control|⟧ += ⟦
    function insert_src_special() {
        var toklist, p, q: pointer;
        
        if ((
            source_filename_stack[in_open]
            > 0
            && is_new_source(
              source_filename_stack[in_open],
              line,
            )
        )) {
            toklist = get_avail;
            p = toklist;
            info(p) = cs_token_flag + frozen_special;
            link(p) = get_avail;
            p = link(p);
            info(p) = left_brace_token + ord!("{");
            q = str_toks(
              make_src_special(
                source_filename_stack[in_open],
                line,
              ),
            );
            link(p) = link(temp_head);
            p = q;
            link(p) = get_avail;
            p = link(p);
            info(p) = right_brace_token + ord!("}");
            ins_list(toklist);
            remember_source_info(
              source_filename_stack[in_open],
              line,
            );
        }
    }

    function append_src_special() {
        var q: pointer;
        
        if ((
            source_filename_stack[in_open]
            > 0
            && is_new_source(
              source_filename_stack[in_open],
              line,
            )
        )) {
            new_whatsit(special_node, write_node_size);
            write_stream(tail) = 0;
            def_ref = get_avail;
            token_ref_count(def_ref) = null;
            q = str_toks(
              make_src_special(
                source_filename_stack[in_open],
                line,
              ),
            );
            link(def_ref) = link(temp_head);
            write_tokens(tail) = def_ref;
            remember_source_info(
              source_filename_stack[in_open],
              line,
            );
        }
    }
⟧

1738. This function used to be in pdftex, but is useful in tex too.

function get_nullstr(): str_number {
    get_nullstr = strpool!("");
}

1739. [55] Index. Here is where you can find all uses of each identifier in the program, with underlined entries pointing to where the identifier was defined. If the identifier is only one letter long, however, you get to see only the underlined entries. All references are to section numbers instead of page numbers.

This index also lists error messages and other aspects of the program that you might want to look up some day. For example, the entry for “system dependencies” lists all sections that should receive special attention from people who are installing TEX in a new operating environment. A list of various things that can’t happen appears under “this can’t happen”. Approximately 40 sections are listed under “inner loop”; these account for about 60% of TEX’s running time, exclusive of input and output.