This page lists a few reports that could plausibly be considered notable bugs in the original TeX software written and maintained by Donald Knuth, but that have been deemed not to need fixing, either by Knuth or by his vetters.
Many other reports have been declined that are not listed here (Knuth's tune-up reports mention some: 2021, 2014, 2008). The original reports and answers have been edited or paraphrased for presentation here.
The page and section numbers are merely an initial hint about a relevant location; typically, more than one place in the code and/or documentation is involved. The initial letter (A, B, …) refers to the Computers & Typesetting volume.
A list of accepted bugs for the next tune-up is also available. These are not expected to be reviewed by Knuth until the next tune-up.
For any discussion about these issues, or further reports, please use the contact information on the main TeX bugs page.
Contents: A005: missing \null - A009: primitive operations - A214: \endinput behavior - A308: \copyright tie - A374: \asts behavior - A415: \ninebig delimiters
B016: first line not logged - B028: extra blank line logged - B032: use of word “procedure” - B035: newlines inconsistently written to terminal - B040,D036: can fix can fix - B133: max_param_stack comment - B214: input file name flushed - B274: line number ranges - B442: \spaceskip not logged - B506: bogus dimen display/other overflow - B546: output routine braces
C172: epsilon not x - D350: unused variable m declared - E037: doubled and missing kern pairs
dvitype: unnecessary loop condition - gftopk: unused text_file declaration
A005: missing \null
From various people: as a first example, in exercise 2.4, to get the right spacefactor, “OK,” should be “OK\null,”. The same issue occurs dozens of times throughout the books and WEB sources, after all kinds of punctuation. “\TeX.” and “MF.” are particularly prevalent.
Response from DEK: My practice has been to insert \null only when I notice something amiss in proofreading. Similarly with lots of other refinements.
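The underlying rule is TeX's space-factor update: a character whose \sfcode exceeds 1000 cannot raise the space factor directly from a value below 1000; it only brings it up to 1000. Plain TeX gives uppercase letters \sfcode 999, so punctuation right after a capital loses its extra space, while an empty box such as \null first resets the space factor to 1000. A minimal Python sketch of that rule follows, assuming plain TeX's defaults (999 for uppercase letters, 1250 for the comma, 3000 for the period) and ignoring kerns; the "BOX" marker is a hypothetical stand-in for any appended box.

```python
# Plain TeX's default \sfcode values relevant here; every other character is 1000.
PUNCT_SFCODE = {",": 1250, ".": 3000}


def sfcode(ch):
    if ch.isupper():
        return 999                      # uppercase letters
    return PUNCT_SFCODE.get(ch, 1000)


def space_factor_after(items):
    """Space factor after a sequence of items.

    Each item is a one-character string, or "BOX" for an appended box
    such as an empty hbox (a hypothetical marker used only in this sketch).
    """
    f = 1000
    for item in items:
        if item == "BOX":
            f = 1000                    # appending a box resets the space factor
            continue
        s = sfcode(item)
        if s == 1000:
            f = 1000
        elif s < 1000:
            if s > 0:                   # sfcode 0 leaves the space factor alone
                f = s
        elif f < 1000:
            f = 1000                    # cannot jump from below 1000 to above it
        else:
            f = s
    return f


print(space_factor_after(["O", "K", ","]))                # "OK,"        -> 1000
print(space_factor_after(["O", "K", "BOX", ","]))         # "OK\null,"   -> 1250
print(space_factor_after(["T", "BOX", "X", "."]))         # "\TeX."      -> 1000
print(space_factor_after(["T", "BOX", "X", "BOX", "."]))  # "\TeX\null." -> 3000
```

Here \TeX is modelled as a T, the lowered E box, and then a bare X, with its kerns omitted; the final X is what leaves the space factor at 999 before the period.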
A009: primitive operations
From Ulrich Diez, 2022-12-09. The TeXbook says:
About 300 of TeX's control sequences are called primitive; these are the low-level atomic operations that are not decomposable into simpler functions. All other control sequences are defined, ultimately, in terms of the primitive ones.
but there are many primitive operations, such as typesetting a character, which are not primitive control sequences, so the second sentence is false.
Response: no argument, but currently the term “primitive” is rather consistently used to mean “primitive control sequence”. The repertoire of primitive actions not invoked by a control sequence is never given a label, as a group. So no simple correction is evident.
The sentence could be reworded to be pedantically correct via something like (thanks to Paul Vojta for the basic suggestion):
All other control sequences ultimately expand to a token list in which the only control sequences are the primitive ones.
or in various other ways; but this passage comes extremely early in The TeXbook, and saying anything like this so early would only confuse new readers. For example, the concept of tokens is not mentioned until 30 pages further on.
Knuth's disclaimer in the preface of “deliberate lying” would seem to apply here.
A214: \endinput behavior
The TeXbook defines the behavior of \endinput with:
The next time TeX gets to the end of an \input line, it will stop reading from the file containing that line.
and that is exactly how it behaves. N.B. It does not say “stop reading from the file containing the \endinput” (let alone “stop reading immediately”). Thus, when material is placed on an input line after \endinput, there are counter-intuitive effects (report 1) and/or wrong/imprecise error messages (report 2), but this is not a bug that Knuth will consider.
TeX's error message “File ended while ...” is technically inaccurate even in the simplest case, where any text at all follows \endinput, since file reading did not reach EOF.
Response: In general, Knuth has said that extreme cases of TeX input deserve whatever they get. Furthermore, he knows error messages are not always optimally worded. But he has consistently declined to tinker with wording for small incremental improvements at this point.
A308: \copyright tie
From Victor S: in the answer to exercise 7.9 (the question is on p.41), a non-breaking space should follow the \copyright command.
Response: While it is certainly true that no line break would be desirable after (or before) a \copyright symbol in a normal copyright statement, such a line break would not render the copyright statement invalid. In addition, this exercise is clearly hypothetical, since it's not good practice, or perhaps even legally meaningful, to generate the year in a copyright statement instead of writing it out; this is a much worse legal issue than an extraneous line break. Also, since the exercise is about token expansion, not proper form of copyright statements, the presence of a ~ could lead readers to wonder if that was relevant to the exercise. Knuth has consistently declined to spend time on improvements to side issues that are not related to the TeXnical topic at hand.
A374: \asts behavior
From Bertram Scharpf, 2023-04-13. The TeXbook gives a solution for inserting \n asterisks:
\setbox0=\hbox{*}\cleaders\copy0\hskip\n\wd0
However, this does not work in every situation. In vertical mode, it produces an error and thus needs to be preceded by \leavevmode or similar. Worse, at the end of a paragraph, the leaders glue will be removed; a \null or similar is needed to avoid that.
A general solution would be to put an \hbox around the whole thing:
\hbox{\setbox0=\hbox{*}\cleaders\copy0\hskip\n\wd0}
Response: agreed on all counts, but since Appendix D is maximally-dangerous bend material, it seems reasonable for Knuth to leave out such bullet-proofing from the macro. The main point is the use of leaders to produce the variable number of asterisks.
A415: \ninebig delimiters
From Hu Yajie 胡亚捷, 2020-07-29 and 2020-07-29 again and 2020-08-02:
The \ninebig macro in manmac.tex typesets \big delimiters in 9-point math by borrowing the 10-point ones in cmr10 and cmsy10, but it forgets to retain the 9-point axis height. Thus examples like \input manmac \ninepoint $\bigl(()\bigr)$\end are vertically asymmetrical. This asymmetry can be observed in the real books (page A245, line 20; page C298, line -1; etc.), and it can be fixed by changing ‘\hbox{...}’ to ‘\vcenter{\hbox{...}}’ to get vertical symmetry.
Response: Knuth accepts the analysis, but says: “Since I've been happy with that for nearly 40 years, I guess I'm still happy with it.”
B016: first line not logged
From Dominik Leininger (and many other people over the years), 2023-08-16. At the ** prompt (or on the command line), give commands that ordinarily write to the log file. For instance, running Metafont like this:
**\ show tracingonline; show origin..right; end
gives the output:
>> 0
>> Path at line 0: ...
According to the ‘>> 0’, the path should not be displayed on the terminal, but it is. The second thing I noticed is that you should see the path in the transcript file, but mfput.log only contains the following [...]
[An analogous TeX first line: \showthe\tracingonline \showbox255].
Response: TeX and MF can't write to the log file until it's opened, and they don't open the log file until its proper name is known. That is, we want to let the user do stuff like \somemacros \input myfile and get myfile.log as the log file name. Thus, the programs wait until the (implicit) \input happens, or there isn't one by the end of processing the ** line, in which case we get texput.log or mfput.log.
This is explained, somewhat, on page B016 (module 25, “We need a special routine…”), and (even less specifically, but still implied), on page A023 (“TeX uses the name texput …”), and the analogous areas in mf.web and The Metafontbook.
Although one might imagine the programs buffering all would-be log output until the log file is opened, instead of writing to the terminal, it's questionable whether this would be an improvement. In any case, it's not a change Knuth would make at this late date.
B028: extra blank line logged
From Igor Liferenko, 2023-03-06. Run TeX on:
\setbox0=\hbox{A}\showbox0\end
and the log file is (notice two blank lines):
.\tenrm A


! OK.
Response: Granted, this is not ideal, but presumably Knuth would not rework the basic I/O functions to avoid it, as the bug happens because TeX does not keep perfect track of which column it's at in the log file (or terminal output). He's declined to fix similar problems in the past, such as Igor's “newlines inconsistently written to terminal” report below.
The response from Tyge Tiessen explains the situation in more detail, including other examples where spurious blank lines can occur.
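TeX avoids most blank lines by remembering how many characters it has written on the current line (term_offset and file_offset in tex.web); print_nl starts a new line only when that count is nonzero. The toy Python model below is not TeX's code, and the Transcript class and its untracked_newline method are hypothetical. It shows one way a spurious blank line can arise: a newline that bypasses the bookkeeping leaves the count stale, so print_nl adds a second newline.

```python
class Transcript:
    """Toy model of a column-tracking writer, in the spirit of tex.web's
    file_offset; not a transcription of TeX's code."""

    def __init__(self):
        self.text = ""
        self.offset = 0          # characters written on the current line

    def write(self, s):
        for ch in s:
            self.text += ch
            self.offset = 0 if ch == "\n" else self.offset + 1

    def print_ln(self):
        self.write("\n")

    def print_nl(self, s):
        # Begin a new line only if the current one is not empty.
        if self.offset > 0:
            self.print_ln()
        self.write(s)

    def untracked_newline(self):
        # Hypothetical code path that ends a line without resetting offset.
        self.text += "\n"


def blank_lines(text):
    # Count empty lines (assumes the text does not end with a newline).
    return text.split("\n").count("")


good = Transcript()
good.write(".\\tenrm A")
good.print_nl("! OK.")

bad = Transcript()
bad.write(".\\tenrm A")
bad.untracked_newline()
bad.print_nl("! OK.")

print(blank_lines(good.text), blank_lines(bad.text))   # 0 1
```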
B032: use of word “procedure”
From Martin Ruckert, 2021-03-14: The text says “The print_err procedure”, but print_err is a macro, not a procedure.
Rino Jose, 2018-12-18, made a similar report for Metafont, page D120 (section 266): “This procedure returns” instead of “This function returns”. [The phrase occurs in several other places in mf.web and tex.web.]
Response: As a matter of English, it is normal to use “procedure” interchangeably with other terms (macro, function, (sub)routine), and since it's not formatted in bold, it shouldn't be taken as implying a Pascal procedure.
B035: newlines inconsistently written to terminal
From Igor Liferenko, 2021-07-06: Before exiting, TeX (and other WEB programs) sometimes use write and sometimes use write_ln, e.g., tex.web line 1036 (B035, section 35) vs. line 1085 (B036, section 37).
Response from DEK: This is not important enough to warrant any change. [However, he has noted that it's fine/expected for change files to do as they see fit in this regard.]
Further information from DRF: For the historical record. The Sail/Waits OS, where DEK spent his time back in the day, had strong knowledge of what text was where on your screen, as well as what was buffered up for (custom) keyboard input and (custom) screen output, and it was all tightly bound with the “shell” that was really integrated with the OS. The system handled whole lines of input, and when the user hit <return>, it put the cursor at start of the next line, and knew it; when the program put characters on the screen, the system knew exactly what row and column they were in; and when a program ended, any remaining output buffered up for the screen was flushed, including moving the cursor to the start of a new line if necessary.
Score/TOPS-20, the only port we directly provided, was the same but different (completely different OS and no special terminal hardware, but the shell was tightly integrated with the terminal IO, and the system knew where the cursor was at all times; see the SFPOS and RFPOS system calls).
The only question is whether I'm lying about “if necessary”, i.e., whether it was really “always”. Looking for further clues, note that Tangle and Weave never end with a newline to the terminal, while PLtoTF, TFtoPL and PoolType always do end with a newline to the terminal. This is evidence that either the OSes we dealt with added a newline at the end only as needed, or that they always added a newline and DEK didn't care that the PL/TF programs left an extra blank line. The latter is believable, as those programs were virtually never used, while Tangle and Weave were in constant use, especially by DEK. (Perhaps oddly, DVItype and GFtype don't report errors or progress to the terminal; all their output goes into the .TYP file, so they don't really provide any evidence.)
That leaves TeX and MF. In normal operation, they also do not end with a newline to the terminal, and they too were in constant use. However, in most exceptional cases they do end with a newline (line 1085 being the exception). Looking back at version 0.97 on saildart.org/[TEX,DEK], it looks like it was all pretty much the same mix.
I'd say that the normal-operation TeX/MF, along with Tangle/Weave, is fairly strong evidence that care was being taken to end intentionally without a newline; and that it is in fact a mistake in the (very) exceptional cases where it does, but nobody cared or noticed, since those exceptions pretty much never happened. So, I suppose I'd say that [in principle] lines 1036, 1328, 10164, 23810, 24289 should all be changed to not have a newline, so that everything is self-consistent; with the super advantage that it's then always ok for other ports to add a newline at the end of the job, and that will never cause an extra blank line.
I don't think there's any mysterious reason for the odd-man-out case of 1085 where the code seems already “right”.
B040,D036: can fix can fix
From Gregor Purdy, 2023-06-12 (and others in the past):
[the] procedure confusion which calls `help1` with an argument ending in the duplication "...who can fix can fix"
Response: This is one of Knuth's jokes: the program is so broken it issues a broken error message.
B133: max_param_stack comment
From Wolfgang Helbig, 2021-07-23:
The comment
{ largest value of param_ptr, will be <= param_size + 9 }
at the declaration of max_param_stack seems misleading to me. I'd suggest instead:
{ largest value of param_ptr }
param_ptr must not exceed param_size, which is ensured in section 390.
Response: That's true about param_ptr. What's misleading is that the second half of the comment, “will be <= param_size + 9”, applies to max_param_stack, not param_ptr. A semicolon instead of a comma would have made that clearer.
More from DRF about this: DEK is commenting on the fact that he had to make the type of max_param_stack be integer rather than 0..param_size+9, which is what it really ought to be—but Pascal doesn't let you use even a constant additive expression in the range definition (and WEB only lets you if it's from a numeric (=) macro so it can collapse the addition, but DEK wanted max_param_stack to be compile-time changeable in the const section, evidently). See all of the other max_* global variables for confirmation; they're all 0..<whatever>_size (and <whatever> is a Pascal const).
He could have detected the overflow before doing the addition, and thus been able to declare max_param_stack:0..param_size and get rid of the comment entirely, but then the statistics report at the end of the TeX run would not have shown how large param_size must be for the job to run.
This is not the only terse comment whose motivation takes much thought, experience, or analysis to figure out.
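DRF's point about the order of operations can be seen in a short Python sketch. It is loosely modelled on the check in section 390 as described above (raise the recorded maximum, then compare it with param_size); the Overflow exception, the function names, and PARAM_SIZE = 60 are illustrative assumptions, not tex.web's code.

```python
class Overflow(Exception):
    """Stands in for TeX's fatal overflow() error."""


PARAM_SIZE = 60          # illustrative capacity, not tex.web's default
MAX_MACRO_PARAMS = 9     # #1..#9


def push_params_record_first(param_ptr, n, stats):
    """Update the statistic, then test it (the order described above).

    param_ptr never exceeds PARAM_SIZE and n <= 9, so the recorded maximum
    can reach PARAM_SIZE + 9 but no further.
    """
    assert 0 <= n <= MAX_MACRO_PARAMS
    if param_ptr + n > stats["max_param_stack"]:
        stats["max_param_stack"] = param_ptr + n
        if stats["max_param_stack"] > PARAM_SIZE:
            raise Overflow("parameter stack size")
    return param_ptr + n


def push_params_check_first(param_ptr, n, stats):
    """Test before recording: the maximum stays in 0..PARAM_SIZE,
    but a failed run no longer shows how much space was needed."""
    assert 0 <= n <= MAX_MACRO_PARAMS
    if param_ptr + n > PARAM_SIZE:
        raise Overflow("parameter stack size")
    stats["max_param_stack"] = max(stats["max_param_stack"], param_ptr + n)
    return param_ptr + n


for push in (push_params_record_first, push_params_check_first):
    stats = {"max_param_stack": 0}
    try:
        push(PARAM_SIZE - 2, 9, stats)    # a 9-parameter macro near the limit
    except Overflow:
        pass
    print(push.__name__, stats["max_param_stack"])
```

The first version reports 67 (PARAM_SIZE + 7), which tells the user how big the stack needed to be; the second reports 0, which keeps the variable within a tidy subrange but hides that information.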
B214: input file name flushed
From Wolfgang Helbig, 2020-10-26:
TeX sometimes flushes the name of an input file, keeping only the base name without directory and extension. This causes an error if the full name of the file needs to be passed to the editor during error recovery, as suggested by Prof. Knuth. The same bug is in Metafont (section 793).
[...] change block[s] from my tex.ch [...]:
@x [tex 537] continued
if name=str_ptr-1 then {we can conserve string pool space now}
  begin flush_string; name:=cur_name;
  end;
@y
@^Editor@>
@z
Response: The assumption was that on any “reasonable” OS, you could easily ask the system for the full canonical file name of the appropriate open alpha_file when you need it, so there's no need for TeX to remember it. Since *nix is not able to do this in general, dealing with this in the changefile as shown seems reasonable. (There are various system-dependent ways to approximate this, but no reliable and portable method is possible.)
On the other hand, filesystems on popular TeX-able OSes of the day (TOPS-20, VAX/VMS, etc.) had both Logical Name and Version Number features, resulting in the need to ask the OS for the full name of the file that actually got opened.
This call to flush_string also causes a non-standard filename extension to be lost when calling the editor. Knuth recommends that implementors avoid this, either via Wolfgang's change that eliminates flushing the string or some other method.
These issues all fall under the rubric of “system wizardry” mentioned in the description of the E option (page B036, section 84).
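TeX's string pool behaves like a stack: flush_string can discard only the most recently created string, which is why the code first tests name=str_ptr-1. A toy Python model (the StringPool class and the file names are hypothetical) shows that once the full name is flushed, name refers to cur_name, which holds only the base name.

```python
class StringPool:
    """Toy model of TeX's string pool: strings are removed only from the end."""

    def __init__(self):
        self.strings = []

    @property
    def str_ptr(self):
        return len(self.strings)      # number of the next string to be made

    def make_string(self, s):
        self.strings.append(s)
        return self.str_ptr - 1       # number of the string just made

    def flush_string(self):
        self.strings.pop()            # discard the most recent string only


pool = StringPool()
cur_name = pool.make_string("paper")                # base name scanned from \input paper
name = pool.make_string("/home/user/paper.tex")     # hypothetical full name of the opened file

if name == pool.str_ptr - 1:          # "we can conserve string pool space now"
    pool.flush_string()
    name = cur_name

print(pool.strings[name])             # paper: directory and extension are gone
```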
B274: line number ranges
From Udo Wermuth, 2017-01-25.
In overfull/underfull box messages, the beginning and ending lines shown might come from different files, and thus be misleading. Suppose we have a file main.tex containing this line:
Main 1\par \input auxone \end
and a file auxone.tex with these two lines:
Aux1 1\par
Aux1 2 bug in underfull message?\break
then running tex main gives:
This is TeX...
(./main.tex (./auxone.tex)
Underfull \hbox (badness 10000) in paragraph at lines 2--1
...
A range of lines that begins after it ends does not make sense. Also, a user will associate both numbers with main.tex, since it is the only file still open: the ) after auxone.tex shows that file has already been closed.
The situation can also occur in alignments and with overfull messages. It is shown in the trip test.
Response: Indeed, and because it is shown in the trip test, we can conclude that DEK was aware that if you started a paragraph in one source file and ended it in another, then line numbers in messages would be problematic. The attitude was that robustness in the face of this edge case (not a recommended best practice) wasn't worth the extra bytes of memory (both code and data), especially since the rest of the context is usually clear from the logging of the actual text of the paragraph.
B442: \spaceskip not logged
From Igor Liferenko, 2022-06-21:
Spaceskip glue with zero stretchability and shrinkability is not marked as such in log when spacefactor is not 1000.
\spaceskip=1pt \setbox0=\hbox{I turn}\showbox0
Output:
.\glue 1.0
Spacefactor does not change the glue (due to zero stretchability and shrinkability), so the output should be:
.\glue(\spaceskip) 1.0
Response: our feeling is that this is not a bug to be passed on, mainly because Knuth never explicitly says, either in The TeXbook or tex.web, exactly when the special glues such as \spaceskip are marked. So the only guide is the code, and it seems intentional that the marking is avoided when the spacefactor != 1000, regardless of whether the spacefactor affects the glue setting.
As suggested, there is certainly a reasonable argument that it would be nicer if \spaceskip was marked when it is the only source for the glue item. But the current behavior doesn't seem wrong to us, since there is no statement being contradicted. Also, the behavior is plausible, namely, only mark \spaceskip if the glue was not modified, even potentially, by the space factor.
Looking at the code (tex.web sections 1041–1044), it seems it would not be simple to change, since right now the code applies space factor modifications without needing to know the context, that is, if the stretch and shrink are actually going to be used (consider \unhbox). Putting extra stuff into this “almost inner loop” code merely for the sake of different logging is presumably not something Knuth would entertain at this point.
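A simplified Python model of the decision in tex.web sections 1041–1044 makes the point concrete. It is a sketch under stated assumptions, not TeX's code: dimensions are integers in scaled points, \xspaceskip is taken to be zero, the font's own interword glue is left out, the space factor is assumed to stay below 2000 (above that TeX also adds the font's extra space to the width), and plain integer scaling stands in for TeX's xn_over_d. In plain TeX the space after “I” sees a space factor of 999, because uppercase letters have \sfcode 999.

```python
from dataclasses import dataclass

UNITY = 65536  # scaled points per point


@dataclass
class Glue:
    label: str    # "(\\spaceskip)" when the node points back to the parameter
    width: int
    stretch: int
    shrink: int


def space_glue(space_factor, spaceskip):
    """Glue appended for a space when \\spaceskip is nonzero (simplified).

    Assumes \\xspaceskip is zero and space_factor < 2000, so no extra space
    is added to the width.
    """
    width, stretch, shrink = spaceskip
    if space_factor == 1000:
        # Normal case: a glue node that refers to \spaceskip itself,
        # shown as \glue(\spaceskip) by \showbox.
        return Glue("(\\spaceskip)", width, stretch, shrink)
    # Otherwise a fresh copy of the spec is scaled by the space factor and
    # appended as ordinary glue, even if scaling changed nothing.
    stretch = stretch * space_factor // 1000
    shrink = shrink * 1000 // space_factor
    return Glue("", width, stretch, shrink)


rigid = (1 * UNITY, 0, 0)   # \spaceskip=1pt, no stretch or shrink
for sf in (1000, 999):
    g = space_glue(sf, rigid)
    print(sf, g.label or "(no label)", g.width, g.stretch, g.shrink)
```

Both calls produce identical dimensions; only the label differs, which is exactly the logging difference in the report. Knowing that the label could be kept would require checking for zero stretch and shrink on this path, which is the extra work described above.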
B506: bogus dimen display/other overflow
From Bruno Le Floch, 2020-10-22.
Slightly incorrect display of 32768pt dimen: The following shows --32768.0pt with a double minus sign.
\dimen0=\maxdimen \advance\dimen0\maxdimen \advance\dimen0 2sp \showthe\dimen0
Response: Any bug is the lack of an error message at \advance\dimen0\maxdimen, but the lack of overflow checking is pervasive in TeX, and is a deliberate choice by Knuth. The resulting display is a case of GIGO.
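The double minus sign can be reproduced with a Python sketch that follows the structure of tex.web's print_scaled, assuming 32-bit two's-complement arithmetic that wraps silently on overflow (as the displayed result implies). \maxdimen is 2^30-1 sp, so the first \advance gives 2^31-2 sp, already beyond \maxdimen but still a valid 32-bit value; adding 2sp wraps to -2^31. print_scaled then prints a minus sign and negates, but -2^31 negates to itself, so the integer part is printed as -32768, contributing a second minus sign.

```python
UNITY = 65536           # 1pt = 2**16 sp
MAXDIMEN = 2**30 - 1    # plain TeX's \maxdimen, in sp


def wrap32(n):
    """Reduce n to a signed 32-bit integer (two's-complement wraparound assumed)."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


def pdiv(a, b):
    """Pascal div: truncate toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def pmod(a, b):
    """Remainder matching pdiv (takes the sign of a)."""
    return a - b * pdiv(a, b)


def print_scaled(s):
    """Sketch following the structure of tex.web's print_scaled."""
    out = ""
    if s < 0:
        out += "-"
        s = wrap32(-s)               # negate(s); -2**31 stays -2**31
    out += str(pdiv(s, UNITY))       # print_int adds its own "-" if still negative
    out += "."
    s = 10 * pmod(s, UNITY) + 5
    delta = 10
    while True:
        if delta > UNITY:
            s = s + 0o100000 - 50000  # round the last digit
        out += str(pdiv(s, UNITY))
        s = 10 * pmod(s, UNITY)
        delta *= 10
        if s <= delta:
            break
    return out


d = MAXDIMEN
d = wrap32(d + MAXDIMEN)   # \advance\dimen0\maxdimen: 2**31 - 2, still fits
d = wrap32(d + 2)          # \advance\dimen0 2sp: 2**31 wraps to -2**31
print(print_scaled(d) + "pt")          # --32768.0pt
print(print_scaled(MAXDIMEN) + "pt")   # 16383.99998pt
```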
Supplement: Overfull \hbox not reported: the lack of complete overflow checking induces strange behavior in other ways. For example, on 2021-05-16, Matteo Caoduro reported that an extremely long line does not generate an overfull box message:
xxx...6298 x's...xxx\end
With 6297 x's, and then starting at 24924 x's, there is an overfull box message, but not in between. TeX's errors and warnings are not intended to handle such pathological situations.
Patches to implement more complete overflow checking would be welcome (Knuth has said this would be ok). The performance hit is unlikely to matter nowadays.
B546: output routine braces
From Bruno Le Floch, 2020-10-22.
The \output routine is surrounded by very peculiar braces, and by removing the closing one with \let\next=, one ends up in a black hole where TeX does not interpret any further token. My question and answer on tex.sx describe the strange behaviour. It is probably not a bug as there is an explicit comment “loops forever if reading from a file”. It would be interesting to have a rationale.
Response: The idea here is that “The error message [I can't handle that very well] told you you've made a mess, and if the error message isn't enough, then the help info warns you that error recovery is not likely, and if that's not enough, then you'd better look at Volume B.” Basically, “I can't handle that very well” includes the possibility “I can get in a loop.”
As for a rationale, it would be more or less impossible to generally recover into any sensible state, and certainly not one that would give any reasonable subsequent output. This is a case that normal users of macro packages and documents would never run into, and would need a deep expert's close study to fix, so it is not worth spending time or (precious, at the time) bytes of code on trying to let the job continue, most likely fruitlessly in the end.
For that matter, it's somewhat surprising that this case doesn't just bail out immediately, as with the few
fatal_error("(interwoven alignment preambles are not allowed)")
cases, or the favorite:
confusion("256 spans"); {this can happen, but won't}
both of which are more likely for a real user to run into directly.
C172: epsilon not x
From Bertram Scharpf, 2022-06-30:
In The METAFONTbook, page C172, line 10 reads:
case text($\epsilon$) is omitted. [...]
but in line 3 ‘text’ is defined by
@for@ $x=\epsilon_1,\epsilon_2,\epsilon_3$: text($x$) @endfor@
Notice the ‘\epsilon’ instead of the ‘x’.
Response: Knuth is using the unsubscripted \epsilon here to mean any of the \epsilon_1, \epsilon_2, \epsilon_3 values given in the for statement. This is indicated by the text in the sentence on line 9 (before the quoted line 10): “The \epsilon's might also be empty, in which [case …]”.
The idea being that if a particular epsilon is empty, then the text() expression for that epsilon is omitted. Saying “text(x) could be omitted” might be misread as saying the entire for loop expands to nothing if any of the epsilons were empty, which is clearly not the case. It might have been clearer to say “A given \epsilon_i might also be empty, in which case text($\epsilon_i$) is omitted …”, but even if Knuth agreed in theory, he is no longer making that kind of micro-improvement to the exposition.
D350: unused variable m declared
From David Fuchs, 2021-01-24.
This line in mf.web should be removed:
@!m:integer; {the current month}
as the local variable k is used, as correctly commented, by open_log_file for indexing into months, and m is now an unused variable.
Response from DEK: In accordance with the wonderful Japanese tradition of wabi-sabi, I won't be changing that.
E037: doubled and missing kern pairs
From Bogusław Jackowski and Janusz Nowacki (2005, reported at EuroTeX 2005, where Knuth was present; article, slide), and Hans Hagen and Mikael Sundqvist (2023), and others: there are some repeated kern pairs in Computer Modern. For example, in roman.mf (E037), ka is defined with both -u# and -.5u#; in mathit.mf, N+slash and X+slash similarly are defined twice.
Response: There's no harm in this, apart from a few bytes of wasted space in the tfm files, and it won't be changed. All engines use the first value, as stated in The Metafontbook, page C317.
There are other infelicities in the CM kerning tables, e.g., the second (smaller) value for ka in cmr10 is arguably better than the first, Av (among many other possible pairs) is not kerned at all, etc. Knuth has stated that no further tweaks will be made to CM metrics. Indeed, they have not changed since the 1980s.
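The first-value-wins behavior amounts to scanning the kerning instructions for a left character in order and stopping at the first one whose right character matches. The Python sketch below is a simplified model, not the TFM lig/kern encoding; kern_between is a hypothetical helper, and the list holds only the two ka kerns quoted above, in units of u.

```python
def kern_between(program, right):
    """Return the kern of the first instruction whose right character matches."""
    for next_char, kern in program:
        if next_char == right:
            return kern     # first match wins; later duplicates are never reached
    return None             # no kern for this pair


# The two kerns for "ka" quoted above, in units of u, in the order given.
k_program = [("a", -1.0), ("a", -0.5)]

print(kern_between(k_program, "a"))   # -1.0
print(kern_between(k_program, "z"))   # None
```

The duplicate entry costs a little table space but can never change the result, which is why it is harmless.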
dvitype: unnecessary loop condition
From Lucas Mirelman, 2021-07-10: In
@<Store character-width indices...@>=
if wp>0 then for k:=width_ptr to wp-1 do
the condition on wp is unnecessary.
Response from DRF: In the declarations:
var k:integer; {index for loops}
...
@!lh:integer; {length of the header data, in four-byte words}
...
@!nw:integer; {number of words in the width table}
...
@!wp:0..max_widths; {new value of |width_ptr| after successful input}
I think k should have been 0..max_widths like wp, and then the suspicious check would make sense. Instead, true to form, DEK saved a word of memory by using k for another loop with the range (0..lh+3) and when someone pointed out lh could be larger than max_widths, it was easier to make k an integer rather than clarify the code a little and use a different index variable for that loop.
For what it's worth, lh and nw should both have had type 0..65535 since they're set by lh:=b2*256+b3 and nw:=b0*256+b1, which are both guaranteed to be in that range. There's another case just a bit later in DVIType, exactly like the one Lucas pointed out. Anyway, if this were 40 years ago, I'd militate for the changes I suggest above; now it seems ok to leave it be.
Further down the rabbit hole: In quite a number of places in TeX and friends, there's code like this that seems to be there only to protect for-loops from having their “to” value be out-of-range for their index variable. I believe that Hedrick and/or Vax/VMS Pascal optionally enforced this when you turned on some runtime checks. But the old Pascal User Manual and Report from back in the day (as well as the more recent ISO/IEC Pascal standard documents) are pretty clear that first you check whether the for-loop is going to happen at all, and only then check that the “from” and “to” values are in range. So, perhaps all the guards scattered about Knuth's code were not supposed to be needed, other than to satisfy a too-fussy compiler?
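The two readings of the Pascal rules can be contrasted with a small Python sketch. pascal_for and its check_always switch are hypothetical names, and MAX_WIDTHS = 10 is an illustrative value. The sketch assumes k has the subrange type 0..max_widths that DRF suggests; then with wp=0 the limit wp-1 is -1, outside k's range. Under standard semantics the empty loop never assigns k, so nothing goes wrong; a compiler that range-checks both limits up front rejects it, which is what a guard like if wp>0 avoids.

```python
class RangeError(Exception):
    pass


def pascal_for(first, last, lo, hi, body, check_always=False):
    """Run `for k := first to last do body(k)` with k declared as lo..hi.

    Standard semantics: if first > last the loop does nothing; otherwise both
    limits must lie in lo..hi.  check_always=True models a fussier compiler
    that validates both limits before deciding whether the loop runs at all.
    """
    in_range = lo <= first <= hi and lo <= last <= hi
    if check_always and not in_range:
        raise RangeError(f"loop limits {first}..{last} outside {lo}..{hi}")
    if first > last:
        return                       # empty loop: k is never assigned
    if not in_range:
        raise RangeError(f"loop limits {first}..{last} outside {lo}..{hi}")
    for k in range(first, last + 1):
        body(k)


MAX_WIDTHS = 10          # illustrative value
width_ptr, wp = 0, 0     # no new widths were read

visited = []
pascal_for(width_ptr, wp - 1, 0, MAX_WIDTHS, visited.append)   # empty loop, fine
try:
    pascal_for(width_ptr, wp - 1, 0, MAX_WIDTHS, visited.append, check_always=True)
except RangeError as err:
    print("fussy compiler:", err)
```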
gftopk: unused text_file declaration
From Igor Liferenko, 2022-04-13: The following code in gftopk.web serves no purpose:
@<Types...@>=
@!text_file=packed file of text_char;
Response: Certainly true, but Knuth has consistently declined to remove unused declarations in the past, so there's no use in passing this on. Other examples:
No doubt there are others.