[XeTeX] LaTeX Font Warning: Encoding `OT1' has changed to `U' for symbol font
Jonathan Kew
jonathan_kew at sil.org
Sat May 13 10:21:32 CEST 2006
On 13 May 2006, at 8:49 am, Will Robertson wrote:
> I haven't pushed for the creation of our very own font encoding for
> XeTeX+fontspec for the reason that "font encoding" doesn't mean the
> same thing when you're using Unicode fonts. In LaTeX proper, a font
> encoding is supposed to mean (although it doesn't always, to their
> chagrin) that the font contains *exactly* some set of glyphs. Then,
> if a character is requested that cannot be typeset with this font,
> corrective measures are taken.
>
> But consider what this means: every non-ascii character is active and
> assigned a "LaTeX Internal Character Representation" (LICR).
> E.g. "é" -> \eacute
> -> slot XXX in font with encoding YYY. With XeTeX, we know we're
> always using unicode fonts. So this step is largely unnecessary. "é"
> in the source will be typeset directly by XeTeX.
>
> In order to fulfil the LaTeX paradigm, we'd need to set up a mapping
> for every character in unicode to a control sequence, back to a glyph
> in a unicode font. We'd then need to start representing fonts by
> which subset of unicode they contain, and this will never be fully
> consistent across fonts.
Right.... we really don't want to go down that road (IMO). Bear in
mind that there are around 100,000 characters defined in Unicode, and
growing.... it just doesn't make sense to try to have an Internal
Character Representation for each, separate from the character code
itself.
The whole font encoding system arose because in the byte-oriented
world, it was necessary to have fonts that supported many different
collections of glyphs (as no single 256-glyph collection could
include everything that people wanted to typeset). So this meant that
a given byte (character) code meant different things, depending on
the font associated with it; and that the code required to access a
given glyph depended on which font you were using. So \eacute might need
to be output as code 140 in one font, code 200 in another, and
composed using \accent in a third.
Moving to Unicode as the single character encoding standard means
that \eacute is always U+00E9, and this entire font-dependent mapping
layer can go away.
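To make the contrast concrete, here is a small illustrative sketch (in Python, nothing XeTeX-specific; the encoding names are standard codecs) showing how one character's byte code varies across legacy 8-bit encodings while its Unicode code point never changes:

```python
# Illustrative only: the same character "é" has a different byte value
# under each legacy 8-bit encoding, but a single fixed Unicode code point.
ch = "\u00e9"  # é

# Legacy encodings: one character, different byte codes.
print(ch.encode("latin-1"))    # b'\xe9'  (ISO 8859-1)
print(ch.encode("mac_roman"))  # b'\x8e'  (MacRoman)
print(ch.encode("cp437"))      # b'\x82'  (DOS code page 437)

# Unicode: one character, one code point, regardless of font or platform.
print(hex(ord(ch)))            # 0xe9, i.e. U+00E9
```

This is exactly the font-dependent mapping layer that a single universal character encoding makes unnecessary.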
Actually, in one sense it doesn't go away, but it moves from within
TeX to inside the font. Glyphs in fonts are *really* rendered via
glyph IDs (TrueType) or glyph names (Type 1) -- but the mapping of Unicode
character codes (universal) to glyphs (font-specific) is defined
within the font itself. So the text processing software doesn't need
to be concerned with it; we simply pass Unicode character codes to
the font renderer.
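The idea of a character-to-glyph table living inside the font can be sketched like this (a toy model in Python; the glyph IDs and coverage are hypothetical, and real fonts store this as the TrueType 'cmap' table):

```python
# Toy model of a font's internal character-to-glyph mapping.
# Glyph IDs here are made up for illustration; a real font's cmap
# table maps Unicode code points to font-specific glyph IDs.
TOY_CMAP = {
    0x0041: 36,   # 'A'  -> glyph ID 36 (hypothetical)
    0x00E9: 201,  # 'é'  -> glyph ID 201 (hypothetical)
}

def glyph_for(codepoint):
    """Return the font-specific glyph ID for a code point,
    or None if the font does not cover that character."""
    return TOY_CMAP.get(codepoint)
```

The text-processing side only ever deals in Unicode code points; the font-specific half of the lookup stays inside the font.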
> In the scheme above, you could perform error checking in the stage
> from going from command name to unicode glyph, but it would be
> terribly inefficient (consider that Jonathan Kew hasn't wanted to
> implement this *in the source*. This would be much slower.)
Actually, I'm considering a change here, to support generating
warnings for character codes that are not supported by the font in
use. This would be under the control of \tracinglostchars, just like
TeX's warnings for legacy TFM-based fonts. (Though as most of the
current font encodings fill all 256 slots, people probably aren't
used to seeing those messages very often. IIRC, they're only written
to the log by default, not to the console, so they're rather easy to
overlook anyway.)
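The warning behaviour described above could be sketched as follows (a hedged illustration in Python, not XeTeX's actual implementation; \tracinglostchars is the real TeX parameter, but the coverage set and message text here are assumptions):

```python
# Illustrative sketch of a "missing character" check gated by a
# tracinglostchars-style parameter. The font coverage set is hypothetical.
FONT_COVERAGE = {0x0041, 0x00E9}  # pretend the font has only 'A' and 'é'

def check_chars(text, tracinglostchars=1):
    """Return warning strings for characters the font cannot render,
    but only when tracing is enabled (tracinglostchars > 0)."""
    warnings = []
    if tracinglostchars <= 0:
        return warnings
    for ch in text:
        if ord(ch) not in FONT_COVERAGE:
            warnings.append(
                f"Missing character: U+{ord(ch):04X} ({ch!r}) "
                f"not in the current font")
    return warnings
```

As with TeX's TFM-based warnings, such messages would typically go to the log rather than the console, so they are easy to miss unless you look for them.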
JK