[XeTeX] Handling of the ^^-input
Jonathan Kew
jonathan at jfkew.plus.com
Wed Oct 8 18:22:53 CEST 2008
On 8 Oct 2008, at 8:45 PM, Ulrike Fischer wrote:
> After some thought I at least found this example (an ansinew file):
>
> \XeTeXinputencoding "cp1252"
> \documentclass[11pt,a4paper]{book}
> \begin{document}
> \catcode`\€=\active
> \def€{EuroSign}
>
> \catcode`\^^80=\active
> \def^^80{Roof notation}
> € ^^80
>
> \end{document}
>
> When run with LaTeX (without the \XeTeXinputencoding line) the
> definition of ^^80 overwrites the definition of €. With xetex, € and
> ^^80 give different results. € is no longer accessible through the ^^80
> notation with xetex (in 8-bit files); you must use ^^^^20ac instead.
Right. \XeTeXinputencoding "cp1252" will cause the literal bytes of
input to be mapped to Unicode codepoints through the given codepage,
so the € character (0x80 in cp1252, I guess) will be mapped to U+20AC.
This happens at the very first level of input.
*Then* the ^^ notation will be handled (this depends on TeX \catcodes,
of course), and sequences of the form ^^hh are replaced by the
character code given. So ^^80 becomes the character U+0080 (which is a
control code in Unicode, not something you usually want to use). This
is unaffected by the \XeTeXinputencoding.
So to input the Unicode character for the Euro sign using ^^ notation
in XeTeX, you need ^^^^20ac, regardless of the \XeTeXinputencoding.
And the € character in input will be mapped to U+20AC if (and only if)
you set the appropriate \XeTeXinputencoding.
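A rough way to see the ordering is the Python model below. It is only a sketch of the behaviour described above, not XeTeX's code. The function names are made up for illustration. It handles just the ^^hh and ^^^^hhhh forms (lowercase hex digits only) and assumes ^ has its usual catcode 7.

```python
import re

# Hypothetical, simplified model of XeTeX's two input stages (not XeTeX source).
# Only ^^^^hhhh and ^^hh with lowercase hex digits are handled; real TeX also
# has the ^^X form, and whether ^^ is recognised at all depends on \catcode.
_HAT_RE = re.compile(r"\^\^\^\^([0-9a-f]{4})|\^\^([0-9a-f]{2})")


def decode_input(raw: bytes, encoding: str) -> str:
    """Stage 1: map raw input bytes to Unicode code points via the codepage."""
    return raw.decode(encoding)


def expand_hats(text: str) -> str:
    """Stage 2: replace ^^^^hhhh and ^^hh with the code point they name."""
    def repl(m: re.Match) -> str:
        return chr(int(m.group(1) or m.group(2), 16))
    return _HAT_RE.sub(repl, text)


def read_line(raw: bytes, encoding: str) -> str:
    # The codepage mapping happens first; ^^ expansion only sees its result,
    # so the chosen encoding never changes what ^^80 or ^^^^20ac mean.
    return expand_hats(decode_input(raw, encoding))


if __name__ == "__main__":
    line = b"\x80 ^^80 ^^^^20ac"  # literal cp1252 Euro byte, then two ^^ forms
    result = read_line(line, "cp1252")
    print([hex(ord(c)) for c in result if c != " "])
    # prints ['0x20ac', '0x80', '0x20ac']
```

The literal byte becomes U+20AC only because of the cp1252 step, while ^^80 always lands on the control code U+0080 and ^^^^20ac always gives the Euro sign.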
To make xetex behave as much like pdftex as possible, you can use
\XeTeXinputencoding "bytes". This gives a "straight-through" mapping
where the byte codes 0..255 become the Unicode codepoints
U+0000..U+00FF. This would let you read arbitrary byte data and get the same
numeric character codes as with pdftex. Just remember that it won't be
valid Unicode! (In general, I wouldn't recommend doing this: if you
want to work with text and fonts that use 8-bit, non-Unicode
encodings, don't bother with xetex at all.)
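The "bytes" mapping is the same identity mapping that Python's latin-1 codec performs, so it can be modelled in a few lines. This is a hypothetical sketch, not XeTeX code. It assumes an 8-bit engine such as pdftex simply uses the byte values as character codes (no TCX remapping).

```python
# Hypothetical model of \XeTeXinputencoding "bytes": byte n -> code point U+00nn.
# Python's latin-1 codec performs exactly this identity mapping.
def bytes_mapping(raw: bytes) -> list[int]:
    """Character codes XeTeX would see under "bytes" (modelled)."""
    return [ord(c) for c in raw.decode("latin-1")]


def pdftex_codes(raw: bytes) -> list[int]:
    """Assumed 8-bit behaviour: character codes are the raw byte values."""
    return list(raw)


if __name__ == "__main__":
    data = bytes(range(256))
    # Same numeric codes for every possible byte, as described above.
    assert bytes_mapping(data) == pdftex_codes(data)
    # But the cp1252 Euro byte 0x80 now means U+0080 (a C1 control code),
    # not U+20AC EURO SIGN: the numbers match pdftex, the Unicode meaning does not.
    euro = b"\x80"
    print(hex(bytes_mapping(euro)[0]), hex(ord(euro.decode("cp1252"))))
    # prints 0x80 0x20ac
```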
JK