[XeTeX] string manipulation macros
Michiel Kamermans
pomax at nihongoresources.com
Sun Oct 19 23:16:10 CEST 2008
>> I don't understand. Why don't you give different character classes
>> for Hangul, Kana and Kanji (Hanja)?
Mostly because as a writer, I hate magic numbers. I want to be able to
use names that I can read, instead of numbers that are temporarily
mapped to something. If I can define something by calling
\setunicodeblock{Cuneiform}{palatino linotype} without having to first
come up with a number for it, that makes me infinitely happier than if I
have to do more programming than that in order to write the content I
need to.
>> You don't seem to know the fine controls of inter-character spaces
>> Japanese typesetting requires.
>> See, e.g., JIS X 4051. Anyway, you can use the convenience macros
>> defined in the zhspacing package.
This is true, but not because I overlooked something here: those
subtleties are essentially irrelevant for my purposes.
Let me explain: my principal goal is to get the Fontwrap package to work
without having to rely on Perl, and without having to first spend a
whole year learning the intricacies of TeX in my spare time before being
able to implement something that should be fairly simple given the type
of programming language it is (if it had string manipulation built in).
The availability of a small number of trivially implemented functions
can solve the problem too, and in the process make life much easier for
people who may want to develop their own packages without having to
master TeX first (it is a very hard language to get your head around).
To further illustrate: what Fontwrap does at the moment is stick the
appropriate fontspec \font{fontname} macro in between characters if one
character is from a unicode block that should be using font A, and the
next is from a unicode block that should be using font B. Of course, this
can trivially be made more generic by inserting any block of macro code
instead of just \font macros. The functionality here is
"placing a particular macro when characters belonging to a particular
set are found in the text". The sets don't have to be unicode blocks
either, but could be custom sets, with for instance "a, l, e, s"
belonging to set 1, and set 2 consisting of the letter "p". The word
"apples" would then have three 'start of set' locations: ".a.pp.les".
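To make the set idea concrete, here is a minimal sketch in Python (not
TeX, and not how Fontwrap itself is implemented). The function name
insert_at_set_starts, the set names, and the "." marker are all made up
for illustration, and it assumes each character belongs to at most one
set.

```python
def insert_at_set_starts(text, sets, marker="."):
    """Return `text` with a marker placed wherever a new set starts.

    `sets` maps a (hypothetical) set name to the characters it contains.
    `marker` is either a fixed string or a function taking the set name,
    e.g. one that returns a font-switching macro for that set.
    """
    out = []
    current = None  # set that the previous character belonged to
    for ch in text:
        # First set containing this character, or None if it is in no set.
        owner = next((name for name, chars in sets.items() if ch in chars), None)
        # A marker is emitted only when we enter a set different from the
        # one the previous character was in.
        if owner is not None and owner != current:
            out.append(marker(owner) if callable(marker) else marker)
        current = owner
        out.append(ch)
    return "".join(out)


# The example from the text: "a, l, e, s" in set 1, "p" in set 2.
example_sets = {"set1": set("ales"), "set2": set("p")}
print(insert_at_set_starts("apples", example_sets))  # .a.pp.les
```

Passing a function as the marker, for instance one that returns a
different macro string per set name, gives the more generic "insert any
block of macro code" behaviour with the same loop.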
My own purpose has nothing to do with spacing, kerning, box models, or
anything beyond "inserting a bit of TeX that would already be valid on
its own whenever a new set starts".
I looked at zhspacing after your recommendation, but from what I can
tell, it is for styling and placement of a particular language. This
feature is not for a specific language, it's really just for "arbitrary
sets of characters". Please see my request from that perspective: it
does not pertain to anything specific to one or more languages, or even
specifically to unicode blocks (although those are of course my
immediate concern).
- Mike