[XeTeX] string manipulation macros

Sun Oct 19 23:16:10 CEST 2008

 >> I don't understand. Why don't you give different character classes 
for Hangul, Kana and Kanji (Hanja)?

Mostly because as a writer, I hate magic numbers. I want to be able to 
use names that I can read, instead of numbers that are temporarily 
mapped to something. If I can define something by calling 
\setunicodeblock{Cuneiform}{palatino linotype} without having to first 
come up with a number for it, that makes me infinitely happier than if I 
have to do more programming than that in order to write the content I 
need to.

 >> You don't seem to know the fine controls of inter-character spaces 
Japanese typesetting requires,
 >>  See, e.g. JIS X 4051. Anyway, you can use convenience macros 
defined in zhspacing package.

This is true, but not because I overlooked something here: those 
subtleties are essentially irrelevant for my purposes.

Let me explain: my principal goal is to get the Fontwrap package to work 
without having to rely on perl, and without having to first spend a 
whole year learning the intricacies of TeX in my spare time before being 
able to implement something that should be fairly simple given the type 
of programming language it is (if it had string manipulation built in). 
The availability of a small number of trivially implemented functions 
can solve the problem too, and in the process make life much easier for 
people who may want to develop their own packages without having to 
master TeX first (it is a very hard language to get your head around).

To further illustrate: what Fontwrap does at the moment is stick the 
appropriate fontspec \font{fontname} macro in between characters if one 
character is from a unicode block that should be using font A, and the 
next is from a unicode block that should be using font B. Of course, 
this can trivially be made more generic by adding in any block of macro 
code instead of just \font macros. The functionality here is "placing a 
particular macro when characters belonging to a particular set are 
found in the text". The sets don't have to be unicode blocks 
either, but could be custom sets, with for instance "a, l, e, s" 
belonging to set 1, and set 2 consisting of the letter "p". The word 
"apples" would then have three 'start of set' locations: ".a.pp.les".
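
To make the "start of set" idea concrete, here is a minimal sketch in 
Python rather than TeX. It is not how Fontwrap is implemented; the 
function name mark_set_starts and its arguments are hypothetical, and it 
assumes that a character belonging to no set leaves the current set 
unchanged (so it never triggers an insertion by itself).

```python
def mark_set_starts(text, sets, markers):
    """Insert markers[name] before each character that starts a run of set `name`.

    sets    -- dict: set name -> string of characters in that set (hypothetical)
    markers -- dict: set name -> text to insert when that set starts
    """
    # Reverse lookup: character -> name of the set it belongs to.
    lookup = {ch: name for name, chars in sets.items() for ch in chars}

    out = []
    current = None  # name of the set the previous assigned character was in
    for ch in text:
        name = lookup.get(ch)
        # Assumption: characters in no set do not change the current set.
        if name is not None and name != current:
            out.append(markers[name])
            current = name
        out.append(ch)
    return "".join(out)


# The example from the text: "a, l, e, s" in set 1, "p" in set 2.
sets = {"set1": "ales", "set2": "p"}
dots = {"set1": ".", "set2": "."}
print(mark_set_starts("apples", sets, dots))  # prints ".a.pp.les"
```

Replacing the "." markers with macro text (for example a font switch 
per set) gives the behaviour described above; the marker strings are 
arbitrary, which is what makes the mechanism generic.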

My own purpose has nothing to do with spacing, kerning, box models, or 
anything beyond "inserting a bit of TeX that is already valid anyway 
when a new set starts".

I looked at zhspacing after your recommendation, but from what I can 
tell, it is for styling and placement of a particular language. This 
feature is not for a specific language, it's really just for "arbitrary 
sets of characters". Please see my request from that perspective: it 
does not pertain to anything specific to one or more languages, or even 
specifically to unicode blocks (although that is of course my immediate 
concern).

- Mike