Re: Unicode and math symbols

To: C.A.Rowley@open.ac.uk
Subject: Re: Unicode and math symbols
From: "Berthold K.P. Horn" <bkph@ai.mit.edu>
Date: Fri, 28 Feb 1997 13:31:03 -0500 (EST)
CC: mduerst@ifi.unizh.ch, BNB@math.ams.org, tex-font@math.utah.edu
In-reply-to: <199702281745.RAA21644@fell.open.ac.uk> (message from ChrisRowley on Fri, 28 Feb 1997 17:45:40 GMT)
Reply-to: bkph@ai.mit.edu

Chris wrote:

   Berthold wrote --

   >    > (1) Which is why we have the `alphabetic presentation forms' 
   >    > ff, ffi, ffl, fi, fl, slongt, st etc. in UNICODE.
   > 
   >    They are in the compatibility section. 
   > 
   > Well, they were put in *somewhere* because they are needed,

   No, they were put in for compatibility and their use is not advised.

But nobody is heeding that advice!  Applications in Windows NT *are*
using them.  And I would not be surprised if they were put in after
arm twisting from the `Seattle Satans' as Sebastian refers to them.

And you can see why: (just about) the only way to get at glyphs in
fonts in these systems is either (1) by UNICODE or (2) by numeric
sequence number of arrangement of glyphs in the font, which of course
is quite random, although fixed for a given version of a given
font. In addition in some font technologies there need be tables with
mnemonic names (such as AFII numbers :-).  So what is an application
developer to do?  (The `just about' above refers to the exception
that you can make a `symbol' font which is treated as an incomprehensible
thing that the operating system does not mess with.  You can put your
glyphs in any order you like and access them by numeric code).

   Also "etc" are not there: these are the only Latin "aesthetic
   ligatures" that are there, eg there is no ck ligature.

Yes, I know, and fj and a few others one might like to see there.  It took
them a long time to even add ff, ffi, ffl to the basic fi and fl.
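
As a rough illustration of which Latin ligatures are encoded, the sketch
below lists the code points U+FB00 through U+FB06 with their names and
compatibility decompositions. It assumes Python's standard `unicodedata`
module, so it reflects whatever Unicode version that Python build ships
with; the helper names `describe` and `is_encoded` are made up for this
example.

```python
import unicodedata

# Latin ligatures in the Alphabetic Presentation Forms block (U+FB00..U+FB06).
LATIN_LIGATURES = [chr(cp) for cp in range(0xFB00, 0xFB07)]


def describe(ch):
    """Return (code point, character name, compatibility decomposition)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.normalize("NFKD", ch),
    )


def is_encoded(name):
    """True if the Unicode database has a character with this exact name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False


for ch in LATIN_LIGATURES:
    print(*describe(ch))

# Hypothetical names, used only to probe for ligatures that are not encoded.
print("ck ligature encoded:", is_encoded("LATIN SMALL LIGATURE CK"))
print("fj ligature encoded:", is_encoded("LATIN SMALL LIGATURE FJ"))
```

The compatibility decomposition is what marks these characters as
presentation forms: each one normalizes back to an ordinary letter sequence.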

   > since (i) we do not have a usable and widely accepted glyph
   > standard, and (ii) because most software wants to be able to have
   > *some* way of telling what glyph in a font is to be used.

   These do seem to be the two problems driving this issue.

   But not just "some way": it seems that the only object some software can
   use is a fixed-length number; and *only one* correspondence from
   these numbers to ... what? ... to glyphs (in all fonts, in some fonts, or
   what??) or to characters (ie units of information) or to both (ie
   a one-one relationship between glyphs and characters?).

Well, there are many borderline cases we can argue about.  But let's just
take `greater'.  I know what that means and you do.  And we'll in most
cases recognize which glyph if any in a font is supposed to represent it.
It is very convenient that in most encodings it is at char code 62
in all fonts, bold, regular, oblique, blackletter, Script, what have you.  
The more of this uniformity we have the better.
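
A small sketch of the "char code 62" point, assuming Python's built-in
codecs stand in for the encodings meant here. The codec selection is
illustrative only, and `code_of` is a made-up helper; an EBCDIC code page
is included to show why the claim is "most encodings" rather than "all".

```python
# Single-byte encodings that share the ASCII range (Python codec names).
ASCII_COMPATIBLE = ["ascii", "latin-1", "cp1252", "mac_roman", "cp437", "koi8_r"]

# An EBCDIC code page, included as a counterexample.
EBCDIC = "cp037"


def code_of(ch, codec):
    """Return the byte value that a single-byte codec assigns to ch."""
    return ch.encode(codec)[0]


for codec in ASCII_COMPATIBLE + [EBCDIC]:
    print(f"{codec:10s} greater -> {code_of('>', codec)}")
```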

   >  But do I really need - in English - to make a distinction
   > between the characters A-Z and the glyphs A-Z?  Or, beyond that,
   > most of the glyphs in the ISO Latin X tables (if we ignore the
   > mistakes and bad choices made).

   I have little qualification to answer this but it may well be that you
   do not need to make a big thing of the difference in these cases.

   But the point of Unicode is to remove such cultural dominance of modern
   European languages on IT.

And it fails in that.  It *does* succeed in assigning unique codes to
characters from many languages.  But it fails in dealing with the
different writing systems which have all sorts of features not found
in Western languages.  Look for example at the rapidly growing
`alphabetic presentation forms' put in to try and cope with a bit of this.

   > But anyway, meantime we need to make life easier!  And despite all the
   > explanations and arguments I don't see a whole lot wrong with using
   > UNICODE as essentially a glyph standard for Latin X, Cyrillic, Greek,
   > and yes, most math symbols, relations, operators, delimiters etc.

...

   > Except that unfortunately they don't cover enough of math to be 
   > really useful...

   And never will, according to some definitions of useful; but it will
   also not cover my ck ligature, or all the lovely twirly things in
   Poetica and similar fonts.

   So set up a 16-bit glyph encoding if you wish, but do not try to
   change Unicode because you want it to parallel your glyph encoding.
   Leave Unicode itself to do what it is intended for.

That is not an option.  On systems with system level font support
(i.e. not Unix) you get a lot of power from what the system provides
and at the same time inherit any limitations `they' put in.  So in
the case of Windows NT for example (and perhaps AT&T Plan 9) the OS
supports UNICODE and applications can use it to display and print.
BUT: what they consider not to be in UNICODE is not accessible.

So for example when I convert Lucida Bright from Type 1 to TrueType
format using the built-in converter, I can get at all the glyphs
except the ones not in their table, such as dotlessj.  Ditto for ATM
(now in beta test).  So the issue of what *is* in UNICODE *is* important.

Aside from that we don't want a hundred different versions of UNICODE++
(like the mess we have with \special).  I have put dotlessj at FB0F,
but who knows where you will put it?
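
To see how a given Unicode database treats these code points, one can query
it directly. This is only a sketch, assuming Python's `unicodedata` module:
the answers reflect the Unicode version bundled with that Python build, not
the state of the standard when this message was written, and `status` is a
made-up helper name.

```python
import unicodedata


def status(cp):
    """Return (name or placeholder, general category) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch, "<no name>"), unicodedata.category(ch)


# The slot chosen privately for dotlessj in the message above.
print("U+FB0F:", status(0xFB0F))

# Where the bundled database itself places a dotless j.
dotless_j = unicodedata.lookup("LATIN SMALL LETTER DOTLESS J")
print(f"dotless j: U+{ord(dotless_j):04X}")
```

A category of `Cn` means the code point is unassigned in that database, so
any glyph placed there is a private convention between font and application.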

   chris

Berthold.

Follow-Ups:
- Re: Unicode and math symbols
  - From: Chris Rowley <C.A.Rowley@open.ac.uk>
- Re: Unicode and math symbols
  - From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>

References:
- Re: Unicode and math symbols
  - From: Chris Rowley <C.A.Rowley@open.ac.uk>