
Re: Unicode and math symbols

Martin wrote --

> The "rubish" parts are usually due to backwards compatibility issues.
> I.e. there is some national standard or some industry or company
> encoding that contains these things.

Which may well also explain why the set of math symbols is so bizarre.

> In general, I agree with Chris that for systematic form changes
> in the alphabet, additional information (such as font information
> on a lower level, or structural information on a higher level)
> should be used. On the other hand, if there is a well-used
> Math symbol that isn't in Unicode, I would suggest making
> a formal proposal for putting it in, with all the necessary
> data.

What makes it well-used? If you look at something like formal methods
or logic, you find all sorts of symbols, and their number increases
rapidly: are these "well-used"? Are they "maths"?

The general problem is that mathematical notation is by its nature
not standardised, in either form or meaning.

> One thing not really clear in the Math area is the distinction
> between semantics and abstract form.

And also the relationship between them.

Please do not let Unicode become caught up in the problem of
expressing the semantics of mathematical notation.

The only semantics that Unicode gives to 0061 is a standard name and
the fact that most people expect something with that name to look like
"a" or "the rounder form used in some fonts"; it does not say that
"when used in English as the only letter in a word it is the definite
article", nor should it.

So please leave the meaning of math symbols (which is also highly
context-dependent) to the mathematical reader.

Another reason for keeping such discussions out of the Unicode area
right now is that a lot of effort is going into deciding what can and
should be standardised at the DTD level (in particular HTML-math).

I think that this fits in with bb's comment that there are standard
SGML public-entities for math notation and with the way users are used
to encoding maths: at least for the moment it should be the only place
where we try to standardise any sort of semantics.

One reason for this is that the natural structure of even quite simple
typeset maths is visually much more complex than the Unicode model
(for Latin-based systems) of "base+diacritics" and it is not closely
related to the more complex visual structure of other writing systems.
It may well be possible to extend Unicode to cope with this, but again
this would only cover some math notation, never all of it, so it does
not seem to me to be a worthwhile activity, at least not right now.

> For example, should there
> be one codepoint for "set difference", and this could look
> like "-" or like "\" or whatever, depending on the font (and
> maybe other setting),

No, that is not a font-dependent thing (at least, I would shoot any
editor who decided it was :-).

I can also easily find places where both would be needed (for very
similar operations that had to be distinguished in a certain context),
or indeed three or more similar things (at some stage one would stop
using different symbols and instead use, in TeX notation,
\mathbin{\setminus_n}) ...in other words, I think that this is the
wrong question.
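For what it is worth, the backslash and the set-minus operator are in
fact separate characters with separate canonical names, so the choice
between them is an encoding decision, not a font decision. A quick
illustration (using Python's unicodedata module purely as a way to
query the character database; it is not part of this discussion):

```python
import unicodedata

# REVERSE SOLIDUS (the ASCII backslash), SET MINUS, and MINUS SIGN
# are three distinct codepoints with three distinct canonical names.
for ch in ["\u005C", "\u2216", "\u2212"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```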

> while there is another "-" for subtraction,
> one for hyphen, and so on, or should there be one and the same
> "-" for various purposes, and one and the same "\" for various
> purposes, and the slight differences in shape, size, and placement
> be dealt with depending on circumstances (e.g. Math or not).

This is a much better question: one reason is that I suspect
that "minus" needs to be used in places that really do not need to
be labelled "language=math" (but some would argue with this).

And if something that should look like a minus sign is used outside
math then it would be a great blessing if the right symbol was used.
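The character repertoire does already distinguish several members of
this family, which is what makes "use the right symbol outside math"
possible at all. A small illustration (again using Python's
unicodedata module as a convenient way to look up canonical names):

```python
import unicodedata

# Hyphen, hyphen-minus, minus sign, and the dashes are all distinct
# codepoints, so "something that should look like a minus sign" can
# be encoded as MINUS SIGN rather than the ambiguous HYPHEN-MINUS.
for ch in ["\u002D", "\u2010", "\u2212", "\u2013", "\u2014"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```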

Within maths, based on my opinion above, I think that the canonical
form should be an SGML entity called "minus" (or possibly two: one
called "unary-minus" and one called "binary-minus", but not one called
"set-minus"; and here we get dangerously close to the question: how
much of the mathematical semantics should/can be encoded at any
level).  This would not preclude a unicode character called "minus"
appearing within math, but it means that when it appears there it is a
short-form for an entity (in the abstract model, I am not saying that
SGML's contorted syntax will allow this).
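A sketch of what such declarations might look like (the entity names
follow the ones suggested above; the replacement texts are my own
hypothetical choice, not taken from any published entity set):

```sgml
<!-- Hypothetical math entity declarations: the canonical form is the
     named entity; the replacement text is one possible character-level
     short-form for it. -->
<!ENTITY unary-minus  "&#x2212;">
<!ENTITY binary-minus "&#x2212;">
```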

> I guess we certainly need some amount of both (the latter e.g.
> to distinguish between a hyphen and a dash), but in general,
> on the level of character encoding, it's easier for most
> people to deal with "one shape, one code", and so that will
> probably prevail in the long run.

How much longer must a dash become before its shape changes?:-)
Yes, I agree with you: let pragmatism rule; which is why I asked some
time ago (maybe I missed the answer?):

  What are the practical benefits of having some set of mathematical
  symbols in Unicode?  Is it the canonical name that is important?
  Or assigning a standard code-value to that name, or both?