Line lengths with polyglossia in Cantonese and Japanese?

Ken Moffat zarniwhoop at ntlworld.com
Fri May 31 01:54:11 CEST 2024


Background - for years I've intermittently been documenting various
TTF and OTF fonts for use on Linux: how they look, which codepoints
they contain, and from that which modern languages are covered -
showing at a minimum the alphabet and UDHR Article 1 for languages
using Latin, Cyrillic and Greek 'L/C/G' alphabets, and just Article
1 for others such as CJK languages.

To list all the possible codepoints I care about for the various
languages I use XeLaTeX.  I started trying to understand the
variations of CJK (Han unification) a little while ago, creating a
test page to look at glyph shapes known to vary, showing several CJK
fonts targeted at different languages.  Adding one codepoint at a
time from yet another font which had patchy coverage convinced me
that XeLaTeX, not LuaLaTeX, was the way to go (much quicker, and it
did not fail when a codepoint was missing from the font).

Meanwhile, I was updating details for L/C/G Sans languages and got
increasingly annoyed at lines overflowing. When I started a few years
ago, it seemed to me that most L/C/G fonts did not severely overflow
in my en_GB locale, and I had "other fish to fry".

While searching, I initially dismissed polyglossia because it lacked
support for the most awkward overflowing languages (Maltese, and some
obscure Cyrillic languages which I show because they use uncommon
letters). For those I took an alternative approach.

But eventually I got to a font with Armenian and Georgian.
Reviewing polyglossia, I discovered it covered those, and trying it
gave me line lengths which looked OK.  From there I've now reworked
the L/C/G fonts using polyglossia for all supported languages: most
fonts are fine, while a few still overflow the margins to varying
degrees and I adjust them as best I can.

[Side note: some fonts lack the script tags polyglossia needs (via
fontconfig) to process them: 'cyrl', 'grek' and, for CJK, 'hani'
(the latter applies to WenQuanYi Zen Hei, which is a preferred font
in fontconfig's 65-nonlatin.conf).]
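To check whether a particular font advertises one of those script
tags, a small XeLaTeX test along these lines may help. It is only a
sketch: it uses fontspec's expl3 test \fontspec_if_script:nTF on the
currently selected font rather than querying fontconfig, the
\CheckScript command name is made up for the example, and it assumes
WenQuanYi Zen Hei is installed under that family name.

```latex
% Sketch: report which OpenType script tags the current font declares.
% Compile with: xelatex check-scripts.tex
% Assumption: "WenQuanYi Zen Hei" is the installed family name.
\documentclass{article}
\usepackage{fontspec}
\ExplSyntaxOn
% Hypothetical helper: prints "<tag>: yes" or "<tag>: no" for the
% currently selected font.
\NewDocumentCommand \CheckScript { m }
  { \fontspec_if_script:nTF {#1} { #1:~yes } { #1:~no } }
\ExplSyntaxOff
\begin{document}
\fontspec{WenQuanYi Zen Hei}
\CheckScript{hani}\par
\CheckScript{cyrl}\par
\CheckScript{grek}\par
\end{document}
```

A font with no OpenType layout tables at all should report "no" for
every tag.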

For those still reading: I'm now starting to look at the NotoSerif
CJK fonts.  So far I've played with the 'sc' (Simplified Chinese)
and 'jp' (Japanese) versions.  For each font I show Article 1 in all
the CJK languages, so the languages a font does not target will
probably appear in a style that "looks wrong". I am using:

 {chinese}[variant=traditional] for Cantonese and Traditional Chinese

 {chinese}[variant=simplified] for Simplified Chinese

 {japanese} for Japanese
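For anyone wanting to reproduce this, a minimal test file along these
lines is sketched below. It assumes the fonts are installed under the
family names "Noto Serif CJK SC" and "Noto Serif CJK JP", uses the
'sc' font for both Chinese variants, and uses the standard UDHR
Article 1 translations rather than a copy of my actual test page.
Cantonese uses the same {chinese}[variant=traditional] setting, so it
is not repeated.

```latex
% Sketch of the polyglossia setup described above.
% Compile with: xelatex cjk-test.tex
% Assumption: the font family names below match the local install.
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage[variant=british]{english}
\setotherlanguages{chinese,japanese}
% polyglossia switches to \<language>font inside each language environment
\newfontfamily\chinesefont{Noto Serif CJK SC}
\newfontfamily\japanesefont{Noto Serif CJK JP}
\begin{document}

% Traditional Chinese (also used for Cantonese)
\begin{chinese}[variant=traditional]
人人生而自由，在尊嚴和權利上一律平等。他們賦有理性和良心，並應以兄弟關係的精神相對待。
\end{chinese}

% Simplified Chinese
\begin{chinese}[variant=simplified]
人人生而自由，在尊严和权利上一律平等。他们赋有理性和良心，并应以兄弟关系的精神相对待。
\end{chinese}

% Japanese
\begin{japanese}
すべての人間は、生まれながらにして自由であり、かつ、尊厳と権利とについて平等である。人間は、理性と良心とを授けられており、互いに同胞の精神をもって行動しなければならない。
\end{japanese}

\end{document}
```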

I find:

Cantonese: the language setting is needed to display the punctuation
as raised, but it produces a single line which probably extends past
the edge of the PDF.

Japanese: produces a single line which probably extends past the
edge of the PDF.

Simplified Chinese: splits to a second line at approximately the
expected margin, with correct low punctuation.

Traditional Chinese: splits to a second line at approximately the
expected margin, with correct raised punctuation.

Am I overlooking something (very probable), or will I always need to
manually adjust the line lengths for both Cantonese and Japanese?

I can sort of understand this for Cantonese (I have not yet tried the
HK variant of NotoSerifCJK; that is some days away), but I had assumed
that Japanese would adapt to the intended line length.

ĸen
-- 
When one person suffers from a delusion, it is called insanity.
When many people suffer from a delusion it is called a Religion.
 -- Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance.


More information about the tex-live mailing list.