[tex4ht] Automatic HTF file generation

Michal Hoftich michal.h21 at gmail.com
Mon Nov 12 11:01:43 CET 2018

Hi all,

I've finally finished the Htfgen project [1]. Its objective is to
automate the creation of HTF font mapping files. tex4ht uses these
files to map character codes in DVI files to Unicode.

There are two new scripts: scanfdfile and dvitohtf. The first searches
FD files for declared fonts; the second generates a literate TeX file
that produces the HTF files. Sample usage:

   cat /usr/local/texlive/2018/texmf-dist/tex/latex/ebgaramond/*.fd |
     scanfdfile | dvitohtf > ebgaramond-htf.tex
   tex ebgaramond-htf.tex

This creates HTF files for all fonts detected in the EB Garamond FD
files.
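For readers curious what scanning FD files involves, here is a minimal Python sketch of the idea. It is not scanfdfile's actual code: it assumes fonts are declared with the usual NFSS size-function syntax inside \DeclareFontShape, skips sub/ssub substitutions (which name another shape rather than a font), and does not expand gen-style functions such as "gen * cmr". The name font_names is hypothetical.

```python
import re

# Hypothetical sketch, not scanfdfile's implementation.
# Inside \DeclareFontShape, each entry is one or more size specs ("<->",
# "<5-9>"), an optional size function ("s *", "ssub *", ...), an optional
# scale factor ("[0.95]"), and then a font name or substitution target.
SIZE_ENTRY = re.compile(
    r"(?:<[^>]*>\s*)+"            # one or more size specifications
    r"(?:([A-Za-z]+)\s*\*\s*)?"   # optional size function, e.g. "s *"
    r"(?:\[[^\]]*\]\s*)?"         # optional scale factor
    r"([^\s<>{}\[\]]+)"           # font (TFM) name or NFSS shape
)

# These size functions point at another NFSS shape, not at a font file.
SUBSTITUTIONS = {"sub", "ssub"}


def font_names(fd_text):
    """Return the font names declared in FD file text, in first-seen order."""
    seen = []
    for match in SIZE_ENTRY.finditer(fd_text):
        function, name = match.groups()
        # Shape targets look like "Family/series/shape"; skip them.
        if function in SUBSTITUTIONS or "/" in name:
            continue
        if name not in seen:
            seen.append(name)
    return seen
```

Used as a filter, something like print("\n".join(font_names(sys.stdin.read()))) would play the role scanfdfile has in the pipeline above.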

dvitohtf can also generate HTF files for fonts used in a DVI file that
don't have them yet. So if tex4ht reports missing HTF files, dvitohtf
can be run directly on the DVI file:

   dvitohtf sample.dvi > missing.tex
   tex missing.tex
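Finding the fonts a DVI file uses can be done by reading the font definitions in its postamble. The following Python sketch does that with the opcodes from the DVI format specification. It is an illustration, not dvitohtf's code; the function name dvi_fonts is hypothetical, and checking which of the returned fonts lack an HTF file is left out.

```python
import struct

# Illustrative sketch: list the fonts defined in a DVI file's postamble.
FNT_DEF1, FNT_DEF4 = 243, 246
POST, POST_POST, NOP = 248, 249, 138


def dvi_fonts(data):
    """Return (font name, checksum) pairs for each fnt_def in the postamble."""
    # The file ends with: post_post q[4] i[1], then 4-7 bytes of 223.
    end = len(data) - 1
    while data[end] == 223:
        end -= 1
    # data[end] is the identification byte i; q precedes it.
    if data[end - 5] != POST_POST:
        raise ValueError("post_post not found; not a DVI file?")
    q = struct.unpack(">I", data[end - 4:end])[0]
    if data[q] != POST:
        raise ValueError("postamble pointer does not point at 'post'")
    # post p[4] num[4] den[4] mag[4] l[4] u[4] s[2] t[2]: 1 + 28 bytes.
    pos = q + 29
    fonts = []
    while data[pos] != POST_POST:
        op = data[pos]
        if op == NOP:
            pos += 1
            continue
        if not FNT_DEF1 <= op <= FNT_DEF4:
            raise ValueError(f"unexpected opcode {op} in postamble")
        pos += 1 + (op - FNT_DEF1 + 1)   # fnt_def1..4 store k in 1..4 bytes
        checksum, _scale, _design = struct.unpack(">III", data[pos:pos + 12])
        area_len, name_len = data[pos + 12], data[pos + 13]
        pos += 14
        # Skip the (usually empty) area part and keep just the font name.
        name = data[pos + area_len:pos + area_len + name_len].decode("ascii")
        pos += area_len + name_len
        fonts.append((name, checksum))
    return fonts
```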

dvitohtf supports both virtual fonts and TFM fonts. It looks for a
virtual font first and uses the TFM file only when no VF file is found.
It collects all fonts referenced by the virtual font and tries to find
their corresponding .enc files in pdftex.map. The .enc files contain
glyph lists, which are then mapped to Unicode.
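A rough Python sketch of the two lookups described above: finding the encoding file named on a font's pdftex.map line, and reading the glyph names out of a PostScript .enc file. The helper names are hypothetical and the parsing is deliberately simplified (for example, it ignores continuation or unusual map entries). The position of each name in the returned list is its character code.

```python
import re


def enc_for_font(map_text, tfm_name):
    """Return the .enc file named on the pdftex.map line for tfm_name, or None."""
    for line in map_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != tfm_name:
            continue
        for field in fields[1:]:
            # Encoding files appear as "<name.enc" or "<[name.enc".
            candidate = field.lstrip("<[")
            if candidate.endswith(".enc"):
                return candidate
    return None


def parse_enc(enc_text):
    """Return the glyph names of a PostScript encoding vector, in slot order."""
    # Drop '%' comments, then take everything between the first '[' and ']'.
    code = "\n".join(line.split("%", 1)[0] for line in enc_text.splitlines())
    body = code[code.index("[") + 1:code.index("]")]
    return re.findall(r"/([^\s/\[\]]+)", body)
```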

It also parses the .pfb file for the font family name and tries to
detect the style (italic, bold, small caps) from the font's full name
stored in the .pfb file.
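The .pfb parsing and style detection might look roughly like this in Python. The PFB segment header and the /FamilyName and /FullName entries of the Type 1 FontInfo dictionary are standard; the keyword heuristics in the hypothetical detect_style function are an assumption for illustration, not necessarily what dvitohtf does.

```python
import re
import struct


def pfb_ascii_header(pfb_bytes):
    """Return the first (cleartext) segment of a PFB file as text."""
    # Each PFB segment starts with 0x80, a type byte (1 = ASCII, 2 = binary)
    # and a 4-byte little-endian length.
    if pfb_bytes[0] != 0x80 or pfb_bytes[1] != 1:
        raise ValueError("not a PFB file with a leading ASCII segment")
    length = struct.unpack("<I", pfb_bytes[2:6])[0]
    return pfb_bytes[6:6 + length].decode("latin-1")


def font_info(header):
    """Extract /FamilyName and /FullName from the Type 1 FontInfo dictionary."""
    info = {}
    for key in ("FamilyName", "FullName"):
        match = re.search(r"/%s\s*\(([^)]*)\)" % key, header)
        info[key] = match.group(1) if match else None
    return info


def detect_style(full_name):
    """Guess style flags from a full name (heuristic; keywords are assumptions)."""
    words = re.findall(r"[a-z]+", full_name.lower())
    joined = "".join(words)   # "Bold Italic" and "BoldItalic" look the same
    return {
        "italic": "italic" in joined or "oblique" in joined,
        "bold": "bold" in joined or "black" in joined,
        "smallcaps": "smallcaps" in joined or "sc" in words,
    }
```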

It computes hashes of the font tables, so duplicate tables aren't
written; fonts with the same characters just link to the first font
that used that table.
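The deduplication idea can be sketched in a few lines of Python: hash each font table and, when a hash has been seen before, record a link to the first font instead of writing the table again. The function names and the choice of SHA-256 over the table's repr are assumptions, not details taken from htfgen.

```python
import hashlib


def table_key(glyphs):
    """Hash a font table: a list of Unicode strings indexed by character code."""
    # repr() of the whole list keeps slot boundaries unambiguous.
    return hashlib.sha256(repr(list(glyphs)).encode("utf-8")).hexdigest()


def deduplicate(tables):
    """Split fonts into ones whose table must be written and ones that can link.

    tables maps font name -> list of per-slot Unicode strings, in the order
    the fonts were encountered. Returns (unique, aliases), where aliases maps
    each duplicate font to the first font with an identical table.
    """
    first_by_key = {}
    unique, aliases = [], {}
    for font, glyphs in tables.items():
        key = table_key(glyphs)
        if key in first_by_key:
            aliases[font] = first_by_key[key]
        else:
            first_by_key[key] = font
            unique.append(font)
    return unique, aliases
```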

If no .enc file is found, the font cannot be supported. Some mappings
between glyphs and Unicode may also be missing; these are reported in
the generated TeX file. Htfgen contains large mapping files, but some
fonts use custom glyphs that have no Unicode equivalent, for example
Q_u ligatures. In such cases the mapping must be added by hand to
glyphlists/glyphlist-fixes.txt.
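A simplified Python sketch of the glyph-to-Unicode step, including how a hand-written fix can supply a mapping that the standard lists lack. It assumes glyphlist-fixes.txt uses the same "name;XXXX" format as the Adobe Glyph List, and it implements only part of the AGL naming rules (exact lookup, uniXXXX and uXXXX[XX] names). Unknown names are collected as missing so they can be reported. All function names are hypothetical.

```python
import re


def load_glyphlist(*texts):
    """Parse AGL-style 'name;XXXX [XXXX ...]' lines; later texts override earlier ones."""
    table = {}
    for text in texts:
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()   # drop comments
            if not line:
                continue
            name, codes = line.split(";", 1)
            table[name] = "".join(chr(int(c, 16)) for c in codes.split())
    return table


def glyph_to_unicode(name, table):
    """Return the text for a glyph name, or None if no mapping is known."""
    if name in table:   # exact entries, including hand-added fixes
        return table[name]
    if re.fullmatch(r"uni(?:[0-9A-F]{4})+", name):   # e.g. uni00E9
        hexes = name[3:]
        return "".join(chr(int(hexes[i:i + 4], 16)) for i in range(0, len(hexes), 4))
    if re.fullmatch(r"u[0-9A-F]{4,6}", name):        # e.g. u1F600
        cp = int(name[1:], 16)
        return chr(cp) if cp <= 0x10FFFF else None
    return None


def map_encoding(glyph_names, table):
    """Map each slot of an encoding vector; collect names that have no mapping."""
    mapped, missing = {}, []
    for slot, name in enumerate(glyph_names):
        if name == ".notdef":
            continue
        text = glyph_to_unicode(name, table)
        if text is None:
            missing.append(name)
        else:
            mapped[slot] = text
    return mapped, missing
```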

It works reasonably well for fonts generated by Fontinst, because they
usually use standard glyph names, come with .enc files, etc. It fails
for complex virtual fonts, especially math fonts; HTF files for such
fonts still need to be created by hand.

What to do now? Some HTF files in the tex4ht sources are wrong; for
example, the Linux Libertine support is wrong for some ligatures. I am
sure there will be more examples, especially among fonts with a large
number of ligatures. Support for these fonts was added a few years ago,
but only for the T1 font encoding. We should remove their HTF
generation from the huge literate font sources and create a smaller
literate TeX file for each of these fonts. This should speed up the
tex4ht build and make the fonts easier to manage.

Volunteers are welcome.

Best regards,

[1] https://github.com/michal-h21/htfgen
