[tex4ht] unicode HTML title

William F Hammond gellmu at gmail.com
Mon May 27 08:46:00 CEST 2013


On Sun, May 26, 2013 at 11:07 PM, Jaroslav SOBOTA <jsobota at kky.zcu.cz> wrote:

> Perhaps I did not describe the problem clearly - what I'm talking about is
> the <title>...</title> element in the resulting HTML file, which really
> loses the accents, umlauts, etc. See the attached screenshot.
>
> Jaroslav
>
> Dne 27.5.2013 4:59, Radhakrishnan CV napsal(a):
>
>> On Mon, May 27, 2013 at 3:16 AM, Jaroslav Sobota <jsobota at kky.zcu.cz> wrote:
>>
>>     I think it is indeed a tex4ht issue: the accents are missing in the
>>     resulting HTML file. If the accents were there but weren't
>>     displayed, then it would be a browser issue.
>>
>>
>> It seems to be a problem with your browser. I am attaching two screenshots:
>>
>>  1. mybook.png: screenshot of your HTML as rendered in Firefox on my
>>     Linux laptop.
>>  2. jaroslav.png: screenshot of HTML generated on my Linux laptop and
>>     rendered in the same Firefox browser.
>>
>>
>> --
>> Radhakrishnan
>> River Valley
>> <https://maps.google.com/maps?q=River%20Valley,%20Thiruvananthapuram%20Neyyardam%20Road,%20Kerala,%20India&vector=1>
>>
>>
On second look, I agree that the TeX \title is written with accents in the
"h2" heading (in the HTML "body") but without accents in the "title"
(in the HTML "head").  My guess is that Eitan did that deliberately some
years ago.  As a general proposition, one needs to be more guarded about
what goes in the HTML "title" (shown in the border of the browser's window)
than about what goes in an HTML heading.  For example, math mode should not
be used in the HTML "title".  One concern is that when a browser writes in
its window border, it does so as the guest of the operating system's window
manager.  Another concern is that HTML "titles", unlike headings, are
frequently grabbed as metadata (plain text strings).
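
To tell a generation problem from a browser problem without relying on
screenshots, one can compare the two strings in the generated file directly.
The following is only a sketch in Python, not part of tex4ht: the names
TitleAndHeading, strip_accents and title_lost_accents are made up for
illustration, and it assumes that the document title appears in the first
"h2" element of the body.

```python
import unicodedata
from html.parser import HTMLParser


class TitleAndHeading(HTMLParser):
    """Collect the text of <title> and of the first <h2> (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = {"title": "", "h2": ""}
        self._open = None      # tag currently being captured, if any
        self._done = set()     # tags that have already been captured once

    def handle_starttag(self, tag, attrs):
        if tag in self.text and tag not in self._done and self._open is None:
            self._open = tag

    def handle_endtag(self, tag):
        if tag == self._open:
            self._done.add(tag)
            self._open = None

    def handle_data(self, data):
        if self._open:
            self.text[self._open] += data


def strip_accents(s):
    """Drop combining marks after NFD decomposition, so accented letters lose their accents."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def title_lost_accents(html):
    """True if <title> equals the accent-stripped <h2> text but not the <h2> text itself."""
    parser = TitleAndHeading()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.text["title"].split())
    heading = " ".join(parser.text["h2"].split())
    return title != heading and title == strip_accents(heading)
```

Calling title_lost_accents() on the contents of the generated HTML file
(read as UTF-8) returning True would mean the accents are already missing
from the file itself, so the browser is not at fault.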

However, whatever Eitan's thinking might have been, these days Unicode
"word" characters should be allowed to go in HTML "title" strings.

     -- Bill