Thu 11 May: TeX Hour: Using LaTeXML to access audit arXiv LaTeX source files

Deyan Ginev deyan.ginev at gmail.com
Fri May 12 14:46:22 CEST 2023


Hi Jonathan,
(and all TeX Hour enthusiasts)

Thank you for setting aside the time to do a deep dive into LaTeXML's log
messages - and for making the recording public for interested eyes to
follow along.

Designing log messages is a curious task. There is a balance to strike
between telling the users of a tool which part of their input caused a
hiccup (and how to fix it) and keeping enough information for the
developers to diagnose what really went on internally.

We have (relatively recently) started separating the full log from the
brief log we emit on the standard error stream (STDERR), but only the full
log is currently available on ar5iv.
As you have noticed, some parts of that can be quite impenetrable for a
casual user.
It may be beneficial to put some latexml effort into crafting a painless
user-facing log file going forward, which would also benefit arXiv users,
while separately continuing to keep an informative developer-facing log.

I felt everyone's pain when you tried demystifying the ALLCAPS math grammar
category blocks. They used to be trendy back in the day! Less so in 2023
I'm afraid.

Let me extend an invitation I've made before to everyone reading here:
Feel welcome to send any finer points our way, and to bravely open GitHub
issues on the latexml [1] and ar5iv [2] repositories, respectively.
As you may imagine, we are busily working on a variety of upgrades, but the
extra clues from the community can be quite helpful for triage.

Greetings,
Deyan

P.S. I am hoping we will soon have at least a couple of tools which can
proof-listen to ar5iv from one's web browser, which will allow us to do a
latexml development sprint focused on improved audio renderings.

[1] https://github.com/brucemiller/LaTeXML/
[2] https://github.com/dginev/ar5iv/


On Wed, May 10, 2023 at 3:42 PM Jonathan Fine <jfine2358 at gmail.com> wrote:

> Hi
>
> The arXiv has about 2.5 million articles, most of which have been
> processed with LaTeX to produce PDF. In addition, most of these LaTeX
> articles have been processed with LaTeXML, to produce HTML. Recently the
> arXiv has announced it will be making this HTML available, to improve
> accessibility. Tomorrow's TeX Hour is about using LaTeXML to audit
> accessibility of the arXiv LaTeX source.
>
> TeX Hour: Thursday 11 May, 6:30 to 7:30pm BST
> More information:
> https://texhour.github.io/2023/05/11/latex-access-audit-latex/
> Zoom URL:
> https://us02web.zoom.us/j/78551255396?pwd=cHdJN0pTTXRlRCtSd1lCTHpuWmNIUT09
>
> LaTeXML produces a log file, containing warnings and errors. It provides
> to some degree an accessibility audit of the LaTeX source files on the
> arXiv. Tomorrow's TeX Hour is an informal preliminary report on my efforts
> to use these log files to audit arXiv source for accessibility. Results so
> far are outnumbered by problems, but it's early days.
>
> Going to https://ar5iv.labs.arxiv.org/feeling_lucky will send you to a
> random arXiv article in HTML. At the bottom of that page there is a link to
> the LaTeX-to-HTML conversion report (the log file), and also the arXiv PDF.
> Getting the LaTeX source is more work. Automating all this is one of the
> early problems.
>
> wishing you safe and accessible TeXing
>
> Jonathan
> <https://ar5iv.labs.arxiv.org/feeling_lucky>
>


More information about the texhax mailing list.