LaTeX and \l and generated pdf and suppressing chars

Norbert Preining norbert at preining.info
Thu Jun 20 16:37:21 CEST 2024


Hi Ulrike,

On Thu, 20 Jun 2024, Ulrike Fischer wrote:
> Well I do with Adobe reader on windows.  Copy&paste is a processor
> feature, and the exact result depends on the viewer you use. They

Ouch indeed. Interesting to see that okular does the "right"
thing in this case.

> Yes I know this. But you have to accept reality: the PDF has a flaw
> and copy&paste won't work correctly. So how are you hoping to repair
> that without changing the encoding?

I cannot repair 2.5M papers from the past. But I can make text
extraction recognize the "suppress" special name and deal with it ;-)
I am not really concerned about copy/paste, but text extraction via
pdfminer, which parses the PDF.
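
A minimal sketch of that idea, assuming the extractor can hand over the
glyph names of a text run as a plain list. It also assumes that "suppress"
is the stroke glyph that \l and \L put in front of l or L, so the pair
should come out as a single stroked letter. The names resolve_suppress,
STROKED and NAME_TO_CHAR are hypothetical and not part of pdfminer's API.

```python
# Hypothetical post-processing step, independent of any PDF library.
# Input: glyph names of one text run, in reading order.

SUPPRESS = "suppress"

# Assumption: "suppress" followed by l/L encodes the stroked letters.
STROKED = {"l": "\u0142", "L": "\u0141"}

# Tiny illustrative name-to-character table; a real one would be much larger.
NAME_TO_CHAR = {"space": " "}


def resolve_suppress(glyph_names):
    """Return the text for a run, merging 'suppress' + l/L into one letter."""
    out = []
    i = 0
    while i < len(glyph_names):
        name = glyph_names[i]
        if name == SUPPRESS:
            nxt = glyph_names[i + 1] if i + 1 < len(glyph_names) else None
            if nxt in STROKED:
                # Replace the pair by the single stroked letter.
                out.append(STROKED[nxt])
                i += 2
                continue
            # Stray "suppress" with no l/L after it: drop it silently.
            i += 1
            continue
        # Fall back to the glyph name itself for single-letter names.
        out.append(NAME_TO_CHAR.get(name, name))
        i += 1
    return "".join(out)


print(resolve_suppress(["suppress", "L", "o", "d", "z"]))
```

Working on glyph names rather than on already decoded characters keeps
the fix independent of whatever code point a viewer or parser would
otherwise assign to the stroke glyph.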

Thanks Ulrike, that all helped me a lot to understand a few of the moving
parts. I guess my next step is forking pdfminer(.six) (whose development
seems to be dead anyway) and trying to fix what I see.

Thanks again, and all the best

Norbert

--
PREINING Norbert                              https://www.preining.info
arXiv / Cornell University   +   IFMGA Guide   +   TU Wien  +  TeX Live
GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13

