TEXMFHOME on Windows (for users with long names, diacritics or spaces in their names)

Siep Kroonenberg siepo at bitmuis.nl
Sat May 25 18:01:46 CEST 2024


On Sat, May 25, 2024 at 04:25:06AM +0200, Reinhard Kotucha via tex-live wrote:
> On 2024-05-24 at 22:48:34 +0200, Denis Bitouzé wrote:
> 
>  > Couldn't Powershell be useful here?
> 
> Hi Denis,
> this is a good point.  But I must admit that I don't have access to
> a Windows machine, hence I can't try anything myself.
> 
> You have to distinguish between "Windows PowerShell" (powershell.exe),
> which is part of any Windows installation, and "PowerShell" (pwsh.exe).
> AFAIK the latter must be downloaded and installed explicitly.
> 
> The difference is that powershell.exe uses UTF-16LE by default (this is
> how filenames are encoded internally on Windows) and that pwsh.exe
> uses UTF-8 by default.
> 
>    https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_character_encoding?view=powershell-7.4
> 
> If the batch file invoking the Perl script is replaced with a pwsh
> script I suppose that things behave as on Unix.  The drawback is that
> users have to download and install external software.
> 
> When using "Windows PowerShell" (powershell.exe) it should be possible
> to switch from UTF-16LE to UTF-8 within the script but in this case
> UTF-8 sequences are preceded by a Byte Order Mark (BOM) which can
> cause trouble.
> 
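
To make the kind of trouble a UTF-8 BOM can cause concrete, here is a
minimal sketch in Python (not part of the installer; the path below is
a made-up example). It assumes some tool has written a path as UTF-8
with a leading BOM. A plain UTF-8 decode keeps the BOM as an invisible
first character, so the string no longer matches the real path, while
Python's "utf-8-sig" codec drops it.

```python
import codecs

# Hypothetical path, only for illustration.
path = "C:\\Users\\Bitouzé\\texmf"

# What a BOM-prefixed UTF-8 byte stream would look like.
with_bom = codecs.BOM_UTF8 + path.encode("utf-8")

# Plain UTF-8 decoding keeps the BOM as U+FEFF at the start of the string.
naive = with_bom.decode("utf-8")

# The "utf-8-sig" codec strips a leading BOM if one is present.
clean = with_bom.decode("utf-8-sig")

print(naive == path)   # False: the invisible U+FEFF is still there
print(clean == path)   # True
```

Whether Perl or the batch wrapper would see such a BOM at all is a
separate question; the sketch only shows why it matters when they do.
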
> On the other hand, if it's possible to tell cmd.exe to use UTF-8 and
> the installer works as expected with non-ASCII characters in
> file/directory names, I assume that the UTF-8 BOM doesn't hurt because
> it's certainly present.
> 
> The BOM is necessary for UTF-16 and UTF-32 encodings because
> characters are stored as binary numbers where the byte order matters.
> Characters in UTF-8 are encoded as sequences of bytes and thus don't
> need a BOM.
> 
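
The byte-order point can be illustrated with a short Python sketch (my
own example, using only the standard codecs). The same character has
two possible byte layouts in UTF-16 but only one in UTF-8, and reading
UTF-16 bytes in the wrong order silently yields a different character.

```python
import codecs

text = "é"  # U+00E9

# UTF-16 stores a 16-bit number, so the two bytes can come in either order.
le = text.encode("utf-16-le")   # b'\xe9\x00'
be = text.encode("utf-16-be")   # b'\x00\xe9'

# UTF-8 is defined as a byte sequence; there is only one possible order.
u8 = text.encode("utf-8")       # b'\xc3\xa9'

# Reading little-endian bytes as big-endian gives U+E900, not U+00E9.
wrong = le.decode("utf-16-be")

# With a BOM in front, the generic "utf-16" decoder picks the right order.
marked = codecs.BOM_UTF16_BE + be
print(marked.decode("utf-16") == text)   # True
print(wrong == text)                     # False
```
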
> On 2024-05-24 at 21:57:04 +0200, Siep Kroonenberg via tex-live wrote:
> 
>  > And if a script sets the codepage to utf-8, then this setting will
>  > NOT be inherited by child processes.
> 
> I don't think that this is the case with PowerShell.  There is a
> variable called $OutputEncoding.  The name wouldn't make sense if the
> specified encoding is only used internally and not by child processes.
> 
> If the sole problem is that forcing cmd.exe to use UTF-8 requires
> admin permissions I believe that it's worthwhile to keep an eye on
> PowerShell which uses Unicode by default.

We shall always need a wrapper for PowerShell scripts, to tell
PowerShell that execution of the script is allowed. Still,
PowerShell is something to look into.

Also, do not forget that Perl has its own encoding-related
idiosyncrasies.

So, no easy answers, and it will be a while before I can look into
it again.

-- 
Siep Kroonenberg

