Discussion:
Encoding of files in Master/texmf-dist/doc/man/man1
t***@t-lab.opal.ne.jp
2018-04-09 14:45:54 UTC
Hi all,

I'm confused about the encoding of the files in Master/texmf-dist/doc/man/man{1,5}.
I checked their encodings and found the following:
a2ping.1, kpsewhere.1, latex2man.1, mkjobtexmf.1, thumbpdf.1 : Latin1 (iso-8859-1)
findhyph.1, luaotfload-tool.1, mendex.1, luaotfload.conf.5 : UTF-8
others : ASCII
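
A check of this kind can be sketched in Python. This is only a rough heuristic and not necessarily how the list above was produced; the function name `classify_encoding` is made up for illustration, and the Latin1 result is just a fallback guess, since any byte sequence decodes as Latin1.

```python
import sys
from pathlib import Path


def classify_encoding(data: bytes) -> str:
    """Rough three-way classification; this is not a real charset detector."""
    try:
        data.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Every byte sequence is valid Latin1, so this branch is only a guess.
        return "Latin1 (iso-8859-1)"


if __name__ == "__main__":
    # Usage: python classify.py Master/texmf-dist/doc/man/man1/*.1
    for name in sys.argv[1:]:
        print(name, ":", classify_encoding(Path(name).read_bytes()))
```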

Some PDF files converted from UTF-8 roff files (luaotfload-tool.man1.pdf, luaotfload.conf.man5.pdf)
seem to contain corrupted characters.
For example,
a sentence in luaotfload.conf.man5.pdf
'For example, the âcolor callbackâ must be a string of' ...
should be
'For example, the “color callback” must be a string of' ...
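
The corruption looks like what happens when UTF-8 bytes are read as Latin1. That cause is an assumption, not something confirmed for these PDFs, but a small Python sketch reproduces the same visible result:

```python
# Assumption: the UTF-8 bytes of the roff source were interpreted as Latin1.
text = "the \u201ccolor callback\u201d must be"  # curly quotes U+201C / U+201D

# Each curly quote is three bytes in UTF-8 (E2 80 9C and E2 80 9D).
misread = text.encode("utf-8").decode("latin-1")

# E2 becomes the visible letter a-circumflex; 80, 9C and 9D become
# invisible C1 control characters, so drop them to see what a viewer shows.
visible = "".join(ch for ch in misread if ch.isprintable())

print(visible)
```

Under that assumption each quote collapses to a single visible 'â', which matches the sentence seen in the PDF.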

findhyph.man1.pdf and mendex.man1.pdf are found to be fine.

Is there any reason to keep a mixture of file encodings?


Takuji TANAKA
Werner LEMBERG
2018-04-09 18:04:38 UTC
Post by t***@t-lab.opal.ne.jp
I'm confused with encoding of files in
Master/texmf-dist/doc/man/man{1,5}. I checked encoding of them and
found as following:
a2ping.1, kpsewhere.1, latex2man.1, mkjobtexmf.1, thumbpdf.1 : Latin1 (iso-8859-1)
findhyph.1, luaotfload-tool.1, mendex.1, luaotfload.conf.5 : UTF-8
others : ASCII
This is a known issue. Karl is working on it.
Post by t***@t-lab.opal.ne.jp
Is there any reason to keep mixture of file encoding?
Well, TeXLive takes whatever is delivered to CTAN. It's a bad idea
IMHO to modify the files in TeXLive only; instead, the authors should
fix this.

The good news: The next groff version uses the `uchardet' library to
improve the heuristics for identifying a man page's character encoding
in case an encoding tag (using the Emacs style) is missing. The bad
news: It will take some more weeks until this version (1.22.4) gets
released.

Until then it would be an excellent idea if man pages not encoded in
latin-1 (or ASCII) get tagged with

.\" -*- mode: troff; coding: <encoding> -*-

as the first (or second) line; <encoding> could be, for example,
`utf-8'. Invoking

groff -k ...

would then automatically convert the man page to something groff can
understand.
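
For anyone who wants to tag a batch of pages, here is a minimal Python sketch. The helper names `find_coding_tag` and `add_coding_tag` are hypothetical, and the sketch always puts the tag on the first line; a page whose current first line has to stay first would need the tag on the second line instead, which is not handled here.

```python
import re

# Matches an Emacs-style "-*- ... coding: <encoding> ... -*-" tag.
CODING_RE = re.compile(r"-\*-.*\bcoding:\s*([-\w.]+).*-\*-")


def find_coding_tag(text: str):
    """Return the encoding named by a tag on line 1 or 2, or None."""
    for line in text.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


def add_coding_tag(text: str, encoding: str) -> str:
    """Prepend a coding tag as a roff comment unless one is already present."""
    if find_coding_tag(text) is not None:
        return text
    return f'.\\" -*- mode: troff; coding: {encoding} -*-\n' + text


if __name__ == "__main__":
    page = ".TH FOO 1\n.SH NAME\nfoo \\- an example page\n"
    print(add_coding_tag(page, "utf-8"))
```

The encoding passed in has to be the one the file is really stored in; the tag only declares the encoding, it does not convert anything.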


Werner
t***@t-lab.opal.ne.jp
2018-04-10 13:26:59 UTC
Thanks, Werner, for the helpful comment.
Post by Werner LEMBERG
Post by t***@t-lab.opal.ne.jp
a2ping.1, kpsewhere.1, latex2man.1, mkjobtexmf.1, thumbpdf.1 : Latin1 (iso-8859-1)
findhyph.1, luaotfload-tool.1, mendex.1, luaotfload.conf.5 : UTF-8
others : ASCII
This is a known issue. Karl is working on it.
I understand the status now.
Post by Werner LEMBERG
Post by t***@t-lab.opal.ne.jp
Is there any reason to keep mixture of file encoding?
Well, TeXLive takes whatever is delivered to CTAN. It's a bad idea
IMHO to modify the files in TeXLive only; instead, the authors should
fix this.
The good news: The next groff version uses the `uchardet' library to
improve the heuristics for identifying a man page's character encoding
in case an encoding tag (using the Emacs style) is missing. The bad
news: It will take some more weeks until this version (1.22.4) gets
released.
That sounds great.
I will follow groff development and its new features.

Best,
Takuji TANAKA
