current charset guessing
Alain Bench
xwck at oreka.com
Sat Jan 15 01:17:10 CET 2005
On Tuesday, January 11, 2005 at 3:54:43 PM +0100, Werner Koch wrote:
> On Fri, 7 Jan 2005 18:23:34 +0100 (CET), Alain Bench said:
>> fails on implicit charsets, ambiguous names, or platform specific
>> spellings.
> we would need to reimplement everything from libiconv or check whether
> a proper libiconv is available.
I assume you meant libcharset. Yes: reimplement it, reuse it, use it
if available, or even ship it. After all, it's already squatting in
the tarball: gnupg-1.4.0/intl/localcharset.c :-)
But beware of the Win32 OEM/ANSI mismatch problem.
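For reference, the raw query that libcharset starts from (its own entry
point is locale_charset(), declared in localcharset.h) can be sketched
with plain POSIX calls. This is a minimal sketch, not GnuPG's or
libcharset's code: the name raw_locale_codeset() and the "empty means
ASCII" fallback are my assumptions, and the returned name is still the
unsanitized platform spelling.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>

/* Return the codeset of the user's LC_CTYPE locale exactly as the platform
   spells it (e.g. "ANSI_X3.4-1968" from glibc in the C locale).  The name
   still needs sanitizing before it is handed to iconv. */
static const char *raw_locale_codeset(void)
{
    static int initialised = 0;
    if (!initialised) {
        /* Adopt LC_ALL / LC_CTYPE / LANG from the environment; if that
           fails, the C locale simply stays in effect. */
        setlocale(LC_CTYPE, "");
        initialised = 1;
    }
    const char *cs = nl_langinfo(CODESET);
    /* Assumption: treat an empty answer as plain ASCII. */
    return (cs != NULL && *cs != '\0') ? cs : "ASCII";
}
```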
>> nl_langinfo(CODESET) also needs sanitizing
> this is something libiconv should care about
Libcharset, yes. Libiconv also accepts a limited set of common aliases
as parameters, which helps, but not in all cases. A platform-specific
iconv *should* accept the same names that its own
nl_langinfo(CODESET) outputs, or so I hope. Hybrid cases, such as GNU
libiconv fed a platform's own CODESET names, are complicated.
Example: allegedly, some versions of AIX report "IBM-850" as
CODESET. That name is unknown to libiconv 1.9.2, which knows 4 aliases:
| $ iconv -l | grep 850
| 850 CP850 IBM850 CSPC850MULTILINGUAL
But on AIX, libcharset canonicalises "IBM-850" to "CP850" and
reports this known name.
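The sanitizing step could look roughly like the following sketch: map
platform spellings to names libiconv knows, then confirm with iconv_open()
that the local iconv really accepts the result. The alias table and both
function names are hypothetical and deliberately tiny; a real table (like
libcharset's charset.alias) is per-platform and much longer.

```c
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: platform CODESET spellings on the left,
   names GNU libiconv understands on the right. */
static const struct {
    const char *platform;    /* as reported by nl_langinfo(CODESET) */
    const char *canonical;   /* a name libiconv knows */
} aliases[] = {
    { "IBM-850",   "CP850" },        /* AIX, per the example above */
    { "ISO8859-1", "ISO-8859-1" },   /* FreeBSD-style spelling */
};

static const char *canonical_charset(const char *name)
{
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (strcmp(name, aliases[i].platform) == 0)
            return aliases[i].canonical;
    return name;   /* unknown spellings pass through unchanged */
}

/* True if the local iconv can convert from CHARSET to UTF-8. */
static bool iconv_knows(const char *charset)
{
    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t) -1)
        return false;   /* errno is EINVAL for an unsupported name */
    iconv_close(cd);
    return true;
}
```

So canonical_charset("IBM-850") yields "CP850", one of the aliases
libiconv 1.9.2 lists, while names missing from the table are left for
iconv_open() to judge.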
>> "make check" also fails
> changed to a warning.
>> in CP-1252, not in Latin-1
> Removed that.
>> there is a whole set of aliases [28591 == L1]
> I adapted that list and use it for Windows.
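A Windows code page table of that kind might be sketched like this. The
entries are a small sample chosen for illustration, not the list GnuPG
actually uses, and the "CP<number>" fallback for unlisted pages is an
assumption, modelled on what libcharset does with GetACP() as far as I
know.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Sample of Windows code page numbers (as returned by GetACP() or
   GetConsoleOutputCP()) and charset names iconv understands. */
static const struct {
    unsigned int cp;
    const char *name;
} cp_names[] = {
    {   437, "CP437" },
    {   850, "CP850" },
    {  1252, "CP1252" },
    { 20127, "US-ASCII" },
    { 28591, "ISO-8859-1" },   /* the "28591 == L1" case */
    { 28592, "ISO-8859-2" },
    { 65001, "UTF-8" },
};

/* Return the charset name for code page CP.  Unlisted pages get a generic
   "CP<number>" name written into BUF (LEN bytes). */
static const char *cp_to_charset(unsigned int cp, char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof cp_names / sizeof cp_names[0]; i++)
        if (cp_names[i].cp == cp)
            return cp_names[i].name;
    snprintf(buf, len, "CP%u", cp);
    return buf;
}
```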
Great! Thank you. :-)
>> Libcharset seems to call GetACP() only, never GetConsoleOutputCP().
>> IIUC that would be false for console apps?
> GetConsoleOutputCP is the correct thing to do but it does no harm to
> fall back to GetACP.
I wonder how The Bat!™ manages to make GetConsoleOutputCP() return 0
when needed. FreeConsole() to detach, or something like that?
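The fallback Werner describes (console output code page first, ANSI code
page otherwise) can be sketched as below. The decision sits in a small
portable function, and the Windows wrapper is compiled only under _WIN32.
Both function names are hypothetical, and reading a 0 from
GetConsoleOutputCP() as "no console" is an assumption based on 0 being
its documented failure value. Note that this still picks CP850 in the
MSYS rxvt case, because a console code page is reported there even
though the terminal is Latin-1.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Prefer the console output code page (OEM, typically 850) and fall back
   to the ANSI code page (typically 1252) when no console CP is known.
   A console_cp of 0 stands for "GetConsoleOutputCP() failed". */
static unsigned int choose_output_cp(unsigned int console_cp,
                                     unsigned int ansi_cp)
{
    return console_cp != 0 ? console_cp : ansi_cp;
}

#ifdef _WIN32
/* Windows-only wrapper around the two real API calls. */
static unsigned int current_output_cp(void)
{
    return choose_output_cp(GetConsoleOutputCP(), GetACP());
}
#endif
```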
I wonder how to decide which function applies, OCP or ACP. I mean:
typically one returns 850 and the other 1252. When you run in text mode,
that's typically in a console, so OEM CP 850 is right. But try running
GnuPG w32cli-1.4.0a in the default rxvt of MSYS-1.0.10. That rxvt is a
Latin-1 terminal, but GetConsoleOutputCP() and GnuPG still report CP850,
and of course umlauts are garbled (real key on keyservers):
| $ gpg -vvv --list-keys BD7C8AA1
| pub 4096R/BD7C8AA1 2005-01-01 [expires: 2005-12-31]
| uid Hans M=81ller <ndof at gmx.li>
| sub 4096R/0958388C 2005-01-01 [expires: 2005-12-31]
|
| gpg: using character set `CP850'
| gpg: using PGP trust model
It should normally read "Müller" (u umlaut). BTW, I wonder why
stdout/stderr are reordered.
>> US-Ascii is not Latin-1.
> we got a lot of complaints about these warnings from US people and it
> seems reasonable that many more machines are not configured properly
> for Latin-1 than those who are explicitly using ascii.
Tolerance for misconfigured systems is good, but maybe not at the
cost of breaking legitimate usage, even if rare. May I make two proposals:
-1) Get rid of the warning message on simple display of unconvertible
chars. People with an unconfigured locale would see (faked here):
| $ gpg -vvv --list-keys 0x882B59FD
| gpg: using character set `ASCII'
| gpg: using PGP trust model
| pub 512D/882B59FD 2005-01-08
| uid Ren\xc3\xa9 Lec\xc5\x93ur <joe at foo.bar>
No more annoying warning, but a strange display of accents that may
lead them to read the doc and fix their locale. The doc could point to
Sven Mascheck's site and give a quick "export LANG=en_US" hint.
-2) Make and document a special "novars" case: where locale variables
are applicable (not Win32), if all 3 of LC_ALL, LC_CTYPE, and LANG are
unset, and the charset guessed so far is US-ASCII, then charset = Latin-1.
In -vvv mode, unconf guys would be hinted (again faked output):
| $ gpg -vvv --list-keys "René"
| gpg: using character set `ISO-8859-1' novars fallback: see http://explanations
| gpg: using PGP trust model
| pub 512D/882B59FD 2005-01-08
| uid René Lec\xc5\x93ur <joe at foo.bar>
No more annoying warning, and accents hopefully displayed as well as
possible, even under these adverse conditions. Harmless for normal guys.
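Both proposals are small enough to sketch. The following is a minimal
illustration under my own assumptions, not existing GnuPG code:
escape_non_ascii() renders every byte outside printable ASCII as \xNN
(proposal 1), and novars_fallback() switches a US-ASCII guess to
ISO-8859-1 only when LC_ALL, LC_CTYPE, and LANG are all unset (proposal
2). Both function names and the list of ASCII spellings are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Proposal 1: copy IN to OUT, rendering each byte outside printable ASCII
   (0x20..0x7e) as a literal \xNN escape instead of warning about it.
   LEN must be at least 1; output is truncated to fit and NUL-terminated. */
static void escape_non_ascii(const char *in, char *out, size_t len)
{
    size_t o = 0;
    for (const unsigned char *p = (const unsigned char *) in; *p; p++) {
        if (*p >= 0x20 && *p < 0x7f) {
            if (o + 1 >= len)
                break;
            out[o++] = (char) *p;
        } else {
            if (o + 4 >= len)          /* need room for "\xNN" plus NUL */
                break;
            snprintf(out + o, len - o, "\\x%02x", (unsigned int) *p);
            o += 4;
        }
    }
    out[o] = '\0';
}

/* Proposal 2: the "novars" fallback.  Only where locale variables apply
   (not Win32): if LC_ALL, LC_CTYPE and LANG are all unset and the guess
   so far is US-ASCII, use Latin-1 instead. */
static const char *novars_fallback(const char *guessed)
{
#ifndef _WIN32
    static const char *const ascii_names[] = {
        "ASCII", "US-ASCII", "ANSI_X3.4-1968"   /* illustrative spellings */
    };
    if (getenv("LC_ALL") == NULL && getenv("LC_CTYPE") == NULL
        && getenv("LANG") == NULL) {
        for (size_t i = 0; i < sizeof ascii_names / sizeof ascii_names[0]; i++)
            if (strcmp(guessed, ascii_names[i]) == 0)
                return "ISO-8859-1";
    }
#endif
    return guessed;
}
```

With these, a UTF-8 "René" comes out in the escaped form of the first
faked output, and a glibc "ANSI_X3.4-1968" guess with no locale variables
set becomes ISO-8859-1, while anyone who has set LANG keeps their guess.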
Bye! Alain.
-- 
When you want to reply to a mailing list, please avoid doing so with
Lotus Notes 5. This lacks necessary references and breaks threads.
More information about the Gnupg-users mailing list