Validation of User ID with invalid (non UTF-8) encoding
Werner Koch
wk at gnupg.org
Tue Apr 29 14:08:58 CEST 2014
On Tue, 29 Apr 2014 11:11, martijn.list at gmail.com said:
> Some keys stored on the public key servers have User IDs which seem to
> be encoded with a different encoding than UTF-8.
Right. Old PGP versions didn't care about the UTF-8 requirement and
used whatever encoding the terminal was configured for (e.g. Latin-1).
But that should only be a display problem. See below for the code GPA
uses to detect and fix the display problem.
> $ gpg --check-sigs 0xA8364AC589C44886
> pub 1024D/89C44886 1999-09-30
> uid Lasse M\xberkedahl Larsen <lml at gr3.dk>
> sig! 89C44886 1999-09-30 Lasse M\xberkedahl Larsen <lml at gr3.dk>
> sub 2048g/0CA36EF9 1999-09-30
> sig! 89C44886 1999-09-30 Lasse M\xberkedahl Larsen <lml at gr3.dk>
>
> My own Java based tool however fails to validate this User ID, i.e., the
> calculated hash always returns a different value. Also PGP desktop
Note that the above output is for humans and has been sanitized to
inhibit attacks using ANSI control sequences. To check the signature
you need to use the bare OpenPGP packets and not some gpg output.
I am not aware of any PGP problems with user ids - the verification uses
the data verbatim and is transparent to the encoding.
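To make "verbatim" concrete: for a V4 certification, RFC 4880 (section 5.2.4) hashes the octet 0xB4, a four-octet big-endian length, and then the User ID octets exactly as they appear in the packet (a V3 certification hashes just the User ID octets without that prefix). Here is a rough sketch of that one step. It assumes the raw packet body is already in memory; uid_hash_input is a made-up helper name, not a GnuPG function, and the key material hashed before it and the signature trailer hashed after it are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GnuPG): write the bytes that a V4
   certification signature hashes for a User ID into OUT: the octet
   0xB4, a 4-octet big-endian length, then the raw User ID octets.
   Returns the number of bytes written, or 0 if OUT is too small.  */
static size_t
uid_hash_input (const unsigned char *uid, size_t uidlen,
                unsigned char *out, size_t outsize)
{
  if (outsize < 5 || outsize - 5 < uidlen)
    return 0;

  out[0] = 0xb4;
  out[1] = (unsigned char)((uidlen >> 24) & 0xff);
  out[2] = (unsigned char)((uidlen >> 16) & 0xff);
  out[3] = (unsigned char)((uidlen >> 8) & 0xff);
  out[4] = (unsigned char)(uidlen & 0xff);

  /* No decoding or re-encoding: the octets are copied as they are.  */
  memcpy (out + 5, uid, uidlen);
  return uidlen + 5;
}
```

If a tool first decodes the User ID into a native string and then re-encodes it as UTF-8 before hashing, a single non-ASCII octet such as the 0xBE shown in the sanitized output becomes two or more octets, so both the length field and the data differ and the hash cannot match.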
Shalom-Salam,
Werner
====
#include <string.h>
#include <glib.h>

/* Return the user ID, making sure it is properly UTF-8 encoded.
   Allocates a new string, which must be freed with g_free ().  */
static gchar *
string_to_utf8 (const gchar *string)
{
  const char *s;

  if (!string)
    return NULL;

  /* Due to a bug in old and not so old PGP versions user IDs have
     been copied verbatim into the key.  Thus many users with Umlauts
     et al. in their name will see their names garbled.  Although this
     is not an issue for me (;-)), I have a couple of friends with
     Umlauts in their name, so let's try to make their life easier by
     detecting invalid encodings and converting them from Latin-1 to
     UTF-8.  We use this even for X.509 because it may make things
     even better given all the invalid encodings often found in X.509
     certificates.  */

  /* Skip the leading ASCII part; S ends up at the first byte with the
     high bit set, or at the terminating NUL.  */
  for (s = string; *s && !(*s & 0x80); s++)
    ;
  if (*s && ((s[1] & 0xc0) == 0x80) && ( ((*s & 0xe0) == 0xc0)
                                         || ((*s & 0xf0) == 0xe0)
                                         || ((*s & 0xf8) == 0xf0)
                                         || ((*s & 0xfc) == 0xf8)
                                         || ((*s & 0xfe) == 0xfc)) )
    {
      /* Possible utf-8 character followed by continuation byte.
         Although this might still be Latin-1 we better assume that it
         is valid utf-8.  */
      return g_strdup (string);
    }
  else if (*s && !strchr (string, 0xc3))
    {
      /* No 0xC3 character in the string; assume that it is Latin-1.  */
      return g_convert (string, -1, "UTF-8", "ISO-8859-1", NULL, NULL, NULL);
    }
  else
    {
      /* Everything else is assumed to be UTF-8.  We do this even
         though we know the encoding is not valid.  However as we only
         test the first non-ascii character, valid encodings might
         follow.  */
      return g_strdup (string);
    }
}
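
For anyone who wants to try the heuristic without GLib, here is a rough standalone sketch of the same test plus a plain Latin-1 to UTF-8 conversion. The names looks_like_latin1 and latin1_to_utf8 are made up for this illustration and are not part of GPA. The conversion assumes the input really is ISO-8859-1, which is only a guess in the same way the GPA code guesses.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: apply the same first-non-ASCII-byte test as
   string_to_utf8.  Returns 1 if STRING should be treated as Latin-1,
   0 if it is pure ASCII or assumed to be UTF-8.  */
static int
looks_like_latin1 (const char *string)
{
  const unsigned char *s = (const unsigned char *)string;

  while (*s && !(*s & 0x80))
    s++;
  if (!*s)
    return 0;  /* Pure ASCII, nothing to convert.  */
  if ((s[1] & 0xc0) == 0x80
      && ((*s & 0xe0) == 0xc0 || (*s & 0xf0) == 0xe0
          || (*s & 0xf8) == 0xf0 || (*s & 0xfc) == 0xf8
          || (*s & 0xfe) == 0xfc))
    return 0;  /* Lead byte plus continuation byte: assume UTF-8.  */
  /* No 0xC3 byte anywhere: assume Latin-1.  */
  return !memchr (string, 0xc3, strlen (string));
}

/* Hypothetical helper: convert a Latin-1 string to UTF-8.  Every
   Latin-1 byte >= 0x80 maps to exactly two UTF-8 bytes.  Returns a
   malloc'ed string (release with free) or NULL on allocation failure.  */
static char *
latin1_to_utf8 (const char *string)
{
  const unsigned char *s = (const unsigned char *)string;
  unsigned char *buf, *d;

  buf = d = malloc (2 * strlen (string) + 1);
  if (!buf)
    return NULL;
  for (; *s; s++)
    {
      if (*s < 0x80)
        *d++ = *s;
      else
        {
          *d++ = (unsigned char)(0xc0 | (*s >> 6));
          *d++ = (unsigned char)(0x80 | (*s & 0x3f));
        }
    }
  *d = 0;
  return (char *)buf;
}
```

Like the original, this is only meant for display. The bytes stored in the key must stay untouched for signature checking.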
--
Thoughts are free. Exceptions are regulated by a federal law.