[gnutls-devel] GnuTLS | gnutls_session_t unsafe to use from multiple threads due to TLS 1.3 rekeying (#1717)

Read-only notification of GnuTLS library development activities gnutls-devel at lists.gnutls.org
Fri Jun 13 13:44:27 CEST 2025



Daniel P_ Berrangé created an issue: https://gitlab.com/gnutls/gnutls/-/issues/1717



## Description of problem:

Note, this bug report comes out of a QEMU bug we've been struggling to resolve for a few years: https://gitlab.com/qemu-project/qemu/-/issues/1937

I've also seen https://gitlab.com/gnutls/gnutls/-/issues/1567 talking about SEGVs, which may well ultimately be the same problem as this bug. I've never managed to create any SEGV in my own testing though, only error return codes.

The GNUTLS docs about thread safety have a list of conditions which, if satisfied, are expected to allow a `gnutls_session_t` object to be used for concurrent I/O from 2 threads (ie for parallel `gnutls_record_send` & `gnutls_record_recv`, but **nothing** else):

https://gnutls.org/manual/gnutls.html#Thread-safety

AFAICT, even if an application follows that guidance, usage of `gnutls_session_t` from 2 threads for concurrent send & recv remains unsafe by default.

At the very least this appears always unsafe if TLS 1.3 is negotiated for the connection. I've not checked whether older TLS protocol versions are similarly impacted by default.

In my testing, any TLS 1.3 rekeying message, sent **or** received, will corrupt the internal session state if there are concurrent send & recv operations taking place in separate threads.

If the application sets `GNUTLS_NO_AUTO_REKEY` on their `gnutls_session_t` object, that will merely prevent *their end* of the connection *initiating* a rekey operation. 

The remote peer may still initiate a rekey operation, either automatically or manually. In addition, the rekey request from the remote peer might have set the flag that requires the local peer to initiate its own rekey for the other direction of the channel.

IOW setting `GNUTLS_NO_AUTO_REKEY` is insufficient to prevent rekeying in either direction on a TLS 1.3 session.


As an alternative, an application could arrange for `CHACHA20-POLY1305` to be the only listed algorithm in the priority string, since that algorithm does not require automatic rekeying. The problem with this is that there's no guarantee the remote peer will support it: only `AES-128` is mandated by the RFC IIUC, and an admin defined priority string on the remote peer may prevent `CHACHA20-POLY1305` being offered.

Let's assume the handshake does negotiate `CHACHA20-POLY1305`, such that auto-rekeying is not required. This is still not sufficient to guarantee that an application's usage of `gnutls_session_t` from multiple threads for send & recv is safe. The remote implementation may have (redundantly) enabled auto-rekeying for `CHACHA20-POLY1305`, or the remote peer's application code could initiate a manual rekeying at any time regardless of cipher choice.


AFAICT, the only way to safely use `gnutls_session_t` for concurrent I/O from two threads is to have an application level mutex acquired & released around `gnutls_record_send` and `gnutls_record_recv` calls, but that has problems which make it suboptimal:

* It eliminates concurrency entirely, defeating the point of using 2 threads
* It entirely prevents all progress if one thread sits in a blocking `gnutls_record_recv` while wanting to also call `gnutls_record_send`

To mitigate the second problem there are two options:

* Use `poll()` to detect readability/writability before calling `gnutls_record_send`/`gnutls_record_recv`, so they (hopefully) don't block.
* Register custom push/pull functions which then release + reacquire the mutex either side of the recv/send socket syscalls.
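A minimal sketch of the first option, assuming a plain socket file descriptor underneath the session. `fd_ready` is a hypothetical helper, not a GnuTLS API; the application would call it with `POLLIN` before `gnutls_record_recv` or with `POLLOUT` before `gnutls_record_send`, while not holding the mutex.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: wait until fd is ready for `events`
 * (exactly POLLIN or POLLOUT).
 * Returns 1 if ready (or hung up / in error, so the caller's next
 * I/O call will not block), 0 on timeout, -1 on failure. */
int fd_ready(int fd, short events, int timeout_ms)
{
    assert(events == POLLIN || events == POLLOUT);

    struct pollfd p = { .fd = fd, .events = events, .revents = 0 };
    int n;

    /* Retry if interrupted by a signal; note this restarts the timeout. */
    do {
        n = poll(&p, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n <= 0)
        return n;              /* 0 = timeout, -1 = error */
    if (p.revents & POLLNVAL)
        return -1;             /* fd is not open */
    return (p.revents & (events | POLLHUP | POLLERR)) ? 1 : 0;
}
```

Socket readiness only says that some bytes can be transferred, not that a complete TLS record is available, so a subsequent record call may still block on a blocking socket.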

The second option would rely on GNUTLS' concurrency problems being confined to exclusively **after** the syscall completes, or exclusively **before** the syscall is invoked. If there is anything in the GNUTLS rekey process which relies on state being maintained across the OS syscall, then this release+reacquire dance would still be potentially unsafe. My testing so far, though, hasn't exposed a problem with the release+reacquire dance inside the push+pull functions.
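The release+reacquire dance could look roughly like the sketch below. It is not the code from the attached test program: `struct io_ctx`, `unlocked_push` and `unlocked_pull` are hypothetical names, and the sketch assumes the calling thread already holds the application mutex around its `gnutls_record_send` / `gnutls_record_recv` call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-connection context, passed to the callbacks as the
 * transport pointer. */
struct io_ctx {
    int fd;                 /* connected socket */
    pthread_mutex_t *lock;  /* application mutex guarding the session */
};

/* Push callback: drop the mutex only for the duration of send(). */
ssize_t unlocked_push(void *ptr, const void *buf, size_t len)
{
    struct io_ctx *ctx = ptr;
    assert(ctx != NULL && ctx->lock != NULL);

    pthread_mutex_unlock(ctx->lock);
    ssize_t ret = send(ctx->fd, buf, len, 0);
    int saved = errno;      /* pthread_mutex_lock may clobber errno */
    pthread_mutex_lock(ctx->lock);
    errno = saved;
    return ret;
}

/* Pull callback: drop the mutex only for the duration of recv(). */
ssize_t unlocked_pull(void *ptr, void *buf, size_t len)
{
    struct io_ctx *ctx = ptr;
    assert(ctx != NULL && ctx->lock != NULL);

    pthread_mutex_unlock(ctx->lock);
    ssize_t ret = recv(ctx->fd, buf, len, 0);
    int saved = errno;
    pthread_mutex_lock(ctx->lock);
    errno = saved;
    return ret;
}
```

Since `gnutls_transport_ptr_t` is a `void *`, these functions should be usable with `gnutls_transport_set_push_function()` and `gnutls_transport_set_pull_function()`, with the context registered through `gnutls_transport_set_ptr()`. All cipher work still runs with the mutex held, so only the socket syscalls overlap.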

Both mitigations share the problem, however, that the application mutex prevents any concurrency on the cipher operations. Even with hardware accelerated AES, cipher operations can be a performance bottleneck, so neither is really a viable workaround if needing to maximise concurrency in the general case.

Conceptually I also don't like the idea of setting `GNUTLS_NO_AUTO_REKEY`, as that disables a cryptographic security recommendation from the TLS 1.3 RFC, and thus not rekeying could be considered a security bug / CVE in any app that does that.

AFAICT, the only way to fix this without impacting concurrency/performance, is for GNUTLS to have its own private mutex(es) or rw-lock to protect its internal session state across the rekey handling, when there are concurrent send/recv operations taking place.
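To illustrate the rw-lock idea only, here is a toy sketch. It is not GnuTLS code and makes no claim about how GnuTLS structures its session state: `struct toy_session` and the `toy_*` functions are hypothetical, and an XOR stands in for the real cipher. The record path takes the lock shared, so send and recv can run concurrently, while a rekey takes it exclusively.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy stand-in for session key state; NOT a GnuTLS structure. */
struct toy_session {
    pthread_rwlock_t lock;
    unsigned epoch;      /* bumped on every rekey */
    unsigned char key;   /* placeholder for real traffic keys */
};

int toy_init(struct toy_session *s, unsigned char key)
{
    assert(s != NULL);
    s->epoch = 0;
    s->key = key;
    return pthread_rwlock_init(&s->lock, NULL);
}

/* Record path: readers share the lock, so a sending thread and a
 * receiving thread can both be inside this function at once. */
unsigned char toy_crypt(struct toy_session *s, unsigned char in)
{
    pthread_rwlock_rdlock(&s->lock);
    unsigned char out = in ^ s->key;  /* XOR stands in for the cipher */
    pthread_rwlock_unlock(&s->lock);
    return out;
}

/* Rekey path: exclusive access, so no record operation can observe
 * a half-updated key. */
void toy_rekey(struct toy_session *s, unsigned char new_key)
{
    pthread_rwlock_wrlock(&s->lock);
    s->key = new_key;
    s->epoch++;
    pthread_rwlock_unlock(&s->lock);
}
```

In a real TLS 1.3 stack the read and write directions have separate keys and sequence numbers, so the actual locking granularity would need more thought than this sketch shows.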

## Version of gnutls used:
Fedora build of 3.8.9 and upstream git master build of af6a39894e0dc8e1dd3f9690f7fc011d0ffe86b5

## Distributor of gnutls (e.g., Ubuntu, Fedora, RHEL)
Fedora and Upstream.

## How reproducible:
Non-deterministic frequency, but will always eventually hit when TLS 1.3 is negotiated and concurrent `gnutls_record_{send,recv}` calls are made from 2 threads.

## Steps to Reproduce:
Use the following demo program, which acts as both a single threaded server and a two-threaded client (one send, one recv):

[tlsrekey.c](/uploads/d9a9465521cc287814085ec4e49b9fd3/tlsrekey.c)

A default build of it will operate correctly, since all rekeying is disabled on server and client.

To demonstrate the flaw with client initiated auto-rekeying:

```
$ gcc -g -Wall -lgnutls -o tlsrekey tlsrekey.c -DCLIENT_AUTO_REKEY
$ ./tlsrekey 
1569408: client sender
1569407: client receiver
1569409: server echo
1569408: send: The specified session has been invalidated for some reason.
1569407: recv: Decryption has failed.
```

This shows the client `gnutls_record_send()` call failing with invalid session, and the client `gnutls_record_recv()` call failing with bad decryption. It suggests the client's automatic rekey has corrupted its own session state.


To demonstrate the flaw with server initiated auto-rekeying:

```
$ gcc -g -Wall -lgnutls -o tlsrekey tlsrekey.c -DSERVER_AUTO_REKEY
$ ./tlsrekey 
1569654: client receiver
1569655: client sender
1569656: server echo
1569656: echo recv: Decryption has failed.
```

This shows the server gnutls_record_recv() call failing with bad decryption. The server is single threaded, however, so this is not a server flaw. It suggests the threaded client has sent bad data after receiving the server's rekey operation.

To demonstrate the flaw with server initiated manual rekeying:

```
$ gcc -g -Wall -lgnutls -o tlsrekey tlsrekey.c -DSERVER_MANUAL_REKEY
$ ./tlsrekey 
1569673: client receiver
1569674: client sender
1569675: server echo
1569675: echo recv: Decryption has failed.
```

This shows the same behaviour as the server auto-rekeying, likely triggering client session state corruption.

Note the response to server rekeying is somewhat non-deterministic - it often takes many rekeying requests from the server before the client goes wrong. IME, frequency of problems can be improved with ``GNUTLS_DEBUG_LEVEL=5`` since that tweaks the timing of the race.

In testing auto-rekeying failures, I advise making it rekey more often than every 16 million records to speed things up. The following patch does that, and also adds some printfs which make the race hit more frequently (less verbose than using ``GNUTLS_DEBUG_LEVEL`` for this purpose):

[rekey-faster.patch](/uploads/d34e69334e51db76ad0e96ddd23e328a/rekey-faster.patch)

When building the test program, `-DLOCK` and `-DUNLOCK_IO` can be defined to implement the workarounds discussed above and show that GNUTLS no longer corrupts its state. As mentioned though, these workarounds harm concurrency of I/O and/or cipher operations.

## Actual results:
`gnutls_record_recv` and `gnutls_record_send`, called concurrently from 2 threads, will periodically fail when a TLS 1.3 rekeying operation is performed, whether initiated by the local or remote peer, and whether automatic or manually initiated by the remote peer.

## Expected results:
`gnutls_record_recv` and `gnutls_record_send` can be safely called concurrently from 2 threads, provided they are the only GNUTLS APIs used (as per https://gnutls.org/manual/gnutls.html#Thread-safety)
