Mailing List Archive

clean_child_exit and watchdog threads
Looking for help on an issue in mod_watchdog use and child exits.

Occasionally an httpd child crashes due to a race between the child pool being destroyed and watchdog threads that are still running. The crash most likely manifests when OPENSSL_cleanup runs while another thread is generating a private key (since that takes relatively long).

Besides the crash report, nothing else really fails since the child is terminating anyway and no requests are ongoing. But still, it's not a nice thing.

Do you see an easy way to avoid this?

- Stefan
Re: clean_child_exit and watchdog threads [ In reply to ]
On Wed, Sep 25, 2019 at 11:54 AM Stefan Eissing
<stefan.eissing@greenbytes.de> wrote:
>
> Looking for help on an issue in mod_watchdog use and child exits.
>
> Occasionally an httpd child crashes due to a race between the child pool being destroyed and watchdog threads that are still running. The crash most likely manifests when OPENSSL_cleanup runs while another thread is generating a private key (since that takes relatively long).
>
> Besides the crash report, nothing else really fails since the child is terminating anyway and no requests are ongoing. But still, it's not a nice thing.
>
> Do you see an easy way to avoid this?

AFAICT, clean_child_exit() destroys pchild only, and pconf is still
alive at exit() time.
Couldn't we use pconf only in child_init hooks of mod_ssl and mod_watchdog?
Does something like the attached patch fix the crash?
Re: clean_child_exit and watchdog threads [ In reply to ]
The patch looks nice. Running it in my test suite over and over without any crash showing up!

Great work!

> On 25.09.2019, at 13:24, Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> [...]
> AFAICT, clean_child_exit() destroys pchild only, and pconf is still
> alive at exit() time.
> Couldn't we use pconf only in child_init hooks of mod_ssl and mod_watchdog?
> Does something like the attached patch fix the crash?
> <some_pchild_to_pconf.diff>
Re: clean_child_exit and watchdog threads [ In reply to ]
Hmm, far less likely, but still:

Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Exception Type: EXC_CRASH (SIGSEGV)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY

Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: httpd [6106]

Application Specific Information:
crashed on child side of fork pre-exec

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_malloc.dylib 0x00007fff6b2e1c8c free_tiny + 243
1 libcrypto.1.1.dylib 0x0000000101363484 OPENSSL_LH_delete + 228
2 libcrypto.1.1.dylib 0x00000001013730f9 OBJ_NAME_remove + 105
3 libcrypto.1.1.dylib 0x000000010136368a OPENSSL_LH_doall + 74
4 libcrypto.1.1.dylib 0x0000000101373338 OBJ_NAME_cleanup + 72
5 libcrypto.1.1.dylib 0x00000001013577de evp_cleanup_int + 14
6 libcrypto.1.1.dylib 0x0000000101360ccf OPENSSL_cleanup + 335
7 libsystem_c.dylib 0x00007fff6b1da3d6 __cxa_finalize_ranges + 326
8 libsystem_c.dylib 0x00007fff6b1da6b3 exit + 55
9 mod_mpm_event.so 0x0000000101531546 clean_child_exit + 54 (event.c:768)
10 mod_mpm_event.so 0x0000000101531382 child_main + 1698 (event.c:2551)
11 mod_mpm_event.so 0x0000000101530c84 make_child + 436
12 mod_mpm_event.so 0x000000010152f7e5 event_run + 1093 (event.c:3256)
13 httpd 0x0000000100f0366b ap_run_mpm + 75 (mpm_common.c:101)
14 httpd 0x0000000100ef4679 main + 2233 (main.c:848)
15 libdyld.dylib 0x00007fff6b1343d5 start + 1

Thread 1:
0 libsystem_pthread.dylib 0x00007fff6b327e02 pthread_rwlock_wrlock + 0
1 libcrypto.1.1.dylib 0x00000001013be209 CRYPTO_THREAD_write_lock + 9
2 libcrypto.1.1.dylib 0x000000010138f3a6 RAND_get_rand_method + 54
3 libcrypto.1.1.dylib 0x000000010138f675 RAND_priv_bytes + 21
4 libcrypto.1.1.dylib 0x00000001012b4bb2 bnrand + 178
5 libcrypto.1.1.dylib 0x00000001012b2f49 BN_generate_prime_ex + 665
6 libcrypto.1.1.dylib 0x00000001013979b7 RSA_generate_multi_prime_key + 1783
7 libcrypto.1.1.dylib 0x000000010139c2d0 pkey_rsa_keygen + 160
8 libcrypto.1.1.dylib 0x000000010135bc5b EVP_PKEY_keygen + 91
9 mod_md.so 0x0000000101552bb4 gen_rsa + 132 (md_crypt.c:464)
10 mod_md.so 0x000000010154a797 md_acme_acct_register + 887 (md_acme_acct.c:591)
11 mod_md.so 0x000000010154c83c md_acme_drive_set_acct + 1020 (md_acme_drive.c:158)
12 mod_md.so 0x000000010154fd61 md_acmev2_drive_renew + 81 (md_acmev2_drive.c:102)
13 mod_md.so 0x000000010154da40 acme_driver_renew + 1520
14 mod_md.so 0x000000010155f196 run_renew + 262 (md_reg.c:1066)
15 mod_md.so 0x0000000101564179 md_util_pool_vdo + 185 (md_util.c:54)
16 mod_md.so 0x000000010155f08a md_reg_renew + 42 (md_reg.c:1075)
17 mod_md.so 0x00000001015423dc run_watchdog + 668 (mod_md_drive.c:127)
18 mod_watchdog.so 0x00000001014fd4fc wd_worker + 636 (mod_watchdog.c:202)
19 libsystem_pthread.dylib 0x00007fff6b3282eb _pthread_body + 126
20 libsystem_pthread.dylib 0x00007fff6b32b249 _pthread_start + 66
21 libsystem_pthread.dylib 0x00007fff6b32740d thread_start + 13


Re: clean_child_exit and watchdog threads [ In reply to ]
On Wed, Sep 25, 2019 at 3:07 PM Stefan Eissing
<stefan.eissing@greenbytes.de> wrote:
>
> Hmm, far less likely, but still:

Likewise, I think the MPMs themselves shouldn't use pchild for their
internal allocations possibly still in use at exit().
So v2 (attached) may be the thing..
Re: clean_child_exit and watchdog threads [ In reply to ]
Oh, actually the stacktrace shows openssl, which cleans itself up on
exit(), i.e. via an atexit() callback or the like (which is preserved on
fork() too..).
To avoid this, we may want to use OPENSSL_INIT_NO_ATEXIT at
OPENSSL_init() time and call OPENSSL_cleanup() explicitly when needed.

Re: clean_child_exit and watchdog threads [ In reply to ]
I see no improvements to the first one. As far as I understand what is going on:

- shutdown of mpm waits for all ongoing requests to finish
- mpm exits

watchdog tries to cope with this by
- wake up every 100ms (when idle)
- poll mpm state
- exit watchdog thread when STOPPING

But when the watchdog is busy running, the mpm will exit() and the at_sysexit will tear OpenSSL down.

I think what we need is a way to let watchdog register its workers at the mpm and treat them similarly to ongoing requests?

- Stefan
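
The registration idea above can be sketched as a small counter that workers enter and leave and that shutdown drains, much like the mpm waits for ongoing requests. This is only a minimal sketch under stated assumptions, not existing httpd code: `wd_registry`, `wd_enter`, `wd_leave` and `wd_drain` are hypothetical names, and neither mod_watchdog nor the MPMs offer such an API in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <time.h>

/* Hypothetical registry shared by the watchdog workers and the shutdown path. */
typedef struct {
    atomic_int busy;      /* callbacks currently running */
    atomic_int stopping;  /* non-zero once shutdown has begun */
} wd_registry;

/* Worker side: call before starting a callback.
 * Returns 0 if shutdown has begun, in which case the callback must not start.
 * The counter is raised before the flag is read, so a drain that has already
 * set the flag either sees this worker as busy or makes it back off. */
static int wd_enter(wd_registry *r)
{
    atomic_fetch_add(&r->busy, 1);
    if (atomic_load(&r->stopping)) {
        atomic_fetch_sub(&r->busy, 1);
        return 0;
    }
    return 1;
}

/* Worker side: call when the callback returns. */
static void wd_leave(wd_registry *r)
{
    atomic_fetch_sub(&r->busy, 1);
}

/* Shutdown side: refuse new callbacks, then poll in roughly 1 ms steps until
 * the running ones finish or about grace_ms milliseconds have passed.
 * Returns the number of callbacks still running. */
static int wd_drain(wd_registry *r, int grace_ms)
{
    const struct timespec tick = { 0, 1000000L };  /* 1 ms */
    int waited_ms = 0;

    atomic_store(&r->stopping, 1);
    while (atomic_load(&r->busy) > 0 && waited_ms < grace_ms) {
        nanosleep(&tick, NULL);
        waited_ms++;
    }
    return atomic_load(&r->busy);
}
```

A worker would wrap each callback in `wd_enter()`/`wd_leave()`, and the child would call `wd_drain()` with its grace period before `exit()`. A non-zero return means a callback is still running, so the race remains unless the child waits longer or skips the library teardown.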

> On 25.09.2019, at 15:10, Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> On Wed, Sep 25, 2019 at 3:07 PM Stefan Eissing
> <stefan.eissing@greenbytes.de> wrote:
>>
>> Hmm, far less likely, but still:
>
> Likewise, I think the MPMs themselves shouldn't use pchild for their
> internal allocations possibly still in use at exit().
> So v2 (attached) may be the thing..
> <some_pchild_to_pconf-v2.diff>
Re: clean_child_exit and watchdog threads [ In reply to ]
> On 25.09.2019, at 15:30, Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> Oh, actually the stacktrace shows openssl, which cleans itself up on
> exit(), i.e. via an atexit() callback or the like (which is preserved on
> fork() too..).
> To avoid this, we may want to use OPENSSL_INIT_NO_ATEXIT at
> OPENSSL_init() time and call OPENSSL_cleanup() explicitly when needed.

I am not sure this will address the issue. If the watchdog thread stays running, any teardown of OpenSSL will be too early.

Does mod_watchdog maybe need a child pool cleanup that waits for its workers to shut down?

Re: clean_child_exit and watchdog threads [ In reply to ]
On Wed, Sep 25, 2019 at 3:56 PM Stefan Eissing
<stefan.eissing@greenbytes.de> wrote:
>
> > On 25.09.2019, at 15:30, Yann Ylavic <ylavic.dev@gmail.com> wrote:
> >
> > Oh, actually the stacktrace shows openssl, which cleans itself up on
> > exit(), i.e. via an atexit() callback or the like (which is preserved on
> > fork() too..).
> > To avoid this, we may want to use OPENSSL_INIT_NO_ATEXIT at
> > OPENSSL_init() time and call OPENSSL_cleanup() explicitly when needed.
>
> I am not sure this will address the issue. If the watchdog thread stays running, any teardown of OpenSSL will be too early.

Yeah, but I see no reason why openssl should tear down in child
processes, we don't need/want this.
But that's where we are, the new automagic OPENSSL_init/shutdown() is a
mess IMHO, and an even bigger mess when it comes to DSOs!

>
> Does mod_watchdog maybe need a child pool cleanup that waits for its workers to shut down?

That wouldn't fit ungraceful restarts, i.e. exit ASAP. Would we
forcibly kill threads at some point?
The solution could be to _exit() in clean_child_exit() to avoid
atexit() callbacks (which exit() calls, but not _exit()).
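
The difference between exit() and _exit() that this suggestion relies on can be checked in isolation. The following is a minimal sketch using only POSIX calls: a forked child registers an atexit() handler that stands in for a library teardown such as OPENSSL_cleanup(), then leaves through either call. The names `fake_library_cleanup` and `atexit_handler_ran` are made up for this illustration and have nothing to do with httpd's clean_child_exit().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;  /* write end of the pipe, set in the child only */

/* Stand-in for a library teardown registered via atexit(). */
static void fake_library_cleanup(void)
{
    const char mark = 'C';
    ssize_t n = write(report_fd, &mark, 1);
    (void)n;
}

/* Fork a child that registers the handler and then terminates with
 * exit() (use_underscore_exit == 0) or _exit() (use_underscore_exit != 0).
 * Returns 1 if the handler ran, 0 if it did not, -1 on error. */
static int atexit_handler_ran(int use_underscore_exit)
{
    int fds[2];
    pid_t pid;
    char buf;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep only the write end and register the handler. */
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(fake_library_cleanup) != 0)
            _exit(2);
        if (use_underscore_exit)
            _exit(0);   /* skips atexit() handlers */
        exit(0);        /* runs atexit() handlers */
    }

    /* Parent: one byte means the handler ran, EOF means it did not. */
    close(fds[1]);
    n = read(fds[0], &buf, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    if (n == 1)
        return 1;
    return n == 0 ? 0 : -1;
}
```

`atexit_handler_ran(0)` returns 1 and `atexit_handler_ran(1)` returns 0: with _exit() the handler never gets a chance to tear anything down under a thread that is still running.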
Re: clean_child_exit and watchdog threads [ In reply to ]
> -----Original Message-----
> From: Yann Ylavic <ylavic.dev@gmail.com>
> Sent: Wednesday, September 25, 2019 15:10
> To: httpd-dev <dev@httpd.apache.org>
> Subject: Re: clean_child_exit and watchdog threads
>
> On Wed, Sep 25, 2019 at 3:07 PM Stefan Eissing
> <stefan.eissing@greenbytes.de> wrote:
> >
> > Hmm, far less likely, but still:
>
> Likewise, I think the MPMs themselves shouldn't use pchild for their
> internal allocations possibly still in use at exit().
> So v2 (attached) may be the thing..

Hm, haven't checked, but aren't there any cleanups that should run and
currently run before exit that will not run any longer when we tie
stuff to pconf instead of pchild?
I guess pure allocations are not a problem, since the process dies,
but I would be a little worried about other OS resources like
shared memory or locks not being cleaned up properly.
Regarding the watchdog threads I guess we could handle this
like Stefan suggested by handling it similar to still running connections.
Give them a grace period and kill them afterwards during regular shutdown.
For an immediate shutdown kill them off directly.

Regards

Rüdiger
Re: clean_child_exit and watchdog threads [ In reply to ]
On Thu, Sep 26, 2019 at 8:20 AM Pluem, Ruediger, Vodafone Group
<ruediger.pluem@vodafone.com> wrote:
>
> > -----Original Message-----
> > From: Yann Ylavic <ylavic.dev@gmail.com>
> >
> > Likewise, I think the MPMs themselves shouldn't use pchild for their
> > internal allocations possibly still in use at exit().
> > So v2 (attached) may be the thing..
>
> Hm, haven't checked, but aren't there any cleanups that should run and
> currently run before exit that will not run any longer when we tie
> stuff to pconf instead of pchild?
> I guess pure allocations are not a problem, since the process dies,
> but I would be a little worried about other OS resources like
> shared memory or locks not being cleaned up properly.

I think you are right, proc mutexes at least need to be cleaned up
properly on child exit.
I updated the patch (attached) to keep them on pchild.

> Regarding the watchdog threads I guess we could handle this
> like Stefan suggested by handling it similar to still running connections.
> Give them a grace period and kill them afterwards during regular shutdown.
> For an immediate shutdown kill them off directly.

Killing threads is going to be hard to achieve, all the more so in a
portable way. There is no apr_thread_kill() for instance,
pthread_kill() is not suitable, I know of tgkill() on linux...
But we shouldn't take that road IMHO, and regarding the state of
shared/proc resources potentially used by these threads it looks like
a can of worms..
Asking for watchdog callbacks (including third-parties') to
[un]gracefully stop is not something in the current "contract"
unfortunately, we are quite weaponless here I'm afraid.

So I can only think of _exit() like in attached v3, although in
addition to not running atexit() handlers _exit() also potentially does
not flush stdio buffers, but all fds are closed so pending outputs should
still finish (for whatever that means in linux/BSD docs..).
This is still going to be racy with anything initialized on pchild
though, like mod_ssl's cache mutexes (session, stapling) :/

Regards,
Yann.
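
The stdio side effect mentioned above can be illustrated the same way as the atexit() one. This is a minimal sketch assuming a POSIX system where _exit() leaves open streams unflushed (POSIX specifies this, and glibc behaves that way): a forked child writes through a fully buffered stream on a pipe and never calls fflush(). The helper name `buffered_output_arrived` is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes "log" through a buffered FILE* on a pipe and
 * then terminates with exit() (use_underscore_exit == 0) or _exit()
 * (use_underscore_exit != 0) without flushing.
 * Returns 1 if the parent received data, 0 if it saw only EOF, -1 on error. */
static int buffered_output_arrived(int use_underscore_exit)
{
    int fds[2];
    pid_t pid;
    char buf[8];
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        FILE *out;

        close(fds[0]);
        out = fdopen(fds[1], "w");  /* fully buffered: a pipe is not a tty */
        if (out == NULL)
            _exit(2);
        fputs("log", out);          /* stays in the user-space buffer */
        if (use_underscore_exit)
            _exit(0);               /* buffer is dropped, fd is closed */
        exit(0);                    /* buffer is flushed, then fd is closed */
    }

    close(fds[1]);
    n = read(fds[0], buf, sizeof buf);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    if (n > 0)
        return 1;
    return n == 0 ? 0 : -1;
}
```

Under that assumption `buffered_output_arrived(0)` returns 1 and `buffered_output_arrived(1)` returns 0. Output that already reached the kernel through write() is unaffected; only data still sitting in a stdio buffer is lost.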
Re: clean_child_exit and watchdog threads [ In reply to ]
Coming back to this, I ran the v3 patch several times in the mod_md test suite and had no crashes reported.

Now, about the side effects you mentioned, I cannot judge how severe that is and if it is worth it. mod_md itself does not mind if the child process is exiting early. It's just the crash reported about OpenSSL that is annoying.

I think we should have a not-crashing server, but there is no urgency for a quick fix.

Cheers, Stefan

> On 26.09.2019, at 13:10, Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> [...]
> So I can only think of _exit() like in attached v3, although in
> addition to not running atexit() handlers _exit() also potentially does
> not flush stdio buffers, but all fds are closed so pending outputs should
> still finish (for whatever that means in linux/BSD docs..).
> This is still going to be racy with anything initialized on pchild
> though, like mod_ssl's cache mutexes (session, stapling) :/
>
> Regards,
> Yann.
> <some_pchild_to_pconf-v3.diff>