Nginx nchan module causes batch SSL certificate renewal failures
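Background
During a routine check of automatic SSL certificate renewal, certbot renew --dry-run reported a large number of failures. Of 13 domains, the first 6 renewed successfully and the remaining 7 all failed with a 504 Gateway Timeout error.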
Environment
| Component | Version / configuration |
|---|---|
| Nginx | 1.24.0 |
| Certbot | 2.9.0 |
| Number of domains | 13 |
Error message
Certbot failed to authenticate some domains (authenticator: nginx).
The Certificate Authority reported these problems:
Domain: example.a.com
Type: unauthorized
Detail: 12.34.56.78: Invalid response from
http://example.a.com/.well-known/acme-challenge/xxx: 504
Initial investigation
1. Check Nginx status
The Nginx service reported as running normally, but something was off:
$ pgrep nginx | wc -l
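147
There were 147 nginx processes, far more than normal (typically 1 master + N workers).
In addition, none of the services proxied by nginx were reachable, which suggested the processes were blocked.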
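To see whether the extra processes are live workers, old workers still shutting down after a reload, or something else, the process titles can be tallied. This is a minimal sketch, assuming nginx's default process titles (`nginx: master process`, `nginx: worker process`, `nginx: worker process is shutting down`). The function name `summarize_nginx_procs` is made up for illustration.

```shell
#!/bin/sh
# Tally nginx processes by role from `ps -eo args`-style lines on stdin.
# Assumes the default nginx process titles; lines from other programs are ignored.
summarize_nginx_procs() {
    awk '
        /^nginx: master process/                  { master++;  next }
        /^nginx: worker process is shutting down/ { closing++; next }
        /^nginx: worker process/                  { worker++;  next }
        /^nginx: /                                { other++ }
        END {
            printf "master=%d worker=%d shutting_down=%d other=%d\n",
                   master, worker, closing, other
        }'
}

# Usage on a live host:
#   ps -eo args | summarize_nginx_procs
```

A large shutting_down count would point to old workers that never finished exiting after repeated reloads, while a large worker count would point elsewhere.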
2. Check the error log
$ grep "23:18" /var/log/nginx/error.log
This turned up a large number of worker process crashes:
2026/04/21 23:18:20 [alert] 149662#149662: worker process 153306 exited on signal 6 (core dumped)
2026/04/21 23:18:20 [alert] 149662#149662: shared memory zone "memstore" was locked by 153306
2026/04/21 23:18:20 [alert] 149662#149662: worker process 153307 exited on signal 6 (core dumped)
2026/04/21 23:18:20 [alert] 149662#149662: shared memory zone "memstore" was locked by 153307
...
Counting the crashes:
$ grep -c "exited on signal 6" /var/log/nginx/error.log
2361
That is 2361 worker process crashes.
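A single total does not show when the crashes happened or which shared memory zone was involved. The sketch below groups the crash entries by minute and lists the zones named in the "was locked by" alerts. It assumes the default error log timestamp prefix (YYYY/MM/DD HH:MM:SS) and a grep that supports `-o`; the function names are illustrative only.

```shell
#!/bin/sh
# Count "exited on signal 6" entries per minute in an nginx error log.
# $1 = path to the log. Output lines look like: <count> <date> <HH:MM>
crashes_per_minute() {
    grep 'exited on signal 6' "$1" \
        | awk '{ print $1, substr($2, 1, 5) }' \
        | sort | uniq -c
}

# List the shared memory zones reported as locked by a dead worker.
# $1 = path to the log. Output lines look like: <count> <zone name>
locked_zones() {
    grep -o 'shared memory zone "[^"]*" was locked' "$1" \
        | cut -d'"' -f2 \
        | sort | uniq -c
}

# Usage:
#   crashes_per_minute /var/log/nginx/error.log
#   locked_zones /var/log/nginx/error.log
```

If the per-minute counts line up with the times certbot reloaded nginx, that supports the reload-triggered crash explanation.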
Analysis
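certbot renew → modify nginx config → nginx reload
→ nchan module bug → workers crash
→ requests cannot be served → Let's Encrypt waits more than 60 seconds → 504 timeout
certbot processes certificates sequentially. For each certificate it has to:
- Modify the nginx configuration to add a temporary validation path
- Reload nginx
- Wait for Let's Encrypt to validate
- Restore the configuration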
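By the time the 7th certificate was being processed, the repeated reloads had triggered a bug in the nchan module, and worker processes crashed en masse. From that point nginx could no longer respond to requests, so every subsequent certificate validation timed out and failed.
According to Nginx Ticket #1135, the nchan module has a known issue during nginx reload: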
After upgrading from 1.10.1 without ALPN support to 1.10.2 with ALPN support... we've been getting into situations where Nginx completely stops serving connections without any warning.
The nginx error log on the affected hosts gets these odd messages:
worker process exited on signal 6 (core dumped)
shared memory zone "memstore" was locked by xxx
Solution
Disable the nchan module
# 1. Locate the nchan module configuration
ls -la /etc/nginx/modules-enabled/ | grep nchan
# lrwxrwxrwx 1 root root 49 Apr 20 18:50 50-mod-nchan.conf -> /usr/share/nginx/modules-available/mod-nchan.conf
# 2. Remove the symlink (this disables the module)
sudo rm /etc/nginx/modules-enabled/50-mod-nchan.conf
# 3. Test the configuration
sudo nginx -t
# nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
# nginx: configuration file /etc/nginx/nginx.conf test is successful
# 4. Restart nginx
sudo service nginx restart
Then rerun the certbot renewal test:
sudo certbot renew --dry-run
Result: all 13 certificates renewed successfully.
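Deleting the symlink outright makes the change harder to undo if the module is needed again later. As a possible variation on step 2, the sketch below records the command that would recreate the link before removing it. The function name `disable_module_link` and the restore log path in the usage comment are hypothetical and not part of the original procedure.

```shell
#!/bin/sh
# Remove a module symlink, first appending the command needed to restore it.
# $1 = path to the symlink
# $2 = file to append the restore command to
disable_module_link() {
    link=$1
    restore_log=$2
    # Refuse to touch anything that is not a symlink.
    [ -L "$link" ] || { echo "not a symlink: $link" >&2; return 1; }
    target=$(readlink "$link") || return 1
    # Record how to undo the change, then remove the link.
    printf 'ln -s %s %s\n' "$target" "$link" >> "$restore_log"
    rm -- "$link"
}

# Usage (as root; the log path is only an example):
#   disable_module_link /etc/nginx/modules-enabled/50-mod-nchan.conf \
#       /root/nginx-module-restore.log
```

After running it, the same nginx -t and restart steps apply. Re-enabling the module later means running the recorded ln -s line and restarting nginx.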
