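To keep connections to the backend alive, download the keepalive module (http://mdounin.ru/hg/ngx_http_upstream_keepalive).
Note that in multi-process mode you also need to set accept_mutex off;
Assuming you already know how to use the keepalive module, let's look at how to keep connections alive with fastcgi.
By default, nginx talks to fastcgi like this (even if you have configured keepalive as described above):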
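For reference, a minimal configuration sketch for this setup might look as follows. The upstream name fcgi_backend, the address 127.0.0.1:9000 (taken from the port seen in the captures), the worker count, and the location pattern are assumptions for illustration, not values from the original setup.

```nginx
worker_processes  2;

events {
    accept_mutex  off;              # as noted above for multi-process mode
}

http {
    upstream fcgi_backend {         # hypothetical upstream name
        server 127.0.0.1:9000;      # assumed backend address
        keepalive 1;                # directive provided by the keepalive module
    }

    server {
        listen 80;

        location ~ \.fcgi$ {        # hypothetical location
            include       fastcgi_params;
            fastcgi_pass  fcgi_backend;
        }
    }
}
```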
[root@localhost ~]# tcpdump -i lo -s 1500 port 9000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 1500 bytes
15:23:16.901004 IP localhost.localdomain.50867 > localhost.localdomain.9000: S 3482201970:3482201970(0) win 32767 <mss 16396,sackOK,timestamp 2296841391 0,nop,wscale 7>
15:23:16.901025 IP localhost.localdomain.9000 > localhost.localdomain.50867: S 3473410857:3473410857(0) ack 3482201971 win 32767 <mss 16396,sackOK,timestamp 2296841391 2296841391,nop,wscale 7>
15:23:16.901039 IP localhost.localdomain.50867 > localhost.localdomain.9000: . ack 1 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901150 IP localhost.localdomain.50867 > localhost.localdomain.9000: P 1:1377(1376) ack 1 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901170 IP localhost.localdomain.9000 > localhost.localdomain.50867: . ack 1377 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901214 IP localhost.localdomain.9000 > localhost.localdomain.50867: P 1:97(96) ack 1377 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901222 IP localhost.localdomain.50867 > localhost.localdomain.9000: . ack 97 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901236 IP localhost.localdomain.9000 > localhost.localdomain.50867: F 97:97(0) ack 1377 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901822 IP localhost.localdomain.50867 > localhost.localdomain.9000: F 1377:1377(0) ack 98 win 256 <nop,nop,timestamp 2296841392 2296841391>
15:23:16.901836 IP localhost.localdomain.9000 > localhost.localdomain.50867: . ack 1378 win 256 <nop,nop,timestamp 2296841392 2296841392>
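The capture shows that it is the backend that closes the connection (the first FIN comes from port 9000), so simply configuring keepalive in nginx.conf has no effect.
Looking at ngx_http_fastcgi_module.c, the ngx_http_fastcgi_begin_request_t structure has a flags field.
This field controls whether the backend closes the connection itself: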
typedef struct {
    u_char  role_hi;
    u_char  role_lo;
    u_char  flags;        /* controls whether the backend closes the connection */
    u_char  reserved[5];
} ngx_http_fastcgi_begin_request_t;

The FastCGI specification says the following:
Closing Transport Connections The Web server controls the lifetime of transport connections. The Web server can close a connection when no requests are active. Or the Web server can delegate close authority to the application (see FCGI_BEGIN_REQUEST). In this case the application closes the connection at the end of a specified request.
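To make the role of the flags byte concrete, here is a minimal standalone sketch that builds a FCGI_BEGIN_REQUEST record the way the FastCGI specification lays it out (an 8-byte header followed by an 8-byte body). The function name fcgi_build_begin_request is made up for this illustration and is not part of nginx or libfcgi; the constant values are the ones defined by the specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Constants from the FastCGI specification. */
enum {
    FCGI_VERSION_1     = 1,
    FCGI_BEGIN_REQUEST = 1,   /* record type */
    FCGI_RESPONDER     = 1,   /* role */
    FCGI_KEEP_CONN     = 1    /* flags bit: application must not close the connection */
};

/*
 * Hypothetical helper: fill out[0..15] with a BEGIN_REQUEST record
 * (8-byte header + 8-byte body) and return the record size.
 */
size_t fcgi_build_begin_request(unsigned char out[16], unsigned request_id,
                                int keep_conn)
{
    memset(out, 0, 16);

    /* header */
    out[0] = FCGI_VERSION_1;
    out[1] = FCGI_BEGIN_REQUEST;
    out[2] = (unsigned char) ((request_id >> 8) & 0xff);  /* request_id_hi */
    out[3] = (unsigned char) (request_id & 0xff);         /* request_id_lo */
    out[4] = 0;                                           /* content_length_hi */
    out[5] = 8;                                           /* content_length_lo */
    /* out[6] padding_length and out[7] reserved stay 0 */

    /* body */
    out[8]  = 0;                                          /* role_hi */
    out[9]  = FCGI_RESPONDER;                             /* role_lo */
    out[10] = keep_conn ? FCGI_KEEP_CONN : 0;             /* flags */
    /* out[11..15] reserved stay 0 */

    return 16;
}
```

With keep_conn set to 0 the application is given authority to close the connection after the request; with FCGI_KEEP_CONN set, the web server keeps that authority.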
Now look at how this is initialized in ngx_http_fastcgi_module.c:
475 static ngx_http_fastcgi_request_start_t  ngx_http_fastcgi_request_start = {
476     { 1,                                               /* version */
477       NGX_HTTP_FASTCGI_BEGIN_REQUEST,                  /* type */
478       0,                                               /* request_id_hi */
479       1,                                               /* request_id_lo */
480       0,                                               /* content_length_hi */
481       sizeof(ngx_http_fastcgi_begin_request_t),        /* content_length_lo */
482       0,                                               /* padding_length */
483       0 },                                             /* reserved */
484
485     { 0,                                               /* role_hi */
486       NGX_HTTP_FASTCGI_RESPONDER,                      /* role_lo */
487       0, /* NGX_HTTP_FASTCGI_KEEP_CONN */              /* flags */
488       { 0, 0, 0, 0, 0 } },                             /* reserved[5] */
489
490     { 1,                                               /* version */
491       NGX_HTTP_FASTCGI_PARAMS,                         /* type */
492       0,                                               /* request_id_hi */
493       1 },                                             /* request_id_lo */
494
495 };
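We change the 0 on line 487 (the flags field) to 1, so that the backend does not close the connection on its own.
This change alone is still not enough to keep the connection alive. We also need to add a few lines to the ngx_http_fastcgi_finalize_request function in ngx_http_fastcgi_module.c so that nginx keeps the connection open, namely setting u->length = 0, which lets the keepalive module decide whether the connection should be kept.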
static void
ngx_http_fastcgi_finalize_request(ngx_http_request_t *r, ngx_int_t rc)
{
    ngx_http_upstream_t  *u = r->upstream;

    /* added: tell the keepalive module the response has been fully read */
    if (u != NULL) {
        u->length = 0;
    }

    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                   "finalize http fastcgi request");

    return;
}

To make nginx return the response to the client promptly, src/event/ngx_event_pipe.c also has to be modified.
The original code is:

while (cl && n > 0) {

    ngx_event_pipe_remove_shadow_links(cl->buf);

    size = cl->buf->end - cl->buf->last;

    if (n >= size) {
        cl->buf->last = cl->buf->end;

        /* STUB */ cl->buf->num = p->num++;

        if (p->input_filter(p, cl->buf) == NGX_ERROR) {
            return NGX_ABORT;
        }

        n -= size;
        ln = cl;
        cl = cl->next;
        ngx_free_chain(p->pool, ln);

    } else {
        cl->buf->last += n;
        n = 0;
    }
}

Modified as follows:

while (cl && n > 0) {

    ngx_event_pipe_remove_shadow_links(cl->buf);

    size = cl->buf->end - cl->buf->last;

    if (n >= size) {
        cl->buf->last = cl->buf->end;
        n -= size;

    } else {
        cl->buf->last += n;
        n = 0;
    }

    /* STUB */ cl->buf->num = p->num++;

    if (p->input_filter(p, cl->buf) == NGX_ERROR) {
        return NGX_ABORT;
    }

    ln = cl;
    cl = cl->next;
    ngx_free_chain(p->pool, ln);
}
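The effect of this change can be illustrated with a small standalone model. This is a simplified sketch, not nginx code: buffers_passed and its parameters are hypothetical names, each buffer is reduced to its remaining free space, and "passed" stands for a call to p->input_filter. In the original loop only completely filled buffers reach the input filter; in the modified loop a partially filled buffer is handed over as well.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the read loop: n bytes have just been read into a chain of
 * buffers whose remaining capacities are free_space[0..nbufs-1].
 * Returns how many buffers would be handed to the input filter.
 * flush_partial == 0 models the original loop, 1 models the modified one.
 */
size_t buffers_passed(const size_t *free_space, size_t nbufs, size_t n,
                      int flush_partial)
{
    size_t  i, passed = 0;

    for (i = 0; i < nbufs && n > 0; i++) {
        if (n >= free_space[i]) {
            /* buffer filled completely: both variants pass it on */
            n -= free_space[i];
            passed++;

        } else {
            /* buffer only partially filled */
            n = 0;
            if (flush_partial) {
                passed++;
            }
        }
    }

    return passed;
}
```

For example, with two buffers of 4096 free bytes each and a 2984-byte response (the size seen in the captures), the original variant passes nothing to the filter until more data arrives or the connection is closed, while the modified variant passes the first buffer immediately. On a connection that stays open, that difference is why the response would otherwise be delayed.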
After these changes, run nginx again and capture the traffic:
[root@localhost ~]# tcpdump -i lo -s 1500 port 9000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 1500 bytes
11:14:03.711955 IP localhost.localdomain.50708 > localhost.localdomain.9000: S 2207544028:2207544028(0) win 32767 <mss 16396,sackOK,timestamp 3491628977 0,nop,wscale 7>
11:14:03.712218 IP localhost.localdomain.9000 > localhost.localdomain.50708: S 2221134347:2221134347(0) ack 2207544029 win 32767 <mss 16396,sackOK,timestamp 3491628977 3491628977,nop,wscale 7>
11:14:03.712241 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 1 win 256 <nop,nop,timestamp 3491628977 3491628977>
11:14:03.712257 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 1:1257(1256) ack 1 win 256 <nop,nop,timestamp 3491628977 3491628977>
11:14:03.712273 IP localhost.localdomain.9000 > localhost.localdomain.50708: . ack 1257 win 256 <nop,nop,timestamp 3491628977 3491628977>
11:14:03.711969 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 1:2985(2984) ack 1257 win 256 <nop,nop,timestamp 3491628978 3491628977>
11:14:03.711980 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 2985 win 303 <nop,nop,timestamp 3491628978 3491628978>
11:14:05.738632 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 1257:2513(1256) ack 2985 win 303 <nop,nop,timestamp 3491631005 3491628978>
11:14:05.738832 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 2985:5969(2984) ack 2513 win 256 <nop,nop,timestamp 3491631005 3491631005>
11:14:05.738848 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 5969 win 303 <nop,nop,timestamp 3491631005 3491631005>
11:14:06.901924 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 2513:3769(1256) ack 5969 win 303 <nop,nop,timestamp 3491632168 3491631005>
11:14:06.902098 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 5969:8953(2984) ack 3769 win 256 <nop,nop,timestamp 3491632168 3491632168>
11:14:06.902110 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 8953 win 303 <nop,nop,timestamp 3491632168 3491632168>
11:14:07.570211 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 3769:5025(1256) ack 8953 win 303 <nop,nop,timestamp 3491632836 3491632168>
11:14:07.570387 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 8953:11937(2984) ack 5025 win 256 <nop,nop,timestamp 3491632837 3491632836>
11:14:07.570399 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 11937 win 303 <nop,nop,timestamp 3491632837 3491632837>
11:14:08.202399 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 5025:6281(1256) ack 11937 win 303 <nop,nop,timestamp 3491633469 3491632837>
11:14:08.202473 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 11937:14921(2984) ack 6281 win 256 <nop,nop,timestamp 3491633469 3491633469>
11:14:08.202483 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 14921 win 303 <nop,nop,timestamp 3491633469 3491633469>
11:14:09.475039 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 6281:7537(1256) ack 14921 win 303 <nop,nop,timestamp 3491634742 3491633469>
11:14:09.475277 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 14921:17905(2984) ack 7537 win 256 <nop,nop,timestamp 3491634742 3491634742>
11:14:09.475291 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 17905 win 303 <nop,nop,timestamp 3491634742 3491634742>
11:14:10.082268 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 7537:8793(1256) ack 17905 win 303 <nop,nop,timestamp 3491635349 3491634742>
11:14:10.082512 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 17905:20889(2984) ack 8793 win 256 <nop,nop,timestamp 3491635349 3491635349>
11:14:10.082522 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 20889 win 303 <nop,nop,timestamp 3491635349 3491635349>
11:14:10.818134 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 8793:10049(1256) ack 20889 win 303 <nop,nop,timestamp 3491636085 3491635349>
11:14:10.818252 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 20889:23873(2984) ack 10049 win 256 <nop,nop,timestamp 3491636085 3491636085>
11:14:10.818263 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 23873 win 303 <nop,nop,timestamp 3491636085 3491636085>
11:14:11.506168 IP localhost.localdomain.50711 > localhost.localdomain.9000: S 2218187766:2218187766(0) win 32767 <mss 16396,sackOK,timestamp 3491636773 0,nop,wscale 7>
11:14:11.506191 IP localhost.localdomain.9000 > localhost.localdomain.50711: S 2224663648:2224663648(0) ack 2218187767 win 32767 <mss 16396,sackOK,timestamp 3491636773 3491636773,nop,wscale 7>
11:14:11.506205 IP localhost.localdomain.50711 > localhost.localdomain.9000: . ack 1 win 256 <nop,nop,timestamp 3491636773 3491636773>
11:14:11.506318 IP localhost.localdomain.50711 > localhost.localdomain.9000: P 1:1257(1256) ack 1 win 256 <nop,nop,timestamp 3491636773 3491636773>
11:14:11.506329 IP localhost.localdomain.9000 > localhost.localdomain.50711: . ack 1257 win 256 <nop,nop,timestamp 3491636773 3491636773>
11:15:11.506325 IP localhost.localdomain.50711 > localhost.localdomain.9000: F 1257:1257(0) ack 1 win 256 <nop,nop,timestamp 3491696782 3491636773>
11:15:11.546176 IP localhost.localdomain.9000 > localhost.localdomain.50711: . ack 1258 win 256 <nop,nop,timestamp 3491696822 3491696782>
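In the test above I started two nginx worker processes.
Keepalive works fine as long as the first worker handles the requests (port 50708 in the capture). As soon as the second worker takes over (port 50711), things break: the fastcgi backend no longer responds, and the client is left waiting. In the capture, the request on the second connection is ACKed but never answered, and nginx closes that connection 60 seconds later.
Let's use strace to see what the fastcgi process is doing: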
read(3, "\1\1\0\1\0\10\0\0\0\1\1\0\0\0\0\0\1\4\0\1\4\271\7\0\t\21"..., 8192) = 1256
time(NULL) = 1302665416
write(3, "\1\6\0\1\v\201\7\0Content-type: text/html\r"..., 2984) = 2984
read(3, "\1\1\0\1\0\10\0\0\0\1\1\0\0\0\0\0\1\4\0\1\4\271\7\0\t\21"..., 8192) = 1256
time(NULL) = 1302665418
write(3, "\1\6\0\1\v\201\7\0Content-type: text/html\r"..., 2984) = 2984
read(3,
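When the problem occurs, the read call is blocked and never returns. The backend appears to serve only a single connection; it never reads from any other connection, so it cannot send a response back to nginx on them.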
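The byte strings in the strace output are FastCGI record headers, and decoding them helps when reading such traces. The following is a small sketch of a decoder for the 8-byte header defined by the FastCGI specification; the type fcgi_header_t and the function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Decoded form of the 8-byte FastCGI record header (hypothetical type). */
typedef struct {
    unsigned  version;
    unsigned  type;            /* 1 = BEGIN_REQUEST, 4 = PARAMS, 6 = STDOUT, ... */
    unsigned  request_id;
    unsigned  content_length;
    unsigned  padding_length;
} fcgi_header_t;

/* Decode the header found at p[0..7]; multi-byte fields are big-endian. */
void fcgi_parse_header(const unsigned char *p, fcgi_header_t *h)
{
    h->version        = p[0];
    h->type           = p[1];
    h->request_id     = ((unsigned) p[2] << 8) | p[3];
    h->content_length = ((unsigned) p[4] << 8) | p[5];
    h->padding_length = p[6];
    /* p[7] is reserved */
}

/* Total size of the record on the wire: header + content + padding. */
size_t fcgi_record_size(const fcgi_header_t *h)
{
    return 8 + (size_t) h->content_length + h->padding_length;
}
```

Applied to the first 8 bytes of the write call, "\1\6\0\1\v\201\7\0", this gives a STDOUT record (type 6) for request 1 with 2945 bytes of content and 7 bytes of padding, 2960 bytes in total. In the read call, the 11th byte (the flags field of the BEGIN_REQUEST body) is \1, which shows that the patched nginx does send the keep-connection flag.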
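For the record, the fastcgi backend I used is the echo program from the examples directory shipped with fcgi-2.4.0. If the backend could handle multiple connections properly, for example by using epoll as well, this should work in theory.
Further investigation is still in progress.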
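As a rough illustration of what "handling multiple connections" means for a backend, the sketch below waits on several descriptors at once instead of blocking in read on a single one. It uses the portable poll call as a stand-in for epoll, and first_readable and MAX_CONNS are hypothetical names; it is not how the fcgi-2.4.0 echo example works.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_CONNS  64   /* arbitrary limit for this sketch */

/*
 * Return the index of the first descriptor in fds[0..nfds-1] that is
 * readable (or has been closed by the peer), or -1 if none becomes ready
 * within timeout_ms milliseconds or an error occurs.
 */
int first_readable(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd  pfds[MAX_CONNS];
    int            i;

    if (nfds <= 0 || nfds > MAX_CONNS) {
        return -1;
    }

    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    /* wait on all connections at once rather than on a single read() */
    if (poll(pfds, (nfds_t) nfds, timeout_ms) <= 0) {
        return -1;
    }

    for (i = 0; i < nfds; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP)) {
            return i;
        }
    }

    return -1;
}
```

A backend built around such a loop would notice the request arriving on the second worker's connection even while the first worker's kept-alive connection sits idle.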