An experiment in keeping nginx-to-FastCGI connections alive (keepalive)

    Technology  2022-05-20

    To keep connections to the backend open, download the keepalive module (http://mdounin.ru/hg/ngx_http_upstream_keepalive).

    Note that in multi-process mode you need to set accept_mutex off;

     

    Assuming you already know how to use the keepalive module, let's look at how a connection to FastCGI can be kept alive.

     

    By default, an nginx connection to FastCGI looks like this (even with the keepalive module configured as above):

     

    [root@localhost ~]# tcpdump -i lo -s 1500 port 9000
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on lo, link-type EN10MB (Ethernet), capture size 1500 bytes
    15:23:16.901004 IP localhost.localdomain.50867 > localhost.localdomain.9000: S 3482201970:3482201970(0) win 32767 <mss 16396,sackOK,timestamp 2296841391 0,nop,wscale 7>
    15:23:16.901025 IP localhost.localdomain.9000 > localhost.localdomain.50867: S 3473410857:3473410857(0) ack 3482201971 win 32767 <mss 16396,sackOK,timestamp 2296841391 2296841391,nop,wscale 7>
    15:23:16.901039 IP localhost.localdomain.50867 > localhost.localdomain.9000: . ack 1 win 256 <nop,nop,timestamp 2296841391 2296841391>
    15:23:16.901150 IP localhost.localdomain.50867 > localhost.localdomain.9000: P 1:1377(1376) ack 1 win 256 <nop,nop,timestamp 2296841391 2296841391>
    15:23:16.901170 IP localhost.localdomain.9000 > localhost.localdomain.50867: . ack 1377 win 256 <nop,nop,timestamp 2296841391 2296841391>
    15:23:16.901214 IP localhost.localdomain.9000 > localhost.localdomain.50867: P 1:97(96) ack 1377 win 256 <nop,nop,timestamp 2296841391 2296841391>
    15:23:16.901222 IP localhost.localdomain.50867 > localhost.localdomain.9000: . ack 97 win 256 <nop,nop,timestamp 2296841391 2296841391>
    15:23:16.901236 IP localhost.localdomain.9000 > localhost.localdomain.50867: F 97:97(0) ack 1377 win 256 <nop,nop,timestamp 2296841391 2296841391>
    15:23:16.901822 IP localhost.localdomain.50867 > localhost.localdomain.9000: F 1377:1377(0) ack 98 win 256 <nop,nop,timestamp 2296841392 2296841391>
    15:23:16.901836 IP localhost.localdomain.9000 > localhost.localdomain.50867: . ack 1378 win 256 <nop,nop,timestamp 2296841392 2296841392>

     

     

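    The capture shows that the backend closes the connection first (the first FIN comes from port 9000), so simply enabling keepalive in nginx.conf has no effect.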

     

    Looking at ngx_http_fastcgi_module.c, the ngx_http_fastcgi_begin_request_t structure has a flags field. This field controls whether the backend closes the connection on its own.

     

        typedef struct {
            u_char  role_hi;
            u_char  role_lo;
            u_char  flags;        /* controls whether the backend closes the connection */
            u_char  reserved[5];
        } ngx_http_fastcgi_begin_request_t;

     

    The FastCGI specification says:

     

    Closing Transport Connections

    The Web server controls the lifetime of transport connections. The Web server can close a connection when no requests are active. Or the Web server can delegate close authority to the application (see FCGI_BEGIN_REQUEST). In this case the application closes the connection at the end of a specified request.
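The rule quoted above can be sketched from the application's point of view. The structure layout and the FCGI_KEEP_CONN value follow the FastCGI 1.0 specification; the helper name `app_should_close` is hypothetical and only serves this illustration.

```c
#include <stdint.h>

/* Flag bit defined by the FastCGI 1.0 specification. */
#define FCGI_KEEP_CONN 1

/* Body of an FCGI_BEGIN_REQUEST record, as laid out in the specification. */
typedef struct {
    uint8_t roleB1;
    uint8_t roleB0;
    uint8_t flags;
    uint8_t reserved[5];
} FCGI_BeginRequestBody;

/*
 * Hypothetical helper: returns 1 if the application is expected to close
 * the transport connection after answering this request, and 0 if the
 * web server keeps responsibility for closing it.
 */
int app_should_close(const FCGI_BeginRequestBody *body)
{
    return (body->flags & FCGI_KEEP_CONN) == 0;
}
```

With flags set to 0, which is what the tcpdump capture earlier in the post corresponds to, the helper returns 1: the backend is the side that closes.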

     

    Now look at how the field is set in ngx_http_fastcgi_module.c:

     

        475 static ngx_http_fastcgi_request_start_t  ngx_http_fastcgi_request_start = {
        476     { 1,                                               /* version */
        477       NGX_HTTP_FASTCGI_BEGIN_REQUEST,                  /* type */
        478       0,                                               /* request_id_hi */
        479       1,                                               /* request_id_lo */
        480       0,                                               /* content_length_hi */
        481       sizeof(ngx_http_fastcgi_begin_request_t),        /* content_length_lo */
        482       0,                                               /* padding_length */
        483       0 },                                             /* reserved */
        484
        485     { 0,                                               /* role_hi */
        486       NGX_HTTP_FASTCGI_RESPONDER,                      /* role_lo */
        487       0, /* NGX_HTTP_FASTCGI_KEEP_CONN */              /* flags */
        488       { 0, 0, 0, 0, 0 } },                             /* reserved[5] */
        489
        490     { 1,                                               /* version */
        491       NGX_HTTP_FASTCGI_PARAMS,                         /* type */
        492       0,                                               /* request_id_hi */
        493       1 },                                             /* request_id_lo */
        494
        495 };

     

     

    We change the 0 on line 487 to 1 so that the backend does not close the connection on its own.
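To make the effect of that one-byte change concrete, here is a minimal sketch that builds the 16-byte FCGI_BEGIN_REQUEST record (8-byte header plus 8-byte body) by hand. The constants come from the FastCGI 1.0 specification; the function name `build_begin_request` is hypothetical and this is not how nginx assembles the record internally.

```c
#include <stdint.h>
#include <string.h>

/* Constants from the FastCGI 1.0 specification. */
enum {
    FCGI_VERSION_1     = 1,
    FCGI_BEGIN_REQUEST = 1,
    FCGI_RESPONDER     = 1,
    FCGI_KEEP_CONN     = 1
};

#define FCGI_BEGIN_RECORD_LEN 16   /* 8-byte header + 8-byte body */

/* Hypothetical helper: fill out[] with a complete BEGIN_REQUEST record. */
void build_begin_request(uint8_t out[FCGI_BEGIN_RECORD_LEN],
                         uint16_t request_id, int keep_conn)
{
    memset(out, 0, FCGI_BEGIN_RECORD_LEN);

    /* Record header. */
    out[0] = FCGI_VERSION_1;                  /* version */
    out[1] = FCGI_BEGIN_REQUEST;              /* type */
    out[2] = (uint8_t) (request_id >> 8);     /* request_id_hi */
    out[3] = (uint8_t) (request_id & 0xff);   /* request_id_lo */
    out[5] = 8;                               /* content_length_lo: body size */

    /* Record body. */
    out[9]  = FCGI_RESPONDER;                 /* role_lo */
    out[10] = keep_conn ? FCGI_KEEP_CONN : 0; /* flags */
}
```

Calling `build_begin_request(rec, 1, 1)` yields the bytes 1 1 0 1 0 8 0 0 0 1 1 0 0 0 0 0; byte 10 is the flags field that line 487 controls.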

     

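    That change alone is still not enough to keep the connection alive. The ngx_http_fastcgi_finalize_request function in ngx_http_fastcgi_module.c also needs the following addition so that nginx itself keeps the connection open: it sets u->length = 0 so that the keepalive module can decide whether to keep the connection.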

     

        static void
        ngx_http_fastcgi_finalize_request(ngx_http_request_t *r, ngx_int_t rc)
        {
            ngx_http_upstream_t *u = r->upstream;

            /* added: lets the keepalive module decide whether to keep the connection */
            if (u != NULL) {
                u->length = 0;
            }

            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                           "finalize http fastcgi request");

            return;
        }

     

    So that nginx passes the response on to the client promptly, src/event/ngx_event_pipe.c also has to be modified.

     

    Original code:

        while (cl && n > 0) {

            ngx_event_pipe_remove_shadow_links(cl->buf);

            size = cl->buf->end - cl->buf->last;

            if (n >= size) {
                cl->buf->last = cl->buf->end;

                /* STUB */ cl->buf->num = p->num++;

                if (p->input_filter(p, cl->buf) == NGX_ERROR) {
                    return NGX_ABORT;
                }

                n -= size;
                ln = cl;
                cl = cl->next;
                ngx_free_chain(p->pool, ln);

            } else {
                cl->buf->last += n;
                n = 0;
            }
        }

    Modified code:

        while (cl && n > 0) {

            ngx_event_pipe_remove_shadow_links(cl->buf);

            size = cl->buf->end - cl->buf->last;

            if (n >= size) {
                cl->buf->last = cl->buf->end;
                n -= size;

            } else {
                cl->buf->last += n;
                n = 0;
            }

            /* STUB */ cl->buf->num = p->num++;

            if (p->input_filter(p, cl->buf) == NGX_ERROR) {
                return NGX_ABORT;
            }

            ln = cl;
            cl = cl->next;
            ngx_free_chain(p->pool, ln);
        }

     

     

    With the changes above in place, run the program and capture the traffic again:

     

    [root@localhost ~]# tcpdump -i lo -s 1500 port 9000
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on lo, link-type EN10MB (Ethernet), capture size 1500 bytes
    11:14:03.711955 IP localhost.localdomain.50708 > localhost.localdomain.9000: S 2207544028:2207544028(0) win 32767 <mss 16396,sackOK,timestamp 3491628977 0,nop,wscale 7>
    11:14:03.712218 IP localhost.localdomain.9000 > localhost.localdomain.50708: S 2221134347:2221134347(0) ack 2207544029 win 32767 <mss 16396,sackOK,timestamp 3491628977 3491628977,nop,wscale 7>
    11:14:03.712241 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 1 win 256 <nop,nop,timestamp 3491628977 3491628977>
    11:14:03.712257 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 1:1257(1256) ack 1 win 256 <nop,nop,timestamp 3491628977 3491628977>
    11:14:03.712273 IP localhost.localdomain.9000 > localhost.localdomain.50708: . ack 1257 win 256 <nop,nop,timestamp 3491628977 3491628977>
    11:14:03.711969 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 1:2985(2984) ack 1257 win 256 <nop,nop,timestamp 3491628978 3491628977>
    11:14:03.711980 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 2985 win 303 <nop,nop,timestamp 3491628978 3491628978>
    11:14:05.738632 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 1257:2513(1256) ack 2985 win 303 <nop,nop,timestamp 3491631005 3491628978>
    11:14:05.738832 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 2985:5969(2984) ack 2513 win 256 <nop,nop,timestamp 3491631005 3491631005>
    11:14:05.738848 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 5969 win 303 <nop,nop,timestamp 3491631005 3491631005>
    11:14:06.901924 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 2513:3769(1256) ack 5969 win 303 <nop,nop,timestamp 3491632168 3491631005>
    11:14:06.902098 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 5969:8953(2984) ack 3769 win 256 <nop,nop,timestamp 3491632168 3491632168>
    11:14:06.902110 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 8953 win 303 <nop,nop,timestamp 3491632168 3491632168>
    11:14:07.570211 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 3769:5025(1256) ack 8953 win 303 <nop,nop,timestamp 3491632836 3491632168>
    11:14:07.570387 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 8953:11937(2984) ack 5025 win 256 <nop,nop,timestamp 3491632837 3491632836>
    11:14:07.570399 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 11937 win 303 <nop,nop,timestamp 3491632837 3491632837>
    11:14:08.202399 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 5025:6281(1256) ack 11937 win 303 <nop,nop,timestamp 3491633469 3491632837>
    11:14:08.202473 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 11937:14921(2984) ack 6281 win 256 <nop,nop,timestamp 3491633469 3491633469>
    11:14:08.202483 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 14921 win 303 <nop,nop,timestamp 3491633469 3491633469>
    11:14:09.475039 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 6281:7537(1256) ack 14921 win 303 <nop,nop,timestamp 3491634742 3491633469>
    11:14:09.475277 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 14921:17905(2984) ack 7537 win 256 <nop,nop,timestamp 3491634742 3491634742>
    11:14:09.475291 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 17905 win 303 <nop,nop,timestamp 3491634742 3491634742>
    11:14:10.082268 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 7537:8793(1256) ack 17905 win 303 <nop,nop,timestamp 3491635349 3491634742>
    11:14:10.082512 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 17905:20889(2984) ack 8793 win 256 <nop,nop,timestamp 3491635349 3491635349>
    11:14:10.082522 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 20889 win 303 <nop,nop,timestamp 3491635349 3491635349>
    11:14:10.818134 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 8793:10049(1256) ack 20889 win 303 <nop,nop,timestamp 3491636085 3491635349>
    11:14:10.818252 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 20889:23873(2984) ack 10049 win 256 <nop,nop,timestamp 3491636085 3491636085>
    11:14:10.818263 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 23873 win 303 <nop,nop,timestamp 3491636085 3491636085>
    11:14:11.506168 IP localhost.localdomain.50711 > localhost.localdomain.9000: S 2218187766:2218187766(0) win 32767 <mss 16396,sackOK,timestamp 3491636773 0,nop,wscale 7>
    11:14:11.506191 IP localhost.localdomain.9000 > localhost.localdomain.50711: S 2224663648:2224663648(0) ack 2218187767 win 32767 <mss 16396,sackOK,timestamp 3491636773 3491636773,nop,wscale 7>
    11:14:11.506205 IP localhost.localdomain.50711 > localhost.localdomain.9000: . ack 1 win 256 <nop,nop,timestamp 3491636773 3491636773>
    11:14:11.506318 IP localhost.localdomain.50711 > localhost.localdomain.9000: P 1:1257(1256) ack 1 win 256 <nop,nop,timestamp 3491636773 3491636773>
    11:14:11.506329 IP localhost.localdomain.9000 > localhost.localdomain.50711: . ack 1257 win 256 <nop,nop,timestamp 3491636773 3491636773>
    11:15:11.506325 IP localhost.localdomain.50711 > localhost.localdomain.9000: F 1257:1257(0) ack 1 win 256 <nop,nop,timestamp 3491696782 3491636773>
    11:15:11.546176 IP localhost.localdomain.9000 > localhost.localdomain.50711: . ack 1258 win 256 <nop,nop,timestamp 3491696822 3491696782>

     

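    For the capture above I started two nginx worker processes.

    Keepalive works fine while the first process is handling requests, but as soon as the second process gets to handle one, things fall apart: the FastCGI backend stops responding, so the client waits without getting an answer. In the capture, the second connection (source port 50711) sends its request and receives only an ACK; nginx closes it 60 seconds later.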

     

    Let's use strace to see what the FastCGI process is doing:

     

    read(3, "\1\1\0\1\0\10\0\0\0\1\1\0\0\0\0\0\1\4\0\1\4\271\7\0\t\21"..., 8192) = 1256
    time(NULL)                              = 1302665416
    write(3, "\1\6\0\1\v\201\7\0Content-type: text/html\r"..., 2984) = 2984
    read(3, "\1\1\0\1\0\10\0\0\0\1\1\0\0\0\0\0\1\4\0\1\4\271\7\0\t\21"..., 8192) = 1256
    time(NULL)                              = 1302665418
    write(3, "\1\6\0\1\v\201\7\0Content-type: text/html\r"..., 2984) = 2984
    read(3,

     

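    When the problem occurs, read never returns. The FastCGI process seems to serve only one connection; it never reads from the other connections, so it cannot send anything back to nginx.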
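The leading bytes of each read in the strace output can be decoded as FastCGI record headers. The sketch below assumes the escaped values shown by strace are octal byte values (1, 1, 0, 1, 0, 10, ...), and uses a hypothetical `fcgi_header_t` type and `fcgi_parse_header` function; the 8-byte header layout itself is taken from the FastCGI 1.0 specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of the 8-byte FastCGI record header. */
typedef struct {
    uint8_t  version;
    uint8_t  type;
    uint16_t request_id;
    uint16_t content_length;
    uint8_t  padding_length;
} fcgi_header_t;

/*
 * Decode one record header from buf.
 * Returns 0 on success, -1 if fewer than 8 bytes are available.
 */
int fcgi_parse_header(const uint8_t *buf, size_t len, fcgi_header_t *h)
{
    if (len < 8) {
        return -1;
    }

    h->version        = buf[0];
    h->type           = buf[1];
    h->request_id     = (uint16_t) ((buf[2] << 8) | buf[3]);
    h->content_length = (uint16_t) ((buf[4] << 8) | buf[5]);
    h->padding_length = buf[6];
    /* buf[7] is reserved. */

    return 0;
}
```

Applied to the bytes strace printed, the first header decodes as type 1 (FCGI_BEGIN_REQUEST) with an 8-byte body whose third byte, the flags field, is 1. The next header is type 4 (FCGI_PARAMS) with a content length of 1209 and 7 bytes of padding. Under that reading, the KEEP_CONN change did reach the backend, which fits the backend no longer closing the first connection.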

     

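    Note that I used the echo program from the examples directory shipped with fcgi-2.4.0 as the FastCGI backend. If the backend handled connections properly, for example by using epoll as well, it should in theory be able to cope.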
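As an illustration of what "handling connections properly" could mean, here is a minimal sketch that waits on several connections at once instead of blocking in read on a single one. It uses POSIX poll() rather than epoll to stay portable, and it is not taken from the fcgi-2.4.0 sources; the function name `read_any` and the MAX_CONNS limit are hypothetical.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_CONNS 64   /* arbitrary cap for this sketch */

/*
 * Wait up to timeout_ms for any of the n descriptors to become readable,
 * then read from the first one that is. On success the index of that
 * descriptor is stored in *which and the result of read() is returned.
 * Returns -1 on timeout, on poll() failure, or if n is out of range.
 */
ssize_t read_any(const int *fds, size_t n, int timeout_ms,
                 void *buf, size_t cap, size_t *which)
{
    struct pollfd pfds[MAX_CONNS];
    size_t        i;

    if (n == 0 || n > MAX_CONNS) {
        return -1;
    }

    for (i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    /* poll() returns 0 on timeout and -1 on error. */
    if (poll(pfds, (nfds_t) n, timeout_ms) <= 0) {
        return -1;
    }

    for (i = 0; i < n; i++) {
        /* POLLHUP is included so the caller can observe EOF via read(). */
        if (pfds[i].revents & (POLLIN | POLLHUP)) {
            *which = i;
            return read(fds[i], buf, cap);
        }
    }

    return -1;
}
```

With two connections where only the second has a pending request, a plain blocking read on the first would hang just like the trailing read(3, in the strace output, whereas `read_any` returns the data from the second. A real backend would also have to add the listening socket to the set and track per-connection request state.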

     

    Further investigation is still in progress.

     

