
Kernel parameter tuning for nginx load balancing, and its possible effects

 hh3755 2012-11-27
Adjust the following parameters (their effects are discussed in the thread below):
net.ipv4.tcp_syncookies = 1 
net.ipv4.tcp_fin_timeout = 20     
net.ipv4.tcp_max_syn_backlog = 20480 
net.core.netdev_max_backlog = 4096 
net.ipv4.tcp_max_tw_buckets = 400000 
net.core.somaxconn = 4096 
net.nf_conntrack_max = 262144 
net.ipv4.ip_local_port_range = 1024  65000 



We have a box running nginx and two boxes running apache.  The apache 
boxes are configured as an upstream for nginx.   

The nginx box has a public IP, and then it talks to the upstream apaches 
using the private network (same switch).  We are sustaining a couple 
hundred requests/sec. 

We've had several issues with nginx marking the upstreams as
unavailable, which produces the "no live upstreams" message in the error
log and 502 errors for end users.  When this happens the machines are
barely being used: single-digit load averages on 16-core boxes. 

Initially we were seeing a ton of "connect() failed (110: Connection 
timed out)", 1 every couple seconds.  I added these to sysctl.conf and 
that seemed to solve the problem: 

net.ipv4.tcp_syncookies = 1 
net.ipv4.tcp_fin_timeout = 20     
net.ipv4.tcp_max_syn_backlog = 20480 
net.core.netdev_max_backlog = 4096 
net.ipv4.tcp_max_tw_buckets = 400000 
net.core.somaxconn = 4096 

Now things generally run fine but every once in a while we get a huge 
burst of "upstream prematurely closed connection while reading response 
header from upstream" followed by a "no live upstreams".  Again, no 
apparent load on the machines involved.  These bursts only last a minute 
or so.  We also still get an occasional "connect() failed (110: 
Connection timed out)" but they are far less frequent, perhaps 1 or 2 
per hour. 

Anyone have recommendations for tuning the networking side to improve 
the situation here?  These are some of the nginx.conf settings we have 
in place, removed the ones that don't seem related to the issue: 

worker_processes  4; 
worker_rlimit_nofile 30000; 

events { 
    worker_connections  4096; 
    # multi_accept on; 
    use epoll; 
} 

http { 
    client_max_body_size 200m; 

    proxy_read_timeout 600s; 
    proxy_send_timeout 600s; 
    proxy_connect_timeout 60s; 

    proxy_buffer_size 128k; 
    proxy_buffers 4 128k; 

    keepalive_timeout  0; 
    tcp_nodelay        on; 

    # (settings unrelated to the issue removed) 
} 


Happy to provide any other details.  This is the "ulimit -a" on all 
boxes: 

core file size          (blocks, -c) 0 
data seg size           (kbytes, -d) unlimited 
scheduling priority             (-e) 20 
file size               (blocks, -f) unlimited 
pending signals                 (-i) 16382 
max locked memory       (kbytes, -l) 64 
max memory size         (kbytes, -m) unlimited 
open files                      (-n) 300000 
pipe size            (512 bytes, -p) 8 
POSIX message queues     (bytes, -q) 819200 
real-time priority              (-r) 0 
stack size              (kbytes, -s) 8192 
cpu time               (seconds, -t) unlimited 
max user processes              (-u) unlimited 
virtual memory          (kbytes, -v) unlimited 
file locks                      (-x) unlimited 

Posted at Nginx Forum: http://forum./read.php?2,220894,220894#msg-220894

_______________________________________________ 
nginx mailing list 
[hidden email] 
http://mailman./mailman/listinfo/nginx

Re: Nginx as Load Balancer Connection Issues

bach
6302 posts
gtuhl Wrote: 
-------------------------------------------------------

> Initially we were seeing a ton of "connect() 
> failed (110: Connection timed out)", 1 every 
> couple seconds.  I added these to sysctl.conf and 
> that seemed to solve the problem: 

> net.ipv4.tcp_syncookies = 1 
> net.ipv4.tcp_fin_timeout = 20     
> net.ipv4.tcp_max_syn_backlog = 20480 
> net.core.netdev_max_backlog = 4096 
> net.ipv4.tcp_max_tw_buckets = 400000 
> net.core.somaxconn = 4096 

> Now things generally run fine but every once in 
> awhile we get a huge burst of "upstream 
> prematurely closed connection while reading 
> response header from upstream" followed by a "no 
> live upstreams".  Again, no apparent load on the 
> machines involved.  These bursts only last a 
> minute or so.  We also still get an occasional 
> "connect() failed (110: Connection timed out)" but 
> they are far less frequent, perhaps 1 or 2 per 
> hour. 
>

On looking at this again recently, we made two adjustments that 
eliminated the connection issues completely: 

net.nf_conntrack_max = 262144 
net.ipv4.ip_local_port_range = 1024  65000 

After making those two changes things became quite stable.  However, we 
still have massive numbers of TIME_WAIT connections both on the nginx 
machine and on the upstream apache machines. 

The nginx machine is accepting roughly 1000 requests/s, and has 40,000 
connections in TIME_WAIT. 
The apache machines are each accepting roughly 250 requests/s, and have 
15,000 connections in TIME_WAIT. 

We tried setting net.ipv4.tcp_tw_reuse to 1 and restarting networking. 
That did not cause any trouble, but also didn't drop the TIME_WAIT 
count.  I have read that net.ipv4.tcp_tw_recycle is dangerous but we may 
try that if others have had good experiences. 

Is there a way to have these cleaned up more quickly?  My concern is 
that even with the expanded ip_local_port_range 40k is cutting it rather 
close.  Before we bumped ip_local_port_range the whole system was 
falling down right as the TIME_WAIT count approached 32k.  Is it normal 
for nginx to cause this many TIME_WAIT connections?  If we're only doing 
1k requests/s and nearly exhausting the available port range what would 
sites with heavier volume do? 
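
As a rough check on these numbers, here is a minimal sketch. It assumes Linux holds a closed socket in TIME_WAIT for a fixed 60 seconds and that each request opens and closes its own connection; the function names are illustrative and not from the thread.

```python
TIME_WAIT_SECONDS = 60  # assumption: fixed TIME_WAIT hold time on Linux


def ephemeral_ports(low, high):
    """Number of local ports in an inclusive ip_local_port_range."""
    return high - low + 1


def steady_state_time_wait(closes_per_second, hold_seconds=TIME_WAIT_SECONDS):
    """Little's law: sockets in TIME_WAIT ~= close rate * time spent in the state."""
    return closes_per_second * hold_seconds


if __name__ == "__main__":
    ports = ephemeral_ports(1024, 65000)
    for rate in (250, 500, 1000):
        waiting = steady_state_time_wait(rate)
        print(f"{rate} closes/s -> ~{waiting} in TIME_WAIT, {ports} ports in range")
```

Under those assumptions, 250 requests/s gives 15,000 sockets in TIME_WAIT, which matches the figure reported for each apache machine. Only outgoing connections (nginx to the upstreams) consume ephemeral ports on the nginx box, so the estimate is an upper bound for port usage rather than a prediction.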

Posted at Nginx Forum: http://forum./read.php?2,220894,221550#msg-221550


Re: Nginx as Load Balancer Connection Issues

bach
6302 posts
net.ipv4.tcp_tw_recycle = 1 

is what you're looking for 

Posted at Nginx Forum: http://forum./read.php?2,220894,221583#msg-221583


Re: Nginx as Load Balancer Connection Issues

Andrey Korolyov
1 post
On Tue, Jan 24, 2012 at 9:59 PM, ggrensteiner <[hidden email]> wrote:

> net.ipv4.tcp_tw_recycle = 1 

> is what you're looking for 


This may cause trouble if multiple clients are trying to reach the 
server from behind the same NAT, so be careful. I had a negative 
experience even at ~10 HTTP reqs/min from a machine behind NAT. 


Re: Nginx as Load Balancer Connection Issues

bach
6302 posts
Andrey Korolyov Wrote: 
-------------------------------------------------------

> On Tue, Jan 24, 2012 at 9:59 PM, ggrensteiner 
> <[hidden email]> wrote: 
> > net.ipv4.tcp_tw_recycle = 1 
> > 
> > is what you're looking for 

> This may cause trouble if multiple clients trying to reach the server 
> over same NAT, so be careful. I have a negative experience even on ~ 
> 10 http reqs/min from NAT machine. 
>

This is what I had read everywhere as well, so I've been hesitant to try 
it.  We definitely have a lot of users that would be coming at our 
servers from the same building/NAT.   

Has anyone tried using "net.ipv4.tcp_tw_reuse = 1" in a larger 
connection count environment before? 

I have it enabled now, but it did not seem to have any impact on the 
number of TIME_WAIT connections.  Does it wait until it actually needs 
to reuse one (due to port exhaustion) before doing so?  Or should it be 
keeping the number lower? 
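
To watch whether a setting actually changes the count, here is a small sketch that tallies socket states from the text of /proc/net/tcp (Linux only). The helper name and the sample rows are made up for illustration; only a subset of state codes is named.

```python
from collections import Counter

# State codes from the "st" column of /proc/net/tcp (subset).
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "08": "CLOSE_WAIT", "0A": "LISTEN"}

# Hypothetical sample rows, shortened to the first columns.
SAMPLE = """  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:0050 00000000:0000 0A 00000000:00000000
   1: 0100007F:C350 0100007F:0050 06 00000000:00000000
   2: 0100007F:C351 0100007F:0050 06 00000000:00000000
   3: 0100007F:C352 0100007F:0050 01 00000000:00000000
"""


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state; unknown codes are kept as raw hex."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    print(dict(count_tcp_states(text)))
```

As far as I understand the kernel documentation, tcp_tw_reuse only allows a new outgoing connection to reuse a socket that is in TIME_WAIT; it does not expire entries sooner. That would be consistent with the count staying the same after enabling it.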

Posted at Nginx Forum: http://forum./read.php?2,220894,221587#msg-221587


Re: Nginx as Load Balancer Connection Issues

bach
6302 posts
Have you tried using HTTP 1.1 keepalive connections from nginx to 
apache?  They became available in 1.1.4 and will re-use sockets rather 
than closing them and leaving them in TIME_WAIT. 

Be sure to remember to turn on keepalive in your apache config as well. 

http:///en/docs/http/ngx_http_upstream_module.html
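
The effect of keepalive on local port usage can be illustrated without nginx. The sketch below uses only Python's standard library: a tiny local HTTP/1.1 server stands in for apache, and a client stands in for the proxy. All names here are hypothetical, and this is not how nginx implements it. In nginx itself, the upstream module's `keepalive` directive is normally paired with `proxy_http_version 1.1;` and `proxy_set_header Connection "";`, per the module documentation.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 makes the stdlib server keep connections open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def local_ports_used(server_port, requests, reuse):
    """Send `requests` GETs and return the set of local ports they used."""
    ports = set()
    conn = None
    for _ in range(requests):
        if conn is None or not reuse:
            if conn is not None:
                conn.close()
            conn = http.client.HTTPConnection("127.0.0.1", server_port, timeout=5)
            conn.connect()
        ports.add(conn.sock.getsockname()[1])
        conn.request("GET", "/")
        conn.getresponse().read()
    if conn is not None:
        conn.close()
    return ports


def run_demo(requests=5):
    """Return (local ports used with keepalive, local ports used without)."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        with_keepalive = local_ports_used(port, requests, reuse=True)
        without_keepalive = local_ports_used(port, requests, reuse=False)
    finally:
        server.shutdown()
        server.server_close()
    return len(with_keepalive), len(without_keepalive)


if __name__ == "__main__":
    kept, fresh = run_demo()
    print(f"keepalive: {kept} local port(s); new connection per request: {fresh}")
```

With a reused connection every request goes out from the same local port, while opening a connection per request takes a new ephemeral port each time and leaves the old one in TIME_WAIT on whichever side closed first.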

Posted at Nginx Forum: http://forum./read.php?2,220894,221646#msg-221646


Re: Nginx as Load Balancer Connection Issues

Rami Essaid
24 posts
Out of curiosity why would it keep it in TIME_WAIT if it is closing the connection?

On Wednesday, January 25, 2012 at 5:14 PM, ggrensteiner wrote:

> Have you tried using HTTP 1.1 keepalive connections from nginx to
> apache? They became available in 1.1.4 and will re-use sockets rather
> than closing them and leaving them in TIME_WAIT.
>
> Be sure to remember to turn on keepalive in your apache config as well.




Re: Nginx as Load Balancer Connection Issues

bach
6302 posts
In reply to this post by bach
I'm thinking about giving the development version with the upstream 
keepalive over http 1.1 a try. 

Are people using that version in production?  Is there a release 
schedule/estimate anywhere that indicates when that feature might 
trickle over to stable? 

We're using nginx heavily in a pretty vanilla load balancer role - 
upstream of apache servers, ssl termination in nginx, that's it in terms 
of features we are using.   

It's worked fantastically well overall, we're just flirting with an 
ephemeral port limit on a few of our sites (have worked around by 
setting up multiple A records pointed at multiple nginx pairs).  If we 
could get keepalive connections between nginx and the upstream apaches I 
believe we would be in very good shape and could keep our configuration 
simple moving forward. 

Posted at Nginx Forum: http://forum./read.php?2,220894,224118#msg-224118


Re: Nginx as Load Balancer Connection Issues

Alexandr Gomoliako
60 posts
On Tue, Mar 20, 2012 at 11:33 PM, gtuhl <[hidden email]> wrote: 
> I'm thinking about giving the development version with the upstream 
> keepalive over http 1.1 a try. 

> Are people using that version in production?  Is there a release 
> schedule/estimate anywhere that indicates when that feature might 
> trickle over to stable? 

According to their roadmap -- in 6 days :) 
http://trac./nginx/roadmap


Re: Nginx as Load Balancer Connection Issues

David Yu
18 posts
In reply to this post by Rami Essaid


On Thu, Jan 26, 2012 at 7:21 AM, Rami Essaid <[hidden email]> wrote:

> Out of curiosity why would it keep it in TIME_WAIT if it is closing the connection?

+1.  Also, if the connection is closed, why is the upstream (apache) in TIME_WAIT as well?

> On Wednesday, January 25, 2012 at 5:14 PM, ggrensteiner wrote:
>
> > Have you tried using HTTP 1.1 keepalive connections from nginx to
> > apache? They became available in 1.1.4 and will re-use sockets rather
> > than closing them and leaving them in TIME_WAIT.
> >
> > Be sure to remember to turn on keepalive in your apache config as well.






-- 
When the cat is away, the mouse is alone.
- David Yu


Re: Nginx as Load Balancer Connection Issues

bach
6302 posts
In reply to this post by Alexandr Gomoliako
Alexandr Gomoliako Wrote: 
-------------------------------------------------------

> On Tue, Mar 20, 2012 at 11:33 PM, gtuhl <[hidden email]> wrote: 
> > I'm thinking about giving the development version with the upstream 
> > keepalive over http 1.1 a try. 
> > 
> > Are people using that version in production?  Is there a release 
> > schedule/estimate anywhere that indicates when that feature might 
> > trickle over to stable? 
> 
> According to their roadmap -- in 6 days :) 
> http://trac./nginx/roadmap

This is excellent news.  Also apologies for somehow missing this page, 
was exactly what I was looking for. 

Posted at Nginx Forum: http://forum./read.php?2,220894,224171#msg-224171


Re: Nginx as Load Balancer Connection Issues

bach
6302 posts
Looks like that was for the 1.1.18 development release.  Is this what 
will become the 1.2.0 stable in a couple weeks?  Seems I'll need to wait 
for that one to get http 1.1 keepalive upstreams in stable. 

gtuhl Wrote: 
-------------------------------------------------------

> Alexandr Gomoliako Wrote: 
> ------------------------------------------------------- 
> > On Tue, Mar 20, 2012 at 11:33 PM, gtuhl <[hidden email]> wrote: 
> > > I'm thinking about giving the development version with the upstream 
> > > keepalive over http 1.1 a try. 
> > > 
> > > Are people using that version in production?  Is there a release 
> > > schedule/estimate anywhere that indicates when that feature might 
> > > trickle over to stable? 
> > 
> > According to their roadmap -- in 6 days :) 
> > http://trac./nginx/roadmap
> 
> This is excellent news.  Also apologies for somehow missing this page, 
> was exactly what I was looking for.

Posted at Nginx Forum: http://forum./read.php?2,220894,224560#msg-224560


Re: Nginx as Load Balancer Connection Issues

bach
6302 posts
Initial testing with nginx 1.2.0 and HTTP 1.1 keepalive to upstreams has 
our ephemeral port usage down from 38,000 to 220 on a canned test run.  
This is a big deal: we can use nginx as a reverse proxy on far busier 
sites now. 

Anyone put this under heavy usage in production yet? 

New release seems to be working brilliantly, good work to all involved. 

Posted at Nginx Forum: http://forum./read.php?2,220894,225921#msg-225921


Re: Nginx as Load Balancer Connection Issues

Andrey Belov
5 posts

On May 1, 2012, at 5:26 , gtuhl wrote: 

> Initial testing with 1.2.0 and 1.1 keepalive to upstreams has our 
> ephemeral port usage down from 38,000 to 220 on a canned test run.  This 
> is a big deal, we can use nginx for reverse proxy on far busier sites 
> now. 

> Anyone put this under heavy usage in production yet? 

Yes. 

Somewhere from 1.1.4 or so. :) 


> New release seems to be working brilliantly, good work to all involved. 


