Apache Tomcat chokes after 300 connections
We have an Apache web server in front of Tomcat, hosted on EC2; the instance type is extra large with 34 GB of memory.
Our application calls a lot of external web services, and one of them is very slow: it takes almost 300 seconds to respond to requests during peak hours.
During peak hours the server chokes at just about 300 httpd processes; ps -ef | grep httpd | wc -l returns 300.
I have googled and found numerous suggestions, but nothing seems to work. Below is the configuration I have applied, taken directly from online resources.
I have increased the connection and client limits in both Apache and Tomcat. Here are the configuration details:
//apache
<IfModule prefork.c>
StartServers 100
MinSpareServers 10
MaxSpareServers 10
ServerLimit 50000
MaxClients 50000
MaxRequestsPerChild 2000
</IfModule>
//tomcat
<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol"
connectionTimeout="600000"
redirectPort="8443"
enableLookups="false" maxThreads="1500"
compressableMimeType="text/html,text/xml,text/plain,text/css,application/x-javascript,text/vnd.wap.wml,text/vnd.wap.wmlscript,application/xhtml+xml,application/xml-dtd,application/xslt+xml"
compression="on"/>
//Sysctl.conf
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=1
fs.file-max = 5049800
vm.min_free_kbytes = 204800
vm.page-cluster = 20
vm.swappiness = 90
net.ipv4.tcp_rfc1337=1
net.ipv4.tcp_max_orphans = 65536
net.ipv4.ip_local_port_range = 5000 65000
net.core.somaxconn = 1024
I have tried numerous suggestions, but in vain. How do I fix this? I'm sure an m2xlarge server should serve more than 300 requests; I am probably going wrong somewhere in my configuration.
The server chokes only during peak hours, when there are 300 concurrent requests waiting for the [300-second-delayed] web service to respond.
I was monitoring the TCP connections with netstat and found around 1000 connections in the TIME_WAIT state. I have no idea what that means in terms of performance, but I'm sure it must be adding to the problem.
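For reference, here is one way to count connections per TCP state; this is just a convenience one-liner, and the awk field position assumes the default Linux netstat output layout:

# Count TCP connections grouped by state (ESTABLISHED, TIME_WAIT, ...)
netstat -ant | awk 'NR > 2 {print $6}' | sort | uniq -c | sort -rn

# Same, but only for connections involving the Tomcat ports used here (8080/8009)
netstat -ant | awk '$4 ~ /:(8080|8009)$/ {print $6}' | sort | uniq -c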
Output of top:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
8902 root 25 0 19.6g 3.0g 12m S 3.3 8.8 13:35.77 java
24907 membase 25 0 753m 634m 2528 S 2.7 1.8 285:18.88 beam.smp
24999 membase 15 0 266m 121m 3160 S 0.7 0.3 51:30.37 memcached
27578 apache 15 0 230m 6300 1536 S 0.7 0.0 0:00.03 httpd
28551 root 15 0 11124 1492 892 R 0.3 0.0 0:00.25 top
Output of free -m:
             total       used       free     shared    buffers     cached
Mem:         35007       8470      26536          0          1         61
-/+ buffers/cache:        8407      26599
Swap:        15999         15      15984
Output of iostat:
avg-cpu: %user %nice %system %iowait %steal %idle
26.21 0.00 0.48 0.13 0.02 73.15
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda1 14.36 4.77 329.37 9005402 622367592
sdb 0.00 0.00 0.00 1210 48
Also, at peak time there are about 10-15k TCP connections to the membase server [local].
Some errors from the mod_jk log; I hope this throws some light on the issue:
[Wed Jul 11 14:39:10.853 2012] [8365:46912560456400] [error] ajp_send_request::jk_ajp_common.c (1630): (tom2) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=110)
[Wed Jul 11 14:39:18.627 2012] [8322:46912560456400] [error] ajp_send_request::jk_ajp_common.c (1630): (tom2) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=110)
[Wed Jul 11 14:39:21.358 2012] [8351:46912560456400] [error] ajp_get_reply::jk_ajp_common.c (2118): (tom1) Tomcat is down or refused connection. No response has been sent to the client (yet)
[Wed Jul 11 14:39:22.640 2012] [8348:46912560456400] [error] ajp_get_reply::jk_ajp_common.c (2118): (tom1) Tomcat is down or refused connection. No response has been sent to the client (yet)
workers.properties
workers.tomcat_home=/usr/local/tomcat/
worker.list=loadbalancer
worker.tom1.port=8009
worker.tom1.host=localhost
worker.tom1.type=ajp13
worker.tom1.socket_keepalive=True
worker.tom1.connection_pool_timeout=600
worker.tom2.port=8109
worker.tom2.host=localhost
worker.tom2.type=ajp13
worker.tom2.socket_keepalive=True
worker.tom2.connection_pool_timeout=600
worker.loadbalancer.type=lb
worker.loadbalancer.balanced_workers=tom1,tom2
worker.loadbalancer.sticky_session=True
worker.tom1.lbfactor=1
worker.tom1.socket_timeout=600
worker.tom2.lbfactor=1
worker.tom2.socket_timeout=600
//Solved
Thanks all for your valuable suggestions. I had missed the maxThreads setting for the AJP 1.3 connector; now everything seems under control.
I would also start looking at event-based servers like nginx.
Have you increased maxThreads in the AJP 1.3 Connector on port 8009?
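For illustration, the AJP connector in Tomcat's conf/server.xml has its own maxThreads, separate from the HTTP connector shown in the question (it defaults to 200 if unset); the value 1500 below simply mirrors the HTTP connector and is an assumption, not a recommendation:

<!-- conf/server.xml: AJP/1.3 connector used by mod_jk -->
<Connector port="8009" protocol="AJP/1.3"
           redirectPort="8443"
           connectionTimeout="600000"
           maxThreads="1500" />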
Consider setting up an asynchronous proxying web server like nginx or lighttpd in front of Apache. Apache serves content synchronously, so workers are blocked until clients have downloaded the generated content in full (more details here). Setting up an asynchronous (non-blocking) proxy usually improves the situation dramatically (I used to lower the number of concurrently running Apache workers from 30 to 3-5 by using nginx as a frontend proxy).
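A minimal sketch of such a frontend, assuming nginx takes over port 80 and Apache is moved to port 8080 on the same host (ports and paths are assumptions, not from the original setup):

# /etc/nginx/nginx.conf fragment: nginx absorbs slow clients and proxies to Apache over localhost
http {
    server {
        listen 80;

        location / {
            proxy_pass http://127.0.0.1:8080;                            # Apache, moved off port 80
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_buffering on;                                          # Apache worker is freed as soon as the response is generated
        }
    }
}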
I suspect your problem is in Tomcat, not Apache, from the logs you have shown anyway. Getting 'error 110' (ETIMEDOUT) when trying to connect back into Tomcat indicates there is a queue of connections waiting to be served that is so long that no more can fit into the listen backlog configured for Tomcat's listening socket.
From the listen manpage:
The backlog parameter defines the maximum length the queue of pending
connections may grow to. If a connection request arrives with
the queue full the client may receive an error with an indication
of ECONNREFUSED or, if the underlying protocol supports
retransmission, the request may be ignored so that retries succeed.
If I had to guess, I would suspect that the vast majority of HTTP requests, when the server is "choking", are blocked waiting for something to come back from Tomcat. I bet that if you attempted to fetch some static content that is served directly by Apache (rather than being proxied to Tomcat), it would work even while the server is normally 'choking'.
Unfortunately I am not familiar with Tomcat, but is there a way to manipulate its concurrency settings instead?
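One way to check whether the backlog really is overflowing (my suggestion, not part of the original answer) is to look at the kernel's listen-queue counters and at the backlog of the listening sockets while the server is choking:

# Cumulative count of connection attempts dropped because a listen queue was full
netstat -s | grep -i listen

# For LISTEN sockets, Recv-Q is the current accept-queue depth and Send-Q is its configured maximum
ss -ltn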
Oh, and you might also need to consider the possibility that it is the external web service that is limiting the number of connections it will serve you to 300, so it makes no difference how much you tune concurrency on your front side if practically every connection you make depends on an external web service's response.
In one of your comments you mentioned that the data goes stale after 2 minutes. I'd suggest caching the response you get from this service for two minutes to reduce the number of concurrent connections you are driving to the external web service.
The first step in troubleshooting this is to enable Apache's mod_status and study its report; until you have done that, you are actually walking blindly. That's not righteous. 😉
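As an example (the access restriction is an assumption; adjust it to your network), a minimal mod_status setup in Apache 2.2 syntax looks roughly like this:

# httpd.conf fragment: expose the worker scoreboard at /server-status
ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1        # then view it with: curl http://localhost/server-status
</Location>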
The second thing to mention (I myself dislike being told answers to questions I was not asking, but ...) is to use a more efficient, specialized front-end server like nginx.
Also, did you actually restart Apache, or just gracefully reload it? 🙂
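For reference, the difference on a typical Apache 2.2 install (the control script may be called apache2ctl on Debian/Ubuntu):

# Graceful: re-reads the configuration; children finish their current request before being replaced
apachectl graceful

# Restart: re-reads the configuration; children are killed immediately (the parent process stays up)
apachectl restart

# Full stop/start: the whole server, parent process included, is taken down and brought back up
apachectl stop && apachectl start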