logging - When ColdFusion is maxing out the CPU, how do I find out what it's chewing/choking on?
I'm running CF 9.0.1 on Ubuntu on a "Medium" Amazon EC2 instance. CF has been seizing up intermittently (several times per day... but notably not isolated to hours of peak usage). At such times, running top gets me this (or similar):
    PID   USER   PR NI VIRT  RES  SHR S %CPU %MEM TIME+    COMMAND
    15855 wwwrun 20 0  1762m 730m 20m S 99.3 19.4 13:22.96 coldfusion9

So, it's consuming most of the server's resources. The following error has been showing up in cfserver.log in the lead-up to each seize-up:
    java.lang.RuntimeException: Request timed out waiting for an available thread to run. You may want to consider increasing the number of active threads in the thread pool.

If I run /opt/coldfusion9/bin/coldfusion status, I get:
    Pg/Sec  DB/Sec  CP/Sec  Reqs  Reqs   Reqs   AvgQ  AvgReq    AvgDB  Bytes   Bytes
    Now Hi  Now Hi  Now Hi  Q'ed  Run'g  TO'ed  Time  Time      Time   In/Sec  Out/Sec
    0   0   0   0   -1  -1  150   25     0      0     -1352560  0      0

In the administrator, under Server Settings > Request Tuning, the setting for Maximum number of simultaneous Template requests is 25. So this makes sense so far. I could just increase the thread pool to cover these sorts of load spikes. I could make it 200. (Which I did just now as a test.)
However, there's also this file: /opt/coldfusion9/runtime/servers/coldfusion/SERVER-INF/jrun.xml. And some of the settings in there appear to conflict. For example, it reads:
    <service class="jrunx.scheduler.SchedulerService" name="SchedulerService">
      <attribute name="bindToJNDI">true</attribute>
      <attribute name="activeHandlerThreads">25</attribute>
      <attribute name="maxHandlerThreads">1000</attribute>
      <attribute name="minHandlerThreads">20</attribute>
      <attribute name="threadWaitTimeout">180</attribute>
      <attribute name="timeout">600</attribute>
    </service>

Which a) has fewer active threads (what does this mean?), and b) has a max threads value that exceeds the simultaneous request limit set in the admin. So, I'm not sure. Are these independent configs that need to be made to match manually? Or is the jrun.xml file supposed to be written by the CF admin when changes are made there? Hmm. But maybe this one is different because presumably the CF scheduler should use a subset of all available threads, right?... so we'd always have some threads for real live users? We also have this in there:
    <service class="jrun.servlet.http.WebService" name="WebService">
      <attribute name="port">8500</attribute>
      <attribute name="interface">*</attribute>
      <attribute name="deactivated">true</attribute>
      <attribute name="activeHandlerThreads">200</attribute>
      <attribute name="minHandlerThreads">1</attribute>
      <attribute name="maxHandlerThreads">1000</attribute>
      <attribute name="mapCheck">0</attribute>
      <attribute name="threadWaitTimeout">300</attribute>
      <attribute name="backlog">500</attribute>
      <attribute name="timeout">300</attribute>
    </service>

This appears to have changed when I changed the CF admin setting... maybe... but it's the activeHandlerThreads that matches my new maximum simultaneous requests setting, rather than the maxHandlerThreads, which again exceeds it. Finally, we have this:
    <service class="jrun.servlet.jrpp.JRunProxyService" name="ProxyService">
      <attribute name="activeHandlerThreads">200</attribute>
      <attribute name="minHandlerThreads">1</attribute>
      <attribute name="maxHandlerThreads">1000</attribute>
      <attribute name="mapCheck">0</attribute>
      <attribute name="threadWaitTimeout">300</attribute>
      <attribute name="backlog">500</attribute>
      <attribute name="deactivated">false</attribute>
      <attribute name="interface">*</attribute>
      <attribute name="port">51800</attribute>
      <attribute name="timeout">300</attribute>
      <attribute name="cacheRealPath">true</attribute>
    </service>

So, I'm not sure which (if any) of these I should change, or what the relationship is between maximum requests and maximum threads. Also, since several of these list maxHandlerThreads as 1000, I'm wondering if I should just set the maximum simultaneous requests to 1000. There must be some upper limit that depends on available server resources... but I'm not sure what it is, and I don't really want to play around with it since it's a production environment.
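To compare the thread settings of the three services side by side, a small script can pull them out of jrun.xml. This is only a rough sketch: the function name `handler_thread_settings` is made up for illustration, attribute names are matched case-insensitively, and it only reads the file, so it says nothing about which of these values the CF admin rewrites.

```python
import xml.etree.ElementTree as ET

# Attribute names as they appear in the <service> blocks above (compared in lower case).
THREAD_ATTRS = ("activehandlerthreads", "minhandlerthreads", "maxhandlerthreads")


def handler_thread_settings(xml_text):
    """Return {service name: {attribute: int}} for services that set handler threads."""
    root = ET.fromstring(xml_text)
    settings = {}
    for service in root.iter("service"):
        values = {}
        for attr in service.findall("attribute"):
            name = (attr.get("name") or "").lower()
            if name in THREAD_ATTRS and attr.text:
                values[name] = int(attr.text.strip())
        if values:
            settings[service.get("name")] = values
    return settings
```

Reading the file in binary mode and passing the bytes to `handler_thread_settings` lets the XML parser deal with whatever encoding the file declares.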
I'm not sure if it pertains to this issue at all, but when I run ps aux | grep coldfusion I get the following:
    wwwrun 15853 0.0  0.0    8704    760 pts/1 S  20:22 0:00 /opt/coldfusion9/runtime/bin/coldfusion9 -jar jrun.jar -autorestart -start coldfusion
    wwwrun 15855 5.4 18.2 1678552 701932 pts/1 Sl 20:22 1:38 /opt/coldfusion9/runtime/bin/coldfusion9 -jar jrun.jar -start coldfusion

There are always these two and never more than these two processes. So there does not appear to be a one-to-one relationship between processes and threads. I recall from an MX 6.1 install that I maintained for many years that additional CF processes were visible in the process list. It seemed to me at the time that I had a process for each thread... so either I was wrong or things are quite different in version 9, since it's reporting 25 running requests and only showing these two processes. If a single process can have multiple threads in the background, then I'm given to wonder why I have two processes instead of one?... just curious.
So, anyway, I've been experimenting while composing this post. As noted above, I adjusted the maximum simultaneous requests up to 200. I was hoping this would solve my problem, but CF just crashed again (or rather slogged down, and requests started timing out... so effectively "crashed"). This time, top looked similar (still consuming more than 99% of the CPU), but CF status looked different:
    Pg/Sec  DB/Sec  CP/Sec  Reqs  Reqs   Reqs   AvgQ  AvgReq    AvgDB  Bytes   Bytes
    Now Hi  Now Hi  Now Hi  Q'ed  Run'g  TO'ed  Time  Time      Time   In/Sec  Out/Sec
    0   0   0   0   -1  -1  0     150    0      0     0         0      0       0

Obviously, since I'd increased the maximum simultaneous requests, it's allowing more requests to run simultaneously... but it's still maxing out the server resources.
Further experiments (after restarting CF) showed me that the server became unusably slogged after about 30-35 "Reqs Run'g", with all additional requests headed for an inevitable timeout:
    Pg/Sec  DB/Sec  CP/Sec  Reqs  Reqs   Reqs   AvgQ  AvgReq    AvgDB  Bytes   Bytes
    Now Hi  Now Hi  Now Hi  Q'ed  Run'g  TO'ed  Time  Time      Time   In/Sec  Out/Sec
    0   0   0   0   -1  -1  0     33     0      0     -492      0      0       0

So, it's clear that increasing the maximum simultaneous requests has not helped. I guess what it comes down to is this: What is it having such a hard time with? Where are these spikes coming from? Bursts of traffic? On what pages? What requests are running at any given time? I guess I simply need more information to continue troubleshooting. If there are long-running requests, or other issues, I'm not seeing them in the logs (although I do have that option checked in the admin). I need to know which requests are responsible for these spikes. Any help would be much appreciated. Thanks.
~day
I've had a number of 'high-CPU in production' type bugs, and the way I've always dealt with them is this:

1. Use jstack PID >> stack.log to dump 5 stack traces, 5 seconds apart. The number of traces and the timing are not critical.
2. Open the log in Samurai. You get a view of the threads at each time you did a dump. Threads processing your code start with web- (for requests using the built-in server) and jrpp- (for requests coming in through Apache/IIS).
3. Read the history of each thread. You're looking for the stack being very similar in each dump. If a thread looks like it's handling the same request the whole time, the bits that vary near the top will point to where an infinite loop is happening.
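If you'd rather not sit there re-running jstack by hand, the dump step can be scripted. Here is a minimal sketch, assuming jstack (it ships with the JDK) is on the PATH and the script runs as the user that owns the JVM. The names `collect_thread_dumps`, `run_jstack` and `sleep` are my own, not part of any tool; the last two are only parameters so the loop can be tried without a live JVM.

```python
import subprocess
import time


def collect_thread_dumps(pid, count=5, interval=5.0, out_path="stack.log",
                         run_jstack=None, sleep=time.sleep):
    """Append `count` jstack dumps of `pid` to `out_path`, `interval` seconds apart."""
    if run_jstack is None:
        def run_jstack(target_pid):
            # Same as running `jstack PID` in a shell and capturing its output.
            result = subprocess.run(["jstack", str(target_pid)],
                                    capture_output=True, text=True, check=True)
            return result.stdout

    with open(out_path, "a") as log:
        for i in range(count):
            # Raw jstack output is appended unchanged, like `jstack PID >> stack.log`.
            log.write(run_jstack(pid))
            log.flush()
            if i < count - 1:
                sleep(interval)
    return out_path
```

Something like `collect_thread_dumps(15855)` while top shows the process pinned would give you the same stack.log as the manual approach.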
Feel free to dump the stack trace somewhere online and point us to it.
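If Samurai isn't to hand, the "same stack in every dump" comparison can be roughed out in a few lines. This is a sketch under assumptions, not a replacement for reading the traces: it assumes the usual jstack layout (a "Full thread dump" line per dump, thread headers starting with a quoted name, frames starting with "at"), the function names are invented, and `min_shared` is an arbitrary threshold. Idle pool threads also look identical across dumps, so treat the output as a shortlist.

```python
REQUEST_PREFIXES = ("web-", "jrpp-")


def parse_dumps(text):
    """Split jstack output into a list of {thread name: [frames, top first]} dicts."""
    dumps, current, frames = [], None, None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Full thread dump"):
            current, frames = {}, None
            dumps.append(current)
        elif current is not None and line.startswith('"'):
            # Thread header: the name is the text between the first pair of quotes.
            frames = current.setdefault(line.split('"')[1], [])
        elif frames is not None and stripped.startswith("at "):
            frames.append(stripped[3:])
    return dumps


def shared_bottom(stacks):
    """Frames common to every stack, compared from the outermost caller upwards."""
    shared = []
    for frames in zip(*(reversed(stack) for stack in stacks)):
        if len(set(frames)) != 1:
            break
        shared.append(frames[0])
    return shared[::-1]


def persistent_request_threads(text, min_shared=5):
    """Request threads present in every dump whose lower stack never changes."""
    dumps = parse_dumps(text)
    if not dumps:
        return {}
    names = set(dumps[0])
    for dump in dumps[1:]:
        names &= set(dump)
    report = {}
    for name in sorted(names):
        if not name.startswith(REQUEST_PREFIXES):
            continue
        stacks = [dump[name] for dump in dumps]
        shared = shared_bottom(stacks)
        if len(shared) >= min_shared:
            # The distinct top frames are the "bits that vary near the top".
            tops = sorted({stack[0] for stack in stacks if stack})
            report[name] = {"shared": shared, "tops": tops}
    return report
```

For each thread it reports, "shared" is the part of the stack that stayed put and "tops" is the set of innermost frames seen across the dumps, which is where I'd start looking for the loop.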
The other technique I've used to understand what's going on is to modify Apache's httpd.conf to log the time taken (%D, in microseconds) and record the session ID from the cookie (%{jsessionid}C). This allows you to trace individual users in the run-up to the hangs and get some nice stats/graphs from the data (I use LogParser to crunch the numbers and output CSV, followed by Excel to graph the data):

    LogFormat "%h %l %u %t \"%r\" %>s %b %D %{jsessionid}C" customAnalysis
    CustomLog logs/analysis_log customAnalysis

One other technique I've just remembered is to enable CF metrics, which gives you some measure of what the server is up to in the run-up to a hang. I set this to log every 10 seconds and change the format to CSV, so I can grep the metrics out of the event log and run them through Excel to graph the server load in the run-up to the crashes.
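Going back to the Apache analysis log: if LogParser and Excel aren't available, the same number crunching can be sketched in a short script. This assumes each line ends with the request time in microseconds followed by the session ID (or "-" when there is no cookie), as in the customAnalysis format; the function names are invented for the example, and requests containing escaped quotes are simply skipped.

```python
import re
from collections import defaultdict

# host ident user [time] "request" status bytes microseconds session
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) (?P<micros>\d+) (?P<session>\S+)$'
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match the format."""
    match = LINE.match(line.strip())
    if not match:
        return None
    record = match.groupdict()
    record["seconds"] = int(record.pop("micros")) / 1_000_000
    return record


def slowest(lines, n=10):
    """The n longest-running requests, slowest first."""
    records = [r for r in map(parse_line, lines) if r]
    return sorted(records, key=lambda r: r["seconds"], reverse=True)[:n]


def time_by_session(lines):
    """Total request time in seconds per session ID."""
    totals = defaultdict(float)
    for record in filter(None, map(parse_line, lines)):
        totals[record["session"]] += record["seconds"]
    return dict(totals)
```

Running `slowest` over the lines logged in the few minutes before a hang should show which pages were taking the time, and `time_by_session` shows whether one user (or bot) accounts for most of it.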
barny