Amazon Auto Scaling API for Job Servers
i have read pretty much the entire AWS Auto Scaling API documentation to understand how it works.
however, i'm still wondering (without having used the API yet, since i want to hear from someone who has first) whether the following scenario is viable.
say i've got a bunch of work servers set up within a group, each working on a job, and the time comes (i dunno, say average CPU greater than, or in my case less than, 80%) to scale up or down.
my main worry is the loss of an in-progress job. maybe it's better explained with an example:
- i start up 5 job servers with 5 jobs running on them
- a job finishes on one of them and fires the scale-down trigger in the Amazon API
- Amazon comes along to scale the group down
- i lose a job server that is still running a job (90% complete, and it has to start again)
with that in mind, am i better off using the Amazon spot instance / EC2 API and managing my own scaling, or am i missing something about how the Amazon API decides which servers to terminate?
to be honest, i'd rather scale on the amount of SQS messages waiting than on some health figure on the servers:
- for every 100 messages waiting, increase cluster capacity by 20%
but that doesn't seem viable either.
so is the AWS API not the right solution, or am i missing vital info about how it works?
thanks,
after searching around, i found there are two accepted ways to manage this with the API, or to manage jobs in general:
one method is to manipulate the health of the server directly from within the worker itself. quite a few sites do this, and it's effective: when the worker detects there are no more jobs for it, or detects redundancy in the system, it marks the server it's on as unhealthy. that way the Auto Scaling API comes along and automatically takes it down after a period of time.
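a minimal sketch of what that could look like with boto3 (the client setup and metadata lookup here are assumptions for illustration, not my exact setup):

```python
# Sketch: a worker marks its own instance as unhealthy once it runs out of
# work, so the Auto Scaling group takes it out of service.
import boto3
import urllib.request

def mark_self_unhealthy():
    # Ask the EC2 instance metadata service for this instance's ID.
    instance_id = urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ).read().decode()

    autoscaling = boto3.client("autoscaling")
    # Tell Auto Scaling the instance is unhealthy; it will be terminated
    # after the group's health-check handling kicks in.
    autoscaling.set_instance_health(
        InstanceId=instance_id,
        HealthStatus="Unhealthy",
        ShouldRespectGracePeriod=False,
    )
```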
with this method you then have a scale-up policy based on SQS queue size over a period of time (say, every 5 minutes the queue stays above 100 messages, add 2 servers; every 10 minutes it stays above 500, increase capacity by 50%). scaling down is handled by the worker code instead of by an active policy.
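a sketch of the scale-up side: a simple scaling policy plus a CloudWatch alarm on the queue depth. the group name, queue name and thresholds are made up for illustration:

```python
# Sketch: add 2 servers whenever the queue backlog stays over 100 for 5 minutes.
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# A simple scaling policy that adds 2 instances each time it fires.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="job-workers",
    PolicyName="scale-up-on-queue-depth",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=2,
    Cooldown=300,
)

# Fire the policy when the queue has held more than 100 visible
# messages for a full 5-minute period.
cloudwatch.put_metric_alarm(
    AlarmName="job-queue-backlog",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "job-queue"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```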
this method also works with 0-server clusters: you can take the cluster all the way down to no servers when it's not being used, which makes it quite cost effective.
advantages:
- easy to set up
- uses the AWS API's built-in functions
- probably the quickest to set up
- the AWS managed API handles the cluster size for you
disadvantages:
- hard to manage without using the full AWS API, i.e. when you bring up a new server you can't get its instance ID without doing a full API call that returns all the instance IDs (see the sketch after this list). there are other occasions where the AWS API gets in the way and makes life a little harder if you want an element of self-control over the cluster
- you're relying on Amazon to know what's best for your wallet, and relying on the Amazon API to scale correctly; an advantage for many, a disadvantage for some
- the worker must house some of the server-pool code, meaning the worker is not generic and can't instantly be moved to another cluster without a configuration change
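to illustrate that first disadvantage, here is a sketch of getting the instance IDs in a group by describing the whole group and picking them out of the response (the group name is made up):

```python
# Sketch: there is no "give me the ID of the server that just came up" call;
# you describe the group and pull the IDs out of the full response.
import boto3

autoscaling = boto3.client("autoscaling")

groups = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=["job-workers"]
)
instance_ids = [
    instance["InstanceId"]
    for group in groups["AutoScalingGroups"]
    for instance in group["Instances"]
]
print(instance_ids)
```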
with that in mind there's a second option: DIY. use the EC2 spot instance and on-demand instance APIs to make your own scaler based around your own custom rules. it's pretty simple to explain:
- you have a CLI script that, when run, starts, say, 10 servers
- you have a cronjob that, when it detects certain conditions being satisfied, takes servers down or brings more up (a rough sketch of this follows below)
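a rough sketch of that cronjob, assuming boto3; the queue URL, AMI, tags and thresholds are all illustrative, not from my real setup:

```python
# Sketch of the DIY cron script: check the queue backlog and launch or
# terminate tagged worker instances accordingly.
import boto3

sqs = boto3.client("sqs")
ec2 = boto3.client("ec2")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/job-queue"

def poll_and_scale():
    attrs = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

    # Workers are tagged so the script can find them again.
    running = ec2.describe_instances(
        Filters=[
            {"Name": "tag:role", "Values": ["job-worker"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    workers = [
        i["InstanceId"]
        for r in running["Reservations"]
        for i in r["Instances"]
    ]

    if backlog > 100 and len(workers) < 20:
        # Backlog building up: bring two more workers online
        # (on-demand here; spot requests would work the same way).
        ec2.run_instances(
            ImageId="ami-12345678",
            InstanceType="c5.large",
            MinCount=2,
            MaxCount=2,
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": "role", "Value": "job-worker"}],
            }],
        )
    elif backlog == 0 and workers:
        # No work left: your code, not Amazon, decides which servers go.
        ec2.terminate_instances(InstanceIds=workers[:2])

if __name__ == "__main__":
    poll_and_scale()
```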
advantages:
- easy and clean to manage from your end
- you can make generic workers
- the server pool can be started up to manage many clusters
- you can make your own rules, as complex as you like, pulling figures and metrics from AWS and using them with comparisons and time ranges to work out whether things should change
disadvantages:
- hard to do multi-region (not so bad for SQS, since SQS is single-region anyway)
- hard to deal with errors in region capacity and workload
- you must rely on your own servers' uptime and your own code to ensure the cronjob runs when it should, provisions servers when it should, and breaks them down when it should
so it really seems to be a battle of which end the user is more comfortable with. i'm still mulling over the two, and have created a small self-hosted server pooler that works for me, but at the same time i'm tempted to try and build on AWS' own API.
hope this helps people,
edit: note that either of these methods still requires a function on your side to predict how much you should bid, so you'll need to call the bid-history API for your spot type (EC2 instance type) and compute how much to bid.
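for example, a naive sketch of pulling recent spot price history for one instance type and padding the highest observed price (the instance type and padding factor are arbitrary):

```python
# Sketch: derive a naive bid from the last day of spot price history.
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2")

history = ec2.describe_spot_price_history(
    InstanceTypes=["c5.large"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
)

prices = [float(entry["SpotPrice"]) for entry in history["SpotPriceHistory"]]
# A naive bid: 20% above the highest price seen in the last day.
bid = max(prices) * 1.2
print(f"max observed {max(prices):.4f}, bidding {bid:.4f}")
```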
another edit: the best way to automatically detect redundancy in the system is to check the empty-receives metric on the SQS queue, i.e. the number of times workers have pinged the queue and received no response. it's quite effective if you use an exclusive lock in your app for the duration of the worker.
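a sketch of reading that metric from CloudWatch (the queue name and threshold are illustrative):

```python
# Sketch: sum NumberOfEmptyReceives over the last 15 minutes; a high number
# suggests the workers are mostly polling an empty queue.
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/SQS",
    MetricName="NumberOfEmptyReceives",
    Dimensions=[{"Name": "QueueName", "Value": "job-queue"}],
    StartTime=datetime.now(timezone.utc) - timedelta(minutes=15),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Sum"],
)

empty_receives = sum(point["Sum"] for point in stats["Datapoints"])
if empty_receives > 50:
    print("workers are mostly hitting an empty queue; consider scaling down")
```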