Each scheduler RPC reports results, gets work, or both. The client's scheduler RPC policy has several components: when to make a scheduler RPC, which project to contact, which scheduling server for that project, how much work to ask for, and what to do if the RPC fails.
The scheduler RPC policy has the following goals:
The client maintains an exponentially-averaged sum of the CPU time it has devoted to each project. The constant EXP_DECAY_RATE determines the decay rate (currently a factor of e every week).
Each project is assigned a resource debt, computed as
resource_debt = resource_share / exp_avg_cpu
Resource debt is a measure of how much work the client owes the project, and in general the project with the greatest resource debt is the one from which work should be requested.
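As a rough illustration of the debt calculation described above, the following Python sketch maintains an exponentially averaged CPU-time sum and selects the project with the greatest resource debt. The class and function names, the expression of EXP_DECAY_RATE in units of 1/seconds, and the treatment of a project with no accumulated CPU time as having infinite debt are assumptions made for this example; they are not taken from the client source.

```python
import math
from dataclasses import dataclass

# Assumed: "a factor of e every week", expressed as a rate per second.
SECONDS_PER_WEEK = 7 * 24 * 3600
EXP_DECAY_RATE = 1.0 / SECONDS_PER_WEEK


@dataclass
class Project:
    name: str
    resource_share: float
    exp_avg_cpu: float = 0.0  # exponentially averaged CPU time, in seconds


def update_exp_avg_cpu(project, cpu_time, elapsed):
    """Decay the running sum over `elapsed` seconds, then add new CPU time."""
    project.exp_avg_cpu *= math.exp(-EXP_DECAY_RATE * elapsed)
    project.exp_avg_cpu += cpu_time


def resource_debt(project):
    """resource_share / exp_avg_cpu, as defined in the text."""
    if project.exp_avg_cpu == 0:
        # Assumption: a project that has received no CPU time is owed the most.
        return math.inf
    return project.resource_share / project.exp_avg_cpu


def greatest_debt(projects):
    """Return the project from which work would normally be requested."""
    return max(projects, key=resource_debt)
```

With equal resource shares, the project that has recently received less CPU time has the larger debt and is chosen first.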
The client maintains a minimum RPC time for each project. This is the earliest time at which a scheduling RPC should be done to that project (if zero, an RPC can be done immediately). The minimum RPC time can be set for various reasons:
Communication with schedulers is organized into sessions, each of which may involve many RPCs. There are two types of sessions:
get_work_session() {
    while estimated work < high water mark
        P = project with greatest debt and min_rpc_time < now
        for each scheduler URL of P
            attempt an RPC to that URL
            if no error break
        if some RPC succeeded
            P.nrpc_failures = 0
        else
            P.nrpc_failures++
            P.min_rpc_time = exponential_backoff(P.nrpc_failures)
            if P.nrpc_failures mod MASTER_FETCH_PERIOD = 0
                P.fetch_master_flag = true
    for each project P with P.fetch_master_flag set
        read and parse master file
        if error
            P.nrpc_failures++
            P.min_rpc_time = exponential_backoff(P.nrpc_failures)
        if got any new scheduler urls
            P.nrpc_failures = 0
            P.min_rpc_time = 0
}

report_result_session(project P) {
    for each scheduler URL of project
        attempt an RPC to that URL
        if no error break
    if some RPC succeeded
        P.nrpc_failures = 0
    else
        P.nrpc_failures++
        P.min_rpc_time = exponential_backoff(P.nrpc_failures)
}

The logic for initiating scheduler sessions is expressed in the following poll function:
if a scheduler RPC session is not active
    if estimated work is less than low-water mark
        start a get-work session
    else if some project P has overdue results
        start a report-result session for P;
        if P is the project with greatest resource debt,
        the RPC request should ask for enough work to bring us up
        to the high-water mark
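The per-project failure handling and the poll decision can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not define exponential_backoff, so the doubling delay with a cap, the constants BACKOFF_BASE and BACKOFF_MAX, the value of MASTER_FETCH_PERIOD, and the attempt_rpc callback are all hypothetical. The try_schedulers function follows the get-work variant of the failure handling, which also sets the master-fetch flag.

```python
from dataclasses import dataclass

# Assumed values; the text does not specify the backoff parameters.
BACKOFF_BASE = 60.0        # seconds of delay after the first failure
BACKOFF_MAX = 4 * 3600.0   # upper bound on the delay, in seconds
MASTER_FETCH_PERIOD = 10   # failures between master-file fetches


@dataclass
class Project:
    scheduler_urls: list
    nrpc_failures: int = 0
    min_rpc_time: float = 0.0      # 0 means an RPC may be done immediately
    fetch_master_flag: bool = False


def exponential_backoff(nfailures, now):
    """Earliest time of the next RPC: the delay doubles per failure, up to a cap."""
    exponent = min(max(nfailures - 1, 0), 30)  # bounded to avoid huge powers
    return now + min(BACKOFF_BASE * 2 ** exponent, BACKOFF_MAX)


def try_schedulers(project, attempt_rpc, now):
    """Try each scheduler URL in turn, then update the failure bookkeeping."""
    succeeded = False
    for url in project.scheduler_urls:
        if attempt_rpc(url):       # hypothetical callback: True on success
            succeeded = True
            break
    if succeeded:
        project.nrpc_failures = 0
    else:
        project.nrpc_failures += 1
        project.min_rpc_time = exponential_backoff(project.nrpc_failures, now)
        if project.nrpc_failures % MASTER_FETCH_PERIOD == 0:
            project.fetch_master_flag = True
    return succeeded


def poll(session_active, estimated_work, low_water_mark, overdue_project):
    """Decide which session to start, if any; returns a (kind, project) pair."""
    if session_active:
        return (None, None)
    if estimated_work < low_water_mark:
        return ("get_work", None)
    if overdue_project is not None:
        return ("report_result", overdue_project)
    return (None, None)
```

Passing the RPC attempt in as a callback keeps the bookkeeping separate from the network code, so the backoff behavior can be exercised without contacting a real scheduler.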