[FIXED JENKINS-7813]
Fixed the throughput problem in master/slave communication. This fix addresses two independent problems.

One was in remoting. During a large sustained data transfer (such as artifact archiving or large test reports), the way we were doing flow control and ACK-ing was penalizing us badly. I improved the flow control algorithm in remoting 2.23, and also increased the advertised window size so that the transfer can saturate the available bandwidth even when latency is large. (Unless the reader side is excessively slow, this shouldn't increase memory consumption.)

The other fix was in trilead-ssh2, which is the SSH client implementation used by the ssh-slaves plugin. The buffer size for flow control was too small. I improved the way buffering is done to reduce the memory footprint when the reader closely follows the writer, then I increased the advertised window size. Again, this shouldn't increase memory consumption (in fact it will likely reduce it) unless the reader end gets abandoned.

On my simulated latency-injected network, the sustained transfer rate is now on par with scp. We win for smaller files because of the TCP slow start penalty that scp would incur, and we lose a bit as files get larger due to additional framing overhead.

If you have manually extracted slave.jar and placed it on slaves, you need to update those copies to 2.23 to see the performance benefits.
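
To illustrate why the advertised window size matters on a high-latency link: with window-based flow control, the sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by roughly window / RTT. The sketch below is only that back-of-the-envelope arithmetic, not the remoting or trilead-ssh2 code. The class name, method names, window sizes, and the 100 ms round-trip time are all hypothetical values chosen for illustration.

```java
public class WindowThroughput {
    /**
     * Upper bound on throughput (bytes/sec) when at most windowBytes
     * may be unacknowledged during one round trip of rttSeconds.
     */
    public static double maxThroughputBytesPerSec(long windowBytes, double rttSeconds) {
        return windowBytes / rttSeconds;
    }

    /**
     * Smallest window (the bandwidth-delay product, in bytes) that lets
     * a link of linkBytesPerSec stay full at the given round-trip time.
     */
    public static long windowToSaturate(double linkBytesPerSec, double rttSeconds) {
        return (long) Math.ceil(linkBytesPerSec * rttSeconds);
    }

    public static void main(String[] args) {
        double rtt = 0.1;                  // hypothetical 100 ms round trip
        long smallWindow = 32L * 1024;     // hypothetical small window
        long largeWindow = 1024L * 1024;   // hypothetical enlarged window

        System.out.printf("small window bound: %.0f bytes/sec%n",
                maxThroughputBytesPerSec(smallWindow, rtt));
        System.out.printf("large window bound: %.0f bytes/sec%n",
                maxThroughputBytesPerSec(largeWindow, rtt));

        // Window needed to fill a hypothetical 12.5 MB/sec (100 Mbit/sec) link
        System.out.printf("window to saturate: %d bytes%n",
                windowToSaturate(12_500_000.0, rtt));
    }
}
```

A larger advertised window only raises this ceiling. Memory is consumed only when the reader falls behind and data actually accumulates in the buffer, which is consistent with the note above that memory consumption should not grow unless the reader is slow or abandoned.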