Commit 3e03a812 authored by Kohsuke Kawaguchi

This might be causing a slowdown during artifact copy:

(02:01:53 PM) jorgenpt: kohsuke: How are JNLP slaves connected to the master?
(02:02:09 PM) kohsuke: by making a TCP/IP connection
(02:02:21 PM) jorgenpt: kohsuke: I'm seeing pretty crappy performance when copying artifacts to a JNLP slave, but on the same machine, copying artifacts to the master from an SSH slave seems very fast.
(02:03:00 PM) kohsuke: it's possible that it fell back to HTTP connection
(02:03:06 PM) kohsuke: that can make things slower
(02:03:33 PM) jorgenpt: kohsuke: How can I tell?
(02:03:36 PM) jorgenpt: Nothing in the slave log
(02:04:03 PM) kohsuke: Yeah, I should log it
(02:04:09 PM) kohsuke: in the mean time, server thread dump can tell
(02:04:34 PM) kohsuke: or maybe there's some lack of buffering somewhere with TCP/IP connection of JNLP slaves
(02:04:38 PM) jorgenpt: What does that mean?
(02:04:50 PM) kohsuke: that means give me the server thread dump and I can tell you
(02:04:56 PM) kohsuke: http://yourjenkinsserver/threadDump/
(02:05:20 PM) jorgenpt: Ah, cool
(02:07:26 PM) jorgenpt: Jesus, gist.github.com is silly slow
(02:07:53 PM) jorgenpt: kohsuke: http://pastebin.com/LaszVA11
(02:08:45 PM) kohsuke: I mean the server VM not the slave VM
(02:08:50 PM) mode (+o hare_brain) by ChanServ
(02:09:29 PM) jorgenpt: Ah
(02:13:14 PM) jorgenpt: kohsuke: http://pastebin.com/ydn8J8Tw
(02:15:59 PM) kohsuke: I don't see any JNLP slaves connected here
(02:16:01 PM) kohsuke: what's the name of it?
(02:16:34 PM) jorgenpt: osx-build_gui
(02:16:58 PM) kohsuke: OK, I missed that. It's connected via TCP/IP so it's on the fast path
(02:17:03 PM) jorgenpt: Ok
(02:17:35 PM) kohsuke: that probably means there's some stupid things going on with this connection mode
(02:17:54 PM) kohsuke: maybe this is why other people often complain about the artifact copy speed
(02:18:04 PM) kohsuke: maybe it's only on certain connection mode
(02:18:09 PM) jorgenpt: Where's the code that's responsible for setting up the socket on the slave?
(02:18:31 PM) jorgenpt: It wasn't a problem until the builder and the server were at different sites
(02:18:43 PM) jorgenpt: Using iperf I still get 80-90mbit/s from the slave to the master
(02:18:57 PM) kohsuke: hudson.remoting.Launcher.runAsTcpClient()
(02:18:59 PM) jorgenpt: But Jenkins with JNLP only gets a few hundred kilobyte/s copying
(02:19:30 PM) kohsuke: Oh, don't I need TCP_NODELAY?
(02:19:41 PM) jorgenpt: In https://github.com/jenkinsci/remoting?
(02:19:46 PM) kohsuke: yes
(02:20:19 PM) jorgenpt: I don't think that should impact it when we're transmitting ~40MB
(02:20:32 PM) jorgenpt: I think TCP_NODELAY makes a difference when you're only sending a little bit of data
(02:20:39 PM) jorgenpt: (http://www.unixguide.net/network/socketfaq/2.16.shtml)
(02:21:44 PM) mkoch: hello, is https://wiki.jenkins-ci.org/display/JENKINS/Building+Jenkins up-to-date? I ask because I just built jenkins from source and after "mvn hudson-dev:run" I get only a 404 on http://localhost:8080
(02:22:05 PM) kohsuke: That 40MB stuff gets chopped up into a chunk by the remoting layer with TCP-like windowing and other stuff
(02:22:26 PM) kohsuke: It doesn't translate into 40MB write syscall
(02:22:45 PM) kohsuke: I think it's worth a try especially if you are willing to try it out
(02:23:08 PM) kohsuke: otherwise I can't think of anything obvious.
(02:23:20 PM) kohsuke: both server & client are buffering as can be seen in the stack trace
(02:23:28 PM) kohsuke: and otherwise it's a fairly plain socket connection
(02:24:16 PM) jorgenpt: Is there any way to set these things without recompiling etc?
(02:24:52 PM) kohsuke: we do need to recompile I'm afraid
(02:25:09 PM) jorgenpt: Our networking guy is looking at the tcpdump now
(02:25:18 PM) jorgenpt: Let me wait until he gets back to me to see what we can do
(02:25:22 PM) jorgenpt: I wish I didn't have to use JNLP :/
(02:25:25 PM) kohsuke: sounds good
(02:25:33 PM) jorgenpt: But I need Java running in a context where it can start GUI applications (ios-sim)
(02:27:05 PM) mkoch: What is the easiest way to test changes the CLI stuff?
(02:28:25 PM) kohsuke: mkoch: that doc should be up to date
(02:28:38 PM) kohsuke: what does "mvn hudson-dev:run" report?
(02:28:58 PM) mkoch: it starts the jetty server as expected
(02:30:01 PM) mkoch: I built the WAR with mvn -Plight-tests clean install
(02:30:20 PM) mkoch: before running hudson-dev:run
(02:30:32 PM) kohsuke: and no error message reported of any kind?
(02:30:44 PM) kohsuke: maybe you are running "hudson-dev:run" from top, not from war/ ?
(02:30:45 PM) mkoch: so I assume I followed everything in the docs
(02:30:48 PM) jorgenpt: Ho, hum.
(02:30:53 PM) jorgenpt: I guess I don't actually need JNLP.
(02:31:05 PM) jorgenpt: I just need the computer to have been logged in once.
(02:32:38 PM) mkoch: gah, mistyped "cd war" and didn't notice. Thanks for the hint.
(02:32:52 PM) mkoch: now I can finish that patch
(02:33:32 PM) kohsuke: I should have "mvn hudson-dev:run" on top fail more gracefully
(02:33:37 PM) kohsuke: I've made that mistake a number of times myself
(02:33:53 PM) kohsuke: jorgenpt: our fixing the performance problem for JNLP slaves help other users too
(02:34:08 PM) kohsuke: from my PoV, it'd be great to get this solved properly than you work around it
(02:34:18 PM) kohsuke: (and yes, I know that's not the goal you have)
(02:34:23 PM) jorgenpt: :-)
(02:34:46 PM) jorgenpt: I don't really like JNLP to begin with, because it keeps failing when the SSL certs change or we move the box :p
(02:34:51 PM) jorgenpt: Too fragile
(02:37:19 PM) mode (+v olamy) by ChanServ
(02:37:27 PM) mode (+v buep) by ChanServ
(02:38:13 PM) jorgenpt: kohsuke: Hm, I'm not sure it's JNLP related
(02:38:20 PM) jorgenpt: It seems more like it's related to the direction of the traffic
(02:38:25 PM) jorgenpt: It could be specific to our network
(02:38:32 PM) kohsuke: ok
(02:38:34 PM) jorgenpt: Copying *from* master = fast, copying *to* master = slow.
(02:38:39 PM) jorgenpt: Err
(02:38:41 PM) jorgenpt: Reverse that
(02:38:46 PM) jorgenpt: Copying *from* master is the slow one.
(02:40:37 PM) jorgenpt: Maybe it's related to the fingerprinting stuff?
(02:41:19 PM) kohsuke: that's not impossible, but I find it difficult to believe that md5 computation is slower than network transfer
(02:42:57 PM) jorgenpt: Yeah~
(02:43:06 PM) jorgenpt: 3 minutes to copy 40MB over 100mbit
(02:45:40 PM) kohsuke: if I build a binary for you, would you be willing to try?
(02:45:58 PM) kohsuke: jorgenpt:  I can create a new binary for you quickly, but for me to properly set up a test environment would take far longer
(02:46:18 PM) jorgenpt: What does the new binary do?
(02:46:23 PM) jorgenpt: If it's slave-only, then yes
(02:46:28 PM) jorgenpt: If it requires restarting the master, then no.
(02:46:39 PM) kohsuke: the latter, because TCP_NODELAY needs to be applied on both ends
(02:46:54 PM) kohsuke: all right, what if I put this in now, and we come back to this in the next update?
(02:47:19 PM) jorgenpt: Sure
(02:49:15 PM) kohsuke: any other socket options that I could be using while we are at it?
(02:50:08 PM) jorgenpt: No idea :)
(02:50:21 PM) kohsuke: looking at jetty
(02:50:30 PM) kohsuke: it's using nodelay, so_timeout and soLinger
(02:50:39 PM) jorgenpt: Hrm, I guess we updated Jenkins a bunch of versions at the same time as moving it to a different site
(02:50:47 PM) jorgenpt: So it could be related to Jenkins versions too :/
(02:51:21 PM) kohsuke: I don't think I need SO_LINGER
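
For reference, the TCP_NODELAY idea discussed in the log can be sketched in plain Java. This is a minimal illustration, not the actual change made in this commit: the class `NoDelayDemo` and its methods are hypothetical names, and it uses only the standard `java.net.Socket.setTcpNoDelay` call. It opens a loopback connection and enables the option on both ends, since the log notes that it has to be applied on both the master and the slave side.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; not part of hudson.remoting.
public class NoDelayDemo {

    /** Enables TCP_NODELAY (disables Nagle's algorithm) on one end of a connection. */
    static Socket configure(Socket s) throws IOException {
        s.setTcpNoDelay(true);
        return s;
    }

    /**
     * Opens a loopback connection, applies the option on both ends,
     * and returns the resulting flags as {client, server}.
     */
    static boolean[] connectWithNoDelay() throws IOException {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = configure(new Socket(listener.getInetAddress(), listener.getLocalPort()));
             Socket server = configure(listener.accept())) {
            return new boolean[] { client.getTcpNoDelay(), server.getTcpNoDelay() };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] flags = connectWithNoDelay();
        System.out.println("client TCP_NODELAY=" + flags[0] + ", server TCP_NODELAY=" + flags[1]);
    }
}
```

Whether this option actually helps depends on how the remoting layer chunks its writes; as the log says, the 40MB transfer is not a single large write, so small writes waiting on Nagle's algorithm are one plausible (unconfirmed) cause of the slowdown.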
parent 0f8d1a6d