Skip to content
  1. Jun 09, 2012
  2. Jun 08, 2012
    • Kohsuke Kawaguchi's avatar
      do not approximate the aggregated test result. · 07c09beb
      Kohsuke Kawaguchi authored
      AggregatedTestResultAction remembers the exact build # of the child build, so honor that number, and don't let MatrixBuild.getRunFor(Configuration) picks up some earlier run.
      
      The problem was discovered in the following situation:
      
       - there's a matrix project M with single configuration X
       - M #868 run normally, with 69 failures.
       - M #869 run with 258 failures, and that was reported in MatrixTestResult.add(AbstractTestResultAction), updating  the total count.
       - but for whatever reason, X #869 is lost
       - now if you look at M #869 test report, you'll see 258 as the total count, but the detail drill down would show 69, from X #868.
      
      ----
      (05:25:57 PM) veebers: Hi all, I  have an odd issue with the 'test results'
      (05:26:12 PM) veebers: In this result: Test Result (258 failures / +189) where could the +189 come from?
      (05:26:20 PM) veebers: I can only actually see 69 failed tests :P
      (05:26:48 PM) kohsuke: it means comparing to the last test result, you got 189 more failure
      (05:27:37 PM) veebers: ah ok, but how can I have +189 failing tests when there are only 69 failed this time around?
      (05:28:21 PM) kohsuke: I wonder if it cuts off the summary display at 70 (out of your 258 failures)?
      (05:29:48 PM) veebers: Hmmm, kohsuke I don't know about that. Clicking the 'show all failed tests'  only shows 69 tests :P
      (05:30:22 PM) veebers: and when I look at a prev. failing test, one that has ~200 fails
      (05:30:24 PM) veebers: it shows the lot
      (05:31:07 PM) kohsuke: veebers: want to do the screenshot?
      (05:34:59 PM) veebers: kohsuke: sure: http://static.inky.ws/image/2143/image.jpg
      (05:35:40 PM) kohsuke: this is a matrix project that just has one configuration?
      (05:35:49 PM) kohsuke: Or is it a partial re-run?
      (05:35:53 PM) mode (+o abayer) by ChanServ
      (05:36:46 PM) veebers: first option. Only one config
      (05:37:56 PM) kohsuke: I wonder if this is some kind of rendering gritch
      (05:38:06 PM) kohsuke: perhaps if somehow it's showing the data of the previous run?
      (05:38:24 PM) kohsuke: 69+189=258 can't be a coincidence
      (05:38:53 PM) kohsuke: is this instance publicly visible?
      (05:39:25 PM) veebers: err not that one, just checking if the results are published to the public one
      (05:41:16 PM) veebers: hmm, that's odd. The publicly avail. one isn't showing the right details :P
      (05:41:24 PM) kohsuke: URL?
      (05:41:26 PM) veebers: well, perhaps there is something screwy with that setup
      (05:42:14 PM) veebers: kohsuke: https://jenkins.qa.ubuntu.com/job/dx-autopilot-run/869/
      (05:43:33 PM) veebers: kohsuke: although I don't think that's going to be much help as that gives me a 500 error :P
      (05:43:45 PM) kohsuke: that's a bug of its own
      (05:43:51 PM) kohsuke: I think I can fix that one
      (05:50:29 PM) kohsuke: veebers: I think I have a hypothesis
      (05:51:00 PM) kohsuke: when you do click that 69 failed tests reported under a configuration, I suspect you'll be actually taken to the different build number
      (05:51:23 PM) kohsuke: likely the same configuration build of the previous one
      (05:53:02 PM) veebers: kohsuke: hmm, not sure that's it.
      (05:53:10 PM) veebers: it's the same build number all over
      (05:53:53 PM) kohsuke: I'm pretty sure that's it
      (05:54:01 PM) kohsuke: let me think of other ways to prove this
      (05:54:27 PM) veebers: sure
      (05:54:36 PM) kohsuke: try /job/dx-autopilot-run/869/label=1EBEE0FF-DAC9-11DF-BBDA-64A98C34D485/parentBuild/
      (05:54:43 PM) kohsuke: and I predict you'll see #868 in the UI
      (05:54:45 PM) veebers: so, the public page there says: Test Result (258 failures / +189)
      (05:55:16 PM) veebers: When I click that link in the private server it takes me to that report you saw in the screenshot
      (05:55:43 PM) veebers: which says: .../869/testReport/
      (05:55:52 PM) veebers: and any other link on that page references 869
      (05:56:04 PM) veebers: so perhaps it's somewhere in the background that might be accessing a prev build?
      (05:56:36 PM) veebers: i.e. whatever builds up the top graph bar (at the top of the page) in server end rendering?
      (05:56:50 PM) kohsuke: does /job/dx-autopilot-run/869/ show gray ball?
      (05:57:27 PM) veebers: no, it shows the default config
      (05:57:37 PM) veebers: (yeah, something is wrong with the public facing one :P)
      (05:59:29 PM) veebers: not sure why you can't click through
      (06:00:40 PM) veebers: kohsuke: ah, looks like you are correct
      (06:00:50 PM) veebers: I was looking in the wrong place
      (06:01:01 PM) veebers: kohsuke: d'oh, thanks for pointing that out :)
      (06:01:41 PM) kohsuke: OK, I understand what happened
      (06:02:38 PM) veebers: so yeah, clicking on that config, I can see the line: Started by upstream project dx-autopilot-run build number 868
      (06:04:37 PM) kohsuke: #869 really did have 258 failures, but for some reason Jenkins lost the record of it
      (06:05:02 PM) kohsuke: and when it's displaying this to you, it's approximating the lost record by using the nearest build, that was #868
      (06:05:20 PM) kohsuke: and that's why the public instance, which runs ancient version of Jenkins, reports NPE
      (06:05:30 PM) kohsuke: this version doesn't have that approximation feature
      (06:05:42 PM) kohsuke: veebers: BTW you really should upgrade that publicly facing instance
      (06:06:00 PM) kohsuke: there have been some security advisories issued since then
      (06:06:11 PM) veebers: kohsuke: Makes sense, will get onto that
      (06:06:30 PM) kohsuke: I'm fixing this in the trunk so that it doesn't incorrectly approximate here
      (06:06:36 PM) veebers: nice
      07c09beb
    • Kohsuke Kawaguchi's avatar
      fixed NPE · d8d53053
      Kohsuke Kawaguchi authored
      d8d53053
  3. Jun 07, 2012
  4. Jun 06, 2012
  5. Jun 05, 2012
  6. Jun 04, 2012
    • Kohsuke Kawaguchi's avatar
      Make data consistency self-healing. · 0e12fe4e
      Kohsuke Kawaguchi authored
      Ryan discovered an instance where Label cached data at incorrect state. I'm not sure exactly how this is caused, but it probably is some kind of data race.
      
      Making it self-healing would hopefully eliminate this problem in practice.
      0e12fe4e
  7. Jun 03, 2012
Loading