Hey Thomas,
Right, this should be an open discussion on ros-dev, as we are (or should be ?) all impacted by the test results or the lack thereof.
See inline for my answers.
Le 16/02/2015 10:56, Thomas Faber a écrit :
On 2015-02-16 10:27, zefklop (JIRA) wrote:
zefklop commented on ROSTESTS-153:
This and the advapi32:service disabled test call for (optional) per-test timeout parameter in rostests or testman.We can't afford skipping tests, that's an open door to all kind of regressions.
I agree that skipping tests is a really bad idea. It should be used as a last resort only. And we need a proper process for it** -- I've been spending way too much time hunting down skipped tests recently, and way too many of them had their underlying bugs (and sometimes even Jira issue) fixed without re-enabling the test.
However I don't see a practical way to implement your solution. For one, the timeout seems like it would have to cause a machine reboot, which significantly increases test time. I guess rosautotest could keep track of the time and kill the child process, but I don't know how reliable that would be.
I think the whole point here is "test hung" vs "ReactOS hung". If time is the matter here, one could imagine that test bot could communicate with rosautotests with a socket or COM2 polling for activity. If rosautotests is dead, this means that something bad is happening --> reboot. This would be way faster than waiting x seconds for debug spam to happen. I don't see why rosautotest keeping track of its child process would be unreliable. As far as I know, even wine tests do this all the time.
Secondly and more importantly, killing the test on timeout seems worse than skipping the offending part, because then any tests running _after_ the code causing the timeout would never be executed. That means instead of skipping a test that's known to be problematic, we skip completely unrelated and innocent tests.
Well, now I'm confused regarding the reason why you skipped the tests. I thought this was because the debug spam (or the wait call in the advapi32:service case) took too long, and then test bot considered it as timed-out and hence, rebooted the VM.
I'm not sure if there's a way to win this. Let me know if you have any ideas. As it stands I think skipping the offenders is an okay solution -- we just need to stay very aware of them and make sure to re-test more regularly than we do now.
Well, as long as we skip them because "they take too long" and not because of valid, identified bugs, we will lose. We (as in ReactOS and Wine) add more and more tests each release. If we don't find a long-term solution, we're doomed to chase ghosts.
Thanks! -Thomas
** For the record, here's an example process that I try to keep to:
- Create a ROSTESTS bug to indicate that a test was skipped
- Link the ROSTESTS bug to ROSTESTS-125, the Skipped Tests Epic
- Do not #if out the test, instead check for winetest_interactive, and skip the test only if it's not set
- Make sure to call skip() with a message that references the ROSTESTS bug
- Optionally create a CORE issue for the underlying problem if enough information is available; make sure it blocks the ROSTESTS issue
- Do not under any circumstances close the ROSTESTS bug before the test code is actually restored and no longer skips the test in question
This allows the issue to be actually found, by checking the Epic, or grepping for "ROSTESTS-" or "winetest_interactive". Skipped tests should be re-tested regularly, at the very least every time the test is Wine-synced.
gdi32_winetest:bitmap test_mono_bitmap skipped because it takes too long and spams the debug log with too many failures
Key: ROSTESTS-153 URL: https://jira.reactos.org/browse/ROSTESTS-153 Project: ReactOS Test Suite Issue Type: Bug Components: Wine Tests Reporter: Thomas Faber Assignee: Bug Zilla Labels: HACKIf you close this task, that means the winetest_interactive check and skip() in test_mono_bitmap must be gone!
Ros-dev mailing list Ros-dev@reactos.org http://www.reactos.org/mailman/listinfo/ros-dev