
Message boards : Number crunching : All Linux tasks error with "process got signal 11"

ChertseyAl
Joined: Jul 28 10
Posts: 57
Credit: 6,881,902
RAC: 2,269
Message 600 - Posted 27 Sep 2013 8:39:48 UTC

    Last night every single Linux 7.06 task errored out (the Windows ones are OK).

    Typical example:

    http://www.primaboinca.com/result.php?resultid=9197201

    <core_client_version>5.10.45</core_client_version>
    <![CDATA[
    <message>
    process got signal 11
    </message>
    <stderr_txt>
    Hello, stderr!!!!

    </stderr_txt>
    ]]>

    I've crunched thousands of these successfully, and now suddenly both of my Linux boxes have the same problem.

    Cheers,

    Al.

    whynot
    Joined: Sep 15 10
    Posts: 30
    Credit: 10,790,444
    RAC: 2,424
    Message 601 - Posted 28 Sep 2013 15:43:57 UTC

      (This is me speculating.) As you can see, I've been living with this for years. Having watched it all that time, I think what's happening is:



      • the science code requests a shared memory segment;

      • for whatever reason that request fails (i.e. it returns NULL);

      • unlike any other (i.e. respectful) science code, uc passes the result along anyway (respectful science code exits "with zero status but no 'finished' file");

      • the NULL is passed to libc;

      • libc croaks on uc with 'process got signal 11' (11 is misleading here, it's not a real SEGFAULT);



      If uc were 'respectful science code', the client would restart it some time later.
      uc isn't.
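
For illustration only, the two outcomes described above can be sketched in Python. This is not uc's code (the sources aren't available): `run_child`, `describe_exit` and the two child snippets are hypothetical stand-ins, and the message wording merely mimics the client's error text. The sketch assumes a POSIX system, where a child killed by a signal shows up as a negative return code; on Linux, signal number 11 is SIGSEGV.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a child's return code into a client-style message (wording is illustrative)."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        return f"process got signal {-returncode}"
    return f"process exited with status {returncode}"


def run_child(code):
    """Run a snippet of Python in a child process and return its return code."""
    return subprocess.run([sys.executable, "-c", code]).returncode


# Hypothetical "respectful" child: notices the failed request and exits cleanly,
# so a client could simply restart it later.
CAREFUL = "import sys; segment = None; sys.exit(0 if segment is None else 1)"

# Hypothetical careless child: stands in for a NULL reaching libc by
# delivering SIGSEGV to itself.
CARELESS = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

if __name__ == "__main__":
    print(describe_exit(run_child(CAREFUL)))
    print(describe_exit(run_child(CARELESS)))
```

The point of the sketch is only that the client sees two very different things: a zero exit status it can treat as "try again later", or a signal that it records as a failed task.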

      p.s. (based on my previous success with crap on home page) Fabio, where are my fscking sources?

      ____________
      I'm counting for science.
      Points just make me sick.

      Conan
      Joined: Aug 2 11
      Posts: 6
      Credit: 1,137,000
      RAC: 3,437
      Message 602 - Posted 2 Oct 2013 7:03:38 UTC

        One of my Linux machines is doing this almost constantly, but my other Linux machine and my Windows machines are having no problems. The project mix is similar on each machine, yet this one Linux machine kills most Primaboinca work units. It also gives ibercivis a hiding for some reason, with the same error.

        So I'm not sure whether this is related to Primaboinca, but the errors started around the time I restarted this project.

        I will do some more tests to confirm or deny this theory.

        Conan

        ChertseyAl
        Joined: Jul 28 10
        Posts: 57
        Credit: 6,881,902
        RAC: 2,269
        Message 604 - Posted 2 Oct 2013 16:51:48 UTC - in response to Message 602.

          I've tried a few more WUs now on the Linux boxes and they are OK. It must have been a bad batch that hit both machines at the same time.

          Cheers,

          Al.


          Copyright © 2017 primaboinca.com