
Message boards : Number crunching : Always sending 50 WUs regardless of requested amount

ChertseyAl
Joined: Jul 28 10
Posts: 57
Credit: 6,881,902
RAC: 2,269
Message 474 - Posted 14 Jan 2012 19:25:19 UTC

    All of my hosts are now receiving 50 WUs rather than a number appropriate to the amount of time requested. It's not really a problem, as I think I'll be able to return them on time, but it's not right :)

    Cheers,

    Al.


    skivelitis
    Joined: Oct 24 10
    Posts: 3
    Credit: 286,104
    RAC: 732
    Message 476 - Posted 16 Jan 2012 0:02:02 UTC

      Same here, except that I will never finish them on time and am forced to abort 40 or so at a time. Apart from hating to abort any task, I can't remember a simple little trick I once knew for aborting multiple tasks at once, so I must do them one at a time. Perhaps the trick no longer works in BOINC v6.12.34. I was somehow able to drag the mouse to highlight multiple tasks. One of the F keys, perhaps? A better solution would be a more accurate time-to-completion estimate, as the run times seem to be underestimated by a factor of 20 or so on my boxes.

      BeemerBiker
      Joined: Nov 13 10
      Posts: 6
      Credit: 930,282
      RAC: 34
      Message 482 - Posted 16 Jan 2012 18:29:04 UTC

        Hold down Shift and move the cursor to highlight tasks in BM (BOINC Manager). This does not work in BT (BoincTasks); there you can hold down Ctrl and click the top and the bottom of the task rows to select a group to abort.

        skivelitis
        Joined: Oct 24 10
        Posts: 3
        Credit: 286,104
        RAC: 732
        Message 483 - Posted 17 Jan 2012 8:23:37 UTC

          Thanks for the reminder BB!

          Ananas
          Joined: Apr 10 11
          Posts: 12
          Credit: 2,501,600
          RAC: 4,601
          Message 487 - Posted 21 Jan 2012 16:57:00 UTC

            Last modified: 21 Jan 2012 16:59:02 UTC

            The rsc_fpops_est value in the workunits is badly wrong for this project, as you can see on your host pages: the Duration Correction Factor should be as close to 1 as possible, but here it is somewhere near 20.

            I guess the server-side scheduler uses rsc_fpops_est to decide how much work to assign to a computer and does not apply the correction factor when calculating the assigned time.
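To make the Duration Correction Factor point concrete, here is a minimal Python sketch. It assumes a simplified model in which the client's runtime estimate is rsc_fpops_est divided by the host's speed and then multiplied by the DCF, while the server (per the guess above) leaves the DCF out. The function names and the example numbers (1e12 FLOPs per task, a 2e9 FLOPS host, a 20000 s request) are hypothetical, not taken from BOINC's source.

```python
import math


def raw_estimate_sec(rsc_fpops_est: float, host_flops: float) -> float:
    """Runtime estimate with no correction (what the server uses in this model)."""
    return rsc_fpops_est / host_flops


def client_estimate_sec(rsc_fpops_est: float, host_flops: float, dcf: float) -> float:
    """Client-side estimate: the raw estimate scaled by the Duration Correction Factor."""
    return raw_estimate_sec(rsc_fpops_est, host_flops) * dcf


def tasks_to_fill(request_sec: float, per_task_sec: float) -> int:
    """Number of tasks needed to cover request_sec if each task takes per_task_sec."""
    return math.ceil(request_sec / per_task_sec)


if __name__ == "__main__":
    # Hypothetical values; the DCF of 20 matches what is reported above.
    fpops, flops, dcf = 1e12, 2e9, 20.0
    raw = raw_estimate_sec(fpops, flops)           # 500 s per task (server view)
    real = client_estimate_sec(fpops, flops, dcf)  # 10000 s per task (client view)
    request = 20000.0                              # hypothetical work request in seconds
    print(tasks_to_fill(request, raw))   # -> 40 tasks sent if the DCF is ignored
    print(tasks_to_fill(request, real))  # -> 2 tasks the client actually wanted
```

In this model, ignoring a DCF of 20 means the server sends 20 times as many tasks as the client's request really calls for.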

            ChertseyAl
            Joined: Jul 28 10
            Posts: 57
            Credit: 6,881,902
            RAC: 2,269
            Message 488 - Posted 21 Jan 2012 18:43:23 UTC

              This seems to have settled down now on my hosts, getting a sensible number of tasks again :) Only had to abort about 10 in total, so not too bad.

              Al.

              ChertseyAl
              Joined: Jul 28 10
              Posts: 57
              Credit: 6,881,902
              RAC: 2,269
              Message 551 - Posted 27 Jan 2013 19:26:16 UTC - in response to Message 488.

                This seems to have settled down now on my hosts, getting a sensible number of tasks again :) Only had to abort about 10 in total, so not too bad.


                I spoke too soon ;)

                For some reason, on all but one host, I always get 10 WUs per core, regardless of the amount of work requested. I can easily crunch them in time, but it means my caches are filled far beyond my normal cache time, and it stops work fetch for other projects.

                Maybe I just need to leave them to settle down again, but I don't know what changed to cause this to happen :(

                Cheers,

                Al.

                Werinbert
                Joined: May 11 13
                Posts: 1
                Credit: 100,800
                RAC: 60
                Message 572 - Posted 13 May 2013 16:43:12 UTC

                  The problem is still around: it dumped 40 tasks on my machine when I had space allocated for a mere 2 tasks.

                  whynot
                  Joined: Sep 15 10
                  Posts: 30
                  Credit: 9,833,244
                  RAC: 2,201
                  Message 573 - Posted 25 May 2013 17:37:58 UTC

                    Recently I've made some observations that I'm not so glad to present (noise from parallel projects deleted).


                    20-May-2013 05:33:39 [---] [wfd]: work fetch start
                    20-May-2013 05:33:39 [primaboinca] chosen: minor shortfall CPU: 0.00 inst, 1393.80 sec
                    20-May-2013 05:33:39 [---] [wfd] ------- start work fetch state -------
                    20-May-2013 05:33:39 [---] [wfd] target work buffer: 28797.12 + 14402.88 sec
                    20-May-2013 05:33:39 [---] [wfd] CPU: shortfall 1393.80 nidle 0.00 saturated 41806.20 busy 0.00 RS fetchable 100.00 runnable 300.00
                    20-May-2013 05:33:39 [ABC@home] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 61440.00 (comm deferred)
                    20-May-2013 05:33:39 [SZTAKI Desktop Grid] [wfd] CPU: fetch share 0.00 LTD -18454.46 backoff dt 0.00 int 0.00 (comm deferred)
                    20-May-2013 05:33:39 [PrimeGrid] [wfd] CPU: fetch share 0.00 LTD -225.14 backoff dt 0.00 int 0.00 (comm deferred)
                    20-May-2013 05:33:39 [primaboinca] [wfd] CPU: fetch share 1.00 LTD -59228.18 backoff dt 0.00 int 0.00
                    20-May-2013 05:33:39 [ABC@home] [wfd] overall LTD 0.00
                    20-May-2013 05:33:39 [SZTAKI Desktop Grid] [wfd] overall LTD -28786.98
                    20-May-2013 05:33:39 [PrimeGrid] [wfd] overall LTD -6654.77
                    20-May-2013 05:33:39 [primaboinca] [wfd] overall LTD -75407.47
                    20-May-2013 05:33:39 [---] [wfd] ------- end work fetch state -------
                    20-May-2013 05:33:39 [primaboinca] [wfd] request: 1393.80 sec CPU (1393.80 sec, 0.00)
                    20-May-2013 05:33:39 [primaboinca] Sending scheduler request: To fetch work.
                    20-May-2013 05:33:39 [primaboinca] Reporting 46 completed tasks, requesting new tasks
                    20-May-2013 05:33:53 [primaboinca] Scheduler request completed: got 2 new tasks


                    At the time of observation, the real estimated run time is ~7000 sec. As you can see, the client asks for 1393.80 sec, which would be 0.199 WU. Instead it gets 2.

                    Another one, just a few hours later.


                    20-May-2013 09:33:19 [---] [wfd]: work fetch start
                    20-May-2013 09:33:19 [primaboinca] chosen: major shortfall CPU: 0.00 inst, 83650.42 sec
                    20-May-2013 09:33:19 [---] [wfd] ------- start work fetch state -------
                    20-May-2013 09:33:19 [---] [wfd] target work buffer: 28797.12 + 14402.88 sec
                    20-May-2013 09:33:19 [---] [wfd] CPU: shortfall 83650.42 nidle 0.00 saturated 27744.42 busy 0.00 RS fetchable 100.00 runnable 300.00
                    20-May-2013 09:33:19 [ABC@home] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 46862.72 int 86400.00
                    20-May-2013 09:33:19 [SZTAKI Desktop Grid] [wfd] CPU: fetch share 0.00 LTD -24033.97 backoff dt 0.00 int 0.00 (comm deferred)
                    20-May-2013 09:33:19 [PrimeGrid] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
                    20-May-2013 09:33:19 [primaboinca] [wfd] CPU: fetch share 1.00 LTD -65457.82 backoff dt 0.00 int 0.00
                    20-May-2013 09:33:19 [ABC@home] [wfd] overall LTD 0.00
                    20-May-2013 09:33:19 [SZTAKI Desktop Grid] [wfd] overall LTD -35727.97
                    20-May-2013 09:33:19 [PrimeGrid] [wfd] overall LTD -4461.58
                    20-May-2013 09:33:19 [primaboinca] [wfd] overall LTD -74931.11
                    20-May-2013 09:33:19 [---] [wfd] ------- end work fetch state -------
                    20-May-2013 09:33:19 [primaboinca] [wfd] request: 83650.42 sec CPU (83650.42 sec, 0.00)
                    20-May-2013 09:33:19 [primaboinca] Sending scheduler request: To fetch work.
                    20-May-2013 09:33:19 [primaboinca] Reporting 9 completed tasks, requesting new tasks
                    20-May-2013 09:33:30 [primaboinca] Scheduler request completed: got 50 new tasks


                    The client requests 83650.42 sec, which would be 11.950 WU. Instead it gets 50.
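The arithmetic in these two examples can be checked straight from the log. The sketch below is a hypothetical helper (not part of BOINC) that pairs each `[wfd] request: ... sec CPU` line for one project with the following `got N new tasks` line, then divides the request by the ~7000 s real run time observed above. The regular expressions only assume the log line formats shown in this thread.

```python
import re

# Patterns for the two log line formats quoted above.
REQUEST_RE = re.compile(r"\[(?P<proj>[^\]]+)\] \[wfd\] request: (?P<sec>[\d.]+) sec CPU")
GOT_RE = re.compile(r"\[(?P<proj>[^\]]+)\] Scheduler request completed: got (?P<n>\d+) new tasks")


def request_vs_received(log_text: str, project: str) -> list[tuple[float, int]]:
    """Pair each CPU work request (in seconds) with the task count the scheduler returned."""
    pairs = []
    pending = None  # last request seen for this project, waiting for its reply
    for line in log_text.splitlines():
        m = REQUEST_RE.search(line)
        if m and m.group("proj") == project:
            pending = float(m.group("sec"))
            continue
        m = GOT_RE.search(line)
        if m and m.group("proj") == project and pending is not None:
            pairs.append((pending, int(m.group("n"))))
            pending = None
    return pairs


LOG = """\
20-May-2013 05:33:39 [primaboinca] [wfd] request: 1393.80 sec CPU (1393.80 sec, 0.00)
20-May-2013 05:33:53 [primaboinca] Scheduler request completed: got 2 new tasks
20-May-2013 09:33:19 [primaboinca] [wfd] request: 83650.42 sec CPU (83650.42 sec, 0.00)
20-May-2013 09:33:30 [primaboinca] Scheduler request completed: got 50 new tasks
"""

REAL_RUNTIME_SEC = 7000.0  # approximate real run time observed on this host

for requested, got in request_vs_received(LOG, "primaboinca"):
    print(f"requested {requested:.2f} s = {requested / REAL_RUNTIME_SEC:.3f} WU, got {got}")
```

Run against the two excerpts, it reports 0.199 WU requested vs 2 received, and 11.950 WU requested vs 50 received.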

                    I have a theory about what happens. Regulars should remember that from the server's POV, the estimated run time is ~760 sec. Now, 1393.80 divided by 760 is 1.83 WU, which is pretty close to the 2 WU from the first example. 83650.42 sec divided by 760 is a whopping 110 WU. Why is it 50 instead? We all know: because 50 WUs are what is always ready to send.
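The theory can be written as a small model: the server keeps adding tasks until its own per-task estimate (~760 s) covers the requested seconds, but it never sends more tasks than it has ready. This is a sketch of the theory under those assumptions, not BOINC's actual scheduler code; `server_tasks`, `server_est_sec` and `ready` are hypothetical names.

```python
def server_tasks(request_sec: float, server_est_sec: float = 760.0, ready: int = 50) -> int:
    """Tasks sent under the theory: fill the request using the server's own
    per-task estimate, capped by the number of tasks ready to send."""
    sent = 0
    covered = 0.0
    while covered < request_sec and sent < ready:
        sent += 1
        covered += server_est_sec
    return sent


print(server_tasks(1393.80))                # 1393.80 / 760 = 1.83, rounded up to 2
print(server_tasks(83650.42))               # about 110 tasks' worth, capped at the 50 ready
print(server_tasks(83650.42, ready=1000))   # with no cap, the fill loop stops at 111
```

Both logged results (2 and 50 tasks) fall out of this model. If more tasks were ready, for example when resends pile up, the same request would yield more tasks.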

                    Now, what if a newcomer, after a couple of hours, finds that he has 10 times more work than he asked for, and desperately tries to get a reasonable amount by deleting everything (or aborting; it doesn't matter for the purpose of this theory)? In no time the server will have loads of resends (just like right now, at the time of posting: ~2.5 kWU). Then there's a hard-coded limit:

                    23-May-2013 01:33:33 [---] [wfd]: work fetch start
                    23-May-2013 01:33:33 [primaboinca] chosen: major shortfall CPU: 0.00 inst, 103518.66 sec
                    23-May-2013 01:33:33 [---] [wfd] ------- start work fetch state -------
                    23-May-2013 01:33:33 [---] [wfd] target work buffer: 28797.12 + 14402.88 sec
                    23-May-2013 01:33:33 [---] [wfd] CPU: shortfall 103518.66 nidle 0.00 saturated 28430.55 busy 0.00 RS fetchable 100.00 runnable 300.00
                    23-May-2013 01:33:33 [ABC@home] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 14122.52 int 86400.00
                    23-May-2013 01:33:33 [SZTAKI Desktop Grid] [wfd] CPU: fetch share 0.00 LTD -52535.69 backoff dt 0.00 int 0.00 (comm deferred)
                    23-May-2013 01:33:33 [PrimeGrid] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.0

                    whynot
                    Joined: Sep 15 10
                    Posts: 30
                    Credit: 9,833,244
                    RAC: 2,201
                    Message 575 - Posted 1 Jun 2013 14:10:12 UTC

                      [I didn't know we had message limits; the forum cut off the pathetic part of last week's post. Luckily, I had network problems last time.]


                      23-May-2013 01:33:33 [---] [wfd]: work fetch start
                      23-May-2013 01:33:33 [primaboinca] chosen: major shortfall CPU: 0.00 inst, 103518.66 sec
                      23-May-2013 01:33:33 [---] [wfd] ------- start work fetch state -------
                      23-May-2013 01:33:33 [---] [wfd] target work buffer: 28797.12 + 14402.88 sec
                      23-May-2013 01:33:33 [---] [wfd] CPU: shortfall 103518.66 nidle 0.00 saturated 28430.55 busy 0.00 RS fetchable 100.00 runnable 300.00
                      23-May-2013 01:33:33 [ABC@home] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 14122.52 int 86400.00
                      23-May-2013 01:33:33 [SZTAKI Desktop Grid] [wfd] CPU: fetch share 0.00 LTD -52535.69 backoff dt 0.00 int 0.00 (comm deferred)
                      23-May-2013 01:33:33 [PrimeGrid] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
                      23-May-2013 01:33:33 [primaboinca] [wfd] CPU: fetch share 1.00 LTD -151432.65 backoff dt 0.00 int 0.00 (overworked)
                      23-May-2013 01:33:33 [ABC@home] [wfd] overall LTD 0.00
                      23-May-2013 01:33:33 [SZTAKI Desktop Grid] [wfd] overall LTD -61975.66
                      23-May-2013 01:33:33 [PrimeGrid] [wfd] overall LTD -7903.51
                      23-May-2013 01:33:33 [primaboinca] [wfd] overall LTD -162476.08
                      23-May-2013 01:33:33 [---] [wfd] ------- end work fetch state -------
                      23-May-2013 01:33:33 [primaboinca] [wfd] request: 103518.66 sec CPU (103518.66 sec, 0.00)
                      23-May-2013 01:33:33 [primaboinca] Sending scheduler request: To fetch work.
                      23-May-2013 01:33:33 [primaboinca] Reporting 15 completed tasks, requesting new tasks
                      23-May-2013 01:33:52 [primaboinca] Scheduler request completed: got 80 new tasks
                      23-May-2013 01:33:52 [---] [wfd] Request work fetch: RPC complete


                      What can we do about it? We can either get out of this building or, as they say, "be patient". For a long time I've been bothered by the client stuffing in a day's worth of work per core. Now I understand what happens: after a couple of tries (about a week), the client gives up on stabilizing the running workload and switches to a daily basis (as you can see, my buffer is set to 8+4 hours).
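The "8+4 hours" can be read off the `target work buffer: 28797.12 + 14402.88 sec` line in the logs; the two parts are presumably the minimum buffer and the additional buffer from the preferences. A plain unit conversion (nothing BOINC-specific) shows the total is 12 hours, half of the day per core the client ends up holding:

```python
def buffer_hours(min_sec: float, extra_sec: float) -> tuple[float, float, float]:
    """Convert the two parts of a 'target work buffer: A + B sec' line to hours."""
    def to_hours(seconds: float) -> float:
        return round(seconds / 3600, 2)

    return to_hours(min_sec), to_hours(extra_sec), to_hours(min_sec + extra_sec)


print(buffer_hours(28797.12, 14402.88))  # -> (8.0, 4.0, 12.0)
```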

                      I have only one question, Fabio: is it possible to do something about User of the Day? That litter has hung on the front page for a darn month! If you don't care, shut the thing down; what's the big deal?

                      ____________
                      I'm counting for science.
                      Points just make me sick.

                      Ray_GTI-R
                      Joined: Sep 26 13
                      Posts: 1
                      Credit: 400
                      RAC: 38
                      Message 599 - Posted 26 Sep 2013 14:01:51 UTC

                        Now, what if some new-comer would, after couple of hours, find that he get 10 times more workload and then desperately trying to get reasonable amount by deleting everything (or aborting, what doesn't matter for the purpose of this theory)?


                        Hi.

                        Just joined this project for the first time in order to test something and got ...

                        161 CPU tasks (!), all with virtually the same deadline, one week from now. (That's on a part-time P4 with 25% devoted to crunching.)

                        Bonkers.

                        Luckily I know how to manage the situation as I have BOINC'd for years :-D
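A rough feasibility check shows why 161 tasks with one deadline is bonkers on such a host. This sketch assumes one core, 25% CPU, a 7-day deadline, and ~7000 s per task (the real run time whynot observed on his host; a P4 would likely be slower, so the result is optimistic). `tasks_completable` is a hypothetical helper that ignores other projects.

```python
SECONDS_PER_DAY = 86_400


def tasks_completable(deadline_days: float, cpu_fraction: float,
                      runtime_sec: float, cores: int = 1) -> int:
    """Whole tasks a host can finish before the deadline, ignoring other projects."""
    available_sec = deadline_days * SECONDS_PER_DAY * cpu_fraction * cores
    return int(available_sec // runtime_sec)


# Full-CPU days needed for 161 tasks at ~7000 s each.
needed_days = 161 * 7000 / SECONDS_PER_DAY

print(tasks_completable(7, 0.25, 7000))  # -> 21
print(round(needed_days, 1))             # -> 13.0
```

Even under these generous assumptions, about 21 of the 161 tasks could be returned on time.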






                        Copyright © 2017 primaboinca.com