SLURM: cannot set QoS limit for gres/gpu for a single job or user

















I have a cluster with several GPU worker nodes. Users cannot log in to those nodes, and on the master machine (login node) I would like to use QoS to enforce a limit of 1 GPU per job for each user. This is my configuration:



AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu
AccountingStorageEnforce=qos
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

GresTypes=gpu
NodeName=steven NodeAddr=10.0.0.60 CPUs=40 Gres=gpu:4 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=54297 State=UNKNOWN
NodeName=bruce NodeAddr=10.0.0.59 CPUs=40 Gres=gpu:4 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=54297 State=UNKNOWN
PartitionName=workonly Nodes=ALL Default=Yes State=UP Maxtime=1 SelectTypeParameters=CR_Core


I have an appropriate gres.conf and uniform users. I create an account, a user, and a QoS like this, for example:



sudo sacctmgr create account Name=students cluster=chernobyl Description="Account for students" organization=Blah
sudo sacctmgr create user name=kmwil account=students adminlevel=none cluster=chernobyl defaultaccount=students partition=workonly
sudo sacctmgr create qos Name=student flags=denyonlimit MaxWall=5 Priority=1 maxtresperuser=gres/gpu=1


Then I modify the user, setting qos=student.
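A sketch of that step, assuming the standard `sacctmgr modify` syntax (the exact command used is not shown here). The two `show` commands are one way to confirm that the association and the QoS ended up as intended; the chosen `format` fields are only a suggestion.

```shell
# Assumed form of the "modify the user" step (not quoted from the post)
sudo sacctmgr modify user where name=kmwil cluster=chernobyl set qos=student

# Inspect the resulting association and the limits stored on the QoS
sacctmgr show assoc where user=kmwil format=cluster,account,user,partition,qos
sacctmgr show qos where name=student format=name,flags,maxwall,maxtres,maxtrespu
```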



But I can still submit and successfully run jobs with --gres=gpu:2 and more. Could someone help me pinpoint the problem here?
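Two things may be worth checking, offered as assumptions rather than a confirmed diagnosis. The slurm.conf documentation describes `limits` as the `AccountingStorageEnforce` value that enforces limits set on associations and QoS, whereas `qos` on its own only requires jobs to run under a valid QoS. Separately, `MaxTRESPerUser` caps what a user holds across all running jobs, while `MaxTRESPerJob` is the per-job cap. A minimal sketch under those assumptions:

```shell
# slurm.conf (assumed change, not verified on this cluster):
# enforce limits in addition to requiring a valid QoS
#   AccountingStorageEnforce=limits,qos

# Assumed change: cap GPUs per job in addition to the existing per-user cap
sudo sacctmgr modify qos where name=student set maxtresperjob=gres/gpu=1
```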

















Tags: gpu, slurm

asked 2 days ago by Kamil, edited 2 days ago by muru
























