SLURM: cannot set QoS limit for gres/gpu for a single job or user
I have a cluster with several GPU worker nodes. Users cannot log in to those nodes. On the master machine (login node), I would like to use QoS to enforce a limit of one GPU card per job for each user. This is my configuration:
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu
AccountingStorageEnforce=qos
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
GresTypes=gpu
NodeName=steven NodeAddr=10.0.0.60 CPUs=40 Gres=gpu:4 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=54297 State=UNKNOWN
NodeName=bruce NodeAddr=10.0.0.59 CPUs=40 Gres=gpu:4 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=54297 State=UNKNOWN
PartitionName=workonly Nodes=ALL Default=Yes State=UP Maxtime=1 SelectTypeParameters=CR_Core
I have an appropriate gres.conf and uniform users. I create a user and QoS like this, for example:
sudo sacctmgr create account Name=students cluster=chernobyl Description="Account for students" organization=Blah
sudo sacctmgr create user name=kmwil account=students adminlevel=none cluster=chernobyl defaultaccount=students partition=workonly
sudo sacctmgr create qos Name=student flags=denyonlimit MaxWall=5 Priority=1 maxtresperuser=gres/gpu=1
Then I modify the user, setting qos=student.
But I can still enqueue and successfully run jobs with --gres=gpu:2 and more. Could someone help me pinpoint the problem here?
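One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: according to the Slurm documentation, the `qos` option of AccountingStorageEnforce only requires jobs to have a valid QoS, while the limits attached to a QoS are enforced when `limits` (or `safe`, which implies it) is also listed. The configuration above sets only `qos`. The sketch below uses a hypothetical helper, `enforces_limits`, which is not a Slurm tool, to test a given value:

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): given the value of
# AccountingStorageEnforce, report whether limit enforcement is enabled.
enforces_limits() {
    # Lowercase the value so "Limits,QOS" and "limits,qos" are treated alike here.
    value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # Wrap the list in commas so each option is matched as a whole word.
    case ",$value," in
        *,limits,*|*,safe,*) echo "yes" ;;  # "safe" implies "limits"
        *) echo "no" ;;
    esac
}

enforces_limits "qos"         # value from the configuration above
enforces_limits "limits,qos"  # assumed alternative value
```

On a live cluster, the value currently in effect appears in the output of `scontrol show config`. Separately, maxtresperuser caps a user's total across jobs under that QoS, whereas a per-job cap corresponds to MaxTRESPerJob in sacctmgr; which of the two fits depends on whether one user may run several single-GPU jobs at once.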
gpu slurm
asked 2 days ago by Kamil
edited 2 days ago by muru