Confusing systemd behaviour with OnFailure= and Restart=

I'm using systemd 231 in an embedded system, and I'm trying to create a service that monitors a hardware component in the system. Here's a rough description of what I'm trying to do:

When the service, foo.service, is started, it launches an application, foo_app.

foo_app monitors the hardware component, running continuously.

If foo_app detects a hardware failure, it exits with a return code of 1. This should trigger a system reboot.

If foo_app crashes, systemd should relaunch foo_app.

If foo_app repeatedly crashes, systemd should reboot the system.

Here's my attempt at implementing this as a service:

[Unit]
Description=Foo Hardware Monitor

# If the application fails 3 times in 30 seconds, something has gone wrong,
# and the state of the hardware can't be guaranteed. Reboot the system here.
StartLimitBurst=3
StartLimitIntervalSec=30
StartLimitAction=reboot

# StartLimitAction=reboot will reboot the box if the app fails repeatedly,
# but if the app exits voluntarily, the reboot should trigger immediately
OnFailure=systemd-reboot.service

[Service]
ExecStart=/usr/bin/foo_app

# If the app fails from an abnormal condition (e.g. crash), try to
# restart it (within the limits of StartLimit*).
Restart=on-abnormal

From the documentation (systemd.unit and systemd.service), I'd expect that if I kill foo_app in a way that triggers Restart=on-abnormal (e.g. killall -9 foo_app), systemd should give priority to Restart=on-abnormal over OnFailure=systemd-reboot.service and not start systemd-reboot.service.

However, this isn't what I'm seeing. As soon as I kill foo_app once, the system immediately reboots.

Here are some relevant snippets from the docs:

OnFailure=

A space-separated list of one or more units that are activated when this unit enters the "failed" state. A service unit using Restart= enters the failed state only after the start limits are reached.

Restart=

[snip] Note that service restart is subject to unit start rate limiting configured with StartLimitIntervalSec= and StartLimitBurst=, see systemd.unit(5) for details. A restarted service enters the failed state only after the start limits are reached.

The documentation seems pretty clear:

Services specified in OnFailure should only run when a service enters the "failed" state

A service should only enter the "failed" state after StartLimitIntervalSec and StartLimitBurst are satisfied.

This is not what I'm seeing.

To confirm this, I edited my service file to the following:

[Unit]
Description=Foo Hardware Monitor

StartLimitBurst=3
StartLimitIntervalSec=30
StartLimitAction=none

[Service]
ExecStart=/usr/bin/foo_app
Restart=on-abnormal

By removing OnFailure and setting StartLimitAction=none, I was able to see how systemd is responding to foo_app dying. Here's a test where I repeatedly kill foo_app with SIGKILL.

[root@device ~]# systemctl start foo.service
[root@device ~]# journalctl -f -o cat -u foo.service &
[1] 2107
Started Foo Hardware Monitor.
[root@device ~]# killall -9 foo_app
foo.service: Main process exited, code=killed, status=9/KILL
foo.service: Unit entered failed state.
foo.service: Failed with result 'signal'
foo.service: Service hold-off time over, scheduling restart.
Stopped foo.
Started foo.

[root@device ~]# killall -9 foo_app
foo.service: Main process exited, code=killed, status=9/KILL
foo.service: Unit entered failed state.
foo.service: Failed with result 'signal'
foo.service: Service hold-off time over, scheduling restart.
Stopped foo.
Started foo.

[root@device ~]# killall -9 foo_app
foo.service: Main process exited, code=killed, status=9/KILL
foo.service: Unit entered failed state.
foo.service: Failed with result 'signal'
foo.service: Service hold-off time over, scheduling restart.
Stopped foo.
foo.service: Start request repeated too quickly
Failed to start foo.
foo.service: Unit entered failed state.
foo.service: Failed with result 'start-limit-hit'

This makes sense for the most part. When foo_app is killed, systemd restarts it until StartLimitBurst is hit and then gives up. This is what I want, except with StartLimitAction=reboot.

What's unusual is that systemd prints foo.service: Unit entered failed state. whenever foo_app is killed, even if it is about to be restarted through Restart=on-abnormal. This seems to directly contradict these lines from the docs quoted above:

A service unit using Restart= enters the failed state only after the start limits are reached.

A restarted service enters the failed state only after the start limits are reached.
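To put a number on the mismatch, the journal output can be summarised with a small shell helper. This is only a sketch: the function name summarize_failures is made up for illustration, and it assumes the journal messages are worded exactly as in the session above (other systemd versions may phrase them differently).

```shell
#!/bin/sh
# Hypothetical helper: read journal text on stdin and count how often the unit
# entered the failed state versus how often the start limit was actually hit.
summarize_failures() {
    awk '
        /Unit entered failed state/            { failed++ }
        /Failed with result .start-limit-hit./ { limit++ }
        END { printf "failed=%d start_limit_hit=%d\n", failed, limit }
    '
}

# Usage (reads the unit log the same way the test above does):
#   journalctl -o cat -u foo.service | summarize_failures
```

If the documentation described the actual behaviour, the two counts would be equal. For the session shown above, the unit enters the failed state four times while the start limit is hit only once.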

All of this has left me pretty confused. Am I misunderstanding any of these systemd options? Is this a systemd bug? Any help is appreciated.

asked Feb 8 '18 at 23:50

Matt K

153 reputation, 1 silver badge, 6 bronze badges


1 Answer

Edit 2019/08/12: Per therealjumbo's comment, the fix for this has been merged and was released with systemd v239. If you aren't pinned to an older version by your distribution (looking at you, CentOS), update and make merry!

TL;DR - Known documentation issue, currently still an outstanding issue for the systemd project

It turns out that, since you asked this question, this has been reported and identified as a discrepancy in systemd between the documentation and the actual behavior. In my understanding (and my reading of the GitHub issue), your expectation and the documentation match, so you are not crazy.

Currently systemd sets the state to failed after every attempted start, regardless of whether the start limit has been reached. In the issue, the OP wrote an amusing anecdote about learning to ride a bike that I highly suggest taking a gander at.
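To check whether a given machine is likely to have the fixed behavior, compare its systemd major version against 239. The sketch below is an assumption-laden illustration: has_restart_fix is a made-up name, it assumes the first line of `systemctl --version` looks like "systemd 239 ...", and it cannot tell whether a distribution has backported the fix to an older version.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first line of `systemctl --version`
# (read from stdin) reports a major version of at least 239.
has_restart_fix() {
    awk 'NR == 1 { ok = ($2 + 0 >= 239) } END { exit !ok }'
}

# Usage:
#   if systemctl --version | has_restart_fix; then
#       echo "OnFailure= should only fire once the start limit is reached"
#   else
#       echo "expect OnFailure= to fire on every failed start"
#   fi
```

On the asker's systemd 231 the check fails, which matches the immediate reboot they observed.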

edited 12 hours ago

answered Mar 28 '18 at 20:23

cunninghamp3

571 reputation, 4 silver badges, 16 bronze badges

FYI, the upstream code was fixed to match the documentation. The fix was released in v239; the PR is here: github.com/systemd/systemd/pull/9158. This is probably what most of us wanted.

– therealjumbo
Aug 9 at 20:46
