Bug 781106

Summary: openvpn needs HUP upon resume
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Jiri Slaby <jslaby>
Component: Network
Assignee: Marius Tomaschewski <mt>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: bbrunner, fcrozat
Version: 13.1 Milestone 0
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Jiri Slaby 2012-09-19 08:31:02 UTC
openvpn used to work fine with respect to suspend/resume. The link was usable afterwards. Now, this is no longer the case. The link, after resume, looks like
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100
    link/none 

instead of
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100
    link/none 
    inet 10.100.200.33 peer 10.100.200.1/32 scope global tun0

and I have to send a HUP signal to the openvpn process (USR1 is not enough). Now, I'm doing it from pm-utils' sleep scripts automatically. But this is only a workaround.
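
A minimal sketch of the check described above, assuming the broken state can be recognized from the `ip addr show` output alone. The helper name `tun_has_inet` is hypothetical and not part of openvpn or pm-utils; it reads the output on stdin and succeeds only if an IPv4 address line is present.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `ip addr show` output on stdin
# contains an IPv4 address line ("inet ..."), fail otherwise.
# "inet6" lines do not match because a whitespace must follow "inet".
tun_has_inet() {
    grep -Eq '^[[:space:]]*inet[[:space:]]'
}

# Possible use: only signal openvpn if tun0 lost its address after resume.
#   ip addr show dev tun0 | tun_has_inet || killall -HUP openvpn
```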

I'm using openvpn from network:vpn.
Comment 1 Jiri Slaby 2012-09-19 08:33:44 UTC
One more note, I use linux-next kernel. If you feel this is a kernel regression, let me know.
Comment 2 Marius Tomaschewski 2012-09-20 08:11:34 UTC
(In reply to comment #1)
> One more note, I use linux-next kernel. If you feel this is a kernel
> regression, let me know.

No, I don't think it is a regression.
You can tune the defaults by setting a shorter "ping-restart" option:

From "man openvpn":

--ping-restart n
       Similar to --ping-exit, but trigger a SIGUSR1 restart
       after n seconds pass without reception of a ping or
       other packet from remote.
[...]
       In client mode, the --ping-restart parameter is set to
       120 seconds by default.
[...]

SIGHUP Cause OpenVPN to close all TUN/TAP and network
       connections, restart, re-read the configuration file
       (if any), and reopen TUN/TAP and network connections.

SIGUSR1
       Like SIGHUP, except don't re-read configuration file,
       and possibly don't close and reopen TUN/TAP device,
       re-read key files, preserve local IP address/port, or
       preserve most recently authenticated remote IP
       address/port based on --persist-tun, --persist-key,
       --persist-local-ip, and --persist-remote-ip options
       respectively (see above).

       This signal may also be internally generated by a
       timeout condition, governed by the --ping-restart option.

       This signal, when combined with --persist-remote-ip, may
       be sent when the underlying parameters of the host's
       network interface change such as when the host is a DHCP
       client and is assigned a new IP address.  See --ipchange
       above for more information.

So by default, it needs 120 seconds to recover.
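
As an illustration, a shorter timeout could be set in the client configuration. The sketch below only prints such a fragment; the values 10 and 30 are arbitrary assumptions, the function name is hypothetical, and a server that pushes its own ping settings may override them.

```shell
#!/bin/sh
# Print a client config fragment with a shorter keepalive timeout.
# The intervals are illustrative values, not recommendations.
print_ping_options() {
    cat <<'EOF'
# send a ping every 10 seconds when the link is otherwise idle
ping 10
# trigger a SIGUSR1 restart after 30 seconds without any packet
ping-restart 30
EOF
}

print_ping_options
```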

You can use "/etc/init.d/openvpn reopen" to send a USR1 to all
running instances.

Hmm... there seems to be a bug in the init script -- reopen is
also in the reload case, so it will never send USR1, but HUP
(which is more intrusive / closes & restarts running conns).
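
The intended behaviour can be sketched as one branch per action. This is not the actual init script; `signal_for_action` is a hypothetical helper that only maps an action name to the signal it should send.

```shell
#!/bin/sh
# Hypothetical mapping from init-script action to signal.
# reload -> HUP  (close connections, re-read config, restart)
# reopen -> USR1 (soft restart, honours the --persist-* options)
signal_for_action() {
    case "$1" in
        reload) echo HUP ;;
        reopen) echo USR1 ;;
        *)      return 1 ;;
    esac
}

# Possible use, assuming the caller knows the pid file path:
#   sig=$(signal_for_action reopen) && kill -s "$sig" "$(cat "$pidfile")"
```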

On the other hand, a resume is a different event from the other
reconnects that ping-restart handles (e.g. external IP changed),
where a "long" delay of 120 secs makes sense.

So it would make sense to add a suspend/resume script to pm-utils:

hibernate|suspend)
      test -x /etc/init.d/openvpn && \
      /etc/init.d/openvpn status &>/dev/null && \
      reopen_on_resume=yes || reopen_on_resume=no

      savestate "reopen_on_resume" "$reopen_on_resume"
;;
thaw|resume)
      restorestate "reopen_on_resume"
      test "x$reopen_on_resume" = "xyes" && \
         /etc/init.d/openvpn reopen
;;

Vojtech (pm-utils maintainer), what do you think?
Comment 3 Marius Tomaschewski 2012-09-20 10:57:21 UTC
reopen init-script fix is in request id 135135 => factory.

The rest is an enhancement of pm-utils.
Comment 4 Bernhard Wiedemann 2012-09-20 11:00:30 UTC
This is an autogenerated message for OBS integration:
This bug (781106) was mentioned in
https://build.opensuse.org/request/show/135135 Factory / openvpn
Comment 5 Vojtech Dziewiecki 2012-09-20 12:53:28 UTC
Sure, I will add a hook like that. I'd probably call it 48openvpn so that it reopens just after network is up.

/etc/init.d/openvpn status/reopen works the same on systemd and SysV init, right?
Comment 6 Jiri Slaby 2012-09-20 13:53:25 UTC
(In reply to comment #0)
> and I have to send a HUP signal to the openvpn process (USR1 is not enough).

See this               ^^^                                ^^^^^^^^^^^^^^^^^^
Comment 7 Jiri Slaby 2012-09-20 13:56:17 UTC
(In reply to comment #0)
> openvpn used to work fine with respect to suspend/resume.

And also this ^^^^. This used to work. Some time ago it stopped. Even if I wait more than 120 s, vpn won't recover.
Comment 8 Vojtech Dziewiecki 2012-09-20 15:50:49 UTC
Yeah so this pm-utils hook won't fix the problem.
Sending HUP from a hook would be an ugly hack; it would be much better if we could fix openvpn so that it handles SIGUSR1 properly.
Comment 9 Marius Tomaschewski 2012-09-20 16:07:15 UTC
(In reply to comment #7)
> (In reply to comment #0)
> > openvpn used to work fine with respect to suspend/resume.
> 
> And also this ^^^^. This used to work. Some time ago it stopped. Even if I wait
> more than 120 s, vpn won't recover.

I've overlooked the "USR1 is not enough", sorry!

Any idea which version was working with USR1? 2.2.1 from 12.1? And which kernel?
Comment 10 Jiri Slaby 2012-09-20 18:02:59 UTC
(In reply to comment #9)
> (In reply to comment #7)
> > (In reply to comment #0)
> > > openvpn used to work fine with respect to suspend/resume.
> > 
> > And also this ^^^^. This used to work. Some time ago it stopped. Even if I wait
> > more than 120 s, vpn won't recover.
> 
> I've overlooked the "USR1 is not enough", sorry!
> 
> Any idea which version were working with USR1 --2.2.1 from 12.1? / which
> kernel?

From the logs, it looks like the first time I had to restart openvpn after resume was on Jul 13th. I have had this kernel since Jun 28th: 3.5.0-rc4-next-20120628. And I resumed 13 times between the 28th and the 13th without restarting vpn, so I think this is not a kernel issue.
Comment 11 Jiri Slaby 2012-09-20 19:17:15 UTC
I booted 3.4.11, 3.5.4 and 3.6-rc6, all work. So this is a kernel issue I'm currently bisecting.
Comment 12 Jiri Slaby 2012-09-20 20:24:33 UTC
(In reply to comment #11)
> I booted 3.4.11, 3.5.4 and 3.6-rc6, all work. So this is a kernel issue I'm
> currently bisecting.

And while bisecting, I found out that it is not the kernel. It's a matter of timing. It happens on the first or second invocation of suspend/resume.

It seems to be accompanied by a warning on the console when the network is restarted:
tun0 <some ioctl I don't remember> Device or resource busy
Comment 13 Jiri Slaby 2012-09-21 07:44:53 UTC
(In reply to comment #12)
> tun0 <some ioctl I don't remember> Device or resource busy

network[9389]: tun0      TUNSETIFF: Device or resource busy

more precisely.
Comment 14 Marius Tomaschewski 2012-10-18 13:13:43 UTC
Jiri,

would you attach your configuration or send it to me by mail, please?

TUNSETIFF creates a new tun interface... could you tell me what kind
of timing issue makes the creation of the interface fail?
Comment 15 Jiri Slaby 2012-10-18 13:22:35 UTC
(In reply to comment #14)
> would you attach your configuration or it sent to me by mail, please?

It's a simple setup:
client
dev tun
proto udp
remote gate.suse.cz
nobind
persist-key
persist-tun
ns-cert-type server
ca /etc/openvpn/SUSE/SUSE-Prague-ca.crt
cert /etc/openvpn/SUSE/SUSE-Prague-jslaby.crt
key /etc/openvpn/SUSE/SUSE-Prague-jslaby.key
auth-user-pass
comp-lzo
verb 3
explicit-exit-notify 5

I added explicit-exit-notify only recently; I have commented it out now to see if that makes a difference.

> TUNSETIFF creates a new tun interface... could you tell me which matter
> of timing it is, that the creation of the interface fails?

I don't understand...

If I add this
  (sleep 40; killall -HUP openvpn) &
to the resume/thaw part of /etc/pm/sleep.d/02vpn, it works. If I reduce the sleep time, it does not work.
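
For reference, a sketch of how such a hook could look as a whole. The function name `vpn_hook` and the `VPN_HUP_DELAY` variable are hypothetical, introduced only to make the delay adjustable; 40 seconds is the value that worked in my test.

```shell
#!/bin/sh
# Sketch of a sleep hook in the style of /etc/pm/sleep.d/02vpn.
# VPN_HUP_DELAY is a hypothetical knob, defaulting to 40 seconds.
vpn_hook() {
    case "$1" in
        thaw|resume)
            # Run in the background so the resume sequence is not blocked.
            ( sleep "${VPN_HUP_DELAY:-40}"; killall -HUP openvpn ) &
            ;;
        *)
            # Nothing to do on suspend/hibernate or unknown actions.
            ;;
    esac
}
```

An actual hook file would end with `vpn_hook "$1"`, since pm-utils passes the action as the first argument.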
Comment 19 Swamp Workflow Management 2012-10-29 19:08:53 UTC
openSUSE-RU-2012:1411-1: An update that has two recommended fixes can now be installed.

Category: recommended (low)
Bug References: 692440,781106
CVE References: 
Sources used:
openSUSE 12.2 (src):    openvpn-2.2.2-3.4.1
openSUSE 12.1 (src):    openvpn-2.2.1-18.4.1
openSUSE 11.4 (src):    openvpn-2.1.4-11.30.1
Comment 20 Jiri Slaby 2013-01-09 11:58:25 UTC
This can be closed now, right?
Comment 21 Marius Tomaschewski 2013-01-17 10:19:48 UTC
No, it is not yet fixed (just the init script). There are several issues
with resume, especially when systemd is used.
Comment 22 Bernhard Wiedemann 2013-01-28 14:00:21 UTC
This is an autogenerated message for OBS integration:
This bug (781106) was mentioned in
https://build.opensuse.org/request/show/150169 Factory / sysconfig
Comment 23 Marius Tomaschewski 2013-01-28 14:08:08 UTC
OK, it is the network restarts (of the underlying interface) that break it.
Further, daemons started like "rcopenvpn start foobar" were never
considered part of the systemd openvpn.service.

I've reworked the suspend/resume hooks in OBS request 150169 [sysconfig]
and changed the openvpn init script to join the service cgroup in OBS
request 150171 [openvpn].

The adaptation of the network script inside pm-utils is requested in
https://gitorious.org/opensuse/pm-utils/merge_requests/2 and will follow.
Comment 24 Marius Tomaschewski 2013-01-28 14:12:00 UTC
Ahm... resolved too fast.
Comment 25 Marius Tomaschewski 2013-01-28 14:14:41 UTC
Benjamin,
IMO it would make sense to backport the relevant changes to 12.2 as soon as
we've verified that it works on 12.3 (-beta2). OK?
Comment 26 Jiri Slaby 2013-01-28 14:16:32 UTC
(In reply to comment #23)
> Adoption of the network script inside of the pm-utils is requested in
> https://gitorious.org/opensuse/pm-utils/merge_requests/2 and follows.

As far as I understand, pm-utils scripts are no longer called. See bnc#790157.
Comment 27 Marius Tomaschewski 2013-01-28 14:21:01 UTC
(In reply to comment #26)
> (In reply to comment #23)
> > Adoption of the network script inside of the pm-utils is requested in
> > https://gitorious.org/opensuse/pm-utils/merge_requests/2 and follows.
> 
> As far as I understand, pm-utils scripts are no longer called. See bnc#790157.

On my Factory system there is no openvpn pm-utils script any more, but there
was a network script, which was called and which I've adapted.
Comment 28 Benjamin Brunner 2013-01-28 14:54:16 UTC
Marius, feel free to create a maintenance request for 12.2 with the backported fix as soon as it's tested. Thanks.
Comment 29 Bernhard Wiedemann 2013-01-28 15:00:18 UTC
This is an autogenerated message for OBS integration:
This bug (781106) was mentioned in
https://build.opensuse.org/request/show/150171 Factory / openvpn
Comment 30 Bernhard Wiedemann 2013-10-31 20:00:16 UTC
This is an autogenerated message for OBS integration:
This bug (781106) was mentioned in
https://build.opensuse.org/request/show/205451 12.2 / openvpn