Bug 728774

Summary: during boot: /etc/rc.status: line 54: /dev/stderr: No such device or address
Product: [openSUSE] openSUSE Tumbleweed Reporter: Danny van Delft <d.a.van.delft>
Component: Kernel Assignee: Jeff Mahoney <jeffm>
Status: RESOLVED WONTFIX QA Contact: E-mail List <qa-bugs>
Severity: Critical    
Priority: P2 - High CC: bart.vanassche, bruno, cwh, dieter.jurzitza, dvaleev, fcrozat, korossy, lnussel, luuk34, manfred.h, mmarek, opensuse, per, piny, ro, someuniquename, support, suse-beta, werner, wvvelzen
Version: 201408*   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: excerpt from /var/log/messages with "OUTPUT" output mentioned in comment 27
The used rc.status from comment 27, fwiw
/etc/rc.status I suggest
[PATCH] vfs: allow /proc/pid/fd to dup a socket

Description Danny van Delft 2011-11-07 22:03:20 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; nl; rv:1.9.2.23) Gecko/20110920 SUSE/3.6.23-0.2.1 Firefox/3.6.23

This message (present in /var/log/messages) is given a couple of times during system start up, for example by ntp and mysql. Not sure if it is harmful, but may be indicative of wrong order/dependency of services at boot. This is with systemd start up, have not tested yet with sysvinit.

Reproducible: Always

Comment 1 Danny van Delft 2011-11-07 23:06:17 UTC
On the other hand, perhaps it's just a bug in /etc/rc.status: shouldn't use /dev/stderr. As a test I ran:

thuis@robinia:~> echo 'echo hi there err > /dev/stderr; echo anybody home out > /dev/stdout ; echo who 2 >&2 ; echo why 1' | at 23:58
warning: commands will be executed using /bin/sh
job 4 at 2011-11-07 23:58

... waiting for 23:58

 mailx
Heirloom mailx version 12.5 7/5/10.  Type ? for help.
...
 N  7 thuis@robinia.home Mon Nov  7 23:58   17/570   Output from your job        4
? 7
Message  7:
...
sh: line 66: /dev/stderr: Permission denied
sh: line 66: /dev/stdout: Permission denied
who 2
why 1

So the redirect to /dev variants don't work.
Comment 2 Wilfred van Velzen 2012-02-01 10:50:39 UTC
It also happens after system startup, when you for instance restart mysqld ...

Btw: This isn't fixed yet in the final/current 12.1 release.
Comment 3 Wilfred van Velzen 2012-02-01 12:05:57 UTC
I've replaced '>/dev/stderr' with '1>&2' on line 54 of /etc/rc.status -> Problem gone!

Please fix this in the distribution package...
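
A minimal sketch of the change described above, assuming a helper called log_err (a hypothetical name, not something defined in /etc/rc.status). Writing through the already-open descriptor 2 with '1>&2' never has to open /dev/stderr by path, so it should not matter what descriptor 2 currently points to:

```shell
#!/bin/sh
# log_err is a hypothetical helper name, not part of /etc/rc.status.
# It duplicates the already-open fd 2 instead of opening /dev/stderr by path.
log_err() {
    echo "$@" 1>&2
}

log_err "redirecting to systemctl"
```

The message itself is the one rc.status prints; only the redirection differs.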
Comment 4 Ruediger Oertel 2012-02-01 12:19:23 UTC
werner, your code. can you comment on this please ?
Comment 5 Dr. Werner Fink 2012-02-01 14:19:50 UTC
/dev/stderr has to exist as it is standard I/O ... guess, udev and/or
initrd do not work well
Comment 6 Frederic Crozat 2012-02-01 15:01:06 UTC
from a quick look at initrd, /dev/stderr symlink is created there and should be available on the booting system, since /dev is moved to rootfs (/bin/mount --move /dev /root/dev )
Comment 7 Wilfred van Velzen 2012-02-01 15:06:00 UTC
It's not just a booting problem. The message is also reported in /var/log/messages on a running system when you are restarting mysqld (# rcmysql restart) for instance!
Comment 8 Frederic Crozat 2012-02-01 15:13:13 UTC
so, it means something is incorrectly removing the symlink from /dev ..
Comment 9 Wilfred van Velzen 2012-02-01 15:57:33 UTC
The symlink exists on a running server when I check from the command line in a bash shell:

# ls -l /dev/stderr
lrwxrwxrwx 1 root root 15 May 19  2011 /dev/stderr -> /proc/self/fd/2
# ls -l /proc/self/fd/2
lrwx------ 1 root root 64 Feb  1 16:55 /proc/self/fd/2 -> /dev/pts/4
# ls -l /dev/pts/4
crw--w---- 1 root tty 136, 4 Feb  1 16:55 /dev/pts/4
Comment 10 Wilfred van Velzen 2012-02-01 16:02:06 UTC
Sorry this was on a 10.3 machine (I have too many open putty screens ;))

# ls -l /dev/stderr
lrwxrwxrwx 1 root root 4 Jan 31 13:46 /dev/stderr -> fd/2
# ls -l /dev/fd/2
lrwx------ 1 root root 64 Feb  1 17:00 /dev/fd/2 -> /dev/pts/2
# ls -l /dev/pts/2
crw--w---- 1 root tty 136, 2 Feb  1 17:00 /dev/pts/2
Comment 11 Frederic Crozat 2012-02-01 16:13:08 UTC
could it be related to some chroot (since ntp and postfix use such thing) ?
Comment 12 Wilfred van Velzen 2012-02-01 17:06:54 UTC
I can further add:

/etc/init.d/mysql status or stop

Doesn't trigger this behaviour. But...

/etc/init.d/mysql start (or restart)

does! However '/etc/init.d/mysql start' does output 'redirecting to systemctl' to stderr. So rc.status seems to be called a second time, when the error line is written to /var/log/messages, and it might be running in some chroot environment by that time?
Comment 13 Ruediger Oertel 2012-03-19 11:40:54 UTC
I'm seeing this on factory with "systemctl restart mysql.service"
Comment 14 Ruediger Oertel 2012-03-19 11:45:41 UTC
added debugging code to rc.status:

+        ls -l /dev/stderr /dev/fd/2 | logger

which shows:

Mar 19 12:43:19 fatou logger: lrwx------ 1 root root 64 Mar 19 12:43 /dev/fd/2 -> socket:[664286]
Mar 19 12:43:19 fatou logger: lrwxrwxrwx 1 root root  4 Mar  7 15:53 /dev/stderr -> fd/2
Mar 19 12:43:19 fatou mysql[1753]: /etc/rc.status: line 58: /dev/stderr: No such device or address

looks systemd specific.

Frederic: I'm pushing this over to you for the moment ...
Comment 15 Frederic Crozat 2012-03-19 11:59:59 UTC
This looks similar to bnc#732910

It would mean mysql is closing stderr..
Comment 16 Wilfred van Velzen 2012-03-19 12:28:09 UTC
Maybe it's not fixing the underlying problem, but when you replace '>/dev/stderr' with '1>&2' on line 54 of /etc/rc.status, it's fixed!

Btw: '1>&2' is used in 5 other places in /etc/rc.status, so why this single line needs to use /dev/stderr is unclear to me...
Comment 17 Luuk V 2012-05-26 09:09:07 UTC
The solution Wilfred gave on 2012-03-19 is working OK!
Why has a solution not been implemented after two months?
Who is waiting for what kind of info?
Comment 18 Wilfred van Velzen 2012-05-26 10:24:50 UTC
Good question!
Comment 19 Dr. Werner Fink 2012-05-29 08:21:37 UTC
As already said, this is not a problem in /etc/rc.status, as /dev/stderr
has to exist!  And dupping stdout onto stderr by using '1>&2'
is not a fix but only a workaround. The question is *WHO* or *WHAT* is closing
/dev/stderr without any need?
Comment 20 Dr. Werner Fink 2012-05-29 08:24:00 UTC
As it seems that the mysql boot script is destroying the symbolic link
/dev/stderr, I'd like to know from the current mysql maintainers which
script does this and how to fix it.
Comment 21 Dr. Werner Fink 2012-05-29 08:30:53 UTC
A real workaround would be if /etc/rc.status restored the missing link(s)
below /dev/ and /dev/fd/

   test -h /dev/fd     || ln -sf /proc/self/fd /dev/fd
   test -h /dev/stderr || ln -sf fd/2 /dev/stderr
   test -h /dev/stdin  || ln -sf fd/0 /dev/stdin
   test -h /dev/stdout || ln -sf fd/1 /dev/stdout

Besides this we could change the usage of /dev/stderr to use '1>&2', but nevertheless it should be detected *WHO* or *WHAT* is closing/removing the
system symbolic link /dev/stderr without any need.
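
The usual layout of these links (stdin is descriptor 0, stdout is 1, stderr is 2) can be checked without modifying /dev. The following is only a sketch: check_std_links is a hypothetical helper name, and the accepted link targets are an assumption based on the listings earlier in this report.

```shell
#!/bin/sh
# check_std_links is a hypothetical helper; it only reports, it never changes /dev.
# Assumed layout: stdin -> fd/0, stdout -> fd/1, stderr -> fd/2
# (absolute /proc/self/fd/N or /dev/fd/N targets are accepted as well).
check_std_links() {
    dev=${1:-/dev}
    rc=0
    for pair in stdin:0 stdout:1 stderr:2; do
        name=${pair%%:*}
        num=${pair##*:}
        target=$(readlink "$dev/$name" 2>/dev/null)
        case $target in
            fd/"$num"|/proc/self/fd/"$num"|/dev/fd/"$num")
                echo "$dev/$name -> $target (ok)" ;;
            *)
                echo "$dev/$name -> ${target:-<missing>} (unexpected)"
                rc=1 ;;
        esac
    done
    return $rc
}

check_std_links /dev || echo "some links look wrong"
```

Passing a directory other than /dev makes it possible to try the check against a scratch directory first.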
Comment 22 Danny van Delft 2012-05-29 08:48:53 UTC
(In reply to comment #20)
> As it seems that the mysql boot script is destroying the symbolic link
> /dev/stderr I'd like to know from the current mysql maintainers which
> script does this do and how to fix this.

As a reminder, it is not only mysql that gives this message at boot: the ntp script as well.
Comment 23 Wilfred van Velzen 2012-05-29 08:56:26 UTC
Although it's not fixing the underlying problem, I think it is a real (and easy) fix for rc.status that shouldn't be delayed just because it does not fix the underlying problem. '1>&2' is used in 5 other places in rc.status, '/dev/stderr' only in 1. So it's good to "standardize" on this way of redirecting to stderr anyway!
Comment 25 Dr. Werner Fink 2012-05-29 09:50:30 UTC
(In reply to comment #22)

The ntp script itself does not touch /dev/stderr; that is, if /dev/stderr
was removed or replaced then this error simply happens to all scripts.
Please note that /etc/rc.status is sourced/read in the very first
lines of most boot scripts.

Maybe it would help to add some lines like

 rc_exit ()
 {
   test -h /dev/stderr || echo "Service $0 has removed /dev/stderr"
   if test -e /dev/stderr -a "$(readlink /dev/stderr)" != fd/2 ; then
       echo "Service $0 has destroyed /dev/stderr"
   fi
   exit $_rc_status_all
 }

into the shell function rc_exit() of /etc/rc.status ... maybe with this
we will detect the waste producer.
Comment 26 Wilfred van Velzen 2012-05-29 11:30:29 UTC
Ok, if you will provide an updated (and tested! ;)) rc.status as attachment here, we can use it to test this...
Comment 27 Danny van Delft 2012-05-29 11:52:32 UTC
(In reply to comment #25)
> (In reply to comment #22)
> 
> The ntp script its self does not touch /dev/stderr that is if /dev/stderr
> was removed or replaced then this error simply happens to all scripts.
> Please note that /etc/rc.status will be sourced/readed at the very first
> lines of the most boot scripts.
> 
> Maybe it would help to add some lines like
> 
>  rc_exit ()
> 
>    test -h /dev/stderr || echo "Service $0 has removed /dev/stderr"
>    if test -e /dev/stderr -a "$(readlink /dev/stderr)" != fd/2 ; then
>        echo "Service $0 has destroyed /dev/stderr"
>    fi
>    exit $_rc_status_all
>  }
> 
> into the shell function rc_exit() of /etc/rc.status ... maybe with this
> we will detect the waste producer.

Done that, but gave no output. So added the following lines to /etc/rc.status, just before the "redirecting to systemctl" is done in the start|stop|... section:
degbug="logger"
echo 'OUTPUT of ls -l /dev/std*' | $degbug
ls -l /dev/std* | $degbug
echo 'OUTPUT of ls -lL /dev/std*' | $degbug
ls -lL /dev/std* | $degbug
echo 'OUTPUT of ls -l /dev/fd/*' | $degbug
ls -l /dev/fd/* | $degbug
echo 'OUTPUT of ls -lL /dev/fd/*' | $degbug
ls -lL /dev/fd/* | $degbug
echo 'OUTPUT of netstat -l -n -p' | $degbug
netstat -l -n -p | $degbug
echo 'OUTPUT of ps axf' | $degbug
ps axf | $degbug
                echo "redirecting to systemctl" >/dev/stderr

I will attach the output of these. In it you'll see the ultimate destination of /dev/stderr:
May 29 13:36:43 postoffice logger: lrwx------ 1 root root 64 May 29 13:36 /dev/fd/2 -> socket:[8947]
May 29 13:36:43 postoffice logger: OUTPUT of ls -lL /dev/fd/*
May 29 13:36:43 postoffice ntp[1644]: ls: cannot access /dev/fd/3: No such file or directory
May 29 13:36:43 postoffice logger: crw-rw-rw- 1 root root 1, 3 May 29 13:36 /dev/fd/0
May 29 13:36:43 postoffice logger: prw------- 1 root root    0 May 29 13:36 /dev/fd/1
May 29 13:36:43 postoffice logger: srwxrwxrwx 1 root root    0 Jan  1  1970 /dev/fd/2

A socket which apparently doesn't exist. HTH
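
What kind of object a descriptor refers to can also be checked directly from the script, without going through ls. This is a sketch only; fd_kind is a hypothetical helper name, and it relies on /dev/fd/N resolving to the descriptors of the current process as it does in the listings above.

```shell
#!/bin/sh
# fd_kind is a hypothetical helper: it classifies what a descriptor of the
# current process points to. The file tests follow the /dev/fd/N link.
fd_kind() {
    path=/dev/fd/$1
    if   [ -S "$path" ]; then echo socket
    elif [ -p "$path" ]; then echo pipe
    elif [ -c "$path" ]; then echo chardev
    elif [ -f "$path" ]; then echo file
    else echo unknown
    fi
}

echo "stderr is a: $(fd_kind 2)"
```

Started from a terminal this should report a character device; under the setup shown in the log above it would be expected to report a socket.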
Comment 28 Danny van Delft 2012-05-29 11:53:49 UTC
Created attachment 492773 [details]
excerpt from /var/log/messages with "OUTPUT" output mentioned in comment 27
Comment 29 Danny van Delft 2012-05-29 11:56:22 UTC
Created attachment 492774 [details]
The used rc.status from comment 27, fwiw
Comment 30 Dr. Werner Fink 2012-05-29 12:09:00 UTC
Created attachment 492776 [details]
/etc/rc.status I suggest

with this script all about /dev/fd and /dev/std* will be checked and if
possible it will be repaired.
Comment 31 Dr. Werner Fink 2012-05-29 12:22:54 UTC
(In reply to comment #27)

SysVinit does not use such sockets.  Does this happen with the rc links
that is with e.g. rcntp or during the boot process its self?
Comment 32 Wilfred van Velzen 2012-05-29 12:39:28 UTC
Werner, I just tested your new version of rc.status. It still outputs the error to /var/log/messages, but nothing else. So none of your new 'echo' lines in rc_exit is triggered.

I tested both 'rcntp restart' and 'rcmysql restart' on a running server.

Btw: I'm using systemd.
Comment 33 Dr. Werner Fink 2012-05-29 12:49:49 UTC
(In reply to comment #32)

Guess triggered by Danny's debug session: it happens during boot/system start up
only.  And here we may have some other locations for /dev/fd and below
Comment 34 Wilfred van Velzen 2012-05-29 13:10:14 UTC
(In reply to comment #33)
> Guess triggered by Danny's debug session: it happens duing boot/ system start
> up only.

Maybe you misunderstood: The bug is still showing it self, not just during boot!

# date; rcmysql restart
Tue May 29 15:08:41 CEST 2012
redirecting to systemctl

# tail -3 /var/log/messages
2012-05-29T15:08:42+02:00 reposerver mysql[13668]: Shutting down service MySQL ..done
2012-05-29T15:08:42+02:00 reposerver mysql[13721]: /etc/rc.status: line 57: /dev/stderr: No such device or address
2012-05-29T15:08:44+02:00 reposerver mysql[13721]: Starting service MySQL ..done
Comment 35 Danny van Delft 2012-05-29 14:31:45 UTC
(In reply to comment #31)
> (In reply to comment #27)
> 
> SysVinit does not use such sockets.  Does this happen with the rc links
> that is with e.g. rcntp or during the boot process its self?

Both at system boot and when manually invoking rcntp restart.


> SysVinit does not use such sockets.

Fine, but this is under systemd control.
So the question becomes why a symbolic link to a non existing socket is created.
Comment 36 Dr. Werner Fink 2012-05-29 14:36:46 UTC
IMHO a very good question
Comment 37 Danny van Delft 2012-05-29 14:41:24 UTC
(In reply to comment #36)
> IMHO a very good question

> So the question becomes why a symbolic link to a non existing socket is
created.

Or probably more likely, why the socket disappears after the link has been created.
Comment 38 Frederic Crozat 2012-05-29 17:22:54 UTC
systemd is wrapping all services started (including initscripts) to ensure their input and output / error output are correctly handled (either sent to syslog, to stdout / null, etc).

I've been able to reduce ntp initscript to a very simple test initscript:
#! /bin/sh
#
### BEGIN INIT INFO
# Provides:       foo
# Required-Start: $local_fs
### END INIT INFO

echo foobar > /dev/stderr
exit 0

and it still triggers the error.

I'm looking at systemd code to try to understand why the socket setup for handling stderr / stdout isn't used properly. Or is it because it is a socket and not a "standard" file descriptor? Hints welcome..
Comment 39 Dr. Werner Fink 2012-05-30 08:21:39 UTC
(In reply to comment #38)

Hmmm ... AFAICR I had a similar problem with ksh, which uses socketpair(2)
as a fast replacement for pipe(2).  The major problem was and currently is
that it is not possible to dup a file descriptor belonging to a socket or
socketpair, whereas you can do this with file descriptors of pipes and
character devices. This was one of my reasons to use a pty/tty pair for
my old blogd(8), and this was also done for startpar(8).

In other words: with a socket on stdout you can not do in bash code

            exec 2>&1

as this will lead to an invalid file descriptor.
Comment 40 Frederic Crozat 2012-05-30 13:20:27 UTC
after discussing this with upstream systemd, it is suggested to get the kernel fixed once and for all to handle dup on a socket.
Comment 41 Dieter Jurzitza 2012-06-03 12:49:55 UTC
Hi folks,
only to mention - the very same happens (I guess this is expected ...) with kernel 3.4 ...
I see it with ntp, but anyway.
Just to let you know,
take care




Dieter Jurzitza
Comment 42 Luuk V 2012-06-03 18:07:22 UTC
It seems to happen on a restart too (when 'mcelog' is started)

QUESTION:
When/where/how are the devices '/dev/mcelog' and '/dev/std*' created?

opensuse:/etc # grep mcelog /var/log/messages
Apr 14 09:44:01 opensuse mcelog[6022]: Shutting down mcelog... ..done
Apr 14 09:44:01 opensuse mcelog[6030]: Starting mcelog... ..done
Apr 21 08:58:14 opensuse mcelog: mcelog read: No such device
Apr 21 08:58:14 opensuse mcelog[1222]: Starting mcelog... ..done
Apr 24 16:09:58 opensuse mcelog[7182]: Shutting down mcelog... ..done
Apr 24 16:09:58 opensuse mcelog[7190]: Starting mcelog... ..done
Apr 28 13:15:46 opensuse mcelog[1216]: Starting mcelog... ..done
Apr 28 13:15:46 opensuse mcelog: mcelog read: No such device
Apr 28 13:25:22 opensuse mcelog: mcelog read: No such device
Apr 28 13:25:23 opensuse mcelog[1207]: Starting mcelog... ..done
Jun  1 20:12:25 opensuse mcelog[16705]: Shutting down mcelog... ..done
Jun  1 20:12:25 opensuse mcelog[16713]: Starting mcelog... ..done
Jun  2 19:45:56 opensuse mcelog: mcelog read: No such device
Jun  2 19:45:56 opensuse mcelog[1218]: Starting mcelog... ..done
opensuse:/etc # ls -l /dev/mcelog
crw------- 1 root root 10, 227 Jun  2 19:45 /dev/mcelog
opensuse:/etc # ls -l /dev/std*
lrwxrwxrwx 1 root root 4 Jun  2 19:45 /dev/stderr -> fd/2
lrwxrwxrwx 1 root root 4 Jun  2 19:45 /dev/stdin -> fd/0
lrwxrwxrwx 1 root root 4 Jun  2 19:45 /dev/stdout -> fd/1
opensuse:/etc #
Comment 43 Jeff Mahoney 2012-06-04 15:40:22 UTC
The mcelog issue is unrelated. No such device means that the kernel hasn't detected the device matching the device node -- not that the device node has disappeared.
Comment 44 Jeff Mahoney 2012-06-06 19:31:57 UTC
> after discussing this with upstream systemd, it is suggested to get the kernel
> fixed one time for all to handle dup on a socket.

What isn't working correctly with dup on a socket? It's functionally equivalent
to forking. The same struct file (and obviously as a result, the same socket)
just get their refcount incremented and are assigned to an additional fd.

Can you provide a link to the discussion? Perhaps I'm missing something.

/dev/stderr pointing to a socket will *always* return -ENXIO. This is by
design.

From net/socket.c:
/*
 *      In theory you can't get an open on this inode, but /proc provides
 *      a back door. Remember to keep it shut otherwise you'll let the
 *      creepy crawlies in.
 */     

static int sock_no_open(struct inode *irrelevant, struct file *dontcare)
{
        return -ENXIO;
}

It might be enough to allow the open to proceed if the opener owns the socket.
Then it's functionally equivalent to the dup case.

(In reply to comment #3)
> I've replaced '>/dev/stderr' with '1>&2' on line 54 of /etc/rc.status ->
> Problem gone!
> 
> Please fix this in the distribution package...

This may not be the "correct" answer but it _will_ work. The big thing is that
redirecting to /dev/stderr *isn't* the same as 1>&2. File descriptor 2 is
already open in the 1>&2 case. The file descriptor used for > /dev/stderr
will never be 2.
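
The difference between the two redirections can be seen even without a socket. The sketch below assumes a Linux /proc and points descriptor 2 at a throwaway regular file; it is an illustration of the open-versus-dup distinction, not of the socket failure itself.

```shell
#!/bin/sh
# Compare the two redirections when fd 2 is a regular file.
# dup_log and path_log are throwaway temp files created only for this demo.
dup_log=$(mktemp)
path_log=$(mktemp)

# 1>&2 duplicates fd 2: all three writes share one file offset.
sh -c 'echo one 1>&2; echo two 1>&2; echo three 1>&2' 2>"$dup_log"

# >/dev/stderr opens the file again by path (with truncation, like any ">"),
# so the middle write gets its own descriptor and its own offset.
sh -c 'echo one 1>&2; echo two >/dev/stderr; echo three 1>&2' 2>"$path_log"

echo "--- using 1>&2:"
cat "$dup_log"
echo "--- using >/dev/stderr:"
cat "$path_log"
rm -f "$dup_log" "$path_log"
```

With '1>&2' all three lines end up in the file. With '>/dev/stderr' the reopen truncates the file, so the first line is typically lost.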
Comment 45 Jeff Mahoney 2012-06-06 19:41:23 UTC
BTW this can be demonstrated with the following simple program. The open statement will always fail with -ENXIO. dup() has nothing to do with it.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>
#include <string.h>
#include <errno.h>

int
main(void)
{
	int ret;
	int fds[2];
	int fd;

	/* Create a connected pair of UNIX sockets. */
	ret = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
	if (ret < 0) {
		perror("socketpair");
		return 1;
	}

	/* Make stderr one end of the socket pair. */
	close(2);
	ret = dup2(fds[0], 2);
	if (ret < 0) {
		printf("dup2 failed; %d %s\n", ret, strerror(errno));
		return 1;
	}

	/* Opening the socket by path is what fails with ENXIO. */
	fd = open("/dev/stderr", O_WRONLY);
	if (fd < 0) {
		printf("open failed; %d %s\n", fd, strerror(errno));
		return 1;
	}

	close(fd);
	close(fds[0]);
	close(fds[1]);
	return 0;
}
Comment 46 Frederic Crozat 2012-06-07 07:55:42 UTC
here is the irc log when discussing with upstream:

<fcrozat>       mezcaler1: hi.. after debugging a little, we found some issues with the stdout / stderr handling (for capturing output to syslog / journal) when used with shell redirection :(
<mezcaler1>     fcrozat: hmm, issues? where?
<fcrozat>       mezcaler1: https://bugzilla.novell.com/show_bug.cgi?id=728774#c39
<fcrozat>       I was hoping it was an issue in our "initscript integration of systemd" but it isn't, unfortunately :(
<mezcaler1>     fcrozat: what does /dev/stderr point to?
<fcrozat>       mezcaler1: /dev/fd/2
<fcrozat>       which points to the socket
<mezcaler1>     fcrozat: i presume /proc/self/fd/2
<fcrozat>       yep
<mezcaler1>     fcrozat: get the kernel fixed if you cannot dup() a scoket by opening /proc/self/fd/<socket>
<mezcaler1>     fcrozat: not going to work around this in userspace
<mezcaler1>     i am not the doktor ;-)
<mezcaler1>     kay: have you seen that?
<mezcaler1>     kay: supposedly opening /proc/self/fd/2 doesn't work if 2 is a socket?
<kay>   or give up the weird tricks in shell scripts :)
<kay>   mezcaler1: yeah, seen it. did not know about that
<fcrozat>       mezcaler1: you caused the disease ! :)
<kay>   mezcaler1: we do not run such shell scripts, i guess :)
<mezcaler1>     kay: we had those too
<mezcaler1>     kay: but we hardly run any shell scripts anymore by default ;-)
<mezcaler1>     fcrozat: but really this deserves to be fixed in the kernel
<mezcaler1>     fcrozat: please reassign to your kernel package
*       fcrozat tries ;)
<mezcaler1>     fcrozat: working arond this in userspace is not an option
<kay>   mezcaler1: we had "exec 2>&1" ? i've not seen that :)
Comment 47 Dr. Werner Fink 2012-06-11 12:09:52 UTC
(In reply to comment #45)

Indeed dup(2) works whereas (re)opening the file (descriptor) below /proc/self/
does not.  Nevertheless it seems to be common, as I have a similar bug report
with the ksh which uses socketpair(2) instead of pipe(2) and this fools programs
like diff (compare with bnc #627524) ... and there are more than diff which
use /proc/self/fd/{0,1,2} or /dev/std{in,out,err} to check for standard
file descriptors.  The behaviour that this does not work for a socket seems
to be Linux specific, if I understand David Korn correctly.
Comment 48 Dr. Werner Fink 2012-07-09 15:08:22 UTC
Q: Would it be possible to use in the kernel the system call sys_dup()
   found in fs/fcntl.c instead of returning -ENXIO in sock_no_open()?
Comment 49 Jeff Mahoney 2012-07-10 02:27:36 UTC
That would result in two file descriptors pointing to two struct files (like when you open the same file twice) instead of two file descriptors pointing to one struct file (like when you dup()). This is because the ->open happens with an already-allocated struct file and there's no way to pass back a different one, which is what would be required.

I'll dig a little deeper for this, but it may be a limitation of the Linux VFS.
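
The distinction between one struct file shared by two descriptors and two separate struct files can be observed from the shell with a regular file. This is a sketch using throwaway temp files; descriptors 3 and 4 are arbitrary choices.

```shell
#!/bin/sh
# shared and separate are throwaway temp files for this demo.
shared=$(mktemp)
separate=$(mktemp)

# Case 1: fd 4 is a dup of fd 3 -> one open file description, one offset.
exec 3>"$shared"
exec 4>&3
echo a >&3
echo b >&4          # lands after "a"
exec 3>&- 4>&-

# Case 2: the same path opened twice -> two descriptions, two offsets.
exec 3>"$separate"
exec 4>"$separate"
echo a >&3
echo b >&4          # starts at offset 0 again and overwrites "a"
exec 3>&- 4>&-

echo "dup:       $(tr '\n' ' ' <"$shared")"
echo "two opens: $(tr '\n' ' ' <"$separate")"
rm -f "$shared" "$separate"
```

In the dup case both lines survive; with two opens the second write replaces the first.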
Comment 50 Dr. Werner Fink 2012-07-10 11:49:06 UTC
That would require a kind of a callback in the struct socket, wouldn't it?
If something like this is possible in the Linux network stack.
Comment 51 Ludwig Nussel 2012-09-04 13:40:38 UTC
12.2 still affected
Comment 52 Per Jessen 2012-09-06 07:37:01 UTC
In addition to ntp and mysql, the hp-snmp-agents script also has this problem.
Comment 53 P 2012-09-24 13:37:33 UTC
Confirmed in 12.2 x64:

# service mysql start
Job failed. See system journal and 'systemctl status' for details.
#
# tail /var/log/messages
Sep 24 15:33:32 specht mysql[17057]: /etc/rc.status: line 57: /dev/stderr: No such device or address
Sep 24 15:34:02 specht mysql[17057]: Starting service MySQL warning: /var/run/mysql/mysql.sock didn't appear within 30 seconds
Sep 24 15:34:02 specht mysql[17057]: chmod: cannot access '/var/run/mysql/mysqld.pid': No such file or directory
Sep 24 15:34:02 specht mysql[17057]: ..failed
Sep 24 15:34:02 specht systemd[1]: mysql.service: control process exited, code=exited status=1
Sep 24 15:34:02 specht systemd[1]: Unit mysql.service entered failed state.
Comment 54 Dr. Werner Fink 2012-12-03 08:18:41 UTC
*** Bug 780643 has been marked as a duplicate of this bug. ***
Comment 55 Christian Boltz 2013-01-27 16:20:12 UTC
*** Bug 800573 has been marked as a duplicate of this bug. ***
Comment 56 Christian Boltz 2013-01-27 16:21:20 UTC
(In reply to comment #55)
> *** Bug 800573 has been marked as a duplicate of this bug. ***

Note: Bug 800573 contains a patch - attachment 522021 [details]
Comment 57 Wilfred van Velzen 2013-01-28 08:32:23 UTC
(In reply to comment #56)
> Note: Bug 800573 contains a patch - attachment 522021 [details]

The same patch has been suggested in Comment 3 of this bug almost a year ago...
Comment 58 Dr. Werner Fink 2013-01-28 08:37:15 UTC
(In reply to comment #57)

This is a workaround!  The real problem has to be fixed otherwise the normal scripting does not work.  Please note the open via `>/dev/stderr' is valid!
Comment 59 Manfred Hollstein 2013-01-28 17:14:57 UTC
(In reply to comment #58)
> (In reply to comment #57)
> 
> This is a workaround!  The real problem has to be fixed otherwise the normal
> scripting does not work.  Please note the open via `>/dev/stderr' is valid!

I'm not sure I follow you here. What do you mean with "normal scripting"? Something like ">&2" has been valid shell script code for more than 20 years and most scripters are used to it. Using "> /dev/stderr" might look more obvious, but we should make sure that the scripts work correctly, and when "> /dev/stderr" doesn't work occasionally, please open a new report for that issue in general, but lets get this particular bug fixed using a proper fix - even if you call it a workaround.
Comment 60 Don Hughes 2013-05-01 00:38:16 UTC
 fggfgf
Comment 61 Pi Ny 2013-05-10 10:06:18 UTC
(In reply to comment #60)
>  fggfgf

I confirm this bug with ntp (12.2, fresh dup from 12.1) during boot and feel tempted to repeat comment #60.

It’s 15 months now since at least a workaround is known...

WHY can't the workaround be implemented despite it is not THE clean solution until such a thing exists? 

Please don't fight on the back of your users. Now a lot of people are forced to fiddle around in a core start-up script...
Comment 62 Dr. Werner Fink 2013-05-10 10:35:14 UTC
For 12.3 and above:

 Mon Feb  4 10:52:44 UTC 2013 - werner@suse.de
 - Avoid to stumble over missing /dev/stderr in boot script started
   by systemd (work around bnc#728774o but not solve it)

the question is: when will this be fixed in the kernel?  I guess this will never happen as long as there is not enough pressure on the kernel people (:
Comment 63 Pi Ny 2013-05-10 19:04:19 UTC
(In reply to comment #62)
> For 12,.3 and above:
> 
>  Mon Feb  4 10:52:44 UTC 2013 - werner@suse.de
>  - Avoid to stumble over missing /dev/stderr in boot script started
>    by systemd (work around bnc#728774o but not solve it)
> 
> the question is: when this will be fixed in the kernel.  I guess this will
> never happen as long there is not enough pressure onto the kernels people (:

I jumped forward to 12.3 and I am "Forced To Resort To Astonishment": There it is "fixed" (in terms of the workaround).
Comment 64 Jeff Mahoney 2013-05-10 20:00:28 UTC
No, it's not even on my radar in terms of important things that need fixing. It involves making sockets dupable which isn't really needed outside of this use case. I agree with Manfred. The "workaround" is the proper fix.
Comment 65 Jeff Mahoney 2013-09-27 14:51:20 UTC
Closing as WONTFIX. Implementing socket dup work just to allow /dev/stderr in scripts isn't a great cost/benefit.
Comment 66 Dr. Werner Fink 2013-09-27 15:21:39 UTC
(In reply to comment #65)

Does this mean that Linux is the only system where socketpairs() cannot be duplicated?  AFAICR this has caused a lot of work in ksh, as it requires a lot of workarounds due to the fact that ksh uses socketpairs() and not pipes() in case socketpairs() are faster than pipes().

IMHO there is no reason why socketpairs() should not behave like pipes().

Please simply fix it, thanks.
Comment 67 Jeff Mahoney 2013-10-01 18:22:46 UTC
Created attachment 561090 [details]
[PATCH] vfs: allow /proc/pid/fd to dup a socket

I've asked Miklos for his take on this patch. It could be that I'll get laughed out of the room. It at least works, though.

When a process has one of the stdio file descriptors set to one end
of a socket, it is prohibited from using /dev/std{in,out,err}.

For example, a script that does:
echo "some error" > /dev/stderr

... will get ENXIO returned. The fact that this works for nearly every
other type of file except for sockets has been a source of confusion
among users and developers for some time. Other UNIX-like systems don't
make the distinction between socket and other files in their /dev/fd/
implementation, but this is largely due to using dup() rather than
handling it as symlink-open.

For a variety of reasons, the socket code depends on there being
a 1:1 mapping between a struct file and struct socket. It's possible
to rework the reference counting and export a few more socket
functions to allow a many:1 mapping, but we use the flags associated
with the file to determine if e.g. the socket is nonblocking. Having
multiple files with different semantics operating on the same file
would yield unexpected results.

As a result, the option that's left is to dup() the existing file
descriptor so that we preserve the 1:1 mapping. This is a special
case that is limited to sockets opened by the current process. The
symlink-open behavior has become a de-facto ABI and the dup
behavior can't be used universally as it would surprise users
expecting a /proc/pid/fd file open to return at offset 0 instead
of the f_pos of that file descriptor.

This patch adds the ability for a function in the lookup path to
set a new LOOKUP_DUP_FD flag and stash the file descriptor number
in the nameidata to provide the descriptor to dup after the open
fails.

It's hacky, but short of reworking the entire open path to accommodate
this special case, it's the cleanest solution. The FreeBSD kernel
reserves an int in their equivalent of task_struct for this, which is
even worse.
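
The offset behavior mentioned above (a /proc/pid/fd open starting at offset 0 rather than at the f_pos of the existing descriptor) can be illustrated with a regular file. This is a sketch that assumes the Linux symlink-open semantics described in the patch text; the temp file and descriptor 3 are arbitrary.

```shell
#!/bin/sh
# data is a throwaway temp file for this demo.
data=$(mktemp)
printf 'first\nsecond\n' >"$data"

exec 3<"$data"
read -r line <&3                 # consume "first"; fd 3 now sits at "second"

echo "via dup (<&3):"
cat <&3                          # continues from the current offset
echo "via path (/dev/fd/3):"
cat /dev/fd/3                    # fresh open: starts at offset 0 again

exec 3<&-
rm -f "$data"
```

On a system whose /dev/fd is implemented with dup() semantics, the second cat would be expected to continue from the shared offset instead.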
Comment 68 Dr. Werner Fink 2013-10-02 08:44:42 UTC
(In reply to comment #67)

Thank you very much ... beside this, helping the user space developers should not cause anyone to laugh about 8)

Let's see what happens, e.g. I'd like to know the opinion of Linus
Comment 69 Tomáš Chvátal 2018-04-13 17:17:36 UTC
This is automated batch bugzilla cleanup.

The openSUSE Tumbleweed changed its development model at the end of
year 2014. [1]
Which means that most of the older bugs are reported against completely
different product than the current release of openSUSE Tumbleweed.

There is very high probability that this bug is no-longer relevant at all.
As a result we are closing this bug.

If you can reproduce this bug against a current Tumbleweed installation of
openSUSE, or you can still observe it under openSUSE Leap 15.0, please
feel free to reopen this bug.

Thank you for reporting this bug and we are sorry it was not resolved
under the old product.

[1] https://en.opensuse.org/Portal:Tumbleweed