Bug 105885 - grep segfaults when locale is utf8
Summary: grep segfaults when locale is utf8
Status: RESOLVED INVALID
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Basesystem
Version: Beta 2
Hardware: x86-64 All
Priority: P5 - None; Severity: Normal
Target Milestone: ---
Assignee: Mads Martin Joergensen
QA Contact: E-mail List
Reported: 2005-08-19 15:49 UTC by Andreas Klein
Modified: 2005-09-14 12:15 UTC
Found By: Other


Attachments
Oops on smp opteron with beta1 (2.97 KB, text/plain)
2005-08-19 15:53 UTC, Andreas Klein
input for grep (5.72 KB, application/octet-stream)
2005-09-03 13:03 UTC, Andreas Klein
core (100.00 KB, application/octet-stream)
2005-09-03 13:09 UTC, Andreas Klein

Description Andreas Klein 2005-08-19 15:49:09 UTC
I am a little confused by the following problem. I can reproduce it on only
one type of machine: SMP Opteron. I do not see it on single-processor Opteron
or i386. Since it only happens on SMP machines, it may be kernel-related.
Running SuSEconfig on that machine produces segfaults. Maybe it has to do
with bug 78353 (same machine type).
I also had a kernel oops on that machine with beta1. I will attach the oops.
I also considered a hardware problem, but the machine has run fine with 9.2/9.3
since last October. The memory is chipkill memory and there are no entries in
/var/log/mcelog.

Aug 19 17:33:12 linux kernel: fonts-config[6028]: segfault at 0000000000000000
rip 00002aaaaaaae05c rsp 00007fffff8376c0 error 4
Aug 19 17:33:12 linux kernel: acroread-cidfon[6033]: segfault at
0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffaf6460 error 4
Aug 19 17:33:45 linux kernel: grep[6336]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffc1f1c0 error 4
Aug 19 17:33:45 linux kernel: grep[6340]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007ffffff3b230 error 4
Aug 19 17:33:45 linux kernel: grep[6356]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffcc1520 error 4
Aug 19 17:33:50 linux kernel: grep[7755]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffa40150 error 4
Aug 19 17:33:50 linux kernel: grep[7759]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffa953d0 error 4
Aug 19 17:33:53 linux syslog-ng[4511]: SIGHUP received, restarting syslog-ng
Aug 19 17:33:54 linux syslog-ng[4511]: new configuration initialized
Aug 19 17:33:59 linux kernel: grep[8433]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffe416b0 error 4
Aug 19 17:33:59 linux kernel: klogd 1.4.1, ---------- state change ---------- 
Aug 19 17:33:59 linux kernel: xpdf-cjk-config[8439]: segfault at
0000000000000000 rip 00002aaaaaaae05c rsp 00007fffffbacb30 error 4
Comment 1 Andreas Klein 2005-08-19 15:53:03 UTC
Created attachment 46711
Oops on smp opteron with beta1
Comment 2 Andreas Klein 2005-08-19 16:52:26 UTC
You can trigger the problem by running:
/usr/sbin/ghostscript-cjk-config
while LANG is set to en_US.UTF-8
wpyc022:~ # echo $LANG 
en_US.UTF-8


If LANG is set to C, no segfaults occur.

What really puzzles me is that it works on single Opterons, regardless of the
LANG setting.
Comment 3 Andreas Kleen 2005-08-19 17:06:37 UTC
Maybe something is subtly different in your single opteron installation.
Comment 4 Olaf Kirch 2005-08-29 12:16:25 UTC
Erm. Does the oops attached above really belong to this bug report? 
That's an oops in lockd, which is completely unrelated to any grep 
segfaults. Did you mean to attach that oops to a different report? 
Comment 5 Andreas Klein 2005-08-29 13:28:51 UTC
No, the oops is not related to this problem. At first I thought it might be,
but then I realized that the oops has nothing to do with this.
I also don't think the problem is a kernel issue, which is why I changed it
from kernel to basesystem. But somehow mmj assigned it to Hubert again.
Comment 6 Olaf Kirch 2005-08-29 13:37:44 UTC
And round and round we go... sending back to Mads. 
Comment 7 Mads Martin Joergensen 2005-08-29 13:55:33 UTC
And how can I reproduce the problem in grep?
Comment 8 Mads Martin Joergensen 2005-08-29 14:04:07 UTC
# echo $LANG ; /usr/sbin/ghostscript-cjk-config
en_US.UTF-8
# cat /proc/cpuinfo
...
processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 5
model name      : AMD Opteron(tm) Processor 248
...
Comment 9 Andreas Klein 2005-08-29 14:22:32 UTC
Run it more often.
The segfault is not visible in your terminal. You have to look in
/var/log/messages for it.

Mon Aug 29 16:18:51 CEST 2005
wpyc022:~ # /usr/sbin/ghostscript-cjk-config
wpyc022:~ # /usr/sbin/ghostscript-cjk-config
wpyc022:~ # /usr/sbin/ghostscript-cjk-config
wpyc022:~ # /usr/sbin/ghostscript-cjk-config
wpyc022:~ # /usr/sbin/ghostscript-cjk-config
wpyc022:~ # /usr/sbin/ghostscript-cjk-config
wpyc022:~ # /usr/sbin/ghostscript-cjk-config
wpyc022:~ # date
Mon Aug 29 16:19:54 CEST 2005

Aug 29 16:19:03 wpyc022 kernel: grep[18059]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffdb8cc0 error 4
Aug 29 16:19:12 wpyc022 kernel: grep[18228]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffd5fcc0 error 4
Aug 29 16:19:20 wpyc022 kernel: grep[18360]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffd54d00 error 4
Aug 29 16:19:20 wpyc022 kernel: grep[18363]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffcccf10 error 4
Aug 29 16:19:20 wpyc022 kernel: grep[18379]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffff879df0 error 4
Aug 29 16:19:29 wpyc022 kernel: grep[18511]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffa13ce0 error 4
Aug 29 16:19:29 wpyc022 kernel: grep[18529]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffff8abe10 error 4
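
Since the crashes only show up in the kernel log, counting the matching records is a quick way to see whether a batch of runs triggered any. A minimal sketch, assuming a syslog-style file in which each record sits on a single line (the excerpts above are wrapped for display); the function name count_segfaults is made up for illustration:

```shell
# count_segfaults PROG [LOGFILE]
# Prints the number of kernel "segfault at" records for program PROG.
# LOGFILE defaults to /var/log/messages, the file named above.
count_segfaults() {
    prog=$1
    logfile=${2:-/var/log/messages}
    # grep -c prints 0 but exits non-zero when nothing matches; keep the 0.
    grep -c "kernel: ${prog}\[[0-9]*]: segfault at" "$logfile" || true
}
```

Running `count_segfaults grep` before and after a series of invocations shows how many new grep crashes were logged in between.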

Comment 10 Mads Martin Joergensen 2005-08-29 14:29:50 UTC
15 times in a row, and I still get nothing, neither in log/messages nor in dmesg.
Comment 11 Mads Martin Joergensen 2005-08-29 15:10:11 UTC
Mike, can you reproduce/have an idea?
Comment 12 Andreas Klein 2005-08-29 15:30:54 UTC
Hmm, maybe another package must be present to trigger the problem?
Should I attach my package selection file?
Comment 14 Mike Fabian 2005-08-29 15:36:44 UTC
No, I cannot reproduce this either. I tried on Beta3 on an x86_64
single processor machine and on Beta3 on an x86_64 dual processor
machine. Closing as WORKSFORME.

Andreas, if you still think this is a bug you may reopen it but then
please find out exactly which grep command causes the segfault. This
should be rather easy, if you run /usr/sbin/ghostscript-cjk with
verbose output:

    /usr/sbin/ghostscript-cjk --verbosity 1

you can see all "grep" commands executed by /usr/sbin/ghostscript-cjk
in the output. Looks like this:

    executing: grep -q "Creator: aliascid.ps" fonts.cache-1
    executing: grep -q "Creator: aliascid.ps" MOEKai-Regular

Find out exactly which grep command segfaults for you. And then please
attach the file which was the argument of the grep command to this
bug report.

Until then it is WORKSFORME.
Comment 15 Mike Fabian 2005-08-29 15:38:16 UTC
Andreas> Hmm, maybe another package must be present to trigger the problem?
Andreas> Should I attach my package selection file?

I think it would help us much more if you can check exactly which grep
command segfaults and attach the file which was used as an argument in
that grep command.


Comment 16 Andreas Klein 2005-08-29 16:29:11 UTC
This is really a strange problem.
Running:
/usr/sbin/ghostscript-cjk --verbosity 1
does not lead to a segfault, and neither does running the grep by itself.
Running 
/usr/sbin/ghostscript-cjk
without arguments still produces the segfaults with grep.
Comment 17 Andreas Klein 2005-08-29 18:43:06 UTC
This problem depends on the running kernel.
I installed the kernel-default on that machine, then I can't trigger the problem.

Summary:

The problem is independent of the LANG setting. It was just an unlucky coincidence.

SMP kernel:
grep segfaults can be produced by the following commands:
while :; do /usr/sbin/ghostscript-cjk; done
while :; do /sbin/conf.d/SuSEconfig.lyx-cjk; done
The segfault happens on average in one out of 10-20 executions.
Running the same greps by hand does not segfault.
Running the script with verbosity 1 does not segfault.
The segfault happens at the grep in this part. Maybe it has something to do
with the fs-cache?
rm -f wrap_chkconfig.ltx chkconfig.vars chkconfig.classes chklayouts.tex
  done > chklayouts.tex
  ${LATEX} wrap_chkconfig.ltx 2>/dev/null | grep '^\+'
  eval `cat chkconfig.vars | sed 's/-/_/g'`
  test -n "${rmcopy}" && rm -f chkconfig.ltx
fi

kernel-default:
Installing the single-processor kernel on that machine "solves" the problem!
I tested several hundred executions without a segfault.

smp-kernel with nosmp kernel-parameter:
kernel does not boot: hda lost interrupt; kernel panic
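
The open-ended while loops above can be bounded so that results are comparable between the smp and default kernels. A sketch, assuming the kernel log reaches the given file promptly (klogd and syslog-ng may lag, so a late record can be missed); count_new_segfaults and its arguments are hypothetical names:

```shell
# count_new_segfaults LOGFILE RUNS COMMAND [ARGS...]
# Runs COMMAND RUNS times and prints how many "segfault at" records
# were appended to LOGFILE in the meantime.
count_new_segfaults() {
    logfile=$1
    runs=$2
    shift 2
    # grep -c exits non-zero on zero matches but still prints 0.
    before=$(grep -c 'segfault at' "$logfile" || true)
    i=0
    while [ "$i" -lt "$runs" ]; do
        # The command's own output and exit status are not of interest here.
        "$@" >/dev/null 2>&1 || true
        i=$((i + 1))
    done
    after=$(grep -c 'segfault at' "$logfile" || true)
    echo $((after - before))
}
```

For example, `count_new_segfaults /var/log/messages 100 /usr/sbin/ghostscript-cjk` gives one number per kernel that can be compared directly.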
Comment 18 Andreas Kleen 2005-08-29 18:47:09 UTC
Do you always get the same oops in lockd or random different ones?
Comment 19 Andreas Klein 2005-08-30 05:09:12 UTC
The oops happened just once, with beta1. Beta2 and Beta3 never had an oops.
The grep segfaults started with the Beta3 smp kernel. /var/log/mcelog shows no problems.
Comment 20 Olaf Kirch 2005-08-30 07:19:48 UTC
I still don't believe this is a kernel problem. Maybe the problem is just 
that grep receives an incomplete multibyte sequence from the pipe, and barfs. 
But let's see. 
 
Andreas, please provide the input that's being piped into grep above. 
If you just run "grep < output", does it still segfault? IOW, does it depend
on the data being fed to grep, or on the fact that there are two commands
connected in a pipe?
 
If not: run this single pipe (latex|grep) - does it segfault? 
 
If not: can you write a small C program that does something like this: 
 
	char c;
	while (read(0, &c, 1) == 1) {
		write(1, &c, 1);
		usleep(10000);	/* 10 ms; userspace has no msleep() */
	}
 
and use that to pipe the latex output into grep? 
Comment 21 Olaf Kirch 2005-09-01 14:51:13 UTC
Were you able to extract the latex output? 
 
BTW you mentioned that the segfault happens here: 
 
${LATEX} wrap_chkconfig.ltx 2>/dev/null | grep '^\+' 
 
I didn't find this in ghostscript-cjk-config 
 
Comment 22 Andreas Klein 2005-09-01 18:36:51 UTC
(In reply to comment #21)
> Were you able to extract the latex output? 

I will try it.

> BTW you mentioned that the segfault happens here: 
>  
> ${LATEX} wrap_chkconfig.ltx 2>/dev/null | grep '^\+' 
>  
> I didn't find this in ghostscript-cjk-config 

This was from:
/usr/share/lyx-cjk/configure which is called by one of the SuSEconfig scripts.
Comment 23 Andreas Klein 2005-09-01 19:59:30 UTC
I just tried beta4. The segfault still exists. It does not happen on every run,
and it occurs on a different line each time. But it only happens with the smp kernel.
The segfault always happens at address 0 and with the same rip. The rsp changes.

grep[11162]: segfault at 0000000000000000 rip 00002aaaaaaae05c rsp
00007fffff90db10 error 4

wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
/sbin/conf.d/SuSEconfig.ghostscript-cjk: line 26: 20297 Segmentation fault     
/usr/sbin/ghostscript-cjk-config
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
/sbin/conf.d/SuSEconfig.ghostscript-cjk: line 22: 22124 Segmentation fault     
/usr/sbin/acroread-cidfont-config
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
wpyc022:~ # /sbin/conf.d/SuSEconfig.ghostscript-cjk
Comment 24 Olaf Kirch 2005-09-02 11:29:09 UTC
I don't have a beta2 anymore, so I cannot validate exactly. But the RIP  
is definitely in a shared library. You can verify if you start "grep X"  
in one window, and do a "cat /proc/$PID/maps" in another.  
  
I am quite sure that this is a problem with multibyte support in the  
locale code. The reason no-one is able to reproduce it here may just  
depend on the set of packages you have installed. Lots of CJK packages 
maybe? 
  
Please, it's important that you provide us with the input that makes  
grep segfault.  Just insert a "tee /tmp/somefile" into the latex|grep 
pipe. 
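
Inserting the tee as suggested could look like the sketch below. The helper name grep_with_capture is made up for illustration, and the capture path is whatever file is convenient. Note that an extra process in the pipe changes the timing, so a timing-dependent crash may not reproduce with it in place.

```shell
# grep_with_capture CAPTURE PATTERN
# Copies everything arriving on stdin to the file CAPTURE, then passes
# the same data on to grep PATTERN. The exit status is grep's.
grep_with_capture() {
    capture=$1
    pattern=$2
    tee "$capture" | grep "$pattern"
}
```

In the configure script this would replace the plain grep, for example `${LATEX} wrap_chkconfig.ltx 2>/dev/null | grep_with_capture /tmp/somefile '^\+'`, leaving the raw latex output in /tmp/somefile.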
Comment 25 Mike Fabian 2005-09-02 11:37:34 UTC
Olaf> I am quite sure that this is a problem with multibyte support in the  
Olaf> locale code. The reason no-one is able to reproduce it here may just  
Olaf> depend on the set of packages you have installed.

I don't think so because I cannot reproduce it either and I have *all*
CJK packages installed which are available on SuSE Linux and some more.

And apparently Andreas cannot give us the exact input that makes grep
segfault. I also asked for that input; without it I cannot even start
looking for a bug in grep.


Comment 26 Olaf Kirch 2005-09-02 11:47:30 UTC
What's the problem with providing the input? It's quite possible that
if you just extract the input and feed that file to grep, it will not
crash. Most likely the problem also depends on the exact sequence of
read()s that we're doing. My guess is still that LC_COLLATE or something
like this is being fed an incomplete collation sequence and walks off
into the woods.
 
If you like, you can also try prefixing that grep invocation by 
"strace -e read -o /tmp/grep.trace". Again, this may or may not yield 
useful information, as strace changes the application's timing considerably. 
Comment 27 Andreas Klein 2005-09-03 12:58:23 UTC
(In reply to comment #24)
> I don't have a beta2 anymore, so I cannot validate exactly. But the RIP  
> is definitely in a shared library. You can verify if you start "grep X"  
> in one window, and do a "cat /proc/$PID/maps" in another.  

The RIP is still the same in beta 4: 00002aaaaaaae05c
"cat /proc/$PID/maps" shows ld.so for this:
2aaaaaaab000-2aaaaaac0000 r-xp 00000000 03:06 34914 /lib64/ld-2.3.5.so

> I am quite sure that this is a problem with multibyte support in the  
> locale code. The reason no-one is able to reproduce it here may just  
> depend on the set of packages you have installed. Lots of CJK packages 
> maybe? 


Go to zzz_all, select all packages in this list, and install. That's how all
machines are installed here. Remove *debuginfo, beagle, and kat, and choose only
one kernel. I can give you the software selection file, or create an account
for you on that machine.

> Please, it's important that you provide us with the input that makes  
> grep segfault.  Just insert a "tee /tmp/somefile" into the latex|grep 
> pipe. 

Ok, I will attach it. But if the tee is present, it does not segfault! Even
adding some prints before and after that is enough to prevent the segfault.
I enabled corefile writing and will attach the corefile.
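
Looking up which mapping contains a given RIP can be done mechanically instead of by eye. A minimal sketch, assuming input in the /proc/PID/maps format with one mapping per line; find_mapping is a hypothetical helper name, and addresses above 0x7fffffffffffffff are not handled because shell arithmetic is signed:

```shell
# find_mapping ADDR
# Reads /proc/PID/maps-style lines on stdin and prints the path (or the
# address range, for anonymous mappings) of the mapping that contains ADDR.
# ADDR is hexadecimal without a 0x prefix. Returns 1 if nothing matches.
find_mapping() {
    addr=$((0x$1))
    while read -r range perms offset dev inode path; do
        start=$((0x${range%-*}))
        end=$((0x${range#*-}))
        # The end address of a mapping is exclusive.
        if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
            echo "${path:-$range}"
            return 0
        fi
    done
    return 1
}
```

Typical use is `find_mapping 00002aaaaaaae05c < /proc/$PID/maps` against a running grep. With only the single mapping line quoted above as input, it prints /lib64/ld-2.3.5.so; a process from a different run may have a different layout.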
Comment 28 Andreas Klein 2005-09-03 13:03:10 UTC
Created attachment 48694
input for grep

This is the input that is fed into grep where the segfaults sometimes occur.
If the tee is present, grep does not segfault.
Comment 29 Andreas Klein 2005-09-03 13:09:31 UTC
Created attachment 48695
core
Comment 30 Olaf Kirch 2005-09-05 09:43:36 UTC
Thanks for the core dump.      
     
The faulting address (00002aaaaaaae05c) is actually in this segment:     
     
2aaaaaac8000-2aaaaaafb000 r--p 00000000 /usr/lib/locale/en_US.utf8/LC_CTYPE    
(Note that the map you gave in comment #27 doesn't contain the faulting RIP)    
    
IOW it jumps right into the CTYPE tables. Reproducibly. I find it hard
to believe that this should be a kernel bug.
  
I think someone from the packagers team needs to look at the core dump. 
Comment 31 Andreas Kleen 2005-09-05 10:09:12 UTC
It could be triggered somehow by the randomized va mappings.
It seems to cause all kinds of user space problems.  See the long story
in http://bugzilla.kernel.org/show_bug.cgi?id=4851

Does it go away with
echo 0 > /proc/sys/kernel/randomize_va_space
?

If yes, we should probably consider disabling it for the release.
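
To make sure a test run really happened with randomization off, the current setting can be read back before and after the echo. A small sketch; aslr_state is a made-up name, and the optional path argument exists only so the function can be tried on an ordinary file:

```shell
# aslr_state [FILE]
# Prints "off" if the setting is 0, otherwise "on (VALUE)".
aslr_state() {
    file=${1:-/proc/sys/kernel/randomize_va_space}
    value=$(cat "$file")
    if [ "$value" = "0" ]; then
        echo "off"
    else
        echo "on ($value)"
    fi
}
```

Writing the setting requires root; reading it with `aslr_state` does not.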
Comment 32 Andreas Klein 2005-09-05 10:22:26 UTC
This makes it better.
After 
echo 0 > /proc/sys/kernel/randomize_va_space
none of the segfaults in the cjk scripts occur anymore. So this prevents most of
the segfaults.
Only one segfault remains:
updmap: initial config file is `/etc/texmf/web2c/updmap.cfg'
/usr/bin/updmap: line 298: 30445 Segmentation fault      grep "$pat" "$file"
>/dev/null
Sep  5 12:19:14 wpyc022 kernel: grep[30445]: segfault at 0000000000000000 rip
00002aaaaaaae05c rsp 00007fffffffdb70 error 4
Comment 33 Olaf Kirch 2005-09-05 13:18:13 UTC
This is still the same RIP as all the other reports. 
 
Also, we're not doing much in terms of randomization anyway. Mostly the 
stack. 
Comment 34 Marcus Meissner 2005-09-05 13:33:13 UTC
it would really help to capture such a crash in gdb if possible. 
Comment 35 Olaf Kirch 2005-09-05 13:39:07 UTC
There is a core file attached above, but I cannot find a grep with debuginfo  
symbols that would show a sane backtrace.  
Comment 36 Marcus Meissner 2005-09-05 14:00:07 UTC
andreas, can you perhaps attach the grep binary? 
 
or try with the beta4 grep binary? 
Comment 37 Andreas Klein 2005-09-05 14:17:07 UTC
The core was from a beta4 grep.
Comment 38 Andreas Klein 2005-09-05 14:19:07 UTC
taskset -p 0x01 $$ also seems to prevent the segfault.
Another way is to produce load with cpuburn.
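
The 0x01 above is a bitmask in which bit N selects CPU N, so pinning to another CPU only needs a different mask. A tiny sketch for building one; cpu_mask is a hypothetical helper name:

```shell
# cpu_mask N
# Prints the taskset bitmask that selects only CPU number N (counted from 0).
cpu_mask() {
    printf '0x%x\n' $((1 << $1))
}
```

Instead of pinning the whole shell with -p, a single command can be started pinned, for example `taskset "$(cpu_mask 1)" /usr/sbin/ghostscript-cjk`, which makes it easy to check whether the result depends on which CPU is used.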
Comment 39 Marcus Meissner 2005-09-05 14:58:32 UTC
i tried with beta3 on reger.suse.de (4 cpu machine)... but no luck so far. 
Comment 40 Andreas Klein 2005-09-05 15:49:40 UTC
Our machine is a Tyan S2885 with 2x Opteron 242, 2GB RAM.
Comment 41 Mads Martin Joergensen 2005-09-05 15:52:30 UTC
Andreas, how new is your BIOS? If it is not new, then please try to reproduce with
the latest BIOS. With dual Opterons and a Tyan board we've seen weird stuff before
on Rudi's machine.
Comment 42 Andreas Klein 2005-09-06 07:39:07 UTC
No, it is not the newest BIOS. My problem was that the newest did not work
reliably on machines with 6GB RAM.
I use 2.02w. With that I was able to find settings for MTRR and the adjust
memory switch, so that SLES8, SLES9 and SUSE 9.3 ran without problems.
Which BIOS do you recommend/use on your Tyan S2885 board?
Comment 43 Ruediger Oertel 2005-09-06 16:23:08 UTC
2.02w is one of the old beta series. 
my machine (also K8W aka 2885) is running 2.05 nicely with 
4G of memory. I also had tons of segfaults with bioses prior 
to 2.05 when running recent kernels. 
Comment 44 Andreas Klein 2005-09-07 08:11:53 UTC
(In reply to comment #43)
> 2.02w is one of the old beta series. 
> my machine (also K8W aka 2885) is running 2.05 nicely with 
> 4G of memory.

I just tested 2.05 on my workstation and it works even with 6GB. Only 2.04 had
the 6GB problem.

> I also had tons of segfaults with bioses prior 
> to 2.05 when running recent kernels.

I will install 2.05 this afternoon on the beta-machine to see, if my segfaults
are gone, too.
Comment 45 Andreas Klein 2005-09-08 14:28:29 UTC
After the BIOS update the segfaults are gone!
The new BIOS seems to fix all of the segfaults.
Comment 46 Mads Martin Joergensen 2005-09-08 14:38:30 UTC
So since it was a local hw problem, this was invalid.