Bugzilla – Bug 1092877
vasprintf(&dest, "%s", str) doesn't always fill whole destination
Last modified: 2018-12-17 00:05:35 UTC
When running "zypper dup" the program core dumps due to a memory allocation failure. The version installation taken from /etc/os-release below: NAME="openSUSE Tumbleweed" # VERSION="20180320" ID=opensuse ID_LIKE="suse" VERSION_ID="20180320" PRETTY_NAME="openSUSE Tumbleweed" =====/===== The full BT below: Loading repository data... Reading installed packages... Computing distribution upgrade... malloc(): memory corruption Program received signal SIGABRT, Aborted. 0x00007ffff653d9eb in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff653d9eb in raise () from /lib64/libc.so.6 #1 0x00007ffff653eda1 in abort () from /lib64/libc.so.6 #2 0x00007ffff6580c87 in __libc_message () from /lib64/libc.so.6 #3 0x00007ffff658729a in malloc_printerr () from /lib64/libc.so.6 #4 0x00007ffff658a30c in _int_malloc () from /lib64/libc.so.6 #5 0x00007ffff658badc in malloc () from /lib64/libc.so.6 #6 0x00007ffff65864a7 in __GI__IO_str_overflow () from /lib64/libc.so.6 #7 0x00007ffff6585504 in __GI__IO_default_xsputn () from /lib64/libc.so.6 #8 0x00007ffff6557a65 in vfprintf () from /lib64/libc.so.6 #9 0x00007ffff6611a38 in __vasprintf_chk () from /lib64/libc.so.6 #10 0x00007ffff7a5165d in zypp::str::form[abi:cxx11](char const*, ...) () from /usr/lib64/libzypp.so.1702 #11 0x00007ffff7a61417 in zypp::base::LogControl::LineFormater::format(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, zypp::base::logger::LogLevel, char const*, char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libzypp.so.1702 #12 0x00007ffff7a68641 in ?? () from /usr/lib64/libzypp.so.1702 #13 0x00007ffff6bf257a in std::ostream::put(char) () from /usr/lib64/libstdc++.so.6 #14 0x00007ffff6bf275f in std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&) () from /usr/lib64/libstdc++.so.6 #15 0x00007ffff78b5d69 in zypp::solver::detail::SATResolver::solverInit(std::__cxx11::list<zypp::PoolItem, std::allocator<zypp::PoolItem> > const&) () from /usr/lib64/libzypp.so.1702 #16 0x00007ffff78b7f26 in zypp::solver::detail::SATResolver::resolvePool(std::unordered_set<zypp::Capability, std::hash<zypp::Capability>, std::equal_to<zypp::Capability>, std::allocator<zypp::Capability> > const&, std::unordered_set<zypp::Capability, std::hash<zypp::Capability>, std::equal_to<zypp::Capability>, std::allocator<zypp::Capability> > const&, std::__cxx11::list<zypp::PoolItem, std::allocator<zypp::PoolItem> > const&, std::set<zypp::Repository, std::less<zypp::Repository>, std::allocator<zypp::Repository> > const&) () from /usr/lib64/libzypp.so.1702 #17 0x0000555555670cf5 in solve_and_commit(Zypper&) () #18 0x00005555555ccfae in Zypper::doCommand() () #19 0x00005555555dedd1 in Zypper::safeDoCommand() () #20 0x000055555559e7a5 in Zypper::main(int, char**) () #21 0x000055555559dd89 in main ()
@Marco: We can't reproduce this. Could you please attach the core file?
(In reply to Michael Andres from comment #1) > @Marco: We can't reproduce this. Could you please attach the core file? Info provided as per IRC chat
Thanks Marco, for access to your orthos machine (I don't need it anymore). After a few hours of debugging: 1) By now, the issue can not be reproduced on any other machine. The code and the amount of data however is the same on any TW system perfoming a 'zypper dup'. I'd expect more report like this, if this is a common issue. 2) The zypp logger is using `vasprintf` (man ASPRINTF(3)) to format it's log limes. The crash happens when writing out a large log line (> 200000 chars). I was finally able to create a small test program not using libzypp at all (will be attached as bug.cc). It basically creates std::strings of a given size and compares them with the result of 'vasprintf( "%s" string )'. The expected result are equal strings. On your machine however the strings returned by vasprintf are truncated if the input string exceeds ~170000 chars. Causing various memory allocation issues and crashes. So this seems to be no libzypp issue, probably not even an issue in glibc/libstdc++ (as the tests behaved well on any TW,Leap,SLE system I checked.
Created attachment 770328 [details] Sample program; bug.cc Sample program: g++ -Wall -std=c++11 bug.cc -o bug Usage: bug [STRINGLENGTH [NUMBER_OF_CHECKS]] Default: bug 200000 10000 Prints a line whenver the size of the input string and the formated strings differ.
@Michael Matz: The `vasprintf` behavior on Marcos machine is really strange. May it be due to some obscure optimization which is not active on the other CPUs I tested? Are you interested in further investigations?
I'd like to investigate this a bit. On which machine can I see the testcase breaking?
Thanks for the initial analysis. I agree that this is a bug in vasprintf(), and indeed it only seems to happen on some hardware (or needs very specific environments to occur). I can reproduce the problem on rancid.arch.suse.de with this testcase: ------------ snip ------------------ #define _GNU_SOURCE #include <stdio.h> #include <stdarg.h> #include <stdlib.h> #include <string.h> #define N 200000 int main() { char *src, *dest; //for (int N = 0; N < 300000; N++) { src = malloc(N+1); memset (src, 'x', N); src[N] = 0; //dest = form ("%s", src); asprintf(&dest, "%s", src); if (strlen (dest) != strlen(src)) printf("%d: failed\n", N); /*else if (!(N & 63)) printf("%d: passed\n", N);*/ free (dest); free (src); //} return 0; } ---------------- snap ------------------ When the bug occurs (with 200000 chars on that machine always), then the destination string is only 102300 characters long. The _allocated_ length of the destination block is 200001, though, as one can see when tracing the thing in gdb. It's also the case that the destination buffer holds more 'x' characters after some intermediate zeros: (The last malloc done from within IO_str_overflow): Breakpoint 2, 0x00007ffff7a9da50 in malloc () from /lib64/libc.so.6 (gdb) p $rdi $19 = 204700 So it's now allocating a buffer of 204700 bytes (before that all mallocs ultimately were too small to contain the input string). Then breaking at the final realloc that supposedly sizes the returned buffer to the exact length necessary, in vasprintf: Breakpoint 3, 0x00007ffff7a9e1f0 in realloc () from /lib64/libc.so.6 (gdb) p $rdi $31 = 140737353601040 (gdb) p/x $rdi $32 = 0x7ffff7f7d010 (gdb) p $rsi $33 = 200001 So, %rdi is the returned string, it's allocated size will be 200001, now looking at content: (gdb) p ((char*)$rdi)[102300]@100 $44 = "x", '\000' <repeats 83 times>, 'x' <repeats 16 times> So, [102300] still has an 'x', then follows some zeros (ultimately making the string length less than the allocated size), then more 'x' (in fact until the end, though I haven't checked if there are more intermediate blocks of zeros). So something is going amiss when copying the string buffers I believe.
Which memcpy variant does this use?
On this Xeon PHI machine it's using the AVX512 variants, and that might indeed be a good hint: #0 0x00007ffff7b79bb0 in __memmove_avx512_no_vzeroupper () from /lib64/libc.so.6 #1 0x00007ffff7a984c6 in __GI__IO_str_overflow () from /lib64/libc.so.6 #2 0x00007ffff7a97504 in __GI__IO_default_xsputn () from /lib64/libc.so.6 #3 0x00007ffff7a69a65 in vfprintf () from /lib64/libc.so.6 #4 0x00007ffff7a91ac9 in vasprintf () from /lib64/libc.so.6 #5 0x00007ffff7a71104 in asprintf () from /lib64/libc.so.6 #6 0x00000000004006cb in main () at bug3.c:16
It's definitely the memcpy/memmove involved there. From some gdb session: Breakpoint 6, 0x00007ffff7b79b90 in __mempcpy_avx512_no_vzeroupper () from /lib64/libc.so.6 (gdb) p/x $rdi $91 = 0x7ffff7f95fad (gdb) p/x $rsi $92 = 0x7ffff7fc7fad (gdb) p $rdx $93 = 97699 (gdb) p/x $rsi-102301 $94 = 0x7ffff7faf010 (gdb) p/x $rdi-102301 $95 = 0x7ffff7f7d010 (gdb) p ((char*)$94)[0]@200000 $97 = 'x' <repeats 200000 times> (gdb) p ((char*)$95)[0]@200000 $98 = 'x' <repeats 102301 times>... So, this is the last block copying involved. Parts of the final destination string are already constructed (the first 102300 bytes). The full source string (at $rsi-102301) correctly consists of 200000 'x'. The destination string already has 102301 'x'. The copy destination ($rdi) points to dest+102301, i.e. exactly after all the 'x' that are already there. This copy operation hence should copy a further 97699 bytes from source+102301, which are all 'x'. So, finishing this copy operation, and looking at the destination string again: (gdb) finish Run till exit from #0 0x00007ffff7b79b90 in __mempcpy_avx512_no_vzeroupper () from /lib64/libc.so.6 0x00007ffff7a9753c in __GI__IO_default_xsputn () from /lib64/libc.so.6 (gdb) p ((char*)$95)[0]@200000 $99 = 'x' <repeats 102301 times>... (gdb) p ((char*)$95)[102301]@200000 $100 = '\000' <repeats 83 times>, 'x' <repeats 97744 times>... So, there are zeros where there should be 'x'. And what's more, after the 83 zeros (that shouldn't be there), there are 97744 'x' characters. Which is actually more than the amount that should have been copied (the above 97699 from the memcpy call). So, not only did it insert zeros, it also wrote more 'x'. The number of 'x' is too large by 45, plus the number of zeros (83), so that's overall 128 bytes too large, which is a nice round power of two, and the width of two avx512 registers. Probably, given the above numbers one can construct a memcpy testcase that breaks similarly. It probably would require using the same mis-alignment of the input pointers and same size (or misalignment of that size), to trigger the same code paths.
$ cat mempcpy.c #define _GNU_SOURCE 1 #include <string.h> #include <assert.h> #define N 97699 char a[N]; char b[N+128]; int main (void) { memset (a, 'x', N); char *c = mempcpy (b, a, N); assert (*c == 0); } $ gcc -g mempcpy.c -o mempcpy -fno-builtin-mempcpy $ ./mempcpy mempcpy: mempcpy.c:14: main: Assertion `*c == 0' failed.
Moving to security bug. *** This bug has been marked as a duplicate of bug 1094154 ***
This is an autogenerated message for OBS integration: This bug (1092877) was mentioned in https://build.opensuse.org/request/show/614277 15.0 / glibc
openSUSE-SU-2018:1633-1: An update that solves one vulnerability and has one errata is now available. Category: security (moderate) Bug References: 1092877,1094154 CVE References: CVE-2018-11237 Sources used: openSUSE Leap 15.0 (src): glibc-2.26-lp150.11.3.2, glibc-testsuite-src-2.26-lp150.11.3.2, glibc-utils-src-2.26-lp150.11.3.2
SUSE-SU-2018:1991-1: An update that solves three vulnerabilities and has two fixes is now available. Category: security (moderate) Bug References: 1082318,1092877,1094150,1094154,1094161 CVE References: CVE-2017-18269,CVE-2018-11236,CVE-2018-11237 Sources used: SUSE Linux Enterprise Module for Development Tools 15 (src): glibc-2.26-13.3.1, glibc-utils-src-2.26-13.3.1 SUSE Linux Enterprise Module for Basesystem 15 (src): glibc-2.26-13.3.1
openSUSE-SU-2018:2159-1: An update that solves three vulnerabilities and has two fixes is now available. Category: security (moderate) Bug References: 1082318,1092877,1094150,1094154,1094161 CVE References: CVE-2017-18269,CVE-2018-11236,CVE-2018-11237 Sources used: openSUSE Leap 15.0 (src): glibc-2.26-lp150.11.6.120, glibc-testsuite-src-2.26-lp150.11.6.120, glibc-utils-src-2.26-lp150.11.6.120