Bugzilla – Bug 1187864
Semantic interposition inhibits optimization of shared libraries
Last modified: 2022-04-07 11:18:30 UTC
By default, any visible symbol in a shared library (PIC) can be overridden at runtime by the executable or by mechanisms such as LD_PRELOAD; this is called semantic interposition. It hurts optimization of shared libraries quite badly, because any call or reference to an exported function (or symbol in general) in the same library might not actually resolve to the definition in that library anymore. All such references have to be resolved at runtime (GOT+PLT relocations), and interprocedural optimization is no longer possible.

For executables (including PIE) this is not the case: they can use direct calls and references to global data. Through the magic of copy relocations, an executable can even reference global data from shared objects directly; libraries then use the executable's copy of the global data instead of their own.

As libraries make up most of the code on the system, optimizing them is worthwhile. With the "-fno-semantic-interposition" compiler option, interprocedural optimization can be enabled for exported symbols, and the linker option "-Bsymbolic(-functions)" allows symbol references within the same object to be resolved directly instead of going through runtime relocation. Some libraries, like openssl, enable those options themselves already. Qt also used "-Bsymbolic" for a long time, but it conflicted with the "gcc-PIE" package, which built incompatible executables, so it got disabled again (boo#1175278).

However, executables have to be built with that in mind (by compiling them like PIC) to avoid features which rely on interposition, like copy relocations. The "-fno-direct-access-external-data" option achieves that FWICT.

It would be great if we could build (most parts of) the distro such that libraries can be optimized more.
Thread on the GCC ML about -fno-semantic-interposition:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572018.html

GCC PR report about -fno-direct-access-external-data:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

Extensive discussion about Qt's reduce-relocations option:
https://bugreports.qt.io/browse/QTBUG-86173

Blog posts about this topic:
https://maskray.me/blog/2021-05-09-fno-semantic-interposition
https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic
(In reply to Fabian Vogt from comment #0)
> Using the "-fno-semantic-interposition" compiler option, interprocedural
> optimization can be enabled for exported symbols [...]

The way I see it, -fno-semantic-interposition is justified by the "one definition rule" in both C and C++. Having different functions with the same (qualified) name (and signature for C++) is not allowed in a program. There is still a non-standard "back door": __attribute__((weak)). So if necessary it's still possible to define overridable functions, but other libraries can't just override anything they want. It's also kind of on-by-default in Clang (not literally, but Clang does IPO on default visibility functions) and on other platforms, so there is hopefully not a lot of software out there that relies on semantic interposition.

> [...] and the linker option "-Bsymbolic(-functions)" allows symbol
> references inside the same object to be resolved directly instead of going
> through runtime relocation. [...]
>
> However, executables have to be built with that in mind (by compiling them
> like PIC) to avoid features which rely on interposition, like copy
> relocations. The "-fno-direct-access-external-data" option achieves that
> FWICT.

In the thread where Fangrui Song aka maskray proposed to enable these flags for building Clang [1], the time difference between -Bsymbolic and -Bsymbolic-functions was minor. For Qt it might very well look different though.

Address uniqueness is probably not a big issue, but it certainly can be an issue, as the standards guarantee it. So I think we could enable -Bsymbolic-functions on a per-package basis, but probably not globally. Avoiding copy relocations for -Bsymbolic might be too high a price to pay, at least in general. I don't know how common default visibility variables are, and whether they're typically used more often by the library or the program.
With modern libraries I'd expect there not to be a lot of exported variables, so maybe it just doesn't matter that much.

So I'd be in favor of enabling -fno-semantic-interposition by default, but not -Bsymbolic[-functions]. We can enable the latter where it matters and where we think it's safe. I played with -Bsymbolic in the LLVM build once, but I'm mostly compiling C++, where the compile jobs take so long that the initial relocation processing doesn't change much. (Since we're building LLVM with Clang, we already have IPO for default visibility functions.)

[1] https://lore.kernel.org/lkml/20210501235549.vugtjeb7dmd5xell@google.com/
(In reply to Aaron Puchert from comment #1)
> The way I see it, -fno-semantic-interposition is justified by the "one
> definition rule" in both C and C++. [...] It's also kind of on-by-default in
> Clang (not literally, but Clang does IPO on default visibility functions)
> and on other platforms, so there is hopefully not a lot of software out
> there that relies on semantic interposition.

I think for GCC the ODR trumps -fsemantic-interposition, though GCC applies the ODR only to C++. Honza, please correct me if I'm wrong here. So I'm not sure whether -fno-semantic-interposition on its own has any measurable effect on a C++ code base. For C there's no ODR, so things are more complicated.
As I wrote to maskray on the generic-abi list (https://groups.google.com/g/generic-abi/c/LgSC6te51uM/m/WGzfjtFKAgAJ), I'm quite heavily opposed to disabling interposition wholesale. I would perhaps support it when restricted to C++ function symbols.

Also note that protected visibility is going to be fixed on x86-64 (to not be worse than default visibility and break with copy relocations, i.e. it will work again like before 2014/15) over the next months. At that point packages choosing so can simply use that.

-Bsymbolic should be regarded as a hack in the ELF world, because ELF has symbol visibility, which is (or was, and will be again) designed to do exactly that. And of course the compiler doesn't know anything about it, so it can't base its inlining decisions on it. (Which is the whole reason why people had to invent yet another flag for the compiler; well, that and because protected visibility wasn't working as designed/desired.)

So my course of action would be: wait for protected symbol visibility to be fixed again, make it so that packages can select that visibility as default for C++ functions (select it in a way the compiler knows about), and let packages decide. Possibly make that the default in the toolchain after quite some time.
Note that for instance in the above generic-abi thread maskray says that most of the speedup for clang itself came from avoiding the symbolic lookups, i.e. what protected visibility or variants of -Bsymbolic* give you, _not_ what -fno-semantic-interposition gives you. There are unclear claims about fantastic speedups with cpython, which look a bit doubtful or were measured on unrealistic microbenchmarks, and which may or may not come from the disabled interposition or from the reduced relocations.
(In reply to Richard Biener from comment #2)
> For C there's no ODR so things are more complicated.

What about C11 6.9.5?

    An external definition is an external declaration that is also a
    definition of a function (other than an inline definition) or an object.
    If an identifier declared with external linkage is used in an expression
    (other than as part of the operand of a sizeof or _Alignof operator whose
    result is an integer constant), somewhere in the entire program there
    shall be exactly one external definition for the identifier; otherwise,
    there shall be no more than one.

(In reply to Michael Matz from comment #3)
> I'm quite heavily opposed to disabling interposition wholesale.

In my view this is more about explicitly annotating interposable functions as weak instead of allowing interposition for any function. ELF default visibility has always implied interposability, but I think that's mostly seen as a historical artifact by now. Visibility and interposability are pretty much orthogonal. Just because a library is "exporting" functions doesn't mean it's inviting others to replace them. With a few exceptions that makes absolutely no sense, and it wouldn't hurt to mark these exceptions as "weak".

> Also note that protected visibility is going to be fixed on x86-64 (to not
> be worse than default visibility and break with copy relocations, i.e. it
> will work again like before 2014/15) over the next months.

What are the ideas for that? Not doing copy relocations at all, or giving protected visibility a different meaning for variables?

> At that point packages choosing so can simply use that.

It's going to take quite a bit of time for packages to migrate to this. I get your point: default visibility has never been a good default, and it's probably hopeless to save it. Protected visibility is more or less defined as a non-interposable export, so that and hidden visibility should satisfy nearly all uses.
Default visibility could still be used for the rare case where a function should actually be interposable. But I think protected visibility is what library writers have (in almost all cases) always wanted; they just didn't use it because it didn't work and was recommended against. (GCC's man page says "protected and internal are pretty useless in real-world usage so the only other commonly used option is hidden." Drepper writes: "Only the first two [default and hidden] should ever be used.")

Still, protected visibility just doesn't interpose; it doesn't error out if there is a function with the same name. So while library writers can use it to opt out of interposition (which they probably didn't want in the first place), those who'd want to interpose get a silent failure, just as if we were making -fno-semantic-interposition the default. If library writers knew their users and whether someone might interpose their functions, that could be helpful, but I don't know how often that's the case. In any event, if library writers want to actively allow interposition, an explicit "weak" attribute would communicate that intent clearly.
(In reply to Aaron Puchert from comment #5)
> (In reply to Richard Biener from comment #2)
> > For C there's no ODR so things are more complicated.
> What about C11 6.9.5?
>
>     An external definition is an external declaration that is also a
>     definition of a function (other than an inline definition) or an
>     object. If an identifier declared with external linkage is used in an
>     expression (other than as part of the operand of a sizeof or _Alignof
>     operator whose result is an integer constant), somewhere in the entire
>     program there shall be exactly one external definition for the
>     identifier; otherwise, there shall be no more than one.

That doesn't constrain the semantics; it merely says that multiple definitions will be diagnosed by the linker, and no definition as well. I think there's even a stronger guarantee somewhere that mismatched declarations (as in mismatched types) invoke undefined behavior.

Compare that to the C++ ODR, which says (C++14, 3.2(6)): "There can be more than one definition [...] - each definition of D shall consist of the same sequence of tokens; and [...]"

You possibly can read the C variant as being even stronger, but then even for C there are "multiple definitions", namely for C99 inline functions, and I don't see (OK, did not exhaustively search the standard) that the standard constrains the inline function definitions.
Normally I do like language lawyering very much, but I think in this case it's not appropriate. Symbol interposition is an ELF feature, and hence, for us, a system feature; a fairly powerful one with some disadvantages. We can't just disable that feature after 30 years. At least not if there are ways we can mitigate the disadvantages without throwing out the baby completely.

Aaron: the power of interposition doesn't lie in being able to override known symbols, but rather in being able to override _all_ (exported) symbols. The point being that the software authors don't have to explicitly mark symbols as overridable; if they were required to do that, it wouldn't be different from them explicitly designing hooks into their interfaces.

There is for instance software that hooks many libc routines (and a changeable set of other routines) in order to, well, hook them (e.g. for tracing). Symbol interposition makes this trivial. Without it you need to resort to the contortions that equivalent Windows software needs to go through (basically patching either the import table, if it exists, or even the function code itself).

Basically: if library authors were required to make their exports hookable, we would end up with nothing being hookable. That is because the need for hooking can't be predicted; if someone needs it, it's usually not the library author but someone outside who doesn't necessarily (want to) have the means to change the library.

As I've written elsewhere (probably in the above generic-abi thread): I'll concede that such hooking usually takes place for C symbols, not for other language symbols. (Though of course there's nothing that would currently prevent that, and e.g. valgrind does hook the global C++ allocation routines.) So I'd be willing to try disabling interposition for C++ symbols (with exceptions). But before even that, we need to be clear what exactly we want to change and what the improvements of that change will be.
Perhaps based on a bit more than just clang (speedup due to fewer symbol lookups) and some anecdote from the web about cpython.

(FWIW: I do know that libqt also goes to great lengths to reduce symbolic lookups, and libreoffice (at the time still openoffice) had the same problem, which is why we now have .gnu.hash, and KDE had the problem, which is why we have (had) kdeinit. I.e. I know that symbolic lookups are a problem for some software (all C++!), but I also know that it's absolutely no problem at all for other software, which is why I'm hammering so much on protected visibility, it being _exactly_ the right tool for avoiding symbolic lookups from within shared libs.)

And yes, Aaron: the idea for "fixing" protected visibility is to not generate copy relocs for variables (i.e. cross-module accesses will be indirect, like now but in the other direction). For function addresses something similar can be done; it must be the one from the defining module, not from the PLT slot in the exe. H.J. is working on some patches that try to phase that into the world step by step so as not to break existing binaries.
(In reply to Michael Matz from comment #7)
> Normally I do like language lawyering very much, but I think in this case
> it's not appropriate. Symbol interposition is an ELF feature, and hence,
> for us, a system feature; a fairly powerful one with some disadvantages.

Sure, this isn't only about what's legal and what isn't. The one definition rule doesn't just constrain me as a programmer, it also allows me to better reason about what's happening. Because if I see a function call, and a definition with a matching signature, I know that's the function being called. That's arguably a cornerstone of (interprocedurally) understanding code. With interposition I might have to ask myself as library author whether my library will still behave well if random functions have their functionality replaced, and I'm not sure how one could come to such a conclusion, especially if there are also changes within the library.

> if they were to be required to [explicitly mark symbols as overridable],
> that wouldn't then make it different from them explicitly designing hooks
> into their interfaces.

Fully agreed, though just overriding a function is obviously more comfortable.

> There is for instance software that hooks many libc routines (and a
> changeable set of other routines) in order to, well, hook them (e.g. for
> tracing).

Certainly many parts of libc are well-suited for this, especially all that invoke system calls. (It's a lot like mocking in that regard.) When it comes to tracing: eBPF allows out-of-process tracing of any function (not just those with default visibility), so I think we have a good replacement there. What eBPF cannot do is valgrind-like instrumentation, but there is a quite limited set of functions that are interesting for that. It would be strange if someone wanted to instrument functions like sin, cos or strlen.

> Basically: if library authors would be required to make their exports
> hookable, we would end up with nothing being hookable. That is because the
> need for hooking can't be predicted; if someone needs it, it's usually not
> the library author but someone outside who doesn't necessarily (want to)
> have means to change the library.

That's where I disagree. You're absolutely right that hooking would be done by users of the library, but just like a library author can decide which functions they expose to the outside world for calling, they can surely also decide what to expose for hooking. We're relying on that anyway: a hidden visibility function cannot be interposed, let alone static (internal linkage) functions.

Now I think it would go off the rails if I went into how to decide that, but basically good candidates for hooking are things that invoke system calls or have side effects, like IO of any kind. Bad candidates are functions that are "CPU only", like an FFT or a prime factorization, or decoding a media stream. Alternatively one could see them as pure functions, or functions operating on "value types" as opposed to types like file handles that have an identity. Hooking pure functions is not so interesting because they're memoizable, so when and how often they're called is to some extent meaningless.

> But before even that, we need to be clear what exactly we want to change
> and what the improvements of that change will be. Perhaps on a bit more
> than just clang (speedup due to fewer symbol lookups) and some anecdote
> from the web about cpython.

Right, libLLVM and libclang are probably a special case with their enormous set of default visibility symbols. And generally C++ seems to be a bit of an issue, as you noted, though I can't quite put my finger on why that is. Perhaps the ability to mark entire classes as default visibility (which also includes private and protected methods), instead of individual functions, is a contributing factor.

That being said, I think fixing protected visibility would solve the problem as well, at least in the long term.
Surely it will take projects time to switch when only the newest gcc/binutils properly support that.

While we're at it, couldn't we make protected visibility even stricter and let the dynamic linker error out if there are two symbols of the same name? Just like the static linker enforces that every symbol has one definition. As I wrote earlier, this one-to-one correspondence is nice when trying to reason about code.