Bug 1187864 - Semantic interposition inhibits optimization of shared libraries
Status: NEW
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Version: Current
Hardware: Other Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Michael Matz
QA Contact: E-mail List
Reported: 2021-06-30 10:00 UTC by Fabian Vogt
Modified: 2022-04-07 11:18 UTC
Description Fabian Vogt 2021-06-30 10:00:26 UTC
By default, any visible symbol in shared libraries (PIC) can be overridden at runtime by the executable or by mechanisms such as LD_PRELOAD. This is called semantic interposition.

This however affects optimization of shared libraries quite badly, because any call or reference to exported functions (or symbols in general) in the same library might not actually resolve to the symbol in that library anymore. Any such references have to be resolved during runtime (GOT+PLT relocations) and interprocedural optimization is not possible anymore.
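A minimal sketch of the problem, using hypothetical names (scale, scale_sum) that are not taken from any real library. Both functions are exported, so under default ELF rules the call from scale_sum() to scale() may be interposed, and GCC will typically not inline it when compiling with -fPIC:

```c
/* libscale.c: hypothetical two-function library, for illustration only.
 *
 * Built as a shared object (for example: gcc -O2 -fPIC -shared libscale.c),
 * both symbols get default visibility, so a definition of scale() that
 * comes earlier in the lookup order would win at run time.
 */
#include <stddef.h>

/* Exported and trivially inlinable. */
int scale(int x)
{
    return 3 * x;
}

/* Exported caller. Because scale() might be replaced at run time, the
 * compiler has to keep this as a real call (usually through the PLT)
 * rather than folding 3 * x into the loop. */
int scale_sum(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += scale(values[i]);
    return total;
}
```

Comparing the generated assembly with and without -fno-semantic-interposition should show whether the call is inlined on a given compiler version.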

For executables (also PIE) this is not the case: they can use direct calls and direct references to global data. Through the magic of copy relocations, they can even reference global data from shared objects directly. Libraries then use the copy of the global data from the executable instead of their own.
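Copy relocations only come into play when a library exports data objects. One common way to sidestep them, shown here as a sketch with hypothetical names and not as something this report proposes, is to keep the variable internal and export accessor functions instead:

```c
/* counter.c: hypothetical library module, for illustration only. */

/* Internal linkage: the variable is never exported, so an executable
 * cannot reference it directly and no copy relocation is needed. */
static int event_count;

/* Exported function that modifies the internal state. */
void counter_record_event(void)
{
    event_count++;
}

/* Exported accessor; callers outside the library go through this. */
int counter_events(void)
{
    return event_count;
}
```

The cost is a function call per access from outside the library, which is usually acceptable for state that is not read in hot loops.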

As libraries make up most of the code on the system, optimizing them is worthwhile. Using the "-fno-semantic-interposition" compiler option, interprocedural optimization can be enabled for exported symbols, and the linker option "-Bsymbolic(-functions)" allows symbol references inside the same object to be resolved directly instead of going through runtime relocation.
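Independently of those flags, a library can already get direct, inlinable internal calls at the source level by routing them through a function with internal linkage and exporting only a thin wrapper. This is a different technique from the options discussed here, and the names below (clamp, clamp_impl, clamp_all) are hypothetical:

```c
/* clamp.c: hypothetical library module, for illustration only. */
#include <stddef.h>

/* Internal linkage: cannot be interposed, so the compiler is free to
 * inline it into every caller in this file. */
static int clamp_impl(int value, int low, int high)
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}

/* Exported entry point: a thin wrapper around the internal function. */
int clamp(int value, int low, int high)
{
    return clamp_impl(value, low, high);
}

/* Exported function that calls the internal implementation directly.
 * An interposed clamp() would therefore not affect clamp_all(). */
void clamp_all(int *values, size_t count, int low, int high)
{
    for (size_t i = 0; i < count; i++)
        values[i] = clamp_impl(values[i], low, high);
}
```

The trade-off is the one under discussion in this report: the internal calls become fast, but they are no longer hookable from outside.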

Some libraries, such as openssl, already enable those options themselves. Qt also used "-Bsymbolic" for a long time, but it conflicted with the "gcc-PIE" package, which built incompatible executables, and so it got disabled again (boo#1175278).

However, executables have to be built with that in mind (by compiling them like PIC) to avoid features which rely on interposition, like copy relocations. The "-fno-direct-access-external-data" option achieves that FWICT.

It would be great if we could build (most parts of) the distro such that libraries can be optimized more.

Thread on the GCC ML about -fno-semantic-interposition: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572018.html
GCC PR about -fno-direct-access-external-data: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112
Extensive discussion about Qt's reduce-relocations option: https://bugreports.qt.io/browse/QTBUG-86173
Blog posts about this topic: https://maskray.me/blog/2021-05-09-fno-semantic-interposition https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic
Comment 1 Aaron Puchert 2021-07-05 22:41:42 UTC
(In reply to Fabian Vogt from comment #0)
> Using the "-fno-semantic-interposition" compiler option, interprocedural
> optimization can be enabled for exported symbols [...]
The way I see it, -fno-semantic-interposition is justified by the "one definition rule" in both C and C++. Having different functions with the same (qualified) name (and signature for C++) is not allowed in a program. There is still a non-standard "back door": __attribute__((weak)). So if necessary it's still possible to define overridable functions, but other libraries can't just override anything they want. It's also kind of on-by-default in Clang (not literally, but Clang does IPO on default visibility functions) and on other platforms, so there is hopefully not a lot of software out there that relies on semantic interposition.
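As a sketch of that "back door", with hypothetical names: a function marked weak is one the compiler has to treat as replaceable, so it is typically not inlined into its callers even when interprocedural optimization is otherwise allowed.

```c
/* logging.c: hypothetical library module, for illustration only. */

/* Weak default: a strong definition of log_level() elsewhere in the
 * program takes precedence when linking. Because of that, the compiler
 * is expected to keep calls to it as real calls. */
__attribute__((weak)) int log_level(void)
{
    return 1;
}

/* Exported caller that consults the overridable policy function. */
int should_log(int severity)
{
    return severity >= log_level();
}
```

This assumes GCC or Clang on an ELF target; __attribute__((weak)) is a compiler extension, not standard C.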

> [...] and the linker option "-Bsymbolic(-functions)" allows symbol
> references inside the same object to be resolved directly instead of going
> through runtime relocation. [...]
> 
> However, executables have to be built with that in mind (by compiling them
> like PIC) to avoid features which rely on interposition, like copy
> relocations. The "-fno-direct-access-external-data" option achieves that
> FWICT.
In the thread where Fangrui Song aka maskray proposed to enable these flags for building Clang [1], the time difference between -Bsymbolic and -Bsymbolic-functions was minor. For Qt it might very well look different though.

Address uniqueness is probably not a big issue, but it certainly can be an issue, as the standards guarantee it. So I think we could enable -Bsymbolic-functions on a per-package basis, but probably not globally. Avoiding copy relocations for -Bsymbolic might be too high a price to pay, at least in general. I don't know how common default visibility variables are, and whether they're typically used more often by the library or the program. With modern libraries I'd expect there not to be a lot of exported variables, so maybe it just doesn't matter that much.
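To make the address uniqueness concern concrete, here is a sketch with hypothetical names. The library compares a caller-supplied function pointer against the address of its own exported function. In a single binary this always matches; the worry is that, with symbolic binding in the library and an executable that takes the address through its own PLT entry, the two sides could end up with different addresses for the same function.

```c
/* events.c: hypothetical library module, for illustration only. */

typedef void (*callback_fn)(void);

/* Exported default callback. */
void on_event(void)
{
}

/* Returns the library's view of the callback's address. */
callback_fn lib_default_callback(void)
{
    return on_event;
}

/* Relies on function addresses being unique across the whole program:
 * the pointer handed in by the executable must compare equal to the
 * address the library itself sees. */
int lib_is_default(callback_fn cb)
{
    return cb == on_event;
}
```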

So I'd be in favor of enabling -fno-semantic-interposition by default, but not -Bsymbolic[-functions]. We can enable that where it matters and where we think it's safe.

I played with -Bsymbolic in the LLVM build once, but I'm mostly compiling C++ where the compile jobs take so long that the initial relocation processing doesn't change much. (Since we're building LLVM with Clang, we already have IPO for default visibility functions.)

[1] https://lore.kernel.org/lkml/20210501235549.vugtjeb7dmd5xell@google.com/
Comment 2 Richard Biener 2021-07-06 06:17:33 UTC
(In reply to Aaron Puchert from comment #1)
> (In reply to Fabian Vogt from comment #0)
> > Using the "-fno-semantic-interposition" compiler option, interprocedural
> > optimization can be enabled for exported symbols [...]
> The way I see it, -fno-semantic-interposition is justified by the "one
> definition rule" in both C and C++. Having different functions with the same
> (qualified) name (and signature for C++) is not allowed in a program. There
> is still a non-standard "back door": __attribute__((weak)). So if necessary
> it's still possible to define overridable functions, but other libraries
> can't just override anything they want. It's also kind of on-by-default in
> Clang (not literally, but Clang does IPO on default visibility functions)
> and on other platforms, so there is hopefully not a lot of software out
> there that relies on semantic interposition.

I think for GCC the ODR trumps -fsemantic-interposition though GCC applies ODR
only to C++.  Honza, please correct me if I'm wrong here.  So I'm not sure
whether -fno-semantic-interposition on its own has any measurable effect on a C++ code base.

For C there's no ODR so things are more complicated.
Comment 3 Michael Matz 2021-07-06 12:45:43 UTC
As I wrote to maskray on the generic-abi list ( https://groups.google.com/g/generic-abi/c/LgSC6te51uM/m/WGzfjtFKAgAJ ) I'm quite
heavily opposed to disabling interposition wholesale.  I would perhaps support
it when restricted to C++ symbols for functions.

Also note that protected visibility is going to be fixed on x86-64 (to not be
worse than default visibility and break with copy relocations, i.e. it will work again like before 2014/15) over the next months.  At that point packages
choosing so can simply use that.

-Bsymbolic should be regarded as a hack in the ELF world, because ELF has symbol
visibility which is (or was, and will be again) designed to be exactly that.  And of course the compiler doesn't know anything about it, so can't base its inlining
decisions on it.  (Which is the whole reason why people had to invent still another flag for the compiler; well, that and because protected visibility wasn't
working as designed/desired).

So, my course of action would be: wait for protected symbol visibility to be fixed
again, make it so that packages can select that visibility as default for C++
functions (select it in a way the compiler knows), let packages decide.  Possibly make that default in the toolchain after quite some time.
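A sketch of what selecting protected visibility looks like at the source level, assuming GCC or Clang on an ELF target; the function names are hypothetical. The symbol stays exported, but references from within the defining module bind to the module's own definition, and since the compiler sees the attribute it can take that into account:

```c
/* checksum.c: hypothetical library module, for illustration only. */
#include <stddef.h>

/* Exported, but not interposable from the point of view of this module. */
__attribute__((visibility("protected")))
int checksum_step(int acc, unsigned char byte)
{
    return (acc * 31 + byte) & 0xffff;
}

/* Exported with default visibility; its calls to checksum_step() always
 * reach the definition above. */
int checksum(const unsigned char *data, size_t len)
{
    int acc = 0;
    for (size_t i = 0; i < len; i++)
        acc = checksum_step(acc, data[i]);
    return acc;
}
```

A package-wide default can also be requested with the compiler's -fvisibility option instead of annotating each function.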
Comment 4 Michael Matz 2021-07-06 12:57:05 UTC
Note that, for instance, in the above generic-abi thread maskray says that most of the speedup
for clang itself came from avoiding the symbolic relocations, i.e. what protected
visibility or variants of -Bsymbolic* give you, _not_ what -fno-semantic-interposition gives you.  There are unclear claims about fantastic speedups with
cpython, which look a bit doubtful or are done on unrealistic microbenchmarks,
that may or may not come from the disabled interposition or from reduced relocations.
Comment 5 Aaron Puchert 2021-07-07 00:52:52 UTC
(In reply to Richard Biener from comment #2)
> For C there's no ODR so things are more complicated.
What about C11 6.9p5?

  An external definition is an external declaration that is also a definition of
  a function (other than an inline definition) or an object. If an identifier
  declared with external linkage is used in an expression (other than as part of
  the operand of a sizeof or _Alignof operator whose result is an integer
  constant), somewhere in the entire program there shall be exactly one external
  definition for the identifier; otherwise, there shall be no more than one.

(In reply to Michael Matz from comment #3)
> I'm quite heavily opposed to disabling interposition wholesale.
In my view this is more about explicitly annotating interposable functions as weak instead of allowing interposition for any function.

ELF default visibility has always implied interposability, but I think that's mostly seen as historical artifact by now. Visibility and interposability are pretty much orthogonal. Just because a library is "exporting" functions doesn't mean it's inviting others to replace them. With a few exceptions that makes absolutely no sense, and it wouldn't hurt to mark these exceptions as "weak".

> Also note that protected visibility is going to be fixed on x86-64 (to not be
> worse than default visibility and break with copy relocations, i.e. it will
> work again like before 2014/15) over the next months.
What are the ideas for that? Not doing copy relocations at all, or giving protected visibility a different meaning for variables?

> At that point packages choosing so can simply use that.
It's going to take quite a bit of time for packages to migrate to this.

I get your point, default visibility has never been a good default, and it's probably hopeless to save it. Protected visibility is more or less defined as non-interposable export, so that and hidden visibility should satisfy nearly all uses. Default visibility could still be used for the rare case where a function should actually be interposable.

But I think that's what library writers have (in almost all cases) always wanted; they just didn't use it because it didn't work and was recommended against. (GCC's man page says "protected and internal are pretty useless in real-world usage so the only other commonly used option is hidden." Drepper writes: "Only the first two [default and hidden] should ever be used.")

Still, protected visibility just doesn't interpose, it doesn't error out if there is a function with the same name. So while library writers can use it to opt out of interposition (which they probably didn't want in the first place), those who'd want to interpose just get a silent failure just as if we were making -fno-semantic-interposition the default. If library writers know about their users and whether someone might interpose their functions that could be helpful, but I don't know how often that's the case.

In any event, if library writers want to actively allow interposition, an explicit "weak" attribute would communicate that intent clearly.
Comment 6 Richard Biener 2021-07-07 06:15:20 UTC
(In reply to Aaron Puchert from comment #5)
> (In reply to Richard Biener from comment #2)
> > For C there's no ODR so things are more complicated.
> What about C11 6.9p5?
> 
>   An external definition is an external declaration that is also a
> definition of
>   a function (other than an inline definition) or an object. If an identifier
>   declared with external linkage is used in an expression (other than as
> part of
>   the operand of a sizeof or _Alignof operator whose result is an integer
>   constant), somewhere in the entire program there shall be exactly one
> external
>   definition for the identifier; otherwise, there shall be no more than one.

That doesn't constrain the semantics - it merely says that multiple
definitions will be diagnosed by the linker and no definition as well.
I think there's even a stronger guarantee somewhere that mismatched
declarations (as in mismatched types) invoke undefined behavior.

Compare that to the C++ ODR which says (C++14, 3.2(6)):

"There can be more than one definition [...]
 - each definition of D shall consist of the same sequence of tokens; and
[...]"

You possibly can read the C variant as being even stronger, but then even
for C there are "multiple definitions", namely for C99 inline functions,
and I don't see (OK, did not exhaustively search the standard) that the
standard constrains the inline function definitions.
Comment 7 Michael Matz 2021-07-07 13:35:02 UTC
Normally I do like language lawyering very much, but I think in this case it's
not appropriate.  Symbol interposition is an ELF feature, and hence, for us, a
system feature; a fairly powerful one with some disadvantages.  We can't just
disable that feature after 30 years.  At least not if there are ways where we
can mitigate the disadvantages without throwing out the baby completely.

Aaron: the power of interposition doesn't lie in being able to override known
symbols, but rather to override _all_ (exported) symbols.  The point being that
the software authors don't have to explicitly mark symbols as overridable; if
they were to be required to do that, that wouldn't then make it different from
them explicitly designing hooks into their interfaces.  There is for instance
software that hooks many libc routines (and a changeable set of other routines)
in order to, well, hook them (e.g. for tracing).  Symbol interposition makes
this trivial.  Without it you need to resort to the contortions that equivalent
Windows software needs to go through (basically patching either the import
table, if it exists, or even the function code itself).
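For readers who have not seen such a hook, here is a minimal sketch of an interposer for getpid(), assuming glibc on Linux. The counter, the helper traced_getpid_calls() and the file names in the comments are hypothetical; dlsym() with RTLD_NEXT is the usual way for the replacement to reach the definition it shadows.

```c
/* trace_getpid.c: hypothetical interposer, for illustration only.
 *
 * Possible build and use (file names are placeholders):
 *   gcc -O2 -fPIC -shared trace_getpid.c -o libtrace.so -ldl
 *   LD_PRELOAD=./libtrace.so ./some_program
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef RTLD_NEXT
/* Value glibc uses; only needed if the header did not expose it. */
#define RTLD_NEXT ((void *) -1l)
#endif

static unsigned long getpid_calls;

/* Same name and signature as the libc function, so it takes precedence
 * over libc's definition in the symbol lookup order. */
pid_t getpid(void)
{
    static pid_t (*real_getpid)(void);

    /* Look up the next definition after this one (normally libc's). */
    if (!real_getpid)
        real_getpid = (pid_t (*)(void))dlsym(RTLD_NEXT, "getpid");
    if (!real_getpid)
        return (pid_t)-1;   /* no underlying definition found */

    getpid_calls++;
    return real_getpid();
}

/* Hypothetical helper to read the trace counter. */
unsigned long traced_getpid_calls(void)
{
    return getpid_calls;
}
```

No cooperation from libc is needed for this to work, which is the property being defended here.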

Basically: if library authors would be required to make their exports hookable,
we would end up with nothing being hookable.  That is because the need for
hooking can't be predicted; if someone needs it, it's usually not the library
author but someone outside who doesn't necessarily (want to) have means to
change the library.

As I've written elsewhere (probably in the above generic-abi thread): I'll
concede that such hooking usually takes place for C symbols, not for other
language symbols.  (Though of course there's nothing that would currently
prevent that, and of course e.g. valgrind does hook
the global c++ allocation routines).  So, I'd be willing to try disabling
interposition for C++ symbols (with exceptions).

But before even that, we need to be clear what exactly we want to change
and what the improvements of that change will be.  Perhaps on a bit more than
just clang (speedup due to fewer symbol lookups) and some anecdote from the web
about cpython.

(FWIW: I do know that libqt also goes to lengths to reduce symbolic
lookups, and libreoffice (at the time still openoffice) had the same problem,
which is why we now have .gnu.hash, and KDE had the problem which is why we
have (had) kdeinit.  I.e. I know that symbolic lookups are a problem for
some software (all C++!) but I also know that it's absolutely no problem
at all for other software, which is why I'm hammering so much on the protected
visibility, being _exactly_ the right tool for avoiding symbolic lookups from
within shared libs)

And yes, Aaron: the idea for "fixing" protected vis is to not generate copy
relocs for variables (i.e. cross module accesses will be indirect, like now but
in the other direction).  For function addresses something similar can be
done; it must be the one from the defining module, not from the PLT slot
in the exe.  H.J. works on some patches that try to phase that into the world
step by step to not break existing binaries.
Comment 8 Aaron Puchert 2021-08-20 18:31:53 UTC
(In reply to Michael Matz from comment #7)
> Normally I do like language lawyering very much, but I think in this case
> it's not appropriate.  Symbol interposition is an ELF feature, and hence,
> for us, a system feature; a fairly powerful one with some disadvantages.
Sure, this isn't only about what's legal and what isn't. The one definition rule does not just constrain me as programmer, it also allows me to better reason about what's happening. Because if I see a function call, and a definition with a matching signature, I know that's the function being called. That's arguably a cornerstone of (interprocedurally) understanding code.

With interposition I might have to ask myself as library author if my library will still behave well if random functions have their functionality replaced, and I'm not sure how one could come to such a conclusion, especially if there are also changes within the library.

> if they were to be required to [explicitly mark symbols as overridable],
> that wouldn't then make it different from them explicitly designing hooks
> into their interfaces.
Fully agreed, though just overriding a function is obviously more comfortable.

> There is for instance software that hooks many libc routines (and a changeable
> set of other routines) in order to, well, hook them (e.g. for tracing).
Certainly many parts of libc are well-suited for this, especially all that invoke system calls. (It's a lot like mocking in that regard.)

When it comes to tracing: eBPF allows out-of-process tracing on any function (not just those with default visibility), so I think we have a good replacement there. What eBPF cannot do is valgrind-like instrumentation, but there is a quite limited set of functions that are interesting for that. It would be strange if someone wanted to instrument functions like sin, cos or strlen.

> Basically: if library authors would be required to make their exports
> hookable, we would end up with nothing being hookable.  That is because the
> need for hooking can't be predicted; if someone needs it, it's usually not
> the library author but someone outside who doesn't necessarily (want to)
> have means to change the library.
That's where I disagree. You're absolutely right that hooking would be done by users of the library, but just like a library author can decide which functions they expose to the outside world for calling, they can surely also decide what to expose for hooking. We're relying on that anyway, a hidden visibility function cannot be interposed, let alone static (internal linkage) functions.
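A sketch of what an explicitly designed hook could look like, as opposed to relying on interposition. All names are hypothetical, and lib_write() only pretends to write; the point is that the library author chooses where outside code may observe its behaviour.

```c
/* hooks.c: hypothetical library with an explicit hook, illustration only. */
#include <stddef.h>

typedef void (*write_hook_fn)(const void *buf, size_t len);

static write_hook_fn write_hook;   /* NULL means no hook installed */

/* Exported: lets a user install or remove (NULL) a hook. */
void lib_set_write_hook(write_hook_fn hook)
{
    write_hook = hook;
}

/* Exported: an operation with side effects, i.e. a good hook candidate.
 * The actual I/O is omitted; it just reports len bytes as written. */
size_t lib_write(const void *buf, size_t len)
{
    if (write_hook)
        write_hook(buf, len);
    return len;
}

/* User side: a tracing hook that counts the bytes passing through. */
static size_t traced_bytes;

void trace_write(const void *buf, size_t len)
{
    (void)buf;
    traced_bytes += len;
}

size_t traced_total(void)
{
    return traced_bytes;
}
```

Compared with interposition, this costs one pointer check per call and requires the author to have anticipated the hook, which is exactly the point of disagreement in this thread.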

Now I think it would go off the rails if I went into how to decide that, but basically good candidates for hooking are things that invoke system calls or have side effects, like IO of any kind. Bad candidates are functions that are "CPU only", like an FFT or a prime factorization, or decoding a media stream. Alternatively one could see them as pure functions, or functions operating on "value types" as opposed to types like file handles that have an identity. Hooking pure functions is not so interesting because they're memoizable, so when and how often they're called is to some extent meaningless.

> But before even that, we need to be clear what exactly we want to change
> and what the improvements of that change will be.  Perhaps on a bit more than
> just clang (speedup due to fewer symbol lookups) and some anecdote from the
> web about cpython.
Right, libLLVM and libclang are probably a special case with their enormous set of default visibility symbols. And generally C++ seems to be a bit of an issue, as you noted, though I can't quite put my finger on why that is. Perhaps the ability to mark entire classes as default visibility (which also includes private and protected methods) instead of individual functions is a contributing factor.

That being said, I think that fixing protected visibility seems like it would solve the problem as well, at least in the long term. Surely it will take projects time to switch when only the newest gcc/binutils properly support that.

While we're there, couldn't we make protected visibility even stricter and let the dynamic linker error out if there are two symbols of the same name? Like the static linker enforces every symbol having one definition. As I wrote earlier, this one-to-one correspondence is nice when trying to reason about code.