Discussion:
[Bug gold/23016] New: assert in output.h on mix of .eh_frame types for x86_64
thanm at google dot com
2018-03-29 13:47:30 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23016

Bug ID: 23016
Summary: assert in output.h on mix of .eh_frame types for
x86_64
Product: binutils
Version: unspecified
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: gold
Assignee: ccoutant at gmail dot com
Reporter: thanm at google dot com
CC: ian at airs dot com
Target Milestone: ---

Created attachment 10924
--> https://sourceware.org/bugzilla/attachment.cgi?id=10924&action=edit
tar file containing C sources, makefile

The code in the gold linker that handles relocatable links seems to be having
problems when there are a mix of input object files from GCC and Clang
(x86_64), in some cases triggering an assert.

Given two C source files (contents unimportant, but each should have at least
one function), bug can be triggered by running the following commands:

gcc -fPIC -g -O2 -c this.c
clang -fPIC -g -O2 -c other.c
gcc -fuse-ld=gold -fPIC -g -O2 -Wl,-r -nostdlib -no-pie this.o other.o -o
rel.o

resulting in this assert:

ld.gold: internal error in set_info_section, at
../../binutils/gold/output.h:3386

I've reproduced this on linux x86_64 with several versions of clang, gcc, and
ld.gold (including top of trunk for each). GCC 7 + clang 3.9 + gold 1.15 is one
starting point.

Stack trace for the assert looks like:

#0 gold::Output_section::set_info_section (this=0x<...> os=0x<...> at
../../binutils/gold/output.h:3382
#1 0x<...> in gold::Layout::layout_reloc<64, false> (this=0x<...>
object=0x<...> shdr=..., data_section=0x<...> rr=0x<...> at
../../binutils/gold/layout.cc:1354
#2 0x<...> in gold::Sized_relobj_file<64, false>::do_layout (this=0x<...>
symtab=0x<...> layout=0x<...> sd=0x<...> at ../../binutils/gold/object.cc:1856
#3 0x<...> in gold::Object::layout (this=0x<...> symtab=0x<...> layout=0x<...>
sd=0x<...> at ../../binutils/gold/object.h:651
#4 0x<...> in gold::Add_symbols::run (this=0x<...> at
../../binutils/gold/readsyms.cc:634
#5 0x<...> in gold::Workqueue::find_and_run_task (this=0x<...>
thread_number=0) at ../../binutils/gold/workqueue.cc:319
#6 0x<...> in gold::Workqueue::process (this=0x<...> thread_number=0) at
../../binutils/gold/workqueue.cc:495
#7 0x<...> in main (argc=6, argv=0x<...> at ../../binutils/gold/main.cc:252

The relocation section being laid out at the point of the assert is one of the
.rela.eh_frame sections.

When I look at the gcc-compiled objects feeding into the link, I see .eh_frame
sections that look like:

[ 7] .eh_frame
PROGBITS 0000000000000000 000088 000038 00 0 0 8
[0000000000000002]: ALLOC
[ 8] .rela.eh_frame
RELA 0000000000000000 0001e0 000018 18 9 7 8
[0000000000000040]: INFO LINK

The corresponding sections from LLVM-backend compiled objects look like

[16] .eh_frame
X86_64_UNWIND 0000000000000000 0001e8 000030 00 0 0 8
[0000000000000002]: ALLOC
[17] .rela.eh_frame
RELA 0000000000000000 000470 000018 18 20 16 8
[0000000000000000]:

Note the section types -- this seems to be the crux of the problem. LLVM (as of
2013 or so) sets the section type to X86_64_UNWIND, not PROGBITS. The X86_64
ps-ABI recommends using this section type, but not all producers do (gcc does
not).

Googling assert in question turns up a few other instances, including:


https://stackoverflow.com/questions/47797817/ld-gold-internal-error-in-set-info-section-at-output-h3209
https://sourceware.org/bugzilla/show_bug.cgi?id=15861

however it's not clear whether this is the same bug (some of the instances out
in the wild seem to involve the use of a linker script).

I spent a while in the debugger looking at what is happening. When gold
processes the two different .eh_frame sections, it effectively treats them as
distinct sections in the output file (since output section lookup is based on
name, type, and flags). In the case of the .rela.eh_frame sections, however, we
get only a single output section (since both input ".rela.eh_frame" sections
have the same name/type/flags). The assert is happening when
Layout::layout_reloc is called on the second relocation section. The code looks
like:

gold_assert((this->info_section_ == NULL
|| (this->info_section_ == os
&& this->info_uses_section_index_))
&& this->info_symndx_ == NULL
&& this->info_ == 0);

Here "info_section_" is already pointing to the first output section for
.eh_frame, however we're trying to set it to the second output section.

I have been experimenting with fixes. I note that for non-relocatable links,
gold is already effectively merging PROGBITS + X86_64_UNWIND .eh_frame sections
into a single merged PROGBITS section (see the "FIXME" comment in
Layout::make_eh_frame_section). With this in mind, a similar way to solve the
problem would be to do the same merging for relocatable links.

A patch to do this appears below; this fixes the assert.

A second possibility would be to change things so that the two separate
"flavors" of .eh_frame section are preserved in the output object for
relocatable links, meaning that there would have to be two separate relocation
sections as well. This would mean special handling of some sort when creating
output sections for .rela.eh_frame sections (e.g. not just calling
Layout::choose_output_section).

I am attaching a reproducer; unpack the tar file and run "make" (may need to
edit the makefile so that correct gcc and clang paths are picked up). Ex:

$ make THISCC=gcc-7 OTHERCC=clang-3.9 all
gcc-7 -fPIC -g -O2 -c this.c
clang-3.9 -fPIC -g -O2 -c other.c
gcc-7 -fuse-ld=gold -fPIC -g -O2 -Wl,-r -nostdlib -no-pie this.o other.o -o
rel.o
/usr/bin/ld.gold: internal error in set_info_section, at
../../gold/output.h:3386
collect2: error: ld returned 1 exit status
makefile:19: recipe for target 'rel.o' failed
make: *** [rel.o] Error 1
$

-----

Tentative patch:

diff --git a/gold/layout.cc b/gold/layout.cc
index f5fe805ea..987e12092 100644
--- a/gold/layout.cc
+++ b/gold/layout.cc
@@ -829,6 +829,13 @@ Layout::get_output_section(const char* name,
Stringpool::Key name_key,
|| lookup_type == elfcpp::SHT_PREINIT_ARRAY)
lookup_type = elfcpp::SHT_PROGBITS;

+ // Some compilers assign a type of SHT_PROGBITS to the .eh_frame
+ // section, while others use the psABI-recommended SHT_X86_64_UNWIND.
+ // Insure that we combine the two in the resulting output file (mainly
+ // an issue for relocatable links).
+ if (lookup_type == elfcpp::SHT_X86_64_UNWIND)
+ lookup_type = elfcpp::SHT_PROGBITS;
+
elfcpp::Elf_Xword lookup_flags = flags;

// Ignoring SHF_WRITE and SHF_EXECINSTR here means that we combine
--
You are receiving this mail because:
You are on the CC list for the bug.
thanm at google dot com
2018-03-30 16:14:31 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23016

--- Comment #1 from Than McIntosh <thanm at google dot com> ---

After discussion with colleagues, I'm now opting in favor of the second
proposal (being more selective about merging relocation sections). Sent a patch
to the mailing list: https://sourceware.org/ml/binutils/2018-03/msg00416.html.
--
You are receiving this mail because:
You are on the CC list for the bug.
cvs-commit at gcc dot gnu.org
2018-04-03 02:07:57 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23016

--- Comment #2 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Cary Coutant <***@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bce5a025d2ed7eda2c5bbb85bd9b33333ca5d556

commit bce5a025d2ed7eda2c5bbb85bd9b33333ca5d556
Author: Cary Coutant <***@gmail.com>
Date: Mon Apr 2 16:12:10 2018 -0700

Fix problem where mixed section types can cause internal error during a -r
link.

During a -r (or --emit-relocs) link, if two sections had the same name but
different section types, gold would put relocations for both sections into
the same relocation section even though the data sections remained
separate.

For .eh_frame sections, when one section is PROGBITS and another is
X86_64_UNWIND, we really should be using the UNWIND section type and
combining the sections anyway. For other sections, we should be
creating one relocation section for each output data section.

gold/
PR gold/23016
* incremental.cc (can_incremental_update): Check for unwind section
type.
* layout.h (Layout::layout): Add sh_type parameter.
* layout.cc (Layout::layout): Likewise.
(Layout::layout_reloc): Create new output reloc section if data
section does not already have one.
(Layout::layout_eh_frame): Check for unwind section type.
(Layout::make_eh_frame_section): Use unwind section type for .eh_frame
and .eh_frame_hdr.
* object.h (Sized_relobj_file::Shdr_write): New typedef.
(Sized_relobj_file::layout_section): Add sh_type parameter.
(Sized_relobj_file::Deferred_layout::Deferred_layout): Add sh_type
parameter.
* object.cc (Sized_relobj_file::check_eh_frame_flags): Check for
unwind section type.
(Sized_relobj_file::layout_section): Add sh_type parameter; pass it
to Layout::layout.
(Sized_relobj_file::do_layout): Make local copy of sh_type.
Force .eh_frame sections to unwind section type.
Pass sh_type to layout_section.
(Sized_relobj_file<size, big_endian>::do_layout_deferred_sections):
Pass sh_type to layout_section.
* output.cc (Output_section::Output_section): Initialize
reloc_section_.
* output.h (Output_section::reloc_section): New method.
(Output_section::set_reloc_section): New method.
(Output_section::reloc_section_): New data member.
* target.h (Target::unwind_section_type): New method.
(Target::Target_info::unwind_section_type): New data member.

* aarch64.cc (aarch64_info): Add unwind_section_type.
* arm.cc (arm_info, arm_nacl_info): Likewise.
* i386.cc (i386_info, i386_nacl_info, iamcu_info): Likewise.
* mips.cc (mips_info, mips_nacl_info): Likewise.
* powerpc.cc (powerpc_info): Likewise.
* s390.cc (s390_info): Likewise.
* sparc.cc (sparc_info): Likewise.
* tilegx.cc (tilegx_info): Likewise.
* x86_64.cc (x86_64_info, x86_64_nacl_info): Likewise.

* testsuite/Makefile.am (pr23016_1, pr23016_2): New test cases.
* testsuite/Makefile.in: Regenerate.
* testsuite/testfile.cc: Add unwind_section_type.
* testsuite/pr23016_1.sh: New test script.
* testsuite/pr23016_1a.s: New source file.
* testsuite/pr23016_1b.s: New source file.
* testsuite/pr23016_2.sh: New test script.
* testsuite/pr23016_2a.s: New source file.
* testsuite/pr23016_2b.s: New source file.
--
You are receiving this mail because:
You are on the CC list for the bug.
ccoutant at gmail dot com
2018-04-04 16:50:18 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23016

Cary Coutant <ccoutant at gmail dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |FIXED

--- Comment #3 from Cary Coutant <ccoutant at gmail dot com> ---
Fixed on trunk.
--
You are receiving this mail because:
You are on the CC list for the bug.
ccoutant at gmail dot com
2018-05-10 17:46:54 UTC
Permalink
https://sourceware.org/bugzilla/show_bug.cgi?id=23016

Cary Coutant <ccoutant at gmail dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |alec.theriault at gmail dot com

--- Comment #4 from Cary Coutant <ccoutant at gmail dot com> ---
*** Bug 23155 has been marked as a duplicate of this bug. ***
--
You are receiving this mail because:
You are on the CC list for the bug.
Loading...