Multiple shared libraries linking the same static library in C++: an ODR violation that causes a double free

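Introduction

While looking into the singleton pattern in C++ recently, I found people online mentioning this problem. This post records what I found while investigating it.

A Google search turned up a post that reproduces the problem: Global variable in static library - double free or corruption error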

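The reproduction goes as follows:

  • A static library defines a global variable whose constructor allocates memory and whose destructor frees it

  • Two shared libraries both use that variable, and both link the static library

  • The main program calls a function from each shared library; at runtime libc reports that it detected a double free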

$ ./reproduce.sh
Foo::Foo(int) begin: 0x7d15500f9048 0
Foo::Foo(int) end: 0x7d15500f9048 0x61720d3316c0
Foo::Foo(int) begin: 0x7d15500f9048 0x61720d3316c0
Foo::Foo(int) end: 0x7d15500f9048 0x61720d3316e0
void a(): 0x7d15500f9048
void b(): 0x7d15500f9048
Foo::~Foo(): 0x7d15500f9048 0x61720d3316e0
Foo::~Foo(): 0x7d15500f9048 0x61720d3316e0
free(): double free detected in tcache 2
./reproduce.sh: line 8:  5622 Aborted                 (core dumped) LD_LIBRARY_PATH=. ./main.elf

Test environment

$ cat ./reproduce.sh
#!/bin/env bash

g++ ./foo.cpp -std=c++17 -O2 -g -fPIC -c -o foo.o
ar rcs libfoo.a foo.o
g++ ./a.cpp -std=c++17 -O2 -g -fPIC -shared -L. -lfoo -o liba.so
g++ ./b.cpp -std=c++17 -O2 -g -fPIC -shared -L. -lfoo -o libb.so
g++ ./main.cpp -std=c++17 -O2 -g -L. -la -lb -o main.elf
LD_LIBRARY_PATH=. ./main.elf

// foo.h
#pragma once

#include <iostream>

class Foo {
public:
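  Foo(int val) {
    // p_ is deliberately printed before it is assigned: foo has static storage
    // duration, so it is zero-initialized and the first construction prints 0.
    std::cout << __PRETTY_FUNCTION__ << " begin: " << this << " " << p_ << "\n";
    p_ = new int(val);
    std::cout << __PRETTY_FUNCTION__ << " end: " << this << " " << p_ << "\n";
  }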
  Foo(Foo const &) = delete;
  Foo &operator=(Foo const &) = delete;
  ~Foo() noexcept {
    std::cout << __PRETTY_FUNCTION__ << ": " << this << " " << p_ << "\n";
    delete p_;
  }

private:
  int *p_;
};

extern Foo foo;

// foo.cpp
#include "foo.h"

Foo foo {0x666};

// a.cpp
#include "foo.h"

void a() { std::cout << __PRETTY_FUNCTION__ << ": " << &foo << "\n"; }

// b.cpp
#include "foo.h"

void b() { std::cout << __PRETTY_FUNCTION__ << ": " << &foo << "\n"; }

// main.cpp
extern void a();

extern void b();

int main() {
  a();
  b();

  return 0;
}

Root cause analysis

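The output shows that the constructor ran twice and the destructor ran twice on the same address. The second construction leaks memory: the block allocated by the first construction is never freed, while the block allocated by the second construction is freed twice, which is the double free.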

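Both liba.so and libb.so link libfoo.a, so each one contains its own definition of foo. When the program is loaded, ld.so resolves references to foo by name and binds them all to a single definition, so after relocation liba.so and libb.so both use the same memory.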

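liba.so and libb.so each carry an initializer in .init_array that calls Foo::Foo(int) and registers Foo::~Foo() through __cxa_atexit, so both the constructor and the destructor run twice on the same object, which leads to the final crash.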

This should count as undefined behavior: the ODR requires foo to have exactly one definition in the entire program, and here there are two.

One and only one definition of every non-inline function or variable that is odr-used (see below) is required to appear in the entire program (including any standard and user-defined libraries). The compiler is not required to diagnose this violation, but the behavior of the program that violates it is undefined.

We can do a quick check of this hypothesis with readelf and objdump.

Both liba.so and libb.so define the symbol foo.

File: ./liba.so

Symbol table '.symtab' contains 41 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    26: 0000000000004048     8 OBJECT  GLOBAL DEFAULT   26 foo

File: ./libb.so

Symbol table '.symtab' contains 41 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    27: 0000000000004048     8 OBJECT  GLOBAL DEFAULT   26 foo

Inspect the .init_array section of each .so:

$ objdump -s -j .init_array ./liba.so

./liba.so:     file format elf64-x86-64

Contents of section .init_array:
 3dc8 e0110000 00000000 00110000 00000000  ................

$ objdump -s -j .init_array ./libb.so

./libb.so:     file format elf64-x86-64

Contents of section .init_array:
 3dc8 e0110000 00000000 00110000 00000000  ................

Disassembling to check whether Foo::Foo(int) is called shows that the function pointed to by the second entry of .init_array (0x1100) calls the constructor.

$ objdump -d -Mintel ./liba.so | c++filt
0000000000001100 <_GLOBAL__sub_I_foo.cpp>:
    1100:       f3 0f 1e fa             endbr64
    1104:       53                      push   rbx
    1105:       48 8b 1d ac 2e 00 00    mov    rbx,QWORD PTR [rip+0x2eac]        # 3fb8 <foo@@Base-0x90>
    110c:       be 66 06 00 00          mov    esi,0x666
    1111:       48 89 df                mov    rdi,rbx
    1114:       e8 d7 ff ff ff          call   10f0 <Foo::Foo(int)@plt>
    1119:       48 8b 3d a8 2e 00 00    mov    rdi,QWORD PTR [rip+0x2ea8]        # 3fc8 <Foo::~Foo()@@Base+0x2d78>
    1120:       48 89 de                mov    rsi,rbx
    1123:       5b                      pop    rbx
    1124:       48 8d 15 05 2f 00 00    lea    rdx,[rip+0x2f05]        # 4030 <__dso_handle>
    112b:       e9 80 ff ff ff          jmp    10b0 <__cxa_atexit@plt>

Next, confirm that 0x3fb8 points to foo through a relocation entry.

$ readelf -rW ./liba.so | c++filt

Relocation section '.rela.dyn' at offset 0x758 contains 12 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000003fb8  0000000d00000006 R_X86_64_GLOB_DAT      0000000000004048 foo + 0

How to fix this: C++17 introduced inline variables, which are allowed to be defined more than once in a program.

inline variable

There can be more than one definition in a program of each of the following: class type, enumeration type, inline function, inline variable(since C++17), templated entity (template or member of template, but not full template specialization), as long as all following conditions are satisfied:

For an inline function or inline variable(since C++17), a definition is required in every translation unit where it is odr-used.

Modify foo.h and foo.cpp: move the definition of foo into the header and comment out the original definition in foo.cpp.

// foo.h
inline Foo foo {0x666};

// foo.cpp
// Foo foo {0x666};

Now foo is constructed only once and destroyed only once.

$ ./reproduce.sh
Foo::Foo(int) begin: 0x747cb4e42050 0
Foo::Foo(int) end: 0x747cb4e42050 0x5d921df826c0
void a(): 0x747cb4e42050
void b(): 0x747cb4e42050
Foo::~Foo(): 0x747cb4e42050 0x5d921df826c0

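A further look at the underlying mechanism with readelf and objdump shows that there is now a guard variable. The first initializer to run finds it 0, sets it to 1 and constructs foo; the second one finds it already set to 1 and does not call Foo::Foo(int).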

0000000000001100 <_GLOBAL__sub_I_a.cpp>:
    1100:       f3 0f 1e fa             endbr64
    1104:       48 8b 05 9d 2e 00 00    mov    rax,QWORD PTR [rip+0x2e9d]        # 3fa8 <guard variable for foo@@Base-0xa0>
    110b:       80 38 00                cmp    BYTE PTR [rax],0x0
    110e:       74 01                   je     1111 <_GLOBAL__sub_I_a.cpp+0x11>
    1110:       c3                      ret
    1111:       53                      push   rbx
    1112:       48 8b 1d 9f 2e 00 00    mov    rbx,QWORD PTR [rip+0x2e9f]        # 3fb8 <foo@@Base-0x98>
    1119:       be 66 06 00 00          mov    esi,0x666
    111e:       c6 00 01                mov    BYTE PTR [rax],0x1
    1121:       48 89 df                mov    rdi,rbx
    1124:       e8 c7 ff ff ff          call   10f0 <Foo::Foo(int)@plt>
    1129:       48 8b 3d 98 2e 00 00    mov    rdi,QWORD PTR [rip+0x2e98]        # 3fc8 <Foo::~Foo()@@Base+0x2d68>
    1130:       48 89 de                mov    rsi,rbx
    1133:       5b                      pop    rbx
    1134:       48 8d 15 f5 2e 00 00    lea    rdx,[rip+0x2ef5]        # 4030 <__dso_handle>
    113b:       e9 70 ff ff ff          jmp    10b0 <__cxa_atexit@plt>

$ readelf -rW ./liba.so  | c++filt

Relocation section '.rela.dyn' at offset 0x778 contains 13 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000003fa8  0000001300000006 R_X86_64_GLOB_DAT      0000000000004048 guard variable for foo + 0
0000000000003fb8  0000000d00000006 R_X86_64_GLOB_DAT      0000000000004050 foo + 0