What is Name Mangling in C++?

When we write programs in C++, we create new functions and methods with (hopefully) descriptive names all the time. Names are important for us humans to understand code. But in the compiled machine code of our finished executables, they don’t play any role. The CPU doesn’t even know about them.

Once our program has been loaded, all functions and methods are located at specific memory addresses in the virtual address space of the program’s process. When a function calls another function, it doesn’t refer to it by name. Instead, it jumps to the address where that function is located in memory.

Of course, the names of our functions and methods play a very important role during the build process. When the compiler compiles our .cpp files into .o files, it records those names in the .o files together with the location of the corresponding code. In this context, the names are called “symbols”, and the mapping of symbols to addresses is called the symbol table. The linker can then use this information to combine all functions from all .o files of a program into a single executable. It also makes sure that every function calls the correct memory address when it wants to call another function within the program.
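To make the linker’s job more concrete, here is a deliberately simplified model of symbol resolution. It is only a sketch and not how any real linker works. The ObjectFile structure, the link_objects function and the base address 0x1000 are all hypothetical. Each object file defines some symbols at offsets within its own code and references others that it doesn’t define. The “linker” places the object files one after another and turns the offsets into final addresses. It fails if a reference can’t be resolved.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a .o file.
struct ObjectFile {
    std::size_t code_size;                       // bytes of machine code in this file
    std::map<std::string, std::size_t> defined;  // symbol -> offset within this file
    std::vector<std::string> undefined;          // symbols called here but defined elsewhere
};

// Lays out the object files back to back starting at base_address and
// returns the global symbol table (symbol name -> final address).
std::map<std::string, std::uint64_t> link_objects(const std::vector<ObjectFile>& objects,
                                                  std::uint64_t base_address = 0x1000)
{
    std::map<std::string, std::uint64_t> symbol_table;
    std::uint64_t next_address = base_address;

    // Pass 1: assign a final address to every defined symbol.
    for (const ObjectFile& obj : objects) {
        for (const auto& [name, offset] : obj.defined) {
            if (!symbol_table.emplace(name, next_address + offset).second)
                throw std::runtime_error("multiple definition of " + name);
        }
        next_address += obj.code_size;
    }

    // Pass 2: every call to another function must now resolve to an address.
    for (const ObjectFile& obj : objects) {
        for (const std::string& name : obj.undefined) {
            if (symbol_table.count(name) == 0)
                throw std::runtime_error("undefined reference to " + name);
        }
    }
    return symbol_table;
}
```

With a main.o of 0x20 bytes that calls do_something, followed by an object file that defines do_something at offset 0, this model places main at 0x1000 and do_something at 0x1020. This sketch also shows why symbol names must be unique: two definitions of the same name would be a “multiple definition” error.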

The Problem of Finding Symbol Names

In the C programming language, finding a unique symbol name for a function is very easy. C doesn’t support function overloading and has no classes or methods, so function names are always unique. Once you have defined a function with a certain name, you can’t define a second one with the same name, even if it has different parameters:

// OK
void do_something()
{
    // do something here
}

// Compile error: do_something already exists!
void do_something(int x, int y)
{
    // do something here
}

This means that when we need to find a unique symbol name for do_something, we just choose do_something as the symbol name and are done. And this is what C compilers actually do.

Let’s test this by saving the following program to a file called syms.c:

#include <stdio.h>

void do_something(int x, int y)
{
    printf("Hello World!\n");
}

int main(int argc, char **argv)
{
    do_something(0, 0);
    return 0;
}

Compile the program with:

$ gcc -o syms syms.c

Now you can take a look at the symbol table:

$ nm syms

This should give you output similar to:

000000000000038c r __abi_tag
0000000000004010 B __bss_start
0000000000004010 b completed.0
                 w __cxa_finalize@GLIBC_2.2.5
0000000000004000 D __data_start
0000000000004000 W data_start
0000000000001070 t deregister_tm_clones
00000000000010e0 t __do_global_dtors_aux
0000000000003df8 d __do_global_dtors_aux_fini_array_entry
0000000000001129 T do_something
0000000000004008 D __dso_handle
0000000000003e00 d _DYNAMIC
0000000000004010 D _edata
0000000000004018 B _end
0000000000001164 T _fini
0000000000001120 t frame_dummy
0000000000003df0 d __frame_dummy_init_array_entry
00000000000020e8 r __FRAME_END__
0000000000003fc0 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
0000000000002004 r __GNU_EH_FRAME_HDR
0000000000001000 T _init
0000000000002000 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 U __libc_start_main@GLIBC_2.34
000000000000113a T main
00000000000010a0 t register_tm_clones
0000000000001040 T _start
0000000000004010 D __TMC_END__

If you look closely, you will see that there is an entry for our function do_something as well as for the main function. Both are marked with T, which means that they are global symbols located in the text (code) section of the program.

Of course, compilers could call the symbol sym_do_something or func_do_something. They could also encode the parameters and the return value in the symbol name, such as void_do_something_void for the first function of our earlier example and void_do_something_int_int for the second. But why should they? In C, there is no need to put any extra information into the symbol name.

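There is one notable exception where C compilers encode extra meaning into symbol names. On 32-bit Windows, the symbol name reflects the function’s calling convention. A calling convention is a contract about how to use the stack and/or registers to pass parameters to a function and how to pass back the return value. For example, a __stdcall function taking two ints gets the symbol _do_something@8, where 8 is the number of bytes of its arguments. Some platforms, such as macOS, also put an underscore in front of every C symbol, but that prefix is applied uniformly and carries no extra information.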

Earlier, we said that the compiler adds symbols to the .o files and that the running program doesn’t need them. So it might confuse you that we just queried an executable for its symbol table. By default, the linker keeps a symbol table in the executable, too. It just isn’t used by the code itself.

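To prove this, remove the symbol table from the executable:

$ strip syms

If you now run nm syms again, it reports that there are no symbols. Still, the program works just as before:

$ ./syms

Note that strip only removes the regular symbol table. The separate dynamic symbol table stays in the executable, because the dynamic linker uses it at load time to find shared library functions such as printf.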

In C++, on the other hand, finding a symbol name is much more complicated. Functions can be overloaded. Methods may have the same names as free functions and as methods of other classes. On top of that, namespaces allow the same name to appear in several scopes.
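The following sketch shows how many entities can legally share one name in C++. The namespaces audio and video are hypothetical, and each function simply returns a string describing itself. Suppose the compiler used the plain name do_something as the symbol for all of them, as a C compiler would. The linker would then see five conflicting definitions.

```cpp
#include <string>

// The same name in two different namespaces.
namespace audio {
    std::string do_something() { return "audio::do_something()"; }
}

namespace video {
    std::string do_something() { return "video::do_something()"; }
}

// Two overloads of a free function with the same name.
std::string do_something() { return "do_something()"; }
std::string do_something(int, int) { return "do_something(int, int)"; }

// A member function with the same name.
class DoSomething {
public:
    std::string do_something() { return "DoSomething::do_something()"; }
};
```

All five functions compile and link together without problems. The compiler can tell them apart only because it gives each one a different symbol name.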

Mangling Names

This means that a C++ compiler has no choice but to encode into the symbol name everything that distinguishes one function or method from another. That includes its enclosing namespaces and classes, its parameter types, and qualifiers such as const on methods. This way, every distinct function ends up with a unique symbol. This process of encoding a name and a signature into a symbol name is called name mangling.
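To get a feel for what such an encoding can look like, here is a toy mangler. It handles a tiny subset of the scheme that GCC and Clang use, the Itanium C++ ABI:

- Each name is written as its length followed by its characters.
- Names nested in namespaces or classes are wrapped in N…E.
- A few built-in parameter types are encoded as single letters: v for an empty parameter list, i for int, and so on.

The function mangle and its parameters are hypothetical. Real manglers also handle templates, pointers, references, const qualification, abbreviations for std and much more.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Encodes one identifier as <length><identifier>, e.g. "do_something" -> "12do_something".
std::string mangle_source_name(const std::string& name)
{
    return std::to_string(name.size()) + name;
}

// Toy mangler for a small subset of the Itanium C++ ABI scheme.
// scopes: enclosing namespaces/classes, outermost first (may be empty)
// params: parameter types, limited to a few built-in types
std::string mangle(const std::vector<std::string>& scopes,
                   const std::string& name,
                   const std::vector<std::string>& params)
{
    static const std::map<std::string, std::string> builtin_codes = {
        {"void", "v"}, {"bool", "b"}, {"char", "c"}, {"int", "i"},
        {"unsigned int", "j"}, {"long", "l"}, {"unsigned long", "m"},
        {"float", "f"}, {"double", "d"},
    };

    std::string result = "_Z";  // every mangled name starts with _Z
    if (scopes.empty()) {
        result += mangle_source_name(name);
    } else {
        // Nested names are wrapped in N ... E.
        result += "N";
        for (const std::string& scope : scopes)
            result += mangle_source_name(scope);
        result += mangle_source_name(name) + "E";
    }

    // An empty parameter list is encoded as a single "v" (void).
    if (params.empty())
        return result + "v";

    for (const std::string& type : params) {
        auto it = builtin_codes.find(type);
        if (it == builtin_codes.end())
            throw std::invalid_argument("type not supported by this sketch: " + type);
        result += it->second;
    }
    return result;
}
```

For example, mangle({}, "do_something", {"int", "int"}) produces _Z12do_somethingii, and mangle({"DoSomething"}, "do_something", {}) produces _ZN11DoSomething12do_somethingEv. The return type doesn’t appear anywhere. For ordinary (non-template) functions, the Itanium scheme leaves it out.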

To watch this in action, let’s save the following program in mangling.cpp:

#include <iostream>
#include <string>

class DoSomething
{
public:
    void do_something() { std::cout << "do_something method\n"; }
};

void do_something()
{
    std::cout << "do_something function\n";
}

void do_something(int x, int y)
{
    std::cout << "do_something function: " << "x: " << x << " y: " << y << "\n";
}

int do_something(std::string str)
{
    std::cout << "do_something function: " << str << "\n";
    return 0;
}

int main(int argc, char **argv)
{
    DoSomething obj;

    do_something();
    do_something(0, 0);
    do_something("Hello World");

    obj.do_something();

    return 0;
}

Compile the program with:

$ g++ -o mangling mangling.cpp

And get the symbol names for our do_something functions and method with grep:

$ nm mangling | grep "[TW] .*do_something"

This should give you output similar to:

000000000000246d T _Z12do_somethingii
00000000000024fa T _Z12do_somethingNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
0000000000002449 T _Z12do_somethingv
00000000000026e0 W _ZN11DoSomething12do_somethingEv

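This looks quite ugly, but these are the symbols that C++ compilers generate from our function and method names and signatures. Note that the method is marked with W (a weak symbol) rather than T. Because it is defined inside the class body, it is implicitly inline. GCC emits inline functions as weak symbols so that identical copies in several object files don’t cause conflicts at link time.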
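The structure of these symbols becomes easier to see once you take one apart by hand:

- _Z marks a mangled name.
- 12do_something is a name of length 12.
- N…E wraps a nested name such as DoSomething::do_something.
- The remaining letters are parameter types: i for int, v for an empty parameter list.

The sketch below decodes exactly this subset. demangle_simple is a hypothetical helper. It deliberately rejects anything it doesn’t understand, including the std::string overload, whose encoding uses abbreviations and template arguments.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Reads <length><identifier> starting at pos and advances pos past it.
std::optional<std::string> read_source_name(const std::string& s, std::size_t& pos)
{
    std::size_t length = 0;
    const std::size_t start = pos;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
        length = length * 10 + static_cast<std::size_t>(s[pos++] - '0');
    if (pos == start || length == 0 || pos + length > s.size())
        return std::nullopt;
    std::string name = s.substr(pos, length);
    pos += length;
    return name;
}

// Toy demangler for a tiny subset of Itanium-style names: a plain or
// N...E nested name followed by built-in parameter types.
std::optional<std::string> demangle_simple(const std::string& s)
{
    static const std::map<char, std::string> builtin_types = {
        {'b', "bool"}, {'c', "char"}, {'i', "int"}, {'j', "unsigned int"},
        {'l', "long"}, {'m', "unsigned long"}, {'f', "float"}, {'d', "double"},
    };

    if (s.compare(0, 2, "_Z") != 0)
        return std::nullopt;  // not a mangled C++ name
    std::size_t pos = 2;

    // Decode the (possibly nested) function name.
    std::string name;
    if (pos < s.size() && s[pos] == 'N') {
        ++pos;
        while (pos < s.size() && s[pos] != 'E') {
            auto part = read_source_name(s, pos);
            if (!part)
                return std::nullopt;
            if (!name.empty())
                name += "::";
            name += *part;
        }
        if (pos == s.size() || name.empty())
            return std::nullopt;
        ++pos;  // skip the closing 'E'
    } else {
        auto part = read_source_name(s, pos);
        if (!part)
            return std::nullopt;
        name = *part;
    }

    // Decode the parameter types; a lone 'v' means "no parameters".
    if (pos == s.size())
        return std::nullopt;
    std::string params;
    if (s.compare(pos, std::string::npos, "v") != 0) {
        for (; pos < s.size(); ++pos) {
            auto it = builtin_types.find(s[pos]);
            if (it == builtin_types.end())
                return std::nullopt;  // not supported by this sketch
            if (!params.empty())
                params += ", ";
            params += it->second;
        }
    }
    return name + "(" + params + ")";
}
```

Applied to _Z12do_somethingii, the sketch yields do_something(int, int). For anything outside its small subset it returns an empty std::optional, so use a real demangler for actual work.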

No Standardization

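Unfortunately, the C++ standard doesn’t specify how names are mangled. Every compiler vendor is free to choose its own scheme, and the schemes in use are not all compatible with each other. GCC and Clang both follow the Itanium C++ ABI on Linux and macOS, while Microsoft Visual C++ uses an entirely different scheme. Mangled names can also change between versions of the same toolchain. For example, the std::__cxx11 namespace in std::string symbols came with the new library ABI introduced in GCC 5, so these symbols don’t match code built against the old ABI.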

This is one of the reasons why you usually can’t link C++ object files and libraries generated by one compiler with code compiled by another, unless both compilers follow the same ABI.

Demangling Names

So what do you do if your linker spits out error messages containing weird looking strings that you can now identify as mangled symbol names? Luckily there are tools that can do the decoding for us.

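On Linux, you can use c++filt from GNU binutils to demangle symbol names. Clang uses the same Itanium mangling scheme as GCC on Linux, so c++filt works for symbols produced by either compiler. LLVM also ships an equivalent tool called llvm-cxxfilt.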

Let’s try it out with the first of our symbol names retrieved from the symbol table of our binary:

$ c++filt "_Z12do_somethingii"

This should give you this output:

do_something(int, int)

Great! This is the signature of our second function.

Let’s decode all of our symbol names:

| Symbol name | Demangled signature |
| --- | --- |
| _Z12do_somethingii | do_something(int, int) |
| _Z12do_somethingNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE | do_something(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) |
| _Z12do_somethingv | do_something() |
| _ZN11DoSomething12do_somethingEv | DoSomething::do_something() |

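This looks pretty much like what we defined in C++. Only the std::string overload looks a bit odd. std::string is just an alias for the class template specialization std::basic_string<char, std::char_traits<char>, std::allocator<char>>, and the mangled name encodes that full type. The std::__cxx11 part is an inline namespace that libstdc++, GCC’s standard library, uses for its C++11-conforming std::string. You may also notice that the int return type of that overload appears nowhere. For ordinary functions, the return type isn’t part of the mangled name, which matches the rule that C++ functions can’t be overloaded on their return type alone.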

Most of the time, you’ll want to demangle a single name that showed up in a linker error message, and for that, c++filt is perfectly fine.

But if you actually want to demangle all the names nm prints out, there is a much easier way. Just call nm with the -C flag and it will demangle all symbol names on its own.
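You may need demangling inside a program, for example to print readable names in log messages. For that case, the C++ runtime libraries of GCC and Clang expose a demangler through abi::__cxa_demangle, declared in <cxxabi.h>. This function is not part of standard C++ and isn’t available with MSVC. The wrapper name demangle and the FreeDeleter helper below are our own. Here is a minimal sketch:

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Releases memory allocated with malloc, as __cxa_demangle does for its result.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Returns the demangled form of a mangled name, or the input unchanged
// if it can't be demangled (for example, because it is a plain C symbol).
std::string demangle(const char* mangled)
{
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result itself.
    std::unique_ptr<char, FreeDeleter> result(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    if (status == 0 && result)
        return std::string(result.get());
    return std::string(mangled);
}
```

Called as demangle("_Z12do_somethingii"), the wrapper returns do_something(int, int), while a C symbol such as do_something comes back unchanged. The same helper also works on the strings returned by typeid(...).name(), which GCC and Clang report as mangled type names.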

Demangling Names on macOS

On macOS, demangling works the same way as on Linux. The demangling tool is also called c++filt, but it is actually llvm-cxxfilt in disguise.

This is hardly surprising, since macOS and Xcode use Clang as the compiler.

Demangling Names on Windows

If you use Windows and Visual C++, you can use the command line tool undname, which comes with Visual Studio.

To use it, open an x64 Native Tools Command Prompt. You can find it by typing “x64” into the Windows search box.

[Figure: Finding the x64 Native Tools Command Prompt with the Windows search box]

This opens a command prompt like this one:

[Figure: Demangling a C++ symbol with undname on Windows]

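Now you can demangle your symbols:

$ undname ?func1@a@@AAEXH@Z

For this example symbol, undname prints the undecorated name private: void __thiscall a::func1(int).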

Web-based Demangling

Probably the easiest way to demangle symbol names is using a web-based demangling service like Demangler.com.

But you might want to watch out for privacy issues since you are sending your data to a third party.

2 thoughts on “What is Name Mangling in C++?”

Alex

Very good explanation of name mangling. It should be pointed out that name mangling is not standardized in C++ because it’s not a requirement for the language. It’s one mechanism used for encoding and conveying symbol information between components of a build system; e.g., between the compiler and linker. Other build systems have used other mechanisms for conveying similar information in other programming languages. (And perhaps in C++ too although the big 3–Visual C++, gcc, and clang–use name mangling, which simplifies matters somewhat for multi-language linkers, e.g., mixing C++ and C object modules.)

30. June 2023 at 6:56

Abstract Expression
