# From C to exec: Part 2

The last part explored static linking, but stopped before actually executing our executable. If you thought that was all the linking there is, you're in for a surprise!

This post is best followed using the code from the previous part.

## Act 1: Unexpected symbol

### Scene 1: Undefined?

Did you notice the lonely symbol in the last act? Let's call it forth:

# gcc -c -o printer.o printer.c
# ar rcs libprinter.a printer.o
# nm libprinter.a

0000000000000000 T print_hello
                 U puts

That's a lonely "Undefined" symbol! And where did it even come from? Remember `printer.c`:

#include <stdio.h>
void print_hello(void) {
  puts("Hello World!\n");
}

We call `puts`, but we never define it… Previously, when we tried to use a symbol that we didn't define, gcc refused to finish the linking, but here it works:

# gcc hello.o libprinter.a -o hello
# ./hello
Hello World!

What gives?

### Scene 2: Deus ex library

Before we solve the mystery, let me throw another hint towards the solution: `puts` is hidden in a library! More specifically, it's in libc:

# nm /lib64/libc.so.6 | grep puts
0000000000073280 W puts

Here, "W" means it's a "weak" but defined symbol.

We know where the lost symbol is, and where it's missing from. But we still don't know what connects the two places! The newest clue we have is the `/lib64/libc.so.6` file...

### Scene 3: Dynamic libraries

We're familiar with static libraries already. Their file names end with `.a`, they are an ingredient for the executable, and they are no longer used after linking. But another variety exists. Like static libraries, these contain symbols, but their usefulness extends beyond link time and into the actual execution. They are called *dynamic libraries*, and their file names traditionally end with `.so`, for "shared object".

# file /lib64/libc.so.6
/lib64/libc.so.6: symbolic link to libc-2.29.so
# file /lib64/libc-2.29.so
/lib64/libc-2.29.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=51377236ad01808e26404c98faa41d72f11a46a5, for GNU/Linux 3.2.0, not stripped, too many notes (256)

`libc.so.6` is one of them.

### Scene 4: Reconstruction

If we want to make progress on the story of the misplaced `puts`, we need to understand dynamic libraries. Time to create our own!

We already created a static library before, so let's use the same procedure, but make a dynamic one this time:

# gcc -fPIC -c printer.c -o printer.o
# gcc -shared printer.o -o libprinter.so
# file libprinter.so
libprinter.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=dd2595216f9c10b317ab3dbece8450a31fcaf672, not stripped
# nm libprinter.so | grep hello
0000000000001109 T print_hello

The procedure is similar to building a static library, except that packaging with `ar` is replaced by another linking step (with `-shared`), which makes it look more like creating an executable. The object file is also created slightly differently: the `-fPIC` flag asks for position-independent code, which a shared library needs because it may be loaded at a different address in each program.

Most importantly, we see our `print_hello` symbol as present! Let's link the whole program together:

# gcc hello.o libprinter.so -o hello_dynamic
# file hello_dynamic
hello_dynamic: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=5a8e0d151089682720545349da87b330ab089805, not stripped

So far so good, this looks just like our previous static binary. Let's make sure all is fine.

# nm hello_dynamic | grep hello
                 U print_hello
# ./hello_dynamic
./hello_dynamic: error while loading shared libraries: libprinter.so: cannot open shared object file: No such file or directory

Uh oh! We lost `print_hello` along the way, and the program now tries to open the shared library we created. The one that contains `print_hello`! Could there be a connection? What made the program try to load a file if all we do is print text?

### Scene 5: Dynamic linker

The culprit here is the dynamic linker. It's the part of the operating system responsible for loading programs. Remember the `interpreter` part of our file descriptions?

[...] dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2 [...]

The interpreter's responsibility is to execute executables. It sets up basic resources, links the program together with its libraries if needed (!), and finally gives control to the program by running the entry point (in C that ends up being the function corresponding to the `main` symbol).

You may be surprised to see linking here again. Didn't we do enough of that already? We linked object files together with the dynamic library, and the dynamic library needed to be linked itself. Why again?

Well, that's because we chose to create a dynamic library instead of a static one. The static library's selling point is that its code is injected (at the time of linking) into the executable. The dynamic library's point is that its code is injected (at the time of execution) into the running program. The former option provides some degree of certainty that the program doesn't change, while the latter gives some flexibility: the dynamic library can get changed and updated without the need to recreate the executable.

When the interpreter complains about `cannot open shared object file`, it means that it needed the library for the linking step, but couldn't find it in any of the standard locations. The `ldd` command lists all dynamic libraries that need to be linked together before running the executable:

# ldd ./hello_dynamic
        linux-vdso.so.1 (0x00007ffc35be9000)
        libprinter.so => not found
        libc.so.6 => /lib64/libc.so.6 (0x00007fc874215000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fc874419000)

Our shared library is `not found`, because it's not in one of the standard paths. Thankfully, we can tell the dynamic linker to look for extra libraries in the current directory:

# LD_LIBRARY_PATH=`pwd` ldd ./hello_dynamic
        linux-vdso.so.1 (0x00007ff5f4e32000)
        libprinter.so => /home/rhn/libprinter.so (0x00007ff5f4e28000)
        libc.so.6 => /lib64/libc.so.6 (0x00007ff5f4c26000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff5f4e33000)
# LD_LIBRARY_PATH=`pwd` ./hello_dynamic
Hello World!

Great, we directed the computer to find our lost `print_hello` symbol inside a shared library!

### Scene 6: Standard libraries

The list coming from `ldd` is suspicious. In our final step using gcc, we linked `hello.o` together with `libprinter.so`. Where did the other dynamic libraries on the list come from?

The answer is: kernel and standard libraries.

If you look at the executable we linked against the static library, you will see the same list, minus `libprinter.so`:

# ldd ./hello
        linux-vdso.so.1 (0x00007ffeda3d9000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f4e3293a000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f4e32b3e000)

The libraries here are required for the execution of virtually all programs, therefore gcc includes them implicitly. If you remember all the way back, we found `puts` in one of them: in `libc.so.6`. It turns out that all our executables do eventually link with that library, just well after the executable is created. That's why we were allowed to use `puts` even without knowing where it was defined.

## Final words

With this, you should have a pretty good understanding of where linking happens on a modern computer. Linking still has a lot of quirks when working with C: after all, we may want some control over what we mark as symbols and how. That will get covered in a future part.


