A few Fridays back I got a message from a friend:
also puzzler for you:
$ ls -la /path/to/my/executable
-r-xr-xr-x 1 root root 6559032 Jan 1 1970 /path/to/my/executable
$ /path/to/my/executable
-bash: /path/to/my/executable: No such file or directory
After a few sad guesses, I asked my coworker and got:
Oh, it’s a dynamically linked file missing the loader
What
When you execute a (dynamically linked) executable, it’s not actually executing the file, the kernel runs the loader. It’s not in the right path so it errors. Here
Okay, but first, context! Before, when we talked about building C programs, we saw that when compiling a binary, you can package all object code together in one file (statically linked) or distribute libraries/modules independently as shared objects that are installed separately and loaded at runtime (dynamically linked). When a dynamically linked executable is started, it relies on a linker/loader to load all the shared object files into memory before executing.
Now you’re on the same page as I was: something’s going wrong with loading, but why file not found? It’s the kernel doing all this? How does this all happen?
The demo went something like:
$ docker run -it --rm ubuntu
$ ldd /bin/mv
...
/lib64/ld-linux-x86-64.so.2
...
$ apt-get update
$ apt-get install file
$ file /bin/mv
/bin/mv: ELF 64-bit LSB shared object
x86-64, version 1 (SYSV)
dynamically linked
interpreter /lib64/ld-linux-x86-64.so.2
for GNU/Linux 3.2.0
...
# Or with readelf
$ apt-get install binutils
$ readelf /bin/mv -l
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
...
INTERP 0x0000000000000238 0x0000000000000238 0x0000000000000238
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
...
/bin/mv is dynamically linked with /lib64/ld-linux-x86-64.so.2 as the program interpreter. You can run programs with it directly too!
$ /lib64/ld-linux-x86-64.so.2 /bin/ls -al /bin/ls
-rwxr-xr-x 1 root root 133792 Jan 18 2018 /bin/ls
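But when it’s not located at the path recorded in the binary, the kernel fails to open it and exec returns ENOENT, which bash reports as “No such file or directory” against the program you tried to run.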
$ mv /lib64/ld-linux-x86-64.so.2 /tmp/
$ ls
bash: /bin/ls: No such file or directory
$ mv /tmp/ld-linux-x86-64.so.2 /lib64/
bash: /bin/mv: No such file or directory
# uh oh
$ /tmp/ld-linux-x86-64.so.2 /bin/mv /tmp/ld-linux-x86-64.so.2 /lib64/
# Things are okay now
We’ve reproduced our problem! And so ended the workweek.
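In hindsight, the check that would have saved me those sad guesses is small enough to sketch in Python. This is a minimal sketch, not a replacement for readelf: it assumes a 64-bit little-endian ELF file (like the x86-64 binaries above), and the names elf_interpreter and diagnose are my own. It walks the program headers, pulls out the PT_INTERP path, and checks whether that path exists.

```python
import os
import struct

PT_INTERP = 3  # program header type that names the interpreter


def elf_interpreter(data: bytes):
    """Return the PT_INTERP path of a 64-bit little-endian ELF image, or None."""
    # e_ident: magic, then EI_CLASS (2 = 64-bit) and EI_DATA (1 = little-endian)
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 0x20)
        if p_type == PT_INTERP:
            # The segment holds a NUL-terminated path string.
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None


def diagnose(path: str) -> str:
    """Say whether the interpreter requested by the ELF file at path exists."""
    with open(path, "rb") as f:
        interp = elf_interpreter(f.read())
    if interp is None:
        return f"{path}: no PT_INTERP header (statically linked?)"
    if not os.path.exists(interp):
        return f"{path}: interpreter {interp} is missing"
    return f"{path}: interpreter {interp} is present"
```

Something like print(diagnose("/bin/mv")) inside the container, before and after moving the loader away, should flip between the last two messages.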
ld-linux
And thus began weekend reading! To start with, ld-linux(8) points out:
- The dynamic linker can be run either indirectly by running some dynamically linked program or library (e.g. ELF binaries in the PT_INTERP program header) or directly by running: /lib/ld-linux.so.* [OPTIONS] [PROGRAM [ARGUMENTS]]
  I think being able to run it directly is for convenience, but hey, we saw that.
- Also: ld.so and ld-linux.so* find and load the shared libraries needed by a program, prepare the program to run, and then run it.
This hints at how this is happening, namely that ld-linux is actually running the program in addition to loading shared libraries.
How programs get run
lwn.net has some pretty good articles on what happens when a program is exec’d:
Namely, a number of executable formats are defined as instances of struct linux_binfmt, the most interesting to me being:
- binfmt_script: files that start with #!
- binfmt_misc: dynamically defined formats in /proc/sys/fs/binfmt_misc, which Jessie Frazelle dove into to script Go, and Java uses to exec .jar files
- binfmt_elf: ELF binaries, standard Linux binary format
After some initialization, the kernel iterates over the registered formats, letting each one attempt to load the binary file, and stops at the first handler that does not reject the file with ENOEXEC.
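To convince myself of how that loop behaves, here is a toy model in Python. It is emphatically not the kernel code: the handler functions, the FORMATS list and the interp_exists flag are all made up for illustration. The one thing it tries to capture is that ENOEXEC means "not mine, try the next format", while any other result (success or a different error such as ENOENT) ends the search.

```python
import errno

ELF_MAGIC = b"\x7fELF"


def load_script(head, interp_exists):
    if not head.startswith(b"#!"):
        return -errno.ENOEXEC  # not a script, let the next format try
    return 0 if interp_exists else -errno.ENOENT


def load_elf(head, interp_exists):
    if not head.startswith(ELF_MAGIC):
        return -errno.ENOEXEC  # not ELF, let the next format try
    return 0 if interp_exists else -errno.ENOENT


# Toy stand-in for the kernel's list of registered formats.
FORMATS = [("binfmt_script", load_script), ("binfmt_elf", load_elf)]


def search_handlers(head, interp_exists=True):
    """Return (format name, result) for the first format that claims the file."""
    for name, load in FORMATS:
        retval = load(head, interp_exists)
        if retval != -errno.ENOEXEC:
            return name, retval
    return None, -errno.ENOEXEC  # nobody recognized the file
```

In this model an ELF file whose interpreter is missing is still claimed by binfmt_elf, and the ENOENT it produces is what comes back from exec, which is the shape of the original puzzler.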
For example, binfmt_script matches on the #! at the start, parses the interpreter in the first line, and recursively execs on it (which then matches to another binfmt). The interpreter becomes argv[0], and the path of the original file is passed to it as an argument, like:
$ cat args.sh
#! /bin/bash
echo "args: $*"
$ cat test
#! args.sh
$ ./test
args: ./test
You could write a much more interesting interpreter, like, say, Python, that actually uses the given file.
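The argument shuffling in that transcript can be sketched as a small Python function. This is my own model of the behaviour described in execve(2), not kernel code, and shebang_argv is a made-up name: the interpreter goes first, then at most one optional argument (everything after the interpreter path is passed as a single string on Linux), then the script path, then the original arguments minus the old argv[0].

```python
def shebang_argv(first_line: bytes, script_path: str, argv: list) -> list:
    """Build the argv an interpreter receives when a '#!' script is exec'd."""
    if not first_line.startswith(b"#!"):
        raise ValueError("not a #! script")
    # Drop "#!", keep only the first line, split off the interpreter path.
    words = first_line[2:].split(b"\n", 1)[0].decode().split(None, 1)
    if not words:
        raise ValueError("no interpreter named after #!")
    interp = words[0]
    # Whatever follows the interpreter is one argument, spaces and all.
    optional = [words[1].strip()] if len(words) > 1 else []
    # The script path replaces the original argv[0].
    return [interp] + optional + [script_path] + argv[1:]


# Replaying the transcript: ./test names args.sh, which names /bin/bash.
step1 = shebang_argv(b"#! args.sh\n", "./test", ["./test"])
step2 = shebang_argv(b"#! /bin/bash\n", step1[0], step1)
print(step2)
```

After the second rewrite bash is handed args.sh as the script to run and ./test as its only argument, which is why $* expands to just ./test.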
Additionally, fun facts:
- Because BINPRM_BUF_SIZE is set to 128 (on kernels before Linux 5.1, which raised it to 256), the #! interpreter line will fail to match an interpreter path longer than 128 bytes (minus like 3 or 4 bytes)
- We just recursed into another binfmt_script executable. You can’t do this forever, exec returns ELOOP after 4 levels: -bash: ./5: 4: bad interpreter: Too many levels of symbolic links
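Both limits are easy to play with in a toy model. The sketch below is an assumption-heavy simulation in Python, not the kernel's logic: files is a plain dict standing in for the filesystem, MAX_REWRITES is simply set to the 4 levels I observed, and the exact off-by-one details of the buffer handling are my guess.

```python
import errno
import os

BINPRM_BUF_SIZE = 128  # value quoted above
MAX_REWRITES = 4       # assumption: matches the "4 levels" observed above


def _error(code):
    return OSError(code, os.strerror(code))


def resolve(path, files, depth=0):
    """Follow '#!' lines through files (a dict of path -> bytes) to a non-script."""
    if path not in files:
        raise _error(errno.ENOENT)
    # Assumption: only the start of the file is examined, with the buffer's
    # last byte reserved for a terminating NUL.
    head = files[path][:BINPRM_BUF_SIZE - 1]
    if not head.startswith(b"#!"):
        return path  # not a script, so another format (e.g. ELF) takes over
    if depth >= MAX_REWRITES:
        raise _error(errno.ELOOP)
    words = head[2:].split(b"\n", 1)[0].split(None, 1)
    if not words:
        raise _error(errno.ENOEXEC)
    # A path cut off by the buffer simply will not be found on the next step.
    return resolve(words[0].decode(), files, depth + 1)
```

With a chain of scripts named 1 through 5, each naming the previous one and 1 naming /bin/bash, resolving 4 reaches /bin/bash in this model while resolving 5 raises ELOOP, and a script whose interpreter path overflows the buffer ends in ENOENT because the truncated path does not exist.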
But from our demo, we were dealing with binfmt_elf, which does a lot more than binfmt_script.
The lwn article covers all the heavy lifting to load and initialize the program and kernel state, but here I’ll focus on the dynamic loading.
If the PT_INTERP program header is present, the interpreter is itself loaded into memory as well. The entry point is replaced with the interpreter’s, and execution begins with the interpreter (the linker here) loading shared libraries and resolving symbols in userspace. After that, the interpreter jumps to the original entry point, and actual execution begins.
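The entry point swap described here can be illustrated with a few lines of Python. Again this is only a sketch under assumptions: it handles 64-bit little-endian ELF headers only, elf_entry and first_jump are names I made up, and it ignores that for position-independent executables and the loader itself the kernel adds a load address to e_entry.

```python
import struct


def elf_entry(data: bytes) -> int:
    """Return e_entry from a 64-bit little-endian ELF header."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    (e_entry,) = struct.unpack_from("<Q", data, 0x18)
    return e_entry


def first_jump(program: bytes, interpreter: bytes = None) -> int:
    """Pick the entry point that gets control first, per the description above."""
    if interpreter is not None:
        return elf_entry(interpreter)  # the dynamic linker runs first
    return elf_entry(program)          # no PT_INTERP: straight to the program
```

Feeding it the bytes of /bin/mv and of /lib64/ld-linux-x86-64.so.2 should return the loader's entry offset, while a statically linked binary passed on its own gets its own e_entry back.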
Why did this all happen
My friend, being the Nix enthusiast that he is, has all sorts of interesting puzzlers. Nix wants to load its own glibc, and programs must be either compiled with a correct ELF interpreter or patched afterwards with patchelf, which rewrites the interpreter header.
I believe this was a program compiled with Nix, installed into a regular, Docker-built container (so only the regular /lib/ld-linux is present) and boom, thus arises a fun problem most developers won’t ever have to deal with.