A few Fridays back I got a message from a friend:

also puzzler for you:

$ ls -la /path/to/my/executable
-r-xr-xr-x 1 root root 6559032 Jan 1 1970 /path/to/my/executable

$ /path/to/my/executable
-bash: /path/to/my/executable: No such file or directory

After a few sad guesses, I asked my coworker and got:

Oh, it’s a dynamically linked file missing the loader

What

When you execute a (dynamically linked) executable, it’s not actually executing the file, the kernel runs the loader. It’s not in the right path so it errors. Here

Okay, but first, context! Before, when we talked about building C programs, we saw that when compiling a binary, you can package all the object code together in one file (statically linked) or distribute libraries/modules independently as shared objects that are installed separately and loaded at runtime (dynamically linked). When a dynamically linked executable is started, it relies on a linker/loader to load all the shared object files into memory before executing.

Now you’re on the same page as I was: something’s going wrong with loading, but why file not found? It’s the kernel doing all this? How does this all happen?

The demo went something like:

$ docker run -it --rm ubuntu

$ ldd /bin/mv
  ...
	/lib64/ld-linux-x86-64.so.2
  ...

$ apt-get update
$ apt-get install file
$ file /bin/mv
/bin/mv: ELF 64-bit LSB shared object
  x86-64, version 1 (SYSV)
  dynamically linked
  interpreter /lib64/ld-linux-x86-64.so.2
  for GNU/Linux 3.2.0
  ...

# Or with readelf
$ apt-get install binutils
$ readelf /bin/mv -l
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  ...
  INTERP         0x0000000000000238 0x0000000000000238 0x0000000000000238
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  ...

/bin/mv is dynamically linked with /lib64/ld-linux-x86-64.so.2 as the program interpreter. You can run programs with it directly too!

$ /lib64/ld-linux-x86-64.so.2 /bin/ls -al /bin/ls
-rwxr-xr-x 1 root root 133792 Jan 18  2018 /bin/ls

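But when it’s not located on the right path, the kernel errors trying to load it. The ENOENT comes from the kernel failing to open the interpreter, but the shell reports it against the program you asked for, which is why the message is so misleading.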

$ mv /lib64/ld-linux-x86-64.so.2 /tmp/

$ ls
bash: /bin/ls: No such file or directory

$ mv /tmp/ld-linux-x86-64.so.2 /lib64/
bash: /bin/mv: No such file or directory

# uh oh

$ /tmp/ld-linux-x86-64.so.2 /bin/mv /tmp/ld-linux-x86-64.so.2 /lib64/
# Things are okay now

We’ve reproduced our problem! And so ended the workweek.
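
If you don't feel like moving your loader around, here is a safer way to see the same confusing error. This is only an analog, not the ELF case from the demo: it uses a script whose #! line points at an interpreter that doesn't exist (the path /nonexistent/interpreter is made up for illustration). The file itself clearly exists, yet on Linux I'd expect the exec to fail with ENOENT all the same, assuming the temp directory isn't mounted noexec.

```python
import errno
import os
import subprocess
import tempfile


def exec_errno_with_missing_interpreter():
    """Exec a file that exists but whose interpreter does not.

    Returns the symbolic errno name of the failure, or None if the exec
    unexpectedly succeeded.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "script")
        with open(path, "w") as f:
            # Hypothetical interpreter path, chosen because it should not exist.
            f.write("#!/nonexistent/interpreter\n")
        os.chmod(path, 0o755)

        # The file we are about to run is definitely there.
        assert os.path.exists(path)

        try:
            subprocess.run([path])
        except OSError as exc:
            # The error is about the interpreter, but it is reported
            # for the exec of `path` as a whole.
            return errno.errorcode[exc.errno]
    return None


if __name__ == "__main__":
    print(exec_errno_with_missing_interpreter())
```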

ld-linux

And thus began weekend reading! To start with, ld-linux(8) points out:

How programs get run

lwn.net has some pretty good articles on what happens when a program is exec’d:

Namely, a number of executable formats are defined as instances of struct linux_binfmt, the most interesting to me being:

After some initialization, the kernel iterates over the registered formats and has each attempt to load the binary file. A handler that doesn't recognize the file returns ENOEXEC and the kernel moves on to the next one; the first handler to return anything else decides the outcome.
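
To make that loop concrete, here is a toy sketch in Python. It is not the kernel's code: load_script, load_elf, and pick_format are hypothetical stand-ins for the real handlers, and they only look at the first few bytes. The part worth noticing is that any error other than ENOEXEC ends the search, which is how an ENOENT from the ELF handler can reach the caller.

```python
import errno


# Hypothetical stand-ins for the kernel's format handlers. Each one either
# claims the file or raises ENOEXEC to say "not my format".
def load_script(head: bytes) -> str:
    if not head.startswith(b"#!"):
        raise OSError(errno.ENOEXEC, "not a script")
    return "script"


def load_elf(head: bytes) -> str:
    if not head.startswith(b"\x7fELF"):
        raise OSError(errno.ENOEXEC, "not an ELF file")
    return "elf"


FORMATS = [load_script, load_elf]


def pick_format(head: bytes) -> str:
    """Try each handler in turn, skipping the ones that raise ENOEXEC."""
    for handler in FORMATS:
        try:
            return handler(head)
        except OSError as exc:
            if exc.errno != errno.ENOEXEC:
                # A real failure (say, ENOENT) stops the search right here.
                raise
    # Nobody recognized the file.
    raise OSError(errno.ENOEXEC, "Exec format error")


if __name__ == "__main__":
    print(pick_format(b"#!/bin/sh\n"))
    print(pick_format(b"\x7fELF\x02\x01\x01"))
```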

For example, binfmt_script matches on the #! at the start, parses the interpreter from the first line, and recursively execs it (which then matches another binfmt). The path of the original file is passed along as an argument to the interpreter, like:

$ cat args.sh
#! /bin/bash
echo "args: $*"

$ cat test
#! args.sh

$ ./test
args: ./test

You could write a much more interesting interpreter like, say, python, that actually uses the given file.
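
Here is a rough sketch of how the argument list for the interpreter gets assembled from the #! line. It's a simplification of what I understand binfmt_script to do, and script_argv is a name I made up. Two details it tries to capture: everything after the interpreter on the #! line is passed as one single argument, and the script's path takes the place of the original argv[0].

```python
from typing import List


def script_argv(first_line: bytes, script_path: str, args: List[str]) -> List[str]:
    """Build the argv an interpreter sees for a #! script (simplified).

    `args` are the arguments the script was invoked with, without argv[0].
    """
    if not first_line.startswith(b"#!"):
        raise ValueError("not a script")

    # Take the first line only, minus the "#!" and surrounding blanks.
    rest = first_line[2:].split(b"\n", 1)[0].decode().strip(" \t")
    if not rest:
        raise ValueError("no interpreter given")

    # Split off the interpreter; whatever remains stays as ONE argument.
    parts = rest.split(None, 1)
    argv = [parts[0]]
    if len(parts) == 2:
        argv.append(parts[1])

    # The script path replaces the original argv[0], then the rest follows.
    argv.append(script_path)
    argv.extend(args)
    return argv


if __name__ == "__main__":
    # First hop of the example above: ./test names args.sh as its interpreter.
    print(script_argv(b"#! args.sh\n", "./test", []))
    # Second hop: args.sh names /bin/bash, and ./test rides along.
    print(script_argv(b"#! /bin/bash\n", "args.sh", ["./test"]))
```

The second hop shows why args.sh prints ./test: by the time bash runs, the original file is just a regular argument.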

Additionally, fun facts:

But from our demo, we were dealing with binfmt_elf which does a lot more than binfmt_script.

The lwn article covers all the heavy lifting to load and initialize the program and kernel state, but here I’ll focus on the dynamic loading.

If the PT_INTERP program header is present, the interpreter is itself loaded into memory as well. The entry point is replaced with the interpreter’s, and execution begins with the interpreter (the linker here) loading shared libraries and resolving symbols in userspace. After that, the interpreter jumps to the program's original entry point, and actual execution begins.
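
You can find the PT_INTERP entry yourself without readelf. Below is a minimal sketch that walks the program headers and returns the interpreter path. It only handles 64-bit little-endian ELF files (the x86-64 case from the demo), does no real validation, and read_interp is just my name for it.

```python
import struct

PT_INTERP = 3  # program header type for the interpreter path


def read_interp(path):
    """Return the PT_INTERP path of a 64-bit little-endian ELF, or None."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        if ident[4] != 2 or ident[5] != 1:
            raise ValueError("sketch only handles 64-bit little-endian ELF")

        # Remainder of the ELF header after the 16-byte e_ident.
        (_type, _machine, _version, _entry, e_phoff, _shoff, _flags,
         _ehsize, e_phentsize, e_phnum, _shentsize, _shnum,
         _shstrndx) = struct.unpack("<HHIQQQIHHHHHH", f.read(48))

        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            (p_type, _pflags, p_offset, _vaddr, _paddr, p_filesz,
             _memsz, _align) = struct.unpack("<IIQQQQQQ", f.read(56))
            if p_type == PT_INTERP:
                # The segment holds a NUL-terminated path string.
                f.seek(p_offset)
                return f.read(p_filesz).rstrip(b"\0").decode()
    return None  # no PT_INTERP, e.g. a statically linked binary


if __name__ == "__main__":
    import os
    import sys

    interp = read_interp(sys.argv[1])
    if interp is None:
        print("no interpreter requested")
    else:
        print(interp, "(exists)" if os.path.exists(interp) else "(MISSING)")
```

Run against /bin/mv in the container from the demo, this should report the same path that readelf showed. If it says MISSING, you're looking at the puzzler.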

Why did this all happen

My friend, being the Nix enthusiast that he is, has all sorts of interesting puzzlers. Nix wants to load its own glibc, so programs must either be compiled with the correct ELF interpreter or patched afterwards with patchelf, which rewrites the interpreter header.

I believe this was a program compiled with Nix and installed into a regular, Docker-built container (so only the regular /lib/ld-linux is present), and boom: a fun problem most developers won’t ever have to deal with.