Now that we have a good sense of how C builds artifacts, I want to survey some other languages and ecosystems to see the similarities and differences. I’m hoping that by the end I have a good idea of what characterizes builds and artifacts in a language, without prematurely diving into details (like how dynamic library search paths work in C). Here’s what I’m thinking:

Let’s look at some languages I’ve run into: Java, Python, and Go.

Java (and other JVM Languages)

Java (and other JVM languages like Scala and Clojure) compile to Java bytecode, rather than targeting a particular OS/available hardware directly like C. The Java Virtual Machine executes this bytecode since it is not directly executable. The JVM abstracts platform details like system calls, giving Java its famed portability: bytecode can be run anywhere the JVM is available.

Java follows the model of:

In C, linking was the process of resolving external references into the corresponding compiled code. Static and dynamic libraries let us choose between resolving at compile time vs runtime. In Java, object files aren’t linked into a standalone artifact. Instead, all classes are loaded at runtime (including the Java runtime classes), typically on demand, by classloaders. Classloaders define how to resolve a classname into the respective bytecode (e.g. the system classloader defines where it expects to find classes, like wanting my.package.Class to be somewhere in the classpath under my/package/Class.class). Users can write their own classloaders too.
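To make the name-to-file convention concrete, here is a rough sketch of the lookup in Python (chosen only to match the Python used elsewhere in this post; it is not how the JVM is implemented). The function names and the example classpath entries are made up for illustration, and it only covers plain directory entries, not entries inside JARs.

```python
def class_file_for(class_name):
    """Turn a fully qualified class name into its relative .class path."""
    return class_name.replace(".", "/") + ".class"


def candidate_paths(class_name, classpath):
    """List where a directory-based classpath lookup would search, in order."""
    relative = class_file_for(class_name)
    return [f"{entry}/{relative}" for entry in classpath]


# Hypothetical classpath with two directory entries.
for path in candidate_paths("my.package.Class", ["build/classes", "lib"]):
    print(path)
```

The first entry that actually contains the file wins, which is why classpath order matters when two entries hold a class with the same name.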

Java code is typically distributed as a JAR. Managing classloading with JARs is typically a headache: a JAR can define a classpath in its metadata, but cannot reference a JAR within itself. Classloading needs to be carefully managed, which has led to build tools that automate it or repackage all JARs into a single fat JAR (like one-jar or maven assembly).

Takeaways

Python

Python follows a very similar model to Java: Source is read by the Python executable, which compiles it to bytecode (cached as *.pyc files) and runs bytecode on the Python Virtual Machine.
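You can watch the compile step happen from inside Python. This is a small sketch assuming CPython 3; the source string and the filename example.py are made up, and nothing is written to disk.

```python
import dis
import importlib.util

source = "x = 1 + 2\nprint(x)\n"

# Compile source text into a code object, which holds the bytecode.
code = compile(source, "example.py", "exec")
print(type(code.co_code))  # the raw bytecode is a bytes object

# Show the bytecode as a readable instruction listing.
dis.dis(code)

# Where CPython would cache the compiled form of example.py.
cache_path = importlib.util.cache_from_source("example.py")
print(cache_path)
```

The exact instructions and the cache filename depend on the interpreter version, so the output differs between Python releases.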

Modules are dynamically loaded in Python. import foo will find and load a module on demand, compiling and running its code the first time it is imported. Similar to the Java classloader, Python has its own search rules for resolving a module into the code for it.
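The search rules can be poked at directly. A minimal sketch, assuming CPython 3 with the standard library available as ordinary files, and assuming no module named bogus is installed:

```python
import importlib.util
import sys

# The directories (and archives) Python searches, in order.
for entry in sys.path:
    print(entry)

# Ask the import system where a module would come from, without importing it.
spec = importlib.util.find_spec("json")
print(spec.origin)

# A top-level module that cannot be found resolves to None.
missing = importlib.util.find_spec("bogus")
print(missing)
```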

Python does a lot less validation when compiling bytecode. I think this might be part of the “dynamic” nature of Python. For modules, where C and Java check external references when compiling, in Python you can get away with something like this:

# bad-import.py
def foo():
  import bogus
  return ""

print("Hello!")
# Evaluates despite missing module
> python3 bad-import.py
Hello!
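The missing module only becomes a problem when the import statement actually runs. Here is the same idea taken one step further, again assuming no module named bogus is installed:

```python
def foo():
    import bogus  # resolved only when this line executes
    return ""


print("Hello!")  # reached, because defining foo() does not run its body

error = None
try:
    foo()
except ModuleNotFoundError as err:
    error = err
    print(f"import failed at call time: {err}")
```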

Python has different archive formats for distributing code and dependencies (e.g. wheels, eggs), but as far as I can tell, external dependencies are included in your program just by having the source or bytecode at one of the module search paths. For example, pip show requests tells me requests is installed at ~/Library/Python/2.7/lib/python/site-packages/requests/ (which is on the module search path, sys.path). There I see an __init__.py and other source files like api.py and session.py. Even though the Python Packaging User Guide is Python specific, it’s also interesting to see what problems exist in packaging (e.g. packaging non-Python files, specifying system/Python compatibility).
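The "it's just files in a directory" observation is easy to check from Python itself. This sketch uses the standard library's json package as a stand-in for requests so it runs without installing anything, and it assumes the standard library is stored as plain files on disk.

```python
import importlib.util
import os

# Locate the package without importing it.
spec = importlib.util.find_spec("json")
package_dir = os.path.dirname(spec.origin)
print(package_dir)

# An installed package is just source files (plus cached bytecode) in a directory.
files = sorted(os.listdir(package_dir))
print(files)
```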

Takeaways

Go

Go looks really similar to C! It compiles to native code.

Go has:

Linking executables looks much like C, but historically Go was statically linked. In more recent versions of Go, executables can be dynamically linked, e.g. if net or os/user packages are used or when interfacing with C. You can also compile Go into a dynamic library (shared object file) for use in C programs which is pretty neat!

Takeaways

Summary

By the end this started to feel repetitive, which is cool because I feel it shows we have a good mental model, even across languages. In each we’ve seen source code, a compiled version (object files, bytecode files), archives (just zips and tars!) for distribution and packaging, and linking and loading (and their search paths) for resolving external references.

There’s still a lot beyond this cursory overview though. I spent way too much time reading about classloaders, JARs and how the Java runtime is bootstrapped. Python has its own rules about loading which make things like virtualenv possible (apparently Python searches from its install location up directories until it finds the Python libraries?). But I think that with a decent understanding of where things fit, these are details, and it’s good to leave them until you need them.
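One visible result of those loading rules is that a running interpreter can tell you which installation it thinks it belongs to. A small sketch, assuming Python 3.3 or later with the standard venv mechanism (older virtualenv releases signalled this differently); the helper name is my own:

```python
import sys


def in_virtual_environment():
    """Report whether the interpreter's prefix differs from the base install."""
    return sys.prefix != sys.base_prefix


print(sys.executable)   # the interpreter binary that is running
print(sys.prefix)       # the environment whose libraries are used
print(sys.base_prefix)  # the underlying Python installation
print(in_virtual_environment())
```

Running this inside and outside an activated environment shows the two prefixes diverging, which is what redirects the site-packages lookup.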