prose :: and :: conz


The JVM Bytes: Pilot Post

Thanks to a recent talk by Heroku’s Joe Kutner, I got inspired to learn how to program on the JVM. I’m sure you’re thinking I’ve been doing just that for years with Java, Groovy, and Scala. However, I’ve yet to write straight to bytecode. Joe’s talk covered the basics of how the JVM works under the hood of our languages. He introduced the JVM architecture and the tools for reading and writing our own bytecode. I have been writing a Lisp compiler lately to explore this subject, and I’ve decided it is time to kick off a blog series on JVM bytecode.

What is bytecode? It’s the content of the *.class files emitted by your compiler: a low-level binary format much like native machine code. The difference is that the JVM itself is another layer of software between your bytecode and the metal of your machine. To view the bytecode directly, you’ve gotta use a hex editor, and raw bytes are quite difficult to understand. What we want is something a little higher-level than the bytes, much like assembly language. There is no official assembly language for the JVM, though; the compilers go straight to the bytes. Fortunately the JDK includes javap, which prints the bytecode of a class file in a human-readable format. Invoke javap -v -c <classname> in the root directory of your class files to view it.
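
To get a feel for what "straight to the bytes" means before reaching for javap, here is a minimal sketch that reads only the first eight bytes of a class file. It assumes the standard class file layout (a 4-byte magic number followed by 2-byte minor and major versions, all big-endian); the class name ClassHeader and the method read are names I made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

final class ClassHeader {
    /** Returns {magic, minorVersion, majorVersion} from the start of a class file. */
    static int[] read(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();             // u4: should be 0xCAFEBABE
        int minor = buf.getShort() & 0xFFFF;  // u2: minor version
        int major = buf.getShort() & 0xFFFF;  // u2: major version
        return new int[] { magic, minor, major };
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassHeader <path-to-class-file>");
            return;
        }
        int[] header = read(Files.readAllBytes(Paths.get(args[0])));
        System.out.printf("magic=%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

Pointing it at a compiled class, for example java ClassHeader Hello.class, should report the same minor and major version numbers that javap prints near the top of its output.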

For example, compile a typical Hello World program in Java, then view the javap output. This Java class…

public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello Java!");
    }
}

…produces the following javap output…

>javap -v -c Hello
Compiled from "Hello.java"
public class Hello extends java.lang.Object
  SourceFile: "Hello.java"
  minor version: 0
  major version: 50
  Constant pool:
const #1 = Method       #6.#15; //  java/lang/Object."<init>":()V
const #2 = Field        #16.#17;        //  java/lang/System.out:Ljava/io/PrintStream;
const #3 = String       #18;    //  Hello Java!
const #4 = Method       #19.#20;        //  java/io/PrintStream.println:(Ljava/lang/String;)V
const #5 = class        #21;    //  Hello
const #6 = class        #22;    //  java/lang/Object
const #7 = Asciz        <init>;
const #8 = Asciz        ()V;
const #9 = Asciz        Code;
const #10 = Asciz       LineNumberTable;
const #11 = Asciz       main;
const #12 = Asciz       ([Ljava/lang/String;)V;
const #13 = Asciz       SourceFile;
const #14 = Asciz       Hello.java;
const #15 = NameAndType #7:#8;//  "<init>":()V
const #16 = class       #23;    //  java/lang/System
const #17 = NameAndType #24:#25;//  out:Ljava/io/PrintStream;
const #18 = Asciz       Hello Java!;
const #19 = class       #26;    //  java/io/PrintStream
const #20 = NameAndType #27:#28;//  println:(Ljava/lang/String;)V
const #21 = Asciz       Hello;
const #22 = Asciz       java/lang/Object;
const #23 = Asciz       java/lang/System;
const #24 = Asciz       out;
const #25 = Asciz       Ljava/io/PrintStream;;
const #26 = Asciz       java/io/PrintStream;
const #27 = Asciz       println;
const #28 = Asciz       (Ljava/lang/String;)V;

{
public Hello();
  Code:
   Stack=1, Locals=1, Args_size=1
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
   4:   return
  LineNumberTable:
   line 1: 0

public static void main(java.lang.String[]);
  Code:
   Stack=2, Locals=1, Args_size=1
   0:   getstatic       #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   3:   ldc     #3; //String Hello Java!
   5:   invokevirtual   #4; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   8:   return
  LineNumberTable:
   line 3: 0
   line 4: 8
}

Ok, so that is definitely a lot of information. I’m not going to attempt to digest all of it in this blog post. Frankly, I don’t understand it all, but we can begin making progress with a few observations.

First take note of the constant pool. If you have a fair amount of experience on the JVM, there is a good chance you are already aware that any String literals in your code are stored as constants for the entirety of the program. We can find our "Hello Java!" in this table as const #3 and const #18. The javap output hints at why it is split this way: const #3 is a String entry whose only content is a reference to #18, and const #18 is the Asciz (UTF-8) entry that holds the actual characters. The other entries work the same way, with Method, Field, class, and NameAndType entries all pointing at Asciz entries for their names and type descriptors.
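
One way to observe the "stored as constants for the entirety of the program" behavior from plain Java is to compare references rather than contents. This is a small sketch, not part of the Hello example; the class LiteralPool and its method names are made up for illustration. It relies on the language guarantee that identical string literals refer to the same String instance.

```java
final class LiteralPool {
    /** Two identical literals resolve to one shared String instance. */
    static boolean sameLiteral() {
        String a = "Hello Java!";
        String b = "Hello Java!";
        return a == b; // reference comparison on purpose
    }

    /** A copy made with new is a different object until it is interned. */
    static boolean copyIsDistinctUntilInterned() {
        String literal = "Hello Java!";
        String copy = new String(literal);
        return copy != literal && copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println("same literal instance: " + sameLiteral());
        System.out.println("copy distinct until interned: " + copyIsDistinctUntilInterned());
    }
}
```

Both lines print true. Comparing strings with == is normally a bug; it is used here only to make the sharing visible.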

Next notice the code below the constant pool. These are the bytecode mnemonics of the program. Let’s focus on the main method where we see four things happen:

  1. Get the static field out from java.lang.System.
  2. Do something cryptic with our "Hello Java!" string.
  3. Invoke the method println on java.io.PrintStream with an argument of type java.lang.String.
  4. Finally, we return from the method.

Notice that the invokevirtual has neither System.out nor "Hello Java!" specified. That is because in the previous two steps we put those values on a stack. The JVM is a stack-architecture machine. Every operation we have here (except return) operates on this stack. First we push the target object onto the stack, namely System.out. Then we load the argument string onto the stack from the run-time constant pool with ldc. Finally we invoke the appropriate method on the target object. This invocation pops those two items off the stack and runs the method.
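
The push-push-invoke sequence can be mimicked in ordinary Java with an explicit stack. This is only a toy model of the idea, not how the JVM is implemented; the class OperandStackSketch and the method runMain are hypothetical names, and the PrintStream is passed in as a parameter so the sketch does not depend on System.out.

```java
import java.io.PrintStream;
import java.util.ArrayDeque;
import java.util.Deque;

final class OperandStackSketch {
    /** Mimics the four instructions of Hello.main; returns the final stack depth. */
    static int runMain(PrintStream out) {
        Deque<Object> stack = new ArrayDeque<>();

        stack.push(out);            // getstatic: push the target object
        stack.push("Hello Java!");  // ldc: push the string constant

        // invokevirtual: the argument is on top, the target object beneath it
        String argument = (String) stack.pop();
        PrintStream target = (PrintStream) stack.pop();
        target.println(argument);

        return stack.size();        // return: nothing is left behind
    }

    public static void main(String[] args) {
        int depth = runMain(System.out);
        System.out.println("stack depth after the call: " + depth);
    }
}
```

The stack never holds more than two values at once, which lines up with the Stack=2 that javap reports for main.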

Don’t confuse this stack with the call stack that you are familiar with. Each frame of the call stack has its own copy of the stack I am referring to, a playground where the method can push and pop values to carry out its work.

The main takeaways are (1) javap for viewing bytecode, (2) the constant pool, and (3) the stack architecture. Stay tuned for the next installment where I will introduce some tooling to assist us in writing our own Hello World straight to JVM instructions.
