Class Emitter<N>
- Type Parameters:
N - the contents of the stack after having emitted all the previous bytecodes
This is either genius or a sign of some deep pathology. On one hand it allows the type-safe
generation of bytecode in Java classfiles. On the other, it requires an often onerous type
signature on any method of appreciable sophistication that uses it. The justification for this
utility library stems from our difficulties with error reporting in the ASM library. We certainly
appreciate the effort that has gone into that library, and must recognize its success in that it
has been used by the OpenJDK itself and eventually prompted them to devise an official classfile
API. Nevertheless, its analyses (e.g., max-stack computation) fail with inscrutable messages.
Admittedly, this only happens when we have generated invalid bytecode. For example, popping too
many items off the stack usually results in an ArrayIndexOutOfBoundsException instead of,
"Hey, you can't pop that here: [offset]". Similarly, if you push a long and then pop an int, you
typically get a NullPointerException. Unfortunately, these errors do not occur with the
offending visitXInstruction() call on the stack, but instead during
MethodVisitor.visitMaxs(int, int), and so we could not easily debug and identify the
cause. We did find some ways to place breakpoints and at least derive the bytecode offset. We
then used additional dumps and instrumentation to map that back to our source that generated the
offending instruction. This has been an extremely onerous process. Additionally, when refactoring
bytecode generation, we are left with little if any assistance from the compiler or IDE. These
utilities seek to improve the situation.
Our goal is to devise a way to leverage Java's Generics and its type checker to enforce stack
consistency of generated JVM bytecode. We want the Java compiler to reject code that tries, for
example, to emit an iload followed by an lstore, because there is clearly an
int on the stack where a long is required. We accomplish this by encoding the
stack contents (or at least the local knowledge of the stack contents) in this emitter's type
variable <N>. We encode the types of stack entries using a Lisp-style list. The bottom of
the stack is encoded as Emitter.Bot. A list is encoded with Emitter.Ent where the first type
parameter is the tail of the list (for things further down the stack), and the second type
parameter encodes the JVM machine type, e.g., Types.TInt, of the element at that position. The
head of this list, i.e., the type <N>, is the top of the stack.
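The following is a minimal, self-contained sketch of this encoding. It does not use the library. Next, Bot, Ent, TInt, TLong and Em are hypothetical stand-ins for Emitter.Next, Emitter.Bot, Emitter.Ent, Types.TInt, Types.TLong and Emitter. The toy emitter only records opcode names in a list and does not drive an ASM MethodVisitor.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the Lisp-style stack encoding; all names here are hypothetical
// stand-ins for Emitter.Next, Emitter.Bot, Emitter.Ent, Types.TInt and Types.TLong.
final class StackEncodingSketch {
    interface Next {}                                 // some stack contents
    interface Bot extends Next {}                     // the empty stack
    interface Ent<N extends Next, T> extends Next {}  // T on top of the rest, N

    interface TInt {}   // JVM int
    interface TLong {}  // JVM long

    // N records what is on the stack after everything emitted so far.
    static final class Em<N extends Next> {
        final List<String> code;

        Em(List<String> code) {
            this.code = code;
        }
    }

    // A push works for any N: the result type just adds one entry on top.
    static <N extends Next> Em<Ent<N, TInt>> iconst_1(Em<N> em) {
        em.code.add("iconst_1");
        return new Em<>(em.code);
    }

    static <N extends Next> Em<Ent<N, TLong>> lconst_0(Em<N> em) {
        em.code.add("lconst_0");
        return new Em<>(em.code);
    }

    public static void main(String[] args) {
        Em<Bot> em0 = new Em<>(new ArrayList<>());
        Em<Ent<Bot, TInt>> em1 = iconst_1(em0);
        // The outermost Ent is the top of the stack: here, the long.
        Em<Ent<Ent<Bot, TInt>, TLong>> em2 = lconst_0(em1);
        System.out.println(em2.code); // prints [iconst_1, lconst_0]
    }
}
```

Reading a type such as Ent<Ent<Bot, TInt>, TLong> from the outside in walks the stack from top to bottom: a long on top of an int on top of the empty stack.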
The resulting syntax for emitting code is a bit strange, but still quite effective in practice. A
problem we encounter in Java (and most OOP languages to our knowledge) is that an instance method
can always be invoked on a variable, no matter the variable's type parameters. Sure, we can
always throw an exception at runtime, but we want the compiler to reject it, which implies static
checking. Thus, while instance methods can be used for pure pushes, we cannot use them to
validate stack contents, e.g., for pops. Suppose we'd like to specify the lcmp bytecode
op. This would require two longs at the top of the stack, but there's no way we can
restrict <N> on the implied this parameter. Nor is there an obvious way to unpack
the contents of <N> so that we can remove the two Types.TLong entries and add a Types.TInt.
Instead, we must turn to static methods.
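Here is a sketch of the static-method approach. It uses hypothetical toy types again, not the library's Op class. lcmp is a static method. Its parameter type demands two longs on top of the stack, and its own type variable N binds to whatever lies beneath them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy types mirroring the stack encoding; not the library's classes.
final class StaticOpSketch {
    interface Next {}
    interface Bot extends Next {}
    interface Ent<N extends Next, T> extends Next {}
    interface TInt {}
    interface TLong {}

    static final class Em<N extends Next> {
        final List<String> code;

        Em(List<String> code) {
            this.code = code;
        }
        // An instance method declared here could neither insist that N has the
        // shape Ent<Ent<?, TLong>, TLong> nor name the part of N below the longs.
    }

    static <N extends Next> Em<Ent<N, TLong>> lconst_0(Em<N> em) {
        em.code.add("lconst_0");
        return new Em<>(em.code);
    }

    // lcmp pops two longs and pushes an int; the signature states exactly that.
    static <N extends Next> Em<Ent<N, TInt>> lcmp(Em<Ent<Ent<N, TLong>, TLong>> em) {
        em.code.add("lcmp");
        return new Em<>(em.code);
    }

    public static void main(String[] args) {
        Em<Bot> em0 = new Em<>(new ArrayList<>());
        Em<Ent<Bot, TInt>> em3 = lcmp(lconst_0(lconst_0(em0)));
        // lcmp(em0) or lcmp(lconst_0(em0)) would be rejected by the compiler.
        System.out.println(em3.code); // prints [lconst_0, lconst_0, lcmp]
    }
}
```

The nested calls in main list the ops in reverse order of emission. This readability problem is discussed next.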
This presents a different problem. We'd like to provide a syntax where the ops appear in the order they are emitted. Usually, we'd chain instance methods, like so:
em
    .ldc(1)
    .pop();
However, we've already ruled out instance methods. Were we to use static methods, we'd get something like:
Op.pop(Op.ldc(em, 1));
But that lists the ops inside out: pop appears before ldc, even though ldc is emitted first. We could instead use:
var em1 = Op.ldc(em, 1);
var em2 = Op.pop(em1);
However, that requires more syntactic cruft, not to mention the manual bookkeeping to ensure each step uses the emitter returned by the previous step. To work around this, we define instance
methods, e.g., emit(Function), that can accept references to static methods we provide,
each representing a JVM bytecode instruction. This allows those static methods to impose a
required structure on the stack. The static method can then return an emitter with a type
encoding the new stack contents. (See the Op class for examples.) Thus, we have a syntax
like:
em
    .emit(Op::ldc__i, 1)
    .emit(Op::pop);
While not ideal, it is succinct, allows method chaining, and displays the ops in order of
emission. (Note that we use this pattern even for pure pushes, where restricting <N> is
not necessary, just for syntactic consistency.) There are some rubs for operators that have
different forms, e.g., Op.ldc__i(Emitter, int), but as a matter of opinion, having to
specify the intended form here is a benefit. The meat of this class is just the specification of
the many arities of emit. It also includes some utilities for declaring local variables,
and the entry points for generating and defining methods.
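Below is a minimal sketch of the emit pattern. It uses hypothetical toy types, not the library's Emitter and Op. The sketch defines the 0-, 1- and 2-argument arities of emit; the last one uses a hand-written A3Function in the spirit of Emitter.A3Function. It also defines a few static ops, and ldc2__i among them is purely illustrative. Each emit just applies the method reference to the emitter and its arguments, so the op's signature does all of the stack checking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical toy versions of Emitter, its emit arities and a few Op methods.
final class EmitChainSketch {
    interface Next {}
    interface Bot extends Next {}
    interface Ent<N extends Next, T> extends Next {}
    interface TInt {}

    // Like BiFunction, but with one more argument (cf. Emitter.A3Function).
    @FunctionalInterface
    interface A3Function<X, A1, A2, R> {
        R apply(X x, A1 arg1, A2 arg2);
    }

    static final class Em<N extends Next> {
        final List<String> code;

        Em(List<String> code) {
            this.code = code;
        }

        // 0-argument operator: func receives only this emitter.
        <R> R emit(Function<? super Em<N>, R> func) {
            return func.apply(this);
        }

        // 1-argument operator.
        <R, A1> R emit(BiFunction<? super Em<N>, A1, R> func, A1 arg1) {
            return func.apply(this, arg1);
        }

        // 2-argument operator.
        <R, A1, A2> R emit(A3Function<? super Em<N>, A1, A2, R> func, A1 arg1, A2 arg2) {
            return func.apply(this, arg1, arg2);
        }
    }

    static <N extends Next> Em<Ent<N, TInt>> ldc__i(Em<N> em, int value) {
        em.code.add("ldc " + value);
        return new Em<>(em.code);
    }

    // Hypothetical two-argument op: pushes two int constants.
    static <N extends Next> Em<Ent<Ent<N, TInt>, TInt>> ldc2__i(Em<N> em, int a, int b) {
        em.code.add("ldc " + a);
        em.code.add("ldc " + b);
        return new Em<>(em.code);
    }

    static <N extends Next> Em<N> pop(Em<Ent<N, TInt>> em) {
        em.code.add("pop");
        return new Em<>(em.code);
    }

    public static void main(String[] args) {
        Em<Bot> end = new Em<Bot>(new ArrayList<>())
                .emit(EmitChainSketch::ldc__i, 1)
                .emit(EmitChainSketch::pop)
                .emit(EmitChainSketch::ldc2__i, 2, 3)
                .emit(EmitChainSketch::pop)
                .emit(EmitChainSketch::pop);
        // Appending one more .emit(EmitChainSketch::pop) would not compile.
        System.out.println(end.code); // prints [ldc 1, pop, ldc 2, ldc 3, pop, pop]
    }
}
```

Each emit overload has a different number of parameters, so Java selects the overload by arity. The generic signature of the referenced op then determines the resulting stack type R.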
To give an overall taste of using this utility library, here is an example of dynamically generating a class that implements an interface. Note that the interface itself is not dynamically generated. This is a common pattern, as it allows the generated method to be invoked without reflection.
interface MyIf {
    int myMethod(int a, String b);
}
<THIS extends MyIf> void doGenerate(ClassVisitor cv) {
    var mdescMyMethod = MthDesc.derive(MyIf::myMethod)
            .check(MthDesc::returns, Types.T_INT)
            .check(MthDesc::param, Types.T_INT)
            .check(MthDesc::param, Types.refOf(String.class))
            .check(MthDesc::build);
    TRef<THIS> typeThis = Types.refExtends(MyIf.class, "Lmy.pkg.ImplMyIf;");
    var paramsMyMethod = new Object() {
        Local<TRef<THIS>> this_;
        Local<TInt> a;
        Local<TRef<String>> b;
    };
    var retMyMethod = Emitter.start(typeThis, cv, ACC_PUBLIC, "myMethod", mdescMyMethod)
            .param(Def::param, Types.refOf(String.class), l -> paramsMyMethod.b = l)
            .param(Def::param, Types.T_INT, l -> paramsMyMethod.a = l)
            .param(Def::done, typeThis, l -> paramsMyMethod.this_ = l);
    retMyMethod.em()
            .emit(Op::iload, paramsMyMethod.a)
            .emit(Op::ldc__i, 10)
            .emit(Op::imul)
            .emit(Op::ireturn, retMyMethod.ret())
            .emit(Misc::finish);
}
Yes, there is a bit of repetition; however, this accomplishes all our goals and a little more.
Note that the generated bytecode is essentially type checked all the way through to the method
definition in the MyIf interface. Here is the key: were we to change the MyIf
interface, the compiler (and our IDE) would point out the inconsistency. The first such
errors would be on mdescMyMethod. So, we would adjust it to match the new definition. The
compiler would then point out issues at retMyMethod (assuming the parameters to
myMethod changed, and not just the return type). We would adjust it, along with the
contents of paramsMyMethod, to accept the new parameter handles. If the return type of
myMethod changed, then the inferred type of retMyMethod would change accordingly.
Now for the generated bytecode. The Op.iload(Emitter, Local) requires the given variable
handle to have type Types.TInt, and so if the parameter "a" changed type, the compiler
would point out that the opcode must also change. Similarly, the Op.imul(Emitter) requires
two ints and pushes an int result, so any resulting inconsistency will be caught. Finally, when
calling Op.ireturn(Emitter, RetReq), two things are checked: 1) there is indeed an int on
the stack, and 2) the return type of the method, witnessed by retMyMethod.ret(), is also
an int. There are occasional wrinkles, but for the most part, once we resolve all the
compilation errors, we are assured of type consistency in the generated code, both internally and
in its interface to other compiled code.
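The iload, imul and ireturn checks can be mimicked with toy types, as a sketch. In it, Local<T> is a hypothetical stand-in for a local-variable handle. Ret<T> is a hypothetical stand-in for the return-type witness obtained from retMyMethod.ret(). Dead is a hypothetical name for the stack contents after an unconditional return. None of these is the library's actual class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Toy model of the iload/imul/ireturn checks. Local, Ret and Dead are
// hypothetical stand-ins, not the library's classes.
final class ReturnWitnessSketch {
    interface Next {}
    interface Bot extends Next {}
    interface Dead extends Next {}  // stack contents after an unconditional return
    interface Ent<N extends Next, T> extends Next {}
    interface TInt {}

    // Handle for a local variable of JVM type T.
    static final class Local<T> {
        final int index;

        Local(int index) {
            this.index = index;
        }
    }

    // Witness that the method being generated returns T.
    static final class Ret<T> {}

    static final class Em<N extends Next> {
        final List<String> code;

        Em(List<String> code) {
            this.code = code;
        }

        <R> R emit(Function<? super Em<N>, R> func) {
            return func.apply(this);
        }

        <R, A1> R emit(BiFunction<? super Em<N>, A1, R> func, A1 arg1) {
            return func.apply(this, arg1);
        }
    }

    // Only an int-typed local can be loaded with iload.
    static <N extends Next> Em<Ent<N, TInt>> iload(Em<N> em, Local<TInt> local) {
        em.code.add("iload " + local.index);
        return new Em<>(em.code);
    }

    static <N extends Next> Em<Ent<N, TInt>> ldc__i(Em<N> em, int value) {
        em.code.add("ldc " + value);
        return new Em<>(em.code);
    }

    // Pops two ints and pushes their product.
    static <N extends Next> Em<Ent<N, TInt>> imul(Em<Ent<Ent<N, TInt>, TInt>> em) {
        em.code.add("imul");
        return new Em<>(em.code);
    }

    // Requires 1) an int on top of the stack and 2) a witness that the method returns int.
    static <N extends Next> Em<Dead> ireturn(Em<Ent<N, TInt>> em, Ret<TInt> ret) {
        em.code.add("ireturn");
        return new Em<>(em.code);
    }

    public static void main(String[] args) {
        Local<TInt> a = new Local<>(1);
        Ret<TInt> ret = new Ret<>();
        Em<Dead> end = new Em<Bot>(new ArrayList<>())
                .emit(ReturnWitnessSketch::iload, a)
                .emit(ReturnWitnessSketch::ldc__i, 10)
                .emit(ReturnWitnessSketch::imul)
                .emit(ReturnWitnessSketch::ireturn, ret);
        // If a or ret were typed for any machine type other than TInt,
        // this chain would not compile.
        System.out.println(end.code); // prints [iload 1, ldc 10, imul, ireturn]
    }
}
```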
Nested Class Summary
Nested Classes (each is a static interface):
- A 3-argument consumer
- Emitter.A3Function: A 3-argument function
- A 4-argument consumer
- Emitter.A4Function: A 4-argument function
- Emitter.A5Function: A 5-argument function
- Emitter.A6Function: A 6-argument function
- Emitter.A7Function: A 7-argument function
- Emitter.A8Function: An 8-argument function
- Emitter.Bot: The bottom of the stack, i.e., the empty stack
- Use in place of stack contents when code emitted at this point would be unreachable
- Emitter.Ent<N extends Emitter.Next, T extends Types.BNonVoid>: An entry on the stack
- Emitter.Next: Stack contents
Constructor Summary
- Emitter(org.objectweb.asm.MethodVisitor mv): Create a new emitter by wrapping the given method visitor.
Method Summary
- static <N extends Emitter.Next> Emitter<N> assume(org.objectweb.asm.MethodVisitor mv, N assumedStack): (Not recommended) Wrap the given method visitor with assumed stack contents
- <R, A1, A2> R emit(Emitter.A3Function<? super Emitter<N>, A1, A2, R> func, A1 arg1, A2 arg2): Emit a 2-argument operator
- <R, A1, A2, A3> R emit(Emitter.A4Function<Emitter<N>, A1, A2, A3, R> func, A1 arg1, A2 arg2, A3 arg3): Emit a 3-argument operator
- <R, A1, A2, A3, A4> R emit(Emitter.A5Function<? super Emitter<N>, A1, A2, A3, A4, R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4): Emit a 4-argument operator
- <R, A1, A2, A3, A4, A5> R emit(Emitter.A6Function<? super Emitter<N>, A1, A2, A3, A4, A5, R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4, A5 arg5): Emit a 5-argument operator
- <R, A1, A2, A3, A4, A5, A6> R emit(Emitter.A7Function<? super Emitter<N>, A1, A2, A3, A4, A5, A6, R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4, A5 arg5, A6 arg6): Emit a 6-argument operator
- <R, A1, A2, A3, A4, A5, A6, A7> R emit(Emitter.A8Function<? super Emitter<N>, A1, A2, A3, A4, A5, A6, A7, R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4, A5 arg5, A6 arg6, A7 arg7): Emit a 7-argument operator
- <R, A1> R emit(BiFunction<? super Emitter<N>, A1, R> func, A1 arg1): Emit a 1-argument operator
- <R> R emit(Function): Emit a 0-argument operator
- rootScope: Get the root scope for declaring local variables
- static <MR extends Types.BType, OT, N extends Emitter.Next> Methods.ObjDef<MR, OT, N> start(Types.TRef<OT> owner, org.objectweb.asm.ClassVisitor cv, int access, String name, Methods.MthDesc<MR, N> desc): Define an instance method
- static <MR extends Types.BType, N extends Emitter.Next> Methods.Def<MR, N> start(org.objectweb.asm.ClassVisitor cv, int access, String name, Methods.MthDesc<MR, N> desc): Define a static method
- static Emitter<Emitter.Bot> start(org.objectweb.asm.MethodVisitor mv): Wrap the given method visitor assuming an empty stack
Constructor Details
Emitter
public Emitter(org.objectweb.asm.MethodVisitor mv)
Create a new emitter by wrapping the given method visitor. Direct use of this constructor is not recommended, but is useful during transition from unchecked to checked bytecode generation.
- Parameters:
mv - the ASM method visitor
Method Details
rootScope
Get the root scope for declaring local variables
- Returns:
the root scope
emit
Emit a 0-argument operator
This can also be used to invoke generator subroutines whose only argument is the emitter.
- Type Parameters:
R - the return type
- Parameters:
func - the method reference, e.g., Op.pop(Emitter)
- Returns:
the value returned by func
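Here is a sketch of such a subroutine. It uses hypothetical toy types again, not the library. pushSum has the same shape as an op: it takes an emitter and returns the emitter for the resulting stack. This lets the 0-argument emit call it in the middle of a chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Toy emitter with hypothetical names; pushSum plays the role of a generator subroutine.
final class SubroutineSketch {
    interface Next {}
    interface Bot extends Next {}
    interface Ent<N extends Next, T> extends Next {}
    interface TInt {}

    static final class Em<N extends Next> {
        final List<String> code;

        Em(List<String> code) {
            this.code = code;
        }

        <R> R emit(Function<? super Em<N>, R> func) {
            return func.apply(this);
        }

        <R, A1> R emit(BiFunction<? super Em<N>, A1, R> func, A1 arg1) {
            return func.apply(this, arg1);
        }
    }

    static <N extends Next> Em<Ent<N, TInt>> ldc__i(Em<N> em, int value) {
        em.code.add("ldc " + value);
        return new Em<>(em.code);
    }

    static <N extends Next> Em<Ent<N, TInt>> iadd(Em<Ent<Ent<N, TInt>, TInt>> em) {
        em.code.add("iadd");
        return new Em<>(em.code);
    }

    static <N extends Next> Em<N> pop(Em<Ent<N, TInt>> em) {
        em.code.add("pop");
        return new Em<>(em.code);
    }

    // Same shape as an op: works on any stack N and leaves one more int on it.
    static <N extends Next> Em<Ent<N, TInt>> pushSum(Em<N> em) {
        return em
                .emit(SubroutineSketch::ldc__i, 2)
                .emit(SubroutineSketch::ldc__i, 3)
                .emit(SubroutineSketch::iadd);
    }

    public static void main(String[] args) {
        Em<Bot> end = new Em<Bot>(new ArrayList<>())
                .emit(SubroutineSketch::pushSum)
                .emit(SubroutineSketch::pop);
        System.out.println(end.code); // prints [ldc 2, ldc 3, iadd, pop]
    }
}
```

pushSum is generic in N, so the compiler checks its body once against an arbitrary stack. It can then be reused wherever its effect (one more int on top) fits.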
emit
public <R, A1> R emit(BiFunction<? super Emitter<N>, A1, R> func, A1 arg1)
Emit a 1-argument operator
This can also be used to invoke generator subroutines.
- Type Parameters:
R - the return type
- Parameters:
func - the method reference, e.g., Op.ldc__i(Emitter, int)
arg1 - the argument (other than the emitter) to pass to func
- Returns:
the value returned by func
emit
public <R, A1, A2> R emit(Emitter.A3Function<? super Emitter<N>, A1, A2, R> func, A1 arg1, A2 arg2)
Emit a 2-argument operator
This can also be used to invoke generator subroutines.
- Type Parameters:
R - the return type
- Parameters:
func - the method reference
arg1 - an argument (other than the emitter) to pass to func
arg2 - the next argument
- Returns:
the value returned by func
emit
public <R, A1, A2, A3> R emit(Emitter.A4Function<Emitter<N>, A1, A2, A3, R> func, A1 arg1, A2 arg2, A3 arg3)
Emit a 3-argument operator
This can also be used to invoke generator subroutines.
- Type Parameters:
R - the return type
- Parameters:
func - the method reference
arg1 - an argument (other than the emitter) to pass to func
arg2 - the next argument
arg3 - the next argument
- Returns:
the value returned by func
emit
public <R, A1, A2, A3, A4> R emit(Emitter.A5Function<? super Emitter<N>, A1, A2, A3, A4, R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4)
Emit a 4-argument operator
This can also be used to invoke generator subroutines.
- Type Parameters:
R - the return type
- Parameters:
func - the method reference
arg1 - an argument (other than the emitter) to pass to func
arg2 - the next argument
arg3 - the next argument
arg4 - the next argument
- Returns:
the value returned by func
emit
public <R, A1, A2, A3, A4, A5> R emit(Emitter.A6Function<? super Emitter<N>, A1, A2, A3, A4, A5, R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4, A5 arg5)
Emit a 5-argument operator
This can also be used to invoke generator subroutines.
- Type Parameters:
R - the return type
- Parameters:
func - the method reference
arg1 - an argument (other than the emitter) to pass to func
arg2 - the next argument
arg3 - the next argument
arg4 - the next argument
arg5 - the next argument
- Returns:
the value returned by func
emit
public <R, A1, A2, A3, A4, A5, A6> R emit(Emitter.A7Function<? super Emitter<N>, A1, A2, A3, A4, A5, A6, R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4, A5 arg5, A6 arg6)
Emit a 6-argument operator
This can also be used to invoke generator subroutines.
- Type Parameters:
R - the return type
- Parameters:
func - the method reference
arg1 - an argument (other than the emitter) to pass to func
arg2 - the next argument
arg3 - the next argument
arg4 - the next argument
arg5 - the next argument
arg6 - the next argument
- Returns:
the value returned by func
emit
public <R, A1, A2, A3, A4, A5, A6, A7> R emit(Emitter.A8Function<? super Emitter<N>, A1, A2, A3, A4, A5, A6, A7, R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4, A5 arg5, A6 arg6, A7 arg7)
Emit a 7-argument operator
This can also be used to invoke generator subroutines.
- Type Parameters:
R - the return type
- Parameters:
func - the method reference
arg1 - an argument (other than the emitter) to pass to func
arg2 - the next argument
arg3 - the next argument
arg4 - the next argument
arg5 - the next argument
arg6 - the next argument
arg7 - the next argument
- Returns:
the value returned by func
assume
public static <N extends Emitter.Next> Emitter<N> assume(org.objectweb.asm.MethodVisitor mv, N assumedStack)
(Not recommended) Wrap the given method visitor with assumed stack contents
start(ClassVisitor, int, String, MthDesc) or start(TRef, ClassVisitor, int, String, MthDesc) is recommended instead.
- Type Parameters:
N - the stack contents
- Parameters:
mv - the ASM method visitor
assumedStack - the assumed stack contents
- Returns:
the emitter
start
public static Emitter<Emitter.Bot> start(org.objectweb.asm.MethodVisitor mv)
Wrap the given method visitor assuming an empty stack
start(ClassVisitor, int, String, MthDesc) or start(TRef, ClassVisitor, int, String, MthDesc) is recommended instead.
- Parameters:
mv - the ASM method visitor
- Returns:
the emitter
start
public static <MR extends Types.BType, N extends Emitter.Next> Methods.Def<MR, N> start(org.objectweb.asm.ClassVisitor cv, int access, String name, Methods.MthDesc<MR, N> desc)
Define a static method
- Type Parameters:
MR - the type returned by the method
N - the parameter types of the method
- Parameters:
cv - the ASM class visitor
access - the access flags (static is added automatically)
name - the name of the method
desc - the method descriptor
- Returns:
an object to aid further definition of the method
start
public static <MR extends Types.BType, OT, N extends Emitter.Next> Methods.ObjDef<MR, OT, N> start(Types.TRef<OT> owner, org.objectweb.asm.ClassVisitor cv, int access, String name, Methods.MthDesc<MR, N> desc)
Define an instance method
- Type Parameters:
MR - the type returned by the method
OT - the type owning the method
N - the parameter types of the method
- Parameters:
owner - the owner type (as a reference type)
cv - the ASM class visitor
access - the access flags (static is forcibly removed)
name - the name of the method
desc - the method descriptor
- Returns:
an object to aid further definition of the method
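The two start methods adjust the access flags in opposite directions: the static variant adds the static bit and the instance variant removes it. The following is a minimal sketch of those two bit operations only. It is not the library's code, and the helper names are hypothetical. It uses the ACC_PUBLIC (0x0001) and ACC_STATIC (0x0008) values from the JVM specification, which ASM's Opcodes constants share.

```java
// Sketch of the access-flag adjustments described for the two start methods.
// Helper names are hypothetical; flag values are from the JVM specification.
final class AccessFlagsSketch {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_STATIC = 0x0008;

    // Static method definition: make sure the static bit is set.
    static int forStaticMethod(int access) {
        return access | ACC_STATIC;
    }

    // Instance method definition: make sure the static bit is cleared.
    static int forInstanceMethod(int access) {
        return access & ~ACC_STATIC;
    }

    public static void main(String[] args) {
        System.out.println(forStaticMethod(ACC_PUBLIC));                // prints 9
        System.out.println(forInstanceMethod(ACC_PUBLIC | ACC_STATIC)); // prints 1
    }
}
```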