Java, Serializable, and Externalizable
One of the features in JBuilder 2006 is peer-to-peer. JBuilder 2006 allows users to exchange messages, edit a file remotely, debug remotely, etc. (Yes, JBuilder 2007 also has peer-to-peer, but the main purpose of this particular post is to discuss a serialization issue we first discovered in implementing peer-to-peer in JB2006). The actual sending of bytes from one machine to another in JB2006 is done using JGroups. Without going into too much detail, the way you send a message (the bytes) from one machine to another with JGroups is via Java serialization. You might have a class like this:
public class PeerMessage implements java.io.Serializable {
    private static final long serialVersionUID = 1L;
    private final String message;
    private final long id;

    public PeerMessage(String message, long id) {
        this.message = message;
        this.id = id;
    }

    public long getId() {
        return id;
    }

    public String getMessage() {
        return message;
    }
}
You then create instances of the PeerMessage class and pass them to JGroups, which sends them to another machine. The other machine deserializes and processes the PeerMessage instance, e.g., perhaps by displaying PeerMessage.getMessage() in a chat window.
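To make that flow concrete, here is a minimal sketch of what happens to a PeerMessage between sender and receiver. It does not use JGroups at all. As an assumption for illustration, an in-memory byte array stands in for the network, and ObjectOutputStream/ObjectInputStream perform the default Java serialization. The class name PeerMessageRoundTrip and the helpers toBytes() and fromBytes() are hypothetical, and PeerMessage is nested inside it only so the example compiles on its own.

```java
import java.io.*;

// Sketch only: a byte array stands in for the network; JGroups is not used.
class PeerMessageRoundTrip {

    // Copy of the PeerMessage class from the post, nested so this compiles alone.
    public static class PeerMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String message;
        private final long id;

        public PeerMessage(String message, long id) {
            this.message = message;
            this.id = id;
        }

        public long getId() {
            return id;
        }

        public String getMessage() {
            return message;
        }
    }

    // Sending side: default Java serialization turns the object into bytes.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Receiving side: the bytes are turned back into an object.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = toBytes(new PeerMessage("Hello", 1L));
        PeerMessage received = (PeerMessage) fromBytes(wire);
        // The receiver could now, e.g., show the text in a chat window.
        System.out.println(received.getId() + ": " + received.getMessage()); // prints "1: Hello"
    }
}
```

In JBuilder the transfer itself is handled by JGroups rather than an in-memory buffer; the sketch only shows the serialize/deserialize pair at either end.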
When we first implemented this in R&D, after ironing out some kinks, it was pretty much working. Developers could connect with other developers. Then Integration started delivering builds to QA, and we found that some instances of JBuilder could communicate with certain other instances but not with others. We soon figured out what was going on. Our Integration department delivers both obfuscated and non-obfuscated builds to QA. The pattern was that obfuscated builds could only communicate with other obfuscated builds, and non-obfuscated builds could only communicate with non-obfuscated builds.
Does anybody see why the simple little class above is the problem? If not, the explanation follows. And, as we will see, the solution also produces less data when serializing objects, which is a very important benefit when you are sending data over a network.
What does the default Java serialization do? It essentially writes out the fully qualified class name, the class’s serialVersionUID, the names and types of the non-transient fields, and the values of those fields. When you deserialize, it creates an instance of the class and assigns each value to the field with the same name. (I am probably over-simplifying here. For purposes of this discussion, though, it’s all we need to know. If you’re really interested, check the Sun website for more details.)
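One way to see what default serialization records for a class is java.io.ObjectStreamClass, which describes a class as it appears in a serialization stream. The sketch below prints the class name, serialVersionUID, and the field names and types that would be written out. DescriptorDemo and its helper recordedFieldNames() are hypothetical names, and PeerMessage is a nested copy of the class above.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;

// Sketch: inspect what default serialization records for a class.
class DescriptorDemo {

    // Copy of the post's PeerMessage (getters omitted; methods are never serialized).
    public static class PeerMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String message;
        private final long id;

        public PeerMessage(String message, long id) {
            this.message = message;
            this.id = id;
        }
    }

    // Names of the fields whose names, types, and values go into the stream.
    static String[] recordedFieldNames(Class<?> cl) {
        ObjectStreamField[] fields = ObjectStreamClass.lookup(cl).getFields();
        String[] names = new String[fields.length];
        for (int i = 0; i < fields.length; i++) {
            names[i] = fields[i].getName();
        }
        return names;
    }

    public static void main(String[] args) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(PeerMessage.class);
        System.out.println("class: " + desc.getName());
        System.out.println("serialVersionUID: " + desc.getSerialVersionUID());
        for (ObjectStreamField f : desc.getFields()) {
            System.out.println("field: " + f.getName() + " (" + f.getType().getName() + ")");
        }
    }
}
```

For this class the descriptor lists id (long) and message (java.lang.String). The static serialVersionUID field is not among them, since static fields are not serialized.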
What does Java obfuscation do? It can rename package names, class names, method names, and field names. Most obfuscators give you control over what you want renamed. Typically you don’t want anything public or protected renamed; otherwise your API names would change.
So what happens when the above class is obfuscated? The fields message and id get renamed, probably to something cryptic like a and b (the getMessage() and getId() method names remain unchanged because they are public).
So when a message is sent from a non-obfuscated JBuilder to an obfuscated JBuilder:
- Non-obfuscated JBuilder serializes the PeerMessage instance, writing out, among other things, the field names id and message.
- Obfuscated JBuilder attempts to deserialize the PeerMessage. It doesn’t work! Obfuscated JBuilder has a class PeerMessage with fields a and b, but the serialized object has fields id and message, so the values cannot be matched up with the renamed fields.
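The mismatch can be reproduced without an obfuscator. This sketch is a simulation: rather than renaming the fields in the class, it renames them inside the serialized bytes (id to zz, message to mangled, keeping the lengths equal so the stream stays well-formed). That produces the same situation, a stream whose field names do not match the local class. ObfuscationSim, renameField(), and the replacement names are hypothetical. Because both sides here declare serialVersionUID = 1L, the read does not throw in this setup; the unmatched values are simply discarded and the fields keep their default values.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Simulation: field names are renamed inside the stream, not in the class,
// to mimic a sender and receiver that disagree on field names.
class ObfuscationSim {

    public static class PeerMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String message;
        private final long id;

        public PeerMessage(String message, long id) {
            this.message = message;
            this.id = id;
        }

        public long getId() { return id; }
        public String getMessage() { return message; }
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // A field name is stored as a 2-byte length followed by its (ASCII) bytes.
    // Only same-length renames are allowed so the stream stays well-formed.
    static void renameField(byte[] data, String oldName, String newName) {
        if (oldName.length() != newName.length()) {
            throw new IllegalArgumentException("names must have the same length");
        }
        byte[] name = oldName.getBytes(StandardCharsets.US_ASCII);
        byte[] pattern = new byte[name.length + 2];
        pattern[0] = (byte) (name.length >>> 8);
        pattern[1] = (byte) name.length;
        System.arraycopy(name, 0, pattern, 2, name.length);
        byte[] replacement = newName.getBytes(StandardCharsets.US_ASCII);
        search:
        for (int i = 0; i + pattern.length <= data.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue search;
                }
            }
            System.arraycopy(replacement, 0, data, i + 2, replacement.length);
            return;
        }
        throw new IllegalStateException("field name not found: " + oldName);
    }

    static PeerMessage simulate() {
        byte[] wire = serialize(new PeerMessage("Hello", 1L));
        renameField(wire, "id", "zz");           // like a renamed long field
        renameField(wire, "message", "mangled"); // like a renamed String field
        return (PeerMessage) deserialize(wire);
    }

    public static void main(String[] args) {
        PeerMessage received = simulate();
        System.out.println(received.getMessage() + " / " + received.getId()); // prints "null / 0"
    }
}
```

The object arrives, but its data does not, which is the same symptom as two builds whose field names disagree.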
Arguably we could have lived with this. We only ship obfuscated builds to customers, and the obfuscated builds were able to talk to each other. Even that would get dicey with updates, though: we would have to make sure all the relevant classes were obfuscated exactly the same way (e.g., the field message was always renamed to a); otherwise, different obfuscated versions might not work with each other. And it still made sense to fix it so developers could debug from their non-obfuscated builds while communicating with obfuscated builds. And, as I mentioned earlier, the solution produces smaller serialized objects, which should improve network performance.
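The solution we came up with is to use an interface I was personally unfamiliar with, java.io.Externalizable. It extends Serializable; the difference is that you write the serialization and deserialization code yourself. Here is the class PeerMessage2. Compared with PeerMessage, it implements Externalizable, its fields are no longer final, and it adds a public parameterless constructor plus the readExternal() and writeExternal() methods: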
import java.io.*;

public class PeerMessage2 implements Externalizable {
    private static final long serialVersionUID = 1L;
    private String message; // No longer final!
    private long id;        // No longer final!

    public PeerMessage2() { // New, parameterless constructor
    }

    public PeerMessage2(String message, long id) {
        this.message = message;
        this.id = id;
    }

    public long getId() {
        return id;
    }

    public String getMessage() {
        return message;
    }

    // Read the fields back in exactly the order writeExternal() wrote them.
    public void readExternal(ObjectInput in) throws IOException,
            ClassNotFoundException {
        id = in.readLong();
        message = in.readUTF();
    }

    // Note: writeUTF() throws NullPointerException for a null message and
    // UTFDataFormatException if the encoded string exceeds 65535 bytes.
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(message);
    }
}
With Externalizable, what is essentially written out is the fully qualified class name and whatever you specify in writeExternal(). (Again, I may be over-simplifying here, but it’s good enough to understand this problem.) No field names, no field types. That is why a serialized PeerMessage2 is smaller than a serialized PeerMessage. I wrote a simple test: I created a PeerMessage object with "Hello" and 1 as its parameters and serialized it to a file, then created a PeerMessage2 object with the same parameters and serialized it to a file. The serialized PeerMessage instance was 105 bytes; the serialized PeerMessage2 was 72 bytes. That’s a difference of 33 bytes, making PeerMessage about 46% larger than PeerMessage2, or PeerMessage2 about 31% smaller than PeerMessage, depending on how you want to skew the statistics. Of course, those numbers will vary according to the number of fields in the class, the length of the field names, and the size of the fields’ data. But the bottom line is that smaller serializations are produced.
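The size comparison is easy to repeat. This is a minimal sketch, not the original test: it measures the bytes in memory instead of writing files, and it nests copies of PeerMessage and PeerMessage2 inside a hypothetical SizeComparison class so the example is self-contained. Because the class name is part of both streams, the absolute numbers will differ from the 105 and 72 bytes reported above, but the Externalizable version should come out smaller.

```java
import java.io.*;

// Sketch: compares serialized sizes in memory; the post's test wrote files.
class SizeComparison {

    // Copy of the post's PeerMessage (default serialization).
    public static class PeerMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String message;
        private final long id;

        public PeerMessage(String message, long id) {
            this.message = message;
            this.id = id;
        }
    }

    // Copy of the post's PeerMessage2 (Externalizable).
    public static class PeerMessage2 implements Externalizable {
        private static final long serialVersionUID = 1L;
        private String message;
        private long id;

        public PeerMessage2() {
        }

        public PeerMessage2(String message, long id) {
            this.message = message;
            this.id = id;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(message);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            id = in.readLong();
            message = in.readUTF();
        }
    }

    // Number of bytes ObjectOutputStream produces for one object.
    static int serializedSize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        int plain = serializedSize(new PeerMessage("Hello", 1L));
        int ext = serializedSize(new PeerMessage2("Hello", 1L));
        System.out.println("Serializable:   " + plain + " bytes");
        System.out.println("Externalizable: " + ext + " bytes");
        System.out.println("difference:     " + (plain - ext) + " bytes");
    }
}
```

A longer package name inflates both numbers equally; the gap comes mainly from the field descriptors (names and type strings) that only the Serializable version writes.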
And what about the obfuscation issue? It’s no longer an issue. Field names are no longer written out, just the field data. So when readExternal() is executed, it doesn’t matter whether the fields are named message and id or a and b: the obfuscator renames the field references inside readExternal() and writeExternal() along with the fields themselves, so whatever the names are, the data gets assigned to them in the order it was written.
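To check that the Externalizable stream really carries no field names, this sketch serializes a copy of PeerMessage2, searches the raw bytes for the text message, and asks ObjectStreamClass how many fields it records for the class. It then reads the object back to confirm that readExternal() restores the data purely by position. FieldNameCheck and containsAscii() are hypothetical names used only for this example.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class FieldNameCheck {

    // Copy of the post's PeerMessage2, nested so this compiles on its own.
    public static class PeerMessage2 implements Externalizable {
        private static final long serialVersionUID = 1L;
        private String message;
        private long id;

        public PeerMessage2() {
        }

        public PeerMessage2(String message, long id) {
            this.message = message;
            this.id = id;
        }

        public long getId() { return id; }
        public String getMessage() { return message; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(message);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            id = in.readLong();     // same order as writeExternal()
            message = in.readUTF();
        }
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the ASCII text occurs anywhere in the byte array.
    static boolean containsAscii(byte[] data, String text) {
        byte[] needle = text.getBytes(StandardCharsets.US_ASCII);
        search:
        for (int i = 0; i + needle.length <= data.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    continue search;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new PeerMessage2("Hello", 1L));
        System.out.println("contains \"message\": " + containsAscii(wire, "message"));
        System.out.println("recorded fields: "
                + ObjectStreamClass.lookup(PeerMessage2.class).getFields().length);
        PeerMessage2 received = (PeerMessage2) deserialize(wire);
        System.out.println(received.getId() + ": " + received.getMessage());
    }
}
```

Expect false for the search, 0 recorded fields, and 1: Hello for the round trip. Renaming the fields to a and b would change none of that.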
Unfortunately, there are some drawbacks to using Externalizable:
- When an Externalizable object is deserialized, an instance of the class is created by invoking its public, parameterless constructor, and readExternal() then fills in the fields’ values. Therefore, you must declare a public, parameterless constructor. I don’t like exposing that. PeerMessage can only be instantiated by invoking the constructor with parameters, so I know the fields in that class will be initialized. With PeerMessage2, the object can be created without initializing the fields.
- Because of the requirement for a parameterless constructor, and because you have to assign the fields’ values in readExternal(), you can no longer make your fields final. I like final fields. It’s a commonly recommended practice. As Joshua Bloch’s Effective Java book says about classes, "Favor immutability". Yeah, you can still avoid having setters, so nobody from the outside can change the instance, but still…
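The first drawback is enforced at run time, not at compile time. In this sketch, an Externalizable class with only a parameterized constructor serializes without complaint, but reading it back fails with java.io.InvalidClassException, because the deserializer has no public parameterless constructor to call before readExternal(). ConstructorRequirement, NoDefaultConstructor, and roundTripFailure() are hypothetical names.

```java
import java.io.*;

class ConstructorRequirement {

    // Externalizable, but with no public parameterless constructor.
    public static class NoDefaultConstructor implements Externalizable {
        private static final long serialVersionUID = 1L;
        private String message;

        public NoDefaultConstructor(String message) {
            this.message = message;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(message);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            message = in.readUTF();
        }
    }

    // Returns the exception raised during the round trip, or null if none.
    static Exception roundTripFailure(Object obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj); // writing succeeds: no constructor is needed yet
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                in.readObject(); // reading needs the public no-arg constructor
            }
            return null;
        } catch (IOException | ClassNotFoundException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Exception e = roundTripFailure(new NoDefaultConstructor("Hello"));
        System.out.println(e);
    }
}
```

Adding an empty public NoDefaultConstructor() constructor makes the round trip succeed, and that is exactly the constructor the post would rather not expose.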
But despite the drawbacks, Externalizable is very useful in certain cases.
Charles
Posted by Charles Overbeck on January 24th, 2007 under Java


January 31st, 2007 at 5:12 am
Hi Charles,
You certainly have covered the size issue but what about speed? Is Externalizable faster than Serializable?
Furthermore, how do you deal with Classes that contain some Collection/List etc? And would calling super.readExternal() suffice to handle a hierarchy?
Finally, since you guys develop an IDE, is there any tool that would allow me to convert a Serializable class (or hierarchy) to Externalizable? That’d be a useful Wizard…
Thanks
Benoit