Java Q&A
January 24, 2003
Q: From a performance point of view, what are the advantages and disadvantages of implementing deep cloning via Java serialization versus the built-in Object.clone() method?
A: Equipping the classes in your application with a correct clone() implementation is essential to many defensive programming patterns. Common examples include defensive copying of method parameters, cloning internal fields before returning them from getters, implementing immutability patterns, and implementing containers with deep cloning semantics.
Even though the question mentions just two possibilities, there are at least four distinct approaches to implementing clone(). In this Java Q&A installment, I consider the design and performance tradeoffs involved in all of them.
Because cloning is so customizable, this article's examples will not necessarily translate directly to your own code; however, the general conclusions we reach should provide useful guidelines for any application design.
Note the following disclaimer: exactly what constitutes a deep clone is debatable. Two objects can safely share the same String reference when it is viewed as data, but they cannot if the same field is used as an instance-scoped object monitor (for example, by calling Object.wait()/notify() on it) or if the field's instance identity (as tested with the == operator) is significant to the design. In the end, whether a field is shareable depends on the class design. For simplicity, I assume below that all fields are used as pure data.
Performance measurements setup
Let's jump right into some code. I use the following simple hierarchy of classes as my cloning guinea pig:
import java.io.Serializable;

public class TestBaseClass implements Cloneable, Serializable
{
    public TestBaseClass (String dummy) {
        m_byte = (byte) 1;
        m_short = (short) 2;
        m_long = 3L;
        m_float = 4.0F;
        m_double = 5.0;
        m_char = '6';
        m_boolean = true;
        m_int = 16;
        m_string = "some string in TestBaseClass";

        m_ints = new int [m_int];
        for (int i = 0; i < m_ints.length; ++ i) m_ints [i] = m_int;

        m_strings = new String [m_int];
        m_strings [0] = m_string; // invariant: m_strings [0] == m_string
        for (int i = 1; i < m_strings.length; ++ i)
            m_strings [i] = new String (m_string);
    }

    public TestBaseClass (final TestBaseClass obj) {
        if (obj == null) throw new IllegalArgumentException ("null input: obj");

        // Copy all fields:
        m_byte = obj.m_byte;
        m_short = obj.m_short;
        m_long = obj.m_long;
        m_float = obj.m_float;
        m_double = obj.m_double;
        m_char = obj.m_char;
        m_boolean = obj.m_boolean;
        m_int = obj.m_int;
        m_string = obj.m_string;

        if (obj.m_ints != null) m_ints = (int []) obj.m_ints.clone ();
        if (obj.m_strings != null) m_strings = (String []) obj.m_strings.clone ();
    }

    // Cloneable:
    public Object clone () {
        if (Main.OBJECT_CLONE) {
            try {
                // Chain shallow field work to Object.clone():
                final TestBaseClass clone = (TestBaseClass) super.clone ();

                // Set deep fields:
                if (m_ints != null) clone.m_ints = (int []) m_ints.clone ();
                if (m_strings != null) clone.m_strings = (String []) m_strings.clone ();

                return clone;
            } catch (CloneNotSupportedException e) {
                throw new InternalError (e.toString ());
            }
        }
        else if (Main.COPY_CONSTRUCTOR)
            return new TestBaseClass (this);
        else if (Main.SERIALIZATION)
            return SerializableClone.clone (this);
        else if (Main.REFLECTION)
            return ReflectiveClone.clone (this);
        else
            throw new RuntimeException ("select cloning method");
    }

    protected TestBaseClass () {} // accessible to subclasses only

    private byte m_byte;
    private short m_short;
    private long m_long;
    private float m_float;
    private double m_double;
    private char m_char;
    private boolean m_boolean;
    private int m_int;
    private int [] m_ints;
    private String m_string;
    private String [] m_strings; // invariant: m_strings [0] == m_string

} // end of class

public final class TestClass extends TestBaseClass implements Cloneable, Serializable
{
    public TestClass (String dummy) {
        super (dummy);

        m_int = 4;
        m_object1 = new TestBaseClass (dummy);
        m_object2 = m_object1; // invariant: m_object1 == m_object2

        m_objects = new Object [m_int];
        for (int i = 0; i < m_objects.length; ++ i)
            m_objects [i] = new TestBaseClass (dummy);
    }

    public TestClass (final TestClass obj) {
        // Chain to super copy constructor:
        super (obj);

        // Copy all fields declared by this class:
        m_int = obj.m_int;
        if (obj.m_object1 != null)
            m_object1 = ((TestBaseClass) obj.m_object1).clone ();
        m_object2 = m_object1; // preserve the invariant

        if (obj.m_objects != null) {
            m_objects = new Object [obj.m_objects.length];
            for (int i = 0; i < m_objects.length; ++ i)
                m_objects [i] = ((TestBaseClass) obj.m_objects [i]).clone ();
        }
    }

    // Cloneable:
    public Object clone () {
        if (Main.OBJECT_CLONE) {
            // Chain shallow field work to Object.clone():
            final TestClass clone = (TestClass) super.clone ();

            // Set only deep fields declared by this class:
            if (m_object1 != null)
                clone.m_object1 = ((TestBaseClass) m_object1).clone ();
            clone.m_object2 = clone.m_object1; // preserve the invariant

            if (m_objects != null) {
                clone.m_objects = (Object []) m_objects.clone ();
                for (int i = 0; i < m_objects.length; ++ i)
                    clone.m_objects [i] = ((TestBaseClass) m_objects [i]).clone ();
            }

            return clone;
        }
        else if (Main.COPY_CONSTRUCTOR)
            return new TestClass (this);
        else if (Main.SERIALIZATION)
            return SerializableClone.clone (this);
        else if (Main.REFLECTION)
            return ReflectiveClone.clone (this);
        else
            throw new RuntimeException ("select cloning method");
    }

    protected TestClass () {} // accessible to subclasses only

    private int m_int;
    private Object m_object1, m_object2; // invariant: m_object1 == m_object2
    private Object [] m_objects;

} // End of class
TestBaseClass has several fields of primitive types as well as a String field and a couple of array fields. TestClass both extends TestBaseClass and aggregates several instances of it. This setup lets us see how inheritance, member object ownership, and data types affect cloning design and performance.
In a previous Java Q&A article, I developed a simple timing library that comes in handy now. This code in class Main measures the cost of TestClass.clone():
// Create an ITimer:
final ITimer timer = TimerFactory.newTimer ();

// JIT/hotspot warmup:
// ...
TestClass obj = new TestClass ();

// Warm up clone():
// ...
final int repeats = 1000;

timer.start ();
// Note: the loop is unrolled 10 times
for (int i = 0; i < repeats / 10; ++ i) {
    obj = (TestClass) obj.clone ();
    // ... repeated 10 times ...
}
timer.stop ();

final DecimalFormat format = new DecimalFormat ();
format.setMinimumFractionDigits (3);
format.setMaximumFractionDigits (3);
System.out.println ("method duration: " + format.format (timer.getDuration () / repeats) + " ms");
I use the high-resolution timer supplied by TimerFactory with a loop that creates a moderate number of cloned objects. The elapsed time reading is reliable, and there is little interference from the garbage collector. Note how the obj variable is continuously reassigned to avoid memory caching effects.
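For readers without the timing library at hand, the same measurement pattern can be approximated with the standard System.nanoTime() call. This is only a sketch under stated assumptions: NanoTimingSketch, Payload, and the warmup count are hypothetical names and values chosen for illustration, and System.nanoTime() is a swapped-in substitute for the ITimer used in this article.

```java
class NanoTimingSketch {
    /** Hypothetical cloneable payload, standing in for TestClass. */
    static final class Payload implements Cloneable {
        int[] data = new int[16];

        @Override
        public Payload clone() {
            try {
                Payload copy = (Payload) super.clone();
                copy.data = data.clone(); // deep-copy the only mutable field
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new InternalError(e.toString());
            }
        }
    }

    /** Returns the average cost of one clone() call, in milliseconds. */
    static double averageCloneMillis(int repeats) {
        Payload obj = new Payload();
        // Warm up so the JIT has a chance to compile clone() before timing starts.
        for (int i = 0; i < 10_000; i++) {
            obj = obj.clone();
        }
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            obj = obj.clone(); // reassign obj so each iteration clones the previous clone
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / repeats;
    }

    public static void main(String[] args) {
        System.out.printf("method duration: %.6f ms%n", averageCloneMillis(1000));
    }
}
```

The absolute numbers from such a sketch will differ from the measurements reported here; only the relative comparison between cloning approaches on the same machine is meaningful.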
Also note how clone() is implemented in both classes. Each class in fact contains four implementations, selected one at a time by four conditional compilation constants in Main: OBJECT_CLONE, COPY_CONSTRUCTOR, SERIALIZATION, and REFLECTION. Because these are compile-time constants, recompile all of the classes when changing the cloning approach.
Let's now examine each approach in detail.
Approach 1: Cloning by chaining to Object.clone()
This is perhaps the most classical approach. The steps involved are:
Declare your class to implement the Cloneable marker interface.
Provide a public clone() override that always begins with a call to super.clone(), followed by manual copying of all deep fields (i.e., mutable fields that are object references and cannot be shared between several instances of the parent class).
Declare your clone() override not to throw any exceptions, including CloneNotSupportedException. To this effect, the clone() method in your hierarchy's first class that subclasses a non-Cloneable class catches CloneNotSupportedException and wraps it into an InternalError.
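The three steps above can be sketched on a small class. CloneableAccount and its fields are hypothetical names used only for illustration; they are not part of the test hierarchy.

```java
import java.util.ArrayList;

class CloneableAccount implements Cloneable {   // step 1: marker interface
    private String m_owner;                     // immutable: safe to share
    private ArrayList<String> m_history;        // mutable: a deep field

    CloneableAccount(String owner) {
        m_owner = owner;
        m_history = new ArrayList<>();
    }

    void record(String entry) { m_history.add(entry); }
    int historySize() { return m_history.size(); }
    String owner() { return m_owner; }

    // Steps 2 and 3: public override that declares no checked exceptions.
    @Override
    public CloneableAccount clone() {
        try {
            // Shallow copy of every field, done by Object.clone():
            CloneableAccount copy = (CloneableAccount) super.clone();
            // Manual copy of the one deep field:
            copy.m_history = new ArrayList<>(m_history);
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable.
            throw new InternalError(e.toString());
        }
    }

    public static void main(String[] args) {
        CloneableAccount a = new CloneableAccount("alice");
        a.record("opened");
        CloneableAccount b = a.clone();
        b.record("deposit");
        System.out.println(a.historySize() + " " + b.historySize()); // prints "1 2"
    }
}
```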
Correct implementation of Cloneable easily deserves a separate article. Because my focus is on measuring performance, I repeat only the relevant points here and direct readers to existing references for further details (see Resources).
This traditional approach is particularly well suited to the presence of inheritance because the chain of super.clone() calls eventually reaches the native java.lang.Object.clone() implementation. This is good for two reasons. First, this native method has the magic ability to always create an instance of the most derived class for the current object. That is, the result of super.clone() in TestBaseClass is an instance of TestClass when TestBaseClass.clone() is part of the chain of methods originating from TestClass.clone(). This makes it easy to implement the desirable x.clone().getClass() == x.getClass() invariant even in the presence of inheritance.
Second, if you examine the JVM sources, you will see that at the heart of java.lang.Object.clone() is the memcpy C function, usually implemented in very efficient assembly on a given platform; so I expect the method to act as a fast "bit-blasting" shallow clone implementation, replicating all shallow fields in one fell swoop. In many cases, the only remaining manual coding is the deep cloning of object reference fields that point to unshareable mutable objects.
Running the test with the OBJECT_CLONE variable set to true on a 550-MHz Windows machine with Sun Microsystems' JDK 1.4.1 produces:
clone implementation: Object.clone()
method duration: 0.033 ms
This is not bad for a class with multiple primitive and object reference fields. But for better insight, I must compare the result with the other approaches below.
Despite its advantages, this approach is plagued with problems due to the poor design of java.lang.Object.clone(). It cannot be used for cloning final fields unless they can be copied shallowly. Creating smart, deeply cloning container classes is complicated by the fact that Cloneable is just a marker interface and java.lang.Object.clone() is not public. Finally, cloning inner classes does not work due to problems with outer references. See the articles by Mark Davis and Steve Ball in Resources for some of the earliest discussions of this topic.
Approach 2: Cloning via copy construction
This approach complements Approach 1. It involves these steps:
For every class X, provide a copy constructor with signature X(X x).
Chain to the base class's copy constructor in all but the first class in your hierarchy. You can chain to clone() or directly to the base copy constructor. The former choice is more polymorphic and works when the base copy constructor is private, and the latter sometimes avoids the small cost of casting clone()'s return value to a specific type.
Following the chaining call, set all class fields by copying them from the input parameter. For every object reference field, you decide individually whether to clone it deeply.
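As a minimal sketch of these steps, consider a two-level hierarchy in which each class has a copy constructor and clone() simply delegates to it. BasePoint and LabeledPoint are hypothetical names for illustration only. Note that the fields can be final, which Approach 1 cannot handle for deep fields.

```java
class BasePoint {
    private final int[] m_coords;   // final and mutable: copied deeply below

    BasePoint(int x, int y) { m_coords = new int[] {x, y}; }

    // Copy constructor for the first class in the hierarchy:
    BasePoint(BasePoint other) {
        if (other == null) throw new IllegalArgumentException("null input: other");
        m_coords = other.m_coords.clone(); // deep copy of the array
    }

    void move(int dx) { m_coords[0] += dx; }
    int x() { return m_coords[0]; }

    @Override
    public Object clone() { return new BasePoint(this); }
}

final class LabeledPoint extends BasePoint {
    private final String m_label;   // immutable: shared by reference

    LabeledPoint(int x, int y, String label) {
        super(x, y);
        m_label = label;
    }

    LabeledPoint(LabeledPoint other) {
        super(other);               // chain directly to the base copy constructor
        m_label = other.m_label;    // then copy the fields declared here
    }

    String label() { return m_label; }

    @Override
    public Object clone() { return new LabeledPoint(this); }

    public static void main(String[] args) {
        LabeledPoint p = new LabeledPoint(1, 2, "origin");
        LabeledPoint q = (LabeledPoint) p.clone();
        q.move(5);
        System.out.println(p.x() + " " + q.x() + " " + q.label()); // prints "1 6 origin"
    }
}
```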
Setting COPY_CONSTRUCTOR to true and rerunning the test produces:
clone implementation: copy construction
method duration: 0.024 ms
This beats Approach 1. The result might not be surprising because, with increasing JDK versions, the overhead of native method calls has increased and the cost of new object creation has decreased. If you rerun the same tests in Sun's JDK 1.2.2, the situation favors Approach 1. Of course, performance depends on the relative mix of shallow and deep fields in the class hierarchy. Classes with many primitive type fields benefit more from Approach 1. Classes with a few mostly immutable fields work very efficiently with Approach 2, which can be at least 10 times faster than Approach 1 for them.
Approach 2 is more error prone than Approach 1 because it is easy to forget to override clone() and accidentally inherit a superclass's version that returns an object of the wrong type. If you make the same mistake in Approach 1, the result is less disastrous. Additionally, the implementation in Approach 2 is harder to maintain when fields are added and removed (compare the OBJECT_CLONE branch in TestBaseClass.clone() with the similar code in the copy constructor). Also, Approach 1 requires less class cooperation in some cases: for a base class with only shallow fields, you don't need to implement Cloneable or even provide a clone() override if you do not intend to clone at the base class level.
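The forgotten-override pitfall can be reproduced with two tiny hierarchies, one per approach. All class names here are hypothetical. In both cases the subclass omits its own clone(); only the copy construction variant silently returns an object of the wrong type.

```java
class CtorBase {
    CtorBase() {}
    CtorBase(CtorBase other) {}

    @Override
    public Object clone() { return new CtorBase(this); }  // Approach 2
}

class CtorDerived extends CtorBase { }   // forgot to override clone()

class ChainBase implements Cloneable {
    @Override
    public Object clone() {                                // Approach 1
        try {
            return super.clone();
        } catch (CloneNotSupportedException e) {
            throw new InternalError(e.toString());
        }
    }
}

class ChainDerived extends ChainBase { } // forgot to override clone()

class ForgottenOverrideDemo {
    public static void main(String[] args) {
        Object viaCtor = new CtorDerived().clone();
        Object viaChain = new ChainDerived().clone();
        System.out.println(viaCtor.getClass().getSimpleName());  // prints "CtorBase"
        System.out.println(viaChain.getClass().getSimpleName()); // prints "ChainDerived"
    }
}
```

With Approach 1 the inherited clone() still yields the right class, although any deep fields added by the subclass would only be copied shallowly.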
However, an undeniable advantage of cloning via copy construction is that it can handle both final fields and inner classes. But due to the dangers present when inheritance is involved, I recommend using it sparingly, and preferably together with making the relevant classes final.
Approach 3: Cloning via Java serialization
Java serialization is convenient. Many classes are made serializable by simply declaring them to implement java.io.Serializable. Thus, a whole hierarchy of classes can be made cloneable by deriving them from a base Serializable class whose clone() is implemented as a simple, yet extremely generic method:
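import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public abstract class SerializableClone
{
    // Static so that it can be invoked as SerializableClone.clone (this):
    public static Object clone (final Object obj) {
        try {
            // Flatten the object graph into an in-memory byte array:
            ByteArrayOutputStream out = new ByteArrayOutputStream ();
            ObjectOutputStream oout = new ObjectOutputStream (out);
            oout.writeObject (obj);
            oout.flush (); // make sure all bytes have reached 'out'

            // Rebuild a new object graph from those bytes:
            ObjectInputStream in = new ObjectInputStream (
                new ByteArrayInputStream (out.toByteArray ()));
            return in.readObject ();
        } catch (Exception e) {
            throw new RuntimeException ("cannot clone class [" + obj.getClass ().getName () + "] via serialization: " + e.toString ());
        }
    }
}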
This is so generic that it can be used for cloning classes written and added to your application by someone else long after you provide the base classes. But this convenience comes at a price. After switching TestBaseClass.clone() and TestClass.clone() to the SERIALIZATION branch, I get:
clone implementation: serialization
method duration: 2.724 ms
This is roughly 100 times slower than Approaches 1 and 2. You probably would not want this option for defensive cloning of parameters of otherwise fast intra-JVM methods. Even though this method can be used for generic containers with deep cloning semantics, cloning a few hundred objects would take times in the one-second range: a doubtful prospect.
There are several reasons why this approach is so slow. Serialization depends on reflective discovery of class metadata, which is known to be much slower than normal method calls. Furthermore, because a temporary input/output (I/O) stream is used to flatten the entire object, the process involves UTF-8 encoding and writing out every character of, say, TestBaseClass.m_string. Compared to that, Approaches 1 and 2 only copy String references; each copy step has the same small fixed cost.
What's even worse, ObjectOutputStream and ObjectInputStream perform a lot of unnecessary work. For example, writing out class metadata (class name, field names, metadata checksum, etc.) that may need to be reconciled with a different class version on the receiving end is pure overhead when you serialize a class within the same ClassLoader namespace.
On the plus side, serialization imposes fairly light constructor requirements (the first non-Serializable superclass must have an accessible no-arg constructor) and correctly handles final fields and inner classes. This is because native code constructs the clone and populates its fields without using any constructors (something that can't be done in pure Java).
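The claim that deserialization populates the clone without running the serializable class's own constructor, even for final fields, can be checked with a small experiment. SerialFinalDemo, Token, and roundTrip are hypothetical names introduced for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerialFinalDemo {
    static int constructorCalls = 0;

    static final class Token implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int[] m_values;   // final and mutable: a deep final field

        Token(int[] values) {
            ++constructorCalls;         // counts every real constructor call
            m_values = values;
        }

        int[] values() { return m_values; }
    }

    /** Clones 'obj' by writing it to a byte array and reading it back. */
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (ObjectOutputStream oout = new ObjectOutputStream(out)) {
                oout.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(out.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cannot clone via serialization", e);
        }
    }

    public static void main(String[] args) {
        Token original = new Token(new int[] {1, 2, 3});
        Token copy = (Token) roundTrip(original);
        System.out.println(constructorCalls);                   // prints "1"
        System.out.println(copy.values() != original.values()); // prints "true"
        System.out.println(copy.values()[2]);                   // prints "3"
    }
}
```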
One more interesting advantage of Approach 3 is that it can preserve the structure of the object graph rooted at the source object. Examine the dummy TestBaseClass constructor. It stores the m_string reference itself in m_strings[0] and fills the remaining slots with distinct String copies of the same content. Without any special effort on our part, the invariant m_strings[0] == m_string is preserved in the cloned object. In Approaches 1 and 2, the same effect is either purely incidental (such as when immutable objects remain shared by reference) or requires explicit coding (as with m_object1 and m_object2 in TestClass). The latter can be hard to get right in general, especially when object identities are established at runtime and not at compile time (as is the case with TestClass).
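A small experiment illustrates this graph-preserving behavior: two fields that refer to the same array still refer to a single (new) array after a serialization round trip. GraphPreservationDemo and Pair are hypothetical names used for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class GraphPreservationDemo {
    static final class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        int[] first;
        int[] second;   // invariant in this demo: first == second
    }

    /** Clones 'obj' by writing it to a byte array and reading it back. */
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (ObjectOutputStream oout = new ObjectOutputStream(out)) {
                oout.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(out.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cannot clone via serialization", e);
        }
    }

    public static void main(String[] args) {
        Pair original = new Pair();
        original.first = new int[] {42};
        original.second = original.first;

        Pair copy = (Pair) roundTrip(original);
        System.out.println(copy.first == copy.second);     // prints "true"
        System.out.println(copy.first == original.first);  // prints "false"
    }
}
```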
Approach 4: Cloning via Java reflection
Approach 4 draws inspiration from Approach 3. Anything that uses reflection can work on a variety of classes in a generic way. If I require the class in question to have a (not necessarily public) no-arg constructor, I can easily create an empty instance using reflection. This is especially efficient when the no-arg constructor doesn't do anything. Then it is a straightforward matter to walk the class's inheritance chain all the way to Object.class and set all (not just public) declared instance fields for each superclass in the chain. For each field, I check whether it contains a primitive value, an immutable object reference, or an object reference that needs to be cloned recursively. The idea is straightforward, but getting it to work well requires handling a few details. My full demo implementation is in class ReflectiveClone, available as a separate download. Here is the pseudo-code of the full implementation, with some details and all error handling omitted for simplicity:
import java.lang.reflect.Array;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;

// Pseudo-code: checked reflection exceptions are not handled here.
public abstract class ReflectiveClone
{
    /**
     * Makes a reflection-based deep clone of 'obj'. This method is mutually
     * recursive with {@link #setFields}.
     *
     * @param obj current source object being cloned
     * @return obj's deep clone [never null; can be == to 'obj']
     */
    public static Object clone (final Object obj) {
        final Class objClass = obj.getClass ();
        final Object result;

        if (objClass.isArray ()) {
            final int arrayLength = Array.getLength (obj);

            if (arrayLength == 0) // empty arrays are immutable
                return obj;
            else {
                final Class componentType = objClass.getComponentType ();

                // Even though arrays implicitly have a public clone(), it
                // cannot be invoked reflectively, so need to do copy construction:
                result = Array.newInstance (componentType, arrayLength);

                if (componentType.isPrimitive () ||
                    FINAL_IMMUTABLE_CLASSES.contains (componentType)) {
                    System.arraycopy (obj, 0, result, 0, arrayLength);
                } else {
                    for (int i = 0; i < arrayLength; ++ i) {
                        // Recursively clone each array slot:
                        final Object slot = Array.get (obj, i);
                        if (slot != null) {
                            final Object slotClone = clone (slot);
                            Array.set (result, i, slotClone);
                        }
                    }
                }

                return result;
            }
        } else if (FINAL_IMMUTABLE_CLASSES.contains (objClass)) {
            return obj;
        }

        // Fall through to reflectively populating an instance created
        // via a no-arg constructor:

        // clone = objClass.newInstance () can't handle private constructors:
        Constructor noarg = objClass.getDeclaredConstructor (EMPTY_CLASS_ARRAY);
        if ((Modifier.PUBLIC & noarg.getModifiers ()) == 0) {
            noarg.setAccessible (true);
        }
        result = noarg.newInstance (EMPTY_OBJECT_ARRAY);

        for (Class c = objClass; c != Object.class; c = c.getSuperclass ()) {
            setFields (obj, result, c.getDeclaredFields ());
        }

        return result;
    }

    /**
     * This method copies all declared 'fields' from 'src' to 'dest'.
     *
     * @param src source object
     * @param dest src's clone [not fully populated yet]
     * @param fields fields to be populated
     */
    private static void setFields (final Object src, final Object dest,
                                   final Field [] fields) {
        for (int f = 0, fieldsLength = fields.length; f < fieldsLength; ++ f) {
            final Field field = fields [f];
            final int modifiers = field.getModifiers ();

            if ((Modifier.STATIC & modifiers) != 0) continue;

            // Can also skip transient fields here if you want reflective
            // cloning to be more like serialization.

            if ((Modifier.FINAL & modifiers) != 0)
                throw new RuntimeException ("cannot set final field " +
                    field.getName () + " of class " + src.getClass ().getName ());

            if ((Modifier.PUBLIC & modifiers) == 0) field.setAccessible (true);

            Object value = field.get (src);

            if (value == null)
                field.set (dest, null);
            else {
                final Class valueType = value.getClass ();

                if (! valueType.isPrimitive () &&
                    ! FINAL_IMMUTABLE_CLASSES.contains (valueType)) {
                    // Value is an object reference, and it could be either an
                    // array or of some mutable type: try to clone it deeply
                    // to be on the safe side.
                    value = clone (value);
                }

                field.set (dest, value);
            }
        }
    }

    private static final Set FINAL_IMMUTABLE_CLASSES; // Set in <clinit>
    private static final Object [] EMPTY_OBJECT_ARRAY = new Object [0];
    private static final Class [] EMPTY_CLASS_ARRAY = new Class [0];

    static {
        FINAL_IMMUTABLE_CLASSES = new HashSet (17);

        // Add some common final/immutable classes:
        FINAL_IMMUTABLE_CLASSES.add (String.class);
        FINAL_IMMUTABLE_CLASSES.add (Byte.class);
        // ...
        FINAL_IMMUTABLE_CLASSES.add (Boolean.class);
    }

} // End of class
Note the use of java.lang.reflect.AccessibleObject.setAccessible() to gain access to nonpublic fields. Of course, this requires sufficient security privileges.
Since the introduction of JDK 1.3, setting final fields via reflection is no longer possible (see Note 1 in Resources); so, this approach resembles Approach 1 in that it can't handle final fields. Note also that inner classes cannot have no-arg constructors by definition (see Note 2 in Resources), so this approach will not work for them either.
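The point about inner classes can be observed directly through reflection: a constructor that looks parameterless in source code reports one parameter, the enclosing instance. InnerCtorDemo, Inner, and Nested are hypothetical names used for this sketch.

```java
import java.lang.reflect.Constructor;

class InnerCtorDemo {
    class Inner { Inner() {} }            // non-static nested (inner) class
    static class Nested { Nested() {} }   // static nested class

    /** Returns the parameter count of the single declared constructor of 'c'. */
    static int parameterCount(Class<?> c) {
        Constructor<?> ctor = c.getDeclaredConstructors()[0];
        return ctor.getParameterTypes().length;
    }

    public static void main(String[] args) {
        System.out.println(parameterCount(Inner.class));   // prints "1"
        System.out.println(parameterCount(Nested.class));  // prints "0"

        // The hidden parameter is the enclosing class:
        Class<?> hidden = Inner.class.getDeclaredConstructors()[0].getParameterTypes()[0];
        System.out.println(hidden == InnerCtorDemo.class); // prints "true"
    }
}
```

Because of that hidden parameter, looking up a no-arg constructor on the inner class fails, which is why the reflective approach cannot instantiate it.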
Coupled with the no-arg constructor requirement, these limitations restrict the types of classes this approach can handle. But you would be surprised how far it can go. The full implementation adds a few useful features. While traversing the object graph rooted at the source object, it keeps an internal objMap parameter that maps values in the source object graph to their respective clones in the cloned graph. This restores the ability to preserve object graphs that I had in Approach 3. Also, the metadataMap parameter caches class metadata for all classes encountered while cloning an object and improves performance by avoiding slow reflection. The relevant data structures are scoped to a single call to clone(), and the overall idea is very similar to Java serialization revamped to do just object cloning. As in the previous section, a whole hierarchy of suitable classes can be made cloneable by equipping the base class with one generic method:
public Object clone () {
    return ReflectiveClone.clone (this);
}
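The objMap idea described above can be sketched with a java.util.IdentityHashMap that remembers which source objects have already been cloned. This is a simplified illustration, not the downloadable ReflectiveClone: GraphCloneSketch and Node are hypothetical names, arrays and metadata caching are left out, and only String and Integer are treated as immutable.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.IdentityHashMap;
import java.util.Map;

class GraphCloneSketch {
    static class Node {
        String name;
        Node next;
        Node() {}
    }

    /** Deep-clones 'obj', preserving shared references and cycles. */
    static Object deepClone(Object obj) {
        try {
            return deepClone(obj, new IdentityHashMap<Object, Object>());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot clone reflectively", e);
        }
    }

    private static Object deepClone(Object obj, Map<Object, Object> objMap)
            throws ReflectiveOperationException {
        if (obj == null || obj instanceof String || obj instanceof Integer) {
            return obj;                      // immutable: share by reference
        }
        Object known = objMap.get(obj);
        if (known != null) {
            return known;                    // already cloned: reuse that clone
        }
        Constructor<?> noarg = obj.getClass().getDeclaredConstructor();
        noarg.setAccessible(true);
        Object result = noarg.newInstance();
        objMap.put(obj, result);             // register before recursing so cycles terminate

        for (Class<?> c = obj.getClass(); c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) continue;
                field.setAccessible(true);
                Object value = field.get(obj);
                if (!field.getType().isPrimitive()) {
                    value = deepClone(value, objMap);
                }
                field.set(result, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node a = new Node(); a.name = "a";
        Node b = new Node(); b.name = "b";
        a.next = b;
        b.next = a;                                   // a cycle: a -> b -> a

        Node copy = (Node) deepClone(a);
        System.out.println(copy != a);                // prints "true"
        System.out.println(copy.next.next == copy);   // prints "true"
        System.out.println(copy.name == a.name);      // prints "true"
    }
}
```

An identity-based map is used rather than an equals()-based one because two distinct source objects that happen to be equal must still produce two distinct clones.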
What is this method's performance? Rerunning the test with the REFLECTION branch selected produces:
clone implementation: reflection
method duration: 0.537 ms
This is roughly five times faster than straightforward serialization, which is not too bad for another generic approach. In terms of its performance and capabilities, it represents a compromise between the other three solutions. It can work very well for JavaBean-like classes and other types that usually do not have final fields.
Resource considerations
Measuring memory overhead is more difficult than measuring performance. It should be obvious that the first two approaches shine in this area, as they instantiate only the data that will populate the cloned fields.
Cloning via serialization has an extra drawback that may have escaped your attention above. Even though serializing an object preserves the structure of the object graph rooted at that instance, immutable values get duplicated across disjoint calls to clone(). As an example, you can verify for yourself that
TestClass obj = new TestClass ("dummy");
System.out.println (obj.m_string == ((TestClass) obj.clone ()).m_string);
will print false for Approach 3 only. Thus, cloning via serialization has a tendency to pollute the heap with redundant copies of immutable objects like Strings. Approaches 1 and 2 are completely free from this problem, and Approach 4 is mostly free from it.
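Because m_string is private in the test hierarchy, the same check is easier to try on a small standalone class. In this sketch, StringDuplicationDemo, Holder, and roundTrip are hypothetical names; the comparison contrasts Object.clone() with a serialization round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class StringDuplicationDemo {
    static final class Holder implements Serializable, Cloneable {
        private static final long serialVersionUID = 1L;
        String text = "some string";

        @Override
        public Holder clone() {
            try {
                return (Holder) super.clone();   // copies the String reference only
            } catch (CloneNotSupportedException e) {
                throw new InternalError(e.toString());
            }
        }
    }

    /** Clones 'obj' by writing it to a byte array and reading it back. */
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (ObjectOutputStream oout = new ObjectOutputStream(out)) {
                oout.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(out.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cannot clone via serialization", e);
        }
    }

    public static void main(String[] args) {
        Holder original = new Holder();
        Holder viaClone = original.clone();
        Holder viaSerialization = (Holder) roundTrip(original);

        System.out.println(viaClone.text == original.text);               // prints "true"
        System.out.println(viaSerialization.text == original.text);       // prints "false"
        System.out.println(viaSerialization.text.equals(original.text));  // prints "true"
    }
}
```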
A quick and dirty proof of these observations can be obtained by changing the body of Main.main() to keep the clones in memory and report the object count once a given heap size is exhausted:
int count = 0;
List list = new LinkedList ();
try {
    while (true) {
        list.add (obj.clone ());
        ++ count;
    }
} catch (Throwable t) {
    System.out.println ("count = " + count);
}
Run this in a JVM with a -Xmx8m setting and you will see something similar to this:
>java -Xmx8m Main
clone implementation: Object.clone()
count = 5978
Exception in thread "main" java.lang.OutOfMemoryError
...
clone implementation: copy construction
count = 5978
...
clone implementation: serialization
count = 747
...
clone implementation: reflection
count = 5952
Approach 3's overhead increases with the number of immutable fields in a class. Removing this overhead is nontrivial.
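To make the duplication concrete without exhausting the heap, the sketch below counts how many distinct String instances a fixed number of serialization-based clones produce. It then shows one possible mitigation, which is an assumption of this sketch and not something the article proposes: an ObjectInputStream subclass that interns every String it reads by overriding resolveObject(). Payload, InterningObjectInputStream, and StringDuplicationDemo are hypothetical names. Interning handles only Strings and has costs of its own, so this illustrates why the general problem is nontrivial and does not solve it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical class with a single immutable field.
class Payload implements Serializable {
    private static final long serialVersionUID = 1L;

    final String text;

    Payload(String text) {
        this.text = text;
    }
}

// Assumption of this sketch: canonicalizing Strings via intern() is acceptable.
class InterningObjectInputStream extends ObjectInputStream {
    InterningObjectInputStream(InputStream in) throws IOException {
        super(in);
        enableResolveObject(true); // ask the stream to call resolveObject()
    }

    @Override
    protected Object resolveObject(Object obj) {
        return (obj instanceof String) ? ((String) obj).intern() : obj;
    }
}

class StringDuplicationDemo {
    // Clones p through a serialization round trip, optionally interning Strings.
    static Payload roundTrip(Payload p, boolean intern) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            InputStream raw = new ByteArrayInputStream(bytes.toByteArray());
            try (ObjectInputStream in = intern
                     ? new InterningObjectInputStream(raw)
                     : new ObjectInputStream(raw)) {
                return (Payload) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts distinct String instances (by identity) held by the given number of clones.
    static int distinctStrings(Payload original, int clones, boolean intern) {
        Map<String, Boolean> seen = new IdentityHashMap<String, Boolean>();
        for (int i = 0; i < clones; i++) {
            seen.put(roundTrip(original, intern).text, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Payload p = new Payload("dummy");
        System.out.println(distinctStrings(p, 1000, false)); // 1000: one new String per clone
        System.out.println(distinctStrings(p, 1000, true));  // 1: all clones share one String
    }
}
```

Other immutable field types, such as Integer or BigDecimal, would each need their own canonicalization strategy, and the round trip still pays the full cost of writing and reading the byte stream.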
The recap
The following table recaps the properties of all cloning approaches in this article from several perspectives: speed, resource utilization, class design constraints, and object graph handling.
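[Table not preserved in this copy: comparison of the four cloning approaches by speed, resource utilization, class design constraints, and object graph handling.]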
This article discussed implementing a single method, Object.clone(). It is amazing that a single method can have so many implementation choices and subtle points. I hope this article provided you with some food for thought and useful guidelines for your application class design.
About the author
Vladimir Roubtsov has programmed in a variety of languages for more than 12 years, including Java since 1995. Currently, he develops enterprise software as a senior developer for Trilogy in Austin, Texas.
Resources
Download the complete source code that accompanies this article: http://www.javaworld.com/javaworld/javaqa/2003-01/clone/02-qa-0124-clone.zip
The high-resolution library used for measurements in this article was developed in another Java Q&A installment: http://www.javaworld.com/javaworld/javaqa/2003-01/01-qa-0110-timing.html
For more on cloning, see "Hashing and Cloning," Mark Davis (Java Report, April 2000) pp. 60-66; "Effective Cloning," Steve Ball (Java Report, January 2000) pp. 60-67; "Solutions for Implementing Dependable Clone Methods," Steve Ball (Java Report, April 2000) pp. 68-82
Note 1: In Sun JDK 1.2 you could set even final fields using reflection as long as you used setAccessible(), but this changed in later Sun JDK versions.
Note 2: Syntactically an inner class may appear to have a no-arg constructor. However, in bytecode every constructor of an inner class takes at least one parameter that is a reference to the outer object. Note that by inner classes, I specifically mean non-static nested classes. Static nested classes do not have this problem.
Java 101's "Object-oriented language basics, Part 5" by Jeff Friesen (JavaWorld, August 2001) contains a section about cloning: http://www.javaworld.com/javaworld/jw-08-2001/jw-0803-java101.html
Want more? See the Java Q&A index page for the full Q&A catalog: http://www.javaworld.com/columns/jw-qna-index.shtml
For more than 100 insightful Java tips, visit JavaWorld's Java Tips index page: http://www.javaworld.com/columns/jw-tips-index.shtml
Browse the Core Java section of JavaWorld's Topical Index: http://www.javaworld.com/channel_content/jw-core-index.shtml
Get more of your questions answered in our Java Beginner discussion: http://forums.devworld.com/webx?50@@.ee6b804
Sign up for JavaWorld's free weekly email newsletters: http://www.javaworld.com/subscribe
You'll find a wealth of IT-related articles from our sister publications at IDG.net
English-Chinese glossary of key terms (主要术语英汉对照表)
base class, 基类
cast/casting, 转型
chain/chaining, 串链
class, 类别
clone/cloning, 克隆
copy constructor, 拷贝构造函数
field, 字段
immutable, 不可变的
mutable, 可变的/易变的
override, 覆写
parameter, 参数
reflection, 映像
reflective discovery, 映像式探知
serialization, 次第读写
superclass, 父辈类别/超类
subclass, 子辈类别/子类
type, 型别