The class File Format in Detail

碧海山城 2012-09-08

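1. The Class File Format

Multi-byte values are stored in big-endian order: the high-order byte comes first and the low-order byte last.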

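ClassFile {

u4 magic; // 4-byte magic number, 0xCAFEBABE

u2 minor_version; // 2 bytes; together with major_version it gives the file version, written major.minor

u2 major_version;

u2 constant_pool_count; // 2 bytes; equals the number of entries in the constant_pool table plus one

cp_info constant_pool[constant_pool_count-1]; // constants of all kinds (class and interface names, field names, and so on), indexed from 1 to constant_pool_count-1. There is no entry at index 0, yet it is still counted: when constant_pool holds 14 entries (indexes 1 to 14), constant_pool_count is 15. The type of each entry is determined by its first byte.

u2 access_flags; // 2-byte access flags, listed in the table below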

Flag Name	Value	Interpretation
ACC_PUBLIC	0x0001	Declared public; may be accessed from outside its package.
ACC_FINAL	0x0010	Declared final; no subclasses allowed.
ACC_SUPER	0x0020	Treat superclass methods specially when invoked by the invokespecial instruction.
ACC_INTERFACE	0x0200	Is an interface, not a class.
ACC_ABSTRACT	0x0400	Declared abstract; may not be instantiated.

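In the version of the format described here, the access flags of a class use only the bits in the table above (later class file versions define further flags such as ACC_SYNTHETIC, ACC_ANNOTATION, and ACC_ENUM).

A class file without the ACC_INTERFACE flag describes a class. If ACC_INTERFACE is set, ACC_ABSTRACT must be set as well; ACC_PUBLIC is optional.

In Sun's current Java virtual machine, the semantics of the invokespecial instruction are stricter than in older versions. All new compilers must set ACC_SUPER. Older Java compilers did not set it, and older JVMs ignore the bit, but new compilers should implement the stricter invokespecial semantics.

All other bits are reserved for future use; compilers should set them to 0 and Java virtual machines should ignore them.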

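u2 this_class; // 2-byte index into constant_pool; the entry there must be a CONSTANT_Class_info representing the class or interface defined by this file

u2 super_class; // either 0 or an index into the constant pool. For a class, a nonzero index must point to a CONSTANT_Class_info for the direct superclass; 0 means the class has no superclass, which is only the case for java.lang.Object. For an interface, the index must point to the CONSTANT_Class_info for java.lang.Object.

u2 interfaces_count; // number of direct superinterfaces: only those named in the implements clause of a class declaration or the extends clause of an interface declaration

u2 interfaces[interfaces_count]; // each value interfaces[i] (0 <= i < interfaces_count) is an index into the constant pool. The entry must be a CONSTANT_Class_info for a direct superinterface of this class or interface, in the order given in the source file.

u2 fields_count; // number of field_info structures in the fields table, counting both class variables and instance variables. Only fields declared by this class or interface are listed, not fields inherited from superclasses or superinterfaces. Fields that are not declared in the Java source but added by the compiler are marked with the Synthetic attribute.

field_info fields[fields_count]; // each entry is a field_info structure (not a constant-pool index); the name and descriptor inside it are constant-pool indexes

u2 methods_count; // number of method_info structures in the methods table

method_info methods[methods_count]; // each entry is a method_info structure; the name and descriptor inside it are constant-pool indexes

u2 attributes_count; // number of attributes in the attributes table

attribute_info attributes[attributes_count]; // class-level attributes; those recognized on a ClassFile include the SourceFile attribute and the Deprecated attribute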

}
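
As a rough illustration of the layout above, the following Java sketch reads the first ten bytes of a class file: magic, minor_version, major_version, and constant_pool_count. The class name ClassHeader and its fields are made up for this example, and the sample bytes are hand-written rather than taken from a real class file.

```java
import java.nio.ByteBuffer;

final class ClassHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads magic, minor_version, major_version and constant_pool_count. */
    static ClassHeader read(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file byte order.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic: 0x" + Integer.toHexString(magic));
        }
        int minor = buf.getShort() & 0xFFFF; // u2 values are unsigned
        int major = buf.getShort() & 0xFFFF;
        int count = buf.getShort() & 0xFFFF;
        return new ClassHeader(minor, major, count);
    }

    public static void main(String[] args) {
        // Hand-written sample: magic, minor 0, major 52, constant_pool_count 15.
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52, 0, 15};
        ClassHeader h = read(sample);
        System.out.println(h.majorVersion + "." + h.minorVersion
                + " constant_pool_count=" + h.constantPoolCount);
    }
}
```

With constant_pool_count equal to 15, the pool that follows holds 14 entries, indexed 1 to 14.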

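1.1. Special Strings

The symbolic references held in the constant pool involve three special kinds of strings: fully qualified names, simple names, and descriptors.

1.1.1. Fully Qualified Names

In a class file, the dots of a fully qualified name are replaced by forward slashes. For example, the fully qualified name of java.lang.Object is written java/lang/Object, and that of java.util.Hashtable is written java/util/Hashtable.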
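
The dot-to-slash rule is simple enough to show in one line of Java. The helper name toInternalName is hypothetical, chosen only for this sketch.

```java
final class InternalNames {
    /** Converts a dotted class name into the slash-separated form used in class files. */
    static String toInternalName(String dottedName) {
        return dottedName.replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(toInternalName("java.lang.Object"));
        System.out.println(toInternalName("java.util.Hashtable"));
    }
}
```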

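1.1.2. Simple Names

Field names and method names appear in constant-pool entries as simple (not fully qualified) names. For example, an entry that refers to a toString method carries a method name of the form toString.

1.1.3. Descriptors

See the field descriptors and method descriptors described below.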

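2. The Constant Pool

2.1. Basic Format

Java virtual machine instructions do not rely on the run-time layout of classes, interfaces, or class instances. Instead, they refer to symbolic information in the constant pool. Every constant-pool entry has the following general format: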

cp_info {

u1 tag;

u1 info[];

}

Every constant-pool entry must begin with a 1-byte tag that indicates the type of the entry, as listed in the table below:

Constant Type	Value
CONSTANT_Class	7
CONSTANT_Fieldref	9
CONSTANT_Methodref	10
CONSTANT_InterfaceMethodref	11
CONSTANT_String	8
CONSTANT_Integer	3
CONSTANT_Float	4
CONSTANT_Long	5
CONSTANT_Double	6
CONSTANT_NameAndType	12
CONSTANT_Utf8	1

Each tag byte must be followed by two or more bytes of content, whose format depends on the tag.

2.2. CONSTANT_Class_info

This structure represents a class or an interface:

CONSTANT_Class_info {

u1 tag;

u2 name_index;

}

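tag = CONSTANT_Class (7).

name_index is also an index into the constant pool. The entry there must be a CONSTANT_Utf8_info representing a valid fully qualified class or interface name.

2.2.1. Arrays

Because arrays are objects, the anewarray and multianewarray instructions can refer to array classes through CONSTANT_Class_info structures. For such an array class, the name is the descriptor of the array type. For example, a two-dimensional int array:

int[][] is [[I

and an array of threads: Thread[] is [Ljava/lang/Thread;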
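
These array names can be checked against a running JVM. The sketch below relies on Class.getName(), which for array classes returns the same encoding but with dots in class names, so the dots are replaced by slashes to get the class file form. The class and method names are made up for the example.

```java
final class ArrayClassNames {
    /** Name of an array class spelled the way a CONSTANT_Class_info would spell it. */
    static String internalName(Class<?> arrayClass) {
        // getName() yields e.g. "[Ljava.lang.Thread;"; class files use slashes instead of dots.
        return arrayClass.getName().replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(internalName(int[][].class));
        System.out.println(internalName(Thread[].class));
    }
}
```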

2.3. CONSTANT_Utf8_info

This structure represents constant string values. The string is stored in a variant of the UTF-8 format.

CONSTANT_Utf8_info {

u1 tag;

u2 length;

u1 bytes[length];

}

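tag = CONSTANT_Utf8 (1)

length: the number of bytes in the bytes array

bytes[]: the bytes of the string

It can store many kinds of strings, including:

• literal strings, such as the values of String objects

• the fully qualified name of the class or interface being defined

• the fully qualified name of the superclass (if any) of the class being defined

• the fully qualified names of the superinterfaces of the class or interface being defined

• the simple names and descriptors of any fields declared by the class or interface

• the simple names and descriptors of any methods declared by the class or interface

• the fully qualified names of any referenced classes and interfaces

• the simple names and descriptors of any referenced fields

• the simple names and descriptors of any referenced methods

• strings associated with attributes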
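
To see what such an entry looks like on disk, the sketch below builds the bytes of a CONSTANT_Utf8_info with DataOutputStream.writeUTF, which writes a 2-byte length followed by the string in modified UTF-8. That lines up with the u2 length and bytes[] items above. The class name Utf8Entry is invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class Utf8Entry {
    /** Returns tag, u2 length and bytes for the given string. */
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(1); // tag = CONSTANT_Utf8
            out.writeUTF(s);  // u2 length, then the modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encode("Code")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

One visible difference from standard UTF-8 is the character '\u0000', which this variant encodes as the two bytes C0 80 rather than a single zero byte.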

2.4. CONSTANT_Fieldref_info, CONSTANT_Methodref_info, and CONSTANT_InterfaceMethodref_info

// Describes a field

CONSTANT_Fieldref_info {

u1 tag;

u2 class_index;

u2 name_and_type_index;

}

// Describes a method declared in a class (not an interface method)

CONSTANT_Methodref_info {

u1 tag;

u2 class_index;

u2 name_and_type_index;

}

// Describes a method declared in an interface (not a class method)

CONSTANT_InterfaceMethodref_info {

u1 tag;

u2 class_index;

u2 name_and_type_index;

}

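These three constant-pool structures locate a method or a field. This can be confusing at first: the field_info and method_info structures described below already hold field and method information, so what is the constant-pool information for?

The fields and methods referenced here do not necessarily belong to the current class. These entries are what the bytecode uses to locate the target of a field access or a method call. Consider the following code: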

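public class SimpleHolderInterfaceImpl {

    public static void main(String[] args) {
        // Call set(...) on an instance of the current class.
        SimpleHolderInterfaceImpl holder = new SimpleHolderInterfaceImpl();
        holder.set("Item");

        // Call set(...) on an instance of another class.
        SimpleHolder holder2 = new SimpleHolder();
        holder2.set("Item");
    }
    // set(String) and the SimpleHolder class are not shown in the original post.
}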

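The main method calls set on a SimpleHolderInterfaceImpl instance and on a SimpleHolder instance, so each call has to identify the target class and the target method. That information is kept in the constant pool; the instruction itself only carries a constant-pool index.

In the original example, the two calls refer to the constant-pool entries at indexes 8 and 11.

The field_info and method_info structures that come later, by contrast, describe only the fields and methods declared by the current class.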

tag

The tag item of a CONSTANT_Fieldref_info structure has the value CONSTANT_Fieldref (9).

The tag item of a CONSTANT_Methodref_info structure has the value CONSTANT_Methodref (10).

The tag item of a CONSTANT_InterfaceMethodref_info structure has the value CONSTANT_InterfaceMethodref (11).

class_index

An index into the constant pool. The entry there must be a CONSTANT_Class_info structure representing the class or interface that declares the field or method.

The class_index item of a CONSTANT_Methodref_info structure must be a class type, not an interface type. The class_index item of a CONSTANT_InterfaceMethodref_info structure must be an interface type. The class_index item of a CONSTANT_Fieldref_info structure may be either a class type or an interface type.

name_and_type_index

An index into the constant pool. The entry there must be a CONSTANT_NameAndType_info structure describing a field or a method.

In a CONSTANT_Fieldref_info the indicated descriptor must be a field descriptor. Otherwise, the indicated descriptor must be a method descriptor.

If the name of the method in a CONSTANT_Methodref_info structure begins with '<' ('\u003c'), then the name must be <init>, representing an instance initialization method. Such a method must return no value.

2.5. CONSTANT_String_info

CONSTANT_String_info {

u1 tag;

u2 string_index;

}

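Represents a string constant. string_index is an index into the constant pool; the entry there is a CONSTANT_Utf8_info structure giving the sequence of characters to which the string is initialized.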

2.6. CONSTANT_Integer_info, CONSTANT_Float_info

CONSTANT_Integer_info {

u1 tag;

u4 bytes;

}

Represents an int constant, stored in big-endian order.

CONSTANT_Float_info {

u1 tag;

u4 bytes;

}

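Represents a float constant in IEEE 754 single-precision format, also stored in big-endian order. For a finite value, the sign s, exponent e, and mantissa m are computed from the 32-bit pattern bits as follows, and the value is s * m * 2^(e-150):

int s = ((bits >> 31) == 0) ? 1 : -1;
int e = ((bits >> 23) & 0xff);
int m = (e == 0) ?
        (bits & 0x7fffff) << 1 :
        (bits & 0x7fffff) | 0x800000;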
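
The decoding rule for a float constant can be tried out in Java. This is a minimal sketch: it computes s, e, and m from the bit pattern exactly as in the expressions above and then evaluates s * m * 2^(e-150), which is how the specification defines the value of a finite float. Infinities and NaN (e equal to 255) are outside its scope, and the sign of zero is lost. The class name FloatBits is hypothetical.

```java
final class FloatBits {
    /** Decodes a finite float from its 32-bit pattern as s * m * 2^(e-150). */
    static double decode(int bits) {
        int s = ((bits >> 31) == 0) ? 1 : -1;
        int e = (bits >> 23) & 0xff;
        int m = (e == 0)
                ? (bits & 0x7fffff) << 1
                : (bits & 0x7fffff) | 0x800000;
        // Math.scalb scales its first argument by a power of two.
        return Math.scalb((double) (s * m), e - 150);
    }

    public static void main(String[] args) {
        int bits = Float.floatToIntBits(-2.5f);
        // The two printed values should agree.
        System.out.println(decode(bits));
        System.out.println(Float.intBitsToFloat(bits));
    }
}
```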

2.7. CONSTANT_Long_info, CONSTANT_Double_info

CONSTANT_Long_info {

u1 tag;

u4 high_bytes;

u4 low_bytes;

}

CONSTANT_Double_info {

u1 tag;

u4 high_bytes;

u4 low_bytes;

}

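A long or double value is 8 bytes wide (high_bytes and low_bytes, 4 bytes each) and takes up two entries in the constant pool: if the entry sits at index n, the next usable entry is at index n+2, and index n+1 is left unused.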
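
Putting the two 4-byte halves back together is a small bit-manipulation exercise. The sketch below assumes high_bytes and low_bytes have already been read as Java ints; the class and method names are made up for illustration.

```java
final class LongHalves {
    /** Combines high_bytes and low_bytes into the 64-bit value they encode. */
    static long combine(int highBytes, int lowBytes) {
        // Mask the low half so its sign bit does not spill into the high half.
        return ((long) highBytes << 32) | (lowBytes & 0xFFFFFFFFL);
    }

    /** Interprets the same 64 bits as an IEEE 754 double. */
    static double toDouble(int highBytes, int lowBytes) {
        return Double.longBitsToDouble(combine(highBytes, lowBytes));
    }

    public static void main(String[] args) {
        System.out.println(combine(0, 42));
        System.out.println(toDouble(0x3FF00000, 0));
    }
}
```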

2.8. CONSTANT_NameAndType_info

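Describes a field or a method without saying which class or interface it belongs to:

CONSTANT_NameAndType_info {

u1 tag;

u2 name_index; // index into the constant pool; the entry gives the name of the field or method

u2 descriptor_index; // index into the constant pool; the entry gives a valid field descriptor or method descriptor

}

It is used mainly by the CONSTANT_Fieldref_info, CONSTANT_Methodref_info, and CONSTANT_InterfaceMethodref_info structures.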
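
Taken together, the entry formats in this section are enough to walk a whole constant pool. The following is a minimal sketch under a few assumptions: the buffer is positioned right after constant_pool_count, only the eleven tags from the table in 2.1 occur, and Utf8 contents are decoded as plain UTF-8, which is adequate for ASCII names but not a full decoder for the variant the class file uses. The class name ConstantPoolWalker and the output format are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ConstantPoolWalker {
    private static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }

    /** Reads entries 1 .. constantPoolCount-1 and returns one description per entry. */
    static List<String> walk(ByteBuffer buf, int constantPoolCount) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i < constantPoolCount; i++) {
            int tag = buf.get() & 0xFF;
            String text;
            switch (tag) {
                case 1: { // CONSTANT_Utf8: u2 length, then that many bytes
                    byte[] bytes = new byte[u2(buf)];
                    buf.get(bytes);
                    text = "Utf8 " + new String(bytes, StandardCharsets.UTF_8);
                    break;
                }
                case 3: text = "Integer " + buf.getInt(); break;
                case 4: text = "Float " + buf.getFloat(); break;
                case 5: text = "Long " + buf.getLong(); break;
                case 6: text = "Double " + buf.getDouble(); break;
                case 7: text = "Class name=#" + u2(buf); break;
                case 8: text = "String utf8=#" + u2(buf); break;
                case 9: text = "Fieldref class=#" + u2(buf) + " nat=#" + u2(buf); break;
                case 10: text = "Methodref class=#" + u2(buf) + " nat=#" + u2(buf); break;
                case 11: text = "InterfaceMethodref class=#" + u2(buf) + " nat=#" + u2(buf); break;
                case 12: text = "NameAndType name=#" + u2(buf) + " desc=#" + u2(buf); break;
                default: throw new IllegalArgumentException("unknown tag " + tag);
            }
            out.add("#" + i + " " + text);
            if (tag == 5 || tag == 6) {
                i++; // long and double entries occupy two constant-pool slots
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-built pool: Utf8 "Foo", Class -> #1, Long 42, Integer 7 (constant_pool_count = 6).
        byte[] pool = {1, 0, 3, 'F', 'o', 'o', 7, 0, 1, 5, 0, 0, 0, 0, 0, 0, 0, 42, 3, 0, 0, 0, 7};
        for (String line : walk(ByteBuffer.wrap(pool), 6)) {
            System.out.println(line);
        }
    }
}
```

Note how the Integer entry ends up at index 5 rather than 4, because the Long before it uses indexes 3 and 4.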

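3. Field_info: Field Information

Each field of a class is described by a field_info structure.

It holds the field's name, descriptor, and modifiers; for a final field with a constant value, it also carries that value.

field_info {

u2 access_flags; // access flags, see 3.1.1

u2 name_index; // index into the constant pool; the entry must be a CONSTANT_Utf8_info giving the field name

u2 descriptor_index; // the field's type; also an index into the constant pool, where the entry is a CONSTANT_Utf8_info giving a valid field descriptor

u2 attributes_count; // number of attributes

attribute_info attributes[attributes_count]; // each entry is an attribute_info structure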

}

3.1.1. Access_flags

Flag Name	Value	Interpretation
ACC_PUBLIC	0x0001	Declared public; may be accessed from outside its package.
ACC_PRIVATE	0x0002	Declared private; usable only within the defining class.
ACC_PROTECTED	0x0004	Declared protected; may be accessed within subclasses.
ACC_STATIC	0x0008	Declared static.
ACC_FINAL	0x0010	Declared final; no further assignment after initialization.
ACC_VOLATILE	0x0040	Declared volatile; cannot be cached.
ACC_TRANSIENT	0x0080	Declared transient; not written or read by a persistent object manager.
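
Since access_flags is a bit mask, several flags are usually set at once. The sketch below decodes a field's access_flags value into the names from the table above; the class name FieldFlags is hypothetical and the example value 0x0019 is made up.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldFlags {
    private static final int[] MASKS = {0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0040, 0x0080};
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_PRIVATE", "ACC_PROTECTED", "ACC_STATIC",
        "ACC_FINAL", "ACC_VOLATILE", "ACC_TRANSIENT"
    };

    /** Returns the names of all flags set in the given access_flags value. */
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // 0x0019 = 0x0001 | 0x0008 | 0x0010, i.e. a public static final field.
        System.out.println(decode(0x0019));
    }
}
```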

3.1.2. Field Descriptors: Types

BaseType Character	Type	Interpretation
B	byte	signed byte
C	char	Unicode character
D	double	double-precision floating-point value
F	float	single-precision floating-point value
I	int	integer
J	long	long integer
L<classname>;	reference	an instance of class <classname>
S	short	signed short
Z	boolean	true or false
[	reference	one array dimension

For example, the descriptor of the three-dimensional array

double d[][][];

is

[[[D

3.1.3. Attribute

Attributes that can appear on a field include ConstantValue, Synthetic, and Deprecated.

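4. Method_info: Method Information

Every method is described by a variable-length method_info structure: instance methods, class (static) methods, initialization methods, and methods generated by the compiler. Methods inherited from superclasses or superinterfaces are not included. The structure holds the method's name and descriptor (return type and parameter types). If the method is neither abstract nor native, it also holds the stack and local-variable sizes the method needs, the table of exceptions the method catches, the bytecode sequence, and optional line-number and local-variable tables. If the method can throw checked exceptions, those are listed as well.

Two kinds of compiler-generated methods may appear in a class file: instance initialization methods (<init>) and class or interface initialization methods (<clinit>). For more on these, see 《Java虚拟机---类装载子系统--Class的生命周期》.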

method_info {

u2 access_flags; // access flags

u2 name_index;

u2 descriptor_index;

u2 attributes_count;

attribute_info attributes[attributes_count];

}

4.1.1. Access_flags

Flag Name	Value	Interpretation
ACC_PUBLIC	0x0001	Declared public; may be accessed from outside its package.
ACC_PRIVATE	0x0002	Declared private; accessible only within the defining class.
ACC_PROTECTED	0x0004	Declared protected; may be accessed within subclasses.
ACC_STATIC	0x0008	Declared static.
ACC_FINAL	0x0010	Declared final; may not be overridden.
ACC_SYNCHRONIZED	0x0020	Declared synchronized; invocation is wrapped in a monitor lock.
ACC_NATIVE	0x0100	Declared native; implemented in a language other than Java.
ACC_ABSTRACT	0x0400	Declared abstract; no implementation is provided.
ACC_STRICT	0x0800	Declared strictfp; floating-point mode is FP-strict

Methods of classes may set any of the flags in Table 4.5. However, a specific method of a class may have at most one of its ACC_PRIVATE, ACC_PROTECTED, and ACC_PUBLIC flags set (§2.7.4). If such a method has its ACC_ABSTRACT flag set it may not have any of its ACC_FINAL, ACC_NATIVE, ACC_PRIVATE, ACC_STATIC, ACC_STRICT, or ACC_SYNCHRONIZED flags set (§2.13.3.2).

All interface methods must have their ACC_ABSTRACT and ACC_PUBLIC flags set and may not have any of the other flags in Table 4.5 set (§2.13.3.2).

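4.1.2. Method Descriptors

A method descriptor gives the parameter types and the return type: ( ParameterDescriptor* ) ReturnDescriptor

ParameterDescriptor: FieldType

ReturnDescriptor: FieldType or V (V means the method returns void)

For example, the method

Object mymethod(int i, double d, Thread t)

has the descriptor

(IDLjava/lang/Thread;)Ljava/lang/Object;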
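
The JDK can produce such descriptors itself, which is a handy way to check one written by hand. The sketch below relies on java.lang.invoke.MethodType and its toMethodDescriptorString() method; the wrapper class DescriptorDemo is only for illustration.

```java
import java.lang.invoke.MethodType;

final class DescriptorDemo {
    /** Builds the method descriptor for the given return type and parameter types. */
    static String descriptorOf(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Object mymethod(int i, double d, Thread t)
        System.out.println(descriptorOf(Object.class, int.class, double.class, Thread.class));
        // void f(double[][][] d)
        System.out.println(descriptorOf(void.class, double[][][].class));
    }
}
```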

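Note that instance methods and class methods use the same descriptor form, but an instance method is implicitly passed a reference to the current instance, this, as an extra argument, while a class method is not.

This shows up in the generated bytecode: any access to an instance variable first loads this (displayed as "load_this" in the original listing; the underlying instruction is aload_0).

A method descriptor is valid only if its parameters total 255 words or fewer. The hidden this argument passed to an instance method counts as one word, a parameter of primitive type long or double counts as two words, and any other parameter counts as one word.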
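
The word-counting rule can be expressed as a small descriptor scanner. This is a sketch only: it assumes a well-formed descriptor and does no validation. The class name ArgumentSlots and the isStatic parameter are made up for the example.

```java
final class ArgumentSlots {
    /** Counts the words taken by a method's arguments, including the hidden this. */
    static int count(String descriptor, boolean isStatic) {
        int slots = isStatic ? 0 : 1; // instance methods receive this as a hidden argument
        int i = 1;                    // skip the opening '('
        while (descriptor.charAt(i) != ')') {
            char c = descriptor.charAt(i);
            if (c == '[') {
                // An array is a single reference, whatever its element type.
                while (descriptor.charAt(i) == '[') {
                    i++;
                }
                if (descriptor.charAt(i) == 'L') {
                    i = descriptor.indexOf(';', i);
                }
                i++;
                slots += 1;
            } else if (c == 'L') {
                i = descriptor.indexOf(';', i) + 1; // skip the class name
                slots += 1;
            } else {
                slots += (c == 'J' || c == 'D') ? 2 : 1; // long and double take two words
                i++;
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        // this (1) + int (1) + double (2) + Thread (1)
        System.out.println(count("(IDLjava/lang/Thread;)Ljava/lang/Object;", false));
    }
}
```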

4.1.3. Attribute

Attributes that can appear on a method are Code, Exceptions, Synthetic, and Deprecated.

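5. Attribute_info: Attributes

5.1. Format

attribute_info {

u2 attribute_name_index; // index into the constant pool; the entry is a CONSTANT_Utf8_info giving the name of the attribute

u4 attribute_length; // length of the info array in bytes, not counting the initial six bytes

u1 info[attribute_length]; // the rest of the attribute; its layout depends on the attribute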

}
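
Because every attribute starts with the same six-byte header, a reader can skip attributes it does not understand. The sketch below shows the idea with a ByteBuffer; the class name AttributeSkipper is hypothetical, and it assumes attribute_length fits in a non-negative Java int.

```java
import java.nio.ByteBuffer;

final class AttributeSkipper {
    /** Skips one attribute_info and returns its attribute_name_index. */
    static int skipAttribute(ByteBuffer buf) {
        int nameIndex = buf.getShort() & 0xFFFF; // u2 attribute_name_index
        int length = buf.getInt();               // u4 attribute_length
        buf.position(buf.position() + length);   // jump over info[attribute_length]
        return nameIndex;
    }

    public static void main(String[] args) {
        // One attribute: name index 5, length 2, two payload bytes, then an unrelated byte.
        byte[] data = {0, 5, 0, 0, 0, 2, 0, 9, 0x7F};
        ByteBuffer buf = ByteBuffer.wrap(data);
        System.out.println("name index " + skipAttribute(buf) + ", now at offset " + buf.position());
    }
}
```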

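5.2. Kinds of Attributes

Attributes are used in the ClassFile, field_info, method_info, and Code_attribute structures, and every attribute follows the attribute_info layout.

5.3. ConstantValue Attribute

A fixed-length attribute used in field_info structures. It gives the value of a constant field.

ConstantValue_attribute {

u2 attribute_name_index; // index into the constant pool; a CONSTANT_Utf8_info whose value must be "ConstantValue"

u4 attribute_length; // always 2

u2 constantvalue_index; // index into the constant pool; the entry gives the constant value

}

Each field type has a corresponding constant-pool entry type, as the table shows: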

Field Type	Entry Type
long	CONSTANT_Long
float	CONSTANT_Float
double	CONSTANT_Double
int, short, char, byte, boolean	CONSTANT_Integer
String	CONSTANT_String

The original post illustrated this with a String constant and an integer constant.

5.4. Code Attribute

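It is a variable-length attribute used on method_info structures.

Code_attribute {

u2 attribute_name_index; // index into the constant pool; a CONSTANT_Utf8_info whose value is "Code"

u4 attribute_length; // length of the attribute, not counting the initial six bytes

u2 max_stack; // maximum depth of the operand stack at any point while the method executes

u2 max_locals; // size of the storage the method needs, in words. Whenever the virtual machine invokes the method described by this Code attribute, it must allocate a local-variable array of length max_locals. The array holds the arguments passed to the method and the local variables the method uses. The greatest valid local-variable index for a long or double value is max_locals-2; for any other type it is max_locals-1.

u4 code_length; // number of bytes in the code array

u1 code[code_length]; // the actual bytecode that implements the method

u2 exception_table_length; // number of entries in the exception table

{ u2 start_pc; // index into the code array where the range covered by this handler begins (inclusive)

u2 end_pc; // index into the code array where the covered range ends (exclusive)

u2 handler_pc; // index of the start of the handler code (handlers generated for finally blocks are listed here too)

u2 catch_type; // index into the constant pool of a CONSTANT_Class_info naming the exception type handled; 0 means the handler is called for all exceptions, which is also how finally clauses are implemented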

} exception_table[exception_table_length];

u2 attributes_count;

attribute_info attributes[attributes_count];

// LineNumberTable (§4.7.8) and LocalVariableTable (§4.7.9) attributes

}
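
How the exception table is consulted can be sketched as a linear search: the first entry whose range covers the current pc and whose catch_type matches wins. The sketch simplifies type matching to "catch_type is 0 or equals the thrown type's constant-pool index"; a real JVM also accepts subclasses of the named type. The names ExceptionTableLookup, Entry, and findHandler are invented for this example.

```java
final class ExceptionTableLookup {
    /** One row of exception_table. */
    static final class Entry {
        final int startPc;
        final int endPc;
        final int handlerPc;
        final int catchType;

        Entry(int startPc, int endPc, int handlerPc, int catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    /** Returns handler_pc of the first matching entry, or -1 if none matches. */
    static int findHandler(Entry[] table, int pc, int thrownTypeIndex) {
        for (Entry e : table) {
            boolean inRange = pc >= e.startPc && pc < e.endPc; // end_pc is exclusive
            // Simplification: exact index match only; 0 catches everything.
            boolean typeMatches = e.catchType == 0 || e.catchType == thrownTypeIndex;
            if (inRange && typeMatches) {
                return e.handlerPc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Entry[] table = {
            new Entry(0, 8, 11, 3), // catch block for the type at constant-pool index 3
            new Entry(0, 8, 20, 0)  // catch-all entry, as generated for finally
        };
        System.out.println(findHandler(table, 4, 3));
        System.out.println(findHandler(table, 4, 5));
    }
}
```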

5.5. SourceFile attribute

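A fixed-length attribute used on the ClassFile structure; there can be at most one. It identifies the source file the class was compiled from: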

SourceFile_attribute {

u2 attribute_name_index;

u4 attribute_length;

u2 sourcefile_index;

}

5.6. Deprecated attribute

Marks an element as deprecated; it can be used on classes, methods, and fields.

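5.7. Exceptions

A variable-length attribute that can be used on method_info. It lists the checked exceptions a method may throw. A method can have at most one Exceptions attribute:

Exceptions_attribute {

u2 attribute_name_index; // the name is "Exceptions"

u4 attribute_length; //

u2 number_of_exceptions; // number of entries in exception_index_table

u2 exception_index_table[number_of_exceptions]; // each entry is a constant-pool index of a CONSTANT_Class_info naming an exception type the method may throw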

}

5.8. Synthetic

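A fixed-length attribute used on ClassFile, field_info, and method_info structures. A member that does not appear in the source code, in other words one generated by the compiler, must be marked Synthetic.

Synthetic_attribute {

u2 attribute_name_index;

u4 attribute_length; // always 0

}

The following example shows the most common synthetic field: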

class parent {
    public void foo() {
    }

    class inner {
        inner() {
            foo();
        }
    }
}

Every non-static inner class has a this$0 field that holds a reference to its enclosing object. After compilation, the inner class looks roughly like this:

class parent$inner
{
    /* synthetic */ parent this$0;

    parent$inner(parent this$0)
    {
        this.this$0 = this$0;
        this$0.foo();
    }
}

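All non-private members of the enclosing object are accessed through this$0.

Synthetic members show up in many other places. For example, a class that uses the assert keyword gets a synthetic static boolean $assertionsDisabled field.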

5.9. InnerClasses

InnerClasses_attribute {

u2 attribute_name_index;

u4 attribute_length;

u2 number_of_classes; // number of inner classes

{ u2 inner_class_info_index;

u2 outer_class_info_index;

u2 inner_name_index;

u2 inner_class_access_flags;

} classes[number_of_classes];

}

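This attribute describes inner classes. If the classes Rain, Snow, and Wet are all declared as inner classes of Weather, the InnerClasses attribute of the Weather class explicitly contains an inner_class_info entry for each of the three.

Each entry also carries outer_class_info, which identifies the corresponding outer class.

inner_name_index is an index to a CONSTANT_Utf8_info entry giving the simple name of the inner class. For an anonymous inner class, its value is 0.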

5.10. LineNumberTable

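Used for debugging; it appears among the attributes of a Code attribute. Several LineNumberTable attributes may together describe a given line of the source file, so LineNumberTable attributes need not correspond one-to-one with source lines.

LineNumberTable_attribute {

u2 attribute_name_index; // the name is "LineNumberTable"

u4 attribute_length;

u2 line_number_table_length;

{ u2 start_pc; // offset within the method's code array

u2 line_number; // the corresponding line in the source file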

} line_number_table[line_number_table_length];

}

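Take a method whose source is only three lines long. Its bytecode will usually be longer than three instructions, so during execution the JVM needs to know which source line the current bytecode belongs to. In other words, it has to store a mapping from bytecode to source.

That is the purpose of the structure above.

In the original example, the bytecode at offsets 0 through 7 of the method corresponds to source line 19, so while those instructions run a debugger points at that line; at bytecode offset 8 it has to move to source line 20, and so on.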
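
The lookup described here, from a bytecode offset to a source line, can be sketched as follows. The table rows {0, 19} and {8, 20} come from the example above; the third row and the class name LineNumbers are invented for illustration.

```java
final class LineNumbers {
    /**
     * Finds the source line for a bytecode offset.
     * Each row of the table is {start_pc, line_number}.
     */
    static int lineFor(int[][] table, int pc) {
        int bestStart = -1;
        int line = -1;
        for (int[] row : table) {
            // Pick the entry with the greatest start_pc that does not exceed pc.
            if (row[0] <= pc && row[0] > bestStart) {
                bestStart = row[0];
                line = row[1];
            }
        }
        return line;
    }

    public static void main(String[] args) {
        int[][] table = {{0, 19}, {8, 20}, {15, 21}};
        System.out.println(lineFor(table, 7));
        System.out.println(lineFor(table, 8));
    }
}
```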

5.11. LocalVariableTable

LocalVariableTable_attribute {

u2 attribute_name_index;

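u4 attribute_length; // length of the attribute, not counting the initial six bytes

u2 local_variable_table_length; // number of entries in the table. The table links the names and types of local variables in the source code to their scope in the bytecode array and to their index in the local-variable section of the stack frame.

{ u2 start_pc; // bytecode offset at which the variable becomes valid

u2 length; // the variable is valid for this many bytes of code, from start_pc to start_pc+length

u2 name_index; // constant-pool index of the variable's name

u2 descriptor_index; // constant-pool index of the variable's type descriptor

u2 index; // the variable's index in the local-variable section of the frame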

} local_variable_table[local_variable_table_length];

}

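If a Java file is compiled with javac -g, this table is included in the class file.

In the original example, the entry for a variable named aa shows it as valid from bytecode offset 3 through 85, which matches the source, where aa is declared at the very beginning.

index is the variable's position in the local-variable section of the method's stack frame, which is where its value is kept while the method executes. A long or double value occupies two positions, index and index+1; a value of any other type occupies the single position index.

Debuggers use this table to determine the value of a given variable in a method; there can be at most one per local variable. It maps the contents of the local-variable section of the stack frame to the names and descriptors of the local variables in the source code.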
