NET Foundations - Memory model (part 2) - VusCode - Coding dreams since 1998!

xiao huan 2008-08-20

In part 1 of the .NET Foundations - Memory model blog post, I explained some of the frequently used terms related to the .NET memory model. They are used a lot in this post, so in case you haven't checked that out already, I advise you to do so.

Today's post takes a bite at questions related to reference type memory allocation, such as: how the GC knows what to collect, LOH objects, static vs. instance methods, etc.

The next post will try to wrap up the memory model exploration by covering value type memory allocation, with an emphasis on boxing/unboxing.

After that I plan to switch gears and jump to some TDD related subjects (Composite UI testability, more MVP mocking examples, etc.).

Reference type memory allocation

The reference type memory allocation part is based on a very simple example: our simple HelloHelper console application, slightly modified to suit the purposes of this post.

So, there is a HelloHelper class:

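using System;

namespace NETMemoryModel
{
    public class HelloHelper
    {
        // Static field: stored with the type, shared by all instances.
        public static DateTime ActiveDate = DateTime.Now;

        // Instance fields: stored in every HelloHelper instance on the GC heap.
        private readonly string _name;
        public byte[] Image;

        public HelloHelper(string name)
        {
            _name = name;
        }

        // Static method: needs no instance to run.
        public static string GetDate()
        {
            return string.Format("Current date is:{0}", ActiveDate);
        }

        // Instance method: reads the instance field _name.
        public string GetHelloText()
        {
            return string.Format("Hello, {0}. Current date is:{1}", _name, ActiveDate);
        }
    }
}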

As we can see, this class has a static method and a public static field, as well as an instance method, a constructor and instance fields.

The HelloHelper class is called from a console application, so the Program.cs file looks like this:

using System;

namespace NETMemoryModel
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(HelloHelper.GetDate());
            DateTime helloDate = HelloHelper.ActiveDate;
            Console.WriteLine(helloDate);
            HelloHelper.ActiveDate = Convert.ToDateTime("11/23/1976"); // parsed using the current culture
            Console.WriteLine(helloDate);
            HelloHelper hlp = new HelloHelper("Nikola");
            string helloText = hlp.GetHelloText();
            Console.WriteLine(helloText);
            helloText += "(Addon)";
            Console.WriteLine(helloText);
            hlp = new HelloHelper("Doe");
            Console.WriteLine(hlp.GetHelloText());
            hlp.Image = new byte[90000];
        }
    }
}

Before the program even starts

Just before the CLR processes the managed module's entry point (the Main method), it first ensures that all the types used during processing are loaded into the DefaultApplicationDomain loader heap, so that the CLR can perform JIT compilation.

We can see from the diagram that every type allocated in the loader heap contains:

a type object pointer, pointing to the System.Type type, which determines the type information
a sync block, used for synchronizing threads that access type members
a method table, with pointers to the addresses of either the compiled native CPU instructions of a method or an internal JIT compiler function, and with method slots describing the nature of the method to the CLR
static fields, which also reside in the type

Line -> Console.WriteLine(HelloHelper.GetDate());

Once these assemblies are loaded into the application domain, the CLR starts executing the first line of the Program.cs class:

Console.WriteLine(HelloHelper.GetDate());

Both methods are static, which means they don't need an instance context to execute, and that's why the memory model at this point has an empty stack and GC heap. Types are loaded once per application domain, so JIT compilation of the GetDate and WriteLine methods occurs only once.

(On further diagrams I will leave the Console and Program types out of the loader heap to make the illustrations simpler and smaller; they are not very important to the things I'll be presenting.)

Line -> DateTime helloDate = HelloHelper.ActiveDate

This line is similar to the previous one in the sense that ActiveDate is a public static field, which means that it is stored inside the type (highlighted in yellow on the last illustration). The difference here is that we assign the value of a type field to a variable of the DateTime type. DateTime is a value type (a struct), so the local variable is allocated on the stack, and this line results in copying the value of the type's ActiveDate field into the helloDate stack variable. The helloDate variable therefore contains a separate copy of the value, not a pointer to the type field.
The result of this line is as expected: in line 3, the console writes out the value of helloDate, which is the same as the value of HelloHelper.ActiveDate.

Line -> HelloHelper.ActiveDate=Convert.ToDateTime("11/23/1976");

The purpose of this line, which changes the type's ActiveDate field to a new value, is to underline the previous statement: the assignment to helloDate copied the value, not a pointer to the value. That's why in line 5 the console writes out the "old" value and not the new one. Once again, local variables of value types are allocated inline on the stack and contain the values themselves, not references.
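The copy semantics described above can be checked with a small sketch. The CopyDemo class and its LocalKeepsOldValue method are hypothetical names introduced only for this illustration; they are not part of the HelloHelper sample, and fixed dates are used instead of DateTime.Now to keep the result predictable.

```csharp
using System;

// Hypothetical demo type, not part of the original HelloHelper sample.
public static class CopyDemo
{
    // Static field: stored once, together with the type.
    public static DateTime ActiveDate;

    // Returns true when the local copy is unaffected by a later change of the static field.
    public static bool LocalKeepsOldValue()
    {
        ActiveDate = new DateTime(2008, 8, 20);     // known starting value
        DateTime local = ActiveDate;                // the value itself is copied to the local
        ActiveDate = new DateTime(1976, 11, 23);    // overwrite the static field

        // The local still holds the old value; the field holds the new one.
        return local == new DateTime(2008, 8, 20)
            && ActiveDate == new DateTime(1976, 11, 23);
    }

    public static void Main()
    {
        Console.WriteLine(LocalKeepsOldValue());
    }
}
```

If DateTime were a reference type, both names would point at the same object and a change made through one would be visible through the other; with a value type there are simply two independent values.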

Line -> HelloHelper hlp=new HelloHelper("Nikola");

In this line we leave the kingdom of Types and enter the realm of Instances.

First of all, the CLR examines the constructor signature and, because the HelloHelper constructor takes an argument, it first puts that argument on the stack as a temporary location from which the constructor will read it. Then the CLR processes the new operator by creating an instance of the HelloHelper type, which results in the allocation of the heap memory required to hold that instance's data. That memory is allocated right after the last allocated block, so the CLR deals only with contiguous memory blocks and skips any searching for free space, which is a performance enhancement. Once the CLR has allocated the memory, it sets the instance's type object pointer to point to the instance's type, sets up the sync block, and initializes the instance fields to their default values.

The result of this allocation process is a pointer to the address of the newly allocated memory block. The value of that pointer is copied to the stack variable defined on the left side of the statement in line 6.

That's how, after line 6 has executed completely, we have a HelloHelper variable on the stack containing a pointer to the HelloHelper heap instance, which in turn points to the HelloHelper type. The diagram illustrates the state of memory after line 6 executes.
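The "many instances, one type" relationship can be observed from managed code. The sketch below is only an approximation: System.Type is the managed view of the type, not the loader heap structure itself, and the Greeter class and TypePointerDemo names are hypothetical stand-ins for HelloHelper.

```csharp
using System;

// Hypothetical stand-in for HelloHelper, used only for this illustration.
public class Greeter
{
    private readonly string _name;
    public Greeter(string name) { _name = name; }
    public string Name { get { return _name; } }
}

public static class TypePointerDemo
{
    public static bool InstancesShareOneTypeObject()
    {
        Greeter first = new Greeter("Nikola");
        Greeter second = new Greeter("Doe");

        // Two separate instances on the GC heap...
        bool distinctInstances = !object.ReferenceEquals(first, second);

        // ...that both lead to the very same Type object.
        bool sameType = object.ReferenceEquals(first.GetType(), second.GetType())
                     && object.ReferenceEquals(first.GetType(), typeof(Greeter));

        return distinctInstances && sameType;
    }

    public static void Main()
    {
        Console.WriteLine(InstancesShareOneTypeObject());
    }
}
```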

Line -> string helloText=hlp.GetHelloText();

In this line, the CLR first finds on the stack the pointer containing the heap address where the instance is located. Then it uses the instance's type object pointer to find the type containing the GetHelloText() method definition. The method is JIT compiled and executed in the context of the newly created HelloHelper instance (whose instance field value is _name = "Nikola"). The result of the method execution is a new string which, because strings are reference types, is allocated on the heap; its pointer is stored in the stack variable named helloText.

Line 9 -> helloText += "(Addon)";

Here we have the concatenation of the string allocated in line 7 with the "(Addon)" string. Because strings are immutable, concatenation is in fact a full heap allocation of a new string instance holding the concatenated value. Once that allocation is complete, the address of the new string instance replaces the existing value of the helloText pointer. The original instance has no roots (stack pointers) left, so it becomes a candidate for garbage collection.
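A minimal sketch of the same effect, assuming a hypothetical StringDemo class. Unlike the example in this post, it deliberately keeps a second variable (original) pointing at the first string, so that the old instance stays reachable and can be compared with the new one.

```csharp
using System;

// Hypothetical demo type, not part of the original sample.
public static class StringDemo
{
    public static bool ConcatenationCreatesNewInstance()
    {
        string helloText = string.Format("Hello, {0}.", "Nikola");
        string original = helloText;    // second root that keeps the first string alive

        helloText += "(Addon)";         // allocates a new string, helloText now points to it

        return !object.ReferenceEquals(original, helloText)   // two different instances
            && original == "Hello, Nikola."                   // the old string is untouched
            && helloText == "Hello, Nikola.(Addon)";
    }

    public static void Main()
    {
        Console.WriteLine(ConcatenationCreatesNewInstance());
    }
}
```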

Line -> hlp=new HelloHelper("Doe")

This line creates another instance of the HelloHelper class. The process here is basically the same as the one just described:

a new instance is allocated on the GC heap
the instance field is initialized with the constructor parameter value
once the allocation is complete, the pointer to the newly allocated memory block is retrieved and stored in the hlp stack variable
the original instance loses its roots and becomes a candidate for garbage collection

The diagram shows the memory model at the end of this line.

Line -> hlp.Image=new byte[90000];

Every array in .NET is a reference type (derived from System.Array), so it is allocated on the heap like all other reference types. But when the size of an object is 85,000 bytes or more (roughly 85 KB), the allocation happens in a special part of the GC heap called the large object heap (LOH). This improves GC performance, because collecting and compacting such large objects would cost the GC too many CPU cycles.

Therefore those large objects are kept in their own GC address space and treated as Generation 2 entities (which means they are collected only on full GC collections). The memory space of the LOH is not compacted during normal collections.

All of the above happens when this line executes, and the diagram shows the memory model afterwards.

As we see in the diagram, the Image field of the instance allocated in the normal GC heap contains only a pointer to the LOH address where the large memory block is allocated.
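One way to observe the "treated as Generation 2" behaviour is to ask the GC which generation a freshly allocated array belongs to. This is only a sketch: the LohDemo class and its method are hypothetical names, and the generation reported for the small array is what I would normally expect rather than something the runtime guarantees.

```csharp
using System;

// Hypothetical demo type, not part of the original sample.
public static class LohDemo
{
    // Allocates a byte array of the given length and returns the generation the GC reports for it.
    public static int GenerationOfNewArray(int length)
    {
        byte[] buffer = new byte[length];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        // 90000 bytes is above the LOH threshold, so this is reported as the oldest generation.
        Console.WriteLine(GenerationOfNewArray(90000));

        // A small array normally starts its life in generation 0.
        Console.WriteLine(GenerationOfNewArray(1000));
    }
}
```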

Static vs. instance methods

Originally, when I was learning about the difference between static and instance methods, I had a picture in my mind that static methods sit in the diagram inside the "loader heap box" (and are therefore JIT compiled once per type), while instance methods sit inside the "GC heap box" (and are JIT compiled once per instance). We can see on the last diagram that this is not true at all, because both the static GetDate method and the GetHelloText instance method are in the "type box". When you think about it for a second, it makes no sense at all to have a method table in the "instance box", because that would mean one JIT compilation per instance, which would hurt performance a lot.

So, in both cases, the IL code of instance and static methods is JIT compiled once per type (per application domain).

The difference is that the CLR needs a context to run an instance method. Something that the method logic depends on is different in each instance, so to call that method (although it's in the "type box"), the CLR requires an instance to be provided as a "hook up" to the context values used by the method. If nothing in the method depends on the instance, then the method should be declared as static.

Also, if you have ever asked yourself how (on the CLR level) you can access a static field from an instance, but not an instance field from a static method, the previous diagram answers that question too. If you just look at the direction of the arrows (representing type object pointer values), you will see that an instance is "aware of" its type, but the type is not aware of its instances. With that in mind, and given that static fields are located in the "type box", we have the answer to how an instance method accesses a static field.
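The direction of the arrows can be sketched in code as well. The Visitor class below is a hypothetical example, not part of the HelloHelper sample: its instance method reaches both the instance field and the static field, while its static method has to be handed an instance explicitly.

```csharp
using System;

// Hypothetical type used only for this illustration.
public class Visitor
{
    public static string Greeting = "Hello";   // static field: stored with the type
    private readonly string _name;             // instance field: stored in each instance

    public Visitor(string name) { _name = name; }

    // Instance method: 'this' gives access to _name, and the type gives access to Greeting.
    public string Greet()
    {
        return Greeting + ", " + _name;
    }

    // Static method: there is no 'this', so using _name directly here would not compile.
    // An instance has to be passed in explicitly to reach instance data.
    public static string Greet(Visitor visitor)
    {
        return Greeting + ", " + visitor._name;
    }
}

public static class StaticVsInstanceDemo
{
    public static void Main()
    {
        Visitor v = new Visitor("Nikola");
        Console.WriteLine(v.Greet());          // Hello, Nikola
        Console.WriteLine(Visitor.Greet(v));   // Hello, Nikola
    }
}
```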

So, if instance methods can "do more" than static methods and both are JIT compiled only once, why should we use static methods at all? The answer is very simple: slightly better performance, because instance methods are called using the callvirt IL instruction (which adds a null check on the variable the method is called on) and because virtual methods cannot be inlined by the JIT compiler.
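The null check is easy to see in action. In this sketch (NullCheckHelper and NullCheckDemo are hypothetical names) the instance method never touches any instance data, yet calling it through a null variable still fails, while the static method needs no variable at all.

```csharp
using System;

// Hypothetical type used only for this illustration.
public class NullCheckHelper
{
    // Instance method that does not use 'this' at all.
    public string InstanceHello() { return "hello"; }

    // Static method with the same body.
    public static string StaticHello() { return "hello"; }
}

public static class NullCheckDemo
{
    // Returns true when calling an instance method through a null variable throws.
    public static bool InstanceCallOnNullThrows()
    {
        NullCheckHelper helper = null;
        try
        {
            helper.InstanceHello();     // the call site checks 'helper' for null first
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(InstanceCallOnNullThrows());
        Console.WriteLine(NullCheckHelper.StaticHello());   // no instance, no null check
    }
}
```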

On the other hand, using static methods limits architectural options a lot, because with static methods there is no inheritance, no interfaces, etc.

My preferred solution here is to use instance methods combined with some form of the singleton pattern.

What about memory allocation of properties and events?

The CLR knows about only two things: methods and fields. Properties are transformed at compile time into a get/set pair of methods, which are then JIT compiled like any other methods. Events get the same treatment as properties (decomposition into methods and fields).
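The decomposition can be made visible with reflection. The Person class below is a hypothetical example; the get_/set_ and add_/remove_ prefixes are the names the C# compiler gives to the generated accessor methods.

```csharp
using System;
using System.Reflection;

// Hypothetical type with one property and one event.
public class Person
{
    public string Name { get; set; }
    public event EventHandler Changed;

    // Raises the event so that the compiler does not warn about an unused event.
    public void RaiseChanged()
    {
        EventHandler handler = Changed;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class AccessorDemo
{
    // Returns true when the type exposes a public method with the given name.
    public static bool HasMethod(string methodName)
    {
        MethodInfo method = typeof(Person).GetMethod(methodName);
        return method != null;
    }

    public static void Main()
    {
        // The property became two methods, and so did the event.
        foreach (string name in new string[] { "get_Name", "set_Name", "add_Changed", "remove_Changed" })
        {
            Console.WriteLine(name + ": " + HasMethod(name));
        }
    }
}
```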

What about memory allocation models of interfaces and partial classes?

According to CLR via C#, partial types don't exist at all from the CLR's perspective: partial is just a way of telling the C# compiler that the code of a type which is about to be compiled is scattered across different files. Interfaces, on the other hand, are known to the CLR as types, but they cannot be instantiated on their own, so they never show up as separate instances on the GC heap; the C# compiler uses them to verify certain type safety related things.

So, that would be it... Hope you enjoyed it, and stay tuned for the next part :)