Reference counting

In garbage collection algorithms, reference counts may be used to deallocate objects that are no longer needed.

Reference counting in naive form has two main disadvantages over the tracing garbage collection, both of which require additional mechanisms to ameliorate:

In this context, the simple reference count of an object is the in-degree of its vertex. Deleting a vertex is like collecting an object. It can only be done when the vertex has no incoming edges, so it does not affect the out-degree of any other vertices, but it can affect the in-degree of other vertices, causing their corresponding objects to be collected as well if their in-degree also becomes 0 as a result.

The connected component containing the special vertex contains the objects that can't be collected, while other connected components of the graph only contain garbage. If a reference-counting garbage collection algorithm is implemented, then each of these garbage components must contain at least one cycle; otherwise, they would have been collected as soon as their reference count (i.e., the number of incoming edges) dropped to zero.

One simple technique is for the compiler to combine a number of nearby reference updates into one. This is especially effective for references which are created and quickly destroyed. Care must be taken, however, to put the combined update at the right position so that a premature free can be avoided.

The Deutsch-Bobrow method of reference counting capitalizes on the fact that most reference count updates are in fact generated by references stored in local variables. It ignores these references, only counting references in data structures, but before an object with reference count zero can be deleted, the system must verify with a scan of the stack and registers that no other reference to it still exists.

Systems may also be designed to tolerate or correct the cycles they create in some way. Developers may design code to explicitly "tear down" the references in a data structure when it is no longer needed, though this has the cost of requiring them to manually track that data structure's lifetime. This technique can be automated by creating an "owner" object that does the tearing-down when it is destroyed; for instance, a Graph object's destructor could delete the edges of its GraphNodes, breaking the reference cycles in the graph. Cycles may even be ignored in systems with short lives and a small amount of cyclic garbage, particularly when the system was developed using a methodology of avoiding cyclic data structures wherever possible, typically at the expense of efficiency.

Although it is possible to augment simple reference counts in a variety of ways, often a better solution can be found by performing reference counting in a fundamentally different way. Here we describe some of the variants on reference counting and their benefits and drawbacks.

Destroying a reference decrements the total weight by the weight of that reference. When the total weight becomes zero, all references have been destroyed. If an attempt is made to copy a reference with a weight of 1, the reference has to "get more weight" by adding to the total weight and then adding this new weight to the reference, and then splitting it. An alternative in this situation is to create an indirection reference object, the initial reference to which is created with a large weight which can then be split.

The property of not needing to access a reference count when a reference is copied is particularly helpful when the object's reference count is expensive to access, for example because it is in another process, on disk, or even across a network. It can also help increase concurrency by avoiding many threads locking a reference count to increase it. Thus, weighted reference counting is most useful in parallel, multiprocess, database, or distributed applications.

The primary problem with simple weighted reference counting is that destroying a reference still requires accessing the reference count, and if many references are destroyed, this can cause the same bottlenecks we seek to avoid. Some adaptations of weighted reference counting seek to avoid this by transferring weight from a dying reference to an active reference.

In indirect reference counting, it is necessary to keep track of the reference's source. This means that two references are kept to the object: a direct one which is used for invocations; and an indirect one which forms part of a diffusion tree, such as in the Dijkstra–Scholten algorithm, which allows a garbage collector to identify dead objects. This approach prevents an object from being discarded prematurely.

As a collection algorithm, reference counting tracks, for each object, a count of the number of references to it held by other objects. If an object's reference count reaches zero, the object has become inaccessible, and can be destroyed.

When an object is destroyed, any objects referenced by that object also have their reference counts decreased. Because of this, removing a single reference can potentially lead to a large number of objects being freed. A common modification allows reference counting to be made incremental: instead of destroying an object as soon as its reference count becomes zero, it is added to a list of unreferenced objects, and periodically (or as needed) one or more items from this list are destroyed.

Simple reference counts require frequent updates. Whenever a reference is destroyed or overwritten, the reference count of the object it references is decremented, and whenever one is created or copied, the reference count of the object it references is incremented.

One primary motivation for reference counting in COM is to enable interoperability across different programming languages and runtime systems. A client need only know how to invoke object methods in order to manage object life cycle; thus, the client is completely abstracted from whatever memory allocator the implementation of the COM object uses. As a typical example, a Visual Basic program using a COM object is agnostic towards whether that object was allocated (and must later be deallocated) by a C++ allocator or another Visual Basic component.

C++ does not perform reference-counting by default, fulfilling its philosophy of not adding functionality that might incur overheads where the user has not explicitly requested it. Objects that are shared but not owned can be accessed via a reference, raw pointer, or iterator (a conceptual generalisation of pointers).

In addition, C++11's move semantics further reduce the extent to which reference counts need to be modified by removing the deep copy normally used when a function returns an object, as it allows for a simple copy of the pointer of said object.

Some of the reasons reference counting may have been preferred to other forms of garbage collection in Delphi include:

Perl also uses reference counting, without any special handling of circular references, although (as in Cocoa and C++ above), Perl does support weak references, which allows programmers to avoid creating a cycle.

Xojo also uses reference counting, without any special handling of circular references, although (as in Cocoa and C++ above), Xojo does support weak references, which allows programmers to avoid creating a cycle.