In an attempt to improve string-handling performance, you have converted your code to use the StringBuilder class. However, this change has not improved performance as much as you had hoped.
The chief advantage of a StringBuilder object over a string object is that it preallocates a default initial amount of memory in an internal buffer in which a string value can expand and contract. When that memory is used up, however, .NET must allocate new memory for this internal buffer.
You can reduce how often this happens by explicitly defining the size of the internal buffer, using either of two techniques. The first approach is to set this value when the StringBuilder class constructor is called. For example, the code:
StringBuilder sb = new StringBuilder(200);
specifies that a StringBuilder object can hold 200 characters before new memory must be allocated.
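A minimal sketch, assuming a modern .NET runtime, that contrasts the default capacity with an explicitly presized builder. The class name CapacityDemo and its methods are hypothetical; only the StringBuilder constructors, Append, and Capacity are standard System.Text members.

```csharp
using System.Text;

public static class CapacityDemo
{
    // Capacity of a builder created with the parameterless constructor.
    public static int DefaultCapacity() => new StringBuilder().Capacity;

    // Creates a builder with the given initial capacity, appends 'count'
    // characters in one call, and reports the resulting capacity.
    // If count <= initialCapacity, no reallocation should be needed.
    public static int CapacityAfterAppending(int initialCapacity, int count)
    {
        var sb = new StringBuilder(initialCapacity);
        sb.Append('x', count);
        return sb.Capacity;
    }
}
```

With a capacity of 200, appending 200 characters leaves the capacity at 200; starting from the default, the same append forces the buffer to grow.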
The second approach is to change the value after the StringBuilder object has been created, using one of the following properties or methods of the StringBuilder object:
sb.Capacity = 200;
sb.EnsureCapacity(200);
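The two post-construction approaches can be sketched as follows. The GrowAfterCreation class is a hypothetical wrapper; Capacity and EnsureCapacity are the standard members, and each method starts from a default-sized builder.

```csharp
using System.Text;

public static class GrowAfterCreation
{
    // Raises the capacity through the Capacity property setter.
    public static int ViaCapacityProperty()
    {
        var sb = new StringBuilder();   // starts at the default capacity
        sb.Capacity = 200;              // buffer is reallocated to hold 200 chars
        return sb.Capacity;
    }

    // Raises the capacity through EnsureCapacity, which returns the
    // capacity in effect after the call.
    public static int ViaEnsureCapacity()
    {
        var sb = new StringBuilder();
        return sb.EnsureCapacity(200);
    }
}
```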
As noted in previous recipes in this chapter, the string class is immutable: once a string object has been created, its contents cannot be changed in any way. Changing the contents of a string variable therefore entails creating a new string object that contains the modified text. The reference variable of type string must then be changed to reference this newly created string object. The old string object eventually becomes eligible for garbage collection, and its memory is subsequently freed. Because of this intensive behind-the-scenes work, code that performs heavy string manipulation with the string class suffers greatly from having to create a new string object for each modification, and the garbage collector comes under greater pressure to remove unused objects from memory more frequently.
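A short sketch of the immutability described above: "modifying" a string rebinds the variable to a new object, while any other reference still sees the old contents. ImmutabilityDemo and its methods are hypothetical names for illustration.

```csharp
public static class ImmutabilityDemo
{
    // Returns true if += produced a new string object and left the
    // original object's contents untouched.
    public static bool ConcatenationCreatesNewString()
    {
        string original = "abc";
        string alias = original;   // second reference to the same object
        original += "d";           // builds a new string, rebinds 'original'
        return alias == "abc" && !object.ReferenceEquals(alias, original);
    }

    // Naive loop: every += allocates a brand-new string and abandons
    // the previous one to the garbage collector.
    public static string BuildWithConcat(int count)
    {
        string s = string.Empty;
        for (int i = 0; i < count; i++)
            s += i.ToString();
        return s;
    }
}
```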
The StringBuilder class solves this problem by preallocating an internal buffer to hold a string. The contents of this string buffer are manipulated directly. Operations performed on a StringBuilder object do not carry the performance penalty of creating a whole new string or StringBuilder object and, consequently, do not fill up the managed heap with many unused objects.
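By contrast, Append mutates the builder in place and returns the same instance (which is what makes call chaining possible). A minimal sketch with hypothetical names:

```csharp
using System.Text;

public static class InPlaceDemo
{
    // Append returns 'this', so every call works on one object.
    public static bool AppendReturnsSameInstance()
    {
        var sb = new StringBuilder("abc");
        StringBuilder returned = sb.Append('d');
        return object.ReferenceEquals(sb, returned) && sb.ToString() == "abcd";
    }

    // Same result as a += loop, but only the final ToString call
    // produces a string object.
    public static string BuildWithBuilder(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(i);   // formats the int directly into the buffer
        return sb.ToString();
    }
}
```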
There is one caveat with using the StringBuilder class which, if not heeded, can impede performance. The StringBuilder class uses a default initial capacity to contain the characters of a string, unless you change this default initial capacity through one of the StringBuilder constructors. Once this space is exceeded (by appending characters, for instance), a new string buffer double the size of the original buffer is allocated. For example, a StringBuilder object with an initial size of 20 characters would be increased to 40 characters, then to 80 characters, and so on. The string contained in the original internal string buffer is then copied to this newly allocated internal string buffer along with any appended or inserted characters. (In .NET Framework 4 and later, the implementation instead keeps its contents in a linked series of buffer chunks, so growth adds a new chunk rather than copying the existing characters; each growth step still costs an allocation.)
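To observe the growth pattern on your own runtime, a sketch like the following records each distinct Capacity value while characters are appended one at a time. GrowthTracker is a hypothetical helper; the exact sequence depends on the runtime's implementation.

```csharp
using System.Collections.Generic;
using System.Text;

public static class GrowthTracker
{
    // Appends 'count' characters one at a time and records every
    // distinct Capacity value seen, starting with the initial one.
    public static List<int> CapacityHistory(int initialCapacity, int count)
    {
        var sb = new StringBuilder(initialCapacity);
        var history = new List<int> { sb.Capacity };
        for (int i = 0; i < count; i++)
        {
            sb.Append('x');
            if (sb.Capacity != history[history.Count - 1])
                history.Add(sb.Capacity);   // the buffer grew on this append
        }
        return history;
    }
}
```

On current .NET runtimes, CapacityHistory(20, 100) should show a roughly doubling sequence like the one described above; each entry after the first marks an allocation.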
The default capacity for a StringBuilder object is 16 characters, which in many cases is much too small. To increase this size upon object creation, the StringBuilder class has an overloaded constructor that accepts an integer value to use as the starting size of the preallocated buffer. Determining an initial size that is neither too large (thereby allocating too much unused space) nor too small (thereby incurring the performance penalty of repeatedly allocating and discarding internal buffers) may seem like more of an art than a science. However, determining the optimal size may prove invaluable when your application is tested for performance.
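When the final size can be computed up front, the builder can be sized exactly. The sketch below is a hypothetical PresizedJoin helper that sums the part lengths plus separators before constructing the builder, then reports whether the capacity changed during the appends.

```csharp
using System.Text;

public static class PresizedJoin
{
    // Joins 'parts' with 'separator' after computing the exact final
    // length, so the builder should never need to grow.
    public static (string Text, int InitialCapacity, int FinalCapacity) Join(
        string[] parts, char separator)
    {
        int total = parts.Length > 0 ? parts.Length - 1 : 0;   // separators
        foreach (string p in parts)
            total += p.Length;

        var sb = new StringBuilder(total);
        int initial = sb.Capacity;
        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) sb.Append(separator);
            sb.Append(parts[i]);
        }
        return (sb.ToString(), initial, sb.Capacity);
    }
}
```

Equal initial and final capacities indicate that no reallocation occurred while building the string.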
In cases where good values for the initial size of a StringBuilder object cannot be obtained mathematically, try running the application under a constant load while varying the initial StringBuilder size. When a good initial size is found, try varying the load while keeping this size value constant. You may discover that this value needs to be tweaked to get better performance. Keeping good records of each run, and plotting them on a graph, will be invaluable in determining the appropriate number to choose. As an added note, using PerfMon (Administrative Tools → Performance Monitor) to detect and graph the number of garbage collections that occur might also provide useful information in determining whether your initial StringBuilder size is causing too many reallocations of its internal buffer.
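For a quick in-process comparison before a full load test, a rough sketch can time a workload and count generation-0 collections with GC.CollectionCount. CapacityBenchmark is hypothetical, and a single-threaded microbenchmark like this is only a first approximation of the load testing described above.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class CapacityBenchmark
{
    // Builds 'iterations' strings of 'charsPerString' characters, each
    // with the given initial capacity, and reports the elapsed time and
    // the number of gen-0 collections that occurred meanwhile.
    public static (long ElapsedMs, int Gen0Collections) Measure(
        int initialCapacity, int iterations, int charsPerString)
    {
        int gen0Before = GC.CollectionCount(0);
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var sb = new StringBuilder(initialCapacity);
            for (int c = 0; c < charsPerString; c++)
                sb.Append('x');
        }
        timer.Stop();
        return (timer.ElapsedMilliseconds, GC.CollectionCount(0) - gen0Before);
    }
}
```

Calling Measure with several capacities (for example 16, 64, and 256) at the same iteration count produces the kind of per-run records worth graphing; the numbers will vary by machine and runtime.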
The most efficient method of setting the capacity of the StringBuilder object is to set it in the call to its constructor. The overloaded constructors of a StringBuilder object that accept a capacity value are defined as follows:
public StringBuilder(int capacity)
public StringBuilder(string value, int capacity)
public StringBuilder(int capacity, int maxCapacity)
public StringBuilder(string value, int startIndex, int length, int capacity)
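A sketch exercising the remaining overloads: seeding from a string, capping the maximum capacity, and seeding from a substring. ConstructorOverloads is a hypothetical wrapper class. Appending past MaxCapacity throws ArgumentOutOfRangeException, which the last method checks.

```csharp
using System;
using System.Text;

public static class ConstructorOverloads
{
    // Initial contents "abc" with room for 100 characters.
    public static StringBuilder FromStringWithCapacity() =>
        new StringBuilder("abc", 100);

    // Starts at 16 characters and may never grow beyond 1000.
    public static StringBuilder WithMaxCapacity() =>
        new StringBuilder(16, 1000);

    // Copies 5 characters starting at index 7 ("World"), capacity 50.
    public static StringBuilder FromSubstring() =>
        new StringBuilder("Hello, World", 7, 5, 50);

    // Returns true if growing past MaxCapacity is rejected.
    public static bool AppendPastMaxCapacityThrows()
    {
        var sb = new StringBuilder(4, 10);
        try { sb.Append(new string('x', 11)); return false; }
        catch (ArgumentOutOfRangeException) { return true; }
    }
}
```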
In addition to the constructor parameters, one property of the StringBuilder object allows its capacity to be increased (or decreased). The Capacity property gets or sets an integer value that determines the new capacity of this instance of a StringBuilder object. Note that the Capacity property cannot be set to a value less than the Length property (or greater than the MaxCapacity property); attempting to do so throws an ArgumentOutOfRangeException.
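The Length constraint on Capacity can be checked directly. In this sketch (hypothetical CapacityRules class), shrinking below the current length throws, while shrinking to exactly the length succeeds.

```csharp
using System;
using System.Text;

public static class CapacityRules
{
    // Returns true if setting Capacity below Length is rejected.
    public static bool ShrinkBelowLengthThrows()
    {
        var sb = new StringBuilder("0123456789");   // Length == 10
        try { sb.Capacity = 5; return false; }
        catch (ArgumentOutOfRangeException) { return true; }
    }

    // Trimming the buffer to exactly the current Length is allowed.
    public static int ShrinkToLength()
    {
        var sb = new StringBuilder("0123456789", 100);
        sb.Capacity = sb.Length;
        return sb.Capacity;
    }
}
```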
A second way to change the capacity is through the EnsureCapacity method, which is defined as follows:
public int EnsureCapacity(int capacity)
This method returns the new capacity for this object. If the capacity of the existing object already meets or exceeds the value in the capacity parameter, the current capacity is retained, and this value is also returned by this method.
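A sketch of the difference between the two members when the requested value is smaller than the current capacity: EnsureCapacity leaves the buffer alone and returns the existing capacity, whereas the Capacity setter actually shrinks it. EnsureVsCapacity is a hypothetical name.

```csharp
using System.Text;

public static class EnsureVsCapacity
{
    // EnsureCapacity only grows; 300 already exceeds 100, so nothing changes.
    public static (int Returned, int After) EnsureSmaller()
    {
        var sb = new StringBuilder(300);
        int returned = sb.EnsureCapacity(100);
        return (returned, sb.Capacity);
    }

    // The Capacity setter can also reduce the capacity (down to Length).
    public static int SetSmaller()
    {
        var sb = new StringBuilder(300);
        sb.Capacity = 100;
        return sb.Capacity;
    }
}
```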
There is one problem with using these last two members. If either of them increases the capacity of the StringBuilder object by even a single character, the internal buffer used to store the string has to be reallocated. Reducing the capacity, however, does not require a new, larger internal string buffer. These members are therefore best reserved for exceptional cases in which the StringBuilder capacity needs an extra boost, so that fewer reallocations are performed in the long run.
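One such exceptional case is a known batch of appends. The hypothetical AppendBatch helper below totals the pending characters, calls EnsureCapacity once (at most one reallocation), and then confirms that the appends themselves caused no further growth.

```csharp
using System.Text;

public static class BatchAppend
{
    // Grows the buffer once, up front, for a batch of known size.
    // Returns true if the capacity did not change while appending.
    public static bool AppendBatch(StringBuilder sb, string[] items)
    {
        int pending = 0;
        foreach (string item in items)
            pending += item.Length;

        sb.EnsureCapacity(sb.Length + pending);   // single boost
        int capacityBefore = sb.Capacity;

        foreach (string item in items)
            sb.Append(item);
        return sb.Capacity == capacityBefore;
    }
}
```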
The StringBuilder object also contains a Length property which, if increased, pads the end of the existing StringBuilder object's string with null characters (U+0000). If the Length is decreased, characters are truncated from the end of the StringBuilder object's string. Increasing the Length property can increase the Capacity property, but only as a side effect: if the Length property is increased beyond the value of the Capacity property, the Capacity property grows to at least the new value of the Length property. Used this way, the Length property acts similarly to the Capacity property, except that the padding characters become part of the string:
sb.Length = 200;
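A sketch of the three Length behaviors just described: padding with '\0', truncation, and the capacity side effect. LengthDemo is a hypothetical class name.

```csharp
using System.Text;

public static class LengthDemo
{
    // Increasing Length pads with U+0000, not spaces.
    public static string PadViaLength()
    {
        var sb = new StringBuilder("abc");
        sb.Length = 5;
        return sb.ToString();   // "abc\0\0"
    }

    // Decreasing Length truncates from the end.
    public static string TruncateViaLength()
    {
        var sb = new StringBuilder("abcdef");
        sb.Length = 2;
        return sb.ToString();   // "ab"
    }

    // Raising Length past Capacity forces Capacity up to at least Length.
    public static (int Length, int Capacity) GrowPastCapacity()
    {
        var sb = new StringBuilder(4);
        sb.Length = 200;
        return (sb.Length, sb.Capacity);
    }
}
```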
The string and StringBuilder objects are considered nonblittable, which means that they must be converted when marshaled across any managed/unmanaged boundaries in your code. The reason is that strings have multiple ways of being represented in unmanaged code, and there is no one-to-one correlation between these representations in unmanaged and managed code. In contrast, types such as byte, sbyte, short, ushort, int, uint, long, ulong, IntPtr, and UIntPtr are blittable types and do not require conversion between managed and unmanaged code. One-dimensional arrays of these blittable types, as well as structures or classes containing only blittable types, are also considered blittable and do not need extra conversion when passed between managed and unmanaged code.
The string and StringBuilder objects take more time to marshal, due to conversion between managed and unmanaged types. Performance will be improved when calling unmanaged code through P/Invoke methods if only blittable types are used. Consider using a byte array instead of a string or StringBuilder object, if at all possible.
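A sketch of the byte-array approach, assuming a hypothetical native library named "hypothetical_native" that exports int fill_name(unsigned char* buffer, int size); neither name refers to a real library. The byte[] is passed without string conversion, and the encoding and decoding happen explicitly in managed code. Only the two conversion helpers can run without that library.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class NativeBufferInterop
{
    // Hypothetical export; placeholder names for illustration only.
    // A byte[] is blittable, so it is pinned rather than converted.
    [DllImport("hypothetical_native", CallingConvention = CallingConvention.Cdecl)]
    private static extern int fill_name([Out] byte[] buffer, int size);

    // Calls the hypothetical native function and decodes its output.
    public static string GetNameFromNative(int size)
    {
        var buffer = new byte[size];
        fill_name(buffer, buffer.Length);
        return FromNullTerminated(buffer);
    }

    // Encodes a string as NUL-terminated ASCII for native input.
    public static byte[] ToNullTerminatedAscii(string s)
    {
        var bytes = new byte[Encoding.ASCII.GetByteCount(s) + 1];   // last byte stays 0
        Encoding.ASCII.GetBytes(s, 0, s.Length, bytes, 0);
        return bytes;
    }

    // Decodes bytes up to the first NUL, or the whole array if none.
    public static string FromNullTerminated(byte[] buffer)
    {
        int end = Array.IndexOf(buffer, (byte)0);
        if (end < 0) end = buffer.Length;
        return Encoding.ASCII.GetString(buffer, 0, end);
    }
}
```

ASCII is an assumption here; a real native API dictates its own encoding, so pick the Encoding that matches it.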