StringBuffer re-visited
Some time back, I mentioned a tip I picked up about using StringBuffer rather than endlessly concatenating String objects thus:
String newString = "widgets" + 4 + "cola" + "general mincery";
All very well, and given that String is an object, makes perfect sense. But actually reading the API documentation about StringBuffer the other day yielded this wee nugget of information:
String buffers are used by the compiler to implement the binary string concatenation operator +. For example, the code:
x = "a" + 4 + "c"
is compiled to the equivalent of:
x = new StringBuffer().append("a").append(4).append("c").toString()
which creates a new string buffer (initially empty), appends the string representation of each operand to the string buffer in turn, and then converts the contents of the string buffer to a string. Overall, this avoids creating many temporary strings.
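As a rough illustration of what the documentation describes, here is a minimal sketch showing that the two forms produce the same result. The class and method names are made up for the example, and an int parameter is used instead of the literal 4 so that the expression is not a compile-time constant (which the compiler would simply fold into one literal). Other JDK versions may implement the + operator differently under the hood, but the resulting String is the same.

```java
// Hypothetical demo class; the names are not from the original post.
final class ConcatEquivalence {

    // What you write: one concatenation expression.
    static String viaOperator(int n) {
        return "a" + n + "c";
    }

    // Roughly what the API documentation says the expression is compiled to.
    static String viaBuffer(int n) {
        return new StringBuffer().append("a").append(n).append("c").toString();
    }

    public static void main(String[] args) {
        // Both forms build the same text.
        System.out.println(viaOperator(4).equals(viaBuffer(4))); // prints true
    }
}
```

So for a single expression like this, writing the StringBuffer out by hand buys you nothing except extra typing.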
So now I'm not so sure about the validity of the original tip (bar saving the compiler some work). Anyone else care to chip in?

You should use StringBuffer explicitly in two circumstances:
(1) You are looping over the string concatenation, passing the buffer between methods, or using some other structure the compiler isn't smart enough to optimise around.
(2) You are creating a very large string in a part of the code where performance is critical, and you want to avoid the StringBuffer having to resize its internal buffer by explicitly setting the buffer size in its constructor.
String res = "";
for( .... ) {
res += ...
}
This is easily 100 to 1000 times slower than the StringBuffer equivalent, even with the JDK1.4 generational GC.
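To make the two circumstances concrete, here is a small sketch of the loop case written both ways. The class name, method names and the capacity estimate are assumptions for illustration only; the point is that the String version builds a fresh String on every pass, while the StringBuffer version appends into a single buffer that was sized once up front.

```java
// Hypothetical demo class; names and the capacity guess are illustrative only.
final class LoopConcat {

    // Every += creates a new String and copies everything built so far.
    static String withString(int count) {
        String res = "";
        for (int i = 0; i < count; i++) {
            res += i;
            res += ',';
        }
        return res;
    }

    // One buffer, appended to in place.
    static String withBuffer(int count) {
        // Assumed estimate of about 6 chars per entry, so the internal
        // array rarely (if ever) has to be resized during the loop.
        StringBuffer sb = new StringBuffer(Math.max(16, count * 6));
        for (int i = 0; i < count; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }
}
```

Both methods return identical text, so you can time them against each other with a large count on your own JVM to see how big the gap is for you.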
Don't forget that a StringBuffer is mutable (changeable), so you can pass it into a method and its contents can be changed. The change will be visible outside the method too. You cannot do this with Strings, as they are immutable (unchangeable). This is not passing a parameter by reference, which Java does not do; it's passing a copy of the reference.
So if several methods need to modify a very large amount of text, pass a StringBuffer between them and there will be only one copy of the text in memory. I think...
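A quick sketch of the difference, with made-up class and method names: appending through the passed-in reference is visible to the caller, whereas reassigning the parameter (for either a String or a StringBuffer) only changes the method's own copy of the reference.

```java
// Hypothetical demo class; the method names are illustrative only.
final class PassingDemo {

    // Mutates the object the caller also refers to.
    static void appendTo(StringBuffer sb) {
        sb.append(" world");
    }

    // Only rebinds the local parameter; the caller's String is untouched.
    static void tryToChange(String s) {
        s = s + " world";
    }

    // Also only rebinds the local parameter: the reference is passed by value.
    static void reassign(StringBuffer sb) {
        sb = new StringBuffer("replaced");
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("hello");
        appendTo(sb);
        reassign(sb);
        System.out.println(sb); // prints hello world

        String s = "hello";
        tryToChange(s);
        System.out.println(s); // prints hello
    }
}
```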
I was just reading up on this subject the other night (which means, I think, I'm becoming a completely hopeless raging nerd).
Here is the caveat:
Let's say you create a StringBuffer "sb". You loop around, mutate your chars however you want. Cool. It is very efficient for this. As Mr. Shirazi says, there are no "intermediate objects" being created. Then you pop the results into a String -> sb.toString();
After this, your program does some more stuff and the StringBuffer gets mutated again. Because the String returned by toString() still holds a reference to the previous char array, the StringBuffer first has to copy that array and then point to the new copy. The more often this happens, the higher the overhead. In other words, do all the mutating you can before calling toString(). Of course, you might need several different Strings along the way, and there may be good reasons for that, but it is something to be aware of at least.
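Here is a small sketch of the pattern being described, with hypothetical names. Whatever the JVM does internally (whether the char array is copied lazily on the next mutation, as described above, or eagerly inside toString() is an implementation detail that can differ between JDK versions), the visible behaviour is that each String is a fixed snapshot and every extra snapshot costs a copy somewhere.

```java
// Hypothetical demo class; names are illustrative only.
final class SnapshotDemo {

    // Takes a snapshot, mutates the buffer again, then takes a second snapshot.
    static String[] snapshots() {
        StringBuffer sb = new StringBuffer();
        sb.append("first");
        String a = sb.toString();   // snapshot 1

        sb.append(" second");       // the buffer changes, snapshot 1 must not
        String b = sb.toString();   // snapshot 2

        return new String[] { a, b };
    }

    public static void main(String[] args) {
        String[] result = snapshots();
        System.out.println(result[0]); // prints first
        System.out.println(result[1]); // prints first second
    }
}
```

If only the final text is needed, calling toString() once at the end avoids the intermediate snapshot altogether.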
I'm still sorting all this out myself. See Shirazi's book... it's really fantastic. There is an example in the book where he compares methods that perform a word count on a text file, one using straight-up char arrays and one using StringTokenizer. It turns out that the method that uses StringTokenizer ends up creating 1.2 million objects (in his example). The code using the char array takes less than 1% of the time to do the same task because it isn't creating objects (that have to be garbage collected, take up memory, etc.) along the way.
What underlies all of this is a good lesson for Java: objects are expensive, and programs that create lots of objects are really expensive.
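To give a feel for the comparison, here is a rough sketch of the two approaches. This is not the code from the book, just an assumed reconstruction with made-up names: the tokenizer version creates a new String for every word it hands back, while the char array version only flips a boolean and bumps a counter.

```java
import java.util.StringTokenizer;

// Hypothetical sketch, not the book's code; names are illustrative only.
final class WordCount {

    // Creates one String object per word via nextToken().
    static int withTokenizer(String text) {
        StringTokenizer st = new StringTokenizer(text);
        int count = 0;
        while (st.hasMoreTokens()) {
            st.nextToken();
            count++;
        }
        return count;
    }

    // Walks the chars directly and allocates nothing per word.
    static int withCharArray(char[] chars) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            // Same delimiters StringTokenizer uses by default.
            boolean delim = c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
            if (delim) {
                inWord = false;
            } else if (!inWord) {
                inWord = true;
                count++;
            }
        }
        return count;
    }
}
```

Both methods should agree on the count for the same text, for example `WordCount.withCharArray(text.toCharArray())` against `WordCount.withTokenizer(text)`; how far apart they are in time is something to measure on your own data.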
Donald Knuth said:
I think this thread is instructive; in most cases this kind of string concatenation isn't going to cripple your application. If it does, you can always whip out your handy dandy profiler and see which portion of your code is causing the problem.
I've done Java code in Domino where I worried about stuff like StringBuffer optimization, but the profiler showed that repeated calls to Database.getView were causing the performance problems.
To paraphrase Knuth:
learned that at one of those fancy conventions they send us to. i'm not sure where they get the numbers from, but it seems to make sense anyway.
But having said that, I still take slight exception to the notion that these things don't matter. Sure, you see some travesties, like overly-complicated algorithms, throwing assembler in the wrong places, and generally really fscking the cat trying to save a few processor cycles.
In contrast, I've occasionally tried to get answers to very simple 'which is a more efficient way of doing it' questions, and had the 'don't worry about optimisation' response. I suspect that answer is occasionally given because the person answering doesn't know the real answer and doesn't want to admit it. (I'm not saying that's the case with you so please don't read too much into it). I've also been on the other side and expressed my frustration with colleagues by saying "who cares, just write the damned code cleanly and stop fluffing on about a few microseconds!"
The REAL issue and reason for asking though, is that there is bound to be a Right Way and a Wrong Way. Or at the least, a Better Way and a Slightly Less Better Way. The difference in performance might be negligible, and equally, the difference in effort or readability might be negligible too. And yes, if that's the case, why bother? My response: in such cases, why not learn the Right/Better Way, and use it? Firstly, there is satisfaction gained in Doing The Right Thing, and secondly, knowing which is the better way and why, often leads to a better understanding of the nature and guts of the platform/language you're using. There's no evil in that, and often a deepened understanding of your platform *will* make a difference to your code somewhere down the line.
Knuth's opinion on performance and the like is also espoused in that old benpoole.com favourite, The Pragmatic Programmer, and makes perfect sense. Whilst logic dictates we should abide by the 80/20 rule, keeping an eye on the small stuff can't hurt, which is what the original post was all about: a tiny detail in a huge system can make all the difference, I think.
But what do I know?