Answered Mar 23, 2009 · 17 votes
There are many forms of leaks:
- Unmanaged leaks (code that allocates unmanaged code)
- Resource leaks (code that allocates and uses unmanaged resources, like files, sockets)
- Extended lifetime of objects
- Incorrect understanding of how GC and .NET memory management works
- Bugs in the .NET runtime
The first two is usually handled by two different pieces of code:
- Implementing IDisposable on the object and disposing of the unmanaged memory/resource in the Dispose method
- Implementing a finalizer, to make sure unmanaged resources are deallocated when GC has found the object to be eligible for collection
The third, however, is different.
Say you are using a big list holding thousands of objects, totalling a significant size of memory. If you keep around a reference to this list for longer than you need to, you will have what looks like a memory leak. In addition, if you keep adding to this list, so that it grows with more data periodically, and old data is never reused, you definitely have a memory leak.
One source of this I've seen frequently is to attach methods to event handlers but forget to unregister them when you're done, slowly bloating the event handler both in size and code to execute.
The fourth, an incorrect understanding of how .NET memory management works can mean that you look at the memory usage in a process viewer and notice that your app keeps growing in memory usage. If you have lots and lots of memory available, GC might not run that often, giving you an incorrect picture of the current usage of memory, as opposed to mapped memory.
The fifth, that's harder, I've only seen one resource management bug in .NET so far and afaik it has been slated for fix in .NET 4.0, it was with copying the desktop screen into a .NET image.
Edit: In response to the question in the comments, how to avoid keeping references longer than necessary, then the only way to do that is to to just that.
First, if you have a long-running method (for instance, it might be processing files on disk, or downloading something, or similar), and you used a reference to a big data-structure early in the method, before the long-running part, and then you don't use that data structure for the rest of the method, then .NET, in release-builds (and not running under a debugger) is smart enough to know that this reference, though it is held in a variable that is technically in scope, is eligible for garbage collection. The garbage collector is really aggressive in this respect. In debug-builds and running under a debugger, it will keep the reference for the life of the method, in case you want to inspect it when stopped at a breakpoint.
However, if the reference is stored in a field reference in the class where the method is declared, it's not so smart, since it's impossible to determine whether it will be reused later on, or at least very very hard. If this data structure becomes unnecessary, you should clear the reference you're holding to it, so that GC will pick it up later.
Answered Apr 01, 2009 · 78 votes
Bear in mind that I speak ASFAC++B. :) I've put the most important differentiating factor first.
Garbage Collection (GC) is the single most important factor in differentiating between these languages.
While C and C++ can be used with GC, it is a bolted-on afterthought and cannot be made to work as well (the best known is here) - it has to be "conservative" which means that it cannot collect all unused memory.
C# is designed from the ground up to work on a GC platform, with standard libraries also designed that way. It makes an absolutely fundamental difference to developer productivity that has to be experienced to be believed.
There is a belief widespread among C/C++ users that GC equates with "bad performance". But this is out-of-date folklore (even the Boehm collector on C/C++ performs much better than most people expect it to). The typical fear is of "long pauses" where the program stops so the GC can do some work. But in reality these long pauses happen with non-GC programs, because they run on top of a virtual memory system, which occasionally interrupts to move data between physical memory and disk.
There is also widespread belief that GC can be replaced with shared_ptr, but it can't; the irony is that in a multi-threaded program, shared_ptr is slower than a GC-based system.
There are environments that are so frugal that GC isn't practical - but these are increasingly rare. Cell phones typically have GC. The CLR's GC that C# typically runs on appears to be state-of-the-art.
Since adopting C# about 18 months ago I've gone through several phases of pure performance tuning with a profiler, and the GC is so efficient that it is practically invisible during the operation of the program.
GC is not a panacea, it doesn't solve all programming problems, it only really cleans up memory allocation, if you're allocating very large memory blocks then you will still need to take some care, and it is still possible to have what amounts to a memory leak in a sufficiently complex program - and yet, the effect of GC on productivity makes it a pretty close approximation to a panacea!
C++ is founded on the notion of undefined behaviour. That is, the language specification defines the outcome of certain narrowly defined usages of language features, and describes all other usages as causing undefined behaviour, meaning in principle that the operation could have any outcome at all (in practice this means hard-to-diagnose bugs involving apparently non-deterministic corruption of data).
Almost everything about C++ touches on undefined behaviour. Even very nice forthcoming features like lambda expressions can easily be used as convenient way to corrupt the stack (capture a local by reference, allow the lambda instance to outlive the local).
C# is founded on the principle that all possible operations should have defined behaviour. The worst that can happen is an exception is thrown. This completely changes the experience of software construction.
(There's unsafe mode, which has pointers and therefore undefined behaviour, but that is strongly discouraged for general use - think of it as analogous to embedded assembly language.)
In terms of complexity, C++ has to be singled out, especially if we consider the very-soon-to-be standardized new version. C++ does absolutely everything it can to make itself effective, short of assuming GC, and as a result it has an awesome learning curve. The language designers excuse much of this by saying "Those features are only for library authors, not ordinary users" - but to be truly effective in any language, you need to build your code as reusable libraries. So you can't escape.
On the positive side, C++ is so complex, it's like a playground for nerds! I can assure you that you would have a lot of fun learning how it all fits together. But I can't seriously recommend it as a basis for productive new work (oh, the wasted years...) on mainstream platforms.
C keeps the language simple (simple in the sense of "the compiler is easy to write"), but this makes the coding techniques more arcane.
Note that not all new language features equate with added complexity. Some language features are described as "syntactic sugar", because they are shorthand that the compiler expands for you. This is a good way to think of a great deal of the enhancements to C# over recent years. The language standard even specifies some features by giving the translation to longhand, e.g. using statement expands into try/finally.
At one point, it was possible to think of C++ templates in the same way. But they've since become so powerful that they are now form the basis of a whole separate dimension of the language, with its own enthusiastic user communities and idioms.
The strangest thing about C and C++ is that they don't have a standard interchangeable form of pre-compiled library. Integrating someone else's code into your project is always a little fiddly, with obscure decisions to be made about how you'll be linking to it.
Also, the standard library is extremely basic - C++ has a complete set of data structures and a way of representing strings (std::string), but that's still minimal. Is there a standard way of finding a list of files in a directory? Amazingly, no! Is there standard library support for parsing or generating XML? No. What about accessing databases? Be serious! Writing a web site back-end? Are you crazy? etc.
So you have to go hunting further afield. For XML, try Xerces. But does it use std::string to represent strings? Of course not!
And do all these third-party libraries have their own bizarre customs for naming classes and functions? You betcha!
The situation in C# couldn't be more different; the fundamentals were in place from the start, so everything inter-operates beautifully (and because the fundamentals are supplied by the CLR, there is cross-language support).
It's not all perfect; generics should have been in place from the start but wasn't, which does leave a visible scar on some older libraries; but it is usually trivial to fix this externally. Also a number of popular libraries are ported from Java, which isn't as good a fit as it first appears.
Closures (Anonymous Methods with Local Variable Capture)
Java and C are practically the last remaining mainstream languages to lack closures, and libraries can be designed and used much more neatly with them than without (this is one reason why ported Java libraries sometimes seem clunky to a C# user).
The amusing thing about C++ is that its standard library was designed as if closures were available in the language (container types, <algorithm>, <functional>). Then ten years went by, and now they're finally being added! They will have a huge impact (although, as noted above, they leak underfined behaviour).
C# and JavaScript are the most widely used languages in which closures are "idiomatically established". (The major difference between those languages being that C# is statically typed while JavaScript is dynamically typed).
I've put this last only because it doesn't appear to differentiate these languages as much as you might think. All these languages can run on multiple OSes and machine architectures. C is the most widely-supported, then C++, and finally C# (although C# can be used on most major platforms thanks to an open source implementation called Mono).
My experience of porting C++ programs between Windows and various Unix flavours was unpleasant. I've never tried porting anything very complex in C# to Mono, so I can't comment on that.
Answered Mar 28, 2009 · 15 votes
C is the bare-bones, simple, clean language that makes you do everything yourself. It doesn't hold your hand, it doesn't stop you from shooting yourself in the foot. But it has everything you need to do what you want.
C++ is C with classes added, and then a whole bunch of other things, and then some more stuff. It doesn't hold your hand, but it'll let you hold your own hand, with add-on GC, or RAII and smart-pointers. If there's something you want to accomplish, chances are there's a way to abuse the template system to give you a relatively easy syntax for it. (moreso with C++0x). This complexity also gives you the power to accidentally create a dozen instances of yourself and shoot them all in the foot.
C# is Microsoft's stab at improving on C++ and Java. Tons of syntactical features, but no where near the complexity of C++. It runs in a full managed environment, so memory management is done for you. It does let you "get dirty" and use unsafe code if you need to, but it's not the default, and you have to do some work to shoot yourself.