Andrew Lauritzen created a DirectX 10 demo that shows off variance shadow maps. To run it you need Windows Vista and a DirectX 10-capable graphics card such as the GeForce 8800.
Shadow MSAA makes a *huge* difference in motion (use the "animate light" checkbox). It really has to be seen to be believed. Without MSAA, some swimming remains visible even with very large minimum filter widths, whereas even 4x MSAA drastically reduces or eliminates it.
int32 is also really awesome for summed-area tables, and is the preferred implementation. Two things make this the case: the extra bits of precision compared to fp32, and D3D10's overflow behavior. Because integer overflow wraps around in D3D10, only log2(W*H) bits of the SAT need to be sacrificed for accumulation, where WxH is the maximum filter size. This maximum can be bounded fairly conservatively (e.g., 64x64 is plenty, and probably overkill for most implementations). With int32, numeric precision is once again a non-issue, and a lot of memory bandwidth is saved since there's no need to distribute precision across four components.
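The wraparound argument can be checked with a small sketch. This is a minimal Python illustration, not the demo's code: it assumes non-negative texel values already quantized to fixed point, a hypothetical 256x256 map, and a 64x64 maximum filter, so log2(64*64) = 12 bits go to accumulation and 20 remain per texel. Python integers never overflow, so every addition is masked to 32 bits to stand in for D3D10's wrapping int32 arithmetic (modular addition gives the same bits whether they are read as signed or unsigned). The stored SAT entries overflow many times over, yet a box sum is a difference of four entries, so the wraparound cancels whenever the true sum over the box fits in 32 bits.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit registers that wrap on overflow

WIDTH, HEIGHT = 256, 256  # hypothetical shadow-map size
MAX_FILTER = 64           # largest box ever queried: 64x64 texels
ACCUM_BITS = (MAX_FILTER * MAX_FILTER).bit_length() - 1  # log2(64*64) = 12
VALUE_BITS = 32 - ACCUM_BITS                             # 20 bits left per texel


def build_sat(src, w, h):
    """Inclusive summed-area table with every addition wrapped modulo 2**32."""
    sat = [0] * (w * h)
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum = (row_sum + src[y * w + x]) & MASK32
            above = sat[(y - 1) * w + x] if y > 0 else 0
            sat[y * w + x] = (row_sum + above) & MASK32
    return sat


def box_sum(sat, w, x0, y0, x1, y1):
    """Sum over the inclusive rectangle [x0, x1] x [y0, y1] from four SAT lookups."""
    def at(x, y):
        return sat[y * w + x] if x >= 0 and y >= 0 else 0
    # The wrapped terms cancel modulo 2**32.
    return (at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1)) & MASK32


# Fill every texel with the largest 20-bit value so the SAT itself wraps repeatedly.
max_value = (1 << VALUE_BITS) - 1
src = [max_value] * (WIDTH * HEIGHT)
sat = build_sat(src, WIDTH, HEIGHT)

# A 64x64 query far from the origin, where the stored SAT entries have overflowed.
x0, y0 = 150, 170
x1, y1 = x0 + MAX_FILTER - 1, y0 + MAX_FILTER - 1
exact = sum(src[y * WIDTH + x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1))
print(box_sum(sat, WIDTH, x0, y0, x1, y1), exact)  # 4294963200 4294963200
```

Shrinking the maximum filter size frees accumulation bits for per-texel precision, which is exactly the trade-off that bounding W and H controls.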
Parallel-split variance shadow maps are also really cool, especially with the new, larger "convoy" scene. Three 512x512 variance shadow map splits with 4x MSAA and a bit of blurring look fantastic and perform excellently, and the quality can go up from there if necessary. Note that this implementation is relatively unoptimized; it's more of a "proof of concept". In particular, the shadow split locations could be chosen much more sensibly, and even basic frustum culling would greatly speed up rendering the individual shadow map splits.
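As one example of choosing split locations more deliberately, the "practical split scheme" from the original parallel-split shadow map work blends logarithmic and uniform split distances. It is a common choice, not necessarily what this demo uses. The helper below is hypothetical: `near` and `far` are assumed view-frustum clip distances, and `blend` weights the logarithmic term.

```python
def practical_split_distances(near, far, num_splits, blend=0.5):
    """Return num_splits + 1 view-space distances bounding each split.

    blend = 1.0 gives purely logarithmic spacing, blend = 0.0 purely uniform.
    """
    if not (0.0 < near < far) or num_splits < 1:
        raise ValueError("need 0 < near < far and at least one split")
    distances = []
    for i in range(num_splits + 1):
        t = i / num_splits
        log_d = near * (far / near) ** t       # logarithmic split distance
        uniform_d = near + (far - near) * t    # uniform split distance
        distances.append(blend * log_d + (1.0 - blend) * uniform_d)
    return distances


# Three splits, matching the three shadow maps above (clip distances are made up).
print(practical_split_distances(1.0, 1000.0, 3))  # approximately 1, 172, 383.5, 1000
```

Logarithmic spacing concentrates resolution near the viewer, where shadow texels are magnified the most, while the uniform term keeps the nearest splits from becoming too thin.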