Yup, alpha always has to be set after any depth channel effects because depth channels are strictly binary (occlude or be occluded) and obliterate any material/texture-based alpha upstream in the noodle. That’s why AllZ works in the first case… it keeps any depth (binary) information out of the render layer at render time, resulting in a binary “false” or vanishing function.
In the case of the grass, however, you only have alpha occlusion of the image planes, so the binary info being pumped to the defocus node is making a mess of things. This is where things get interesting. The reason the trees look better in the first image is the image’s Sample Filter Type, which can be found on the Render tab. These filters have an inverse effect when FSA is used (gauss becomes the sharpest filter while catrom becomes the blurriest… the polar opposite of what you get with OSA). Also, setting the filter’s value to 0.50, its lowest setting, nearly always yields the best results regardless of OSA/FSA usage.
With FSA it doesn’t make any difference what type of alpha channel you use either, because even with a premultiplied alpha type (sky or premul) the alpha is ALWAYS composited last, resulting in a Keyed or straight alpha. This is because FSA is a series of non-aliased images (regardless of channel type, e.g. alpha, RGB, or Z), meaning that it is totally impossible to multiply any background color into the matte. The final product of every channel type is ultimately premultiplied as the final result of the composite.
As a result, FSA level 5 will take 5 times longer to composite (post render action) than any level of OSA. The actual render time is the same for either aliasing scheme given the same number of samples, but FSA does take longer to advance from frame to frame because Save Buffers is a disk caching routine. That means you need a minimum of 5 times more overhead (at the lowest FSA level) to write the render tiles to disk than you would with an OSA render that also has Save Buffers enabled, because OSA only writes a single image to disk. All OSA sampling gets done old school Blender style, in RAM.
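For anyone fuzzy on the straight (Key) versus premultiplied distinction mentioned above, here's a minimal Python sketch of the arithmetic. It's plain math, not Blender's actual code, and the function names and the sample pixel are made up for illustration.

```python
def premultiply(rgb, alpha):
    """Straight (Key) color -> premultiplied color: scale RGB by alpha."""
    return tuple(c * alpha for c in rgb)


def unpremultiply(rgb, alpha):
    """Premultiplied color -> straight color; returns black where alpha is 0."""
    if alpha == 0:
        return (0.0, 0.0, 0.0)
    return tuple(c / alpha for c in rgb)


# Hypothetical edge pixel: pure red foreground covering half the pixel.
straight_rgb, alpha = (1.0, 0.0, 0.0), 0.5
premul_rgb = premultiply(straight_rgb, alpha)        # RGB darkened by coverage
recovered_rgb = unpremultiply(premul_rgb, alpha)     # back to the straight color
print(premul_rgb, recovered_rgb)
```

With a straight alpha the RGB stays at full strength and only the alpha channel carries the coverage, which is why nothing from the background can leak into the edge color until the final composite.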
Essentially, sampling is just a series of ever so slightly skewed camera shots which get recombined along the same lines that your brain combines the images from each of your eyes into a single stereo view. Since FSA does this with a minimum of 5 samples, and because it does this as the last step in the composite, the Blender devs were able to truly antialias those horrible depth maps into a smoothly averaged final result.
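To see why averaging fixes a binary mask, here's a toy Python sketch. It assumes the samples are combined with a plain per-pixel average (Blender's real filter weighting may well differ), and the five 1x3 masks are invented for the example.

```python
def average_samples(samples):
    """Average per-pixel values across a list of equally sized sample images."""
    count = len(samples)
    height = len(samples[0])
    width = len(samples[0][0])
    return [
        [sum(s[y][x] for s in samples) / count for x in range(width)]
        for y in range(height)
    ]


# Five hypothetical jittered samples of a binary (occlude / don't occlude) mask.
# The middle pixel sits on an edge, so it flips between samples.
samples = [
    [[1, 1, 0]],
    [[1, 1, 0]],
    [[1, 0, 0]],
    [[1, 1, 0]],
    [[1, 0, 0]],
]
smoothed = average_samples(samples)
print(smoothed)
```

Each individual sample is still strictly 0 or 1, but the edge pixel ends up with a fractional value once the samples are averaged, which is the smoothing described above.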
As for where Blender stores these samples, you can set the file path in your user preferences (drag down the header at the top of the screen). The default path is either /tmp\ or C:\tmp depending on the OS you use. These saved buffer files are multipass OpenEXR files saved to disk as full 32 bits per channel floating point color. This makes them MASSIVE. If you have a file utilizing 5 to 10 scenes, one render layer per scene, an average of 5 passes per render layer, and you render in full on HD, you can run well into the several gigabyte range of disk space for these saved buffers… all to composite a single HD frame.
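Here's a rough back-of-the-envelope version of that estimate in Python. The assumptions are mine, not Blender's: every pass is treated as 4 channels (RGBA), the files are uncompressed, full HD means 1920x1080, and all 5 FSA sample buffers sit on disk at the same time. Real numbers will vary with pass types and EXR compression.

```python
def estimate_buffer_bytes(width, height, scenes, layers_per_scene,
                          passes_per_layer, samples,
                          channels_per_pass=4, bytes_per_channel=4):
    """Rough uncompressed size of saved buffers for one frame, in bytes."""
    pixels = width * height
    per_pass = pixels * channels_per_pass * bytes_per_channel  # 32-bit float
    return per_pass * passes_per_layer * layers_per_scene * scenes * samples


# 5 and 10 scenes, one render layer each, 5 passes per layer, FSA level 5.
low = estimate_buffer_bytes(1920, 1080, scenes=5, layers_per_scene=1,
                            passes_per_layer=5, samples=5)
high = estimate_buffer_bytes(1920, 1080, scenes=10, layers_per_scene=1,
                             passes_per_layer=5, samples=5)
print(f"{low / 1e9:.2f} GB to {high / 1e9:.2f} GB")
```

Under those assumptions a single frame lands at roughly 4 to 8 GB, which lines up with the several gigabyte figure.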
I don’t mean to imply that z-masking is useless either. It just seems a bit unintuitive to me, always feeling like I’m trying to give myself a mohawk while using 2 mirrors cuz I always get it backward. The main usage seems to be advertised as a way to overcome the inevitable seams that are nearly impossible to avoid when working with standard alpha types (you always have to blur and distort things to hide them). Under certain circumstances it may be able to save you gobs of memory and render time overhead compared to simply following the procedure I posted above, if you take the time to learn to use it. Proper masking via node techniques will usually win that fight, though, since node based matting allows you to use instances or render layers rather than having to composite an additional full render layer. In my experience, FSA sequence depth is far less prone to causing memory related crashes than the width required by the multiple render layers that OSA masking techniques need. Others’ mileage may vary.