Ace's Hardware wrote an article about the new features of the Intel Prescott CPU. As you probably know, this CPU will have doubled L1 and L2 caches and PNI (Prescott New Instructions). But did you already know about all of the following new features?
Bigger D-L1 cache (16 KB instead of 8 KB) & L2 cache (1 MB instead of 512 KB). No comment necessary
4x Improved Clock Distribution (compared to Northwood) for better Frequency Scaling
Improved IMUL (integer multiply) latency
Prescott New Instructions (SSE-3), which will not improve performance at launch (software needs to be optimized for them).
Additional WC (write-combining) buffers. Instead of sending many small pieces of data to the AGP video card individually, these pieces are collected in buffers and sent through in one big burst. This helps to preserve FSB bandwidth, as one big burst carries less overhead than many small transfers.
Improved pre-fetcher and branch predictor. I did not get much info on this, but it seems the buffers have been made bigger so the branch predictor will be able to cope better with more than one thread.
Automated design of the functional blocks
Ace's Hardware also managed to find out some more information about the improved Hyperthreading that the Prescott will feature. Two new instructions and the bigger L1 and L2 caches are the keys to the improved Hyperthreading:
The two new instructions are MONITOR and MWAIT. MWAIT seems to help transfer CPU resources from a thread that no longer needs them (for example, one that is just storing its results to memory) to another thread. But to benefit from these two new instructions, applications need to be recompiled. Intel's current compiler already supports them, but it will clearly take some time before these instructions pop up in commercial applications. So at launch, the improved Hyperthreading performance will come solely from the bigger caches.
The automated design of the functional blocks is quite interesting. Human designers have to group all the functional units (FPU, cache controller, etc.) into neatly separated blocks to keep the design manageable. Intel seems to have used a sort of expert AI design tool so that each piece of a functional block is placed at the optimal spot on the CPU die. This means, for example, that the floating point unit can no longer be recognized on a die photo, as pieces of the FPU have been scattered all over the die to minimize the time it takes for the units to exchange information. You could say the whole die has been optimized for much better logistics. This should help clockspeed scaling quite a bit.