Intel Pentium 4 660 and Extreme Edition 3.73 GHz - Prescott2 in the 32-bit make. Part 1
As many of you remember, a year ago (on 2 February 2004, to be precise) Intel released the long-awaited Prescott, its first desktop processor (and the first x86 processor in general) made on a 90 nm process with strained silicon. At the time, the new Prescott-based Pentium 4 processors drew a mixed response from consumers, even though the new core brought evident improvements to the Intel NetBurst architecture and differed from its Northwood-based predecessors in three major respects:
- Numerous changes to the microarchitecture of the processor core.
- A finer manufacturing process for the chips.
- More advanced materials in the silicon technology.
Among the most important architectural improvements of Prescott over Northwood, the following can be noted (see, for example, the article "Prescott, or at processor ranch of Craig Barrett"):
- Doubled L2 and L1(!) cache sizes; the L1 cache also has doubled associativity.
- A longer main pipeline: from 20 to 31 stages. The developers chose this to allow a significant clock speed increase for the Prescott core in the future, although that increase failed to materialize for other reasons.
- Improved branch processing and pre-fetch mechanism.
- Support for 13 new instructions (Prescott New Instructions, or PNI), later given the more traditional name SSE3. They accelerate multimedia and gaming applications by 5% on average (and by up to tens of percent in some tasks).
- Refined Hyper-Threading technology (switching between threads has become more predictable). To that end, even the pipeline was modified so that more execution units can work in parallel.
- Improved power management.
All these rather radical innovations promised a noticeable performance boost for the new core over the old one. That did not happen at launch, however: at the same clock speed, Prescott was even slower than its predecessor in many applications, and the overly long pipeline and the doubled latency of the cache memory (both L1 and L2) are to blame. Both choices were made to allow a substantial core clock increase in the future (they ease the problem of synchronizing high-frequency units located in different parts of the chip and make some delays between them tolerable). At first there was even talk of raising the clock speed of this core to 5 GHz, but it later turned out that the corporation would not be able to release even the 4 GHz Prescott announced in its roadmap. Still, the Prescott core did have some clock headroom, and once it reached 3.6 and even 3.8 GHz, the new processors did gain a speed advantage over the high-end Northwood models in almost all modern applications. The cost of that was not only monetary (higher-end models are more expensive) but also a substantial increase in the power consumption of the new Pentium 4. This shows not only in the specifications (over 100 W for high-end models) but also in the larger (and noisier) coolers and in the incompatibility of old motherboards with the new processors (in particular, more powerful voltage regulators are needed to feed the processor).
Unfortunately, this did not make the new Prescott core any more popular among ordinary users (I'd better keep silent about professionals), even though sales of Prescott processors, once the early revisions had "settled", went quite briskly and, according to Intel, even at a record pace. Although Intel may have lost a few percentage points of the desktop processor market in 2004 (partly because of its own mistakes and partly because of confident moves by its main competitor, which released the genuinely good Athlon 64), this hardly affected the overall picture: worldwide, Intel still holds about 80% of the desktop processor market, which says a lot. More amusing still, in the past half a year alone Intel sold more processors supporting the new 64-bit x86 extensions (Intel EM64T) than AMD has sold over the entire existence of AMD64 processors. Of course, the vast majority of both are still run in 32-bit mode, as before, for lack of mainstream 64-bit operating systems and applications.
Prescott2M
While those events were unfolding in 2004, the corporation was preparing another core for its desktop processors, which sees the light today: on 21 February 2005, Intel released its second take on Prescott, a new Pentium 4 core dubbed Prescott2M (also known as Irwindale in the similar Intel Xeon DP line). No doubt, amid all the clamor around the multicore Smithfield processors due later in 2005, the slightly improved Prescott2M looks like a half measure. Yet this move is quite justified and even necessary: the progress of the Prescott core stalled at 3.8 GHz, and a new, faster-clocked Extreme Edition based on the Gallatin core (to uphold the reputation of the most advanced desktop CPU ;)) cannot be made either.
The only things left to do were: to enlarge Prescott's cache to 2 MB at little cost (as had already been done in the Dothan core); to make some minor changes to the former core (in particular, to improve its manufacturability, reduce leakage and perhaps slightly tune Hyper-Threading for the larger cache, e.g. the cache partitioning mechanism); to finally "legalize" support for EM64T and the XD bit (which had appeared earlier in the latest Prescott revisions, although not always available to the end consumer) as well as for Enhanced Intel SpeedStep Technology, or EIST (a power-saving feature that was also present in the Prescott (E0) and Nocona chips but had been awaiting "clearance" in the desktop Pentium 4); and to put all of this on the 90 nm process (again with strained silicon), well established after a year and a half, on which the E0 stepping of Prescott is made. There are also fairly weighty reasons to assume that the Prescott2M chip carries a further development of the Vanderpool technology (virtualization, i.e. running several independent operating systems on a single PC), which was perhaps present in Prescott as well, albeit in rudimentary form. For now, however, Intel does not offer these features to users, so it is premature to discuss them in connection with the release of the new processors.
That is, by and large, the new Prescott2M core (Irwindale) is the former Prescott (Nocona) with extra L2 cache added (as is easy to see by comparing photos of the two cores, see above) and with Intel EM64T, the Execute Disable Bit (XD bit) and Enhanced Intel SpeedStep "fully enabled" (incidentally, the latter is missing from the Extreme Edition model so that the top-end processor does not lose even a little performance).
Therefore, there is little point in discussing the architectural (and technological) differences between these cores; we had better compare them in Table 1.
Table 1. Brief comparison of chips in modern high-end Intel processors for desktop PCs
Processor | Intel Pentium 4 Prescott2M | Intel Pentium 4 Extreme Edition (Prescott2M) | Intel Pentium 4 Prescott | Intel Pentium 4 Northwood | Intel Pentium 4 Extreme Edition (Gallatin) |
Core frequencies, GHz | 2.8-3.6* | 3.73* | 2.8-3.8 | 1.6-3.4 | 3.2-3.47 |
System bus speeds, MHz | 800 | 1067 | 800, 533 | 800, 533, 400 | 1067, 800 |
Process technology | 90 nm, strained silicon | 90 nm, strained silicon | 90 nm, strained silicon | 130 nm | 130 nm |
Core area, sq. mm | 135 | 135 | 112 | 131 | 237 |
Number of transistors, mln | 169 | 169 | 125 | 55 | 178 |
Length of the integer pipeline | 31 | 31 | 31 | 20 | 20 |
L1 cache size | 16 K | 16 K | 16 K | 8 K | 8 K |
L1 cache latency, cycles** | 4 | 4 | 4 | 2 | 2 |
L1 cache associativity | 8 | 8 | 8 | 4 | 4 |
Instruction cache capacity | 12 thousand micro-instructions | 12 thousand micro-instructions | 12 thousand micro-instructions | 12 thousand micro-instructions | 12 thousand micro-instructions |
L2 cache size | 2048 K | 2048 K | 1024 K | 512 K | 512 K |
L2 cache latency, cycles** | ~31 | ~31 | ~29 | ~18 | ~18 |
L2 cache associativity | 8 | 8 | 8 | 8 | 8 |
L3 cache | none | none | none | none | 2048 K |
L3 cache latency, cycles** | - | - | - | - | ~43 |
Instruction set extensions | SSE3/SSE2/SSE, Intel EM64T, XD bit | SSE3/SSE2/SSE, Intel EM64T, XD bit | SSE3/SSE2/SSE | SSE2/SSE | SSE2/SSE |
Hyper-Threading technology | improved | improved | improved | standard | standard |
Supply voltage, V | 1.25-1.4 | 1.25-1.4 | 1.25-1.4 | 1.55 | 1.55 |
Thermal design power (TDP), W | 84 or 115 | 115 | 89-103 | no more than 82 | no more than 115 |
* Top frequencies to date; a rise is possible in the future.
** Measured with CPU-Z.
The number of transistors in Prescott2M has grown by 44 million compared with Prescott. Assuming most of them went into the additional megabyte (8192 Kbit) of L2 cache, that makes a little over five transistors (about 5.25, to be more precise) per cell of the added L2 memory. Recall, however, that a standard SRAM cell requires 6 transistors, plus the overhead of multiplexers and buffers (approximately 1 transistor per memory cell goes to such "auxiliary" needs, which is confirmed by estimates based on comparing the Northwood and Gallatin cores). Since the original Prescott core showed some surplus of transistors (reviewers even liked to speculate about hidden, not yet enabled capabilities on that basis), it is evident that part of this "surplus" has simply been removed in the new core. The new Extreme Edition, built on the same core and offering a much faster system bus (1067 MHz versus 800 MHz in the Pentium 4 660 and lower models), differs greatly from the former Extreme Edition, which used a 130 nm core with slightly more transistors and a much larger die: 237 versus 135 sq. mm, almost twice as large. Compared with Prescott, the area of the new core has grown less dramatically, by about 20%.
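The arithmetic in this paragraph is easy to reproduce. Below is a minimal sketch in Python that uses only the figures from Table 1; the function and variable names are made up for illustration, and the key assumption (the same one the text makes) is that the entire difference in transistor count between two related cores went into the added cache.

```python
# Figures copied from Table 1: transistor counts in millions, die area in sq. mm.
TRANSISTORS_MLN = {"Prescott2M": 169, "Prescott": 125, "Northwood": 55, "Gallatin": 178}
AREA_SQ_MM = {"Prescott2M": 135, "Prescott": 112, "Gallatin": 237}

BITS_PER_MB = 8 * 1024 * 1024  # 8192 Kbit; one SRAM cell stores one bit


def transistors_per_cell(bigger_core, smaller_core, added_cache_mb):
    """Extra transistors per added cache bit.

    Assumption (same as in the text): the whole difference in transistor
    count between the two cores went into the added cache.
    """
    extra = (TRANSISTORS_MLN[bigger_core] - TRANSISTORS_MLN[smaller_core]) * 1_000_000
    return extra / (added_cache_mb * BITS_PER_MB)


# Prescott -> Prescott2M: one extra megabyte of L2.
l2_estimate = transistors_per_cell("Prescott2M", "Prescott", 1)
# Northwood -> Gallatin: 2 MB of L3 added (assumes the cores are otherwise similar).
l3_estimate = transistors_per_cell("Gallatin", "Northwood", 2)

# Die area: growth over Prescott, and how much larger the old Extreme Edition was.
area_growth = AREA_SQ_MM["Prescott2M"] / AREA_SQ_MM["Prescott"] - 1
old_ee_ratio = AREA_SQ_MM["Gallatin"] / AREA_SQ_MM["Prescott2M"]

print(f"Added L2 (Prescott2M vs Prescott): {l2_estimate:.2f} transistors per cell")
print(f"Added L3 (Gallatin vs Northwood):  {l3_estimate:.2f} transistors per cell")
print(f"Prescott2M die area growth over Prescott: {area_growth:.1%}")
print(f"Gallatin die area relative to Prescott2M: {old_ee_ratio:.2f}x")
```

Under these assumptions the Northwood/Gallatin pair comes out at a little over 7 transistors per added cell, in line with the "6 plus about 1" estimate, while the Prescott/Prescott2M pair gives only a little over 5, which is what suggests that some transistors were removed elsewhere in the core.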
The L2 cache organization in Prescott2M has remained the same: apart from the increased size (with the former associativity preserved), only the latency has grown slightly (according to readings from CPU-Z and some other utilities, by about 2-3 core clock cycles, i.e. approximately 5-10%; I cannot give a more precise figure yet).
In addition, the new processors have a slightly higher rated power dissipation, up to 115 W in high-end models. Subjectively, though, the Pentium 4 660 and Extreme Edition 3.73 GHz heated up less than the Pentium 4 560 sample used for comparison.
On 21 February 2005, Intel released several new Pentium 4 processors built on the Prescott2M core at once. They are all based on the first official stepping of the core, N0. They are listed with their brief distinctive features in Table 2.
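For reference, here is how the "2-3 cycles, i.e. 5-10%" estimate works out, as a small Python sketch. The latency values are the approximate CPU-Z readings from Table 1; the helper names are hypothetical, and the conversion to nanoseconds assumes both cores run at 3.6 GHz (the Pentium 4 660 clock from Table 2), which is an illustration rather than a measurement.

```python
# Approximate CPU-Z readings from Table 1 (core clock cycles).
PRESCOTT_L2_LATENCY = 29
PRESCOTT2M_L2_LATENCY = 31

# Assumption: both cores are compared at 3.6 GHz (Pentium 4 660 clock, Table 2).
CLOCK_GHZ = 3.6


def relative_increase(base_cycles, extra_cycles):
    """Latency growth as a percentage of the base latency."""
    return 100.0 * extra_cycles / base_cycles


def cycles_to_ns(cycles, clock_ghz):
    """Convert core cycles to nanoseconds (at 1 GHz one cycle takes 1 ns)."""
    return cycles / clock_ghz


# The text quotes an increase of 2-3 cycles over Prescott.
for extra in (2, 3):
    growth = relative_increase(PRESCOTT_L2_LATENCY, extra)
    print(f"+{extra} cycles over ~{PRESCOTT_L2_LATENCY}: {growth:.1f}% higher latency")

# Absolute latency at the assumed clock speed.
for name, cycles in (("Prescott", PRESCOTT_L2_LATENCY),
                     ("Prescott2M", PRESCOTT2M_L2_LATENCY)):
    print(f"{name}: ~{cycles} cycles = ~{cycles_to_ns(cycles, CLOCK_GHZ):.1f} ns")
```

With a base of about 29 cycles, two extra cycles mean roughly 7% and three extra cycles roughly 10%, so the quoted range is consistent with the table.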
Table 2. Intel Pentium 4 processors released on 21 February 2005
Processor | Core frequency, GHz | FSB speed, MHz | L2 and L3 cache memory | Supply voltage, V | Thermal design power (TDP), W | Preliminary wholesale price, $ |
Pentium 4 Extreme Edition 3.73 GHz | 3.73 | 1067 | 2 MB L2 | 1.25-1.4 | 115 (04B PCG) | 999 |
Pentium 4 660 | 3.60 | 800 | 2 MB L2 | 1.25-1.4 | 115 (04B PCG) | 605 |
Pentium 4 650 | 3.40 | 800 | 2 MB L2 | 1.25-1.4 | 84 (04A PCG) | 401 |
Pentium 4 640 | 3.20 | 800 | 2 MB L2 | 1.25-1.4 | 84 (04A PCG) | 273 |
Pentium 4 630 | 3.00 | 800 | 2 MB L2 | 1.25-1.4 | 84 (04A PCG) | 224 |
All these processors are designed for the LGA775 socket that appeared in summer 2004 (new Pentium 4 models will not be produced for Socket 478, unlike Celeron D). All except the Extreme Edition carry processor model numbers from 630 to 660, following the same sequence as the Pentium 4 (Prescott, LGA) 560 and lower models released half a year ago. Obviously, this line will also include the 2.8 GHz model 620, but at the time of writing there was no official data on it yet. A bit later, in spring 2005, the 3.8 GHz Pentium 4 670 (priced at over $800) will appear. Incidentally, prices for Prescott2M will still be much higher than for Prescott at similar clock speeds. Below we'll see how much performance this extra money buys (in addition to the new EM64T, XD and EIST technologies).
Initially, these processors will run on Intel 925XE/X and 915 chipsets (and on chipsets from other manufacturers as well); later, however, Intel intends to release the new 945 series chipsets for them and for future multicore processors, with support for much faster memory and other "goodies".
It is interesting to see how Intel, which recently radically changed its structure and reoriented itself along "platform" lines, is positioning its new products. The mainstream Pentium 4 6xx processors are aimed mainly at corporate business users. The idea is that it is corporate customers who can afford to spend extra on new technologies for 64-bit computing (EM64T), advanced security (XD bit) and power saving (and hence quieter PCs), and thereby save hundreds of thousands of dollars at the scale of a mid-size corporation.
Intel divides corporate desktop platforms for small businesses and IT departments into three categories (see the slide below):
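To get a feel for the pricing within the new line itself, one can look at what each 200 MHz step costs according to Table 2. The sketch below is only an illustration: the list and function names are hypothetical, the prices are the preliminary wholesale figures from the table, and the Extreme Edition is left out because it also differs in bus speed.

```python
# (model, core clock in GHz, preliminary wholesale price in USD) from Table 2.
MODELS = [
    ("Pentium 4 630", 3.00, 224),
    ("Pentium 4 640", 3.20, 273),
    ("Pentium 4 650", 3.40, 401),
    ("Pentium 4 660", 3.60, 605),
]


def price_steps(models):
    """For each pair of neighbouring models return
    (lower model, higher model, extra MHz, extra USD)."""
    steps = []
    for (lo_name, lo_ghz, lo_usd), (hi_name, hi_ghz, hi_usd) in zip(models, models[1:]):
        extra_mhz = round((hi_ghz - lo_ghz) * 1000)  # round away float noise
        steps.append((lo_name, hi_name, extra_mhz, hi_usd - lo_usd))
    return steps


for lo_name, hi_name, extra_mhz, extra_usd in price_steps(MODELS):
    print(f"{lo_name} -> {hi_name}: +{extra_mhz} MHz for +${extra_usd}")
```

Each step adds the same 200 MHz, but the surcharge grows from $49 at the bottom of the line to $204 at the top, so the faster the model, the more each extra megahertz costs.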
- Leading powerful platform.
- Mass stable platform for IT divisions.
- Mass working platform for small businesses.
Processors and chipsets aimed at each of the categories are presented on this block diagram.
The second area of use for the new processors is platforms for home users (Consumer Platform). Here Intel singles out three segments:
- Powerful platforms for games and creativity.
- Mass platforms for work and leisure.
- Entry-level platforms.
Processors and chipsets aimed at each of the categories are presented on another block diagram.
At the same time, the manufacturer stresses that along with 2 MB of L2 cache and improved security (XD bit), platforms based on the new processors will automatically acquire power-saving features controlled by the operating system (support starts with Windows XP SP2), similar to those used in notebooks, as well as effective power management without loss of system performance (although the corporation's own documents state that using EIST may reduce the clock speed and processor performance). Meanwhile, Intel EM64T will impart "flexibility to operating systems and applications with enabled support for 64-bit computational systems, while applications handling digital multimedia content will gain a number of advantages of using the EM64T for processing hi-definition audio and video".
In view of the above statement, it is interesting to note how the corporation's view of the usefulness of 64-bit extensions has changed. Initially, when Intel had only just announced its intent to build EM64T into its 32-bit processors, its employees stressed in every way that 64-bit support brings no extra performance on traditional desktop platforms. They tied the usefulness of the 64-bit platform only to cases that require a large amount of memory (more than 4 GB) and specific applications (mainly server-type and other "highly professional" ones). I personally heard from Patrick Gelsinger (and other Intel directors) that 64 bits in Pentium 4 or Xeon is NOT a performance issue and is of little use in personal computers! Now the advantages of EM64T for (at least) digital multimedia content, in processing high-definition audio and video, are put forward as one of the main marketing points in promoting the new product. Well, things change :) In fact, the manufacturer stresses that the introduction of EM64T on Intel desktop platforms is synchronized with the release schedule of Microsoft Windows XP Pro x64.
That's enough theory; now on to practice. Our practical investigation of the new processors will comprise at least three independent parts covering different aspects of their operation:
- Performance in current 32-bit tasks under 32-bit Windows XP.
- Performance under the upcoming 64-bit Windows XP Professional x64.
- Power consumption and power-saving aspects of the new processors.
To be continued in Part Two of the review.