INTRODUCTION
Every two years, NVIDIA engineers set out to design a new GPU architecture. The architecture defines a GPU's building blocks, how these blocks are connected, and how they work. The architecture blueprint is the basis for a family of chips that serves a whole spectrum of systems, from consumer computers, or PCs, to workstations and supercomputers.
Two years
ago NVIDIA lunched Fermi which is named after the Italian nuclear physicist
Enrico Fermi. This new architecture brought two new key advancements. Firstly,
it brought full geometry processing to the GPU, enabling a key DirectX 11
technique called tessellation with displacement mapping. This technique was
used in Battlefield 3 and Crysis 2 and improved the geometric realism of water,
terrain and game characteristics.
Last month NVIDIA launched Kepler, the much anticipated successor to the Fermi architecture. Kepler is currently the fastest GPU on the market, and it is also power efficient. Feature-wise, NVIDIA has added new technologies that fundamentally improve the smoothness of each frame and the richness of the overall experience.
When NVIDIA launched Fermi with the GeForce GTX 480, gamers got top performance for the time, but the card suffered from fan noise and high power consumption. With Kepler, the NVIDIA team focused on building a graphics card with quiet fans and low power consumption. To achieve these two key goals, the team needed to redesign the streaming multiprocessor, which is the most important building block of the GPU. With this redesigned streaming multiprocessor, Kepler achieves optimal performance per watt. The second key feature is GPU Boost, which makes the graphics card's performance dynamic. This means that when you play games or work with pictures and video, GPU Boost automatically increases the clock speed to improve performance within the card's power budget.
Kepler's new SM, called SMX, is a radical departure from past designs. SMX eliminates Fermi's 2x processor clock and uses the same base clock across the GPU. To balance out this change, SMX uses an ultra-wide design with 192 CUDA cores. With this new SMX, Kepler can do twice as much work per watt as a standard SM. Put another way, given a watt of power, Kepler's SMX can do twice the amount of work as Fermi's SM. The benefit of this power improvement is most obvious when you plug a GeForce GTX 680 into your system. If you've owned a high-end graphics card before, you'll know that it requires an 8-pin and a 6-pin PCI-E power connector. With the GTX 680, you need just two 6-pin connectors. This is because the card draws at most 195 watts of power, compared to 244 watts on the GeForce GTX 580. If there ever was a middleweight boxer that fought like a heavyweight, it would be the GeForce GTX 680.
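To see how those connector requirements line up with the quoted power draws, here is a rough back-of-the-envelope check in Python. It assumes the standard PCI Express delivery limits of 75 watts from the slot, 75 watts per 6-pin connector, and 150 watts per 8-pin connector; the 244 W and 195 W figures come from the text above, and the margins are only an illustration, not board specifications.

    # Rough power-budget check for the connector configurations mentioned above.
    # Assumed PCI Express limits: 75 W from the x16 slot, 75 W per 6-pin, 150 W per 8-pin.
    SLOT_W, SIX_PIN_W, EIGHT_PIN_W = 75, 75, 150

    def available_power(six_pins=0, eight_pins=0):
        """Total board power deliverable by the slot plus the listed connectors."""
        return SLOT_W + six_pins * SIX_PIN_W + eight_pins * EIGHT_PIN_W

    cards = {
        "GeForce GTX 580 (244 W)": (244, available_power(six_pins=1, eight_pins=1)),  # 300 W available
        "GeForce GTX 680 (195 W)": (195, available_power(six_pins=2)),                # 225 W available
    }

    for name, (draw, budget) in cards.items():
        print(f"{name}: {budget} W deliverable, {budget - draw} W of margin")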
GPU Boost
SMX doubled the performance per watt, but what if the GPU wasn't actually using its full power capacity? Think of the GPU as a light bulb: what if a 100 watt bulb was sometimes running at 90 watts, or even 80 watts? As it turns out, that's exactly how GPUs behave today.
The reason for this is actually pretty simple. Like light bulbs, GPUs are designed to operate under a certain wattage. This number is called the thermal design point, or TDP. For a high-end GPU, the TDP has typically been about 250 watts. You can interpret this number as saying: this GPU's cooler can remove 250 watts of heat from the GPU. If the GPU goes over this limit for an extended period of time, it will be forced to throttle down its clock speed to prevent overheating. What this also means is that, to get maximum performance, the GPU should operate close to its TDP without ever exceeding it. In reality, GPUs rarely reach their TDP even when playing the most intensive 3D games. This is because different games consume different amounts of power, and the GPU's TDP is measured using the worst case. Popular games like Battlefield 3 or Crysis 2 consume far less power than a GPU's TDP rating; only a few synthetic benchmarks can push GPUs to their TDP limit. For example, say your GPU has a TDP of 200 watts. This means that in the worst case, your GPU will consume 200 watts of power. If you happen to be playing Battlefield 3, it may draw as little as 150 watts. In theory, your GPU could safely operate at a higher clock speed to tap into this available headroom. But since it doesn't know the power requirements of the application ahead of time, it sticks to the most conservative clock speed. Only when you quit the game does it drop to a lower clock speed for the desktop environment.
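To put a number on that wasted headroom, here is a tiny Python sketch using the example figures above (a 200 watt TDP and roughly 150 watts drawn in Battlefield 3). The values are illustrative, not measurements.

    def unused_headroom(tdp_watts, measured_draw_watts):
        """Power left on the table when the GPU sticks to a fixed, conservative clock."""
        headroom = tdp_watts - measured_draw_watts
        return headroom, 100.0 * headroom / tdp_watts

    # Example from the text: 200 W TDP, ~150 W actually drawn in a game.
    watts, percent = unused_headroom(200, 150)
    print(f"{watts} W of headroom ({percent:.0f}% of the power budget) goes unused")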
GPU Boost changes all this. Instead of running the GPU at a clock speed based on the most power-hungry app, GPU Boost automatically adjusts the clock speed based on the power consumed by the currently running app. To take our Battlefield 3 example, instead of running at 150 watts and leaving performance on the table, GPU Boost will dynamically increase the clock speed to take advantage of the extra power headroom. Different games use different amounts of power; GPU Boost monitors power consumption in real time and increases the clock speed when there's available headroom.
How GPU Boost Works
The most important thing to understand about GPU Boost is that it works through real-time hardware monitoring as opposed to application-based profiles. As an algorithm, it attempts to find the appropriate GPU frequency and voltage for a given moment in time. It does this by reading a wide swath of data such as GPU temperature, hardware utilization, and power consumption. Depending on these conditions, it raises the clock and voltage accordingly to extract maximum performance within the available power envelope. Because all of this is done via real-time hardware monitoring, GPU Boost requires no application profiles. As new games are released, even if you don't update your drivers, GPU Boost "just works."
The GPU Boost algorithm takes in a variety of operating parameters and outputs the optimal GPU clock and voltage. Currently it does not alter memory frequency or voltage, though it has the option to do so.
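As a way to picture that feedback loop, the Python sketch below models a simplified boost controller that reads temperature, utilization, and power draw each tick and nudges the clock up or down to stay inside the limits. It is purely illustrative: the thresholds, step size, and telemetry values are assumptions, not NVIDIA's actual algorithm or published parameters.

    # Illustrative model of a boost-style feedback loop (not NVIDIA's actual algorithm).
    from dataclasses import dataclass

    @dataclass
    class Telemetry:
        power_watts: float      # board power draw
        temperature_c: float    # GPU temperature
        utilization: float      # 0.0 .. 1.0

    BASE_CLOCK_MHZ = 1006       # minimum 3D clock, from the spec discussed below
    MAX_CLOCK_MHZ = 1110        # hypothetical ceiling
    CLOCK_STEP_MHZ = 13         # hypothetical step size
    POWER_TARGET_W = 195
    TEMP_LIMIT_C = 95

    def next_clock(current_mhz: float, t: Telemetry) -> float:
        """Raise the clock when there is power/thermal headroom, lower it otherwise."""
        over_budget = t.power_watts > POWER_TARGET_W or t.temperature_c > TEMP_LIMIT_C
        busy = t.utilization > 0.9
        if over_budget:
            return max(BASE_CLOCK_MHZ, current_mhz - CLOCK_STEP_MHZ)
        if busy:
            return min(MAX_CLOCK_MHZ, current_mhz + CLOCK_STEP_MHZ)
        return current_mhz

    # One simulated run with made-up telemetry samples.
    clock = BASE_CLOCK_MHZ
    for sample in [Telemetry(150, 70, 0.95), Telemetry(170, 74, 0.97), Telemetry(200, 80, 0.99)]:
        clock = next_clock(clock, sample)
        print(f"power={sample.power_watts}W temp={sample.temperature_c}C -> clock={clock} MHz")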
How Much Boost?
Because GPU Boost happens in real time and the boost factor varies depending on exactly what's being rendered, it's hard to pin the performance gain down to a single number. To help clarify the typical performance gain, all Kepler GPUs with GPU Boost list two clock speeds on their specification sheets: the base clock and the boost clock. The base clock equates to the current graphics clock on all NVIDIA GPUs. For Kepler, that's also the minimum clock speed that the GPU cores will run at in an intensive 3D application. The boost clock is the typical clock speed that the GPU will run at in a 3D application. For example, the GeForce GTX 680 has a base clock of 1006 MHz and a boost clock of 1058 MHz. What this means is that in intensive 3D games, the lowest the GPU will run at is 1006 MHz, but most of the time it will likely run at around 1058 MHz. It won't run at exactly this speed; based on real-time monitoring and feedback, it may go higher or lower, but in most cases it will run close to this speed.
GPU Boost doesn't take away from overclocking. In fact, with GPU Boost, you now have more than one way to overclock your GPU. You can still increase the base clock just like before, and the boost clock will increase correspondingly. Alternatively, you can increase the power target. This is most useful for games that already consume nearly 100% of the power target.
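The two overclocking knobs can be pictured with a simple model: a clock offset shifts both the base and boost clocks, while a higher power target gives the boost algorithm more room before it has to back off. The function below is only a toy illustration of that relationship; the offset behavior and the 110% figure are assumptions, not tuning guidance.

    # Toy model of the two overclocking knobs (illustrative, not a spec).
    BASE_CLOCK_MHZ = 1006
    BOOST_CLOCK_MHZ = 1058
    POWER_TARGET_W = 195

    def overclocked_settings(clock_offset_mhz=0, power_target_percent=100):
        """Return (base clock, boost clock, power target) after applying the two knobs."""
        base = BASE_CLOCK_MHZ + clock_offset_mhz      # raising the base clock...
        boost = BOOST_CLOCK_MHZ + clock_offset_mhz    # ...raises the boost clock with it
        target = POWER_TARGET_W * power_target_percent / 100.0
        return base, boost, target

    print(overclocked_settings(clock_offset_mhz=100))        # (1106, 1158, 195.0)
    print(overclocked_settings(power_target_percent=110))    # more headroom for games near the limit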
Genuinely Smoother Gameplay
Despite the gorgeous graphics seen in many of today's games, there are still some highly distracting artifacts that appear in gameplay despite developers' best efforts to suppress them. The most jarring of these is screen tearing. Tearing is easily observed when the mouse is panned from side to side: the screen appears to be torn between multiple frames, with an intense flickering effect. Tearing tends to be aggravated when the framerate is high, since a large number of frames are in flight at a given time, causing multiple bands of tearing.
FXAA: Anti-Aliasing at Warp Speed
Nothing ruins a beautiful-looking game like jaggies. They make straight lines look crooked and generate distracting crawling patterns when the camera is in motion. The fix for jaggies is anti-aliasing, but today's methods of anti-aliasing are very costly to frame rates. To make matters worse, their effectiveness at removing jaggies has diminished in modern game engines.
Almost all games make use of a form of anti-aliasing called multi-sample anti-aliasing (MSAA). MSAA renders the scene with extra samples per pixel and then resolves them down to reduce the appearance of aliasing. The main problem with this technique is that it requires a tremendous amount of video memory; for example, 4x MSAA requires four times the video memory of standard rendering. In practice, a lot of gamers are forced to disable MSAA in order to maintain reasonable performance.
FXAA is a new way of performing anti-aliasing that's fast, effective, and optimized for modern game engines. Instead of rendering everything at four times the resolution, FXAA picks out the edges in a frame based on contrast detection. It then smooths out the aliased edges based on their gradient. All of this is done as a lightweight, post-processing shader. FXAA is not only faster than 4x MSAA, but it produces higher-quality results in game engines that make use of extensive post-processing.
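To make the idea of contrast-based edge detection and gradient-based smoothing concrete, here is a heavily simplified Python sketch. It is not the actual FXAA shader, which runs on the GPU and performs a directional edge search with sub-pixel blending; it only shows the shape of the technique: compute luminance, flag pixels whose local contrast exceeds a threshold, and blend them toward their neighbors.

    # Greatly simplified, CPU-side sketch of an FXAA-like post-process pass.
    # Not the real FXAA shader: real FXAA does a directional edge search and
    # sub-pixel blending, and runs as a pixel shader on the GPU.

    def luminance(rgb):
        r, g, b = rgb
        return 0.299 * r + 0.587 * g + 0.114 * b   # standard luma weights

    def fxaa_like_pass(image, threshold=0.1):
        """image: 2D list of (r, g, b) tuples in [0, 1]. Returns a smoothed copy."""
        h, w = len(image), len(image[0])
        out = [row[:] for row in image]
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                center = image[y][x]
                neighbors = [image[y - 1][x], image[y + 1][x], image[y][x - 1], image[y][x + 1]]
                lumas = [luminance(p) for p in neighbors] + [luminance(center)]
                contrast = max(lumas) - min(lumas)
                if contrast > threshold:   # edge detected: blend center with its neighbors
                    out[y][x] = tuple(
                        0.5 * center[c] + 0.5 * sum(p[c] for p in neighbors) / 4.0
                        for c in range(3)
                    )
        return out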
Compared to 4x MSAA, FXAA produces comparable if not smoother edges. But unlike 4x MSAA, it consumes no additional memory and runs almost as fast as no anti-aliasing at all. FXAA has the added benefit that it works on transparent geometry such as foliage and helps reduce shader-based aliasing that often appears on shiny materials.
Comparison of MSAA (Antialiasing Deferred) vs. FXAA (Antialiasing Post) performance in Battlefield 3.
While FXAA is available in a handful of games today, NVIDIA has integrated it into the control panel with the R300 series driver. This means you'll be able to enable it in hundreds of games, even legacy titles that don't support anti-aliasing.
TXAA: Even Higher Quality Than FXAA
Computer-generated effects in films spend a huge amount of computing resources on anti-aliasing. For games to truly approach film quality, developers need new anti-aliasing techniques that bring even higher quality without compromising performance.
With Kepler, NVIDIA has invented an even higher-quality AA mode called TXAA that is designed for direct integration into game engines. TXAA combines the raw power of MSAA with sophisticated resolve filters similar to those employed in CG films. In addition, TXAA can also jitter sample locations between frames for even higher quality.
TXAA is available in two modes: TXAA 1 and TXAA 2. TXAA 1 offers visual quality on par with 8x MSAA at performance similar to 2x MSAA, while TXAA 2 offers image quality that's superior to 8x MSAA with performance comparable to 4x MSAA.
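The details of TXAA's resolve filter and jitter pattern are not described here, so the sketch below is only an illustration of the general idea of jittering sample positions between frames. It uses a Halton sequence, a common choice for temporal anti-aliasing in general, and should not be read as a description of TXAA itself.

    # Illustrative sub-pixel jitter for temporal anti-aliasing (not TXAA's actual scheme).

    def halton(index, base):
        """Low-discrepancy Halton value in [0, 1) for the given index and base."""
        result, f = 0.0, 1.0 / base
        while index > 0:
            result += f * (index % base)
            index //= base
            f /= base
        return result

    def jitter_offsets(num_frames):
        """Per-frame (x, y) offsets in the range [-0.5, 0.5) pixels."""
        return [(halton(i + 1, 2) - 0.5, halton(i + 1, 3) - 0.5) for i in range(num_frames)]

    for frame, (dx, dy) in enumerate(jitter_offsets(4)):
        print(f"frame {frame}: jitter ({dx:+.3f}, {dy:+.3f}) px")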
Like FXAA, TXAA will first be integrated directly into game engines. The following games, engines, and developers have committed to offering TXAA support: MechWarrior Online, Secret World, Eve Online, Borderlands 2, Unreal Engine 4, BitSquid, Slant Six Games, and Crytek.
Conclusion
The new Kepler architecture with GPU Boost makes the GTX 680 the fastest graphics card, with lower power consumption and less noise. The NVIDIA team wanted to make gaming faster and smoother, and FXAA and TXAA deliver both: games get super-smooth edges without tanking performance. Adaptive V-Sync improves upon a feature that many gamers swear by; with the new Kepler architecture you can play with V-Sync enabled and not worry about sudden dips in frame rate. A single GPU now powers a full NVIDIA Surround setup plus an accessory display, and there is simply no better way to play a racing game or flight simulator. Add in NVIDIA PhysX technology, and the GTX 680 delivers an amazingly rich gaming experience. Kepler is the outcome of over four years of research and development by some of the best and brightest engineers at NVIDIA.