Two announcements from NVIDIA. First up is the news that the company has developed a lower-end version of its Drive PX 2 automotive computer: the new model is a palm-sized version of what the company showed at CES earlier this year. The main difference is that this new model has just one CPU and one GPU, versus the two CPUs and two GPUs of the full Drive PX 2. NVIDIA says the single-processor Drive PX 2 will ship to production partners in Q4 2016.
NVIDIA (NASDAQ: NVDA) today unveiled a palm-sized, energy-efficient artificial intelligence (AI) computer that automakers can use to power automated and autonomous vehicles for driving and mapping.
The new single-processor configuration of the NVIDIA® DRIVE™ PX 2 AI computing platform for AutoCruise functions -- which include highway automated driving and HD mapping -- consumes just 10 watts of power and enables vehicles to use deep neural networks to process data from multiple cameras and sensors. It will be deployed by China's Baidu as the in-vehicle car computer for its self-driving cloud-to-car system.
DRIVE PX 2 enables automakers and their tier 1 suppliers to accelerate production of automated and autonomous vehicles. A car using the small form-factor DRIVE PX 2 for AutoCruise can understand in real time what is happening around it, precisely locate itself on an HD map and plan a safe path forward.
"Bringing an AI computer to the car in a small, efficient form factor is the goal of many automakers," said Rob Csongor, vice president and general manager of Automotive at NVIDIA. "NVIDIA DRIVE PX 2 in the car solves this challenge for our OEM and tier 1 partners, and complements our data center solution for mapping and training."
More than 80 automakers, tier 1 suppliers, startups and research institutions developing autonomous vehicle solutions are using DRIVE PX. DRIVE PX 2's architecture scales from a single mobile processor configuration, to a combination of two mobile processors and two discrete GPUs, to multiple DRIVE PX 2s. This enables automakers and tier 1s to move from development into production for a wide range of self-driving solutions -- from AutoCruise for the highway, to AutoChauffeur for point-to-point travel, to a fully autonomous vehicle.
The new small form-factor DRIVE PX 2 will be the AI engine of the Baidu self-driving car. Last week at Baidu World, in Beijing, NVIDIA and Baidu announced a partnership to deliver a self-driving cloud-to-car system for Chinese automakers, as well as global brands.
"Baidu and NVIDIA are leveraging our AI skills together to create a cloud-to-car system for self-driving," said Liu Jun, vice president of Baidu. "The new, small form-factor DRIVE PX 2 will be used in Baidu's HD map-based self-driving solution for car manufacturers."
NVIDIA DRIVE PX is part of a broad family of NVIDIA AI computing solutions. Data scientists who train their deep neural networks in the data center on the NVIDIA DGX-1™ can then seamlessly run those networks on NVIDIA DRIVE PX 2 inside the vehicle. The same NVIDIA DriveWorks algorithms, libraries and tools that run in the data center also run in the car.
This end-to-end approach leverages NVIDIA's unified AI architecture, and enables cars to receive over-the-air updates to add new features and capabilities throughout the life of a vehicle.
NVIDIA DRIVE PX 2 is powered by the company's newest system-on-a-chip, featuring a GPU based on the NVIDIA Pascal™ architecture. A single NVIDIA Parker system-on-chip (SoC) configuration can process inputs from multiple cameras, plus lidar, radar and ultrasonic sensors. It supports automotive inputs/outputs, including Ethernet, CAN and FlexRay.
The new single-processor DRIVE PX 2 will be available to production partners in the fourth quarter of 2016. DriveWorks software and the DRIVE PX 2 configuration with two SoCs and two discrete GPUs are available today for developers working on autonomous vehicles.
NVIDIA also announced the Pascal-based Tesla P4 and Tesla P40 GPU accelerators, which will be available in November and October, respectively. The Tesla P40 features 3840 CUDA cores, 24GB GDDR5 memory and 346GB/s memory bandwidth, and offers up to 12 teraflops of single-precision computing power within a 250W TDP. The Tesla P4, on the other hand, has a TDP that starts at 50W; it has 2560 CUDA cores, 8GB GDDR5 memory, 192GB/s memory bandwidth and 5.5 teraflops of single-precision computing power.
The Tesla P40 is based on a full GP102 with all 3840 CUDA cores, whereas the Tesla P4 uses a full GP104 GPU with 2560 CUDA cores.
GPU Technology Conference China - NVIDIA (NASDAQ: NVDA) today unveiled the latest additions to its Pascal™ architecture-based deep learning platform, with new NVIDIA® Tesla® P4 and P40 GPU accelerators and new software that deliver massive leaps in efficiency and speed to accelerate inferencing production workloads for artificial intelligence services.
Modern AI services such as voice-activated assistance, email spam filters, and movie and product recommendation engines are rapidly growing in complexity, requiring up to 10x more compute compared to neural networks from a year ago. Current CPU-based technology isn't capable of delivering the real-time responsiveness required for modern AI services, leading to a poor user experience.
The Tesla P4 and P40 are specifically designed for inferencing, which uses trained deep neural networks to recognize speech, images or text in response to queries from users and devices. Based on the Pascal architecture, these GPUs feature specialized inference instructions based on 8-bit (INT8) operations, delivering 45x faster response than CPUs and a 4x improvement over GPU solutions launched less than a year ago.
The Tesla P4 delivers the highest energy efficiency for data centers. It fits in any server with its small form-factor and low-power design, which starts at 50 watts, helping make it 40x more energy efficient than CPUs for inferencing in production workloads. A single server with a single Tesla P4 replaces 13 CPU-only servers for video inferencing workloads, delivering over 8x savings in total cost of ownership, including server and power costs.
The Tesla P40 delivers maximum throughput for deep learning workloads. With 47 tera-operations per second (TOPS) of inference performance with INT8 instructions, a server with eight Tesla P40 accelerators can replace the performance of more than 140 CPU servers. At approximately $5,000 per CPU server, this results in savings of more than $650,000 in server acquisition cost.
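The server-replacement savings quoted above can be checked with a quick back-of-the-envelope calculation. The sketch below uses only the figures stated in the announcement; the cost of the eight-P40 server itself is not given, which is presumably why the savings are stated as "more than $650,000" rather than the full $700,000:

```python
# Back-of-the-envelope check of NVIDIA's quoted server savings.
# All inputs come from the announcement; the GPU server's own
# price is not stated, so only the CPU-fleet cost is computed.
cpu_servers_replaced = 140    # one 8x Tesla P40 server replaces >140 CPU servers
cost_per_cpu_server = 5_000   # "approximately $5,000 per CPU server"

cpu_fleet_cost = cpu_servers_replaced * cost_per_cpu_server
print(cpu_fleet_cost)  # 700000 -- consistent with ">$650,000" once the
                       # GPU server's own (unstated) cost is subtracted
```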
"With the Tesla P100 and now Tesla P4 and P40, NVIDIA offers the only end-to-end deep learning platform for the data center, unlocking the enormous power of AI for a broad range of industries," said Ian Buck, general manager of accelerated computing at NVIDIA. "They slash training time from days to hours. They enable insight to be extracted instantly. And they produce real-time responses for consumers from AI-powered services."
Software Tools for Faster Inferencing
Complementing the Tesla P4 and P40 are two software innovations to accelerate AI inferencing: NVIDIA TensorRT and the NVIDIA DeepStream SDK.
TensorRT is a library created for optimizing deep learning models for production deployment that delivers instant responsiveness for the most complex networks. It maximizes throughput and efficiency of deep learning applications by taking trained neural nets -- defined with 32-bit or 16-bit operations -- and optimizing them for reduced precision INT8 operations.
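To make the precision-reduction idea concrete, here is a minimal sketch of symmetric INT8 quantization in NumPy. This is not TensorRT's actual calibration algorithm (which is more sophisticated and profiles activation distributions); it only illustrates the underlying idea of mapping 32-bit floating-point values onto 8-bit integers with a scale factor:

```python
import numpy as np

# Illustrative symmetric INT8 quantization -- NOT TensorRT's
# calibration method, just the basic float32 -> int8 mapping.

def quantize_int8(weights: np.ndarray):
    """Map float32 values to int8 using a single per-tensor scale."""
    scale = np.abs(weights).max() / 127.0   # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
print(q)                  # int8 codes: [  64 -127   32  114]
print(dequantize(q, s))   # approximately recovers the original weights
```

The inference speedup comes from doing the bulk of the arithmetic on the int8 codes, at the cost of a small, bounded approximation error visible in the dequantized values.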
NVIDIA DeepStream SDK taps into the power of a Pascal server to simultaneously decode and analyze up to 93 HD video streams in real time, compared with seven streams with dual CPUs. This addresses one of the grand challenges of AI: understanding video content at scale for applications such as self-driving cars, interactive robots, filtering and ad placement. Integrating deep learning into video applications allows companies to offer smart, innovative video services that were previously impossible to deliver.