Optimizing TensorFlow for 4th Gen Intel Xeon Processors


Source: blog.tensorflow.org

Posted by Ashraf Bhuiyan and AG Ramesh from Intel, and Penporn Koanantakool from Google

TensorFlow 2.9.1 was the first release to include, by default, optimizations driven by the Intel® oneAPI Deep Neural Network (oneDNN) library for Intel® Xeon® processors, starting with Cascade Lake. Since then, Intel and Google have continued to collaborate on new TensorFlow optimizations for the next generation of Intel Xeon processors.

These optimizations accelerate TensorFlow models using Intel® Advanced Matrix Extensions (AMX), a new matrix-based instruction set. Intel AMX instructions are designed to accelerate deep learning operations such as matrix multiplication and convolution that use low-precision data types: Google's bfloat16 and 8-bit integers (int8). Low-precision data types are widely used and can deliver significant performance gains over the default 32-bit floating-point format (FP32) without significant loss in accuracy.
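
bfloat16 keeps the sign bit and the 8 exponent bits of FP32 but only 7 mantissa bits, so a bfloat16 value is essentially the upper half of an FP32 bit pattern. That is why it covers the same numeric range as FP32 at lower precision. The pure-Python sketch below illustrates the conversion, assuming round-to-nearest-even and ignoring NaN handling; the helper names are hypothetical and are not TensorFlow APIs.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bfloat16_bits(x: float) -> int:
    """Round an FP32 value to bfloat16 (round to nearest, ties to even).

    bfloat16 keeps FP32's sign bit, its 8 exponent bits, and the top
    7 mantissa bits, i.e. the upper 16 bits of the FP32 pattern.
    NaN handling is omitted in this sketch.
    """
    bits = float32_bits(x)
    # 0x7FFF rounds to nearest; adding the lowest kept bit breaks ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to FP32 by appending 16 zero bits."""
    return bits_to_float32(b << 16)


for value in (1.0, 3.14159265, 1e-30, 3.0e38):
    b = to_bfloat16_bits(value)
    print(f"{value!r:>12} -> 0x{b:04X} -> {bfloat16_bits_to_float(b)!r}")
```

Note how a value such as 3.0e38, far outside the float16 range, still has a bfloat16 representation: only precision is lost, not range.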

We are happy to announce that these features are now available as a preview in the nightly build of TensorFlow on GitHub, and also in the Intel optimized build. TensorFlow developers can now use Intel AMX on the 4th Gen Intel® Xeon® Scalable processor (formerly code-named Sapphire Rapids) through the existing mixed precision support in TensorFlow. We are excited by the results: several popular AI models run up to 19x faster on 4th Gen Intel Xeon processors with Intel AMX than on 3rd Gen Intel Xeon processors with FP32.

Intel Advanced Matrix Extensions (AMX) Acceleration in 4th Gen Intel Xeon Processors

Intel® Advanced Matrix Extensions (AMX) is an x86 instruction set extension that introduces a new programming framework for computing dot products of matrices. Intel AMX serves as an AI acceleration engine and builds on capabilities of previous generations of Intel Xeon processors, such as AVX-512 (for optimized vector operations) and Intel® Deep Learning Boost (through Vector Neural Network Instructions, for optimized resource utilization/caching and lower-precision AI optimizations).

Intel AMX introduces a new type of two-dimensional register file, called “tiles”, and a set of 12 new x86 instructions that operate on them. The new instruction TDPBF16PS performs a dot product of bfloat16 tiles, and TDPBSSD performs a dot product of signed 8-bit integer tiles. The remaining instructions handle tile configuration and data movement to the Intel AMX unit. Further details can be found in the document published by Intel.
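
As a rough mental model, TDPBSSD computes C += A × B, where A is an M×K tile of signed int8 values, C is an M×N tile of int32 accumulators, and B is a K×N int8 matrix stored in a packed layout in which each tile row holds four consecutive K values for every column. Each 32-bit result lane accumulates groups of four int8×int8 products. The pure-Python sketch below, with hypothetical function names, emulates that arithmetic and checks it against an ordinary matrix multiply; it ignores tile configuration, hardware tile-size limits, and 32-bit overflow.

```python
import random


def pack_vnni(b):
    """Repack a K x N int8 matrix into the (K/4) x (4N) layout used for tile B.

    Row k of the packed matrix holds, for each column n, the four values
    b[4k][n], b[4k+1][n], b[4k+2][n], b[4k+3][n] side by side.
    """
    k_dim, n_dim = len(b), len(b[0])
    assert k_dim % 4 == 0, "K must be a multiple of 4"
    return [
        [b[4 * k + i][n] for n in range(n_dim) for i in range(4)]
        for k in range(k_dim // 4)
    ]


def tdpbssd_emulated(c, a, b_packed):
    """Accumulate A x B into c in place, following TDPBSSD's loop structure.

    c: M x N int32 accumulators, a: M x K int8, b_packed: (K/4) x (4N) int8.
    """
    for m in range(len(c)):
        for k in range(len(b_packed)):
            for n in range(len(c[0])):
                # One 32-bit lane: dot product of four int8 pairs.
                c[m][n] += sum(
                    a[m][4 * k + i] * b_packed[k][4 * n + i] for i in range(4)
                )
    return c


def matmul(a, b):
    """Reference integer matrix multiply used to check the emulation."""
    return [
        [sum(a[m][k] * b[k][n] for k in range(len(b))) for n in range(len(b[0]))]
        for m in range(len(a))
    ]


rng = random.Random(0)
M, K, N = 3, 8, 2
a = [[rng.randint(-128, 127) for _ in range(K)] for _ in range(M)]
b = [[rng.randint(-128, 127) for _ in range(N)] for _ in range(K)]
c = [[0] * N for _ in range(M)]
tdpbssd_emulated(c, a, pack_vnni(b))
print(c == matmul(a, b))
```

TDPBF16PS follows the same pattern with pairs of bfloat16 values accumulated into FP32 lanes instead of groups of four int8 values accumulated into int32 lanes.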

How to take advantage of Intel AMX optimizations on 4th Gen Intel Xeon

Intel AMX optimizations are included in the official TensorFlow nightly releases. The latest stable release, 2.11, includes preliminary support; full support will be available in a subsequent stable release.
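
On Linux, one quick way to confirm that the processor and kernel expose Intel AMX is to look for the amx_tile, amx_bf16, and amx_int8 feature flags in /proc/cpuinfo, which recent kernels report on AMX-capable CPUs. The sketch below uses a hypothetical helper, amx_flags_from_cpuinfo, and assumes a Linux system; it is only a sanity check and does not reflect how TensorFlow or oneDNN select their code paths.

```python
from pathlib import Path

AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def amx_flags_from_cpuinfo(cpuinfo_text: str) -> dict:
    """Report which AMX feature flags appear in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has a line such as "flags\t\t: fpu vme ... amx_tile ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return {flag: flag in flags for flag in AMX_FLAGS}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(amx_flags_from_cpuinfo(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found (not a Linux system?)")
```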

Users running TensorFlow on 4th Gen Intel Xeon processors can take advantage of these optimizations with minimal changes:

a) For bfloat16 mixed precision, developers can accelerate their models using the Keras mixed precision API, as explained in the TensorFlow mixed precision guide. Enabling automatic mixed precision takes only these lines of code:
   
    from tensorflow.keras import mixed_precision
    # Run computations in bfloat16 while keeping variables in float32
    policy = mixed_precision.Policy('mixed_bfloat16')
    mixed_precision.set_global_policy(policy)

b) Using Intel AMX with 8-bit integers requires the model to be quantized to int8. Existing standard models that have already been quantized with Intel Neural Compressor, for example ResNet-50 (RN50), BERT, and SSD-ResNet34 (SSD-RN34), run without any changes.
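
Quantization maps FP32 values onto int8 using a scale factor. Intel Neural Compressor supports more elaborate calibration and quantization schemes; purely as an illustration, the sketch below shows the simplest variant, symmetric per-tensor quantization with scale = max|x| / 127. The function names are hypothetical, and this is not Intel Neural Compressor's implementation.

```python
def quantize_symmetric_int8(values):
    """Symmetric per-tensor int8 quantization: q = round(x / scale).

    The scale maps the largest magnitude onto 127, so every q lies in
    [-127, 127]. Returns the quantized values and the scale.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [qi * scale for qi in q]


x = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_symmetric_int8(x)
x_hat = dequantize(q, scale)
# Rounding keeps each element's round-trip error within half a scale step.
print(q, scale, max(abs(a - b) for a, b in zip(x, x_hat)))
```

The int8 values plus a single FP32 scale are what an int8 kernel consumes; small values such as 0.003 collapse to zero, which is the kind of accuracy trade-off that calibration tools aim to manage.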

Performance improvements

The following charts show the performance improvement on a 2-socket, 56-core 4th Gen Intel Xeon using Intel AMX low precision on various popular vision and language models, where the baseline is a 2-socket, 40-core 3rd Gen Intel Xeon with FP32 precision. We use the Intel® Optimization for TensorFlow* preview and the launch_benchmark script from the Model Zoo for Intel® Architecture.

Bar chart comparing the inference speedup of 4th Gen Intel Xeon with AMX BF16 vs. 3rd Gen Intel Xeon with FP32 across mixed precision models

In this chart, inference with mixed precision models on a 4th Gen Intel Xeon was 1.9x to 9.6x faster than with FP32 models on a 3rd Gen Intel Xeon. (BS=x indicates a large batch size, depending on the model.)

Bar chart comparing the training speedup of 4th Gen Intel Xeon with AMX BF16 vs. 3rd Gen Intel Xeon with FP32 across mixed precision models

Training with automatic mixed precision on a 4th Gen Intel Xeon was 2.3x to 5.5x faster than training FP32 models on a 3rd Gen Intel Xeon.

Bar chart comparing the inference speedup of 4th Gen Intel Xeon with AMX int8 vs. 3rd Gen Intel Xeon with FP32 across quantized models

Similarly, quantized model inference on a 4th Gen Intel Xeon was 3.3x to 19x faster than FP32 inference on a 3rd Gen Intel Xeon.

In addition to the popular models above, we have tested hundreds of other models to confirm that the performance gains hold across the board.
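
The speedups above compare two whole systems with different core counts (56 vs. 40 cores per socket, reading the configuration as per-socket counts; the ratio is the same if both figures are totals). As a back-of-the-envelope illustration only, the sketch below divides the core-count ratio out of the reported figures. It ignores clock speed, memory bandwidth, and every other platform difference, so the per-core numbers are rough estimates, not benchmark results; the function names are hypothetical.

```python
def speedup(new_throughput, baseline_throughput):
    """Throughput ratio, e.g. images/s on the new system over the baseline."""
    return new_throughput / baseline_throughput


def per_core_speedup(system_speedup, new_cores, baseline_cores):
    """Remove the effect of different core counts from a system-level speedup."""
    return system_speedup * baseline_cores / new_cores


# Ranges reported above: BF16 inference, BF16 training, int8 inference.
for reported in (1.9, 9.6, 2.3, 5.5, 3.3, 19.0):
    estimate = per_core_speedup(reported, new_cores=56, baseline_cores=40)
    print(f"{reported:>5.1f}x system -> {estimate:.2f}x per core")
```

Even after this normalization, the upper end of each range remains well above the 1.4x that the extra cores alone would account for.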

Next Steps

We are working to continuously tune and improve the Intel AMX optimizations in future releases of TensorFlow. We encourage users to optimize their AI models with Intel AMX on 4th Gen Intel Xeon processors for a significant performance boost, not just for inference but also for pre-training, fine-tuning, and transfer learning. We would like to hear from you: please provide feedback through the TensorFlow GitHub page or the oneAPI Deep Neural Network library GitHub page.

Acknowledgements

The results presented in this blog are the work of many people, including the TensorFlow and oneDNN teams at Intel and our collaborators in Google’s TensorFlow team.

From Intel: Md Faijul Amin, Mahmoud Abuzaina, Gauri Deshpande, Ashiq Imran, Kanvi Khanna, Geetanjali Krishna, Sachin Muradi, Srinivasan Narayanamoorthy, Bhavani Subramanian, Yimei Sun, Om Thakkar, Jojimon Varghese, Tatyana Primak, Shamima Najnin, Mona Minakshi, Haihao Shen, Shufan Wu, Feng Tian, Chandan Damannagari.

From Google: Eugene Zhulenev, Antonio Sanchez, Emilio Cota.

*For configuration details, see www.intel.com/performanceindex


Notices and Disclaimers:
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

...