Optimizing TensorFlow for 4th Gen Intel Xeon Processors


Source: blog.tensorflow.org

Posted by Ashraf Bhuiyan and AG Ramesh from Intel, and Penporn Koanantakool from Google

TensorFlow 2.9.1 was the first release to include, by default, optimizations driven by the Intel® oneAPI Deep Neural Network (oneDNN) library for 3rd Gen Intel® Xeon® processors (Cascade Lake). Since then, Intel and Google have continued our collaboration to introduce new TensorFlow optimizations for the next generation of Intel Xeon processors.

These optimizations accelerate TensorFlow models using the new matrix-based instruction set, Intel® Advanced Matrix Extensions (AMX). The Intel AMX instructions are designed to accelerate deep learning operations such as matrix multiplications and convolutions that use Google's bfloat16 and 8-bit low precision data types. Low precision data types are widely used and provide a significant performance improvement over the default 32-bit floating-point format without significant loss in accuracy.
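
To make the precision trade-off concrete, here is a tiny, self-contained illustration (not from the original post; the values are arbitrary) of how much resolution bfloat16 gives up relative to float32:

import tensorflow as tf

# bfloat16 keeps float32's 8-bit exponent (same dynamic range) but only a
# 7-bit mantissa, so roughly 2-3 significant decimal digits survive the cast.
x = tf.constant([3.14159265, 0.001234, 12345.678], dtype=tf.float32)
x_bf16 = tf.cast(x, tf.bfloat16)
print(x_bf16)
print(tf.cast(x_bf16, tf.float32) - x)   # per-element rounding error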

We are happy to announce that these features are now available as a preview in the nightly builds of TensorFlow on GitHub, and also in the Intel-optimized build. TensorFlow developers can now use Intel AMX on the 4th Gen Intel® Xeon® Scalable processor (code-named Sapphire Rapids) through the existing mixed precision support available in TensorFlow. We are excited by the results: several popular AI models run up to 19x faster when moving from 3rd Gen to 4th Gen Intel Xeon processors with Intel AMX.

Intel Advanced Matrix Extensions (AMX) Acceleration in 4th Gen Intel Xeon Processors

Intel® Advanced Matrix Extensions (AMX) is an x86 instruction set extension that introduces a new programming framework for computing dot products of two matrices. Intel AMX serves as an AI acceleration engine and builds on capabilities from previous generations of Intel Xeon processors, such as AVX-512 (for optimized vector operations) and Deep Learning Boost (Vector Neural Network Instructions for lower precision AI optimizations and improved resource utilization/caching).

Intel AMX introduces a new type of two-dimensional register file, called "tiles", and a set of 12 new x86 instructions that operate on those tiles. The new instruction TDPBF16PS performs a dot product of bfloat16 tiles, and TDPBSSD performs a dot product of signed 8-bit integer tiles. Other instructions handle tile configuration and data movement to the Intel AMX unit. Further details can be found in the document published by Intel.
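
As a rough mental model only (models never call these instructions directly; the oneDNN kernels inside TensorFlow do), TDPBF16PS takes bfloat16 input tiles and accumulates their dot product in float32. The sketch below reproduces the same numerics in plain TensorFlow, with shapes chosen to mirror the 16x32 bfloat16 tile limit; it is illustrative, not a way to invoke AMX:

import tensorflow as tf

# Two small matrices standing in for AMX tiles (16x32 and 32x16 in bfloat16).
a = tf.random.normal([16, 32], dtype=tf.float32)
b = tf.random.normal([32, 16], dtype=tf.float32)

# Inputs are rounded to bfloat16, the multiply-accumulate runs in float32,
# mirroring what the TDPBF16PS tile dot product computes.
a_rounded = tf.cast(tf.cast(a, tf.bfloat16), tf.float32)
b_rounded = tf.cast(tf.cast(b, tf.bfloat16), tf.float32)
acc = tf.matmul(a_rounded, b_rounded)        # float32 accumulator, 16x16 result

# The rounding error relative to the full float32 product stays small.
print(tf.reduce_max(tf.abs(acc - tf.matmul(a, b))))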

How to take advantage of AMX optimizations on 4th Gen Intel Xeon

Intel AMX optimizations are included in the official TensorFlow nightly releases. The latest stable release, 2.11, includes preliminary support; full support will be available in a subsequent stable release.
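
Before benchmarking, it can help to confirm the installed TensorFlow build and that the CPU actually advertises AMX. The check below is a minimal, Linux-only sketch based on the assumption that the kernel reports the usual amx_* feature strings in /proc/cpuinfo on Sapphire Rapids systems; it is not an official TensorFlow or Intel API:

import tensorflow as tf

print("TensorFlow:", tf.__version__)   # nightly, or >= 2.11 for preliminary support

# Linux only: 4th Gen Intel Xeon (Sapphire Rapids) typically reports these flags.
with open("/proc/cpuinfo") as f:
    cpu_flags = f.read()
for feature in ("amx_tile", "amx_bf16", "amx_int8", "avx512_bf16"):
    print(f"{feature}: {'present' if feature in cpu_flags else 'missing'}")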

Users running TensorFlow on 4th Gen Intel Xeon processors can take advantage of the optimizations with minimal changes:

a) For bfloat16 mixed precision, developers can accelerate their models using the Keras mixed precision API, as explained here. You can enable auto mixed precision simply by adding these lines to your code (a short usage sketch follows the snippet):

from tensorflow.keras import mixed_precision
policy = mixed_precision.Policy('mixed_bfloat16')
mixed_precision.set_global_policy(policy)
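
For context, here is a minimal end-to-end sketch (with a hypothetical toy model, not one from the post) showing what the policy changes: layer computations run in bfloat16 while variables stay in float32, which is what allows the AMX BF16 kernels to engage without further code changes:

import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

mixed_precision.set_global_policy('mixed_bfloat16')

# Hypothetical toy model; any Keras model picks up the global policy the same way.
model = tf.keras.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dense(10),
    # The Keras mixed precision guide suggests keeping the final activation
    # in float32 for numerical stability.
    layers.Activation('softmax', dtype='float32'),
])

print(model.layers[0].compute_dtype)   # bfloat16 -> eligible for AMX BF16 kernels
print(model.layers[0].dtype)           # float32 variables
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')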

b) Using Intel AMX with 8-bit quantized models requires the models to be quantized to int8. Any existing standard models, for example RN50, BERT, or SSD-RN34, that have previously been quantized with Intel Neural Compressor will run with no changes needed (a quantization sketch follows below).
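
For models that have not been quantized yet, a post-training quantization pass with Intel Neural Compressor produces the int8 model that the AMX INT8 kernels can then execute. The outline below assumes the Neural Compressor 2.x Python API; saved_model_dir and calib_dataloader are hypothetical placeholders for your own SavedModel and a small calibration dataset, so treat this as a sketch rather than a verified recipe:

from neural_compressor import PostTrainingQuantConfig, quantization

# Placeholders: point these at your own SavedModel and calibration data.
saved_model_dir = "path/to/saved_model"
calib_dataloader = ...  # your calibration dataloader (batches of representative inputs)

config = PostTrainingQuantConfig()   # defaults to static int8 post-training quantization
q_model = quantization.fit(model=saved_model_dir,
                           conf=config,
                           calib_dataloader=calib_dataloader)
q_model.save("quantized_saved_model")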

Performance improvements

The following charts show the performance improvement on a 2-socket, 56-core 4th Gen Intel Xeon using Intel AMX low precision on various popular vision and language models, where the baseline is a 2-socket, 40-core 3rd Gen Intel Xeon with FP32 precision. We use the Intel Optimization for TensorFlow* preview and the launch_benchmark script from the Model Zoo for Intel® Architecture.

[Chart: Inference speedup of 4th Gen Intel Xeon with AMX BF16 over 3rd Gen Intel Xeon with FP32 across mixed precision models]

As the chart shows, inference with mixed precision models on a 4th Gen Intel Xeon was 1.9x to 9.6x faster than FP32 models on a 3rd Gen Intel Xeon. (BS=x indicates a large batch size, depending on the model.)

[Chart: Training speedup of 4th Gen Intel Xeon with AMX BF16 over 3rd Gen Intel Xeon with FP32 across mixed precision models]

Training models with auto mixed precision on a 4th Gen Intel Xeon was 2.3x to 5.5x faster than FP32 models on a 3rd Gen Intel Xeon.

[Chart: Inference speedup of 4th Gen Intel Xeon with AMX INT8 over 3rd Gen Intel Xeon with FP32 across quantized models]

Similarly, quantized model inference on a 4th Gen Intel Xeon was 3.3x to 19x faster than FP32 precision on a 3rd Gen Intel Xeon.

In addition to the popular models above, we have tested hundreds of other models to ensure that the performance gains are observed across the board.

Next Steps

We are working to continuously tune and improve the Intel AMX optimizations in future releases of TensorFlow. We encourage users to optimize their AI models with Intel AMX on 4th Gen Intel Xeon processors to get a significant performance boost, not just for inference but also for pre-training, fine-tuning, and transfer learning. We would like to hear from you; please provide feedback through the TensorFlow GitHub page or the oneAPI Deep Neural Network Library GitHub page.

Acknowledgements

The results presented in this blog are the work of many people, including the TensorFlow and oneDNN teams at Intel and our collaborators on Google's TensorFlow team.

From Intel: Md Faijul Amin, Mahmoud Abuzaina, Gauri Deshpande, Ashiq Imran, Kanvi Khanna, Geetanjali Krishna, Sachin Muradi, Srinivasan Narayanamoorthy, Bhavani Subramanian, Yimei Sun, Om Thakkar, Jojimon Varghese, Tatyana Primak, Shamima Najnin, Mona Minakshi, Haihao Shen, Shufan Wu, Feng Tian, Chandan Damannagari.

From Google: Eugene Zhulenev, Antonio Sanchez, Emilio Cota.

*For configuration details see www.intel.com/performanceindex


Notices and Disclaimers:
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

    ...


