Distributed Fast Fourier Transform in TensorFlow


Source: blog.tensorflow.org

Posted by Ruijiao Sun, Google Intern - DTensor team

The Fast Fourier Transform (FFT) is an important signal-processing method with many uses, including speeding up convolutions, extracting features, and regularizing models. Distributed FFT offers a way to compute Fourier transforms in models that work with image-like datasets too large to fit into the memory of a single accelerator device. In a previous Google research paper, "Large-Scale Discrete Fourier Transform on TPUs" by Tianjian Lu, a distributed FFT algorithm was implemented for TensorFlow v1 as a library. This work presents newly added native support for distributed FFT in TensorFlow v2, built on the new TensorFlow distribution API, DTensor.

About DTensor

DTensor is an extension to TensorFlow for synchronous distributed computing. It distributes programs and tensors following the single-program, multiple-data (SPMD) paradigm, and it offers a uniform API for the data- and model-parallelism patterns widely used in machine learning.
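As a minimal sketch of this programming model (an addition, not from the original post), the snippet below builds a one-dimensional mesh and a Layout that shards the first tensor dimension while replicating the second. It assumes eight logical devices have already been configured, as done in the Example Usage below.

import tensorflow as tf
from tensorflow.experimental import dtensor

# Assumes 8 logical devices are already configured (see Example Usage below).
mesh = dtensor.create_mesh([('batch', 8)])
# Shard tensor dim 0 across the 'batch' mesh dimension; replicate dim 1.
layout = dtensor.Layout(['batch', dtensor.UNSHARDED], mesh)
# Create a (16, 4) DTensor; each device holds a (2, 4) local shard.
x = dtensor.call_with_layout(tf.zeros, layout, shape=(16, 4))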

Example Usage

The API for distributed FFT is the same as for the original FFT in TensorFlow. Users simply pass a sharded tensor as input to an existing TensorFlow FFT op, such as tf.signal.fft2d, and the output comes back sharded as well.

import tensorflow as tf
from tensorflow.experimental import dtensor

# Set up devices.
device_type = dtensor.preferred_device_type()
if device_type == 'CPU':
  cpu = tf.config.list_physical_devices(device_type)
  tf.config.set_logical_device_configuration(
      cpu[0], [tf.config.LogicalDeviceConfiguration()] * 8)
if device_type == 'GPU':
  gpu = tf.config.list_physical_devices(device_type)
  tf.config.set_logical_device_configuration(
      gpu[0], [tf.config.LogicalDeviceConfiguration(memory_limit=1000)] * 8)
dtensor.initialize_accelerator_system()

# Create a mesh.
mesh = dtensor.create_distributed_mesh(
    mesh_dims=[('x', 1), ('y', 2), ('z', 4)], device_type=device_type)

# Set up a distributed input tensor.
input = tf.complex(
    tf.random.stateless_normal(shape=(2, 2, 4), seed=(1, 2), dtype=tf.float32),
    tf.random.stateless_normal(shape=(2, 2, 4), seed=(2, 4), dtype=tf.float32))
init_layout = dtensor.Layout(['x', 'y', 'z'], mesh)
d_input = dtensor.relayout(input, layout=init_layout)

# Run distributed fft2d. DTensor determines the most efficient
# layout of d_output.
d_output = tf.signal.fft2d(d_input)
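To see where DTensor actually placed the result, the output layout can be inspected with the public dtensor.fetch_layout API (this check is an addition, not part of the original post):

# Prints the mesh and per-dimension sharding of d_output.
print(dtensor.fetch_layout(d_output))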

Performance Analysis

The following experiment demonstrates that the distributed FFT can process more data than the non-distributed one by pooling memory across multiple devices. The tradeoff is extra time spent on communication and data transposes, which slows down the computation.
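As a rough illustration of how such a comparison can be timed, here is a hedged micro-benchmark sketch (not the harness used for the experiment). It assumes d_input from the example above, and it forces execution each iteration since accelerator ops are dispatched asynchronously; it also assumes the fully replicated scalar produced by the reduction supports .numpy().

import time

def time_fft2d(x, n_iters=10):
  # Warm-up run to exclude tracing and compilation time.
  _ = tf.math.reduce_sum(tf.abs(tf.signal.fft2d(x))).numpy()
  start = time.perf_counter()
  for _ in range(n_iters):
    # The reduction plus .numpy() forces the asynchronous work to finish.
    _ = tf.math.reduce_sum(tf.abs(tf.signal.fft2d(x))).numpy()
  return (time.perf_counter() - start) / n_iters

print('fft2d wall-clock:', time_fft2d(d_input), 's/iter')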

[Figure: wall-clock time (seconds) vs. size per dimension, comparing a single GPU, distributed FFT, and undistributed FFT.]

This phenomenon is shown in detail in the profiling result of the 10K×10K distributed FFT experiment, conducted on an 8×V100 GPU system. The current implementation of distributed FFT in TensorFlow follows the simple shuffle + local FFT approach, which is also used by other popular distributed FFT libraries such as FFTW and PFFT. Notably, the two local FFT ops take only 3.6% of the total time (15 ms), around one third of the time of a non-distributed fft2d. Most of the computing time is spent on data shuffling, represented by the ncclAllToAll operation.
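To make the shuffle + local FFT idea concrete, here is a single-device sketch (an illustration, not the DTensor implementation) of how a 2-D FFT decomposes into 1-D FFTs separated by a transpose; in the distributed setting, that transpose becomes the ncclAllToAll shuffle seen in the profile.

# A 2-D FFT is separable: 1-D FFT over rows, transpose, 1-D FFT again.
x = tf.complex(tf.random.normal([4, 4]), tf.random.normal([4, 4]))
step1 = tf.signal.fft(x)                                  # local 1-D FFT along the last axis
step2 = tf.signal.fft(tf.linalg.matrix_transpose(step1))  # transpose ("shuffle"), then local FFT
y = tf.linalg.matrix_transpose(step2)                     # undo the transpose
# Agrees with the fused op up to floating-point error:
print(tf.reduce_max(tf.abs(y - tf.signal.fft2d(x))).numpy())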

[Table: top 10 TensorFlow operations on GPU, with the two local FFT ops among the top 3.]

Next steps

The feature is new, and we have adopted the simplest distributed FFT algorithm. A few ideas for fine-tuning or improving performance:

  • Switch to a different DFT/FFT algorithm.
  • Tweak the NCCL communication settings for the particular FFT sizes to improve utilization of the network bandwidth and increase speed (an illustrative sketch follows this list).
  • Reduce the number of collectives to minimize bandwidth requirements.
  • Use N-d local FFTs rather than multiple 1-d local FFTs.
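As an illustration of the NCCL tuning item above: NCCL reads its knobs from environment variables at initialization time, so they must be set before the first collective runs. The values below are placeholders showing the mechanism, not recommendations from the post.

import os

# Set before any GPU collective executes; NCCL reads these once at init.
os.environ['NCCL_ALGO'] = 'Ring'            # collective algorithm (e.g. Ring, Tree)
os.environ['NCCL_PROTO'] = 'Simple'         # wire protocol (e.g. LL, LL128, Simple)
os.environ['NCCL_BUFFSIZE'] = str(4 << 20)  # staging buffer size in bytes (placeholder value)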

Try the new distributed FFT! We welcome your feedback on the TensorFlow Forum and look forward to working with you on improving the performance. Your input would be invaluable!
