Accelerating TensorFlow Lite Micro on Cadence Audio Digital Signal Processors

Posted by Raj Pawate (Cadence) and Advait Jain (Google)

Digital Signal Processors (DSPs) are a key part of any battery-powered device, offering a way to process audio data with very low power consumption. These chips run signal processing algorithms such as audio codecs, noise canceling, and beamforming.

Increasingly, these DSPs are also being used to run neural networks for tasks such as wake-word detection, speech recognition, and noise suppression. A key part of enabling such applications is the ability to execute these neural networks as efficiently as possible.

However, productization paths for machine learning on DSPs can often be ad hoc. In contrast, speech, audio, and video codecs have worldwide standards bodies such as the ITU and 3GPP creating algorithms for compression and decompression, addressing several aspects of quality measurement, fixed-point arithmetic considerations, and interoperability.

TensorFlow Lite Micro (TFLM) is a generic open-source inference framework that runs machine learning models on embedded targets, including DSPs. Similarly, Cadence has invested heavily in PPA-optimized hardware-software platforms such as the Cadence Tensilica HiFi DSP family for audio and the Cadence Tensilica Vision DSP family for vision.

Google and Cadence – A Multi-Year Partnership for Enabling AI at the Edge

This was the genesis of the collaboration between the TFLM team and the audio DSP teams at Cadence, starting in 2019. The TFLM team focuses on leveraging the broad TensorFlow framework and developing a smooth path from training to embedded and DSP deployment via an interpreter and reference kernels. Cadence is developing a highly optimized software library, called the NeuralNet library (NNLIB), that leverages the SIMD and VLIW capabilities of its low-power HiFi DSPs. This collaboration started with three optimized kernels for one Xtensa DSP and now encompasses over 50 kernels across a variety of platforms such as HiFi 5, HiFi 4, HiFi 3z, and Fusion F1, as well as Vision DSPs such as the P6, and includes the ability to offload to an accelerator, if available.

Additionally, we have collaborated to add continuous integration for all of the optimized code targeting the Cadence DSPs. This includes infrastructure that checks that every pull request to the TFLM repository passes all of the unit tests with the Tensilica toolchain for a variety of HiFi and Vision P6 cores. As such, we ensure that the combined TFLM and NNLIB open source software is both tightly integrated and has good automated test coverage.

Performance Improvements

Most recently, we have collaborated on adding optimizations for models that are quantized with int16 activations. In the audio domain specifically, int16 activations can be critical for the quality of quantized generative models. We expect these optimized kernels to enable a new class of ML-powered audio signal processing. The table below shows a few of the operators required to implement a noise suppression neural net, for which we measured a 267x overall improvement in cycle count on a variant of SEANet, an example noise suppression network.

The following table shows the improvement with the optimized kernels relative to the reference implementations as measured with the Xtensa instruction set simulation tool.

Operator          Improvement
Transpose Conv    458x
Conv2D            287x
Sub               39x
Add               24x
Leaky ReLU        18x
Strided_Slice     10x
Pad               6x
Overall Network   267x

How to use these optimizations

All of the code is available in the TFLite Micro GitHub repository.

To use the HiFi 3z-targeted TFLM optimizations, the following conditions need to be met:

  • the TensorFlow Lite (TFLite) flatbuffer model is quantized with int16 activations and int8 weights (a conversion sketch follows this list)
  • it uses one or more of the operators listed in the table above
  • TFLM is compiled with OPTIMIZED_KERNEL_DIR=xtensa
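
The first of these conditions can be met with TensorFlow's post-training "16x8" quantization mode. The following is a minimal sketch, assuming a small hypothetical Keras model and synthetic calibration data in place of real audio features; the point is only the converter settings that produce int16 activations with int8 weights, not the model itself.

import numpy as np
import tensorflow as tf

# Hypothetical stand-in model; in practice this would be the trained
# audio network you want to deploy.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4),
])

def representative_dataset():
    # Calibration data used to estimate activation ranges; replace the
    # random tensors with real feature frames from your pipeline.
    for _ in range(100):
        yield [np.random.rand(1, 32, 32, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Request the "16x8" mode: int16 activations with int8 weights.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]

tflite_model = converter.convert()
with open("model_int16x8.tflite", "wb") as f:
    f.write(tflite_model)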

For example, you can run the Conv2D kernel integration tests with the reference C++ code (replacing <xtensa_core> with the name of your licensed Xtensa core) as follows:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa TARGET_ARCH=hifi4 XTENSA_CORE=<xtensa_core> test_integration_tests_seanet_conv

And compare that to the optimized kernels by adding OPTIMIZED_KERNEL_DIR=xtensa:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa TARGET_ARCH=hifi4 OPTIMIZED_KERNEL_DIR=xtensa XTENSA_CORE=<xtensa_core> test_integration_tests_seanet_conv
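
Before moving to the DSP, it can also help to confirm on the host that the exported flatbuffer really carries int16 activations alongside int8 weights. Here is a small check, assuming the model_int16x8.tflite file written in the earlier sketch:

import numpy as np
import tensorflow as tf

# Inspect the flatbuffer with the standard TFLite interpreter on the host.
interpreter = tf.lite.Interpreter(model_path="model_int16x8.tflite")
interpreter.allocate_tensors()

# Tally tensor dtypes; a 16x8-quantized model should report int16
# activation tensors alongside int8 weight tensors.
dtypes = [t["dtype"] for t in interpreter.get_tensor_details()]
print("int16 tensors:", sum(d == np.int16 for d in dtypes))
print("int8 tensors: ", sum(d == np.int8 for d in dtypes))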

Looking Ahead

While the work thus far has been primarily focused on convolutional neural networks, Google and Cadence are also working together to develop an optimized LSTM operator and have released a first example of an LSTM-based keyword recognizer. We expect to expand on this and continue to bring optimized, production-ready implementations of the latest developments in AI/ML to Tensilica Xtensa DSPs.

Acknowledgements

We would like to acknowledge a number of our colleagues who have contributed to making this collaboration successful.

Cadence:

  • Int16 optimizations: Manjunath CP, Bhanu Prakash Venkata, Anirban Mandal
  • LSTM implementation: Niranjan Yadla, Lukman Rahumathulla, Manjunath CP, Pramodkumar Surana, Arjun Medinakere
  • NNLIB optimizations: Vijay Pawar, Prasad Nikam, Harshavardhan, Mayur Jagtap, Raj Pawate

Google: Advait Jain, Deqiang Chen, Lawrence Chan, Marco Tagliasacchi, Nat Jeffries, Nick Kreeger, Pete Warden, Rocky Rhodes, Ting Yan, Yunpeng Li, Victor Ungureanu
