Even Faster Mobile GPU Inference with OpenCL


Source: blog.tensorflow.org

Posted by Juhyun Lee and Raman Sarokin, Software Engineers

While the TensorFlow Lite (TFLite) GPU team continuously improves the existing OpenGL-based mobile GPU inference engine, we also keep investigating other technologies. One of those experiments turned out quite successful, and we are excited to announce the official launch of our OpenCL-based mobile GPU inference engine for Android, which offers up to ~2x speedup over the existing OpenGL backend on reasonably sized neural networks that have enough workload for the GPU.
Figure 1. Duo's AR effects are powered by our OpenCL backend.

Improvements over the OpenGL Backend

OpenGL is an API that was historically designed for rendering vector graphics. Compute shaders were added with OpenGL ES 3.1, but its backward-compatible API design decisions limited us from reaching the full potential of the GPU. OpenCL, on the other hand, was designed for computation with various accelerators from the beginning and is thus more relevant to our domain of mobile GPU inference. We therefore looked into an OpenCL-based inference engine, and it brings a number of features that let us optimize our mobile GPU inference engine.

Performance Profiling: Optimizing the OpenCL backend was much easier than optimizing the OpenGL one, because OpenCL offers good profiling features and Adreno supports them well. With these profiling APIs, we are able to measure the performance of each kernel dispatch very precisely, as sketched below.
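The sketch below shows the standard OpenCL event-profiling pattern this relies on: create a command queue with profiling enabled, then read the start/end timestamps off the dispatch event. The helper name and launch geometry are illustrative, not TFLite's actual code.

```cpp
// Minimal sketch: timing one kernel dispatch via OpenCL event profiling.
// Assumes a valid context, device, and kernel; error checks are omitted.
#include <CL/cl.h>
#include <cstdio>

void ProfileDispatch(cl_context context, cl_device_id device,
                     cl_kernel kernel, size_t global_size, size_t local_size) {
  // The queue must be created with CL_QUEUE_PROFILING_ENABLE.
  cl_command_queue queue =
      clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, nullptr);

  cl_event event;
  clEnqueueNDRangeKernel(queue, kernel, /*work_dim=*/1, /*offset=*/nullptr,
                         &global_size, &local_size, 0, nullptr, &event);
  clWaitForEvents(1, &event);

  // Timestamps are reported in nanoseconds.
  cl_ulong start = 0, end = 0;
  clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,
                          sizeof(start), &start, nullptr);
  clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,
                          sizeof(end), &end, nullptr);
  std::printf("kernel time: %.3f us\n", (end - start) * 1e-3);

  clReleaseEvent(event);
  clReleaseCommandQueue(queue);
}
```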

Optimized Workgroup Sizes: We have observed that the performance of TFLite GPU on Qualcomm Adreno GPUs is very sensitive to workgroup sizes; picking the right workgroup size can boost performance, while picking the wrong one can degrade it by a similar amount. Unfortunately, picking the right workgroup size is not trivial for complex kernels with complicated memory access patterns. With the help of the aforementioned performance profiling features in OpenCL, we were able to implement an optimizer for workgroup sizes (see the sketch after this paragraph), which resulted in up to a 50% speedup over the average.
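A tuner of this kind can be as simple as timing each candidate local size with a profiling helper like the one above and keeping the fastest. The candidate list and the ProfileDispatchNs helper below are hypothetical, not TFLite's actual search strategy.

```cpp
// Minimal sketch of a workgroup-size tuner: measure each candidate local
// size and keep the fastest. ProfileDispatchNs is a hypothetical variant of
// ProfileDispatch above that returns the kernel time in nanoseconds.
#include <CL/cl.h>
#include <cstdint>
#include <vector>

uint64_t ProfileDispatchNs(cl_context context, cl_device_id device,
                           cl_kernel kernel, size_t global_size,
                           size_t local_size);  // see previous sketch

size_t PickWorkgroupSize(cl_context ctx, cl_device_id dev, cl_kernel kernel,
                         size_t global_size) {
  const std::vector<size_t> candidates = {16, 32, 64, 128, 256};
  size_t best = 1;
  uint64_t best_ns = UINT64_MAX;
  for (size_t local : candidates) {
    if (global_size % local != 0) continue;  // local size must divide global
    uint64_t ns = ProfileDispatchNs(ctx, dev, kernel, global_size, local);
    if (ns < best_ns) {
      best_ns = ns;
      best = local;
    }
  }
  return best;
}
```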

Native 16-bit Precision Floating Point (FP16): OpenCL supports FP16 natively and requires the accelerator to advertise the data type's availability. Because this is part of the official spec, even some older GPUs, e.g. the Adreno 305 from 2012, can operate at their full capability. OpenGL, on the other hand, relies on hints which vendors can choose to ignore in their implementations, leading to no performance guarantees. A sketch of the availability check follows.
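Concretely, a device advertises native FP16 through the standard cl_khr_fp16 extension, which can be queried before compiling half-precision kernels. A minimal sketch:

```cpp
// Minimal sketch: check whether a device advertises native FP16 support
// via the standard cl_khr_fp16 extension string.
#include <CL/cl.h>
#include <string>

bool SupportsNativeFp16(cl_device_id device) {
  size_t size = 0;
  clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, nullptr, &size);
  std::string extensions(size, '\0');
  clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, size, &extensions[0], nullptr);
  return extensions.find("cl_khr_fp16") != std::string::npos;
}
```

Kernels then opt in with `#pragma OPENCL EXTENSION cl_khr_fp16 : enable` and use the `half` type.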

Constant Memory: OpenCL has a concept of constant memory. Qualcomm added a physical memory with properties that make it ideal to use with OpenCL's constant memory. This turned out to be very efficient for certain special cases, e.g. very thin layers at the beginning or end of the neural network. By combining this physical constant memory with the aforementioned native FP16 support, OpenCL on Adreno is able to greatly outperform OpenGL, as illustrated by the kernel sketch below.
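In kernel code, this is just the __constant address-space qualifier on small, read-only buffers; the toy kernel below (embedded as a C++ string) is illustrative, not a TFLite kernel.

```cpp
// Minimal sketch: placing small, read-only parameters in __constant memory,
// which the driver can map to dedicated constant RAM where available.
const char* kScaleBiasKernel = R"CLC(
__kernel void scale_bias(__global const float* in,
                         __global float* out,
                         __constant float* weights,  // small, read-only
                         __constant float* bias) {
  int i = get_global_id(0);
  out[i] = in[i] * weights[i % 4] + bias[i % 4];
}
)CLC";
```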

Performance Evaluation

Below, we show the performance of TFLite on the CPU (single-threaded on a big core), on the GPU using our existing OpenGL backend, and on the GPU using our new OpenCL backend. Figures 2 and 3 depict the inference engine's performance on select Android devices for two well-known neural networks, MNASNet 1.3 and SSD MobileNet v3 (large), respectively. Each group of three bars should be read independently; it shows the relative speedup among the TFLite backends on one device. Our new OpenCL backend is roughly twice as fast as the OpenGL backend, and does particularly well on Adreno devices (annotated with SD), as we have tuned the workgroup sizes with Adreno's performance profilers mentioned earlier. Comparing Figure 2 with Figure 3 also shows that OpenCL performs even better on larger networks.
Figure 2. Inference latency of MNASNet 1.3 on select Android devices with OpenCL.
Figure 3. Inference latency of SSD MobileNet v3 (large) on select Android devices with OpenCL.

Seamless Integration through the GPU Delegate

One major hurdle in employing the OpenCL inference engine is that OpenCL is not part of the standard Android distribution. While major Android vendors ship OpenCL in their system libraries, it may not be available for some users. For these devices, one needs to fall back to the OpenGL backend, which is available on every Android device.

To make developers' lives easier, we have added a couple of modifications to the TFLite GPU delegate. At runtime, it first checks the availability of OpenCL. If it is available, we employ the new OpenCL backend, as it is much faster than the OpenGL backend; if it is unavailable or cannot be loaded, we fall back to the existing OpenGL backend. In fact, the OpenCL backend has been in the TensorFlow repository since mid-2019 and is seamlessly integrated through the TFLite GPU delegate v2, so you may already be using it through the delegate's fallback mechanism; see the usage sketch below.
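From the application's perspective, nothing OpenCL-specific is required: you create the GPU delegate v2 and attach it to the interpreter, and the backend choice happens internally. A minimal sketch, assuming a model file named model.tflite; build setup and error handling are omitted.

```cpp
// Minimal sketch: running a model with the TFLite GPU delegate v2, which
// internally prefers OpenCL and falls back to OpenGL at runtime.
#include <memory>

#include "tensorflow/lite/delegates/gpu/delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);

  // Default options let the delegate pick the best available GPU backend.
  TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
  TfLiteDelegate* delegate = TfLiteGpuDelegateV2Create(&options);

  if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) {
    // GPU delegation failed entirely; execution stays on the CPU.
  }
  interpreter->AllocateTensors();
  // ... fill input tensors, call interpreter->Invoke(), read outputs ...

  TfLiteGpuDelegateV2Delete(delegate);
  return 0;
}
```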

Acknowledgements

Andrei Kulik, Matthias Grundman, Jared Duke, Sarah Sirajuddin, and special thanks to Sachin Joglekar for his contributions to this blog post.