MulticoreWare Inc.

MulticoreWare is devoted to providing professional software solutions that help you build accelerated applications on heterogeneous systems. The company has a major presence at the AMD Fusion Developer Summit 2012 (AFDS), with twelve presentations and numerous demos in the Experience Zone.

AFDS Sessions List:
  • OpenCL Optimizations on ImageMagick; Convert, Edit, and Compose Images
  • The MulticoreWare Application Development and Deployment Environment for HSA
  • OpenCL Enabled Face Detection Plug-in for IrfanView
  • Optimizing VLC Using OpenCL and Other Fusion Capabilities
  • OpenCL Acceleration of x264
  • VPL Overview: Performant and Extensible OpenCL-based Video Processing Framework
  • GMAC-2: Easy and Efficient Programming for OpenCL
  • Task Manager - A Parallel Building Library for Heterogeneous Computing
  • Parallel Path Analyzer (PPA) - A Visualized System Dependency Analyzer for Heterogeneous Computing
  • Quickly Optimize OpenCL Applications with SlotMaximizer
  • MXPA – The Multicore Cross-Platform Architecture for Performance-Portable Computing
  • Crypto++ Optimizations - Using Heterogeneous Computing to Improve Encryption / Decryption Performance

If you haven't already checked out MulticoreWare's heterogeneous compute tools, one that really stands out is the Parallel Path Analyzer (PPA). PPA is an excellent cross-platform, high-performance profiler for heterogeneous systems, with a graphical interface. It features a series of intelligent analysis utilities and recording capabilities that help developers identify critical paths and bottlenecks in their applications and make full use of system resources to achieve acceleration.

If you won't be able to attend AFDS, there is another chance to connect directly with MulticoreWare experts: get trained on how to quickly optimize OpenCL code with SlotMaximizer and how to write performance-portable code with MXPA, their Multicore Cross-Platform Architecture. See the "Training & Events" section below for dates and times.

Submit suggestions for hcInnovators to

Welcome to the new look and feel of the AMD Developer Newsletter! The hcNewsFlash will focus more on news related to heterogeneous computing, and we will start delivering it every two weeks to keep you up to date on the groundswell of activity. We hope you enjoy the new newsletter; make sure to submit your hcNews!

Just Released: AMD APP SDK 2.7 with OpenCL™ 1.2 Support and C++ Static Kernel Language
Beginning with the release of the AMD OpenCL™ APP SDK 2.6 in December 2011, AMD has made preview versions of both OpenCL™ 1.2 support and improved C++ support for host-side and kernel-side coding available. With the recent release of the AMD OpenCL™ APP SDK 2.7, these capabilities are now fully supported in the SDK and fully integrated into the runtime delivered with the AMD Catalyst™ software drivers.
» Read More
» Download AMD OpenCL™ APP SDK 2.7

Are you a Write Once, Run Anywhere Guy or Gal?

If you are a Java developer, you should definitely check out this beginner's guide to getting started with Aparapi and the APP SDK. The post walks you through the first 10 steps (with code samples!) of getting started with profiling kernels. At the end of these 10 steps, you’ll know just how much faster your Java app runs when it leverages the GPU.

The magic is supplied by Aparapi, an open source Java API for expressing data-parallel workloads. Aparapi converts Java bytecode to OpenCL so that it can be executed on the GPU. If you don’t have an OpenCL-compatible platform, or if your code cannot be converted to OpenCL, Aparapi executes the code in a Java thread pool instead, so you still get some performance improvement over a single-threaded implementation. Aparapi extends Java’s promise of ‘Write once, run anywhere’ to include the GPU. You should follow Aparapi on Twitter at @aparapi.
» Learn More

ARM Supports European Research on GPU Programming

ARM is a key participant in CARP, a European collaborative research project that aims to develop a language to boost the programmability of accelerator hardware such as graphics processing units (GPUs).

“Effective programming tools are essential to help broaden the adoption of heterogeneous systems... We aim to provide software developers with a variety of programming technologies that range from industry standards, such as OpenCL™, to domain-specific frameworks. The emphasis is on efficiency, performance portability and productivity,” said Dr Anton Lokhmotov, staff engineer, ARM.

“Parallel programming is becoming increasingly synonymous with accelerator programming. The CARP project is a unique opportunity for programming tools researchers to contribute practical solutions to the productivity, performance, and energy consumption challenges of accelerated computing systems, in close collaboration with hardware vendors and domain experts”, said Dr Albert Cohen, senior research scientist at INRIA.
» Read the full press release

New Open Source Library: VexCL

VexCL is a vector expression template library for OpenCL. It was created to ease C++-based OpenCL development. Multi-device (and multi-platform) computations are supported.
» Get the source code
» Documentation

Submit suggestions for news to

Training & Events

OpenCL™ Programming Webinar Series (date; time)
  • Heterogeneous Computing Tips and Tricks: June 19, 2012; 10:00AM Pacific Time
  • MXPA - The Multicore Cross-Platform Architecture for Performance-Portable Computing: June 26, 2012; 10:00AM Pacific Time
  • Quickly Optimize OpenCL™ Applications with SlotMaximizer: July 10, 2012; 10:00AM Pacific Time

Conferences (date; location)
  • AMD Fusion Developer Summit: June 11-15, 2012; Bellevue, WA
  • GPGPU-Day: June 28, 2012; Amsterdam

Training (date; location)
  • C++ AMP Training by Acceleware: June 19, 2012; Seattle, WA
  • OpenCL Programming Course by Acceleware: June 26-29, 2012; Mountain View, CA
  • C++ AMP Training by Acceleware: July 10, 2012; New York, NY
  • C++ AMP Training by Acceleware: July 24, 2012; Mountain View, CA
  • OpenCL Programming Course by Acceleware: August 21-24, 2012; Houston, TX
  • OpenCL Programming Course by Acceleware: September 11-14, 2012; Los Angeles, CA
  • OpenCL Programming Course by Acceleware: October 16-19, 2012; Calgary, AB

OpenCL™ Software Development Downloads

OpenCL™ Training Online

Recommended Reading

University Programs

Send comments and suggestions for the newsletter to and follow us on Twitter @hcNewsFlash


To get an upper bound on the throughput of a memory-bound kernel, use the SDK benchmark MemoryOptimization. The benchmark reports a close-to-reality (sustainable) global memory bandwidth. To estimate your kernel's best-case execution time, and hence its throughput ceiling, count the bytes each work-item reads and writes over its lifetime, multiply by the global domain size (the total number of work-items), and divide by the bandwidth obtained from the benchmark.
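The estimate in this tip is simple arithmetic: T_min = (bytes per work-item × global work size) / sustained bandwidth. A minimal Python sketch follows, assuming bandwidth is given in GB/s with 1 GB = 10^9 bytes. The function name, the vector-add kernel, and the 100 GB/s figure are hypothetical illustrations, not MemoryOptimization output; substitute the number the benchmark reports on your hardware.

```python
def min_kernel_time_s(bytes_per_work_item: int, global_work_size: int,
                      sustained_bandwidth_gb_s: float) -> float:
    """Hypothetical helper: lower bound on a memory-bound kernel's run time, in seconds.

    bytes_per_work_item: bytes one work-item reads plus writes in global memory.
    global_work_size: total number of work-items in the NDRange.
    sustained_bandwidth_gb_s: sustainable bandwidth in GB/s (1 GB = 1e9 bytes),
        e.g. the figure reported by the MemoryOptimization benchmark.
    """
    total_bytes = bytes_per_work_item * global_work_size
    return total_bytes / (sustained_bandwidth_gb_s * 1e9)


# Hypothetical example: c[i] = a[i] + b[i] on float arrays.
# Each work-item reads 2 floats (8 bytes) and writes 1 float (4 bytes).
t = min_kernel_time_s(bytes_per_work_item=12, global_work_size=10_000_000,
                      sustained_bandwidth_gb_s=100.0)  # 100 GB/s is an assumed figure
print(f"{t * 1e3:.2f} ms")  # prints "1.20 ms"
```

If a measured kernel time is close to this bound, the kernel is likely already bandwidth-limited; if it is much slower, its memory access pattern is probably worth investigating.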

  Submit your Tips&Tricks to

AMD is Hiring!

Performance Application Engineer - HSA

Submit Heterogeneous Compute job listings to

Connect with the heterogeneous community and be a part of this new era of computing!

     LinkedIn Facebook Twitter YouTube



©2012 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Radeon and combinations thereof are trademarks of Advanced Micro Devices, Inc.

Advanced Micro Devices, Inc., One AMD Place, P.O. Box 3453, Sunnyvale, California 94088-3453 USA
AMD Developer Central | RSS | Copyright | Privacy | Trademark Info