<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.amddevcentral.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>AMD Developer Central Blogs</title>
	
	<link>http://blogs.amd.com/developer</link>
	<description />
	<lastBuildDate>Tue, 02 Mar 2010 22:03:56 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.amddevcentral.com/AmdDeveloperBlogs" /><feedburner:info uri="amddeveloperblogs" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><creativeCommons:license>http://creativecommons.org/licenses/by-nd/2.0/</creativeCommons:license><image><link>http://creativecommons.org/licenses/by-nd/2.0/</link><url>http://creativecommons.org/images/public/somerights20.gif</url><title>Some Rights Reserved</title></image><feedburner:emailServiceId>AmdDeveloperBlogs</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>ACML 4.4.0 Released!</title>
		<link>http://feeds.amddevcentral.com/~r/AmdDeveloperBlogs/~3/Y3WzdBs9e58/</link>
		<comments>http://blogs.amd.com/developer/2010/02/23/acml-4-4-0-released/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 15:41:02 +0000</pubDate>
		<dc:creator>chipf</dc:creator>
				<category><![CDATA[AMD Libraries]]></category>
		<category><![CDATA[ACML]]></category>

		<guid isPermaLink="false">http://blogs.amd.com/developer/?p=924</guid>
		<description><![CDATA[Most of the Linux builds of ACML 4.4.0 are now live on the ACML download page.   The NAG builds will follow very soon.
We have not yet posted any Windows builds, although there are links for them on the download page.
The ACML team is working diligently to produce final builds for Windows 64-bit and 32-bit.  We [...]]]></description>
			<content:encoded><![CDATA[<p>Most of the Linux builds of ACML 4.4.0 are now live on the ACML download page.   The NAG builds will follow very soon.</p>
<p>We have not yet posted any Windows builds, although there are links for them on the download page.</p>
<p>The ACML team is working diligently to produce final builds for Windows 64-bit and 32-bit.  We want to make sure that they are right and resolve a few issues that developers have commented on.   The 64-bit builds are passing all of our testing, and now we are working on the final packaging.  We hope to post them by the end of this week. <strong><br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br />
</strong><em>This response is provided for informational purposes only, is provided “AS IS” and does not obligate AMD to provide any of the services, technology, or programs described.</em></p>
<img src="http://feeds.feedburner.com/~r/AmdDeveloperBlogs/~4/Y3WzdBs9e58" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.amd.com/developer/2010/02/23/acml-4-4-0-released/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://blogs.amd.com/developer/2010/02/23/acml-4-4-0-released/</feedburner:origLink></item>
		<item>
		<title>ADIT Episode #6: Using CPUID</title>
		<link>http://feeds.amddevcentral.com/~r/AmdDeveloperBlogs/~3/iEguSE-_pQ8/</link>
		<comments>http://blogs.amd.com/developer/2010/02/22/adit-episod-6-using-cpuid/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 23:47:51 +0000</pubDate>
		<dc:creator>stroia</dc:creator>
				<category><![CDATA[Inside Dev Central]]></category>
		<category><![CDATA[CPUID]]></category>
		<category><![CDATA[Inside Track]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://blogs.amd.com/developer/?p=780</guid>
		<description><![CDATA[There is a lot of information you can gather using the CPUID instruction.  In this month’s episode of the AMD Developer Inside Track, “Using CPUID”, I asked Randy VanderHeyden, a 17-year AMD veteran, what CPUID is, when you should use it, and how. 
The main point Randy makes is that you should use CPUID [...]]]></description>
			<content:encoded><![CDATA[<p>There is a lot of information you can gather using the CPUID instruction.  In this month’s episode of the <a href="http://developer.amd.com/documentation/videos/InsideTrack/Pages/default.aspx">AMD Developer Inside Track, “Using CPUID”</a>, I asked Randy VanderHeyden, a 17-year AMD veteran, what CPUID is, when you should use it, and how.</p>
<p>The main point Randy makes is that you should use CPUID to identify the <em>features</em> your optimized code path needs from the processor (for example, vectorized code using <a href="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a>), rather than basing your code path on <em>which</em> processor you have.</p>
<p>A typical example of what not to do is to use CPUID to get the vendor ID string, and then base your optimized code path on that.  There are a few problems with this approach:</p>
<ol>
<li>Not all of the vendor’s legacy processors may support the optimized code you are writing, so you might break compatibility on older processors.</li>
<li>You may leave out another vendor’s processors even though they actually support the features needed by the optimized code path.</li>
</ol>
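<p>As a minimal sketch of the feature-based approach, the Python snippet below picks a code path from a set of feature flags rather than from the vendor string.  The flag names, the path names, and the select_code_path helper are hypothetical; in real code the flags would be decoded from the CPUID feature bits or obtained from the operating system.</p>
<pre>
```python
# Hypothetical sketch: choose a code path from CPU feature flags,
# never from the vendor ID string. The flag set is assumed to have
# been decoded from CPUID (or reported by the OS) elsewhere.

# Code paths in order of preference, with the features each requires.
CODE_PATHS = [
    ("sse2_path", {"sse", "sse2"}),
    ("sse_path", {"sse"}),
    ("scalar_path", set()),  # always usable fallback
]


def select_code_path(features):
    """Return the name of the first code path whose requirements are met."""
    available = set(features)
    for name, required in CODE_PATHS:
        if required.issubset(available):
            return name
    return "scalar_path"
```
</pre>
<p>Because the decision depends only on the reported features, the same check works for any vendor’s processor that supports them, and an older processor without them falls back to the scalar path.</p>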
<p>Another thing to be careful of is using CPUID for parallel programming.  In this video, Randy demonstrates how to find the processor core count; however, if you are looking for actual topology information (e.g., cache hierarchy), you’ll be better off using the operating system’s methods.  For example, in Windows you can use the API call GetLogicalProcessorInformation to find NUMA and multi-core information.  Check out our older <a href="http://forums.amd.com/devblog/blogpost.cfm?threadid=87803&amp;catid=271">“Barcelona” blog</a> for more information on that.</p>
<p>Hopefully this is not news to those seasoned software optimization gurus, but questions arise from time to time on the forums, so this is for the rest of us.  I hope this helps you to understand more about how to make sure your optimized code paths are being used whenever possible.</p>
<p>Do you use CPUID?  If so, how?  Were this video and blog post helpful to you?</p>
<p>-Sharon Troia, Sr. Developer Relations Engineer</p>
<p>Related Links:</p>
<ul>
<li><a href="http://developer.amd.com/documentation/videos/InsideTrack/Pages/default.aspx">View ADIT Episode 6: Using CPUID</a></li>
<li><a href="http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25481.pdf">CPUID Specification</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/hskdteyh.aspx">CPUID Intrinsic (MSDN)</a></li>
</ul>
<p> </p>
<p>————————-</p>
<p>The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.</p>
<img src="http://feeds.feedburner.com/~r/AmdDeveloperBlogs/~4/iEguSE-_pQ8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.amd.com/developer/2010/02/22/adit-episod-6-using-cpuid/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://blogs.amd.com/developer/2010/02/22/adit-episod-6-using-cpuid/</feedburner:origLink></item>
		<item>
		<title>ADIT Episode 5: AMD x86 Open64 Compiler Team Talks about Features and Optimization Flags</title>
		<link>http://feeds.amddevcentral.com/~r/AmdDeveloperBlogs/~3/1VIYbHfcsp4/</link>
		<comments>http://blogs.amd.com/developer/2010/01/19/adit-episode-5-amd-x86-open64-compiler-team-talks-about-features-and-optimization-flags/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 20:13:13 +0000</pubDate>
		<dc:creator>AMD DeveloperCentral</dc:creator>
				<category><![CDATA[Inside Dev Central]]></category>
		<category><![CDATA[Inside Track]]></category>
		<category><![CDATA[Open64]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://blogs.amd.com/developer/?p=355</guid>
		<description><![CDATA[It is the mission of the AMD Developer Inside Track (ADIT for short) to get you helpful tips straight from the source.  Luckily, the Open64 team was willing to participate!
This video features Mike Vermeulen &#8211; manager of the AMD Open64 development team talking about the origins of Open64.  It has come a long way since we [...]]]></description>
			<content:encoded><![CDATA[<p>It is the mission of the <a href="http://developer.amd.com/documentation/videos/InsideTrack/Pages/default.aspx">AMD Developer Inside Track</a> (ADIT for short) to get you helpful tips straight from the source.  Luckily, the Open64 team was willing to participate!</p>
<p>This video features Mike Vermeulen, manager of the AMD Open64 development team, talking about the origins of Open64.  It has come a long way since we launched the beta compiler suite in the summer of 2009 (see Roy&#8217;s <a href="blogpost.cfm?threadid=116028&amp;catid=208">blog</a>), but as Mike reveals, it existed long before we started working with it.</p>
<p>Roy Ju, Architect on the Open64 team, speaks about one of the unique features recently added called multi-core scalability optimizations (-mso).  He explains how it works, and why it might not be in most compilers.  Then he talks about the <a href="http://open64.net/">Open64 community</a> in general and how you might be able to participate if you are interested.</p>
<p>Michael Lai, software engineer on the Open64 team, steps through several <a href="http://developer.amd.com/assets/x86Open64QuickRef.pdf">optimization flags</a>, such as O2 and O3 optimizations and what is included with each.  He also talks about interprocedural analysis, or IPA, and how it can be used to help get better cache utilization.</p>
<p>If you are into high performance parallel workloads you won&#8217;t want to miss <a href="http://developer.amd.com/documentation/videos/InsideTrack/Pages/default.aspx#open64">this episode</a> of the AMD Developer Inside Track on Open64. </p>
<p>-Sharon Troia, Sr. Developer Relations Engineer</p>
<p>Related links:</p>
<ul>
<li><a href="http://developer.amd.com/cpu/open64/Pages/default.aspx#four">AMD x86 Open64 Compiler Suite Downloads</a></li>
<li><a href="http://developer.amd.com/cpu/open64/assets/x86_open64_user_guide.pdf">X86 Open64 User&#8217;s Guide</a></li>
<li><a href="http://developer.amd.com/assets/x86Open64QuickRef.pdf">Open64 Quick Reference Guide to Optimization Flags</a></li>
<li><a href="http://developer.amd.com/cpu/open64/assets/ReleaseNotes.txt">X86 Open64 Installation Prerequisites and Guidelines</a></li>
</ul>
<p style="font-style: italic;">The information presented in this blog is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors, and may not represent AMD</p>
<img src="http://feeds.feedburner.com/~r/AmdDeveloperBlogs/~4/1VIYbHfcsp4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.amd.com/developer/2010/01/19/adit-episode-5-amd-x86-open64-compiler-team-talks-about-features-and-optimization-flags/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.amd.com/developer/2010/01/19/adit-episode-5-amd-x86-open64-compiler-team-talks-about-features-and-optimization-flags/</feedburner:origLink></item>
		<item>
		<title>AMD at PDC 2009</title>
		<link>http://feeds.amddevcentral.com/~r/AmdDeveloperBlogs/~3/6fdL89Os8g8/</link>
		<comments>http://blogs.amd.com/developer/2009/12/15/amd-at-pdc-2009/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 01:06:13 +0000</pubDate>
		<dc:creator>AMD DeveloperCentral</dc:creator>
				<category><![CDATA[Inside Dev Central]]></category>
		<category><![CDATA["Magny-Cours"]]></category>
		<category><![CDATA[DirectX11]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[PDC]]></category>

		<guid isPermaLink="false">http://blogs.amd.com/developer/?p=356</guid>
		<description><![CDATA[People often wonder why AMD exhibits at PDC; they&#8217;ll ask me, &#8220;Doesn&#8217;t AMD just do hardware?&#8221;  The surprising answer is that AMD actually maintains several software tools and libraries that we created in-house.  We also work very closely with our software partners, Microsoft being one of them, to make sure that their software tools and OS [...]]]></description>
			<content:encoded><![CDATA[<p>People often wonder why AMD exhibits at PDC; they&#8217;ll ask me, &#8220;Doesn&#8217;t AMD just do hardware?&#8221;  The surprising answer is that AMD actually maintains several software tools and libraries that we created in-house.  We also work very closely with our software partners, Microsoft being one of them, to make sure that their software tools and OS take full advantage of everything our hardware platforms have to offer, and to make sure our hardware supports the latest software needs (see the AMD Dev Central <a href="http://developer.amd.com/zones/windows/Pages/default.aspx">Windows® Zone</a> to learn more).  PDC is a great place to show all this off.  Here is a summary of AMD&#8217;s booth at PDC 2009 and some related links. </p>
<p><strong>The Booth</strong></p>
<p><img style="vertical-align: middle; border: 2px solid black;" src="http://developer.amd.com/blog_assets/PDCbooth_small.jpg" alt="AMD Booth" width="360" height="162" /></p>
<p>We scaled back on our sponsorship this year to two small exhibitor booths side by side, expecting a smaller-than-average PDC.  It turned out that there were 1,000 more people than last year, for a total of 6,000!  It&#8217;s a good thing we didn&#8217;t scale back the demos.  Our booth was packed to capacity with a laptop, a fully loaded desktop, a fully loaded server, and even a sneak peek at an engineering reference design of a server/workstation system featuring a 12-core processor, code-named &#8220;Magny-Cours&#8221;.</p>
<p><strong>The Demos</strong></p>
<p> <img style="vertical-align: middle; border: 2px solid black;" src="http://developer.amd.com/blog_assets/PDCDX11_small.jpg" alt="ATI Radeon HD 5870 Demo" width="360" height="117" /></p>
<p><strong><em>Desktop: 3.2 GHz AMD Phenom<sup>TM</sup> II X4 with an ATI Radeon 5870 with Eyefinity and DX®11</em></strong></p>
<p>This demo drove three displays that appear to Windows® 7 as a single desktop (AMD&#8217;s ATI <a href="http://www.amd.com/us/products/technologies/eyefinity/Pages/eyefinity.aspx">Eyefinity</a> technology).  We demonstrated DX11 and hardware features such as hardware tessellation, ambient occlusion, and DirectCompute using software from the DX11 SDK as well as upcoming software titles (Dirt 2, Aliens vs. Predator) and the <a href="http://unigine.com/download/">Unigine</a> &#8220;Heaven&#8221; DX11 benchmark.  Check out <a href="http://developer.amd.com/documentation/videos/InsideTrack/Pages/default.aspx">the Vision Launch Recap video</a> for footage of these demos and interviews with the game developers using DX11.</p>
<p>Check out these PDC sessions and keynotes that were powered by the ATI Radeon 5870:</p>
<ul>
<li><a href="http://microsoftpdc.com/Sessions/KEY02">Day #2 Keynote</a> (highlights start at around minute 29)</li>
<li><a href="http://microsoftpdc.com/Sessions/CL15">Modern 3D Graphics Using Windows 7 and Direct3D 11 Hardware</a></li>
<li><a href="http://microsoftpdc.com/Sessions/P09-16">DirectX11 Direct Compute</a></li>
</ul>
<p> <img style="vertical-align: middle; border: 2px solid black;" src="http://developer.amd.com/blog_assets/PDCServersSmall.jpg" alt="HP DL785 G6 on the left, " width="360" height="240" /></p>
<p><strong><em>Server: HP DL785 G6, packed with 48 cores and 128 GB of memory</em></strong></p>
<p>HP&#8217;s DL785 G6 is an 8-socket server platform that has a total of 48 cores, using AMD&#8217;s six-core &#8220;<a href="http://developer.amd.com/zones/istanbul/Pages/default.aspx">Istanbul</a>&#8221; processor.  At PDC we were using an image processing application that was built using the Parallel Patterns Library (PPL) and the Concurrency Runtime (ConcRT).  ConcRT and PPL will ship in Visual Studio® 2010.  The application manipulates a number of 60-megapixel images by converting them from RAW (Bayer) format into RGB, and applies a number of corrections such as gamma, luminance, and anti-aliasing to arrive at the final image.  Using ConcRT, developers were able to rapidly incorporate task-based programming into their application, which helps it scale and fully utilize all the CPU cores, regardless of whether there are four cores in the system or 48.</p>
<p>Check out these PDC sessions that discuss many-core programming and refer to or used the AMD system:</p>
<ul>
<li><a href="http://microsoftpdc.com/Sessions/FT19">C++ Forever: Interactive Applications in the age of Manycore</a> (The demo in this session was running on our system in our booth)</li>
<li><a href="http://microsoftpdc.com/Sessions/SVR10">Lighting up Windows Server 2008 R2 Using ConcRT on UMS</a> (48-core was used)</li>
</ul>
<p> </p>
<p><strong><em>Server/Workstation: 12-core processor code named &#8220;Magny-Cours&#8221; </em></strong></p>
<p>We were able to show a sneak peek of our upcoming &#8220;Magny-Cours&#8221; 12-core processor, which will be available in Q1 of next year.  This system was running a parallel version of a ray tracing application built in C#, using the Task Parallel Library (TPL) in .NET 4.0, and illustrated the benefits of parallel managed code development from both a developer productivity and a scaling perspective.</p>
<p> <a href="http://developer.amd.com/blog_assets/PDCServersSmall.jpg">http://developer.amd.com/blog_assets/PDCServersSmall.jpg</a></p>
<p><strong><em>Laptop: AMD Turion<sup>TM</sup> 2 Ultra running AMD CodeAnalyst Performance Analyzer</em></strong></p>
<p>Our latest &#8220;Tigris&#8221; platform increases performance as well as battery life, and was shown running our CodeAnalyst Performance Analyzer, a free download for AMD Dev Central members at <a href="http://developer.amd.com/codeanalyst">http://developer.amd.com/codeanalyst</a>.  CodeAnalyst is a great tool for drilling down into performance problems in your code, whether it is native C/C++ or managed.  On AMD processors, in addition to being a time-based sampling profiler, it can also be used to sample processor events (such as L2 cache misses) to help you optimize your application for the underlying hardware.</p>
<p><strong>The Summary</strong></p>
<p>Visual Studio® 2010, along with Microsoft&#8217;s ConcRT, Parallel Patterns Library, and Task Parallel Library, can really help get better scaling out of many-core systems, and you can use AMD CodeAnalyst within the Visual Studio IDE to help pinpoint hotspots in your code.  The latest ATI Radeon graphics not only let you use three (or more) monitors, but also take advantage of the latest DirectX APIs and the features that go along with them, such as DirectCompute.  The future will bring even more cores on a single die, so these parallel programming tools are growing even more important.</p>
<p>Happy Holidays Everyone!!</p>
<p>AMD Developer Outreach</p>
<p>PS.  Congrats to the 3 winners of the daily drawing for the ATI Radeon HD 5870!  Thanks to everyone who took the time to fill out the survey card and scan your badge!! </p>
<p style="font-style: italic;">The information presented in this blog is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors, and may not represent AMD</p>
<img src="http://feeds.feedburner.com/~r/AmdDeveloperBlogs/~4/6fdL89Os8g8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.amd.com/developer/2009/12/15/amd-at-pdc-2009/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		<feedburner:origLink>http://blogs.amd.com/developer/2009/12/15/amd-at-pdc-2009/</feedburner:origLink></item>
		<item>
		<title>SuperComputing 2009 – Day 2</title>
		<link>http://feeds.amddevcentral.com/~r/AmdDeveloperBlogs/~3/Za3oKiFUlks/</link>
		<comments>http://blogs.amd.com/developer/2009/11/19/supercomputing-2009-day-2/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 01:06:13 +0000</pubDate>
		<dc:creator>chipf</dc:creator>
				<category><![CDATA[AMD Libraries]]></category>
		<category><![CDATA[ACML]]></category>
		<category><![CDATA[BLAS]]></category>
		<category><![CDATA[FFT]]></category>
		<category><![CDATA[LAPACK]]></category>
		<category><![CDATA[Matlab]]></category>
		<category><![CDATA[OpenMP]]></category>

		<guid isPermaLink="false">http://blogs.amd.com/developer/?p=357</guid>
		<description><![CDATA[Of the Wednesday sessions, one of the most interesting was a talk on Matlab. Matlab has language constructs such as parfor that enable rapid migration to multicore. And the distributed keyword marks an array as suitable for parallel processing. Matlab figures out the rest. In today&#8217;s world of multicore CPUs, tools like this will be [...]]]></description>
			<content:encoded><![CDATA[<p>Of the Wednesday sessions, one of the most interesting was a talk on Matlab. Matlab has language constructs such as <strong>parfor</strong> that enable rapid migration to multicore. And the <strong>distributed</strong> keyword marks an array as suitable for parallel processing. Matlab figures out the rest. In today&#8217;s world of multicore CPUs, tools like this will be indispensable for getting the most out of your CPU dollars.</p>
<p>Later in the day I attended an OpenMP Birds-of-a-Feather. They discussed the roadmap for OpenMP versions. The 3.1 version is imminent. Among the features discussed was better CPU affinity. This got my attention. One of the keys to repeatably good performance for large multithreaded ACML tasks is ensuring good task CPU affinity. ACML uses OpenMP to provide parallel operation for many of the BLAS, LAPACK, and FFT routines. When ACML is called with a large enough problem, it will run in an OpenMP parallel region to divide the problem among available CPU cores. Thread affinity is needed to keep each thread running on the same core, which maximizes cache reuse and minimizes remote memory accesses. The numactl API and command are not sufficient. Numactl will restrict a group of threads to a set of processors, but does not prevent migration of tasks between the specified nodes and cores.</p>
<p>Fortunately, all of the OpenMP-capable compilers used by ACML have implementation-specific ways to lock tasks to cores. The downside is that each compiler has a different set of environment variables to control this. It sounds like OpenMP 3.1 will standardize this. It may be a while before the compilers catch up and implement this new feature, so in the meantime, check the documentation for each compiler to determine the best way to enforce affinity.</p>
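<p>To illustrate what locking a thread to a core means at the operating-system level, here is a minimal Python sketch using the os.sched_setaffinity call, which is available only on some platforms (notably Linux). This is not how ACML or any OpenMP runtime is configured; those rely on the compiler-specific environment variables mentioned above. The pin_to_one_cpu helper is hypothetical.</p>
<pre>
```python
import os


def pin_to_one_cpu():
    """Pin the caller to one allowed CPU; return that CPU or None.

    os.sched_getaffinity and os.sched_setaffinity exist only on some
    platforms (notably Linux), so the sketch returns None elsewhere.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # 0 refers to the caller
    target = min(allowed)               # pick the lowest-numbered CPU
    try:
        os.sched_setaffinity(0, {target})
    except OSError:
        return None                     # the OS refused the request
    return target
```
</pre>
<p>Restricting the affinity mask to a single CPU is what rules out migration; a mask that still contains several CPUs, which is what a numactl-style restriction gives you, leaves the scheduler free to move the thread among them.</p>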
<p><img title="SuperComputing 2009 - Day 2" src="http://developer.amd.com/blog_assets/sc09-2.JPG" alt="" width="448" height="336" /></p>
<p>The AMD booth was conveniently located near PGI, Cray, and Oak Ridge.</p>
<p style="font-style: italic;">The information presented in this blog is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors, and may not represent AMD</p>
<img src="http://feeds.feedburner.com/~r/AmdDeveloperBlogs/~4/Za3oKiFUlks" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.amd.com/developer/2009/11/19/supercomputing-2009-day-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.amd.com/developer/2009/11/19/supercomputing-2009-day-2/</feedburner:origLink></item>
		<item>
		<title>SuperComputing 2009</title>
		<link>http://feeds.amddevcentral.com/~r/AmdDeveloperBlogs/~3/yt21x38nmE8/</link>
		<comments>http://blogs.amd.com/developer/2009/11/18/supercomputing-2009/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 01:06:13 +0000</pubDate>
		<dc:creator>chipf</dc:creator>
				<category><![CDATA[AMD Libraries]]></category>
		<category><![CDATA[ACML]]></category>
		<category><![CDATA[FFT]]></category>
		<category><![CDATA[Matlab]]></category>

		<guid isPermaLink="false">http://blogs.amd.com/developer/?p=358</guid>
		<description><![CDATA[Online at last. I can&#8217;t believe some hotels still charge for wireless internet. Here at the SuperComputing 2009 show in sunny Portland Oregon, wireless is free, fast, and reliable.

For the first time at a supercomputing show, I have no booth demo, so aside from the usual meetings, I&#8217;m free to actually attend technical sessions. [...]]]></description>
			<content:encoded><![CDATA[<p>Online at last. I can&#8217;t believe some hotels still charge for wireless internet. Here at the SuperComputing 2009 show in sunny Portland Oregon<img src="http://forums.amd.com/devforum/i/expressions/face-icon-small-wink.gif" border="0" alt="" />, wireless is free, fast, and reliable.</p>
<p><img src="http://developer.amd.com/blog_assets/sc09-1.JPG" alt="" width="560" height="420" /></p>
<p>For the first time at a supercomputing show, I have no booth demo, so aside from the usual meetings, I&#8217;m free to actually attend technical sessions. There are way too many sessions for one person to attend, so difficult decisions must be made. My priorities are learning more about how real applications are using sparse solvers and what kind of performance is being reported, finding FFT applications that we can use for benchmarking and rounding out our feature set, exploring how MATLAB users are calling ACML, and looking at how GPUs are being adopted in the HPC market.</p>
<p>The opening keynote by Justin Rattner was kind of interesting. His talk was about 3D internet and ways to catalyze exponential growth for high performance computing. He talked about OpenSim, and one of his demos was a real-time talk with a University of Utah researcher who was represented locally by a sim&#8217;d avatar inhabiting a world populated by sim&#8217;d ferns which were the subject of his research, all rendered in real time. The fern simulations were interacting with many aspects of the environment, which was also simulated. It&#8217;s easy to see how you could create complex simulation worlds that require huge amounts of computing power. Oh, and there&#8217;s a bunch of software needed also.</p>
<p>One of the technical sessions looked like just what I was looking for on sparse solvers. The big takeaway for me is that the primary task most people are using as a metric is as simple as solving Ax=b, where A is large and sparse. There are many ways to solve the problem, and the techniques used are very much dependent on the nature of the sparse system. The example problems used in the presentations are drawn from a variety of problem areas, and have matrix sizes on the order of one million by one million, with about a hundred million non-zero elements.</p>
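<p>To make the Ax=b metric concrete, here is a minimal Python sketch of one common iterative technique, the conjugate gradient method, which applies when A is symmetric positive definite. It is not necessarily what any of the presenters used; the dictionary storage format and the small tridiagonal test matrix are assumptions chosen to keep the example short.</p>
<pre>
```python
# Sketch: solve A x = b for a sparse, symmetric positive-definite A
# with the conjugate gradient method. A is stored as a dict that maps
# (row, col) to each non-zero value, so zeros cost no memory or work.

def sparse_matvec(a, x):
    """Multiply the sparse matrix a by the dense vector x."""
    y = [0.0] * len(x)
    for (i, j), value in a.items():
        y[i] += value * x[j]
    return y


def dot(u, v):
    """Dot product of two dense vectors."""
    return sum(p * q for p, q in zip(u, v))


def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Return x with A x close to b; assumes A is symmetric positive definite."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if tol >= rs_old ** 0.5:
            break          # residual norm is small enough
        ap = sparse_matvec(a, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def tridiagonal(n):
    """Build the n-by-n matrix with 2 on the diagonal and -1 beside it."""
    a = {}
    for i in range(n):
        a[(i, i)] = 2.0
        if i + 1 != n:
            a[(i, i + 1)] = -1.0
            a[(i + 1, i)] = -1.0
    return a
```
</pre>
<p>Each iteration touches only the stored non-zero entries, which is why methods of this kind remain practical when A has a million rows but relatively few non-zeros per row.</p>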
<p>The set of data found at the Matrix Market (<a href="http://math.nist.gov/MatrixMarket/">http://math.nist.gov/MatrixMarket/</a>) features much smaller matrices, so some searching will be necessary to find appropriate large problem examples suitable for today&#8217;s faster computers.</p>
<p>It&#8217;s a bit humbling that I&#8217;m stumbling into an area of computing that&#8217;s been well known for the last 20 years. Many of the people who have worked on sparse solver applications over the years are here at the show.</p>
<p style="font-style: italic;">The information presented in this blog is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors, and may not represent AMD</p>
<img src="http://feeds.feedburner.com/~r/AmdDeveloperBlogs/~4/yt21x38nmE8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.amd.com/developer/2009/11/18/supercomputing-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.amd.com/developer/2009/11/18/supercomputing-2009/</feedburner:origLink></item>
		<item>
		<title>The VELOX research project</title>
		<link>http://feeds.amddevcentral.com/~r/AmdDeveloperBlogs/~3/BL8tOhYKDDg/</link>
		<comments>http://blogs.amd.com/developer/2009/11/17/the-velox-research-project/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 01:06:13 +0000</pubDate>
		<dc:creator>sdiestel</dc:creator>
				<category><![CDATA[AMD Operating System Research Center (OSRC)]]></category>
		<category><![CDATA[Advanced Synchronization Facility]]></category>
		<category><![CDATA[OSRC]]></category>
		<category><![CDATA[Transactional Memory]]></category>
		<category><![CDATA[Velox]]></category>

		<guid isPermaLink="false">http://blogs.amd.com/developer/?p=359</guid>
		<description><![CDATA[This is the third of three blog articles describing how AMD&#8217;s Operating System Research Center (OSRC) became involved in the development of the Advanced Synchronization Facility (ASF), how we are evaluating ASF, and how this and other activities fit into the EU-funded VELOX project aiming at improving the state of the art for software-transactional-memory systems.
AMD&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>This is the third of three blog articles describing how AMD&#8217;s Operating System Research Center (OSRC) became involved in the development of the Advanced Synchronization Facility (ASF), how we are evaluating ASF, and how this and other activities fit into the EU-funded <span style="text-decoration: underline;"><a href="http://www.velox-project.eu/">VELOX project</a></span> aiming at improving the state of the art for software-transactional-memory systems.</p>
<p>AMD&#8217;s Operating System Research Center plays a central role in the EU-funded <span style="text-decoration: underline;"><a href="http://www.velox-project.eu/">VELOX project</a></span>, which targets an integrated approach to transactional memory (TM) on multi- and many-core computers. I will shed some light on this role in the next section after a short introduction to transactional memory.</p>
<h4><a name="Transactional_memory"></a>Transactional memory</h4>
<p>The transactional-memory programming paradigm is one promising approach for making it easier to leverage the trend toward more parallelism in hardware. In the future, in order to benefit from hardware developments, software will have to utilize the increase in hardware parallelism. Simply relying on increases in single-thread performance will no longer bring large improvements.</p>
<p>There exists a range of approaches to leverage this hardware parallelism, all of which have different restrictions:</p>
<ul>
<li>Traditional lock-based approaches do not scale well to a large number of cores if coarse-grained locking is used.</li>
<li>If fine-grained locking is used, lock-based approaches are complex and hard to get right.</li>
<li>Lock-free approaches may provide good performance for specific cases but are also very tricky to implement correctly.</li>
</ul>
<p>Transactional memory may come as a solution here because the programming model is easier to grasp: atomic blocks are marked in the source code and are mapped to transactional-memory primitives by a compiler. With lock-based approaches, separate, distinct locks have to be designated for specific resources to obtain parallelism. Although TM uses only a single primitive for marking atomic blocks, it may still provide good scalability because only <em>dynamic</em> conflicts have to be handled. For example, if two threads use a single lock to protect an atomic block of code, only one thread can ever enter this block at a time, regardless of what is done inside the block. If this block of code is used for transferring funds from one account to another, no such transfers can run in parallel. With fine-grained locking, a lock for each account would be required, which is more complex to use and may actually lead to a deadlock. With TM, any number of threads can enter this block, and a conflict occurs only if the same memory regions are touched. For the account example, this means that conflicts occur only if two threads try to modify the same accounts at the same time; otherwise, everything can run in parallel.</p>
<p>The following example demonstrates the simple syntax for protecting a block of code with a transaction:</p>
<p><span style="color: #ff0000; font-family: courier new,courier;">__tm_atomic {</span><span style="font-family: courier new,courier;"><br />
res = transfer_funds(src_account, dest_account);</span><span style="color: #ff0000; font-family: courier new,courier;"><br />
<span style="color: #000000;">account_update_stat++;</span><br />
}</span></p>
<p>Compare this to the complexity of a fine-grained locking example:</p>
<p><span style="color: #ff0000; font-family: courier new,courier;">lock(src_account.lock);<br />
</span><span style="color: #ff0000; font-family: courier new,courier;">lock(dest_account.lock);<br />
</span><span style="color: #ff0000; font-family: courier new,courier;">lock(stats_lock);<br />
</span><span style="font-family: courier new,courier;">res = transfer_funds(src_account, dest_account);</span><span style="color: #ff0000; font-family: courier new,courier;"><br />
<span style="color: #000000;">account_update_stat++;</span></span><span style="color: #ff0000; font-family: courier new,courier;"><br />
unlock(src_account.lock);</span><span style="color: #ff0000; font-family: courier new,courier;"><br />
unlock(dest_account.lock);<br />
unlock(stats_lock);</span></p>
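<p>The deadlock risk of fine-grained locking comes from two threads taking the same locks in opposite order. The following is a minimal sketch, assuming POSIX threads, of one common way to avoid it: always acquire the per-account locks in a fixed order. The <code>struct account</code> layout, the <code>id</code> field, and <code>account_init</code> are hypothetical names introduced for illustration, and unlike the snippet above the statistics counter is updated under its own lock after the account locks are released.</p>

```c
#include <stddef.h>
#include <pthread.h>

/* Hypothetical account type; the names are illustrative only. */
struct account {
    int id;                 /* unique id, used to order lock acquisition */
    long balance;
    pthread_mutex_t lock;
};

static long account_update_stat = 0;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

/* Initialize an account; returns 0 on success. */
int account_init(struct account *a, int id, long balance)
{
    a->id = id;
    a->balance = balance;
    return pthread_mutex_init(&a->lock, NULL);
}

/* Move `amount` from src to dest. Returns 0 on success, -1 if the
 * source has insufficient funds or both arguments are the same account. */
int transfer_funds(struct account *src, struct account *dest, long amount)
{
    if (src == dest)
        return -1;

    /* Always take the lock of the lower-id account first, so that two
     * opposite transfers cannot wait on each other forever. */
    struct account *first = (src->id < dest->id) ? src : dest;
    struct account *second = (first == src) ? dest : src;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);

    int res = -1;
    if (src->balance >= amount) {
        src->balance -= amount;
        dest->balance += amount;
        res = 0;
    }

    /* Release in reverse order of acquisition. */
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);

    /* The statistics counter has its own lock, held only briefly. */
    pthread_mutex_lock(&stats_lock);
    account_update_stat++;
    pthread_mutex_unlock(&stats_lock);

    return res;
}
```

<p>Every caller has to follow the same ordering rule for this to work, which is exactly the kind of global convention that an atomic block makes unnecessary.</p>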
<p>Software transactional memory (STM) is, so far, not ready for widespread deployment. Its overhead is simply too high for many applications because of the additional bookkeeping involved. In our measurements, we saw an order-of-magnitude slowdown for individual microbenchmarks compared to a lock-free implementation. Even with perfect scalability, around ten cores would be required to match the performance of traditional approaches on a single core.</p>
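<p>The break-even estimate is simple arithmetic: the single-thread slowdown divided by how well the code scales. A small sketch follows; the function name and the <code>efficiency</code> parameter (parallel efficiency, where 1.0 means perfect scaling) are assumptions added for illustration and do not come from our measurements.</p>

```c
/* Number of cores an STM version needs to match a single-core baseline.
 * `slowdown` is the single-thread slowdown factor of the STM version;
 * `efficiency` is the assumed parallel efficiency in (0, 1].
 * Returns -1.0 for invalid input. */
double break_even_cores(double slowdown, double efficiency)
{
    if (slowdown <= 0.0 || efficiency <= 0.0 || efficiency > 1.0)
        return -1.0;
    return slowdown / efficiency;
}
```

<p>With a tenfold slowdown and perfect scaling this gives ten cores; any scaling loss pushes the break-even point higher.</p>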
<p>Although a hardware-transactional-memory solution would be very desirable as the costly bookkeeping could be relocated to dedicated silicon, hardware-transactional memory is not available for current industry architectures. Full-blown support is very complex to realize as it would have to interact with many processor parts and would be very resource hungry to implement.</p>
<p>Therefore, special hardware support for speeding up critical paths in STM proposals and support for hybrid software-hardware solutions, as targeted with ASF, is a key motivation for AMD to participate in VELOX.</p>
<h4><a name="Our_Role_in_VELOX"></a>Our Role in VELOX</h4>
<p>In VELOX we fulfill two roles: First, we provide partners with support for verifying implementation proposals regarding their feasibility from an industry point of view. Second, we verify our own hardware-transactional-memory proposal &#8211; AMD&#8217;s ASF.</p>
<p>We recently released version 2.1 of the <span style="text-decoration: underline;"><a href="http://developer.amd.com/cpu/ASF/Pages/default.aspx">ASF Specification</a></span>. Feel free to comment on it; we are interested in your feedback!</p>
<p>In the <span style="text-decoration: underline;"><a href="../blogpost.cfm?threadid=118419&amp;catid=317">previous entry</a></span> of our blog series on ASF, Stephan described how we use <span style="text-decoration: underline;"><a href="http://www.ptlsim.org/">PTLsim</a></span> for prototyping and evaluating ASF. To this end, we also extend PTLsim&#8217;s memory module to better correspond to contemporary AMD multicore systems.</p>
<h4><a name="PTLsim.27s_Memory_Module"></a>PTLsim&#8217;s Memory Module</h4>
<p>So far, PTLsim&#8217;s memory model has been restricted to a single, inclusive cache hierarchy. This limits the simulation accuracy for current AMD multicore systems, which connect individual processors with HyperTransport links.</p>
<p>To overcome this limitation we are working on extending PTLsim&#8217;s memory model.</p>
<ul>
<li>As a first step, we establish an interface between the core model and the memory hierarchy and refactor PTLsim to access memory solely through this interface.<br />
Up to now, PTLsim&#8217;s access to memory locations is somewhat scattered throughout the code and the cache classes are tightly integrated with the rest of the code.<br />
For the interface design itself, we were inspired by the requests in M5&#8217;s memory model and by the Ruby memory module in GEMS.</li>
<li>Once this process is completed, other, more versatile memory modules can be used together with PTLsim&#8217;s core, such as the module that is currently being developed at VELOX partner Chalmers University of Technology. Alternatively, the refactored memory module in PTLsim can be developed further and extended to model a HyperTransport-connected NUMA system more accurately.</li>
</ul>
<p style="text-align: center;"><img src="http://developer.amd.com/PublishingImages/velox_mm1.png" alt="Initial state of PTLsim's memory module." width="200" height="203" /></p>
<p>Starting point for the memory module structure in original PTLsim. The parts of the cache hierarchy (that will comprise the memory module) are tightly integrated with the rest of PTLsim.</p>
<p style="text-align: center;"><img src="http://developer.amd.com/PublishingImages/velox_mm2.png" alt="Encapsulated memory module." width="250" height="207" /></p>
<p>Architecture after introducing the memory-module interface and using it to fully encapsulate accesses to the memory module (the cache hierarchy).</p>
<p style="text-align: center;"><img src="http://developer.amd.com/PublishingImages/velox_mm3.png" alt="Replace encapsulated module with VELOX memory module." width="250" height="207" /></p>
<p>Now the memory module can be replaced with another one that implements the same interface.</p>
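<p>To make the encapsulation idea concrete, here is a minimal sketch in C of a core that reaches memory only through a small interface, so the module behind it can be swapped. This is not PTLsim&#8217;s actual interface: all type and function names (<code>mem_request</code>, <code>mem_module</code>, <code>flat_access</code>, <code>core_load</code>, <code>core_store</code>) and the fixed 100-cycle latency are hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request packet passed from the core model to memory. */
enum mem_op { MEM_READ, MEM_WRITE };

struct mem_request {
    enum mem_op op;
    uint64_t addr;
    uint64_t data;      /* payload for writes, result for reads */
    unsigned latency;   /* filled in by the module, in cycles */
};

/* The interface: the core only ever calls through this pointer. */
struct mem_module {
    const char *name;
    int (*access)(struct mem_module *self, struct mem_request *req);
    void *state;        /* module-private data */
};

/* One possible module: a tiny flat memory with a fixed latency. */
#define FLAT_WORDS 16
struct flat_state { uint64_t words[FLAT_WORDS]; };

static int flat_access(struct mem_module *self, struct mem_request *req)
{
    struct flat_state *st = self->state;
    size_t idx = (size_t)(req->addr / sizeof(uint64_t)) % FLAT_WORDS;

    if (req->op == MEM_WRITE)
        st->words[idx] = req->data;
    else
        req->data = st->words[idx];
    req->latency = 100;  /* assumed constant latency */
    return 0;
}

void flat_module_init(struct mem_module *m, struct flat_state *st)
{
    memset(st, 0, sizeof(*st));
    m->name = "flat";
    m->access = flat_access;
    m->state = st;
}

/* Core-side helpers: they know nothing about the module's internals. */
void core_store(struct mem_module *m, uint64_t addr, uint64_t value)
{
    struct mem_request req = { MEM_WRITE, addr, value, 0 };
    m->access(m, &req);
}

uint64_t core_load(struct mem_module *m, uint64_t addr)
{
    struct mem_request req = { MEM_READ, addr, 0, 0 };
    m->access(m, &req);
    return req.data;
}
```

<p>Replacing the module then means providing another <code>access</code> function and state, while <code>core_load</code> and <code>core_store</code> stay unchanged.</p>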
<h4><a name="Outlook"></a>Outlook</h4>
<p>I am currently updating the established interface inside PTLsim to the new packet-oriented version. Once this is done, we plan to release this version to VELOX partners and will help integrate the first external memory module with PTLsim. Further on, PTLsim will be part of an integrated VELOX demonstrator. Our current collaboration with VELOX partners is already giving us valuable feedback regarding the ASF specification and the implementation in PTLsim. By working further on the memory module, we will be able to do better, more in-depth evaluations of ASF and to estimate how ASF could best be used in tomorrow&#8217;s software.</p>
<h4><a name="About_me_2"></a>About me</h4>
<p>I joined AMD&#8217;s OSRC group in Dresden in October 2008 to work on ASF and, in the context of VELOX, to extend PTLsim.</p>
<p>Before that, I was a PhD student at Technische Universität Dresden, with interests in runtime monitoring for real-time systems, microkernel-based systems, and other operating system topics, such as disk scheduling and file systems.</p>
<p>Martin Pohlack, Senior Software Engineer<br />
AMD Operating System Research Center, Dresden</p>
<p style="font-style: italic;">The information presented in this blog is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors, and may not represent AMD</p>
<img src="http://feeds.feedburner.com/~r/AmdDeveloperBlogs/~4/BL8tOhYKDDg" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.amd.com/developer/2009/11/17/the-velox-research-project/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://blogs.amd.com/developer/2009/11/17/the-velox-research-project/</feedburner:origLink></item>
		<item>
		<title>Dealing With Reality | The Interview | ATI Stream and OpenCL | Part 2</title>
		<link>http://feeds.amddevcentral.com/~r/AmdDeveloperBlogs/~3/ruY5dOd4r1A/</link>
		<comments>http://blogs.amd.com/developer/2009/10/13/dealing-with-reality-the-interview-ati-stream-and-opencl-part-2/#comments</comments>
		<pubDate>Tue, 13 Oct 2009 01:05:40 +0000</pubDate>
		<dc:creator>ssolotko</dc:creator>
				<category><![CDATA[ATI Stream]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[Parallel Computing]]></category>

		<guid isPermaLink="false">http://blogs.amd.com/developer/?p=360</guid>
		<description><![CDATA[In Part I on the AMD At Home Blog Simon Solotko gave an overview of open, parallel computing with ATI Stream and OpenCL. Here, in Part 2, Simon Solotko &#38; Ben Sander discuss the power of ATI Stream technology and the elegant, standards-based interface now available with OpenCL for GPU.
Ben, what have we created with [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://links.amd.com/openftw">Part I on the AMD At Home Blog</a> Simon Solotko gave an overview of open, parallel computing with ATI Stream and OpenCL. Here, in Part 2, Simon Solotko &amp; Ben Sander discuss the power of ATI Stream technology and the elegant, standards-based interface now available with OpenCL for GPU.</p>
<p>Ben, what have we created with OpenCL and what does it do?</p>
<p><em>Ben: Sure, with OpenCL we created a C-based interface for programming a range of parallel processors. Developers write OpenCL Kernels, sub-routines which developers seek to accelerate or offload, and embed these in their applications. OpenCL includes a runtime component which allows these OpenCL Kernels to be compiled at runtime for either a CPU or GPU. AMD has contributed to the development of the OpenCL specification and written the implementation for x86 processors and GPUs &#8211; a runtime environment which compiles the code near runtime, then schedules and executes the code at runtime. </em></p>
<p>What are the benefits of being able to compile an application for a CPU or a GPU?</p>
<p><em>Ben: Developers can write one piece of code and easily support a variety of compute devices in the platform &#8211; CPUs and GPUs, from multiple vendors. Code can be load-balanced between CPU and GPU depending on the capabilities in the final platform. For example, we expect that some applications or parts of applications will run faster on the CPU than the GPU, while other applications will perform better on the GPU. Finally, the OpenCL CPU implementation leverages the CPU hardware debug features to provide excellent debug capabilities, using familiar debug environments, at full CPU speed. </em></p>
<p>When exactly during runtime is the Kernel compiled?</p>
<p><em>Ben: There are specific commands within the body of your application which you call to compile the Kernel, and direct it to be compiled for the CPU or GPU. At that point, the Kernel code is translated into a binary. The binary later executes natively when the Kernel is called. The code is not interpreted in the hot spot of the loop, it&#8217;s not like Java in that regard.</em></p>
<p>So the code within a Kernel looks like C but can be compiled to execute on the GPU?</p>
<p><em>Ben: Exactly. Because a GPU looks and functions differently than a CPU, however, you have to think differently when you write the Kernel for GPU, because at that point, you are executing your code directly on the GPU. There are constraints imposed on Kernel code to accommodate the specialized functionality of the GPU. Kernels are based on C99 with extensions provided by OpenCL-C for vectors and address spaces. </em></p>
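<p>As a rough illustration of what such a Kernel looks like, the sketch below emulates the idea in plain C so that it runs without an OpenCL runtime. It is not OpenCL host code: in OpenCL C the per-element function would be marked <code>__kernel</code>, its pointers would carry the <code>__global</code> address-space qualifier, and the index would come from <code>get_global_id(0)</code> rather than a parameter. The function names <code>vec_add_item</code> and <code>run_vec_add</code> are hypothetical.</p>

```c
#include <stddef.h>

/* Body of a would-be kernel: computes exactly one output element.
 * `gid` plays the role of the work-item's global id. */
static void vec_add_item(const float *a, const float *b, float *out, size_t gid)
{
    out[gid] = a[gid] + b[gid];
}

/* Stand-in for the runtime: invoke the kernel body once per work-item.
 * A real device would run many of these invocations in parallel. */
void run_vec_add(const float *a, const float *b, float *out, size_t n)
{
    for (size_t gid = 0; gid < n; gid++)
        vec_add_item(a, b, out, gid);
}
```

<p>The point of the shape is that the kernel body contains no loop over the data; the loop belongs to whoever launches it.</p>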
<p>Give me some examples of the special ways in which the C code within a Kernel is different from the standard code in the body of the application?</p>
<p><em>Ben: To understand writing a Kernel it is important to understand that the code is actually executing on a GPU, despite the fact that the functions you are performing are syntactically the same as other C code. A GPU has a small fast cache (local memory) and larger main GPU memory (global memory). You move data in blocks, and complete as much of the task on that block as possible before moving the block out and moving the next block in. With a GPU we have a lot of compute bandwidth relative to memory bandwidth making it advantageous to do as much as you can to data within the cache. With OpenCL the blocking process does not necessarily get easier, but you can control it from C code.</em></p>
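<p>The blocking pattern Ben describes can be sketched in plain C as follows. This is an emulation under stated assumptions, not OpenCL code: the array <code>local</code> stands in for the GPU&#8217;s fast local memory, <code>TILE</code> is an assumed block size, and the arithmetic is an arbitrary placeholder for real work.</p>

```c
#include <stddef.h>
#include <string.h>

#define TILE 64   /* assumed capacity of the fast local buffer, in elements */

/* Apply x = 0.5f * x + 1.0f `passes` times to every element,
 * working on one block at a time. */
void process_tiled(float *global, size_t n, int passes)
{
    float local[TILE];   /* stands in for on-chip local memory */

    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? (n - base) : TILE;

        /* Move the block in once... */
        memcpy(local, global + base, len * sizeof(float));

        /* ...do as much work on it as possible while it is close... */
        for (int p = 0; p < passes; p++)
            for (size_t i = 0; i < len; i++)
                local[i] = 0.5f * local[i] + 1.0f;

        /* ...and move it out once before fetching the next block. */
        memcpy(global + base, local, len * sizeof(float));
    }
}
```

<p>Each element crosses the slow path twice regardless of <code>passes</code>, which is why more work per block makes better use of limited memory bandwidth.</p>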
<p>How do we move data from main memory to the GPU memory for use by a Kernel function?</p>
<p><em>Ben: A Kernel cannot move memory from main memory; that is done in your application code. So there are standard functions to copy memory into GPU memory from the application, and pointers to this memory can then be passed to a Kernel function. The Kernel function can then copy memory into the fast cache or &#8220;local&#8221; memory.</em></p>
<p>This sounds a bit complicated, but I have to remind myself, this is all standard C code, and we are discussing the optimization that makes something run fast on the GPU, and the memory management tools that are available, now within standard C through the OpenCL library, to do that.</p>
<p><em>Ben: That&#8217;s right. The magic is that a Kernel is C code which is amazingly compiled by the runtime component of OpenCL to run on a GPU or CPU, with some extra tools to ensure it can take full advantage of the extremely high compute-to-memory-bandwidth capability of the fast, parallel math engine of the GPU.</em></p>
<p>So as time goes on, we anticipate that people will write and optimize many useful Kernels which will simplify the development of complex applications?</p>
<p><em>Ben: Yes. It is relatively straightforward to port applications written for other GPGPU languages like Brook+ and CUDA to OpenCL. This is a huge step forward from proprietary GPU code: you now have a standard way to get at GPU code and memory from C in a platform-independent way.</em></p>
<p>With ATI Stream technology and the standardization of the programming model with OpenCL for GPU, almost any aspiring GPGPU developer can download the tools necessary to get started and develop platform-independent software fueled by the power of the evolved GPU. I have collected resources below to get you started. Enjoy blazing the trail of a new frontier in computing!</p>
<p>For more information, watch as <a href="http://developer.amd.com/documentation/videos/InsideTrack/Pages/default.aspx">AMD&#8217;s Mike Houston discusses OpenCL </a>and what the future has in store for software applications that use it.</p>
<p>If you are ready to get started with OpenCL, you can begin with <a href="http://ati.amd.com/technology/streamcomputing/opencl.html">AMD&#8217;s OpenCL resource page</a> here.  </p>
<p>Simon has regular posts on the AMD At Home blog and you can check out <a href="http://blogs.amd.com/home/2009/07/29/the-home-central-computer-a-hypothetical-inteview/">The Digital Nexus</a> series here.</p>
<p style="font-style: italic;">The information presented in this blog is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors, and may not represent AMD</p>
<img src="http://feeds.feedburner.com/~r/AmdDeveloperBlogs/~4/ruY5dOd4r1A" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.amd.com/developer/2009/10/13/dealing-with-reality-the-interview-ati-stream-and-opencl-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.amd.com/developer/2009/10/13/dealing-with-reality-the-interview-ati-stream-and-opencl-part-2/</feedburner:origLink></item>
		<item>
		<title>AMD Developer Inside Track, Episode 2:  OpenCL Introduction</title>
		<link>http://feeds.amddevcentral.com/~r/AmdDeveloperBlogs/~3/iN6y0rOj7H8/</link>
		<comments>http://blogs.amd.com/developer/2009/09/15/amd-developer-inside-track-episode-2-opencl-introduction/#comments</comments>
		<pubDate>Tue, 15 Sep 2009 01:05:41 +0000</pubDate>
		<dc:creator>AMD DeveloperCentral</dc:creator>
				<category><![CDATA[ATI Stream]]></category>
		<category><![CDATA[Game Development]]></category>
		<category><![CDATA[OpenCL]]></category>

		<guid isPermaLink="false">http://blogs.amd.com/developer/?p=361</guid>
		<description><![CDATA[AMD has always been an advocate of open standards that build on and extend proven technologies (example: x86-64).  As such, it is a natural fit for AMD to embrace OpenCL as part of its ATI Stream offering.  But, just what is OpenCL?  
In this month&#8217;s episode of the AMD Developer Inside Track I interview Mike [...]]]></description>
			<content:encoded><![CDATA[<p><span>AMD has always been an advocate of open standards that build on and extend proven technologies (example: x86-64).  As such, it is a natural fit for AMD to embrace OpenCL</span><span> </span><span>as part of its <a href="http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx">ATI Stream </a>offering.  But, just what is OpenCL?  </span></p>
<p><span>In this month&#8217;s episode of the <a title="AMD Developer Inside Track" href="http://developer.amd.com/documentation/videos/InsideTrack/Pages/default.aspx" target="_self">AMD Developer Inside Track </a>I interview Mike Houston, GPG System Architect.  He talks about what OpenCL is, what the transition to this new language will be like and he gets into what applications could benefit from OpenCL, as well as what the future has in store for software applications that use it.  </span> </p>
<p>One of the advantages of OpenCL is its advanced queuing system which is great for game development. It is also designed to work very well with various graphics APIs such as OpenGL, DirectX 9 and DirectX 10. </p>
<p><span>Game developers aren&#8217;t the only ones who can take advantage of OpenCL though.  According to Michael, it is going to be very useful for applications such as media encoding, virus scanning, and physics to name a few.  It makes a</span><span> </span><span>lot of sense for AMD to move to a ubiquitous computing language that runs on platforms everywhere.  The next few years will be an interesting time for GPGPU technology as several hardware and software vendors get on board.  </span></p>
<p><a href="http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx">ATI Stream technology </a>is gaining significant momentum.<span>  </span>Some cool and unexpected examples of ATI Stream technology in action are:</p>
<ul>
<li><span>Folding@home: </span><a href="http://folding.stanford.edu/"><span>http://folding.stanford.edu/</span></a></li>
<li><span>Milkyway@home:</span><span> </span><a href="http://milkyway.cs.rpi.edu/milkyway/"><span>http://milkyway.cs.rpi.edu/milkyway/</span></a></li>
</ul>
<p>An example of gaming technology and OpenCL:</p>
<ul>
<li><span>Havok demo:</span><span> </span><a href="http://www.engadget.com/2009/03/27/havok-and-amd-show-off-opencl-with-pretty-pretty-dresses/"><span>http://www.engadget.com/2009/03/27/havok-and-amd-show-off-opencl-with-pretty-pretty-dresses/</span></a></li>
</ul>
<p>Watch the <a title="AMD Developer Inside Track Video Series" href="http://developer.amd.com/documentation/videos/InsideTrack/Pages/default.aspx" target="_self">AMD Developer Inside Track</a>, Episode 2 for the full story.</p>
<p><span> </span><span>-Sharon Troia, AMD Developer Outreach</span></p>
<p style="font-style: italic;">The information presented in this blog is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors, and may not represent AMD</p>
<img src="http://feeds.feedburner.com/~r/AmdDeveloperBlogs/~4/iN6y0rOj7H8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.amd.com/developer/2009/09/15/amd-developer-inside-track-episode-2-opencl-introduction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.amd.com/developer/2009/09/15/amd-developer-inside-track-episode-2-opencl-introduction/</feedburner:origLink></item>
		<item>
		<title>Framewave Multipass Build System</title>
		<link>http://feeds.amddevcentral.com/~r/AmdDeveloperBlogs/~3/2vQOEJbnPlk/</link>
		<comments>http://blogs.amd.com/developer/2009/09/07/framewave-multipass-build-system/#comments</comments>
		<pubDate>Mon, 07 Sep 2009 01:05:41 +0000</pubDate>
		<dc:creator>Rameshj</dc:creator>
				<category><![CDATA[AMD Libraries]]></category>
		<category><![CDATA[AMD Performance Library]]></category>
		<category><![CDATA[Framewave]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://blogs.amd.com/developer/?p=362</guid>
		<description><![CDATA[ 
Developing libraries can be difficult, fun and interesting; an equally difficult task is testing the library and distributing it, so that other developers can use the library in their projects.  The big advantage of using libraries to accomplish certain functionalities is that libraries are already tested and optimized for various platforms.  For the libraries optimized [...]]]></description>
			<content:encoded><![CDATA[<p><strong> </strong></p>
<p>Developing libraries can be difficult, fun and interesting; an equally difficult task is testing the library and distributing it, so that other developers can use the library in their projects.  The big advantage of using libraries to accomplish certain functionalities is that libraries are already tested and optimized for various platforms.  For the libraries optimized for particular platforms, there needs to be a dispatch mechanism to select the best optimized path depending on the processor.  I have found that the build system from the Framewave library provides a good solution to accomplish this.</p>
<p> Derived from the <strong>AMD Performance Library</strong>, Framewave is a free-of-charge, open-source collection of popular image and signal processing routines designed to accelerate application development, debugging, multi-threading and optimization on x86-class processor platforms. This library has three paths of optimized code:  a reference code (C code) path, an SSE2 code path, and an SSE3 and F10H code path. One reason I found it interesting is that it is open-source; I can go through the code, understand it, and modify it as per my requirements. It also has a single source bundle for four operating systems (Linux®, Mac, Windows®, and Solaris operating systems).</p>
<p> Framewave has a different implementation for each of the paths, and the Framewave build system takes care of combining them together and exposing a single signature. To achieve this, Framewave has a custom build system based on the SCons build tool (<span style="text-decoration: underline;"><a href="http://www.scons.org/">http://www.scons.org</a></span>). The advantage of using SCons is that it uses the Python scripting language for its configuration files.</p>
<p> Framewave has a single source bundle that is termed platform independent and is compiled using a single build system across all the platforms. The tool sets supported are GCC, MSVC, and Sun CC. This build system allows me to build 32/64-bit shared/static libraries with the ability to build either a debug or release version.</p>
<p> This build system picks up the file and compiles it <em>n</em> times, <em>n</em> being the number of optimized paths, producing <em>n</em> object files. These <em>n</em> object files are linked together to the stub function which is exported as the actual function. To understand the build system more, refer to the architecture description here: <a href="http://framewave.sourceforge.net/DesignDoc/FramewaveBuildSystem-Architecture.htm">http://framewave.sourceforge.net/DesignDoc/FramewaveBuildSystem-Architecture.htm</a></p>
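<p> The stub-and-dispatch idea can be sketched in C as follows. This is an illustration under stated assumptions, not Framewave&#8217;s actual code: the names <code>add_u8_sat</code>, <code>detect_path</code>, and <code>code_path</code> are hypothetical, all three &#8220;paths&#8221; share one plain C body so the sketch stays portable, and processor detection uses the GCC/Clang builtins <code>__builtin_cpu_init</code> and <code>__builtin_cpu_supports</code> where available, falling back to the reference path elsewhere.</p>

```c
#include <stddef.h>

enum code_path { PATH_REF = 0, PATH_SSE2, PATH_SSE3, PATH_COUNT };

/* Shared body: saturating add of two byte arrays. */
static void add_sat_impl(const unsigned char *a, const unsigned char *b,
                         unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        out[i] = (unsigned char)(sum > 255u ? 255u : sum);
    }
}

/* Stand-ins for the n objects built from one source file. In a real
 * multipass build each would be compiled with different flags. */
static void add_ref(const unsigned char *a, const unsigned char *b,
                    unsigned char *out, size_t n) { add_sat_impl(a, b, out, n); }
static void add_sse2(const unsigned char *a, const unsigned char *b,
                     unsigned char *out, size_t n) { add_sat_impl(a, b, out, n); }
static void add_sse3(const unsigned char *a, const unsigned char *b,
                     unsigned char *out, size_t n) { add_sat_impl(a, b, out, n); }

typedef void (*add_fn)(const unsigned char *, const unsigned char *,
                       unsigned char *, size_t);

static const add_fn add_paths[PATH_COUNT] = { add_ref, add_sse2, add_sse3 };

/* Pick the best path the current processor supports. */
enum code_path detect_path(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse3"))
        return PATH_SSE3;
    if (__builtin_cpu_supports("sse2"))
        return PATH_SSE2;
#endif
    return PATH_REF;
}

/* The stub: the single exported signature that callers link against. */
void add_u8_sat(const unsigned char *a, const unsigned char *b,
                unsigned char *out, size_t n)
{
    static int cached = -1;   /* note: this cache is not thread-safe */
    if (cached < 0)
        cached = (int)detect_path();
    add_paths[cached](a, b, out, n);
}
```

<p> Callers only ever see <code>add_u8_sat</code>; which object actually runs is decided once, at the first call.</p>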
<p> Producing one DLL file and having only one signature exported for each function is a better option than having multiple DLL files for each of the optimized code paths and then loading the particular DLL depending on the processor. The advantage of having one single large DLL file for the library is that I end up adding only one file to the <em>n</em> files present in my project.</p>
<p> Overall this build system offers a unique way to bundle software that has different implementations for each processor.</p>
<p> I&#8217;d like to hear what you think.  Is this build system useful in your own work?  What do you like about it, what do you dislike about it?</p>
<p> Watch out for my next post on Using SCons for building the build system.</p>
<p style="font-style: italic;">The information presented in this blog is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors, and may not represent AMD</p>
<img src="http://feeds.feedburner.com/~r/AmdDeveloperBlogs/~4/2vQOEJbnPlk" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.amd.com/developer/2009/09/07/framewave-multipass-build-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.amd.com/developer/2009/09/07/framewave-multipass-build-system/</feedburner:origLink></item>
	</channel>
</rss>
