The File - Nov 1, 2008 - (Page 9)
In Focus | Multicore/multiprocessor design

Optimise software for multi-core processors
continued from page

• ... for them to complete their work.
• Pipelining — The developer breaks a computation into steps, with a different thread handling each step. Different data sets occupy different stages of the pipeline at any given time.
• Peer — A thread creates other threads and then participates as a peer, sharing the work.

Consider an example of how worker threads can implement parallelism on a quad-core system. In code listing 1, a single-threaded function called fill_array() updates a large, two-dimensional array. The function simply iterates through the array, updating each element. Because the function updates the elements independently (the value of element N does not depend on element N-1), it is easily parallelised. To speed up fill_array(), the developer can create one worker thread per CPU and have each worker thread update a portion of the array. Code listing 2 shows how the developer can create these threads with the POSIX pthread_create() call. To specify a thread's entry point (where the thread starts running), the developer passes a function pointer to pthread_create(). In this case, each worker thread starts in the fill_array_fragment() function, shown in code listing 3. Each worker thread determines which portion of the array it should update, ensuring no overlap with other worker threads. All worker threads can then proceed in parallel, taking advantage of SMP's ability to schedule any thread on any available CPU core.

The main thread must wait for the array to be fully updated before performing additional operations. To ensure that the main thread waits, this example uses barrier synchronisation, pthread_barrier_wait(): any thread that has finished its work waits at the barrier until all other threads arrive. Now that the software uses a multi-threaded approach, it can scale with the number of CPUs.
The example uses a four-CPU system, but it is easy to adjust the number of worker threads to accommodate more or fewer processors. Read the full article to learn which visualisation tools you can use to optimise software, and about detecting and reducing resource contention, and more.

void multi_thread_fill_array()
{
    int thread, rc;
    pthread_t worker_threads[NUM_CPUS];
    int thread_index[NUM_CPUS];

    // Synchronise NUM_CPUS+1 threads: the main thread plus the workers
    pthread_barrier_init(&barrier, NULL, NUM_CPUS + 1);

    for (thread = 0; thread < NUM_CPUS; ++thread) {
        thread_index[thread] = thread;
        rc = pthread_create(&worker_threads[thread], NULL,
                            fill_array_fragment,
                            (void *)&thread_index[thread]);
        if (rc) {
            // handle error
        }
    }

    pthread_barrier_wait(&barrier);
    pthread_barrier_destroy(&barrier);
    return;
}

Code listing 2: The main thread creates the synchronisation object, then creates worker threads to update the array in parallel.

void *fill_array_fragment(void *arg)
{
    int *thread_index = (int *)arg;
    int col = 0;
    int start_row = 0;
    int end_row = 0;

    // Compute the range of rows this thread updates
    start_row = *thread_index * (NUM_ROWS / NUM_CPUS);
    end_row = start_row + (NUM_ROWS / NUM_CPUS) - 1;

    while (start_row <= end_row) {
        for (col = 0; col < NUM_COLUMNS; col++) {
            array[start_row][col] = ((start_row / 2 * col) / 3.2) + 1.0;
        }
        ++start_row;
    }

    // Wait at the barrier for all threads to complete
    pthread_barrier_wait(&barrier);
    return NULL;
}

Code listing 3: Each worker thread updates a portion of the array, then waits at the barrier.

Online
Apply parallel programming basics to multi-processor designs
Apply parallel programming basics to multi-processor designs, Part 2

Managing threads, communications in multi-core partitioning
continued from page
... the right software partitioning for their apps.
Think non-sequentially
Key to this effort, however, is writing the application code to be multi-threaded from the outset, and this remains a challenge for many developers. "It takes a long time for developers to get used to the consequences of threads interacting in multi-threaded design on a single processor," concurred Mark Zarins, vice president of products at GrammaTech. "The consequences of thread interaction are worse in multi-processing," he added.

This challenge has prompted at least one multi-core processor vendor to take an entirely different approach: Ambric has its customers develop software for the company's massively parallel processor architecture (MPPA) using a structural object programming model. Developers describe their applications in block-diagram form as sets of software objects that link together. The company's tools then map each software object to its own processor in an MPPA array of hundreds of processors connected through self-synchronising communications channels. Because the partitioning is inherent in the software's structure, no code needs to be written before partitioning.

Regardless of approach, developers must face the challenges of multi-core development sooner or later. "This is a natural progression," said Brehmer.

Online
Embedded multi-core enables greener networks
Commentary: Multicore needs communications by design

EE Times-India | November 1-15, 2008 | www.eetindia.com