Concurrent Programming: A Solution For Multi-core Era?

By Rajesh Karmani, published on April 21, 2008 at 7:40 PM
Source: Tom's Guide US | Keywords: , , | Themes: Software, Desktop Computers
Syndication: Add to your Google homepage Add to My Yahoo!

 

Until now, programmers have been looking up to Moore’s law to speed-up their sequential program with each subsequent processor. But due to the physical limitations in chip design, the programmers’ "free lunch" is over: Multi-core architectures - multiple CPUs on a single chip - bring improved power efficiency, while each of the cores could be slower than the latest unicore processor when running traditional applications. This not only means that the acceleration of sequentially written applications may stagnate - they might even slow-down1. Rajesh Karmani discusses challenges, dead-ends and possible directions for developers.

One important question, of course, is what kinds of applications we can and need to parallelize. How much can we parallelize our word processor or the virus scanner? Scientific applications, simulations, numerical computations have been the focus of parallel computing community for some time. But as parallel platforms (like multi-core CPUs) become mainstream, mainstream applications and their programmers have to rise to the challenge. It is an interesting discussion in itself and demands a separate article which I’ll leave aside for later.

Unfortunately, the problem gets worse when chips with hundreds of cores (sometimes called manycore) are rolled out, as is being widely predicted. Two potential consequences for programmers are:

1) Programs should continue to scale (run faster) with the increasing number of cores.
2) Programs based on shared memory model will suffer serious bottlenecks on memory bus.

Shared-memory concurrent programming (in Java)

Java is a widely-used programming language. It supports concurrent programming through threads and it seems that Java’s concurrency features are an additional feature instead of setting out with the goal of being an elegant concurrent programming language. This shows up in quite a few of its constructs. For example, consider the following statement in Java:

....
x++;
....

Depending on whether x is a local variable or class/instance variable, this statement gets compiled into 1 or 3 bytecode instructions. Crucially, in case x is an instance variable, the sequence of 3 instructions read the value from the heap, increment it, and write it back to the heap. If the code gets called from two different threads, there is a small window such that one increment is observed instead of two increments, which is incorrect. Of course, they told us in their specification what can go wrong, but I am arguing for constructs and abstractions which are simple, intuitive and elegant.

Similarly, the locking mechanism in Java is bug-prone as illustrated by a commonly referred example. This piece of code was part of the StringBuffer class:

public synchronized StringBuffer append (StringBuffer sb)

int len = sb.length();
...
...
sb.getChars(0,len,value,count);
...

Although the method is declared as synchronized, another thread can modify ’sb’ object after the first statement such that its length decreases to less than ’len’. Therefore, when getChars is called on ’sb’ with the old value of length, the program raises an exception.

Once again, the semantics of synchronized methods in Java are not intuitive enough. On entering a synchronized method, the runtime obtains a lock for the object on which the method is being called. These semantics guarantee isolation of this block of code (method body) with respect to other blocks capturing lock on the same object. This raises multiple problems; what if the programmer forgets to obtain the lock at some place and what about the consistency of other objects (’sb’ here) being accessed (and modified) inside the synchronized method.

In order to fix this code, programmer should have obtained another lock on ’sb’ by using the synchronized block construct:

public synchronized StringBuffer append (StringBuffer sb)

synchronized (sb)

int len = sb.length();
...
...
sb.getChars(0,len,value,count);
...

But things are not as simple as they seem. Such an implementation could result in deadlock. Consider the following two threads:

There is a window where Thread 1 obtains the lock for sbObj and Thread 2 for sb. Each of the threads will keep waiting for the other thread to release the lock for respective object before it can proceed forward. In fact, the deadlock could occur trivially without the modification since the methods length() and getChars() are declared as synchronized in StringBuffer. For a more detailed discussion on this problem, see references 6 and 7 below.

To conclude, shared memory concurrent programs can have data races, which are conflicting accesses to shared variables. Low-level mechanism such as locks, semaphores provided in languages like C/C++/Java to prevent data-races put too much burden on programmers and yet can cause errors like deadlocks. Reference 8 details the problems with shared memory programs.

Read on the next page: Software Transactions, Conclusion

Software Transactions

Software transactions is an elegant mechanism proposed in order to reduce the complexity of concurrent programs in shared memory models. The idea is to annotate the blocks of code that are required to execute in isolation with the keyword atomic or transaction. The onus is on the compiler or run-time to obey the semantics using one of the few different mechanisms.

A pessimistic approach is to replace the atomic construct with locks acquisition statements that guarantee that there is no deadlock. Although it reduces the burden on programmer by taking care of the locks, it has certain disadvantages:

1) The approach is based on static code analysis, and tends to over-approximate the locks it needs to obtain. Hence, it restricts the potential parallelism in application.
2) It cannot guarantee isolation of atomic block with respect to non-atomic blocks. So, if the programmer forgets to annotate atomic, it can cause inconsistencies.
3) Composition of compiled code can still cause deadlocks.

Another approach taken by researchers is based on the conjecture that conflicts are timing-dependent and are usually rare in practice. Therefore, they execute the atomic blocks optimistically and before committing the results to shared memory, validate if there was a conflict with some other code. This approach is sometimes called TM. It can be implemented either in software (STM), hardware (HTM) or both (Hybrid). In case, a conflict is detected, the transaction is rolled back and executed again. Although, TM can provide more speed-ups in some cases, it can perform poorly in others. Some of its disadvantages are:

1) It’s difficult to roll back system and I/O calls. Hence these can’t be part of the atomic blocks.
2) It can perform poorly in case of frequent conflicts due to repeated roll backs.
3) There is a very high overhead in order to ensure isolation with respect to non-atomic code. In fact, with misunderstood constructs (such as increment operator discussed above) in the language, the behavior of the program is not very intuitive.

Software Transactions is an active research area due to the simplicity of its semantics but lot of questions remain unanswered at the implementation level. A good introduction can be found at 3,5. On the other hand, a heated debate on STM versus other models took place at Ltu4.

Conclusion

Mainstreaming of concurrent programming has woken up language designers. Efforts such as software transactions and library support in Java5 and Intel’s TBB are elegant workarounds, but essentially I would argue they are just patch-works. As multi-cores follow Moore’s law, the number of cores on a chip will grow exponentially and there will be serious bottlenecks at the shared memory bus. I believe it is not feasible to program based on a shared memory model. Moreover, in order to continue their "free lunch" in multicore era (scalability with number of cores instead of GHz), developers may well have to jilt their current love. But there is plenty of optimism, and we are yet to discuss message-passing models :-)

References:
[1] http://www.codinghorror.com/blog/archives/000942.html
[2] The Java Memory Model. Jeremy Manson, William Pugh and Sarita Adve. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2005. Long Beach, California. January, 2005.
[3] http://www.acmqueue.org/modules.php?>[4] http://lambda-the-ultimate.org/node/2048
[5] "Beautiful concurrency" Simon Peyton Jones. This is a chapter for the book "Beautiful code", edited by Greg Wilson, O’Reilly 2007.
[6] http://bugs.sun.com/bugdatabase/view_bug.do?bug_>[7] A. Williams, W. Thies, and M. D. Ernst. Static deadlock detection for Java libraries. In Proc. 2005 European Conference on Object-Oriented Programming (ECOOP) Lecture Notes in Computer Science. Springer-Verlag, July 2005.
[8] The Problem with Threads. Edward A. Lee. EECS Department University of California, Berkeley Technical Report No. UCB/EECS-2006-1

Disclaimer: The views expressed in the article are author’s personal views and do not represent those of University of Illinois, or the UPCRC at the University of Illinois.

About the author: The author is a graduate student in the Department of Computer Science at University of Illinois at Urbana-Champaign. He is a recipient of the Sohaib and Sarah Abbasi Fellowship. His current area of interest is programming languages and software engineering. Previously, he has worked in wireless sensor networks and multi-agent systems.

Share your thoughts by writing a comment below!

Comments | Print | Send to a friend

Sponsored links

Comments

StealthMonkey27 04/22/2008 4:17 AM
Hide
-0+

LabVIEW is an inherently parallel programming language! Programs written in LabVIEW will thread across multiple cores as they become available.

chaosmstr 04/22/2008 7:02 AM
Hide
-0+

I've done a very small amount of programming. I admit to being a hardware, hands on kind of guy.

Concurrent programming sets my brain spinning. Having ADD makes things both harder and easier. I can think of several things at once, but keeping track of it all is another matter.

Multi-channel programming is a requirement for todays hardware. No argument there at all. I have difficulty in seeing how the average person (and I know that programmers aren't average people) can think beyond the "normal" linear mode of thinking.
"If x then y else z" is easy to see.
"If x then y unless y = q, q being processed while x was checked for y, and then lets calc z*x=y-q while q is being modified by x, or z..." I get lost just trying to figure out how to describe it!

I am GLAD that there are people who can program in a non-linear fashion.. but personally, I have to see it in a linear manner. Which todays hardware just laughs at anyway. If the language people can come up with something to allow linear people to program concurrently, it would be a wonderful thing.. but all I can see are two linear programs running on diff processors!

Ack.. I'm so confused.

chaosmstr 04/22/2008 7:02 AM
Hide
-0+

I've done a very small amount of programming. I admit to being a hardware, hands on kind of guy.

Concurrent programming sets my brain spinning. Having ADD makes things both harder and easier. I can think of several things at once, but keeping track of it all is another matter.

Multi-channel programming is a requirement for todays hardware. No argument there at all. I have difficulty in seeing how the average person (and I know that programmers aren't average people) can think beyond the "normal" linear mode of thinking.
"If x then y else z" is easy to see.
"If x then y unless y = q, q being processed while x was checked for y, and then lets calc z*x=y-q while q is being modified by x, or z..." I get lost just trying to figure out how to describe it!

I am GLAD that there are people who can program in a non-linear fashion.. but personally, I have to see it in a linear manner. Which todays hardware just laughs at anyway. If the language people can come up with something to allow linear people to program concurrently, it would be a wonderful thing.. but all I can see are two linear programs running on diff processors!

Ack.. I'm so confused.

chaosmstr 04/22/2008 7:04 AM
Hide
-0+

grr. sorry for the double post.

wonderview 04/22/2008 6:43 PM
Hide
-0+

If you want parallel language, go with VHDL.
It designed to describe hardware, which is parallel in nature ( as well as your brain ).
In order to write in VHDL, you would have to shitch your way of thinking - from writing programs for
Von Neumann architecture, to designing hardware ( like drawing schematics ).

It might seems extreme, but I'm not kidding - the apparent problem with parallel programming in
procedural lenguages ( like c/C++ ) is due to the hardware architecture designed for procedural lenguages,
that has sequential execution at it's hart.


example of VHDL code ( concurrent assignments ):
out1

mr_fnord 05/14/2008 8:52 PM
Hide
-0+

Get real. You're not going to use concurrent processing at this minute level of detail. The cross-core cache access alone blows away all benefits on small tasks. in the x++ example, you're not going to have two cores executing two of these commands in order because the 2 clocks it takes to do this on one core is much faster than the 20 clocks it takes for core 2 just to access the contents of core 1's cache.

Don't expect a 100 line code-block to run on 4 cores. Now, using good multi-threaded design, you can have a few dozen 1000 line code blocks running on a bunch of cores, and use standard multi threaded messaging between them for the areas of memory that need to move between threads.

Highly procedural code will not run in a highly parallel fashion. Programmers need to start thinking in parallel, not procedural fashion. The framework already exists in pretty much every commonly used language.

robojocks 05/17/2008 9:41 AM
Hide
-0+

I think it can't happen fast enough. I would love to have the networking off loaded to one core. The OS locked to one core and the other cores for the user. This model would hopefully make the computer more stable.

robojocks 05/17/2008 10:00 AM
Hide
-0+

The other thing I hopefully would see is more visual programming the is actually visual with each potential parallel program code thing shown visually. I hope microsoft or Xsim whatever gets cracking on it.

The other thing which most people already know is going to happen is million core processors aka worm brain :D. Did anyone read about that human brains are shrinking and dogs brains are getting bigger and that they think at sometime in the future the dogs will take over the planet?

Comments are closed on this page.

Sponsored links