Many academic professors talk about parallelism in the context of HDL languages. Having learned both Verilog and Pthreads, I have always felt that we can use some of the lessons learned in hardware (which is inherently parallel) in parallel programming. ParC is based on this insight and is an impressive piece of work. I learned about ParC through Kevin Cameron’s comments on Future Chips. After some (healthy) debate with Kevin, I felt that ParC is a concept we should all be familiar with. I am hoping that Kevin’s insights will trigger some interesting debate.
When people say parallel programming is hard they are correct, but to say it is a new problem would be wrong. Hardware designers have been turning out parallel implementations of algorithms for decades. Back in the 1980s designers moved up from designing in gates to using RTL descriptions of circuits with synthesis tools for the hardware description languages (HDLs) Verilog and VHDL. In recent years other methodologies like assertion-driven verification and formal methods have been added to help get chips working at first silicon.
HDLs are also used for simulation, where circuit elements are modeled by threads that communicate by events (signals on wires). The thread-handling capacity of HDL simulators is therefore much higher than that of an approach which uses Pthreads or QuickThreads with C++ for the same purpose (e.g. the SystemC C++ class library).
So why wouldn’t you just use an HDL for your parallel programming? Well, (a) it’s not programmer friendly, (b) most of the associated methodology is for “synchronous” design, which uses clocks, and (c) HDLs describe static things (fixed hardware) that are not modifiable at runtime. However, some people do use HDLs to get code onto FPGAs for acceleration.
The hardware design methodology (RTL) can be considered as “synchronous finite-state machines”, where an algorithm is described as multiple FSMs communicating based on a global time signal. The next level up in abstraction for hardware design is “asynchronous finite-state machines”, where the FSMs send (self-timed) messages to each other and there is little or no global synchronization; the latter approach is also known as CSP (Communicating Sequential Processes) in programming circles. The CSP computing paradigm works well on any non-shared-memory platform – from embedded systems and hardware through to the cloud (distributed computing).
ParC is a pure extension of C++, so as a programmer you do not lose anything; what it adds is the threading model used in HDLs and the associated event-driven syntax and semantics. In addition there is a low-level “pipe” construct so that you can write asynchronous descriptions more easily than in HDLs. The current ParC compiler is just a translator to regular C++. It’s not hard to imagine existing C/Linux constructs, such as interrupt handling, that would be much easier to express as signals with events than with the existing sigmask/signal approach, or select() for I/O processing.
The signal and pipe communication mechanisms in ParC are “opaque”, meaning that reading and writing threads cannot see into how they work, so they can be implemented efficiently on all platforms. If the threads avoid using shared memory then it is (in theory) easy to move them from processor to processor, so the runtime system can optimize itself.
The justification for adopting ParC is that it supports existing code and provides a route to programming “many-core” machines in a platform-independent manner: rather than having code that only works on some GP-GPUs, it should work on any machine. Using it for hardware description makes life a lot easier for people developing hardware interfaces, and allows the same tools to be used for hardware verification and software verification. You have to use formal and semi-formal static or dynamic analysis tools to have any hope of getting your SoCs to work, and if you build those tools to work with ParC they will work for both hardware and software (so you have a broader market and more support).
The code example “async life” – http://parallel.cc/cgi-bin/bfx.cgi/test_cases.html – creates a 2-D array of 1800 threads and demonstrates a lot of the features of the language. There is an accompanying video of it compiling, running, and being debugged with ddd/gdb. There is a specification document for the language, but it’s still under development so I’m refraining from publishing it at the moment.
A large part of the work to be done in developing ParC is infrastructure that would support the same features in other languages, e.g. a parallel Java (PJ?). ParC is also a functional superset of Verilog and VHDL, so it should make it easier to fold those languages into the mainstream – at the moment there are few open-source projects (of gcc quality) in that area, and the plethora of unfixed “features” in HDLs was part of the motivation for ParC in the first place.