Multicore processors have been around for many years, and today, they can be found in most devices. However, many developers are doing what they've always done: creating single-threaded programs. They're not taking advantage of all the extra processing power. Imagine you have many tasks to perform and many people to perform them, but you are using only one person because you don't know how to ask for more. It's inefficient. Users are paying for extra power, but their software is not allowing them to use it.
Multiple-thread processing isn't new for seasoned C# developers, but it hasn't always been easy to develop programs that use all the processor power. This article shows the evolution of parallel programming in C# and explains how to use the new Async paradigm, introduced in C# version 5.0.
What Is Parallel Programming?
Before talking about parallel programming, let me explain two concepts closely related to it: synchronous and asynchronous execution modes. These modes are important for improving the performance of your apps. When you execute a program synchronously, the program runs all tasks in sequence, as shown in Figure 1. You fire the execution of each task, and then wait until it finishes before firing the next one.
Figure 1. Synchronous execution
When executing asynchronously, the program doesn't run all tasks in sequence: it fires the tasks, and then waits for them to finish, as shown in Figure 2.
Figure 2. Asynchronous execution
If asynchronous execution takes less total time to finish than synchronous execution, why would anybody choose synchronous execution? Well, as Figure 1 shows, every task executes in sequence, so it's easier to program. That's the way you've been doing it for years. With asynchronous execution, you have some programming challenges:
- You must synchronize tasks. Say that in Figure 2 you run a task that must be executed after the other three have finished. You will have to create a mechanism to wait for all tasks to finish before launching the new task.
- You must address concurrency issues. If you have a shared resource, like a list that is written in one task and read in another, make sure that it's kept in a known state.
- The program logic is completely scrambled. There is no logical sequence anymore. The tasks can end at any time, and you don't have control of which one finishes first.
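The first two challenges can be sketched as follows. This is a minimal illustration, not code from the article: the class name ChallengesSketch, the helper RunThenSummarize, and the loop counts are all hypothetical. Three tasks write to a shared list under a lock, and a dependent step runs only after Task.WaitAll has returned.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ChallengesSketch
{
    // Hypothetical helper: three tasks append to a shared list, and a
    // dependent step runs only after all of them have finished.
    public static int RunThenSummarize()
    {
        var shared = new List<int>();
        var gate = new object();   // guards every access to 'shared'

        var tasks = new Task[3];
        for (int i = 0; i < tasks.Length; i++)
        {
            int id = i;            // copy the loop variable for the closure
            tasks[i] = Task.Run(() =>
            {
                for (int n = 0; n < 1000; n++)
                {
                    // Concurrency: List<T> is not thread-safe, so serialize writes.
                    lock (gate) { shared.Add(id); }
                }
            });
        }

        // Synchronization: block until all three tasks have finished.
        Task.WaitAll(tasks);

        // The dependent step can now read the list safely.
        return shared.Count;
    }

    static void Main()
    {
        Console.WriteLine("Items written: " + RunThenSummarize());
    }
}
```

Without the lock, concurrent calls to Add could corrupt the list or lose items. Without WaitAll, the count could be read before the writers finish.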
In contrast, synchronous programming has some disadvantages:
- It takes longer to finish.
- It may stop the user interface (UI) thread. Typically, these programs have only one UI thread, and when you use it for a blocking operation, you get the spinning wheel (and “not responding” in the caption title) in your program, which is not the best experience for your users.
- It doesn't use the multicore architecture of the new processors. Regardless of whether your program is running on a 1-core or a 64-core processor, it will run as quickly (or slowly) on both.
Asynchronous programming eliminates these disadvantages: it won't hang the UI thread (because it can run as a background task), and it can use all the cores in your machine and make better use of machine resources. So, do you choose easier programming or better use of resources? Fortunately, you don't have to make this decision. Microsoft has created several ways to minimize the difficulties of programming for asynchronous execution.
Asynchronous Programming Models in Microsoft .NET
Asynchronous programming isn't new in Microsoft .NET: it has been there since the first version, released in 2002. Since then, it has evolved, making it easier for developers to use this paradigm. The Asynchronous Programming Model (APM) is the oldest model in .NET and has been available since version 1.0. Because it's complicated to implement, however, Microsoft introduced a new model in .NET 2.0: the Event-Based Asynchronous Pattern (EAP). I don't discuss these models, but check out the links in “For More Information” if you're interested. EAP simplified things, but it wasn't enough. So in .NET 4.0, Microsoft implemented a new model: the Task Parallel Library (TPL).
The Task Parallel Library
The TPL is a huge improvement over the previous models. It simplifies parallel processing and makes better use of system resources. If you need to use parallel processing in your programs, TPL is the way to go.
For the sake of comparison, I'll create a synchronous program that calculates the prime numbers between 2 and 10,000,000. The program shows how many prime numbers it can find and the time required to do so:
GitHub - synchronous program code sample
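The linked listing is not reproduced here. A minimal sketch of what such a synchronous program might look like is shown below. It assumes a trial-division IsPrimeNumber helper and a LINQ query over Enumerable.Range; the class name PrimesSync and the method bodies are illustrative, not the author's code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class PrimesSync
{
    // Trial division: deliberately simple, not an optimized algorithm.
    public static bool IsPrimeNumber(int number)
    {
        if (number < 2) return false;
        if (number % 2 == 0) return number == 2;
        for (int d = 3; (long)d * d <= number; d += 2)
        {
            if (number % d == 0) return false;
        }
        return true;
    }

    // Returns the primes among 'count' consecutive integers starting at 'minimum'.
    public static List<int> GetPrimeNumbers(int minimum, int count)
    {
        return Enumerable.Range(minimum, count).Where(IsPrimeNumber).ToList();
    }

    static void Main()
    {
        const int minimum = 2;
        const int maximum = 10000000;

        var stopwatch = Stopwatch.StartNew();
        List<int> primes = GetPrimeNumbers(minimum, maximum - minimum + 1);
        stopwatch.Stop();

        Console.WriteLine("Primes found: {0}, total time: {1} ms",
            primes.Count, stopwatch.ElapsedMilliseconds);
    }
}
```

Everything runs on the calling thread, so the elapsed time depends on the speed of a single core.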
This is not the best algorithm for finding prime numbers, but it can show the differences between approaches. On my machine (which has an Intel® Core™ i7 3.4 GHz processor), this program executes in about 3 seconds. I use the Intel® VTune™ Amplifier to analyze the program. This is a paid program, but a 30-day trial version is available (see “For More Information” for a link).
I run the Basic Hotspots analysis in the synchronous version of the program and get the results in Figure 3.
Figure 3. VTune™ analysis for the synchronous version of the Prime Numbers program
Here, you can see that the program took 3.369 seconds to execute, most of which was spent in IsPrimeNumber (3.127 s), and it uses only one CPU. The program does not make good use of the resources.
The TPL introduces the concept of a task, which represents an asynchronous operation. With the TPL, you can create tasks implicitly or explicitly. To create a task implicitly, you can use the Parallel class, a static class that has the For, ForEach, and Invoke methods. For and ForEach allow loops to run in parallel; Invoke allows you to queue several actions in parallel.
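As a small illustration of the Parallel class, the sketch below uses Parallel.Invoke and Parallel.ForEach. The class name ParallelClassSketch and the SumOfSquares helper are hypothetical and exist only for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelClassSketch
{
    // Sums the squares of the inputs with Parallel.ForEach.
    public static long SumOfSquares(int[] values)
    {
        long total = 0;
        Parallel.ForEach(values, value =>
        {
            // Iterations may run concurrently, so update the total atomically.
            Interlocked.Add(ref total, (long)value * value);
        });
        return total;
    }

    static void Main()
    {
        // Invoke runs the given actions, possibly in parallel,
        // and returns only when all of them have completed.
        Parallel.Invoke(
            () => Console.WriteLine("First action"),
            () => Console.WriteLine("Second action"));

        Console.WriteLine("Sum of squares: " + SumOfSquares(new[] { 1, 2, 3, 4 }));
    }
}
```

The order of the two action messages is not guaranteed, because the runtime decides how the actions are scheduled.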
This class makes it easy to convert the synchronous version of my program into a parallel one:
GitHub - Parallel program code sample
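The linked listing is not reproduced here. One way to write such a program is sketched below, under the assumption that the range is split into equal chunks and each chunk stores its result in its own array slot. The class name PrimesParallel, the CountPrimes helper, and the chunking arithmetic are illustrative choices, not the author's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class PrimesParallel
{
    public static bool IsPrimeNumber(int number)
    {
        if (number < 2) return false;
        if (number % 2 == 0) return number == 2;
        for (int d = 3; (long)d * d <= number; d += 2)
        {
            if (number % d == 0) return false;
        }
        return true;
    }

    public static List<int> GetPrimeNumbers(int minimum, int count)
    {
        return Enumerable.Range(minimum, count).Where(IsPrimeNumber).ToList();
    }

    // Splits 1..limit into 'parts' chunks (parts must be at least 1),
    // processes the chunks in parallel, and sums the per-chunk counts.
    public static int CountPrimes(int limit, int parts)
    {
        var results = new List<int>[parts];
        int chunk = limit / parts;

        Parallel.For(0, parts, i =>
        {
            int start = i * chunk + 1;
            int end = (i == parts - 1) ? limit : (i + 1) * chunk;
            // Each iteration writes only to its own slot, so no lock is needed.
            results[i] = GetPrimeNumbers(start, end - start + 1);
        });

        return results.Sum(list => list.Count);
    }

    static void Main()
    {
        Console.WriteLine("Primes found: " + CountPrimes(10000000, 10));
    }
}
```

Because every iteration owns one element of the results array, the tasks share no mutable state and the only synchronization point is the end of Parallel.For.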
The processing is broken into 10 parts, and I've used Parallel.For to execute each part. At the end of the processing, the counts of the lists are summed and shown. This code is similar to the synchronous version. I analyze it with the VTune Amplifier and get the results in Figure 4.
Figure 4. VTune™ analysis for the parallel version of the Prime Numbers program
The program executes in 1 second, and all eight processors in my machine are used. I have the best of both worlds: efficient usage of resources and ease of use.
You could also create the tasks explicitly and use them in the program, with the Task class. You can create a new Task and use the Start method to start it, or use the more streamlined methods Task.Run and Task.Factory.StartNew, each of which creates and starts a task in a single call. You can create the same parallel program by using the Task class with a program like this one:
GitHub - Task program code sample
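The linked listing is not reproduced here. A minimal sketch of the explicit-task approach is shown below, assuming the same ten-way split as the Parallel.For version. The class name PrimesTasks and the CountPrimes helper are hypothetical, and Task.Run could be replaced with Task.Factory.StartNew.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class PrimesTasks
{
    public static bool IsPrimeNumber(int number)
    {
        if (number < 2) return false;
        if (number % 2 == 0) return number == 2;
        for (int d = 3; (long)d * d <= number; d += 2)
        {
            if (number % d == 0) return false;
        }
        return true;
    }

    public static List<int> GetPrimeNumbers(int minimum, int count)
    {
        return Enumerable.Range(minimum, count).Where(IsPrimeNumber).ToList();
    }

    // Creates one task per chunk of 1..limit (parts must be at least 1).
    public static int CountPrimes(int limit, int parts)
    {
        var tasks = new Task<List<int>>[parts];
        int chunk = limit / parts;

        for (int i = 0; i < parts; i++)
        {
            // Declared inside the loop so each lambda captures its own values.
            int start = i * chunk + 1;
            int end = (i == parts - 1) ? limit : (i + 1) * chunk;

            // Task.Run creates the task and starts it in one call.
            tasks[i] = Task.Run(() => GetPrimeNumbers(start, end - start + 1));
        }

        // Block until every task has finished, then read the results.
        Task.WaitAll(tasks);
        return tasks.Sum(task => task.Result.Count);
    }

    static void Main()
    {
        Console.WriteLine("Primes found: " + CountPrimes(10000000, 10));
    }
}
```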
Task.WaitAll waits for all tasks to finish; only then does it continue the execution. If you analyze the program with the VTune Amplifier, you get a result similar to the parallel version.
Parallel LINQ (PLINQ) is a parallel implementation of the LINQ query language. With PLINQ, you can transform your LINQ queries into parallel versions simply by using the AsParallel extension method. For example, a simple modification to the synchronous version improves the performance a great deal:
GitHub - PLINQ program code sample
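The linked listing is not reproduced here. Assuming the synchronous version is a LINQ query over Enumerable.Range, the PLINQ variant might look like the sketch below. The class name PrimesPlinq is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PrimesPlinq
{
    public static bool IsPrimeNumber(int number)
    {
        if (number < 2) return false;
        if (number % 2 == 0) return number == 2;
        for (int d = 3; (long)d * d <= number; d += 2)
        {
            if (number % d == 0) return false;
        }
        return true;
    }

    public static List<int> GetPrimeNumbers(int minimum, int count)
    {
        // AsParallel comes before Where, so the filter runs on several threads.
        return Enumerable.Range(minimum, count)
                         .AsParallel()
                         .Where(IsPrimeNumber)
                         .ToList();
    }

    static void Main()
    {
        List<int> primes = GetPrimeNumbers(2, 10000000 - 1);
        Console.WriteLine("Primes found: " + primes.Count);
    }
}
```

PLINQ does not preserve the source order unless you ask for it with AsOrdered, so this sketch relies only on the count of the result.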
Adding AsParallel to Enumerable.Range changes the sequential version to a parallel version of the query. If you run this version, you see a great improvement in the VTune Amplifier (Figure 5).
Figure 5. VTune™ analysis for the PLINQ version
With this simple change, the program runs in 1 second and uses all eight processors. However, there is a catch: the position of AsParallel interferes with the parallelism of the operation. If you change the line so that AsParallel is applied only after the filtering step that calls IsPrimeNumber, you won't see an improvement because the IsPrimeNumber method, which takes most of the processing time, won't be executed in parallel.
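To see the effect of the placement, you can time both variants side by side. The sketch below is an assumption about what the two queries look like: the class name PlinqPlacement and the helpers CountEarly and CountLate are hypothetical. Both return the same count, and only the elapsed time should differ.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class PlinqPlacement
{
    public static bool IsPrimeNumber(int number)
    {
        if (number < 2) return false;
        if (number % 2 == 0) return number == 2;
        for (int d = 3; (long)d * d <= number; d += 2)
        {
            if (number % d == 0) return false;
        }
        return true;
    }

    // AsParallel before the filter: IsPrimeNumber is part of the parallel query.
    public static int CountEarly(int minimum, int count)
    {
        return Enumerable.Range(minimum, count).AsParallel().Where(IsPrimeNumber).ToList().Count;
    }

    // AsParallel after the filter: Where is still an ordinary sequential LINQ
    // operator, and only the already-filtered items reach the parallel query.
    public static int CountLate(int minimum, int count)
    {
        return Enumerable.Range(minimum, count).Where(IsPrimeNumber).AsParallel().ToList().Count;
    }

    static void Main()
    {
        const int count = 10000000 - 1;

        var stopwatch = Stopwatch.StartNew();
        int early = CountEarly(2, count);
        Console.WriteLine("AsParallel first: {0} primes in {1} ms", early, stopwatch.ElapsedMilliseconds);

        stopwatch.Restart();
        int late = CountLate(2, count);
        Console.WriteLine("AsParallel last:  {0} primes in {1} ms", late, stopwatch.ElapsedMilliseconds);
    }
}
```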
C# version 5 introduced two new keywords: async and await. Although it doesn't seem like a lot, the addition is a huge improvement. These keywords are central to asynchronous processing in C#. When you use parallel processing, sometimes you need to twist the execution sequence completely. Async processing restores the sanity of your code.
When you use the async keyword, you can write code the same way you wrote synchronous code. The compiler takes care of all the complexity and frees you to do what you do best: writing the logic.
To write an async method, follow these guidelines:
- The method signature must have the async keyword.
- By convention, the method name should end with Async (this is not enforced, but it is a best practice).
- The method should return Task, Task<T>, or void.
To use this method, you should await its result (i.e., use the await keyword). When execution reaches an await on an operation that has not yet completed, the method is suspended and control returns to its caller, which is free to continue with other work. When the awaited operation completes, the method resumes at the statement after the await. The program to calculate the prime numbers with async becomes:
GitHub - Async program code sample
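The linked listing is not reproduced here. The sketch below is one possible shape for such a program. It uses the ProcessPrimesAsync name mentioned in the text, but the class name PrimesAsync, the GetPrimeNumbersAsync helper, and the use of Task.Run to move the work off the calling thread are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class PrimesAsync
{
    public static bool IsPrimeNumber(int number)
    {
        if (number < 2) return false;
        if (number % 2 == 0) return number == 2;
        for (int d = 3; (long)d * d <= number; d += 2)
        {
            if (number % d == 0) return false;
        }
        return true;
    }

    // Hypothetical helper: runs the CPU-bound query on a thread-pool thread.
    public static async Task<List<int>> GetPrimeNumbersAsync(int minimum, int count)
    {
        List<int> primes = await Task.Run(
            () => Enumerable.Range(minimum, count).Where(IsPrimeNumber).ToList());

        // Return the plain value; the compiler wraps it in a Task<List<int>>.
        return primes;
    }

    // Returns void so that a non-async Main can call it without awaiting.
    static async void ProcessPrimesAsync()
    {
        // 'primes' is a List<int>, not a Task<List<int>>: await unwraps the result.
        List<int> primes = await GetPrimeNumbersAsync(2, 10000000 - 1);
        Console.WriteLine("Primes found: " + primes.Count);
    }

    static void Main()
    {
        ProcessPrimesAsync();   // starts the work and returns at the first await
        Console.ReadLine();     // keeps the process alive until Enter is pressed
    }
}
```

An async void method gives its caller no Task to observe, so an exception thrown inside it cannot be caught by Main. Returning Task is usually the safer choice outside of event handlers.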
Notice that I have created a new method: ProcessPrimesAsync. When you use await in a method, that method must be marked as async, and in C# 5 Main cannot be marked as async (C# 7.1 and later allow an async Main). That's why I created this method, which returns void. When Main calls the method without the await keyword, it starts the method but doesn't wait for it to finish. For that reason, I've added Console.ReadLine (otherwise the program would end before the processing completes). The rest of the program is similar to the synchronous version.
Notice also that the primes variable is not a Task<List<int>> but a List<int>. This is a compiler trick so that I don't have to deal with Task to call an async method. With the await keyword, the compiler calls the method, frees resources until the method is complete, and, when the method returns, transforms the Task result into a normal result. When you call return in the method, you should not return Task<T> but the normal return value, as you would do in a synchronous method.
If you run this program, you will see that it doesn't run faster than the synchronous version because it has just one task. To make it run faster, you must create multiple tasks and synchronize them. You can do that with this change:
GitHub - Parallel async program code sample
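The linked listing is not reproduced here. A minimal sketch of the multi-task variant follows, reusing the primes and results variable names from the text. The class name PrimesParallelAsync, the CountPrimesAsync helper, and the chunking arithmetic are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class PrimesParallelAsync
{
    public static bool IsPrimeNumber(int number)
    {
        if (number < 2) return false;
        if (number % 2 == 0) return number == 2;
        for (int d = 3; (long)d * d <= number; d += 2)
        {
            if (number % d == 0) return false;
        }
        return true;
    }

    public static Task<List<int>> GetPrimeNumbersAsync(int minimum, int count)
    {
        return Task.Run(() => Enumerable.Range(minimum, count).Where(IsPrimeNumber).ToList());
    }

    // Starts one task per chunk of 1..limit (parts must be at least 1)
    // and awaits them all together.
    public static async Task<int> CountPrimesAsync(int limit, int parts)
    {
        var primes = new List<Task<List<int>>>();
        int chunk = limit / parts;

        for (int i = 0; i < parts; i++)
        {
            int start = i * chunk + 1;
            int end = (i == parts - 1) ? limit : (i + 1) * chunk;
            // No await here: the task starts and the loop moves on.
            primes.Add(GetPrimeNumbersAsync(start, end - start + 1));
        }

        // WhenAll completes when every task has completed.
        List<int>[] results = await Task.WhenAll(primes);
        return results.Sum(list => list.Count);
    }

    static async void ProcessPrimesAsync()
    {
        int total = await CountPrimesAsync(10000000, 10);
        Console.WriteLine("Primes found: " + total);
    }

    static void Main()
    {
        ProcessPrimesAsync();
        Console.ReadLine();   // keeps the process alive until Enter is pressed
    }
}
```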
With this new method, the program creates 10 tasks but doesn't wait for them. It awaits them in this line:
var results = await Task.WhenAll(primes);
The results variable is an array of List<int>: there are no tasks anymore. When I run this version in the VTune Amplifier, it shows that all tasks run in parallel (see Figure 6).
Figure 6. VTune™ analysis for the parallel Async version of the Prime Numbers program
The async and await keywords add a new twist to asynchronous processing in C#, but this change involves a lot more than what I've shown in this article. Other topics include task cancellation, exception handling, and task coordination.
There are many ways to create a parallel executing program in C#. With multicore processors, there's no excuse for creating single-threaded programs: you won't be using the system resources, and you'll penalize your users with unneeded delays.
The improvements in the C# language with the async keyword restore sequential ordering in the code while efficiently using system resources. There are still a few issues to keep in mind, like concurrency, task synchronization, and cancellation, but these are minor compared with what you needed to create a good parallel program. If you learn and apply the techniques I describe here and start to create parallel programs, you will make better use of system resources and have happier users.
About the Author
Bruno Sonnino is a Microsoft Most Valuable Professional (MVP) located in Brazil. He is a developer, consultant, and author who has written five Delphi books, published in Portuguese by Pearson Education Brazil, and many articles for Brazilian and American magazines and websites.