One approach for increasing your application performance in Lucee is to take advantage of multi-threading - this allows you to improve throughput and take full advantage of multi-core processors. In this part 1 we will take a look at the concept of "parallelism" and how it can work in Lucee. In part 2 we will cover a couple of potential pitfalls, and how correct concurrent code can solve them.
While the terms Concurrent and Parallel are often used interchangeably, they do not mean exactly the same thing. According to Brian Goetz [1], Java Language Architect at Oracle and author of the book “Java Concurrency in Practice”, concurrency describes the ability of a program to access shared resources correctly and efficiently, while parallelism describes its ability to utilize more resources in order to solve a problem faster. Writing correct concurrent code is difficult and error-prone; writing correct parallel code is comparatively simpler and safer.
Lucee makes parallelism simple with the Each() [2] built-in function and its "cousins" ArrayEach(), StructEach(), and QueryEach(), along with their respective member methods, all of which take the arguments parallel and maxThreads. The concept is simple: pass a collection and a closure to Each(), and the closure will be called on each of the elements in the collection. Pass true for parallel, and the closure will be called in parallel by multiple threads, which are joined at the end.
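To make the call shapes concrete, here is a minimal sketch (assuming elements is an array and process is a closure, as in the example that follows); the built-in function and member-method forms are equivalent:

```cfml
// built-in function form: Each(collection, closure [, parallel [, maxThreads]])
each(elements, process, true, 10);

// equivalent member-method form on an array
elements.each(process, true, 10);
```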
Let's take a simple example of a function that takes an element and processes it. For the sake of simplicity, we'll sleep for a random interval in order to simulate a slow operation like a file, network, or database call:
tc = getTickCount();

function process(element) {
    var calledAt = getTickCount() - tc;
    sleep(randRange(80, 120));    // simulate a slow process
    var completedAt = getTickCount() - tc;
    echo("<br>element: #arguments.element#; called-at: #calledAt#; completed-at: #completedAt#");
}
Now let's say that we have a collection with 20 elements that we want to process. Let's first build an array of elements:
elements = [];
for (i = 1; i <= 20; i++) {
    elements.append(i);
}
And now that we have that collection, let's call process() on each element using the built-in function Each():
each(elements, process);
That produces an output like so:
element: 1; called-at: 0; completed-at: 105
element: 2; called-at: 105; completed-at: 204
element: 3; called-at: 204; completed-at: 323
element: 4; called-at: 323; completed-at: 427
element: 5; called-at: 427; completed-at: 540
element: 6; called-at: 540; completed-at: 637
element: 7; called-at: 637; completed-at: 720
element: 8; called-at: 720; completed-at: 813
element: 9; called-at: 813; completed-at: 911
element: 10; called-at: 911; completed-at: 1017
element: 11; called-at: 1017; completed-at: 1102
element: 12; called-at: 1102; completed-at: 1184
element: 13; called-at: 1184; completed-at: 1295
element: 14; called-at: 1295; completed-at: 1377
element: 15; called-at: 1377; completed-at: 1490
element: 16; called-at: 1490; completed-at: 1598
element: 17; called-at: 1598; completed-at: 1711
element: 18; called-at: 1711; completed-at: 1795
element: 19; called-at: 1795; completed-at: 1895
element: 20; called-at: 1895; completed-at: 1992
Set completed in 1,992ms;
As you can see, each element was called immediately after the previous element completed: process() was called on element 2 at 105ms, which is exactly when the call on element 1 completed. The total time was just shy of 2 seconds, which makes sense since we called sleep() for about 100ms, 20 times in sequence.
So now let's use multiple threads to process these elements. After all, if the processing of element 1 is sleeping, or waiting on some external resource, then there's no reason to wait until it completes before starting to process element 2. We will now call Each() and pass true for the parallel argument, which uses the default value of 20 for maxThreads:
each(elements, process, true);
The output produced now is:
element: 1; called-at: 1; completed-at: 118
element: 2; called-at: 1; completed-at: 101
element: 3; called-at: 1; completed-at: 113
element: 4; called-at: 1; completed-at: 114
element: 5; called-at: 2; completed-at: 119
element: 6; called-at: 2; completed-at: 98
element: 7; called-at: 2; completed-at: 97
element: 8; called-at: 2; completed-at: 98
element: 9; called-at: 2; completed-at: 94
element: 10; called-at: 2; completed-at: 87
element: 11; called-at: 2; completed-at: 105
element: 12; called-at: 2; completed-at: 117
element: 13; called-at: 2; completed-at: 108
element: 14; called-at: 2; completed-at: 115
element: 15; called-at: 2; completed-at: 93
element: 16; called-at: 3; completed-at: 111
element: 17; called-at: 3; completed-at: 108
element: 18; called-at: 3; completed-at: 115
element: 19; called-at: 3; completed-at: 107
element: 20; called-at: 3; completed-at: 105
Set completed in 119ms;
This time, the processing of element 2 did not wait for element 1 to complete; you can see that all elements were called at about the same time. In fact, since we used 20 threads for 20 elements, no element waited for a previous one. Therefore, the whole operation completed in 119ms, which was the longest single processing time (element 5 in this case). Our code ran almost 20 times faster! (It was actually about 16 times, but since the sleep times are random, the two executions cannot be compared exactly.)
But sometimes you can't just use 20 threads. For example, if you try to send 20 emails at the same time through a server that is configured to accept only 10 concurrent connections per user, the first 10 emails will be processed successfully, but the rest will be rejected until emails from the first batch complete and connections become available.
In those cases, you have to cap the number of threads by passing the maxThreads argument. To avoid opening too many connections at the same time to our theoretical email server, let's call Each() with a value of 10 for maxThreads:
each(elements, process, true, 10);
And see what happens now:
element: 1; called-at: 1; completed-at: 89
element: 2; called-at: 1; completed-at: 109
element: 3; called-at: 1; completed-at: 91
element: 4; called-at: 1; completed-at: 108
element: 5; called-at: 1; completed-at: 91
element: 6; called-at: 1; completed-at: 106
element: 7; called-at: 2; completed-at: 112
element: 8; called-at: 2; completed-at: 90
element: 9; called-at: 2; completed-at: 93
element: 10; called-at: 2; completed-at: 97
element: 11; called-at: 89; completed-at: 193
element: 12; called-at: 90; completed-at: 170
element: 13; called-at: 91; completed-at: 200
element: 14; called-at: 91; completed-at: 197
element: 15; called-at: 93; completed-at: 183
element: 16; called-at: 97; completed-at: 214
element: 17; called-at: 106; completed-at: 194
element: 18; called-at: 108; completed-at: 217
element: 19; called-at: 109; completed-at: 225
element: 20; called-at: 112; completed-at: 200
Set completed in 225ms;
Elements 1 through 10 were called at about the same time (1 or 2 milliseconds after execution started), while elements 11 through 20 were called after each call to process() completed and a thread became available. So element 11 started processing at 89ms, just after element 1 completed; element 12 started at 90ms, just after element 8 completed; and so on.
The whole operation completed in 225ms, which is about 9 times faster than the sequential run (1,992ms) and close to the theoretical 10x for 10 threads, which is exactly what we were hoping for.
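As a back-of-the-envelope check (an illustration, not part of the original timing code): with n elements, k threads, and roughly t milliseconds per call, the parallel run takes about ceiling(n / k) batches of t milliseconds each.

```cfml
n = 20;   // number of elements
k = 10;   // maxThreads
t = 100;  // average per-element time in ms
batches  = ceiling(n / k);   // 2 batches of 10 elements
estimate = batches * t;      // ~200ms, in line with the 225ms measured above
```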
This example did not use any shared resources between the threads, so it is inherently thread-safe and falls under the parallelism paradigm. In part 2, the companion blog post, we look at concurrency and see why it is important to coordinate access to shared objects when multiple threads are involved.
[1] https://www.ibm.com/developerworks/library/j-java-streams-4-brian-goetz/index.html
[2] http://docs.lucee.org/reference/functions/each.html