When do we need to implement parallel processing?
After the previous post, you might have wondered how splitting a process
improves performance. If I have only one update statement and I already use
set-based processing for it, where does the gain really come from? Running a
process in parallel is not always a good idea, and sometimes it makes
performance worse. In the example I stated above, sending 5 SQL statements
instead of one is extra overhead for the server. So when do I need to make the
process parallel? Below are some scenarios in which you can consider
introducing parallel processing.
1.
My process updates or creates millions of rows in a single run.
2.
Multiple users may run my process at the same time for the same transaction.
This can happen if the same process is available in both batch and online
mode. If one person runs it in batch mode and another runs it online for the
same transaction, one of my processes may error out or update the tables with
wrong data. Two users may also run the same process with the same run control
parameters at the same time.
3.
The transaction data to be processed is spread across multiple tables, and I
do the processing by importing the relevant data into an intermediate
(temporary) table. This applies to almost every real process that does bulk
processing. The data required for processing may be scattered across different
tables, so I need to query each table, select the relevant data, put it into a
common temporary table, and do the processing from there. In the salary
example, this scenario comes up if you are increasing the salary of your
employees based on different rules, such as (a) the percentage increase depends
on your designation, (b) the percentage increase depends on your experience,
(c) the percentage increase depends on your performance rating, and so on.
4.
You are doing row-by-row processing in your Application Engine program. There
can be scenarios where you cannot do all your processing in a set-based
manner. In such cases implementing parallel processing is the best option.
Since the processing time grows with the number of rows, the more rows you
have to process, the more time it is going to take. So divide your data into
logical sets and run the process in parallel. Each process instance then
handles fewer rows, so the processing time is reduced as well.
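
To make the last scenario concrete, here is a minimal sketch in Python of
dividing rows into logical sets and processing each set at the same time. It is
only an illustration, not Application Engine code: the in-memory employees
list, the emplid and salary fields, the flat 10% raise, and the function names
partition, process_partition and run_parallel are all assumptions made up for
this example. In a real system each set would be handled by its own process
instance working against the database; the threads here only stand in for that.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the rows the process has to handle.
employees = [{"emplid": i, "salary": 1000.0 + i} for i in range(1, 21)]


def partition(rows, n_parts):
    """Split rows into n_parts logical sets using emplid modulo n_parts."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[row["emplid"] % n_parts].append(row)
    return parts


def process_partition(rows):
    """Row-by-row processing of one set: apply an assumed 10% raise."""
    processed = []
    for row in rows:
        new_salary = round(row["salary"] * 1.10, 2)
        processed.append({"emplid": row["emplid"], "salary": new_salary})
    return processed


def run_parallel(rows, n_parts=4):
    """Process every set concurrently, then merge the results."""
    parts = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(process_partition, parts))
    # Flatten the per-set results and restore a stable order by emplid.
    merged = [row for part in results for row in part]
    return sorted(merged, key=lambda row: row["emplid"])


updated = run_parallel(employees)
```

Each row lands in exactly one set, so no two workers touch the same employee,
and each worker only loops over roughly a quarter of the rows. How you choose
the split key (employee ID, department, business unit and so on) depends on
your data; the modulo split above is just the simplest choice for the sketch.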