SPEEDES 2.2
Metron and Northrop Grumman are pleased
to announce the release of SPEEDES version 2.2, a significant advance
in parallel discrete event simulation. This release is the
culmination of two years of minor and major enhancements and of
lessons learned on the Missile Defense Wargame Analysis Resource
(MDWAR) project, on which SPEEDES has supported real-time simulations
on 1 to 48 processors.
The enhancements, bug fixes, and changes
were funded under a variety of contracts, including MDWAR and
International Programs (IP) at the Joint National Integration Center
(JNIC), and the Air Force Research Lab (AFRL) in Rome, NY. The work to
test and release version 2.2 was funded by Northrop Grumman
Corporation.
What's new
- Rollbackable Standard Template Library (STL) containers. SPEEDES now
provides rollbackable implementations of several STL containers:
  - RB_map
  - RB_multimap
  - RB_vector
  - RB_string
  - RB_set
  - RB_multiset
  - RB_list
  - RB_deque
Each of these containers is fully rollbackable (every method on every
container is rollbackable) and supports all the standard STL traits,
such as iterator, size_type, and value_type. This means all the
containers and their iterators can be passed as arguments to the
optimized STL algorithms supplied with your compiler.
These containers can also be instantiated with rollbackable classes.
That is, you can now use RB_list<RB_int> instead of having to store
pointers and manage your own memory. In addition, each container
provides an additional trait, RB_iterator, which is a rollbackable
iterator for that container.
- Significant performance enhancements. Over the last few years,
SPEEDES has undergone many changes to improve the performance of all
simulations. The improvement will naturally vary, but will likely be
in the range of 15-20% for a substantial simulation.
SPEEDES also supports a new time management algorithm, SequentialLite,
designed for simulations that run on a single processor and do not
require external interaction, tracing, or event or object performance
statistics. Benchmarks detailing the performance improvements are
provided at the end of this document.
- RB_ostream is now a true stream. The original implementation of
RB_ostream (RB_cout, RB_cerr, and RB_fstream) used macros. While
serviceable, this did not provide the full power of C++ streams.
RB_ostreams are now true C++ streams and fully support C++ stream
features such as all stream manipulators. RB_ostreams also now support
a synchronized mode in which all output is sent to node zero and
sorted in order.
- Support for modern compilers. SPEEDES now uses the standard headers
(iostream rather than iostream.h) and refers to all applicable
standard library classes and functions through the std namespace.
Unfortunately, this means some older compilers are no longer
supported, including the SGI IRIX 7.2.1 compiler.
This version of SPEEDES was formally
tested on Linux using g++ versions 2.96 and 3.4.3 and a snapshot of
the future 4.1 compiler (4.0 is due out in 'early 2005'). Informal
testing was performed on Macintosh OS X with g++ 3.3, Linux with g++
3.3, and SGI Altix with Intel icc 8.0. SGI IRIX with version 7.4 or
later of the compiler is also expected to work.
This also means you may encounter
compilation problems if your code does not use the std namespace. As
usual, we have attempted to ensure that SPEEDES headers include only
what they minimally need, so your code may require additional
#include directives in order to compile with the new SPEEDES.
- Alpha implementation of a shared memory host router. This feature
has had only limited testing, but it allows external modules to
connect directly to SPEEDES executables through shared memory rather
than TCP/IP, which could yield significant performance improvements.
- Alpha implementation of clustering functionality. This
functionality allows separate SPEEDES executables to connect
optimistically, with events and object proxies sent across clusters.
This initial implementation supports only TCP/IP connections between
clusters; future implementations should support shared memory
connections.
- Many, many minor enhancements and bug fixes. For a full list of
changes, please request a copy of the Version Description Document,
which details every change to the source code along with the reason
for it.
Benchmarks
Benchmark results for SPEEDES version 2.2 versus SPEEDES version 2.1
are given below. All tests were executed on a dual 2.5 GHz Power
Macintosh. The GNU g++ 3.3 compiler was used, with static libraries
for all executables. Five runs were made for each test; the highest
and lowest real times were discarded and the remaining three were
averaged. All times are in seconds. Unless specified otherwise, all
data collection for simulation object processing time, event
processing time, and critical path calculation was disabled.
The one test of the sequential algorithm uses the new SequentialLite
algorithm for SPEEDES 2.2 and the old sequential algorithm for
SPEEDES 2.1.
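The averaging procedure above is a trimmed mean. A minimal sketch, assuming each test's raw run times are available as a list (the function name trimmed_mean is ours, not part of any SPEEDES tool):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Average of the run times after discarding the single highest and single
// lowest value; with five runs, the middle three are averaged.
double trimmed_mean(std::vector<double> times) {
    assert(times.size() >= 3);  // something must remain after trimming
    std::sort(times.begin(), times.end());
    double sum = std::accumulate(times.begin() + 1, times.end() - 1, 0.0);
    return sum / static_cast<double>(times.size() - 2);
}
```

Taking the vector by value lets the function sort its own copy without disturbing the caller's data.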
| Test | Nodes | SPEEDES 2.1 (s) | SPEEDES 2.2 (s) | Percent faster |
|---|---|---|---|---|
| One object, self-scheduling event, 3,000,000 events | 1 | 6.89 | 2.67 | 158 |
| One object, self-scheduling event, object and event statistics collected, 3,000,000 events | 1 | 22.63 | 17.23 | 31 |
| Ping pong, two players, 1,000,000 volleys | 1 | 6.31 | 3.52 | 79 |
| Ping pong, two players, 1,000,000 volleys, sequential algorithm | 1 | 5.05 | 1.85 | 173 |
| Ping pong, two players, 1,000,000 volleys | 2 | 25.27 | 23.05 | 10 |
| Ping pong, 1,000 players, 1,000 volleys | 1 | 8.22 | 5.54 | 48 |
| Ping pong, 1,000 players, 1,000 volleys | 2 | 7.15 | 5.35 | 34 |
| Process model wait, one object, 3,000,000 events | 1 | 7.16 | 2.97 | 140 |
| Object proxy changing, one subscriber, one publisher | 1 | 28.48 | 19.48 | 46 |
| Object proxy changing, one subscriber, one publisher | 2 | 38.43 | 25.82 | 49 |
| Logical semaphores | 1 | 2.33 | 1.76 | 32 |
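The Percent Faster column is not defined explicitly. Checking the rows, it appears to be the 2.1 time divided by the 2.2 time, minus one, times 100; this reproduces every row to within about one point. The helper below (a hypothetical name) encodes that reading.

```cpp
#include <cassert>
#include <cmath>

// Percent faster = (old time / new time - 1) * 100.
// A value of 158 means version 2.2 ran about 2.6 times as fast as 2.1.
// Round the result with std::lround (from <cmath>) to compare with the table.
double percent_faster(double old_seconds, double new_seconds) {
    assert(new_seconds > 0.0);
    return (old_seconds / new_seconds - 1.0) * 100.0;
}
```

For the first row, (6.89 / 2.67 - 1) x 100 is about 158.05, which rounds to the 158 shown.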
Many will ask how much these improvements will affect their own
simulations. In general, we believe that a typical simulation, which
does substantially more real work than these event-processing
benchmarks, will see an improvement of 10-20% in execution speed.