its done with efficiency.
The XKS source seems to be some kind of mangled-C++, just like
a lot of C/C++-based languages exist for big/parallel
data processing (CUDA or other parallelizing extensions).
Given that, DB is obviously some kind of nested std::map or
apparently of a derived type, as can be seen by the apply()
member which is not part of a STL map.
Its probably not a multimap either, as denoted by the clear()
and in that [][] assignments are not possible with multimaps [1].
These types (as well as a multimap) are sorted associative
containers (dictionaries) who's lookup complexity is guaranteed
to be O(log(N)) at worst [2], where N denotes the number
of keys in the map. DB has at least 3 keys as seen from the
snippet, but chances are that the number is much larger.The
larger it is, the more need is for optimizing the map access.
I doubt that XKS has their own implementation of dictionaries
that have a better O() and are optimized in a way that
DB["tor_onion_survey"]["onion_count"]
access could be O(1). After all (look at the boost include), it
looks pretty much like STL-C++ code.
Given that, inside a loop the following XKS code is rather
inefficient:
for (values_t::const_iterator iter = VALUES.begin();
iter != VALUES.end();
++iter) {
DB["tor_onion_survey"]["onion_address"] = iter->address() + ".onion";
if (iter->has_scheme())
DB["tor_onion_survey"]["onion_scheme"] = iter->scheme();
if (iter->has_port())
DB["tor_onion_survey"]["onion_port"] = iter->port();
DB["tor_onion_survey"]["onion_count"] = boost::lexical_cast
DB.apply();
DB.clear();
}
return true;
because inside the loop, the STL's find-routine walks the DB map
4 times until it gets to DB["tor_onion_survey"]. Since the first
key tor_onion_survey is static, it would be much better to keep
a cached iterator to save the lookup time in each cycle.
Additionally, the find for the second key again has O(log(N)),
where N seems to be 4 (onion_address, onion_scheme, onion_port
and onion_count).
The loop should rather be organized like this:
auto cit = DB_fast.begin();
pair < string, map < string, string > > sm;
sm.first = "tor_onion_survey";
for (...) {
sm.second["onion_address"] = iter->address() + ".onion";
if (iter->has_scheme())
sm.second["onion_scheme"] = iter->scheme();
if (iter->has_port())
sm.second["onion_port"] = iter->port();
sm.second["onion_count"] = boost::lexical_cast(TOTAL_VALUE_COUNT);
DB_fast.insert(cit, sm);
DB_fast.apply();
DB_fast.clear();
}
The full speedup-demo with comparison of both methods can be
found here. The average speedup in my tests are about 30% which
can save a lot of tax payers money if the agency scales
their XKS horizontally. The speedup here is the O(1) access
via the pair<>, compared to the O(log(N)) access in the original
code via a map<>. And thats for a DB map that
just has N=1 (tor_onion_survey). In reality N should be much
larger.
Nevertheless C++ is a good choice for XKS for various reasons
and they seem to be learning-by-doing just like any other
coder out there.
Edit: Meanwhile I found another reason to avoid operator[]
for assignments in a row inside one of Scott Meyers excellent
books on C++ effectiveness [3] which I really recommend reading
to any XKS developers (there are also classes for it).
Edit: Meanwhile I found another reason to avoid operator[]
for assignments in a row inside one of Scott Meyers excellent
books on C++ effectiveness [3] which I really recommend reading
to any XKS developers (there are also classes for it).
[1] The clear() is important for our later optimization, as
insert() has the same semantics like operator[] assignment
only if the key doesn't already exist - otherwise the
assignment-step after finding the key won't happen with
insert().
insert() has the same semantics like operator[] assignment
only if the key doesn't already exist - otherwise the
assignment-step after finding the key won't happen with
insert().
[2] Generic Programming and the STL, using and extending the C++
Standard Template Library
Matthew H. Austern, Addison Wesley, 1999,
p.159f
[3] Effective STL, 50 Specific Ways to Improve Your Use of the
Standard Template Library
Scott Meyers, Addison Wesley, 2001,
Item 24, p.106ff
No comments:
Post a Comment