Monday, July 7, 2014

XKS speedup trickery

Lets have a look on how our traffic is XKey-scored and whether
its done with efficiency.

The XKS source seems to be some kind of mangled-C++, just like
a lot of C/C++-based languages exist for big/parallel
data processing (CUDA or other parallelizing extensions).

Given that, DB is obviously some kind of nested std::map or 
apparently of a derived type, as can be seen by the apply() 
member which is not part of a STL map.
Its probably not a multimap either, as denoted by the clear()
and in that [][] assignments are not possible with multimaps [1].

These types (as well as a multimap) are sorted associative
containers (dictionaries) who's lookup complexity is guaranteed
to be O(log(N)) at worst [2], where N denotes the number
of keys in the map. DB has at least 3 keys as seen from the 
snippet, but chances are that the number is much larger.The 
larger it is, the more need is for optimizing the map access.
I doubt that XKS has their own implementation of dictionaries
that have a better O() and are optimized in a way that
access could be O(1). After all (look at the boost include), it
looks pretty much like STL-C++ code.

Given that, inside a loop the following XKS code is rather

for (values_t::const_iterator iter = VALUES.begin();
          iter != VALUES.end();
          ++iter) {
        DB["tor_onion_survey"]["onion_address"] = iter->address() + ".onion";
        if (iter->has_scheme())
          DB["tor_onion_survey"]["onion_scheme"] = iter->scheme();
        if (iter->has_port())
          DB["tor_onion_survey"]["onion_port"] = iter->port();
        DB["tor_onion_survey"]["onion_count"] = boost::lexical_cast(TOTAL_VALUE_COUNT);
      return true;

because inside the loop, the STL's find-routine walks the DB map
4 times until it gets to DB["tor_onion_survey"]. Since the first 
key tor_onion_survey is static, it would be much better to keep
cached iterator to save the lookup time in each cycle.
Additionally, the find for the second key again has O(log(N)),
where N seems to be 4 (onion_address, onion_scheme, onion_port 
and onion_count).

The loop should rather be organized like this:

      auto cit = DB_fast.begin();
        pair < string, map < string, string > > sm;
        sm.first = "tor_onion_survey";
        for (...) {
                sm.second["onion_address"] = iter->address() + ".onion";
                if (iter->has_scheme())
                  sm.second["onion_scheme"] = iter->scheme();
                if (iter->has_port())
                  sm.second["onion_port"] = iter->port();
                sm.second["onion_count"] = boost::lexical_cast(TOTAL_VALUE_COUNT);
                DB_fast.insert(cit, sm);

The full speedup-demo with comparison of both methods can be
found here. The average speedup in my tests are about 30% which
can save a lot of tax payers money if the agency scales
their XKS horizontally. The speedup here is the O(1) access
via the pair<>, compared to the O(log(N)) access in the original
code via a map<>. And thats for a DB map that
just has N=1 (tor_onion_survey). In reality N should be much

Nevertheless C++ is a good choice for XKS for various reasons
and they seem to be learning-by-doing just like any other
coder out there.

Edit: Meanwhile I found another reason to avoid operator[]
for assignments in a row inside one of Scott Meyers excellent
books on C++ effectiveness [3] which I really recommend reading
to any XKS developers (there are also classes for it).

[1] The clear() is important for our later optimization, as
    insert() has the same semantics like operator[] assignment
    only if the key doesn't already exist - otherwise the
    assignment-step after finding the key won't happen with

[2] Generic Programming and the STL, using and extending the C++
    Standard Template Library
    Matthew H. Austern, Addison Wesley, 1999,

[3] Effective STL, 50 Specific Ways to Improve Your Use of the
    Standard Template Library
    Scott Meyers, Addison Wesley, 2001,
    Item 24, p.106ff