Wednesday, September 30, 2020

More greppin speedup trickery

I learned about SIMD-based Hyperscan regex scanning libs being
super fast, so I refactored grab a bit to make it possible
to load different regex engines at runtime for speed comparison.
I was also told about a similar, quite popular project and
compared it to my greppin branch. Enjoy!
Still need to check whether and how it would be possible to
vectorize the matching on files to fully exploit SIMD. Will
keep you updated!

Update: I checked the code of hs_scan_vector() and it's just
iterating over the scatter array and calling internal scan
functions on each block. I thought it could be using SIMD across
the blocks too, but I was stupid. So, no more speedup on that front.
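
To make the "load different regex engines at runtime" idea a bit more concrete, here is a minimal sketch in Python. It is not grab's actual code: the registry `ENGINES`, the function `load_engine()` and both engine factories are hypothetical names made up for illustration, and the two engines shown (Python's `re` module and a plain substring search) only stand in for real backends such as Hyperscan.

```python
import re
from typing import Callable, Dict

# A matcher takes a chunk of bytes and reports whether the pattern occurs in it.
Matcher = Callable[[bytes], bool]


def make_re_engine(pattern: bytes) -> Matcher:
    """Engine backed by Python's built-in re module."""
    compiled = re.compile(pattern)
    return lambda data: compiled.search(data) is not None


def make_literal_engine(pattern: bytes) -> Matcher:
    """Engine that treats the pattern as a plain substring (no regex syntax)."""
    return lambda data: pattern in data


# Hypothetical registry: engine name -> factory that compiles a pattern.
ENGINES: Dict[str, Callable[[bytes], Matcher]] = {
    "re": make_re_engine,
    "literal": make_literal_engine,
}


def load_engine(name: str, pattern: bytes) -> Matcher:
    """Pick an engine by name at runtime, e.g. from a command line switch."""
    try:
        factory = ENGINES[name]
    except KeyError:
        raise ValueError(f"unknown engine: {name!r}") from None
    return factory(pattern)
```

Because every engine sits behind the same `Matcher` signature, the scanning loop stays identical and only the engine name changes between timed runs, for example `load_engine("re", rb"fo+bar")` versus `load_engine("literal", rb"foobar")`.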

While digging into that topic, I noticed that apparently quite
a lot of NIDS technology still relies on regexes in 2020 (lol).

Monday, September 21, 2020

grep speedup trickery

I polished my parallel grep version. When I started it in 2012, multicore + SSD setups were not that common. Today, a lot of storage is on flash or SSD, so you can benefit from parallel grepping by a factor of 3 or more (depending on the number of CPU cores). Just check out the link; it also contains some timed runs to back up these statements. I also noticed that my previous git signing key expired, so I will need to re-sign the repos with my new GPG key (already uploaded) over time.
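
For readers who want to see the basic shape of parallel grepping, here is a minimal sketch in Python. It is not the code from the repo: `grep_file()` and `parallel_grep()` are hypothetical names, and the worker count of 4 is an arbitrary default. Note that CPython threads only overlap the file reads, since `re` matching holds the interpreter lock, so this shows the structure (one file per task, many tasks in flight) rather than the speedup of a native implementation.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple

Hit = Tuple[str, int, bytes]  # (path, line number, matching line)


def grep_file(path: str, pattern: re.Pattern) -> List[Hit]:
    """Return every matching line of a single file."""
    hits: List[Hit] = []
    try:
        with open(path, "rb") as fh:
            for lineno, line in enumerate(fh, start=1):
                if pattern.search(line):
                    hits.append((path, lineno, line.rstrip(b"\n")))
    except OSError:
        pass  # unreadable or vanished file: skip it
    return hits


def parallel_grep(paths: Iterable[str], regex: bytes, workers: int = 4) -> List[Hit]:
    """Search many files concurrently, one file per task."""
    pattern = re.compile(regex)
    results: List[Hit] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so output stays deterministic.
        for hits in pool.map(lambda p: grep_file(p, pattern), paths):
            results.extend(hits)
    return results
```

Calling `parallel_grep(["a.log", "b.log"], rb"error", workers=8)` returns the hits grouped by file in the order the paths were given.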


Update: I added a new branch to the repo that doubles the speed again with a dedicated nftw() + readdir() implementation that's parallelized and recursive at the same time! If you enjoy brainfucks, give it a try!
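
A rough idea of what "parallelized and recursive at the same time" can mean, sketched in Python: every directory becomes its own task, and each task queues its subdirectories as new tasks. This is not the implementation from the branch. `parallel_walk()` is a hypothetical name, `os.scandir()` stands in for the nftw() + readdir() combination, and the pending counter is just one assumed way to detect when the whole tree is done.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


def parallel_walk(root: str, workers: int = 4) -> List[str]:
    """Collect all regular files below root, scanning directories in parallel."""
    files: List[str] = []
    lock = threading.Lock()
    pending = 1                 # directories queued or being scanned (root)
    done = threading.Event()

    with ThreadPoolExecutor(max_workers=workers) as pool:

        def scan(directory: str) -> None:
            nonlocal pending
            found, subdirs = [], []
            try:
                with os.scandir(directory) as entries:
                    for entry in entries:
                        if entry.is_dir(follow_symlinks=False):
                            subdirs.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            found.append(entry.path)
            except OSError:
                pass  # unreadable directory: treat it as empty
            with lock:
                files.extend(found)
                # This directory is finished, its subdirectories are now pending.
                pending += len(subdirs) - 1
                finished = pending == 0
            for sub in subdirs:
                pool.submit(scan, sub)  # recursion expressed as new tasks
            if finished:
                done.set()

        pool.submit(scan, root)
        done.wait()  # workers never block on their children, so no deadlock
    return sorted(files)
```

The workers never wait for their own subtasks, which is what keeps a fixed-size pool from deadlocking on deep trees. In a grep setting, each found file would be handed to a matcher instead of being collected in a list.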