But hey, dumping to a byte string is only half of the process. My mentor suggested extending the performance analysis to include writing the blob to disk.
So I did.
Writing to disk could be as easy as a call to open, plus a few lines of code.
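Something like this minimal sketch, where the file name and the blob contents are just placeholders for illustration:

```python
# Writing a serialized blob to disk: one open() call and a few lines.
# The blob here stands in for the real protobuf-serialized flow.
blob = b"\x0a\x04mitm"

with open("flow.bin", "wb") as f:
    f.write(blob)

# Reading it back is just as short.
with open("flow.bin", "rb") as f:
    assert f.read() == blob
```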
Still, to make this testing process meaningful, I want my system to be as close as possible to the target I have to implement during this summer. The whole reason behind the serialization revamp is, basically, changing how mitmproxy stores and retrieves flows.
It should be dynamic. Flows should be stored to disk, retrieved by index, ordered in bunches, possibly through user-defined filters. And all of this will happen interactively, in a transparent and flexible way. Plain file handles just don’t click.
Database systems come to the rescue. Using a DBMS, I can easily implement all the functionalities I listed just before. And since I will just store blobs, along with some utility columns, SQLite seems like the right choice. In particular, quoting from sqlite.org:
SQLite does not compete with client/server databases. SQLite competes with fopen().
```sql
(MID INTEGER PRIMARY KEY, PBUF_BLOB BLOB)
```
This is it. I suppose that every piece of work, in this phase of coding, gets a name starting with Dummy. Take it as a contract, an insurance between me and you: further down the road, things will be a bit more complete :)
As you can see, there’s not much! The only thing I need to build a functioning system here is…storing the blobs and marking them with a good ol’ numeric index!
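In Python, bringing up such a table is just a couple of sqlite3 calls. A minimal sketch follows; the table name `DUMMY` is my guess from the naming convention above, so treat it as illustrative:

```python
import sqlite3

# Barebone schema: an integer primary key plus a blob column.
# The table name "DUMMY" is assumed from the naming convention above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE DUMMY (MID INTEGER PRIMARY KEY, PBUF_BLOB BLOB)")
con.commit()
```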
With the sqlite3 API ready, and a barebone schema, let’s connect our storage with the previously implemented protobuf serialization. This is practically how `store` and `collect` interact with our DB:
- `store` takes a blob, appends it after the current maximum `mid`, and inserts the tuple into the DB. It returns that `mid` to the application, which can then use it as a ticket.
- `collect` takes that ticket `mid`, which is used to retrieve the blob from the DB.
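The two operations can be sketched roughly like this, assuming the barebone schema above (the names `store`, `collect`, and `DUMMY` mirror my description here, not mitmproxy’s actual code):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE DUMMY (MID INTEGER PRIMARY KEY, PBUF_BLOB BLOB)")

def store(blob: bytes) -> int:
    # Let SQLite assign the next MID; return it as a "ticket".
    with con:
        cur = con.execute("INSERT INTO DUMMY (PBUF_BLOB) VALUES (?)", (blob,))
    return cur.lastrowid

def collect(mid: int) -> bytes:
    # Exchange the ticket for the stored blob.
    row = con.execute(
        "SELECT PBUF_BLOB FROM DUMMY WHERE MID = ?", (mid,)
    ).fetchone()
    return row[0]

ticket = store(b"serialized flow bytes")
assert collect(ticket) == b"serialized flow bytes"
```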
Dumping the same 4 MB body as before, now including the DB insert, yields these results:
0.05 seconds for a single flow is far from what we should obtain.
But something is worth noting: while insertion into the DB takes much more time than a tnetstring dump to file, read performance is still superior. That suggests something about how the DBMS handles updates to the database: the loss in performance, as I pointed out in the GitHub PR discussion, is likely caused by the isolated transaction commit implicit in every `with sqlite3.connect()` context.
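One way to check that hypothesis is to compare per-insert commits against a single batched transaction. This is just a hedged sketch of such a micro-benchmark, not a measurement from the PR; row count and blob size are arbitrary:

```python
import os
import sqlite3
import tempfile
import time

def timed_inserts(batched: bool, n: int = 200) -> float:
    # Use an on-disk DB so commit costs (fsync) are actually visible.
    path = os.path.join(tempfile.mkdtemp(), "bench.db")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE DUMMY (MID INTEGER PRIMARY KEY, PBUF_BLOB BLOB)")
    con.commit()
    blob = b"x" * 1024
    start = time.perf_counter()
    if batched:
        # One transaction, one commit for all the rows.
        with con:
            for _ in range(n):
                con.execute("INSERT INTO DUMMY (PBUF_BLOB) VALUES (?)", (blob,))
    else:
        # One commit per insert, like an isolated transaction each time.
        for _ in range(n):
            with con:
                con.execute("INSERT INTO DUMMY (PBUF_BLOB) VALUES (?)", (blob,))
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed

print(f"per-insert commits: {timed_inserts(False):.4f}s")
print(f"single transaction: {timed_inserts(True):.4f}s")
```

If the commit-per-insert hypothesis is right, the batched version should come out noticeably faster on disk.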
The way I am approaching this “testing” period is truly helping me shape ideas about how I will implement all the rest. The next steps will be:
Til next week! Enjoy :)