Introduction

vProto is an extended regular-expression notation for specifying a finite state machine: its states, behaviors, and transition conditions. The description is used to generate C++ or Rust code for parsing data.

A state machine is described with tokens. Tokens written on a single line are traversed sequentially: once one completes, parsing moves to the next. Branching occurs when the next line is indented one level deeper; lines at the same depth are alternative branches. A line may end with the \ symbol, which marks a continuation of the line (possibly with branching); when the nested branching terminates, this continuation is used for the next transition.

Example:
    token_1_1 token_1_2 token_1_3
        token_2_1 token_2_2 \
            token_3_1 token_3_2
            token_4_1 token_4_2
        token_5_1 token_5_2
        token_6_1 token_6_2
    token_7_1 token_7_2 token_7_3

To view the resulting state machine, open Example, press 'Generate', and scroll down to the state machine at the bottom.
Tokens written on the same line are executed sequentially, one after another (token_1_1 → token_1_2 → token_1_3).
When the next line is one level deeper (one more tab), branching occurs between the tokens that are at the same depth.

During branching, the branch is selected greedily (the first suitable branch wins), unless parallel states < > are specified.

In the example above, token_1_1 and token_7_1 are at the same depth (branch1), so branching occurs between them. Assume the transition goes to token_1_1: then token_1_1 -> token_1_2 -> token_1_3 are executed sequentially, since they are on the same line. After token_1_3, branching occurs between token_2_1, token_5_1, and token_6_1 (they are at the same depth, branch2), with priority from top to bottom. It is no longer possible to return to the earlier branching between token_1_1 and token_7_1, because there is no \ symbol after token_1_3.

Now assume the transition goes to the branch token_2_1 -> token_2_2. After token_2_2, branching occurs between token_3_1 and token_4_1. Once they complete, parsing returns to the branching between token_2_1, token_5_1, and token_6_1, because token_2_2 is followed by a \ symbol. It does not return to the higher level, because token_1_3 is not followed by one.

In short, the \ symbol indicates a continuation of the line (with possible branching) and always requires a transition to a lower level (one more Tab character).

The successful passage of a token (and consequently, transition to the next) depends on the incoming data. Sometimes tokens can be optional or have certain requirements for their repetition. It's possible to modify token traversal properties by adding additional options using parentheses, for example: (min=5, max=10, init=100).

Shorter forms can also be used: ? (equivalent to min=0, max=1), * (equivalent to min=0, max=infinity), + (equivalent to min=1, max=infinity).
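The shorthand forms can be read as plain min/max bounds. The following is a minimal C++ sketch of that mapping, not vProto's generated code: the Repeat struct, kInfinity, and both function names are hypothetical, and the exactly-once default for a token without a suffix is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical holder for a token's repetition bounds; not part of vProto's generated code.
struct Repeat {
    std::size_t min;
    std::size_t max;
};

// Stand-in for "infinity" in the max bound.
const std::size_t kInfinity = std::numeric_limits<std::size_t>::max();

// Maps a shorthand suffix to the bounds listed in the text.
// Assumption: a token without a suffix must match exactly once (min=1, max=1).
inline Repeat repeatFromShorthand(char suffix) {
    switch (suffix) {
        case '?': return Repeat{0, 1};
        case '*': return Repeat{0, kInfinity};
        case '+': return Repeat{1, kInfinity};
        default:  return Repeat{1, 1};
    }
}

// True when `count` repetitions satisfy the bounds.
inline bool accepts(const Repeat& r, std::size_t count) {
    assert(r.min <= r.max);
    return count >= r.min && count <= r.max;
}
```

For instance, accepts(repeatFromShorthand('+'), 0) is false, because + requires at least one repetition.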

If incoming characters do not match the token, an exception is raised and the transition to the next token does not happen. The exception can be caught; if it is not, parsing exits. To catch it, add a branch marked 'catch:'. If the branch contains a '\' character, it is analyzed as part of the parent branch's catch.

Specialized tokens (such as call, jmp, etc.) can be used to allow transitions between branches or to create multiple trees. It is recommended to review the examples in the top-right corner.
Using parallel states <> allows branching to occur in parallel. The first branching token that is fully processed will be used as the single active state, and the others will be discarded.

Range [a-zA-Z\x20\r\n]

Describes the range of permissible values for the incoming byte. The expected format is equivalent to standard range in regular expressions. This token supports:

This token also supports redirection to a variable: while parsing is within this token, all data is redirected to the specified variable (equivalent to min=1, max=infinity). Modifiers (string/string_view/array/vector/uint/hex/bool/bin(binLe)/binBe) affect how the variable is initialized and assigned.
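To illustrate what a modifier such as uint implies, here is a minimal C++ sketch of collecting a decimal number one byte at a time from a [0-9] range. UintAccumulator and its members are hypothetical names; the code is an assumption about the behavior, not the code vProto generates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical accumulator: mimics a digit range redirected to an unsigned variable.
struct UintAccumulator {
    std::uint64_t value = 0;   // number collected so far
    std::size_t digits = 0;    // how many bytes were consumed

    // Consumes one byte. Returns false and leaves the state untouched if it is not in [0-9].
    // Overflow is not checked in this sketch.
    bool feed(char byte) {
        if (byte < '0' || byte > '9') return false;
        value = value * 10 + static_cast<std::uint64_t>(byte - '0');
        ++digits;
        return true;
    }
};
```

Because the state lives in the struct, the number can be completed even when its digits arrive in separate buffers.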

Variables: u, h, be, le, array, vector, string, string_view, bool, enum

As mentioned earlier, when initializing variables, the main type is a range, which can describe all variables. However, there are also shorter forms of variable notation:

If the ':' is omitted, no redirection to a variable occurs and the data is ignored, which is equivalent to a range without redirection.
Example of TLV parsing: b8:type be32:length data:value(max=length)
Example of UDP header parsing: be16:srcPort be16:dstPort be16:dataLength be16:checksum data:udpPayload(max=dataLength)
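For comparison, the UDP description above corresponds roughly to the following hand-written C++. This is a sketch under assumptions: UdpHeader, readBe16, and parseUdp are hypothetical names, the whole datagram is assumed to be in one buffer, and the payload is capped at dataLength exactly as the description states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Field names follow the vProto example; the struct itself is hypothetical.
struct UdpHeader {
    std::uint16_t srcPort;
    std::uint16_t dstPort;
    std::uint16_t dataLength;
    std::uint16_t checksum;
    const std::uint8_t* udpPayload;  // points into the caller's buffer
    std::size_t payloadSize;         // never more than dataLength
};

// Reads a 16-bit big-endian value (the be16 token).
inline std::uint16_t readBe16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Parses the four be16 fields, then takes up to dataLength payload bytes.
// Returns false when fewer than 8 header bytes are available.
inline bool parseUdp(const std::uint8_t* data, std::size_t size, UdpHeader& out) {
    if (size < 8) return false;
    out.srcPort = readBe16(data);
    out.dstPort = readBe16(data + 2);
    out.dataLength = readBe16(data + 4);
    out.checksum = readBe16(data + 6);
    out.udpPayload = data + 8;
    const std::size_t available = size - 8;
    out.payloadSize = available < out.dataLength ? available : out.dataLength;
    return true;
}
```

The one-line description replaces all of this bookkeeping, and a generated parser would presumably also handle input that arrives in pieces, which this sketch does not.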

"casesensitive text" or 'no-casesensitive'

"casesensitive" or 'no-casesensitive' - describes constant words or binary values. For example, in the HTTP protocol, words like "GET" or "HTTP/1.1" are case-sensitive, while headers like 'Content-Length' or 'Content-type' can be written using uppercase or lowercase characters. You can also describe the binary value of a character, like in ranges.

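The difference between the two quoting styles can be sketched in C++ as a prefix match with and without case folding. matchLiteral is a hypothetical helper written for illustration; it is not a function from vProto's output.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Matches `literal` at the start of `input`.
// caseSensitive=true mirrors a "double-quoted" token, false mirrors a 'single-quoted' one.
inline bool matchLiteral(const std::string& input, const std::string& literal, bool caseSensitive) {
    if (input.size() < literal.size()) return false;
    for (std::size_t i = 0; i < literal.size(); ++i) {
        const unsigned char a = static_cast<unsigned char>(input[i]);
        const unsigned char b = static_cast<unsigned char>(literal[i]);
        if (caseSensitive ? a != b : std::tolower(a) != std::tolower(b)) return false;
    }
    return true;
}
```

With this helper, "get /" does not match "GET" case-sensitively, while "content-length: 5" matches 'Content-Length' case-insensitively.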
{ user code }, notify, if

{ c++\rust code } or notify:userFunction - calls user code. This token does not consume incoming bytes; it can be used to modify variables, invoke callbacks, decide branching conditions, or help during debugging. The code is called as a function: by default it returns true (successful token processing), and the user can "return false" to signal unsuccessful processing.
if { condition } is recommended at the beginning of a branch. The condition in braces decides whether the branch is taken, and the word "return" must not be used.
Examples:
b8:type { printf("Read Type: %u ", type); }
    if {type == 1} notify:userFunction1
    if {type == 2} notify:userFunction2
    if {type == 3} notify:userFunction3
A C++\Rust insertion can be used for debugging, such as the printf call above (it automatically returns true, meaning successful token processing), and for branching: an insertion containing "return type == ..." returns true or false and thereby selects the branch.

If a user wishes to add their own functions or repositories, they need to define the "OutputClassName". The parser will inherit from this class and access its variables and functions. It is recommended to inherit OutputClassName from the demoResult state structure.

notify:userFunction - essentially replaces the C++\Rust insertion { userFunction(); }, but is shorter and performs better. Parameters can also be passed, as in notify:userFunction(var1, ...); remember to declare this function manually in the OutputClassName class.
Example TLV data struct: b8:type be32:length [\x00-\xff]->string:value(max=length) notify:gotValue
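A hand-written C++ equivalent of this TLV description might look as follows. It is only a sketch: TlvSink stands in for a user class, parseTlv is a hypothetical function, and the whole record is assumed to be available in one buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a user class; gotValue corresponds to notify:gotValue.
struct TlvSink {
    std::uint8_t type = 0;
    std::uint32_t length = 0;
    std::string value;
    int notifications = 0;
    void gotValue() { ++notifications; }
};

// Parses one record: b8 type, be32 length, then `length` bytes of value.
// Returns the number of bytes consumed, or 0 if the buffer is too short.
inline std::size_t parseTlv(const std::uint8_t* data, std::size_t size, TlvSink& out) {
    if (size < 5) return 0;
    out.type = data[0];
    out.length = (static_cast<std::uint32_t>(data[1]) << 24) |
                 (static_cast<std::uint32_t>(data[2]) << 16) |
                 (static_cast<std::uint32_t>(data[3]) << 8) |
                 static_cast<std::uint32_t>(data[4]);
    if (size - 5 < out.length) return 0;  // value is not complete yet
    out.value.assign(reinterpret_cast<const char*>(data + 5), out.length);
    out.gotValue();                        // the notify step
    return static_cast<std::size_t>(out.length) + 5;
}
```

The callback fires only after the value is complete, which matches the position of notify:gotValue at the end of the line.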

begin:userFunction end:userFunction

begin:userFunction end:userFunction - transfers to the user all data between the beginning and the end.
Example (the user receives all the data consumed by the tokens between begin and end):
"GET" [ \t]+ begin:userFunction [^ \t]+ end:userFunction [ \t]+ "HTTP/" [0-9] "." [0-9] "\r"? "\n"

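The span handed over by begin/end in the GET example can be pictured with this C++ sketch. captureTarget is a hypothetical helper; it checks only the part of the line up to the end marker and ignores the trailing "HTTP/" tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Extracts the bytes that lie between begin: and end: for a line such as
// "GET /index.html HTTP/1.1\r\n". Returns false if "GET" or the whitespace
// after it is missing, or if the captured span would be empty.
inline bool captureTarget(const std::string& line, std::string& target) {
    const std::string method = "GET";
    if (line.compare(0, method.size(), method) != 0) return false;
    const std::size_t pos = method.size();
    std::size_t ws = pos;
    while (ws < line.size() && (line[ws] == ' ' || line[ws] == '\t')) ++ws;    // [ \t]+
    if (ws == pos) return false;
    const std::size_t begin = ws;                                              // begin:userFunction
    std::size_t end = begin;
    while (end < line.size() && line[end] != ' ' && line[end] != '\t') ++end;  // [^ \t]+
    if (end == begin) return false;
    target = line.substr(begin, end - begin);                                  // end:userFunction
    return true;
}
```

For the line "GET /index.html HTTP/1.1\r\n" the captured span is "/index.html".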
label, jmp, call, return

These tokens are used for transitions within the state tree:

Example:
call:readRequestType call:readUrl call:readHeaders

label:readRequestType
    "GET" return
    "POST" return

label:readUrl [ \t]+ [^ \t]->string:url [ \t]+ "HTTP/" [0-9] "." [0-9] "\r"? "\n" return

label:readHeaders ...

reset, break, bang, back

Example:
'A' // depth == 0
    'B' // depth == 1
        'C' back:1 // depth == 2, back:1 go to depth == 1, between 'B' and 'D'.
    'D' // depth == 1
        'E' back:0 // depth == 2, back:0 go to depth == 0, between 'A' and 'F'.
'F' // depth == 0

<tokensCase1,...>

<listTokensCase1, listTokensCase2,..> - this token allows creating nested branching variations within itself, describing a sequence of regular tokens and comma-separated alternative options. Its presence changes the logic of the entire state machine by creating parallel states, enabling simultaneous processing across multiple states. Completion of any sequence leads to the termination of other parallel states, essentially, the 'bang' token is automatically set after this token. This token significantly simplifies understanding of the finite state machine, but working in parallel states is always slower than working in a single state. It is recommended to minimize the description of variations in this token so that options are discarded as quickly as possible.
Example:
<"GET", "POST", "PUT"> - All three states are processed in parallel until one of them wins. If the byte G is received, then only the GET branch has a chance of passing; POST and PUT automatically fail. But if the byte P is received, then we will be simultaneously in the POST and PUT states, waiting for other symbols to determine the state.
Example (parallel states):
"GET" [ \t]+ \
    <'url-1'> [ \t]+
    <'url-2'> [ \t]+
    <'url-3'> [ \t]+
All three states url-1, url-2, url-3 are considered in parallel as data arrives (which can come byte by byte). Decisions are made about which states to exclude gradually.
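The gradual exclusion of parallel states can be sketched in C++ as follows. ParallelLiterals is a hypothetical class written for illustration, and it handles only plain case-sensitive literals; the generated state machine is not implemented this way as far as the text states.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of parallel literal matching; not the generated state machine.
class ParallelLiterals {
public:
    explicit ParallelLiterals(std::vector<std::string> candidates)
        : candidates_(std::move(candidates)), alive_(candidates_.size(), true) {}

    // Feeds one byte. Returns the index of the candidate that just completed, or -1.
    int feed(char byte) {
        int winner = -1;
        for (std::size_t i = 0; i < candidates_.size(); ++i) {
            if (!alive_[i]) continue;
            const std::string& c = candidates_[i];
            if (pos_ >= c.size() || c[pos_] != byte) { alive_[i] = false; continue; }
            if (pos_ + 1 == c.size() && winner < 0) winner = static_cast<int>(i);
        }
        ++pos_;
        return winner;
    }

    // Number of candidates that still match everything fed so far.
    std::size_t aliveCount() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < alive_.size(); ++i) if (alive_[i]) ++n;
        return n;
    }

private:
    std::vector<std::string> candidates_;
    std::vector<bool> alive_;
    std::size_t pos_ = 0;
};
```

With the candidates "GET", "POST", "PUT", feeding 'P' leaves two states alive, feeding 'U' leaves one, and feeding 'T' completes "PUT".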
Example (no parallel states):
"GET" [ \t]+ \
    'url-1' [ \t]+
    'url-2' [ \t]+
    'url-3' [ \t]+
Without <>, the branches are not treated as parallel states, and url-2 and url-3 can never be reached: when the symbol 'u' arrives, the first and therefore highest-priority branch, url-1, is chosen. If the input is actually url-2, no transition to url-2 occurs; an exception is raised in the url-1 branch.
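The greedy behavior described here can be sketched in C++. greedySelect is a hypothetical function; it assumes that the parser commits to a branch on its first matching byte and, for simplicity, compares case-sensitively even though the example uses single-quoted literals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Commits to the first branch whose first byte matches the input (greedy selection),
// then requires the rest of that branch to match.
// Returns the branch index, or -1 when the committed branch fails or nothing matches.
inline int greedySelect(const std::vector<std::string>& branches, const std::string& input) {
    for (std::size_t i = 0; i < branches.size(); ++i) {
        const std::string& b = branches[i];
        if (b.empty() || input.empty() || b[0] != input[0]) continue;
        // Committed: no other branch is tried after this point.
        return input.compare(0, b.size(), b) == 0 ? static_cast<int>(i) : -1;
    }
    return -1;
}
```

With the branches url-1, url-2, url-3, the input "url-2 " returns -1: the function commits to url-1 on the byte 'u' and then fails, mirroring the exception described above.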

vproto:moduleName:varName

vproto:moduleName:varName - the ability to delegate parsing to other vproto modules. Example: an initial parsing module handles protocols such as IMAP, POP3, or SMTP, and then passes the data to the common MIME module for further parsing. Another example is PCAP parsing, which consists of a separate module for reading the PCAP header (first module):
"\xd4\xc3\xb2\xa1" data(max=20)
    le32:ts_sec le32:ts_usec le32:pktLen le32 vproto:packet:var(max=pktLen)
followed by a module for parsing individual packets (second module name "packet"). For example:
array:macDst(max=6, init=6) array:macSrc(max=6, init=6) be16:protoL2 \
    { return protoL2 == 0x0800; /* ipv4 */ } ...
    { return protoL2 == 0x86dd; /* ipv6 */ } ...
Further parsing of the extracted data can also be delegated to subsequent modules. It is recommended to use it in combination with parameters like (max=length) or with ranges such as [a-z]->vproto:moduleName:varName
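The delegation in the PCAP example can be pictured as an outer loop that cuts the stream into records and passes exactly pktLen bytes to an inner parser. The C++ below is a sketch under assumptions: forEachPacket, readLe32, and PacketHandler are hypothetical names, the handler stands in for the "packet" module, and the whole file is assumed to be in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a 32-bit little-endian value (the le32 token).
inline std::uint32_t readLe32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical callback standing in for the delegated "packet" module.
typedef void (*PacketHandler)(const std::uint8_t* data, std::size_t size, void* user);

// Checks the 4-byte magic, skips the remaining 20 header bytes, then hands each
// record's pktLen bytes to `handler`. Returns the number of complete records seen.
inline std::size_t forEachPacket(const std::uint8_t* data, std::size_t size,
                                 PacketHandler handler, void* user) {
    static const std::uint8_t magic[4] = {0xd4, 0xc3, 0xb2, 0xa1};
    if (size < 24 || std::memcmp(data, magic, 4) != 0) return 0;
    std::size_t pos = 24;
    std::size_t count = 0;
    while (size - pos >= 16) {
        // Record header: ts_sec, ts_usec, pktLen, and one more le32 that is ignored.
        const std::uint32_t pktLen = readLe32(data + pos + 8);
        pos += 16;
        if (size - pos < pktLen) break;  // truncated record: stop
        handler(data + pos, pktLen, user);
        pos += pktLen;
        ++count;
    }
    return count;
}
```

Bounding the inner parser by pktLen plays the same role as (max=pktLen) in the module description: the delegated module never sees bytes that belong to the next record.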

Optimization & Performance

Special attention is given to the performance of the generated code. It is designed to perform well even without special instructions (such as those available on x86-64), so that the code stays universal and runs across multiple platforms.

Recommendations for achieving maximum performance:

The performance test runs a typical GET request (430 bytes) in a loop 100 million times (43 GB total), with all processing done on a single core in one thread. The comparison is made against a modified description of the HTTP protocol (manual parsing of parallel states) and the Boost::HTTP library. A Valgrind report for a loop of 1 million iterations is also attached.
CPU | vProto | vProto (SSE+AVX) | boost-1.85::http
Jetson Orin (ARM-8 rev1 2200 MHz) | 13.8 Gbit/s (4.0 M req/s) | not supported | 3.4 Gbit/s (997.0 k req/s)
AMD Ryzen9 5950X | 15.9 Gbit/s (4.6 M req/s) | 20.2 Gbit/s (5.8 M req/s) | 5.1 Gbit/s (1.5 M req/s)
Intel Xeon W-2223 | 16.4 Gbit/s (4.7 M req/s) | 33.1 Gbit/s (9.6 M req/s) | 4.1 Gbit/s (1.2 M req/s)
Intel Xeon Gold 6348 | 20.2 Gbit/s (5.8 M req/s) | 34.2 Gbit/s (9.9 M req/s) | 5.4 Gbit/s (1.5 M req/s)
Intel i7-1065G7 | 22.3 Gbit/s (6.4 M req/s) | 35.1 Gbit/s (10.2 M req/s) | 4.4 Gbit/s (1.2 M req/s)
AMD EPYC 9454P | 25.3 Gbit/s (7.3 M req/s) | 40.6 Gbit/s (11.8 M req/s) | 6.1 Gbit/s (1.7 M req/s)
AMD Ryzen7 8845hs | 35.7 Gbit/s (10.3 M req/s) | 53.6 Gbit/s (15.5 M req/s) | 5.4 Gbit/s (1.5 M req/s)
Intel i9-12900k | 37.4 Gbit/s (10.8 M req/s) | 54.8 Gbit/s (15.9 M req/s) | 9.8 Gbit/s (2.8 M req/s)
Intel i9-13900hx | 38.0 Gbit/s (11.0 M req/s) | 55.2 Gbit/s (16.0 M req/s) | 9.7 Gbit/s (2.8 M req/s)
AMD Ryzen AI 9 HX 370 | 49.4 Gbit/s (14.3 M req/s) | 71.6 Gbit/s (20.8 M req/s) | 11.8 Gbit/s (3.4 M req/s)
AMD Ryzen 9 9950X3D | 51.4 Gbit/s (14.9 M req/s) | 78.9 Gbit/s (22.9 M req/s) | 13.6 Gbit/s (3.9 M req/s)
The same test, but after using profiling:
CPU | vProto | vProto (SSE+AVX) | boost-1.85::http
Jetson Orin (ARM-8 rev1 2200 MHz) | 15.9 Gbit/s (4.6 M req/s) | not supported | 4.4 Gbit/s (1.2 M req/s)
AMD Ryzen9 5950X | 16.4 Gbit/s (4.7 M req/s) | 23.2 Gbit/s (6.7 M req/s) | 6.5 Gbit/s (1.8 M req/s)
Intel Xeon W-2223 | 21.2 Gbit/s (6.1 M req/s) | 38.1 Gbit/s (11.0 M req/s) | 5.1 Gbit/s (1.4 M req/s)
Intel Xeon Gold 6348 | 22.3 Gbit/s (6.4 M req/s) | 42.6 Gbit/s (12.3 M req/s) | 5.9 Gbit/s (1.7 M req/s)
Intel i7-1065G7 | 23.7 Gbit/s (6.8 M req/s) | 47.4 Gbit/s (13.7 M req/s) | 5.3 Gbit/s (1.5 M req/s)
AMD EPYC 9454P | 30.8 Gbit/s (8.9 M req/s) | 57.4 Gbit/s (16.6 M req/s) | 8.7 Gbit/s (2.5 M req/s)
AMD Ryzen7 8845hs | 39.8 Gbit/s (11.5 M req/s) | 74.9 Gbit/s (21.7 M req/s) | 6.8 Gbit/s (1.9 M req/s)
Intel i9-12900k | 48.0 Gbit/s (13.9 M req/s) | 82.3 Gbit/s (23.9 M req/s) | 11.4 Gbit/s (3.3 M req/s)
Intel i9-13900hx | 48.6 Gbit/s (14.1 M req/s) | 82.7 Gbit/s (24.0 M req/s) | 11.5 Gbit/s (3.3 M req/s)
AMD Ryzen AI 9 HX 370 | 51.5 Gbit/s (14.9 M req/s) | 90.0 Gbit/s (26.1 M req/s) | 11.8 Gbit/s (3.4 M req/s)
AMD Ryzen 9 9950X3D | 52.4 Gbit/s (15.2 M req/s) | 97.5 Gbit/s (28.3 M req/s) | 14.7 Gbit/s (4.2 M req/s)
Another test compares the generated JSON code (from the example) with RapidJSON. The test uses a typical small JSON document of 796 bytes. Unlike HTTP, where most of the time is spent in a single state, JSON parsing constantly transitions between states. In this example, with relatively short data, the generated JSON code runs about 10% faster with SSE (RapidJSON also uses SSE). With longer JSON fields, the contribution of SSE to performance is even greater. A Valgrind report for a loop of 1 million iterations is also attached.
CPU | jsonFlow | jsonPerf | rapidJson-v1.1.0
Jetson Orin (ARM-8 rev1 2200 MHz) | 4.9 Gbit/s (771.0 k req/s) | 8.7 Gbit/s (1.3 M req/s) | 4.1 Gbit/s (645.4 k req/s)
AMD Ryzen9 5950X | 6.1 Gbit/s (957.9 k req/s) | 16.2 Gbit/s (2.5 M req/s) | 6.3 Gbit/s (994.0 k req/s)
Intel Xeon W-2223 | 7.3 Gbit/s (1.1 M req/s) | 9.1 Gbit/s (1.4 M req/s) | 4.8 Gbit/s (766.3 k req/s)
Intel Xeon Gold 6348 | 7.1 Gbit/s (1.1 M req/s) | 11.1 Gbit/s (1.7 M req/s) | 5.8 Gbit/s (915.5 k req/s)
Intel i7-1065G7 | 7.5 Gbit/s (1.1 M req/s) | 11.2 Gbit/s (1.7 M req/s) | 8.4 Gbit/s (1.3 M req/s)
AMD EPYC 9454P | 8.1 Gbit/s (1.2 M req/s) | 14.7 Gbit/s (2.3 M req/s) | 9.5 Gbit/s (1.5 M req/s)
AMD Ryzen7 8845hs | 11.0 Gbit/s (1.7 M req/s) | 19.7 Gbit/s (3.0 M req/s) | 9.4 Gbit/s (1.4 M req/s)
Intel i9-12900k | 11.0 Gbit/s (1.7 M req/s) | 18.4 Gbit/s (2.8 M req/s) | 12.2 Gbit/s (1.9 M req/s)
Intel i9-13900hx | 13.2 Gbit/s (2.0 M req/s) | 22.6 Gbit/s (3.5 M req/s) | 13.2 Gbit/s (2.0 M req/s)
AMD Ryzen AI 9 HX 370 | 15.3 Gbit/s (2.4 M req/s) | 26.6 Gbit/s (4.1 M req/s) | 14.1 Gbit/s (2.2 M req/s)
AMD Ryzen 9 9950X3D | 16.0 Gbit/s (2.5 M req/s) | 28.4 Gbit/s (4.4 M req/s) | 15.6 Gbit/s (2.4 M req/s)
The same test, but after using profiling:
CPU | jsonFlow | jsonPerf | rapidJson-v1.1.0
Jetson Orin (ARM-8 rev1 2200 MHz) | 5.1 Gbit/s (807.1 k req/s) | 9.2 Gbit/s (1.4 M req/s) | 2.9 Gbit/s (455.4 k req/s)
AMD Ryzen9 5950X | 6.0 Gbit/s (953.2 k req/s) | 18.6 Gbit/s (2.9 M req/s) | 5.2 Gbit/s (816.5 k req/s)
Intel Xeon W-2223 | 7.4 Gbit/s (1.1 M req/s) | 11.5 Gbit/s (1.8 M req/s) | 4.1 Gbit/s (643.8 k req/s)
Intel Xeon Gold 6348 | 7.9 Gbit/s (1.2 M req/s) | 12.4 Gbit/s (1.9 M req/s) | 4.1 Gbit/s (643.8 k req/s)
Intel i7-1065G7 | 8.7 Gbit/s (1.3 M req/s) | 12.6 Gbit/s (1.9 M req/s) | 4.7 Gbit/s (738.0 k req/s)
AMD EPYC 9454P | 9.8 Gbit/s (2.8 M req/s) | 17.1 Gbit/s (4.9 M req/s) | 5.4 Gbit/s (1.5 M req/s)
AMD Ryzen7 8845hs | 11.2 Gbit/s (1.7 M req/s) | 22.7 Gbit/s (3.5 M req/s) | 6.1 Gbit/s (957.9 k req/s)
Intel i9-12900k | 14.2 Gbit/s (2.2 M req/s) | 24.5 Gbit/s (3.8 M req/s) | 8.5 Gbit/s (1.3 M req/s)
Intel i9-13900hx | 16.1 Gbit/s (2.5 M req/s) | 27.1 Gbit/s (4.2 M req/s) | 7.0 Gbit/s (1.0 M req/s)
AMD Ryzen AI 9 HX 370 | 17.9 Gbit/s (2.8 M req/s) | 28.2 Gbit/s (4.4 M req/s) | 8.9 Gbit/s (1.3 M req/s)
AMD Ryzen 9 9950X3D | 19.8 Gbit/s (3.1 M req/s) | 32.0 Gbit/s (5.0 M req/s) | 10.6 Gbit/s (1.6 M req/s)
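The two figures in each table cell are related through the payload size. A small C++ sketch of that arithmetic, assuming 1 Gbit = 10^9 bits and 1 GB = 10^9 bytes (both function names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Converts a request rate into throughput, assuming 1 Gbit = 1e9 bits.
inline double gbitPerSecond(double requestsPerSecond, double bytesPerRequest) {
    return requestsPerSecond * bytesPerRequest * 8.0 / 1e9;
}

// Total data processed by a looped test, assuming 1 GB = 1e9 bytes.
inline double totalGigabytes(double iterations, double bytesPerRequest) {
    return iterations * bytesPerRequest / 1e9;
}
```

For the 430-byte GET request, 4.0 M req/s gives 13.76 Gbit/s, in line with the 13.8 Gbit/s listed for the Jetson Orin, and 100 million iterations amount to 43 GB. For the 796-byte JSON document, 771.0 k req/s gives about 4.91 Gbit/s.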

Cooperation

My collection of parsed protocols (via vProto) includes:

I am open to cooperation or participating in projects.

Author: Shchekoldin Sergey (Щеколдин Сергей)
shchekoldin@gmail.com