File re2parser.hh

Enums

enum class Encoding

Values:

enumerator UTF8
enumerator Latin1

namespace mata

Main namespace including structs and algorithms for all automata.

In particular, this includes:

  1. Alphabets,

  2. Formula graphs and nodes,

  3. Mintermization,

  4. Closed sets.

namespace parser

Parser from .mata format to automata (currently Nfa and Afa are supported).

This includes parsing either from files or from other streams (strings, etc.).

Functions

nfa::Nfa create_nfa(const std::string &pattern, bool use_epsilon = false, mata::Symbol epsilon_value = 306, bool use_reduce = true, const Encoding encoding = Encoding::Latin1)

Creates NFA from regular expression using RE2 parser.

At https://github.com/google/re2/wiki/Syntax, you can find the syntax of regular expressions, with the following further limitations:

  1. If you use UTF8 encoding, the created NFA is labelled with byte values instead of full symbols. For example, the character Ā, whose Unicode code point is U+0100 and which is represented in UTF8 as the two bytes c4 80, is encoded by two transitions: one over c4, followed by one over 80.

  2. The created automaton represents the language of the regex and is not expected to be used in regex matching. Therefore, constructs such as ^ and $ are ignored in the regex.
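To make the byte-level encoding of the UTF8 limitation concrete, the following minimal sketch computes the UTF-8 byte sequence of a single code point using only the C++ standard library. The helper utf8_bytes is hypothetical and is not part of mata or RE2; it only illustrates which byte values would label the consecutive transitions for one character.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of mata): encode one Unicode code point
// as its UTF-8 byte sequence, following the standard UTF-8 bit layout.
std::vector<std::uint8_t> utf8_bytes(char32_t cp) {
    // Valid scalar values only: at most U+10FFFF and not a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::vector<std::uint8_t> out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out.push_back(static_cast<std::uint8_t>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

Calling utf8_bytes(0x0100) yields the two bytes c4 and 80, matching the example for Ā: with UTF8 encoding, a word containing Ā is accepted along a path of two transitions over these byte values, not along a single transition over the symbol 0x100.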

Parameters:
  • pattern – regex as a string

  • use_epsilon – whether to keep epsilon transitions in created NFA

  • epsilon_value – symbol representing epsilon

  • use_reduce – if true, the result is trimmed and reduced using simulation reduction

  • encoding – encoding of the regex, default is Latin1

Returns:

Nfa corresponding to pattern

void create_nfa(nfa::Nfa *nfa, const std::string &pattern, bool use_epsilon = false, mata::Symbol epsilon_value = 306, bool use_reduce = true, const Encoding encoding = Encoding::Latin1)