Strings

Algorithms and classes used for solving string constraints.

namespace mata

Main namespace including structs and algorithms for all automata.

In particular, this includes:

  1. Alphabets,

  2. Formula graphs and nodes,

  3. Mintermization,

  4. Closed sets.

namespace applications
namespace strings

NFA algorithms usable for solving string constraints.

Typedefs

using Nfa = nfa::Nfa
using State = nfa::State
using StateSet = nfa::StateSet
using Transition = nfa::Transition
using ParameterMap = nfa::ParameterMap
using SymbolPost = nfa::SymbolPost
using Nft = nft::Nft

Functions

std::set<Word> get_shortest_words(const Nfa &nfa)

Get shortest words (regarding their length) of the automaton using BFS.

Returns:

Set of shortest words.

std::optional<std::vector<Word>> get_words_of_lengths(const Nft &nft, std::vector<unsigned> lengths)

Get the accepting words for each tape of nft with specific lengths.

This function finds such an accepting word of nft that for each tape i, the word on this tape (on the ith index of resulting vector) has the length lengths[i].

Parameters:
  • nft – Transducer whose accepting words we are looking for

  • lengths – The lengths of the words of each tape (size of lengths == the levels of nft)

Returns:

std::optional<std::vector<Word>> Either the resulting words of tapes, or std::nullopt if such words of specific lengths do not exist

std::set<Symbol> get_accepted_symbols(const Nfa &nfa)

Get all the one symbol words accepted by nfa.

std::set<std::pair<int, int>> get_word_lengths(const Nfa &aut)

Get the lengths of all words in the automaton aut.

The function returns a set of pairs <u,v> where for each such a pair there is a word with length u+k*v for all ks. The disjunction of such formulae of all pairs hence precisely describe lengths of all words in the automaton.

Parameters:

aut – Input automaton

Returns:

Set of pairs describing lengths

bool is_lang_eps(const Nfa &nfa)

Checks if the automaton nfa accepts only a single word \eps.

Parameters:

nfa – Input automaton

Returns:

true iff L(nfa) = {\eps}

class ShortestWordsMap
#include <strings.hh>

Class mapping states to the shortest words accepted by languages of the states.

Public Functions

inline explicit ShortestWordsMap(const Nfa &aut)

Maps states in the automaton aut to shortest words accepted by languages of the states.

Parameters:

aut – Automaton to compute shortest words for.

std::set<Word> get_shortest_words_from(const StateSet &states) const

Gets shortest words for the given states.

Parameters:

states[in] States to map shortest words for.

Returns:

Set of shortest words.

std::set<Word> get_shortest_words_from(State state) const

Gets shortest words for the given state.

Parameters:

state[in] State to map shortest words for.

Returns:

Set of shortest words.

Private Types

using WordLength = int

A length of a word. Pair binding the length of all words in the word set and word set with words of the given length.

using LengthWordsPair = std::pair<WordLength, std::set<Word>>

Private Functions

void insert_initial_lengths()

Inserts initial lengths into the shortest words map.

Inserts initial length of length 0 for final state in the automaton (initial states in the reversed automaton).

void compute()

Computes shortest words for all states in the automaton.

void compute_for_state(State state)

Computes shortest words for the given state.

Parameters:

state[in] State to compute shortest words for.

inline LengthWordsPair map_default_shortest_words(const State state)

Creates default shortest words mapping for yet unprocessed state.

Parameters:

state[in] State to map default shortest words.

Returns:

Created default shortest words map element for the given state.

Private Members

std::unordered_map<State, LengthWordsPair> shortest_words_map = {}

Map mapping states to the shortest words accepted by the automaton from the mapped state.

std::set<State> processed = {}

Set of already processed states.

std::deque<State> fifo_queue = {}

FIFO queue for states to process.

const Nfa reversed_automaton

Reversed input automaton.

Private Static Functions

static void update_current_words(LengthWordsPair &act, const LengthWordsPair &dst, Symbol symbol)

Update words for the current state.

Parameters:
  • act[out] Current state shortest words and length.

  • dst[in] Transition target state shortest words and length.

  • symbol[in] Symbol to update with.

namespace replace

Enums

enum class ReplaceMode

How many occurrences of the regex to replace, in order from left to right?

Values:

enumerator Single

Replace only the first occurrence of the regex.

enumerator All

Replace all occurrences of the regex.

Functions

Nfa reluctant_nfa(Nfa nfa)

Modify nfa in-place to remove outgoing transitions from final states.

If nfa accepts empty string, returned NFA will accept only the empty string.

Parameters:

nfa – NFA to modify.

Returns:

The reluctant version of nfa.

Nft create_identity(mata::Alphabet *alphabet, size_t num_of_levels = 2)

Create identity transducer over the alphabet with num_of_levels levels.

Nft create_identity_with_single_symbol_replace(mata::Alphabet *alphabet, Symbol from_symbol, Symbol replacement, ReplaceMode replace_mode = ReplaceMode::All)

Create identity input/output transducer with 2 levels over the alphabet with level_cnt levels with single symbol from_symbol replaced with @to_symbol.

Nft create_identity_with_single_symbol_replace(mata::Alphabet *alphabet, Symbol from_symbol, const Word &replacement, ReplaceMode replace_mode = ReplaceMode::All)

Create identity input/output transducer with 2 levels over the alphabet with level_cnt levels with single symbol from_symbol replaced with word replacement.

Nft replace_reluctant_regex(const std::string &regex, const Word &replacement, Alphabet *alphabet, ReplaceMode replace_mode = ReplaceMode::All, Symbol begin_marker = BEGIN_MARKER)

Create NFT modelling a reluctant leftmost replace of regex regex to replacement.

The most general replace operation, handling any regex as the part to be replaced.

Parameters:
  • regex – A string containing regex to be replaced.

  • replacement – Literal to be replaced with.

  • alphabetAlphabet over which to create the NFT.

  • replace_mode – Whether to replace all or just the single (the leftmost) occurrence of regex.

  • begin_markerSymbol to be used internally as a begin marker of replaced regex.

Returns:

The reluctant leftmost replace NFT.

Nft replace_reluctant_regex(nfa::Nfa regex, const Word &replacement, Alphabet *alphabet, ReplaceMode replace_mode = ReplaceMode::All, Symbol begin_marker = BEGIN_MARKER)

Create NFT modelling a reluctant leftmost replace of regex regex to replacement.

The most general replace operation, handling any regex as the part to be replaced.

Parameters:
  • regex – NFA representing regex to be replaced.

  • replacement – Literal to replace with.

  • alphabetAlphabet over which to create the NFT.

  • replace_mode – Whether to replace all or just the single (the leftmost) occurrence of regex.

  • begin_markerSymbol to be used internally as a begin marker of replaced regex.

Returns:

The reluctant leftmost replace NFT.

Nft replace_reluctant_literal(const Word &literal, const Word &replacement, Alphabet *alphabet, ReplaceMode replace_mode = ReplaceMode::All, Symbol end_marker = END_MARKER)

Create NFT modelling a reluctant leftmost replace of literal literal to replacement.

Parameters:
  • literal – Literal to replace.

  • replacement – Literal to replace with.

  • alphabetAlphabet over which to create the NFT.

  • replace_mode – Whether to replace all or just the single (the leftmost) occurrence of literal.

  • end_markerSymbol to be used internally as an end marker marking the end of the replaced literal.

Returns:

The reluctant leftmost replace NFT.

Nft replace_reluctant_single_symbol(Symbol from_symbol, Symbol replacement, mata::Alphabet *alphabet, ReplaceMode replace_mode = ReplaceMode::All)

Create NFT modelling a reluctant leftmost replace of symbol from_symbol to replacement.

Parameters:
  • from_symbolSymbol to replace.

  • replacementSymbol to replace with.

  • alphabetAlphabet over which to create the NFT.

  • replace_mode – Whether to replace all or just the single (the leftmost) occurrence of from_symbol.

Returns:

The reluctant leftmost replace NFT.

Nft replace_reluctant_single_symbol(Symbol from_symbol, const Word &replacement, mata::Alphabet *alphabet, ReplaceMode replace_mode = ReplaceMode::All)

Create NFT modelling a reluctant leftmost replace of symbol from_symbol to replacement.

Parameters:
  • from_symbolSymbol to replace.

  • replacement – Literal to replace with.

  • alphabetAlphabet over which to create the NFT.

  • replace_mode – Whether to replace all or just the single (the leftmost) occurrence of from_symbol.

Returns:

The reluctant leftmost replace NFT.

Variables

Symbol BEGIN_MARKER = {mata::nft::EPSILON - 100}

Marker marking the beginning of the regex to be replaced.

Symbol END_MARKER = {mata::nft::EPSILON - 99}

Marker marking the end of the regex to be replaced.

class ReluctantReplace
#include <strings.hh>

Implementation of all reluctant replace versions.

Public Static Functions

static Nft replace_regex(nfa::Nfa regex, const Word &replacement, Alphabet *alphabet, ReplaceMode replace_mode = ReplaceMode::All, Symbol begin_marker = BEGIN_MARKER)

Create NFT modelling a reluctant leftmost replace of regex regex to replacement.

The most general replace operation, handling any regex as the part to be replaced.

Parameters:
  • regex – NFA representing regex to be replaced.

  • replacement – Literal to replace with.

  • alphabetAlphabet over which to create the NFT.

  • replace_mode – Whether to replace all or just the single (the leftmost) occurrence of regex.

  • begin_markerSymbol to be used internally as a begin marker of replaced regex.

Returns:

The reluctant leftmost replace NFT.

static Nft replace_literal(const Word &literal, const Word &replacement, Alphabet *alphabet, ReplaceMode replace_mode = ReplaceMode::All, Symbol end_marker = END_MARKER)

Create NFT modelling a reluctant leftmost replace of literal literal to replacement.

Parameters:
  • literal – Literal to replace.

  • replacement – Literal to replace with.

  • alphabetAlphabet over which to create the NFT.

  • replace_mode – Whether to replace all or just the single (the leftmost) occurrence of literal.

  • end_markerSymbol to be used internally as an end marker marking the end of the replaced literal.

Returns:

The reluctant leftmost replace NFT.

static Nft replace_symbol(Symbol from_symbol, Symbol replacement, mata::Alphabet *alphabet, ReplaceMode replace_mode = ReplaceMode::All)

Create NFT modelling a reluctant leftmost replace of symbol from_symbol to replacement.

Parameters:
  • from_symbolSymbol to replace.

  • replacementSymbol to replace with.

  • alphabetAlphabet over which to create the NFT.

  • replace_mode – Whether to replace all or just the single (the leftmost) occurrence of from_symbol.

Returns:

The reluctant leftmost replace NFT.

static Nft replace_symbol(Symbol from_symbol, const Word &replacement, mata::Alphabet *alphabet, ReplaceMode replace_mode = ReplaceMode::All)

Create NFT modelling a reluctant leftmost replace of symbol from_symbol to replacement.

Parameters:
  • from_symbolSymbol to replace.

  • replacement – Literal to replace with.

  • alphabetAlphabet over which to create the NFT.

  • replace_mode – Whether to replace all or just the single (the leftmost) occurrence of from_symbol.

Returns:

The reluctant leftmost replace NFT.

Protected Functions

nfa::Nfa end_marker_dfa(nfa::Nfa regex)
Nft marker_nft(const nfa::Nfa &marker_dfa, Symbol marker)
nfa::Nfa generic_marker_dfa(const std::string &regex, Alphabet *alphabet)
nfa::Nfa generic_marker_dfa(nfa::Nfa regex, Alphabet *alphabet)
nfa::Nfa begin_marker_nfa(const std::string &regex, Alphabet *alphabet)
nfa::Nfa begin_marker_nfa(nfa::Nfa regex, Alphabet *alphabet)
Nft begin_marker_nft(const nfa::Nfa &marker_nfa, Symbol begin_marker)
Nft end_marker_dft(const nfa::Nfa &end_marker_dfa, Symbol end_marker)
nfa::Nfa reluctant_nfa_with_marker(nfa::Nfa nfa, Symbol marker, Alphabet *alphabet)
Nft reluctant_leftmost_nft(const std::string &regex, Alphabet *alphabet, Symbol begin_marker, const Word &replacement, ReplaceMode replace_mode)
Nft reluctant_leftmost_nft(nfa::Nfa nfa, Alphabet *alphabet, Symbol begin_marker, const Word &replacement, ReplaceMode replace_mode)
Nft replace_literal_nft(const Word &literal, const Word &replacement, const Alphabet *alphabet, Symbol end_marker, ReplaceMode replace_mode = ReplaceMode::All)
namespace seg_nfa

Segment Automata including structs and algorithms.

These are automata whose state space can be split into several segments connected by ε-transitions in a chain. No other ε-transitions are allowed. As a consequence, no ε-transitions can appear in a cycle. Segment automaton can have initial states only in the first segment and final states only in the last segment.

Typedefs

using SegNfa = Nfa
using VisitedEpsMap = std::map<State, std::map<Symbol, unsigned>>
using VisitedEpsilonsCounterMap = std::map<Symbol, unsigned>

Number of visited epsilons.

using VisitedEpsilonsCounterVector = std::vector<unsigned>

Projection of VisitedEpsilonsNumberMap to sorted keys (in descending order).

using Noodle = std::vector<std::shared_ptr<SegNfa>>

A noodle is represented as a sequence of segments (a copy of the segment automata) created as if there was exactly one ε-transition between each two consecutive segments.

using SegmentWithEpsilonsCounter = std::pair<std::shared_ptr<Nfa>, VisitedEpsilonsCounterVector>

Segment with a counter of visited epsilons.

using NoodleWithEpsilonsCounter = std::vector<SegmentWithEpsilonsCounter>

Noodles as segments enriched with EpsCntMap.

using TransducerNoodle = std::vector<TransducerNoodleElement>

Functions

void segs_one_initial_final(const std::vector<Nfa> &segments, bool include_empty, const State &unused_state, std::map<std::pair<State, State>, std::shared_ptr<Nfa>> &out)

segs_one_initial_final

segments_one_initial_final[init, final] is the pointer to automaton created from one of the segments such that init and final are one of the initial and final states of the segment and the created automaton takes this segment, sets initial={init}, final={final} and trims it; also segments_one_initial_final[unused_state, final] is used for the first segment (where we always want all initial states, only final state changes) and segments_one_initial_final[init, unused_state] is similarly for the last segment TODO: should we use unordered_map? then we need hash

std::vector<Noodle> noodlify(const SegNfa &aut, Symbol epsilon, bool include_empty = false)

Create noodles from segment automaton aut.

Segment automaton is a chain of finite automata (segments) connected via ε-transitions. A noodle is a vector of pointers to copy of the segments automata created as if there was exactly one ε-transition between each two consecutive segments.

Parameters:
  • automaton[in] Segment automaton to noodlify.

  • epsilon[in] Epsilon symbol to noodlify for.

  • include_empty[in] Whether to also include empty noodles.

Returns:

A list of all (non-empty) noodles.

std::vector<NoodleWithEpsilonsCounter> noodlify_mult_eps(const SegNfa &aut, const std::set<Symbol> &epsilons, bool include_empty = false)

Create noodles from segment automaton aut.

Segment automaton is a chain of finite automata (segments) connected via ε-transitions. A noodle is a vector of pointers to copy of the segments automata created as if there was exactly one ε-transition between each two consecutive segments.

Parameters:
  • automaton[in] Segment automaton to noodlify.

  • epsilons[in] Epsilon symbols to noodlify for.

  • include_empty[in] Whether to also include empty noodles.

Returns:

A list of all (non-empty) noodles.

std::vector<Noodle> noodlify_for_equation(const std::vector<std::reference_wrapper<Nfa>> &lhs_automata, const Nfa &rhs_automaton, bool include_empty = false, const ParameterMap &params = {{"reduce", "false"}})

Create noodles for left and right side of equation.

Segment automaton is a chain of finite automata (segments) connected via ε-transitions. A noodle is a copy of the segment automaton with exactly one ε-transition between each two consecutive segments.

Mata cannot work with equations, queries etc. Hence, we compute the noodles for the equation, but represent the equation in a way that libMata understands. The left side automata represent the left side of the equation and the right automaton represents the right side of the equation. To create noodles, we need a segment automaton representing the intersection. That can be achieved by computing a product of both sides. First, the left side has to be concatenated over an epsilon transitions into a single automaton to compute the intersection on, though.

Parameters:
  • lhs_automata[in] Sequence of segment automata for left side of an equation to noodlify.

  • rhs_automaton[in] Segment automaton for right side of an equation to noodlify.

  • include_empty[in] Whether to also include empty noodles.

  • params[in] Additional parameters for the noodlification:

    • ”reduce”: “false”, “forward”, “backward”, “bidirectional”; Execute forward, backward or bidirectional simulation minimization before noodlification.

Returns:

A list of all (non-empty) noodles.

std::vector<Noodle> noodlify_for_equation(const std::vector<Nfa*> &lhs_automata, const Nfa &rhs_automaton, bool include_empty = false, const ParameterMap &params = {{"reduce", "false"}})

Create noodles for left and right side of equation.

Segment automaton is a chain of finite automata (segments) connected via ε-transitions. A noodle is a copy of the segment automaton with exactly one ε-transition between each two consecutive segments.

Mata cannot work with equations, queries etc. Hence, we compute the noodles for the equation, but represent the equation in a way that libMata understands. The left side automata represent the left side of the equation and the right automaton represents the right side of the equation. To create noodles, we need a segment automaton representing the intersection. That can be achieved by computing a product of both sides. First, the left side has to be concatenated over an epsilon transitions into a single automaton to compute the intersection on, though.

Parameters:
  • lhs_automata[in] Sequence of pointers to segment automata for left side of an equation to noodlify.

  • rhs_automaton[in] Segment automaton for right side of an equation to noodlify.

  • include_empty[in] Whether to also include empty noodles.

  • params[in] Additional parameters for the noodlification:

    • ”reduce”: “false”, “forward”, “backward”, “bidirectional”; Execute forward, backward or bidirectional simulation minimization before noodlification.

Returns:

A list of all (non-empty) noodles.

std::vector<NoodleWithEpsilonsCounter> noodlify_for_equation(const std::vector<std::shared_ptr<Nfa>> &lhs_automata, const std::vector<std::shared_ptr<Nfa>> &rhs_automata, bool include_empty = false, const ParameterMap &params = {{"reduce", "false"}})

Create noodles for left and right side of equation (both sides are given as a sequence of automata).

Parameters:
  • lhs_automata[in] Sequence of pointers to segment automata for left side of an equation to noodlify.

  • rhs_automaton[in] Sequence of pointers to segment automata for right side of an equation to noodlify.

  • include_empty[in] Whether to also include empty noodles.

  • params[in] Additional parameters for the noodlification:

    • ”reduce”: “false”, “forward”, “backward”, “bidirectional”; Execute forward, backward or bidirectional simulation minimization before noodlification.

Returns:

A list of all (non-empty) noodles together with the positions reached from the beginning of left/right side.

std::vector<TransducerNoodle> noodlify_for_transducer(std::shared_ptr<Nft> nft, const std::vector<std::shared_ptr<Nfa>> &input_automata, const std::vector<std::shared_ptr<Nfa>> &output_automata, bool reduce_intersection = true, bool use_homomorphic_heuristic = true)
VisitedEpsilonsCounterVector process_eps_map(const VisitedEpsilonsCounterMap &eps_cnt)

Process epsilon map to a sequence of values (sorted according to key desc)

Parameters:

eps_cnt – Epsilon count

Returns:

Vector of keys (count of epsilons)

class Segmentation
#include <strings.hh>

Class executing segmentation operations for a given segment automaton.

Works only with segment automata.

Public Types

using EpsilonDepth = size_t

Depth of ε-transitions. Dictionary of lists of ε-transitions grouped by their depth. For each depth ‘i’ we have ‘depths[i]’ which contains a list of ε-transitions of depth ‘i’.

using EpsilonDepthTransitions = std::unordered_map<EpsilonDepth, std::vector<Transition>>
using EpsilonDepthTransitionMap = std::unordered_map<EpsilonDepth, std::unordered_map<State, std::vector<Transition>>>

Public Functions

inline explicit Segmentation(const SegNfa &aut, const std::set<Symbol> &epsilons)

Prepare automaton aut for segmentation.

Parameters:
  • aut[in] Segment automaton to make segments for.

  • epsilon[in] Symbol to execute segmentation for.

inline const EpsilonDepthTransitions &get_epsilon_depths() const

Get segmentation depths for ε-transitions.

Returns:

Map of depths to lists of ε-transitions.

inline const EpsilonDepthTransitionMap &get_epsilon_depth_trans_map() const

Get the epsilon depth trans map object (mapping of depths and states to eps-successors)

Returns:

Map of depths to a map of states to transitions

const std::vector<Nfa> &get_segments()

Get segment automata.

Returns:

A vector of segments for the segment automaton in the order from the left (initial state in segment automaton) to the right (final states of segment automaton).

const std::vector<Nfa> &get_untrimmed_segments()

Get raw segment automata.

Returns:

A vector of segments for the segment automaton in the order from the left (initial state in segment automaton) to the right (final states of segment automaton) without trimming (the states are same as in the original automaton).

inline const VisitedEpsMap &get_visited_eps() const

Private Functions

void compute_epsilon_depths()

Compute epsilon depths with their transitions.

void split_aut_into_segments()

Split segment automaton into segments.

void update_next_segment(size_t current_depth, const Transition &transition)

Propagate changes to the current segment automaton to the next segment automaton.

Parameters:
  • current_depth[in] Current depth.

  • transition[in] Current epsilon transition.

void update_current_segment(size_t current_depth, const Transition &transition)

Update current segment automaton.

Parameters:
  • current_depth[in] Current depth.

  • transition[in] Current epsilon transition.

std::unordered_map<State, bool> initialize_visited_map() const

Initialize map of visited states.

Returns:

Map of visited states.

std::deque<StateDepthTuple> initialize_worklist() const

Initialize worklist of states with depths to process.

Returns:

Queue of state and its depth pairs.

void process_state_depth_pair(const StateDepthTuple &state_depth_pair, std::deque<StateDepthTuple> &worklist)

Process pair of state and its depth.

Parameters:
  • state_depth_pair[in] Current state depth pair.

  • worklist[out] Worklist of state and depth pairs to process.

void add_transitions_to_worklist(const StateDepthTuple &state_depth_pair, const SymbolPost &move, std::deque<StateDepthTuple> &worklist)

Add states with non-epsilon transitions to the worklist.

Parameters:
  • move[in] – Move from current state.

  • depth[in] – Current depth.

  • worklist[out] – Worklist of state and depth pairs to process.

void handle_epsilon_transitions(const StateDepthTuple &state_depth_pair, const SymbolPost &move, std::deque<StateDepthTuple> &worklist)

Process epsilon transitions for the current state.

Parameters:
  • state_depth_pair[in] Current state depth pair.

  • move[in] Move from current state.

  • worklist[out] Worklist of state and depth pairs to process.

void remove_inner_initial_and_final_states()

Remove inner initial and final states.

Remove all initial states for all segments but the first one and all final states for all segments but the last one.

Private Members

const std::set<Symbol> epsilons

Symbol for which to execute segmentation. Automaton to execute segmentation for. Must be a segment automaton (can be split into segments).

const SegNfa &automaton
EpsilonDepthTransitions epsilon_depth_transitions = {}

Epsilon depths.

EpsilonDepthTransitionMap eps_depth_trans_map = {}
std::vector<SegNfa> segments = {}

Epsilon depths with mapping of states to epsilon transitions.

Segments for automaton.

std::vector<SegNfa> segments_raw = {}

Raw segments for automaton.

VisitedEpsMap visited_eps = {}
struct StateDepthTuple

number of visited eps for each state

Pair of state and its depth.

Public Members

State state

State with a depth.

EpsilonDepth depth

Depth of a state.

VisitedEpsilonsCounterMap eps
struct TransducerNoodleElement
#include <strings.hh>

Public Functions

inline TransducerNoodleElement(std::shared_ptr<Nft> transducer, std::shared_ptr<Nfa> input_aut, unsigned input_index, std::shared_ptr<Nfa> output_aut, unsigned output_index)

Public Members

std::shared_ptr<Nft> transducer
std::shared_ptr<Nfa> input_aut
unsigned input_index
std::shared_ptr<Nfa> output_aut
unsigned output_index