# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['pyjapt']

package_data = \
{'': ['*']}

setup_kwargs = {
    'name': 'pyjapt',
    'version': '0.2.5',
    'description': 'Just Another Parsing Tool Written in Python',
    'long_description': '# Lexer and LR parser generator "PyJapt"\n\n## Installation\n\n```\npip install pyjapt\n```\n\n## PyJapt\n\nPyJapt is a lexer and parser generator. It was developed not only to build these pieces of the compilation process, but also to offer a custom interface for handling syntactic and lexicographic errors. Its design is inspired by other parser generators such as yacc, bison, ply and antlr.\n\nPyJapt revolves around the concept of a grammar.\n\nTo define the nonterminals of the grammar we use the `add_non_terminal()` method of the `Grammar` class.\n\n```python\nfrom pyjapt import Grammar\n\ng = Grammar()\n\nexpr = g.add_non_terminal(\'expr\', start_symbol=True)\nterm = g.add_non_terminal(\'term\')\nfact = g.add_non_terminal(\'fact\')\n```\n\nTo define the terminals of our grammar we use the `add_terminal()` method of the `Grammar` class. This method receives the name of the terminal as the first parameter and, optionally, a regular expression for the lexicographic analyzer as the second. If the second parameter is not provided, the regular expression will be the literal name of the terminal.\n\n```python\nplus = g.add_terminal(\'+\')\nminus = g.add_terminal(\'-\')\nstar = g.add_terminal(\'*\')\ndiv = g.add_terminal(\'/\')\n\nnum = g.add_terminal(\'int\', regex=r\'\\d+\')\n```\n\nIf we have a set of terminals whose regular expression matches their own name we can declare them together with the `add_terminals()` method of the `Grammar` class.\n\n```python\nplus, minus, star, div = g.add_terminals(\'+ - * /\')\nnum = g.add_terminal(\'int\', regex=r\'\\d+\')\n```\n\nWe may also want to apply a rule when a specific terminal is found. For this PyJapt provides the `terminal()` decorator of the `Grammar` class, which receives the terminal name and its regular expression. 
The decorated function must receive as a parameter a reference to the lexer, so that it can modify values such as the row and column of the terminals or the current reading position, and it must return a `Token`. If the token is not returned, it is ignored.\n\n```python\n@g.terminal(\'int\', r\'\\d+\')\ndef id_terminal(lexer):\n    lexer.column += len(lexer.token.lex)\n    lexer.position += len(lexer.token.lex)\n    lexer.token.lex = int(lexer.token.lex)\n    return lexer.token\n```\n\nWe can also use this form of terminal definition to skip certain characters or tokens: we just omit the return statement in the function.\n\n```python\n##################\n# Ignored Tokens #\n##################\n@g.terminal(\'newline\', r\'\\n+\')\ndef newline(lexer):\n    lexer.lineno += len(lexer.token.lex)\n    lexer.position += len(lexer.token.lex)\n    lexer.column = 1\n\n\n@g.terminal(\'whitespace\', r\' +\')\ndef whitespace(lexer):\n    lexer.column += len(lexer.token.lex)\n    lexer.position += len(lexer.token.lex)\n\n\n@g.terminal(\'tabulation\', r\'\\t+\')\ndef tab(lexer):\n    lexer.column += 4 * len(lexer.token.lex)\n    lexer.position += len(lexer.token.lex)\n```\n\nTo define the productions of our grammar we can use an attributed or an unattributed form:\n\n```python\n# This is an unattributed grammar using previously declared variables\nexpr %= expr + plus + term\nexpr %= expr + minus + term\nexpr %= term\n\nterm %= term + star + fact\nterm %= term + div + fact\nterm %= fact\n\nfact %= num\n\n# A little easier to read...\n# Each symbol in the production string must be separated by a blank space\nexpr %= \'expr + term\'\nexpr %= \'expr - term\'\nexpr %= \'term\'\n\nterm %= \'term * fact\'\nterm %= \'term / fact\'\nterm %= \'fact\'\n\nfact %= \'int\'\n\n# This is an attributed grammar\nexpr %= \'expr + term\', lambda s: s[1] + s[3]\nexpr %= \'expr - term\', lambda s: s[1] - s[3]\nexpr %= \'term\', lambda s: s[1]\n\nterm %= \'term * fact\', lambda s: s[1] * s[3]\nterm %= 
\'term / fact\', lambda s: s[1] // s[3]\nterm %= \'fact\', lambda s: s[1]\n\nfact %= \'int\', lambda s: int(s[1])\n\n# We can also use a function to define a semantic rule.\n# The function receives a parameter `s`, which is a reference to a\n# special list with the semantic rules of each symbol of the production.\n# To separate the head of the production from its body\n# use the symbol `->`\n@g.production(\'expr -> expr + term\')\ndef expr_prod(s):\n    print(\'Adding an expression and a term ;)\')\n    return s[1] + s[3]\n\n# We can also assign a rule to many productions with this decorator\n# Please ignore the fact this grammar is ambiguous and pyjapt doesn\'t have support for it ... yet ;)\n@g.production(\'expr -> expr + expr\',\n              \'expr -> expr - expr\',\n              \'expr -> expr * expr\',\n              \'expr -> expr / expr\',\n              \'expr -> ( expr )\',\n              \'expr -> int\')\ndef expr_prod(s):\n    if len(s) != 2:\n        # Binary productions: s[1] and s[3] are the operands, s[2] is the operator\n        if s[2] == \'+\':\n            return s[1] + s[3]\n\n        if s[2] == \'-\':\n            return s[1] - s[3]\n\n        if s[2] == \'*\':\n            return s[1] * s[3]\n\n        if s[2] == \'/\':\n            return s[1] // s[3]\n\n        # Parenthesized expression: s[2] is the inner expr\n        return s[2]\n    return int(s[1])\n```\n\nTo generate the lexer and the parser of the grammar we use the `get_lexer` and `get_parser` methods respectively. `get_parser` receives a string argument that selects the type of parser to use. 
Valid strings are `\'slr\'`, `\'lr1\'`, and `\'lalr1\'`.\n\n```python\ng.get_lexer()\ng.get_parser(\'slr\')\n```\n\nFinally, here is an example of the entire pipeline, written in the way we consider the most readable and comfortable, describing a grammar for the language of arithmetic expressions:\n\n```python\nfrom pyjapt import Grammar\n\ng = Grammar()\nexpr = g.add_non_terminal(\'expr\', True)\nterm, fact = g.add_non_terminals(\'term fact\')\ng.add_terminals(\'+ - / * ( )\')\ng.add_terminal(\'int\', regex=r\'\\d+\')\n\n@g.terminal(\'whitespace\', r\' +\')\ndef whitespace(_lexer):\n    _lexer.column += len(_lexer.token.lex)\n    _lexer.position += len(_lexer.token.lex)\n\nexpr %= \'expr + term\', lambda s: s[1] + s[3]\nexpr %= \'expr - term\', lambda s: s[1] - s[3]\nexpr %= \'term\', lambda s: s[1]\n\nterm %= \'term * fact\', lambda s: s[1] * s[3]\nterm %= \'term / fact\', lambda s: s[1] // s[3]\nterm %= \'fact\', lambda s: s[1]\n\nfact %= \'( expr )\', lambda s: s[2]\nfact %= \'int\', lambda s: int(s[1])\n\nlexer = g.get_lexer()\nparser = g.get_parser(\'slr\')\n\nprint(parser(lexer(\'(2 + 2) * 2 + 2\'))) # prints 10\n```\n\n## Serialization\n\nWhen the grammar is large enough the parser construction process can be a bottleneck. To solve this problem PyJapt can serialize the parser and lexer generated from the grammar into two files, `parsetab.py` and `lexertab.py`. 
These files must refer to the instance of the original grammar.\n\nTo serialize, we need the name of the variable that contains the `Grammar` instance, the name of the module where the grammar was written, and the class names that the serialized lexer and parser will have.\n\n```python\nimport inspect\nfrom pyjapt import Grammar\n\ng = Grammar()\n\n# ...\n\nif __name__ == \'__main__\':\n    module_name = inspect.getmodulename(__file__)\n    g.serialize_lexer(class_name=\'MyLexer\', grammar_module_name=module_name, grammar_variable_name=\'g\')\n    g.serialize_parser(parser_type=\'slr\', class_name=\'MyParser\', grammar_module_name=module_name, grammar_variable_name=\'g\')\n```\n\n## Lexer Construction\n\nAlthough it seems simple, there are some things to keep in mind when defining the terminals of our grammar. When detecting a token, the lexer first tries the terminals that were marked with the `terminal` decorator of the `Grammar` class (or have a rule assigned), in their order of appearance; then those whose regular expression does not match their identifier; and finally the literal terminals (those whose regular expression matches their identifier). 
In the last two cases, the order is given by the size of the regular expression (largest first).\n\nIf your language has keywords and identifiers, a good way to avoid the largest-regular-expression rule is to assign a rule to the identifier terminal:\n\n```python\nfrom pyjapt import Grammar\n\n\ng = Grammar()\n\nkeywords = g.add_terminals(\n    \'class inherits if then else fi while loop pool let in case of esac new isvoid true false not\')\nkeywords_names = {x.name for x in keywords}\n\n@g.terminal(\'id\', r\'[a-zA-Z_][a-zA-Z0-9_]*\')\ndef id_terminal(lexer):\n    lexer.column += len(lexer.token.lex)\n    lexer.position += len(lexer.token.lex)\n    if lexer.token.lex in keywords_names:\n        # modify the token type ;)\n        lexer.token.token_type = lexer.token.lex\n    return lexer.token\n```\n\n## Handling of lexicographic and syntactic errors\n\nAn important part of the parsing process is handling errors. One option is to write the parser by hand and insert the error reports ourselves, since techniques such as `Panic Recovery Mode`, which `PyJapt` implements, only keep the parser from stopping. To give specific error reports, `PyJapt` supports error productions that report common errors in a programming language, such as a missing `;`, an unknown operator, etc. 
For this, the grammar must enable the error terminal.\n\n```python\ng.add_terminal_error() # Add the error terminal to the grammar.\n\n# Example of a possible error production\n@g.production("instruction -> let id = expr error")\ndef attribute_error(s):\n    # This line reports the error message.\n    # The semantic rule of s[5] is the token itself (because it is the error terminal), so we have access\n    # to its lex, token type, line and column.\n    s.add_error(5, f"{s[5].line, s[5].column} - SyntacticError: Expected \';\' instead of \'{s[5].lex}\'")\n\n    # Returning a node of the AST lets us keep going and\n    # detect semantic errors despite syntactic errors\n    return LetInstruction(s[2], s[4])\n```\n\nTo report lexicographic errors the procedure is quite similar: we define a terminal that matches the erroneous input, in this example a multi-line comment that reaches the end of the file without being closed.\n\n```python\n@g.terminal(\'comment_error\', r\'\\(\\*(.|\\n)*$\')\ndef comment_eof_error(lexer):\n    lexer.contain_errors = True\n    lex = lexer.token.lex\n    for s in lex:\n        if s == \'\\n\':\n            lexer.lineno += 1\n            lexer.column = 0\n        # advance one column per character (the original reset it to 1 every time)\n        lexer.column += 1\n    lexer.position += len(lex)\n    lexer.add_error(f\'{lexer.lineno, lexer.column} -LexicographicError: EOF in comment\')\n```\n\nTo report general errors during the tokenization process we can use the `lexical_error` decorator.\n\n```python\n@g.lexical_error\ndef lexical_error(lexer):\n    lexer.add_error(lexer.lineno, lexer.column, f\'{lexer.lineno, lexer.column} -LexicographicError: ERROR "{lexer.token.lex}"\')\n    lexer.column += len(lexer.token.lex)\n    lexer.position += len(lexer.token.lex)\n```\n\n## Credits\n\nFor recommendations or bug reports please write to alejandroklever.workon@gmail.com.\n',
    'author': 'Alejandro Klever',
    'author_email': 'alejandroklever4197@gmail.com',
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/alejandroklever/PyJapt',
    'packages': packages,
    'package_data': package_data,
    'python_requires': '>=3.8,<4.0',
}


setup(**setup_kwargs)
