
The ast module is the Python standard library's primary tool for working with Abstract Syntax Trees (ASTs). An AST is a tree representation of the code. It is abstract because it does not keep every detail of the actual syntax (such as parentheses or whitespace); it models the structure and the concepts of the code instead. ASTs can help you understand how Python works.

ASTs are often used in IDEs, custom interpreters, static code analyzers, and other testing tools that automatically find code flaws and errors. AST analysis can help to comprehend why something went wrong, though it is not a standard debugging technique. In this topic, we'll look a little closer at ASTs using the ast module.

Abstract syntax trees

Any code consists of characters. On their own, they mean nothing to the computer in terms of program execution, so they need to be translated into commands it can carry out. As you know, Python is an interpreted programming language, so it comes with an interpreter. The interpreter is a special program that translates the code you've written into a form that can be executed. In a nutshell, this is how Python code turns into the code that actually runs:

  1. First, your code is split into pieces called tokens: keywords, operators, delimiters, and so on.

  2. Then the parser builds an AST from the tokens according to the Python grammar. An AST is a collection of nodes and edges (links) between them. For example, the AST of expression = 1 + 2 has an Assign node whose value is a BinOp node holding the operands 1 and 2. Assign and BinOp are node classes that we'll cover later in this topic:

    [Figure: AST working principle]

  3. After that, the compiler turns the AST into bytecode, which the Python virtual machine executes.
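The three steps above can be roughly reproduced from Python itself. This is only a sketch: the tokenize module and the built-in compile() are stand-ins for what the interpreter does internally, and the exact bytecode printed by dis differs between Python versions.

```python
import ast
import dis
import io
import tokenize

source = "expression = 1 + 2\n"

# Step 1: split the source text into tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
# Drop the tokens that carry no visible text (NEWLINE, ENDMARKER).
token_strings = [tok.string for tok in tokens if tok.string.strip()]
print(token_strings)  # ['expression', '=', '1', '+', '2']

# Step 2: build the AST from the source.
tree = ast.parse(source)
print(type(tree.body[0]).__name__)  # Assign

# Step 3: compile the AST into a code object that holds the bytecode.
code = compile(tree, filename="<example>", mode="exec")
dis.dis(code)  # prints the bytecode instructions (version-dependent)

# Executing the code object runs the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["expression"])  # 3
```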

The ast helpers: parsing

Now, let's switch to the ast module. We will start with utility functions and classes. They are called helpers. We'll only look at some of them. Take a look at the full list in the official documentation if you're interested in details.

The first helper is ast.parse(). It takes the source code and parses it into an AST node:

import ast

expression = "1 + 2"
tree = ast.parse(expression)

print(tree)  # <ast.Module object at 0x000001F064C3CA08>

Keep in mind that the tree is actually a node, the root node, to be specific. You're probably disappointed by the printed value because it doesn't really show anything of importance. Don't worry! Use ast.dump() to print the actual AST (the indent argument is available since Python 3.9):

print(ast.dump(tree, indent=4))

# Module(
#   body=[
#       Expr(
#           value=BinOp(
#               left=Constant(value=1),
#               op=Add(),
#               right=Constant(value=2)))],
#   type_ignores=[])

See? Here it is! Nodes constitute the tree. A tree normally starts with the Module node, which is the root. It has a body attribute, a list that contains the top-level nodes, and each of them holds its own child nodes and attributes. In our case, we have the Expr (expression) node with the BinOp (binary operation) node as its value.
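Since every node is an ordinary Python object, the attributes shown in the dump can be followed step by step. A small sketch for the same "1 + 2" tree (the variable names expr_node and bin_op are just illustrative):

```python
import ast

tree = ast.parse("1 + 2")

expr_node = tree.body[0]   # the Expr statement, the only item in the body
bin_op = expr_node.value   # the BinOp node it wraps

print(type(bin_op.op).__name__)               # Add
print(bin_op.left.value, bin_op.right.value)  # 1 2
```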

Instances of ast.expr and ast.stmt subclasses have two handy attributes, lineno and end_lineno, which store the first and the last line number of the corresponding source code, respectively (the numbering starts at 1). Let's assume you have a file called calculations.py with the following code:

expr = 1 + 2
expr_2 = 3 + 4
result = expr * expr_2
print(result, 'This is '
      'result.')

You want to analyze it using the ast module. Here's how you can check which lines each node spans and maybe even use this information later on in your work:

# read the file and close it automatically afterwards
with open('calculations.py') as f:
    source = f.read()
tree = ast.parse(source)

for n in tree.body:
    print(n, n.lineno, n.end_lineno)

    
# <ast.Assign object at 0x000001FF4FA63CD0> 1 1
# <ast.Assign object at 0x000001FF4FA63BE0> 2 2
# <ast.Assign object at 0x000001FF4FA63EB0> 3 3
# <ast.Expr object at 0x000001FF4FA61780> 4 5
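The line numbers can also be combined with ast.get_source_segment() (available since Python 3.8), which returns the exact source text of a node. The sketch below keeps the contents of calculations.py in a string instead of reading the file, purely so that it can be run on its own:

```python
import ast

# Same code as in calculations.py, kept inline for a self-contained example.
source = (
    "expr = 1 + 2\n"
    "expr_2 = 3 + 4\n"
    "result = expr * expr_2\n"
    "print(result, 'This is '\n"
    "      'result.')\n"
)
tree = ast.parse(source)

last = tree.body[-1]  # the print(...) call that spans two lines
print(last.lineno, last.end_lineno)  # 4 5

segment = ast.get_source_segment(source, last)
print(segment)
# print(result, 'This is '
#       'result.')
```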

The ast helpers: visiting nodes

If you just need a list without any particular structure, take a look at the ast.walk() helper. It's, in fact, a generator, so you can print the values this way:

tree = ast.parse("1 + 2")  # back to the tree from the first example
nodes = ast.walk(tree)
# not what we want:
print(nodes)  # <generator object walk at 0x000001F064C2CBC8>

for n in nodes:
    print(n)

# <ast.Module object at 0x000001FF4FA61A80>
# <ast.Expr object at 0x000001FF4FA61AB0>
# <ast.BinOp object at 0x000001FF4FA60370>
# <ast.Constant object at 0x000001FF4FA603A0>
# <ast.Add object at 0x000001FF4F7F98D0>
# <ast.Constant object at 0x000001FF4FA611E0>

It recursively yields all descendant nodes in the tree starting at the given node (the node itself is included), in no specified order, so it is useful if you want to inspect or modify nodes in place and don't care about their context.
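Because the order does not matter here, ast.walk() fits tasks such as counting node types. A minimal sketch (the sample expression is made up for illustration):

```python
import ast
from collections import Counter

tree = ast.parse("result = (1 + 2) * (3 + 4)")

# Count how many nodes of each class appear anywhere in the tree.
counts = Counter(type(node).__name__ for node in ast.walk(tree))

print(counts["BinOp"])     # 3
print(counts["Constant"])  # 4
```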

There are a couple of alternatives to the ast.walk() helper, though. The first one is the ast.NodeVisitor class. It walks the tree and calls a visitor method for every node it finds. To use it, subclass it and define methods named visit_<ClassName>, where <ClassName> is the name of the node class you want to handle (we will discuss node classes in detail a bit later):

class BinOpLister(ast.NodeVisitor):
    def visit_BinOp(self, node):
        print(node.left)
        print(node.op)
        print(node.right)
        self.generic_visit(node)

        
BinOpLister().visit(tree)
        
# <ast.Constant object at 0x000001FF4FA603A0>
# <ast.Add object at 0x000001FF4F7F98D0>
# <ast.Constant object at 0x000001FF4FA611E0>

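Here we have defined visit_BinOp(), so the visitor reacts only to BinOp nodes and prints the left operand, the operator, and the right operand. The self.generic_visit(node) call continues the traversal into the node's children, so nested BinOp nodes are handled as well. This approach is helpful if you only need to process specific node types.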
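To see why the generic_visit() call matters, consider an expression with a nested operation. The following sketch uses a hypothetical OperatorCollector class that stores operator names in a list instead of printing them:

```python
import ast


class OperatorCollector(ast.NodeVisitor):
    """Collects the operator names of all BinOp nodes in a tree."""

    def __init__(self):
        self.operators = []

    def visit_BinOp(self, node):
        self.operators.append(type(node.op).__name__)
        # Keep descending, otherwise nested BinOp nodes are never visited.
        self.generic_visit(node)


collector = OperatorCollector()
collector.visit(ast.parse("1 + 2 * 3"))
print(collector.operators)  # ['Add', 'Mult']
```

Without the generic_visit() call, the traversal would stop at the outer BinOp and the list would contain only 'Add'.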

The second option is ast.NodeTransformer, which works similarly but allows you to modify the visited nodes of the tree. We won't consider it thoroughly now; you can find more detailed information about the tool in the docs.
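Just to give a feel for it, here is a minimal sketch of a transformer. The AddToSub class is a made-up example that replaces every addition with a subtraction; ast.unparse() requires Python 3.9 or newer.

```python
import ast


class AddToSub(ast.NodeTransformer):
    """Hypothetical transformer: turns every a + b into a - b."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node  # the returned node replaces the visited one


tree = ast.parse("1 + 2", mode="eval")
new_tree = AddToSub().visit(tree)
ast.fix_missing_locations(new_tree)  # fill in line info for new nodes

print(ast.unparse(new_tree))  # 1 - 2
result = eval(compile(new_tree, filename="<example>", mode="eval"))
print(result)  # -1
```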

The ast helpers: literal_eval()

There's another helper that you may be interested in. Imagine you have a program that works with user input, and the input must be an integer. Even if you've stated that in the docs, you can't be sure that all users will comply, so you need a safeguard. You can use ast.literal_eval(), which safely evaluates a string containing a Python literal (a number, string, bytes, tuple, list, dict, set, boolean, or None) and returns the corresponding object without executing arbitrary code. If the string is not a valid literal, it raises an exception such as ValueError or SyntaxError. Just look at the following:

user_input = "15"
print(type(user_input))  # <class 'str'>

check_user_input = ast.literal_eval(user_input)
print(type(check_user_input))  # <class 'int'>
# awesome, right?
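In a real program, the input may not be a valid literal at all, or it may be a literal of another type. One possible way to handle this is sketched below; parse_int() is a hypothetical helper, and it catches only ValueError and SyntaxError, the two most common failures:

```python
import ast


def parse_int(text):
    """Return the integer written in text, or None if it is not an integer."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None  # not a valid Python literal
    # bool is a subclass of int, so True/False are rejected explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None


print(parse_int("15"))      # 15
print(parse_int("[1, 2]"))  # None (a valid literal, but not an integer)
print(parse_int("hello"))   # None (a bare name is not a literal)
print(parse_int("2 +"))     # None (syntax error)
```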

The ast nodes

Alright, helpers are out of the way. Let's now turn to nodes. Each node is a construct that describes a part of the source code. In the ast module, they are divided into classes, and most of them also have attributes that store the most useful information. For instance, the Import(names) class describes the imported parts of your code, and the names attribute stores their names.

Below is a shortlist of the most common node classes with some of their attributes:

  • literals: Constant(value), List(elts), Set(elts), Dict(keys, values);

  • variables: Name(id, ctx), where ctx is one of the Load, Store, or Del contexts;

  • expressions: Expr(value), BinOp(left, op, right), Call(func, args);

  • statements: Assign(targets, value), Raise(exc, cause);

  • imports: Import(names), ImportFrom(module, names);

  • control flow: If(test, body, orelse), For(target, iter, body, orelse), While(test, body, orelse), Break, Continue, Try(body, handlers, orelse, finalbody), ExceptHandler(type, name, body);

  • functions: FunctionDef(name, args, body, returns), Lambda(args, body), Return(value), Yield(value);

  • classes: ClassDef(name, bases, keywords, body).

Is anything looking familiar? We hope so! We have already seen Constant, Assign, and BinOp in the very first example and some other nodes later on.

This list isn't exhaustive; you can refer to the official documentation for more detailed information.
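If you forget which attributes a node class has, you can ask the class itself: every node class has a _fields tuple with the names of its child fields. A short sketch for a few classes from the list above:

```python
import ast

for node_class in (ast.BinOp, ast.If, ast.Import, ast.Raise):
    print(node_class.__name__, node_class._fields)

# BinOp ('left', 'op', 'right')
# If ('test', 'body', 'orelse')
# Import ('names',)
# Raise ('exc', 'cause')

# Node classes also form a hierarchy: expressions derive from ast.expr,
# statements from ast.stmt.
print(issubclass(ast.BinOp, ast.expr))  # True
print(issubclass(ast.If, ast.stmt))     # True
```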

The ast nodes: example

It's example time! Imagine someone sent you a pile of scripts to check whether they comply with PEP 8. You open the first one, called my_func.py, with the following code:

def greet(user_name):
    print("Hello, world!")
    print("Hello, ", user_name, "!", sep="")

user = "Mary"

greet(user)

It is small and easy to check. But there are a lot of them! You decide to do the checking automatically. Of course, there are a lot of conventions to consider, but let's see what the part responsible for argument names might look like. Suppose you have a function that checks a name, but how do you extract the names? ast knows how to do it:

# read the file and close it automatically afterwards
with open("my_func.py") as f:
    script = f.read()
tree = ast.parse(script)

print(ast.dump(tree, indent=4))

# Module(
#    body=[
#        FunctionDef(
#            name='greet',
#            args=arguments(
#                posonlyargs=[],
#                args=[
#                    arg(arg='user_name')],
#                kwonlyargs=[],
#                kw_defaults=[],
#                defaults=[]),
#            body=[
#                Expr(
#                    value=Call(
#                        func=Name(id='print', ctx=Load()),
#                        args=[
#                            Constant(value='Hello, world!')],
#                        keywords=[])),
#                Expr(
#                    value=Call(
#                        func=Name(id='print', ctx=Load()),
#                        args=[
#                            Constant(value='Hello, '),
#                            Name(id='user_name', ctx=Load()),
#                            Constant(value='!')],
#                        keywords=[
#                            keyword(
#                                arg='sep',
#                                value=Constant(value=''))]))],
#            decorator_list=[]),
#        Assign(
#            targets=[
#                Name(id='user', ctx=Store())],
#            value=Constant(value='Mary')),
#        Expr(
#            value=Call(
#                func=Name(id='greet', ctx=Load()),
#                args=[
#                    Name(id='user', ctx=Load())],
#                keywords=[]))],
#    type_ignores=[])

Okay, that looks pretty confusing... but only at first sight. We see that our tree starts with the Module node that has the body attribute containing all other nodes. So, at first, we access the body and, since FunctionDef is the first element, we use the 0 index to access the attributes where the arguments are stored.

function = tree.body[0]

Now, let's take a look at how the FunctionDef node is organized. First of all, we need its args attribute, which holds an arguments node. That node, in turn, has its own args attribute: a list of arg nodes, one for each regular (positional-or-keyword) parameter. The name of each parameter is stored in the arg attribute of its node, so we collect the names in a list and print it:

args = [a.arg for a in function.args.args]

print(args)  # ['user_name']

By doing so, we receive a list of argument names and can proceed to check them.
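Putting it together, a checker for a whole script might look like the sketch below. Both is_snake_case() and find_bad_argument_names() are hypothetical helpers, the regular expression is a simplified reading of the PEP 8 naming convention, and only regular parameters of ordinary (non-async) functions are inspected:

```python
import ast
import re

# Simplified assumption: lowercase letters, digits, and underscores only.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def is_snake_case(name):
    """Hypothetical name check standing in for a real PEP 8 rule."""
    return bool(SNAKE_CASE.match(name))


def find_bad_argument_names(source):
    """Return (function name, argument name, line) for each bad argument."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for a in node.args.args:
                if not is_snake_case(a.arg):
                    problems.append((node.name, a.arg, a.lineno))
    return problems


script = (
    "def greet(user_name):\n"
    "    print('Hello,', user_name)\n"
    "\n"
    "def shout(userName):\n"
    "    print(userName.upper())\n"
)
print(find_bad_argument_names(script))  # [('shout', 'userName', 4)]
```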

Since it might be tricky to clearly understand what node is where in the tree and why, we strongly recommend using an AST visualization tool that makes trees easier to read and sometimes even lets you interact with them. To see how it works, copy the piece of code with the greet() function (or any code you like or have difficulties understanding), paste it into the input area of the tool, and look at the resulting tree.

Conclusion

In this topic, we've learned several things:

  • how a computer actually executes the code you write;

  • what ASTs are, how to build them, and what their purpose is;

  • what ast helpers are and how to call them;

  • what ast nodes are, what classes of nodes are there, what attributes they have, and how to use them.

Now you're ready to build and extract information from ASTs. Don't forget about the practical tasks!
