Python 2 compatibility and AST matching limitations

Aura is designed to process the input AST independently of the Python version it is installed under. This is done by configuring multiple interpreters in the main Aura configuration, such as the paths to Python 2.7 and Python 3+. When Aura encounters Python source code, it goes through the configured list of interpreters to find one that can parse it. This is accomplished by injecting a special script into the interpreter; the script parses the input source code using the native Python ast library, serializes the result to JSON, and sends it back to Aura for analysis. By default, the source code is parsed using Python 3+ syntax first, with a fallback to the Python 2.7 interpreter. With this approach, Aura can parse and analyze ASTs that come from different Python versions, such as 2.7 and 3+, and perform semantic matching over the resulting tree.
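The fallback described above can be sketched roughly as follows. This is a minimal illustration and not Aura's actual implementation: the names PARSER_SCRIPT and parse_with_fallback, the JSON layout (a "_type" key per node) and the interpreter paths in the usage note are all assumptions made for the example.

```python
import json
import subprocess
import sys

# Hypothetical script executed inside each candidate interpreter. It is written
# to be valid under both Python 2.7 and Python 3: it reads source code from
# stdin, parses it with that interpreter's own ast module and prints the tree
# as JSON. A syntax error is reported through a non-zero exit code.
PARSER_SCRIPT = r'''
import ast, json, sys

def to_json(node):
    if isinstance(node, ast.AST):
        out = {"_type": type(node).__name__}
        for name, value in ast.iter_fields(node):
            out[name] = to_json(value)
        return out
    if isinstance(node, list):
        return [to_json(item) for item in node]
    if node is None or isinstance(node, (bool, int, float, str)):
        return node
    return repr(node)  # bytes, complex numbers, etc. are kept as text

try:
    tree = ast.parse(sys.stdin.read())
except SyntaxError:
    sys.exit(1)
json.dump(to_json(tree), sys.stdout)
'''


def parse_with_fallback(source, interpreters):
    """Return (interpreter, tree) for the first interpreter that parses `source`."""
    for interpreter in interpreters:
        try:
            proc = subprocess.run(
                [interpreter, "-c", PARSER_SCRIPT],
                input=source.encode("utf-8"),
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
            )
        except OSError:
            continue  # no interpreter at this path, try the next one
        if proc.returncode == 0:
            return interpreter, json.loads(proc.stdout.decode("utf-8"))
    raise ValueError("no configured interpreter could parse the source")
```

A call such as parse_with_fallback(code, [sys.executable, "/usr/bin/python2.7"]) would try the current interpreter first and the Python 2.7 one second; the second path is only a placeholder and has to match the local installation.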

This, however, comes with several limitations. Semantic matching of Python signatures is done by comparing two ASTs: the one from the input source code and the one produced by compiling the semantic signature. For performance reasons, the signature's AST is always produced by the Python interpreter under which Aura is installed. This means that in some cases Aura compares ASTs produced by vastly different Python versions, namely 2 and 3, which can cause semantic signatures to miss. A trivial example is the infamous print "Hello world" vs. print("Hello world"). In Python 2, print is a statement with its own keyword and its own AST node, whereas in Python 3, print is just a function, so the call produces the AST of an ordinary function call.
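To make the print mismatch concrete, the following sketch inspects what the standard ast module produces under Python 3. It uses only the standard library; the variable names are chosen for illustration.

```python
import ast

# Under Python 3 the call form is an expression statement wrapping a Call
# whose callee is a plain Name node.
stmt = ast.parse('print("Hello world")').body[0]
node_types = (
    type(stmt).__name__,             # Expr
    type(stmt.value).__name__,       # Call
    type(stmt.value.func).__name__,  # Name
)

# The Python 2 statement form is not valid Python 3 syntax at all.
try:
    ast.parse('print "Hello world"')
    py2_form_parses = True
except SyntaxError:
    py2_form_parses = False

print(node_types, py2_form_parses)
```

Under Python 2.7, both spellings (without a print_function future import) parse into a dedicated Print node instead, so a signature compiled to Expr/Call/Name under Python 3 has no structurally matching counterpart in that tree.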

It is possible to work around this limitation by transforming the AST from the lower Python version (2) into the higher one, which is in fact what the built-in lib2to3 does, but this has its own set of limitations, as described in the official Python 2to3 docs. We considered integrating this 2to3 translation into Aura to increase compatibility and prevent missed matches, but with the recent end of life of Python 2.7, we decided not to implement this feature.

When it comes to scanning and comparing ASTs under the same major Python version, such as 3.7, 3.8 and 3.9, the situation is much simpler. The framework tries to hide the differences in the AST (if any) between these versions, so from the end user's perspective there should not be any differences in pattern matching capabilities. Users are still strongly encouraged to use the highest possible Python version (3.8 or even 3.9), which produces more information. For example, “end line number” identifiers were not produced before Python 3.8. Their absence doesn’t impact the scanning capabilities, but having them produces much nicer results because Aura can precisely pinpoint the location of the code.
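One way to hide such a difference is to treat the end position as optional. The sketch below is an assumption about how this could look, not Aura's own code; the helper name location is hypothetical.

```python
import ast


def location(node):
    """Return (start_line, end_line); end_line is None if the parser did not record it."""
    return node.lineno, getattr(node, "end_lineno", None)


# A call spread over four lines; only the parse result matters here.
source = "result = compute(\n    1,\n    2,\n)\n"
call = ast.parse(source).body[0].value
start, end = location(call)
print(start, end)
```

On interpreters that record end positions, the call above is reported as spanning lines 1 to 4; on older ones, only the start line is available and the matching itself is unaffected.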