Configuration

Aura is configured using two YAML files. The first is the main framework configuration, which controls the behavior of the framework, such as the scoring system, restrictions, and the interpreters used to parse source code. The second is a list of semantic signatures and patterns to look for inside the scanned files. The framework is bundled with a default set of configuration files, which can be found under <repository_root>/aura/data.

CLI options

Aura scan

-v: Verbose. By default, detections marked as informational (such as the file size of the input data) are hidden from the output. Providing this flag includes them in the specified output format. The flag can be given more than once; the second verbosity level also outputs detections filtered as false positives, such as AST parse errors for string blobs.
-a: Analyzer name, can be specified more than once. Overrides the default list of analyzers that are run on the input data.
-f: Output format
--min-score: Aura does not write any output if the total score of the scanned input data is below this threshold.
--benchmark: Run aura scan in benchmark mode. This is intended for development purposes, as it runs Python's cProfile profiler during the scan.
--benchmark-sort: Sort the benchmark profile by the given statistics name.
--async: Run aura in async mode. After the input files are preprocessed, a separate process is forked to scan each input file. This increases scan speed in some circumstances, especially on multi-core machines.
--no-async: Disable the aura async mode.
-t: Filter output detections to only the given tags. Detections that contain a given tag can instead be excluded by prefixing the tag with an exclamation mark, e.g. “!test-code”.
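
The effect of the -t tag filter and the --min-score threshold described above can be sketched in a few lines of Python. This is only an illustration, not Aura's implementation: the `Detection` tuple and the `filter_by_tags` and `should_output` helpers are hypothetical, and the rule that a detection must carry at least one of the included tags is an assumption.

```python
from typing import Iterable, List, NamedTuple, Set


class Detection(NamedTuple):
    """Hypothetical stand-in for a single scan detection."""
    message: str
    score: int
    tags: Set[str]


def filter_by_tags(detections: Iterable[Detection], tag_filters: Iterable[str]) -> List[Detection]:
    """Keep detections that match the tag filters (assumed semantics).

    Tags prefixed with "!" exclude a detection; plain tags, if any are
    given, require the detection to carry at least one of them.
    """
    filters = list(tag_filters)
    excluded = {t[1:] for t in filters if t.startswith("!")}
    included = {t for t in filters if not t.startswith("!")}

    kept = []
    for det in detections:
        if det.tags & excluded:
            continue  # carries an excluded tag
        if included and not (det.tags & included):
            continue  # carries none of the requested tags
        kept.append(det)
    return kept


def should_output(detections: Iterable[Detection], min_score: int) -> bool:
    """Output is written only if the total score reaches the threshold."""
    return sum(d.score for d in detections) >= min_score


detections = [
    Detection("eval usage", 100, {"behavior:code_execution"}),
    Detection("eval usage in tests", 100, {"behavior:code_execution", "test-code"}),
    Detection("possible URL", 0, {"behavior:url"}),
]
without_tests = filter_by_tags(detections, ["!test-code"])
```

With the `!test-code` filter, the second detection is dropped and the remaining two are kept.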

Scan output format options

The following options are present for each of the built-in output URI formats:

fmt://location?min_score=0&verbosity=1&tags=tag1,!tag2

fmt - output format, can be either text, json or sqlite
location - output location (file). If given, the output is written to this path instead of stdout. Required for the sqlite output format
min_score - same as the --min-score CLI option for aura scan; the minimum total score required for output to be produced
tags - filter detections by the given tags, same as the -t CLI option
verbosity - output verbosity, same as the -v CLI option
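
To make the structure of such an output URI concrete, the following sketch splits one into its components using only the Python standard library. It is not Aura's own parser; the `parse_output_uri` helper and the shape of the returned dictionary are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlparse


def parse_output_uri(uri: str) -> dict:
    """Split an output URI of the form fmt://location?options (illustrative)."""
    parsed = urlparse(uri)
    query = parse_qs(parsed.query)

    # The location may land in the netloc, the path, or both
    location = (parsed.netloc + parsed.path) or None
    tags = query.get("tags", [""])[0]

    return {
        "format": parsed.scheme,
        "location": location,
        "min_score": int(query["min_score"][0]) if "min_score" in query else None,
        "verbosity": int(query["verbosity"][0]) if "verbosity" in query else None,
        # Tags stay as written, including the "!" exclusion prefix
        "tags": [t for t in tags.split(",") if t],
    }


options = parse_output_uri("json://out.json?min_score=0&verbosity=1&tags=tag1,!tag2")
```

Here `options["format"]` is `"json"`, `options["location"]` is `"out.json"`, and `options["tags"]` is `["tag1", "!tag2"]`.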

Aura diff

-f: Output format for the diff data
--detections: Enable scanning the diff data for detections. This is the same as running aura scan on the diff data. The detections are then diffed against each other in the same manner as files, e.g. added/removed/modified. This option allows Aura to semantically detect changes in the functionality and capabilities of the source code.
--no-detections: Disable the detections mode for diffed files.
--patch: Output the diff patch to show changes in text files
--no-patch: Disable the diff patch output
--output-same-renames: By default, files whose content is exactly the same and that differ only in name are filtered from the diff output. This flag disables that filtering, so that identical files that were only renamed are also displayed in the diff.
--no-same-renames: Default behavior; hide identical files that were only renamed from the diff output.
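
The distinction between renamed, modified, and unrelated files can be illustrated with a small sketch. Aura's actual similarity measure is not described in this document, so `difflib.SequenceMatcher` is used here purely as a stand-in; the `classify_pair` helper and its labels are hypothetical, and the default of 0.60 mirrors the `similarity_threshold` value in the `diff` section of the default configuration.

```python
from difflib import SequenceMatcher


def classify_pair(old_name: str, old_text: str, new_name: str, new_text: str,
                  similarity_threshold: float = 0.60) -> str:
    """Classify a pair of files from two diffed inputs (illustrative only)."""
    if old_text == new_text:
        # Identical content: either untouched or just renamed.
        # Renamed pairs are hidden by default (see --output-same-renames).
        return "unchanged" if old_name == new_name else "renamed"

    # Stand-in similarity metric; not necessarily what Aura uses
    ratio = SequenceMatcher(None, old_text, new_text).ratio()
    return "modified" if ratio >= similarity_threshold else "unrelated"


original = "import os\nprint(os.getcwd())\n"
extended = original + "print('x')\n"

renamed = classify_pair("a.py", original, "b.py", original)
modified = classify_pair("a.py", original, "a.py", extended)
unrelated = classify_pair("a.py", "aaaa", "b.py", "zzzz")
```

Pairs that fall below the threshold would be reported as separately added and removed files rather than as a modification.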

Diff output format options

The following options are present for each of the built-in output URI formats:

fmt://location?detections=true&output_same_renames=true&patch=true

fmt - same meaning as for aura scan
location - same meaning as for aura scan
detections - enable/disable outputting detections diff
output_same_renames - enable/disable output of files that are identical and were only renamed, same as the --output-same-renames CLI option
patch - enable/disable output of a text diff patch

Main configuration file

This is the documented default configuration file:

aura: &aura_config
  log-level: info

  # You can enable/disable/filter python warnings here
  # It's useful to disable them if running a full PyPI repo scan or collecting data sets
  # https://docs.python.org/3.7/library/warnings.html#overriding-the-default-filter
  warnings: default

  # Directory location for caching, unset to disable caching
  cache_location: &cache_location ~/.aura_cache

  # Path to the location of the offline PyPI mirror
  # This option can be safely disabled if you don't have a local mirror, some advanced features require this
  mirror: /var/pypi_mirror/pypi/web/

  # Path to the yara rules for scanning data blobs
  # See example_rules.yara for documentation on format and examples
  yara-rules: &yara_rules aura.data.rules.yara

  # Path to the semantic rules used by python code analyzer
  semantic-rules: &semantic_rules aura.data.signatures.yaml

  # This file is needed for typosquatting detections
  pypi_stats: &pypi_stats pypi_stats.json

  reverse_dependencies: &reverse_dependencies reverse_dependencies.json

  # Threshold for package download after which the package is considered not legitimate
  pypi_download_threshold: 10000

  # Default minimum score for outputting scan hits
  # This can be overridden by a specific output type
  min-score: 10

  # You can enable/disable forking for async processing of files here
  # Async processing uses Python's multiprocessing module
  # It performs much better but is not very debugger friendly when developing new plugins
  # It is possible that some 3rd party plugins might require synchronous processing due to data pipelines
  async: false

  # Max width for the default text output
  text-output-width: auto

  # Limit heap size of the process
  # 4G
  rlimit-memory: 4294967296

  # Limit maximum file size the framework can create
  # This is also used as a limit when unpacking archive content to prevent for example zip bombs
  # 4G
  rlimit-fsize: 4294967296

  # When extracting an archive, limit the maximum file size that can be extracted
  # This is a safety mechanism for zip bombs
  # Falls back to rlimit-fsize if not configured
  # max-archive-size: 4294967296

  # You can limit the stack (python frames) recursion here
  # python-recursion-limit = 150
  # Aura recursively unpacks archives; this specifies the maximum depth of recursion
  max-depth: 5

  # Limit maximum numbers of files that can be processed during one scan
  max-files: 1000

  # Order of AST analysis stages to run on python source code
  ast-stages: &ast_stages
    - convert
    - rewrite
    - ast_pattern_matching
    - taint_analysis
    - readonly

  # Define a maximum number of iterations for a visitor
  # A new iteration is performed over the AST tree each time a property or a node of the AST tree is modified
  # These iterations are performed until the AST tree converges (e.g. no more modifications are performed) or a maximum number of iterations has been reached
  max-ast-iterations: 500

  # This is a prevention against infinite traversals/loops in the AST tree, which puts a hard limit on queue and discards any new node traversals above the limit
  # In case there is a bug, AST tree could be rewritten in a way that creates loops
  # In some rare cases, the source code could just be extremely big which prolongs the processing a lot, especially the taint analysis
  max-ast-queue-size: 100000

  # Minimum blob size (string or bytes) to be extracted from source code
  # into separate file for scanning. This means that if there is a str or bytes
  # inside the source code longer than X characters it will be extracted
  # and inserted to the data analysis pipeline
  min-blob-size: 100

  # Set preferred output format for cli commands that supports it
  # Supported formats: text, json
  output-format: json

  # If defined, a dedicated log file for exceptions and errors would be created
  error-log: "aura_errors.log"

  # Define the threshold of Shannon entropy for strings to be reported
  # TODO: be able to define both min & max
  shanon_entropy: 5.0

  # Always produce an informational `Detection` informing on all module imports in the source code
  # This will report any and all modules (code imports) even if there is no semantic pattern defined for them
  always_report_module_imports: false

  # Sort collected python files based on their imports by analyzing them via a directed graph
  # This will ensure that imported files will be analyzed before the files that are importing them
  sort_by_imports: false

  # List of tags to exclude from the aggregation, e.g. when summing up all tags in a scan
  exclude_aggregated_tags:
    - "misc:file_stats"


tags: &tags
  # Filter results that contain only the specified tags
  # Results can also be excluded using "!" to prefix a tag
  # This list is used for default tag filtering and can optionally be overridden via the cli parameter(s) -t
  #- "!test-code"


diff: &diff
  # Threshold after which files are considered to be similar/modified
  similarity_threshold: 0.60
  # Max depth of files to consider for pairing similar files
  # Increasing this significantly impacts performance of fuzzy matching potentially similar files
  depth_limit: 2


pypirc: &pypirc
  # Blacklist values to reduce false positives from the default configurations
  username_blacklist:
    - "empty"
    - "${PYPI_USERNAME}"
    - "None"
  password_blacklist:
    - "empty"
    - "${PYPI_PASSWORD}"
    - "..."
    - "None"
    - "<your test password goes here>"
    - "<your production password goes here>"


behavioral_analysis: &behavioral_analysis
  - name: Network access
    id: network_access
    description: "Code is accessing network and/or establishing network connections"
    tags:
      anyOf:
        - "behavior:public_network_interface"
        - "behavior:network"
        - "vuln:unverified_request"

  - name: System execution
    id: system_execution
    description: "Code is performing system commands execution"
    tags:
      "behavior:system_execution"

  - name: Code execution
    id: code_execution
    description: "Code is able to execute a python (byte)code from a payload or external location"
    tags:
      "behavior:code_execution"

  - name: Accessing files and directories
    id: file_access
    description: "Code is accessing/reading files and directories on a user computer"
    tags:
      "behavior:opening_files"

  - name: Windows OS
    id: windows
    description: "Code contains functionality specific to the Microsoft Windows OS"
    tags:
      "behavior:windows"

  - name: MacOS
    id: macos
    description: "Code contains functionality specific to the Apple Mac OS"
    tags:
      "behavior:macos"

  - name: Unix
    id: unix
    description: "Code contains functionality specific to the Unix/Posix style OS"
    tags:
      - "behavior:unix"

  - name: Low-level OS access
    id: low_level_access
    description: "Code is accessing low level OS functionalities using the ctypes interface"
    tags:
      - "behavior:ctypes"

  - name: Obfuscation
    id: obfuscation
    description: "Code contains functionality that is commonly used to obfuscate the behavior"
    tags:
      anyOf:
        - "behavior:obfuscation"
        - "behavior:accessing_variables"

  - name: Vulnerability
    id: vulnerability
    description: "Code may have one or more possible vulnerabilities or security problematic behavior"
    tags:
      allOf:
        - not:
            - "misc:test_code"
        - "vuln"

  - name: Possible Malware
    id: possible_malware
    description: "Code contains strong indications of a possible malware"
    tags:
      - "behavior:possible_malware"


severities: &severities
  critical:
    score: 100
    detections:
      - TaintAnomaly
      - LeakingSecret
  high:
    score: 50
    detections:
      - UnpinnedPackage
  medium:
    score: 30
    detections:
      - OutdatedPackage
  low:
    score: 10


score: &scores
  # Score assigned when a package contains a suspicious file such as python bytecode (*.pyc)
  contain-suspicious-file: 5

  # Score assigned when a package contains a sensitive file, such as an accidentally included .pypirc
  contain-sensitive-file: 100

  # You can adjust the following default values for other built-in analyzers here

  # Distribution contains a file whose checksum doesn't match the one listed in RECORDs
  # dist-invalid-record-checksum: 100

  # dist-records-missing: 100

  # Distribution contains a setup.py file
  # dist-contain-setup-py: 100

  # Distribution contains a file that is not listed in RECORDs
  # dist-file-not-listed-in-records: 10

  # There is a file listed in RECORDs but missing inside the distribution
  # dist-missing-file: 100

  # Valid base 64 blob was found as a string in a source code
  # base-64-blob: 0

  # Archive contains absolute path such as /etc/passwd
  # suspicious-archive-entry-absolute-path: 50

  # Archive contains a file with a parent reference such as ../../../../etc/passwd
  # suspicious-archive-entry-parent-reference: 50

  # File appears to be an archive but can't be opened successfully and/or is corrupted
  # corrupted-archive: 10

  # Archive contains a file greater than the configured maximum archive file size, e.g. zip bomb
  # archive-file-size-exceeded: 100

  # Archive contains a member that is a link. This can lead to a tarbomb or overwriting files outside the extraction directory
  # archive-member-is-link: 100

  # XML contains an entity which can be used for billion laughs attacks and similar
  # malformed-xml-entities: 100

  # XML contains a DTD
  # malformed-xml-dtd: 20

  # XML contains an external reference which can be used to access external resources including files on disk
  # malformed-xml-external-reference: 100

  # PyPI requirement is not pinned
  # requirement-unpinned: 10

  # PyPI requirement is outdated
  # requirement-outdated: 5

  # PyPI requirement is not valid, e.g. parsing failed
  # requirement-invalid: 0

  # PyPI requirement points to a remote URL
  # requirement-remote-url: 20

  # Variables, attributes and function definitions containing non-ascii characters
  non-ascii-tokens: 50

  # Leaking PyPIrc credentials
  # leaking_pypirc: 100

cache: &cache
  # Mode can be one of:
  # - "ask": ask if cache should be purged when running some commands (aura info, aura update ...)
  # - "auto": same as ask but it would purge the cache automatically without asking on specific commands
  # - "always": purge cache after every operation. This is same as "auto" + other non-standard operations including `aura scan`, `aura diff`
  mode: auto
  max-size: 1G
  expiration:
    default: 72  # 3 days

sbom: &sbom
  enabled: true
  licenses: "aura.data.license_specifiers.json"

interpreters: &interpreters
  # Configure python interpreters for parsing AST code
  # `python2` must point to the py2.7+ version (versions under 2.7 are not supported but might work)
  # `python3` must point to the py3.6+ or ideally py3.7+ due to compatibility
  # All other interpreters are optional, AST parsing will try them in the defined order

  native: native
  python2: python2

You can easily override or extend this configuration file by using YAML anchors. This native YAML feature lets you merge existing YAML documents, or parts of them, into a single document. Aura requires you to prefix the YAML configuration with `---`, which indicates that it is a multi-document file. When Aura detects this string at the beginning of the YAML configuration file, it injects the default configuration anchors into the document, allowing you to inherit the default configuration without copying and pasting the unchanged parts.

An example of YAML file overriding the configuration while preserving the default values:

---
# Make sure to include the `---` prefix at the beginning of the document
aura:
    # Inherit the default configuration section
    <<: *aura_config
    # Overwrite the default option
    min-score: 100
    # You can also insert new config options in case some plugins need it
    custom_option: "yes please"
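
The YAML merge key (`<<`) used above performs a shallow merge in which explicitly written keys take precedence over inherited ones. The sketch below reproduces that behavior with plain Python dictionaries; the `merge_section` helper is hypothetical and the `defaults` dictionary contains only a few of the options from the default `aura` section.

```python
# A few options copied from the default `aura` section
defaults = {
    "log-level": "info",
    "min-score": 10,
    "async": False,
    "ast-stages": ["convert", "rewrite", "ast_pattern_matching", "taint_analysis", "readonly"],
}


def merge_section(inherited: dict, overrides: dict) -> dict:
    """Mimic `<<: *anchor`: inherited keys first, explicit keys win."""
    merged = dict(inherited)   # copy, so the defaults stay untouched
    merged.update(overrides)   # shallow: nested values are replaced whole
    return merged


user_aura = merge_section(defaults, {"min-score": 100, "custom_option": "yes please"})

# Nested values such as lists are replaced, not combined
short_pipeline = merge_section(defaults, {"ast-stages": ["convert"]})
```

Because the merge is shallow, overriding a list-valued option such as `ast-stages` replaces the whole list, so any default entries that should be kept have to be repeated.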

Signatures configuration file

Default documented signatures file:

patterns: &default_patterns
  - id: flask_run_debug
    pattern: "flask.Flask.run(..., debug=True)"
    detection:
      message: Debug mode enabled in Flask
      score: 10
    tags:
      - vuln:flask:debug_enabled

  - id: ctypes_memory_allocation
    pattern: "ctypes.windll.kernel32.VirtualAlloc"
    detection:
      message: Memory allocation using ctypes
    tags:
      - behavior:ctypes:memory_alloc

  - id: ctypes_create_thread
    pattern: "ctypes.windll.kernel32.CreateThread"
    detection:
      message: Creating thread using ctypes
    tags:
      - behavior:ctypes:create_thread

  - id: ctypes_create_remote_thread
    pattern: "ctypes.windll.kernel32.CreateRemoteThread"
    detection:
      message: Creating remote thread using ctypes
    tags:
      - behavior:ctypes:create_thread

  - id: ctypes_write_process_memory
    pattern: "ctypes.windll.kernel32.WriteProcessMemory"
    detection:
      message: Ctypes writing directly into the process memory
      score: 100
    tags:
      - behavior:ctypes:write_process_memory

  - id: ctypes_window_hook
    pattern: "ctypes.windll.user32.SetWindowsHookExA"
    detection:
      message: Ctypes creating a hook for the window. Possible sign of keylogger
      score: 100
    tags:
      - behavior:ctypes:window_hook

  - id: ctypes_lock_screen
    pattern: "ctypes.windll.user32.LockWorkStation"
    detection:
      message: Ctypes locks windows screen
      score: 50
    tags:
      - behavior:ctypes:lock_workstation

  - id: ctypes_is_admin
    pattern: "ctypes.windll.shell32.IsUserAnAdmin"
    detection:
      message: Ctypes determining if user is an admin
      score: 100
    tags:
      - behavior:ctypes:is_admin

  - id: ctypes_enum_windows
    pattern: "ctypes.windll.user32.EnumWindows"
    detection:
      message: "Ctypes is enumerating over windows"
      score: 10
    tags:
      - behavior:ctypes:enum_windows

  - id: ctypes_page_execute_read_write
    pattern: "ctypes.c_int(64)"
    detection:
      message: Flag PAGE_EXECUTE_READ_WRITE for allocated memory page
    tags:
      - behavior:ctypes:rwx_memory_page

  - id: ctypes_kernel32_rtlmovememory
    pattern: "ctypes.windll.kernel32.RtlMoveMemory(...)"
    detection:
      message: "Copying memory between allocated pages"
    tags:
      - behavior:ctypes:rtl_move_memory

  - id: ctypes_kernel32_waitforsingleobject
    pattern: "ctypes.windll.kernel32.WaitForSingleObject"
    detection:
      message: Wait until the object is in signaled state or timed-out
    tags:
      - behavior:ctypes:wait_for_single_object

  - id: ctypes_kernel32_getvolumeinformationw
    pattern: "ctypes.windll.kernel32.GetVolumeInformationW"
    detection:
      message: Get volume information via ctypes
    tags:
      - behavior:ctypes:volume_information

  - id: ctypes_cfunc_type
    pattern: "ctypes.CFUNCTYPE"
    detection:
      message: CFUNCTYPE allows creating a pointer to memory location containing executable code
      score: 100
    tags:
      - behavior:ctypes:function_pointer
      - behavior:possible_malware

  - id: mktemp_racecond
    pattern: "tempfile.mktemp(...)"
    detection:
      score: 20
      message: "Usage of tempfile.mktemp is susceptible to race conditions!"
    tags:
      - vuln:mktemp
      - behavior:tempfile

  - id: open_file
    pattern: "open(...)"
    detection:
      message: Code is accessing files via open
    tags:
      - behavior:opening_files
    taint: sink

  - id: md5_deprecated
    pattern: "hashlib.md5"
    detection:
      message: Usage of MD5 for cryptographic purposes is very dangerous and no longer recommended
      score: 20
    tags:
      - deprecated:hash

  - id: requests_unverified
    pattern:
      - "requests.get(..., verify=False)"
      - "requests.post(..., verify=False)"
      - "requests.put(..., verify=False)"
      - "requests.delete(..., verify=False)"
      - "requests.patch(..., verify=False)"
      - "requests.head(..., verify=False)"
      - "requests.options(..., verify=False)"
    detection:
      message: SSL/TLS verification disabled when doing a request
      score: 10
    tags:
      - vuln:unverified_request

  - id: shell_injection
    pattern:
      - "subprocess.run(..., shell=True)"
      - "subprocess.Popen(..., shell=True)"
      - "subprocess.call(..., shell=True)"
      - "subprocess.check_call(..., shell=True)"
      - "subprocess.check_output(..., shell=True)"
    detection:
      score: 20
      message: "Setting shell=True is dangerous and allows a shell injection attack"
    tags:
      - vuln:shell_injection
    taint: sink

  - id: dangerous_pickle
    pattern:
      - "pickle.load(...)"
      - "pickle.loads(...)"
      - "cPickle.load(...)"
      - "cPickle.loads(...)"
    detection:
      message: Usage of pickle is very dangerous and easily exploitable
      score: 50
    tags:
      - behavior:pickle
    taint: sink

  - id: os_system_execution
    pattern:
      - "os.system(...)"
      - "os.popen(...)"
      - "os.popen2(...)"
      - "os.popen3(...)"
      - "os.popen4(...)"
      - "os.startfile(...)"
      - "os.execl(...)"
      - "os.execle(...)"
      - "os.execlp(...)"
      - "os.execv(...)"
      - "os.execve(...)"
      - "os.execvp(...)"
      - "os.execvpe(...)"
      - "os.spawnl(...)"
      - "os.spawnle(...)"
      - "os.spawnlp(...)"
      - "os.spawnlpe(...)"
      - "os.spawnv(...)"
      - "os.spawnve(...)"
      - "os.spawnvp(...)"
      - "os.spawnvpe(...)"
    detection:
      score: 50
      message: Code is performing system command execution
    tags:
      - behavior:system_execution
    taint: sink

  - id: yaml_load_unsafe
    pattern: "yaml.load(...)"
    detection:
      message: yaml.load is considered unsafe as it can execute python commands via directive. Use yaml.safe_load instead
      score: 100
    tags:
      - vuln:yaml_unsafe_load
    taint: sink

  - id: python_code_execution
    pattern:
      - "eval(...)"
      - "exec(...)"
    detection:
      score: 100
      message: eval/exec usage found in a source code
    tags:
      - behavior:obfuscation
      - behavior:code_execution
    taint: sink

  - id: get_variables
    pattern:
      - "globals()"
      - "locals()"
    detection:
      score: 100
      message: Usage of locals() or globals() found in a source code
    tags:
      - behavior:accessing_variables

  - id: inline_import
    pattern:
      - "__import__(...)"
      - "importlib.import_module(...)"
      - "importlib.__import__(...)"
    detection:
      message: Inline import
      score: 50
    tags:
      - behavior:inline_import
      - behavior:code_execution
      - behavior:obfuscation
    taint: sink

  - pattern: "getpass.getuser"
    detection:
      message: Local username lookup, could be used for exploit to determine if running under the root/admin
      score: 10
    tags:
      - behavior:getuser

  # Taint cleaners
  - pattern: "int(...)"
    taint: safe

  - pattern: "float(...)"
    taint: safe

  - pattern: "flask.Markup.escape(...)"
    taint: safe

  - pattern: "shlex.escape(...)"
    taint: safe

  # Taint sources
  - pattern: "input(...)"
    taint: tainted

  - pattern: "raw_input(...)"
    taint: tainted

  - id: flask_request_args
    pattern: "flask.request.args"
    taint: tainted

  - id: flask_request_form
    pattern: "flask.request.form"
    taint: tainted

  - pattern: "flask.request.path"
    taint: tainted

  - id: flask_request_headers
    pattern: "flask.request.headers"
    taint: tainted

  - id: flask_request_files
    pattern: "flask.request.files"
    taint: tainted

  - id: flask_request_cookies
    pattern: "flask.request.cookies"
    taint: tainted

  - pattern: "flask.request.get_json(...)"
    taint: tainted

  # Taint sinks
  - pattern: "flask.make_response(...)"
    taint: sink

  - pattern: "flask.jsonify(...)"
    taint: sink

  - pattern: "flask.send_file(...)"
    taint: sink

  - pattern: "flask.db.execute(...)"
    taint: sink

  - pattern: "flask.make_response.set_cookie(...)"
    taint: sink

  - id: subprocess_sink
    pattern:
      - "subprocess.Popen(...)"
      - "subprocess.call(...)"
      - "subprocess.run(...)"
    taint: sink

  - pattern: "MySQLdb.connect.cursor.execute(...)"
    taint: sink

  - pattern: "mysql.connector.connect.cursor.execute(...)"
    taint: sink

  - pattern: "pymysql.connect.cursor.execute(...)"
    taint: sink

  - pattern: "sqlalchemy.orm.scoped_session.execute(...)"
    taint: sink

  - pattern: "psycopg2.connect.cursor.execute"
    taint: sink

  - pattern: "django.shortcuts.render(...)"
    tags:
      - django_view
    taint:
      level: sink
      log_message: "AST node has been marked as Django view"
      args:
        request: tainted

  - pattern: "django.http.HttpResponse(...)"
    taint: sink

  - pattern: "django.http.HttpResponseNotFound(...)"
    taint: sink

  - pattern: "sqlite3.connect.execute(...)"
    taint: sink

  - id: shutil_module_sinks
    pattern:
      - "shutil.copyfileobj(...)"
      - "shutil.copyfile(...)"
      - "shutil.copymode(...)"
      - "shutil.copystat(...)"
      - "shutil.copy(...)"
      - "shutil.copy2(...)"
      - "shutil.copytree(...)"
      - "shutil.rmtree(...)"
      - "shutil.move(...)"
      - "shutil.chown(...)"
      - "shutil.make_archive(...)"
      - "shutil.unpack_archive(...)"
    taint: sink

  # Module imports
  - id: network_modules
    pattern:
      - "import socket"
      - "import requests"
      - "import urllib"
      - "import urllib2"
      - "import urllib3"
      - "import httplib"
      - "import ftplib"
      - "import twisted"
    tags:
      - behavior:network

  - id: code_execution_modules
    pattern:
      - "import importlib"
      - "import pickle"
      - "import cPickle"
      - "import marshal"
      - "import imp"
      - "import imputil"
      - "import zipimport"
      - "import runpy"
      - "import subprocess"
      - "import popen2"
      - "import commands"
    tags:
      - behavior:code_execution

  - id: obfuscation_modules
    pattern: "import base64"
    tags:
      - behavior:obfuscation

  - id: windows_modules
    pattern:
      - "import winreg"
      - "import _winreg"
    tags:
      - behavior:windows

  - id: django_modules
    pattern: "import django.shortcuts.render"
    tags:
      - misc:django


files: &default_files
  - id: tag_test_code
    type: regex
    pattern: "^test(_.+|s)?$"
    target: part
    tags:
      - misc:test_code

  - type: regex
    pattern: "^id_[rd]sa$"
    target: filename
    tags:
      - leak:sensitive_file
      - leak:private_key

  - type: exact
    pattern: ".bash_history"
    target: filename
    tags:
      - leak:sensitive_file
      - leak:bash_history

  - type: exact
    pattern: ".htpasswd"
    target: filename
    tags:
      - leak:sensitive_file

  - type: contains
    pattern: ".ssh/known_keys"
    target: full
    tags:
      - leak:sensitive_file

  - type: contains
    pattern: ".ssh/authorized_keys"
    target: full
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: "wallet.dat"
    target: filename
    tags:
      - leak:sensitive_file
      - misc:crypto_wallet

  - type: contains
    pattern: "etc/shadow"
    target: full
    tags:
      - leak:sensitive_file

  - type: contains
    pattern: "etc/sudoers"
    target: full
    tags:
      - leak:sensitive_file

  - type: contains
    pattern: "Local/Google/Chrome/"
    target: full
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: "secret_token.rb"
    target: filename
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: kwallet
    target: filename
    tags:
      - leak:sensitive_file

  - type: contains
    pattern: ".docker/config.json"
    target: full
    tags:
      - leak:sensitive_file

  - type: contains
    pattern: ".kube/config"
    target: full
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: ".bash_login"
    target: filename
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: ".bash_history"
    target: filename
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: ".sh_history"
    target: filename
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: ".mysql_history"
    target: filename
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: ".dbshell"
    target: filename
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: ".rediscli_history"
    target: filename
    tags:
      - leak:sensitive_file

  - type: contains
    pattern: ".aws/credentials"
    target: full
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: ".viminfo"
    target: filename
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: ".fetchmailrc"
    target: filename
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: "database.yml"
    target: filename
    tags:
      - leak:sensitive_file

  - type: exact
    pattern: ".gitignore"
    target: filename
    tags:
      - misc:ignore_file

  - type: exact
    pattern: ".travis.yml"
    target: filename
    tags:
      - misc:ignore_file


strings: &default_strings
  - id: all_interfaces
    type: regex
    pattern: "^0\\.0\\.0\\.0(:\\d{2,6})?$"
    message: "Binding to all interfaces may unwillingly expose non-protected interface"
    score: 10
    tags:
      - behavior:public_network_interface

  - id: tmp_folder
    type: regex
    pattern: "^(/tmp|/var/tmp|/dev/shm|C:\\\\{1,2}Windows\\\\Temp\\\\).*$"
    message: "Hardcoded tmp folder in the source code"
    score: 10
    tags:
      - behavior:temp_file

  - id: url
    type: regex
    pattern: "^(http|ftp)s?://.{5,}"
    message: "A possible URL has been found"
    score: 0
    informational: false
    tags:
      - behavior:url

  - id: netsh_firewall
    type: regex
    pattern: "netsh (adv)?firewall"
    message: "Windows netsh firewall command"
    score: 10
    tags:
      - behavior:windows:firewall

  - id: mac_firewall
    type: regex
    pattern: "^(pfctl|/usr/libexec/ApplicationFirewall/socketfilterfw).*"
    message: "Mac os firewall command (packet filtering)"
    tags:
      - behavior:macos:firewall

  - id: windows_service
    type: contains
    pattern: "SysWOW64"
    message: "Windows system folder for services"
    score: 10
    tags:
      - behavior:windows:system_service_folder

  - id: windows_task_scheduler
    type: contains
    pattern: "schtasks"
    message: "Windows task scheduler"
    score: 10
    tags:
      - behavior:windows:task_scheduler

  - id: hosts_file
    type: regex
    pattern: "^(C:\\\\{1,2}Windows\\\\System32\\\\drivers\\\\etc\\\\hosts|/etc/hosts)$"
    message: "Location of hosts file"
    score: 5
    tags:
      - behavior:accessing_host_file

  - id: sudo_command
    type: regex
    pattern: "^sudo .*$"
    message: "Executing sudo command"
    score: 20
    tags:
    - behavior:unix:sudo

  # Regexes for detecting leaking api tokens, secrets, etc...
  # Source of some regexes used: https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_04B-3_Meli_paper.pdf
  - id: twitter_access_token
    type: regex
    pattern: "^[1-9][0-9]+-[0-9a-zA-Z]{40}$"
    message: "Twitter access token"
    score: 100
    tags:
      - leak:access_token

  - id: facebook_access_token
    type: regex
    pattern: "^EAACEdEose0cBA[0-9A-Za-z]+$"
    message: "Facebook access token"
    score: 100
    tags:
      - leak:access_token

  - id: google_api_key
    type: regex
    pattern: "^AIza[-0-9A-Za-z_]{35}$"
    message: "Google API key"
    score: 100
    tags:
      - leak:access_token

  - id: google_oauth_id
    type: regex
    pattern: "^[0-9]+-[0-9A-Za-z_]{32}\\.apps\\.googleusercontent\\.com$"
    message: "Google OAuth ID"
    score: 100
    tags:
      - leak:access_token

  - id: picatic_api_key
    type: regex
    pattern: "^sk_live_[0-9a-z]{32}$"
    message: "Picatic API key"
    score: 100
    tags:
      - leak:access_token

  - id: stripe_standard_key
    type: regex
    pattern: "^sk_live_[0-9a-zA-Z]{24}$"
    message: "Stripe standard key"
    score: 100
    tags:
      - leak:access_token

  - id: stripe_restricted_key
    type: regex
    pattern: "^rk_live_[0-9a-zA-Z]{24}$"
    message: "Stripe restricted key"
    score: 100
    tags:
      - leak:access_token

  - id: square_access_token
    type: regex
    pattern: "^sq0atp-[-0-9A-Za-z_]{22}$"
    message: "Square access token"
    score: 100
    tags:
      - leak:access_token

  - id: square_oauth_secret
    type: regex
    pattern: "^sq0csp-[-0-9A-Za-z_]{43}$"
    message: "Square OAuth secret"
    score: 100
    tags:
      - leak:access_token

  - id: paypal_braintree
    type: regex
    pattern: "^access_token\\$production\\$[0-9a-z]{16}\\$[0-9a-f]{32}$"
    message: "PayPal braintree access token"
    score: 100
    tags:
      - leak:access_token

  - id: amazon_mws_auth_token
    type: regex
    pattern: "^amzn\\.mws\\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
    message: "Amazon MWS auth token"
    score: 100
    tags:
      - leak:access_token

  - id: twilio_api_key
    type: regex
    pattern: "^SK[0-9a-fA-F]{32}$"
    message: "Twilio API key"
    score: 100
    tags:
      - leak:access_token

  - id: mailgun_api_key
    type: regex
    pattern: "^key-[0-9a-zA-Z]{32}$"
    message: "Mailgun API key"
    score: 100
    tags:
      - leak:access_token

  - id: mailchimp_api_key
    type: regex
    pattern: "^[0-9a-f]{32}-us[0-9]{1,2}$"
    message: "MailChimp API key"
    score: 100
    tags:
      - leak:access_token

  - id: amazon_aws_key
    type: regex
    pattern: "^(A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[0-9A-Z]{16}$"
    message: "Amazon AWS key"
    score: 100
    tags:
      - leak:access_token

  - id: pypi_api_key
    type: regex
    pattern: "^pypi-AgEIcHlwaS5vcmc[A-Za-z0-9-_]{50,1000}$"
    message: "PyPI upload token"
    score: 100
    tags:
      - leak:access_token

Signature definitions can be overridden and extended in the same manner as the main configuration file, by using YAML anchors and prefixing the config with `---`.
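
As a rough illustration of how entries in the `files` section of the signatures file could be evaluated against a path, consider the sketch below. It is not Aura's matching code: the `file_signature_matches` helper is hypothetical, and the interpretation of `target` (`filename` as the last path component, `part` as any single component, `full` as the whole path) and the use of `re.search` for `regex` patterns are assumptions.

```python
import re
from pathlib import PurePosixPath


def file_signature_matches(signature: dict, path: str) -> bool:
    """Check one `files` signature against a path (assumed semantics)."""
    p = PurePosixPath(path)
    target = signature["target"]
    if target == "filename":
        candidates = [p.name]        # last path component only
    elif target == "part":
        candidates = list(p.parts)   # every directory and file name
    else:                            # "full"
        candidates = [str(p)]

    kind, pattern = signature["type"], signature["pattern"]
    if kind == "exact":
        return any(c == pattern for c in candidates)
    if kind == "contains":
        return any(pattern in c for c in candidates)
    if kind == "regex":
        return any(re.search(pattern, c) is not None for c in candidates)
    raise ValueError(f"unknown signature type: {kind}")


# Signatures copied from the default `files` section
test_code = {"type": "regex", "pattern": "^test(_.+|s)?$", "target": "part"}
private_key = {"type": "regex", "pattern": "^id_[rd]sa$", "target": "filename"}
aws_credentials = {"type": "contains", "pattern": ".aws/credentials", "target": "full"}

in_tests_dir = file_signature_matches(test_code, "pkg/tests/helpers.py")
```

Under these assumptions, `pkg/tests/helpers.py` is tagged as test code because one of its path components is `tests`, while `pkg/src/contest.py` is not.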

Environment config options

The following environment variable configuration options can be used to configure the aura behavior:

Environment variable	Explanation
AURA_CFG	Overwrite the path to the main configuration file
AURA_SIGNATURES	Overwrite the path to the configuration file for signatures/patterns
AURA_MIRROR_PATH	Location of the local PyPI mirror repository
AURA_PYPI_STATS	Overwrite the path to the aura pypi_stats dataset
AURA_REVERSE_DEPENDENCIES	Overwrite the path to the aura reverse_dependencies dataset
AURA_NO_FORK	Disable forking for async processing in the analysis pipeline
AURA_LOG_LEVEL	Output log level
AURA_NO_BLOBS	Disable extraction of data blobs for further analysis
AURA_NO_PROGRESS	Disable cli progress bar, useful when redirecting stderr and stdout
AURA_FORCE_COLORS	Force ANSI colors on text output
AURA_TERM_WIDTH	Force terminal width for text output rendering
AURA_CACHE_LOCATION	Override the cache location
AURA_NO_CACHE	Disable the cache entirely for all operations
AURA_DEBUG_LEAKS	Turn on the garbage collector `DEBUG_LEAK` flag
AURA_DEBUG_LINES	List of line numbers separated by commas. While traversing the AST tree, Aura calls `breakpoint()` whenever it visits a node located on one of those line numbers
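
When Aura is launched from another Python program, these variables can be prepared as an environment mapping before the process is started. The sketch below sets only the path and log-level variables from the table; the `build_aura_env` helper is hypothetical, and it is an assumption that AURA_LOG_LEVEL accepts the same values as the `log-level` option in the main configuration file.

```python
import os
from typing import Dict, Mapping, Optional


def build_aura_env(cfg_path: str, signatures_path: str, log_level: str = "info",
                   base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with Aura's variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["AURA_CFG"] = cfg_path                # main configuration file
    env["AURA_SIGNATURES"] = signatures_path  # signatures/patterns file
    env["AURA_LOG_LEVEL"] = log_level         # assumed to match `log-level` values
    return env


env = build_aura_env(
    "/tmp/custom_config.yaml",
    "/tmp/custom_signatures.yaml",
    base_env={"PATH": "/usr/bin"},
)
```

The resulting dictionary can be passed as the `env=` argument of `subprocess.run` when starting the aura command, which leaves the environment of the calling process unchanged.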