The Aura typosquatting protection requires a dataset file: a JSON file that lists Python packages and their popularity (number of downloads). This file can be obtained by querying the Google BigQuery service.
Although Google BigQuery is a commercial service, Google provides a free tier of 1 TB of processed data per month, which is more than enough to obtain the data needed for the typosquatting protection at no cost.
Manual dataset download
To connect to the BigQuery service, first install the BigQuery command-line tool, which ships with google-cloud-sdk. Follow the official documentation to install it. Alternatively, you can run the query in the online console at https://console.cloud.google.com/bigquery and export the JSON results to Google Drive.
Now run the following query to generate the dataset needed for the typosquatting protection:
    SELECT
      file.project AS package_name,
      COUNT(file.project) AS downloads
    FROM `the-psf.pypi.downloads*`
    WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
      AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
    GROUP BY package_name
    ORDER BY downloads DESC
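Once the query results are exported as JSON, it can be useful to sanity-check the file before handing it to Aura. The sketch below is an illustration only, not part of Aura: it assumes the export is newline-delimited JSON with the `package_name` and `downloads` fields produced by the query, and the helper names `load_download_counts` and `top_packages` are hypothetical. The sample rows are made up.

```python
import io
import json
from typing import Dict, Iterable, List, Tuple


def load_download_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse newline-delimited JSON rows into {package_name: downloads}."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        row = json.loads(line)
        # BigQuery JSON exports may encode integers as strings, so cast.
        counts[row["package_name"]] = int(row["downloads"])
    return counts


def top_packages(counts: Dict[str, int], limit: int = 3) -> List[Tuple[str, int]]:
    """Return the `limit` most downloaded packages, most popular first."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up sample rows standing in for an exported file.
sample = io.StringIO(
    '{"package_name": "alpha", "downloads": "300"}\n'
    '{"package_name": "beta", "downloads": "1200"}\n'
    '\n'
    '{"package_name": "gamma", "downloads": 45}\n'
)
counts = load_download_counts(sample)
print(top_packages(counts, limit=2))  # [('beta', 1200), ('alpha', 300)]
```

To check a real export, replace the `io.StringIO` sample with an open file handle, for example `open("pypi_stats.json")`, where the file name is a placeholder for wherever you saved the results.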
Download dataset via Aura
If the Google Cloud Python SDK is installed and authentication is configured for the Python client, Aura can download the dataset automatically: run aura fetch-pypi-stats. To check whether the BigQuery Python SDK is configured correctly, run aura info and verify that the output reports the BigQuery service integration as enabled.
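If aura info reports the integration as disabled, a quick first step is to confirm that the BigQuery client library is importable at all. The following is a minimal sketch, not how Aura itself performs the check: `module_available` is a hypothetical helper, and it assumes the client is the `google-cloud-bigquery` package, which exposes the `google.cloud.bigquery` module. It verifies installation only and says nothing about whether authentication is configured.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if the named module can be located on the import path."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first; a missing parent raises
        # ModuleNotFoundError, which is a subclass of ImportError.
        return False


if module_available("google.cloud.bigquery"):
    print("BigQuery client library found")
else:
    print("BigQuery client library missing; try: pip install google-cloud-bigquery")
```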