Python Url Downloader Library 🤷

PUDLy is yet another URL downloader for Python.

Goal of the project

The goal of this project is to create simple functions to handle file download tasks.

The two main components of this library are the download and download_files_concurrently functions.

How to use

Downloading one file

The download function downloads a file from a URL and returns the path of the downloaded file as a pathlib.Path object.

from pudly import download

url = "https://databank.worldbank.org/data/download/WDI_CSV.zip"
file = download(url)

assert file.exists()

It takes optional arguments to specify the download directory or any query parameters for the request.

from pudly import download
from pathlib import Path

url = "https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD"
query_parameters = {"downloadformat": "csv"}
download_directory = Path("data")

file = download(url, download_dir=download_directory, query_parameters=query_parameters)

assert file.exists()
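
As a rough illustration of what the query_parameters argument amounts to, the sketch below builds the equivalent URL by hand with the standard library. It assumes the parameters are encoded as an ordinary query string; how pudly encodes them internally is not stated here.

```python
from urllib.parse import urlencode

url = "https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD"
query_parameters = {"downloadformat": "csv"}

# Append the encoded parameters to the base URL as a query string.
# Assumption: this mirrors the request that is sent when the same
# dictionary is passed as query_parameters.
full_url = f"{url}?{urlencode(query_parameters)}"
print(full_url)
```

The resulting URL has the same form as the ones that carry ?downloadformat=csv directly in the address.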

Downloading multiple files

The download_files_concurrently function uses threading to download files in parallel. It returns a list of the downloaded files' paths.

from pathlib import Path
from pudly import download_files_concurrently

urls = [
    "https://api.worldbank.org/v2/en/indicator/SP.POP.TOTL?downloadformat=csv",
    "https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD?downloadformat=csv",
    "https://api.worldbank.org/v2/en/indicator/EN.POP.DNST?downloadformat=csv",
]
download_directory = Path("data")

files = download_files_concurrently(urls, download_dir=download_directory)

for file in files:
    assert file.exists()

Note

The download_dir and query_parameters arguments are applied to every URL in the list.
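
To give a feel for thread-based parallel downloads with one shared download directory, here is a minimal sketch built on the standard library's ThreadPoolExecutor. It is not pudly's implementation: fake_download and download_all are hypothetical stand-ins that only derive a target path and never touch the network.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fake_download(url, download_dir):
    # Hypothetical stand-in for a real download: derive a file name from
    # the URL path and return where the file would be stored.
    name = url.split("?")[0].rsplit("/", 1)[-1]
    return download_dir / name


def download_all(urls, download_dir, max_workers=4):
    # The same download_dir is used for every URL. pool.map returns the
    # results in the same order as the input URLs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: fake_download(u, download_dir), urls))


urls = [
    "https://api.worldbank.org/v2/en/indicator/SP.POP.TOTL?downloadformat=csv",
    "https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD?downloadformat=csv",
]
paths = download_all(urls, Path("data"))
```

Threads suit this kind of work because each download spends most of its time waiting on the network rather than using the CPU.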

Authentication

Currently, the following options are available for web servers that require authentication.

Basic authentication

This is the simplest authentication method for web services. Create an HTTPBasicAuth instance with your credentials, then pass it to the download function as the auth argument.

from pudly import download, HTTPBasicAuth

credential = HTTPBasicAuth("username", "password")
download("https://www.myurl.com/download", auth=credential)
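
For context, HTTP basic authentication sends the credentials in an Authorization header, encoded as Base64. The sketch below shows that encoding with the standard library only. basic_auth_header is a hypothetical helper written for illustration and is not part of pudly; the UTF-8 encoding of the credentials is also an assumption.

```python
import base64


def basic_auth_header(username, password):
    # Join the credentials with a colon, then Base64-encode the result.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


header = basic_auth_header("username", "password")
print(header)
```

Base64 is an encoding, not encryption, so basic authentication should only be used over HTTPS.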

netrc authentication

If no authentication method is given with the auth argument, pudly will attempt to get the authentication credentials for the URL's hostname from the user's netrc file.
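
The following sketch shows what a netrc lookup by hostname looks like, using Python's standard netrc module and a temporary file instead of the real ~/.netrc. lookup_credentials is a hypothetical helper and the credentials are placeholders; how pudly performs the lookup internally is not described here.

```python
import netrc
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def lookup_credentials(url, netrc_path):
    # Find the entry whose "machine" matches the URL's hostname.
    host = urlparse(url).hostname
    entry = netrc.netrc(str(netrc_path)).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


with tempfile.TemporaryDirectory() as tmp:
    # A netrc entry names the machine, the login and the password.
    netrc_file = Path(tmp) / "netrc"
    netrc_file.write_text("machine www.myurl.com login username password secret\n")

    found = lookup_credentials("https://www.myurl.com/download", netrc_file)
    missing = lookup_credentials("https://other.example.com/file", netrc_file)
```

Here found holds the login and password for www.myurl.com, while missing is None because the file has no entry for that host.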

Exception Handling

The library has a custom exception type called DownloadError. Errors raised by requests are caught and re-raised as DownloadError.

The downloaded file is validated at the end of the process by comparing its expected size with its actual size. If the sizes do not match, DownloadError is raised.
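
The size check described above can be sketched roughly as follows. This is an illustration under assumptions, not the library's code: the DownloadError class is a local stand-in so that the snippet runs without pudly installed, and validate_size is a hypothetical helper.

```python
import tempfile
from pathlib import Path


class DownloadError(Exception):
    """Local stand-in for pudly's exception type, used only in this sketch."""


def validate_size(file_path, expected_size):
    # Compare the size announced for the download with the size on disk.
    actual_size = file_path.stat().st_size
    if actual_size != expected_size:
        raise DownloadError(
            f"{file_path.name}: expected {expected_size} bytes, got {actual_size}"
        )


with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "data.bin"
    file.write_bytes(b"x" * 10)

    validate_size(file, 10)  # sizes match, nothing is raised

    try:
        validate_size(file, 12)  # simulates an incomplete download
        mismatch_detected = False
    except DownloadError:
        mismatch_detected = True
```

Calling code can wrap a download in the same kind of try/except block to retry or report a failed transfer.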

Logging

The library uses Python's standard logging module. The logger is named pudly. You can access it either by importing it from the package or by calling logging.getLogger("pudly").

Importing

from pudly import logger as pudly_logger

getLogger

import logging

pudly_logger = logging.getLogger("pudly")

Configuration

To enable logging for the library, add a handler to the pudly logger, as in the following example.

import logging
from pudly import download
from pudly import logger as pudly_logger

log_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
formatter = logging.Formatter(log_format)

console = logging.StreamHandler()
console.setLevel(logging.DEBUG)
console.setFormatter(formatter)

pudly_logger.addHandler(console)
pudly_logger.setLevel(logging.DEBUG)

download("https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD?downloadformat=csv")

Output on console:

2024-11-23 17:14:21,992 - pudly - INFO - Download from https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD?downloadformat=csv (135117 bytes)
2024-11-23 17:14:21,992 - pudly - DEBUG - Start downloading API_NY.GDP.MKTP.CD_DS2_en_csv_v2_2.zip
2024-11-23 17:14:22,019 - pudly - DEBUG - API_NY.GDP.MKTP.CD_DS2_en_csv_v2_2.zip downloaded 135117 bytes / 135117 bytes
2024-11-23 17:14:22,019 - pudly - DEBUG - Finished downloading API_NY.GDP.MKTP.CD_DS2_en_csv_v2_2.zip
2024-11-23 17:14:22,020 - pudly - INFO - Downloaded API_NY.GDP.MKTP.CD_DS2_en_csv_v2_2.zip successfully
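
If only the INFO-level summary lines are of interest, the logger level can be raised. The sketch below configures the logger by name and writes to an in-memory buffer. The two log calls are stand-ins for records the library would emit, and example.zip is a placeholder file name.

```python
import io
import logging

pudly_logger = logging.getLogger("pudly")

# Collect the output in memory so it can be inspected afterwards.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))

pudly_logger.addHandler(handler)
pudly_logger.setLevel(logging.INFO)  # DEBUG records are dropped

# Stand-in records, emitted by hand for this sketch.
pudly_logger.debug("Start downloading example.zip")
pudly_logger.info("Downloaded example.zip successfully")

captured = buffer.getvalue()
print(captured)
```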