S3FS
S3FS is a Python library built on top of botocore that lets you work with Akave storage as though it were a local file system, while preserving the native object format for files.
The s3fs-fuse driver is a user-space file system that provides a virtual file system interface to S3-compatible storage. It allows you to access your Akave storage as a local file system, making it easy to work with your data as if it were stored on your local machine.
Prerequisites
- Akave Cloud Credentials: these can be requested by contacting Akave at Akave Cloud Contact.
- Install dependencies (Requirements: Python 3.9+, pip, s3fs)
Installation
This guide uses the python and pip commands. On systems where you need to explicitly specify Python 3, you may need to use python3 and pip3 instead. Use the command variation that works for your specific environment.
Pip Installation Instructions
Pip comes pre-installed with Python 3.4 and later. If you don’t already have Python installed, you can download it from https://www.python.org/downloads/.
You can verify that pip is installed by running the following command:
pip --version
S3FS Installation Instructions
The simplest way to install the S3FS library is to use pip:
pip install s3fs
Run the following command to verify installation:
pip show s3fs
S3FS Fuse Installation Instructions
macOS
macOS 10.12 and newer via Homebrew:
brew install --cask macfuse
brew install gromgit/fuse/s3fs-mac
Linux
Debian 9 and Ubuntu 16.04 or newer:
sudo apt install s3fs
Authentication
Before using S3FS with Akave, you need to configure authentication. There are several ways to do this; for this guide we’ll focus on those that use the default AWS CLI profile functionality.
For more information on using the AWS CLI with Akave O3, see the documentation on setup.
For other authentication methods see the S3FS Fuse Github.
Option 1: Credentials File
Create or edit ~/.aws/credentials and add your Akave credentials:
[akave-o3]
aws_access_key_id = your_access_key_id
aws_secret_access_key = your_secret_access_key
endpoint_url = https://o3-rc2.akave.xyz
Option 2: AWS CLI
Run the command below and follow the prompts to add your access key, secret key, and region.
aws configure --profile akave-o3
- AWS Access Key ID: <your_access_key>
- AWS Secret Access Key: <your_secret_key>
- Default region name: akave-network
- Default output format: json
Note that aws configure does not prompt for an endpoint URL; the Akave endpoint is supplied later, when you mount a bucket or create an S3FS client (see Usage below).
Usage
CLI (s3fs-fuse)
Mounting an Akave Bucket
Create a directory to mount your bucket
mkdir -p ~/akave-mount
Mount the bucket
s3fs your-bucket-name ~/akave-mount \
-o url=https://o3-rc2.akave.xyz \
-o profile=akave-o3
Check active mounts
mount | grep s3fs
Unmount when done
umount ~/akave-mount
Additional mounting options
Enable Debugging
-o dbglevel=info -f
The -f flag runs s3fs in foreground mode, which is useful for debugging.
To modify the verbosity of the output, use dbglevel= followed by one of the following:
- debug
- warn
- info
- err
Use Cache
-o use_cache=/path/to/cache
Specifies a directory to use for caching files.
Parallel Upload
-o parallel_count=1
Controls the number of parallel upload threads.
Multi-Request Maximum
-o multireq_max=1
Controls the maximum number of requests that can be made in parallel.
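These options can be combined in a single mount command. Below is a sketch of a debug-friendly mount; the cache directory and thread counts are illustrative values, not recommendations.
s3fs your-bucket-name ~/akave-mount \
  -o url=https://o3-rc2.akave.xyz \
  -o profile=akave-o3 \
  -o use_cache=/tmp/s3fs-cache \
  -o parallel_count=4 \
  -o multireq_max=4 \
  -o dbglevel=info -f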
Basic Operations
Once mounted, you can use standard file system commands:
List files in bucket with their sizes
ls -l ~/akave-mount
Copy a local file to the bucket
cp myfile.txt ~/akave-mount/
Download a file from the bucket
cp ~/akave-mount/myfile.txt ./
Delete a file from the bucket
rm ~/akave-mount/myfile.txt
S3FS Specific Operations
Edit a file in place
nano ~/akave-mount/notes/todo.txt
View S3FS Logs
On macOS:
log show --predicate 'process == "s3fs"' --last 1h
On Linux:
journalctl -t s3fs --since "1 hour ago"
Python
The S3FS Python library provides a powerful interface to work with Akave O3 storage programmatically. Below are examples of common operations and best practices.
Imports
To use S3FS in Python, you need to import the s3fs module:
import s3fs
Other helpful imports are os for environment variables and pandas for data analysis.
import os
import pandas as pd
Authentication Options
The sections below outline different ways to securely authenticate with Akave O3 storage using S3FS.
Using Environment Variables
Environment variables are a secure way to handle credentials. You can set them by exporting them in your shell:
export AKAVE_ACCESS_KEY=your_access_key_here
export AKAVE_SECRET_KEY=your_secret_key_here
Then in your Python code you can reference the credentials:
import os
access_key = os.environ.get("AKAVE_ACCESS_KEY")
secret_key = os.environ.get("AKAVE_SECRET_KEY")
fs = s3fs.S3FileSystem(
key=access_key,
secret=secret_key,
endpoint_url="https://o3-rc2.akave.xyz",
client_kwargs={"region_name": "akave-network"}
)
Using .env Files
For development environments, you can use .env files with python-dotenv:
Example .env file:
AKAVE_ACCESS_KEY=your_access_key_here
AKAVE_SECRET_KEY=your_secret_key_here
Then in your Python code you can load the credentials from the .env file:
import os
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
fs = s3fs.S3FileSystem(
key=os.environ.get("AKAVE_ACCESS_KEY"),
secret=os.environ.get("AKAVE_SECRET_KEY"),
endpoint_url="https://o3-rc2.akave.xyz",
client_kwargs={"region_name": "akave-network"}
)
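Note that python-dotenv is a third-party package and is not installed with s3fs; if you use this approach, install it first:
pip install python-dotenv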
Add your .env file to .gitignore to prevent accidentally committing credentials to version control.
Using AWS CLI Profile
You may also use AWS CLI profiles, where credentials are stored in your system’s credential store:
fs = s3fs.S3FileSystem(
profile="akave-o3",
endpoint_url="https://o3-rc2.akave.xyz",
client_kwargs={"region_name": "akave-network"}
)
Basic Operations
List buckets
buckets = fs.ls("")
print(f"Available buckets: {buckets}")
List files in a bucket
files = fs.ls("your-bucket-name")
for file in files:
    print(file)
Create a directory
fs.mkdir("your-bucket-name/new-directory")
Upload a file
fs.put("local-file.txt", "your-bucket-name/remote-file.txt")
Upload a large file with progress tracking
chunk_size = 1024 * 1024  # 1 MB chunks
with open("local-large-file.zip", "rb") as local_file, \
     fs.open("your-bucket-name/large-file.zip", "wb") as remote_file:
    uploaded = 0
    for chunk in iter(lambda: local_file.read(chunk_size), b""):
        remote_file.write(chunk)
        uploaded += len(chunk)
        print(f"Uploaded {uploaded:,} bytes", end="\r")
print("\nUpload complete!")
Download a file
fs.get("your-bucket-name/remote-file.txt", "downloaded-file.txt")
Download a large file in chunks
with fs.open("your-bucket-name/large-file.csv", "rb") as remote_file:
    # Process the file in chunks to avoid loading it all into memory
    chunk_size = 1024 * 1024  # 1 MB chunks
    while True:
        chunk = remote_file.read(chunk_size)
        if not chunk:
            break
        # Process chunk here
Delete a file
fs.rm("your-bucket-name/file-to-delete.txt")
Delete multiple files
fs.rm(["your-bucket-name/file1.txt", "your-bucket-name/file2.txt"])
Delete a directory and all its contents recursively
fs.rm("your-bucket-name/directory-to-delete", recursive=True)
Working with Pandas
S3FS integrates well with pandas for data analysis workflows:
Read CSV directly from Akave storage
df = pd.read_csv(fs.open("your-bucket-name/data.csv"))
Write DataFrame back to Akave storage as CSV
df.to_csv(fs.open("your-bucket-name/processed-data.csv", "w"))
Read parquet files directly from Akave storage
df = pd.read_parquet(fs.open("your-bucket-name/data.parquet"))
Write DataFrame back to Akave storage as parquet
df.to_parquet(fs.open("your-bucket-name/processed-data.parquet", "wb"))
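Recent pandas versions can also route s3:// URLs through s3fs directly by passing storage_options, which avoids opening file handles yourself. A sketch, assuming the same environment variables and endpoint used earlier:
# Credentials and endpoint are forwarded to s3fs by pandas/fsspec
storage_options = {
    "key": os.environ.get("AKAVE_ACCESS_KEY"),
    "secret": os.environ.get("AKAVE_SECRET_KEY"),
    "endpoint_url": "https://o3-rc2.akave.xyz",
    "client_kwargs": {"region_name": "akave-network"},
}
df = pd.read_csv("s3://your-bucket-name/data.csv", storage_options=storage_options)
df.to_parquet("s3://your-bucket-name/processed-data.parquet", storage_options=storage_options)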
Advanced Operations
Get file metadata and info
info = fs.info("your-bucket-name/myfile.txt")
print(f"File size: {info['size']} bytes")
print(f"Last modified: {info['LastModified']}")
Copy objects within storage
fs.copy("your-bucket-name/source.txt", "your-bucket-name/destination.txt")
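Beyond info and copy, the underlying fsspec interface offers a few other inspection helpers that work the same way against Akave storage; a brief sketch with placeholder paths:
# Recursively list every object under a prefix
all_files = fs.find("your-bucket-name/some-prefix")
# Total size in bytes of everything under a prefix
total_bytes = fs.du("your-bucket-name/some-prefix")
# Match objects with a glob pattern
csv_files = fs.glob("your-bucket-name/**/*.csv")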
Error Handling and Best Practices
Error handling
Use try/except blocks to handle errors. s3fs maps the most common S3 errors to built-in Python exceptions (a missing key raises FileNotFoundError, a permissions problem raises PermissionError), while other client errors can still surface from botocore.
import botocore.exceptions

try:
    # Attempt to access a file
    with fs.open("your-bucket-name/may-not-exist.txt", "rb") as f:
        content = f.read()
except FileNotFoundError:
    print("The file does not exist")
except PermissionError:
    print("Access denied - check permissions")
except botocore.exceptions.ClientError as e:
    print(f"Error occurred: {e}")
Batch operations for better performance
Uploading several files through the same S3FileSystem instance reuses its connections instead of paying setup costs per file; for whole directories there is also a single recursive call (see the sketch after this example).
files_to_upload = [
    ("local1.txt", "your-bucket-name/remote1.txt"),
    ("local2.txt", "your-bucket-name/remote2.txt"),
    ("local3.txt", "your-bucket-name/remote3.txt")
]

for local, remote in files_to_upload:
    fs.put(local, remote)
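For uploading an entire local directory in one call, fsspec's recursive copy support also applies to put; a sketch with placeholder paths:
# Upload every file under a local directory to a prefix in the bucket
fs.put("local-data-directory/", "your-bucket-name/data/", recursive=True)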
Connection pooling for multiple operations
Use the same S3FileSystem instance for multiple operations to benefit from connection pooling; the size of the pool can be tuned with the max_pool_connections parameter.
fs = s3fs.S3FileSystem(
profile="akave-o3",
endpoint_url="https://o3-rc2.akave.xyz",
config_kwargs={"max_pool_connections": 20}
)
Caching configuration
Enable client-side caching of directory listings to reduce the number of requests made to Akave storage by setting the use_listings_cache and listings_expiry_time parameters.
fs = s3fs.S3FileSystem(
profile="akave-o3",
endpoint_url="https://o3-rc2.akave.xyz",
use_listings_cache=True,
listings_expiry_time=300 # Cache TTL in seconds
)
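If you need fresh listings before the cache expires, the cache can be cleared explicitly:
# Drop cached directory listings for a path
fs.invalidate_cache("your-bucket-name")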
Example Python Script
A comprehensive example script demonstrating all S3FS operations with Akave O3 is available at:
This script includes:
- Bucket operations: List buckets and their contents
- File operations: Upload, download, delete, and copy files
- Directory operations: Create directories and manage folder structures
- Pandas integration: Read and write DataFrames directly to/from Akave storage
- Error handling: Robust error handling patterns
- CLI interface: Command-line arguments for flexible testing
Dependencies
Required packages (install via pip):
- s3fs: Python library for S3-compatible object storage file system operations
- pandas: Data analysis library for working with DataFrames and structured data
Standard library modules (included with Python):
- os: Operating system interface for file operations
- datetime: Date and time handling
- argparse: Command-line argument parsing
Install dependencies:
pip install s3fs pandas
Usage examples
You can run the script to execute all tests, or target a specific operation, by passing the bucket name and (where needed) a file name as arguments.
Run all tests on a bucket:
python s3fs_test.py my-bucket
List buckets and contents:
python s3fs_test.py my-bucket --operation list
Upload a specific file:
python s3fs_test.py my-bucket --operation upload --file data.csv
Download a specific file:
python s3fs_test.py my-bucket --operation download --file data.csv
Delete a specific file:
python s3fs_test.py my-bucket --operation delete --file data.csv
Note: The script uses the AWS CLI profile akave-o3 by default. Modify the create_s3fs_client() function to use a different profile name, or to authenticate with environment variables via the key and secret parameters described in the Authentication section above.