Trying Azure Blob Storage

What is Azure Blob Storage?

Azure Blob Storage is a cloud service for storing unstructured data—like images, logs, backups, or CSV files. Think of it as a scalable, secure “cloud hard drive” with an API, access control, and built-in redundancy.

Unlike local files, blobs can be accessed by any authorized app—whether it’s running on your laptop, in Azure, or on a server across the world.

Why I Tried It

Azure Blob Storage was a requirement in a recent academic project. Before that, I hadn’t really thought about how or when I’d use it in my own projects—but I was curious to see how it actually works in practice.

The task was simple: load a dataset (All_Diets.csv) from the cloud instead of a local file. So I set up a Blob container and wrote a loader that tries cloud first, then falls back to local.
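For completeness, getting the CSV into the container can also be scripted. The sketch below is an illustration, not the project's code: `upload_csv` is a hypothetical helper, and it assumes the `datasets` container already exists and that `service_client` is a `BlobServiceClient` from the `azure-storage-blob` package.

```python
from pathlib import Path


def upload_csv(service_client, csv_path, container_name="datasets"):
    """Upload a local CSV into an existing container (hypothetical helper).

    `service_client` is assumed to be an azure.storage.blob.BlobServiceClient.
    The blob is named after the file and replaces any blob with the same name.
    """
    csv_path = Path(csv_path)
    blob_client = service_client.get_blob_client(
        container=container_name, blob=csv_path.name
    )
    # Stream the file contents; overwrite=True replaces an existing blob
    with csv_path.open("rb") as handle:
        blob_client.upload_blob(handle, overwrite=True)
    return csv_path.name
```

With a real account, the client would typically come from `BlobServiceClient.from_connection_string(...)`, after which `upload_csv(client, "datasets/All_Diets.csv")` would push the file up.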

At the time, I used a connection string via environment variables (like in a .env file), which is a common and practical approach for local development or low-risk projects. Later, I learned about Azure Key Vault—a more secure option for managing secrets in cloud environments. Neither is “wrong”; it’s about choosing the right tool for the context.


The Fallback Loader: Cloud + Local, One Function

Here’s the entry point used by the app:

import os
from pathlib import Path

import pandas as pd


def load_dataset(filename="All_Diets.csv"):
    # Try Azure Blob Storage first, but only if a connection string is configured
    if os.getenv("AZURE_STORAGE_CONNECTION_STRING"):
        try:
            from ..blob_storage import read_csv_from_blob
            return read_csv_from_blob(filename)
        except Exception:
            pass  # Silently fall back to local
    # Local fallback: the datasets/ folder two levels above this file
    csv_path = Path(__file__).parent.parent / "datasets" / filename
    return pd.read_csv(csv_path)

This means:

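  • In Azure, where the connection string is set: the app loads data directly from Blob Storage
  • On my laptop, when the connection string is not set: it uses the local datasets/ folder
  • Same code in both places; only the environment variable differs

No manual switching, and a failed download doesn't crash the app. The trade-off is that the bare except hides real problems, such as an expired key or a misnamed blob, so logging the exception before falling back is worth considering.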
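The cloud-first, local-fallback pattern can also be written generically. What follows is a minimal sketch rather than the code used in the project: `load_with_fallback`, `cloud_loader`, and `local_loader` are names made up for illustration, and unlike the original it logs the failure before falling back.

```python
import logging
import os
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def load_with_fallback(
    cloud_loader: Callable[[], T],
    local_loader: Callable[[], T],
    env_var: str = "AZURE_STORAGE_CONNECTION_STRING",
) -> T:
    """Use cloud_loader when env_var is set; otherwise, or on failure, use local_loader."""
    if os.getenv(env_var):
        try:
            return cloud_loader()
        except Exception as exc:
            # Record why the cloud path failed instead of hiding it
            logger.warning("Cloud load failed (%s); using local copy", exc)
    return local_loader()
```

In the app, this might be called as `load_with_fallback(lambda: read_csv_from_blob(name), lambda: pd.read_csv(path))`. Passing the loaders in as callables also makes the fallback easy to test without an Azure account.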

How read_csv_from_blob Works

Behind the scenes, a small utility handles the Azure interaction:

import io

import pandas as pd


def read_csv_from_blob(blob_name: str, container_name: str = "datasets") -> pd.DataFrame:
    # get_blob_service_client() is a separate helper, not shown here
    blob_service_client = get_blob_service_client()
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    # Download the blob as bytes, then load it into pandas
    blob_data = blob_client.download_blob().readall()
    return pd.read_csv(io.BytesIO(blob_data))

It relies on the AZURE_STORAGE_CONNECTION_STRING environment variable—stored in .env during development.
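The `get_blob_service_client` helper isn't shown in this post, so here is a guess at what it might look like. It assumes the `azure-storage-blob` package is installed. The early error for a missing variable is my addition, not necessarily what the project does.

```python
import os


def get_blob_service_client():
    """Build a BlobServiceClient from the connection string (assumed implementation)."""
    conn_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
    if not conn_str:
        # Fail early with a clear message rather than deep inside the SDK
        raise RuntimeError("AZURE_STORAGE_CONNECTION_STRING is not set")
    # Imported lazily so this module still imports where the SDK is not installed
    from azure.storage.blob import BlobServiceClient

    return BlobServiceClient.from_connection_string(conn_str)
```

Note that a .env file is not read automatically. Something has to load it into the process environment first, for example a library such as python-dotenv or the hosting platform's own settings.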


Final Thoughts

I hadn’t planned to use Blob Storage before this assignment—and I’m not sure when I’ll need it in a personal project yet. But having hands-on experience with real Azure services is a huge bonus.

Through academic projects like this, I’m getting familiar with core cloud patterns:

  • Storing secrets (Key Vault)
  • Storing container images (ACR)
  • Hosting data (Blob Storage)

Even if I don’t use them all right away, knowing how they work—and how they fit together—makes the cloud feel less abstract.

☁️ Sometimes the best learning comes from a requirement you didn’t ask for—but turns out to be useful anyway.


Thank you

Big thanks for reading! You’re awesome, and I hope this post helped. Until next time!