Storage Architecture & Data Directories
The trueblocks-dalle library organizes all data using a structured directory hierarchy managed by the pkg/storage package. This chapter explains the storage architecture, data directory resolution, and file organization patterns.
Data Directory Resolution
Default Location
The library automatically determines an appropriate data directory based on the platform:
func DataDir() string {
    if dir := os.Getenv("TB_DALLE_DATA_DIR"); dir != "" {
        return dir
    }
    // Falls back to platform-specific defaults
}
Platform defaults:
- macOS: ~/Library/Application Support/TrueBlocks
- Linux: ~/.local/share/TrueBlocks
- Windows: %APPDATA%/TrueBlocks
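For reference, such a fallback can be sketched with runtime.GOOS; platformDefaultDir is an illustrative name for this sketch, not necessarily the library's internal function:

```go
package storage

import (
    "os"
    "path/filepath"
    "runtime"
)

// platformDefaultDir sketches the platform fallback described above;
// the library's actual internals may differ.
func platformDefaultDir() string {
    home, err := os.UserHomeDir()
    if err != nil {
        home = "." // degrade gracefully if the home directory is unknown
    }
    switch runtime.GOOS {
    case "darwin":
        return filepath.Join(home, "Library", "Application Support", "TrueBlocks")
    case "windows":
        return filepath.Join(os.Getenv("APPDATA"), "TrueBlocks")
    default: // linux and other unixes
        return filepath.Join(home, ".local", "share", "TrueBlocks")
    }
}
```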
Environment Override
Set TB_DALLE_DATA_DIR to use a custom location:
export TB_DALLE_DATA_DIR="/custom/path/to/dalle-data"
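Because the override is a plain environment lookup, tests can isolate themselves with a throwaway directory. A minimal sketch, assuming Go 1.17+ for t.Setenv:

```go
import "testing"

func TestWithIsolatedDataDir(t *testing.T) {
    // t.Setenv restores the original value when the test finishes.
    t.Setenv("TB_DALLE_DATA_DIR", t.TempDir())
    // ... exercise code that calls DataDir() here ...
}
```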
Directory Structure
The data directory contains several key subdirectories:
$TB_DALLE_DATA_DIR/
├── output/ # Generated artifacts (images, prompts, audio)
├── cache/ # Database and context caches
├── series/ # Series configuration files
└── metrics/ # Progress timing data
Output Directory
Generated artifacts are organized by series under output/:
output/
└── <series-name>/
├── data/ # Raw attribute data dumps
├── title/ # Human-readable titles
├── terse/ # Short captions
├── prompt/ # Full structured prompts
├── enhanced/ # OpenAI-enhanced prompts
├── generated/ # Raw DALL·E generated images
├── annotated/ # Images with caption overlays
├── selector/ # Complete DalleDress JSON metadata
└── audio/ # Text-to-speech MP3 files
Each subdirectory contains files named <address>.ext where:
- address is the input seed string (typically an Ethereum address)
- ext is the appropriate file extension (.txt, .png, .json, .mp3)
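Combining the two conventions, any artifact path can be built mechanically. artifactPath below is a hypothetical helper for illustration, not part of the public API:

```go
// artifactPath illustrates the naming convention; it is not part of
// the library's public API.
func artifactPath(series, kind, address, ext string) string {
    return filepath.Join(OutputDir(), series, kind, address+ext)
}

// artifactPath("default", "annotated", "<address>", ".png")
//   => <data-dir>/output/default/annotated/<address>.png
```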
Cache Directory
The cache directory stores processed database indexes and temporary files:
cache/
├── databases.cache # Binary database cache file
├── series.cache # Series configuration cache
└── temp/ # Temporary files during processing
Series Directory
Series configurations are stored as JSON files:
series/
├── default.json # Default series configuration
├── custom-series.json # Custom series with filters
└── deleted/ # Soft-deleted series
└── old-series.json
File Path Utilities
The storage package provides utilities for constructing paths:
Core Functions
// Base directories
func DataDir() string   // Main data directory
func OutputDir() string // output/ subdirectory
func SeriesDir() string // series/ subdirectory
func CacheDir() string  // cache/ subdirectory

// Path construction
func EnsureDir(path string) error  // Create directory if needed
func CleanPath(path string) string // Sanitize file paths
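A typical call site combines these utilities; this sketch assumes the package is imported as storage:

```go
// Ensure the target subdirectory exists before writing an artifact.
dir := filepath.Join(storage.OutputDir(), "default", "annotated")
if err := storage.EnsureDir(dir); err != nil {
    return fmt.Errorf("creating %s: %w", dir, err)
}
```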
Path Security
All file operations include security checks to prevent directory traversal:
// Example from annotate.go
cleanName := filepath.Clean(fileName)
if !strings.Contains(cleanName, string(os.PathSeparator)+"generated"+string(os.PathSeparator)) {
    return "", fmt.Errorf("invalid image path: %s", fileName)
}
Artifact Lifecycle
Creation Flow
- Directory Creation: Output directories are created as needed during generation
- Incremental Writing: Artifacts are written as they're generated (prompts → image → annotation)
- Atomic Operations: Files are written atomically to prevent corruption (see the sketch after this list)
- Metadata Updates: JSON metadata is updated throughout the process
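The atomic-write step typically follows the classic write-to-temp-then-rename pattern. This is a sketch of the idea, not necessarily the library's exact code:

```go
import (
    "os"
    "path/filepath"
)

// writeAtomic writes to a temp file in the target directory, then renames
// it into place. Readers never observe a partially written file because
// same-volume renames are atomic on POSIX filesystems.
func writeAtomic(path string, data []byte) error {
    tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
    if err != nil {
        return err
    }
    defer os.Remove(tmp.Name()) // no-op after a successful rename
    if _, err := tmp.Write(data); err != nil {
        tmp.Close()
        return err
    }
    if err := tmp.Close(); err != nil {
        return err
    }
    return os.Rename(tmp.Name(), path)
}
```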
Caching Strategy
- Existence Checks: If an annotated image already exists, the pipeline returns immediately (a cache hit; see the sketch after this list)
- Incremental Processing: Individual artifacts are cached, allowing partial resume
- Selective Regeneration: Only missing or outdated artifacts are regenerated
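Conceptually, the cache-hit check is a simple stat on the final artifact; the details here are illustrative:

```go
// If the final artifact already exists, skip the entire pipeline.
annotated := filepath.Join(storage.OutputDir(), series, "annotated", address+".png")
if _, err := os.Stat(annotated); err == nil {
    return annotated, nil // cache hit: nothing to regenerate
}
```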
Cleanup Operations
The Clean function removes all artifacts for a series/address pair:
func Clean(series, address string) {
    // Removes files from all output subdirectories
    // Clears cached DalleDress entries
    // Updates progress tracking
}
Database Storage
Embedded Databases
Attribute databases are embedded in the binary as compressed tar.gz archives:
pkg/storage/databases.tar.gz # Compressed attribute databases
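In Go, embedding is typically done with a go:embed directive; the variable name below is an assumption:

```go
package storage

import _ "embed"

//go:embed databases.tar.gz
var databasesTarGz []byte // decompressed and indexed at first use
```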
Cache Format
Processed databases are cached in binary format for fast loading:
type DatabaseCache struct {
    Version    string                   // Cache version
    Timestamp  int64                    // Creation time
    Databases  map[string]DatabaseIndex // Processed indexes
    Checksum   string                   // Validation checksum
    SourceHash string                   // Source data hash
}
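encoding/gob is a natural fit for this kind of binary cache; whether the library uses gob specifically is an assumption:

```go
import (
    "encoding/gob"
    "os"
)

// saveCache sketches binary serialization of the cache; gob is an
// assumption, the library may use a different encoding.
func saveCache(path string, c *DatabaseCache) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    return gob.NewEncoder(f).Encode(c)
}
```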
Cache Validation
The cache system validates integrity on load:
- Checksum Verification: Ensures cache file hasn't been corrupted
- Source Hash Check: Detects if embedded databases have changed
- Version Compatibility: Handles cache format changes
- Automatic Rebuild: Rebuilds cache if validation fails
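Taken together, the load path amounts to "validate or rebuild". A sketch of that flow, where loadCache, computeChecksum, and rebuildCache are illustrative names rather than the library's actual API:

```go
// loadOrRebuild sketches the validate-or-rebuild flow; helper names
// are illustrative, not the library's actual API.
func loadOrRebuild(path, sourceHash, version string) *DatabaseCache {
    c, err := loadCache(path) // read and decode the cache file
    if err != nil || // missing or unreadable
        c.Checksum != computeChecksum(c) || // corrupted
        c.SourceHash != sourceHash || // embedded data changed
        c.Version != version { // incompatible format
        c = rebuildCache() // reprocess the embedded databases
        _ = saveCache(path, c)
    }
    return c
}
```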
Performance Considerations
Directory Operations
- Lazy Creation: Directories are created only when needed
- Path Caching: Resolved paths are cached to avoid repeated filesystem calls
- Batch Operations: Multiple files in the same directory are processed efficiently
Storage Optimization
- Binary Caching: Database indexes use efficient binary serialization
- Compression: Embedded databases are compressed to reduce binary size
- Selective Loading: Only required database sections are loaded into memory
Cleanup Strategies
- Automatic Cleanup: Temporary files are cleaned up on completion or failure
- LRU Eviction: Context cache uses LRU eviction to prevent unbounded growth
- Configurable Retention: TTL settings control how long contexts remain cached
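As a concrete illustration of TTL-based eviction (the map, field, and variable names are assumptions, not the library's actual API):

```go
// Evict any cached context whose last use is older than the TTL.
for key, entry := range contexts {
    if time.Since(entry.lastUsed) > cacheTTL {
        delete(contexts, key) // deleting during range is safe in Go
    }
}
```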
Error Handling
Common Storage Errors
// Permission issues
if os.IsPermission(err) {
    // Handle insufficient filesystem permissions
}

// Disk space issues
if strings.Contains(err.Error(), "no space left") {
    // Handle disk space exhaustion
}

// Path traversal attempts
if strings.Contains(err.Error(), "invalid path") {
    // Handle security violations
}
Recovery Strategies
- Graceful Degradation: Continue operation when non-critical files can't be written
- Cache Rebuilding: Automatically rebuild corrupted caches
- Alternative Paths: Fall back to temporary directories if primary locations fail
Integration Points
With Context Management
- Series configurations are loaded from the series directory
- Context cache uses storage utilities for persistence
- Database loading integrates with the cache management system
With Progress Tracking
- Progress metrics are persisted to the data directory
- Temporary run state is stored in cache directory
- Completed runs can optionally archive detailed timing data
With Generation Pipeline
- Each generation phase writes artifacts to appropriate subdirectories
- File existence checks drive caching decisions
- Path resolution ensures consistent artifact locations
This storage architecture provides a robust foundation for reproducible, auditable, and efficient artifact management throughout the generation pipeline.