File Sync Tool

A Python-based file synchronization tool that monitors local directories and syncs changes to a storage backend (AWS S3 or a local backup directory).

Features

  • Real-time monitoring: Uses watchdog to detect file changes as they occur
  • Multiple backends: Supports S3 and local backup, and is extensible to other cloud providers
  • Conflict resolution: Configurable strategies (keep local, keep remote, keep both, manual) for files changed on both sides
  • Incremental sync: Compares checksums so only changed files are synced
  • Ignore patterns: Configurable file/folder exclusions
  • Compression: Optional compression for bandwidth savings
  • Encryption: Optional AES encryption for sensitive files
  • Resume support: Handles interrupted syncs gracefully
  • Detailed logging: Track all sync operations

Project Structure

05_file_sync/
├── README.md
├── requirements.txt
├── .env.example
├── sync/
│   ├── __init__.py
│   ├── __main__.py
│   ├── config.py
│   ├── watcher.py
│   ├── sync_manager.py
│   ├── backends/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── s3.py
│   │   └── local.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── hashing.py
│   │   ├── compression.py
│   │   └── encryption.py
│   └── state.py
└── tests/
    ├── __init__.py
    ├── conftest.py
    └── test_sync.py

Learning Concepts

Core Python Skills

  • Async I/O: Using asyncio for non-blocking operations
  • File system watching: watchdog library for monitoring changes
  • Abstract base classes: Backend interface design
  • Context managers: Resource management for files and connections
  • Generators: Streaming large file operations

Advanced Topics

  • AWS SDK (boto3): Cloud storage integration
  • Threading: Background sync workers
  • Hashing: MD5/SHA256 for change detection
  • Compression: zlib/gzip for file compression
  • Encryption: AES encryption with cryptography library

Design Patterns

  • Strategy pattern: Swappable storage backends
  • Observer pattern: File change notifications
  • Singleton: Configuration management
  • Factory pattern: Backend creation
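
The Strategy and Factory patterns can be sketched together as follows. This is an illustration only: `StorageBackend`, `InMemoryBackend`, and `create_backend` are hypothetical names, and the real interface in sync/backends/base.py may look different.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Interface every backend implements (Strategy pattern)."""

    @abstractmethod
    def upload(self, relative_path: str, data: bytes) -> None:
        ...

    @abstractmethod
    def download(self, relative_path: str) -> bytes:
        ...


class InMemoryBackend(StorageBackend):
    """Toy backend for illustration; keeps file contents in a dict."""

    def __init__(self):
        self._files = {}

    def upload(self, relative_path, data):
        self._files[relative_path] = data

    def download(self, relative_path):
        return self._files[relative_path]


# Factory: map a BACKEND_TYPE-style string to a backend class.
_BACKENDS = {"memory": InMemoryBackend}


def create_backend(backend_type: str) -> StorageBackend:
    """Instantiate the backend registered under backend_type."""
    try:
        return _BACKENDS[backend_type]()
    except KeyError:
        raise ValueError(f"Unknown backend type: {backend_type!r}") from None
```

Because callers only depend on `StorageBackend`, adding a new provider means writing one subclass and registering it in the factory mapping.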

Installation

cd 05_file_sync
pip install -r requirements.txt

Configuration

  1. Copy .env.example to .env:
cp .env.example .env
  2. Configure your settings:
# Watch directory
WATCH_DIR=/path/to/sync

# Backend type: s3, local
BACKEND_TYPE=local

# Local backup settings
BACKUP_DIR=/path/to/backup

# S3 settings (if using S3)
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
AWS_REGION=us-east-1
S3_BUCKET=your-bucket

# Optional settings
COMPRESSION_ENABLED=true
ENCRYPTION_ENABLED=false
ENCRYPTION_KEY=your-32-byte-key-here

# Ignore patterns (comma-separated)
IGNORE_PATTERNS=.git,*.pyc,__pycache__,*.tmp
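
A rough sketch of how these settings might be read and how the ignore patterns could be applied. `Settings`, `load_settings`, and `is_ignored` are hypothetical names, the default values are assumptions, and loading .env into the process environment is left out. The sketch treats each pattern as a glob matched against every path component.

```python
import os
from dataclasses import dataclass
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Settings:
    watch_dir: str
    backend_type: str
    compression_enabled: bool
    ignore_patterns: tuple


def load_settings(env=os.environ):
    """Build Settings from a mapping of environment variables."""
    patterns = env.get("IGNORE_PATTERNS", "")
    return Settings(
        watch_dir=env.get("WATCH_DIR", "."),  # default is an assumption
        backend_type=env.get("BACKEND_TYPE", "local"),
        compression_enabled=env.get("COMPRESSION_ENABLED", "false").lower() == "true",
        ignore_patterns=tuple(p.strip() for p in patterns.split(",") if p.strip()),
    )


def is_ignored(relative_path, patterns):
    """True if any component of a '/'-separated path matches a glob pattern."""
    parts = PurePosixPath(relative_path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)
```

With the patterns from the example above, `is_ignored(".git/config", settings.ignore_patterns)` is true because the `.git` component matches, while `src/main.py` is not ignored.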

Usage

Basic Usage

# Start file sync daemon
python -m sync

# Or using the entry point
file-sync start

# Initial full sync
file-sync sync --full

# Sync specific directory
file-sync sync --path /path/to/folder

Command Line Options

# Show sync status
file-sync status

# List pending changes
file-sync list

# Force re-sync all files
file-sync sync --force

# Dry run (show what would sync)
file-sync sync --dry-run

# Watch mode (continuous monitoring)
file-sync watch

# Restore files from backup
file-sync restore --date 2024-01-15
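
For readers curious how subcommands like these can be wired up, here is a sketch using the standard library's argparse. It mirrors only the commands and flags listed above; `build_parser` is a hypothetical name and the project's real CLI code may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser for the file-sync subcommands shown in this README."""
    parser = argparse.ArgumentParser(prog="file-sync")
    sub = parser.add_subparsers(dest="command", required=True)

    # Subcommands that take no extra options in the examples above.
    for name in ("start", "status", "list", "watch"):
        sub.add_parser(name)

    sync = sub.add_parser("sync")
    sync.add_argument("--full", action="store_true")
    sync.add_argument("--force", action="store_true")
    sync.add_argument("--dry-run", action="store_true")
    sync.add_argument("--path")

    restore = sub.add_parser("restore")
    restore.add_argument("--date")
    return parser
```

Calling `build_parser().parse_args(["sync", "--dry-run"])` yields a namespace with `command == "sync"` and `dry_run` set to `True`.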

Programmatic Usage

import asyncio

from sync import SyncManager, S3Backend, LocalBackend


async def main():
    # Create sync manager with S3
    manager = SyncManager(
        source_dir="/path/to/watch",
        backend=S3Backend(bucket="my-bucket", region="us-east-1"),
    )

    # Start watching
    await manager.start()

    # Manual sync
    await manager.sync_all()

    # Stop watching
    await manager.stop()


# await is only valid inside a coroutine, so run main() in an event loop
asyncio.run(main())

Architecture

Sync Flow

┌─────────────────┐
│  File Watcher   │ ─── Detects changes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Event Queue    │ ─── Debounces rapid changes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Sync Manager   │ ─── Coordinates sync operations
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌───────┐ ┌───────┐
│  S3   │ │ Local │ ─── Storage backends
└───────┘ └───────┘
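
The debouncing step in the event queue can be sketched like this. `Debouncer` is a hypothetical class, not the project's actual queue: it remembers the most recent event time per path and releases a path only after it has been quiet for a configurable delay. The clock is injectable so the behavior can be tested without sleeping.

```python
import time


class Debouncer:
    """Collapse bursts of events per path into one, after a quiet period."""

    def __init__(self, delay=0.5, clock=time.monotonic):
        self._delay = delay
        self._clock = clock
        self._last_seen = {}  # path -> time of the most recent event

    def record(self, path):
        """Note that a change event arrived for path."""
        self._last_seen[path] = self._clock()

    def pop_ready(self):
        """Return and forget paths quiet for at least `delay` seconds."""
        now = self._clock()
        ready = [
            path
            for path, seen in self._last_seen.items()
            if now - seen >= self._delay
        ]
        for path in ready:
            del self._last_seen[path]
        return ready
```

An editor that saves a file several times within a second then triggers a single sync for that path instead of one per write.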

State Management

The sync tool maintains a state file (.sync_state.json) that tracks:

  • File checksums for change detection
  • Last sync timestamps
  • Pending uploads/downloads
  • Conflict information
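
A minimal sketch of reading and writing such a state file. The JSON layout (`files`, `pending`, `conflicts`) and the function names are assumptions made for illustration; the real schema lives in sync/state.py and is not shown here. Writing to a temporary file and renaming it avoids leaving a half-written state file if the process is interrupted.

```python
import json
import os
import tempfile
import time


def load_state(state_path):
    """Return the saved state, or an empty state if no file exists yet."""
    try:
        with open(state_path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"files": {}, "pending": [], "conflicts": []}


def save_state(state_path, state):
    """Write state atomically: temp file first, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_path, state_path)


def mark_synced(state, relative_path, checksum):
    """Record a file's checksum and sync time, and clear it from pending."""
    state["files"][relative_path] = {
        "checksum": checksum,
        "last_sync": time.time(),
    }
    if relative_path in state["pending"]:
        state["pending"].remove(relative_path)
```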

Conflict Resolution

When a conflict is detected (a file changed both locally and remotely), one of four strategies applies:

  1. Keep Local: Local file overwrites remote
  2. Keep Remote: Remote file overwrites local
  3. Keep Both: Creates .conflict backup
  4. Manual: Prompts user for decision

# Configure conflict resolution
manager = SyncManager(
    source_dir="/path/to/watch",
    conflict_resolution="keep_both"  # keep_local, keep_remote, keep_both, manual
)
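
To make the strategies concrete, here is a sketch of conflict detection and resolution planning. Both functions are hypothetical, and the action names are invented for illustration. Detection compares the local and remote checksums against the checksum recorded at the last sync. For `keep_both`, the sketch assumes the local copy is the one preserved under the .conflict name, which the README does not specify.

```python
def detect_conflict(base_checksum, local_checksum, remote_checksum):
    """True when both sides changed since the last sync and now differ."""
    local_changed = local_checksum != base_checksum
    remote_changed = remote_checksum != base_checksum
    return local_changed and remote_changed and local_checksum != remote_checksum


def plan_resolution(strategy, relative_path):
    """Translate a strategy name into an ordered list of (action, path) steps."""
    if strategy == "keep_local":
        return [("upload", relative_path)]
    if strategy == "keep_remote":
        return [("download", relative_path)]
    if strategy == "keep_both":
        # Assumption: the local copy is set aside as <name>.conflict
        # before the remote version is downloaded.
        return [
            ("rename_local", relative_path + ".conflict"),
            ("download", relative_path),
        ]
    if strategy == "manual":
        return [("ask_user", relative_path)]
    raise ValueError(f"Unknown conflict strategy: {strategy!r}")
```

Returning a plan rather than performing the I/O directly keeps the decision logic easy to test and lets a dry run print the steps without executing them.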

Security

Encryption

Files can be encrypted before upload:

from sync.utils.encryption import FileEncryptor

encryptor = FileEncryptor(key="your-32-byte-key")

# Encrypt file
encrypted_path = encryptor.encrypt_file("secret.txt")

# Decrypt file
decrypted_path = encryptor.decrypt_file("secret.txt.enc")

Credential Management

  • Never commit .env files
  • Use AWS IAM roles in production
  • Rotate encryption keys regularly

Testing

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=sync

# Run specific test file
pytest tests/test_sync.py -v

Exercises

  1. Add Google Drive backend: Implement a new backend for Google Drive
  2. Add file versioning: Keep multiple versions of synced files
  3. Add bandwidth limiting: Implement rate limiting for uploads
  4. Add sync scheduling: Add cron-like scheduling for syncs
  5. Add progress bars: Show real-time sync progress

Troubleshooting

Common Issues

"Permission denied" errors

# Check directory permissions
ls -ld /path/to/watch

# Give the owner full access (read/execute for others) if needed
chmod 755 /path/to/watch

S3 connection issues

# Verify AWS credentials
aws sts get-caller-identity

High memory usage

  • Enable streaming for large files
  • Reduce concurrent sync limit
  • Increase debounce timeout
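
As an example of what streaming means for large files, the sketch below compresses a file with gzip while holding only one chunk in memory at a time. `gzip_file` is a hypothetical helper and is not taken from sync/utils/compression.py.

```python
import gzip
import shutil


def gzip_file(source_path, dest_path, chunk_size=1024 * 1024):
    """Compress source_path into dest_path, copying chunk_size bytes at a time."""
    with open(source_path, "rb") as src, gzip.open(dest_path, "wb") as dst:
        # copyfileobj reads and writes in chunks instead of loading the file.
        shutil.copyfileobj(src, dst, chunk_size)
    return dest_path
```

Memory use stays close to `chunk_size` regardless of file size, whereas calling `src.read()` with no argument would load the whole file at once.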

License

MIT License - Educational use
