---
name: python-standards
version: 1.0.0
type: knowledge
description: Python code quality standards (PEP 8, type hints, docstrings). Use when writing Python code.
keywords: python, pep8, type hints, docstrings, black, isort, formatting
auto_activate: true
allowed-tools: [Read]
---

# Python Standards Skill

Python code quality standards for [PROJECT_NAME] project.

## When This Activates
- Writing Python code
- Code formatting
- Type hints
- Docstrings
- Keywords: "python", "format", "type", "docstring"

## Code Style (PEP 8 + Black)

### Formatting
- **Line length**: 100 characters (black --line-length=100)
- **Indentation**: 4 spaces (no tabs)
- **Quotes**: Double quotes preferred
- **Imports**: Sorted with isort

### Running Formatters
```bash
# Black
black --line-length=100 src/ tests/

# isort
isort --profile=black --line-length=100 src/ tests/

# Combined (automatic via hooks)
black src/ && isort src/
```

## Type Hints (Required)

### Function Signatures
```python
from pathlib import Path
from typing import Optional, List, Dict, Union, Tuple


def process_file(
    input_path: Path,
    output_path: Optional[Path] = None,
    *,
    max_lines: int = 1000,
    validate: bool = True
) -> Dict[str, any]:
    """Process file with type hints on all parameters and return."""
    pass
```

### Generic Types
```python
from typing import List, Dict, Set, Tuple, Optional, Union

# Collections
items: List[str] = ["a", "b", "c"]
mapping: Dict[str, int] = {"a": 1, "b": 2}
unique: Set[int] = {1, 2, 3}
pair: Tuple[str, int] = ("key", 42)

# Optional (can be None)
maybe_value: Optional[str] = None

# Union (multiple types)
flexible: Union[str, int] = "text"
```

### Class Type Hints
```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class APIConfig:
    """API configuration with type hints."""

    base_url: str
    api_key: str
    timeout: int = 30
    max_retries: int = 3
    enable_cache: bool = True

    # Class variable
    DEFAULT_TIMEOUT: ClassVar[int] = 30
```

## Docstrings (Google Style)

### Function Docstrings
```python
def process_data(
    data: List[Dict],
    *,
    batch_size: int = 32,
    validate: bool = True
) -> ProcessResult:
    """Process data with validation and batching.

    This function processes input data in batches with optional
    validation. It handles errors gracefully and provides detailed results.

    Args:
        data: Input data as list of dicts with 'id' and 'content' keys
        batch_size: Number of items to process per batch (default: 32)
        validate: Whether to validate input data (default: True)

    Returns:
        ProcessResult containing processed items, errors, and metrics

    Raises:
        ValueError: If data is empty or invalid format
        ValidationError: If validation fails

    Example:
        >>> data = [{"id": 1, "content": "text"}]
        >>> result = process_data(data, batch_size=10)
        >>> print(result.success_count)
        1
    """
    pass
```

### Class Docstrings
```python
class DataProcessor:
    """Data processing orchestrator for batch operations.

    This class handles the complete data processing workflow including
    validation, transformation, batching, and error handling.

    Args:
        config: Processing configuration
        batch_size: Number of items per batch
        validate: Whether to validate input data

    Attributes:
        config: Processing configuration
        batch_size: Configured batch size
        metrics: Processing metrics tracker

    Example:
        >>> processor = DataProcessor(config, batch_size=100)
        >>> result = processor.process(input_data)
        >>> processor.save("results.json")
    """

    def __init__(
        self,
        config: APIConfig,
        device: str = "gpu"
    ):
        self.model_name = model_name
        self.config = config
        self.device = device
```

## Error Handling

### Helpful Error Messages
```python
# ✅ GOOD: Context + Expected + Docs
def load_config(path: Path) -> Dict:
    """Load configuration file."""
    if not path.exists():
        raise FileNotFoundError(
            f"Config file not found: {path}\n"
            f"Expected YAML file with keys: model, data, training\n"
            f"See example: docs/examples/config.yaml\n"
            f"See guide: docs/guides/configuration.md"
        )

    try:
        with open(path) as f:
            return yaml.safe_load(f)
    except yaml.YAMLError as e:
        raise ValueError(
            f"Invalid YAML in config file: {path}\n"
            f"Error: {e}\n"
            f"See guide: docs/guides/configuration.md"
        )


# ❌ BAD: Generic error
def load_config(path):
    if not path.exists():
        raise FileNotFoundError("File not found")
```

### Custom Exceptions
```python
class AppError(Exception):
    """Base exception for application."""
    pass


class ConfigError(AppError):
    """Configuration error."""
    pass


class ValidationError(AppError):
    """Validation error."""
    pass


# Usage
def validate_config(config: Dict) -> None:
    """Validate configuration."""
    required = ["database", "api_key", "settings"]
    missing = [k for k in required if k not in config]

    if missing:
        raise ConfigError(
            f"Missing required config keys: {missing}\n"
            f"Required: {required}\n"
            f"See: docs/guides/configuration.md"
        )
```

## Code Organization

### Imports Order (isort)
```python
# 1. Standard library
import os
import sys
from pathlib import Path

# 2. Third-party
import [framework].core as mx
import numpy as np
from anthropic import Anthropic

# 3. Local
from [project_name].core.trainer import Trainer
from [project_name].utils.config import load_config
```

### Function/Class Organization
```python
class Model:
    """Model class."""

    # 1. Class variables
    DEFAULT_LR = 1e-4

    # 2. __init__
    def __init__(self, name: str):
        self.name = name

    # 3. Public methods
    def train(self, data: List) -> None:
        """Public training method."""
        pass

    # 4. Private methods
    def _prepare_data(self, data: List) -> List:
        """Private helper method."""
        pass

    # 5. Properties
    @property
    def num_parameters(self) -> int:
        """Number of trainable parameters."""
        return sum(p.size for p in self.parameters())
```

## Naming Conventions

```python
# Classes: PascalCase
class ModelTrainer:
    pass

# Functions/variables: snake_case
def train_model():
    training_data = []

# Constants: UPPER_SNAKE_CASE
MAX_SEQUENCE_LENGTH = 2048
DEFAULT_LEARNING_RATE = 1e-4

# Private: _leading_underscore
def _internal_helper():
    pass

_internal_cache = {}
```

## Best Practices

### Use Keyword-Only Arguments
```python
# ✅ GOOD: Clear, prevents positional errors
def train(
    data: List,
    *,
    learning_rate: float = 1e-4,
    batch_size: int = 32
):
    pass

# Must use: train(data, learning_rate=1e-3)


# ❌ BAD: Easy to mix up arguments
def train(data, learning_rate=1e-4, batch_size=32):
    pass
```

### Use Pathlib
```python
from pathlib import Path

# ✅ GOOD: Pathlib
config_path = Path("config.yaml")
if config_path.exists():
    content = config_path.read_text()

# ❌ BAD: String paths
import os
if os.path.exists("config.yaml"):
    with open("config.yaml") as f:
        content = f.read()
```

### Use Context Managers
```python
# ✅ GOOD: Automatic cleanup
with open(path) as f:
    data = f.read()

# ✅ GOOD: Custom context manager
from contextlib import contextmanager

@contextmanager
def training_context():
    """Setup/teardown for training."""
    setup_training()
    try:
        yield
    finally:
        cleanup_training()
```

### Use Dataclasses
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Config:
    """Configuration with dataclass."""

    model_name: str
    learning_rate: float = 1e-4
    epochs: int = 3
    tags: List[str] = field(default_factory=list)

    def __post_init__(self):
        """Validate after initialization."""
        if self.learning_rate <= 0:
            raise ValueError("Learning rate must be positive")


# Usage
config = Config(model_name="model", tags=["test"])
```

## Code Quality Checks

### Flake8 (Linting)
```bash
flake8 src/ --max-line-length=100
```

### MyPy (Type Checking)
```bash
mypy src/[project_name]/
```

### Coverage
```bash
pytest --cov=src/[project_name] --cov-fail-under=80
```

## File Organization

```
src/[project_name]/
├── __init__.py              # Package init
├── core/                    # Core functionality
│   ├── __init__.py
│   ├── trainer.py
│   └── model.py
├── backends/                # Backend implementations
│   ├── __init__.py
│   ├── mlx_backend.py
│   └── pytorch_backend.py
├── cli/                     # CLI tools
│   ├── __init__.py
│   └── main.py
└── utils/                   # Utilities
    ├── __init__.py
    ├── config.py
    └── logging.py
```

## Anti-Patterns to Avoid

```python
# ❌ BAD: No type hints
def process(data, config):
    pass

# ❌ BAD: No docstring
def train_model(data, lr=1e-4):
    pass

# ❌ BAD: Unclear names
def proc(d, c):
    x = d['k']
    return x

# ❌ BAD: Mutable default argument
def add_item(items=[]):
    items.append("new")
    return items

# ❌ BAD: Generic exception
try:
    process()
except:
    pass
```

## Key Takeaways

1. **Type hints** - Required on all public functions
2. **Docstrings** - Google style, with examples
3. **Black formatting** - 100 char line length
4. **isort imports** - Sorted and organized
5. **Helpful errors** - Context + expected + docs link
6. **Pathlib** - Use Path not string paths
7. **Keyword args** - Use * for clarity
8. **Dataclasses** - For configuration objects