A feature-rich web interface for interacting with AI models via a local Ollama server. Built with professional UI/UX design, comprehensive error handling, and advanced conversation management capabilities.
This application provides a robust, user-friendly chat interface for communicating with Ollama models (DeepSeek-R1, Llama, Mistral, etc.). Built with production-grade code patterns and modern UI design, it demonstrates best practices for Python AI application development.
- Professional 2-Column UI: Optimized layout with conversation display and organized control panel
- Interactive Parameter Controls: Real-time adjustment of temperature, top-p, top-k, and max tokens
- Conversation Export: Export chat history to JSON (with metadata) or Markdown (human-readable)
- System Prompt Integration: Configurable AI personality and behavior
- Response Status Indicators: Visual progress feedback during generation
- Example Questions: Click-to-populate interactive prompts for quick testing
- Conversation Memory: Context-aware responses with full history management
- Production-Ready Architecture: Comprehensive logging, error handling, and retry logic
- Professional 2-Column Layout:
  - Left panel (70%): Spacious conversation window with 630px height
  - Right panel (30%): Organized controls (input, buttons, export)
- Advanced Settings Accordion:
  - 2-column parameter layout for better visibility
  - Real-time parameter adjustments (temperature, top-p, top-k, max tokens)
  - Parameters apply per-session without modifying config
- Example Questions:
  - 3 topics with 2 questions each (AI Product Management, Environment, Technology Trends)
  - Click-to-populate functionality for instant testing
  - Interactive chips with hover effects
- Enhanced Typography:
  - Larger font sizes for better readability (16px base, 14px for sliders)
  - Properly sized value input boxes (48px height with optimized padding)
  - Dark borders and visual polish throughout
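As a rough illustration of the layout described above, here is a minimal Gradio Blocks sketch. The component choices and proportions are assumptions for illustration, not the actual app.py:

```python
import gradio as gr

# Illustrative sketch only: roughly how the 2-column layout above could be
# declared with Gradio Blocks. Component choices are assumptions, not app.py.
with gr.Blocks(title="DeepSeek-R1 AI Chat Interface") as demo:
    with gr.Row():
        with gr.Column(scale=7):                # left panel (~70%)
            chatbot = gr.Chatbot(height=630)    # spacious conversation window
        with gr.Column(scale=3):                # right panel (~30%)
            msg = gr.Textbox(label="Your message")
            send = gr.Button("Send")
            with gr.Accordion("Advanced Settings", open=False):
                temperature = gr.Slider(0.0, 1.0, value=0.7, label="Temperature")

demo.launch(server_name="127.0.0.1", server_port=7860, share=False)
```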
- System Prompt Support: Configurable AI personality prepended to all prompts
- Conversation Memory: Maintains up to 20 messages (configurable) with full context
- Parameter Tracking: Each message stores its generation parameters
- Timestamp Tracking: ISO 8601 timestamps for all exchanges
- Clear Function: Reset conversation and start fresh
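For concreteness, one stored exchange might look like the following sketch. The field names are assumptions based on the features above; the real schema lives in app.py and common/export_utils.py:

```python
from datetime import datetime

# Illustrative shape of one stored exchange; field names are assumptions.
message = {
    "role": "user",                 # or "assistant"
    "content": "How do I prioritize AI features in a product roadmap?",
    "timestamp": datetime.now().isoformat(timespec="seconds"),  # ISO 8601
    "parameters": {                 # generation settings stored per message
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 40,
        "num_predict": 2048,
    },
}
```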
JSON Export (with metadata):
```json
{
  "export_metadata": {
    "timestamp": "2025-12-03T11:07:06",
    "model": "deepseek-r1:8b",
    "system_prompt": "You are a helpful AI assistant...",
    "total_messages": 1,
    "export_version": "1.0"
  },
  "conversation": [...]
}
```
Markdown Export (human-readable): See the example export for the complete format.
- Configuration Management: YAML-based configuration for easy customization
- Structured Logging: File and console logging with configurable levels
- Retry Logic: Automatic retry with exponential backoff for network resilience
- Error Handling: Comprehensive exception handling with user-friendly messages
- Type Safety: Full type hints for better IDE support and code quality
- Modular Architecture: Reusable common utilities (logging, config, retry, export)
- Response Status: Visual progress indicator with descriptive messages
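As a minimal sketch of the configuration-over-code pattern, loading config.yaml takes only a few lines; the real loader in common/config_loader.py adds validation, defaults, and a singleton:

```python
import yaml  # PyYAML, already in the project's dependencies

# Minimal sketch only; see common/config_loader.py for the full implementation.
def load_config(path: str = "config.yaml") -> dict:
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

config = load_config()
model = config["ollama"]["model_name"]  # e.g. "deepseek-r1:8b"
timeout = config["request"]["timeout"]  # e.g. 120 (seconds)
```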
- Python 3.8+
- Ollama running locally (default: http://localhost:11434)
- Any Ollama model installed (default config: `deepseek-r1:8b`)
```bash
git clone https://github.com/shrimpy8/ollama-chat-interface.git
cd ollama-chat-interface

python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate

pip install -r requirements.txt
```
Download: Get Ollama from https://ollama.com/download
- macOS: Download and install the .dmg file
- Linux: `curl -fsSL https://ollama.com/install.sh | sh`
- Windows: Download and run the installer
Official Documentation: https://docs.ollama.com/
```bash
# Start Ollama server (in terminal 1)
ollama serve

# Pull the DeepSeek-R1 model (in terminal 2)
ollama pull deepseek-r1:8b

# Or try other popular models:
# ollama pull llama3.2   # Meta's Llama 3.2
# ollama pull mistral    # Mistral AI
# ollama pull qwen2.5    # Alibaba's Qwen

# Verify the model is available
ollama list

# Test the API
curl http://localhost:11434/api/version
# Expected output: {"version":"0.x.x"}
```
The application works out of the box with sensible defaults. To customize settings, edit config.yaml:
```yaml
ollama:
  model_name: "deepseek-r1:8b"  # Change to your preferred model
  parameters:
    temperature: 0.7   # Adjust creativity (0.0-1.0)
    top_p: 0.9         # Nucleus sampling
    top_k: 40          # Vocabulary limit
    num_predict: 2048  # Max response length
```
Then start the app:
```bash
python app.py
```
The application will start on http://127.0.0.1:7860 (configurable in config.yaml).
- Open your browser: Navigate to http://127.0.0.1:7860
- Try example questions: Click any of the pre-populated questions
- Or type your message: Enter custom prompts in the input box
- Adjust parameters (optional): Open "Advanced Settings" to fine-tune:
  - Temperature: 0.0 (deterministic) to 1.0 (creative)
  - Top-P: Nucleus sampling threshold
  - Top-K: Vocabulary size limit
  - Max Tokens: Response length limit
- Send message: Click "Send" or press Enter
- View response: Watch the status indicator during generation
- Export conversation: Choose JSON or Markdown format and download
Quick Test with Example Questions:
- Open the interface
- Click "How do I prioritize AI features in a product roadmap?"
- Observe response generation with status indicator
- Continue conversation with follow-up questions
Custom Research Session:
- Type your research question
- Adjust temperature to 0.8 for more creative responses
- Increase max tokens to 4096 for longer answers
- Export the conversation as Markdown for documentation
Parameter Experimentation:
- Ask the same question with different temperatures
- Compare response creativity and determinism
- Export both conversations and analyze differences
```text
ollama-chat-interface/
├── app.py                      # Main application with UI and chat logic
├── config.yaml                 # Centralized configuration
├── common/                     # Shared utilities module
│   ├── __init__.py             # Module exports
│   ├── config_loader.py        # Configuration management
│   ├── logging_config.py       # Structured logging setup
│   ├── retry_utils.py          # Retry decorators and utilities
│   └── export_utils.py         # Conversation export functionality
├── tests/                      # Test suite (63 tests, 76% coverage)
│   ├── test_config_loader.py   # Configuration tests
│   ├── test_retry_utils.py     # Retry logic tests
│   ├── test_logging_config.py  # Logging tests
│   └── test_export_utils.py    # Export functionality tests
├── screenshots/                # Documentation screenshots
│   ├── Complete_Chat_Interface.png
│   └── ollama_conversation_*.md
├── requirements.txt            # Python dependencies
├── requirements-dev.txt        # Development dependencies
├── .env.example                # Environment variable template
├── .gitignore                  # Git ignore patterns
├── pytest.ini                  # Pytest configuration
└── README.md                   # This file
```
All settings are managed via config.yaml for easy customization without code changes:
```yaml
# Ollama Server Configuration
ollama:
  base_url: "http://localhost:11434"
  api_endpoint: "/api/generate"
  model_name: "deepseek-r1:8b"

  # Model Parameters (overridable via UI)
  parameters:
    temperature: 0.7    # Creativity level (0.0 = deterministic, 1.0 = creative)
    top_p: 0.9          # Nucleus sampling
    top_k: 40           # Top-k sampling
    num_predict: 2048   # Maximum tokens to generate

# Request Configuration
request:
  timeout: 120          # Timeout in seconds for API requests
  retry:
    max_attempts: 3     # Maximum retry attempts for failed requests
    min_wait: 2         # Minimum wait time between retries (seconds)
    max_wait: 10        # Maximum wait time between retries (seconds)
    multiplier: 2       # Exponential backoff multiplier

# Logging Configuration
logging:
  level: "INFO"         # Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  file: "ollama_chat.log"  # Log file name
  console: true         # Enable console logging
  file_logging: true    # Enable file logging

# Gradio UI Configuration
ui:
  title: "DeepSeek-R1 AI Chat Interface"
  description: "Chat with DeepSeek-R1 model via local Ollama server"
  theme: "default"      # Gradio theme: default, soft, monochrome
  share: false          # Enable public sharing via gradio.live link
  server:
    port: 7860          # Port to run the server on
    host: "127.0.0.1"   # Host address (127.0.0.1 for local only)

# Conversation Settings
conversation:
  system_prompt: "You are a helpful AI assistant powered by DeepSeek-R1."
  memory_enabled: true  # Enable conversation memory
  context_window: 4096  # Maximum context window in tokens
  history:
    max_messages: 20    # Maximum number of messages to keep in history
    show_timestamps: true  # Show timestamps in chat history
```
Customize settings by editing config.yaml - no code changes needed!
The project includes a comprehensive test suite with 63 tests covering core functionality, achieving 76% code coverage.
```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Run all tests
pytest

# Run with verbose output
pytest -v

# Run with coverage report
pytest --cov=. --cov-report=html --cov-report=term

# Run specific test file
pytest tests/test_export_utils.py

# Run tests by marker
pytest -m unit          # Unit tests only
pytest -m integration   # Integration tests only
```
Configuration Management (26 tests):
- YAML configuration loading and validation
- Default value handling
- Error handling for missing/invalid files
- All configuration getter methods
- Singleton pattern implementation
Retry Logic (15 tests):
- Retry decorator creation and configuration
- Exponential backoff timing
- Connection error handling
- HTTP error handling (404, 500)
- Timeout and request exception handling
- Safe API call wrapper with fallback values
Logging (14 tests):
- Logger setup and configuration
- Console and file handlers
- Log level configuration
- Log message formatting
- File creation and writing
- Error handling for invalid paths
Export Utilities (8 tests):
- JSON export with metadata
- Markdown export formatting
- Empty conversation handling
- Unicode support (Chinese, Arabic, emoji)
- Parameter preservation
- Structure validation
Coverage reports: htmlcov/index.html (detailed interactive report)
The system prompt is automatically prepended to all prompts, defining the AI's personality and behavior:
```yaml
# Configured in config.yaml
conversation:
  system_prompt: "You are a helpful AI assistant powered by DeepSeek-R1."
```
The system prompt is automatically included in every API call: the user sees a normal conversation, while the model receives the prompt as additional context.
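As a hedged sketch, one plausible way to combine the system prompt and history before calling /api/generate looks like this; the helper name and exact text format are assumptions, not the app's actual code:

```python
from typing import Dict, List

# Illustrative only: combine system prompt, history, and the new message.
def build_prompt(system_prompt: str, history: List[Dict], user_message: str) -> str:
    lines = [system_prompt]
    for msg in history:  # e.g. {"role": "user", "content": "..."}
        lines.append(f"{msg['role'].capitalize()}: {msg['content']}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")
    return "\n".join(lines)
```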
Adjust model behavior in real-time via the Advanced Settings accordion:
| Parameter | Range | Description | Impact |
|---|---|---|---|
| Temperature | 0.0-1.0 | Creativity level | 0.0 = deterministic, 1.0 = creative |
| Top-P | 0.0-1.0 | Nucleus sampling | Higher = more diverse vocabulary |
| Top-K | 1-100 | Vocabulary limit | Higher = more variety in responses |
| Max Tokens | 256-4096 | Response length | Higher = longer responses |
Note: UI parameters override config defaults for the current session only. Changes are not persisted.
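These parameters map onto the `options` field of Ollama's `/api/generate` endpoint. A minimal sketch (values illustrative):

```python
import requests

# Send one generation request with explicit sampling options.
payload = {
    "model": "deepseek-r1:8b",
    "prompt": "Explain nucleus sampling in one paragraph.",
    "stream": False,
    "options": {
        "temperature": 0.7,   # creativity level
        "top_p": 0.9,         # nucleus sampling threshold
        "top_k": 40,          # vocabulary limit
        "num_predict": 2048,  # maximum tokens to generate
    },
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])
```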
Visual feedback during response generation:
- 🤔 Thinking... - Processing user input
- ⏳ Generating response... - Model is generating
- ✅ Complete! - Response delivered
Export your conversations with full metadata:
JSON Format Features:
- Complete metadata (timestamp, model, system prompt, version)
- All conversation exchanges with parameters
- Structured for programmatic access
- Unicode support (Chinese, Arabic, emoji)
Markdown Format Features:
- Human-readable format for documentation
- Includes metadata header
- System prompt display
- Timestamped exchanges
- Parameter annotations for each message
File Naming: ollama_conversation_YYYY-MM-DD_HHMMSS.{json|md}
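A minimal sketch of producing such an export, assuming the metadata fields shown in the JSON example above; the function name is illustrative, and the real logic lives in common/export_utils.py:

```python
import json
from datetime import datetime
from typing import Dict, List

# Illustrative JSON export following the metadata and file-naming scheme above.
def export_json(conversation: List[Dict], model: str, system_prompt: str) -> str:
    stamp = datetime.now()
    data = {
        "export_metadata": {
            "timestamp": stamp.isoformat(timespec="seconds"),
            "model": model,
            "system_prompt": system_prompt,
            "total_messages": len(conversation),
            "export_version": "1.0",
        },
        "conversation": conversation,
    }
    filename = f"ollama_conversation_{stamp.strftime('%Y-%m-%d_%H%M%S')}.json"
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)  # keep Unicode readable
    return filename
```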
Network requests automatically retry on failure with exponential backoff:
- Attempt 1: Immediate
- Attempt 2: Wait 2 seconds
- Attempt 3: Wait 4 seconds
- Maximum wait: 10 seconds
- Handles: Connection errors, timeouts, HTTP errors, request exceptions
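With Tenacity (the project's retry library), this policy can be expressed roughly as follows; the decorated function is illustrative, not the app's exact code:

```python
import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

# Roughly the retry policy described above, expressed with Tenacity.
@retry(
    stop=stop_after_attempt(3),                          # max_attempts: 3
    wait=wait_exponential(multiplier=2, min=2, max=10),  # 2s, 4s, capped at 10s
    retry=retry_if_exception_type(requests.exceptions.RequestException),
)
def call_ollama(payload: dict) -> dict:
    resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
    resp.raise_for_status()  # HTTP errors also trigger a retry
    return resp.json()
```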
Comprehensive error handling for all scenarios:
| Error Type | User Message | Recovery Action |
|---|---|---|
| Connection Error | "Cannot connect to Ollama server" | Check if Ollama is running |
| Timeout | "Request timed out after 120 seconds" | Try shorter prompt or increase timeout |
| Model Not Found | "404 Client Error: model not found" | Check model name in config.yaml |
| Empty Response | "Received empty response" | Retry request |
| Unexpected Error | "An unexpected error occurred" | Check logs for details |
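A hedged sketch of how these exception types might map to the user messages in the table; the structure is illustrative only, not the app's exact code:

```python
import requests

# Illustrative mapping from exception type to user-facing message.
def safe_generate(payload: dict) -> str:
    try:
        resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
        resp.raise_for_status()
        text = resp.json().get("response", "")
        return text if text else "Received empty response"
    except requests.exceptions.ConnectionError:
        return "Cannot connect to Ollama server"
    except requests.exceptions.Timeout:
        return "Request timed out after 120 seconds"
    except requests.exceptions.HTTPError as exc:
        return str(exc)  # e.g. "404 Client Error: model not found"
    except Exception:
        return "An unexpected error occurred"
```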
Application Won't Start
- Verify all dependencies installed: `pip install -r requirements.txt`
- Check Python version: `python --version` (3.8+ required)
- Ensure config.yaml exists and is valid YAML
Cannot Connect to Ollama
```bash
# Check if Ollama is running
ps aux | grep ollama

# Start Ollama if not running
ollama serve

# Verify API is accessible
curl http://localhost:11434/api/version
```
Model Not Found (404 Error)
```bash
# List available models
ollama list

# Pull the required model (match config.yaml model_name)
ollama pull deepseek-r1:8b
```
Slow Responses
- DeepSeek-R1 is a reasoning model and may take longer
- Check your system resources (CPU/RAM usage)
- Try a smaller model: `ollama pull llama3.2` (faster)
- Reduce `num_predict` in config.yaml or via the UI
Values Cut Off in Slider Inputs
- This issue has been resolved with optimized padding (1px top, 22px-23px bottom)
- If still experiencing issues, check your browser zoom level (should be 100%)
Port Already in Use
```yaml
# Change port in config.yaml
ui:
  server:
    port: 8080  # Use a different port
```
All operations are logged to ollama_chat.log. For detailed debugging:
- Enable debug logging:
  ```yaml
  # config.yaml
  logging:
    level: "DEBUG"
  ```
- Check the log file:
  ```bash
  tail -f ollama_chat.log
  ```
- Test the API directly:
  ```bash
  curl -X POST http://localhost:11434/api/generate \
    -H "Content-Type: application/json" \
    -d '{"model":"deepseek-r1:8b","prompt":"Hello","stream":false}'
  ```
- Conversation Memory: Automatically trims to `max_messages` to stay within token limits
- Request Timeout: Default 120 seconds (configurable)
- Retry Logic: Maximum 3 attempts with exponential backoff
- Token Usage: Monitor `num_predict` to control response length and cost
- Model Selection: Smaller models (7B-8B) respond faster than larger models (70B+)
- UI Responsiveness: Status indicators provide real-time feedback during generation
- Local Deployment: Runs on 127.0.0.1 by default (localhost only)
- No API Keys: Uses local Ollama - no external API credentials needed
- Public Sharing: Disabled by default (`share: false` in config.yaml)
- Input Validation: All user inputs validated before processing
- Error Sanitization: Error messages sanitized to avoid information leakage
- No Data Collection: All conversations stay local
Edit config.yaml:
```yaml
ollama:
  model_name: "llama3.2"  # Change to your preferred model
```
Ensure the model is installed:
```bash
ollama pull llama3.2
```
| Model | Size | Best For | Speed |
|---|---|---|---|
| deepseek-r1:8b | 8B | Reasoning, analysis | Medium |
| llama3.2 | 3B-70B | General chat, versatile | Fast-Slow |
| mistral | 7B | Efficient, balanced | Fast |
| qwen2.5 | 7B-72B | Multilingual, coding | Fast-Medium |
| gemma2 | 2B-27B | Lightweight, efficient | Very Fast |
```yaml
ui:
  title: "My Custom AI Chat"
  description: "Powered by Ollama and Python"
  theme: "soft"  # Options: default, soft, monochrome
  share: true    # Enable public sharing (use with caution)
  server:
    port: 8080   # Custom port
```
To customize the system prompt:
```yaml
conversation:
  system_prompt: "You are an expert Python developer and educator. Provide clear, detailed explanations with code examples."
```
For stateless interactions:
```yaml
conversation:
  memory_enabled: false
```
This project demonstrates production-ready Python development:
- Configuration over code: Settings externalized to YAML
- Observability: Comprehensive logging for debugging
- Resilience: Retry logic for network failures
- Modularity: Reusable utilities for common operations
- Type safety: Full type hints for better IDE support
- User Experience: Professional UI with real-time feedback
- Documentation: Clear docstrings and comprehensive README
- Type hints on all functions and methods
- Google-style docstrings
- Specific exception handling (no bare `except`)
- Structured logging throughout
- Configuration-driven behavior
- No hardcoded values
- 76% test coverage
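For example, a function following these conventions might look like this illustrative sketch:

```python
from typing import Dict, List

# Illustrative function showing the conventions above: full type hints and a
# Google-style docstring.
def trim_history(history: List[Dict], max_messages: int = 20) -> List[Dict]:
    """Trim conversation history to the configured maximum length.

    Args:
        history: Prior exchanges, oldest first.
        max_messages: Maximum number of messages to keep.

    Returns:
        The most recent max_messages entries of history.
    """
    return history[-max_messages:]
```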
The modular architecture makes it easy to extend:
Adding a new feature:
- Add configuration to config.yaml
- Update ConfigLoader with getter methods (see the sketch after this list)
- Implement the feature in app.py
- Add logging and error handling
- Write tests in tests/
- Update the README documentation
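As a hypothetical illustration of the ConfigLoader step, a new getter might look like this; the class body is a stand-in, so adapt it to the real common/config_loader.py internals:

```python
# Hypothetical stand-in for the project's ConfigLoader; internals assumed.
class ConfigLoader:
    def __init__(self, config: dict):
        self.config = config

    def get_feature_enabled(self) -> bool:
        """Return the new feature's flag from config.yaml, defaulting to False."""
        return self.config.get("feature", {}).get("enabled", False)
```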
Creating custom utilities:
- Add a new module to common/
- Export it in common/__init__.py
- Import it in app.py as needed
- Write comprehensive tests
This application utilizes the following technologies:
- Ollama - Local LLM runtime and API
- Gradio - Web UI framework
- DeepSeek-R1 - Reasoning-focused language model
- Tenacity - Retry library with exponential backoff
- PyYAML - YAML configuration parser
- Pytest - Testing framework
- Ollama Official Documentation
- Ollama Model Library
- Gradio Documentation
- DeepSeek-R1 Model Card
- Python Type Hints Guide
Made with ❤️ using Ollama, Gradio, and Python
For issues, feature requests, or contributions, visit the GitHub repository
