Troubleshooting Guide
Audience: Ops, Dev
You will learn:
- Systematic problem diagnosis for the icon tool
- Common problems and their solutions
- Debugging techniques and monitoring tools
- Escalation paths and support procedures
Prerequisites:
- Familiarity with the Operations Runbook
- Basic system administration knowledge
- Access to logs and monitoring tools
General Diagnosis Strategy
1. Problem Categorization
graph TD
A[Problem detected] --> B{Service running?}
B -->|Yes| C{Health check OK?}
B -->|No| D[Service problem]
C -->|Yes| E{Performance OK?}
C -->|No| F[Functional problem]
E -->|Yes| G[User problem]
E -->|No| H[Performance problem]
D --> I[Service restart]
F --> J[Data validation]
H --> K[Resource analysis]
G --> L[User support]
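The decision tree above can also be written as a small function. This is a minimal sketch and not part of the icon tool: the name `categorize_problem` and its three boolean inputs are assumptions made for illustration, while the categories and first actions are taken from the diagram.

```python
def categorize_problem(service_running: bool, health_ok: bool, performance_ok: bool) -> tuple[str, str]:
    """Walk the diagnosis tree and return (category, first action).

    Hypothetical helper: the inputs correspond to the three questions
    in the diagram (service running, health check OK, performance OK).
    """
    if not service_running:
        return ("Service problem", "Service restart")
    if not health_ok:
        return ("Functional problem", "Data validation")
    if not performance_ok:
        return ("Performance problem", "Resource analysis")
    # Everything looks fine on the server side, so the issue is user-side.
    return ("User problem", "User support")
```

The checks are ordered exactly as in the diagram, so a stopped service is always reported first, regardless of the other two inputs.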
2. Standard Diagnostic Flow
#!/bin/bash
# scripts/diagnose.sh
echo "=== Icon Tool Diagnostics ==="
echo "Timestamp: $(date)"
# 1. Service Status
echo -e "\n🔍 Service Status:"
systemctl is-active icon-tool && echo "✓ Service running" || echo "❌ Service stopped"
# 2. Health Check
echo -e "\n🏥 Health Check:"
if curl -s -f http://localhost:5000/health > /dev/null; then
    echo "✓ Health check passed"
    curl -s http://localhost:5000/health | jq .
else
    echo "❌ Health check failed"
    curl -s http://localhost:5000/health || echo "No response"
fi
# 3. Resource Usage
echo -e "\n📊 Resource Usage:"
# The [p] bracket keeps grep from matching its own process
echo "Memory: $(ps aux | grep '[p]ython app.py' | awk '{sum+=$6} END {print sum/1024 " MB"}')"
echo "CPU: $(ps aux | grep '[p]ython app.py' | awk '{sum+=$3} END {print sum "%"}')"
echo "Disk: $(df -h /opt/icon-tool | tail -1 | awk '{print $5}')"
# 4. Recent Errors
echo -e "\n❌ Recent Errors:"
tail -50 /opt/icon-tool/logs/icon-tool.log | grep ERROR | tail -5
# 5. Connection Status
echo -e "\n🌐 Network Status:"
echo "Active connections: $(netstat -an | grep :5000 | grep ESTABLISHED | wc -l)"
echo "Listening: $(netstat -an | grep :5000 | grep LISTEN | wc -l)"
# 6. File System
echo -e "\n📁 File System:"
echo "Icons count: $(ls /opt/icon-tool/static/icons/*.svg 2>/dev/null | wc -l)"
echo "Metadata exists: $([ -f /opt/icon-tool/icons.json ] && echo "Yes" || echo "No")"
Evidence: Operational troubleshooting patterns
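The "Recent Errors" step of the script (last 50 log lines, filtered for `ERROR`, last 5 matches) can be reproduced in Python when the result is needed programmatically. This is a sketch under assumptions: the function name `recent_errors` and its parameters are hypothetical, and the log is passed in as a list of lines rather than read from the log file.

```python
def recent_errors(log_lines, window=50, limit=5, marker="ERROR"):
    """Return the last `limit` lines containing `marker` among the last `window` lines.

    Mirrors: tail -50 icon-tool.log | grep ERROR | tail -5
    """
    tail = log_lines[-window:]                      # tail -50
    errors = [line for line in tail if marker in line]  # grep ERROR
    return errors[-limit:]                          # tail -5
```

With the log path used in this guide, the input could be obtained with `Path("/opt/icon-tool/logs/icon-tool.log").read_text().splitlines()`.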
Service-Level Problems
Problem: Service does not start
Symptoms
systemctl status icon-tool
# ● icon-tool.service - ak Systems Icon Management Tool
# Loaded: loaded (/etc/systemd/system/icon-tool.service; enabled; vendor preset: enabled)
# Active: failed (Result: exit-code) since Sun 2025-08-24 10:30:15 UTC; 2min ago
Diagnosis
# 1. Check detailed logs
journalctl -u icon-tool --no-pager
# 2. Identify Python errors by running the app directly, without systemd
python /opt/icon-tool/app.py
# 3. Dependency check
cd /opt/icon-tool
python -c "import flask; print('Flask OK')"
python -c "import json; print('JSON OK')"
# 4. Check file permissions
ls -la /opt/icon-tool/
ls -la /opt/icon-tool/static/icons/
Common Causes & Solutions
Cause | Symptom | Solution |
---|---|---|
Missing dependencies | ModuleNotFoundError: No module named 'flask' | pip install flask |
Permission denied | PermissionError: [Errno 13] | chown -R iconuser:icongroup /opt/icon-tool/ |
Port already in use | Address already in use | lsof -i :5000 and terminate the process |
Missing icons | FileNotFoundError: static/icons | node extract-icons.js |
Corrupt metadata | json.decoder.JSONDecodeError | Restore icons.json from backup |
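The cause table can be turned into a simple lookup that maps an error message from `journalctl` to the matching cause and fix. This is an illustrative sketch only: `KNOWN_CAUSES` and `suggest_fix` are hypothetical names, and the matching is a plain substring search on the symptoms listed in the table.

```python
# (substring to look for, cause, suggested fix), taken from the table of common causes
KNOWN_CAUSES = [
    ("ModuleNotFoundError", "Missing dependencies", "pip install flask"),
    ("PermissionError", "Permission denied", "chown -R iconuser:icongroup /opt/icon-tool/"),
    ("Address already in use", "Port already in use", "lsof -i :5000 and terminate the process"),
    ("FileNotFoundError", "Missing icons", "node extract-icons.js"),
    ("JSONDecodeError", "Corrupt metadata", "Restore icons.json from backup"),
]


def suggest_fix(error_text):
    """Return (cause, fix) for the first known symptom found in error_text, else None."""
    for needle, cause, fix in KNOWN_CAUSES:
        if needle in error_text:
            return cause, fix
    return None
```

For example, `suggest_fix("PermissionError: [Errno 13] Permission denied")` returns the "Permission denied" row; an unknown message returns `None` and should be investigated manually.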
Solution Steps
# Standard repair sequence
cd /opt/icon-tool
# 1. Repair dependencies
pip install flask
# 2. Re-extract icons
node extract-icons.js
# 3. Fix permissions
sudo chown -R iconuser:icongroup .
sudo chmod 755 .
sudo chmod 644 icons.json
sudo chmod 755 static/icons/
sudo chmod 644 static/icons/*.svg
# 4. Restart the service
sudo systemctl restart icon-tool
# 5. Validation
sleep 5
curl -f http://localhost:5000/health
Evidence: systemd service configuration, common startup issues
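The fixed `sleep 5` before the final health check may be too short on a slow host and needlessly long on a fast one. A polling loop is one alternative. The sketch below is an assumption-laden illustration: `wait_until_healthy` and `http_probe` are hypothetical helpers, and the only detail taken from this guide is the `/health` URL on port 5000.

```python
import time
import urllib.error
import urllib.request


def http_probe(url="http://localhost:5000/health", timeout=2.0):
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe, attempts=5, delay=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times; return the successful attempt number or None."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before the next try, but not after the last one
    return None
```

A typical call after `systemctl restart icon-tool` would be `wait_until_healthy(http_probe, attempts=10, delay=1.0)`; a return value of `None` means the service never became healthy within the allowed attempts.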
Problem: Service is running but not reachable
Symptoms
Diagnosis
# 1. Check port binding
netstat -tulpn | grep :5000
# Should show: python app.py listening on 0.0.0.0:5000
# 2. Check the firewall
ufw status
iptables -L | grep 5000
# 3. Process status
ps aux | grep "[p]ython app.py"
# 4. Application logs
tail -f /opt/icon-tool/logs/icon-tool.log
Solutions
# Resolve a port conflict
sudo lsof -i :5000
sudo kill <PID> # If another process is blocking the port
# Add a firewall rule
sudo ufw allow 5000
# Check the Flask binding (in app.py)
# app.run(host='0.0.0.0', port=5000) # Not just 127.0.0.1
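Whether the port actually accepts connections can also be checked with a short TCP connect test. The helper below is a minimal sketch; `is_port_open` is a hypothetical name, and only port 5000 comes from this guide.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Running `is_port_open("127.0.0.1", 5000)` on the server and the same call with the server's external address from another machine helps separate the two failure modes discussed here: if the first succeeds and the second fails, the app is likely bound to 127.0.0.1 only or a firewall is blocking the port.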
Functional Problems
Problem: Icons are not displayed
Symptoms
curl http://localhost:5000/ # 200 OK
curl http://localhost:5000/api/icons # {"icons": [], "categories": {}}
Diagnosis
# 1. Check the icons directory
ls -la /opt/icon-tool/static/icons/
# Should contain *.svg files
# 2. Check the metadata
cat /opt/icon-tool/icons.json | jq .
# Should be valid JSON with categories
# 3. Analyze the API response
curl -s http://localhost:5000/api/icons | jq '.icons | length'
# Should be > 0
# 4. File permissions
ls -la /opt/icon-tool/static/icons/ | head -5
Solutions
# Re-extract icons
cd /opt/icon-tool
node extract-icons.js
# Expected output:
# ✓ 162 Icons erfolgreich extrahiert
# ✓ ZIP-Archiv erstellt: 69.234 bytes
# Validation
ls static/icons/*.svg | wc -l # Should be 162
curl -s http://localhost:5000/api/icons | jq '.icons | length' # Should be 162
Evidence: extract-icons.js output, app.py icon loading logic
Problem: Category filter does not work
Symptoms
// Browser Console
fetch('/api/icons').then(r => r.json()).then(d => console.log(d.categories))
// {} (empty object instead of categories)
Diagnosis
# 1. Check the metadata file
cat /opt/icon-tool/icons.json | jq 'keys'
# Should list the category names
# 2. Validate JSON syntax
python3 -c "import json; json.load(open('/opt/icon-tool/icons.json')); print('Valid JSON')"
# 3. Test the backend response
curl -s http://localhost:5000/api/icons | jq '.categories | keys'
Solutions
# Repair the metadata
cd /opt/icon-tool
# Create a backup
cp icons.json icons.json.backup
# Re-categorize (if icons.json is corrupt)
python3 -c "
import json
import os
icons = sorted(f for f in os.listdir('static/icons') if f.endswith('.svg'))
categories = {'Uncategorized': icons}
with open('icons.json', 'w') as f:
    json.dump(categories, f, indent=2)
print(f'Created basic categorization for {len(icons)} icons')
"
# Restart the service
systemctl restart icon-tool
# Validation
curl -s http://localhost:5000/api/icons | jq '.categories | keys'
Evidence: icons.json structure, app.py category loading
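A file can be valid JSON and still have the wrong shape for the category filter. The repair script in this section writes `icons.json` as an object that maps category names to lists of SVG file names, so a structural check along those lines is a useful second step. This is a sketch under that assumption; `validate_metadata` is a hypothetical helper, and the real loader in app.py may accept other shapes.

```python
def validate_metadata(data):
    """Return a list of problems found in parsed icons.json data (empty list = OK).

    Assumed shape: {"Category name": ["file.svg", ...], ...}
    """
    if not isinstance(data, dict):
        return ["top level must be an object mapping category names to lists"]
    problems = []
    for category, icons in data.items():
        if not isinstance(icons, list):
            problems.append(f"{category}: expected a list of file names")
            continue
        for name in icons:
            if not isinstance(name, str) or not name.endswith(".svg"):
                problems.append(f"{category}: unexpected entry {name!r}")
    return problems
```

It can be applied to the parsed file with `validate_metadata(json.load(open("icons.json")))`; an empty result means the structure matches the assumed shape.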
Performance Problems
Problem: Slow API responses
Symptoms
Diagnosis
# 1. Response time breakdown (curl-format.txt holds the -w output template)
curl -w "@curl-format.txt" -o /dev/null http://localhost:5000/api/icons
# 2. Check system load
uptime
iostat -x 1 5
# 3. Memory usage
free -h
ps aux | grep "[p]ython" | awk '{print $6/1024 " MB"}'
# 4. File system performance
time ls /opt/icon-tool/static/icons/ | wc -l
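A single measurement can be misleading, so it helps to time several requests and compare minimum, median, and maximum before and after a change. The helper below is a minimal sketch: `time_calls` is a hypothetical name, and it times any callable passed to it rather than assuming a particular HTTP client.

```python
import statistics
import time


def time_calls(fetch, runs=5):
    """Call fetch() `runs` times and return min/median/max wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

For the endpoint used in this guide, `fetch` could be `lambda: urllib.request.urlopen("http://localhost:5000/api/icons").read()`. A large gap between median and maximum points to intermittent stalls rather than a uniformly slow handler.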
Solutions
# 1. Enable caching
# A variable exported in the shell does not reach a systemd service;
# set Environment=ENABLE_CACHING=true in the unit file, then:
systemctl daemon-reload
systemctl restart icon-tool
# 2. Reduce the number of icons (temporarily)
cd /opt/icon-tool
mkdir static/icons-backup
mv static/icons/*.svg static/icons-backup/
cp static/icons-backup/home.svg static/icons-backup/user.svg static/icons/
# 3. Monitor system resources
top -p "$(pgrep -of "python app.py")"
# 4. Optimize disk I/O
# Move the icons to an SSD
sudo mkdir -p /mnt/ssd/icons
sudo cp -r static/icons/* /mnt/ssd/icons/
# Move the original directory aside first; otherwise ln creates the link inside it
sudo mv static/icons static/icons-local
sudo ln -sfn /mnt/ssd/icons static/icons
Problem: High memory usage
Symptoms
ps aux | grep "[p]ython app.py"
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
# iconuser 1234 1.0 25.0 500000 250000 ? S 10:30 0:05 python app.py
# RSS > 100MB is suspicious
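Diagnosis
# 1. Memory profiling of the running app process
# (psutil.Process() without a PID would measure the one-off interpreter instead)
APP_PID=$(pgrep -of "python app.py")
python3 -c "
import psutil
p = psutil.Process($APP_PID)
print(f'Memory: {p.memory_info().rss / 1024 / 1024:.1f} MB')
print(f'CPU: {p.cpu_percent(interval=1.0)}%')
"
# 2. Identify memory leaks
# Monitor over time
while true; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(ps -o rss= -p "$APP_PID" | awk '{print $1/1024 " MB"}')"
    sleep 60
done
# 3. Object count in a Python process
# Note: this counts objects in the interpreter started here, not in the app;
# to inspect the app, run the same call inside the app process.
python3 -c "
import gc
print(f'Objects in memory: {len(gc.get_objects())}')
"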
Solutions
# 1. Set a memory limit (systemd)
# In /etc/systemd/system/icon-tool.service:
# MemoryMax=100M (MemoryLimit=100M is the deprecated cgroup-v1 equivalent)
# 2. Schedule a periodic restart (crontab entry)
0 */6 * * * systemctl restart icon-tool # Every 6 hours
# 3. Tune caching
# A variable exported in the shell does not reach a systemd service;
# set Environment=ICON_CACHE_TIMEOUT=60 (shorter cache lifetime) in the unit file, then:
systemctl daemon-reload
systemctl restart icon-tool
# 4. Force Python garbage collection
systemctl kill -s USR1 icon-tool # Only if a USR1 handler is implemented
Evidence: Performance monitoring, resource optimization
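The `systemctl kill -s USR1` step only has an effect if the app installs a handler for that signal; without one, the default action for SIGUSR1 terminates the process. The following is a minimal sketch of what such a handler could look like. It is an assumption, not the tool's actual code: `force_collection`, `handle_usr1`, and `install_gc_handler` are hypothetical names, and the handler would have to be installed from the main thread during app startup.

```python
import gc
import signal


def force_collection():
    """Run a full garbage collection and return the number of unreachable objects found."""
    return gc.collect()


def handle_usr1(signum, frame):
    """Signal handler: trigger a collection and log the result to stdout."""
    collected = force_collection()
    print(f"Signal {signum}: gc.collect() found {collected} unreachable objects", flush=True)


def install_gc_handler():
    """Register the handler for SIGUSR1 (POSIX only; call from the main thread)."""
    signal.signal(signal.SIGUSR1, handle_usr1)
```

Forcing a collection only helps with reference cycles that have not been collected yet. Memory held by live objects, such as an oversized cache, is not released this way.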
Data Integrity Problems
Problem: Icon metadata inconsistent
Symptoms
curl -s http://localhost:5000/api/icons | jq '.icons | length' # 162
cat icons.json | jq '[.[]] | flatten | length' # 158
# Mismatch between files and metadata
Diagnosis
# Consistency check script (run from /opt/icon-tool)
python3 -c "
import json
from pathlib import Path
# Load metadata
with open('icons.json') as f:
    categories = json.load(f)
# Get categorized icons
categorized = set()
for cat_icons in categories.values():
    categorized.update(cat_icons)
# Get actual files
icons_dir = Path('static/icons')
actual = {f.name for f in icons_dir.glob('*.svg')}
print(f'Actual files: {len(actual)}')
print(f'Categorized: {len(categorized)}')
uncategorized = actual - categorized
missing = categorized - actual
if uncategorized:
    print(f'Uncategorized files: {uncategorized}')
if missing:
    print(f'Missing files: {missing}')
if not uncategorized and not missing:
    print('✅ Metadata is consistent')
else:
    print('❌ Metadata inconsistency detected')
"
Solutions
# Automatic repair (run from /opt/icon-tool)
python3 -c "
import json
from pathlib import Path
# Load current metadata; fall back to an empty mapping if it is missing or corrupt
try:
    with open('icons.json') as f:
        categories = json.load(f)
except (OSError, json.JSONDecodeError):
    categories = {}
# Get all actual SVG files
icons_dir = Path('static/icons')
actual_files = {f.name for f in icons_dir.glob('*.svg')}
# Remove missing files from categories
for category in categories:
    categories[category] = [f for f in categories[category] if f in actual_files]
# Add uncategorized files
categorized = set()
for cat_icons in categories.values():
    categorized.update(cat_icons)
uncategorized = actual_files - categorized
if uncategorized:
    if 'Uncategorized' not in categories:
        categories['Uncategorized'] = []
    categories['Uncategorized'].extend(sorted(uncategorized))
# Save repaired metadata
with open('icons.json', 'w') as f:
    json.dump(categories, f, indent=2)
print(f'✅ Repaired metadata for {len(actual_files)} files')
"
# Restart the service
systemctl restart icon-tool
# Validation
curl -s http://localhost:5000/api/icons | jq '.icons | length'
Problem: Corrupted SVG Files
Symptoms
Diagnosis
# Check SVG integrity (a file may start with an XML declaration, so search the whole file)
for svg in /opt/icon-tool/static/icons/*.svg; do
    if ! grep -q "<svg" "$svg"; then
        echo "Corrupted: $svg"
    fi
done
# Check file types
file /opt/icon-tool/static/icons/*.svg | grep -v "SVG"
# Validate SVG syntax
xmllint --noout /opt/icon-tool/static/icons/home.svg 2>&1 || echo "Invalid XML"
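If `xmllint` is not installed, the same two checks (well-formed XML and an `svg` root element) can be done with the Python standard library. This is a sketch, not the tool's own validation: `is_valid_svg` is a hypothetical helper, and it checks structure only, not whether the drawing itself is meaningful.

```python
import xml.etree.ElementTree as ET

# Root tag as reported by ElementTree when the SVG namespace is declared
SVG_NS_TAG = "{http://www.w3.org/2000/svg}svg"


def is_valid_svg(data: bytes) -> bool:
    """Return True if data is well-formed XML whose root element is <svg>."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Truncated, empty, or otherwise malformed file
        return False
    return root.tag in ("svg", SVG_NS_TAG)
```

To scan the whole directory, call it on `path.read_bytes()` for every `path` in `Path("/opt/icon-tool/static/icons").glob("*.svg")` and report the files for which it returns `False`.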
Solutions
# Re-extract all icons from scratch
cd /opt/icon-tool
# Back up the current icons
mv static/icons static/icons-corrupted-$(date +%Y%m%d_%H%M%S)
# Re-extraction
node extract-icons.js
# Validation
ls static/icons/*.svg | wc -l
head -1 static/icons/home.svg | grep "<svg"
# Restart the service
systemctl restart icon-tool
Evidence: SVG generation process, file integrity checks
Network & Connectivity
Problem: External access does not work
Symptoms
Diagnosis
# 1. Check the binding address
netstat -tulpn | grep :5000
# Should show 0.0.0.0:5000, not 127.0.0.1:5000
# 2. Firewall status
ufw status verbose
iptables -L INPUT -v
# 3. Network interface
ip addr show
ping external-ip # From another system
Solutions
# 1. Fix the Flask binding
# In app.py: app.run(host='0.0.0.0', port=5000)
# 2. Add a firewall rule
sudo ufw allow 5000/tcp
sudo iptables -A INPUT -p tcp --dport 5000 -j ACCEPT
# 3. Restart the service
systemctl restart icon-tool
# 4. Validation
netstat -tulpn | grep :5000 # Should show 0.0.0.0:5000
Problem: Load balancer health checks fail
Symptoms
Diagnosis
# 1. Test the health endpoint directly
curl -v http://localhost:5000/health
# 2. Check response headers
curl -I http://localhost:5000/health
# 3. Load balancer configuration
nginx -t # Syntax check
cat /etc/nginx/sites-enabled/icon-tool
Solutions
# 1. Dedicated health check route
# Extend app.py:
@app.route('/lb-health')
def lb_health():
    return 'OK', 200
# 2. Adjust the nginx configuration
upstream icon_tool {
    server 127.0.0.1:5000;
    # Keepalive settings for upstream connections
    keepalive 32;
    keepalive_requests 100;
    keepalive_timeout 60s;
}
location /health {
    proxy_pass http://icon_tool/lb-health;
    proxy_read_timeout 5s;
}
# 3. Restart the service and reload nginx
systemctl restart icon-tool
nginx -s reload
Evidence: Network configuration, load balancer integration
Escalation & Support
Support-Level Matrix
Problem Severity | Response Time | Escalation Path |
---|---|---|
Critical (Service Down) | 15 minutes | On-call → Dev Team Lead |
High (Performance) | 1 hour | Dev Team → Architecture |
Medium (Feature Issues) | 4 hours | Support → Dev Team |
Low (Documentation) | 24 hours | Support → Documentation |
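For alerting or ticket automation, the matrix can be encoded as data. The sketch below is illustrative: `ESCALATION` and `escalation_for` are hypothetical names, the response times are converted to minutes, and the values are copied from the table above.

```python
# Severity -> response time (minutes) and escalation path, taken from the support-level matrix
ESCALATION = {
    "critical": {"response_minutes": 15, "path": ["On-call", "Dev Team Lead"]},
    "high": {"response_minutes": 60, "path": ["Dev Team", "Architecture"]},
    "medium": {"response_minutes": 240, "path": ["Support", "Dev Team"]},
    "low": {"response_minutes": 1440, "path": ["Support", "Documentation"]},
}


def escalation_for(severity):
    """Look up the matrix entry for a severity name (case-insensitive).

    Raises KeyError for unknown severities so typos are not silently ignored.
    """
    return ESCALATION[severity.strip().lower()]
```

For example, `escalation_for("Critical")` yields a 15-minute response time with the path On-call, then Dev Team Lead.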
Information Gathering for Support
#!/bin/bash
# scripts/support-info.sh
echo "=== Support Information Package ==="
echo "Generated: $(date)"
echo "System: $(uname -a)"
# 1. System Status
echo -e "\n## System Status"
systemctl status icon-tool --no-pager
uptime
free -h
df -h
# 2. Application Health
echo -e "\n## Application Health"
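# jq exits 0 on empty input, so test curl's exit status to detect a failed check
curl -sf http://localhost:5000/health 2>/dev/null | jq . ; [ "${PIPESTATUS[0]}" -eq 0 ] || echo "Health check failed"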
# 3. Recent Logs
echo -e "\n## Recent Logs (Last 50 lines)"
tail -50 /opt/icon-tool/logs/icon-tool.log
# 4. Configuration
echo -e "\n## Configuration"
echo "Flask Environment: $FLASK_ENV"
echo "Python Version: $(python --version)"
echo "Node Version: $(node --version)"
# 5. File System Status
echo -e "\n## File System"
ls -la /opt/icon-tool/ | head -10
echo "Icon count: $(ls /opt/icon-tool/static/icons/*.svg 2>/dev/null | wc -l)"
echo "Metadata size: $(wc -l < /opt/icon-tool/icons.json)"
# 6. Network Status
echo -e "\n## Network"
netstat -tulpn | grep :5000
ss -tulpn | grep :5000
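# Package to send to support
# Assumes this script's output was captured in /tmp/support-info.txt,
# e.g. by running: scripts/support-info.sh | tee /tmp/support-info.txt
tar -czf support-package-$(date +%Y%m%d_%H%M%S).tar.gz \
    --exclude='*.svg' \
    /tmp/support-info.txt \
    /opt/icon-tool/logs/icon-tool.log \
    /opt/icon-tool/icons.json
echo -e "\n✅ Support package created: support-package-*.tar.gz"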
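Remote Debugging
# Secure remote debugging session
# 1. SSH tunnel for secure access
ssh -L 5000:localhost:5000 user@production-server
# 2. Temporarily enable debug mode (caution!)
# A shell export does not reach a systemd service, so set the variable in the
# systemd manager environment (it applies to services started afterwards)
sudo systemctl set-environment FLASK_DEBUG=1
systemctl restart icon-tool
# 3. After debugging: disable debug mode
sudo systemctl unset-environment FLASK_DEBUG
systemctl restart icon-tool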
Incident Documentation
# Incident Report Template
## Incident Summary
- **Date/Time:** 2025-08-24 14:30 UTC
- **Duration:** 15 minutes
- **Severity:** High
- **Root Cause:** Disk space exhaustion
## Timeline
- 14:30 - Alert triggered: Service unhealthy
- 14:32 - Investigation started
- 14:35 - Root cause identified: /opt full
- 14:40 - Mitigation: Log rotation and cleanup
- 14:45 - Service restored and validated
## Impact
- **Users Affected:** All users (estimated 50)
- **Service Degradation:** Complete outage
- **Data Loss:** None
## Resolution
- Immediate: Freed disk space by rotating logs
- Long-term: Automated disk cleanup cron job
## Prevention
- [ ] Implement disk usage monitoring
- [ ] Automated log rotation
- [ ] Disk usage alerts at 80%
## Lessons Learned
- Need proactive monitoring for disk usage
- Log rotation should be automatic
- Better alerting thresholds needed
Evidence: Incident response best practices, support workflows
Troubleshooting Toolkit:
- [ ] Standard diagnostic script available
- [ ] Common issues playbook documented
- [ ] Escalation matrix defined
- [ ] Support information gathering automated
- [ ] Remote debugging procedures tested