Initial commit: hospital evacuation portal placement script

Fetches hospital data from OSM + Wikipedia, calculates priority (1-5),
outputs CSV with WGS84 portal coordinates for Russia, US, EU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dmitriy Kazimirov 2026-03-31 13:54:01 +06:00
commit dff93187c5
4 changed files with 530 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,8 @@
__pycache__/
*.py[cod]
output/
*.egg-info/
dist/
build/
.env
.venv/

CLAUDE.md Normal file

@@ -0,0 +1,31 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project
Python script that fetches hospital data from OpenStreetMap (Overpass API) + Wikipedia, calculates evacuation portal placement priority (1-5), and outputs `output/portals.csv` with WGS84 coordinates.
Single-file project: `portals.py`. No tests, no build system.
## Commands
```bash
pip install requests shapely # dependencies
python portals.py # full run (~15-20 min, rate-limited APIs)
```
## Architecture
`portals.py` — 4-stage pipeline in sequence:
1. **`fetch_all_osm()`** — Queries Overpass API for `amenity=hospital`. Russia split by 85+ ISO 3166-2 regions to avoid timeouts; US and EU fetched per-country. Uses mirror rotation (`_overpass_post`) on failure.
2. **`fill_missing_cities()`** — Nominatim reverse geocoding for hospitals without `addr:city`. Rate-limited to 1 req/s per Nominatim policy.
3. **`enrich_with_wiki()`** — MediaWiki API search to extract `beds` counts from infoboxes. Emergency hospitals are enriched first, then the first 500 regular ones. Searches enwiki, falling back to ruwiki for Russia.
4. **`calculate_priority()` / `write_csv()`** — Score-based priority tiers (P1-P5). Score = `beds * 2.0` (emergency) or `beds * 1.0`; fallback scores 200/50. Absolute thresholds: P5 >= 1000, P4 >= 400, P3 >= 150, P2 >= 50, P1 < 50.
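The mirror rotation in stage 1 can be exercised without network access; a minimal sketch with an injected fake transport (`query_with_fallback` and `fake_post` are illustrative names for this sketch, not functions in `portals.py`):

```python
# Sketch of the mirror-rotation strategy behind _overpass_post: try each
# endpoint in order until one succeeds. The "post" callable is injected so
# the sketch runs offline (an assumption for illustration only).
def query_with_fallback(mirrors, post):
    last_err = None
    for url in mirrors:
        try:
            return post(url)
        except Exception as e:
            last_err = e
    raise RuntimeError(f"All mirrors failed: {last_err}")

calls = []
def fake_post(url):
    calls.append(url)
    if "kumi" not in url:  # simulate the primary mirror timing out
        raise TimeoutError("timeout")
    return {"elements": []}

result = query_with_fallback(
    ["https://overpass-api.de/api/interpreter",
     "https://overpass.kumi.systems/api/interpreter"],
    fake_post,
)
print(len(calls))  # 2: primary failed, fallback answered
```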
## Key constraints
- All external APIs are free/public with strict rate limits — respect delays between requests.
- `shapely` is listed as a dependency but not used yet (polygon area calculation stub).
- CSV output uses `utf-8-sig` encoding for Excel compatibility.
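The `utf-8-sig` constraint can be verified in isolation: the codec prepends a BOM so Excel auto-detects UTF-8 (relevant for Cyrillic hospital names). A self-contained sketch (file name is illustrative):

```python
import csv
from pathlib import Path
from tempfile import TemporaryDirectory

# Write one sample row the way write_csv does, then inspect the raw bytes.
rows = [("Russia", "Moscow", "Городская больница", 5, 55.75, 37.62)]
with TemporaryDirectory() as tmp:
    out = Path(tmp) / "portals.csv"
    with open(out, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["country", "city", "hospital_name",
                         "priority", "latitude", "longitude"])
        writer.writerows(rows)
    raw = out.read_bytes()

# utf-8-sig emits the UTF-8 BOM as the first three bytes.
print(raw[:3] == b"\xef\xbb\xbf")  # True
```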

README.md Normal file

@@ -0,0 +1,68 @@
# Emergency Evacuation Portal Placement
Python script that finds optimal placement points for emergency evacuation portals, based on hospital data from OpenStreetMap and Wikipedia.
## Output
A CSV file `output/portals.csv` with the columns:
| Column | Description |
|-----------------|--------------------------------------------------|
| country | Country |
| city | City |
| hospital_name | Name of the medical facility |
| priority | Portal placement priority (1-5, 5 = highest) |
| latitude | Latitude (WGS84) |
| longitude | Longitude (WGS84) |
## Installing dependencies
```bash
pip install requests shapely
```
## Running
```bash
python portals.py
```
The script runs through 4 stages in sequence:
1. **OSM**: fetch hospitals from the Overpass API (Russia by federal subject, the US as a whole, 20 EU countries)
2. **Nominatim**: fill in cities for hospitals missing the `addr:city` tag
3. **Wikipedia**: supplement bed counts from the English/Russian Wikipedia
4. **Priority**: compute priorities and write the CSV
A full run takes 15-20 minutes.
## Coverage
| Region | Countries |
|--------------|------------------------------------------------------------------------|
| Russia | 85+ federal subjects (by ISO 3166-2) |
| USA | All states |
| Europe | DE, FR, NL, IT, ES, PL, SE, NO, FI, DK, AT, CZ, BE, CH, PT, IE, GR, HU, RO, BG |
## Priorities
The priority is computed from:
- **Bed count** (`beds`): the main factor. Taken from the OSM `beds=*` tag or from the Wikipedia infobox.
- **Emergency care** (`emergency=yes` in OSM): a x2 multiplier on the score.
- **Fallbacks**: without bed data, an emergency hospital gets a baseline score of 200, a regular one 50.
| Priority | Score | Criterion |
|-----------|-----------------|-------------------------------------------------------|
| P5 | >= 1000 | >=500 beds (emergency) or >=1000 beds (regular) |
| P4 | >= 400 | >=200 beds (emergency) or >=400 beds (regular) |
| P3 | >= 150 | >=75 beds (emergency) or >=150 beds (regular), or emergency without bed data |
| P2 | >= 50 | >=25 beds (emergency) or >=50 beds (regular), or a regular hospital without bed data |
| P1 | < 50 | Small facilities |
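The scoring and tiering rules above can be condensed into a standalone function (`score_hospital` is an illustrative name for this sketch, not a function in `portals.py`):

```python
def score_hospital(beds, emergency):
    """Priority tier per the rules above: beds x2 for emergency,
    fallback scores 200 (emergency) / 50 (regular) when beds are unknown."""
    if beds:  # None or 0 falls through to the fallback scores
        score = beds * 2.0 if emergency else beds * 1.0
    else:
        score = 200 if emergency else 50
    # Absolute thresholds: P5 >= 1000, P4 >= 400, P3 >= 150, P2 >= 50
    for priority, threshold in ((5, 1000), (4, 400), (3, 150), (2, 50)):
        if score >= threshold:
            return priority
    return 1

# An emergency hospital with 300 beds scores 600 -> P4
print(score_hospital(300, True))    # 4
# A regular hospital with no bed data falls back to score 50 -> P2
print(score_hospital(None, False))  # 2
```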
## Limitations
- **OSM data**: the `beds=*` tag is set for only a fraction of hospitals (coverage is worst for Russia, best for Germany/Netherlands).
- **Nominatim**: limited to 1 request/sec, so the city-filling stage is slow for tens of thousands of hospitals. It can be skipped if cities are not critical.
- **Wikipedia**: enrichment runs only for emergency hospitals and the first 500 regular ones (to cap API calls).
- **Overpass API**: on timeouts the script automatically fails over to mirror endpoints (`lz4`, `z`, `kumi`).
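The Wikipedia enrichment ultimately reduces to a few regexes over raw infobox wikitext; a sketch on a sample page (`extract_beds` is an illustrative name; `portals.py` uses a larger pattern list):

```python
import re

# Patterns matching "| beds = 950" style infobox parameters.
PATTERNS = [
    r"\|\s*beds\s*=\s*(\d+)",
    r"\|\s*capacity\s*=\s*(\d+)",
]

def extract_beds(wikitext):
    """Return the first beds/capacity number found in the wikitext, else None."""
    for pat in PATTERNS:
        m = re.search(pat, wikitext, re.IGNORECASE)
        if m:
            return int(m.group(1))
    return None

sample = "{{Infobox hospital\n| name = Example Clinic\n| beds = 950\n}}"
print(extract_beds(sample))  # 950
```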

portals.py Normal file

@@ -0,0 +1,423 @@
"""
Emergency Evacuation Portal Placement Script
Fetches hospital data from OpenStreetMap + Wikipedia,
calculates priority, outputs CSV with portal coordinates (WGS84).
"""
import csv
import json
import math
import re
import time
from pathlib import Path
import requests
OVERPASS_MIRRORS = [
"https://overpass-api.de/api/interpreter",
"https://lz4.overpass-api.de/api/interpreter",
"https://z.overpass-api.de/api/interpreter",
"https://overpass.kumi.systems/api/interpreter",
]
WIKI_API = "https://{lang}.wikipedia.org/w/api.php"
NOMINATIM_URL = "https://nominatim.openstreetmap.org/reverse"
# Russia is queried per ISO 3166-2 region to avoid Overpass timeouts; the US and EU are fetched per-country
RU_REGIONS = [
"RU-AD", "RU-AL", "RU-AM", "RU-ARK", "RU-AST", "RU-BA", "RU-BEL",
"RU-BRY", "RU-BU", "RU-CE", "RU-CHE", "RU-CHU", "RU-CU", "RU-DA",
"RU-IN", "RU-KL", "RU-KC", "RU-KDA", "RU-KEM", "RU-KGD", "RU-KIR",
"RU-KO", "RU-KOS", "RU-KR", "RU-KGN", "RU-KRS", "RU-KYA", "RU-LEN",
"RU-LIP", "RU-MAG", "RU-ME", "RU-MO", "RU-MOS", "RU-MUR", "RU-NEN",
"RU-NGR", "RU-NIZ", "RU-NVS", "RU-OMS", "RU-ORE", "RU-ORL", "RU-PER",
"RU-PNZ", "RU-PRI", "RU-PSK", "RU-ROS", "RU-RYA", "RU-SA", "RU-SAK",
"RU-SAM", "RU-SAR", "RU-SE", "RU-SMO", "RU-SPE", "RU-STA", "RU-SVE",
"RU-TA", "RU-TAM", "RU-TOM", "RU-TUL", "RU-TVE", "RU-TY", "RU-TYU",
"RU-UD", "RU-ULY", "RU-VLA", "RU-VLG", "RU-VOR", "RU-YAR", "RU-ZAB",
"RU-MOW", "RU-SPE", # federal cities
]
EU_COUNTRIES = [
"DE", "FR", "NL", "IT", "ES", "PL", "SE", "NO", "FI", "DK",
"AT", "CZ", "BE", "CH", "PT", "IE", "GR", "HU", "RO", "BG",
]
ALL_COUNTRIES = ["RU", "US"] + EU_COUNTRIES
COUNTRY_NAMES = {
"RU": "Russia", "US": "United States", "DE": "Germany", "FR": "France",
"NL": "Netherlands", "IT": "Italy", "ES": "Spain", "PL": "Poland",
"SE": "Sweden", "NO": "Norway", "FI": "Finland", "DK": "Denmark",
"AT": "Austria", "CZ": "Czech Republic", "BE": "Belgium", "CH": "Switzerland",
"PT": "Portugal", "IE": "Ireland", "GR": "Greece", "HU": "Hungary",
"RO": "Romania", "BG": "Bulgaria",
}
# Priority score thresholds (mirrors the tiers inlined in calculate_priority; not referenced directly)
PRIORITY_TIERS = [0, 50, 150, 400, 1000]
def _overpass_post(query, timeout=300):
"""Try Overpass mirrors until one responds."""
last_err = None
for url in OVERPASS_MIRRORS:
try:
resp = requests.post(url, data={"data": query}, timeout=timeout)
if resp.status_code == 200:
return resp.json()
last_err = f"HTTP {resp.status_code}"
except Exception as e:
last_err = str(e)
raise RuntimeError(f"All Overpass mirrors failed: {last_err}")
def overpass_query(iso_code):
"""Fetch all hospitals for a country via Overpass API."""
query = f"""
[out:json][timeout:180];
area["ISO3166-1"="{iso_code}"]->.a;
(
node["amenity"="hospital"](area.a);
way["amenity"="hospital"](area.a);
relation["amenity"="hospital"](area.a);
);
out center tags;
"""
return _overpass_post(query)
def overpass_query_area(area_code):
"""Fetch hospitals by admin area code (ISO3166-2)."""
query = f"""
[out:json][timeout:60];
area["ISO3166-2"="{area_code}"]->.a;
(
node["amenity"="hospital"](area.a);
way["amenity"="hospital"](area.a);
relation["amenity"="hospital"](area.a);
);
out center tags;
"""
return _overpass_post(query, timeout=120)
def parse_osm_element(elem):
"""Extract hospital fields from an OSM element."""
tags = elem.get("tags", {})
name = tags.get("name", tags.get("name:en", ""))
if not name:
return None
lat = lon = None
if elem["type"] == "node":
lat = elem.get("lat")
lon = elem.get("lon")
else:
center = elem.get("center", {})
lat = center.get("lat")
lon = center.get("lon")
if lat is None or lon is None:
return None
beds_str = tags.get("beds", "")
beds = None
if beds_str and beds_str.strip().isdigit():
beds = int(beds_str)
emergency = tags.get("emergency", "").lower() == "yes"
# Try multiple OSM tags for city
city = (
tags.get("addr:city", "")
or tags.get("is_in:city", "")
or tags.get("addr:town", "")
or tags.get("addr:village", "")
or ""
)
return {
"name": name,
"city": city,
"lat": lat,
"lon": lon,
"beds": beds,
"emergency": emergency,
"area": None, # polygon area not available in center-mode
}
def fetch_all_osm():
"""Fetch hospitals for all target countries.
Large countries (RU, US) are split by region to avoid timeouts.
EU countries are fetched as a whole.
"""
hospitals = []
total_parts = len(RU_REGIONS) + 1 + len(EU_COUNTRIES) # RU regions + US + EU
done = 0
def _add(elements, country_code):
nonlocal hospitals, done
for elem in elements:
h = parse_osm_element(elem)
if h:
h["country_code"] = country_code
h["country"] = COUNTRY_NAMES.get(country_code, country_code)
hospitals.append(h)
# Russia: by region
print(f"--- Russia: {len(RU_REGIONS)} regions ---")
for region in RU_REGIONS:
done += 1
print(f" [{done}/{total_parts}] {region}...", end=" ", flush=True)
try:
data = overpass_query_area(region)
elems = data.get("elements", [])
_add(elems, "RU")
print(f"{len(elems)} elems")
except Exception as e:
print(f"ERROR: {e}")
time.sleep(6)
# US: whole country (OSM data for US is manageable)
done += 1
print(f"--- United States ---")
print(f" [{done}/{total_parts}] US...", end=" ", flush=True)
try:
data = overpass_query("US")
elems = data.get("elements", [])
_add(elems, "US")
print(f"{len(elems)} elems")
except Exception as e:
print(f"ERROR: {e}")
time.sleep(11)
# EU countries
print(f"--- Europe: {len(EU_COUNTRIES)} countries ---")
for code in EU_COUNTRIES:
done += 1
print(f" [{done}/{total_parts}] {code} ({COUNTRY_NAMES.get(code, code)})...", end=" ", flush=True)
try:
data = overpass_query(code)
elems = data.get("elements", [])
_add(elems, code)
print(f"{len(elems)} elems")
except Exception as e:
print(f"ERROR: {e}")
time.sleep(11)
return hospitals
def fill_missing_cities(hospitals):
"""Reverse-geocode hospitals that have no city using Nominatim."""
missing = [h for h in hospitals if not h["city"]]
if not missing:
return hospitals
print(f"\nReverse-geocoding {len(missing)} hospitals without city...")
filled = 0
for i, h in enumerate(missing):
if (i + 1) % 100 == 0:
print(f" Nominatim: {i+1}/{len(missing)}...")
try:
params = {
"lat": h["lat"],
"lon": h["lon"],
"format": "json",
"zoom": 10,
}
headers = {"User-Agent": "EvacPortalScript/1.0"}
resp = requests.get(NOMINATIM_URL, params=params, headers=headers, timeout=10)
if resp.status_code == 200:
data = resp.json()
addr = data.get("address", {})
h["city"] = (
addr.get("city", "")
or addr.get("town", "")
or addr.get("village", "")
or addr.get("county", "")
or addr.get("state", "")
or ""
)
if h["city"]:
filled += 1
except Exception:
pass
time.sleep(1.1) # Nominatim policy: max 1 req/s
print(f" City fill done: {filled}/{len(missing)}")
return hospitals
def wiki_search_beds(name, city, lang="en"):
"""Search Wikipedia for a hospital and extract beds count from infobox."""
params = {
"action": "query",
"list": "search",
"srsearch": f"{name} {city} hospital",
"srlimit": 3,
"format": "json",
}
try:
resp = requests.get(WIKI_API.format(lang=lang), params=params, timeout=15)
resp.raise_for_status()
results = resp.json().get("query", {}).get("search", [])
if not results:
return None
title = results[0]["title"]
params2 = {
"action": "query",
"prop": "revisions",
"rvprop": "content",
"rvslots": "main",
"titles": title,
"format": "json",
"formatversion": 2,
}
resp2 = requests.get(WIKI_API.format(lang=lang), params=params2, timeout=15)
resp2.raise_for_status()
pages = resp2.json().get("query", {}).get("pages", [])
if not pages:
return None
wikitext = pages[0].get("revisions", [{}])[0].get("slots", {}).get("main", {}).get("content", "")
# Extract beds from infobox patterns
patterns = [
r"\|\s*beds\s*=\s*(\d+)",
r"\|\s*beds1\s*=\s*(\d+)",
r"\|\s*capacity\s*=\s*(\d+)",
r"beds?\s*[=:]\s*(\d+)",
]
for pat in patterns:
m = re.search(pat, wikitext, re.IGNORECASE)
if m:
return int(m.group(1))
except Exception:
pass
return None
def enrich_with_wiki(hospitals):
"""Fill missing beds from Wikipedia for major hospitals (emergency=yes)."""
need_beds = [h for h in hospitals if h["beds"] is None and h["emergency"]]
print(f"\nEnriching {len(need_beds)} emergency hospitals from Wikipedia...")
filled = 0
for i, h in enumerate(need_beds):
if (i + 1) % 10 == 0:
print(f" Wiki: {i+1}/{len(need_beds)}...")
beds = wiki_search_beds(h["name"], h["city"], lang="en")
if beds is None and h["country_code"] == "RU":
beds = wiki_search_beds(h["name"], h["city"], lang="ru")
if beds is not None:
h["beds"] = beds
h["beds_source"] = "wiki"
filled += 1
time.sleep(0.5)
    # Also try the first 500 non-emergency hospitals without beds (caps API calls)
no_beds_no_emergency = [h for h in hospitals if h["beds"] is None and not h["emergency"]][:500]
print(f"Enriching {len(no_beds_no_emergency)} additional hospitals from Wikipedia...")
for i, h in enumerate(no_beds_no_emergency):
if (i + 1) % 50 == 0:
print(f" Wiki: {i+1}/{len(no_beds_no_emergency)}...")
beds = wiki_search_beds(h["name"], h["city"], lang="en")
if beds is not None:
h["beds"] = beds
h["beds_source"] = "wiki"
filled += 1
time.sleep(0.3)
print(f" Wiki enrichment done: {filled} beds filled")
return hospitals
def calculate_priority(hospitals):
"""Calculate priority score and assign 1-5 tier.
Scoring:
- beds * 2.0 for emergency, beds * 1.0 for regular
- emergency without beds -> 200
- no beds, no emergency -> 50
Priority tiers (absolute thresholds on score):
P5: >= 1000 (major hospital, >500 beds emergency or >1000 regular)
P4: >= 400
P3: >= 150
P2: >= 50
P1: < 50
"""
for h in hospitals:
beds = h["beds"]
emergency = h["emergency"]
if beds is not None and beds > 0:
h["score"] = beds * 2.0 if emergency else beds * 1.0
elif emergency:
h["score"] = 200 # baseline for emergency without bed data
else:
h["score"] = 50 # baseline for regular without bed data
for h in hospitals:
s = h["score"]
if s >= 1000:
h["priority"] = 5
elif s >= 400:
h["priority"] = 4
elif s >= 150:
h["priority"] = 3
elif s >= 50:
h["priority"] = 2
else:
h["priority"] = 1
return hospitals
def write_csv(hospitals, path="output/portals.csv"):
"""Write results to CSV sorted by priority desc."""
out = Path(path)
out.parent.mkdir(parents=True, exist_ok=True)
sorted_h = sorted(hospitals, key=lambda h: (-h["priority"], -h["score"]))
with open(out, "w", newline="", encoding="utf-8-sig") as f:
writer = csv.writer(f)
writer.writerow(["country", "city", "hospital_name", "priority", "latitude", "longitude"])
for h in sorted_h:
writer.writerow([
h["country"],
h["city"],
h["name"],
h["priority"],
h["lat"],
h["lon"],
])
print(f"\nWrote {len(sorted_h)} hospitals to {out}")
def main():
print("=== Emergency Evacuation Portal Placement ===\n")
hospitals = fetch_all_osm()
print(f"\nTotal OSM hospitals: {len(hospitals)}")
hospitals = fill_missing_cities(hospitals)
hospitals = enrich_with_wiki(hospitals)
hospitals = calculate_priority(hospitals)
write_csv(hospitals)
# Summary
by_priority = {}
for h in hospitals:
by_priority[h["priority"]] = by_priority.get(h["priority"], 0) + 1
print("\nPriority distribution:")
for p in sorted(by_priority, reverse=True):
print(f" P{p}: {by_priority[p]} hospitals")
print("\nDone!")
if __name__ == "__main__":
main()