Initial commit: hospital evacuation portal placement script

Fetches hospital data from OSM + Wikipedia, calculates priority (1-5),
outputs CSV with WGS84 portal coordinates for Russia, US, EU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dmitriy Kazimirov 2026-03-31 13:54:01 +06:00
commit dff93187c5
4 changed files with 530 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,8 @@
__pycache__/
*.py[cod]
output/
*.egg-info/
dist/
build/
.env
.venv/

CLAUDE.md Normal file

@@ -0,0 +1,31 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project
Python script that fetches hospital data from OpenStreetMap (Overpass API) + Wikipedia, calculates evacuation portal placement priority (1-5), and outputs `output/portals.csv` with WGS84 coordinates.
Single-file project: `portals.py`. No tests, no build system.
## Commands
```bash
pip install requests shapely # dependencies
python portals.py # full run (~15-20 min, rate-limited APIs)
```
## Architecture
`portals.py` — 4-stage pipeline in sequence:
1. **`fetch_all_osm()`** — Queries Overpass API for `amenity=hospital`. Russia split by 85+ ISO 3166-2 regions to avoid timeouts; US and EU fetched per-country. Uses mirror rotation (`_overpass_post`) on failure.
2. **`fill_missing_cities()`** — Nominatim reverse geocoding for hospitals without `addr:city`. Rate-limited to 1 req/s per Nominatim policy.
3. **`enrich_with_wiki()`** — MediaWiki API search to extract `beds` counts from infoboxes. Emergency hospitals are enriched first, then the first 500 regular ones. Searches enwiki, falling back to ruwiki for Russia.
4. **`calculate_priority()` / `write_csv()`** — Score-based priority tiers (P1-P5). Score = `beds * 2.0` (emergency) or `beds * 1.0`; fallback scores 200/50. Absolute thresholds: P5 >= 1000, P4 >= 400, P3 >= 150, P2 >= 50, P1 < 50.
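The mirror rotation in stage 1 can be exercised without network access; a minimal sketch with an injected fake transport (`query_with_fallback` and `fake_post` are illustrative names for this sketch, not functions in `portals.py`):

```python
# Sketch of the mirror-rotation strategy behind _overpass_post: try each
# endpoint in order until one succeeds. The "post" callable is injected so
# the sketch runs offline (an assumption for illustration only).
def query_with_fallback(mirrors, post):
    last_err = None
    for url in mirrors:
        try:
            return post(url)
        except Exception as e:
            last_err = e
    raise RuntimeError(f"All mirrors failed: {last_err}")

calls = []
def fake_post(url):
    calls.append(url)
    if "kumi" not in url:  # simulate the primary mirror timing out
        raise TimeoutError("timeout")
    return {"elements": []}

result = query_with_fallback(
    ["https://overpass-api.de/api/interpreter",
     "https://overpass.kumi.systems/api/interpreter"],
    fake_post,
)
print(len(calls))  # 2: primary failed, fallback answered
```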
## Key constraints
- All external APIs are free/public with strict rate limits — respect delays between requests.
- `shapely` is listed as a dependency but not used yet (polygon area calculation stub).
- CSV output uses `utf-8-sig` encoding for Excel compatibility.
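The `utf-8-sig` constraint can be verified in isolation: the codec prepends a BOM so Excel auto-detects UTF-8 (relevant for Cyrillic hospital names). A self-contained sketch (file name is illustrative):

```python
import csv
from pathlib import Path
from tempfile import TemporaryDirectory

# Write one sample row the way write_csv does, then inspect the raw bytes.
rows = [("Russia", "Moscow", "Городская больница", 5, 55.75, 37.62)]
with TemporaryDirectory() as tmp:
    out = Path(tmp) / "portals.csv"
    with open(out, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["country", "city", "hospital_name",
                         "priority", "latitude", "longitude"])
        writer.writerows(rows)
    raw = out.read_bytes()

# utf-8-sig emits the UTF-8 BOM as the first three bytes.
print(raw[:3] == b"\xef\xbb\xbf")  # True
```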

README.md Normal file

@@ -0,0 +1,68 @@
# Emergency Evacuation Portal Placement
Python script that finds optimal placement points for emergency evacuation portals, based on hospital data from OpenStreetMap and Wikipedia.
## Output
A CSV file `output/portals.csv` with the columns:
| Column | Description |
|-----------------|--------------------------------------------------|
| country | Country |
| city | City |
| hospital_name | Name of the medical facility |
| priority | Portal placement priority (1-5, 5 = highest) |
| latitude | Latitude (WGS84) |
| longitude | Longitude (WGS84) |
## Installing dependencies
```bash
pip install requests shapely
```
## Running
```bash
python portals.py
```
The script runs through 4 stages in sequence:
1. **OSM**: fetch hospitals from the Overpass API (Russia by federal subject, the US as a whole, 20 EU countries)
2. **Nominatim**: fill in cities for hospitals missing the `addr:city` tag
3. **Wikipedia**: supplement bed counts from the English/Russian Wikipedia
4. **Priority**: compute priorities and write the CSV
A full run takes 15-20 minutes.
## Coverage
| Region | Countries |
|--------------|------------------------------------------------------------------------|
| Russia | 85+ federal subjects (by ISO 3166-2) |
| USA | All states |
| Europe | DE, FR, NL, IT, ES, PL, SE, NO, FI, DK, AT, CZ, BE, CH, PT, IE, GR, HU, RO, BG |
## Priorities
The priority is computed from:
- **Bed count** (`beds`): the main factor. Taken from the OSM `beds=*` tag or from the Wikipedia infobox.
- **Emergency care** (`emergency=yes` in OSM): a x2 multiplier on the score.
- **Fallbacks**: without bed data, an emergency hospital gets a baseline score of 200, a regular one 50.
| Priority | Score | Criterion |
|-----------|-----------------|-------------------------------------------------------|
| P5 | >= 1000 | >=500 beds (emergency) or >=1000 beds (regular) |
| P4 | >= 400 | >=200 beds (emergency) or >=400 beds (regular) |
| P3 | >= 150 | >=75 beds (emergency) or >=150 beds (regular), or emergency without bed data |
| P2 | >= 50 | >=25 beds (emergency) or >=50 beds (regular), or a regular hospital without bed data |
| P1 | < 50 | Small facilities |
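The scoring and tiering rules above can be condensed into a standalone function (`score_hospital` is an illustrative name for this sketch, not a function in `portals.py`):

```python
def score_hospital(beds, emergency):
    """Priority tier per the rules above: beds x2 for emergency,
    fallback scores 200 (emergency) / 50 (regular) when beds are unknown."""
    if beds:  # None or 0 falls through to the fallback scores
        score = beds * 2.0 if emergency else beds * 1.0
    else:
        score = 200 if emergency else 50
    # Absolute thresholds: P5 >= 1000, P4 >= 400, P3 >= 150, P2 >= 50
    for priority, threshold in ((5, 1000), (4, 400), (3, 150), (2, 50)):
        if score >= threshold:
            return priority
    return 1

# An emergency hospital with 300 beds scores 600 -> P4
print(score_hospital(300, True))    # 4
# A regular hospital with no bed data falls back to score 50 -> P2
print(score_hospital(None, False))  # 2
```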
## Limitations
- **OSM data**: the `beds=*` tag is set for only a fraction of hospitals (coverage is worst for Russia, best for Germany/Netherlands).
- **Nominatim**: limited to 1 request/sec, so the city-filling stage is slow for tens of thousands of hospitals. It can be skipped if cities are not critical.
- **Wikipedia**: enrichment runs only for emergency hospitals and the first 500 regular ones (to cap API calls).
- **Overpass API**: on timeouts the script automatically fails over to mirror endpoints (`lz4`, `z`, `kumi`).
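The Wikipedia enrichment ultimately reduces to a few regexes over raw infobox wikitext; a sketch on a sample page (`extract_beds` is an illustrative name; `portals.py` uses a larger pattern list):

```python
import re

# Patterns matching "| beds = 950" style infobox parameters.
PATTERNS = [
    r"\|\s*beds\s*=\s*(\d+)",
    r"\|\s*capacity\s*=\s*(\d+)",
]

def extract_beds(wikitext):
    """Return the first beds/capacity number found in the wikitext, else None."""
    for pat in PATTERNS:
        m = re.search(pat, wikitext, re.IGNORECASE)
        if m:
            return int(m.group(1))
    return None

sample = "{{Infobox hospital\n| name = Example Clinic\n| beds = 950\n}}"
print(extract_beds(sample))  # 950
```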

portals.py Normal file

@@ -0,0 +1,423 @@
"""
Emergency Evacuation Portal Placement Script
Fetches hospital data from OpenStreetMap + Wikipedia,
calculates priority, outputs CSV with portal coordinates (WGS84).
"""
import csv
import json
import math
import re
import time
from pathlib import Path
import requests
OVERPASS_MIRRORS = [
"https://overpass-api.de/api/interpreter",
"https://lz4.overpass-api.de/api/interpreter",
"https://z.overpass-api.de/api/interpreter",
"https://overpass.kumi.systems/api/interpreter",
]
WIKI_API = "https://{lang}.wikipedia.org/w/api.php"
NOMINATIM_URL = "https://nominatim.openstreetmap.org/reverse"
# Russia is queried per ISO 3166-2 region to avoid Overpass timeouts; the US and EU are fetched per-country
RU_REGIONS = [
"RU-AD", "RU-AL", "RU-AM", "RU-ARK", "RU-AST", "RU-BA", "RU-BEL",
"RU-BRY", "RU-BU", "RU-CE", "RU-CHE", "RU-CHU", "RU-CU", "RU-DA",
"RU-IN", "RU-KL", "RU-KC", "RU-KDA", "RU-KEM", "RU-KGD", "RU-KIR",
"RU-KO", "RU-KOS", "RU-KR", "RU-KGN", "RU-KRS", "RU-KYA", "RU-LEN",
"RU-LIP", "RU-MAG", "RU-ME", "RU-MO", "RU-MOS", "RU-MUR", "RU-NEN",
"RU-NGR", "RU-NIZ", "RU-NVS", "RU-OMS", "RU-ORE", "RU-ORL", "RU-PER",
"RU-PNZ", "RU-PRI", "RU-PSK", "RU-ROS", "RU-RYA", "RU-SA", "RU-SAK",
"RU-SAM", "RU-SAR", "RU-SE", "RU-SMO", "RU-SPE", "RU-STA", "RU-SVE",
"RU-TA", "RU-TAM", "RU-TOM", "RU-TUL", "RU-TVE", "RU-TY", "RU-TYU",
"RU-UD", "RU-ULY", "RU-VLA", "RU-VLG", "RU-VOR", "RU-YAR", "RU-ZAB",
"RU-MOW", "RU-SPE", # federal cities
]
EU_COUNTRIES = [
"DE", "FR", "NL", "IT", "ES", "PL", "SE", "NO", "FI", "DK",
"AT", "CZ", "BE", "CH", "PT", "IE", "GR", "HU", "RO", "BG",
]
ALL_COUNTRIES = ["RU", "US"] + EU_COUNTRIES
COUNTRY_NAMES = {
"RU": "Russia", "US": "United States", "DE": "Germany", "FR": "France",
"NL": "Netherlands", "IT": "Italy", "ES": "Spain", "PL": "Poland",
"SE": "Sweden", "NO": "Norway", "FI": "Finland", "DK": "Denmark",
"AT": "Austria", "CZ": "Czech Republic", "BE": "Belgium", "CH": "Switzerland",
"PT": "Portugal", "IE": "Ireland", "GR": "Greece", "HU": "Hungary",
"RO": "Romania", "BG": "Bulgaria",
}
# Priority score thresholds (mirrors the tiers inlined in calculate_priority; not referenced directly)
PRIORITY_TIERS = [0, 50, 150, 400, 1000]
def _overpass_post(query, timeout=300):
"""Try Overpass mirrors until one responds."""
last_err = None
for url in OVERPASS_MIRRORS:
try:
resp = requests.post(url, data={"data": query}, timeout=timeout)
if resp.status_code == 200:
return resp.json()
last_err = f"HTTP {resp.status_code}"
except Exception as e:
last_err = str(e)
raise RuntimeError(f"All Overpass mirrors failed: {last_err}")
def overpass_query(iso_code):
"""Fetch all hospitals for a country via Overpass API."""
query = f"""
[out:json][timeout:180];
area["ISO3166-1"="{iso_code}"]->.a;
(
node["amenity"="hospital"](area.a);
way["amenity"="hospital"](area.a);
relation["amenity"="hospital"](area.a);
);
out center tags;
"""
return _overpass_post(query)
def overpass_query_area(area_code):
"""Fetch hospitals by admin area code (ISO3166-2)."""
query = f"""
[out:json][timeout:60];
area["ISO3166-2"="{area_code}"]->.a;
(
node["amenity"="hospital"](area.a);
way["amenity"="hospital"](area.a);
relation["amenity"="hospital"](area.a);
);
out center tags;
"""
return _overpass_post(query, timeout=120)
def parse_osm_element(elem):
"""Extract hospital fields from an OSM element."""
tags = elem.get("tags", {})
name = tags.get("name", tags.get("name:en", ""))
if not name:
return None
lat = lon = None
if elem["type"] == "node":
lat = elem.get("lat")
lon = elem.get("lon")
else:
center = elem.get("center", {})
lat = center.get("lat")
lon = center.get("lon")
if lat is None or lon is None:
return None
beds_str = tags.get("beds", "")
beds = None
if beds_str and beds_str.strip().isdigit():
beds = int(beds_str)
emergency = tags.get("emergency", "").lower() == "yes"
# Try multiple OSM tags for city
city = (
tags.get("addr:city", "")
or tags.get("is_in:city", "")
or tags.get("addr:town", "")
or tags.get("addr:village", "")
or ""
)
return {
"name": name,
"city": city,
"lat": lat,
"lon": lon,
"beds": beds,
"emergency": emergency,
"area": None, # polygon area not available in center-mode
}
def fetch_all_osm():
"""Fetch hospitals for all target countries.
Large countries (RU, US) are split by region to avoid timeouts.
EU countries are fetched as a whole.
"""
hospitals = []
total_parts = len(RU_REGIONS) + 1 + len(EU_COUNTRIES) # RU regions + US + EU
done = 0
def _add(elements, country_code):
nonlocal hospitals, done
for elem in elements:
h = parse_osm_element(elem)
if h:
h["country_code"] = country_code
h["country"] = COUNTRY_NAMES.get(country_code, country_code)
hospitals.append(h)
# Russia: by region
print(f"--- Russia: {len(RU_REGIONS)} regions ---")
for region in RU_REGIONS:
done += 1
print(f" [{done}/{total_parts}] {region}...", end=" ", flush=True)
try:
data = overpass_query_area(region)
elems = data.get("elements", [])
_add(elems, "RU")
print(f"{len(elems)} elems")
except Exception as e:
print(f"ERROR: {e}")
time.sleep(6)
# US: whole country (OSM data for US is manageable)
done += 1
print(f"--- United States ---")
print(f" [{done}/{total_parts}] US...", end=" ", flush=True)
try:
data = overpass_query("US")
elems = data.get("elements", [])
_add(elems, "US")
print(f"{len(elems)} elems")
except Exception as e:
print(f"ERROR: {e}")
time.sleep(11)
# EU countries
print(f"--- Europe: {len(EU_COUNTRIES)} countries ---")
for code in EU_COUNTRIES:
done += 1
print(f" [{done}/{total_parts}] {code} ({COUNTRY_NAMES.get(code, code)})...", end=" ", flush=True)
try:
data = overpass_query(code)
elems = data.get("elements", [])
_add(elems, code)
print(f"{len(elems)} elems")
except Exception as e:
print(f"ERROR: {e}")
time.sleep(11)
return hospitals
def fill_missing_cities(hospitals):
"""Reverse-geocode hospitals that have no city using Nominatim."""
missing = [h for h in hospitals if not h["city"]]
if not missing:
return hospitals
print(f"\nReverse-geocoding {len(missing)} hospitals without city...")
filled = 0
for i, h in enumerate(missing):
if (i + 1) % 100 == 0:
print(f" Nominatim: {i+1}/{len(missing)}...")
try:
params = {
"lat": h["lat"],
"lon": h["lon"],
"format": "json",
"zoom": 10,
}
headers = {"User-Agent": "EvacPortalScript/1.0"}
resp = requests.get(NOMINATIM_URL, params=params, headers=headers, timeout=10)
if resp.status_code == 200:
data = resp.json()
addr = data.get("address", {})
h["city"] = (
addr.get("city", "")
or addr.get("town", "")
or addr.get("village", "")
or addr.get("county", "")
or addr.get("state", "")
or ""
)
if h["city"]:
filled += 1
except Exception:
pass
time.sleep(1.1) # Nominatim policy: max 1 req/s
print(f" City fill done: {filled}/{len(missing)}")
return hospitals
def wiki_search_beds(name, city, lang="en"):
"""Search Wikipedia for a hospital and extract beds count from infobox."""
params = {
"action": "query",
"list": "search",
"srsearch": f"{name} {city} hospital",
"srlimit": 3,
"format": "json",
}
try:
resp = requests.get(WIKI_API.format(lang=lang), params=params, timeout=15)
resp.raise_for_status()
results = resp.json().get("query", {}).get("search", [])
if not results:
return None
title = results[0]["title"]
params2 = {
"action": "query",
"prop": "revisions",
"rvprop": "content",
"rvslots": "main",
"titles": title,
"format": "json",
"formatversion": 2,
}
resp2 = requests.get(WIKI_API.format(lang=lang), params=params2, timeout=15)
resp2.raise_for_status()
pages = resp2.json().get("query", {}).get("pages", [])
if not pages:
return None
wikitext = pages[0].get("revisions", [{}])[0].get("slots", {}).get("main", {}).get("content", "")
# Extract beds from infobox patterns
patterns = [
r"\|\s*beds\s*=\s*(\d+)",
r"\|\s*beds1\s*=\s*(\d+)",
r"\|\s*capacity\s*=\s*(\d+)",
r"beds?\s*[=:]\s*(\d+)",
]
for pat in patterns:
m = re.search(pat, wikitext, re.IGNORECASE)
if m:
return int(m.group(1))
except Exception:
pass
return None
def enrich_with_wiki(hospitals):
"""Fill missing beds from Wikipedia for major hospitals (emergency=yes)."""
need_beds = [h for h in hospitals if h["beds"] is None and h["emergency"]]
print(f"\nEnriching {len(need_beds)} emergency hospitals from Wikipedia...")
filled = 0
for i, h in enumerate(need_beds):
if (i + 1) % 10 == 0:
print(f" Wiki: {i+1}/{len(need_beds)}...")
beds = wiki_search_beds(h["name"], h["city"], lang="en")
if beds is None and h["country_code"] == "RU":
beds = wiki_search_beds(h["name"], h["city"], lang="ru")
if beds is not None:
h["beds"] = beds
h["beds_source"] = "wiki"
filled += 1
time.sleep(0.5)
    # Also try the first 500 non-emergency hospitals without beds (caps API calls)
no_beds_no_emergency = [h for h in hospitals if h["beds"] is None and not h["emergency"]][:500]
print(f"Enriching {len(no_beds_no_emergency)} additional hospitals from Wikipedia...")
for i, h in enumerate(no_beds_no_emergency):
if (i + 1) % 50 == 0:
print(f" Wiki: {i+1}/{len(no_beds_no_emergency)}...")
beds = wiki_search_beds(h["name"], h["city"], lang="en")
if beds is not None:
h["beds"] = beds
h["beds_source"] = "wiki"
filled += 1
time.sleep(0.3)
print(f" Wiki enrichment done: {filled} beds filled")
return hospitals
def calculate_priority(hospitals):
"""Calculate priority score and assign 1-5 tier.
Scoring:
- beds * 2.0 for emergency, beds * 1.0 for regular
- emergency without beds -> 200
- no beds, no emergency -> 50
Priority tiers (absolute thresholds on score):
P5: >= 1000 (major hospital, >500 beds emergency or >1000 regular)
P4: >= 400
P3: >= 150
P2: >= 50
P1: < 50
"""
for h in hospitals:
beds = h["beds"]
emergency = h["emergency"]
if beds is not None and beds > 0:
h["score"] = beds * 2.0 if emergency else beds * 1.0
elif emergency:
h["score"] = 200 # baseline for emergency without bed data
else:
h["score"] = 50 # baseline for regular without bed data
for h in hospitals:
s = h["score"]
if s >= 1000:
h["priority"] = 5
elif s >= 400:
h["priority"] = 4
elif s >= 150:
h["priority"] = 3
elif s >= 50:
h["priority"] = 2
else:
h["priority"] = 1
return hospitals
def write_csv(hospitals, path="output/portals.csv"):
"""Write results to CSV sorted by priority desc."""
out = Path(path)
out.parent.mkdir(parents=True, exist_ok=True)
sorted_h = sorted(hospitals, key=lambda h: (-h["priority"], -h["score"]))
with open(out, "w", newline="", encoding="utf-8-sig") as f:
writer = csv.writer(f)
writer.writerow(["country", "city", "hospital_name", "priority", "latitude", "longitude"])
for h in sorted_h:
writer.writerow([
h["country"],
h["city"],
h["name"],
h["priority"],
h["lat"],
h["lon"],
])
print(f"\nWrote {len(sorted_h)} hospitals to {out}")
def main():
print("=== Emergency Evacuation Portal Placement ===\n")
hospitals = fetch_all_osm()
print(f"\nTotal OSM hospitals: {len(hospitals)}")
hospitals = fill_missing_cities(hospitals)
hospitals = enrich_with_wiki(hospitals)
hospitals = calculate_priority(hospitals)
write_csv(hospitals)
# Summary
by_priority = {}
for h in hospitals:
by_priority[h["priority"]] = by_priority.get(h["priority"], 0) + 1
print("\nPriority distribution:")
for p in sorted(by_priority, reverse=True):
print(f" P{p}: {by_priority[p]} hospitals")
print("\nDone!")
if __name__ == "__main__":
main()