feat: enhance parser domain-agnostic support (#117)

* feat: make parser domain-agnostic to support multiple Git hosts

- added list of known domains/Git hosts in `query_parser.py`
- fixed bug from [#115](https://github.com/cyclotruc/gitingest/pull/115): corrected case handling for URL components. Scheme, domain, username, and repository are case-insensitive, but everything after them in the path (e.g., file names, branches) is case-sensitive
- implemented `try_domains_for_user_and_repo` in `query_parser.py` to try each known domain in turn until one succeeds or the supported hosts are exhausted
- added helper functions `_get_user_and_repo_from_path`, `_validate_host`, and `_validate_scheme` in `query_parser.py`
- extended `_parse_repo_source` in `query_parser.py` to be Git host agnostic by using `try_domains_for_user_and_repo`
- added tests `test_parse_url_unsupported_host` and `test_parse_query_with_branch` in `test_query_parser.py`
- created new file `test_git_host_agnostic.py` to verify domain/Git host agnostic behavior
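
The host-guessing and case-handling behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from `query_parser.py`: the names `KNOWN_HOSTS`, `guess_host`, `normalize_case`, and the `repo_exists` callback are hypothetical, and the host list is an assumed example.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlparse

# Assumed example list; the actual hosts known to query_parser.py may differ.
KNOWN_HOSTS = ["github.com", "gitlab.com", "bitbucket.org"]


def guess_host(
    user: str,
    repo: str,
    repo_exists: Callable[[str], bool],
    hosts: Iterable[str] = KNOWN_HOSTS,
) -> Optional[str]:
    """Return the first host where https://<host>/<user>/<repo> exists, else None."""
    for host in hosts:
        # repo_exists is a hypothetical callback supplied by the caller,
        # e.g. a function that probes the remote over the network.
        if repo_exists(f"https://{host}/{user}/{repo}"):
            return host
    return None


def normalize_case(url: str) -> str:
    """Lowercase scheme, host, user, and repo; leave the rest of the path untouched."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"expected /<user>/<repo> in URL path: {url!r}")
    user, repo, rest = parts[0].lower(), parts[1].lower(), parts[2:]
    path = "/".join([user, repo, *rest])
    return f"{parsed.scheme.lower()}://{parsed.netloc.lower()}/{path}"
```

Passing the existence check in as a callback keeps the sketch free of network access, so the search order and the exhaustion case can be exercised in isolation.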
Filip Christiansen 2025-01-13 05:46:29 +01:00 committed by GitHub
parent 0fd16bae18
commit dd8f1e0aac
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
22 changed files with 429 additions and 167 deletions

@@ -20,7 +20,7 @@ FROM python:3.12-slim
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
-# Install git
+# Install Git
RUN apt-get update \
&& apt-get install -y --no-install-recommends git curl\
&& rm -rf /var/lib/apt/lists/*