• Accept hosts starting with “git.” or “gitlab.” in _looks_like_git_host
• Update doc-strings to document the heuristic
• Adjust git-host-agnostic tests: expect ValueError for slug form with
custom hosts; add real GitLab instance (git.rwth-aachen.de) to matrix
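A minimal sketch of the prefix heuristic. It assumes `_looks_like_git_host` receives a bare hostname and checks it against a set of known hosts. The `KNOWN_GIT_HOSTS` contents and the public name `looks_like_git_host` are illustrative assumptions, not the project's actual code.

```python
from typing import FrozenSet

# Hypothetical allow-list; the project's real list of known hosts may differ.
KNOWN_GIT_HOSTS: FrozenSet[str] = frozenset({"github.com", "gitlab.com", "bitbucket.org"})


def looks_like_git_host(host: str) -> bool:
    """Return True if *host* is a known Git host or starts with "git." or "gitlab."."""
    host = host.lower()  # hostnames are case-insensitive
    if host in KNOWN_GIT_HOSTS:
        return True
    # Heuristic: self-hosted instances are often served from git.<domain> or gitlab.<domain>.
    return host.startswith(("git.", "gitlab."))


print(looks_like_git_host("git.rwth-aachen.de"))  # True
print(looks_like_git_host("example.com"))         # False
```

A prefix check like this can misclassify hosts such as `git.example.com` that do not actually serve Git. That is why the tests pin a real instance (git.rwth-aachen.de) rather than a made-up one.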
This commit introduces the `partial_clone_repo` function, which performs a partial clone
of a repository with sparse checkout (`git clone --filter=blob:none --sparse`) based on query parameters
from a `ParsedQuery` object.
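The clone step can be sketched as two Git invocations. The first is a blobless clone with sparse checkout enabled. The second is `git sparse-checkout set`, which materializes only the requested subpath. The helper names `build_partial_clone_commands` and `run_partial_clone`, and their parameters, are hypothetical. The real `partial_clone_repo` takes a `ParsedQuery` and may issue different commands.

```python
import subprocess
from typing import List, Optional


def build_partial_clone_commands(
    url: str, local_path: str, subpath: str, branch: Optional[str] = None
) -> List[List[str]]:
    """Return git commands for a blobless, sparse clone limited to *subpath* (hypothetical helper)."""
    clone_cmd = ["git", "clone", "--filter=blob:none", "--sparse"]
    if branch:
        clone_cmd += ["--branch", branch]
    clone_cmd += [url, local_path]
    # --sparse initially checks out only top-level files; this adds the requested directory.
    sparse_cmd = ["git", "-C", local_path, "sparse-checkout", "set", subpath.strip("/")]
    return [clone_cmd, sparse_cmd]


def run_partial_clone(url: str, local_path: str, subpath: str, branch: Optional[str] = None) -> None:
    """Execute the commands in order, raising CalledProcessError on the first failure."""
    for cmd in build_partial_clone_commands(url, local_path, subpath, branch):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution lets the argument lists be checked without network access.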
- Add a new method (extact_clone_config) in ParsedQuery to encapsulate the creation
of a CloneConfig from query parameters.
- Replace repeated CloneConfig instantiation in repository_ingest.py and
query_processor.py with calls to the new method.
- Simplify code and improve maintainability by centralizing CloneConfig logic.
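A minimal sketch of the centralization. It assumes both classes are dataclasses and that `CloneConfig` needs only `url`, `local_path`, `commit`, and `branch`. The field set and the `ValueError` on a missing URL are assumptions. The method name is spelled as in the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloneConfig:
    """Hypothetical subset of the clone settings."""
    url: str
    local_path: str
    commit: Optional[str] = None
    branch: Optional[str] = None


@dataclass
class ParsedQuery:
    """Hypothetical subset of the parsed query fields."""
    url: Optional[str]
    local_path: str
    commit: Optional[str] = None
    branch: Optional[str] = None

    def extact_clone_config(self) -> CloneConfig:
        """Build a CloneConfig from the query so call sites no longer repeat the field mapping."""
        if not self.url:
            raise ValueError("The 'url' parameter is required.")
        return CloneConfig(
            url=self.url,
            local_path=self.local_path,
            commit=self.commit,
            branch=self.branch,
        )


query = ParsedQuery(url="https://github.com/user/repo", local_path="/tmp/repo", branch="main")
config = query.extact_clone_config()  # replaces CloneConfig(url=query.url, ...) at each call site
```

With the mapping in one place, adding a field to `CloneConfig` later touches a single method instead of every caller.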
* Refactor cloning logic to support subpath-based partial clones
- Add `repo_name` and `subpath` fields to `CloneConfig` for flexible cloning.
- Split out `partial_clone_repo` and `full_clone_repo` to handle subpath vs. full clones.
- Simplify query processing to always call `clone_repo`, which now delegates to `partial_clone_repo` or `full_clone_repo`.
- Improve docstrings to reflect new parameters and return types.
---------
Co-authored-by: cyclotruc <romain@coderamp.io>
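The delegation can be sketched as below. This sketch rests on three assumptions: a `subpath` of `"/"` means "the whole repository"; each clone function returns the Git commands it would run instead of executing them; and the dry-run stubs are illustrative rather than the project's implementation. The `repo_name` field is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CloneConfig:
    """Hypothetical subset of CloneConfig; a subpath of "/" means the repository root."""
    url: str
    local_path: str
    subpath: str = "/"
    branch: Optional[str] = None


def full_clone_repo(config: CloneConfig) -> List[List[str]]:
    """Return the command for an ordinary full clone (dry-run stub)."""
    cmd = ["git", "clone"]
    if config.branch:
        cmd += ["--branch", config.branch]
    return [cmd + [config.url, config.local_path]]


def partial_clone_repo(config: CloneConfig) -> List[List[str]]:
    """Return the commands for a blobless, sparse clone of config.subpath (dry-run stub)."""
    cmd = ["git", "clone", "--filter=blob:none", "--sparse"]
    if config.branch:
        cmd += ["--branch", config.branch]
    cmd += [config.url, config.local_path]
    checkout = ["git", "-C", config.local_path, "sparse-checkout", "set", config.subpath.strip("/")]
    return [cmd, checkout]


def clone_repo(config: CloneConfig) -> List[List[str]]:
    """Single entry point: use a partial clone only when a non-root subpath was requested."""
    if config.subpath.strip("/"):
        return partial_clone_repo(config)
    return full_clone_repo(config)
```

Callers need no subpath check of their own; the choice between the two strategies lives in `clone_repo` alone.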
* Add Python 3.9 support by using ParamSpec from typing_extensions and removing match statements
* Add Python 3.7 support by reverting inline generics (e.g. `list[str]`) to their `typing` equivalents and removing walrus-operator (`:=`) usage
* Update pyproject.toml
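A sketch of the compatibility patterns these commits describe, assuming the goal is code that still runs on older interpreters. It shows three patterns:

- `ParamSpec` for a signature-preserving decorator.
- `typing.Dict` instead of a built-in `dict[...]` generic.
- An `if`/`elif` chain in place of `match`.

The standard-library-first fallback import is an assumption (the commit imports from `typing_extensions`). The `count_calls` and `default_port` helpers are hypothetical.

```python
import functools
from typing import Callable, Dict, TypeVar

try:  # ParamSpec is in the standard library from Python 3.10
    from typing import ParamSpec
except ImportError:  # older interpreters use the typing_extensions backport
    from typing_extensions import ParamSpec

P = ParamSpec("P")
R = TypeVar("R")


# typing.Dict rather than dict[str, int], which is not subscriptable at runtime before 3.9
def count_calls(counter: Dict[str, int]) -> Callable[[Callable[P, R]], Callable[P, R]]:
    """Decorator factory that counts calls while preserving the wrapped signature for type checkers."""

    def decorator(func: Callable[P, R]) -> Callable[P, R]:
        @functools.wraps(func)
        def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
            counter[func.__name__] = counter.get(func.__name__, 0) + 1
            return func(*args, **kwargs)

        return wrapper

    return decorator


def default_port(scheme: str) -> int:
    """An if/elif chain in place of a `match` statement, which needs Python 3.10+."""
    if scheme == "https":
        return 443
    elif scheme == "http":
        return 80
    elif scheme == "ssh":
        return 22
    raise ValueError(f"Unsupported scheme: {scheme}")
```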
* Refactor code for lifespan, template usage, and improved tests
- Move background tasks and rate-limit handler into utils.py
- Reference TEMPLATES from config instead of inline Jinja2Templates
- Adopt Given/When/Then docstrings for test clarity
- Parametrize some tests and consolidate code across query_parser tests
- Add pytest.warns context manager to test_parse_repo_source_with_failed_git_command
- Introduce ParsedQuery dataclass to store query parameters and metadata
- Update ingestion and parser modules to use ParsedQuery instead of dict[str, Any]
- Convert ignore_patterns and include_patterns to sets
- Clean up references to max size and pattern handling
- Update tests to reflect new dataclass usage
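A minimal sketch of the dataclass together with a Given/When/Then test. The field names and the `to_pattern_set` helper that normalizes user input are hypothetical. Storing patterns as sets collapses duplicates automatically and makes membership checks average constant-time.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


def to_pattern_set(patterns: Iterable[str]) -> Set[str]:
    """Normalize patterns into a set: strip whitespace, drop empty entries and duplicates."""
    return {p.strip() for p in patterns if p.strip()}


@dataclass
class ParsedQuery:
    """Hypothetical subset of the fields that replace the former dict[str, Any]."""
    user_name: Optional[str]
    repo_name: Optional[str]
    url: Optional[str]
    local_path: str
    max_file_size: int
    ignore_patterns: Set[str] = field(default_factory=set)
    include_patterns: Set[str] = field(default_factory=set)
    branch: Optional[str] = None


def test_duplicate_patterns_collapse() -> None:
    """
    Test that duplicate ignore patterns are stored only once.

    Given a pattern list containing "*.pyc" twice and a blank entry,
    When a ParsedQuery is created from the normalized patterns,
    Then ignore_patterns contains each non-empty pattern exactly once.
    """
    # Given
    raw = ["*.pyc", " *.pyc ", "", "node_modules"]
    # When
    query = ParsedQuery(
        user_name="user",
        repo_name="repo",
        url="https://github.com/user/repo",
        local_path="/tmp/repo",
        max_file_size=10_000_000,
        ignore_patterns=to_pattern_set(raw),
    )
    # Then
    assert query.ignore_patterns == {"*.pyc", "node_modules"}
```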
* feat: make parser domain-agnostic to support multiple Git hosts
- added list of known domains/Git hosts in `query_parser.py`
- fixed bug from [#115](https://github.com/cyclotruc/gitingest/pull/115): corrected case handling for URL components; the scheme, domain, username, and repository are case-insensitive, but the rest of the path (e.g., file names, branches) is case-sensitive
- implemented `try_domains_for_user_and_repo` in `query_parser.py` to iteratively guess the correct domain until one succeeds or the supported hosts are exhausted
- added helper functions `_get_user_and_repo_from_path`, `_validate_host`, and `_validate_scheme` in `query_parser.py`
- extended `_parse_repo_source` in `query_parser.py` to be Git host agnostic by using `try_domains_for_user_and_repo`
- added tests `test_parse_url_unsupported_host` and `test_parse_query_with_branch` in `test_query_parser.py`
- created new file `test_git_host_agnostic.py` to verify domain/Git host agnostic behavior
2025-01-13 05:46:29 +01:00
Renamed from tests/test_query_parser.py
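The case-handling rule and the host-guessing loop from the domain-agnostic parser commit can be sketched as follows. A `repo_exists` callback stands in for whatever network check the parser actually performs, such as an HTTP request or `git ls-remote`. The names `normalize_repo_url`, `find_host_for_user_and_repo`, and `KNOWN_HOSTS` are hypothetical. This signature is not that of the project's `try_domains_for_user_and_repo`.

```python
from typing import Callable, Iterable, Tuple
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of supported hosts, tried in order.
KNOWN_HOSTS: Tuple[str, ...] = ("github.com", "gitlab.com", "bitbucket.org")


def normalize_repo_url(url: str) -> str:
    """Lowercase scheme, host, user and repo; keep the rest of the path unchanged."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Only the first two path segments (user, repository) are case-insensitive.
    head = [segment.lower() for segment in segments[:2]]
    path = "/" + "/".join(head + segments[2:])
    # Assumes the netloc holds only a host (no user:password@ credentials to preserve).
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, parts.fragment))


def find_host_for_user_and_repo(
    user: str,
    repo: str,
    repo_exists: Callable[[str], bool],
    hosts: Iterable[str] = KNOWN_HOSTS,
) -> str:
    """Return the first host on which user/repo exists, trying hosts in order."""
    for host in hosts:
        if repo_exists(f"https://{host}/{user}/{repo}"):
            return host
    raise ValueError(f"Could not find a valid repository host for '{user}/{repo}'.")
```

Injecting the existence check keeps the loop testable offline. It also makes the order of `hosts` the only policy decision: the first match wins.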