Commit graph

14 commits

Author SHA1 Message Date
Filip Christiansen
2f447ae632
chore: switch to ruff + pydoclint, deprecate .gitingest, and perform a repo-wide quality sweep (#329)
* **Pre-commit**: replace `black` & `darglint` with `ruff-check` / `ruff-format`;
  add `pydoclint` for docstring quality
* **Deps**: drop `tomli`; tighten `typing_extensions`; add `eval-type-backport`;
  remove `black`, `djlint`, `pylint` from `requirements-dev`
* **Ignore files**: deprecate TOML-based `.gitingest`; introduce
  `.gitingestignore` (git-wildmatch, parsed via `_parse_ignore_file`)
* **Config**: new unified `[tool.ruff]` (lint + format + isort); delete
  `[tool.black]`, keep minimal `[tool.isort]` for now
* **Refactor/style**: adopt `from __future__ import annotations`, kw-only args,
  richer types; reorder params & `__all__`; move type-only imports under
  `if TYPE_CHECKING`; extract `_CLIArgs` `TypedDict`, migrate form data to
  `pydantic.QueryForm`; deduplicate `cli.main` / `_async_main`; use `pathlib`,
  avoid file-IO in async; replace magic numbers with constants; delete
  `is_text_file` (logic now lives in `FileSystemNode.content`)
* **Bug fix**: remove silent error in `notebook_utils._process_cell`
* **Docs**: refresh README badges
* **Tests**: update fixtures & assertions

**BREAKING**: new `.gitingestignore` file replaces (now-deprecated) `.gitingest`.

No functional API or CLI changes.
2025-06-28 18:49:37 +02:00
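As a rough illustration of the `.gitingestignore` change in 2f447ae632, the sketch below reads an ignore file into a set of patterns. The function name `parse_ignore_file` is hypothetical and is not the project's `_parse_ignore_file`; stripping whitespace and skipping `#` comments are assumptions about a typical ignore-file reader, and no wildmatch matching is attempted here.

```python
from __future__ import annotations

from pathlib import Path


def parse_ignore_file(path: Path) -> set[str]:
    """Return the non-empty, non-comment lines of an ignore file.

    Hypothetical helper: it only collects patterns and does not
    implement git-wildmatch matching itself.
    """
    patterns: set[str] = set()
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (assumed convention).
        if not line or line.startswith("#"):
            continue
        patterns.add(line)
    return patterns
```

Returning a set means a pattern listed twice is stored once, which matches the set-based pattern handling described in the older commits further down.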
Filip Christiansen
e5fadce158
feat(parser): relax host validation to support self-hosted GitLab & git.* domains (#314)
• Accept hosts starting with “git.” or “gitlab.” in _looks_like_git_host
• Update doc-strings to document the heuristic
• Adjust git-host-agnostic tests: expect ValueError for slug form with
  custom hosts; add real GitLab instance (git.rwth-aachen.de) to matrix
2025-06-23 20:50:08 +02:00
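The prefix heuristic described in e5fadce158 could look roughly like the sketch below. The name `looks_like_git_host` (without the leading underscore) is a stand-in; the project's `_looks_like_git_host` may apply further checks that the commit message does not mention, and lowercasing the host first is an assumption.

```python
def looks_like_git_host(host: str) -> bool:
    """Heuristic: treat hosts starting with "git." or "gitlab." as Git hosts.

    Illustrative only; the lowercase normalisation is an assumption.
    """
    return host.lower().startswith(("git.", "gitlab."))
```

For example, `looks_like_git_host("git.rwth-aachen.de")` is true, while a host such as `example.com` is rejected by this check.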
Filip Christiansen
95009bdf15
test: add pytest-mock, introduce fixtures & type hints (#290)
* Added pytest-mock to dev dependencies and pre-commit hooks
* Introduced InvalidGitHubTokenError for clearer token-validation failures
* Refactored tests:
  * Replaced ad-hoc mocks with reusable fixtures
  * Parametrised URL/branch matrices to cut duplication
  * Added type hints throughout
* New coverage:
  * validate_github_token (happy & error paths)
  * create_git_command / create_git_auth_header
2025-06-21 21:26:29 +02:00
Filip Christiansen
8be6f5620f
refactor: rename clone to clone_repo and consolidate schema & utility modules (#237)
2025-03-22 18:56:39 +01:00
Romain Courtois
b098bb4534
Refactor/pydantic (#226)
2025-03-11 00:56:58 +01:00
Romain Courtois
d6cb920660
Refactor/ingestion (#209)
Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
2025-03-04 01:11:54 +01:00
Filip Christiansen
f4fd4bbe7a
feat: partial cloning (#188)
This commit introduces the `partial_clone_repo` function, which performs a sparse clone
of a repository (`git clone --filter=blob:none --sparse`) based on query parameters
from a `ParsedQuery` object.

- Add a new method (extact_clone_config) in ParsedQuery to encapsulate the creation
  of a CloneConfig from query parameters.
- Replace repeated CloneConfig instantiation in repository_ingest.py and
  query_processor.py with calls to the new method.
- Simplify code and improve maintainability by centralizing CloneConfig logic.

* Refactor cloning logic to support subpath-based partial clones

- Add `repo_name` and `subpath` fields to `CloneConfig` for flexible cloning.
- Split out `partial_clone_repo` and `full_clone_repo` to handle subpath vs. full clones.
- Update `CloneConfig` to include `repo_name` and `subpath`.
- Simplify query processing to always call `clone_repo`, which now delegates to partial or full clone.
- Improve docstrings to reflect new parameters and return types.

---------

Co-authored-by: cyclotruc <romain@coderamp.io>
2025-02-19 10:36:08 +01:00
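The sparse clone mentioned in f4fd4bbe7a can be sketched as two git invocations: the clone with `--filter=blob:none --sparse`, followed by `git sparse-checkout set` for the subpath. The helper name `build_partial_clone_commands` and its parameters are hypothetical, and the second command is an assumption about how the subpath would be selected; the function only builds argument lists and does not run git.

```python
from __future__ import annotations


def build_partial_clone_commands(url: str, local_path: str, subpath: str) -> list[list[str]]:
    """Build argv lists for a sparse, blob-less clone limited to one subpath.

    Hypothetical helper; it does not execute anything.
    """
    clone_cmd = ["git", "clone", "--filter=blob:none", "--sparse", url, local_path]
    # Restrict the working tree to the requested subpath (assumed follow-up step).
    sparse_cmd = ["git", "-C", local_path, "sparse-checkout", "set", subpath.strip("/")]
    return [clone_cmd, sparse_cmd]
```

Each list can be passed to `subprocess.run` or an async equivalent; keeping command construction separate from execution makes it easy to test without network access.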
cyclotruc
b227748bc1
fix test_query_parser with gist.github.com
2025-02-17 17:52:20 +00:00
Filip Christiansen
4397a45281
feat: Add Python 3.7 Support and Restore Compatibility with Older Syntax (#181)
* Add Python 3.9 support by using ParamSpec from typing_extensions and removing match statements

* Add Python 3.7 support by reverting inline generics and removing walrus usage

* Update pyproject.toml
2025-02-17 11:36:57 +01:00
Filip Christiansen
58dbe2cb7e
refactor: code cleanup, utils extraction, and test improvements (#141)
* Refactor code for lifespan, template usage, and improved tests

- Move background tasks and rate-limit handler into utils.py
- Reference TEMPLATES from config instead of inline Jinja2Templates
- Adopt Given/When/Then docstrings for test clarity
- Parametrize some tests and consolidate code across query_parser tests
- Add pytest.warns context handler to test_parse_repo_source_with_failed_git_command
2025-01-23 02:46:17 +01:00
Filip Christiansen
d721b00d03
Replace dict-based query with ParsedQuery dataclass (#133)
- Introduce ParsedQuery dataclass to store query parameters and metadata
- Update ingestion and parser modules to use ParsedQuery instead of dict[str, Any]
- Convert ignore_patterns and include_patterns to sets
- Clean references to max size and pattern handling
- Update tests to reflect new dataclass usage
2025-01-17 16:53:32 +01:00
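To illustrate the move from `dict[str, Any]` to a dataclass in d721b00d03, here is a minimal sketch. The class name `ParsedQuerySketch` and the fields `url`, `branch`, and `max_file_size` are assumptions chosen for illustration; only `ignore_patterns` and `include_patterns` being sets comes from the commit message.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ParsedQuerySketch:
    """Illustrative stand-in for a typed query object (field names assumed)."""

    url: Optional[str] = None
    branch: Optional[str] = None
    max_file_size: int = 10 * 1024 * 1024  # assumed default, in bytes
    # default_factory gives each instance its own set instead of a shared one.
    ignore_patterns: Set[str] = field(default_factory=set)
    include_patterns: Set[str] = field(default_factory=set)
```

Compared with a plain dict, a misspelled attribute name fails immediately, and the set fields deduplicate patterns without extra code.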
Gowtham Kishore
3ce8e7e21e
fix: handling of branch names with slashes (#131)
* Fixing the branch name with nested /

* Adding fallback logic if git command fails
2025-01-16 20:55:26 +01:00
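One way to handle branch names containing slashes, as addressed in 3ce8e7e21e, is to compare the URL path segments against a list of known branches and take the longest match. The sketch below assumes that approach; the function name, its parameters, and the first-segment fallback are hypothetical and not taken from the commit.

```python
from typing import List, Tuple


def split_branch_and_subpath(path_parts: List[str], known_branches: List[str]) -> Tuple[str, str]:
    """Split URL path segments into (branch, subpath).

    Tries the longest candidate branch first. If nothing matches (for
    example because the branch list could not be fetched), falls back
    to treating the first segment as the branch.
    """
    if not path_parts:
        raise ValueError("path_parts must not be empty")
    for end in range(len(path_parts), 0, -1):
        candidate = "/".join(path_parts[:end])
        if candidate in known_branches:
            return candidate, "/".join(path_parts[end:])
    # Fallback when no known branch matches.
    return path_parts[0], "/".join(path_parts[1:])
```

With `known_branches` containing `feature/fix`, the segments `feature`, `fix`, `src`, `main.py` split into branch `feature/fix` and subpath `src/main.py`; with an empty branch list the fallback yields branch `feature`.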
Filip Christiansen
0e74d6777f
Add gitingest.com to known Git hosts (#134)
2025-01-15 02:14:12 +01:00
Filip Christiansen
dd8f1e0aac
feat: enhance parser domain-agnostic support (#117)
* feat: make parser domain-agnostic to support multiple Git hosts

- added list of known domains/Git hosts in `query_parser.py`
- fixed bug from [#115](https://github.com/cyclotruc/gitingest/pull/115): corrected case handling for URL components—scheme, domain, username, and repository are case-insensitive, but paths beyond (e.g., file names, branches) are case-sensitive
- implemented `try_domains_for_user_and_repo` in `query_parser.py` to iteratively guess the correct domain until success or supported hosts are exhausted
- added helper functions `_get_user_and_repo_from_path`, `_validate_host`, and `_validate_scheme` in `query_parser.py`
- extended `_parse_repo_source` in `query_parser.py` to be Git host agnostic by using `try_domains_for_user_and_repo`
- added tests `test_parse_url_unsupported_host` and `test_parse_query_with_branch` in `test_query_parser.py`
- created new file `test_git_host_agnostic.py` to verify domain/Git host agnostic behavior
2025-01-13 05:46:29 +01:00
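The domain-guessing loop described in dd8f1e0aac can be sketched as follows. The host list, the function name `try_hosts_for_user_and_repo`, and the injected `repo_exists` callback are all assumptions made so the example runs without network access; the project's `try_domains_for_user_and_repo` may differ in signature and behaviour, and raising `ValueError` on exhaustion is likewise assumed.

```python
from typing import Callable, Sequence

# Assumed list of hosts to try, in order; not the project's actual list.
KNOWN_HOSTS = ("github.com", "gitlab.com", "bitbucket.org")


def try_hosts_for_user_and_repo(
    user: str,
    repo: str,
    repo_exists: Callable[[str], bool],
    hosts: Sequence[str] = KNOWN_HOSTS,
) -> str:
    """Return the first host on which user/repo appears to exist.

    repo_exists is a caller-supplied check (hypothetical) that receives
    a candidate URL and reports whether the repository is reachable.
    """
    for host in hosts:
        if repo_exists(f"https://{host}/{user}/{repo}"):
            return host
    raise ValueError(f"Could not find {user}/{repo} on any known host")
```

Passing the existence check as a parameter keeps the loop testable with a fake callback, similar in spirit to the fixtures introduced in the later test refactors.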
Renamed from tests/test_query_parser.py