• Accept hosts starting with “git.” or “gitlab.” in _looks_like_git_host
• Update doc-strings to document the heuristic
• Adjust git-host-agnostic tests: expect ValueError for slug form with
custom hosts; add real GitLab instance (git.rwth-aachen.de) to matrix
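
For illustration, the prefix heuristic described above could look roughly like the sketch below. The function name `looks_like_git_host` (without the leading underscore) and the lowercasing step are assumptions made for this example, not the project's actual code.

```python
def looks_like_git_host(host: str) -> bool:
    """Return True when the host name starts with "git." or "gitlab.".

    Hypothetical stand-in for the `_looks_like_git_host` heuristic; the
    case-insensitive comparison is an assumption of this sketch.
    """
    return host.lower().startswith(("git.", "gitlab."))
```

Under this sketch, `git.rwth-aachen.de` and `gitlab.example.com` are accepted, while a host such as `example.com` is not.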
* feat(web-ui, backend): allow ingesting private GitHub repos with PAT authentication
* Accept a GitHub personal access token (PAT) from the UI and forward it through:
- `git_form.jinja` → new “Private Repository” checkbox + PAT field
- routers (`index.py`, `dynamic.py`) and `query_processor.py`
* Propagate `token` throughout the ingestion stack
- `gitingest.entrypoint.parse_query`
- `query_parsing` (including `try_domains_for_user_and_repo`) so we can infer the host when the user enters a bare “user/repo” slug
* Tests
- Added `"token": ""` to the `form_data` dict in the tests in `tests/test_flow_integration.py`
**Limitation:** This PR enables PAT-protected cloning **only for GitHub**; other hosts (GitLab, Gitea, etc.) remain public-only for now.
* help link to generate PAT
* pre-commit hooks
---------
Co-authored-by: cyclotruc <romain@coderamp.io>
* Add option to output digest to stdout
This change introduces the ability for users to direct the output of the gitingest tool to standard output (stdout) instead of writing to a file. This is useful for piping the output to other commands or viewing it directly in the terminal.
Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
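
A minimal sketch of the stdout option, assuming that a hypothetical output value of `"-"` selects stdout (the exact flag or sentinel is not stated in this log). The `write_digest` name and its parameters are illustrative only.

```python
import sys
from typing import TextIO


def write_digest(digest: str, output: str, stdout: TextIO = sys.stdout) -> None:
    """Write the digest to stdout or to a file.

    Assumption: the sentinel "-" means "write to stdout"; any other value
    is treated as a file path.
    """
    if output == "-":
        stdout.write(digest)
        return
    # Explicit encoding keeps the output identical across platforms.
    with open(output, "w", encoding="utf-8") as handle:
        handle.write(digest)
```

Writing to stdout lets the digest be piped into another command instead of being read back from a file.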
* fix: split sparse-checkout & commit checkout when cloning; refresh docs/CLI
* Run `git sparse-checkout set …` and `git checkout <sha>` as two separate calls; this matches Git’s CLI rules and fixes the checkout failures.
* Tidy clone path creation via _ensure_directory; use DEFAULT_TIMEOUT.
* Clarify CLI/help strings and schema docstrings.
* Update tests for the new two-step checkout flow.
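
The two-step flow can be sketched as building two separate argument lists. `build_checkout_commands` and its parameters are hypothetical names for this example; only the two Git subcommands come from the notes above.

```python
from typing import List


def build_checkout_commands(repo_dir: str, subpath: str, commit: str) -> List[List[str]]:
    """Return the sparse-checkout and checkout commands as two separate calls."""
    return [
        # Step 1: restrict the working tree to the requested subpath.
        ["git", "-C", repo_dir, "sparse-checkout", "set", subpath],
        # Step 2: check out the requested commit in a second invocation.
        ["git", "-C", repo_dir, "checkout", commit],
    ]
```

Each inner list could then be passed to a subprocess runner in order, so a failure of the first step is reported before the second runs.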
* feat(auth): support private GitHub repos & correct sparse-checkout flow
* CLI: new `--token/-t` flag (fallback to `GITHUB_TOKEN`)
* clone_repo:
  * injects Basic-auth header when a PAT is supplied
  * validates PAT format (`github_pat_*`)
* git_utils:
  * `create_git_auth_header`, `validate_github_token`, `create_git_command`
  * `_check_github_repo_exists` & branch-listing now work with tokens
* os_utils.ensure_directory extracted for reuse
* tests updated to reflect new call signatures
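
As a rough sketch of how a Basic-auth header can be injected into a Git command: the helper names below, the `x-oauth-basic` user name, and the use of Git's `http.<url>.extraheader` setting are assumptions for illustration, not a copy of `create_git_auth_header` or `create_git_command`.

```python
import base64
from typing import List, Optional


def auth_header_config(token: str, host: str = "github.com") -> str:
    """Build a `-c` config value that adds a Basic-auth header for one host."""
    # Assumption: the token is sent as the password of a placeholder user.
    raw = f"x-oauth-basic:{token}".encode("utf-8")
    credentials = base64.b64encode(raw).decode("ascii")
    return f"http.https://{host}/.extraheader=Authorization: Basic {credentials}"


def git_command(args: List[str], token: Optional[str] = None) -> List[str]:
    """Prefix `args` with `git`, adding the auth header only when a token is given."""
    cmd = ["git"]
    if token:
        cmd += ["-c", auth_header_config(token)]
    return cmd + args
```

Passing the header through `-c` keeps the token out of the clone URL and out of the repository's stored configuration.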
* allow GitHub PAT to start with ghp_
* fix GITHUB_PAT_PATTERN and add instructions to README
* fix gph_ to ghp_
* docs: add GITHUB_TOKEN env var example to README
* add GITHUB_TOKEN environment variable also in code
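
The token fallback and prefix check could be sketched as follows. `resolve_token`, `has_accepted_prefix`, and the prefix tuple are hypothetical; the real `GITHUB_PAT_PATTERN` is a pattern that is not reproduced in this log and may be stricter.

```python
import os
from typing import Mapping, Optional

# Assumption: only the two prefixes mentioned in these notes are checked here.
ACCEPTED_PREFIXES = ("github_pat_", "ghp_")


def resolve_token(cli_token: Optional[str], env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Prefer the CLI token, then fall back to the GITHUB_TOKEN environment variable."""
    env = os.environ if env is None else env
    return cli_token or env.get("GITHUB_TOKEN") or None


def has_accepted_prefix(token: str) -> bool:
    """Cheap sanity check on the token prefix (not a full format validation)."""
    return token.startswith(ACCEPTED_PREFIXES)
```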
This commit introduces the `partial_clone_repo` function, which performs a sparse clone
of a repository (`git clone --filter=blob:none --sparse`) based on query parameters
from a `ParsedQuery` object.
- Add a new method (`extract_clone_config`) in `ParsedQuery` to encapsulate the creation
of a CloneConfig from query parameters.
- Replace repeated CloneConfig instantiation in repository_ingest.py and
query_processor.py with calls to the new method.
- Simplify code and improve maintainability by centralizing CloneConfig logic.
* Refactor cloning logic to support subpath-based partial clones
- Add `repo_name` and `subpath` fields to `CloneConfig` for flexible cloning.
- Split out `partial_clone_repo` and `full_clone_repo` to handle subpath vs. full clones.
- Simplify query processing to always call `clone_repo`, which now delegates to partial or full clone.
- Improve docstrings to reflect new parameters and return types.
---------
Co-authored-by: cyclotruc <romain@coderamp.io>
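
A sketch of how `clone_repo` might delegate between a partial and a full clone based on `subpath`. The `CloneConfigSketch` fields and the convention that `"/"` means "whole repository" are assumptions; only the `--filter=blob:none --sparse` flags come from the notes above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CloneConfigSketch:
    """Hypothetical, reduced stand-in for `CloneConfig`."""

    url: str
    local_path: str
    subpath: str = "/"  # assumption: "/" means the whole repository


def build_clone_command(config: CloneConfigSketch) -> List[str]:
    """Choose partial-clone flags when a subpath is requested, else clone fully."""
    cmd = ["git", "clone"]
    if config.subpath != "/":
        # Partial clone: skip blobs up front and enable sparse checkout.
        cmd += ["--filter=blob:none", "--sparse"]
    return cmd + [config.url, config.local_path]
```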
* Add Python 3.9 support by using ParamSpec from typing_extensions and removing match statements
* Add Python 3.7 support by reverting inline generics and removing walrus usage
* Update pyproject.toml
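
To illustrate the `ParamSpec` backport mentioned above, here is a small decorator sketch. The `count_calls` decorator is invented for this example, and the fallback import from `typing` is only there so the snippet also runs where `typing_extensions` is not installed.

```python
import functools
from typing import Callable, TypeVar

try:
    # Backport that works on older interpreters.
    from typing_extensions import ParamSpec
except ImportError:  # assumption: Python 3.10+ without typing_extensions
    from typing import ParamSpec

P = ParamSpec("P")
R = TypeVar("R")


def count_calls(func: Callable[P, R]) -> Callable[P, R]:
    """Wrap `func` while preserving its parameter types for type checkers."""

    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        wrapper.calls += 1  # type: ignore[attr-defined]
        return func(*args, **kwargs)

    wrapper.calls = 0  # type: ignore[attr-defined]
    return wrapper
```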
* Improvement: Make the CLI work on Windows
* Fix tmp file creation and add test
* add error message when git is missing
* update CI to test Windows and macOS
---------
Co-authored-by: Romain Courtois <romain@coderamp.io>
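
One way to report a missing Git installation is to look the executable up on `PATH` before running any command. `ensure_executable` and the wording of its error message are assumptions of this sketch.

```python
import shutil


def ensure_executable(name: str = "git") -> str:
    """Return the resolved path of `name`, or raise a readable error if absent."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"'{name}' was not found on PATH; install it and try again."
        )
    return path
```

`shutil.which` also resolves Windows executable extensions such as `.exe`, so the same check can be used on every platform tested in CI.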
* Refactor project into a dedicated 'server' module and update all references accordingly
---------
Co-authored-by: Romain Courtois <romain@coderamp.io>
* Refactor code for lifespan, template usage, and improved tests
- Move background tasks and rate-limit handler into utils.py
- Reference TEMPLATES from config instead of inline Jinja2Templates
- Adopt Given/When/Then docstrings for test clarity
- Parametrize some tests and consolidate code across query_parser tests
- Add pytest.warns context handler to test_parse_repo_source_with_failed_git_command
- Introduce ParsedQuery dataclass to store query parameters and metadata
- Update ingestion and parser modules to use ParsedQuery instead of dict[str, Any]
- Convert ignore_patterns and include_patterns to sets
- Clean references to max size and pattern handling
- Update tests to reflect new dataclass usage
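
A reduced sketch of a dataclass replacing `dict[str, Any]`, with the pattern fields stored as sets. The field names and default size below are illustrative assumptions and do not mirror the full `ParsedQuery` definition.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ParsedQuerySketch:
    """Hypothetical subset of the fields a parsed query might carry."""

    local_path: str
    url: Optional[str] = None
    max_file_size: int = 10 * 1024 * 1024  # assumption: 10 MiB default
    # default_factory avoids sharing one mutable set between instances.
    ignore_patterns: Set[str] = field(default_factory=set)
    include_patterns: Set[str] = field(default_factory=set)
```

Compared with a plain dict, attribute access is checked by type checkers, and storing patterns as sets removes duplicates automatically.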
* feat: make parser domain-agnostic to support multiple Git hosts
- added list of known domains/Git hosts in `query_parser.py`
- fixed bug from [#115](https://github.com/cyclotruc/gitingest/pull/115): corrected case handling for URL components—scheme, domain, username, and repository are case-insensitive, but paths beyond (e.g., file names, branches) are case-sensitive
- implemented `try_domains_for_user_and_repo` in `query_parser.py` to iteratively guess the correct domain until success or supported hosts are exhausted
- added helper functions `_get_user_and_repo_from_path`, `_validate_host`, and `_validate_scheme` in `query_parser.py`
- extended `_parse_repo_source` in `query_parser.py` to be Git host agnostic by using `try_domains_for_user_and_repo`
- added tests `test_parse_url_unsupported_host` and `test_parse_query_with_branch` in `test_query_parser.py`
- created new file `test_git_host_agnostic.py` to verify domain/Git host agnostic behavior
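
The host guessing and case handling described above can be sketched as below. The host list, the injected `exists` callable, and both function names are assumptions for this example; the real `try_domains_for_user_and_repo` performs its own remote check.

```python
from typing import Callable, Iterable
from urllib.parse import urlsplit

# Assumption: an illustrative list; the project's known hosts may differ.
KNOWN_HOSTS = ("github.com", "gitlab.com", "bitbucket.org")


def guess_host(user: str, repo: str, exists: Callable[[str], bool],
               hosts: Iterable[str] = KNOWN_HOSTS) -> str:
    """Try each known host in order until `exists` accepts the candidate URL."""
    for host in hosts:
        if exists(f"https://{host}/{user}/{repo}"):
            return host
    raise ValueError(f"Could not find {user}/{repo} on any known host")


def normalize_repo_url(url: str) -> str:
    """Lowercase scheme, host, user and repo; keep the remaining path as-is."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) < 2 or not all(segments[:2]):
        raise ValueError(f"Expected at least user/repo in path: {url}")
    user, repo = segments[0].lower(), segments[1].lower()
    path = "/".join([user, repo, *segments[2:]])
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}/{path}"
```

Injecting `exists` keeps the sketch testable without network access; in practice it would wrap a remote existence check.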
* Refactor project structure, enhance logic, update configurations, and improve code quality
Refactoring and Logic Improvements
- Refactored the `_scan_directory` function in `src/gitingest/ingest_from_query.py` by extracting loop logic into the new `_process_item` function, and further separating functionality into `_process_symlink` and `_process_file`
- Replaced multiple return statements with error raising and catching, introducing custom exceptions (`MaxFilesReachedError`, `MaxFileSizeReachedError`, `AlreadyVisitedError`) in the `_process_item` and `_scan_directory` functions
- Enhanced the logic in the `process_query` function in `src/process_query.py` for better flow and maintainability
- Improved the logic in `_generate_token_string` in `src/gitingest/ingest_from_query.py`
- Refined the `download_ingest` function in `src/routers/download.py` for better clarity and functionality
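
The switch from multiple return statements to raising and catching custom exceptions can be sketched on a simplified scan loop. The exception names come from the notes above; `process_item`, `scan`, the `(name, size)` item shape, and the limits are hypothetical simplifications.

```python
from typing import Dict, Iterable, Set, Tuple


class MaxFilesReachedError(Exception):
    """Raised when the file-count limit would be exceeded."""


class MaxFileSizeReachedError(Exception):
    """Raised when the total-size limit would be exceeded."""


class AlreadyVisitedError(Exception):
    """Raised when an item was seen before (e.g. via a symlink loop)."""


def process_item(name: str, size: int, stats: Dict[str, int], seen: Set[str],
                 max_files: int, max_total: int) -> None:
    """Record one item or raise the exception describing why it is rejected."""
    if name in seen:
        raise AlreadyVisitedError(name)
    if stats["files"] + 1 > max_files:
        raise MaxFilesReachedError(max_files)
    if stats["bytes"] + size > max_total:
        raise MaxFileSizeReachedError(max_total)
    seen.add(name)
    stats["files"] += 1
    stats["bytes"] += size


def scan(items: Iterable[Tuple[str, int]], max_files: int, max_total: int) -> Dict[str, int]:
    """Process items until a limit is hit; duplicates are skipped."""
    stats = {"files": 0, "bytes": 0}
    seen: Set[str] = set()
    for name, size in items:
        try:
            process_item(name, size, stats, seen, max_files, max_total)
        except AlreadyVisitedError:
            continue  # skip duplicates but keep scanning
        except (MaxFilesReachedError, MaxFileSizeReachedError):
            break  # a hard limit ends the scan
    return stats
```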
Exception Handling Enhancements
- Replaced broad `Exception` handling with specific `OSError` in the `_read_file_content` function in `src/gitingest/ingest_from_query.py`
- Refined exception handling throughout the codebase, including removing redundant try-except-raise blocks, e.g., in `clone_repo` function in `src/gitingest/clone.py`
- Added custom exceptions to `src/gitingest/exceptions.py`: `MaxFilesReachedError`, `MaxFileSizeReachedError`, and `AlreadyVisitedError`
- Included explicit re-raising of exceptions in various functions for improved error propagation
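
A minimal sketch of catching `OSError` instead of a broad `Exception` when reading a file. The function name without the leading underscore, the returned error string, and the `errors="ignore"` choice are assumptions of this example.

```python
def read_file_content(path: str) -> str:
    """Return the file's text, or a short error message if it cannot be read."""
    try:
        # Explicit encoding avoids depending on the platform default.
        with open(path, encoding="utf-8", errors="ignore") as handle:
            return handle.read()
    except OSError as exc:
        # Covers missing files, permission problems and other I/O failures,
        # while letting unrelated bugs (e.g. TypeError) surface normally.
        return f"Error reading file: {exc}"
```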
Test Suite Refactoring
- Cleaned up and reorganized test files:
  - Moved tests from `src/gitingest/tests/` to `tests/`
  - Consolidated fixtures from `tests/test_ingest.py` into `tests/conftest.py`
  - Removed redundant content from `tests/conftest.py`
- Migrated configuration from `pytest.ini` to `pyproject.toml`, deleted `pytest.ini`, and updated `.dockerignore`
Documentation Improvements
- Added `darglint` for enforcing `numpy` docstring style in `.pre-commit-config.yaml` for `src/` files
- Updated docstrings throughout the codebase, including adding module docstrings where needed
- Updated `README.md`:
  - Added "GitHub stars" badge
  - Moved the "Discord" badge to its own line
  - Replaced occurrences of "Gitingest" with "GitIngest" for consistency and clarity
Linting and Code Quality
- Integrated `pylint` into `.pre-commit-config.yaml` for both `src/` and `tests/` directories
- Created `tests/.pylintrc` for linting configuration specific to test files
Code Clean-up
- Removed the redundant `src/__init__.py` file
Naming Conventions and Code Style
- Renamed `logSliderToSize` to `log_slider_to_size` in `src/server_utils.py` for consistency with Python's naming conventions
- Added explicit encoding specification in multiple instances of `open` throughout the code