45 commits

Author SHA1 Message Date
jpotw
38c23171a1
feat: include_submodules option (#313)
* feat: add optional --include-submodules flag to CLI and ingestion

- Adds --include-submodules CLI flag to control submodule analysis
- Propagates include_submodules through ingestion, schemas, and clone logic
- Updates tests to cover submodule inclusion
- Adds a helper function (_checkout_partial_clone) to avoid repetition
- Adds include_submodules example in README.md
- Web UI for this option is not implemented for now (https://github.com/cyclotruc/gitingest/pull/313#issuecomment-3019912523)
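
A minimal sketch of how such a flag could travel from the command line to the clone step. `build_clone_command`, `parse_args`, and the use of `argparse` are assumptions for illustration, not the project's actual code; `--recurse-submodules` is a standard `git clone` option.

```python
import argparse
from typing import List, Optional


def build_clone_command(url: str, dest: str, include_submodules: bool = False) -> List[str]:
    """Return a git clone argv (hypothetical helper, not gitingest's API)."""
    cmd = ["git", "clone"]
    if include_submodules:
        # Standard git option: also clone and initialise submodules.
        cmd.append("--recurse-submodules")
    cmd += [url, dest]
    return cmd


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a source plus the optional --include-submodules switch."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("source")
    parser.add_argument("--include-submodules", action="store_true")
    return parser.parse_args(argv)


args = parse_args(["https://github.com/user/repo", "--include-submodules"])
print(build_clone_command(args.source, "/tmp/repo", args.include_submodules))
```

The flag defaults to off, so existing invocations keep their behaviour unless the user opts in.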

---------

Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
2025-07-03 19:19:11 +02:00
Filip Christiansen
016817d559
feat: add Tailwind CSS pipeline, tag-aware cloning & overhaul CI/CD (#352)
Frontend

* introduce Tailwind CSS (package.json, tailwind.config.js, input CSS)
* build site.css on-the-fly (removed tracked artefact; added .gitignore)
* new favicon/icon assets & template cleanup
* split JS into modular files

Docker

* replace single-stage image with 3-stage build
  • css-builder (Node 20 alpine) → compiles Tailwind
  • python-builder installs project with PEP 621 metadata
  • runtime image copies site-packages + compiled CSS, runs as uid 1000

CI/CD

* ci.yml: cache by pyproject.toml, install with `pip install -e .[dev]`
* new frontend job builds/archives CSS after tests
* publish.yml: build CSS first, then wheel/sdist; trusted OIDC upload
* tidy scorecard workflow

Core library

* clone.py, parser & utils now resolve tags in addition to branches/commits
* fallback branch/tag discovery when `git ls-remote` fails
* compat_func.py back-ports Path.readlink / str.removesuffix for Py 3.8
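
For readers unfamiliar with this kind of back-port, the sketch below shows one plausible shape for the two helpers. Both `str.removesuffix` and `Path.readlink` were added in Python 3.9; the function bodies here are an assumption, not a copy of the project's `compat_func.py`.

```python
import os
from pathlib import Path


def removesuffix(s: str, suffix: str) -> str:
    """Equivalent of str.removesuffix (Python 3.9+) for older interpreters."""
    if suffix and s.endswith(suffix):
        return s[: -len(suffix)]
    return s


def readlink(path: Path) -> Path:
    """Equivalent of Path.readlink (Python 3.9+), built on os.readlink."""
    return Path(os.readlink(path))


print(removesuffix("gitingest.git", ".git"))
```

The explicit `if suffix` guard matters: without it an empty suffix would slice with `-0` and return an empty string.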

Tooling & docs

* add `[dev]` extra, drop requirements-dev.txt & its pre-commit fixer
* refreshed CONTRIBUTING.md with Node/Tailwind instructions
* updated tests for new tag logic
2025-07-02 21:31:14 +02:00
Zarial
2b1f228ae1
feat: Refactor backend to a rest api (#346)
* refactor: Refactor backend to a REST API and make minimal changes to the frontend so it works

* fix: remove result bool in Success response object since it's useless, remove sensitive data from the Success Response object, make the integration unit test compatible

* fix: clean query processor return types, remove deprecated fields, better handling for a few errors

* fix: unit tests, remove deprecated is_index from jinja templates

---------

Co-authored-by: ix <n.guintini@protonmail.com>
2025-07-02 16:36:39 +02:00
Filip Christiansen
f8d397e66e
refactor: centralize PAT validation, streamline repo checks & misc cleanup (#349)
* refactor: centralize PAT validation, streamline repo checks & housekeeping

Added

* `.venv*` to `.gitignore`
* `# type: ignore[attr-defined]` hints in `compat_typing.py` for IDE-agnostic imports
* Helpful PAT string in `InvalidGitHubTokenError` for easier debugging

Changed

* Bump **ruff-pre-commit** hook → `v0.12.1`
* CONTRIBUTING:
  * Require **Python 3.9+**
  * Recommend signed (`-S`) commits
* PAT validation now happens **only** in entry points
  (`utils.auth.resolve_token` for CLI/lib, `server.process_query` for Web UI)
* Unified `_check_github_repo_exists` into `check_repo_exists`, replacing
  `curl -I` with `curl --silent --location --write-out %{http_code} -o /dev/null`
* Broaden `_GITHUB_PAT_PATTERN`
* `create_git_auth_header` raises `ValueError` when hostname is missing
* Tests updated to expect raw HTTP-code output

Removed

* Superfluous “token can be set via `GITHUB_TOKEN`” notes in docstrings
* `.gitingestignore` & `.terraform` from `DEFAULT_IGNORE_PATTERNS`
* Token validation inside `create_git_command`
* Obsolete `test_create_git_command_invalid_token`

Tests

* Adjust `test_clone.py` and `test_git_utils.py` for new status-code handling
* Consolidate mocks after token-validation relocation

BREAKING CHANGE:
`create_git_command` no longer validates GitHub tokens; callers must ensure
tokens are valid (via `validate_github_token`) before invoking lower-level
git helpers.
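
The division of responsibility described in the breaking change can be sketched as follows. All names (`validate_token`, `build_git_command`, `clone_entry_point`, `InvalidTokenError`) and the token pattern are illustrative assumptions; the project's own `validate_github_token` and `_GITHUB_PAT_PATTERN` may differ.

```python
import re
from typing import List

# Assumed token shapes, for illustration only.
_TOKEN_RE = re.compile(r"(?:ghp_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9_]{20,})")


class InvalidTokenError(ValueError):
    """Hypothetical stand-in for the project's token error."""


def validate_token(token: str) -> None:
    """Raise if the token does not match the assumed PAT shapes."""
    if not _TOKEN_RE.fullmatch(token):
        raise InvalidTokenError("token does not look like a GitHub PAT")


def build_git_command(*args: str) -> List[str]:
    """Low-level helper: performs no token checks of its own."""
    return ["git", *args]


def clone_entry_point(url: str, token: str) -> List[str]:
    """Entry point: validate once, then call the lower-level helper."""
    validate_token(token)
    # Attaching the token to the command is omitted in this sketch.
    return build_git_command("clone", url)


print(clone_entry_point("https://github.com/user/repo", "ghp_" + "a" * 36))
```

Validating once at the boundary keeps the low-level helpers free of policy, at the cost that every new entry point must remember to call the validator.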


---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-01 14:21:13 +02:00
Filip Christiansen
d099de5d3b
fix GitHub PAT regex (#341)
2025-06-30 22:00:51 +02:00
Filip Christiansen
2f447ae632
chore: switch to ruff + pydoclint, deprecate .gitingest, and perform a repo-wide quality sweep (#329)
* **Pre-commit**: replace `black` & `darglint` with `ruff-check` / `ruff-format`;
  add `pydoclint` for docstring quality
* **Deps**: drop `tomli`; tighten `typing_extensions`; add `eval-type-backport`;
  remove `black`, `djlint`, `pylint` from `requirements-dev`
* **Ignore files**: deprecate TOML-based `.gitingest`; introduce
  `.gitingestignore` (git-wildmatch, parsed via `_parse_ignore_file`)
* **Config**: new unified `[tool.ruff]` (lint + format + isort); delete
  `[tool.black]`, keep minimal `[tool.isort]` for now
* **Refactor/style**: adopt `from __future__ import annotations`, kw-only args,
  richer types; reorder params & `__all__`; move type-only imports under
  `if TYPE_CHECKING`; extract `_CLIArgs` `TypedDict`, migrate form data to
  `pydantic.QueryForm`; deduplicate `cli.main` / `_async_main`; use `pathlib`,
  avoid file-IO in async; replace magic numbers with constants; delete
  `is_text_file` (logic now lives in `FileSystemNode.content`)
* **Bug fix**: remove silent error in `notebook_utils._process_cell`
* **Docs**: refresh README badges
* **Tests**: update fixtures & assertions

**BREAKING**: new `.gitingestignore` file replaces (now-deprecated) `.gitingest`.

No functional API or CLI changes.
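
To make the ignore-file idea concrete, here is a small sketch of reading pattern lines and testing paths against them. It is not the project's `_parse_ignore_file`: the helper names are hypothetical, and `fnmatch` is swapped in as a simple stand-in that does not implement full git-wildmatch semantics (no negation, no `**` handling).

```python
from fnmatch import fnmatch
from typing import Iterable, Set


def parse_ignore_lines(lines: Iterable[str]) -> Set[str]:
    """Collect patterns, skipping blank lines and '#' comments."""
    patterns = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        patterns.add(line)
    return patterns


def is_ignored(rel_path: str, patterns: Set[str]) -> bool:
    """Return True if any pattern matches (fnmatch approximation only)."""
    return any(fnmatch(rel_path, pattern) for pattern in patterns)


patterns = parse_ignore_lines(["# build output", "", "*.log", "build/*"])
print(is_ignored("debug.log", patterns), is_ignored("src/main.py", patterns))
```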
2025-06-28 18:49:37 +02:00
Juan Cruz-Benito
af95bae087
Improve checks for github.com links & adding compatibility with GHE companies that use github.<company>.xxx type of links (#334)
* Improve checks for github.com links & adding compatibility with GHE companies that use github.<company>.xxx type of links
2025-06-28 03:40:33 +02:00
Arman
ba701a80c9
feat: ignore .gitignore files by default (use --include-gitignored to stay
* use_gitignore flag to exclude gitignore
---------

Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
2025-06-25 05:04:50 +02:00
Filip Christiansen
e5fadce158
feat(parser): relax host validation to support self-hosted GitLab & git.* domains (#314)
• Accept hosts starting with “git.” or “gitlab.” in _looks_like_git_host
• Update doc-strings to document the heuristic
• Adjust git-host-agnostic tests: expect ValueError for slug form with
  custom hosts; add real GitLab instance (git.rwth-aachen.de) to matrix
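
The prefix heuristic named in this commit can be approximated in a few lines. This is a sketch under the assumption that only the two prefixes matter; the function name is hypothetical, and the real `_looks_like_git_host` may check more than this.

```python
def looks_like_git_host(host: str) -> bool:
    """Heuristic: treat hosts starting with 'git.' or 'gitlab.' as Git hosts."""
    return host.lower().startswith(("git.", "gitlab."))


for candidate in ("git.rwth-aachen.de", "gitlab.example.org", "example.com"):
    print(candidate, looks_like_git_host(candidate))
```

A prefix check is deliberately permissive: it accepts self-hosted instances without a registry of known hosts, but it can also accept hosts that merely happen to share the prefix.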
2025-06-23 20:50:08 +02:00
Romain Courtois
db7ee0cc07
Fix: Add proper test isolation for CLI stdout output test (#298) 2025-06-22 18:42:02 +02:00
Filip Christiansen
95009bdf15
test: add pytest-mock, introduce fixtures & type hints (#290)
* Added pytest-mock to dev dependencies and pre-commit hooks
* Introduced InvalidGitHubTokenError for clearer token-validation failures
* Refactored tests:
  * Replaced ad-hoc mocks with reusable fixtures
  * Parametrised URL/branch matrices to cut duplication
  * Added type hints throughout
* New coverage:
  * validate_github_token (happy & error paths)
  * create_git_command / create_git_auth_header
2025-06-21 21:26:29 +02:00
Filip Christiansen
3869aa32e3
feat(web-ui): add private-GitHub ingestion via PAT (#286)
* feat(web-ui, backend): allow ingesting private GitHub repos with PAT authentication

* Accept a GitHub personal access token (PAT) from the UI and forward it through
  - `git_form.jinja` → new “Private Repository” checkbox + PAT field
  - routers (`index.py`, `dynamic.py`) and `query_processor.py`
* Propagate `token` throughout the ingestion stack
  - `gitingest.entrypoint.parse_query`
  - `query_parsing` (including `try_domains_for_user_and_repo`) so we can infer the host when the user enters a bare “user/repo” slug
* Tests
  - Added `"token": ""` to the `form_data` dict in the tests in `tests/test_flow_integration.py`

**Limitation:** This PR enables PAT-protected cloning **only for GitHub**; other hosts (GitLab, Gitea, etc.) remain public-only for now.

* help link to generate PAT

* pre-commit hooks

---------

Co-authored-by: cyclotruc <romain@coderamp.io>
2025-06-21 20:19:16 +02:00
Casey West
c656635f6d
Add option to output digest to stdout (#264)
* Add option to output digest to stdout

This change introduces the ability for users to direct the output of the gitingest tool to standard output (stdout) instead of writing to a file. This is useful for piping the output to other commands or viewing it directly in the terminal.
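
A minimal sketch of the file-or-stdout switch. Using `"-"` as the marker for standard output is a common CLI convention and an assumption here, as is the helper name `write_digest`; the commit message does not say how the option is spelled.

```python
import sys
from pathlib import Path
from typing import TextIO


def write_digest(digest: str, output: str, stdout: TextIO = sys.stdout) -> None:
    """Write the digest to stdout when output is '-', otherwise to a file."""
    if output == "-":
        stdout.write(digest)
    else:
        Path(output).write_text(digest, encoding="utf-8")


write_digest("example digest\n", "-")
```

Keeping status messages off stdout in this mode is what makes the output safe to pipe into another command.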

Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
2025-06-19 09:21:13 +02:00
Filip Christiansen
1dd133c3e0
feat: add private-repo support to CLI & core (UI coming next) (#282)
* fix: split sparse-checkout & commit checkout when cloning; refresh docs/CLI

* Run `git sparse-checkout set …` and `git checkout <sha>` as two calls—matches Git’s CLI rules and fixes failures.
* Tidy clone path creation via _ensure_directory; use DEFAULT_TIMEOUT.
* Clarify CLI/help strings and schema docstrings.
* Update tests for the new two-step checkout flow.
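
The two-step flow can be sketched as two separate argument vectors instead of one combined call. `checkout_commands` is a hypothetical helper; the git subcommands themselves (`sparse-checkout set`, `checkout`) are the ones named in the commit message.

```python
from typing import List


def checkout_commands(repo_dir: str, subpath: str, commit: str) -> List[List[str]]:
    """Return the sparse-checkout and checkout steps as two separate git calls."""
    base = ["git", "-C", repo_dir]
    return [
        base + ["sparse-checkout", "set", subpath],  # step 1: limit the working tree
        base + ["checkout", commit],                 # step 2: move to the commit
    ]


for cmd in checkout_commands("/tmp/repo", "src/pkg", "abc123"):
    print(" ".join(cmd))
```

Each inner list could be passed to `subprocess.run(cmd, check=True)` in order, so a failure in the first step stops the second from running.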

* feat(auth): support private GitHub repos & correct sparse-checkout flow

* CLI: new `--token/-t` flag (fallback to `GITHUB_TOKEN`)
* clone_repo:
  * injects Basic-auth header when a PAT is supplied
  * validates PAT format (`github_pat_*`)
* git_utils:
  * `create_git_auth_header`, `validate_github_token`, `create_git_command`
  * `_check_github_repo_exists` & branch-listing now work with tokens
* os_utils.ensure_directory extracted for reuse
* tests updated to reflect new call signatures
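
One way to inject a Basic-auth header is through git's per-URL `http.<url>.extraheader` setting, passed with `-c`. The sketch below assumes that approach and the `x-oauth-basic` user name; the helper name is hypothetical and the project's `create_git_auth_header` may build the value differently.

```python
import base64


def auth_header_config(token: str, hostname: str = "github.com") -> str:
    """Return a `-c` value that adds a Basic Authorization header for one host."""
    if not hostname:
        raise ValueError("hostname is required")
    # Assumed credential layout: fixed user name, token as the password.
    credentials = f"x-oauth-basic:{token}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    return f"http.https://{hostname}/.extraheader=Authorization: Basic {basic}"


config = auth_header_config("example-token")
print(["git", "-c", config, "clone", "https://github.com/user/private-repo"])
```

Scoping the header to one host keeps the token from being sent to other remotes, such as submodules hosted elsewhere.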

* allow git PAT to start with gth_

* fix GITHUB_PAT_PATTERN and add instructions to README

* fix gph_ to ghp_

* docs: add GITHUB_TOKEN env var example to README

* add GITHUB_TOKEN environment variable also in code
2025-06-15 23:30:46 +02:00
Aaron
789be9b339
fix: traverse directories to allow pattern matching of files within them (#259)
* fix: traverse directories to allow pattern matching of files within them
2025-06-13 17:30:49 +02:00
Filip Christiansen
8be6f5620f
refactor: rename clone to clone_repo and consolidate schema & utility modules (#237)
* refactor: rename clone to clone_repo and consolidate schema & utility modules
2025-03-22 18:56:39 +01:00
Filip Christiansen
7923fab077
chore: run pre-commit autoupdate 2025-03-21 13:28:11 +01:00
Filip Christiansen
3cee6725d3
Remove unused pattern_type parameter from IngestionQuery fixture (#228) 2025-03-13 02:35:18 +01:00
Romain Courtois
b098bb4534
Refactor/pydantic (#226) 2025-03-11 00:56:58 +01:00
Romain Courtois
d6cb920660
Refactor/ingestion (#209)
Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
2025-03-04 01:11:54 +01:00
Filip Christiansen
f4fd4bbe7a
feat: partial cloning (#188)
This commit introduces the `partial_clone_repo` function, which performs a sparse clone
of a repository (`git clone --filter=blob:none --sparse`) based on query parameters
from a `ParsedQuery` object.
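
As an illustration, the clone command quoted above could be assembled like this. `partial_clone_command` and its parameters are assumptions for the sketch, not the signature of `partial_clone_repo`; the `--filter=blob:none` and `--sparse` options are the ones the commit message names.

```python
from typing import List, Optional


def partial_clone_command(url: str, dest: str, branch: Optional[str] = None) -> List[str]:
    """Build a blobless, sparse git clone argv (illustrative helper)."""
    cmd = ["git", "clone", "--filter=blob:none", "--sparse"]
    if branch:
        cmd += ["--branch", branch]
    cmd += [url, dest]
    return cmd


print(partial_clone_command("https://github.com/user/repo", "/tmp/repo", branch="main"))
```

With these options git defers downloading file contents until they are checked out, so a later sparse-checkout of a subpath fetches only the blobs under that path.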

- Add a new method (extact_clone_config) in ParsedQuery to encapsulate the creation
  of a CloneConfig from query parameters.
- Replace repeated CloneConfig instantiation in repository_ingest.py and
  query_processor.py with calls to the new method.
- Simplify code and improve maintainability by centralizing CloneConfig logic.

* Refactor cloning logic to support subpath-based partial clones

- Add `repo_name` and `subpath` fields to `CloneConfig` for flexible cloning.
- Split out `partial_clone_repo` and `full_clone_repo` to handle subpath vs. full clones.
- Update `CloneConfig` to include `repo_name` and `subpath`.
- Simplify query processing to always call `clone_repo`, which now delegates to partial or full clone.
- Improve docstrings to reflect new parameters and return types.

---------

Co-authored-by: cyclotruc <romain@coderamp.io>
2025-02-19 10:36:08 +01:00
cyclotruc
b227748bc1 fix test_query_parser with gist.github.com 2025-02-17 17:52:20 +00:00
Filip Christiansen
4397a45281
feat: Add Python 3.7 Support and Restore Compatibility with Older Syntax (#181)
* Add Python 3.9 support by using ParamSpec from typing_extensions and removing match statements

* Add Python 3.7 support by reverting inline generics and removing walrus usage

* Update pyproject.toml
2025-02-17 11:36:57 +01:00
cyclotruc
9be28a4eef add submodules to tests 2025-02-15 06:57:28 +00:00
Shrey Purohit
a2d9dfaaae
Improvement: Make the CLI work on windows (#161)
* Improvement: Make the CLI work on windows
* Fix tmp file creation and add test
* add error message when git missing
* update CI to test windows and Macos

---------

Co-authored-by: Romain Courtois <romain@coderamp.io>
2025-02-04 06:43:11 +01:00
Rayan Louahche
ecaee49329
Comprehensive Integration Test Suite (#140) 2025-01-25 00:10:58 +01:00
Javier
361147a6fd
feat: add branch option to CLI and ingest function for cloning specific branches (#155) 2025-01-24 22:27:55 +01:00
Filip Christiansen
b34b7f47a1
refactor: refactor codebase to unify server module and update file paths (#142)
* Refactor project into a dedicated 'server' module and update all references accordingly

---------

Co-authored-by: Romain Courtois <romain@coderamp.io>
2025-01-24 07:12:07 +01:00
Filip Christiansen
58dbe2cb7e
refactor: code cleanup, utils extraction, and test improvements (#141)
* Refactor code for lifespan, template usage, and improved tests

- Move background tasks and rate-limit handler into utils.py
- Reference TEMPLATES from config instead of inline Jinja2Templates
- Adopt Given/When/Then docstrings for test clarity
- Parametrize some tests and consolidate code across query_parser tests
- Add pytest.warns context handler to test_parse_repo_source_with_failed_git_command
2025-01-23 02:46:17 +01:00
Gowtham Kishore
18368097f9
Enhancements: CLI Test Cases (#144) 2025-01-22 16:39:57 +01:00
Filip Christiansen
d721b00d03
Replace dict-based query with ParsedQuery dataclass (#133)
- Introduce ParsedQuery dataclass to store query parameters and metadata
- Update ingestion and parser modules to use ParsedQuery instead of dict[str, Any]
- Convert ignore_patterns and include_patterns to sets
- Clean references to max size and pattern handling
- Update tests to reflect new dataclass usage
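
A rough sketch of the dict-to-dataclass move. The class name and field names below are assumptions chosen for illustration; only the idea of a dataclass with set-valued pattern fields comes from the commit message.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ParsedQuerySketch:
    """Illustrative stand-in for a typed query object (field names assumed)."""

    local_path: str
    url: Optional[str] = None
    # Sets remove duplicate patterns and give cheap membership tests.
    include_patterns: Set[str] = field(default_factory=set)
    ignore_patterns: Set[str] = field(default_factory=set)


query = ParsedQuerySketch(local_path=".")
query.ignore_patterns.update(["*.log", "*.log", "build/"])
print(sorted(query.ignore_patterns))
```

Compared with `dict[str, Any]`, a misspelled field fails at attribute access instead of silently creating a new key.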
2025-01-17 16:53:32 +01:00
Gowtham Kishore
3ce8e7e21e
fix: handling of branch names with slashes (#131)
* Fixing the branch name with nested /

* Adding fallback logic if git command fails
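
One plausible way to handle slashes in branch names is to match the URL path segments against the list of remote branches, preferring the longest match, and to fall back to the first segment when no branch list is available. The sketch below assumes that strategy; the function name and the exact fallback are hypothetical.

```python
from typing import List, Set, Tuple


def split_branch_and_subpath(parts: List[str], branches: Set[str]) -> Tuple[str, str]:
    """Split path segments into (branch, subpath) using known branch names."""
    if not parts:
        raise ValueError("no path segments to split")
    # Try the longest candidate first so 'feature/x' wins over 'feature'.
    for end in range(len(parts), 0, -1):
        candidate = "/".join(parts[:end])
        if candidate in branches:
            return candidate, "/".join(parts[end:])
    # Fallback (e.g. branch listing failed): assume a single-segment branch.
    return parts[0], "/".join(parts[1:])


print(split_branch_and_subpath(["feature", "x", "src", "a.py"], {"main", "feature/x"}))
```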
2025-01-16 20:55:26 +01:00
Filip Christiansen
0e74d6777f
Add gitingest.com to known Git hosts (#134) 2025-01-15 02:14:12 +01:00
Filip Christiansen
8137ce1064
test: add coverage for run_ingest_query and clone_repo async timeout (#129) 2025-01-15 02:12:45 +01:00
Filip Christiansen
60391143f6
feat: add optional parameter to include notebook cell outputs in generated script (#128) 2025-01-13 21:51:00 +01:00
Filip Christiansen
dd8f1e0aac
feat: enhance parser domain-agnostic support (#117)
* feat: make parser domain-agnostic to support multiple Git hosts

- added list of known domains/Git hosts in `query_parser.py`
- fixed bug from [#115](https://github.com/cyclotruc/gitingest/pull/115): corrected case handling for URL components—scheme, domain, username, and repository are case-insensitive, but paths beyond (e.g., file names, branches) are case-sensitive
- implemented `try_domains_for_user_and_repo` in `query_parser.py` to iteratively guess the correct domain until success or supported hosts are exhausted
- added helper functions `_get_user_and_repo_from_path`, `_validate_host`, and `_validate_scheme` in `query_parser.py`
- extended `_parse_repo_source` in `query_parser.py` to be Git host agnostic by using `try_domains_for_user_and_repo`
- added tests `test_parse_url_unsupported_host` and `test_parse_query_with_branch` in `test_query_parser.py`
- created new file `test_git_host_agnostic.py` to verify domain/Git host agnostic behavior
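
The case-handling rule from the second bullet can be sketched with the standard library: lowercase the scheme, host, user, and repository, and leave everything after them untouched. `normalize_repo_url` is a hypothetical helper, not a function from `query_parser.py`, and it ignores query strings and fragments.

```python
from urllib.parse import urlsplit


def normalize_repo_url(url: str) -> str:
    """Lowercase scheme, host, user and repo; keep the remaining path as-is."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Only the first two path segments (user, repo) are case-insensitive.
    segments[:2] = [segment.lower() for segment in segments[:2]]
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}/" + "/".join(segments)


print(normalize_repo_url("HTTPS://GitHub.com/User/Repo/blob/Main/README.md"))
```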
2025-01-13 05:46:29 +01:00
Rayan Louahche
1fd741ae17
Enhanced Directory Pattern Matching Test Coverage (#123) 2025-01-13 04:23:12 +01:00
Rayan Louahche
e8663c676a
Add test for non-existent file extension pattern (*.qwerty) (#121) 2025-01-12 08:34:21 +01:00
Rayan Louahche
aa470aa818
Add test for *.txt include pattern filtering (#116) 2025-01-09 13:24:51 +01:00
Filip Christiansen
6d92ed9f59
refactor: refactor module names to avoid function/module name clashes (#114)
- Renamed:
  - `clone.py`       → `repository_clone.py`
  - `ingest.py`      → `repository_ingest.py`
  - `ingest_from_query.py` → `query_ingestion.py`
  - `parse_query.py` → `query_parser.py`
2025-01-09 00:30:39 +01:00
Filip Christiansen
d2825eac20
feat: add support for improved handling of jupyter notebooks (#105) 2025-01-08 23:46:24 +01:00
Filip Christiansen
551d09ac9a
fix(parse_query): make URL handling case insensitive (#115) 2025-01-08 23:14:49 +01:00
Joydeep Tripathy
9024c33376
docs: added docs for test files (#104) 2025-01-07 10:18:20 +01:00
Filip Christiansen
123f0ef0d7
Refactor: Replace os.path usage with pathlib.Path for improved maintainability (#106) 2025-01-07 10:13:22 +01:00
Filip Christiansen
d1b21d97ac
Refactor project structure, enhance logic, update configurations, and improve code quality (#85)
* Refactor project structure, enhance logic, update configurations, and improve code quality

Refactoring and Logic Improvements

- Refactored the `_scan_directory` function in `src/gitingest/ingest_from_query.py` by extracting loop logic into the new `_process_item` function, and further separating functionality into `_process_symlink` and `_process_file`
- Replaced multiple return statements with error raising and catching, introducing custom exceptions (`MaxFilesReachedError`, `MaxFileSizeReachedError`, `AlreadyVisitedError`) in the `_process_item` and `_scan_directory` functions
- Enhanced the logic in the `process_query` function in `src/process_query.py` for better flow and maintainability
- Improved the logic in `_generate_token_string` in `src/gitingest/ingest_from_query.py`
- Refined the `download_ingest` function in `src/routers/download.py` for better clarity and functionality

Exception Handling Enhancements

- Replaced broad `Exception` handling with specific `OSError` in the `_read_file_content` function in `src/gitingest/ingest_from_query.py`
- Refined exception handling throughout the codebase, including removing redundant try-except-raise blocks, e.g., in `clone_repo` function in `src/gitingest/clone.py`
- Added custom exceptions to `src/gitingest/exceptions.py`: `MaxFilesReachedError`, `MaxFileSizeReachedError`, and `AlreadyVisitedError`
- Included explicit re-raising of exceptions in various functions for improved error propagation
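
The raise-and-catch style described here can be illustrated with a toy scanner. The exception names come from the commit message; `ScanStats`, `process_item`, and `scan` are simplified stand-ins, not the project's `_process_item` or `_scan_directory`.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


class MaxFilesReachedError(Exception):
    """Raised when the file-count limit is hit."""


class AlreadyVisitedError(Exception):
    """Raised when a path has been seen before (e.g. via a symlink loop)."""


@dataclass
class ScanStats:
    total_files: int = 0
    seen: Set[str] = field(default_factory=set)


def process_item(path: str, stats: ScanStats, max_files: int) -> None:
    """Record one path, raising instead of returning status codes."""
    if path in stats.seen:
        raise AlreadyVisitedError(path)
    if stats.total_files >= max_files:
        raise MaxFilesReachedError(f"limit of {max_files} files reached")
    stats.seen.add(path)
    stats.total_files += 1


def scan(paths: Iterable[str], max_files: int = 3) -> List[str]:
    """Walk the paths, skipping repeats and stopping at the limit."""
    stats = ScanStats()
    kept: List[str] = []
    for path in paths:
        try:
            process_item(path, stats, max_files)
        except AlreadyVisitedError:
            continue  # skip this item, keep scanning
        except MaxFilesReachedError:
            break  # stop the whole scan
        kept.append(path)
    return kept


print(scan(["a", "b", "a", "c", "d"]))
```

Distinct exception types let the caller decide per condition whether to skip an item or stop the scan, which a shared boolean return value cannot express.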

Test Suite Refactoring

- Cleaned up and reorganized test files:
  - Moved tests from `src/gitingest/tests/` to `tests/`
  - Consolidated fixtures from `tests/test_ingest.py` into `tests/conftest.py`
  - Removed redundant content from `tests/conftest.py`
- Migrated configuration from `pytest.ini` to `pyproject.toml`, deleted `pytest.ini`, and updated `.dockerignore`

Documentation Improvements

- Added `darglint` for enforcing `numpy` docstring style in `.pre-commit-config.yaml` for `src/` files
- Updated docstrings throughout the codebase, including adding module docstrings where needed
- Updated `README.md`:
  - Added "GitHub stars" badge
  - Moved the "Discord" badge to its own line
  - Replaced occurrences of "Gitingest" with "GitIngest" for consistency and clarity

Linting and Code Quality

- Integrated `pylint` into `.pre-commit-config.yaml` for both `src/` and `tests/` directories
- Created `tests/.pylintrc` for linting configuration specific to test files

Code Clean-up

- Removed the redundant `src/__init__.py` file

Naming Conventions and Code Style

- Renamed `logSliderToSize` to `log_slider_to_size` in `src/server_utils.py` for consistency with Python's naming conventions
- Added explicit encoding specification in multiple instances of `open` throughout the code
2025-01-03 08:33:28 +01:00