Architecture

Overview

package-scan uses a modular architecture with ecosystem-specific adapters. This design allows easy extension to new package ecosystems while maintaining a consistent API.

src/package_scan/
├── cli.py                  # Multi-ecosystem CLI
├── core/                   # Shared components
│   ├── models.py           # Finding dataclass
│   ├── threat_database.py  # Multi-ecosystem CSV loading
│   └── report_engine.py    # Unified reporting
└── adapters/               # Ecosystem-specific scanners
    ├── base.py             # EcosystemAdapter interface
    ├── npm_adapter.py      # JavaScript/Node.js
    ├── java_adapter.py     # Maven/Gradle
    └── python_adapter.py   # pip/poetry/pipenv/conda

Core Components

CLI (cli.py)

The main command-line interface, built with Click. Responsibilities:

Parse command-line arguments
Load threat database
Detect and instantiate appropriate adapters
Coordinate scanning across ecosystems
Aggregate and display results

Key Functions:

cli(): Main entry point
npm_scan_cli(): Legacy npm-only command

ThreatDatabase (core/threat_database.py)

Manages the threat database with multi-ecosystem support.

Responsibilities:

Load threats from CSV files
Support loading a single named threat (e.g., --threat sha1-Hulud)
Query compromised versions by ecosystem and package
Track which threats were loaded for reporting

Key Methods:

load_threats(): Load from threats directory or specific CSV
get_compromised_versions(ecosystem, package): Query vulnerabilities
get_all_packages(): Get all packages in database
get_ecosystems(): Get list of ecosystems
get_loaded_threats(): Get list of loaded threat names

Data Structure (ecosystem → package name → list of compromised versions):

{
    'npm': {
        'left-pad': ['1.3.0'],
        'lodash': ['4.17.20', '4.17.21']
    },
    'maven': {
        'org.springframework:spring-core': ['5.3.0']
    }
}
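A minimal sketch of how CSV rows could be folded into this nested structure and queried. It assumes a hypothetical CSV layout with `ecosystem`, `name`, and `version` columns (the real threat files may use different columns), and `MiniThreatDatabase` is an illustrative stand-in, not the project's class. It reads from a string so it stays self-contained; `load_threats()` would read files from the threats directory instead.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout; the real threat files may use different columns.
SAMPLE_CSV = """ecosystem,name,version
npm,left-pad,1.3.0
npm,lodash,4.17.20
npm,lodash,4.17.21
maven,org.springframework:spring-core,5.3.0
"""


class MiniThreatDatabase:
    """Illustrative stand-in for ThreatDatabase (hypothetical, not the real class)."""

    def __init__(self):
        # ecosystem -> package -> list of compromised versions
        self.threats = defaultdict(lambda: defaultdict(list))
        self.loaded_threats = []

    def load_csv_text(self, threat_name, text):
        for row in csv.DictReader(io.StringIO(text)):
            versions = self.threats[row["ecosystem"]][row["name"]]
            if row["version"] not in versions:
                versions.append(row["version"])
        # Remember which threats were loaded, for reporting.
        self.loaded_threats.append(threat_name)

    def get_compromised_versions(self, ecosystem, package):
        # .get() does not create entries, so lookups never mutate the database.
        return list(self.threats.get(ecosystem, {}).get(package, []))

    def get_ecosystems(self):
        return sorted(self.threats)
```

Using `.get()` rather than indexing keeps queries for unknown ecosystems or packages from silently adding empty entries to the `defaultdict`.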

ReportEngine (core/report_engine.py)

Aggregates findings from all adapters and generates reports.

Responsibilities:

Collect findings from multiple adapters
Generate summary statistics by ecosystem
Format console output
Export JSON reports

Key Methods:

add_findings(findings): Add findings from an adapter
print_report(): Display formatted console output
save_report(filepath): Export JSON report
_generate_summary(): Calculate statistics
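The aggregation role can be sketched as follows. `MiniReportEngine` is hypothetical, findings are plain dicts rather than `Finding` objects, and the summary keys (`total_findings`, `unique_packages`, `by_ecosystem`) are assumptions, not the tool's actual JSON schema.

```python
import json
from collections import Counter


class MiniReportEngine:
    """Illustrative stand-in for ReportEngine; findings are plain dicts here."""

    def __init__(self):
        self.findings = []

    def add_findings(self, findings):
        # Each adapter hands over its list; the engine just accumulates.
        self.findings.extend(findings)

    def _generate_summary(self):
        by_ecosystem = Counter(f["ecosystem"] for f in self.findings)
        unique_packages = {(f["ecosystem"], f["package_name"]) for f in self.findings}
        return {
            "total_findings": len(self.findings),
            "unique_packages": len(unique_packages),
            "by_ecosystem": dict(sorted(by_ecosystem.items())),
        }

    def to_json(self):
        # save_report(filepath) would write a string like this to disk.
        return json.dumps(
            {"summary": self._generate_summary(), "findings": self.findings},
            indent=2,
        )
```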

Finding Model (core/models.py)

Standardized data structure for vulnerability findings.

Fields:

ecosystem: npm, maven, or pip
finding_type: manifest, lockfile, or installed
file_path: Location where found
package_name: Name of compromised package
version: Compromised version
match_type: exact or range
declared_spec: Original version specification (optional)
metadata: Ecosystem-specific additional data (optional)

Methods:

to_dict(): Convert to dictionary for JSON export
from_legacy_npm_dict(): Build a Finding from the legacy npm-only dictionary format
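A sketch of the model as a dataclass. The field names come from the list above; the type hints, defaults, and the `to_dict()` body are assumptions. `from_legacy_npm_dict()` is omitted because the legacy format is not described here.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Finding:
    """Sketch of the Finding model; types and defaults are assumed."""

    ecosystem: str          # "npm", "maven", or "pip"
    finding_type: str       # "manifest", "lockfile", or "installed"
    file_path: str          # where the package was found
    package_name: str
    version: str            # the compromised version
    match_type: str         # "exact" or "range"
    declared_spec: Optional[str] = None  # e.g. "^1.3.0" from a manifest
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recursively copies fields into plain, JSON-friendly types.
        return asdict(self)
```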

Ecosystem Adapters

Base Adapter (adapters/base.py)

Abstract base class defining the adapter interface.

Required Methods:

_get_ecosystem_name(): Return ecosystem identifier
get_manifest_files(): List of manifest filenames
get_lockfile_names(): List of lockfile filenames
detect_projects(root_dir): Find project directories
scan_project(project_dir): Scan a single project

Provided Methods:

scan(root_dir): Scan directory tree (implemented in base)
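The split between required and provided methods is a template-method pattern: subclasses supply discovery and per-project parsing, and the base class drives the loop. A minimal sketch using `abc`, assuming (hypothetically) that adapters receive the threat database in their constructor:

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class EcosystemAdapter(ABC):
    """Sketch of the adapter interface; the real base class may differ."""

    def __init__(self, threat_db):
        # Assumption: the threat database is injected at construction time.
        self.threat_db = threat_db
        self.ecosystem = self._get_ecosystem_name()

    @abstractmethod
    def _get_ecosystem_name(self) -> str: ...

    @abstractmethod
    def get_manifest_files(self) -> List[str]: ...

    @abstractmethod
    def get_lockfile_names(self) -> List[str]: ...

    @abstractmethod
    def detect_projects(self, root_dir: Path) -> List[Path]: ...

    @abstractmethod
    def scan_project(self, project_dir: Path) -> list: ...

    def scan(self, root_dir: Path) -> list:
        # Provided method: discovery and per-project scanning are delegated
        # to the subclass; the base class only aggregates the results.
        findings = []
        for project_dir in self.detect_projects(Path(root_dir)):
            findings.extend(self.scan_project(project_dir))
        return findings
```

Because the required methods are abstract, instantiating a subclass that forgets one raises `TypeError` immediately instead of failing midway through a scan.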

NpmAdapter (adapters/npm_adapter.py)

Handles JavaScript/Node.js ecosystem.

Supported Files:

package.json (manifest)
package-lock.json (npm lockfile, v1/v2/v3)
yarn.lock (Yarn lockfile)
pnpm-lock.yaml (pnpm lockfile)
node_modules/ (installed packages)

Version Matching:

Uses semantic_version.NpmSpec for npm semver ranges:

^1.2.3 → >=1.2.3 <2.0.0
~1.2.3 → >=1.2.3 <1.3.0
>=1.0.0 <2.0.0 → range matching
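The adapter delegates this to `semantic_version.NpmSpec` (for example `NpmSpec('^1.2.3').match(Version('1.5.0'))`). To make the caret and tilde rules concrete without that dependency, here is a simplified pure-Python expansion. It only handles full `MAJOR.MINOR.PATCH` versions (no prerelease tags, no partial versions such as `^1.2`) and is an illustration, not a substitute for `NpmSpec`.

```python
from typing import Tuple

Triple = Tuple[int, int, int]


def parse(version: str) -> Triple:
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def expand(spec: str) -> Tuple[Triple, Triple]:
    """Return (inclusive lower bound, exclusive upper bound) for ^X.Y.Z or ~X.Y.Z."""
    op, base = spec[0], parse(spec[1:])
    major, minor, patch = base
    if op == "~":
        # Tilde: allow patch-level changes only.
        return base, (major, minor + 1, 0)
    if op == "^":
        # Caret: never change the left-most non-zero component.
        if major > 0:
            return base, (major + 1, 0, 0)
        if minor > 0:
            return base, (0, minor + 1, 0)
        return base, (0, 0, patch + 1)
    raise ValueError(f"unsupported spec: {spec!r}")


def satisfies(version: str, spec: str) -> bool:
    lower, upper = expand(spec)
    # Tuples compare element by element, matching semver precedence here.
    return lower <= parse(version) < upper
```

Note the `0.x` case: `^0.2.3` means `>=0.2.3 <0.3.0`, because npm treats a zero major version as unstable.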

Key Methods:

_scan_package_json(): Parse package.json
_scan_package_lock_json(): Parse package-lock.json
_scan_yarn_lock(): Parse yarn.lock
_scan_pnpm_lock_yaml(): Parse pnpm-lock.yaml
_scan_node_modules(): Scan installed packages

JavaAdapter (adapters/java_adapter.py)

Handles Maven and Gradle ecosystems.

Supported Files:

pom.xml (Maven manifest)
build.gradle (Gradle Groovy DSL)
build.gradle.kts (Gradle Kotlin DSL)
gradle.lockfile (Gradle lockfile)

Version Matching:

Maven ranges: [1.0,2.0), [1.0,), (,2.0)
Gradle dynamic: 1.2.+
Exact versions: 1.2.3
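A simplified sketch of these three matching modes. It assumes purely numeric, dot-separated versions: Maven's real ordering also handles qualifiers such as `-SNAPSHOT` or `.RELEASE`, and unions like `[1.0,2.0),[3.0,)` are not handled. The function names are hypothetical and are not the adapter's `_is_maven_range()` / `_get_matching_maven_versions()`.

```python
import re
from typing import Tuple


def vtuple(version: str) -> Tuple[int, ...]:
    # Simplification: purely numeric dot-separated versions only.
    return tuple(int(part) for part in version.split("."))


def vcompare(a: str, b: str) -> int:
    # Pad with zeros so that "2.0" and "2.0.0" compare equal.
    ta, tb = vtuple(a), vtuple(b)
    width = max(len(ta), len(tb))
    ta += (0,) * (width - len(ta))
    tb += (0,) * (width - len(tb))
    return (ta > tb) - (ta < tb)


def in_maven_range(version: str, spec: str) -> bool:
    """Check a version against one Maven range such as [1.0,2.0), [1.0,) or (,2.0)."""
    match = re.fullmatch(r"([\[(])\s*([^,]*?)\s*,\s*([^,]*?)\s*([\])])", spec)
    if not match:
        # "[1.2.3]" pins one version; a bare "1.2.3" is treated as exact too.
        return vcompare(version, spec.strip("[]")) == 0
    open_br, low, high, close_br = match.groups()
    if low:
        cmp = vcompare(version, low)
        if cmp < 0 or (cmp == 0 and open_br == "("):
            return False
    if high:
        cmp = vcompare(version, high)
        if cmp > 0 or (cmp == 0 and close_br == ")"):
            return False
    return True


def matches_gradle_dynamic(version: str, spec: str) -> bool:
    """Gradle dynamic versions: "1.2.+" matches any version starting with "1.2."."""
    if spec.endswith("+"):
        return version.startswith(spec[:-1])
    return version == spec
```

Square brackets are inclusive and parentheses exclusive; an empty side of the comma means the range is unbounded in that direction.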

Package Naming:

Maven uses groupId:artifactId format: org.springframework:spring-core

Key Methods:

_scan_pom_xml(): Parse Maven POM with XML
_scan_gradle_build(): Parse Gradle files with regex
_is_maven_range(): Detect version ranges
_get_matching_maven_versions(): Match version ranges
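Parsing a POM "with XML" mainly means dealing with the Maven namespace. A sketch using the standard library's `ElementTree`, assuming the POM declares the usual `http://maven.apache.org/POM/4.0.0` namespace. `pom_dependencies` is a hypothetical helper: it visits every `<dependency>` element (including any under `dependencyManagement`) and does not resolve `${property}` placeholders or versions inherited from a parent POM.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-core</artifactId>
      <version>5.3.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
    </dependency>
  </dependencies>
</project>"""


def pom_dependencies(pom_text: str) -> List[Tuple[str, str]]:
    """Return (groupId:artifactId, version) pairs declared in a POM."""
    root = ET.fromstring(pom_text)
    results = []
    for dep in root.iter(f"{{{POM_NS['m']}}}dependency"):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS).strip()
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS).strip()
        version = dep.findtext("m:version", default="", namespaces=POM_NS).strip()
        if group and artifact:
            # Version may be empty (managed elsewhere) or a ${property}.
            results.append((f"{group}:{artifact}", version))
    return results
```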

PythonAdapter (adapters/python_adapter.py)

Handles Python package ecosystem.

Supported Files:

requirements.txt, requirements-*.txt
pyproject.toml (Poetry)
poetry.lock
Pipfile (pipenv)
Pipfile.lock
environment.yml (conda)

Version Matching:

PEP 440 specifiers:

==1.2.3 → exact match
>=1.0,<2.0 → range match
~=1.2.0 → compatible release: >=1.2.0,<1.3.0
Poetry caret: ^1.2.3 → >=1.2.3,<2.0.0
Poetry tilde: ~1.2.3 → >=1.2.3,<1.3.0
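Poetry's caret and tilde operators are not PEP 440, so they need translating before a standard matcher can use them. A sketch of that translation, assuming a full `X.Y.Z` version after the operator; `poetry_to_pep440` is a hypothetical name, not the adapter's API. Specifiers that are already PEP 440 pass through unchanged.

```python
def poetry_to_pep440(spec: str) -> str:
    """Translate a Poetry caret/tilde constraint into a PEP 440 specifier string."""
    spec = spec.strip()
    if spec.startswith("^"):
        major, minor, patch = (int(p) for p in spec[1:].split("."))
        # Caret: the left-most non-zero component must not change.
        if major > 0:
            upper = f"{major + 1}.0.0"
        elif minor > 0:
            upper = f"0.{minor + 1}.0"
        else:
            upper = f"0.0.{patch + 1}"
        return f">={spec[1:]},<{upper}"
    if spec.startswith("~") and not spec.startswith("~="):
        # Poetry tilde with a full version: patch-level changes only.
        major, minor, _patch = (int(p) for p in spec[1:].split("."))
        return f">={spec[1:]},<{major}.{minor + 1}.0"
    # Already PEP 440 (==, >=, ~=, ...): pass through unchanged.
    return spec
```

One option for the actual comparison is the `packaging` library, e.g. `"1.4.0" in SpecifierSet(poetry_to_pep440("^1.2.3"))`; whether the adapter uses it is not stated here.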

Package Naming:

Names are normalized to lowercase, since PyPI treats package names case-insensitively

Key Methods:

_scan_requirements_txt(): Parse requirements files
_scan_pyproject_toml(): Parse Poetry manifests
_scan_poetry_lock(): Parse Poetry lockfiles
_scan_pipfile(): Parse Pipfile
_scan_pipfile_lock(): Parse Pipfile.lock
_scan_conda_environment(): Parse conda environments
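As an illustration of the kind of work `_scan_requirements_txt()` does, here is a simplified line parser. It is a hypothetical helper, not the adapter's code: it drops comments, option lines (`-r`, `-e`, `--index-url`), environment markers, and URL/VCS requirements, and lowercases names as described above.

```python
import re
from typing import List, Tuple

# Package name, optional [extras], then the rest of the line as the specifier.
REQ_LINE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*(.*)$")


def parse_requirements(text: str) -> List[Tuple[str, str]]:
    """Return (lowercased_name, specifier) pairs from requirements.txt content."""
    results = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blanks and options such as -r, -e, --index-url
        if "://" in line:
            continue  # URL/VCS requirements are out of scope for this sketch
        line = line.split(";", 1)[0].strip()  # drop environment markers
        match = REQ_LINE.match(line)
        if match:
            name, _extras, spec = match.groups()
            results.append((name.lower(), spec.replace(" ", "")))
    return results
```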

Data Flow

1. CLI Parsing
   - User runs package-scan --dir /path --threat sha1-Hulud
   - CLI parses arguments and options
2. Threat Loading
   - ThreatDatabase loads the specified threat(s)
   - CSV files are parsed into an in-memory structure
3. Adapter Detection
   - CLI determines which ecosystems to scan
   - The appropriate adapters are instantiated
4. Project Discovery
   - Each adapter walks the directory tree
   - Relevant manifests and lockfiles are identified
5. Scanning
   - Adapters parse files and extract packages
   - Versions are matched against the threat database
   - Matches become Finding objects
6. Aggregation
   - ReportEngine collects findings from all adapters
   - Summary statistics are calculated
7. Output
   - Console: formatted, grouped output
   - JSON: structured report with metadata
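The adapter-detection, scanning, and aggregation stages of this flow reduce to a short loop. A sketch with a hypothetical `run_scan` function; it assumes adapters are constructed with the threat database and expose `scan(root_dir)`, and it returns the combined findings that a ReportEngine would then receive.

```python
from typing import Callable, Dict, Iterable, List


def run_scan(
    root_dir: str,
    threat_db,
    adapter_registry: Dict[str, Callable],
    ecosystems: Iterable[str],
) -> List[dict]:
    """Instantiate one adapter per requested ecosystem and collect all findings."""
    findings: List[dict] = []
    for name in ecosystems:
        adapter_cls = adapter_registry.get(name)
        if adapter_cls is None:
            continue  # unknown ecosystem name: nothing to scan
        # Assumption: adapters take the threat database in their constructor.
        adapter = adapter_cls(threat_db)
        findings.extend(adapter.scan(root_dir))
    return findings
```

Only the registry and the adapter interface are referenced, so the loop itself does not change when an ecosystem is added.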

Adding New Ecosystems

To add support for a new ecosystem (e.g., Ruby/gem):

Create Adapter

Create src/package_scan/adapters/ruby_adapter.py

Inherit from Base

from .base import EcosystemAdapter

class RubyAdapter(EcosystemAdapter):
    def _get_ecosystem_name(self):
        return 'gem'

Implement Required Methods
- get_manifest_files() → ['Gemfile']
- get_lockfile_names() → ['Gemfile.lock']
- detect_projects() → Find directories with Gemfile
- scan_project() → Parse and scan Ruby files

Register Adapter

In adapters/__init__.py:

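# Existing imports, shown for context
from .npm_adapter import NpmAdapter
from .java_adapter import JavaAdapter
from .python_adapter import PythonAdapter
from .ruby_adapter import RubyAdapter  # New import

ADAPTER_REGISTRY = {
    'npm': NpmAdapter,
    'maven': JavaAdapter,
    'pip': PythonAdapter,
    'gem': RubyAdapter,  # Add new adapter
}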

Add Test Fixtures

Create examples/test-ruby/ with sample files

Update Documentation

Add the ecosystem to the documentation and README

No changes to the core components (core/) are needed.

Design Principles

Separation of Concerns: Core logic separate from ecosystem-specific code
Open/Closed Principle: Open for extension (new adapters), closed for modification (core)
Single Responsibility: Each component has one clear purpose
Dependency Inversion: Core depends on adapter interface, not implementations
Graceful Degradation: Optional dependencies handled with warnings

Performance Considerations

Lazy Loading: Only load adapters for detected ecosystems
Skip Patterns: Automatically skip node_modules, .git, venv, build directories
Streaming: Parse large files line-by-line where possible
No Execution: Pure static analysis, never executes package managers
Single Pass: Each file parsed once, results cached
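Skip patterns are cheapest when applied during traversal rather than after it. A sketch using `os.walk`, where pruning `dirnames` in place stops the walk from descending into those directories at all. `find_files` and `SKIP_DIRS` are hypothetical names; the skip set mirrors the directories listed above. An adapter that deliberately inspects installed packages (such as `_scan_node_modules()`) would presumably read that directory separately rather than through discovery.

```python
import os
from typing import Iterable, Iterator

# Directories named in the skip-pattern note above.
SKIP_DIRS = {"node_modules", ".git", "venv", "build"}


def find_files(root_dir: str, filenames: Iterable[str]) -> Iterator[str]:
    """Yield paths of matching files, never descending into skipped directories."""
    wanted = set(filenames)
    for dirpath, dirnames, files in os.walk(root_dir):
        # Slice assignment mutates the list os.walk uses for its next steps.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in files:
            if name in wanted:
                yield os.path.join(dirpath, name)
```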

Security Considerations

No Command Execution: Only reads files; never runs package managers or other commands
Path Validation: Stays within specified scan directory
No Network Access: No external API calls or downloads
Read-Only: Never modifies scanned files or directories
Static Analysis: Threat detection via file parsing only
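The path-validation guarantee can be enforced by resolving both paths before comparing them, so that `..` segments and symlinks cannot escape the scan root. A sketch of one common approach; `is_within` is a hypothetical helper, and the tool's actual check is not described here.

```python
from pathlib import Path


def is_within(root_dir: str, candidate: str) -> bool:
    """True if candidate, after resolving symlinks and '..', lies inside root_dir."""
    root = Path(root_dir).resolve()
    target = Path(candidate).resolve()
    # Path.is_relative_to() needs Python 3.9+; checking parents works everywhere.
    return target == root or root in target.parents
```

A naive string-prefix check would wrongly accept `/srv/repo-other` for a root of `/srv/repo`; comparing path components via `parents` avoids that.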