Architecture
Overview
package-scan uses a modular architecture with ecosystem-specific adapters. This design allows easy extension to new package ecosystems while maintaining a consistent API.
src/package_scan/
├── cli.py # Multi-ecosystem CLI
├── core/ # Shared components
│ ├── models.py # Finding dataclass
│ ├── threat_database.py # Multi-ecosystem CSV loading
│ └── report_engine.py # Unified reporting
└── adapters/ # Ecosystem-specific scanners
├── base.py # EcosystemAdapter interface
├── npm_adapter.py # JavaScript/Node.js
├── java_adapter.py # Maven/Gradle
└── python_adapter.py # pip/poetry/pipenv/conda
Core Components
CLI (cli.py)
The main command-line interface built with Click. Responsibilities:
Parse command-line arguments
Load threat database
Detect and instantiate appropriate adapters
Coordinate scanning across ecosystems
Aggregate and display results
Key Functions:
cli(): Main entry pointnpm_scan_cli(): Legacy npm-only command
ThreatDatabase (core/threat_database.py)
Manages the threat database with multi-ecosystem support.
Responsibilities:
Load threats from CSV files
Support threat-specific loading (e.g., –threat sha1-Hulud)
Query compromised versions by ecosystem and package
Track which threats were loaded for reporting
Key Methods:
load_threats(): Load from threats directory or specific CSVget_compromised_versions(ecosystem, package): Query vulnerabilitiesget_all_packages(): Get all packages in databaseget_ecosystems(): Get list of ecosystemsget_loaded_threats(): Get list of loaded threat names
Data Structure:
{
'npm': {
'left-pad': ['1.3.0'],
'lodash': ['4.17.20', '4.17.21']
},
'maven': {
'org.springframework:spring-core': ['5.3.0']
}
}
ReportEngine (core/report_engine.py)
Aggregates findings from all adapters and generates reports.
Responsibilities:
Collect findings from multiple adapters
Generate summary statistics by ecosystem
Format console output
Export JSON reports
Key Methods:
add_findings(findings): Add findings from an adapterprint_report(): Display formatted console outputsave_report(filepath): Export JSON report_generate_summary(): Calculate statistics
Finding Model (core/models.py)
Standardized data structure for vulnerability findings.
Fields:
ecosystem: npm, maven, or pipfinding_type: manifest, lockfile, or installedfile_path: Location where foundpackage_name: Name of compromised packageversion: Compromised versionmatch_type: exact or rangedeclared_spec: Original version specification (optional)metadata: Ecosystem-specific additional data (optional)
Methods:
to_dict(): Convert to dictionary for JSON exportfrom_legacy_npm_dict(): Convert legacy format
Ecosystem Adapters
Base Adapter (adapters/base.py)
Abstract base class defining the adapter interface.
Required Methods:
_get_ecosystem_name(): Return ecosystem identifierget_manifest_files(): List of manifest filenamesget_lockfile_names(): List of lockfile filenamesdetect_projects(root_dir): Find project directoriesscan_project(project_dir): Scan a single project
Provided Methods:
scan(root_dir): Scan directory tree (implemented in base)
NpmAdapter (adapters/npm_adapter.py)
Handles JavaScript/Node.js ecosystem.
Supported Files:
package.json (manifest)
package-lock.json (npm lockfile, v1/v2/v3)
yarn.lock (Yarn lockfile)
pnpm-lock.yaml (pnpm lockfile)
node_modules/ (installed packages)
Version Matching:
Uses semantic_version.NpmSpec for npm semver ranges:
^1.2.3→ >=1.2.3 <2.0.0~1.2.3→ >=1.2.3 <1.3.0>=1.0.0 <2.0.0→ range matching
Key Methods:
_scan_package_json(): Parse package.json_scan_package_lock_json(): Parse package-lock.json_scan_yarn_lock(): Parse yarn.lock_scan_pnpm_lock_yaml(): Parse pnpm-lock.yaml_scan_node_modules(): Scan installed packages
JavaAdapter (adapters/java_adapter.py)
Handles Maven and Gradle ecosystems.
Supported Files:
pom.xml (Maven manifest)
build.gradle (Gradle Groovy DSL)
build.gradle.kts (Gradle Kotlin DSL)
gradle.lockfile (Gradle lockfile)
Version Matching:
Maven ranges:
[1.0,2.0),[1.0,),(,2.0)Gradle dynamic:
1.2.+Exact versions:
1.2.3
Package Naming:
Maven uses groupId:artifactId format:
org.springframework:spring-core
Key Methods:
_scan_pom_xml(): Parse Maven POM with XML_scan_gradle_build(): Parse Gradle files with regex_is_maven_range(): Detect version ranges_get_matching_maven_versions(): Match version ranges
PythonAdapter (adapters/python_adapter.py)
Handles Python package ecosystem.
Supported Files:
requirements.txt, requirements-*.txt
pyproject.toml (Poetry)
poetry.lock
Pipfile (pipenv)
Pipfile.lock
environment.yml (conda)
Version Matching:
PEP 440 specifiers:
==1.2.3→ exact match>=1.0,<2.0→ range match~=1.2.0→ compatible releasePoetry caret:
^1.2.3→ >=1.2.3,<2.0.0Poetry tilde:
~1.2.3→ >=1.2.3,<1.3.0
Package Naming:
Normalized to lowercase (PyPI convention)
Key Methods:
_scan_requirements_txt(): Parse requirements files_scan_pyproject_toml(): Parse Poetry manifests_scan_poetry_lock(): Parse Poetry lockfiles_scan_pipfile(): Parse Pipfile_scan_pipfile_lock(): Parse Pipfile.lock_scan_conda_environment(): Parse conda environments
Data Flow
CLI Parsing
User runs
package-scan --dir /path --threat sha1-HuludCLI parses arguments and options
Threat Loading
ThreatDatabase loads specified threat(s)
CSV files parsed into in-memory structure
Adapter Detection
CLI determines which ecosystems to scan
Instantiates appropriate adapters
Project Discovery
Each adapter scans directory tree
Identifies relevant manifest/lockfiles
Scanning
Adapters parse files and extract packages
Version matching against threat database
Generate Finding objects
Aggregation
ReportEngine collects findings from all adapters
Calculates summary statistics
Output
Console: Formatted, grouped output
JSON: Structured report with metadata
Adding New Ecosystems
To add support for a new ecosystem (e.g., Ruby/gem):
Create Adapter
Create
src/package_scan/adapters/ruby_adapter.pyInherit from Base
from .base import EcosystemAdapter class RubyAdapter(EcosystemAdapter): def _get_ecosystem_name(self): return 'gem'
Implement Required Methods
get_manifest_files()→ [‘Gemfile’]get_lockfile_names()→ [‘Gemfile.lock’]detect_projects()→ Find directories with Gemfilescan_project()→ Parse and scan Ruby files
Register Adapter
In
adapters/__init__.py:from .ruby_adapter import RubyAdapter ADAPTER_REGISTRY = { 'npm': NpmAdapter, 'maven': JavaAdapter, 'pip': PythonAdapter, 'gem': RubyAdapter, # Add new adapter }
Add Test Fixtures
Create
examples/test-ruby/with sample filesUpdate Documentation
Add ecosystem to documentation and README
No changes to core components are needed!
Design Principles
- Separation of Concerns
Core logic separate from ecosystem-specific code
- Open/Closed Principle
Open for extension (new adapters), closed for modification (core)
- Single Responsibility
Each component has one clear purpose
- Dependency Inversion
Core depends on adapter interface, not implementations
- Graceful Degradation
Optional dependencies handled with warnings
Performance Considerations
- Lazy Loading
Only load adapters for detected ecosystems
- Skip Patterns
Automatically skip node_modules, .git, venv, build directories
- Streaming
Parse large files line-by-line where possible
- No Execution
Pure static analysis, never executes package managers
- Single Pass
Each file parsed once, results cached
Security Considerations
- Sandboxed Execution
Only reads files, never executes commands
- Path Validation
Stays within specified scan directory
- No Network Access
No external API calls or downloads
- Read-Only
Never modifies scanned files or directories
- Static Analysis
Threat detection via file parsing only