E2E Testing
End-to-end tests run a real LLM agent connected to the Falcon MCP Server to validate full tool-call workflows from natural language prompts.
Configuration
Copy the development example file:

    cp .env.dev.example .env

Then configure the E2E testing variables:

    # Required
    FALCON_CLIENT_ID=your-client-id
    FALCON_CLIENT_SECRET=your-client-secret

    # Optional (defaults to US-1)
    FALCON_BASE_URL=https://api.crowdstrike.com

    # API key for OpenAI or compatible API
    OPENAI_API_KEY=your-api-key

    # Optional: Custom base URL (for VPN-only or custom endpoints)
    OPENAI_BASE_URL=https://your-custom-llm-endpoint.com/v1

    # Optional: Comma-separated list of models to test against
    MODELS_TO_TEST=example-model-1,example-model-2

Running E2E Tests
E2E tests require the --run-e2e flag. The first command below runs the whole suite; the second runs a single test:

    uv run pytest --run-e2e tests/e2e/
    uv run pytest --run-e2e tests/e2e/test_mcp_server.py::TestFalconMCPServerE2E::test_get_top_3_high_severity_detections

Verbosity Levels
The -s flag disables pytest's output capture so that test output is shown; -v and -vv add progressively more detail:

    uv run pytest --run-e2e -s tests/e2e/
    uv run pytest --run-e2e -v -s tests/e2e/
    uv run pytest --run-e2e -vv -s tests/e2e/

Retry Logic
Each test runs multiple times against different models and passes if a threshold percentage of runs succeeds. Defaults in tests/e2e/utils/base_e2e_test.py:

    DEFAULT_MODELS_TO_TEST = ["gpt-4.1-mini", "gpt-4o-mini"]
    DEFAULT_RUNS_PER_TEST = 2
    DEFAULT_SUCCESS_THRESHOLD = 0.7  # 70% of runs must pass

Override with environment variables:
| Variable | Description |
|---|---|
| MODELS_TO_TEST | Comma-separated model list |
| RUNS_PER_TEST | Number of runs per test |
| SUCCESS_THRESHOLD | Minimum pass rate (0.0–1.0) |
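
The override and pass rule described above can be sketched as follows. This is a minimal illustration, not the code in tests/e2e/utils/base_e2e_test.py: the helper names load_settings and meets_threshold are hypothetical, and the sketch assumes that runs are pooled across all models before the pass rate is compared with the threshold.

```python
import os

# Defaults quoted from this section; the real values live in
# tests/e2e/utils/base_e2e_test.py.
DEFAULT_MODELS_TO_TEST = ["gpt-4.1-mini", "gpt-4o-mini"]
DEFAULT_RUNS_PER_TEST = 2
DEFAULT_SUCCESS_THRESHOLD = 0.7


def load_settings(env=None):
    """Hypothetical helper: read the three overrides, falling back to defaults."""
    env = os.environ if env is None else env
    raw_models = env.get("MODELS_TO_TEST", "")
    # Split the comma-separated list and drop empty entries.
    models = [m.strip() for m in raw_models.split(",") if m.strip()]
    if not models:
        models = list(DEFAULT_MODELS_TO_TEST)
    runs = int(env.get("RUNS_PER_TEST", DEFAULT_RUNS_PER_TEST))
    threshold = float(env.get("SUCCESS_THRESHOLD", DEFAULT_SUCCESS_THRESHOLD))
    return models, runs, threshold


def meets_threshold(passed, total, threshold):
    """Assumed rule: the pooled pass rate must reach the threshold."""
    if total == 0:
        return False
    return passed / total >= threshold
```

Under these assumptions the defaults give 2 models x 2 runs = 4 runs per test, so 3 passing runs (75%) clear the 0.7 threshold while 2 passing runs (50%) do not.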
Troubleshooting
Not seeing any output?

    # CORRECT: shows detailed output
    uv run pytest --run-e2e -v -s tests/e2e/

    # INCORRECT: no output visible
    uv run pytest --run-e2e -v tests/e2e/

Using a custom LLM endpoint:

    OPENAI_BASE_URL=https://your-endpoint.com/v1 uv run pytest --run-e2e -s tests/e2e/

Diagnosing failures: Use -vv -s to see complete prompt/response content and step-by-step agent execution.
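
Before digging into verbose output, it may be worth ruling out an incomplete .env file. The sketch below is a hypothetical preflight check and not part of the repository: parse_env_text and missing_vars are illustrative names, the parser only handles plain KEY=VALUE lines (no export prefix, no inline comments), and treating OPENAI_API_KEY as required is an assumption, since the Configuration section marks only the two FALCON_* credentials as required.

```python
from pathlib import Path

# Assumption: the FALCON_* credentials are documented as required;
# including OPENAI_API_KEY here is a guess for illustration.
REQUIRED_VARS = ("FALCON_CLIENT_ID", "FALCON_CLIENT_SECRET", "OPENAI_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in the current directory")
    else:
        missing = missing_vars(parse_env_text(env_path.read_text()))
        print("Missing or empty:", ", ".join(missing) if missing else "none")
```

Run from the directory that holds .env, the script lists any required names that are missing or empty. It does not detect placeholder values such as your-client-id that were never replaced.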