E2E Testing

End-to-end tests run a real LLM agent connected to the Falcon MCP Server to validate full tool-call workflows from natural language prompts.

Copy the development example file:

cp .env.dev.example .env

Then configure the E2E testing variables:

# Required
FALCON_CLIENT_ID=your-client-id
FALCON_CLIENT_SECRET=your-client-secret
# Optional (defaults to US-1)
FALCON_BASE_URL=https://api.crowdstrike.com
# API key for OpenAI or compatible API
OPENAI_API_KEY=your-api-key
# Optional: Custom base URL (for VPN-only or custom endpoints)
OPENAI_BASE_URL=https://your-custom-llm-endpoint.com/v1
# Optional: Comma-separated list of models to test against
MODELS_TO_TEST=example-model-1,example-model-2
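
A quick pre-flight check can catch a missing credential before any test starts. The following is a minimal sketch, not code from the repository: the helper name `missing_required_vars` is hypothetical, and treating `OPENAI_API_KEY` as required is an assumption based on the agent needing an LLM credential. It reads the process environment, so the values from `.env` must already be loaded or exported.

```python
import os

# Assumption: these names mirror the variables listed above. OPENAI_API_KEY is
# treated as required here because the agent needs an LLM credential.
REQUIRED_VARS = ("FALCON_CLIENT_ID", "FALCON_CLIENT_SECRET", "OPENAI_API_KEY")


def missing_required_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; pass a dict as `env` to check
    something other than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_required_vars()` with no arguments checks the current environment; an empty list means every variable in `REQUIRED_VARS` has a non-blank value.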

E2E tests require the --run-e2e flag:

# Run all E2E tests
uv run pytest --run-e2e tests/e2e/
# Run a specific test
uv run pytest --run-e2e tests/e2e/test_mcp_server.py::TestFalconMCPServerE2E::test_get_top_3_high_severity_detections
# Standard output
uv run pytest --run-e2e -s tests/e2e/
# Verbose: tool calls and responses
uv run pytest --run-e2e -v -s tests/e2e/
# Extra verbose: agent thought process, all events
uv run pytest --run-e2e -vv -s tests/e2e/

Each test runs several times against the configured models and passes only if the fraction of successful runs meets a threshold. The defaults are defined in tests/e2e/utils/base_e2e_test.py:

DEFAULT_MODELS_TO_TEST = ["gpt-4.1-mini", "gpt-4o-mini"]
DEFAULT_RUNS_PER_TEST = 2
DEFAULT_SUCCESS_THRESHOLD = 0.7 # 70% of runs must pass
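
The pass/fail rule implied by these defaults can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the function name `meets_threshold` is hypothetical, the comparison is assumed to be "greater than or equal", and the source does not say whether runs are counted per model or across all models.

```python
def meets_threshold(results, threshold=0.7):
    """Return True if the fraction of passing runs is at least `threshold`.

    `results` is a list of booleans, one per run (hypothetical shape).
    An empty list counts as a failure, since nothing was verified.
    """
    if not results:
        return False
    pass_rate = sum(1 for ok in results if ok) / len(results)
    return pass_rate >= threshold
```

Under these assumptions, a single failure in 2 runs gives a 50% pass rate and fails the 0.7 threshold, while 3 passes in 4 runs (75%) succeeds.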

Override with environment variables:

Variable            Description
MODELS_TO_TEST      Comma-separated model list
RUNS_PER_TEST       Number of runs per test
SUCCESS_THRESHOLD   Minimum pass rate (0.0–1.0)
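
One way such overrides could be resolved against the defaults is sketched below. The function `load_e2e_settings` and its validation rules are hypothetical; the actual parsing in tests/e2e/utils/base_e2e_test.py may differ.

```python
import os

# Defaults copied from the values documented above.
DEFAULT_MODELS_TO_TEST = ["gpt-4.1-mini", "gpt-4o-mini"]
DEFAULT_RUNS_PER_TEST = 2
DEFAULT_SUCCESS_THRESHOLD = 0.7


def load_e2e_settings(env=None):
    """Return (models, runs, threshold), falling back to the defaults.

    Hypothetical helper: an unset or blank MODELS_TO_TEST is assumed to
    mean "use the default models".
    """
    env = os.environ if env is None else env

    # Split the comma-separated list and drop empty entries and whitespace.
    raw_models = env.get("MODELS_TO_TEST", "")
    models = [m.strip() for m in raw_models.split(",") if m.strip()]
    if not models:
        models = list(DEFAULT_MODELS_TO_TEST)

    runs = int(env.get("RUNS_PER_TEST", DEFAULT_RUNS_PER_TEST))
    threshold = float(env.get("SUCCESS_THRESHOLD", DEFAULT_SUCCESS_THRESHOLD))

    if runs < 1:
        raise ValueError("RUNS_PER_TEST must be at least 1")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("SUCCESS_THRESHOLD must be between 0.0 and 1.0")
    return models, runs, threshold
```

Validating the range up front means a typo such as `SUCCESS_THRESHOLD=70` fails immediately instead of making every test unpassable.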

Not seeing any output? pytest captures stdout and stderr by default, so pass -s to disable capture:

# CORRECT: shows detailed output
uv run pytest --run-e2e -v -s tests/e2e/
# INCORRECT: no output visible
uv run pytest --run-e2e -v tests/e2e/

Using a custom LLM endpoint:

OPENAI_BASE_URL=https://your-endpoint.com/v1 uv run pytest --run-e2e -s tests/e2e/

Diagnosing failures: Use -vv -s to see complete prompt/response content and step-by-step agent execution.