CVE-2024-0243

EUVD-2024-0652

26.02.2024, 16:27

With the following crawler configuration:

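```python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader

url = "https://example.com"
# Crawl up to two levels deep, reducing each page to its visible text.
loader = RecursiveUrlLoader(
    url=url, max_depth=2, extractor=lambda x: Soup(x, "html.parser").text
)
docs = loader.load()
```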

An attacker who controls the contents of `https://example.com` could serve a malicious HTML page containing links such as "https://example.completely.different/my_file.html". The crawler would then download that file as well, even though `prevent_outside=True` is in effect (the configuration above does not override it) and is meant to keep the crawl from leaving the base URL.
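The hostname in the example link shares a string prefix with the base URL, which suggests how a check of this kind can be bypassed. The following is a minimal sketch of that pitfall, not the loader's actual code or the fix from the pull request: the helper names `is_inside_prefix` and `is_inside_host` are hypothetical, and the stricter variant assumes "inside" means the same scheme and the same network location.

```python
from urllib.parse import urlparse


def is_inside_prefix(base_url: str, link: str) -> bool:
    """Naive check (hypothetical): treat the base URL as a plain string prefix."""
    return link.startswith(base_url)


def is_inside_host(base_url: str, link: str) -> bool:
    """Stricter check (hypothetical): require identical scheme and netloc."""
    base, target = urlparse(base_url), urlparse(link)
    return (base.scheme, base.netloc) == (target.scheme, target.netloc)


base = "https://example.com"
outside = "https://example.completely.different/my_file.html"
inside = "https://example.com/docs/page.html"

# The outside link begins with the characters "https://example.com",
# so a prefix comparison wrongly accepts it.
print(is_inside_prefix(base, outside))  # True

# Comparing parsed components rejects it, because the host differs.
print(is_inside_host(base, outside))  # False
print(is_inside_host(base, inside))  # True
```

Comparing parsed URL components rather than raw strings keeps a look-alike hostname from passing as part of the original site. A real crawler would also need to decide how to treat redirects, ports, and subdomains, which this sketch leaves out.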

https://github.com/langchain-ai/langchain/blob/bf0b3cc0b5ade1fb95a5b1b6fa260e99064c2e22/libs/community/langchain_community/document_loaders/recursive_url_loader.py#L51-L51

Resolved in https://github.com/langchain-ai/langchain/pull/15559

SSRF

Provider	Type	Base Score	Atk. Vector	Atk. Complexity	Priv. Required	Vector
NIST	Primary	8.1 HIGH	NETWORK	HIGH	NONE	CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H
@huntr_ai	CNA	3.7 LOW	LOCAL	HIGH	HIGH	CVSS:3.0/AV:L/AC:H/PR:H/UI:R/S:C/C:L/I:L/A:N

EPSS Score

Percentile: 22%

Affected Products (NVD)

Vendor	Product	Version
langchain	langchain	𝑥 < 0.1.0

𝑥 = Vulnerable software versions

Known Exploits!

Common Weakness Enumeration

CWE-918 - Server-Side Request Forgery (SSRF)
The web server receives a URL or similar request from an upstream component and retrieves the contents of this URL, but it does not sufficiently ensure that the request is being sent to the expected destination.

References

https://github.com/langchain-ai/langchain/commit/bf0b3cc0b5ade1fb95a5b1b6fa260e99064c2e22

https://github.com/langchain-ai/langchain/pull/15559

https://huntr.com/bounties/370904e7-10ac-40a4-a8d4-e2d16e1ca861
