CVE-2024-5206

EUVD-2024-0161

06.06.2024, 19:16

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer. It affects versions up to and including 1.4.1.post1 and was fixed in version 1.5.0. A fitted vectorizer unexpectedly retains every token present in the training data: tokens discarded during fitting are kept in the `stop_words_` attribute, rather than only the subset of tokens required for the TF-IDF technique to function being stored. As a result, `stop_words_` can leak sensitive information, because it may contain tokens that were meant to be discarded and not stored, such as passwords or keys. The impact depends on the nature of the data processed by the vectorizer.
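For deployments that cannot upgrade right away, one possible mitigation is to drop the `stop_words_` attribute from a fitted vectorizer before it is serialized or shared. The sketch below is illustrative only: `scrub_stop_words` is a hypothetical helper name, not part of scikit-learn, and the demo uses a plain stand-in object rather than a real `TfidfVectorizer` so that it runs without scikit-learn installed.

```python
from types import SimpleNamespace


def scrub_stop_words(vectorizer):
    """Delete the `stop_words_` attribute from a fitted vectorizer, if present.

    Hypothetical helper (not a scikit-learn API). Returns the number of
    tokens that were dropped, or 0 when the attribute is absent or empty.
    """
    tokens = getattr(vectorizer, "stop_words_", None)
    if hasattr(vectorizer, "stop_words_"):
        delattr(vectorizer, "stop_words_")
    return len(tokens) if tokens else 0


# Stand-in for a fitted vectorizer (assumption: a real one exposes the
# same two attribute names after fitting on an affected version).
stand_in = SimpleNamespace(
    vocabulary_={"hello": 0, "world": 1},
    stop_words_={"hunter2", "s3cr3t-key"},
)

dropped = scrub_stop_words(stand_in)
print(f"dropped {dropped} discarded tokens")
print("stop_words_ still present:", hasattr(stand_in, "stop_words_"))
```

With a real fitted vectorizer, the helper would be called once after fitting and before `pickle.dump` or `joblib.dump`. The learned vocabulary is left untouched, so transforming new documents should be unaffected. Upgrading to 1.5.0 or later remains the actual fix.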

Provider	Type	Base Score	Atk. Vector	Atk. Complexity	Priv. Required	Vector
NIST	Primary	4.7 MEDIUM	LOCAL	HIGH	LOW	CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N

EPSS Score

Percentile: 8%

Affected Products (NVD)

Vendor	Product	Version
scikit-learn	scikit-learn	𝑥 < 1.5.0

𝑥 = Vulnerable software versions
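The version range above can be checked programmatically. The following is a minimal sketch using only the standard library; `release_tuple` and `is_affected` are hypothetical helper names, and the parser looks only at the leading numeric release segment, so suffixes such as `.post1` or `rc1` are ignored (an assumption that may misclassify pre-releases of 1.5.0).

```python
import re

# First fixed release, taken from the affected-products table above.
FIXED_RELEASE = (1, 5, 0)


def release_tuple(version):
    """Return the leading numeric release segment as a tuple of ints.

    For example, '1.4.1.post1' yields (1, 4, 1). Suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Pad short versions so that '1.5' compares equal to '1.5.0'.
    parts += [0] * (len(FIXED_RELEASE) - len(parts))
    return tuple(parts)


def is_affected(version):
    """True if the release segment is lower than the first fixed release."""
    return release_tuple(version) < FIXED_RELEASE


for candidate in ("0.24.2", "1.4.1.post1", "1.5.0", "1.10.0"):
    print(candidate, "affected" if is_affected(candidate) else "not affected")
```

In practice the installed version string would come from `sklearn.__version__`. A full PEP 440 comparison, for example with the third-party `packaging` library, would be more robust than this hand-rolled parser.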

Early Detection

Affected products identified ahead of NVD analysis through intelligence sources.

Vendor	Product	Version	Source
scikit-learn	scikit-learn	𝑥 < 1.5.0	ADP

Debian Releases

Debian Product: scikit-learn

Codename	Status
bookworm	unimportant
bullseye	unimportant
forky	unimportant
sid	unimportant
trixie	unimportant

Ubuntu Releases

Ubuntu Product: scikit-learn

Codename	Status
bionic	needs-triage
focal	needs-triage
jammy	needs-triage
mantic	ignored
noble	needs-triage
oracular	ignored
plucky	needs-triage
questing	needs-triage
trusty	needs-triage
xenial	needs-triage

References

https://github.com/scikit-learn/scikit-learn/commit/70ca21f106b603b611da73012c9ade7cd8e438b8

https://huntr.com/bounties/14bc0917-a85b-4106-a170-d09d5191517c
