CVE-2024-5206

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer, specifically in versions up to and including 1.4.1.post1, which was fixed in version 1.5.0. The vulnerability arises from the unexpected storage of all tokens present in the training data within the `stop_words_` attribute, rather than only storing the subset of tokens required for the TF-IDF technique to function. This behavior leads to the potential leakage of sensitive information, as the `stop_words_` attribute could contain tokens that were meant to be discarded and not stored, such as passwords or keys. The impact of this vulnerability varies based on the nature of the data being processed by the vectorizer.
ProviderTypeBase ScoreAtk. VectorAtk. ComplexityPriv. RequiredVector
NISTNIST
4.7 MEDIUM
LOCAL
HIGH
LOW
CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N
@huntr_aiCNA
4.7 MEDIUM
LOCAL
HIGH
LOW
CVSS:3.0/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N
CISA-ADPADP
---
---
CVEADP
---
---
Base Score
CVSS 3.x
EPSS Score
Percentile: 6%
VendorProductVersion
scikit-learnscikit-learn
𝑥
< 1.5.0
𝑥
= Vulnerable software versions
Debian logo
Debian Releases
Debian Product
Codename
scikit-learn
bullseye
unimportant
bookworm
unimportant
sid
unimportant
trixie
unimportant
Ubuntu logo
Ubuntu Releases
Ubuntu Product
Codename
scikit-learn
plucky
needs-triage
oracular
needs-triage
noble
needs-triage
mantic
ignored
jammy
needs-triage
focal
needs-triage
bionic
needs-triage
xenial
needs-triage
trusty
needs-triage