StarDict sends X11 clipboard to remote servers
StarDict is a GPLv3-licensed cross-platform dictionary application. It includes dictionaries for a number of languages, and has a rich plugin ecosystem. It also has a glaring security problem: while running on X11, using Debian's default configuration, it will send a user's text selections over unencrypted HTTP to two remote servers.
On August 4, Vincent Lefevre reported the problem to the oss-security mailing list and to Debian's bug tracker. He identified it while testing his setup before the upcoming Debian 13 ("trixie") release. Installing StarDict will also install the stardict-plugin package by default, because the former recommends the latter. The plugins package contains a set of commonly used StarDict plugins, including a plugin for YouDao, a Chinese search engine that supplies Chinese-to-English translations. The plugin also contacts a second online Chinese dictionary, dict.cn.
This would normally not be much cause for concern; of course a dictionary program will include code to talk to dictionary-providing web sites. But one of StarDict's features, which is also enabled by default, is its "scan" functionality: it will watch the user's text selections (i.e. text highlighted with the mouse), and automatically provide translations as a pop-up. Taken together, the two features result in any selected text being sent to both servers. This only occurs while StarDict is open, but the application is designed to be left open in the background in case the user needs a quick reference while reading.
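The interaction between the two defaults can be illustrated with a toy model. This is a minimal sketch, not StarDict's actual code: the `ScanDemo` class, the `on_selection_changed` hook, and both lookup functions are hypothetical names invented for illustration. It only shows why a selection watcher combined with a networked plugin forwards every selection.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ScanDemo:
    """Toy model of a 'scan' feature feeding dictionary plugins (hypothetical)."""
    scan_enabled: bool = True  # mirrors the article: scanning is on by default
    plugins: List[Callable[[str], None]] = field(default_factory=list)

    def on_selection_changed(self, text: str) -> None:
        # Called whenever the user highlights text in any application.
        if not self.scan_enabled:
            return
        # Every enabled plugin receives the selection, local or networked.
        for lookup in self.plugins:
            lookup(text)


sent_to_network: List[str] = []


def offline_lookup(text: str) -> None:
    # A local dictionary lookup: nothing leaves the machine.
    pass


def online_lookup(text: str) -> None:
    # Stand-in for a networked plugin: record what would be transmitted.
    sent_to_network.append(text)


app = ScanDemo(plugins=[offline_lookup, online_lookup])
app.on_selection_changed("hunter2")  # e.g. a password selected in another window
print(sent_to_network)
```

In this model, turning off either default (setting `scan_enabled` to `False` or omitting `online_lookup`) is enough to stop the leak, which matches the maintainer's point that both features can be disabled.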
StarDict on Wayland doesn't have this problem, because Wayland prevents applications from being able to capture text from other applications by default. That does mean that it breaks StarDict's scan feature, though.
Xiao Sheng Wen, the Debian package maintainer for StarDict, didn't see a problem with the behavior, noting that if a user doesn't want to use the scan functionality or the YouDao plugin, both can be disabled. Lefevre wasn't satisfied with that, saying:
But this is not the whole point. Features with privacy concerns should never be enabled by default (unless the feature is the only purpose of the package, and such a package would never be installed automatically — and even in such a case, there should be a big warning first).
In response, Xiao pointed out that the package description can be read by any user who chooses to install the software, and it does mention the scan feature. That said, I noted during my investigation that the description of stardict-plugin did not mention that the YouDao plugin uses an online service instead of an offline dictionary. Xiao suggested splitting the networked dictionary plugins into a separate package, but was "not sure whether it's very necessary to do so".
It is worth noting that the scan feature, while obviously a problem in this context, is one of the reasons that a user might choose to use StarDict over an alternative. Reading foreign-language media is often easier when words can be sought in a dictionary with as little fuss as possible. From that perspective, it makes sense that Xiao might not view the feature as problematic.
Any user who did read the description of the package, and who knew what the YouDao plugin would do, might nevertheless expect the resulting communication to at least be encrypted. But the plugin actually reaches out to its backend servers — dict.youdao.com and dict.cn — over unsecured HTTP. So, not only are these servers sent any text the user selects, but anyone who can view traffic anywhere along its path can see the same thing.
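To make the exposure concrete, the sketch below builds the bytes that a plain `http://` GET request puts on the wire. The host `dictionary.example`, the `/lookup?q=` path, and the `plaintext_request` helper are assumptions made for illustration; the article does not describe the plugin's real request format. The point is only that, without TLS, the selected text travels in readable form.

```python
from urllib.parse import quote, urlsplit


def plaintext_request(url: str) -> bytes:
    """Return the bytes an HTTP/1.1 client would send for a simple GET."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("only plain http:// requests are sent unencrypted")
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Request line and Host header go over the network exactly as written.
    return (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "\r\n"
    ).encode("ascii")


selection = "quarterly results draft"  # text the user happened to highlight
url = "http://dictionary.example/lookup?q=" + quote(selection)  # hypothetical endpoint
wire = plaintext_request(url)
print(wire.decode("ascii"))
```

Anyone able to observe the connection sees the request line, including the percent-encoded selection. With HTTPS the same request would be wrapped in TLS, hiding the path and query from on-path observers, although the server would still receive the text.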
This is not even the first time that StarDict has sent user selections to the internet; the same kind of problem was reported by Pavel Machek in 2009 and again by "niekt0" in 2015. The 2009 bug was solved by patching the application's default configuration to disable networked dictionaries. That appears to have worked for a time, but the YouDao plugin, which was added in 2016, does not respect the configuration option. The 2015 problem was not fixed until August 6 of this year (although the package was removed from Debian for unrelated reasons for a few months from 2020 to 2021). That fix just removed the stardict_dictdotcn.so plugin, which also sent translation requests to dict.cn and was later subsumed by the YouDao plugin, from the package. In fairness to Xiao, he was not the StarDict maintainer in 2015 — that was Andrew Lee — but Xiao knew about the 2015 bug since at least 2021, even if he didn't consider it a priority.
According to Debian's package popularity contest statistics, only 178 people have StarDict installed, down from around a thousand between 2009 and 2015. That obviously doesn't capture people who have configured their Debian system not to participate in the statistics collection, but it does suggest there were a number of people who might have been broadcasting their text selections to the internet for several years. Given that people copy and paste passwords from their password managers, or select the text of sensitive emails and documents during the course of editing, that should be a significant cause for concern.
Debian is a large distribution, containing tens of thousands of packages. Moreover, because of its commitment to stability, a decent fraction of these are older software with delayed or sporadic updates. The reality is that Linus's law ("given enough eyeballs, all bugs are shallow") only holds up if people are looking, and if, once they have looked, and have reported things, the people who have taken up maintenance of the software actually agree that there is a problem.
Part of the justification for moving to Wayland over X11 is to make security vulnerabilities relating to one application spying on another more difficult to introduce. That obviously has to be balanced against the cost of adapting to a new way of doing things, but it's not hard to see why so many people are eager to make Wayland work. Maybe, in the future, StarDict's default behavior would have had little to no impact. Or maybe StarDict would have started asking for special permissions to let it work on Wayland, and users would have accepted those defaults the same way they currently do.
Either way, the existence of serious security problems that can be found, diagnosed, reported, and still remain unfixed is cause for concern. Linux has long enjoyed a reputation for security; maintaining that reputation depends on the developers, maintainers, and users of open-source software caring enough to fix security problems when they arise.
Posted Aug 11, 2025 17:28 UTC (Mon)
by jadedctrl (subscriber, #178426)
[Link]
Posted Aug 11, 2025 17:37 UTC (Mon)
by shironeko (subscriber, #159952)
[Link] (1 responses)
Posted Aug 12, 2025 10:29 UTC (Tue)
by aragilar (subscriber, #122569)
[Link]
Posted Aug 11, 2025 18:05 UTC (Mon)
by wtarreau (subscriber, #51152)
[Link] (8 responses)
Posted Aug 11, 2025 18:31 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (5 responses)
Cheers,
Posted Aug 12, 2025 7:33 UTC (Tue)
by chris_se (subscriber, #99706)
[Link] (3 responses)
Nah, without any corroborating evidence, Hanlon's Razor applies. The feature itself will be considered useful by some people and my guess is that the authors don't care that much about privacy to begin with, but not because of malice, but more because of a different perspective.
Doesn't make the impact of this issue any better, but I also don't think this kind of rhetoric is helpful at all.
Posted Aug 12, 2025 8:37 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (2 responses)
Considering that the impact COULD be severe economic damage (remember, the US is unusual, in the rest of the world mere disclosure of economic secrets destroys any value), I don't think I'm being over the top.
If I (as an outsider) notice that document, I can use it to block any patent application, for example. Okay, espionage implies intent, but the end result is near as dammit identical.
Cheers,
Posted Aug 12, 2025 11:26 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
If you're referring to the "first to file" policy used in the rest of the world…the US joined that party in 2013[1].
[1] https://en.wikipedia.org/wiki/First_to_file_and_first_to_...
Posted Aug 12, 2025 14:40 UTC (Tue)
by Wol (subscriber, #4433)
[Link]
The rest of the world has used "filing must be the first publication" - if you accidentally let slip your filing documents today, and file tomorrow, that slip counts as prior art and will invalidate your application. It's happened! ("Publication" meaning "making available to the public", not necessarily our normal meaning of the word of "printing copies and selling them". Which includes accidentally losing a copy on the bus ...)
As far as RoW is concerned, "first to file" is just an accidental byproduct of the first publication rule - if my application pre-dates yours, any conflict is resolved in my favour not because I filed before you, but because I published before you. That's why there's discussion every now and then about a "journal of inventions" - the whole point of which is to prevent any future patent applications on those ideas because of "first to publish".
The problem is that if patent examiners mostly read only patent applications, they may well grant invalid patents because they are unaware of prior publications.
Cheers,
Posted Aug 17, 2025 14:29 UTC (Sun)
by rapiz (guest, #153529)
[Link]
Not saying we should do this in 2025. But at the time the software was written, it was a pretty much standard feature for this kind of software.
Posted Aug 12, 2025 7:02 UTC (Tue)
by danieldk (subscriber, #27876)
[Link] (1 responses)
I am glad that this was discovered (again).
Posted Aug 29, 2025 14:05 UTC (Fri)
by nim-nim (subscriber, #34454)
[Link]
Chinese logographs are the penultimate technical debt no one knows how to get rid of.
Posted Aug 11, 2025 20:26 UTC (Mon)
by Hobart (subscriber, #59974)
[Link] (2 responses)
Posted Aug 12, 2025 0:43 UTC (Tue)
by shironeko (subscriber, #159952)
[Link] (1 responses)
Posted Aug 12, 2025 0:54 UTC (Tue)
by shironeko (subscriber, #159952)
[Link]
Posted Aug 12, 2025 4:14 UTC (Tue)
by pabs (subscriber, #43278)
[Link]
https://wiki.debian.org/PrivacyIssues
As an example, Evolution/Balsa/Geary have privacy issues with HTML email:
https://www.emailprivacytester.com/badClients
Luckily there are things like opensnitch that can block some of these issues:
Posted Aug 12, 2025 4:18 UTC (Tue)
by pabs (subscriber, #43278)
[Link]
Posted Aug 12, 2025 5:07 UTC (Tue)
by gdt (subscriber, #6284)
[Link]
Posted Aug 12, 2025 8:17 UTC (Tue)
by metan (subscriber, #74107)
[Link]
[1] https://github.com/gfxprim/libstardict
Posted Aug 12, 2025 10:54 UTC (Tue)
by helge.bahmann (subscriber, #56804)
[Link] (18 responses)
Posted Aug 12, 2025 14:28 UTC (Tue)
by Wol (subscriber, #4433)
[Link]
And such consent must be given by the data SUBJECT, not the data PROCESSOR, so in this particular example, consent is probably not even possible ...
Cheers,
Posted Aug 13, 2025 1:53 UTC (Wed)
by linuxrocks123 (subscriber, #34648)
[Link] (16 responses)
Separately, you are putting forth an absurd interpretation of the GDPR. He's not collecting data on anyone; he just wrote a program that queries a website he neither owns nor controls. Think about this: does the GDPR require Firefox to get your explicit consent before letting you search Google? No, because that would be insane? Okay, then neither does this thing. It's just Firefox for your clipboard.
Posted Aug 13, 2025 3:20 UTC (Wed)
by helge.bahmann (subscriber, #56804)
[Link] (9 responses)
Posted Aug 13, 2025 3:35 UTC (Wed)
by linuxrocks123 (subscriber, #34648)
[Link] (8 responses)
Posted Aug 13, 2025 7:16 UTC (Wed)
by helge.bahmann (subscriber, #56804)
[Link] (7 responses)
Posted Aug 13, 2025 12:56 UTC (Wed)
by linuxrocks123 (subscriber, #34648)
[Link] (6 responses)
No, a browser can also browse the local filesystem and internal networks -- and you overestimate laypersons. (And you're not quoting anything, including especially https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679 so why the quotes?)
Also, sorry, but I don't have a lot of free time right now, so I'm just going to guess at both parts of how this would play out and let that be it for me (see below). However, it would be interesting to know what part of the actual statute or administrative regulations you're basing your opinions on. I'm not going to argue with you about your interpretation of whatever because I don't have time, but I will check back in case you want to quote something, because I would be interested to know whether the GDPR really is insane enough to outlaw certain kinds of open source software or whether you're basing your claims on a misinterpretation of the law, or on a gut feeling rather than the law.
---
You: "By installing and opening the browser, the user consents to using it to transfer data on the network."
(we go in circles 3-5 times)
corbet: something about we need to shut up or take it outside
Posted Aug 13, 2025 14:19 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (4 responses)
Me: But the user does NOT HAVE LAWFUL POWER to consent to sharing this particular data. They are NOT the data subject, who is the only person who can.
This is the exact same problem we have with TPMs, and DRMs (the bad version) where some people seem incapable of understanding that the USER and the OWNER do not have the same rights, and are not necessarily one and the same.
If you are not the data subject, you cannot consent to share someone else's data, and to do this in the EU is a serious breach of the GDPR. "but I didn't realise this program was sending everything to China" is likely to get *extremely* short shrift from the regulators - and the fines can be *extremely* painful.
Cheers,
Posted Aug 13, 2025 14:30 UTC (Wed)
by pizza (subscriber, #46)
[Link] (3 responses)
Huh?
"this particular data" is the user's. Why don't they have the lawful power to consent to sharing their own data?
Posted Aug 13, 2025 14:45 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (2 responses)
The whole premise of this sub-thread (if I have it right) is that by typing/copying data into Firefox's search bar, the user is consenting to it being sent out for processing elsewhere. NOWHERE has there been any discussion about whether the user actually has the power to consent, or whether the data is theirs, leading to the classic ASS-U-ME scenario.
Which is the exact same problem we have with StarDict, that started all this ...
Cheers,
Posted Aug 13, 2025 15:25 UTC (Wed)
by pizza (subscriber, #46)
[Link] (1 responses)
While your point about how the "user" and "subject" are not necessarily the same person is valid, you keep using "user" interchangeably to refer to both. Please be consistent!
Meanwhile, any potential liability for leaking third-party PII falls entirely on the [employer of the] meatbag sitting in the chair. Assuming they even had the right to "process" that PII to begin with, they are responsible for safeguarding that information, which extends to their choice of tools and properly configuring/maintaining them.
But in a more general sense, the "problem" you describe is as old as humanity itself; there is no way to prevent two people from gossiping about a third party. There is only punishment after the fact. Sometimes. Long after any damage has been done.
Posted Aug 13, 2025 16:55 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
And where exactly have *I* done that? Of course, if the user and the data subject are the same person, the terms *are* interchangeable. But if they're *not* the same person, then the user is the pbkac, and the location of the data subject is completely unknown. And the *user* "hat" does not have power to consent. Ever.
> Meanwhile, any potential liability for leaking third-party PII falls entirely on the [employer of the] meatbag sitting in the chair. Assuming they even had the right to "process" that PII to begin with, they are responsible for safeguarding that information, which extends to their choice of tools and properly configuring/maintaining them.
Notice in my example I did explicitly say I had the right to *process* the data.
> But in a more general sense, the "problem" you describe is as old as humanity itself; there is no way to prevent two people from gossiping about a third party. There is only punishment after the fact. Sometimes. Long after any damage has been done.
I have to agree. There are (and have been for a long time) laws on libel, slander, and eavesdropping. But Linux should not include programs whose default (and undeclared) activity falls blatantly within the eavesdropping category. It's basically the same as all the screams about Microsoft backing up all your activity so that any random 3rd-party who gains access to that data (that you quite possibly didn't even realise existed) can retrospectively view everything you did!
This basically is the point. Firefox is EXPECTED to share the stuff you type in. If the user is not the data subject, then there is potential for a serious GDPR breach (I expect it's like trespass - if you have good grounds for expecting the data subject to agree, then the risk is minimal. If you expect (or know) the data subject will refuse, then DON'T DO IT!) If the user isn't aware of the law, the response will be "ignorance is no excuse".
The problem with StarDict, is the user has no expectation that the data will be shared, so the blame could easily end up landing on the entity responsible for installing it. And that could end up being the distro itself.
Cheers,
Posted Aug 13, 2025 15:15 UTC (Wed)
by helge.bahmann (subscriber, #56804)
[Link]
Posted Aug 13, 2025 7:28 UTC (Wed)
by chris_se (subscriber, #99706)
[Link] (1 responses)
Well, at least Firefox gets your implied consent when searching Google, because there's the expectation that entering a search query in Firefox will contact an external service.
But just starting a program in the background causing all text I select with my mouse to be sent over the network without any encryption in that program's default (!) setting is something extremely far from any expectation I would have.
I'm not saying this feature isn't useful, and that the program can't include it - but it shouldn't be the default. It should either be disabled by default and the user should have to enable it explicitly in the settings (with a message text in the setting indicating clearly what the consequences are), or this choice should be shown during the first program startup, with an equally explicit explanation.
I agree though that the GDPR talk is a bit of a red herring. But I think that the program in its current form can easily run afoul of other laws. For example, in my country of Germany there's § 202a of our criminal code, which may or may not apply to this, depending on how a court interprets the law (I can see this going both ways).
Posted Aug 13, 2025 10:28 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
GDPR is *very* relevant.
Let's say I am (lawfully) processing your PII on my system. This thing sends YOUR data to China, without MY knowledge, or your consent. Cue all hell breaking loose when the Data Commissioner is informed ...
Cheers,
Posted Aug 13, 2025 8:55 UTC (Wed)
by excors (subscriber, #95769)
[Link] (1 responses)
Consent is not the only lawful basis for data processing under the GDPR - in particular it can be justified as "legitimate interest", when that processing is necessary for those interests and is balanced against the user's own interests. (And it must be explicitly documented in the privacy notice, so users can decide whether they agree with that balance before using your product.)
Sending search terms to Google seems like a clear example of legitimate interest, because web search is valuable for both the browser developer and the user, and transmitting the search terms is necessary (you can't do offline web search), and the privacy harm is minimised by using encrypted communication to a company that has its own GDPR-compliant privacy policy, etc.
Firefox's privacy notice (https://www.mozilla.org/en-US/privacy/firefox/) lists its legal basis for search data as "Legitimate interest in providing and improving search functionality, as well as a more personalized search experience and sponsored results", and also "Consent when you choose to opt into an enhanced search experience and share additional personal data" (I assume this is https://blog.mozilla.org/data/2025/05/07/data-and-firefox..., which sends more data to Mozilla and partners and is opt-in).
Posted Aug 13, 2025 17:11 UTC (Wed)
by helge.bahmann (subscriber, #56804)
[Link]
Posted Aug 13, 2025 10:34 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
But the GDPR DOES require me to get YOUR consent if I want to put *YOUR* PII into Firefox's search bar.
You are all missing the point that this thing is sharing stuff that the user has no legal right to share (because it's *someone else's* PII, or trade secrets, or yada yada). Without the user's knowledge, or consent, which the user is legally bound to refuse.
Cheers,
Posted Aug 13, 2025 12:46 UTC (Wed)
by rschroev (subscriber, #4164)
[Link]
He's sending data to a third party, that's even worse. Data which is almost certain to contain sensitive / confidential / personally identifiable information at some point.
Beauty is all around
memory
Seriously?
Upstream is pretty surreal.
It's... unusual. https://web.archive.org/web/20230704145152/http://www.huz...
Privacy issues in Linux distros
Distro privacy policies?
Security policy and secure-by-default
I wrote my own stardict compatible library and app
[2] https://github.com/gfxprim/gpdict
[3] http://gfxprim.ucw.cz/packages.html
GDPR violation
I presume it is an honest error by the maintainer, but when trying to justify external data processing it is important to also look at it from this POV in addition to moral obligations.
My comment was mostly about the discussion of "safe by default": that is not only the right thing to do, but "privacy by default" is the factual legal requirement. I find arguing against this really strange.
The distinction is that a clipboard is designed for the exchange of information between apps, and you are not consenting to anything any time you make a (sometimes accidental) selection in an application, and it is virtually impossible to operate a GUI without ever making use of a clipboard. If you ask any layperson whether the purpose of a clipboard is to send all information to a remote server, you will get a response to that, and legal interpretation naturally follows.
The law is there to protect privacy, and it is constructed very sensibly: "privacy by default, if in doubt ask user for consent, don't just force your opinion on privacy preferences on others". It is not helpful to try to contort the law in absurd ways in order to argue that you cannot be bothered to be bound by it. Privacy laws really are one of open source's best friends as everyone else keeps trampling on them.
Me: "By installing and opening a program whose documented behavior is to transfer the clipboard data over the network, ditto."
You: "But it's not obvious the program would be doing that."
Me: "It's provably not obvious to laypersons that using private browsing doesn't actually change the way browsers transfer data over the network."
You: something about common sense and network access being necessary to functionality
Me: something about offline mode not being the default and network access being necessary for non-degraded functionality of the clipboard program
If you look at how GDPR cases are regularly won against companies trying to "sneak" underhanded data processing into their products, this is really exactly it.