TL;DR: I identified a privacy exposure in publicly archived emails belonging to password reset help desks, universities, government departments and private groups. The issue is widespread. I have spent the past month notifying the affected universities and non-government groups.
I recently found a data leak through a Google search string that returned information which should probably not be public.
The results included information in private emails such as password reset emails, login details, non-disclosure agreements, attachments and emails marked “confidential” from government departments across the world.
I chose to help rectify the issue for the innocent people involved, primarily the university students who had sent their passwords to help desk teams with messages like:
“Hello service desk, my password which is usually XXXXX isn’t working. My username is”.
– Anonymous university student
This is serious because the passwords were mostly built from a surname, the last two digits of a date of birth and, of course, an exclamation mark for the illusion of safety. For example: Smith84!
This predictability in password design, combined with access to the student's email address, university, name and student number, creates a dangerous arsenal for black hat hackers.
The other side of this issue was highlighted by government department correspondence I found, again in a Google search result. The email was marked “confidential”, had PDF documents attached, and its body read:
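To make the predictability point concrete, here is a minimal Python sketch of a check that a help desk or password form could run to reject passwords of this shape. The regular expression and the function name `looks_predictable` are my own illustrative assumptions, not any university's actual policy, and the pattern only covers the single style described above.

```python
import re
from typing import Optional

# Hypothetical rule for the pattern described above: one capitalised word
# (e.g. a surname), exactly two digits (e.g. from a date of birth), then "!".
PREDICTABLE_SHAPE = re.compile(r"[A-Z][a-z]+\d{2}!")


def looks_predictable(password: str, surname: Optional[str] = None) -> bool:
    """Flag passwords shaped like Surname + two digits + '!'.

    If the account holder's surname is known, also flag any password
    that starts with it, ignoring case.
    """
    if PREDICTABLE_SHAPE.fullmatch(password):
        return True
    if surname and password.lower().startswith(surname.lower()):
        return True
    return False


for candidate in ("Smith84!", "correct horse battery staple", "smith2024"):
    print(f"{candidate!r}: {looks_predictable(candidate, surname='Smith')}")
```

With the surname supplied, both `Smith84!` and `smith2024` are flagged, while the long passphrase is not.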
“Attached file protected with the password for security reasons. Password is XXXXXX”
– Confidential government correspondence
But first, let me take you through the process of how I found it, what happened, and what all of this means in terms of safety online.
A note before we look at the details: I don’t for a minute consider myself a cybersecurity professional, and I too have fallen into the trap of poor personal security. However, I do use:
- A password manager
- Two-factor authentication (almost standard across platforms now)
- Encryption devices and physical tokens for things that make me think “I ought to be careful with that”
Of course, there’s a lot more you can do, but those are just some of the basics I follow.
Looking for something you shouldn’t find
As digital researchers, there are a lot of niche tools in our belt that can open up various pockets of the information world.
Think of it like a flashlight that lets you navigate dark caves, while others are still outside trying to work out how to find their way through the passageways in the dark.
Sometimes in our research we come across spiders, creepy-crawlies and other bugs in that cave. Should we warn anyone so they don’t get stung? Well, that’s a question of ethics that comes down to the individual.
In this (accidentally) open source case, we take a look at Google Dorking and the steps I took to notify the affected organisations.
What on earth is Google Dorking?
If you have never heard of it before, you might be asking yourself what Google Dorking is. I like to think of it as typing random things into Google and hoping for the best (and feeling like Elliot Alderson while you do it).
A more technical description would be using Google Search’s advanced operators to narrow down the flood of results and query the index built by Google’s crawlers more precisely.
There are a few basic operators in Google Dorking that you can use to request specific things from the search engine. Some of the more common ones, using the United States Space Force as an example, are:
- site:spaceforce.mil rocket (restricts results to that domain)
- inurl:rocket (returns results with that word in the URL)
- filetype:pdf (returns only PDF results)
They become more powerful in combination, such as looking only for PDFs on spaceforce.mil, or for URLs mentioning ‘rocket’ under the spaceforce.mil domain.
You might also have heard of custom search engines (CSEs), which run on the same principle but store those search strings so that the engine always returns results based on your filters.
Here is one I made in 2018 called GEOINTsearch. It returns results containing the Google Maps shortened link tag from Twitter, Reddit and 4chan, essentially surfacing geolocation and Google Maps references.
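Combining operators is just a matter of placing them side by side in one query. As a small illustration, here is a Python sketch that assembles them and URL-encodes the result. `build_dork` and `search_url` are hypothetical helper names, and the `https://www.google.com/search?q=` format is simply the address the Google search box produces, not an official API.

```python
from urllib.parse import urlencode


def build_dork(terms=(), site=None, inurl=None, filetype=None):
    """Combine Google advanced operators and plain terms into one query."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    parts.extend(terms)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL with the query safely URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Only PDFs on spaceforce.mil that mention "rocket"
q1 = build_dork(terms=["rocket"], site="spaceforce.mil", filetype="pdf")
# URLs containing "rocket" under the spaceforce.mil domain
q2 = build_dork(site="spaceforce.mil", inurl="rocket")
print(q1)
print(q2)
print(search_url(q1))
```

Here `q1` is `site:spaceforce.mil filetype:pdf rocket`, and `urlencode` turns the colons into `%3A` and the spaces into `+` in the final URL.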
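A custom search engine essentially remembers a filter and applies it to whatever you type. The Python sketch below imitates that idea locally rather than calling any Google service. `SavedSearch` is a hypothetical class, the `goo.gl/maps` phrase is my assumption about what the Google Maps shortened tag looks like, and the grouped `OR` syntax relies on Google’s search behaviour, which is not formally specified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedSearch:
    """A stored filter that is prepended to every query, loosely
    imitating how a custom search engine applies its saved settings."""

    sites: tuple = ()          # domains to restrict results to
    required_phrase: str = ""  # exact phrase every result must contain

    def query(self, user_terms: str = "") -> str:
        parts = []
        if self.sites:
            # Parentheses keep the OR confined to the site restrictions
            parts.append("(" + " OR ".join(f"site:{s}" for s in self.sites) + ")")
        if self.required_phrase:
            parts.append(f'"{self.required_phrase}"')
        if user_terms:
            parts.append(user_terms)
        return " ".join(parts)


# Hypothetical approximation of a GEOINTsearch-style filter
geoint = SavedSearch(
    sites=("twitter.com", "reddit.com", "4chan.org"),
    required_phrase="goo.gl/maps",
)
print(geoint.query("harbour"))
```

The stored part never changes; only the user’s own terms are appended, which is the convenience a CSE offers.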
For a more exhaustive list of Google dorks, here’s a piece written by my puppet friend Sector035.
‘Hey your passwords are public’
In this case I was diving into a Google search string consisting of inurl:something and site:wellIamnotdisclosingthat.
What I eventually came across were a number of university engineering departments that had their password reset forms open for the world to see.
With a single click on Google, I was faced with things like this:
And this result was not isolated.
One academic institution had at least 129 results. This concerned me because many of them were password request forms in which people said their usual password ‘XXX123!’ had not worked, and the replies contained a link to reset the password.
What was even more interesting was that I was able to peruse staff emails, private club newsletters, and correspondence between lecturers and students.
I was shocked. This material should not be available through a simple Google search.
The first thing I did was reach out to the engineering department of the university in question. I had their email addresses because, well, ehm, I had their emails. So I sent a friendly and polite notice.
The university responded straight away with a fix, which was great to see.
I also contacted the Mailman team, who were aware of the issue.
Don’t publicly archive emails that are meant to be confidential…
Unfortunately, the problem is not new. It has been around for quite some time and appears to be part of a much wider problem with Pipermail archives and the people who administer them.
Part of the solution is simply for the administrators of these archives not to publicly archive help desk lists where login information is regularly shared.
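For lists run on Mailman 2.1 (the version that ships Pipermail), these choices are list settings. Below is a minimal sketch of an input file for Mailman’s `bin/config_list` tool, which reads the file as Python and assigns each top-level variable to the list’s configuration. The list name `helpdesk` and the file name are hypothetical, the location of `bin/` depends on how Mailman was installed, and messages that were already public may remain in Google’s cache after the change.

```python
## Hypothetical input file for Mailman 2.1's config_list, applied with e.g.:
##   bin/config_list -i helpdesk_archive.py helpdesk
## config_list executes this file as Python and copies each top-level
## variable onto the list's configuration.

# Restrict any existing archive to list members.
# Legal values: 0 = public, 1 = private
archive_private = 1

# For help desk lists where credentials get shared, you may prefer to stop
# archiving new messages entirely. Legal values: 0 = no, 1 = yes
archive = 0
```

Running `bin/config_list -o - helpdesk` first prints the list’s current settings, which is a sensible way to confirm the values before and after.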
But wait there’s more!
It’s not just universities that have publicly archived Pipermail emails; it’s also government agencies and a couple of fan fiction groups.
I also notified the fan groups and some of the organisations whose exposures included visible NDAs.
There are still many other government departments, agencies and institutions with the same issue – needless to say the problem is widespread.
A search combining inurl:gov, inurl:pipermail and the term “confidential” shows just how widespread information security issues like this can be when they originate from a widely used platform.
Some of them have used the same administrative or default passwords over a number of years. Others use them for standard document access, as seen below.
Removing URLs from the Google Cache
If you or your organisation has been affected by this, you can either change the Pipermail configuration settings to make the archives private, or change your Apache configuration to restrict access to specific IP addresses or to require a login across the whole domain.
However, even once you remove public access, Google’s cache will often keep the pages available for some time.
Google have an administrative Removals Tool to help with this. You can find more information about it here. They also have a walkthrough video explaining the process below.
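Once the configuration is changed, it is worth confirming that an anonymous visitor (which is all a search crawler is) can no longer read the archive. Here is a minimal Python sketch using only the standard library. `anonymously_readable` is a hypothetical helper, and treating a redirect as “not readable” is an assumption, since a redirect usually points to a login page rather than the archive itself.

```python
import urllib.error
import urllib.request


def anonymously_readable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if `url` serves content to a request without credentials.

    A 2xx response at the same address means anyone, including a search
    crawler, can read the page. HTTP errors such as 401, 403 or 404 mean it
    is not publicly served. If the request was redirected (for example to a
    login page), we assume the archive itself is not exposed. Network
    failures (urllib.error.URLError) are left for the caller to handle.
    """
    try:
        with opener(url, timeout=timeout) as response:
            redirected = response.geturl() != url
            return 200 <= response.status < 300 and not redirected
    except urllib.error.HTTPError:
        return False
```

For example, `anonymously_readable("https://lists.example.edu/pipermail/helpdesk/")` (a placeholder address) should return `False` after the fix. Pass the archive URL exactly as the server serves it, including any trailing slash, otherwise a harmless redirect will be reported as not readable. The `opener` parameter exists only so the check can be tested without a network connection.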
This process is done through the Google Search Console as seen in the screenshot below.
A note on this report
As stated, I have made every possible attempt to help the universities involved rectify the issue, and have contacted Mailman security about it.
However, the fault appears to lie with whoever controls the security settings, which in many of these cases is the list administrators.
The purpose of this report is to stimulate conversation, research and development in the open source and information security community. It is in no way intended to harm the subjects or any business or person identified as a result of these findings.
The issue is really down to the user in this case, and the users are the administrators. They’re responsible for the information security of their users, whether they’re students, staff or public. They’re not doing their job if these archives are available online for everyone to see.