Search engines are a gateway to the Internet. They enable users to find what they want and to take advantage of what the Internet has to offer. Without them, the Internet might still be a place mainly for universities, businesses and the government. Since the early 2010s, the Tor network has been getting more attention in the mainstream world. With this newfound attention, many new users will want to navigate the network for themselves. The Ahmia project gives these users a familiar way to get around the Tor network: the Ahmia search engine indexes Tor hidden services and makes them searchable.
For GSoC, I plan to work on automating blacklisting, adding advanced search options, making changes to the crawler and adding more documentation to Ahmia's web pages. First, I will work my way through the codebase to learn how the search engine works. During this time, I will also start writing a wiki page and extra documentation for developers. Then, I will change the Ahmia crawler so that it connects directly to Tor's SOCKS proxy, phasing out polipo, which is no longer maintained. Next, I will implement an algorithm that automates blacklisting child abuse websites. Afterwards, I will add a form that lets owners of hidden services submit their websites to the search engine. Then, I will add advanced search options to make searching for hidden services easier. Optional goals for the summer include implementing some of the improvements that last year's Ahmia mentee, Ismael Riahi, suggested in this blog post, such as indexing by language and speeding up the crawler by ignoring unchanged content.
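Two of the crawler changes above can be sketched in a few lines of Python: talking to Tor's SOCKS port directly instead of going through polipo, and skipping pages whose content has not changed since the last crawl. This is a minimal sketch, not Ahmia's code. It assumes a local Tor client listening on 127.0.0.1:9050 (Tor's default `SocksPort`) and produces a proxy mapping in the `proxies` format of the `requests` library; Ahmia's actual crawler may use a different HTTP stack. The helper names `tor_proxies`, `content_fingerprint` and `needs_reindex` are hypothetical.

```python
import hashlib

# Assumption: a local Tor client exposes its SOCKS port here (Tor's default SocksPort).
TOR_SOCKS_HOST = "127.0.0.1"
TOR_SOCKS_PORT = 9050


def tor_proxies(host=TOR_SOCKS_HOST, port=TOR_SOCKS_PORT):
    """Hypothetical helper: proxy mapping in the format used by `requests`.

    The "socks5h" scheme makes the proxy resolve hostnames, which .onion
    addresses require because ordinary DNS cannot resolve them.
    """
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


def content_fingerprint(body: bytes) -> str:
    """SHA-256 digest of a page body, used to detect unchanged content."""
    return hashlib.sha256(body).hexdigest()


def needs_reindex(url, body, seen):
    """Return True if the page is new or changed; `seen` maps url -> last digest."""
    digest = content_fingerprint(body)
    if seen.get(url) == digest:
        return False  # unchanged since the last crawl: skip re-indexing
    seen[url] = digest
    return True
```

With `requests` installed together with its SOCKS extra (`pip install requests[socks]`), a fetch would look like `requests.get(url, proxies=tor_proxies())`. Hashing the raw body is the simplest change detector; pages that embed timestamps or rotating content would always look changed, so a real implementation might hash only the extracted text.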
The algorithm takes advantage of the fact that a lot of hidden services link to child abuse websites.
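The proposal states only the premise of the algorithm. One hypothetical way to exploit a link graph like this is co-citation: if several hidden services that already link to a known abuse site also link to the same unknown site, that site becomes a candidate for review. The sketch below assumes the crawler can provide a mapping from each source domain to the domains it links to, and that a set of already-blacklisted domains exists; the function name `flag_candidates` and the `min_sources` threshold are illustrative choices, not part of Ahmia.

```python
from collections import defaultdict


def flag_candidates(links, blacklist, min_sources=2):
    """Hypothetical co-citation heuristic for suggesting domains to review.

    links: mapping of source domain -> iterable of linked domains.
    blacklist: domains already known to host abuse material.
    A domain becomes a candidate when at least `min_sources` distinct sources
    that link to a blacklisted domain also link to it.
    """
    links = {src: set(targets) for src, targets in links.items()}
    blacklist = set(blacklist)
    # Sources that link to at least one already-blacklisted domain.
    suspicious = {src for src, targets in links.items() if targets & blacklist}
    cited_by = defaultdict(set)
    for src in suspicious:
        # Count every other, not-yet-blacklisted domain this source links to.
        for target in links[src] - blacklist - {src}:
            cited_by[target].add(src)
    return sorted(t for t, srcs in cited_by.items() if len(srcs) >= min_sources)
```

Requiring several independent sources keeps a single link from triggering a flag; the threshold trades recall for fewer false positives, and flagged domains would still need confirmation before being added to the blacklist.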
These are the initial search options that will be added to the search engine. All these options can be used in combination with one another.
Example: The search term "Tor Project" should return all pages containing the exact phrase Tor Project.
Example: The search term -hidden should return all pages that don't contain the word hidden.
Example: The search term spam AND eggs should return pages that contain both spam and eggs.
Example: The search term spam OR eggs should return pages that contain spam, eggs or both.
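The intended semantics of these four operators, and of combining them, can be pinned down with a small in-memory matcher. This is an illustrative sketch only: a production version would more likely translate the operators into the search backend's own query syntax. It assumes that operators are written in upper case (AND, OR), that OR binds more loosely than AND, that adjacent terms are implicitly ANDed, and that parentheses are not supported. All names here are hypothetical.

```python
import re

# Quoted phrases (optionally negated) or any other whitespace-separated token.
TOKEN_RE = re.compile(r'-?"[^"]*"|\S+')


def _words(text):
    """Lower-case word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())


def _term_matcher(token):
    """Predicate over a document's word list for a single search term.

    A bare word is treated as a one-word phrase, a quoted string as an exact
    phrase, and a leading '-' negates the term.
    """
    negate = token.startswith("-") and len(token) > 1
    body = token[1:] if negate else token
    phrase = _words(body.strip('"'))
    n = len(phrase)

    def contains(doc_words):
        # True if `phrase` occurs as a contiguous run of words in the document.
        return any(doc_words[i:i + n] == phrase for i in range(len(doc_words) - n + 1))

    if negate:
        return lambda doc_words: not contains(doc_words)
    return contains


def compile_query(query):
    """Compile a query string into a predicate on document text."""
    groups = [[]]  # OR-separated groups; every term inside a group is ANDed
    for token in TOKEN_RE.findall(query):
        if token == "OR":
            groups.append([])
        elif token != "AND":  # AND is implied between adjacent terms
            groups[-1].append(_term_matcher(token))
    groups = [g for g in groups if g]

    def predicate(text):
        doc_words = _words(text)
        return any(all(match(doc_words) for match in g) for g in groups)

    return predicate
```

For example, `compile_query('spam OR eggs')` accepts a page that mentions only eggs, while `compile_query('spam AND eggs')` does not, and `compile_query('spam -eggs')` combines inclusion and exclusion in one query.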
Period | Objective |
---|---|
Community Bonding Period | |
May 28 - June 11 | |
June 12 - June 18 | |
June 18 - June 26 | |
June 27 - July 17 | |
July 18 - July 24 | |
July 25 - August 8 | |
August 9 - August 20 | |
August 21 - August 29 | |
As we all know, plans often don't work out as intended. These are the potential bottlenecks that I may encounter during GSoC.
I have written a few web applications with Django, which can be seen on my GitHub page.
Since elementary school, I have had a strong interest in all things technology, including security and artificial intelligence. In the 6th grade, I wanted to become a hacker, so I read up on the tools hackers used and the knowledge they needed to be successful. During this period, I discovered the Tor Project, and in particular the Tor Browser. At the time it was slow, and I did not understand its significance. Since then, the Tor Project has come a long way, and I have watched it appear more and more on people's radar. After Edward Snowden disclosed the NSA's surveillance programs, I began using the Tor Browser more regularly and developed a deeper interest in security. I realized that I had been taking for granted how open and free the Internet is, and I understood why we need organizations like the Tor Project and the EFF to defend what makes the Internet arguably the most important human achievement in recent history. Another motivation is to learn more about open source software development and to start contributing to it. I use a lot of free software and have the skills to contribute, yet I have not contributed to any of the projects I use. So I am taking this opportunity to give back to a project that has improved my life. I would also love to learn more about hidden services, how they work and how to create my own. I find the ideas and technology behind Tor fascinating and would like to understand them better. Lastly, I have a lot of experience programming in Python and a year of experience with Django, so I believe I will be able to make a substantial contribution to the project during the summer.
I have not contributed to free software development before. However, I have tried to start such a project, called WikiSummary. I also have code on my GitHub, released under the MIT License, that I wrote for learning purposes. Moreover, while building the Ahmia development environment, I submitted two pull requests to the Ahmia-Crawler and Ahmia-Index repositories, and I also started writing a wiki for Ahmia (available here).
Because I plan to take evening courses for my degree during the summer, I have started early: I have been going through the codebase, submitting pull requests for small issues, writing an initial draft of the wiki and talking to Juha Nurmi. Other than these courses, I will be able to work on GSoC full-time. Summer courses at my university run in two semesters: the first from May to June and the second from July to August. The exam period for the first semester is June 26 to 30, and for the second it is August 15 to 18. The exact dates of my exams have not been posted yet; when they are, I will inform my mentor. In the first semester, I will have lectures on Mondays and Wednesdays from 6 pm to 9 pm. In the second semester, I will have lectures Monday to Thursday from 6 pm to 9 pm. If extra time is needed, I will use weekends for any remaining work. I have also set aside part of my GSoC schedule for catching up.
To keep everyone informed of my progress, I will post weekly updates to the mailing list and use GitHub to track commits. If there are any questions, I will be available on IRC and by e-mail.
I do not have any previous GSoC experiences. I am also not applying to any other organizations.
My project will require ongoing maintenance. To keep up with the tactics that websites with inappropriate content come up with, I will need to stay up to date on ways to improve the blacklisting algorithm. As for the wiki and support pages, I will update them whenever I get feedback. As we gather more user feedback, we can determine which advanced search options are the most useful and the easiest to integrate into Ahmia. I will also continue working on how we can gather usage data anonymously and report these numbers in an approachable way. I am also considering becoming a mentor for GSoC 2018.
I would like to thank Juha Nurmi for helping with this application and for helping develop my ideas in accordance with the Ahmia search engine's goals. He has also been very helpful in setting up the development environment for working with Ahmia. I look forward to working with him throughout the summer. Thanks also to Ismael Riahi, who worked on the Ahmia search engine during GSoC last year, for giving me advice and ideas for this year's GSoC. I would also like to thank everyone on the #tor-dev and #tor channels on IRC for helping develop my ideas, answering my questions and making me more knowledgeable about Tor and its challenges. I strongly believe that a user-friendly search engine on the Tor network would encourage more people to use Tor, so I would like to contribute to projects that make Tor more accessible to the layperson. Moreover, I believe that mentorship is very important and that GSoC is a great initiative for bringing more people like me into open source projects.