GSoC 2017 Proposal for Tor Project: Ahmia Search Engine

Mikerah Quintyne-Collins

Second-year Mathematics and Statistics undergraduate student at the University of Toronto, St. George.

IRC: kiki101

Email: mikerah14@gmail.com

GitHub: Mikerah

Time Zone: Eastern Daylight Time (UTC-4)

Abstract

Search engines are a gateway to the Internet. They enable users to easily navigate the Internet for what they want and take advantage of what it has to offer. Without them, the Internet would still be a place for universities, businesses and the government. Since the early 2010s, the Tor network has been getting more attention in the mainstream world. With this newfound attention, many new users will want to navigate the network for themselves. The Ahmia project gives these users a familiar way to get around the Tor network: the Ahmia search engine indexes Tor hidden services and makes them searchable.

For GSoC, I plan to work on automating blacklisting, adding advanced search options, making changes to the crawler and adding more documentation to Ahmia's web pages. First, I will work my way through the codebase to learn how the search engine works. During this time, I will also start writing a wiki page and extra documentation for developers. Then, I will change the Ahmia crawler so that it connects directly to Tor's SOCKS proxy and phase out the use of Polipo, since it is no longer maintained. Then, I will implement an algorithm that automates the blacklisting of child abuse websites. Afterwards, I will add a form that lets owners of hidden services submit their websites to the search engine. Next, I will add advanced search options to make searching for hidden services easier. Optional goals for the summer include implementing some of the things last year's Ahmia mentee, Ismael Riahi, suggested in this blog post, such as indexing by language and improving the speed of the crawler by ignoring unchanged content.
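As an illustration of the optional goal of ignoring unchanged content, here is a minimal sketch in Python. It is not taken from the Ahmia crawler: the `ChangeTracker` class and `content_fingerprint` function are hypothetical names, and keeping the fingerprints in an in-memory dictionary is an assumption made only to keep the example short.

```python
import hashlib


def content_fingerprint(html):
    """Return a SHA-256 hex digest of the page body, ignoring surrounding whitespace."""
    return hashlib.sha256(html.strip().encode("utf-8")).hexdigest()


class ChangeTracker:
    """Hypothetical helper that remembers the last fingerprint seen for each URL."""

    def __init__(self):
        # Maps URL -> fingerprint from the previous crawl (in memory only).
        self._seen = {}

    def has_changed(self, url, html):
        """Return True if the page is new or differs from the last crawl."""
        fingerprint = content_fingerprint(html)
        if self._seen.get(url) == fingerprint:
            return False
        self._seen[url] = fingerprint
        return True
```

A crawler could call `has_changed` before re-indexing a page and skip the indexing step when it returns `False`. A real implementation would likely need to persist the fingerprints between crawls and ignore parts of a page that change on every request.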

Tasks

Optional Tasks

Timeline

Period Objective
Community Bonding Period
  • Review Ahmia's code base
  • Write a rough draft of the wiki
  • Start replacing Polipo
  • Upgrade outdated requirements
  • Fix broken links on the main website
  • Fix bugs
May 28 - June 11
  • Implement a proof of concept (PoC) for identifying and blacklisting child abuse websites
  • Test the PoC to identify areas for improvement
  • Implement a form for new hidden services and have the crawler index these new sites
June 12 - June 18
  • Integrate the blacklisting algorithm and the form into the production code and the official website
June 18 - June 26
  • Prepare and submit Phase 1 Evaluation
June 27 - July 17
  • Start implementing advanced search options and the corresponding support pages
  • Integrate them into the official website
July 18 - July 24
  • Prepare and submit Phase 2 Evaluation
July 25 - August 8
  • Complete wiki pages started during the Community Bonding period
August 9 - August 20
  • Catch up on any outstanding work and improve the quality of my code
August 21 - August 29
  • Complete final evaluations
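
To make the proof-of-concept step in the timeline more concrete, here is a minimal sketch of one possible starting point: flagging a page when the share of blacklisted terms in its text passes a threshold. This is an assumption about how a first PoC could look, not the final algorithm. The function names, the placeholder terms and the threshold value are all hypothetical, and a real term list would be curated together with the mentor.

```python
import re

# Placeholder terms only (hypothetical); the real list would be curated separately.
BLACKLIST_TERMS = {"banned-term-a", "banned-term-b"}


def blacklist_score(text, terms=BLACKLIST_TERMS):
    """Return the fraction of tokens in `text` that are blacklisted terms."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in terms)
    return hits / len(tokens)


def should_blacklist(text, threshold=0.1):
    """Flag the page when the score reaches the (assumed) threshold."""
    return blacklist_score(text) >= threshold
```

Testing the PoC would then mean measuring false positives and false negatives on known pages and adjusting the term list and threshold accordingly.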

Deliverables

Mid term Evaluation

Final term Evaluation

Risks

Plans do not always work out as intended. These are the potential bottlenecks that I may encounter during GSoC.

Code Samples

I have written a few web applications with Django, which can be seen on my GitHub page.

Motivation

Since my elementary school days, I have had a strong interest in all things technology, including security and artificial intelligence. In the 6th grade, I wanted to become a hacker, so I read up on the tools hackers use and the knowledge they need in order to be successful. During this period, I discovered the Tor project and in particular the Tor browser. It was slow and I was ignorant of its significance. Since then, the Tor project has come a long way and I have seen it appear more and more on people's radar. After Edward Snowden disclosed the NSA surveillance, I began using the Tor browser more regularly and developed a deeper interest in security. I realized that I was taking for granted how open and free the Internet is, and I understood why we need organizations like the Tor project and the EFF to defend what makes the Internet probably the most important human achievement in recent history.

Another motivation is to learn more about open source software development and start contributing to it. I personally use a lot of free software and have the skills to contribute. However, I haven't contributed anything to any of the products I use. So, I am taking this opportunity to give back to a project that has improved my life. I would also love to learn more about hidden services: how they work and how to create my own. I find the ideas and technology behind Tor fascinating and would like to learn more about them.

Lastly, I have a lot of experience programming in Python and a year of experience with Django. Hence, I believe that I will be able to contribute substantially to the project during the summer months.

Experiences with free software development

I have not contributed to free software development before. However, I have tried to start such a project, called WikiSummary. I also have code posted on my GitHub, released under the MIT License, that I used for learning purposes. Moreover, while building the Ahmia development environment, I submitted two pull requests to Ahmia-Crawler and Ahmia-Index, and I also started to write a wiki for Ahmia (available here).

Availability

Since I plan on taking evening courses for my degree during the summer, I have started a bit early by going through the codebase, submitting some pull requests for small issues, writing an initial draft of the wiki and talking to Juha Nurmi. Other than these courses, I will be able to work on GSoC full-time. Summer courses at my university are split into two semesters: the first from May to June and the second from July to August. The exam period for the first semester is from June 26 to 30 and for the second is from August 15 to 18. The exact dates of my exams have not been posted yet. When I get them, I will inform my mentor. For the first semester, I will have lectures on Mondays and Wednesdays from 6 pm to 9 pm. For the second semester, I will have lectures from Monday to Thursday from 6 pm to 9 pm. If any extra time is needed, I will use the weekends. I have also dedicated part of my time during GSoC to catching up.

Keeping in touch throughout the summer

In order to let everyone know about my progress, I will be posting weekly to the mailing list and will be using GitHub to keep track of commits. If there are any questions, I will be available on IRC and by email.

GSoC Experience and Other applications

I do not have any previous GSoC experience. I am also not applying to any other organizations.

After GSoC 2017

My project will require maintenance. In order to keep up with the tactics that websites with inappropriate content come up with, I will need to stay up to speed on how to improve the blacklisting algorithm. As for the wiki and support pages, I will update them whenever I get feedback about them. As we gather more user feedback, we can determine the most useful and easiest advanced search options to integrate into Ahmia. I will also continue working on how we can anonymously gather data and report these numbers in an approachable way. I am also considering becoming a mentor for GSoC 2018.

Comments

I would like to thank Juha Nurmi for helping with this application and for helping me develop my ideas in accordance with the Ahmia search engine's goals. He has also been very helpful in setting up the development environment for working with Ahmia. I look forward to working with him throughout the summer. Thanks also to Ismael Riahi, who worked on the Ahmia search engine during GSoC last year, for giving me advice and ideas for this year's GSoC. I would also like to thank everyone on the #tor-dev and #tor channels on IRC for helping develop my ideas, answering my questions and making me more knowledgeable about Tor and its challenges. I strongly believe that a user-friendly search engine on the Tor network would encourage more people to use Tor. So, I would like to contribute to projects that make Tor more accessible to the layperson. Moreover, I believe that mentorship is very important and that GSoC is a great initiative for bringing more people like me to work on open source projects.