

This page describes how I prioritize GSoC applications, and is meant to serve as a basis for students to improve their chances of being accepted. I assume that serious GSoC applicants will already be subscribed to the Wget mailing list, or at least be reading it through the archives or gmane's news portal.

Please bear in mind that if you are not accepted this year, it doesn't mean that I wasn't interested in your application. You may have missed getting a slot by a very thin margin: competition is close. Please don't get discouraged, and please do consider submitting your proposal to Wget again next year, if you are still an eligible student.

The primary motivation for the policies that follow is selfish: what is most likely to benefit Wget, both now and in the long term? To this end, the more valuable an asset a given student seems likely to become to the GNU Wget development team, and the more important the proposed work is to Wget, the more attractive the application is to me.

So, without further ado, here are the specific factors I'm using to evaluate existing applications. They are not listed in order of priority: they all contribute together towards the final decision.

Proficiency as a Programmer

People with significant experience and expertise in C, who can demonstrate that they've learned good coding habits and can avoid common pitfalls in the language, are likelier to get more work done, at a higher level of quality, in the time allotted to them for coding. It also means less hand-holding for me, which is attractive because I already have insufficient time to do the work that needs to be done for Wget.

An alternative approach would, of course, be to give consideration to willing and eager but less-experienced students, so that they may gain knowledge and experience and become better coders overall. However, the reason for Wget's involvement in GSoC is a selfish one: to better Wget. Bettering others is desirable as a secondary goal, but it cannot be our primary goal. Therefore, wherever possible, we are looking for students who are already well-versed.

It is, of course, impossible for me to establish how proficient someone is with C unless they supply me with example source code. This can include patches contributed to Wget or other projects. However, unless these patches contribute significant additional code, rather than mainly tweaking existing functionality, they'll tell me relatively little. Ideally, link to complete programs or libraries that you have written. It's OK if it's code you now find embarrassing: I'm interested in what kind of a coder you are now, not what kind of a coder you once were. If you'd do things differently today, explain how. But please give me code. :)

Aptitude for the Proposed Task

In addition to knowing how to write C, you need to understand the problem domain of whatever you're proposing to do, and how the problem might be solved. Are you working on HTTP authentication? Your proposal should demonstrate an understanding of RFC 2617, and probably of the underlying security principles. Implementing internationalization enhancements? In the detailed description, or at least in the public comment threads, I need to see a description of the solution sufficient to instill confidence that you understand the basics of technologies such as UTF-8 encoding, transcoding where appropriate, and so on. Is your proposal likely to require organizing medium-to-large quantities of data? I'd like to see that you have a solid understanding of algorithms and abstract data types (ADTs), and an idea of which ones might be appropriate for storing and looking up the data you'll be using.

* Please note that the examples above are only examples. They are not meant to imply that the students who have actually submitted proposals for those features have failed to demonstrate aptitude. HTTP auth and i18n each have exactly one proposal so far, and I am already confident that both students involved possess the requisite understanding and skill set.

Understanding of Wget's Source Base

Prior involvement with and familiarity with Wget are desirable. My evaluation of a student's application will definitely benefit if I know that the student has already familiarized perself with Wget's internals. Having submitted or discussed patches on the list obviously demonstrates such familiarity. If that doesn't apply to you, then briefly describing how your enhancements will interact with or change existing components within Wget will be helpful. I do not want low-level details, just a very high-level description that is sufficient to demonstrate that you understand what you'll need to do.

Community Involvement

Likewise, participation and communication with the existing Wget community (such as it is) is important. If you start posting questions related to your project to the mailing list and on IRC, it shows me that you are actively engaged. In particular, it shows that communicating with you as your GSoC mentor will be easy, since we'll already be in regular contact.

You will also earn brownie points if I see that you are answering other people's questions on the list or on IRC. It's a clear indication that you will have value to Wget above and beyond your ability to write code for us. :) It's also a good opportunity to demonstrate an existing understanding of Wget.

Importance of Enhancement to Wget

Again, in the interests of selfishness, I need to give priority to enhancements that Wget really needs right now over those it will need at some point, or ones that are sexy but not critical. If push comes to shove, improving Wget's security (HTTP Auth or SSL/TLS) is more important than adding sexy features like regex support in acc/rej lists. The FeatureSpecifications/SessionInfoDataBase probably falls somewhere in between. It's mostly a "sexy" feature, but it offers some fairly huge potential benefits: in particular, the ability to unambiguously determine the local filename for a given URI, and the ability to continue an aborted Wget session. OTOH, it's not critical to Wget in the way that working authentication and internationalization (given that non-ASCII TLDs are being introduced this year) are.

Obviously, importance by itself can't be the only factor. If a student's proposal for a critical feature is of significantly lower quality than a proposal for a "nice-to-have" feature, I'll probably go for the "nice-to-have", and implement the critical one myself. Also, if I feel that the student with a critical-feature proposal is less likely to stay involved with Wget in the long term than a student proposing a less-critical feature, I may well go with the student I expect to be a long-term asset to the project. The ideal is not that we get two months of free work from a couple of students, but that each year of GSoC adds new and valuable members to the Wget community. (Of course, I hope that some students will choose to contribute to Wget in helpful ways even if their proposals are not accepted, which would certainly reflect well on them when next year's GSoC comes around.)

Appropriate Amount of Work

Obviously, I'm less likely to approve a proposal whose workload I do not believe will fill at least most of a two-month development period. If your proposed enhancements sound to me like something I could do in two weeks, I'm not apt to go for it. This is mainly a theoretical observation: if I feel that your proposal does not represent an appropriate workload, I will post a comment indicating this, and all you have to do to make things right is to add enough additional material based on my feedback.

On the other hand, I have received proposals that offer the moon and the stars, and maybe a good-sized bit of the sun as well. A little over-zealousness is fine: I'll post a comment suggesting that you scale back your workload a bit. Some excellent proposals fall into that category, and their students have worked with me to balance the amount of work involved.

However, a proposal that includes fifteen or twenty enhancements, some of which would individually take an entire GSoC or more to implement, demonstrates to me a basic lack of understanding of the problems and their solutions (a lack of "Aptitude for the Proposed Task"). Coders capable of near-superhuman feats of productivity do exist, and hacker culture certainly tends to encourage such people. I also do not wish to underestimate the creative power of caffeinated, enthusiastic students on summer break, coding for Wget full-time (or, if particularly enthusiastic, quite possibly more than full-time).

But if you want me to believe that you are shockingly productive, you need to prove it to me. If you have promised what looks like two years of work in two to three months, you need to either give me documented proof that you have performed at that level in the past, or spend a couple of days working on the code now, so that I can see the resulting two weeks' worth of productivity for myself. Barring that, I am very likely to treat enormous workloads as a sign of ineptitude rather than of extreme productivity. Call me a skeptic.

Back to ensuring a sufficient quantity of work, though: I understand that summer here is not summer everywhere. It may be winter where you are, with school responsibilities still in full swing. Or you may live in a country where school exams continue until nearly the GSoC midterm evaluations. These are not the end of the world. In particular, if you have already familiarized yourself with Wget's source code and begun discussing with me how to approach your project, you can probably use the month Google has graciously provided as the "Community Bonding" period to get some actual code written.

Be that as it may, please note that the GSoC program expects that students are working full-time on their proposed projects, and that submitting a proposal is an implicit affirmation that you have the free time to get the work done. Yes, you need to eat. Yes, you need to do well in school so you can succeed in life. Yes, it sucks if you live in a country where it's not summer and it's not a break, and it's not fair that you're made to compete with North Americans who have all the time they need to dedicate to GSoC. But none of this means that I can afford to give you special consideration: if you can't do the job, don't apply for it. As I've said, our interests are intrinsically selfish, and it is not in our best interests to consider someone who has four free hours a day to dedicate to a full-time job, and probably isn't taking into account the unfairly large amounts of homework their professors will assign to them. Nor is it in Google's best interests if they're spending money on stipends for people who will not accomplish their tasks.

If you submit a great proposal for something that might be 7 weeks' work rather than 9, I'm prepared to be lenient. If you tell me that you won't have much free time for Wget during final exams, but will be able to get some extra work done before the official GSoC start, that may also be alright. I might even be prepared to accept two weeks of quality work for your midterm evaluation instead of four, on the understanding (and confidence) that you will then be able to dedicate yourself to Wget and produce six weeks' work in the following four. What I am not prepared to do is let you shift your responsibilities by a few weeks so that you have little to show at the midterm evaluation, forcing me to "take it on faith" that you'll perform as promised for the two months after it. You and I can negotiate exactly what is to be done by midterm, but there must be something substantial enough for me to actually evaluate your work and progress. Students who fall well short of the work they agreed to complete by midterm will be dropped, and will not receive the rest of their stipend from Google.

Informative Proposal

Related to the previous point, it is very important that your proposal specify, very clearly, what will have been accomplished by the midterm and final evaluations. The evaluation of your performance, and my recommendation that you receive your stipend, will be based on the work you and I have agreed upon for you to do. If this information is missing from your proposal, or too vague, you'll be asked to clarify it.

As previously mentioned, your proposal needs to demonstrate a basic understanding of the problem you're trying to solve. It's an informal requirements document, but should also be a very high-level design document. I need to understand not only what you are going to do, but how you are going to do it. As mentioned before, I will generally expect to know what types of structures and algorithmic tools you'll need to use to accomplish your job, and how your work will incorporate relevant standards and RFCs. Clarifying what you do not intend to accomplish might also be a good idea.

When I say I want to know what types of structures you'll use, I'm talking about Abstract Data Types (see your Sedgewick or Knuth textbooks, or check Wikipedia). I do not want details of what your C structs, functions, or files will be named, or snippets of C code. That's all good material for us to discuss, but it's generally not appropriate for a proposal. Please try to focus on how Wget will use these structures, as opposed to how they will look in code.

I am not concerned with minor typographical and grammatical errors. Your English should be good enough that it doesn't hinder our communication; it does not need to be excellent.

Ideas for Next Year

Since I've placed some importance on the amount of work involved, it occurs to me that it might be useful to attach estimated completion times to the features listed on our ideas page. I'll try to do this in the future.

It might also be a good idea for me to prepare a questionnaire of programming- and HTTP-related questions, to discover a student's knowledge and understanding of essential protocols and technologies.

2017-01-04 00:04