Diaries of a Core Developer

I am posting this in light of webchick's insightful series of blog posts, a more recent revival of the topic, and my own utter frustration. I understand this subject has been brought up before with no change so I was a bit reluctant to bring it up again, but I think it is important. Personally I have been so fed up with core development that I have taken over a month off from serious core development. Given that other developers express this sentiment from time to time and that you can feel the discontent in the community (granted some are perfectly happy) I think things need to change.

Keep in mind I have reworked this quite a bit over the course of several weeks and sat on the whole subject for some time to ensure that I had my perspective right. Not all my reasons, experiences, and thoughts are included as I tried to focus on the most important.

I am going to present some of my own experiences and my personal mindset when working on Drupal core and the reasons behind it. I will not point to specific issues or people.

General patch workflow/mindset
In general when I submit a patch I expect there to be several revisions of the patch before it is finally committed. I think that is a healthy sign and the patch ends up better at the end. Another thing I expect is the patch to take 1-4 weeks to get committed on average with a few exceptions on both both ends of the spectrum. It usually takes a week or two to get a review and bring it to RTBC status and then anywhere from from a day to three weeks to be committed.

Given that most patches are not standalone, that is to say they are part of a larger set of changes, it can take an extremely long time to get any sort of major change into core. A shortcut to this is the controversial and feared "large patch." There are two reactions to such a patch. If the patch is a cleanup/refactoring (ie. unexciting) or for whatever whim it may be rejected as "two large to review" and the creator is told to split it into smaller patches. In many cases this is virtually impossible and if possible will take months to get them all the patches committed, not to mention the overload of work to split the patch up. I have let a number of large cleanups of the testing system die for this reason and reworking smaller patches is a joke. To accomplish any major overhaul can easily take 10 patches at ~3 weeks a piece it is impractical. Which means SimpleTest still has issues with static variables, after several attempts, among other things.

There is a more general implication of this, I do not even bother rolling several patches at a time to the same system because they will sit so long that I will be forced to waste time continually re-rolling them. I have attempted this on several occasions hoping I could get things to move a bit faster, but became utterly frustrated. Additionally, I personally ignore minor changes and cleanups, because I do not feel like going through the hassle of getting an obvious patch committed.

Responsibility without authority
Another problem that I have run into is architecture decisions. These decisions tend to result in massive wastes of time from stagnation or constant rewriting. As Larry Garfield noted (excerpt):

Schema API is right now lagging far behind the rest of the DB layer. It's API is old-school, it has limited capabilities, it's nearly useless outside of update hooks, and it's totally out of sync with the rest of the system. In April, I posted an issue laying out two different ways we could take Schema API. I asked "OK, so which do we want to do?" because as Matt says above no one wants to spend dozens of hours on one approach only to be told later "nah, let's do the other one instead". You'd charge a client extra to tell you that, but I can't bill Dries for my core development time. :-)

That was in mid-April.

I personally asked Dries for feedback by email.

I personally asked webchick for feedback in IRC.

I emailed the dev list pointing people to the issue.

I got involved in a lengthy discussion on the dev list, asking "so as DB Maintainer, what the heck do I actually DO? Do I have the authority to just make this decision and not have it overruled on a whim later?"

There are always multiple ways of doing something and many times people do not comment until they see something they do not like. A problem is that in many cases an entire rewrite is required to satisfy them when many times it is just not necessary. Having more people with authority on large decisions will help alleviate this stress.

Bestow responsibility
Now some may say, "So what are you suggesting? Give more authority to the 'Maintainers'?" People may then look at various core developers, many of whom hold "maintainer" positions and point at the patches that they marked as RTBC that were not. I know I am guilty of this and many other developers as well. My reasoning is that a patch can be debated for centuries before it is marked RTBC and there will still be complaints made so I would rather get those complaints early on rather then later (after waiting a week+ for a review in RTBC queue of course). Better to get my patch's place in line than debate it before an initial core committer review. I can say for a fact I review my code more thoroughly for my contrib modules since I have commit access to them. I am sure the same would hold true if more core developers were given commit access.

Others may say that it is good to have patches thoroughly reviewed by people who are intimately involved with Drupal (core committers) as they will ensure the patches are better. Granted many reviews I receive in such a manor are very thorough and well thought out, but there are plenty of terrible patches that get by which I can spot flaws with. Everyone is human and no one can catch everything. Then of course I must wait weeks to get the fix in for those patches. Of course the argument above is only valid if you assume a larger commit group means that unqualified developers will review patches. If you take a look at the MAINTAINERS.txt file in core I think you will find that quite the opposite is true.

Another note on the reviews. Many times a patch will improve a system or bit of code dramatically and there are further improvements planned. In the interest of incremental changes (and Extreme Programming) it makes sense to commit the code even if it is not as good as it could be. When writing a follow-up patch one will most likely find shortcomings of the previous patch and clean it up along the way. This process is much faster and much more effective. This allows the majority of code to be better overall instead of just select areas that are reviewed with intense scrutiny.

Conclusion
From what I can see this aged process of core development is holding back Drupal and asking for a fork. It is easily conceivable that enough people will get annoyed (especially as Drupal continues to grow) with the slow speed of development. I do not want to see that happen and I would like to see Drupal get better, faster.

I do not pretend to have all the answers, but I can present the issues I have encountered and logical responses as I see them.

Firstly, regardless of what you look at: patches sitting in RTBC queue, developers sitting for months waiting for a decision, or the fact that a "code slush" is even needed it is clear to see that two committers is not enough. Many people will argue we need more reviewers, but one must look at the RTBC queue and see that would only make things worse. Given that systems like the Schema API are not up-to-date with DBTNG, even though that was introduced very early on in D7 cycle, you know we have issues. I can say for a fact that I was just totally burned out after rewriting the internal browser patch on every whim and as such it is not implemented anywhere in core and I can only imagine that this holds true for numerous systems.

Proposal
I do not think that adding one, even two committers is the solution. We need a committer for each system (like every other open source project I know and Drupal contrib), who knows that system inside and out. Given that we already have a list of those people, who have no authority at the moment, the fix seems easy to implement. I do not mean that the database maintainer should only have access to /includes/database as that leads to the cases where two lines need to be changed elsewhere or perhaps all modules need to be updated for database API changes and now the maintainer is powerless. Instead maintainers should be watched over by the community and possibly 1-2 overall maintainers, like we have now.

The maintainers then have final say over their respective areas, but should take into consideration discussion with the community and of course the overall maintainers. In the situation Larry described, he would be free to decide on a path and implement it given that no feedback was provided and noone presented any valid reasons against the approach.

Defining some sort of overall roadmap, like what core is to be makes sense and that may fall more in the realm of the overall maintainer(s) and the community. The maintainers are then bound within that roadmap.

I am sure there are additional ways to improve the core development cycle, but I believe the above changes are crucial. If nothing changes, I suspect I will focus on hacking contrib after the D7 bug cycle.

Thoughts
I appreciate thoughts and comments on this issue, but I would appreciate it if only those "actively involved" in Drupal core development would comment as those are the individuals most effected by this.

Tags:

Comments

Well given that I'm mentioned by name a couple of times I suppose I should offer a thought or two. :-)

I have in the past and continue to argue that Drupal is too big for one or two people to scrutinize every line of code that goes into it. Especially with all of the new modules and systems that went into Drupal 7, it's bigger than ever and we have only two people through whom all code must flow, when, as Matt Farina points out in the linked article, we used to have three. That simply does not scale.

Now, I'm not sure I'm entirely in agreement about adding a dozen more committers in the form of MAINTAINERS.txt. Currently most subsystem maintainers AFAIK still write huge swaths of code for their respective subsystems (lord knows I do for the DB layer), and you don't want people committing their own code. I freely and openly admit that much of my first-draft stuff sucks. That's why we have reviews, and I try to be open about the reviews I get even if it doesn't always seem like it. :-) And 9/10 times, the code is in the end better for it.

Matt's suggestion is based on the realization that Drupal is large enough that it doesn't need a single person to grok every single line. It needs a few people who grok how it all fits together at a high level, and then people who can micro-review in certain areas. I'm totally fine with having to get all DB-related stuff to go through someone else to approve... but there is no reason whatsoever why that person should also be concerned with the D7UX work, which has virtually nothing to do with the database. That can easily be split into two people (Matt's third core committer), thus making those people's lives easier because they have less they need to fit in their head and my life easier because when I need to have a deep architectural discussion with someone who can actually approve an idea that person is juggling fewer conversations at once.

To her credit, when I asked on the dev list "so what the heck does being a maintainer mean?" webchick was the one and only person who provided anything even resembling a useful answer: I get a "bat phone" to her when I need it. OK, great, cool, not necessarily all that useful at times but at least it's something. Except that even when I made use of that batphone, half the time she was too busy with D7UX or Fields-in-Core or SimpleTest or something else to be able to pick up even if she wanted to. That's what happened to Schema API. And that's webchick, who seems to invent time out of thin air and juggle 15 more things at once than I can even hope to do. Imagine what will happen with Drupal 8 getting even larger and someone not as skilled as webchick (and without so forgiving an employer) is maintainer. Such questions keep me up at night. (Literally, as it's after midnight as I write this. :-) )

As a subsystem maintainer, I don't need commit access; I need clear jurisdiction, and for those above me in the Drupal hierarchy (don't lie to yourself and say we don't have one) to not be over-worked. The only way that commit access would work is if we used a DVCS and I was able to maintain the "Crell branch", merge in lots of DB-related patches, and then let Dries or webchick merge that branch back to mainline in larger chunks. (That's the model Linux follows.) (That to me is the #1 argument in favor of a DVCS, but that's another story.)

You touch on another very very important point here, although you don't go into it in detail: Not all reviews are created equal. Sorry, they're not. It's not just the person giving the review that matters, but the type of review and the backing for it. The issue queues are getting worse and worse and worse in bikeshedding. A number of patches have been seriously harmed by it, and have resulted in worse code as a result, if the code got in at all. Bikeshedding by nature defies "the community will figure it out" solutions. Sure, one can make reasoned arguments for or against a given approach, if others are willing to listen. Some are just religious about whatever the issue is, which is a problem. Remember the LOWER() issue for user login, and its sibling issue of user display names? It's spawned several multi-hundred issues and still isn't in because no one with actual authority to do so stepped in and said "do X, Y, Z and I will commit that, kthxbye". The only people with authority to do so are Dries and webchick, neither of whom are willing to do so 99% of the time. I won't even touch that question at this point until someone actually makes a decision. At current rate, that means I won't be touching it until sometime in 2012. That is a problem.

And of course, the time to make such a statement is before the code is written, not after. It's time to revise our long-standing motto:

Talk is silver, code is gold, a plan is platinum.

I really enjoyed reading this, you explain the situation very lucidly and concretely. What's more disturbing to me is how many people who used to be core developers who don't anymore because it is so time consuming and frustrating. I've been "doing the Drop" for about 3 years now professionally, but I only really got into core dev (beyond a patch or two) since I've been working @ Acquia, and that mainly when Acquia is paying me to do it.

We at Acquia who dedicate a TON of our engineering resources to core dev have issues too. It is next to impossible to estimate the amount of time it will take to get something committed due to the "Responsibility without authority" issue, and the ease at which someone can come in and break consensus. Companies who's business is not centered around Drupal, but around clients absolutely cannot afford this development style, and this creates a vacuum where a lot of good developers don't get involved in core.

I think that the problem goes beyond CVS commit rights. It's about consensus decision making. Giving more CVS write accounts might help, but might also hurt. It's hard to say. I think that a combo of technology (distributed VCS + better tools) and cultural guidance / methodology is most needed. What does this mean?

Bring in some experts. People have been debating about how to debate for as long as they've been debating. There are a lot of devices and cultural norms which can be established to make core development more fun and efficient. It requires some structure, people can't just break consensus at any moment, there needs to be follow up and ownership, reputation needs to be codified and power has to be linked with responsibility in a way which doesn't alienate newbs. It's not easy, but the issue queues are like a conference call with 10000 people speaking different languages (all in English). I'm sure we can structure it a little better than that.

I see this discussion very closely related to the always recurrent topic of switching from cvs to other vcs.

I you have seen the video of Linus talking about Git at Google (http://www.youtube.com/watch?v=4XpnKHJAok8) you will understand what I mean. His explanation about how distributed vcs change the way teams work are very related to your idea of commiters for each system.

In summary, the idea is that each maintainer (actually any people) can maintain its own repository will full commit access, and the core maintainer just pull the changes in a subsystem from another (trusted) person repository, merging code and commit history. This creates a network of trust between maintainers that allows core commiters to more easily delegate the work.

Really interesting post and discussion. So far I agree with almost everything written here. And I'm one more of that developers that periodically becomes frustrated with core development, then spend some time working on contrib only, then try again some core patches.... (I never learn) Just the solution is not a simple thing we have to do. Better we need to rework the whole process.

Some gems in this thread:
- Larry: "Talk is silver, code is gold, a plan is platinum."
- Jacob: "... the issue queues are like a conference call with 10000 people speaking different languages (all in English)"

In other words, our old process doesn't scale to work with our current size (both of code base and community)

Some things we do need (though we may need some more).

- More core committers for each subsystem, the original post idea. And we really can do that now we have unit testing. "So an API change in my subsystem breaks your subsystem? You should have written proper tests on your side".

- A roadmap (planning). There's no other way for 1000 people to work together than agreeing on some common goals first. So far the most similar thing we have is that "Favorite-of-Dries" tag. Come on, we can do better and specially we could discuss that in advance before wasting too much time on coding.

- Serious filtering for issues that go into Drupal's issue tracker. "So you've just installed Drupal and you filed a bug because you cannot find where to change the site name? Dude, we have forums for that. I don't want your support issues to bury my 100 hours hard worked patch."

- On the other side we want new users to easily find answers to their questions. But we are throwing on them 1000 search results from the issue tracker when they just search Drupal for ".

(The obvious fix for these last two would be to break down drupal.org in a few more targetted subsites)

- And last but not least, a smaller more focused Drupal core (yup #smallcore). I'm trying to fix an API, you want some AJAX in your forum pages, obviously we are not playing the same game. Sadly the only way my patch can get the proper attention would be to shut down yours.

So my motto (though I have to admit Larry's is the best one) would be: Smaller, more focused, better planned.

I've spoken about commit control on a per module / per maintainer basis before, and totally agree with it. In fact, what I had proposed in the past is that each module / area of responsibility in Drupal have at least 2 maintainers. If you can't find 2 maintainers, then you have to seriously examine whether it should stay in core...

I saw the argument for switching to another VCS multiple times now. However, I have two problems with that, which perhaps might only be a technical understanding issue, but perhaps not.

  1. Targets for community code gardeners

    You know, looking back on my involvement in Drupal core development thus far, I'd say that I mostly review patches, and in 80% of all cases, I'm spotting awkward code, ill logic, completely missing or wrong documentation, or entirely wrong implementation ideas/approaches. Although I really didn't intend it, I see myself (next to a few others) as "code gardener", who tries to get (all kind of) patches into "real" RTBC state before the responsible core maintainers look at it. Doing that work requires a knowledge and good understanding of all APIs as well as most features and implementations in Drupal core and contributed modules.

    I'm able to do that, because I have a target to work on. Whenever someone says: "Let's do sub-system branches and let the overall maintainers pull in the changes they want." then I start to worry about my target. Maybe I get the envisioned idea wrong, but if I get it right: It's not only that I would have to track completely separate targets (separate repo => separate issue queue?) and watch out for any patches, but it's also that I would have to understand every single sub-system repo/branch on its own to make any sense of the code in it or a patch that comes out of it.

    I.e. the entire idea currently feels like I would have to work in completely detached software projects for me. And if I can't follow/track all of them and provide peer-reviews in time, then I figure that the end result will be the same (which is partially what you want to prevent): Instead of having peer-reviewers constructively critize patches early, the overall core maintainers will have more work and more patches to reject, which in turn means that any subsequent, dependent patches and branches relying on an initial patch will break, because the implementation needs to be changed.

  2. Applying and using APIs properly

    Partially related to the previous point, based on my patch reviewing experience, another problem of detaching API development even further is that the knowledge of and awareness for the actual, current APIs, as well as the planned, in-progress APIs, as well as the actual, real needs for API implementors and implementations would vastly decrease. Too many API additions or changes are just squeezed in, only minimally reworking the implementations or even skipping that altogether, in case we just added something nice that doesn't break existing implementations. Other developers working on other parts or sub-systems of core don't even know of the new goodness that already exists or will exist very soon. So their code still assumes old APIs, often using ugly workarounds to achieve what they want. Currently, it is easy for people like me to overlook most of the big picture and point developers to already existing or pending (almost RTBC) changes that will make their life and patches much easier. And most often, those pointers span over multiple sub-systems/APIs.

    It sometimes scares me that some sub-system developers don't have any idea of another sub-system, which they may need to touch to implement their improvements. I'm glad to help then, but you know, it feels a bit strange when someone tries to change something without knowing about the consequences.

    There are two very good examples that showcase the point of this problem in its full range: If every sub-system would be developed and implemented as nice and consistently as DBTNG, then I would not see a problem at all (though all of us also know the major effort and work that was required to get there). However, if you look at things like triggers and actions, then you see something that was intended and should've been a proper sub-system, but still is mostly dead code in Drupal core. No one really knows about it and therefore no one really cares about it - although it could be the major building block for total awesomeness in customizing most of Drupal's assumptions.

    That is the full range, but there is a lot of in between - namely the form system, rendering system, menu system, and theme system - which all have their own totally awkward nightmares if you look at some of the actual implementations.

I'm not sure how to summarize this, but let me try: First and foremost, I'm concerned about the proper implementation of sub-systems and APIs, and the communication of changes to others, which is a problem today already. We need sub-system specialists to improve the low-level, but we also need generalists that are able to understand and track all sub-systems on a low-level, so the framework as a whole makes sense in the end and especially for the implementations.

sun, I agree entirely. The problem with Drupal's lack of structure is that it ends up with even more segmentation, not less. Because the same people are trying to look at the forest AND the leaves, there's no time left to coordinate and communicate. Even just documenting and coordinating the architecture of DBTNG has been a challenge for me, and that's just one subsystem. And the reason it's "nice and consistent" is that it was largely a single architect doing a complete rewrite, and it was small enough (relatively speaking) that I could fit the whole thing in my head at once. Without someone (me or otherwise) really hounding on the architecture of that system, and curating it over time, I'm quite sure in 2-3 versions it will turn into a mess just like so many other systems. (Of course, I am going to do my darndest to ensure that doesn't happen, but we'll see.)

We need some way to have that coordination, so that while chx is rooting around inside the guts of FAPI and Moshe is doing crazy things to the rendering system someone can spot the parallels there and get them talking so figure out how to merge efforts, or at least make them parallel, or see if there are commonalities that can be factored out to a common system, or any number of other really good things that currently don't happen except by accident. And that person, doing that, will simply not have anywhere near enough bandwidth to also be checking every patch to keep out tab characters or missing curly braces. Those need to be separate jobs.

Ideally, for me, we'd have a small number of high-level PMs (Dries et al) doing that sort of coordinating and cat herding on a macro scale and then other people doing lower-level development and code management on a smaller scale. The Macro leads can then coordinate the Micro leads, resulting in a more consistent overall architecture, common patterns, shared optimizations, good refactoring, and, dare I say it, vision and roadmap. A roadmap does not have to be a set-in-stone schedule of exact features; in fact it should not be. (See Eaton's excellent presentation from Paris.)

Right now, there's still stuff in D7 that, despite being a major major contributor to it, I don't even know about. That's really scary.

" In the interest of incremental changes (and Extreme Programming) it makes sense to commit the code even if it is not as good as it could be. When writing a follow-up patch one will most likely find shortcomings of the previous patch and clean it up along the way. This process is much faster and much more effective. This allows the majority of code to be better overall instead of just select areas that are reviewed with intense scrutiny."

Sometimes it isn’t the perfect gets in the way of the good, but the revolutionary gets in the way of the evolutionary.

My first dip into core contribution was instituting a direct way to intercept and rewrite messages sent with drupal_set_message(), d.o. issue 127262, initially proposed against D5. It started at 'cool, nice and clean', 'let's get it in', 'it is needed', only to get bogged down with 'it would be really great if it also.....'. It is now pushed to D8 with comments of hooking into drupal_mail() instead and a conversation of how good devs shouldn't place static string messages in their modules. Cancel a decent patch because we trust contrib authors to always write good code, right?

10 sites later I still use my patched core code because it did the required job, and I am sure others would have used it much in the last two versions of Drupal as well.

So I am all for incremental changes being pushed in that are immediately followed by a new issue/patch pair to help complete the work. Also, if patch approval cycles were sped up, this may not be needed as much.

Good find, Pedro!

A lot of the stated problems are not technical in nature - they require better management or communication skills.

However, I agree that a lot could be achieved by switching to a DVCS - git, bzr, hg - whatever, as long as it's distributed. Take for example the overlays patch. It was being discussed for over 400 issues, growing steadily in size to over 60KB, beginning to affect a lot of other subsystems and modules (batch API, toolbar module...), constantly breaking installs and needing to be rerolled.

Then it got split up into dozens of mini-issues, all of which are easier to handle, but also making it more difficult to see which parts belong, which got committed etc. A seperate branch showed up on GitHub to deal with this - some are comfortable with this, some aren't, it's a seperate system...

A decent DVCS would allow overlay development to continue on a seperate branch. Changes could be pulled in from HEAD at any point, easily, no need for rerolling. Experimental branches could be run, subsystems changed at will - if it doesn't turn out well, we can ditch the changes. If it turns out fine, we send a pull request to the core maintainers, who either merge the whole thing in, or can cherry-pick changes.

It means a lot less frustration on unnecessary technical hurdles (rerolling, large patches, etc.) which could in itself improve communication.

I'd be happy to help out in migrating to a DVCS if it's the way forward.