Mailing List Archive

Release process
Hi everyone,

There have been a number of calls to make the release process more
predictable (or maybe just faster). There are plenty of examples of
projects that have very predictable release schedules, such as the
GNOME project or the Ubuntu Linux distribution. It's not at all
unreasonable to expect that we could achieve that same level of
predictability if we're prepared to make some tradeoffs, such as:

1. Is the release cadence more important (i.e. reverting features
if they pose a schedule risk), or is shipping a set of features more
important (i.e. slipping the date if one of the predetermined features
isn't ready)? For example, as pointed out in another thread + IRC,
there was a suggestion for creating a branch point prior to the
introduction of the Resource Loader.[1] Is our priority going to be
about ensuring a fixed list of features is ready to go, or should we
be ruthless about cutting features to make a date, even if there isn't
much left on the feature list for that date?
2. Projects with generally predictable schedules also have a process
for deciding early in the cycle what is going to be in the release.
For example, in Ubuntu's most recently completed release schedule [2],
they allotted a little over 23 weeks for development (a little over 5
months). The release team slated a "Feature Definition Freeze" on
June 17 (week 7), with what I understand was a pretty high bar for
getting new features listed after that, and a feature freeze on August
12 (week 15). Many features originally slated in the feature
definition were cut. Right now, we have nothing approaching that
level of formality. Should we?
3. How deep is the belief that Wikimedia production deployment must
precede a MediaWiki tarball release? Put another way, how tightly are
they coupled?

Thoughts on these? Any other tradeoffs we need to consider? We're
going to have a number of conversations over the coming days on this
topic, so I wanted to add a little structure and get some (more)
initial impressions now.

Rob

[1] MZMcBride's mail:
http://lists.wikimedia.org/pipermail/wikitech-l/2010-October/049969.html
...which in turn references IRC from 2010-10-18 @ 14:08 or so:
http://toolserver.org/~mwbot/logs/%23mediawiki/20101018.txt
[2] Ubuntu Maverick Meerkat (10.10) release schedule:
https://wiki.ubuntu.com/MaverickReleaseSchedule

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: Release process
On Thu, Oct 21, 2010 at 7:56 AM, Rob Lanphier <robla@wikimedia.org> wrote:

> 1. Is the release cadence is more important (i.e. reverting features
> if they pose a schedule risk) or is shipping a set of features is
> important (i.e. slipping the date if one of the predetermined feature
> isn't ready)? For example, as pointed out in another thread + IRC,
> there was a suggestion for creating a branch point prior to the
> introduction of the Resource Loader.[1] Is our priority going to be
> about ensuring a fixed list of features is ready to go, or should we
> be ruthless about cutting features to make a date, even if there isn't
> much left on the feature list for that date?
>

I'm afraid that branching before the RL merge is not going to help in
the present state of affairs. We have a zillion unreviewed and
untested revisions before that, so maintaining two branches would
require us to virtually double our efforts.


> 2. Projects with generally predictable schedules also have a process
> for deciding early in the cycle what is going to be in the release.
> For example, in Ubuntu's most recently completed release schedule [2],
> they alloted a little over 23 weeks for development (a little over 5
> months). The release team slated a "Feature Definition Freeze" on
> June 17 (week 7), with what I understand was a pretty high bar for
> getting new features listed after that, and a feature freeze on August
> 12 (week 15). Many features originally slated in the feature
> definition were cut. Right now, we have nothing approaching that
> level of formality. Should we?
>

Obviously, we're not ready to determine the exact date of the 1.17
release, because we worked on it (and are continuing to do so) without
a set date in mind. The question is what we should do to make things
more predictable for 1.18. Once we see how well that goes, we can
decide how strict we want our schedule to be. IMHO, Ubuntu's way
results in buggy releases; people reported some blatantly stupid
regressions in 10.10.


> 3. How deep is the belief that Wikimedia production deployment must
> precede a MediaWiki tarball release? Put another way, how tightly are
> they coupled?
>

I believe that every developer believes so.


> Thoughts on these? Any other tradeoffs we need to consider? We're
> going to have a number of conversations over the coming days on this
> topic, so I wanted to add a little structure and get some (more)
> initial impressions now.
>

Can these discussions be made accessible to those of us who will not be
present? A skypecast would be ideal, but simpler ways would do, including
text transcripts.
Re: Release process
On Wed, Oct 20, 2010 at 11:56 PM, Rob Lanphier <robla@wikimedia.org> wrote:
> 1.  Is the release cadence is more important (i.e. reverting features
> if they pose a schedule risk) or is shipping a set of features is
> important (i.e. slipping the date if one of the predetermined feature
> isn't ready)?  For example, as pointed out in another thread + IRC,
> there was a suggestion for creating a branch point prior to the
> introduction of the Resource Loader.[1]  Is our priority going to be
> about ensuring a fixed list of features is ready to go, or should we
> be ruthless about cutting features to make a date, even if there isn't
> much left on the feature list for that date?

IMO, the best release approach is to set a timeline for branching and
then release the branch when it's done. This is basically how the
Linux kernel works, for example, and how MediaWiki historically worked
up to about 1.15. We'd branch every three months, then give it a
while to stabilize before making an RC, then make however many RCs
were necessary to stabilize. This gives pretty predictable release
schedules in practice (until releases fell by the wayside for us after
1.15 or so), but not anything that we're forced to commit to.
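
A minimal sketch of that cadence in SVN terms (the repository URL,
version number, and branch names here are invented for illustration,
and the commands are only echoed as a dry run, not executed):

```shell
#!/bin/sh
# Dry-run sketch of "branch on a schedule, release when ready".
# REPO and VERSION are hypothetical; nothing below touches a real repo.
REPO="https://svn.example.org/mediawiki"
VERSION="1_16"

# On the scheduled date, cut the release branch from trunk:
echo "svn copy $REPO/trunk/phase3 $REPO/branches/REL${VERSION}"

# Then stabilize on the branch, tagging RCs until one is
# release-quality, with no fixed date for the final release:
for RC in 1 2; do
    echo "svn copy $REPO/branches/REL${VERSION} $REPO/tags/REL${VERSION}RC${RC}"
done
```

The real MediaWiki branch layout may differ; the point is only that
the branch date is fixed while the release date is not.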

(Actually, Linux differs a lot, because the official repository has a
brief merge window followed by a multi-month code freeze, and actual
development occurs in dozens of different trees managed by different
people on their own schedules. But as far as the release schedule
goes, it's "branch on a consistent timeframe and then release when
it's ready", with initial branching time-based but release entirely
unconstrained. So in that respect it's similar to how we used to do
things.)

I don't think it's a good idea to insist on an exact release date, as
Ubuntu does, or even to set an exact release date at all. That could
force us to release with significant regressions if they come up at
the last minute. On the other hand, I don't see any real benefits.
Does anyone care exactly when MediaWiki is released? If so, why can't
they just use RCs? The RC tarball is just as easy to unpack as the
release tarball.

I also don't think it makes any sense for us to do feature-based
releases. The way that would work is to decide on what features you
want in the release, then allocate resources to get those features
done in time. But Wikimedia currently doesn't use the releases, it
deploys new features continually. So resources will naturally not be
targeted at the release date, they'll be targeted for deployment
whenever they're done. Wikimedia has no big reason to pay people to
rush to complete something in time for a release that it isn't going
to use anyway.

Furthermore, even if Wikimedia did use releases -- IIRC, you thought
that was a reasonable plan when this came up before -- I still think
feature-based releases are a bad idea. It encourages you to either
delay releases excessively or ship half-baked features. If you
instead say that you'll ship whatever is mature at the time of
release, with no commitment to what makes it in, it encourages more
focus on correctness and quality. Feature-based releases really only
belong in the proprietary software world, where the vendor needs a
feature list to encourage people to pay for the new version.

> 2.  Projects with generally predictable schedules also have a process
> for deciding early in the cycle what is going to be in the release.
> For example, in Ubuntu's most recently completed release schedule [2],
> they alloted a little over 23 weeks for development (a little over 5
> months).  The release team slated a "Feature Definition Freeze" on
> June 17 (week 7), with what I understand was a pretty high bar for
> getting new features listed after that, and a feature freeze on August
> 12 (week 15).  Many features originally slated in the feature
> definition were cut.  Right now, we have nothing approaching that
> level of formality.  Should we?

IMO, no. I think it's best to just ship whatever's done when the
release branch is made. Processes like Ubuntu's or Mozilla's only
make sense when the organization paying for development is primarily
interested in the actual release, not when the organization is
primarily interested in its own use of the product. In the latter
case, it makes much more sense to do incremental development and
deployment and do releases mostly as an afterthought.

Wikimedia is in an unusual position here, really. Very few sites that
pay for in-house code development for their own use then make real
open-source releases of it. Either they keep it closed or just throw
source over the wall occasionally, or they're interested mostly in
getting third parties to use it. I'm not personally familiar with
other open-source projects in a similar position to us, although they
exist (like StatusNet?). We have to be careful with analogies to
software development that's dissimilar in purpose to ours.

> 3.  How deep is the belief that Wikimedia production deployment must
> precede a MediaWiki tarball release?  Put another way, how tightly are
> they coupled?

IMO, it's essential that Wikimedia get back to incrementally deploying
trunk instead of a separate branch. Wikipedia is a great place to
test new features, and we're in a uniquely good position to do so,
since we wrote the code and can very quickly fix any reported bugs.
Wikipedia users are also much more aware of MediaWiki development and
much more likely to know who to report bugs to. I think any site
that's in a position to use its own software (even if it's
closed-source) should deploy it first internally, and if I'm not
mistaken, this is actually a very common practice.

Beyond that, this development model also gives volunteers immediate
reward for their efforts, in that they can see their new code live
within a few days. When a Wikipedia user reports a bug, it's very
satisfying to be able to say "Fixed in rXXXXX, you should see the fix
within a week". It's just not the same if the fix won't be deployed
for months.

Re: Release process
On 10/21/10 2:16 PM, Aryeh Gregor wrote:

>> 3. How deep is the belief that Wikimedia production deployment must
>> precede a MediaWiki tarball release? Put another way, how tightly are
>> they coupled?
>
> IMO, it's essential that Wikimedia get back to incrementally deploying
> trunk instead of a separate branch.

I agree with this very strongly.

I would like to know what (if any) arguments there are for doing a
separate deploy branch. It seems to me that we ought to be deploying
constantly to the website, and making occasional MediaWiki branch
releases. On the projects we want timeliness, and downstream MediaWiki
packagers want stability.

For what it's worth, I'm influenced by my former job at Flickr, where
the practice was to deploy several times *per day*, directly from trunk.
That may be more extreme than we want, but be aware that there are
people who are doing it successfully -- it just takes a few extra development
practices.

BTW, I wanted to say this stuff earlier, but I found that I couldn't
respond meaningfully to Rob's questions. They were "big questions"
asking for a lot of context, and a relative newcomer like me is a bit
intimidated by those.

--
Neil Kandalgaonkar ( ) <neilk@wikimedia.org>

Re: Release process
Aryeh Gregor wrote:
> On Wed, Oct 20, 2010 at 11:56 PM, Rob Lanphier <robla@wikimedia.org> wrote:
>> 1. Is the release cadence is more important (i.e. reverting features
>> if they pose a schedule risk) or is shipping a set of features is
>> important (i.e. slipping the date if one of the predetermined feature
>> isn't ready)? For example, as pointed out in another thread + IRC,
>> there was a suggestion for creating a branch point prior to the
>> introduction of the Resource Loader.[1] Is our priority going to be
>> about ensuring a fixed list of features is ready to go, or should we
>> be ruthless about cutting features to make a date, even if there isn't
>> much left on the feature list for that date?
>
> IMO, the best release approach is to set a timeline for branching and
> then release the branch when it's done. This is basically how the
> Linux kernel works, for example, and how MediaWiki historically worked
> up to about 1.15. We'd branch every three months, then give it a
> while to stabilize before making an RC, then make however many RCs
> were necessary to stabilize. This gives pretty predictable release
> schedules in practice (until releases fell by the wayside for us after
> 1.15 or so), but not anything that we're forced to commit to.
>
> (Actually, Linux differs a lot, because the official repository has a
> brief merge window followed by a multi-month code freeze, and actual
> development occurs in dozens of different trees managed by different
> people on their own schedules. But as far as the release schedule
> goes, it's "branch on a consistent timeframe and then release when
> it's ready", with initial branching time-based but release entirely
> unconstrained. So in that respect it's similar to how we used to do
> things.)
>
> I don't think it's a good idea to insist on an exact release date, as
> Ubuntu does, or even to set an exact release date at all.

+1. Fuzzy dates are good, but setting a fixed date is not.
This doesn't mean that WMF should be lax in allocating
resources for the release, though.


> Does anyone care exactly when MediaWiki is released? If so, why can't
> they just use RCs? The RC tarball is just as easy to unpack as the
> release tarball.

Because RCs have that unstable feeling, many people end up not testing
them, which makes WMF deploys much more important.


> I also don't think it makes any sense for us to do feature-based
> releases. The way that would work is to decide on what features you
> want in the release, then allocate resources to get those features
> done in time.

We have had too many chaotic releases. I don't think we should delay
releases for features for now.
It's fine to plan a set of features, or to tweak the dates a bit to
stabilize some feature or release before a branch merge.

Plus, we don't have any big features missing. A normal release will be
just a lot of small fixes and tiny new features.
We have a number of them for 1.17, but that's an anomaly (and due to
the delay).



>> 3. How deep is the belief that Wikimedia production deployment must
>> precede a MediaWiki tarball release? Put another way, how tightly are
>> they coupled?
>
> IMO, it's essential that Wikimedia get back to incrementally deploying
> trunk instead of a separate branch. Wikipedia is a great place to
> test new features, and we're in a uniquely good position to do so,
> since we wrote the code and can very quickly fix any reported bugs.
> Wikipedia users are also much more aware of MediaWiki development and
> much more likely to know who to report bugs to. I think any site
> that's in a position to use its own software (even if it's
> closed-source) should deploy it first internally, and if I'm not
> mistaken, this is actually a very common practice.

I consider that very important, especially for a big release such as
the upcoming one. A WMF deployment will get it more tested in a few
weeks than many months of normal third-party use would (especially in
terms of feedback).

I don't oppose having a wmf branch. It comes from the admission that
there are live patches, and having a branch actually documents them
and allows us to see what is really deployed.
However, I completely agree with Aryeh on the importance of wmf
running almost-trunk. The process itself could be automated, e.g. a
cron job automatically branching from trunk each Tuesday morning, with
the deploy scheduled for Thursday. NB: I'm assuming a model where
everyone can commit to the branch in the meantime.
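
As a hedged sketch, the cron part of such automation could be as small
as two entries; the script names, paths, and times below are pure
assumptions, and the crontab is only printed here, not installed:

```shell
#!/bin/sh
# Prints a hypothetical crontab for the schedule suggested above:
# branch from trunk Tuesday morning, deploy the branch on Thursday.
BRANCH_DOW=2   # Tuesday in cron's day-of-week field
DEPLOY_DOW=4   # Thursday
cat <<CRON
# m h dom mon dow  command
0 9 * * $BRANCH_DOW  /usr/local/bin/branch-wmf-from-trunk
0 9 * * $DEPLOY_DOW  /usr/local/bin/deploy-wmf-branch
CRON
```

Piping that output to `crontab -` would install it; the two scripts
themselves (branching, then deployment) are left as assumptions.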


> Beyond that, this development model also gives volunteers immediate
> reward for their efforts, in that they can see their new code live
> within a few days. When a Wikipedia user reports a bug, it's very
> satisfying to be able to say "Fixed in rXXXXX, you should see the fix
> within a week". It's just not the same if the fix won't be deployed
> for months.

+10.


Re: Release process
On Thu, Oct 21, 2010 at 6:31 PM, Neil Kandalgaonkar <neilk@wikimedia.org> wrote:
> For what it's worth, I'm influenced by my former job at Flickr, where
> the practice was to deploy several times *per day*, directly from trunk.
> That may be more extreme than we want  but be aware there are people who
> are doing it successfully -- it just takes a few extra development
> practices.

Personally, I think it would be awesome if we could migrate to this
level of deployment frequency eventually. I imagine that
comprehensive automated test suites are a major part of making this
reliable. To the extent you can share any details about how stuff
works at Flickr, what long-term changes are necessary for this to be
practical?

On Thu, Oct 21, 2010 at 6:35 PM, Platonides <Platonides@gmail.com> wrote:
> However I completely agree with Aryeh on the importance of wmf running
> almost trunk. The process itself could be automated, eg. a cron job
> automatically branching from trunk each Tuesday morning, and having the
> deploy programmed for Thursday. NB: I'm assuming a model where everyone
> can commit to the branch in the meantime.

We shouldn't branch at all for routine deployments of trunk. Just
make sure everything looks good, maybe revert or temporarily disable
anything that hasn't seen enough testing, then deploy current trunk.
That way we don't have to worry about backporting.

Re: Release process
On Thu, Oct 21, 2010 at 3:31 PM, Neil Kandalgaonkar <neilk@wikimedia.org>wrote:

> On 10/21/10 2:16 PM, Aryeh Gregor wrote:
>
> >> 3. How deep is the belief that Wikimedia production deployment must
> >> precede a MediaWiki tarball release? Put another way, how tightly are
> >> they coupled?
> >
> > IMO, it's essential that Wikimedia get back to incrementally deploying
> > trunk instead of a separate branch.
>
> I agree with this very strongly.
>
> I would like to know what (if any) arguments there are for doing a
> separate deploy branch. It seems to me that we ought to be deploying
> constantly to the website, and making occasional MediaWiki branch
> releases. On the projects we want timeliness, and downstream MediaWiki
> packagers want stability,
>


Original announcement thread from July 2009:
http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg03903.html

The original purpose of having a deployment branch was so that we actually
knew what we were running! :) Even with fairly regular deployments from
trunk, we had two big problems:

1) "live hacks" -- little tweaks, patches, and one-off hacks in the
live code to work around temporary problems. These would accumulate
over time, and eventually we'd end up with surprise merging problems
at deployment time, or just forget to merge important bits of code
back into trunk... sometimes hacks that should have been kept even got
accidentally removed when a new version was pulled in!

Knowing that what is in deployment is *exactly* what is in SVN means
that we know
a) where changes came from and
b) when and by whom they were committed,
and it
c) allows folks to easily see the difference between trunk and
deployment and make sure that important work is in fact merged back.

In theory, live hacks are punishable by eternal torture in the bowels of SVN
branching. In practice, they'll happen as long as it's _possible_ to deploy
code that's not in SVN.


2) Temporary breakages on trunk right in the middle of an important quick
fix

If we don't do those one-off fixes, workarounds, and debugging hacks as live
hacks, the alternative without a deployment branch is to actually do them
*on* trunk. That means that when you want to slap in a one-line tweak to fix
or debug something, you *also* have to deploy the last few days' worth of
trunk changes.

Hopefully there are no regressions or incompatible changes. Right? Right? :)


But ultimately, wmf-deployment was never intended to diverge from trunk by
more than a couple weeks in regular usage; I was aiming for a weekly or
biweekly deployment schedule.

With the sort of backlog we've developed during the long slog of stabilizing
the new JavaScript layers, they've ended up HUGEly divergent, which is very
unpleasant -- especially with SVN's primitive merging systems.

For what it's worth, I'm influenced by my former job at Flickr, where
> the practice was to deploy several times *per day*, directly from trunk.
> That may be more extreme than we want but be aware there are people who
> are doing it successfully -- it just takes a few extra development
> practices.
>

If in-progress work were done on branches and merged to trunk when stable,
that would be a grand way to go. That's a big pain in the butt with SVN's
branching, unfortunately, but much easier with git.

-- brion
Re: Release process
On 10/21/10 4:04 PM, Aryeh Gregor wrote:
> On Thu, Oct 21, 2010 at 6:31 PM, Neil Kandalgaonkar<neilk@wikimedia.org> wrote:
>> For what it's worth, I'm influenced by my former job at Flickr, where
>> the practice was to deploy several times *per day*, directly from trunk.
>> That may be more extreme than we want but be aware there are people who
>> are doing it successfully -- it just takes a few extra development
>> practices.
>
> Personally, I think it would be awesome if we could migrate to this
> level of deployment frequency eventually. I imagine that
> comprehensive automated test suites are a major part of making this
> reliable.

Nope. Automated tests help a lot with this approach but Flickr doesn't
have much better tests than MediaWiki does.

We *should* have better tests, but I would just say that it is not
required for us to have a great test suite before doing this.


> To the extent you can share any details about how stuff
> works at Flickr, what long-term changes are necessary for this to be
> practical?

Flickr engineers have already talked a lot about this in public. See
references below.

The main insight here is that branching is a bad way for a website to
manage change. We do not have an install base that's out there in the
world, like shrink-wrapped software, where we issue patches on CD. For a
website, we control the entire install base.[1]

What we need are ways of managing change across our server clusters, or
managing incremental feature and infrastructure upgrades. This leads to
"branching in code".

Doing things the Flickr way entirely would require:

1 - A "feature flag" system, for "branching in code". The point is to
start developing a new feature with it being turned off by default for
most environments and without succumbing to branching and merging
misery. In other words, day one of a new feature looks like this:

if ( $wgFeature['MyNewThing'] ) {
    /* ... new code ... */
} else {
    /* ... old code ... */
}

Of course if you're fixing bugs there's no need to hide that behind a
feature flag.


2 - Every developer with commit access is thinking about deployment onto
a cluster of machines all the time. Committing to the repository means
you are asserting this will work in production. (This is the hard part
for us, I think, but maybe not insurmountable).

3 - One can deploy with a single button press (and there is a system
recording what changes were deployed and why, for ops' convenience).

4 - When there's trouble, new deploys can be blocked centrally, and then
ops can revert to a previous version with a single button press.

5 - Developers are good about "cleaning up" code that was previously
protected by feature flags once the behaviour is standard. (HINT: this
is the part Flickr doesn't talk about in public... but as an open source
project with more visible dirty laundry, perhaps we can do better.)

This system does result in more "oops" moments. But the point is to make
those easy to recover from, and to have a culture where people aren't
blamed too much for this. Not to make a system that tries to ensure that
deploy branches can be tested to be almost perfect. The real problems
are always things that nobody anticipated anyway.


NOTES

[1] I am for the purposes of the argument ignoring MediaWiki as a
deliverable and only thinking about project websites.

REFERENCES

Here's the most concise presentation:
"Always Ship Trunk: Managing Change In Complex Websites" by Paul Hammond
http://www.paulhammond.org/2010/06/trunk/alwaysshiptrunk.pdf

And a longer talk about all this from Paul Hammond and John Allspaw
10+ Deploys Per Day: Dev/Ops Cooperation at Flickr
http://velocityconference.blip.tv/file/2284377/

Blog post about the Feature Flag system by Ross Harmes
"Flipping out"
http://code.flickr.com/blog/2009/12/02/flipping-out/



--
Neil Kandalgaonkar ( ) <neilk@wikimedia.org>

Re: Release process
On 10/21/10 4:23 PM, Brion Vibber wrote:

> The original purpose of having a deployment branch was so that we actually
> knew what we were running! :) Even with fairly regular deployments from
> trunk, we had two big problems:
>
> 1) "live hacks" -- little tweaks, patches, and one-off hacks in the live
> code to work around temporary problems.

> In theory, live hacks are punishable by eternal torture in the bowels of SVN
> branching. In practice, they'll happen as long as it's _possible_ to deploy
> code that's not in SVN.

I feel that this has to be a symptom of some other problem. What sort of
things go into "live hacks"?

If they are about rapidly reconfiguring, rolling back, or turning off
features, I think that's better answered by having an explicit system to
do such a thing (see my other post in this thread about Flickr's system).


> 2) Temporary breakages on trunk right in the middle of an important quick
> fix
>
> If we don't do those one-off fixes, workarounds, and debugging hacks as live
> hacks, the alternative without a deployment branch is to actually do them
> *on* trunk. That means that when you want to slap in a one-line tweak to fix
> or debug something, you *also* have to deploy the last few days' worth of
> trunk changes.

Yes, definitely a problem. In the Flickr world, you're never more than a
few hours off of trunk anyway; but we're not in that world, so we start
to feel the need for a deploy branch.

--
Neil Kandalgaonkar ( ) <neilk@wikimedia.org>

Re: Release process
On Thu, Oct 21, 2010 at 5:18 PM, Neil Kandalgaonkar <neilk@wikimedia.org>wrote:

> The main insight here is that branching is a bad way for a website to
> manage change. We do not have an install base that's out there in the
> world, like shrink-wrapped software, where we issue patches on CD. For a
> website, we control the entire install base.[1]
>

Of course MediaWiki is a product for third-party use, too. :)


> Doing things the Flickr way entirely would require:
>
> 1 - A "feature flag" system, for "branching in code". The point is to
> start developing a new feature with it being turned off by default for
> most environments and without succumbing to branching and merging
> misery. In other words, day one of a new feature looks like this:
>
> if ( $wgFeature['MyNewThing'] ) {
>     /* ... new code ... */
> } else {
>     /* ... old code ... */
> }
>

Many features in MediaWiki have been developed in *exactly* this way, either
hidden behind a configuration switch or encapsulated within an extension
which simply isn't enabled until it's ready.

Where this really falls down is where you're refactoring a big subsystem; in
some cases we can keep the entire "new" system separate, and move things
over bit by bit -- and sometimes we've done exactly that -- but it can be
difficult if there are a lot of dependencies that need to be touched because
interfaces are changing. (Think of ResourceLoader and its predecessors as an
example here; lots of little things had to change just to get it in... but
there's still code that uses a lot of old systems just fine and can be
cleaned up bit by bit.)

It falls down more moderately when you're simply "fixing" or "enhancing"
code, and don't realize that you just introduced some breakage.


> 2 - Every developer with commit access is thinking about deployment onto
> a cluster of machines all the time. Committing to the repository means
> you are asserting this will work in production. (This is the hard part
> for us, I think, but maybe not insurmountable).
>

That's exactly what people are supposed to think when committing to
MediaWiki trunk ever since we switched to the continuous integration w/
quarterly release cycle a few years ago. Breakage in trunk is certainly not
something you're EVER supposed to do on purpose... but it still happens by
accident.

3 - One can deploy with a single button press (and there is a system
> recording what changes were deployed and why, for ops' convenience).
>

In the olden days we had exactly that:

svn up && scap

Addition of the deployment branch made it a two-step process -- first you
perform a single SVN command to merge changes down, then you do the above
command.
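
Put side by side, the two flows look roughly like this (a dry run; the
`^/trunk/phase3` path and checkout layout are assumptions based on
this thread, and the commands are only printed, not executed):

```shell
#!/bin/sh
# Dry-run contrast of the one-step and two-step deploy flows.
ONE_STEP="svn up && scap"               # old flow: site tracked trunk directly
MERGE_DOWN="svn merge ^/trunk/phase3 ." # new flow: merge trunk into the
                                        # wmf-deployment checkout first
echo "one-step: $ONE_STEP"
echo "two-step: $MERGE_DOWN && svn commit && $ONE_STEP"
```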


> 4 - When there's trouble, new deploys can be blocked centrally, and then
> ops can revert to a previous version with a single button press.
>

That's exactly what the deployment branch was created for -- ensuring that
deployed code was in source control meant that you actually *could* return
to a previous state.

-- brion
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: Release process
On Thu, Oct 21, 2010 at 5:28 PM, Neil Kandalgaonkar <neilk@wikimedia.org> wrote:

> I feel that this has to be a symptom of some other problem. What sort of
> things go into "live hacks"?
>
> If they are about rapidly reconfiguring, rolling back, or turning off
> features, I think that's better answered by having an explicit system to
> do such a thing (see my other post in this thread about Flickr's system).
>

Primarily:

1) debug logging statements to provide additional information on problems
seen in production that can't yet be reproduced offline
2) temporary performance hacks to disable individual code paths in
particular circumstances (say, the caching bug that caused serious cache
contention on the 'Michael Jackson' article one day) -- these are usually
not "features" but more like "this chunk of processing for this feature when
used in a very particular way on this one article"
3) horrible, horrible temporary hacks to block particularly unpleasant
actions or make exceptions for something that other code doesn't yet allow.

These are usually done live because whatever you're reacting to is live --
the code is part of a production debugging session.

Debug logging hacks are usually discardable immediately. Performance hacks
usually need to be maintained or replaced with better code -- these are the
ones we had to worry about not accidentally losing by replacing the live
deployment with code from trunk. :) Temporary hacks to disable or enable
things or help catch vandalism are sort of an in-between space.
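As an illustration, a temporary performance hack of the second kind might look something like the following. This is a hypothetical sketch, not actual MediaWiki code: the function name, page title check, and comments are all invented to show the shape of such a hack.

```php
<?php
// Hypothetical sketch of a "temporary performance hack": skipping one
// expensive code path for one problematic page until a real fix lands.
// All names here are invented for illustration.
function shouldRebuildLinksCache( string $pageTitle ): bool {
    // TEMP HACK (live): severe cache contention on this one article.
    // Skip the rebuild here; remove this check once the caching bug is fixed!
    if ( $pageTitle === 'Michael Jackson' ) {
        return false;
    }
    return true;
}
```

The point is that the hack disables one chunk of processing in one very particular circumstance, which is exactly why it has to be preserved (or properly replaced) rather than silently overwritten by the next deployment from trunk.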

-- brion
Re: Release process
On Fri, Oct 22, 2010 at 2:18 AM, Neil Kandalgaonkar <neilk@wikimedia.org> wrote:
>   if ( $wgFeature['MyNewThing'] ) {
>     /* ... new code ... */
>   } else {
>     /* ... old code ... */
>   }
>
The Aryeh method :) I must say I like it a lot.
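For completeness, the quoted pattern can be fleshed out into a self-contained fragment. The feature name follows the quoted example, but the `$wgFeature` array setup, the function wrapper, and the two return strings are invented for illustration.

```php
<?php
// Minimal sketch of the feature-flag ("branching in code") pattern quoted
// above. New code ships in trunk but stays off by default until it's ready.
$wgFeature = [ 'MyNewThing' => false ];  // off by default for most environments

function renderGreeting( array $wgFeature ) {
    if ( $wgFeature['MyNewThing'] ) {
        return 'hello from the new code path';  // ... new code ...
    } else {
        return 'hello from the old code path';  // ... old code ...
    }
}
```

Flipping the flag in a site's configuration is then all it takes to switch between the two paths, with no branching or merging involved.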

Re: Release process
On Thu, Oct 21, 2010 at 8:18 PM, Neil Kandalgaonkar <neilk@wikimedia.org> wrote:
> Nope. Automated tests help a lot with this approach but Flickr doesn't
> have much better tests than MediaWiki does.
>
> We *should* have better tests, but I would just say that it is not
> required for us to have a great test suite before doing this.

Interesting.

> Doing things the Flickr way entirely would require:
>
> 1 - A "feature flag" system, for "branching in code". The point is to
> start developing a new feature with it being turned off by default for
> most environments and without succumbing to branching and merging
> misery. In other words, day one of a new feature looks like this:
>
>   if ( $wgFeature['MyNewThing'] ) {
>     /* ... new code ... */
>   } else {
>     /* ... old code ... */
>   }
>
> Of course if you're fixing bugs there's no need to hide that behind a
> feature flag.

I always do this anyway, personally (as apparently Bryan noticed).
Sometimes it can get cumbersome and hard to maintain, but thankfully
it means I've never had to learn how to use SVN branches. :)

> 2 - Every developer with commit access is thinking about deployment onto
> a cluster of machines all the time. Committing to the repository means
> you are asserting this will work in production. (This is the hard part
> for us, I think, but maybe not insurmountable).

This was true when we had regular deployments too. Anything that
wasn't ready for production yet would just get reverted.
TranslateWiki still runs on trunk, and its developers vigorously prod
anyone who breaks trunk. So I think we're okay on this score.

> 3 - One can deploy with a single button press (and there is a system
> recording what changes were deployed and why, for ops' convenience).

We have something like this . . .

> 4 - When there's trouble, new deploys can be blocked centrally, and then
> ops can revert to a previous version with a single button press.

. . . not this, I don't think, but doesn't sound too hard.

> 5 - Developers are good about "cleaning up" code that was previously
> protected by feature flags once the behaviour is standard. (HINT: this
> is the part Flickr doesn't talk about in public... but as an open source
> project with more visible dirty laundry, perhaps we can do better.)

This doesn't seem like it should be hard.
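Concretely, the cleanup step just means deleting the flag check and the old branch once the new behaviour is standard. Continuing the quoted `MyNewThing` example (function name and strings invented for illustration), the cleaned-up code collapses to:

```php
<?php
// After 'MyNewThing' becomes the standard behaviour, the feature flag and
// the old code path are deleted, leaving only the new path.
function renderGreeting() {
    return 'hello from the new code path';  // formerly guarded by the flag
}
```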


Anyway, it's something to think about once we get review and
deployment caught up. Maybe we should do daily deployment instead of
weekly, or even multiple-times-daily.
