Dead code

When I look for a module that fits my needs, I always ask myself:
Is this module maintained? Will I get bugfixes when I report bugs?
Will the API break soon?

The node- and npm-folks take pride in the growth rate of the Node community. In this JSConf talk, Mikeal Rogers shows graphs in which npm outnumbers Maven and is still the fastest-growing open-source ecosystem in the world: "240 new packages per day", "Make it super simple". The npmjs website itself states: "When everyone else is adding force, we work to reduce friction."

I agree, up to a point. It's great to reduce friction. A few years ago, I was thinking about publishing some Maven modules to Maven Central. I had a short look at the instructions and then decided it was too complicated and not worth the time (for a hobby). Instead, I did what I think many people do: set up my own Maven repository on my own server.
Last fall, I started publishing npm packages, and it was really easy: enter your email address (caution, it's public), enter a password, confirm, set up a local .npmrc and run npm publish.
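For reference, the whole flow boils down to two commands. This is only a rough sketch of the steps above; the exact prompts depend on your npm version:

    # create an account (or log in) -- npm asks for username, password and email
    # and stores the credentials in your local .npmrc
    npm adduser

    # inside the package directory: publish the package described by package.json
    npm publish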

That's probably a good thing, isn't it? It's good for authors and it's good for npm. It's also good for the community, because there are lots of packages for every purpose that can easily be used as dependencies.

But...

When I search npm, the quality of the resulting packages is often not as good as I would wish for. You search and click on a package: there is not even a README. You search and click on another package: it hasn't been modified for two years, but it's still at version 0.2.0. After clicking on multiple packages, you realize that they are all forks of the same GitHub project, published under different names. Which one should you take?

Is it just bad luck that I keep finding all these stale, undocumented and forked packages? Or do those packages make up a major part of the package pool on npm? That would clearly be the downside of "reducing friction".

I have done a little research on version numbers and modification times. For this research, I make a couple of assumptions:

Semantic versioning

The first assumption is that package authors generally use Semantic Versioning (SemVer) to version their modules. SemVer is a versioning scheme that makes the version number meaningful in a way computers can understand:

As long as you give it a version 0.x.x, you signal that the module is still in the "initial development phase", which basically means that you are still trying to find a good API. You are allowed to make arbitrary changes to the API until you give your module the version 1.0.0.

Once you do that, SemVer says: increase the patch version (e.g. 1.0.0 to 1.0.1) for bugfixes, increase the minor version (e.g. 1.0.0 to 1.1.0) for new features, and increase the major version (e.g. 1.0.0 to 2.0.0) for breaking API changes. Never release two versions of your software under the same version number.

I do not know how many Node developers really care about SemVer, but it is kind of built into npm. When you define dependencies for your package, you can say: "use version ^1.2.4". This basically means: "use any version from 1.2.4 up to (but excluding) 2.0.0". According to the SemVer specification, you can then be sure to depend on a version that contains the features you need and has no breaking changes.
So it's built in, and I think the usual Node developer knows that and tries to comply. Well, some may try harder than others, but I think it's a good assumption.
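As an illustration, this is what such a dependency declaration looks like in a package.json (the package name here is made up):

    {
      "dependencies": {
        "some-module": "^1.2.4"
      }
    }

With this range, npm will install anything from 1.2.4 up to, but not including, 2.0.0.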

Major Version Zero

As I said, versions like 0.1.2 are meant for initial development, until the package API is considered stable.

My second assumption is that if an author publishes a module with a zero major version, he thinks that the module is not ready yet.
Before I publish a module with a non-zero major version, I inspect its dependencies and make sure it does not depend on any package with a 0.x.x version.
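Such a check is easy to script. The following is only a sketch of the idea (not a published tool): it reads the package.json in the current directory and flags every dependency whose version range starts with a zero major version:

    // sketch: list dependencies that are declared with a 0.x.x version range
    var pkg = require('./package.json');
    var deps = pkg.dependencies || {};

    Object.keys(deps).forEach(function (name) {
      // matches ranges like "0.2.0", "~0.2.0" or "^0.2.0"
      if (/^[\^~]?0\./.test(deps[name])) {
        console.log(name + ' is still at a zero major version: ' + deps[name]);
      }
    });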

A lot of modules, like grunt, karma, pm2 and other packages from the "most downloaded" list on npm, have a 0.x.x version, so my assumption may be plain wrong.

Still, I think that when an author publishes a package and considers it production-ready, he should choose version 1.0.0. I think grunt is doing it wrong.

Research

I have replicated the npm registry to a local CouchDB instance to find out how many of the currently 176,159 packages are actually usable in terms of my assumptions.
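Replication itself is a single request to CouchDB's _replicate endpoint. This is just a sketch; I assume the public skimdb.npmjs.com replica as source and a local database called registry as target (a full replica takes a while and quite some disk space):

    curl -X POST http://localhost:5984/_replicate \
         -H "Content-Type: application/json" \
         -d '{"source": "https://skimdb.npmjs.com/registry", "target": "registry", "create_target": true}'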

I created two views.

  • zeroers contains all documents with a version number 0.x.x.
  • doublezero contains all documents with a version number 0.0.x. (I don't use this view in this blog post anymore, but I created it.)

Both views use the date of the last modification (npm publish) as key, so that it is possible to determine how many modules with a zero major version were published a certain time ago.
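To give an idea of what such a view looks like, here is a sketch of the zeroers map function. It assumes the usual structure of the registry documents (dist-tags.latest for the current version, time.modified for the last publish date); the actual view may differ in detail:

    // sketch of the "zeroers" map function: emit the last modification date
    // for every package whose latest version is 0.x.x
    function (doc) {
      var latest = doc['dist-tags'] && doc['dist-tags'].latest;
      if (latest && /^0\./.test(latest)) {
        emit(doc.time && doc.time.modified, latest);
      }
    }
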
This is a short summary of my results:

Of the currently 174,917 modules in my local replica, 121,705 have a version number 0.x.x, so from my stated point of view only the remaining 53,212, about a third of the libraries, can be used in production.
50,857 of the 0.x.x modules have been modified in 2015, so this is another rough third that is worth looking at, but that I would omit for production use.

Conclusion

About one third of the packages in the npm registry are neither complete (from the author's point of view) nor actively developed.

It would be nice to have a tool that hides those packages from the search results. I think the folks at npmjs.com would not do that, because they would consider it adding friction. If I find some time, I may set up a site myself that allows such filters.

For now, I would like to hear your thoughts about this. So please leave a comment below, contact me on Twitter (@knappi79), or use my npm email address to send me some mail.