Blog Rolling with MongoDB, Node.js and CoffeeScript

This morning I woke up with a lingering thought left over from recent conversations. In the technical community we often get so invested in our work that, rather than talk about the simple building blocks behind our success, we talk about the huge breakthroughs we make. The problem, however, is that our breakthroughs usually aren’t accessible to someone who just wants to get started. So today I will give an intro tutorial on using Node.js, CoffeeScript and MongoDB to build a simple blog. It builds on the concept of a tutorial I first used to learn Node.js more than a year ago, but takes a completely from-scratch approach. In this tutorial I will also cover practicing Behavior Driven Development using Mocha.

Streaming Files from MongoDB GridFS

Not too long ago I tweeted what I felt was a small triumph on my latest project: streaming files from MongoDB GridFS for downloads (rather than pulling the whole file into memory and then serving it up). I promised to blog about it, but my specific usage was too coupled to my project’s domain to show off as is. So I’ve put together an example node.js+GridFS application, shared it on GitHub, and will use this post to explain how I accomplished it. :)

GridFS module

First off, special props go to tjholowaychuk, who responded in the #node.js IRC channel when I asked if anyone had had luck using GridFS from mongoose. A lot of my resulting code is derived from a gist he shared with me. Anyway, to the code. I’ll describe how I’m using GridFS and, after laying the groundwork, illustrate how simple it is to stream files from GridFS.

I created a gridfs module that accesses GridStore through mongoose (which I use throughout my application), so it can share the db connection mongoose opens to the MongoDB server.

We can’t get files out of MongoDB if we can’t put anything into it, so let’s create a putFile operation.

This really just delegates to the putFile operation that GridStore provides as part of the mongodb module. I also have a little logic in place to parse options, providing defaults if none were given. One feature worth noting is that I store the filename in the metadata, because I ran into an odd issue where files retrieved from GridFS had the id as their filename (even though a look in mongo shows that the filename is in fact stored in the database).
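The option handling described above can be sketched in Java for illustration. This is a minimal sketch under stated assumptions: the option keys (`content_type`, `metadata`) and the `binary/octet-stream` default are hypothetical placeholders, not a confirmed mongodb driver API, and `PutFileOptions` is an illustrative name. It shows the two ideas from the paragraph: fill in defaults when the caller passes nothing, and copy the filename into metadata as a backup.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: option keys and the default content type are hypothetical placeholders.
final class PutFileOptions {
    static final String DEFAULT_CONTENT_TYPE = "binary/octet-stream";

    private PutFileOptions() {}

    /** Returns a new options map: defaults filled in, filename copied into metadata. */
    static Map<String, Object> withDefaults(Map<String, Object> options, String filename) {
        Map<String, Object> result = new HashMap<>();
        if (options != null) {
            result.putAll(options); // caller-supplied values win over defaults
        }
        result.putIfAbsent("content_type", DEFAULT_CONTENT_TYPE);

        // Copy any existing metadata so the caller's map is never mutated.
        Map<String, Object> metadata = new HashMap<>();
        Object existing = result.get("metadata");
        if (existing instanceof Map<?, ?>) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) existing).entrySet()) {
                metadata.put(String.valueOf(entry.getKey()), entry.getValue());
            }
        }
        // Keep the filename as a backup in case it comes back as the file id.
        metadata.put("filename", filename);
        result.put("metadata", metadata);
        return result;
    }
}
```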

Now the get operation. The original implementation simply passed the contents as a buffer to the provided callback by calling store.readBuffer(), but I’ve since changed it to pass the resulting store object to the callback. The value in this is that the caller can use the store object to access metadata, contentType, and other details, and can decide how to read the file (either into memory or through a ReadableStream).

The code has one small wart: it checks whether the filename and fileId are equal, and if they are, it checks whether metadata.filename is set and copies that value into store.filename. I’ve tabled the issue to investigate further later. :)
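The filename workaround amounts to a small fallback rule. A minimal Java sketch, assuming a hypothetical `resolveFilename` helper and a metadata map keyed by `filename` as stored by putFile:

```java
import java.util.Map;

final class FilenameFix {
    private FilenameFix() {}

    /**
     * Works around files coming back with their id as the filename:
     * if filename and fileId match, prefer metadata.filename when it is present.
     */
    static String resolveFilename(String filename, String fileId, Map<String, ?> metadata) {
        if (filename != null && filename.equals(fileId) && metadata != null) {
            Object stored = metadata.get("filename");
            if (stored instanceof String) {
                return (String) stored;
            }
        }
        return filename; // nothing better available, keep what the store reported
    }
}
```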

The Model

In my specific case, I wanted to attach files to a model. For this example, let’s pretend we have an Application for something (a job, a loan, etc.) that we can attach any number of files to: tax receipts, a completed application, or other scanned documents.

Here I define files as an array of Mixed object types (meaning they can be anything) and a method addFile, which takes an object containing at least a path and a filename attribute. It uses these to save the file to GridFS and stores the resulting GridStore file object in the files array (this contains things like an id, uploadDate, contentType, name, size, etc.).
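A rough Java analogue of the model, for illustration only: `Application`, `FileStorage`, and the map keys are hypothetical names, and storage is synchronous here, whereas the mongoose version works through callbacks. The point is the contract: `addFile` refuses uploads without a path and filename, hands them to storage, and keeps whatever file descriptor storage returns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the gridfs module's putFile operation. */
interface FileStorage {
    Map<String, Object> putFile(String path, String filename);
}

final class Application {
    // Each entry is the descriptor returned by storage (id, uploadDate, contentType, ...).
    private final List<Map<String, Object>> files = new ArrayList<>();

    /** Saves the upload through storage and records the returned file descriptor. */
    void addFile(Map<String, String> upload, FileStorage storage) {
        String path = Objects.requireNonNull(upload.get("path"), "upload needs a path");
        String filename = Objects.requireNonNull(upload.get("filename"), "upload needs a filename");
        files.add(storage.putFile(path, filename));
    }

    List<Map<String, Object>> getFiles() {
        return Collections.unmodifiableList(files);
    }
}
```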

Handling Requests

This all plugs into the request handler for form submissions to /new. All it does is create an Application model instance, add the uploaded file from the request (we named the file field “file”, hence req.files.file) and save it.

The sum of all this work lets us reap the rewards: downloading a requested file from GridFS is now super simple.

Here we simply look up a file by id, use the resulting file object to set the Content-Type and Content-Disposition headers, and finally use ReadableStream::pipe to write the file out to the response object (which is a WritableStream). This is the piece of magic that streams data from MongoDB to the client.
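The download path boils down to two steps: set the headers, then copy the file to the response in chunks instead of buffering it whole. A minimal Java sketch of both steps using plain `java.io` streams; the `FileDownload` class and the 8 KB chunk size are assumptions, and the filename is not escaped, so names containing quotes would need extra handling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

final class FileDownload {
    private static final int CHUNK_SIZE = 8192;

    private FileDownload() {}

    /** Builds the headers the handler sets before any body bytes are written. */
    static Map<String, String> headersFor(String contentType, String filename) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", contentType);
        headers.put("Content-Disposition", "attachment; filename=\"" + filename + "\"");
        return headers;
    }

    /** Copies source to sink one chunk at a time, so the whole file is never held in memory. */
    static long pipe(InputStream source, OutputStream sink) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        long total = 0;
        int read;
        while ((read = source.read(buffer)) != -1) {
            sink.write(buffer, 0, read);
            total += read;
        }
        sink.flush();
        return total;
    }
}
```

On Java 9 and later, `InputStream.transferTo(OutputStream)` performs the same chunked copy.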

Ideas

This is just a humble beginning. Other ideas include completely encapsulating gridfs within the model. Taking things further we could even turn the gridfs model into a mongoose plugin to allow completely blackboxed usage of gridfs.

Feel free to check the project out and let me know if you have ideas to take it even further. Fork away! :)

Enabling JMX in Gradle’s Jetty Plugin

It’s another day, which means another Gradle tip. I have been experimenting with JMX lately, using MBeanExporter to export Spring beans so that I can interact with them over JMX (specifically, stopping and starting RabbitMQ consumers). I can get this working on any container easily enough, but I really wanted it working with my local Jetty instance launched by Gradle.

First, set a jettyConfig for the jettyRun task. I usually do this for both jettyRun and jettyRunWar:

The additionalRuntimeJars setting is needed because of a transitive dependency on mx4j. I don’t know why, but it is required. I add mx4j as a providedRuntime dependency along with jetty-management:

Finally, you need to set up your Jetty configuration to start a JMX server. There’s some freedom in what you can do here, but here is one that I stole shamelessly from the Jetty website:

Now run gradle jettyRun, have jconsole open a remote connection to service:jmx:rmi://localhost:2100/jndi/rmi://localhost:2099/jmxrmi, and go do whatever you want with JMX. :)
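To see what that address means, it can be parsed with the JDK’s `JMXServiceURL`: the RMI connector is exported on port 2100, and its stub is bound in an RMI registry on port 2099 under the name `jmxrmi`. The sketch below also shows how a plain Java process (not Jetty) could expose its platform MBeanServer at the same address. The `JmxEndpoint` class is hypothetical and is not the jetty configuration referenced above.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical plain-Java illustration of the address jconsole connects to.
final class JmxEndpoint {
    // RMI connector exported on port 2100, stub bound in an RMI registry on port 2099.
    static final String ADDRESS =
            "service:jmx:rmi://localhost:2100/jndi/rmi://localhost:2099/jmxrmi";

    private JmxEndpoint() {}

    /** Parses ADDRESS; throws MalformedURLException (an IOException) if it is invalid. */
    static JMXServiceURL serviceUrl() throws IOException {
        return new JMXServiceURL(ADDRESS);
    }

    /** Starts a registry on 2099 and exposes the platform MBeanServer at ADDRESS. */
    static JMXConnectorServer start() throws IOException {
        LocateRegistry.createRegistry(2099);
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                serviceUrl(), null, ManagementFactory.getPlatformMBeanServer());
        server.start();
        return server;
    }
}
```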

Gradle Tip: Start/Stop Embedded Jetty for System Tests

I thought I’d share another Gradle feature I have found extremely useful: starting and stopping an embedded Jetty server when my tests run. This is really useful for projects that host web services, as it lets me hit them and verify the results, and it also verifies that the full stack is configured correctly. One could also use this setup on web projects and run Geb-based tests against them.

Given a project set up with the Jetty plugin as I described in my previous post, all you need to do is hook Jetty in to run before and after the test task:

And that’s it. Now whenever you run gradle test, the embedded Jetty server will run alongside your tests.
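For comparison, the same start-before, stop-after shape can be written in plain Java with the JDK’s built-in `com.sun.net.httpserver` server and the `java.net.http` client (Java 11+). This is a stand-in to illustrate the pattern, not the Jetty/Gradle hooks themselves: the `/ping` endpoint and the `SystemTestHarness` class are hypothetical, and the server binds an ephemeral port so runs don’t collide.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class SystemTestHarness {
    private SystemTestHarness() {}

    /** Starts a throwaway server on an ephemeral loopback port with one /ping endpoint. */
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/ping", exchange -> {
            byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Start before the check, run it, and always stop the server afterwards. */
    static String pingOnce() throws IOException, InterruptedException {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://127.0.0.1:" + port + "/ping")).build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            server.stop(0);
        }
    }
}
```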

Gradle: Using JNDI with the Jetty Plugin

I use Gradle a lot at work, and one discovery that was a true win was figuring out how to fake JNDI when using the jettyRun task for local development. Googling and searching the documentation didn’t turn up anything at first, so I thought I’d write a quick post on how to do it in case you’re googling for the same thing.

First off you need to create a jetty-env configuration file and put it somewhere in your project (I prefer src/test/resources). Here’s a sample of one I use that uses H2 for the dataSource:

This uses an already running H2 instance and runs an init script located in src/test/resources to populate the database with some tables. From our Gradle script we need to reference the file from the jettyRun task (I add it to the jettyRunWar task as well).

Finally, to complete this example, we want H2 running before Jetty kicks off, so we add the H2 dependency to our build script and run the main method of org.h2.tools.Server.

I’ve created a sample Spring MVC 3 project that uses all of the above for local development. Just clone it and run gradle jettyRun to see it in action!

Big Company vs. Small Company

The other day I was having lunch with a friend of mine who works for a medium-sized company (by medium-sized I mean large, but not Fortune 500 large). Our discussion touched on a variety of topics, but one that caught my attention was when he voiced his frustration with his current project. “We’re not doing much programming right now,” he quipped, “for the most part we’re doing static content management and updating pages that are basically a ‘Recent News’ section for the organization.” With all respect to my friend (who is a really good programmer), this discussion reminded me of what I dislike about companies with large company mindsets.

It’s hard for me to put my finger on it, but the gist is that developers paid $60,000+ a year were doing static content management, while a $15-an-hour developer fresh out of college would most likely set up Drupal or a WordPress blog, let them update it, and focus on more important things. I’d know… I was once that $15-an-hour developer.

It’s a common problem I’ve seen in large companies: problem A is easily solved by tool B, but tool B is written in language C while the company has embraced language D (and its derivatives). No good equivalent to tool B exists in language D, so the developers spend lots of time doing manual work, and you end up with one to five developers with degrees updating text (I’ve seen it). The worst (and common) case is that the company spends a quarter million or more developing a poor imitation of tool B in language D that only works in house and can never be used outside the company.

Why is this? Why is it so hard to just use tool B and keep focusing on the more important tasks at hand? From what I’ve observed in my career, companies with large company mindsets rarely consider using languages outside their core language choice. Doing so requires server provisioning, training, hiring developers or server administrators experienced with language C… and quite possibly hiring consultants who specialize in tool B. The prep for such a task could easily take six months or more, and in the end it will probably be decided that tool B isn’t up to the task.

Compare this to a more fluid development environment. The developer probably uses language D too, but knows tool B is the best choice to get the job done. He’d probably have the company drop 40 stones on a Rackspace instance, install tool B and language C on it, and integrate it with the current site written in language D. I’ve seen this done in days, if not hours.

Notice I try to emphasize “big company mindset” over “big companies” as the guilty party in this tomfoolery. Just because you’re big doesn’t mean you have to act like this, and I’ve seen small companies engage in this behavior as well. “Large company minded” companies tend to prefer a great deal of process in the way they do business and equate success with solid, foolproof process that anyone can follow. Don’t get me wrong, I’m not saying the absolute chaos of no process is a good thing, but I believe a business needs a certain degree of fluidity to be successful. Why not just do a quick cost analysis of running language C and tool B on a single server (probably even one already running some other application server) versus redeveloping tool B in language D? Why not determine which is the better option and just do it, instead of lollygagging around and wasting your company’s money?

Accessing a Connect Session From Socket.IO

UPDATE 11/28/2011: After talking with TJ Holowaychuk I discovered I was doing it wrong… there are better ways to do this than the hack I had come up with. Using connect to parse the cookie, you can use this instead:

Which is much cleaner than what I originally posted. The original post is intact for historical reasons. :)

I thought I’d post on a technique I’ve been using to associate the user’s session with a socket.io server. Although I used this technique in a pure node.js app, it’s probably possible to do the same to grab the session id from a PHP or Grails app that uses socket.io.

Anyhow, here’s what I’m rolling with:

Yep, it’s sneaky. I sniff the sid out and simply use redisClient (since redis is my backing session store) to look the session up. Now on all socket requests I can access the session directly. :)
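The sniffing step can be sketched in Java: pull `connect.sid` (Connect’s default session cookie name) out of the raw Cookie header, then look the session up by that id. In this minimal sketch a `Map` stands in for the redis client, and `SessionLookup` is a hypothetical name. Depending on the Connect version, the cookie value may also be signed, which this sketch does not handle.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;

final class SessionLookup {
    private SessionLookup() {}

    /** Pulls one cookie's value out of a raw Cookie header such as "a=1; connect.sid=xyz". */
    static Optional<String> cookieValue(String cookieHeader, String name) {
        if (cookieHeader == null) {
            return Optional.empty();
        }
        for (String pair : cookieHeader.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).trim().equals(name)) {
                // Cookie values are often URL-encoded, so decode before using as a key.
                return Optional.of(URLDecoder.decode(pair.substring(eq + 1).trim(), StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }

    /** Looks the session up by its sid; the Map stands in for the redis-backed session store. */
    static Optional<Map<String, Object>> sessionFor(String cookieHeader, Map<String, Map<String, Object>> store) {
        return cookieValue(cookieHeader, "connect.sid").map(store::get);
    }
}
```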

My Current Java Workflow

I thought I’d share the current workflow I have been using at work and why I think it’s been a pretty decent scheme so far. For this post I’m only covering jar deployment but I could see it easily working the same for gems or node.js npm packages.

Just for a bit of background, here’s some of the tooling we’re using:

For this example, let’s create a jar that is just a simple wrapper around Joiner.join in Google Guava. Yes, this is pretty lame and has no real-world use, but we don’t want to distract from the point of this post: setting up a pragmatic build system.

To start, we create a new project and the various files it needs. Create a directory for the module (for this project we’re calling it module-example), cd into it and type “git flow init”. You’ll be prompted for branch names, but I always just accept the defaults. If you type git branch you’ll see that you have two branches, develop and master, with develop checked out.

Now it’s time to create our initial project structure. I create build.gradle, settings.gradle, and gradle.properties with the following contents:


All this does is specify what gradle plugins we’ll be using, dependencies, and where to resolve our dependencies from. The javadocJar and sourcesJar tasks are just to produce javadoc and sources jars as part of the build task.

I’ll explain the maven plugin in better detail later.


This just specifies the groupId and the initial integration version. Nuff said.


Unfortunately, Gradle uses the folder name as the module name, which will be “workspace” by default in Jenkins. You can change this in Jenkins, but I prefer setting the module name explicitly to ensure the right name is always used regardless of the directory name.


This is my .gitignore to ignore Eclipse and Gradle generated artifacts. I’ve learned that committing IDE files causes lots of problems, and you also don’t want to commit your compiled jars/classes/test-reports/etc. from Gradle.

With everything in place, I do a “git add -A” and “git commit -am ‘initial commit’” to start tracking files. Finally, I do a “git push --all” to push all branches to GitHub. You can see my initial project structure here.

Creating the First Feature

To start with our first feature we use git flow to create a new feature branch.

git flow feature start the-combiner

This creates and checks out a new feature branch from develop. Now I just create a simple class named Combiner that looks like the following:
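A minimal sketch of such a class: the constructor and the `combine` method name are assumptions, and the JDK’s `String.join` stands in for Guava’s `Joiner` so the example compiles without extra dependencies. One behavioral difference: `Joiner` throws on null elements by default, while `String.join` writes them as `"null"`.

```java
import java.util.Objects;

/** Thin wrapper around a join; String.join stands in for Guava's Joiner here. */
final class Combiner {
    private final String separator;

    Combiner(String separator) {
        this.separator = Objects.requireNonNull(separator, "separator");
    }

    /** Joins the parts with the configured separator; no parts yields an empty string. */
    String combine(String... parts) {
        return String.join(separator, parts);
    }
}
```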

With the feature finished, we add the untracked file, commit, and type “git flow feature finish” to merge our changes back into develop. We also push the develop branch back to GitHub.

Now Would Be a Good Time to Setup Jenkins

I’m going to assume Jenkins is already set up. You’ll need these plugins installed:

  • Jenkins Artifactory Plugin
  • Jenkins Gradle plugin
  • Jenkins GIT plugin

I then set up a Jenkins job called “module-example snapshot”. It checks out any pushes to the develop branch, runs the gradle build task (which runs the tests and produces artifacts when they pass) and then pushes a snapshot release to our in-house Artifactory server. This means any push to develop triggers a build that releases a snapshot jar of that module for others to use in their development. For this I use the following Artifactory plugin settings:

Next, I create a job named “module-example release” which does the same as above except it watches and checks out the master branch and publishes a release artifact rather than a snapshot.

Finally, I have a job named “module-example post-release” that checks out develop, bumps the version number to the next integration release (1.0.1-SNAPSHOT) and pushes it back to the develop branch. I have not found an easy way to do this in Jenkins, so I simply have a task like the following in my scripts to bump the snapshot version:
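The bump logic itself is small. A Java sketch of it, assuming the version lives under a `version` key in gradle.properties and follows a dotted numeric scheme; `VersionBump` is a hypothetical name and not the task from the scripts mentioned above.

```java
import java.util.Properties;

final class VersionBump {
    private static final String SNAPSHOT = "-SNAPSHOT";

    private VersionBump() {}

    /** "1.0.0-SNAPSHOT" becomes "1.0.0"; versions without the suffix are returned unchanged. */
    static String releaseVersion(String version) {
        return version.endsWith(SNAPSHOT)
                ? version.substring(0, version.length() - SNAPSHOT.length())
                : version;
    }

    /** "1.0.0" becomes "1.0.1-SNAPSHOT": bump the last numeric segment and mark it as integration. */
    static String nextSnapshot(String version) {
        String release = releaseVersion(version);
        int lastDot = release.lastIndexOf('.');
        int last = Integer.parseInt(release.substring(lastDot + 1));
        return release.substring(0, lastDot + 1) + (last + 1) + SNAPSHOT;
    }

    /** Applies the bump to the "version" key of already-loaded gradle.properties. */
    static void bump(Properties gradleProperties) {
        gradleProperties.setProperty("version", nextSnapshot(gradleProperties.getProperty("version")));
    }
}
```

`releaseVersion` is the same edit made by hand on gradle.properties when a release branch is started.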

If we push to develop now, we should see a new jar with the version 1.0.0-SNAPSHOT show up in our artifactory snapshots repository. :)

Let’s do a Release

Doing a release with git flow is pretty easy. We’re ready to deploy version 1.0.0 of our new module so we type

git flow release start 1.0.0

We now edit our gradle.properties file and take the -SNAPSHOT off the version number and do any other pre-release activities that might be needed.

Once we’re ready we run

git flow release finish 1.0.0 && git push --all

And sit back and watch the magic happen… 1.0.0 of our module is released to our artifactory server for others to use and when we fetch the latest upstream the next version is 1.0.1-SNAPSHOT in the develop branch.

Thoughts

Although the initial setup requires a little investment up front, the flow of actually working and releasing pays huge dividends IMHO. Our build script knows next to nothing about what SCM we use or where artifacts get published… it’s only concerned with what it is meant for: building. Releasing is just a matter of running two commands and pushing master. Another nicety is that master ALWAYS reflects something that could be released… there’s no confusion about whether what’s in master is releasable.

In the near future I’ll follow up more with our evolving process of deploying JEE web applications using this process.

Playing To Win

Previously I posted about the sad drudgery developers often have to deal with in large companies. I’ve decided to follow up on that post because of two epic milestones this past week: taking RabbitMQ live within the company and getting GitHub blessed by legal and infrastructure for internal use. My previous post was a rant in response to port 22 being blocked, which prevented us developers from using GitHub, so I thought I’d detail how I got past the resistance and how you too, as a corporate rebel, can help bring change to your company. :)

First, let me set the stage. In my experience, larger companies tend to err on the side of caution when it comes to introducing new technologies/ideas/practices. This can be due to being burned in the past, combined with putting more process or control in place to deal with social scalability issues. The latter boils down to it being easier to trust the devs you work with or sit near than 200 or so developers, a percentage of whom have sadly probably downloaded viruses or done other silly things. There’s always a lot of resistance within these types of companies, as adopting something new poses a risk that quite frankly some people would rather do without. So, given my successes in the last couple of weeks, I’ve decided to share some of my techniques for battling this resistance.

  1. Research. Do a lot of experimentation and research on what you want to bring in. Sometimes this just means using it on personal or side projects. I’ve also found that writing articles and blog posts and presenting at user groups on the topic is a GREAT way to learn more about it.
  2. Share Information. The best way to help a new idea gain traction is to share information about it. Brown bag lunches and internal presentations are the gold standard here: just give presentations and demos internally to those who are interested, without implying that you’re hoping the company will adopt it. Not only will this expose people to it, but it will also give you insight into what kinds of concerns people might have. In addition to presentations, also share links from time to time to case studies, tutorials, and the like. Extra bonus: you might be the catalyst that inspires someone else to help push the idea forward.
  3. Find an ally. In my situation with RabbitMQ, I found allies in different departments and even on the business side. These “alliances” came in strong for the win at the 11th hour, especially when the dreaded “what is the business justification for…” resistance tactic appeared. When that question came up, my product owner was able to pipe up and make a strong case for why he needed what I was proposing. Not only that, but he was able to help coordinate with managers in other departments to ensure we succeeded.
  4. Be there. Everyone hates 1am production support issues. Remember that what you’re trying to bring to the company will undoubtedly cost someone sleep sooner or later, and they know it. Be ready to take the fallout from the risk your change brings and to respond to production support calls. The first night we took RabbitMQ live I had to bust my butt and head down to the data center at two in the morning to respond to “queue depth limit exceeded” alerts, because message consumers weren’t able to keep up with the heavy load from publishers. The moral is: be available and ready to respond to those situations. Make sure others know that you are taking responsibility for the risks you are bringing. Bonus points for outlining those risks and how you will counter them.
  5. Never give up. If you’re convinced that something will improve your company and make a change for the better, stand behind it 100%. No matter how many times you fail, keep pushing it; push until it breaks. Many times I’ve seen people say things like “it’d be a lot better if we used something like Hadoop instead of our homebrew framework” and then not do squat about it. They might even defeat themselves by making an excuse for why it probably won’t do well in our environment. Don’t let that happen! Throw your full weight behind it and don’t give up.

Hope that helps for all of you other change agents out there! :)

Respect

So the past couple of weeks at work have left me with a bad taste in my mouth, one I thought I’d reflect on. Granted, it has been an awesome couple of weeks with lots of progress, but there are still those small nagging issues: namely, the silly idea in the corporate world that the developers you hire must be shielded from doing damage to themselves.

You know what I’m talking about… organizations that block tons of websites and introduce draconian network policies that are completely prohibitive. Want to share your Delicious bookmarks on Gradle with your fellow teammates? Too bad, it’s blocked. Want to download a zip file of an example application similar to what you’re working on? No way; even though our proxy would detect a virus, we’re still blocking it from you. Want to push your personal development project to GitHub? No dice… we turned port 22 off. Only for the developers, though; operations can still ssh out. Oh yeah, and so can developers who aren’t at YOUR geographical location. And no, this isn’t an accident; there are security reasons why we did this, just for your development group. I guess I should be happy that we can actually download jars so we can resolve dependencies through Gradle. Yeah, a couple of years back the place blocked jars from being downloaded, and you had to come up with a “business justification” to have something like Spring downloaded.

I know, I’m venting. But there’s something to be said about such “security measures”: a lack of respect. Anyone under such restrictions will simply feel like they’re being treated like a child and feel a little disgusted about it. Especially when other employees aren’t subjected to such restrictions. When a company does that all it does is place their developers at the bottom of the food chain and you can’t help but wonder how long you’re going to keep those devs before they go on to greener pastures.

In this age a company can’t afford to prohibit their developers from being able to get their job done. Innovative companies know this and openly embrace new technologies and allow their developers to experiment and even open source useful libraries. It’s time we recognized that to truly be successful we need to hire and retain people who are technically competent enough that we know they won’t click on some random facebook link to a virus. People that we know are responsible enough to make the right decisions when evaluating adopting a new technology. And above all, companies that truly succeed recognize that trust and respect are amongst the most important attributes amongst peers.

Flame off. Hopefully tomorrow I can get this dang port 22 unblocked and start our trial of using github. ;)
