Last night I presented at the West Michigan .NET Users Group. This was a Special Interest Group (SIG) focusing on ALT.NET. The talk was called ‘Unapologetically MEAN – The why & how of the MEAN Stack’ (slides here). As part of the talk I set up a MEAN app called ‘The Programmer Challenge’. The source code is available on GitHub, and the instructions for running the application are in the README file. If you have any issues getting it running, just let me know.
A special thanks to OST for hosting the event!
I just updated judo.js to v0.2.1. This release adds a phantomProcs option that sets the number of PhantomJS processes to run concurrently while generating snapshots. It also makes callbacks optional, which was a minor oversight in the first release. Let me know what judo is missing or how it could be improved.
Luckily this is not a new issue and Google has laid out a strategy for crawling AJAX sites. This strategy dictates that when a request from a crawler for a dynamic page comes to the server, you serve a static HTML version of the content. The crawler typically makes the request by replacing the hash fragment (e.g. #!) with "?_escaped_fragment_=" (or "&_escaped_fragment_=" if the URL already has a query string), transforming the dynamic AJAX route into query parameters for your server to parse. For my site I chose to use HTML5 pushState for my URLs since Angular provides some nice support for it. This allows me to use clean URLs without a hash fragment in them, but it also means that the crawler does not see the URLs as being dynamic. It will see them as plain URLs that it expects to be able to pull content from. To get around this you need to include the following meta tag in the head of your HTML: <meta name="fragment" content="!" />. This tells the crawler to include the _escaped_fragment_ query parameter when making requests to the server.
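To make the URL mapping concrete, here is a minimal TypeScript sketch of the rewrite a crawler performs under this scheme. The function names and the example.com URLs are made up for illustration, and the escaping is simplified to a handful of characters, so treat it as an approximation rather than a full implementation of Google's specification.

```typescript
// Hypothetical helper: simplified escaping of the characters that would
// otherwise break the query string (space, #, %, &, +).
function escapeFragment(fragment: string): string {
  return fragment.replace(/[ #%&+]/g, function (ch) {
    return "%" + ch.charCodeAt(0).toString(16).toUpperCase();
  });
}

// Hypothetical helper: turns the URL a user sees into the URL a crawler requests.
// hasFragmentMeta says whether the page contains <meta name="fragment" content="!">.
function toCrawlerUrl(prettyUrl: string, hasFragmentMeta: boolean = false): string {
  const bangIndex = prettyUrl.indexOf("#!");
  if (bangIndex !== -1) {
    // Hash-bang URL: everything after "#!" moves into the query string.
    const base = prettyUrl.slice(0, bangIndex);
    const fragment = prettyUrl.slice(bangIndex + 2);
    const separator = base.indexOf("?") === -1 ? "?" : "&";
    return base + separator + "_escaped_fragment_=" + escapeFragment(fragment);
  }
  if (hasFragmentMeta) {
    // pushState URL with the meta tag: an empty _escaped_fragment_ is appended.
    const separator = prettyUrl.indexOf("?") === -1 ? "?" : "&";
    return prettyUrl + separator + "_escaped_fragment_=";
  }
  // Plain URL without the meta tag: requested as-is.
  return prettyUrl;
}

console.log(toCrawlerUrl("http://example.com/#!/books/42"));
console.log(toCrawlerUrl("http://example.com/books/42", true));
```

The second call is the pushState case described above: the path stays intact and the crawler simply adds an empty _escaped_fragment_ parameter, which is what the server has to look for.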
On the back end you’ll have to configure your server to catch requests with the _escaped_fragment_ parameter and route them accordingly. For my setup I’m using NGINX and its HttpRewriteModule to redirect these requests to static HTML snapshots that I’ve already generated for the site. When I initially started digging into SEO I found a blog post specifically talking about AngularJS and SEO. The post describes how to generate static snapshots using PhantomJS, a headless WebKit browser. It’s definitely worth reading, so check it out when you have time.
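My actual routing lives in the NGINX config, but the logic can be sketched in TypeScript. The snippet below is only an illustration of the idea: the function name, the /snapshots directory and the "route plus .html" file naming are assumptions for the example, not the layout my site or Judo uses.

```typescript
// Hypothetical helper: maps a crawler request to a snapshot file path,
// or returns null when the request should go to the normal application.
function snapshotPathFor(requestUrl: string, snapshotRoot: string = "/snapshots"): string | null {
  const queryStart = requestUrl.indexOf("?");
  if (queryStart === -1) {
    return null; // no query string, so no _escaped_fragment_
  }
  const path = requestUrl.slice(0, queryStart);
  const params = requestUrl.slice(queryStart + 1).split("&");
  const key = "_escaped_fragment_=";

  let fragment: string | null = null;
  for (const param of params) {
    if (param.indexOf(key) === 0) {
      try {
        fragment = decodeURIComponent(param.slice(key.length));
      } catch (e) {
        return null; // malformed escape sequence
      }
    }
  }
  if (fragment === null) {
    return null; // not a crawler request
  }

  // pushState sites get an empty fragment, so the path identifies the page.
  // Hash-bang sites carry the route inside the fragment itself.
  const route = fragment !== "" ? fragment : path;
  if (route.indexOf("..") !== -1) {
    return null; // refuse anything that could escape the snapshot directory
  }
  const trimmed = route.replace(/^\/+|\/+$/g, "");
  const name = trimmed === "" ? "index" : trimmed;
  return snapshotRoot + "/" + name + ".html";
}

console.log(snapshotPathFor("/books/42?_escaped_fragment_="));
console.log(snapshotPathFor("/books/42"));
```

Returning null for everything that is not a crawler request keeps the rule narrow: regular visitors never see the static files.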
After implementing my own solution for creating snapshots, I decided to create a node.js module to help others out too. Judo.js is a node module meant to help users generate sitemaps & HTML snapshots for their sites. The basic idea is that you describe your URLs to Judo via a configuration object and it takes care of the creation part. One nice feature of Judo is that you can configure the freshness of the snapshots per URL. This makes sense when you have a lot of URLs for your site: you may only want some snapshots to be re-generated monthly, while others need to be re-generated daily. Another nice feature is that you can have Judo generate one snapshot for a given URL and then write multiple HTML snapshot files from it. This can save time when you have duplicate URLs pointing to the same content.
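The per-URL freshness idea can be illustrated with a small TypeScript sketch. To be clear, this is not Judo's API or its configuration format: the SnapshotEntry shape, the freshness names and the 30-day month are all assumptions made up for the example. Check the Judo README for the real options.

```typescript
// Hypothetical freshness levels and their maximum ages in milliseconds.
type Freshness = "daily" | "weekly" | "monthly";

const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_AGE_MS: Record<Freshness, number> = {
  daily: DAY_MS,
  weekly: 7 * DAY_MS,
  monthly: 30 * DAY_MS, // simplification: a month is treated as 30 days
};

// Hypothetical description of one URL to snapshot.
interface SnapshotEntry {
  url: string;
  freshness: Freshness;
  lastGenerated: number | null; // epoch milliseconds, null if never generated
}

// A snapshot needs a refresh if it was never generated or is older than allowed.
function needsRefresh(entry: SnapshotEntry, now: number): boolean {
  if (entry.lastGenerated === null) {
    return true;
  }
  return now - entry.lastGenerated >= MAX_AGE_MS[entry.freshness];
}

// Returns the URLs whose snapshots should be re-generated on this run.
function urlsToRegenerate(entries: SnapshotEntry[], now: number): string[] {
  return entries
    .filter(function (entry) { return needsRefresh(entry, now); })
    .map(function (entry) { return entry.url; });
}

const now = 40 * DAY_MS;
const entries: SnapshotEntry[] = [
  { url: "/news", freshness: "daily", lastGenerated: now - 2 * DAY_MS },
  { url: "/about", freshness: "monthly", lastGenerated: now - 2 * DAY_MS },
  { url: "/new-page", freshness: "daily", lastGenerated: null },
];
console.log(urlsToRegenerate(entries, now));
```

With a check like this, a daily cron run only pays the PhantomJS rendering cost for the pages that actually went stale.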
I’ve configured Judo to run in two ways: as a scheduled cron job that updates all of my site content, and integrated directly into my application so that it generates snapshots on the fly for newly created content. My main reason for generating snapshots on the fly is not search engine crawlers but the Facebook plugins on my site. When you click the ‘Share’ button on my site, Facebook crawls the page and stores the data for others to share as well. So I have to serve snapshots to Facebook when people share content, and because that content can be created at any time by a user, the snapshot has to be created on the fly. Note: Unfortunately, Facebook doesn’t respect the meta tag described above for dynamic content without hash fragments. So in my NGINX config file I sniff out the Facebook crawler so that I can treat it as a crawler and serve it the snapshots.
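The sniffing itself happens in NGINX on my site, but the decision can be sketched in TypeScript. This version assumes the check is done on the User-Agent header: Facebook's crawler identifies itself as facebookexternalhit (and sometimes Facebot). The function names are hypothetical and only illustrate the decision.

```typescript
// Facebook's crawler announces itself in the User-Agent header.
function isFacebookCrawler(userAgent: string | undefined): boolean {
  if (!userAgent) {
    return false;
  }
  return /facebookexternalhit|facebot/i.test(userAgent);
}

// Hypothetical helper: serve a snapshot either when the crawler asked for one
// via _escaped_fragment_, or when the request comes from Facebook's crawler,
// which does not send that parameter for pushState URLs.
function shouldServeSnapshot(requestUrl: string, userAgent: string | undefined): boolean {
  const hasEscapedFragment = /[?&]_escaped_fragment_=/.test(requestUrl);
  return hasEscapedFragment || isFacebookCrawler(userAgent);
}

const facebookUa = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)";
console.log(shouldServeSnapshot("/books/42", facebookUa));
console.log(shouldServeSnapshot("/books/42", "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"));
```

User agents are easy to spoof, so this should only ever decide which representation of public content to serve, never anything security related.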
I hope that the information in this post will be helpful to others building MEAN/dynamic sites. If not, let me know. Also, if you are starting to build a new site that uses some kind of front end framework, please think about SEO up front; otherwise it may cost you in your rankings. Lastly, please check out Judo if you are looking at generating a sitemap/snapshots for your dynamic site. It’s available as a package on NPM. I welcome comments/suggestions.
Last night I presented at GR Web Dev on my experiences building The Red Book on a MEAN (MongoDB, Express, Angular, Node) stack. Check out my slides here. To learn more check out the following links on some of the technologies I presented on.
- MongoDB – NoSQL Database
- Express – Web Application Framework for Node
- Batarang – AngularJS Development Chrome Extension
- PhantomJS – Headless WebKit Browser
- Yeoman – Modern Workflow for Modern Web Apps
- WebStorm IDE – IDE for Web/Node Development