Rob Dodson talks internets


My First Chain

May 19th, 2012

Back in April I was reading Hacker News when I came across a blog post titled ‘366 or How I Tricked Myself into Being Awesome’. It was written by a fellow named Chris Strom, hosted on Blogspot, and mostly unstyled.

Chris wrote every single day for 366 days and in so doing self-published three books on programming languages that he knew nothing about when he started. His post celebrated that milestone. When I read it I thought, “I can totally do this,” meaning that if I follow the steps Chris outlined I can potentially trick myself into becoming a blogger.

That probably seems rather silly to say but it’s true. Every developer I know Googles for answers when they’re stuck or trying to learn something new, and over and over again we end up on the same handful of blogs. If you’re a developer, or maybe if you’re just me, you totally revere the people who write them. They are guides in what is a truly unfamiliar world and they do it without asking anything in return.

So I set out to try my own chain. I have to keep writing until I leave for Europe on June 27th. At this moment I have 22 blog posts that I’ve written as a result of the chain; prior to that I’d written 3 in an entire year. At first I found the whole process exhilarating, until it started to get in the way of my personal life.

Now I have to figure out how to write something of substance while still balancing my job and my home life. This is not easy. It requires setting boundaries and self-discipline. I try to write in the mornings, usually between 7 and 9. Frequently I don’t finish and have to resume my posts in the evening. But working like this cuts into the time I can spend with my girlfriend, and that breaks one of my unspoken rules: writing should not disturb my normal social life. Getting to the first 10 posts this was not a problem, but now that I’m passing 20 it is. I’ve changed my writing style from full-blown tutorials to more of a play-by-play as I code. I’m always striving to be more succinct, but usually the real challenge is disappearing down a rabbit hole while I research something new, only to realize that I’ve blown half an hour of my writing time googling minutiae. I’m going to try to associate googling minutiae with some guy getting between me and my girlfriend. As a result I will want to stab googling minutiae.

Anyway, if you find this post and you’re thinking about writing, let me tell you that I highly recommend it. Here is some quick advice:

Don’t worry about what your blog looks like.

I’m 100% serious on this point. If you spend any time designing your blog before you write your first 5 articles then you’re doing it wrong. I have fallen into this trap innumerable times. Just accept this challenge: Make it to 10 blog posts, then you can redesign the thing.

I think we fall in love with the idea of having a beautiful blog and get lost designing and programming how everything will look. This is a mistake. Blogging is supposed to just be a journal of what you’re currently working on and thinking. Assume no one will read it (this is probably true). Once you have a few readers, work on the look and feel if you choose. Personally I’ve found that not worrying about the design at all has been incredibly freeing. Again, look at Chris Strom’s blog. He has a ton of readers, is a published author, and is running the default Blogspot theme.

Try to write at the same time every day

I find it easiest to focus in the morning, especially when everyone else is asleep. I think Chris works late at night. Figure out what time suits you and do your best to stick to it. When I finish a post in the morning I feel free for the whole rest of the day. It’s kind of cool to have that sense of accomplishment before arriving at work :D

Use the best tools you can

I tried writing in WordPress on several different occasions. I’ve also tried Tumblr and Posterous. I find writing in shitty WYSIWYG editors drives me totally crazy. There are apps out there that let you write in more of a desktop setting, but I’m not sure if they’re still subject to WordPress’s or Tumblr’s weird formatting. Basically, if you’re writing a code blog it fucking sucks to use a WYSIWYG editor because it’ll try to wrap all of your funky syntax in weird markup. I found Octopress and it’s been the best tool I’ve ever used for writing. I also wrote a little article on it if you’re trying to get it set up for your personal domain. Octopress is great because it uses Markdown (the same markup GitHub uses for its README files), there’s no database, and you can write in any text editor. I do all of my blogging in Sublime Text 2, often with my blog in one pane and my code in the other. Here’s a screenshot of what this can look like.

Don’t worry when no one reads it

Finally, don’t get too hung up on who is (or isn’t) reading your blog. I know that pretty much all of the visits I see in my Google Analytics are actually just me checking the site on my phone or laptop. Definitely do add analytics so you can see which posts are successful and which are not, but don’t expect to be Daring Fireball overnight. In fact, don’t ever expect to be Daring Fireball. Keep in mind that what you’re doing is a personal journal. Blogs aren’t always framed that way, but that’s what they’re best at. I often find this really interesting flow where I write down what I think I should build before I build it, then I write a test, then I write the implementation. Often I work out what I’m doing in the blog post well before I’ve even written the test. This is like a whole other kind of BDD: Blog Driven Development :) Use it for what it’s best at and you’ll find it rewarding.

Ok that’s it for now. Goodnight!

Backbone Boilerplate: Playing With Require.js

May 18th, 2012

I want to keep playing with require.js and AMD modules today so I can really internalize the concepts behind them. I’m going to go through the examples in the require.js documentation, starting with loading regular scripts and then defining modules and loading those.

Here is our boilerplate HTML. It’s a standard HTML5 file which just includes require.js at the bottom of the page.

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
  <meta name="viewport" content="width=device-width,initial-scale=1">

  <title>Require.js Sandbox</title>

  <!-- Application styles -->
  <link rel="stylesheet" href="/assets/css/index.css">
</head>

<body>
  <!-- Main container -->
  <div role="main" id="main"></div>

  <!-- Application source -->
  <script src="/assets/js/libs/require.js"></script>
</body>
</html>

I’m also going to define a file called foo.js, in the same folder as index.html, which just logs “Hello World!” to the console. To load it we’ll add the following script tag after the one that includes require.js:

<script>
    require(["foo"]);
</script>

And as expected the console outputs 'Hello World!'. Let's step it up a notch and define a module. Our first module will just return an object literal, like this example from the require.js docs: http://requirejs.org/docs/api.html#defsimple. It will be a Person module with our name and city. We'll place it in an app folder in the root of our project, so our structure looks like this:

index.html
app/
  person.js
assets/
  js/
    libs/
      require.js

The Person module just needs to call the define function, passing it an object literal as the argument. It looks like this:

app/person.js
define({
  name: "Rob Dodson",
  city: "San Francisco"
});

And in our updated index.html we're going to require that module.

Opening up that page in the browser should give us the proper output in the console.

AMD modules for dummies

Let's stop here for a moment to understand what's going on. In one file we called a define function and in another we called a require function. In the most basic sense this is all we really need to do to start using AMD. I think the concept of JavaScript modules is really weird for most folks, but if you're coming from a language like Java or Flash just think of define and require as the two halves of the import functionality you're used to: define declares what a file provides, and require asks for it. Require.js is going to make sure everything loads properly so long as we stick to this convention.

If you're coming from more of a design background and you're used to having one big JavaScript file, think of these modules as a way to break off pieces of code which you might otherwise put into separate script files. And I'm not talking one or two script files, I'm talking like 20 or 30. You could try to manage loading all of those dependencies yourself but that will be challenging. If you are building a blog then this probably isn't a big deal for you; a few included JS files are fine. But if you're trying to build a responsive web app for mobile then you're going to want to load only the bits of code you absolutely need. If a page doesn't need 90% of your JS then don't waste the time downloading it over a shitty AT&T connection.
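To make the define/require handshake concrete, here is a toy, synchronous module registry in plain JavaScript that runs in Node. It only mimics the shape of AMD: createLoader, load and the explicit module IDs are made up for this sketch, there is no cycle handling, and real require.js fetches each module's file asynchronously and derives the ID from its path.

```js
// Toy module registry that mimics the shape of AMD's define/require.
// Assumption: modules live in memory with explicit IDs and run synchronously;
// require.js instead loads each file over HTTP.
function createLoader() {
  var registry = {}; // id -> { deps, factory, ready, value }

  // define(id, [deps], factory) or define(id, objectLiteral)
  function define(id, deps, factory) {
    if (!Array.isArray(deps)) {
      factory = deps;
      deps = [];
    }
    registry[id] = { deps: deps, factory: factory, ready: false, value: undefined };
  }

  // Run a module's factory once, after its dependencies, and cache the result.
  function load(id) {
    var mod = registry[id];
    if (!mod) {
      throw new Error('Unknown module: ' + id);
    }
    if (!mod.ready) {
      var args = mod.deps.map(load);
      mod.value = typeof mod.factory === 'function'
        ? mod.factory.apply(null, args)
        : mod.factory; // plain objects, like app/person.js
      mod.ready = true;
    }
    return mod.value;
  }

  // require([ids], callback): resolve every ID, then hand the values over.
  function require(ids, callback) {
    var values = ids.map(load);
    if (callback) {
      callback.apply(null, values);
    }
  }

  return { define: define, require: require };
}

var amd = createLoader();
amd.define('app/person', { name: 'Rob Dodson', city: 'San Francisco' });
amd.require(['app/person'], function (person) {
  console.log(person.name + ' lives in ' + person.city);
});
```

Because load caches each module's value, every require of the same ID gets back the very same object.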

Ok let's write a module that's a bit more realistic. We'll use a function to return our object so it's kind of like a constructor.

app/monster.js
"use strict";

define(function () {
  var estimated_age = 99 + 1;
  var spookySaying = 'I vant to suck your blooood!';

  return {
    name: 'Dracula',
    home: 'Florida',
    age: estimated_age,
    saySomethingSpooky: function() {
      console.log(spookySaying);
    }
  };
});

This is a simple monster object. Notice that we create a variable called estimated_age right before defining our object literal, and then expose its value through the object's age property. If we ask for the monster's age it will return this value. It's worth noting that the estimated_age variable itself stays private since it only lives in the scope of the anonymous function returning our object literal. We've also got a method, saySomethingSpooky, which will print out another private variable, spookySaying. Wow, it's almost the JavaScript classes I've always dreamed of! Before you go thinking that, remember that modules are not instanceable. When you load a module it works almost like a Singleton (http://en.wikipedia.org/wiki/Singleton_pattern): require.js runs the factory function once and every module that requires app/monster gets that same object back. You can't go calling new on it all over the place... it just doesn't work that way. Don't get discouraged though, this is still pretty cool so let's continue...
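If you do need several monsters, one common workaround is to have the module return a constructor rather than a finished object. The module is still evaluated once and shared, but callers can create as many instances as they like. Here's a plain-JavaScript sketch; monsterModule and the second monster are made up for illustration, and in app/monster.js the function would be passed straight to define.

```js
// Hypothetical factory: in app/monster.js you would write define(monsterModule).
function monsterModule() {
  // Private to the module and shared by every instance.
  var spookySaying = 'I vant to suck your blooood!';

  function Monster(name, home) {
    this.name = name;
    this.home = home;
  }

  // Methods live on the prototype, so all instances share one copy.
  Monster.prototype.saySomethingSpooky = function () {
    return this.name + ' says: ' + spookySaying;
  };

  // The module's value is the constructor, not an instance.
  return Monster;
}

var Monster = monsterModule();
var dracula = new Monster('Dracula', 'Florida');
var count = new Monster('Count Orlok', 'Transylvania');
console.log(dracula.saySomethingSpooky());
console.log(count.saySomethingSpooky());
```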

Next up is a module with dependencies. We'll make the monster depend on his coffin.

app/coffin.js
"use strict";

define(function () {
  var color = 'Blackest black';

  return {
    color: color,
    open: function() {
      console.log('*creeeeeek*');
    }
  };
});

app/monster.js
"use strict";

define(['./coffin'], function (coffin) {
  var estimated_age = 99 + 1;
  var spookySaying = 'I vant to suck your blooood!';

  return {
    name: 'Dracula',
    home: 'Florida',
    age: estimated_age,
    saySomethingSpooky: function() {
      console.log(spookySaying);
    },
    goToSleep: function() {
      console.log('Time for bed!');
      coffin.open();
    }
  };
});

index.html
<script data-main="" src="/assets/js/libs/require.js"></script>
<script>
  require(['app/monster'], function(monster) {
    monster.saySomethingSpooky();
    monster.goToSleep();
  });
</script>

You can see that we’ve created a dependency for our monster: the coffin module has to load before the monster module is ready. Otherwise the monster wouldn’t be able to run goToSleep() properly. Require.js will sort all of this out so long as we declare our dependencies as the first argument to the define function.

We aren’t limited to objects, though. We can also return functions (which are objects in their own right). For instance, if we wanted to define a helper module that greets people we could do something like this:

app/greet.js
"use strict";

define(function () {
  return function(name) {
    return 'Why hello, ' + name;
  };
});

Then in our index we can call greet like any other function, even though it never becomes a global; require.js passes it into our callback.

index.html
require(['app/greet'], function(greet) {
  console.log(greet('Rob'));
});


Bear in mind that each module requires an HTTP request to load it, so you don’t want to go overboard defining helper function modules. Note the extra HTTP request in the profiler which loads greet.js.
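The usual answer to those extra requests is to run the require.js optimizer, r.js, before deploying: it traces the define dependencies and concatenates them into one file. A minimal build profile might look like the sketch below; the option names (baseUrl, name, out) are r.js’s own, but the values assume the folder layout used in this post and would need adjusting. It is run with node r.js -o build.js.

```js
// build.js: a minimal r.js build profile (sketch; values assume this post's layout)
({
  baseUrl: ".",               // module IDs like "app/monster" resolve from here
  name: "app/monster",        // entry module; its dependencies get traced and inlined
  out: "app/monster-built.js" // single concatenated (minified by default) output file
})
```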

Ok that’s it for today. I’ll try to continue on Saturday!

Getting Familiar With Backbone Boilerplate

May 17th, 2012

I have an upcoming project which uses Backbone and Node.js so I thought it would be good to blog about the topics (particularly Backbone) for a while to make sure I’m well up to speed.

We’re using the Backbone Boilerplate to get us started since it includes a bit of file structure and a build process. As they mention in the docs, you have to install Grunt if you want to use the build process they’ve stubbed out. Grunt is a JavaScript build tool which uses Node (think Rake in JS).

As a refresher course I’m going to dig into the open-source Backbone Fundamentals book by Addy Osmani.

First things first, though: after we have Node.js and Grunt installed we also need to install the bbb (backbone boilerplate build, I guess?) tool. You can grab it here.

We’ll create a new folder for our project and run bbb init. If all goes well it should stub out some project directories and files for us.

The Backbone Boilerplate templates

I’ll start with the index.html file. It seems like your standard HTML5 doc with the notable exception that it includes require.js at the bottom of the page.

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
  <meta name="viewport" content="width=device-width,initial-scale=1">

  <title>Backbone Boilerplate</title>

  <!-- Application styles -->
  <link rel="stylesheet" href="/assets/css/index.css">
</head>

<body>
  <!-- Main container -->
  <div role="main" id="main"></div>

  <!-- Application source -->
  <script data-main="app/config" src="/assets/js/libs/require.js"></script>
</body>
</html>

Require.js is a module and file loader which will help us manage our AMD modules. AMD (which stands for Asynchronous Module Definition) is a specification which details how to break JS down into modules that are loaded in, as needed, at runtime. Again we turn to Addy Osmani for a good explanation.

Take a look at this block:

<!-- Application source -->
<script data-main="app/config" src="/assets/js/libs/require.js"></script>

The data-main attribute in the script tag tells require.js what to load first. In this case it’s the app/config.js file. If you omit the .js extension, require.js adds it for you. If you include the .js, require.js treats the value as a plain path and uses it exactly as given. This distinction seems kind of trivial here, but later on, when you start configuring require.js with baseUrl and paths, it becomes more important, because a module ID that ends in .js skips that configuration.

Let’s look at that config file, shall we?

app/config.js
// Set the require.js configuration for your application.
require.config({
  // Initialize the application with the main application file
  deps: ["main"],

  paths: {
    // JavaScript folders
    libs: "../assets/js/libs",
    plugins: "../assets/js/plugins",

    // Libraries
    jquery: "../assets/js/libs/jquery",
    underscore: "../assets/js/libs/underscore",
    backbone: "../assets/js/libs/backbone",

    // Shim Plugin
    use: "../assets/js/plugins/use"
  },

  use: {
    backbone: {
      deps: ["use!underscore", "jquery"],
      attach: "Backbone"
    },

    underscore: {
      attach: "_"
    }
  }
});

One of the first things you can do with Require is to pass it a configuration object. The config object can be used for a ton of bootstrap options like setting paths, requiring other scripts, setting timeouts, etc. The first option we see here is deps: ["main"]. We can infer this is telling require to load our main.js file first. But how does it get the path to main.js? From the docs we see that since we haven’t defined a baseUrl property require is using the path from our data-main attribute.

If no baseUrl is explicitly set in the configuration, the default value will be the location of the HTML page that loads require.js. If a data-main attribute is used, that path will become the baseUrl.

So we know that our baseUrl is app/ and anything we require will be relative to that.

Next up we have this block:

paths: {
    // JavaScript folders
    libs: "../assets/js/libs",
    plugins: "../assets/js/plugins",

    // Libraries
    jquery: "../assets/js/libs/jquery",
    underscore: "../assets/js/libs/underscore",
    backbone: "../assets/js/libs/backbone",

    // Shim Plugin
    use: "../assets/js/plugins/use"
  },

The paths property defines paths relative to baseUrl. If we say

require(["libs/module"])

Require.js will look up the libs prefix in our paths config and load ../assets/js/libs/module.js relative to baseUrl. Most of these make sense till we hit the last line, which creates a path for the use plugin.
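Before digging into use, here’s a rough model of that lookup. moduleIdToUrl is a hypothetical helper, not require.js’s actual code, and it ignores map config and relative IDs such as ./coffin. It assumes baseUrl ends in a slash. Module IDs are matched against paths (longest prefix first), resolved against baseUrl, and given a .js extension; anything that ends in .js, starts with /, or carries a protocol is used as-is.

```js
// Hypothetical helper modelling how a module ID becomes a script URL.
// Simplified: no map config, no "./relative" IDs, no URL normalisation.
function moduleIdToUrl(id, baseUrl, paths) {
  // IDs that already look like URLs bypass baseUrl and paths entirely.
  if (/\.js$/.test(id) || id.charAt(0) === '/' || /^[a-z]+:/i.test(id)) {
    return id;
  }

  // Replace the longest matching prefix, e.g. "libs" in "libs/module".
  var parts = id.split('/');
  for (var i = parts.length; i > 0; i--) {
    var prefix = parts.slice(0, i).join('/');
    if (Object.prototype.hasOwnProperty.call(paths, prefix)) {
      parts.splice(0, i, paths[prefix]);
      break;
    }
  }

  var path = parts.join('/');
  // Absolute path entries are not prefixed with baseUrl.
  var url = path.charAt(0) === '/' ? path : baseUrl + path;
  return url + '.js';
}

var paths = { libs: '../assets/js/libs', jquery: '../assets/js/libs/jquery' };
console.log(moduleIdToUrl('libs/module', 'app/', paths));   // app/../assets/js/libs/module.js
console.log(moduleIdToUrl('main', 'app/', paths));          // app/main.js
console.log(moduleIdToUrl('app/config.js', 'app/', paths)); // app/config.js
```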

It seems like use was created by Tim Branyen, the author of the Backbone Boilerplate, to help with loading libraries that aren’t AMD compliant. Many popular libraries, including Underscore and Backbone itself, currently aren’t, so this makes sense. Instead of creating a shim for each of those libraries, the use plugin should take care of things for us. We can see how it’s used further down in the config file:

use: {
    backbone: {
      deps: ["use!underscore", "jquery"],
      attach: "Backbone"
    },

    underscore: {
      attach: "_"
    }
  }

Let’s start at the bottom, where underscore is defined and mapped to “_”. Libraries like this don’t call define themselves; they just create a global when their script runs, so attach tells the use plugin which global to hand back as the module’s value. For underscore that’s window._. Next we see that backbone is defined and depends on our version of underscore and on jquery. Since jQuery is AMD compliant we don’t need the use! prefix for it, but we do need it for underscore. Finally, backbone’s module value is taken from window.Backbone.
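For comparison, require.js 2.0 and later ship a built-in shim option that covers the same ground as the use plugin. Here’s a sketch of the equivalent config, assuming require.js 2.x rather than what the boilerplate ships; it’s held in a plain object so it can be inspected, and you would pass it to require.config. The deps no longer need the use! prefix, and exports plays the role of attach.

```js
// Sketch of an equivalent config for require.js 2.x (not what the boilerplate ships).
var requireConfig = {
  deps: ["main"],

  paths: {
    libs: "../assets/js/libs",
    plugins: "../assets/js/plugins",
    jquery: "../assets/js/libs/jquery",
    underscore: "../assets/js/libs/underscore",
    backbone: "../assets/js/libs/backbone"
  },

  // shim: for scripts that don't call define(), list their dependencies and
  // the global they create; require.js returns that global as the module value.
  shim: {
    backbone: {
      deps: ["underscore", "jquery"],
      exports: "Backbone"
    },
    underscore: {
      exports: "_"
    }
  }
};

// In the browser: require.config(requireConfig);
```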

That covers the configuration file. I’ll move on to main.js in the next post.

  • Time: 7:49 am
  • Mood: Awake, Tired, Lazy
  • Sleep: 7
  • Hunger: 4
  • Coffee: 0

Object Oriented Scraper Backed With Tests Pt. 8

May 16th, 2012

Yesterday I refactored my specs and crawler to support ignoring selections. When I started parsing the metadata I quickly realized that certain bits were rather specific and needed custom parsing methods. Today I’m going to write some format objects to help with all that.

Our metadata on the page looks like this:

Time: 7:42 am
Mood: Awake, Alert, Focused
Sleep: 6
Hunger: 0
Coffee: 0

Sleep, hunger and coffee are all floats, so one formatter could just be a FloatFormat. Mood should produce an Array of objects, so we could have a CollectionFormat. Finally, Time is going to combine the time listed in the metadata with the post date; we’ll make a DateTimeFormat for that. These could all be methods of one big Format object as well, but experience tells me that you need to be careful of monolithic actors that consume tons of different data types and spit out results. Those classes have a tendency to bloat very easily as project requirements change. I think it’s better to produce classes which can be extended or abstracted as needs arise.
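Here’s a rough sketch of the two simpler formatters. It assumes each formatter exposes a parse method that receives the pieces left over after splitting a line such as “Sleep: 5.5” on ':' (so [' 5.5']); that interface, and returning plain Strings for moods, are assumptions of this sketch rather than settled code.

```ruby
module Tentacles
  # Sketch: formatter for numeric categories (Sleep, Hunger, Coffee).
  # Assumes `content` is the array of pieces after splitting on ':'.
  class FloatFormat
    def parse(content)
      # Kernel#Float raises ArgumentError on junk instead of quietly returning 0.0
      Float(content.join(':').strip)
    end
  end

  # Sketch: formatter for comma separated categories (Mood).
  class CollectionFormat
    def parse(content)
      content.join(':').split(',').map(&:strip).reject(&:empty?)
    end
  end
end

p Tentacles::FloatFormat.new.parse([' 5.5'])                         # => 5.5
p Tentacles::CollectionFormat.new.parse([' Happy, Drowsy, Peaceful']) # => ["Happy", "Drowsy", "Peaceful"]
```

Kernel#Float is used rather than String#to_f so that a typo in the metadata fails loudly instead of silently becoming 0.0.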

So we know who is going to format but we still don’t know how. I think I’d like to build a manifest which matches the metadata category to a format. Maybe something like this?

{
  'Time'    => DateTimeFormat,
  'Mood'    => CollectionFormat,
  'Sleep'   => FloatFormat,
  'Hunger'  => FloatFormat,
  'Coffee'  => FloatFormat
}

I could probably look at each item and “detect” what kind of format it needs but I’d rather be explicit. If, for instance, I want to add another format, it’s a lot easier to just change my manifest file vs. hacking on some detection scheme. I think we can just produce this manifest file in YAML and load it in at runtime. One thing I don’t like about this approach is that it specifically names our format classes. You could generalize it so that it just matches a category to the desired output data, for instance 'Coffee' => Float but then you run into problems with flexibility. What if Coffee still needed to output a float but had to go through a different Format than Hunger or Sleep? With that in mind we’ll stick to the plan already laid out.

tentacles/lib/tentacles/formats.yml
time:     DateTimeFormat
mood:     CollectionFormat
sleep:    FloatFormat
hunger:   FloatFormat
coffee:   FloatFormat

The Format object

I would love it if I could use the Format object as a module and just call a method on it from Crawler. It might look like this:

def metadata_by_selector(selector)
  node = nodes_by_selector(selector).first
  metadata = {}
  node.children.each do |child|
    Tentacles::Format.insert(child, metadata)
  end
  metadata
end

The only problem is Format needs to load in and parse its formats.yml file before it’s any good to us. There’s some interesting talk of the Module#autoload method but that’s not quite what I need…

I can’t find any good documentation on this, so instead we’ll make Format a class and work with an instance of it. Also I’m lazy, so I’m going to have that instance load its own formats.yml file. Normally I like to have only one entry point for configuration files but… whatever.
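For what it’s worth, a module can also load its configuration lazily: cache the parsed YAML in an instance variable on the module itself the first time it’s asked for, so nothing is read until it’s needed. Here’s a sketch, assuming the manifest is inlined as a heredoc so it runs on its own; the real project would read formats.yml from disk instead.

```ruby
require 'yaml'

module Tentacles
  module Format
    # Inlined copy of formats.yml so this sketch is self-contained.
    MANIFEST = <<~YAML
      time:     DateTimeFormat
      mood:     CollectionFormat
      sleep:    FloatFormat
      hunger:   FloatFormat
      coffee:   FloatFormat
    YAML

    # Parsed on first call, then memoized in the module's own instance variable.
    # Reading from disk might look like:
    #   YAML.load_file(File.join(__dir__, 'formats.yml'))
    def self.categories
      @categories ||= YAML.safe_load(MANIFEST)
    end
  end
end

p Tentacles::Format.categories['sleep'] # => "FloatFormat"
```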

How do I convert a string into a class name in Ruby?

Well, we know we can load our YAML file, but all of our format classes are going to come in as strings. I did some digging to figure out how to convert a string into an actual class that can then be instantiated. If you just want to convert a String into a top-level class you can use Object.const_get('Foobar').new, but that’s not going to work for us since our classes are namespaced inside a module. To look up a class inside a module, we call const_get on the module itself: Tentacles.const_get('DateTimeFormat').new.
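A quick way to see the difference for yourself. DateTimeFormat is stubbed out as an empty class just for this demo; note also that on Ruby 2.0 and later Object.const_get accepts a qualified name like 'Tentacles::DateTimeFormat'.

```ruby
module Tentacles
  class DateTimeFormat; end # empty stub for the demo
end

# Looking the name up on the module finds the nested class.
formatter = Tentacles.const_get('DateTimeFormat').new
puts formatter.class # Tentacles::DateTimeFormat

# Looking it up on Object fails: there is no top-level DateTimeFormat.
begin
  Object.const_get('DateTimeFormat')
rescue NameError => e
  puts "NameError: #{e.message}"
end

# Ruby 2.0+ also understands a fully qualified path.
puts Object.const_get('Tentacles::DateTimeFormat') # Tentacles::DateTimeFormat
```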

With that in mind I want to spec out a simple test that passes in a string of metadata and checks that the right formatter has been created (the formatter also prints a notification when it’s instantiated). We’ll then refactor it to actually use the formatter on the string.

tentacles/spec/format_spec.rb
require_relative '../lib/tentacles/format'
require_relative '../lib/tentacles/date_time_format'

describe Tentacles::Format do
  describe "when asked to parse some metadata" do
    it "should create the right formatter" do
      @format = Tentacles::Format.new
      @format.parse('Time: 8:03 am').should be_an_instance_of(Tentacles::DateTimeFormat)
    end
  end
end
tentacles/lib/tentacles/format.rb
require 'yaml'
require_relative 'date_time_format'

module Tentacles
  class Format
    def initialize
      @categories = YAML.load_file(File.join(File.dirname(__FILE__), 'formats.yml'))
    end

    def parse(data)
      category = data.split(':')[0]
      category.downcase!
      Tentacles.const_get(@categories[category]).new
    end
  end
end
tentacles/lib/tentacles/date_time_format.rb
module Tentacles
  class DateTimeFormat
    def initialize
      puts 'DateTimeFormat created!'
    end
  end
end

Now let’s take it a step further so we can convert an actual time into a DateTime object. Here’s our updated spec:

require_relative '../lib/tentacles/format'
require 'date'

describe Tentacles::Format do
  describe "when asked to parse some metadata" do
    it "should return a DateTime" do
      @format = Tentacles::Format.new
      @format.parse('Time: 8:03 am').should be_an_instance_of(DateTime)
    end
  end
end

To pull this off we’ll need the help of at least 2 new gems: Chronic and ActiveSupport. Chronic is a natural language parser which can convert strings into usable timestamps. ActiveSupport is a library of extensions originally created for Rails which have been abstracted into a general purpose toolset. We’re going to combine these two gems to turn the phrase “8:03 am” into a Ruby DateTime. (Strictly speaking, Time#to_datetime is defined by Ruby’s standard date library, so the ActiveSupport require may not be doing any work in this particular method.)

Gotta first update the Gemfile with our new dependencies and run bundle install.

source 'https://rubygems.org'

gem 'rspec', '2.9.0'
gem 'nokogiri', '~>1.5.2'
gem 'awesome_print', '~>1.0.2'
gem 'fakeweb', '~>1.3.0'
gem 'chronic', '~> 0.6.7'
gem 'activesupport', '~> 3.2.3'

Next we bang out a quick parse method inside of DateTimeFormat. Our Tentacles::Format is going to delegate its parse call to whichever subordinate formatter it creates. Code speaks louder than words:

tentacles/lib/tentacles/format.rb
require 'yaml'
require_relative 'date_time_format'

module Tentacles
  class Format
    def initialize
      @categories = YAML.load_file(File.join(File.dirname(__FILE__), 'formats.yml'))
    end

    # Create a formatter based on the content of the passed
    # in data. Delegate the parse call to this new formatter
    def parse(data)
      category, *content = data.split(':')
      category.downcase!
      formatter = Tentacles.const_get(@categories[category]).new
      formatter.parse(content)
    end
  end
end
tentacles/lib/tentacles/date_time_format.rb
require 'chronic'
require 'active_support/core_ext/string/conversions.rb'

module Tentacles
  class DateTimeFormat
    def initialize
      puts 'DateTimeFormat created!'
    end

    def parse(content)
      Chronic.parse(content.join(':')).to_datetime
    end
  end
end

With all that in place our test should pass. Nice!!!!!! We’re well on our way to processing the remaining metadata. Tomorrow I’ll whip up our other formats and figure out how to pull the date out of a blog post so we can combine that with the time to get a proper DateTime.
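As a starting point for that, here’s one way the two halves could be combined with nothing but Ruby’s standard date library. It assumes the post date has already been pulled out as a Date and hardcodes a -07:00 offset (Pacific daylight time, as in the expected metadata Hash); combine_date_and_time is a hypothetical helper.

```ruby
require 'date'

# Hypothetical helper: merge a post's Date with its "Time:" metadata value.
# Assumes times look like "7:42 am" and that every post uses the same offset.
def combine_date_and_time(date, time_string, offset = '-07:00')
  time = DateTime.strptime(time_string.strip, '%I:%M %p')
  DateTime.new(date.year, date.month, date.day, time.hour, time.minute, 0, offset)
end

puts combine_date_and_time(Date.new(2012, 5, 16), ' 7:42 am')
# 2012-05-16T07:42:00-07:00
```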

  • Time: 7:42 am
  • Mood: Awake, Alert, Focused
  • Sleep: 6
  • Hunger: 0
  • Coffee: 1

Object Oriented Scraper Backed With Tests Pt. 7

May 15th, 2012

In my last post I realized that including my metadata in the blog post as a plain ul meant that all of its words were being scraped as part of the keyword frequency search. After thinking about it for a while, I’m going to give the keyword search method an optional selector for nodes it should ignore (in practice, delete before counting).

Thankfully I have my tests in place to validate what our final output should look like, which means I’m basically hacking away at Nokogiri until things pass. Here’s what I finally settled on:

def words_by_selector(selector, ignored_selector = nil)
  node = nodes_by_selector(selector).first
  if ignored_selector
    ignored = node.css(ignored_selector)
    ignored.remove()
  end
  words = words_from_string(node.content)
  count_frequency(words)

  sorted = @counts.sort_by { |word, count| count }
  sorted.reverse!
  sorted.map { |word, count| "#{word}: #{count}"}
end

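I think the code is pretty self-explanatory. One side effect worth knowing: remove unlinks the ignored nodes from @doc itself, so any later selection on the same Crawler won’t see them either. Moving on to the metadata, we expect a Hash that looks like this: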

{
  datetime: DateTime.parse('2012-05-13T08:03:00-07:00'),
  mood: ['Happy', 'Drowsy', 'Peaceful'],
  sleep: 5.5,
  hunger: 3.0,
  coffee: 0.0
}

As I’m playing back and forth with the metadata selector methods I’m realizing that writing non-brittle tests is extremely difficult!

I’m noticing that some of the metadata, when broken into Strings, don’t parse very well. For instance:

Time: 8:03 splits up into ["Time", " 8", "03"]

We can use a splat operator to clean that up a bit for us:

def metadata_by_selector(selector)
  node = nodes_by_selector(selector).first
  metadata = {}
  node.children.each do |child|
    key, *value = child.content.split(':')
    puts "#{key}: #{value}"
  end
end

The above should produce something like:

Time: [" 8", "03 am"]
Mood: [" Happy, Drowsy, Peaceful"]
Sleep: [" 5.5"]
Hunger: [" 3"]
Coffee: [" 0"]

Close… but still not perfect. I think the best thing to do would be to write some formatter objects or functions to handle the different kinds of metadata. We’ll tackle that tomorrow.
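One small trick that might help there: String#split takes an optional limit, and a limit of 2 splits only on the first ':' so the time’s own colon survives. split_metadata is a hypothetical helper, and the downcasing assumes categories will be matched against lowercase keys later.

```ruby
# Hypothetical helper: turn "Time: 8:03 am" into ["time", "8:03 am"].
def split_metadata(line)
  key, value = line.split(':', 2)        # limit 2: split on the first ':' only
  [key.strip.downcase, value.to_s.strip] # to_s guards against lines with no ':'
end

p 'Time: 8:03 am'.split(':')                      # => ["Time", " 8", "03 am"]
p split_metadata('Time: 8:03 am')                 # => ["time", "8:03 am"]
p split_metadata('Mood: Happy, Drowsy, Peaceful') # => ["mood", "Happy, Drowsy, Peaceful"]
```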

  • Time: 9:34pm
  • Mood: Fat, Tired, Drunk
  • Sleep: 6
  • Hunger: 0
  • Coffee: 1

Hacking the PATH Variable in Sublime Text

May 14th, 2012

This is going to be a bit of a lightning post, but I wanted to quickly show off how to edit the PATH variable that Sublime Text uses. I should warn you that I am neither an expert in Python nor a very seasoned Sublime user, so take all of this with a grain of salt and use it at your own risk.

Our first (crappy) plugin!

Sublime has a great plugin architecture that makes it extremely easy to add to the platform. If you create a .py file in the ~/Library/Application Support/Sublime Text 2/Packages/User/ folder it will be loaded as soon as Sublime starts. Writing plugins seems to be actually quite easy based on their documentation and examples. We won’t be following the typical plugin architecture since we’re just trying to hack a system variable and that doesn’t seem to necessitate the use of their built in modules.

Here’s a script I’m calling Pathway at the moment.

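import os

LOCAL = ['/usr/local/bin', '/usr/local/sbin']
HOME = os.path.expanduser('~')  # your home directory, e.g. /Users/Rob
RVM = [HOME + '/.rvm/bin']

# Sublime's default path is
# /usr/bin:/bin:/usr/sbin:/sbin
# Append our directories (skipping any already present) and join with ':'
# so PATH never ends with a stray separator, which would mean "current dir".
paths = os.environ['PATH'].split(':')
for directory in LOCAL + RVM:
    if directory not in paths:
        paths.append(directory)
os.environ['PATH'] = ':'.join(paths)

print('PATH = ' + os.environ['PATH'])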

If you add this file to the Sublime user’s directory outlined above you should be able to hit cmd + ` to fire up the Sublime console which will print out our new PATH variable.

I would also recommend adding a shell plugin to Sublime. At the moment I use Shell Turtlestein.

Now that I have my hacked path variable and my shell plugin I can check to see if RVM works. Using Shell Turtlestein you can hit cmd-shift-c to open a little console prompt. Typing rvm current returns our ruby version number and gemset. Nice! What’s even nicer is this means I can now run Rake tasks from inside of Sublime!

I should point out if all you want to do is run Rake or Ant then there are already plugins for that sort of thing. My main effort in doing all this is to try to integrate the command line with Sublime a bit better. If anyone knows how to simply tell Sublime to use the path in my .bash_profile or .bashrc then I would gladly use that approach instead. But after crawling the forums for a while it looks like this is still a common problem with no good solution.

  • Time: 8:26 pm
  • Mood: Happy, Peaceful, Hurried
  • Sleep: 7
  • Hunger: 6
  • Coffee: 0

Object Oriented Scraper Backed With Tests Pt. 6

May 13th, 2012

Yesterday we verified that our Crawler was able to hit a document and, given the right selector, pull down a list of words and their frequency on the page. We also created a custom exception to be used whenever the selector fails to pull down the right content. I’m going to repeat this process today with get_metadata_by_selector. If there’s time we’ll try to output another file with our data, otherwise that’ll be tomorrow’s homework :D

Let’s take a moment to look at today’s metadata to figure out what we’d like our output to reflect.

- Time: 8:03 am
- Mood: Happy, Drowsy, Peaceful
- Sleep: 5.5
- Hunger: 3
- Coffee: 0

That’s the actual Markdown that goes into the editor, but it gets converted into a ul. I don’t think you can attach a CSS class using plain Markdown syntax, otherwise I’d use one here. We could go back and wrap everything in regular HTML tags, but since we know that our metadata is going to be the last ul in each entry we’ll just use that knowledge to build our selector. Obviously a more robust solution would use a CSS class, so that might be a good refactoring for the future.

I figure for now we’ll just parse the metadata into a Hash that’ll look something like this:

{
  datetime: 2012-05-13T08:03:00-07:00,
  mood: ['Happy', 'Drowsy', 'Peaceful'],
  sleep: 5.5,
  hunger: 3.0,
  coffee: 0.0
}

In the final iteration we’ll toss all of our Metadata Hashes into an ordered Array so we can visualize them over time.
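Here’s a rough sketch of the parsing step. It assumes the list items have already been pulled out of the ul as plain strings like "Time: 8:03 am". The parse_metadata name is hypothetical and isn’t part of the Crawler yet. It keeps the time as the raw string, because building a full datetime also needs the post’s date, which isn’t in the list.

```ruby
# Hypothetical helper: turns metadata list items such as
# "Mood: Happy, Drowsy" into a Hash keyed by symbols.
def parse_metadata(items)
  items.each_with_object({}) do |item, metadata|
    # Split only on the first colon so "8:03 am" stays intact.
    key, value = item.split(':', 2).map(&:strip)
    case key.downcase
    when 'time'
      # Kept as a string; combining it with the post date is a separate step.
      metadata[:time] = value
    when 'mood'
      metadata[:mood] = value.split(',').map(&:strip)
    else
      # Sleep, Hunger and Coffee are numeric ratings; Float() raises
      # if the value isn't a number.
      metadata[key.downcase.to_sym] = Float(value)
    end
  end
end

items = ['Time: 8:03 am', 'Mood: Happy, Drowsy, Peaceful',
         'Sleep: 5.5', 'Hunger: 3', 'Coffee: 0']
p parse_metadata(items)
```

Each Hash built this way could then be pushed onto the ordered Array mentioned above.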

Red, Green, Refactor

Ok, time for a failing test. Let’s make sure that our selector pulls something down and, if it doesn’t, raises the custom SelectionError we defined yesterday. I’m already seeing some repetitive code in our Crawler, so I’m refactoring it. Wherever we need to get a group of XML nodes from the document via a selector, we now call a private helper named nodes_by_selector. This is also where we’ll raise our exception if nothing came back. I’m also trimming some of the word cruft from our public API, so instead of get_words_by_selector it’s now just words_by_selector. The same goes for our metadata method.

tentacles/lib/tentacles/crawler.rb
require 'open-uri'
require 'nokogiri'
require_relative 'selection_error'

module Tentacles
  class Crawler

    attr_reader :doc

    def self.from_uri(uri)
      new(uri)
    end

    def initialize(uri)
      @uri = uri
      @doc = Nokogiri::HTML(open(@uri))
      @counts = Hash.new(0)
    end

    def words_by_selector(selector)
      nodes = nodes_by_selector(selector)
      nodes.each do |node|
        words = words_from_string(node.content)
        count_frequency(words)
      end

      sorted = @counts.sort_by { |word, count| count }
      sorted.reverse!
      sorted.map { |word, count| "#{word}: #{count}"}
    end

    def metadata_by_selector(selector)
      nodes = nodes_by_selector(selector)
    end

  private

    def nodes_by_selector(selector)
      nodes = doc.css(selector)
      raise Tentacles::SelectionError,
        'The selector did not return an results!' if nodes.empty?
      nodes
    end

    def words_from_string(string)
      string.downcase.scan(/[\w']+/)
    end

    def count_frequency(word_list)
      for word in word_list
        @counts[word] += 1
      end
      @counts
    end
  end
end

Going back to the tests, we need to fix anything the refactor broke. Right away I saw that my nodes_by_selector method wasn’t returning the nodes, so I added that return back in. The tests brought that to my attention before I had to do any potentially painful debugging. Beyond that we just need to fix up our method names:

tentacles/spec/crawler_spec.rb
require_relative '../lib/tentacles/crawler'
require 'fakeweb'

describe Tentacles::Crawler do

  before do
    # Create a mock options object
    @options = {
      uri: 'http://robdodson.me',
      post_selector: '.entry-content',
      metadata_selector: '.personal-metadata'
    }

    # Create a mock web request
    FakeWeb.register_uri(:get, @options[:uri],
                         :body => '<div class="' + @options[:post_selector].delete(".") +
                         '">Hello Hello Hello World World Foobar!</div>')
  end

  describe "constructors" do
    describe "#from_uri" do
      it "should respond" do
        Tentacles::Crawler.should respond_to(:from_uri)
      end

      it "should return an instance" do
        crawler = Tentacles::Crawler.from_uri(@options[:uri])
        crawler.should be_an_instance_of(Tentacles::Crawler)
      end
    end
  end

  describe "instances" do
    before do
      @crawler = Tentacles::Crawler.from_uri(@options[:uri])
    end

    subject { @crawler }

    it { should respond_to(:words_by_selector) }
    it { should respond_to(:metadata_by_selector) }

    context "post-construct" do
      it "should have the right document" do
        @crawler.doc.content.should =~ /Hello Hello Hello World World Foobar!/
      end
    end

    describe "#words_by_selector" do
      it "should produce an Array of keywords" do
        expected_array = ['hello: 3', 'world: 2', 'foobar: 1']
        actual_array = @crawler.words_by_selector(@options[:post_selector])
        actual_array.should eq(expected_array)
      end

      it "should raise an exception if nothing was returned" do
        expect { @crawler.words_by_selector('some-gibberish-selector') }.to raise_error(Tentacles::SelectionError, 'The selector did not return an results!')
      end
    end

    describe "#metadata_by_selector" do
      it "should raise an exception if nothing was returned" do
        expect { @crawler.metadata_by_selector('some-gibberish-selector') }.to raise_error(Tentacles::SelectionError, 'The selector did not return an results!')
      end
    end
  end
end

We’ve got a duplicate test in there: #words_by_selector and #metadata_by_selector both check that an exception is raised if nothing comes down. Let’s see if we can refactor those into an RSpec shared example. I’m not sure if this is a best practice or not, but here’s my implementation:

tentacles/spec/crawler_spec.rb
shared_examples_for "all selector methods" do
  describe "when selection has no nodes" do
    it "should raise an exception" do
      expect { @crawler.send(selector_method, 'some-gibberish-selector') }.to raise_error(Tentacles::SelectionError, 'The selector did not return an results!')
    end
  end
end

### ...

describe "#words_by_selector" do
  it_behaves_like "all selector methods" do
    let(:selector_method) { :words_by_selector }
  end

  # ...

end

describe "#metadata_by_selector" do
  it_behaves_like "all selector methods" do
    let(:selector_method) { :metadata_by_selector }
  end
end

Basically we use let to define selector_method, which returns our method name as a symbol, and then call that method inside the shared_examples_for block. Notice how we’re using @crawler.send(selector_method, ...)? In this case selector_method returns our method name symbol, and send calls the method with that name.
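Outside of RSpec, the same trick is just Ruby’s dynamic dispatch. Here’s a tiny standalone sketch; the Greeter class is made up purely for illustration. Object#send calls whichever method a symbol names, and public_send does the same but refuses to call private methods, which might be the safer choice when the method name comes from somewhere you don’t fully control.

```ruby
# Greeter is a made-up class used only to demonstrate dynamic dispatch.
class Greeter
  def hello(name)
    "Hello, #{name}!"
  end

  def goodbye(name)
    "Goodbye, #{name}!"
  end

  private

  def secret
    'shh'
  end
end

greeter = Greeter.new

# Each symbol names a method; send looks it up and passes the arguments along.
[:hello, :goodbye].each do |method_name|
  puts greeter.send(method_name, 'Rob')
end

# send can reach private methods...
puts greeter.send(:secret)

# ...while public_send raises NoMethodError for them.
begin
  greeter.public_send(:secret)
rescue NoMethodError => e
  puts "public_send refused: #{e.class}"
end
```

In the spec above, swapping send for public_send should behave the same, since both selector methods are public.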

If you run this with RSpec’s documentation formatter (rspec -cf d), the nested output looks pretty cool:

Tentacles::Crawler
  constructors
    #from_uri
      should respond
      should return an instance
  instances
    should respond to #words_by_selector
    should respond to #metadata_by_selector
    post-construct
      should have the right document
    #words_by_selector
      should produce an Array of keywords
      behaves like all selector methods
        when selection has no nodes
          should raise an exception
    #metadata_by_selector
      behaves like all selector methods
        when selection has no nodes
          should raise an exception

Ok, so we know that all of our selector methods raise the proper exception if they are called with a bunk selector. Now let’s make sure we can get our metadata downloaded and structured.

Unfortunately I’m realizing that if the ul for our metadata is part of the post then those words get counted along with everything else, which is not what I want. I need to figure out how to exclude that content…

I could either tell my crawler to explicitly ignore that content or wrap my blog entry in an even more specific class and just select that. I guess that’ll be an exercise for tomorrow :\

  • Time: 8:03 am
  • Mood: Happy, Drowsy, Peaceful
  • Sleep: 5.5
  • Hunger: 3
  • Coffee: 0

Object Oriented Scraper Backed With Tests Pt. 5

May 12th, 2012

Last night I got the Crawler passing its test for #get_words_by_selector. This morning I realize that when someone sends in a junk selector I want to raise an exception of some kind. Since I don’t know much about Ruby exceptions, I’m doing a little digging. Ruby has both throw/catch and raise/rescue, so what’s the difference between the two?

Throwing exceptions for control flow

There’s a great guest post by Avdi Grimm on RubyLearning which covers this topic in depth. To summarize, throw/catch is mainly used for control flow rather than for signaling errors. In other words, if you need to break out of a deeply nested loop or some other expensive operation, you can throw a symbol which can be caught somewhere higher up the call stack. Initially this rubbed me the wrong way, since things like goto and labels are generally considered bad practice. Someone else raised this point in the comments, to which Avdi responded:

There is a fundamental difference between throw/catch and goto. Goto, in languages which support it, pays no attention to the stack. Any resources which were allocated before the goto are simply left dangling unless they are manually cleaned up.

throw/catch, like exception handling, unwinds the stack, triggering ensure blocks along the way. So, for example, if you throw inside an open() {…} block, the open file will be closed on the way up to the catch() block.
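To make that concrete, here’s a small sketch of throw/catch breaking out of a nested loop. The grid data and the find_first_negative name are invented for the example. What gets thrown is a tag (a symbol here), not an exception object, and the ensure block still runs on the way out, which is the point Avdi is making.

```ruby
# Hypothetical example: find the position of the first negative number
# in a grid of rows, logging every cell that gets checked.
def find_first_negative(grid, log)
  catch(:found) do
    grid.each_with_index do |row, r|
      row.each_with_index do |value, c|
        begin
          # throw unwinds straight to the matching catch, skipping the rest
          # of both loops; the second argument becomes catch's return value.
          throw :found, [r, c] if value < 0
        ensure
          # ensure runs after every check, including the one that throws.
          log << "checked #{r},#{c}"
        end
      end
    end
    nil # only reached if nothing was thrown
  end
end

log = []
p find_first_negative([[1, 2], [3, -4], [-5, 6]], log)
p log
```

The search stops at row 1, column 1, and the log shows the ensure block ran for that cell before control jumped to catch.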

Raising exceptions for everything else

With throw/catch out of the way, that leaves raise/rescue to handle everything else. I’m willing to bet that 99% of error-handling code should be raising exceptions, and throw/catch should only be used in situations where you need the control flow behavior. With that knowledge in hand, I need to decide between using one of Ruby’s built-in exceptions and defining one of my own. Let’s define one of our own so we can get that experience under our belt.

Creating an exception subclass in Ruby

One tip I picked up while researching raise and throw is that any exception which doesn’t subclass StandardError will not be caught by a bare rescue. Here’s an example to illustrate:

###
# First we define an exception class which doesn't
# inherit from StandardError. As a result it won't
# be caught by a bare rescue. Instead we would
# need to rescue it by its class name.
###
class MyBadException < Exception
end

def miss_bad_exception
  raise MyBadException.new
rescue
  p "I'll never be called :("
end

miss_bad_exception
# MyBadException: MyBadException
#   from (irb):4:in `miss_bad_exception'
#   from (irb):8
#   from /Users/Rob/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'

# See that calling the method produces an uncaught exception...


###
# Next we'll subclass StandardError. As a result
# we won't have to explicitly name our class
# for a rescue to work.
###
class MyGoodException < StandardError
end

def save_good_exception
  raise MyGoodException.new
rescue
  p "I'm saved! My hero!"
end

save_good_exception
# "I'm saved! My hero!"

# Yay! Our exception was caught!

We’ll call our exception SelectionError to indicate that the provided selector did not return any results. For reference, I often look at this chart on RubyLearning when I want to see a list of all the available Exception classes. In our case we’ll just inherit from StandardError.

tentacles/lib/tentacles/selection_error.rb
module Tentacles
  class SelectionError < StandardError
  end
end

I don’t think we actually need to do much more than that. The ability to pass a message comes from the superclass, so I think we’re good to go. Here’s our updated spec:

it "should raise an exception if nothing was returned" do
  expect { @crawler.get_words_by_selector('some-gibberish-selector') }.to raise_error(Tentacles::SelectionError, 'The selector did not return an results!')
end
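The spec leans on SelectionError carrying whatever message we hand to raise. Here’s a quick standalone check of that, outside of RSpec. It redefines SelectionError the same way as above; the select_nothing method is hypothetical and just stands in for a selector that finds nothing. Because the class inherits from StandardError, a bare rescue catches it, and the message comes back through #message.

```ruby
module Tentacles
  class SelectionError < StandardError
  end
end

# Hypothetical stand-in for a selector that found nothing.
# Passing a class and a message to raise builds the exception with that message.
def select_nothing
  raise Tentacles::SelectionError, 'The selector did not return an results!'
end

begin
  select_nothing
rescue => e # a bare rescue catches StandardError and its subclasses
  puts e.class   # Tentacles::SelectionError
  puts e.message # The selector did not return an results!
end
```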


Initially the test fails, so now we need to update our Crawler to check whether anything was returned and raise the custom exception if not.

Here’s our updated Crawler with the additional require and the updated method.

tentacles/lib/tentacles/crawler.rb
require 'open-uri'
require 'nokogiri'
require_relative 'selection_error'

module Tentacles
  class Crawler

    attr_reader :doc

    def self.from_uri(uri)
      new(uri)
    end

    def initialize(uri)
      @uri = uri
      @doc = Nokogiri::HTML(open(@uri))
      @counts = Hash.new(0)
    end

    def get_words_by_selector(selector)
      entries = doc.css(selector)
      raise Tentacles::SelectionError,
        'The selector did not return an results!' if entries.empty?
      entries.each do |entry|
        words = words_from_string(entry.content)
        count_frequency(words)
      end

      sorted = @counts.sort_by { |word, count| count }
      sorted.reverse!
      sorted.map { |word, count| "#{word}: #{count}"}
    end

    def get_metadata_by_selector(selector)
      # TODO
    end

  private

    def words_from_string(string)
      string.downcase.scan(/[\w']+/)
    end

    def count_frequency(word_list)
      for word in word_list
        @counts[word] += 1
      end
      @counts
    end
  end
end

All tests passing, we’re good to go :)

  • Time: 7:00 am
  • Mood: Alert, Awake, Anxious
  • Sleep: 8
  • Hunger: 3
  • Coffee: 0

Object Oriented Scraper Backed With Tests Pt. 4

May 11th, 2012

Continuing from our previous post, we’re going to keep working on our Crawler and our specs to see if we can start pulling real data from our site.

The first thing I did this morning was to run my tests:

bundle exec rspec spec/

..............

Finished in 0.01271 seconds
14 examples, 0 failures

As someone totally new to TDD/BDD this is kind of an awesome feeling. I left my code for a few days and now I can come back and verify that everything still works. We can take it even further and run rspec with a documentation formatter to get some pretty printed output:

bundle exec rspec spec/ -cf d

Tentacles::Crawler
  constructors
    #from_uri
      should respond
      should return an instance
  instances
    should respond to #get_words_by_selector
    should respond to #get_metadata_by_selector

Tentacles::Options
  should respond to #uri
  should respond to #post_selector
  should respond to #metadata_selector
  #initialize
    when parsing the URI
      when URI is valid
        should display the right URI
      when URI is invalid
        should raise an exception
      when URI does not contain a scheme
        should raise an IO exception
      when URI does not contain a host
        should raise an IO exception

Tentacles::Runner
  should respond to #run
  when parsing the config file
    should raise an error if the config file is missing
    should raise an error if the config file is invalid

Finished in 0.01359 seconds
14 examples, 0 failures

In rspec the -c flag enables color in the output. The -f flag sets a formatter and d specifies the documentation format.

-f, --format FORMATTER           Choose a formatter.
                                       [p]rogress (default - dots)
                                       [d]ocumentation (group and example names)
                                       [h]tml
                                       [t]extmate
                                       custom formatter class name

Neat.

In crawler_spec.rb I’m going to add a test that checks to see if our instance has actually stored the content from our mocked web request.

require_relative '../lib/tentacles/crawler'
require 'fakeweb'

describe Tentacles::Crawler do

  before do
    # Create a mock options object
    @options = {
      uri: 'http://robdodson.me',
      post_selector: '.entry-content',
      metadata_selector: '.personal-metadata'
    }

    # Create a mock web request
    FakeWeb.register_uri(:get, @options[:uri], :body => "Hello World! Hello San Francisco!")
  end

  describe "constructors" do
    describe "#from_uri" do
      it "should respond" do
        Tentacles::Crawler.should respond_to(:from_uri)
      end

      it "should return an instance" do
        crawler = Tentacles::Crawler.from_uri(@options[:uri])
        crawler.should be_an_instance_of(Tentacles::Crawler)
      end
    end
  end

  describe "instances" do
    before do
      @crawler = Tentacles::Crawler.from_uri(@options[:uri])
    end

    subject { @crawler }

    it { should respond_to(:get_words_by_selector) }
    it { should respond_to(:get_metadata_by_selector) }

    context "post-construct" do
      it "should have the right document" do
        @crawler.doc.content.should =~ /Hello World! Hello San Francisco!/
      end
    end
  end
end

I want to write a test to parse the content for keywords, but I realize now that our FakeWeb request returns a string without any classes or IDs. Gotta go back and wrap it in some HTML to match our selectors, so I’m changing the mock web request to look like this:

# Create a mock web request
FakeWeb.register_uri(:get, @options[:uri],
                     :body => '<div class="' + @options[:post_selector] + '">Hello World! Hello San Francisco!</div>')

Hello Hello Hello World!

After a lot of back and forth I finally get my test to pass. Along the way I realize there are a bunch of things I need to change. For starters, having most of my words share the same count doesn’t really help me validate that my keyword counting works. So I’m changing our FakeWeb request and the specs which test against it.

# Create a mock web request
FakeWeb.register_uri(:get, @options[:uri],
                     :body => '<div class="' + @options[:post_selector].delete(".") + '">Hello Hello Hello World World Foobar!</div>')

context "post-construct" do
  it "should have the right document" do
    @crawler.doc.content.should =~ /Hello Hello Hello World World Foobar!/
  end
end

Next I need to make sure that my get_words_by_selector method is accepting a selector.

def get_words_by_selector(selector)
  # Use the selector that was passed in rather than a hardcoded one
  entries = doc.css(selector)
  entries.each do |entry|
    words = words_from_string(entry.content)
    count_frequency(words)
  end

  sorted = @counts.sort_by { |word, count| count }
  sorted.reverse!
  sorted.map { |word, count| "#{word}: #{count}"}
end

I also realize that I’d like my Array of keywords to be in descending order, so I reverse it after the initial sort.
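For comparison, here’s a standalone sketch of the same word-frequency step that sorts in descending order in one pass by negating the count, instead of sorting ascending and then calling reverse!. The word_frequencies name is hypothetical; it reuses the same regex as the Crawler’s words_from_string helper.

```ruby
# Hypothetical standalone version of the Crawler's counting logic.
def word_frequencies(text)
  counts = Hash.new(0)
  # Same tokenizing as words_from_string: lowercase, keep word chars and apostrophes.
  text.downcase.scan(/[\w']+/).each { |word| counts[word] += 1 }
  # Negating the count sorts from highest to lowest.
  counts.sort_by { |_word, count| -count }
        .map { |word, count| "#{word}: #{count}" }
end

p word_frequencies('Hello Hello Hello World World Foobar!')
```

One caveat with either approach: sort_by isn’t guaranteed to be stable, so words with the same count may come out in any order. That’s another reason giving each word a distinct count in the FakeWeb body makes the spec more reliable.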

Next I’m going to write the test to verify that we’ve received a group of words, counted them up and tossed them into an Array in descending order:

describe "#get_words_by_selector" do
  it "should produce an Array of keywords" do
    expected_array = ['hello: 3', 'world: 2', 'foobar: 1']
    actual_array = @crawler.get_words_by_selector(@options[:post_selector])
    actual_array.should eq(expected_array)
  end
end

I actually wrote the test first and did everything else to make it pass. But at this point it should all be passing and we can verify that given a request with the appropriate selector we should be able to build a basic word frequency list. Yay!

  • Time: 7:35 am
  • Mood: Calm, Awake, Curious
  • Sleep: 7
  • Hunger: 4
  • Coffee: 0

Design by Configuration Sucks

May 10th, 2012

What is design by configuration?

As an experienced developer, if you find yourself performing the same actions over and over, your brain will naturally start to think, “Hey, this isn’t very DRY.” DRY, or the principle of “Don’t Repeat Yourself,” is pretty common dogma for most developers. How many times have you heard something like, “If you’re doing it twice, you’re doing it wrong”? Typically when I do an action more than once I start looking for ways to wrap the work into functions or objects. This process can easily lead to what some refer to as “Design by Configuration,” or breaking your work into configurable operations.

To explore this concept a bit more, and why I think it’s rather brittle, let’s come up with a hypothetical. In our scenario we’re working for a large company, redesigning their web presence. On each page we have widgets of various shapes and sizes. Here are a couple of them:

<div class="widget-container grey-background rounded-corners">
  <div class="widget" title="Awesome Widget" data-foo="bar">
    <p>Hey I'm an awesome widget!</p>
  </div>
</div>

<div class="widget-container red-background square-corners">
  <div class="widget" title="Stellar Widget" data-foo="baz">
    <p>Boldly going where no widget has gone before...</p>
  </div>
</div>

You’ve probably already noticed that our two widgets are nearly identical, with only subtle differences in the classes, titles and paragraph content. That seems like a great candidate for automation! Because we don’t know which classes we might need to support, or how many, we’ll try to make it really flexible so we can pass in tons of different values.

function makeWidget(attributes) {
    var classes,
        title,
        data,
        paragraph,
        widget;

    classes = attributes['classes'] || '';
    title = attributes['title'] || '';
    data = attributes['data'] || '';
    paragraph = attributes['paragraph'] || '';

    widget = '<div class="' + classes + '">'+
             '<div class="widget" title="' + title + '" data-foo="' + data + '">'+
             '<p>' + paragraph + '</p>'+
             '</div>'+
             '</div>';
    $('body').append($(widget));
}

makeWidget({ classes: 'widget-container grey-background rounded-corners', title: 'Hello World!', data: 'ribeye', paragraph: 'Neato paragraph!'});

You can play around with the previous code snippet and create your own widgets in the console or on jsFiddle. Writing a little function like this seems pretty standard for a lot of cases and I don’t want to argue against it entirely but I do want to point out a few gotchas.

Everything was perfect. Until it wasn’t

Let’s say that our code works perfectly. We do about 95% of the project and toward the end the client mentions an extra widget that slipped their mind. They’d like it to act just like all the other widgets but they also want to add an additional class to the p tag. “Not a problem,” you think, “I’ll just add a paragraphClasses attribute to our hash.”

function makeWidget(attributes) {
    var classes,
        title,
        data,
        paragraph,
        paragraphClasses,
        widget;

    classes = attributes['classes'] || '';
    title = attributes['title'] || '';
    data = attributes['data'] || '';
    paragraph = attributes['paragraph'] || '';
    paragraphClasses = attributes['paragraphClasses'] || '';

    widget = '<div class="' + classes + '">'+
             '<div class="widget" title="' + title + '" data-foo="' + data + '">'+
             '<p class="' + paragraphClasses + '">' + paragraph + '</p>'+
             '</div>'+
             '</div>';
    $('body').append($(widget));
}

makeWidget({ classes: 'widget-container grey-background rounded-corners', title: 'Hello World!', data: 'ribeye', paragraph: 'Neato paragraph!'});

Easy enough right? Well, yeah… except you just changed one line that affects ALL of your widgets. Hope you got all those quotation marks perfect!

Later on your client decides that they’d like to add one more widget, but this time it should have two paragraph tags instead of one. That puts us in a bit of a dilemma… We could modify our makeWidget function to check for a subParagraph attribute, or we could just hand code this one widget on this one page. Er... did I say one page? Well, actually the client just called and said this widget will need to appear on 4 pages.

At this point we can either hack our makeWidget function, create an entirely new function like makeSuperWidget or we could hand code a custom widget in 4 places and hope that if there are any changes we remember to update all 4. Typically I think most people choose either the first or second option, figuring that the changes to the original function are small enough or that creating a new function is still much DRYer than hand coding the thing 4 times.

At this point I feel like we’ve fallen into the trap of design by configuration. Basically we’ve set up our function to accept configuration parameters, but the core elements are static and extremely hard to change. We can add lots of classes to our containing div and our first p tag, but what if we want to add other attributes? Do we need to break open the code every time?

I think a better solution looks a lot like the syntax for D3.js, which provides helpers to make the process of widget creation easier, but it doesn’t completely remove the developer from the process. Here’s some pseudo code to illustrate what I think might be a better approach:

widget = make('div')
        .attr('class', 'widget-container grey-background rounded-corners')
        .attr('title', 'Sweet containing div')
        .attr('data-zerp', 'porkchops')
      .append('div')
        .attr('class', 'widget')
        .attr('title', 'Slick inner div')
        .attr('data-foo', 'short-rib')
        .attr('data-bar', 'cutlet')
        .attr('data-baz', 'filet')
      .append('p')
        .html('I can haz contents?')
      .sibling('p')
        .html('I too can haz contents?');

Unfortunately this is still a lot of code, and my first instinct for slimming it down is to create a helper function. At that point we’re basically back to design by configuration… I’m not entirely ready to give up on this approach because I feel like there might be something here; I’m just not sure what yet. I think the design by configuration problem falls right into that sweet spot between not needing to create a factory and obviously needing to create a factory. I’ll try to explore this more in a later post. For now it’s time for bed.


Copyright © 2012 - Rob Dodson - Powered by Octopress