Rob Dodson - Home

Outputting JSON from Ruby


Last night I got the scraper to write an output.txt file which listed all the contents of words_by_selector. Today I want to make it write to JSON instead of plain text and I want to back it with some tests.

Updating our tests

Our current test for words_by_selector looks like this:

it "should produce the correct Array of keywords" do
  expected_array = ['hello: 3', 'world: 2', 'foobar: 1']
  actual_array = @crawler.words_by_selector(@options[:post_selector], @options[:ignored_post_selector])
  actual_array.should eq(expected_array)
end

We're going to need to break that sucker so it'll produce something more like this:

it "should produce the correct Hash of keywords" do
  expected_hash = {
    word_count: [
      {
        word: 'hello',
        count: 3
      },
      {
        word: 'world',
        count: 2
      },
      {
        word: 'foobar',
        count: 1
      }
    ]
  }

  actual_hash = @crawler.words_by_selector(@options[:post_selector], @options[:ignored_post_selector])
  actual_hash.should eq(expected_hash)
end

And we update words_by_selector to look like this:

def words_by_selector(selector, ignored_selector = nil)
  node = nodes_by_selector(selector).first
  if ignored_selector
    ignored = node.css(ignored_selector)
    ignored.remove
  end
  words = words_from_string(node.content)
  count_frequency(words)

  # Highest counts first, then wrap each pair in a Hash
  sorted = @counts.sort_by { |word, count| count }
  sorted.reverse!
  sorted.map! do |word, count|
    { word: word, count: count }
  end
  { word_count: sorted }
end

Our new test should pass. Feel free to flip one of the numbers in the expected_hash to 99 or something to see it fail.
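
If you want to poke at the sorting and mapping step on its own, here's a minimal sketch that runs without the crawler. The counts Hash and the to_word_count method are made up for illustration; they stand in for whatever @counts holds after count_frequency runs, which I'm assuming is a Hash of word => count.

```ruby
# Hypothetical stand-in for the @counts Hash built by count_frequency.
counts = { 'world' => 2, 'foobar' => 1, 'hello' => 3 }

# Hypothetical helper: sort by descending count, then wrap each pair in a Hash.
def to_word_count(counts)
  sorted = counts.sort_by { |_word, count| -count }
  { word_count: sorted.map { |word, count| { word: word, count: count } } }
end

result = to_word_count(counts)
puts result[:word_count].first[:word] # prints "hello"
```

Negating the count inside sort_by gives the same highest-first order as sorting and then calling reverse!, although words with equal counts may come out in a different order.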

Now let's make sure the runner takes the content out of the crawler and writes it to a JSON file.

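it "should create a directory for our output" do
  @runner.run
  # Dir.exists? was removed in Ruby 3.2; Dir.exist? works everywhere
  Dir.exist?('../../output').should be_true
end

it "should output the correct JSON" do
  @runner.run
  # Prints the file so we can eyeball it; nothing is asserted yet
  File.open("../../output/word_count.json") do |file|
    file.each_line do |line|
      puts line
    end
  end
end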
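
That second test only prints the file, so it can't actually fail on bad content. One way to tighten it up, sketched here outside of RSpec, is to parse the file back into a Hash and compare it with what we expected. The temp directory and the shortened expected_hash are just for this example.

```ruby
require 'json'
require 'tmpdir'

# Shortened, made-up version of the expected Hash from the crawler spec.
expected_hash = { word_count: [{ word: 'hello', count: 3 }, { word: 'world', count: 2 }] }

parsed = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'word_count.json')
  File.write(path, JSON.pretty_generate(expected_hash))
  # symbolize_names: true turns "word" back into :word so the Hashes compare equal.
  parsed = JSON.parse(File.read(path), symbolize_names: true)
end

puts parsed == expected_hash # prints "true"
```

Inside the spec, the last line would probably become something like parsed.should eq(expected_hash).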

And in runner.rb...

# Assumes require 'json' appears at the top of runner.rb
def run
  @crawler = Tentacles::Crawler.from_uri(@options.uri)
  output = @crawler.words_by_selector(@options.post_selector, 'ul:last-child')

  # Dir.exists? was removed in Ruby 3.2; Dir.exist? works everywhere
  Dir.mkdir('../../output') unless Dir.exist?('../../output')

  File.open("../../output/word_count.json", "w") do |file|
    file.puts JSON.pretty_generate(output)
  end
end
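
One thing to watch: '../../output' is resolved against the current working directory, so where the file lands depends on where you launch the runner or the specs from. Here's a rough sketch of a write step that takes an explicit base directory instead. The write_word_count helper is hypothetical and not part of the runner; in the real project the base might be something like File.expand_path('../..', __dir__), but that's an assumption about the folder layout.

```ruby
require 'fileutils'
require 'json'
require 'tmpdir'

# Hypothetical helper: writes data as pretty-printed JSON under base_dir/output.
def write_word_count(base_dir, data)
  output_dir = File.join(base_dir, 'output')
  FileUtils.mkdir_p(output_dir) # no error if the directory already exists
  path = File.join(output_dir, 'word_count.json')
  File.write(path, JSON.pretty_generate(data))
  path
end

written = nil
exists = false
Dir.mktmpdir do |dir|
  # A temp directory stands in for the project root in this example.
  written = write_word_count(dir, { word_count: [{ word: 'hello', count: 3 }] })
  exists = File.exist?(written)
end
```

FileUtils.mkdir_p also replaces the "mkdir unless exist?" check, since it quietly does nothing when the directory is already there.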

And there we go. Our first decent output from the crawler :D -Rob


  • Mood: Awake
  • Sleep: 6
  • Hunger: 4
  • Coffee: 0
