  • As I’ve said in previous posts, I like being able to quickly generate sample data for the projects I’m working on. It allows new developers to get up to speed fast, and existing developers to move faster.

    When I don’t have a sample data generation method, I’m always scared to test whether, for example, deleting a project deletes all of its todos in a project tracking system, simply because I’d have to recreate that project and all its todos by hand. Many times I end up not testing those destructive actions as often as I should.

    The other reason for having a stable set of sample data is that you get to know it: “Hey! The users Paul and John are supposed to be on the same team, why am I not seeing them together? Something is broken”. To help with that I also use data that we already know. If I have teams with members, I would create one team called Beatles with John, Paul, George and Ringo, and another called Queen with Freddie, Brian, Roger and John. If you see Paul next to Freddie, something is broken.

    To generate the sample data I use factories, which I also use for testing instead of fixtures. If you are not familiar with factories, please stop reading and go check out factory girl. I don’t care if you never come back to this blog, as long as you start using factories instead of fixtures. Factories are so much better! But that has probably been repeated a thousand times over the web, so I’m not going to go into details.

    In lib/tasks/data.rake I end up creating:

    namespace :db do
      desc "Generate sample data for developing"
      task :sample_data => :environment do
        destroy_data()
    
        puts "==  Data: generating sample data ".ljust(79, "=")
    
        beatles = Factory.create :team, :name => "The Beatles"
        Factory.create :user, :name => "John Lennon", :team => beatles
        Factory.create :user, :name => "Paul McCartney", :team => beatles
        Factory.create :user, :name => "George Harrison", :team => beatles
        Factory.create :user, :name => "Ringo Starr", :team => beatles
    
        queen = Factory.create :team, :name => "Queen"
        Factory.create :user, :name => "Freddie Mercury", :team => queen
        Factory.create :user, :name => "Brian May", :team => queen
        Factory.create :user, :name => "John Deacon", :team => queen
        Factory.create :user, :name => "Roger Taylor", :team => queen
    
        puts "==  Data: generating sample data (done) ".ljust(79, "=") + "\n\n"
      end
    end
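
    The repeated calls above could also be driven by a small data structure. This is only a minimal sketch: SAMPLE_TEAMS and generate_sample_teams are hypothetical names, and the lambda at the end is a stand-in for Factory.create so the snippet runs without Rails.

    ```ruby
    # Hypothetical constant: team name => member names, mirroring the task above.
    SAMPLE_TEAMS = {
      "The Beatles" => ["John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"],
      "Queen"       => ["Freddie Mercury", "Brian May", "John Deacon", "Roger Taylor"]
    }.freeze

    # Builds every team and its users through the given callable.
    # `create` is expected to behave like Factory.create(factory_name, attributes).
    def generate_sample_teams(create)
      SAMPLE_TEAMS.flat_map do |team_name, member_names|
        team = create.call(:team, :name => team_name)
        users = member_names.map do |name|
          create.call(:user, :name => name, :team => team)
        end
        [team, *users]
      end
    end

    # Stand-in for Factory.create (an assumption, only so the sketch runs on its own):
    # it returns the attributes tagged with the factory name.
    fake_create = lambda { |factory, attrs| attrs.merge(:factory => factory) }
    records = generate_sample_teams(fake_create)
    puts records.size
    ```

    Inside the rake task, Factory.method(:create) could be passed in place of the stand-in lambda.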
    

    For the implementation of destroy_data look at Deleting all records in a Rails project.

    The problem with doing that with factories is that it is too silent. I like knowing what’s going on, and for new developers it’s good to get a glimpse of the data. All users have the same password, so after rake db:sample_data finishes, a new developer already knows what email and password to use to log in. If you want to make it even easier, you can print out the password during sample data generation.

    The password is of course defined in the user factory:

    Factory.define :user do |user|
      user.email { Factory.next :email }
      user.password "testing"
      user.password_confirmation "testing"
    end
    

    To make factories verbose I created VFactory (for Verbose Factory, of course), which you use just like Factory but which prints out everything it does. This is its code:

    # Verbose factory.
    module VFactory
      def self.create *args
        human_factory_name = args.first.to_s.gsub("_", " ")
        if args.size > 1
          human_arguments = args.second.map { |name, value| "#{name}=>#{value.is_a?(Array) ? value.join(", ") : value}" }.to_sentence
          puts "-- creating #{human_factory_name} with #{human_arguments}."
        else
          puts "-- creating #{human_factory_name}."
        end
        Factory.create(*args).tap do |obj|
          puts "   -> done: #{obj}"
        end
      end
    end
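
    VFactory leans on ActiveSupport (args.second and to_sentence), so it only works inside Rails. To illustrate the message it builds, here is a plain Ruby sketch. Both describe_creation and join_as_sentence are hypothetical helpers, not part of the module above, and the joining only approximates what to_sentence does by default.

    ```ruby
    # Rough approximation of ActiveSupport's Array#to_sentence defaults.
    def join_as_sentence(words)
      case words.size
      when 0 then ""
      when 1 then words.first
      when 2 then words.join(" and ")
      else "#{words[0..-2].join(", ")}, and #{words.last}"
      end
    end

    # Returns the line VFactory would print before creating the object.
    def describe_creation(factory_name, attributes = {})
      human_factory_name = factory_name.to_s.gsub("_", " ")
      return "-- creating #{human_factory_name}." if attributes.empty?

      human_arguments = attributes.map do |name, value|
        "#{name}=>#{value.is_a?(Array) ? value.join(", ") : value}"
      end
      "-- creating #{human_factory_name} with #{join_as_sentence(human_arguments)}."
    end

    puts describe_creation(:user, :name => "John Lennon", :team => "The Beatles")
    # prints: -- creating user with name=>John Lennon and team=>The Beatles.
    ```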
    

    The output of this is more or less like this:

    ==  Data: generating sample data ==============================================
    -- creating team with name=>The Beatles.
       -> done: #
    -- creating user with name=>John Lennon and team=>#.
       -> done: #
    -- creating user with name=>Paul McCartney and team=>#.
       -> done: #
    -- creating user with name=>George Harrison and team=>#.
       -> done: #
    -- creating user with name=>Ringo Starr and team=>#.
       -> done: #
    -- creating team with name=>Queen.
       -> done: #
    -- creating user with name=>Freddie Mercury.
       -> done: #
    -- creating user with name=>Brian May.
       -> done: #
    -- creating user with name=>John Deacon.
       -> done: #
    -- creating user with name=>Roger Taylor.
       -> done: #
    ==  Data: generating sample data (done) =======================================
    

    If you are wondering why my objects look so pretty when printed, that’s because I always define a to_s in every model that contains the id and other important data. In this case it would be:

    def to_s
       ""
    end
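
    As a minimal sketch of what such a to_s could look like, here is one on a plain Struct instead of an ActiveRecord model, so it runs on its own. The Team struct and the exact output format are assumptions for illustration, not the code used in the project.

    ```ruby
    # Stand-in for an ActiveRecord model (an assumption so the sketch runs without Rails).
    Team = Struct.new(:id, :name) do
      # Inspect-like output with the id and the most useful attribute.
      def to_s
        "#<#{self.class.name} id: #{id}, name: #{name.inspect}>"
      end
    end

    puts Team.new(1, "The Beatles")
    # prints: #<Team id: 1, name: "The Beatles">
    ```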
    

    That’s very useful for debugging. I also try to always have a name method in my models that gives me something that represents the object and that I can show to users.

    The next step in data awesomeness would be being able to download and import all production data with one command. This really helps with reproducing and debugging reported issues, especially when those issues are related to destructive changes.

    Update: this is now a gem.