#See TumblrCleanr class for full documentation

require 'rubygems'

# Note: not using the Tumblr API because it does not support delete
# http://ruby-tumblr.rubyforge.org/

require 'mechanize' # http://mechanize.rubyforge.org/mechanize
require 'net/http'  # http://www.ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTP.html
require 'uri'
require 'logger'

#== TumblrCleanr - clean/reset your tumblr by deleting all posts
#Author:: engtech (http://InternetDuctTape.com, http://rubeh.tumblr.com)
#Copyright:: Copyright (c) 2008 engtech
#License:: Creative Commons Attribution-Noncommercial 2.5 License
#Id:: $Id: $
#
#Tumblr is rapidly becoming my favorite hosted blogging platform (more so than Blogger/WordPress.com) because of all the things it does right:
#
#- RSS feed importing
#- free domain name support
#- free CSS/theme support
#- Google Analytics support
#- keeping it simple
#
#However, there's one feature that's missing: how do you delete your Tumblr? At some point you might want to destroy all traces of your tumblr (for privacy reasons, or because you want to use it for something else), and there isn't an option to do that other than clicking the delete button on every individual post. I wanted to repurpose a tumblr I had been using for feed aggregation, and it had over 18,000 posts. That's a lot of clicks.
#
#Enter the TumblrCleanr.
#Provide it with your tumblr domain name as well as your email address and password, and it will delete up to the latest 3000 posts at a time. You can keep running it until your entire tumblr is clean as a whistle.
#
#== Privacy Concerns
#
#TumblrCleanr does not store your login information anywhere and only uses it to communicate with tumblr.com. You will have to re-enter your login details every time you run the program.
#
#== License
#
#This work is licensed under the Creative Commons
#Attribution-Noncommercial 2.5 License.
#
#To view a copy of this license, visit
# http://creativecommons.org/licenses/by-nc/2.5/ or
#send a letter to
# Creative Commons, 543 Howard Street, 5th Floor,
# San Francisco, California, 94105, USA.
#
class TumblrCleanr
  # tumblr domain name is set by the query_user method
  @domain
  # email address is set by the query_user method
  @email
  # password is set by the query_user method
  @password
  # WWW::Mechanize agent is created by the login method
  @agent
  # Array of tumblr postids is set by the parse_archive method
  @postids

  #Initializing TumblrCleanr starts an interactive prompt asking for your tumblr domain name, email address and password.
  #domain name:: the domain name used to access your tumblr, without the http:// prefix (e.g. popstar.tumblr.com)
  #email address:: the email address used to log in to tumblr (e.g. brittney.spears@gmail.com)
  #password:: the password for your tumblr account, not the password for your email address
  #
  #When the program finishes, the user is prompted to press enter to quit.
  #
  def initialize
    begin
      puts "Welcome to TumblrCleanr by http://InternetDuctTape.com"
      query_loop
      parse_archive
      clean
      puts "Success"
    rescue Interrupt => e
      puts "User pressed Ctrl-C"
    rescue Exception => e
      puts "#{e.class}: #{e.message}"
      puts e.backtrace.join("\n")
    end
    puts "Press enter to exit..."
    gets
  end

  private

  #Keeps asking the user for their login information until they enter something that works.
  #Press Ctrl-C to exit the loop (and the program).
  #
  def query_loop
    login_success = false
    until login_success
      begin
        query_user
        login
        login_success = true
      rescue Interrupt => e
        raise Interrupt, "user abort"
      rescue Exception => e
        puts "#{e.class}: #{e.message}"
        puts e.backtrace.join("\n")
        login_success = false
        puts ""
        puts "Unable to log in with #{@email}/#{@password} on #{@domain}"
        puts "Type 'Ctrl-C' to abort"
      end
    end
  end

  #Asks the user for @domain, @email, and @password.
  #
  #Side Effect:: sets @domain, @email and @password
  #
  def query_user
    puts ""
    puts "Tumblr domain (e.g. popstar.tumblr.com): "
    @domain = gets.chomp
    puts "Tumblr email address (e.g. brittney.spears@gmail.com): "
    @email = gets.chomp
    puts "Tumblr password (e.g. kfedsux): "
    @password = gets.chomp
  end

  #Creates a WWW::Mechanize @agent and uses it to verify that @domain is correct and
  #that the @email/@password combination logs in.
  #
  #Side Effect:: creates @agent
  #
  def login
    puts "Trying to connect to tumblr"
    @agent = WWW::Mechanize.new do |a|
      # a.log = Logger.new("mech.log")
      # a.log.level = Logger::DEBUG
      a.redirect_ok = true
      a.user_agent_alias = 'Windows Mozilla'
    end

    # Is the domain any good? Mechanize raises an error (e.g. a 404) if it is bad.
    @agent.get("http://#{@domain}")

    # Can the user log in?
    page = @agent.get('http://www.tumblr.com/login')
    login_form = page.forms.first
    login_form.email = @email
    login_form.password = @password
    result = login_form.submit(login_form.buttons.first)
    raise "Bad email address or password" unless "Logging in..." == result.title
  end

  #The Tumblr API does not provide a bandwidth-efficient way of getting a list of all postids
  #without downloading the entire posts as well.
  #This is a bad hack that uses the /archive page to get a list of up to 3000 post_ids at a time.
  #It uses Net::HTTP because the post_ids are stored in JavaScript, which Mechanize can't access.
  #e.g.: location.href='http://rubeh.tumblr.com/post/22655521'
  #
  #Side Effect:: sets up @postids as an array of postids (as strings)
  #
  def parse_archive
    url = URI.parse("http://#{@domain}/archive")
    req = Net::HTTP::Get.new(url.path)
    res = Net::HTTP.start(url.host, url.port) { |http| http.request(req) }

    # Split the body of the archive page into chunks that each contain at most one postid,
    # then use a regular expression to extract the postid from each chunk.
    @postids = res.body.split("onclick").map { |chunk|
      (chunk =~ %r{location\.href='http://[^/]+/post/(\d+)}) ? $1 : nil
    }.compact
  end

  # Using the list of @postids from parse_archive, iterate through them and send an HTTP POST
  # to the /delete action for each one (the postid goes in the 'id' parameter).
  # It does not check that the delete succeeds. In fact, it intentionally asks to be redirected
  # to a 404 page to reduce bandwidth.
  #
  def clean
    total_ids = @postids.size
    @postids.each_with_index do |postid, i|
      puts "\nDeleted #{i}/#{total_ids}" if i % 25 == 0
      print "."
      # Usually tumblr redirects to the dashboard after a delete happens.
      # Redirecting to /404 instead is intentional because it is much less bandwidth intensive;
      # any error raised for the 404 response is swallowed by the rescue.
      @agent.post("http://www.tumblr.com/delete", 'id' => postid, 'redirect_to' => '/404') rescue nil
    end
    puts
  end
end
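
# Illustrative sketch, not part of the original TumblrCleanr: the post id
# extraction done in TumblrCleanr#parse_archive can be tried offline as a pure
# function. It assumes the archive markup contains strings of the form
# location.href='http://<domain>/post/<id>' (as the parse_archive comment
# describes); Tumblr's real markup may differ. The module and method names
# below are hypothetical. Unlike parse_archive, it uses String#scan to collect
# every match directly instead of splitting the page on "onclick" first.
module TumblrPostIdExample
  # Same pattern as parse_archive: capture the digits after /post/
  POSTID_PATTERN = %r{location\.href='http://[^/]+/post/(\d+)}

  # Returns every post id found in +html+ as a String, in document order.
  def self.extract_postids(html)
    # scan yields one [id] array per match because of the single capture group
    html.scan(POSTID_PATTERN).flatten
  end
end

# Usage (left as a comment so that loading this file has no side effects):
#   html = "<a onclick=\"location.href='http://rubeh.tumblr.com/post/22655521'\">"
#   TumblrPostIdExample.extract_postids(html)  # => ["22655521"]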