March 29, 2015
This post should conclude my current Pinboard obsession, and it’s a post that I wondered about writing at all for a number of reasons: Pinboard offers a great paid full text search option, this script hammers the Instapaper API and the code is rough and ready. But with all that said, this script did save my bacon twice already during internet outages so I will leave it here as a Proof of Concept. The Python script doesn’t create the
urls directory, doesn’t perform any fancy rate-limiting or threading, and is specific to my setup but you should be able to get it to work on your own machine if you want to try it out. It relies on a local Pinboard bookmarks file, which you can export from the Pinboard site or grab using the pinboard-backup Perl script that I use. The script sends each url it finds in the bookmarks file to Instapaper to be converted to Markdown and writes the resulting file to a local directory. It’s quick and a little bit nasty but it works. Some urls just won’t lend themselves to conversion to Markdown, some I have deliberately excluded (images, scripts, pdfs) and the script will try to just keep on working through most errors.
#!/usr/bin/env python import os import codecs import re from bs4 import BeautifulSoup from html2md import UrlToMarkdown bookmarks = open('/Users/larry/pinboard/bookmarks.html') soup = BeautifulSoup(bookmarks) u2md = UrlToMarkdown('instapaper') for link in soup.find_all('a'): link = link.get('href').encode('utf-8') if re.search('(.pl|.sh|.txt|.pdf|.rb|.jpe?g|.png|.gif)$', link): print 'Skipping ', link continue localfile = link.split('/') try: if link.split('/') in link: localfile = localfile + '_' + link.split('/') if link.split('/') in link: localfile = localfile + '_' + link.split('/') if link.split('/') in link: localfile = localfile + '_' + link.split('/') except: localfile = localfile localfile = localfile + '.markdown' if os.path.exists('/Users/larry/urls/%s' % localfile): print 'Skipping ' '/Users/larry/urls/%s' % localfile continue try: markdown = u2md.convert(link) with codecs.open('/Users/larry/urls/%s' % localfile, 'w', 'utf-8') as file: file.write(markdown) file.close() except IOError: pass except AttributeError: pass except Exception: pass