March 29, 2015
This post should conclude my current Pinboard obsession, and it’s a post that I wondered about writing at all for a number of reasons: Pinboard offers a great paid full text search option, this script hammers the Instapaper API and the code is rough and ready. But with all that said, this script did save my bacon twice already during internet outages so I will leave it here as a Proof of Concept. The Python script doesn’t create the urls
directory, doesn’t perform any fancy rate-limiting or threading, and is specific to my setup but you should be able to get it to work on your own machine if you want to try it out. It relies on a local Pinboard bookmarks file, which you can export from the Pinboard site or grab using the pinboard-backup Perl script that I use. The script sends each url it finds in the bookmarks file to Instapaper to be converted to Markdown and writes the resulting file to a local directory. It’s quick and a little bit nasty but it works. Some urls just won’t lend themselves to conversion to Markdown, some I have deliberately excluded (images, scripts, pdfs) and the script will try to just keep on working through most errors.
#!/usr/bin/env python
import os
import codecs
import re
from bs4 import BeautifulSoup
from html2md import UrlToMarkdown
bookmarks = open('/Users/larry/pinboard/bookmarks.html')
soup = BeautifulSoup(bookmarks)
u2md = UrlToMarkdown('instapaper')
for link in soup.find_all('a'):
link = link.get('href').encode('utf-8')
if re.search('(.pl|.sh|.txt|.pdf|.rb|.jpe?g|.png|.gif)$', link):
print 'Skipping ', link
continue
localfile = link.split('/')[2]
try:
if link.split('/')[3] in link:
localfile = localfile + '_' + link.split('/')[3]
if link.split('/')[4] in link:
localfile = localfile + '_' + link.split('/')[4]
if link.split('/')[5] in link:
localfile = localfile + '_' + link.split('/')[5]
except:
localfile = localfile
localfile = localfile + '.markdown'
if os.path.exists('/Users/larry/urls/%s' % localfile):
print 'Skipping ' '/Users/larry/urls/%s' % localfile
continue
try:
markdown = u2md.convert(link)
with codecs.open('/Users/larry/urls/%s' % localfile, 'w', 'utf-8') as file:
file.write(markdown)
file.close()
except IOError:
pass
except AttributeError:
pass
except Exception:
pass