November 11, 2016

Converting Pinboard’s XML export to Netscape HTML bookmark format

I use a Perl script, written by Gareth Jones, to make a nightly local backup of my Pinboard bookmarks. And it’s great. It does, however, involve keeping my username and password in the script so it can simulate a full login. It does this because you can’t get the delightfully archaic - but seemingly universally accepted - Netscape HTML bookmark format exported through the Pinboard API, you can only get XML or JSON. (I had to figure this out the hard way when I was messing around with a Pastebin and accidentally uploaded the backup script. With my credentials in it.)

I did a bit of digging and found Python’s xml.etree.ElementTree module, which makes short work of XML parsing. The following script takes two arguments: an --infile (-i) and an --outfile (-o). The infile should be an XML export of your Pinboard bookmarks. You can get this from the Pinboard API with something like:

curl -G https://api.pinboard.in/v1/posts/all \
    --data auth_token="YOUR PINBOARD API TOKEN" \
    -o pb.xml

(You can find your API Token on the Pinboard settings page.) Once you have your XML file, just run the script with it as the infile, like:

python pb.py -i pb.xml -o pb.html

…and you should have your bookmark file. I know very little about the Netscape bookmark file format, but it seems to be the lingua franca for bookmark import/export. The page most usually referenced seems to be this one, on Microsoft.com, though I did get some clues on this Mozilla-related page. For our purposes, that page gives a format of:

<DT><A HREF="{url}" SHORTCUTURL="{keyword}" ICON="{data:icon}" ADD_DATE="{date}" LAST_MODIFIED="{date}" LAST_VISIT="{date}" ID="{rdf:id}">{title}</A>
<DD>{description}

I’ve deliberately ignored the optional description <DD> (which is known as “extended” in Pinboard parlance, for compatibility with the old Del.icio.us API) and just written the URL, tags, date and title.

#!/usr/bin/env python

# Copyright (c) 2016 Larry Hynes <my first name at larryhynes.com>
# 
# Permission to use, copy, modify, and distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
# 
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

import xml.etree.ElementTree as ET

import argparse
import calendar
import codecs
import time

parser = argparse.ArgumentParser()
parser.add_argument("-i", "--infile",
    help="The xml file to read from", required=True)
parser.add_argument("-o", "--outfile",
    help="The bookmark file to write to", required=True)
args = parser.parse_args()

tree = ET.parse(args.infile)
root = tree.getroot()

with codecs.open(args.outfile, 'w', 'utf-8') as file:
    file.write("""<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Pinboard Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL>
""")

    for child in root:
        # get the fields we're interested in
        url = child.attrib.get('href')
        title = child.attrib.get('description')
        tags = child.attrib.get('tag')
        date = child.attrib.get('time')
        # format them to our needs
        comma_delimited_tags = tags.replace(' ', ',')
        time_as_struct = time.strptime(date, "%Y-%m-%dT%H:%M:%SZ")
        epoch_time = calendar.timegm(time_as_struct)
        # write them to the bookmark file
        entry = "    <DT><A HREF=\"%s\" ADD_DATE=\"%d\" \
TAGS=\"%s\">%s</A>\n" % (url, epoch_time, comma_delimited_tags, title)
        file.write(entry)

    file.write("</DL>\n")
file.close()

Do note that we’re dealing with XML here, and running this script with untrusted input could have icky consequences… There’s a nice big warning at the top of the page at Python.org.

“But I have found that sitting under the ElementTree, one can feel the Zen of XML.”