November 25, 2014

Exploring Pinboard bookmarks

I mentioned on Twitter recently that I use pinboard-backup to make a version-controlled local copy of my Pinboard bookmarks every night. (It came up in the context of my passing the 5,000 bookmarks milestone; make of that what you will.) Anyway, I started wondering just what information I could glean from it. The file is a Pinboard export—a straight dump—of all my bookmarks in the Netscape bookmark file format. A sample looks like this:

<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Pinboard Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
<DT><A HREF="https://chronicle.com/article/Speed-Kills/149401/" ADD_DATE="1416783511" PRIVATE="1" TOREAD="0" TAGS="slow">Speed Kills - The Chronicle Review - The Chronicle of Higher Education</A>
</DL></p>

It’s very easy to go to the Pinboard site and find out how many bookmarks one has, how many are to-read items, how many are public versus private, and so on, but the ADD_DATE caught my eye: maybe I could see how many bookmarks I was stashing per month, or per day. Glancing at the date, I thought the 14 in 1416783511 was the year, but it’s not. The date is expressed as seconds elapsed since midnight UTC on January 1, 1970, AKA Unix time (or POSIX time, or Epoch time). I’ve been learning Python and decided to have a go and see if I could make any sense of it. Here’s the result.

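from bs4 import BeautifulSoup
import datetime

# Parse the Pinboard export; naming the parser avoids a bs4 warning
with open('pinboard/bookmarks.html', encoding='utf-8') as f:
    bookmarks = BeautifulSoup(f, 'html.parser')

with open('dates.txt', 'w') as dates:
    for link in bookmarks.find_all('a'):
        # ADD_DATE is Unix time: seconds since midnight UTC, January 1, 1970
        epochdate = float(link.get('add_date'))
        # Year-month-day (local time zone), so the lines sort chronologically
        date = datetime.datetime.fromtimestamp(epochdate).strftime('%Y-%m-%d')
        dates.write(date + '\n')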
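As a quick check on the conversion the script relies on, here is a minimal sketch that turns the ADD_DATE from the sample bookmark into a readable date. The helper name epoch_to_date is my own, and it pins the time zone to UTC so the answer is the same on any machine, whereas the script uses the local time zone, so dates close to midnight may come out a day apart.

```python
from datetime import datetime, timezone

def epoch_to_date(seconds):
    """Convert Unix time (seconds since 1970-01-01 UTC) to a YYYY-MM-DD string."""
    # float() accepts the attribute as the string it arrives as
    moment = datetime.fromtimestamp(float(seconds), tz=timezone.utc)
    return moment.strftime('%Y-%m-%d')

# ADD_DATE from the sample bookmark
print(epoch_to_date('1416783511'))  # 2014-11-23
```

So the leading 14 is just a coincidence. The script does the same for every bookmark and writes the results to dates.txt.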

This gives me a file containing a list of dates in year-month-day format, like this:

2014-11-24
2014-11-23
2014-11-23
2014-11-23
2014-11-23
2014-11-22
2014-11-21
2014-11-21
2014-11-20
2014-11-20
2014-11-19
2014-11-19
2014-11-19
2014-11-19
2014-11-19
2014-11-18
2014-11-18
...

Running that through sort dates.txt | uniq -c gives me a nice list of days and the number of bookmarks saved on each.

2 2014-11-18
5 2014-11-19
2 2014-11-20
2 2014-11-21
1 2014-11-22
4 2014-11-23
1 2014-11-24
...
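The same tally can be done without leaving Python. This is only a sketch of an alternative to sort | uniq -c, not what I actually ran: the function name count_per_day is made up, and the sample list is just the seventeen dates shown above rather than the whole file.

```python
from collections import Counter

def count_per_day(lines):
    """Count how often each date string occurs, ignoring blank lines."""
    return Counter(line.strip() for line in lines if line.strip())

# The excerpt of dates.txt shown earlier (not the full file)
sample = (
    ['2014-11-24'] + ['2014-11-23'] * 4 + ['2014-11-22'] +
    ['2014-11-21'] * 2 + ['2014-11-20'] * 2 +
    ['2014-11-19'] * 5 + ['2014-11-18'] * 2
)

# Print in date order, count first, like uniq -c
for day, n in sorted(count_per_day(sample).items()):
    print(n, day)
```

With the real file you would pass open('dates.txt') instead of the sample list.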

And if I run that through sort again, like sort dates.txt | uniq -c | sort -g, I get a leaderboard, with the busiest days at the bottom…

 21 2012-07-18
 22 2012-07-02
 23 2011-12-13
 24 2012-11-08
 26 2012-05-30
 28 2012-11-05
 38 2012-11-07
123 2014-09-10
...

The 123 bookmarks on 10 September 2014 were the result of a script that imported all my starred repos from GitHub, so that’s an anomaly, but the others are accurate reflections, so my biggest day for bookmarking was 7th November 2012.
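If you would rather have the leaderboard ignore a known anomaly like that import, something along these lines would do it. It is a hypothetical sketch: the leaderboard function and its exclude parameter are my own invention, and the counts are four of the days from the list above.

```python
def leaderboard(day_counts, exclude=(), n=3):
    """Rank days by bookmark count, busiest first, skipping excluded days."""
    kept = {day: count for day, count in day_counts.items() if day not in exclude}
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# A few of the per-day counts from the leaderboard above
counts = {
    '2012-05-30': 26,
    '2012-11-05': 28,
    '2012-11-07': 38,
    '2014-09-10': 123,  # the GitHub stars import
}

print(leaderboard(counts, exclude={'2014-09-10'}))
# [('2012-11-07', 38), ('2012-11-05', 28), ('2012-05-30', 26)]
```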

So what about monthly figures? Well, if I cut just the year and month out of each line of my dates.txt I can get a nice monthly reckoning with cat dates.txt | cut -c 1-7 | sort | uniq -c

 146 2014-03
 149 2014-04
  82 2014-05
  68 2014-06
  77 2014-07
  99 2014-08
 221 2014-09
 122 2014-10
  61 2014-11
...

…and a Top 10 by month? Sorting the counts in reverse and keeping the first ten lines does it: cat dates.txt | cut -c 1-7 | sort | uniq -c | sort -rg | head -n 10

359 2012-11
263 2012-07
236 2012-06
221 2014-09
202 2012-08
198 2012-10
188 2012-09
187 2012-12
167 2012-05
163 2013-10
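For completeness, the monthly ranking can also be sketched in Python. The idea is the same as cut -c 1-7: the first seven characters of a year-month-day string are the year and month. The name top_months is made up, and the sample dates are invented purely for illustration.

```python
from collections import Counter

def top_months(dates, n=10):
    """Bucket YYYY-MM-DD strings by their YYYY-MM prefix and rank the buckets."""
    months = Counter(d.strip()[:7] for d in dates if d.strip())
    # most_common returns (month, count) pairs, largest count first
    return months.most_common(n)

# Hypothetical dates, not taken from my real dates.txt
sample = ['2014-11-24', '2014-11-23', '2014-10-02',
          '2014-09-10', '2014-09-10', '2014-09-11']

print(top_months(sample, n=2))  # [('2014-09', 3), ('2014-11', 2)]
```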

I now need to go and have a good long look at the bookmarks I saved in November ’12 and see what was so damned important.