At the beginning of September I accepted a challenge to write one blog post a day for the rest of the month. With the month coming to an end, I thought it would be interesting to see how I did, and also do a quick analysis of my posts.
September has 30 days. I wrote 28 posts. Try as much as you want, 28 does not equal 30. The math simply does not work out. So I did fail in my attempt at doing a post a day. But on the bright side I did get pretty close. And this is also the most I’ve ever posted in a month. So yay.
I took my small sample set (you’ve heard of big data? This is small, artisanal, and organic data) of the blog posts I wrote and ran it through a simple script to get some statistics. The script takes a single command-line argument: a folder containing all the posts, where the name of each file is the title of the post and the content of the file is the content of the post.
Yes, I know the regular expression used to split a post into words is very primitive and doesn’t handle all the cases. It doesn’t really matter for such a simple analysis. This script is probably not the most elegant or efficient. But I’m pretty tired and should really be sleeping instead of writing this post right now.
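For anyone who wants to try something similar, here is a minimal sketch of what such a script could look like in Python. It is not the exact script described above: the function names (`load_posts`, `split_words`, `analyze`), the `\w+` regular expression, and the choice to lowercase words before counting are all assumptions made for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Primitive on purpose: any run of letters, digits, or underscores is a "word".
# Contractions such as "I'm" are split into two tokens.
WORD_RE = re.compile(r"\w+")


def load_posts(folder):
    """Map each file name (the post title) to the file's text (the post body)."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in Path(folder).iterdir()
        if path.is_file()
    }


def split_words(text):
    """Split a post into words using the primitive regex above."""
    return WORD_RE.findall(text)


def analyze(posts):
    """Compute basic statistics for a dict of {title: body}."""
    lengths = {title: len(split_words(body)) for title, body in posts.items()}
    # Lowercase before counting so "Time" and "time" are the same word.
    counts = Counter(
        word.lower() for body in posts.values() for word in split_words(body)
    )
    by_length = sorted(lengths.items(), key=lambda item: item[1])
    total = sum(lengths.values())
    return {
        "shortest": by_length[:3],        # ascending order
        "longest": by_length[::-1][:3],   # descending order
        "average": total / len(lengths) if lengths else 0.0,
        "total": total,
        "unique": len(counts),
        "top": counts.most_common(100),
    }
```

With the posts in a folder (say, a hypothetical `posts/` directory), `analyze(load_posts("posts/"))` returns a dictionary holding the numbers discussed below.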
Onto the statistics.
My shortest posts (in ascending order) were:
- M2 with 11 words.
- What I’m Currently Listening To: Scale the Summit with 26 words.
- What I’m Currently Listening To: Foals with 29 words.
I’m not surprised by #2 and #3; these posts are typically very short. #1 is short because I was exhausted from MHacks.
My longest posts (in descending order) were:
I’m surprised that my longest posts aren’t that long.
Which brings me to my average post length, which is 138.96 words. A friend of mine said that I should write longer posts. I think I agree.
The total number of words I wrote is 3891. The number of unique words is 1107. In other words (pun intended), about 28.5% of the total is made up of unique words. I wonder what the average value for this ratio is.
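The arithmetic behind the average and the ratio is easy to check. The snippet below is just a sketch that plugs in the numbers from this post; the variable names are made up for illustration. (This unique-to-total ratio is usually called the type-token ratio, and it tends to drop as the amount of text grows, so comparing it across samples of different sizes is tricky.)

```python
# Numbers taken from this post.
post_count = 28
total_words = 3891
unique_words = 1107

average_length = total_words / post_count
unique_ratio = unique_words / total_words

print(f"{average_length:.2f} words per post")  # prints "138.96 words per post"
print(f"{unique_ratio:.2%} unique words")      # prints "28.45% unique words"
```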
The list of the 100 words I used the most is dominated by commonly used words in the English language. This makes sense because I did not do any pre-processing on the data to remove/ignore them. Other words that I used a lot are:
- time. 24 occurrences.
- think. 15 occurrences.
- Portland. 9 occurrences (I was in Portland for the long weekend in September).
- climbing. 7 occurrences (I discovered climbing very recently).
- MHacks. 6 occurrences (I was at MHacks in September).
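Ignoring the most common English words (often called stop words) before counting is a small change. The sketch below shows one way to do it; the `top_words` helper and the tiny hand-picked `STOP_WORDS` set are assumptions for illustration, and a real analysis would want a much fuller list.

```python
from collections import Counter

# A tiny, hand-picked stop-word list, purely for illustration.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "on", "for", "with",
    "i", "my", "is", "was", "it", "that", "this",
}


def top_words(words, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent words, skipping stop words (case-insensitive)."""
    counts = Counter(
        word.lower() for word in words if word.lower() not in stop_words
    )
    return counts.most_common(n)
```

For example, `top_words("I think the time in Portland was time for climbing".split(), n=1)` returns `[("time", 2)]`.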
Overall, this blogging challenge turned out to be harder than I expected. But it was also super fun! I will use the momentum I built up over the month to (hopefully) maintain a more frequent posting schedule.