I can’t remember who got me interested in wordle.net, but it struck me that I would love to see all of my Twitter posts as a “Beautiful Word Cloud” (the site’s own phrase). The Java applet on that site transforms a bunch of text (or a site) into a word cloud in which the size of each word represents its frequency in the text. I saw the post (and source code) from Adam Franco for retrieving Twitter content using PHP, but I was more interested in a PowerShell version, and I didn’t want the posts in XML, since that wouldn’t be useful input to Wordle.
Starting from the end … what I wanted was to use either of the following lines in PowerShell:
"testfirst" | Export-Tweets | out-file adam.txt
or
Export-Tweets -User "testfirst" | out-file adam.txt
The first of these uses the pipeline to receive the Twitter username, which is useful if you want to put it into a loop to, say, extract the contents of more than one user account. The second one is a more literal request for the tweets of a single Twitterer. In both cases, the output is saved to a file name of my choice using the handy out-file cmdlet. In other words, I don’t want a script that hard-codes the output file name.
Note that you first have to get the Export-Tweets function into memory, so a more complete scenario that depicts my intended usage is the following, typed into a PowerShell console:
.\Export-Tweets.ps1
"testfirst" | Export-Tweets | out-file adam.txt
The following script accomplishes these modest goals. There are two PowerShell tricks to point out. The first trick is enabling pipeline input for a function. You’ll see that in the definition of the function Export-Tweets, since it has the begin{}, process{}, and end{} blocks. Pipeline input is handled in the process{} block, which runs once per incoming item, while the explicit syntax with the User parameter is handled in the end{} block. The trick is to define the function that does the real work, GetUserTweets, in the begin{} block so that both of the other blocks can call it.
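To see the begin{}/process{}/end{} layout on its own, here is a minimal sketch that should behave the same way without touching Twitter. Show-Greeting and Get-Greeting are hypothetical names made up for this illustration; they are not part of the script.

```powershell
# Hypothetical demo function: same begin/process/end layout as Export-Tweets,
# with a trivial worker standing in for the Twitter calls.
function Show-Greeting([string] $Name) {
    begin {
        # Defined once, before any pipeline input arrives.
        function Get-Greeting([string] $Who) { "Hello, $Who" }
    }
    process {
        # Runs once per pipeline item; $_ is empty when nothing was piped in.
        if ($_) { Get-Greeting $_ }
    }
    end {
        # Runs once after all input; handles the explicit -Name parameter.
        if ($Name) { Get-Greeting $Name }
    }
}

"alice", "bob" | Show-Greeting      # pipeline form
Show-Greeting -Name "carol"         # parameter form
```

Because the worker lives in begin{}, both calling styles end up in the same code path, which is the point of the trick.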
The second trick is substituting % for foreach; % is a built-in alias for the ForEach-Object cmdlet. This really cuts down on the script length, but use that substitution with caution if you’re writing scripts that other people have to read. It’s a useful and easily learned shorthand, so I wanted to use it in this case.
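As a quick illustration of that shorthand, the two pipelines in this sketch should produce the same result. The $words list is made-up sample data, not anything from the script.

```powershell
# Made-up sample data for the comparison.
$words = "alpha", "beta", "gamma"

# Long form: the ForEach-Object cmdlet spelled out.
$long = $words | ForEach-Object { $_.ToUpper() }

# Short form: % is a built-in alias for ForEach-Object.
$short = $words | % { $_.ToUpper() }

# Get-Alias shows what % resolves to.
(Get-Alias -Name '%').Definition
```

The short form saves typing at the console; in a script that other people maintain, the long form is easier to search for and to read.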
Here’s the script:
function global:AppendTweets([xml] $tweetsPage) {
    $tweetsPage.selectnodes("/statuses/status") | % {
        $_.selectSingleNode("text").get_InnerXml() | write-output
    }
}

function global:CountTweets([xml] $tweetsPage) {
    return $tweetsPage.selectnodes("/statuses/status").Count
}

function global:GetTweetsPage([string] $user, [string] $page) {
    [string]$urlbase="http://twitter.com/statuses/user_timeline/"
    [string]$url=$urlbase + $user + ".xml?page=" + $page
    write-host "Connecting to URL " $url
    $webclient= New-Object "System.Net.WebClient"
    [xml] $tweetsPage=$webclient.DownloadString($url)
    return $tweetsPage
}

function global:Export-Tweets([string] $User) {
    begin {
        function GetUserTweets ([string] $User) {
            $pageIndex = 1
            $numTweets = 1
            while ($numTweets -ge 1) {
                $tweetsPage = GetTweetsPage $user $pageIndex
                $numTweets = CountTweets $tweetsPage
                if ($numTweets -ge 1) {
                    AppendTweets $tweetsPage
                }
                $pageIndex += 1
            }
        }
    }
    process {
        if ($_) {
            GetUserTweets $_
        }
    }
    end {
        if ($User) {
            GetUserTweets $User
        }
    }
}