I recently needed to scrape information from a website that requires a login. The authentication and session information was held in several cookies, which Wget can use if they are stored in a plain text file. I used Firefox to log in and set the cookies, but Firefox saves its cookies in an SQLite database file, which must be exported before Wget can read them. A quick Google search turned up a few possible methods using sqlite3, which I’ve adapted here for use with Wget. I’ve also added some example code to extract hrefs and print them out, along with the webpage URL. The script is called with the target URL as its only command-line argument.
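The export step can be sketched as follows. This is a minimal illustration in Python, not the script described above. It assumes the Firefox cookie database has a moz_cookies table with host, path, isSecure, expiry, name and value columns (the schema may differ between Firefox versions), and that you work on a copy of cookies.sqlite, since Firefox may lock the original while it is running. The function name export_cookies and the host_filter parameter are made up for this example.

```python
import sqlite3


def export_cookies(db_path, out_path, host_filter="%"):
    """Write cookies from a Firefox-style moz_cookies table to a
    Netscape-format cookies.txt file and return the number written.

    host_filter is an SQL LIKE pattern (assumption: "%" means all hosts).
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT host, path, isSecure, expiry, name, value "
            "FROM moz_cookies WHERE host LIKE ?",
            (host_filter,),
        ).fetchall()
    finally:
        conn.close()

    with open(out_path, "w", encoding="utf-8") as out:
        out.write("# Netscape HTTP Cookie File\n")
        for host, path, is_secure, expiry, name, value in rows:
            fields = [
                host,
                # Second column: does the cookie apply to subdomains?
                "TRUE" if host.startswith(".") else "FALSE",
                path,
                "TRUE" if is_secure else "FALSE",
                str(expiry),
                name,
                value,
            ]
            out.write("\t".join(fields) + "\n")
    return len(rows)
```

The resulting file can then be passed to Wget with its --load-cookies option, for example: wget --load-cookies cookies.txt followed by the target URL.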
UNIX (Generic)
Conceal Email Address with JavaScript
I wrote this Perl script years ago when I needed to include my email address on a webpage but also conceal it from spam bots and spiders.
#!/usr/bin/perl -w
# /usr/local/bin/esc-mailto.pl
# Conceal email address with JavaScript.
# by Jean-Sebastien Morisset (http://surniaulula.com/)
use strict;

my ($email, $text) = @ARGV;

# Use the email address as the link text if none was given.
$text = $email if (!$text);

if ($email && $text) {
    my $mailto = "<a href=\"mailto:$email\" class=\"esc-mailto\">$text</a>";
    # Percent-encode every character as two hex digits (%02x, so that
    # characters below 0x10 are zero-padded and unescape() can decode them).
    $mailto =~ s/(.)/sprintf("%%%02x", ord($1))/ge;
    print "<script language=\"JavaScript\">document.write(unescape(\"$mailto\"))</script>\n";
} else {
    print "syntax: $0 {email_address} [optional_link_text]\n";
    exit 1;
}
exit 0;
Beautify Query Strings with Rewrites
Sometimes I’ll work on something just to see what it looks like when it’s done. I guess this Apache rewrite is something like that: I wanted to change the WordPress search query from /?s=value to /s/value, just to make the URL look a little prettier. :) There are probably a few ways to do this, and if you’d like to share some alternatives, feel free to post a comment.
There are two parts to this problem. The first, executing a search query from an /s/value URL, is easily addressed by a rewrite and proxy command. The second, rewriting a regular search query but not a proxied one, is a little trickier. I decided to add an htproxy hostname to my domain with an IP of 127.0.0.1. Then, in a rewrite condition, I check for the htproxy hostname and skip the rewrite if it’s a proxied request. The htproxy hostname must be included in the website’s Apache config as a ServerAlias.
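The decision logic described here can be modelled in a few lines. The sketch below is plain Python rather than Apache directives, and it only illustrates which requests get redirected, proxied, or left alone. The hostname htproxy.example.com, the function name route and the action labels are assumptions made for this example.

```python
from urllib.parse import parse_qs, quote, unquote

# Hypothetical proxy hostname; in the setup described it resolves to 127.0.0.1.
PROXY_HOST = "htproxy.example.com"


def route(host, path, query=""):
    """Return an (action, target) pair for a request.

    action is "redirect", "proxy" or "pass" (labels invented for this sketch).
    """
    # Requests that arrive on the proxy hostname are the server's own
    # proxied searches: leave them alone, or they would be redirected
    # back to /s/value in a loop.
    if host == PROXY_HOST:
        return ("pass", None)

    # Pretty URL: fetch the real search page through the proxy hostname.
    if path.startswith("/s/") and len(path) > 3:
        term = unquote(path[3:])
        return ("proxy", "http://%s/?s=%s" % (PROXY_HOST, quote(term, safe="")))

    # Regular search query from a visitor: redirect to the pretty URL.
    terms = parse_qs(query).get("s")
    if path == "/" and terms:
        return ("redirect", "/s/" + quote(terms[0], safe=""))

    return ("pass", None)
```

A visitor requesting /?s=perl is redirected to /s/perl, the /s/perl request is proxied to the htproxy hostname as /?s=perl, and that proxied request passes through untouched.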
Update a Dynamic DNS IP with BIND
I wrote the following nsupdate-ddns.sh script to update the dynamic DNS entry for my laptop when switching network locations. There are several ways to execute a script like this automatically (cronjob, startup script, launcher, etc.) — I chose to use Sidekick for Mac OS X, which allows me to execute it when switching locations (either network or physical). This script can also create the private authentication key needed by the DDNS BIND server, and will display some sample configuration values. If you’re setting up a new DDNS BIND server, you can use the examples to configure your dynamic zone file.
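For readers unfamiliar with nsupdate, the update itself boils down to a short batch of commands sent to the DNS server. The following is a minimal Python sketch of building that batch, not the nsupdate-ddns.sh script itself. The function name, the default TTL of 60 seconds and the example hostnames are assumptions.

```python
import ipaddress


def nsupdate_commands(server, zone, hostname, ip, ttl=60):
    """Build the command text to feed to nsupdate on standard input.

    Replaces any existing A record for hostname with the given IPv4 address.
    """
    # Raises ValueError if ip is not a valid IPv4 address.
    address = ipaddress.IPv4Address(ip)
    lines = [
        "server %s" % server,
        "zone %s" % zone,
        "update delete %s A" % hostname,
        "update add %s %d A %s" % (hostname, ttl, address),
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; substitute your own server, zone and host.
    print(nsupdate_commands("ns1.example.com", "example.com",
                            "laptop.example.com", "192.0.2.25"), end="")
```

The output would typically be piped to nsupdate, with the -k option pointing at the private key file used to authenticate the update.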
Random Password in Perl
Here’s a script I keep in ~/bin/randpwd.pl and use frequently to generate random password strings. It prints an 8-character password by default, but you can specify a different length on the command line. I left out the capital letter “O” to avoid confusion with the number zero. ;-)
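The idea is simple enough to sketch in a few lines. This is a Python illustration rather than the randpwd.pl script itself. The exact character set is an assumption (letters and digits, minus the capital "O"), as are the names ALPHABET and random_password.

```python
import secrets
import string

# Letters and digits, minus capital "O" to avoid confusion with zero.
# The precise character set is an assumption made for this sketch.
ALPHABET = (string.ascii_letters + string.digits).replace("O", "")


def random_password(length=8):
    """Return a random password of the given length (8 by default)."""
    # secrets.choice draws from the operating system's secure random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(random_password())    # default 8-character password
print(random_password(16))  # longer password
```

Using the secrets module instead of random is a deliberate choice here, since passwords should come from a cryptographically secure source.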
Autocomplete SSH Hostnames
There are plenty of SSH autocomplete (or command-line completion) scripts available on the web, but I found that most don’t go far enough: they usually parse only ~/.ssh/known_hosts and ignore the ~/.ssh/config and /etc/hosts files. Some of these scripts also generate a static autocomplete list at login, so they can’t include hostnames added during the session. The following script uses a function call to autocomplete hostnames dynamically, and fetches hostnames from ~/.ssh/known_hosts, ~/.ssh/config and the system-wide /etc/hosts file.
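To show what gathering hostnames from the three files involves, here is a rough Python sketch of the parsing. It is not the completion script itself, which is a shell function, and all function names are invented for this example. It assumes the usual file formats: comma-separated names in the first field of known_hosts (hashed entries are skipped because they cannot be recovered), Host lines in the SSH config, and address-then-names lines in /etc/hosts.

```python
import os


def hosts_from_known_hosts(text):
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0].startswith("@"):  # @cert-authority or @revoked marker
            fields = fields[1:]
        if not fields:
            continue
        for name in fields[0].split(","):
            if name.startswith("|"):  # hashed entry, hostname not recoverable
                continue
            if name.startswith("["):  # [host]:port form
                name = name[1:].split("]")[0]
            names.add(name)
    return names


def hosts_from_ssh_config(text):
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0].lower() == "host":
            # Skip wildcard and negated patterns such as "*" or "!bastion".
            names.update(n for n in fields[1:]
                         if not any(c in n for c in "*?!"))
    return names


def hosts_from_etc_hosts(text):
    names = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        names.update(fields[1:])  # first field is the IP address
    return names


def read_text(path):
    try:
        with open(os.path.expanduser(path), encoding="utf-8",
                  errors="replace") as fh:
            return fh.read()
    except OSError:
        return ""  # a missing or unreadable file contributes no hostnames


def complete_hostname(prefix):
    """Re-read the files on every call, so new hosts show up immediately."""
    names = (hosts_from_known_hosts(read_text("~/.ssh/known_hosts"))
             | hosts_from_ssh_config(read_text("~/.ssh/config"))
             | hosts_from_etc_hosts(read_text("/etc/hosts")))
    return sorted(n for n in names if n.startswith(prefix))
```

Reading the files inside complete_hostname, rather than once at startup, is what keeps the list dynamic.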