Tuesday, August 31, 2010

Quick multi-level mergesort algorithm in JavaScript

/*
 * Quick multi-level mergesort in JavaScript.
 * Sorts an array of objects by field1, then by field2.
 * Both keys are sorted in descending order (larger values first).
 */
var sort = function(arr) {
    var len = arr.length;
    if (len < 2) return arr;
    // Split the array in the middle, sort each half, then merge them.
    var mid = Math.ceil(len / 2);
    return merge(sort(arr.slice(0, mid)), sort(arr.slice(mid)));
};

var merge = function(left, right) {
    var result = [];
    var li = 0;
    var ri = 0;
    var ll = left.length;
    var rl = right.length;
    var le, re;

    while (li < ll && ri < rl) {
        le = left[li];
        re = right[ri];

        // First level: the higher field1 goes first.
        if (le.field1 > re.field1) {
            result.push(le);
            li++;
            continue;
        }
        if (le.field1 < re.field1) {
            result.push(re);
            ri++;
            continue;
        }

        // Second level (field1 is equal): the higher field2 goes first.
        // On a full tie take the left element, which keeps the sort stable.
        if (le.field2 >= re.field2) {
            result.push(le);
            li++;
        } else {
            result.push(re);
            ri++;
        }
    }

    // One half is exhausted; append whatever remains of the other.
    return result.concat(left.slice(li), right.slice(ri));
};
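The merge step above hard-codes field1 and field2 and their sort direction. As a sketch (not the original code), the same top-down mergesort can take a comparator instead, so the caller chooses the keys and the direction of each level. The names mergeSort and byFields are hypothetical helpers introduced for illustration, and the sketch assumes the key values are numbers or strings that compare sensibly with < and >. It takes from the left half on ties, so it is stable.

```javascript
// Hypothetical generalisation of the post's mergesort: a stable top-down
// mergesort driven by a comparator cmp(a, b) that returns a negative number
// if a should come first, a positive number if b should, and 0 on a tie.
function mergeSort(arr, cmp) {
  if (arr.length < 2) return arr.slice();
  const mid = Math.ceil(arr.length / 2);
  const left = mergeSort(arr.slice(0, mid), cmp);
  const right = mergeSort(arr.slice(mid), cmp);

  const result = [];
  let li = 0;
  let ri = 0;
  while (li < left.length && ri < right.length) {
    // Take from the left on ties (cmp <= 0) so equal items keep their order.
    if (cmp(left[li], right[ri]) <= 0) {
      result.push(left[li++]);
    } else {
      result.push(right[ri++]);
    }
  }
  // Append whatever remains of the half that was not exhausted.
  return result.concat(left.slice(li), right.slice(ri));
}

// Hypothetical helper: build a multi-level comparator from a list of
// [fieldName, "asc" | "desc"] pairs, checked in order until one differs.
function byFields(keys) {
  return function (a, b) {
    for (const [field, dir] of keys) {
      const sign = dir === "desc" ? -1 : 1;
      if (a[field] < b[field]) return -sign;
      if (a[field] > b[field]) return sign;
    }
    return 0;
  };
}

// Same ordering as the post's merge(): field1 descending, then field2 descending.
const rows = [
  { field1: 1, field2: 5, id: "a" },
  { field1: 3, field2: 2, id: "b" },
  { field1: 1, field2: 7, id: "c" },
  { field1: 3, field2: 2, id: "d" },
];
const sorted = mergeSort(rows, byFields([["field1", "desc"], ["field2", "desc"]]));
console.log(sorted.map((r) => r.id).join(",")); // prints "b,d,c,a"
```

Rows b and d tie on both keys and stay in their original order. Since ES2019, Array.prototype.sort is also required to be stable, so in modern engines rows.slice().sort(byFields(...)) should give the same order; a hand-written mergesort mainly matters on older engines where stability was not guaranteed.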

Thursday, August 12, 2010

Wget as a spider/crawler - Recursively download a webpage and everything it links to

Here's a handy command line for using the Linux utility wget as a web crawler.

wget -r -np -p -k http://www.example.com

An explanation of the options -

-r (--recursive) - enable recursive downloads.
-np (--no-parent) - never ascend to the parent directory, e.g. wget will not follow a link from example.com/abcd/page1.html to example.com/page2.html.
-p (--page-requisites) - get all the files needed to display each page, e.g. the image/CSS/JS files it links to.
-k (--convert-links) - after the download finishes, convert the links in the pages for local viewing: links to files that were downloaded become relative links, and links to files that were not downloaded become absolute URLs.


Some more useful tips:
-c (--continue) - continue a partially downloaded file. This is very handy for resuming an aborted download: wget treats the local file as the first part of the remote file and asks the server for the rest, starting at an offset equal to the local file's size. Beware that if the file has changed on the server in the meantime, you will end up with a garbled file.
-X (--exclude-directories) - supply a comma-separated list of directories to exclude from the download. This is helpful when you don't want to download a particular section of the site. The directory patterns may contain wildcards, e.g. -X '/ads/*' will skip anything under the www.example.com/ads folder. Quote the pattern so that the shell does not try to expand the wildcard itself.

Saturday, August 07, 2010

Linux BASH function to search recursively for a string in all files in the current directory

If you do any serious coding at all, you have probably used the "Search" function in your editor. Some IDEs even let you search the entire project/workspace, or a subset of the files in it. But when you are working in a simple text editor that lacks such a feature, you sorely miss it.

Fortunately, the grep utility, run from a Linux shell such as BASH, provides an easy way to search recursively for any string in a directory. Just cd to the desired location and do -
grep -ri "a phrase" .
This prints every line that matches the supplied phrase, ignoring case (-i), in all the files in the current directory and its subdirectories (-r). Note that grep treats the phrase as a regular expression, so characters such as . and [ have special meanings.

But even this short line becomes a pain to type when I have to run it every other minute during a marathon coding session, so I wrote a small wrapper function to make it even easier -

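function grepcr() {
    # With no arguments, print usage instead of matching every line.
    if [ $# -eq 0 ]; then
        echo "usage: grepcr word [word ...]" >&2
        return 1
    fi
    # Join all the arguments into one phrase, separated by single spaces.
    local a="$*"
    echo "grep -ri -- \"$a\" ."
    # -- stops grep from treating a phrase that starts with "-" as an option.
    grep -ri -- "$a" .
}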

Just put this function in your ~/.bashrc file, open a new shell (or run: source ~/.bashrc), and then you can search for "a phrase" using the simple -
grepcr a phrase
As you may have guessed, "grepcr" stands for GREP Contents Recursively.