PatrickBay.ca

This time, it's personal.

I imagine that this application would benefit from a little TorAS integration, but in the meantime it’s still pretty useful (downloads are at the bottom of this post).

Since I spent time figuring out how to describe it on Google Code, I thought I’d rather post that than hurt my brain again. So here’s what Araknid does:

[Araknid is] An Adobe AIR application that crawls and archives web pages and assets such as images and scripts into a searchable/convertible SQLite database file. Use it to create a deep search index of your (or any other) site, your intranet, or even the whole web. Uses user-defined regular expressions to match and extract links and other dynamic site information. Built on the SwAG Toolkit ActionScript 3 application framework, with a user interface developed in Flash CS6 (.fla file). Dynamically supports the latest versions of Adobe AIR on mobile.
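To give a feel for what "archiving into a searchable SQLite database" can look like, here is a minimal Python sketch. It is not Araknid's code (that is ActionScript 3), and the `pages` table, its columns, and the function names are all assumptions made up for illustration; Araknid's actual schema may look quite different.

```python
import sqlite3

def open_archive(path=":memory:"):
    """Open (or create) a hypothetical archive database with one pages table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " content TEXT NOT NULL)"
    )
    return conn

def archive_page(conn, url, content):
    """Store a page; crawling the same URL again overwrites the old copy."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, content) VALUES (?, ?)",
        (url, content),
    )
    conn.commit()

def search(conn, term):
    """Return the URLs of archived pages whose content contains term."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE content LIKE ? ORDER BY url",
        ("%" + term + "%",),
    )
    return [row[0] for row in rows]

# Archive two made-up pages in an in-memory database and search them.
conn = open_archive()
archive_page(conn, "http://example.com/", "<html><body>Hello spiders</body></html>")
archive_page(conn, "http://example.com/about", "<html><body>About this site</body></html>")
print(search(conn, "spiders"))
```

A plain `LIKE` query is the simplest possible search; it scans every row, so a real index over a large crawl would probably want something faster.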

Araknid is still quite new, meaning database interactions will need to be optimized, regular expressions will need to be tweaked, and some important features will need to be added (for example, crawl only under a specific URL or domain). But core functionality should be there, along with plenty of room for growth for anyone who wants to fiddle with the source code.
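The "crawl only under a specific URL or domain" feature mentioned above could be approached with a small scope check like the one below. This is a Python sketch of one possible rule (same host, path under the starting path), not something Araknid currently does; the function name `in_scope` is hypothetical.

```python
from urllib.parse import urlparse

def in_scope(url, root):
    """True if url is an http(s) URL on root's host and under root's path."""
    u, r = urlparse(url), urlparse(root)
    if u.scheme not in ("http", "https"):
        return False
    if u.hostname != r.hostname:
        return False
    # Compare paths with a trailing slash so /blog does not match /blogroll.
    root_path = r.path if r.path.endswith("/") else r.path + "/"
    url_path = u.path if u.path.endswith("/") else u.path + "/"
    return url_path.startswith(root_path)

root = "http://example.com/blog"
for candidate in (
    "http://example.com/blog/post-1",
    "http://example.com/blogroll",
    "http://other.example.org/blog/post-1",
    "mailto:someone@example.com",
):
    print(candidate, in_scope(candidate, root))
```

A crawler would run each extracted link through a check like this before queuing it.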

Here’s what you’ll see when you run the software…the legend is below:

Araknid User Interface Legend

1. Click here to select where the Araknid database (SQLite) will be stored. Once set, clicking here will open the SQLite database file in the default application.
2. Pauses and unpauses any currently running crawl.
3. Opens the currently running crawl URL (#6) in a new browser window (if you’re curious to see what’s being crawled).
4. Variable crawl delay (in seconds). Prevents servers from flagging Araknid as an attacker or malware.
5. The percentage loaded of the currently running crawl URL (#6).
6. The currently running crawl URL. Pause Araknid, enter your own URL, and unpause to start a new crawl.
7. Countdown clock showing when the next crawl will happen (in seconds). Based on the crawl delay (#4).
8. A list of URLs extracted from the currently running crawl URL. These are added to the database (if not already crawled), and queued for subsequent crawls.
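The behaviour described by the crawl delay (#4), the current URL (#6) and the extracted URL list (#8) boils down to a queue with a pause between requests. Here is a rough Python sketch of that loop, under the assumption that pages are crawled in the order they are discovered; `fetch` and `extract_links` are hypothetical callbacks supplied by the caller, and the fake site below stands in for real HTTP requests.

```python
import time
from collections import deque

def crawl(start_url, fetch, extract_links, delay_seconds=0.0,
          max_pages=100, sleep=time.sleep):
    """Crawl breadth-first from start_url, pausing between requests.

    fetch(url) returns the page text; extract_links(text) returns URLs.
    """
    queue = deque([start_url])   # URLs waiting to be crawled
    seen = {start_url}           # URLs already queued or crawled
    crawled = []                 # URLs in the order they were crawled
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        text = fetch(url)
        crawled.append(url)
        for link in extract_links(text):
            if link not in seen:         # skip anything already known
                seen.add(link)
                queue.append(link)
        if queue and len(crawled) < max_pages:
            sleep(delay_seconds)         # be polite to the server
    return crawled

# A tiny fake site: each "page" is just a list of page names it links to.
fake_site = {
    "http://example.com/": "a b",
    "http://example.com/a": "b",
    "http://example.com/b": "",
}

def fake_fetch(url):
    return fake_site.get(url, "")

def fake_extract(text):
    return ["http://example.com/" + name for name in text.split()]

order = crawl("http://example.com/", fake_fetch, fake_extract)
print(order)
```

Passing `sleep` in as a parameter is only there so the delay can be swapped out when experimenting; a real run would leave it at `time.sleep` with a delay of a few seconds.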

TIP: Rename or delete the database file to reset Araknid (like a new install).

ADVANCED TIP: You can update the regular expressions used to filter / extract URLs from various HTML tags. To do this:
1. Shut down Araknid.
2. Update the configuration XML file (config.xml) in the Araknid installation directory (for example, C:\Program Files (x86)\Araknid\config.xml).
3. Delete the Araknid application storage folder: C:\Users\[YOUR USER NAME]\AppData\Roaming\Araknid\Local Store
 WARNING: This step resets your application settings (you’ll need to re-select the database file on next startup if you want to continue previous crawls).
4. On next restart, Araknid will use the new regular expressions for subsequent crawls.
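Before editing config.xml, it can help to try a pattern against some sample HTML first. The Python sketch below does that for three tags. The patterns are my own illustrations, not the expressions that ship with Araknid, and since ActionScript and Python regular expression syntax are not identical, a pattern that works here may still need adjusting before it goes into the config file.

```python
import re

# Hypothetical patterns, one per HTML tag; each captures the quoted URL.
PATTERNS = {
    "a": re.compile(r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE),
    "img": re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE),
    "script": re.compile(r'<script\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE),
}

def extract_urls(html):
    """Return {tag: [urls]} for every pattern in PATTERNS."""
    return {tag: pattern.findall(html) for tag, pattern in PATTERNS.items()}

sample = '''
<a class="nav" href="/about">About</a>
<IMG SRC='logo.png' alt="logo">
<script src="js/app.js"></script>
<a name="top">no link here</a>
'''

for tag, urls in extract_urls(sample).items():
    print(tag, urls)
```

Regular expressions are a blunt tool for HTML (unquoted attributes, for instance, slip past these patterns), which is part of why the expressions are expected to need tweaking over time.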

Araknid v1.0 Downloads

All desktop operating systems:

  1. Download and run the Adobe AIR installer: http://get.adobe.com/air/
  2. Download and install Araknid: http://patrickbay.ca/downloads/Araknid.air

Windows (32/64-bit):

Source code (ActionScript / Flash CS6 user interface):

…and something to view the resulting SQLite database:
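If you would rather poke at the database from a script, Python's built-in `sqlite3` module can open the file too. The sketch below lists every table with its row count without assuming anything about Araknid's schema; the in-memory database and its `pages` table are placeholders I made up so the example runs on its own, and you would point `sqlite3.connect` at your own database file instead.

```python
import sqlite3

def summarize(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )]
    summary = {}
    for name in names:
        # Quote the table name so unusual names do not break the query.
        quoted = '"' + name.replace('"', '""') + '"'
        summary[name] = conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()[0]
    return summary

# Placeholder database standing in for a real crawl archive.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE pages (url TEXT, content TEXT)")
demo.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [("http://example.com/", "home"), ("http://example.com/about", "about")],
)
print(summarize(demo))
```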

January 21st, 2014

Posted In: Development / Coding, Internet / Web
