PatrickBay.ca

This time, it's personal.

I imagine that this application would benefit from a little TorAS integration but in the meantime it’s still pretty useful [downloads are at the bottom of this post].

Since I spent time figuring out how to describe it on Google Code, I thought I’d rather post that than hurt my brain again. So here’s what Araknid does:

[Araknid is] An Adobe AIR application that crawls and archives web pages and assets such as images and scripts into a searchable/convertible SQLite database file. Use it to create a deep search index of your (or any other) site, your intranet, or even the whole web. Uses user-defined regular expressions to match and extrapolate links and other dynamic site information. Built on SwAG Toolkit ActionScript 3 application framework, user interface developed in Flash CS6 (.fla file). Dynamically supports latest versions of Adobe AIR on mobile.

Araknid is still quite new, meaning database interactions will need to be optimized, regular expressions will need to be tweaked, and some important features will need to be added (for example, crawl only under a specific URL or domain). But core functionality should be there, along with plenty of room for growth for anyone who wants to fiddle with the source code.

Here’s what you’ll see when you run the software…the legend is below:

Araknid User Interface Legend

1. Click here to select where the Araknid database (SQLite) will be stored. Once set, clicking here will open the SQLite database file in the default application.
2. Pauses and unpauses any currently running crawl.
3. Opens the currently running crawl URL (#6) in a new browser window (if you’re curious to see what’s being crawled).
4. Variable crawl delay (in seconds). Prevents servers from flagging Araknid as an attacker or malware.
5. The percentage loaded of the currently running crawl URL (#6).
6. The currently running crawl URL. Pause Araknid, enter your own URL, and unpause to start a new crawl.
7. Countdown clock showing when the next crawl will happen (in seconds). Based on the crawl delay (#4).
8. A list of URLs extracted from the currently running crawl URL. These are added to the database (if not already crawled), and queued for subsequent crawls.
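The "add if not already crawled, then queue" behaviour from item 8 can be sketched roughly as below. This is only an illustration: the CrawlQueue class and its method names are made up for this post and are not taken from Araknid's source, which stores its queue in the SQLite database rather than in memory.

```actionscript
package {

	import flash.utils.Dictionary;

	/**
	 * Hypothetical in-memory crawl queue (illustrative only, not Araknid's actual code).
	 */
	public class CrawlQueue {

		//URLs waiting to be crawled, in the order they were discovered.
		private var _pending:Vector.<String> = new Vector.<String>();
		//Every URL ever queued, used as a set for duplicate checks.
		private var _seen:Dictionary = new Dictionary();

		/**
		 * Queues a URL unless it has been queued before.
		 *
		 * @param	url The URL to add.
		 * @return True if the URL was added, false if it was a duplicate.
		 */
		public function enqueue(url:String):Boolean {
			if (this._seen[url] === true) {
				return (false);
			}//if
			this._seen[url] = true;
			this._pending.push(url);
			return (true);
		}//enqueue

		/**
		 * @return The next URL to crawl, or null if the queue is empty.
		 */
		public function next():String {
			if (this._pending.length == 0) {
				return (null);
			}//if
			return (this._pending.shift());
		}//next

		/**
		 * @return The number of URLs still waiting to be crawled.
		 */
		public function get length():uint {
			return (this._pending.length);
		}//get length

	}//CrawlQueue class

}//package
```

A flash.utils.Timer set to the crawl delay (#4) could call next() on each tick to pick up the following URL.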

TIP: Rename or delete the database file to reset Araknid (like a new install).

ADVANCED TIP: You can update the regular expressions used to filter / extract URLs from various HTML tags. To do this:
1. Shut down Araknid.
2. Update the configuration XML file (config.xml) in the Araknid installation directory (for example, C:\Program Files (x86)\Araknid\config.xml).
3. Delete the Araknid application storage folder: C:\Users\[YOUR USER NAME]\AppData\Roaming\Araknid\Local Store
 WARNING: This step resets your application settings (you’ll need to re-select the database file on next startup if you want to continue previous crawls).
4. On next restart, Araknid will use the new regular expressions for subsequent crawls.
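To give a feel for the kind of regular expression involved, here is a minimal sketch that pulls href values out of a chunk of HTML. The LinkExtractor class and the pattern itself are assumptions for illustration only; the expressions that ship in Araknid's config.xml may look quite different.

```actionscript
package {

	/**
	 * Hypothetical helper (not part of Araknid) that extracts href values from an HTML string.
	 */
	public class LinkExtractor {

		/**
		 * Returns the unique href values found in the supplied HTML, in order of first appearance.
		 *
		 * @param	html The HTML source to scan.
		 * @return A String vector of extracted URLs (absolute or relative, exactly as written in the HTML).
		 */
		public static function extractLinks(html:String):Vector.<String> {
			//Assumed pattern: href="..." or href='...' in any tag, case-insensitive, all matches.
			var pattern:RegExp = /href\s*=\s*["']([^"']+)["']/gi;
			var links:Vector.<String> = new Vector.<String>();
			var match:Object = pattern.exec(html);
			while (match != null) {
				//Capture group 1 holds the text between the quotes.
				var url:String = match[1] as String;
				if (links.indexOf(url) < 0) {
					links.push(url);
				}//if
				//With the "g" flag, exec() resumes from the end of the previous match.
				match = pattern.exec(html);
			}//while
			return (links);
		}//extractLinks

	}//LinkExtractor class

}//package
```

A pattern this simple ignores unquoted attributes and makes no attempt to resolve relative URLs, which is exactly the sort of thing you'd tweak.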

Araknid v1.0 Downloads

All desktop operating systems:

  1. Download and execute Adobe AIR installer: http://get.adobe.com/air/
  2. Download and install Araknid: http://patrickbay.ca/downloads/Araknid.air

Windows (32/64-bit):

Source code (ActionScript / Flash CS6 user interface):

…and something to view the resulting SQLite database:

January 21st, 2014

Posted In: Development / Coding, Internet / Web

This TorAS thing is going to be a lengthy project — there’s still lots left to implement and older code to one day upgrade — but I received some great feedback about the appropriateness of communicating over Tor using unencrypted HTTP (it’s not very appropriate in many situations). So I’ve branched the current stable version and added TLS/SSL support (courtesy of as3crypto), in a development branch:

https://code.google.com/p/toras/source/browse/#svn%2Fbranches%2Fdevelopment

I’ll add it back into the main branch as soon as I get some time to document it fully, but the bulk of the new functionality has been covered in the updated Developers Guide:

https://code.google.com/p/toras/wiki/TorASDeveloperGuide

I may also have added source code comments so have a peek.

TorAS ActionScript source code is made public under a liberal license (MIT), developed on open source technology (FlashDevelop), incorporating open source technology (Tor “Expert Bundle” binary, released under a similar license and spirit), and comes with a handy step-by-step guide. It’s most powerful on Windows (on account of the bundled Tor binary), but you should be able to use it on AIR for Android with something like Orbot (is there an official Tor router app for iOS?), and in fact anywhere AIR and Tor can run together.

January 21st, 2014

Posted In: Development / Coding, Internet / Web, Privacy / Surveillance

A revealing breakdown of known Tor Exit Node locations by hackertarget.com:

Tor_node_map

January 14th, 2014

Posted In: Internet / Web, Privacy / Surveillance

The following is a sample usage of the FileFinder class I developed for an ongoing project. You’ll find the source code below the usage samples. It’s pretty generic and has only one purpose — to recursively search (through subdirectories) for a specific file name from a starting directory. Usage is straightforward…

package {

	import flash.display.Sprite;
	import FileFinder;

	public class Main extends Sprite {

		private var _fileFinder:FileFinder = null;

		public function Main() {
			//Find "chrome.exe" somewhere on the c: drive...
			//(Backslashes must be escaped in ActionScript string literals.)
			this._fileFinder = new FileFinder("c:\\", "chrome.exe");
		}
	}
}

Additionally, you can create an optional list of excluded directories, as a File vector array, that you don’t want searched:

package {

	import flash.display.Sprite;
	import flash.filesystem.File;
	import FileFinder;

	public class Main extends Sprite {

		private var _fileFinder:FileFinder = null;

		public function Main() {
			//Find "chrome.exe" somewhere on the c: drive, but don't look in "c:\Windows\"...
			var exclusions:Vector.<File> = new Vector.<File>();
			//Backslashes must be escaped in ActionScript string literals.
			exclusions.push(FileFinder.resolveToFile("c:\\Windows\\"));
			this._fileFinder = new FileFinder("c:\\", "chrome.exe", exclusions);
		}
	}
}

I’ve fully commented the source code below and flagged the sections of the class that I highly recommend you update. At the very least, update the section that says “File was found! Do something with currentFile here…” (because it’s not a very useful class otherwise :) )

//Update package path as desired...
package  {

	import flash.filesystem.File;
	import flash.events.FileListEvent;

	public class FileFinder {

		private var _basePath:* = null;
		private var _fileName:String = null;
		private var _currentDir:File = null;
		private var _directoryStack:Vector.<File> = null;
		private var _completedStack:Vector.<File> = null;

		/**
		 * Searches for a specific file from a specified base path, optionally excluding certain paths.
		 * 
		 * @param	basePath The base path (for example, "c:\" or "/") to begin the search at.
		 * @param	fileName The file name ("file.ext") to search for.
		 * @param	exclusions An optional File vector array of paths to exclude. Any File items that are not
		 * directories will be removed.
		 */
		public function FileFinder(basePath:*, fileName:String, exclusions:Vector.<File>=null) {
			this._basePath = basePath;
			this._fileName = fileName;
			if (exclusions != null) {
				//The exclusions are the completed stack, we just need to ensure that they're all directories.
				this._completedStack = exclusions;
				this._completedStack = this.pruneNonDirectories(this._completedStack);
			} else {
				this._completedStack = new Vector.<File>();
			}//else			
			this._directoryStack = new Vector.<File>();			
			this.findFile();
		}//constructor

		/**
		 * Begins the file search by resolving the base path and starting the initial asynch file list retrieval.
		 */
		private function findFile():void {
			this._currentDir = resolveToFile(this._basePath);						
			if (this._currentDir == null) {
				trace ("Couldn't resolve root directory: \""+this._basePath+"\"");
				return;
			}//if
			this._currentDir.addEventListener(FileListEvent.DIRECTORY_LISTING, this.onDirectoryListing);
			this._currentDir.getDirectoryListingAsync();
		}//findFile

		/**
		 * Event handler for asynch directory listing.
		 * 
		 * @param	eventObj A FileListEvent object.
		 */
		private function onDirectoryListing(eventObj:FileListEvent):void {
			var dirString:String = new String();
			this._currentDir.removeEventListener(FileListEvent.DIRECTORY_LISTING, this.onDirectoryListing);
			//Current directory being searched: this._currentDir.nativePath
			//We set _currentDir to null as a precaution here. Not 100% necessary.
			this._currentDir = null;			
			var fileList:Array = eventObj.files;
			if (fileList.length == 0)  {
				//This is an empty directory so skip it and keep searching...
				if (this._directoryStack.length == 0) {
					//Nothing left to search, so the file was not found. We're done.
					return;
				}//if
				this._currentDir = this._directoryStack.shift();
				this._completedStack.push(this._currentDir);
				this._currentDir.addEventListener(FileListEvent.DIRECTORY_LISTING, this.onDirectoryListing);
				this._currentDir.getDirectoryListingAsync();
				return;
			}//if					
			//Number of directories remaining to be searched: this._directoryStack.length
			//Directories already searched: this._completedStack.length
			for (var count:uint = 0; count < fileList.length; count++) {
				var currentFile:File = fileList[count] as File;
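				if (currentFile.isDirectory) {
					//Current search entry is a directory so push on stack if not already searched or excluded.
					//We compare native paths because indexOf() compares File references, and each directory
					//listing returns new File instances that never match the ones already stored.
					//(Paths are compared as-is; differences in letter case are not normalized here.)
					var alreadyDone:Boolean = false;
					for each (var doneDir:File in this._completedStack) {
						if (doneDir.nativePath == currentFile.nativePath) {
							alreadyDone = true;
							break;
						}//if
					}//for
					if (!alreadyDone) {
						this._directoryStack.push(currentFile);
					}//if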
				} else {							
					if (currentFile.name == this._fileName) {
						//File was found! Do something with currentFile here; maybe dispatch an event?
						return;
					}//if
				}//else
			}//for
			if (this._directoryStack.length == 0) {
				//File was not found. We're done.
				return;
			}//if			
			this._currentDir = this._directoryStack.shift();
			this._completedStack.push(this._currentDir);
			this._currentDir.addEventListener(FileListEvent.DIRECTORY_LISTING, this.onDirectoryListing);
			this._currentDir.getDirectoryListingAsync();
		}//onDirectoryListing

		/**
		 * Prunes all non-directory items from the supplied File vector array.
		 * 
		 * @param	stackVector An array of File items.
		 * @return  	A copy of the input vector array with only the directory items included.
		 */
		private function pruneNonDirectories(stackVector:Vector.<File>):Vector.<File> {
			return (stackVector.filter(this.pruneNonDirFilter, this));
		}//pruneNonDirectories

		/**
		 * Filter function used in conjunction with a File vector's "filter" method.
		 * 
		 * @param	item The File item being analyzed.
		 * @param	index The index of the item currently being analyzed.
		 * @param	vector A reference to the File vector currently executing the "filter" method.
		 * 
		 * @return True if the supplied File item is a directory, false otherwise.
		 */
		private function pruneNonDirFilter(item:File, index:int, vector:Vector.<File>):Boolean {
			if (item.isDirectory) {
				return (true);
			} else {
				return (false);
			}//else
		}//pruneNonDirFilter		

		/**
		 * Resolves a native path string (for example "c:\Windows\", or relative root paths like "\" or "/"), to an
		 * ActionScript File object.
		 * 
		 * @param	nativePath The native path to convert to a File instance.
		 * 
		 * @return A File instance pointing to the native path specified, or null if something went horribly wrong.
		 */
		public static function resolveToFile(nativePath:String):File {
			try {
				var returnFile:File = File.userDirectory;
				if ((nativePath == "\\") || (nativePath == "/")) {
					//AIR won't resolve slashes as roots so we do a little guessing instead...
					var rootDirs:Array = File.getRootDirectories();
					returnFile = rootDirs[0] as File;		
				} else {			
					returnFile = returnFile.resolvePath(nativePath);
				}//else
				return (returnFile);
			} catch (err:*) {
				return (null);
			}//catch
			return (null);
		}//resolveToFile

	}//FileFinder class

}//package
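For the “File was found!” branch, one option is a small custom event that carries the matching File. The sketch below is just one way to do it: FileFinderEvent and its constants are hypothetical names that don't exist in the class above, and using it assumes you change FileFinder to extend flash.events.EventDispatcher.

```actionscript
package {

	import flash.events.Event;
	import flash.filesystem.File;

	/**
	 * Hypothetical event (not part of the original FileFinder) reporting the result of a search.
	 */
	public class FileFinderEvent extends Event {

		//Dispatched when the target file has been located.
		public static const FILE_FOUND:String = "FileFinderEvent.FILE_FOUND";
		//Dispatched when every directory has been searched without a match.
		public static const NOT_FOUND:String = "FileFinderEvent.NOT_FOUND";

		private var _file:File = null;

		/**
		 * @param	type One of the event type constants above.
		 * @param	file The File that was found, or null for NOT_FOUND.
		 */
		public function FileFinderEvent(type:String, file:File = null) {
			super(type, false, false);
			this._file = file;
		}//constructor

		/**
		 * @return The File that was found, or null if the search came up empty.
		 */
		public function get file():File {
			return (this._file);
		}//get file

		/**
		 * Required so that the event keeps its File reference if it's ever re-dispatched.
		 */
		override public function clone():Event {
			return (new FileFinderEvent(this.type, this._file));
		}//clone

	}//FileFinderEvent class

}//package
```

With that in place, the “File was found!” branch could call this.dispatchEvent(new FileFinderEvent(FileFinderEvent.FILE_FOUND, currentFile)) just before its return. Since the directory listings arrive asynchronously, a listener added right after constructing the FileFinder should still catch the event.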

After running this a few times I realized that it’s not as efficient as it could be, but it’s a good place to start. Everything runs asynchronously so it shouldn’t affect your application much, though in my experience there are still hiccups when accessing certain directories.

January 6th, 2014

Posted In: Development / Coding

The web debugging proxy Fiddler is a terrific tool.

On top of being über useful in capturing all of the web traffic coming in and out of your computer, it can also be used to analyze, manipulate, and redirect that traffic.

For web developers, the type of functionality that Fiddler offers is invaluable, but it can be just as useful for the amateur tinkerer. With only a little knowledge, it’s possible to reverse engineer web APIs, sites, and online services. You can easily see malware launching communications from your device, be it your desktop PC, laptop, or mobile, and you can fiddle with these communications (hence the name) down to very fine details.

I have to give a shout-out to my favourite software in the same category, Charles, but Fiddler’s price tag (free) makes it a good, albeit not quite as powerful, alternative.

It’s good enough, in fact, to even capture and analyze HTTPS, an encrypted layer of communication that sits on top of standard web traffic and is used for anything from accessing your Gmail account to your online bank account. In fact, Fiddler’s ability to capture and analyze such traffic is at once very useful and simultaneously disturbing and somewhat scary. After all, if Fiddler can do it so easily, why not some other piece of software that’s not quite as visible … perhaps not even installed by you?

I’ll have to save that discussion for some other time, though, because in this post I wanted to cover what happens when Fiddler stops being able to deal with HTTPS, or encrypted web traffic. In other words, what if it stops being able to do what you set it up to legitimately do?

For example, I have Fiddler running pretty much all the time as part of my day job, and most of the time I don’t have to think twice — it negotiates standard HTTP and encrypted HTTPS without any issues and, in the web browser, everything loads fine.

Then this happens:

Google Chrome security certificate warning

Here we have Google Chrome complaining about a security certificate issued, apparently, by Google itself!

Other times the site would simply fail to load altogether, though if I replaced “https://” with “http://” in the address bar, everything would work fine. In other words, secure browser connections, or HTTPS, were failing — but only when Fiddler was attempting to capture them. Turning off Fiddler fixed the problem right away.

At this point I’m still not certain why exactly this happened. All I know for certain is that at some point, the security certificates stored on my computer were no longer recognized and the browser wouldn’t allow any HTTPS connections with Fiddler. And since Fiddler intercepts traffic at the operating system level, and not the browser level, any HTTPS connections from any browser would fail.

Oh, and my current project required me to analyze HTTPS traffic. Of course.

I searched a variety of knowledge bases and found information that was both woefully out of date and didn’t work anyway. I’m using Fiddler 2 (v4.4.4.3 beta), which is not terribly new but seems quite exotic in terms of the internet’s help sites, so I was left to my own devices.

Luckily the solution was quite simple. This can be used the first time you’re setting up Fiddler for use with HTTPS, or if it stops working for whatever reason.

  1. Open up Fiddler’s HTTPS options panel by selecting (from the main menu):
    Tools -> Fiddler Options… -> HTTPS (tab)
  2. Clear any installed interception certificates. Ensure that “Capture HTTPS CONNECTs” is selected (checked), but other options such as “Decrypt HTTPS traffic” are unselected (unchecked). This will enable the “Remove Interception Certificates” button — click it.
  3. Accept any pop-up dialogs that are displayed until the certificates are removed.
  4. Fully close and restart Fiddler (you’ll probably be prompted to do so anyway).
  5. Repeat step 1 to open the HTTPS tab.
  6. This time ensure that you select “Capture HTTPS CONNECTs” as well as “Decrypt HTTPS traffic“. As soon as you select this second option you will receive a dialog box that looks something like this:
    Fiddler certificate
    Click on “Yes” (including on any subsequent dialogs), to install Fiddler’s certificate in the operating system. This will allow Fiddler to sit in the middle of encrypted web traffic, capture it, and analyze it.
  7. Click on “OK” in the HTTPS tab you’ve now returned to.
  8. You should now be able to seamlessly connect to HTTPS and see the traffic in Fiddler. You may also have to accept an untrusted certificate in your browser next time you try to connect, but since you know that you’re legitimately only going to be viewing your own traffic, you can accept it. Here’s an example from Firefox (here you would click “Add Exception…” to force the browser to accept this connection):
    Certificate exception

Now you should be able to see secure web connections right alongside standard HTTP traffic. The only real differences are the initial HTTPS “tunnel” that’s established as a first step in a secure connection, and the fact that the “Protocol” column for subsequent communications with that address will show “HTTPS“:

https traffic

Doing in-depth analysis of this information is a topic in itself so that’s best saved for a future post, but if you’ve spent enough time online and recognize web addresses and URLs, this should be more than enough to get you started, and it should be sufficient to allow you to flag suspicious activity.

July 2nd, 2013

Posted In: Development / Coding, Internet / Web
