Saturday, October 16, 2010

Ruby, Mechanize, Nokogiri, XPath and Firebug

I have done a fair amount of extremely painful screen scraping in the past to get data for a web page, using tools like WWW::Mechanize or LWP for Perl. Parsing HTML to get at the data you need can be disgustingly ugly. When the need came up again recently, all the bad experiences I have had came rushing back and I was dreading starting. Luckily the world is a nicer place now and this has become trivial with Ruby. The once painful process became:
  1. Open the page I want to scrape in Firefox
  2. Load Firebug
  3. Right click on the HTML element I want to scrape and select copy XPath
  4. Fire up TextMate and create a simple Ruby Mechanize script
  5. Paste the XPath into the script.
The code simply looks like this:
require 'rubygems'
require 'mechanize'
require 'mysql'

agent = Mechanize.new

# Fetch the page to scrape
page = agent.get('http://url.to.scrape.com')

# XPath copied from Firebug; take the first matching img and read its title attribute
upString = page.parser.xpath("/html/body/form/table[2]/tr/td/table/tr[5]/td/table/tr[6]/td[3]/a/img")[0]['title'].to_s
Life is good.

Saturday, September 25, 2010

Creating an image of a running EC2 instance

Prerequisites
  1. AWS User ID
  2. AWS Key ID
  3. AWS Secret Key
  4. X.509 key pair (cert and private key)
All of this can be obtained from the AWS account info page here. Note that AWS does not store the private key of your X.509 key pair; if you do not have it, you will need to create a new key pair.

Creating the bundle
  1. Upload your X.509 cert and private key to your running EC2 instance:
     scp PATH_TO_KEYS/{cert,pk}-*.pem root@AWS_INSTANCE:/mnt
  2. Log into your EC2 instance:
     ssh -i YOURKEY.pem root@AWS_INSTANCE
  3. Set up some environment variables to make the process a little easier. Set arch to either i386 or x86_64 depending on whether you have a 32 bit or a 64 bit instance. If you're not sure which to choose you can check here.
     # export AWS_USER_ID=YOUR_AWS_USER_ID
     # export AWS_ACCESS_KEY_ID=YOUR_KEY_ID
     # export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
     # export arch=i386
  4. Create the bundle:
     ec2-bundle-vol -r $arch -d /mnt/ -p $prefix -u $AWS_USER_ID -k /mnt/pk-*.pem -c /mnt/cert-*.pem -s 10240 -e /mnt,/root/.ssh
  5. Upload the bundle:
     ec2-upload-bundle -b $bucket -m /mnt/$prefix.manifest.xml -a $AWS_ACCESS_KEY_ID -s $AWS_SECRET_ACCESS_KEY
  6. Register the bundle:
     ec2-register --name "$bucket/$prefix" $bucket/$prefix.manifest.xml
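
The bundle, upload and register commands above also rely on $prefix and $bucket, which the steps never set. Here is a minimal sketch of a pre-flight check you could run on the instance before the first command. The values my-image and my-ami-bucket are placeholders I made up, not names from the original setup.

```shell
#!/bin/sh
# Hypothetical values: the post does not say which prefix or bucket was used.
prefix="my-image"
bucket="my-ami-bucket"

# Collect the name of every variable the three commands need that is still empty.
missing=""
for name in AWS_USER_ID AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY arch prefix bucket; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        missing="$missing $name"
    fi
done

if [ -n "$missing" ]; then
    echo "unset:$missing"
else
    echo "all variables set"
fi
```

If anything is reported as unset, export it first; an empty $prefix or $bucket would otherwise end up silently inside the manifest path.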

Saturday, February 27, 2010

Compiling GCC 4.4.3 on Solaris

First I have to say that compiling on Solaris can be a major pain in the ass. I found myself wanting GNU find on one of our Solaris 10 machines but had issues compiling it; there was a known bug in gcc that could be fixed by upgrading gcc. That is when the fun began. I downloaded gcc from gnu.org. The first problem I ran into was that I didn't read the dependency list, my bad. I needed to grab gmp and mpfr. I downloaded and compiled both, passed the --with-gmp and --with-mpfr flags, and thought I was all set. It turns out I was wrong: I ran into a problem finding the mpfr lib. That sucked, seeing as I had just downloaded and compiled it. After looking at the config.log the cause was pretty obvious: I was building against 64 bit libraries. Adding the following got me to the next error:
env CC="gcc -m64"
The next problem was similar: apparently Solaris includes the 32 bit libraries in the search path but not the 64 bit ones. The error I got was:
configure: error: cannot compute suffix of object files
When I checked the config.log the specific error I found was:
libgcc_s.so.1: wrong ELF class: ELFCLASS32
I was able to fix it with the following:
export LD_LIBRARY_PATH=/usr/sfw/lib/64/
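
To see which class a library is before configure trips over it, you can read the class byte straight out of the ELF header. This is a rough sketch, not something from the original build: elf_class is a helper name I made up, and it assumes the file really is an ELF object (it does not check the magic number).

```shell
# Print the ELF class of a file by reading the EI_CLASS byte at offset 4.
# 1 means 32 bit, 2 means 64 bit.
elf_class() {
    class_byte=$(od -An -t u1 -j 4 -N 1 "$1" | tr -d ' \t')
    case "$class_byte" in
        1) echo "ELFCLASS32" ;;
        2) echo "ELFCLASS64" ;;
        *) echo "unknown" ;;
    esac
}

# Example: elf_class /usr/sfw/lib/64/libgcc_s.so.1
```

Running it against the copy of libgcc_s.so.1 that each search path would pick up shows which one the 64 bit build is actually loading.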
OK, so now things looked to be on the right path. The compile went on for an hour and a half and died with a new error; again the error was wrong ELF class. At this point I was about to pull my hair out. The problem was that the -m64 flag was not being passed down to the intermediate compiler (xgcc) built during the bootstrap. Note that it is missing from the following line:

/users/srb55/gcc-4.4.3/host-sparc-sun-solaris2.10/prev-gcc/xgcc -B/users/srb55/gcc-4.4.3/host-sparc-sun-solaris2.10/prev-gcc/ -B/global/inf/sys/software/gcc/gcc-4.4.2/sparc-sun-solaris2.10/bin/ -c -g -O2 -DIN_GCC
--- snip ---
The fix for this wasn't so bad: I just disabled the bootstrap compile with --disable-bootstrap. This time the compile completed (after about 3 hours). The configure line that worked in the end was:
./configure --prefix=/global/inf/sys/software/gcc/gcc-4.4.3 --with-mpfr=/global/inf/sys/software/mpfr/mpfr-2.4.0 --with-gmp=/global/inf/sys/software/gmp/gmp-5.0.1/ --enable-shared --disable-nls --disable-bootstrap --disable-multilib --enable-languages=c,c++
Victory!

Sunday, March 15, 2009

Creating a TextMate Bundle

I spend a fair amount of time dealing with code written in SQR. Since this isn't a very popular language, there isn't a good editor with syntax highlighting for it. I use TextMate a good deal, so it seemed natural to write a bundle for SQR. Step one is to open TextMate and click Bundles, then Bundle Editor, and then Show Bundle Editor. Once there, click the plus at the bottom of the screen and add a new bundle.


Now name the bundle something appropriate, in my case SQR. Next you need to add a new language to your bundle; the language is what controls syntax highlighting.



This will generate the following code.

{  scopeName = 'source.untitled';
   fileTypes = ( );
   foldingStartMarker = '/\*\*|\{\s*$';
   foldingStopMarker = '\*\*/|^\s*\}';
   patterns = (
      {  name = 'keyword.control.untitled';
         match = '\b(if|while|for|return)\b';
      },
      {  name = 'string.quoted.double.untitled';
         begin = '"';
         end = '"';
         patterns = (
            {  name = 'constant.character.escape.untitled';
               match = '\\.';
            },
         );
      },
   );
}

Let's go through this in detail.
  • scopeName (line 1) — this should be a unique name for the language. The convention is for these to be dot separated, where each part to the right is more specific than the one before it. In my case I used source.SQR. (TextMate 1)
  • fileTypes (line 2) — this is an array of file type extensions that the language should (by default) be used with. This is referenced when TextMate does not know what grammar to use for a file the user opens. If however the user selects a grammar from the language pop-up in the status bar, TextMate will remember that choice. (TextMate 1)
  • foldingStartMarker / foldingStopMarker (line 3-4) — these are regular expressions that lines (in the document) are matched against. If a line matches one of the patterns (but not both), it becomes a folding marker. This means you will get the arrow in the left margin that will allow you to expand and collapse large sections of code. (TextMate 1)
  • patterns (line 5-18) — this is an array with the actual rules used to parse the document. In this example there are two rules (line 6-8 and 9-17). (TextMate 1)


For example this is the beginning of my definition for the SQR language.


{  scopeName = 'source.SQR';
   fileTypes = ( 'sqr', 'sqc', 'SQR', 'SQC' );
   foldingStartMarker = '(?i)^\s*+(Begin-Procedure|If|Begin-Report|Begin-Heading|begin-select|begin-sql|#ifdef).*';
   foldingStopMarker = '(?i)^\s*+(End-Procedure|End-If|End-Report|End-Heading|end-select|end-sql|#end-if).*';
   patterns = (
      {  name = 'comment.line.double-slash.SQR';
         match = '!.*\n';
      },
      ...
   );
}
The name for the scope is source.SQR and it operates by default on files with the following extensions: sqr, sqc, SQR, and SQC. Let's look at just one of the folding markers, Begin-Procedure/End-Procedure. The regex I set up does a case insensitive match looking for a line that starts with any amount of whitespace followed by Begin-Procedure, and basically the same for End-Procedure. The result:


Yay for little arrow thingies!
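
If you want to sanity check a folding pattern outside of TextMate, grep can get you most of the way. This is only a sketch under a couple of assumptions: grep -E has no (?i) or possessive \s*+, so -i and [[:space:]]* stand in for them, and the SQR fragment is a made-up sample, not code from my bundle.

```shell
# Hypothetical SQR fragment, used only to exercise the pattern.
sample='begin-procedure Main
  ! fetch the rows
  let #count = 0
end-procedure'

# ERE approximation of foldingStartMarker: -i replaces (?i) and
# [[:space:]]* replaces the possessive \s*+. -n prefixes the line number.
start_lines=$(printf '%s\n' "$sample" | grep -inE '^[[:space:]]*(Begin-Procedure|If|Begin-Report|Begin-Heading|begin-select|begin-sql|#ifdef).*')
echo "$start_lines"
```

Only the begin-procedure line should come back, which is the line that gets the fold arrow.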

The other thing you might notice is that the lines with '!' are grayed out; this is because they are set to be comments by our pattern. The name of the pattern is a special name that matches a TextMate property; for a full list see here under 12.4 Naming Conventions. In this case I chose one that is for comments, "comment.line.double-slash", then added ".SQR" for the right scope. The second part of the pattern is the match, in this case "!.*\n". This will match a '!' and any characters after it.

You can go on to add more patterns for reserved words, variables, strings, etc. Before long you can have syntax highlighting all set. Then you can go on to add snippets and other commands, but we will have to save that for another time.

You can download what I have so far here.

(TextMate 1, http://manual.macromates.com/en/language_grammars#naming_conventions.html)

Saturday, September 27, 2008

Sunday, September 14, 2008

Installing Oracle Calendar on Fedora

This will be the first post in a series chronicling my "Can I use Fedora for my work OS?" project. My first task was to see if I could install Oracle Calendar, which is our enterprise calendaring system.

In theory this should not be a hard install: you can get the installer from oracle.com, unzip the tarball and run gui_install.sh. Of course things never go that easily; the first error I got was this:

Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...
awk: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory
dirname: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
/bin/ls: error while loading shared libraries: librt.so.1: cannot open shared object file: No such file or directory
basename: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
dirname: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
basename: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
hostname: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
Launching installer...
grep: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
/tmp/install.dir.7150/Linux/resource/jre/bin/java: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory
After doing some research it seems that the LD_ASSUME_KERNEL environment variable can cause problems. It is an old variable from the time of the kernel threading switch. Some apps, mostly Java ones, still set it and it causes problems on Fedora. The solution is to turn it off:

perl -pi -e 's/export LD_ASSUME_KERNEL/#export LD_ASSUME_KERNEL/' cal_linux
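
If you would rather try the substitution on a copy first, something like the following works as a dry run. It is a sketch with assumptions: I use sed instead of perl here, and the two lines written to the scratch file are made up, since I am not reproducing the real contents of cal_linux.

```shell
# Scratch file standing in for cal_linux; its contents are invented for the demo.
launcher=$(mktemp)
printf '%s\n' 'LD_ASSUME_KERNEL=2.4.19' 'export LD_ASSUME_KERNEL' > "$launcher"

# Keep a backup, then comment out the export line.
cp "$launcher" "$launcher.bak"
sed 's/^\([[:space:]]*\)export LD_ASSUME_KERNEL/\1# export LD_ASSUME_KERNEL/' "$launcher.bak" > "$launcher"

# Count export lines that are still active (grep -c exits non-zero on a zero count).
remaining=$(grep -c '^[[:space:]]*export LD_ASSUME_KERNEL' "$launcher" || true)
echo "active export lines left: $remaining"
```

With the export commented out the variable is no longer passed on to the Java process the launcher starts, which is the whole point of the fix.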


After that I got a missing object error:

libXp.so.6: cannot open shared object file: No such file or directory


This was an easy fix; I just installed the missing lib.


yum install libXp.so.6


The final error I had was:

libstdc++.so.5: cannot open shared object file: No such file or directory


I checked the lib version and found that I had so.6. My first thought was to link this to the so.5 and hope for the best. I decided to first look for a compat library and I found it. The solution:


yum install compat-libstdc++-33
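
Rather than finding missing libraries one error at a time, ldd can list them all up front. A small sketch: missing_libs is a helper name I made up, and you would point it at whichever binary the installer unpacked (the path is yours to fill in).

```shell
# Print the name of every shared library that ldd reports as "not found".
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Example: missing_libs PATH_TO_CALENDAR_BINARY
```

Each name it prints can then be handed to yum in one go.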

Finally I had a working calendar! Woo Hoo!

Wednesday, September 3, 2008

Techie Haiku

Your razor-sharp wit
Can never stand up to my
Adamantium