Friday, November 5, 2010

jQuery and Node.js

jQuery makes it really easy to select certain parts not only of an HTML page but also of an XML document. So if you just want to parse some (not too big) XML file, or do not care too much about the speed of your parser, a combination of Node.js and jQuery can help a lot and make some things much less painful.

So what is needed (ubuntu linux):
  • Install node.js on ubuntu: http://ec2-50-16-39-169.compute-1.amazonaws.com/blog/?p=1688
  • Download jquery (you could also include an online version - google is your friend): I use jquery-1.4.2.min.js (1.4.3 did not work for me)
Then try this example:

Type into a terminal:
> node helloJQueryNode.js

It should write to the console:
> Hello World, It works! Really!

Now a more complex example:

Again, type into a terminal:
> node xmlTest.js

It should write to the console:
> 229.29

Pretty simple!! So what does it do? You have got an XML file. Node.js fetches the file and jQuery is used for parsing. Using jQuery selector syntax to extract the stuff you are looking for from an XML file is typically a lot easier than, for example, using XPath.
As already mentioned, for big XML files, or if you have lots of files to crawl, *I* would more likely use Java (Apache HttpClient + a fast StAX parser such as Woodstox). For very simple tasks Node.js + jQuery is really a good choice.

Some more about Node.js:
http://net.tutsplus.com/tutorials/javascript-ajax/learning-serverside-javascript-with-node-js/

Friday, October 29, 2010

How to convert SVG files to pdf with Inkscape

This short post concerns Windows. I usually keep my standard batch files in a folder that is on the PATH (e.g. C:\Users\me\work\tools\bat\ ).

In this folder I have got two batch files:

  • inkscape.bat containing this line: "C:\Program Files\Inkscape\inkscape.exe" %*
    • don’t forget the quotes if you have white spaces in your path to Inkscape
    • %* means: hand over all provided parameters/arguments

  • svg2pdf containing this line: inkscape "%CD%\%1" -D --export-pdf "%CD%\%1.pdf"
    • %1 stands for the first provided parameter – you could modify this to check if there is a second parameter given and use that one as output file
    • %CD% stands for current directory
    • -D stands for export only the drawing area. The allowed parameters for Inkscape can be found in the manual.
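
The idea of using a second parameter as the output file, when one is given, can also be sketched in JavaScript for Node.js instead of batch. This is only a rough sketch under assumptions: the function names svg2pdfArgs and runSvg2Pdf are made up for illustration, inkscape is assumed to be callable from the PATH, and the flags are simply the ones used in the batch file above.

```javascript
const path = require('path');
const { execFile } = require('child_process');

// Build the argument list for inkscape (hypothetical helper).
// If no output file is given, fall back to "<input>.pdf" like the batch file does.
function svg2pdfArgs(cwd, input, output) {
  const inputPath = path.resolve(cwd, input);
  const outputPath = path.resolve(cwd, output || input + '.pdf');
  // -D and --export-pdf are the flags from the batch file above.
  return [inputPath, '-D', '--export-pdf', outputPath];
}

// Run inkscape on a file in the current directory (hypothetical helper).
// Assumes an "inkscape" executable can be found on the PATH.
function runSvg2Pdf(input, output, callback) {
  execFile('inkscape', svg2pdfArgs(process.cwd(), input, output), callback);
}
```

Calling runSvg2Pdf('foo.svg', 'bar.pdf', done) would then write bar.pdf next to foo.svg, while leaving out the second argument keeps the foo.svg.pdf naming of the batch file.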

In a command window, navigate to the desired folder (in Windows XP, press [Windows Key] + [r], enter cmd and navigate to the folder; in Windows 7, navigate to the parent folder of your desired folder, press [Shift] + your right mouse key and select “Open command window here”) and enter for example: svg2pdf foo.svg

One cool thing about SVGs is that it is easy to change the font of the text they use: you simply replace the font name. Here is a simple Python script that does this and then calls Inkscape to convert the file (of course this only works if you have got Python installed):
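
The font replacement step itself is just a text substitution. A minimal sketch of that step in JavaScript for Node.js (not the Python script mentioned above; the function name replaceFont and the font names in the usage note are assumptions for illustration):

```javascript
// Replace every occurrence of a font name in the SVG source text.
// Note: this is a plain text substitution, so it also changes the name
// if it happens to appear in the visible text of the drawing.
function replaceFont(svgText, oldFont, newFont) {
  return svgText.split(oldFont).join(newFont);
}
```

For example, replaceFont(svg, 'Arial', 'Verdana') turns font-family:Arial into font-family:Verdana everywhere in the file. The result could then be written back with fs.writeFileSync and handed to Inkscape.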


 

One other cool thing about Inkscape: suppose you have got a tool such as UMLet which can export to SVG but renders fonts as lines, so you cannot replace the fonts in that SVG (and using its PDF export directly is not an option either, since you want to replace the standard font and cannot afford an Acrobat licence). You can use Inkscape to convert the PDF to an SVG via a simple command: inkscape foo.pdf --export-plain-svg foo.svg or you can create a batch file analogous to the one above (e.g. svg2svg.bat):  inkscape "%CD%\%1" -D --export-plain-svg "%CD%\%1.svg". Having an SVG now, you can replace the font in it and then export the SVG to a PDF.

Note: I sometimes had problems with the newest versions of Inkscape rendering fonts as lines in the resulting PDF and therefore losing the plain text. Version 0.46 keeps the text as text. That might have to do with additional parameters or whatever. I do not care. Keep it simple, make it work…

Friday, December 11, 2009

Playing around with JavaScript, Tag Clouds, Delicious

This is a post about a Cumulus tag cloud application for Delicious users that I wrote. After you provide a Delicious user name, the application fetches that user's tags together with their frequency of usage. It can be found here.
Here is an example screenshot.
image
This is done solely on the client side (no connection to my server, but of course one to Delicious) by combining several JavaScript libraries (jQuery, the jQuery URLEncode plugin, swfobject) and a Flash tag cloud component.
1. The tags for a user are fetched from Delicious. The same origin policy is bypassed with a little old trick (*):
var script = document.createElement('script');
script.setAttribute('src', 'http://feeds.delicious.com/v2/json/tags/' + user + '?callback=doTagCloud');
document.getElementsByTagName('head')[0].appendChild(script);

This trick inserts a script element into the head section of an HTML page. In HTML it will look something like this:
<script type="text/javascript" src="http://feeds.delicious.com/v2/json/tags/{user}?callback=doTagCloud"></script>

By providing a callback the actually loaded JavaScript code has the form:

doTagCloud({".net":1,"\\todo":1,"\"design":1, […]});
The function doTagCloud is called with the filled-in tags.
2. The tags are sorted, pruned and rendered with the computed relative font weights.
swfobject.embedSWF("flash/tagcloud.swf", "tagCloud", "400", "500",
"7.0.0", false, flashvars, params, attributes);
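
The sorting, pruning and weighting of step 2 could look roughly like the following. This is only a sketch, not the code of the application: the function name prepareTags, its parameters and the linear scaling between a minimum and a maximum font size are assumptions for illustration.

```javascript
// Turn the object passed to the callback (tag name -> usage count) into a
// sorted, pruned list with a relative font size for each tag.
function prepareTags(tags, maxTags, minSize, maxSize) {
  const entries = Object.keys(tags)
    .map(function (name) { return { tag: name, count: tags[name] }; })
    // Sort: most frequently used tags first, ties alphabetically.
    .sort(function (a, b) { return b.count - a.count || a.tag.localeCompare(b.tag); })
    // Prune: keep only the top maxTags entries.
    .slice(0, maxTags);

  if (entries.length === 0) {
    return [];
  }

  const highest = entries[0].count;
  const lowest = entries[entries.length - 1].count;
  const range = highest - lowest;

  // Scale linearly: the rarest kept tag gets minSize, the most used one maxSize.
  return entries.map(function (e) {
    const ratio = range === 0 ? 1 : (e.count - lowest) / range;
    return {
      tag: e.tag,
      count: e.count,
      size: Math.round(minSize + ratio * (maxSize - minSize))
    };
  });
}
```

Inside doTagCloud the result of such a function would then be turned into the flashvars handed to swfobject.embedSWF.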
 

(*) I am no big fan of the same origin policy: it does not actually prevent serious cross-site scripting attacks, yet it hinders developing advanced client-side mashups without data proxies.

Thursday, September 10, 2009

European Summer School in Information Retrieval (ESSIR) 2009 in Padua

I attended a summer school on information retrieval in Padua last week: ESSIR 2009. In general it was awesome. I met many interesting people, had a lot of fun and also gathered some input concerning my current work. Most of the talks were excellent and really led me to some new insights about information retrieval in general.

Talks I liked most (in no particular order, and without ranking them by quality or by my interest in the topic):

  • The User in Interactive Information Retrieval Evaluation - Peter Ingwersen
  • Information Retrieval in Context - Ian Ruthven
  • Web Mining and Next-Generation Search - Aristides Gionis
  • Indexing Techniques - Mark Sanderson

Here are some pics:

image image 

image image

There are also some more pics at flickr.

Monday, August 17, 2009

JSLint with Eclipse

Some simple steps to check your JavaScript Code with JSLint:

  1. download and install Rhino (Mozilla's JavaScript engine for Java)
  2. make it somehow available, e.g. create a js.bat (one line: java -jar "path/to/rhino/js.jar" %*)
  3. download JSLint and put it in some folder, e.g. /path/to/jslint.js
  4. Add an external tool configuration in Eclipse:

    Location: path/to/js.bat

    Working Directory: ${workspace_loc}

    Arguments: path/to/jslint.js ${resource_loc}

  5. you can now check your JavaScript code with JSLint by calling this external tool.

Enjoy.

P.S.: There is a very interesting Google Tech Talk by Douglas Crockford (creator of JSON and JSLint) that highlights the good and bad aspects of JavaScript. I liked that a lot.

Monday, July 6, 2009

This is just a test post

Windows Live Writer is just awesome to publish to your blog.



IMG_1157_small

I like that picture.

Monday, September 15, 2008

Clay Shirky about television and somehow waste of time

This is a talk by Clay Shirky at Web 2.0 Expo SF 2008.

He explains in a very entertaining way (in short) that people of the current generation watch too much TV, and that in the time that e.g. people in the US spend in front of their TVs, the whole of Wikipedia could be written many thousand times over.

The new generation seems to use the media in a more active way. They write blog posts and are more active in stating their opinion.

A quote:
TV Reporter: Where do people find all the time to do wikipedia?
Clay Shirky: No one who works in TV gets to ask that question.
You know where the time comes from. It comes from the cognitive surplus you have been masking for the last 50 years.


P.S.
Clay Shirky is known not only to the "tagging world" for his widely read blog post with the title: "Ontology is Overrated: Categories, Links, and Tags". Tom Gruber refers to this post in an often-cited article: "Ontology of Folksonomy: A Mash-up of Apples and Oranges". But that is another story...