Monday, May 14, 2012

Deleting Orphan Raw Images after JPEGs have been Removed

I have been looking for a quick and dirty solution for removing unwanted files copied from a camera that saves both a raw and a jpg file for every picture taken.

The python script below is very useful when you are working on a computer where no advanced tool, such as Adobe Lightroom, is available. You can use a simple image viewer, such as the one provided by Gnome or Irfanview on Windows, to browse through the images in a folder and remove all unwanted pictures (the jpg version).

The python script below moves all raw images ("*.CR2") with no corresponding jpg ("*.JPG") version to a sub folder "deleted". You can then check the contents of the deleted folder and remove the files manually. Alternatively, replace "shutil.move(file, './deleted')" with "os.remove(file)" if you want the files deleted instead of moved.
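A minimal sketch of such a script, under some assumptions: the function name move_orphan_raws and its parameters are made up for this illustration, and file names are compared case-insensitively by their base name (IMG_1.CR2 matches IMG_1.JPG).

```python
import shutil
from pathlib import Path


def move_orphan_raws(folder, raw_ext=".CR2", jpg_ext=".JPG", target="deleted"):
    """Move raw files without a matching JPG into a sub folder.

    Returns the names of the moved files. All names and defaults here are
    illustrative assumptions, not taken from the original script.
    """
    folder = Path(folder)
    # Base names (without extension) of all JPGs that are still present.
    kept = {p.stem.lower() for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() == jpg_ext.lower()}
    moved = []
    # sorted() materialises the listing before we start moving files around.
    for raw in sorted(folder.iterdir()):
        if not raw.is_file() or raw.suffix.lower() != raw_ext.lower():
            continue
        if raw.stem.lower() in kept:
            continue  # the JPG still exists, so the raw file stays
        target_dir = folder / target
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(raw), str(target_dir / raw.name))
        moved.append(raw.name)
    return moved
```

Calling move_orphan_raws(".") in the image folder moves the orphans and returns their names, so you can print the list before emptying the "deleted" folder by hand.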

The file extension naming is the one used by Canon. A modified version for other cameras should not be that difficult to write.

Friday, November 5, 2010

jQuery and Node.js

jQuery makes it really easy to select certain parts not only of an HTML page but also of an XML document. So if you just want to parse some (not too big) XML file, or do not care too much about the speed of your parser, a combination of node.js and jQuery can help a lot and make some things much less painful.

So what is needed (ubuntu linux):
  • Install node.js on ubuntu:
  • Download jquery (you could also include an online version - google is your friend): I use jquery-1.4.2.min.js (1.4.3 did not work for me)
Then try this example:

Type into a terminal:
> node helloJQueryNode.js

It should write to the console:
> Hello World, It works! Really!

Now a more complex example:

Again, type into a terminal:
> node xmlTest.js

It should write to the console:
> 229.29

Pretty simple! So what does it do? You have an XML file: Node.js fetches the file and jQuery is used for parsing. Using jQuery selector syntax to extract what you are looking for from an XML file is typically a lot easier than, for example, using XPath.
As already mentioned, for big XML files, or if you have lots of files to crawl, I would more likely use Java (Apache HttpClient + a fast StAX parser such as Woodstox). For very simple tasks, Node.js + jQuery is a really good choice.

Some more about Node.js:

Friday, October 29, 2010

How to convert SVG files to pdf with Inkscape

This short post concerns Windows. I usually keep my standard batch files in a folder that is on the PATH (e.g. C:\Users\me\work\tools\bat\ ).

In this folder I have got two batch files:

  • inkscape.bat containing this line: "C:\Program Files\Inkscape\inkscape.exe" %*
    • don’t forget the quotes if you have white space in your path to Inkscape
    • %* means: hand over all provided parameters/arguments

  • svg2pdf.bat containing this line: inkscape "%CD%\%1" -D --export-pdf "%CD%\%1.pdf"
    • %1 stands for the first provided parameter – you could modify this to check if there is a second parameter given and use that one as output file
    • %CD% stands for current directory
    • -D stands for export only the drawing area. The allowed parameters for inkscape can be found in the manual.

In a command window, navigate to the desired folder and enter, for example: svg2pdf foo.svg. (In Windows XP, navigate to the folder, press [Windows Key] + [r] and enter cmd; in Windows 7, navigate to the parent folder of your desired folder, hold [Shift], right-click the folder and select “Open command window here”.)

One cool thing about svgs is that it is easy to change the font of the text they contain: you simply replace the font name. Here is a simple python script that does this and then calls inkscape to convert the file (of course this only works if you have got python installed):
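A minimal sketch of how such a script could look. The function names are hypothetical, the font replacement is a plain text substitution, and the inkscape options are the ones from the svg2pdf batch line above. It also assumes an inkscape executable is reachable on the PATH; when going through the batch wrapper, naming the .bat file explicitly or passing shell=True may be needed.

```python
import subprocess


def replace_font(svg_text, old_font, new_font):
    """Return svg_text with every occurrence of old_font swapped for new_font."""
    # A plain textual replace, assuming the font name appears literally
    # in the font-family attributes and style strings of the SVG.
    return svg_text.replace(old_font, new_font)


def inkscape_pdf_command(svg_path):
    """Build the command line used by the svg2pdf batch file for one SVG."""
    return ["inkscape", svg_path, "-D", "--export-pdf", svg_path + ".pdf"]


def convert(svg_path, old_font, new_font):
    """Rewrite the font in place, then let inkscape export the PDF."""
    with open(svg_path, encoding="utf-8") as f:
        svg_text = f.read()
    with open(svg_path, "w", encoding="utf-8") as f:
        f.write(replace_font(svg_text, old_font, new_font))
    # Assumption: "inkscape" resolves to a runnable program on this machine.
    subprocess.run(inkscape_pdf_command(svg_path), check=True)
```

For example, convert("foo.svg", "Arial", "Verdana") would rewrite foo.svg and produce foo.svg.pdf next to it; the font names are only placeholders.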


Another cool thing about Inkscape: suppose you have a tool such as UMLet whose svg export renders fonts as lines, so you cannot replace the fonts in the svg, and using its pdf export directly is not an option because you want to replace the standard font and cannot afford an Acrobat licence. In that case you can use Inkscape to convert the pdf to an svg with a simple command: inkscape foo.pdf --export-plain-svg foo.svg. Or you can create a batch file analogous to the one above (e.g. svg2svg.bat): inkscape "%CD%\%1" -D --export-plain-svg "%CD%\%1.svg". Once you have the svg, you can replace the font in it and then export the svg to a pdf.

Note: I sometimes had problems with the newest versions of Inkscape rendering fonts as lines in the resulting pdf and therefore losing the plain text. Version 0.46 keeps the text as text. That might have to do with additional parameters or whatever. I do not care. Keep it simple, make it work…

Friday, December 11, 2009

Playing around with JavaScript, Tag Clouds, Delicious

This post is about a Cumulus tag cloud application for delicious users that I wrote. After you provide a delicious user name, the application fetches that user's tags together with their frequency of usage. It can be found here.
Here is an example screenshot.
This is done solely on the client side (no connection to my server, only to delicious of course) by combining several JavaScript libraries (jQuery, the jQuery URLEncode plugin, swfobject) and a Flash tag cloud component.
1. The tags for a user are fetched from delicious. The same origin policy is bypassed with a little old trick (*):
var script = document.createElement('script');
script.setAttribute('src', '' + user + '?callback=doTagCloud');

This trick inserts a script element in the head section of an HTML page. In the HTML it will look something like this:
<script type="text/javascript" src="'{user}?callback=doTagCloud"></script>

By providing a callback the actually loaded JavaScript code has the form:

doTagCloud({".net":1,"\\todo":1,"\"design":1, […]});
The function doTagCloud is thus called with the tags filled in.
2. The tags are sorted, pruned and rendered with the computed relative font weights.
swfobject.embedSWF("flash/tagcloud.swf", "tagCloud", "400", "500",
"7.0.0", false, flashvars, params, attributes);
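The sorting, pruning and weighting in step 2 can be sketched roughly as follows. This is an illustration in Python and not the JavaScript used by the application: the function name, the default limits and the linear scaling between a minimum and a maximum font size are all assumptions.

```python
def tag_weights(tags, max_tags=50, min_size=8, max_size=24):
    """Sort tags by frequency, keep the top max_tags, scale counts to font sizes.

    tags maps tag name -> usage count, like the object passed to doTagCloud.
    The linear scaling and the default sizes are assumptions for illustration.
    """
    # Most frequent first; ties are broken alphabetically for a stable order.
    top = sorted(tags.items(), key=lambda kv: (-kv[1], kv[0]))[:max_tags]
    if not top:
        return []
    lowest = min(count for _, count in top)
    highest = max(count for _, count in top)
    span = highest - lowest
    result = []
    for name, count in top:
        if span == 0:
            size = max_size  # all remaining tags are equally frequent
        else:
            size = min_size + (count - lowest) * (max_size - min_size) / span
        result.append((name, round(size)))
    return result
```

With tag_weights({"a": 1, "b": 3, "c": 2}, max_tags=2), the least used tag "a" is pruned, "b" gets the largest size and "c" the smallest.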

(*) I am no big fan of the same origin policy: it does not actually prevent serious cross-site scripting attacks, yet it hinders developing advanced client-side mashups without data proxies.

Thursday, September 10, 2009

European Summer School in Information Retrieval (ESSIR) 2009 in Padua

I attended a summer school on Information Retrieval in Padua last week: ESSIR 2009. In general it was awesome. I met many interesting people, had a lot of fun and also gathered some input concerning my current work. Most of the talks were excellent and really led me to some new insights about Information retrieval in


Talks I liked most (in no particular order, and without distinguishing between the quality of a talk and my interest in its topic):

  • The User in Interactive Information Retrieval Evaluation - Peter Ingwersen
  • Information Retrieval in Context - Ian Ruthven
  • Web Mining and Next-Generation Search - Aristides Gionis
  • Indexing Techniques - Mark Sanderson

Here are some pics:

There are also some more pics at flickr.

Monday, August 17, 2009

JSLint with Eclipse

Some simple steps to check your JavaScript Code with JSLint:

  1. download and install Rhino (the JavaScript engine for Java)
  2. make it somehow available, e.g. create a js.bat (one line: java -jar "path/to/rhino/js.jar" %*)
  3. download JSLint and put it in some folder, e.g. /path/to/jslint.js
  4. Add an external tool configuration in Eclipse:

    Location: path/to/js.bat

    Working Directory: ${workspace_loc}

    Arguments: path/to/jslint.js ${resource_loc}

  5. you can now check your JavaScript code with JSLint by calling this external tool.


P.S.: There is a very interesting Google Tech Talk by Douglas Crockford (creator of JSON and JSLint) that highlights the good and bad aspects of JavaScript. I liked that a lot.

Monday, July 6, 2009