cljMovieIndexer: Revolución’s baby brother

As an exam assignment I had to build something in Clojure. My idea was to make something similar to Revolución, just much simpler.

Clojure is a modern dialect of Lisp that runs on the Java Virtual Machine. It’s a functional language designed for concurrent programming, so the style is completely different from Java, C#, C++ and similar languages. For me it wasn’t easy to switch to this style and way of thinking, but once you get there it’s pretty easy and fun :) The documentation could be more detailed, but there are a few blogs and forums where you can find examples (now there is one more ;)).

The Idea

(… or how it works)

cljMovieIndexer runs from the console. You specify the directory where your movies are, and it scans that directory and extracts movie names from the sub-directory names, ignoring unnecessary words like resolution, scene names, release types, dots etc. Using the extracted names, cljMovieIndexer downloads movie info (release year, plot, actors, cover image and so on) from TMDb and creates a movie description file (in web archive format), which is saved in each movie directory.
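To make the name extraction step more concrete, here is a minimal sketch of how a directory name could be cleaned up. It is not the code from the project: the junk-tokens list and the clean-movie-name function are hypothetical names invented for this example, and the rule "stop at the first year or junk word" is just one simple heuristic.

```clojure
(ns example.names
  (:require [clojure.string :as str]))

;; Hypothetical list of words that mark the end of the title;
;; the real application may use a different list.
(def junk-tokens
  #{"720p" "1080p" "dvdrip" "brrip" "bluray" "xvid" "x264" "hdtv"})

(defn clean-movie-name
  "Turns a directory name such as \"The.Matrix.1999.720p.BluRay.x264\"
  into a title that can be used for searching."
  [dir-name]
  (->> (str/split dir-name #"[.\s_]+")   ; dots, spaces and underscores separate words
       ;; keep words until the first junk token or four-digit year
       (take-while #(not (or (junk-tokens (str/lower-case %))
                             (re-matches #"(19|20)\d{2}" %))))
       (str/join " ")))

(clean-movie-name "The.Matrix.1999.720p.BluRay.x264")
```

A heuristic this simple fails for titles that start with a year, so a real implementation would need a few more rules.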

The look of the description page can be changed by editing the template file (actually a CSS file). It’s also written with multithreading in mind, so it will use as many cores as your CPU has when downloading movie info and creating description files.
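One simple way to spread this kind of work over several threads in Clojure is pmap. The sketch below only illustrates the idea and is not necessarily how cljMovieIndexer does it; index-movie is a hypothetical stand-in for the real per-directory work.

```clojure
(defn index-movie
  "Hypothetical stand-in for the per-directory work
  (search, download info, write the description file)."
  [dir-name]
  (str "indexed " dir-name))

(defn index-all
  "Processes all directories in parallel and returns the results in input order."
  [dir-names]
  ;; pmap is lazy, so doall forces every item to be processed before returning
  (doall (pmap index-movie dir-names)))

(index-all ["The Matrix" "Inception"])
```

pmap decides the degree of parallelism from the number of available processors, so there is nothing to configure. Because it runs on the same thread pool as futures, a console application usually calls (shutdown-agents) at the end of -main so the JVM can exit promptly.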

The Code

And now some examples and interesting parts …

TMDb client

TMDb is a great open movie database and it has a pretty nice API to search for movies and get movie info and images. Requests are made using simple HTTP GET, and the results are in XML form.

First of all, we need to be able to search TMDb for movies:

(defn search
	"Search TMDB for movies with specified name."
	[movie-name]
		(when-not (nil? movie-name)
			(logger/info (str "Searching for '" movie-name "'"))
			(process-tmdb-search-response
				(httpa/stream
					(httpa/http-agent
						(str @tmdb-api-url "Movie.search/en/xml/" @tmdb-key "/" (URLEncoder/encode movie-name "UTF-8"))
						:connect-timeout 10000
						:read-timeout 10000)))))

For making HTTP requests there is an agent-based asynchronous HTTP client. We build the request URL, start the request with http-agent and pass the agent to the stream function, which returns an InputStream of the HTTP response body. Then process-tmdb-search-response is used to read the stream.
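The URL building part can be tried on its own, without making any request. In this sketch api-url and api-key are placeholder atoms invented for the example (the post’s code dereferences tmdb-api-url and tmdb-key in the same way), and the values are not real endpoints or credentials.

```clojure
(ns example.url
  (:import (java.net URLEncoder)))

;; Placeholder values, not a real endpoint or key.
(def api-url (atom "http://api.example.invalid/"))
(def api-key (atom "YOUR_API_KEY"))

(defn search-url
  "Builds the search URL for a movie name, URL-encoding the name."
  [movie-name]
  (str @api-url "Movie.search/en/xml/" @api-key "/"
       (URLEncoder/encode movie-name "UTF-8")))

(search-url "The Matrix")
;; => "http://api.example.invalid/Movie.search/en/xml/YOUR_API_KEY/The+Matrix"
```

Note that URLEncoder uses form encoding, so a space becomes + rather than %20.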

(defn process-tmdb-search-response
	"Extracts movie id from search response."
	[response]
		(let [movie-xml (zzip/xml-zip (xxml/parse response))]
			(zipf/xml1-> movie-xml :movies :movie :id zipf/text)))

The response stream is passed to the parse function, which parses the XML and returns a tree of elements. movie-xml is a zipper over those XML elements. To extract data from the XML there is a zip-filter system for filtering trees, with the xml1-> function, which returns the first item that matches the specified query predicates. So now we have the TMDb id of the first movie in the search result.
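To see what that query does, here is a small sketch that follows the same path (movies, movie, id) by hand, using only clojure.xml so it runs without extra libraries. The sample XML is invented and heavily simplified, not a real TMDb response, and child and first-movie-id are hypothetical helper names.

```clojure
(ns example.xml
  (:require [clojure.xml :as xml])
  (:import (java.io ByteArrayInputStream)))

;; Invented, simplified sample; a real response has many more elements.
(def sample-response
  (str "<results><movies>"
       "<movie><id>603</id><name>The Matrix</name></movie>"
       "<movie><id>604</id><name>The Matrix Reloaded</name></movie>"
       "</movies></results>"))

(defn parse-xml-string
  "Parses an XML string into a tree of {:tag :attrs :content} maps."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn child
  "First child element of node with the given tag, or nil."
  [node tag]
  (first (filter #(= tag (:tag %)) (:content node))))

(defn first-movie-id
  "Text of the first id element under the first movie, or nil if there is none."
  [root]
  (some-> root (child :movies) (child :movie) (child :id) :content first))

(first-movie-id (parse-xml-string sample-response))
;; => "603"
```

The zip-filter query is shorter and more flexible, but the result is the same: the text of the first matching id element, or nil when the search returned nothing.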

Using the movie id we can get movie information like plot, cast, genre, images etc.

(defn info
	"Get movie info from TMDB for movie with specified id."
	[movie-id]
		(when-not (nil? movie-id)
			(logger/info (str "Get details for " movie-id))
			(process-tmdb-info-response
				(httpa/stream
					(httpa/http-agent
						(str @tmdb-api-url "Movie.getInfo/en/xml/" @tmdb-key "/" movie-id)
						:connect-timeout 10000
						:read-timeout 10000)))))

Like before, the response is passed to another function, process-tmdb-info-response, for processing …

(defstruct movie :name :released :imdb :tagline :plot :rating :trailer :genres :images :actors :directors)

(defn process-tmdb-info-response
	"Create movie struct with data extracted from XML response."
	[response]
		(let [movie-xml (zzip/xml-zip (xxml/parse response))]
			(reduce
				(fn [current-movie mapping]
					(assoc current-movie
						(key mapping)
						(cond
							(= (key mapping) :directors) (extract-directors movie-xml)
							(= (key mapping) :actors) (extract-actors movie-xml)
							(= (key mapping) :images) (extract-covers movie-xml)
							(= (key mapping) :genres) (extract-genres movie-xml)
							:else (extract-value movie-xml (val mapping)))))
				(struct movie)
				{:name :name, :released :released, :imdb :imdb_id, :rating :rating, :trailer :trailer, :tagline :tagline, :plot :overview, :genres nil, :images nil, :actors nil, :directors nil})))

This is a little more complicated :), but I’ll try to explain. First we parse the XML, then extract some of the info (I didn’t need everything that is available) and store it in a movie struct. The reduce function …

… returns the result of applying function to value and the first item in collection, then applying function to that result and the 2nd item, etc.

In this particular case, the supplied function receives two params: the movie struct instance and a pair of keys (one is a key in the movie struct and the other matches an element in the XML). In the “first step”, the empty movie struct instance is passed to the function with the first pair from the map, say (:name :name), and the function extracts the value of the name element in the XML and sets it as the value for the :name key in the movie struct. Then (“second step”) the resulting movie struct is passed in again with the next pair, say (:released :released), the value is extracted and set, and so on. A map doesn’t guarantee the order in which its pairs are visited, but that doesn’t affect the result, since each pair sets a different key. At the end, process-tmdb-info-response returns an instance of the movie struct with all the movie info that was found.
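The same pattern can be tried in isolation. In this sketch a plain map stands in for the parsed XML, and mappings, source and build-movie are names made up for the example; array-map is used only so that the steps happen in a predictable order.

```clojure
;; movie key -> source key; array-map keeps insertion order
(def mappings
  (array-map :name :name, :plot :overview, :imdb :imdb_id))

;; Stand-in for the parsed XML: element name -> text.
(def source
  {:name "The Matrix"
   :overview "A hacker learns the truth."
   :imdb_id "tt0133093"})

(defn build-movie
  "Builds a movie map by copying each mapped value from src."
  [src]
  (reduce (fn [movie [movie-key source-key]]
            ;; each step returns a new map with one more key set
            (assoc movie movie-key (get src source-key)))
          {}
          mappings))

(build-movie source)
;; => {:name "The Matrix", :plot "A hacker learns the truth.", :imdb "tt0133093"}
```

Replacing reduce with reductions returns every intermediate map, which is a handy way to watch the “steps” described above.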

HTML & MHT

There is a great library for working with HTML in Clojure called Hiccup. I used it to create an HTML page with movie info from the movie struct. Creating HTML is very simple, so as a result of this:

(html [:html [:body [:p "Some text"]]])

… you’ll get:

<html><body><p>Some text</p></body></html>

Once we have the page HTML string, we need to embed the images and CSS and create a web archive. That can be done using the classes from the javax.mail package, since you can use existing Java libraries and classes from Clojure. You can see how on the cljMovieIndexer project page @ googlecode. There is also an example of how to create a PDF document.

Make executable application

This was a little tricky to work out, but here it is, step by step:

  1. Download Leiningen (a build tool for Clojure) and run lein self-install
  2. Write a -main method:
    (defn -main [& args] ( ... ))
  3. Create a project.clj file in the project root directory, listing the project dependencies and pointing to the clj file (namespace) with the -main function
  4. Run lein uberjar to create a single standalone jar file
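As a rough illustration of steps 2 and 3, the two files could look something like this. The project name, version numbers, dependency list and namespace are placeholders chosen for the example, not copied from the actual project.

```clojure
;; project.clj (name, versions and dependencies are placeholders)
(defproject cljmovieindexer "0.1.0"
  :description "Scans movie directories and writes description files"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]]
  ;; namespace that contains -main
  :main cljmovieindexer.core)

;; src/cljmovieindexer/core.clj
(ns cljmovieindexer.core
  (:gen-class))   ; generates a Java class so the jar has an entry point

(defn -main [& args]
  (println "Scanning" (first args)))
```

After lein uberjar, the resulting standalone jar can be started with java -jar, passing the movie directory as the first argument.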

You can find complete source code and executable cljMovieIndexer application on the project page: https://github.com/joshefin/cljmovieindexer.

Note

As I said at the beginning, I made this just for practice, to learn new stuff and to get a good grade on the exam :), so it certainly isn’t 100% stable and bug-free, but I hope it’ll be useful to someone who is learning Clojure or looking for examples.
