Friday, May 16, 2014

Julia versus R - Playing around

So...as time goes by, I'm getting more proficient with Julia...which is fairly easy, as the language is pretty quick to learn...

I decided to load a file with 590,209 records that I got from Freebase...the file in question contains Actors and Actresses from movies...you can have a quick look here...


For this test, I'm using my Linux box on VMWare with 2 GB of RAM...running Ubuntu 12.04.4 (Precise)

For R, I'm not using any special package...just plain R...version 2.14.1 and for Julia version 0.2.1, I'm using the DataFrames package...

Let's take a look at the R source code first along with its runtime processing...

Actors_Info.R
start.time <- Sys.time()
if(!exists("Actors")){
Actors<-read.csv("Actors_Table.csv", header=TRUE, 
                     stringsAsFactors=FALSE, colClasses="character", na.strings = "")
}
Actors<-unique(Actors)
Actors<-Actors[complete.cases(Actors),]
Actor_Info<-data.frame(Actor_Id=Actors$Actor_Id,Name=Actors$Name,Gender=Actors$Gender)
Actor_Info<-Actor_Info[order(Actor_Info$Gender),]
write.csv(Actor_Info,"Actor_Info_R.csv",row.names=TRUE)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

This script first checks whether the file has already been loaded...if not, it loads it...then it eliminates the duplicate records, deletes every row containing a null or NA, creates a new Data Frame, sorts it by "Gender" and writes a new CSV file...the elapsed time is recorded to measure its speed...we will run it twice...the first time the file is not loaded...the second time it is...and that should greatly improve the execution time...
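As a side illustration of the steps just described, here is a minimal Python sketch using only the standard library...it is not part of the benchmark and no timings are claimed for it. The sample rows are made up, and the helper names `clean_actors` and `to_csv` are hypothetical...only the column names Actor_Id, Name and Gender come from the R script.

```python
import csv
import io

COLUMNS = ("Actor_Id", "Name", "Gender")

def clean_actors(rows):
    """Drop duplicate rows, drop incomplete rows, keep three columns, sort by Gender."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row.items())
        if key in seen:          # same idea as unique(Actors)
            continue
        seen.add(key)
        # Same idea as complete.cases: every field must be present and non-empty.
        if any(value is None or value == "" for value in row.values()):
            continue
        result.append({column: row[column] for column in COLUMNS})
    result.sort(key=lambda r: r["Gender"])
    return result

def to_csv(rows):
    """Serialize the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(COLUMNS), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Made-up sample standing in for Actors_Table.csv (assumption, not real data).
SAMPLE = """Actor_Id,Name,Gender
2,Bob Example,Male
1,Alice Example,Female
2,Bob Example,Male
3,Carol Example,
"""

cleaned = clean_actors(csv.DictReader(io.StringIO(SAMPLE)))
print(to_csv(cleaned))
```

The duplicate "Bob Example" row and the row with the empty Gender are dropped, and the remaining rows come out sorted by Gender.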



As we can see...the times are really good...and the difference between the first and second runs is pretty obvious...for the record...the generated file contains 105,874 records...

Now...let's see the Julia version of the code...

Actors_Info.jl
using DataFrames
start = time()
isdefined(:Actors) || (Actors = readtable("Actors_Table.csv", header=true, nastrings=["","NA"]))
drop_duplicates!(Actors)
complete_cases!(Actors)
Actor_Info = DataFrame(Actor_Id=Actors["Actor_Id"],Name=Actors["Name"],Gender=Actors["Gender"])
sortby!(Actor_Info, [:Gender])
writetable("Actor_Info_Julia.csv", Actor_Info)
finish = time()
println("Time: ", finish-start)


Here...we're doing the same...we load the DataFrames package (but exclude that from the execution time), check if the file is loaded so we don't load it again on the second run...eliminate duplicates, delete every row with a null or NA, create a new DataFrame, sort it by "Gender" and finally write a new CSV file...


Well...the difference between the first and second runs is very significant...but of course...way slower than R...

But...let me tell you one simple thing...Julia is still a brand new language...the DataFrames package is not part of the core Julia language, which means...that it's even newer...and optimizations are being made as we speak...I would say that for a young language...18 seconds to process 590,209 records is pretty awesome...and of course...my R experience greatly surpasses my Julia experience...

So...I don't really want to leave you with the impression that Julia is not good or not fast enough...because believe me...it is...and you're going to love my next experiment -;)

Let's take a look at the R source code first...

Random_Names.R
start.time <- Sys.time()
names<-c("Anne","Gigi","Blag","Juergen","Marek","Ingo","Lars","Julia",
         "Danielle","Rocky","Julien","Uwe","Myles","Mike", "Steven")

last_names<-c("Hardy","Read","Tejada","Schmerder","Kowalkiewicz","Sauerzapf",
              "Karg","Satsuta","Keene","Ongkowidjojo","Vayssiere","Kylau",
              "Fenlon","Flynn","Taylor")
full_names<-c()
for(i in 1:100000){
  name<-sample(1:15, 1)
  last_name<-sample(1:15, 1)
  full_name<-paste(names[name],last_names[last_name],sep=" ")
  full_names<-append(full_names,full_name)
}
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

So this code is fairly simple...we have a couple of vectors with names and last names...then we loop 100000 times, each time generating a couple of random numbers to index into the vectors, building a full name and appending it to a new vector...ending up with some funny random name combinations...
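The loop just described can be sketched in Python as well...a minimal illustration, not part of the benchmark, with no timings claimed. The function name `random_full_names` and the fixed seed are my own assumptions...the name lists are copied from the R script. One thing worth keeping in mind when reading the R version: a Python list grows in place, while R's `append` returns a new vector on every call, so the two loops are not doing the same work under the hood.

```python
import random

names = ["Anne", "Gigi", "Blag", "Juergen", "Marek", "Ingo", "Lars", "Julia",
         "Danielle", "Rocky", "Julien", "Uwe", "Myles", "Mike", "Steven"]

last_names = ["Hardy", "Read", "Tejada", "Schmerder", "Kowalkiewicz", "Sauerzapf",
              "Karg", "Satsuta", "Keene", "Ongkowidjojo", "Vayssiere", "Kylau",
              "Fenlon", "Flynn", "Taylor"]

def random_full_names(count, rng):
    """Build `count` random "first last" combinations, one per loop iteration."""
    full_names = []
    for _ in range(count):
        full_name = rng.choice(names) + " " + rng.choice(last_names)
        full_names.append(full_name)
    return full_names

rng = random.Random(2014)  # fixed seed (assumption) so repeated runs match
full_names = random_full_names(100000, rng)
print(len(full_names))
print(full_names[:3])
```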



Well...the difference between the two runs is not really good...the second time was actually a little bit slower...and 1 minute is kind of a lot...let's see how Julia behaves...

Here's the Julia source code...

Random_Numbers.jl
start = time()
names=["Anne","Gigi","Blag","Juergen","Marek","Ingo","Lars","Julia",
       "Danielle","Rocky","Julien","Uwe","Myles","Mike", "Steven"]
last_names=["Hardy","Read","Tejada","Schmerder","Kowalkiewicz","Sauerzapf",
            "Karg","Satsuta","Keene","Ongkowidjojo","Vayssiere","Kylau","Fenlon","Flynn","Taylor"]
full_names=String[]
full_name = ""
for i = 1:100000
        name=rand(1:15)
        last_name=rand(1:15)
        full_name = names[name] * " " * last_names[last_name]
        push!(full_names,full_name)
end
finish = time()
println("Time: ", finish-start)

This code does the same...it creates two arrays with names and last names, loops 100000 times, generates a couple of random numbers, joins a name with a last name and then populates a new array with the mixed full names...


Just like in the R code...the second run took Julia a little bit longer...but...less than a second?! That's amazingly fast and really took R by storm...

Now...I believe you will start to take Julia more seriously -:D

Hope you liked this blog...

Greetings,

Blag.
Development Culture.

Thursday, May 15, 2014

Social Media Mining with R - Book review

I was really excited when my friend from Packt Publishing sent me this book...as I hadn't read any R book in a while...but don't get me wrong...the book is not bad...it's just that I expected a little bit more...let me explain...


This book is not too big, which is something I appreciate...it's 122 pages...it comes with a short introduction to R which is good for newbies and then it goes straight to Social Media Mining using Twitter.


The problem I had with this book...and maybe that's not really a bad thing...is that it has more Social Media Mining explanation than actual code...so sure, it does a great job explaining how Social Media Mining works, but for a die-hard developer like me...the source code is more important...


To be honest with you...I would have bought this book if I didn't have any Social Media Mining experience...but I have worked on and built several applications using R and Twitter in the past...so...this book wasn't really for me...

Greetings,

Blag.
Development Culture.

Tuesday, May 13, 2014

My first post on Julia

So...what is Julia? Just another nice programming language -;)

According to its creators...

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.



I just started learning it a couple of days ago...and I must say that I really like it...it has a Python-like syntax so I felt comfortable from the very start...

Of course...it's kind of a brand new language, so things are being added and fixed as we speak...but the community is growing and I'm glad to be amongst its "early" supporters -:)

What I did right after I read the documentation and watched a couple of videos was to simply port one of my old Python applications to Julia...the app was "LCD Numbers", which asks for a number and returns it printed in LCD format...

This is the Python code...

LCD_Numbers.py
global line1, line2, line3

line1 = ""
line2 = ""
line3 = ""

zero = {1: ' _  ', 2: '| | ', 3: '|_| '}
one = {1: '  ', 2: '| ', 3: '| '}
two = {1: ' _  ', 2: ' _| ', 3: '|_  '}
three = {1: '_  ', 2: '_| ', 3: '_| '}
four = {1: '    ', 2: '|_| ', 3: '  | '}
five = {1: ' _  ', 2: '|_  ', 3: ' _| '}
six = {1: ' _  ', 2: '|_  ', 3: '|_| '}
seven = {1: '_   ', 2: ' |  ', 3: ' |  '}
eight = {1: ' _  ', 2: '|_| ', 3: '|_| '}
nine = {1: ' _  ', 2: '|_| ', 3: ' _| '}

num_lines = {0: zero, 1: one, 2: two, 3: three, 4: four,
             5: five, 6: six, 7: seven, 8: eight, 9: nine}

def Lines(number):
    global line1, line2, line3
    line1 += number.get(1, 0)
    line2 += number.get(2, 0)
    line3 += number.get(3, 0)

number = str(input("\nEnter a number: "))
length = len(number)
for i in range(0, length):
    Lines(num_lines.get(int(number[i:i+1]), 0))

print("\n")
print(line1)
print(line2)
print(line3)
print("\n")

And this, in turn, is the Julia version of it...

LCD_Numbers.jl
zero = [1=> " _  ", 2=> "| | ", 3=> "|_| "]
one = [1=> "  ", 2=> "| ", 3=> "| "]
two = [1=> " _  ", 2=> " _| ", 3=> "|_  "]
three = [1=> "_  ", 2=> "_| ", 3=> "_| "]
four = [1=> "    ", 2=> "|_| ", 3=> "  | "]
five = [1=> " _  ", 2=> "|_  ", 3=> " _| "]
six = [1=> " _  ", 2=> "|_  ", 3=> "|_| "]
seven = [1=> "_   ", 2=> " |  ", 3=> " |  "]
eight = [1=> " _  ", 2=> "|_| ", 3=> "|_| "]
nine = [1=> " _  ", 2=> "|_| ", 3=> " _| "]

num_lines = [0=> zero, 1=> one, 2=> two, 3=> three, 4=> four,
             5=> five, 6=> six, 7=> seven, 8=> eight, 9=> nine]

line = ""; line1 = ""; line2 = ""; line3 = ""

function Lines(number, line1, line2, line3)
    line1 *= number[1]
    line2 *= number[2]
    line3 *= number[3]
    line1, line2, line3
end

println("Enter a number: "); number = chomp(readline(STDIN))
len = length(number)
for i in [1:len]
    line = Lines(num_lines[parseint(string(number[i]))],line1,line2,line3)
    line1 = line[1]; line2 = line[2]; line3 = line[3]
end

println(line1)
println(line2)
println(line3 * "\n")

As you can see...the code looks somewhat similar...but of course...I got rid of those ugly global variables...and used some of the neat Julia features, like multiple return values and defining several variables on one line... If you want to see the output...here it is...
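For comparison, the same globals-free idea carries back to Python too...this is only a minimal sketch, not the original script. The function name `render` and the dictionary name `GLYPHS` are my own assumptions...the glyph strings are copied from the listings above, and the three lines come back as a tuple, just like the multiple return values in the Julia version.

```python
# One (top, middle, bottom) tuple per digit, copied from the listings above.
GLYPHS = {
    "0": (" _  ", "| | ", "|_| "),
    "1": ("  ", "| ", "| "),
    "2": (" _  ", " _| ", "|_  "),
    "3": ("_  ", "_| ", "_| "),
    "4": ("    ", "|_| ", "  | "),
    "5": (" _  ", "|_  ", " _| "),
    "6": (" _  ", "|_  ", "|_| "),
    "7": ("_   ", " |  ", " |  "),
    "8": (" _  ", "|_| ", "|_| "),
    "9": (" _  ", "|_| ", " _| "),
}

def render(number):
    """Return the three text lines that draw `number` (a string of digits)."""
    line1 = line2 = line3 = ""
    for digit in number:
        top, middle, bottom = GLYPHS[digit]
        line1 += top
        line2 += middle
        line3 += bottom
    return line1, line2, line3

# Multiple return values, unpacked in one line; no global state involved.
line1, line2, line3 = render("42")
print(line1)
print(line2)
print(line3)
```

Calling `render` with a different string of digits builds a fresh set of lines each time, since nothing is kept between calls.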


Of course...this is just a test...things are going to become interesting when I port some R code into Julia and run some speed comparisons -;)

Greetings,

Blag.
Development Culture.

Wednesday, May 7, 2014

Game Development with Three.js - Book Review

Last week I wrote a review about Learning Three.js: The JavaScript 3D Library for WebGL - Book Review and I said that I was going to read another Three.js book and write a review about it...well...here it is -;)

Well...the book is not so big...just 118 pages...but that's fine...the other book was too big...



It starts with a nice introduction to Three.js, so you can get comfortable with it...comes with some nice examples and even a First Person Shooter game...which sadly...is not explained in depth, so you need to download the source code and try to make sense of it...



To be honest...and maybe I'm becoming grumpy...I wouldn't buy this book if I hadn't bought it already...the examples are good enough to give you a sense of what you can do with Three.js but not really good enough to actually teach you how to make real games...what I mean is...I would have preferred to have several small games showcasing the capabilities and techniques that can be used...instead of a ready-made FPS game...

I'm not sure if there are more Three.js books out there...but...I don't think I'm going to go any further learning it...no matter how good and awesome it is...without a book like the one I want...I don't see much of a point in spending a lot of time learning it...and sure...maybe it's because I'm not really a JavaScript guy...but that's just me...

Greetings,

Blag.
Development Culture.