My own little space on the world wide web

I’d been planning to get this done for a very long time, but I’ve finally got a personal website/academic profile up. It was almost half a year or a year back that my supervisor told me to create a website for myself, since it’s somewhat essential for PhD students. But NTU wasn’t giving me any domain space, and though I could get some server space on the PDCC servers for creating a team website and a page for myself, that never materialized either because everyone was too busy. So I created some random thing on sites.google.com, which looked pathetic; I couldn’t really figure out how to customize it much, and most importantly I wanted my own domain.

This is not the first time I’ve made a website. But the last time I created a personal web page I was in 11th Standard (Junior College); it was built with FrontPage, and I think about 5 people in total saw it. I hosted it on GeoCities, and I wonder if it’s still up. Anyway, after that I’ve started a few blogs now and then, and I went through the initial process of trying WordPress and then Joomla for the team website. But my own website I wanted to create from scratch, using just HTML and CSS and no pre-specified templates.

So the first step was learning how to use CSS, brushing up on the basic HTML I knew, and checking out HTML5 (not that I needed it, but I figured I might as well see what the big deal was). Anyway, with the help of some tutorials and templates from Lynda.com, I was able to create the website quite easily.

Next I had to get it up on the interweb. I’d heard from some friends that Amazon Web Services has a free tier of storage as long as you stay under a certain amount of space. And being Amazon, I’m guessing the service is reliable. Anyway, I set up an account. Even though I had to submit my credit card details for it, I won’t be charged for the first year unless I add more than 5 GB of data to the storage provided. They’ve also let me set an alarm that goes off if the charge ever goes above $0. They have excellent instructions on how to use it, though honestly it’s easy enough that you hardly need them. Just remember: DO NOT delete buckets randomly. It takes a while for that name to become available again.

Now that I had my data stored there, the website was accessible, but the link was a very long and weird one that basically made the geocities/Google Sites domain names look good in comparison. So I started searching for a domain registrar. The first link I got to was LifeHacker’s recommendation. Since I wanted the cheapest one out there, I initially leaned towards namecheap.com, but a quick search online showed that internet.bs provided domain registration for about a dollar less, and people seemed reasonably satisfied with it. So I went ahead and bought vaisaghvt.com for about USD 8.99 a year.

The process was easy enough. But being a newbie, I had no clue how to link that URL to my AWS bucket. I initially tried simple URL forwarding. While this worked (and incredibly quickly at that), it ended up showing that long useless URL in the address bar, which I found quite irritating and inconvenient. Luckily I found this very helpful blogpost, which told me exactly how to solve the problem. Just remember to name your bucket properly: in my case, since I wanted it linked to www.vaisaghvt.com, that is the name I gave to my bucket. And while they kept telling me the change might take hours or even a day to happen, in my case I could see my website at that name in less than an hour.
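
For anyone else stuck on the same thing: the trick is that S3’s static website hosting picks the bucket based on the host name, so you name the bucket after the host and point a CNAME at the bucket’s website endpoint. Roughly like this (the region in the endpoint is my assumption; yours may differ):

; point www at the S3 website endpoint for the bucket named www.vaisaghvt.com
www.vaisaghvt.com.   CNAME   www.vaisaghvt.com.s3-website-us-east-1.amazonaws.com.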

I know it’s not that big a deal to create a website. But I can’t help the childish glee of having my own little place in this interweb thingie.

Please do give it a look: vaisaghvt.com

Running simulations and analysing data

My first post in a long time. This is more of a journal entry for me to look back at when I need to.

My PhD project mostly involves running simulations of hundreds of people evacuating from a building and then analysing the results in various ways. While the MASON framework in Java helps a lot with the implementation of the model itself, something just as interesting, and something that in the end feels a lot cooler, is running all those simulations, getting the data and analysing it.

Step 1: Running multiple simulations

MASON allows you to run simulations in two major ways. The first is the GUI, in which you get to see how the simulation is going; this mode is very useful and essential when creating and debugging the model, but when it comes to actually running simulations and gathering data for analysis it is quite obviously impractical. This is when the console mode comes in handy: in console mode, you run several replications of the required simulation with the required seed. Initially I used the handy in-built function to do this. I also needed to store the simulation-specific settings somewhere. At first I did this using constants in various classes, then changed to storing all the constants in one class, which was a lot more convenient to change, and finally I resorted to a much more practical XML file, which can easily be read using JAXB in Java (though I think I might change to an SQL-based implementation soon). Anyway, the point is, I am able to run my simulation using its jar file and an XML file with all the parameters that are used for the simulation.
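
Roughly, the settings class is just an annotated plain Java object and JAXB does the rest. A minimal sketch of the idea (the class and field names here are made up for illustration, not my actual parameters):

import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical settings class -- field names are placeholders, not my real parameters.
@XmlRootElement(name = "settings")
public class SimulationSettings {

    @XmlElement public int agentCount;
    @XmlElement public double preferredSpeed;
    @XmlElement public long seed;

    // Unmarshal the xml file into a plain Java object the model reads its parameters from.
    public static SimulationSettings load(String path) throws Exception {
        JAXBContext context = JAXBContext.newInstance(SimulationSettings.class);
        return (SimulationSettings) context.createUnmarshaller().unmarshal(new File(path));
    }
}

The XML file itself is then just a settings element with one child per field, which is exactly the kind of file the shell script further down rewrites between runs.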

Step 2: Storing data

The next step in this process is collecting data from these simulations. As a way to get started, I stored my initial output as simple text files in CSV format, which I analysed in Excel. Pretty soon this became extremely impractical because of the amount of data I had to store. So I changed to storing in a binary format and created a parser which would convert the generated binary files to text files. I could have used some of Java’s in-built analysis tools, like those provided by the Apache libraries, but I was quite lazy, and I was working with someone who wanted the text files so that he could analyse them in MATLAB, so I settled on binary files with a parser to convert them to text.
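
The idea, roughly, was to write fixed-size binary records during the run and keep a small parser around to turn them back into CSV whenever someone wanted text. A minimal sketch (the record layout here is invented for illustration; the real files had more fields):

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintWriter;

// Sketch: write fixed-size binary records during a run, then convert them back to CSV.
public class BinaryLog {

    // One record per agent per step: step, agent id, x, y.
    public static void writeRecord(DataOutputStream out, int step, int agentId,
                                   double x, double y) throws IOException {
        out.writeInt(step);
        out.writeInt(agentId);
        out.writeDouble(x);
        out.writeDouble(y);
    }

    // The parser: read records until end of file and print them as comma-separated lines.
    public static void toCsv(File binaryFile, PrintWriter csv) throws IOException {
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(binaryFile)));
        try {
            while (true) {
                int step;
                try {
                    step = in.readInt();
                } catch (EOFException endOfFile) {
                    break; // no more records
                }
                int agentId = in.readInt();
                double x = in.readDouble();
                double y = in.readDouble();
                csv.println(step + "," + agentId + "," + x + "," + y);
            }
        } finally {
            in.close();
        }
    }
}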

However, despite the organised file hierarchy and names, this was still very difficult to analyse and keep organised, and it was still very huge. There were also a lot of complications when I was writing from multiple runs, experiments, etc. So I switched to what I should have done from the start: a relational database. I set up a MySQL server instance on my lab computer and wrote all the required data to the database at the end of each run of the simulation.
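
The write itself is just a JDBC insert at the end of each run, something along these lines (table and column names here are placeholders, not my actual schema):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: push one run's summary into MySQL when the simulation finishes.
public class ResultsWriter {

    public static void storeRun(int runId, long seed, double evacuationTime) throws Exception {
        Class.forName("com.mysql.jdbc.Driver"); // load the MySQL JDBC driver
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/simulations", "user", "password");
        try {
            PreparedStatement stmt = conn.prepareStatement(
                    "INSERT INTO runs (run_id, seed, evacuation_time) VALUES (?, ?, ?)");
            stmt.setInt(1, runId);
            stmt.setLong(2, seed);
            stmt.setDouble(3, evacuationTime);
            stmt.executeUpdate();
            stmt.close();
        } finally {
            conn.close();
        }
    }
}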

Step 3: Analysing the data

Excel being boring, I shall not go into the details of how I did the analysis initially. Once I got the data into MySQL, I needed some tool to analyse it. That’s when my prof recommended using matplotlib in Python. I’d used Python before to create a simple script that cleaned up references in a text file, but I’d hardly used it for anything else, even though I liked the language a lot. So I decided to give it a try. Interestingly enough, I had a lot of trouble finding a free library for MySQL. But once I finally found MySQLdb, querying and analysing the data and getting some neat graphs took hardly a few lines of code. So now that I had the data, I could simply run the Python script and get all the charts I needed.
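
Something like the following handful of lines is really all it takes (again, the table and column names are placeholders; the real queries and charts were a bit more involved):

import MySQLdb
import matplotlib.pyplot as plt

# Pull one result column out of MySQL and turn it into a quick histogram.
conn = MySQLdb.connect(host="localhost", user="user", passwd="password", db="simulations")
cursor = conn.cursor()
cursor.execute("SELECT evacuation_time FROM runs WHERE experiment_id = %s", (1,))
times = [row[0] for row in cursor.fetchall()]
conn.close()

plt.hist(times, bins=20)
plt.xlabel("Evacuation time")
plt.ylabel("Number of runs")
plt.savefig("evacuation_times.png")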

Step 4: The power of the cloud

A single run of my simulation can take up to 5 minutes. For 100 replications under each of 6 different settings (this is what I needed for the particular run at that time), this would take about 3000 minutes, or 50 hours, or just over 2 days. While that’s not terrible, I needed my computer for other things, and I worked at the Parallel and Distributed Computing Center, so it would have been a waste not to make use of all that computing power at our disposal. So I got myself an account on the cluster and created a simple shell script that would run the simulation with the fixed settings. Eventually I extended this so that it would read parameters from a separate text file, modify the XML file appropriately, run the simulation the required number of times, and finally send me an email at the end of the run. Here is the code for this first script:

 
#!/bin/bash
# runSimulations: runs the simulation for every combination of the parameter
# values listed in the settings file (arg 1), rewriting the xml file (arg 2) each time

opath=$PATH
PATH=/bin:/usr/bin

case $# in
  0|1) echo 'Usage runSimulations settingsFile xmlFile' 1>&2; exit 1
esac

# Each line of the settings file is: <parameterName> <value1> <value2> ...
# model[] keeps the parameter names; completeValuesList[] keeps all the values,
# with startingPoint[k] marking where parameter k's values begin.
awk -v xmlFile=$2 '
BEGIN {totalCount=1
startingPoint[1]=1}
{
  model[NR] = $1
  startingPoint[NR+1] = startingPoint[NR]+NF-1
  for(i=2;i<=NF;i++){
    completeValuesList[totalCount] = $i
    totalCount++
  }
}
END {
  startingPoint[NR+1] = totalCount-1
  for (j=0; j<=NR; j++){
    indices[j] = 0
  }

  # indices[] works as a counter over every combination of parameter values;
  # the loop ends once the counter wraps around into indices[0]
  while(indices[0]!=1){
    timeNeeded=0

    # write the current value of each parameter into the xml file
    # (via the overwrite/xmlParser helpers) before launching the run
    for(j=1;j<=NR;j++){
      value[j] = completeValuesList[startingPoint[j]+indices[j]]
      command = "overwrite " xmlFile " xmlParser " model[j] " " value[j] " " xmlFile
      # print command
      system(command)
    }
    testCommand = "grep FilePath " xmlFile;
    testCommand |getline filePathLine
    close(testCommand)
    seed = 1
    # run 100 replications of the simulation with the current settings and a fixed seed
    javaCommand = "java -cp dist/CrowdSimulation.jar app.RVOModel -repeat 100 -time 100 -seed " seed
    # print javaCommand
    system(javaCommand)
    # advance the counter: increment the last parameter's index and carry towards the first
    for(j=NR;j>=1;j--){
       if(startingPoint[j]+indices[j]==startingPoint[j+1]){
          indices[j]=0
          indices[j-1]++
       }else {
          if(j==NR){
             indices[j]++
          }
       }
    }
 }
}' $1
echo $1 $2 "run complete"|mail -s "Run Complete" vaisaghvt@gmail.com

For anyone with a little experience in shell scripting, this might seem like crap, so if you are bored enough to go through it and you know some shell scripting, please do give me any suggestions that you have. That was the code for my first project. In my second project, I’ve changed my approach to having a separate class for each experiment. Also, initially I did the work of connecting to each cluster node and initializing the job manually; now I’ve automated this too. So I specify the experiment and settings to be run, the script dispatches the jobs to the specified set of cluster nodes, and, as above, I get emailed at the end when the data is available.

#!/bin/bash
opath=$PATH
PATH=/bin:/usr/bin

case $# in
  0) echo 'Usage runExperiment classToBeRun' 1>&2; exit 1
esac
program=$1

# each entry pairs a cluster node with the argument passed to its run
for cluster in "c0-0 0" "c0-1 20" "c0-2 40" "c0-3 60" "c0-4 80" "c0-5 100"
   do
      set -- $cluster
      ssh $1 "nohup ./runCommunication.sh $program $2 2> 2_$2.log 1> 2_$2_1.log < /dev/null &"
      echo "assigned to $1"
   done

SSHing to a remote machine and running the command with nohup were the two most difficult parts of this. nohup lets the process keep running even after you disconnect from the machine. The & at the end makes the process run in the background so that you can disconnect and connect to the next machine or do other things. The output is redirected to log files so that I can keep track of what is happening. And finally, something that took me a long time to figure out: you should set input to be read from /dev/null, otherwise you will not be able to disconnect from that particular remote machine.

#!/bin/bash
opath=$PATH
PATH=/bin:/usr/bin

case $# in
  0|1) echo 'Usage runSimulation classFile parameter' 1>&2; exit 1
esac
java -cp IBEVAC.jar $1 $2

echo "$1 $2 run complete"|mail -s "Run Complete" vaisaghvt@gmail.com

There’s still a lot more automation I can and plan to do. But as of now, I’m in a state where I can run simulations quite easily, and I won’t be changing things much for some time. Next stop: getting a proper gitflow happening with NetBeans or Eclipse.