Scaling the Minecraft Game for Studying Human Behavior

I’ve created a game in Minecraft for testing human exploration in indoor environments. It’s available to play at http://www.vaisaghvt.com/minecraft-experiment/. Please give it a try if you have a Minecraft account and want to play Minecraft for Science. Or, if you’re the competitive sort, try to get on the leaderboard.


It’s been more than a year since my last post, on how I created the game. Since then, I’ve collected some very basic results from students at NTU playing it. However, it was surprisingly hard to find participants, and I had to host the game online to get enough data. It took a while to get funding approved, but I’ve finally managed to host my experiment on Amazon. In this post, I’ll describe what I had to do since my last post to get this to work.

What I’d already done:

A short version of what I explained in my last post: I’d created an adventure of sorts in Minecraft where players spawn at one location, follow instructions to explore a three-storey building, and then complete a few tasks. On (manually) disconnecting from the server, all the data on the player’s actions is written to a MySQL server. I analyse this data in the hope of finding a cure for cancer. And this is what I had to do to scale it up:

Step 1: Shutting down the server automatically

I can’t believe how long it took me to do this, but it ended up being quite simple: my modified Statistician plugin just sends a shutdown command to the server once the player completes the last task.

Step 2: Starting on request

Next, I used Python’s brilliant Twisted framework to write a simple server program that listens for connections and starts a Tekkit server when a request comes in. To test things out, I also created a simple client-side script that just sends the username. The database is queried for existing attempts, and a unique ID consisting of the username and attempt number is created to identify the session’s data in the database.
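Here’s a minimal sketch of the idea, not my actual script: a Twisted protocol that reads the username off the connection and spawns the Tekkit server with subprocess (the port number and start command here are placeholders).

import subprocess

from twisted.internet import protocol, reactor

class StartRequest(protocol.Protocol):
    def dataReceived(self, data):
        username = data.strip()
        # the real script queries the database for existing attempts here
        # and builds a unique id from the username and attempt number
        subprocess.Popen(["./tekkit.sh", "start"])  # placeholder start command
        self.transport.write("starting server for %s\n" % username)
        self.transport.loseConnection()

class StartRequestFactory(protocol.Factory):
    def buildProtocol(self, addr):
        return StartRequest()

reactor.listenTCP(8000, StartRequestFactory())  # placeholder port
reactor.run()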

To start and stop the server, I used this handy little script: http://pastebin.com/Lgs4r5f8 .

Step 3: Hosting the server and scripts on AWS

Now that I had this simple setup working, I got an Amazon EC2 machine and a MySQL database and copied all my server files and scripts over. Instructions for this are easy to find by googling and quite straightforward to follow.

If anyone’s trying to copy a Tekkit server from a Windows machine to a Linux one, remember that you have to download the Tekkit server separately on the new machine and copy over just the world files (as opposed to trying to copy the whole folder).

Step 4: Final Touches

Now that I had most of this working, I added a new page, and with some simple Python and JavaScript and a bit of effort beautifying things, I was able to send play requests from my website to the EC2 server and play.

I also made it a whitelisted server, with the whitelist updated when a person signs up. Finally, I put a timeout on the server that shuts down the spawned Tekkit server after an hour, so that the experiment doesn’t stall just because someone started the server and didn’t play.
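The timeout is the simplest part. A sketch of the watchdog, assuming the Tekkit server was spawned with subprocess.Popen as in the earlier sketch:

import subprocess

from twisted.internet import reactor

TIMEOUT_SECONDS = 60 * 60  # shut the spawned server down after an hour

def start_with_timeout(command):
    server = subprocess.Popen(command)

    def stop_if_still_running():
        if server.poll() is None:  # the player never finished
            server.terminate()

    reactor.callLater(TIMEOUT_SECONDS, stop_if_still_running)
    return server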

The end results of all this are there for all to see at http://www.vaisaghvt.com/minecraft-experiment/.


Final Comments

I didn’t go into as much detail as I wanted, simply because I don’t have the time to write everything up. I’ll share all the scripts I wrote on my GitHub page once I get time to clean them up a bit. In the meantime, if anyone’s curious about any part of the process, do ask.

Also, if you play Minecraft, have a Minecraft account, or know anyone who does, please go to http://www.vaisaghvt.com/minecraft-experiment/ and play Minecraft for science 🙂

Each Tekkit server, with all the plugins I have, requires about 3 GB of memory, so it’s expensive to run more than the one EC2 instance I have now. Building a proper queueing system that emails people when the server is available would take a lot of effort, so for now only one player can play at a time; anyone who sends a request while someone else is playing will simply have to wait and check back later.

I’m working on a paper summarizing the results of my analysis; I’ll share that too once it’s done.


A Sublime Text plugin for the careless

So after close to a year of procrastinating, I’ve finally turned the code I wrote about back in June last year into a plugin for Sublime Text 2. The plugin checks for a couple of mistakes commonly made in LaTeX by me and, hopefully, by other people who forget to title-case titles and put spaces after punctuation. On running the command, the plugin highlights each mistake and suggests a replacement, which the user can accept or reject with a single key press. If you’re actually reading this post, do feel free to check it out at https://github.com/vaisaghvt/CheckTypos .

The process of making it into a plugin was quite straightforward, since my original script was in Python. With some help from the excellent NetTuts tutorial, some of the existing plugins, and obviously the API docs, I had the plugin working. About 10 lines of JSON later, I had an entry in the Tools menu, key mappings for Mac, Linux and Windows, and a command in the command palette. Seriously, I’ve no idea why I waited so long.
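For anyone curious, this is roughly the shape of a Sublime Text 2 command; a bare-bones sketch, not the actual CheckTypos code:

import sublime
import sublime_plugin

class HighlightDoubleSpacesCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        # find the first run of two or more spaces in the buffer
        region = self.view.find("  +", 0)
        if region is None or region.empty():
            sublime.status_message("No double spaces found")
            return
        self.view.sel().clear()
        self.view.sel().add(region)  # select the mistake
        self.view.show(region)       # and scroll it into view

Saved under Packages/User, a class like this is picked up automatically and becomes available as the highlight_double_spaces command.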


P.S. It’s now available in Package Control as the CheckTypos plugin. Love how that thing works.

Regex + Python to clean up my writing

After shifting to LaTeX (with Sublime Text) for writing, one of the things I’ve found rather irritating is correcting the silly mistakes I keep making: things like accidentally putting two adjacent spaces, repeating phrases, and forgetting to capitalize letters in the right places. Word used to make things easier with its spell check. Sublime Text has a dictionary, but it only catches spelling mistakes, and those aren’t always the problem. Checking the PDF and combing through it for mistakes was obviously quite irritating.

Initial Solution

It was around this time that I made a list of standard regexes that I could search for and replace using Sublime Text’s built-in search. I put these in my Sublime Text LaTeX cheatsheet for easy access: I’d copy them from the cheatsheet, paste them into the search bar, and fix each error as I saw it. Obviously quite time-consuming. I intended to automate this with some sort of script but never got around to it; with all the other checks I had to do, like whether the text appeared in a comment or an equation block, I had no clue how to manage it in a simple bash script.

Python to the Rescue

It was around then that I read about someone using regexes in Python, and I realised this would be an interesting way to improve my limited Python skills while doing something useful. So I set about writing a Python script that checks for the common mistakes I make (the regexes in my cheatsheet), suggests appropriate replacements, and updates the file.

Adding a new regex pattern, and the way it has to be replaced, is as simple as writing a small function and adding a line to the list of patterns to be tested. I still need to do some basic testing and add more patterns, but I’ve already put the code up on GitHub (link) and would be extremely grateful to anyone who checks it out and gives suggestions. I will update the readme and comment my code very soon (seriously, I will).
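The core of it looks something like this: a trimmed-down sketch with two made-up patterns (the real list is in the repo):

import re

def fix_double_spaces(text):
    # collapse runs of spaces into a single space
    return re.sub(r" {2,}", " ", text)

def fix_space_after_comma(text):
    # make sure every comma is followed by a space
    return re.sub(r",(?=\S)", ", ", text)

# functions are first-class objects in Python, so the checks are just a
# list of (description, function) pairs; a new check is one line here
CHECKS = [
    ("double spaces", fix_double_spaces),
    ("missing space after comma", fix_space_after_comma),
]

def clean(text):
    for description, fix in CHECKS:
        text = fix(text)
    return text

print(clean("Hello,world.  This line has  problems."))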

Finally, using functions as first-class objects in Python for the first time was super interesting and super useful. It also gives a hint of one of Java’s biggest limitations; see Steve Yegge’s brilliant rant on Java and functional programming: http://steve-yegge.blogspot.sg/2006/03/execution-in-kingdom-of-nouns.html.

Running simulations and analysing data

My first post in a long time. This is more of a journal entry for me to look back at when I need to.

My PhD project mostly involves running simulations of hundreds of people evacuating from a building and then analysing the results in various ways. While the MASON framework in Java helps a lot with implementing the model itself, something just as interesting, and something that in the end feels a lot cooler, is running all those simulations, gathering the data, and analysing it.

Step 1: Running multiple simulations

MASON allows you to run simulations in two major ways. The first is using the GUI, where you get to see how the simulation is going; this mode is very useful, and essential, when creating and debugging the model. However, when it comes to actually running simulations and gathering data for analysis, it is quite obviously impractical. This is where console mode comes in handy: you run several replications of the required simulation with the required seed. Initially I used the handy built-in function to do this.

I also needed to store the simulation-specific settings somewhere. At first I did this using constants scattered across various classes; I then consolidated all the constants into one class, which was a lot more convenient to change, and finally I switched to a much more practical XML file, which can easily be read in Java using JAXB (though I might change to an SQL-based implementation soon). Anyway, the point is, I can run my simulation using its jar file and an XML file holding all the parameters for the simulation.
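My parameter handling is in Java with JAXB, but the idea is simple enough to sketch in a few lines of Python; the XML layout and parameter name here are made up, while the jar and class names come from my actual run command (which appears in the script below):

import subprocess
import xml.etree.ElementTree as ET

# tweak one parameter in the settings file (hypothetical layout)
tree = ET.parse("parameters.xml")
tree.getroot().find("agentCount").text = "500"
tree.write("parameters.xml")

# run the simulation jar against the updated settings
subprocess.call(["java", "-cp", "dist/CrowdSimulation.jar", "app.RVOModel",
                 "-repeat", "100", "-time", "100", "-seed", "1"])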

Step 2: Storing data

The next step in this process is collecting data from these simulations. To get started, I stored my initial output as simple text files in CSV format, which I analysed in Excel. Pretty soon this became extremely impractical because of the amount of data I had to store, so I changed to a binary format and created a parser to convert the generated binary files to text. I could have used some of Java’s built-in analysis tools, like those provided by the Apache framework, but I was quite lazy, and I was working with someone who wanted text files so that he could analyse them in Matlab, so I settled on binary files with a parser to convert them to text.
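My parser was in Java, but the binary-format-plus-parser idea fits in a few lines of Python; the record layout here is made up:

import struct

# one record per event: an agent id (int) and a timestamp (double)
RECORD = struct.Struct("<id")

with open("run_001.bin", "wb") as out:  # the simulation writes binary...
    out.write(RECORD.pack(1, 0.5))
    out.write(RECORD.pack(2, 0.75))

with open("run_001.bin", "rb") as source:  # ...and the parser emits CSV
    data = source.read()
for agent_id, timestamp in RECORD.iter_unpack(data):
    print("%d,%f" % (agent_id, timestamp))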

However, despite the organised file hierarchy and names, this was still very difficult to analyse and keep organised, and it was still huge. There were also a lot of complications when writing from multiple runs, experiments, etc. So I switched to what I should have used from the start: a relational database. I set up a MySQL server instance on my lab computer and wrote all the required data to the database at the end of each run of the simulation.

Step 3: Analysing the data

Excel being boring, I shall not go into the details of how I did things initially. Once I got the data into MySQL, I needed a tool to analyse it; that’s when my prof recommended matplotlib in Python. I’d used Python before for a simple script to clean up references in a text file, but I’d hardly used it for anything else, even though I liked the language a lot. So I decided to give it a try. Interestingly, I had a lot of trouble finding a free MySQL library, but once I found MySQLdb, querying the data and getting some neat graphs took hardly a few lines of code. Now that I had the data, I could simply run the Python script and get all the charts I needed.
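It really is just a few lines. A sketch with hypothetical table and column names:

import MySQLdb
import matplotlib.pyplot as plt

connection = MySQLdb.connect(host="localhost", user="sim",
                             passwd="secret", db="simulations")
cursor = connection.cursor()

# pull the evacuation time of every run for one experimental setting
cursor.execute("SELECT evacuation_time FROM results WHERE setting = %s",
               ("baseline",))
times = [row[0] for row in cursor.fetchall()]

plt.hist(times, bins=20)
plt.xlabel("Evacuation time (s)")
plt.ylabel("Number of runs")
plt.show()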

Step 4: The power of the cloud

A single run of my simulation can take up to 5 minutes. For 100 replications under each of 6 different settings (what I needed for that particular run), that works out to about 3000 minutes, or 50 hours, or just over two days. While not terrible, I needed my computer, and I worked at the Parallel and Distributed Computing Center, so it would have been a waste not to use all that computing power at our disposal. So I got an account on the cluster and created a simple shell script to run the simulation with fixed settings. Eventually I extended it to read parameters from a separate text file, modify the XML file appropriately, run the simulation the required number of times and, at the end of the run, send me an email. Here is that first script:

#!/bin/bash
# runSimulations: runs simulations for every combination of inputs in the settings file

opath=$PATH
PATH=/bin:/usr/bin

case $# in
  0|1) echo 'Usage runSimulations settingsFile xmlFile' 1>&2; exit 1
esac

# for every combination of parameter values in the settings file,
# rewrite the XML file and run the simulation
awk -v xmlFile="$2" '
BEGIN {totalCount=1
startingPoint[1]=1}
# each settings line: a parameter name followed by its candidate values
{
  model[NR] = $1
  startingPoint[NR+1] = startingPoint[NR]+NF-1
  for(i=2;i<=NF;i++){
    completeValuesList[totalCount] = $i
    totalCount++
  }
}
END {
  startingPoint[NR+1] = totalCount-1
  for (j=0; j<=NR; j++){
    indices[j] = 0
  }

  # step through every combination of parameter values, odometer style
  while(indices[0]!=1){
    timeNeeded=0

    # write the current value of each parameter into the XML file
    for(j=1;j<=NR;j++){
      value[j] = completeValuesList[startingPoint[j]+indices[j]]
      command = "overwrite " xmlFile " xmlParser " model[j] " " value[j] " " xmlFile
      # print command
      system(command)
    }
    testCommand = "grep FilePath " xmlFile;
    testCommand |getline filePathLine
    close(testCommand)
    seed = 1
    # run 100 replications of the simulation against the updated XML file
    javaCommand = "java -cp dist/CrowdSimulation.jar app.RVOModel -repeat 100 -time 100 -seed " seed
    # print javaCommand
    system(javaCommand)
    # advance the indices, carrying to the left when a parameter runs out of values
    for(j=NR;j>=1;j--){
       if(startingPoint[j]+indices[j]==startingPoint[j+1]){
          indices[j]=0
          indices[j-1]++
       }else {
          if(j==NR){
             indices[j]++
          }
       }
    }
 }
}' "$1"
echo "$1 $2 run complete" | mail -s "Run Complete" vaisaghvt@gmail.com

To anyone with a little experience in shell scripting this might look like crap, so if you’re bored enough to go through it and you know some shell scripting, please do send me any suggestions you have. That was the code for my first project. In my second project, I’ve changed my approach to having a separate class for each experiment. Initially I also did the work of connecting to each cluster node and initializing the job manually; now I’ve automated that too. I specify the experiment and settings to be run, the script dispatches the jobs to the specified set of cluster nodes and, as above, I get emailed at the end when the data is available.

#!/bin/bash
opath=$PATH
PATH=/bin:/usr/bin

case $# in
  0) echo 'Usage runExperiment classToBeRun' 1>&2; exit 1
esac
program=$1

# each entry pairs a cluster node with the parameter passed to its run;
# set -- below splits the pair into $1 (node) and $2 (parameter)
for cluster in "c0-0 0" "c0-1 20" "c0-2 40" "c0-3 60" "c0-4 80" "c0-5 100"
   do
      set -- $cluster
      ssh $1 "nohup ./runCommunication.sh $program $2 2> 2_$2.log 1> 2_$2_1.log < /dev/null &"
      echo "assigned to $1"
   done

SSHing to a remote machine and running the command with nohup were the two most difficult parts of this. Nohup lets the process keep running even after you disconnect from the machine, and the & at the end runs it in the background so that you can disconnect and move on to the next machine or do other things. The output is redirected to log files so that I can keep track of what is happening. Finally, something that took me a long time to figure out: you have to redirect input from /dev/null, otherwise you won’t be able to disconnect from that particular remote machine.

#!/bin/bash
opath=$PATH
PATH=/bin:/usr/bin

case $# in
  0|1) echo 'Usage runSimulation classFile parameter' 1>&2; exit 1
esac
# run the given experiment class with its parameter
java -cp IBEVAC.jar $1 $2

echo "$1 $2 run complete"|mail -s "Run Complete" vaisaghvt@gmail.com

There’s still a lot more automation I can and plan to do, but as of now I’m at a point where I can run simulations quite easily, and I won’t be changing things much for some time. Next stop: getting a proper git flow happening with NetBeans or Eclipse.