This week I started working with TensorFlow and processing the Wikipedia data. There are about 800,000 edits to process, so the data will take a long time to work with. A major challenge is that TensorFlow is a complicated library, and so far I have spent a lot of time reading tutorials on the TensorFlow website to better understand how to write and run TensorFlow programs, and not much time writing code. I have had to troubleshoot the code I have been working with a lot, and although TensorFlow is a Python library and I am very familiar with Python, it feels like a completely different language. Hopefully this challenge will not last very long. I expect that as I work more in depth with the code, I will become much more comfortable writing programs to process the data.
Another challenge for me is that my advisor, Dr. Lu, is traveling for the rest of the summer and will not be available to meet with me in person. I have been working with other graduate students in the Visualization Lab on my project and they have been very helpful, but overall I have been working much more independently than some of my friends in REUs. This is a welcome challenge, but I am worried about running into major roadblocks further down the road, especially since I am still learning the ins and outs of the code I will be writing and working with.
Because my work deals with visualizing the data being processed at each node in a neural network, I was excited to read about and start working with TensorBoard, a built-in visualization framework for TensorFlow programs. However, after I had done some work on it and written about it to Dr. Lu, she asked me to change direction and work on reading data from spreadsheet files into a TensorFlow program, and to worry about visualizing the data later, because TensorBoard would likely not be able to handle the massive amount of data I need to process. This was a little disappointing because I had put a couple of days into working with TensorBoard and reading tutorials on it, and now I have to move on to a more difficult part of the project. However, I think my work with TensorBoard will be beneficial in the end, because it has given me more experience with how variables in TensorFlow interact with each other, which will help as I continue to work with TensorFlow.
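As a rough idea of what the first step of reading spreadsheet data might look like, here is a minimal sketch. It assumes the spreadsheet has been exported as CSV, and the column names (`num_edits`, `num_reverts`, `is_vandal`) and sample rows are made up for illustration; they are not the real Wikipedia data or the actual format of the project files.

```python
import csv
import io

# Hypothetical sample standing in for a spreadsheet exported as CSV.
# The column names and values are invented for illustration only.
SAMPLE_CSV = """user,num_edits,num_reverts,is_vandal
alice,120,3,0
bob,15,12,1
carol,48,0,0
"""

# Assumed layout: two numeric feature columns and one 0/1 label column.
FEATURE_COLUMNS = ["num_edits", "num_reverts"]
LABEL_COLUMN = "is_vandal"


def load_rows(csv_file):
    """Read a CSV file object into parallel lists of features and labels."""
    features, labels = [], []
    for row in csv.DictReader(csv_file):
        # Convert each feature cell from text to a float.
        features.append([float(row[name]) for name in FEATURE_COLUMNS])
        labels.append(int(row[LABEL_COLUMN]))
    return features, labels


# io.StringIO lets the sample text be read like an open file.
features, labels = load_rows(io.StringIO(SAMPLE_CSV))
print(features)
print(labels)
```

Lists like these could then be handed to TensorFlow, for example with `tf.constant` or `tf.data.Dataset.from_tensor_slices`. With something like 800,000 rows, reading the file in batches rather than all at once would probably be the better choice.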
This week I started to work on my main project for this summer, which involves using deep learning to train a model to easily tell if a Wikipedia user is a vandal based on statistics about their editing habits. While this has been done before, the more specific goal of my research right now is to closely examine the inner workings of the model node by node, to determine what is happening at each level. The program to do this analysis will be written in Python using TensorFlow, a Python library for machine learning made by Google. Over this week I installed TensorFlow on the server in the Visualization Lab (which took a lot longer than expected because of troubleshooting) and began reading the tutorials. After reading for a while I decided to start following the tutorials by copying the code into Python and running it to see for myself how TensorFlow works. As I work more with TensorFlow I am beginning to gain a much better understanding of machine learning, but working on a project like this is still a little overwhelming. I guess it’s just my inexperience with this topic and the fact that I am still just starting my research.
One of my other jobs this week was to transfer around 500 GB of data for another project from one computer to another. I had to do this by physically removing the hard drive from the first computer and plugging it into the second one to copy the files. It was a very fun experience because it moved me outside my comfort zone. It was also an unexpected job, since I came into this thinking about working on software and programming, not about working with the physical components of the computers.
The biggest challenge I’ve encountered, and will probably encounter this summer, is that my advisor will be traveling after this week, so I will have to email her to communicate my work and questions. I will also have to rely on the other graduate students who work on visualization projects for more immediate help. While it’s a little inconvenient, it just means that I will have to be a little more independent in my work, which will be exciting as I develop my research further over the coming weeks.
This week I began meeting with Dr. Lu to discuss our plans for the summer. She introduced me to the projects currently underway in the Visualization Lab, including a project on visualizing biodiversity in the Smoky Mountains and some projects with VR and AR. I was allowed to try out an Oculus Rift, which was really fun because I had never had any experience with VR before. I may work some on the VR projects if the work in the lab starts to gravitate more towards that field. Dr. Lu was very open-minded and wanted to know about my personal interests in Computer Science. When I started talking about my interest in AI and machine learning, she discussed the many aspects of deep learning that are in the projects currently taking place in the lab. So for now it seems that my focus will be on deep learning and visualization in the lab. In fact, Dr. Lu has already notified me of a conference in Arizona that is accepting 2-4 page papers on deep learning and visualization. That would be a very good opportunity for me if I am able to do enough work in that field this summer, and with the paper being only 2-4 pages, I think I will be able to write that much if I can get a small project started.
In the first few days in the lab I have done a lot of reading on deep learning and visualization, including some technical papers and other sources, such as the Google Research Blog, which has many articles on their research into machine learning. We are also working on moving the data for the Biodiversity project onto a new computer, but that may take some time due to technical difficulties with the computer the data is stored on.
Because a lot of the programs and frameworks I will be working with are new to me, I am spending a lot of time trying to set them up and reading about how to operate them. It has been a long time since I have worked this hands-on with software and programming, so I am looking forward to working with these applications in a new field.