Patrick Williams


Project title: Visualization of Deep Learning Algorithms for Vandal Detection

Blog:

July 21

I had a lot of great experiences this summer with my friends and in the lab. I think my favorite experience was when we went to the Whitewater center for the day. It was nice to hang out with the other Computer Science REU students, and although I hang out with them a lot during the week, having organized events just for us really meant a lot to me and helped the whole group bond more. I felt that that was a big upside to this REU; all of the students have been very close while I’ve been here. A lot of students in other REUs have commented on how jealous they are that the Computer Science students always hang out together and are so close.

In the lab, getting to work with VR in my first week was probably the most fun. While it wasn’t directly related to my research, it was an experience I had never had before, and I was really excited to play around with an Oculus Rift for a while.

The hardest thing I’ve done this summer was working with, and adding to, code that I did not write. Understanding complex neural network code and writing additions to it is difficult on its own, and the code I worked with had no comments. Because my advisor has been gone for most of the REU period, working on this code without being able to get help in person has posed a challenge. However, I still value the challenging experience and all that I’ve learned while working in the lab.

July 14

I had to do a lot of dissecting code this week, because I had to figure out the output of the neural network I am working with as data is being processed through it. I tried to get the program to print out the data at certain intervals, but the information was stored in tensors and could not be printed directly in Python. I decided to use the debugger instead so I could see all of the variables at once and follow them through each line of the code to watch for changes. From the debugger I was able to see the different layers of the model and the variables contained in them, including tensors for the output and other relevant information.
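
As a rough illustration of what following data through the layers means, here is a minimal sketch in plain Python. It is not the lab's Keras code: the tiny network, its weights, and the names `dense` and `forward_with_trace` are all made up for the example. The idea is simply to keep every layer's output instead of only the final one.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer with a sigmoid activation."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


def forward_with_trace(inputs, layers):
    """Run inputs through each layer and keep every layer's output."""
    trace = []
    current = inputs
    for name, weights, biases in layers:
        current = dense(current, weights, biases)
        trace.append((name, current))
    return trace


# Hypothetical two-layer network; the weights are invented for illustration.
layers = [
    ("hidden", [[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),
    ("output", [[1.0, -1.0]], [0.0]),
]

trace = forward_with_trace([0.0, 0.0], layers)
for name, values in trace:
    print(name, values)
```

A real framework stores these values in its own tensor objects, which is why a debugger or a framework-specific call is usually needed to look at them.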

I could tell that what I found was important, but I didn’t understand the model well enough to know all of the specifics or how to move forward from there, so I emailed Yuemeng and Dr. Lu with my findings and asked for guidance. Yuemeng only directed me to Dr. Lu, who said to email Dr. Wu’s students to find out more. Yuemeng told me that he has a deadline and cannot meet this week, so I was unable to meet with him in person to discuss things.

Earlier in the internship I was excited to take on the challenge of working individually, with my advisor out of the country and the student she asked to help me in person working on a Ph.D. dissertation, but in the end I feel that I have not accomplished as much as I wanted to so far. With Yuemeng only able to communicate with me via email because he is out of the lab, a simple but significant question that blocks my way, one that would normally be answered in minutes when people are in the lab, instead takes hours to get answered. With the seminars over the past few weeks discussing how to present our research, write abstracts, and make posters to show our work, I am getting concerned that I do not have enough work done compared to my peers. I feel a lot of pressure to get work done, but there are many points where I cannot proceed because I need someone with a deeper understanding to help me, and waiting for a response to an email can eat up a lot of time. I think that I will be satisfied with my final product when I complete the poster, but lately I feel that there has been a lot of time where I have been unproductive. Maybe over the next week I can focus on finding more things to do or work on to stay occupied.

July 7

Dr. Lu asked me this week to change direction on my work. Instead of working on visualizing the data from the Keras code, I am now trying to extract information from the model on the individual nodes and layers that make up the neural network. This past week I met with Yuemeng to make sense of this, because I have only been working with TensorFlow and Keras code, which are easy to find documentation and tutorials for. However, I was concerned at first because, looking at the code, I could not find any information about the model of the neural network, much less any information about individual nodes or layers.

After meeting with Yuemeng I felt a lot better, because he was able to show me where in the code he thinks this information is stored. Now all I have to do is try some print statements to check the data being processed. I am glad to have someone like Yuemeng to ask questions, because he has helped me think of things I could not while working in the lab alone. He told me that his dissertation draft should be finished soon, which means he can spend more time with me one on one in the lab. This makes me happy, because I feel that my work has been progressing very slowly on my own. I expect the next week to be very productive now that I have a much more definite direction to move in.

June 30

Earlier on in my research, one of my problems was installing TensorFlow on the computer I work on in the lab. I made a lot of mistakes installing it because there were so many dependent packages that also needed to be installed and files that needed to be in specific places for TensorFlow to work properly. Because of these mistakes I had to troubleshoot a lot of installation problems, and I was able to familiarize myself with the command line commands for installing different packages. Last week when I met with Yuemeng we worked on installing some of the Keras dependent packages and we had a lot of trouble. But I was able to help figure things out because I had worked through a similar problem in the past. The mistakes I made earlier led me to learn more about how to handle those kinds of problems and have prepared me for these kinds of situations in the future.

This week I was happy to make a lot of progress with the code I was given, but I am still struggling to visualize its output. I think this is mostly because the code I am currently working with uses a package called Keras, while I want to use TensorBoard to visualize the data. Because Keras uses TensorFlow as a backend, there are a lot of ways that they interact, so I am trying to investigate this to see if I can use the tools I already have to visualize the data after processing it with the Keras code. I am unhappy with how slowly my work has been progressing, so I will focus much more on asking others for help in the coming weeks, considering the lack of one-on-one meetings I have in my lab. Although I enjoy trying to solve these problems on my own, I have a deadline and it will be much easier to at least get input from others around the lab or from the department. I expect next week to be even more productive!

June 23

This week I was given a couple of programs written by another student that perform all of the calculations I need, but using different packages from the one I was asked to work with (TensorFlow). I have been keeping in contact with Dr. Lu and the PhD student who is helping me, Yuemeng, about where to go from here. Progress this week has been very slow, because a lot of my time has been spent working with the new code as much as I can to understand how it works and how I might begin to write new code using TensorFlow, while waiting to meet with Yuemeng so I can ask questions.

I met with Yuemeng on Friday and we spent several hours discussing the code. We decided that if I can get the code I was given to run properly, then there is no need to write any new code. This would be great for me because I have been very concerned about having to write such high-level code on my own, especially when the goal of my research is to visualize the results of the code, and writing my own code that worked properly could take a lot of time.

However, I think this past week has shown how difficult it can sometimes be to work alone in the lab with little hands-on support. When I get stuck on something the best I can do is text Dr. Lu or Yuemeng and wait for them to respond, so when I hit a roadblock and can’t get around it I always need to have something else to work on. But after meeting with Yuemeng I feel that I understand the code a lot better and I know exactly what I need to do over the next few days. I’m really excited that my research is starting to take a more definite direction.

June 16

This week I have started to work with TensorFlow and to process the Wikipedia data. There are about 800,000 edits that have to be processed, so the data will take a long time to work with. A major challenge that I have faced is that TensorFlow is a complicated library to work with, and so far I have spent a lot of time reading tutorials on the TensorFlow website to better understand how to write and run TensorFlow programs and not a lot of time writing code. I have had to troubleshoot the code I have been working with a lot, and although TensorFlow is a library in Python, which I am very familiar with, it feels like a completely different language. Hopefully this is a challenge that will not last for very long. I expect that as I continue to work more in-depth with the code I will become a lot more comfortable with creating programs to process the data.

Another challenge for me is that my advisor, Dr. Lu, is traveling for the rest of the summer and will not be available to meet with me in person. I have been working with other graduate students in the Visualization Lab on my project and they have been very helpful, but overall I have been working in a much more independent manner than some of my friends in other REUs. This is a welcome challenge, but I am worried about running into major roadblocks further down the road, especially considering that I am still learning the ins and outs of the code I will be writing and working with.

Because my work deals with visualizing the data being processed at each node in a neural network, I was excited to read about and start working with TensorBoard, a built-in visualization framework for TensorFlow programs. However, after I had done some work on it and wrote about it to Dr. Lu, she asked me to change direction and work on reading data from spreadsheet files into a TensorFlow program, and to worry about visualizing the data later, because TensorBoard would likely not be able to handle the massive amount of data I need to process. This was a little disappointing because I put a couple of days into working with TensorBoard and reading tutorials on it, and now I have to move on to a more difficult part of the project. However, I think that my work with TensorBoard will be beneficial in the end, because it has given me more experience with the way that variables in TensorFlow interact with each other, which will help as I continue to work with TensorFlow.
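
To give a sense of what reading spreadsheet data into a program can look like when the file is large, here is a minimal sketch using only Python's standard `csv` module. It assumes the spreadsheet has been exported to CSV; the column names (`user`, `edits`, `reverts`), the sample rows, and the helper `read_in_batches` are invented for illustration and are not the actual Wikipedia data or the project's code. Reading in batches means the whole file never has to sit in memory at once.

```python
import csv
import io


def read_in_batches(handle, batch_size):
    """Yield lists of rows (as dicts) with at most batch_size rows each."""
    reader = csv.DictReader(handle)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit whatever is left over after the last full batch.
        yield batch


# Hypothetical sample standing in for a CSV export of the spreadsheet.
sample = io.StringIO(
    "user,edits,reverts\n"
    "alice,10,0\n"
    "bob,3,2\n"
    "carol,7,1\n"
)

batches = list(read_in_batches(sample, batch_size=2))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

With a real file, `sample` would be replaced by an open file handle, and each batch could be converted to numbers before being passed to the model.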

June 9

This week I started to work on my main project for this summer, which involves using deep learning to train a model to tell whether a Wikipedia user is a vandal based on statistics about their editing habits. While this has been done before, the more specific goal of my research right now is to closely examine the inner workings of the model node by node, to determine what is happening at each level. The program to do this analysis will be written in Python using TensorFlow, a machine learning library made by Google. Over this week I installed TensorFlow on the server in the Visualization Lab (which took a lot longer than expected because of troubleshooting) and began reading the tutorials. After reading for a while I decided to start following the tutorials by copying the code in Python and following along to see for myself how TensorFlow works. As I work more with TensorFlow I am beginning to gain a much better understanding of machine learning, but working on a project like this is still a little overwhelming. I guess it’s just my inexperience with this topic and the fact that I am still starting my research.

One of my other jobs this week was to transfer around 500 GB of data for another project from one computer to another, and I had to do this by physically removing the hard drive from the first computer and plugging it into the second one to copy the files. It was a very fun experience because it moved me outside my comfort zone. It was also an unexpected job, since I came into this thinking I would be working on software and programming, not with the physical components of the computers.

The biggest challenge I’ve encountered, and will probably encounter this summer, is that my advisor will be gone traveling after this week, so I will have to email her to communicate my work and questions. I will also have to rely on the other graduate students who work on visualization projects for more immediate help. While it’s a little inconvenient, it just means that I will have to be a little more independent in my work, which will be exciting as I develop my research further over the coming weeks.

June 2

This week I began meeting with Dr. Lu to discuss our plans for the summer. She introduced me to the projects currently underway in the Visualization Lab, including a project on visualizing biodiversity in the Smoky Mountains and some projects with VR and AR. I was allowed to try out an Oculus Rift, which was really fun, because I had never had any experience with VR before. I may work some on the VR projects if the work in the lab starts to gravitate more towards that field. Dr. Lu was very open-minded and wanted to know about my personal interests in Computer Science. When I started talking about my interest in AI and machine learning, she discussed the many aspects of deep learning that are in the projects currently taking place in the lab. So for now it seems that my focus will be on deep learning and visualization in the lab. In fact, Dr. Lu has already notified me of a conference in Arizona that is accepting 2-4 page papers on deep learning and visualization. That would be a very good opportunity for me if I am able to do enough work in that field this summer, and with the paper being only 2-4 pages, I think I will be able to write that much if I can get a small project started.

In the first few days in the lab I have done a lot of reading on deep learning and visualization, including some technical papers and other sources, such as the Google Research Blog, which has many articles on their research into machine learning. We are also working on moving the data for the Biodiversity project onto a new computer, but that may take some time due to technical difficulties with the computer the data is stored on.

Because a lot of the programs and frameworks I will be working with are new to me, I am spending a lot of time trying to set them up and reading about how to operate them. It has been a long time since I have worked this hands-on with software and programming, so I am looking forward to working with these applications in a new field.
