Hwang spent the summer working for the Natural Language Text Processing Lab (NLP) and the Data Science Institute (DSI) on a joint project, doing research on gang violence in Chicago.
What was the topic/central focus of your research project?
I used the DSI’s Deep Neural Inspector to evaluate an NLP model that classified Tweets from gang-related users.
What were your findings?
Through my research, I found that the DNI reported higher correlation between hypothesis functions and neuron/layer output in trained models than random models, which confirms that the models learn how to classify the data input.
The aggression model showed interesting correlation with activation hypotheses, and the same with the loss model with imagery, which implies that aggressive speech tends to be very active (intense) and that text containing loss tend to use language that is concrete rather than abstract. If I had more time to continue this research, I would love to explore different types and sentiments in text and how that would affect how well a model learns its task.
What about the project did you find interesting?
The most interesting part of my research was seeing how interconnected all of these disciplines are. I split most of my time between the Natural Language Processing Lab and the Data Science Institute, but I also had the chance to meet some great people from the School of Social Work–their work on gang-related speech is part of an even bigger project to predict, and later prevent, violence based on social media data.
How did you get involved in/ choose this project?
I’ve been working at the NLP Lab since freshman year and decided to continue working there over the summer. In my opinion research is one of the best ways to develop your skillset and ask questions to people already established in the same field. I knew I wanted to pursue research even before I decided to major in computer science, and I feel so grateful to be included in a lab that combines so many of my interests and develops technology that matters.
How much time did it take and who did you work with?
The project was for three months and I worked with CS faculty – Professor Kathy McKeown and Professor Eugene Wu.
Which CS classes were most helpful in putting this project together?
Python, Data Structures
What were some obstacles you faced in working on this project?
I had just finished my sophomore year when I tackled this project, which means that the most advanced class I had taken at that point was Advanced Programming. I spent a lot of time just learning: figuring out how machine learning models work, reading a natural language processing textbook, and even conducting a literature review on violence, social media, and Chicago gangs just so I could familiarize myself with the dataset. I felt that I had to absorb an enormous amount of information all at once, which was intimidating, but I was surrounded by people with infinite patience for all of my questions.
What were some positives of this project?
Through this project, I really started to appreciate how accessible computer science is. Half of the answers we need are already out on the internet. The other half is exactly why we need research. I can learn an entire CS language for free in a matter of days thanks to all of these online resources, but it takes a bit more effort to answer the questions I am interested in: what makes text persuasive? What’s a fair way of summarizing emotional multi-document texts?
Can you discuss your experience presenting?
Along with the Columbia Summer Symposium, I have presented my research at the Harvard National Collegiate Research Conference and the Stanford Research Conference.
Do you plan to present this research at any other events/conferences?
Yes, but I have yet to hear if I have been accepted.
What do you plan to do with your CS undergraduate degree?
Not sure yet but definitely something in the natural language understanding/software engineering space.
Do you see yourself pursuing research after graduation?
Yes! I loved working on a project that mattered and added good to the world beyond just technology. I also loved presenting my research because it inspired me to think beyond my project: what more can we do, how can others use this research, and how can we keep thinking bigger?