Personal Project Info
To aid the formation of groups, please post a message to the "Project Discussions" topic in courseworks with the following information:
- Academic program/profession
- Research interests for project
- Relevant background and experience
Keep the following in mind when forming groups:
- Each group should contain 3-4 people, including at least 1 computational person and 1 biology person
- Single-person groups are strongly discouraged -- there is too much work, and you will miss out on the benefits (and challenges) of cross-disciplinary collaboration!
- Group members are responsible for dividing up the work equally and making sure that everyone contributes; if there are serious problems, you can discuss them with me, but I will not police you or ask you to report on your groupmates!
The group project proposal is due Monday, March 29. Please submit a text/pdf/ps version of your proposal via courseworks, and include the names and emails of all group members in the proposal. You should describe the biological problem, the methods, and, if possible, a potential dataset.
The purpose of the project is to have you accomplish the following:
- Implement or use one or more machine learning algorithms or techniques
- Use these techniques to look at real biological data
- Evaluate the performance of your method and attempt to draw conclusions from your results about the biological problem you are considering and/or about the appropriateness of your method for the problem
Within these three requirements, there is a lot of freedom to lean towards more computational or more biological projects: some groups will use standard machine learning techniques to explore a particular biological problem; some groups will evaluate a newer or original machine learning algorithm on a standard biological benchmark dataset. Of course, the most innovative projects would try to specialize or develop machine learning techniques suitable for the particular biological problem at hand.
There should be some original aspect to your project. For example, you should not simply reproduce the results of a published paper, using the same dataset and publicly available software -- this is more like a homework exercise. However, you can build on ideas in the literature in the following ways:
- You can use methods described in research papers and publicly available software (please credit the authors) to analyze new datasets or explore different biological problems
- You can extend the machine learning techniques in a paper to try to improve results
- In certain cases, a paper might use very sophisticated software that is not publicly available -- here, it would be an interesting and worthwhile project to write your own prototype of the machine learning software and see if you can reproduce the results in the paper with your code
Project Report Format
You should aim for a short, conference-style paper:
- About 10 pages of text
- Many charts, graphs, etc. showing and evaluating the results of your computational experiments
- List of references
Your entire report (text and figures) should be put online in HTML, PostScript, or PDF format. I'll collect the URLs on the due date and create a class project page with links to all the group project pages.
Here is a suggested general format for your report:
Your introduction should describe the biological problem, quickly summarize previous work in the literature, outline your approach, and end with a statement of your results -- what you learned from your experiments.
You may be able to publish your paper, or your work may be a first step towards a publishable paper -- if you are interested, I can give suggestions on how to get your work in shape for publication and where to submit.