COMS 4419: Internet Technology, Economics and Policy (Fall 2022)

Projects and Term Papers

The categories below are approximations - all projects and term papers may include quantitative analysis, interviews with experts, literature surveys and software development. In all cases, the relevant literature should be carefully considered and cited.

Any of the projects listed as part of the Datasets and Potential Research Questions 2013 would likely be suitable as a project.

Project Plan

Each team must submit a project plan outlining their goals for their project or term paper.

Progress Report

Each project team member should submit one updated progress report as a PDF file once every two weeks, on Thursday evenings starting two weeks after the project proposal, clearly indicating the work accomplished in those two weeks as well as any obstacles. There should be some verifiable signs of progress - "80% complete" is not helpful; "wrote functions to do X (approx. 200 LoC)" is better.

Project Presentation

The presentation should be targeted to last no more than 12 minutes, leaving 3 minutes for questions, for both one-person and group projects. Since talks are back-to-back, we will have to cut short talks that exceed their time allotment. For most speakers and slide styles, this translates to (at most) 7-8 slides, including the title slide. For group projects, you can either split the presentation or designate a single speaker. The former is preferred, to give everyone a chance to practice. Treat your talk as a "pitch talk", i.e., get the listener interested in your project. What problem did you tackle or what area did you investigate? Why is this interesting or important? What were your most surprising results? What approaches did not work well? Briefly, what would be the next steps? Please be sure to practice your talk so that you are sure of timing, content and hand-offs.

See talk hints; Writing technical articles also links to materials related to talks.

Project Final Report (= Research Paper)

Project reports are typically 3,000 to 5,000 words per project team member, i.e., 6,000 to 10,000 words for a two-person project.

Papers should be single-spaced, 11 or 12 pt font, and should conform to the recommendations on writing style and avoid common mistakes. You can include any extensive graphs or tables as appendices if needed. Use of the IEEE templates is strongly suggested. The structure in this guidance page should be followed, although it is somewhat less applicable for analysis and review projects, where a standard "term paper" format is called for. Please ignore the guidance about page limits and individual project reports - the guidance is from a different university.

Experiments and Implementation

Some, but not all, of the projects below require computer networking background, e.g., from a class like CSEE 4119.

Privacy for the Internet of Things or smart TVs:
What kind of data do Internet of Things devices, smart TVs or video devices (Roku, Amazon Fire, Google TV) exchange with the outside world? Who do they "talk" to? The project requires knowledge of Wireshark or scapy.
Measuring Internet port blocking:
Most consumer Internet services block some Internet ports, sometimes for historical reasons, sometimes for security reasons and sometimes for reasons that are less obvious. Develop a tool that allows a user to test which UDP and TCP ports can be used for both incoming and outgoing packets, and whether other IP features such as IP options and IPv6 are usable.
Video quality:
How do bandwidth and packet loss affect the video quality of streaming and interactive applications such as Zoom, WebRTC applications, YouTube, Netflix and Skype? Consider using a network emulator to reproduce various network conditions.
Finding Internet bottlenecks:
When Internet applications suffer from performance problems, it is often difficult to tell whether the problem is found in the home (Wi-Fi) network, in the first-hop access network (e.g., the shared cable network), the middle-mile network, the Internet backbone or at the server or CDN. Develop a tool for either a desktop or mobile OS to estimate where the performance problem is likely to be found.
Wi-Fi performance:
It is not uncommon that Wi-Fi is slower than LTE. Map the performance of Wi-Fi (e.g., the Columbia Wi-Fi network) vs. LTE in a geographic area, including indoors, e.g., using the FCC mobile measurement application.
Wi-Fi congestion:
Measure Wi-Fi spectrum usage in the 2.4 and 5.8 GHz bands in various locations in New York City (or wherever you may want to travel...), both indoors and outdoors. How many stations are visible on what frequency channels? Where are publicly accessible access points, such as "Cable Wi-Fi", visible and accessible? Measure the impairment due to interference, i.e., how much lower the throughput between a mobile device and the base station is compared to a "silent" radio environment.
Internet speed tests and broadband labels:
There are a large number of speed tests available, including from the FCC (Measuring Broadband America), Ookla (speedtest.net), and Google. For the same network, they often provide very different results. Compare these approaches and analyze how they measure speed and latency. Can you explain the differences? How do you compare proposals for broadband consumer labels?
Location measurements:
What is the reliability of handset-provided geographic location data? Build a tool that allows users to indicate their true location based on a map and compare it to the location provided by GPS, Wi-Fi or cellular tower data. Explore the reliability systematically, both outdoors and within a building.
Indoor positioning:
Can you determine the room (apartment, office, ...) you are in by comparing Wi-Fi "fingerprints"? Can you apply machine learning techniques to the task?
Altitude information:
Many modern smartphones have built-in altimeters based on barometric pressure. Altitude (elevation) information can be very useful to dispatch first responders after a 911 call. Conduct experiments that allow you to evaluate the accuracy of this data in various buildings.
Emergency assistance:
How can citizens be better integrated into emergency response activities, e.g., after large-scale natural disasters? Consider an app that allows citizens to volunteer, be vetted, and then be dispatched much like official first responders.
TTY replacement:
People who are Deaf or hard-of-hearing use text-based communication, either directly or via a relay service. The first text-based communication used TTYs, using analog modems. Architect and design a system that replaces the outdated TTY technology with an Internet-based system that can use either a dial-up modem or broadband, but still communicate with relay services (via IP) and existing analog TTYs (via a gateway).
Speech-to-text meeting summary:
Using recordings, or audio captured by citizen reporters attending in person, can we auto-transcribe local government or regulatory meetings and provide summaries, to augment reporting by local journalists (who may no longer be able to cover every borough or county meeting)?
Ad tracking and cookie permissions:
Many websites allow you to choose whether to accept cookies or select among categories of cookies. Determine which cookies are affected by these choices. Does the loading speed or data volume of the website change?
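
For the "Measuring Internet port blocking" project above, the outgoing TCP part of such a test could start from something like the sketch below. It is only an illustration under several assumptions: the helper name `probe_tcp_port` and the default timeout are made up for this example, the test needs a cooperating server (chosen by the tester) that is known to listen on every port being probed, and UDP, incoming connections, IP options and IPv6 are not covered.

```python
import socket


def probe_tcp_port(host, port, timeout=3.0):
    """Classify one outgoing TCP connection attempt (hypothetical helper).

    Returns:
      "open"     - the TCP handshake completed
      "refused"  - a reset came back (from the server or a middlebox)
      "filtered" - nothing came back before the timeout, which is what
                   silent port blocking typically looks like
      "error"    - any other socket-level failure (e.g., no route)
    """
    try:
        # create_connection resolves the host and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "filtered"
    except OSError:
        return "error"
```

If the server is known to listen on a port and the probe reports "filtered", blocking somewhere on the path is a plausible explanation; "refused" is harder to interpret, since either the server or an intermediate device may have sent the reset.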

Data Analysis

Peering:
Using routing and peering data, characterize peering relationships between carriers, content providers and CDNs. Who peers with whom? Under what conditions?
Robocalls:
I have access to a variety of data sources about robocalls, which may address questions such as: How many calls pretend to be local? Is this changing over time? Are there distinct campaigns that wax and wane (e.g., car warranties, electric utility scams, medical insurance)? Are the same numbers, real or fake, used repeatedly, or are numbers being rapidly rotated? Who are the numbers assigned to? How many such calls are signed (STIR/SHAKEN)?
Broadband metrics:
The FCC now gathers a range of broadband performance indicators that are highly correlated, e.g., as part of the Measuring Broadband America data set. What is their relationship with each other? Which of these are independent or dependent variables?
Radio, TV:
Using FCC databases and TVStudy (OET69) software, estimate the number of TV or FM radio stations that can be received in various places. Provide estimates of population averages by state and population reach by station. (For example: "The average household in North Dakota can receive 3.5 TV channels. The average TV station reaches 150,000 households.")
Broadband pricing:
Estimate, based on online surveys (e.g., of your Facebook or LinkedIn contacts), what people pay for wireless or wireline Internet connectivity and compare by performance, region and country.
Consumer expenditures:
Gather all available data on consumer expenditures for telephone, cellular and Internet services, comparing government data, industry analysis and corporate annual reports. (The BLS consumer expenditures survey provides some information, but may not map cleanly into current categories.) Is the data consistent? Can it be compared against other major OECD economies? How have expenditures changed?
Broadband deployment:
Analyze the FCC Form 477 broadband deployment data to show how connectivity, technology and bandwidth (speed) have changed over time. How do changes correlate to population density and household income or other demographic variables? Using the Universal Service Fund data, how does funding correlate to changes in broadband availability?
Broadband subsidies:
In the United States, both the Federal government and states subsidize broadband and communication services (mobile phones, mainly). Who benefits? Consider rural vs. urban and richer vs. poorer areas, using data provided by the FCC, Census data and other sources.
Rural electric cooperatives:
Analyze the service territories of rural electric cooperatives. Using the FCC Form 477 data, how good (or bad) is broadband connectivity in those areas? Has it changed recently?
Network reliability:
Can you determine network outages, both "sunny day" and "rainy day", from the FCC Measuring Broadband America or ATLAS measurement infrastructure data?
Vaccine misinformation:
Various tropes of vaccine (or, more generally, COVID-19) misinformation seem to ebb and flow over time. Can one detect such changes? Are they reflected in Google searches, traditional media coverage or across multiple social media platforms such as Twitter and Facebook?
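
For the "Broadband metrics" question above, a first look at how indicators relate to each other might use pairwise Pearson correlations, as in the sketch below. The function names, metric names and sample numbers are placeholders invented for this illustration, not FCC data, and a linear correlation coefficient is only one of several ways to describe dependence between metrics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_table(metrics):
    """Map each unordered pair of metric names to its correlation."""
    names = list(metrics)
    return {
        (a, b): pearson(metrics[a], metrics[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Made-up placeholder values, one entry per measured household.
    sample = {
        "download_mbps": [95.0, 48.0, 210.0, 23.0, 480.0],
        "upload_mbps": [10.0, 5.0, 20.0, 3.0, 35.0],
        "latency_ms": [18.0, 25.0, 14.0, 40.0, 11.0],
    }
    for pair, r in correlation_table(sample).items():
        print(pair, round(r, 3))
```

A high correlation between two indicators does not by itself say which one is the dependent variable; that part of the question needs domain reasoning or a regression model with justified assumptions.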

Literature Review and Analysis

The projects below summarize key resources in the topic area. They may involve data, but are likely to require smaller volumes of data and less advanced statistics. They may also draw on interviews you conduct with domain experts.

Digital "papers":
Read Carpenter v. United States, United States v. Jones, and Riley v. California, and perhaps some lower court cases, then summarize how courts are handling search warrants of digital "papers". How has treatment changed? How do these decisions reflect (or not) the differences between traditional and digital letters and other personal documents?
Web scraping:
Read Van Buren v. United States and the CFAA (about the legality of web scraping). Read some of the amicus briefs. What are the main arguments? What are the equities involved?
FCC Internet privacy rule:
Read regulatory filings for the (now rescinded) FCC internet privacy rule. Evaluate the technical arguments about CPNI, encryption and the internet.
CLOUD Act:
Summarize the text of the CLOUD Act as well as the opinions on either side.
Social media monitoring:
Do a survey of different police department policies on monitoring social media. (You may need to contact the police department's public affairs office or search legal cases.)
Transparency report:
Do a survey and data analysis of tech companies' transparency reports. What do they cover? How do they differ in categories and geographic detail? Do they indicate what they do not disclose? Can you design a template similar to, say, a 10-Q disclosure?
Mergers:
Track data about mergers and acquisitions before and after FIRRMA was passed. How did this impact foreign M&A?
Media:
For different TV and radio stations (e.g., in the NYC area), determine their programming mix, e.g., children's programming, local news, advertisements, syndicated programming, ...
Data portability:
For major consumer services for photos, messages, social media posts, address books, and email (e.g., various Google services, Facebook, Instagram, TikTok, WhatsApp, Yahoo Mail, Apple photos and email), can you extract your data, e.g., to move to a new service? How long does it take? How useful is the data you can extract? Can you import the data (e.g., email or photos) to another service? Are there tools to help?
Ad blocking:
Among popular websites, e.g., for news, which function well with ad blockers and which fail or explicitly refuse to provide content?
Content moderation:
What kind of discussion forums, ranking and content moderation do national and local news sites employ? Is there a way to measure the quality of the discussion? Consider contacting newspaper staff to gather their experiences.
Rural broadband:
Analyze the cost of deploying fiber in rural areas. What are the cost components, such as planning, fiber, electronics and construction? How does take-up affect cost and viability? What are financing models?
Cost of Internet access:
Using bills gathered from (Facebook, LinkedIn, real-life) friends and family, try to evaluate the typical cost structure of Internet and phone service. How much variation is there for similar services? How does this compare to the advertised rates?
Communication networks during natural disasters:
Using interviews with residents and public safety officials, as well as various data sources, describe how well various communication facilities held up during Hurricanes Harvey and Irma, including land mobile radio ("walkie-talkies"), cellular, landline and Internet access.
Spectrum usage:
Analyze what spectrum is used for, by whom and where, comparing use for categories such as broadcast, communication and non-communication (radar, medical, industrial) applications.
Spectral efficiency:
Compare the spectral efficiency of FM radio, digital over-the-air (ATSC 3) TV, land-mobile radio and cellular systems. Consider the encoding of information, the air interface, and how many bits of content are delivered to users, or how much spectrum it would take to replace a traditional service such as radio or TV with a cellular service. Note that there is no single definition of spectral efficiency, so the project should consider existing definitions in the literature and justify choices.
TV stations:
Investigate whether one could put all TV stations on cable or satellite, either generally or in more rural areas. How many stations are must-carry vs. retransmission consent? What would be the costs, potential sources of revenue and benefits?
Cybersecurity:
What are the principal causes of cybersecurity problems? Is there quantitative evidence? What remedies are likely to reduce the frequency or impact of such events? (Cite research to support your arguments.)
Cybersecurity labels:
Consider developing a label similar to a nutrition label or Energy Star label that gives consumers basic information about the cybersecurity and privacy of an Internet of Things device or an app. You should at least informally evaluate your ideas with non-experts, e.g., in interviews or surveys.