My Interests

I am a junior at Georgetown studying Computer Science, with a minor in Business Studies. In my time at Georgetown, I have focused on building a bridge between theoretical and applied Computer Science for myself and other students. I am interested in developing software products for a fast-moving and impactful business as well as conducting research in graduate school and beyond.

Software Projects

Over the past two years, I have gained experience primarily in developing backend infrastructure, working with data at scale, and applying machine learning solutions to business and research problems. I am looking to gain more experience in frontend and full-stack development, in addition to strengthening my current skill set.

International Business Machines (IBM)

May-August, 2022

I worked with the monitoring and insights team in the CIO office to develop an automated bot for generating accounting checklist drafts for sellers. I developed infrastructure for hosting a machine learning solution running on an IBM OpenShift cluster. I gained experience using the Python libraries statsmodels, scikit-learn, and pandas, as well as Agile development practices, Git and GitHub, IBM Cloud Storage solutions, and Slack APIs.

BC Remit

August-November, 2021

As the head of Georgetown Disruptive Tech's Special Projects team and a Google Student Developer Club Lead, I led a technical consulting project with BC Remit. We helped them transition to a React app from their previous development framework. I gained experience with project management for a team of 5 interns, app development using React Native, and using the backend emulation tool Amazon Cognito.

CSD_Presentation_ B.pdf

CSD Presentation_ A.pdf

Los Alamos National Laboratory Supercomputer Institute

June-August, 2021

I was one of 5 interns from Georgetown who piloted a US Department of Defense-sponsored program with the Los Alamos National Laboratory. I participated in a three-week High Performance Computing (HPC) Bootcamp, and I researched the efficacy of Computational Storage Devices for HPC workloads. I gained experience using Apache Spark, HDFS, Kubernetes, and PySpark SQL while implementing our experiments.

admin.pdf

jira.pdf

Authomize.com

January-May 2021

As project manager for the Georgetown TAMID Chapter's founding technical consulting team, I led an internship for 8 students with our client, Authomize.com. We developed a command-line application to convert Authomize's Excel client reports to PDFs with aggregated charts of their data. Examples of the app's output are shown above. I gained experience implementing this application with Node.js, HTML, and CSS and using Git and GitHub.

Project Olas

September 2020 - March 2021

I helped Project Olas scale their customer relationship processes by creating automated data transfer pipelines connecting their Airtable CRM database to their client scheduling forms. I also worked on automated feedback collection from clients to increase client communication efficiency. I gained experience using Node.js to implement these solutions by integrating with the Google Sheets and Airtable APIs.

Research Projects & Publications

Georgetown InfoSense Lab

GazBy Presentation

GazBy: Gaze-Based BERT Model to Incorporate Human Attention in Neural Information Retrieval

August 2021 - February 2022

In this project, I worked alongside Professor Grace Yang at InfoSense. We developed an approach to incorporate human eye tracking data into pre-trained transformer models for ad hoc retrieval. Our work has been published in the ACM Digital Library and was accepted to the International Conference on the Theory of Information Retrieval (ICTIR). I presented this work at a joint SIGIR and ICTIR conference in Madrid, Spain, in July 2022. In August 2021, I created a graduate course, Neural Information Retrieval, with Professor Yang to facilitate research for this project.

Full Paper

Abstract

This paper is interested in investigating whether human gaze signals can be leveraged to improve state-of-the-art search engine performance and how to incorporate this new input signal marked by human attention into existing neural retrieval models. In this paper, we propose GazBy (Gaze-based Bert model for document relevancy), a light-weight joint model that integrates human gaze fixation estimation into transformer models to predict document relevance, incorporating more nuanced information about cognitive processing into information retrieval (IR). We evaluate our model on the Text Retrieval Conference (TREC) Deep Learning (DL) 2019 and 2020 Tracks. Our experiments show encouraging results and illustrate the effective and ineffective entry points for using human gaze to help with transformer-based neural retrievers. With the rise of virtual reality (VR) and augmented reality (AR), human gaze data will become more available. We hope this work serves as a first step exploring using gaze signals in modern neural search engines.

SEINE Architecture

SEINE: SEgment-based Indexing for NEural information retrieval.

March 2021 - February 2022

In my first project with the Georgetown InfoSense Lab, I assisted Sibo Dong, a PhD candidate at Georgetown, in developing a neural index to improve the retrieval efficiency of neural information retrieval approaches. Our paper was accepted to SIGIR as a workshop paper and will be published in the ACM Digital Library. In June and July 2021, I received funding for this work from Georgetown as a Royden B. Davis Fellow. Read more about my experience as a Davis Fellow here.

Abstract

Many early neural Information Retrieval (NeurIR) methods are re-rankers that rely on a traditional first-stage retriever due to expensive query time computations. Recently, representation-based retrievers have gained much attention, which learn query representation and document representation separately, making it possible to pre-compute document representations offline and reduce the workload at query time. Both dense and sparse representation-based retrievers have been explored. However, these methods focus on finding the representation that best represents a text (aka metric learning) and the actual retrieval function that is responsible for similarity matching between query and document is kept at a minimum by using dot product. One drawback is that unlike traditional term-level inverted index, the index formed by these embeddings cannot be easily re-used by another retrieval method. Another drawback is that keeping the interaction at minimum hurts retrieval effectiveness. On the contrary, interaction-based retrievers are known for their better retrieval effectiveness. In this paper, we propose a novel SEgment-based Neural Indexing method, SEINE, which provides a general indexing framework that can flexibly support a variety of interaction-based neural retrieval methods. We emphasize on a careful decomposition of common components in existing neural retrieval methods and propose to use segment-level inverted index to store the atomic query-document interaction values. Experiments on LETOR MQ2007 and MQ2008 datasets show that our indexing method can accelerate multiple neural retrieval methods up to 28-times faster without sacrificing much effectiveness.

Georgetown CyberSMART Lab

Smart Auto Insurance System Architecture

Smart Auto Insurance: High Resolution, Dynamic, Privacy-Driven, Telematic Insurance

This paper aimed to offer a proof of concept for transparent smart auto insurance using blockchain technology. I gained experience implementing this proof of concept in Java. You can read about our approach below on arXiv.

Full Paper

Abstract

Data driven approaches to problem solving are—in many regards—the holy grail of evidence backed decision making. Using first-party empirical data to analyze behavior and establish predictions yields us the ability to base in-depth analyses on particular individuals and reduce our dependence on generalizations. Modern mobile and embedded devices provide a wealth of sensors and means for collecting and tracking individualized data. Applying these assets to the realm of insurance (which is a statistically backed endeavor at heart) is certainly nothing new; yet doing so in a way that is privacy-driven and secure has not been a central focus of implementers. Existing data-driven insurance technologies require a certain level of trust in the data tracking agency (i.e. insurer) to not misuse, mishandle, or over-collect user data. Smart contracts and blockchain technology provide us an opportunity to rebalance these systems such that the blockchain itself is a trusted agent which both insurers and the insured can confide in. We propose a "Smart Auto Insurance" system that minimizes data sharing while simultaneously providing quality-of-life improvements to both sides. Furthermore, we use a simple game theoretical argument to show that the clients using such a system are disincentivized from behaving adversarially

Contact me at [jgoldstein46 at gmail dot com]