vitaLITy

Hi, welcome to the vitaLITy project!

There are a few prominent practices for conducting reviews of academic literature, including searching for specific keywords on Google Scholar or checking citations from some initial seed paper(s). These approaches serve a critical purpose for academic literature reviews, yet there remain challenges in identifying relevant literature when similar work may utilize different terminology (e.g., mixed-initiative visual analytics papers may not use the same terminology as papers on model-steering, yet the two topics are relevant to one another). We built vitaLITy to help researchers perform academic literature reviews via serendipitous discovery. So far, we have two major releases:

vitaLITy 2 - [v2, 2024]

Reviewing Academic Literature Using Large Language Models

Hongye An, Arpit Narechania, Emily Wall, Kai Xu

Paper (IEEE VIS NLVIZ'24) | Dataset of 66,692 articles (Coming Soon) | Demo (Coming Soon)

vitaLITy 2 uses a Large Language Model (LLM)-based approach to identify semantically relevant literature in a text embedding space. We include a corpus of 66,692 papers from 1970 to 2023, searchable through text embeddings created by three language models. vitaLITy 2 contributes a novel Retrieval-Augmented Generation (RAG) architecture; users can interact with it through an LLM with augmented prompts, for example to summarize a collection of papers. vitaLITy 2 also provides a chat interface that allows users to perform complex queries without learning a new programming language, and to take advantage of the knowledge the LLM has captured from its enormous training corpus.
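To make the idea of an "augmented prompt" concrete, here is a minimal sketch of how retrieved papers might be folded into an LLM prompt. It is an illustration only, not the vitaLITy 2 implementation: the function name `build_augmented_prompt`, the paper fields (`title`, `year`, `abstract`), the prompt wording, and the sample records are all assumptions made for this example.

```python
def build_augmented_prompt(question, papers, max_papers=3):
    """Combine a user question with retrieved papers into one prompt string.

    `papers` is a list of dicts with hypothetical keys 'title', 'year',
    and 'abstract'; only the first `max_papers` entries are included.
    """
    context_lines = []
    for i, paper in enumerate(papers[:max_papers], start=1):
        context_lines.append(
            f"[{i}] {paper['title']} ({paper['year']}): {paper['abstract']}"
        )
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the papers listed below.\n\n"
        f"Papers:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy records standing in for papers returned by a similarity search.
retrieved = [
    {"title": "Example Paper A", "year": 2020, "abstract": "Placeholder abstract A."},
    {"title": "Example Paper B", "year": 2021, "abstract": "Placeholder abstract B."},
]
prompt = build_augmented_prompt("Summarize these papers.", retrieved)
print(prompt)
```

The resulting string would then be sent to whichever LLM is in use; that call is omitted here because it depends on the provider.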

Citation:


    @misc{an2024vitality2,
        title={vitaLITy 2: Reviewing Academic Literature Using Large Language Models}, 
        author={Hongye An and Arpit Narechania and Emily Wall and Kai Xu},
        year={2024},
        eprint={2408.13450},
        archivePrefix={arXiv},
        primaryClass={cs.HC},
        url={https://arxiv.org/abs/2408.13450}, 
        howpublished={Presented at the NLVIZ Workshop, IEEE VIS 2024}
    }

vitaLITy - [v1, 2021]

Promoting Serendipitous Discovery of Academic Literature with Transformers & Visual Analytics

Arpit Narechania, Alireza Karduni, Ryan Wesslen, Emily Wall

Paper (TVCG'22) | Dataset of 59,232 articles | Poster (CRIDC'22) | Demo

vitaLITy promotes serendipitous discovery of relevant literature using transformer language models, allowing users to find semantically similar papers in a word embedding space given (1) a list of input paper(s) or (2) a working abstract. vitaLITy visualizes this document-level embedding space in an interactive 2-D scatterplot using dimension reduction. vitaLITy also summarizes meta information about the document corpus or search query, including keywords and co-authors, and allows users to save and export papers for use in a literature review. We present qualitative findings from an evaluation of vitaLITy, suggesting it can be a promising complementary technique for conducting academic literature reviews. Furthermore, we contribute data from 38 popular data visualization publication venues in vitaLITy, and we provide scrapers for the open-source community to continue to grow the list of supported venues.
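As a rough illustration of finding similar papers in an embedding space, the sketch below ranks a toy corpus by cosine similarity to a set of seed papers. It assumes the seed embeddings are averaged into a single query vector, which is one common choice and not necessarily what vitaLITy does; the paper IDs, the 3-dimensional vectors, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def most_similar(corpus, seed_ids, k=2):
    """Return the k papers closest to the averaged seed embedding."""
    query = mean_vector([corpus[pid] for pid in seed_ids])
    scored = [
        (pid, cosine_similarity(query, vec))
        for pid, vec in corpus.items()
        if pid not in seed_ids  # Do not return the seeds themselves.
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy corpus: paper ID -> made-up 3-dimensional embedding.
corpus = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
    "paper_d": [0.0, 0.0, 1.0],
}
results = most_similar(corpus, ["paper_a"], k=2)
print(results)
```

For a working abstract rather than seed papers, the same ranking applies once the abstract has been embedded with the same model as the corpus; the query vector is then that single embedding instead of an average.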

Citation (Paper):


    @article{narechania2021vitality,
        title={{vitaLITy: Promoting Serendipitous Discovery of Academic Literature with Transformers \& Visual Analytics}},
        author={Narechania, Arpit and Karduni, Alireza and Wesslen, Ryan and Wall, Emily},
        journal={IEEE TVCG},
        year={{2022}},
        url = {https://doi.org/10.1109/TVCG.2021.3114820},
        publisher={IEEE}
    }

Citation (Dataset):


    @inproceedings{narechania2021vitalitydataset, 
        title={{VitaLITy: A Dataset of Academic Articles}},
        journal={figshare}, 
        booktitle={figshare},
        publisher={figshare}, 
        author={Narechania, Arpit and Karduni, Alireza and Wesslen, Ryan and Wall, Emily}, 
        url={https://figshare.com/articles/dataset/VitaLITy_A_Dataset_of_Academic_Articles/14329151},
        year={{2021}}
    }

It is easy to install, develop, and extend! Check out the underlying repositories:

scraper: Academic Articles Web Scraper

Leverage and contribute to our Python scraper, which collects metadata from digital libraries (e.g., ACM Digital Library).

embed: Generate Document Embeddings

Utilize our Embed API to generate GloVe, SPECTER, and ADA embeddings for input documents.

rest-api: RESTful API for Similarity Search

Utilize our RESTful API to query for similar papers given a list of seed papers or a working abstract.

frontend: Interactive Visualization

Explore the document corpus of academic articles in our scalable, interactive UI.