Wordsmith in the cloud: refining language models using web-scale language networks

In this project, UMSI Assistant Professor Qiaozhu Mei addressed the overload of text information generated by online conversations. Tools that manage this overload effectively rely on statistical language models, yet the quality of these models is limited by sparse data, mismatched context, and an inability to model semantic relations.

Start date: 7/1/2011
End date: 6/30/2013

This project employed cloud computing and novel map-reduce algorithms to extract heterogeneous language networks from Web-scale text collections. These networks were used to smooth and contextualize language models in various domains, making them more accurate and robust. The refined language models were designed to improve state-of-the-art text retrieval and mining techniques, enhancing information access and knowledge acquisition for real users across community and language boundaries.
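The general idea can be illustrated with a minimal sketch. It is not the project's actual algorithm, which this page does not describe. It assumes a simple word co-occurrence network built with a map step and a reduce step, and a unigram language model smoothed by passing probability mass along network edges. All function names and the interpolation weight `lam` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def map_cooccurrences(document):
    """Map step: emit ((word_a, word_b), 1) for each pair of distinct words in a document."""
    words = sorted(set(document.lower().split()))
    for a, b in combinations(words, 2):
        yield (a, b), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each word pair."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def build_network(documents):
    """Build an undirected, weighted co-occurrence network (assumed network type)."""
    edge_weights = reduce_counts(
        pair for doc in documents for pair in map_cooccurrences(doc)
    )
    network = defaultdict(dict)
    for (a, b), weight in edge_weights.items():
        network[a][b] = weight
        network[b][a] = weight
    return network


def unigram_model(text):
    """Maximum-likelihood unigram model estimated from a (possibly tiny) text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def smooth_with_network(model, network, lam=0.5):
    """Interpolate each word's probability with mass received from its neighbors."""
    vocabulary = set(network) | set(model)
    smoothed = {}
    for word in vocabulary:
        propagated = 0.0
        for neighbor, weight in network.get(word, {}).items():
            degree = sum(network[neighbor].values())
            propagated += model.get(neighbor, 0.0) * weight / degree
        smoothed[word] = (1 - lam) * model.get(word, 0.0) + lam * propagated
    # Renormalize so the result is a probability distribution even when
    # some words in the model have no neighbors in the network.
    total = sum(smoothed.values())
    return {word: p / total for word, p in smoothed.items()}


documents = ["cloud computing scales", "cloud storage scales"]
network = build_network(documents)
sparse_model = unigram_model("cloud computing")
smoothed_model = smooth_with_network(sparse_model, network, lam=0.5)
```

In this toy example the word "storage" never appears in the text used to estimate the model, yet it receives nonzero probability after smoothing because the network links it to "cloud". This is the kind of data sparseness that network-based smoothing is meant to reduce.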

The techniques and resources (e.g., language networks and refined language models) will ultimately benefit a broad range of users who analyze text content in social media and many other domains.

Research for this project was conducted by Qiaozhu Mei’s Foreseer Group, which conducts research in data mining and information retrieval. Foreseer focuses on novel methods and applications of information retrieval, text mining, natural language processing, and social network analysis. The group’s research has been applied in Web search and mining, social computing, scientific literature mining, and health informatics. For more information about this project, please visit the Foreseer Group website.

Grants

Wordsmith in the Cloud: Refining Language Models Using Web-Scale Language Networks, National Science Foundation: $214,985

 

The National Science Foundation is an independent federal agency created by Congress in 1950 "to promote the progress of science; to advance the national health, prosperity, and welfare; to secure the national defense..."

Press and Web mentions