BioPortal™ Research Goal
Infectious disease outbreaks, either naturally occurring or caused by biological terror attacks, pose a critical threat to public health and national security. Information systems and infectious disease informatics (IDI) research are playing an increasingly important role in developing a comprehensive approach to prevent, detect, respond to, and manage infectious disease outbreaks.
To support the surveillance and detection of infectious disease outbreaks by public health professionals, we designed and implemented the BioPortal system, a web-based IDI system that provides convenient access to distributed, cross-jurisdictional health data pertaining to several major infectious diseases.
Our interdisciplinary system-development team consists of researchers in both information systems (IS) and public health, as well as practitioners and officials from several state health departments. BioPortal supports sophisticated spatial-temporal data analysis methods and provides effective data and information visualization capabilities.
BioPortal™ Research Areas
Outbreak Detection Algorithm Development
Aggregated-Syndrome-Count Time Series Analysis
Continuing our previous work on developing a free-text chief complaint classifier for syndromic surveillance, we are currently focusing on analyzing the aggregated syndrome count time series derived from the output of the classifier. The goal is to develop a new outbreak detection algorithm that can detect and present disease outbreaks effectively. Time series models that account for structural changes are the foundation of the new algorithm. Extensive effort has gone into developing the algorithm and designing the subsequent evaluation study.
The figure below shows the estimation results of the new outbreak detection algorithm. The input to the algorithm is the respiratory syndrome count time series of a hospital in Phoenix from 1995 to 1999, shown in the first panel. The next three panels show the probabilities of the three states that the algorithm has identified. State 1 (the second panel) is the medium state (mean = 23.5), representing the seasonal outbreaks. State 2 (the third panel) is the low state (mean = 15), representing the normal state. State 3 (the last panel) is the high state (mean = 42.3), representing the outbreaks in the dataset.
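The state probabilities described above can be read as the output of a regime-switching filter. The following is a minimal sketch, assuming a three-state Poisson hidden Markov model that reuses the three state means quoted above. The transition matrix, the uniform starting belief, the sample counts, and all function names are illustrative assumptions, not the BioPortal algorithm itself.

```python
import math

# State means come from the figure description; everything else is assumed.
STATE_MEANS = [23.5, 15.0, 42.3]  # state 1 (seasonal), state 2 (normal), state 3 (outbreak)

# Hypothetical transition matrix: each state tends to persist from day to day.
TRANSITIONS = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]


def poisson_log_pmf(count, mean):
    """Log-probability of observing `count` under a Poisson with the given mean."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def filter_states(counts, means=STATE_MEANS, trans=TRANSITIONS):
    """Return, for each observation, the filtered probability of each state."""
    n = len(means)
    belief = [1.0 / n] * n  # assumed uniform starting belief
    history = []
    for count in counts:
        # Predict: push yesterday's belief through the transition matrix.
        predicted = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each state by how well it explains today's count.
        log_lik = [poisson_log_pmf(count, m) for m in means]
        peak = max(log_lik)  # subtract the peak for numerical stability
        weighted = [p * math.exp(l - peak) for p, l in zip(predicted, log_lik)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        history.append(belief)
    return history


# Made-up daily counts: low, then moderate, then high.
daily_counts = [14, 16, 15, 24, 23, 41, 45, 43]
probabilities = filter_states(daily_counts)
```

Each row of `probabilities` corresponds to one day and sums to one, so plotting the three columns over time would give panels of the kind described in the figure.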
Risk Adjusted Support Vector Clustering (RSVC)
RSVC combines the idea of risk adjustment with a robust Support Vector Clustering (SVC) method to improve the quality of retrospective spatial-temporal analysis. Specifically, in regions where the baseline data are dense, data points are less likely to be grouped into anomaly clusters. The clustering process involves several steps. First, the input data are implicitly mapped to a high-dimensional feature space defined by a kernel function (typically the Gaussian kernel). Second, the algorithm finds the hypersphere of minimal radius in the feature space that contains most of the data. Third, the function estimating the support of the underlying data distribution is constructed using the kernel function and the parameters learned in the second step. The width parameter of the Gaussian kernel is adjusted dynamically based on the kernel density computed from background data. When mapped back to the original space, the hypersphere splits into several clusters, which indicate high-risk outbreak areas.
Prospective Support Vector Clustering (PSVC)
PSVC builds on our previous work on the Risk-Adjusted Support Vector Clustering (RSVC) algorithm and on the basic design ideas behind the well-known CUSUM (cumulative sum) method. Our main motivation is to overcome the difficulty of specifying baseline data and so address the need for real-time disease outbreak monitoring. Our computational evaluation using simulated datasets shows that PSVC can effectively identify abnormal areas whose spatial distribution pattern changes over time, while correctly ignoring purely spatial clusters. When the abnormal area follows a simple regular shape (e.g., a circle in the emerging scenario), PSVC achieves better precision, while the space-time scan statistic (SaTScan), the benchmark method, achieves better recall. PSVC significantly outperforms SaTScan on spatial evaluation measures when detecting abnormal areas with complex, irregular shapes, as in the expanding and moving scenarios. PSVC detects abnormal areas as quickly as SaTScan does but with fewer false alarms. This is particularly true when abnormal areas do not conform to simple regular shapes.
Foot and Mouth Disease (FMD) News Monitoring
Foot and mouth disease (FMD) is a highly contagious and sometimes fatal viral disease of cattle and pigs. It is a significant hazard to agriculture. The 2001 epidemic in the UK led to the loss of six million animals. Considerable research effort has gone into this area, especially since the 2001 outbreak.
The FMD News Monitoring system automatically collects FMD-related news from hundreds of websites around the world. The system allows us not only to gather FMD-related news from the Web, but also to perform automatic news summarization, classification, and, importantly, feature extraction.
The major components of the research are:
- Web Spidering to gather FMD related web pages
- News Filtering by comparing keywords or using a two-category classifier
- Machine Translation to translate non-English news
- News Summarization automatically generated for FMD news
- News Classification with the ability to compare the performance of classification algorithms
- Analysis, including Important Feature Extraction and Event Detection
- Evaluation of our results by the domain expert in the FMD lab at UC Davis
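The keyword-comparison variant of the News Filtering step could look roughly like the sketch below. The keyword list, the `min_hits` threshold, and the function names are hypothetical, since the source does not specify them; the two-category classifier alternative is not shown.

```python
import re

# Hypothetical keyword list; the terms actually used by the system are not given.
FMD_KEYWORDS = {"foot and mouth disease", "foot-and-mouth", "fmd", "aphthous fever"}


def keyword_hits(text, keywords=FMD_KEYWORDS):
    """Return the set of keywords that occur in the text as whole words or phrases."""
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = set()
    for keyword in keywords:
        if re.search(r"\b" + re.escape(keyword) + r"\b", normalized):
            hits.add(keyword)
    return hits


def is_fmd_related(text, min_hits=1):
    """Keep a news item if it mentions at least `min_hits` distinct keywords."""
    return len(keyword_hits(text)) >= min_hits
```

Raising `min_hits` trades recall for precision: fewer unrelated pages pass the filter, at the cost of dropping short items that mention the disease only once.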
Syndromic Surveillance Dashboard
The Syndromic Surveillance Dashboard is an application of BioPortal for hospitals and other healthcare organizations. The dashboard is integrated with the Time Series Detection capability and the BioPortal Hotspot Analysis and Visualization tools. It will also include samples of summarized and detailed surveillance reports.
Time Series Detection Server
Social Network Analysis for SARS
In this project, we apply Social Network Analysis to the SARS epidemic in Taiwan in 2003. Since SARS spread not only through close personal interaction but also through contaminated objects, traditional one-mode social networks containing only person nodes are not enough to describe the complexity of the disease's spread. To address this problem, we incorporated geographical locations, such as high-risk areas and hospitals, as nodes in the social networks. We found that introducing geographical locations into the network provides a good way not only to see the role those locations play in disease transmission but also to identify potential bridges between them. Currently, this project focuses on verifying the relationship between social network structure and the spread of disease. The research results will be further developed into a biosurveillance mechanism for outbreak detection.
BioPortal Chief Complaint Classifier
There are three major stages in the BioPortal ontology-enhanced chief complaint (CC) classifier: CC standardization, symptom grouping, and syndrome classification.
- Acronyms and abbreviations are expanded. CCs are divided into symptoms and mapped to standard Unified Medical Language System (UMLS) concepts.
- Standardized symptoms are grouped together using a symptom grouping table. Symptoms that cannot be found in the existing symptom grouping table but are closely related to known symptom groups according to the UMLS ontology are grouped using the weighted semantic similarity score (WSSS) method. The WSSS method can expand existing grouping knowledge and improve overall performance.
- A rule-based engine is used to map from symptom groups to syndromic categories.
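The three stages can be sketched as a small pipeline. This is a toy illustration only: the abbreviation table, symptom grouping table, syndrome rules, and function names are hypothetical stand-ins, no UMLS lookup is performed, and the WSSS fallback for unknown symptoms is omitted.

```python
import re

# Tiny hypothetical tables for illustration; the real tables are far larger.
ABBREVIATIONS = {"sob": "shortness of breath", "n": "nausea", "v": "vomiting", "abd": "abdominal"}
SYMPTOM_GROUPS = {
    "shortness of breath": "dyspnea",
    "cough": "cough",
    "nausea": "nausea/vomiting",
    "vomiting": "nausea/vomiting",
    "abdominal pain": "abdominal pain",
}
# Each rule: if any of these symptom groups is present, assign the syndrome.
SYNDROME_RULES = [
    ({"dyspnea", "cough"}, "respiratory"),
    ({"nausea/vomiting", "abdominal pain"}, "gastrointestinal"),
]


def standardize(chief_complaint):
    """Stage 1: split the CC into symptom strings and expand abbreviations."""
    parts = re.split(r"[,/;&]| and ", chief_complaint.lower())
    symptoms = []
    for part in parts:
        words = [ABBREVIATIONS.get(word, word) for word in part.split()]
        if words:
            symptoms.append(" ".join(words))
    return symptoms


def group_symptoms(symptoms):
    """Stage 2: look each symptom up in the grouping table (unknown ones are dropped here)."""
    return {SYMPTOM_GROUPS[s] for s in symptoms if s in SYMPTOM_GROUPS}


def classify(chief_complaint):
    """Stage 3: apply the rules to the symptom groups to obtain syndromic categories."""
    groups = group_symptoms(standardize(chief_complaint))
    syndromes = [name for trigger, name in SYNDROME_RULES if groups & trigger]
    return syndromes or ["other"]
```

In this sketch a symptom missing from the grouping table is simply discarded; that is the point at which a similarity-based method such as WSSS would instead try to attach it to the closest known group.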
Chinese Chief Complaint Classifier
To facilitate the processing of chief complaints (CCs) in Chinese, we coupled a Chinese-English translation module with the BioPortal ontology-enhanced CC classifier. There are three major steps involved in the Chinese-English CC translation:
- Separating Chinese and English text strings. Since the BioPortal CC classifier can process English CCs, existing English terms are preserved. Relative positions of text strings are marked and recorded.
- Chinese phrase segmentation. Chinese expressions in the Chinese CCs are segmented using the Chinese terms available from the phrase table, which was constructed using a statistical pattern extraction method based on the concept of mutual information. The longest possible phrases in the phrase list are used for segmentation.
- Chinese phrase translation. The segmented phrases output from the previous step are used as the basic elements for Chinese-English symptom mapping.
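The three translation steps can be sketched as follows, using forward longest-match segmentation against a phrase table. The phrase table entries and function names here are hypothetical; the real table was built statistically, and the real system records string positions rather than relying only on order.

```python
import re

# Hypothetical phrase table mapping Chinese phrases to English symptom terms.
PHRASE_TABLE = {
    "發燒": "fever",
    "咳嗽": "cough",
    "頭痛": "headache",
    "頭": "head",
    "腹痛": "abdominal pain",
}


def split_runs(text):
    """Step 1: separate English and Chinese runs, preserving their order."""
    return re.findall(r"[A-Za-z]+(?: [A-Za-z]+)*|[\u4e00-\u9fff]+", text)


def segment_longest_match(text, phrases=PHRASE_TABLE):
    """Step 2: at each position, take the longest phrase found in the table."""
    max_len = max(len(p) for p in phrases)
    segments = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in phrases:
                segments.append(candidate)
                i += length
                break
        else:
            i += 1  # skip a character not covered by the phrase table
    return segments


def translate_cc(text):
    """Step 3: map segmented Chinese phrases to English; keep English runs as-is."""
    terms = []
    for run in split_runs(text):
        if run[0].isascii():
            terms.append(run)
        else:
            terms.extend(PHRASE_TABLE[s] for s in segment_longest_match(run))
    return terms
```

Preferring the longest match means that "頭痛" is translated as a single symptom ("headache") rather than being split at the shorter entry "頭".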
