Researchers at the University of Liverpool are building the world’s most comprehensive database describing human and animal pathogens, which can be used to prevent and tackle disease outbreaks around the globe.
The Enhanced Infectious Diseases (EID2) database has been developed by the Liverpool University Climate and Infectious Diseases of Animals (LUCINDA) team and is funded by a BBSRC Strategic Tools and Resources Development Fund grant.
Effectively mapping the relationships between human and animal diseases and their hosts, disease-causing pathogens and the ways in which pathogens are transmitted can offer huge benefits when it comes to knowing what the disease risks are in a population or geographical area, and how best to manage and eliminate them.
The EID2 team realised that there was a potential treasure trove of data already available in the scientific literature and in pre-existing databases, which was just waiting to be mined for useful insights – a ‘Big Data’ approach. By using openly accessible information in a new way, EID2 has already supplied data for work to trace the history of human and animal diseases, to predict the effects of climate change on pathogens, to map which diseases are most likely to occur in particular areas, and to categorise the complex relationships between the human and animal carriers and hosts of numerous pathogens.
Epidemiologist Dr Marie McIntyre, one of the EID2 team, said: “The database is matchless in scale, and has the capacity to hold data on all known human and animal pathogens, when detailed information becomes available.
“We use largely automated procedures to collate data on human and animal pathogens: where, when, and in which hosts there is evidence of their occurrence.
“After scientists have sequenced part or all of a pathogen’s DNA or RNA, they usually upload the sequence to public databases, and include information (called metadata) on where, when and from which host the pathogen was obtained.
“EID2 is unique in extracting the information on pathogens from this metadata and as there are already tens of millions of sequence uploads to look at, and millions more are added every year, EID2 has the capacity to become a comprehensive, definitive source of pathogen and disease information.
“In addition, the sequence data is supplemented with information from the NCBI’s database of scientific publications, PubMed. The procedures used for identifying the hosts in which pathogens occur, and where they occur, are objective, and the information the EID2 contains can be regularly updated and improved as more detailed information becomes available.”
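The metadata step Dr McIntyre describes can be illustrated with a minimal sketch. It is not EID2's actual pipeline: the record layout, the field names (`organism`, `host`, `country`, `collection_date`) and the placeholder values are all assumptions made up for illustration, loosely modelled on the kind of source information that accompanies a sequence upload.

```python
import re

# Hypothetical records standing in for sequence-upload metadata.
# All names and values are placeholders, not real EID2 or database content.
RECORDS = [
    {"organism": "Pathogen A", "host": "Host X",
     "country": "Country P: Region 1", "collection_date": "12-Mar-2010"},
    {"organism": "Pathogen A", "host": "Host Y",
     "country": "Country Q", "collection_date": "2009"},
    {"organism": "Pathogen B"},                          # no host or location
    {"organism": "Pathogen B", "country": "Country P"},  # location only
]


def parse_country(value):
    """Return the country part of a 'Country: region' string, or None."""
    if not value:
        return None
    return value.split(":", 1)[0].strip() or None


def parse_year(value):
    """Return the first four-digit number in a free-text date, or None."""
    match = re.search(r"\b(\d{4})\b", value or "")
    return int(match.group(1)) if match else None


def extract_evidence(records):
    """Turn raw metadata into where/when/which-host evidence rows.

    A record is kept only if it names a pathogen and at least one of
    host or country; anything else carries no usable evidence.
    """
    evidence = []
    for rec in records:
        pathogen = rec.get("organism")
        host = rec.get("host")
        country = parse_country(rec.get("country"))
        if not pathogen or not (host or country):
            continue
        evidence.append({
            "pathogen": pathogen,
            "host": host,
            "country": country,
            "year": parse_year(rec.get("collection_date")),
        })
    return evidence


if __name__ == "__main__":
    for row in extract_evidence(RECORDS):
        print(row)
```

In this sketch a record with neither host nor location is simply dropped, and missing fields are kept as `None` rather than guessed, so that every retained row can be traced back to what the uploader actually reported.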
Altogether, more than 60 million pieces of data have been brought together for EID2, with new information added all the time. The database is open-access, allowing registered researchers to use it, and the data can be manipulated in many ways to help scientists tackle numerous questions.
Dr McIntyre said: “EID2 is useful because it gives access to sets of information on infectious pathogens which have, until now, been difficult to acquire. For example, it describes all of the known pathogens of a host species, and all of the hosts of a pathogen species. It can generate all of the recorded pathogens in a specific country or region, or all of the pathogens of a certain host in a specific country. It gives instant access to the raw data from which this information is built. It also allows the distribution of pathogens (and hosts) to be mapped.”
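The kinds of look-up Dr McIntyre lists can be sketched over a small set of evidence rows. This is only an illustration under stated assumptions: the `(pathogen, host, country)` tuples, the placeholder names and the function names below are hypothetical and do not reflect EID2's real schema or interface.

```python
from collections import defaultdict

# Hypothetical evidence rows: (pathogen, host, country). Placeholder values only.
EVIDENCE = [
    ("Pathogen A", "Host X", "Country P"),
    ("Pathogen A", "Host Y", "Country Q"),
    ("Pathogen B", "Host X", "Country Q"),
    ("Pathogen C", "Host Y", "Country P"),
]


def build_indexes(evidence):
    """Index the evidence rows by host, pathogen, country and (host, country)."""
    indexes = {
        "by_host": defaultdict(set),
        "by_pathogen": defaultdict(set),
        "by_country": defaultdict(set),
        "by_host_country": defaultdict(set),
    }
    for pathogen, host, country in evidence:
        indexes["by_host"][host].add(pathogen)
        indexes["by_pathogen"][pathogen].add(host)
        indexes["by_country"][country].add(pathogen)
        indexes["by_host_country"][(host, country)].add(pathogen)
    return indexes


def pathogens_of_host(indexes, host):
    """All recorded pathogens of a host species."""
    return set(indexes["by_host"].get(host, set()))


def hosts_of_pathogen(indexes, pathogen):
    """All recorded hosts of a pathogen species."""
    return set(indexes["by_pathogen"].get(pathogen, set()))


def pathogens_in_country(indexes, country, host=None):
    """All recorded pathogens in a country, optionally restricted to one host."""
    if host is None:
        return set(indexes["by_country"].get(country, set()))
    return set(indexes["by_host_country"].get((host, country), set()))


if __name__ == "__main__":
    idx = build_indexes(EVIDENCE)
    print(sorted(pathogens_of_host(idx, "Host X")))
    print(sorted(hosts_of_pathogen(idx, "Pathogen A")))
    print(sorted(pathogens_in_country(idx, "Country Q")))
    print(sorted(pathogens_in_country(idx, "Country Q", host="Host X")))
```

Each query returns a copy of the stored set, so callers can modify the result without altering the index; an unknown host, pathogen or country yields an empty set rather than an error.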
This disease mapping is one of the most important areas where EID2 can be a valuable tool.
Research has shown that only four percent of clinically important diseases in humans have been geographically mapped, even though half have a strong rationale for mapping.
Because EID2 can pull together novel data sources, it can map diseases quickly and accurately. And because it is not limited in which pathogens and hosts it can describe, it has the potential to support large-scale global mapping of animal and crop diseases, in the same way as is currently being undertaken for human and some animal diseases.
The EID2 data is already contributing to work on emerging and zoonotic infections at a Health Protection Research Unit (HPRU) at the University of Liverpool. The unit, created at the end of 2013, is a national centre of excellence in multidisciplinary research to protect the nation’s health.
In fact, the potential of the EID2 data for analysis is very wide-ranging, and the Liverpool team is considering a host of ideas. Among these are plans to use the data for risk analysis, predicting where and in which species certain diseases are most likely to occur, and producing estimates of where diseases can occur based on environmental data such as climate, demographics and vegetation.