“For both academic research and student training.”
Academic research purpose: to build a comprehensive knowledge system for maize breeding, supported by sentence-level evidence from the literature.
Student training purpose: to provide full academic training for young researchers early in their careers.
Maize (Zea mays L.) is one of the world’s most important cereal crops, serving as food, feed, and industrial raw material. Its large, complex genome and rich genetic diversity make it a valuable organism for studying plant genetics and evolution. Navigating the growing body of maize research effectively requires literature mining techniques that can extract, classify, and organize information from publications. Collecting evidence from diverse sources, such as experimental reports, reviews, and field studies, supports the identification of key traits like yield, drought resilience, and disease resistance. By applying computational approaches to the literature, researchers can synthesize existing findings, detect emerging trends, and accelerate the translation of scientific evidence into breeding strategies and sustainable agricultural practices.
The purpose of this project is to design a knowledge base system that curates data from the literature, links it to Gene Ontology (GO) and Trait Ontology (TO) terms, and provides gene-phenotype associations.
Existing knowledge bases such as MaizeGDB and Gramene provide maize genomic data, but they lack the detailed literature evidence needed to explain complex gene-phenotype associations.
To address this limitation, we propose MaizeLitBase, a knowledge database that fills these gaps. It covers maize-related entity recognition, relation extraction, ontology linking, curation of the abundant literature, and data denoising, providing rich resources drawn from the literature.
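
Entity recognition is the first step of this pipeline. As a minimal illustration only, the sketch below tags a sentence with a toy dictionary and emits (mention, entity type, identifier, (start, end)) tuples, the layout used by the MaizeLitBase sentence annotations. The two lexicon entries are copied from the PPDK example records; the function name tag_sentence is hypothetical, and the real annotations come from tools such as PubTator and OGER, not from exact string matching.

```python
import re

# Hypothetical toy lexicon; identifiers are taken from the PPDK example records.
# A real system would load a curated lexicon (e.g. derived from MaizeGDB) instead.
LEXICON = {
    "PPDK": ("Gene", 542759),
    "CO2": ("Chemical", "MESH:D002245"),
}


def tag_sentence(sentence, lexicon=LEXICON):
    """Return (mention, type, identifier, (start, end)) tuples for one sentence.

    Offsets are 0-based character positions with an exclusive end,
    the convention observed in the example annotations.
    """
    annotations = []
    for term, (entity_type, identifier) in lexicon.items():
        # \b stops matches inside longer tokens, but "PPDK-cDNA" still matches PPDK
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
            annotations.append((term, entity_type, identifier, match.span()))
    # Sort by start offset so the output follows reading order
    return sorted(annotations, key=lambda a: a[3][0])
```

Exact dictionary matching misses synonyms and spelling variants and cannot disambiguate shared symbols, which is why dedicated annotation tools and a separate evaluation step are needed.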
Current statistics for the MaizeLitBase raw data:

| Metric | Value |
|---|---|
| Years covered | 1985-2025 |
| Total raw sentences | 3,681,891 |
| Sentence annotations (PubTator) | 2,799,305 |
| Total PMIDs | 34,757 |
| Total gene mentions (annotated) | 79,922 |
| Unique genes (raw, annotated) | 29,598 |

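
A few derived figures put these counts in perspective. The sketch below computes them; it assumes that the 2,799,305 PubTator figure counts annotated sentences rather than individual annotations, which the statistics do not state explicitly, and the dictionary keys are illustrative names.

```python
# Counts copied from the MaizeLitBase raw-data statistics.
# ASSUMPTION: "annotated_sentences" = number of sentences carrying PubTator annotations.
STATS = {
    "total_sentences": 3_681_891,
    "annotated_sentences": 2_799_305,
    "pmids": 34_757,
    "gene_mentions": 79_922,
    "unique_genes": 29_598,
}


def derived_ratios(stats):
    """Return simple per-article and per-gene ratios, rounded to 2 decimals."""
    return {
        # average number of sentences extracted per PubMed article
        "sentences_per_pmid": round(stats["total_sentences"] / stats["pmids"], 2),
        # share of sentences with annotations (valid only under the assumption above)
        "annotated_pct": round(100 * stats["annotated_sentences"] / stats["total_sentences"], 2),
        # average number of mentions per distinct gene
        "mentions_per_gene": round(stats["gene_mentions"] / stats["unique_genes"], 2),
    }


print(derived_ratios(STATS))
# {'sentences_per_pmid': 105.93, 'annotated_pct': 76.03, 'mentions_per_gene': 2.7}
```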

Example raw records:

```json
[
  {
    "sentence_index": 199548,
    "pmid": "8541493",
    "gene": "PPDK",
    "sentence": "Pyruvate orthophosphate dikinase (PPDK) is a key enzyme of C4 photosynthesis providing the acceptor molecule for the primary CO2 fixation in the mesophyll cells.",
    "annotation": "[('CO2', 'Chemical', 'MESH:D002245', (125, 128)), ('PPDK', 'Gene', 542759, (34, 38))]"
  },
  {
    "sentence_index": 199551,
    "pmid": "8541493",
    "gene": "PPDK",
    "sentence": "Sequence analysis of the entire gene reveals that its coding sequence is identical to the previous isolated PPDK-cDNA from this species.",
    "annotation": "[('PPDK', 'Gene', 542759, (108, 112))]"
  }
]
```
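
| Field | Description |
|---|---|
| sentence_index | Unique identifier of the sentence |
| pmid | PubMed ID of the source article |
| gene | Gene associated with the record |
| sentence | The sentence itself, used as literature evidence |
| annotation | Entity annotations found in the sentence, each given as (mention, entity type, identifier, (start, end) character offsets) |
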
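
Because the annotation field is a string holding a Python-style list of tuples rather than nested JSON, clients have to decode it. The sketch below does this with ast.literal_eval and checks each offset against the sentence text. The Mention type and the parse_annotations name are hypothetical, and the 0-based, end-exclusive offset convention is inferred from the two example records (PPDK occupies the span (34, 38) of the first sentence), not from a published specification.

```python
import ast
from typing import NamedTuple


class Mention(NamedTuple):
    text: str
    entity_type: str
    identifier: object  # int for genes (e.g. 542759), str for MeSH IDs
    start: int
    end: int


def parse_annotations(record):
    """Decode the string-encoded annotation list of one MaizeLitBase record."""
    mentions = []
    # literal_eval evaluates Python literals only, so no arbitrary code runs
    for text, entity_type, identifier, (start, end) in ast.literal_eval(record["annotation"]):
        # ASSUMPTION: offsets are 0-based and end-exclusive; verify they hit the mention
        if record["sentence"][start:end] != text:
            raise ValueError(f"offset mismatch for {text!r} in sentence {record['sentence_index']}")
        mentions.append(Mention(text, entity_type, identifier, start, end))
    return mentions
```

The offset check doubles as a cheap data-quality test: any record whose spans do not reproduce the mention text can be flagged for review.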

| Endpoint | URL |
|---|---|
| Fetch all genes | http://lit-evi.hzau.edu.cn/MaizeAlterome/all-genes/ |
| Fetch all PMIDs | http://lit-evi.hzau.edu.cn/MaizeAlterome/all-pmids/ |
| Search by a single gene | http://lit-evi.hzau.edu.cn/MaizeAlterome/searchbygene/?gene=PPDK |
| Search by a single PMID | http://lit-evi.hzau.edu.cn/MaizeAlterome/searchbypmid/?pmid=8541493 |

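
The endpoints follow a simple pattern, so query URLs can be built programmatically. The helper names below are hypothetical, and fetch_json assumes the service returns JSON records shaped like the examples; the actual response format should be checked against the live service before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://lit-evi.hzau.edu.cn/MaizeAlterome"


def build_url(endpoint, **params):
    """Compose an endpoint URL, e.g. build_url("searchbygene", gene="PPDK")."""
    url = f"{BASE_URL}/{endpoint}/"
    # urlencode percent-escapes values, so gene names with special characters stay valid
    return f"{url}?{urlencode(params)}" if params else url


def fetch_json(url, timeout=30):
    """Download and decode a response. ASSUMPTION: the endpoint returns JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, fetch_json(build_url("searchbypmid", pmid="8541493")) would request the PMID search URL listed in the endpoint table.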

| # | Project Name | Purpose | Participants |
|---|---|---|---|
| 1. | Sentence Classification in MaizeLitBase Based on Semantic Abundance | Identify whether sentences contain important functional information about maize | Ruixiang/瑞祥, Jingbo, Zeyu, Tasos, Ken |
| 2. | Construction of the Gold Standard Validation Dataset for MaizeLitBase | Semi-automatically annotate Gene Ontology (GO) and Trait Ontology (TO) terms in sentences using large language models with manual verification, creating a reference dataset to guide subsequent annotation workflows. | Fumin/付敏, Jingbo, Pierre, Robert, Jin-Dong, Ken, Xinzhi |
| 3. | Graph Neural Network-Based Prediction of Gene-Trait Associations | Use graph neural networks (GNNs) to predict gene-trait associations and identify key genes responsible for specific phenotypic traits. | KangKang/琪瑞, Pierre, Robert, Yuxing |
| 4. | Named Entity Recognition from Sentences | Extract genes, traits, pathways, and enzymes from sentences (using MaizeGDB and OGER++) | Javeed/贾伟, Claire, Jin-Dong, Yanhong |
| 5. | OGER Annotation for MaizeLitBase and Evaluation | Use OGER to annotate MaizeLitBase and evaluate the results | Yawen/雅文, Claire, Yanhong/艳红, Xinzhi, Javeed |
| 6. | Linking MaizeGDB and KEGG Data with the Literature | Integrate literature data with MaizeGDB and KEGG (other databases are possible) | Javeed/贾伟, Pierre |
| 7. | Data Quality Evaluation for Annotated Data | Evaluate the annotated data and check its quality | Javeed/贾伟, Anne-Sophie, Robert |
| 8. | Building a Knowledge Graph | Gain hands-on experience with BioCypher or RDF-config by building a KG from MaizeLitBase (gene and PMID entities) | Pierre/皮尔, Javeed, Yawen, Zhengcan |
| 9. | Intellectual Property Considerations | Identify the rules for downloading, storing, processing, and publishing texts for academic text mining research. | Claire/克莱尔, Jingbo, Yawen |
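
For project 8, the core of a gene-PMID knowledge graph can be derived directly from the raw records. The sketch below uses plain Python rather than the BioCypher or RDF-config APIs; it assumes records shaped like the example raw records, and the gene:/pmid: node prefixes and the function name are hypothetical. Each edge keeps the sentence_index values that support it, so every link stays traceable to its evidence sentences.

```python
from collections import defaultdict


def build_gene_pmid_edges(records):
    """Map (gene node, PMID node) pairs to the sentence indexes supporting them."""
    evidence = defaultdict(set)
    for rec in records:
        # Hypothetical prefixes keep gene and PMID identifiers in separate namespaces
        edge = (f"gene:{rec['gene']}", f"pmid:{rec['pmid']}")
        evidence[edge].add(rec["sentence_index"])
    # Sorted lists give deterministic output for export or comparison
    return {edge: sorted(ids) for edge, ids in evidence.items()}


# Only the fields needed here, copied from the two PPDK example records
records = [
    {"sentence_index": 199548, "pmid": "8541493", "gene": "PPDK"},
    {"sentence_index": 199551, "pmid": "8541493", "gene": "PPDK"},
]
print(build_gene_pmid_edges(records))
# {('gene:PPDK', 'pmid:8541493'): [199548, 199551]}
```

The resulting edge map could then be serialized as nodes and edges for BioCypher or as RDF triples.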

| Name | Affiliation |
|---|---|
| Anastasios Nentidis | NCSR "Demokritos", Greece. |
| Anne-Sophie Foussat | French National Research Institute for Agriculture, Food and Environment (INRAE), France. |
| Claire Nédellec | French National Research Institute for Agriculture, Food and Environment (INRAE), France. |
| Fabio Rinaldi | Dalle Molle Institute for Artificial Intelligence Research (IDSIA), Switzerland. |
| Georgios Paliouras | NCSR "Demokritos", Greece. |
| Jin-Dong Kim | Database Center for Life Science (DBCLS), ROIS, Japan. |
| Kon Woo Kim | National Institute of Informatics (NII), Japan. |
| Martin Krallinger | Barcelona Supercomputing Center, Spain. |
| Pierre Larmande | French National Research Institute for Sustainable Development (IRD), University of Montpellier, France. |
| Robert Bossy | French National Research Institute for Agriculture, Food and Environment (INRAE), France. |
| Xinzhi Yao | Huazhong Agricultural University, China. |
| Yuxing Wang | Sun Yat-sen University, China. |