Shared Task for MaizeLitBase 🌽 2025

Wuhan, Oct-Nov, 2025

“Both for academic research & student training.”

Academic research purpose: to build a comprehensive knowledge system for maize breeding, supported by sentence-level evidence from the literature.

Student training purpose: to give researchers in their early academic career a full round of academic training.


Maize (Zea mays L.) is one of the world’s most important cereal crops, serving as food, feed, and raw material for industry. It has a large and complex genome, rich in genetic diversity, which makes it a meaningful organism for studying plant genetics and evolution. Effectively navigating the growing knowledge base requires literature mining techniques that can extract, classify, and organize information from research publications. Evidence collection from diverse sources—such as experimental reports, reviews, and field studies—supports the identification of key traits like yield performance, drought resilience, and disease resistance. By applying computational approaches to literature resources, researchers can synthesize existing findings, detect emerging trends, and accelerate the translation of scientific evidence into breeding strategies and sustainable agricultural practices.

Purpose

The purpose of this project is to design a knowledge base system that curates data from the literature, links it to Gene Ontology (GO) and Trait Ontology (TO) terms, and provides gene-phenotype associations.
Current knowledge bases such as MaizeGDB and Gramene provide maize genomic data, but they lack the detailed evidence found in the literature for explaining complex gene-phenotype associations.
To address this limitation, MaizeLitBase is proposed as a knowledge base that fills these gaps. It covers maize-related entity recognition, relation extraction, ontology linking, curation of the abundant literature, and data denoising, so that the rich resources in the literature become usable.

Selected Topics

  • Document/Sentence-Level Semantic Indexing
  • Data Quality Control (e.g. data denoising, confidence scoring)
  • Named Entity Recognition (genes & traits for maize)
  • Concept Normalization (for maize trait ontology)
  • Relation Extraction (e.g. gene-phenotype)
  • Multi-Omics Data Curation (e.g. genome sequencing data)
  • Knowledge Graph Construction
  • LLM-Driven Methods Enhancement for the Above

Data Statistics

Statistics for the current MaizeLitBase raw data:
Year range: 1985-2025
Total raw sentences: 3,681,891
Sentence annotations (PubTator): 2,799,305
Total PMIDs: 34,757
Total gene mentions (annotated): 79,922
Unique genes (raw) (annotated): 29,598

Data format

1. Example (JSON)



[
  {
    "sentence_index": 199548,
    "pmid": "8541493",
    "gene": "PPDK",
    "sentence": "Pyruvate orthophosphate dikinase (PPDK) is a key enzyme of C4 photosynthesis providing the acceptor molecule for the primary CO2 fixation in the mesophyll cells.",
    "annotation": "[('CO2', 'Chemical', 'MESH:D002245', (125, 128)), ('PPDK', 'Gene', 542759, (34, 38))]"
  },
  {
    "sentence_index": 199551,
    "pmid": "8541493",
    "gene": "PPDK",
    "sentence": "Sequence analysis of the entire gene reveals that its coding sequence is identical to the previous isolated PPDK-cDNA from this species.",
    "annotation": "[('PPDK', 'Gene', 542759, (108, 112))]"
  }
]

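Format description:

"sentence_index": unique identification number of the sentence
"pmid": PubMed ID of the source article
"gene": gene symbol associated with the sentence
"sentence": the sentence text (literature evidence)
"annotation": entity annotations for the sentence, stored as a string; each entry gives the mention text, the entity type, an identifier, and the (start, end) character offsets of the mention in the sentence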
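
The annotation field in the example above is a string that looks like a Python list of tuples, not nested JSON. The following is a minimal sketch of one way to decode it, assuming every record follows the layout of the example. The function name parse_annotation and the output keys are illustrative and not part of MaizeLitBase.

```python
import ast
import json

# One record copied from the example above.
RAW = '''[{"sentence_index":199551,"pmid":"8541493","gene":"PPDK","sentence":"Sequence analysis of the entire gene reveals that its coding sequence is identical to the previous isolated PPDK-cDNA from this species.","annotation":"[('PPDK', 'Gene', 542759, (108, 112))]"}]'''


def parse_annotation(record):
    """Decode the annotation string of one record into a list of dicts.

    Assumes each entry is (mention text, entity type, identifier, (start, end)),
    as in the example record. The key names below are hypothetical.
    """
    mentions = []
    # ast.literal_eval only evaluates Python literals, so it is safe for this string.
    for text, entity_type, identifier, (start, end) in ast.literal_eval(record["annotation"]):
        mentions.append(
            {"text": text, "type": entity_type, "id": identifier, "start": start, "end": end}
        )
    return mentions


records = json.loads(RAW)
mentions = parse_annotation(records[0])

# In the example, the offsets slice the sentence back to the mention text.
for m in mentions:
    assert records[0]["sentence"][m["start"]:m["end"]] == m["text"]

print(mentions)
```

In the two example records the offsets behave as 0-based, end-exclusive character positions, which is why the slice check passes. Whether this holds for every record in the data set would need to be verified.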



2. Data API

Fetch all genes: http://lit-evi.hzau.edu.cn/MaizeAlterome/all-genes/
Fetch all PMIDs: http://lit-evi.hzau.edu.cn/MaizeAlterome/all-pmids/
Search by one gene: http://lit-evi.hzau.edu.cn/MaizeAlterome/searchbygene/?gene=PPDK
Search by one PMID: http://lit-evi.hzau.edu.cn/MaizeAlterome/searchbypmid/?pmid=8541493
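
The endpoints listed above can be called from a script. The sketch below only shows how the URLs might be assembled and fetched with the Python standard library. The helper names build_url and fetch are illustrative. The response format is not specified here, so the body is returned as plain text, and decoding it as UTF-8 is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the URLs listed above.
BASE_URL = "http://lit-evi.hzau.edu.cn/MaizeAlterome"


def build_url(endpoint, **params):
    """Build an endpoint URL, e.g. build_url("searchbygene", gene="PPDK")."""
    url = f"{BASE_URL}/{endpoint}/"
    if params:
        url += "?" + urlencode(params)
    return url


def fetch(url, timeout=30):
    """Download the raw response body as text (UTF-8 is an assumption)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


gene_url = build_url("searchbygene", gene="PPDK")
pmid_url = build_url("searchbypmid", pmid="8541493")
print(gene_url)
print(pmid_url)

# Uncomment to query the live service (requires network access):
# print(fetch(gene_url)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the URLs offline before sending any request.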

Projects in ST-MaizeLitBase

| # | Project Name | Purpose | Participants |
|---|---|---|---|
| 1 | Sentence Classification in MaizeLitBase Based on Semantic Abundance | Identify whether a sentence carries important functional information about maize | Ruixiang/瑞祥, Jingbo, Zeyu, Tasos, Ken |
| 2 | Construction of the Gold Standard Validation Dataset for MaizeLitBase | Semi-automatically annotate Gene Ontology (GO) and Trait Ontology (TO) terms in sentences using large language models assisted by manual verification, thereby creating a reference dataset to guide subsequent annotation workflows | Fumin/付敏, Jingbo, Pierre, Robert, Jin-Dong, Ken, Xinzhi |
| 3 | Graph Neural Network-Based Prediction of Gene-Trait Associations | Use a GNN to predict gene-trait associations and identify key genes responsible for specific phenotypic traits | KangKang/琪瑞, Pierre, Robert, Yuxing |
| 4 | Named Entity Recognition from Sentence | Extract genes, traits, pathways, and enzymes from a sentence {MaizeGDB, OGER++} | Javeed/贾伟, Claire, Jin-Dong, Yanhong |
| 5 | OGER Annotation for MaizeLitBase and Evaluation | Use OGER to annotate MaizeLitBase and evaluate the results | Yawen/雅文, Claire, Yanhong/艳红, Xinzhi, Javeed |
| 6 | Data linking MaizeGDB and KEGG with literature | Integrate literature data with MaizeGDB and KEGG {other databases are possible} | Javeed/贾伟, Pierre |
| 7 | Data quality evaluation for Annotated Data | Perform the evaluation step and check the quality of the data | Javeed/贾伟, Anne-Sophie, Robert |
| 8 | Building a Knowledge Graph | Hands-on use of BioCypher or RDF-config to build a KG from MaizeLitBase {Gene, PMID} | Pierre/皮尔, Javeed, Yawen, Zhengcan |
| 9 | Intellectual property considerations | Identify the rules for downloading, storing, processing and publishing texts for text mining in academic research | Claire/克莱尔, Jingbo, Yawen |
Notes for all Projects (Click Here)
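
As a rough illustration of the {Gene, PMID} graph mentioned for the knowledge graph project, the sketch below groups sentence-level records into gene-to-PMID edges using plain Python. It does not use BioCypher or RDF-config. The function name and the edge layout are assumptions made for illustration only, and the records are trimmed copies of the Data format example.

```python
from collections import defaultdict

# Trimmed copies of the two example records (only the fields used here).
records = [
    {"sentence_index": 199548, "pmid": "8541493", "gene": "PPDK"},
    {"sentence_index": 199551, "pmid": "8541493", "gene": "PPDK"},
]


def build_gene_pmid_edges(records):
    """Map each (gene, pmid) pair to the sentence indexes that support it.

    Each key is one hypothetical "gene mentioned in article" edge; the value
    lists the sentences that serve as evidence for that edge.
    """
    edges = defaultdict(list)
    for record in records:
        edges[(record["gene"], record["pmid"])].append(record["sentence_index"])
    return dict(edges)


edges = build_gene_pmid_edges(records)
for (gene, pmid), evidence in edges.items():
    print(f"{gene} -> PMID {pmid}: {len(evidence)} supporting sentence(s)")
```

Keeping the sentence indexes on each edge preserves the link back to the literature evidence, which is the main point of a sentence-level knowledge base.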

Talks in ST-MaizeLitBase

Program Committee

| Name | Affiliation |
|---|---|
| Anastasios Nentidis | NCSR "Demokritos", Greece |
| Anne-Sophie Foussat | French National Research Institute for Agriculture, Food and Environment (INRAE), France |
| Claire Nédellec | French National Research Institute for Agriculture, Food and Environment (INRAE), France |
| Fabio Rinaldi | Dalle Molle Institute for Artificial Intelligence Research (IDSIA), Switzerland |
| Georgios Paliouras | NCSR "Demokritos", Greece |
| Jin-Dong Kim | Database Center for Life Science, ROIS, Japan |
| Kon Woo Kim | National Institute of Informatics (NII), Japan |
| Martin Krallinger | Barcelona Supercomputing Center, Spain |
| Pierre Larmande | French National Research Institute for Sustainable Development (IRD), University of Montpellier, France |
| Robert Bossy | French National Research Institute for Agriculture, Food and Environment (INRAE), France |
| Xinzhi Yao | Huazhong Agricultural University, China |
| Yuxing Wang | Sun Yat-sen University, China |