Dear Editor,
With great interest, I read the recent article entitled “Characterization of lingual microbiota in pediatric geographic tongue” published in The Turkish Journal of Pediatrics.1 The study provides valuable insights into the potential role of oral microbiota dysbiosis in the pathogenesis of geographic tongue in children. I would like to comment on the use of Linear discriminant analysis Effect Size (LEfSe) for analyzing the microbiota data and the considerations when applying this method to small sample sizes.
LEfSe is a Python-based software tool that integrates statistical testing with biological consistency estimation to identify features that are significantly enriched in one or more investigator-defined groups.2 Although LEfSe has been employed in thousands of microbiome studies, recent research indicates that the method is susceptible to false positives.3 Because LEfSe does not perform false discovery rate (FDR) correction, it can identify a large number of false positives even in the absence of a distinguishing signal, which can mislead research conclusions and reduce the reliability of a study. Cho et al. also found that LEfSe frequently exhibits type I error rates exceeding 5%, indicating a potential to erroneously identify non-significant genes as differentially expressed.4 In addition, the LEfSe method is unstable in sparse datasets, and its results are sensitive to the degree of data sparsity.5 Therefore, caution is warranted when interpreting results obtained from LEfSe analysis of sparse microbiome data, and appropriate validation is necessary.
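The false-positive concern can be illustrated with a minimal simulation. This is only a sketch under simplifying assumptions: it draws independent uniform p-values to stand in for features with no true group difference, and it does not reproduce LEfSe's actual testing pipeline. The function name `count_null_hits` and its parameters are hypothetical.

```python
import random


def count_null_hits(n_features, alpha=0.05, seed=0):
    """Count features with p < alpha when no feature truly differs.

    Under the null hypothesis, p-values are uniform on [0, 1], so about
    n_features * alpha features pass an uncorrected threshold by chance.
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_features)]
    return sum(p < alpha for p in p_values)


if __name__ == "__main__":
    # With 1000 null features, roughly 50 are expected to pass p < 0.05.
    print(count_null_hits(1000))
```

Under these assumptions, the number of spurious hits grows in proportion to the number of features tested, which is why an uncorrected threshold is risky for high-dimensional microbiome data.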
If the authors intend to retain this approach, it is advisable to acknowledge the resulting limitations in the interpretation of the results as a potential weakness. It should also be noted that the methodology section of the paper does not mention the application of FDR correction. Given that LEfSe is employed, it is strongly recommended to apply FDR correction to reduce the risk of false positives. To mitigate potential biases inherent in single-method analyses, LEfSe-derived results should be cross-validated against outcomes from complementary approaches, such as ANCOM-BC (for compositionally aware analysis) and ALDEx2 (for sparse data robustness).6,7 Consensus features identified across these independent frameworks can then be prioritized as high-confidence biomarkers, thereby reducing the risk of false discoveries. Furthermore, to enhance the robustness of the findings, it is recommended to reuse publicly available data by downloading data from similar cohorts from NCBI SRA or EBI MGnify to expand the sample size.
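The two recommendations above, FDR correction and consensus across methods, can be sketched as follows. This is a minimal illustration, assuming the Benjamini-Hochberg procedure as the FDR method and per-feature p-values exported from each tool; the letter does not prescribe a specific procedure. The function names and the example taxon labels are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def consensus_features(*feature_sets):
    """Return the features reported as significant by every method."""
    if not feature_sets:
        return set()
    return set.intersection(*(set(s) for s in feature_sets))


if __name__ == "__main__":
    # Hypothetical p-values for four features.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
    # Hypothetical significant taxa from three methods.
    lefse = {"taxon_A", "taxon_B", "taxon_C"}
    ancombc = {"taxon_A", "taxon_C"}
    aldex2 = {"taxon_A", "taxon_D"}
    print(consensus_features(lefse, ancombc, aldex2))
```

Taking the strict intersection is a conservative design choice; a looser rule, such as requiring agreement between at least two of the three methods, would trade some specificity for sensitivity.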
In conclusion, this study revealed the association between pediatric geographic tongue and dysbiosis of the lingual microbiota, providing new insights into the pathogenesis of geographic tongue. While LEfSe is a valuable tool for microbiome analysis, its limitations, particularly in small sample sizes and sparse datasets, necessitate careful interpretation and complementary validation.
Source of funding
The author declares that the study is supported by the Shanghai Municipal Education Commission Professional Master’s Degree Excellent Cultivation Research Project (No. 2023PYZX20), the 2022 Research Project of Shanghai Sanda University (No. 2022BSZX04), and the 4th Priority Subject of Nursing.
Conflict of interest
The author declares that there is no conflict of interest.
References
1. You Y, He Y, Huang P. Characterization of lingual microbiota in pediatric geographic tongue. Turk J Pediatr 2024; 66: 448-456. https://doi.org/10.24953/turkjpediatr.2024.4638
2. Segata N, Izard J, Waldron L, et al. Metagenomic biomarker discovery and explanation. Genome Biol 2011; 12: R60. https://doi.org/10.1186/gb-2011-12-6-r60
3. Nearing JT, Douglas GM, Hayes MG, et al. Microbiome differential abundance methods produce different results across 38 datasets. Nat Commun 2022; 13: 342. https://doi.org/10.1038/s41467-022-28034-z
4. Cho H, Qu Y, Liu C, et al. Comprehensive evaluation of methods for differential expression analysis of metatranscriptomics data. Brief Bioinform 2023; 24: bbad279. https://doi.org/10.1093/bib/bbad279
5. Wallen ZD. Comparison study of differential abundance testing methods using two large Parkinson disease gut microbiome datasets derived from 16S amplicon sequencing. BMC Bioinformatics 2021; 22: 265. https://doi.org/10.1186/s12859-021-04193-6
6. Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat Commun 2020; 11: 3514. https://doi.org/10.1038/s41467-020-17041-7
7. Fernandes AD, Reid JN, Macklaim JM, McMurrough TA, Edgell DR, Gloor GB. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome 2014; 2: 15. https://doi.org/10.1186/2049-2618-2-15
Copyright and license
Copyright © 2025 The Author(s). This is an open access article distributed under the Creative Commons Attribution License (CC BY), which permits unrestricted use, distribution, and reproduction in any medium or format, provided the original work is properly cited.