
Prof. Dr. Barend Mons, FISC (born 1957, The Hague) is a molecular biologist by training and a leading FAIR data specialist. He spent the first decade of his scientific career on fundamental research on malaria parasites and later on translational research for malaria vaccines. In 2000 he switched to advanced data stewardship and (biological) systems analytics. He is best known for innovations in scholarly collaboration, especially nanopublications and knowledge-graph-based discovery. In 2014 he organised the seminal FAIR conference at the Lorentz Center that led to the FAIR data initiative and GO FAIR. In 2012 he was appointed full Professor in biosemantics in the Department of Human Genetics at the Leiden University Medical Center (LUMC) in the Netherlands. In 2015 Barend was appointed chair of the High Level Expert Group on the European Open Science Cloud. Since 2017 Barend has headed the International Support and Coordination office of the GO FAIR initiative as director of the GO FAIR Foundation. From 2018 to 2023 he was the elected president of CODATA, the affiliated organisation of the International Science Council for research-data-related issues. He has also been the European representative on the Board on Research Data and Information (BRDI) of the National Academies of Sciences, Engineering, and Medicine in the USA. In 2024 he was appointed Fellow of the International Science Council. Upon his retirement in 2024 he was knighted by the Dutch King in the Order of the Dutch Lion, the oldest and highest Dutch honour for cultural and scientific contributions to society. He currently leads the Leiden Institute for FAIR and Equitable Science. He is a frequent keynote speaker on FAIR and open science around the world, and continues to participate in various national and international boards.
Abstract
The rapid developments in the field of machine learning have also brought along some existential challenges, which are in essence all related to the broad concept of 'trust'. Aspects of this broad concept include trust in the output of any ML process (and the prevention of black boxes, hallucinations and so forth). The very trust in science is at stake, especially now that LLMs can generate 'good-looking nonsense' and paper mills emerge in response to the perverse reward systems in current research environments. The other side of the same coin is that ML, if not properly controlled, will also break through security and privacy barriers and violate the GDPR and other Ethical, Legal and Societal barriers, including equitability. In addition, the existence of data 'somewhere' by no means automatically implies its actual Reusability. This concerns the by now well-established four elements of the FAIR principles: much data is not even Findable; if found, it is not Accessible under well-defined conditions; and if accessed, it is not Interoperable (understandable by third parties and machines). As a result, the vast majority of data and information is not Reusable without violating copyrights, privacy regulations, or the basic conceptual models that implicitly or explicitly underpin the query or the deep learning algorithm. Now that more and more data will also be 'independently' used by machines, all these challenges will be severely aggravated.

This keynote will address how 'data visiting', as opposed to classical 'data sharing' (which carries the connotation of data downloads, transport and losing control), mitigates most, if not all, of these unwanted side effects.
For federated data visiting, the data should be FAIR in an additional sense: they should be 'Federated, AI-Ready', so that visiting algorithms can answer questions related to Access Control, Consent, and Format, and can read rich (FAIR) metadata about the data itself to determine whether they are 'fit for purpose' and machine actionable (i.e. FAIR Digital Objects, or Machine Actionable Units). The 'fitness for purpose' concept goes way beyond (but includes) information about methods, quality, error bars etc. The 'immutable logging' of all operations of visiting algorithms is crucial, especially when self-learning algorithms in 'swarm learning' are being used. Enough to keep us busy for a while.
https://www.nature.com/articles/s41586-021-03583-3
