Graduate School Lecture: "Overview of Advanced Medical Science Research"

Introduction to Publicly Available Web Tools for Clinical Big Data Analysis

Sakura Eri MAEZONO, Ph.D. 
Bioinformatics Assistant Professor

2023-10-24

My Profile


PROFESSIONAL EXPERIENCE


Bioinformatics Assistant Professor, April 2023-Present
Advanced Medical Research Center, Yokohama City University

Bioinformatics Researcher, 2020-March 2023
Analytics team, Craif Inc., Japan 



EDUCATION


University of Tsukuba 2015-2020
Ph.D. in Human Biology
Graduate School of Integrative and Global Majors


SKILLS


Research and Study Design・Statistical Analysis Planning and Implementation・Clinical Data Management・R/Shiny/Bioconductor・Python/Streamlit/Django・JavaScript (Google Apps Script)・Version Control with Git・Machine Learning

Sakura Eri Maezono, Ph.D.

What is Bioinformatics?


Past research | Bioinformatics Researcher at Craif Inc. 

Past research | Cancer risk assessment kit: Craif miSignal®


Seven cancer types

  • Esophagus
  • Lung
  • Breast
  • Stomach
  • Pancreas
  • Colorectal
  • Ovary

Responsibilities

  • Clinical data management
  • miRNA expression data preprocessing
  • Biomarker discovery analyses
  • Prediction algorithm development


Past research | How miSignal® Works


Past research | microRNAs as Cancer biomarkers


Why miRNAs?

  • They are involved in regulating gene expression
  • They are associated with various aspects of cancer, including diagnosis, classification, prognosis, and treatment response
  • Their versatility and non-invasive detection make them valuable tools in oncology

Past research | Why Urine?

Why not blood?

  • Blood sampling requires needles, whereas urine collection is non-invasive
  • Fewer miRNA species are detected in blood (~600 vs. >1,300 in urine), and impurities such as proteins may interfere with detection, whereas urine has few contaminants because it is filtered by the kidneys

Past research | Role of Bioinformatics in developing miSignal®



Bioinformatics Laboratory


https://www-user.yokohama-cu.ac.jp/~bioinfo/

Bioinformatics Educational Portal


https://edu.med.yokohama-cu.ac.jp/

Lecture Agenda


Part 1 Intro to Clinical Big Data Analysis


Part 2 Bioinformatics Web Tools


Part 3 Challenges and Future of Clinical Big Data Analysis


Part 1: Intro to Clinical Big Data Analysis

What is Clinical Big Data Analysis?


The process of extracting valuable insights from vast and diverse datasets related to healthcare and medicine

Importance of Clinical Big Data Analysis


The complete sequencing of the human genome has helped reveal the genetic contribution to many diseases.
Applications of clinical big data analysis include the following:

  • drug discovery: Drug target identification and drug candidate screening can be accelerated, and safer, more effective drugs can be developed based on molecular modelling and simulation

  • personalized medicine: A patient's genetic profile can help the doctor predict susceptibility to certain diseases and prescribe the appropriate medication at the appropriate dose to reduce side effects

  • gene therapy: Identifying the best gene target site for each individual by taking their genetic profile into consideration can reduce the risk of unintended side effects

  • preventive medicine: Genomics, proteomics, and metabolomics data are analyzed for possible disease biomarkers to develop screening tests that identify the disease at an early stage

It always starts with…

Types of data involved

  • Patient Data: Demographic information, lifestyle factors, and health-related behaviors
  • Electronic Health Records (EHRs): Comprehensive patient records, including medical history, treatment, and lab results
  • Genomics Data: Information about an individual’s genetic makeup, including DNA sequences and variations


Available Big Data (databases) and how to access them

How to access?

Each database provides its own access instructions, but access is usually via one of the following:

⭐️Direct download from the website       ⭐️FTP server      ⭐️API (Shell/Python/R)
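As an illustration of the API route, here is a minimal Python sketch (standard library only) that builds a query for the NCI Genomic Data Commons (GDC) REST API `projects` endpoint and extracts project IDs from the JSON reply. The endpoint and the `fields`, `size`, and `format` parameters follow the GDC API documentation; the helper function names are hypothetical, and the sample response is a hand-written stand-in with the same shape, so the parsing step can be tried without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GDC_PROJECTS_ENDPOINT = "https://api.gdc.cancer.gov/projects"


def build_projects_url(fields, size=10):
    """Build a GDC /projects query URL (hypothetical helper)."""
    params = {
        "fields": ",".join(fields),  # comma-separated list of fields to return
        "size": size,                # number of records per page
        "format": "JSON",
    }
    return f"{GDC_PROJECTS_ENDPOINT}?{urlencode(params)}"


def extract_project_ids(response_text):
    """Return the project_id of each hit in a GDC-style JSON response."""
    payload = json.loads(response_text)
    return [hit["project_id"] for hit in payload["data"]["hits"]]


def fetch_project_ids(fields=("project_id", "name"), size=10):
    """Query the live API (requires network access)."""
    with urlopen(build_projects_url(fields, size), timeout=30) as response:
        return extract_project_ids(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demonstration with a hand-written response of the same shape.
    sample = '{"data": {"hits": [{"project_id": "TCGA-BRCA", "name": "Breast Invasive Carcinoma"}]}}'
    print(build_projects_url(["project_id", "name"], size=5))
    print(extract_project_ids(sample))
```

Only `fetch_project_ids` touches the network; check the GDC documentation for the fields and filters available on each endpoint before relying on them.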

After selecting data, analysis can be done…


  • via R, Python, and other programming languages
  • via publicly available web tools


Part 2: Bioinformatics Web Tools

Publicly Available Web Tools


  • online applications accessible to a wide audience, often free/low-cost
  • aid in processing, managing, and interpreting large clinical/genomic datasets

Main purpose

Public web tools turn data into interpretable results, democratizing advanced data analysis: no complex installations, programming skills, or high expenses are required

Advantages

  • Accessibility: easy access to advanced data analysis capabilities for healthcare professionals and researchers
  • Cost-Effectiveness: often free/affordable, reducing financial barriers
  • Community Support: users benefit from a collaborative community, sharing knowledge and solutions, enhancing the tools’ utility and troubleshooting capabilities

Examples of Publicly Available Web Tools


How to Access and Use Web Tools


  • Most tools explain on their websites how to access them (web links, registration, etc.)
  • A step-by-step guide (documentation) on how to use one or more of these tools for data analysis is usually provided

Case Studies – Publishing using Web tools


⭐️ Case Study 1: GDC Data Portal

TCGA data (available through the GDC Data Portal) enabled the researchers to analyze cancer stemness in ~12,000 samples of 33 tumor types

Publication: Malta T.M., Sokolov A., Gentles A.J., et al. Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation. Cell, 173(2), 338-354.e15, 2018. https://doi.org/10.1016/j.cell.2018.03.034

⭐️ Case Study 2: cBioPortal

The researchers investigated AKT1, AKT2, AKT3, CHUK, GSK3β, EGFR, PTEN, and PIK3AP1 as components of EGFR-PI3K-AKT-mTOR signaling in diffuse gliomas, using data from cBioPortal

Publication: Brlek P., Kafka A., Bukovac A., Pećina-Šlaus N. Integrative cBioPortal Analysis Revealed Molecular Mechanisms That Regulate EGFR-PI3K-AKT-mTOR Pathway in Diffuse Gliomas of the Brain. Cancers, 13, 3247, 2021. https://doi.org/10.3390/cancers13133247

Tips & Best Practices and Pitfalls to Avoid


DOs

Know Your Data – Understand the format and quality of your data

Take your time with Data prep – Clean and preprocess data as needed

Select Appropriate tools – Choose the right tool for your analysis

Read Documentation – Study tool guides and understand their limitations

Pay attention to Parameters – Set tool parameters carefully

Record Parameters – Keep records for reproducibility

Validate results – Verify results with independent data or experiments

Secure Data – Comply with data privacy regulations

Seek Help – Collaborate or ask for assistance if needed       
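Recording parameters does not need special software. The sketch below, a hypothetical helper rather than part of any particular web tool, collects the tool name, version, parameter settings, input files, and a UTC timestamp into one JSON-serializable record that can be saved next to the results.

```python
import json
from datetime import datetime, timezone


def record_run(tool_name, tool_version, parameters, input_files, path=None):
    """Collect the settings of one analysis run (hypothetical helper).

    If `path` is given, the record is also written there as JSON.
    """
    record = {
        "tool": tool_name,
        "tool_version": tool_version,    # exact version, since results can change between releases
        "parameters": dict(parameters),  # every setting, including defaults left unchanged
        "input_files": list(input_files),
        "run_at_utc": datetime.now(timezone.utc).isoformat(),
    }
    if path is not None:
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(record, handle, indent=2, sort_keys=True)
    return record


if __name__ == "__main__":
    # "ExampleTool" and its parameters are placeholders for whatever tool you used.
    run = record_run(
        "ExampleTool",
        "1.2.0",
        {"p_value_cutoff": 0.05, "normalization": "none"},
        ["cohort_expression.csv"],
    )
    print(json.dumps(run, indent=2))
```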

DON’Ts

Misinterpret your Data – Be cautious in result interpretation

Take Data Quality for granted – Assess and preprocess data to ensure quality

Use all Data when unnecessary – Analyze relevant subsets for efficiency

Depend on one Tool – Use multiple tools for comprehensive analysis

Ignore Updates – Use the latest tool versions and note which version you used

Forgo Resource Check – Check hardware for computational capacity

Forget Publication Quality – Follow best practices for reporting

Neglect Ethical Considerations – Respect ethical guidelines and permissions

Part 3: Challenges and Future of Clinical Big Data Analysis

Key Challenges in Clinical Big Data Analysis with existing web tools


Future of Clinical Big Data Analysis


Collaboration and Integration


Interdisciplinary Collaboration

Teams of healthcare providers, data scientists, and researchers working together to drive innovation


Integration into Routine Healthcare

Seamless incorporation of data analysis into everyday healthcare practices for data-driven decision-making and personalized care


Global Data Sharing

Enhanced collaboration and sharing of data among healthcare institutions and researchers to deepen disease understanding and improve treatments

Current research | Project: Clinical Data Analyzer

An integrated no-code web app for rapid clinical big data cleaning and analysis


Problem

  • The quality of the input data is critical to the final results and their interpretation

  • HOWEVER, healthcare and medicine offer many examples of data that are rich but unorganized, incomplete, and inconsistent
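To make "inconsistent data" concrete: a single clinical field is often recorded in several spellings. The following Python sketch maps hypothetical free-text entries for sex onto a small controlled vocabulary and leaves anything unrecognized (including bare numeric codes whose meaning is undocumented) as missing, so it can be reviewed rather than guessed. It illustrates the general idea only and is not the Clinical Data Analyzer's implementation.

```python
# Hypothetical mapping from observed spellings to a controlled vocabulary.
SEX_SYNONYMS = {
    "m": "male", "male": "male", "man": "male",
    "f": "female", "female": "female", "woman": "female",
}


def normalize_sex(raw_value):
    """Return 'male', 'female', or None for a raw sex entry."""
    if raw_value is None:
        return None
    key = str(raw_value).strip().lower()  # ignore stray spaces and capitalization
    return SEX_SYNONYMS.get(key)          # unknown codes such as "1" become None


def summarize_cleaning(raw_values):
    """Normalize a column and count non-empty entries that could not be mapped."""
    cleaned = [normalize_sex(value) for value in raw_values]
    unmapped = sum(
        1 for raw, new in zip(raw_values, cleaned) if raw is not None and new is None
    )
    return cleaned, unmapped


if __name__ == "__main__":
    column = ["M", "male ", "Female", "F", "unknown", None, "1"]
    cleaned, unmapped = summarize_cleaning(column)
    print(cleaned)
    print("entries needing manual review:", unmapped)
```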

Solution: Clinical Data Analyzer

Integrated no-code web app development for rapid clinical big data cleaning and analysis (Collaborators wanted!)

  • a free web application that allows medical practitioners to quickly and easily construct initial hypotheses from data without writing code (so-called no-code)

  • consists of three major tools:

    1. Data cleaner
    2. Patient Finder
    3. Exploratory Analyzer

Current research | 1. Data cleaner: a semi-automated preprocessing data cleaning tool (1)


Current research | 1. Data cleaner: a semi-automated preprocessing data cleaning tool (2)

Current research | 2. Patient Finder: a patient cohort selector and visualizer tool
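Cohort selection boils down to filtering patient records by several criteria at once. As a rough sketch of that idea (not the Patient Finder's code), the Python below defines a hypothetical minimal `Patient` record and returns the patients who match a diagnosis plus optional age and sex limits.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical minimal record; real clinical tables hold many more fields.
    patient_id: str
    age: int
    sex: str
    diagnosis: str


def select_cohort(patients, diagnosis, min_age=None, max_age=None, sex=None):
    """Return patients with the given diagnosis that meet every optional limit.

    A limit left as None is not applied. Age bounds are inclusive.
    """
    cohort = []
    for patient in patients:
        if patient.diagnosis != diagnosis:
            continue
        if min_age is not None and patient.age < min_age:
            continue
        if max_age is not None and patient.age > max_age:
            continue
        if sex is not None and patient.sex != sex:
            continue
        cohort.append(patient)
    return cohort


if __name__ == "__main__":
    # Made-up example records for illustration only.
    patients = [
        Patient("P001", 45, "female", "breast cancer"),
        Patient("P002", 72, "female", "breast cancer"),
        Patient("P003", 58, "male", "lung cancer"),
    ]
    for patient in select_cohort(patients, "breast cancer", max_age=65):
        print(patient.patient_id)
```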


Current research | 3. Exploratory Analyzer: an interactive data analysis tool
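A common first exploratory step is to compare a numeric variable across groups. This Python sketch assumes records are plain dictionaries with hypothetical keys; it reports the count, mean, and median per group and skips missing values rather than counting them as zero. It is an illustration, not the Exploratory Analyzer's implementation.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_group(records, group_key, value_key):
    """Return {group: {"n", "mean", "median"}} for a numeric variable.

    Records whose value is missing (None) are skipped, not treated as zero.
    """
    groups = defaultdict(list)
    for record in records:
        value = record.get(value_key)
        if value is not None:
            groups[record[group_key]].append(value)
    return {
        group: {"n": len(values), "mean": mean(values), "median": median(values)}
        for group, values in sorted(groups.items())
    }


if __name__ == "__main__":
    # Made-up rows; in practice these would come from a cleaned clinical table.
    rows = [
        {"stage": "I", "age": 52},
        {"stage": "I", "age": 61},
        {"stage": "II", "age": 67},
        {"stage": "II", "age": None},
    ]
    for stage, stats in summarize_by_group(rows, "stage", "age").items():
        print(stage, stats)
```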


Future research | Projects/Research Interests



Interested in becoming a collaborator?

Contact me: sakura.maezono[at]yokohama-cu.ac.jp
Contact us: bioinfo[at]yokohama-cu.ac.jp

Summary


1. Clinical Big Data Analysis has facilitated the extraction of valuable insights from continuously expanding healthcare and medical data


2. Bioinformatics web tools democratize advanced data analysis through accessibility, cost-effectiveness, and community support; using them effectively involves understanding the input data, reading the documentation, and validating the results


3. Working together, using web tools, and sharing data worldwide can enhance our understanding of diseases and address challenges such as data quality, scalability, and governance in the field



Take-home message

YOU can take advantage of Bioinformatics web tools to dive into the ever-growing Clinical Big Data!

Through collaboration and sharing data, YOU can actively contribute to our collective knowledge about diseases, fostering a brighter future for healthcare

Questions and Discussion