[Visualization] Streamlit Simple Apps : K-Means Clustering

Install Streamlit

pip install streamlit

Once installation is complete, you can test it with the hello world app.

Enter the streamlit command below in a console window to launch the hello world app.

streamlit hello
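Besides launching the hello app, it can be handy to confirm from Python which Streamlit version pip actually installed, since some APIs differ between releases. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Streamlit.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the Streamlit version found in the current environment, if any.
print("streamlit:", installed_version("streamlit") or "not installed")
```

If this prints "not installed", the pip command above probably ran against a different Python environment than the one executing the script.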

Implement Streamlit App : K-Means Algorithm App

* raw-data : bdiag.csv (0.12MB)

# File : app.py
# Author : 9valuemining@gmail.com

# Imports
# -----------------------------------------------------------
import streamlit as st
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns

# Set theme
sns.set_style(style="whitegrid")

# Helper functions
# -----------------------------------------------------------
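# Load data (cached so the CSV is read only once across reruns).
# Note: st.cache_data requires Streamlit 1.18+; older releases used st.cache.
@st.cache_data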
def load_data():
    
    # bdiag.csv data source : https://bookdown.org/tpinto_home/Unsupervised-learning/k-means-clustering.html
    df = pd.read_csv("bdiag.csv")
    return df

def run_kmeans(df, n_clusters=2):
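    # Identify clusters using two of the variables that characterize
    # the cell nuclei: radius_mean and texture_mean.
    # n_init is set explicitly because its default changed in newer scikit-learn releases.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(
        df[["radius_mean", "texture_mean"]]
    )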

    fig, ax = plt.subplots(figsize=(8, 8))

    ax.grid(True)
    ax.set_facecolor("#FAFAFF")
    ax.tick_params(labelcolor="#4a4a4a")
    ax.yaxis.label.set(color="#4a4a4a", fontsize=20)
    ax.xaxis.label.set(color="#4a4a4a", fontsize=20)
    # --------------------------------------------------

    # Create scatterplot
    ax = sns.scatterplot(
        ax=ax,
        x=df.radius_mean,
        y=df.texture_mean,
        hue=kmeans.labels_,
        palette=sns.color_palette("colorblind", n_colors=n_clusters),
        legend=False,
    )

    # Annotate cluster centroids
    for ix, [x, y] in enumerate(kmeans.cluster_centers_):
        ax.scatter(x, y, s=200, c="#a8323e")
        ax.annotate(
            f"Cluster #{ix+1}",
            (x, y),
            fontsize=12,
            color="#a8323e",
            xytext=(x + 2, y + 1),
            bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="#a8323e", lw=2),
            ha="center",
            va="center",
        )

    return fig
# -----------------------------------------------------------

# Sidebar Implementation
# -----------------------------------------------------------
sidebar = st.sidebar
df_display = sidebar.checkbox("Show raw-data", value=True)

n_clusters = sidebar.slider(
    label="Select Number of Clusters",
    min_value=2,
    max_value=10,
    value=3
)

sidebar.write(
    """
    Author: [@9valuemining](https://dadev.tistory.com/)
    """
)
# -----------------------------------------------------------


# Main
# -----------------------------------------------------------
# Create a title for app
st.title("K-Means Clustering Example")

df = load_data()

# Show scatter plot
st.write(run_kmeans(df, n_clusters=n_clusters))

if df_display:
    st.write("Raw-Data")
    st.write(df)
# -----------------------------------------------------------
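To make the clustering step in run_kmeans less of a black box, the sketch below implements the basic K-Means iteration (Lloyd's algorithm) in plain Python on a handful of made-up 2-D points. It is an illustration only, not how scikit-learn's KMeans is implemented: scikit-learn uses k-means++ initialization and several optimizations. The function name `simple_kmeans` and the toy values, loosely shaped like (radius_mean, texture_mean) pairs, are assumptions for this example.

```python
import random


def simple_kmeans(points, k, n_iter=100, seed=0):
    """Cluster a list of (x, y) tuples; return (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center (squared distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        centers = new_centers
    return labels, centers


# Toy data: two well-separated groups (values are made up for illustration).
toy_points = [
    (10.0, 15.0), (11.0, 16.0), (10.5, 14.5),
    (20.0, 25.0), (21.0, 26.0), (20.5, 24.5),
]
toy_labels, toy_centers = simple_kmeans(toy_points, k=2)
print(toy_labels)
print(toy_centers)
```

The returned labels and centers play the same role as kmeans.labels_ and kmeans.cluster_centers_ in the app: the labels color the scatter plot and the centers are the points that get annotated.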

Execute Apps

streamlit run app.py

Open http://localhost:8501 in a browser to see the running app.

Execution result
