
Related Keywords

* dataframe join

* dataframe merge 

* dataframe concat

* dataframe union all

 

pd.merge() API 

pd.merge(
    left,
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes=("_x", "_y"),
    copy=True,
    indicator=False,
    validate=None,
)
  • left: A DataFrame or named Series object.
  • right: Another DataFrame or named Series object.
  • on: Column or index level names to join on. Must be found in both the left and right DataFrame and/or Series objects. If not passed and left_index and right_index are False, the intersection of the columns in the DataFrames and/or Series will be inferred to be the join keys.
  • left_on: Columns or index levels from the left DataFrame or Series to use as keys. Can either be column names, index level names, or arrays with length equal to the length of the DataFrame or Series.
  • right_on: Columns or index levels from the right DataFrame or Series to use as keys. Can either be column names, index level names, or arrays with length equal to the length of the DataFrame or Series.
  • left_index: If True, use the index (row labels) from the left DataFrame or Series as its join key(s). In the case of a DataFrame or Series with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame or Series.
  • right_index: Same usage as left_index for the right DataFrame or Series
  • how: One of 'left', 'right', 'outer', 'inner', 'cross'. Defaults to inner. See below for more detailed description of each method.
    Merge method   SQL join name      Description
    left           LEFT OUTER JOIN    Use keys from left frame only
    right          RIGHT OUTER JOIN   Use keys from right frame only
    outer          FULL OUTER JOIN    Use union of keys from both frames
    inner          INNER JOIN         Use intersection of keys from both frames
    cross          CROSS JOIN         Create the cartesian product of rows of both frames
  • sort: Sort the result DataFrame by the join keys in lexicographical order. pd.merge() defaults to False, which avoids the cost of sorting; in that case the order of the join keys depends on the join type (how).
  • suffixes: A tuple of string suffixes to apply to overlapping columns. Defaults to ('_x', '_y').
  • copy: Always copy data (default True) from the passed DataFrame or named Series objects, even when reindexing is not necessary. Copying cannot be avoided in many cases, but where it can, copy=False may improve performance and memory usage. The cases where copying can be avoided are somewhat pathological, but the option is provided nonetheless.
  • indicator: Add a column to the output DataFrame called _merge with information on the source of each row. _merge is Categorical-type and takes on a value of left_only for observations whose merge key only appears in 'left' DataFrame or Series, right_only for observations whose merge key only appears in 'right' DataFrame or Series, and both if the observation’s merge key is found in both.
  • validate: String, default None. If specified, checks whether the merge is of the specified type.
    • “one_to_one” or “1:1”: checks if merge keys are unique in both left and right datasets.
    • “one_to_many” or “1:m”: checks if merge keys are unique in left dataset.
    • “many_to_one” or “m:1”: checks if merge keys are unique in right dataset.
    • “many_to_many” or “m:m”: allowed, but does not result in checks.
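
The how, indicator, and validate options described above are easier to compare side by side. The following is a minimal sketch, assuming pandas 1.2 or later (needed for how="cross"); the small frames and the names row_counts, sources, and dup_right are made up for illustration and are not part of the pandas API.

```python
import pandas as pd

# Hypothetical frames whose keys only partially overlap (K1 and K2 are shared).
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K1", "K2", "K3"], "B": ["B1", "B2", "B3"]})

# how: count the rows produced by each key-based join type.
row_counts = {
    how: len(pd.merge(left, right, how=how, on="key"))
    for how in ("inner", "left", "right", "outer")
}

# how="cross" takes no join key and pairs every left row with every right row.
cross_rows = len(pd.merge(left, right, how="cross"))

# indicator=True adds a _merge column naming the source of each row.
outer = pd.merge(left, right, how="outer", on="key", indicator=True)
sources = dict(zip(outer["key"], outer["_merge"].astype(str)))

# validate raises MergeError when the keys break the declared relationship.
dup_right = pd.DataFrame({"key": ["K1", "K1"], "B": ["B1", "B1b"]})
try:
    pd.merge(left, dup_right, on="key", validate="one_to_one")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True

print(row_counts, cross_rows, sources, validation_failed)
```

Using validate this way is a cheap guard against accidental row duplication when a key that was expected to be unique turns out not to be.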

     

    import pandas as pd

    # Both frames share a single key column named "key".
    left = pd.DataFrame(
        {
            "key": ["K0", "K1", "K2", "K3"],
            "A": ["A0", "A1", "A2", "A3"],
            "B": ["B0", "B1", "B2", "B3"],
        }
    )

    right = pd.DataFrame(
        {
            "key": ["K0", "K1", "K2", "K3"],
            "C": ["C0", "C1", "C2", "C3"],
            "D": ["D0", "D1", "D2", "D3"],
        }
    )

    # Inner join (the default) on the shared "key" column.
    result = pd.merge(left, right, on="key")
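
To see what the single-key merge above returns, the sketch below rebuilds the same two frames and compares three spellings that should give the same result: an explicit on="key", no on at all (the shared column is inferred as the key), and the DataFrame.merge method form. The variable names inferred and method_form are only illustrative.

```python
import pandas as pd

left = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
    }
)
right = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)

# Explicit join key.
result = pd.merge(left, right, on="key")

# No "on": the only column common to both frames ("key") is used as the join key.
inferred = pd.merge(left, right)

# Method form on the left frame.
method_form = left.merge(right, on="key")

# The key column appears once, followed by the left and then the right columns.
print(result.columns.tolist())
print(result)
```

Spelling out on= is usually the safer habit, since an inferred key changes silently if the frames later gain another column in common.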

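import pandas as pd

# Joining on two key columns, so both frames need key1 and key2.
left = pd.DataFrame(
    {
        "key1": ["K0", "K0", "K1", "K2"],
        "key2": ["K0", "K1", "K0", "K1"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
    }
)

right = pd.DataFrame(
    {
        "key1": ["K0", "K1", "K1", "K2"],
        "key2": ["K0", "K0", "K0", "K0"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)

# Left join on both keys: every left row is kept; unmatched rows get NaN in C and D.
result = pd.merge(left, right, how="left", on=["key1", "key2"])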
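
The examples above all join on identically named columns. When the key columns are named differently, or the frames share a non-key column name, left_on/right_on, right_index, and suffixes come into play. The following is a rough sketch; the orders and customers frames and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: the key is "cust" on one side and "customer_id" on the other,
# and both frames have a non-key column called "note".
orders = pd.DataFrame(
    {"order_id": [1, 2, 3], "cust": ["C1", "C2", "C9"], "note": ["a", "b", "c"]}
)
customers = pd.DataFrame({"customer_id": ["C1", "C2"], "note": ["vip", "new"]})

# left_on/right_on name the key column on each side;
# suffixes disambiguate the overlapping "note" column.
by_columns = pd.merge(
    orders,
    customers,
    how="left",
    left_on="cust",
    right_on="customer_id",
    suffixes=("_order", "_customer"),
)

# The same lookup with the right-hand key moved into the index.
by_index = pd.merge(
    orders,
    customers.set_index("customer_id"),
    how="left",
    left_on="cust",
    right_index=True,
    suffixes=("_order", "_customer"),
)

print(by_columns)
print(by_index)
```

Order 3 refers to a customer that is missing from customers, so its right-hand columns come back as NaN in both results.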

https://unsplash.com/photos/T9rKvI3N0NM

 
