# USAID WSSH

# Code Documentation

<div id="bkmrk-" style="display:none;"></div># 1. Preparation

```
import os
import re
import pandas as pd
import numpy as np
import difflib
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B2%5D%3A"></div>```
OUTPUT_DIR = '../output_data'
```

## 1.A. Data Input and Output

```
JMP_INPUT_FILE = '../input_data/JMP/jmp.csv'
JMP_OUTPUT_FILE = f'{OUTPUT_DIR}/table_jmp.csv'
IFS_INPUT_DIR = '../input_data/IFs'
IFS_OUTPUT_FILE = f'{OUTPUT_DIR}/table_ifs.csv'
```

## 1.B. Common Functions

Common functions are a collection of functions used by both data sources (IFS and JMP).

<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell" id="bkmrk-merge_id%3A-this-funct"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput">- **merge\_id**: This function merges two data tables based on a common column replaces missing values with 0, and renames the column for easier identification.
- **cleanup\_semicolon**: Replaces all occurrences of semicolons (;) with an empty string, cleaning up the extra characters that have been included in the Excel format from IFS.
- **replace\_bs\_with\_alb**: BS needs to be updated to ALB for consistency or clarification.

</div></div></div></div><div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B4%5D%3A"><div class="jp-Cell-inputWrapper"></div></div>```
def merge_id(prev_table, keys_table, name):
    merged_df = prev_table.merge(keys_table, left_on=name, right_on=name, how='left')
    merged_df = merged_df.rename(columns={'id': f'{name}_id'})
    merged_df = merged_df.drop(columns=[name])
    merged_df[f'{name}_id'] = merged_df[f'{name}_id'].where(merged_df[f'{name}_id'].notna(), 0).astype(int)
    return merged_df
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B5%5D%3A"><div class="jp-Cell-inputWrapper"></div></div>```
def cleanup_semicolon(source):
    with open(source, 'r') as file:
        content = file.read()
    updated_content = content.replace(';', '')
    with open(source, 'w') as file:
        file.write(updated_content)
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B6%5D%3A"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"></div></div></div>```
def replace_bs_with_alb(x):
    return "ALB" if x == "BS" else x
```

## 1.C. Key Table Generator

This function generates a unique key table for a specified column from both IFS and JMP table, saving the keys to a CSV file. If the CSV file already exists, it appends new values to the existing file while ensuring unique IDs for each entry.

```
def create_table_key(dataframe, column):
    file_path = f'{OUTPUT_DIR}/key_{column}.csv'
    new_table = pd.DataFrame(
        dataframe[column].unique(),
        columns=[column]
    ).dropna().sort_values(column).reset_index(drop=True)
    
    # If the file already exists, load it
    if os.path.exists(file_path):
        existing_table = pd.read_csv(file_path)
        # Find the new values that are not in the existing table
        new_values = new_table[~new_table[column].isin(existing_table[column])]
        if not new_values.empty:
            # Assign IDs to the new values, starting after the max existing ID
            max_id = existing_table['id'].max()
            new_values['id'] = range(max_id + 1, max_id + 1 + len(new_values))
            # Append the new values to the existing table
            updated_table = pd.concat([existing_table, new_values], ignore_index=True)
        else:
            updated_table = existing_table  # No new values to add, keep existing table as is
    else:
        # If the file doesn't exist, create new IDs starting from 1
        new_table['id'] = range(1, len(new_table) + 1)
        updated_table = new_table
    updated_table[['id', column]].to_csv(file_path, index=False)
    return updated_table
```

## 1.D. Country Mapping

This section compares two lists of country names—jmp\_country\_list from the JMP dataset and ifs\_country\_list from the IFS dataset—and finds the closest matches using string similarity. It also includes a mapping for countries with naming differences between the two lists.

```
data_jmp = pd.read_csv(JMP_INPUT_FILE, encoding='latin-1')
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B9%5D%3A"><div class="jp-Cell-inputWrapper"></div></div>```
jmp_country_list = list(data_jmp["COUNTRY, AREA OR TERRITORY"].unique())
ifs_country_list = ['All countries WHHS Tool1','Congo Dem. Republic of the','Ethiopia','Ghana','Guatemala','Haiti','India',
                    'Indonesia','Kenya','Liberia','Madagascar','Malawi','Mali','Mozambique','Nepal','Nigeria','Philippines',
                    'Rwanda','Senegal','Sudan South','Tanzania','Uganda','Zambia']
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-in%C2%A0%5B10%5D%3A"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"></div></div></div>```
# Find the closest match
for country in ifs_country_list:
    probability = difflib.get_close_matches(country, jmp_country_list, n=3, cutoff=0.4)
    if probability:
        if country not in probability:
            print(f"{country} -> {list(probability)}")
    else:
        print(f"NOT FOUND: {country}")
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk--1"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"></div></div></div>```
NOT FOUND: All countries WHHS Tool1
Congo Dem. Republic of the -> ['Democratic Republic of the Congo', 'Republic of Korea', 'Iran (Islamic Republic of)']
Sudan South -> ['Sudan', 'San Marino', 'South Sudan']
Tanzania -> ['Panama', 'Canada', 'Mauritania']
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B11%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
country_mapping = {
    "All countries WHHS Tool1": "All Countries",
    "United Republic of Tanzania": "Tanzania",
    "Congo Dem. Republic of the": "Democratic Republic of the Congo",
    "Sudan South": "South Sudan",
}
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B12%5D%3A"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"></div></div></div></div>```
def map_country_name(country):
    return country_mapping.get(country, country)
```

# 3. IFS Dataset

```
final_columns = ['indicator','year','country','unit','value_name','jmp_category','commitment','value']
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B14%5D%3A"></div>```
files_to_keep = [
    "01. Deaths by Category of Cause - Millions (2nd Dimensions = Diarrhea).csv",
    "06. Poverty Headcount less than $2.15 per Day, Log Normal - Millions.csv",
    "08. State Failure Instability Event - IFs Index.csv",
    "11. Governance Effectiveness - WB index.csv",
    # "12. Value Added by Sector, Currency - Billion dollars.csv",
    "13. Sanitation Services, Access, percent of population (2nd Dimensions = Basic + Safely Managed).csv",
    "14. Sanitation Services, Access, Number of people, million (2nd Dimensions = Basic + Safely Managed).csv",
    "15. Sanitation Services, Expenditure, Capital, Billion $ (2nd Dimensions = Basic + Safely Managed).csv",
    "16. Sanitation Services, Expenditure, Maintenance, Billion $ (2nd Dimensions = Basic + Safely Managed).csv",
    "17. Water Services, Access, percent of population (2nd Dimension = Basic + Safely Managed).csv",
    "18. Water Services, Access, Number of people, million (2nd Dimensions = Basic + Safely Managed).csv",
    "19. Water Services, Expenditure, Capital, Billion $ (2nd Dimensions = Basic + Safely Managed).csv",
    "20. Water Services, Expenditure, Maintenance, Billion $ (2nd Dimensions = Basic + Safely Managed).csv",
    # "21. Population - Millions.csv",
    "23. GDP (PPP) - Billion dollars.csv",
    "24. Stunted children, History and Forecast - Million.csv",
    # "25. Population under 5 Years, Headcount - Millions.csv",
    "26. Malnourished Children, Headcount - Millions.csv"
]
year_filter_config = {
    "year_range": {
        "years": list(range(2019, 2050)),
        "files": [
            "13. Sanitation Services, Access, percent of population (2nd Dimensions = Basic + Safely Managed).csv",
            "17. Water Services, Access, percent of population (2nd Dimension = Basic + Safely Managed).csv"
        ]
    },
    "milestone_years": [2030, 2050]
}
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B15%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
files = [
    f"{IFS_INPUT_DIR}/{f}" for f in os.listdir(IFS_INPUT_DIR)
    if os.path.isfile(os.path.join(IFS_INPUT_DIR, f))
]
files = [f"{IFS_INPUT_DIR}/{file}" for file in files_to_keep]
```

## 3.A. IFS Functions

IFS functions are a collection of functions used only by IFS data source

<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell" id="bkmrk-base_jmp_category%3A-t"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput">- **base\_jmp\_category**: This function assigns or updates the JMP category based on specific conditions in the input data, particularly for records where the value is base; It converts certain base categories to simplified abbreviations ("BS" or "SM"), otherwise; retains the existing category
- **get\_ifs\_name**: This function extracts and cleans the name of an IFS data file by removing unwanted text and formatting, such as numbering, directory paths, and file extensions.
- **get\_value\_types**: This function processes a string by manipulating its structure to generate a list of values based on certain patterns. But it also replaces occurrences of '0' with '\_0.' in the string, since '\_0\_5' is '0.5'.
- **cleanup\_data**: This function cleans a DataFrame by removing unnecessary parts of the text in the "unit" column and ensuring consistency in the "value" column. Specifically, it unifies the unit formatting by removing "2017" from units like "Billion 2017" and handles space and empty value issues in the "value" column.
- **filter\_dataframe\_by\_year**: This function filters a DataFrame based on a year configuration, depending on the config in previous section. It checks the configuration to determine whether to filter by a specific year range or milestone years.
- **remove\_unmatches\_jmp\_category**: This function identifies rows in a dataset where the JMP category ("jmp\_category") does not match the base category ("category\_base"), according to specific rules. It returns True for rows where the mismatch occurs, indicating that the row should be removed.
- **remove\_unmatch\_commitment**: This function identifies rows in a dataset where the "commitment" year does not match the actual "year" of the data. It returns True for rows where the mismatch occurs, indicating that the row should be removed.

</div></div></div></div><div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B16%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
def base_jmp_category(x):
    if x["value_name"] == "Base":
        if x["category_base"] == "Basic":
            return "BS"
        if x["category_base"] == "SafelyManaged":
            return "SM"
        return np.nan
    return x["jmp_category"]
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B17%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
def get_ifs_name(source):
    source = re.sub(r"\s*\(2nd Dimension.*?\)", "", source)
    return re.sub(r'^\d+\. ', '', source.replace(f"{IFS_INPUT_DIR}/", "")).replace(".csv", "")
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B18%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
def get_value_types(lst):
    lst = lst.split('.')[0]
    lst = lst.replace('_0_','_0.').split("_")
    return lst
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B19%5D%3A"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea">  
</div></div></div>```python
def cleanup_data(dataframe):
    dataframe['unit'] = dataframe['unit'].apply(lambda x: x.replace("2017","") if x else None)
    dataframe['value'] = dataframe['value'].apply(lambda x: x.replace(' ','') if ' ' in str(x) else x)
    dataframe['value'] = dataframe['value'].apply(lambda x: x if len(str(x)) > 0 else np.nan)
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B20%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
def filter_dataframe_by_year(dataframe, filename):
    filename = filename.split("/")[3]
    if filename in year_filter_config["year_range"]["files"]: # Filter using the year_range
        filtered_df = dataframe[dataframe['year'].isin(year_filter_config["year_range"]["years"])]
    else: # Filter using milestone_years
        filtered_df = dataframe[dataframe['year'].isin(year_filter_config["milestone_years"])]
    return filtered_df.reset_index(drop=True)
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B21%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
def remove_unmatches_jmp_category(x):
    if x["value_name"] != "Base":
        if x["category_base"] == "Basic" and x["jmp_category"] == "SM":
            return True
        if x["category_base"] == "SafelyManaged" and x["jmp_category"] == "BS":
            return True
    return False
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B22%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
def remove_unmatch_commitment(x):
    if str(x['commitment']).strip() == "2030":
        if str(x["year"]) != "2030":
            return True
    if str(x['commitment']).strip() == "2050":
        if str(x["year"]) != "2050":
            return True
    return False
```

## 3.B. IFS Data Processing

### 3.B.1. Combine, Filter and Remap IFS Values

This section describes the process of transforming and processing IFS data files into a unified DataFrame (combined\_df). The transformation involves cleaning, reshaping, and filtering the data to prepare it for analysis.

```python
combined_df = pd.DataFrame(columns=final_columns)
for file in files:
    print(f"Processing {file}")
    # test only 1 file
    # if file != "../input_data/IFs/14. Sanitation Services, Access, Number of people, million (2nd Dimensions = Basic + Safely Managed).csv":
    #   continue
    cleanup_semicolon(file)
    data = pd.read_csv(file, header=[1,2,4,5], sep=',')
    new_columns = list(data.columns)
    for i, col in enumerate(new_columns):
        if col == ('Unnamed: 0_level_0', 'Unnamed: 0_level_1', 'Unnamed: 0_level_2', 'Unnamed: 0_level_3'):
            new_columns[i] = 'Year'
    data.columns = new_columns
    df = pd.DataFrame(data.to_dict('records'))
    df_melted = df.melt(id_vars=['Year'], var_name='variable', value_name='value')
    new_data = []
    for value_list in df_melted.to_dict('records'):
        value_type = get_value_types(value_list["variable"][3])
        new_data.append({
            "year": int(value_list["Year"]),
            "country": map_country_name(value_list["variable"][0]),
            "category_base": value_list["variable"][1],
            "unit": value_list["variable"][2],
            "value_type": list(filter(lambda v:v,value_type)),
            "value": value_list["value"]
        })
    df = pd.DataFrame(new_data)
    df = filter_dataframe_by_year(df, file)
    df_split = pd.DataFrame(df['value_type'].tolist(), index=df.index)
    df_split.columns = ['value_name', 'jmp_category', 'commitment']
    df_final = pd.concat([df, df_split], axis=1)
    df_final['indicator'] = get_ifs_name(file)
    df_final['jmp_category'] = df_final.apply(base_jmp_category, axis=1)
    df_final['jmp_category'] = df_final['jmp_category'].apply(replace_bs_with_alb)
    df_final['remove'] = df_final.apply(remove_unmatches_jmp_category, axis=1)
    df_final = df_final[df_final['remove'] == False].reset_index(drop=True)
    df_final = df_final[final_columns]
    combined_df = pd.concat([combined_df.dropna(axis=1, how='all'), df_final], ignore_index=True)
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk--2"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea">  
</div></div></div>```
Processing ../input_data/IFs/01. Deaths by Category of Cause - Millions (2nd Dimensions = Diarrhea).csv
Processing ../input_data/IFs/06. Poverty Headcount less than $2.15 per Day, Log Normal - Millions.csv
Processing ../input_data/IFs/08. State Failure Instability Event - IFs Index.csv
Processing ../input_data/IFs/11. Governance Effectiveness - WB index.csv
Processing ../input_data/IFs/13. Sanitation Services, Access, percent of population (2nd Dimensions = Basic + Safely Managed).csv
Processing ../input_data/IFs/14. Sanitation Services, Access, Number of people, million (2nd Dimensions = Basic + Safely Managed).csv
Processing ../input_data/IFs/15. Sanitation Services, Expenditure, Capital, Billion $ (2nd Dimensions = Basic + Safely Managed).csv
Processing ../input_data/IFs/16. Sanitation Services, Expenditure, Maintenance, Billion $ (2nd Dimensions = Basic + Safely Managed).csv
Processing ../input_data/IFs/17. Water Services, Access, percent of population (2nd Dimension = Basic + Safely Managed).csv
Processing ../input_data/IFs/18. Water Services, Access, Number of people, million (2nd Dimensions = Basic + Safely Managed).csv
Processing ../input_data/IFs/19. Water Services, Expenditure, Capital, Billion $ (2nd Dimensions = Basic + Safely Managed).csv
Processing ../input_data/IFs/20. Water Services, Expenditure, Maintenance, Billion $ (2nd Dimensions = Basic + Safely Managed).csv
Processing ../input_data/IFs/23. GDP (PPP) - Billion dollars.csv
Processing ../input_data/IFs/24. Stunted children, History and Forecast - Million.csv
Processing ../input_data/IFs/26. Malnourished Children, Headcount - Millions.csv
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B24%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
# Test for Congo
# combined_df[
#    (combined_df["indicator"] == "Sanitation Services, Access, Number of people, million") &
#    (combined_df["country"] == "Democratic Republic of the Congo") &
#    (combined_df["value_name"] == "Base")
#]
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B25%5D%3A"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor">  
</div></div></div></div>```
combined_df['remove'] = combined_df.apply(lambda x: remove_unmatch_commitment(x), axis=1)
combined_df = combined_df[combined_df['remove'] == False].reset_index(drop=True)
combined_df = combined_df.drop(columns=['remove'])
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B26%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
#combined_df[
#(combined_df['country'] == "Democratic Republic of the Congo")
#& (combined_df['indicator'] == "GDP (PPP) - Billion dollars")
#& (combined_df['commitment'] == "2030")
#]
```

### 3.B.2. IFS Data Cleanup

```
cleanup_data(combined_df)
combined_df.head()
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B27%5D%3A-%C2%A0-indicator"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>indicator</th><th>year</th><th>country</th><th>unit</th><th>value\_name</th><th>jmp\_category</th><th>commitment</th><th>value</th></tr></thead><tbody><tr><th>0</th><td>Deaths by Category of Cause - Millions</td><td>2030</td><td>All Countries</td><td>Mil People</td><td>Base</td><td>NaN</td><td>None</td><td>1.237</td></tr><tr><th>1</th><td>Deaths by Category of Cause - Millions</td><td>2050</td><td>All Countries</td><td>Mil People</td><td>Base</td><td>NaN</td><td>None</td><td>1.143</td></tr><tr><th>2</th><td>Deaths by Category of Cause - Millions</td><td>2030</td><td>All Countries</td><td>Mil People</td><td>FS</td><td>ALB</td><td>2030</td><td>1.075</td></tr><tr><th>3</th><td>Deaths by Category of Cause - Millions</td><td>2050</td><td>All Countries</td><td>Mil People</td><td>FS</td><td>ALB</td><td>2050</td><td>1.068</td></tr><tr><th>4</th><td>Deaths by Category of Cause - Millions</td><td>2030</td><td>All Countries</td><td>Mil People</td><td>FS</td><td>SM</td><td>2030</td><td>0.955</td></tr></tbody></table>

</div></div></div></div></div></div>## 3.C. IFS Table of Keys

### 3.C.1. Indicators

```
indicator_table = create_table_key(combined_df, 'indicator')
indicator_table
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B29%5D%3A-%C2%A0-id-indica"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>id</th><th>indicator</th></tr></thead><tbody><tr><th>0</th><td>1</td><td>Deaths by Category of Cause - Millions</td></tr><tr><th>1</th><td>2</td><td>GDP (PPP) - Billion dollars</td></tr><tr><th>2</th><td>3</td><td>Governance Effectiveness - WB index</td></tr><tr><th>3</th><td>4</td><td>Malnourished Children, Headcount - Millions</td></tr><tr><th>4</th><td>5</td><td>Poverty Headcount less than $2.15 per Day, Log...</td></tr><tr><th>5</th><td>6</td><td>Sanitation Services, Access, Number of people,...</td></tr><tr><th>6</th><td>7</td><td>Sanitation Services, Access, percent of popula...</td></tr><tr><th>7</th><td>8</td><td>Sanitation Services, Expenditure, Capital, Bil...</td></tr><tr><th>8</th><td>9</td><td>Sanitation Services, Expenditure, Maintenance,...</td></tr><tr><th>9</th><td>10</td><td>State Failure Instability Event - IFs Index</td></tr><tr><th>10</th><td>11</td><td>Stunted children, History and Forecast - Million</td></tr><tr><th>11</th><td>12</td><td>Water Services, Access, Number of people, million</td></tr><tr><th>12</th><td>13</td><td>Water Services, Access, percent of population</td></tr><tr><th>13</th><td>14</td><td>Water Services, Expenditure, Capital, Billion $</td></tr><tr><th>14</th><td>15</td><td>Water Services, Expenditure, Maintenance, Bill...</td></tr></tbody></table>

</div></div></div></div></div></div>### 3.C.2. Units

```
units_table = create_table_key(combined_df, 'unit')
units_table
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B30%5D%3A-%C2%A0-id-unit-0"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>id</th><th>unit</th></tr></thead><tbody><tr><th>0</th><td>1</td><td>Billion $</td></tr><tr><th>1</th><td>2</td><td>Index</td></tr><tr><th>2</th><td>3</td><td>Index 0-5</td></tr><tr><th>3</th><td>4</td><td>Mil People</td></tr><tr><th>4</th><td>5</td><td>Million</td></tr><tr><th>5</th><td>6</td><td>Percent</td></tr><tr><th>6</th><td>7</td><td>Trillion $</td></tr></tbody></table>

</div></div></div></div></div></div>### 3.C.3. Value Names

```
value_names_table = create_table_key(combined_df, 'value_name')
value_names_table
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B31%5D%3A-%C2%A0-id-value_"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe" style="width:17.619%;"><thead><tr style="text-align:right;"><th style="width:17.0068%;"> </th><th style="width:19.7279%;">id</th><th style="width:63.2653%;">value\_name</th></tr></thead><tbody><tr><th style="width:17.0068%;">0</th><td style="width:19.7279%;">1</td><td style="width:63.2653%;">Base</td></tr><tr><th style="width:17.0068%;">1</th><td style="width:19.7279%;">2</td><td style="width:63.2653%;">FS</td></tr><tr><th style="width:17.0068%;">2</th><td style="width:19.7279%;">3</td><td style="width:63.2653%;">FW</td></tr><tr><th style="width:17.0068%;">3</th><td style="width:19.7279%;">4</td><td style="width:63.2653%;">FWS</td></tr><tr><th style="width:17.0068%;">4</th><td style="width:19.7279%;">5</td><td style="width:63.2653%;">SI</td></tr><tr><th style="width:17.0068%;">5</th><td style="width:19.7279%;">6</td><td style="width:63.2653%;">WI</td></tr><tr><th style="width:17.0068%;">6</th><td style="width:19.7279%;">7</td><td style="width:63.2653%;">WSI</td></tr></tbody></table>

</div></div></div></div></div></div>### 3.C.4. JMP Categories

```
jmp_categories_table = create_table_key(combined_df, 'jmp_category')
jmp_categories_table
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B32%5D%3A-%C2%A0-id-jmp_ca"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>id</th><th>jmp\_category</th></tr></thead><tbody><tr><th>0</th><td>1</td><td>ALB</td></tr><tr><th>1</th><td>2</td><td>SM</td></tr></tbody></table>

</div></div></div></div></div></div>### 3.C.5. JMP Names Table (Custom)

```
jmp_names_table = pd.DataFrame([
    {"id": 1,"jmp_name": "Water"},
    {"id": 2,"jmp_name": "Sanitation"},
    {"id": 3,"jmp_name": "Water and Sanitation"}
])
jmp_names_table.to_csv(f'{OUTPUT_DIR}/key_jmp_name.csv',index=False)
```

### 3.C.6. Commitments

```
commitments_table = create_table_key(combined_df, 'commitment')
commitments_table
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B34%5D%3A-%C2%A0-id-commit"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>id</th><th>commitment</th></tr></thead><tbody><tr><th>0</th><td>1</td><td>0.5x</td></tr><tr><th>1</th><td>2</td><td>2030</td></tr><tr><th>2</th><td>3</td><td>2050</td></tr><tr><th>3</th><td>4</td><td>2x</td></tr><tr><th>4</th><td>5</td><td>4x</td></tr><tr><th>5</th><td>6</td><td>6x</td></tr></tbody></table>

</div></div></div></div></div></div>### 3.C.7. Country

```
countries_table = create_table_key(combined_df, 'country')
countries_table
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B35%5D%3A-%C2%A0-id-countr"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>id</th><th>country</th></tr></thead><tbody><tr><th>0</th><td>1</td><td>All Countries</td></tr><tr><th>1</th><td>2</td><td>Democratic Republic of the Congo</td></tr><tr><th>2</th><td>3</td><td>Ethiopia</td></tr><tr><th>3</th><td>4</td><td>Ghana</td></tr><tr><th>4</th><td>5</td><td>Guatemala</td></tr><tr><th>5</th><td>6</td><td>Haiti</td></tr><tr><th>6</th><td>7</td><td>India</td></tr><tr><th>7</th><td>8</td><td>Indonesia</td></tr><tr><th>8</th><td>9</td><td>Kenya</td></tr><tr><th>9</th><td>10</td><td>Liberia</td></tr><tr><th>10</th><td>11</td><td>Madagascar</td></tr><tr><th>11</th><td>12</td><td>Malawi</td></tr><tr><th>12</th><td>13</td><td>Mali</td></tr><tr><th>13</th><td>14</td><td>Mozambique</td></tr><tr><th>14</th><td>15</td><td>Nepal</td></tr><tr><th>15</th><td>16</td><td>Nigeria</td></tr><tr><th>16</th><td>17</td><td>Philippines</td></tr><tr><th>17</th><td>18</td><td>Rwanda</td></tr><tr><th>18</th><td>19</td><td>Senegal</td></tr><tr><th>19</th><td>20</td><td>South Sudan</td></tr><tr><th>20</th><td>21</td><td>Tanzania</td></tr><tr><th>21</th><td>22</td><td>Uganda</td></tr><tr><th>22</th><td>23</td><td>Zambia</td></tr></tbody></table>

</div></div></div></div></div></div>## 3.D. IFS Table Results

### 3.D.1. Custom Table Mapping (JMP Name)

<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell" id="bkmrk-fs-%3D-full-sanitation"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput">- FS = Full Sanitation Access
- FW = Full Water Access
- FWS = Full Water and Sanitation Access
- SI = Sanitation Increased
- WI = Water Increased
- WSI = Water and Sanitation Increased

</div></div></div></div><div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B36%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
jmp_dict = dict(zip(jmp_names_table['jmp_name'], jmp_names_table['id']))
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B37%5D%3A"></div>```
def map_jmp_id(x):
    value_name = x["value_name"]
    # For the Base data
    if value_name == "Base":
        if x["indicator"].startswith("Water"):
            return jmp_dict['Water']
        if x["indicator"].startswith("Sanitation"):
            return jmp_dict['Sanitation']
    if 'W' in value_name and 'S' in value_name: # Water and Sanitation is indicated by 'WS' combined
        return jmp_dict['Water and Sanitation']
    if 'W' in value_name:  # Water is indicated by 'W'
        return jmp_dict['Water']
    if 'S' in value_name:  # Sanitation is indicated by 'S'
        return jmp_dict['Sanitation']
    return 0
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-in%C2%A0%5B38%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
combined_df['jmp_name_id'] = combined_df.apply(map_jmp_id, axis=1)
combined_df.tail(2)
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B38%5D%3A-%C2%A0-indicator"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>indicator</th><th>year</th><th>country</th><th>unit</th><th>value\_name</th><th>jmp\_category</th><th>commitment</th><th>value</th><th>jmp\_name\_id</th></tr></thead><tbody><tr><th>34774</th><td>Malnourished Children, Headcount - Millions</td><td>2030</td><td>Zambia</td><td>Mil People</td><td>WSI</td><td>SM</td><td>6x</td><td>0.208</td><td>3</td></tr><tr><th>34775</th><td>Malnourished Children, Headcount - Millions</td><td>2050</td><td>Zambia</td><td>Mil People</td><td>WSI</td><td>SM</td><td>6x</td><td>0.143</td><td>3</td></tr></tbody></table>

</div></div></div></div></div></div>### 3.D.2. IFS Key Table Mapping

```
table_with_id = merge_id(combined_df, indicator_table, 'indicator')
table_with_id = merge_id(table_with_id, units_table, 'unit')
table_with_id = merge_id(table_with_id, value_names_table, 'value_name')
table_with_id = merge_id(table_with_id, jmp_categories_table, 'jmp_category')
table_with_id = merge_id(table_with_id, commitments_table, 'commitment')
table_with_id = merge_id(table_with_id, countries_table, 'country')
```

### 3.D.3. IFS Final Result

```
table_with_id = table_with_id[table_with_id['value'].notna()].reset_index(drop=True)
table_with_id = table_with_id.sort_values(by='year').reset_index(drop=True)
table_with_id.reset_index(drop=True).tail()
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B40%5D%3A-%C2%A0-year-valu"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>year</th><th>value</th><th>jmp\_name\_id</th><th>indicator\_id</th><th>unit\_id</th><th>value\_name\_id</th><th>jmp\_category\_id</th><th>commitment\_id</th><th>country\_id</th></tr></thead><tbody><tr><th>34771</th><td>2050</td><td>1.406</td><td>1</td><td>14</td><td>1</td><td>3</td><td>1</td><td>3</td><td>23</td></tr><tr><th>34772</th><td>2050</td><td>1.539</td><td>3</td><td>5</td><td>4</td><td>7</td><td>1</td><td>5</td><td>18</td></tr><tr><th>34773</th><td>2050</td><td>2.348</td><td>1</td><td>14</td><td>1</td><td>3</td><td>2</td><td>3</td><td>23</td></tr><tr><th>34774</th><td>2050</td><td>1.097</td><td>1</td><td>5</td><td>4</td><td>6</td><td>2</td><td>6</td><td>18</td></tr><tr><th>34775</th><td>2050</td><td>0.143</td><td>3</td><td>4</td><td>4</td><td>7</td><td>2</td><td>6</td><td>23</td></tr></tbody></table>

</div></div></div></div></div></div>### 3.D.2. Save IFS Table

```
table_with_id.to_csv(IFS_OUTPUT_FILE, index=False)
```

# 2. JMP Dataset

```
data = pd.read_csv(JMP_INPUT_FILE, encoding='latin-1')
data.head()
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B42%5D%3A-%C2%A0-country%2C-"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>COUNTRY, AREA OR TERRITORY</th><th>Year</th><th>Type</th><th>TOTAL - At least basic</th><th>TOTAL - Annual rate of change in \\nat least basic</th><th>TOTAL - Safely managed</th><th>TOTAL - Annual rate of change in safely managed</th><th>TOTAL - Annual rate of change SM, manual calculation</th><th>TOTAL - Annual rate of change ALB, manual calculation</th></tr></thead><tbody><tr><th>0</th><td>Afghanistan</td><td>2000</td><td>Water</td><td>27.4</td><td>2.5</td><td>11.1</td><td>0.9</td><td>-99.0</td><td>-99.0</td></tr><tr><th>1</th><td>Afghanistan</td><td>2001</td><td>Water</td><td>27.5</td><td>2.5</td><td>11.1</td><td>0.9</td><td>0.0</td><td>0.0</td></tr><tr><th>2</th><td>Afghanistan</td><td>2002</td><td>Water</td><td>29.7</td><td>2.5</td><td>12.0</td><td>0.9</td><td>0.9</td><td>2.2</td></tr><tr><th>3</th><td>Afghanistan</td><td>2003</td><td>Water</td><td>31.9</td><td>2.5</td><td>12.9</td><td>0.9</td><td>0.9</td><td>2.2</td></tr><tr><th>4</th><td>Afghanistan</td><td>2004</td><td>Water</td><td>34.1</td><td>2.5</td><td>13.8</td><td>0.9</td><td>0.9</td><td>2.2</td></tr></tbody></table>

</div></div></div></div></div></div>## 2.A. JMP Data Processing

### 2.A.1. Rename the columns

```
data.columns = [
    'country',
    'year',
    'jmp_name',
    'total_ALB',
    'annual_rate_change_ALB',
    'total_SM',
    'annual_rate_change_SM',
    'manual_rate_change_SM',
    'manual_rate_change_ALB'
]
data.head()
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B43%5D%3A-%C2%A0-country-y"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>country</th><th>year</th><th>jmp\_name</th><th>total\_ALB</th><th>annual\_rate\_change\_ALB</th><th>total\_SM</th><th>annual\_rate\_change\_SM</th><th>manual\_rate\_change\_SM</th><th>manual\_rate\_change\_ALB</th></tr></thead><tbody><tr><th>0</th><td>Afghanistan</td><td>2000</td><td>Water</td><td>27.4</td><td>2.5</td><td>11.1</td><td>0.9</td><td>-99.0</td><td>-99.0</td></tr><tr><th>1</th><td>Afghanistan</td><td>2001</td><td>Water</td><td>27.5</td><td>2.5</td><td>11.1</td><td>0.9</td><td>0.0</td><td>0.0</td></tr><tr><th>2</th><td>Afghanistan</td><td>2002</td><td>Water</td><td>29.7</td><td>2.5</td><td>12.0</td><td>0.9</td><td>0.9</td><td>2.2</td></tr><tr><th>3</th><td>Afghanistan</td><td>2003</td><td>Water</td><td>31.9</td><td>2.5</td><td>12.9</td><td>0.9</td><td>0.9</td><td>2.2</td></tr><tr><th>4</th><td>Afghanistan</td><td>2004</td><td>Water</td><td>34.1</td><td>2.5</td><td>13.8</td><td>0.9</td><td>0.9</td><td>2.2</td></tr></tbody></table>

</div></div></div></div></div></div>### 2.A.2. Categorize the Values

```
data_melted = pd.melt(
    data, 
    id_vars=['country', 'year', 'jmp_name'],  # columns to keep
    var_name='variable',  # melted
    value_name='value' # values
)
data_melted['value_type'] = data_melted['variable'].apply(lambda x: 'total' if 'total' in x else 'annual_rate_change')
data_melted['jmp_category'] = data_melted['variable'].apply(lambda x: 'ALB' if 'ALB' in x else 'SM')
data_melted['jmp_category'] = data_melted['jmp_category'].apply(replace_bs_with_alb)
data_melted['country'] = data_melted['country'].apply(map_country_name)
data_melted = data_melted.drop(columns=['variable'])
data_melted.head()
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B44%5D%3A-%C2%A0-country-y"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>country</th><th>year</th><th>jmp\_name</th><th>value</th><th>value\_type</th><th>jmp\_category</th></tr></thead><tbody><tr><th>0</th><td>Afghanistan</td><td>2000</td><td>Water</td><td>27.4</td><td>total</td><td>ALB</td></tr><tr><th>1</th><td>Afghanistan</td><td>2001</td><td>Water</td><td>27.5</td><td>total</td><td>ALB</td></tr><tr><th>2</th><td>Afghanistan</td><td>2002</td><td>Water</td><td>29.7</td><td>total</td><td>ALB</td></tr><tr><th>3</th><td>Afghanistan</td><td>2003</td><td>Water</td><td>31.9</td><td>total</td><td>ALB</td></tr><tr><th>4</th><td>Afghanistan</td><td>2004</td><td>Water</td><td>34.1</td><td>total</td><td>ALB</td></tr></tbody></table>

</div></div></div></div></div></div>## 2.B. JMP Table Keys

### 2.B.1. JMP Categories (Retry)

```
jmp_categories_table = create_table_key(data_melted, 'jmp_category')
jmp_categories_table
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B45%5D%3A-%C2%A0-id-jmp_ca"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>id</th><th>jmp\_category</th></tr></thead><tbody><tr><th>0</th><td>1</td><td>ALB</td></tr><tr><th>1</th><td>2</td><td>SM</td></tr></tbody></table>

</div></div></div></div></div></div>### 2.B.2. JMP Value Types

```
value_types_table = create_table_key(data_melted, 'value_type')
value_types_table
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B46%5D%3A-%C2%A0-id-value_"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>id</th><th>value\_type</th></tr></thead><tbody><tr><th>0</th><td>1</td><td>annual\_rate\_change</td></tr><tr><th>1</th><td>2</td><td>total</td></tr></tbody></table>

</div></div></div></div></div></div>## 2.C. JMP Table Results

### 2.C.1. JMP Key Table Mapping

<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B47%5D%3A"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea">  
</div></div></div>```
table_with_id = merge_id(data_melted, value_types_table, 'value_type')
table_with_id = merge_id(table_with_id, countries_table, 'country')
table_with_id = merge_id(table_with_id, jmp_names_table, 'jmp_name')
table_with_id = merge_id(table_with_id, jmp_categories_table, 'jmp_category')
```

### 2.C.2. JMP Data Cleanup

<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell" id="bkmrk-remove-nullable-coun"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput">- Remove Nullable Country

</div></div></div></div><div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs" id="bkmrk-in%C2%A0%5B48%5D%3A"><div class="jp-Cell-inputWrapper">  
</div></div>```
table_with_id = table_with_id[table_with_id['country_id'] != 0].reset_index(drop=True)
```

### 2.C.3. JMP Final Result

```
table_with_id.head()
```

<div class="jp-Cell jp-CodeCell jp-Notebook-cell" id="bkmrk-out%5B49%5D%3A-%C2%A0-year-valu"><div class="jp-Cell-inputWrapper"><div class="jp-InputArea jp-Cell-inputArea"><div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor"><div class="CodeMirror cm-s-jupyter">  
</div></div></div></div><div class="jp-Cell-outputWrapper"><div class="jp-OutputArea jp-Cell-outputArea"><div class="jp-OutputArea-child jp-OutputArea-executeResult"><div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult"><div><table class="dataframe"><thead><tr style="text-align:right;"><th> </th><th>year</th><th>value</th><th>value\_type\_id</th><th>country\_id</th><th>jmp\_name\_id</th><th>jmp\_category\_id</th></tr></thead><tbody><tr><th>0</th><td>2000</td><td>37.6</td><td>2</td><td>2</td><td>1</td><td>1</td></tr><tr><th>1</th><td>2001</td><td>37.5</td><td>2</td><td>2</td><td>1</td><td>1</td></tr><tr><th>2</th><td>2002</td><td>37.5</td><td>2</td><td>2</td><td>1</td><td>1</td></tr><tr><th>3</th><td>2003</td><td>37.4</td><td>2</td><td>2</td><td>1</td><td>1</td></tr><tr><th>4</th><td>2004</td><td>37.4</td><td>2</td><td>2</td><td>1</td><td>1</td></tr></tbody></table>

</div></div></div></div></div></div>### 2.C.3. Save JMP Table

```
table_with_id.to_csv(JMP_OUTPUT_FILE, index=False)
```