smipoly.smip.monc module

Monomer categorization system for a list of compounds given as SMILES.

Classifies monomers based on functional groups and other criteria. This function processes a DataFrame containing SMILES strings, extracts and classifies monomers, and appends the results to the DataFrame. It also supports optional display of classification results.

smipoly.smip.monc.moncls(df, smiColn, minFG=None, maxFG=None, dsp_rsl=None)

Select monomers from a given dataset of small-molecule compounds and categorize them into monomer classes.

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing chemical data.

  • smiColn (str) – Column name in the DataFrame containing SMILES strings.

  • minFG (int, optional) – Minimum number of functional groups for poly-functionalized monomers. Defaults to 2.

  • maxFG (int, optional) – Maximum number of functional groups for poly-functionalized monomers. Defaults to 4.

  • dsp_rsl (bool, optional) – Whether to display classification results. Defaults to False.

Returns:

A modified DataFrame with classification results appended.

Return type:

pd.DataFrame

Raises:

ValueError – If the specified SMILES column name is invalid.
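As an illustration of the kind of check that could lead to this error, the sketch below validates a column name against a list of available columns. It is not the library's implementation: check_smiles_column and its error message are hypothetical, and the plain list stands in for list(df.columns).

```python
from typing import Iterable


def check_smiles_column(columns: Iterable[str], smiColn: str) -> str:
    """Return smiColn if it names an existing column, else raise ValueError.

    Hypothetical helper for illustration; not part of smipoly.
    """
    available = list(columns)
    if smiColn not in available:
        raise ValueError(
            f"SMILES column {smiColn!r} not found; available columns: {available}"
        )
    return smiColn


# Stand-in for list(df.columns) of a hypothetical input table.
columns = ["name", "SMILES"]
checked = check_smiles_column(columns, "SMILES")

try:
    # Column lookup is case-sensitive here, so "smiles" does not match "SMILES".
    check_smiles_column(columns, "smiles")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Running a check like this before calling moncls lets a caller report a misspelled column name early, with the list of valid names in the message.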

Notes

  • The function appends additional rows for carbonate structures.
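The signature lists minFG and maxFG as None, while the parameter descriptions give defaults of 2 and 4. One common way to reconcile the two is to treat None as a sentinel that is replaced by the documented default. The sketch below shows that pattern together with a range check on a functional-group count. The helper names, the inclusive bounds, and the error on an inverted range are all assumptions made for illustration, not behaviour confirmed by the library.

```python
from typing import Optional, Tuple


def resolve_fg_range(
    minFG: Optional[int] = None,
    maxFG: Optional[int] = None,
    default_min: int = 2,  # documented default for moncls
    default_max: int = 4,  # documented default for moncls
) -> Tuple[int, int]:
    """Replace None sentinels with the documented defaults (hypothetical helper)."""
    lo = default_min if minFG is None else minFG
    hi = default_max if maxFG is None else maxFG
    if lo > hi:
        # Assumption: an inverted range is treated as a caller error.
        raise ValueError(f"minFG ({lo}) must not exceed maxFG ({hi})")
    return lo, hi


def within_fg_range(
    n_groups: int, minFG: Optional[int] = None, maxFG: Optional[int] = None
) -> bool:
    """Check whether a functional-group count lies in the resolved range.

    Assumption: both bounds are inclusive.
    """
    lo, hi = resolve_fg_range(minFG, maxFG)
    return lo <= n_groups <= hi


default_range = resolve_fg_range()          # both sentinels replaced
wider_range = resolve_fg_range(maxFG=6)     # only minFG falls back to its default
two_groups_ok = within_fg_range(2)
one_group_ok = within_fg_range(1)
```

Under these assumptions a compound with two functional groups passes the default range and one with a single group does not. Passing default_min=1 would mirror the minimum documented for olecls.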

smipoly.smip.monc.olecls(df, smiColn, minFG=None, maxFG=None, dsp_rsl=None)

Select olefinic monomers from a given dataset of small-molecule compounds and categorize them into olefinic monomer classes.

Parameters:
  • df (pd.DataFrame) – The input DataFrame containing chemical data. Must include the structure of a compound written in SMILES.

  • smiColn (str) – The column name in the DataFrame containing SMILES strings.

  • minFG (int, optional) – Minimum number of functional groups to consider. Defaults to 1.

  • maxFG (int, optional) – Maximum number of functional groups to consider. Defaults to 4.

  • dsp_rsl (bool, optional) – Whether to display results during processing. Defaults to False.

Returns:

The updated DataFrame with olefin classification results.

Return type:

pd.DataFrame

Notes

  • The function assumes the existence of several global variables such as monLg, exclLg, mon_vals, mon_dic_inv, and Ps_rxnL.

  • The function modifies the input DataFrame by adding new columns for olefin classification.

  • The genmol, genc_smi, ole_sel_cru, update_nested_dict and diene_14 functions are defined in ‘funclib.py’.

  • The ole_cls column is refined for conjugated diene classification using a specific reaction.