smipoly.smip.funclib module

Helper functions used by MonomerClassifier (monc.py) and PolymerGenerator (polyg.py).

smipoly.smip.funclib.bipolymA(reactant, targ_rxn, monL, Ps_rxnL, P_class)

Generates a polymer constitutional repeating unit (CRU) formed from two monomers by applying the reaction iteratively until no further reaction is possible.

Parameters:
  • reactant (tuple) – A tuple of reactant molecules to be used in the reaction.

  • targ_rxn (rdkit.Chem.rdChemReactions.ChemicalReaction) – The target chemical reaction to apply.

  • monL (list) – A list of monomer SMARTS patterns indexed by integers.

  • Ps_rxnL (dict) – A dictionary of polymerization reaction objects indexed by integers.

  • P_class (str) – The polymer class of the product, which determines the reaction sequence.

Returns:

A list of SMILES strings representing the generated polymer products.

Return type:

list

smipoly.smip.funclib.coord_polym(smi, targ_rxn)

Generates a list of CRUs for olefin copolymers from the input SMILES string by applying the target reaction.

Parameters:
  • smi (str) – The SMILES string of the input molecule.

  • targ_rxn (rdkit.Chem.rdChemReactions.ChemicalReaction) – The target reaction to apply to the input molecule.

Returns:

A list of unique SMILES strings representing the products of the reaction.

Return type:

list

smipoly.smip.funclib.count_fg(m, patt)

Counts the number of functional groups (FG) in a molecule based on a given pattern.

Parameters:
  • m (rdkit.Chem.Mol) – The molecule object to search for substructure matches.

  • patt (rdkit.Chem.Mol) – The pattern molecule used to identify substructure matches.

Returns:

The number of functional groups identified in the molecule.

Return type:

int
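
The counting described above can be approximated with RDKit's substructure search. This is a minimal sketch, not the library's implementation: `count_fg_sketch` is a hypothetical stand-in, and the hydroxyl SMARTS pattern is chosen only for illustration.

```python
from rdkit import Chem


def count_fg_sketch(m, patt):
    """Hypothetical stand-in for count_fg: count matches of patt in m."""
    # GetSubstructMatches returns one tuple of atom indices per unique match.
    return len(m.GetSubstructMatches(patt))


mol = Chem.MolFromSmiles("OCCO")          # ethylene glycol
hydroxyl = Chem.MolFromSmarts("[OX2H]")   # illustrative hydroxyl pattern
n_oh = count_fg_sketch(mol, hydroxyl)     # two hydroxyl groups
```

Note that both arguments are molecule objects, so a SMARTS string has to be converted with `Chem.MolFromSmarts` before it is passed in.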

smipoly.smip.funclib.diene_12to14(smi, rxn)

Converts a 1,2-addition CRU to the corresponding 1,4-addition CRU. In the source file this function is defined immediately before diene_14() so that diene_14 can call it.

Parameters:
  • smi (str) – The input SMILES string containing asterisks (*) as placeholders.

  • rxn (rdkit.Chem.rdChemReactions.ChemicalReaction) – The reaction to apply; Ps_rxnL[209] is used.

Returns:

The resulting SMILES string after the reaction, with placeholders replaced back to asterisks (*).

Return type:

str

Raises:
  • rdkit.Chem.rdchem.KekulizeException – If the molecule sanitization fails.

  • IndexError – If the reaction does not produce any products.

smipoly.smip.funclib.diene_14(x, rxn)

Generate 1,4-addition CRU from a conjugated diene monomer.

Parameters:
  • x (dict) – The olefin classification results and the corresponding CRU structures generated by ole_sel_cru.

  • rxn (rdkit.Chem.rdChemReactions.ChemicalReaction) – The reaction to apply; Ps_rxnL[209] is used.

Returns:

The modified dictionary x with the transformed SMILES string in x['conjdiene'][2], if applicable. If 'conjdiene' is not present or is empty, the dictionary is returned unchanged.

Return type:

dict

smipoly.smip.funclib.genc_smi(m)

Generates an RDKit canonical SMILES string from a molecule object.

Parameters:

m (rdkit.Chem.Mol) – A molecule object, from the RDKit library.

Returns:

The SMILES string representation of the molecule if successful, otherwise returns np.nan.

Return type:

str or np.nan

smipoly.smip.funclib.genmol(s)

Generates a molecular object from a SMILES string.

Parameters:

s (str) – A SMILES (Simplified Molecular Input Line Entry System) string representing the molecular structure.

Returns:

A molecular object if the SMILES string is valid, otherwise returns numpy.nan.

Return type:

rdkit.Chem.Mol or numpy.nan
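
The documented contract of genmol and genc_smi (a valid result on success, NaN on failure) could look roughly like the following. This is a sketch under stated assumptions: `genmol_sketch` and `genc_smi_sketch` are hypothetical names, and the explicit `None` check reflects RDKit returning `None` for unparsable SMILES; the library's own error handling may differ.

```python
import numpy as np
from rdkit import Chem


def genmol_sketch(s):
    """Hypothetical stand-in for genmol: SMILES -> Mol, or np.nan on failure."""
    try:
        m = Chem.MolFromSmiles(s)
    except Exception:
        return np.nan
    # RDKit signals an unparsable SMILES by returning None.
    return m if m is not None else np.nan


def genc_smi_sketch(m):
    """Hypothetical stand-in for genc_smi: Mol -> canonical SMILES, or np.nan."""
    try:
        return Chem.MolToSmiles(m)
    except Exception:
        # Reached when m is not a molecule object, e.g. np.nan.
        return np.nan


canonical = genc_smi_sketch(genmol_sketch("OCC"))  # ethanol written two ways
bad_mol = genmol_sketch("c1ccccc")                 # unclosed ring, invalid
bad_smi = genc_smi_sketch(bad_mol)
```

Returning NaN instead of raising lets both helpers be applied across a whole column of inputs, with failures filtered out afterwards.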

smipoly.smip.funclib.homopolymA(mon1, mons, excls, targ_mon1, Ps_rxnL, mon_dic, monL)

Generates a polymer CRU formed from a single monomer by applying the reaction iteratively until no further reaction is possible.

Parameters:
  • mon1 (rdkit.Chem.Mol) – The initial monomer to start the polymerization process.

  • mons (list) – A list of SMARTS strings representing monomer patterns to match against the molecule.

  • excls (list) – A list of SMARTS strings representing exclusion patterns to check against the molecule.

  • targ_mon1 (object) – The target monomer class for the polymerization process.

  • Ps_rxnL (dict) – A dictionary of polymerization reaction objects indexed by integers.

  • mon_dic (dict) – A dictionary of monomer classes.

  • monL (list) – A list of monomer SMARTS patterns indexed by integers.

Returns:

A list of SMILES strings representing the generated homopolymers.

Return type:

list

smipoly.smip.funclib.monomer_sel_mfg(m, mons, excls)

Determines whether the given small-molecule compound qualifies as a self-polymerizable monomer and categorizes it into a monomer class.

Parameters:
  • m (rdkit.Chem.Mol) – The molecule to be analyzed. If None or NaN, the function returns default values.

  • mons (list of str) – A list of SMARTS strings representing monomer patterns to match against the molecule.

  • excls (list of str) – A list of SMARTS strings representing exclusion patterns to check against the molecule.

Returns:

A list containing:
  • fchk (bool): True if the molecule matches any monomer pattern and does not match any exclusion pattern, otherwise False.

  • fchk_c (int): The total count of substructure matches for all monomer patterns.

Return type:

list

smipoly.smip.funclib.monomer_sel_pfg(m, mons, excls, minFG, maxFG)

Determines whether the given small-molecule compound qualifies as a monomer. If so, counts the number of polymerizable functional groups and categorizes the compound into a monomer class.

Parameters:
  • m (rdkit.Chem.Mol) – The monomer molecule to evaluate.

  • mons (list of str) – A list of SMARTS patterns representing the functional groups to count in the monomer.

  • excls (list of str) – A list of SMARTS patterns representing the exclusion patterns to check against the monomer.

  • minFG (int) – The minimum number of functional groups required.

  • maxFG (int) – The maximum number of functional groups allowed.

Returns:

A list containing:
  • fchk (bool): True if the monomer satisfies the conditions, False otherwise.

  • fchk_c (int): The total count of functional groups found in the monomer.

Return type:

list
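
One plausible reading of the selection logic is sketched below: count matches over all functional-group patterns, reject the molecule if any exclusion pattern matches, and require the count to lie between minFG and maxFG. The function name `monomer_sel_pfg_sketch`, the SMARTS patterns, and the exact way the conditions are combined are assumptions made for illustration, not the library's code.

```python
from rdkit import Chem


def monomer_sel_pfg_sketch(m, mons, excls, minFG, maxFG):
    """Hypothetical stand-in: return [fchk, fchk_c] for molecule m."""
    # Total number of matches over all functional-group patterns.
    fchk_c = sum(
        len(m.GetSubstructMatches(Chem.MolFromSmarts(p))) for p in mons
    )
    # Assumed behavior: any exclusion match disqualifies the molecule.
    excluded = any(
        m.HasSubstructMatch(Chem.MolFromSmarts(p)) for p in excls
    )
    fchk = (minFG <= fchk_c <= maxFG) and not excluded
    return [fchk, fchk_c]


hydroxyl = ["[OX2H][CX4]"]     # illustrative alcohol pattern
acid = ["[CX3](=O)[OX2H]"]     # illustrative exclusion: carboxylic acid

result_diol = monomer_sel_pfg_sketch(
    Chem.MolFromSmiles("OCCO"), hydroxyl, acid, 2, 2)
result_ethanol = monomer_sel_pfg_sketch(
    Chem.MolFromSmiles("CCO"), hydroxyl, acid, 2, 2)
result_hydroxy_acid = monomer_sel_pfg_sketch(
    Chem.MolFromSmiles("OCC(=O)O"), hydroxyl, acid, 1, 2)
```

In this sketch ethylene glycol passes as a difunctional monomer, ethanol fails the minimum count, and glycolic acid is rejected by the exclusion pattern even though its count is in range.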

smipoly.smip.funclib.ole_cru_gen(m, mon)

Generates a CRU from an olefinic monomer by applying a reaction iteratively until no further reactions are possible.

Parameters:
  • m (rdkit.Chem.Mol) – The input molecule to which the reaction will be applied.

  • mon (str) – A SMARTS string representing the monomer pattern.

Returns:

A list containing:
  • rdkit.Chem.Mol: The final CRU after all reactions.

  • list of str: A list of SMILES strings for CRUs.

Return type:

list

Raises:
  • Exception – If there is an issue with sanitizing the molecule during reaction processing.
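
The "apply until nothing reacts" loop can be illustrated with an RDKit reaction. The sketch below is not the library's implementation: `ole_cru_sketch` is a hypothetical name, and the reaction SMARTS (open a C=C bond and cap both carbons with `*`) is an assumed stand-in for whatever ole_rxnsmarts_gen produces.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Assumed reaction: open a C=C double bond and attach "*" to both carbons.
rxn = AllChem.ReactionFromSmarts("[CX3:1]=[CX3:2]>>*-[C:1]-[C:2]-*")
olefin = Chem.MolFromSmarts("[CX3]=[CX3]")


def ole_cru_sketch(m):
    """Hypothetical stand-in: return [final Mol, SMILES after each step]."""
    smiles_steps = []
    # Keep reacting while an unreacted C=C bond remains.
    while m.HasSubstructMatch(olefin):
        products = rxn.RunReactants((m,))
        m = products[0][0]        # take the first product set
        Chem.SanitizeMol(m)       # may raise if the product is invalid
        smiles_steps.append(Chem.MolToSmiles(m))
    return [m, smiles_steps]


# 1,5-hexadiene has two isolated double bonds, so the loop runs twice.
final_mol, steps = ole_cru_sketch(Chem.MolFromSmiles("C=CCCC=C"))
final_smiles = Chem.MolToSmiles(final_mol)
```

Each pass consumes one double bond, so the loop terminates once the pattern no longer matches; the sanitization step is where an exception of the kind listed under Raises would surface.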

smipoly.smip.funclib.ole_rxnsmarts_gen(reactant)

Generates a polymerization reaction SMARTS string for a given olefinic monomer. In the source file this function is defined immediately before ole_cru_gen() so that ole_cru_gen can call it.

Parameters:

reactant (str) – The input reactant string in SMARTS format.

Returns:

The reaction SMARTS string representing the transformation from the reactant to the product.

Return type:

str

smipoly.smip.funclib.ole_sel_cru(m, mons, excls, minFG, maxFG)

Checks whether a molecule qualifies as an olefinic monomer using monomer_sel_pfg and generates a SMILES representation of the processed molecule.

Parameters:
  • m (rdkit.Chem.Mol) – The molecule to be processed.

  • mons (list of str) – A list of SMARTS patterns representing the functional groups to count in the monomer.

  • excls (list of str) – A list of SMARTS patterns representing the exclusion patterns to check against the monomer.

  • minFG (int) – The minimum number of olefinic polymerizable sites required.

  • maxFG (int) – The maximum number of olefinic polymerizable sites allowed.

Returns:

A list containing:
  • The result of the monomer_sel_pfg function (list of bool and other values).

  • The SMILES representation of the processed molecule (str).

Return type:

list

smipoly.smip.funclib.seq_chain(prod_P, targ_mon1, Ps_rxnL, mon_dic, monL)

Applied to multifunctional monomers that undergo chain polymerization, except polyolefins. Processes a molecular structure by applying sequential reactions based on specific substructure matches.

Parameters:
  • prod_P (rdkit.Chem.Mol) – The input molecule to be processed.

  • targ_mon1 (str) – Target monomer type, used to determine processing logic.

  • Ps_rxnL (dict) – A dictionary of polymerization reaction objects indexed by integers.

  • mon_dic (dict) – A dictionary of monomer classes (not used in this function).

  • monL (list) – A list of monomer SMARTS patterns indexed by integers.

Returns:

The processed molecule after applying the reactions.

Return type:

rdkit.Chem.Mol

smipoly.smip.funclib.seq_successive(prod_P, targ_rxn, monL, Ps_rxnL, P_class)

Applied to multifunctional monomers that undergo successive polymerization. Processes a molecular structure by applying sequential reactions based on specific substructure matches.

Parameters:
  • prod_P (rdkit.Chem.Mol) – The product molecule to be processed.

  • targ_rxn (Any) – Target reaction (not used in the current implementation).

  • monL (list) – A list containing SMARTS patterns for functional groups.

  • Ps_rxnL (list) – A list of polymerization reaction objects to be applied to the product molecule.

  • P_class (str) – The polymer class of the product molecule, which determines the reaction sequence.

Returns:

The processed product molecule after applying the reaction sequence.

Return type:

rdkit.Chem.Mol

Notes

  • The function uses substructure matching to determine which reactions to apply.

  • The behavior of the function depends on the P_class of the molecule.

  • Specific reaction sequences are applied for classes such as ‘polyolefin’, ‘polyoxazolidone’, ‘polyimide’, and ‘polyester’.

  • If the P_class is not recognized, the product molecule is returned unchanged.

smipoly.smip.funclib.update_nested_dict(row, dict_col, new_val, updated_k)

Used by the function ‘olecls’. If the classification result for an olefin class is True, writes the number of functional groups and the SMILES notation of the CRU into the nested dictionary.

Parameters:
  • row (dict) – The dictionary representing a row of data.

  • dict_col (str) – The key in the row that contains the nested dictionary to be updated.

  • new_val (str) – The key in the row whose value will be assigned to the nested dictionary.

  • updated_k (str) – The key in the nested dictionary to be updated.

Returns:

The updated row with the modified nested dictionary.

Return type:

dict
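
Based only on the parameter descriptions above, the update amounts to copying one value of the row into a nested dictionary held in the same row. The sketch below makes that reading concrete; `update_nested_dict_sketch` and the example keys (`olefin_cls`, `cru_smiles`, `vinyl`) are hypothetical, and any conditional check on the classification result is assumed to happen in the caller.

```python
def update_nested_dict_sketch(row, dict_col, new_val, updated_k):
    """Hypothetical stand-in: copy row[new_val] into row[dict_col][updated_k]."""
    row[dict_col][updated_k] = row[new_val]
    return row


# Example row with made-up keys: a nested classification dict and a CRU SMILES.
row = {
    "olefin_cls": {"vinyl": None, "cyclic": None},
    "cru_smiles": "*CC(*)c1ccccc1",
}
updated = update_nested_dict_sketch(row, "olefin_cls", "cru_smiles", "vinyl")
```

Because the nested dictionary is modified in place, the returned row is the same object that was passed in.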