Reproduction: Data-Driven Design of High-Performance Polyimides With Enhanced Heat Resistance and Dielectric Properties
Title (Chinese translation): 数据驱动的高性能聚酰亚胺设计:增强耐热性与介电性能
Journal: Advanced Functional Materials
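Paper walkthrough:
Introduction: sets up the background, opens with general context, and builds toward the problem step by step
- Subject: polyimides (PIs) and the demands placed on them at high frequency and high temperature
- Strategy: MGA. The paper first covers the shortcomings of traditional strategies, then emphasizes the MGA approach
- MGA: The MGA is a systematic research strategy to accelerate the discovery and optimization of new materials through high-throughput experimentation, computation, simulation, and data analysis. The core of MGA is to accurately predict material properties through machine learning (ML) and effectively screen desired materials from candidates.
- Examples: what other groups have achieved with MGA, leading into the point that MGA is already being applied in the PI field as well.
- However (the turning point): However, applying these methods to the design of polyimides still faces numerous challenges, such as the issue of data sparsity in high-frequency dielectric properties and the need for precise representation of polyimide structures.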
- Core tensions: data are scarce in the high-frequency range, and polyimide structures need a precise representation
- Method for data scarcity: the classical Havriliak-Negami (H-N) dielectric relaxation dual-parameter model. The H-N model compensates for the scarce high-frequency data by bringing a physics-based empirical model into the workflow as prior knowledge
- Method for structure representation: developed multi-level descriptors to comprehensively capture the characteristics of the molecular structures of PIs. Multi-level descriptors are used to characterize the structure
- Multi-task learning (MTL) with hierarchical neural networks (HNN): Subsequently, using these data, we employed multi-task learning (MTL) with hierarchical neural networks (HNN) to establish an efficient and accurate machine learning model.
- Artificial neural network (ANN): Meanwhile, an artificial neural network (ANN) model for predicting the glass transition temperature of PIs was developed
- Genetic algorithm: A genetic algorithm was employed to create a series of PIs exhibiting exceptional high-frequency dielectric properties and heat resistance.
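The genetic-algorithm step in the list above can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in: the bit-string genome (one bit per imagined building block), the `WEIGHTS` surrogate score, and the population settings are invented for illustration. In a workflow like the paper's, the fitness would presumably come from the trained ML models rather than from a fixed weight table.

```python
import random

# Hypothetical surrogate score: one weight per imagined building block.
# A real workflow would call trained property-prediction models here.
WEIGHTS = [0.8, -0.5, 0.3, 0.6, -0.2, 0.4, -0.7, 0.1]


def fitness(genome):
    """Score a candidate: sum of the weights of the blocks switched on."""
    return sum(w * g for w, g in zip(WEIGHTS, genome))


def evolve(genome_len, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        elite = population[:2]  # elitism: carry the two best over unchanged
        children = []
        while len(children) < pop_size - len(elite):
            # Tournament selection: best of three random candidates, twice.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation on each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    best_per_generation.append(fitness(best))
    return best, best_per_generation


best, history = evolve(genome_len=len(WEIGHTS))
print(best, round(fitness(best), 2))
```

Because the two best candidates are always carried over, the best score per generation can never decrease. That property is a common reason to pair elitism with an expensive or noisy fitness function.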
Results and Discussion
- Section 2.1 introduces the data preparation, and Section 2.2 details the workflow of ML model construction, as shown in Figure 1b. Section 2.3 describes the structural design of polyimides using genetic algorithms. In Section 2.4, we validated the reliability of the MGA method through experiments. Section 2.5 reveals potential chemical rules through interpretable feature analysis.
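2.1 Data preparation
Raw data: polyimide repeat units, dielectric properties (dielectric constant and dielectric loss at different frequencies and temperatures), glass transition temperature, and other parameters taken from PoLyInfo; a further portion was extracted manually.
Scope restriction: all polyimides are of the non-crosslinkable type
Data expansion: Considering the need for sufficient low-frequency dielectric performance data to obtain the parameters of Equation 1, fitting was performed for 13 kinds of PIs that met the criteria (having at least 5 data points for low-frequency dielectric properties at the same test temperature). In short: fit the model on the low-frequency data, then use it to calculate the high-frequency values.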
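The fit-then-extrapolate idea can be sketched as follows. These notes do not reproduce Equation 1, so the sketch assumes the standard H-N form `eps*(w) = eps_inf + delta_eps / (1 + (i*w*tau)**alpha)**beta`, with `alpha` and `beta` as the two shape parameters. The function name and every parameter value below are hypothetical, not fitted values from the paper.

```python
import math


def dielectric_response(freq_hz, eps_inf, delta_eps, tau, alpha, beta):
    """Evaluate the (assumed) Havriliak-Negami model at one frequency.

    eps*(w) = eps_inf + delta_eps / (1 + (i*w*tau)**alpha)**beta
    Returns (eps_real, eps_loss, tan_delta) using eps* = eps' - i*eps''.
    """
    omega = 2 * math.pi * freq_hz
    eps = eps_inf + delta_eps / (1 + (1j * omega * tau) ** alpha) ** beta
    eps_real = eps.real
    eps_loss = -eps.imag  # sign convention: eps* = eps' - i*eps''
    return eps_real, eps_loss, eps_loss / eps_real


# Hypothetical parameters, standing in for values fitted on low-frequency data.
params = dict(eps_inf=2.8, delta_eps=0.6, tau=1e-4, alpha=0.5, beta=0.8)

# Extrapolate to a high frequency where no measurement is available.
eps_real, eps_loss, tan_delta = dielectric_response(1e10, **params)
print(f"10 GHz: eps'={eps_real:.4f}, eps''={eps_loss:.5f}, tan d={tan_delta:.5f}")
```

With `alpha = beta = 1` the expression reduces to the Debye model, which gives a convenient sanity check for any implementation. Obtaining the parameters themselves would need a nonlinear least-squares fit to the low-frequency points, which this sketch leaves out.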
![[../../images/Pasted image 20251104140004.png]]Data distribution
![[../../images/Pasted image 20251104140130.png]]On data representation: polymers are represented with the Simplified Molecular Input Line Entry System (SMILES)
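Structural feature representation (multi-level descriptors): were automatically generated using the Python third-party library Mordred.
A second source: RDKit's built-in molecular descriptors. Feature engineering: the number of descriptors was reduced to 130, 107, and 170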
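The notes give the reduced descriptor counts but not the procedure used to reach them. One common possibility is to drop constant columns and then drop any column that is almost perfectly correlated with one already kept. The sketch below shows that approach as an assumption only, not as the paper's documented method. The descriptor names, values, and the `corr_threshold` default are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length, non-constant lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def prune_descriptors(table, corr_threshold=0.95):
    """Return the names of descriptors kept after two simple filters.

    table maps descriptor name -> list of values, one per polymer.
    """
    kept = []
    for name, values in table.items():
        if max(values) == min(values):
            continue  # constant column carries no information
        if any(abs(pearson(values, table[k])) > corr_threshold for k in kept):
            continue  # nearly duplicates a descriptor that is already kept
        kept.append(name)
    return kept


# Hypothetical descriptor table for four polymers.
descriptors = {
    "desc_a": [1.0, 2.0, 3.0, 4.0],
    "desc_b": [2.0, 4.0, 6.0, 8.1],   # almost a rescaled copy of desc_a
    "desc_c": [5.0, 5.0, 5.0, 5.0],   # constant
    "desc_d": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with desc_a
}
print(prune_descriptors(descriptors))  # ['desc_a', 'desc_d']
```

The result depends on column order, because the first descriptor of a correlated pair is the one that survives. Sorting columns by relevance to the target beforehand is one way to make that choice deliberate.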
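Substructure descriptor (SD) method: key motifs, that is, finer-grained structural units.
gSpan algorithm: a graph-based substructure pattern mining algorithm, used to extract from each polyimide the substructures related to each property, so that the contributions of different substructures to the target property can be integrated
Combined with quantitative structure-property relationship analysis, we finally obtained 508, 284, and 1313 features, respectively, for building quantitative models that relate polyimide structure to dielectric constant, dielectric loss, and glass transition temperature.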
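2.2 Building quantitative structure-property relationship models with machine learning
- In short: the HNN handles the multi-fidelity problem; the ANN predicts the missing glass transition temperatures; the widely used Gaussian process regression (GPR) model is set up as the baseline against which the neural network models are evaluated.
- Due to the significant impact of the size of the feature set on the performance of ML models, a practical and highly accurate dimensionality reduction method is necessary to reduce redundant information in the features.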
- We selected three feature reduction methods used in ML, including LASSO regression, ridge regression, and recursive feature elimination (RFE)
- Note that we have taken the logarithm of the dielectric constants and the dielectric losses to guarantee that the final predicted values are all positive.
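Two of the points above, L1-regularized feature reduction and the logarithmic target, can be shown together in a small sketch. It is a pure-Python coordinate-descent LASSO written for illustration, not the authors' code. The ±1-coded feature columns, the loss values, and `lam` are invented. The model is fitted to `log10` of the target, so the back-transformed prediction `10**value` is positive by construction.

```python
import math


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=50):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w


# Hypothetical data: three centered descriptors for four polymers.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
tan_delta = [0.033, 0.003, 0.030, 0.0033]  # invented loss values, all positive

log_y = [math.log10(v) for v in tan_delta]  # train in log space
intercept = sum(log_y) / len(log_y)
centered = [v - intercept for v in log_y]

w = lasso_coordinate_descent(X, centered, lam=0.05)
print([round(c, 3) for c in w])  # weak and irrelevant features become exactly 0

# Back-transform: 10**x is positive for any real x, whatever the model outputs.
predictions = [10 ** (intercept + sum(c * x for c, x in zip(w, row))) for row in X]
print([round(p, 4) for p in predictions])
```

In practice a library implementation such as scikit-learn's `Lasso`, `Ridge`, and `RFE` would normally be used in place of the hand-written loop. The sketch only makes the mechanism visible: the L1 penalty sets small coefficients to exactly zero, which is what makes it usable for feature selection.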
To be continued