The first source of polyphenols for this study is the Phenol-Explorer database, version 3.6 (www.phenol-explorer.eu). From its downloads page (main menu bar >> Downloads) we selected the file "Polyphenols having composition data: compounds.csv", which lists every polyphenol that has content data in foods and beverages.
Table 1: Polyphenols in Phenol Explorer
## Number of variables for each polyphenol: 13
## Total number of polyphenols in phenol explorer (november 2022): 501
There are a total of 501 polyphenols in Phenol-Explorer that have curated, published experimental data. Now let’s see which of them are reported in red wine.
This is the central question of the study. To answer it, we need a second data frame from the Phenol-Explorer database (main menu bar >> Downloads): "Complete composition data: composition-data.csv", which reports the content of each polyphenol in every food.
## Number of variables for each polyphenol: 17
## Total number of entries for each food with their polyphenols composition data (november 2022, phenol explorer): 7486
There are a total of 7486 entries describing every food with its polyphenol composition in Phenol-Explorer. The next step is to explore the data to identify how many foods and beverages the database contains.
## Number of food/beverage reported in phenol-explorer 458
Now we can answer the question: which food or beverage has the largest number of different polyphenols?
Figure 1: Top 10 foods/beverages by number of polyphenol compounds
As we can see, there is information about 458 different foods and beverages, and red wine is the one with the largest number of different polyphenols according to Phenol-Explorer. For this study we analyse the polyphenols found in red wine; therefore, the next step is to select only the entries labeled Wine [red].
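The ranking behind Figure 1 can be illustrated with a minimal sketch. It assumes each composition entry is a record with a `food` and a `compound` field; both column names and the toy rows are hypothetical and do not reproduce the real file.

```python
from collections import defaultdict

def top_foods_by_compound_count(rows, n=10):
    """Return the n foods with the most distinct compounds as (food, count) pairs."""
    compounds_by_food = defaultdict(set)
    for row in rows:
        # A set ensures a compound reported several times is counted once per food.
        compounds_by_food[row["food"]].add(row["compound"])
    counts = [(food, len(names)) for food, names in compounds_by_food.items()]
    # Sort by count (descending), then by name so ties are deterministic.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:n]

# Toy rows standing in for composition-data.csv (illustrative only).
toy_rows = [
    {"food": "Wine [red]", "compound": "Quercetin"},
    {"food": "Wine [red]", "compound": "Malvidin 3-O-glucoside"},
    {"food": "Wine [red]", "compound": "Quercetin"},  # repeated entry
    {"food": "Apple", "compound": "Quercetin"},
]
ranking = top_foods_by_compound_count(toy_rows, n=2)
```

Counting distinct names rather than rows matters here, because the same compound can appear in several entries for one food.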
Table 2: Polyphenols in Red wine (Phenol-explorer raw data)
## Number of variables of Polyphenols in Red Wine: 17
## Total number of polyphenols in red wine (november 2022, phenol explorer): 123
Now we need to check that every selected item appears in both data frames. To do so, we compare the two sets of names and identify any polyphenols that differ:
## Polyphenols that are different between both data frames: 02 mers 03 mers 04-06 mers 07-10 mers Polymers (>10 mers) Polyphenols, total
Six entries are present in the food composition data frame but not in the general Phenol-Explorer compound data frame. This is because these six entries represent general groups of polyphenols rather than specific compounds; therefore, they will be filtered out.
## List of polyphenols to eliminate: 02 mers 03 mers 04-06 mers 07-10 mers Polymers (>10 mers) Polyphenols, total
## Number of polyphenols to filter out: 6
## Total number of polyphenols in red wine (filtered list): 117
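The consistency check and the filtering step can be sketched as a set difference followed by a row filter. This is an illustration only: the `compound` column name and the toy rows are assumptions, while the two aggregate labels are taken from the list above.

```python
def names_missing_from_reference(composition_names, reference_names):
    """Names present in the composition data but absent from the compound list."""
    reference = set(reference_names)
    return sorted({name for name in composition_names if name not in reference})

def drop_names(rows, names_to_drop, key="compound"):
    """Return only the rows whose name is not in names_to_drop."""
    unwanted = set(names_to_drop)
    return [row for row in rows if row[key] not in unwanted]

# Toy data (illustrative only).
reference_names = ["Quercetin", "Resveratrol"]
red_wine_rows = [
    {"compound": "Quercetin"},
    {"compound": "02 mers"},
    {"compound": "Polyphenols, total"},
]
to_remove = names_missing_from_reference(
    [row["compound"] for row in red_wine_rows], reference_names
)
filtered_rows = drop_names(red_wine_rows, to_remove)
```

With the real data, the same arithmetic gives 123 - 6 = 117 entries.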
At this point, of the 501 polyphenols in total, we have retained the 117 that are present in red wine according to the Phenol-Explorer database.
Now that we have our table of polyphenols in red wine, it is important to combine all the information from the two data sets (Tables 1 and 2), which have different numbers of variables. We therefore merge them into a single table.
## Number of variables of Polyphenols in Red wine with composition: 27
## Total number of Polyphenols in Red wine with composition (november 2022, Phenol Explorer): 117
We can see that we end up with 27 variables and 117 polyphenols.
The next step is to check that there are no duplicates.
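A merge of this kind can be sketched as a left join on the compound name. The key name `compound`, the other columns and the toy values are hypothetical; the report does not state which key was actually used.

```python
def left_join(left_rows, right_rows, key):
    """Attach the columns of right_rows to every row of left_rows that shares `key`."""
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        extra = right_by_key.get(row[key], {})
        # Values from the left table win when both tables share a column name.
        merged.append({**extra, **row})
    return merged

# Toy tables (illustrative values only).
red_wine_composition = [{"compound": "Quercetin", "mean": 1.0}]
compound_info = [
    {"compound": "Quercetin", "class": "Flavonoids"},
    {"compound": "Resveratrol", "class": "Stilbenes"},
]
merged_table = left_join(red_wine_composition, compound_info, key="compound")
```

Because shared columns are kept only once, the merged table has fewer variables than the sum of the two inputs, which is consistent with obtaining 27 variables from tables of 17 and 13.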
## Number of duplicates in table 4: 13
## List of the duplicates: Hesperetin Naringenin Apigenin Isorhamnetin Kaempferol Myricetin Quercetin Rhamnetin Lariciresinol Matairesinol Pinoresinol Secoisolariciresinol p-Coumaric acid
## Number of unique Polyphenols with composition: 104
There are 104 unique polyphenols in the merged table, and 13 of them appear twice. These duplicates arise because the compounds were analyzed by two different methods (chromatography and chromatography after hydrolysis).
For the analysis in the next section we will use the unique polyphenol names, so let’s build a table with unique names (keeping the chromatography method).
Table 3: Polyphenols in Red wine (Phenol-explorer clean data)
## Number of unique Polyphenols with composition: 104
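The duplicate check and the choice of one row per compound can be sketched as follows. The column names `compound` and `method` are assumptions made for illustration, and the preference rule is one possible reading of "using Chromatography methodology".

```python
from collections import Counter

def duplicated_names(rows, key="compound"):
    """Return the sorted names that occur in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(name for name, count in counts.items() if count > 1)

def keep_preferred_method(rows, method, key="compound", method_key="method"):
    """Keep one row per compound, preferring the row measured with `method`."""
    chosen = {}
    for row in rows:
        name = row[key]
        current = chosen.get(name)
        if current is None or (row[method_key] == method and current[method_key] != method):
            chosen[name] = row
    return list(chosen.values())

# Toy rows (illustrative only).
toy_rows = [
    {"compound": "Quercetin", "method": "Chromatography after hydrolysis"},
    {"compound": "Quercetin", "method": "Chromatography"},
    {"compound": "Malvidin 3-O-glucoside", "method": "Chromatography"},
]
duplicates = duplicated_names(toy_rows)
unique_rows = keep_preferred_method(toy_rows, method="Chromatography")
```

Applied to the real table, 117 rows with 13 duplicated names reduce to 117 - 13 = 104 unique polyphenols.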
At this point we have a single table of 104 unique polyphenols built from Phenol-Explorer data, but further polyphenol information exists outside this database.
Various articles and scientific papers report additional polyphenols in red wine, so the data from these publications were collected manually and compiled into a CSV table.
Table 4: Polyphenols in red wine from bibliography
## Number of variables of Polyphenols in Bibliography: 21
## total number of polyphenols in Bibliography: 40
## Polyphenols repeated in Phenol explorer and Bibliography:
There are no duplicates between Table 3 and Table 4.
The next step is to merge the polyphenol lists from the two sources (Phenol-Explorer and the bibliography), but first let’s explore the relation between them with a Venn diagram.
This diagram shows that 20 polyphenols are in the Phenol-Explorer database but have not yet been reported in red wine according to its curators. Another 20 polyphenols are not listed in Phenol-Explorer as compounds.
Which polyphenols fall into each of these two groups?
## Polyphenols that are in Phenol explorer but not classified as Red wine: (+)-Catechin 3-O-gallate (+)-Gallocatechin 3-O-gallate Caffeic acid ethyl ester Cinnamic acid Cyanidin Cyanidin 3-O-(6''-p-coumaroyl-glucoside) Cyanidin 3-O-rutinoside Delphinidin 3,5-O-diglucoside Ellagic acid Feruloyl tartaric acid Homovanillic acid Homoveratric acid Morin Myricetin 3-O-galactoside Myricetin 3-O-glucoside Pelargonidin 3,5-O-diglucoside Petunidin 3,5-O-diglucoside Quercetin 3-O-galactoside Quercetin 3-O-glucuronide Quercetin 3-O-xyloside
##
## Polyphenols in Red wine but are not in Phenol Explorer: (R)-rosmarinic acid Cyanidin-3,5-O-diglucoside Delphinidin 3-O-(6-caffeoyl-glucoside) Dihydrokaempferol-3-O-rhamnoside Ellagitannin Fisetin Kaempferol-3-O-caffeoylate Laricitrin-3-O-glucoside Malvidin-3-O-(6-acetyl-glucoside)-5-O-glucoside Malvidin-3-O-(6-caffeoyl-glucoside)-5-O-glucoside Malvidin-3-O-glucoside 4-pyruvic acid Malvidin-3,5-diglucoside Pelargonidin-3-O-(6-caffeoyl-glucoside) Pelargonidin-3-O-(6-coumaroyl)glucoside) Peonidin-3-O-(6-caffeoyl-glucoside)-5-O-glucoside Peonidin-3-O-(6-coumaroyl)glucoside)-5-O-glucoside Peonidin-3,5-O-diglucoside Petunidin-3-O-(6-caffeoyl-glucoside)-5-Oglucoside Syringetin-3-O-galactoside Syringetin-3-O-glucoside
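The regions of the Venn diagram correspond to simple set operations. The sketch below uses a handful of names from the lists above as toy sets; it is not the full data, and the variable names are hypothetical.

```python
# Toy sets built from a few names that appear in the report.
phenol_explorer_all = {"Quercetin", "Ellagic acid", "Morin"}
phenol_explorer_red_wine = {"Quercetin"}
bibliography = {"Ellagic acid", "Morin", "Fisetin"}

# Names reported by both sources for red wine (empty here, as in the report).
shared = phenol_explorer_red_wine & bibliography

# In Phenol-Explorer, but not classified there as red wine.
in_database_not_red_wine = (bibliography & phenol_explorer_all) - phenol_explorer_red_wine

# Reported in the bibliography but absent from Phenol-Explorer altogether.
outside_database = bibliography - phenol_explorer_all

# The union of both red wine lists is the merged set of names.
merged_names = phenol_explorer_red_wine | bibliography

# Totals from the report: 104 + 20 + 20 = 144.
expected_total = 104 + 20 + 20
```

Since the two red wine lists do not overlap, the size of the union is just the sum of the two list sizes.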
Now let’s merge the two lists to obtain our final table.
Table 5: Merged table of Phenol-Explorer and literature search
## Number of variables of Polyphenols in Red wine: 28
## Total number of polyphenols in red wine: 144
After the merge, we can see a total of 144 polyphenols in red wine, from the Phenol-Explorer database and from the bibliography, with 28 variables. Let’s now look for missing data and clean up the variables.
## Building the polyphenol red wine database
First let’s see which information is missing:
There are 40 missing synonyms, corresponding to the polyphenols from the bibliography. We will complete them using the PubChem database, and also add the canonical SMILES and InChIKey for each polyphenol.
It is important to note that 20 polyphenols have no information about their content in red wine, because they were detected in red wine but the publications are not clear about the amount.
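A per-column count of missing values can be sketched as below. The column names, and the convention that `None`, an empty string or `"NA"` marks a missing value, are assumptions for illustration.

```python
def count_missing(rows, columns, missing_markers=("", "NA")):
    """Count, for each column, the rows where the value is absent or a missing marker."""
    missing = {column: 0 for column in columns}
    for row in rows:
        for column in columns:
            value = row.get(column)
            if value is None or str(value).strip() in missing_markers:
                missing[column] += 1
    return missing

# Toy rows (illustrative only).
toy_rows = [
    {"compound": "Fisetin", "synonyms": "", "mean": None},
    {"compound": "Quercetin", "synonyms": "placeholder synonym", "mean": 1.0},
]
missing_counts = count_missing(toy_rows, ["synonyms", "mean"])
```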
## PubChem
Canonical SMILES, synonyms, InChI, InChIKey and molecular weight were retrieved from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) (Kim et al., 2022), and the PubChem IDs were checked against it. The 144 polyphenols were searched by name. The resulting data frame is:
## Number of variables of Pubchem data frame: 24
## total number of polyphenols in PubChem: 140
We can see that 140 of the 144 compounds have a PubChem ID. The remaining 4 have none: they were also searched by name, and no information was found. Because the missing ID makes further analysis difficult, we keep them in the red wine polyphenol database but do not use them in further analysis.
## Total number of polyphenols in red wine with pubchem: 140
## [1] 28
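One way to look a compound up by name is PubChem's PUG REST interface. The sketch below only builds the request URL and makes no network call. It is not necessarily how the data in this report were retrieved, and the property names used (`CanonicalSMILES`, `InChIKey`, `MolecularWeight`) are given as I understand them and should be checked against the current PUG REST documentation.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def pubchem_property_url(name, properties=("CanonicalSMILES", "InChIKey", "MolecularWeight")):
    """Build a PUG REST URL that requests selected properties for a compound name."""
    # Percent-encode the name so spaces and other reserved characters are URL-safe.
    encoded_name = quote(name, safe="")
    return (
        f"{PUG_REST_BASE}/compound/name/{encoded_name}"
        f"/property/{','.join(properties)}/JSON"
    )

example_url = pubchem_property_url("Ellagic acid")
```

A name that PubChem cannot resolve returns an error response instead of a record, which is one way compounds can end up without a PubChem ID.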
Now we add the synonyms, PubChem ID, InChIKey and SMILES.
## Number of variables of Polyphenols in Red wine data base: 32
## Total number of polyphenols in red wine data base for analysis: 140
At this step we have our table of 140 polyphenols with complete PubChem ID, SMILES, InChI and synonyms, with 32 variables for each polyphenol.
Now we clean our table and eliminate the columns we do not need. For example, the columns created_at, updated_at and id concern only the Phenol-Explorer database. Likewise, food and food_group_food_sub_group are the same in every row, because all entries are from wine, so we eliminate them.
Table 7: Red wine Polyphenols for analysis
## Number of variables of Polyphenols in Red wine: 25
## Total number of polyphenols in red wine: 140
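Dropping bookkeeping and constant columns can be sketched in a few lines. The column names come from the text above; the values in the toy row are placeholders.

```python
def drop_columns(rows, columns_to_drop):
    """Return copies of the rows without the listed columns."""
    unwanted = set(columns_to_drop)
    return [{key: value for key, value in row.items() if key not in unwanted} for row in rows]

# Toy row (placeholder values only).
toy_rows = [
    {
        "id": 1,
        "compound": "Quercetin",
        "food": "Wine [red]",
        "created_at": "placeholder",
        "updated_at": "placeholder",
    }
]
cleaned_rows = drop_columns(toy_rows, ["id", "food", "created_at", "updated_at"])
```

The function builds new dictionaries, so the original rows stay available in case a dropped column is needed later.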
Thanks to PubChem we have completed more of the data and added SMILES information. Now we are ready to explore and visualize our red wine polyphenols.
# Exploration of the data
Let’s explore our final table of 140 polyphenols, grouped by polyphenol class and subclass.
In Figure 3 we can observe that, in red wine, flavonoids are the most represented class, with phenolic acids in second place.
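Grouping by class and subclass amounts to counting rows per combination of values. In this sketch the column names `class` and `subclass` are hypothetical and the rows are toy data.

```python
from collections import Counter

def count_by_group(rows, *keys):
    """Count rows for each combination of values in the given columns."""
    return Counter(tuple(row[key] for key in keys) for row in rows)

# Toy rows (illustrative only).
toy_rows = [
    {"compound": "Malvidin 3-O-glucoside", "class": "Flavonoids", "subclass": "Anthocyanins"},
    {"compound": "Quercetin", "class": "Flavonoids", "subclass": "Flavonols"},
    {"compound": "Gallic acid", "class": "Phenolic acids", "subclass": "Hydroxybenzoic acids"},
]
by_class = count_by_group(toy_rows, "class")
by_class_and_subclass = count_by_group(toy_rows, "class", "subclass")
ranked_classes = by_class.most_common()
```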
Let’s visualize the mean content of the polyphenols in red wine in descending order, as shown in Figure 4. Note that a mean value is reported for 124 of the 140 polyphenols; the 16 with missing values are not represented in the graph.
## Total number of polyphenols in red wine with composition: 124
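Ordering the compounds by mean content while leaving out those without a value can be sketched as follows. The `mean` column name and the placeholder compounds are assumptions; only the 78.62 mg/100 mL figure comes from the report.

```python
def rank_by_mean_content(rows, value_key="mean"):
    """Drop rows without a value and sort the rest in descending order."""
    with_values = [row for row in rows if row.get(value_key) is not None]
    return sorted(with_values, key=lambda row: row[value_key], reverse=True)

# Toy table: one value from the report, the others are placeholders.
toy_table = [
    {"compound": "Compound A", "mean": 0.5},
    {"compound": "Malvidin 3,5-diglucoside", "mean": 78.62},
    {"compound": "Compound B", "mean": None},  # no reported content
]
ranked = rank_by_mean_content(toy_table)
value_range = (ranked[-1]["mean"], ranked[0]["mean"])
```

On the real table, the same filter explains the difference between 140 polyphenols and the 124 shown: 140 - 16 = 124.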
If we look at the polyphenols that are most abundant in red wine, we obtain this graph:
Malvidin 3,5-diglucoside is the polyphenol found at the highest concentration, as reported by Zhao Q, Duan C, Wang J, 2010, with an average of 78.62 mg/100 mL. We can also see that flavonoids are the most frequently detected class, followed by phenolic acids. Concentrations range from 0.00 to 78.62 mg/100 mL.