1 Building the database of Polyphenols in Red Wine

1.1 Polyphenols in Phenol explorer database

The first source of polyphenols for this study is the Phenol-Explorer database, version 3.6 (www.phenol-explorer.eu). From its downloads page (main menu bar >> downloads) we selected the file "Polyphenols having composition data: compounds.csv", which contains the entire list of polyphenols with content data in foods and beverages.

Table 1: Polyphenols in Phenol Explorer

## Number of variables for each polyphenol:  13
## Total number of polyphenols in phenol explorer (november 2022):  501

Phenol-Explorer contains a total of 501 polyphenols that are backed by curated experimental data from publications. Now let's see which of them are reported in red wine.
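As an illustration of how the two counts above can be obtained, here is a minimal Python sketch that uses only the standard library. The inline sample and its column names are hypothetical stand-ins for compounds.csv, not its real layout.

```python
import csv
import io

# Toy stand-in for compounds.csv; these column names are illustrative only.
SAMPLE_COMPOUNDS = """id,name,compound_class
1,Quercetin,Flavonoids
2,Malvidin 3-O-glucoside,Flavonoids
3,Gallic acid,Phenolic acids
"""

def table_shape(handle):
    """Return (number of rows, number of columns) for a CSV with a header row."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return len(rows), len(reader.fieldnames)

n_rows, n_cols = table_shape(io.StringIO(SAMPLE_COMPOUNDS))
print("Number of variables for each polyphenol: ", n_cols)
print("Total number of polyphenols: ", n_rows)
```

To run it on the downloaded file, pass a handle opened with `open("compounds.csv", newline="", encoding="utf-8")` instead of the in-memory sample.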

1.1.1 Polyphenols in red wine

This is the central question of the study. To answer it we need a second data frame from the Phenol-Explorer database (main menu bar >> downloads), "Complete composition data: composition-data.csv", which reports the content of each polyphenol in every food.

## Number of variables for each polyphenol:  17
## Total number of entries for each food with their polyphenols composition data (november 2022, phenol explorer):  7486

There is a total of 7486 entries describing the polyphenol composition of every food in Phenol-Explorer. The next step is to explore the data and identify how many foods/beverages the database contains.

## Number of food/beverage reported in phenol-explorer  458

Now we can answer the question: which food or beverage contains the largest number of polyphenol compounds?

Figure 1: Top 10 foods/beverages with the most polyphenol compounds

As we can see, Phenol-Explorer holds information on 458 different foods/beverages, and red wine is the one with the largest number of different polyphenols. Since this study analyses the polyphenols present in red wine, the next step is to select only the entries labeled Wine [red].
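The ranking behind Figure 1 amounts to counting distinct compounds per food. A minimal Python sketch of that counting follows; the (food, compound) pairs are a hypothetical toy sample, and only the label Wine [red] is taken from the text.

```python
from collections import defaultdict

# Hypothetical (food, compound) pairs standing in for composition-data.csv rows.
records = [
    ("Wine [red]", "Quercetin"),
    ("Wine [red]", "Malvidin 3-O-glucoside"),
    ("Wine [red]", "Quercetin"),  # repeated measurement of the same compound
    ("Tea [Black]", "Quercetin"),
    ("Coffee", "5-Caffeoylquinic acid"),
]

def top_foods(pairs, n=10):
    """Rank foods by their number of distinct compounds (ties broken by name)."""
    compounds_by_food = defaultdict(set)
    for food, compound in pairs:
        compounds_by_food[food].add(compound)
    counts = {food: len(names) for food, names in compounds_by_food.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]

ranking = top_foods(records)
for food, n_compounds in ranking:
    print(food, n_compounds)
```

Using a set per food means repeated measurements of the same compound are counted once, which is what "number of different polyphenols" requires.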

Table 2: Polyphenols in Red wine (Phenol-explorer raw data)

## Number of variables of Polyphenols in Red Wine:  17
## Total number of polyphenols in red wine (november 2022, phenol explorer):  123

Now we need to check that all the selected items appear in both data frames. To do so, we compare the compound names in the two data frames and identify any polyphenols that differ:

## Polyphenols that are different between both data frames:  02 mers 03 mers 04-06 mers 07-10 mers Polymers (>10 mers) Polyphenols, total

Six polyphenols are present in the food composition data frame but not in the general Phenol-Explorer compound data frame. These six entries represent general groups of polyphenols rather than specific compounds, so they will be filtered out.

## List of polyphenols to eliminate:  02 mers 03 mers 04-06 mers 07-10 mers Polymers (>10 mers) Polyphenols, total
## Number of polyphenols to filter out:  6
## Total number of polyphenols in red wine (filtered list):  117
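The check and the filter above can be expressed as a set difference on compound names. The following Python sketch assumes a hypothetical `compound` key and uses a toy subset of names; it is not the original analysis code.

```python
# Names from the compounds table (toy subset).
compound_names = {"Quercetin", "Gallic acid"}

# Rows selected for red wine from the composition table (toy subset).
red_wine_rows = [
    {"compound": "Quercetin"},
    {"compound": "Gallic acid"},
    {"compound": "02 mers"},
    {"compound": "Polyphenols, total"},
]

def split_by_reference(rows, reference, key="compound"):
    """Return (names missing from the reference, rows whose name is in it)."""
    missing = sorted({row[key] for row in rows} - reference)
    kept = [row for row in rows if row[key] in reference]
    return missing, kept

to_eliminate, filtered = split_by_reference(red_wine_rows, compound_names)
print("List of polyphenols to eliminate: ", to_eliminate)
print("Number of polyphenols to filter out: ", len(to_eliminate))
print("Total number of polyphenols (filtered list): ", len(filtered))
```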

At this point, of the 501 polyphenols in total, we have selected the 117 that are present in red wine according to the Phenol-Explorer database.

Now that we have our table of polyphenols in red wine, we want to keep all the information from the two data sets (Tables 1 and 2), which have different numbers of variables. We therefore merge them into a single table.

## Number of variables of Polyphenols in Red wine with composition:  27
## Total number of Polyphenols in Red wine with composition (november 2022, Phenol Explorer):  117

We end up with 27 variables and 117 polyphenols.
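The merge can be pictured as a left join of the red wine composition rows onto the compound table by compound name. This is a minimal Python sketch under that assumption; the key names `compound` and `name` and the toy rows are hypothetical.

```python
# Toy rows; the real tables have 17 and 13 variables respectively.
composition = [
    {"compound": "Quercetin", "food": "Wine [red]"},
    {"compound": "Gallic acid", "food": "Wine [red]"},
]
compounds = [
    {"name": "Quercetin", "compound_class": "Flavonoids"},
    {"name": "Gallic acid", "compound_class": "Phenolic acids"},
    {"name": "Morin", "compound_class": "Flavonoids"},
]

def left_join(left_rows, right_rows, left_key, right_key):
    """Keep every left row and attach the matching right row's other columns."""
    index = {row[right_key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = index.get(row[left_key], {})
        extra = {k: v for k, v in match.items() if k != right_key}
        merged.append({**extra, **row})  # left values win on shared column names
    return merged

merged = left_join(composition, compounds, "compound", "name")
n_variables = len({column for row in merged for column in row})
print("Number of variables: ", n_variables)
print("Number of rows: ", len(merged))
```

Because it is a left join, the row count stays equal to the number of red wine entries, while the variable count becomes the union of both column sets minus the duplicated key.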

The next step is to check that we do not have duplicates:

## Number of duplicates in table 4:  13
## List of the duplicates:  Hesperetin Naringenin Apigenin Isorhamnetin Kaempferol Myricetin Quercetin Rhamnetin Lariciresinol Matairesinol Pinoresinol Secoisolariciresinol p-Coumaric acid
## Number of unique Polyhenols with composition:  104

There are 104 unique polyphenols in table 4, and 13 of them appear twice. The duplicates arise because these compounds were analyzed by two different methods (Chromatography and Chromatography after hydrolysis).

For the analysis in the next section we will use the unique polyphenol names, so let's build a table with one entry per name (keeping the Chromatography measurements).

Table 3: Polyphenols in Red wine (Phenol-explorer clean data)
## Number of unique Polyhenols with composition:  104

At this point we have a single table with 104 unique polyphenols built from Phenol-Explorer data, but more polyphenol information exists outside this database.
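One way to find the duplicated names and keep a single row per compound, preferring the Chromatography measurement, is sketched below in Python. The keys `compound` and `method` are hypothetical, and the rows are a toy sample.

```python
from collections import Counter

rows = [
    {"compound": "Quercetin", "method": "Chromatography after hydrolysis"},
    {"compound": "Quercetin", "method": "Chromatography"},
    {"compound": "Gallic acid", "method": "Chromatography"},
]

def duplicated_names(rows):
    """Names that occur in more than one row."""
    counts = Counter(row["compound"] for row in rows)
    return [name for name, n in counts.items() if n > 1]

def one_row_per_compound(rows, preferred="Chromatography"):
    """Keep one row per compound, replacing it only by a preferred-method row."""
    chosen = {}
    for row in rows:
        current = chosen.get(row["compound"])
        if current is None or (
            row["method"] == preferred and current["method"] != preferred
        ):
            chosen[row["compound"]] = row
    return list(chosen.values())

unique_rows = one_row_per_compound(rows)
print("Number of duplicates: ", len(duplicated_names(rows)))
print("Number of unique polyphenols: ", len(unique_rows))
```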

1.2 Polyphenols in red wine outside phenol explorer

Several articles and scientific papers report additional polyphenols in red wine, so the data from these publications was collected manually and compiled into a CSV table.

Table 4: Polyphenols in red wine from bibliography

## Number of variables of Polyphenols in Bibliography:  21
## total number of polyphenols in Bibliography:  40
## Polyphenols repeated in Phenol explorer and Bibliography:

There are no duplicates between Table 3 and Table 4.

The next step is to merge the lists of polyphenols from the two sources, Phenol-Explorer and the bibliography, but first let's explore the relation between them with a Venn diagram.

The diagram shows that 20 of the bibliography polyphenols are in the Phenol-Explorer database but have not yet been reported in red wine according to its curators. The other 20 are not listed in Phenol-Explorer as compounds at all.

Which polyphenols are in Phenol-Explorer, and which are outside the database?

## Polyphenols that are in Phenol explorer but not classified as Red wine:  (+)-Catechin 3-O-gallate (+)-Gallocatechin 3-O-gallate Caffeic acid ethyl ester Cinnamic acid Cyanidin Cyanidin 3-O-(6''-p-coumaroyl-glucoside) Cyanidin 3-O-rutinoside Delphinidin 3,5-O-diglucoside Ellagic acid Feruloyl tartaric acid Homovanillic acid Homoveratric acid Morin Myricetin 3-O-galactoside Myricetin 3-O-glucoside Pelargonidin 3,5-O-diglucoside Petunidin 3,5-O-diglucoside Quercetin 3-O-galactoside Quercetin 3-O-glucuronide Quercetin 3-O-xyloside 
## 
## Polyphenols in Red wine but are not in Phenol Explorer:  (R)-rosmarinic acid Cyanidin-3,5-O-diglucoside Delphinidin 3-O-(6-caffeoyl-glucoside) Dihydrokaempferol-3-O-rhamnoside Ellagitannin Fisetin Kaempferol-3-O-caffeoylate Laricitrin-3-O-glucoside Malvidin-3-O-(6-acetyl-glucoside)-5-O-glucoside Malvidin-3-O-(6-caffeoyl-glucoside)-5-O-glucoside Malvidin-3-O-glucoside 4-pyruvic acid Malvidin-3,5-diglucoside Pelargonidin-3-O-(6-caffeoyl-glucoside) Pelargonidin-3-O-(6-coumaroyl)glucoside) Peonidin-3-O-(6-caffeoyl-glucoside)-5-O-glucoside Peonidin-3-O-(6-coumaroyl)glucoside)-5-O-glucoside Peonidin-3,5-O-diglucoside Petunidin-3-O-(6-caffeoyl-glucoside)-5-Oglucoside Syringetin-3-O-galactoside Syringetin-3-O-glucoside
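The two lists above correspond to regions of the Venn diagram and can be derived with plain set operations. The Python sketch below uses a handful of names taken from those lists as a toy sample; the variable names are hypothetical.

```python
# Toy subsets of the three name lists involved in the comparison.
pe_red_wine = {"Quercetin", "Gallic acid"}                    # Phenol-Explorer, red wine
pe_all = pe_red_wine | {"Ellagic acid", "Morin", "Cyanidin"}  # all Phenol-Explorer compounds
bibliography = {"Ellagic acid", "Morin", "Fisetin", "Ellagitannin"}

# In Phenol-Explorer, but not reported there for red wine.
in_pe_not_red_wine = sorted(bibliography & (pe_all - pe_red_wine))

# Reported in the literature for red wine, absent from Phenol-Explorer.
not_in_pe = sorted(bibliography - pe_all)

# Already covered by the Phenol-Explorer red wine list (duplicates to avoid).
repeated = sorted(bibliography & pe_red_wine)

print("In Phenol-Explorer but not classified as red wine: ", in_pe_not_red_wine)
print("In red wine but not in Phenol-Explorer: ", not_in_pe)
print("Repeated in both sources: ", repeated)
```

Exact string matching is sensitive to naming variants (for example hyphenation in glycoside names), so names may need normalizing before the comparison.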

Now let's merge the two lists to obtain our final table.

Table 5: Merged table of Phenol-Explorer and literature search

## Number of variables of Polyphenols in Red wine:  28
## Total number of polyphenols in red wine:  144

After the merge there is a total of 144 polyphenols in red wine, from the Phenol-Explorer database and from the bibliography, described by 28 variables. Next, let's look for missing data and clean up the variables.

##Building the Polyphenol Red wine data base

First let’s see which information is missing:

There are 40 missing synonyms, corresponding to the polyphenols from the bibliography. We will complete them using the PubChem database, and we will also add the canonical SMILES and InChIKey for each polyphenol.

It is important to note that 20 polyphenols have no information on their content in red wine: they were detected in red wine, but the publications are not clear about the amount.

##PubChem

Canonical SMILES, synonyms, InChI, InChIKey and molecular weight were retrieved from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) (Kim et al., 2022), and the PubChem IDs were checked. The 144 polyphenols were searched by name. The resulting data frame is:

## Number of variables of Pubchem data frame:  24
## total number of polyphenols in PubChem:  140

Of the 144 compounds, 140 have a PubChem ID. The remaining 4 have no PubChem ID, and searching for them by name returned no information either. Because the missing ID makes further analysis difficult, we keep these 4 as polyphenols in the red wine database but do not use them in the following analyses.
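The report does not say whether the lookup was done through the PubChem website or programmatically. As one possible approach, the Python sketch below builds a PubChem PUG REST request for a compound name. The property names are an assumption that should be checked against the current PUG REST documentation, since PubChem has revised its SMILES property names over time. The `fetch_properties` helper is hypothetical and is not called here, because it needs network access.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
# Assumed property names; verify against the current PUG REST documentation.
PROPERTIES = ("CanonicalSMILES", "InChI", "InChIKey", "MolecularWeight")

def property_url(name):
    """Build the by-name property request, percent-encoding the compound name."""
    encoded = quote(name, safe="")
    return f"{PUG_REST}/compound/name/{encoded}/property/{','.join(PROPERTIES)}/JSON"

def fetch_properties(name, timeout=30):
    """Return the first property record for a name, or None if it is not found."""
    try:
        with urlopen(property_url(name), timeout=timeout) as response:
            payload = json.load(response)
    except HTTPError as error:
        if error.code == 404:  # name unknown to PubChem
            return None
        raise
    return payload["PropertyTable"]["Properties"][0]

url = property_url("Malvidin 3,5-diglucoside")
print(url)
```

Names for which the lookup returns nothing would then be the ones left without a PubChem ID, like the 4 compounds mentioned above.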

## Total number of polyphenols in red wine with pubchem:  140
## [1] 28

Now we are going to add the synonyms, PubChem ID, InChIKey and SMILES:

## Number of variables of Polyphenols in Red wine data base:  32
## Total number of polyphenols in red wine data base for analysis:  140

At this step we have a table of 140 polyphenols with complete PubChem ID, SMILES, InChI and synonyms, and 32 variables for each polyphenol.

Now we are going to clean the table and eliminate the columns we do not need. For example, the columns created_at, updated_at and id only concern the Phenol-Explorer database. The columns food, food_group_food_sub_group have the same value in every row, because all entries come from wine, so we eliminate them as well.
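Before dropping a column because it is "the same in each" row, it is cheap to verify that claim. The Python sketch below finds constant columns and removes them together with the bookkeeping columns; the rows and their values are a hypothetical toy sample.

```python
rows = [
    {"id": "1", "name": "Quercetin", "food": "Wine [red]",
     "created_at": "2010-01-01", "mean": "0.8"},
    {"id": "2", "name": "Gallic acid", "food": "Wine [red]",
     "created_at": "2010-01-02", "mean": "3.6"},
]

def constant_columns(rows):
    """Columns whose value is identical in every row."""
    return sorted(
        column for column in rows[0]
        if len({row[column] for row in rows}) == 1
    )

def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]

bookkeeping = {"id", "created_at", "updated_at"}
unwanted = bookkeeping | set(constant_columns(rows))
cleaned = drop_columns(rows, unwanted)
print("Number of variables after cleaning: ", len(cleaned[0]))
```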

Table 7: Red wine Polyphenols for analysis

## Number of variables of Polyphenols in Red wine:  25
## Total number of polyphenols in red wine:  140

Thanks to PubChem we have filled in more data and added SMILES information. Now we are ready to explore and visualize our red wine polyphenols.

#Exploration of the Data

1.3 Class and Sub class of polyphenols in Red Wine

Let's explore our final table of 140 polyphenols, grouped by polyphenol class and subclass.

Figure 3 shows that flavonoids are the most numerous class of polyphenols in red wine, followed by phenolic acids.

1.4 Content of polyphenols in Red Wine (mean value in mg/100 mL)

Let's visualize the mean content of the polyphenols in red wine in descending order, as shown in figure 4. It is important to note that a mean value is reported for 124 of the 140 polyphenols. The 16 missing values are not represented in the graph.

## Total number of polyphenols in red wine with composition:  124

If we look at the polyphenols that are most abundant in red wine, we obtain this graph: Malvidin 3,5-diglucoside is the polyphenol found at the highest concentration, with an average of 78.62 mg/100 mL as reported by Zhao Q, Duan C, Wang J, 2010. We can also see that flavonoids are the most frequently detected class, followed by phenolic acids. The concentrations range from 0.00 to 78.62 mg/100 mL.
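The ordering used for this graph can be sketched in Python as follows: rows without a mean value are set aside and the rest are sorted in descending order. Only the 78.62 mg/100 mL value comes from the text; the other names and values are hypothetical placeholders.

```python
rows = [
    {"name": "Compound A", "mean": 2.5},    # hypothetical value
    {"name": "Malvidin 3,5-diglucoside", "mean": 78.62},
    {"name": "Compound B", "mean": None},   # detected, amount not reported
    {"name": "Compound C", "mean": 0.0},    # hypothetical value
]

def rank_by_content(rows):
    """Return (rows sorted by mean content, descending; number of missing values)."""
    with_value = [row for row in rows if row["mean"] is not None]
    ranked = sorted(with_value, key=lambda row: row["mean"], reverse=True)
    return ranked, len(rows) - len(with_value)

ranked, n_missing = rank_by_content(rows)
lowest, highest = ranked[-1]["mean"], ranked[0]["mean"]
print("Polyphenols with composition: ", len(ranked))
print("Missing values: ", n_missing)
print("Concentration range (mg/100 mL): ", lowest, "-", highest)
```

Testing against `None` rather than truthiness matters here, because a reported mean of 0.0 is a real value and must stay in the ranking.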

3 Polyphenol-Protein interaction STITCH

STITCH is a database of chemical-protein interactions. We therefore download the TSV file of detailed protein-chemical links, v.5.5, for Homo sapiens (9606), which lists all the chemicals in the database.

FALSE Rows: 15473939 Columns: 7
FALSE ── Column specification ────────────────────────────────────────────────────────
FALSE Delimiter: "\t"
FALSE chr (2): chemical, protein
FALSE dbl (5): experimental, prediction, database, textmining, combined_score
FALSE 
FALSE ℹ Use `spec()` to retrieve the full column specification for this data.
FALSE ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

First we need to filter for the red wine polyphenols in our table. We started with 140 polyphenols, and in STITCH we find:

## Number of Chemical-protein interactions:  14583

There are 14,583 entries for our red wine polyphenols. Now we can explore the data:

Let’s group them to identify the number of polyphenols with STITCH information:

## Number of polyphenols with protein interactions:  96
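With more than 15 million rows, the links file is best filtered while streaming rather than loaded whole. The Python sketch below does this for a toy TSV with the seven columns shown in the import message above. The identifier format (a `CIDs` or `CIDm` prefix followed by the zero-padded PubChem CID) is an assumption about STITCH, and the protein identifiers are made up.

```python
import csv
import io

HEADER = ["chemical", "protein", "experimental", "prediction",
          "database", "textmining", "combined_score"]
TOY_ROWS = [  # protein IDs and scores are placeholders
    ["CIDs05280343", "9606.ENSP00000000001", "0", "0", "0", "250", "250"],
    ["CIDs05280343", "9606.ENSP00000000002", "800", "0", "0", "300", "850"],
    ["CIDs00000001", "9606.ENSP00000000001", "0", "0", "900", "0", "900"],
    ["CIDs00445154", "9606.ENSP00000000003", "0", "0", "0", "400", "400"],
]
SAMPLE_TSV = "\n".join("\t".join(line) for line in [HEADER] + TOY_ROWS) + "\n"

def stitch_id(cid, stereo=True):
    """Assumed STITCH identifier: prefix plus the CID zero-padded to 8 digits."""
    prefix = "CIDs" if stereo else "CIDm"
    return f"{prefix}{int(cid):08d}"

def filter_links(handle, wanted_ids):
    """Yield only the rows whose chemical is in wanted_ids, one row at a time."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["chemical"] in wanted_ids:
            yield row

wanted = {stitch_id(5280343), stitch_id(445154)}  # quercetin, resveratrol
hits = list(filter_links(io.StringIO(SAMPLE_TSV), wanted))
n_polyphenols = len({row["chemical"] for row in hits})
print("Number of Chemical-protein interactions: ", len(hits))
print("Number of polyphenols with protein interactions: ", n_polyphenols)
```

Whether the matching should use the stereo-specific or the merged identifiers is not stated in the text, so both forms may need to be tried against the real file.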

After exploring the data we noticed that, even though there are 14647 proteins, not all of them have experimental evidence. Let's filter and keep only the interactions with direct evidence for the following analysis (the rest will remain available in the data set).

Table No. 13 Polyphenols-protein with experimental evidence.

## Number of Chemical-protein interactions with experimental evidence:  764
## Number of polyphenols with protein interactions with experimental evidence:  42
## Number of unique proteins with experimental evidence:  466
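The three counts above can be reproduced by filtering on the `experimental` score and counting distinct chemicals and proteins. In this Python sketch the cutoff (any score above zero) is an assumption, since the text does not state the threshold used, and the rows are toy data.

```python
# Toy interaction rows; identifiers and scores are placeholders.
interactions = [
    {"chemical": "CIDs05280343", "protein": "9606.ENSP00000000001", "experimental": "0"},
    {"chemical": "CIDs05280343", "protein": "9606.ENSP00000000002", "experimental": "800"},
    {"chemical": "CIDs00445154", "protein": "9606.ENSP00000000002", "experimental": "620"},
    {"chemical": "CIDs00445154", "protein": "9606.ENSP00000000003", "experimental": "0"},
]

def experimental_summary(rows, min_score=1):
    """Count interactions, chemicals and proteins with experimental >= min_score."""
    kept = [row for row in rows if int(row["experimental"]) >= min_score]
    return {
        "interactions": len(kept),
        "polyphenols": len({row["chemical"] for row in kept}),
        "proteins": len({row["protein"] for row in kept}),
    }

summary = experimental_summary(interactions)
print(summary)
```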

The names of the 42 polyphenols were submitted to the STITCH web interface to build a network with 0.900 confidence and no more than 20 interactors. The figure shows the 40 polyphenols that meet the applied filters:

Figure 21 Network of Red wine Polyphenols

#Application of the RW data base

##Selecting Polyphenols related to cancer

## Number of polyphenols related to cancer:  35

There are 35 polyphenols related to cancer.

## Number of polyphenols related to cancer:  3

##Genes related to lung neoplasm

To find the genes related to cancer we can use the CTD web database and search for lung neoplasm (MESH:D008175). The resulting data frame is:

Table 14: Genes related to Lung neoplasm

What are the genes shared between lung neoplasm and polyphenols?

Table 15: Genes related between polyphenols and Lung neoplasm
## Number of polyphenols with genes related 197
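The shared genes are the intersection of two gene lists. The Python sketch below normalizes the symbols before intersecting them; the symbols are illustrative examples only and are not results from this analysis.

```python
# Illustrative gene symbols only; not taken from the report's tables.
polyphenol_genes = ["TP53", "egfr ", "AKT1", "CYP1A1"]
lung_neoplasm_genes = ["EGFR", "TP53", "KRAS"]

def normalize(symbols):
    """Strip whitespace and upper-case symbols so that matching is exact."""
    return {symbol.strip().upper() for symbol in symbols}

shared_genes = sorted(normalize(polyphenol_genes) & normalize(lung_neoplasm_genes))
print("Number of shared genes: ", len(shared_genes))
print(shared_genes)
```

The sorted list can then be pasted into tools that accept one gene symbol per line.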

#String

With the list of 197 genes we can use STRING to view the network, and also verify with the DAVID database which pathways are enriched in these genes.

Figure 23 Network on genes related to polyphenols and lung neoplasm

This is the network of the 197 genes at 0.900 confidence; it contains a large number of protein-protein interactions (PPI).

We can check the 197 genes with an enrichment analysis, for which we use the DAVID database.

In the figure we can see that non-small cell lung cancer is enriched (in the last line), which supports the filter we applied:

If we select this item we obtain the related KEGG pathway, where the genes marked with a red star are those related to wine polyphenols. This is a starting point for finding new targets.