%%shell
# Configure Debian buster package sources so chromium + chromium-driver
# (shipped only as a snap on recent Ubuntu images) can be installed as
# real debs from Debian's repositories.
# NOTE: the original had smart-quoted heredoc delimiters and en-dashes
# instead of "--" (copy-paste artifacts); fixed to plain ASCII here.
cat > /etc/apt/sources.list.d/debian.list <<'EOF'
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
EOF

# Fetch the Debian archive signing keys (apt-key is deprecated but still
# works on these images) and export each into the dedicated keyring that
# the signed-by= fields above point at.
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A
apt-key export 77E11517 | gpg --dearmor -o /usr/share/keyrings/debian-buster.gpg
apt-key export 22F3D138 | gpg --dearmor -o /usr/share/keyrings/debian-buster-updates.gpg
apt-key export E562B32A | gpg --dearmor -o /usr/share/keyrings/debian-security-buster.gpg

# Pin priorities so only chromium* is preferred from Debian.
# apt_preferences(5) requires a BLANK LINE between stanzas; the original
# file ran them together, which makes apt ignore the later entries.
cat > /etc/apt/preferences.d/chromium.pref <<'EOF'
Package: *
Pin: release a=eoan
Pin-Priority: 500

Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300

Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
EOF
!apt-get update
# -y: the notebook cell is non-interactive, so auto-confirm the install.
!apt-get install -y chromium chromium-driver
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
from bs4 import BeautifulSoup as bs
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
import pandas as pd
from selenium.webdriver.common.by import By
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def web_driver():
    """Create a headless Chrome WebDriver configured for a container/Colab.

    Returns:
        selenium.webdriver.Chrome: a ready-to-use headless driver.
    """
    options = webdriver.ChromeOptions()
    options.add_argument("--verbose")
    # --no-sandbox and --disable-dev-shm-usage are required when Chrome
    # runs as root inside a container with a small /dev/shm.
    options.add_argument("--no-sandbox")
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    # No space after the comma: Chrome expects "--window-size=W,H".
    options.add_argument("--window-size=1920,1200")
    options.add_argument("--disable-dev-shm-usage")
    driver = webdriver.Chrome(options=options)
    return driver
# Open the IMDB "user reviews" page for the title being scraped.
url = "https://www.imdb.com/title/tt3371366/reviews?ref_=tt_urv"
driver = web_driver()
driver.get(url)

# Click "Load More" until no further batches arrive.
# BUG FIX: WebDriverWait(...).until raises TimeoutException (not
# NoSuchElementException) when the trigger stops appearing, so the
# original loop could never exit cleanly; catch both to be safe.
from selenium.common.exceptions import TimeoutException

while True:
    try:
        load_more_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.ID, "load-more-trigger"))
        )
        load_more_button.click()
        time.sleep(2)  # give the newly requested reviews time to render
    except (TimeoutException, NoSuchElementException):
        break
# Collect the text and star rating of every loaded review, keeping the
# two lists index-aligned (a review missing either element is skipped).
reviews = []
ratings = []
review_elements = driver.find_elements(By.CLASS_NAME, "lister-item-content")
for review in review_elements:
    try:
        # Pull the body text and the numeric star rating for this review.
        content = review.find_element(By.CLASS_NAME, "content")
        text = content.find_element(By.CLASS_NAME, "text").text
        rating = content.find_element(By.CLASS_NAME, "ipl-rating-star__rating").text
        reviews.append(text)
        ratings.append(rating)
    except NoSuchElementException:
        # Some reviews lack a rating or a text node; drop them entirely
        # so reviews[i] and ratings[i] always belong to the same review.
        continue
# Pair each (text, rating) with its review title.
# NOTE(review): reviews.index(review) returns the FIRST occurrence of the
# text, so duplicate review texts -- or reviews skipped in the loop above --
# can map to the wrong element in review_elements. Kept as-is to preserve
# behavior, but worth revisiting (e.g. grab the title in the first loop).
reviews_list = []
for review, rating in zip(reviews, ratings):
    try:
        title = review_elements[reviews.index(review)].find_element(
            By.CLASS_NAME, "title"
        ).text
        reviews_list.append({"title": title, "rating": rating, "text": review})
    except NoSuchElementException:
        continue
imdb_reviews = pd.DataFrame(reviews_list)
import os

# NOTE(review): the directory created below ("drive/MyDrive/IMDB") is not
# where the CSV is written ("drive/MyDrive/") -- confirm which location was
# intended. Paths preserved to keep existing behavior.
directory = "drive/MyDrive/IMDB"
os.makedirs(directory, exist_ok=True)  # idempotent: no error if it already exists
imdb_reviews.to_csv("drive/MyDrive/transformers_reviews.csv")
import numpy as np

# Text cleaning: drop empty reviews, lowercase, then strip punctuation,
# newlines, non-ASCII characters, and digits.
imdb_reviews["text"].replace("", np.nan, inplace=True)
imdb_reviews.dropna(inplace=True)
imdb_reviews["text"] = imdb_reviews["text"].str.lower()

spec_chars = [
    "±", "@", "#", "$", "%", "^",
    "&", "*", "(", ")", "_", "+", "=",
    "-", "/", ">", "<", "?",
    "~", "`", "'", "[", "]", "|", "}",
    "{", '"', ".", ",", "!", ";",
]
for char in spec_chars:
    # regex=False so characters like "(", "*" and "?" are removed as
    # literals instead of being parsed as (invalid) regex metacharacters.
    imdb_reviews["text"] = imdb_reviews["text"].str.replace(char, "", regex=False)

imdb_reviews["text"] = imdb_reviews["text"].str.replace("\n", "", regex=False)
# BUG FIX: the original called .apply(...) without assigning the result,
# so the ASCII-stripping was a no-op; assign it back to the column.
imdb_reviews["text"] = imdb_reviews["text"].apply(
    lambda x: x.encode("ascii", "ignore").decode("ascii")
)
# Remove numbers using an explicit regex (raw string, regex=True).
imdb_reviews["text"] = imdb_reviews["text"].str.replace(r"\d+", "", regex=True)

# Keep only the numerator of ratings formatted as "8/10".
imdb_reviews["rating"] = imdb_reviews["rating"].apply(lambda x: x.split("/")[0])
This code is a Python script for scraping reviews of a movie from the IMDB website using the Selenium and BeautifulSoup libraries. The reviews are saved to a CSV file and preprocessed with NumPy and pandas.
The script starts by adding the Debian Buster repository and its signing keys to the sources.list.d directory using cat and apt-key commands. It then pins the Debian repository at a higher priority for the chromium packages and installs them with apt-get.
Next, the script imports the necessary libraries for scraping and pre-processing the reviews. It defines a function, web_driver(), to initialize a Selenium web driver with the required options. It then sets the URL of the movie reviews page and opens it using the web driver.
The script clicks on the “Load More” button to load all the reviews available on the page. It then extracts the text and rating of each review using Selenium and stores them in two lists, reviews and ratings.
The script then creates a list of dictionaries, reviews_list, to store the extracted reviews, ratings, and titles. It uses a loop to iterate over the reviews and ratings lists, extracts the title of each review, and appends a dictionary with the title, rating, and text to reviews_list.
The script converts reviews_list to a pandas DataFrame, imdb_reviews, and saves it to a CSV file using the to_csv method. It then preprocesses the reviews by removing special characters, newlines, and numbers using NumPy and pandas methods.
Finally, the preprocessed reviews are saved to the ‘text’ column of the imdb_reviews DataFrame, which can be used for further analysis.