Tuesday, 30 August 2016

Python to scrape Chinese websites: gb2312 decoding issue solved

When scraping a Chinese website with Python (on Windows 10) and requests, the page's Chinese characters are often encoded in "gb2312". However, unless the encoding is set explicitly, requests falls back to its own default guess, which usually is not "gb2312", so the text comes out garbled. After reading the page, we would like to save the Chinese content as Unicode (UTF-8) in a data file, so that it is easier to read next time. The following example is a solution to this issue.

import io

import requests
from bs4 import BeautifulSoup

# website scraping with requests
url_to_scrape = 'http://www.mitbbs.com'

readOut = requests.get(url_to_scrape)
# requests guesses the real encoding of the page body and exposes it as
# apparent_encoding, so set the response encoding to that value before reading .text
readOut.encoding = readOut.apparent_encoding

# use BeautifulSoup to get the text information
textSoup = BeautifulSoup(readOut.text, "lxml")

# now printing content from textSoup displays the Chinese characters correctly
print(textSoup.title.string)

# write the file with an explicit UTF-8 encoding instead of changing the interpreter default:
# sys.setdefaultencoding() does not exist in Python 3 and is hidden in Python 2 unless
# reload(sys) is called first. io.open behaves the same way on both versions.
with io.open("fileToWrite.txt", "w", encoding="utf-8") as fileToWrite:
    fileToWrite.write(u"%s" % textSoup.title.string)
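
To see why the encoding matters without touching the network, here is a minimal offline sketch. The sample title string, the variable names, and the temporary file are assumptions made up for illustration; ISO-8859-1 simply stands in for a wrong default guess.

```python
import io
import os
import tempfile

# Hypothetical page title ("Chinese website"), written with escapes to stay ASCII-safe.
title = u"\u4e2d\u6587\u7f51\u7ad9"

# The bytes a server using gb2312 would send for that title (2 bytes per character).
raw = title.encode("gb2312")

# Decoding with the wrong codec "succeeds" but yields garbled text.
wrong = raw.decode("iso-8859-1")

# Decoding with the codec the page really uses recovers the title.
right = raw.decode("gb2312")

# Save as UTF-8 and read it back, as one would for a data file.
path = os.path.join(tempfile.mkdtemp(), "title.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(right)
with io.open(path, "r", encoding="utf-8") as handle:
    restored = handle.read()
```

Setting the encoding before reading the response text plays the same role as choosing "gb2312" in the second decode here.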



Tuesday, 15 March 2016

Room Temperature Superconductor? No, it is not, it is pyrolytic graphite

Pyrolytic graphite is one of the most strongly diamagnetic materials at room temperature. Diamagnetic materials tend to be repelled by a magnetic field. Thus, in a suitably shaped magnetic field potential, like the one in the first figure, a diamagnetic material experiences a levitation force.




Pyrolytic graphite is also a low-density material, so a small levitation force is enough to make it float in the air. I did a small experiment using rare-earth magnets to form the potential. As the following picture shows, the little graphite flake is floating.
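
The argument above can be made roughly quantitative. For a weakly diamagnetic material with volume susceptibility chi and density rho, the vertical magnetic force balances gravity when B * dB/dz = mu0 * rho * g / |chi|, so what matters is the ratio of density to susceptibility. The sketch below evaluates that condition; the susceptibility and density are illustrative order-of-magnitude assumptions for pyrolytic graphite, not values measured in this experiment.

```python
import math

MU_0 = 4.0e-7 * math.pi   # vacuum permeability in T*m/A
G = 9.81                  # gravitational acceleration in m/s^2


def required_field_gradient_product(chi, rho):
    """Return the B * dB/dz (in T^2/m) needed to balance gravity.

    chi: dimensionless SI volume susceptibility (negative for a diamagnet)
    rho: density in kg/m^3
    """
    return MU_0 * rho * G / abs(chi)


# Assumed, illustrative values for pyrolytic graphite (field perpendicular to the layers).
chi_graphite = -4.5e-4
rho_graphite = 2200.0

needed = required_field_gradient_product(chi_graphite, rho_graphite)
print("B * dB/dz needed: %.1f T^2/m" % needed)
```

A larger |chi| or a smaller density lowers the required product, which is why a thin graphite flake can float above small permanent magnets.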

Thursday, 18 February 2016

Auto Data Acquisition GUI by wxPython, using SR850 Lockin Amplifier and LTC21 Temperature Controller

Recently, I wrote a Python GUI for measuring low-temperature resistance. The code is on my GitHub.

It uses GPIB communication to control the SR850 Lockin Amplifier and the LTC21 Temperature Controller, but the code can be modified to work with any other GPIB instruments. It also displays the data in real time.
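
As a rough idea of what such an acquisition loop looks like, here is a minimal sketch. It is not the code from the repository: the `FakeInstrument` class, the `acquire` function, and the command strings "TEMP?" and "SIGNAL?" are all hypothetical placeholders, not commands taken from the instrument manuals.

```python
import time


class FakeInstrument(object):
    """Stand-in for a GPIB instrument; returns canned replies to query strings."""

    def __init__(self, replies):
        self.replies = replies

    def query(self, command):
        # A real instrument would send the command over GPIB and read the reply.
        return self.replies[command]


def acquire(thermometer, lockin, temp_cmd, signal_cmd, n_points, delay_s=0.0):
    """Poll both instruments n_points times and return (temperature, signal) pairs."""
    rows = []
    for _ in range(n_points):
        temperature = float(thermometer.query(temp_cmd))
        signal = float(lockin.query(signal_cmd))
        rows.append((temperature, signal))
        time.sleep(delay_s)
    return rows


# Placeholder commands and replies, chosen only so the sketch runs without hardware.
thermometer = FakeInstrument({"TEMP?": "4.21\n"})
lockin = FakeInstrument({"SIGNAL?": "1.5e-6\n"})
data = acquire(thermometer, lockin, "TEMP?", "SIGNAL?", n_points=3)
print(data)
```

Because `acquire` only needs objects with a `query` method, a resource opened through a library such as PyVISA could be passed in place of the fake, with the command strings replaced by the ones documented for each instrument.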

Here is the interface of the GUI