Tuesday, 30 August 2016

Python to scrape Chinese websites: gb2312 decoding issue solved

When scraping a Chinese website with Python (on Windows 10) and requests, the page often returns its Chinese characters encoded as "gb2312". However, if one does not set the encoding on the requests response, requests uses its default guess (taken from the HTTP headers, with ISO-8859-1 as the fallback for text content), which usually is not "gb2312", so the text comes out garbled. After reading the website, we would like to save the Chinese content as Unicode (UTF-8) in a data file, so that it is easier to read next time. The following example is a solution to this issue.

import io

import requests
from bs4 import BeautifulSoup

# website scraping with requests
url_to_scrape = 'http://www.mitbbs.com'

readOut = requests.get(url_to_scrape)
# requests can guess the real encoding of the website from its content; the guess is
# stored in apparent_encoding, so one needs to set encoding to apparent_encoding
readOut.encoding = readOut.apparent_encoding

# use BeautifulSoup to get the text information
textSoup = BeautifulSoup(readOut.text, "lxml")

# now when printing out the content in textSoup, you will get the right display of Chinese characters.
print(textSoup.title.string)

# open the file with an explicit utf-8 encoding, so the Chinese characters are saved correctly in the txt file.
# (sys.setdefaultencoding("utf-8") only works in Python 2 after reload(sys), and it does not exist in Python 3,
# so the encoding is passed to io.open instead)
with io.open("fileToWrite.txt", "w", encoding="utf-8") as fileToWrite:
    fileToWrite.write(u"%s" % textSoup.title.string)
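
The encoding mismatch itself can be reproduced without any network access. The following is only a minimal sketch in Python 3: the sample string and the helper names (decode_page, save_and_reload) are made up for illustration and are not part of requests. It encodes a short Chinese string as gb2312, decodes it once with the wrong fallback (ISO-8859-1) and once with the right encoding, and then round-trips the correct text through a UTF-8 file.

```python
import io
import os
import tempfile


def decode_page(raw_bytes, encoding):
    """Decode the raw bytes of a page with the given encoding."""
    return raw_bytes.decode(encoding)


def save_and_reload(text, path):
    """Write text to path as UTF-8, then read it back."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    sample_text = "中文网站"                    # made-up stand-in for a page title
    raw_bytes = sample_text.encode("gb2312")   # what a gb2312 server would send

    garbled = decode_page(raw_bytes, "iso-8859-1")   # wrong fallback: mojibake
    correct = decode_page(raw_bytes, "gb2312")       # right encoding

    path = os.path.join(tempfile.mkdtemp(), "title.txt")
    print(garbled == sample_text, correct == sample_text)
    print(save_and_reload(correct, path) == sample_text)
```

In the real script, setting readOut.encoding before reading readOut.text plays the role of choosing the right argument for decode_page here.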



Tuesday, 15 March 2016

Room Temperature Superconductor? No, it is not, it is pyrolytic graphite

Pyrolytic graphite is one of the most strongly diamagnetic materials at room temperature. Diamagnetic materials are repelled by a magnetic field. Thus, in a magnetic field potential with a suitable gradient, like the one in the first figure, there is a levitation force on the diamagnetic material.




Pyrolytic graphite is a low-density material, and hence a small levitation force is enough for it to float in the air. I did a small experiment using rare-earth magnets to form the potential. As shown in the following picture, the little graphite flake is floating.
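
To put a rough number on "a small levitation force": for a weakly diamagnetic material, the vertical magnetic force per unit volume is (chi / mu_0) * B * dB/dz, and levitation needs this to balance the weight per unit volume, rho * g. The sketch below only evaluates that balance. The density and susceptibility in the demo are rough, commonly quoted values that I am assuming for illustration; they are not measurements from my experiment.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in T*m/A


def required_field_gradient_product(density, chi, g=9.81):
    """Return |B * dB/dz| (in T^2/m) at which the diamagnetic force
    balances gravity, assuming |chi| << 1.

    density: mass density in kg/m^3
    chi:     dimensionless SI volume susceptibility (negative for diamagnets)
    """
    return MU_0 * density * g / abs(chi)


if __name__ == "__main__":
    # Assumed, approximate values for pyrolytic graphite (not from the post):
    density = 2200.0   # kg/m^3
    chi = -4.5e-4      # perpendicular to the graphite planes
    print("B*dB/dz needed: %.1f T^2/m" % required_field_gradient_product(density, chi))
```

Under these assumptions the required product comes out at some tens of T^2/m, which fits the observation that small rare-earth magnets are enough.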

Thursday, 18 February 2016

Auto Data Acquisition GUI by wxPython, using SR850 Lockin Amplifier and LTC21 Temperature Controller

Recently, I wrote a Python GUI for measuring resistance at low temperature. The code is on my GitHub.

It uses GPIB communication to control an SR850 lock-in amplifier and an LTC21 temperature controller, but the code can be adapted to any other GPIB instruments. It also displays the data in real time.

Here is the interface of the GUI

Tuesday, 20 October 2015

Matlab: Match elements of a column in two arrays with different number of rows and move data from one to the other

Problem:

    For example, the inventory tables Sample1, Sample2 and Sample3 each have the columns 'Name' and 'Count'. The tables share some common entries in 'Name', but with different values of 'Count' and different numbers of rows. Now we want to combine all three tables into one table T_sum, in which 'Name' is a unique list and the row of each name holds the count for each sample.

E.g., from several tables like the following

Name Count
Bread 234
Pop 123
... ...

to a summary table like the following
Name_all Sample1-Count Sample2-Count ...
Bread 234 236 ...
Pop 123 456 ...
... ... ... ...

Solution #1:  use for loop

%% find the list of Names
Name_all = unique([Sample1.Name; Sample2.Name ...]);

%% do a for loop to migrate data from individual tables to a summary table
% preallocate the memory for the summary table
Counts = zeros(size(Name_all, 1),1);
T_sum = table(Name_all, Counts, Counts, ... );

% set table column names (a cell array of names; underscores keep them valid MATLAB identifiers)
T_sum.Properties.VariableNames(2:end) = {'Sample1_Count', 'Sample2_Count', ...};

% for loop for copying data
% (Sample is assumed to be a cell array of the tables, e.g. {Sample1, Sample2, ...})
for jj = 1:N_samples
    for ii = 1:size(Name_all, 1)
        % find the index of the Name in Sample{jj}
        tmpIndex = Sample{jj}.Name == Name_all(ii);
        % if Name_all(ii) is in the list Sample{jj}.Name, copy its Count to T_sum under the count column of sample jj
        if sum(tmpIndex) == 1
            % row ii, column jj+1 (column 1 holds the names); braces assign the value itself
            T_sum{ii, jj+1} = Sample{jj}.Count(tmpIndex);
        end
    end
end

This code is good for a small set of data. When it comes to a large set of data, it becomes very slow.

Solution #2:  Vectorization (faster)

The initialization and preallocation are the same as in the code above. We just need to replace the nested for-loops with the following code:

%% use vectorized indexing to replace the inner for-loop

for jj = 1:N_samples
     % for each name in Name_all, use ismember to check whether it appears in Sample{jj}.Name (tmpIndex)
     % and, if so, at which row of Sample{jj} (tmploc)
     [tmpIndex, tmploc] = ismember(Name_all, Sample{jj}.Name);
     % delete the zeros in tmploc (names that are not in Sample{jj})
     tmploc(~tmploc) = [];
     % copy the counts to the matching rows of the summary table
     T_sum{tmpIndex, jj+1} = Sample{jj}.Count(tmploc);
end


Now, it is much faster.
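
For readers who do not use Matlab, the same matching idea can be sketched in Python. This is not a translation of the table code above, only an illustration that swaps in plain dictionaries: each sample is a dict from name to count, the function name summarize_counts is made up, and a missing name gets a count of 0, like the preallocated zeros in T_sum. "Milk" is an invented extra item, added so that the two samples have different numbers of rows.

```python
def summarize_counts(samples):
    """Combine per-sample {name: count} dicts into one summary table.

    Returns (names, rows): names is the sorted unique list of names, and
    each row is [name, count_in_sample_1, count_in_sample_2, ...], with 0
    where a sample does not contain that name.
    """
    names = sorted(set().union(*samples))          # like unique() in Matlab
    rows = [[name] + [sample.get(name, 0) for sample in samples]
            for name in names]                     # dict lookup replaces ismember
    return names, rows


if __name__ == "__main__":
    sample1 = {"Bread": 234, "Pop": 123}
    sample2 = {"Bread": 236, "Pop": 456, "Milk": 7}   # "Milk" is a made-up item
    names, rows = summarize_counts([sample1, sample2])
    for row in rows:
        print(row)
```

The dictionary lookup does the job of ismember here: each name is found in constant time on average instead of by scanning the whole Name column.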

Please leave a comment if you have any questions.




Wednesday, 6 May 2015

R Loading xlsx problem

require(xlsx)
Loading required package: xlsx
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: fun(libname, pkgname)
  error: No CurrentVersion entry in Software/JavaSoft registry! Try re-installing Java and make sure R and Java have matching architectures.
Failed with error:  ‘package ‘rJava’ could not be loaded’


Problem answered by http://stackoverflow.com/questions/17376939/problems-when-trying-to-load-a-package-in-r-due-to-rjava

The reason is probably linked to the fact that you are using a 64-bit OS and R version but do not have Java installed with the same architecture. What you have to do is download 64-bit Java from this page: https://www.java.com/en/download/manual.jsp After that, just try to reload the xlsx package. You shouldn't need to restart R.

Friday, 1 May 2015

how to change the alt-tab application switcher in ubuntu unity

Original post is from here : http://ubuntuforums.org/showthread.php?t=2211863



sudo apt-get install compizconfig-settings-manager
sudo apt-get install compiz-plugins

Open compizconfig-settings-manager: press alt-F2 and type ccsm.

Scroll down to "Ubuntu Unity Plugin". Choose the tab "Switcher". Disable the alt-tab and shift-alt-tab key bindings ("Key to start the switcher" and "Key to switch to the previous window in the Switcher").
Click the "Back" button.

Scroll down to the "Window management" section. Here you can select another switcher.
I enable the "Static Application Switcher" and resolve any potential conflicts by choosing the setting for "Static Application Switcher".
Now you can tweak the switcher by clicking on it. I have changed alt-tab and shift-alt-tab to "Next window (All windows)" and "Prev window (All windows)".

Experiment to find the settings that work best for you. If you want to go back to the original settings, simply disable the Static Application Switcher and enable the key bindings in "Ubuntu Unity Plugin" again.

Thursday, 16 April 2015

Find the maximum and its coordinates in a region of a matrix

%% Select the region of interest
Subregion = A(rowStart:rowEnd, colStart:colEnd);

% find max value and get its index
[value, k] = max(Subregion(:));
[i, j] = ind2sub(size(Subregion), k);
% move indexes to correct spot in matrix
i = i + rowStart-1;
j = j + colStart-1;

Original link to the solution is http://stackoverflow.com/questions/7677996/matlab-finding-max-value-in-a-region-of-2d-matrix
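
As a side note, here is a rough Python sketch of the same idea for a matrix stored as a list of lists. It is my own illustration, not taken from the linked answer. The function name max_in_region is made up, indices are 0-based with inclusive bounds, and it scans row by row, so for tied maxima it may pick a different element than Matlab's column-major max.

```python
def max_in_region(matrix, row_start, row_end, col_start, col_end):
    """Return (value, i, j) for the maximum inside the given region.

    Bounds are 0-based and inclusive; i and j are indices into the full
    matrix, so no offset correction is needed afterwards.
    """
    best = None
    for i in range(row_start, row_end + 1):
        for j in range(col_start, col_end + 1):
            if best is None or matrix[i][j] > best[0]:
                best = (matrix[i][j], i, j)
    return best


if __name__ == "__main__":
    A = [[1, 2, 3],
         [4, 9, 6],
         [7, 8, 5]]
    print(max_in_region(A, 0, 1, 1, 2))   # region: rows 0-1, columns 1-2
```

Because the loop runs over the indices of the full matrix, the "move indexes to correct spot" step from the Matlab version is not needed.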