Archive for February, 2010

All news and information will be posted on Twitter

Thursday, February 25th, 2010

All news and information will now be posted on Twitter; this blog has moved to Twitter.

Two ways for a Java program to read and write a remote HBase database [repost]

Friday, February 5th, 2010

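Keywords: hbase, remote connection

How does a Java program access data on a remote HBase server? If you write and debug your program locally, how do you read the data stored in HBase? Here is a quick primer.

There are two ways. The first is to add hbase-default.xml and hbase-site.xml to your project's classpath.
The other is to specify the connection in the program itself, for example: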

Java code:
// Point the client at the remote HBase master, then open the table.
HBaseConfiguration conf = new HBaseConfiguration();
conf.set("hbase.master", "192.168.2.38:60000");
HTable table = new HTable(conf, "blogposts");
There you go, you are connected. It is far more convenient than the pile of JDBC code MySQL requires...
fwd:  http://beyiwork.javaeye.com/blog/441960

Python regular expressions

Thursday, February 4th, 2010

String replacement

1. Replace all matching substrings

Replace every substring of subject that matches the regular expression regex with newstring:

result, number = re.subn(regex, newstring, subject)

2. Replace all matching substrings (using a regex object)

reobj = re.compile(regex)
result, number = reobj.subn(newstring, subject)
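
As a concrete illustration of the two forms above, here is a minimal sketch written for Python 3. The sample subject string and the date pattern are assumptions chosen for the example, not part of the original post.

```python
import re

# Hypothetical sample data: replace every ISO-style date with a placeholder.
subject = "Posted 2010-02-04, updated 2010-02-05"
regex = r"\d{4}-\d{2}-\d{2}"
newstring = "DATE"

# Module-level function: returns the new string and the number of replacements.
result, number = re.subn(regex, newstring, subject)

# Same operation through a compiled regex object.
reobj = re.compile(regex)
result_obj, number_obj = reobj.subn(newstring, subject)

print(result, number)
```

If the replacement count is not needed, re.sub takes the same arguments and returns only the string.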

String splitting

1. Split a string

result = re.split(regex, subject)

2. Split a string (using a regex object)

reobj = re.compile(regex)
result = reobj.split(subject)
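
A short sketch of both splitting forms, written for Python 3. The sample string and separator pattern are assumptions made up for the example. It also shows one behavior worth knowing: when the pattern contains a capturing group, the separators are kept in the result.

```python
import re

# Hypothetical input with inconsistent spacing around two kinds of separator.
subject = "apple, banana ;cherry"
regex = r"\s*[,;]\s*"

# Module-level split: separators are discarded.
parts = re.split(regex, subject)

# With a capturing group, the captured separators appear in the result list.
reobj = re.compile(r"\s*([,;])\s*")
parts_with_seps = reobj.split(subject)

print(parts)
print(parts_with_seps)
```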

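Matching

The following are several ways to match with Python regular expressions:

1. Test whether a regex matches all or part of a string

regex = r"..."  # the regular expression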
if re.search(regex, subject):
    do_something()
else:
    do_anotherthing()

2. Test whether a regex matches the entire string

regex = r"...\Z"  # the regex ends with \Z
if re.match(regex, subject):
    do_something()
else:
    do_anotherthing()
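
The difference between usages 1 and 2 is easy to miss, so here is a small Python 3 sketch with made-up subject strings. re.search looks anywhere in the string, re.match anchors only at the start, and the trailing \Z is what forces the match to reach the end. Python 3.4 and later also provide re.fullmatch, which does the same without an explicit \Z.

```python
import re

regex = r"\d+"

# search() finds the pattern anywhere in the subject.
found_anywhere = re.search(regex, "abc123def") is not None

# match() anchors only at the start, so trailing text is still accepted...
prefix_only = re.match(regex, "123def") is not None

# ...unless the pattern ends with \Z, which requires the end of the string.
whole_ok = re.match(regex + r"\Z", "123") is not None
whole_rejected = re.match(regex + r"\Z", "123def") is not None

# fullmatch() (Python 3.4+) needs no explicit \Z.
full_ok = re.fullmatch(regex, "123") is not None
full_rejected = re.fullmatch(regex, "123def") is not None
```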

3. Create a match object and use it to get match details

regex = r"..."  # the regular expression
match = re.search(regex, subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()

4. Get the part of a string matched by the regex

regex = r"..."  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group()
else:
    result = ""

5. Get the part of a string matched by a capturing group

regex = r"..."  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group(1)
else:
    result = ""

6. Get the part of a string matched by a named group

regex = r"..."  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group("groupname")
else:
    result = ""
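
Usages 3 to 6 can be seen together in one small Python 3 sketch. The subject string, the pattern, and the group names year and month are assumptions invented for the example.

```python
import re

subject = "Posted on 2010-02-04"
# Two named groups and one unnamed (numbered) group.
regex = r"(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})"

match = re.search(regex, subject)
if match:
    whole = match.group()                 # entire matched text
    first = match.group(1)                # first capturing group, by number
    month = match.group("month")          # named group, by name
    span = (match.start(), match.end())   # end index is exclusive
else:
    whole = first = month = ""
    span = None
```

Named groups are also numbered, so match.group(1) and match.group("year") return the same text here.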

7. Get a list of all regex matches in a string

result = re.findall(regex, subject)

8. Iterate over all matches in a string

for match in re.finditer(r"<(.*?)\s*.*?/\1>", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
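
To compare usages 7 and 8, here is a minimal Python 3 sketch over a made-up subject string. findall returns plain strings (or tuples of groups when the pattern has capturing groups), while finditer yields match objects that also carry positions.

```python
import re

subject = "a1 b22 c333"
regex = r"\d+"

# findall() returns the matched strings as a list.
all_matches = re.findall(regex, subject)

# finditer() yields match objects, so positions are available too.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(regex, subject)]

# With capturing groups, findall() returns the groups, not the whole match.
pairs = re.findall(r"([a-z])(\d+)", subject)
```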

9. Create a regex object from a regex string (to use the same regex for many operations)

reobj = re.compile(regex)
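
A brief Python 3 sketch of reusing one compiled object for several operations. The whitespace pattern and the sample lines are assumptions for the example.

```python
import re

# Compile once, then reuse the object for different operations.
reobj = re.compile(r"\s+")

lines = ["one  two", "three\tfour   five"]

# Collapse each run of whitespace into a single space.
normalized = [reobj.sub(" ", line) for line in lines]

# Count the whitespace-separated words on each line.
word_counts = [len(reobj.split(line)) for line in lines]

# Check whether each line contains any whitespace at all.
has_gap = [reobj.search(line) is not None for line in lines]
```

The re module keeps its own cache of recently compiled patterns, so the gain from compiling by hand is mostly readability plus a small saving in tight loops.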

10. Regex-object version of usage 1 (branch on whether all or part of a string can be matched)

reobj = re.compile(regex)
if reobj.search(subject):
    do_something()
else:
    do_anotherthing()

11. Regex-object version of usage 2 (branch on whether a string can be matched entirely)

reobj = re.compile(r"...\Z")  # the regex ends with \Z
if reobj.match(subject):
    do_something()
else:
    do_anotherthing()

12. Create a regex object, then use it to get details about how it matches (part of) a string

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()

13. Use a regex object to get the part of a string matched by the regex

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group()
else:
    result = ""

14. Use a regex object to get the part of a string matched by a capturing group

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group(1)
else:
    result = ""

15. Use a regex object to get the part of a string matched by a named group

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group("groupname")
else:
    result = ""

16. Use a regex object to get a list of all regex matches in a string

reobj = re.compile(regex)
result = reobj.findall(subject)

17. Use a regex object to iterate over all matches in a string

reobj = re.compile(regex)
for match in reobj.finditer(subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()

Examples of how to use httplib2

Wednesday, February 3rd, 2010

Simple Retrieval

import httplib2
h = httplib2.Http(".cache")
resp, content = h.request("http://example.org/", "GET")

The ‘content’ is the content retrieved from the URL. The content is already decompressed or unzipped if necessary. The ‘resp’ contains all the response headers.

Authentication

To PUT some content to a server that uses SSL and Basic authentication:

import httplib2
h = httplib2.Http(".cache")
h.add_credentials('name', 'password')
resp, content = h.request("https://example.org/chap/2",
    "PUT", body="This is text",
    headers={'content-type':'text/plain'} )
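
For background, Basic authentication sends the name and password joined by a colon and base64-encoded in an Authorization header. Base64 is an encoding, not encryption, which is why the example above uses SSL. The Python 3 sketch below builds that header by hand with the standard library only. The helper name basic_auth_header is hypothetical, and httplib2 takes care of this internally once credentials are added.

```python
import base64

def basic_auth_header(name, password):
    """Return the Authorization header value for HTTP Basic authentication."""
    raw = f"{name}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return "Basic " + token

# Same placeholder credentials as in the httplib2 example.
header_value = basic_auth_header("name", "password")
print(header_value)
```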

Cache-Control

Use the Cache-Control: header to control how the caching operates.

import httplib2
h = httplib2.Http(".cache")
resp, content = h.request("http://bitworking.org/")
 ...
resp, content = h.request("http://bitworking.org/",
    headers={'cache-control':'no-cache'})

The first request will be cached, and since it is a request to bitworking.org it will be cached for two hours, because that is how I have my server configured. Any subsequent GET to that URI within that period will return the value from the on-disk cache, and no request will be made to the server. You can use the Cache-Control: header to change the cache's behavior. In this example, the second request adds a Cache-Control: header with a value of ‘no-cache’, which tells the library that the cached copy must not be used when handling this request.

Forms

Below is an example of using httplib2 to submit a form. Note that we have to use the urlencode() function from urllib to encode the data before using it as the POST body.

>>> from httplib2 import Http
>>> from urllib import urlencode
>>> h = Http()
>>> data = dict(name="Joe", comment="A test comment")
>>> resp, content = h.request("http://bitworking.org/news/223/Meet-Ares", "POST", urlencode(data))
>>> resp
{'status': '200', 'transfer-encoding': 'chunked', 'vary': 'Accept-Encoding,User-Agent',
 'server': 'Apache', 'connection': 'close', 'date': 'Tue, 31 Jul 2007 15:29:52 GMT',
 'content-type': 'text/html'}
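
To see what urlencode() actually produces for the POST body, here is a small Python 3 sketch using the same form data. In Python 3 the function lives in urllib.parse rather than urllib. The headers dictionary is an addition for illustration, since servers commonly expect that content type for a form body.

```python
from urllib.parse import urlencode  # Python 3 location of urlencode

data = {"name": "Joe", "comment": "A test comment"}

# Keys and values are percent-encoded and joined with '&'; spaces become '+'.
body = urlencode(data)

# Content type commonly expected for a body in this format.
headers = {"Content-Type": "application/x-www-form-urlencoded"}

print(body)
```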

Cookies

When automating something, you often need to log in to maintain some sort of session/state with the server. Sometimes this is achieved with form-based authentication and cookies. You post a form to the server, and it responds with a cookie in the response headers. You need to pass this cookie back to the server in subsequent requests to maintain state or to keep a session alive.

Here is an example of how to deal with cookies when doing your HTTP Post.

First, let's import the modules we will use:

import urllib
import httplib2

Now, let's define the data we will need. In this case, we are doing a form post with two fields representing a username and a password.

url = 'http://www.example.com/login'  
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}

Now we can send the HTTP request:

http = httplib2.Http()
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))

At this point, our “response” variable contains a dictionary of HTTP header fields that were returned by the server. If a cookie was returned, you would see a “set-cookie” field containing the cookie value. We want to take this value and put it into the outgoing HTTP header for our subsequent requests:

headers['Cookie'] = response['set-cookie']

Now we can send a request using this header and it will contain the cookie, so the server can recognize us.

So, here is the whole thing in a script. We log in to a site and then make another request using the cookie we received:

#!/usr/bin/env python

import urllib
import httplib2

http = httplib2.Http()

url = 'http://www.example.com/login'  
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))

headers = {'Cookie': response['set-cookie']}

url = 'http://www.example.com/home'  
response, content = http.request(url, 'GET', headers=headers)
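
A Set-Cookie value usually carries attributes such as Path or Max-Age alongside the name=value pair. The script above sends the whole value back; a stricter approach sends only the pair. The sketch below, written for Python 3 with the standard library's http.cookies module, extracts just that pair. The sample header value and the helper name cookie_header_from are assumptions for illustration, and the sketch handles a single Set-Cookie value only.

```python
from http.cookies import SimpleCookie

def cookie_header_from(set_cookie_value):
    """Build a Cookie header value from a single Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Keep only name=value pairs; attributes such as Path are not sent back.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

# Hypothetical value, as it might appear in response['set-cookie'].
sample = "sessionid=abc123; Path=/; Max-Age=3600"
cookie_value = cookie_header_from(sample)
print(cookie_value)
```

The result could then be assigned to headers['Cookie'] in place of the raw set-cookie value.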

Proxies

httplib2 can use a SOCKS proxy if the third-party socks module is installed.

Here is an example of how to use the proxy support, in this case with an HTTP proxy on localhost port 8000:

import httplib2
import socks

httplib2.debuglevel=4
h = httplib2.Http(proxy_info = httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, 'localhost', 8000))
r,c = h.request("http://bitworking.org/news/")