Archive for the ‘hbase key-value store’ Category

All news and information will be posted on Twitter

Thursday, February 25th, 2010

All news and information will now be posted on Twitter; this blog has moved to Twitter.

Two ways for a Java program to read and write a remote HBase database [repost]

Friday, February 5th, 2010

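Keywords: hbase, remote connection

How do you access data on a remote HBase server from a Java program? If you write and debug your program locally, how do you read data out of HBase? Here is a quick primer.

There are two methods. The first is to add hbase-default.xml and hbase-site.xml to your project's classpath.
The other is to specify the settings in the program itself, for example: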

Java code
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Point the client at the remote HBase master instead of relying on the XML config files.
HBaseConfiguration conf = new HBaseConfiguration();
conf.set("hbase.master", "192.168.2.38:60000");
// Open the "blogposts" table using that configuration.
HTable table = new HTable(conf, "blogposts");
There you go, you are connected. It is much more convenient than all the JDBC boilerplate needed for MySQL...
fwd:  http://beyiwork.javaeye.com/blog/441960

GQL Reference

Wednesday, October 14th, 2009

GQL is a SQL-like language for retrieving entities or keys from the App Engine scalable datastore. While GQL’s features are different from those of a query language for a traditional relational database, the GQL syntax is similar to that of SQL.

The GQL syntax can be summarized as follows:

  SELECT [* | __key__] FROM <kind>
    [WHERE <condition> [AND <condition> ...]]
    [ORDER BY <property> [ASC | DESC] [, <property> [ASC | DESC] ...]]
    [LIMIT [<offset>,]<count>]
    [OFFSET <offset>]

  <condition> := <property> {< | <= | > | >= | = | != } <value>
  <condition> := <property> IN <list>
  <condition> := ANCESTOR IS <entity or key>

As with SQL, GQL keywords are case insensitive. Kind and property names are case sensitive.

A GQL query returns zero or more entities or Keys of the requested kind. Every GQL query always begins with either SELECT * FROM or SELECT __key__ FROM, followed by the name of the kind. (A GQL query cannot perform a SQL-like “join” query.)

Tip: SELECT __key__ queries are faster and cost less CPU than SELECT * queries.

The optional WHERE clause filters the result set to those entities that meet one or more conditions. Each condition compares a property of the entity with a value using a comparison operator. If multiple conditions are given with the AND keyword, then an entity must meet all of the conditions to be returned by the query. GQL does not have an OR operator. However, it does have an IN operator, which provides a limited form of OR.

The IN operator compares the value of a property to each item in a list. The IN operator is equivalent to many = queries, one for each value, that are ORed together. An entity whose value for the given property equals any of the values in the list can be returned for the query.

Note: The IN and != operators use multiple queries behind the scenes. For example, the IN operator executes a separate underlying datastore query for every item in the list. The entities returned are a result of the cross-product of all the underlying datastore queries and are de-duplicated. A maximum of 30 datastore queries are allowed for any single GQL query.
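The behavior described in the note can be illustrated with a small in-memory sketch. This is not the datastore's implementation: the entities, the "tags" property, and the helper names equality_query and in_query are all made up for illustration, and the sketch assumes a multi-valued property so that one entity can match more than one sub-query.

```python
# Made-up entities standing in for a datastore kind (assumption for illustration).
ENTITIES = [
    {"key": "post1", "tags": ["hbase", "java"]},
    {"key": "post2", "tags": ["gql"]},
    {"key": "post3", "tags": ["java"]},
    {"key": "post4", "tags": ["python"]},
]


def equality_query(entities, prop, value):
    """Hypothetical single '=' query: match entities whose prop equals value."""
    matches = []
    for entity in entities:
        if prop not in entity:
            continue
        stored = entity[prop]
        values = stored if isinstance(stored, list) else [stored]
        if value in values:
            matches.append(entity)
    return matches


def in_query(entities, prop, values, max_subqueries=30):
    """Emulate IN as one '=' query per list item, merged and de-duplicated."""
    if len(values) > max_subqueries:
        raise ValueError("too many underlying queries for one GQL query")
    seen = set()
    results = []
    for value in values:
        for entity in equality_query(entities, prop, value):
            if entity["key"] not in seen:  # drop entities already returned
                seen.add(entity["key"])
                results.append(entity)
    return results


matched = in_query(ENTITIES, "tags", ["hbase", "java"])
print([entity["key"] for entity in matched])
```

Here post1 matches both sub-queries but appears only once in the merged result, which is the de-duplication the note refers to.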

A condition can also test whether an entity has a given entity as an ancestor, using the ANCESTOR IS operator. The value is a model instance or Key for the ancestor entity. For more information on ancestors, see Keys and Entity Groups.

The left-hand side of a comparison is always a property name. The right-hand side can be one of the following (as appropriate for the property’s data type):

  • a str literal, as a single-quoted string. Single-quote characters in the string must be escaped as ''. For example:
    'Joe''s Diner'
  • an integer or floating point number literal. For example:
    42.7
  • a Boolean literal, as TRUE or FALSE.
  • the NULL literal, which represents the null value (None in Python).
  • a datetime, date, or time literal, with either numeric values or a string representation, in the following forms:
    • DATETIME(year, month, day, hour, minute, second)
    • DATETIME('YYYY-MM-DD HH:MM:SS')
    • DATE(year, month, day)
    • DATE('YYYY-MM-DD')
    • TIME(hour, minute, second)
    • TIME('HH:MM:SS')
  • an entity key literal, with either a string-encoded key or a complete path of kinds and key names/IDs:
    • KEY('encoded key')
    • KEY('kind', 'name'/ID [, 'kind', 'name'/ID...])
  • a User object literal, with the user’s email address:
    USER('email-address')
  • a GeoPt literal, with the latitude and longitude as floating point values:
    GEOPT(lat, long)
  • a bound parameter value. In the query string, positional parameters are referenced by number:
    title = :1
    Keyword parameters are referenced by name:
    title = :mytitle
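The quoting rule for str literals in the list above can be sketched in a few lines of Python. The helper name gql_str_literal and the Restaurant kind are hypothetical, chosen only for this example; in practice a bound parameter such as :1 avoids the need to build the literal by hand.

```python
def gql_str_literal(text):
    """Quote text as a GQL str literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


# "Restaurant" and "name" are made-up kind and property names.
query = "SELECT * FROM Restaurant WHERE name = " + gql_str_literal("Joe's Diner")
print(query)
```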

Note: conditions of the form property = NULL check to see whether a null value is explicitly stored in the datastore for that property. This is not the same as checking to see if the entity lacks any value for the property! Datastore queries which refer to a property never return entities which don’t have some value for that property.
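The difference between a stored null and a missing property can be shown with plain dictionaries. This is only an in-memory analogy with made-up entities and a hypothetical filter_equals helper, not datastore code.

```python
# Made-up entities: "a" stores an explicit null, "c" has no nickname at all.
ENTITIES = [
    {"key": "a", "nickname": None},
    {"key": "b", "nickname": "bee"},
    {"key": "c"},
]


def filter_equals(entities, prop, value):
    """Hypothetical '=' filter: entities lacking the property never match."""
    return [e for e in entities if prop in e and e[prop] == value]


# Analogous to: WHERE nickname = NULL
print([e["key"] for e in filter_equals(ENTITIES, "nickname", None)])
```

Only entity "a" is returned; entity "c" is skipped because it has no value for the property, not because its value differs.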

Bound parameters can be bound as positional arguments or keyword arguments passed to the GqlQuery constructor or a Model class’s gql() method. Property data types that do not have corresponding value literal syntax must be specified using parameter binding, including the list data type. Parameter bindings can be re-bound with new values during the lifetime of the GqlQuery instance (such as to efficiently reuse a query) using the bind() method.

The optional ORDER BY clause indicates that results should be returned sorted by the given properties, in either ascending (ASC) or descending (DESC) order. If the direction is not specified, it defaults to ASC. The ORDER BY clause can specify multiple sort orders as a comma-delimited list, evaluated from left to right.
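As a rough illustration of multiple sort orders evaluated from left to right, the following sketch sorts in-memory dictionaries. The order_by helper and the sample data are assumptions for this example and say nothing about how the datastore itself sorts.

```python
def order_by(entities, *orders):
    """Sort by (property, direction) pairs; the leftmost order has top priority.

    A bare property name means ascending order, mirroring the ASC default.
    """
    result = list(entities)
    # Python's sort is stable, so apply the lowest-priority order first.
    for order in reversed(orders):
        prop, direction = order if isinstance(order, tuple) else (order, "ASC")
        result.sort(key=lambda e: e[prop], reverse=(direction.upper() == "DESC"))
    return result


# Made-up data, analogous to: ORDER BY author, date DESC
posts = [
    {"author": "bob", "date": 2},
    {"author": "alice", "date": 1},
    {"author": "alice", "date": 3},
    {"author": "bob", "date": 5},
]
for post in order_by(posts, "author", ("date", "DESC")):
    print(post["author"], post["date"])
```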

An optional LIMIT clause causes the query to stop returning results after the first count entities. The LIMIT can also include an offset to skip that many results to find the first result to return. An optional OFFSET clause can specify an offset if no LIMIT clause is present.

Note: A LIMIT clause has a maximum of 1000. If a limit larger than the maximum is specified, the maximum is used. This same maximum applies to the fetch() method of the GqlQuery class.

Note: Like the offset parameter for the fetch() method, an OFFSET in a GQL query string does not reduce the number of entities fetched from the datastore. It only affects which results are returned by the fetch() method. A query with an offset has performance characteristics that correspond linearly with the offset size.
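The cost of an offset can be made concrete with a small sketch. The fetch function below is a stand-in written for this example (it is not the GqlQuery method); it reads offset + limit results and then discards the first offset of them, which is why the work grows linearly with the offset.

```python
import itertools


def fetch(ordered_results, limit, offset=0, max_limit=1000):
    """Return (results, number_scanned) for an in-memory stand-in query."""
    limit = min(limit, max_limit)  # limits above the maximum are clamped
    # Skipped results are still read; they are just not returned.
    scanned = list(itertools.islice(ordered_results, offset + limit))
    return scanned[offset:], len(scanned)


results, scanned = fetch(range(100), limit=5, offset=20)
print(results, scanned)
```

Five results come back, but 25 were read to produce them.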

For information on executing GQL queries, binding parameters, and accessing results, see the GqlQuery class, and the Model.gql() class method.

Daemon service shell for Hypertable on CentOS 5.3

Friday, October 2nd, 2009
#!/bin/bash
#
# chkconfig:    2345 99 01
#
# description:  Starts and stops the Hypertable servers.
# processname:  hypertabled
#

### BEGIN INIT INFO
# Provides: hypertabled
# Required-Start: $syslog $local_fs
# Required-Stop: $syslog $local_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: hypertable daemon
# Description: Starts and stops the Hypertable servers
### END INIT INFO


# source function library
. /etc/rc.d/init.d/functions

RETVAL=0

start() {
    echo -n $"Starting hypertabled: "
    su - hypertable -c "/opt/hypertable/0.9.2.7/bin/start-all-servers.sh kfs"
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ]
}

stop() {
    echo -n $"Stopping hypertabled: "
    su - hypertable -c "/opt/hypertable/0.9.2.7/bin/stop-servers.sh"
    # Record the exit status of the stop script so the check below reflects it.
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ]
}

restart() {
    stop
    start
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart|force-reload|reload)
    restart
    ;;
  condrestart|try-restart)
    restart
    ;;
  status)
    status ThriftBroker
    status Hypertable.Master
    status localBroker
    status Hyperspace.Master
    status Hypertable.RangeServer
    RETVAL=$?
    ;;
  *)
    echo $"Usage: $0 {start|stop|status|restart|reload|force-reload|condrestart}"
    exit 1
esac

exit $RETVAL

cd /etc/init.d
vi hypertabled
(paste the script above)
(save and exit: ESC, then :wq)

chmod +x hypertabled
chkconfig --add hypertabled
setup (check whether hypertabled is now registered as a service that starts automatically)

The Hypertable Query Language (HQL) SELECT Syntax

Friday, October 2nd, 2009

version: Hypertable 0.9.2.7 (v0.9.2.7)

SELECT

EBNF

SELECT ('*' | column_family_name [',' column_family_name]*)
  FROM table_name
  [where_clause]
  [options_spec]

where_clause:
    WHERE where_predicate [AND where_predicate ...]

where_predicate:
  cell_predicate
  | row_predicate
  | timestamp_predicate

relop: '=' | '<' | '<=' | '>' | '>=' | '=^'

cell_spec: row ',' column

cell_predicate:
  [cell_spec relop] CELL relop cell_spec
  | '(' [cell_spec relop] CELL relop cell_spec
        (OR [cell_spec relop] CELL relop cell_spec)* ')'

row_predicate:
  [row_key relop] ROW relop row_key
  | '(' [row_key relop] ROW relop row_key
          (OR [row_key relop] ROW relop row_key)* ')'

timestamp_predicate:
  [timestamp relop] TIMESTAMP relop timestamp

options_spec:
  (REVS revision_count
  | LIMIT row_count
  | INTO FILE filename[.gz]
  | DISPLAY_TIMESTAMPS
  | KEYS_ONLY
  | NOESCAPE
  | RETURN_DELETES)*

timestamp:
  'YYYY-MM-DD HH:MM:SS[.nanoseconds]'

Description

The parser only accepts a single timestamp predicate. The ‘=^’ operator is the “starts with” operator. It will return all rows that have the same prefix as the operand.
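The "starts with" behavior can be sketched in Python over a list of row keys. This assumes the row keys are available in sorted order, so that all keys sharing a prefix sit next to each other; prefix_range is a hypothetical helper for illustration, not part of any Hypertable client.

```python
import bisect


def prefix_range(sorted_keys, prefix):
    """Return the contiguous run of keys that start with prefix."""
    # The first key >= prefix is where any matching keys must begin.
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


row_keys = ["a", "b", "ba", "bz", "c"]
print(prefix_range(row_keys, "b"))  # analogous to: WHERE ROW =^ 'b'
```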

Options

REVS revision_count

Each cell in a Hypertable table can have multiple timestamped revisions. By default all revisions of a cell are returned by the SELECT statement. The REVS option allows control over the number of cell revisions returned. The cell revisions are stored in reverse-chronological order, so REVS=1 will return the most recent version of the cell.
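A minimal sketch of the idea, using made-up (timestamp, value) pairs for a single cell and a hypothetical select_revisions helper: revisions are ordered newest first, and a revision count simply truncates that list.

```python
def select_revisions(cell_revisions, revs=None):
    """Return a cell's revisions newest first, optionally only the first revs."""
    newest_first = sorted(cell_revisions, key=lambda rev: rev[0], reverse=True)
    return newest_first if revs is None else newest_first[:revs]


# Made-up revisions of one cell: (timestamp, value).
revisions = [(100, "v1"), (300, "v3"), (200, "v2")]
print(select_revisions(revisions))          # all revisions, newest first
print(select_revisions(revisions, revs=1))  # only the most recent revision
```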

LIMIT row_count

Limits the number of rows returned by the SELECT statement to row_count.

INTO FILE filename[.gz]

The result of a SELECT command is displayed to standard output by default. The INTO FILE option allows the output to get redirected to a file. If the file name specified ends in a .gz extension, then the output is compressed with gzip before it is written to the file. The first line of the output, when using the INTO FILE option, is a header line, which will take one of the two following formats. The second format will be output if the DISPLAY_TIMESTAMPS option is supplied.

 #row '\t' column '\t' value

 #timestamp '\t' row '\t' column '\t' value

DISPLAY_TIMESTAMPS

The SELECT command displays one cell per line of output. Each line contains three tab delimited fields, row, column, and value. The DISPLAY_TIMESTAMPS option causes the cell timestamp to be included in the output as well. When this option is used, each output line will contain four tab delimited fields in the following order:

 timestamp, row, column, value

KEYS_ONLY

The KEYS_ONLY option suppresses the output of the value. It is somewhat efficient because the option is processed by the RangeServers and not by the client. The value data is not transferred back to the client, only the key data.

NOESCAPE

The output format of a SELECT command comprises tab delimited lines, one cell per line, which is suitable for input to the LOAD DATA INFILE command. However, if the value portion of the cell contains either newline or tab characters, then it will confuse the LOAD DATA INFILE input parser. To prevent this from happening, newline and tab characters are converted into two character escape sequences, described in the following table.

Character      Escape Sequence
newline \n     '\' 'n'
tab \t         '\' 't'

The NOESCAPE option turns off this escaping mechanism.
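The escaping described above, together with the three-field and four-field line layouts, can be sketched as follows. The helper names escape_value and format_cell are invented for this example, and the sketch handles only the two characters the text mentions; it is not Hypertable's own output code.

```python
def escape_value(value):
    """Replace newline and tab with the two-character sequences \\n and \\t."""
    return value.replace("\n", "\\n").replace("\t", "\\t")


def format_cell(row, column, value, timestamp=None, noescape=False):
    """Build one tab delimited output line for a cell."""
    if not noescape:
        value = escape_value(value)
    fields = [row, column, value]
    if timestamp is not None:  # analogous to DISPLAY_TIMESTAMPS
        fields.insert(0, timestamp)
    return "\t".join(fields)


line = format_cell("foo", "tag:a", "two\twords")
print(line.split("\t"))  # three fields: the tab inside the value was escaped

raw = format_cell("foo", "tag:a", "two\twords", noescape=True)
print(raw.split("\t"))   # four fields: the raw tab splits the value in two
```

The second call shows why unescaped output can confuse a tab delimited parser: the value is no longer a single field.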

RETURN_DELETES

The RETURN_DELETES option is used internally for debugging. When data is deleted from a table, the data is not actually deleted right away. A delete key will get inserted into the database and the delete will get processed and applied during subsequent scans. The RETURN_DELETES option will return the delete keys in addition to the normal cell keys and values. This option can be useful when used in conjunction with the DISPLAY_TIMESTAMPS option to understand how the delete mechanism works.

Examples

SELECT * FROM test WHERE ('a' <= ROW <= 'e') and
                         '2008-07-28 00:00:02' < TIMESTAMP < '2008-07-28 00:00:07';
SELECT * FROM test WHERE ROW =^ 'b';
SELECT * FROM test WHERE (ROW = 'a' or ROW = 'c' or ROW = 'g');
SELECT * FROM test WHERE ('a' < ROW <= 'c' or ROW = 'g' or ROW = 'c');
SELECT * FROM test WHERE (ROW < 'c' or ROW > 'd');
SELECT * FROM test WHERE (ROW < 'b' or ROW =^ 'b');
SELECT * FROM test WHERE "farm","tag:abaca" < CELL <= "had","tag:abacinate";
SELECT * FROM test WHERE "farm","tag:abaca" <= CELL <= "had","tag:abacinate";
SELECT * FROM test WHERE CELL = "foo","tag:adactylism";
SELECT * FROM test WHERE CELL =^ "foo","tag:ac";
SELECT * FROM test WHERE CELL =^ "foo","tag:a";
SELECT * FROM test WHERE CELL > "old","tag:abacate";
SELECT * FROM test WHERE CELL >= "old","tag:abacate";
SELECT * FROM test WHERE "old","tag:foo" < CELL >= "old","tag:abacate";
SELECT * FROM test WHERE ( CELL = "maui","tag:abaisance" OR
                           CELL = "foo","tag:adage" OR
                           CELL = "cow","tag:Ab" OR
                           CELL =^ "foo","tag:acya");

Access Hypertable via Django and Python on Apache

Friday, October 2nd, 2009

from django.http import HttpResponse

import sys
from hypertable.thriftclient import *
from hyperthrift.gen.ttypes import *

def index(request):
    try:
        # Connect to the ThriftBroker on the local machine.
        client = ThriftClient("localhost", 38080)
        print("HQL examples")
        res = client.hql_query("show tables")
        print(res)
        res = client.hql_query("select * from thrift_test")
        print(res)

        print("mutator examples")
        mutator = client.open_mutator("thrift_test", 0, 0)
        client.set_cell(mutator, Cell("py-k1", "col", None, "py-v1"))
        client.flush_mutator(mutator)

        print("scanner examples")
        scanner = client.open_scanner("thrift_test",
                                      ScanSpec(None, None, None, 1), True)

        # Read batches of cells until the scanner is exhausted.
        while True:
            cells = client.next_cells(scanner)
            if len(cells) == 0:
                break
            print(cells)

    except:
        # Log the error, then let Django report it.
        print(sys.exc_info())
        raise
    return HttpResponse("Hello, Django2." + repr(res))