星期二, 12月 27, 2011

[Java] Parameterized method in JDBC

JDBC supports many setXXX() methods, one for each Java data type, so that you can set parameter values directly with the desired Java data types without any conversion. Here is a list of setXXX() methods:


[Java] Preventing SQL Injection in Java

一般基礎常犯的執行sql查詢的危險寫法:

conn = pool.getConnection( );
String sql = "select * from user where username='" + username +"' and password='" + password + "'";
stmt = conn.createStatement();
rs = stmt.executeQuery(sql);
if (rs.next()) {
loggedIn = true;
 out.println("Successfully logged in");
} else {
 out.println("Username and/or password not recognized");
}


為避免執行sql語句讓hacker使用sql injection,可改用PreparedStatement物件


String selectStatement = "SELECT * FROM User WHERE userId = ? ";
PreparedStatement prepStmt = con.prepareStatement(selectStatement);
prepStmt.setString(1, userId);
ResultSet rs = prepStmt.executeQuery();

如何使用PreparedStatement執行批次

import java.util.*;
import java.sql.*;
import javax.sql.*;
import javax.naming.*;
public class MySqlPreparedStatementBatch {
  public static void main(String [] args) {
    Connection con = null;
    try {
      com.mysql.jdbc.jdbc2.optional.MysqlDataSource ds 
        = new com.mysql.jdbc.jdbc2.optional.MysqlDataSource();
      ds.setServerName("localhost");
      ds.setPortNumber(3306);
      ds.setDatabaseName("HerongDB");
      ds.setUser("Herong");
      ds.setPassword("TopSecret");
      con = ds.getConnection();

// PreparedStatement
      PreparedStatement ps = con.prepareStatement(
 "INSERT INTO Profile (FirstName, LastName) VALUES (?, ?)");

// Provide values to parameters for copy 1
      ps.setString(1,"John");
      ps.setString(2,"First");

// Create copy 1
      ps.addBatch();

// Provide values to parameters for copy 2
      ps.setString(1,"Bill");
      ps.setString(2,"Second");

// Create copy 2
      ps.addBatch();

// Provide values to parameters for copy 3
      ps.setString(1,"Mark");
      ps.setString(2,"Third");

// Create copy 3
      ps.addBatch();

// Provide values to parameters for copy 4
      ps.setString(1,"Jack");
      ps.setString(2,"Last");

// Create copy 4
      ps.addBatch();

// Execute all 4 copies
      int[] counts = ps.executeBatch();
      
      int count = 0;
      for (int i=0; i<counts.length; i++) {
       count += counts[i];
      }
      System.out.println("Total effected rows: "+count);

// Close the PreparedStatement object
      ps.close();

      con.close();        
    } catch (Exception e) {
      System.err.println("Exception: "+e.getMessage());
      e.printStackTrace();
    }
  }
}

Reference:
Preventing SQL Injection in Java PreparedStatement in Batch Mode
如何在 Java 網站應用程式中防範 SQL Injection

[Java] Regex replace matcher

專案有一個需求需要將找到的字元進行相關取代的處理,
由於每一個找到的字元需要經過decode的處理,所以不適用replaceAll方法來處理。
以下找到符合的處理方法:

Reference:http://stackoverflow.com/questions/5568081/regex-replace-all-ignore-case

Avoid ruining the original capitalization:

In the above approach however, you're ruining the capitalization of the replaced word. Here is a better suggestion:
String inText="Sony Ericsson is a leading company in mobile. " +
              "The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
Pattern p = Pattern.compile(word, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(inText);
StringBuffer sb = new StringBuffer();
while (m.find()) {
  String replacement = m.group().replace(' ', '~');
  m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
}
m.appendTail(sb);
String outText = sb.toString();
System.out.println(outText);
Output:
Sony~Ericsson is a leading company in mobile.
The company sony~ericsson was found in oct 2001

星期四, 12月 15, 2011

星期二, 12月 13, 2011

[Alfresco] Search API


Search API Example:
  var def =
  {
     query: "cm:name:test*",
     language: "fts-alfresco"
  };
  var results = search.query(def);

Parameters
search
{
   query: string,          mandatory, in appropriate format and encoded for the given language
   store: string,          optional, defaults to 'workspace://SpacesStore'
   language: string,       optional, one of: lucene, xpath, jcr-xpath, fts-alfresco - defaults to 'lucene'
   templates: [],          optional, Array of query language template objects (see below) - if supported by the language 
   sort: [],               optional, Array of sort column objects (see below) - if supported by the language
   page: object,           optional, paging information object (see below) - if supported by the language
   namespace: string,      optional, the default namespace for properties
   defaultField: string,   optional, the default field for query elements when not explicit in the query
   onerror: string         optional, result on error - one of: exception, no-results - defaults to 'exception'
}

sort
{
   column: string,         mandatory, sort column in appropriate format for the language
   ascending: boolean      optional, defaults to false
}

page
{
   maxItems: int,          optional, max number of items to return in result set
   skipCount: int          optional, number of items to skip over before returning results
}

template
{
   field: string,          mandatory, custom field name for the template
   template: string        mandatory, query template replacement for the template
}

Reference:
Full Text Search Query Syntax
Alfresco.util.DataTable and search maxItems 

What is the maximum length of a URL?

http://www.boutell.com/newfaq/misc/urllength.html

[Java] DataOutputStream 的 writeBytes(String s) 編碼問題!!

Java編碼紀綠:

java 的DataOutputStream 的 writeBytes(String s) 方法對中文編碼會錯誤


public final void writeBytes(String s) throws IOException {

int len = s.length();

for (int i = 0 ; i < len ; i++) {

out.write((byte)s.charAt(i));

}

incCount(len);

}


举个例子,以字符串"你好"作为参数输入,(byte)s.charAt(i) 这句就会导致问题,
因为java里的char类型是16位的,一个char可以存储一个中文字符,在将其转换为 byte后高8位会丢失,
这样就无法将中文字符完整的输出到输出流中。
所以在可能有中文字符输出的地方最好先将其转换为字节数组,然再通过write(byte[] b)方法输出。例:

String s = "你好";

write(s.getBytes());

注意:getBytes沒指定編碼格式的話是使用預設系統的編碼。

2012/02/09更新:
DataOutputStream模擬form post上傳的時候,改用write方法才能解決中文編碼的問題。

  /*
  --boundary\r\n
  Content-Disposition: form-data; name=""; filename=""\r\n
  Content-Type: \r\n
  \r\n
  \r\n
  */ 
  this.dataOutputStream.writeBytes(this.PREFIX);
  this.dataOutputStream.writeBytes(this.boundary);
  this.dataOutputStream.writeBytes(this.CRLF);
  this.dataOutputStream.writeBytes("Content-Disposition: form-data; name=\"");
//  this.dataOutputStream.writeBytes(fieldName);
  this.dataOutputStream.write(fieldName.getBytes());
  this.dataOutputStream.writeBytes("\"; filename=\"");
  //don't support char in Chinese
//  this.dataOutputStream.writeBytes(fileName);
  this.dataOutputStream.write(fileName.getBytes());
  this.dataOutputStream.writeBytes("\"");
  this.dataOutputStream.writeBytes(this.CRLF);
  if(mimeType != null){
   this.dataOutputStream.writeBytes("Content-Type:");
   this.dataOutputStream.writeBytes(mimeType);
   this.dataOutputStream.writeBytes(this.CRLF);
   this.dataOutputStream.writeBytes(this.CRLF);
  }



2012/03/12 更新
今天在別的case之下竟然會讓中文亂碼,所以使用getBytes方法前還是指定你要的編碼比較合適
getBytes(Charset.forName("utf-8"))

Reference:
java 的DataOutputStream 的 writeBytes(String s) 方法在向
java String.getBytes()的問題

星期一, 12月 12, 2011

[Java] InputStreamReader

An InputStream is a binary stream, so there is no encoding. 
When you create the Reader, you need to know what character encoding to use, and that would depend on what the program you called produces (Java will not convert it in any way).


If you do not specify anything for InputStreamReader, it will use the platform default encoding, which may not be appropriate. There is another constructor that allows you to specify the encoding.


If you know what encoding to use (and you really have to know):

new InputStreamReader(process.getInputStream(), "UTF-8") // for example

[Java] 設定verbose參數顯示java application的詳細資訊

How to use verbose option while running a Java application
verbose option displays whole information while running a java application.
There are also some extensions available:
-verbose:class print information about each class loaded.
-verbose:gc Displays each garbage collection event.
-verbose:jni Report native methods used in application


設定步驟
步驟一:
$vi cataout.sh

步驟二:在JAVA_OPTS加入verbose參數

JAVA_OPTS='-server -Xms1024m -Xmx1024m -XX:MaxNewSize=256m -XX:MaxPermSize=256m -XX:+CMSParallelRemarkEnabled -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:+UseConcMarkSweepGC -verbose'


星期三, 12月 07, 2011

[Lucene] Stopwords

Lucene不支援以下的stopwords做搜尋:

"a", "an", "and", "are", "as", "at", "be", "but", "by",


"for", "if", "in", "into", "is", "it",


"no", "not", "of", "on", "or", "such",


"that", "the", "their", "then", "there", "these",


"they", "this", "to", "was", "will", "with"

Regex special characters


The patterns used in RegExp can be very simple, or very complicated, depending on what you're trying to accomplish. To match a simple string like "Hello World!" is no harder then actually writing the string, but if you want to match an e-mail address or html tag, you might end up with a very complicated pattern that will use most of the syntax presented in the table below.
PatternDescription
Escaping
\Escapes special characters to literal and literal characters to special.

E.g: /\(s\)/ matches '(s)' while /(\s)/ matches any whitespace and captures the match.
Quantifiers
{n}{n,}{n,m}*+?Quantifiers match the preceding subpattern a certain number of times. The subpattern can be a single character, an escape sequence, a pattern enclosed by parentheses or a character set.

{n} matches exactly n times.
{n,} matches n or more times.
{n,m} matches n to m times.
* is short for {0,}. Matches zero or more times.
+ is short for {1,}. Matches one or more times.
? is short for {0,1}. Matches zero or one time.

E.g: /o{1,3}/ matches 'oo' in "tooth" and 'o' in "nose".
Pattern delimiters
(pattern)(?:pattern)Matches entire contained pattern.

(pattern) captures match.
(?:pattern) doesn't capture match

E.g: /(d).\1/ matches and captures 'dad' in "abcdadef" while /(?:.d){2}/ matches but doesn't capture 'cdad'.

Note: (?:pattern) is a JavaScript 1.5 feature.
Lookaheads
(?=pattern)(?!pattern)A lookahead matches only if the preceding subexpression is followed by the pattern, but the pattern is not part of the match. The subexpression is the part of the regular expression which will be matched.

(?=pattern) matches only if there is a following pattern in input.
(?!pattern) matches only if there is not a following pattern in input.

E.g: /Win(?=98)/ matches 'Win' only if 'Win' is followed by '98'.

Note: Lookahead is a JavaScript1.5 feature.
Alternation
|Alternation matches content on either side of the alternation character.

E.g: /(a|b)a/ matches 'aa' in "dseaas" and 'ba' in "acbab".
Character sets
[characters][^characters]Matches any of the contained characters. A range of characters may be defined by using a hyphen.

[characters] matches any of the contained characters.
[^characters] negates the character set and matches all but the contained characters

E.g: /[abcd]/ matches any of the characters 'a', 'b', 'c', 'd' and may be abbreviated to /[a-d]/. Ranges must be in ascending order, otherwise they will throw an error. (E.g: /[d-a]/ will throw an error.)
/[^0-9]/ matches all characters but digits.

Note: Most special characters are automatically escaped to their literal meaning in character sets.
Special characters
^$.? and all the highlighted characters above in the table.Special characters are characters that match something else than what they appear as.

^ matches beginning of input (or new line with m flag).
$ matches end of input (or end of line with m flag).
. matches any character except a newline.
? directly following a quantifier makes the quantifier non-greedy (makes it match minimum instead of maximum of the interval defined).

E.g: /(.)*?/ matches nothing or '' in all strings.

Note: Non-greedy matches are not supported in older browsers such as Netscape Navigator 4 or Microsoft Internet Explorer 5.0.
Literal characters
All characters except those with special meaning.Mapped directly to the corresponding character.

E.g: /a/ matches 'a' in "Any ancestor".
Backreferences
\nBackreferences are references to the same thing as a previously captured match. n is a positive nonzero integer telling the browser which captured match to reference to.

/(\S)\1(\1)+/g matches all occurrences of three equal non-whitespace characters following each other.
/<(\S+).*>(.*)<\/\1>/ matches any tag.

E.g: /<(\S+).*>(.*)<\/\1>/ matches '
text
' in "text
text
text".
Character Escapes
\f\r\n\t\v\0[\b]\s,\S\w\W\d\D\b\B\cX,\xhh\uhhhh\f matches form-feed.
\r matches carriage return.
\n matches linefeed.
\t matches horizontal tab.
\v matches vertical tab.
\0 matches NUL character.
[\b] matches backspace.
\s matches whitespace (short for [\f\n\r\t\v\u00A0\u2028\u2029]).
\S matches anything but a whitespace (short for [^\f\n\r\t\v\u00A0\u2028\u2029]).
\w matches any alphanumerical character (word characters) including underscore (short for [a-zA-Z0-9_]).
\W matches any non-word characters (short for [^a-zA-Z0-9_]).
\d matches any digit (short for [0-9]).
\D matches any non-digit (short for [^0-9]).
\b matches a word boundary (the position between a word and a space).
\B matches a non-word boundary (short for [^\b]).
\cX matches a control character. E.g: \cm matches control-M.
\xhh matches the character with two characters of hexadecimal code hh.
\uhhhh matches the Unicode character with four characters of hexadecimal code hhhh.


Reference:
Regular Expressions patterns

其他你感興趣的文章

Related Posts with Thumbnails