Siebel Connection Pool in Oracle Service Bus 11g

07/04/2012

When middleware makes a large number of Web Service calls to Siebel CRM, the usual login mechanism is too slow and expensive. With the usual login, Siebel performs a login and logoff against every layer involved (application, database, etc.) for every single call. Under many concurrent calls, contention may occur in any of the underlying layers, and some requests can be delayed or rejected, in some cases returning an error to the caller.

To ease this problem and improve performance, the most common solution is to use Stateless Session Management for authentication: the username and password are provided only in the first call, and each subsequent call is made with the SessionToken returned by the previous response. Every response carries a new token to be used in the next call.
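As a rough illustration, the two kinds of request header involved can be sketched in Java as plain strings. The element names and the namespace below are assumptions based on the Siebel session management documentation and should be checked against your Siebel version; the class and method names are hypothetical.

```java
final class SiebelHeaderSketch {

    // Namespace assumed from the Siebel documentation; verify it for your version.
    static final String NS = "http://siebel.com/webservices";

    /** Header for the first call: credentials plus a request for a stateless session. */
    static String fullHeader(String username, String password) {
        return element("UsernameToken", username)
             + element("PasswordText", password)
             + element("SessionType", "Stateless");
    }

    /** Header for subsequent calls: only the token returned by the previous response. */
    static String tokenHeader(String sessionToken) {
        return element("SessionToken", sessionToken);
    }

    private static String element(String name, String value) {
        return "<" + name + " xmlns=\"" + NS + "\">" + escape(value) + "</" + name + ">";
    }

    // Minimal XML escaping for element text.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The point of the sketch is only the difference between the two headers: the first one carries the credentials, the second one carries nothing but the token.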

Naturally, it would be great to have some kind of Connection Pool or, more specifically, a Token Pool/Cache that opens stateless sessions on demand and reuses them through the tokens returned for the next call.

I searched Google for references and case studies of this approach and found some articles about using JAX-WS SOAP Handlers. But I was disappointed, because I can’t use SOAP Handlers in my middleware, and I can’t replicate a pool solution across all my applications, each making a point-to-point integration to Siebel.

The SOA architecture that we use here takes advantage of Oracle Service Bus 11g to encapsulate all Siebel calls. OSB acts as a proxy, doing input validation, security centralization and enforcement, endpoint management, policy attachment, statistics monitoring, parameterized data enrichment, service lifecycle management, and so on. Basically, all clients access Siebel through OSB.

In this article I will show a solution to make the Siebel Connection Pool using SessionToken at the OSB level, in a SOA approach. The best thing about this solution is the fact that none of the Siebel service clients are impacted by this change because the connection pool is centralized at the mediation layer, in this case, OSB, and no interfaces are changed.

The Connection Pool Classes

To control the pool, I wrote some simple Java classes. The first one is a simple DTO that packs the Siebel username and password into a single object, which acts as the key for storing the tokens (with the default equals and hashCode implementations generated by Eclipse):

package com.wordpress.gibaholms;

public class SiebelCredential {

	private String username;
	private String password;

	public SiebelCredential(String username, String password) {
		this.username = username;
		this.password = password;
	}

	@Override
	public String toString() {
		return "SiebelCredential [username=" + username + ", password=" + password + "]";
	}

	@Override
	public int hashCode() {
		final int prime = 31;
		int result = 1;
		result = prime * result + ((password == null) ? 0 : password.hashCode());
		result = prime * result + ((username == null) ? 0 : username.hashCode());
		return result;
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		SiebelCredential other = (SiebelCredential) obj;
		if (password == null) {
			if (other.password != null)
				return false;
		} else if (!password.equals(other.password))
			return false;
		if (username == null) {
			if (other.username != null)
				return false;
		} else if (!username.equals(other.username))
			return false;
		return true;
	}

	public String getUsername() {
		return username;
	}

	public String getPassword() {
		return password;
	}

}

I created another simple class to hold a Siebel token together with its creation date. This date is used to avoid returning tokens that have certainly expired:

package com.wordpress.gibaholms;

import java.util.Date;

public class SiebelToken {

	private String token;
	private Date creationDate;

	public SiebelToken(String token, Date creationDate) {
		this.token = token;
		this.creationDate = creationDate;
	}
	
	public SiebelToken(String token) {
		this(token, new Date());
	}
	
	@Override
	public String toString() {
		return "SiebelToken [token=" + token + ", creationDate=" + creationDate + "]";
	}

	public String getToken() {
		return token;
	}

	public Date getCreationDate() {
		return creationDate;
	}

}

The most important class, which controls the pool, is very simple: it keeps a static cache with a Queue of tokens mapped by credential (username and password pair). I also added some counters that can be retrieved by a statistics service if necessary. As you can see, the methods that get/add tokens are synchronized to support concurrent requests from OSB. Some short locks may occur, but because the methods are very simple and cheap, the performance loss is insignificant compared to the Siebel login/logoff problem:

package com.wordpress.gibaholms;


import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

public class TokenPool {
	
	private static final int SIEBEL_TOKEN_EXPIRATION_IN_MINUTES = 15;
	
	private static final Map<SiebelCredential, Queue<SiebelToken>> tokensByCredential;
	private static long addCount;
	private static long pollCount;
	private static long hitCount;
	private static long missCount;
	
	static {
		tokensByCredential = new HashMap<SiebelCredential, Queue<SiebelToken>>();
	}
	
	public synchronized static String pollToken(String username, String password) {
		pollCount++;
		SiebelCredential credential = new SiebelCredential(username, password);
		if (tokensByCredential.containsKey(credential)) {
			Queue<SiebelToken> tokens = tokensByCredential.get(credential);
			while (tokens.size() > 0) {
				SiebelToken token = tokens.poll();
				long diffInMinutes = (System.currentTimeMillis() - token.getCreationDate().getTime()) / (60 * 1000);
				if (diffInMinutes < SIEBEL_TOKEN_EXPIRATION_IN_MINUTES) {
					hitCount++;
					return token.getToken();
				}
			}
		}
		missCount++;
		return null;
	}
	
	public synchronized static void addToken(String username, String password, String token) {
		addCount++;
		SiebelCredential credential = new SiebelCredential(username, password);
		if (!tokensByCredential.containsKey(credential)) {
			tokensByCredential.put(credential, new LinkedList<SiebelToken>());
		}
		tokensByCredential.get(credential).add(new SiebelToken(token));
	}

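	// Counter accessors are synchronized too, so that readers see the values
	// written by the synchronized pollToken/addToken methods.
	public static synchronized void eraseCounters() {
		addCount = 0;
		pollCount = 0;
		hitCount = 0;
		missCount = 0;
	}
	
	public static synchronized long getAddCount() {
		return addCount;
	}

	public static synchronized long getPollCount() {
		return pollCount;
	}

	public static synchronized long getHitCount() {
		return hitCount;
	}

	public static synchronized long getMissCount() {
		return missCount;
	}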
	
}

Notice also that the constant SIEBEL_TOKEN_EXPIRATION_IN_MINUTES gives us a hint of whether a token has already expired on Siebel. The token expiration time on Siebel is a system parameter named SessionTokenTimeout that can be adjusted according to your needs. The default value indicated in the Siebel documentation is 15 minutes. More information can be found in the official docs here: http://docs.oracle.com/cd/B40099_02/books/EAI2/EAI2_WebServices32.html#wp178856

With this simple date control we avoid keeping old tokens in the pool and reduce the chance of wasting a Siebel call just to discover that the token has expired and then retrying with the full header, which improves the pool efficiency. However, some expired tokens may still get through, because there can be a window of some milliseconds between the token creation date on Siebel and the token creation date in our pool, so the expired token error still needs to be handled.
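One way to narrow that window is to compare ages in milliseconds and subtract a small safety margin, so that tokens very close to the timeout are discarded rather than sent. The sketch below is only an illustration and is not part of the TokenPool class shown above; the class name, the method name and the margin parameter are hypothetical.

```java
final class TokenFreshnessSketch {

    /**
     * Returns true when a token created at createdAtMillis can still be
     * expected to be valid at nowMillis, given the Siebel timeout in minutes
     * and a safety margin in milliseconds.
     */
    static boolean isFresh(long createdAtMillis, long nowMillis,
                           long timeoutMinutes, long safetyMarginMillis) {
        long ageMillis = nowMillis - createdAtMillis;
        long limitMillis = timeoutMinutes * 60L * 1000L - safetyMarginMillis;
        // A negative age means the clock moved backwards; treat the token as stale.
        return ageMillis >= 0 && ageMillis < limitMillis;
    }
}
```

Even with a margin, the retry with the full header is still required, since the timeout configured on Siebel may differ from the value assumed by the pool.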

Now just assemble these classes into a JAR file and we are ready to define the best Message Flow to get/put tokens into this cache in the appropriate moments.

The Service Bus Project

Our OSB project looks like this:

  • SiebelSWEBusiness.biz
    A generic business service for all Siebel Web Services. All SWE services use the same endpoint and differ only by WSDL, so we can use a single BS of type “Any SOAP Service – SOAP 1.1”.

  • siebel-connection-pool-1.0.jar
    A simple JAR file containing the classes shown earlier in this article. We will use the “Java Callout” action to get/add tokens from the pool using the static methods of the class “com.wordpress.gibaholms.TokenPool”.
  • SiebelAccount.sa
    A Service Account of type “Static” that holds the Siebel username and password. The OSB clients authenticate themselves through the WebLogic security realm using WS-Security, which centralizes the access policies for corporate resources.

  • AssignSiebelHeaderFull.xq
    A reusable XQuery that builds the full Siebel header, with username and password.

  • AssignSiebelHeaderToken.xq
    A reusable XQuery that builds the Siebel header with the SessionToken tag only.

  • SiebelStatelessSession.proxy
    The main Proxy Service, which contains all the logic to handle the requests and manage the Siebel tokens. It would be a bad idea to replicate this logic across all Siebel Proxy Services, so I decided to encapsulate it in a single PS of type “Any SOAP Service” and protocol “local”. Any Siebel service can then use the connection pool just by routing its flow to this PS.

    The message flow of this proxy service is the key to the solution. To understand it in depth I suggest downloading the code and inspecting each stage, but I will explain the flow here at a higher level:

    Stage: Description

    AssignSiebelAccount: Gets the username and password from SiebelAccount.sa for later use when building the headers.
    GetTokenFromPool: Tries to get a token from the pool and assigns it to a variable. If the pool is empty, the variable will be null.
    AssignSiebelHeader: If a token exists, it is used in the header; otherwise, the full header is built.
    BackupRequestBody: Saves the original request body into a variable, because it will be needed for a second attempt if the first attempt fails with a token expired error.
    DefineErrorType: Depending on the result of the first attempt, we need to take some actions. Because of the product architecture, the OSB Error Handler is a very bad place to put complex logic, so the route error handler only detects the type of error that occurred and lets the flow resume to the response pipeline, which takes the correct actions.
    HandleSiebelReturn: At this point the flow distinguishes four possibilities:

      - TokenExpired: makes a second attempt using the full header, with username and password
      - ErrorWithToken: puts the response token into the pool and returns the error
      - ErrorWithoutToken: just returns the error
      - Success: puts the response token into the pool and returns the response

      For security reasons, the response header is removed from the message in all cases.

    HandleSecondAttemptError: The flow reaches this block only if the second attempt also results in an error. It checks whether the error response carries a token, which is then added to the pool; in any case the error is returned to the caller.
  • MySiebelService.wsdl
    This WSDL represents our Siebel regular SWE Web Service, generated by Siebel Tools. No secrets here.
  • MySiebelService.proxy
    This Proxy Service is the access point to our Siebel service, which will be called by end consumers. Note that it is very simple to reuse “SiebelStatelessSession.proxy”: just create an ordinary Proxy Service from the Siebel service WSDL and add a route to it, replying with the original fault.

Supported Scenarios

  • Success on the first call – OK;
  • Generic application error on the first call, without returning a token – OK;
  • Generic application error on the first call, returning a token – OK;
  • Token expired error on the first call, and success on the retry call – OK;
  • Token expired error on the first call, and generic application error on the retry call, without returning a token – OK;
  • Token expired error on the first call, and generic application error on the retry call, returning a token – OK.
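The decision logic behind these scenarios can be sketched in plain Java. This is only a simplified model of the message flow, not the OSB implementation: the Result class, the SiebelCall interface and the single-credential pool are hypothetical stand-ins for the Siebel response, the routed call and the TokenPool class, and the sketch is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StatelessSessionFlowSketch {

    /** Hypothetical summary of one Siebel response. */
    static final class Result {
        final boolean error;
        final boolean tokenExpired;
        final String responseToken; // null when Siebel returned no token

        Result(boolean error, boolean tokenExpired, String responseToken) {
            this.error = error;
            this.tokenExpired = tokenExpired;
            this.responseToken = responseToken;
        }
    }

    /** Hypothetical stand-in for the routed call; a null token means "use the full header". */
    interface SiebelCall {
        Result invoke(String requestBody, String token);
    }

    // Simplified single-credential pool, standing in for TokenPool.
    private final Deque<String> pool = new ArrayDeque<String>();

    void addToken(String token) {
        pool.add(token);
    }

    int pooledTokens() {
        return pool.size();
    }

    Result call(SiebelCall siebel, String requestBody) {
        String token = pool.poll();                          // GetTokenFromPool
        Result result = siebel.invoke(requestBody, token);   // first attempt
        if (token != null && result.error && result.tokenExpired) {
            // TokenExpired: second attempt with the saved body and the full header
            result = siebel.invoke(requestBody, null);
        }
        if (result.responseToken != null) {
            pool.add(result.responseToken);                  // Success or ErrorWithToken
        }
        return result;                                       // errors still go back to the caller
    }
}
```

Each of the six scenarios corresponds to one path through the call method: at most two invocations of Siebel, and a token is returned to the pool whenever the final response carries one.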

Benefits of the Solution

  • No changes to the service interfaces;
  • No changes to the service consumers;
  • No code replication;
  • No infrastructure logic leak to the service consumers;
  • Centralized solution;
  • Easy to manage and maintain;
  • Pluggable solution, easy to enable/disable for any service;
  • Incredible performance gain for the entire corporation, because any Siebel service can be pool-enabled just by adding a route in OSB.

Now we can easily enable the Connection Pool for any Siebel SWE service in the SOA mediation layer, with no impact on current service consumers and “zero” code replication, in a corporate-wide way.

Attachments

(Updated 08/02/2012 – 08:47)

Source Code:
https://github.com/gibaholms/articles/tree/master/Siebel_Connection_Pool_in_OSB

FFPOJO and Spring Batch Integration

06/11/2012

This week I needed to integrate FFPOJO (version 1.0) with Spring Batch (version 2.1.8.RELEASE). I know that Spring Batch has its own LineTokenizers and FieldSetMappers, but using FFPOJO for this parsing job is much cleaner, more flexible and faster. The integration was very easy: it only required creating a simple LineMapper implementation.

The FFPojoLineMapper class:

import org.springframework.batch.item.file.LineMapper;

import com.github.ffpojo.FFPojoHelper;

public class FFPojoLineMapper<T> implements LineMapper<T> {

	private Class<T> recordClazz;

	public T mapLine(String line, int lineNumber) throws Exception {
		return FFPojoHelper.getInstance().createFromText(recordClazz, line);
	}

	public void setRecordClazz(Class<T> recordClazz) {
		this.recordClazz = recordClazz;
	}

}

That’s it!

To use this glue code, let’s take the POJO record class shown below as a sample:

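import com.github.ffpojo.metadata.positional.annotation.PositionalField;
import com.github.ffpojo.metadata.positional.annotation.PositionalRecord;

@PositionalRecord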
public class Customer {

	private Long id;
	private String name;
	private String email;
	
	@PositionalField(initialPosition = 1, finalPosition = 5)
	public Long getId() {
		return id;
	}
	public void setId(Long id) {
		this.id = id;
	}
	// must use a String setter or a FieldDecorator
	public void setId(String id) {
		this.id = Long.valueOf(id);
	}
	
	@PositionalField(initialPosition = 6, finalPosition = 25)
	public String getName() {
		return name;
	}
	public void setName(String name) {
		this.name = name;
	}
	
	@PositionalField(initialPosition = 26, finalPosition = 55)
	public String getEmail() {
		return email;
	}
	public void setEmail(String email) {
		this.email = email;
	}
}
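To make the positions in the annotations concrete: they are 1-based and inclusive, so the record above describes a 55-character line. The plain-Java sketch below shows the equivalent substring arithmetic. It is not how FFPOJO is implemented; the class and method names are hypothetical, and trimming the padding is an assumption made for the illustration.

```java
final class PositionalLineSketch {

    /** Extracts a field given 1-based, inclusive positions and trims the padding. */
    static String field(String line, int initialPosition, int finalPosition) {
        return line.substring(initialPosition - 1, finalPosition).trim();
    }

    /** Splits a 55-character Customer line into id, name and email. */
    static String[] parseCustomer(String line) {
        return new String[] {
            field(line, 1, 5),    // id
            field(line, 6, 25),   // name
            field(line, 26, 55)   // email
        };
    }
}
```

With the FFPojoLineMapper, this slicing and the conversion to the Customer object are done by FFPOJO itself for each line read by the FlatFileItemReader.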

Then the reader declaration in the job context xml will look like this:

<batch:job id="sampleJob">
	<batch:step id="sampleFFPojoReadingStep">
		<batch:tasklet>
			<batch:chunk commit-interval="100">
				<batch:reader>
					<bean class="org.springframework.batch.item.file.FlatFileItemReader">
						<property name="lineMapper">
							<bean class="sample.reader.FFPojoLineMapper">
								<property name="recordClazz">
									<value type="java.lang.Class">sample.Customer</value>
								</property>
							</bean>
						</property>
					</bean>
				</batch:reader>
				<batch:writer>
					<bean class="sample.writer.MyItemWriter" />
				</batch:writer>
			</batch:chunk>
		</batch:tasklet>
	</batch:step>
</batch:job>

Soon this code will be available in an integration module embedded into FFPOJO project binaries, and more samples will be added.

The FFPOJO framework can be found in the Maven Central:

<dependencies>
	<dependency>
		<groupId>com.github.ffpojo</groupId>
		<artifactId>ffpojo</artifactId>
		<version>1.0</version>
	</dependency>
</dependencies> 

If you want to learn more about the FFPOJO library, please visit the project website on GitHub.

Central Maven Repository Publish Experience

03/13/2012

This week I published an open source project that I maintain to the Sonatype OSS Repository Hosting service, with the objective of making the project available in the Central Maven Repository.

The project that I published is the FFPOJO Project, a POJO-based flat-file parser library for Java applications.

The objective of this post is to comment on my experience with the publishing process and to provide some tips for people who are going through this process too.

  • It’s quite obvious, but you must own the domain used in your GroupId, and the package names must start with this domain too;
  • If you host your project at GitHub, you can create a free account and turn it into an organization account, then link it to your personal user as an organization member. This gives you a URL like “https://github.com/projectname”, which lets you use a group id and package names like “com.github.projectname”;
  • Create a parent POM and use inheritance and multi-module builds to concentrate project-specific POM tags in the parent POM and simplify the release process. Remember that a nested project structure is more compatible than a flat structure;
  • Strictly follow the instructions in the official Sonatype repository usage guide;
  • Use the javadoc plugin and the source plugin to generate the “-javadoc.jar” and “-sources.jar”;
  • Use the Maven release plugin to simplify release management;
  • If you use GitHub and Windows for development, release:prepare might hang after the push command. This happens when your GitHub SSH key has a passphrase. The best solution I found is to use another key with no passphrase. I found other solutions, such as using PuTTY Pageant/Plink to cache the key and passphrase and act as the SSH client, but they did not work for me. The easiest option is not to use a passphrase at all;
  • If you use GitHub and Windows for development, when you call release:prepare on Cygwin Git Bash, you might see an error like “pom.xml is outside repository”. It’s a relative/absolute path problem in the Maven Git SCM plugin. The best solution I found is to run Maven from cmd.exe instead of Git Bash. In that case the folders containing ssh.exe and git.exe must be in the PATH variable;
  • Don’t forget to publish your GPG public key to the public keyservers (hkp://pool.sks-keyservers.net, hkp://keyserver.ubuntu.com, hkp://pgp.mit.edu). This is verified by Central at release promotion time.

That’s it. I did not run into any other trouble in the process; it is very fast, and the JIRA administrators are very attentive. Finally, the FFPOJO framework can be found in Maven Central like any other open source framework:

<dependencies>
	<dependency>
		<groupId>com.github.ffpojo</groupId>
		<artifactId>ffpojo</artifactId>
		<version>1.0</version>
	</dependency>
</dependencies> 

If you want to learn more about the FFPOJO library, please visit the project website on GitHub.

XQuery TitleCase Function in Oracle Service Bus 11g

03/06/2012

One of my clients required that some customer data, returned by the legacy systems wrapped by OSB services, be transformed to the “Title Case” format (the same capitalization rule used by “Pascal Case” or “Upper Camel Case”, but keeping the spaces between words). That means the text must be tokenized at the blank spaces and the first letter of every word must be capitalized.

For example:

Input: hello world

Output: Hello World

The easiest and most convenient way to implement this requirement is to write a function in pure Java and expose it as a Custom XPath function or call it through a Java Callout in OSB.

For my client, I wrote a pure Java Custom XPath function, delivered with all the JUnit tests needed to support the maintenance and evolution of the function. In parallel, however, I spent a little time writing the “Title Case” function using only XQuery, just to exercise the language, which can look very complex at first.

Again, the only purpose of this article is to show the power of XQuery and demonstrate the kind of things that can be done with this language, which supports variable creation and decision and flow control structures, like any other programming language.

Title Case XQuery test project structure:

Below is the “xq/TitleCase.xq” file, implemented only using XQuery:

(:: pragma type="xs:string" ::)

declare namespace xf = "http://tempuri.org/xq/TitleCase/";

declare function xf:TitleCase($str as xs:string) as xs:string {
	let $words := tokenize($str, '\s')
	return 
		let $result := 
			for $word in $words
			return concat(upper-case(substring($word, 1, 1)), substring($word, 2))
		return string-join($result, ' ')
};

declare variable $str as xs:string external;

xf:TitleCase($str)

To test the function, just create the XQuery above in the sbconsole and launch the test console:

The “Title Case” XQuery has only one string parameter, so input some text and execute it:

Then you should see the text “Title Cased”, as shown below:

For functions that will be widely and heavily used, writing them in Java is of course the best approach. However, for simpler and smaller things, or to avoid a custom XPath deployment and a server restart, you can always use good old XQuery.
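For comparison, a pure-Java version of the same logic might look like the sketch below. It is not the Custom XPath implementation delivered to the client (which is not shown in this article); the class and method names are hypothetical, and the code is only intended to mirror the behavior of the XQuery function above.

```java
import java.util.Locale;

final class TitleCaseSketch {

    static String titleCase(String str) {
        // Split on every single whitespace character, keeping empty tokens,
        // in the same spirit as tokenize($str, '\s') in the XQuery version.
        String[] words = str.split("\\s", -1);
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                result.append(' ');
            }
            String word = words[i];
            if (!word.isEmpty()) {
                // Upper-case the first character and keep the rest untouched
                result.append(word.substring(0, 1).toUpperCase(Locale.ROOT));
                result.append(word.substring(1));
            }
        }
        return result.toString();
    }
}
```

Like the XQuery version, this sketch does not lower-case the remaining letters of each word, and it replaces every whitespace separator with a single space character.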

Attachments

Source Code:
https://github.com/gibaholms/articles/tree/master/XQuery_TitleCase_Function_in_OSB

Read XML Resource in Oracle Service Bus 11g

02/23/2012

Some time ago I was thinking about how to read XML Resources available in OSB projects at runtime. It would be very useful for holding configuration data to be read by the proxy services through XPath or a Java Callout. After many Google searches, I did not find any documentation or example of how to read OSB resource content programmatically at runtime. The popular OSB APIs and MBeans do not provide any method to get the content of resources, only their references. The question was how to get the content of a resource through its reference.

With the help of JShrink I spent some time decompiling the OSB native JARs and reverse engineering them to discover how to do this. Fortunately, after a few good attempts I found what I was looking for. Using some internal OSB libraries, we can access any project resource at runtime.

In this article I will show how to read a sample XML Resource programmatically at runtime. In this sample I will use a Custom XPath function to read an XML resource that holds application configuration parameters. Note: the same API shown here can also be used in a Java Callout.

The sample project structure will look like this:

File “xsd/Parameters.xsd”: a simple schema that we’ll use in this scenario to describe the XML structure that holds the parameters:

<xsd:complexType name="Parameters">
	<xsd:sequence>
		<xsd:element name="ParameterList" type="tns:ParameterList"/>
	</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ParameterList">
	<xsd:sequence>
		<xsd:element name="Parameter" type="tns:Parameter" minOccurs="0" maxOccurs="unbounded"/>
	</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="Parameter">
	<xsd:sequence>
		<xsd:element name="Key" type="xsd:string"/>
		<xsd:element name="Value" type="xsd:string"/>
	</xsd:sequence>
</xsd:complexType>

File “xml/Parameters.xml”: the sample XML that will be read by our XPath function at runtime:

<ns0:Parameters xmlns:ns0="http://gibaholms.wordpress.com/samples/xsd/2012/02/parameters">
	<ns0:ParameterList>
		<ns0:Parameter>
			<ns0:Key>Color</ns0:Key>
			<ns0:Value>Blue</ns0:Value>
		</ns0:Parameter>
		<ns0:Parameter>
			<ns0:Key>Size</ns0:Key>
			<ns0:Value>500</ns0:Value>
		</ns0:Parameter>
	</ns0:ParameterList>
</ns0:Parameters>

File “xq/ReadXmlParameters.xq”: a sample test XQuery that calls our XPath function passing as argument the full reference path to the XML parameters file:

xquery version "1.0" encoding "Cp1252";
(:: pragma bea:schema-type-return type="ns0:Parameters" location="../xsd/Parameters.xsd" ::)

declare namespace xf = "http://tempuri.org/ReadXmlResource/xq/ReadXmlParameters/";
declare namespace ns0 = "http://gibaholms.wordpress.com/samples/xsd/2012/02/parameters";
declare namespace param = "http://gibaholms.wordpress.com/xpath/ReadXmlResource";

declare function xf:ReadXmlParameters() as element() {
    param:readXml("ReadXmlResource/xml/Parameters")
};

xf:ReadXmlParameters()

Now, let’s create the custom XPath that does the magic. Create a simple Java project and add the following compile dependencies:

  • <MIDDLEWARE_HOME>\Oracle_OSB1\modules\com.bea.common.configfwk_1.5.0.0.jar
  • <MIDDLEWARE_HOME>\Oracle_OSB1\modules\com.bea.core.xml.xmlbeans_2.1.0.0_2-5-1.jar
  • <MIDDLEWARE_HOME>\Oracle_OSB1\lib\modules\com.bea.alsb.resources.core.jar
  • <MIDDLEWARE_HOME>\Oracle_OSB1\lib\modules\com.bea.alsb.resources.xml.jar

The XPath project structure is shown below:

In the file “osb-readxmlresourcefunction.xml” we describe the XPath functions contract. This file is a requirement of OSB and must be copied to the functions directory with the generated jar file:

<xpf:xpathFunctions xmlns:xpf="http://www.bea.com/wli/sb/xpath/config">
	<xpf:category id="Custom Functions">
		<xpf:function>
			<xpf:name>readXml</xpf:name>
			<xpf:comment>This function reads a XML Resource file from OSB</xpf:comment>
			<xpf:namespaceURI>http://gibaholms.wordpress.com/xpath/ReadXmlResource</xpf:namespaceURI>
			<xpf:className>com.wordpress.gibaholms.xpath.ReadXmlResource</xpf:className>
			<xpf:method>org.apache.xmlbeans.XmlObject readXml(java.lang.String)</xpf:method>
			<xpf:isDeterministic>true</xpf:isDeterministic>
			<xpf:scope>Pipeline</xpf:scope>
			<xpf:scope>SplitJoin</xpf:scope>
		</xpf:function>
	</xpf:category>
</xpf:xpathFunctions>

The secret to access OSB resources at runtime is the various “*Repository” classes. The code of our “readXml” XPath function is shown below:

package com.wordpress.gibaholms.xpath;

import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlObject;

import com.bea.wli.config.Ref;
import com.bea.wli.config.component.NotFoundException;
import com.bea.wli.sb.resources.config.XmlEntryDocument;
import com.bea.wli.sb.resources.xml.XmlRepository;

public class ReadXmlResource {

	public static XmlObject readXml(String xmlRefPath) {
		Ref ref = new com.bea.wli.config.Ref("XML", Ref.getNames(xmlRefPath));
		XmlObject xmlObject = null;
		try {
			XmlEntryDocument xmlEntryDocument = XmlRepository.get().getEntry(ref);
			String xmlContent = xmlEntryDocument.getXmlEntry().getXmlContent();
			xmlObject = XmlObject.Factory.parse(xmlContent);
		} catch (NotFoundException e) {
			// Keep the original exception as the cause instead of printing the stack trace
			throw new RuntimeException("XML Resource not found: " + xmlRefPath, e);
		} catch (XmlException e) {
			throw new RuntimeException("Error parsing XML content.", e);
		}
		return xmlObject;
	}
	
}

As we can see in the code, the class “com.bea.wli.sb.resources.xml.XmlRepository” does the magic of retrieving the XML resource content through its “com.bea.wli.config.Ref” reference. Following the same principle, we can use the various other “*Repository” classes to access other types of project resources at runtime, such as Schemas, XSLTs, WSDLs and so on.
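Once the XML content is available as a string (the xmlContent value in the code above), a consumer such as a Java Callout could turn the parameters into a map using only the standard JAXP APIs. The sketch below is a hypothetical helper, not part of the sample project; it assumes the Parameters structure and namespace of the sample XML shown earlier.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ParametersXmlSketch {

    // Namespace of the sample Parameters.xml
    static final String NS = "http://gibaholms.wordpress.com/samples/xsd/2012/02/parameters";

    /** Parses the Parameters XML and returns the Key/Value pairs in document order. */
    static Map<String, String> toMap(String xmlContent) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xmlContent)));

            Map<String, String> parameters = new LinkedHashMap<String, String>();
            NodeList nodes = document.getElementsByTagNameNS(NS, "Parameter");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element parameter = (Element) nodes.item(i);
                parameters.put(text(parameter, "Key"), text(parameter, "Value"));
            }
            return parameters;
        } catch (Exception e) {
            throw new RuntimeException("Error reading parameters XML.", e);
        }
    }

    // Returns the text of the first child element with the given local name.
    private static String text(Element parent, String localName) {
        return parent.getElementsByTagNameNS(NS, localName).item(0).getTextContent();
    }
}
```

For the sample file, the resulting map would contain the keys Color and Size.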

To test the function, just run the XQuery “xq/ReadXmlParameters.xq” in the sbconsole:

The test XQuery has no parameters, so just execute it:

Then you should see the parameters XML loaded at runtime, as shown below:

I’m glad to share this discovery with other ALSB / OSB developers, and I hope this knowledge helps you do more useful things.

If you already have another solution for reading OSB project resources at runtime, or already use this API for some other purpose, please share your experience with us in the comments. Thanks.

Attachments

Source Code:
https://github.com/gibaholms/articles/tree/master/Read_XML_Resource_in_OSB

Advanced Validation in Oracle Service Bus 11g

10/31/2011

When implementing Proxy Services in Oracle Service Bus (formerly BEA AquaLogic Service Bus), it’s important to think about validation of the request data.

OSB provides a Validate action, which is a good mechanism for checking whether the incoming request conforms to the XML Schema defined in the service contract. However, using only this action is a very poor way to effectively validate a request, for the following reasons:

  • The Validate action only validates against XML Schema (XSD), which has a limited syntax and cannot support more complex validations or rules;
  • XML Schema cannot support business-oriented validations, only structural validations (and very limited ones);
  • A better choice for validation support would be Schematron, but Oracle Service Bus 11g does not provide Schematron support yet (SOA Suite Mediator already supports it, but OSB does not);
  • If your company uses a Canonical Model, it is very common for all elements of your XSD entities to be declared with minOccurs=”0” to get full reusability of the model. However, this makes it hard to enforce that an element is mandatory for a specific service operation, and it weakens the service contract.
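The last point is easy to reproduce outside OSB. The sketch below uses the standard javax.xml.validation API, not the OSB Validate action (which is configured in the message flow rather than coded), and a reduced, hypothetical Customer schema whose children are all optional. It only illustrates that structural validation accepts a request with no data at all.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.SAXException;

final class SchemaOnlyValidationSketch {

    // Reduced, hypothetical Customer element in which every child is optional.
    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"Customer\"><xsd:complexType><xsd:sequence>"
        + "<xsd:element name=\"FirstName\" type=\"xsd:string\" minOccurs=\"0\"/>"
        + "<xsd:element name=\"Age\" type=\"xsd:int\" minOccurs=\"0\"/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    /** Returns true when the XML is structurally valid against the schema above. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // structural violation (or an invalid schema)
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

An empty Customer element passes this check, while a non-numeric Age is rejected. The schema catches type errors, but it cannot express that FirstName is mandatory for one particular operation.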

In this article I will show a good pattern that I use to do more advanced, business-oriented validation in Oracle Service Bus using XQuery. Note that deep XQuery knowledge is outside the scope of this article; for more in-depth knowledge about XQuery you will need to search the web (a good start can be found at http://www.w3schools.com/xquery/default.asp). I will assume that the reader knows basic XPath concepts for finding nodes in an XML instance (more at http://www.w3schools.com/xpath/default.asp).

In this sample I will focus on the Proxy Service, so I will ignore any business-proxy-business transformations. The sample shows a simple Business Service virtualization with data pass-through, plus some complex validation of the request input, which is the main objective of this article. You can download the example artifacts from the links below this article.

Let’s take a look into the sample request input message that we’ll use in this scenario:

<xsd:element name="OrderRequest">
	<xsd:complexType>
		<xsd:sequence>
			<xsd:element name="Customer">
				<xsd:complexType>
					<xsd:sequence>
						<xsd:element name="FirstName" type="xsd:string" minOccurs="0" />
						<xsd:element name="LastName" type="xsd:string" minOccurs="0" />
						<xsd:element name="DocumentNumber" type="xsd:string" minOccurs="0" />
						<xsd:element name="Email" type="xsd:string" minOccurs="0" />
						<xsd:element name="Age" type="xsd:int" minOccurs="0"/>
						<xsd:element name="Password" type="xsd:string" minOccurs="0" />
						<xsd:element name="PasswordConfirmation" type="xsd:string" minOccurs="0" />
					</xsd:sequence>
				</xsd:complexType>
			</xsd:element>
			<xsd:element name="Product">
				<xsd:complexType>
					<xsd:sequence>
						<xsd:element name="ProductID" type="xsd:string" minOccurs="0" />
						<xsd:element name="ProductRestriction" minOccurs="0">
							<xsd:simpleType>
								<xsd:restriction base="xsd:string">
									<xsd:enumeration value="EVERYONE"/>
									<xsd:enumeration value="OVER18"/>
								</xsd:restriction>
							</xsd:simpleType>
						</xsd:element>
						<xsd:element name="Quantity" type="xsd:int" minOccurs="0" />
					</xsd:sequence>
				</xsd:complexType>
			</xsd:element>
		</xsd:sequence>
	</xsd:complexType>
</xsd:element>

Observations:

  • Obviously this dummy contract is not a real-world one and has no business relevance in a real order scenario;
  • The sample simulates the use of Canonical Model entities, which set minOccurs=”0” all over the place, but we know that some information is mandatory for our execute order operation.

The starting point of this sample is the structure shown below, that consists of:

  • One OSB Project named “AdvancedValidation” in the default configuration
  • The “OrderService-v1.wsdl” that represents the contract used in the sample
  • One Business Service named “OrderBusiness” created from the OrderService WSDL (I recommend you to generate a Mock Service in SoapUI tool to point this Business Service to and run your tests)
  • One Proxy Service named “OrderProxy” created from the OrderService WSDL, with a basic Message Flow that only routes the original request to the OrderBusiness in the execute operation.

For now, the focus will be the Proxy Service message flow.

To start our robust validation, let’s add the main pipeline pair and put a simple “Validate” action into the request branch, because the minimum we expect is that the request message complies with the XML Schema; if it doesn’t, the flow has no reason to continue. In the wizard, select the OrderRequest input element as the XPath, in variable body, against its XML Schema definition present in the OrderService WSDL.

- XPath: ./ord:OrderRequest
- In Variable: body
- Against Resource: OrderService-v1.wsdl/OrderRequest
- Raise Error

Now comes the great part: let’s prepare our advanced XQuery validation. Create an XSD named “validation.xsd” in the xsd folder. This schema will hold the validation data throughout the flow. Use the following structure:

<xsd:complexType name="Validation">
<xsd:sequence>
<xsd:element name="Payload">
<xsd:complexType mixed="true">
<xsd:sequence>
<xsd:any/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="ValidationErrorList" type="tns:ValidationErrorList"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ValidationErrorList">
<xsd:sequence>
<xsd:element name="ValidationError" type="tns:ValidationError" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ValidationError">
<xsd:sequence>
<xsd:element name="code" type="xsd:int"/>
<xsd:element name="message" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>

The next step is to create the XQuery transformation file that performs the validation and fills the previous structure. The input of the transformation is the OrderRequest schema and the output is the Validation schema.

- Source Types: OrderService-v1.wsdl/OrderRequest
- Target Types: validation.xsd/Validation

Now I suggest you forget the visual editor. Drag-and-drop fields will not help you in more complex transformations (it’s not .NET, haha) and will make a mess of the source code, so it is better to read up on some XQuery concepts and then write the code by hand. In this sample we’ll validate the following:
- The mandatory fields for the operation: FirstName, DocumentNumber, Email, Age, ProductID and Quantity
- DocumentNumber: must contain only digits
- Email: must match the regex (^[a-z0-9]+([\._-][a-z0-9]+)*@[a-z0-9_-]+(\.[a-z0-9]+){0,4}\.[a-z0-9]{1,4}$)
- Password and PasswordConfirmation: if filled, must have the same value
- ProductRestriction: if filled and the value is “OVER18”, the customer must be at least 18 years old
- Echo the full request message in the Payload tag, to use it later in the flow if necessary

(:: pragma bea:global-element-parameter parameter="$orderRequest" element="ns1:OrderRequest" location="../wsdl/OrderService-v1.wsdl" ::)
(:: pragma bea:schema-type-return type="ns0:Validation" location="../xsd/validation.xsd" ::)

declare namespace xf = "http://tempuri.org/AdvancedValidation/xq/RequestValidation/";
declare namespace ns1 = "http://gibaholms.wordpress.com/samples/wsdl/Order-v1.0";
declare namespace ns0 = "http://gibaholms.wordpress.com/samples/xsd/2011/11/validation";

declare function xf:RequestValidation($orderRequest as element(ns1:OrderRequest)) as element() {
<ns0:Validation>
<ns0:Payload>{$orderRequest/.}</ns0:Payload>
<ns0:ValidationErrorList>{
(: BEGIN - Required Field Validations :) 
if (empty($orderRequest/ns1:Customer/ns1:FirstName/text())) then
<ns0:ValidationError>
<ns0:code>1</ns0:code>
<ns0:message>FirstName: Required Field</ns0:message>
</ns0:ValidationError>
else if (empty($orderRequest/ns1:Customer/ns1:DocumentNumber/text())) then
<ns0:ValidationError>
<ns0:code>2</ns0:code>
<ns0:message>DocumentNumber: Required Field</ns0:message>
</ns0:ValidationError>
else if (empty($orderRequest/ns1:Customer/ns1:Email/text())) then
<ns0:ValidationError>
<ns0:code>3</ns0:code>
<ns0:message>Email: Required Field</ns0:message>
</ns0:ValidationError>
else if (empty($orderRequest/ns1:Customer/ns1:Age/text())) then
<ns0:ValidationError>
<ns0:code>4</ns0:code>
<ns0:message>Age: Required Field</ns0:message>
</ns0:ValidationError>
else if (empty($orderRequest/ns1:Product/ns1:ProductID/text())) then
<ns0:ValidationError>
<ns0:code>5</ns0:code>
<ns0:message>ProductID: Required Field</ns0:message>
</ns0:ValidationError>
else if (empty($orderRequest/ns1:Product/ns1:Quantity/text())) then
<ns0:ValidationError>
<ns0:code>6</ns0:code>
<ns0:message>Quantity: Required Field</ns0:message>
</ns0:ValidationError>
(: END - Required Field Validations :) 

else

(: BEGIN - Field Specific Validations :) 
if (not(matches($orderRequest/ns1:Customer/ns1:DocumentNumber/text(), '^[0-9]+$'))) then
<ns0:ValidationError>
<ns0:code>7</ns0:code>
<ns0:message>DocumentNumber: Must Contain Only Numbers</ns0:message>
</ns0:ValidationError>
else if (not(matches($orderRequest/ns1:Customer/ns1:Email/text(), '^[a-z0-9]+([\._-][a-z0-9]+)*@[a-z0-9_-]+(\.[a-z0-9]+){0,4}\.[a-z0-9]{1,4}$'))) then
<ns0:ValidationError>
<ns0:code>8</ns0:code>
<ns0:message>Email: Not a Valid Email Format</ns0:message>
</ns0:ValidationError>
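(: not(a = b) also flags a missing PasswordConfirmation; a != b is false when one side is empty :)
else if (not(empty($orderRequest/ns1:Customer/ns1:Password/text()))
and not($orderRequest/ns1:Customer/ns1:Password/text() = $orderRequest/ns1:Customer/ns1:PasswordConfirmation/text())) then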
<ns0:ValidationError>
<ns0:code>9</ns0:code>
<ns0:message>Password: Must Match PasswordConfirmation</ns0:message>
</ns0:ValidationError>
else if (not(empty($orderRequest/ns1:Product/ns1:ProductRestriction/text()))
and $orderRequest/ns1:Product/ns1:ProductRestriction/text() = 'OVER18'
and xs:int($orderRequest/ns1:Customer/ns1:Age/text()) < 18) then
<ns0:ValidationError>
<ns0:code>10</ns0:code>
<ns0:message>Customer Must Be OVER 18 Years</ns0:message>
</ns0:ValidationError>
(: END - Field Specific Validations :) 

else ()
}</ns0:ValidationErrorList>
</ns0:Validation>
};

declare variable $orderRequest as element(ns1:OrderRequest) external;

xf:RequestValidation($orderRequest)
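
To try the field-specific rules outside OSB (for example in a unit test), the same checks can be sketched in plain Java. This is only an illustration under assumptions: the class OrderRules and its method are hypothetical names that are not part of the OSB project, and only the two regular expressions are taken from the XQuery above.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring rules 7 to 10 of the XQuery; not part of the OSB project.
final class OrderRules {
    // Same patterns as in the XQuery validation.
    private static final Pattern DIGITS = Pattern.compile("^[0-9]+$");
    private static final Pattern EMAIL = Pattern.compile(
            "^[a-z0-9]+([\\._-][a-z0-9]+)*@[a-z0-9_-]+(\\.[a-z0-9]+){0,4}\\.[a-z0-9]{1,4}$");

    private OrderRules() { }

    /** Returns the code of the first violated rule (7..10), or 0 when every rule passes. */
    static int firstError(String documentNumber, String email, String password,
                          String passwordConfirmation, String productRestriction, int age) {
        if (!DIGITS.matcher(documentNumber).matches()) {
            return 7;
        }
        if (!EMAIL.matcher(email).matches()) {
            return 8;
        }
        // Password is optional; when present it must equal the confirmation.
        if (password != null && !password.isEmpty() && !password.equals(passwordConfirmation)) {
            return 9;
        }
        if ("OVER18".equals(productRestriction) && age < 18) {
            return 10;
        }
        return 0;
    }
}
```

Like the XQuery, the sketch stops at the first violated rule, which is why the error handler later reads only the first ValidationError.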

Now add an “Assign” action that evaluates the validation XQuery and stores the result in a newly created validation variable. We’ll use this variable in the error handler to get access to the validation codes and messages.

- Expression: $body/ord:OrderRequest
- Variable: validation

Now we add an “IF-ELSE” control to check whether there are any validation errors. Don’t forget to bind the validation XSD target namespace to the “val” prefix.

- Add Namespace Definition: val – http://gibaholms.wordpress.com/samples/xsd/2011/11/validation
- Condition: count($validation/val:ValidationErrorList/val:ValidationError) > 0

Then we raise a custom error with a code of our choice, “BUS-1”. Remember this code: it will be used to catch the error in the handler.

- Code: BUS-1
- Message: Request Validation Error

Now the pipeline is finished. We concentrate the fault handling in a single error handler attached to the entire flow, adding an “IF-ELSE” action that acts as a switch: it checks which code is present in the “fault” variable and manually throws the typed faults declared in our service contract (OrderBusinessFault and OrderTechnicalFault). Note: “BEA-382505” is the internal OSB code used for “Validate” action errors.

- Condition: $fault/ctx:errorCode = 'BEA-382505'

- Condition: $fault/ctx:errorCode = 'BUS-1'

Now just add an Assign action in each “if” branch to build the typed SOAP fault. Remember that the built-in $fault variable refers to the OSB infrastructure fault, not to the SOAP fault, so assigning a SOAP fault directly to this variable does not work. To throw a SOAP fault you must assign the entire SOAP body. Don’t forget to bind the “ord” prefix to the namespace “http://gibaholms.wordpress.com/samples/wsdl/Order-v1.0”. First let’s add the technical faults, which apply to a schema validation failure (BEA-382505) and to the else case (any infrastructure error that we are not handling):

- Add Namespace Definition: ord – http://gibaholms.wordpress.com/samples/wsdl/Order-v1.0
- Variable: body
- Expression:

<soap-env:Body>
<soap-env:Fault>
<faultcode>soap-env:Server</faultcode>
<faultstring/>
<detail>
<ord:OrderTechnicalFault>
<ord:message>Schema Validation Failure</ord:message>
</ord:OrderTechnicalFault>
</detail>
</soap-env:Fault>
</soap-env:Body>

- Add Namespace Definition: ord – http://gibaholms.wordpress.com/samples/wsdl/Order-v1.0
- Variable: body
- Expression:

<soap-env:Body>
<soap-env:Fault>
<faultcode>soap-env:Server</faultcode>
<faultstring/>
<detail>
<ord:OrderTechnicalFault>
<ord:message>SOA Infrastructure Error</ord:message>
</ord:OrderTechnicalFault>
</detail>
</soap-env:Fault>
</soap-env:Body>

To handle our “BUS-1” fault, we can use the message produced by the validation XQuery (notice that any information of interest can be propagated through the $validation variable). Don’t forget to bind the “val” namespace prefix to “http://gibaholms.wordpress.com/samples/xsd/2011/11/validation”:

- Add Namespace Definition: ord – http://gibaholms.wordpress.com/samples/wsdl/Order-v1.0
- Variable: body
- Expression:

<soap-env:Body>
<soap-env:Fault>
<faultcode>soap-env:Server</faultcode>
<faultstring/>
<detail>
<ord:OrderBusinessFault>
<ord:message>{$validation/val:ValidationErrorList/val:ValidationError[1]/val:message/text()}</ord:message>
</ord:OrderBusinessFault>
</detail>
</soap-env:Fault>
</soap-env:Body>

To finish our sample, just add a “Reply” action to each “if” branch, indicating that the invocation returned “With Failure” (which results in an HTTP 500 return code):

- Reply: With Failure
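
The branching of the error handler can be summarized in a few lines of Java. This is a sketch for illustration only: FaultMapper and its nested types are hypothetical names, while the error codes and messages are the ones used in the flow above.

```java
// Hypothetical summary of the error handler's switch; the names are illustrative only.
final class FaultMapper {
    enum FaultType { BUSINESS, TECHNICAL }

    static final class Fault {
        final FaultType type;
        final String message;

        Fault(FaultType type, String message) {
            this.type = type;
            this.message = message;
        }
    }

    private FaultMapper() { }

    /** Maps the OSB error code (and the first validation message) to the typed fault to return. */
    static Fault map(String errorCode, String firstValidationMessage) {
        if ("BEA-382505".equals(errorCode)) {
            // Raised by the "Validate" action.
            return new Fault(FaultType.TECHNICAL, "Schema Validation Failure");
        } else if ("BUS-1".equals(errorCode)) {
            // Raised by our own "Raise Error" after the XQuery validation.
            return new Fault(FaultType.BUSINESS, firstValidationMessage);
        }
        // Anything else is an infrastructure error we do not handle specifically.
        return new Fault(FaultType.TECHNICAL, "SOA Infrastructure Error");
    }
}
```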

Well done! You can open the OSB console and test the Proxy Service. Input some data and observe the results. Don’t forget to start the Mock Service in SoapUI and point the Business Service endpoint to the mock address. An easy way to simulate a generic technical fault is to stop the mock service, so that the business service fails as if it were “unavailable”.

I hope this article has helped you do more complex validations, beyond the limits of XSD, in Oracle Service Bus 11g.

Attachments

Source Code: https://github.com/gibaholms/articles/tree/master/Advanced_Validation_in_OSB

Improve Hibernate Caching Performance

10/24/2011

In a Hibernate application, a common problem is a performance bottleneck. Especially in web applications, where it is difficult to predict the number of users that will access the application, when the system faces an increase in simultaneous accesses, the response time degrades abruptly.

The usual solution is the following:

  • Use a connection pooling framework to hold some open connections ready to use (e.g. c3p0, Commons DBCP)
  • Enable the second level cache using a well known cache provider (e.g. EhCache, OSCache, SwarmCache)
  • Use the query cache for the most used queries

However, although connection pooling saves the time of opening a new connection, fetching a connection from the pool can be expensive too, especially with pooling frameworks that do not handle concurrency and blocking very well (I have never benchmarked connection pooling frameworks myself, but some people on the web say that Commons DBCP takes more locks under concurrency than c3p0, which handles multithreaded access better).

Few people know (or even care) that Hibernate by default gets a connection from the pool every time a session is created, regardless of whether the cache is hit. In other words, if your query spends 1ms getting the connection from the pool and 2ms executing the statement, caching saves only the last 2ms, because Hibernate always gets a connection. Combined with poor concurrency handling in the pooling framework, this can become a very bad bottleneck.

LazyConnectionDataSourceProxy

To solve this problem, the Spring framework provides a class named org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy (http://static.springsource.org/spring/docs/2.5.x/api/org/springframework/jdbc/datasource/LazyConnectionDataSourceProxy.html). This class acts as a proxy to the real pooled data source and fetches connections lazily: the proxy fetches a connection only when the first statement is created. In other words, if your Hibernate query hits the cache, no connection is fetched from the data source at all.

Using this class is easy and declarative through the Spring context file, and no code changes are required, thanks to dependency injection:


<!-- The real DataSource, e.g. a pooled datasource registered at server JNDI -->
<bean id="dataSource" class="org.springframework.jndi.JndiObjectFactoryBean">
<property name="jndiName" value="jdbc/MYDATABASE"/>
</bean>

<!-- Wrapping the real datasource into the Spring lazy datasource feature -->
<bean name="lazyConnectionDataSourceProxy" class="org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy">
<property name="targetDataSource" ref="dataSource" />
</bean>

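<!-- Refer to the lazy datasource bean when injecting data sources (e.g. session factory, transaction manager) -->
<!-- Assumption: the class attribute was missing in the original listing; LocalSessionFactoryBean (Hibernate 3) is shown as one possibility -->
<bean id="hibernateSessionFactory" class="org.springframework.orm.hibernate3.LocalSessionFactoryBean">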
<property name="dataSource" ref="lazyConnectionDataSourceProxy" />
<!-- other configuration data … -->
</bean>
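
To make the idea of lazy fetching concrete, here is a minimal sketch using only the JDK. It is not Spring's implementation, which handles many more details (for example connection settings made before the first statement); the class LazyConnections and the Callable standing in for the pool are assumptions made for this illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.concurrent.Callable;

// Simplified illustration of lazy connection fetching; NOT Spring's implementation.
final class LazyConnections {
    private LazyConnections() { }

    /** Returns a Connection handle that asks the pool for a real connection only on first real use. */
    static Connection lazy(final Callable<Connection> pool) {
        InvocationHandler handler = new InvocationHandler() {
            private Connection target; // stays null until the connection is really needed

            public synchronized Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    // Calls that need no database round trip are answered locally.
                    if (method.getName().equals("close")) {
                        return null;
                    }
                    if (method.getName().equals("isClosed")) {
                        return Boolean.FALSE;
                    }
                    target = pool.call(); // first real use: only now do we hit the pool
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the original SQLException
                }
            }
        };
        return (Connection) Proxy.newProxyInstance(
                LazyConnections.class.getClassLoader(), new Class<?>[] { Connection.class }, handler);
    }
}
```

A session that is opened, served entirely from the cache, and then closed calls only close() on this handle, so the pool is never touched.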

Meet FFPOJO – Flat File Pojo Parser

08/08/2010

Hi everyone… my closest coworkers already know the FFPOJO project, but I still owed it a dedicated post here on my blog.

The idea started when I noticed that about 60% of the projects I was working on dealt with text files at some point, one way or another: importing text files into the database, exporting files for integration with third parties, parameterization files… and, going beyond text files, “positional” or “delimited” information-exchange layouts are also very common in socket communication.

At first, the developers took the conventional approach: reading through data streams and parsing with substrings (positional) and splits (delimited). As expected, though, the code turned out horrible… fully procedural and very hard to maintain and evolve. That was when I felt the need for a more object-oriented approach, and I went to Google looking for frameworks to parse text files.

My search was very disappointing: I found few frameworks, and they were very poor… in some of them the code needed to use them was so large that the result would be as bad as or worse than the traditional approach, and they abstracted very little of their main domain.

At that point I philosophized a bit about ORM (Object Relational Mapping), which is basically a technique to abstract the database behind an object-oriented approach… I realized that this was exactly what I needed, and the expression OFM (Object Flat Mapping) came to mind, which seemed perfect to me: working with text files (flat files) in an object-oriented way.

That was when I decided to create FFPOJO, an open source framework for handling text files based on positional or delimited layouts, in which I implemented some interesting features:

  • “OFM” (Object Flat Mapping) configuration via XML, annotations, or both (where the XML overrides the annotations)
  • Metadata caching, which allows faster parsing
  • Flexibility to work at a low level (text to POJO and POJO to text)
  • Use of the Decorator concept for custom field conversion (yes, I stole the idea from DisplayTag)
  • Flat File Reader concept, allowing the definition of headers and trailers
  • For files on disk, reads using NIO (Java New IO), where I measured a 25% gain in disk access when reading data
  • Allows working in push mode through the Record Processor concept, which supports single-threaded and multi-threaded processing
  • Flat File Writer concept for writing files
  • Lightweight, with no dependencies on other frameworks
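
To give a feel for what “Object Flat Mapping” means, here is a minimal sketch of annotation-driven positional parsing in plain Java. It is not the FFPOJO API: the annotation Positional and the classes CustomerRecord and FlatMapper are hypothetical names invented for this illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Illustrative only: these names are hypothetical and are NOT the FFPOJO API.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Positional {
    int start(); // inclusive, 0-based

    int end();   // exclusive
}

// The record layout is declared on the POJO instead of being scattered through substring calls.
class CustomerRecord {
    @Positional(start = 0, end = 10) String name;
    @Positional(start = 10, end = 13) String age;
}

final class FlatMapper {
    private FlatMapper() { }

    /** Creates an instance of type and fills its annotated fields from one fixed-width line. */
    static <T> T parse(String line, Class<T> type) throws ReflectiveOperationException {
        T pojo = type.getDeclaredConstructor().newInstance();
        for (Field field : type.getDeclaredFields()) {
            Positional pos = field.getAnnotation(Positional.class);
            if (pos != null) {
                field.setAccessible(true);
                field.set(pojo, line.substring(pos.start(), pos.end()).trim());
            }
        }
        return pojo;
    }
}
```

This sketch reads the annotations on every call; caching that metadata once per class is the kind of optimization the feature list above refers to.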

FFPOJO was built using domain-driven design (DDD) and with heavy use of unit tests.

Colleagues, I would be happy if you tried it whenever you need to work with text files; you can count on me if you have any implementation questions… I would also be glad if you contributed and reported possible bugs, and suggestions for improvement are very welcome too.

For technical instructions I wrote a quick manual using SourceForge's Trac; the links follow below:

Cheers to all!

SCDJWS 5.0 Exam

05/17/2010

SCDJWS 5.0 Exam – Sun Certified Developer for Java Web Services

Those who follow my Twitter already know… after a lot of effort, I managed to pass Sun's SCDJWS 5.0 exam through sheer grit. Those who had the chance to take the exam while it was still in Beta could benefit from a lower passing percentage, something around 42%. However, those who left it for later (my case) now have to face a passing score of 68%. Fortunately I got a score of 89% and made it through one more.

Anyway, the goal of this post is to comment a bit on the exam and how I prepared for it… I hope to provide some useful information for those who want to take on this challenge.

For those who don't know the exam, the high-level objectives and the exam site follow below:

Main Characteristics of the Exam

  • Few questions requiring knowledge of API code. While studying I worried a lot that many questions would include source code for the required APIs (SAAJ, JAXR, SAX, DOM, StAX, XSLT, etc.). However, most of the exam questions were conceptual, asking about the differences, purpose and usage scenarios of these APIs.
  • A lot of XML Schema, with correct and incorrect constructions of elements and complex types. Mappings from xsd types to Java (and also to C# in the interoperability section) are heavily covered too.
  • As I said above, there are indeed questions about Microsoft WCF, but always quite basic ones, such as the use of svcutil, characteristics of the xsd to C# mapping, and conceptual interoperability questions (WSIT).
  • WSDL is heavily covered, along with identifying the differences between document-style and rpc-style, mapping portTypes to Java, and the WS-I Basic Profile.
  • Little specific knowledge of JAX-WS annotations. The JAX-WS questions don't require memorizing annotations; they are more conceptual.
  • On security, a bit of SSL, WS-Security and SAML (single sign-on) was covered conceptually. There are also scenarios where we must identify which security approach is most appropriate.
  • As for Web Service / Integration Patterns, scenarios are shown as well.
  • UDDI: damned topic! Memorize all the operations provided by the Publishing and Inquiry APIs, their JAXR counterparts, and the entities handled in both the UDDI spec and JAXR.

Main “Gaffes” of the Exam

  • The objectives ask you to study SOAP 1.2, but the SOAP Fault part ends up asking about the soap:Fault of SOAP 1.1. The soap:Header part does follow 1.2.
  • On the printed report that comes out at the end of the exam with the result, mine showed “Passing Score 42%, Your Score: 89%”. What a gaffe! 42% was the passing score of the beta exam; the official score published on the site and in all the information during the exam is 68%. Now I will never know whether this error happened only when printing the report or whether it also affects which score really counts when computing pass/fail. However, there is a comment on the JavaRanch forum from a candidate who failed with 60%, so I believe that what counts is really the 68%, no easy ride… either way, a nice gaffe by the exam organizers, the kind of thing that could even lead to a lawsuit.

Study Material

That's it, folks… sorry for not writing more, but it's already 1 a.m. on Sunday and I have to work tomorrow.

Anyway, I hope this helped… GOOD LUCK!!!

Outside the Box with Java NIO and Binary Search

03/25/2010

First of all, welcome everyone!!

This is my first post… the first of many. Thanks to my coworkers @rafanoronha and @alnascimento for constantly encouraging me to create my blog.

In this first topic I will show a real software case from a project of our team at the Software Delivery Center at Stefanini, where I had the pleasure of giving some technical input. We had a feature that needed to locate a given list of words within a dictionary of approximately 70 thousand words (we could not use Lucene, and the system was on an intranet).

First solution imagined… inside the box… let's load these words into our mighty Oracle, create a mega index on the word, and the search will be super fast. And it really was fast… well… for up to about 10 words. When we tested a reasonable text of 200 words, the search time went to the order of minutes, which was unfeasible in production. Then you know how it goes… add more indexes, split into separate tables, fill it with ifs, parse via procedure, several attempts and nothing.

That was when I realized I needed a solution outside the box… lying in bed, I thought… Oracle? What for?! Who needs Oracle? Does Google index its base of sites in Oracle?! I don't think so! It soon came to mind: let's index this in a text file! Local disk IO can be much faster than a round trip to the database: there is no connection control, no sessions, no network layers in between… it's all right there.

Solution: Java NIO + Binary Search… let's see why.

Java NIO

Many people don't know this feature exists, let alone its power. Most developers use the old IO, based on streams (xxxInputStream, xxxOutputStream)… the problem with streams is that they are byte-oriented. A plain stream performs IO byte by byte, and streams are unidirectional (either input or output).

Starting with JDK 1.4, the Java New IO (NIO) API was introduced, bringing the power of low-level IO to Java. Some authors joke that it should be called LLIO (Low-Level IO) instead of NIO. The main difference of this new API is that it works with blocks of bytes (buffers) and channels. Doing IO with blocks of bytes is much faster than doing it byte by byte… channels are bidirectional, which is much more natural because that is how the operating system works. Not to mention the ability to implement non-blocking IO and to use OS-level features such as mapped files, which are the basis of the solution we developed.
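
As a small illustration of buffers and channels, the sketch below reads a file block by block through a FileChannel and a reusable ByteBuffer. It assumes Java 7 or later (for java.nio.file and try-with-resources); the class name ChannelReadDemo is made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: block-oriented reading with a channel and a buffer.
final class ChannelReadDemo {
    private ChannelReadDemo() { }

    /** Reads the whole file in blocks of blockSize bytes and returns its content. */
    static byte[] readAll(Path file, int blockSize) throws IOException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(blockSize);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {   // the channel fills the buffer, up to blockSize bytes
                buffer.flip();                      // switch the buffer from filling to draining
                out.write(buffer.array(), 0, buffer.limit());
                buffer.clear();                     // make it ready for the next block
            }
        }
        return out.toByteArray();
    }
}
```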

Problem 1: how to generate the text file?

Well… this one is easy… we write a batch job that runs once a day, in the early morning. All it has to do is a “SELECT *” on the words table and write the result to a text file, with a fixed number of bytes per record (word), so we can find records easily in the file. And the most important part… the “ORDER BY”! One of the tricks of the solution is to generate the word file sorted alphabetically, so we can apply the famous Binary Search algorithm.
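
The file generation step could look like the sketch below. It makes several assumptions that are not in the original batch job: the list of words stands in for the result of the SELECT, the sorting is done in Java with String.compareTo (so the file order matches the comparison used later by the search) instead of by the database ORDER BY, records are padded with spaces, and the charset is ASCII compatible. The class name FixedWidthWriter is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the batch step: sorted, fixed-width records, one word per record.
final class FixedWidthWriter {
    private FixedWidthWriter() { }

    /** Writes the words sorted by String.compareTo, each padded with spaces to recordSize bytes. */
    static void write(Path target, List<String> words, Charset charset, int recordSize) throws IOException {
        List<String> sorted = new ArrayList<String>(words);
        Collections.sort(sorted); // same ordering a Java binary search with compareTo expects
        try (OutputStream out = Files.newOutputStream(target)) {
            for (String word : sorted) {
                byte[] bytes = word.getBytes(charset);
                if (bytes.length > recordSize) {
                    throw new IllegalArgumentException("Word longer than the record size: " + word);
                }
                byte[] record = new byte[recordSize];
                Arrays.fill(record, (byte) ' ');           // pad with spaces (ASCII-compatible charset assumed)
                System.arraycopy(bytes, 0, record, 0, bytes.length);
                out.write(record);
            }
        }
    }
}
```

If the sorting is left to the database, its collation has to produce the same order as the comparison used in the Java search, otherwise the binary search can miss words that are present.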

Problem 2: how to guarantee fast IO access to the file?

Here we use the power of NIO, through mapped files. This is a very powerful feature, and this post may not be enough to understand it in depth… it requires some knowledge of operating systems. In short, it works like this… when we do ordinary IO to read a file, the file's bytes are brought into an intermediate buffer in memory, and the application reads from this buffer. In other words, even when jumping around the file with a Random Access File, every read copies data into an intermediate buffer before the application sees it.

The mapped files feature uses the file's own mapping in the file system to identify the file's bytes, so no intermediate buffers are created. A mapping of the file is created in virtual memory and kept in sync with the file's mapping in the file system:

[Figure: Mapped Buffer]

The interesting consequences:

  • Any change to the buffer object is reflected directly in the file on disk when it is mapped in read/write mode (you are working on the file system, after all!)
  • Even mapping a file of several gigabytes consumes very little memory, since the file system's usual caching and paging are used. The actual reading of the file happens on demand.
  • Its paging is synchronized, that is, no intermediate buffers are created.
  • If the file is modified (for example truncated) after being mapped, reads may fail with exceptions. So either we lock it in code, or we guarantee that the file will not be modified during reading. If it is modified, we have to map it again before reading.
  • Mapping the file takes almost nothing from the JVM heap. Everything is done at the OS and virtual memory level.
  • The mapping is released when the buffer object is collected by the Garbage Collector (and not when we close the channel).
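
A minimal sketch of a mapped read, assuming a file of fixed-size records that is smaller than 2 GB so that a single mapping is enough. The class name MappedRecordReader is hypothetical; the calls used (RandomAccessFile.getChannel, FileChannel.map, MappedByteBuffer.get) are standard JDK APIs.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

// Hypothetical demo: jump straight to one fixed-size record of a memory-mapped file.
final class MappedRecordReader {
    private MappedRecordReader() { }

    /** Returns the record at the given 0-based index, without the padding spaces. */
    static String readRecord(String path, int recordSize, int index, Charset charset) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(path, "r");
        try {
            FileChannel channel = raf.getChannel();
            // Assumption: the file fits in one mapping (size <= Integer.MAX_VALUE).
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] record = new byte[recordSize];
            map.position(index * recordSize); // no scanning: position directly on the record
            map.get(record);
            return new String(record, charset).trim();
        } finally {
            raf.close(); // the mapping itself lives until the buffer is garbage collected
        }
    }
}
```

Mapping the file on every call, as done here for brevity, throws away the benefit; a real reader would map once and reuse the buffer.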

  

Problem 3: how to locate the word quickly?

In a conventional search, we would have to read word by word and compare each one with the word we are looking for, until we find it (or not). Absurd! We sorted precisely to use a faster way… Binary Search. With this algorithm we discard the words half at a time, getting a match with very few comparisons. NIO guarantees fast access to any byte position within the file (since we know how many bytes each word occupies). We can find one word among 70 thousand with only about 17 comparisons.

Final Product: GibaBizarreSearch.class
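
The figure of about 17 follows from the halving: each probe discards half of the remaining records, and 2^16 = 65,536 < 70,000 < 2^17 = 131,072, so at most 17 probes are needed. A small sketch of this arithmetic (the class name BinarySearchCost is made up for the example):

```java
// Hypothetical helper: worst-case number of probes of a binary search.
final class BinarySearchCost {
    private BinarySearchCost() { }

    /** Worst-case probes over n sorted records: floor(log2(n)) + 1, or 0 for an empty file. */
    static int maxProbes(int n) {
        if (n <= 0) {
            return 0;
        }
        // For n > 0, the number of significant bits of n equals floor(log2(n)) + 1.
        return 32 - Integer.numberOfLeadingZeros(n);
    }
}
```

For example, BinarySearchCost.maxProbes(70000) returns 17.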

  

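import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UnsupportedEncodingException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

public class GibaBizarreSearch {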
 private final int tamanhoRegistroInBytes;
 private final int tamanhoPagina;
 private final File file;
 private final String charset; 

 private List<MappedByteBuffer> buffers; 

 public GibaBizarreSearch(File file, String charset, int tamanhoRegistroInBytes) throws IOException {
  if (tamanhoRegistroInBytes <= 0) {
   throw new IllegalArgumentException("The record size in bytes must be greater than zero.");
  } else if (file == null) {
   throw new IllegalArgumentException("The file object must not be null.");
  }
  this.tamanhoRegistroInBytes = tamanhoRegistroInBytes;
  this.tamanhoPagina = (Integer.MAX_VALUE / tamanhoRegistroInBytes) * tamanhoRegistroInBytes;
  this.file = file;
  this.charset = charset;
 }
 
 public void mapear() throws IOException {
  if (!file.exists()) {
   throw new IllegalStateException("The file to read does not exist.");
  } else if (!file.isFile()) {
   throw new IllegalStateException("The path to read is not a regular file.");
  } else if (!file.canRead()) {
   throw new IllegalStateException("The file to read cannot be read, check the OS permissions.");
  }
  
  this.buffers = new ArrayList<MappedByteBuffer>();
  FileChannel channel = (new RandomAccessFile(file, "r")).getChannel();
  try {
   long channelSize = channel.size();
   long inicio = 0;
   int tamanho = 0;
   // Map the file page by page; only the last page may be shorter than tamanhoPagina
   for (long i = 0; inicio + tamanho < channelSize; i++) {
    if ((channelSize / tamanhoPagina) == i) {
     tamanho = (int)(channelSize - i * tamanhoPagina);
    } else {
     tamanho = tamanhoPagina;
    }
    inicio = i * tamanhoPagina;
    MappedByteBuffer pagina = channel.map(FileChannel.MapMode.READ_ONLY, inicio, tamanho);
    buffers.add(pagina);
   }
  } finally {
   // Closing the channel does not invalidate the mappings already created
   channel.close();
  }
 }
 
 public void desalocar() {
  if (buffers != null && !buffers.isEmpty()) {
   buffers.clear();
   buffers = null;
  }
 }
 
 public long buscar(String texto) throws UnsupportedEncodingException {
  if (buffers == null) {
   throw new IllegalStateException("The file has not been mapped.");
  }
  
  long posicaoAchou = 0;
  for (int i = 0; i < buffers.size(); i++) {  
   MappedByteBuffer buf = buffers.get(i); 

   int qtdRegistros = buf.limit() / tamanhoRegistroInBytes;
   
   byte[] registro = new byte[tamanhoRegistroInBytes];
   int inf = 0;
   int sup = qtdRegistros - 1;
   int meio = -1; 

   posicaoAchou = 0;
   while(inf <= sup && posicaoAchou == 0) {
    meio = inf + (sup - inf) / 2; // same midpoint, but cannot overflow int
    buf.position(meio * tamanhoRegistroInBytes);
    buf.get(registro, 0, tamanhoRegistroInBytes);
    String valor = new String(registro, charset).trim();
    if (texto.compareTo(valor) > 0) {
     inf = meio + 1;
    } else if (texto.compareTo(valor) < 0) {
     sup = meio - 1;
    } else {
     posicaoAchou = meio + 1;
    }
   } 

   if (posicaoAchou > 0 || meio != qtdRegistros - 1) {
    if (posicaoAchou > 0) {
     posicaoAchou += (long) i * (tamanhoPagina / tamanhoRegistroInBytes); // long arithmetic avoids int overflow
    }
    break;
   }
  } 

  return posicaoAchou;
 } 

}  

 

Explanation:

  • Constructor: initializes the fields and defines the maximum page size that is divisible by the record size. The maximum size we can map in one buffer is the int range (Integer.MAX_VALUE). So if our file's size in bytes exceeds this value (say, a few gigabytes), we need to split it into more buffers. But we cannot cut a record in the middle and leave one part in each page, hence this division.
  • mapear(): works out how many pages (buffers) are needed to map the file. The call channel.size() returns the total file size in bytes. This method just decides how many pages are needed to map the whole file, according to the maximum page size defined in the constructor. It then performs the mappings and stores each page as a buffer in the list. Only very large files will need more than one buffer.
  • desalocar(): removes the mapping. We saw that closing the channel has no effect on the mapping. If we want to undo the mapping, we have to make the buffers eligible for the Garbage Collector.
  • buscar(String): performs the binary search and returns the position at which the record (in this case, the word) was found, or zero if it was not found. Note the check at the end that decides whether the next buffer needs to be opened: since the file is sorted, if the word was not found in a buffer and is smaller than the words in the next buffer, there is no need to check them, because it certainly does not exist. Finally there is a small adjustment to translate the record position according to the current buffer number (e.g. if it is the record at position 5 of the second buffer and each buffer has 10 records, then it is actually the record at position 15).
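
The two pieces of arithmetic described above, the page size and the position adjustment, can be checked in isolation. The class PagingMath below is a hypothetical helper written only for this illustration; the formulas are the ones used in the constructor and in buscar().

```java
// Hypothetical helper isolating the paging arithmetic of GibaBizarreSearch.
final class PagingMath {
    private PagingMath() { }

    /** Largest page size in bytes that fits in an int and holds a whole number of records. */
    static int pageSize(int recordSize) {
        return (Integer.MAX_VALUE / recordSize) * recordSize;
    }

    /** 1-based global position of a record found at a 1-based position inside the given 0-based page. */
    static long globalPosition(int pageIndex, int positionInPage, int recordsPerPage) {
        return (long) pageIndex * recordsPerPage + positionInPage;
    }
}
```

With 32-byte records, pageSize(32) is 2147483616 bytes (67,108,863 whole records), and the example from the text, position 5 in the second buffer with 10 records per buffer, is globalPosition(1, 5, 10) = 15.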

Here's some encouragement! Let's think outside the box!

