Semantic Cache System

One of the most economical ways to develop a very large scale database is to distribute it among multiple server nodes. The main problem in such systems is retrieving data within a reasonable time, especially when network or server load is high. The task becomes more critical when data must be retrieved from the database against frequent queries. A cache is used to increase the retrieval performance of mobile computing and distributed database systems. Whenever data is found locally in the cache, this is termed a cache hit. The percentage of user-posed queries that can be processed (partially or fully) locally from the cache is called the hit ratio. The cache system should therefore be designed in a way that increases the hit ratio. An improved hit ratio ensures efficient reuse of stored data, so less data needs to be retrieved from the remote location.


Introduction
One of the most economical ways to develop a very large scale database is to distribute it among multiple server nodes. The main problem in these types of systems is retrieving data within a reasonable time, especially when network or server load is high. This task becomes more critical when data is to be retrieved from the database against frequent queries. A cache is used to increase the retrieval performance of mobile computing and distributed database systems. Whenever data is found locally in the cache, this is termed a cache hit. The percentage of user-posed queries that can be processed (partially or fully) locally from the cache is called the hit ratio. So, the cache system should be designed in a way that increases the hit ratio. An improvement in hit ratio ensures efficient reuse of stored data, and due to this efficient reuse, less data is required to be retrieved from the remote location.
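The hit ratio just described can be computed directly from a log of query outcomes. A minimal sketch, where the log format and the treatment of partial hits as hits are illustrative assumptions:

```python
def hit_ratio(outcomes):
    """outcomes: list of 'hit' (full or partial answer from cache) or 'miss'."""
    if not outcomes:
        return 0.0
    hits = sum(1 for o in outcomes if o == "hit")
    return hits / len(outcomes)

# Hypothetical log: 3 of 5 queries were answered (at least partially) from cache.
log = ["hit", "miss", "hit", "hit", "miss"]
print(hit_ratio(log))  # 0.6
```

A cache design that turns more misses into partial hits raises this number, which is exactly the motivation for semantic caching below.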
Typically, a cache is organized in one of three ways: page, tuple, or semantic. The unit of transfer in a page cache is a page (multiple tuples), and in a tuple cache it is a tuple. In a page cache, irrelevant tuples may be retrieved for a user query, and retrieval of irrelevant data wastes valuable resources. A tuple cache overcomes this problem by stopping the retrieval of irrelevant tuples. However, a major problem (retrieving a portion of a tuple instead of the complete tuple) still exists that neither of these caching models (page and tuple) can handle. These schemes are not able to identify whether the answer is contained in the cache when the query is not fully matched (partially matched). In page and tuple caching, all of the data is retrieved from the remote site even when partial data is present in the cache. In simple words, a portion of a page or a portion of a tuple cannot be reused under page or tuple caching. As a result, the hit ratio is not as high as it could be.
To answer queries partially from the local site, the concept of a semantic cache is introduced. A semantic cache has the ability to increase the hit ratio to the possible extent and provides better performance than page and tuple caches; such a system is referred to as semantic caching. Semantic caching provides a significant workload reduction in distributed systems, especially in mobile computing, as well as improved performance.
In semantic caching, the semantic descriptions of processed queries are stored along with the actual contents. The next posed query is matched against the stored semantic descriptions of the cached data and is divided into a probe query (the portion available in the cache) and a remainder query (the portion that is not available in the cache and has to be retrieved from the server). In this context, two major activities are involved in semantic caching: query processing and cache management. The efficiency of semantic caching therefore depends on these two activities. Query processing is the process that returns the result for a user-posed query. In a semantic cache, query processing is done by dividing the user query into probe and remainder queries on the basis of query matching. In fact, the efficiency of query processing depends on the efficiency of the division process (query trimming) of the user query into sub-queries (probe and remainder), as well as on the retrieval time for both the probe and remainder queries. The efficiency of query trimming in turn depends on the semantic indexing; semantic indexing at the cache is a major activity of cache management. In this context, an efficient semantic caching system demands efficient query processing and an efficient indexing scheme. In this chapter we discuss the state-of-the-art query processing techniques and semantic indexing schemes. We also present a query processing scheme, sCacheQP, and its complete working, explained with the help of a case study.

Definitions
This section presents some definitions that are used in the rest of the chapter.
Definition 1: Given a user query Qu = π_A σ_P(R), where 'A' is the set of attributes required by the user, 'P' is the condition (WHERE clause) of the user query, and 'R' is a relation; the User Query's Semantics will be the 3-tuple <QS, QF, QW>.
Definition 2: Given a database D = {Ri} and its attribute set A = ∪A_Ri, 1 ≤ i ≤ n; the Semantic Enabled Schema will be the 6-tuple <D, R, A, SA, P, C>, where 'D' is the name of the database, 'R' is the name of the relation, 'A' is a set of attributes, 'SA' is the status of the attributes, 'P' is the predicate (condition) on which data has been retrieved and cached, and 'C' is the reference to the contents.
Definition 8: Given a user query QU and cached query QC with semantics <QS, QF, QW> and <D, R, A, SA, P, C> respectively; Predicate Matching is a process in which the user query's condition <QW> is matched with the condition indexed by the semantic enabled schema <P>.
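As an illustration of Definitions 1 and 2, the two structures can be sketched as plain records. The field names follow the definitions; the example values and the use of Python dataclasses are assumptions, not the authors' representation:

```python
from dataclasses import dataclass

@dataclass
class QuerySemantics:          # <QS, QF, QW> (Definition 1)
    QS: frozenset              # projected attributes (SELECT clause)
    QF: str                    # relation (FROM clause)
    QW: str                    # predicate (WHERE clause), "" if absent

@dataclass
class SemanticSchema:          # <D, R, A, SA, P, C> (Definition 2)
    D: str                     # database name
    R: str                     # relation name
    A: frozenset               # attribute set of the relation
    SA: dict                   # status of each attribute
    P: str                     # predicate under which contents were cached
    C: str                     # reference to the cached contents

q = QuerySemantics(frozenset({"eName", "Age"}), "employee", "Age > 30")
print(q.QF)  # employee
```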
Definition 9: Given a user query QU and cached query QC with semantics <QS, QF, QW, PA> and <D, R, A, SA, P, C> respectively; Query Trimming is a process in which the user query (QU) is divided into probe and remainder queries.

State of the art
This section presents brief related work on semantic caching in the context of semantic indexing and query (SELECT and PROJECT) processing for relational databases. For detailed work, the survey by Ahmad et al. (Ahmad et al., 2008) can be consulted, and aggregate queries are discussed by Cai et al. (Cai et al., 2005). Semantic caching has been extensively studied for both relational and XML databases. Query processing and cache management are the two main areas of a semantic cache system. In this section we describe the state-of-the-art query processing and semantic indexing schemes.
Results of already executed queries are cached to generate more efficient query plans for centralized systems (Roussopoulos, 1991). Some strategies are defined for the cache to prefetch data by using semantics (Kang et al., 2006). A query refinement technique is introduced to enhance response time in multimedia databases (Chakrabarti et al., 2000). A predicate-based caching scheme is presented by Keller et al. for client-server applications (Keller and Basu, 1996). A scheme named Intelligent Cache Management (Chen et al., 1994) and its extensions (Ahmed et al., 2005; Altinel et al., 2003; Bashir and Qadir, 2007) are introduced to reduce the overhead of page and tuple caches. To answer queries partially from the local site, the concept of a semantic cache, based on implication (Sun et al., 1989) and on description logic (Ali et al., 2010; Ali and Qadir, 2010), is introduced to increase the hit ratio to the possible extent (Bashir and Qadir, 2007; Ahmad et al., 2008a). The idea of an amending query is introduced to increase the hit ratio (Ren et al., 2003), graph-based query trimming to enhance efficiency (Abbas et al., 2010), and 112 rules (Bashir and Qadir, 2007b) are defined to reduce query processing time through efficient query matching. These 112 rules are only applicable to simple queries (excluding the disjunct and conjunct operators). Jonsson and colleagues present a query matching scheme that uses the predicate of the query (Jonsson et al., 2006); this scheme is not able to handle the SELECT clause of SQL queries. A query matching algorithm that reduces query processing time in the domain of relational databases is also studied in our previous work (Ahmad et al., 2008a; Ahmad et al., 2008; Ahmad et al., 2009). Work on semantic caching in other domains, such as the web (Lee et al., 1995; Luo et al., 2003) and XML (Chen et al., 2002; Sanaullah et al., 2008), is also found in the literature. The importance of semantic caching and the disadvantages of page and tuple caching are presented (Ren et al., 2003; Dar et al., 1996) by providing comparisons of semantic caches with page and tuple caches.
Different structures are used to index the semantic descriptions, such as a flat structure (Dar et al., 1996), a 3-level hierarchy (Sumalatha et al., 2007a, 2007b, 2007c), segments (Ren et al., 2003), and 4-HiSIS (Bashir and Qadir, 2007). When the semantics of queries are stored in a flat structure (Dar et al., 1996), the query matching process is very expensive (time consuming) (Godfrey et al., 1997; Ahmad et al., 2008; Ahmad et al., 2009). The cache is divided into segments (Ren et al., 2003) and chunks (Deshpande et al., 1998) to reduce this cost; runtime complexity and caching efficiency are improved by this division. A list of chunks is built on the basis of previous queries, and this list is then used to split newly posed user queries into two portions: one answered locally from the cache and the other computed remotely (Deshpande et al., 1998). The 4-level hierarchical semantic indexing scheme (4-HiSIS) is introduced to accelerate semantic matching (Bashir and Qadir, 2006). In 4-HiSIS, semantic matching is accomplished in four steps. First, the database name is matched. After successful matching of the database name, the relation name is matched in the second step. Third, attributes are matched upon a successful relation match. In the final step, predicate matching is performed on the basis of successful matching in the first three steps. 4-HiSIS has a limitation with respect to completeness, because no reference to the actual contents of the cache is stored in it. This limitation is overcome by the graph-based semantic indexing scheme (Ahmad et al., 2010), which stores the reference to the actual contents; its matching procedure is performed in five steps. At the state of the art, graph-based indexing is the most efficient semantic indexing scheme, but it also has a limitation: it has no ability to process "SELECT *" queries or incorrect queries in the cache system.
The state-of-the-art semantic cache systems have limitations in both areas (query processing and cache management, i.e., semantic indexing schemes). In this chapter we present a new scheme for semantic cache query processing, named sCacheQP. sCacheQP has the ability to overcome the limitations in the context of query processing, which is the main area of a semantic cache system.

Semantic Cache Query Processing (sCacheQP)
This section presents sCacheQP, a complete query processing procedure that overcomes the limitations of previous systems. The working and the main driver algorithm of sCacheQP are given in Figure 1 and Figure 2 respectively.

Query matching
In a semantic cache, the user-posed query is matched with the semantics stored in the cache. In this process, the decision is made as to whether data is available in the cache or not. The query matching process is accomplished by two sub-processes, the splitter and the rejecter. The splitter accepts the user query QU from the user interface and splits the query on the basis of its three clauses (SELECT, FROM, WHERE). These three portions are called QS (SELECT: projected attributes in the user query), QF (FROM: relation), and QW (WHERE: rows/tuples selected on a specific condition), and they are sent to the rejecter for initial-level checking. QW will be empty if there is no condition on the user-posed query (Ahmad et al., 2009). The algorithm for splitting the user query is presented by Ahmad et al. (Ahmad et al., 2009) and given in Figure 3. The responsibility of the rejecter is to check the validity of the user-posed query by matching the list of selected attributes (QS), relation (QF), and predicate attributes (PA) against the schema-based indexing semantics. The predicate attributes are extracted by the rejecter from QW and included in the list. If the attribute lists of QS, QF, and PA match the stored schema, processing continues; otherwise the query is rejected and processing stops. The rejecter also builds QS in the case of '*' by retrieving all attributes from the schema as a list, provided the predicate attribute exists in the schema (Ahmad et al., 2009). The algorithm to validate the user query is presented by Ahmad et al. (Ahmad et al., 2009) and given in Figure 4.
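The splitter/rejecter pipeline described above can be sketched as follows. This is a minimal illustration for simple single-relation queries, not the authors' algorithms of Figures 3 and 4; the schema contents and the regular expressions are assumptions:

```python
import re

# Illustrative schema-based index for the university example used later.
SCHEMA = {"employee": {"e_ID", "eName", "Age", "Sal"}}

def split_query(sql):
    """Splitter: break a simple SELECT query into QS, QF, QW."""
    m = re.match(r"SELECT\s+(.*?)\s+FROM\s+(\w+)(?:\s+WHERE\s+(.*))?$",
                 sql.strip(), re.I)
    if not m:
        raise ValueError("unsupported query shape")
    qs = [a.strip() for a in m.group(1).split(",")]
    return qs, m.group(2), (m.group(3) or "")

def reject_or_accept(qs, qf, qw):
    """Rejecter: validate relation, projected and predicate attributes."""
    if qf not in SCHEMA:
        return False                       # unknown relation: reject
    attrs = SCHEMA[qf]
    if qs == ["*"]:
        qs = sorted(attrs)                 # expand '*' from the schema
    # predicate attributes = identifiers followed by a comparison operator
    pred_attrs = re.findall(r"\b([A-Za-z_]\w*)\s*(?:[<>=!]+)", qw)
    return all(a in attrs for a in qs + pred_attrs)

qs, qf, qw = split_query("SELECT eName, Age FROM employee WHERE Age > 30")
print(reject_or_accept(qs, qf, qw))               # True: all names in schema
print(reject_or_accept(["gpa"], "employee", ""))  # False: gpa not in employee
```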

Query trimming
Once it has been decided that data is available in the cache, the second step of sCacheQP is performed. In this step, called query trimming, the query is divided into two sub-queries, the probe and remainder queries. This process is accomplished in two stages. At the first stage, a vertical partition takes place and the attributes that are not available in the cache (DA) are sent directly to the server as rq1 (remainder query) with the original predicate. We call this the 1st-level query rewriter (Ahmad et al., 2009); its algorithm is given in Figure 5, and rq1 is computed over the difference attributes DA with the original predicate. Fig. 5. Algorithm for 1st-Level Query Rewriter.
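The vertical-partition stage can be sketched as computing the common attributes (CA), the difference attributes (DA), and the remainder query rq1 carrying the original predicate. The inclusion of the key attribute and all names are illustrative assumptions:

```python
def first_level_rewrite(user_attrs, cached_attrs, relation, predicate, key="e_ID"):
    """Sketch of the 1st-level rewriter: split attributes and form rq1."""
    CA = [a for a in user_attrs if a in cached_attrs]      # common attributes
    DA = [a for a in user_attrs if a not in cached_attrs]  # difference attributes
    rq1 = None
    if DA:
        # the key is included so server rows can be joined back to cached rows
        cols = ", ".join([key] + DA)
        rq1 = f"SELECT {cols} FROM {relation} WHERE {predicate}"
    return CA, DA, rq1

CA, DA, rq1 = first_level_rewrite(["eName", "Age", "Sal"],
                                  {"e_ID", "eName", "Age"},
                                  "employee", "Age > 30")
print(rq1)  # SELECT e_ID, Sal FROM employee WHERE Age > 30
```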
The rest of the attributes, common to both the user and cached queries, are forwarded to the predicate processor, which works at the second stage. The predicate processor consists of four sub-modules: the semantic extractor, the Explicit Semantic Matcher, the Implicit Semantic Matcher, and the Predicate Merger. At this stage, the predicate is simplified by separating its portions on the basis of the conjunct and disjunct operators. Then the semantics of the user query's predicate with respect to the cached predicate are extracted in the form of matching columns (Mc: similar in both the user query predicate and the cached predicate), non-matching columns of the cache (NMc: columns in the cached query that do not match the user query), and non-matching columns of the user query (NMu: columns in the user query that do not match the cached query). Some other information is also extracted: the data value of the cache predicate (DVc), the data value of the user predicate (DVu), the comparison operator in the cache predicate (Opc), and the comparison operator in the user predicate (Opu). The algorithm to extract the semantics of the predicate is given in Figure 6.
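The outputs of the semantic extractor (Mc, NMc, NMu, Opc, Opu, DVc, DVu) can be illustrated for simple conjunctive predicates. This sketch is not the Figure 6 algorithm; the parsing, and the restriction to AND-joined atoms, are assumptions:

```python
import re

def parse_simple(pred):
    """Parse 'col op value' atoms joined by AND (disjuncts omitted here)."""
    atoms = {}
    for part in re.split(r"\s+AND\s+", pred, flags=re.I):
        m = re.match(r"\s*(\w+)\s*(<=|>=|!=|==?|<|>)\s*(\w+)\s*", part)
        if m:
            atoms[m.group(1)] = (m.group(2), m.group(3))
    return atoms

def extract_semantics(user_pred, cache_pred):
    u, c = parse_simple(user_pred), parse_simple(cache_pred)
    Mc  = sorted(set(u) & set(c))   # columns present in both predicates
    NMu = sorted(set(u) - set(c))   # user-only columns
    NMc = sorted(set(c) - set(u))   # cache-only columns
    Opu = {k: u[k][0] for k in Mc}; DVu = {k: u[k][1] for k in Mc}
    Opc = {k: c[k][0] for k in Mc}; DVc = {k: c[k][1] for k in Mc}
    return Mc, NMc, NMu, Opc, Opu, DVc, DVu

Mc, NMc, NMu, Opc, Opu, DVc, DVu = extract_semantics(
    "Age > 35 AND Sal > 5000", "Age > 30")
print(Mc, NMu)  # ['Age'] ['Sal']
```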

Algorithm 7: ExplicitSemanticMatching
Input: {Mc[n], OPc[n], OPu[n], DVc[n], DVu[n], CC[n], CU[n]}
Output: {C1[n], NC1[n]}
Method: Initialize
As discussed, the Explicit Semantic Matching algorithm is based on boundary values and the basic comparison operators. On the basis of the boundary values and comparison operators, Algorithm 7 trims the predicate into probe and remainder queries.
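One class of this boundary-value trimming can be sketched for the case where both predicates use '>' on the same column. The chapter's 112 rules cover every operator pair; this fragment is only an illustrative assumption for that single class:

```python
def trim_gt(col, dv_user, dv_cache):
    """Trim user predicate 'col > dv_user' against cached 'col > dv_cache'.
    Returns (C1, NC1): the portion answerable from cache and the portion not."""
    if dv_user >= dv_cache:
        # the user's range lies entirely inside the cached range
        return f"{col} > {dv_user}", None
    # part of the user's range falls below the cached boundary
    return f"{col} > {dv_cache}", f"{col} > {dv_user} AND {col} <= {dv_cache}"

print(trim_gt("Age", 35, 30))  # ('Age > 35', None): fully in cache
print(trim_gt("Age", 25, 30))  # ('Age > 30', 'Age > 25 AND Age <= 30')
```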
Note that the predicate matching algorithm, having better time complexity, is an alternative to the satisfiability/implication approach (Guo et al., 1996) used to help process queries in the literature (Ren et al., 2003; Jonsson et al., 2006). The computed values C1, NC1, and NMu are sent to the Implicit Semantic Matching algorithm to remove the additional information. Algorithm 8 performs this job and is given in Figure 8.

Algorithm 8: ImplicitSemanticMatching
Input:

Else If (Mc[i] == null) then
  a. C2 := (NMCC[i]) + (NMCU[i])
  b. NC2 := NMCU[i] + R(NMCC[i])
Algorithm 9: PredicateMerging
Input:

The computed predicates are then sent to the 2nd-level query rewriter. Finally, the probe and remainder queries are computed by the 2nd-level query rewriter; pq is executed locally and rq is sent to the server. The results of both are then sent to the rebuilder to be combined.

Query rebuilding
The rebuilder receives the result from the server (SR), retrieved via the remainder queries (rq1 and rq2), and the result from the cache (CR), retrieved via the probe query (pq), and combines both into the final result FR. The final result is shown to the user, and the cache contents are also updated if required.
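The rebuilder's combination step can be sketched as a key-based merge: rq1 rows complete partial tuples already in the cache, while rq2 rows add tuples the cache lacked. The row format and key name are assumptions:

```python
def rebuild(CR, SR1, SR2, key="e_ID"):
    """Combine cache rows (CR, from pq) with server rows (SR1 from rq1,
    SR2 from rq2) into the final result FR, joining on the key."""
    rows = {r[key]: dict(r) for r in CR}   # start from probe-query rows
    for r in SR1:                          # rq1: missing attributes, same rows
        rows.setdefault(r[key], {}).update(r)
    for r in SR2:                          # rq2: rows absent from the cache
        rows.setdefault(r[key], {}).update(r)
    return list(rows.values())

CR  = [{"e_ID": 1, "eName": "Ali", "Age": 40}]
SR1 = [{"e_ID": 1, "Sal": 9000}]
SR2 = [{"e_ID": 2, "eName": "Sana", "Age": 36, "Sal": 7000}]
FR = rebuild(CR, SR1, SR2)
print(len(FR))  # 2
```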

Case study
To validate our proposed semantic indexing and query processing, we consider the case study of a university.
Figure 10 presents the schema of the university with two relations, employee and students, having 4 and 3 fields respectively.

Fig. 10. Schema for University.
For the above given schema of the university, there are 15 and 7 possible segments across the employee and students relations respectively, according to previous work (Ren et al., 2003).
In simple words, there are 15 possible queries against employee and similarly 7 against students, as given in Table 1.
In the above example there are 22 possible queries that make separate segments. The formula for the possible segments across a single relation over 'n' attributes is 2^n − 1; the segments across each relation are then added. As in the example, 15 + 7 = 22 (2^4 − 1 = 15 and 2^3 − 1 = 7). Hence, in the worst case, 22 segments must be visited to check the availability of data in the cache, which increases the response time drastically. The schema-based hierarchical scheme reduces the number of comparisons needed to find out whether data is available in the cache: only 'n' comparisons are required. Table 1 can be rearranged according to our proposed schema-based semantic indexing scheme as in Table 2. Now we divide our case study into five cases in such a way that one can easily understand our contribution and the novelty of our approach. For simplicity, each of the five cases is discussed standalone and is not linked with the others; each case should be considered separately. We assume the cache is managed from scratch for each case.
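The 2^n − 1 segment count can be checked directly; the sketch below reproduces the 15 + 7 = 22 figure for the two relations:

```python
def possible_segments(n_attrs):
    """Number of non-empty attribute subsets of a relation: 2^n - 1."""
    return 2 ** n_attrs - 1

# employee has 4 attributes, students has 3
print(possible_segments(4), possible_segments(3))                 # 15 7
print(possible_segments(4) + possible_segments(3))                # 22
```

With schema-based indexing, checking availability takes only n attribute comparisons per relation (4 and 3 here) instead of visiting up to 22 segments.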

Case-I: In this case we take an example that covers query rejection at the initial level.
Let us consider that the user has already posed the following query and its result has been stored in the cache.
The data in the cache will be as given in Table 4. All three queries should be rejected at the initial level; but according to all previous work, the queries would be posted to the server due to the unavailability of data in the cache.
A strength of our proposed schema-based indexing scheme is that all three queries are rejected and query processing time is saved. According to our proposed semantic caching architecture, the list of projected attributes (eName, Age in the first query), the relation (emMloyee in the first query), and the predicate attribute are checked against the schema-based indexing scheme; the query is rejected due to the unavailability of the "emMloyee" relation in the schema.
Similarly, queries 2 and 3 are rejected due to the unavailability of "gpa" and "rollno" in the employee table respectively, and there are no probe and remainder queries.
By rejecting queries at the initial stage, query processing time is saved. In this context our proposed semantic caching scheme performs better than previous work.

Case-II:
In this case we take an example that covers the handling of queries having '*' in the SELECT clause.
Let us consider that the user has already posed the following query and its result has been stored in the cache.
The data for the above query will be retrieved and stored in the cache as given in Table 5. Note that e_ID is not required but is retrieved; this is due to the requirement of key-contained (Ren et al., 2003) contents. Now let us assume that the user has posed the following query. All of the fields of employee are now required; but according to previous work, the common set is calculated as the intersection of the cached attributes and the user query's attributes, and there is no way defined to calculate the common set of '*' and some attributes. Here we can say that all of the cached attributes for employee are required, but how can it be decided which attributes are not in the cache and should be retrieved from the server?
Here again we need the schema at the cache (first we needed the schema for zero-level query rejection). If the schema is available at the cache, then a SELECT clause with '*' can be handled easily, and by this the hit ratio is improved. The splitter splits the query and sends it to the rejecter. The rejecter checks the list of fields, the relation, and the predicate attribute against the schema-based indexing semantics. The query is not rejected, because all members of the list are available in the schema. The common and difference sets of attributes, CA and DA, are computed and sent to the 1st-level query rewriter. The remainder query (rq1) with the difference attributes (here there is only one difference attribute, 'Sal') is generated by the 1st-level query rewriter as below.
The common attributes (e_ID, Age, eName) are sent to the Query Generator (QG), which generates the probe and remainder queries on the basis of predicate matching. The conditioned attribute (Age) is already retrieved, so there is no need for an amending query in this case.
SELECT * FROM employee WHERE age>30

First, the semantics of the predicate are computed by the semantic extractor as follows.
After computation of the predicate semantics, the predicates for the probe and remainder queries are computed using the predicate matching algorithm (actually the 112 rules are used here). First, the main class of the algorithm is selected: here the data value of the user query (DVu = 30) is equal to the data value of the cache (DVc = 30), so class 3 of predicate matching is selected, and then the priority of the relational operator is computed. The relational operator in both queries is '>', which is a low-priority operator (defined in previous work; Bashir and Qadir, 2007a). So, the following portion of the algorithm is executed, as given below.
Here, Cc is Age, the operator is '>', and DVc is 30.
So the predicates for the probe and remainder queries are computed; we call them C1 (for cached) and NC1 (for non-cached).
Since the predicate is simple, the rules defined for complex queries are not applied. Finally, the subtraction algorithm is applied to generate the final predicates for the probe and remainder queries. As computed, NMc and NMu are both null, so the first case of the subtraction algorithm is applied.
So there is no change in the predicates of the probe and remainder queries. Then the probe query (pq) and the second remainder query (rq2) are generated as below.
In the last step, the results of rq1, pq, and rq2 are combined by the rebuilder.

Case-III:
In this case the generation of the amending query is elaborated. Let us consider that the user has already posed the following query and its result has been stored in the cache.
The data for the above query will be retrieved and stored in the cache as given in Table 6. Note that e_ID is not required but is retrieved; this is due to the requirement of key-contained (Ren et al., 2003) contents. Now let us assume that the user has posed the following query.

Here, the generation of the amending query is discussed. The remaining procedure is the same as discussed in Case-II.
Note that the data for eName and Sal is present in the cache, but the predicate attribute (Age) is not. One cannot select the data from the cache due to the absence of the predicate attribute, because one cannot decide which of the data satisfies the selection criteria (Age > 35). To solve this problem, another query, called the amending query (Ren et al., 2003), is used to retrieve the primary key attribute from the server under the user's selection criteria, as below.
Then the retrieved primary keys are mapped to the keys in the cache and the data is presented to the user. By this, the hit ratio is increased.
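The key-mapping step can be sketched as follows. The amending query itself runs on the server, and only its returned keys are used to filter the cached rows; the row format and names are illustrative:

```python
def answer_with_amending(cache_rows, server_keys, key="e_ID"):
    """Keep the cached rows whose primary keys the amending query returned."""
    wanted = set(server_keys)
    return [r for r in cache_rows if r[key] in wanted]

cache_rows = [{"e_ID": 1, "eName": "Ali", "Sal": 9000},
              {"e_ID": 2, "eName": "Sana", "Sal": 7000}]
# amending query (executed on the server, Age is not cached):
#   SELECT e_ID FROM employee WHERE Age > 35
# suppose it returns key 2:
result = answer_with_amending(cache_rows, [2])
print(result)  # [{'e_ID': 2, 'eName': 'Sana', 'Sal': 7000}]
```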

Case-IV:
In this case, efficient predicate matching to improve the hit ratio by using the subtraction algorithm is elaborated with an example.

Conclusion
A semantic cache system answers overlapped (partially and fully) queries locally. The major challenges of semantic caching are efficient query processing and cache management. For efficient query processing we have proposed and demonstrated the working of the sCacheQP system, and we have provided its complete working and algorithms. A case study is given to elaborate sCacheQP. In the future, we plan to implement the system for data mining and data warehousing.

Definition 3: Given a user query Qu = π_A σ_P(R) and QC having semantics <D, R, A, SA, P, C>; the Data Sets Du and Dc will be the rows retrieved as the result of executing Qu and QC respectively.
Definition 4: Given a user query QU and cached query QC with semantics <QS, QF, QW> and <D, R, A, SA, P, C> respectively; the Probe Query (pq) will be Du ∩ Dc.
Definition 5: Given a user query QU and cached query QC with semantics <QS, QF, QW> and <D, R, A, SA, P, C> respectively; the Remainder Query (rq) will be Du − Dc.
Definition 6: Given a user query QU and cached query QC with semantics <QS, QF, QW> and <D, R, A, SA, P, C> respectively; Query Matching is a process in which the user query's semantics <QS, QF, QW> are matched with the semantic enabled schema <D, R, A, SA, P, C>. It is further divided into two processes: attribute and predicate matching.

Definition 7: Given a user query QU and cached query QC with semantics <QS, QF, QW> and <D, R, A, SA, P, C> respectively; Attribute Matching is a process in which the user query's attributes <QS> are matched with the attributes indexed by the semantic enabled schema <A>. The common attributes (CA) = QS ∩ A and the difference attributes (DA) = QS − A are calculated for the probe and remainder queries respectively.

Fig. 7. Algorithm to Evaluate Predicate.
After extraction of the semantics, Mc, DVc, DVu, Opc, and Opu are sent to the Explicit Semantic Matcher, which trims the predicate into two portions: one for the probe query (C1) and the other for the remainder query (NC1). The Explicit Semantic Matching algorithm is based on the boundary values as well as on the nature of the comparison operators. There are 112 rules defined on the basis of boundary values and the basic comparison operators (<, <=, >, >=, ==, !=). Algorithm 7, given in Figure 7, is used to match and trim the predicate; its output is the predicate that is available in the cache (C1) and the predicate that is not available in the cache (NC1).

Definition 10: Given a user predicate QW and cached predicate P; Predicate Implication (QW → P) holds if and only if QW is completely overlapped by P.
Definition 11: Given a user predicate QW and cached predicate P; Predicate Satisfiability holds if and only if QW is partially overlapped by P.
Definition 12: Given a user predicate QW and cached predicate P; Predicate Unsatisfiability holds if and only if QW is not overlapped by P.

Definition 13: Given a user query QU and cached query QC with semantics <QS, QF, QW, PA> and <D, R, A, SA, P, C>; Query Implication holds if and only if QS ⊆ A and predicate implication holds.
Definition 14: Given a user query QU and cached query QC with semantics <QS, QF, QW, PA> and <D, R, A, SA, P, C>; Query Satisfiability holds if and only if QS ∩ A ≠ Φ and predicate implication/satisfiability holds.
Definition 15: Given a user query QU and cached query QC with semantics <QS, QF, QW, PA> and <D, R, A, SA, P, C>; Query Unsatisfiability holds if and only if either QS ∩ A = Φ or predicate implication/satisfiability does not hold.
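The query implication/satisfiability/unsatisfiability distinction suggests a simple classification routine, sketched here under the assumption that the predicate-level verdict is computed separately by the predicate matcher:

```python
def classify(QS, A, predicate_overlap):
    """Classify a user query against the cache.
    predicate_overlap: 'implication', 'satisfiability', or 'unsatisfiability'."""
    QS, A = set(QS), set(A)
    if QS <= A and predicate_overlap == "implication":
        return "query implication"       # fully answerable from cache
    if QS & A and predicate_overlap in ("implication", "satisfiability"):
        return "query satisfiability"    # partially answerable: probe + remainder
    return "query unsatisfiability"      # nothing usable in the cache

A = {"e_ID", "eName", "Age"}
print(classify({"eName"}, A, "implication"))             # query implication
print(classify({"eName", "Sal"}, A, "satisfiability"))   # query satisfiability
print(classify({"gpa"}, A, "implication"))               # query unsatisfiability
```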

Table 1. Possible segments for the given database.

Table 2 represents the structure of the schema-based semantic indexing instead of the actual contents. There is only a need to compare/match 4 and 3 fields instead of 15 and 7 segments respectively, according to previous work. It also has the ability to reject invalid queries at the initial level instead of processing them further. For detailed discussion and simplicity, we consider only the employee table of the university database. Let us consider there is an employee table on the server with the 4 fields defined in the university schema in Figure 10. The employee table on the server is given in Table 3 below.

Table 3. Employee table in the database.

Table 4.
SELECT * FROM employee WHERE age>30

Table 4. Contents of the cache in Case-I. Now let us consider that the user is going to pose the following three queries.

Table 5. Contents of the cache in Case-II.

Table 6. Contents of the cache in Case-III.