Saturday, March 7, 2020

Data Provenance in E-Learning Essays

We live in an information age, where the volume of data processed by humans and organizations grows exponentially, driven by grid middleware and the availability of huge storage capacity. Data management therefore comprises all the disciplines related to managing data as a valuable resource. The openness of the Web and the ease of combining linked data from different sources create new challenges: systems that consume linked data must evaluate the quality and trustworthiness of that data. A common approach to data quality assessment is the analysis of provenance information. [1]

Data provenance, one kind of metadata, describes the transformational workflow of a data product (files, tables, and virtual collections) starting from its original sources. Metadata refers to "data about data". Workflows can generate huge amounts of data with rich metadata, which makes it possible to understand and reuse the data.

3.2.1 Provenance for Different Provenance Consumers

The authors use an example scenario, a power consumption forecast workflow, to illustrate the different types of users who consume provenance. In this scenario there are three kinds of consuming users: the software architect, the data analyst, and the campus facility operator, and each of them needs a different provenance model. The term "quality impact", which indicates how the quality of a process affects the quality of its output, is then used to guide users on which processes and data objects they need to exercise more quality control over.

3.2.2 An Apropos Presentation View

Two classes of approaches are generally used to determine a suitable presentation view of provenance: the decomposition approach and the clustering approach. The decomposition approach is well suited when the granularities in the provenance model are clearly defined: for each individual activity in the workflow, we identify the most appropriate presentation granularity to satisfy the usage requirement and to meet the user's interest. When granular levels are not clearly specified, the clustering approach is used instead; it incrementally clusters the initial fine-grained provenance elements.

3.3 A Provenance Model for Web Data

Paper [12] introduces a provenance model for Web data that covers two dimensions: the creation of a data item and the access to that data on the Web. Examples of source data used in a data creation are the content of a document used for machine learning, the entries in a database used to answer a query, and the statements in a knowledge base used to entail a new statement. Other artifacts that may be used in a data creation are the creation guidelines, which guide the execution of the data creation; examples of creation guidelines are mapping definitions, transformation rules, database queries, and entailment rules.

Data access centers on data access executions. Data accessors perform data access executions to retrieve data items contained in documents from a provider on the Web. To enable a detailed representation of providers, the model described in paper [12] distinguishes data providing services that process data access requests and send documents over the Web, data publishers who use data providing services to publish their data, and service providers who operate data providing services. Furthermore, the model represents the execution of integrity verifications of artifacts and their results.

A system that uses Web data must access this data from a provider on the Web. Information about this process and about the providers is important for a representation of provenance that aims to support the assessment of data quality.
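To make these distinctions concrete, the following is a minimal sketch in Python. The class and field names are illustrative assumptions, not the vocabulary of paper [12]; they only mirror the element types just described (source data, creation guidelines, data items, data providing services, data publishers, service providers, and data access executions).

# Illustrative sketch of the element types in a Web-data provenance model.
# Names are hypothetical; they only mirror the distinctions described above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Artifact:
    identifier: str

@dataclass
class SourceData(Artifact):          # e.g. a document used for machine learning
    pass

@dataclass
class CreationGuideline(Artifact):   # e.g. a mapping definition or entailment rule
    pass

@dataclass
class DataItem(Artifact):
    created_from: List[SourceData] = field(default_factory=list)
    guided_by: List[CreationGuideline] = field(default_factory=list)

@dataclass
class DataProvidingService:          # processes access requests, sends documents
    endpoint: str

@dataclass
class DataPublisher:                 # publishes data via a data providing service
    name: str

@dataclass
class ServiceProvider:               # operates the data providing service
    name: str

@dataclass
class DataAccessExecution:           # a data accessor retrieving data items
    service: DataProvidingService
    publisher: DataPublisher
    operator: ServiceProvider
    retrieved_items: List[DataItem] = field(default_factory=list)

This sketch encodes only the taxonomy of elements; the model in [12] additionally records integrity verifications and the artifacts in which the data is published, which are described next.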
Data published on the Web is embedded in a host artifact, usually a document. Following the terminology of the W3C Technical Architecture Group, we call this artifact an information resource. Each information resource has a type, e.g., it is an RDF document or an HTML document. The data accessor retrieves information resources from a provider. Their provenance model allows a detailed representation of providers by distinguishing data providing services, data publishers, and service providers. [1]

In paper [12] a provenance graph is represented as a tuple (PE, R, type, attr) where:

- PE denotes the set of provenance elements in the graph;
- R ⊆ PE × PE × RN denotes the labeled edges in the graph, where RN is the set of relationship names introduced by the provenance model;
- type : PE → ℘(T) is a mapping that associates each provenance element with its types, where T is the set of element types introduced by the provenance model and ℘(T) is the power set of T;
- attr : PE → ℘(A × V) is a mapping that associates each provenance element with additional properties represented by attribute-value pairs, where A is a set of available attributes and V is a set of values.

The sets A and V are not specified any further, because the available attributes, the possible values, and their meaning depend on the use case. However, the authors introduce an abbreviated notation to refer to the target of an edge in a provenance graph: if (p1, p2, rn) ∈ R, we write p1 →rn = p2.

3.4 Using Data Provenance for Quality Assessment

To assess the quality of data, we need to identify which types of information can be used for the evaluation and a methodology for calculating quality attributes. The research paper [12] introduces a provenance model custom-made for tracing and tracking provenance information about Web data. This model describes the creation of a data item as well as the provenance of who made the data accessible through the Web. Most existing approaches for information quality assessment are based on information provided by users.

The quantitative approach described in paper [12] follows three steps:

- collecting the quality attributes that are needed for the provenance information;
- deciding on the influence of these attributes on the assessment;
- applying a function to compute the quality.

The author describes information quality as a combined value of multiple quality attributes, such as accuracy, completeness, believability, and timeliness. The assessment method described in paper [12] likewise follows three steps:

1. Generate a provenance graph for the data item;
2. Annotate the provenance graph with impact values;
3. Calculate an IQ-score for the data item from the annotated provenance graph.

The main idea behind this approach is to determine a quality measure for a data item automatically from impact values, which represent the influence of the elements in the provenance graph on the particular quality of the assessed data item.
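The following is a minimal sketch of this idea. The graph structure follows the tuple (PE, R, type, attr) defined above, but the element names, relationship names, impact values, and the choice of simply multiplying impacts over the reachable part of the graph are all illustrative assumptions, not the assessment function defined in paper [12].

# Minimal sketch: a provenance graph (PE, R, type, attr) annotated with
# impact values, and a toy IQ-score computed for one data item.
# The aggregation (multiplying impacts of reachable elements) is an
# assumption chosen for illustration, not the function used in [12].

provenance_graph = {
    "PE":   {"dataItem1", "creation1", "sourceData1", "accessExec1"},
    "R":    {("dataItem1", "creation1", "createdBy"),
             ("creation1", "sourceData1", "usedSource"),
             ("dataItem1", "accessExec1", "retrievedBy")},
    "type": {"dataItem1": {"DataItem"}, "creation1": {"DataCreation"},
             "sourceData1": {"SourceData"}, "accessExec1": {"DataAccessExecution"}},
    "attr": {"sourceData1": {("publisher", "exampleOrg")}},
}

# Step 2: annotate elements with impact values in [0, 1]
# (how strongly each element influences the quality of interest).
impact = {"creation1": 0.9, "sourceData1": 0.8, "accessExec1": 0.95}

def reachable(graph, start):
    """All provenance elements reachable from `start` along labeled edges."""
    seen, frontier = set(), [start]
    while frontier:
        element = frontier.pop()
        if element in seen:
            continue
        seen.add(element)
        frontier.extend(target for (src, target, _) in graph["R"] if src == element)
    return seen

def iq_score(graph, data_item, impact_values):
    """Step 3 (toy version): combine the impact values of all elements
    that the data item depends on into a single score."""
    score = 1.0
    for element in reachable(graph, data_item):
        score *= impact_values.get(element, 1.0)
    return score

print(iq_score(provenance_graph, "dataItem1", impact))   # 0.684 for this toy graph

In a real assessment, the function used in Step 3 would be chosen per quality attribute and per use case, which is exactly what the design questions below make explicit.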
In order to design an actual assessment method for the general assessment approach described above, we have to make some design decisions, and to take those decisions we have to answer a few design-related questions.

Questions for Step 1: What types of provenance elements are necessary to determine the considered information quality, and what level of detail (i.e. granularity) is necessary to describe the provenance elements in the application scenario? Where and how do we get the provenance information to generate the provenance graph for a data item?

Questions for Step 2: How might each type of provenance element influence the quality of interest, and what kind of impact values are necessary for the application scenario?

Questions for Step 3: How do we determine the impact values, or where do we get them from? How can we represent the considered information quality by a value, and what function do we use to calculate such a value from the annotated provenance graph?

3.5 Using Data Provenance for Measuring Information Assurance

Data provenance is multidimensional metadata that specifies Information Assurance (IA) attributes such as integrity, confidentiality, authenticity, and non-repudiation. Each IA attribute may contain sub-components, such as objective and subjective values, or application security versus transport security.

The authors of paper [11] describe a framework based on Subjective Logic, which includes uncertainty by representing values as a triple of belief, disbelief, and uncertainty. The model discussed in the paper is an information flow model based on complex and simple messages about which objective information assurance attribute values are collected. The model incorporates the capability to roll up data provenance information over a multi-step information flow and/or over a complex message; these aggregations are called Figures of Merit. Once the figures of merit and the information assurance attribute values are available, the next goal is to summarize this information in a simple visual icon. Such an icon helps those who must act on information quickly to understand how confidential, authentic, and unmodified the data is, and therefore supports clearer decisions when dealing with the data.

3.5.1 Framework for Capturing a Data Provenance Record

A single Data Provenance (DP) record is created each time a message is transmitted between agents, systems, or processes. This record can be stored, or it can be sent along in parallel with the message. A DP record has two parts, a sender part and a receiver part, and each part has a variant and an invariant section. Routing information used to forward the message to its final destination is contained in the variant section and may change during the routing process. The invariant section of the DP record remains unchanged; the sender's invariant section may include the following components: identity of the author of the message, message ID, timestamp, message contents and type, references to other message IDs (e.g. attachments), destination, security label or classification, outgoing information assurance values, and a hash value of the message contents. The receiver appends their own values to the record, adding the identity of the receiver of the message, a timestamp, incoming information assurance values, and a hash of the message body as seen by the receiver. The receiver may also append a signature or an encrypted hash based on both the sender's and the receiver's records.

3.5.2 Subjective Logic

Josang's Subjective Logic is used to model a flexible mechanism for calculating confidentiality, and this mechanism also helps to deal with uncertainty. Subjective Logic uses three values b, d, and u, where

b = belief, the belief that the proposition is true,
d = disbelief, the belief that the proposition is false,
u = uncertainty, the amount of uncommitted belief.

These components satisfy b + d + u = 1 and b, d, u ∈ [0, 1].
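A minimal sketch of such an opinion triple follows, assuming nothing beyond the constraints just stated; the class name and the example values are illustrative, and no Subjective Logic operators from [11] are reproduced here.

# Illustrative sketch of a Subjective Logic opinion (b, d, u) as used for
# Information Assurance attributes; values are hypothetical examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class Opinion:
    belief: float       # b: belief that the proposition is true
    disbelief: float    # d: belief that the proposition is false
    uncertainty: float  # u: uncommitted belief

    def __post_init__(self):
        components = (self.belief, self.disbelief, self.uncertainty)
        if any(not 0.0 <= c <= 1.0 for c in components):
            raise ValueError("b, d, u must each lie in [0, 1]")
        if abs(sum(components) - 1.0) > 1e-9:
            raise ValueError("b + d + u must equal 1")

# One opinion per Information Assurance attribute of a message (example values):
ia_values = {
    "integrity":    Opinion(0.90, 0.05, 0.05),
    "authenticity": Opinion(0.70, 0.10, 0.20),
}

Subjective Logic also defines operators for combining such opinions across a multi-step flow, which is how the Figures of Merit described above are rolled up; those operators are not reproduced here.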
3.5.3 Implementation

The models of the information flow and of the data provenance at each point along that flow are captured in a semantic model. The target representation was the Web Ontology Language (OWL), with a rules layer above it to capture domain inferences not implied by the formal models. A controlled-English representation called the Semantic Application Design Language (SADL) is used as the authoring environment; SADL maps directly and unambiguously into OWL and Jena Rules or SWRL. An Eclipse-based SADL-IDE supports the authoring, testing, and version control of the models. Snapshots of the data provenance state of a Message are captured as instances of DPInfo. When a Message is sent by an Agent, a SenderDPInfo (a subclass of DPInfo) captures the relevant data provenance information; when a Message is received by an Agent, a ReceiverDPInfo (also a subclass of DPInfo) captures the data provenance state at receipt. In this model the authors calculate each of the IA attributes individually, and they create a visual summary of the IA values to support the decision process. The IA values they use are Integrity, Confidentiality, Authenticity, Availability, and Non-repudiation.
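The paper's implementation is an OWL/SADL model, not program code; the following is only a rough Python analogue of the DPInfo idea, sketching how sender-side and receiver-side snapshots of a message's provenance state (including the content hashes mentioned in 3.5.1) might be captured. All class and field names beyond DPInfo, SenderDPInfo, and ReceiverDPInfo are assumptions.

# Rough Python analogue of the DPInfo snapshot idea (the actual models in
# [11] are expressed in OWL via SADL, not in Python).
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

def content_hash(body: str) -> str:
    """Hash of the message contents, as recorded in the invariant section."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

@dataclass
class DPInfo:                      # snapshot of the provenance state of a Message
    agent: str                     # the Agent involved at this point in the flow
    timestamp: str
    body_hash: str

@dataclass
class SenderDPInfo(DPInfo):        # captured when the Message is sent
    message_id: str = ""
    destination: str = ""
    security_label: str = ""

@dataclass
class ReceiverDPInfo(DPInfo):      # captured when the Message is received
    pass

body = "power consumption forecast for building 7"
sent = SenderDPInfo(agent="analyst@campus",
                    timestamp=datetime.now(timezone.utc).isoformat(),
                    body_hash=content_hash(body), message_id="msg-001",
                    destination="operator@campus", security_label="internal")
received = ReceiverDPInfo(agent="operator@campus",
                          timestamp=datetime.now(timezone.utc).isoformat(),
                          body_hash=content_hash(body))

# If the receiver's hash differs from the sender's, integrity is in question.
print(sent.body_hash == received.body_hash)   # True for an unmodified message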
3.6 Issues in Data Provenance

Several open problems remain in data provenance: information management infrastructure, provenance analytics and visualization, interoperability, and connecting database and workflow provenance. [6]

Information management infrastructure. With the growing volume of raw data, workflows, and provenance information, there is a need for efficient and effective techniques to manage these data. Besides the need to handle large volumes of heterogeneous and distributed data, an important challenge that needs to be addressed is usability: information management systems are notoriously hard to use. As the need for these systems grows in a wide range of applications, notably in the scientific domain, usability is of paramount importance. The growth in the volume of provenance data also calls for techniques that deal with information overload.

Provenance analytics and visualization. The problem of mining and extracting knowledge from provenance data has been largely unexplored. By analyzing and creating insightful visualizations of provenance data, scientists can debug their tasks and obtain a better understanding of their results. Mining this data may also lead to the discovery of patterns that can potentially simplify the notoriously hard, time-consuming process of designing and refining scientific workflows.

Interoperability. Complex data products may result from long processing chains that require multiple tools (e.g., scientific workflows and visualization tools). To provide detailed provenance for such data products, it becomes necessary to integrate provenance derived from different systems and represented using different models. This was the goal of the Second Provenance Challenge, which brought together several research groups with the aim of integrating provenance across their independently developed workflow systems. Although the preliminary results are promising and indicate that such an integration is possible, more principled approaches to this problem are needed. One direction currently being investigated is the creation of a standard for representing provenance.

Connecting database and workflow provenance. In many scientific applications, database manipulations co-exist with the execution of workflow modules: data is selected from a database, potentially joined with data from other databases, reformatted, and used in an analysis. The results of the analysis may then be put into a database and potentially used in other analyses. To understand the provenance of a result, it is therefore important to be able to connect provenance information across databases and workflows. Combining these disparate forms of provenance information will require a framework in which database operators and workflow modules can be treated uniformly, and a model in which the interaction between the structure of data and the structure of workflows can be captured.

Another issue in data provenance is data citation, which concerns citing a component of a digital library consisting of documents and databases. More generally, there is no standard way of citing: in databases, keys are used to cite tuples, while a document can be cited using a URL, the universal locator of the document. The major problem here is that a citation can be invalidated by an update to the cited document. There are several solutions to this problem, each with its own drawback. One solution is to release successive versions of the database separately, but this needs large amounts of storage. Another is to keep the history of the database so that the history of its components can be traced, but this is complex. In any case, by giving a date to the URL, at least the person who follows the citation will know whether to question the validity of the citation. [7]
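As an illustration of that last point, here is a minimal sketch of a dated citation; the Citation class and the staleness check are hypothetical, intended only to show how an access date lets a reader question a citation whose target may have changed since it was cited.

# Hypothetical sketch: a citation that records when the URL was accessed,
# so a later reader can tell whether the cited resource may have changed.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Citation:
    url: str
    accessed: date          # the date attached to the URL in the citation

def may_be_invalidated(citation: Citation, last_modified: date) -> bool:
    """True if the resource was modified after the citation was made,
    i.e. the reader should question the validity of the citation."""
    return last_modified > citation.accessed

cite = Citation("https://example.org/course/notes.html", date(2019, 11, 2))
print(may_be_invalidated(cite, last_modified=date(2020, 1, 15)))   # True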
3.6.1 Data Provenance and Financial Systemic Risk

Data provenance is not only important in e-learning environments; it also plays a vital part in large-scale analytic environments that support financial systemic risk analysis. In the financial sector, data should be managed as a strategic, enterprise asset. This requires active management of data quality, so that the managers or CEOs of an organization understand the quality of the information on which they base their decisions; data provenance is therefore needed to make financially sound decisions. Provenance information enables analysts to better understand the data and assumptions used for potentially vast numbers of simulation runs. However, it is not enough to provide data structures, query mechanisms, and graph renderings for provenance; one also needs a scalable strategy for collecting provenance.

4. Data Provenance and E-Learning

Rather than thinking of e-learning as simply "a new education method that uses the Internet", the actual norm can be expressed as combining different pieces of technologies and products to make learning happen. This gradually leads to the idea of a virtual learning environment. In e-learning, most of the resources related to the studies are gathered from the Web, so it is important to make sure that the information gathered from the Web is trustworthy. Some of the information provided on the Internet is not considered a proper reference. For example, wikipedia.org is an online encyclopedia that appears at the top of most Google search results, yet it is still considered an untrusted source because of its openness. There is another problem: some information may have been truthful once but is now outdated, so referring to it is incorrect. This is where data provenance comes into play.

Data provenance can be used to obtain information about the creation of data and about the modifications that have happened to it. Using this information, we can come to a conclusion about the trustworthiness of a resource gathered from the Web. Most of the research in data provenance has been done in the field of e-science, but it can be adapted to the e-learning environment.
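To close with a concrete illustration, here is a minimal sketch of how creation and modification metadata might feed a simple trustworthiness check for an e-learning resource. The decision rule (a resource needs a known creator and a recent enough last change) and the two-year threshold are assumptions made for illustration, not a method proposed by any of the cited papers.

# Hypothetical sketch: judging an e-learning resource from its provenance
# metadata (creator and modification history). The decision rule and the
# two-year freshness threshold are illustrative assumptions only.
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

@dataclass
class ResourceProvenance:
    url: str
    creator: Optional[str]            # who created the data, if known
    created: Optional[date]
    modifications: List[date] = field(default_factory=list)

def looks_trustworthy(prov: ResourceProvenance,
                      today: date,
                      max_age: timedelta = timedelta(days=730)) -> bool:
    """A resource with a known creator and a recent enough last change
    is treated as usable; anything else is flagged for manual review."""
    if prov.creator is None or prov.created is None:
        return False                                   # unknown origin
    last_change = max(prov.modifications, default=prov.created)
    return today - last_change <= max_age              # not outdated

lecture_notes = ResourceProvenance(
    url="https://example.edu/ml/lecture1.html",
    creator="Dept. of Computer Science",
    created=date(2019, 2, 1),
    modifications=[date(2019, 9, 10)],
)
print(looks_trustworthy(lecture_notes, today=date(2020, 3, 7)))   # True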
