Thursday 6 December 2012

Source Apportionment through Advanced Factor Analysis Modeling



Brian Diefendorf
Honors Thesis Proposal

 



The field of aerosol chemistry is diverse and extensive. The goal of this project is to analyze several sets of particle composition data, such as data from the Regional Air Pollution Study of St. Louis, using Positive Matrix Factorization. With this approach, we seek to resolve and distinguish the sources contributing to the measured particulate matter concentration at each site. If successful, the project will provide a greater understanding of source contributions and a more accurate basis for controlling particulate matter in the atmosphere.
The first National Ambient Air Quality Standards (NAAQS) were established in 1971. These standards set limits on allowable levels of particulate matter suspended in the air. However, they focused only on total particulate mass and did not distinguish between the different sizes of particles present in air. As a result, control efforts were directed towards the larger particles, which make up a larger proportion of the total particulate mass and are more easily removed from the air. There is increasing evidence, however, that the smaller particles pose a greater threat to human health. These risks arise mostly from respiratory problems caused by continued inhalation of the particulate matter. Thus, it is important to control and regulate the inhalable particles, those with diameters of 10 µm or less. The current NAAQS for particles 10 µm or less in diameter is an annual arithmetic mean of 50 µg/m³ and a 24-hour average of 150 µg/m³. Standards for particles 2.5 µm or less in diameter have also been established: an annual mean of 15 µg/m³ and a 24-hour average of 65 µg/m³. Measuring and tracking the relative emission rates from the sources would enhance the controls developed from these standards. Nonetheless, the complexity of urban ecosystems makes determining the sources of particulate matter a very difficult problem.
The fundamental principle behind source receptor relationships is the conservation of mass and the application of a mass balance analysis. This analysis can be used to identify and assign sources of airborne particulate matter in the atmosphere. This particular method is known as receptor modeling. There are two approaches to obtaining a data set for receptor modeling. One is to determine a large number of chemical constituents such as elemental concentrations in a number of samples. The other is to use automated electron microscopy to characterize the composition and shape of particles in a series of particle samples. Regardless of the approach used, a mass balance can be made to account for all chemical species in the samples as contributions from the independent sources.
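The mass balance described above is commonly written in the following form. The symbols are notational assumptions introduced here for illustration and do not come from the proposal itself: x_ij is the measured concentration of species j in sample i, g_ik is the mass contribution of source k to sample i, f_kj is the fraction of species j in the emissions of source k, p is the number of sources, and e_ij is the part of the measurement that the model leaves unexplained.

```latex
x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij}
```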
Natural physical constraints exist on the system and must be considered in developing any model and obtaining physically realistic solutions from the model [Henry, 1991]. These fundamental, natural constraints are:
1) The model must reproduce the original data; the model must explain the observations.
2) The predicted source compositions must be non-negative; a source cannot have a negative percentage of an element.
3) The predicted source contributions to the aerosol concentrations must all be non-negative; a source cannot emit negative mass.
4) The sum of the predicted elemental mass contributions for each source must be less than or equal to the total measured mass for each element; the whole is greater than or equal to the sum of its parts.
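The four constraints can be phrased as checks on a candidate solution. The following is a minimal sketch under stated assumptions, not a procedure taken from Henry (1991): the function names, the matrix layout (G holds source contributions per sample, F holds source compositions per element) and the absolute tolerance `tol` are all hypothetical. Constraint 4 is read here as "for every sample and element, the summed source contributions do not exceed the measured value", which is one possible interpretation. With real measurements, the fit in constraint 1 would more plausibly be judged against the measurement uncertainties than against a fixed tolerance.

```python
def reconstruct(G, F):
    """Modeled concentration of element j in sample i: sum over sources k of G[i][k] * F[k][j]."""
    n_elements = len(F[0])
    return [
        [sum(g_ik * F[k][j] for k, g_ik in enumerate(row)) for j in range(n_elements)]
        for row in G
    ]


def check_constraints(X, G, F, tol=1e-6):
    """Return one True/False flag per constraint (tol is an assumed absolute tolerance)."""
    modeled = reconstruct(G, F)
    pairs = [(x, m) for x_row, m_row in zip(X, modeled) for x, m in zip(x_row, m_row)]
    return {
        # 1) the model reproduces the measured data
        "reproduces_data": all(abs(x - m) <= tol for x, m in pairs),
        # 2) no source has a negative fraction of any element
        "compositions_nonnegative": all(f >= 0 for row in F for f in row),
        # 3) no source contributes negative mass to any sample
        "contributions_nonnegative": all(g >= 0 for row in G for g in row),
        # 4) summed source contributions do not exceed the measured value
        "parts_within_whole": all(m <= x + tol for x, m in pairs),
    }


# Illustrative numbers only: two sources, three elements, two samples.
G = [[2.0, 1.0], [0.5, 3.0]]
F = [[0.5, 0.25, 0.0], [0.0, 0.5, 0.25]]
X = reconstruct(G, F)
print(check_constraints(X, G, F))
```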

Several methods are used to model such problems. Miller et al. (1972) initially used a chemical element balance analysis. This method assumes that the number and composition of the sources are known; the observed composition data are then regressed against the known source profile matrix. The method has produced very good fits to the data in recent studies. Its major drawback is that it requires a priori knowledge of both the number and composition of the source emissions.
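The regression step of a chemical element balance can be sketched as a weighted least-squares fit of one sample against known source profiles. This is an illustration under assumptions, not the procedure of Miller et al.: the function names, the use of measurement uncertainties `sigma` as weights, and the example numbers are all hypothetical. Note that plain least squares does not enforce non-negative contributions, so a real analysis would need to check the result against the physical constraints.

```python
def solve_linear(A, b):
    """Solve A s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("source profiles are collinear; the balance is not unique")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        s[r] = (M[r][n] - sum(M[r][c] * s[c] for c in range(r + 1, n))) / M[r][r]
    return s


def chemical_element_balance(x, profiles, sigma):
    """Estimate source contributions for one sample.

    x[j]           measured concentration of element j
    profiles[k][j] known fraction of element j in source k
    sigma[j]       measurement uncertainty of element j (assumed weighting)
    """
    p, m = len(profiles), len(x)
    w = [1.0 / (s * s) for s in sigma]
    # Normal equations of the weighted least-squares problem.
    A = [[sum(w[j] * profiles[a][j] * profiles[b][j] for j in range(m)) for b in range(p)]
         for a in range(p)]
    rhs = [sum(w[j] * profiles[a][j] * x[j] for j in range(m)) for a in range(p)]
    return solve_linear(A, rhs)


# Illustrative numbers only: two known source profiles, three elements.
profiles = [[0.5, 0.25, 0.0], [0.0, 0.5, 0.25]]
contributions = chemical_element_balance([1.0, 1.0, 0.25], profiles, [0.1, 0.1, 0.05])
print(contributions)
```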
Where such knowledge is unavailable, it is necessary to estimate the number and composition of the sources as well as their contributions to the measured particulate mass. The multivariate data analysis methods used for this purpose are referred to as factor analysis. The most common form of factor analysis is Principal Components Analysis (PCA). PCA results are usually calculated using an eigenvector analysis of a correlation matrix [Hopke, 1985; Henry, 1991]. The PCA method utilizes a singular value decomposition of the matrix. However, numerous problems arise from using the PCA method. Paatero and Tapper [1993] show that PCA scales the data by column or by row, and that this scaling leads to distortions in the analysis. They also show that the optimum scaling would be to scale each data point individually, so that more precise data points have greater influence on the solution than points with higher uncertainties. However, point-by-point scaling results in a data matrix that cannot be reproduced by a conventional factor analysis method based on the singular value decomposition. Therefore, it is necessary to use a different form of factor analysis.
Recently, a new technique known as Positive Matrix Factorization (PMF) has been developed. PMF differs from previous analysis methods in two respects: it is not eigenvector based, as all other methods are, and it specifically addresses the problem of non-optimal scaling. PMF utilizes error estimates of the data to provide optimum data point scaling, which is accomplished by treating the factorization as a least-squares problem. Initially the problem was solved iteratively using alternating least squares [Paatero and Tapper, 1993]. In this early version of the approach, one of the two factor matrices (source contributions or source compositions) is taken as known and the chi-squared is minimized with respect to the other matrix as a weighted linear least-squares problem. Then the roles of the matrices are reversed and the process is repeated, and this alternation continues until convergence. The drawback to solving PMF in this fashion is that the process is slow. To improve speed, each step in the iteration was changed so that modifications are made to both matrices instead of only one [Paatero and Tapper, 1994]. A computer program, PMF2, determines this joint solution.
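The alternating pattern described above can be illustrated with a small sketch. It minimizes a chi-squared of the form Q = sum over i, j of ((x_ij - sum_k g_ik f_kj) / s_ij)^2, where s_ij is the error estimate of each data point, by holding one factor matrix fixed while the other is updated and then swapping roles. Two caveats apply. First, the update used here is a weighted multiplicative rule of the kind used for non-negative matrix factorization, swapped in because it keeps every entry non-negative in a few lines; it is not the weighted linear least-squares step of Paatero and Tapper and not the algorithm in PMF2. Second, all names (`model`, `q_value`, `factorize`), the random starting point and the iteration count are hypothetical.

```python
import random


def model(G, F):
    """Modeled data: entry (i, j) is the sum over factors k of G[i][k] * F[k][j]."""
    return [[sum(G[i][k] * F[k][j] for k in range(len(F))) for j in range(len(F[0]))]
            for i in range(len(G))]


def q_value(X, S, G, F):
    """Chi-squared: squared residuals, each scaled by its own error estimate S[i][j]."""
    M = model(G, F)
    return sum(((X[i][j] - M[i][j]) / S[i][j]) ** 2
               for i in range(len(X)) for j in range(len(X[0])))


def factorize(X, S, p, iterations=500, seed=0):
    """Alternate between updating G (F fixed) and F (G fixed); returns G, F and the Q history."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[1.0 / (S[i][j] ** 2) for j in range(m)] for i in range(n)]  # point-by-point weights
    G = [[rng.uniform(0.1, 1.0) for _ in range(p)] for _ in range(n)]
    F = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(p)]
    eps = 1e-12  # guards against division by zero
    history = [q_value(X, S, G, F)]
    for _ in range(iterations):
        M = model(G, F)
        # F is taken as known; G is rescaled entry by entry and stays non-negative.
        G = [[G[i][k] * sum(W[i][j] * X[i][j] * F[k][j] for j in range(m))
              / (sum(W[i][j] * M[i][j] * F[k][j] for j in range(m)) + eps)
              for k in range(p)] for i in range(n)]
        M = model(G, F)
        # Roles reversed: G is taken as known and F is updated.
        F = [[F[k][j] * sum(W[i][j] * X[i][j] * G[i][k] for i in range(n))
              / (sum(W[i][j] * M[i][j] * G[i][k] for i in range(n)) + eps)
              for j in range(m)] for k in range(p)]
        history.append(q_value(X, S, G, F))
    return G, F, history
```

A call such as `factorize(X, S, p=2)` takes the data matrix, a matrix of error estimates of the same shape, and an assumed number of factors. The number of factors is an input here, so choosing it remains a separate modeling decision.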
Now that the problem and the data analysis methods have been identified, it is necessary to select a data set with which to examine PMF's analysis power. Data from the Regional Air Pollution Study (RAPS) of St. Louis, MO make an excellent choice for PMF analysis. From May 1975 to April 1977, roughly 35,000 ambient aerosol samples were collected at 10 sampling sites in and around the city of St. Louis (Goulding et al., 1981). Coarse and fine fractions of particles were deposited on membrane filters using dichotomous air samplers (Nelson, 1979). Samples were analyzed for total mass by β-gauge measurements and for concentrations of up to 27 elements by energy-dispersive X-ray fluorescence analysis (Goulding et al., 1981). This data set is not only large, but it has also been examined using numerous analysis methods and thus provides an excellent opportunity to compare PMF's accuracy and precision with those of other methods.
It is likely that additional data from Washington, DC, with more complete elemental analysis, will also be examined as time permits. These data include organic and elemental carbon, which were not measured in St. Louis.
Recent unpublished studies have suggested that these variables can be used to separate diesel from spark ignition sources, which is a critical problem currently facing receptor modeling. The Washington data will provide a good test to see if such an analysis can be duplicated elsewhere.



References:
Alpert, D.J. and P.K. Hopke (1981) A Determination of the Sources of Airborne Particles Collected During the Regional Air Pollution Study, Atmospheric Environment 15:675-687.

Chang, S.N., P.K. Hopke, G.E. Gordon and S.W. Rheingrover (1988) Target-Transformation Factor Analysis of Airborne Particulate Samples Selected by Wind-Trajectory Analysis, Aerosol Sci. Technol. 8:63-80.

Cobourn, W.G. and R.B. Husar (1982) Diurnal and Seasonal Patterns of Particulate Sulfur and Sulfuric Acid in St. Louis, July 1977-June 1978, Atmospheric Environment 16:1441-1450.

Dzubay, T.G. (1980) Chemical Element Balance Method Applied to Dichotomous Sampler Data, New York Academy of Sciences 338:126-144.

Goulding, F.S., J.M. Jaklevic and B.W. Loo (1978) Aerosol Analysis for the Regional Air Pollution Study-Interim Report. EPA-600/4-78-034, U.S. Environmental Protection Agency, Research Triangle Park, N.C.

Henry, R.C. (1991) Multivariate Receptor Models, In: Receptor Modeling for Air Quality Management, P.K. Hopke, ed., Elsevier Science Publishers, Amsterdam, 117-147.

Hopke, P.K. (1985) Receptor Modeling in Environmental Chemistry, John Wiley & Sons, Inc., New York.

Hopke, P.K. (2000) A Guide to Positive Matrix Factorization.

Hwang, C.S., K.G. Severin and P.K. Hopke (1984) A Comparison of R- and Q-Modes in Target Transformation Factor Analysis for Resolving Environmental Data, Atmospheric Environment 18:345-352.

Jaklevic, J.M., R.C. Gatti, F.S. Goulding, B.W. Loo and A.C. Thompson (1981) Aerosol Analysis for the Regional Air Pollution Study-Final Report. EPA-600/4-81-006, U.S. Environmental Protection Agency, Research Triangle Park, N.C.

Karl, T.R. (1980) A Study on the Spatial Variability of Ozone and Other Pollutants at St. Louis, Missouri, Atmospheric Environment 14:681-694.

Liu, C.K., B.A. Roscoe, K.G. Severin and P.K. Hopke (1982) The Application of Factor Analysis to Source Apportionment of Aerosol Mass, Am. Ind. Hyg. Assoc. 43:314-318.
