Brian Diefendorf
Honors Thesis Proposal
Source Apportionment through Advanced Factor Analysis Modeling
The field
of aerosol chemistry is very diverse and extensive. The goal of this project is
to analyze several sets of particle composition data such as data from the
Regional Air Pollution Study of St. Louis using Positive Matrix Factorization.
With this approach, we seek to resolve and distinguish the sources contributing
to the measured particulate matter concentration at each site. If successful,
the project will yield a greater understanding of source contributions and a
more accurate basis for controlling particulate matter in the atmosphere.
The first
National Ambient Air Quality Standards were established in 1971. These
standards set limits on allowable levels of particulate matter suspended in the
air. However, these standards only focused on total particulate mass and did
not distinguish between the different sizes of particles present in air. As
such, control efforts were directed towards the larger particles, as they
comprise a larger proportion of the total particulate mass and are more easily
removed from the air. However, there is increasing evidence that the smaller
particles pose a greater threat to human health. These risks are mostly the
result of respiratory problems that arise from the continued inhalation of the
particulate matter. Thus, it is important to control and regulate the inhalable
particles, those with diameters of 10 µm or less. The current NAAQS for
particles 10 µm or less in diameter is an annual arithmetic mean of 50 µg/m³
and a 24-hour average of 150 µg/m³. Standards for particles in the 2.5 µm or
less range have also been established: an annual mean of 15 µg/m³ and a
24-hour average of 65 µg/m³.
The measurement and tracking of the relative emission rates from the sources
would enhance the controls developed from these standards. Nonetheless, the
complexity of urban ecosystems makes determining the sources of particulate
matter a very difficult problem.
The
fundamental principle behind source receptor relationships is the conservation
of mass and the application of a mass balance analysis. This analysis can be
used to identify and assign sources of airborne particulate matter in the
atmosphere. This particular method is known as receptor modeling. There are two
approaches to obtaining a data set for receptor modeling. One is to determine a
large number of chemical constituents such as elemental concentrations in a
number of samples. The other is to use automated electron microscopy to
characterize the composition and shape of particles in a series of particle
samples. Regardless of the approach used, a mass balance can be made to account
for all chemical species in the samples as contributions from the independent
sources.
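The mass balance described above is commonly written in the following form. The symbols are introduced here for illustration and are not defined elsewhere in this proposal: x_ij is the measured concentration of species j in sample i, g_ik is the mass contribution of source k to sample i, f_kj is the fraction of species j in the emissions of source k, p is the number of sources, and e_ij is the part of the measurement left unexplained.

```latex
x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij},
\qquad i = 1, \dots, n, \quad j = 1, \dots, m
```

In matrix notation the same statement reads X = GF + E, with X of size n by m, G of size n by p, and F of size p by m.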
Natural
physical constraints exist on the system and must be considered in developing
any model and obtaining physically realistic solutions from the model [Henry,
1991]. These fundamental, natural constraints are:
1) The model must reproduce the original data; the model must explain the
observations.
2) The predicted source compositions must be non-negative; a source cannot have
a negative percentage of an element.
3) The predicted source contributions to the aerosol concentrations must all be
non-negative; a source cannot emit negative mass.
4) The sum of the predicted elemental mass contributions for each source must be
less than or equal to the total measured mass for each element; the whole is
greater than or equal to the sum of its parts.
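The four constraints above can be checked mechanically for any candidate solution. The following is a minimal sketch, not part of any published receptor model; the function name, the 5% tolerance `rel_tol`, and the small example matrices are all hypothetical and chosen only for illustration.

```python
import numpy as np

def check_constraints(X, G, F, rel_tol=0.05):
    """Check a candidate solution against the four natural constraints.

    X: (n, m) measured concentrations, G: (n, p) source contributions,
    F: (p, m) source compositions. rel_tol is an assumed tolerance.
    """
    X, G, F = (np.asarray(a, dtype=float) for a in (X, G, F))
    predicted = G @ F  # summed contributions of all sources
    fit_error = np.linalg.norm(X - predicted) / np.linalg.norm(X)
    return {
        # 1) the model reproduces the data to within the tolerance
        "reproduces_data": bool(fit_error <= rel_tol),
        # 2) no source has a negative fraction of any element
        "compositions_nonnegative": bool(np.all(F >= 0)),
        # 3) no source contributes negative mass
        "contributions_nonnegative": bool(np.all(G >= 0)),
        # 4) the summed parts do not exceed the measured whole
        "parts_within_whole": bool(np.all(predicted <= X * (1 + rel_tol))),
    }

# Made-up example: 2 samples, 2 sources, 3 elements
G = np.array([[2.0, 1.0], [0.5, 3.0]])
F = np.array([[0.6, 0.4, 0.0], [0.1, 0.2, 0.7]])
X = G @ F  # noise-free "measurements"

good = check_constraints(X, G, F)

G_bad = G.copy()
G_bad[0, 0] = -1.0  # a source emitting negative mass
bad = check_constraints(X, G_bad, F)
```

Here `good` reports every constraint as satisfied, while `bad` flags the negative source contribution.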
There are
several methods utilized to model such problems. Miller et al. (1972) initially
used a chemical element balance analysis to solve these problems. In this
method, it is assumed that the number and composition of the sources are known.
The observed composition data is then regressed against the known source
profile matrix. This method has produced very good fits to the data in recent
studies. The major drawback of this method is that it requires a priori knowledge
of both the number and composition of the source emissions.
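The regression step of a chemical element balance can be sketched as an ordinary least-squares fit. This is a simplified illustration under stated assumptions, not the procedure of Miller et al.: the fit is unweighted, and the source names, profiles, and contributions are invented numbers.

```python
import numpy as np

# Hypothetical source profiles: one row per source, one column per element,
# entries are mass fractions. All values are made up for illustration.
profiles = np.array([
    [0.30, 0.05, 0.01, 0.10],  # source A
    [0.02, 0.40, 0.05, 0.01],  # source B
    [0.01, 0.02, 0.35, 0.03],  # source C
])

# Hypothetical source contributions used to synthesize one sample
true_contrib = np.array([10.0, 5.0, 2.0])
observed = true_contrib @ profiles  # elemental concentrations in the sample

def chemical_element_balance(observed, profiles):
    """Regress one observed composition against known source profiles."""
    estimate, _, rank, _ = np.linalg.lstsq(profiles.T, observed, rcond=None)
    if rank < profiles.shape[0]:
        raise ValueError("source profiles are collinear; contributions not unique")
    return estimate

estimated = chemical_element_balance(observed, profiles)
```

Because the synthetic sample is noise-free, `estimated` matches `true_contrib` to numerical precision. The fit is only possible because `profiles` is supplied in advance, which is exactly the a priori requirement noted above.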
Thus, when such knowledge is unavailable, it is
necessary to estimate the number and composition of the sources as well as
their contributions to the measured particulate mass. The multivariate data
analysis methods used for this purpose are referred to as factor analysis. The
most common form of factor analysis is Principal Components Analysis (PCA). The
PCA results are usually calculated using an eigenvector analysis of a
correlation matrix [Hopke, 1985; Henry, 1991]. The PCA method utilizes a
singular value decomposition of the data matrix. However, numerous problems
arise from using the PCA method. Paatero and Tapper [1993] show that PCA scales
the data by column or by row, and that this scaling leads to distortions in the
analysis. They also show that the optimum scaling would be to scale each data
point individually, so that the more precise data points have greater influence
on the solution than points with higher uncertainties.
However, point-by-point scaling results in a data matrix that cannot be reproduced
by a conventional factor analysis method based on the singular value
decomposition. Therefore, it is necessary to use a different form of factor
analysis.
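The link between the eigenvector analysis of a correlation matrix and the singular value decomposition can be illustrated numerically. The sketch below uses randomly generated numbers in place of real composition data, and the variable names are hypothetical. It shows that working with the correlation matrix amounts to scaling every column of the data by a single factor, which is the column scaling discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data set: 50 samples (rows) by 4 species (columns)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(50, 4))
n = X.shape[0]

# Column scaling implied by the correlation matrix: every value in a
# column is centered and divided by the same standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Route 1: eigenvector analysis of the correlation matrix
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
eigvals = eigvals[::-1]                  # largest component first
eigvecs = eigvecs[:, ::-1]

# Route 2: singular value decomposition of the column-scaled data
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
svd_eigvals = s**2 / (n - 1)

print(eigvals)
print(svd_eigvals)
```

Both routes give the same eigenvalues. Neither leaves room for a separate weight on each individual data point, since the scaling is fixed per column.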
Recently, a new technique known as Positive Matrix Factorization (PMF) has been
developed. PMF differs from previous analysis methods in that it is not
eigenvector based and it specifically addresses the problem of non-optimal
scaling. PMF utilizes error estimates of the data to provide optimum data point
scaling. This scaling is accomplished by treating the factor analysis as a
weighted least-squares problem. Initially, the problem was solved iteratively
using alternating least squares [Paatero and Tapper, 1993]. In this early
version of the approach, one of the two factor matrices is taken as known and
the chi-squared is minimized with respect to the other matrix as a weighted
linear least-squares problem. Then the roles of the matrices are reversed and
the process is repeated. This alternation continues until convergence. The
drawback to solving PMF in this fashion is that the process is slow. To improve
speed, each step in the iteration was changed so that modifications are made to
both matrices instead of only one [Paatero and Tapper, 1994]. Subsequently, a
computer program, PMF2, was developed to determine the joint solution.
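The alternating scheme described above can be sketched in a few lines. This is a simplified illustration and not the PMF2 algorithm: it omits the non-negativity constraints that PMF imposes on both matrices, and the function names, synthetic data, and uncertainty model are assumptions made only for this example. The quantity `q_value` is the chi-squared, the sum of squared residuals each divided by the error estimate of its own data point.

```python
import numpy as np

def q_value(X, S, G, F):
    """Chi-squared: residuals scaled point by point by the uncertainties S."""
    return float(np.sum(((X - G @ F) / S) ** 2))

def solve_rows(X, S, F):
    """With F taken as known, fit each row of G by weighted least squares."""
    G = np.empty((X.shape[0], F.shape[0]))
    for i in range(X.shape[0]):
        A = F.T / S[i][:, None]  # scale each equation by 1 / s_ij
        b = X[i] / S[i]
        G[i] = np.linalg.lstsq(A, b, rcond=None)[0]
    return G

def alternating_wls(X, S, p, n_iter=50, seed=0):
    """Alternate between the two matrices (no non-negativity enforced)."""
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[0], p))
    F = rng.random((p, X.shape[1]))
    history = [q_value(X, S, G, F)]
    for _ in range(n_iter):
        G = solve_rows(X, S, F)            # F known, solve for G
        F = solve_rows(X.T, S.T, G.T).T    # roles reversed: G known, solve for F
        history.append(q_value(X, S, G, F))
    return G, F, history

# Synthetic example: 30 samples, 5 species, 2 sources (all values made up)
rng = np.random.default_rng(1)
G_true = rng.random((30, 2)) * 10
F_true = rng.random((2, 5))
X_exact = G_true @ F_true
S = 0.05 * X_exact + 0.01  # hypothetical error estimate for every data point
X = X_exact + S * rng.standard_normal(X_exact.shape)

G, F, history = alternating_wls(X, S, p=2)
print(history[0], history[-1])
```

Each half-step is an exact minimization over one matrix, so the chi-squared cannot increase from one iteration to the next. Imposing non-negativity would require a constrained solver in place of the plain least-squares call used here.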
Now that the problem and the data analysis methods have been identified, it is
necessary to select a data set with which to examine PMF's analytical power.
Data from the Regional Air Pollution Study (RAPS) of St. Louis, MO make an
excellent choice for PMF analysis. From May 1975 to April 1977, roughly 35,000
ambient aerosol samples were collected at 10 sampling sites in and around the
city of St. Louis (Goulding et al., 1981). Coarse and fine fractions of
particles were deposited on membrane filters utilizing dichotomous air samplers
(Nelson, 1979). The samples were analyzed for total mass by β-gauge
measurements and for concentrations of up to 27 elements by energy-dispersive
X-ray fluorescence analysis (Goulding et al., 1981). This data set not only
contains a large number of samples, but has also been examined using numerous
analysis methods, and thus provides an excellent opportunity to compare PMF's
accuracy and precision with those of other methods.
It is likely that additional data from Washington, DC, with more complete
elemental analysis, will also be examined as time permits. These data include
organic and elemental carbon, which were not measured in St. Louis.
Recent unpublished studies have suggested that these
variables can be used to separate diesel from spark ignition sources, which is
a critical problem currently facing receptor modeling. The Washington data will
provide a good test to see if such an analysis can be duplicated elsewhere.
References:
Alpert, D.J. and P.K. Hopke (1981) A Determination of the Sources of Airborne Particles Collected During the Regional Air Pollution Study, Atmospheric Environment 15:675-687.
Chang, S.N., P.K. Hopke, G.E. Gordon and S.W. Rheingrover (1988) Target-Transformation Factor Analysis of Airborne Particulate Samples Selected by Wind-Trajectory Analysis, Aerosol Sci. Technol. 8:63-80.
Cobourn, W.G. and R.B. Husar (1982) Diurnal and Seasonal Patterns of Particulate Sulfur and Sulfuric Acid in St. Louis, July 1977-June 1978, Atmospheric Environment 16:1441-1450.
Dzubay, T.G. (1980) Chemical Element Balance Method Applied to Dichotomous Sampler Data, New York Academy of Sciences 338:126-144.
Goulding, F.S., J.M. Jaklevic and B.W. Loo (1978) Aerosol Analysis for the Regional Air Pollution Study-Interim Report. EPA-600/4-78-034, U.S. Environmental Protection Agency, Research Triangle Park, N.C.
Henry, R.C. (1991) Multivariate Receptor Models, In: Receptor Modeling for Air Quality Management, P.K. Hopke, ed., Elsevier Science Publishers, Amsterdam, 117-147.
Hopke, P.K. (2000) A Guide to Positive Matrix Factorization
Hopke, P.K. (1985) Receptor Modeling in Environmental Chemistry, John Wiley & Sons, Inc., New York.
Hwang, C.S., K.G. Severin, and P.K. Hopke (1984) A Comparison of R- and Q-Modes in Target Transformation Factor Analysis for Resolving Environmental Data, Atmospheric Environment 18:345-352.
Jaklevic, J.M., R.C. Gatti, F.S. Goulding, B.W. Loo and A.C. Thompson (1981) Aerosol Analysis for the Regional Air Pollution Study-Final Report. EPA-600/4-81-006, U.S. Environmental Protection Agency, Research Triangle Park, N.C.
Karl, T.R. (1980) A Study on the Spatial Variability of Ozone and Other Pollutants at St. Louis, Missouri, Atmospheric Environment 14:681-694.
Liu, C.K., B.A. Roscoe, K.G. Severin and P.K. Hopke (1982) The Application of Factor Analysis to Source Apportionment of Aerosol Mass, Am. Ind. Hyg. Assoc. 43:314-318.